
In weakly supervised learning, labeling function ensembling is used to weakly annotate the dataset. This weakly supervised dataset is then used to train an end model, which is the main output of our framework.
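As a concrete illustration, the following is a minimal sketch of this step using the Snorkel library; the labels, keyword lists, and example texts are illustrative and not part of our framework:

    import pandas as pd
    from snorkel.labeling import labeling_function, PandasLFApplier
    from snorkel.labeling.model import LabelModel

    ABSTAIN, SPORTS, POLITICS = -1, 0, 1

    @labeling_function()
    def lf_sports(x):
        # Fires when any sports keyword occurs in the text.
        return SPORTS if any(k in x.text.lower() for k in ("match", "goal", "league")) else ABSTAIN

    @labeling_function()
    def lf_politics(x):
        return POLITICS if any(k in x.text.lower() for k in ("election", "senate")) else ABSTAIN

    df = pd.DataFrame({"text": [
        "The league match ended with a late goal.",
        "The senate vote follows the election.",
    ]})
    L = PandasLFApplier(lfs=[lf_sports, lf_politics]).apply(df)

    # Ensemble the labeling functions into one weak label per text.
    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train=L, n_epochs=100)
    weak_labels = label_model.predict(L)  # training labels for the end model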

We encourage the use of multiple supervised classifiers. There should be at least one simple classifier that allows for near-immediate training and inference. This fast classifier should be run in parallel with interactive labeling, retraining after each keyword list change. The framework enables the user to continuously monitor classifier performance. We suggest that when sampling texts during keyword validation, the model prediction is shown to the user for each sample. If a dev set is provided, evaluation metrics should be available to the framework user on the fly.
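For the fast classifier, one plausible choice is TF-IDF features with logistic regression. A minimal sketch follows, where train_texts, weak_labels, dev_texts, and dev_labels are assumed to exist:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.pipeline import make_pipeline

    # Retrained after every keyword list change; fast enough to run interactively.
    fast_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))
    fast_clf.fit(train_texts, weak_labels)

    # On-the-fly evaluation on the dev set, if one is provided.
    print(classification_report(dev_labels, fast_clf.predict(dev_texts)))

A linear model over sparse features retrains in seconds even on moderately large datasets, which is what makes this kind of continuous feedback loop practical.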

State-of-the-art models that employ transfer learning should also be run in parallel as often as possible, with their results visible. Such a model should be able to generalize and provide much better results than simple keyword labeling.
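One lightweight way to bring in transfer learning, sketched below, is to use a pretrained sentence encoder as a fixed feature extractor; fine-tuning a full transformer is a heavier alternative. The model name and the variables train_texts, weak_labels, and dev_texts are assumptions:

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # Pretrained sentence encoder used as a fixed feature extractor.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    strong_clf = LogisticRegression(max_iter=1000)
    strong_clf.fit(encoder.encode(train_texts), weak_labels)
    dev_pred = strong_clf.predict(encoder.encode(dev_texts))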

With many sophisticated text classification algorithms to choose from, we should explore whether these more complicated algorithms, which demand more computational resources, achieve better results than the simpler model. If they do, they provide a more realistic estimate of the achievable overall performance. The algorithm used for inference should be chosen based on evaluation results, model simplicity, ease of deployment, inference speed, and the training resources needed for retraining.

Complicated models that provide good results are strong candidates for exploiting model generalization. We use the model to predict labels for data points in the training set. The differences between the predicted labels and the labels generated by keyword labeling may then be used for label expansion and label debugging.
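A sketch of collecting these differences, reusing the encoder and classifier from the transfer learning sketch above and assuming weak_labels holds the keyword-generated labels:

    import numpy as np

    train_pred = strong_clf.predict(encoder.encode(train_texts))
    disagreements = np.flatnonzero(train_pred != np.asarray(weak_labels))
    # Disagreements on labeled points are candidates for label debugging;
    # confident predictions on abstained points are candidates for label expansion.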

3.4 Label Expansion

In the literature survey in section 2.3, we identified a selection of methods that may be used to add labels to unannotated data points on the basis of existing annotation. The user selects a confidence threshold that controls how the annotated training data are enriched. Here, we propose two methods for label expansion inspired by those in the survey:

Sentence Similarity: Use a sentence similarity model to search for sentences on which the labeling functions abstain that are similar to those already classified. The hope is that these similar sentences belong to the same class.
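A minimal sketch of this method using the sentence-transformers library; labeled_texts, labeled_labels, and abstained_texts are assumed, and the model name and threshold are illustrative:

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    labeled_emb = encoder.encode(labeled_texts, convert_to_tensor=True)
    abstain_emb = encoder.encode(abstained_texts, convert_to_tensor=True)

    # For each abstained sentence, find its most similar labeled sentence.
    sims = util.cos_sim(abstain_emb, labeled_emb)  # shape: (n_abstained, n_labeled)
    best_sim, best_idx = sims.max(dim=1)

    threshold = 0.8  # user-selected confidence threshold
    for i in range(len(abstained_texts)):
        if best_sim[i] >= threshold:
            new_label = labeled_labels[best_idx[i].item()]
            # ... add (abstained_texts[i], new_label) to the training data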

Model Generalization: A supervised classifier is trained on all labeled data. Then, the classifier labels the data points on which the labeling functions have so far abstained. The training dataset can then be expanded with the predictions the classifier is confident in.
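A minimal sketch of this method, again assuming labeled_texts, labeled_labels, and abstained_texts are available:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Train on all currently labeled data.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(labeled_texts, labeled_labels)

    # Predict on the data points the labeling functions abstain on.
    proba = clf.predict_proba(abstained_texts)
    threshold = 0.9  # user-selected confidence threshold
    confident = np.flatnonzero(proba.max(axis=1) >= threshold)
    new_texts = [abstained_texts[i] for i in confident]
    new_labels = clf.classes_[proba[confident].argmax(axis=1)]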

Two major benefits of creating text classification models via keyword labeling are explainability and ease of modification. We suggest two strategies to preserve these benefits:

1. Use texts that newly received a label through label expansion to auto-suggest keywords for keyword expansion.

2. Link label expansion to keyword groups. Label expansion generates labeling functions whose parameters are a set of keywords in a keyword group and a threshold; a sketch of such a generated function follows this list.
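A sketch of what such a generated labeling function could look like; the keyword group, label, and minimum-matches threshold are illustrative:

    from snorkel.labeling import labeling_function

    ABSTAIN, SPORTS = -1, 0
    sports_group = {"match", "goal", "league", "referee"}

    def make_group_lf(keywords, label, min_matches):
        # Generate a labeling function parameterized by a keyword group
        # and a threshold on the number of matching keywords.
        @labeling_function(name=f"lf_group_{label}")
        def lf(x):
            hits = sum(k in x.text.lower() for k in keywords)
            return label if hits >= min_matches else ABSTAIN
        return lf

    lf_sports = make_group_lf(sports_group, SPORTS, min_matches=1)

Because the generated function is fully described by its keyword group and threshold, it stays as explainable and as easy to modify as a hand-written keyword list.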

3.5 Label Debugging

In the literature survey in section 2.4, we identified a selection of methods that may be used to correct misclassifications. We propose to use such methods in our framework for label debugging. Label debugging improves model precision and fixes individual misclassifications. When the recall for a particular class gets high enough, the pipeline user is encouraged to debug the annotated training set they created. Debugging increases precision while maintaining recall.

Besides the methods covered in the survey, the following methods may also be used for label debugging:

1. Word sense disambiguation: Otherwise good keywords misclassify in certain cases due to homonymy. Word sense disambiguation can be used to allow only certain word senses for a given keyword.

2. Linguistic logical operators: Allow the use of logical operators, such as AND, for combining keyword groups to model more complicated relationships in the text. Other operators would include wildcards and relations over various linguistic phenomena, such as part-of-speech tags, dependency tags, dependency relations, lemmas, etc.

3. Keyword group outliers: Find outlier keywords within keyword groups (a sketch follows this list).

4. Hard annotation: Annotate a subset of texts labeled by a given labeling function and use these annotations as hard labels for the classifier.
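For keyword group outliers (item 3 above), one possible approach is to embed the keywords and flag the one farthest from the group centroid. A sketch with an illustrative group and model name:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    group = ["match", "goal", "league", "election"]  # "election" does not fit

    emb = encoder.encode(group)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    centroid = emb.mean(axis=0)
    scores = emb @ centroid  # cosine-like similarity to the group centroid
    print(group[int(np.argmin(scores))])  # least similar keyword in the group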

4. Keyword Labeler Prototype

Previously, in section 3.1, we presented the design of a keyword labeler. In this chapter, we present our prototype implementation of the keyword labeler. The prototype was implemented for the purpose of conducting the user study described in chapter 5.

4.1 Interactive Keyword Labeler

People tasked with coming up with keywords off the top of their head find it difficult: they quickly run out of keyword ideas, miss important keywords, or work slowly. The purpose of this keyword labeler prototype is to provide users with an interactive tool and a set of strategies that allow them to create large and accurate keyword lists quickly and easily.

Implemented keyword labeler features include:

1. A single interactive tool that keeps track of keyword lists, accuracies, and other statistics.

2. Keyword groups.

3. Providing coverage information to the user.

4. Validating ideas by adding one keyword group at a time, annotating it, and adjusting it along the way.

5. Auto-suggesting keywords based on a few seed keywords using keyword-based similarity search (a sketch follows this list).

6. Discouraging browsing of the dataset, focusing instead on showing dataset samples when validating newly added keyword groups.

7. Encouraging the user to focus on one label at a time.
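Keyword auto-suggestion (feature 5) can be sketched with pretrained word vectors; the gensim model and the seed keywords are illustrative:

    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-100")  # pretrained word vectors
    seeds = ["match", "goal", "league"]
    # Suggest keywords close to the seeds in embedding space.
    for word, score in wv.most_similar(positive=seeds, topn=10):
        print(word, round(score, 2))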