
Each of the four participants sat down individually in a one-on-one session with the author of this thesis for a net duration of 3 hours. Short breaks were taken between the scheduled time segments upon request, which made the actual duration of the study exceed 3 hours. The schedule was established on the fly with the first participant. Time was spent as follows:

On-boarding Instructions – 35 minutes – We explained keyword-based labeling principles, along with technical details on using the keyword labeler prototype, in the form of a one-on-one lesson. Users were encouraged to ask questions along the way. When there were no questions, the instructions were presented as a monologue.

On-boarding Tutorial – 50 minutes – The participant experimented with the prototype, working on a tutorial task (Food & Drink Quality label). We made sure they understood all the underlying principles and became familiar with using the prototype. We explained additional tips and tricks and had them try these out. We guided users through trying out different workflow aspects so as to evenly fill the allocated 50 minutes.

Task 1 (Environment label) – 50 minutes – Users were asked to perform the first task (Environment label). Users were notified when their progress stalled, for instance when they were doing something wrong, adopting a sub-par workflow, or misinterpreting the annotation guidelines. We did not otherwise influence their work. Keyword-based similarity search was banned from use for the first 30 minutes; in the last 20 minutes, users tried keyword-based similarity search as well.

Task 2 (Hospitality label) – 30 minutes – Users were asked to perform the second task (Hospitality label). We only communicated with them to clarify annotation guidelines.

Post-task Interview – 15 minutes – We interviewed the users by asking open-ended questions about the keyword labeler, the workflows and strategies they adopted, the tasks performed, and their achieved results.

5.1.1 On-boarding

A non-trivial user on-boarding procedure is required to teach users how to properly perform keyword-based labeling. The on-boarding was performed verbally.

A write-up of the user study instructions that were re-formulated during the on-boarding can be found in section A.2.

The following concepts must be explained to the users to allow for successful keyword labeling:

Keyword-Based Labeling – classification, keywords, and the correlation between them.

Coverage, Accuracy, Keyword Validation – find frequently occurring keywords that predict the desired label with high accuracy, and validate that accuracy by randomly sampling the unannotated dataset (a small sketch of this computation follows after this list).

Annotation Guidelines – classification labels are explained intuitively by keywords and example sentences in the annotation guidelines. Annotators should consult someone if they are not sure what belongs to the label and what does not.

UI usage – how to use the keyword labeler user interface.
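
The coverage and accuracy concepts from the list above can be illustrated with a short sketch. The snippet below is not the prototype's code; the function names and the in-memory list of reviews are hypothetical, and the only assumption is that the unannotated texts are available as plain strings.

```python
import random

def keyword_matches(texts, keyword):
    """Return the texts containing the keyword (case-insensitive substring match)."""
    kw = keyword.lower()
    return [t for t in texts if kw in t.lower()]

def estimate_keyword_quality(unannotated_texts, keyword, sample_size=20, seed=0):
    """Estimate coverage of a candidate keyword and draw a validation sample.

    Coverage is the fraction of unannotated texts containing the keyword.
    Accuracy cannot be computed automatically: the returned random sample is
    meant to be read by the annotator, who counts how many of the matches
    really belong to the label.
    """
    matches = keyword_matches(unannotated_texts, keyword)
    coverage = len(matches) / len(unannotated_texts) if unannotated_texts else 0.0
    rng = random.Random(seed)
    sample = rng.sample(matches, min(sample_size, len(matches)))
    return coverage, sample

# Example: validating the keyword "delicious" for the Food & Drink Quality label.
reviews = ["All dishes were delicious", "Too loud to enjoy", "Food was excellent."]
coverage, sample = estimate_keyword_quality(reviews, "delicious", sample_size=2)
print(f"coverage: {coverage:.2f}")
for text in sample:
    print("manually check:", text)
```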

Strategies that users were instructed to follow:

Ideation:

Think of synonyms.

Browse text for ideas.

Group similar keywords.

Use keyword similarity search (a sketch of one possible realization follows after these lists).

Validation:

Manually annotate data samples containing the keyword being added.

Have a good mental model of what a keyword is.

Decide whether the keyword actually indicates the label, or whether its co-occurrence with the label is just a coincidence.

Validate groups of similar keywords together, but only if you are sure they really do behave in the same way.
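
The keyword similarity search mentioned among the ideation strategies could, for instance, be realized with pre-trained word embeddings. The sketch below is one possible realization, not necessarily how the prototype implements it; it assumes a word2vec-format embedding file (the file name is a placeholder) and uses the gensim library.

```python
from gensim.models import KeyedVectors

# Load pre-trained word embeddings (placeholder file name, not shipped with the prototype).
vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def similar_keywords(keyword, topn=10):
    """Suggest keyword candidates close to the given keyword in embedding space."""
    if keyword not in vectors:
        return []
    return [word for word, _score in vectors.most_similar(keyword, topn=topn)]

# Example: looking for alternatives to "decor" while building the Environment label.
print(similar_keywords("decor"))
```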

To simplify the implementation, the prototype was built as a command-line application. We believe that an intuitive GUI with a short, well-written interactive tutorial would cut down the on-boarding time significantly.

5.1.2 Tasks

We decided to use a proprietary restaurant review dataset provided by the IT company Geneea Analytics.1 The dataset contains a collection of restaurant reviews, some of them annotated. The annotation uses 21 different classes that represent the topics a text talks about; since a single review can discuss multiple topics at the same time, the task is a multi-label classification problem.

A subset of the unannotated dataset containing Yelp2 restaurant reviews is released as sample data along with the implemented prototype. A dataset description is available in subsection A.1.1.
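
To make the multi-label setting concrete, the sketch below applies a small, hypothetical keyword dictionary to reviews and assigns every label whose keywords occur in the text. The label names follow the tasks below; the assumed file layout (JSON lines with a "text" field) matches the public Yelp review dump, while the sample data shipped with the prototype may be formatted differently (see subsection A.1.1).

```python
import json

# Hypothetical keyword dictionary mapping labels to already validated keywords.
label_keywords = {
    "Food & Drink Quality": ["delicious", "fresh", "overcooked"],
    "Environment": ["decor", "atmosphere", "noise"],
    "Hospitality": ["staff", "waitress", "hospitality"],
}

def label_review(text):
    """Assign every label whose keywords occur in the review text (multi-label)."""
    lowered = text.lower()
    return {label for label, keywords in label_keywords.items()
            if any(kw in lowered for kw in keywords)}

# Read reviews from a JSON-lines file with a "text" field and print their labels.
with open("reviews.json", encoding="utf-8") as f:
    for line in f:
        review = json.loads(line)
        print(sorted(label_review(review["text"])), review["text"][:60])
```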

The tasks performed are presented through the following annotator guidelines:

Tutorial Task: Food & drink – quality

Keywords:

• taste, freshness, healthfulness, cold food, under/overcooked pasta/chicken

Examples:

• “Food was excellent.”

• “All dishes were delicious”

• “To put it bluntly, it is hard to fail scrambled eggs, and for that price you would hope that at least they can make decent ones. But the eggs were a thick spongy white block, in fact, I don’t even know how to do that myself, and the bacon was oily and overcooked...”

• “I’m not really a fan of cheese but this was really good!!”

• “You can tell the ingredients are fresh and the pizzas are all made with lots of love”

Task 1: Environment

Keywords:

• cleanliness, atmosphere, decor, noise, seating/parking availability, ...

Examples:

• “the environment is luxurious yet relaxed”

• “the music not that cheerful”

• “Too loud to enjoy”

• “prob our favourite london tea venue in terms of decor/ambience”

1 https://geneea.com/

2 https://www.yelp.com/dataset/

Task 2: Hospitality

Keywords:

• nice/rude staff; great staff; staff attitude

Examples:

• “The hospitality is simply superb”

• “really friendly and efficient staff who have to manage working (with a smile!) in very tight space”

• “The welcome was friendly but a little cold initially from our waitress but as the evening went on she was v friendly and very informed regarding the menu.”