
List of Figures and Tables

Figure 1. The data mining life cycle (IBM)

Figure 2. Most Common Words Visualisation (Jupyter Notebook)

Figure 3. Dependency Parse Visualisation (Explosion.ai)

Figure 4. Coreference Resolution (Towardsdatascience.com)

Figure 5. The DialogFlow for QnA HR bot (Draw.io)

Figure 6. Questions - Input Example for Dialog Building (Excel)

Figure 7. QnA Dialog Building Process (Qnamaker.ai)

Figure 8. BERT input representation. The input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings (Google AI Language)

Figure 9. Azure Cognitive Services pricing (Azure.microsoft.com)

Figure 10. End-to-end spaCy workflows (SpaCy.io)

Figure 11. Data Separation schema. Example (Draw.io)

Figure 12. LSI - Amount of topics estimation (Jupyter Notebook)

Figure 13. Organization of Azure ML Workspace (docs.microsoft.com)

Figure 14. Text Classification - Wikipedia SP 500 Dataset (Github.com)

Figure 15. Confusion Matrix (Jupyter Notebook)

Figure 16. ROC Curve (scikit-learn.org)

Figure 17. Feature Explanation (SHAP)

Figure 18. One Item Explanation (SHAP)

Figure 19. Classes Distribution by length of messages

Table 1. Services for Chatbots building (Author)

Table 2. Chatbot Requirement Catalog (Author)

Table 3. Chatbot design layout (Botframe.com)

Table 4. RACI Matrix - chatbot (Author)

Table 5. Mailbot Requirement Catalog (Author)

Table 6. Mailbot RACI Matrix (Author)

Table 7. Algorithms Testing and comparison with unbalanced data (Jupyter Notebook)

Table 8. Algorithms Testing and comparison with balanced data (Jupyter Notebook)

Table 9. LSI - Most Common words for each topic (Jupyter Notebook)

Table 10. Cluster Analysis Distribution (Jupyter Notebook)

Table 11. Classification Report Example (Jupyter Notebook)

References

Whitby, B. (2008). Artificial Intelligence: A Beginner's Guide. Oxford: Oneworld Publications.

Brownlee, J. (2020, August 19). Difference Between Algorithm and Model in Machine Learning. Retrieved from https://machinelearningmastery.com/difference-between-algorithm-and-model-in-machine-learning/

Davenport, T., & Ronanki, R. (2018, February). Artificial Intelligence for the Real World. Retrieved from https://hbr.org/2018/01/artificial-intelligence-for-the-real-world

Jurafsky, D., & Manning, C. (2012). Introduction to NLP. Retrieved from http://spark-public.s3.amazonaws.com/nlp/slides/intro.pdf

Mielniczuk, P., & Maślankowska, M. (2020, January 7). Intro to coreference resolution in NLP. Retrieved from https://towardsdatascience.com/intro-to-coreference-resolution-in-nlp-19788a75adee

Global Natural Language Processing Market Research. (2019). Retrieved from https://www.reportlinker.com/p05838704/Global-Natural-Language-Processing-Market.html

Pandey, P. (2018, September 23). Simplifying Sentiment Analysis using VADER in Python. Retrieved from https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f

Sarkar, D. (2016). Text Analytics with Python. Bangalore: Apress. doi:10.1007/978-1-4842-2388-8_3

CRISP-DM Help Overview. (1999). Retrieved from

https://www.ibm.com/docs/en/spss-modeler/SaaS?topic=dm-crisp-help-overview

Jurafsky, D., & Martin, J. H. (2009). Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, NJ: Pearson Education.

Horev, R. (2018, November 10). BERT Explained: State of the art language model for NLP. Retrieved from https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270

Harned, B. (2019, September 16). How to Clear Project Confusion with a RACI Chart [Template]. Retrieved from

https://www.teamgantt.com/blog/raci-chart-definition-tips-and-example

Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019, May 24). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Google AI Language. Retrieved from https://arxiv.org/pdf/1810.04805.pdf

Shah, T. (2017, December 06). About Train, Validation and Test Sets in Machine Learning. Retrieved from https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. Beijing: O'Reilly.

Kamath, U., Liu, J., & Whitaker, J. (2019). Deep Learning for NLP and Speech Recognition. Cham: Springer International Publishing.

Rieser, V., & Lemon, O. (2014). Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation. Berlin: Springer.

Pustejovsky, J., & Stubbs, A. (2013). Natural language annotation for machine learning: A Guide to Corpus-Building for Applications. Beijing: O'Reilly.

Řehůřek, R. (2011, February 28). Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms. Retrieved from https://arxiv.org/pdf/1102.5597.pdf

Global Chatbot Market Anticipated to Reach $9.4 Billion by 2024 - Robust Opportunities to Arise in Retail & eCommerce. (2019, December 12). Retrieved from https://markets.businessinsider.com/news/stocks/global-chatbot-market-anticipated-to-reach-9-4-billion-by-2024-robust-opportunities-to-arise-in-retail-ecommerce-1028759508

Build a classifier to predict company category using Azure Machine Learning designer. (n.d.). Retrieved from https://github.com/Azure/MachineLearningDesigner/blob/master/articles/samples/text-classification-wiki.md

Create and run machine learning pipelines with Azure Machine Learning SDK. (2021, February 03). Retrieved from https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-machine-learning-pipelines

Google Machine Learning. (n.d.). Training and Test Sets: Splitting Data. Retrieved from https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. Retrieved from https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Samuels, A. (2020, August 6). Millions of Americans Have Lost Jobs in the Pandemic - And Robots and AI Are Replacing Them Faster Than Ever. Retrieved from https://time.com/5876604/machines-jobs-coronavirus/

Underfitting and Overfitting in Machine Learning. (2020, May 18). Retrieved from https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/

Heaven, W. D. (2021, February 24). Why GPT-3 is the best and worst of AI right now. Retrieved from https://www.technologyreview.com/2021/02/24/1017797/gpt3-best-worst-ai-openai-natural-language/

Annexes

Annex A: Libraries and tools for NLP Analysis and Model Building

This annex lists the libraries used for NLP techniques such as n-gram and common-word analysis, topic extraction, tokenization, stopword removal, stemming, part-of-speech tagging, and context understanding, as well as for preparing and building models with TF-IDF, classifiers, and BERT.
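To make the techniques named above more concrete, here is a minimal sketch of tokenization, stopword removal, n-gram extraction, common-word counting, and TF-IDF weighting. It uses only the Python standard library rather than the libraries listed in this annex. The stopword list, the sample messages, and all function names are assumptions made for illustration and are not taken from the thesis code.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would
# use a fuller list, for example the one shipped with NLTK).
STOPWORDS = {"the", "a", "an", "is", "to", "my", "i", "for", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that carry little topical meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Return consecutive n-token tuples (bigrams by default)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def tf_idf(docs):
    """Compute TF-IDF weights for each tokenized document.

    tf = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical HR-style messages, invented for this sketch.
messages = [
    "I need to request vacation for the next week",
    "Where is my salary statement for March",
    "Vacation request form is missing",
]

docs = [remove_stopwords(tokenize(m)) for m in messages]
common_words = Counter(t for d in docs for t in d).most_common(2)
bigrams = ngrams(docs[2])
weights = tf_idf(docs)
```

In this toy corpus, "salary" appears in only one message and therefore receives a higher TF-IDF weight than "vacation", which appears in two. That is the property that makes TF-IDF features useful as classifier input.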

Services for building and maintaining bots

IBM Watson. Retrieved from https://www.ibm.com/cloud/watson-assistant/pricing/

LUIS. Retrieved from https://azure.microsoft.com/en-us/pricing/details/cognitive-services/language-understanding-intelligent-services/

Amazon Lex. Retrieved from https://aws.amazon.com/lex/pricing/

Microsoft QnA. Retrieved from https://www.qnamaker.ai/

Azure Bot Service. Retrieved from https://azure.microsoft.com/en-us/services/bot-services/

List of Python Libraries for NLP

The most popular libraries implement the techniques needed to create and maintain NLP-based applications and programs.

Machine Learning

Libraries used for building models, tuning parameters, determining accuracy, etc.

NLTK - provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries. Source: http://www.nltk.org/, http://www.nltk.org/book_1ed/

Scikit-learn - ML library in Python. Provides classification, clustering, and regression analysis, as well as tools for comparing, validating, and choosing parameters and models. Source: https://scikit-learn.org/stable/

PyTorch - scientific computing package and deep learning research platform. Source: https://pytorch.org/

spaCy - general-purpose pretrained models to predict named entities, part-of-speech tags, and syntactic dependencies. Can be used out of the box and fine-tuned on more specific data. Source: https://spacy.io/

TextBlob - library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Source: https://textblob.readthedocs.io/en/dev/

PyTorch-NLP - pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules, and text encoders. Source: https://pytorchnlp.readthedocs.io/en/latest/

TensorFlow - platform for building machine learning models for desktop, mobile, web, and cloud, with graphs and metrics. Source: https://www.tensorflow.org/tutorials

Gensim - library for a variety of NLP tasks such as semantic analysis, text vectorization, and topic extraction. Source: https://radimrehurek.com/gensim/

Data Understanding

Libraries used for reading texts and saving data, text translation, visualization, and reports.

Pandas - library for creating and loading data frames, manipulating data, and reading and saving files. Source: https://pandas.pydata.org/pandas-docs/stable/index.html

Matplotlib - library for creating static, animated, and interactive plots and charts. Source: https://matplotlib.org/

Seaborn - visualization library built on Matplotlib with a rich feature set. It provides a high-level interface for drawing attractive and informative statistical graphics. Source: https://seaborn.pydata.org/

Translator - part of Azure Cognitive Services; a cloud-based machine translation service supporting more than 60 languages. Source: https://www.microsoft.com/en-us/translator/business/translator-api/

langdetect - language detection library ported from Google's language-detection. Source: https://pypi.org/project/langdetect/

Googletrans - free and unlimited Python library that implements the Google Translate API. It uses the Google Translate Ajax API to make calls to methods such as detect and translate. Source: https://pypi.org/project/googletrans/

Text Preparation

Libraries used for text cleaning, preprocessing, and normalization.

Re - regular expression library for Python. Source: https://docs.python.org/3/library/re.html

NumPy - working with n-dimensional arrays. Source: https://numpy.org/

SciPy - collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, etc. Source: https://www.scipy.org/
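As an illustration of the text cleaning step these libraries support, here is a minimal sketch that uses only the standard `re` module. The cleaning rules (drop URLs and e-mail addresses, keep letters only, collapse whitespace) and the sample message are assumptions chosen for the example, not the exact rules applied in the thesis.

```python
import re


def clean_text(raw):
    """Normalize a raw message before tokenization."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"\S+@\S+", " ", text)       # drop e-mail addresses
    text = re.sub(r"[^a-z\s]", " ", text)      # keep ASCII letters only
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


# Hypothetical incoming message, invented for this sketch.
sample = (
    "Hello HR,\nPlease see https://example.com/form or mail me at "
    "jane.doe@example.com!! Thanks :)"
)
cleaned = clean_text(sample)
print(cleaned)  # hello hr please see or mail me at thanks
```

The order of the substitutions matters. URLs and addresses are removed before punctuation is stripped, because otherwise their fragments would survive as separate word tokens.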

Model Explanation

SHAP (SHapley Additive exPlanations) - a tool to explain the output of any machine learning model. Source: https://shap.readthedocs.io/en/latest/

Annex B: RACI Explanation

Responsible - The team member who does the work to complete the task

Accountable - The person who delegates work and provides final review on a task or deliverables before it's deemed complete

Consulted - People who provide input for deliverable based on the impact on their work or their domain of expertise

Informed - People who need to be kept in the loop on the project progress

Annex C: Interviews with NLP and AI Professionals

The interviewed specialists work with NLP-based applications and tools used in business.

Questions

1. Which NLP- and AI-related technologies do you use? They can be on-premises or external tools, applications, or methods.

2. Can you determine the impact of these technologies on the business in areas such as cost reduction and workflow improvement? Can you provide some examples?

3. Are you going to increase the usage of NLP and AI in your work processes? If yes, how exactly are you planning to do so?

4. What do you think will be the most important AI-based, and specifically NLP-based, tool, platform, or approach for business in the future? How will it influence budgets and workflow?

Interview 1

AI Product Owner, Siemens

Experienced manager of RPA and NLP tasks, working on mailbot implementation

1. Python (BERT, spaCy), Azure ML, and a recurrent neural network to classify requests and extract relevant data.

2. These technologies help the business process, in just seconds or minutes, large amounts of customer requests that were classified manually and a high volume of repetitive manual tasks.

- works at whatever scale the business needs, 24/7, in real time

- more accurate results

- improved KPIs from in-depth analysis of customer requests

- data becomes more structured

- full lifecycle automation of every email, end-to-end

- increased customer satisfaction

- employees are able to focus on what matters: solving more complicated issues

Examples: mail traffic, help desk, case processing

3. Yes, we are planning to implement the following technologies:

- Detect missing email information

- Sentiment analysis

- FAQ integration

- Generate answers to the users (AI)

- Attachment handling

- E-mail translation / multi-language support

- Orchestration with chatbots

- Single point of contact: the user sends any email to a generic address, and the mailbot routes the ticket to the appropriate queue based on its content

- Automatic follow-ups with the customer/inquirer

- E2E automation bundling

4. NLP for decision making: increasing access to data and improved data quality will allow businesses to save the budget and time needed to prepare data for decision making. Other directions are easy access to information for business users with no technical knowledge, more advanced voice recognition to solve more complex problems, and a more customer-oriented approach. NLP will be combined with other technologies, such as gesture and facial recognition, to grow enterprise revenues and make enterprises more efficient and agile.

Considering the current level of competency in AI and machine learning, and with continuous advancements in AI paired with NLP, the possibility of having machines that can listen to and comprehend written and spoken language like humans makes the future of AI exciting.

Interview 2

VP AI & Efficiency, Sbermarket

Head of AI Implementation at one of the biggest Russian marketplaces

1. We mostly use our own developments and even write our own libraries in Python.

To create new models, we use supercomputers and clusters from Sberbank. The products we have created solve the following tasks:

- Transcription of calls, to understand the context and the essence of the dialogue. We plan to analyze and evaluate the quality of support based on this data.

- Clustering of calls to the call center to understand the main topics of complaints. This task is preparation for creating our own chatbot.

- Content-filtering recommendation system based on product descriptions. Basically, we use NLP models to understand the proximity of products by their descriptions.

One of the tasks, automatic generation of product descriptions, is in progress and will be done for us by a contractor; the solution will be based on GPT-3 technology.

2. The business impact is enormous. For recommendation systems, it means an increase in conversions to an order and to adding an item to the cart. For service support, it means automating some processes with chatbots and voice bots. Our turnover has grown incredibly: in three years the number of orders has grown 120 times, and the number of requests has grown linearly. It is easy to imagine that the cost of the service center, order processing, and the call center has also increased significantly, from a single-digit share of all costs to a double-digit share.

There are other successful cases of reducing costs with chatbots: colleagues from Yandex Go (a taxi and delivery application) created a solution that reduced their call center by 70%.

3. NLP solutions are often built on neural networks, which require big data and huge computing capacity, so I believe in the development of the Neural Processing Unit, a chip optimized for solving problems in the field of artificial intelligence. This will greatly speed up training and reduce its cost.

4. I think there will now be a boom in startups that prepare and label data for training, because more and more companies understand how important the analysis of text and audio data is for business.

Interview 3

E-commerce Director, Tech shop

IT Manager with more than 10 years of experience with NLP (recommendation systems) and AI (edtech)

1. Our company uses external platforms: voice assistants based on Yandex SpeechKit for speech synthesis and recognition, and Just AI as a platform for preparing dialogue scripts.

2. The use of a voice assistant and chatbots helps us reduce the cost of first-line support personnel. At the moment this is about 20% of costs that can be easily cut.

3. We now plan to hand over to the voice assistant not only call center contacts but also active sales, which covers searching for and selecting goods, placing an order, choosing a delivery method, and other skills. In addition, we plan to translate the active sales scenario.

4. I think it is critically important to develop platforms that automate the development of scenarios for voice assistants and chatbots. Nowadays you have to devote a lot of time to working out each scenario. I would like the bot to be able to learn independently from a group of question-answer pairs and then generate answers on its own instead of replying with patterns written in advance. Here, however, the question arises as to the quality of such responses: to what extent they can be adequate to the client's request, and whether the bot can solve the buyer's tasks independently. I also think it would be good to see scripted platform solutions on the market that are tailored to processing requests and conducting active sales in e-commerce.

Annex D: Questionnaire for NLP and AI Professionals

For this questionnaire, eight AI project managers and six AI/NLP developers from different countries (USA, Russia, Czech Republic, China) were asked to answer three questions each about a chatbot and a mailbot and to assess their impact on the work of the HR department.

Answer options

● Strongly Agreed

● Agreed

● Neutral

● Disagreed

● Strongly Disagreed

Case 1. The industrial company has implemented the Chatbot that answers employees' questions about salary, benefits, and vacations and helps to search documents.

Do you agree that the Chatbot can...

- Decrease HR department costs: 64% agreed, 14% strongly agreed, 14% neutral, 7% disagreed

- Improve the company's workflow: 57% agreed, 14.3% strongly agreed, 28.7% neutral

- Reduce the time an employee needs to get a service/answer: 42.9% agreed, 35.7% strongly agreed, 21.4% neutral

Case 2. The industrial company started to use the Mailbot to classify external and internal employees' mails and create a ticket for the responsible person.

Do you agree that the Mailbot can...

- Decrease HR department costs: 50% agreed, 7.1% strongly agreed, 48.8% neutral

- Reduce the time spent on assigning tasks: 57.1% agreed, 14.3% strongly agreed, 28.5% neutral

- Reduce the time an employee needs to get a service/answer: 57.1% agreed, 35.7% neutral, 7% disagreed
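The percentage shares reported in this annex can be derived from raw answer counts. A minimal sketch follows, assuming 14 respondents (eight project managers and six developers). The individual answers are not included in the annex, so the tally below is hypothetical: it is back-calculated from the shares reported for the first Case 1 statement, and the function name is illustrative.

```python
from collections import Counter


def response_shares(responses):
    """Return each answer option's share in percent, rounded to 0.1."""
    counts = Counter(responses)
    total = len(responses)
    return {option: round(100 * n / total, 1) for option, n in counts.items()}


# Hypothetical tally for 14 respondents (an assumption, not source data).
responses = (
    ["Agreed"] * 9
    + ["Strongly Agreed"] * 2
    + ["Neutral"] * 2
    + ["Disagreed"]
)
shares = response_shares(responses)
print(shares)
# {'Agreed': 64.3, 'Strongly Agreed': 14.3, 'Neutral': 14.3, 'Disagreed': 7.1}
```

Rounded to whole numbers, these values match the 64%, 14%, 14%, and 7% reported for "Decrease HR department costs" in Case 1.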