
Integration of Robotic Process Automation and Optical Character Recognition

Bc. Manasa Woolla

Department of Software Engineering

Supervisor: Mgr. Dvořák Ondřej

June 25, 2021


I would first like to thank my supervisor, Mgr. Dvořák Ondřej, whose expertise was invaluable in formulating the research questions and methodology.

His insightful feedback pushed me to sharpen my thinking and brought my work to a higher level. In addition, I would like to thank my family for their wise counsel and sympathetic ear. Finally, I could not have completed this dissertation without the support of my friends, who provided stimulating dis-


I hereby declare that the presented thesis is my own work and that I have cited all sources of information in accordance with the Guideline for adhering to ethical principles when elaborating an academic final thesis.

I acknowledge that my thesis is subject to the rights and obligations stipulated by the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular that the Czech Technical University in Prague has the right to conclude a license agreement on the utilization of this thesis as a school work under the provisions of Article 60 (1) of the Act.

In Prague on June 25, 2021


This thesis is school work as defined by Copyright Act of the Czech Republic.

It has been submitted at Czech Technical University in Prague, Faculty of Information Technology. The thesis is protected by the Copyright Act and its usage without author’s permission is prohibited (with exceptions defined by the Copyright Act).

Citation of this thesis

Woolla, Manasa. Integration of Robotic Process Automation and Optical Character Recognition. Master’s thesis. Czech Technical University in Prague,


Constant progress in the digital world demands the creation of faster technologies that help organizations keep pace. Even though current technologies provide a vast number of ways to help employees with their everyday tasks, many organizations still process routine work manually. They must continually repeat similar operations using office applications and customized information systems. This thesis provides an overview of how robotic process automation (RPA) integrated with optical character recognition (OCR) can help with this problem. It describes various RPA and OCR vendors, analyses their notations, and outlines the value they bring to organizations. In addition, it offers a view of implementing a solution that integrates RPA and OCR technologies, together with an assessment of their advantages and disadvantages. The result of this thesis is a prototype with integrated RPA and OCR that represents a solution for the order management process.

Keywords automation, robotic process automation, optical character recognition, RPA, OCR, order management, O2C.


The constant advancement of the digital world necessitates the creation of faster technology that helps organizations keep pace. Even though current technologies offer numerous ways to support employees in their everyday tasks, many organizations still handle tedious administrative work manually. They must repeat similar activities on a regular basis using office applications and customized information systems. This thesis provides an overview of how Robotic Process Automation (RPA) integrated with Optical Character Recognition (OCR) can address this problem. It describes various RPA and OCR vendors, analyses their notations, and outlines the value that they bring to organizations. Furthermore, it examines the implementation of a solution integrating RPA and OCR technologies and assesses its pros and cons. The product of this work is a prototype of an integrated RPA and OCR model as a solution for an order management process.

Keywords automation, RPA, robotic process automation, OCR, optical character recognition, O2C, order management.


1 Introduction

1.1 Motivation - What makes OCR a perfect match for RPA?

1.2 Limitations of RPA

1.3 Goals

1.4 Structure of the thesis

2 State-of-the-art

2.1 Analysis of RPA Tools

2.1.1 RPA Vendors

2.1.1.1 UiPath

2.1.1.2 Automation Anywhere

2.1.1.3 Blue Prism

2.1.1.4 Kryon RPA

2.1.1.5 Summary

2.2 Analysis of OCR Tools

2.2.1 OCR Vendors

2.2.1.1 Adobe Systems

2.2.1.2 ABBYY

2.2.1.3 Google

2.2.1.4 Amazon

2.2.1.5 Summary

2.3 Summary

3 Neural Networks with RPA vs OCR in RPA

3.1 Neural Networks with RPA

3.1.1 What is Neural Network?

3.1.2 Downsides of Neural Network

3.1.3 RPA with Neural Network

3.1.4 An Example: Case Study

3.1.4.3 The Solution

3.2 RPA with OCR

3.2.1 What is OCR?

3.2.1.1 Classification of Documents

3.2.2 OCR in RPA

3.2.3 RPA OCR Compatibility

3.3 Summary

4 Goals Revisited

5 Analysis and Design

5.1 Examine the UiPath notation

5.1.1 Flowchart

5.1.2 Sequence

5.1.3 State Machine

5.1.4 Global Exception Handler

5.1.5 Data

5.1.6 Choices

5.1.7 Comments and Annotation

5.2 OCR Activities in UiPath

5.2.1 Get OCR Text

5.2.2 Find OCR Text Position

5.2.3 OCR Text Exists

5.2.4 Double Click OCR Text, Click OCR Text and Hover OCR Text

5.3 OCR Engines in UiPath

5.3.1 Using Tesseract/Google OCR

5.3.2 Using Abbyy Cloud OCR

5.4 Summary

6 Implementation

6.1 Introduction

6.2 Implementation

6.2.1 Automation of the process in UiPath

6.2.2 Configuration of the OCR Engine

6.2.3 Combination of UiPath and Abbyy OCR

6.3 Testing

6.3.1 Testing in UiPath

6.4 Summary

7 Related and Future Work

7.1 Future Work

9 Conclusion

Bibliography

A Acronyms

B Contents of enclosed CD


1.1 RPA Workflow

2.1 Everest Group Process Automation (RPA) Products PEAK Matrix Assessment 2020

2.2 An Example of OCR Output

3.1 A Neural Network

5.1 Demonstration of UiPath OCR Activities with Google OCR

5.2 Demonstration of UiPath OCR Activities with Abbyy Cloud OCR with properties panel

6.1 Flowchart representation of the problem scenario

6.2 Main Process implementation in UiPath

6.3 Sequence to check mailbox for unread emails

6.4 Properties Panel to configure Mail Activity

6.5 Activity for Delay

6.6 Check for attachments and format

6.7 Get OCR Text Activity

6.8 Find OCR Text Activity

6.9 Properties Panel for Abbyy Cloud OCR Engine

6.10 Send Outlook Mail Message Activity

6.11 Sample Prescription Used

6.12 Sample Order Used

6.13 Test Case in UiPath Studio Pro for “Order Management Process”

6.14 UiPath Studio Pro Test Case Activity Coverage


Chapter 1 Introduction

1.1 Motivation - What makes OCR a perfect match for RPA?

RPA, in general, is a technology that helps automate administrative tasks via software bots. These bots take advantage of user interfaces to capture data and manipulate applications as humans do. For example, an RPA tool can observe a series of actions performed in a GUI, such as moving the cursor, connecting to APIs, and copying and pasting data, and formulate the same sequence of actions in an RPA wireframe that translates to code. These tasks can then be performed without human intervention.
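The record-and-replay idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Action` and `Recorder` types and the dictionary standing in for an application are hypothetical names introduced here, not part of any RPA product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    """One recorded user action (hypothetical structure for illustration)."""
    kind: str    # "copy" or "paste"
    target: str  # name of the field the action touches


@dataclass
class Recorder:
    """Records actions once so that they can be replayed without a human."""
    actions: List[Action] = field(default_factory=list)

    def record(self, kind: str, target: str) -> None:
        self.actions.append(Action(kind, target))

    def replay(self, app: Dict[str, str]) -> None:
        clipboard = ""
        for action in self.actions:
            if action.kind == "copy":
                clipboard = app[action.target]
            elif action.kind == "paste":
                app[action.target] = clipboard
            else:
                raise ValueError(f"unknown action kind: {action.kind}")


# Record the steps a clerk would otherwise perform by hand ...
recorder = Recorder()
recorder.record("copy", "invoice_total")
recorder.record("paste", "erp_amount")

# ... and replay them against a stand-in for the application state.
app_state = {"invoice_total": "120.50", "erp_amount": ""}
recorder.replay(app_state)
```

Real RPA tools record against UI selectors rather than dictionary keys, but the principle is the same: the sequence is captured once and executed repeatedly.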

A recent study reported that, by automating only 29% of the functions of a task with RPA, a finance department with 40 full-time accounting staff can save more than 25,000 hours of rework caused by human error per year, rework valued at $878,000 [1].

Figure 1.1: RPA Workflow [2]

Optical Character Recognition, as the name implies, refers to software that can electronically extract text from visual sources such as images and documents. A common scenario is reading a paper invoice and then selecting the relevant data to be processed into a PDF file. It is often likened to the “eyes of the robot” and is used for many types of documents, including common office files, contracts, pictures, invoices, and reports. Its functions are not restricted to identifying clearly printed text, as it can also decipher handwriting and process it into a digital database.

When combined with RPA, OCR displays remarkable capabilities. It is popular in financial institutions, where stacks of unstructured data in paper format are common. This can be problematic for tools solely reliant on RPA without an OCR function since RPA’s capabilities are limited to identifying and processing structured and electronic data.

An RPA-only tool needs an employee to manually enter information from the target file into an electronic document (such as Excel or Word), after which the RPA tool can be used for automation. When OCR is added to the mix, however, no human intervention is required at the beginning. OCR scans and extracts relevant data from the target file, whether it is a contract, an invoice, or a ledger. It then automatically lets the RPA bot take over for further processing. This method significantly reduces the number of steps that human workers must complete and is therefore both time-efficient and cost-effective.
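The handoff from OCR to the RPA bot can be sketched as follows. This is a minimal sketch under stated assumptions: the OCR output string, the field patterns, and the `bot_enter_invoice` function are all hypothetical, and a real deployment would obtain the text from an OCR engine and tune the patterns per document layout.

```python
import re
from typing import Dict

# Hypothetical OCR output for a scanned invoice; in practice this string
# would be returned by an OCR engine.
OCR_TEXT = """
ACME Supplies Ltd.
Invoice No: INV-2021-0042
Date: 2021-06-25
Total: 1,250.00 EUR
"""

# Assumed field patterns for this made-up layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
}


def extract_fields(ocr_text: str) -> Dict[str, str]:
    """Turn unstructured OCR text into the structured record a bot expects."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match is None:
            raise ValueError(f"field not found in OCR text: {name}")
        record[name] = match.group(1)
    return record


def bot_enter_invoice(record: Dict[str, str]) -> str:
    """Stand-in for the RPA step that types the record into a target system."""
    amount = float(record["total"].replace(",", ""))
    return f"Booked {record['invoice_number']} for {amount:.2f} on {record['date']}"


result = bot_enter_invoice(extract_fields(OCR_TEXT))
```

Raising an error when a field is missing, rather than passing an incomplete record on, keeps a misread document from silently reaching the downstream system.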

Four of PepsiCo’s largest entities in Europe still required people to input invoice data and credit memo information by hand, but OCR enhanced the existing process. A three-month test of the technology involved 40,000 pages of content in five languages. PepsiCo representatives reported that the technology ran smoothly despite the size of the project and the multilingual nature of the content. The efficiency achieved led the company to look for ways to implement similar OCR-driven processes worldwide [3].

The research paper “Apply RPA (Robotic Process Automation) in Semiconductor Smart Manufacturing” demonstrates the application of RPA and OCR technology in a factory. Notably, the application is not limited to the finance domain but extends to other areas such as manufacturing [4].

The main goal of this thesis is to investigate the idea of RPA integrated with OCR and its use in various fields, as well as the advantages it brings to organizations. Moreover, I would like to explore the ways in which my suggested approach can assist them.

1.2 Limitations of RPA

Let us first address an important question - What are some things that RPA cannot do? According to some industry experts, the following is a comprehensive list of RPA’s disadvantages [5]:

• Improvement of processes or cognitive abilities. ”RPA is not a cognitive computing solution”, says Reddy Subramanyam, who works at a Tech Services Company. Instead, it is ”best suited for rules-based vs judgement-based processes”, as Gaston Mbonglou sees it. To work


recommends using ”smart AI and ML integrations that understand exceptions and can provide recommendations.”

• Data must be formatted for RPA to work. Michael Grant, an Account Executive, says that “RPA requires structured data but 80% of enterprise data is buried in unstructured documents: emails, letters of credit, invoices, passports, sanction lists, etc.” Catherine Berten-Gutch extends those limitations to include “voice and call back processes and processes that require human subjectivity.” Although unstructured data is an issue for bots, other tools can be used to structure the data before RPA bots process it.

• Image or graphic data reading and interpretation. Anurag Vishnoi, Head of RPA at Nokia, shared an experience where it wasn’t possible to ”read a network topology or some machine drawing.”

• Documents that are handwritten. One of our members, Dedan Kanyuira, says that handwritten documents present a challenge but it ”is slowly being addressed and hopefully in the next few years we will see more intelligent ’handwritten notes’ recognition.”

One way to overcome these content-processing drawbacks is to integrate other intelligent technologies into the system.

“RPA technology is mainly used for automating rule-based processes and mimicking human actions, such as processing an invoice and entering data into SAP or Oracle systems from a Microsoft Excel spreadsheet [6].”

explained Gopal Ramasubramanian, senior director, intelligent automation & technology at Cognizant.

Unstructured data can be found in several formats, including documents, audio files, videos, emails, photos, and log files, to name a few. Unstructured data now makes up roughly 80% to 90% of all data [7]. Despite its abundance and importance, unstructured data is one of the most underutilized enterprise resources due to a lack of tools to retrieve and analyse it.

1.3 Goals

The following steps will be taken to achieve the final aim in the framework of this study. After the State-of-the-Art section, the steps will be revisited and, if necessary, redefined in greater detail.

1. Investigate the relation between OCR/Neural Networks and RPA.


2. Review available case studies of medium to large RPA/OCR implementations.

3. Analyse the types of documents for which OCR is most usable in the context of RPA.

4. Propose how RPA and OCR can be smoothly combined.

5. Implement a prototype demonstrating an integration of RPA and an OCR engine in the context of Order Management Process.

6. Examine and comment on the findings obtained.

1.4 Structure of the thesis

The thesis will be divided into several parts, beginning with the Introduction (Chapter 1), in which I will explain why I chose this topic and the objectives I have set for myself within the framework of this thesis.

Following the introduction, Chapter 2 will cover the state-of-the-art. In this chapter, I will describe Robotic Process Automation and go over some of the most popular products on the market right now. Furthermore, I will look into Optical Character Recognition solutions that are currently trending.

Chapter 3 gives a brief comparison with an alternative to RPA with OCR, namely RPA with Neural Networks. Chapters 2 and 3 should give me sufficient information to revisit and reframe my goals in greater depth in Chapter 4.

The next chapter, Analysis and Design, will compare the notation used by the RPA tool, UiPath, with the OCR language utilized by the majority of OCR vendors on the market. I will also go through how these notations are related and whether they allow for interaction. The possibilities for integrating RPA and OCR systems will be outlined in the chapter’s last section.

In the introduction section of Chapter 6, which is for Implementation, I will give an example of an order management process. Later, using one of the RPA technologies discussed in Chapter 2, the same example will be automated. The collaboration between RPA and the OCR system will be prototyped next. The testing section will be at the end of the chapter.

In Chapter 7, I will present a quick summary of related work and ideas for future work before finishing with an assessment of the work done in my thesis.

The thesis will end with an Evaluation section in Chapter 8, in which I will assess whether my objectives from Chapter 4 were met, and a Conclusion section in Chapter 9, which will summarize the thesis.


Chapter 2 State-of-the-art

In this chapter, I intend to provide a detailed discussion of state-of-the-art Robotic Process Automation and Optical Character Recognition technology. The chapter reviews the tools currently trending in the market, after which I will select the tools that I intend to use to implement the solution proposed in this thesis.

2.1 Analysis of RPA Tools

It is not surprising that the competition is fierce in the RPA market. In the past two years, this market has witnessed a three-fold increase in revenue, perhaps as a result of the craze around AI [8]. Before diving into the details of the leading competitors in the market, let us take a quick refresher about RPA.

Robotic process automation (RPA) is the use of software robots to perform simple or complex, repetitive tasks. Robots are entities that mimic human actions. A process is a set of steps that results in a relevant activity. Finally, automation is a process that is carried out with minimal human intervention.

2.1.1 RPA Vendors

There are currently more than 80 RPA vendors globally, and most Fortune 500 companies use RPA software [9]. For instance, UiPath claims that 8 of the Fortune 10 use its software [10]. According to IT Central Station’s in-depth analysis, based on key factors such as user reviews and pros and cons, the following are the top eight vendors in this market [11].

• UiPath

• Automation Anywhere

• Blue Prism

• Kryon RPA

• Microsoft Power Automate

• Blue Prism Cloud

• VisualCron

• Workfusion

The prerequisite for choosing a suitable RPA vendor is to identify the processes that need to be automated. After that, the key parameters to consider when choosing a vendor can be addressed.

“The first rule of automation used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.”

- Bill Gates [12]

Some of the major criteria are summarized below:

• Product Specifications: Product specifications cover four aspects: architecture, security, scalability and innovation. The product should be able to manage the type of automation (attended or unattended) that the organization plans to use. The choice of on-premise versus cloud implementation, security certification, and Citrix versus desktop automation are some of the key factors to consider when choosing a vendor.

• Integration with External Applications: Whatever the choice, it is likely that a single product cannot meet all needs. Hence, there is a need for good integration support with external applications that is simple and cost-effective to implement. Lack of integration results in a high dependency on the RPA tool and vendor, which is undesirable.

• Training & Learning: Although training and learning may not seem a critical parameter at first glance, it is important to know what kind of training materials are available for a given solution. As with any other software product, the technology team must be qualified to manage RPA production and maintenance. It will be a challenge if training content is unavailable or prohibitively costly.

• Ease of Implementation: Customization and off-the-shelf tasks both fall under the Ease of Implementation category. All the major RPA vendors offer pre-built activities that may or may not be customizable to specific needs. Identifying the correct product requires a thorough understanding of what can be customized using the drag-and-drop feature, which features require intensive coding, and what can be readily deployed. Evaluating these factors reduces development time and thereby accelerates deployment.
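One way to apply the four criteria above is a simple weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 scores, and the vendor names "Vendor A" and "Vendor B" are made-up placeholders, not results of any evaluation in this thesis.

```python
from typing import Dict

# Assumed weights for the four criteria (chosen to sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "product_specifications": 0.4,
    "integration": 0.3,
    "training": 0.1,
    "ease_of_implementation": 0.2,
}

# Made-up scores on a 1-5 scale for two placeholder vendors.
SCORES: Dict[str, Dict[str, int]] = {
    "Vendor A": {
        "product_specifications": 4,
        "integration": 3,
        "training": 5,
        "ease_of_implementation": 4,
    },
    "Vendor B": {
        "product_specifications": 5,
        "integration": 4,
        "training": 2,
        "ease_of_implementation": 3,
    },
}


def weighted_score(scores: Dict[str, int]) -> float:
    """Sum of criterion scores, each multiplied by its assumed weight."""
    return sum(WEIGHTS[criterion] * scores[criterion] for criterion in WEIGHTS)


# Vendors ordered from highest to lowest weighted score.
ranking = sorted(SCORES, key=lambda vendor: weighted_score(SCORES[vendor]), reverse=True)
```

The weights encode organizational priorities, so the same scores can produce a different ranking for an organization that values training over product specifications.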

According to a report by Everest Group, which studied 21 RPA technology vendors and categorized them into Leaders, Major Contenders, and Aspirants based on their capabilities and offerings, the key players in this market are UiPath, Automation Anywhere and Blue Prism, as shown in Figure 2.1. Another similarly comprehensive report that favours the same vendors is the Gartner Magic Quadrant for Robotic Process Automation 2020 [13].

Figure 2.1: Everest Group Process Automation (RPA) Products PEAK Matrix Assessment 2020 [14]

As a result, I will dwell on these three RPA market leaders, who provide extensively used resources with similar functionalities and structure, but each with its own set of advantages and disadvantages.

2.1.1.1 UiPath

UiPath’s popularity is easy to gauge: according to the Everest Group, UiPath’s RPA business grew by 100 percent year over year in 2019 and added nearly 1,500 new customers in the first half of 2020 alone [15]. UiPath has grown into the market’s first RPA platform designed to support the entire automation lifecycle, from discovery to measurement. Its product line continues to be cutting-edge, with tools such as process mining, embedded analytics, enhanced AI fabric components, SaaS-based RPA, and test automation being added to its conventional RPA capabilities [10].


Using a thorough and transparent evaluation methodology, The Forrester Wave™: Robotic Process Automation, Q1 2021 named UiPath a Leader with the highest ranking in each of three categories: Current Offering, Strategy, and Market Presence [16].

[UiPath] “...offers an enterprise-grade and innovative RPA solution augmented by a large ecosystem of partners, making it a good fit for large, global enterprises with demanding needs for support and governance.”

- The Forrester Wave™: Robotic Process Automation, Q1 2021

Here is a list of technologies that are covered by UiPath:

• Citrix Automation

• Cloud Automation

• Desktop Automation

• Excel Automation

• GUI Automation

• Macro Recorder

• Mainframe Automation

• SAP Automation

• Screen Scraping

• Web Automation

Using UiPath automation software, robots can be configured to simulate human behaviour on computer systems’ user interfaces. The components of the UiPath platform are UiPath Studio, UiPath Robot, and UiPath Orchestrator.

I will briefly define the application of each of the above-mentioned components in the UiPath framework.

1. UiPath Studio: It helps users design robotic processes in a visual interface. It is a flowchart-based modelling tool, which makes automation faster and more convenient. Multiple people can contribute to the same workflow. A visual signal that points out errors in the model and a recorder that captures what the user executes make modelling much easier.

2. UiPath Robot: It runs the processes designed in UiPath Studio. It works in both attended environments (running only on a human trigger) and unattended environments (triggering itself and working on its own).

3. UiPath Orchestrator: It is a web-based platform that runs and manages Robots. It can deploy multiple Robots and monitor and inspect their activities.
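The division of labour among the three components can be pictured with a small Python sketch. It is a conceptual model only and does not use or imitate the UiPath API: the `Robot` and `Orchestrator` classes, the round-robin dispatch, and the two example workflows are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

# A "process" stands for whatever the design tool produces; here a function.
Process = Callable[[], str]


@dataclass
class Robot:
    """Executes processes. An attended robot runs only on a human trigger."""
    name: str
    attended: bool

    def run(self, process: Process) -> str:
        return f"{self.name}: {process()}"


class Orchestrator:
    """Queues jobs and dispatches them to unattended robots, keeping a log."""

    def __init__(self, robots: List[Robot]) -> None:
        self.robots = [robot for robot in robots if not robot.attended]
        self.queue: Deque[Process] = deque()
        self.log: List[str] = []

    def schedule(self, process: Process) -> None:
        self.queue.append(process)

    def dispatch_all(self) -> None:
        if not self.robots:
            raise RuntimeError("no unattended robots available")
        # Round-robin over the unattended robots until the queue is empty.
        index = 0
        while self.queue:
            robot = self.robots[index % len(self.robots)]
            self.log.append(robot.run(self.queue.popleft()))
            index += 1


def read_mailbox() -> str:   # stand-in for a designed workflow
    return "checked mailbox"


def process_order() -> str:  # stand-in for a designed workflow
    return "processed order"


orchestrator = Orchestrator([
    Robot("robot-1", attended=False),
    Robot("robot-2", attended=False),
    Robot("desk-robot", attended=True),  # excluded from scheduling
])
orchestrator.schedule(read_mailbox)
orchestrator.schedule(process_order)
orchestrator.dispatch_all()
```

The attended robot is deliberately left out of the dispatch pool, mirroring the distinction between robots that wait for a human trigger and robots that can be scheduled centrally.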


UiPath is one of the industry’s fastest RPA solutions, in some cases 3-4 times faster than other RPA products. Its ease of development is also far superior to that of its competitors, where the need for considerably stronger coding skills makes implementation far more time-consuming. UiPath is accelerating enterprise automation with AI and ML; even its FTS release of 2020 focuses heavily on hyperautomation [17].

2.1.1.2 Automation Anywhere

Automation Anywhere (AA) is one of the most well-known RPA vendors, with strong and user-friendly RPA capabilities for automating human-driven business processes. AA is a web-based management system that gives businesses the ability to run end-to-end automated business activities.

The vision of their company, as stated by the CEO, is to see a world where every employee will work side by side with Digital Workers, taking the robot out of the human, making them exponentially more productive and far more fulfilled [18].

“Think ahead 10 years: about 1 billion children will be born on our planet. Children that will see Digital Workers as the norm. We must get this right, not just in quality and execution but in ethics and morality as well.”

- Mihir Shukla, Chairman & CEO

AA allows users to build scripts that perform repetitive tasks instead of writing code. These scripts can automate a wide range of tasks, from basic Windows setup to advanced networking and remote database management. The Automation Anywhere architecture has three primary components:

1. Control Room - The Control Room acts as a central location for all RPA robots. From the control room, robots can be started, halted, stopped, or scheduled. The control room can be used to push and retrieve code. Credentials and audit logs can be kept here as well.

2. Bot Creator - Bot Creator serves as the development environment. Developers build rule-based automations using a drag-and-drop technique, which are then moved to the control room and, if applicable, deployed.

3. Bot Runner - As its name suggests, it runs robots on dedicated machines. It has a similar appearance to the bot creator component, but its primary function is to run robots. The bot runner’s execution is monitored from start to finish and reported to the control room.

Automation Anywhere allows you to create processes using three types of robots: Task Bot, Meta Bot and IQ Bot. While using all three in one process is not required for a successful result, understanding their functions and the types of processes that are particularly suitable to each is critical.

• Task bot: This robot is ideally suited to processes that are routine, follow rules, and contain structured data.

• Meta bot: A meta bot may be used instead of rewriting redundant code for processes. This robot is ideally suited for scalable, complex processes.

• IQ bot: IQ bot’s main goal and best use case is to organize unstructured data and develop its skills and efficiency by learning and improving with each process run.
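The distinction among the three bot types can be summarized as a simple decision rule. The sketch below is a simplification derived only from the descriptions above: `ProcessProfile` and `suggest_bot` are hypothetical names, and the rule is not how Automation Anywhere itself assigns bots.

```python
from dataclasses import dataclass


@dataclass
class ProcessProfile:
    """Coarse description of a process (illustrative attributes only)."""
    structured_data: bool
    rule_based: bool
    reused_across_processes: bool


def suggest_bot(profile: ProcessProfile) -> str:
    """Map a process profile to the bot type described in the list above."""
    if not profile.structured_data:
        return "IQ Bot"      # organizes unstructured data and learns per run
    if profile.reused_across_processes:
        return "Meta Bot"    # reusable building block instead of redundant code
    if profile.rule_based:
        return "Task Bot"    # routine, rule-following, structured data
    return "manual review"   # none of the descriptions clearly applies


invoice_scans = ProcessProfile(structured_data=False, rule_based=True, reused_across_processes=False)
login_routine = ProcessProfile(structured_data=True, rule_based=True, reused_across_processes=True)
payroll_export = ProcessProfile(structured_data=True, rule_based=True, reused_across_processes=False)
```

Checking for unstructured data first reflects that neither of the other two bot types is described as handling it.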

This product was created from the ground up to meet the challenging needs of IT and business users in industries such as banking, finance, insurance, health care, manufacturing, logistics, and others. In turn, companies can build software robots that can automate any operation from start to finish.

The enterprise platform includes a credential vault, role-based access control, and data encryption at rest and in motion, all of which are built on a strong foundation. Additionally, the product uses advanced cognitive technologies to automate business processes, making sense of unstructured data and allowing more complicated business processes to be automated.

2.1.1.3 Blue Prism

Blue Prism is a Robotic Process Automation software development company based in the United Kingdom. Founded in 2001, it is the oldest RPA vendor and claims to have coined the term “Robotic Process Automation” [19]. The company offers software robots that help automate clerical back-office operations in the same way that humans do. This tool is unique in the industry because it employs a top-down approach. In addition, Blue Prism provides a visual designer that works without recorders, scripts, or any other form of intervention.

The four main components of Blue Prism are:

1. Process Diagrams - They are business workflows that act as software programs. These diagrams use core programming principles to construct operating process flows, much like flow charts; Blue Prism itself is built on the Microsoft .NET Framework. The diagrams are graphical representations of workflows that can be used to develop, evaluate, alter, and scale business capabilities.

2. Process Studio - The area where Process Diagrams are created is called Process Studio. Business logic, object calls, control loops, and variables are all present in this Blue Prism component.


3. Object Studio - To automate activities, almost all enterprises need to interact with external applications. Since this is not possible in Process Studio, Object Studio is used. Visual Business Objects (VBO) are created in Object Studio. These objects are diagrammatic programs that communicate with external applications to carry out tasks.

4. Application Modeller - It is the functionality for creating application models within Object Studio. It exposes the UI elements of a target application to the Blue Prism software.

With this innovative RPA platform, companies can now deploy and manage a digital workforce composed not of humans but of software robots. As the bots take over the mundane tasks, the human workforce can now concentrate their energy and time on the business processes that directly affect the growth and success of the company. Blue Prism is easy to use. It is also designed for easy set-up and promises bank-grade security to organizations utilizing it.

2.1.1.4 Kryon RPA

Kryon is a leader in enterprise automation, offering the only platform on the market that encompasses both Process Discovery technology and Robotic Process Automation (RPA). According to Forrester Research, the Kryon Full-Cycle Automation solution yields an ROI of 352% and cuts RPA implementation time by up to 80% [20]. Powered by proprietary AI technology, Kryon Process Discovery™ automatically generates a comprehensive picture of business processes, evaluates them, and recommends which ones to automate. Kryon offers desktop-based attended RPA, virtual-machine-based unattended RPA, or a hybrid combination of both. The company’s award-winning suite is used by enterprises worldwide, including AIG, Allianz, Deutsche Telekom, EY, Ferring Pharmaceuticals, HP, Kasikorn Bank (KBANK), Verizon, and Wyndham Hotel Group [21].

The Kryon architecture comprises the following components:

• Kryon Process Discovery - Kryon Process Discovery provides complete insight into business processes. The Discovery Robots collect data that offers actionable information on all the processes in your enterprise that are suitable for automation and should be automated to save costs. In the Kryon Studio, the system then generates automation scenarios automatically.

• Kryon Studio - Business users create and manage automation process workflows in Kryon’s robust authoring Studio. Its user-friendly visual interface enables users to easily record tasks and drag and drop process operations with little or no learning curve, eliminating the need for in-house developers or external resources.


• Kryon Console - The web-based Kryon Console allows you to track, configure, schedule, and control your robotic workforce in real time. Anyone in the enterprise can deploy and handle Kryon robots with ease thanks to the intuitive and interactive dashboard.

• Kryon Robots - Kryon Robots are the ultimate virtual staff, assisting you in making better automation decisions and automating processes. From collecting data on user performance for Process Discovery to executing tasks and optimizing performance, the Discovery, Unattended, and Attended Robots support end-to-end business processes.

• Kryon Admin - For optimum protection and scaling of RPA operations, the Kryon Admin allows administrators to control roles and permissions. Users can create entirely different working environments and teams, as well as assign resources such as RPA developers, managers, and robots to each team, enabling in-team collaboration.

Kryon’s cutting-edge AI-powered platform now allows businesses to adopt digital transformation. They are redefining core parts of business infrastructure with technology to make business operations and work environments more effective.

2.1.1.5 Summary

RPA tools help you automate tasks so they can be performed quicker and more effectively than if they were completed by a single person. These are tools that provide strategies for performing time-consuming routine tasks, decreasing operational costs and reducing human error.

In a nutshell, robotic process automation allows enterprises to extend their capabilities and become more profitable. However, in many instances, applying this technology necessitates rethinking processes and evaluating what problems an RPA may address as well as its disadvantages.

Robotic process automation is simple and easy to set up, but it takes a well-thought-out integration strategy to ensure success. First and foremost, we must remember that RPAs are structured to automate specific tasks rather than entire processes. While a robot can mimic human behaviour, it lacks the ability to adapt to change. As a result, if we change a procedure, a robot would be inefficient in attempting to solve its tasks, resulting in a far larger problem than if those tasks were performed by a human.

Despite its drawbacks, RPA technology is a critical component of a digital transformation strategy. RPA, as previously said, can handle specific tasks but is not designed to handle processes. Consequently, it seems reasonable to believe that when used in combination with other, more advanced tools, it can improve efficiency.


2.2 Analysis of OCR Tools

OCR is a specialized technology that recognizes text characters in images such as printed books, pictures, and scanned documents. It transforms images containing text into characters that can be edited, computed, and analysed by computers in subsequent steps. An example of how OCR digitizes text in a receipt is shown below.

Figure 2.2: An Example of OCR Output [22]

A market research report by Grand View Research, “Optical Character Recognition Market Size, Share & Trends Analysis Report By Type (Software, Services), By Vertical, By End Use (B2B, B2C), By Region, And Segment Forecasts”, estimated the global optical character recognition market at USD 7.46 billion in 2020. From 2021 to 2028, the market is predicted to grow at a compound annual growth rate (CAGR) of 16.7% [23].

With the rise of digitization, businesses are investing heavily in technology to boost job efficiency and production. For example, in August 2020, Infosys Limited, an Indian multinational business, collaborated with Blue Prism Limited, a U.K.-based multinational software corporation, to offer an AI-driven solution to automate helpdesk operations.

Optical character recognition technology has a number of advantages, including saving time spent manually entering data into a computer, improving work management, lowering the cost of converting documents to digital form, and minimizing manual errors. For instance, ThirdEye, an AR/VR enterprise solutions supplier, teamed up with NuEyes, a vision impairment solutions provider, in May 2020. As part of this collaboration, NuEyes will introduce smart glasses to help persons with vision loss. These smart glasses will use ThirdEye’s lightweight X2 MR Glasses with optical character recognition capability [24].

2.2.1 OCR Vendors

Optical character recognition (OCR) is the method of extracting text from images automatically. There are numerous tools and resources available today that are simple to use and make this job a no-brainer. I will compare some of the most popular tools in this section.

2.2.1.1 Adobe Systems

Adobe, based in San Jose, California, is a software company that specializes in the creation and dissemination of a wide range of content, including graphics, photography, illustration, animation, multimedia/video, motion pictures, and print.

Adobe Acrobat is an OCR system that helps you to convert scanned PDF files and images into searchable, editable documents. It provides custom fonts that look similar to printouts. The list of features includes:

• You can instantly edit any printed document.

• It enables you to easily cut and paste the text into other applications.

• Acrobat enables you to export the file to Microsoft Office.

• You can convert scanned documents to PDF files and move the data from one location to another.

• This tool helps you to keep the look and feel of documents like the original.

Adobe Acrobat Pro is capable of analyzing documents in a variety of ways. It can analyze photographs as they are scanned into the computer, as well as existing images, PDF files, and other file formats after they have been converted to PDF.

2.2.1.2 ABBYY

Through strategic relationships and incorporated technology licenses with top suppliers, ABBYY is a prominent player in the RPA and wider intelligent automation sector. ABBYY is a multinational firm with operations in 14 countries and headquarters in the United States. Intelligent Document Processing (IDP)


ABBYY FineReader is a tool that recognizes a full printed or handwritten page. It can detect more than 200 languages. This tool helps you to transform PDFs and images into searchable formats such as MS Word, Excel, and PDF. The following are some of the features:

• It supports mobile devices and desktop PCs.

• This tool can recognize receipts and business cards.

• ABBYY FineReader provides a REST (Representational State Transfer) interface.

• It converts recognized data into XML (Extensible Markup Language).

• This tool provides a library for Java, .NET, iOS, and Python.

In ABBYY FineReader, before performing OCR, the system evaluates the structure of the document to identify any areas that contain text, images, tables, or barcodes. The results of the recognition are then displayed in the text window. Uncertain characters are highlighted in this window, allowing the user to rapidly detect and rectify potential problems within ABBYY FineReader. Users will be able to copy and paste text sections, search the text in PDF readers, and search the text in word processing tools since the files are editable and searchable. The text can also be imported into analytic software such as ATLAS.ti or NVivo.

2.2.1.3 Google

Google Cloud Vision is an API that can detect text in images. It allows you to convert files in formats such as PDF, PNG, and JPEG to machine-readable text. Some of the characteristics are as follows:

• You can use this application on a computer, Android phone, iPhone, iPad, and more.

• It can detect handwriting in images.

• This tool can extract and save text from uploaded images.

• It can trigger a cloud function in order to save text to online storage.

• Google Cloud automatically detects image files located in the cloud.

2.2.1.4 Amazon

Amazon Textract is a service that helps you to extract text from scanned documents. You can use it to automate document workflows and process numerous documents quickly. The following are some of the characteristics:


• It identifies content written in forms or tables.

• This tool uses API to get data from documents.

• It automatically extracts data from forms.

• Textract can read virtually any document.

• Automatically identifies key information.

• You can adjust document quality in percentage.

• It is integrated with Amazon Augmented AI service for document processing.

2.2.1.5 Summary

OCR is a valuable tool that should be included in an organisation’s marketing strategy, as it offers a significant ROI. OCR technology plays an important role in the corporate world: inaccurate or unavailable data causes problems for the business. Manual methods take longer and carry a higher risk of user error. As a result, using OCR technology not only avoids typical user errors, but also digitizes the document in less time.

OCR technology has proved to be the best business assistant for completing tasks more effectively and increasing productivity in the digital era.

However, as with every developing technology, OCR has drawbacks that need to be addressed. OCR works best with typed documents of decent quality. OCR tools cannot easily read handwritten papers. Similarly, non-Latin fonts and typed fonts that imitate handwriting cause a lot of errors during the OCR process.

There is no such thing as 100% accurate OCR software. The number of errors depends on the document’s quality and form, as well as the font used.
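One common way to quantify such errors is the character error rate (CER): the edit distance between the OCR output and a manually verified reference text, divided by the length of the reference. The following is a minimal sketch under that assumption. The function names and the sample strings are illustrative only and are not taken from any of the reviewed OCR tools.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


# Illustrative strings: the OCR output confuses "0" with "O" and "l" with "1".
reference = "Invoice No. 10432 Total 250.00"
ocr_text = "Invoice No. 1O432 Tota1 250.00"
print(f"CER: {character_error_rate(ocr_text, reference):.3f}")
```

A lower CER means fewer corrections are needed downstream; comparing the CER of the same tool on documents of different quality makes the dependence on scan quality and font visible.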

2.3 Summary

In this State-of-the-Art section, I attempted to provide an in-depth analysis of both RPA and OCR tools. It can be concluded that both RPA and OCR tools are of significant value in various industries and business processes in terms of both time and cost savings. However, it was also observed that these tools are not designed to function fully on their own. As I stated earlier, RPA focuses on task-level automation and defines more specific workflows. Similarly, OCR technology focuses solely on data extraction, which on its own does not generate value.

The conclusion that can be drawn from this is that RPA in combination with OCR can act as a human substitute for user tasks in business processes.


Chapter 3 Neural Networks with RPA vs OCR in RPA

The purpose of this chapter is to compare two alternative technologies for dealing with the data extraction challenge in depth. I will try to explore the benefits of both Neural Networks and Optical Character Recognition. In order to compare the two fairly, I will also take into account the disadvantages. I will summarize my observations at the end of this chapter.

3.1 Neural Networks with RPA

3.1.1 What is Neural Network?

Artificial neural networks (ANNs) are made up of three types of layers: an input layer, an output layer, and one or more hidden layers that are used to find patterns in data. Each time the network processes a set of inputs, it adjusts the weights assigned to the neurons inside the hidden layers.

Neural networks are made up of layers of processing units called neurons, each with its own set of connections. Data is transformed by these networks until it can be classified as an output. Each neuron multiplies an initial value by a weight, adds the results to additional values flowing into the same neuron, modifies the result by the bias of the neuron, and finally normalizes the output with an activation function.
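The computation performed by a single neuron, as described above, can be sketched in a few lines. This is an illustrative example only: the sigmoid activation, the function names, and the sample values are assumptions chosen for the sketch, not part of any tool discussed in this thesis.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, shifted by the bias, passed through the activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum + bias)


# Three incoming values, one weight per connection, one bias for the neuron.
print(neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1))
```

A layer is simply many such neurons reading the same inputs with different weights, and a network chains layers so that the outputs of one become the inputs of the next.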

A key feature of neural networks is the iterative learning process, in which records (rows) are introduced to the network one at a time and the weights associated with the input values are adjusted each time. The procedure is typically repeated once all cases have been presented. During this learning process, the network trains by changing the weights so that it predicts the correct class label of the input samples. The following is a list of widely used applications of Neural Networks:


Figure 3.1: A Neural Network [26]

• Predictive Analytics

• Deep neural networks can be used to interpret complex image and text data, enabling the bots to determine what actions need to be carried out to handle this data in the manner the user has specified, even if the actions the bot takes are strictly rules-based. For instance, convolutional neural networks can be used to allow a network to interpret images on a screen and react based upon how those images are classified.

• Recurrent neural networks are well suited to language problems.

• In reinforcement learning, the machine learns from experience. It collects the training examples through trial-and-error as it attempts its task, with the goal of maximizing long-term reward.

Given this explanation of neural networks (NNs), how they function, and their real-world applications and uses, it is no surprise that NNs have been widely applied to real-world problems in business, education, economics, and a variety of other fields. NNs have been applied to optimization approaches [27], intrusion detection [28], and data categorization [29].
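The iterative learning process described earlier in this section (records presented one at a time, weights adjusted after each record, the whole pass repeated) can be illustrated with the simplest possible case, a single-neuron perceptron. This is a sketch under stated assumptions: it uses a step activation and the classic perceptron update rule rather than the multi-layer training used in practice, and the AND data set and all names are chosen purely for illustration.

```python
def predict(weights, bias, inputs):
    """Step activation: class 1 if the weighted sum plus bias is positive, else class 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the weights record by record, repeating the pass `epochs` times."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):                 # repeat once all records were seen
        for inputs, target in samples:      # one record at a time
            error = target - predict(weights, bias, inputs)
            # Move each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND as a tiny labelled data set: (inputs, class label).
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, inputs) for inputs, _ in and_samples])
```

After a few passes the weights stop changing because every record is classified correctly, which is the point at which training has converged for this linearly separable example.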

3.1.2 Downsides of Neural Network

Given their high precision, it may seem difficult to understand why Neural Networks are not used more often. As might be expected, Neural Networks have a few drawbacks.


Since neural networks need more computing power than traditional predictive methods, they are more costly. Furthermore, training Neural Networks necessitates a large amount of data, which is not always available. Neural Networks are also akin to a “black box” in that you can see the data that goes in and the result that it generates, but you cannot really understand what happens in between. This makes it difficult for humans to fine-tune the algorithm, and it is also difficult to predict what the network will deliver in a new scenario.

3.1.3 RPA with Neural Network

RPA and AI are two distinct technologies that address different types of problems. RPA, for example, is a process-driven application that automates non-value-added, repetitive activities in business processes. AI (Neural Networks), on the other hand, is a data-driven application that learns from previous computational outputs (data) to make reliable decisions and outcomes.

Both technologies have different strengths in solving real-world problems. As mentioned, RPA aims to automate repetitive, time-consuming business processes and is therefore more on the “execution” side. Conversely, AI aims to enable machines (computers) to perform intellectual tasks such as problem solving, decision making, perception, and understanding human languages (Natural Language Processing). In short, AI is more on the “thinking” side.

Think of RPA as the “hand” and AI as the “brain”: by integrating AI’s cognitive capability into RPA technologies, a business gains a virtual employee to execute its business processes.

AI technology enhances RPA performance by adding cognitive capability to it. It allows RPA to automate business processes further and reduces the need for human intervention within large processes.

With the integration of RPA & AI technologies into the business process, the benefits that come with it are:

• Greater cost reduction

• Able to automate higher-order processes

• Better efficiency and accuracy

In one research study, Anomaly Detection (AD), a problem that spans various domains, was addressed using RPA with the support of Artificial Intelligence and Machine Learning. The study demonstrated that deep learning, as the most advanced technology, can be used in conjunction with classical anomaly detectors to help and improve RPAs in title insurance (TI) [30].


3.1.4 An Example: Case Study

After extensive testing, document processing company A-Scan chose Rossum [31] as its primary invoice data extraction solution. A-Scan based its decision on Rossum’s abilities to reduce the cost of data extraction per document, simplify and accelerate onboarding, and update its solution quickly and consistently.

3.1.4.1 About A-Scan

A-Scan is a Central European company that specializes in document processing, scanning and data extraction as a BRP provider [32]. Most of their clients use their services on a long-term basis, outsourcing hundreds of thousands of invoices per year. Another segment of A-Scan’s clientele commissions them to scan and extract data from thousands of documents for one-time projects such as real estate agreements or legal documentation. Their clients include multinational corporations and government institutions. To provide their services effectively, A-Scan trains interns and seasonal employees to work with the software that the company uses.

3.1.4.2 The Problem

Several of A-Scan’s clients requested a solution that could extract relevant data from scanned documents and enter that data directly into their systems. A-Scan started looking into how they could automate and accelerate this process while eliminating the exhausting and inefficient chore of retyping.

The initial solution that was available at the time was a template-based automation system. After several months, they had grown dissatisfied with the solution’s high costs and low accuracy, as well as the labour-intensive setup of templates for every supplier. Data from fifty percent of the invoices was still being extracted manually.

3.1.4.3 The Solution

A search for alternatives led A-Scan to a cloud-hosted, AI-powered cognitive data capture solution. After testing extraction accuracy and speed, A-Scan decided to use Rossum to pull data from invoices that were being processed manually.

Now, A-Scan uses this solution for 91% of the invoices, and the rest is handled by the previous template-based solution for a single legacy client. This AI-powered solution provides greater accuracy and is much faster than any of the alternative solutions. To process an equal number of documents, Rossum’s total cost of data extraction and employee onboarding is less than half that of the template-based solution A-Scan was previously using.


3.2 RPA with OCR

3.2.1 What is OCR?

OCR is a technology used to extract text from images and documents via mechanical or electronic means. It converts typed, handwritten or printed text into machine-encoded text – this data can then be used in electronic business processes without someone manually capturing it.

OCR has been around in various forms for more than 100 years, but unlike the earlier versions of the technology that needed to be trained one font at a time with images of each character, today’s artificial intelligence (AI) powered OCR solutions can recognize and capture data from machine-printed documents with high levels of accuracy. Their ability to accurately decipher handwritten text is also rapidly improving.

Businesses that employ OCR capabilities to convert images and PDFs (typically originating as scanned paper documents) save time and resources that would otherwise be necessary to manage unsearchable data. Once transferred, OCR-processed textual information can be used by businesses more easily and quickly. The benefits of OCR technology to businesses include:

• Elimination of manual data entry

• Resource savings due to the ability to process more data faster and with fewer resources.

• Error reductions

• Reallocation of physical storage space

• Improved productivity

OCR, the ability to extract machine-printed text from a digital image, is only one aspect of a data capture solution. Data can be extracted from documents in many different formats: hand-printed text (ICR), check boxes (OMR), bar codes, etc.

Robust data capture solutions handle multiple document formats and can be used with both electronic and paper documents, eliminating paper and reducing manual identification and data entry of document content into other systems.

It is interesting to note that the use of OCR technology is not limited to office paperwork. A technical paper proposes the design of a low-cost braille printer that uses OCR technology to detect text and images for braille translation or tactile embossing via a mobile application [33].

Another research work, presented at the 2018 10th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), studies an improved video text recognition technology based on OCR [34]. An edge analysis algorithm is used to extract superimposed text in video, the SVM method is used to divide all the pixels into text and non-text pixels, and the RGB color image is converted to HSV color space. The simulation results show that, when employed in video text recognition, this method achieves greater recognition accuracy, text location accuracy, and video text detection performance than standard methods.

3.2.1.1 Classification of Documents

Extracting document data is an important part of document comprehension.

In this section, I will discuss different kinds of documents in terms of templates, style, and formatting. Handling data from different document structures requires both rule-based and model-based approaches within OCR. I will begin by classifying documents into three broad categories. Based on the type of the document in question, an appropriate OCR solution can be chosen.

• Structured Documents: The layouts and templates are usually fixed and nearly consistent in this type of document. Consider a company that performs KYC using government-issued IDs such as a passport or driver’s license. All of these documents share the same layout, with the ID number, name of the person, age, and a few other fields in the same places. Only the details differ. There may be a few complications, such as a table that overflows or fields that have not been filled in.

The most common method for extracting information from structured texts is to use a template or rule-based engine. Regular expressions, basic position mapping, and OCR are examples of these. As a result, we may either leverage pre-existing templates or write rules for our structured data to incorporate software robots to automate information extraction. The rule-based approach has one disadvantage: because it relies on fixed elements, even tiny changes in form structure can cause rules to fail.

• Semi-Structured Documents: These documents contain the same information, but in different places. Consider invoices with 6-10 identical fields, for example. The merchant address can be found at the top of some invoices and at the bottom of others.

Rule-based techniques typically do not provide high accuracy for such documents, so machine learning and deep learning models are brought in for OCR information extraction. In some circumstances, hybrid models that combine rules and machine learning models can also be used. FastRCNN, Attention OCR, and Graph Convolutions are a few popular pre-trained models for extracting information from documents [35]. However, these models have a few flaws, so metrics like accuracy and confidence score are used to assess the algorithm’s effectiveness.

Because the model learns patterns rather than following strict rules, it may make mistakes at first. The remedy is more data: the more samples the ML model processes, the more patterns it learns and the more reliable its output becomes.

• Unstructured Documents: RPA is currently unable to manage unstructured data directly, necessitating the use of OCR to extract and create structured data. Unstructured data, unlike structured and semi-structured documents, does not have key-value pairs. For instance, we notice a merchant address without a key name in a few invoices; likewise, we find the same thing in other fields like date and invoice ID.

For machine learning models to process these correctly, robots must first learn how to interpret written text as actionable data, such as an email address, phone number, or address. The model will then learn to extract 7- or 10-digit number patterns as phone numbers, and longer text containing five-digit codes and other nouns as addresses. We can also employ Natural Language Processing (NLP) techniques like Named Entity Recognition and Word Embedding to improve the accuracy of these models.
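The rule-based end of this spectrum can be sketched with regular expressions applied to text that an OCR engine has already produced. This is a minimal illustration only: the field names, the patterns, and the sample text are hypothetical and would have to be adapted to each real document layout, which is exactly the weakness of rules noted above.

```python
import re

# Hypothetical rules for one invoice layout; a changed layout needs new rules.
FIELD_PATTERNS = {
    "invoice_id": re.compile(
        r"Invoice\s*(?:No\.?|ID|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    # 10-digit or 7-digit runs are treated as phone numbers, as described above.
    "phone": re.compile(r"\b(\d{10}|\d{7})\b"),
    "email": re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)\b"),
}


def extract_fields(ocr_text: str) -> dict:
    """Return the first match for every rule, or None when a rule does not fire."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


# Illustrative OCR output for a small invoice header.
sample = """ACME Supplies
Invoice No: INV-2021-0042
Date 2021-06-25
Contact 5551234567 billing@acme.example
"""
print(extract_fields(sample))
```

Fields that come back as None are natural candidates for a fallback, for example a trained model or manual review, which is one way to build the hybrid approach mentioned for semi-structured documents.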

Overall, understanding the data is critical before implementing OCR for document processing. Instead of sketching out a process step by step, we may teach a robot to “do as I do” by recording the process as it happens, using powerful OCR capabilities, as stated above, and combining rules and machine learning algorithms. The software robot tracks the user’s clicks and actions on the screen and converts them into an editable workflow.

3.2.2 OCR in RPA

Organizations can use OCR in RPA to automate a bigger proportion of their operational business operations, particularly those that rely heavily on scanned documents, such as customer-completed forms.

Surface connectivity features are included in an RPA+OCR solution, allowing the RPA solution to pull data from pictures, PDFs, and remote applications. Not only may this information be read, but it can also be used to accomplish actions. The result is textual output that is precise, quick, and high-quality, allowing robots to automate more activities while lowering operating and training expenses.

Within the field of RPA, there are two basic groups of OCR business cases.

The first involves turning unstructured data from scanned documents into structured, digitized data that may then be used in digital business processes.


by the solution. After that, the information can be sent to any enterprise program, such as CRM, ERP, or a legacy system.

The second entails more sophisticated automation capability. Surface connectivity, for example, might be used to automate programs on faraway machines. Advanced RPA OCR could be used to read the image and extract the required text from the application’s screen image or simulation. Organizations can use this feature to automate more processes and grow their automation projects.

Advanced OCR is a critical element for any enterprise-grade platform as RPA progresses towards cognitive automation, which can handle more complicated jobs than the repetitive operations at which RPA thrives. Cognitive technologies mainly replicate human abilities, such as reading in the case of OCR. To facilitate the automation of increasingly complicated business processes, a solution can be created to interact with cognitive and AI capabilities such as chatbots and machine learning technologies, in addition to OCR.

Cognitive automation (RPA+OCR+AI) has a tremendous payoff. Cognitive technology can lower your invoice and PO processing costs by 30% to 50% by harnessing the same technologies used in self-driving cars – computer vision and machine learning [36]. Not to mention the added benefits of decreasing errors and processing time, as well as scaling on demand to accommodate seasonal business fluctuations.

3.2.3 RPA OCR Compatibility

RPA tools can interact with the APIs provided by OCR tools and absorb the data into the system. Some of the best RPA tools on the market now provide built-in OCR engines from several vendors to make the build even simpler and hassle-free.

The open-source OCR tool Tesseract illustrates the possibilities opened up by integrating an RPA tool with OCR. It was initially a commercial OCR application from HP, but it is now open source under the Apache license, and its development was long sponsored by Google. Its contribution to automation comes from OCR (Optical Character Recognition) technology: it converts text images (scanned text) into editable characters that can be entered into workflows and further searched, edited, or copied. The RPA tool can then freely manage the resulting digital data in accordance with the working rules of the process into which it is input, modified, and so on.
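As a hedged illustration of this hand-over, the sketch below calls Tesseract through the `pytesseract` Python wrapper and passes the recognized lines to a follow-up step. It assumes that the Tesseract binary, `pytesseract`, and Pillow are installed. The helper names `ocr_image` and `process_document`, and the `handle_text` callback that stands in for the RPA side, are hypothetical and not part of any RPA product.

```python
def ocr_image(image_path, language="eng"):
    """Run Tesseract on an image file and return the recognized text."""
    # Imported here so the rest of the sketch can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=language)


def process_document(image_path, handle_text, ocr=ocr_image):
    """OCR step followed by an automation step that acts on the cleaned text lines."""
    text = ocr(image_path)
    # Drop empty lines and surrounding whitespace before handing the text over.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return handle_text(lines)
```

A call such as `process_document("scanned_order.png", print)` would print the recognized lines of a hypothetical scan; in an RPA setting the callback would instead type the values into the target application. Passing the OCR function as a parameter keeps the workflow independent of one engine, so Tesseract could be swapped for a commercial engine without touching the rest.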

It is worth noting that combining RPA and OCR is not as straightforward as adding one and one. RPA adds a great deal of value to OCR. Typically, OCR is only used to process documents with a lot of structure. It can, however, handle and analyze unstructured data with the help of RPA. RPA bots can also adapt to changing settings and improve data collection and analysis processes, something OCR alone cannot do.


RPA and OCR are complementary technologies, and some companies have realized the benefits of combining the two. Companies have begun to seamlessly integrate them to automate processes from beginning to end. RPA and OCR can help innovative minds achieve the kind of efficiency that gives them a competitive advantage.

3.3 Summary

Based on the presented discussion of both technologies, although Neural Networks can be thought of as a good solution, they fall behind OCR capabilities in particular areas of business processes where paperwork is the centre of attention.

It is the efficiency and accuracy of OCR tools and various ways of integrating them into RPA systems that make it a better choice. Using the concepts of Document Understanding, classifying different types of documents before processing is one of the greatest advantages as it gives the RPA tool room for setting up different processes for each type.

I would end this section by stating that OCR is more cost effective when compared to Neural Networks as it is a much simpler technology with many competing vendors offering good pricing for large scale projects.


Chapter 4 Goals Revisited

In the State-of-the-Art section, I discussed the idea of Robotic Process Automation and reviewed a few RPA vendors, and a similar approach was followed for Optical Character Recognition and OCR tools. The overall conclusion was that RPA and OCR would complement each other to boost the customer experience and accelerate the company’s digitalization process. As stated previously, RPA and OCR have distinct sets of functionalities, with the latter serving its output as input to the former for actual processing, which makes them a good fit. Most of the RPA tools available on the market are trying to keep up with the trend of using OCR and are therefore trying to integrate it with their existing systems. After this analysis, I am confident that I have gathered enough information to revisit and redefine my objectives, which were previously established in Section 1.3.

1. Select Robotic Process Automation and Optical Character Recognition notations for future work and inspect them.

2. Analyse how to combine Robotic Process Automation with Optical Character Recognition Tools.

3. Introduce an example in the area of Order Management, suitable to be automated with Robotic Process Automation tools.

4. Demonstrate how the Example can be automated by using one of the already reviewed RPA and OCR tools, such as UiPath and ABBYY FineReader.

5. Implement a prototype that uses the integrated solution of UiPath and ABBYY FineReader OCR Engine in the context of Example.

6. Evaluate the work and summarise the gained knowledge.


Chapter 5 Analysis and Design

The goal of this chapter is to look at how RPA and OCR tools may work together. As a result, I will need to do a deeper dive into RPA and OCR notation. Understanding the notation used by both systems aids in the modelling of processes and the creation of optimized workflows. The layout diagrams, which are used for mapping activities, as well as the representation and flow of data, will be reviewed in general. I chose to work with UiPath because it has gained a lot of attention in recent years as one of the top RPA tools available on the market, and it offers a free Community edition of the UiPath platform. The next phase will be to investigate how RPA processes may be successfully coupled with OCR tools after the notation overview.

5.1 Examine the UiPath notation

In UiPath, process mapping is accomplished by creating workflows in UiPath Studio. When creating a workflow file, UiPath provides four diagrams for combining activities into a functioning structure: Flowchart, Sequence, State Machine, and Global Exception Handler.

5.1.1 Flowchart

Flowcharts are more flexible when it comes to integrating operations, although they tend to lay out a workflow in a two-dimensional format. Flowcharts are ideal for displaying decision points within a process because of their free shape and visual attractiveness. Large workflows are prone to uncontrolled interweaving of activities because arrows that might point anywhere mimic the unstructured GoTo programming statement.


5.1.2 Sequence

Sequences are best suited for simple circumstances where actions follow each other and have a simple linear depiction that runs from top to bottom. They are useful in UI automation, for example, where navigating and typing are done one click/keystroke at a time. Sequences are the ideal style for most workflows since they are simple to put together and understand.

5.1.3 State Machine

A State Machine is a complicated structure that can be visualized as a flowchart containing conditional arrows, known as transitions. I found it ideal for a common high-level process diagram of transactional business process templates since it allows for a more compact description of logic.
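To make the idea of states and conditional transitions concrete, the following sketch models a small transactional process as a state machine. It is an analogy in plain Python, not UiPath code: the state names, the `run` function, and the `process` callback are hypothetical and chosen only for this illustration.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    GET_TRANSACTION = auto()
    PROCESS = auto()
    END = auto()


def run(transactions, process):
    """Walk through the states until END; each branch decides the next state."""
    state = State.INIT
    queue, results, current = [], [], None
    while state is not State.END:
        if state is State.INIT:
            queue = list(transactions)
            state = State.GET_TRANSACTION
        elif state is State.GET_TRANSACTION:
            # Transition condition: more work available or not.
            if queue:
                current = queue.pop(0)
                state = State.PROCESS
            else:
                state = State.END
        elif state is State.PROCESS:
            try:
                results.append((current, process(current)))
            except Exception as error:
                # A failed item is recorded and the machine moves on.
                results.append((current, f"failed: {error}"))
            state = State.GET_TRANSACTION
    return results


print(run([10, 20], lambda amount: amount * 2))
```

The whole control logic fits into one loop with one branch per state, which is the compactness referred to above: adding a new state means adding one branch and its transitions rather than rewiring a long sequence.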

5.1.4 Global Exception Handler

The Exception Handler is intended for use in both small and large automation projects, with the goal of discovering execution problems and, more significantly, deciding the workflow action when they occur. If an execution error occurs while debugging, the Global Exception Handler can be configured to intervene and check the workflow’s behaviour using the options previously provided in the Exception Handler.
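The behaviour described above, one central place that decides what the workflow does when any activity fails, can be sketched as follows. This is a plain Python analogy under stated assumptions: the `ErrorAction` values, the `run_workflow` function, and the handler signature are hypothetical names for this sketch and do not reproduce the UiPath API.

```python
from enum import Enum


class ErrorAction(Enum):
    RETRY = "retry"    # run the failed activity again
    IGNORE = "ignore"  # skip the failed activity and carry on
    ABORT = "abort"    # stop the whole workflow


def run_workflow(activities, handler, max_retries=3):
    """Run (name, callable) pairs in order; the handler picks the reaction to errors."""
    log = []
    for name, activity in activities:
        retries = 0
        while True:
            try:
                activity()
                log.append(f"{name}: ok")
                break
            except Exception as error:
                action = handler(error, name, retries)
                if action is ErrorAction.RETRY and retries < max_retries:
                    retries += 1
                    continue
                if action is ErrorAction.IGNORE:
                    log.append(f"{name}: ignored {type(error).__name__}")
                    break
                # ABORT, or RETRY with the retry budget used up.
                log.append(f"{name}: aborted on {type(error).__name__}")
                return log
    return log


def global_handler(error, activity_name, retry_count):
    """One shared policy: retry timeouts, abort on anything else."""
    if isinstance(error, TimeoutError):
        return ErrorAction.RETRY
    return ErrorAction.ABORT
```

Because the policy lives in a single function, individual activities stay free of error-handling code, and changing the reaction to a class of errors means editing one place only.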

5.1.5 Data

In terms of visibility and life cycle, data comes in two forms: arguments and variables. Variables are bound to a container inside a single workflow file and can only be used locally, whereas arguments are used to pass data from one workflow to another. Variables should be kept in the innermost scope, both to reduce clutter in the Variables panel and so that autocomplete shows only what is relevant at a given stage of the workflow. If two variables with the same name exist, which is not good practice, the one defined in the innermost scope takes precedence.
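The scoping rule and the role of arguments can be modelled in a few lines of Python. This is only an analogy: the scope dictionaries, the variable names, and the `invoked_workflow` function with its `in_`/`out_` naming are assumptions introduced for the example.

```python
from collections import ChainMap

# Variables declared on the outer container (for example, the whole workflow file).
outer_scope = {"customer": "ACME", "total": 0}
# Variables declared on an inner container; "total" shadows the outer one.
inner_scope = {"total": 42, "line": "item-1"}

# Lookups search the innermost scope first, then fall back outwards.
visible_inside = ChainMap(inner_scope, outer_scope)


def invoked_workflow(in_order_id: str) -> dict:
    """Arguments model: data enters as a parameter and leaves as a returned value."""
    return {"out_status": f"processed {in_order_id}"}


result = invoked_workflow("PO-1001")
```

Here `visible_inside["total"]` resolves to the inner value, while `line` is not visible from the outer scope at all; only the explicit argument and return value cross the workflow boundary.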

5.1.6 Choices

To enable the Robot to behave differently under various scenarios in data processing and application interaction, decisions must be incorporated in a workflow. The visual structure and readability of a process are greatly influenced by selecting the most effective representation of a condition and its subsequent branches.

• If Activity: The If activity splits a sequence vertically, making it ideal for short, balanced, linear branches. Problems arise when more conditions need to be chained in an If... Else If manner, especially when the width or height of the branches exceeds the available screen size. As a rule, nested If statements should be avoided to keep the workflow simple and linear.

• If Operator: The VB If operator is highly beneficial for small local conditions or data computing, and it can reduce a full block to a single action in some cases.

• Flow Decision: The Flowchart layout is useful for displaying critical business logic and related conditions, such as nested If statements or If... Else If structures. In some scenarios a Flowchart can look fine even within a Sequence.

• Switch Activity: To simplify and compact an If... Else If cascade with unique conditions and actions per branch, the Switch activity can be used in conjunction with the If operator.

• Flow Switch: The Flow Switch activity selects the next node based on the value of an expression. It is the flowchart counterpart of the procedural Switch activity. By starting more connections from the same switch node, it can match more than 12 cases.
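The difference between these representations can be shown with a short Python analogy. The function names, the shipping rule, and the routing table below are hypothetical; in UiPath itself the If operator is the VB expression If(condition, valueIfTrue, valueIfFalse).

```python
def shipping_if_activity(total: float) -> str:
    """If activity: a full block with two explicit branches."""
    if total >= 100:
        label = "free"
    else:
        label = "standard"
    return label


def shipping_if_operator(total: float) -> str:
    """If operator: the same decision collapsed into a single expression."""
    return "free" if total >= 100 else "standard"


# Switch: the value of one expression selects one branch, with a default case.
HANDLERS = {
    "invoice": lambda: "route to accounting",
    "order": lambda: "route to sales",
}


def route(document_type: str) -> str:
    """Pick the branch for document_type, falling back to the default."""
    handler = HANDLERS.get(document_type, lambda: "route to manual review")
    return handler()
```

The two shipping functions always return the same result; the operator form is simply more compact, while the table in `route` replaces what would otherwise be an If... Else If cascade.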

5.1.7 Comments and Annotation

Annotations and the Comment activity should be used to describe in greater depth a technique or the specifics of a particular interaction or application behaviour. Keep in mind that other people may come across a robotic project at some point, and these notes are what will explain the process to them.

5.2 OCR Activities in UiPath

In some cases, applications cannot be automated with standard scraping or UI automation methods. The UiPath Studio activities that use OCR technology scan the machine's entire screen and locate all the displayed characters. This allows the user to design automations based on what is visible on the screen, which makes virtual machine automation easier.

5.2.1 Get OCR Text

Using the OCR screen scraping approach, 'Get OCR Text' retrieves a string and associated information from an indicated UI element. This activity, along with a container, can be generated automatically when performing screen scraping. The Google OCR engine is used by default, but it can easily be switched to Abbyy or Microsoft. The activity takes a Target as input, which can be a Region variable, a UiElement variable, or a selector, and which specifies what is to be automated and where the actions should be executed.

The target can also be generated automatically using the Indicate on Screen feature, which looks for UI items in the indicated region and generates selectors for them. This activity outputs a string variable containing the text found in the UI element, as well as a TextInfo variable with the screen coordinates of all the words found.
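The two outputs, the plain text and the per-word position data, can be pictured with a small Python model. No OCR engine is called here: the `WordBox` structure, the `get_ocr_text` function, and the `line_tolerance` parameter are assumptions made for illustration and do not mirror UiPath's actual TextInfo type.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordBox:
    """One recognised word with its screen rectangle (hypothetical structure)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def get_ocr_text(words: List[WordBox], line_tolerance: int = 5) -> Tuple[str, List[WordBox]]:
    """Return the text in reading order together with the per-word positions."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    lines: List[List[WordBox]] = []
    for word in ordered:
        # Words whose top edges are close together belong to the same line.
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    text = "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines
    )
    return text, ordered
```

The string is what later workflow steps usually consume, while the word rectangles are what make position-based actions such as clicking next to a label possible.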

5.2.2 Find OCR Text Position

Find OCR Text Position searches a UI element for a given string and returns a UiElement variable whose clipping region is set to the screen position of that string. This activity helps to locate UI elements on the screen relative to a piece of text.
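Locating an element relative to text can be sketched in Python as follows. The `Box` layout, the function names, and the sample labels are hypothetical; the sketch only shows the geometry, not an OCR call.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in screen pixels


def find_text_position(words: Dict[str, Box], text: str) -> Optional[Box]:
    """Return the rectangle of the first word matching `text`, ignoring case."""
    for word, box in words.items():
        if word.lower() == text.lower():
            return box
    return None


def point_right_of(box: Box, offset: int) -> Tuple[int, int]:
    """Point `offset` pixels to the right of the box, vertically centred on it."""
    x, y, width, height = box
    return (x + width + offset, y + height // 2)
```

For example, once the rectangle of a "Password" label is known, `point_right_of` gives a click target for the input field that sits beside it, even if that field itself exposes no selector.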

5.2.3 OCR Text Exists

OCR Text Exists uses OCR technology to determine whether a text occurs in a given UI element and returns a Boolean variable that is true if the text exists and false otherwise. This activity is useful in all forms of text-based automation, since it allows decisions to be made based on whether or not a specific string is displayed. It can also serve as the Condition in the Retry Scope activity, so that certain activities are repeated in a loop until the text appears.
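The combination of a text check with a retry loop can be sketched in Python. Both functions are stand-ins written for this example: `ocr_text_exists` works on an already recognised string rather than on a screen, and `retry_scope` with its `retries` and `interval_s` parameters only imitates the general behaviour of a retry construct.

```python
import time
from typing import Callable


def ocr_text_exists(screen_text: str, expected: str) -> bool:
    """Stand-in for the OCR check: is `expected` among the recognised text?"""
    return expected.lower() in screen_text.lower()


def retry_scope(action: Callable[[], None], condition: Callable[[], bool],
                retries: int = 3, interval_s: float = 0.0) -> bool:
    """Repeat `action` until `condition` holds or the attempts run out."""
    for _ in range(retries):
        action()
        if condition():
            return True
        time.sleep(interval_s)  # wait before the next attempt
    return False
```

A typical use would pass a click on a Submit button as the action and a check for a confirmation message as the condition, so that the click is repeated only while the confirmation is still missing.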

5.2.4 Double Click OCR Text, Click OCR Text and Hover OCR Text

Hover OCR Text, Double Click OCR Text, and Click OCR Text all use OCR to scan the machine’s screen for text and perform actions on it. Text recognition automations will usually continue to work if graphic elements change but the text does not. These are particularly helpful activities in virtual machine settings for automating fundamental actions.

5.3 OCR Engines in UiPath

UiPath offers a selection of OCR engines as part of the OCR activities that were discussed in the previous section. The choice of an OCR engine can be narrowed down by answering the following questions.

• Which OCR engine is the most compatible with UiPath?

• What are the differences in processing rates between the paid and free OCR engines?

• Which engines are capable of reading scans of poor quality?

• Which is the best option for handwritten materials?