Academic year: 2022


Prague University of Economics and Business
Faculty of Informatics and Statistics

Application of Semantic Web Technologies to Verification of Business Requirements

MASTER THESIS

Study program: Knowledge and Web Technologies
Field of study: Applied Informatics

Author: Bc. Anastasia Shuvalova
Supervisor: prof. Ing. Vojtěch Svátek, Dr.

Consultant: Mgr. et Mgr. Karel Macek, Ph.D.

Prague, May 2021


Acknowledgements

I would like to thank my supervisor prof. Ing. Vojtěch Svátek, Dr. for his continuous guidance and valuable advice, the consultant Mgr. et Mgr. Karel Macek, Ph.D. for his assistance, and my beloved family for their moral support.


Abstrakt

V rámci současné diplomové práce měl autor za cíl rozpracovat částečnou automatizaci procesů Change Managementu pro rozsáhlý podnik. Hlavním cílem bylo navrhnout řešení, které umožní zaměstnancům bez programátorských dovedností upravit automatické ověřování požadavků na změnu podle aktuálních potřeb. Dalším cílem bylo využít potenciál technologií sémantického webu pro modelování (pomocí RDF a OWL) a ověřování (pomocí SWRL) požadavků. Byla vytvořena ontologie popisující problémovou oblast. Byl navržen mechanismus pro validaci založený na pravidlech SWRL. Tato pravidla jsou později k dispozici zúčastněným stranám pro úpravy. Řešení integruje ontologii OWL a sadu navržených pravidel SWRL a je reprezentováno ve formě REST API napsaného v jazyce Python. API provádí ověření vstupních dat týkajících se nově vytvářených požadavků a vrací odvozené výsledky koncovým uživatelům.

Klíčová slova

Sémantický web, Ontologie, Řízení změn, OWL, SWRL

Abstract

Within the current master thesis the author intended to elaborate on the partial automation of the Change Management process for a large-scale enterprise. The main aim was to design a solution that allows employees without programming skills to adjust the automated validation of Requests for Change to the current needs. Another goal was to leverage the potential of Semantic Web technologies for modeling (with RDF and OWL) and verifying (with SWRL) the requirements. An ontology describing the problem area was created. A validation mechanism based on SWRL rules was then proposed. These rules are later made available for modification by the stakeholders. The solution integrates the OWL ontology and the set of designed SWRL rules and is represented in the form of a REST API written in Python. The API performs validation of the incoming data regarding newly created requests and returns the obtained results to the end users.

Keywords

Semantic web, Ontology, Change Management, OWL, SWRL


Contents

Introduction 9

1 Motivation 12

2 Background technologies 13

2.1 Semantic web . . . 13

2.2 Linked data . . . 13

2.3 RDF . . . 14

2.4 RDFS and OWL . . . 16

2.5 Semantic reasoner . . . 18

2.6 SWRL . . . 18

2.7 SPARQL . . . 20

3 The real domain 22

3.1 Change . . . 22

3.2 Change request lifecycle . . . 22

3.3 Change management . . . 23

3.4 Change Freeze . . . 24

3.5 Business Unit . . . 24

3.6 Control Item . . . 24

3.7 Impacted Service . . . 25

3.8 Impacted Business Unit . . . 25

3.9 Impacted country . . . 25

3.10 Backout plan . . . 26

3.11 Service outage . . . 26

3.12 Build and Test tasks . . . 26

4 Problem definition 27

5 Research of the state of the art 29

5.1 Alternative approaches for solving the problem . . . 29

5.1.1 Machine Learning . . . 29

5.1.2 Hard-Coding . . . 30

5.1.3 Linked data and SWRL . . . 31

5.2 Related projects . . . 32

6 Research methodology 34

6.1 Choice of technologies . . . 34

6.1.1 Semantic Web Technologies versus alternatives . . . 34

7 Design and implementation 37

7.1 RFC Ontology . . . 37


7.1.1 Class RFC . . . 38

7.1.2 Class Category . . . 40

7.1.3 Class CI . . . 41

7.1.4 Class Service . . . 42

7.1.5 Class Group . . . 42

7.1.6 Class Business_Unit . . . 43

7.1.7 Class Country . . . 43

7.1.8 Class Task . . . 43

7.2 Verification rules . . . 44

7.2.1 Rule "Incorrect RFC" . . . 44

7.2.2 Rule "Timelines for changes" . . . 45

7.2.3 Rule "Impacted Business Units" . . . 45

7.2.4 Rule "Backout plan" . . . 46

7.2.5 Rules "Service outage" . . . 47

7.2.6 Rule "Impacted and Potentially Impacted service" . . . 47

7.2.7 Rule "Test tasks" . . . 47

7.2.8 Rule "Managed by" . . . 48

7.2.9 Rule "Outage timelines" . . . 49

7.2.10 Rule "Close changes" . . . 50

7.3 Implementation of the tool . . . 50

7.4 Testing . . . 54

7.4.1 Gather and select requirements . . . 56

7.4.2 Test procedure . . . 56

7.4.3 Determine expected results . . . 57

7.4.4 Run test and compare results . . . 57

8 Evaluation 59

Summary 61

Bibliography 62

A SWRL rules 65


List of Figures

2.1 Informal graph of the example triples (5) . . . 15

2.2 Semantic reasoner schematic work-flow . . . 18

4.1 The Request for Change creation form . . . 28

5.1 Machine learning work-flow . . . 30

5.2 "Hard-coding" work-flow . . . 30

5.3 Semantic Web technologies work-flow . . . 31

7.1 Base classes of the created RFC Ontology . . . 38

7.2 High-level implementation work-flow . . . 51

7.3 The solution architecture diagram . . . 53

7.4 The request form filled with insufficient lead time . . . 54

7.5 The notification of insufficient lead time . . . 54

7.6 Testing methodology (22) . . . 55


List of Tables

7.1 Mapping for the classes and properties corresponding to requirements . . . 56


List of abbreviations

RFC Request for Change
ITSM IT Service Management
ITIL IT Infrastructure Library
DC Data Center
BU Business Unit
ITS IT Services
CI Control Item
REST Representational State Transfer
API Application Programming Interface
URI Uniform Resource Identifier
IRI Internationalized Resource Identifier
XML eXtensible Markup Language
HTTP HyperText Transfer Protocol
HTML HyperText Markup Language
RDF Resource Description Framework
RDFS RDF Schema
JSON JavaScript Object Notation
SWRL Semantic Web Rule Language
SQL Structured Query Language
OWL Web Ontology Language
SPARQL SPARQL Protocol and RDF Query Language


Introduction

This master thesis is dedicated to the topic “Application of Semantic Web Technologies to Verification of Business Requirements”.

The current topic was found applicable during the author's employment in the IT department of a large-scale enterprise that deals with logistics. The company strictly controls all changes to its IT infrastructure. The control process is governed by the ITIL methodology. As per the ITIL guidelines, all potential changes must be formulated in advance in a strict and precise way. These changes are also requested in a specific manner.

In order for a change to be approved and progressed, all possible effects on every part of the system or correlated services must be agreed in advance. The analysed effect includes the impact on timelines, accountable and responsible individuals, and related assets. These requests are further processed by a responsible team called the Change Management team. The intention of this master thesis is to:

• Utilize the potential of Semantic Web technologies for modeling (using RDF and OWL) and verification (using SWRL) of the requests

• Decrease the workload on the Change Management team members by providing them the possibility to add rules for the automatic validation of the records.

The main reason for choosing semantic technologies for this task is the current process that the corporation utilizes. The current implementation of request validation, namely the manual verification of each incoming request, is drastically outdated. There is certain room for improvement of this process. The ultimate goal is to make the process swifter and more efficient. Another target is to introduce new technologies into the corporate business flows promptly. This intention is of particular importance, since the mentioned technology has never been applied before within the business processes. There are other areas within the company that could also be improved with the utilization of linked data principles.

The author made a decision towards initiating the change in the change request validation.

Another reason behind the choice of the mentioned domain is the personal interest of the author.

The author had experienced the problematic nature of the very process of creating a request for change. This concerns even minor changes within the framework of working responsibilities.

The web form used for the submission of a request for change was designed to be fairly straightforward. However, one is required to be aware of a certain amount of specific criteria when working with the form. Values for individual fields, a set of complex rules and features, and allowed combinations of the fields are just the beginning of the list. These requirements are not static: they change significantly with the change of the environment. One may find it tedious and complex to comply with all the above requirements without flaws


or mistakes. Further verification is done by a group of professionally trained employees.

Manual labour makes human error unavoidable.

In principle, interest in semantic technologies has already manifested itself within the company. Some ideas have already been voiced on where and how these technologies might be applied. So far, however, no one has dared to start the implementation process. For the author it appeared to be an excellent opportunity to conduct such an experiment: an opportunity to initiate a totally new direction and a new approach to the solution of change management problems.

The work on this master thesis is divided into three main parts. The first part is dedicated to the research of the state of the art. Within this step, the Change Management area itself is studied and explained. It is vital to highlight the most characteristic features and define the scope of the work. Another goal of this step was to study the constituent parts of the change management process.

The second part of the work covers the design and development of the technical part of the current paper, namely the tool that utilizes Semantic Web technologies for verification of the requests. The design part consists of vocabulary and ontology creation, the design of the verification rules, and the development of the application in the Python programming language.

The third step is dedicated to the implementation of the designed tool. Within this stage the tool was covered by unit tests and further integrated into the current change management process. The potential impact was evaluated.

The paper itself consists of eight chapters. The first chapter is called "Motivation" and is dedicated to the added value that the paper and the tool are going to create.

The second chapter is named "Background technologies". In that chapter the author describes the technologies that will be utilized for the tool's development and testing.

The third chapter of the paper is named "Problematic Background". The chapter contains definitions of the crucial terms and processes that are combined within the change management process established in the company, terms that are vital to understand for the further work.

The fourth chapter defines the problem itself and provides a more in-depth insight into its very nature.

The fifth chapter is dedicated to the research of the state of the art. The process of researching and forming the knowledge base about existing solutions in the area is described. The ways similar problems were solved before are described and analyzed.

The sixth chapter, "Methods Applied", is dedicated to the second part of the practical work: the design and development of the verification tool. It covers the process of formulating and implementing the ontology for the given domain. It also contains the description of


the design of the set of verification rules in SWRL.

The seventh chapter, "Implementation of the tool", continues the description of the development process. The chapter describes the steps required for the integration of the tool into the existing change management process.

The final chapter, "Evaluation", consists of a recapitulation of the work done. It manifests the potential added value of utilizing the tool in a production environment. Future work prospects are discussed within the scope of the chapter.


1. Motivation

The motivation behind this master thesis is constituted by several factors. Firstly, it is the author's personal interest and the intention to improve complex and ineffective processes within the mentioned company. Manual validation of requests for changes is drastically outdated. The process is incredibly prone to human error. Furthermore, since the process is monotonous, according to commonly accepted best practices, it might and should be automated.

The second reason that prompted the author to start working on this project is the desire to reduce the workload on the team responsible for validating the requests created by the end users. It is also necessary to give the team the opportunity to adapt the checks to the current circumstances, as required by the business process.

The last, but not least, important reason is the author's intention to introduce new technologies into business logic where they had not been used before. This could somewhat change the view on the implementation of applications supporting business processes and initiate a new way of their development.


2. Background technologies

The stack of technologies utilized in this project is defined in this chapter. The concept of linked data is defined and briefly explained. Particular use cases of the mentioned technologies are discussed.

2.1 Semantic web

The World Wide Web has made it possible to create a global information space consisting of linked documents. The Internet has become an integral part of the daily life of an enormous amount of people. There is a growing demand for direct access to raw data that is not currently available on the Internet or linked in hypertext documents. There is, in other words, a growing need to reuse the available information. A valid data structure is a key factor in data reuse. The more regular and well-defined the data structure is, the easier it is for people to create tools to reliably process and reuse the data. Although most websites have a specific structure, the language in which they are created, HTML, is focused on structuring text documents, not the data itself. Because the data mixes with the surrounding text, it is difficult for software applications to retrieve chunks of structured data from HTML pages.

The Semantic Web was conceived as the next stage in the development of the Internet.

Tim Berners-Lee wanted the Semantic Web to become an extension of the existing Internet paradigm (1). The idea was rational: it is necessary to connect all the resources of the network with meaningful (semantic) links. The approach was proposed in contrast to the plain links that merely send the user from one page to another. It was proposed to assign a unique identifier to each online and even offline entity (object, property). To achieve the ultimate goal, the next step required combining these entities into a single graph.

Once the idea is implemented, users would be able to quickly and accurately find the information they need. Most importantly, computers would have access to the semantic content of the network. The goal was to create a distributed knowledge graph connecting semantically defined data in a single network space, with the possibility of machine processing and logical inference of new facts.

2.2 Linked data

With the development of the concept of the Semantic Web and the beginning of the implementation process, the Linked Open Data project was created. The project describes a method for publishing data on the web. The term "Linked Data" refers to a set of best practices for publishing structured data on the Internet. The authors of the Linked Data book have defined the term as follows: "Linked Data provides a publishing paradigm in which not only the documents but also the data can be the first-class citizen of the Internet. This approach enables the Internet to expand with a global data space based on an open standard - the Data Network." (2) The base principles were formulated by Tim Berners-Lee in the Linked Data Design Issues article. The four principles are (3):

• Use URIs as names for things.

• Use HTTP URIs so that people can look up those names.

• When someone looks up a URI, provide useful information.

• Include links to other URIs, so that they can discover more things.

Identification of a resource using a URI allows its unambiguous identification in the entire data space. Although a URI identifies just one resource, the same resource can be identified by several different URIs, which does not contradict the principles of Linked Data.

However, identifiers should be persistent, i.e. unchanged and permanently available. At the moment, the Semantic Web is represented by the following technologies:

• Entities on the web have unique names in the form of the Uniform Resource Identifiers (URIs). It is recommended to include the name of one of the World Wide Web protocols (HTTP or HTTPS) in the URI.

• When using a URI as the web address of a page (URL), a description of the entity is provided in one of the standard formats.

• The standard for data presentation is the RDF language.

• Data presented in RDF format is interpreted using ontologies. The languages for de- scribing the ontology are the RDF Schema and the OWL.

The idea behind these principles is, on the one hand, to use standards to represent and access data on the web. On the other hand, these principles apply to the establishment of hyperlinks between data from various different sources. These hyperlinks connect all related data into a single global information space, just as hyperlinks in the classical web connect all HTML documents into a single global information space. Linked Data stands in the same relation to spreadsheets and databases as the hypertext document network does to word processor files (4).

2.3 RDF

The term RDF (Resource Description Framework) refers to a specification developed by the World Wide Web Consortium for data representation (5). It allows assertions about data to be presented in a machine-readable way. RDF is part of the Semantic Web concept.

The main structure of the RDF data model is a set of triples in the following format: subject-predicate-object. An RDF statement describes a directional relationship from subject to


object; the predicate describes the nature of this relationship. The statement can be presented as an RDF graph. The RDF graph has nodes and labeled directed arcs that link pairs of nodes. In the RDF graph, nodes represent subjects and objects, and arcs are predicates. Nodes can be Internationalized Resource Identifiers (IRIs), anonymous resources (blank nodes), or literals. Predicates (arcs) are IRIs and can define a relationship between two graph nodes or an attribute value for some subject.

To provide an example, the case from W3C RDF 1.1 Primer is used.

The example shows a person named Bob, who was born on July 1, 1990. It is also known that his friend named Alice is interested in the painting Mona Lisa. The painting was created by Leonardo da Vinci. Among other things, the documentary La Joconde à Washington was created about this painting. The data is visualized in the following graph:

Figure 2.1: Informal graph of the example triples (5)

Nodes capture parts of real life, but they can also define abstract concepts or references.

Oriented arcs then define their relationships. The statement of the graph Alice is a friend of Bob from the RDF point of view consists of the following parts:

• subject Alice - determines the resource

• the predicate is a friend of – determines the statement about the subject resource

• the object Bob – determines either a specific value (so-called literal) or a reference to another resource

If all resources and predicates are identified by IRIs, the statement can be written, for example,


as follows:

<http://example.org/alice#me>
    <http://xmlns.com/foaf/0.1/knows>
        <http://example.org/bob#me> .

There are a number of serialization formats for writing RDF graphs. These serializations allow different ways of writing the triples, but in the end the same triples written in different formats are logically equivalent:

• Notation3 family of RDF languages (Turtle, TriG, N-Triples and N-Quads)

• JSON-LD

• RDFa

• RDF/XML

In this master thesis the RDF/XML serialization is utilized. This serialization was chosen as it is the one best supported by the particular Python library used for the tool implementation.
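The triple model described in this section can be illustrated with a minimal plain-Python sketch, tuples standing in for RDF statements (the birth-date predicate IRI below is invented for illustration; a real project would use an RDF library):

```python
# Each RDF statement is one (subject, predicate, object) tuple;
# IRIs are plain strings here, purely for illustration.
triples = [
    ("http://example.org/alice#me",
     "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob#me"),
    ("http://example.org/bob#me",
     "http://example.org/birthDate",   # hypothetical predicate IRI
     "1990-07-01"),
]

# Collect every statement whose subject is Bob:
bob_facts = [(p, o) for (s, p, o) in triples
             if s == "http://example.org/bob#me"]
print(bob_facts)
```

This also makes the graph reading of the model concrete: each tuple is one labeled arc from the subject node to the object node.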

2.4 RDFS and OWL

There are several languages used to create semantic models. The most important and widely used ones are RDF/RDFS and OWL. RDFS (RDF Schema) is an extension of the RDF language; it is itself written in RDF (6). RDFS enables the end user to express relationships between subjects and objects. It provides a standard and flexible format based on triples, thereby supplying a dictionary in the form of keywords such as rdf:type or rdfs:subClassOf that can be utilized to formulate assertions about certain things. The RDF Schema system is based on classes and properties, and can be considered similar to the type systems of OOP (object-oriented programming) languages, such as Java. The main difference of RDF Schema compared to OOP systems is that it describes properties in terms of the resource classes to which these properties apply, whereas the object-oriented approach assumes the definition of a class in terms of the properties that an instance of this class may have (6). As per the example given in the W3C specification of RDF Schema, the bigger role in this approach is played by the domain and range mechanisms.

The specification provides the example of defining the property eg:author that has a domain of eg:Document and a range of eg:Person, whilst a standard object-oriented system would normally define, for example, a class eg:Book with a set of attributes of type eg:Person, one of them being eg:author. Utilizing the RDF approach instead of a classic OOP system, it becomes easy for other end users to define additional properties with the same domain of eg:Document and a range of eg:Person. These additional properties can be defined without the necessity to rewrite the initial description of the classes. One benefit of RDF being property-centric is that it enables the user to naturally extend the existing description of resources. This is one of the main principles of the Web architecture (6).
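The domain/range mechanism can be sketched as follows. Note that in RDFS, domain and range are used for inference (using a property entails types for its subject and object), not for rejecting data; the eg: names follow the specification's example, while the data triple itself is invented:

```python
# RDFS-style declarations: the eg:author property has a domain of
# eg:Document and a range of eg:Person.
schema = {
    "eg:author": {"domain": "eg:Document", "range": "eg:Person"},
}

def infer_types(triple):
    """Derive the entailed types of subject and object from the
    predicate's declared domain and range (None if undeclared)."""
    s, p, o = triple
    decl = schema.get(p, {})
    return {s: decl.get("domain"), o: decl.get("range")}

print(infer_types(("eg:book1", "eg:author", "eg:alice")))
```

Adding another property with the same domain is just another entry in the schema dictionary; no existing class description needs rewriting, which mirrors the property-centric extensibility described above.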


Another technique that is also widely used to define classes and their relationships is the utilization of ontology languages such as OWL. Being an ontology language, OWL 2 is a language for expressing richer vocabularies, also called "ontologies".

The term "ontology" has a long history both inside and outside of the computer science field. In this context it is used, however, to refer to a particular kind of computer artifact. In this text an ontology is considered something resembling a computer program, an XML schema, or a web page, naturally presented as a document. An ontology is a set of precisely defined statements describing some domain, in other words a part of the world (usually referred to as the domain of interest or the subject of the ontology). Precise definitions serve several goals: first of all, they prevent misunderstandings in human communication, and they help to ensure that a program behaves in a consistent and predictable way and is able to integrate well with other software (7).

The OWL language is intended to describe more complex relationships between classes and properties. The relationship between RDF/RDFS and OWL is not that easy to explain. On the one hand, OWL uses some expressions from the RDF Schema standard, such as the type membership predicate rdf:type; on the other hand, it overrides expressions such as the entity of type "class": owl:Class instead of rdfs:Class. Thereby, there is a nuance in the definition of a class for different OWL dialects. At the same time, some RDFS expressions remain in frequent use: rdfs:subClassOf, for instance, is still the standard way to express class subsumption in OWL ontologies. In general, in practice an ontology is usually a mixture of expressions from all three standards.

There used to be three dialects of OWL:

• OWL Lite - the most simplified dialect

• OWL DL (Description Logic) - the most commonly used dialect

• OWL Full - the most extensive one

One of the differences between OWL DL and OWL Full is that OWL DL guarantees the decidability of reasoning, while OWL Full does not. On the other hand, OWL Full allows, for example, an entity to be both a class and an individual object at the same time, which is necessary in complex models (7). Later, these three dialects essentially merged and formed the most modern and widely used standard, OWL 2. It defines the so-called OWL profiles, which specify various constraints on language capabilities: OWL RL, OWL QL, etc. OWL 2 ontologies provide classes, properties, individuals, and data values, and are stored as Semantic Web documents. OWL 2 ontologies can be used together with information written in RDF, and OWL 2 ontologies themselves are mainly exchanged as RDF documents (8).


2.5 Semantic reasoner

The mechanism that actually applies the rules to the individuals in the ontology is a tool called the semantic reasoner. DBPedia gives the following definition for this term: "A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with" (9).

The schematic picture of what the mechanism looks like is presented below.

Figure 2.2: Semantic reasoner schematic work-flow

As can be seen in the picture, the reasoner, also called a Semantic Web inference engine, takes as input the domain ontology and the corresponding set of rules; in the context of the current project this set of rules is written in the SWRL language. The term SWRL is explained in more detail in the next section. Further, utilizing this input rule set, the reasoner performs an internal operation that compares the individuals, their properties and the values of these properties against the set of rules, and derives the results. There are a number of different reasoners for semantic engineering. In the scope of the current master thesis, the HermiT reasoner is utilized. HermiT is a tool specifically designed for ontology applications: a reasoner for OWL 2 ontologies that supports all OWL 2 DL features. Another advantage of this reasoner is that HermiT is a freely available open-source tool. It is developed at the University of Oxford and Ulm University (10). The reasoner is implemented in Java and can be used via the OWL API, the Protégé OWL editor or the command line. The reasoner is also integrated into the Owlready2 Python library, which is utilized in the current project for the further development of the software.
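The workflow described above, applying rules to asserted facts and deriving new ones, can be sketched schematically as naive forward chaining (the facts and rule below are invented for illustration; HermiT itself implements far more sophisticated tableau-based algorithms):

```python
def apply_rules(facts, rules):
    """Naive forward chaining: keep applying every rule to the fact
    set until a fixpoint is reached (no new facts are derived)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in rule(facts):
                if fact not in facts:
                    facts.add(fact)
                    changed = True
    return facts

# Toy rule (invented): every Manager is also an Employee.
def manager_is_employee(facts):
    return [(x, "type", "Employee")
            for (x, p, o) in facts
            if p == "type" and o == "Manager"]

inferred = apply_rules({("bob", "type", "Manager")},
                       [manager_is_employee])
print(sorted(inferred))
```

The loop terminates once a pass over all rules derives nothing new, which is exactly the "logical inference of new facts" a reasoner contributes on top of the asserted ontology.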

2.6 SWRL

Before speaking about SWRL, it is important to introduce the initiative that was the starting point for SWRL to be proposed. This initiative is called RuleML, and its mission is the following: "To develop RuleML as the canonical language system for Web rules through schema-defined syntax, formal semantics, and efficient implementations" (11). The definition of the initiative is given on its official web page: "Rule Markup (Rule Markup Language), which has also become the Rule Modeling Language, is a unifying system of language families for web rules specified in part through schema languages (normatively in Relax NG) for web documents and data originally developed for XML and later converted to other formats such as JSON" (11).

The Semantic Web Rule Language (SWRL) is a language proposed for the Semantic Web that can be used to formulate rules and logical expressions. The language combines the OWL and RDF Schema languages with a subset of the Rule Markup Language mentioned above.

The specification for SWRL was submitted in May 2004 to the World Wide Web Consortium by the National Research Council of Canada, Network Inference and Stanford University in association with the Joint US/EU ad hoc Agent Markup Language Committee. The specification was based on an earlier proposal for an OWL rules language (12).

The rules written in SWRL are presented in the form of a logical implication between a body and a head of the rule. The body is often called the antecedent, and the head the consequent. The natural meaning can be defined as follows: when the conditions specified in the antecedent are met, then the conditions specified in the consequent must also be met.

Both the antecedent and the consequent consist of zero or more constituent parts, called atoms.

The specification states that an empty body is considered trivially true, in other words satisfied by every interpretation, so the head must also be satisfied by every interpretation. An empty head is treated as false, i.e., not satisfied by any interpretation, so the antecedent must also be considered false (12).

Multiple atoms within one SWRL rule are treated as a logical conjunction. Atoms in these rules can be of the form A(x), P(x,y), sameAs(x,y) or subClassOf(x,y), where A is an OWL description, P is an OWL property, and x, y can be OWL individuals, OWL data values, or simply variables. SWRL rules can be written in a syntax that is basically a version of Extended BNF notation. Though this EBNF syntax is consistent with the OWL specification, it is verbose and quite complex to read. The advantage of the SWRL language, and one of its main benefits for this particular paper, is however its relatively informal "human readable" form, which is used in most published papers about semantic rules. In this syntax, a rule has the following form:

Antecedent → Consequent

In this form both the body and the head are conjunctions of atoms written as:

a1 ∧ ... ∧ an

Variables are defined using the standard convention of prefixing them with a question mark: ?x. As an example of using this syntax, a rule stating that the combination


of parent and sister properties implies the aunt property can be written as follows:

parent(?x,?y) ∧ sister(?y,?z) → aunt(?x,?z)

As is visible from the example above, the rules look quite readable and easy for a human to understand. In fact, using the presented informal syntax, any person who can formulate a logical statement is capable of writing it down in the form of a SWRL rule without any special preparation.
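Operationally, the rule body parent(?x,?y) ∧ sister(?y,?z) is a conjunctive join on the shared variable ?y, and the head asserts aunt(?x,?z) for every binding of the body. A sketch with invented individuals:

```python
# Sample facts (invented): each pair (a, b) reads "a has parent b"
# or "a has sister b" respectively.
parent = {("carol", "bob")}
sister = {("bob", "alice")}

# Join the two atom sets on the shared variable ?y,
# then assert the head aunt(?x, ?z) for each binding.
aunt = {(x, z)
        for (x, y1) in parent
        for (y2, z) in sister
        if y1 == y2}
print(aunt)
```

Carol's parent Bob has a sister Alice, so the rule derives that Carol has an aunt Alice, which is exactly what the reasoner would add to the ontology.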

2.7 SPARQL

Another technology, that is being part of the stack utilized in the current project is SPARQL.

SPARQL is a query language for RDF data (13). SPARQL can be used to query various data sources, whether the data is stored natively as RDF or is accessible as RDF through some middleware. RDF, as was already mentioned in this chapter, is a data model based on triples: subject, predicate and object. This RDF representation of the semantic model determines the way queries are built on it in the SPARQL language. Below is an example of the simplest SPARQL query:

SELECT * WHERE { ?a ?b ?c }

The statement SELECT * WHERE should be clear to anyone familiar with SQL: it selects all results from a set of rows (in SQL) or triples (in SPARQL) that satisfy the selection criteria. A selection condition, in the case of SPARQL, is an expression enclosed in curly braces. It defines the pattern that the triples must match. Values that begin with a question mark are variables; they can take on any value, and these values are returned as the result of the query. Since all three positions in the example query are occupied by variables, all three elements of a triple can be anything. This means that such a query returns the entire contents of the ontology, which is why one should not execute such queries on larger ontologies. Here is an example of a meaningful SPARQL query. Suppose one wants to know what type the object #Alpha is of:

SELECT * WHERE { <http://example.com/#alpha> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?c }

For better readability, prefixes can be declared before the query. Then the example query will look like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT * WHERE { <http://example.com/#alpha> rdf:type ?c }

The result of the query is the values of the variable ?c - the only remaining unknown:

<http://www.w3.org/2002/07/owl#NamedIndividual>
<http://example.com/#Customer>


SPARQL provides the possibility to query required and optional graph patterns, as well as their conjunctions and disjunctions (13). The language also provides support for aggregation, subqueries, negation, extensible value testing, and constraining queries by the source RDF graph (13).

The author describes the SPARQL language among the other technologies because it, too, will be utilized in the scope of the current work. As was already mentioned above, SPARQL enables extensible value testing. This capability will be used for validating the ontology that is being created within this project.
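The basic graph pattern matching behind such queries can be sketched in a few lines of Python. This is a toy illustration of the matching principle, not a SPARQL engine; the triples are invented:

```python
# A tiny RDF-like store of (subject, predicate, object) triples -- invented data.
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
triples = [
    ("http://example.com/#alpha", TYPE, "http://example.com/#Customer"),
    ("http://example.com/#beta", TYPE, "http://example.com/#Supplier"),
]

def match(pattern, triples):
    """Return variable bindings for triples matching the pattern.

    Pattern positions starting with '?' are variables and bind to the
    corresponding triple element; other positions must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for p, v in zip(pattern, triple):
            if p.startswith("?"):
                binding[p] = v
            elif p != v:
                break  # constant position does not match -> skip triple
        else:
            results.append(binding)
    return results

# Analogue of: SELECT * WHERE { <http://example.com/#alpha> rdf:type ?c }
print(match(("http://example.com/#alpha", TYPE, "?c"), triples))
```

A pattern of three variables, as in the first example query, matches every stored triple, which illustrates why such queries are expensive on large ontologies.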


3. The real domain

In this chapter the real domain area is defined. All the constituent parts of the domain are listed and explained here as well.

This section describes the basic terms that need to be defined in order to better understand the real domain of the problem being solved. Before defining the individual terms within the change management process, it is reasonable to say a few words about the Change Management process itself. The project studies business processes of a company that follows the ITIL recommendations. Change Management seeks to minimize the risk associated with Changes. For the direct implementation of this process, there are various kinds of tools, such as ITSM systems. Inside the company, an ITSM system called Global ServiceNow (14) is used. The system includes many tools that allow one to regulate the processes inside the company's IT infrastructure and monitor any changes in it.

3.1 Change

Together with the description of the change management process, it is important to define the general concept of a change in an IT company. In the ITIL Terms and Definitions document, the term "change" is defined as follows: "The addition, modification, or removal of approved, supported or baselined hardware, network, software, application, environment, system, desktop build, or associated documentation."(15)

In accordance with the change management in the context of ITIL, any change in the IT infrastructure must be documented and tracked at all stages of its life cycle. First of all, before the change takes effect, all possible consequences must be analyzed. The impact and affected services and processes must be studied. All changes must be planned in time, documented and notified to everyone who may be affected by the change. In Global ServiceNow, every change is requested through a special form. Once the form is completed and submitted, the change becomes a record in the CMDB. All parties, namely, groups of employees, who in one way or another fall under the influence of this potential change, will be automatically notified by the system.

3.2 Change request lifecycle

It is vital to clearly understand what the lifecycle of a change request looks like. For the scope of this project it is important to define what exactly happens to the request at each stage of this lifecycle; more specifically, what steps are taken by members of the Change Management team to regulate these requests. In order to find out, the author has repeatedly communicated with a representative of the mentioned team. Firstly, the request form was studied together with the member of the Change Management team. Creation is the very first stage of the request lifecycle. If a potential requester opens the creation form and saves the request without changing anything, a request is created in the Draft state in the database. Such requests are not taken into consideration. The interaction of the ServiceNow system with the developed tool begins, depending on the specific check, either at the stage when the requester fills in the required fields, or when he or she submits the request.

Then the request acquires the New status and becomes available for further processing. After the New status is acquired, the request is automatically passed to the Change Management team. Manual checks take place and the request is either approved for implementation or returned to the requester; the latter is the most common scenario. After the change request is corrected, it is again passed to the Change Management team for manual checks.

It is plain to see that the process requires improvement.

3.3 Change management

Since any change in an IT environment has its prerequisites and dependencies, it is necessary to ensure that all the relevant preparation steps are completed and that the consequences are known and acceptable. In bigger IT companies there can be tens or even hundreds of changes required per day, and it becomes extremely complicated to maintain the transparency and predictability of applied changes. Thus, a standardized and well-defined process is required to manage changes.

This process is also defined in the ITIL Terms and Definitions document: "Process of controlling Changes to the infrastructure or any aspect of services, in a controlled manner, enabling approved Changes with minimum disruption."(15) Change management business processes are of crucial importance for the successful operation of large enterprises. These processes should be automated to the maximum possible extent in order to eliminate human-generated errors.

In the company this study was developed for, the change management process is established based on the ITIL recommendations. However, there are areas where it needs to be adjusted to more specific needs, so in some aspects the process differs from the official ITIL definitions.

For automation, the widely known system Global ServiceNow is currently used.

The tool provides a broad variety of services for managing changes. The process for submitting a change is quite straightforward: the user navigates to the system, fills in the form and submits it. Some validation is done automatically, using validation rules configurable for the form itself, but some of the rules are difficult to implement, and some fields are not validated automatically at all. It is therefore still possible to submit an invalid Change Request form; such a form will not be approved for implementation and will be returned to the requester.

This way, the work of a significant number of people is simply wasted. This inefficiency is addressed by implementing automated checks for each and every field of the Change form.


3.4 Change Freeze

At the end of the business year, there is a specific period during which employees all over the company are not allowed to submit any changes. This period usually starts in the second week of November and ends on January 10th. During the Change Freeze period only urgent or essential changes may be performed. Thus, the process of submitting requests for changes undergoes corresponding adjustments: lead time periods for all kinds of changes are lengthened, non-essential changes should be re-planned to take place after the Change Freeze period ends, and essential changes have to be justified. During this period, all the mentioned changes are validated in a solely manual manner.
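For illustration, such a freeze-window check could be automated as sketched below. The exact boundary of "the second week of November" is an assumption (November 8th is used here), since the description above does not fix a calendar date:

```python
from datetime import date

def in_change_freeze(day: date) -> bool:
    """True if the date falls into the assumed Change Freeze window:
    from the start of the second week of November (assumed Nov 8th)
    until January 10th of the following year."""
    freeze_start = date(day.year, 11, 8)   # assumed start of week two
    freeze_end = date(day.year, 1, 10)     # January 10th of the same year
    # The window wraps around the year boundary, hence the OR.
    return day >= freeze_start or day <= freeze_end

print(in_change_freeze(date(2021, 12, 24)))  # inside the freeze window
print(in_change_freeze(date(2021, 6, 1)))    # outside the freeze window
```

A check like this could flag non-essential change requests planned into the freeze window before they ever reach the manual review.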

3.5 Business Unit

Business Unit represents a part of the enterprise. It is important to point out that there is a significant difference in the business processes for each Business Unit. Each Business Unit is working on a different business domain. Currently, there is only one Business Unit that is being studied in the scope of this diploma thesis. This Business Unit is called IT Services (ITS). ITS represents the departments that are responsible for providing various kinds of Information technology related services. The ITS provides services to all the remaining Business Units.

3.6 Control Item

Control Item represents any asset that is managed by the company. The most common categories of Control Items are:

• Network

• Laptop

• Server

• Endpoint

• Printer

The list is formulated based on an observation provided by the Change management team.

Every Control Item is associated with a service. The association is made based on impact: if the service would by default be impacted by any change applied to the given Control Item, then the association is made.


3.7 Impacted Service

An impacted service is a service that could be affected by the requested change. A service should also be considered impacted as soon as some asset that relates to it is impacted by the change; this shall be determined automatically. This field is one of the most important in the form: since the company is a large-scale enterprise, the same IT service is almost always used in several different processes. Changes are often related to key services, and it is necessary to closely monitor these changes in order to provide a complete understanding of what will happen with a particular service, and when. This clarity is also vital because there are both obvious and hidden interconnections between the services, and it is close to impossible to understand all of them at first glance.
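The automatic propagation described above can be sketched as follows. The control item names, service names and the association table are invented examples, not data from the company's CMDB:

```python
# Invented association table: control item -> services it supports.
associations = {
    "server-eu-01": {"Email", "Intranet"},
    "printer-hq-3": {"Printing"},
}

def impacted_services(changed_items, associations):
    """Collect every service associated with an impacted control item."""
    impacted = set()
    for item in changed_items:
        impacted |= associations.get(item, set())
    return impacted

print(impacted_services(["server-eu-01"], associations))
```

Given a change touching "server-eu-01", both "Email" and "Intranet" are automatically marked as impacted, without the requester having to know the associations.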

3.8 Impacted Business Unit

The company is divided into several business units, each providing a specific set of services.

Specifically, these divisions are: DGF, Express, DSC, eCom and ITS. The first four divisions specialize in providing services to external clients. The last one, ITS, specializes in providing IT services to all of the aforementioned business divisions, i.e., only within the company. A single process is used by all Business Units for change regulation, and a single tool (ServiceNow) is integrated for each and every Business Unit. A Business Unit becomes an Impacted Business Unit for every change request that refers to the processes within this Business Unit. In the scope of this particular project, only one Business Unit is considered so far, the ITS. Effectively, only the requests for changes that impact the ITS Business Unit are covered by this study.

3.9 Impacted country

The studied company is a large international organization. Departments that may be affected by the changes are located in a large number of countries around the globe. The same business units and services can be located in several countries at once; however, certain changes will not necessarily take place in all of these countries. Therefore, it is important to specify in the form which specific country will be affected by the change. This is exactly what is reflected in the Impacted Country field. It is important to state that a single country may be divided into multiple regions; the regions correspond to the Data Centers built by the company in the particular country.


3.10 Backout plan

There are also several fields that fall under the Backout plan category in the Change Request form. In order to make it clearer what these fields are, it is vital to define the concept of a Backout plan. A Backout plan is an IT governance integration approach that specifies the processes required to restore a system to its original or an earlier state in the event of a failed or aborted implementation (16). This field contains crucial data that should not be overlooked. The Backout plan can also include information regarding a temporary outage of some related services; usually, these are the services that are critical for business processes.

3.11 Service outage

The Service outage field is just one of the fields of the Backout plan category mentioned above.

This field stores information about whether a particular service needs to be stopped in order to implement the Backout plan. The value in this field can only be "Yes" or "No". If the value is "Yes", such a change is already considered more complex and therefore requires closer attention. This situation is described in more detail in the following chapters of this work.

3.12 Build and Test tasks

Build and Test tasks is also a field category in the Request for Change form. The set of related fields stores information regarding the steps that must be performed before moving into production. Not all changes need to include these steps: this category of fields should only be filled in for changes related to work with hardware or software, manipulation with networks, or bulk updates on employees' personal computers. These steps are divided into the Build and Test categories. The work on the development and implementation of the change falls under the Build category. The Test category, respectively, contains information regarding the testing of the developed change, for instance, the User Acceptance Testing or a successful Post Implementation Review.


4. Problem definition

Earlier in this thesis the author's motivation to initiate the work on the current problem was described. Basic terms and known issues have been defined in the previous chapters; these fundamental terms are essential for understanding the very essence of the motivation behind this project. The author, however, would like to elaborate more on the actual definition of the problem. The goal of this chapter is to provide a comprehensive overview of the mentioned problem and the approach towards solving it.

The problem itself lies in the approach towards change implementation in large-scale enterprises. Since virtually every component of a complex IT infrastructure may influence other components, it is close to impossible to avoid a long chain of approvals. These approvals are collected from responsible IT professionals who are accountable for the change impact.

The other side of the process is the manual checking of the Request for Change form fields for their compliance with the accepted criteria. Lastly, the burden of submitting a valid request partly falls on the requester him/herself: an individual is expected to keep in mind a set of ever-changing rules before submitting the request.

The requester of the change is hardly able to keep up with the constant evolution of the change implementation rules and criteria. It is also virtually impossible to expect improvement from the side of the change-approving individuals: it is their working responsibility to verify that every field is filled correctly and is valid. The problem does not get any simpler in the domain of automated form verification.

Any request for a form field modification is submitted for processing via a dedicated entity called a project. Creating such a project requires prior approval.

The whole life cycle of the project is connected with constant approvals and verifications. As a result, the time to market for a single project is at least one calendar month, which is extremely long in an environment that has to respond to swift market changes.

Last, but certainly not least, the modification to the form is made by IT professionals whose time is highly valued. A lot of man-days are wasted if another change is required to an already running project: in case of a new change requirement, the whole process has to be started from the very beginning.

The current master thesis aims to tackle the latter part of the change approval process. The ultimate goal is to give employees a simple and easy way to introduce adjustments to the automated Request for Change form. Even though the vast majority of employees may be unskilled in modifying the source code of the form, it would still be possible for them to maintain the changes via the proposed technology. The process of implementing changes to the checks is tailored to be straightforward for an end user. One is able to implement improvements and changes on the fly with absolutely no background knowledge of any programming language. In fact, the language of the proposed technology is so simple that it might be considered close to natural human language.

Data format

This section contains information regarding the format of the data used and the options for accessing the data. The image below shows what the form for the initial creation of a request for change looks like:

Figure 4.1: The Request for Change creation form

All the data regarding the submitted requests for changes is presented in a structured format. The value of each field of the change request form is stored in the relational Configuration Management Database in the corresponding table. The names of the fields are self-explanatory; the data types for the values are integer, text or date.

The form in the image, however, is not complete; it demonstrates just a subset of the whole form and is presented here for clarity. The form, in fact, contains many more fields, as well as several tabs. It is also important to mention that, based on the values selected for some of the fields, additional fields appear in the form. These fields are supposed to be filled in only for a certain kind of change request. For example, if one selects Patch or Update in the Change Type field, the Build and Test tasks tab appears; these fields must be filled in accordance with the requirements for changes of this type.

Values of the fields in the forms can be fetched in JSON format. Within the scope of the current project, it is planned to parse the acquired JSON data into the ontology data format.
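A sketch of the planned JSON-to-triples conversion is shown below. The field names, the sample record and the namespace URI are invented for illustration; the actual ServiceNow field names and the ontology namespace differ:

```python
import json

# Invented sample of a fetched change request record.
raw = '{"number": "CHG0001234", "change_type": "Patch", "service_outage": "No"}'

def to_triples(record_json, base="http://example.com/change#"):
    """Turn a flat JSON change-request record into (s, p, o) triples,
    using the request number as the subject identifier."""
    record = json.loads(record_json)
    subject = base + record["number"]
    return [(subject, base + field, value)
            for field, value in record.items() if field != "number"]

for triple in to_triples(raw):
    print(triple)
```

Each remaining JSON field becomes one triple, so the record can be loaded into the ontology and validated by the SWRL rules.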


5. Research of the state of the art

In this chapter the author describes the research of the problem area. Several examples of how similar problems were solved before are presented. Current solutions are discussed and evaluated in terms of their usability for the solution of the particular problem.

5.1 Alternative approaches for solving the problem

There are several leading directions for solving problems in the field of process automation.

This section briefly describes the two most popular approaches to solving such problems and discusses their limitations: Machine Learning and "hard-coding". Since these two methods are most frequently used to solve the automation problem, they are compared to the semantic approach.

5.1.1 Machine Learning

Machine Learning is a well-known method for automating things. The method is under continuous improvement and is still being very actively researched. The SAS definition of the term "Machine Learning" is as follows: "Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention."(17)

The general principle of the machine learning process is presented in the picture below. As visible in the picture, the essence of machine learning is the training of a model. The model is trained on a large dataset; usually, a set of data with already known and valid results is supplied to the model. After being trained, the model is subsequently able to produce results for particular input data. This method could be used to solve the problem of business request validation. The main and very significant drawback of this approach lies in the model itself: since the model is essentially a black box for an end user, the result of its work is not always valid. There are areas that are able to tolerate this disadvantage; however, for decision making that is critical for business requests, a valid result is essential. For a large enterprise, the inaccuracy may cause irrecoverable damage to the business processes and the reputation of the company.


Figure 5.1: Machine learning work-flow

5.1.2 Hard-Coding

The other way to implement the automation of the checks is "hard-coding". Effectively, the checks are performed in code and are later integrated into the ITSM system as persistent scripts. This approach has the advantage that the entire mechanism is clear and totally predictable, which is a strength compared to the Machine Learning technique described earlier. The "hard-coding" approach is presented below.

Figure 5.2: "Hard-coding" work-flow

The key benefit of this approach is clearly visible in the picture above: the implemented rules are deterministic and the output of the script is predictable. But there are also a few drawbacks inherent to hard-coded applications:

• Maintenance is implemented by the developers

• Implementation and deployment require significant time

• The logic is hidden


• The solution is dependent on the underlying data structure

Some applications are able to tolerate the points above, but this approach does not meet the requirements formulated for the solution required in this thesis.
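For illustration, a hard-coded check might look like the sketch below. The field names and the rules are invented, and the point is exactly the drawback listed above: every new rule requires a developer to edit and redeploy this code, and the logic stays hidden inside it:

```python
def validate_request(form: dict) -> list:
    """Hard-coded validation: the rules live in the source code itself."""
    errors = []
    # Rule 1 (hard-coded): a backout plan must always be present.
    if not form.get("backout_plan"):
        errors.append("Backout plan is missing")
    # Rule 2 (hard-coded): Patch/Update changes must define test tasks.
    if form.get("change_type") in ("Patch", "Update") and not form.get("test_tasks"):
        errors.append("Patch/Update changes require Test tasks")
    return errors

print(validate_request({"change_type": "Patch", "backout_plan": "restore VM"}))
```

Running the sketch on the sample form reports the missing test tasks; changing either rule, however, means changing and redeploying the script.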

5.1.3 Linked data and SWRL

The next approach is the one that was chosen as the way to solve the given problem. Namely, this is the approach of the Semantic Web technologies utilization.

The Semantic Web technologies approach assumes the utilization of an OWL ontology and the application of SWRL rules to it. These technologies are described in detail in the chapter "Background Technologies". Like all the approaches mentioned in the previous sections, the current one has both positive and negative aspects. Firstly, the positive qualities are described.

The picture shows the general components of the approach (schematically) and the high-level workflow.

Figure 5.3: Semantic Web technologies work-flow

As visible in the image above, the input to the system is not specific cases or particular data, but the specification of the problem domain itself. This way the method becomes independent of the technical interpretation of the data schema. The next benefit is a consequence of the previous one: the whole behaviour is controlled by subject matter experts, i.e., SMEs; in the case of the described problem these are the change management team members themselves. The rules written in this way are deterministic and, just like in the hard-coding approach, the result of program execution is predictable and always accurate. This approach, however, has its own drawbacks. One of the main negative points is the difficulty of automatically testing such a program. One may consider the complexity of the ontology implementation another issue with the approach. That being said, the latter drawbacks are easily outweighed by the benefits of this approach.
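The contrast with hard-coding can be illustrated by keeping the rules as data that SMEs edit, while the evaluation code stays generic. This is only a simplified analogy of SWRL-based validation; the declarative rule format and the field names are invented:

```python
# Rules maintained by SMEs as data, not code -- invented declarative format:
# (field, condition, error message). Adding a rule needs no code change.
rules = [
    ("backout_plan", "required", "Backout plan is missing"),
    ("service_outage", "required", "Service outage must be stated"),
]

def validate(form, rules):
    """Generic evaluator: behaviour is fully driven by the rule list."""
    return [msg for field, cond, msg in rules
            if cond == "required" and not form.get(field)]

print(validate({"backout_plan": "restore from snapshot"}, rules))
```

Here a new validation requirement is introduced by appending one tuple to the rule list, which mirrors how SWRL rules can be added to the ontology without touching the reasoning engine.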

5.2 Related projects

Before proceeding with the implementation of the planned solution, it was necessary to study what ready-made solutions exist for such problems. No exact solutions to the mentioned problem were found. Several remarkable papers, however, were discovered during the work on this project. These papers deal with similar problems, so it is worth presenting the research outcome briefly within the scope of the thesis.

The author managed to find several worthy examples where attempts were made to introduce a similar stack of technologies into business logic and processes. One such example is the work called "Ontoprocess - a prototype for semantic business process verification using SWRL rules", written by H. Happel and Ljiljana Stojanović in 2006 at the FZI Research Center for Information Technologies in Germany. In that paper, the authors describe the implementation of a prototype for semantic business process management using SWRL. The developed prototype consists of a rule editor and a process modelling workbench.

The main goal of the paper was to provide an instrument for automated checks of the conformance of business processes with business rules. The authors combine semantically described business processes with SWRL rules by means of a set of shared ontologies. The technique enables capturing the knowledge about the business domain. These semantic specifications allow the authors to automatically verify whether a process is consistent with the given requirements defined by the business rules.(18)

A two-staged architecture was assumed for the formulation of processes. The top level contains the information about the domain, defining the basic business concepts of the organization in ontologies, and business rules in a formal rule representation.(18) The second level is presented as the semantic process models, describing the business processes of the organization. The authors use OWL ontologies to describe the business processes and a subset of SWRL rules to express the correctness requirements. The authors managed to model some business processes (veterinary-related, such as a procurement process for chicken) and produced as an output the domain-specific ontologies to annotate them.(18)

As benefits of the work done, the authors of the paper claim that the speed and efficiency of change management increased significantly. Previously, the process engineers had to check every process in order to be sure of its compliance in standard environments. Ontoprocess helps to highlight processes that become inconsistent in the case of rule or ontology changes.

The rules can guarantee the compliance of business processes, given that they are correctly annotated. Domain and process models can be maintained by appropriate experts, thus allowing a separation of concerns.(18)

The problem that the authors of the paper were aiming to solve is similar to the problem that is being solved within the framework of this work. Both in the paper and in the current project, the goal is the validation of business requirements using semantic technologies. The specifics, however, are still different: within the framework of this work the author solves a more granular problem, while the authors of the mentioned paper validate business processes as a single entity.

Another paper found relevant while researching the existing solutions was "Detecting Compliance with Business Rules in Ontology-Based Process Modeling".

The paper suggests an approach for verifying business process compliance with business rules using an ontology-based process model. The proposed approach enables the specification of business rules in the form of logic program expressions that are related to the external ontological business model. It also allows one to use the logical reasoning of the program to find specific elements of the process model, for example, elements that violate the established rules. The approach enables one to access the information that is contained directly in the business ontology; transformations of the ontology instances into a logical representation for the program are not necessary (19). As the authors claim, incorporating semantics into the scope of compliance management can help detect business process disruptions (19), which should contribute to the improvement of processes in the company. In the paper, the authors aimed to prove that the use of logic program techniques to reason over business ontologies can contribute to the automation of noncompliance detection. The proposed approach allows ad-hoc access to all the rules that are related to the process model, which, as per the authors' statement, lowers the effort that enterprises have to dedicate to achieving compliance validation (19). These two papers, despite being indeed similar in the way they tackle the problem that is being solved in this master thesis, cannot be reused for this work. They do, however, provide real use-cases and prove that the approach of utilizing Semantic Web technologies is applicable to business logic related problems.


6. Research methodology

6.1 Choice of technologies

This section describes the process of deciding which technologies will be used to solve the given problem. Possible alternative technologies are described, and the author analyzes whether these alternatives could be utilized to solve a similar problem. The author explains why the problem cannot be solved using simple approaches like Java or JavaScript coding. This section also discusses which technologies from the Semantic Web technology stack were chosen, and why.

6.1.1 Semantic Web Technologies versus alternatives

In addition to semantic technologies, one may think of using classical, more widespread technologies. To solve the given problem, one might use such well-known programming languages as Java or JavaScript. Already acquired knowledge, the volume of documentation, the size of the community and reliable support are among the positive aspects of using these technologies.

According to the statistics, the Java and JavaScript programming languages currently occupy the leading positions in the rating of popularity and prevalence in software development.

Undoubtedly, they have a number of advantages over semantic technologies.(20) However, they also have a number of disadvantages that the author would like to overcome by applying Semantic Web technologies. The Java language surely has great multiplatform support, a vast number of libraries and broad functional possibilities. It is, however, worth noting that Java is not particularly widely used within the company among the teams dealing with the ServiceNow interface. The solution of the given problem with the Java programming language was therefore disregarded.

As for JavaScript, it is worth noting that the very interface used in the company to create a change request relies on the integration of scripts written in this language.

In order to add functionality to the form, such as field validation or another change, one may write a script in JavaScript and add it to the form. Doing so, however, requires all the necessary access rights. Obviously, in order to write a properly working script in this language, one must have sufficient knowledge of programming in general and be well versed in the features of the JavaScript language itself.

Of course, the company has a team directly involved in implementing such changes in the interface. Implementing a change in the field validation, however, is a long and tedious process: it involves a lot of bureaucracy and requires many approvals. A dedicated project has to be created to implement a change in the validation process, and it takes at least one month to deliver. For minor changes such a time frame is unacceptable. Whereas the check itself is often very simple and logical, the process built around it is rather inefficient.

The main intention of this work was to simplify the work of Change Management team members and increase efficiency without burdening other teams. The proposed technology needs to be understandable enough to be used by the Change Management team members themselves. Change Management employees are not required to have programming skills when hired, so JavaScript is definitely not the technology of choice within the scope of the project.

It was also an intention to deploy novel approaches and technologies in the company's research and development processes. This intention led to the idea of exploring what Semantic Web technologies have to offer.

To achieve the goal that was formulated within the scope of this work, it was necessary to choose specific Semantic Web technologies to use. The main selection criteria were:

• breadth of use

• reliability

• maintainability

• ease of use

To build a knowledge model describing the studied area, it was considered to use either RDF Schema or OWL. To make a decision, the specifications of both RDF Schema and OWL were studied. Based on these two specifications, it was concluded that RDFS allows the description of groups of resources, datatype properties and the relationships between resources (5). OWL is designed to represent more complex knowledge about things, groups of things and the relations between them. OWL is a computational logic-based language, such that knowledge expressed in OWL can be reasoned with by computer programs, either to verify the consistency of that knowledge or to make implicit knowledge explicit (8). In the end, it was decided to work on the OWL level, since it is more robust and has greater expressive power.
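To illustrate this difference in expressive power, consider the following hypothetical ontology fragment in Turtle syntax. The class and property names (`ChangeRequest`, `hasRequester`) are invented for illustration and are not taken from the actual ontology; the fragment uses an OWL cardinality restriction, which RDFS alone cannot express, to state that every change request has exactly one requester:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/change#> .

# Every ex:ChangeRequest must be linked to exactly one requester.
ex:ChangeRequest a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:hasRequester ;
        owl:cardinality "1"^^xsd:nonNegativeInteger
    ] .
```

A reasoner working with such a fragment can flag an individual with zero or two requesters as inconsistent, which is exactly the kind of check the verification mechanism needs.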

For the implementation of the verification mechanism itself, the main criteria for choosing a technology were precision and simplicity of use. It is important to highlight that the main benefits of the final solution were meant to be flexibility and the ability to update the validation rules in a timely manner in accordance with the requirements. Since the employees who would carry out the update procedures generally do not have programming skills, it was necessary to use a language or technology that is as clear, readable and easy to learn as possible. The SWRL language suited these criteria best. The simplicity of SWRL lies in the way rules are expressed: rules are written in the form of implications.

In fact, a rule has only two parts, the antecedent and the consequent. Each part can consist of one or several atoms. These atoms refer to entities of the ontology and can have a completely human-readable appearance. An alternative to SWRL could be, for example,

the SPARQL query language. In order to use its capabilities effectively and be able to write and modify queries, it is, however, necessary to have a particular set of skills. One would need an understanding of the specifics of query languages and graphs, as well as of the specific syntax of SPARQL.
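To illustrate the contrast, a SWRL rule of the antecedent/consequent form described above might look as follows. The class and property names are invented for illustration and are not taken from the company ontology:

```
EmergencyChange(?r) ^ hasApprovalCount(?r, 0) -> InvalidRequest(?r)
```

Read as an implication, the rule states: if a request is an emergency change and has zero approvals, it is classified as an invalid request. A rule in this form can be read and, with little training, modified by a team member without a programming background.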

It is required to integrate the chosen Semantic Web technologies into the business logic. To achieve this, a programming language had to be chosen. The two possible candidates for further work were the Java and Python programming languages. The cornerstone selection criterion was support for Semantic Web technologies. Both of these languages offer libraries for working with these technologies. Java provides a fairly large number of solutions, for example the Jena Java RDF API and toolkit, or the OWL API, as well as a number of libraries for working with RDF data or SPARQL queries.

As mentioned above, it was, however, necessary to find a solution offering support for SWRL rules. To implement the main functionality, the Python programming language was chosen.

The first reason was the author's previous experience with the Python language. The second reason was that Python offers a wide range of libraries and, consequently, broad supported functionality.
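The following is a minimal, self-contained sketch (not the actual implementation, which relies on a dedicated SWRL-capable library) of how a SWRL-style implication can be evaluated in plain Python. It only handles unary predicate atoms over a single variable, and the class names (`EmergencyChange`, `MissingApproval`, `InvalidRequest`) are illustrative:

```python
# Facts are stored as (predicate, individual) pairs; a rule of the form
# "A(?x) ^ B(?x) -> C(?x)" fires for every individual satisfying all
# antecedent atoms, producing new facts for the consequent atom.

def parse_rule(rule: str):
    """Split a rule 'A(?x) ^ B(?x) -> C(?x)' into antecedent atom names and the consequent atom name."""
    antecedent, consequent = rule.split("->")
    body = [atom.strip().split("(")[0] for atom in antecedent.split("^")]
    head = consequent.strip().split("(")[0]
    return body, head

def apply_rule(rule: str, facts: set) -> set:
    """Return the set of facts inferred by firing the rule on every known individual."""
    body, head = parse_rule(rule)
    individuals = {ind for (_, ind) in facts}
    inferred = set()
    for ind in individuals:
        if all((pred, ind) in facts for pred in body):
            inferred.add((head, ind))
    return inferred

facts = {("EmergencyChange", "CHG001"), ("MissingApproval", "CHG001"),
         ("StandardChange", "CHG002")}
rule = "EmergencyChange(?r) ^ MissingApproval(?r) -> InvalidRequest(?r)"
print(apply_rule(rule, facts))  # {('InvalidRequest', 'CHG001')}
```

The sketch shows why the implication form is attractive: the rule string itself stays close to natural language, while the evaluation machinery remains hidden from the person maintaining the rules.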
