
2. Background

2.1 Ontology

Ontology is a broad discipline whose roots lie in philosophy, and, like almost everything in philosophy, the definition of 'ontology' is debatable: different philosophical schools approach the notion in different ways. The term 'ontology' appeared in the seventeenth century in the Lexicon philosophicum of Rudolf Göckel and, independently, in the Theatrum philosophicum of Jacob Lorhard. In English, the term was first recorded in Bailey's dictionary, which defines ontology as "an Account of being in the Abstract." [47]

Ontology as a discipline has the goal of providing a definitive and exhaustive classification of entities in all spheres of being. It should be definitive so as to give a complete explanation and description of all that is going on in the universe, and at the same time indubitable, so as to give an account of what makes all truths true.

To serve these purposes, the classification should also be exhaustive, covering all types of entities and all types of relationships between them.

In the twenty-first century, 'ontology' has gained currency in the field of computer science. Computational ontology is considered from a particular perspective, which distinguishes it from the philosophical point of view, where ontology is treated in full generality. The need for computational ontology stems mainly from the Tower of Babel problem: different systems, databases, and frameworks may use identical labels with different meanings, or, conversely, different terms and concepts may refer to the same meaning. The scale of the problem grows as information is consolidated; in the beginning it was possible to resolve incompatibilities on a case-by-case basis, but this approach does not scale. [46]

In 1997, Borst defined an ontology as a "formal specification of a shared conceptualization": the conceptualization should express a shared view, and it should be expressed in some formal representation. Together, these two requirements make an ontology processable by a machine. [47]

2.2 Semantic web

An informal definition of the Semantic Web can be found in the May 2001 Scientific American article "The Semantic Web" (Berners-Lee et al.), which says: "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

The Semantic Web is an extension of the World Wide Web (WWW) based on standards set by the World Wide Web Consortium (W3C). Its most important aspects are the interlinking of data and machine readability. The concept of linked data allows information about a single entity to be distributed over the Web; this is achieved by using vocabularies that define the semantics of the properties. The difference between the WWW and the Semantic Web lies in the fact that the WWW is a web of linked documents identified by Uniform Resource Locators (URLs), whereas the Semantic Web is a web of linked data, in which data point to other data using URIs.

Figure 1. SW technologies and standards [48]

The basic features or fundamental concepts of the Semantic Web, according to D. Allemang and J. Hendler, are [49]:

• The AAA principle: "anyone can say anything about any topic"; this is the main slogan of both the World Wide Web and the Semantic Web.

• Open world assumption: this principle considers the web as an open world, where new information can come to light at any time. The assumption that the available information is complete is therefore never correct; there may always be more information than is currently known.

• Non-unique naming: this principle is connected to the first one. Since "anyone can say anything" on the web, the naming of entities is not coordinated.

• Network effect: this property supports the organic growth of the web; the more people participate, the more valuable participation becomes.

2.2.1 OWL

OWL, the Web Ontology Language, is the ontology language of the Semantic Web and part of the W3C's Semantic Web technology stack. It is a computational, logic-based language designed to represent rich and complex knowledge about things, groups of things, and the relations between things. OWL is intended to be used by applications that need to process the content of information instead of just presenting information to humans. The first version of OWL was published in 2004. The W3C OWL Working Group then produced the second version, OWL 2, which was published in 2009; the Second Edition of OWL 2 was issued in 2012. [50]

OWL is a declarative language, not a programming language: it describes a state of affairs in a logical way. "A terminology, providing a vocabulary together with such interrelation information constitutes an essential part of a typical OWL 2 document. Besides this terminological knowledge, an ontology might also contain so called assertional knowledge that deals with concrete objects of the considered domain rather than general notions." [50]
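
This distinction can be illustrated with a minimal Turtle sketch (the ex: namespace and the class and individual names below are illustrative, not taken from [50]):

@prefix ex:   <http://example.org/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Terminological knowledge: general notions and their interrelations.
ex:Person a owl:Class .
ex:Woman  a owl:Class ;
    rdfs:subClassOf ex:Person .

# Assertional knowledge: a statement about a concrete object of the domain.
ex:mary a ex:Woman .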

2.2.2 RDF

RDF, the Resource Description Framework, is a part of the W3C's Semantic Web technology stack; RDF 1.1 was published in 2014. It offers a simple language for representing annotations about Web resources identified by URIs. URIs can be used to name both the relationship between resources and the two ends of the link. Using the RDF model, structured and semi-structured data can be mixed, exposed, and shared across systems and applications.

To link the structure of the Web, RDF uses "triples". A triple is a statement composed of three parts in the form "subject-predicate-object". This format allows information to be represented in a machine-readable way.
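
For example, a single triple in the Turtle serialization might look as follows (the ex: names are illustrative; foaf:knows is an existing FOAF property):

@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# subject: ex:alice, predicate: foaf:knows, object: ex:bob
ex:alice foaf:knows ex:bob .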

The easiest possible mental model for RDF is the graph view, where the graph nodes are represented by resources and the graph edges are links between two resources. The graph view as a visual presentation of the data is usually easy to understand.

2.2.3 RDFS

RDFS, or Resource Description Framework Schema, provides a mechanism for describing RDF resources and the relations between them. It is an extension of the basic RDF vocabulary. The namespace for RDFS is identified by the IRI http://www.w3.org/2000/01/rdf-schema#, which is conventionally associated with the prefix rdfs:. [51]

RDF Schema has a class and property system that describes properties in terms of the classes of resources to which they apply. The mechanism of domain and range allows the characteristics of other resources to be determined. "For example, we could define the eg:author property to have a domain of eg:Document and a range of eg:Person" [51]
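
In Turtle, the quoted example from [51] can be sketched as follows (the eg: namespace IRI is illustrative):

@prefix eg:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

eg:author a rdf:Property ;
    rdfs:domain eg:Document ;   # any subject of eg:author is an eg:Document
    rdfs:range  eg:Person .     # any object of eg:author is an eg:Person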

The benefit of RDF Schema is that the set of properties of existing resources can be extended at any time without massive effort, which keeps the descriptions of the resources up to date.

2.2.4 SKOS

SKOS, the Simple Knowledge Organization System, is a part of the W3C Semantic Web technology stack built upon RDF and RDFS. SKOS is a standard for representing controlled vocabularies, taxonomies, and thesauri. Its main goal is to make publication easier and to enable data to be linked and reused, and it covers a wide range of knowledge organization systems. SKOS concepts can be related and linked to other concepts, which allows cost-efficient development.

The main elements of SKOS are concepts, labels and notation, documentation, semantic relations, collections, and mapping properties. Since SKOS adopts a concept-based approach, relationships are expressed between concepts, and concepts are associated with lexical labels. SKOS concept schemes are not formal ontologies in the way OWL ontologies are.
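
These elements can be illustrated with a small Turtle sketch (the ex: concept scheme and concepts are illustrative):

@prefix ex:   <http://example.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ex:animals a skos:ConceptScheme .

ex:mammal a skos:Concept ;
    skos:prefLabel "mammal"@en ;
    skos:inScheme  ex:animals .

ex:cat a skos:Concept ;
    skos:prefLabel "cat"@en ;            # preferred lexical label
    skos:altLabel  "domestic cat"@en ;   # alternative label
    skos:broader   ex:mammal ;           # semantic relation to a broader concept
    skos:inScheme  ex:animals .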

2.2.5 SPARQL

SPARQL, the SPARQL Protocol and RDF Query Language, is a query language for RDF. It became an official W3C Recommendation on 15 January 2008.

SPARQL provides functionality to retrieve and manipulate data stored in the RDF format. The entire database is viewed as a set of "subject-predicate-object" triples. Triple patterns are like RDF triples, except that the subject, predicate, and object may each be a variable. SPARQL supports aggregation, subqueries, negation, creating values by expressions, and extensible value testing.

Additionally, it provides capabilities for querying graph patterns together with their conjunctions and disjunctions. The results of queries can be result sets or RDF graphs. [52]
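
A minimal SPARQL query might look as follows (the query assumes data described with the FOAF vocabulary; the variable names are illustrative):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name                  # result set: one column of names
WHERE {
  ?person a foaf:Person ;     # match every resource typed as foaf:Person
          foaf:name ?name .   # and bind its name
}
ORDER BY ?name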

2.3 Linked data

The World Wide Web Consortium provides a set of practices for publishing structured data, called Linked Data. Relationships between entities from different data sets turn a collection of isolated datasets into the Semantic Web. Linked Data is the large-scale integration of data on the Web; it is, essentially, the heart of the Semantic Web. Internationalized Resource Identifiers (IRIs) are used to identify the entities of the structured data. Each IRI identifies exactly one entity, and that entity can be anything, which is why IRIs are universal. [53]

Linked data is based on Semantic Web technologies. Structured and linked data become beneficial with the use of semantic queries: if the structure of the data is regular and well-defined, it is easier for tools to reuse the data.

Since the language of websites is HTML, which is oriented towards structuring documents rather than data, extracting data from HTML is complicated. To address this complication, a variety of microformats have been introduced. The weak point of microformats is that they provide only a small set of attributes, so it is often impossible to express relationships between entities. [54]

Web APIs are another way of making structured data available on the web; they provide simple query access over the HTTP protocol. This approach is more generic than microformats, but it requires significant effort to integrate the data into an application.

In Linked Data, the issues related to microformats and Web APIs are resolved by the Resource Description Framework (RDF). RDF provides a flexible way to describe things in the world using the building blocks of RDF datasets, called triples. A triple consists of a subject, a predicate, and an object; this structure gives RDF the flexibility that microformats lack.

2.3.1 URI

A URI is an identifier type defined by RFC 3986 and designed to be simple and extensible. The specification does not restrict what kind of resource may be behind an identifier, nor does it say how the resource can be addressed; such properties follow from the specifications of the protocols that use URIs. The resource can be an electronic document, a physical object, or a service. What matters most is that the identifier distinguishes one resource from all other resources.

Example: http://www.w3.org/albert/bertram/marie-claude

A limitation of this identifier type is the ASCII character set, which allows only characters of the English alphabet to be used. [55]

2.3.2 IRI

An IRI (Internationalized Resource Identifier) is defined in RFC 3987. Unlike a URI, it allows characters beyond ASCII (the Unicode repertoire), so an IRI can include, for example, Czech alphabet characters and incorporate words of different natural languages. This makes identifiers easier to create, process, understand, and memorize. The IRI specification is compatible with the older URI specification and acts as its complement. When used over HTTP, as with URIs, the non-ASCII characters of an IRI are percent-encoded as their UTF-8 bytes. IRIs are used in RDF to publish linked open data. [56]

absolute-IRI = scheme ":" ihier-part [ "?" iquery ]

irelative-ref = irelative-part [ "?" iquery ] [ "#" ifragment ]
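
For example (using an illustrative address), the IRI http://example.org/růže maps to the URI http://example.org/r%C5%AF%C5%BEe, where the Czech characters ů and ž are percent-encoded as their UTF-8 byte sequences %C5%AF and %C5%BE.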

2.3.3 List of prefixes

A prefix is a standard mechanism for shortening URIs in some RDF serializations, such as Turtle. Prefixes aid understanding, manual creation and modification, and analysis of RDF data. They are a convention, so prefixes can be chosen freely; however, several common prefixes exist and are used worldwide. The prefixes used in this thesis are listed below.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX dc: <http://purl.org/dc/terms/>

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

PREFIX wdt: <http://www.wikidata.org/prop/direct/>

2.4 Linked data principles

To allow a machine or a person to explore the Semantic Web and to make it grow, Semantic Web technologies are guided by four principles, presented in Berners-Lee 2009 [57]:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.

These principles are not strict rules, but expected behavior and alignment with the standards. Following them makes the data interconnected; breaking them does not destroy the Semantic Web, but it reduces the efficiency of its functionality.

The first principle states that resources must be identified by URIs. Where URIs are not used, that part of the data is not part of the Semantic Web.

The second principle is based on the fact that HTTP name look-up is a powerful and widely deployed set of standards. Since HTTP URIs are names and not addresses, people tend to invent new URI schemes in order to keep names under separate control; yet the ability to look those names up is what makes the data convenient to manage.

The third principle concerns the information that can be obtained from RDF, RDFS, and OWL ontologies, including the relationships between the properties and classes of an ontology.

The fourth principle helps to connect data into one unbounded web, which enables the information to be as complete as possible. The value of the information within a web page is enriched by what it links to. [57]

2.5 Benefits of linked data

The main benefit of linked data is easy and efficient data integration and browsing through complex data: it removes the walls between different formats and different sources. As a consequence, updating data becomes easier, and links extend knowledge beyond the existing facts. Linked Data is sharable, extensible, and easily re-usable, which is why the linked data approach significantly improves on current practices and solutions for creating and delivering library data. [58]

The use of Web-friendly identifiers (URIs), supported by the Linked Data standards, sustains the multilingual functionality of the data. Identifiers allow multiple descriptions to refer to the same resource; since resources can be described by different libraries, organizations, and individuals, anyone can contribute their expertise about a source. This collaborative approach increases the value of the data far beyond each contribution taken individually. Globally unique identifiers allow easier citation and easier access to metadata descriptions. Libraries and memory institutions provide trusted data of long-term cultural importance.

Additionally, linked data aims to eliminate the redundancy of bibliographic descriptions on the web. Clear identifiers and links reduce the number of different names for one entity.

It is not immediately obvious how useful the output of Linked Data is when it is first implemented. However, the capabilities for using the data improve as its structure becomes richer: navigation and search across library collections become more advanced, and the global information graph built from the datasets grows in value.

The benefit to researchers lies in the re-use of library data. The advantage is that linked data technology enhances the existing Web rather than rebuilding it or creating a new one. By simply copying and pasting URIs, a researcher can make a citation, and citation management integrates library data into research documents. As a result, additional data and information can be found from the research documents just by following links among multiple domain-specific knowledge bases. Moreover, this makes it easier for other researchers to reuse the data or to replicate the experiments.

For organizations, linked data reduces infrastructure costs and may be the first step toward a cost-effective approach to managing complex cultural information. For small projects or organizations, access to larger data sets may increase their presence on the web. Internal data curation and publishing processes can also be improved by the use of linked data.

Linked data creates an open, global pool of shared data, which has a direct impact on librarians and archivists. Redundant effort can be reduced, as resources can be used and re-used at large scale. The identifiers make updates easier, and resources can be kept up to date at lower expense. The focus can then shift from re-creating descriptions that others have already created to extending the domains of local expertise.

For developers, linked data offers many opportunities. Developers no longer have to use custom software tools for library-specific data formats: by leveraging RDF and the well-known standard protocol HTTP, the data is made available in a generically understandable format. [58]


3. Full and Partial Identity

To analyze quasi-equivalent concepts, it is important to understand the nature of equality. This work is built on the assumption of a correspondence between identity and equivalence, and, respectively, between quasi-identity and quasi-equivalence. That is why the beginning of this part is dedicated to the concept of identity and its philosophical roots.

3.1 The Concept of Identity

The question of equivalence or quasi-equivalence has its roots deep inside the concept of identity. Over the years, "identity" has had multiple interpretations, and there is still no standard definition of the term. Since the concept comes mainly from philosophy, almost every philosopher has tried to get closer to the essence of "identity".

Plato was the first to differentiate the verb "to be" and to single out "is" as an identifier: he separated the "is" used as a copula from the "is" of identity, which helps to distinguish one object from another. [20]

On the other side, Aristotle defines the numeric meaning of identity, which is called the Indiscernibility of Sames:

If x and y are the same in number, then every attribute of the one is an attribute of the other.

A related statement is the Indiscernibility of Identicals, which is often called "Leibniz's Law" or "one-half of Leibniz's Law". However, Kenneth T. Barnes, in "Aristotle on Identity and Its Problems", attributes the Indiscernibility of Identicals to Aristotle:

If x and y are identical, then every attribute of the one is an attribute of the other. [21]

The study of Aristotle's views by Nicholas P. White gives his formulation of the principle of the Indiscernibility of Identicals:

If A and B are identical, then whatever is true of the one is true of the other. [21]

Whatever the questions regarding the authorship of the Indiscernibility of Identicals, there is no doubt that principles of identity occupy a central place in Leibniz's work and philosophy. Leibniz stated that there cannot be two absolutely similar things, because there cannot be two things that have absolutely equal properties. This idea was developed in his concept of substance. [22]

In the article "Identity, Indiscernibility, and Philosophical Claims", a version of Leibniz's principles is presented by the formula below, where a and b denote individuals and F is a variable ranging over the properties of a and b. [24]
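
In standard notation, the two directions of the principle can be written as:

$a = b \rightarrow \forall F\,(Fa \leftrightarrow Fb)$ (the Indiscernibility of Identicals)

$\forall F\,(Fa \leftrightarrow Fb) \rightarrow a = b$ (the Identity of Indiscernibles)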
