
Prague University of Economics and Business

Faculty of Informatics and Statistics

Department of Information and Knowledge Engineering

Management of Quasi-Equivalent Concepts in Ontologies

Author: Anna Nesterova

Supervisor: prof. Ing. Vojtěch Svátek, Dr.

Prague, December 2021


Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Date ...

Anna Nesterova

Declaration

I declare that I have written the master thesis "Management of Quasi-Equivalent Concepts in Ontologies" independently, using the sources and literature cited in the thesis.

Date ...

Anna Nesterova


Acknowledgement

I hereby wish to express my sincere gratitude and appreciation to my thesis supervisor, prof. Ing. Vojtěch Svátek, Dr., for his help and guidance, for the enthusiastic encouragement, and for the useful critiques of this research work.

Thanks to my whole family, who support me every day. I would like to thank my parents, whose love and guidance are with me in whatever I pursue. They are the ultimate role models.

I wish to thank my loving and supportive Esteban Jenkins for the encouragement throughout my study and life.


Abstrakt

Within this master thesis, the author aimed to analyze relevant topics and corresponding methods in order to gain an overview of merging and semantic matching techniques that can be applied in the case of quasi-equivalent concepts. The analysis covers the current state of existing ontologies and concrete cases of decisions concerning quasi-equivalent concepts. The aim of this thesis project is to formulate tentative rules for decision-making in the case of quasi-equivalent concepts. A further goal was to obtain feedback from ontology engineers on the challenge of quasi-equivalent concepts and its possible solution. The outcome of the feedback was analyzed and subsequently used to support the decision process.

Klíčová slova

Semantic web, Ontology, Linked Data, LOV, Quasi-equivalence

Abstract

Within the current master thesis, the author intended to analyze relevant topics and relevant methods to get an overview of the matching techniques that can be applied in the case of quasi-equivalent concepts. Analysis includes the current state of existing ontologies and concrete cases of the quasi-equivalent concept merging decision problems. The aim of this thesis project is to formulate tentative guidelines for making decisions in the case of quasi- equivalent concepts. Another goal was to obtain the feedback from ontology engineers on the challenge of quasi-equivalent concepts and how the decision can be handled. The outcome of the feedback has been analyzed to support the decision process.

Keywords

Semantic web, Ontology, Linked Data, LOV, Quasi-equivalence


Contents

List of abbreviations ... 7

Introduction ... 8

Goals and Methods ... 8

The structure of the thesis ... 9

1. Related work ... 11

2. Background... 19

2.1 Ontology ... 19

2.2 Semantic web ... 20

2.2.1 OWL ... 21

2.2.2 RDF ... 21

2.2.3 RDFS ... 22

2.2.4 SKOS ... 22

2.2.5 SPARQL ... 23

2.3 Linked data ... 23

2.3.1 URI ... 24

2.3.2 IRI... 24

2.3.3 List of prefixes ... 24

2.4 Linked data principles ... 25

2.5 Benefits of linked data ... 26

3. Full and Partial Identity ... 28

3.1 The Concept of Identity ... 28

3.2 Identity Problems ... 29

3.2.1 Philosophical Problems ... 29

3.2.2 Practical Problems ... 30

3.5 Contextual Identity ... 31

4. Ontology Matching ... 33

4.1 Matching techniques ... 33

4.1.1 String-based techniques ... 33

4.1.2 Language-based techniques ... 34

4.1.3 Constraint-Based Techniques ... 34

4.1.4 Informal resource-based techniques ... 34

4.1.5 Formal resource-based techniques ... 35

4.1.6 Graph-based techniques ... 35

4.1.7 Taxonomy-based techniques ... 35

4.1.8 Instance-based techniques... 35


4.1.9 Model-based techniques ... 35

4.2 Weak-Identity and Similarity Predicates ... 37

5. The Linked Open Vocabularies ... 40

5.1 owl:SameAs ... 43

5.2 skos:exactMatch ... 44

5.3 skos:closeMatch ... 45

5.4 owl:equivalentClass ... 46

5.5 rdfs:seeAlso ... 48

6. Example from Academic domain ... 50

7. Questionnaire ... 54

7.1 Response 1. ... 58

7.2 Response 2. ... 59

7.3 Response 3. ... 61

8. Approach for the management of quasi-equivalent concepts... 64

8.1 The decision process ... 64

8.2 Test of the decision process ... 68

8.2.1 owl:sameAs ... 68

8.2.2 skos:exactMatch ... 72

8.2.3 owl:equivalentClass ... 73

8.2.4 skos:closeMatch... 76

8.3 Discussion ... 78

Conclusion ... 80

References ... 81

Appendix A. LOV Triplets ... 87

Appendix B. DBpedia 'Professor' [66] ... 90

Appendix C. Questionnaire 'Quasi-equivalent concepts' ... 93



List of abbreviations

ASCII American Standard Code for Information Interchange

FOAF Friend of a Friend

HTML Hypertext Markup Language

HTTP Hypertext Transfer Protocol

IRI Internationalized Resource Identifier

JSON JavaScript Object Notation

KB Knowledge bases

KG Knowledge graphs

LD Linked Data

LOD Linked Open Data

LOV Linked Open Vocabulary

OD Open Data

OWL Web Ontology Language

PDF Portable Document Format

RDF Resource Description Framework

RDFS RDF Schema

SKOS Simple Knowledge Organization System

SPARQL SPARQL query language for RDF

UI User interface

URI Uniform Resource Identifier

URL Uniform Resource Locator

W3C World Wide Web Consortium

WWW World Wide Web


Introduction

Nowadays, an enormous amount of data is created every second in almost every field, and ontologies are no exception. A large amount of data is available on the web in the form of knowledge graphs or as part of linked data. [6] Hence, data matching and data interlinking play a crucial role in keeping an ontology within reasonable limits for efficient usability.

Since datasets tend to be large, automatically detecting connections between them is an important and challenging task. On the one hand, if concepts that exhibit too many differences are merged, there is a risk of generating unreliable inferences, or the axiomatization of the merged concept may have to be abandoned. On the other hand, if concepts that are very close to each other are kept separate as soon as any single difference is identified in their characteristics, the ontology will grow beyond reasonable limits and become unmanageable.

The thesis hypothesis is that a significant part of these difficulties arises from the often unprincipled approach to merging concepts (from independently developed underlying conceptualizations) that are very close to each other in meaning but not identical. These concepts are called "quasi-equivalent". The usage of quasi-equivalence relations depends on the granularity of the respective ontology and its goal. For a detailed ontology, it is valuable to define quasi-equivalence relations between concepts, as otherwise important connections or descriptions might be lost. For a general ontology, quasi-equivalence relations may be less crucial and can be left out of the equation.

Goals and Methods

The thesis focuses on studying and analyzing relevant topics and methods to get an overview of the matching techniques that can be applied in the case of quasi-equivalent concepts. The essence of this thesis project, providing the initial road map for the envisaged research, is to:

• Get acquainted with the basic literature on relevant topics, especially ontology merging and alignment, and the relevant methods of knowledge acquisition.

• Analyze the current state of existing ontologies and describe concrete cases of the quasi-equivalent concept merging decision problems.

• Get feedback from ontology engineers on the challenge of quasi-equivalent concepts and how the decision can be handled, and, based on the outcome of the feedback, provide an analysis of the responses.


• Formulate tentative guidelines (including references to possible supporting techniques that could be used, e.g., knowledge elicitation process or text mining tools) for making decisions in such situations.

The structure of the thesis

The first section of the thesis gives an overview of related work regarding equivalence, quasi-equivalence, and relevant topics. It describes various techniques for ontology matching and alignment.

The second section introduces the concepts needed to understand the background of the current research. Ontology, the Semantic Web and its principles, and the W3C's Semantic Web technology stack (OWL, RDF, RDFS, SKOS, SPARQL) are presented at the beginning of the section. The section then describes the practices of Linked Data, Linked Data principles, and the benefits of Linked Data. In the third section, the theoretical definition of "identity" and the evolution of the "identity" question over time is provided.

The fourth part provides an overview of the matching techniques. The fifth part then focuses on the analysis of existing ontologies and describes concrete cases of quasi-equivalent concept merging decision problems. The analysis of existing ontologies has been done using the Linked Open Vocabularies (LOV). For the analysis, a set of common predicates for representing identity or similarity is chosen: owl:sameAs, skos:exactMatch, skos:closeMatch, rdfs:seeAlso, owl:equivalentClass. Using the LOV SPARQL endpoint, triples are pulled for each predicate separately with a simple SPARQL query, and the set of most relevant triples is analyzed.
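The per-predicate triple pull described above can be sketched as follows; the endpoint URL, the exact query shape, and the result limit are illustrative assumptions rather than the precise queries used in the thesis.

```python
# Sketch of the per-predicate triple pull (assumptions: the public LOV
# SPARQL endpoint URL and the LIMIT value are illustrative, not the
# exact setup used in the thesis).
LOV_ENDPOINT = "https://lov.linkeddata.es/dataset/lov/sparql"  # assumed endpoint

PREDICATES = [
    "owl:sameAs",
    "skos:exactMatch",
    "skos:closeMatch",
    "rdfs:seeAlso",
    "owl:equivalentClass",
]

def build_query(predicate: str, limit: int = 100) -> str:
    """Build a simple SPARQL query pulling triples that use `predicate`."""
    return f"""
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?o
WHERE {{ ?s {predicate} ?o . }}
LIMIT {limit}
""".strip()

# One query per identity/similarity predicate under analysis.
queries = {p: build_query(p) for p in PREDICATES}
```

Each resulting query string can then be posted to the endpoint with any SPARQL client; the dictionary simply mirrors the "one pull per predicate" procedure described above.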

Then, the quasi-equivalent concepts behind the term "Professor" in English and "Profesor" in Czech are compared in the sixth part of the thesis.

The seventh part is dedicated to the questionnaire and feedback. At the beginning of the part, a description of the questionnaire is provided. Then the responses of each participant are given and analyzed.

In the end, based on the LOV analysis and the responses to the questionnaire, the approach for the management of quasi-equivalent concepts is formulated together with the decision process.

The decision process is tested on the quasi-equivalent pairs from the LOV analysis for each predicate: owl:sameAs, skos:exactMatch, owl:equivalentClass, skos:closeMatch.

The conclusion summarizes the outcome of the feedback, provides an analysis of the responses, and presents the decision process for quasi-equivalent concepts.


1. Related work

Since quasi-equivalent concepts are closely related to equivalence and similarity as such, it is reasonable to take these matters into consideration to gain a full overview of the topic.

In ontology specifications and catalogs, it is mostly topics related to equivalent concepts, rather than quasi-equivalent concepts, that can be found. For example, in the catalog of Leigh Dodds and Ian Davis, "Linked Data Patterns: A pattern catalog for modeling, publishing, and consuming Linked Data", there is a question regarding the identification of different data that refer to the same resource or concept. [7]

The situation is widespread since data is published in a decentralized way: it can easily happen that multiple people publish data about the same resource. In this case, it is worth establishing links between datasets to make explicit that the data refer to the same resource. The authors of the "Linked Data Patterns" catalog propose using the relations owl:sameAs or skos:exactMatch to indicate that data are equivalent. [7]

According to the OWL standard, "an owl:sameAs statement indicates that two URI references refer to the same thing: the individuals have the same 'identity'." [9] This means that all properties of one URI, and all statements made about that URI, are also true for the other.

Jérôme Euzenat and Pavel Shvaiko, in the book "Ontology Matching", also mention the construct owl:sameAs as a solution for establishing links between data. They present a practice that consists of identifying entities that represent the same resource and linking them using owl:sameAs. Additionally, without reference to any particular ontology language, a set of relations between classes can be used: equivalence (=), disjointness (⊥), and less general (≤). In OWL, these relations are represented by owl:equivalentClass, owl:disjointWith, and rdfs:subClassOf. [4]

Description logic offers manipulation with relations between concepts based on equivalence and incompatibility, which is widely used by web ontology languages. Marie Duzi gives the following definitions: "Concepts C1, C2 are equivalent, if they have exactly the same extent (construct the same entity). Concepts C1, C2 are incompatible, if in no state of affairs w, t the extent of C1 is a part of the extent of C2, and vice versa." Furthermore, the concepts C1 and C2 are equivalent only if C1 ≤ C2 and C2 ≤ C1. [16]
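Duzi's definitions can be illustrated with a toy model in which a concept is a mapping from a state of affairs (a "world") to its extent, a set of individuals. The worlds, concepts, and individuals below are invented purely for illustration.

```python
# Toy illustration of the equivalence/incompatibility definitions above.
# A concept is modeled as {world: extent}; worlds and extents are invented.

def equivalent(c1, c2, worlds):
    """C1 and C2 are equivalent iff they have the same extent in every world."""
    return all(c1[w] == c2[w] for w in worlds)

def incompatible(c1, c2, worlds):
    """C1 and C2 are incompatible iff in no world is either extent
    a part (subset) of the other."""
    return not any(c1[w] <= c2[w] or c2[w] <= c1[w] for w in worlds)

worlds = ["w1", "w2"]
bachelor  = {"w1": {"bob"}, "w2": {"tom"}}
unmarried = {"w1": {"bob"}, "w2": {"tom"}}   # same extent in every world
married   = {"w1": {"ann"}, "w2": {"eve"}}   # extents never overlap bachelor's

assert equivalent(bachelor, unmarried, worlds)      # C1 <= C2 and C2 <= C1
assert incompatible(bachelor, married, worlds)
```

Note that equivalence here is exactly mutual containment of extents in all worlds, mirroring the "C1 ≤ C2 and C2 ≤ C1" condition.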

To illustrate this definition, the author uses the example of propositions about the President of the USA and an attack on Iraq:

It is not necessary that if the President of the USA is a republican then he attacks Iraq.
It is possible that the President of the USA is a republican and he does not attack Iraq. [16]

Additional research about equivalent concepts was done by Rebecca Green to check the frequency with which equivalence occurs. The research shows that equivalence occurs significantly more often at the basic level than at the subordinate and superordinate levels. Concepts at the basic level, such as apple, shoe, or chair, have a higher probability of matching across schemes than concepts at the superordinate level, such as fruit, footwear, or furniture, or at the subordinate level, such as Granny Smith, sneaker, or recliner. [13]

Another question that Leigh Dodds and Ian Davis present in their catalog is about merging data for resources that may not be consistently identified. This question is closely related to the previous one, as again the root cause is decentralization. When different publishers use different identifiers for the same resource, sometimes a direct equivalence link exists, and sometimes there are only common properties on the basis of which equivalence might be inferred. [7]

The solution offered by the authors is to use smushing to create a modified RDF graph carrying all properties of the equivalent resources. The technique consists of multiple steps. First, it is necessary to decide on the target resource that will hold the final description. Then all equivalence links need to be found; these can be owl:sameAs statements or shared property values (for example, Inverse Functional Properties). After that, for each statement whose subject is one of the equivalent resources, a new statement is made using the target resource as the subject, with the predicate and object staying the same. Afterwards, for each statement whose object is one of the equivalent resources, a new statement is made using the target resource as the object, with the subject and predicate staying the same. [7]

In the end, all properties of the chosen equivalent resources are assigned to the target resource chosen in the first step. By applying smushing, the data can be normalized into a consistent set, and all references to the equivalent resources apply to the target resource as well.

The authors present the following example of smushing, which includes two resources linked by an owl:sameAs statement. [7]

<http://example.com/product/6>
    rdfs:label "Camera" ;
    owl:sameAs <http://example.org/cameras/10> .

<http://example.org/cameras/10>
    ex:manufacturer <http://example.org/company/5> .

<http://example.org/company/5>
    ex:manufactured <http://example.org/cameras/10> .

After applying smushing and removing statements about the equivalent resources, the following graph is available.

<http://example.com/product/6>
    rdfs:label "Camera" ;
    owl:sameAs <http://example.org/cameras/10> ;
    ex:manufacturer <http://example.org/company/5> .

<http://example.org/company/5>
    ex:manufactured <http://example.com/product/6> .

When an OWL reasoner is used, smushing can be applied automatically based on the available data. However, algorithms may differ in how the target resource is nominated. Another difference is whether the equivalent resources are removed so that only the normalized version remains, or whether they are kept.

Additionally, the smushing process leaves room for different ways of identifying equivalent resources. The owl:sameAs statement is the evident approach; Inverse Functional Properties are another mechanism suggested by the authors. Besides, different applications can also apply their own rules for deriving equivalent resources. Local customization of the rules is allowed, even though it brings an additional risk of failures. [7]
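The smushing steps above can be sketched over plain subject-predicate-object triples. This is a minimal sketch following the camera example; the function name, the triple encoding, and the choice to keep the owl:sameAs statement untouched (as the catalog's smushed graph does) are assumptions of this sketch, not the catalog's prescribed implementation.

```python
# Minimal smushing sketch over plain (s, p, o) triples, following the
# camera example above. Encoding and function names are illustrative.
SAME_AS = "owl:sameAs"

def smush(triples, target, equivalents):
    """Rewrite every occurrence of an equivalent resource to `target`,
    leaving the equivalence statements themselves intact."""
    rewrite = lambda term: target if term in equivalents else term
    out = set()
    for s, p, o in triples:
        if p == SAME_AS:
            out.add((s, p, o))          # keep equivalence links as-is
        else:
            out.add((rewrite(s), p, rewrite(o)))
    return out

triples = {
    ("ex:product/6", "rdfs:label", "Camera"),
    ("ex:product/6", SAME_AS, "ex:cameras/10"),
    ("ex:cameras/10", "ex:manufacturer", "ex:company/5"),
    ("ex:company/5", "ex:manufactured", "ex:cameras/10"),
}

smushed = smush(triples, "ex:product/6", {"ex:cameras/10"})
# All non-equivalence references to ex:cameras/10 now point at ex:product/6.
```

The result reproduces the smushed graph of the example: the manufacturer and manufactured statements are re-attached to the target resource, while the owl:sameAs link survives as provenance.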

Grigoris Antoniou and Frank van Harmelen, in their book "A Semantic Web Primer", offer owl:sameIndividualAs, a synonym for owl:sameAs, for identifying equivalent resources. Another option is class equivalence, indicated by owl:equivalentClass; the analog for properties is owl:equivalentProperty. Statements of difference are also useful: they give additional links between resources and can be used in combination with equivalence links.

For example, when x is sameAs y and y is differentFrom z, it is possible to infer that x is differentFrom z too. [3]

In the book, the authors show an example of matching foaf:Person instances based on names and inverse functional properties. They state that if two persons have the same email address, the conclusion is that they are the same person. [3] This can be true, but not always: persons A and B may, for example, belong to the same family and share one family email address. From this point of view, the statement that person A and person B are the same is simply wrong. Since they are from the same family, some of their properties are the same, but others differ. Hence, the concept of quasi-equivalence would be beneficial in such cases.

Concepts, entities, and events undergo constant changes, which makes them complex, composite phenomena. That is why it is difficult, or sometimes even impossible, to obtain complete knowledge about them. The mental image held by one person can differ from the mental image of another person, and also from the truth. For example, a writer assumes one set of properties for the event or entity he is describing, while the reader may assume a different set. That difference may have crucial consequences for text analysis and for semantic meaning. As a result, coreference among entities or events can be fundamentally mismatched.

In the project "Events are Not Simple: Identity, Non-Identity, and Quasi-Identity", the authors identified three levels of event identity (full, partial, and none) in order to build corpora containing coreference links between events. They introduce the idea that there are three degrees of event identity: fully identical, quasi-identical, and fully independent (not identical). The degree of identity depends on the level of coreference: absolutely independent events are unique, partial coreference indicates quasi-identity, and full coreference indicates full identity. [1]

Quasi-identity applied to entities is already complex; considering not only entities but also events makes it even more so. Since quasi-equivalence is the focus of this thesis, it is worth understanding the concepts of partial coreference and quasi-identity in detail. The authors indicate membership and subevent as the two core types of quasi-identity. The difference between the two is based on particulars such as time, location, and participants.

Membership obtains when there are two instances of the same kind of event which differ in time, location, or participants. One instance (A) is a set of events of the same kind, and the other instance (B) is one or more of them, but not all of them; it is then possible to say that B is a member of A. The authors give the example "I attended three parties (E1) last month. The first one (E2) was the best", where E2 is a member of E1.

The subevent relation, on the other hand, obtains when there are two instances of two different events that share the same time, location, and participants. One instance (A) is a complex sequence of activities, and the other instance (B) is one of them, occurring at the same time and location and with the same agent as A; it is then possible to say that B is a subevent of A. For example, in "When I went to the restaurant last time (E1), I had to pay the waiter (E2)", E2 is a subevent of E1. [1]

Given that it is tricky to differentiate between membership and subevent, the authors provide key criteria for several aspects:

Time. For both membership and subevent, A and B should be events, and both should refer to the same discourse element. If the time of occurrence of B is contained in the time of occurrence of A, B is a subevent of A. If not, and A is a set of events, B is a member of A.

Space/location. For both membership and subevent, A and B should be events, and both should refer to the same discourse element. If the location of B is contained in, overlaps with, or abuts the location of A, B is a subevent of A. If not, and A is a set of events, B is a member of A.

Event participants. If A and B are events, and both refer to the same discourse element, but the overall cast of participants is different, B is a member of A.
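The three criteria above can be sketched as a small classifier. Reducing an event to a time interval, a location label, and a participant set is my own simplification of the authors' guidelines, meant only to make the decision flow concrete; the restaurant example follows the subevent case described above.

```python
# Sketch of the membership/subevent criteria above. Events are reduced
# to a time interval, a location label, and a participant set; this
# flattening of the criteria is an illustrative simplification.
from dataclasses import dataclass

@dataclass
class Event:
    start: int
    end: int
    location: str
    participants: frozenset

def relation(a: Event, b: Event) -> str:
    """Classify B relative to A, assuming both corefer to the same
    discourse element (a precondition of all three criteria)."""
    if b.participants != a.participants:
        return "member"            # different overall cast of participants
    time_contained = a.start <= b.start and b.end <= a.end
    same_place = b.location == a.location
    if time_contained and same_place:
        return "subevent"          # contained in A's time, at A's location
    return "member"                # otherwise B is a member of the set A

# The restaurant example: paying the waiter during the restaurant visit.
dinner = Event(19, 21, "restaurant", frozenset({"I", "waiter"}))
paying = Event(20, 21, "restaurant", frozenset({"I", "waiter"}))
assert relation(dinner, paying) == "subevent"
```

A real annotation scheme would need interval overlap and abutment for locations rather than string equality; the sketch keeps only the containment logic that drives the membership/subevent split.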

If there is doubt about whether two instances are identical or quasi-identical, they should be treated as quasi-identical. For example, "he had a heart attack" / "he died" are not identical, whereas "he had a fatal heart attack" / "he died from a heart attack" are identical. In the first case, the heart attack does not imply that he died; in the second case, the information that the heart attack was fatal is present.

The authors also mention non-semantic differences in the case where two instances refer to the same discourse element: for example, one instance includes an evaluation or opinion while the other is neutral. Such instances are treated as identical. [1]

Another article shares the main idea of "Events are Not Simple: Identity, Non-Identity, and Quasi-Identity"; it is entitled "Identity, non-identity, and near-identity: Addressing the complexity of coreference". The authors declare "that coreference is best handled when identity is treated as a continuum, ranging from full identity to non-identity, with room for near-identity relations to explain currently problematic cases." As is clear from the title, instead of quasi-identity the authors use the term near-identity.

This middle state is required for a better description of the real world, as it is not sufficient to separate expressions into only two groups, identical and independent. Just as there is grey between white and black, there should be near-identical expressions to complete the picture. Near-identity represents situations where two references target 'almost' the same thing, for example, Postville and the old Postville. A large number of real-life examples, framed in terms of mental space theory, are presented to show different degrees of near-identity. [15]

The goal of the article was to provide a framework that can explain under which circumstances different expressions can be interpreted as near-identical. For that purpose, the authors define the relation between linguistic expressions that refer to entities at the same granularity level, applicable to both the linguistic and the pragmatic context, as a scalar relation. The key issues the framework deals with are conceptual categorization, individuation, criteria of identity, and the discourse model construct.

The approach to identifying near-identity is based on two categorizations: the level of granularity and the level of identity. These categorizations depend on whether or not there is a complete value replacement. "The degree of near-identity was modelled as a function of the operation involved, and the number of shared features." [15]

The authors apply these categorization operations to a wide range of examples from real data that are difficult to classify as identical or non-identical. As a result, a set of features was found that typically lead to near-identity relations; the set includes time, location, role, set membership, and others.

The framework presented by the authors benefits the interpretation of real data, which makes it valuable in multiple branches of linguistics. However, the authors conclude that there are no absolute and universal rules, and that the presented model should be considered a combination of "directions and tendencies". [15]

As stated in the book "Ontology Matching", data interlinking and ontology matching can work in tandem, complement one another, and be used to reinforce each other. An overview of data interlinking and link keys is therefore valuable for the topic of quasi-equivalent concepts. [4]

Link keys are "sets of properties from both ontologies which, for a pair of classes, identify pairs of instances describing the same individual." [4] Link keys explicitly identify when the same individual is described by pairs of instances from different datasets. They should provide the connections and the consistency between the properties of the datasets. Using link keys, it is convenient to identify instances with the same values.
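Applying a link key can be sketched as follows: given pairs of properties from the two ontologies, two instances are linked when their values agree on every pair. The datasets, identifiers, and property names below are invented examples, not drawn from the cited book or article.

```python
# Sketch of applying a link key over two datasets. Instances are plain
# dicts of property values; all names and data are invented examples.
def apply_link_key(pairs, dataset1, dataset2):
    """Yield (id1, id2) for every pair of instances whose values agree
    on all (p, q) property pairs of the link key."""
    links = []
    for id1, inst1 in dataset1.items():
        for id2, inst2 in dataset2.items():
            if all(inst1.get(p) == inst2.get(q) for p, q in pairs):
                links.append((id1, id2))
    return links

ds1 = {"a:book1": {"a:title": "Ontology Matching", "a:year": 2013}}
ds2 = {"b:livre9": {"b:titre": "Ontology Matching", "b:annee": 2013},
       "b:livre7": {"b:titre": "Semantic Web Primer", "b:annee": 2008}}

# Link key: a:title pairs with b:titre, a:year pairs with b:annee.
links = apply_link_key([("a:title", "b:titre"), ("a:year", "b:annee")], ds1, ds2)
# → [("a:book1", "b:livre9")]
```

The weak/plain/strong variants discussed below differ only in how duplicates within each dataset are treated, not in this basic agreement test.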

The article "On the relation between keys and link keys for data interlinking" raises an important topic. As already mentioned, the owl:sameAs property is used to specify a link between the same resources in different RDF datasets carrying different IRIs. Because RDF datasets tend to be large, automatically detecting owl:sameAs connections between them is an important but challenging task. [6]

In their work, the authors describe state-of-the-art approaches to linking data based on finding keys or link keys across RDF datasets. Both keys and link keys characterize what makes two resources identical and are used to discover links across datasets. Since both techniques have proven effective in data interlinking scenarios, the authors' work aims to formulate the relationship between keys and link keys.

For this purpose, the authors provide the semantics of keys (in RDF) and of link keys. They explain how keys in their various forms can be combined with alignments between ontologies for data interlinking. [6]

To compare keys and link keys, the semantics of keys are formulated as axioms of description logic. Several types of keys are used in this context: instead of S-keys and F-keys, in-keys and eq-keys are used, where the prefixes in- and eq- are abbreviated forms of intersection and equality. The semantics of six types of link keys are then defined: weak, plain, and strong link keys, each in its in- and eq-form. [6]

The three types of link keys (weak, plain, and strong) allow finding links between two datasets. They differ in whether they allow the existence of different resources (duplicates) that meet the key conditions within each of the datasets: weak link keys allow them; plain link keys allow them only between non-linked resources; strong link keys forbid all duplicates.

The relationship between keys and link keys is much more subtle, and one cannot always be replaced by the other. In particular, it has been shown that data interlinking with keys requires correct alignments (Theorems 1 and 2) and, in the case of eq-keys, their completion (Theorem 2). Data interlinking with link keys, on the other hand, does not require alignments (Theorems 3 and 4), although in the case of eq-keys (Theorem 4) completion is still needed.

Strong link keys are keys by definition, and properly aligned keys entail strong link keys (Theorem 5). In this case, the links generated by a strong link key are the same as the links generated by its associated keys and alignments (Theorems 7 and 8).

In addition to not requiring alignments, weak link keys can exist independently of the existence of any key in each ontology (Theorem 6; if such keys do exist, they become strong link keys), and yet they can be useful for interlinking datasets. These results provide a clear picture of the relationships between the key-inspired devices available for data interlinking, and they can be easily transferred to hybrid keys and link keys.

Finally, the authors determine the conditions under which link keys are equivalent to keys and show that interlinking data with keys and ontology alignments can be reduced to interlinking data with link keys, but not vice versa.

The authors justify the use of weak link keys instead of searching for matching alignments: rather than spending time searching for a strong alignment that may not exist, it can be reasonable to use a suitable weak link key, which may be useful and save time. [6]


2. Background

The basic concepts relevant to the research conducted as part of this thesis are described in this chapter.

2.1 Ontology

Ontology is a wide discipline whose roots come from philosophy, and, like everything in philosophy, the definition of 'ontology' is debatable: different philosophical schools offer various approaches to how 'ontology' should be considered. The term 'ontology' appeared in the seventeenth century in the Lexicon philosophicum by Rudolf Göckel and, independently, in the Theatrum philosophicum by Jacob Lorhard. In the English language, the term was first recorded in Bailey's dictionary, which defines ontology as "an Account of being in the Abstract." [47]

Ontology as a discipline has the goal of providing a definitive and exhaustive classification of entities in all spheres of being. It should be definitive, giving a complete explanation and description of all that goes on in the universe; at the same time, the classification should be indisputable, giving an account of what makes all truths true. To serve these purposes, the classification should also be exhaustive, including all types of entities and all types of relationships between them.

In the twenty-first century, 'ontology' has gained currency in the field of computer science. Computational ontology is considered from a particular perspective, which distinguishes it from the philosophical point of view, where ontology is considered in general. The necessity of computational ontology mainly comes from the Tower of Babel problem: different systems, databases, and frameworks may use identical labels with different meanings, or, conversely, different terms and concepts may refer to the same meaning. The scale of the problem grows as information is consolidated; in the beginning, it was possible to resolve incompatibilities on a case-by-case basis. [46]

In 1997, Borst defined ontology as a "formal specification of a shared conceptualization". The conceptualization should represent a shared view and should be expressed in some formal representation; these two requirements enable a machine to process the ontology. [47]

2.2 Semantic web

An informal definition of the Semantic Web can be found in the May 2001 Scientific American article "The Semantic Web" (Berners-Lee et al.), which says: "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

The Semantic Web is an extension of the World Wide Web (WWW), based on standards set by the World Wide Web Consortium (W3C). The most important aspects are the interlinking of data and machine readability. The concept of linked data allows information about a single entity to be distributed over the Web. This is achieved by the usage of vocabularies that define the semantics of the properties. The difference between the WWW and the SW lies in the fact that the WWW is a web of linked documents using Uniform Resource Locators (URLs), while the SW is a web of linked data, with data pointing to other data using URIs.

Figure 1. SW technologies and standards [48]

The basic features or fundamental concepts of the Semantic Web according to D. Allemang and J. Hendler are [49]:

• the AAA principle: "anyone can say anything about any topic"; this is the main slogan of both the World Wide Web and Semantic Web.

• Open world assumption: the principle considers the web as an open world, where new information can come to light at any time. The assumption that the available information is complete is therefore incorrect; there is always more information than has been known before.

• Non-unique naming is connected to the first principle. Since "anyone can say anything" on the web, the naming of entities is not coordinated.

• Network effect: this property supports the organic growth of the web; the more people participate, the higher the value of participation.
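The open world assumption can be illustrated with a small sketch. The triples and helper functions below are invented for illustration, not part of any standard library; the point is that a fact absent from the data is treated as unknown rather than false:

```python
# Illustrative sketch of the open world assumption (OWA).
# Under OWA, a statement absent from the data is "unknown", not "false";
# only an explicit negation could make it false.

known_facts = {
    ("dbr:Prague", "rdf:type", "dbo:City"),
}

def owa_holds(triple, facts):
    """Open world: True if stated, None (unknown) if not stated."""
    return True if triple in facts else None

def cwa_holds(triple, facts):
    """Closed world (typical databases): anything not stated is False."""
    return triple in facts

q = ("dbr:Prague", "rdf:type", "dbo:Capital")
print(owa_holds(q, known_facts))  # None: we simply do not know
print(cwa_holds(q, known_facts))  # False: absence is treated as falsity
```

The contrast between the two functions is exactly the contrast between the Semantic Web view and a conventional closed database.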

2.2.1 OWL

OWL, or Web Ontology Language, is the ontology language of the Semantic Web and part of the W3C's Semantic Web technology stack. It is a computational logic-based language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is intended to be used by applications that need to process the content of information instead of just presenting information to humans. The first version of OWL was published in 2004. In 2009 the W3C OWL Working Group published the second version, OWL 2. The Second Edition of OWL 2 was issued in 2012. [50]

OWL is a declarative language, not a programming language. It describes a state in a logical way. "A terminology, providing a vocabulary together with such interrelation information constitutes an essential part of a typical OWL 2 document. Besides this terminological knowledge, an ontology might also contain so called assertional knowledge that deals with concrete objects of the considered domain rather than general notions." [50]

2.2.2 RDF

RDF, the Resource Description Framework, is a part of the W3C's Semantic Web technology stack. RDF 1.1 was published in 2014. It provides a simple language for representing annotations about Web resources identified by URIs. URIs are also used to name the relationships between resources as well as the two ends of the link. Using the RDF model, structured and semi-structured data can be mixed, exposed, and shared across systems and applications.

For linking the structure of the Web, RDF uses "triples". A triple is a set of three entities in the form "subject-predicate-object". This format allows representation in a machine-readable way.
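As a minimal illustration (plain Python tuples, not a real RDF library; the `ex:` identifiers are invented), a triple can be modeled as a (subject, predicate, object) 3-tuple and a graph as a set of such tuples:

```python
# A toy RDF graph as a set of (subject, predicate, object) triples.
# Identifiers are illustrative strings, not dereferenceable URIs.
graph = {
    ("ex:Prague", "ex:capitalOf", "ex:CzechRepublic"),
    ("ex:Prague", "rdf:type", "ex:City"),
    ("ex:CzechRepublic", "rdf:type", "ex:Country"),
}

# All statements made about a given subject:
about_prague = {(p, o) for (s, p, o) in graph if s == "ex:Prague"}
print(about_prague)
```

Because every statement has the same three-part shape, merging two graphs is just a set union, which is one reason RDF data integrates so easily.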

The easiest possible mental model for RDF is the graph view, where the graph nodes are resources and the graph edges are links between two resources. The graph view, as a visual presentation of the data, is usually easy to understand.

2.2.3 RDFS

RDFS or Resource Description Framework Schema provides a mechanism for describing RDF resources and relations between them. It is an extension of the basic RDF vocabulary. The namespace for RDFS is identified by the IRI - http://www.w3.org/2000/01/rdf-schema#, which is associated with the prefix rdfs: to refer to that namespace. [51]

The RDF Schema has a class and property system which describes properties in terms of the classes to which they apply. The mechanism of domain and range allows determining characteristics of other resources. "For example, we could define the eg:author property to have a domain of eg:Document and a range of eg:Person" [51]
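The domain/range mechanism from the eg:author example can be sketched as a simple inference step. This is a hypothetical helper, far from RDFS-complete entailment: whenever a property with a declared domain and range is used, the types of its subject and object can be inferred.

```python
# Minimal sketch of RDFS domain/range entailment for declared properties.
# Declaring rdfs:domain / rdfs:range lets us derive rdf:type triples.
schema = {
    "eg:author": {"domain": "eg:Document", "range": "eg:Person"},
}

data = {("eg:thesis42", "eg:author", "eg:anna")}

def infer_types(data, schema):
    inferred = set()
    for s, p, o in data:
        if p in schema:
            inferred.add((s, "rdf:type", schema[p]["domain"]))
            inferred.add((o, "rdf:type", schema[p]["range"]))
    return inferred

print(infer_types(data, schema))
# {('eg:thesis42', 'rdf:type', 'eg:Document'), ('eg:anna', 'rdf:type', 'eg:Person')}
```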

The benefit of RDF Schema is that it allows the list of properties of existing resources to be extended at any time without massive effort. It keeps descriptions of the resources up to date.

2.2.4 SKOS

SKOS, or Simple Knowledge Organization System, is a part of the W3C Semantic Web technology stack, built upon RDF and RDFS. SKOS is a standard for representing controlled vocabularies, taxonomies, and thesauri. The main goal of SKOS is to make publication easier and to enable data to be linked and reused. It covers a wide range of knowledge sources. SKOS concepts can be related and linked to other concepts, which allows cost-efficient development.

The main elements of SKOS are concepts, labels and notation, documentation, semantic relations, collections, and mapping properties. Since SKOS adopts a concept-based approach, relationships are expressed between concepts, and concepts are associated with lexical labels. SKOS concept schemes are not formal ontologies in the way OWL ontologies are.

2.2.5 SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is an RDF query language. It became an official W3C Recommendation on 15 January 2008.

SPARQL provides functionality to display and manipulate data stored in the RDF format. The entire database is treated as a set of "subject-predicate-object" triples. Triple patterns are like RDF triples except that each element may also be a variable. SPARQL supports aggregation, subqueries, negation, creating values by expressions, and extensible value testing.

Additionally, it provides capabilities for querying graph patterns with their conjunctions and disjunctions. The results of queries are result sets or RDF graphs. [52]
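The triple-pattern idea can be sketched without a full SPARQL engine. The following toy matcher (invented data; variables marked with a leading "?") evaluates a single pattern against a list of triples:

```python
# A toy matcher for a single SPARQL-like triple pattern.
# Elements starting with "?" are variables; others must match exactly.
# (A variable repeated within one pattern is not checked for consistency
# in this sketch, unlike real SPARQL.)
graph = [
    ("ex:Anna", "foaf:knows", "ex:Esteban"),
    ("ex:Anna", "foaf:name", '"Anna"'),
    ("ex:Esteban", "foaf:knows", "ex:Anna"),
]

def match(pattern, graph):
    results = []
    for triple in graph:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val       # bind variable to this position
            elif pat != val:
                break                    # constant mismatch: reject triple
        else:
            results.append(binding)
    return results

# "Whom does ex:Anna know?"
print(match(("ex:Anna", "foaf:knows", "?who"), graph))
# [{'?who': 'ex:Esteban'}]
```

A real SPARQL engine additionally joins the bindings of several patterns, which is where conjunctions and disjunctions of graph patterns come in.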

2.3 Linked data

The World Wide Web Consortium provides a set of practices for publishing structured data, which is called Linked Data. Relationships between entities from different data sets turn a collection of isolated datasets into the Semantic Web. Linked Data is the large-scale integration of data on the Web; it is, essentially, the heart of the Semantic Web. Internationalized Resource Identifiers (IRIs) are used to identify entities when structuring data. An IRI identifies exactly one entity, and that entity can be anything, which is why IRIs are universal. [53]

Linked data is based on Semantic Web technologies. Structured and linked data becomes beneficial with the usage of semantic queries. If the structure of the data is regular and well-defined, it is easier for tools to reuse the data.

Since the language of websites is HTML, the orientation is towards structuring documents rather than data. The structure of HTML makes the extraction of data complicated.

To face this complication, a variety of microformats have been introduced. The weak point of microformats is that they only provide a small set of attributes; it is often not possible to express relationships between entities. [54]

Web APIs are another way of making structured data available on the web; they provide simple query access over the HTTP protocol. This way is more generic than microformats. However, it requires significant effort to integrate the data into an application.

In Linked Data, the issues related to microformats or Web APIs are resolved by the Resource Description Framework (RDF) language. RDF provides a flexible way to describe things in the world with the bricks of RDF datasets, which are called triples. A triple consists of a subject, a predicate, and an object. This structure gives RDF the flexibility that microformats miss.

2.3.1 URI

URI is an identifier type defined by RFC 3986, created to be a simple and extensible identifier. The specification does not say which resource is behind the identifier and does not provide information on how it can be addressed; these properties follow from the specifications of the protocols. The resource can be an electronic document, a physical object, or a service. What matters most is that a resource can be distinguished from other resources.

Example: http://www.w3.org/albert/bertram/marie-claude

A limitation of this identifier type is the ASCII character set, which allows only characters of the English alphabet to be used. [55]

2.3.2 IRI

IRI (Internationalized Resource Identifier) is defined in RFC 3987. Unlike URI, it allows the usage of Unicode characters, which lets an IRI include, for example, Czech alphabet characters and incorporate words of different natural languages. This makes identifiers easier to create, process, understand, memorize, and so on. The IRI specification is compatible with the older URI specification; it is a complement to URIs. For HTTP, as with URIs, non-ASCII IRI characters are encoded using percent-encoding. IRIs are used in RDF to publish linked open data. [56]

absolute-IRI = scheme ":" ihier-part [ "?" iquery ]

irelative-ref = irelative-part [ "?" iquery ] [ "#" ifragment ]
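How a non-ASCII IRI maps onto a percent-encoded URI can be shown with Python's standard library (the example path is invented):

```python
from urllib.parse import quote, unquote

# An IRI path containing a Czech word; for protocols such as HTTP,
# non-ASCII characters are percent-encoded as UTF-8 byte sequences.
iri_path = "/město/Praha"

uri_path = quote(iri_path)   # "/" is in the default safe set, so slashes survive
print(uri_path)              # /m%C4%9Bsto/Praha
print(unquote(uri_path) == iri_path)  # True: the mapping is reversible
```

The character "ě" (U+011B) becomes the two UTF-8 bytes 0xC4 0x9B, hence the two percent-escapes.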

2.3.3 List of prefixes

A prefix is a standard mechanism for shortening URIs in some RDF serializations, such as Turtle. Prefixes are beneficial for better understanding, for manual creation and modification, and for analysis of RDF data. They are a convention, so prefixes can be chosen freely; however, several common prefixes exist and are used worldwide. The prefixes used in this thesis are listed below.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX owl: <http://www.w3.org/2002/07/owl#>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX dc: <http://purl.org/dc/terms/>

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
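Prefix handling amounts to simple string substitution. A minimal sketch (the `expand` helper and its behavior for unknown prefixes are illustrative choices, not a standard API):

```python
# Expanding prefixed names (CURIEs) into full IRIs, as a Turtle parser would.
PREFIXES = {
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

def expand(curie, prefixes=PREFIXES):
    prefix, _, local = curie.partition(":")
    if prefix in prefixes:
        return prefixes[prefix] + local
    return curie  # no known prefix: return the input unchanged

print(expand("owl:sameAs"))
# http://www.w3.org/2002/07/owl#sameAs
```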

2.4 Linked data principles

To allow a machine or a person to explore the Semantic Web and to make it grow, Semantic Web technologies are guided by four principles, presented in Berners-Lee 2009 [57]:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.

These principles are not strict rules but expected behavior and alignment with standards. Following them makes the data interconnected. Breaking the principles does not destroy the Semantic Web, but it reduces the efficiency of its functionality.

The first principle states that resources must be identified by URIs. Data that does not use URIs is not part of the Semantic Web.

The second principle is based on the fact that HTTP name look-up is a powerful set of standards. Although HTTP URIs are names and not addresses, people tend to invent new URI schemes to have them under separate control. The possibility to look up those names is a convenient way to manage data.

The third principle concerns the information that can be obtained from RDF, RDFS, and OWL ontologies, including the relationships between the properties and classes in an ontology.

The fourth principle helps to connect data into one unbounded web, which enables information to be as complete as possible. The value of the information within a web page is enriched by the subset of what it links to. [57]

2.5 Benefits of linked data

The main benefit of linked data is easy and efficient data integration and browsing through complex data. It removes the walls between different formats and different sources.

Due to that, updating data becomes easier. Links provide knowledge extension beyond existing facts. The advantage of Linked Data is that it is sharable, extensible, and easily re-usable.

That is why the linked data approach significantly surpasses current practices and solutions for creating and delivering library data. [58]

The usage of Web-friendly identifiers (URIs), supported by the Linked Data standards, sustains multilingual functionality for the data. Identifiers allow multiple descriptions referring to the same resource. Since resources can be described by different libraries, organizations, and individuals, anyone can contribute to the expertise on the source.

This complex approach increases the value of the data far beyond each contribution taken individually. Global unique identifiers allow easier citation and easier access to the metadata description. Libraries and memory institutions provide trusted data of long-term cultural importance.

Additionally, linked data aims to eliminate the redundancy of bibliographic descriptions on the web. Clear identifiers and links reduce the number of different names for one entity.

It is not immediately obvious how useful the output of Linked Data is when it is implemented. However, the capabilities for using data improve as the structure of the data becomes richer. Navigation across libraries and searching will become more advanced, and the global information graph built from the datasets will also grow richer.

The benefit to researchers lies in the re-use of library data. The advantage is that linked data technology enhances the Web rather than rebuilding it or creating a new one. By simply copying and pasting URIs, a researcher can make a citation. Citation management integrates library data into research documents. As a result, from the research documents it is possible to find additional data and information just by following links among multiple domain-specific knowledge bases. Moreover, it makes it easier for other researchers to reuse the data or to replicate the experiments.

For organizations, linked data reduces infrastructure costs. It may be the first step toward a cost-effective approach to the complex management of cultural information. For small projects or organizations, access to larger data may increase their presence on the web. Internal data curation and publishing processes can also be improved by the use of linked data.

Linked data creates an open global pool of shared data, which has a direct impact on librarians and archivists. Redundant effort can be reduced, as resources can be used and re-used on a large scale. The identifiers make updates easier, and resources can be kept up to date at lower expense. The focus should shift from re-creating descriptions that others have already created to extending domains of local expertise.

For developers, linked data offers many opportunities to support information technology. Developers no longer have to use custom software tools for library-specific data formats. By leveraging RDF and the well-known standard protocol HTTP, data is placed in a generically understandable format. [58]


3. Full and Partial Identity

To analyze quasi-equivalent concepts, it is important to understand the nature of equality. This work is built on the assumption of a correspondence between identity and equivalence, and respectively between quasi-identity and quasi-equivalence. That is why the beginning of this chapter is dedicated to the concept of identity and its philosophical roots.

3.1 The Concept of Identity

The question of equivalence or quasi-equivalence has its roots deep in the concept of identity. Over the years, "identity" has had multiple interpretations, and there is still no standard definition of the term. Since the concept mainly comes from philosophy, almost every philosopher has tried to get closer to the essence of "identity".

Plato was the first to differentiate the verb "to be" and to treat "is" as an identifier. He separated "is" as a copula from "is" as identity, which helps to distinguish one object from another. [20]

On the other side, Aristotle defines the numeric meaning of identity, which is called the Indiscernibility of Sames:

If x and y are the same in number, then every attribute of the one is an attribute of the other.

A related statement is the Indiscernibility of Identicals, often called "Leibniz's Law" or "one-half of Leibniz's Law". However, Kenneth T. Barnes, in the book "Aristotle on identity and its problems", declares the Indiscernibility of Identicals to be a principle of Aristotle:

If x and y are identical, then every attribute of the one is an attribute of the other.

[21]

The study of Aristotle's views by Nicholas P. White gives his formulation of the principle of the Indiscernibility of Identicals:

If A and B are identical, then whatever is true of the one is true of the other.

[21]

However, while there are some questions regarding the ownership of the Indiscernibility of Identicals, there is no doubt that principles of identity occupied a central place in Leibniz's work and philosophy. Leibniz stated that there cannot be two absolutely similar things, because there cannot be things that have absolutely equal properties. His idea was developed in the concept of substance. [22]

In the article "Identity, Indiscernibility, and Philosophical Claims", a version of Leibniz's principles is presented by the formula below, where a and b denote individuals and F is a variable ranging over the properties of a and b. [24]

∀F(F(a) ↔ F(b)) → a = b   and, conversely,   a = b → ∀F(F(a) ↔ F(b))

[24]

The complication in establishing that a is equal to b based on their properties lies in finding all the properties of both individuals. This is only feasible under the assumption of a limited number of properties.
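Under that assumption of a finite, known set of properties, the Indiscernibility of Identicals can be checked mechanically. The property dictionaries below are toy data invented for illustration:

```python
# Checking Leibniz's principle ∀F(F(a) ↔ F(b)) over a finite property set:
# a and b count as indiscernible iff all listed properties agree.
def indiscernible(a_props, b_props):
    # every attribute of the one is an attribute of the other, and vice versa
    return a_props == b_props

chair_today    = {"legs": 4, "color": "red", "usable": True}
chair_tomorrow = {"legs": 4, "color": "red", "usable": True}
broken_chair   = {"legs": 3, "color": "red", "usable": False}

print(indiscernible(chair_today, chair_tomorrow))  # True
print(indiscernible(chair_today, broken_chair))    # False
```

The philosophical difficulty discussed above is precisely that, for real individuals, no such finite and complete property dictionary exists.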

As an alternative to Leibniz's theory, Kant stated that individuals cannot be specified in terms of a concept of substance. Individual objects, according to Kant, are bound to space and time, contrasting "identity" with "existence". He wanted to attribute identity not through the properties of individuals but a priori, in a fundamental way. Kant viewed individuals as objects in their unity and not as sets of properties. In practical terms, awareness of ourselves as thinking subjects stands for the Kantian sense of identity. As he stated, "The 'I think' must be able to accompany all my representations" [23], §16.

3.2 Identity Problems

This brief survey of the philosophical roots and definitions of identity reveals several identity problems and inconsistencies. In this section, some of these issues are introduced in more detail.

3.2.1 Philosophical Problems

As has already been briefly mentioned, the main problem of personal identity in philosophy relates to the question of what to identify as a single individual. From a philosophical point of view, two points require clarification before answering that question.

The first one is the disagreement between Leibniz's theory and Kant's theory regarding metaphysical presuppositions such as space and time. Is it possible to say that the same chair today and tomorrow is the same chair? Over time some properties of the chair may change. If someone breaks the chair, even its main purpose may change, as it becomes impossible to sit on. Would it be the same chair in that case? There is no standard, unique answer from the philosophical point of view. The same happens with location. A chair outside a house in the rain and a chair in a warm room will also have slightly different properties, being wet and cold in one case and dry and warm in the other.

The second one is context-dependency. Sometimes, to state that two individuals are identical, it is necessary to mention a context. For example, consider a doctor prescribing a painkiller to a patient. In the medical context, two medicines with the same chemical structure and the same effectiveness are identical: they are painkillers. However, in the context of business or the market, this may not be true, as the two medicines have different prices or are produced by different pharmaceutical companies.

3.2.2 Practical Problems

Some practical problems of identity in ontology are driven by the philosophical problems mentioned above. However, some of them stem from the nature of ontology itself. Due to the Open World Assumption and the AAA (Anyone can say Anything about Any topic) principle, the amount of data is continuously growing, including individuals and their properties. That is why the problem of identity in the Web of Data is even more controversial.

Taking into account the Open World Assumption, the lack of an identity statement between two resources does not mean that they cannot be identical, unless an owl:differentFrom statement exists.

From the AAA point of view, most owl:sameAs links have no guarantee of being correct, especially since they are mostly generated by automated algorithms. In the book "Results of the Ontology Alignment Evaluation Initiative 2019" the accuracy ranged between 79% and 92% (SPIMBENCH). [25]

In the article "The sameAs Problem: A Survey on Identity Management in the Web of Data" the authors give an example of an algorithm matching accuracy problem regarding books. It is common to match books based on the similarity of titles and authors. However, two different editions of one book with a different number of pages also share the title and the author without being owl:sameAs. [19]

Besides, the same article reported that different people may have different opinions about the similarity of two objects. The authors present a study in which three experts analyzed 250 owl:sameAs links. The first expert confirmed only 73 owl:sameAs links, the second 132, and the third 181. The deviation is quite high. [19]

The reasons can be multiple. One of them comes from the philosophical problem of context-dependency: if two different persons evaluated an owl:sameAs connection based on different contexts, the results might be completely different. Another reason can relate to differences in the competence of the modelers.

3.5 Contextual Identity

Contextual identity is closely related to the second philosophical problem, context-dependency. As shown by the practical examples, the idea of identity is connected to the context of individuals. The standard OWL semantics is that an "owl:sameAs statement indicates that two URI references refer to the same thing: the individuals have the same 'identity'." [9]

For example, the first data set provides information about a:greg:

a:greg a foaf:Person.

The second data set provides information about a:otherGreg and the equality relation stating that a:greg and a:otherGreg are the same:

a:otherGreg a foaf:Person.

a:otherGreg owl:sameAs a:greg.

However, this example does not consider contexts. "At the moment, the way of encoding contexts on the Web is largely ad hoc, as contexts are often embedded in application programs, or implied by community agreement." [19]

The authors of the work "The sameAs Problem: A Survey on Identity Management in the Web of Data" present an approach for including context dependency in the evaluation of contextual identity. The context is represented by Π, which is the set of all sufficient properties.

a =Π b → (∀π ∈ Π)(π(a) = π(b))
(∀π ∈ Π)(π(a) = π(b)) → a =Π b

[19]

The interpretation of this equality is that two concepts a and b are identical with respect to the context Π. For example, an avocado a is identical to an avocado b with respect to the context diet. However, if the context is market, and avocado a is from Colombia while avocado b is from Mexico, they cannot be considered identical.
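The contextual-identity formula can be sketched directly: identity is checked only over the properties π that belong to the context Π. The avocado property values below are invented for illustration:

```python
# a =_Π b  iff  π(a) == π(b) for every property π in the context Π.
def identical_in_context(a, b, context):
    return all(a.get(p) == b.get(p) for p in context)

avocado_a = {"calories": 160, "fat": 15, "origin": "Colombia", "price": 1.2}
avocado_b = {"calories": 160, "fat": 15, "origin": "Mexico",   "price": 0.9}

diet_context = {"calories", "fat"}
market_context = {"origin", "price"}

print(identical_in_context(avocado_a, avocado_b, diet_context))    # True
print(identical_in_context(avocado_a, avocado_b, market_context))  # False
```

The same pair of individuals is thus identical in one context and distinct in another, which is exactly why a single global owl:sameAs is often too strong.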

Another article, "Detection of Contextual Identity Links in a Knowledge Base" by Joe Raad, Nathalie Pernelle, and Fatiha Saïs, introduced the definition of a global context. "A global context is represented as a connected sub-ontology of the ontology O that is composed of a set of classes and properties of O, and a set of axioms which is limited to constraints on property domains and ranges." [26] The identity evaluation is based on the notion of graph isomorphism of instance descriptions. The global context should be automatically detected to judge equality or inequality.


4. Ontology Matching

This chapter of the thesis focuses on the analysis of existing ontologies and describes concrete cases of quasi-equivalent concepts and the related merging decision problems.

4.1 Matching techniques

Matching techniques can be split into internal or content-based matching and external or context-based matching. This separation is derived from the origin of the information on which matching is based. Information can come from inside, i.e. from the content of the ontology, or from external resources, called context. External resources can be formal or informal. Content-based techniques can be terminological, structural, extensional, or semantic. Context-based techniques can be semantic or syntactic.

Figure 2. Matching techniques [4]

4.1.1 String-based techniques

As it is clear from the name of the technique, the focus is on the structure of the string.

It compares the names and name descriptions of classes, which in some cases is very valuable. The string-based method considers a string as a sequence of letters in an alphabet. The most similar strings are likely to be matched. The distance between two strings is usually represented by a number, where a smaller value indicates a greater similarity.

There are multiple ways to evaluate a string: as an exact sequence of letters, an erroneous sequence of letters, a set of letters, or a set of words. String-based methods additionally include techniques that can help to improve comparison results: normalization, suppression of diacritics and digits, and link stripping.
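A rough string-similarity measure is available in Python's standard library; difflib's sequence-matching ratio is one of many possible choices, and the labels below are invented:

```python
from difflib import SequenceMatcher

# Normalize, then compare: a common string-based matching pipeline.
def similarity(label_a, label_b):
    a = label_a.lower().replace("_", " ").strip()   # simple normalization
    b = label_b.lower().replace("_", " ").strip()
    return SequenceMatcher(None, a, b).ratio()      # 1.0 = identical

print(similarity("Author", "author"))              # 1.0 after normalization
print(similarity("has_author", "hasAuthor") > 0.8) # True: near match survives styling
print(similarity("Author", "Writer"))              # a much lower score
```

Normalization (lowercasing, separator handling, diacritics removal) is what lets superficially different labels for the same concept score highly.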

4.1.2 Language-based techniques

Language-based techniques usually run before the string comparison to improve the outcome. They analyze the name and the name description of a class as text in some natural language, using natural language processing techniques that exploit the morphological properties of words. Resources such as lexicons or domain-specific thesauri allow the use of linguistic relations to match words.

To extract the meaning of terms from text, language-based techniques rely on Natural Language Processing. Linguistic resources such as stemmers, part-of-speech taggers, lexicons, and thesauri allow the interpretation of terms and belong to the invaluable set of linguistic resources that provide accurate matching.

4.1.3 Constraint-Based Techniques

In addition to comparing names or replacing them, the internal constraints can be compared. The algorithms are applied to the definitions of entities, such as types, cardinality (or multiplicity) of attributes, and keys. This comparison can include the comparison of the internal structure of the entities or the comparison of the entity with other related entities. It determines the similarity of schema elements based on the equivalence of element constraints, such as data types and domains, key characteristics, etc.

4.1.4 Informal resource-based techniques

Informal resources, such as pictures, can be tied to ontologies. Based on how ontological entities are related to the informal resources, relations between the entities can be deduced. Classes can be considered equivalent if the same set of pictures annotates both classes. Informal resource-based techniques can find regularities and discrepancies between related entities.

4.1.5 Formal resource-based techniques

Formal resource-based techniques rely on external ontologies. The decision whether to match or not comes from one or several external ontologies. Resources such as domain-specific ontologies, upper-level ontologies, and linked data are used by several context-based matchers. This is done to find common ground on which comparison can happen.

However, when an additional context is added, it is a matter of balance. The context is a piece of new information that can be useful and lead to correct results. At the same time, this information can also generate misleading correspondences. This is the main difficulty of the approach.

4.1.6 Graph-based techniques

Graph-based techniques are a type of graph algorithm. They consider database schemas and taxonomies as labelled graphs. The positions of nodes within the graph are analyzed to compare their similarity. If two nodes are similar, the nodes near them are likely to be somehow similar as well.

4.1.7 Taxonomy-based techniques

Taxonomy-based techniques are another type of graph algorithm. They consider only the specialization relation (subClassOf). The idea behind this is that when terms are already similar, being interpreted as subsets or supersets of each other, the probability that their neighbors are also similar is high.

4.1.8 Instance-based techniques

Instance-based techniques compare the sets of instances of two classes to decide whether to match them. These techniques can help in grouping entities together or computing distances between them. This can be simple theoretical reasoning, but it can also be algorithms able to learn how to sort instances and provide correct alignments.

If the same set of individuals is shared by two classes, it is highly likely that these classes are equivalent. Even if they do not share exactly the same set of individuals, there is a way to calculate the distance between them.
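The shared-instances idea can be quantified, for example with the Jaccard coefficient over the two instance sets. The class extensions below are invented for illustration:

```python
# Jaccard similarity between the instance sets (extensions) of two classes:
# |A ∩ B| / |A ∪ B|; 1.0 means the classes share exactly the same individuals.
def jaccard(instances_a, instances_b):
    if not instances_a and not instances_b:
        return 1.0  # two empty extensions are trivially indistinguishable
    inter = instances_a & instances_b
    union = instances_a | instances_b
    return len(inter) / len(union)

film_instances  = {"ex:Vertigo", "ex:Psycho", "ex:RearWindow"}
movie_instances = {"ex:Vertigo", "ex:Psycho", "ex:RearWindow", "ex:Rope"}

print(jaccard(film_instances, movie_instances))  # 0.75: a strong equivalence candidate
```

A matcher would then apply a threshold to this score to propose an equivalence or a subsumption link.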

4.1.9 Model-based techniques

Model-based techniques lean on semantic interpretation. The deductive methods for the algorithms are based on propositional satisfiability and description logics reasoning techniques.

If two entities share the same interpretations, the entities are the same. Conversely, equal entities have equal interpretations. Model-theoretic semantics is used to justify the results.


4.2 Weak-Identity and Similarity Predicates

The importance of the predicate owl:sameAs is very high. However, different studies show that owl:sameAs has some disadvantages and problems [19], [25], [31]. Even though owl:sameAs has become an essential ingredient, alternative predicates exist.

Different vocabularies and approaches have tried to address the problems that arise from using owl:sameAs. This section presents some of the alternative approaches, consolidated in Table 1. The descriptions are taken directly from the sources to avoid any kind of misalignment or confusion.

Table 1. Identity, Weak-Identity and Similarity Predicates

| Source | Vocabulary | Description | Property |
|---|---|---|---|
| Rdfs [32] | rdfs:seeAlso | States that the resource O may provide additional information about S | Does not express similarity |
| Wdt [33] | wdt:P2888 | (URLs only) Used to link two items, indicating a high degree of confidence that the concepts can be used interchangeably | Transitive, Reflexive, Symmetric |
| SKOS [30] | skos:relatedMatch | To state an associative mapping link between two concepts | Does not express similarity |
| | skos:closeMatch | To link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications | Reflexive, Symmetric |
| | skos:exactMatch | To link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications | Transitive, Reflexive, Symmetric |
| | skos:broadMatch, skos:narrowMatch | To state a hierarchical mapping link between two concepts | Does not express similarity |
| Vocab [34] | vocab:similarTo | Having two things that are not owl:sameAs but are similar to a certain extent | Reflexive, Symmetric |
| Lvont [28] | lvont:strictlySameAs | The predicate is formally declared equivalent to owl:sameAs, so applications can still interpret these links as regular sameAs links. However, whenever they see that lvont:strictlySameAs was used, they can know that the link is intended in the strict sense | Transitive, Reflexive, Symmetric |
| | lvont:nearlySameAs | To explicitly represent near-identity. These two predicates are explicitly left somewhat vague, simply because similarity is a very vague notion | Not defined |
| | lvont:somewhatSameAs | (see lvont:nearlySameAs above) | Not defined |
| Schema.org [35] | schema:sameAs | URL of a reference Web page that unambiguously indicates the item's identity, e.g. the URL of the item's Wikipedia page, Wikidata entry, or official website | Does not express similarity |
| Similarity Ontology [31] | so:identical | Two URIs refer to the same thing and so share all the properties, but the reference is opaque | Transitive, Reflexive, Symmetric |
| | so:claimsIdentical | A way for one agent to claim two URIs are identical, without the inverse needing to be true | Transitive, Reflexive |
| | so:matches | Two URIs refer to possibly distinct things that share all the properties needed to substitute for each other in some graphs | Transitive, Symmetric |
| | so:claimsMatches | A way for one agent to claim two URIs are matching, without the inverse needing to be true | Transitive |
| | so:similar | Two URIs refer to possibly different things that share some properties but not enough to substitute for each other | Reflexive, Symmetric |
| | so:claimsSimilar | A way for one agent to claim similarity without reciprocation | Reflexive |
| | so:related | Two URIs refer to possibly distinct things and share no properties necessarily but are associated somehow. As it is only symmetric, there are no claims to any sort of similarity, matching, or identity | Does not express similarity |
| | so:claimsRelated | Might provide additional information about the subject resource | Does not express similarity |
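The practical consequence of the Property column can be illustrated with a small sketch. The following plain-Python example (the resource names and the stated links are hypothetical, and reflexivity is omitted since it only adds trivial self-links) computes which links are entailed once a predicate's declared algebraic properties are applied. It shows why a chain of skos:exactMatch links (transitive and symmetric) may be followed end to end, while the same chain of skos:closeMatch links (symmetric only) must not be collapsed:

```python
def closure(pairs, symmetric=False, transitive=False):
    """Return the set of links entailed by the declared algebraic properties."""
    links = set(pairs)
    changed = True
    while changed:
        changed = False
        if symmetric:
            # b -> a follows from a -> b
            for a, b in list(links):
                if (b, a) not in links:
                    links.add((b, a))
                    changed = True
        if transitive:
            # a -> d follows from a -> b and b -> d
            for a, b in list(links):
                for c, d in list(links):
                    if b == c and (a, d) not in links:
                        links.add((a, d))
                        changed = True
    return links

# Hypothetical stated links between three descriptions of the same city
stated = {("ex:Paris", "dbr:Paris"), ("dbr:Paris", "wd:Q90")}

# Under skos:exactMatch semantics (transitive, symmetric) the chain closes:
exact = closure(stated, symmetric=True, transitive=True)
print(("ex:Paris", "wd:Q90") in exact)   # True

# Under skos:closeMatch semantics (symmetric only) it must not:
close = closure(stated, symmetric=True, transitive=False)
print(("ex:Paris", "wd:Q90") in close)   # False
```

The same sketch also explains the so:claims* predicates: calling closure with symmetric=False models a one-directional claim by a single agent, where the inverse link is not entailed.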
