
4. Ontology Matching

4.1 Matching techniques

Matching techniques can be split into internal (content-based) matching and external (context-based) matching. The split reflects the origin of the information on which matching is based: it can come from inside the ontology, that is, from its content, or from external resources, called the context. External resources can be formal or informal.

Content-based techniques can be terminological, structural, extensional, or semantic. Context-based techniques can be semantic or syntactic.

Figure 2. Matching techniques [4]

4.1.1 String-based techniques

As the name suggests, string-based techniques focus on the structure of the string.

They compare the names and name descriptions of classes, which in some cases is very valuable. A string-based method treats a string as a sequence of letters of an alphabet. The most similar strings are likely to be matched. The distance between two strings is usually represented by a number, where a smaller value indicates greater similarity.

There are multiple ways to view a string: as an exact sequence of letters, an erroneous sequence of letters, a set of letters, or a set of words. String-based methods also include techniques that help improve the results of comparisons: normalization, suppression of diacritics and digits, and link stripping.
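The idea of a string distance combined with normalization can be illustrated with a minimal sketch. It assumes the Levenshtein edit distance as the measure for "an erroneous sequence of letters"; the source does not prescribe a particular measure, and the function names `normalize`, `edit_distance`, and `string_distance` are hypothetical.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lowercase a label, strip diacritics and digits, collapse separators."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Every non-letter (digits, underscores, punctuation) becomes a space.
    kept = "".join(ch if ch.isalpha() else " " for ch in no_marks.lower())
    return " ".join(kept.split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of single-letter edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_distance(a: str, b: str) -> float:
    """Normalized distance in [0, 1]; a smaller value means more similar."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb))
    return edit_distance(na, nb) / longest if longest else 0.0
```

Dividing by the length of the longer string is one possible design choice: it keeps the distance comparable across labels of different lengths.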

4.1.2 Language-based techniques

Language-based techniques usually run before the string comparison to improve its outcome. They treat the name and the name description of a class as text in a natural language and apply natural language processing (NLP) techniques that exploit the morphological properties of words. Resources such as lexicons or domain-specific thesauri make it possible to use linguistic relations to match words.

To extract the meaning of terms from text, language-based techniques rely on linguistic resources such as stemmers, part-of-speech taggers, lexicons, and thesauri. These resources allow terms to be interpreted and are invaluable for accurate matching.
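As a rough illustration of this preprocessing, the following sketch tokenizes a label, removes stop words, applies a deliberately naive plural stemmer, and maps words to a canonical synonym. The stop-word list, the `SYNONYMS` table, and all function names are hypothetical stand-ins; a real matcher would use a proper stemmer and a lexicon or thesaurus instead.

```python
import re

# Hypothetical toy resources standing in for a real lexicon or thesaurus.
STOPWORDS = {"of", "the", "a", "an", "has", "is"}
SYNONYMS = {"writer": "author", "paper": "article"}


def tokenize(label: str) -> list[str]:
    """Split a label on camelCase boundaries and non-letter separators."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def naive_stem(word: str) -> str:
    """Strip a plural 's'. A crude stand-in for a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def canonical_terms(label: str) -> set[str]:
    """Return the set of canonical terms that a label reduces to."""
    terms = set()
    for token in tokenize(label):
        if token in STOPWORDS:
            continue
        stem = naive_stem(token)
        terms.add(SYNONYMS.get(stem, stem))
    return terms
```

Under these assumptions the labels `hasWriter` and `Authors` both reduce to the single term `author`, so a subsequent string comparison would treat them as identical.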

4.1.3 Constraint-based techniques

In addition to comparing names or replacing them, the internal constraints of entities can be compared. The algorithms are applied to the definitions of entities, such as the types, cardinality (or multiplicity) of attributes, and keys. The comparison can cover the internal structure of an entity or its relations to other entities. The similarity of schema elements is then determined from the equivalence of their constraints, such as data types and domains, key characteristics, etc.
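A minimal sketch of such a comparison is given below. It assumes that each attribute is described by four constraints (data type, minimum and maximum cardinality, and whether it is a key) and that all four are weighted equally; both the `AttributeConstraints` structure and the scoring are illustrative assumptions rather than a method taken from the literature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeConstraints:
    """Hypothetical description of the internal constraints of an attribute."""
    datatype: str
    min_cardinality: int
    max_cardinality: Optional[int]  # None means unbounded
    is_key: bool


def constraint_similarity(a: AttributeConstraints,
                          b: AttributeConstraints) -> float:
    """Fraction of constraints on which the two attributes agree."""
    checks = [
        a.datatype == b.datatype,
        a.min_cardinality == b.min_cardinality,
        a.max_cardinality == b.max_cardinality,
        a.is_key == b.is_key,
    ]
    return sum(checks) / len(checks)
```

For example, two key attributes of type string with cardinality exactly one agree on all four constraints, whereas a non-key string attribute with unbounded cardinality agrees with them on only two.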

4.1.4 Informal resource-based techniques

Informal resources, for example pictures, can be tied to ontologies. Relations between ontological entities can be deduced from the way these entities are related to the informal resources.

Two classes can be considered equivalent if both are annotated by the same set of pictures. Informal resource-based techniques can find regularities and discrepancies between related entities.

4.1.5 Formal resource-based techniques

Formal resource-based techniques rely on external ontologies. The decision whether or not to match comes from one or several external ontologies. Resources such as domain-specific ontologies, upper-level ontologies, and linked data are used by several context-based matchers. Their purpose is to provide common ground on which the comparison can take place.

However, adding context is a matter of balance. The context is new information that can be useful and lead to correct results. At the same time, it can also generate misleading correspondences. This is the main difficulty of the approach.

4.1.6 Graph-based techniques

Graph-based techniques are graph algorithms that treat database schemas and taxonomies as labelled graphs. The positions of nodes within the graph are analyzed to compare their similarity. The underlying intuition is that if two nodes are similar, the nodes near them should also be similar to some degree.
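The neighbourhood intuition can be sketched as follows. The example assumes that each graph is given as a mapping from a node to the set of its neighbours and that some base similarity between node labels is already available; the function `neighbour_similarity` and the exact-match base measure are hypothetical simplifications, not a specific published algorithm.

```python
from typing import Callable


def label_equal(x: str, y: str) -> float:
    """Hypothetical base similarity: 1.0 for equal labels, ignoring case."""
    return 1.0 if x.lower() == y.lower() else 0.0


def neighbour_similarity(node_a: str, node_b: str,
                         graph_a: dict[str, set[str]],
                         graph_b: dict[str, set[str]],
                         base_sim: Callable[[str, str], float]) -> float:
    """Average, over the neighbours of node_a, of the best base similarity
    to any neighbour of node_b. Note that the score is not symmetric."""
    neighbours_a = graph_a.get(node_a, set())
    neighbours_b = graph_b.get(node_b, set())
    if not neighbours_a or not neighbours_b:
        return 0.0
    best = [max(base_sim(x, y) for y in neighbours_b) for x in neighbours_a]
    return sum(best) / len(best)
```

A full graph matcher would typically iterate such a step and feed the resulting scores back in as the new base similarity; only a single step is shown here.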

4.1.7 Taxonomy-based techniques

Taxonomy-based techniques are another type of graph algorithm. They consider only the specialization relation (subClassOf). The idea is that terms connected by this relation are already similar, since one is interpreted as a subset or superset of the other, so the probability that their neighbours are also similar is high.
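One way to use the specialization relation is to check how many superclasses of one class are already aligned with superclasses of the other. The sketch below assumes single inheritance, so each taxonomy is a mapping from a class to its one direct superclass, and a given set of already accepted correspondences; the names `ancestors` and `taxonomy_support` are hypothetical.

```python
def ancestors(cls: str, parent_of: dict[str, str]) -> set[str]:
    """All superclasses reachable by following subClassOf links upwards."""
    seen: set[str] = set()
    while cls in parent_of:
        cls = parent_of[cls]
        if cls in seen:  # guard against accidental cycles
            break
        seen.add(cls)
    return seen


def taxonomy_support(a: str, b: str,
                     parents_a: dict[str, str],
                     parents_b: dict[str, str],
                     aligned: set[tuple[str, str]]) -> float:
    """Fraction of the superclasses of a that are aligned with some
    superclass of b."""
    anc_a = ancestors(a, parents_a)
    anc_b = ancestors(b, parents_b)
    if not anc_a or not anc_b:
        return 0.0
    matched = sum(1 for x in anc_a if any((x, y) in aligned for y in anc_b))
    return matched / len(anc_a)
```

A high score suggests that the two classes sit under corresponding parts of the two taxonomies, which supports, but does not prove, a correspondence between them.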

4.1.8 Instance-based techniques

Instance-based techniques compare the sets of instances of classes to decide whether or not the classes match. These techniques can help to group entities together or to compute distances between them. They range from simple set-theoretic reasoning to algorithms that are able to learn how to sort instances and provide correct alignments.

If two classes share the same set of individuals, it is highly likely that the classes are equivalent. Even if they do not share exactly the same set of individuals, a distance between them can still be calculated.
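A common way to compute such a distance is the Jaccard distance between the two instance sets. The sketch below uses it as one possible choice; the source does not name a specific measure, and the function names are illustrative.

```python
def jaccard_similarity(instances_a: set[str], instances_b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = instances_a | instances_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(instances_a & instances_b) / len(union)


def jaccard_distance(instances_a: set[str], instances_b: set[str]) -> float:
    """0.0 for identical instance sets, 1.0 for disjoint ones."""
    return 1.0 - jaccard_similarity(instances_a, instances_b)
```

Two classes with identical instance sets have distance 0, while classes with no shared individuals have distance 1.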

4.1.9 Model-based techniques

Model-based techniques rely on semantic interpretation. Their deductive methods are based on propositional satisfiability and description logic reasoning techniques.

If two entities share the same interpretations, the entities are the same. The converse also holds: equal entities have equal interpretations. Model-theoretic semantics is used to justify the results.
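For the propositional case, "sharing the same interpretations" can be checked by enumerating all truth assignments. The sketch below assumes that each entity definition has been translated into a propositional formula, represented here as a Python function over an assignment; this brute-force enumeration is only an illustration, since practical matchers would delegate the check to a SAT solver or a description logic reasoner.

```python
from itertools import product
from typing import Callable

Formula = Callable[[dict[str, bool]], bool]


def equivalent(f: Formula, g: Formula, variables: list[str]) -> bool:
    """True if f and g have the same truth value under every assignment."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if f(assignment) != g(assignment):
            return False  # a distinguishing interpretation exists
    return True


# Hypothetical definitions: "p and not q" versus "not (not p or q)".
def definition_a(v: dict[str, bool]) -> bool:
    return v["p"] and not v["q"]


def definition_b(v: dict[str, bool]) -> bool:
    return not (not v["p"] or v["q"])
```

The two example definitions are equivalent by De Morgan's law, so `equivalent(definition_a, definition_b, ["p", "q"])` holds, whereas comparing `definition_a` with a formula that only requires `p` fails on the assignment where both `p` and `q` are true.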
