Could you please correct this text, which I have translated into English:
Abstract:
Ontology matching has become a crucial operation in many application domains, such as the Semantic Web. Several descriptions of the matching process have been proposed in the literature, but they often differ in granularity and level of detail. The objective of this paper is to propose a detailed description of the matching process that takes into account all the components that can make up a matching system.
Keywords: matching process, similarity measures, semantic correspondence, ontologies, Semantic Web.
1. Introduction
On the Semantic Web, data is envisioned to be annotated using ontologies. A large number of ontologies have now been developed, across various research domains and even within the same domain. Interoperability among different ontologies is therefore essential to benefit from the power of the Semantic Web, and ontology matching has become a core issue.
Ontology matching is generally defined as the process that takes ontologies as input, computes the similarity of their entities, and returns an alignment identifying the entities whose semantics are identical or close. Several descriptions of the ontology matching process have been proposed in the literature; they often vary in granularity and level of detail.
The objective of this paper is to propose an ontology matching process that takes into account the majority of the elements found in matching systems.
The paper is organized as follows. Section 2 reviews existing work on the ontology matching process. Section 3 details the proposed matching process. Section 4 illustrates the application of the proposed process to some existing matching systems in order to check its coverage. Finally, Section 5 concludes the paper.
2. Related work
Several works address the matching process (Euzenat & Shvaiko, 2007; Ehrig, 2007; Castano, Ferrara, Hess, & Montanelli, 2007). We carried out a comparative study of these works. We noted that the process proposed in (Euzenat & Shvaiko, 2007) is presented as a black box, with no specification of the stages to be followed. On the other hand, the processes presented in (Castano, Ferrara, Hess, & Montanelli, 2007) and (Ehrig, 2007) are at a comparable level of detail. The stages defining these two processes are well explained in some places and less so in others.
In the process described in (Castano, Ferrara, Hess, & Montanelli, 2007), the stage that selects the entities to which the similarity measures, also called basic techniques, will be applied is not indicated. Moreover, this process anticipates the consistency-checking stage by considering it specific to the approaches that provide semantic relations. However, there are tools, such as Prompt, that perform consistency checking after having produced similarity values between the compared entities.
In the process defined in (Ehrig, 2007), no distinction is made between the approaches that use logical reasoning and those that operate on similarity values.
Also, it is never made clear at which stage human intervention can take place. This remark is very important since, at present, the majority of matching approaches are semi-automatic. Human involvement in the various stages of the process is thus essential.
Therefore, in the following section we propose an improvement of these processes that takes into account the advantages and drawbacks of each of them. As a result, as many as possible of the components that can make up a matching system are considered.
3. The proposed process
The process that we propose takes into account the involvement of users, which is a factor in the design of the majority of current matching systems. This involvement can take the form of providing inputs, composing basic techniques (Euzenat & Shvaiko, 2007), specifying the suitable aggregation method and its parameters (the weights) from a set of proposed aggregation methods, or interpreting the correspondences in the case of matching systems that adopt manual or semi-automatic extraction methods.
The steps followed by the majority of matching systems are shown schematically in Fig. 1.
NOTE: The elements marked with * in Fig. 1 are optional.
3.1. The first phase
This is the acquisition phase, in which the inputs are provided and the ontologies are represented in an internal model.
3.1.1. Inputs
The inputs are two (or more) ontologies, O1 and O2; the parameters of the system, such as the threshold used to filter correspondences; the initial alignment, which helps the system in its processing; and the external resources used as support for the similarity computation.
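As an illustration only, the four kinds of input listed above could be grouped in a single structure. The class name, field names and default values below are assumptions made for this sketch; they do not come from any particular matching system.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingInputs:
    """Hypothetical container for the inputs of a matching system."""
    # Two ontologies (or more), e.g. [O1, O2]; their concrete type is left open here.
    ontologies: list
    # System parameter: threshold for filtering correspondences (0.5 is an arbitrary placeholder).
    threshold: float = 0.5
    # Initial alignment: maps a pair of entity names to a similarity value.
    initial_alignment: dict = field(default_factory=dict)
    # External resources supporting the similarity computation (e.g. a thesaurus).
    external_resources: list = field(default_factory=list)


inputs = MatchingInputs(ontologies=["O1", "O2"])
```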
3.1.2. Representation of ontologies in an internal model
The internal model can take various forms depending on the matching approach used. For example, in order to avoid the conflicts caused by syntactic heterogeneity, the ontologies are written in a common ontology language (Eklöf & Martenson, 2006).
3.2. The second phase
This is the phase of ontology analysis and execution of the basic techniques.
3.2.1. Extraction of the candidate entity pairs
The most common methods for making this choice fall into two categories (Ehrig, 2007): pairing all the entities of the first ontology with all those of the other ontology, or pairing only the entities that have the same type (concepts, relations, instances).
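The two selection strategies can be sketched as follows. This is a minimal illustration in which an ontology is reduced to a flat list of entities; the Entity type and the function names are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class Entity(NamedTuple):
    name: str
    kind: str  # "concept", "relation" or "instance"


def all_pairs(o1, o2):
    """Strategy 1: pair every entity of o1 with every entity of o2."""
    return list(product(o1, o2))


def same_type_pairs(o1, o2):
    """Strategy 2: keep only the pairs whose two entities have the same type."""
    return [(a, b) for a, b in product(o1, o2) if a.kind == b.kind]


o1 = [Entity("Car", "concept"), Entity("hasOwner", "relation")]
o2 = [
    Entity("Automobile", "concept"),
    Entity("ownedBy", "relation"),
    Entity("myCar", "instance"),
]

candidates_all = all_pairs(o1, o2)         # 2 x 3 = 6 candidate pairs
candidates_typed = same_type_pairs(o1, o2)  # concept-concept and relation-relation only
```

The second strategy reduces the number of comparisons, at the cost of never proposing a correspondence between entities of different types.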
3.2.2. Feature engineering
Ontology matching algorithms extract a specific subset of the features (characteristics) of the ontologies in order to apply a similarity computation to them. These features are extracted in accordance with the levels of semantic complexity defined in (Castano, Ferrara, Hess, & Montanelli, 2007).
3.2.3. Composition and execution of the basic techniques
The fundamental task of every matching system is to find the relations between the entities expressed in different ontologies by computing their degree of similarity. This is carried out by the basic techniques, each of which is applied to a particular characteristic of the entities, such as the name, the attributes or the relations. Thus, the characteristics of one entity are compared with the corresponding characteristics of another entity (Bach, 2006).
In order to cover all the characteristics of the ontologies, the basic techniques must be composed. This composition can be either sequential or parallel (Euzenat & Shvaiko, 2007). The results produced by executing the basic techniques are either individual similarity values, such as the results returned by syntactic techniques, or semantic relations, such as the results returned by the techniques applied to complex concepts (Castano, Ferrara, Hess, & Montanelli, 2007; Oulefki & Akli-Astouati, 2008).
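The difference between the two kinds of composition can be sketched with two simple syntactic techniques applied to entity names. This is a simplified reading, not the definition given in the cited work: in the parallel case each technique runs independently on every pair, while in the sequential case each technique only processes the pairs retained by the previous one. The function names and the cutoff value are assumptions made for the example.

```python
from difflib import SequenceMatcher


def edit_similarity(a: str, b: str) -> float:
    """Syntactic technique 1: similarity ratio of the two names (standard library difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trigram_similarity(a: str, b: str) -> float:
    """Syntactic technique 2: Jaccard overlap of the character trigrams of the two names."""
    def grams(s):
        return {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a.lower()), grams(b.lower())
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def compose_parallel(techniques, pairs):
    """Run every technique on every pair; return one list of values per pair."""
    return {pair: [t(*pair) for t in techniques] for pair in pairs}


def compose_sequential(techniques, pairs, cutoff=0.3):
    """Each technique only sees the pairs kept (value >= cutoff) by the previous one."""
    remaining = list(pairs)
    scores = {}
    for technique in techniques:
        scores = {pair: technique(*pair) for pair in remaining}
        remaining = [pair for pair, value in scores.items() if value >= cutoff]
    return {pair: scores[pair] for pair in remaining}


pairs = [("Car", "Cars"), ("Car", "Person")]
techniques = [edit_similarity, trigram_similarity]
parallel = compose_parallel(techniques, pairs)
sequential = compose_sequential(techniques, pairs)
```

In the parallel case the values of each pair still have to be combined afterwards, whereas the sequential case progressively narrows the set of candidate pairs.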
3.3. The third phase
This phase corresponds to the analysis of the results produced by the basic matching techniques. Following this analysis, two types of approaches can be distinguished: approaches producing similarity values and approaches producing semantic relations.
3.3.1. Approaches producing similarity values
a. Similarity aggregation. The various similarity values must be aggregated in order to provide a single representative similarity value for each pair of compared entities.
b. Interpretation. The basic techniques provide a large set of correspondences from which the alignment must be extracted (Euzenat & Shvaiko, 2007). The extraction methods are manual, semi-automatic or automatic (Euzenat & Shvaiko, 2007). The most common extraction mechanism is threshold-based filtering, which selects the correspondences whose similarity values are higher than the threshold (Euzenat & Shvaiko, 2007). Applying a threshold thus ensures that the extracted alignment is of sufficient quality.
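Steps a and b can be sketched together. The weighted average used here is only one possible aggregation method, chosen for illustration; the function names, the weights and the threshold value are assumptions, not values taken from a specific system.

```python
def aggregate(values, weights):
    """Weighted average of the similarity values computed for one entity pair."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per value and a non-zero total weight")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def extract_alignment(similarities, threshold):
    """Keep the correspondences whose similarity is strictly higher than the threshold."""
    return {pair: s for pair, s in similarities.items() if s > threshold}


# Illustrative values only: two basic techniques per pair, the second weighted three times more.
raw = {
    ("Car", "Automobile"): [0.2, 0.9],
    ("Car", "Person"): [0.2, 0.1],
}
weights = [1.0, 3.0]

aggregated = {pair: aggregate(values, weights) for pair, values in raw.items()}
alignment = extract_alignment(aggregated, threshold=0.5)
```

With these illustrative values only the pair (Car, Automobile) remains in the alignment. In a semi-automatic system, the user would typically choose the weights and the threshold, or review the retained correspondences.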