Evaluation of the MSO

Dr Philippe Martin (phmartin @ gu.edu.au) - March 1st, 2004

1. Maturity: (How ready is it to use now? What capabilities have already been demonstrated? Time and resources needed to start using? Potential for improvement.)

The MSO is the Multi Source Ontology that the WebKB-2 knowledge server permits Web users to (i) browse, search, filter and display in various formats, (ii) update (protocols for the cooperative editing of the knowledge base are enforced), and (iii) re-use in new private knowledge bases (KBs). The MSO is the part of the public KB of WebKB-2 that includes all the categories (types or individuals) and their inter-relations or definitions, but not the other statements (facts/data, e.g. the representations of Qantas flights within, from or to Australia).

Currently, the only input format is the FS (For Structuration) language, which includes the FT (For Taxonomies) language, the FCG (For Conceptual Graphs) language, and the FC (For Control) language. Updates may be made via the interfaces but, for various reasons, this is discouraged: knowledge modules should rather be stored in Web/intranet accessible files, progressively tested against the public KB of WebKB-2 and refined, and finally, when considered stable, committed to the knowledge base of WebKB-2. WebKB-2 provides several examples of such files, in which the knowledge representations may be mixed with their documentation.

Like any knowledge server (or most software), WebKB-2 has a huge potential for improvement: more input/output formats, more inferential capabilities, etc. Compared to LOOM, WebKB-2 has similar inferential capabilities for constraint checking, far fewer for automatic classification from definitions, and more for graph matching (since contexts, sets and numerical quantifiers are taken into account; there are three increasingly powerful/general graph matching procedures, the most constrained one being a test for logical implication between simple statements using only existential quantifiers).
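To make the most constrained matching procedure more concrete, here is a minimal Python sketch of one common way to test implication between simple, existentially quantified statements: searching for a projection (a type-respecting homomorphism) from the implied graph into the implying one. This is an illustration under my own assumptions, not the procedure implemented in WebKB-2; the type names, the PARENT table and the function names are hypothetical, relation types are compared by equality only, and the search is brute force.

```python
from itertools import product

# Hypothetical subtype links (child -> parent); not taken from the MSO.
PARENT = {"cat": "animal", "mat": "artifact", "animal": "thing", "artifact": "thing"}


def is_subtype(t, ancestor):
    """True if type t equals ancestor or specializes it via PARENT links."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False


def implies(fact, query):
    """Test whether `fact` implies `query` by looking for a projection.

    Each graph is a pair (nodes, edges): `nodes` maps a node id to its
    concept type, `edges` is a set of (source id, relation, target id).
    Every node is read as existentially quantified.
    """
    f_nodes, f_edges = fact
    q_nodes, q_edges = query
    q_ids = list(q_nodes)
    # Try every assignment of query nodes to fact nodes (exponential, sketch only).
    for image in product(f_nodes, repeat=len(q_ids)):
        m = dict(zip(q_ids, image))
        types_ok = all(is_subtype(f_nodes[m[q]], q_nodes[q]) for q in q_ids)
        edges_ok = all((m[s], r, m[d]) in f_edges for s, r, d in q_edges)
        if types_ok and edges_ok:
            return True
    return False


# "A cat is on a mat" and "an animal is on a thing".
FACT = ({"c": "cat", "m": "mat"}, {("c", "on", "m")})
QUERY = ({"x": "animal", "y": "thing"}, {("x", "on", "y")})
```

With these definitions, implies(FACT, QUERY) holds while implies(QUERY, FACT) does not, since the more specific statement implies the more general one but not the reverse.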

The MSO itself is only dependent on WebKB-2 in the sense that WebKB-2 is currently used to store, check and permit updates to the MSO. Like any other general ontology, the MSO has a huge potential for improvement: integration of more top-level ontologies, further correction and structuring of the WordNet component, more partial/complete definitions and schemas associated with the categories, etc.

However, the current content of the MSO already shows that its top level and approach are well suited to integrating and organizing categories from different kinds of ontologies: theory or logic oriented ontologies like DOLCE and the SUMO, general ontologies like those of John Sowa, and lexically oriented ontologies like WordNet. The top level of the MSO now includes many important and complementary distinctions and categories coming from various top-level ontologies, interlinked by various relations (specialization, equivalence, exclusion, ...). Its relation types are structured into a specialization hierarchy, mainly according to their source or destination arguments (this turns out to be a scalable and very helpful way of organizing relation/function types). A relation type signature may specify a variable number of arguments, and this helped the construction of the relation type specialization hierarchy.

During the integration of the SUMO (in progress) and DOLCE, some complex KIF axioms/definitions associated with the categories have been left out. Although they could have been translated into FCG (only manually, at present), this would have been very time-consuming, these axioms would not be exploitable by WebKB-2 for logical inferencing and, given their particular nature, KIF is often nearly as intuitive a format as FCG for them. If the MSO is to be re-used by a general inference engine using KIF, the MSO can be translated into KIF and the left-out axioms automatically imported from their source KIF files.

2. Robustness: (Heavy weight vs. light weight ontology features? Potential for improving robustness? How well will it handle known requirements, such as those listed in SUO Scope and Purpose.)

I assume this question is more about languages than ontology content (otherwise it seems sufficient to say that the MSO combines the strengths of the ontologies it integrates).

The FT (For Taxonomies) language provides a very intuitive, concise and normalizing format that combines the modelling features of lexical ontologies (e.g. the lexical and semantic links of WordNet) and entity-relationship or description logics languages (e.g. via the use of relation cardinalities and open/complete subtype partitions). Complex definitions or axioms that are beyond the expressivity of FT can be modelled in FCG, which has the expressivity of KIF but is higher-level (i.e. its frame/graph based approach and syntactic sugar make it more intuitive and normalizing; FCG also permits writing complex definitions that require a KIF-like predicate/function based approach, e.g. definitions for extended quantifiers, but these definitions would only be stored by WebKB-2, not exploited for inferencing purposes).

The FT and FCG languages are also adapted to a particular feature of knowledge in WebKB-2: every element (i.e. category, link between categories, or any other logical statement that is not just a link) must have a recorded creator (user or source document, represented in the ontology by a category and via a short identifier and/or a URL) and may have a recorded creation date. This approach of storing creators supports a much more flexible and easier way to handle knowledge sharing than the traditional module/file based approach (as for example in the Ontolingua library), since arbitrarily complex queries can be made on the knowledge and its creators, and even on the knowledge about these creators (thus, modules can be generated if necessary, and filtering can be done; the category search&browse interface of WebKB-2 has an option for simple filtering on creators). This approach also permits WebKB-2 to let knowledge providers cooperatively and incrementally update the same knowledge base (instead of independently building ontologies) without lexical conflicts: semantic conflicts between knowledge sources are corrected by the knowledge providers, which leads to more precise, interconnected and compatible knowledge. With the module based approach, ontology merging is a difficult task that has to be done by end-users when the re-used ontologies conflict.
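The idea of attaching a creator to every element, and deriving filters or modules from it, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Link class, the function names, the category identifiers and the creator names below are hypothetical and do not reflect the internal data model or the FT/FCG syntax of WebKB-2.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Link:
    """A link between two categories, with its recorded creator."""
    source: str
    relation: str
    target: str
    creator: str                   # mandatory: a user or a source document
    created: Optional[str] = None  # optional creation date


def filter_by_creators(links: Iterable[Link], creators: set[str]) -> list[Link]:
    """Keep only the links whose creator belongs to the given set."""
    return [link for link in links if link.creator in creators]


def modules_by_creator(links: Iterable[Link]) -> dict[str, list[Link]]:
    """Group links per creator, i.e. generate 'modules' on demand."""
    modules: dict[str, list[Link]] = {}
    for link in links:
        modules.setdefault(link.creator, []).append(link)
    return modules


# Hypothetical content: three links from three different creators.
KB = [
    Link("dog", "subtype_of", "animal", creator="wordnet"),
    Link("animal", "subtype_of", "physical_entity", creator="user_a"),
    Link("driver", "subtype_of", "role", creator="user_b", created="2004-03-01"),
]
```

For instance, filter_by_creators(KB, {"wordnet", "user_a"}) gives the view of a user who trusts only those two sources, and modules_by_creator(KB) regenerates one module per source from the shared base, without the base itself having been partitioned into files.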

I have given examples of translations between FCG and other languages (KIF, CGLF, CGIF, RDF+OWL, UML) for a panorama of knowledge representation cases. The category search&browse interface already proposes CGIF and a DAML+OIL compliant RDF as output formats, although complex definitions are not translated (they are presented in FCG within comments) and the link creators are not presented in RDF in order to avoid unusable ad-hoc representations. If required, I can add an export procedure to KIF+SUMO, or KIF+OWL, or KIF and another ontology (since KIF is a low-level language, the "language ontology" to re-use must also be selected). I have begun the implementation of partial import procedures from KIF+SUMO and RDF+OWL, but full import from such low-level languages is not planned in the short/medium term since it would be a difficult/time-consuming task and the use of such languages often leads to knowledge representations that do not satisfy the knowledge sharing criteria encouraged in WebKB-2.

WebKB-2 permits concept types that represent roles (e.g. "parent", "driver") to be used within relation nodes, and it will soon be possible to associate relation signatures with these particular concept types. Thus, knowledge providers do not have to represent these roles via relation types too. In my opinion, the use of this feature increases knowledge sharing/retrieval possibilities and decreases the complexity of knowledge entry/management. This issue has recently been debated on the SUO list, and I have proposed KIF definitions to permit the use of such a feature in KIF.

3. Potential for broad acceptance: (How well will it support maximum number of domains?)

By design, and in my experience, WebKB-2 and the MSO are adequate for a maximum number of domains; see the examples. The initial core of the MSO (mostly derived from Sowa's ontologies and the integration of WordNet) and two of the languages of WebKB (FCG, and its structurally identical but syntactically different version, Formalized-English) were designed to ease and normalize the representation of English sentences (the proposed types, constructs and syntactic sugar support an intuitive and explicit representation of English sentences, reduce the number of incomparable ways something can be represented, and hence increase knowledge retrieval and sharing possibilities). However, I have a bias: I designed WebKB-2 to be a knowledge retrieval and sharing system, not a general inference engine (like, for example, Otter), nor a system performing automatic classification (like description logics systems).

The WebKB-2 framework is designed to ease the integration (or interconnection) of various ontologies, and increase the acceptance of the result. The sources of the categories, definitions or statements are explicit (and hence, these sources are credited). Integrated ontologies complement each other, and make each other more precise or document each other. I have asked the authors of the ontologies that I have integrated to check the connections I made, and their feedback has helped me correct some misinterpretations or mistakes.

4. Language Flexibility: (What ontology language is it in? How stable is language? If desired, could it be written in a different ontology language?)

The grammars of FT and FCG are now quite stable. However, I will update them if I find new features that should be in them. For example, six months ago, I added the possibility of using cardinalities on links between categories in FT, and I am currently adding the "N by N" construct in FCG (such a construct, or syntactic sugar, is useful to represent sentences such as "John brought his books 5 by 5"). The previous updates of FT and FCG date back more than a year.

The MSO could be written in KIF. The current MSO could also be written in most other languages, e.g. RDF plus OWL-DL, if the creators of links between categories are not represented.

5. Ownership/Cost/Changes: (Who owns it? Any proprietary restrictions on use? Any charges for utilization? How will it get changed and who controls the changes? Is it being developed by a Standards Developing Organization?)

The MSO is public domain.

The development of WebKB-2 was funded until August 2003 by my former employer, the DSTC research center. The release of WebKB-2 as open source, public domain software is an issue that is supposed to be debated at DSTC, but the committee for discussing such matters has still not been established. However, since August 2003, I have been rewriting WebKB-2 into AnyKB, a knowledge server with similar features but which could use various underlying KBMSs or OODBMSs (e.g. any OKBC-compliant KBMS could theoretically be re-used by AnyKB, while WebKB-2 is dependent on the OODBMS FastDB or its disk-based version, GigaBASE). Since I was not funded while developing AnyKB, I am free to release it as open source, public domain software in a few months, when it has enough features.

6. Domain Friendly: (How easy to develop domain ontologies based on upper ontology?)

The MSO fortunately integrates a lexical ontology (medium-level ontology) in addition to top-level ontologies; otherwise, developing domain ontologies based on it would require much more work from each user and knowledge sharing would be more difficult. In WebKB-2, thanks to the large pre-existing ontology, most users simply have to find the concept types semantically closest to the ones they require (e.g. by entering English words and then browsing along semantic links between categories), specialize them if needed, and use them in statements. There is also a facility for those who want to skip the first phase whenever possible: category names, i.e. words, may be used instead of category identifiers within statements whenever there is no ambiguity about the relevant category, i.e. when a used word is a name of only one category or when the signatures of the used relations permit the other candidate categories to be discarded automatically.
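The name-resolution rule just described (a word is accepted if it names a single category, or if relation signatures rule out all but one candidate) can be illustrated with the following Python sketch. It is a simplification under my own assumptions: the NAMES and SUPERTYPES tables, the category identifiers and the resolve function are hypothetical, and the expected type stands for whatever the signature of the enclosing relation requires.

```python
# Hypothetical lexicon: word -> identifiers of the categories bearing that name.
NAMES = {
    "flight": ["trip_by_air", "flock_of_birds"],
    "airline": ["airline_company"],
}

# Hypothetical supertypes of each category (each set includes the category itself).
SUPERTYPES = {
    "trip_by_air": {"trip_by_air", "process"},
    "flock_of_birds": {"flock_of_birds", "collection"},
    "airline_company": {"airline_company", "organization"},
}


def resolve(word, expected_type=None):
    """Return the single category designated by `word`, or raise LookupError.

    `expected_type` is the type required by the signature of the relation
    in which the word is used, when such a signature is known.
    """
    candidates = NAMES.get(word, [])
    if len(candidates) == 1:
        # The word names only one category: no ambiguity.
        return candidates[0]
    if expected_type is not None:
        # Discard the candidates that do not fit the relation signature.
        candidates = [c for c in candidates if expected_type in SUPERTYPES[c]]
    if len(candidates) != 1:
        raise LookupError(f"'{word}' is unknown or ambiguous: {candidates}")
    return candidates[0]
```

Here resolve("airline") succeeds directly, resolve("flight", expected_type="process") succeeds because the signature discards the other candidate, and resolve("flight") alone raises an error, in which case the user would have to give the category identifier explicitly.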

The WebKB-2 site provides several example files that illustrate how domain ontologies can be developed by specializing WordNet categories and, if needed, interconnecting the new categories with one another. For example, see the representation of the ADFP9 glossary or the representation of the CADM model. The medium-level ontology acts as a large hat that documents and structures the domain-related categories and the knowledge representations that use them, and permits them to be compared and retrieved.