Message 11936 of the SUO list
Subject: SUO: Fw: Multi-Source Ontology (MSO)
Date: Tue, 2 Dec 2003 06:54:36 -0500
From: jim.s3@juno.com
In reply to: msg11841 by John Sowa
Follow-Up: msg11943 by Stefano Borgo, msg11941 by John Sowa, msg11940 by Jon Awbrey

SUO WG,

Philippe Martin has joined this list and proposed the Multi-Source Ontology (MSO) as a starter document. Is there a second? (Only one needed.) If so, this is open for discussion.

Jim Schoening

--------- Forwarded message ----------
From: Philippe Martin <phmartin at phmartin dot info>
To: standard-upper-ontology@ieee.org
Cc: spamOnly@phmartin.info, jim.s3@juno.com
Date: Tue, 02 Dec 2003 06:18:11 +1000
Subject: Multi-Source Ontology (MSO)

Jim,

Thank you for the invitation to introduce the Multi-Source Ontology (MSO) of the WebKB-2 knowledge server (www.webkb.org) as a proposed starter document. I also thank John Sowa for having made this proposal last week. I first introduce WebKB-2 and its underlying approach, then its MSO, and finally give URLs for testing and further details.

WebKB-2 is a shared knowledge server: it permits Web users to search and update a large shared knowledge base (KB) on the WebKB-2 server machine. WebKB-2 is also a private knowledge annotation server: it can access Web-accessible files on users' machines, execute the knowledge commands (i.e. statements (assertions and definitions) and queries) in these files, and optionally add their statements to the shared KB. To ease knowledge documentation, the commands may be mixed with other document elements (e.g. text in HTML) as long as these commands are properly isolated from the rest (e.g. within special tags). Indexation commands also permit users to link any document element (e.g. any part of an HTML file) to a statement by a "representation link". Such links may then be exploited to display document elements instead of statements in answer to queries.
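
As a rough illustration of commands being "isolated from the rest" of a document, here is a minimal Python sketch that pulls command blocks out of an HTML page. The message does not name the special tags, so the tag name `kr` and the function `extract_commands` are assumptions made for this example only; this is not WebKB-2's actual parser.

```python
import re

# Hypothetical delimiter: the message only says that commands sit
# "within special tags"; the tag name "kr" is an assumption of this sketch.
COMMAND_BLOCK = re.compile(r"<kr>(.*?)</kr>", re.IGNORECASE | re.DOTALL)


def extract_commands(html: str) -> list[str]:
    """Return the text of each non-empty command block, ignoring the rest of the page."""
    blocks = (block.strip() for block in COMMAND_BLOCK.findall(html))
    return [block for block in blocks if block]


page = """<html><body>
<p>Ordinary text that a knowledge parser should skip.</p>
<kr>
[any bird, agent of: a flight];
</kr>
<p>More ordinary text.</p>
</body></html>"""

commands = extract_commands(page)
```

A regular expression is enough here because the sketch only needs to separate command text from ordinary document text; a real server would then hand each extracted block to its language parser.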

I have been developing WebKB-2 since January 2000 (financially supported by the DSTC, Australian W3C office) on top of an object-relational DBMS called FastDB (for the main-memory version) or Gigabase (for the disk-based version). WebKB-2 is a partial rewriting and extension of WebKB-1, which focused on private knowledge annotations and had no DBMS capabilities (persistence, transactions, ...). WebKB-1 (developed from January 1997 to December 2000, financially supported by the DSTO, Australian defence research center) was a partial and Web-based rewriting of CGKAT, a knowledge acquisition tool that I developed during my PhD thesis (at the INRIA Sophia Antipolis, France) on top of the conceptual graph workbench Cogito and the structured document editor Thot (the code of Thot is now used in Amaya, the prototype browser of the W3C).

WebKB-2 and WebKB-1 parse the FS language, which has various sub-languages:
- FT (For Taxonomies), a simple language for links between categories (subtypeOf, instanceOf, exclusion, identity, WordNet links, etc.),
- FCG, my adaptation of CGLF (Conceptual Graph Linear Format) for a more readable and normalized representation of expressive knowledge,
- FE (Formalized-English), a notation structurally identical to FCG but with syntactic sugar which makes it look like English,
- CGLF, CGIF, KIF, ... (they are sub-languages of FS but are currently only very partially parsed by WebKB-2; WebKB-1 fully parses CGLF),
- FC (For Control), some basic procedural control structures,
- queries on links and graphs (various kinds of graph-matching are possible),
- commands for the various parsing/display options.
RDF/DAML+OIL is also partially parsed but is not a sub-language of FS.

Knowledge entering/querying may also be done via HTML menus, which are translated into commands before being sent to the WebKB-2 server via the CGI protocol.

In WebKB-2, every element (i.e. category (type or individual), link between categories, or graph (i.e.
logical statement that is not just a link)) must have a recorded creator (user or source, represented by a short identifier and/or a URL). Hence the expressions "Multi-Source Ontology" or "Multi-Source KB". Category identifiers may also include various "names" for the category, as long as these names are separated by at least two underscores (unlike an identifier, a name may be shared by various categories). Some examples of category identifiers are:
- wn#domestic_cat__cat__house_cat (identifier for a WordNet category having "domestic_cat" as "key name", plus "cat" and "house_cat" as other names)
- wn#domestic_cat (shorter identifier for the same category)
- #domestic_cat (idem; special shortcut for WordNet categories because they currently form 92% of all categories in the WebKB-2 default ontology)
- spamOnly@phmartin.info (a possible identifier for myself)
- http://www.webkb.org/doc/ (a possible identifier for the WebKB documentation)

Within graphs, names can be used instead of identifiers when there is no ambiguity (e.g. when a name refers to only one category, or when the signatures of the relations used rule out all candidate categories but one).

A user may add a link between categories she has not created, or use them in graphs, unless this leads to an inconsistency with an already entered statement (if the knowledge entering is done via commands in a file, the parsing of the file continues but no updates will be committed). A user may remove an element only if she has created it. For safety reasons, additions of links that are redundant with an already entered link are rejected. Additions of redundant or partially redundant graphs are rejected unless the redundancy comes from the user stating that she corrects another user's statement via one of the following inter-graph relations: pm#corrective_specialization, pm#overriding_specialization, pm#corrective_generalization or, if none of the previous ones apply, pm#correction.
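
The update rules above can be sketched in Python as follows. This is only a toy model under stated assumptions: the class and function names (`Link`, `SharedKB`, `may_add_redundant_graph`) are invented for the example, redundancy is reduced to exact duplication (the real system presumably also detects links that are implied by others), and the link `wn#domestic_cat subtypeOf wn#cat` in the usage lines is illustrative rather than taken from the actual KB.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four inter-graph corrective relations named in the message.
CORRECTIVE_RELATIONS = {
    "pm#corrective_specialization",
    "pm#overriding_specialization",
    "pm#corrective_generalization",
    "pm#correction",
}


@dataclass(frozen=True)
class Link:
    """A link between two categories (hypothetical structure for this sketch)."""
    source: str    # category identifier, e.g. "wn#domestic_cat"
    relation: str  # e.g. "subtypeOf"
    target: str


@dataclass
class SharedKB:
    # Maps each stored link to the identifier of the user who created it.
    links: dict = field(default_factory=dict)

    def add_link(self, link: Link, user: str) -> bool:
        """Reject a link that is already stored; otherwise record it with its creator."""
        if link in self.links:  # assumption: redundancy == exact duplicate
            return False
        self.links[link] = user
        return True

    def remove_link(self, link: Link, user: str) -> bool:
        """Only the creator of a link may remove it."""
        if self.links.get(link) != user:
            return False
        del self.links[link]
        return True


def may_add_redundant_graph(relation: Optional[str], author: str,
                            corrected_creator: str) -> bool:
    """A (partially) redundant graph is accepted only when it is stated, via a
    corrective relation, as a correction of a statement by another user."""
    return relation in CORRECTIVE_RELATIONS and author != corrected_creator


kb = SharedKB()
link = Link("wn#domestic_cat", "subtypeOf", "wn#cat")  # illustrative link
first_add = kb.add_link(link, "pm")          # accepted
duplicate_add = kb.add_link(link, "oc")      # rejected: already entered
foreign_remove = kb.remove_link(link, "oc")  # rejected: "oc" is not the creator
owner_remove = kb.remove_link(link, "pm")    # accepted
```
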

For example, assuming that oc#statement_on_bird_28 is the identifier for the statement "birds fly" (in FCG: [any bird, agent of: a flight]; in FE: `any bird is agent of a flight') created by the user oc, I can state via the following FCG that I believe that a corrective specialization of that statement is that "according to a study by Foo@bird.org, in 1999, 93% of birds in good health are able to fly":

  [oc#statement_on_bird_28,
     corrective_specialization:
       [ [ [93% of (bird, experiencer of: a good health), can be agent of: a flight],
           time: 1999],
         source: (a study, author: Foo@bird.org)]
  ];  //the creator and identifier of the graph are automatically added by WebKB-2

Corrective relations can only apply between assertions from different users. There is no need for them between definitions: a definition cannot be false. Whatever concept the creator of a category has (implicitly) in mind, that category refers to that concept. When another user is tempted to give another definition to that category, she is actually thinking of another concept and hence should instead define another category (and link it to as many related categories as possible). If the creator of a category sees that other users have misinterpreted it (e.g. when specializing it), she should add definitions/constraints to her category to avoid such misinterpretations.

To escape the inconsistencies that such new constraints are likely to bring to the KB, I see two strategies:
- automatic resolution by "cloning": the system keeps the old version of the category (a generalization of the new version) and gives it a new generated identifier; actually, many generalizations and specializations of the category may have to be "cloned", so the general case is complex to implement, semantically sub-optimal and difficult for the users to understand (see www.webkb.org/doc/coopKBbuilding.html if you want to get an idea of what that approach leads to); hence, I have not adopted this strategy in WebKB-2;
- the creator of the category does not (cannot) make modifications that lead to detected inconsistencies (although she can give it a more general name, more adequate to the specialization that other users have given it) but she specializes it (and if needed, generalizes other categories by it).

This approach permits each user to enter or re-use as many categories as she wants, use the names or identifiers she wants (alternative identifiers may be introduced by connecting categories with identity links), filter out the categories or statements she does not want to see, and state her beliefs while keeping the KB consistent (the FCG in the above example is not inconsistent with oc#statement_on_bird_28, but states that I believe that oc#statement_on_bird_28 is false). It also keeps the knowledge elements as connected/comparable as possible and with as few redundancies as possible. Most importantly, the approach is asynchronous: the users do not have to agree, meet or even discuss with each other.

For a particular application (e.g. an expert system), the categories that are not used may be filtered out, and a selection can be made among alternative statements (e.g. one strategy may be to select the most specialized statements, or to select statements according to their creator identifiers, types or even the graphs that describe those creators).
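
To make the idea of selecting elements by creator concrete, here is a small Python sketch. It assumes, based on the examples in this message (wn#domestic_cat, oc#statement_on_bird_28, pm#correction), that the part before "#" is the short identifier of the creator or source, and it only handles identifiers of that form (not e-mail addresses or URLs). The names `CategoryId`, `parse_category_id` and `select_by_creator` are hypothetical and are not part of WebKB-2.

```python
import re
from typing import NamedTuple


class CategoryId(NamedTuple):
    source: str         # creator/source prefix, e.g. "wn", "pm", "oc"
    key_name: str       # first name in the identifier
    other_names: tuple  # any further names, possibly empty


def parse_category_id(identifier: str) -> CategoryId:
    """Split 'source#key_name__name2__name3' into its parts."""
    source, separator, names = identifier.partition("#")
    if not separator or not names:
        raise ValueError(f"not a source#name identifier: {identifier!r}")
    # "#domestic_cat" is the shortcut form for WordNet categories.
    source = source or "wn"
    # Names are separated by at least two underscores.
    key_name, *others = re.split(r"_{2,}", names)
    return CategoryId(source, key_name, tuple(others))


def select_by_creator(identifiers, trusted_creators):
    """Keep only the identifiers whose creator prefix is in trusted_creators."""
    return [
        identifier
        for identifier in identifiers
        if parse_category_id(identifier).source in trusted_creators
    ]


parsed = parse_category_id("wn#domestic_cat__cat__house_cat")
selected = select_by_creator(
    ["oc#statement_on_bird_28", "wn#domestic_cat", "pm#correction"],
    {"oc", "pm"},
)
```

Selecting by creator prefix is only one of the strategies mentioned; choosing the most specialized statements would additionally require the specialization links between graphs, which this sketch does not model.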

To contrast with the more traditional approach, here is a quote from last week's e-mail from John Sowa:

> > One of the problems of a registery of ontologies (as in the Ontolingua
> > server) compared to a multi-source ontology (as in WebKB) is that it
> > is difficult for an ontology provider to relate the new categories (by
> > subsumption/exclusion/identity/... links) to the categories of all
> > other ontologies in the registry, and hence these ontologies are
> > difficult to compare and re-use: each user must select various
> > ontologies (and choose between competing ones) then complement and
> > inter-relate their categories which is even more difficult than it
> > would have been for the authors of the selected ontologies.
>
> I agree. We certainly need something more than just a registry as
> in Ontolingua. What you have accomplished is what I was originally
> proposing: a selection of modules, each of which was independent, but
> each of which was related to the others by their mapping to a super-
> hierarchy of categories that included all the categories from each of them.
> ...
> But the registry ideas should also be included: each module by itself
> should be documented and annotated with all the information about its
> history of development, contributors, testers, and especially all
> significant applications.

WebKB-2 supports the development and documentation of modules by permitting users to store the knowledge of a module in one or several Web files (possibly mixed with and/or indexing other document elements) and to test these files until they are considered "stable". At that point, the instruction "no storage;" may be removed from a file and its knowledge representations will be committed into the KB; this is the procedure advocated by the documentation of WebKB-2: www.webkb.org/doc/w2doc.html

I now introduce the initial Multi-Source Ontology (MSO) of WebKB-2 (the one currently browsable and updatable at www.webkb.org).

I consider all categories of all existing ontologies as having some value, and being either identical or complementary. However, for practical reasons of time, the MSO of WebKB-2 is currently only composed of an extension and correction of the noun-related part of WordNet plus various top-level ontologies (mainly, extensions of those of John Sowa in his 1984/2000 books, DOLCE, and various categories from other sources). Its goal (like that of FCG) is to support and ease the direct representation of English sentences. Whenever possible, I also avoided breaking links from WordNet. Details about how I re-used and corrected WordNet, and merged its top-level ontology into Sowa's ontologies and DOLCE, are accessible from www.webkb.org/doc/papers/iccs03/ (each correction is documented and the ontology is available in FS and, for less recent versions, also in CGIF, RDF/DAML+OIL and WordNet formats).

The top-level ontology is composed of about 150 primitive relation types (spatial/temporal/thematic/argumentative/... relations), about 30 high-level concept types needed for the signatures of these relations, and about 120 other top-level concept types (not including any WordNet type) that I thought interesting.

Some general statements are associated with some high-level types (including WordNet ones) to describe prototypical relations from those types and their specializations (see e.g. www.webkb.org/kb/top/schemas.html). These statements permit WebKB-2 to ease and normalize knowledge entering/querying by generating cascading content-oriented menus for any category selected by the user. However, many more "general statements" need to be entered to make this approach really helpful.

Many extensions to that work would be interesting and I plan some of them for the short to medium term: integrating WordNet 2.0 in full, extending the parsing/display of other notations (RDF+OWL, KIF, CGIF), and (beginning to) write a natural language parser for English.
The release of WebKB-2 as open source may also be a short- to medium-term goal. Interest from the SUO group would certainly guide and help prioritize my goals: I offer to extend and develop WebKB-2 and its ontology as an ongoing project for the SUO.

References:
  WebKB-2 general doc: www.webkb.org/doc/w2doc.html
  WebKB-2 full interface: www.webkb.org/webkbShared.html
  WebKB-2 example files: www.webkb.org/kb/
  WebKB-2 languages: www.webkb.org/doc/languages/ and www.webkb.org/doc/papers/iccs02/
  WebKB-2 ontology: www.webkb.org/doc/papers/iccs03/
  WebKB-2 and the Semantic Web: www.webkb.org/doc/papers/wi02/
  WebKB-1 and WebKB-2 home page: www.webkb.org

Philippe MARTIN