Abstract:
This article presents various elements of solutions to encourage and permit people to provide, share, retrieve, update and annotate information in more precise, structured and normalized ways within a knowledge repository. Such a repository is not a classic informal repository based on documents or supported by a database: it is a knowledge base which can include or index informal elements and thus can complement classic information repositories. These elements of solutions, which are implemented in the knowledge server WebKB-2, relate to syntaxes, querying and comparison mechanisms, cooperation protocols or approaches, and ontologies. The provided examples focus on the representation and comparison of tools, techniques and ideas related to knowledge engineering. The acceptance of our approach by non-technical people remains to be tested.
Keywords: ontology building, domain modelling, semantic modelling,
semantic webs, knowledge sharing/representation/retrieval/server
Biographical notes:
Dr Philippe Martin did his Ph.D. in Software Engineering at the INRIA research center of
Sophia Antipolis (France) and his postdoc in 1997 at the University of Adelaide
(Australia). He then worked at Griffith Uni's School of I.C.T. (Australia),
as Research Fellow funded by the DTSO until 2000 (period during which he designed
WebKB-1), as Senior Research Fellow employed by the DSTC until 2004 (period during
which he designed WebKB-2), and then as Senior Lecturer.
His main research interests are knowledge representation, sharing and retrieval.
Dr Michel Eboueya received his PhD in Computer Science from the University of Lille
(France). He is now Associate Professor at the L3I laboratory of the University of
La Rochelle (France) where he has worked since 1993. His current research interest
is the application of various technologies (Visualisation, Fuzzy Logic,
Knowledge Management and Agents) to Ubiquitous Learning.
Dr Lorna Uden
...
Dr Jun Jo received his PhD in Architecture from the University of Sydney in 1994.
He has taught and researched in a number of areas, including
Computer Simulation and Games, Artificial Intelligence, and Robotics.
He is currently the Director of the "Robotics and Games Laboratory" at Griffith
University. He is also organising IRO 2006 (8th International Robot Olympiad
2006) on the Gold Coast in December 2006.
Some information repository projects use or intend to use formal knowledge bases (KBs), e.g., the Open GALEN project which has created a KB of medical knowledge, the QED Project which aims to build a "formal KB of all important, established mathematical knowledge", and the Halo project whose long-term goal is the design of a "Digital Aristotle". However, even when formal KBs are not intended to support problem solving, designing them is inherently a difficult and time-consuming exercise that current generic KB systems (KBSs) still do not guide well. Hence, despite the benefits of a formal approach, it is very rarely used for creating information repositories and especially corporate memories.
Instead, information repositories are most often composed of informal documents that are independently created by people (typically by publishing a Web document or sending an email to a mailing list). This approach is simple but its well-known drawback is that it is then often difficult to retrieve or compare information because (i) the various needed pieces of information are scattered within many documents and expressed in different ways and at often inadequate levels of detail, and (ii) these pieces of information cannot be automatically organised into a semantic network by any current (or even currently foreseeable) natural language understanding (NLU) technique. The use of cooperatively edited informal documents (as in wikis) helps to reduce the scattering of information but introduces new problems and does not by itself lead to better or sufficient structuring of the information. Structured documents (e.g., documents following an XML schema), databases and application-oriented interfaces enforce some structure but (i) they also often restrict what can be entered even when they provide "free text" entry fields, and (ii) the semantics of the prescribed structure is most often left implicit and is insufficient to be used by KBSs. The manual or automatic use of lightweight metadata for indexing (parts of) documents (e.g., Dublin Core metadata, informal categories such as those of the Open Directory Project, general categories such as those of WordNet or specialized categories such as those from the KA2 project and some other Semantic Web projects) only permits document retrieval and does not lead to any browsable semantic network synthesizing and comparing facts or ideas. The occasional use of semantic relations (as in semantic wikis) is also insufficient for "knowledge retrieval" (i.e., precision-oriented information retrieval) or, more generally, automated reasoning.
The use of poorly structured, graphical and overly permissive semi-formal notations such as those used for Concept Maps (or their ISO version, Topic Maps) often leads to information that is more difficult to understand, retrieve and exploit than when regular informal sentences are used (commented examples about this are given in (Sowa, 2006)). Controlled languages, i.e., semi-formal languages that look like natural languages but have a restricted syntax or a restricted vocabulary, are often seen as good compromises between formal and informal languages; however, (i) they do not scale (i.e., in the general case, they are not expressive and formal enough or become too complex to use when extended), and (ii) there are often more structured, precise, normalised and readable ways to express knowledge (for example, partOf or generalization hierarchies).
The solution advocated in this article is to provide people with various kinds of support that encourage and permit them to be as precise or formal as they are willing to be, and to re-use, complement, annotate or correct each other's knowledge. One approach is to provide this support via a knowledge server such as WebKB-2 (Martin, 2003a); this leads to cooperatively updated semi-formal KBs which should become more and more precise as the number of users grows. Another implementation approach would be to re-use a peer-to-peer network but this would be more complex to achieve and would offer no theoretical advantage except for privacy issues (the managers of a central repository can access any piece of information without the consent of its provider). The following sections present the kinds of support we propose (they are implemented or being implemented in WebKB-2). Section 2 illustrates various readable, expressive, normalising, formal and semi-formal notations. Section 3 discusses needed features of querying and comparing methods. Section 4 summarizes protocols to support cooperation between knowledge providers and also a method to value contributions and contributors. Section 5 shows an integration of lexical and top-level ontologies, and proposes a way to modularise inputs within information sources. This is a synthesis article which makes explicit and relates the ideas behind several of our works separately published in conference proceedings, and which also introduces recent refinements. WebKB-2 can complement classic information repositories and tools such as KPMG's K-World by offering additional or alternative ways to enter, index and retrieve some of their content. Although non-technical people may find a knowledge-oriented approach difficult to understand at first, it is not actually difficult and its adoption can be progressive since WebKB-2 allows formal and informal textual elements to be mixed and connected.
Research in knowledge representation has focused on reasoning and therefore not on creating notations that are both very readable and expressive, although Sowa (1984) had this in mind when creating the Conceptual Graph (CG) model, its linear form (CGLF) and display/graphical form (CGDF). His relative success still accounts for much of the appeal of the CG formalism and approach (a related reason is that, being higher level than other notations, the CG notations lead to more "normalised" knowledge representations which therefore ease knowledge sharing and retrieval). It is however possible to go further in terms of "high-level-ness", i.e., readability (concision and/or intuitiveness) and normalising effect (restricting the number of automatically incomparable ways something can be represented). Martin (2002) did so by creating the Frame-CG (FCG), Formalized-English (FE) and For-Links (FL) notations. All have LALR(1) grammars. FL is discussed in the next section. FE looks like some pidgin English but is structurally equivalent to FCG which is an extremely concise notation that includes constructs for extended quantifiers, meta-statements, functions and various interpretations of sets (hence, it is semantically equivalent to KIF). FE is quite verbose and hence is not adequate for really building or browsing a reasonably complex KB but it can be shown to anyone. Hence, it can for example be used for showing the various interpretations that an NLU parser makes of a sentence expressed in a natural or controlled language, and then for letting the user select the correct interpretation or make the sentence more precise.
The following example illustrates the representation of an English (E) sentence in FE, FCG, CGLF, PL (predicate logic), KIF, RDF (more precisely, RDF/XML) and LTM (the Linear form of Topic Maps). Since numerical quantifiers (here, "2" and "3") must be used, no current controlled language (apart from FE) can represent such a sentence correctly, and any RDF representation is ad-hoc (furthermore, no representation in N3 is possible). Except for the LTM representation (which cannot be made more precise), all these representations are formal if the used terms (e.g., "car") are formal, i.e., declared and possibly defined. However, the 's' at the end of "cars" and "sells" is a lexical facility offered by WebKB-2 for FCG and FE when a universal-like quantifier (e.g., "any", "2" and "3") is used: in such a case, WebKB-2 automatically removes the 's'. The selling act is represented via a concept (instead of a relation as in the LTM representation) to allow the quantification (with "2" here) and the use of the relation "time" (or more generally, any number of relations). Hence, to permit knowledge comparison, all representations of selling acts should use the concept type "sell". More generally, all actions should be represented via a concept type. FE and FCG lead to such a normalisation. Most languages (e.g., KIF, RDF and LTM) do not. As opposed to these languages, FE and FCG have many features such as the numerical quantifiers and time data-types illustrated below which, since the users do not have to define them, help knowledge sharing.
E: Ned sold (the same) three cars twice on the 21/1/2001.
   This sentence does not specify whether the cars have been sold individually, 2 by 2, or 3 by 3. This ambiguity is kept in the representations below.
FE: 3 cars are object of 2 sells with agent Ned and time 21/1/2001.
   If there had been only 1 sell, "Ned sells 3 cars with time 21/1/2001" could have been used in FE as a shortcut for "Ned is agent of a sell with object 3 cars and time 21/1/2001".
FCG: [3 cars, object of: (2 sells, agent: Ned, time: 21/1/2001)]
CGLF: [Person: Ned]<-(agent)<-[Sell: {*}@2]-
        { <-(object)<-[Car: {*}@3 @certain];
          <-(time)<-[Date: #21/1/2001];
        }
PL: ∃cars set(cars) ∧ size(cars,3) ∧ ∀c ∈ cars
      ∃sells set(sells) ∧ size(sells,2) ∧ ∀s ∈ sells
        agent(s,Ned) ∧ object(s,c) ∧ time(s,21/1/2001)
KIF: (forAllN 3 ?c car
       (forAllN 2 ?s sell
         (and (agent ?s Ned) (object ?s ?c) (time ?s '21/1/2001))))
     Our KIF definition for "forAllN":
     (defrelation forAllN (?num ?var ?type ?predicate) :=
       (exists ((?s set))
         (and (size ?s ?num)
              (truth ^(forall (,?var)
                        (=> (member ,?var ,?s)
                            (and (,?type ,?var) ,?predicate)))))))
RDF: <kif:Set ID="cars"><size>3</size></kif:Set>
     <rdf:Description aboutEach="#cars">
       <rdf:type resource="Car"/>
       <object><rdf:Description>
         <kif:Set ID="sells"><size>2</size></kif:Set>
         <rdf:Description aboutEach="#sell">
           <agent resource="Ned"/>
           <time>21/1/2001</time>
         </rdf:Description>
       </rdf:Description></object>
     </rdf:Description>
LTM: {Ned, sell, [[three cars twice]]} ~ NedSell3.
     {NedSell3, time, [[21/1/2001]]}.
We use the word "links" for referring to conceptual relations between individuals (e.g., statements, particular cities or persons) or non-quantified types. When knowledge representations only involve links, that is, when quantifiers and certain complex uses of sets and meta-statements need not be used, it is possible to use FL (For-Links) which is a simpler notation than FE or FCG although it is nearly as expressive as RDF + OWL-Full. One of the major advantages of FL is that it permits most of the knowledge related to an object to be represented via links from this object (and, for the major kinds of links, to avoid repeating them) instead of having to write or read separate statements. Thus, FL permits very concise and readable representations (the provided predefined links and syntactic sugar also bring a strong normalising factor). Examples are given below. FL has recently been extended to include all the necessary constructs for "structured discussions" (as illustrated below), which makes it a good (and often more expressive) alternative to the notations used in argumentation systems (e.g., AAA and gIBIS).
Several authors of hypertext systems or digital libraries have claimed that they did not use a more structured or knowledge-oriented approach because this would scare many potential users and thus prevent a wide adoption of their tools. For example, this claim has been made by Buckingham-Shum & al. (1999) to justify the lack of explicit relations between statements. Authors of argumentation systems such as AAA (Schuler and Smith, 1990) have used it to justify the restriction to a short list of predefined relation types and concept types instead of allowing people to use and update an ontology. However, none of these systems achieved wide adoption. This may well be attributed to the fact that the restrictions deeply limited what could be done with their tools, and hence their interest and applicability. The restrictions also led to biased representations and complex workarounds. The need for user-defined typed hyperlinks in hypertext systems has long been shown, and MacWeb (Nanard & al., 1993) is an example of a user-friendly and powerful knowledge-based private hypertext system. Similarly, it is a mistake to restrict the expressivity of a general knowledge representation language since choices about how to handle the completeness, decidability and efficiency issues, or how to handle elements such as sets and modalities, are application-dependent (e.g., for some knowledge retrieval or filtering purposes, efficient graph-matching procedures that ignore the detailed semantics of certain elements can be used, while for other purposes exploiting all the details is essential and tractability is not an issue).
E: Any human_body is a body and has at most 2 arms, 2 legs and 1 head. Any arm, leg and head belongs to at most 1 human body. Male_body and female_body are exclusive subtypes of human_body and so are juvenile_body and adult_body.
FL: human_body < body,
      part: arm [0..1,0..2] leg [0..1,0..2] head [1,1],
      > {male_body female_body} {juvenile_body adult_body};
KIF: (forall ((?b human_body)) (body ?b))
     (forall ((?b human_body)) (atMostN 2 '?a arm (part ?b '?a)))
     (forall ((?a arm)) (atMostN 1 '?b human_body (part '?b ?a)))
     (forall ((?b male_body)) (not (female_body ?b)))
     ...
In a knowledge repository, categories and statements come from multiple sources and it is necessary to record those sources to avoid lexical and semantic conflicts (details in Section 4) and allow knowledge filtering on the sources.
E: According to Jun Jo (who has for user id "jj"), a body (as understood in WordNet 2.0) has for part (as defined by "pm") at least 1 leg (as defined by "fg") and exactly 1 head (as understood by "oc").
FL: wn#body pm#part: fg#leg [0..1](jj) oc#head [1](jj);
FL: wn#body pm#part: at least 1 fg#leg (jj) 1 oc#head (jj);
FCG: [wn#body, pm#part: at least 1 fg#leg, pm#part: 1 oc#head](jj);
KIF: (believer '(forall ((?b wn#body)) (atLeastN 1 '?l fg#leg (pm#part '?b ?l))) jj)
     (believer '(forall ((?b wn#body)) (exists1 '?h oc#head (pm#part '?b ?h))) jj)
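The role of source prefixes in knowledge filtering can be illustrated with a minimal Python sketch. It is not how WebKB-2 stores or filters knowledge: the dictionary layout, the function names (split_identifier, term_sources, filter_on_sources) and the extra user id "xy" are assumptions made only for this illustration.

```python
# Hypothetical in-memory statements: each term is prefixed by the identifier
# of the source that created it, and each statement records its creator.
# The user id "xy" is invented for the example.
STATEMENTS = [
    {"subject": "wn#body", "relation": "pm#part", "object": "fg#leg", "creator": "jj"},
    {"subject": "wn#body", "relation": "pm#part", "object": "oc#head", "creator": "jj"},
    {"subject": "wn#body", "relation": "pm#part", "object": "fg#leg", "creator": "xy"},
]

def split_identifier(identifier):
    """Split 'source#name' into (source, name); source is None without a prefix."""
    source, separator, name = identifier.partition("#")
    return (source, name) if separator else (None, identifier)

def term_sources(statement):
    """Set of the sources of the three terms used in a statement."""
    return {split_identifier(statement[key])[0]
            for key in ("subject", "relation", "object")}

def filter_on_sources(statements, accepted):
    """Keep statements whose creator and whose terms all come from accepted sources."""
    return [s for s in statements
            if s["creator"] in accepted and term_sources(s) <= accepted]

kept = filter_on_sources(STATEMENTS, {"wn", "pm", "fg", "jj"})
```

With the accepted sources above, only the first statement is kept: the second uses a term from "oc" and the third was created by "xy".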
Below is an excerpt from a "structured discussion" about the use of XML for knowledge representation, a topic which leads to recurrent debates on many knowledge related mailing lists. Parentheses are used for two purposes: (i) allowing the direct representation of links from the destination of a link, and (ii) representing meta-information on a link, such as its creator (e.g., "pm" or "fg") or a link on this link (e.g., an objection by "pm" on the use of an objection link by "fg", without stating anything about the destination of this link). The content of the sentences and the indentation in the example below should permit the understanding of these two different uses. (Note that in this example the creators of the statements are left implicit but that prefixes such as "pm#" could be used exactly as in the previous example). The use of dashes to list "joint arguments/objections" (e.g., a rule and its premise) should also be self-explanatory. The use of specialization links between informal statements may seem odd but such links are used in several argumentation systems: they are essential for modularising purposes and for checking the updates of argumentation structures, and hence guiding or exploiting these updates (e.g., the (counter-)arguments for a statement also apply to its specializations and the (counter-)arguments of the specializations are (counter-)examples for their generalizations). The use of links on links to express meta-statements makes FL an adequate notation (i.e., simple, readable, generic) for representing argumentation structures (far more than FF or any other general notation). Few argumentation systems allow links on links (or, more generally, meta-statements; ArguMed is one of the exceptions) and hence most of these systems restrict knowledge entering or lead to incorrect representations of discussions (this is clearly true of systems restricted to Toulmin's argumentation structures, as for example noted by Newman & Marshall (1992)).
Even fewer systems provide a textual notation that is not XML-based. Such a notation is nonetheless necessary whenever the use of an XML parser, editor or viewer is impossible or not desirable (this is for example the case in many text-based email editors, in text-based browsers, and in PDF or HTML documents). Argumentation structures such as the ones below cannot be expected to be the "direct" result of a discussion but they may be the result of a semi-automatic re-organization of discussions and then they may be refined by further semi-formal discussions.
//The statements do not systematically begin by a capital letter in order not to limit
//their re-use; for example, if parts of these structures are directly re-used to
//generate English sentences, the problem of converting (or not) the initial uppercase
//into a lowercase does not have to be solved.

"a KRL (Knowledge Representation Language) can have an XML notation"
  extended_specialization: "a KRL should have an XML notation" (pm),
  argument: ("the data model of a KRL can be stored into a tree-based structure"
               argument: - "a graph-based model can be stored into a tree-based structure" (pm)
                         - "the data model of a KRL has to be graph-based" (pm)
            )(pm);

"a KRL should (also) have an XML notation"
  specialization: "the Semantic Web KRL should have an XML notation" (pm),
  argument: "an XML notation permits a KRL to use URIs and Unicode"
              (fg, objection: ("most syntaxes can easily be adapted to have object identifiers using URIs and Unicode"
                                 argument: "this was noted by Berners-Lee" (pm)
                              )(pm)),
  argument: "XML can be used for knowledge exchange or storage"
              (fg, objection: "XML is useless or detrimental for knowledge representation, exchange or storage" (pm)),
  argument: "a KRL may have various notations in addition to an XML-based notation"
              (pm, objection: "the more notations there are the less one of them is going to be commonly adopted for knowledge exchange" (pm)),
  argument: "not using XML for a notation implies that a plug-in has to be installed for each syntax"
              (pm, objection: "XML tools need to be complemented for the semantics of the knowledge representation to be handled" (pm),
                   objection: "installing a plug-in is likely to take less time than always loading XML files" (Sowa));

"the data model of a KRL has to be graph-based"
  argument: "this is acknowledged by about everyone" (pm),
  argument: "this is acknowledged by the W3C" (pm);
  //the keyword this is a shortcut accepted/used by the server to ease
  //knowledge entering/presentation; the statements are stored without
  //such a shortcut

"XML can be used for knowledge exchange or storage"
  argument: - "an XML notation permits classic XML tools (parsers, XSLT, ...) to be re-used" (pm)
            - "classic XML tools are usable even if a graph-based model is used" (pm);

"classic XML tools are usable even if a graph-based model is used"
  specialization: "classic XML tools work on RDF/XML" (pm);

"XML is useless or detrimental for knowledge representation, exchange or storage"
  argument: ("using XML tools for KBSs is a useless additional task"
               argument: "KBSs do not use XML internally"
                           (pm, objection: "XML can be used for knowledge exchange or storage"
                                  (fg, objection: "it is as easy to use other formats for knowledge exchange or storage" (pm),
                                       objection: "a KBS (also) has to use other formats for knowledge exchange or storage" (pm)))
            )(pm),
  argument: "XML is not a good format for knowledge exchange or storage" (pm);

"XML is not a good format for knowledge exchange or storage"
  argument: - ("XML-based knowledge representations are hard to understand"
                 argument: "this is acknowledged by about everyone" (pm),
                 argument: "this is acknowledged by the W3C" (pm)
              )(pm)
            - "a knowledge interchange format should be easy to read and understand with a simple editor, by trained people" (pm);
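The difference between a link on a statement and a link on a link can be made concrete with a small data-structure sketch. It is only an illustration in Python, not the internal model of WebKB-2: the class names (Link, Statement), the field on_link and the function contested_links are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    kind: str          # e.g. "argument", "objection", "specialization"
    destination: str   # text of the destination statement
    creator: str       # e.g. "pm" or "fg"
    # Links about this link itself (meta-statements), e.g. an objection to
    # the use of this link, which says nothing about the destination.
    on_link: list = field(default_factory=list)

@dataclass
class Statement:
    text: str
    links: list = field(default_factory=list)

statement = Statement("a KRL should (also) have an XML notation")

# fg uses a statement as an argument; pm objects to that use of the link.
argument = Link("argument", "XML can be used for knowledge exchange or storage", "fg")
argument.on_link.append(
    Link("objection",
         "XML is useless or detrimental for knowledge representation, exchange or storage",
         "pm"))
statement.links.append(argument)
statement.links.append(
    Link("specialization", "the Semantic Web KRL should have an XML notation", "pm"))

def contested_links(stmt):
    """Return (kind, creator, objectors) for each link that has an objection on it."""
    result = []
    for link in stmt.links:
        objectors = [m.creator for m in link.on_link if m.kind == "objection"]
        if objectors:
            result.append((link.kind, link.creator, objectors))
    return result
```

Here contested_links(statement) reports that the argument link created by "fg" is objected to by "pm", while the specialization link is uncontested. A system without links on links would have to attach the objection to the destination statement, which is a different claim.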
To conclude, for readability reasons and to support various kinds of knowledge entering or views on the knowledge, various formal and semi-formal notations should be supported but precision and normalisation should be encouraged. The above examples (and the implementation of FCG, FE and FL in WebKB-2) show that doing this while keeping the notations readable is possible.
From any object (category or statement) of the knowledge repository, a user should be able to see and browse the directly related objects and also the hierarchies of indirectly related objects via user-selected transitive links (e.g., specialization and partOf links). This remains true when the objects are presented as part of a list of results to a query. Yet, most interfaces force their users to browse in order to see the direct or indirect links from an object. In a realistic KB this makes the understanding and manual retrieval or comparison of information extremely difficult (except maybe for users who have a huge short-term memory). This is why, although graphical views clearly have some interest, textual notations such as FL and FCG are often more useful: they provide much more detail in the same amount of space. (A hopefully convincing experiment is to search for the various meanings of a word in WebKB-2, and then browse to find a concept identifier to use in a representation). Textual notations also ease manual and automatic re-use (e.g., copy-pasting, updating, parsing). Thus, ideally, both graphical and textual views should be available.
The specialization links between object classes or statements in a KB can be
manually inserted or can be re-calculated after each insertion using an inference engine.
Strictly speaking, a specialization link from an object A to an object B means
that B logically implies A. For example, a polynomial
graph-matching operation (Chein & Mugnier, 1997) can determine that
  [Ned, agent of: (a sell, object: a car)]
specializes
  [a vehicle, object of: a sell]
and hence permits this last graph to be used as a query graph to retrieve the first.
In WebKB-2, the operator "spec" can be used as in the command
  spec [a person, agent of: a sell]
and a similar operator can be used for retrieving
"extended specializations" (extended because there is not always a logical
implication between the retrieved graph and the query graph); for example,
extended specializations of the last query graph are
  [3 cars, object of: (2 sells, agent: Ned, time: 21/1/2001)]
and
  [John, believer of: not [Ned, agent of: a sell]]
(this last graph is
not a strict specialization but is definitely "relevant" information). Similarly,
graph-matching operators named "comp" and "?" are respectively usable for retrieving
"comparable" graphs (graphs that either specialize or generalize the query graph) and
"extended comparable" graphs (graphs that are only composed of parts comparable
to parts in the query graph). These operators only retrieve
"relevant" graphs, as opposed to graph-matching operations that also consider
"sibling" categories or "cousin" categories as "matching" categories.
This is why we believe that the links related to these last operators (e.g., the
"extended specialization" link) should also be presented whenever they are associated
to an object.
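The strict specialization test can be sketched as a search for a mapping from the nodes of the query graph to nodes of a stored graph that have the same or a more specialized type, such that every link of the query is preserved. The Python code below is a naive backtracking illustration under strong assumptions, not the polynomial operation of Chein & Mugnier (1997) and not WebKB-2's code: the names SUPERTYPES, Graph and specializes are hypothetical, the individual "Ned" is treated as a type below "person", and quantifiers, contexts and relation-type hierarchies are ignored.

```python
from dataclasses import dataclass

# Hypothetical type hierarchy: each type maps to its direct supertypes.
SUPERTYPES = {
    "Ned": ["person"],
    "person": ["thing"],
    "car": ["vehicle"],
    "vehicle": ["thing"],
    "sell": ["thing"],
    "thing": [],
}

def is_subtype(t, u):
    """True if t equals u or u is reachable from t via supertype links."""
    return t == u or any(is_subtype(s, u) for s in SUPERTYPES.get(t, []))

@dataclass
class Graph:
    nodes: dict  # node id -> type (or individual) name
    edges: set   # set of (source id, relation name, destination id)

def specializes(graph, query):
    """True if the query nodes can be mapped to graph nodes of the same or a
    more specialized type while preserving every link of the query."""
    query_ids = list(query.nodes)

    def extend(mapping, i):
        if i == len(query_ids):
            return all((mapping[s], r, mapping[d]) in graph.edges
                       for s, r, d in query.edges)
        q = query_ids[i]
        for g, g_type in graph.nodes.items():
            if is_subtype(g_type, query.nodes[q]):
                mapping[q] = g
                if extend(mapping, i + 1):
                    return True
                del mapping[q]
        return False

    return extend({}, 0)

# [Ned, agent of: (a sell, object: a car)]
fact = Graph({"n": "Ned", "s": "sell", "c": "car"},
             {("s", "agent", "n"), ("s", "object", "c")})
# [a vehicle, object of: a sell]
vehicle_query = Graph({"v": "vehicle", "s": "sell"}, {("s", "object", "v")})
# [a person, agent of: a sell]
person_query = Graph({"p": "person", "s": "sell"}, {("s", "agent", "p")})
```

Under these assumptions, the stored graph about Ned is found to specialize both query graphs, whereas neither query graph specializes it. Extended specializations, such as the graph about John's belief, would need a more permissive matcher than this one.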
Retrieving a graph may also involve using different statements that cannot be
joined/merged into one, for example because one of them has meta-statements associated
to it.
In such a case, the retrieved graph is the set of all the required separate statements.
This case often happens when path retrieval is done, that is, when the query graph
involves regular expressions, as in
  spec [a person, agent of: (a research, (relation: a thing)+ location: Brisbane)]
(assuming that "relation" and "thing" are respectively the
uppermost relation type and uppermost concept type, "(relation: a thing)+"
refers to any non-empty sequence of pairs, each composed of a relation node followed by a concept node).
This example shows that FCG allows paths and path matching to be expressed in a
simple way, and thus permits the use of a very powerful knowledge retrieval mechanism.
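The path query can be read as: find persons who are agents of a research from which one or more links of any type lead to an object located in Brisbane. The sketch below illustrates that reading with a breadth-first search in Python. The triples, the identifiers (r1, lab1, Ann, ...) and the function names are invented for the example, and this is not WebKB-2's path-matching procedure.

```python
from collections import deque

# Hypothetical KB content as (source, relation, destination) triples.
EDGES = [
    ("r1", "agent", "Ann"),
    ("r1", "funder", "lab1"),
    ("lab1", "part_of", "uni1"),
    ("uni1", "location", "Brisbane"),
    ("r2", "agent", "Bob"),
    ("r2", "funder", "lab2"),
    ("lab2", "location", "Sydney"),
]
TYPES = {"r1": "research", "r2": "research", "Ann": "person", "Bob": "person"}

def reachable_in_one_or_more_hops(start):
    """Map each node reachable from start by a non-empty sequence of links
    to one sequence of triples leading to it (breadth-first, so a shortest one)."""
    paths = {}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        for s, r, d in EDGES:
            if s == node and d not in paths:
                paths[d] = path + [(s, r, d)]
                queue.append((d, paths[d]))
    return paths

def persons_doing_research_linked_to(place):
    """For each matching person, return the separate statements forming the path."""
    results = {}
    for s, r, d in EDGES:
        if r == "agent" and TYPES.get(s) == "research" and TYPES.get(d) == "person":
            for node, path in reachable_in_one_or_more_hops(s).items():
                if (node, "location", place) in EDGES:
                    results[d] = path + [(node, "location", place)]
    return results

matches = persons_doing_research_linked_to("Brisbane")
```

The result associates each retrieved person with a list of triples, which mirrors the remark above that the retrieved graph may be a set of separate statements rather than a single merged one.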
When a list of results to a query is long, it should be structured into smaller lists, for example into specialization/partOf hierarchies. If this does not provide enough structure, additional schemes should be used: for example, the results can be grouped according to common characteristics; this is a categorization task which in the general case may be difficult to solve optimally and efficiently but important concepts (such as "process" and "physical object", or at a more detailed level, "person", "civil status", "recreational activity", etc.) provide cues for natural groupings. The ontology of WebKB-2 includes this kind of information.
Although precision should be pursued, informal documents will still be created. Allowing the embedding of commands or formal statements within informal documents, whether these statements are hidden or not, and then letting knowledge servers use these documents as inputs, has several advantages: (i) the informal elements (paragraphs, images, etc.) that document the formal representations or are indexed by them do not have to be stored in separate files: these formal and informal elements can be intertwined and hyperlinked to each other; (ii) when parsing the document, the knowledge server can simply return the command results or can also copy back the informal parts that are around, thus creating a "virtual document" (this is especially interesting when calls to the server can be associated with hyperlinks with predefined commands given as parameters). Furthermore, those returned results can be formal representations or, if the user prefers, the informal elements indexed by these representations. Those ideas were explored with WebKB-1 (Martin & Eklund, 2000) and most of them are also implemented in WebKB-2.
For representing certain comparisons of objects, such as the comparison of the features of certain techniques or tools, it is useful to use tables as a format. Such tables can be formal or semi-formal and can be used as inputs or outputs. Fact Guru (Skuce & Lethbridge, 1995) is one of the rare knowledge servers that generate comparison tables. More precisely, it permits the comparison of two objects by generating a table with the object identifiers as column headers, the identifiers of all their attributes as row headers, and for each cell either a mark to signal that the attribute does not exist for this object or a description of the destination object. The common generalizations of the two objects (possibly one of them) are also given. However, its approach is not scalable since the list of features/relations from the compared objects is not structured and the cells can contain a description of the destinations of the relations. A more scalable approach (Martin, 2005) is to organize the features of the compared objects into a specialization hierarchy and to use the cells only for indicating whether each compared object has or does not have (or will have, and when) each feature. Below is an example of a table generation query, followed by its result and then by the FL and FCG statements used for generating the result. In the cells, '+' means "yes" (the tool has the feature), '-' means "no", and '.' means that the information has not been represented. The prefixes for the relations are left implicit because this leads to no ambiguity (WebKB-2 can find the correct relations). As intermediary work on comparing tools that work with CGs, we compared, in the previous "CG tools" page on Wikipedia, seven tools well known in the CG community according to 160 criteria grouped into 6 sections and tables. These tables are informal but can easily be updated by the tool creators.
compare pm#WebKB-2 km#Ontolingua on (support of: a is#IR_task, output_language: a km#KR_notation, part: a is#user_interface), maxdepth 5
                                                       WebKB-2   Ontolingua
support of: is#IR_task                                    +          +
  is#lexical_search                                       +          +
  is#regular_expression_based_search                      +          .
  km#knowledge_retrieval_task                             +          .
  km#specialization_structural_retrieval                  +          .
  (kind: {km#complete_inferencing, km#consistent_inferencing},
   input: (a km#query, expressivity: km#PCEF_logic),
   object: (several statement, expressivity: km#PCEF_logic))
                                                          +          .
  km#generalization_structural_retrieval                  +          .
output_language: km#KR_notation                           +          +
  (expressivity: km#FOL)                                  +          +
  km#FCG                                                  +          .
  km#KIF                                                  .          +
  km#XML-based notation                                   +          .
  km#RDF                                                  +          -
part: is#user_interface                                   +          +
  is#HTML_based_interface                                 +          +
  is#CGI-accessible_command_interface                     +          .
  is#OKBC_interface                                       .          .
  is#API                                                  +          .
  is#graph_visualization_interface                        -          -

km#CG_related_tool < km#language/structure_specific_tool,
  > km#CG-based_KBMS km#CG_graphical_editor km#NL_parser_with_CG_output;
km#CG-based_KBMS < km#KBMS,
  > {km#CGWorld km#PROLOG\+CG km#CoGITaNT km#Notio km#WebKB};
km#WebKB;; > {km#WebKB km#WebKB-2}, url: http://www.webkb.org;
km#input_language (*x,*y) = [*x, may be support of: (a km#parsing, input: (a statement, formalism: *y))];

[any pm#WebKB-2,
  part: (a is#user_interface,
         part: {a is#API, a is#HTML_based_interface, a is#CGI-accessible_command_interface,
                no is#graph_visualization_interface}),
  part: {a is#FastDB, a km#default_MSO_of_WebKB-2},
  input_language: a km#FCG,
  output_language: {a km#FCG, a km#RDF},
  support of: a is#regular_expression_based_search,
  support of: a km#specialization_structural_retrieval,
  support of: a km#generalization_structural_retrieval,
  support of: (a km#specialization_structural_retrieval,
               kind: {km#complete_inferencing, km#consistent_inferencing},
               input: (a km#query, expressivity: km#PCEF_logic),
               object: (several km#statement, expressivity: km#PCEF_logic) )];
  //"PCEF": positive conjunctive existential formula

[any km#Ontolingua,
  part: {a is#HTML_based_interface, no is#graph_visualization_interface},
  input_language: a km#KIF,
  output_language: a km#KIF,
  part: {a km#ontolingua_library, no DBMS},
  support of: a is#lexical_search];
In this article, we only consider asynchronous cooperation since it both underlies and is more scalable than exchanges of information between co-temporal users of a system.
The most decentralized knowledge sharing strategy is the one the W3C envisages for the "Semantic Web": many very small KBs/ontologies, more or less independently developed and thus partially redundant, competing and very loosely interconnected. There are now many tools to align concepts from different ontologies; these tools are necessarily far from perfect although they can be sufficient for certain applications; Euzenat & al. (2005) give an evaluation. Thus, despite these tools, the above-cited small ontologies have problems similar to those we listed for documents: (i) finding the relevant ontologies, choosing between them and combining them require commonsense (and hence are difficult and sub-optimal even for a knowledge engineer, let alone for a machine), (ii) a knowledge provider cannot simply add one concept or statement "at the right place" and is not guided by a large ontology into providing precise objects that complement existing objects and are more easily re-used, and (iii) the result is more or less lost to others and increases the amount of "data" to search.
A more knowledge-oriented strategy is to have a knowledge server permitting registered users to access and update a single large ontology on a domain and upload files that mix natural language sentences with knowledge representations. We know of only two knowledge servers having special protocols to support cooperation between users: Co4 (Euzenat, 1996) and WebKB-2. (Note: most servers support concurrency control and many servers support users' permissions on files/KBs; however, cooperation support is not so basic: it is about helping knowledge re-use, preventing most conflicts and solving those detected by the system or users). The approach of Co4 is based on peer reviewing; the result is a hierarchy of KBs, the uppermost ones containing the most consensual knowledge while the lowermost ones are the private KBs of the contributing users. We believe the approach of WebKB-2, which is based on a KB shared by all its users, leads to more relations between categories (types or individuals) or statements from the different users and may be easier to handle (by the system and the users) for a large amount of knowledge and large number of users. Details can be found in Martin (2003a) but the next paragraph summarizes its principles.
Each category identifier is prefixed by a short identifier for the category
creator (who is also represented by a category and thus may have
associated statements). Each statement also has an associated creator and
hence, if it is not a definition, may be considered as a belief.
Any object (category or statement) may be re-used by any user within his/her
statements.
The removal of an object can only be done by its creator but a user may
"correct" a belief by connecting it to another belief via a
"corrective relation" (e.g., pm#corrective_restriction).
(Definitions cannot be corrected since they cannot be false; for example, a
user such as "fg" is perfectly entitled to define fg#cat as a subtype of
wn#chair; there is no inconsistency as long as the ways fg#cat is further
defined or used respect the constraints associated with wn#chair).
If entering a new belief introduces a redundancy or an inconsistency that is
detected by the system, it is rejected. The user may then either correct his/her
belief or re-enter it connected by a "corrective relation" to each
belief it is redundant or inconsistent with: this allows and makes explicit
the disagreement of one user with (his/her interpretation of) the belief of another
user. This also technically removes the cause of the problem: a proposition A
may be inconsistent with a proposition B but a belief that
"A is a correction of B" is not technically inconsistent with a belief in B.
(Definitions from different users cannot be inconsistent with each other,
they simply define different categories/meanings;
a system of "category cloning" could be used to handle this situation
automatically but the resulting ontology would be much more complex than
with the manual handling of the situation by each category creator that is
occasionally faced with it; hence, such a system has not been implemented in
WebKB-2).
Choices between beliefs may have to be made by people re-using the KB for an
application, but then they can exploit the explicit relations between beliefs,
for example by always selecting the most specialized ones. The query engine of WebKB-2
always returns a statement with its meta-statements, hence with the
associated corrective relations. Finally, in order to avoid seeing the objects
of certain creators during browsing or within query results, a user may set
filters on these creators, based on their identifiers, types or descriptions.
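The principles summarized above can be illustrated by a minimal Python sketch. It is not the WebKB-2 implementation: the names `Belief`, `SharedKB` and `conflicts_with` are hypothetical, and the redundancy/inconsistency detection performed by the real system is replaced here by a function supplied by the caller. The sketch only shows the protocol: a conflicting belief is rejected unless it is connected by a corrective relation to each belief it conflicts with, only a creator can remove his/her belief, and a query returns a belief together with the corrections made on it.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    """A non-definitional statement, recorded with its creator."""
    ident: str
    creator: str
    text: str
    # Maps a corrective relation name to the identifiers of the corrected beliefs.
    corrects: dict = field(default_factory=dict)


class SharedKB:
    """Toy shared KB; conflict detection is delegated to a caller-supplied function."""

    def __init__(self, conflicts_with):
        # conflicts_with(new_text, old_text) -> bool is a hypothetical stand-in
        # for the redundancy/inconsistency checks of a real knowledge server.
        self.conflicts_with = conflicts_with
        self.beliefs = {}

    def add(self, belief):
        """Accept the belief unless it conflicts with a belief it does not correct."""
        corrected = {i for ids in belief.corrects.values() for i in ids}
        clashes = [b.ident for b in self.beliefs.values()
                   if self.conflicts_with(belief.text, b.text)
                   and b.ident not in corrected]
        if clashes:
            raise ValueError(f"rejected, conflicts with: {clashes}")
        self.beliefs[belief.ident] = belief

    def remove(self, ident, user):
        """Only the creator of a belief may remove it."""
        if self.beliefs[ident].creator != user:
            raise PermissionError("only the creator may remove this belief")
        del self.beliefs[ident]

    def query(self, ident):
        """Return a belief together with the corrective relations pointing to it."""
        target = self.beliefs[ident]
        corrections = [(rel, b.ident) for b in self.beliefs.values()
                       for rel, ids in b.corrects.items() if ident in ids]
        return target, corrections
```

For example, with a toy conflict rule, a second user's belief about the same subject is first rejected and then accepted once it is entered with `{"pm#corrective_restriction": ["b1"]}` as its `corrects` value; `query("b1")` then lists that correction alongside the original belief.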
For the construction of knowledge repositories, an interesting aspect of
this approach to encourage re-use, precision and object connectivity is that it
also works for semi-formal KBs. Here, the adjective "semi-formal" means that
statements may use informal terms and may even be written in a natural
language, but that each statement must at least be related to another
statement by a formal relation, for example a generalization relation
(pm#corrective_generalization, pm#summary, etc.) or an argumentation relation.
Thus, to minimize redundancies and to help information retrieval within
information repositories, this minimal semantic structure (which in many cases
is the only one acceptable to many people) can be used to organize ideas that
are otherwise repeated in many documents. For instance, for a Web site that
centralizes and organizes/represents in a formal, semi-formal and
informal way resources (tools, techniques, publications, mailing lists, teams,
etc.) related to a domain, it would be very interesting to have some space where
discussions could be conducted in this minimal semi-formal way, and hence
index or partly replace the mailing list: this would avoid recurring
discussions or presentations of arguments, show the tree of arguments and
counter-arguments for an idea, permit incremental additions, encourage deeper
or more systematic explorations of each idea, and record the various
status quos reached. Finally, it is quite possible that structured discussions in
different KBs or documents can be automatically aligned or merged more
successfully than categories from different ontologies can be aligned.
The above described knowledge sharing mechanism of WebKB-2 records and exploits annotations by individual users on statements but does not record and exploit any measure of the "usefulness" of each statement, a value representing its "global interest", acceptance, popularity, originality, etc. Yet, this seems interesting for a knowledge repository and especially for semi-formal discussions: statements that are obvious, un-argued, or for which each argument has been counter-argued, should be marked as such (e.g. via darker colors or smaller fonts) in order to make them less visible (or invisible, depending on the selected display options) and discourage the entering of such statements. More generally, the presentation of the combined efforts from the various contributors may then take into account the usefulness of each statement. Furthermore, given that the creator of each statement is recorded, (i) a value of usefulness may also be calculated for each creator (and displayed), and (ii) in return, this value may be taken into account to calculate the usefulness of the creator's contributions; these are two additional refinements to both detect and encourage argued and interesting contributions, and hence regulate them.
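As a rough illustration of the marking discussed above, the following Python sketch labels a statement as un-argued or as having all its arguments counter-argued. The names `Statement` and `display_status` are hypothetical, and the sketch makes a simplifying assumption that is not taken from WebKB-2: an argument counts as counter-argued as soon as it has at least one counter-argument.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A statement in a semi-formal discussion (hypothetical structure)."""
    text: str
    arguments: list = field(default_factory=list)          # statements supporting this one
    counter_arguments: list = field(default_factory=list)  # statements opposing this one


def display_status(statement):
    """Return a label that a display layer could use to de-emphasize a statement."""
    if not statement.arguments:
        return "un-argued"
    # Assumption: one counter-argument is enough to consider an argument countered.
    if all(arg.counter_arguments for arg in statement.arguments):
        return "all arguments counter-argued"
    return "argued"
```

A display layer could then map the first two labels to a smaller font or hide the corresponding statements, depending on the selected display options.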
Ideally, the system would accept user-defined measures of usefulness for a statement or a creator, and adapt its display of the repository accordingly. Below, we present the default measure implemented in WebKB-2. We may try to support user-defined measures but since each step of the user's browsing would imply dynamically re-calculating the usefulness of all statements (except those from WordNet) and all creators, the result may be very slow. For now, we only consider beliefs: we have not yet defined the usefulness of a definition.
To calculate the usefulness of a belief, we first associate two more basic attributes to the belief: its "state of confirmation" and its "global interest".
Our formula for a user's usefulness is:
  sum of the usefulness of the beliefs from the user
  + square root (number of times the user voted on the interest of beliefs).
The second part of this equation acknowledges the participation of the
user in votes while giving each additional vote less weight as the number of
votes increases.
(Functions that grow more slowly than the square root may perhaps
better balance originality and participation effort).
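The formula for a user's usefulness can be written as a small Python function. This is only a sketch: the function and parameter names are hypothetical, and the usefulness of each belief is taken as an already computed input.

```python
import math


def user_usefulness(belief_usefulness_values, interest_votes_cast):
    """Sum of the usefulness of the user's beliefs plus sqrt(number of votes cast)."""
    if interest_votes_cast < 0:
        raise ValueError("the number of votes cannot be negative")
    return sum(belief_usefulness_values) + math.sqrt(interest_votes_cast)


# Three beliefs with usefulness 2.0, -1.0 and 0.5, and 9 votes cast:
# 1.5 + sqrt(9) = 4.5
print(user_usefulness([2.0, -1.0, 0.5], 9))
```

With this function, going from 9 to 16 votes adds 1 to the result, just as the very first vote does: participation is rewarded, but with diminishing returns.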
These measures are simple but should incite the users to be careful and precise in their contributions (affirmations, arguments, counter-arguments, etc.) and give arguments for them: unlike in traditional discussions or anonymous reviews, careless statements here penalise their authors. Thus, this should lead users not to make statements outside their domain of expertise or without verifying their facts. (Using a different pseudonym when providing low quality statements does not seem to be a helpful strategy to escape the above approach since this reduces the number of authored statements for the first pseudonym). On the other hand, the above measures should hopefully not lead "correct but outside-the-main-stream contributions" to be under-rated since counter-arguments must be justified. Finally, when a belief is counter-argued, the usefulness of its author decreases, and hence he/she is incited to deepen the discussion or remove the faulty belief.
In his description of a "Digital Aristotle", Hillis (2004) describes a "Knowledge Web" to which researchers could add ideas or explanations of ideas "at the right place", and suggests that this Knowledge Web could and should "include the mechanisms for credit assignment, usage tracking, and annotation that the [current] Web lacks", thus supporting a much better re-use and evaluation of the work of a researcher than via the current system of article publishing and reviewing. However, Hillis does not give any indication on such mechanisms. Although the mechanisms we proposed in this sub-section and the previous one were intended for one knowledge repository/server, they seem usable for the Knowledge Web too. To complement the approach with respect to the Knowledge Web, the next sub-section proposes a strategy to achieve knowledge sharing between knowledge servers.
Again, an alternative (or, in the long term, complementary) approach is the one of Co4 which, via its hierarchy of KBs generated by peer-reviewing of statements from the users' private KBs, supports knowledge sharing and makes explicit various consensuses. However, assuming there are N statements shared by the users of Co4, there could in the worst case be 2^N possible KBs if the protocols accept all groupings. Even though this is surely not the case, which KBs should a person look at to find relations between statements or to evaluate the usefulness of a statement/author? Furthermore, the uppermost KBs only represent consensus, not usefulness.
Although independently developed, our approach appears to be an extension of the version of SYNVIEW described by Lowe (1985). In this hypertext system, statements had to be connected by (predefined or user-invented) relations and each statement was rated by users (this value and another one calculated from the value of arguments and counter-arguments for the statement were simply displayed near the statement so as to "summarize the strengths assigned to the various items of evidence within the given contexts"). In 1986, to ease information entering and thus hopefully permit the collaborative work of a small community to create an information repository large enough to interest other people and lead them to participate and store information too, the authors of SYNVIEW removed the constraint of using explicit relations between statements (the statements must be organized hierarchically but the relations linking them are unknown) and replaced the possibility of grading each statement by the possibility of ranking them within the list of (sibling) statements having the same direct super-statement. As noted above, a similar move away from structured representations was made by Buckingham-Shum & al. (1999) for the same reason and with the idea of making the approach more "scalable". Although such a move clearly makes information entering easier, in our view it makes the system far less likely to be scalable because the information is far less retrievable and exploitable, and hence of less interest for people to search or complement. Such moves have apparently failed to attract more interest than the original more structured approaches. Since unstructured approaches have strong inherent limitations, we are opting for a move towards improving the entering and sharing of structured forms.
Despite the fact that the above described supports for cooperation rely on some centralizing mechanisms, it should be understandable that they could be supported by a peer-to-peer network (although they are easier to implement within a knowledge server). However, these supports do not solve the problem caused by the fact that one piece of information can be of interest in many domains and the fact that one knowledge server (or peer-to-peer network) clearly cannot support the knowledge sharing of all Web users; this problem is "which knowledge server should a person choose to query or update?". A server has to be specialized or to act as a broker for more specialized servers. If competing servers had an equivalent content (today, Web search engines already have "similar" content), a Web user could query or update any general server and, if necessary, be redirected to use a more specialized server, and so on recursively (at each level, only one of the competing servers has to be tried since they mirror each other). If a Web user directly tried a specialized server, it could redirect him/her to use a more appropriate server or indicate which other servers may provide more information for his/her query (or directly forward this query to these other servers).
To permit this, our idea is that each server periodically checks
related servers (more general servers, competing servers
and slightly more specialized servers) and
1) integrates (and hence mirrors) all the objects (categories and
statements) generalizing the objects in a reference collection that
it uses to define its "domain" (if this is a general server, this collection
is reduced to pm#thing
, the uppermost concept type),
2) integrates either all the objects that are more specialized than the
objects in the reference collection, or if a certain depth of specialization
is fixed, associates to its most specialized objects the URLs of the servers
that can provide specializations for these objects (note: classifying servers
according to fields/domains is far too coarse to index/retrieve knowledge
from distributed knowledge servers, e.g. knowledge about "neurons" or "hands"
can be relevant to many domains; thus, a classification by objects is
necessary), and
3) also associates the URLs of more general servers to the direct
specializations of the generalizations of the objects in the reference
collection (this is needed since the specializations of some of these
specializations do not generalize nor specialize the objects in the
reference collection).
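The three steps above can be sketched in Python over a toy generalization hierarchy. This is an illustration under stated assumptions, not the protocol of WebKB-2: the function names, the `supertypes` dictionary (object to set of direct generalizations) and the returned sets are hypothetical, the hierarchy is assumed to be acyclic, and step 3 is read here as tagging the direct specializations of the generalizations that are not themselves mirrored.

```python
from collections import deque


def closure(start, neighbours, max_depth=None):
    """Breadth-first closure from `start`; returns each reached object with its depth."""
    depth = {obj: 0 for obj in start}
    queue = deque(start)
    while queue:
        obj = queue.popleft()
        if max_depth is not None and depth[obj] >= max_depth:
            continue  # do not expand beyond the fixed depth of specialization
        for nxt in neighbours.get(obj, ()):
            if nxt not in depth:
                depth[nxt] = depth[obj] + 1
                queue.append(nxt)
    return depth


def mirroring_plan(supertypes, reference, max_depth=None):
    """Return (mirrored, link_to_specialized, link_to_general) for a server's domain."""
    # Invert the hierarchy to get the direct specializations of each object.
    subtypes = {}
    for obj, parents in supertypes.items():
        for parent in parents:
            subtypes.setdefault(parent, set()).add(obj)

    reference = set(reference)
    # Step 1: all generalizations of the reference collection are mirrored.
    generalizations = set(closure(reference, supertypes)) - reference
    # Step 2: specializations are mirrored, possibly down to a fixed depth only.
    spec_depth = closure(reference, subtypes, max_depth)
    mirrored = generalizations | set(spec_depth)
    # Step 2 (continued): mirrored objects with unmirrored specializations get
    # the URLs of servers that can provide these specializations.
    link_to_specialized = {obj for obj in spec_depth
                           if subtypes.get(obj, set()) - mirrored}
    # Step 3: unmirrored direct specializations of the generalizations get
    # the URLs of more general servers.
    link_to_general = {child for g in generalizations
                       for child in subtypes.get(g, ()) if child not in mirrored}
    return mirrored, link_to_specialized, link_to_general
```

For a general server whose reference collection is reduced to pm#thing, the set of generalizations is empty and, when no depth is fixed, every known object is mirrored.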
Integrating knowledge from other servers is certainly not obvious but this is a more scalable and exploitable approach than letting people and machines select and re-use or integrate dozens or hundreds of (semi-)independently designed small ontologies. A more fundamental obstacle to the widespread use of this approach is that many industry-related servers are likely to make it difficult or illegal to mirror their KBs. However, other approaches will likely suffer from that too.
Instead of replicating knowledge between servers on the Web, the above approach could be used as a knowledge replication mechanism between the machines of a particular peer-to-peer network or semantic grid, with the assumption that each machine has its own knowledge server (probably used by only one person, the owner of the machine). Knowledge sharing is simpler in this case since knowledge replication mechanisms can be systematically performed according to the goals and particularities of that grid, and the above replication mechanism can be seen as a backbone for integrating into each user's KB the updates made in other users' KBs on objects shared by these KBs (more precisely, the updates/additions of statements which for inference purposes are important for these shared objects).
The users of a knowledge repository cannot be asked to update the shared ontology for declaring and defining most of the terms they use. A "lexical ontology" for English or another natural language should at least be provided to ease, check and guide knowledge entering and permit knowledge sharing and retrieval. A lexical ontology connects the words of a language to the categories representing the main meanings of these words, and connects these categories via conceptual links: unlike in foundational ontologies, no more complex definitions are generally provided; however, for important categories such as the top-level ones and the often used ones, "schemas" representing the relations that are commonly used from instances of these important categories should be provided (for example as a resource for NLU or for the generation of combinable menus that guide and normalise knowledge representation). Martin (2003b) describes how such a lexical ontology was created by adapting and correcting WordNet 1.7 and also by extending it with several top-level ontologies and domain-related ontologies. However, many more corrections and additions of links, schemas and ontologies remain to be made.
A knowledge repository also requires an initial domain ontology for inciting people to enter knowledge and guiding this entering. Our experiment in building a repository about CG related tools led us to partially model other related domains, and we soon had the problem of modularising the information into several files to support readability, searches, checking and systematic input. In order to be generic, we created six files: Fields of study, Systems of logic, Information Sciences, Knowledge Management, Conceptual Graph and Formal Concept Analysis. The last three files specialize the others. Each of the last four files is divided into sections, the uppermost ones being "Domains and Theories", "Tasks and Methodologies", "Structures and Languages", "Tools", "Journals, Conferences, Publishers and Mailing Lists", "Articles, Books and other Documents" and "People: Researchers, Specialists, Teams/Projects, ...". This is a work in progress: the content and number of files will increase but the sections seem stable. Here are examples of the content for the first three sections.
// ========================= Domains and Theories =========================
wn#computer_science__computational_science  //in FL, "__" separates alternative names
  (^branch of engineering science that studies computable processes and structures^)
  subdomain: wn#artificial_intelligence,
  subdomain: is#software_engineering_science (is),
  subdomain: is#database_management_science (is),
  subdomain of: wn#engineering_science__engineering__applied_science__technology,
  part: wn#information_theory,
  part of: wn#information_science;

// ========================= Tasks and Methodologies =========================
km#KM_task__knowledge_management__KM  (^a K.M. (sub)task^)
  < is#information_sciences_task,
  > km#knowledge_representation km#knowledge_extraction_and_modelling
    km#knowledge_comparison km#knowledge_retrieval_task km#knowledge_creation
    km#classification km#KB_sharing_management
    km#mapping/merging/federation_of_KBs km#knowledge_translation
    km#knowledge_validation
    {km#monotonic_reasoning km#non_monotonic_reasoning}
    {km#consistent_inferencing km#inconsistent_inferencing}
    {km#complete_inferencing km#incomplete_inferencing}
    {km#structure-only_based_inferencing km#rule_based_inferencing}
    km#teaching_a_KM_related_subject km#language/structure_specific_task
    km#KM_methodology_task,
  object of: km#knowledge_management_science,
  object: km#KM_structure;  //between types, the default cardinality is 0..N
  //The general relation "object" has different (more specialized) meanings depending on
  // the connected categories: in the last relation, the meaning is "task object"
  // (object worked on or generated by the task) not "domain object".

km#knowledge_retrieval_task
  < is#IR_task,
  > {km#specialization_retrieval km#generalization_retrieval}
    km#analogy_retrieval km#structure_only_based_retrieval
    {km#complete_knowledge_retrieval km#incomplete_knowledge_retrieval}
    {km#consistent_knowledge_retrieval km#inconsistent_knowledge_retrieval};

// ========================= Structures and Languages =========================
km#KM_structure
  < is#symbolic_structure,
  > {km#base_of_facts/beliefs km#ontology km#KB_category km#statement}
    km#KB km#KA_model km#KR_language km#language_specific_structure;

km#KB__knowledge_base
  part: km#ontology km#base_of_facts/beliefs;

km#ontology__set_of_category_definitions/constraints
  > km#lexical_ontology km#language_ontology km#domain_ontology
    km#top_level_ontology km#concept_ontology km#relation_ontology
    km#multi_source_ontology,
  part: 1..* km#KB_category 1..* km#category_definition;

km#KR_language__KRL__KR_model_or_notation
  > {km#KR_model/structure km#KR_notation}  //not km#semantics: not a structure
    km#predicate_logic_oriented_language km#frame_oriented_language
    km#graph_oriented_language km#KR_language_with_query_commands
    km#KR_language_with_scripting_capabilities,
  attribute: km#semantics;

km#CG_structure
  < km#language_specific_structure,
  > km#CG_statement km#CG_language km#CG_ontology;
Despite the existence of the Web, publishing information still essentially implies writing documents and hence introducing related information and choosing an order and a level of detail to present this information. This linearization exercise is time-consuming and can make information publishing/retrieval a frustrating experience since the vast majority of what has to be written/read is not new from the viewpoint of the writer/reader. Furthermore, all these redundancies make information hard to find, structure and compare, whether manually or automatically. Yet, almost all research in information retrieval/sharing has avoided using knowledge-oriented approaches because (i) these approaches are more difficult to implement, (ii) they require research on various fronts, and (iii) making semantic relationships explicit is not an exercise that people are (nowadays) accustomed to. In this article, we have described various elements of solutions for pushing those technical and psychological boundaries toward more formal, structured and normalised knowledge entering or retrieval. These elements relate to syntaxes, querying and comparison mechanisms, cooperation protocols or approaches, and ontologies. By continuing to experiment in the creation and support of knowledge repositories, we shall refine those elements and, hopefully, make such repositories popular. Of course, we do not believe that informal documents will or should disappear but we believe that structured discussions and, more generally, a normalised use of semantic relations, can become popular and enhance information retrieval. Our current main experiment relates to the representation and comparison of tools, techniques and ideas related to knowledge engineering. We shall also very soon experiment with the use of such repositories for e-learning purposes.
S. Buckingham-Shum, E. Motta and J. Domingue (1999). Representing Scholarly Claims in Internet Digital Libraries: A Knowledge Modelling Approach. Proceedings of ECDL 1999 (pp. 423-442), 3rd European Conf. Research and Advanced Technology for Digital Libraries, Paris, France, September 1999.
M. Chein and M.L. Mugnier (1997). Positive Nested Conceptual Graphs. Proceedings of ICCS 1997 (Springer Verlag, LNAI 1257, pp. 95-109), Seattle, USA, August 4-8, 1997.
J. Euzenat (1996). Corporate memory through cooperative creation of knowledge bases and hyper-documents. Proceedings of 10th KAW, (36)1-18, Banff, Canada, November 1996.
J. Euzenat, H. Stuckenschmidt and M. Yatskevich (2005). Introduction to the Ontology Alignment Evaluation 2005. Proceedings of K-Cap 2005 (pp. 61-71), workshop on Integrating ontology, Banff, Canada, 2005.
W.D. Hillis (2004). "Aristotle" (The Knowledge Web). Edge Foundation, Inc., No 138, May 6, 2004.
D. Lowe (1985). Co-operative Structuring of Information: The Representation of reasoning and debate. International Journal of Man-Machine Studies, Volume 23, Number 2, pp. 97-111, August 1985.
Ph. Martin and P. Eklund (2000). Knowledge Indexation and Retrieval and the World Wide Web. IEEE Intelligent Systems, special issue "Knowledge Management and Knowledge Distribution over the Internet", pp. 18-25, May/June 2000.
Ph. Martin (2002). Knowledge representation in CGLF, CGIF, KIF, Frame-CG and Formalized-English. Proceedings of ICCS 2002, 10th International Conference on Conceptual Structures (Springer Verlag, LNAI 2393, pp. 77-91), Borovets, Bulgaria, July 15-19, 2002.
Ph. Martin (2003a). Knowledge Representation, Sharing and Retrieval on the Web. Chapter of a book titled "Web Intelligence", (Eds.: N. Zhong, J. Liu, Y. Yao; Springer-Verlag, pp. 263-297), January 2003.
Ph. Martin (2003b). Correction and Extension of WordNet 1.7. Proceedings of ICCS 2003 (Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 2003.
Ph. Martin, M. Blumenstein and P. Deer (2005). Toward cooperatively-built knowledge repositories. Proceedings of ICCS 2005, 13th International Conference on Conceptual Structures, (Springer Verlag, LNAI 3596, pp. 411-424), Kassel, Germany, July 18-22, 2005.
W. Schuler and J.B. Smith (1990). Author's Argumentation Assistant (AAA): A Hypertext-Based Authoring Tool for Argumentative Texts. Proceedings of ECHT'90 (Cambridge University Press, pp. 137-151), INRIA, France, Nov. 1990.
D. Skuce and T.C. Lethbridge (1995). CODE4: A Unified System for Managing Conceptual Knowledge. International Journal of Human-Computer Studies, 42, pp. 413-451. Fact Guru is the commercial version of CODE4.
J. Nanard, M. Nanard, A. Massotte, A. Djemaa, A. Joubert, H. Betaille and J. Chauché (1993). Integrating Knowledge-based Hypertext and Database for Task-oriented Access to Documents. Proceedings of DEXA 1993 (Springer Verlag, LNCS Vol. 720, pp. 721-732), Prague, 1993.
S. Newman and C. Marshall (1992). Pushing Toulmin Too Far: Learning From an Argument Representation Scheme. Technical Report SSL-92-45, Xerox Palo Alto Research Center, Palo Alto, CA, 1992.
J.F. Sowa (1984). Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.
J.F. Sowa (2006). Concept Mapping. Web document. http://www.jfsowa.com/talks/cmapping.pdf