The query&representation languages of WebKB

From conceptual graphs to structured text,
and knowledge base handling to document handling

Abstract.

The languages of the WebKB set of tools are "commands" or "instructions" which may be structured in four categories:
- commands for representing and querying knowledge at different levels of precision;
- Unix-like text processing commands - such as cat, grep, awk and diff - but working on Web-accessible information sources (documents, database elements, etc.);
- control structures - such as if, for, set, pipe - for combining other commands;
- commands for executing commands stored in Web-accessible documents.

These commands may be executed by the WebKB processor which is Web-accessible via the Common Gateway Interface. Thus, the commands may be used from the HTML forms of the WebKB query interfaces or from any software using the CGI protocol.

You may want to use the following interfaces when reading the presentation below. The possible commands are also listed and briefly commented in these interfaces.
- the knowledge-based information retrieval/handling tool;
- the classic information retrieval/handling tool.

Keywords.

Knowledge representation language, Conceptual Graphs, Controlled language, Query language

Table of contents

Document loading&execution commands
Knowledge handling commands
Text processing commands
Control structures
References

1 Document loading&execution commands

A command or a group of commands stored in a document, e.g. an HTML document, needs to be enclosed inside the marks $( and )$ or the HTML marks <KR language="CG"> and </KR> in order to be separated from the other elements of the document. If these marks are enclosed inside the HTML comment marks <!-- and -->, the commands will not be displayed by a Web browser. In any case, a document may be peppered with such groups of commands; it is thus possible to mix knowledge and its documentation (cf. the KADS documentation for an example). C-like comments (/* ... */) or C++ line comments (//...) may be used around or inside commands.
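
As an illustration of the delimiters described above, here is a minimal Python sketch of how a client program might pull the command groups out of a document. It is not the WebKB implementation: the function name extract_command_groups is hypothetical, and the sketch assumes that groups are not nested and that the marks are matched literally.

```python
import re

# Patterns for the two kinds of delimiters described above.
# Assumption: groups are not nested and the marks are matched literally.
DOLLAR_GROUP = re.compile(r"\$\((.*?)\)\$", re.DOTALL)
KR_GROUP = re.compile(r'<KR language="CG">(.*?)</KR>', re.DOTALL | re.IGNORECASE)


def extract_command_groups(document: str) -> list[str]:
    """Return the text of every command group, in document order."""
    matches = list(DOLLAR_GROUP.finditer(document)) + list(KR_GROUP.finditer(document))
    matches.sort(key=lambda m: m.start())
    return [m.group(1).strip() for m in matches]


page = """<p>Some documentation.</p>
<!-- $( Something > (Entity, Situation); )$ -->
<KR language="CG">[Cat]->(On)->[Mat];</KR>"""
groups = extract_command_groups(page)
```

The first group is hidden from a Web browser by the HTML comment marks but is still found by the extraction.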

Directed by the commands run or load, the WebKB processor retrieves a Web-accessible document and executes the commands it includes. With run, the results of each command are preceded by the text of the command. Here is the grammar of these commands. + means at least one occurrence.

Display and then execute each command in the document: "run" Document Argument+
Just execute the commands in the document: "load" Document Argument+

Documents are referred to by URLs. Arguments are single/double quoted strings, numbers, identifiers (cf. the definition of Ident in section 2.1.2), comparison operators, or otherwise, any single character. Inside the document, an argument is referred to as in a shell script, i.e. by a variable of the form $n, n being the position number of the argument in the parameters. The script whatIs.html illustrates this.
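
The positional substitution can be pictured with the following Python sketch. It is only an illustration: the function substitute_arguments is hypothetical, positions are assumed to start at 1 as in a shell script, and leaving an unmatched $n untouched is an assumption, not documented WebKB behaviour.

```python
import re


def substitute_arguments(script: str, arguments: list[str]) -> str:
    """Replace each $n in the script with the n-th argument (1-based)."""

    def replace(match: re.Match) -> str:
        position = int(match.group(1))
        if 1 <= position <= len(arguments):
            return arguments[position - 1]
        return match.group(0)  # assumption: an unmatched $n is left untouched

    # \$(\d+) matches $1, $2, ... but not the group mark "$(".
    return re.sub(r"\$(\d+)", replace, script)


script = "spec $1; def of $1; type of $2;"
expanded = substitute_arguments(script, ["Cat", "Tom"])
```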

Another command, named call, encodes and appends the arguments to the document URL (the HTTP GET method is followed), then returns the information at this address. This is useful for querying a Web-accessible information source, e.g. a CGI server and a database, or simply for getting the content of a document. Here is the grammar of this command.
Call a CGI server (or open a document): "call" Document Argument+
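
The following Python sketch shows one way the arguments of call could be encoded and appended to a URL for a GET request. The parameter names arg1, arg2, ... and the function build_call_url are hypothetical: the text above does not say how WebKB names the encoded arguments, so this is only a sketch under that assumption.

```python
from urllib.parse import urlencode


def build_call_url(document_url: str, arguments: list[str]) -> str:
    """Append URL-encoded arguments to document_url as a GET query string."""
    if not arguments:
        return document_url  # plain document retrieval
    # Hypothetical parameter names: the source does not specify them.
    pairs = [(f"arg{i}", value) for i, value in enumerate(arguments, start=1)]
    separator = "&" if "?" in document_url else "?"
    return document_url + separator + urlencode(pairs)


url = build_call_url("http://example.org/cgi-bin/search", ["cat on mat", "a&b"])
```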

2 Knowledge handling commands

2.1 Knowledge assertion commands

2.1.1 Languages based on the Conceptual Graph (CG) formalism

We have chosen the CG formalism as our base representation language since it has the following properties, which are necessary for WebKB to be used both in knowledge modeling and information retrieval:
- it is logic-based and supports searches via arbitrarily complex or general queries by exploiting the specialisation relations between types and therefore between CGs;
- it is expressive since it allows the use of existentially quantified variables, contexts and sets (note: in WebKB, contexts and sets are simply ignored during searches, CGs using them are retrieved as if they were simple existential CGs);
- it is network-based (thus rather intuitive) and has both a graphic notation and a linear notation.

If you are not familiar with CGs, please click here for pointers and a short presentation.

WebKB accepts the usual CG textual notation but also simpler notations: a frame-oriented CG notation, some HTML structures and, soon, indented text and a formalised English. These notations are less expressive than the usual CG notation but more readable or handy to use when constructing documents mixing knowledge with other document elements. The frame-oriented CG notation and formalised English may also be handy for making queries. Indeed, knowledge represented with one notation may be queried via other notations.

Facilities for not having to declare types before using them in CGs are also proposed.

2.1.2 Declaring, defining and ordering types

Types may be declared and ordered directly, or via type definitions. We first give some commented examples then the general grammar.

The following command declares the concept type Something and associates an annotation to it. (As opposed to comments, annotations are not considered as spaces by the lexical analyser, they are stored with the object they annotate, and thus their content may be exploited by applications).
Something {* something instance of a first order concept type *};

The next command does the same thing but also declares three subtypes of Something. The first two, Entity and Situation, are between parentheses to represent the fact that they belong to the same partition of exclusive types. They are not exclusive with the third type, Something_playing_a_role. This type is alone in its partition and thus doesn't need to be enclosed in parentheses. Types are exclusive if they must not have common subtypes (and hence common instances, since an individual can only be an instance of one concept type).
Something {* something instance of a first order concept type *} > (Entity, Situation), Something_playing_a_role;

The concept types previously declared as subtypes may be further specified in the same way. For example:
Situation {* something that occurs in a region of time and space *} > (State, Process), Phenomenon, Situation_playing_a_role;

Similarly, the next command declares a relation type and its subtypes. It also gives the signature of the relation, i.e. that it must go from a concept of type Something to another concept of type Something.
Component_binaryRel (Something, Something) {* mereologic binary relation *} > (Subset, Set_element, Part), (Main_part, Parts);

The relation types previously declared as subtypes must be specified in the same way. For example:
Subset (Collection, Collection);

Look at the top-level ontology proposed by WebKB for other similar examples.
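
To make the notion of exclusive types concrete, here is a small Python sketch that checks whether the types of one partition share subtypes. It is an illustration only: the data structure, the function names and the extra link from Something_playing_a_role to Situation_playing_a_role are assumptions made for the example, not part of WebKB.

```python
def descendants(type_name, subtypes):
    """Return type_name plus all of its direct and indirect subtypes."""
    found, stack = set(), [type_name]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(subtypes.get(current, []))
    return found


def exclusivity_violations(partition, subtypes):
    """List (type_a, type_b, shared) for types of one partition that share subtypes."""
    violations = []
    for i, type_a in enumerate(partition):
        for type_b in partition[i + 1:]:
            shared = descendants(type_a, subtypes) & descendants(type_b, subtypes)
            if shared:
                violations.append((type_a, type_b, sorted(shared)))
    return violations


subtypes = {
    "Something": ["Entity", "Situation", "Something_playing_a_role"],
    "Situation": ["State", "Process", "Phenomenon", "Situation_playing_a_role"],
    # Assumed link, for illustration only:
    "Something_playing_a_role": ["Situation_playing_a_role"],
}
ok = exclusivity_violations(["Entity", "Situation"], subtypes)
bad = exclusivity_violations(["Situation", "Something_playing_a_role"], subtypes)
```

Under the assumed link, Situation and Something_playing_a_role share a subtype, which is why such types would not be placed in the same partition.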

Concept types may be defined by necessary and/or sufficient conditions (NC, SC and NSC) or typical conditions (TC).
The last kind of definition is useful for associating a typical schema with a type, that is, for listing the kinds of things (entities, properties, processes, etc.) which are generally related to the instances of the defined type. The user is free to decide what "typical" means for his/her application.
The definition body is a CG. The defined type is located in this CG via a variable. Here are some examples of concept type definitions.

NC  for Temporal_entity(x) are  [Entity:*x]->(Chrc)->[Temporal_property];
NSC for Collection     (x) are  [Entity:*x]->(Chrc)->[Cardinality];
SC  for Description    (x) are  [Description:*x]<-(Descr)<-[Situation];
TC for Description(x) are
      [Description:*x]-
          { (Modality)->[Modality];
            (Author)->[Cognitive_agent];
            (Believer)->[Cognitive_agent];
            (Rhetorical_binaryRel)->[Description];
            (Statement)->[Description_medium];
          };

NC and NSC definitions also set specialisation relations between types: in these definitions, the defined type is declared as a subtype of the type of the concept which includes the variable.

Relation types may only be defined by necessary and sufficient conditions. The kinds of the connected concepts, i.e. the signature of the relation, are specified via variables. Here is an example:
NSC for Author (x,y) are [Description:*x]<-(Object)<-[Make_statement]->(Agent)->[Cognitive_agent:*y];

The following types are predefined in WebKB: Something, the supertype of all concept types, BinaryRel, the supertype of all binary relation types (its signature is: Something -> Something), and Description, the supertype of all "statements" (in WebKB, a CG is considered as a representation of a statement made by someone about the existence of some entity or situation in a past, present, future or imaginary world). Three other types are predefined for handling embedded CGs: Contextualizing_description, Contextualizing_relation and Process_contextualizing_its_object. These types will be discussed below.

Here is the WebKB grammar for commands declaring, defining and ordering types. ? means 0 or 1 occurrence, * means 0 or more occurrences.

Relation type declaration: Ident "("IdentList")" Annot? (("<"|">") PartitionList)?
Concept  type declaration: Ident                 Annot? (("<"|">") PartitionList)?
Relation type definition :  "NSC" "for" Ident "(" IdentList ")" "are" CG;
Concept  type definition : ("NSC"|"NC"|"SC"|"TC") "for" Ident "("IdentList")" "are" CG;

with

Ident : Ident1(IdentN)*
Ident1: [a-zA-Z]|"!"|"#"
IdentN: [a-zA-Z0-9]|"_"|"-"|"/"|"?"|"&"|"%"|"."

PartitionList: Partition (","? Partition)*
Partition    : Ident | "(" Ident (","? Ident)* ")"

Annot: "{"..."}" | "{*"..."*}"  //... means anything not including } except
          //if escaped by an anti-slash or in a single/double quoted string

Note: Identifiers ending with a dot should be avoided because a dot followed by a space can also be used to separate commands (like ';' and '|'). If you want to use an identifier ending with a dot, you will have to use it at the end of a command, immediately followed by ';' or '|'.
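
The Ident rule above translates directly into a regular expression. The Python sketch below is only a convenience for checking names before using them in commands; the function names are hypothetical and the check covers the Ident rule only, not quoted strings.

```python
import re

# Ident1: a letter, "!" or "#"; IdentN: letters, digits and _ - / ? & % .
IDENT = re.compile(r"[A-Za-z!#][A-Za-z0-9_\-/?&%.]*")


def is_identifier(text: str) -> bool:
    """True if text matches the Ident rule of the grammar above."""
    return IDENT.fullmatch(text) is not None


def ends_with_dot(text: str) -> bool:
    """True for valid identifiers that end with a dot (to be avoided)."""
    return is_identifier(text) and text.endswith(".")
```

For instance, is_identifier accepts Something_playing_a_role and !Cat but rejects a name starting with a digit, and ends_with_dot flags a name such as etc. that could be confused with a command separator.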

2.1.3 Declaring individuals

Individuals are concept type instances. Like concept types and relation types, they must be declared before being used in CGs (though we'll see below that WebKB accepts a special notation for declaring types or instances at the time they are used in a CG). Here are two examples.

Cat : Tom;  //The Concept type Cat must have already been declared
Person : John Peter;  //John and Peter are instances of the type Person;

With the same conventions as earlier, the WebKB grammar for commands declaring individuals is:
Ident " :" Ident (Ident)*

2.1.4 Usual CG linear notation

A CG may be asserted by writing it alone in a command or by preceding it with an exclamation mark. A CG may be asserted and given a name via the command "name". For example:
name #CatOnMat [Cat]->(On)->[Mat];

A copy of a CG under another name may also be created, via the command "copy". For example:
copy #CatOnMat #CatOnMat2; copy [Cat]->(On)->[Mat] #CatOnMat2;

Here is the WebKB grammar for the "usual" CG linear notation.

CG        : (CGname | Concept Arc2rel? "."? | Relation Arc2con "."?) MetaInfo?

Arc2rel   : Arc Relation Arc2con? | "-"? "{" (Arc? Relation Arc2con? ";")+ "}"
Arc2con   : Arc Concept  Arc2rel? | "-"? "{" (Arc  Concept  Arc2rel? ";")+ "}"
                                    //note: ; may be omitted if followed by }

Relation  : "(" Rtype ")"
Concept   : "[" Ctype (":" Referent CG? Annot?)? "]"
          | "["                     CG  Annot?   "]"
          | "["         "*"Variable              "]"

CGname    : "#"[a-zA-Z]IdentN*
Rtype     : Ident | QString   //Identifier or single/double quoted strings
Ctype     : Ident | QString
Referent  : (Individual | "*"Variable) Set?
Individual: Ident | Number
Variable  : Ident | Number
Set       : "{*}" | "dist{*}" | "col{*}" | "cum{*}"
Annot     : "$("...")$" | "(*"..."*)"  //... means anything between those marks
Arc       : Digit? ("<-"|"->")
Digit     : [0-9]

MetaInfo    : "with" "{" MetaInfoBody "}"
MetaInfoBody: (Attribute":" Value? ";")+
Attribute   : Ident | Number | QString
Value       : ... ";"  //... means anything not including ; except if escaped

QString     : SQString | DQString
SQString    : '...' //... means anything not including ' except if escaped 
DQString    : "..." //... means anything not including " except if escaped

In WebKB, a concept embedding a CG is interpreted as a statement on the statement represented by the embedded CG. (Recall that in WebKB, a CG is considered as a representation of a statement made by someone about the existence of some entity or situation in a past, present, future or imaginary world). Hence, the type of the embedding concept must be Description or a subtype of Description (no information is added if the type is Description since by default a CG is interpreted as being embedded in a concept of type Description).
The embedding concept given by the user may have an individual as a referent but not a variable: this individual need not be declared, it is automatically created as an instance of Description and refers to the embedded CG; thus, other concepts having this individual as a referent are coreferent with the first one and do not need to embed the CG.

Meta-information (cf. MetaInfo) are just annotations (cf. Annot) except that they must be structured as a set of attribute-value pairs. A meta-information for an embedded CG must be written as an annotation in the concept embedding this CG. A meta-information may specialise another one in a similar way that a CG may specialise another one. Here is the definition.
A meta-information m2 specialises a meta-information m1 if m2 includes at least all the attributes in m1, and for each attribute, its value in m2 is:
- a set of characters identical to the value (of the same attribute) in m1;
- a subtype of the type in the value in m1;
- an individual instance of the value in m1.
Without meta-information, the same data would have to be represented with embedding CGs, which may be tedious for the author and not always adequate for the readers, especially when a lot of CGs are displayed (embedding CGs take a lot of space and it is thus harder to compare the "core" CGs). Consider for example the following alternative representations:
[Cat]->(On)->[Mat] with {Author: phmartin; CreationDate: 21/01/1998;} and
[ [Cat]->(On)->[Mat] ]- { (Author)->[Person: phmartin]; (CreationDate)->[Date: Jan_21_1998]; }
The second CG is more precise but not as readable as the first. The same remark holds for queries using meta-information: the query
spec [Cat]->(On)->[Something] with {Author: Researcher; CreationDate: Jan_21_1998;}
which asks for the specialisations of that CG with these meta-information, is easier to write than
spec [ [Cat]->(On)->[Something] ]- { (Author)->[Researcher]; (CreationDate)->[Date: Jan_21_1998]; }
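
The specialisation test between two meta-informations defined above can be sketched in Python as follows. This is an illustration under simplifying assumptions: meta-informations are modelled as dictionaries, each type is assumed to have at most one direct supertype, and the small hierarchy (Researcher under Person, phmartin an instance of Researcher) is hypothetical.

```python
def is_subtype(type_name, ancestor, supertype_of):
    """True if type_name equals ancestor or reaches it through supertype links."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = supertype_of.get(type_name)
    return False


def value_specialises(v2, v1, supertype_of, type_of_individual):
    """Apply the three cases of the definition to one pair of values."""
    if v2 == v1:                      # identical set of characters
        return True
    if v2 in type_of_individual:      # individual instance of the value in m1
        return is_subtype(type_of_individual[v2], v1, supertype_of)
    return is_subtype(v2, v1, supertype_of)  # subtype of the type in m1


def meta_specialises(m2, m1, supertype_of, type_of_individual):
    """True if m2 has every attribute of m1 with a specialising value."""
    return all(
        attribute in m2
        and value_specialises(m2[attribute], value, supertype_of, type_of_individual)
        for attribute, value in m1.items()
    )


supertype_of = {"Researcher": "Person"}          # hypothetical hierarchy
type_of_individual = {"phmartin": "Researcher"}  # hypothetical declaration

query = {"Author": "Researcher", "CreationDate": "Jan_21_1998"}
stored = {"Author": "phmartin", "CreationDate": "Jan_21_1998"}
```

With these assumed declarations, the stored meta-information specialises the query one, but not the reverse.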

Sets may be represented but are not used for knowledge retrieval. They are however displayed when CGs are retrieved, like annotations and even HTML marks used inside the CG.

The user may not want to take the time to declare and order some or most of the types s/he uses in knowledge representations (this may for example be the case of a user who indexes sentences from various documents for private knowledge organisation purposes). WebKB offers two ways for using undeclared types in CGs and having them automatically inserted in the ontology according to the way they are used: (i) by telling the system, via the command "no decl", that undeclared types may be used (the command "decl" switches back to the normal mode), or (ii) by prefixing each yet undeclared type with an exclamation mark character as in the following example: [!Cat]->(On)->[!Mat]
Since the WebKB relation type ontology (200 relation types) declares and organises most of the usual basic relation types of natural language, we suggest that the casual user use (and complement when necessary) the relation types from this ontology, and leave the concept types undeclared. The signatures of the used relation types allow WebKB to structure the undeclared concept types in the application ontology.
Once the WebKB warehouse is implemented and initially filled with the WordNet natural language ontology (90,000 concept types), WebKB will have additional hints for organising undeclared concept types and thus also for guessing adequate signatures for undeclared relation types connecting concepts of undeclared types. However, it will remain true that the more types (and therefore CGs) are specified, the more precise the answers to queries will be.

The reader may also have noted that type names may be single/double quoted strings. Used in a relation, such a string is equivalent to the exclamation mark prefix. Used in a concept, it is actually a shortcut for specifying that the concept is of type Description and has an annotation the content of which is the string. Thus,
["A cat is happy"]->(Cause)->["It is on a mat"] is equivalent to
[Description: (* "A cat is happy" *)]->(Cause)->[Description: (* "It is on a mat" *)]
Indeed, when so little information is formalised, the use of such CGs is more to support information retrieval via navigation than via knowledge-based queries.

2.1.5 Frame-oriented CG linear notation

In order to make CGs more readable and easier to write, WebKB also proposes a more frame-oriented notation. As an example, if Tom has been declared as an individual of type Cat,
[Tom, on: (a mat, on: a table), near: a mouse] is equivalent to
[Cat: Tom]- { (On)->[Mat]->(On)->[Table]; (Near)->[Mouse]; }

Grammars and examples for FCGs are accessible at: http://www.webkb.org/doc/languages/.

2.1.6 Document Elements (DEs) organisation

2.1.6.1 HTML structures

Currently, the only HTML structure interpreted as a knowledge representation by WebKB is the definition list. Its use is similar to the frame-oriented CG notation with strings as type names, but in the definition list, the strings need not be inside quotes, and HTML marks replace the square brackets. As an example,

["A cat" "is on": "a mat"; "looks at" : "a mouse"; ["something related to a cat" $(its definition)$ ] ];

may also be represented in the following way:

<dl><dt> A cat <dd> is on : a mat. <dd> looks at : a mouse. <dd> <dl><dt>something related to a cat<dd>its definition</dl> </dl>

which is displayed in that way by a Web browser:

A cat

is on : a mat.

looks at : a mouse.

something related to a cat: its definition

Other HTML structures, especially headers and named hypertext links, will later be interpreted by WebKB for offering alternative ways of representing conceptual relations.

2.1.6.2 Indented text

Not yet implemented. It will be similar to the above definition lists but without the HTML marks. However, the indented text will have to be surrounded by the marks <it> and </it>.

2.1.7 Formalised english

Grammars and examples for FE are accessible at: http://www.webkb.org/doc/languages/.

2.2 Knowledge-based indexation commands

Document elements (DEs) may be indexed by CGs and then retrieved via these CGs. Conceptual relations between DEs may also be directly represented without representing the DEs. Here is the grammar for the DE indexation and connection commands. As opposed to other commands, these ones need not be isolated from other elements in a document.

DE indexation: "$(Indexation" "(Context:" Context ")"
                  ( "(DE:" DE ")" "(Repr:" Repr ")" )+  ")$"

DE connection: "$(Connection" "(Context:" Context ")"
                  ( "(DE:" DE ")" "(Rel:" Rtype ")" "(DE:" DE ")" )+  ")$"

Context: MetaInfoBody  //this meta-information is added to the indexing CGs
         //Unless DEdescr (cf. below) specifies the URL(s) of the document
         //including the indexed DEs or the connected DEs,
         //the attribute "Indexed_doc" is mandatory for DE indexations,
         //and the attribute "Document" is mandatory for DE connections.
         //In both cases, the value must be a URL.

DE     : "{URL:"      ... "}"  //... is an URL for the indexed/connected DE
       | "{Document:" ... "}"  //as above, but the DE is the whole document
       | "{section title:" ... "}"  //... is anything without unescaped }
       | ("{"Digit("st"|"rd"|"th")?"occurence"?"}")? TextOrHTML

Repr   : TextOrHTML //actually, this text or HTML must be a CG, or an img HTML tag
       //with a CG in its "alt" attribute, for allowing the DE to be retrieved via
       //CG retrieval commands

TextOrHTML: anything not including unbalanced parenthesis, except if escaped
            or inside single/double quoted strings

If the indexed/connected DE cannot be referred to by a URL, then, if the DE is a section, its title may be provided; otherwise the content of the DE must be given, preceded if necessary by its occurrence number (the default occurrence number is 1). Thus, the DE may be retrieved even if the document changes slightly; otherwise, the DE can still be presented and the representation is still up-to-date. Conversely, if a URL is used and the document is changed, the DE may no longer be accessible or the representation may no longer be adequate for the new DE content. The user must choose the method which best suits his/her application.

The "tool to index DEs by knowledge representations" and the "tool to connect DEs by conceptual relations" are intended to ease the construction of indexation/connection commands.
Examples are given in an "indexation of an interview retranscription".

2.3 Knowledge testing commands

Specialisation relations between types, between CGs, and between types and individuals may be tested as in the following examples. The answer is Yes or No.
? Cat < Entity; ? [Cat] < [Entity]; ? Tom : Cat;

Here is the general grammar.

Test type instance specialisation:  ("?"|"spec") Ident ":" Ctype
Test concept  type specialisation:  ("?"|"spec") Ctype "<" Ctype
Test relation type specialisation:  ("?"|"spec") Rtype "<" Rtype
Test CG specialisation           :  ("?"|"spec") CG    "<" CG

2.4 Knowledge querying commands

Here is the grammar for knowledge querying commands.

Display the specialisations of a type or a CG: ("?"|"spec") (Ctype|Rtype|CG)
Display the generalisations of a type or a CG: ("^"|"gene") (Ctype|Rtype|CG)
Display the minimal type of an individual    : "type of" Individual
Display the definitions of a type            : "def of" (Ctype|Rtype)
Display CGs (useful with CG names or pipes)  : "display" CG+
Display linear forms of CGs                  : "linearForm" CG+

Before these commands, the following ones may be issued in order to control which CGs must be searched and how they are presented.

Search on CGs used for type definitions   : "on" "def"
Search on "normal" CGs (the CG base)      : "on" "CGs"
Search on all CGs                         : "on" "all"     //default mode

Display each CG in all its embedding CGs  : "all" "embeddings" //default mode
Display each CG in its contextualising CGs: "only" "context"

Display each CG in its linear form        : "use" "linear"
Display each CG with its meta-information : "meta"         //default mode
Display each CG without meta-information  : "no meta"
Display each CG in its original form      : "use" "CGs"    //default mode
Display the  DE indexed by each CG        : "use" "Repr"   //if the DE exists
Display each CG and its indexed DE        : "use" "Repr&CGs"

The conventions about undeclared types in CGs also apply for query CGs. A query CG using an undeclared type which hasn't been used before will not retrieve any specialising CGs but generalising CGs might be found (i.e. all the queried CGs for which the query CG is a specialisation).

Queries for specialisations give the user some freedom in the ways to express queries: searches may be done at a general level and then refined according to the given answers. However, exact names of some types must still be known at one level or another. To alleviate this, WebKB allows the user to give only a substring of a type name in a query CG, provided that this substring is prefixed by the character %. WebKB generates the actual request(s) by replacing in the given query CG the substring by the manually/automatically declared types which include that substring. The replacements which violate the constraints imposed by relation signatures or individual types are discarded. Then, each remaining request is displayed and executed. For example, spec [%thing] will at least trigger the generation and execution of spec [Something].
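
The generation of requests from a %-prefixed substring can be sketched in Python as follows. This is a simplified illustration, not the WebKB algorithm: the function expand_query is hypothetical, matching is assumed to be case-sensitive, and the filtering by relation signatures and individual types mentioned above is left out.

```python
import re
from itertools import product

# A placeholder is "%" followed by a fragment of a type name.
PLACEHOLDER = re.compile(r"%([A-Za-z0-9_\-/?&.]+)")


def expand_query(query: str, declared_types: list[str]) -> list[str]:
    """Generate one request per combination of types containing each fragment."""
    fragments = PLACEHOLDER.findall(query)
    candidates = [[t for t in declared_types if f in t] for f in fragments]
    requests = []
    for choice in product(*candidates):
        replacements = iter(choice)
        # Each placeholder, from left to right, takes the next chosen type.
        requests.append(PLACEHOLDER.sub(lambda m: next(replacements), query))
    return requests


declared = ["Something", "Entity", "Something_playing_a_role"]
requests = expand_query("spec [%thing]", declared)
```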

In the mode "all embeddings", when a CG is found, if it has embedding CG(s), the most enclosing one is presented. Thus, if the mode "use CGs" or "use Repr&CGs" is on, the originally retrieved CG is shown with all its embedding CGs, unless the user has used CG names for linking the different levels of CGs. However, if the mode "use linear" is on, the currently generated linear forms do not show embedded CGs.

In the mode "only context", not all the embedding CGs are presented but only the uppermost "contextualizing" one. A CG contextualizes another one if (i) it encloses it (directly or indirectly) in a concept of a type subtype of Contextualizing_description (e.g. Generally_true_description), or (ii) it relates the enclosing concept to another one by a relation of type subtype of Contextualizing_relation (e.g. Author), or via intermediary relations to a concept of type subtype of Process_contextualizing_its_object (e.g. Think).

In the modes "use CGs" and "use Repr&CGs", the CG is shown in the exact form used by the user (e.g. with HTML marks around concept types) except when the CG is given in the "alt" field of an HTML image mark. In that case, the image is displayed, not the textual form.

In the modes "use Repr" and "use Repr&CGs", if the CG indexes a document element (DE), this DE is presented.

In the modes "use CGs", "use Repr" and "use Repr&CGs" hypertext links are also generated for accessing each CG or the DE in its source document. This access involves the generation of a copy of the source document plus additional marks for highlighting the location of the CG or DE and telling the Web-browser to show this location.

Many examples of queries may be found in the knowledge-based information retrieval/handling tool and especially in the script scr.html.

2.5 Knowledge generation commands

WebKB knowledge generation commands are CG join commands. Various kinds of joins may be defined but we are considering those which, given CGs, create a new one which specialises each of the source CGs. Thus, though the result is inserted in the CG base, it may not represent anything true for the user, and therefore may be seen as only a way to speed up knowledge representation. For instance, in WebKB, CGs related to a type may be collected and automatically merged via the following command (the components of which will be explained later): spec [TypeX] | maxjoin
The result may then serve as a basis for the user to create a type definition for TypeX.

The user may direct the join by specifying two concepts which must be merged. One of the concepts must specialise the other, and this is the one which will be kept in the merged CG. WebKB proposes a command to do an "isojoin" which, given 2 concepts on which to merge 2 CGs, tries to extend the merged area to the neighbouring relations and concepts. There may be several possibilities and thus several results. This isojoin, which is polynomial, is proposed by the CG workbench CoGITo (Haemmerlé, 1995) exploited by WebKB and described in Chein & Mugnier (1992).

WebKB also proposes an undirected "maximal join" which, given two CGs, tries all the possible isojoins on them (which is still polynomial) and selects one of the resulting CGs which has the "minimal" number of concepts and relations (the others are removed). This command accepts more than 2 CGs: the first two CGs are joined, then the result is joined with the third, and so on. Here is the grammar of WebKB join commands.

Internal join in a CG: "ijoin"   "on" C1 C2 CG      CG_result_name?
Join of two CGs      : "join"    "on" C1 C2 CG1 CG2 CG_result_name?
Isojoin of two CGs   : "isojoin" "on" C1 C2 CG1 CG2 CG_result_name?
Maximal join on CGs  : "maxjoin" ("-n" CG_result_name)? CG CG+

CG1: CG
CG2: CG
C1 : Ctype (":" Referent)?   //C1 must belong to CG1
C2 : Ctype (":" Referent)?   //C2 must belong to CG2
CG_result_name: CGname
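
The way maxjoin handles more than two CGs (join the first two, then join the result with the third, and so on) is a left fold over a binary join. The Python sketch below shows only that folding order; toy_join is a stand-in that merges sets of triples and is in no way the real maximal join, and all names are hypothetical.

```python
from functools import reduce


def toy_join(cg1, cg2):
    """Stand-in for a binary maximal join: merges two sets of triples."""
    return cg1 | cg2


def maxjoin(cgs, join=toy_join):
    """Join the first two CGs, then the result with the third, and so on."""
    if len(cgs) < 2:
        raise ValueError("maxjoin expects at least two CGs")
    return reduce(join, cgs)


cat_on_mat = frozenset({("Cat", "On", "Mat")})
mat_near_table = frozenset({("Mat", "Near", "Table")})
table_in_room = frozenset({("Table", "In", "Room")})
result = maxjoin([cat_on_mat, mat_near_table, table_in_room])
```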

Here are two examples of queries and results, extracted from the results of the execution of scr.html.

> join on Cat Cat:Tom #CatOnMat #TomOnTable

[Description: #Join_of_CatOnMat_and_TomOnTable_on_Cat-__and_Cat-Tom
   [Mat]<-(On)<-[Cat:#Tom]->(On)->[Table]
]

> maxjoin #CatOnMat #TomOnMatNearTable

[Description: #Maxjoin_of_CatOnMat_and_TomOnMatNearTable
   [Cat:#Tom]->(On)->[Mat]->(Near)->[Table]
]

2.6 Knowledge deletion commands

Here is the grammar of WebKB deletion commands.

Delete concept  types if no CG uses them: delCT Ctype+
Delete relation types if no CG uses them: delRT Rtype+
Delete individuals    if no CG uses them: delI  Individual+
Delete CGs or definitions: delCG  CG+
Delete only definitions  : delDef CG+
Delete all CGs           : delCGs

3 Text processing commands

WebKB proposes text processing commands similar to some Unix ones except that they work on Web-accessible documents. Here is the grammar.

Print arguments                 : ("print"|"echo") "-n"? Arg+  //"-n": without newline
Count the number of arguments   : "nbArguments" Arg+
Word counter similar to Unix wc : "count" "-bytes"? "-words"? "-lines"? "-paragraphs"? Doc+
Give   the current directory    : "pwd"
Change the current directory    : "cd" Dir
Display/concat documents        : "cat" ("-removeHTMLmarks"|"-makeURLsVisible")? Doc+
Grep  documents (cf. Unix grep) : "grep"  Options Regular_expr Doc+
Fgrep documents (cf. Unix fgrep): "fgrep" Options String Doc+
Diff  documents (cf. Unix diff) : "diff"  Options Doc+
Head  documents (cf. Unix head) : "head"  Options Doc+
Tail  documents (cf. Unix tail) : "tail"  Options Doc+
Awk   documents (cf. Unix awk)  : "awk"   Options Doc+
List hypertext accessible doc.  : "accessibleDocFrom" "-HTMLonly"? ("-maxlevel" Digit)? Doc+

Doc    : URL
String : Arg
Arg    : QString | Ident | Number | CG | ...
         //... means anything inside balanced parentheses or brackets,
         //          otherwise any single character
Options: cf. the options of the corresponding command on Unix

The command cat prints documents one after the other. With the option removeHTMLmarks, the HTML marks are not printed. With the option makeURLsVisible, the URL of each hypertext link is copied between square brackets after the hypertext link so that it is visible when a Web-browser displays the command result.
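
The effect of the two cat options can be approximated with the Python sketch below. It is a rough illustration based on regular expressions, which only cope with simple, well-formed HTML; the function names and the exact spacing before the square brackets are assumptions.

```python
import re

# A hypertext link with a double-quoted href attribute (simplifying assumption).
LINK = re.compile(r'(<a\s[^>]*href="([^"]*)"[^>]*>.*?</a>)', re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def make_urls_visible(html: str) -> str:
    """Copy the URL of each link between square brackets after the link."""
    return LINK.sub(r"\1 [\2]", html)


def remove_html_marks(html: str) -> str:
    """Drop every HTML mark, keeping only the text."""
    return TAG.sub("", html)


visible = make_urls_visible('See <a href="http://www.webkb.org/">WebKB</a>.')
plain = remove_html_marks("<b>A cat</b> is on <i>a mat</i>.")
```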

The command accessibleDocFrom lists the names of the documents which are accessible via hypertext links from the documents given as parameters. Only the accessible HTML documents are listed if the option HTMLonly has been specified. The retrieved documents are themselves taken into account unless the depth maxlevel has been reached. By default, maxlevel equals 1; hence, by default, the retrieved documents are not taken into account for the search.
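
The role of maxlevel can be illustrated by a breadth-first traversal with a depth limit. In the Python sketch below, the links are given as an in-memory dictionary instead of being fetched from the Web, the function name is hypothetical, and not listing the starting documents themselves is an assumption.

```python
from collections import deque


def accessible_docs_from(start_docs, links, maxlevel=1):
    """List documents reachable from start_docs within maxlevel link hops."""
    listed = []
    seen = set(start_docs)  # assumption: starting documents are not listed
    queue = deque((doc, 0) for doc in start_docs)
    while queue:
        doc, level = queue.popleft()
        if level >= maxlevel:
            continue  # depth reached: do not follow links from this document
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                listed.append(target)
                queue.append((target, level + 1))
    return listed


links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["d.html"],
    "d.html": ["a.html"],
}
level1 = accessible_docs_from(["a.html"], links)
level2 = accessible_docs_from(["a.html"], links, maxlevel=2)
```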

See the classic information retrieval/handling tool for examples.

4 Control structures

A simple shell-like scripting language has been adopted for enabling command combination. Here is the grammar.

Command : "if" "(" String_or_number_comparison ")" "{" commands "}"
          ("else" "{" commands "}")?
        | "while" "(" String_or_number_comparison ")" "{" commands "}"
        | "do" "{" commands "}" "while" "(" String_or_number_comparison ")"
        | "for" Var "in" "(" String+ ")" "{" commands "}"
        | "for" Var "in" QString "{" commands "}" //values in QString are space-separated
        | "set" Var String //then, prefixed by '$', Var will be replaced by the string
        | Command ";" Command?
        | Command_returning_small_results "|" command
        | ...  //... means any of the commands cited in other sections

String_or_number_comparison: String Operator String | Number Operator Number
Operator: "<" | ">" | "<=" | ">=" | "!=" | "=" | "=="  //= and == are equivalent
String  : QString | Ident
Var     : Ident

The pipe (command1 | command2) is different from the Unix one. It actually adds the results of the first command to the list of parameters of the second command. Hence, it only works with commands returning small results, which is the case of most commands returning results (CG handling commands, echo, nbArguments, count, pwd, accessibleDocFrom) but not document processing commands (cat, grep, fgrep, diff, head, tail, awk). This will be fixed in the future.
Here is an example: spec [Cat] | maxjoin [Mat]->(Near)->[Table]
This command will fail if the first retrieved CG is not about a table or a Mat.
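
The difference from the Unix pipe can be pictured with the following Python sketch, in which commands are modelled as functions from a list of parameters to a list of results. The modelling is hypothetical, and the choice to append the piped results after the explicit parameters is an assumption: the text above only says that they are added to the parameter list.

```python
def echo(arguments):
    """Model of echo: returns its arguments as results."""
    return list(arguments)


def nb_arguments(arguments):
    """Model of nbArguments: returns the number of arguments."""
    return [str(len(arguments))]


def pipe(first, first_args, second, second_args):
    """Run first, then pass its results as extra parameters to second."""
    # Assumption: piped results come after the explicit parameters.
    return second(list(second_args) + first(first_args))


# Models: echo a b | nbArguments x
counted = pipe(echo, ["a", "b"], nb_arguments, ["x"])
```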

See the classic information retrieval/handling tool and scr.html for examples.

5 References

M. Chein & M.L. Mugnier. Conceptual graphs: fundamental notions. In Revue d'Intelligence Artificielle, 6(4), pp. 365-406, 1992.

O. Haemmerlé. CoGITo: une plate-forme de développement de logiciels sur les graphes conceptuels. Ph.D. thesis, Montpellier II University, France, January 1995. URL: http://cat.inist.fr/?aModele=afficheN&cpsidt=166749

Dr. Philippe A. MARTIN