General "Knowledge Representation and Sharing" and complementary techniques I designed to support that

Ph. Martin

LIM, University of La Réunion, France

www.phmartin.info/slides/lim2021/

Preliminary definitions (for this presentation)

(Formal) Identifier (ID), alias "unique identifier" (UID), "formal lexical term" or, in logics, "symbol": lexical object declared as having a unique meaning → not just a "name" (informal ID). (In logics, a sequence of symbols (IDs) is used to create either a term (formal expression) that denotes an object, or a formula (sentence) that denotes a fact.)

Formal syntax: syntax (grammar) composed of terms: terminal ones (IDs: lexical terms) and non-terminal ones that are (directly or indirectly) defined in terms of terminal ones.

Model-theoretic semantics/interpretation (one formal way to define semantics): function that maps terms to individuals (objects), and sentences to truth values.

Information: Information Object(s). Sequence of IDs that conforms to a formal/informal syntax and that expresses something. Knowledge or Data (↔ Info base: Knowledge base or Database).
Not here:
- "knowledge: (information: data + semantic/types) + context/understanding/entailment/strategy"
- "knowledge: thing known to be true by/given observations/beliefs/assertions + deductions", as in Epistemology or Epistemic logic.

Knowledge: Knowledge Representations (KRs). Information that is, at least partially, represented and organized
- in some logics, and
- via semantic relations (subtype, part, result, instrument, time, place, author, ...), and
- in a way known by the interpreting agent (person or software).
See Examples 1 and 2 in Section 2.1.
Semi-formal KR: KR relating informal IDs/sentences by semantic (→ formal) relations.

Data: information that is not knowledge.

Plan

1. Need for general Knowledge Sharing (KS)

1.1. Need for knowledge bases (KBs) to be used instead of databases

1.2. Need to distinguish present-time KS and general KS

2. Some complementary techniques I designed to support general knowledge sharing

2.1. Unrestricted/"à la carte" formal languages

2.2. General-purpose ontologies aligning top-level ones and lexical ones

2.3. KB servers that support non-restricting KB Sharing by Web users

2.4. KB servers that support networked KBs

1.1. Need for KBs to be used instead of databases – Definitions

Knowledge Base (KB): set of IDs and of KRs using these IDs.

Parts:

ontology: - IDs (concept/relation types/individuals);
- KRs (logically quantified relations) about/defining (the meanings of) these IDs
(→ partial/full definitions; from them, KR entering/search forms can be generated)
base of facts: KRs about individuals (information objects that are not types) and
hence also all definitions that are not in the ontology.

Database (DB): information base that is not a KB.

Parts:

schema (← predefined or partially defined by the database designer):
- usable types of relations: partOf + instanceOf and sometimes subtypeOf, ...
- usable types of concept (classes) and attributes (class-local implicit relations)
- no definition (except for a few in deductive databases) → no formal|explicit meaning
base of data (← from users): (existentially quantified) relations/attributes about individuals.

Subtypes:
- relational/object-oriented/NoSQL/deductive DBs ("RDF DBs" are KBs)
- XML/JSON/LaTeX document, base of documents that are not KBs
- data indexed by IDs from a folksonomy / ontology (e.g. as in semantic wikis)
("Linked data", e.g. the Semantic Web, refers to KBs and the data they index or relate.)

1.1. Need for KBs to be used instead of databases – Notes

Note 1. As in programming, the more expressive the formal language used, the more objects (functions, relations, ...) can be represented explicitly (→ as "1st class" objects) and hence named, quantified, related, ... and then exploited. With most DB systems, the end-user can neither define nor set relations, and the schema is mostly a tree
→ implicit redundancies (e.g. those that normalizing a DB, from 1NF to 6NF, tries to remove)
→ arbitrary choices (e.g. making a class Employee part of a class Company, or conversely, or choosing one plan/structure in an article, web site, wiki, ...).

Note 2. The implementation of a KB system can reuse a DB system with a schema of mainly two tables/classes: one for explicit relation nodes, one for explicit concept nodes.
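The two-table idea of Note 2 can be sketched as follows. This is only an illustration under stated assumptions: the table names, column names and the relation types `memberOf` and `employerOf` are hypothetical, and SQLite merely stands in for "a DB system".

```python
import sqlite3

# Hypothetical two-table schema: every concept and every relation is an explicit row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_node (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL            -- formal ID of the concept
    );
    CREATE TABLE relation_node (
        id        INTEGER PRIMARY KEY,
        type      TEXT NOT NULL,             -- relation type, e.g. 'subtypeOf'
        source_id INTEGER NOT NULL REFERENCES concept_node(id),
        target_id INTEGER NOT NULL REFERENCES concept_node(id)
    );
""")

def add_concept(name):
    """Insert a concept node if it does not exist yet and return its row id."""
    conn.execute("INSERT OR IGNORE INTO concept_node(name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM concept_node WHERE name = ?", (name,)).fetchone()
    return row[0]

def relate(source, rel_type, target):
    """Store one explicit relation node between two concept nodes."""
    conn.execute(
        "INSERT INTO relation_node(type, source_id, target_id) VALUES (?, ?, ?)",
        (rel_type, add_concept(source), add_concept(target)),
    )

def relations_from(name):
    """Return the (relation type, target name) pairs starting from a concept."""
    rows = conn.execute(
        """SELECT r.type, t.name FROM relation_node r
           JOIN concept_node s ON s.id = r.source_id
           JOIN concept_node t ON t.id = r.target_id
           WHERE s.name = ? ORDER BY r.id""",
        (name,),
    )
    return rows.fetchall()

# Both directions are stored explicitly: no class has to be nested inside the other.
relate("Employee", "memberOf", "Company")
relate("Company", "employerOf", "Employee")
```

Since relations are rows rather than schema elements, an end-user can add a new relation type without any schema change, which is the point made in Note 1.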

1.1. Need for KBs to be used instead of databases

Possibility+need to

store anything (→ no predefined schema to design/follow)
and in an automatically enforceable explicit+precise/organized way
(→ no redundancies/arbitrariness/... unlike in databases, wikis, ...)
(note: a KB system may reuse+complement one or several DBs or DB systems)
search information semantically (not just lexically and structurally), by query (→ 100% precision/recall for logically complete results; → the list of results may be organized, e.g. by specialization relations) or navigation along semantic relations (as in a decision tree)

BUT

* manually representing information is difficult (especially for large KBs and general KS)

* automatically representing the content of a document requires the software to "understand" it

* automatically merging a set of KBs not unified into a "networked KB" by the knowledge authors does not lead to good results if only because the required information is only in the head of the knowledge authors
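As a minimal sketch of what searching "semantically" by query means, the following example retrieves individuals through specialization relations rather than by string matching. The type hierarchy and the individuals are hypothetical, and a real KB system would use a logic-based inference engine rather than this simple upward walk.

```python
# Hypothetical subtype relations (type -> direct supertype); names are illustrative only.
SUBTYPE_OF = {
    "Bird": "Animal",
    "Carinate_bird": "Bird",
    "Bat": "Animal",
}

# Hypothetical facts: (individual, most specific known type).
INSTANCE_OF = [("Tweety", "Carinate_bird"), ("Batty", "Bat"), ("Rex", "Animal")]

def is_specialization(type_id, ancestor):
    """True if type_id is the ancestor itself or one of its direct/indirect subtypes."""
    while type_id is not None:
        if type_id == ancestor:
            return True
        type_id = SUBTYPE_OF.get(type_id)
    return False

def instances_of(query_type):
    """Return every individual whose type specializes the queried type."""
    return [ind for ind, t in INSTANCE_OF if is_specialization(t, query_type)]
```

A query on `Bird` returns `Tweety` although the string "Bird" never appears in the fact about Tweety; a purely lexical search would miss it.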

1.1. ... – Example of KR organization by generalization/implication/exclusion

[Figure: graph relating the types and statements below by generalization (→), implication (⇒) and exclusion (£) relations.]

Types: Entity £ Process; Bird → Entity; Counting → Process //or: Flight; Tweety → Bird.

Statements:
- `no Bird can be agent of a Counting´ //false belief
- `at least 1 Bird can be agent of a Counting´ ⇐ `at least 50% of Bird can be agent of a Counting´
- `every Clever Bird can be agent of a Counting´
- `1 Bird can be agent of a Counting that has for duration at least 0.5 Hour´
- `Tweety can be agent of a Counting´
- `every Bird can be agent of a Counting´
- `Tweety has been agent of a Counting with duration at least 0.5 Hour´
- `every Bird|Bat can be agent of a Counting´

Legend.
- "→": generalization that is not an implication, e.g. subtypeOf, instanceOf;
- "⇒": implication;
- "£": exclusion ( x £ y <=> ((x ⇒ ¬y) ∨ (x → ¬y)) );
- "can": is able to;
- every sentence is in FE;
- relation types are in italics; concept types begin with an uppercase letter;
- the authors of IDs, sentences and relations are not represented here (unlike in the examples of Section 2.3);
- in FE, "every" (alias 100%) and "%" are for "observations" and hence imply "at least 1", whereas "any" is for a "definition" and hence does not imply "at least 1"; the distinction is important since observations may be false while definitions cannot (since agents can give any identifier they want to the types they create) and thus cannot be corrected or contradicted.
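The FE quantifier rule stated in the legend ("every" and "%" imply "at least 1", "any" does not) can be illustrated by a small sketch. The `Quantifier` class, its `kind` labels and the `excludes` test are assumptions made for this illustration; they are not part of FE, and both statements are assumed to be about the same type and the same predicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantifier:
    kind: str          # hypothetical labels: "every", "at_least_pct", "at_least_1", "any", "no"
    pct: float = 0.0   # only meaningful for "at_least_pct"

def implies_at_least_one(q):
    """Observations ("every", "%") imply existence; definitions ("any") do not."""
    if q.kind in ("every", "at_least_1"):
        return True
    if q.kind == "at_least_pct":
        return q.pct > 0
    return False  # "any" (definition) and "no"

def excludes(q1, q2):
    """A 'no X ...' statement excludes any statement implying 'at least 1 X ...'."""
    return (q1.kind == "no" and implies_at_least_one(q2)) or (
        q2.kind == "no" and implies_at_least_one(q1)
    )
```

Under these assumptions, `no Bird can be agent of a Counting´ excludes `at least 50% of Bird can be agent of a Counting´ but not a definition introduced with "any".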

1.2. Need to distinguish present-time KS and general KS

Knowledge sharing as currently commonly understood (since 2000 / the W3C's "Linked Data"):
- for a known application and, generally, for efficient+complete knowledge inferencing
- between agents that can discuss to solve ambiguities: B2B KS.

KS in the general sense (or pre-W3C): KR in explicit ways to maximize knowledge reuse.
KR creators do not make exploitation-dependent choices → each application can make them.
This goal has been forgotten/abandoned since the "Linked Data" Web, in the same way as advanced
research on hypertext systems was forgotten/abandoned after the beginning of the Web.

The W3C (WWW Consortium), researchers in knowledge inferencing, and,
for hype/grant/publication/... reasons, most researchers since 2000
- do not distinguish the two and only propose support for the 1st
- bet that, like "deep learning", (future) automatic knowledge extraction techniques can produce "seemingly good enough" results without bothering end-users with any KR details
→ misleading + unscalable "KS supporting" techniques + poorly reusable KRs.

1.2. Need to distinguish present-time KS and general KS

General KS is necessary for

(a network of) KBs to behave as one well-organized KB instead of (mostly) data wrt. each other
↔ no implicit redundancy/implication/specialization/exclusion between IDs/KRs of these KBs
↔ no redundancy of efforts
↔ exploitation of/access to (within this network) all information conceptually relevant to a query in an organized way, without author-based bias (→ popularity, lobbying, marketing, ...) except possibly on criteria decided by each end-user (e.g. for trust purposes)
↔ no incredible waste of efforts from information providers+seekers
(example 1 of application kind) organizing tasks and methods
– especially, software components|libraries and APIs/commands –
via specialization/part/... relations, into a (network of) shared KB(s),
to allow web users to 1) retrieve which software function/command can perform a task,
and 2) given a task and available resources, generate a call to a given function/command
... (see next page)

1.2. Need to distinguish present-time KS and general KS

(... General KS is necessary for ...)

(example 2 of application kind) supporting+enforcing genuine/“fair” cooperation:
- via knowledge sharing (see previous page(s))
- “fair decision” supporting/enforcing system (e.g. direct democracy via a logicocracy) ↔ system preventing “unfair” decisions or marking them as invalid/non-enforceable: a “fair” decision is one that is logically-consistent and an optimum with the preferences and logically consistent beliefs represented in the exploited KBs of the persons affected by the decision; notes:
  - preferences may also be on ways to aggregate and maximize preferences
  - each of the potentially affected persons has to be alerted of the potential decision
  - any person may provide logically argued alternative potential decisions
  that are “better” based on criteria, preferences and facts in the exploited KBs
  - an “efficient+fair” way to publish+review knowledge would be based on such a system
- “fair work” supporting/enforcing system: ↔ when an agent commits to a task (e.g. accepts a job, some responsibilities, ...) but does not perform a subtask of it (e.g. answering all emails related to this task) the system (e.g. a workflow system exploiting KBs) reminds the agent and records+advertises the commitment breaching behaviors

2. Underlying ideas of some complementary techniques that I designed to support general knowledge sharing

2.1. Unrestricted/"à la carte" formal languages

2.2. General-purpose ontologies aligning top-level ones and lexical ones

2.3. KB servers that support non-restricting KB Sharing by Web users

2.4. KB servers that support networked KBs

2.1. Unrestricted/"à la carte" formal languages

Problems: most present-time KR languages (KRLs)
- have expressiveness restrictions (→ restrict or bias KS), and/or
- are low-level (i.e. lacking keywords for common features) → not concise (→ hard to read) + not normalizing
Solutions:
- a few expressive+high-level KR notations, mainly FE and FL
- for even more flexibility: KRLO, an ontology of KR models and notations → exploitable by tools to import/export formal structures in any of the languages specified in the ontology → knowledge providers can use and mix the syntaxes/features they prefer and are not restricted by existing KRLs/standards, and then each application converts KRs to what its inference engine can exploit
  (note: different programming/modelling paradigms – e.g. Petri Nets – focus on different relation+concept types; all of these types can be organized/related/defined into KRLO)

2.1. ... – Example 1

En: By definition, a flying_bird_with_2_wings is a bird that flies and has two wings.
LP: Flying_bird_with_2_wings(b) := Bird(b) ∧ (∃f Flight(f) ∧ agent(f,b)) ∧ (∃w1,w2 Wing(w1) ∧ Wing(w2) ∧ part(b,w1) ∧ part(b,w2) ∧ w1!=w2)
FE: any Flying_bird_with_2_wings is a Bird that is agent of a Flight and has for part 2 Wing.
FL: Flying_bird_with_2_wings = ^(Bird agent of: a Flight, part: 2 Wing).
RDF+OWL2 / Turtle: :Flying_bird_with_2_wings owl:intersectionOf (:Bird [a owl:Restriction; owl:onProperty :agent; owl:someValuesFrom :Flight] [a owl:Restriction; owl:onProperty :wingPart; owl:qualifiedCardinality 2]);

UML_model / UML_concise_notation:
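Independently of the notations above, the LP definition of Example 1 can be checked against a small model-theoretic interpretation, in the sense of the preliminary definitions: sets of individuals interpret the concept types and sets of pairs interpret the relation types. The individuals below are hypothetical, and the sketch assumes that both parts w1 and w2 must be wings and that "two wings" is read as "at least two distinct wings".

```python
from itertools import permutations

# Hypothetical interpretation (model): sets of individuals and sets of relation pairs.
BIRD = {"tweety", "pingu"}
FLIGHT = {"f1"}
WING = {"w1", "w2", "w3", "w4"}
AGENT = {("f1", "tweety")}                     # agent(f, b): b is agent of flight f
PART = {("tweety", "w1"), ("tweety", "w2"),    # part(b, w): b has w for part
        ("pingu", "w3"), ("pingu", "w4")}

def flying_bird_with_2_wings(b):
    """Evaluate the defining formula for individual b in the interpretation above."""
    is_bird = b in BIRD
    # ∃f Flight(f) ∧ agent(f, b)
    flies = any((f, b) in AGENT for f in FLIGHT)
    # ∃w1,w2 Wing(w1) ∧ Wing(w2) ∧ part(b,w1) ∧ part(b,w2) ∧ w1 != w2
    two_wings = any((b, wa) in PART and (b, wb) in PART
                    for wa, wb in permutations(WING, 2))
    return is_bird and flies and two_wings
```

In this interpretation, `tweety` satisfies the definition while `pingu` does not, since no flight has `pingu` for agent.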

2.1. ... – Example 2

En: On March 21st 2016, John Doe believed that in 2015 and in the USA, at least 78% of adult healthy carinate birds were able to fly.
FL: [ [ [ [ [at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight ] place: USA ] time: 2015 ] believer: John_Doe ] time: 2016-03-21 ].
FE: ` ` ` ` `at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight´ at place USA´ at time 2015´ for believer John_Doe´ at time 2016-03-21´.
IKLmE / Turtle: [rdf:value [rdf:value [rdf:value [rdf:value [rdf:value [rdf:value [:agent_of [a :Flight] ]; pm:q_ctxt [quantifier "78to100pc"; rdf:type :Adult, :Healthy, :Carinate_bird ] ]; pm:ctxt [:modality :Physical_possibility] ]; pm:ctxt [:place :USA] ]; pm:ctxt [:time "2015"] ]; pm:ctxt [:believer :John_Doe] ]; pm:ctxt [:time 2016-03-21] ].
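The nesting of contexts in Example 2 (a statement, then its place, time, believer, and the time of the belief) can be pictured as a recursive data structure. The class and function names below are hypothetical, and `to_fl` only approximates the bracketing of the FL line above; it is not an FL serializer.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Statement:
    text: str

@dataclass(frozen=True)
class Contextualized:
    content: Union["Statement", "Contextualized"]
    relation: str   # context relation type, e.g. "place", "time", "believer"
    value: str

def wrap(content, *contexts):
    """Wrap a statement in successive contexts, innermost first."""
    for relation, value in contexts:
        content = Contextualized(content, relation, value)
    return content

def to_fl(node):
    """Render the structure with FL-like nested brackets (approximation only)."""
    if isinstance(node, Statement):
        return f"[{node.text}]"
    return f"[{to_fl(node.content)} {node.relation}: {node.value}]"

belief = wrap(
    Statement("at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight"),
    ("place", "USA"), ("time", "2015"), ("believer", "John_Doe"), ("time", "2016-03-21"),
)
```

Because each context wraps the whole inner structure, the two `time` contexts are not confused: the inner one dates the fact and the outer one dates the belief.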

2.2. General-purpose ontologies aligning top-level ones and lexical ones

Problem: "reusable" ontologies are hardly reused and hard to reuse together. E.g.: different KBs hardly reuse top-level ontologies and lexical ones, or the same ones. Because: most reusable ontologies (top-level ontologies, lexical ones, ...) are not/poorly aligned.
Solution: general-purpose ontologies aligning top-level ones and lexical ones, in KB servers (not documents; see next two subsections). Example core: the MSO of the WebKB-2 server.

2.3. KB servers that support non-restricting KB Sharing by Web users

Problem: searching/merging/sharing/reusing/... knowledge is made difficult by the lack of relations between IDs/KRs from different users (and hence also by the inconsistencies and redundancies between these KRs) because shared KB servers
- are insufficiently used (e.g. because the W3C guidelines and most research are about (semi-)independently developed KBs)
- restrict what can be entered: restricted KRLs/domains, unscalable ways of keeping the KB consistent (committees, consensus, ...)
- lack features for keeping an unrestricted "multi-authored KB" organized and easy to search/use/...
Solution (more detailed on the next page, if needed): using KB servers with KS protocols that maintain the organization of shared KBs without requiring any restriction of content/KRL/...

2.3. KB servers that support non-restricting KB Sharing by Web users

Solution: using a system (KRL + KS protocol + interface) that
- leads each ID and KR to be associated with its author → each statement becomes either a belief or an ID definition (note: such an association cannot be represented/exploited in OWL)
- leads each "newly entered KR k1 that is inconsistent or redundant with an already entered KR k2" to be related to k2 (by k1´s author) via a relation of correction and/or implication and/or specialization (plus, in case of correction, a formal or semi-formal argument for it)
  → for inference purposes, choices between conflicting KRs can be automatically made based on their relations and information about their authors
  → information overload is avoided by the above cited organization and the possibility to set filters for not seeing particular kinds of KRs (e.g. KRs having a successfully justified correction) or KRs from particular authors
  → edit wars and discussions are resolved/avoided since they are replaced by the accumulation of precisions (hence more and more formal ones; the process converges to a fully specified formal and consistent KB)
- handles removals/updates by
  - storing and exploiting statements about correction relations, or
  - ID cloning mechanisms
→ solves the problems of module/document based versioning systems
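The protocol rules above can be sketched as follows. This is a simplified illustration, not the WebKB-2 protocol: the class and method names are hypothetical, the detection of inconsistency or redundancy is assumed to be done elsewhere (the conflicting KR is passed explicitly), and a correction counts as "justified" as soon as an argument is given.

```python
from dataclasses import dataclass, field

@dataclass
class KR:
    author: str
    text: str

@dataclass
class SharedKB:
    krs: list = field(default_factory=list)
    # Each relation: (source KR, relation type, target KR, relation author)
    relations: list = field(default_factory=list)

    def add(self, kr, conflicts_with=None, relation=None, argument=None):
        """Accept kr; if it conflicts with an existing KR, require a relation to it."""
        if conflicts_with is not None:
            if relation is None:
                raise ValueError("a conflicting or redundant KR must be related to the existing KR")
            if relation == "correction" and not argument:
                raise ValueError("a correction requires a formal or semi-formal argument")
            self.relations.append((conflicts_with, relation, kr, kr.author))
        self.krs.append(kr)
        return kr

    def uncorrected(self):
        """Filter: hide every KR that is the source of a correction relation."""
        corrected = {id(src) for src, rel, _, _ in self.relations if rel == "correction"}
        return [kr for kr in self.krs if id(kr) not in corrected]

kb = SharedKB()
k1 = kb.add(KR("u1", "every bird is agent of a flight"))
k2 = kb.add(
    KR("u2", "every bird can be agent of a flight"),
    conflicts_with=k1,
    relation="correction",
    argument="hypothetical argument: a bird is not flying at every moment",
)
```

Nothing is deleted: the statement of u1 stays in the KB with its correction attached, and each user decides via filters such as `uncorrected` whether to see it.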

2.3. ... – Examples of additive corrections

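u1#`every bird is agent of a flight´
  ------(c=>/^) _[u2]----→ u2#`every bird can be agent of a flight´
  ------(c=>) _[u3]----→ u3#`at least 75% of healthy flying_bird can be agent of a flight´
u2#`every bird can be agent of a flight´
  ------(c=>) _[u3]----→ u3#`at least 75% of healthy flying_bird can be agent of a flight´
u3#`at least 75% of healthy flying_bird can be agent of a flight´ ⇐ ...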

Legend.
- "------(typeID) _[userID]----→": relation of type typeID, created by userID;
- "u1#...": u1 is the author of the prefixed statement;
- "c=>/^": correction and implication and semantic/structural generalization;
- "c=>": correction and implication (no specialization/generalization);
- "⇐": implication relation with destination on the left;
- "every": 100%.

2.4. KB servers that support networked KBs

Problems:
- same ones as in 2.3
- Web users need to know+choose which KBs to update or query
  → scatters and hides information + makes it difficult to access
- current knowledge distribution mechanisms are
  - "database schema based" or
  - centralized; peer-to-peer (P2P) data sharing systems – which are data replication based –
    cannot be directly reused/adapted: they do not replicate/support relations between data
    → no replication of possible relations – i.e. of an organization – between replicated data.
Solution (more detailed on the next 2 pages, if needed):
a network of KBs that acts as a unique shared KB,
based on notions of
- intensional scope: specification of the kinds of objects (IDs or KRs)
  an individual KB is committed to accept from Web users
- KR update/query forwardings to all relevant KBs,
  given their Web-published scopes
  → replication in KBs of (direct) relations to IDs+KRs of all other KBs of the network.

2.4. KB servers that support networked KBs – Solution details – Part 1

Preliminary definitions for the solution.
Intensional scope: specification of the kinds of objects (IDs or KRs) a KB is committed to accept from Web users.
Intensional core scope: part of an intensional scope specifying the kinds of objects that a server is committed to accept, even if, for each of these kinds of objects, another intensional core scope on the Web also includes this kind of objects (i.e., if at least one other server has made the same storage commitment for this kind).
Extensional scope: structured textual document that lists each ID (in the ontology from this individual KB) using a normalized expression of the form: "<formal-ID-main-identifier>__scope <URL_of_the_KB>". This format permits KB servers to exploit Google-like search engines for knowing which KBs store a particular ID.
(scope) Nexus: KB server that has
- publicly published its intensional and extensional scopes on the Web, and
- specified in its non-core intensional scope that it is committed to accept storing the following kinds of IDs and KRs whenever they do not fall in the scope of another existing nexus:
  - the subtypes, supertypes, types, instances of each type covered by its intensional scope, and
  - the direct relations from each of these last objects (that are stored in this KB only as long as no other nexus stores them).
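The extensional scope format defined above ("<formal-ID-main-identifier>__scope <URL_of_the_KB>") is simple enough to generate and parse in a few lines. The function names are hypothetical and sorting the IDs is only a convenience of this sketch, not a stated requirement.

```python
SCOPE_SUFFIX = "__scope"

def extensional_scope_lines(ids, kb_url):
    """Return one normalized line per ID: '<ID>__scope <URL_of_the_KB>'."""
    return [f"{identifier}{SCOPE_SUFFIX} {kb_url}" for identifier in sorted(ids)]

def parse_scope_line(line):
    """Inverse operation: return the (ID, KB URL) pair stated by one scope line."""
    key, url = line.split(" ", 1)
    if not key.endswith(SCOPE_SUFFIX):
        raise ValueError(f"not a scope line: {line!r}")
    return key[: -len(SCOPE_SUFFIX)], url
```

For instance, with a placeholder URL such as `https://kb.example.org/`, the ID `Bird` yields the line `Bird__scope https://kb.example.org/`, a token that a general search engine can index and that other KB servers can then look up.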

2.4. KB servers that support networked KBs – Solution details – Part 2

Core of the solution.
Joining of an individual KB (server) to a networked KB: simply committing the KB server
- to be a nexus for its intensional scope,
- to perform the next tasks whenever a command (query or update) is submitted to the KB server:
  - insofar as the intensional scope allows it, to handle this command internally via the KB sharing protocol of WebKB-2 or another protocol with better properties (for efficiency reasons, when an object is in the core intensional scope but is related to other objects that are not in it, each of these other objects should be associated to the URL of another KB server that has this object within its core intensional scope)
  - to forward this command to the KB servers which, given their scopes, may handle it, at least partly. These servers are retrieved via the above cited URLs and/or exploitation of a Google-like search engine. Via this propagation, the commands are forwarded to all nexus that can handle them, and no KB server has to store all the IDs of all the KBs, even for interpreting nexus scopes. To counterbalance the fact that some KR forwardings may be lost or not correctly performed, i.e. that this "push-based strategy" may not always work, each KB server may also search other nexus having scopes overlapping its own scopes and then import some KRs from these nexus (this is the complementary "pull-based strategy"). Thus, KB servers with overlapping scopes have overlapping content but this redundancy is explicit and hence not harmful for inference purposes.
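The push-based forwarding described above can be sketched as follows, under strong simplifications: a scope is reduced to a plain set of type IDs (a stand-in for an intensional scope), the list of nexus is known in advance instead of being discovered via stored URLs or a search engine, and a command is a simple update. All names and URLs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Nexus:
    url: str
    scope: set                                   # type IDs this server commits to store
    store: list = field(default_factory=list)    # statements accepted so far

    def accepts(self, type_id):
        """True if the type falls within this server's (simplified) scope."""
        return type_id in self.scope

def submit(update, entry, network):
    """Handle an update at `entry` if in scope, then forward it to every other
    nexus whose scope covers it. `update` is a (type_id, statement) pair.
    Returns the URLs of the servers that stored the statement."""
    type_id, statement = update
    stored_at = []
    for nexus in [entry] + [n for n in network if n is not entry]:
        if nexus.accepts(type_id):
            nexus.store.append(statement)
            stored_at.append(nexus.url)
    return stored_at

birds = Nexus("https://birds.example.org", {"Bird", "Flight"})
flights = Nexus("https://flights.example.org", {"Flight"})
network = [birds, flights]
```

A user can thus submit an update about a `Bird` to the `flights` server: that server does not store it, but the update still reaches the `birds` server, so the user does not need to know which KB to update.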

3. Conclusion

General KS

- is possible
- is affordable and desirable: even though it requires more work, especially at the beginning, it brings MUCH more reuse and exploitation possibilities (in the end: much less waste of efforts and of overall global work)
- can be achieved incrementally.