(Formal) Identifier (ID), alias "unique identifier" (UID), "formal lexical term" or, in logics, "symbol": lexical object declared as having a unique meaning → not just a "name" (informal ID). (In logics, a sequence of symbols (IDs) is used to create either a term (formal expression) that denotes an object, or a formula (sentence) that denotes a fact.)
Formal syntax: syntax (grammar) composed of terms: terminal ones (IDs: lexical terms) and non-terminal ones, defined (directly or indirectly) in terms of terminal ones.
Model-theoretic semantics/interpretation (one formal way to define semantics): function that maps terms to individuals (objects), and sentences to truth values.
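As a minimal illustration of the model-theoretic definition above, the following Python sketch interprets IDs over a tiny finite domain: terms are mapped to individuals, and atomic sentences (and conjunctions of them) to truth values. The individuals and predicates (`tweety_obj`, `Bird`, `agent`, ...) are hypothetical example data, and quantifiers are deliberately left out.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical interpretation of terms (IDs) as individuals of the domain.
TERM_MAP: Dict[str, str] = {"Tweety": "tweety_obj", "f1": "flight_obj"}

# Hypothetical interpretation of predicate IDs as sets of tuples of individuals.
PRED_MAP: Dict[str, FrozenSet[Tuple[str, ...]]] = {
    "Bird": frozenset({("tweety_obj",)}),
    "Flight": frozenset({("flight_obj",)}),
    "agent": frozenset({("flight_obj", "tweety_obj")}),  # agent(flight, bird)
}

def interpret_term(term: str) -> str:
    """Map a term (ID) to the individual (object) it denotes."""
    return TERM_MAP[term]

def interpret_atom(pred: str, *terms: str) -> bool:
    """Map an atomic sentence pred(t1, ..., tn) to a truth value."""
    args = tuple(interpret_term(t) for t in terms)
    return args in PRED_MAP.get(pred, frozenset())

def interpret_and(*atoms: Tuple[str, ...]) -> bool:
    """Truth value of a conjunction of atoms, each given as (pred, t1, ..., tn)."""
    return all(interpret_atom(a[0], *a[1:]) for a in atoms)

print(interpret_and(("Bird", "Tweety"), ("agent", "f1", "Tweety")))  # True
print(interpret_atom("Flight", "Tweety"))  # False
```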
Information: Information Object(s).
Sequence of IDs that conforms to a formal/informal syntax and that expresses something.
Knowledge or Data (↔ Info base: Knowledge base or Database)
Not here:
- "knowledge: (information: data + semantic/types) + context/understanding/entailment/strategy"
- "knowledge: thing known to be true by/given observations/beliefs/assertions + deductions"
Knowledge: Knowledge Representations (KRs).
Information that is, at least partially, represented and organized
- in some logics, and
- via semantic relations (subtype, part, result, instrument, time, place, author, ...), and
- in a way known by the interpreting agent (person or software).
Examples 1 and 2 in Section 2.1.
Semi-formal KR: KR relating informal IDs/sentences by semantic (→ formal) relations.
Data: information that is not knowledge.
1. Need for general Knowledge Sharing (KS)
1.1. Need for knowledge bases (KBs) to be used instead of databases
1.2. Need to distinguish present-time KS and general KS
2. Some complementary techniques I designed to support general knowledge sharing
2.1. Unrestricted/"à la carte" formal languages
2.2. General-purpose ontologies aligning top-level ones and lexical ones
2.3. KB servers that support non-restricting KB Sharing by Web users
2.4. KB servers that support networked KBs
Knowledge Base (KB): set of IDs and of KRs using these IDs.
Parts: [figure not reproduced]
Database (DB): information base that is not a KB.
Parts: [figure not reproduced]
Subtypes:
- relational/object-oriented/NoSQL/deductive DBs ("RDF DBs" are KBs),
- XML/JSON/LaTeX documents, and bases of documents that are not KBs,
- data indexed by IDs from a folksonomy/ontology (e.g. as in semantic wikis) ("Linked Data", e.g. the Semantic Web, refers to KBs and to the data they index or relate).
Note 1. As in programming, the more expressive the formal language used, the more objects (functions, relations, ...) can be represented explicitly (→ as "first-class" objects) and hence named, quantified, related, ... and then exploited. With most DB systems, the end-user can neither define nor set relations, and the schema is mostly a tree → implicit redundancies (e.g. those that normalizing a DB, from 1NF to 6NF, tries to remove) → arbitrary choices (e.g. making a class Employee part of a class Company, or conversely, or choosing one plan/structure in an article, web site, wiki, ...).
Note 2. The implementation of a KB system can reuse a DB system with a schema of mainly two tables/classes: one for explicit relation nodes, one for explicit concept nodes.
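Note 2 can be sketched with Python's standard `sqlite3` module, assuming a hypothetical schema in which every concept node and every explicit relation node is a row (the table and column names are illustrative, not those of an existing KB system).

```python
import sqlite3

# Hypothetical two-table schema: concept nodes and explicit relation nodes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_node (
    id     INTEGER PRIMARY KEY,
    label  TEXT NOT NULL UNIQUE,  -- the ID of a type or of an individual
    author TEXT                   -- creator of the ID
);
CREATE TABLE relation_node (
    id        INTEGER PRIMARY KEY,
    rel_type  TEXT NOT NULL,      -- e.g. 'subtypeOf', 'instanceOf', 'agent'
    source_id INTEGER NOT NULL REFERENCES concept_node(id),
    dest_id   INTEGER NOT NULL REFERENCES concept_node(id),
    author    TEXT                -- creator of the relation
);
""")

def add_concept(label, author=None):
    """Store a concept node and return its row id."""
    cur = conn.execute("INSERT INTO concept_node (label, author) VALUES (?, ?)", (label, author))
    return cur.lastrowid

def add_relation(rel_type, source_id, dest_id, author=None):
    """Store an explicit (binary) relation node between two concept nodes."""
    cur = conn.execute(
        "INSERT INTO relation_node (rel_type, source_id, dest_id, author) VALUES (?, ?, ?, ?)",
        (rel_type, source_id, dest_id, author),
    )
    return cur.lastrowid

def relations_from(label):
    """Return (relation type, destination label) pairs for relations leaving a concept."""
    return conn.execute(
        """SELECT r.rel_type, d.label
           FROM relation_node r
           JOIN concept_node s ON r.source_id = s.id
           JOIN concept_node d ON r.dest_id = d.id
           WHERE s.label = ? ORDER BY r.id""",
        (label,),
    ).fetchall()

bird, animal, tweety = add_concept("Bird"), add_concept("Animal"), add_concept("Tweety")
add_relation("subtypeOf", bird, animal)
add_relation("instanceOf", tweety, bird, author="u1")
print(relations_from("Tweety"))  # [('instanceOf', 'Bird')]
```

In this simplification, relation nodes only connect concept nodes; letting relations themselves be related (full first-class status) would require, for instance, a common identifier space for both kinds of nodes.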
Possibility+need to search information semantically (not just lexically and structurally), either by query (→ 100% precision/recall for logically complete results; → the list of results may be organized, e.g. by specialization relations) or by navigation along semantic relations (as in a decision tree)
BUT
* manually representing information is difficult (especially for large KBs and general KS)
* automatically representing the content of a document requires the software to "understand" it
* automatically merging a set of KBs that the knowledge authors have not unified into a "networked KB" does not lead to good results, if only because the required information is only in the heads of the knowledge authors
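The semantic search by query mentioned above can be illustrated with a small Python sketch, assuming a hypothetical single-inheritance hierarchy: a query on a type retrieves the instances of all its subtypes, and the results are organized by specialization, even when an instance's type name does not lexically contain the queried type name (e.g. Penguin for Bird).

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical specialization hierarchy (subtype -> supertype) and instances.
SUBTYPE_OF: Dict[str, str] = {"Carinate_bird": "Bird", "Penguin": "Bird", "Bird": "Animal"}
INSTANCE_OF: Dict[str, str] = {"Tweety": "Carinate_bird", "Pingu": "Penguin", "Rex": "Animal"}

def all_subtypes(t: str) -> List[str]:
    """Return t and all its direct or indirect subtypes (depth-first, alphabetical)."""
    children = defaultdict(list)
    for sub, sup in SUBTYPE_OF.items():
        children[sup].append(sub)
    result, stack = [], [t]
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(sorted(children[current], reverse=True))
    return result

def semantic_search(t: str) -> Dict[str, List[str]]:
    """Instances of t, grouped by the (sub)type they are directly declared in."""
    return {ty: sorted(i for i, c in INSTANCE_OF.items() if c == ty) for ty in all_subtypes(t)}

print(semantic_search("Bird"))
# {'Bird': [], 'Carinate_bird': ['Tweety'], 'Penguin': ['Pingu']}
```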
[Figure: top-level types Entity £ Process, with Bird → Entity, Counting (or: Flight) → Process and Tweety → Bird. Statements (in FE): `no Bird can be agent of a Counting´ (a false belief), related by exclusion (£) to `at least 1 Bird can be agent of a Counting´, `at least 50% of Bird can be agent of a Counting´ and `every Clever Bird can be agent of a Counting´, with `at least 1 Bird can be agent of a Counting´ ⇐ `at least 50% of Bird can be agent of a Counting´; more specific statements, linked to these by implication arrows: `1 Bird can be agent of a Counting that has for duration at least 0.5 Hour´, `Tweety can be agent of a Counting´, `every Bird can be agent of a Counting´, `Tweety has been agent of a Counting with duration at least 0.5 Hour´, `every Bird|Bat can be agent of a Counting´.]
Legend. "→": generalization that is not an implication, e.g. subtypeOf, instanceOf; "⇒": implication; "£": exclusion ( x £ y <=> ((x ⇒ ¬y) ∨ (x → ¬y)) ); "can": is able to; every sentence is in FE; relation types are in italics; concept types begin with an uppercase letter; the authors of IDs, sentences and relations are not represented here; in FE, "every" (alias 100%) and "%" are for "observations" and hence imply "at least 1", whereas "any" is for a "definition" and hence does not imply "at least 1"; the distinction is important since observations may be false whereas definitions cannot be (since agents can give any identifier they want to the types they create) and thus cannot be corrected or contradicted.
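A minimal Python sketch of how implication and exclusion links can expose a false belief such as `no Bird can be agent of a Counting´. The links below are a hypothetical reading of a few statements from the figure, statements are treated as opaque strings, and only the implication branch of the exclusion definition is used (the x → ¬y generalization branch is not modelled).

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical implication (⇒) links between statements.
IMPLIES: Dict[str, List[str]] = {
    "Tweety can be agent of a Counting": ["at least 1 Bird can be agent of a Counting"],
    "at least 50% of Bird can be agent of a Counting": ["at least 1 Bird can be agent of a Counting"],
}
# Exclusion (£) links, stored as unordered pairs.
EXCLUSIONS: Set[FrozenSet[str]] = {
    frozenset({"no Bird can be agent of a Counting", "at least 1 Bird can be agent of a Counting"}),
}

def consequences(statement: str) -> Set[str]:
    """The statement plus everything it transitively implies."""
    seen: Set[str] = set()
    stack = [statement]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, []))
    return seen

def conflicts(a: str, b: str) -> bool:
    """True if a consequence of a is declared exclusive with a consequence of b."""
    return any(frozenset({x, y}) in EXCLUSIONS for x in consequences(a) for y in consequences(b))

print(conflicts("Tweety can be agent of a Counting", "no Bird can be agent of a Counting"))  # True
```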
Knowledge sharing as it is currently commonly conceived
(since 2000 / the W3C's "Linked Data"):
- for a known application and, generally, for efficient+complete knowledge inferencing
- between agents that can discuss to solve ambiguities: B2B KS.
KS in the general sense (or pre-W3C):
representing knowledge in explicit ways to maximize knowledge reuse.
KR creators do not make exploitation-dependent choices → each application can make them.
This goal has been forgotten/abandoned since the "Linked Data" Web, in the same way as advanced research on hypertext systems was forgotten/abandoned after the beginning of the Web.
The W3C (WWW Consortium), researchers in knowledge inferencing and, for hype/grant/publication/... reasons, most researchers since 2000
have bet that, like "deep learning", (future) automatic knowledge extraction techniques can produce "seemingly good enough" results without bothering end-users with any KR details
→ misleading + unscalable "KS supporting" techniques + poorly reusable KRs.
General KS is necessary for
(a network of) KBs to behave as one well-organized KB instead of being (mostly) data with respect to each other ↔ no implicit redundancy/implication/specialization/exclusion between IDs/KRs of these KBs ↔ no redundancy of efforts ↔ exploitation of (and access to), within this network, all information conceptually relevant to a query in an organized way, without author-based bias (→ popularity, lobbying, marketing, ...) except possibly on criteria decided by each end-user (e.g. for trust purposes) ↔ no huge waste of effort by information providers+seekers
“fair decision” supporting/enforcing system (e.g. direct democracy via a logicocracy) ↔ system preventing “unfair” decisions or marking them as invalid/non-enforceable: a “fair” decision is one that is logically consistent and optimal with respect to the preferences and logically consistent beliefs represented in the exploited KBs of the persons affected by the decision; note:
- preferences may also be on ways to aggregate and maximize preferences
“fair work” supporting/enforcing system ↔ when an agent commits to a task (e.g. accepts a job, some responsibilities, ...) but does not perform a subtask of it (e.g. answering all emails related to this task), the system (e.g. a workflow system exploiting KBs) reminds the agent and records+advertises the commitment-breaching behaviors
2.1. Unrestricted/"à la carte" formal languages
are low-level (i.e. lacking keywords for common features)
→ not concise (→ hard to read) + not normalizing
For even more flexibility: KRLO, an ontology of KR models and notations
→ exploitable by tools to import/export formal structures in any of the languages specified in the ontology
→ knowledge providers can use and mix the syntaxes/features they prefer and are not restricted by existing KRLs/standards; then each application converts KRs into what its inference engine can exploit
(note: different programming/modelling paradigms, e.g. Petri Nets, focus on different relation+concept types; all of these types can be organized/related/defined in KRLO)
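To illustrate the idea of one abstract structure exported into several notations, here is a hypothetical Python sketch that renders the Flying_bird_with_2_wings definition of Example 1 into FL-like and FE-like strings. `Rel` and `Definition` are illustrative stand-ins, not KRLO's actual vocabulary, and the phrasing rules only cover this example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rel:
    name: str              # relation type, e.g. "agent", "part"
    quantifier: str        # e.g. "a", "2"
    dest: str              # destination concept type
    inverse: bool = False  # True when the defined object is the destination ("agent of")

@dataclass
class Definition:
    name: str       # defined type
    genus: str      # supertype
    rels: List[Rel]

def to_fl(d: Definition) -> str:
    """Render the definition in an FL-like notation."""
    parts = [f"{r.name} of: {r.quantifier} {r.dest}" if r.inverse
             else f"{r.name}: {r.quantifier} {r.dest}" for r in d.rels]
    return f"{d.name} = ^({d.genus} {', '.join(parts)})."

def to_fe(d: Definition) -> str:
    """Render the same definition in an FE-like notation."""
    parts = [f"is {r.name} of {r.quantifier} {r.dest}" if r.inverse
             else f"has for {r.name} {r.quantifier} {r.dest}" for r in d.rels]
    return f"any {d.name} is a {d.genus} that {' and '.join(parts)}."

example1 = Definition("Flying_bird_with_2_wings", "Bird",
                      [Rel("agent", "a", "Flight", inverse=True), Rel("part", "2", "Wing")])
print(to_fl(example1))
print(to_fe(example1))
```

Adding a notation then means adding one renderer over the same structure, which is the kind of separation between content and syntax that the text attributes to KRLO-based tools.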
En: By definition, a flying_bird_with_2_wings is a bird that flies and has two wings.
LP: Flying_bird_with_2_wings (b) :=
Bird(b) ∧ ∃f Flight(f) ∧ agent(f,b)
∧ ∃w1,w2 Wing(w1) ∧ Wing(w2) ∧ part(b,w1) ∧ part(b,w2) ∧ w1 ≠ w2
FE: any Flying_bird_with_2_wings is a Bird that is agent of a Flight and
has for part 2 Wing.
FL: Flying_bird_with_2_wings = ^(Bird agent of: a Flight, part: 2 Wing).
RDF+OWL2 / Turtle:
:Flying_bird_with_2_wings owl:equivalentClass
  [a owl:Class;
   owl:intersectionOf
     (:Bird
      [a owl:Restriction; owl:onProperty [owl:inverseOf :agent]; owl:someValuesFrom :Flight]
      [a owl:Restriction; owl:onProperty :part; owl:onClass :Wing;
       owl:qualifiedCardinality "2"^^xsd:nonNegativeInteger])].
UML_model / UML_concise_notation: [figure not reproduced]
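The LP formula of Example 1 can be checked against a small finite model. In this Python sketch the individuals are hypothetical, and the formula is read as written: it requires at least two distinct wings (whereas owl:qualifiedCardinality in the Turtle version states exactly 2).

```python
from typing import Set, Tuple

# Hypothetical finite model.
BIRDS: Set[str] = {"tweety", "pingu"}
FLIGHTS: Set[str] = {"f1"}
WINGS: Set[str] = {"w1", "w2", "w3"}
AGENT: Set[Tuple[str, str]] = {("f1", "tweety")}  # agent(f, b)
PART: Set[Tuple[str, str]] = {("tweety", "w1"), ("tweety", "w2"), ("pingu", "w3")}  # part(b, w)

def is_flying_bird_with_2_wings(b: str) -> bool:
    """Bird(b) ∧ ∃f Flight(f) ∧ agent(f,b)
    ∧ ∃w1,w2 Wing(w1) ∧ Wing(w2) ∧ part(b,w1) ∧ part(b,w2) ∧ w1 ≠ w2."""
    if b not in BIRDS:
        return False
    flies = any((f, b) in AGENT for f in FLIGHTS)
    wings = {w for w in WINGS if (b, w) in PART}
    return flies and len(wings) >= 2  # two distinct wings exist iff there are at least 2

print(sorted(x for x in BIRDS | FLIGHTS if is_flying_bird_with_2_wings(x)))  # ['tweety']
```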
En: On March 21st 2016, John Doe believed that in 2015 and in the USA,
at least 78% of adult healthy carinate birds were able to fly.
FL: [ [ [ [ [at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight ]
place: USA ] time: 2015 ] believer: John_Doe ] time: 2016-03-21 ].
FE: ` ` ` ` `at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight´
at place USA´ at time 2015´ for believer John_Doe´ at time 2016-03-21´.
IKLmE / Turtle:
[rdf:value
  [rdf:value
    [rdf:value
      [rdf:value
        [rdf:value [rdf:value [:agent_of [a :Flight]]
                   ; pm:q_ctxt [pm:quantifier "78to100pc";
                                rdf:type :Adult, :Healthy, :Carinate_bird]
                   ]; pm:ctxt [:modality :Physical_possibility]
        ]; pm:ctxt [:place :USA]
      ]; pm:ctxt [:time "2015"]
    ]; pm:ctxt [:believer :John_Doe]
  ]; pm:ctxt [:time "2016-03-21"^^xsd:date]
].
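The nesting of Example 2 (a statement wrapped in successive place, time and believer contexts) can be represented as a recursive structure. This hypothetical Python sketch rebuilds the FL version of Example 2 from such a structure; the modality "is able to" is kept inside the core statement string, as in FL, rather than as a separate context as in IKLmE.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Contextualized:
    """A statement (possibly already contextualized) plus one context relation."""
    inner: Union[str, Contextualized]
    relation: str  # e.g. "place", "time", "believer"
    value: str

def to_fl(s: Union[str, Contextualized]) -> str:
    """Render nested contexts in the bracketed FL style of Example 2."""
    if isinstance(s, str):
        return f"[{s} ]"
    return f"[ {to_fl(s.inner)} {s.relation}: {s.value} ]"

def contexts(s: Union[str, Contextualized]) -> List[Tuple[str, str]]:
    """List the (relation, value) context pairs from outermost to innermost."""
    result = []
    while isinstance(s, Contextualized):
        result.append((s.relation, s.value))
        s = s.inner
    return result

core = "at least 78% of Adult Healthy Carinate_bird is able to be agent of: a Flight"
example2 = Contextualized(Contextualized(Contextualized(Contextualized(
    core, "place", "USA"), "time", "2015"), "believer", "John_Doe"), "time", "2016-03-21")
print(to_fl(example2) + ".")
```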
Problem: "reusable" ontologies are rarely reused, and are hard to reuse together.
E.g.: different KBs rarely reuse top-level ontologies and lexical ones, or do not reuse the same ones.
Because: most reusable ontologies (top-level ones, lexical ones, ...) are not aligned, or only poorly.
Solution: general-purpose ontologies aligning top-level ones and lexical ones, in KB servers (not documents; see next two subsections). Example core: the MSO of the WebKB-2 server.
Problem: searching/merging/sharing/reusing/... knowledge is made difficult by the lack of relations between IDs/KRs from different users (and hence also by the inconsistencies and redundancies between these KRs), because shared KB servers:
- are insufficiently used (e.g. because the W3C guidelines and most research are about (semi-)independently developed KBs),
- restrict what can be entered: restricted KRLs/domains, unscalable ways of keeping the KB consistent (committees, consensus, ...),
- lack features for keeping an unrestricted "multi-authored KB" organized and easy to search/use/...
Solution (more detailed on the next page, if needed): using KB servers with KS protocols that maintain the organization of shared KBs without requiring any restriction of content/KRL/...
Solution: using a system (KRL + KS protocol + interface) that
- leads each ID and KR to be associated with its author → each statement becomes either a belief or an ID definition (note: such an association cannot be represented/exploited in OWL)
- leads each "newly entered KR k1 that is inconsistent or redundant with an already entered KR k2" to be related to k2 (by k1's author) via a relation of correction and/or implication and/or specialization (plus, in case of correction, a formal or semi-formal argument for it) → for inference purposes, choices between conflicting KRs can be made automatically, based on their relations and on information about their authors → information overload is avoided by the above cited organization and by the possibility of setting filters for not seeing particular kinds of KRs (e.g. KRs having a successfully justified correction) or KRs from particular authors → edit wars and discussions are resolved/avoided since they lead to the accumulation of refinements (hence more and more formal ones; the process converges to a fully specified, formal and consistent KB)
- handles removals/updates by storing and exploiting statements about correction relations, or by ID cloning mechanisms
→ solves the problems of module/document based versioning systems
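The protocol above can be sketched in Python under strong simplifications: detecting that a new KR is inconsistent or redundant with stored KRs is assumed to be done elsewhere (the hypothetical `conflicting` argument), and "having a successfully justified correction" is approximated by "being the source of a correction relation". Following the u1/u2/u3 example that comes next, each relation goes from the already stored KR to the new one and is created by the new KR's author. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

RELATION_KINDS = frozenset({"correction", "implication", "specialization", "generalization"})

@dataclass
class SharedKB:
    authors: Dict[str, str] = field(default_factory=dict)  # KR id -> author
    # (source KR id, relation kinds, destination KR id, creator of the relation)
    relations: List[Tuple[str, FrozenSet[str], str, str]] = field(default_factory=list)

    def add(self, kr_id: str, author: str, conflicting: Iterable[str] = (),
            links: Optional[Dict[str, Set[str]]] = None) -> None:
        """Accept kr_id only if its author relates it to every stored KR it conflicts with."""
        links = links or {}
        conflicting = list(conflicting)
        for old in conflicting:  # check everything before modifying the KB
            kinds = links.get(old, set())
            if not kinds or not kinds <= RELATION_KINDS:
                raise ValueError(f"{kr_id} must be related to {old} by its author")
        self.authors[kr_id] = author
        for old in conflicting:
            self.relations.append((old, frozenset(links[old]), kr_id, author))

    def visible(self, hide_corrected: bool = True) -> List[str]:
        """KR ids to display; optionally hide KRs that are the source of a correction."""
        corrected = {src for src, kinds, _, _ in self.relations if "correction" in kinds}
        return sorted(k for k in self.authors if not (hide_corrected and k in corrected))

kb = SharedKB()
kb.add("u1#s1", "u1")  # s1: every bird is agent of a flight
kb.add("u2#s2", "u2", conflicting=["u1#s1"],  # s2: every bird can be agent of a flight
       links={"u1#s1": {"correction", "implication", "generalization"}})
kb.add("u3#s3", "u3", conflicting=["u1#s1", "u2#s2"],  # s3: at least 75% of healthy flying_bird ...
       links={"u1#s1": {"correction", "implication"}, "u2#s2": {"correction", "implication"}})
print(kb.visible())  # ['u3#s3']
```

Calling `kb.add` with a conflicting KR but no link raises `ValueError` and leaves the KB unchanged, which is this sketch's stand-in for "the author must relate the new KR to the existing one".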
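u1#`every bird is agent of a flight´ ------(c=>/^) _[u2]----→ u2#`every bird can be agent of a flight´
u1#`every bird is agent of a flight´ ------(c=>) _[u3]----→ u3#`at least 75% of healthy flying_bird can be agent of a flight´
u2#`every bird can be agent of a flight´ ------(c=>) _[u3]----→ u3#`at least 75% of healthy flying_bird can be agent of a flight´
u3#`at least 75% of healthy flying_bird can be agent of a flight´ ⇐ ...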
Legend.
"------(typeID) _[userID]----→": relation of type typeID, created by userID
"u1#...": u1 is the author of the prefixed statement;
"c=>/^": correction and implication and semantic/structural generalization;
"c=>": correction and implication (no specialization/generalization);
"⇐": implication relation with destination on the left; "every": 100%
Intensional scope: specification of the kinds of objects (IDs or KRs) a KB is committed to accept from Web users.
Intensional core scope: part of an intensional scope specifying the kinds of objects that a server is committed to accept, even if, for each of these kinds of objects, another intensional core scope on the Web also includes this kind of objects (i.e., even if at least one other server has made the same storage commitment for this kind).
Extensional scope: structured textual document that lists each ID (in the ontology of this individual KB) using a normalized expression of the form: "<formal-ID-main-identifier>__scope <URL_of_the_KB>". This format permits KB servers to exploit Google-like search engines to find out which KBs store a particular ID.
(scope) Nexus: KB server that has specified in its non-core intensional scope that it is committed to store the following kinds of IDs and KRs whenever they do not fall within the scope of another existing nexus:
- the subtypes, supertypes, types and instances of each type covered by its intensional scope, and
- the direct relations from each of these last objects (which are stored in this KB only as long as no other nexus stores them).
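The extensional scope format defined above can be generated and searched as in this Python sketch; the scan over documents merely stands in for a Google-like search engine query on the string `<ID>__scope`, IDs are assumed to contain no spaces, and the URLs are placeholders.

```python
from typing import Iterable, List

def extensional_scope(ids: Iterable[str], kb_url: str) -> str:
    """One normalized line per ID: '<ID>__scope <URL_of_the_KB>'."""
    return "\n".join(f"{i}__scope {kb_url}" for i in sorted(ids))

def kbs_storing(identifier: str, documents: Iterable[str]) -> List[str]:
    """URLs of the KBs whose extensional scope lists the identifier
    (stands in for a search-engine lookup on the exact string '<ID>__scope')."""
    key = f"{identifier}__scope"
    urls = set()
    for doc in documents:
        for line in doc.splitlines():
            token, _, url = line.partition(" ")
            if token == key and url.strip():
                urls.add(url.strip())
    return sorted(urls)

doc1 = extensional_scope({"Bird", "Flight"}, "http://kb1.example.org")
doc2 = extensional_scope({"Bird"}, "http://kb2.example.org")
print(kbs_storing("Bird", [doc1, doc2]))  # ['http://kb1.example.org', 'http://kb2.example.org']
```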
Joining of an individual KB (server) to a networked KB: simply committing the KB server, for each command it receives,
- insofar as the intensional scope allows it, to handle this command internally via the KB sharing protocol of WebKB-2 or another protocol with better properties (for efficiency reasons, when an object is in the core intensional scope but is related to other objects that are not in it, each of these other objects should be associated with the URL of another KB server that has this object within its core intensional scope), and
- to forward this command to the KB servers which, given their scopes, may handle it, at least partly. These servers are retrieved via the above cited URLs and/or by exploiting a Google-like search engine. Via this propagation, the commands are forwarded to all the nexus that can handle them, and no KB server has to store all the IDs of all the KBs, even for interpreting nexus scopes. To counterbalance the fact that some KR forwardings may be lost or not correctly performed, i.e. that this "push-based strategy" may not always work, each KB server may also search for other nexus having scopes overlapping its own scopes and then import some KRs from these nexus (this is the complementary "pull-based strategy"). Thus, KB servers with overlapping scopes have overlapping content, but this redundancy is explicit and hence not harmful for inference purposes.
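The push-based and pull-based strategies can be sketched in Python as follows. Assumptions: an intensional scope is reduced to a set of type names, a command is the addition of one object of a given kind, every server forwards to all the servers it knows (loops are avoided with a visited set), and search-engine lookups are not modelled. Server URLs are placeholders.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

@dataclass
class KBServer:
    url: str
    scope: FrozenSet[str]  # kinds of objects it committed to store (simplified)
    known: List[str] = field(default_factory=list)  # URLs of other KB servers it knows
    store: Set[Tuple[str, str]] = field(default_factory=set)  # stored (kind, object) pairs

def push(kind: str, obj: str, start_url: str, net: Dict[str, KBServer]) -> List[str]:
    """Push-based strategy: propagate an addition command along known URLs;
    each reached server stores the object if the kind is in its scope."""
    handled: List[str] = []
    visited: Set[str] = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in visited or url not in net:
            continue  # avoid loops; ignore unreachable servers
        visited.add(url)
        server = net[url]
        if kind in server.scope:
            server.store.add((kind, obj))
            handled.append(url)
        queue.extend(server.known)
    return handled

def pull(server: KBServer, net: Dict[str, KBServer]) -> None:
    """Pull-based strategy: import, from servers with overlapping scopes,
    the stored objects that fall within this server's own scope."""
    for other in net.values():
        if other is not server and other.scope & server.scope:
            server.store |= {(k, o) for k, o in other.store if k in server.scope}

a = KBServer("http://a.example.org", frozenset({"Bird"}), known=["http://b.example.org"])
b = KBServer("http://b.example.org", frozenset({"Bird", "Flight"}), known=["http://c.example.org"])
c = KBServer("http://c.example.org", frozenset({"Flight"}))
net = {s.url: s for s in (a, b, c)}
print(push("Flight", "f1", "http://a.example.org", net))  # ['http://b.example.org', 'http://c.example.org']
c.store.clear()  # simulate a lost forwarding
pull(c, net)     # c recovers ('Flight', 'f1') from b, whose scope overlaps its own
```

The resulting duplication between b and c is the explicit, scope-determined redundancy that the text describes as harmless for inference.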
General KS