Towards more semantically structured and integrated course materials, assignments and student feedback

A subject for an M.Sc. thesis (more precisely, a "D.E.A." in French)

Authors: Dr Philippe A. MARTIN, Dr Michel Eboueya

Introduction

Computer-assisted learning, and hence e-learning, most often simply uses emails, a discussion forum and Web-accessible hypermedia documents for the course materials. However, most advances in CSCW (Computer Supported Cooperative Work), or more generally Information Technology, can be re-used for computer-assisted learning (e.g., online games and simulations can be used for motivating students and letting them experiment). Here, we focus on the application and refinement of certain knowledge representation and sharing techniques. We distinguish four kinds of research in that general direction.

1. Creating a formal knowledge base (KB) representing one or several research domains to support automatic problem solving (and then the explanation of the provided solution). These are ambitious and long-term projects. One example is the QED Project, which aims to build a formal KB of all important, established mathematical knowledge. Another example is the "Digital Aristotle" project, which aims to build a system capable of teaching much of the world's scientific knowledge by (i) adapting to its students' knowledge and preferences (Hillis, 2004), and (ii) preparing and answering (with explanations) test questions for them (this implies encoding the knowledge in a formal way and meta-reasoning on the problem-solving strategies).
2. Creating tools that let people (usually researchers) add to a database (DB) or a large KB to represent the factual knowledge of a domain in a (rather) formal way, for knowledge retrieval/sharing/teaching purposes rather than for automatic problem solving purposes. A DB contains a lot of data that follow a small and fixed conceptual schema. A KB, on the other hand, contains an ontology, that is, a set of user-provided definitions for categories (concept types, relation types, instances), and a set of facts or rules written using the defined categories. Examples of such DBs/KBs/ontologies are the botanical databases, the Gene ontology and the OpenGalen medical ontology. In order to ease the creation of a shared KB, a KB system must be Web-accessible (it is then called a KB server or an ontology server), should provide user-friendly interfaces/notations and a large default lexical ontology, and should have multi-user cooperation protocols. WebKB-2 (Martin, 2003) is currently the only knowledge server that has these last three characteristics.
3. Supporting the manual, semi-automatic or automatic indexation/representation - and then retrieval - of (parts of) documents (e.g., research articles or user manuals) with categories or formal statements for information retrieval (IR) purposes. Depending on its focus, such research may be catalogued in different areas: Document Retrieval, precision-oriented IR, Question Answering, Semantic Web, Knowledge Extraction, Natural Language Understanding/Extraction, Terminological Analysis, etc. With formal annotation servers such as WebKB-1 (Martin & Eklund, 2000), people can mix formal indexations/representations with informal statements within the same document. However, in any case, information sharing is restricted to adding a document into a collection of documents.
4. Letting people (e.g., teachers, researchers, students) add, edit and structure information in a semi-formal way into a shared repository. This leads to fewer information redundancies, and eases information retrieval/understanding/comparison. Some tools for that are argumentation systems such as AAA (Schuler & Smith, 1990), typed hypertext systems such as MacWeb (Nanard et al., 1993), semi-formal KB systems such as CODE4 (Skuce & Lethbridge, 1995), formal KB servers such as WebKB-2, and nowadays semantic wikis, that is, wiki systems such as MediaWiki (which supports Wikipedia, Wiktionary, etc.) but allowing the use of some semantic relations such as subtypeOf, partOf and agentOf. Some of these systems support (large) KBs and hence can also be used for Point 2.

The research described below relates to Point 2 and Point 4. We believe that learning will be improved if a good part of the information from the course-related materials, assignments and discussions with/between the students is (or can incrementally become) semantically structured (and hence inter-related or integrated). More precisely, teachers and students should be able to represent the connections from an object (concept, definition, assertion, question) to other objects at the level of precision they want but in an explicit and semantically valid way, and in an easily retrievable way. For example, for a course about Natural Language Parsing, a semantic network of the various involved concepts (techniques, possible goals, features and inputs or outputs of the techniques, tasks, subtasks, strategies, building blocks, problems, debated ideas, existing tools, possible features of such tools, etc.) should be given and organised using specialization relations, partOf relations, argumentation relations, etc. More concrete examples are referred to below.
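As an illustration only, such a network can be sketched as a set of (source, relation, target) triples. The class name, the method names and the few Natural Language Parsing entries below are assumptions made for this example; they are not taken from WebKB-2 or its ontology. Only the relation names subtypeOf and partOf come from the text above, and argumentFor is a hypothetical name for an argumentation relation.

```python
class SemanticNetwork:
    """A toy semantic network stored as (source, relation, target) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, source, relation, target):
        self.triples.add((source, relation, target))

    def related(self, obj):
        """Return the objects directly connected to `obj`, grouped by relation."""
        links = {}
        for source, relation, target in self.triples:
            if source == obj:
                links.setdefault(relation, set()).add(target)
            elif target == obj:
                links.setdefault("inverse of " + relation, set()).add(source)
        return links

    def specializations(self, concept):
        """Return every direct or indirect subtype of `concept`."""
        found, pending = set(), [concept]
        while pending:
            current = pending.pop()
            for source, relation, target in self.triples:
                if relation == "subtypeOf" and target == current and source not in found:
                    found.add(source)
                    pending.append(source)
        return found


# Hypothetical fragment of a network for a Natural Language Parsing course.
net = SemanticNetwork()
net.add("chart parsing", "subtypeOf", "parsing technique")
net.add("Earley parsing", "subtypeOf", "chart parsing")
net.add("shift-reduce parsing", "subtypeOf", "parsing technique")
net.add("tokenization", "partOf", "natural language parsing")
net.add("reuses already parsed constituents", "argumentFor", "chart parsing")

print(sorted(net.specializations("parsing technique")))
print(net.related("chart parsing"))
```

Because every connection is explicit and typed, a student can ask for all the specializations of a technique, or for everything attached to one concept, without having to re-read the course text. A real knowledge server would add much more (definitions, authorship, cooperation protocols) than this sketch shows.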

Such a semantic network has several advantages over a "bunch of sentences". For example, (i) it is much less ambiguous than informal texts (e.g., it is often difficult to understand and remember the specialization relations between techniques or features when reading a book or a manual, and hence to understand or compare techniques or tools), (ii) it permits easy access to a particular object and its related objects (e.g., its features, specializations, arguments, counter-arguments), which again eases the understanding of an object or a whole domain, (iii) it can be extended by a teacher or any student for feedback purposes (e.g., for a student to clarify or ask for a clarification of a point that she found ambiguous or badly explained in a course material or an assignment) or for evaluation purposes (correctly extending a part of such a network shows a good understanding of the related part of the course), and (iv) if it is a good state of the art, it could be adopted by researchers of the domain to represent their research, thereby becoming an alternative to journals or Web pages for publishing or retrieving information on a domain (and a much more efficient one for knowledge sharing purposes).

Updating a semantic network (correctly) is more difficult than writing sentences since it forces analysis and the respect of some normalization rules. This is both a challenge (how to make updates easy enough) and an advantage (the network can be searched, filtered or, more generally, exploited, in much deeper ways than informal sentences). Not all the course-related materials, assignments and discussions should have to be organised into the global semantic network, only the parts for which this seems interesting. Here are various resources that are important to consider, re-use and extend for this M.Sc. thesis.

This article (Martin et al., 2005) describing how WebKB-2 and its cooperation protocols (which permit its users to store and tightly interconnect their knowledge into the same large KB without having to agree on terminology or beliefs) can be extended for handling semi-formal knowledge (e.g., structured discussions) and comparing tools or techniques.
These first quick modellings of various domains: Logics, Information Sciences, Knowledge Management, Conceptual Graphs, Formal Concept Analysis.
This comparison of CG tools.
These structured discussions: about abortion, about XML for Knowledge Representation.
This experiment about creating a semantic network for Workflow Management concepts and letting students extend it (note: no knowledge server was used during this experiment).
This experiment evaluating different computer-supported argumentation approaches.
This article about concept maps. Note: this article is a good source of information but, because concept maps are not formal enough, they are often less interesting (less understandable, more ambiguous) than the sentences they represent (this is also why they can often be automatically extracted from these sentences). However, concept maps are already used in teaching for reasons similar to the ones outlined above, e.g. see this site and google on "concept map" and "teaching".
Semantic wikis are also used in teaching for reasons similar to the ones outlined above (e.g., google on "semantic wiki" and "teaching"). Although still very interesting for learning, Wikipedia is regrettably not structured enough (strangely, Wikipedia even has policies against structuring) and many semantic wikis are also mainly informal.

Proposed tasks for this thesis

Modelling tasks:

Representing and comparing the concepts, tools and techniques related to knowledge-oriented approaches usable for computer-assisted learning (especially for the above cited Point 4). If time permits, the student should also do so for another domain of his/her choice. The models to follow are described in the above cited article, comparison of CG tools and first quick modellings of various domains (hence, WebKB-2 and its ontology should be used whenever (semi-)formal parts are used).
Organising the debates about the knowledge-oriented approaches usable for computer-assisted learning (especially for the above cited Point 4) into structured discussions. If time permits, the student should also do so for another domain of his/her choice. This may lead to suggestions of extensions of the current notation. Some research into how to extend the "algorithm to quantify the popularity and originality of each contribution and contributor" or other ways to exploit structured discussions should be considered.
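To make the idea of exploiting a structured discussion more concrete, here is a minimal sketch under stated assumptions. It is not the cited algorithm for quantifying popularity and originality, which is not reproduced here, and it does not use the WebKB-2 notation. The class Statement, the relation names "argument" and "objection", the two naive measures and the sample statements and authors are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One contribution in a structured discussion (hypothetical model)."""
    text: str
    author: str
    replies: list = field(default_factory=list)  # (relation, Statement) pairs

    def reply(self, relation, text, author):
        """Attach a typed reply ("argument" or "objection") and return it."""
        child = Statement(text, author)
        self.replies.append((relation, child))
        return child


def support_balance(statement):
    """Naive measure: direct arguments minus direct objections."""
    relations = [relation for relation, _ in statement.replies]
    return relations.count("argument") - relations.count("objection")


def contributions(statement, counts=None):
    """Count the statements written by each author in the whole thread."""
    counts = Counter() if counts is None else counts
    counts[statement.author] += 1
    for _, child in statement.replies:
        contributions(child, counts)
    return counts


# Hypothetical discussion; the authors and statements are invented.
root = Statement("Concept maps should be used to structure course materials", "teacher")
pro = root.reply("argument", "Concept maps are already used in teaching", "student_1")
con = root.reply("objection", "Concept maps are not formal enough and hence ambiguous", "student_2")
con.reply("objection", "A semi-formal notation can be refined incrementally", "student_1")

print(support_balance(root))
print(contributions(root))
```

Since each reply carries an explicit relation to the statement it addresses, measures of this kind can be computed by traversing the structure, which is not possible with an untyped forum thread. Any serious measure would also have to account for who approves or contradicts whom, which this sketch ignores.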

Programming tasks: (yet to be decided)

All modellings and code should be in English only. If this is legally possible, the thesis should be written in English too.

References

W.D. Hillis (2004). "Aristotle" (The Knowledge Web). Edge Foundation, Inc., No 138, May 6, 2004.

Ph. Martin & P. Eklund (2000). Knowledge Indexation and Retrieval and the World Wide Web. IEEE Intelligent Systems, special issue "Knowledge Management and Knowledge Distribution over the Internet", May/June 2000.

Ph. Martin (2003a). Knowledge Representation, Sharing and Retrieval on the Web. Chapter of a book titled "Web Intelligence", (Eds.: N. Zhong, J. Liu, Y. Yao), Springer-Verlag, Jan. 2003.

Ph. Martin (2003b). Correction and Extension of WordNet 1.7. Proc. of ICCS 2003 (Dresden, Germany, July 2003), Springer Verlag, LNAI 2746, 160-173.

Ph. Martin, M. Blumenstein and P. Deer (2005). Toward cooperatively-built knowledge repositories. Proc. of ICCS 2005 (Kassel, Germany, July 2005), Springer Verlag, LNAI 3596, 411-424.

W. Schuler and J.B. Smith (1990). Author's Argumentation Assistant (AAA): A Hypertext-Based Authoring Tool for Argumentative Texts. Proc. of ECHT'90 (INRIA, France, Nov. 1990), Cambridge University Press, 137-151.

D. Skuce and T.C. Lethbridge (1995). CODE4: A Unified System for Managing Conceptual Knowledge. International Journal of Human-Computer Studies, 42, 413-451.
See also the successor / commercial version: Fact Guru.

D.A. Smith (1998). Computerizing computer science. Communications of the ACM, 41(9), 21-23.

G. Kassel, M.-H. Abel, C. Barry, P. Boulitreau, C. Irastorza and S. Perpette (2000). Construction et exploitation d'une ontologie pour la gestion des connaissances d'une équipe de recherche. Actes de la Conférence en Ingénierie des Connaissances, IC 2000 (Toulouse, France), 251-259.

J. Nanard, M. Nanard, A. Massotte, A. Djemaa, A. Joubert, H. Betaille, J. Chauché (1993). Integrating Knowledge-based Hypertext and Database for Task-oriented Access to Documents, in Proc. DEXA 1993, Prague, Springer Verlag, LNCS Vol. 720, 721-732.