K-CAP 2023

K-CAP Tutorials

Decoding the grammar of DNA using Natural Language Processing

Organizers: Tyrone Chen, Navya Tyagi, Sonika Tyagi

Duration: Half-day

DNA is the blueprint that defines all living organisms, so understanding its nature and function is at the core of all biological studies. Rapid advances in DNA sequencing and computing technologies over the past few decades have produced large quantities of DNA sequence data from diverse experiments, exceeding the growth of all major social media platforms and astronomy data combined. However, biological data is both complex and high-dimensional, which makes it difficult to analyse with conventional methods. Machine learning is naturally well suited to large, complex data. In particular, applying Natural Language Processing (NLP) to the genome is intuitive, since DNA can be read as a natural language. Compared with NLP on human languages, genome NLP poses unique challenges, including word segmentation (DNA has no explicit boundaries between "words") and corpus comparison. To tackle these challenges, we developed the first automated and open-source genomeNLP workflow, which enables efficient and accurate knowledge extraction from biological data by automating and abstracting the preprocessing steps unique to biology. This lowers the barrier to knowledge extraction for both machine learning practitioners and computational biologists. In this tutorial, we will demonstrate how our workflow can be used to address these challenges, with implications for fields such as personalised medicine.

Declarative Construction and Validation of Knowledge Graphs

Organizers: Ana Iglesias-Molina, Xuemin Duan

Duration: Half-day

The wide adoption of knowledge graphs has boosted the development of techniques and tools that support their use throughout their life cycle. Among these, we focus on declarative approaches to knowledge graph construction, which rely on mapping languages (e.g. R2RML, RML, SPARQL-Anything) to describe the transformation process. The initial limitations of these technologies have been progressively addressed through the efforts of the community, encouraging their adoption. Our objective with this tutorial is to explain the progress of declarative mapping technologies in tackling more complex use cases, and to show, from a practical perspective, the tools and methods that ease mapping creation and its integration into KG construction pipelines. Furthermore, we will present how declarative approaches can be exploited not only for constructing but also for validating knowledge graphs. Ultimately, we aim to show the benefits that declarative approaches bring to the production of high-quality knowledge graphs and how they support them throughout their life cycle.

Ordinal Methods for Knowledge Representation and Capture (OrMeKR)

Organizers: Tom Hanika, Dominik Dürrschnabel, Johannes Hirth

The concept of order (i.e., partially ordered sets) is predominant in how we perceive and organize our physical and social environment, infer meaning and explanation from observation, and search for and rectify decisions. Compared to metric methods, however, the number of (purely) ordinal methods for capturing knowledge from data is rather small, although in principle they may allow for more comprehensible explanations. One reason for this may be the limited computing resources available in the last century, which (purely) ordinal computations would have required. As a result, relational and especially ordinal data are typically first embedded in metric spaces for learning. In this tutorial, we therefore want to discuss ordinal methods for capturing and representing knowledge, their role in inference and explainability, and their potential for knowledge visualization and communication. We want to reflect on these topics in a broad sense, i.e., ordinal methods as a tool to arrange, compare and compute ontologies or concept hierarchies, as a feature in learning and capturing knowledge, and as a measure for evaluating model performance.