Tutorials

Hybrid Techniques for Knowledge-based NLP - Knowledge Graphs meet Machine Learning and all their friends

Jose Manuel Gomez-Perez, Ronald Denaux and Raul Ortega

Many different artificial intelligence techniques can be used to explore and exploit large document corpora that are available inside organizations and on the Web. While natural language is symbolic in nature and the first approaches in the field were based on symbolic and rule-based methods, many of the most widely used methods are currently based on neural approaches. Each of these two main schools of thought in natural language processing has its strengths and limitations, and there is an increasing trend that seeks to combine them in complementary ways to get the best of both worlds. This tutorial covers the foundations and modern practical applications of knowledge-based and neural methods, techniques and models, and their combination for exploiting large document corpora. The tutorial first focuses on the foundations that can be used for this purpose, including knowledge graphs, word embeddings, and language models. It then shows how these techniques can be effectively combined in NLP tasks, and how they extend to data modalities other than text, drawing on examples from research and innovation projects.
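
As a taste of what such a combination can look like in practice, the following minimal sketch (illustrative only, not the tutorial's actual materials) uses gensim's Word2Vec to embed plain words and knowledge-graph entity identifiers in a single vector space; the toy corpus and the "kg:Q..." entity tokens are assumptions made for the example.

    # Minimal sketch (illustrative, not the tutorial's code): embed words and
    # knowledge-graph entity identifiers in one vector space with Word2Vec.
    # The toy corpus and the "kg:Q..." entity tokens are assumptions.
    from gensim.models import Word2Vec

    # Sentences where entity mentions are accompanied by their knowledge-graph
    # identifiers (here, Wikidata-style Q-ids such as kg:Q90 for Paris).
    corpus = [
        ["paris", "kg:Q90", "is", "the", "capital", "of", "france", "kg:Q142"],
        ["berlin", "kg:Q64", "is", "the", "capital", "of", "germany", "kg:Q183"],
        ["the", "eiffel", "tower", "stands", "in", "paris", "kg:Q90"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                     min_count=1, epochs=200, seed=1)

    # Words and entities now share one space, so neighbourhood queries can
    # relate free text to symbolic knowledge (and vice versa).
    print(model.wv.most_similar("kg:Q90", topn=3))

Once words and entities live in the same space, similarity queries can move between the textual and the symbolic side, which is one simple route to combining the two schools of thought described above.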

Linking, Extending, Exploiting and Enhancing Tabular Data with Wikidata

Daniel Garijo and Pedro Szekely

Knowledge graphs have become a common asset for representing world knowledge in data-driven models and applications. In this tutorial, participants will learn how to integrate, link and extend tabular data using Wikidata, a crowdsourced knowledge graph with over 60 million entities and a lively community of curators. The first part of the tutorial presents an overview of Wikidata, its data model, and its query and update APIs. The second part presents tools to link tabular datasets to Wikidata, to augment them using Wikidata, to extend Wikidata with new data, and to query and visualize the extended Wikidata knowledge graph.
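
By way of illustration, Wikidata's public SPARQL endpoint can be queried with nothing more than an HTTP request. The following minimal Python sketch (not part of the tutorial materials) retrieves a handful of entities; the query itself, listing countries with their English labels, is chosen purely for illustration.

    # Minimal sketch: query Wikidata's public SPARQL endpoint over HTTP with
    # the `requests` library. The query is chosen purely for illustration.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?country ?countryLabel WHERE {
      ?country wdt:P31 wd:Q6256 .  # instance of (P31): country (Q6256)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
    """

    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        # A descriptive User-Agent is expected by Wikimedia's usage policy.
        headers={"User-Agent": "tabular-data-tutorial-example/0.1"},
    )
    for binding in response.json()["results"]["bindings"]:
        print(binding["countryLabel"]["value"])

The same endpoint accepts arbitrary SPARQL, so the pattern above generalizes directly to the linking and augmentation queries covered in the second part of the tutorial.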

Build a large-scale cross-lingual text search engine from scratch

Carlos Badenes-Olmedo, Jose Luis Redondo-Garcia and Oscar Corcho

Searching for similar documents and exploring the major themes covered across groups of documents are common actions when browsing collections of scientific papers. This manual, knowledge-intensive task may become less tedious, and even lead to unforeseen relevant findings, if unsupervised algorithms are applied to help researchers. Most text mining algorithms represent documents in a common feature space that abstracts away from the specific sequence of words used in them. Probabilistic topic models reduce that feature space by annotating documents with thematic information, and over this low-dimensional latent space several algorithms have been proposed to perform document similarity search. However, text search engines are based on algorithms that use term matching to measure similarity among texts (e.g., TF-IDF, BM25), so multilingual texts must first be translated before they can be related. In large-scale scenarios, this requirement is difficult to meet due to its high computational and storage cost. The aim of this tutorial is to show the foundations and modern practical applications of knowledge-based and statistical methods for exploring large document corpora. It will first cover the techniques required for this purpose, including natural language processing tasks, approximate nearest-neighbour methods, clustering algorithms and probabilistic topic models, and will then describe how a combination of these techniques is being used in practical applications for browsing large multilingual document corpora without the need to translate texts. Participants will be involved in the entire process of creating the necessary resources to finally build a multilingual text search engine.
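
The core of that pipeline can be sketched in a few lines. The following minimal Python example (assuming scikit-learn, and not the tutorial's actual code) represents documents as topic distributions and then searches for neighbours in that low-dimensional space instead of matching terms; the toy corpus and parameter choices are illustrative only.

    # Minimal sketch: documents -> topic distributions -> similarity search
    # in the low-dimensional topic space, rather than by term matching.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.neighbors import NearestNeighbors

    docs = [
        "topic models annotate documents with thematic information",
        "nearest neighbour search finds similar documents quickly",
        "search engines rank documents by term matching such as BM25",
    ]

    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topics = lda.fit_transform(counts)  # one topic vector per document

    # Index the topic vectors; at scale, an approximate nearest-neighbour
    # index would replace this exact search.
    index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(topics)
    distances, neighbours = index.kneighbors(topics[:1])
    print(neighbours)  # documents most thematically similar to docs[0]

Because documents are compared through their topic distributions rather than their vocabulary, documents in different languages can in principle be related without prior translation, which is the property the tutorial exploits at scale.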