Tutorial 1: Reading the Web

The Web is inundated with information in unstructured text form. Machine Reading is a research area that brings together approaches from diverse research communities, such as Natural Language Processing, Information Extraction, Machine Learning and Data Mining. It focuses on the design of computer systems capable of reading natural language (unstructured) text and storing the extracted knowledge in knowledge bases (i.e., data stores specifically encoded to support subsequent machine reasoning, inference and decision making). Machine Reading algorithms and systems therefore aim to produce language-understanding technology that processes text automatically in reasonable time, making knowledge discovery from the Web feasible. This tutorial addresses the idea of automatically reading the Web using Machine Reading techniques. It begins with a brief overview of supervised, unsupervised and semi-supervised Machine Learning methods and algorithms. Next, three of the most successful Machine Reading approaches to reading the Web, namely the KnowItAll, YAGO and NELL systems, will be presented and discussed. For each approach, the principles, subtleties and current results will be addressed, its online resources will be explored, and its future directions will be pointed out.


Estevam R. Hruschka Jr.

Federal University of Sao Carlos, Brazil

estevam@dc.ufscar.br | http://www.dc.ufscar.br/~estevam