Horacio Saggion & Francesco Ronzano
DTIC / Universitat Pompeu Fabra
More than 2.5 million scientific papers are published every year. Nearly all of these publications are accessible online, and a portion of them (13%) is freely available as Open Access content. As a consequence, researchers, as well as any other interested actors, are nowadays overwhelmed by the sheer number of articles to consider, turning any activity that requires a careful and comprehensive assessment of the scientific literature into a considerably complex and time-consuming task. Automated approaches and tools that analyze scientific publications by extracting and aggregating relevant content are needed to cope with such a huge amount of papers.
Natural Language Processing technology represents a key enabling factor in providing scientists with intelligent ways to access scientific information. Extracting information from scientific papers, for example, can contribute to the development of rich scientific knowledge bases, which can be leveraged to support intelligent knowledge access and question answering. Summarization techniques can reduce long papers to their essential content or automatically generate state-of-the-art reviews. Paraphrase or textual entailment techniques can help identify relations across different scientific textual sources. This tutorial provides an overview of the most relevant tasks in the processing of scientific documents, including but not limited to in-depth analysis of the structure of scientific articles, their semantic interpretation, citation analysis, content extraction, and summarization.