ABSTRACT Informatics Department

Theses and Dissertations

2019

ABSTRACTS

Departamento de Informática
Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio
Rio de Janeiro - Brazil

This file lists the M.Sc. dissertations and Ph.D. theses presented to the Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, Brazil, in 2019. All of them are available in print. Depending on each author's preference, some are freely available for download, while others are available for download only to the PUC-Rio community (*).

For any requests, questions, or suggestions, please contact:
Rosane Castilho bib-di@inf.puc-rio.br

Last update: 22/MARCH/2019

INDEX

[Under construction; digital versions may not be available yet]

[19_MSc_varela]
Guilherme Sant’Anna VARELA. Anotação profunda de papéis semânticos para o Português. [Title in English: Deep semantic role labeling for Portuguese]. M.Sc. Diss. Port. Presentation: 21/01/2019. 74 p. Advisor: Sérgio Colcher. DOI.

Abstract: We live in a complex world in which a myriad of seemingly unrelated factors (such as Moore’s law, which states that the processing capacity on a silicon wafer should increase exponentially, the fall of storage costs, and the mass adoption of smartphones) contribute to the formation of an increasingly interdependent society: 2.5 quintillion bytes of data are generated every day, and ninety percent of the world’s data were created in the last few years. Harnessing the emerging patterns within the data, effectively separating information from chaos, is crucial both for individual decision making and for the survival of organizations. In this scenario, the best answer from Natural Language Processing researchers is the task of Semantic Role Labeling (SRL). SRL is the task that concerns itself with the audacious goal of event understanding, which means determining ‘Who did what to whom’, ‘Who was the beneficiary?’ or ‘What were the means to achieve some goal’. SRL is also an intermediary task for high-level applications such as information extraction, question answering, and chatbots. Traditionally, satisfactory results were obtained only by introducing highly specific domain knowledge. For Portuguese, this approach yields an F1 score of 79.6%. Recent systems rely on a pipeline of sub-tasks, yielding an F1 score of 58%. In this dissertation, we adopt a new paradigm using recurrent neural networks for Brazilian Portuguese that does not rely on a pipeline; our system obtains a score of 66.23%.

[19_MSc_brito]
Miguel Mendes de BRITO. Aprendizado profundo aplicado à segmentação de texto. [Title in English: Deep learning applied to text chunking]. M.Sc. Diss. Port. Presentation: 21/01/2019. 65 p. Advisor: Sérgio Colcher. DOI

Abstract: Natural Language Processing is a research field that explores how computers can understand and manipulate natural language texts. Sequence tagging is among the most well-known tasks in NLP. Text chunking is one of the problems that can be approached as a sequence tagging problem: we classify which words belong to a chunk, where each chunk represents a disjoint group of syntactically correlated words. This type of chunking has important applications in more complex natural language processing tasks, such as dependency parsing, machine translation, semantic role labeling, clause identification, and many more. The goal of this work is to present a deep neural network architecture for the Portuguese text chunking problem. The corpus used in the experiments is the Bosque, from the Floresta Sintá(c)tica project. Based on recent work in the field, our approach surpasses the state of the art for Portuguese by achieving an Fβ=1 of 90.51, which corresponds to an increase of 2.56 in comparison with the previous work. In addition, in order to attest to the chunker's effectiveness, we use the tags obtained by our system as features for the dependency parsing task. These features improved the accuracy of the parser by 0.87.
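
Editorial illustration (not taken from the dissertation): the abstract above treats chunking as sequence tagging and reports an Fβ=1 score. The sketch below shows one common way such a score can be computed at the chunk level. It assumes an IOB2-style tag scheme (`B-X` opens a chunk of type X, `I-X` continues it, `O` is outside any chunk); the tag names, the example sequences, and the function names `extract_chunks` and `chunk_f1` are hypothetical and are not the author's code or the Bosque tag set.

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) spans encoded by IOB2-style tags.

    `end` is exclusive. A stray I-X with no open chunk of type X is treated
    leniently as the start of a new chunk (an assumption of this sketch).
    """
    chunks = set()
    start, kind = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            chunks.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return chunks


def chunk_f1(gold_tags, pred_tags):
    """Chunk-level F1 (beta = 1): a predicted chunk counts as correct only
    if its boundaries and its type both match a gold chunk exactly."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token sentence: the prediction cuts the last chunk short.
gold = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]
pred = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
print(extract_chunks(gold))
print(chunk_f1(gold, pred))
```

In this example two of the three predicted chunks match the gold chunks exactly, so precision and recall are both 2/3 and the F1 is also 2/3. Partially overlapping chunks earn no credit under this exact-match convention.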