Theses and Dissertations
2019
ABSTRACTS
Departamento de Informática
Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio
Rio de Janeiro - Brazil
This file lists the M.Sc. dissertations and Ph.D. theses presented to the Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, Brazil, in 2019. All of them are available in print. Depending on each author's preference, some are freely available for download by anyone, while others are available for download only to the PUC-Rio community (*).
For any requests, questions, or suggestions, please contact:
Rosane Castilho
bib-di@inf.puc-rio.br
Last update: 22/MARCH/2019
[Under construction; digital versions may not yet be available for some entries]
[19_MSc_varela]
Guilherme Sant’Anna VARELA.
Anotação profunda de papéis semânticos para o Português.
[Title in English: Deep semantic role labeling for Portuguese].
M.Sc. Diss. Port. Presentation: 21/01/2019. 74 p. Advisor: Sérgio Colcher. DOI.
Abstract: We live in a complex world in which a myriad of seemingly
unrelated factors, such as Moore’s law (which states that the processing
capacity on a silicon wafer should increase exponentially), the fall of storage
costs, and the mass adoption of smartphones, contribute to the formation of an
increasingly interdependent society: 2.5 quintillion bytes of data are
generated every day, and ninety percent of the world’s data were created in
the last few years. Harnessing the patterns that emerge from the data,
effectively separating information from chaos, is crucial both for individual
decision making and for the survival of organizations. In this scenario,
the best answer from Natural Language Processing researchers is the task of
Semantic Role Labeling (SRL). SRL concerns itself with the audacious
goal of event understanding, which means determining ‘Who did what to whom?’,
‘Who was the beneficiary?’ or ‘What were the means to achieve some goal?’. SRL is
also an intermediate task for high-level applications such as information
extraction, question answering and chatbots. Traditionally, satisfactory
results were obtained only by introducing highly specific domain
knowledge. For Portuguese, this approach yields an F1 score of 79.6%.
Recent systems rely on a pipeline of sub-tasks, yielding an F1 score of 58%. In
this dissertation, we adopt a new paradigm for Brazilian Portuguese that uses
recurrent neural networks and does not rely on a pipeline; our system obtains a
score of 66.23%.
[19_MSc_brito]
Miguel Mendes de BRITO.
Aprendizado profundo aplicado à segmentação de texto.
[Title in English: Deep learning applied to text chunking].
M.Sc. Diss. Port. Presentation: 21/01/2019. 65 p. Advisor: Sérgio Colcher. DOI.
Abstract:
Natural Language Processing is a research field that explores how computers can
understand and manipulate natural language texts. Sequence tagging is amongst
the most well-known tasks in NLP. Text Chunking is one of the problems that can
be approached as a sequence tagging problem. Thus, we classify which words
belong to a chunk, where each chunk represents a disjoint group of syntactically
correlated words. This type of chunking has important applications in more
complex tasks of natural language processing, such as dependency parsing,
machine translation, semantic role labeling, clause identification and many
others. The goal of this work is to present a deep neural network architecture for
the Portuguese text chunking problem. The corpus used in the experiments is the
Bosque, from the Floresta Sintá(c)tica project. Building on recent
work in the field, our approach surpasses the state of the art for Portuguese,
achieving an Fβ=1 score of 90.51,
which corresponds to an increase of 2.56 over the previous work.
In addition, to assess the chunker's effectiveness, we use the tags
produced by our system as features for the dependency parsing task. These features
improved the accuracy of the parser by 0.87.
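
As a minimal sketch of how chunking can be framed as sequence tagging and scored, the Python example below assumes the widely used IOB scheme (B-X begins a chunk of type X, I-X continues it, O marks a word outside any chunk) and computes a chunk-level F score, which for β=1 is the harmonic mean of precision and recall. The tag labels, the example sequences and the function names are illustrative assumptions; they are not taken from the dissertation's code or from the Bosque corpus.

```python
from typing import List, Set, Tuple

# A chunk is (type, start index, end index exclusive).
Chunk = Tuple[str, int, int]


def iob_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Convert a sequence of IOB tags into a set of typed spans."""
    chunks: Set[Chunk] = set()
    start, kind = None, None
    # A trailing "O" acts as a sentinel that closes the last open chunk.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and kind == tag[2:]
        # An open chunk ends at "O", at any "B-", or at an "I-" of another type.
        if kind is not None and not continues:
            chunks.add((kind, start, i))
            start, kind = None, None
        # "B-" always opens a chunk; a stray "I-" with no open chunk does too.
        if tag.startswith("B-") or (tag.startswith("I-") and kind is None):
            start, kind = i, tag[2:]
    return chunks


def f_beta(gold: Set[Chunk], pred: Set[Chunk], beta: float = 1.0) -> float:
    """Chunk-level F score; a predicted chunk counts only on an exact match."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical five-word sentence with gold and predicted tags.
gold_tags = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
pred_tags = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]

gold_chunks = iob_to_chunks(gold_tags)
pred_chunks = iob_to_chunks(pred_tags)
score = f_beta(gold_chunks, pred_chunks)
print(f"F(beta=1) = {score:.4f}")
```

Here two of the three predicted chunks match the gold chunks exactly, so precision and recall are both 2/3. Scoring whole chunks rather than individual tags means a chunk with a wrong boundary earns no partial credit.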