Theses and Dissertations
2018
ABSTRACTS
Departamento de Informática
Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio
Rio de Janeiro - Brazil
This file contains the list of the M.Sc. dissertations and Ph.D. theses presented to the Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, Brazil, in 2018. They are all available in print format and, according to the authors' preference, some of them are freely available for download, while others are freely available for download to the PUC-Rio community exclusively (*).
For any requests, questions, or suggestions, please contact:
Rosane Castilho
bib-di@inf.puc-rio.br
Last update: 22/MARCH/2019
[Under construction; some digital versions may not be available yet]
Adrian CONCEPCION LEÓN.
Secure distributed ledgers to support IoT technologies data.
[Title in Portuguese: Ledgers seguros e distribuídos para
suportar dados de tecnologia IoT]. M.Sc. Diss. Eng. Presentation:
18/10/2018. 63 p. Advisor: Markus Endler. DOI.
Abstract:
Allan Werner SCHÖTTLER.
Visualização de fluxo em reservatórios de petróleo usando LIC volumétrico.
[Title in English: Visualizing flow in black-oil reservoirs using volumetric LIC]. M.Sc. Diss. Eng. Presentation:
14/09/2018. 45 p. Advisor: Waldemar Celes Filho. DOI.
Abstract: In the oil industry, clear and unambiguous visualization of vector
fields resulting from numerical simulations of black-oil reservoirs is
essential. In this dissertation, we study the use of line integral convolution
techniques (LIC) for imaging 3D steady vector fields and apply the results to a
GPU-based volume rendering algorithm. Due to the density of information present
in volume renderings of LIC images, we study the use of sparse textures as input
to the LIC algorithm and apply transfer functions to assign color and opacity to
scalar fields in order to encode visual information to voxels and alleviate the
occlusion problem. Additionally, we address the problem of encoding flow
orientation, inherent to LIC, using an extension of the algorithm – Oriented LIC
(OLIC). Finally, we present a method for volume animation in order to enhance
the flow orientation. We then compare results obtained with LIC and with OLIC.
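For illustration, a minimal 2D sketch of the core LIC idea (a naive CPU version with Euler streamline integration; the dissertation's GPU-based volumetric pipeline, sparse textures, transfer functions, and the OLIC extension are not shown):

    import numpy as np

    def lic2d(vx, vy, noise, L=15, h=0.5):
        # convolve a noise texture along streamlines of the field (vx, vy)
        H, W = noise.shape
        out = np.zeros_like(noise)
        for i in range(H):
            for j in range(W):
                acc, n = noise[i, j], 1
                for sign in (1.0, -1.0):              # integrate forward and backward
                    x, y = float(j), float(i)
                    for _ in range(L):
                        u, v = vx[int(y), int(x)], vy[int(y), int(x)]
                        norm = np.hypot(u, v)
                        if norm < 1e-9:
                            break
                        x += sign * h * u / norm      # Euler step along the streamline
                        y += sign * h * v / norm
                        if not (0 <= x < W and 0 <= y < H):
                            break
                        acc += noise[int(y), int(x)]
                        n += 1
                out[i, j] = acc / n                   # streaks align with the flow
        return out

    yy, xx = np.mgrid[0:128, 0:128].astype(float)     # circular test field
    img = lic2d(-(yy - 64), xx - 64, np.random.rand(128, 128))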
Andre Luis Cavalcanti BUENO.
Relaxamento adaptativo da sincronização através do
uso de métodos de aprendizagem supervisionada. [Title in
English: Adaptive relaxed synchronization through the use of supervised learning
methods].
Ph.D. Thesis. Port. Presentation: 07/03/2018. 80 p. Advisor: Noemi de La Rocque
Rodriguez; co-advisor: Eliza Dominguez Sotelino (Eng. Civil, PUC-Rio).
DOI
Abstract:
Parallel computing systems have become pervasive, being used to interact with
the physical world and to process large amounts of data from various sources.
Continuous improvement of computational performance is therefore essential to
keep up with the increasing rate at which information needs to be processed.
Some of these applications admit lower quality in the final result in exchange
for increased execution performance. This work evaluates the feasibility of
using supervised learning methods to ensure that the Relaxed Synchronization
technique, used to increase execution performance, provides results within
acceptable error limits. To do so, we created a methodology that uses some
input data to assemble test cases that, when executed, provide input values for
the training of supervised learning methods. This way, when the user runs
his/her application (in the same training environment) with a new input, the
trained classification algorithm suggests the relaxed synchronization factor
best suited to the application/input/execution-environment triple. We used this
methodology in some well-known parallel applications and showed that, by
combining Relaxed Synchronization with supervised learning methods, it was
possible to keep within the maximum established error rate. In addition, we
evaluated the performance gain obtained with this technique for a number of
scenarios in each application.
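A hypothetical sketch of the resulting classification step (the feature layout, values, and the choice of a decision tree are illustrative assumptions, not the thesis's actual methodology):

    from sklearn.tree import DecisionTreeClassifier

    # offline phase: test cases labeled with the relaxation factor that
    # stayed within the acceptable error bound (toy numbers)
    X_train = [[1000, 8], [50000, 8], [1000, 32], [50000, 32]]  # [input size, threads]
    y_train = [4, 2, 8, 4]                                      # best relaxation factor

    clf = DecisionTreeClassifier().fit(X_train, y_train)

    # online phase: suggest a factor for a new input in the same environment
    print(clf.predict([[20000, 16]]))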
Antonio Iyda PAGANELLI.
Reliability of Wii balance
board and Microsoft Kinect for capturing posturographic information during
balance tests.
[Title in Portuguese: Confiabilidade do Wii balance board e do Microsoft Kinect na
captura de informações posturográficas durante testes de equilíbrio]. M.Sc. Diss.
Eng. Presentation: 21/09/2018. 148 p. Advisor: Alberto Barbosa Raposo. DOI.
Abstract: Body balance is an important physical skill and is fundamental to
the health of the elderly, considering that falls are a major cause of
unintentional injuries leading to loss of autonomy and death in this group.
Given the aging of the world population and the fact that balance impairment is
one of the major causes of physiotherapy attendance, simple, affordable,
portable, and reliable devices for evaluating body balance are of great
relevance. Several studies have examined the concurrent validity and
reliability of the Microsoft Kinect (Kinect) and the Nintendo Wii Balance Board
(WBB) during balance tests. The majority of these studies suggested that those
devices could be used as reliable and valid tools for assessing balance in
semi-static positions. Based on that, this study investigated test-retest
reliability using the Kinect and the WBB, concurrently, in three standing
positions, and analyzed variables related to the center of pressure (CoP) and
the center of gravity (CoG), in static manikins and in 70 healthy subjects.
Each participant performed the set of tests twice on the same day. Our solution
demonstrated sensitivity in identifying different body sway patterns. Tests
showed that the most reliable variables were average speed and total path
length in all directions and tasks. Although tests with static manikins
signaled excellent reliability, tests with individuals were rated poor to good.
However, variables of consolidated data based on different tasks achieved
excellent scores. CoP properties outperformed those related to the CoG,
suggesting that the WBB was superior to the Kinect in providing reliable body
sway information. This study reinforced that these devices may provide reliable
quantitative information that enhances qualitative body balance assessments.
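For illustration, the two variables found most reliable can be computed from raw samples as follows (the data layout and sampling rate are assumptions):

    import numpy as np

    def sway_metrics(cop_xy, fs=100.0):
        # cop_xy: (N, 2) center-of-pressure samples in cm; fs: sampling rate in Hz
        steps = np.diff(cop_xy, axis=0)                    # per-sample displacement
        path_length = np.linalg.norm(steps, axis=1).sum()  # total path length (cm)
        duration = (len(cop_xy) - 1) / fs
        return path_length, path_length / duration         # total path, average speed

    cop = np.cumsum(np.random.randn(3000, 2) * 0.01, axis=0)  # 30 s of simulated sway
    print(sway_metrics(cop))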
Axelle Dany Juliette POCHET.
Modeling of
geobodies: AI
for seismic fault detection and all-quadrilateral mesh generation.
[Title in Portuguese: Modelagem de objetos geológicos: IA para detecção automática
de falhas e geração de malhas de quadriláteros]. Ph.D. Thesis. Eng. Presentation: 28/09/2018.
128 p. Advisor: Marcelo Gattass. DOI.
Abstract: Safe oil exploration requires good numerical modeling of the
subsurface geobodies, which includes, among other steps, seismic interpretation
and mesh generation. This thesis presents a study in these two areas. The first
study is a contribution to data interpretation, examining the possibilities of
automatic seismic fault detection using deep learning methods. In particular, we
use Convolutional Neural Networks (CNNs) on seismic amplitude maps, with the
particularity of using synthetic data for training with the goal of classifying
real data. In the second study, we propose a new two-dimensional
all-quadrilateral meshing algorithm for geomechanical domains, based on an
innovative quadtree approach: we define new subdivision patterns to efficiently
adapt the mesh to any input geometry. The resulting mesh is suited for Finite
Element Method (FEM) simulations.
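A hedged sketch of a patch-based fault classifier in PyTorch (patch size, channels, and layer sizes are assumptions for illustration, not the network trained in the thesis):

    import torch
    import torch.nn as nn

    # toy binary classifier for 1-channel 45x45 amplitude patches
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 11 * 11, 64), nn.ReLU(),
        nn.Linear(64, 2),                            # fault / no fault
    )

    patches = torch.randn(8, 1, 45, 45)              # stands in for synthetic patches
    loss = nn.CrossEntropyLoss()(model(patches), torch.randint(0, 2, (8,)))
    loss.backward()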
Bernardo de Campos Vidal CAMILO. Uma avaliação
experimental de hashing consistente com cargas limitadas na distribuição
de vídeos online.
[Title in English: An experimental evaluation of consistent hashing with bounded
loads in online video distribution]. M.Sc. Diss. Port. Presentation: 06/09/2018.
54 p. Advisor: Noemi de La Rocque Rodriguez. DOI.
Abstract: Video consumption accounts for a large part of Internet traffic
today and tends to increase further in the next years. In this work, we
investigate ways to improve caching in video content delivery networks (CDNs) to
reduce their response time and increase the users’ quality of experience. From
the analysis of different techniques, we concluded that consistent hashing with
bounded loads has interesting characteristics for this purpose and fits
adequately to the video delivery scenario. In order to verify its performance,
we created an experimentation platform and, using data from a real video CDN,
confronted it with the consistent hashing and the least connections balancing
method, all implemented in an equivalent manner to permit a fair comparison.
Lastly, we discussed the results of this evaluation, highlighting the benefits
and limitations of this technique in the considered context.
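A compact sketch of the core idea being evaluated, consistent hashing with bounded loads (simplified: one ring point per server and a capacity of ceil(avg_load * (1 + eps)) are assumptions of this toy version, not the dissertation's experimentation platform):

    import hashlib, math
    from bisect import bisect_right

    def h(s):  # stable hash onto the ring
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (1 << 64)

    class BoundedRing:
        def __init__(self, servers, eps=0.25):
            self.ring = sorted((h(s), s) for s in servers)
            self.load = {s: 0 for s in servers}
            self.eps, self.n = eps, 0

        def assign(self, key):
            cap = math.ceil((self.n + 1) / len(self.load) * (1 + self.eps))
            i = bisect_right(self.ring, (h(key), chr(0x10FFFF)))
            for k in range(len(self.ring)):          # walk clockwise past full servers
                s = self.ring[(i + k) % len(self.ring)][1]
                if self.load[s] < cap:
                    self.load[s] += 1
                    self.n += 1
                    return s

    ring = BoundedRing(["cache-a", "cache-b", "cache-c"])
    print([ring.assign("video-%d" % i) for i in range(6)])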
Caroline Rosa REDLICH.
Segmentação de imagens baseada em grafos de superpixel.
[Title in English: Image segmentation based on superpixel graphs]. M.Sc. Diss. Port. Presentation:
19/04/2018. 74 p. Advisor: Marcelo Gattass.
DOI
Abstract: Image segmentation for object modeling is a complex task that is
still not well solved. The separation of the regions corresponding to each
object in an image is based on proximity, similarity, and discontinuity of its
boundaries. The image to be segmented can be of various natures, including
photographs, medical and seismic images. We can find in the literature many
proposed segmentation methods used as solutions to different problems. Recently, the
superpixel technique has been used as an initial step that reduces the size of
the problem input. This work proposes a methodology of segmentation of
photographs and ultrasound images based on variants of superpixels. The proposed
methodology adapts to the image’s nature and to the problem’s complexity using
different measures of similarity and distance. This work also presents results
that seek to clarify the proposed procedure and the choice of its parameters.
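As an illustration of the general pipeline only (scikit-image's standard SLIC; the dissertation's superpixel variants and similarity/distance measures differ), the sketch below oversegments an image and builds the superpixel adjacency graph used in the segmentation step:

    import numpy as np
    from skimage import data, segmentation

    img = data.astronaut()
    labels = segmentation.slic(img, n_segments=200, compactness=10)

    # adjacency edges between touching superpixels (horizontal + vertical neighbors)
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]
    edges = np.unique(np.sort(pairs, axis=1), axis=0)
    print(len(np.unique(labels)), "superpixels,", len(edges), "adjacency edges")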
Cesar de Souza BOUÇAS. Análise de dependência baseada em transição aplicada a
Universal Dependencies. [Title in English: Transition based
dependency parsing applied on Universal Dependencies]. M.Sc. Diss.
Port. Presentation: 22/10/2018. 72 p. Advisor: Ruy Luiz Milidiú. DOI.
Abstract: Dependency parsing is the task of transforming a sentence into a
syntactic structure, usually a dependency tree, that represents relations
between words. These representations are useful for several tasks that arise
with the increasing volume of online textual information and the need for
technologies that depend on NLP tasks to work. They can be used, for example,
to enable computers to infer the meaning of words in multiple natural
languages. This work presents dependency parsing with focus on one of its most
popular machine learning formulations: the transition-based method. A greedy
implementation of this model with a simple neural network-based classifier is
used to perform experiments. Universal Dependencies treebanks are used to train
and then test the system using the validation script published in the CoNLL-2017
shared task. The results empirically indicate the benefits of initializing the
input layer of the network with word embeddings obtained through pre-training.
It reached 84.51 LAS on the Brazilian Portuguese test set and 75.19 LAS on the
English test set. This result is nearly 4 points behind the performance of the
best transition-based parsers.
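A minimal sketch of the transition-based idea (a greedy arc-standard system; the score function below stands in for the trained neural classifier, and the random policy shown is purely illustrative):

    import random

    def parse(n_words, score):
        # greedy arc-standard parsing over words 1..n_words (0 is the artificial root);
        # score(action, stack, buffer) plays the role of the neural classifier
        stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
        while buffer or len(stack) > 1:
            actions = ["SHIFT"] if buffer else []
            if len(stack) > 1:
                actions.append("RIGHT-ARC")       # top becomes dependent of second
                if stack[-2] != 0:
                    actions.append("LEFT-ARC")    # second becomes dependent of top
            act = max(actions, key=lambda a: score(a, stack, buffer))
            if act == "SHIFT":
                stack.append(buffer.pop(0))
            elif act == "LEFT-ARC":
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))     # (head, dependent)
            else:
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs

    print(parse(3, lambda a, s, b: random.random()))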
Daniel Alejandro MESEJO-LEÓN.
Approximate nearest neighbor search for the Kullback-Leibler divergence. [Title in Portuguese:
Busca Aproximada de vizinhos mais próximos para a divergência de Kullback-Leibler].
M.Sc. Diss. Eng. Presentation: 09/01/2018. 59 p. Advisor: Eduardo Sany Laber.
DOI
Abstract:
In a number of applications, data points can be represented as probability
distributions. For instance, documents can be represented as topic models,
images can be represented as histograms and also music can be represented as a
probability distribution. In this work, we address the Approximate Nearest
Neighbor problem where the points are probability distributions and the
distance function is the Kullback-Leibler (KL) divergence. We show how to
accelerate existing data structures, such as the Bregman Ball Tree, by posing
the KL divergence as an inner product embedding. On the practical side, we
investigated the use of two very popular indexing techniques: Inverted Index
and Locality Sensitive Hashing (LSH). Experiments performed on 6 real-world
datasets showed that the Inverted Index performs better than LSH and the
Bregman Ball Tree in terms of queries per second and precision.
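For reference, a brute-force baseline of the problem being accelerated (the KL divergence plus an exact linear scan, which the indexing structures above replace):

    import numpy as np

    def kl(p, q, eps=1e-12):
        # Kullback-Leibler divergence KL(p || q) between discrete distributions
        p, q = np.asarray(p) + eps, np.asarray(q) + eps
        return float(np.sum(p * np.log(p / q)))

    def nearest(query, corpus):
        # exact nearest neighbor under KL; the dissertation approximates this
        return min(range(len(corpus)), key=lambda i: kl(query, corpus[i]))

    docs = np.random.dirichlet(np.ones(10), size=100)  # 100 topic distributions
    print(nearest(np.random.dirichlet(np.ones(10)), docs))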
Daniel Specht Silva MENEZES. Reconhecimento de entidades
mencionadas para o português.
[Title in English: Named entity recognition for Portuguese]. M.Sc. Diss. Port. Presentation: 27/09/2018.
84 p. Advisor: Ruy Luiz Milidiú. DOI.
Abstract: The production and access of huge amounts of data is a pervasive
element of the Information Age. The volume of available data is without
precedent in human history and is in constant expansion. An opportunity that
emerges in this context is the development and usage of applications that are
capable of structuring the knowledge present in data. Natural Language
Processing fits in this context, being able to extract information efficiently
from textual data. A fundamental step towards this goal is the task of Named
Entity Recognition (NER), which delimits and categorizes mentions of entities.
The development of systems for NLP tasks must be accompanied by datasets
produced by humans in order to compare the system with human discernment for
the NLP task at hand. These datasets are a scarce resource whose construction
is costly in terms of human supervision. Recently, the NER task has been
approached using artificial neural network models, which need datasets for both
training and evaluation. In this work we propose the construction of a dataset
for Portuguese NER with an automatic approach using public data sources
structured according to the principles of the Semantic Web, namely DBpedia and
Wikipedia. A methodology for the construction of this dataset was developed,
and experiments were performed using both the built dataset and the neural
network architectures with the best reported results. Several setups for the
experiments were evaluated; we obtained preliminary results for diverse
hyperparameter values, also proposing architectures with the specific focus of
incorporating diverse data sources for training.
Diego Cedrin Gomes RÊGO. Understanding and improving batch
refactoring in software systems.
[Title in Portuguese: Entendendo e melhorando a prática de refatoração em lotes
em sistemas de software]. Ph.D. Thesis. Eng. Presentation:
28/09/2018. 168 p. Advisor: Alessandro Fabrício Garcia.
DOI.
Abstract: Code smells in a program represent indications of structural
quality problems, which can be addressed by software refactoring. However,
developers may neglect smells or end up creating new ones through single
refactorings. Little has been reported about recurring beneficial and harmful
effects of refactoring on program structural quality. As a consequence,
developers still miss guidance along non-trivial smell-removing tasks. In fact,
evidence suggests developers often need to apply a sequence of refactorings, a
so-called batch refactoring, to entirely remove a smelly code structure. Thus,
in this thesis, we conducted a series of studies to understand the impact
of single and batch refactorings on code smells. In our first studies, we
analyzed how often commonly-used types of single refactoring affect the density
of code smells along the version histories of dozens of projects. Even though
79.4% of the refactorings touched smelly elements, 57% had no impact on smell
removal. Surprisingly, only 9.7% of refactorings removed smells, while 33%
induced the introduction of new ones. On the one hand, we observed that harmful
refactoring-smell patterns could be used to guide developers to avoid
smell-inducing refactoring. On the other hand, we observed that many smells can
be removed only through batch refactoring. Thus, our last studies investigated
the impact of batch refactorings on smells. Even when applied in batches,
refactorings tend to maintain or even increase the density of code smells. We
also identified common batch-smell patterns, which enabled us to create
heuristics that can guide developers through smell-removing tasks. The last
study evaluated those heuristics, and we conclude that the outcomes are promising.
Felipe de Albuquerque Mello PEREIRA. A framework for
generating binary splits in decision trees. [Title in Portuguese:
Um framework para geração de splits binários em árvores de decisão].
M.Sc. Diss. Eng. Presentation: 09/03/2018. 55 p. Advisor: Eduardo Sany Laber.
DOI
Abstract: In this dissertation we propose a framework for designing
splitting criteria for handling multi-valued nominal attributes for decision
trees. Criteria derived from our framework can be implemented to run in
polynomial time in the number of classes and values, with theoretical guarantee
of producing a split that is close to the optimal one. We also present an
experimental study, using real datasets, where the running time and accuracy of
the methods obtained from the framework are evaluated.
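As a generic illustration of the kind of criterion such a framework covers (a classic strategy, optimal for two classes: sort the attribute's values by class proportion and scan prefix splits under the Gini impurity; this is not the dissertation's own criterion):

    def best_binary_split(counts):
        # counts: {value: (n_class0, n_class1)}; returns (left_values, weighted_gini)
        def gini(a, b):
            n = a + b
            return 0.0 if n == 0 else 1.0 - (a / n) ** 2 - (b / n) ** 2
        order = sorted(counts, key=lambda v: counts[v][1] / sum(counts[v]))
        tot0 = sum(c[0] for c in counts.values())
        tot1 = sum(c[1] for c in counts.values())
        best_vals, best_cost, l0, l1 = None, float("inf"), 0, 0
        for k in range(1, len(order)):                # try every prefix as left branch
            l0 += counts[order[k - 1]][0]
            l1 += counts[order[k - 1]][1]
            cost = (l0 + l1) * gini(l0, l1) \
                 + (tot0 - l0 + tot1 - l1) * gini(tot0 - l0, tot1 - l1)
            if cost < best_cost:
                best_vals, best_cost = set(order[:k]), cost
        return best_vals, best_cost

    print(best_binary_split({"red": (8, 2), "green": (3, 7), "blue": (5, 5)}))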
Fernando de Abreu Lima ALVES.
Estendendo o Luaproc: suporte para aplicações em ambientes móveis. [Title in
English:
Extending Luaproc: support for applications in a mobile environment].
M.Sc. Diss. Port. Presentation: 20/07/2018. 90 p. Advisor: Noemi de La Rocque
Rodriguez. DOI
Abstract: Mobile devices are undergoing constant increases in their
processing and memory capabilities. This tendency is making mobile processing an
interesting alternative. This work aims to support the programmer in exploring
this potential by using parallelism, both local, in the form of multicore
exploitation, as well as distributed, in the form of multidevice exploration. We
explored this through a parallel library for the Lua programming language,
called Luaproc. We propose an extension to this library and its communication
model to include this multidevice scenario and to combine the facilities of a
message queueing service with the existing facilities for multicore programming.
We then present some applications to show different use cases with distribution
and their performance.
Francisco Carvalho Guida MOTTA.
Uma técnica
semiautomática para a segmentação do feto em exames de ultrassom 3D.
[Title in English: A semiautomatic technique for the segmentation of the
fetus in 3D ultrasound exams].
M.Sc. Diss. Port. Presentation: 12/04/2018. 84 p. Advisor: Alberto Barbosa
Raposo. DOI.
Abstract: Ultrasound exams have an important role in obstetrics due to their
low cost, low risk, and real-time capabilities. The advent of three-dimensional
ultrasonography has made possible the use of the fetal volume as a biometric
measurement to monitor fetal development. The quantification of the fetal
volume requires a previous segmentation process, which consists in labelling
the pixels that belong to the object of interest in a digital image. There
isn't, however, a standard methodology for fetal volumetry, and most studies
rely on manual segmentations. The segmentation of ultrasound images is
particularly challenging due to the presence of artifacts such as speckle noise
and acoustic shadows, and the fact that the contrast between regions of
interest is commonly low. In this study, we developed and tested a
semiautomatic method for fetal segmentation in 3D ultrasound exams. Due to the
aforementioned difficulties, good ultrasound segmentation methods need to
exploit expected characteristics of the specific segmented structures. This
thought guided the development of our methodology, which, through a sequence of
simple steps, achieved good quantitative results in the segmentation task.
Gabriel André HOMSI.
Ship routing and speed optimization
with heterogeneous fuel consumption profiles.
[Title in Portuguese: Roteamento de navios e otimização de velocidade com
perfis de consumo de combustível heterogêneos].
M.Sc. Diss. Eng. Presentation: 08/02/2018. 67 p. Advisor: Thibaut Victor
Gaston Vidal. DOI
Abstract: The shipping industry is essential for international trade.
However, in the wake of the 2008 financial crisis, this industry was severely
hit. In these times, transportation companies can only obtain profit if their fleet
is routed effectively. In this work, we study a class of ship routing problems
related to the Pickup and Delivery Problem with Time Windows. To solve these
problems, we introduce a heuristic and an exact method. The heuristic method is
a hybrid metaheuristic with a set-partitioning-based large neighborhood, while
the exact method is a branch-and-price algorithm. We conduct experiments on a
benchmark suite based on real-life shipping segments. The results obtained show
that our algorithms largely outperform the state-of-the-art methodologies. Next,
we adapt the benchmark suite to model a ship routing problem where the speed on
each sailing leg is a decision variable, and fuel consumption per time unit is a
convex function of the ship speed and payload. To solve this new ship routing
problem with speed optimization, we extend our metaheuristic to find optimal
speed decisions on every local search move evaluation. Our computational
experiments demonstrate that such an approach can be highly profitable, with only a
moderate increase in computational effort.
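A toy sketch of the per-leg speed-optimization subproblem (the cubic fuel curve, prices, and speed bounds are assumptions for illustration; the thesis embeds such optimal speed decisions in every local-search move evaluation):

    from scipy.optimize import minimize_scalar

    def leg_cost(v, dist=100.0, fuel_price=500.0, time_value=2000.0, k=0.0008):
        # cost of sailing one leg at speed v (knots): cubic fuel burn + time cost
        hours = dist / v
        return fuel_price * k * v**3 * hours + time_value * hours

    res = minimize_scalar(leg_cost, bounds=(8.0, 18.0), method="bounded")
    print("optimal speed: %.2f knots, cost: %.0f" % (res.x, res.fun))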
Guilherme Augusto SCHÜTZ. A neural network for online portfolio selection
with side information.
[Title in Portuguese: Uma rede neural para o problema de seleção online de
portfólio com informação lateral]. M.Sc. Dissertation. Eng. Presentation: 01/08/2018.
65 p. Advisor: Ruy Luiz Milidiú.
DOI.
Abstract: The financial market is essential in the economy, bringing
stability, access to new types of investments, and increasing the ability of
companies to access credit. The constant search for reducing the role of human
specialists in decision making aims to reduce the risk inherent in the
intrinsic emotions of the human being, which the machine does not share; as a
consequence, speculative effects in the market are reduced and the precision of
the decisions taken increases. In this work, we discuss the problem of online
portfolio selection, where a vector of asset allocations is required at each
step. The proposed algorithm is the multilayer perceptron with side information
- MLPi. This algorithm uses neural networks to solve the problem when the
investor has access to future information on the price of the assets. To
evaluate the use of side information in portfolio selection, we empirically
tested MLPi against two algorithms, a baseline and the state-of-the-art. As a
baseline, buy-and-hold is used. The state-of-the-art is the online moving
average mean reversion algorithm proposed by Li & Hoi (2012). To evaluate the
use of side information in MLPi, we define a benchmark based on a simple
optimal solution that uses the side information but disregards the accuracy of
the future information. For the experiments, we use minute-level information
from the Brazilian stock market, traded on the B3 stock exchange. A price
predictor is simulated with 7 different accuracy levels for 200 portfolios. The
results show that both the benchmark and MLPi outperform the two selected
algorithms for accuracy levels greater than 50%, and, on average, MLPi
outperforms the benchmark at all levels of simulated accuracy.
Guilherme Gomes Felix da SILVA.
Formalização de algoritmos de criptografia em um assistente de provas interativo.
[Title in English: Formalization of cryptography algorithms in an interactive
theorem prover]. M.Sc. Diss. Port. Presentation: 28/08/2018. 70 p. Advisor:
Edward Hermann Haeusler. DOI.
Abstract: When describing a proof of a theorem, one must be cautious to
ensure said proof does not contain errors or inconsistencies. For very long
proofs, however, error detection can become humanly infeasible. A proof
assistant is a program whose purpose is to perform said error detection
efficiently, as well as to assist in the creation and comprehension of complex
proofs out of simpler, existing proofs. The Lean Theorem Prover, developed in
2012 by Leonardo de Moura, is a proof assistant which functions via description
of proofs in a compilable computer language. We present a description of proofs
of correctness of various algorithms pertaining to cryptography in the Lean
Theorem Prover.
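For flavor, a toy machine-checked proof in Lean 3 syntax (not taken from the dissertation) of a property underlying XOR-based ciphers:

    -- XOR-ing a bit with itself always yields ff (false)
    theorem bxor_self (b : bool) : bxor b b = ff :=
    by cases b; refl

    -- hence one-time-pad style encryption followed by decryption is the identity
    example (m k : bool) : bxor (bxor m k) k = m :=
    by cases m; cases k; refl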
Guilherme Gonçalves SCHARDONG. Visual interactive
support for selecting scenarios from time-series ensembles. [Title in
Portuguese: Uma abordagem visual e interativa para a seleção de conjuntos de
cenários temporais]. Ph.D. Thesis. Eng. Presentation: 13/09/2018. 92 p. Advisor:
Hélio Côrtes Vieira Lopes. DOI.
Abstract: Stochastic programming and scenario reduction approaches have
become invaluable in the analysis and behavior prediction of dynamic systems.
However, such techniques often fail to take advantage of the user’s own
expertise about the problem domain. This work provides visual interactive
support to assist users in solving the scenario reduction problem with time
series data. We employ a series of time-based visualization techniques linked
together to perform the task. By adapting a multidimensional projection
algorithm to handle temporal data, we can graphically present the evolution of
the ensemble. We also propose to use cumulative bump charts to visually compare
the ranks of distances between the ensemble time series and a baseline series.
To evaluate our approach, we developed a prototype application and conducted
observation studies with volunteer users of varying backgrounds and levels of
expertise. Our results indicate that a graphical approach to scenario reduction
may result in a good subset of scenarios and provides a valuable tool for data
exploration in this context. The users liked the interaction mechanisms provided
and judged the task to be easy to perform with the tools we have developed. We
tested the proposed approach against state-of-the-art techniques proposed in the
literature and used in the industry and obtained good results, thus indicating
that our approach is viable in a real-world scenario.
Isabella Vieira FERREIRA.
Assessing the bug-proneness of refactored code: longitudinal multi-project
studies.
Abstract: Programs often change along the system evolution, which implies an
eventual code structure degradation. Recurring symptoms of such degradation are
code smells. Studies suggest that the more frequently code smells affect a
system, the higher becomes the bug-proneness of the code elements. To tackle
code structural quality degradation, developers often apply refactorings on smelly program elements.
However, applying refactorings might not suffice to reduce the bug-proneness
of such degraded program elements. Previous empirical studies do not
systematically analyze the bug-proneness of refactored code. Even though a
recent study suggests that refactoring induces bugs frequently, the authors
do not analyze to what extent refactored code is indeed closely related to
the bug occurrence. Thus, in this dissertation, we conducted two
longitudinal multi-project studies to assess the bug proneness of refactored
code. Our methodology aimed to address various limitations of previous studies.
For instance, we have defined two complementary properties of the bug-proneness
of refactored code, i.e., frequency and distance. While the former
quantifies how often refactored code is related to emerging bugs,
the latter quantifies how close a bug emerges after a refactoring has been
applied. The quantitative analysis of such properties was complemented by a
manual analysis of refactorings closely related to the bug occurrence. Our
first study aims at assessing the bug-proneness of code refactored through
isolated refactorings, i.e., a single refactoring operation not performed in
conjunction with other refactoring operations. This study reveals that 80%
of the smelly elements that became buggy were not previously refactored.
This result suggests that refactored code is much less bug-prone than non-refactored
code. Moreover, in 75% of the cases, a bug emerges within 7 changes of the
refactoring operation; this amount of changes usually corresponds to 3
months in the analyzed projects. Our second study aims at assessing the
bug-proneness of code elements refactored through batch refactorings, i.e.,
a sequence of inter-related refactoring operations. Our results show that
code refactored through batches is often more resilient to the introduction
of bugs as compared to code refactored through isolated refactorings.
Jan José HURTADO JAUREGUI.
Detail-preserving mesh denoising using adaptive patches.
[Title in Portuguese: Remoção de ruído de malha com preservação de detalhe
usando vizinhanças adaptativas].
M.Sc. Diss. Eng. Presentation: 08/03/2018. 67 p. Advisor: Marcelo Gattass. DOI.
Abstract: The acquisition of triangular meshes typically introduces
undesired noise. Mesh denoising is a geometry processing task to remove this
kind of distortion. To preserve the geometric fidelity of the desired mesh, a
mesh denoising algorithm must preserve true details while removing artificial
high-frequencies from the surface. Several algorithms were proposed to address
this problem using a bilateral filtering scheme. In this work, we propose a
two-step algorithm which uses adaptive patches and bilateral filtering to denoise
the normal field, and then updates vertex positions by fitting the faces to the
denoised normals. The computation of the adaptive patches is our main
contribution. We formulate this computation as local quadratic optimization
problems that we can control to obtain a desired behavior of the patch. We
compared our proposal with several algorithms proposed in the literature using
synthetic and real data.
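A rough sketch of the first of the two steps (generic bilateral filtering of face normals over fixed neighborhoods; the dissertation's adaptive patches replace these neighborhoods, and the vertex-update step is omitted):

    import numpy as np

    def filter_normals(normals, centroids, neighbors, sigma_s=0.5, sigma_r=0.3):
        # one bilateral pass over face normals; neighbors[i] lists adjacent face ids
        out = np.empty_like(normals)
        for i, nbrs in enumerate(neighbors):
            acc = normals[i].copy()
            for j in nbrs:
                ws = np.exp(-np.sum((centroids[i] - centroids[j]) ** 2) / (2 * sigma_s ** 2))
                wr = np.exp(-np.sum((normals[i] - normals[j]) ** 2) / (2 * sigma_r ** 2))
                acc += ws * wr * normals[j]          # similar normals weigh more
            out[i] = acc / np.linalg.norm(acc)       # renormalize to unit length
        return out

    normals = np.array([[0, 0, 1.0], [0.05, 0, 1.0], [0.7, 0, 0.7]])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    centroids = np.array([[0, 0, 0.0], [1, 0, 0], [2, 0, 0]])
    print(filter_normals(normals, centroids, [[1], [0, 2], [1]]))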
Jéferson Rômulo Pereira COELHO.
Uma
metodologia baseada
em otimização quadrática para geração de malhas geomecânicas de reservatórios.
[Title in English: A quadratic optimization approach for the reservoir
geomechanical mesh generation].
Ph.D. Thesis. Port. Presentation: 16/04/2018. 121 p. Advisor: Marcelo
Gattass. DOI
Abstract: Geomechanical mesh generation of complex reservoirs remains a
tedious task prone to errors. Recently proposed solutions based on analytical
reconstruction of the sub-surfaces are not capable of representing all the
geometric details of natural objects. This work proposes a discrete model where
the mesh vertices are positioned based on a convex quadratic optimization
process. The optimization problem seeks to guarantee smooth meshes that conform
with prescribed constraints. The resulting mesh therefore respects, as far as
possible, the finite volume mesh of the reservoir pay zone and the existing
horizons. Finally, the proposed methodology for geomechanical meshes can be
easily extended to model sub-surfaces present in the structural interpretation
and geological restoration.
Jefry SASTRE PEREZ.
An agent-based software framework
for machine learning tuning.
[Title in Portuguese: Um framework baseado em agentes para a calibragem de
modelos de aprendizado de máquina].
M.Sc. Diss. Eng. Presentation: 22/03/2018. 55 p. Advisor: Carlos José
Pereira de Lucena. DOI.
Abstract: Nowadays, the challenge of knowledge discovery is to mine massive
amounts of data available online. The most widely used approaches to tackle that
challenge are based on machine learning techniques. In spite of being very
powerful, those techniques require their parameters to be calibrated in order to
generate models with better quality. Such calibration processes are
time-consuming and rely on the skills of machine learning experts. Within this
context, this research presents a framework based on software agents for
automating the calibration of machine learning models. This approach integrates
concepts from Agent Oriented Software Engineering (AOSE) and Machine Learning
(ML). As a proof of concept, we first train a model for the Iris dataset and
then we show how our approach improves the quality of new models generated by
our framework. Then, we create instances of the framework to generate models for
a medical images dataset and finally we use the Grid Sector dataset for a final
experiment.
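For illustration, the underlying calibration task that the agents automate can be written as a plain grid search on the Iris dataset with scikit-learn (the agent layer itself is not shown, and the SVC model and parameter grid are assumptions):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}   # calibration space
    search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))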
Jerônimo Sirotheau de Almeida EICHLER.
Exploring RDF knowledge bases through serendipity patterns. [Title in Portuguese: Explorando bases de conhecimento em RDF através de
padrões de fortuidade]. Ph.D. Thesis. Eng. Presentation: 21/08/2018. 72 p. Advisor: Marco
Antonio Casanova. DOI.
Abstract:
Serendipity is defined as the discovery of a thing when one is not searching for
it. In other words, serendipity means the discovery of information that provides
valuable insights by unveiling unanticipated knowledge. The topic is
receiving increased attention in the literature, since the precision requirement
may be justifiably relaxed in order to improve user satisfaction. A field that
can benefit from serendipity is the Web of Data, an immense global data space
where data is publicly available. As more and more data become available in this
data space, searching and extracting relevant information becomes a challenging
task. This thesis contributes to addressing this challenge in two ways. First,
it presents a query orchestration process that introduces three strategies to
inject serendipity patterns in the query process. The serendipity patterns are
inspired by basic characteristics of serendipitous events, such as analogy and
disturbance, and can be used for augmenting the results with additional
information, suggesting alternative queries or rebalancing the results. Second,
it introduces a benchmark dataset that can be used to compare different
approaches for locating serendipitous content. The strategy adopted for
constructing the dataset consists of dividing the dataset into partitions based
on a global feature and linking entities from different partitions according to
the number of paths they share.
João Paulo Forny de MELO.
Predicting trends in the stock
market.
[Title in Portuguese: Predizendo tendências na bolsa de valores].
M.Sc. Diss. Eng. Presentation: 27/02/2018. 54 p. Advisor: Ruy Luiz Milidiú. DOI
Abstract: Investors are always looking for an edge. However, traditional
economic theories tell us that trying to predict short-term stock price
movements is wasted effort, since they approximate a random walk, i.e., a
stochastic or random process. Besides, these theories state that the market is
efficient enough to always incorporate and reflect all relevant information, making
it impossible to "beat the market". In recent years, with the growth of the web
and data availability in conjunction with advances in Machine Learning, a number
of works are using Natural Language Processing to predict share price variations
based on financial news and social network data. Therefore, strong evidence is
surfacing that the market can, to some extent, be predicted. This work describes
the development of an application based on Machine Learning to predict trends in
the stock market, i.e., positive, negative or neutral price variations with
minute granularity. We evaluate our system using B3 (Brasil Bolsa Balcão),
formerly BM&FBOVESPA, stock quotes data, and a dataset with the most relevant
topics of Google Search and its related articles, provided by the Google Trends
platform and collected, minute by minute, from 08/15/2016 to 07/10/2017. The
experiments show that this data provides useful information to the task at hand,
in which we achieve 69.24% accuracy predicting trends for the PETR4 stock,
creating some leverage to make profits possible with intraday trading.
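A small sketch of the prediction target (labels derived by thresholding minute returns; the threshold and data are assumptions, not the dissertation's exact labeling scheme):

    import numpy as np

    def trend_labels(prices, eps=0.0005):
        # label each minute-to-minute return: 1 positive, 0 neutral, -1 negative
        r = np.diff(prices) / prices[:-1]
        return np.where(r > eps, 1, np.where(r < -eps, -1, 0))

    quotes = np.array([15.30, 15.32, 15.32, 15.27, 15.29])  # toy minute closes
    print(trend_labels(quotes))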
Leonardo da Silva SOUSA. Understanding how developers
identify design problems in practice.
[Title in Portuguese: Entendendo como os desenvolvedores identificam problemas
de projeto na prática]. Ph.D. Thesis. Eng. Presentation: 30/08/2018.
210 p. Advisors: Alessandro Fabricio Garcia and Carlos José Pereira de Lucena. DOI.
Abstract: A design problem is the manifestation of one or more inappropriate
design decisions that negatively impact non-functional requirements. For
example, the Fat Interface, a problem that indicates when an interface exposes
non-cohesive services, hampers the extensibility and maintainability of a
software system. Despite its harmfulness, identifying a design problem in a
system is difficult, especially when the source code is the only available
artifact. Although researchers have been investigating techniques to help
developers in identifying design problems, there is little or no knowledge about
the process of identifying design problems. For instance, code smells,
microstructures that are a surface indication of design problems, have been used
in several techniques to support developers during the design problem identification.
However, it is not known whether code smells suffice to help developers
identify design problems. In particular, no study has tried to understand how
developers identify design problems in practice. Thus, in this thesis, we have
conducted a series of studies to understand design problem identification. In our
first two studies, we investigated the role that code smells play in supporting
developers during the design problem identification. Our results indicate that
code smells are relevant for developers in practice; for instance, they are
relevant to indicate elements that need to be refactored. However, we found that
code smells, despite their relevance, do not suffice in helping developers to
identify design problems. In this vein, we conducted another study to
investigate what indicators developers use in practice, and how they use them.
This study resulted in a theory about how developers identify design problems in
practice. For instance, the theory reveals the indicators that developers use,
how they use these indicators, and the characteristics of such indicators that
are perceived as helpful by developers. The results of our studies provided us
with a hitherto nonexistent understanding of the process of identifying design
problems. Moreover, our findings pave the way for the
elaboration of more effective techniques to identify design problems in the
source code.
Luis Gustavo ALMEIDA.
ALUMNI Tool: recuperação de
dados pessoais na Web em redes sociais autenticadas.
[Title in English: ALUMNI Tool: information recovery of personal data on
the Web in authenticated social networks]. M.Sc. Diss. Port. Presentation: 31/01/2018.
123 p. Advisor: Marco Antonio Casanova. DOI
Abstract: The use of search bots to collect information for a given context
has grown substantially in recent years. For example, search bots may be used to
capture data from professional social networks. In particular, such social
networks facilitate studying the professional trajectory of the alumni of a
given university, and answer several questions such as: how long does a former
student of PUC-Rio take to arrive at a management position? However, a common
problem in this scenario is the inability to collect information due to
authentication systems, preventing a search robot from accessing certain pages
and content. This dissertation addresses a solution to capture data, which
circumvents the authentication problem and automates the data collection
process. The proposed solution collects data from user profiles for later
database storage and analysis. The dissertation also contemplates the
possibility of adding several other data sources, with emphasis on a data
warehouse structure.
Luiz Guilherme de Oliveira PITTA.
Uma abordagem para o
problema de conectividade em plataformas multilaterais de IoT.
[Title in English: An approach to the connectivity problem in multilateral
IoT platforms]. M.Sc. Diss. Port. Presentation: 28/03/2018. 86 p. Advisor:
Markus Endler. DOI
Abstract: The popularization of the Internet of Things (IoT) opened up a
series of opportunities for the generation of new applications that were not
previously possible. In the current scenario of IoT there are marketplaces that
sell complete solutions for users with smart objects, gateways for data
transmission and providers that analyze these for a subscription fee. We start
from the view that in the future an "uberization" of IoT should occur, where
each person can offer sensor data and access to actuators to another and that
they will be categorized based on the QoS of the objects that provide them,
similarly to how commodities are classified today. In addition, there will be
multilateral platforms where this information can be negotiated in combination
with connectivity providers, to transmit data, and analytics. A platform that
provides this service must ensure that the data (and state) flow of objects is
continuous, without exposing to the user any connectivity problems between them
and the providers. That is, it must have mechanisms to detect problems and
quickly select new providers, all this in a scenario of intense data exchange.
This work presents as contributions a continuous connectivity problem detection
mechanism that uses a Publish-Subscribe paradigm to send problem identification
messages and an architectural solution of a platform based on marketplaces
concepts for IoT, which includes the "commoditization" of service providers and
a matchmaking service to select a combination of these to provide services to
the customer. A case study in the domain of marketplaces is conducted, with the
analysis of the platform services under several test scenarios and the
evaluation of the connectivity problem detection mechanism through the
simulation of different connection failures.
Marcelo Gomes de SOUZA.
Inversão sísmica acústica determinística utilizando redes neurais artificiais.
[Title in English: Deterministic acoustic
seismic inversion using artificial neural networks].
M.Sc. Diss. Port. Presentation: 18/04/2018. 69 p. Advisor: Marcelo Gattass.
DOI
Abstract: Seismic inversion is the process of transforming Reflection
Seismic data into quantitative values of petroleum rock properties. These
values, in turn, can be correlated with other properties, helping geoscientists to make a
better interpretation that results in a good characterization of an oil
reservoir. There are several traditional algorithms for Seismic Inversion. In
this work we review Color Inversion (Relative Impedance), Recursive Inversion,
Band-width Inversion and Model-Based Inversion. All four of these algorithms are
based on digital signal processing and optimization. The present work seeks to
reproduce the results of these algorithms through a simple and efficient
methodology based on Neural Networks and pseudo-impedance. This work presents an
implementation of the algorithms proposed in the methodology and tests its
validity on public seismic data that has an inversion made by the traditional
methods.
Mayra Carvalho ALBUQUERQUE.
Matheuristics for variants of the dominating set problem.
[Title in Portuguese: Metaheurísticas para variantes do problema do conjunto
dominante]. Ph.D. Thesis. Eng. Presentation: 08/02/2018. 87 p. Advisor: Thibaut
Victor Gaston Vidal. DOI.
Abstract: This thesis addresses the Dominating Set Problem, an NP-hard problem
with great relevance in applications related to wireless network design, data
mining, coding theory, among others. The minimum dominating set in a graph is a
minimal set of vertices such that each vertex of the graph belongs to it or is
adjacent to a vertex of this set. We study three variants of the problem:
first, in the presence of weights on vertices, searching for a dominating set
with smallest total weight; second, a variant where the subgraph induced by the
dominating set needs to be connected; and, finally, the variant that
encompasses these two characteristics. To solve these three problems, we
propose a hybrid algorithm based on tabu search with additional
mathematical-programming components, leading to a method sometimes called a
"matheuristic". Several additional techniques and large neighborhoods are also
employed to reach promising regions in the search space. Our experimental
analyses show the good contribution of all these individual components.
Finally, the algorithm is tested on the covering code problem, which can be
viewed as a special case of the minimum dominating set problem. The codes are
studied for the Hamming metric and the Rosenbloom-Tsfasman metric. For this
last case, several shorter codes were found.
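For contrast, a plain greedy baseline for the weighted variant (a well-known heuristic, far simpler than the tabu-search matheuristic of the thesis):

    def greedy_weighted_dominating_set(adj, w):
        # adj: {v: set(neighbors)}, w: {v: weight > 0}; greedy min-weight dominating set
        undominated, chosen = set(adj), set()
        while undominated:
            # pick the vertex with the best newly-dominated-per-weight ratio
            v = max(adj, key=lambda u: len(({u} | adj[u]) & undominated) / w[u])
            chosen.add(v)
            undominated -= {v} | adj[v]
        return chosen

    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
    print(greedy_weighted_dominating_set(graph, {v: 1.0 for v in graph}))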
Olouyèmi Ilahko Anne BÉNÉDICTE AGBACHI.
Identifying design problems with a visualization approach of smell
agglomerations.
[Title in Portuguese: Identificando problemas de design através de uma abordagem
de visualização para aglomerações de anomalias de código]. M.Sc. Diss. Eng.
Presentation: 13/04/2018. 100 p. Advisor: Alessandro Fabricio Garcia. DOI.
Abstract: Design problems are characterized by violations of design
principles affecting a software system. Because they often hinder software
maintenance, developers should identify and eliminate design problems whenever
possible. Nevertheless, identifying design problems is far from trivial. Due to
outdated and scarce design documentation, developers often have to analyze
the source code for identifying these problems. Past studies suggest that code
smells are useful hints of design problems. However, recent studies show that a
single code smell might not suffice to reveal a design problem. That is, around
80% of design problems are realized by multiple code smells, which interrelate
in the so-called smell agglomerations. Thus, developers can explore each smell
agglomeration to identify a design problem in the source code. However, certain
smell agglomerations are formed by several code smells, which makes it hard
to reason about the existence of a design problem. Visualization approaches have
been proposed to represent smell agglomerations and guide developers in
identifying design problems. However, those approaches provide a very limited
support to the identification of specific design problems, especially the ones affecting
multiple design elements. This dissertation aims to address this limitation by
proposing a novel approach for the visualization of smell agglomerations. We
rely on evidence collected from multiple empirical studies to design our
approach. We evaluate our approach with developers from both academy and
industry. Our results suggest that various developers could use our
visualization approach to accurately identify design problems, in particular
those affecting multiple program elements. Our results also point out to different
ways for improving our visualization approach based on the developers’
perceptions.
Paulo Ivson SANTOS NETTO.
Information
visualization for managing large-scale engineering projects.
[Title in Portuguese: Visualização de informação para gestão de grandes projetos
de engenharia]. Ph.D. Thesis. Eng. Presentation: 13/04/2018. 61 p. Advisor:
Waldemar Celes Filho.
Abstract: Large-scale engineering projects such as buildings and city
infrastructure require millions in investments and tight coordination between
expert teams across several years of design, construction, and operation. To
tackle these challenges, the Architecture Engineering and Construction (AEC)
industry is actively developing methods and tools based on Building Information
Modeling (BIM). BIM promotes the use of 3D CAD models as a centralized database
for all physical and functional characteristics of a facility and its related
project/life-cycle information. The inherent complexity of a BIM model offers a
critical visualization challenge: how to best display relevant information
required by different engineering analyses? This work contributes to answering
this question through both theoretical and practical approaches. The thesis
first presents a systematic literature review on the current state of
information visualization (VIS) in BIM research. The review analyzes in detail
currently employed visualizations in diverse use cases across an engineering
project’s life cycle. Based on these findings, the thesis describes the design
and evaluation of a novel 4D construction planning system that overcomes many
limitations of previous work. Engineering collaborators used the software to
review the real-world construction plans of an Oil & Gas industrial plant. The
developed visualizations made evident schedule uncertainties, workspace
conflicts and other constructability issues. The thesis contributes to BIM
research with important visualization guidelines and also contributes to VIS
research by raising awareness of interesting challenges in an increasingly
relevant engineering domain.
Pedro Mendonça Pinto ROCHA.
Melhoria de tempo na execução de workflows científicos distribuídos baseada na
localização informada de arquivos.
[Title in English: Lowering the execution time of scientific distributed
workflows based on informed file location]. M.Sc. Diss. Port. Presentation: 25/04/2018.
50 p. Advisor: Noemi de La Rocque Rodriguez. DOI
Abstract: For distributed scientific workflows, the main method of sharing
data between the execution nodes is through files. When those files are large,
a substantial portion of the workflow's execution time is spent transferring
the files between the storage server and the execution nodes. This work
proposes a strategy for transferring the files directly between the execution
nodes, anticipating the requirements of the next step of the workflow and
lowering the overhead of transferring the files to and from the storage server.
This dissertation analyzes scenarios in which this strategy proves to be
advantageous and scenarios in which it does not.
Pedro Elkind VELMOVITSKY. iBot: An agent-based
software framework for creating domain conversational agents.
[Title in Portuguese: iBot: um framework baseado em agentes para criar agentes
conversacionais em diferentes domínios]. M.Sc. Diss. Eng. Presentation: 05/07/2018.
70 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Chatbots are computer programs that interact with users using
natural language. Since its inception, the technology has advanced greatly and
cloud-based platforms from big companies allow developers to create intelligent
and efficient chatbots. However, there are not many development approaches to
the main modules of a chatbot that are flexible enough to allow the creation of
different chatbots for each domain, while maintaining a robust dialogue control
in the application. There have been some works that try to develop a more
flexible approach, each of them with their own advantages and disadvantages. One
of the most notable advantages is the use of multi-agent systems to distribute
and perform the tasks performed by the chatbot. In this context, this work
proposes a general and flexible architecture based on multi-agent systems for
building chatbots in any domain chosen by the developer, with dialogue control
in the application. This architecture uses an adaptation of the information
state approach, also using software agents, to perform dialogue management. To
validate the proposed architecture, a user scenario involving the
implementation of 4 proof-of-concept chatbots is analyzed and discussed.
Pedro Henrique Bandeira DINIZ. Detection of
regions of white matter lesions of the brain in T1 and flair images.
[Title in Portuguese: Detecção de regiões de lesões na substância
branca do cérebro em imagens T1 e FLAIR]. Ph.D. Thesis. Eng. Presentation:
08/05/2018. 98 p. Advisors: Marcelo Gattass and Aristófanes Corrêa Silva (UFMA). DOI.
Abstract: White matter lesions are non-static brain lesions
that have a prevalence rate of up to 98% in the elderly population, although
they are also present in the young population. Because they may be associated with several
brain diseases, it is important to detect them as early as possible. Magnetic
resonance imaging provides three-dimensional data for visualization and analysis
of soft tissues as it contains rich information about their anatomy. However,
the amount of data acquired for these images may be too much for manual
analysis/interpretation alone, representing a difficult and time-consuming task
for specialists. Therefore, this doctoral thesis presents four new computational
methods to automatically detect white matter lesions in magnetic resonance
images, based mainly on algorithms SLIC0 and Convolutional Neural Networks. Our
primary objective is to provide the necessary tools for specialists to
accelerate their works and suggest a second opinion. From the four proposed
methods, the one that achieved best results was applied on 91 magnetic resonance
images, and achieved an accuracy of 97.93%, specificity of 98.02% and
sensitivity of 90.12%, without using any candidate reduction techniques.
Pedro Igor Profírio SAMPAIO. A study on pervasive games based on the Internet
of Mobile Things.
[Title in Portuguese: Um estudo sobre jogos pervasivos baseados na Internet das
Coisas Móveis]. M.Sc. Dissertation. Eng. Presentation: 02/10/2018. 119 p. Advisors:
Bruno Feijó and Markus Endler.
DOI.
Abstract: Mobile pervasive games are a game genre that combines the real and
virtual worlds in a hybrid space, allowing interactions with not only the
virtually created game world, but also with the physical environment that
surrounds the players. The Internet of Mobile Things (IoMT) specifies situations
in which devices on the Internet of Things (IoT) can be moved or move
autonomously, while maintaining remote connectivity and accessibility from
anywhere on the internet. Following the huge success of recent mobile pervasive
games and the coming IoT boom, we provide an integration for all the technology
involved in the development of a mobile pervasive game that incorporates IoT
devices. We also propose a mobile pervasive game that evaluates the benefits of
the union of both fields. This game prototype explores ways of increasing the
experience of players through pervasive mechanics while taking advantage of the
player’s motivation to perform sensing tasks. It also incorporates serious
applications into the gameplay, such as the localization of facilities and
services.
Priscila ENGIEL.
Eunomia (Εúνομία): a requirement
engineering based compliance framework for software systems.
[Title in Portuguese: Eunomia (Εúνομία): um
framework de conformidade contínua para sistemas de software baseado na Engenharia de
Requisitos]. Ph.D. Thesis. Eng. Presentation: 07/02/2018. 141 p. Advisors: Julio
Cesar Sampaio do Prado Leite and John Mylopoulos. DOI.
Abstract: Laws and regulations affect software development, as they frequently
demand changes in software requirements to protect individuals and businesses
regarding security, privacy, governance, sustainability, and more. Legal
requirements can dictate new requirements or constrain existing ones. The
problem of software compliance is how to ensure that the software complies with
the norms that the legislation imposes. The problem is particularly challenging
because it combines difficult steps: 1) analyze legal documents, 2) extract
requirements from those documents, 3) identify conflicting requirements with
those already implemented in software, and 4) ensure that the software remains
compliant even with changes. Compliance is a continuous process: laws,
software, and the context within which the software system operates change
continuously. Works dealing with the compliance problem focus only on one or
two subjects: analyzing legal documents, extracting requirements, identifying
conflicts, or handling changes. This thesis deals with all these problems at
the same time; the idea is to extract requirements from legal text, compare
them with the software requirements, resolve the possible conflicts that may
arise, and continuously deal with changes in the environment, laws, and
requirements. To this end, this work proposes a framework that is composed of a
compliance process and continuous monitoring of environmental changes. The
framework deals with different types of laws (security, privacy, transparency,
health care) that are represented in explicit norms. The compliance process
supports identification, extraction, comparison, and conflict resolution to
help software compliance, by producing a compliant set of requirements. The
compliance process is based on semantic annotation and a goal model. The
semantic annotation helps to extract requirements from the law, using patterns.
The goal model is used to help the comparison between requirements and to
represent requirements in a formal and consistent requirement specification.
The process is tool-supported; some tools were reused (Desiree and NomosT) to
support each step. It was necessary to adapt the tools to the context of the
compliance process, creating a guideline, patterns, and heuristics. The
continuous monitoring is concerned with the changes that affect software
compliance and has mechanisms to ensure that, even with those changes, the
software will regain compliance. The compliance monitor is based on agents and
Non-Functional Requirements. The agents are represented using i*; the idea is
to show the collaboration between the agents to ensure continuous compliance.
The requirement specification of how each agent should behave was also
generated using Business Process Modeling Notation and the Desiree language.
The Non-Functional Requirements catalogue is used to help define
operationalizations for the software awareness. The framework validation was
made in two parts: first, the compliance process and then the entire proposed
framework. For the compliance process, effort and correctness were measured by
comparing the use of the proposed process and an ad hoc method. For the entire
framework, the example of monitoring the changes in the environment when an
automated car is crossing the border between Washington and Canada was used.
The study shows that context has a strong influence on the software
requirements, and nonconformity problems may incur penalties. The contribution
of this work is the Eunomia framework, which has a process and goal model
perspective with emphasis on monitoring that helps to deal with the compliance
challenge. The framework equips the requirements engineering team with a
systematic method. The Eunomia framework is a tool-supported and systematic
process that can be reused to reduce time and effort and to improve the quality
of the requirement specification, helping to create a software requirement
specification that remains compliant over time.
Raphael Araújo SAMPAIO. A study on ellipsoidal
clustering.
[Title in Portuguese: Um estudo sobre agrupamento baseado em distribuições
elípticas]. M.Sc. Diss. Port. Presentation: 24/03/2018.
67 p. Advisors: Marcus Vinicius Soledade Poggi de Aragão and Thibaut Victor Gaston Vidal. DOI.
Abstract: Unsupervised cluster analysis, the process of grouping sets of
points according to one or more similarity criteria, plays an essential role in
various fields. The two most popular algorithms for this process are k-means
and Gaussian Mixture Models (GMM). The former assigns each point to a
single cluster and uses the Euclidean distance as similarity measure. The latter
determines a probability matrix of points belonging to clusters, with the
Mahalanobis distance as the underlying similarity. Apart from the difference in
the assignment method - the so-called hard assignment for the former and soft
assignment for the latter - the algorithms also differ concerning the cluster
structure, or shape: k-means considers spherical structures in the data,
while the GMM considers ellipsoidal ones through the estimation of covariance
matrices. In this work, a mathematical optimization problem that combines
hard assignment with ellipsoidal cluster structures is studied, and
regularization techniques for covariance estimation are explored. In this
context, two meta-heuristic methods, a Random Swap perturbation and a hybrid
genetic algorithm, are adapted, and their impact on the performance of the
method is studied. The central objective is three-fold: to gain an understanding
of the conditions in which ellipsoidal clustering structures are more beneficial
than spherical ones; to determine the impact of covariance estimation with
regularization methods; and to analyze the effect of global optimization
meta-heuristics on unsupervised cluster analysis. Finally, in order to provide
grounds for comparing the present findings to future related work, a database
was generated together with an extensive benchmark containing an analysis of
variations in size, shape, number of clusters, and separability and their impact
on the results of different clustering algorithms. Furthermore, packages written
in the Julia language have been made available with the algorithms studied
throughout this work.
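
As a concrete illustration of the contrast described above, the following sketch
implements a hard-assignment clustering loop under the Mahalanobis distance,
with per-cluster covariance matrices regularized by shrinkage toward the
identity. It is a minimal toy in Python, not the method studied in the
dissertation; the shrinkage weight and the omission of the log-determinant
likelihood term are simplifying assumptions.

    import numpy as np

    def ellipsoidal_kmeans(X, k, n_iter=50, shrink=0.1, seed=0):
        # Hard assignment under squared Mahalanobis distance; covariances
        # are regularized by shrinkage toward the identity (an assumption).
        rng = np.random.default_rng(seed)
        n, p = X.shape
        means = X[rng.choice(n, size=k, replace=False)].astype(float)
        covs = np.array([np.eye(p)] * k)
        labels = np.full(n, -1)
        for _ in range(n_iter):
            d = np.empty((n, k))
            for j in range(k):
                diff = X - means[j]
                d[:, j] = np.einsum('ij,jk,ik->i', diff,
                                    np.linalg.inv(covs[j]), diff)
            new = d.argmin(axis=1)          # assign each point to one cluster
            if np.array_equal(new, labels):
                break                       # assignments stabilized
            labels = new
            for j in range(k):
                pts = X[labels == j]
                if len(pts) > p:            # enough points for a covariance
                    means[j] = pts.mean(axis=0)
                    S = np.cov(pts, rowvar=False)
                    covs[j] = (1 - shrink) * S + shrink * np.eye(p)
        return labels, means, covs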
Rebecca Porphírio da Costa de AZEVEDO.
A model-centric
sequential approach to outlier ensembles in a marketing science context.
[Title in Portuguese: Ensemble sequencial centrado em modelos para detecção de
outliers no contexto de Marketing Science]. M.Sc. Diss. Eng. Presentation: 06/09/2018.
78 p. Advisor: Hélio Côrtes Vieira Lopes.
DOI.
Abstract: The evolution of mobile devices in recent years has dramatically
increased the amount of data and information available to advertisers around
the world. The computational cost and the time needed to process these data and
distinguish true users from anomalies or noise have only increased. Thus, a
method to detect outliers could support Marketing researchers and
increase their precision in understanding online behavior. Recent studies show
that, so far, meta-algorithms have not been used to detect outliers.
Meta-algorithms tend to bring benefits because they reduce the dependency that a
single algorithm can generate. This work proposes a sequential model-centric
ensemble design that uses different algorithms for outlier detection to obtain
better results than those obtained by a single algorithm. The novelty of this
approach consists in: (i) exploring the sequential technique, using algorithms
that impact the next one and whose results are a combination of previously
obtained results; (ii) centering the design on the model rather than the
data, meaning the ensemble is applied to the whole dataset and not to
different subsamples; (iii) supporting Marketing researchers who need to apply
data science in a more robust and coherent way.
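
As a rough sketch of a sequential, model-centric ensemble, the code below
chains several detectors over the full dataset: each stage's scores trim the
strongest outliers from the next stage's fitting set, and the final score
averages the normalized per-stage scores. The choice of detectors, the trimming
quantile and the score combination are assumptions made for illustration; they
are not the design proposed in the dissertation.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    def sequential_outlier_ensemble(X, detectors, trim_frac=0.05):
        # Model-centric: every stage scores the *whole* dataset; sequential:
        # each stage's strongest outliers leave the next stage's fitting set.
        keep = np.ones(len(X), dtype=bool)
        stage_scores = []
        for det in detectors:
            det.fit(X[keep])
            s = -det.score_samples(X)               # higher = more anomalous
            s = (s - s.min()) / (s.max() - s.min() + 1e-12)
            stage_scores.append(s)
            keep &= s < np.quantile(s, 1.0 - trim_frac)
        return np.mean(stage_scores, axis=0)        # combined outlier score

    # Example: three differently seeded forests act as the chained stages.
    X = np.random.default_rng(0).normal(size=(500, 4))
    scores = sequential_outlier_ensemble(
        X, [IsolationForest(random_state=i) for i in range(3)])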
Reinier MOREJÓN NOVALES.
A multi-agent approach to
data mining processes: applications to health care.
[Title in Portuguese: Uma abordagem multiagente para processos de mineração de
dados: aplicações na área da saúde]. M.Sc. Diss. Eng. Presentation: 06/04/2018.
61 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Data mining is a hot topic that attracts researchers from
different areas, such as databases, machine learning, and multi-agent systems.
As a consequence of the growth of data volume, there is a growing need to obtain
knowledge from these large data sets, which are very difficult to handle and
process with traditional methods. Software agents can play a significant role
in performing data mining processes more efficiently. For instance,
they can perform selection, extraction, preprocessing and integration of
data as well as parallel, distributed, or multisource mining. This work proposes
an approach (in the form of a framework) that uses software agents to manage
data mining processes. In order to test its applicability, we use several data
sets related to the health care domain, representing some usage scenarios
(hypothyroidism, diabetes and arrhythmia).
Renato Sayão Crystallino da ROCHA.
Um filtro para arcos
em árvores de dependência.
[Title in English: A dependency tree arc filter]. M.Sc. Diss. Port. Presentation:
26/09/2018. 78 p. Advisor: Ruy Luiz Milidiú.
DOI.
Abstract: The dependency parsing task in Natural Language Processing consists of
analyzing the grammatical structure of a sentence written in natural language,
aiming to learn, identify and extract information related to its dependency
structure. These data can be structured as a tree, since every word in a
sentence has a head-dependent relation to another word in the same sentence.
Since Dependency Parsing is used in many applications like Machine Translation,
Semantic Role Labeling and Part-Of-Speech Tagging, researchers aiming to improve
the accuracy of their models approach this task in many different ways. One of
the approaches consists in viewing this task as a token classification problem,
using different classifiers for each sub-task and joining them in an incremental
way. These sub-tasks consist in classifying, for each head-dependent pair, the
Part-Of-Speech tag of the head, the relative position between the two words and
the distance between them. However, previous research using this approach shows
that the bottleneck lies in the distance classifier. Recurrent Neural Networks
are a kind of Neural Network that operates on sequences of vectors,
allowing for classification problems where both the input and the output are
sequences, making them a great choice for the problem at hand. This work studies
the use of Recurrent Neural Networks, specifically Long Short-Term Memory (LSTM)
networks, for the head-dependent distance classifier sub-task, treated as a
sequence-to-sequence classification problem. To evaluate its efficiency, this
work follows the line of previous research and makes use of the Portuguese
corpus of the Conference on Computational Natural Language Learning 2006 Shared
Task. The resulting model attains 95.27% precision, which is better than the
previous results obtained using incremental models.
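
Since the abstract frames the distance classifier as sequence-to-sequence
tagging, a minimal sketch of such a model may help: a bidirectional LSTM reads
the sentence and emits one distance class per token. The vocabulary size, the
bucketing of distances into 17 classes and all hyperparameters are illustrative
assumptions, not the dissertation's actual configuration.

    import torch
    import torch.nn as nn

    class DistanceTagger(nn.Module):
        # Sequence-to-sequence classifier: a BiLSTM reads the token sequence
        # and emits one distance-class label per token.
        def __init__(self, vocab_size, n_classes, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, token_ids):            # (batch, seq_len)
            h, _ = self.lstm(self.emb(token_ids))
            return self.out(h)                   # (batch, seq_len, n_classes)

    # Example with dummy data; real distances would be bucketed into classes.
    model = DistanceTagger(vocab_size=10000, n_classes=17)
    logits = model(torch.randint(0, 10000, (2, 12)))
    loss = nn.CrossEntropyLoss()(logits.transpose(1, 2),
                                 torch.randint(0, 17, (2, 12)))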
Ricardo Almeida VENIERIS.
Uma arquitetura de
software para apoio ao desenvolvimento de sistemas de diagnóstico médicos por imagem.
[Title in English: A software architecture to support development of medical
imaging diagnostic systems]. M.Sc. Diss. Port. Presentation: 07/02/2018. 99 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Diagnostic support for medical imaging exams using Artificial
Intelligence techniques has been extensively discussed and academically
researched. Several computational techniques for the segmentation and
classification of such images are continuously created, tested and improved.
From these studies emerge highly specialized systems that use computer vision
and machine learning techniques to segment and classify exam images, using
knowledge acquired from large collections of expert-reported exams. In the
medical domain, there is still the difficulty of obtaining qualified databases
to support the extraction of knowledge by machine learning systems. In this work
we propose a software architecture that supports the development of diagnostic
support systems and allows: (i) the use of multiple exam types, (ii) support for
both segmentation and classification, (iii) the use not only of machine learning
techniques but also (iv) of the available medical domain knowledge. The
motivation is to facilitate the task of generating classifiers that, besides
searching for specific pathological markers, can be applied to different medical
objectives, such as specific diagnosis, triage and prioritization of care.
Rodrigo Mosconi de GOUVEA.
Serviços, processos e máquinas: um estudo de
metodologias para a realocação de processos nas máquinas.
[Title in English: Services, processes and machines: a study of methodologies
for the machine reassignment problem]. M.Sc. Diss. Port. Presentation: 20/04/2018. 145 p. Advisor: Marcus Vinicius Soledade Poggi de Aragão. DOI.
Abstract: The logical organization of a data center rests mainly on the
strategic decision of how to distribute services among machines so that
operational costs are as small as possible. Besides those costs, one must also
consider the interdependence among services and their distribution across
localities, in order to improve the quality of the product offered to customers.
This work explores the ROADEF 2012 challenge machine reassignment problem by
means of integer programming and column generation. It presents strategies to
address numerical issues. For column generation, it analyzes techniques to speed
up convergence, such as a preliminary generation of columns after each variable
addition and the stabilization of dual variables. At the end of the work, the
results obtained are compared with the best official results.
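
For readers unfamiliar with the technique, the generic shape of a
column-generation loop is sketched below. The master and price interfaces are
hypothetical placeholders, not code from the dissertation, and the sketch omits
the numerical safeguards and dual stabilization the abstract refers to.

    def column_generation(master, price, max_iters=10_000):
        # master.solve() -> dual values of the restricted master problem
        # master.add(column) -> add a new column (variable) to the master
        # price(duals) -> a column with negative reduced cost, or None
        for _ in range(max_iters):
            duals = master.solve()
            column = price(duals)
            if column is None:   # no improving column: LP relaxation optimal
                break
            master.add(column)
        return master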
Ruhan dos Reis MONTEIRO. A real-time reasoning service for the
Internet of Things.
[Title in Portuguese: Um serviço de raciocínio computacional em tempo real para a
Internet das Coisas]. M.Sc. Diss. Port. Presentation: 14/09/2018. 85 p. Advisor: Markus
Endler. DOI.
Abstract: The growth of the Internet of Things (IoT) has brought the
opportunity to create applications in several areas, with the use of sensors and
actuators. One of the problems encountered in IoT systems is the difficulty of
adding semantic relations to the raw data produced by the sensors and of being
able to infer new facts from these relations. Moreover, because many IoT
applications are online and need to react instantly to the sensor data they
collect, these data need to be analyzed in real time. Streams are sequences of
time-varying data elements that cannot be stored forever and queried on demand.
Streaming data need to be consumed quickly through continuous queries that
analyze and produce new relevant data, i.e. streams of output/result events. The
ability to infer new semantic relationships over streaming data is called Stream
Reasoning. We propose a semantic model and a mechanism for real-time data stream
processing and reasoning based on Complex Event Processing (CEP), RDF (Resource
Description Framework) and OWL (Web Ontology Language). This work presents a
middleware service that supports continuous reasoning on data produced by
sensors. The main advantages of our approach are: (a) it considers time as a key
relationship between pieces of information; (b) stream processing can be
implemented using CEP; (c) it is general enough to be applied to any Data Stream
Management System (DSMS). The service was developed in the Advanced
Collaboration Laboratory (LAC), and a case study in the field of fire detection
was conducted and implemented, illustrating the use of real-time inference on
streams.
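
A minimal flavor of the CEP-style reasoning described above, applied to the
fire-detection case study, is sketched below: a rule over a sliding time window
infers a higher-level event when two lower-level readings co-occur. The event
schema, window length and threshold are illustrative assumptions, not the rules
used in the thesis.

    from collections import deque

    def fire_events(stream, window_s=10.0, temp_thresh=60.0):
        # Sliding-window rule: infer a 'Fire' event when a hot temperature
        # reading and a smoke reading co-occur within the time window.
        window = deque()
        for ev in stream:                     # ev = {'type', 'value', 'ts'}
            window.append(ev)
            while ev['ts'] - window[0]['ts'] > window_s:
                window.popleft()              # time is the key relation
            hot = any(e['type'] == 'temperature' and e['value'] > temp_thresh
                      for e in window)
            smoke = any(e['type'] == 'smoke' for e in window)
            if hot and smoke:
                yield {'type': 'Fire', 'ts': ev['ts']}

    # Example: two readings three seconds apart trigger one inferred event.
    readings = [{'type': 'temperature', 'value': 72.0, 'ts': 0.0},
                {'type': 'smoke', 'value': 1.0, 'ts': 3.0}]
    print(list(fire_events(readings)))        # [{'type': 'Fire', 'ts': 3.0}]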
Rustam Câmara MESQUITA.
Geração semiautomática de
função de transferência para realce de fronteiras baseada em derivadas médias.
[Title in English: Semiautomatic generation of transfer function for boundary
highlight based on average derivatives]. M.Sc. Diss. Port. Presentation: 16/03/2018.
77 p. Advisor: Waldemar Celes Filho. DOI.
Abstract: Finding a good transfer function for volume rendering is a difficult
task that requires previous knowledge about the data domain itself. Therefore,
much research has been carried out in the past few years aiming to overcome
this barrier. However, only a few of these works have concentrated on
automatically detecting transfer functions. Most of them focus on
improving user control over the transfer function domain, indicating potentially
interesting regions and easing its manipulation through different histograms.
Also, the results are often presented in the medical field, through MRI, CT scan
or ultrasound images. Thus, with the purpose of showing that the concepts used
in these works can be exploited in the oil and gas field, this work proposes a
novel method to automatically detect transfer functions, aiming to visualize the
interfaces between different regions in the reservoir. The proposed approach is
also tested in detecting boundaries between different materials in medical
datasets and other widely used datasets.
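
The classic idea behind boundary-highlighting transfer functions is to make
voxels opaque where the scalar field changes quickly. The sketch below maps
normalized gradient magnitude to opacity; it is a simplified illustration of
that general idea, not the average-derivative method proposed in the
dissertation.

    import numpy as np

    def boundary_opacity(volume, gain=1.0):
        # Opacity from normalized gradient magnitude: voxels on interfaces
        # between homogeneous regions become opaque, flat regions fade out.
        gx, gy, gz = np.gradient(volume.astype(float))
        gmag = np.sqrt(gx**2 + gy**2 + gz**2)
        gmag /= gmag.max() + 1e-12
        return np.clip(gain * gmag, 0.0, 1.0)

    # Example: a two-region volume is opaque only near the interface plane.
    vol = np.zeros((32, 32, 32)); vol[16:] = 1.0
    alpha = boundary_opacity(vol)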
Thiago Delgado PINTO.
Unifying agile
specification quality control and implementation conformance assurance.
[Title in Portuguese: Unificando controle de qualidade de especificação ágil de
requisitos e garantia de conformidade de implementação]. Ph.D. Thesis. Eng. Presentation:
06/09/2018. 252 p. Advisor: Arndt von Staa.
DOI.
Abstract: Agile requirements engineering practices are being used more
commonly by software development teams. However, practices related to quality
control still depend heavily on testers' expertise and manual labor, whilst the
produced requirements specifications are often imprecise and hard to verify
statically by both stakeholders and computers. This thesis jointly tackles the
problems of statically verifying agile requirements specifications and of
generating full-featured test cases and automated test scripts from them. Its
main contributions include: (1) a new metalanguage, called Concordia, for
writing agile requirements specifications that can be used for both verification
and validation (V&V) activities involving stakeholders; (2) a novel approach to
generate full-featured, ready-to-use test cases and automated test scripts from
the requirements specified with the metalanguage; (3) the assessment, in an
industrial context, of the approach's ability to reduce the risk of remaining
defects and the costs of V&V.
Toni Tiago da Silva PACHECO. Buscas eficientes em
vizinhanças largas para o problema do caixeiro viajante com coleta e entrega.
[Title in English: Efficient large neighborhood searches for the traveling
salesman problem with pickup and delivery]. M.Sc. Diss. Port. Presentation: 24/03/2018.
67 p. Advisor: Thibaut Victor Gaston Vidal. DOI.
Abstract: In various distribution and logistics problems, products must be
collected at one location and delivered to a destination. Examples include the
transportation of disabled people, express mail services, medical supplies
logistics, etc. The routing problem addressed by this work, known as the
Traveling Salesman Problem with Pickup and Delivery (TSPPD), belongs to the
class of traveling salesman problems with precedence constraints. In this
problem, there is a one-to-one pickup-delivery mapping in which, for each
pickup-type client, there is exactly one associated delivery-type client.
Delivery clients can only be visited after the associated pickup. Since the
TSPPD generalizes the TSP, it is also an NP-hard problem, as the TSP is the
particular case of the TSPPD where each pickup coincides spatially with its
respective delivery. Variants with capacity constraints, time windows and
different loading policies have received more attention in the last decade,
although there are still significant advances to be made in terms of solution
quality for the basic version of the problem. To solve this problem, we propose
a hybrid metaheuristic algorithm with large neighborhoods efficiently explored
in O(n²). Our experiments demonstrate a significant reduction in computational
time as well as improved solution quality compared to previous work.
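
Every move explored in a TSPPD neighborhood must preserve the precedence
constraint that a delivery follows its pickup. A minimal check of that property
on a candidate tour is sketched below; the data layout (integer client ids and a
delivery-to-pickup map) is an assumption for illustration, not the
dissertation's data structures.

    def precedence_feasible(tour, pickup_of):
        # tour: client ids in visiting order (depot omitted for brevity).
        # pickup_of: maps each delivery client to its associated pickup.
        position = {client: i for i, client in enumerate(tour)}
        return all(position[pickup_of[d]] < position[d] for d in pickup_of)

    # Example: delivery 11 must come after pickup 1, and 12 after 2.
    print(precedence_feasible([1, 11, 2, 12], {11: 1, 12: 2}))   # True
    print(precedence_feasible([11, 1, 2, 12], {11: 1, 12: 2}))   # False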
Wallas Henrique Sousa dos SANTOS.
MCAD Shape Grammar: modelagem procedimental em modelos CAD massivos industriais.
[Title in English: MCAD shape grammar: procedural modeling for industrial
massive CAD models]. Ph.D. Thesis. Port. Presentation: 30/04/2018. 112 p. Advisor: Alberto Barbosa Raposo. DOI.
Abstract: 3D CAD models are tools used in industry for planning
and simulation before the construction or completion of tasks. In many cases,
such as in the oil and gas industry, these models can be massive, that is, they
carry large-scale detailed information in order to be sources of accurate
information. Interactive navigation in these models requires a combination of
appropriate hardware and software. Even nowadays, with modern GPUs, the direct
rendering of these models is not efficient, requiring classic approaches such as
culling non-visible objects and level-of-detail (LOD) techniques before sending
the data to the GPU. Therefore, for real-time rendering of massive CAD models,
we need scalable algorithms and data structures to efficiently process the
scene. This thesis proposes the MCAD (Massive Computer-Aided Design) Shape
grammar, an expansive grammar that procedurally generates objects to create 3D
scenes of massive models. In recent years, procedural modeling has drawn
attention for quickly creating 3D scenes using a compact representation, which
stores generation rules rather than an explicit representation of the scene. The
MCAD Shape grammar exploits the repetitions and patterns present in massive
models, reducing the memory footprint and procedurally processing the scene
efficiently. We converted real refinery models into the MCAD Shape grammar and
implemented a renderer for them. Results showed that our solution is scalable
and achieves high performance; to our knowledge, this is also the first time
procedural modeling has been used in this domain.
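
To illustrate the general mechanics of an expansive grammar, in which compact
rules are expanded on demand into repeated geometry, here is a toy rewriting
system in Python. The symbols and rules are invented for illustration and bear
no relation to the actual MCAD Shape grammar.

    import random

    # Toy expansive grammar: nonterminals rewrite into sub-symbols; any
    # symbol without a rule is a terminal that would map to scene geometry.
    RULES = {
        'plant':     [['pipe_rack', 'pipe_rack', 'vessel']],
        'pipe_rack': [['pipe'] * 4, ['pipe'] * 8],   # repetition stored once
    }

    def expand(symbol, rng=random):
        if symbol not in RULES:
            return [symbol]                          # terminal: emit geometry
        out = []
        for s in rng.choice(RULES[symbol]):
            out.extend(expand(s, rng))
        return out

    print(expand('plant'))   # 8, 12 or 16 'pipe' terminals plus one 'vessel'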