Theses and Dissertations
2018
ABSTRACTS
Departamento de Informática
Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio
Rio de Janeiro - Brazil
This file contains the list of the M.Sc. dissertations and Ph.D. theses presented to the Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, Brazil, in 2018. They are all available in print format and, according to the authors' preference, some of them are freely available for download, while others are freely available for download to the PUC-Rio community exclusively (*).
For any requests, questions, or suggestions, please contact:
Rosane Castilho
bib-di@inf.puc-rio.br
Last update: 22/MARCH/2019
[Under construction; some digital versions may not be available yet]
Adrian CONCEPCION LEÓN.
Secure distributed ledgers to support IoT technologies data.
[Title in Portuguese: Ledgers seguros e distribuídos para
suportar dados de tecnologia IoT]. M.Sc. Diss. Eng. Presentation:
18/10/2018. 63 p. Advisor: Markus Endler. DOI.
Abstract:
Allan Werner SCHÖTTLER.
Visualização de fluxo em reservatórios de petróleo usando LIC volumétrico.
[Title in English: Visualizing flow in black-oil reservoirs using volumetric LIC]. M.Sc. Diss. Eng. Presentation:
14/09/2018. 45 p. Advisor: Waldemar Celes Filho. DOI.
Abstract: In the oil industry, clear and unambiguous visualization of vector
fields resulting from numerical simulations of black-oil reservoirs is
essential. In this dissertation, we study the use of line integral convolution
techniques (LIC) for imaging 3D steady vector fields and apply the results to a
GPU-based volume rendering algorithm. Due to the density of information present
in volume renderings of LIC images, we study the use of sparse textures as input
to the LIC algorithm and apply transfer functions to assign color and opacity to
scalar fields in order to encode visual information to voxels and alleviate the
occlusion problem. Additionally, we address the problem of encoding flow
orientation, inherent to LIC, using an extension of the algorithm – Oriented LIC
(OLIC). Finally, we present a method for volume animation in order to enhance
the flow orientation. We then compare results obtained with LIC and with OLIC.
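For illustration, a minimal 2D sketch of the core LIC idea (a naive CPU version with Euler streamline integration; the dissertation's GPU-based volumetric pipeline, sparse textures, transfer functions, and the OLIC extension are not shown):

    import numpy as np

    def lic2d(vx, vy, noise, L=15, h=0.5):
        # convolve a noise texture along streamlines of the field (vx, vy)
        H, W = noise.shape
        out = np.zeros_like(noise)
        for i in range(H):
            for j in range(W):
                acc, n = noise[i, j], 1
                for sign in (1.0, -1.0):              # integrate forward and backward
                    x, y = float(j), float(i)
                    for _ in range(L):
                        u, v = vx[int(y), int(x)], vy[int(y), int(x)]
                        norm = np.hypot(u, v)
                        if norm < 1e-9:
                            break
                        x += sign * h * u / norm      # Euler step along the streamline
                        y += sign * h * v / norm
                        if not (0 <= x < W and 0 <= y < H):
                            break
                        acc += noise[int(y), int(x)]
                        n += 1
                out[i, j] = acc / n                   # streaks align with the flow
        return out

    yy, xx = np.mgrid[0:128, 0:128].astype(float)     # circular test field
    img = lic2d(-(yy - 64), xx - 64, np.random.rand(128, 128))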
Andre Luis Cavalcanti BUENO.
Relaxamento adaptativo da sincronização através do
uso de métodos de aprendizagem supervisionada. [Title in
English: Adaptive relaxed synchronization through the use of supervised learning
methods].
Ph.D. Thesis. Port. Presentation: 07/03/2018. 80 p. Advisor: Noemi de La Rocque
Rodriguez; co-advisor: Eliza Dominguez Sotelino (Eng. Civil, PUC-Rio).
DOI
Abstract:
Parallel computing systems have become pervasive, being used to interact with
the physical world and to process large amounts of data from various sources.
Continuous improvement of computational performance is therefore essential to
keep up with the increasing rate at which information needs to be processed.
Some of these applications admit lower quality in the final result in exchange
for increased execution performance. This work evaluates the feasibility of
using supervised learning methods to ensure that the Relaxed Synchronization
technique, used to increase execution performance, provides results within
acceptable error limits. To do so, we created a methodology that uses some
input data to assemble test cases that, when executed, provide input values for
the training of supervised learning methods. This way, when the user runs
his/her application (in the same training environment) with a new input, the
trained classification algorithm suggests the relaxed synchronization factor
best suited to the application/input/execution-environment triple. We used this
methodology in some well-known parallel applications and showed that, by
combining Relaxed Synchronization with supervised learning methods, it was
possible to keep within the maximum established error rate. In addition, we
evaluated the performance gain obtained with this technique for a number of
scenarios in each application.
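A hypothetical sketch of the resulting classification step (the feature layout, values, and the choice of a decision tree are illustrative assumptions, not the thesis's actual methodology):

    from sklearn.tree import DecisionTreeClassifier

    # offline phase: test cases labeled with the relaxation factor that
    # stayed within the acceptable error bound (toy numbers)
    X_train = [[1000, 8], [50000, 8], [1000, 32], [50000, 32]]  # [input size, threads]
    y_train = [4, 2, 8, 4]                                      # best relaxation factor

    clf = DecisionTreeClassifier().fit(X_train, y_train)

    # online phase: suggest a factor for a new input in the same environment
    print(clf.predict([[20000, 16]]))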
Antonio Iyda PAGANELLI.
Reliability of Wii balance
board and Microsoft Kinect for capturing posturographic information during
balance tests.
[Title in Portuguese: Confiabilidade do Wii balance board e do Microsoft Kinect na
captura de informações posturográficas durante testes de equilíbrio]. M.Sc. Diss.
Eng. Presentation: 21/09/2018. 148 p. Advisor: Alberto Barbosa Raposo. DOI.
Abstract: Body balance is an important physical skill and is fundamental to
the health of the elderly, considering that falls are a major cause of
unintentional injuries leading to loss of autonomy and death in this group.
Given the aging of the world population and the fact that balance impairment is
one of the major causes of physiotherapy attendance, simple, affordable,
portable, and reliable devices for evaluating body balance are of great
relevance. Several studies have examined the concurrent validity and
reliability of the Microsoft Kinect (Kinect) and the Nintendo Wii Balance Board
(WBB) during balance tests. The majority of these studies suggested that those
devices could be used as reliable and valid tools for assessing balance in
semi-static positions. Based on that, this study investigated test-retest
reliability using the Kinect and the WBB, concurrently, in three standing
positions, and analyzed variables related to the center of pressure (CoP) and
the center of gravity (CoG), in static manikins and in 70 healthy subjects.
Each participant performed the set of tests twice on the same day. Our solution
demonstrated sensitivity in identifying different body sway patterns. Tests
showed that the most reliable variables were average speed and total path
length in all directions and tasks. Although tests with static manikins
signaled excellent reliability, tests with individuals were rated poor to good.
However, variables of consolidated data based on different tasks achieved
excellent scores. CoP properties outperformed those related to the CoG,
suggesting that the WBB was superior to the Kinect in providing reliable body
sway information. This study reinforced that these devices may provide reliable
quantitative information that enhances qualitative body balance assessments.
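For illustration, the two variables found most reliable can be computed from raw samples as follows (the data layout and sampling rate are assumptions):

    import numpy as np

    def sway_metrics(cop_xy, fs=100.0):
        # cop_xy: (N, 2) center-of-pressure samples in cm; fs: sampling rate in Hz
        steps = np.diff(cop_xy, axis=0)                    # per-sample displacement
        path_length = np.linalg.norm(steps, axis=1).sum()  # total path length (cm)
        duration = (len(cop_xy) - 1) / fs
        return path_length, path_length / duration         # total path, average speed

    cop = np.cumsum(np.random.randn(3000, 2) * 0.01, axis=0)  # 30 s of simulated sway
    print(sway_metrics(cop))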
Axelle Dany Juliette POCHET.
Modeling of
geobodies: AI
for seismic fault detection and all-quadrilateral mesh generation.
[Title in Portuguese: Modelagem de objetos geológicos: IA para detecção automática
de falhas e geração de malhas de quadriláteros]. Ph.D. Thesis. Eng. Presentation: 28/09/2018.
128 p. Advisor: Marcelo Gattass. DOI.
Abstract: Safe oil exploration requires good numerical modeling of the
subsurface geobodies, which includes, among other steps, seismic interpretation
and mesh generation. This thesis presents a study in these two areas. The first
study is a contribution to data interpretation, examining the possibilities of
automatic seismic fault detection using deep learning methods. In particular, we
use Convolutional Neural Networks (CNNs) on seismic amplitude maps, with the
particularity of using synthetic data for training with the goal of classifying
real data. In the second study, we propose a new two-dimensional
all-quadrilateral meshing algorithm for geomechanical domains, based on an
innovative quadtree approach: we define new subdivision patterns to efficiently
adapt the mesh to any input geometry. The resulting mesh is suited for Finite
Element Method (FEM) simulations.
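A hedged sketch of a patch-based fault classifier in PyTorch (patch size, channels, and layer sizes are assumptions for illustration, not the network trained in the thesis):

    import torch
    import torch.nn as nn

    # toy binary classifier for 1-channel 45x45 amplitude patches
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 11 * 11, 64), nn.ReLU(),
        nn.Linear(64, 2),                            # fault / no fault
    )

    patches = torch.randn(8, 1, 45, 45)              # stands in for synthetic patches
    loss = nn.CrossEntropyLoss()(model(patches), torch.randint(0, 2, (8,)))
    loss.backward()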
Bernardo de Campos Vidal CAMILO. Uma avaliação
experimental de hashing consistente com cargas limitadas na distribuição
de vídeos online.
[Title in English: An experimental evaluation of consistent hashing with bounded
loads in online video distribution]. M.Sc. Diss. Port. Presentation: 06/09/2018.
54 p. Advisor: Noemi de La Rocque Rodriguez. DOI.
Abstract: Video consumption accounts for a large part of Internet traffic
today and tends to increase further in the next years. In this work, we
investigate ways to improve caching in video content delivery networks (CDNs) to
reduce their response time and increase the users’ quality of experience. From
the analysis of different techniques, we concluded that consistent hashing with
bounded loads has interesting characteristics for this purpose and fits
adequately to the video delivery scenario. In order to verify its performance,
we created an experimentation platform and, using data from a real video CDN,
confronted it with the consistent hashing and the least connections balancing
method, all implemented in an equivalent manner to permit a fair comparison.
Lastly, we discussed the results of this evaluation, highlighting the benefits
and limitations of this technique in the considered context.
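A compact sketch of the core idea being evaluated, consistent hashing with bounded loads (simplified: one ring point per server and a capacity of ceil(avg_load * (1 + eps)) are assumptions of this toy version, not the dissertation's experimentation platform):

    import hashlib, math
    from bisect import bisect_right

    def h(s):  # stable hash onto the ring
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (1 << 64)

    class BoundedRing:
        def __init__(self, servers, eps=0.25):
            self.ring = sorted((h(s), s) for s in servers)
            self.load = {s: 0 for s in servers}
            self.eps, self.n = eps, 0

        def assign(self, key):
            cap = math.ceil((self.n + 1) / len(self.load) * (1 + self.eps))
            i = bisect_right(self.ring, (h(key), chr(0x10FFFF)))
            for k in range(len(self.ring)):          # walk clockwise past full servers
                s = self.ring[(i + k) % len(self.ring)][1]
                if self.load[s] < cap:
                    self.load[s] += 1
                    self.n += 1
                    return s

    ring = BoundedRing(["cache-a", "cache-b", "cache-c"])
    print([ring.assign("video-%d" % i) for i in range(6)])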
Caroline Rosa REDLICH.
Segmentação de imagens baseada em grafos de superpixel.
[Title in English: Image segmentation based on superpixel graphs]. M.Sc. Diss. Port. Presentation:
19/04/2018. 74 p. Advisor: Marcelo Gattass.
DOI
Abstract: Image segmentation for object modeling is a complex task that is
still not well solved. The separation of the regions corresponding to each
object in an image is based on proximity, similarity, and discontinuity of its
boundaries. The image to be segmented can be of various natures, including
photographs, medical and seismic images. We can find in the literature many
proposed segmentation methods used as solutions to different problems. Recently, the
superpixel technique has been used as an initial step that reduces the size of
the problem input. This work proposes a methodology of segmentation of
photographs and ultrasound images based on variants of superpixels. The proposed
methodology adapts to the image’s nature and to the problem’s complexity using
different measures of similarity and distance. This work also presents results
that seek to clarify the proposed procedure and the choice of its parameters.
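As an illustration of the general pipeline only (scikit-image's standard SLIC; the dissertation's superpixel variants and similarity/distance measures differ), the sketch below oversegments an image and builds the superpixel adjacency graph used in the segmentation step:

    import numpy as np
    from skimage import data, segmentation

    img = data.astronaut()
    labels = segmentation.slic(img, n_segments=200, compactness=10)

    # adjacency edges between touching superpixels (horizontal + vertical neighbors)
    pairs = np.concatenate([
        np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1),
        np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1),
    ])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]
    edges = np.unique(np.sort(pairs, axis=1), axis=0)
    print(len(np.unique(labels)), "superpixels,", len(edges), "adjacency edges")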
Cesar de Souza BOUÇAS. Análise de dependência baseada em transição aplicada a
Universal Dependencies. [Title in English: Transition based
dependency parsing applied on Universal Dependencies]. M.Sc. Diss.
Port. Presentation: 22/10/2018. 72 p. Advisor: Ruy Luiz Milidiú. DOI.
Abstract: Dependency parsing is the task of transforming a sentence into a
syntactic structure, usually a dependency tree, that represents relations
between words. These representations are useful for several tasks that arise
with the increasing volume of online textual information and the need for
technologies that depend on NLP tasks to work. They can be used, for example,
to enable computers to infer the meaning of words in multiple natural
languages. This work presents dependency parsing with focus on one of its most
popular machine learning formulations: the transition-based method. A greedy
implementation of this model with a simple neural network-based classifier is
used to perform experiments. Universal Dependencies treebanks are used to train
and then test the system using the validation script published in the CoNLL-2017
shared task. The results empirically indicate the benefits of initializing the
input layer of the network with word embeddings obtained through pre-training.
It reached 84.51 LAS on the Brazilian Portuguese test set and 75.19 LAS on the
English test set. This result is nearly 4 points behind the performance of the
best transition-based parsers.
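A minimal sketch of the transition-based idea (a greedy arc-standard system; the score function below stands in for the trained neural classifier, and the random policy shown is purely illustrative):

    import random

    def parse(n_words, score):
        # greedy arc-standard parsing over words 1..n_words (0 is the artificial root);
        # score(action, stack, buffer) plays the role of the neural classifier
        stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
        while buffer or len(stack) > 1:
            actions = ["SHIFT"] if buffer else []
            if len(stack) > 1:
                actions.append("RIGHT-ARC")       # top becomes dependent of second
                if stack[-2] != 0:
                    actions.append("LEFT-ARC")    # second becomes dependent of top
            act = max(actions, key=lambda a: score(a, stack, buffer))
            if act == "SHIFT":
                stack.append(buffer.pop(0))
            elif act == "LEFT-ARC":
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))     # (head, dependent)
            else:
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs

    print(parse(3, lambda a, s, b: random.random()))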
Daniel Alejandro MESEJO-LEÓN.
Approximate nearest neighbor search for the Kullback-Leibler divergence. [Title in Portuguese:
Busca Aproximada de vizinhos mais próximos para a divergência de Kullback-Leibler].
M.Sc. Diss. Eng. Presentation: 09/01/2018. 59 p. Advisor: Eduardo Sany Laber.
DOI
Abstract:
In a number of applications, data points can be represented as probability
distributions. For instance, documents can be represented as topic models,
images can be represented as histograms and also music can be represented as a
probability distribution. In this work, we address the Approximate Nearest
Neighbor problem where the points are probability distributions and the
distance function is the Kullback-Leibler (KL) divergence. We show how to
accelerate existing data structures, such as the Bregman Ball Tree, by posing
the KL divergence as an inner product embedding. On the practical side, we
investigated the use of two very popular indexing techniques: Inverted Index
and Locality Sensitive Hashing (LSH). Experiments performed on 6 real-world
datasets showed that the Inverted Index performs better than LSH and the
Bregman Ball Tree in terms of queries per second and precision.
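For reference, a brute-force baseline of the problem being accelerated (the KL divergence plus an exact linear scan, which the indexing structures above replace):

    import numpy as np

    def kl(p, q, eps=1e-12):
        # Kullback-Leibler divergence KL(p || q) between discrete distributions
        p, q = np.asarray(p) + eps, np.asarray(q) + eps
        return float(np.sum(p * np.log(p / q)))

    def nearest(query, corpus):
        # exact nearest neighbor under KL; the dissertation approximates this
        return min(range(len(corpus)), key=lambda i: kl(query, corpus[i]))

    docs = np.random.dirichlet(np.ones(10), size=100)  # 100 topic distributions
    print(nearest(np.random.dirichlet(np.ones(10)), docs))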
Daniel Specht Silva MENEZES. Reconhecimento de entidades
mencionadas para o português.
[Title in English: Named entity recognition for Portuguese]. M.Sc. Diss. Port. Presentation: 27/09/2018.
84 p. Advisor: Ruy Luiz Milidiú. DOI.
Abstract: The production and access of huge amounts of data is a pervasive
element of the Information Age. The volume of available data is without
precedent in human history and is in constant expansion. An opportunity that
emerges in this context is the development and usage of applications that are
capable of structuring the knowledge present in data. Natural Language
Processing fits in this context, being able to extract information efficiently
from textual data. A fundamental step towards this goal is the task of Named
Entity Recognition (NER), which delimits and categorizes mentions of entities.
The development of systems for NLP tasks must be accompanied by datasets
produced by humans in order to compare the system with human discernment for
the NLP task at hand. These datasets are a scarce resource whose construction
is costly in terms of human supervision. Recently, the NER task has been
approached using artificial neural network models, which need datasets for both
training and evaluation. In this work we propose the construction of a dataset
for Portuguese NER with an automatic approach using public data sources
structured according to the principles of the Semantic Web, namely DBpedia and
Wikipedia. A methodology for the construction of this dataset was developed,
and experiments were performed using both the built dataset and the neural
network architectures with the best reported results. Several setups for the
experiments were evaluated; we obtained preliminary results for diverse
hyperparameter values, also proposing architectures with the specific focus of
incorporating diverse data sources for training.
Diego Cedrin Gomes RÊGO. Understanding and improving batch
refactoring in software systems.
[Title in Portuguese: Entendendo e melhorando a prática de refatoração em lotes
em sistemas de software]. Ph.D. Thesis. Eng. Presentation:
28/09/2018. 168 p. Advisor: Alessandro Fabrício Garcia.
DOI.
Abstract: Code smells in a program represent indications of structural
quality problems, which can be addressed by software refactoring. However,
developers may neglect smells or end up creating new ones through single
refactorings. Little has been reported about recurring beneficial and harmful
effects of refactoring on program structural quality. As a consequence,
developers still miss guidance along non-trivial smell-removing tasks. In fact,
evidence suggests developers often need to apply a sequence of refactorings, a
so-called batch refactoring, to entirely remove a smelly code structure. Thus,
in this thesis, we conducted a series of studies to understand the impact
of single and batch refactorings on code smells. In our first studies, we
analyzed how often commonly-used types of single refactoring affect the density
of code smells along the version histories of dozens of projects. Even though
79.4% of the refactorings touched smelly elements, 57% had no impact on smell
removal. Surprisingly, only 9.7% of refactorings removed smells, while 33%
induced the introduction of new ones. On the one hand, we observed that harmful
refactoring-smell patterns could be used to guide developers to avoid
smell-inducing refactoring. On the other hand, we observed that many smells can
be removed only through batch refactoring. Thus, our last studies investigated
the impact of batch refactorings on smells. Even when applied in batches,
refactorings tend to maintain or even increase the density of code smells. We
also identified common batch-smell patterns, which enabled us to create
heuristics that can guide developers through smell-removing tasks. The last
study evaluated those heuristics, and we conclude that the outcomes are promising.
Felipe de Albuquerque Mello PEREIRA. A framework for
generating binary splits in decision trees. [Title in Portuguese:
Um framework para geração de splits binários em árvores de decisão].
M.Sc. Diss. Eng. Presentation: 09/03/2018. 55 p. Advisor: Eduardo Sany Laber.
DOI
Abstract: In this dissertation we propose a framework for designing
splitting criteria for handling multi-valued nominal attributes for decision
trees. Criteria derived from our framework can be implemented to run in
polynomial time in the number of classes and values, with theoretical guarantee
of producing a split that is close to the optimal one. We also present an
experimental study, using real datasets, where the running time and accuracy of
the methods obtained from the framework are evaluated.
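As a generic illustration of the kind of criterion such a framework covers (a classic strategy, optimal for two classes: sort the attribute's values by class proportion and scan prefix splits under the Gini impurity; this is not the dissertation's own criterion):

    def best_binary_split(counts):
        # counts: {value: (n_class0, n_class1)}; returns (left_values, weighted_gini)
        def gini(a, b):
            n = a + b
            return 0.0 if n == 0 else 1.0 - (a / n) ** 2 - (b / n) ** 2
        order = sorted(counts, key=lambda v: counts[v][1] / sum(counts[v]))
        tot0 = sum(c[0] for c in counts.values())
        tot1 = sum(c[1] for c in counts.values())
        best_vals, best_cost, l0, l1 = None, float("inf"), 0, 0
        for k in range(1, len(order)):                # try every prefix as left branch
            l0 += counts[order[k - 1]][0]
            l1 += counts[order[k - 1]][1]
            cost = (l0 + l1) * gini(l0, l1) \
                 + (tot0 - l0 + tot1 - l1) * gini(tot0 - l0, tot1 - l1)
            if cost < best_cost:
                best_vals, best_cost = set(order[:k]), cost
        return best_vals, best_cost

    print(best_binary_split({"red": (8, 2), "green": (3, 7), "blue": (5, 5)}))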
Fernando de Abreu Lima ALVES.
Estendendo o Luaproc: suporte para aplicações em ambientes móveis. [Title in
English:
Extending Luaproc: support for applications in a mobile environment].
M.Sc. Diss. Port. Presentation: 20/07/2018. 90 p. Advisor: Noemi de La Rocque
Rodriguez. DOI
Abstract: Mobile devices are undergoing constant increases in their
processing and memory capabilities. This tendency is making mobile processing an
interesting alternative. This work aims to support the programmer in exploring
this potential by using parallelism, both local, in the form of multicore
exploitation, as well as distributed, in the form of multidevice exploration. We
explored this through a parallel library for the Lua programming language,
called Luaproc. We propose an extension to this library and its communication
model to include this multidevice scenario and to combine the facilities of a
message queueing service with the existing facilities for multicore programming.
We then present some applications to show different use cases with distribution
and their performance.
Francisco Carvalho Guida MOTTA.
Uma técnica
semiautomática para a segmentação do feto em exames de ultrassom 3D.
[Title in English: A semiautomatic technique for the segmentation of the
fetus in 3D ultrasound exams].
M.Sc. Diss. Port. Presentation: 12/04/2018. 84 p. Advisor: Alberto Barbosa
Raposo. DOI.
Abstract: Ultrasound exams have an important role in obstetrics due to their
low cost, low risk, and real-time capabilities. The advent of three-dimensional
ultrasonography has made possible the use of the fetal volume as a biometric
measurement to monitor fetal development. The quantification of the fetal
volume requires a previous segmentation process, which consists in labelling
the pixels that belong to the object of interest in a digital image. There
isn't, however, a standard methodology for fetal volumetry, and most studies
rely on manual segmentations. The segmentation of ultrasound images is
particularly challenging due to the presence of artifacts such as speckle noise
and acoustic shadows, and the fact that the contrast between regions of
interest is commonly low. In this study, we developed and tested a
semiautomatic method for fetal segmentation in 3D ultrasound exams. Due to the
aforementioned difficulties, good ultrasound segmentation methods need to
exploit expected characteristics of the specific segmented structures. This
thought guided the development of our methodology, which, through a sequence of
simple steps, achieved good quantitative results in the segmentation task.
Gabriel André HOMSI.
Ship routing and speed optimization
with heterogeneous fuel consumption profiles.
[Title in Portuguese: Roteamento de navios e otimização de velocidade com
perfis de consumo de combustível heterogêneos].
M.Sc. Diss. Eng. Presentation: 08/02/2018. 67 p. Advisor: Thibaut Victor
Gaston Vidal. DOI
Abstract: The shipping industry is essential for international trade.
However, in the wake of the 2008 financial crisis, this industry was severely
hit. In these times, transportation companies can only obtain profit if their fleet
is routed effectively. In this work, we study a class of ship routing problems
related to the Pickup and Delivery Problem with Time Windows. To solve these
problems, we introduce a heuristic and an exact method. The heuristic method is
a hybrid metaheuristic with a set-partitioning-based large neighborhood, while
the exact method is a branch-and-price algorithm. We conduct experiments on a
benchmark suite based on real-life shipping segments. The results obtained show
that our algorithms largely outperform the state-of-the-art methodologies. Next,
we adapt the benchmark suite to model a ship routing problem where the speed on
each sailing leg is a decision variable, and fuel consumption per time unit is a
convex function of the ship speed and payload. To solve this new ship routing
problem with speed optimization, we extend our metaheuristic to find optimal
speed decisions on every local search move evaluation. Our computational
experiments demonstrate that such an approach can be highly profitable, with only a
moderate increase in computational effort.
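A toy sketch of the per-leg speed-optimization subproblem (the cubic fuel curve, prices, and speed bounds are assumptions for illustration; the thesis embeds such optimal speed decisions in every local-search move evaluation):

    from scipy.optimize import minimize_scalar

    def leg_cost(v, dist=100.0, fuel_price=500.0, time_value=2000.0, k=0.0008):
        # cost of sailing one leg at speed v (knots): cubic fuel burn + time cost
        hours = dist / v
        return fuel_price * k * v**3 * hours + time_value * hours

    res = minimize_scalar(leg_cost, bounds=(8.0, 18.0), method="bounded")
    print("optimal speed: %.2f knots, cost: %.0f" % (res.x, res.fun))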
Guilherme Augusto SCHÜTZ. A neural network for online portfolio selection
with side information.
[Title in Portuguese: Uma rede neural para o problema de seleção online de
portfólio com informação lateral]. M.Sc. Dissertation. Eng. Presentation: 01/08/2018.
65 p. Advisor: Ruy Luiz Milidiú.
DOI.
Abstract: The financial market is essential in the economy, bringing
stability, access to new types of investments, and increasing the ability of
companies to access credit. The constant search for reducing the role of human
specialists in decision making aims to reduce the risk inherent in the
intrinsic emotions of the human being, which the machine does not share; as a
consequence, speculative effects in the market are reduced and the precision of
the decisions taken increases. In this work, we discuss the problem of online
portfolio selection, where a vector of asset allocations is required at each
step. The proposed algorithm is the multilayer perceptron with side information
- MLPi. This algorithm uses neural networks to solve the problem when the
investor has access to future information on the price of the assets. To
evaluate the use of side information in portfolio selection, we empirically
tested MLPi against two algorithms, a baseline and the state-of-the-art. As a
baseline, buy-and-hold is used. The state-of-the-art is the online moving
average mean reversion algorithm proposed by Li & Hoi (2012). To evaluate the
use of side information in MLPi, we define a benchmark based on a simple
optimal solution that uses the side information but disregards the accuracy of
the future information. For the experiments, we use minute-level information
from the Brazilian stock market, traded on the B3 stock exchange. A price
predictor is simulated with 7 different accuracy levels for 200 portfolios. The
results show that both the benchmark and MLPi outperform the two selected
algorithms for accuracy levels greater than 50%, and, on average, MLPi
outperforms the benchmark at all levels of simulated accuracy.
Guilherme Gomes Felix da SILVA.
Formalização de algoritmos de criptografia em um assistente de provas interativo.
[Title in English: Formalization of cryptography algorithms in an interactive
theorem prover]. M.Sc. Diss. Port. Presentation: 28/08/2018. 70 p. Advisor:
Edward Hermann Haeusler. DOI.
Abstract: When describing a proof of a theorem, one must be cautious to
ensure said proof does not contain errors or inconsistencies. For very long
proofs, however, error detection can become humanly infeasible. A proof
assistant is a program whose purpose is to perform said error detection
efficiently, as well as to assist in the creation and comprehension of complex
proofs out of simpler, existing proofs. The Lean Theorem Prover, developed in
2012 by Leonardo de Moura, is a proof assistant which functions via description
of proofs in a compilable computer language. We present a description of proofs
of correctness of various algorithms pertaining to cryptography in the Lean
Theorem Prover.
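For flavor, a toy machine-checked proof in Lean 3 syntax (not taken from the dissertation) of a property underlying XOR-based ciphers:

    -- XOR-ing a bit with itself always yields ff (false)
    theorem bxor_self (b : bool) : bxor b b = ff :=
    by cases b; refl

    -- hence one-time-pad style encryption followed by decryption is the identity
    example (m k : bool) : bxor (bxor m k) k = m :=
    by cases m; cases k; refl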
Guilherme Gonçalves SCHARDONG. Visual interactive
support for selecting scenarios from time-series ensembles. [Title in
Portuguese: Uma abordagem visual e interativa para a seleção de conjuntos de
cenários temporais]. Ph.D. Thesis. Eng. Presentation: 13/09/2018. 92 p. Advisor:
Hélio Côrtes Vieira Lopes. DOI.
Abstract: Stochastic programming and scenario reduction approaches have
become invaluable in the analysis and behavior prediction of dynamic systems.
However, such techniques often fail to take advantage of the user’s own
expertise about the problem domain. This work provides visual interactive
support to assist users in solving the scenario reduction problem with time
series data. We employ a series of time-based visualization techniques linked
together to perform the task. By adapting a multidimensional projection
algorithm to handle temporal data, we can graphically present the evolution of
the ensemble. We also propose to use cumulative bump charts to visually compare
the ranks of distances between the ensemble time series and a baseline series.
To evaluate our approach, we developed a prototype application and conducted
observation studies with volunteer users of varying backgrounds and levels of
expertise. Our results indicate that a graphical approach to scenario reduction
may result in a good subset of scenarios and provides a valuable tool for data
exploration in this context. The users liked the interaction mechanisms provided
and judged the task to be easy to perform with the tools we have developed. We
tested the proposed approach against state-of-the-art techniques proposed in the
literature and used in the industry and obtained good results, thus indicating
that our approach is viable in a real-world scenario.
Isabella Vieira FERREIRA.
Assessing the bug-proneness of refactored code: longitudinal multi-project
studies.
Abstract: Programs often change along the system evolution, which implies an
eventual code structure degradation. Recurring symptoms of such degradation are
code smells. Studies suggest that the more frequently code smells affect a
system, the higher becomes the bug-proneness of the code elements. To tackle
code structural quality degradation, developers often apply refactorings on smelly program elements.
However, applying refactorings might not suffice to reduce the bug-proneness
of such degraded program elements. Previous empirical studies do not
systematically analyze the bug-proneness of refactored code. Even though a
recent study suggests that refactoring induces bugs frequently, the authors
do not analyze to what extent refactored code is indeed closely related to
the bug occurrence. Thus, in this dissertation, we conducted two
longitudinal multi-project studies to assess the bug proneness of refactored
code. Our methodology aimed to address various limitations of previous studies.
For instance, we have defined two complementary properties of the bug-proneness
of refactored code, i.e., frequency and distance. While the former
quantifies how often refactored code is related to emerging bugs,
the latter quantifies how close a bug emerges after a refactoring has been
applied. The quantitative analysis of such properties was complemented by a
manual analysis of refactorings closely related to the bug occurrence. Our
first study aims at assessing the bug-proneness of code refactored through
isolated refactorings, i.e., a single refactoring operation not performed in
conjunction with other refactoring operations. This study reveals that 80%
of the smelly elements that became buggy were not previously refactored.
This result suggests that refactored code is much less bug-prone than non-refactored
code. Moreover, in 75% of the cases, a bug emerges within 7 changes of the
refactoring operation; this amount of changes usually corresponds to 3
months in the analyzed projects. Our second study aims at assessing the
bug-proneness of code elements refactored through batch refactorings, i.e.,
a sequence of inter-related refactoring operations. Our results show that
code refactored through batches is often more resilient to the introduction
of bugs as compared to code refactored through isolated refactorings.
Jan José HURTADO JAUREGUI.
Detail-preserving mesh denoising using adaptive patches.
[Title in Portuguese: Remoção de ruído de malha com preservação de detalhe
usando vizinhanças adaptativas].
M.Sc. Diss. Eng. Presentation: 08/03/2018. 67 p. Advisor: Marcelo Gattass. DOI.
Abstract: The acquisition of triangular meshes typically introduces
undesired noise. Mesh denoising is a geometry processing task to remove this
kind of distortion. To preserve the geometric fidelity of the desired mesh, a
mesh denoising algorithm must preserve true details while removing artificial
high-frequencies from the surface. Several algorithms were proposed to address
this problem using a bilateral filtering scheme. In this work, we propose a
two-step algorithm which uses adaptive patches and bilateral filtering to denoise
the normal field, and then updates vertex positions by fitting the faces to the
denoised normals. The computation of the adaptive patches is our main
contribution. We formulate this computation as local quadratic optimization
problems that we can control to obtain a desired behavior of the patch. We
compared our proposal with several algorithms proposed in the literature using
synthetic and real data.
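A rough sketch of the first of the two steps (generic bilateral filtering of face normals over fixed neighborhoods; the dissertation's adaptive patches replace these neighborhoods, and the vertex-update step is omitted):

    import numpy as np

    def filter_normals(normals, centroids, neighbors, sigma_s=0.5, sigma_r=0.3):
        # one bilateral pass over face normals; neighbors[i] lists adjacent face ids
        out = np.empty_like(normals)
        for i, nbrs in enumerate(neighbors):
            acc = normals[i].copy()
            for j in nbrs:
                ws = np.exp(-np.sum((centroids[i] - centroids[j]) ** 2) / (2 * sigma_s ** 2))
                wr = np.exp(-np.sum((normals[i] - normals[j]) ** 2) / (2 * sigma_r ** 2))
                acc += ws * wr * normals[j]          # similar normals weigh more
            out[i] = acc / np.linalg.norm(acc)       # renormalize to unit length
        return out

    normals = np.array([[0, 0, 1.0], [0.05, 0, 1.0], [0.7, 0, 0.7]])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    centroids = np.array([[0, 0, 0.0], [1, 0, 0], [2, 0, 0]])
    print(filter_normals(normals, centroids, [[1], [0, 2], [1]]))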
Jéferson Rômulo Pereira COELHO.
Uma
metodologia baseada
em otimização quadrática para geração de malhas geomecânicas de reservatórios.
[Title in English: A quadratic optimization approach for the reservoir
geomechanical mesh generation].
Ph.D. Thesis. Port. Presentation: 16/04/2018. 121 p. Advisor: Marcelo
Gattass. DOI
Abstract: Geomechanical mesh generation of complex reservoirs remains a
tedious task prone to errors. Recently proposed solutions based on analytical
reconstruction of the sub-surfaces are not capable of representing all the
geometric details of natural objects. This work proposes a discrete model where
the mesh vertices are positioned based on a convex quadratic optimization
process. The optimization problem seeks to guarantee smooth meshes that conform
with prescribed constraints. The resulting mesh therefore respects, as far as
possible, the finite volume mesh of the reservoir pay zone and the existing
horizons. Finally, the proposed methodology for geomechanical meshes can be
easily extended to model sub-surfaces present in the structural interpretation
and geological restoration.
Jefry SASTRE PEREZ.
An agent-based software framework
for machine learning tuning.
[Title in Portuguese: Um framework baseado em agentes para a calibragem de
modelos de aprendizado de máquina].
M.Sc. Diss. Eng. Presentation: 22/03/2018. 55 p. Advisor: Carlos José
Pereira de Lucena. DOI.
Abstract: Nowadays, the challenge of knowledge discovery is to mine massive
amounts of data available online. The most widely used approaches to tackle that
challenge are based on machine learning techniques. In spite of being very
powerful, those techniques require their parameters to be calibrated in order to
generate models with better quality. Such calibration processes are
time-consuming and rely on the skills of machine learning experts. Within this
context, this research presents a framework based on software agents for
automating the calibration of machine learning models. This approach integrates
concepts from Agent Oriented Software Engineering (AOSE) and Machine Learning
(ML). As a proof of concept, we first train a model for the Iris dataset and
then we show how our approach improves the quality of new models generated by
our framework. Then, we create instances of the framework to generate models for
a medical images dataset and finally we use the Grid Sector dataset for a final
experiment.
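For illustration, the underlying calibration task that the agents automate can be written as a plain grid search on the Iris dataset with scikit-learn (the agent layer itself is not shown, and the SVC model and parameter grid are assumptions):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}   # calibration space
    search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))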
Jerônimo Sirotheau de Almeida EICHLER.
Exploring RDF knowledge bases through serendipity patterns. [Title in Portuguese: Explorando bases de conhecimento em RDF através de
padrões de fortuidade]. Ph.D. Thesis. Eng. Presentation: 21/08/2018. 72 p. Advisor: Marco
Antonio Casanova. DOI.
Abstract:
Serendipity is defined as the discovery of a thing when one is not searching for
it. In other words, serendipity means the discovery of information that provides
valuable insights by unveiling unanticipated knowledge. The topic is
receiving increased attention in the literature, since the precision requirement
may be justifiably relaxed in order to improve user satisfaction. A field that
can benefit from serendipity is the Web of Data, an immense global data space
where data is publicly available. As more and more data become available in this
data space, searching and extracting relevant information becomes a challenging
task. This thesis contributes to addressing this challenge in two ways. First,
it presents a query orchestration process that introduces three strategies to
inject serendipity patterns in the query process. The serendipity patterns are
inspired by basic characteristics of serendipitous events, such as analogy and
disturbance, and can be used for augmenting the results with additional
information, suggesting alternative queries or rebalancing the results. Second,
it introduces a benchmark dataset that can be used to compare different
approaches for locating serendipitous content. The strategy adopted for
constructing the dataset consists of dividing the dataset into partitions based
on a global feature and linking entities from different partitions according to
the number of paths they share.
João Paulo Forny de MELO.
Predicting trends in the stock
market.
[Title in Portuguese: Predizendo tendências na bolsa de valores].
M.Sc. Diss. Eng. Presentation: 27/02/2018. 54 p. Advisor: Ruy Luiz Milidiú. DOI
Abstract: Investors are always looking for an edge. However, traditional
economic theories tell us that trying to predict short-term stock price
movements is wasted effort, since they approximate a random walk, i.e., a
stochastic or random process. Besides, these theories state that the market is
efficient enough to always incorporate and reflect all relevant information, making
it impossible to "beat the market". In recent years, with the growth of the web
and data availability in conjunction with advances in Machine Learning, a number
of works are using Natural Language Processing to predict share price variations
based on financial news and social network data. Therefore, strong evidence is
surfacing that the market can, to some extent, be predicted. This work describes
the development of an application based on Machine Learning to predict trends in
the stock market, i.e., positive, negative or neutral price variations with
minute granularity. We evaluate our system using B3 (Brasil Bolsa Balcão),
formerly BM&FBOVESPA, stock quotes data, and a dataset with the most relevant
topics of Google Search and its related articles, provided by the Google Trends
platform and collected, minute by minute, from 08/15/2016 to 07/10/2017. The
experiments show that this data provides useful information to the task at hand,
in which we achieve 69.24% accuracy predicting trends for the PETR4 stock,
creating some leverage to make profits possible with intraday trading.
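A small sketch of the prediction target (labels derived by thresholding minute returns; the threshold and data are assumptions, not the dissertation's exact labeling scheme):

    import numpy as np

    def trend_labels(prices, eps=0.0005):
        # label each minute-to-minute return: 1 positive, 0 neutral, -1 negative
        r = np.diff(prices) / prices[:-1]
        return np.where(r > eps, 1, np.where(r < -eps, -1, 0))

    quotes = np.array([15.30, 15.32, 15.32, 15.27, 15.29])  # toy minute closes
    print(trend_labels(quotes))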
Leonardo da Silva SOUSA. Understanding how developers
identify design problems in practice.
[Title in Portuguese: Entendendo como os desenvolvedores identificam problemas
de projeto na prática]. Ph.D. Thesis. Eng. Presentation: 30/08/2018.
210 p. Advisors: Alessandro Fabricio Garcia and Carlos José Pereira de Lucena. DOI.
Abstract: A design problem is the manifestation of one or more inappropriate
design decisions that negatively impact non-functional requirements. For
example, the Fat Interface, a problem that indicates when an interface exposes
non-cohesive services, hampers the extensibility and maintainability of a
software system. Despite its harmfulness, identifying a design problem in a
system is difficult, especially when the source code is the only available
artifact. Although researchers have been investigating techniques to help
developers in identifying design problems, there is little or no knowledge about
the process of identifying design problems. For instance, code smells,
microstructures that are a surface indication of design problems, have been used
in several techniques to support developers during the design problem identification.
However, it is not known whether code smells suffice to help developers
identify design problems. In particular, no study has tried to understand how
developers identify design problems in practice. Thus, in this thesis, we have
conducted a series of studies to understand design problem identification. In our
first two studies, we investigated the role that code smells play in supporting
developers during the design problem identification. Our results indicate that
code smells are relevant for developers in practice; for instance, they are
relevant to indicate elements that need to be refactored. However, we found that
code smells, despite their relevance, do not suffice in helping developers to
identify design problems. In this vein, we conducted another study to
investigate what indicators developers use in practice, and how they use them.
This study resulted in a theory about how developers identify design problems in
practice. For instance, the theory reveals the indicators that developers use,
how they use these indicators, and the characteristics of such indicators that
are perceived as helpful by developers. The results of our studies provided us
with a hitherto nonexistent understanding of the process of identifying design
problems. Moreover, our findings pave the way for the
elaboration of more effective techniques to identify design problems in the
source code.
Luis Gustavo ALMEIDA.
ALUMNI Tool: recuperação de
dados pessoais na Web em redes sociais autenticadas.
[Title in English: ALUMNI Tool: information recovery of personal data on
the Web in authenticated social networks]. M.Sc. Diss. Port. Presentation: 31/01/2018.
123 p. Advisor: Marco Antonio Casanova. DOI
Abstract: The use of search bots to collect information for a given context
has grown substantially in recent years. For example, search bots may be used to
capture data from professional social networks. In particular, such social
networks facilitate studying the professional trajectory of the alumni of a
given university, and answer several questions such as: how long does a former
student of PUC-Rio take to arrive at a management position? However, a common
problem in this scenario is the inability to collect information due to
authentication systems, preventing a search robot from accessing certain pages
and content. This dissertation addresses a solution to capture data, which
circumvents the authentication problem and automates the data collection
process. The proposed solution collects data from user profiles for later
database storage and analysis. The dissertation also contemplates the
possibility of adding several other data sources, with emphasis on a data
warehouse structure.
Luiz Guilherme de Oliveira PITTA.
Uma abordagem para o
problema de conectividade em plataformas multilaterais de IoT.
[Title in English: An approach to the connectivity problem in multilateral
IoT platforms]. M.Sc. Diss. Port. Presentation: 28/03/2018. 86 p. Advisor:
Markus Endler. DOI
Abstract: The popularization of the Internet of Things (IoT) opened up a
series of opportunities for the generation of new applications that were not
previously possible. In the current scenario of IoT there are marketplaces that
sell complete solutions for users with smart objects, gateways for data
transmission and providers that analyze these for a subscription fee. We start
from the view that in the future an "uberization" of IoT should occur, where
each person can offer sensor data and access to actuators to another and that
they will be categorized based on the QoS of the objects that provide them,
similarly to how commodities are classified today. In addition, there will be
multilateral platforms where this information can be negotiated in combination
with connectivity providers, to transmit data, and analytics. A platform that
provides this service must ensure that the data (and state) flow of objects is
continuous, without exposing to the user any connectivity problems between them
and the providers. That is, it must have mechanisms to detect problems and
quickly select new providers, all this in a scenario of intense data exchange.
This work presents as contributions a continuous connectivity problem detection
mechanism that uses a Publish-Subscribe paradigm to send problem identification
messages and an architectural solution of a platform based on marketplaces
concepts for IoT, which includes the "commoditization" of service providers and
a matchmaking service to select a combination of these to provide services to
the customer. A case study in the domain of marketplaces is conducted, with the
analysis of the platform services under several test scenarios and the
evaluation of the connectivity problem detection mechanism through the
simulation of different connection failures.
Marcelo Gomes de SOUZA.
Inversão sísmica acústica determinística utilizando redes neurais artificiais.
[Title in English: Deterministic acoustic
seismic inversion using artificial neural networks].
M.Sc. Diss. Port. Presentation: 18/04/2018. 69 p. Advisor: Marcelo Gattass.
DOI
Abstract: Seismic inversion is the process of transforming Reflection
Seismic data into quantitative values of petroleum rock properties. These
values, in turn, can be correlated with other properties, helping geoscientists to make a
better interpretation that results in a good characterization of an oil
reservoir. There are several traditional algorithms for Seismic Inversion. In
this work we review Color Inversion (Relative Impedance), Recursive Inversion,
Band-width Inversion and Model-Based Inversion. All four of these algorithms are
based on digital signal processing and optimization. The present work seeks to
reproduce the results of these algorithms through a simple and efficient
methodology based on Neural Networks and pseudo-impedance. This work presents an
implementation of the algorithms proposed in the methodology and tests its
validity on public seismic data that has an inversion made by the traditional
methods.
Mayra Carvalho ALBUQUERQUE.
Matheuristics for variants of the dominating set problem.
[Title in Portuguese: Metaheurísticas para variantes do problema do conjunto
dominante]. Ph.D. Thesis. Eng. Presentation: 08/02/2018. 87 p. Advisor: Thibaut
Victor Gaston Vidal. DOI.
Abstract: This thesis addresses the Dominating Set Problem, an NP-hard problem
with great relevance in applications related to wireless network design, data
mining, coding theory, among others. The minimum dominating set in a graph is a
minimal set of vertices such that each vertex of the graph belongs to it or is
adjacent to a vertex of this set. We study three variants of the problem:
first, in the presence of weights on vertices, searching for a dominating set
with smallest total weight; second, a variant where the subgraph induced by the
dominating set needs to be connected; and, finally, the variant that
encompasses these two characteristics. To solve these three problems, we
propose a hybrid algorithm based on tabu search with additional
mathematical-programming components, leading to a method sometimes called a
"matheuristic". Several additional techniques and large neighborhoods are also
employed to reach promising regions in the search space. Our experimental
analyses show the good contribution of all these individual components.
Finally, the algorithm is tested on the covering code problem, which can be
viewed as a special case of the minimum dominating set problem. The codes are
studied for the Hamming metric and the Rosenbloom-Tsfasman metric. For this
last case, several shorter codes were found.
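For contrast, a plain greedy baseline for the weighted variant (a well-known heuristic, far simpler than the tabu-search matheuristic of the thesis):

    def greedy_weighted_dominating_set(adj, w):
        # adj: {v: set(neighbors)}, w: {v: weight > 0}; greedy min-weight dominating set
        undominated, chosen = set(adj), set()
        while undominated:
            # pick the vertex with the best newly-dominated-per-weight ratio
            v = max(adj, key=lambda u: len(({u} | adj[u]) & undominated) / w[u])
            chosen.add(v)
            undominated -= {v} | adj[v]
        return chosen

    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
    print(greedy_weighted_dominating_set(graph, {v: 1.0 for v in graph}))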
Olouyèmi Ilahko Anne BÉNÉDICTE AGBACHI.
Identifying design problems with a visualization approach of smell
agglomerations.
[Title in Portuguese: Identificando problemas de design através de uma abordagem
de visualização para aglomerações de anomalias de código]. M.Sc. Diss. Eng.
Presentation: 13/04/2018. 100 p. Advisor: Alessandro Fabricio Garcia. DOI.
Abstract: Design problems are characterized by violations of design
principles affecting a software system. Because they often hinder software
maintenance, developers should identify and eliminate design problems whenever
possible. Nevertheless, identifying design problems is far from trivial. Due to
outdated and scarce design documentation, developers often have to analyze
the source code for identifying these problems. Past studies suggest that code
smells are useful hints of design problems. However, recent studies show that a
single code smell might not suffice to reveal a design problem. That is, around
80% of design problems are realized by multiple code smells, which interrelate
in the so-called smell agglomerations. Thus, developers can explore each smell
agglomeration to identify a design problem in the source code. However, certain
smell agglomerations are formed by several code smells, which makes it hard
to reason about the existence of a design problem. Visualization approaches have
been proposed to represent smell agglomerations and guide developers in
identifying design problems. However, those approaches provide a very limited
support to the identification of specific design problems, especially the ones affecting
multiple design elements. This dissertation aims to address this limitation by
proposing a novel approach for the visualization of smell agglomerations. We
rely on evidence collected from multiple empirical studies to design our
approach. We evaluate our approach with developers from both academy and
industry. Our results suggest that various developers could use our
visualization approach to accurately identify design problems, in particular
those affecting multiple program elements. Our results also point out to different
ways for improving our visualization approach based on the developers’
perceptions.
Paulo Ivson SANTOS NETTO.
Information
visualization for managing large-scale engineering projects.
[Title in Portuguese: Visualização de informação para gestão de grandes projetos
de engenharia]. Ph.D. Thesis. Eng. Presentation: 13/04/2018. 61 p. Advisor:
Waldemar Celes Filho.
Abstract: Large-scale engineering projects such as buildings and city
infrastructure require millions in investments and tight coordination between
expert teams across several years of design, construction, and operation. To
tackle these challenges, the Architecture Engineering and Construction (AEC)
industry is actively developing methods and tools based on Building Information
Modeling (BIM). BIM promotes the use of 3D CAD models as a centralized database
for all physical and functional characteristics of a facility and its related
project/life-cycle information. The inherent complexity of a BIM model offers a
critical visualization challenge: how to best display relevant information
required by different engineering analyses? This work contributes to answering
this question through both theoretical and practical approaches. The thesis
first presents a systematic literature review on the current state of
information visualization (VIS) in BIM research. The review analyzes in detail
currently employed visualizations in diverse use cases across an engineering
project’s life cycle. Based on these findings, the thesis describes the design
and evaluation of a novel 4D construction planning system that overcomes many
limitations of previous work. Engineering collaborators used the software to
review the real-world construction plans of an Oil & Gas industrial plant. The
developed visualizations made evident schedule uncertainties, workspace
conflicts and other constructability issues. The thesis contributes to BIM
research with important visualization guidelines and also contributes to VIS
research by raising awareness of interesting challenges in an increasingly
relevant engineering domain.
Pedro Mendonça Pinto ROCHA.
Melhoria de tempo na execução de workflows científicos distribuídos baseada na
localização informada de arquivos.
[Title in English: Lowering the execution time of scientific distributed
workflows based on informed file location]. M.Sc. Diss. Port. Presentation: 25/04/2018.
50 p. Advisor: Noemi de La Rocque Rodriguez. DOI
Abstract: For distributed scientific workflows, the main method of sharing
data between the execution nodes is through files. When those files are large,
a substantial portion of the workflow's execution time is spent transferring
the files between the storage server and the execution nodes. This work
proposes a strategy for transferring the files directly between the execution
nodes, anticipating the requirements of the next step of the workflow and
lowering the overhead of transferring the files to and from the storage server.
This dissertation analyzes scenarios in which this strategy proves to be
advantageous and scenarios in which it does not.
Pedro Elkind VELMOVITSKY. iBot: An agent-based
software framework for creating domain conversational agents.
[Title in Portuguese: iBot: um framework baseado em agentes para criar agentes
conversacionais em diferentes domínios]. M.Sc. Diss. Eng. Presentation: 05/07/2018.
70 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Chatbots are computer programs that interact with users using
natural language. Since its inception, the technology has advanced greatly and
cloud-based platforms from big companies allow developers to create intelligent
and efficient chatbots. However, there are not many development approaches to
the main modules of a chatbot that are flexible enough to allow the creation of
different chatbots for each domain, while maintaining a robust dialogue control
in the application. There have been some works that try to develop a more
flexible approach, each of them with their own advantages and disadvantages. One
of the most notable advantages is the use of multi-agent systems to distribute
and perform the tasks performed by the chatbot. In this context, this work
proposes a general and flexible architecture based on multi-agent systems for
building chatbots in any domain chosen by the developer, with dialogue control
in the application. This architecture uses an adaptation of the information
state approach, also using software agents, to perform dialogue management. To
validate the proposed architecture, a user scenario involving the
implementation of 4 proof-of-concept chatbots is analyzed and discussed.
Pedro Henrique Bandeira DINIZ. Detection of
regions of white matter lesions of the brain in T1 and flair images.
[Title in Portuguese: Detecção de regiões de lesões na substância
branca do cérebro em imagens T1 e FLAIR]. Ph.D. Thesis. Eng. Presentation:
08/05/2018. 98 p. Advisors: Marcelo Gattass and Aristófanes Corrêa Silva (UFMA). DOI.
Abstract: White matter lesions are non-static brain lesions
that have a prevalence rate of up to 98% in the elderly population, although
they are also present in the young population. Because they may be associated with several
brain diseases, it is important to detect them as early as possible. Magnetic
resonance imaging provides three-dimensional data for visualization and analysis
of soft tissues as it contains rich information about their anatomy. However,
the amount of data acquired for these images may be too much for manual
analysis/interpretation alone, representing a difficult and time-consuming task
for specialists. Therefore, this doctoral thesis presents four new computational
methods to automatically detect white matter lesions in magnetic resonance
images, based mainly on algorithms SLIC0 and Convolutional Neural Networks. Our
primary objective is to provide the necessary tools for specialists to
accelerate their works and suggest a second opinion. From the four proposed
methods, the one that achieved best results was applied on 91 magnetic resonance
images, and achieved an accuracy of 97.93%, specificity of 98.02% and
sensitivity of 90.12%, without using any candidate reduction techniques.
Pedro Igor Profírio SAMPAIO. A study on pervasive games based on the Internet
of Mobile Things.
[Title in Portuguese: Um estudo sobre jogos pervasivos baseados na Internet das
Coisas Móveis]. M.Sc. Dissertation. Eng. Presentation: 02/10/2018. 119 p. Advisors:
Bruno Feijó and Markus Endler.
DOI.
Abstract: Mobile pervasive games are a game genre that combines the real and
virtual worlds in a hybrid space, allowing interactions with not only the
virtually created game world, but also with the physical environment that
surrounds the players. The Internet of Mobile Things (IoMT) specifies situations
in which devices on the Internet of Things (IoT) can be moved or move
autonomously, while maintaining remote connectivity and accessibility from
anywhere on the internet. Following the huge success of recent mobile pervasive
games and the coming IoT boom, we provide an integration for all the technology
involved in the development of a mobile pervasive game that incorporates IoT
devices. We also propose a mobile pervasive game that evaluates the benefits of
the union of both fields. This game prototype explores ways of increasing the
experience of players through pervasive mechanics while taking advantage of the
player’s motivation to perform sensing tasks. It also incorporates serious
applications into the gameplay, such as the localization of facilities and
services.
Priscila ENGIEL.
Eunomia (Εúνομία): a requirement
engineering based compliance framework for software systems.
[Title in Portuguese: Eunomia (Εúνομία): um
framework de conformidade contínua para sistemas de software baseado na Engenharia de
Requisitos]. Ph.D. Thesis. Eng. Presentation: 07/02/2018. 141 p. Advisors: Julio
Cesar Sampaio do Prado Leite and John Mylopoulos. DOI.
Abstract: Laws and regulations affect software development, as they frequently
demand changes in software requirements to protect individuals and businesses
regarding security, privacy, governance, sustainability, and more. Legal
requirements can dictate new requirements or constrain existing ones. The
problem of software compliance is how to ensure that the software complies with
the norms that the legislation imposes. The problem is particularly challenging
because it combines difficult steps: 1) analyze legal documents, 2) extract
requirements from those documents, 3) identify conflicting requirements with
those already implemented in software, and 4) ensure that the software remains
compliant even with changes. Compliance is a continuous process: laws,
software, and the context within which the software system operates change
continuously. Works dealing with the compliance problem focus only on one or
two subjects: analyzing legal documents, extracting requirements, identifying
conflicts, or handling changes. This thesis deals with all these problems at
the same time; the idea is to extract requirements from legal text, compare
them with the software requirements, resolve the possible conflicts that may
arise, and continuously deal with changes in the environment, laws, and
requirements. To this end, this work proposes a framework that is composed of a
compliance process and continuous monitoring of environmental changes. The
framework deals with different types of laws (security, privacy, transparency,
health care) that are represented in explicit norms. The compliance process
supports identification, extraction, comparison, and conflict resolution to
help software compliance, by producing a compliant set of requirements. The
compliance process is based on semantic annotation and a goal model. The
semantic annotation helps to extract requirements from the law, using patterns.
The goal model is used to help the comparison between requirements and to
represent requirements in a formal and consistent requirement specification.
The process is tool-supported; some tools were reused (Desiree and NomosT) to
support each step. It was necessary to adapt the tools to the context of the
compliance process, creating a guideline, patterns, and heuristics. The
continuous monitoring is concerned with the changes that affect software
compliance and has mechanisms to ensure that, even with those changes, the
software will regain compliance. The compliance monitor is based on agents and
Non-Functional Requirements. The agents are represented using i*; the idea is
to show the collaboration between the agents to ensure continuous compliance.
The requirement specification of how each agent should behave was also
generated using Business Process Modeling Notation and the Desiree language.
The Non-Functional Requirements catalogue is used to help define
operationalizations for the software awareness. The framework validation was
made in two parts: first, the compliance process and then the entire proposed
framework. For the compliance process, effort and correctness were measured by
comparing the use of the proposed process and an ad hoc method. For the entire
framework, the example of monitoring the changes in the environment when an
automated car is crossing the border between Washington and Canada was used.
The study shows that context has a strong influence on the software
requirements, and nonconformity problems may incur penalties. The contribution
of this work is the Eunomia framework, which has a process and goal model
perspective with emphasis on monitoring that helps to deal with the compliance
challenge. The framework equips the requirements engineering team with a
systematic method. The Eunomia framework is a tool-supported and systematic
process that can be reused to reduce time and effort and to improve the quality
of the requirement specification, helping to create a software requirement
specification that remains compliant over time.
Raphael Araújo SAMPAIO. A study on ellipsoidal
clustering.
[Title in Portuguese: Um estudo sobre agrupamento baseado em distribuições
elípticas]. M.Sc. Diss. Port. Presentation: 24/03/2018.
67 p. Advisors: Marcus Vinicius Soledade Poggi de Aragão and Thibaut Victor Gaston Vidal. DOI.
Abstract: Unsupervised cluster analysis, the process of grouping sets of
points according to one or more similarity criteria, plays an essential role in
various fields. The two most popular algorithms for this process are k-means
and Gaussian Mixture Models (GMM). The former assigns each point to a
single cluster and uses the Euclidean distance as similarity measure. The latter
determines a probability matrix of points belonging to clusters, with the
Mahalanobis distance as the underlying similarity. Apart from the difference in
the assignment method - the so-called hard assignment for the former and soft
assignment for the latter - the algorithms also differ concerning the cluster
structure, or shape: k-means considers spherical structures in the data,
while the GMM considers ellipsoidal ones through the estimation of covariance
matrices. In this work, a mathematical optimization problem that combines
hard assignment with ellipsoidal cluster structures is studied, and
regularization techniques for covariance estimation are explored. In this
context, two meta-heuristic methods, a Random Swap perturbation and a hybrid
genetic algorithm, are adapted, and their impact on the performance of the
method is studied. The central objective is three-fold: to gain an understanding
of the conditions in which ellipsoidal clustering structures are more beneficial
than spherical ones; to determine the impact of covariance estimation with
regularization methods; and to analyze the effect of global optimization
meta-heuristics on unsupervised cluster analysis. Finally, in order to provide
grounds for comparing the present findings to future related work, a database
was generated together with an extensive benchmark containing an analysis of
variations in size, shape, number of clusters, and separability and their impact
on the results of different clustering algorithms. Furthermore, packages written
in the Julia language have been made available with the algorithms studied
throughout this work.
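
As a concrete illustration of the contrast described above, the following sketch
implements a hard-assignment clustering loop under the Mahalanobis distance,
with per-cluster covariance matrices regularized by shrinkage toward the
identity. It is a minimal toy in Python, not the method studied in the
dissertation; the shrinkage weight and the omission of the log-determinant
likelihood term are simplifying assumptions.

    import numpy as np

    def ellipsoidal_kmeans(X, k, n_iter=50, shrink=0.1, seed=0):
        # Hard assignment under squared Mahalanobis distance; covariances
        # are regularized by shrinkage toward the identity (an assumption).
        rng = np.random.default_rng(seed)
        n, p = X.shape
        means = X[rng.choice(n, size=k, replace=False)].astype(float)
        covs = np.array([np.eye(p)] * k)
        labels = np.full(n, -1)
        for _ in range(n_iter):
            d = np.empty((n, k))
            for j in range(k):
                diff = X - means[j]
                d[:, j] = np.einsum('ij,jk,ik->i', diff,
                                    np.linalg.inv(covs[j]), diff)
            new = d.argmin(axis=1)          # assign each point to one cluster
            if np.array_equal(new, labels):
                break                       # assignments stabilized
            labels = new
            for j in range(k):
                pts = X[labels == j]
                if len(pts) > p:            # enough points for a covariance
                    means[j] = pts.mean(axis=0)
                    S = np.cov(pts, rowvar=False)
                    covs[j] = (1 - shrink) * S + shrink * np.eye(p)
        return labels, means, covs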
Rebecca Porphírio da Costa de AZEVEDO.
A model-centric
sequential approach to outlier ensembles in a marketing science context.
[Title in Portuguese: Ensemble sequencial centrado em modelos para detecção de
outliers no contexto de Marketing Science]. M.Sc. Diss. Eng. Presentation: 06/09/2018.
78 p. Advisor: Hélio Côrtes Vieira Lopes.
DOI.
Abstract: The evolution of mobile devices in recent years has dramatically
increased the amount of data and information available to advertisers around
the world. The computational cost and the time needed to process these data and
distinguish true users from anomalies or noise have only increased. Thus, a
method to detect outliers could support Marketing researchers and
increase their precision in understanding online behavior. Recent studies show
that, so far, meta-algorithms have not been used to detect outliers.
Meta-algorithms tend to bring benefits because they reduce the dependency that a
single algorithm can generate. This work proposes a sequential model-centric
ensemble design that uses different algorithms for outlier detection to obtain
better results than those obtained by a single algorithm. The novelty of this
approach consists in: (i) exploring the sequential technique, using algorithms
that impact the next one and whose results are a combination of previously
obtained results; (ii) centering the design on the model rather than the
data, meaning the ensemble is applied to the whole dataset and not to
different subsamples; (iii) supporting Marketing researchers who need to apply
data science in a more robust and coherent way.
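
As a rough sketch of a sequential, model-centric ensemble, the code below
chains several detectors over the full dataset: each stage's scores trim the
strongest outliers from the next stage's fitting set, and the final score
averages the normalized per-stage scores. The choice of detectors, the trimming
quantile and the score combination are assumptions made for illustration; they
are not the design proposed in the dissertation.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    def sequential_outlier_ensemble(X, detectors, trim_frac=0.05):
        # Model-centric: every stage scores the *whole* dataset; sequential:
        # each stage's strongest outliers leave the next stage's fitting set.
        keep = np.ones(len(X), dtype=bool)
        stage_scores = []
        for det in detectors:
            det.fit(X[keep])
            s = -det.score_samples(X)               # higher = more anomalous
            s = (s - s.min()) / (s.max() - s.min() + 1e-12)
            stage_scores.append(s)
            keep &= s < np.quantile(s, 1.0 - trim_frac)
        return np.mean(stage_scores, axis=0)        # combined outlier score

    # Example: three differently seeded forests act as the chained stages.
    X = np.random.default_rng(0).normal(size=(500, 4))
    scores = sequential_outlier_ensemble(
        X, [IsolationForest(random_state=i) for i in range(3)])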
Reinier MOREJÓN NOVALES.
A multi-agent approach to
data mining processes: applications to health care.
[Title in Portuguese: Uma abordagem multiagente para processos de mineração de
dados: aplicações na área da saúde]. M.Sc. Diss. Eng. Presentation: 06/04/2018.
61 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Data mining is a hot topic that attracts researchers from
different areas, such as databases, machine learning, and multi-agent systems.
As a consequence of the growth of data volume, there is a growing need to obtain
knowledge from these large data sets, which are very difficult to handle and
process with traditional methods. Software agents can play a significant role
in performing data mining processes more efficiently. For instance,
they can perform selection, extraction, preprocessing and integration of
data as well as parallel, distributed, or multisource mining. This work proposes
an approach (in the form of a framework) that uses software agents to manage
data mining processes. In order to test its applicability, we use several data
sets related to the health care domain, representing some usage scenarios
(hypothyroidism, diabetes and arrhythmia).
Renato Sayão Crystallino da ROCHA.
Um filtro para arcos
em árvores de dependência.
[Title in English: A dependency tree arc filter]. M.Sc. Diss. Port. Presentation:
26/09/2018. 78 p. Advisor: Ruy Luiz Milidiú.
DOI.
Abstract: The dependency parsing task in Natural Language Processing consists of
analyzing the grammatical structure of a sentence written in natural language,
aiming to learn, identify and extract information related to its dependency
structure. These data can be structured as a tree, since every word in a
sentence has a head-dependent relation to another word in the same sentence.
Since Dependency Parsing is used in many applications like Machine Translation,
Semantic Role Labeling and Part-Of-Speech Tagging, researchers aiming to improve
the accuracy of their models approach this task in many different ways. One of
the approaches consists in viewing this task as a token classification problem,
using different classifiers for each sub-task and joining them in an incremental
way. These sub-tasks consist in classifying, for each head-dependent pair, the
Part-Of-Speech tag of the head, the relative position between the two words and
the distance between them. However, previous research using this approach shows
that the bottleneck lies in the distance classifier. Recurrent Neural Networks
are a kind of Neural Network that operates on sequences of vectors,
allowing for classification problems where both the input and the output are
sequences, making them a great choice for the problem at hand. This work studies
the use of Recurrent Neural Networks, specifically Long Short-Term Memory (LSTM)
networks, for the head-dependent distance classifier sub-task, treated as a
sequence-to-sequence classification problem. To evaluate its efficiency, this
work follows the line of previous research and makes use of the Portuguese
corpus of the Conference on Computational Natural Language Learning 2006 Shared
Task. The resulting model attains 95.27% precision, which is better than the
previous results obtained using incremental models.
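
Since the abstract frames the distance classifier as sequence-to-sequence
tagging, a minimal sketch of such a model may help: a bidirectional LSTM reads
the sentence and emits one distance class per token. The vocabulary size, the
bucketing of distances into 17 classes and all hyperparameters are illustrative
assumptions, not the dissertation's actual configuration.

    import torch
    import torch.nn as nn

    class DistanceTagger(nn.Module):
        # Sequence-to-sequence classifier: a BiLSTM reads the token sequence
        # and emits one distance-class label per token.
        def __init__(self, vocab_size, n_classes, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, token_ids):            # (batch, seq_len)
            h, _ = self.lstm(self.emb(token_ids))
            return self.out(h)                   # (batch, seq_len, n_classes)

    # Example with dummy data; real distances would be bucketed into classes.
    model = DistanceTagger(vocab_size=10000, n_classes=17)
    logits = model(torch.randint(0, 10000, (2, 12)))
    loss = nn.CrossEntropyLoss()(logits.transpose(1, 2),
                                 torch.randint(0, 17, (2, 12)))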
Ricardo Almeida VENIERIS.
Uma arquitetura de
software para apoio ao desenvolvimento de sistemas de diagnóstico médicos por imagem.
[Title in English: A software architecture to support development of medical
imaging diagnostic systems]. M.Sc. Diss. Port. Presentation: 07/02/2018. 99 p. Advisor: Carlos José Pereira de Lucena. DOI.
Abstract: Diagnostic support for medical imaging exams using Artificial
Intelligence techniques has been extensively discussed and academically
researched. Several computational techniques for the segmentation and
classification of such images are continuously created, tested and improved.
From these studies emerge highly specialized systems that use computer vision
and machine learning techniques to segment and classify exam images, using
knowledge acquired from large collections of expert-reported exams. In the
medical domain, there is still the difficulty of obtaining qualified databases
to support the extraction of knowledge by machine learning systems. In this work
we propose a software architecture that supports the development of diagnostic
support systems and allows: (i) the use of multiple exam types, (ii) support for
both segmentation and classification, (iii) the use not only of machine learning
techniques but also (iv) of the available medical domain knowledge. The
motivation is to facilitate the task of generating classifiers that, besides
searching for specific pathological markers, can be applied to different medical
objectives, such as specific diagnosis, triage and prioritization of care.
Rodrigo Mosconi de GOUVEA.
Serviços, processos e máquinas: um estudo de
metodologias para a realocação de processos nas máquinas.
[Title in English: Services, processes and machines: a study of methodologies
for the machine reassignment problem]. M.Sc. Diss. Port. Presentation: 20/04/2018. 145 p. Advisor: Marcus Vinicius Soledade Poggi de Aragão. DOI.
Abstract: The logical organization of a data center rests mainly on the
strategic decision of how to distribute services among machines so that
operational costs are as small as possible. Besides those costs, one must also
consider the interdependence among services and their distribution across
localities, in order to improve the quality of the product offered to customers.
This work explores the ROADEF 2012 challenge machine reassignment problem by
means of integer programming and column generation. It presents strategies to
address numerical issues. For column generation, it analyzes techniques to speed
up convergence, such as a preliminary generation of columns after each variable
addition and the stabilization of dual variables. At the end of the work, the
results obtained are compared with the best official results.
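
For readers unfamiliar with the technique, the generic shape of a
column-generation loop is sketched below. The master and price interfaces are
hypothetical placeholders, not code from the dissertation, and the sketch omits
the numerical safeguards and dual stabilization the abstract refers to.

    def column_generation(master, price, max_iters=10_000):
        # master.solve() -> dual values of the restricted master problem
        # master.add(column) -> add a new column (variable) to the master
        # price(duals) -> a column with negative reduced cost, or None
        for _ in range(max_iters):
            duals = master.solve()
            column = price(duals)
            if column is None:   # no improving column: LP relaxation optimal
                break
            master.add(column)
        return master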
Ruhan dos Reis MONTEIRO. A real-time reasoning service for the
Internet of Things.
[Title in Portuguese: Um serviço de raciocínio computacional em tempo real para a
Internet das Coisas]. M.Sc. Diss. Port. Presentation: 14/09/2018. 85 p. Advisor: Markus
Endler. DOI.
Abstract: The growth of the Internet of Things (IoT) has brought the
opportunity to create applications in several areas, with the use of sensors and
actuators. One of the problems encountered in IoT systems is the difficulty of
adding semantic relations to the raw data produced by the sensors and of being
able to infer new facts from these relations. Moreover, because many IoT
applications are online and need to react instantly to the sensor data they
collect, these data need to be analyzed in real time. Streams are sequences of
time-varying data elements that cannot be stored forever and queried on demand.
Streaming data need to be consumed quickly through continuous queries that
analyze and produce new relevant data, i.e. streams of output/result events. The
ability to infer new semantic relationships over streaming data is called Stream
Reasoning. We propose a semantic model and a mechanism for real-time data stream
processing and reasoning based on Complex Event Processing (CEP), RDF (Resource
Description Framework) and OWL (Web Ontology Language). This work presents a
middleware service that supports continuous reasoning on data produced by
sensors. The main advantages of our approach are: (a) it considers time as a key
relationship between pieces of information; (b) stream processing can be
implemented using CEP; (c) it is general enough to be applied to any Data Stream
Management System (DSMS). The service was developed in the Advanced
Collaboration Laboratory (LAC), and a case study in the field of fire detection
was conducted and implemented, illustrating the use of real-time inference on
streams.
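
A minimal flavor of the CEP-style reasoning described above, applied to the
fire-detection case study, is sketched below: a rule over a sliding time window
infers a higher-level event when two lower-level readings co-occur. The event
schema, window length and threshold are illustrative assumptions, not the rules
used in the thesis.

    from collections import deque

    def fire_events(stream, window_s=10.0, temp_thresh=60.0):
        # Sliding-window rule: infer a 'Fire' event when a hot temperature
        # reading and a smoke reading co-occur within the time window.
        window = deque()
        for ev in stream:                     # ev = {'type', 'value', 'ts'}
            window.append(ev)
            while ev['ts'] - window[0]['ts'] > window_s:
                window.popleft()              # time is the key relation
            hot = any(e['type'] == 'temperature' and e['value'] > temp_thresh
                      for e in window)
            smoke = any(e['type'] == 'smoke' for e in window)
            if hot and smoke:
                yield {'type': 'Fire', 'ts': ev['ts']}

    # Example: two readings three seconds apart trigger one inferred event.
    readings = [{'type': 'temperature', 'value': 72.0, 'ts': 0.0},
                {'type': 'smoke', 'value': 1.0, 'ts': 3.0}]
    print(list(fire_events(readings)))        # [{'type': 'Fire', 'ts': 3.0}]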
Rustam Câmara MESQUITA.
Geração semiautomática de
função de transferência para realce de fronteiras baseada em derivadas médias.
[Title in English: Semiautomatic generation of transfer function for boundary
highlight based on average derivatives]. M.Sc. Diss. Port. Presentation: 16/03/2018.
77 p. Advisor: Waldemar Celes Filho. DOI.
Abstract: Finding a good transfer function for volume rendering is a difficult
task that requires previous knowledge about the data domain itself. Therefore,
much research has been carried out in the past few years aiming to overcome
this barrier. However, only a few of these works have concentrated on
automatically detecting transfer functions. Most of them focus on
improving user control over the transfer function domain, indicating potentially
interesting regions and easing its manipulation through different histograms.
Also, the results are often presented in the medical field, through MRI, CT scan
or ultrasound images. Thus, with the purpose of showing that the concepts used
in these works can be exploited in the oil and gas field, this work proposes a
novel method to automatically detect transfer functions, aiming to visualize the
interfaces between different regions in the reservoir. The proposed approach is
also tested in detecting boundaries between different materials in medical
datasets and other widely used datasets.
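
The classic idea behind boundary-highlighting transfer functions is to make
voxels opaque where the scalar field changes quickly. The sketch below maps
normalized gradient magnitude to opacity; it is a simplified illustration of
that general idea, not the average-derivative method proposed in the
dissertation.

    import numpy as np

    def boundary_opacity(volume, gain=1.0):
        # Opacity from normalized gradient magnitude: voxels on interfaces
        # between homogeneous regions become opaque, flat regions fade out.
        gx, gy, gz = np.gradient(volume.astype(float))
        gmag = np.sqrt(gx**2 + gy**2 + gz**2)
        gmag /= gmag.max() + 1e-12
        return np.clip(gain * gmag, 0.0, 1.0)

    # Example: a two-region volume is opaque only near the interface plane.
    vol = np.zeros((32, 32, 32)); vol[16:] = 1.0
    alpha = boundary_opacity(vol)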
Thiago Delgado PINTO.
Unifying agile
specification quality control and implementation conformance assurance.
[Title in Portuguese: Unificando controle de qualidade de especificação ágil de
requisitos e garantia de conformidade de implementação]. Ph.D. Thesis. Eng. Presentation:
06/09/2018. 252 p. Advisor: Arndt von Staa.
DOI.
Abstract: Agile requirements engineering practices are being used more
commonly by software development teams. However, practices related to quality
control still depend heavily on testers' expertise and manual labor, whilst the
produced requirements specifications are often imprecise and hard to verify
statically by both stakeholders and computers. This thesis jointly tackles the
problems of statically verifying agile requirements specifications and of
generating full-featured test cases and automated test scripts from them. Its
main contributions include: (1) a new metalanguage, called Concordia, for
writing agile requirements specifications that can be used for both verification
and validation (V&V) activities involving stakeholders; (2) a novel approach to
generate full-featured, ready-to-use test cases and automated test scripts from
the requirements specified with the metalanguage; (3) the assessment, in an
industrial context, of the approach's ability to reduce the risk of remaining
defects and the costs of V&V.
Toni Tiago da Silva PACHECO. Buscas eficientes em
vizinhanças largas para o problema do caixeiro viajante com coleta e entrega.
[Title in English: Efficient large neighborhood searches for the traveling
salesman problem with pickup and delivery]. M.Sc. Diss. Port. Presentation: 24/03/2018.
67 p. Advisor: Thibaut Victor Gaston Vidal. DOI.
Abstract: In various distribution and logistics problems, products must be
collected at one location and delivered to a destination. Examples include the
transportation of disabled people, express mail services, medical supplies
logistics, etc. The routing problem addressed by this work, known as the
Traveling Salesman Problem with Pickup and Delivery (TSPPD), belongs to the
class of traveling salesman problems with precedence constraints. In this
problem, there is a one-to-one pickup-delivery mapping in which, for each
pickup-type client, there is exactly one associated delivery-type client.
Delivery clients can only be visited after the associated pickup. Since the
TSPPD generalizes the TSP, it is also an NP-hard problem, as the TSP is the
particular case of the TSPPD where each pickup coincides spatially with its
respective delivery. Variants with capacity constraints, time windows and
different loading policies have received more attention in the last decade,
although there are still significant advances to be made in terms of solution
quality for the basic version of the problem. To solve this problem, we propose
a hybrid metaheuristic algorithm with large neighborhoods efficiently explored
in O(n²). Our experiments demonstrate a significant reduction in computational
time as well as improved solution quality compared to previous work.
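
Every move explored in a TSPPD neighborhood must preserve the precedence
constraint that a delivery follows its pickup. A minimal check of that property
on a candidate tour is sketched below; the data layout (integer client ids and a
delivery-to-pickup map) is an assumption for illustration, not the
dissertation's data structures.

    def precedence_feasible(tour, pickup_of):
        # tour: client ids in visiting order (depot omitted for brevity).
        # pickup_of: maps each delivery client to its associated pickup.
        position = {client: i for i, client in enumerate(tour)}
        return all(position[pickup_of[d]] < position[d] for d in pickup_of)

    # Example: delivery 11 must come after pickup 1, and 12 after 2.
    print(precedence_feasible([1, 11, 2, 12], {11: 1, 12: 2}))   # True
    print(precedence_feasible([11, 1, 2, 12], {11: 1, 12: 2}))   # False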
Wallas Henrique Sousa dos SANTOS.
MCAD Shape Grammar: modelagem procedimental em modelos CAD massivos industriais.
[Title in English: MCAD shape grammar: procedural modeling for industrial
massive CAD models]. Ph.D. Thesis. Port. Presentation: 30/04/2018. 112 p. Advisor: Alberto Barbosa Raposo. DOI.
Abstract: 3D CAD models are tools used in industry for planning
and simulation before the construction or completion of tasks. In many cases,
such as in the oil and gas industry, these models can be massive, that is, they
carry large-scale detailed information in order to be sources of accurate
information. Interactive navigation in these models requires a combination of
appropriate hardware and software. Even nowadays, with modern GPUs, the direct
rendering of these models is not efficient, requiring classic approaches such as
culling non-visible objects and level-of-detail (LOD) techniques before sending
the data to the GPU. Therefore, for real-time rendering of massive CAD models,
we need scalable algorithms and data structures to efficiently process the
scene. This thesis proposes the MCAD (Massive Computer-Aided Design) Shape
grammar, an expansive grammar that procedurally generates objects to create 3D
scenes of massive models. In recent years, procedural modeling has drawn
attention for quickly creating 3D scenes using a compact representation, which
stores generation rules rather than an explicit representation of the scene. The
MCAD Shape grammar exploits the repetitions and patterns present in massive
models, reducing the memory footprint and procedurally processing the scene
efficiently. We converted real refinery models into the MCAD Shape grammar and
implemented a renderer for them. Results showed that our solution is scalable
and achieves high performance; to our knowledge, this is also the first time
procedural modeling has been used in this domain.
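
To illustrate the general mechanics of an expansive grammar, in which compact
rules are expanded on demand into repeated geometry, here is a toy rewriting
system in Python. The symbols and rules are invented for illustration and bear
no relation to the actual MCAD Shape grammar.

    import random

    # Toy expansive grammar: nonterminals rewrite into sub-symbols; any
    # symbol without a rule is a terminal that would map to scene geometry.
    RULES = {
        'plant':     [['pipe_rack', 'pipe_rack', 'vessel']],
        'pipe_rack': [['pipe'] * 4, ['pipe'] * 8],   # repetition stored once
    }

    def expand(symbol, rng=random):
        if symbol not in RULES:
            return [symbol]                          # terminal: emit geometry
        out = []
        for s in rng.choice(RULES[symbol]):
            out.extend(expand(s, rng))
        return out

    print(expand('plant'))   # 8, 12 or 16 'pipe' terminals plus one 'vessel'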