Monografias em Ciência da Computação

2011

ABSTRACTS

Departamento de Informática
Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio
Rio de Janeiro - Brazil


This file contains a list of the technical reports of the Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro - PUC-Rio, Brazil, which are published in our series Monografias em Ciência da Computação (ISSN 0103-9741), edited by Prof. Carlos Lucena. Please note that reports not available for download are available in print and can be obtained via the e-mail below.
For any questions, requests or suggestions, please contact:
Rosane Castilho bib-di@inf.puc-rio.br

Last update: 11/OCTOBER/2011

INDEX


[MCC01/11]
CASANOVA, M.A.; BREITMAN, K.K.; FURTADO, A.L.; VIDAL, V.M.P.; MACEDO, J.A.F. The role of constraints in linked data. 17 p. Eng. E-mail: casanova@inf.puc-rio.br

Abstract:
This paper investigates the role that constraints play in Linked Data in the context of a multi-step modeling process, involving three ontologies. The source ontology provides a local model of the exported data. The domain ontology provides a conceptual model of the application domain. The application ontology describes the external model of the exported data, using a subset of the vocabulary of the domain ontology. The main contributions of the paper are methods for constructing application ontology constraints and for defining the mappings between the three ontologies. The methods assume that the ontologies are written in an expressive family of languages and depend on a procedure to test logical implication, which explores the structure of sets of constraints. 
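
As a minimal illustration (not taken from the report), the Python sketch below shows one very simplified way in which the structure of a set of constraints can be used to test logical implication: for atomic inclusion constraints only, implication reduces to reachability in the graph whose edges are the inclusions. The function name implies and the sample axioms are hypothetical.

    # Illustrative sketch (not from the report): testing implication of simple
    # inclusion constraints (e.g., "Teacher is included in Person") by exploiting
    # their structure as a directed graph and checking reachability.
    from collections import defaultdict, deque

    def implies(constraints, query):
        """constraints: iterable of (sub, super) inclusion axioms.
        query: a (sub, super) pair; returns True if it follows by
        reflexivity/transitivity of inclusion."""
        graph = defaultdict(set)
        for sub, sup in constraints:
            graph[sub].add(sup)
        start, goal = query
        seen, frontier = {start}, deque([start])
        while frontier:
            node = frontier.popleft()
            if node == goal:
                return True
            for nxt in graph[node] - seen:
                seen.add(nxt)
                frontier.append(nxt)
        return False

    axioms = [("Teacher", "Person"), ("Person", "Agent")]
    print(implies(axioms, ("Teacher", "Agent")))  # True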


[MCC02/11]
VIANA, C.J.M.; LIFSCHITZ, S.; HAEUSLER, E.H.; MIRANDA, A.B. Protein World Database: definição e implementação de estruturas organizacionais. 32 p. Port. E-mail: sergio@inf.puc-rio.br

Abstract:
The fast development of new genome sequencing technologies is contributing to increase the scale and resolution of many comparative genomic studies. In this context, computational analysis techniques are becoming indispensable tools for a better understanding of the relationships between the organisms being studied. The main challenge faced by many researchers is the analysis of the data obtained from sequence alignments in order to better characterize the studied organisms, both in terms of their biological features and of their relationships with the environment. This work aims to contribute to the field of genomic analysis by modifying the methodology and implementation of some genomic sequence comparison techniques, thereby supporting the Genome Comparison Project (GCP). The contribution lies in changes to the way orthologous/specific genes and regions are constructed, and also in the way database techniques are used to optimize the access to and use of data from sequence comparisons of more than 400 organisms.


[MCC03/11]
VIANA, C.J.M.; LIFSCHITZ, S.; HAEUSLER, E.H.; MIRANDA, A.B.  Processamento de dados Semânticos: um estudo de caso com o Protein World Database. 31 p. Port. E-mail: sergio@inf.puc-rio.br

Abstract:
The Semantic Web has brought not only many opportunities, but also many new challenges to the data management problem. For instance, biological researchers make their genome findings publicly available but, as with much data on the web, those findings are unrelated and difficult to integrate with other data. Semantic web technologies could offer possibilities such as automatically inferring relationships, which in turn could help to cure diseases. However, as described by Hey et al., scientific data is being generated at exponentially growing rates, which makes its processing even more resource consuming. In an effort to assist researchers, this work proposes to perform semantic data processing in a flexible and scalable way in order to enable inference over available genomic data. This paper presents a hybrid cloud architecture used for processing and sharing large amounts of biological data, exploiting the MapReduce programming model and semantic web technologies to enable inference over the generated semantic genomic data and related data.
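
As a toy illustration of the MapReduce programming model mentioned above (a sketch, not the hybrid cloud architecture described in the report), the Python fragment below counts predicate occurrences over partitions of RDF-like triples; all names and data are hypothetical.

    # Illustrative sketch (not from the report): the MapReduce style of
    # processing applied to RDF-like triples -- here, simply counting how
    # often each predicate occurs. Function names are hypothetical.
    from collections import defaultdict
    from itertools import chain

    def map_phase(triples):
        # emit (key, value) pairs: one count per predicate occurrence
        for subj, pred, obj in triples:
            yield pred, 1

    def reduce_phase(pairs):
        # group by key and aggregate the values
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value
        return dict(totals)

    partitions = [
        [("geneA", "encodes", "protA"), ("protA", "interactsWith", "protB")],
        [("geneB", "encodes", "protB")],
    ]
    pairs = chain.from_iterable(map_phase(p) for p in partitions)
    print(reduce_phase(pairs))  # {'encodes': 2, 'interactsWith': 1}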


[MCC04/11]
VIANA, C.J.M.; LIFSCHITZ, S.; HAEUSLER, E.H. Um estudo sobre fluxos de dados e bancos de dados biológicos. 20 p. Port. E-mail: sergio@inf.puc-rio.br

Abstract:
With the advances in genome sequencing techniques, together with the emergence of several repositories of biological data, computational techniques have become indispensable tools for a better characterization and understanding of the organisms under study. One way to analyze a genome is to compare its sequences with sequences from previously studied genomes in order to determine their similarities. Frequent updates in the biological data repositories lead to the comparison reprocessing problem, which consists in avoiding new comparisons between sequences of the genome under study and repository sequences that have already been compared. These comparisons are performed by biological comparison tools such as BLAST. This paper describes the problem, relating it to data flows, viewing the frequent updates as a biological data stream, and addressing data stream techniques that can be useful to deal with the problems related to the reprocessing of the comparisons.
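
As a minimal sketch of the reprocessing-avoidance idea (not code from the report), the Python fragment below caches already-compared sequence pairs so that a repository update only triggers comparisons against newly added sequences; run_comparison is a hypothetical stand-in for a tool such as BLAST.

    # Illustrative sketch (not from the report): avoiding the reprocessing of
    # comparisons by caching the pairs that were already compared, so that a
    # repository update only triggers comparisons against the new sequences.
    def run_comparison(query_id, subject_id):
        # hypothetical stand-in for an external comparison tool such as BLAST
        return {"query": query_id, "subject": subject_id, "score": 0.0}

    def compare_incrementally(study_genome, repository, already_compared):
        results = []
        for query_id in study_genome:
            for subject_id in repository:
                pair = (query_id, subject_id)
                if pair in already_compared:
                    continue  # skip pairs compared in a previous run
                results.append(run_comparison(query_id, subject_id))
                already_compared.add(pair)
        return results

    cache = set()
    compare_incrementally(["g1"], ["s1", "s2"], cache)        # two comparisons
    compare_incrementally(["g1"], ["s1", "s2", "s3"], cache)  # only (g1, s3)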


[MCC05/11]
CASANOVA, M.A.; BARBOSA, S.D.J.; BREITMAN, K.K.; FURTADO, A.L. Three decades of research on database design at PUC-Rio. 14 p. Eng. E-mail: casanova@inf.puc-rio.br

Abstract:
Research on database design at PUC-Rio dates back to the late seventies and covers a broad range of topics, from the early development of the relational model to recent applications of semiotic concepts to the design and specification of information systems. This paper briefly reviews some of the major contributions of the group, from the perspective of the authors. It organizes the contributions according to the data model or to the underlying disciplines that they are based on. Within each section, the presentation follows a chronological order as much as possible.


[MCC06/11]
BRANCO, A.; MOTTA, J.A.; DE SOUZA, C.S. Utilizando Engenharia Semiótica na construção de uma ferramenta de simulação para RSSF.  20 p. Eng. E-mail: clarisse@inf.puc-rio.br

Abstract:
Building a WSN application requires a complete view of the project from the developer, including environment and hardware details. We believe this view can be acquired through a tutorial that presents some WSN concepts supported by a simulation tool. Our goal is to use some Semiotic Engineering concepts to build a simulator for WSN and to evaluate the use of AgentSheets as a tool to build this simulator. This report presents the concepts behind the tool development, details the design decisions about the tool, and ends with a conclusion about the whole process and the simulator developed.


[MCC07/11]
SILVA, F.A.G.; FURTADO, A.L. Information gathering events in story plots. 27 p. Eng. E-mail: furtado@inf.puc-rio.br

Abstract:
Story plots must contain, besides physical action events, a minimal set of information-gathering events, whereby the various characters can form their beliefs on the facts of the mini-world in which the narrative takes place. Three kinds of such events will be considered here, involving, respectively, inter-character communication, perception and reasoning. Multiple discordant beliefs about the same fact are allowed, making necessary the introduction of higher-level facilities to rank them and to exclude those that violate certain constraints. Since the proposed package was designed to run in a plan-based context, other higher-level facilities are also available for pattern-matching against typical plan libraries or previously composed plots. A prototype logic programming implementation is fully operational. A simple example is used throughout the presentation.


[MCC08/11]
STAA, A.v. Overview of the Talisman Version 5 software engineering meta-environment. 50 p. Eng. E-mail: arndt@inf.puc-rio.br

Abstract:
This report presents an overview of Talisman's version 5 functionality. Talisman is a computer aided software engineering meta-environment. It focuses strongly on model driven tools. It provides means to build software development and maintenance environments composed of a harmonious collection of representation languages and tools. The set of representation languages and tools may cover a very wide variety of development and maintenance activities. Talisman operates on a net of workstations, each containing an environment instance providing tools to support some of the activities of a specific software development and maintenance process. The collection of environment instances supports a large portion of the activities of a given development process. Talisman stores fine grained objects in a distributed repository. The base schema and meta-schema of this repository as well as the definition of user interfaces, representation languages and tools are kept in a definition base. Definition bases are derived from an environment base which contains all facts about supported representation languages and tools. The environment base is used by the environment builder to create and maintain representation languages and to adapt tools to the specific needs of a particular project. One of the basic aims of Talisman is to compose and maintain code and other artifacts from high level specifications relying heavily on model driven activities. The result of the development using Talisman is a hyper-document interrelating all artifacts that constitute the target system. The construction and maintenance of this hyper-document is achieved by successive transformations, modifications and verifications of a variety of models. To define and fine-tune these tools, Talisman uses an internal programming language, which specializes tools and activities, such as editors, code composers, representation transformers, representation verifiers and hyper-document navigation control.


[MCC09/11]
FERREIRA, J.J.; DE SOUZA, C.S. Agentes no AgentSheets®: como o AgentSheets® comunica o conceito de agentes. 19 p. Eng. E-mail: clarisse@inf.puc-rio.br

Abstract:
This paper presents findings and observations from a study of AgentSheets® focused on how the designers of the tool communicate the concept of agent. The concept of agent is a key concept in AgentSheets®; even the tool name carries the related term. The goal of the study was to investigate how this concept is communicated to users of the tool. The study applied the Semiotic Inspection Method (SIM), triangulated with assisted user interaction. The observations raised promising indications for future investigations. The paper presents research carried out during an Introduction to HCI (Human-Computer Interaction) course at the Informatics Department of PUC-Rio.


[MCC10/11]
ARAÚJO, E.C.; REDLICH, L.R.; LAGO, V.; MORENO, M.; SOARES, L.F.G. Nested Context Language 3.0 Parte 14: Suíte de testes de conformidade para o Ginga-NCL. 20 p. Eng. E-mail: soares@inf.puc-rio.br

Abstract:
This paper describes the development of the compliance test suite for Ginga-NCL, part of the ISDB-TB standard for digital terrestrial TV and of the ITU-T H.761 Recommendation for IPTV services. Both the test suite specification, i.e., the set of its test cases, and the several possibilities for its application are discussed. The peculiarities of developing conformance tests for systems based on declarative languages are examined, in particular those found in developing tests for middleware systems aimed at XML-based declarative environments. In this respect, the paper points out the contributions the proposed suite brings to the state of the art. Because it is an extensible suite, the article also presents the rules, adopted by the ITU-T, for its extension.


[MCC11/11]
FURTADO, A.L. IDB - an environment for experimenting with intelligent database-resident information systems. 41 p. Eng. E-mail: furtado@inf.puc-rio.br

Abstract:
The IDB tool aims to provide an environment for the conceptual specification and testing of intelligent information systems. The IDB environment comprises three different options. Initially, it runs in the workspace of a Prolog program. Next, still under the control of this program, it operates upon database-resident relational tables via an ODBC interface. Finally, leading to an operational stage, it provides automatically generated stored procedures, which enforce integrity and collect execution traces for continuing maintenance and redesign purposes. The tool relies on plan generation to conduct experiments, both in main memory and over the Oracle database.


[MCC12/11]
NUNES, B.P.; MERA, A.; CASANOVA, M.A.; BREITMAN, K.K.; PAES LEME, L.A.P. Complex matching of RDF datatype properties. 12 p. Eng. E-mail: casanova@inf.puc-rio.br

Abstract:
Property mapping is a fundamental component of ontology matching and yet hardly any technique goes beyond the identification of single property matches. However, real data often requires some degree of composition, trivially exemplified by the mapping of FirstName, LastName to FullName. Genetic programming offers an alternative, but the solution space is so large that the required computation effort would be prohibitive. This paper proposes a two-phase instance-based technique for complex datatype property matching. In the first phase, the technique computes the estimated mutual information matrix of the property values to (1) find simple, 1:1 matches, and (2) compute a list of possible complex matches. In the second phase, it applies genetic programming to a much reduced search space to find complex matches. The paper concludes with experimental results that illustrate how the technique works and indicate that the technique obtains better results than those achieved by separately using the estimated mutual information matrix or genetic programming.
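
To make the first phase more concrete, the Python sketch below (a simplification, not the report's implementation) estimates the mutual information between the value distributions of two properties over aligned instances, the kind of quantity collected in the estimated mutual information matrix; all names and data are hypothetical.

    # Illustrative sketch (not from the report): estimating the mutual
    # information between the values of two datatype properties from aligned
    # instances. High values suggest a candidate (possibly complex) match.
    import math
    from collections import Counter

    def estimated_mutual_information(xs, ys):
        n = len(xs)
        px, py = Counter(xs), Counter(ys)
        pxy = Counter(zip(xs, ys))
        mi = 0.0
        for (x, y), c in pxy.items():
            p_joint = c / n
            mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)), 2)
        return mi

    first = ["Ana", "Bia", "Ana"]
    full = ["Ana Souza", "Bia Lima", "Ana Souza"]
    print(estimated_mutual_information(first, full))  # high for related properties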


[MCC13/11]
SKYRME, A.R.A.; RODRIGUEZ, N.L.R.; MUSA, P.M.; IERUSALIMSCHY, R.; SILVESTRE, B.O. Embedding concurrency: a Lua case study. 9 p. Eng. E-mail: noemi@inf.puc-rio.br

Abstract:
Concurrency support can be considered in the design of a programming language or provided by constructs added, often by means of libraries, to a language with limited or no concurrency features. The choice between these approaches is not an easy one: explicitly concurrent languages offer efficiency and syntactic elegance, while libraries offer greater flexibility. In this paper we discuss yet another approach, available to scripting languages: embedding concurrency. We take the Lua programming language and explain the mechanisms it offers to support embedding. Then, using two concurrent systems as examples, we show how these mechanisms can be useful for creating lightweight concurrency models.


[MCC14/11]
MARTINELLI, R.; POGGI, M.; SUBRAMANIAN, A. Improved bounds for large scale capacitated arc routing problem. 17 p. Eng. E-mail: poggi@inf.puc-rio.br

Abstract:
The Capacitated Arc Routing Problem (CARP) stands among the hardest combinatorial problems to solve or to find high quality solutions for. This becomes even more evident when dealing with large instances. This paper investigates methods to improve the lower and upper bounds of instances on graphs with over two hundred vertices and three hundred edges, dimensions that, today, can be considered large scale. On the lower bound side, we propose to exploit the speed of a dual ascent heuristic to generate capacity cuts. These cuts are then improved with a new exact separation chained to the linear program resolution that follows the dual heuristic. On the upper bound side, we apply a modified Iterated Local Search procedure to Capacitated Vehicle Routing Problem (CVRP) instances obtained through a transformation of the original CARP instances. Computational experiments were carried out on the set of large instances from Brandão and Eglese and also on the regular size set. The experiments on the latter allow evaluating the quality of the proposed lower bounds, while the ones on the former yield improved lower and upper bounds for the whole set of larger instances.


[MCC15/11]
HORÁCIO, J.S.; COSTA, A.D.; LUCENA, C.J.P.; FIORINI, S.T. GearDB: uma nova ferramenta para geração de dados. 14 p. Eng. E-mail: lucena@inf.puc-rio.br

Abstract:
This paper presents GearDB, a new tool for automatically generating data for databases created with different Database Management Systems (DBMSs). Aiming at a more powerful result, we decided to integrate GearDB with JMeter, since JMeter is a tool used for load and performance testing, which usually requires large volumes of data. In addition, a large industrial system responsible for controlling the inventory and supply of petroleum and derived products (e.g. gasoline, kerosene, etc.) is used to illustrate a real situation in which GearDB has proven to be useful.
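
As a minimal illustration of automatic data generation (not GearDB's actual API), the Python sketch below fills a relational table with randomly generated rows, using the built-in sqlite3 module so the example is self-contained.

    # Illustrative sketch (not from the report, and not GearDB's API): random
    # generation of rows for a relational table, here with Python's built-in
    # sqlite3 module.
    import random
    import sqlite3
    import string

    def random_name(size=8):
        return "".join(random.choice(string.ascii_lowercase) for _ in range(size))

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    rows = [(i, random_name(), round(random.uniform(1, 100), 2)) for i in range(1000)]
    conn.executemany("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", rows)
    conn.commit()
    print(conn.execute("SELECT COUNT(*) FROM product").fetchone())  # (1000,)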


[MCC16/11]
FERNANDES, E.M.; MILIDIÚ, R.L. Entropy-guided feature generation for large margin structured learning. 15 p. Eng. E-mail: milidiu@inf.puc-rio.br

Abstract:
Structured learning consists in learning a mapping from inputs to structured outputs by means of a sample of correct input-output pairs. Many important problems fit in this setting. For instance, dependency parsing involves the recognition of a tree underlying a sentence. Feature generation is an important subtask of structured learning modeling. Usually, it is partially solved by a domain expert who builds complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to generate features and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. We call this method entropy-guided, since it is based on the conditional entropy of local output variables given some basic features. We have evaluated our method on four computational linguistic tasks. We compare the proposed method with two important alternative feature generation methods, namely manual template generation and polynomial kernel functions. Our results show that entropy-guided feature generation outperforms both alternatives and, furthermore, presents additional advantages. The proposed method is cheaper than manual templates and much faster than kernel methods. Furthermore, the developed systems present performances comparable to the state of the art and, particularly on Portuguese dependency parsing, remarkably reduce the previous smallest error by more than 15%. We further propose to model two complex natural language processing problems that, as far as we know, have never been approached by structured learning methods before, namely quotation extraction and coreference resolution.
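
As a small illustration of the entropy-guided criterion (a sketch under our own assumptions, not the report's implementation), the Python fragment below computes the conditional entropy H(Y|X) of an output label given a basic feature, which can be used to rank candidate features for template generation; all names and data are hypothetical.

    # Illustrative sketch (not from the report): ranking basic features by the
    # conditional entropy H(Y|X) of the output label Y given the feature X.
    # Lower conditional entropy means the feature is more informative.
    import math
    from collections import Counter, defaultdict

    def conditional_entropy(feature_values, labels):
        n = len(labels)
        by_value = defaultdict(list)
        for x, y in zip(feature_values, labels):
            by_value[x].append(y)
        h = 0.0
        for x, ys in by_value.items():
            p_x = len(ys) / n
            counts = Counter(ys)
            h_y_given_x = -sum((c / len(ys)) * math.log(c / len(ys), 2)
                               for c in counts.values())
            h += p_x * h_y_given_x
        return h

    pos_tags = ["DT", "NN", "VB", "DT", "NN"]              # basic feature X
    chunk_tags = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP"]  # output label Y
    print(conditional_entropy(pos_tags, chunk_tags))  # 0.0: X fully determines Y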


[MCC17/11]
LIMA, E.S.; FEIJÓ, B.; FURTADO, A.L.; CIARLINI, A.E.M.; POZZER, C.T.  Automatic video editing for video-based interactive storytelling. 11 p. Eng. E-mail: bfeijo@inf.puc-rio.br

Abstract:
The development of interactive storytelling systems with the quality of feature films is a hard challenge. A promising approach to this problem is the use of recorded videos to dramatize the stories. However, in this approach, automatic video editing is a critical stage. In this paper, we present an effective method based on cinematography principles that automatically edits segments of videos in real time, while the plot is being generated by an interactive storytelling system.


[MCC18/11]
FURTADO, A.L. Semiotic relations and proof methods. 8 p. Eng. E-mail: furtado@inf.puc-rio.br

Abstract:
When a direct proof of a statement S seems hard or even impossible to obtain, there may exist another statement (or set of statements) S*, somehow related to S, on the basis of which S can be proved. In order to investigate what options can be used to move from S to S*, four kinds of semiotic relations inspired by the four master tropes of semiotic research are briefly reviewed. Specifically, our syntagmatic, paradigmatic, antithetic and meronymic relations correspond, respectively, to metonymy, metaphor, irony and synecdoche. It is suggested that these four semiotic relations determine the options to move from S to S*, leading to proof by inference, proof by analogy, proof by contradiction, and proof by cases.


[MCC19/11]
LEAL, A.L.C. Um estudo de caso sobre o perfil de risco de adoção de boas Práticas em desenvolvimento de software em micro empresas com base na abordagem GQM. 32 p. Port. E-mail: bib-di@inf.puc-rio.br

Abstract:
An evaluation of the risk involved in the adoption of several software development practices in different companies from the Arranjo Produtivo Local de Viçosa is presented. We consider the practices proposed by the Plan-driven and Agile methods. From this evaluation, our objective is to formulate an adoption plan that reduces both adoption risk and learning curve. This paper presents initial results from an empirical evaluation conducted at two software companies.


[MCC20/11]
LEAL, A.L.C.; SOUSA, H.P. Modelagem intencional de políticas e implementação de agentes de monitoração de transparência em sistemas de software. 16 p. Port. E-mail: bib-di@inf.puc-rio.br

Abstract:
This paper presents a proposal for operationalizing the monitoring of transparency in software by means of multi-agent systems. It is a first effort towards understanding policies and intentional models for monitoring agents.


[MCC21/11]
LEAL, A.L.C. Relações entre riscos, erros, vulnerabilidades e boas práticas de software - uma análise com base em diagramas de influência: o caso da SANS/MITRE.  20 p. Port. E-mail: bib-di@inf.puc-rio.br

Abstract:
This work presents an analysis of the errors reported by SANS/MITRE, establishing a relationship among these errors, software vulnerabilities and best practices through a causal loop diagram.

 
[MCC22/11]
LEAL, A.L.C.; SOUSA, H.P. Contexto de transparência aplicada a modelos de negócio e a sistema de Software. 17 p. Port. E-mail: bib-di@inf.puc-rio.br

Abstract:
This work analyses transparency in the context of software and business processes, establishes a relationship among the transparency operations applied to the Lattesscholar software, and evaluates aspects of the transparency paradigm in business process models.