Author: "Renée J. Miller" / Topic: schema (psychology) - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Renée J. Miller" returned 28 results in total.



1. Open data integration

Author: Renée J. Miller
Subjects: Focus (computing), Computer science, General Engineering, Data discovery, 02 engineering and technology, computer.software_genre, Data science, Transparency (behavior), Task (project management), Open data, 020204 information systems, Schema (psychology), Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, computer, Data integration
Abstract: Open data plays a major role in supporting both governmental and organizational transparency. Many organizations are adopting Open Data Principles promising to make their open data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However, scientists generally do not have a priori knowledge about what data is available (its schema or content). Nevertheless, they want to be able to use open data and integrate it with other public or private data they are studying. Traditionally, data integration is done using a framework called query discovery where the main task is to discover a query (or transformation) that translates data from one form into another. The goal is to find the right operators to join, nest, group, link, and twist data into a desired form. We introduce a new paradigm for thinking about integration where the focus is on data discovery, but highly efficient internet-scale discovery that is driven by data analysis needs. We describe a research agenda and recent progress in developing scalable data-analysis or query-aware data discovery algorithms that provide high recall and accuracy over massive data repositories.
Published: 2018
Full Text: View/download PDF
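The abstract casts integration as discovering data that can be joined with what an analyst already has. The following is a minimal sketch of one common scoring idea. It assumes each column is treated as a set of values and that joinability is measured by set containment, an illustrative choice that is not necessarily the measure used in this work. The function and column names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def containment(query: Iterable[str], candidate: Iterable[str]) -> float:
    """Fraction of distinct query values that also appear in the candidate column."""
    q = set(query)
    if not q:
        return 0.0
    return len(q & set(candidate)) / len(q)


def rank_joinable_columns(
    query: Iterable[str], lake: Dict[str, Iterable[str]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k columns of a (hypothetical) data lake with the highest containment."""
    q = set(query)
    scored = [(name, containment(q, values)) for name, values in lake.items()]
    # Highest score first; ties broken by column name so the output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    query_column = ["Toronto", "Montreal", "Vancouver", "Calgary"]
    lake = {
        "cities.name": ["Toronto", "Montreal", "Vancouver", "Ottawa"],
        "airports.city": ["Toronto", "Calgary"],
        "parks.province": ["Ontario", "Quebec"],
    }
    print(rank_joinable_columns(query_column, lake, k=2))
    # [('cities.name', 0.75), ('airports.city', 0.5)]
```

A brute-force scan like this touches every column. On its own, it would not meet the internet-scale efficiency the abstract aims for. At that scale, indexes or sketches would be needed in place of the exact set intersection.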

2. The Future of Data Integration

Author: Renée J. Miller
Subjects: Correctness, Computer science, Ontology-based data integration, 02 engineering and technology, computer.software_genre, Data science, Data mapping, Visualization, Data exchange, 020204 information systems, Schema (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, computer, Data integration
Abstract: The value of data explodes when it is integrated. In this talk, I present some important innovations in data integration over the last two decades. These include data exchange [1], which provides a foundation for reasoning about the correctness of transformed data, and the use of declarative mappings in integration [2]. I discuss how data mining has been used to facilitate data integration, including constraint discovery [3], mapping discovery [4], and in schema discovery to combat database decay and facilitate integration [5,6]. I present some important new data integration challenges that arise in data science. These include the use of mining for query and visualization recommendation over massive data lakes [7] and data set search, finding datasets of interest at interactive speeds [8].
Published: 2017
Full Text: View/download PDF

3. Discovering linkage points over web data

Author: Lucian Popa, Ken Q. Pu, Oktie Hassanzadeh, Mauricio A. Hernández, Soheil Hassas Yeganeh, Howard Ho, and Renée J. Miller
Subjects: Computer science, Star schema, Schema (psychology), General Engineering, Schema analysis, Data mining, computer.software_genre, computer, Schema matching, Record linkage, Data integration, Superkey
Abstract: A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources, and that can be used to match records or entities across sources. This is usually performed using a match operator, that associates attributes of one database to another. However, the massive growth in the amount and variety of unstructured and semi-structured data on the Web has created new challenges for this task. Such data sources often do not have a fixed pre-defined schema and contain large numbers of diverse attributes. Furthermore, the end goal is not schema alignment as these schemas may be too heterogeneous (and dynamic) to meaningfully align. Rather, the goal is to align any overlapping data shared by these sources. We will show that even attributes with different meanings (that would not qualify as schema matches) can sometimes be useful in aligning data. The solution we propose in this paper replaces the basic schema-matching step with a more complex instance-based schema analysis and linkage discovery. We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data. We experimentally evaluate the effectiveness of our proposed algorithms in real-world integration scenarios in several domains.
Published: 2013
Full Text: View/download PDF
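As a rough illustration of instance-based linkage discovery, the sketch below compares the value sets of every attribute pair across two schemaless record collections. One lexical analyzer (whitespace trimming and lowercasing) and one similarity function (Jaccard) stand in for the library of analyzers and similarity functions the abstract mentions. The names, the threshold, and the sample data are assumptions for illustration.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def normalize(value: str) -> str:
    """A deliberately simple lexical analyzer: collapse whitespace and lowercase."""
    return " ".join(value.split()).lower()


def attribute_values(records: List[Record]) -> Dict[str, Set[str]]:
    """Collect the normalized value set of every attribute; records need not share a schema."""
    values: Dict[str, Set[str]] = {}
    for record in records:
        for attr, val in record.items():
            values.setdefault(attr, set()).add(normalize(val))
    return values


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linkage_points(
    left: List[Record], right: List[Record], threshold: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Return attribute pairs whose value overlap is at least `threshold`, best first."""
    lv, rv = attribute_values(left), attribute_values(right)
    pairs = [(la, ra, jaccard(lv[la], rv[ra])) for la, ra in product(sorted(lv), sorted(rv))]
    hits = [p for p in pairs if p[2] >= threshold]
    hits.sort(key=lambda p: (-p[2], p[0], p[1]))
    return hits


if __name__ == "__main__":
    left = [{"drug": "Aspirin", "maker": "Bayer"}, {"drug": "Ibuprofen", "maker": "Pfizer"}]
    right = [{"label": "aspirin ", "company": "BAYER AG"}, {"label": "Ibuprofen", "company": "Pfizer"}]
    for hit in linkage_points(left, right):
        print(hit)
```

Here `drug` and `label` are paired even though their names are unrelated, because their normalized values coincide. `maker` and `company` overlap only partially, since "bayer" and "bayer ag" differ at the value level.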

4. TRAMP

Author: Laura M. Haas, Boris Glavic, Gustavo Alonso, and Renée J. Miller
Subjects: Schema (genetic algorithms), Provenance, Theoretical computer science, Information retrieval, Computer science, Schema (psychology), General Engineering, Tracing
Abstract: Though partially automated, developing schema mappings remains a complex and potentially error-prone task. In this paper, we present TRAMP (TRAnsformation Mapping Provenance), an extensive suite of tools supporting the debugging and tracing of schema mappings and transformation queries. TRAMP combines and extends data provenance with two novel notions, transformation provenance and mapping provenance, to explain the relationship between transformed data and those transformations and mappings that produced that data. In addition we provide query support for transformations, data, and all forms of provenance. We formally define transformation and mapping provenance, present an efficient implementation of both forms of provenance, and evaluate the resulting system through extensive experiments.
Published: 2010
Full Text: View/download PDF

5. Gain Control over your Integration Evaluations

Author: Radu Ciucanu, Boris Glavic, Patricia C. Arocena, and Renée J. Miller
Affiliations: University of Toronto; Linking Dynamic Data (LINKS), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria); Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille, Université de Lille, Centre National de la Recherche Scientifique (CNRS); University of Oxford; Illinois Institute of Technology (IIT)
Subjects: [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], business.industry, Computer science, Ontology-based data integration, General Engineering, Reuse, computer.software_genre, Schema evolution, Metadata, Empirical research, Data exchange, Schema (psychology), Scalability, Data mining, Software engineering, business, computer, Data integration
Abstract: Integration systems are typically evaluated using a few real-world scenarios (e.g., bibliographical or biological datasets) or using synthetic scenarios (e.g., based on star-schemas or other patterns for schemas and constraints). Reusing such evaluations is a cumbersome task because their focus is usually limited to showcasing a specific feature of an approach. This makes it difficult to compare integration solutions, understand their generality, and understand their performance for different application scenarios. Based on this observation, we demonstrate some of the requirements for developing integration benchmarks. We argue that the major abstractions used for integration problems have converged in the last decade, which enables the application of robust empirical methods to integration problems (from schema evolution, to data exchange, to answering queries using views and many more). Specifically, we demonstrate that schema mappings are the main abstraction that now drives most integration solutions and show how a metadata generator can be used to create more credible evaluations of the performance and scalability of data integration systems. We will use the demonstration to evangelize for more robust, shared empirical evaluations of data integration systems.
Published: 2015
Full Text: View/download PDF

6. VizCurator

Author: Soheil Hassas Yeganeh, Christina Christodoulakis, Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, and Oktie Hassanzadeh
Subjects: Data curation, Computer science, computer.internet_protocol, RDF Schema, Linked data, computer.file_format, Data science, Visualization, World Wide Web, Open data, Schema (psychology), RDF, computer, XML
Abstract: Vizcurator permits the exploration, understanding and curation of open RDF data, its schema, and how it has been linked to other sources. We provide visualizations that enable one to seamlessly navigate through RDFS and RDF layers and quickly understand the open data, how it has been mapped or linked, how it has been structured (and could be restructured), and how deeply it has been related to other open data sources. More importantly, Vizcurator provides a rich set of tools for data curation. It suggests possible improvements to the structure of the data and enables curators to make informed decisions about enhancements to the exploration and exploitation of the data. Moreover, Vizcurator facilitates the mining of temporal resources and the definition of temporal constraints through which the curator can identify conflicting facts. Finally, Vizcurator can be used to create new binary temporal relations by reifying base facts and linking them to temporal resources. We will demonstrate Vizcurator using LinkedCT.org, a five-star open data set mapped from the XML NIH clinical trials data (clinicaltrials.gov) that we have been maintaining and curating for several years.
Published: 2015
Full Text: View/download PDF

7. Data-driven understanding and refinement of schema mappings

Author: Laura M. Haas, Renée J. Miller, Ling Ling Yan, and Ronald Fagin
Subjects: Schema (genetic algorithms), Transformation (function), Theoretical computer science, Basis (linear algebra), Computer science, Process (engineering), Star schema, Schema (psychology), Data transformation, Software, Data-driven, Information Systems
Abstract: At the heart of many data-intensive applications is the problem of quickly and accurately transforming data into a new form. Database researchers have long advocated the use of declarative queries for this process. Yet tools for creating, managing and understanding the complex queries necessary for data transformation are still too primitive to permit widespread adoption of this approach. We present a new framework that uses data examples as the basis for understanding and refining declarative schema mappings. We identify a small set of intuitive operators for manipulating examples. These operators permit a user to follow and refine an example by walking through a data source. We show that our operators are powerful enough both to identify a large class of schema mappings and to distinguish effectively between alternative schema mappings. These operators permit a user to quickly and intuitively build and refine complex data transformation queries that map one data source into another.
Published: 2001
Full Text: View/download PDF

8. Value invention in data exchange

Author: Boris Glavic, Patricia C. Arocena, and Renée J. Miller
Subjects: Set (abstract data type), Schema (genetic algorithms), Theoretical computer science, Computer science, Data exchange, Semantics (computer science), Schema (psychology), Skolem normal form, Tuple, Semantics, Functional dependency, Algorithm
Abstract: The creation of values to represent incomplete information, often referred to as value invention, is central in data exchange. Within schema mappings, Skolem functions have long been used for value invention as they permit a precise representation of missing information. Recent work on a powerful mapping language called second-order tuple generating dependencies (SO tgds) has drawn attention to the fact that the use of arbitrary Skolem functions can have negative computational and programmatic properties in data exchange. In this paper, we present two techniques for understanding when the Skolem functions needed to represent the correct semantics of incomplete information are computationally well-behaved. Specifically, we consider when the Skolem functions in second-order (SO) mappings have a first-order (FO) semantics and are therefore programmatically and computationally more desirable for use in practice. Our first technique, linearization, significantly extends the Nash, Bernstein and Melnik unskolemization algorithm, by understanding when the sets of arguments of the Skolem functions in a mapping are related by set inclusion. We show that such a linear relationship leads to mappings that have FO semantics and are expressible in popular mapping languages including source-to-target tgds and nested tgds. Our second technique uses source semantics, specifically functional dependencies (including keys), to transform SO mappings into equivalent FO mappings. We show that our algorithms are applicable to a strictly larger class of mappings than previous approaches, but more importantly we present an extensive experimental evaluation that quantifies this difference (about 78% improvement) over an extensive schema mapping benchmark and illustrates the applicability of our results on real mappings.
Published: 2013
Full Text: View/download PDF
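To make value invention concrete, here is a small Python sketch. It assumes a hypothetical mapping Emp(name, dept) to Works(name, dept, M), in which the unknown manager M is produced by a Skolem function. The arguments passed to the Skolem function decide which target tuples share an invented value. The sketch does not implement the linearization or functional-dependency techniques of the paper.

```python
from typing import Dict, List, Tuple


class SkolemFactory:
    """Invents labeled nulls: equal (function, arguments) pairs always get the same null."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def __call__(self, fn: str, *args: str) -> str:
        key = (fn, args)
        if key not in self._table:
            self._table[key] = f"N{len(self._table) + 1}"
        return self._table[key]


def exchange(emps: List[Tuple[str, str]], per_department: bool) -> List[Tuple[str, str, str]]:
    """Apply the hypothetical mapping Emp(name, dept) -> Works(name, dept, M)."""
    f = SkolemFactory()
    works = []
    for name, dept in emps:
        # M = f(dept): one invented manager per department.
        # M = f(name, dept): a distinct invented manager per employee.
        manager = f("f", dept) if per_department else f("f", name, dept)
        works.append((name, dept, manager))
    return works


if __name__ == "__main__":
    emps = [("Ann", "DB"), ("Bob", "DB"), ("Cy", "AI")]
    print(exchange(emps, per_department=True))
    # [('Ann', 'DB', 'N1'), ('Bob', 'DB', 'N1'), ('Cy', 'AI', 'N2')]
    print(exchange(emps, per_department=False))
    # [('Ann', 'DB', 'N1'), ('Bob', 'DB', 'N2'), ('Cy', 'AI', 'N3')]
```

Both runs produce a valid target instance. They differ only in which missing values are forced to be equal, and that difference is what the choice of Skolem arguments encodes.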

9. Schema equivalence in heterogeneous systems: Bridging theory and practice

Author: Renée J. Miller, Yannis Ioannidis, and Raghu Ramakrishnan
Subjects: Theoretical computer science, Logical equivalence, Computer science, business.industry, Undecidable problem, Decidability, Hardware and Architecture, If and only if, Data integrity, Schema (psychology), Information system, Artificial intelligence, business, Equivalence (measure theory), Software, Information Systems
Abstract: Current theoretical work offers measures of schema equivalence based on the information capacity of schemas. This work is based on the existence of abstract functions satisfying various restrictions between the sets of all instances of two schemas. In considering schemas that arise in practice, however, it is not clear how to reason about the existence of such abstract functions. Further, these notions of equivalence tend to be too liberal in that schemas are often considered equivalent when a practitioner would consider them to be different. As a result, practical integration methodologies have not utilized this theoretical foundation and most of them have relied on ad-hoc approaches. We present results that seek to bridge this gap. First, we consider the problem of deciding information capacity equivalence and dominance of schemas that occur in practice, i.e., those that can express inheritance and simple integrity constraints. We show that this problem is undecidable. This undecidability suggests that in addition to the overly liberal nature of information capacity equivalence, we should look for alternative, more restrictive notions of equivalence that can be effectively tested. To this end, we develop several tests that each serve as sufficient conditions for information capacity equivalence or dominance. Each test is characterized by a set of schema transformations in the following sense: a test declares that schema S1 is dominated by schema S2 if and only if there is a sequence of transformations that converts S1 to S2. Thus, each test can be understood essentially by understanding the individual transformations used to characterize it. Each of the transformations we consider is a local, structural schema change with a clear underlying intuition. We demonstrate the power of these tests by showing that one can reason about the equivalence and dominance of quite complex schemas.
Because our work is based on structural transformations, the same characterizations that underlie our tests can be used to guide designers in modifying a schema to meet their equivalence or dominance goals.
Published: 1994
Full Text: View/download PDF

10. A unified model for data and constraint repair

Author: Fei Chiang and Renée J. Miller
Subjects: Computer science, Business rule, computer.software_genre, Synthetic data, Data modeling, Constraint (information theory), Data efficiency, Data integrity, Schema (psychology), Data quality, Scalability, Data mining, Functional dependency, computer, Operational database
Abstract: Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data, by finding minimal or lowest cost changes to the data that make it consistent with the constraints. Such techniques are appropriate for the old world where data changes, but schemas and their constraints remain fixed. In many modern applications however, constraints may evolve over time as application or business rules change, as data is integrated with new data sources, or as the underlying semantics of the data evolves. In such settings, when an inconsistency occurs, it is no longer clear if there is an error in the data (and the data should be repaired), or if the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint, and are known to play an important role in maintaining data quality. We evaluate the quality and scalability of our repair algorithms over synthetic data and present a qualitative case study using a well-known real dataset. The results show that our repair algorithms not only scale well for large datasets, but are able to accurately capture and correct inconsistencies, and accurately decide when a data repair versus a constraint repair is best.
Published: 2011
Full Text: View/download PDF
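Here is a minimal sketch of the inconsistency this abstract starts from: detecting violations of a functional dependency X → Y over rows represented as dictionaries. It only detects violations and contrasts the two kinds of repair on a toy example. It does not implement the paper's unified cost model, and the attribute names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def fd_violations(rows: List[Row], lhs: Sequence[str], rhs: str) -> Dict[Tuple[str, ...], Set[str]]:
    """Return every LHS value combination associated with more than one RHS value."""
    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for row in rows:
        groups[tuple(row[attr] for attr in lhs)].add(row[rhs])
    return {key: values for key, values in groups.items() if len(values) > 1}


if __name__ == "__main__":
    rows = [
        {"zip": "M5S", "street": "King", "city": "Toronto"},
        {"zip": "M5S", "street": "Bay", "city": "Torronto"},
        {"zip": "H2X", "street": "Main", "city": "Montreal"},
    ]
    # The FD zip -> city is violated by the first two rows.
    print(fd_violations(rows, ["zip"], "city"))
    # Two candidate fixes both remove the violation:
    #   data repair: change "Torronto" to "Toronto";
    #   constraint repair: replace the FD with (zip, street) -> city.
    print(fd_violations(rows, ["zip", "street"], "city"))  # {}
```

Deciding which repair is preferable is what the cost model in the abstract addresses. In this toy case, the misspelling suggests a data repair, even though the constraint repair also yields a consistent database.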

11. Stream schema

Author: Peter M. Fischer, Kyumars Sheykh Esmaili, and Renée J. Miller
Subjects: Data consistency, Information retrieval, Database, Computer science, business.industry, Data stream mining, Data management, computer.software_genre, Stream processing, Metadata, XML database, Conceptual design, Schema (psychology), ddc:004, business, computer
Abstract: Schemas, and more generally metadata specifying structural and semantic constraints, are invaluable in data management. They facilitate conceptual design and enable checking of data consistency. They also play an important role in permitting semantic query optimization, that is, optimization and processing strategies that are often highly effective, but only correct for data conforming to a given schema. While the use of metadata is well-established in relational and XML databases, the same is not true for data streams. The existing work mostly focuses on the specification of dynamic information. In this paper, we consider the specification of static metadata for streams in a model called Stream Schema. We show how Stream Schema can be used to validate the consistency of streams. By explicitly modeling stream constraints, we show that stream queries can be simplified by removing predicates or subqueries that check for consistency. This can greatly enhance programmability of stream processing systems. We also present a set of semantic query optimization strategies that both permit compile-time checking of queries (for example, to detect empty queries) and new runtime processing options, options that would not have been possible without a Stream Schema specification. Case studies on two stream processing platforms (covering different applications and underlying stream models), along with an experimental evaluation, show the benefits of Stream Schema.
Published: 2010
Full Text: View/download PDF
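As an illustration of validating a stream against static metadata, the sketch below checks each event against a list of declared constraints while passing the stream through lazily. The `Event` fields and both constraints are hypothetical stand-ins. Stream Schema is a specification model, and this is not its syntax or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Event:
    ts: int
    sensor: str
    value: float


# A constraint sees the previous event (None for the first one) and the current event.
Constraint = Callable[[Optional[Event], Event], bool]


def ordered_timestamps(prev: Optional[Event], cur: Event) -> bool:
    return prev is None or prev.ts <= cur.ts


def non_negative_value(prev: Optional[Event], cur: Event) -> bool:
    return cur.value >= 0


def validate(stream: Iterable[Event], schema: List[Constraint]) -> Iterator[Event]:
    """Yield events unchanged, raising ValueError at the first constraint violation."""
    prev: Optional[Event] = None
    for event in stream:
        for check in schema:
            if not check(prev, event):
                raise ValueError(f"{check.__name__} violated at ts={event.ts}")
        yield event
        prev = event


if __name__ == "__main__":
    schema = [ordered_timestamps, non_negative_value]
    events = [Event(1, "s1", 0.5), Event(2, "s1", 1.5), Event(2, "s2", 0.0)]
    for e in validate(events, schema):
        print(e)
```

Once a stream is known to satisfy `ordered_timestamps`, a downstream query no longer needs its own ordering check. This informally illustrates the kind of predicate removal the abstract describes.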

12. (Not) yet another matcher

Author: Renée J. Miller, Remi Coletta, Fabien Duchateau, and Zohra Bellahsene
Subjects: business.industry, Computer science, Similarity measure, Machine learning, computer.software_genre, Schema matching, Schema (psychology), Star schema, XML schema, Artificial intelligence, Data mining, business, computer, Yet another, computer.programming_language, Data integration
Abstract: Discovering correspondences between schema elements is a crucial task for data integration. Most schema matching tools are semi-automatic, e.g. an expert must tune some parameters (thresholds, weights, etc.). They mainly use several methods to combine and aggregate similarity measures. However, the quality of their results often decreases when a new similarity measure must be integrated or when schemas from a particular domain must be matched. This paper describes YAM (Yet Another Matcher), which is a schema matcher factory. Indeed, it enables the generation of a dedicated matcher for a given schema matching scenario, according to user inputs. Our approach is based on machine learning since schema matchers can be seen as classifiers. Several sets of experiments run against matchers generated by YAM and traditional matching tools show how our approach is able to generate the best matcher for a given scenario.
Published: 2009
Full Text: View/download PDF
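Treating a matcher as a classifier can be sketched in a few lines. Each pair of schema element names becomes a vector of similarity scores, and a classifier trained on already-matched pairs labels new pairs. The two measures used here (token Jaccard and common-prefix ratio) and the 1-nearest-neighbour classifier are assumptions chosen for brevity. They are not the measures or classifiers in YAM's knowledge base, and the training pairs are hypothetical.

```python
import math
import re
from typing import List, Tuple

Pair = Tuple[str, str]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase alphanumeric tokens of two names."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def prefix_similarity(a: str, b: str) -> float:
    """Length of the common prefix divided by the length of the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    longest = max(len(a), len(b))
    return n / longest if longest else 0.0


def features(pair: Pair) -> Tuple[float, float]:
    return (token_jaccard(*pair), prefix_similarity(*pair))


class NearestNeighbourMatcher:
    """A 1-NN classifier over similarity-score vectors of labelled name pairs."""

    def __init__(self, training: List[Tuple[Pair, bool]]) -> None:
        self.examples = [(features(pair), label) for pair, label in training]

    def is_match(self, pair: Pair) -> bool:
        x = features(pair)
        _, label = min(self.examples, key=lambda ex: math.dist(ex[0], x))
        return label


if __name__ == "__main__":
    training = [
        (("email", "email_address"), True),
        (("cust_name", "customer_name"), True),
        (("phone", "zip_code"), False),
        (("order_date", "ship_date"), False),
    ]
    matcher = NearestNeighbourMatcher(training)
    print(matcher.is_match(("customer_id", "cust_id")))  # True
```

`token_jaccard` alone gives `customer_id`/`cust_id` the same score (1/3) as the negative example `order_date`/`ship_date`. The prefix measure separates them, which shows why combining several measures can help.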

13. YAM: a Schema Matcher Factory

Author: Renée J. Miller, Remi Coletta, Zohra Bellahsene, and Fabien Duchateau
Affiliations: Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS), Université de Montpellier (UM); Scientific Data Management (ZENITH), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria), CNRS, UM; Department of Computer Science (DCS), University of Toronto
Subjects: Information retrieval, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Computer science, business.industry, XML schemas, Similarity measure, computer.software_genre, User requirements document, demo, Schema matching, machine learning, Knowledge base, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Schema (psychology), matcher factory, YAM, XML schema, business, Yet Another Matcher, computer, Classifier (UML), data integration, Data integration, computer.programming_language
Abstract: In this paper, we present YAM, a schema matcher factory. YAM (Yet Another Matcher) is not (yet) another schema matching system as it enables the generation of a la carte schema matchers according to user requirements. These requirements include a preference for recall or precision, a training data set (schemas already matched) and provided expert correspondences. YAM uses a knowledge base that includes a (possibly large) set of similarity measures and classifiers. Based on the user requirements, YAM learns how to best apply these tools (similarity measures and classifiers) in concert to achieve the best matching quality. In our demonstration, we will let users apply YAM to build the best schema matcher for different user requirements.
Published: 2009

14. Schema AND Data: A Holistic Approach to Mapping, Resolution and Fusion in Information Integration

Author: Laura M. Haas, Donald Kossmann, Martin Hentschel, and Renée J. Miller
Subjects: Three schema approach, business.industry, Computer science, Ontology-based data integration, computer.software_genre, Sensor fusion, Schema (psychology), System integration, Schema mapping, Code generation, Data mining, business, Software engineering, computer, Information integration
Abstract: To integrate information, data in different formats, from different, potentially overlapping sources, must be related and transformed to meet the users' needs. Ten years ago, Clio introduced nonprocedural schema mappings to describe the relationship between data in heterogeneous schemas. This enabled powerful tools for mapping discovery and integration code generation, greatly simplifying the integration process. However, further progress is needed. We see an opportunity to raise the level of abstraction further, to encompass both data- and schema-centric integration tasks and to isolate applications from the details of how the integration is accomplished. Holistic information integration supports iteration across the various integration tasks, leveraging information about both schema and data to improve the integrated result. Integration independence allows applications to be independent of how, when, and where information integration takes place, making materialization and the timing of transformations an optimization decision that is transparent to applications. In this paper, we define these two important goals, and propose leveraging data mappings to create a framework that supports both data- and schema-level integration tasks.
Published: 2009
Full Text: View/download PDF

15. Clio: Schema Mapping Creation and Data Exchange

Author: Renée J. Miller, Ronald Fagin, Mauricio A. Hernández, Laura M. Haas, Lucian Popa, and Yannis Velegrakis
Subjects: Information retrieval, Computer science, Data exchange, computer.internet_protocol, Schema (psychology), Schema mapping, computer.software_genre, computer, XML, Information integration, Data integration, Data mapping
Abstract: The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both schema mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.
Published: 2009
Full Text: View/download PDF

16. Muse

Author: Laura Chiticariu, Daniel Pepper, Wang-Chiew Tan, Renée J. Miller, and Bogdan Alexe
Subjects: Theoretical computer science, Correctness, Natural mapping, Programming language, Computer science, Semantics (computer science), Design tool, computer.software_genre, Data mapping, Data exchange, Schema (psychology), Design process, computer, Information integration
Abstract: Schema mappings are logical assertions that specify the relationships between a source and a target schema in a declarative way. The specification of such mappings is a fundamental problem in information integration. Mappings can be generated by existing mapping systems (semi-)automatically from a visual specification between two schemas. In general, the well-known 80-20 rule applies for mapping generation tools. They can automate 80% of the work, covering common cases and creating a mapping that is close to correct. However, ensuring complete correctness can still require intricate manual work to perfect portions of the mapping. Previous research on mapping understanding and refinement and anecdotal evidence from mapping designers suggest that the mapping design process can be perfected by using data examples to explain the mapping and alternative mappings. We demonstrate Muse, a data example driven mapping design tool currently implemented on top of the Clio schema mapping system. Muse leverages data examples that are familiar to a designer to illustrate nuances of how a small change to a mapping specification changes its semantics. We demonstrate how Muse can differentiate between alternative mapping specifications and infer the desired mapping semantics based on the designer's actions on a short sequence of simple data examples.
Published: 2008
Full Text: View/download PDF

17. Muse: Mapping Understanding and deSign by Example

Author: Laura Chiticariu, Bogdan Alexe, Wang-Chiew Tan, and Renée J. Miller
Subjects: computer.internet_protocol, Programming language, Computer science, Schema (psychology), Data mining, computer.software_genre, Engineering design process, computer, XML, Data modeling, Information integration
Abstract: A fundamental problem in information integration is that of designing the relationships, called schema mappings, between two schemas. The specification of a semantically correct schema mapping is typically a complex task. Automated tools can suggest potential mappings, but few tools are available for helping a designer understand mappings and design alternative mappings. We describe Muse, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification. We present novel algorithms behind Muse and show how Muse systematically guides the designer on two important components of a mapping design: the specification of the desired grouping semantics for sets of data and the choice among alternative interpretations for semantically ambiguous mappings. In every component, Muse infers the desired semantics based on the designer's actions on a short sequence of small examples. Whenever possible, Muse draws examples from a familiar database, thus facilitating the design process even further. We report our experience with Muse on some publicly available schemas.
Published: 2008
Full Text: View/download PDF

18. Guest editorial: special issue on metadata management

Author: Renée J. Miller and Tiziana Catarci
Subjects: Information retrieval, Computer science, business.industry, Data management, computer.software_genre, Data science, Data modeling, Personalization, Datalog, Metadata, Data model, Hardware and Architecture, Data exchange, Schema (psychology), Metadata management, business, computer, Information Systems, Data integration, computer.programming_language
Abstract: In this special issue on metadata management, we present new work on creating, gathering, managing, and understanding metadata. The work in this issue highlights the reality that the lack of metadata and effective techniques for managing them is currently one of the biggest challenges to meaningful use and sharing of the wealth (or should we say glut) of data available today. Our first paper, “Model-Independent Schema Translation” by Paolo Atzeni, Paolo Cappellari, Riccardo Torlone, Philip A. Bernstein, and Giorgio Gianforme, considers the schema translation problem, a problem as old as the data management field itself. Model Independent Data and Schema Translation (MIDST) is a framework for translating schemas from one data model to another. The explicit creation of metadata representing the relationship between the schema and its translation is central to the approach. The schema mapping is represented in a variant of Datalog with Skolem functions. Such mappings are in the spirit of source-to-target tuple-generating dependencies (commonly used in data integration and data exchange) but with special constructs for reasoning about data model specific concepts. These mappings play a crucial role in permitting the customization of the schema translation rules for different applications or domains. They also provide the mechanism through which MIDST can be extended to new data models. Our second paper, “PicShark: Mitigating Metadata Scarcity through Large-Scale P2P Collaboration” by Philippe
Published: 2008

19. Leveraging data and structure in ontology integration

Author: Lise Getoor, Renée J. Miller, and Octavian Udrea
Subjects: Information retrieval, Computer science, Process ontology, Ontology-based data integration, Suggested Upper Merged Ontology, Web Ontology Language, Ontology (information science), computer.software_genre, Logical schema, Schema matching, Ontology components, Schema (psychology), Upper ontology, Ontology integration, computer, Ontology alignment, computer.programming_language, Data integration
Abstract: There is a great deal of research on ontology integration that makes use of rich logical constraints to reason about the structural and logical alignment of ontologies. There is also considerable work on matching data instances from heterogeneous schemas or ontologies. However, little work exploits the fact that ontologies include both data and structure. We aim to close this gap by presenting a new algorithm (ILIADS) that tightly integrates both data matching and logical reasoning to achieve better matching of ontologies. We evaluate our algorithm on a set of 30 pairs of OWL Lite ontologies with the schema and data matchings found by human reviewers. We compare against two systems: the ontology matching tool FCA-merge [28] and the schema matching tool COMA++ [1]. ILIADS shows an average improvement of 25% in quality over FCA-merge and an 11% improvement in recall over COMA++.
Published: 2007
Full Text: View/download PDF

20. Creating Nested Mappings with Clio

Author: Mauricio A. Hernández, Ariel Fuxman, Paolo Papotti, Renée J. Miller, Howard Ho, Takeshi Fukuda, and Lucian Popa
Subjects: Source data, Programming language, Computer science, Data exchange, computer.internet_protocol, Schema (psychology), computer.software_genre, External Data Representation, Data structure, computer, XML, Data integration, Electronic data interchange
Abstract: Schema mappings play a central role in many data integration and data exchange scenarios. In those applications, users need to quickly and correctly specify how data represented in one format is converted into a different format. Clio (L. Popa et al., 2002) is a joint research project between IBM and the University of Toronto studying the creation, maintenance, and use of schema mappings. There have always been two goals in our work in Clio: 1) the automatic creation of logical assertions that capture the way one or more source schemas are mapped into a target schema, and 2) the generation of transformation queries or programs that transform a source data instance into a target data instance.
Published: 2007
Full Text: View/download PDF
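The abstract separates logical mapping assertions from the transformation programs generated from them. As an illustration only, the sketch below assumes a hypothetical flat source relation of (department, employee) pairs and a hypothetical nested target in which each department record holds its list of employees. It contrasts a naive per-tuple translation with one that groups by department, the kind of redundancy a nested mapping is meant to avoid. It is not Clio's mapping-generation algorithm.

```python
from collections import defaultdict

# Hypothetical flat source relation: (dept, employee name) tuples.
source_emp = [
    ("Sales", "Ann"),
    ("Sales", "Bob"),
    ("R&D", "Cho"),
]

def flat_mapping(rows):
    """Naive translation: one target dept record per source tuple, so the
    same department appears repeatedly with a singleton employee list."""
    return [{"dept": d, "emps": [n]} for d, n in rows]

def nested_mapping(rows):
    """Nested translation: group source tuples by department so that each
    department appears once, with all of its employees nested inside it."""
    groups = defaultdict(list)
    for d, n in rows:
        groups[d].append(n)
    return [{"dept": d, "emps": names} for d, names in groups.items()]

print(len(flat_mapping(source_emp)))  # 3 (Sales is duplicated)
print(nested_mapping(source_emp))
```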

21. A Semantic Approach to Discovering Schema Mapping Expressions

Author: Alexander Borgida, John Mylopoulos, Yuan An, and Renée J. Miller
Subjects: Theoretical computer science, Schema migration, Computer science, media_common.quotation_subject, Database schema, Graph theory, Logical schema, Conceptual schema, Schema (psychology), Star schema, Conceptual model, Table (database), Graph (abstract data type), Referential integrity, media_common
Abstract: In many applications it is important to find a meaningful relationship between the schemas of a source and target database. This relationship is expressed in terms of declarative logical expressions called schema mappings. The more successful previous solutions have relied on inputs such as simple element correspondences between schemas in addition to local schema constraints such as keys and referential integrity. In this paper, we investigate the use of an alternate source of information about schemas, namely the presumed presence of semantics for each table, expressed in terms of a conceptual model (CM) associated with it. Our approach first compiles each CM into a graph and represents each table's semantics as a subtree in it. We then develop algorithms for discovering subgraphs that are plausible connections between those concepts/nodes in the CM graph that have attributes participating in element correspondences. A conceptual mapping candidate is now a pair of source and target subgraphs which are semantically similar. At the end, these are converted to expressions at the database level. We offer experimental results demonstrating that, for test cases of non-trivial mapping expressions involving schemas from a number of domains, the "semantic" approach outperforms the traditional technique in terms of recall and especially precision.
Published: 2007
Full Text: View/download PDF

22. Representing and Querying Data Transformations

Author: John Mylopoulos, Renée J. Miller, and Yannis Velegrakis
Subjects: Metadata, Information retrieval, Computer science, Schema (psychology), Information system, computer.software_genre, Data structure, Query language, External Data Representation, computer, Data integration
Abstract: Modern information systems often store data that has been transformed and integrated from a variety of sources. This integration may obscure the original source semantics of data items. For many tasks, it is important to be able to determine not only where data items originated, but also why they appear in the integration as they do and through what transformation they were derived. This problem is known as data provenance. In this work, we consider data provenance at the schema and mapping level. In particular, we consider how to answer questions such as "what schema elements in the source(s) contributed to this value", or "through what transformations or mappings was this value derived?" Towards this end, we elevate schemas and mappings to first-class citizens that are stored in a repository and are associated with the actual data values. An extended query language, called MXQL, is also developed that allows meta-data to be queried as regular data and we describe its implementation scenario.
Published: 2005
Full Text: View/download PDF
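To make the idea of mappings as first-class, queryable data concrete, here is a minimal sketch under stated assumptions. Each target value is stored with the source schema element and the identifier of the mapping that produced it, and a hypothetical `why` function answers the question "through what mapping, from which source element, was this value derived?" The element paths and mapping texts are invented for illustration, and this is plain Python, not MXQL syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotated:
    """A target value stored together with schema-level provenance metadata."""
    value: str
    source_element: str  # hypothetical source schema path
    mapping_id: str      # identifier of the mapping that produced the value

# Hypothetical mappings, stored as ordinary data next to the values.
mappings = {
    "m1": "S.customer.name -> T.client.fullName",
    "m2": "S2.person.label -> T.client.fullName",
}

target_values = [
    Annotated("Ada Lovelace", "S.customer.name", "m1"),
    Annotated("Alan Turing", "S2.person.label", "m2"),
]

def why(value):
    """Return (source element, mapping text) pairs for every stored copy of value."""
    return [(a.source_element, mappings[a.mapping_id])
            for a in target_values if a.value == value]

print(why("Alan Turing"))
```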

23. ToMAS: a system for adapting mappings while schemas evolve

Author: Lucian Popa, John Mylopoulos, Yannis Velegrakis, and Renée J. Miller
Subjects: Theoretical computer science, Computer science, Programming language, Data integrity, Schema (psychology), Interoperability, computer.software_genre, computer
Abstract: We demonstrate the Toronto Mapping Adaptation System (ToMAS), a tool for automatically detecting and adapting mappings that have become invalid or inconsistent due to changes in either data semantics or schemas. Due to its modular architecture and its stand-alone nature, ToMAS can easily be applied to numerous scenarios and can interoperate with many other tools. To the best of our knowledge, no other tool can correctly maintain the consistency of the mappings under schema changes at the level of complexity supported by ToMAS.
Published: 2004
Full Text: View/download PDF

24. Mapping Adaptation under Evolving Schemas

Author: Lucian Popa, Yannis Velegrakis, and Renée J. Miller
Subjects: Theoretical computer science, Computer science, Schema (psychology), Interoperability, Information system, XML schema, Data mining, computer.software_genre, computer, computer.programming_language
Abstract: To achieve interoperability, modern information systems and e-commerce applications use mappings to translate data from one representation to another. In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. Such changes must be reflected in the mappings. Mappings left inconsistent by a schema change have to be detected and updated. As large, complicated schemas become more prevalent, and as data is reused in more applications, manually maintaining mappings (even simple mappings like view definitions) is becoming impractical. We present a novel framework and a tool (ToMAS) for automatically adapting mappings as schemas evolve. Our approach considers not only local changes to a schema, but also changes that may affect and transform many components of a schema. We consider a comprehensive class of mappings for relational and XML schemas with choice types and (nested) constraints. Our algorithm detects mappings affected by a structural or constraint change and generates all the rewritings that are consistent with the semantics of the mapped schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. We describe an implementation of a mapping management and adaptation tool based on these ideas and compare it with a mapping generation tool.
Published: 2003
Full Text: View/download PDF

25. Data Exchange: Semantics and Query Answering

Author: Phokion G. Kolaitis, Renée J. Miller, Lucian Popa, and Ronald Fagin
Subjects: Source data, Theoretical computer science, computer.internet_protocol, Computer science, Algebraic specification, 0102 computer and information sciences, 02 engineering and technology, computer.software_genre, Data structure, 01 natural sciences, Schema (genetic algorithms), 010201 computation theory & mathematics, Data exchange, 020204 information systems, Schema (psychology), Algorithmics, 0202 electrical engineering, electronic engineering, information engineering, Data mining, computer, XML, Data integration, Electronic data interchange
Abstract: Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
Published: 2002
Full Text: View/download PDF
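The notions of a universal solution and of certain answers can be illustrated with a small chase. This sketch assumes a single hypothetical source-to-target dependency S(x, y) → ∃z T(x, z) ∧ U(z, y) and no target constraints. Each source tuple receives a fresh labeled null for z. Answers to a conjunctive query are then computed on this solution, and any tuple containing a null is dropped. For (unions of) conjunctive queries, this procedure yields the certain answers.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Null:
    """A labeled null: an invented placeholder for an unknown target value."""
    label: int

_fresh = itertools.count(1)

def chase_st_tgd(source_S):
    """Chase the s-t tgd  S(x, y) -> EXISTS z . T(x, z) AND U(z, y).
    Each source tuple gets its own fresh labeled null for z."""
    T, U = set(), set()
    for x, y in source_S:
        z = Null(next(_fresh))
        T.add((x, z))
        U.add((z, y))
    return T, U

def certain_answers(answers):
    """Drop every answer tuple that contains a labeled null."""
    return {t for t in answers if not any(isinstance(v, Null) for v in t)}

S = {("alice", "db"), ("bob", "ai")}
T, U = chase_st_tgd(S)

# q1(x, y) :- T(x, z), U(z, y)   joins through the invented value z
q1 = {(x, y) for (x, z) in T for (z2, y) in U if z == z2}
# q2(x, z) :- T(x, z)            exposes the invented value itself
q2 = set(T)

print(certain_answers(q1))  # both source pairs are certain
print(certain_answers(q2))  # empty: the values of z are unknown
```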

26. Translating Web Data

Author: Ronald Fagin, Lucian Popa, Yannis Velegrakis, Mauricio A. Hernández, and Renée J. Miller
Subjects: Source data, Theoretical computer science, Computer science, computer.internet_protocol, Schema (psychology), InformationSystems_DATABASEMANAGEMENT, computer, XML
Abstract: We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, user-specified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of inter-schema correspondences, is converted into a set of mappings that capture the design choices made in the source and target schemas (including their hierarchical organization as well as their nested referential constraints). The second phase translates these mappings into queries over the source schemas that produce data satisfying the constraints and structure of the target schema, and preserving the semantic relationships of the source. Nonnull target values may need to be invented in this process. The mapping algorithm is complete in that it produces all mappings that are consistent with the schema constraints. We have implemented the translation algorithm in Clio, a schema mapping tool, and present our experience using Clio on several real schemas.
Published: 2002
Full Text: View/download PDF
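The remark that non-null target values may need to be invented can be made concrete with Skolem functions, one common device for value invention in schema mapping. In the sketch below, the source and target layouts and the `SK_dept` function name are hypothetical. A target department key is built deterministically from the department name, so two employees of the same source department end up referencing the same invented key. This preserves the source association instead of producing unrelated fresh values.

```python
def skolem(fn_name, *args):
    """A Skolem term: a deterministic, non-null value built from the function
    name and its arguments, so equal arguments always yield the same value."""
    return f"{fn_name}({', '.join(map(str, args))})"

# Hypothetical flat source: (employee, department name).
source = [("Ann", "Sales"), ("Bob", "Sales"), ("Cho", "R&D")]

# Hypothetical target: a dept table keyed by a required dept_id,
# and an emp table that references it.
target_dept = {}
target_emp = []
for emp, dname in source:
    dept_id = skolem("SK_dept", dname)  # invented, but reproducible
    target_dept[dept_id] = dname
    target_emp.append((emp, dept_id))

print(target_dept)
print(target_emp)
```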

27. Clio

Author: Renée J. Miller, Mauricio A. Hernández, and Laura M. Haas
Subjects: SQL, Information retrieval, Computer science, computer.internet_protocol, WYSIWYG, Global information system, computer.software_genre, Data warehouse, Data mapping, Schema (psychology), XML schema, Data mining, Tuple, computer, XML, Software, computer.programming_language, Information Systems
Abstract: We consider the integration requirements of modern data-intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the schema mapping problem in which a source (legacy) database must be mapped into a different, but fixed, target schema. The goal of schema mapping is the discovery of a query or set of queries to map source databases into the new structure. We demonstrate Clio, a new semi-automated tool for creating schema mappings. Clio employs a mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes. A typical session with Clio starts with the user loading a source and a target schema into the system. These schemas are read from either an underlying Object-Relational database or from an XML file with an associated XML Schema. Users can then draw value correspondences mapping source attributes into target attributes. Clio's mapping engine incrementally produces the SQL queries that realize the mappings implied by the correspondences. Clio provides schema and data browsers and other feedback to allow users to understand the mapping produced. Entering and manipulating value correspondences can be done in two modes. In the Schema View mode, users see a representation of the source and target schema and create value correspondences by selecting schema objects from the source and mapping them to a target attribute. The alternative Data View mode offers a WYSIWYG interface for the mapping process that displays example data for both the source and target tables [3]. Users may add and delete value correspondences from this view and immediately see the changes reflected in the resulting target tuples. Also, the Data View mode helps users navigate through alternative mappings, understanding the often subtle differences between them. For example, in some cases, changing a join from an inner join to an outer join may dramatically change the resulting table. In other cases, the same change may have no effect due to constraints that hold on the source
Published: 2001
Full Text: View/download PDF
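The point about inner versus outer joins can be reproduced on toy data. The sketch below assumes hypothetical employee and project tables, with Python list-of-dict joins standing in for the SQL that Clio would generate. Without a constraint, an unassigned employee makes the two joins differ. When every employee is known to reference an existing project (a non-null foreign key), the two joins return identical rows.

```python
def inner_join(left, right, lkey, rkey):
    """Keep only left rows that have a matching right row."""
    return [{**l, **r} for l in left for r in right if l[lkey] == r[rkey]]

def left_outer_join(left, right, lkey, rkey, right_cols):
    """Like inner_join, but keep unmatched left rows, padding right columns with None."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if l[lkey] == r[rkey]]
        out.extend(matches or [{**l, **{c: None for c in right_cols}}])
    return out

projects = [{"pid": 1, "title": "Clio"}]

# Case 1: no constraint; Bob is not assigned to any project.
emps = [{"emp": "Ann", "proj": 1}, {"emp": "Bob", "proj": None}]
print(len(inner_join(emps, projects, "proj", "pid")))                         # 1
print(len(left_outer_join(emps, projects, "proj", "pid", ["pid", "title"])))  # 2

# Case 2: a (hypothetical) non-null foreign key emp.proj -> project.pid holds,
# so every employee matches and the two joins return the same rows.
emps_fk = [{"emp": "Ann", "proj": 1}, {"emp": "Cho", "proj": 1}]
same = inner_join(emps_fk, projects, "proj", "pid") == left_outer_join(
    emps_fk, projects, "proj", "pid", ["pid", "title"])
print(same)  # True
```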

28. Mapping XML and relational schemas with Clio

Author: Lucian Popa, Mauricio A. Hernández, Howard Ho, Felix Naumann, Yannis Velegrakis, and Renée J. Miller
Subjects: Document Structure Description, Computer science, computer.internet_protocol, Relational database, Semi-structured model, education, Schema Matching, computer.software_genre, Schema matching, Logical schema, Information schema, Schema Mapping, XML Schema Editor, Data integrity, Schema (psychology), Information system, XML schema, computer.programming_language, Information retrieval, Schema migration, Database schema, InformationSystems_DATABASEMANAGEMENT, 004 Informatik, XML Schema (W3C), Document Schema Definition Languages, Data exchange, Star schema, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, ddc:004, computer, XML, Data integration
Abstract: Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. We demonstrate Clio, a semi-automatic schema mapping tool developed at the IBM Almaden Research Center. In this paper, we showcase Clio's mapping engine which allows mapping to and from relational and XML schemas, and takes advantage of data constraints in order to preserve data associations.
