Author: "Adriane Chapman" / Topic: computer science - Searchworks@Jio Institute Digital Library Search Results

1. Graph-Based Visual-Semantic Entanglement Network for Zero-Shot Image Recognition

Author: Wendy Hall, Yang Hu, Dan Dai, Pei Yang, Mingnan Luo, Yingxue Xu, Adriane Chapman, and Guihua Wen
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computer Vision and Pattern Recognition (cs.CV), media_common.quotation_subject, Computer Science - Computer Vision and Pattern Recognition, Space (commercial competition), Convolutional neural network, Machine Learning (cs.LG), FOS: Electrical engineering, electronic engineering, information engineering, Media Technology, Electrical and Electronic Engineering, Representation (mathematics), media_common, business.industry, Image and Video Processing (eess.IV), Pattern recognition, Ambiguity, Electrical Engineering and Systems Science - Image and Video Processing, Computer Science Applications, Signal Processing, Graph (abstract data type), Embedding, Visual modeling, Artificial intelligence, business, Word (computer architecture)
Abstract: Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although the deep convolutional network brings powerful visual modeling capabilities to the ZSL task, its visual features have severe pattern inertia and lack representation of semantic relationships, which leads to severe bias and ambiguity. In response to this, we propose the Graph-based Visual-Semantic Entanglement Network to conduct graph modeling of visual features, which is mapped to semantic attributes by using a knowledge graph. It contains several novel designs: 1. it establishes a multi-path entangled network with the convolutional neural network (CNN) and the graph convolutional network (GCN), which inputs the visual features from CNN to GCN to model the implicit semantic relations, then GCN feeds back the graph modeled information to CNN features; 2. it uses attribute word vectors as the target for the graph semantic modeling of GCN, which forms a self-consistent regression for graph modeling and supervises GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into visual embedding. Our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN by promoting the semantic linkage modelling of visual features. Comment: 15 pages, 11 figures, in IEEE Transactions on Multimedia
Published: 2022

2. Quality assessment in crowdsourced classification tasks

Author: Elena Simperl, Eddy Maddalena, Adriane Chapman, and Qiong Bu
Subjects: Technology, quality assessment, Computer science, Process (engineering), media_common.quotation_subject, task-orientated crowdsourcing, Inference, Machine learning, computer.software_genre, Crowdsourcing, Domain (software engineering), Computer Science (miscellaneous), Citizen science, Decision Sciences (miscellaneous), Quality (business), media_common, business.industry, aggregation, Engineering (General). Civil engineering (General), Task (computing), Workflow, classification, Business, Management and Accounting (miscellaneous), Artificial intelligence, TA1-2040, business, computer
Abstract: Purpose: Ensuring quality is one of the most significant challenges in microtask crowdsourcing tasks. Aggregation of the collected data from the crowd is one of the important steps to infer the correct answer, but the existing study seems to be limited to the single-step task. This study aims to look at multiple-step classification tasks and understand aggregation in such cases; hence, it is useful for assessing the classification quality. Design/methodology/approach: The authors present a model to capture the information of the workflow, questions and answers for both single- and multiple-question classification tasks. They propose an adapted approach on top of the classic approach so that the model can handle tasks with several multiple-choice questions in general instead of a specific domain or any specific hierarchical classifications. They evaluate their approach with three representative tasks from existing citizen science projects in which they have the gold standard created by experts. Findings: The results show that the approach can provide significant improvements to the overall classification accuracy. The authors’ analysis also demonstrates that all algorithms can achieve higher accuracy for the volunteer- versus paid-generated data sets for the same task. Furthermore, the authors observed interesting patterns in the relationship between the performance of different algorithms and workflow-specific factors including the number of steps and the number of available options in each step. Originality/value: Due to the nature of crowdsourcing, aggregating the collected data is an important process to understand the quality of crowdsourcing results. Different inference algorithms have been studied for simple microtasks consisting of single questions with two or more answers. However, as classification tasks typically contain many questions, the proposed method can be applied to a wide range of tasks including both single- and multiple-question classification tasks.
Published: 2019

3. Putting AI ethics to work: are the tools fit for purpose?

Author: Jacqui Ayling and Adriane Chapman
Subjects: Knowledge management, Impact assessment, Unintended consequences, Emerging technologies, Computer science, business.industry, Mechanical Engineering, Best practice, Energy Engineering and Power Technology, Audit, Management Science and Operations Research, Transparency (behavior), Software deployment, Accountability, business
Abstract: Bias, unfairness and lack of transparency and accountability in Artificial Intelligence (AI) systems, and the potential for the misuse of predictive models for decision-making have raised concerns about the ethical impact and unintended consequences of new technologies for society across every sector where data-driven innovation is taking place. This paper reviews the landscape of suggested ethical frameworks with a focus on those which go beyond high-level statements of principles and offer practical tools for application of these principles in the production and deployment of systems. This work provides an assessment of these practical frameworks with the lens of known best practices for impact assessment and audit of technology. We review other historical uses of risk assessments and audits and create a typology that allows us to compare current AI ethics tools to Best Practices found in previous methodologies from technology, environment, privacy, finance and engineering. We analyse current AI ethics tools and their support for diverse stakeholders and components of the AI development and deployment lifecycle as well as the types of tools used to facilitate use. From this, we identify gaps in current AI ethics tools in auditing and risk assessment that should be considered going forward.
Published: 2021

4. Codesign to improve IAQ awareness in classrooms

Author: Adriane Chapman, Stephen Snow, and Bradley McLaughlin
Subjects: Architectural engineering, Government, Indoor air quality, Computer science, business.industry, Software deployment, law, Data management, Ventilation (architecture), business, law.invention
Abstract: The effective monitoring and management of indoor-sourced pollutants is vital, given that poor indoor air quality (IAQ) reduces the academic performance of school students and contributes to short- and long-term health effects. Despite this, 66% of classrooms are found to exceed UK government IAQ guidelines and there is not yet a requirement for real-time IAQ monitoring in UK classrooms. This study describes the design and deployment of a visual iPad display of classroom IAQ called Airlert. Findings inform a discussion of which visual aspects of IAQ feedback devices hold the best potential for empowering teachers to improve understanding of IAQ in their classroom, to employ healthier ventilation practices. Implications for the design of IAQ feedback devices and necessary data management considerations are discussed, along with suggestions for future research.
Published: 2021

5. Mapping Trusted Paths to VGI

Author: David Martin, Stefano Cavazzi, Bernard Roper, and Adriane Chapman
Subjects: Volunteered geographic information, Set (abstract data type), Information retrieval, Computer science, Data quality, User-generated content, Maturity (finance)
Abstract: We propose a novel method of assessing OpenStreetMap data using the concept of Data Maturity. Based on research into data quality and trust in user generated content, this is a set of measurements that can be derived from provenance data extracted from OpenStreetMap edit history.
Published: 2021

6. Identifying Food Fraud using Blockchain

Author: Nawfal F. Fadhel, Hoi Wen Leung, and Adriane Chapman
Subjects: Blockchain, Computer science, Food fraud, Computer security, computer.software_genre, computer
Published: 2021

7. The Right (Provenance) Hammer for the Job: A Comparison of Data Provenance Instrumentation

Author: Riccardo Torlone, Giulia Simonelli, Paolo Missier, Abhirami Sasikant, and Adriane Chapman
Subjects: Pipeline transport, Provenance, law, Computer science, Hammer, Instrumentation (computer programming), Data science, Pipeline (software), law.invention
Abstract: As data science techniques are being applied to solve societal problems, understanding what is happening within the “pipeline” is essential for establishing trust and reproducibility of the results. Provenance captures information about what happened during design and execution in order to support reasoning for trust and reproducibility. However, how and where the information is captured as provenance within the data science pipelines changes how it can be utilized. In this work, we describe three different mechanisms to capture provenance in data science pipelines: human-based, tool-based, and script-based. By using an implementation of all techniques in a standard data science toolkit, we analyze the difference in provenance generated by these methods and how its use changes.
Published: 2020

8. Capturing and querying fine-grained provenance of preprocessing pipelines in data science

Author: Adriane Chapman, Paolo Missier, Giulia Simonelli, and Riccardo Torlone
Subjects: Pipeline transport, Data processing, Provenance, Computer science, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, General Engineering, Preprocessor, 020201 artificial intelligence & image processing, 02 engineering and technology, Data mining, computer.software_genre, computer
Abstract: Data processing pipelines that are designed to clean, transform and alter data in preparation for learning predictive models have an impact on those models' accuracy and performance, as well as on other properties, such as model fairness. It is therefore important to provide developers with the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to training sets ready to be used for learning. While other efforts track creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process, and provide infrastructure for generating very granular provenance records from it, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators, and the definition of provenance patterns for each of them, and (ii) a prototype implementation of an application-level provenance capture library that works alongside Python. We report on provenance processing and storage overhead and scalability experiments, carried out over both real ML benchmark pipelines and over TPC-DI, and show how the resulting provenance can be used to answer a suite of provenance benchmark queries that underpin some of the developers' debugging questions, as expressed on the Data Science Stack Exchange.
Published: 2020

9. The Need for Machine-Processable Agreements in Health Data Management

Author: Mark J. Weal, Anneke Lucassen, George Konstantinidis, Lisa Marie Ballard, Ahmed Alzubaidi, and Adriane Chapman
Subjects: Knowledge management, lcsh:T55.4-60.8, Computer science, Privacy policy, data sharing, 02 engineering and technology, Ontology (information science), lcsh:QA75.5-76.95, Theoretical Computer Science, Domain (software engineering), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, privacy languages, lcsh:Industrial engineering. Management engineering, Semantic Web, Numerical Analysis, business.industry, privacy policies, Construct (python library), Service provider, Data sharing, Computational Mathematics, genomic medicine, Computational Theory and Mathematics, Order (business), health data management, 020201 artificial intelligence & image processing, consent, lcsh:Electronic computers. Computer science, genomic data, business
Abstract: Data processing agreements in health data management are laid out by organisations in monolithic “Terms and Conditions” documents written in natural legal language. These top-down policies usually protect the interest of the service providers, rather than the data owners. They are coarse-grained and do not allow for more than a few opt-in or opt-out options for individuals to express their consent on personal data processing, and these options often do not transfer to software as they were intended to. In this paper, we study the problem of health data sharing and we advocate the need for individuals to describe their personal contract of data usage in a formal, machine-processable language. We develop an application for sharing patient genomic information and test results, and use interactions with patients and clinicians in order to identify the particular peculiarities a privacy/policy/consent language should offer in this complicated domain. We present how Semantic Web technologies can have a central role in this approach by providing the formal tools and features required in such a language. We present our ongoing approach to construct an ontology-based framework and a policy language that allows patients and clinicians to express fine-grained consent, preferences or suggestions on sharing medical information. Our language offers unique features such as multi-party ownership of data or data sharing dependencies. We evaluate the landscape of policy languages from different areas, and show how they are lacking major requirements needed in health data management. In addition to enabling patients, our approach helps organisations increase technological capabilities, abide by legal requirements, and save resources.
Published: 2020

10. Dataset search: a survey

Author: Laura Koesten, Luis-Daniel Ibáñez, Paul Groth, Emilia Kacprzak, Adriane Chapman, Elena Simperl, and George Konstantinidis
Subjects: FOS: Computer and information sciences, Thesaurus (information retrieval), Service (systems architecture), Computer science, Databases (cs.DB), 02 engineering and technology, Reuse, Data science, Field (computer science), Dataset retrieval, Data sharing, Information search and retrieval, Search engine, Open data, Computer Science - Databases, Hardware and Architecture, 020204 information systems, Dataset search, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, State (computer science), Information Systems, Dataset
Abstract: Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods and highlight open problems. We look at approaches and implementations from related areas dataset search is drawing upon, including information retrieval, databases, entity-centric and tabular search in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward. Comment: 20 pages, 153 references
Published: 2020

11. Guest Editorial

Author: James Cheney, Simon Miles, and Adriane Chapman
Subjects: World Wide Web, Provenance, Computer Networks and Communications, Computer science
Published: 2017

12. Using the Provenance from Astronomical Workflows to Increase Processing Efficiency

Author: Michael A. C. Johnson, Carlos Sáenz-Adán, Luc Moreau, Poshak Gandhi, and Adriane Chapman
Subjects: Provenance, Computer science, Processing efficiency, 020207 software engineering, Image processing, 02 engineering and technology, Variation (game tree), computer.software_genre, 01 natural sciences, Pipeline (software), Image (mathematics), Workflow, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Use case, Data mining, 010303 astronomy & astrophysics, computer
Abstract: Astronomy is increasingly becoming a data-driven science as the community builds larger instruments which are capable of gathering more data than previously possible. As the sizes of the datasets increase, it becomes even more important to make the most efficient use of the computational resources available. In this work, we highlight how provenance can be used to increase the computational efficiency of astronomical workflows. We describe a provenance-enabled image processing pipeline and motivate the generation of provenance with two relevant use cases. The first use case investigates the origin of an optical variation and the second is concerned with the objects used to calibrate the image. The provenance was then queried in order to evaluate the relative computational efficiency of use case evaluation, with and without the use of provenance. We find that recording the provenance of the pipeline increases the original processing time by approximately 45%. However, we find that when evaluating the two identified use cases, the inclusion of provenance improves the efficiency of processing by approximately 99% and 96% for Use Cases 1 and 2, respectively. Furthermore, we combine these results with the probability that Use Cases 1 and 2 will need to be evaluated and find a net decrease in computational processing efficiency of 13–44% when incorporating provenance generation within the workflow. However, we deduce that provenance has the potential to produce a net increase in this efficiency if more use cases are to be considered.
Published: 2018

13. A Graph Testing Framework for Provenance Network Analytics

Author: Bernard Roper, Adriane Chapman, Jeremy Morley, and David Martin
Subjects: Provenance, Information retrieval, Computer science, Analytics, business.industry, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, business, Semantic Web, Network analytics, Graph
Abstract: Provenance Network Analytics is a method of analyzing provenance that assesses a collection of provenance graphs by training a machine learning algorithm to make predictions about the characteristics of data artifacts based on their provenance graph metrics. The shape of a provenance graph can vary according to the modelling approach chosen by data analysts, and this is likely to affect the accuracy of machine learning algorithms, so we propose a framework for capturing provenance using semantic web technologies to allow use of multiple provenance models at runtime in order to test their effects.
Published: 2018

14. Belief Propagation Through Provenance Graphs

Author: Mark J. Weal, Luc Moreau, Adriane Chapman, Belfrit Victor Batlajery, Balhajjame, K., Gehani, A., and Alper, P.
Subjects: Consumption (economics), Provenance, 030504 nursing, Computer science, digestive, oral, and skin physiology, Inference, Sample (statistics), 02 engineering and technology, Belief propagation, Due diligence, 03 medical and health sciences, Risk analysis (engineering), Order (exchange), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 0305 other medical science, Factor graph
Abstract: Provenance of food describes food, the processes in food transformation, and the food operators from the source to consumption; modelling the history of food. In processing food, the risk of contamination increases if food is treated inappropriately. Therefore, identifying critical processes and applying suitable prevention actions are necessary to measure the risk; known as due diligence. To achieve due diligence, food provenance can be used to analyse the risk of contamination in order to find the best place to sample food. Indeed, it supports building rationale over food-related activities because it describes the details about food during its lifetime. However, many food risk models only rely on simulation with little notion of provenance of food. Incorporating the risk model with food provenance through our framework, prFrame, is our first contribution. prFrame uses Belief Propagation (BP) over the provenance graph for automatically measuring the risk of contamination. As BP works efficiently in a factor graph, our next contribution is the conversion of the provenance graph into the factor graph. Finally, an evaluation of the accuracy of the inference by BP is our last contribution.
Published: 2018

15. prFood: Ontology principles for provenance and risk in the food domain

Author: Mark J. Weal, Luc Moreau, Belfrit Victor Batlajery, and Adriane Chapman
Subjects: 0404 agricultural biotechnology, Traceability, Risk analysis (engineering), Computer science, Supply chain, Domain knowledge, Use case, 04 agricultural and veterinary sciences, Ontology (information science), 040401 food science, Semantic Web, Food history, Domain (software engineering)
Abstract: An ontology, a formal representation of domain knowledge, is an integral building block of the semantic web and is important in building an application within interdisciplinary domains. In the food domain, regulations require Food Business Operators (FBOs) to comply with traceability and track-ability measures when handling food in order to assess risk. In this paper, we identify several requirements to model food, food history, and risk of contamination in order to support the safety regulations of food. Using those requirements, we also identify and apply several design principles for building an ontology called prFood that encompasses interdisciplinary domains, food and risk. In order to apply safe handling of food in the food supply chain, we integrate and incorporate the pre-existing food ontologies with prFood by implementing a mapping procedure. Finally, we validate our approach by answering several use cases derived from EU and/or UK Food Regulations.
Published: 2017

16. What do we do now? Workflows for an unpredictable world

Author: Barbara Blaustein, Adriane Chapman, M. David Allen, and Lisa Mak
Subjects: Government, Knowledge management, Computer Networks and Communications, Data stream mining, business.industry, Computer science, Business process, Schema matching, Data science, Pipeline (software), Workflow, Hardware and Architecture, Key (cryptography), business, Software, Agile software development
Abstract: Workflow systems permit organization of many individual subtasks into a cohesive whole, in order to accomplish a specific mission. For many government and business missions, these systems are used to manage repetitive processes, such as large data-processing and exploitation pipelines. Government missions with strong interactions with the real world are extremely dynamic, as are all missions dealing with error-prone or changing data streams. We contribute a vision for discovery of new steps in adaptive workflow systems, suitability functions that can discover candidate alternatives, and a way forward for sourcing options for decision-makers, without the strong assumptions required by previous work. As data-processing workflows are shared, the sharing entities may find that certain parts of the workflow must be adapted to the new environment or mission. Extremely dynamic environments call for capabilities that support agile operations and pipeline sharing by making it possible to choose relevant actions when a situation invalidates the assumptions of current execution. We adapt some work in schema matching towards this problem, citing key differences between the two sets of challenges.
Published: 2015

17. Fit for purpose: engineering principles for selecting an appropriate type of data exchange standard

Author: Adriane Chapman, Arnon Rosenthal, M. David Allen, Hongwei Zhu, and Len Seligman
Subjects: Knowledge management, Computer science, business.industry, media_common.quotation_subject, Interoperability, Decision rule, Semantic interoperability, Enterprise interoperability, Data sharing, Risk analysis (engineering), Data exchange, Cross-domain interoperability, business, Sophistication, Information Systems, media_common
Abstract: Data standards are a powerful, real-world tool for enterprise interoperability, yet there exists no well-grounded methodology for selecting among alternative standards approaches. We focus on a specific sub-problem within a community's data sharing challenge and identify four major standards-based approaches to that task. We present characteristics of a data sharing community that one should consider in selecting a standards approach--such as relative power, motivation level, and technical sophistication of different participants--and illustrate with real-world examples. These characteristics and other factors are then analyzed to develop decision rules for selecting among the four approaches. Independent of the data exchange problem, we suggest two general practices in choosing a standards approach: (1) vertical decomposition of interoperability issues, in order to define a narrow, formal, tractable problem, and (2) option-exclusion rules, as they are much simpler than stating optimal-choice rules.
Published: 2014

18. The challenge of 'quick and dirty' information quality

Author: Len Seligman, Adriane Chapman, and Arnon Rosenthal
Subjects: Information Systems and Management, business.industry, Computer science, Quick-and-dirty, Information quality, Relevance feedback, 02 engineering and technology, computer.software_genre, World Wide Web, Data sharing, 020204 information systems, Systems management, Human–computer information retrieval, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Relevance (information retrieval), business, computer, Quality assurance, Information Systems
Published: 2016

19. Understanding provenance black boxes

Author: Adriane Chapman and H. V. Jagadish
Subjects: Information Systems and Management, business.industry, Computer science, Usability, Data structure, Workflow engine, Data science, Workflow technology, World Wide Web, Workflow, Hardware and Architecture, Order (business), Drill down, business, Software, Workflow management system, Information Systems
Abstract: Current provenance stores associated with workflow management systems (WfMSs) capture enough coarse-grained information to describe which datasets were used and which processes were run. While this information is enough to rebuild a workflow run, it is not enough to facilitate user understanding. Because the data is manipulated via a series of black boxes, it is often impossible for a human to understand what happened to the data. In this work, we highlight the missing information that can assist user understanding. Unfortunately, provenance information is already very complex and difficult for a user to comprehend, which can be exacerbated by adding the extra information needed for deeper blackbox understanding. In order to alleviate this, we develop a model of provenance answers that follow a "roll up", "drill down" strategy. We evaluate these techniques to determine if users have better understanding of provenance information. We show how this information can be captured by workflow management systems, and that the structures and information needed for this model are a negligible addition to standard provenance stores. Finally, we implement these techniques in a real provenance system, and evaluate implementation feasibility.
Published: 2010

20. Engineering Choices for Open World Provenance

Author: M. David Allen, Adriane Chapman, and Barbara Blaustein
Subjects: Government, Provenance, Information sensitivity, Database, Work (electrical), Computer science, Control (management), Identity (object-oriented programming), Scalability testing, computer.software_genre, computer, Data science, Connectivity
Abstract: This work outlines engineering decisions required to support a provenance system in an open world where systems are not under any common control and use many different technologies. Real U.S. government applications have shown us the need for specialized identity techniques, flexible storage, scalability testing, protection of sensitive information, and customizable provenance queries. We analyze tradeoffs for approaches to each area, focusing more on maintaining graph connectivity and breadth of capture, rather than on fine-grained/detailed capture as in other works. We implement each technique in the PLUS system, test its real-time efficiency, and describe the results.
Published: 2015

21. TIMBER: A native XML database

Author: Laks V. S. Lakshmanan, Andrew Nierman, Stelios Paparizos, Yuqing Wu, Divesh Srivastava, Jignesh M. Patel, Adriane Chapman, Cong Yu, Shurug Al-Khalifa, Nuwee Wiwatwattana, and H. V. Jagadish
Subjects: Document Structure Description, XML Encryption, computer.internet_protocol, Relational database, Computer science, Efficient XML Interchange, XML Signature, Document management system, Query optimization, computer.software_genre, Simple API for XML, XML Schema Editor, Streaming XML, Binary XML, XML schema, computer.programming_language, Database, cXML, XML validation, computer.file_format, XML framework, XML database, Hardware and Architecture, computer, XML, Information Systems, XML Catalog
Abstract: This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
Published: 2002

22. Fit for Purpose: Toward an Engineering Basis for Data Exchange Standards

Author: Adriane Chapman, Arnon Rosenthal, M. David Allen, and Len Seligman
Subjects: Data sharing, SIMPLE (military communications protocol), Data exchange, Computer science, Management science, media_common.quotation_subject, Existential quantification, Decision rule, Data science, Enterprise interoperability, Sophistication, Task (project management), media_common
Abstract: Data standards are a powerful, real-world tool for enterprise interoperability, yet there exists no rigorous methodology for selecting among alternative standards approaches. This paper is a first step toward creating a detailed engineering basis for choosing among standards approaches. We define a specific sub-problem within a community’s data sharing challenge, and focus on it in depth. We describe the major choices (kinds of standards) applied to that task, examining tradeoffs. We present characteristics of a data sharing community that one should consider in selecting a standards approach—such as relative power, motivation level, and technical sophistication of different participants—and illustrate with real-world examples. We then show that one can state simple decision rules (based on engineering experience) that system engineers without decades of data experience can apply. We also comment on the methodology used, extracting lessons (e.g., “negative rules are simpler”) that can be used in similar analyses on other issues.
Published: 2013

23. PLUS: A provenance manager for integrated information

Author: Len Seligman, Adriane Chapman, M. David Allen, and Barbara Blaustein
Subjects: Government, Provenance, Open world, business.industry, Computer science, Relational database, media_common.quotation_subject, Collective intelligence, Access control, computer.software_genre, World Wide Web, Quality (business), business, computer, Data integration, media_common
Abstract: It can be difficult to fully understand the result of integrating information from diverse sources. When all the information comes from a single organization, there is a collective knowledge about where it came from and whether it can be trusted. Unfortunately, once information from multiple organizations is integrated, there is no longer a shared knowledge of the data and its quality. It is often impossible to view and judge the information from a different organization; when errors occur, notification does not always reach all users of the data. We describe how a multi-organizational provenance store that collects provenance from heterogeneous systems addresses these problems. Unlike most provenance systems, we cope with an open world, where the data usage is not determined in advance and can take place across many systems and organizations.
Published: 2011

24. Surrogate Parenthood: Protected and Informative Graphs

Author: Arnon Rosenthal, Barbara Blaustein, Len Seligman, Adriane Chapman, and M. David Allen
Subjects: Social and Information Networks (cs.SI), FOS: Computer and information sciences, Physics - Physics and Society, Graph database, Theoretical computer science, Computer science, General Engineering, FOS: Physical sciences, Computer Science - Social and Information Networks, Physics and Society (physics.soc-ph), computer.software_genre, Graph, computer, Connectivity
Abstract: Many applications, including provenance and some analyses of social networks, require path-based queries over graph-structured data. When these graphs contain sensitive information, paths may be broken, resulting in uninformative query results. This paper presents innovative techniques that give users more informative graph query results; the techniques leverage a common industry practice of providing what we call surrogates: alternate, less sensitive versions of nodes and edges releasable to a broader community. We describe techniques for interposing surrogate nodes and edges to protect sensitive graph components, while maximizing graph connectivity and giving users as much information as possible. In this work, we formalize the problem of creating a protected account G' of a graph G. We provide a utility measure to compare the informativeness of alternate protected accounts and an opacity measure for protected accounts, which indicates the likelihood that an attacker can recreate the topology of the original graph from the protected account. We provide an algorithm to create a maximally useful protected account of a sensitive graph, and show through evaluation with the PLUS prototype that using surrogates and protected accounts adds value for the user, with no significant impact on the time required to generate results for graph queries. VLDB2011
Published: 2011

25. Provenance for Collaboration: Detecting Suspicious Behaviors and Assessing Trust in Information

Author: M. David Allen, Adriane Chapman, Barbara Blaustein, and Len Seligman
Subjects: Collaborative software, Computer science, business.industry, Insider threat, Trusted Computing, Computer security, computer.software_genre, business, computer, Complex problems
Abstract: Data collaborations allow users to draw upon diverse resources to solve complex problems. While collaborations enable a greater ability to manipulate data and services, they also create new security vulnerabilities. Collaboration participants need methods to detect suspicious behaviors (potentially caused by malicious insiders) and assess trust in information when it passes through many hands. In this work, we describe these challenges and introduce provenance as a way to solve them. We describe a provenance system, PLUS, and show how it can be used to assist in assessing trust and detecting suspicious behaviors. A preliminary study shows this to be a promising direction for future research.
Published: 2011

26. Capturing Provenance in the Wild

Author: M. David Allen, Barbara Blaustein, Len Seligman, and Adriane Chapman
Subjects: Provenance, Open source, Enterprise service bus, Database, Open world, Work (electrical), Computer science, Scalability, Closed world, Overhead (computing), computer.software_genre, computer
Abstract: All current provenance systems are “closed world” systems; provenance is collected within the confines of a well understood, pre-planned system. However, when users compose services from heterogeneous systems and organizations to form a new application, it is impossible to track the provenance in the new system using currently available work. In this work, we describe the ability to compose multiple provenance-unaware services in an “open world” system and still collect provenance information about their execution. Our approach is implemented using the PLUS provenance system and the open source MULE Enterprise Service Bus. Our evaluations show that this approach is scalable and has minimal overhead.
Published: 2010

27. Do You Know Where Your Data’s Been? – Tamper-Evident Database Provenance

Author: Jing Zhang, Kristen LeFevre, and Adriane Chapman
Subjects: Set (abstract data type), Provenance, Database, Language change, business.industry, Computer science, Data management, Threat model, Checksum, Overhead (computing), computer.software_genre, business, computer
Abstract: Database provenance chronicles the history of updates and modifications to data, and has received much attention due to its central role in scientific data management. However, the use of provenance information still requires a leap of faith. Without additional protections, provenance records are vulnerable to accidental corruption, and even malicious forgery, a problem that is most pronounced in the loosely-coupled multi-user environments often found in scientific research. This paper investigates the problem of providing integrity and tamper-detection for database provenance. We propose a checksum-based approach, which is well-suited to the unique characteristics of database provenance, including non-linear provenance objects and provenance associated with multiple fine granularities of data. We demonstrate that the proposed solution satisfies a set of desirable security properties, and that the additional time and space overhead incurred by the checksum approach is manageable, making the solution feasible in practice.
Published: 2009

28. Efficient provenance storage

Author: H. V. Jagadish, Prakash Ramanan, and Adriane Chapman
Subjects: Inheritance (object-oriented programming), Provenance, Workflow, Database, Computer science, Factor (programming language), computer.software_genre, computer, computer.programming_language
Abstract: As the world is increasingly networked and digitized, the data we store has more and more frequently been chopped, baked, diced and stewed. In consequence, there is an increasing need to store and manage provenance for each data item stored in a database, describing exactly where it came from, and what manipulations have been applied to it. Storage of the complete provenance of each data item can become prohibitively expensive. In this paper, we identify important properties of provenance that can be used to considerably reduce the amount of storage required. We identify three different techniques: a family of factorization processes and two methods based on inheritance, to decrease the amount of storage required for provenance. We have used the techniques described in this work to significantly reduce the provenance storage costs associated with constructing MiMI [22], a warehouse of data regarding protein interactions, as well as two provenance stores, Karma [31] and PReServ [20], produced through workflow execution. In these real provenance sets, we were able to reduce the size of the provenance by up to a factor of 20. Additionally, we show that this reduced store can be queried efficiently and further that incremental changes can be made inexpensively.
Published: 2008

29. Provenance and the Price of Identity

Author: Adriane Chapman and H. V. Jagadish
Subjects: World Wide Web, Set (abstract data type), Identification (information), Provenance, Workflow, Computer science, Data manipulation language, Identity (object-oriented programming), Space (commercial competition), Strengths and weaknesses
Abstract: As developers acknowledge that provenance is essential, more and more datasets are attempting to keep provenance records describing how they were created. Some of these datasets are constructed using workflows; others cobble together processes and applications to manipulate the data. While the provenance needs are the same (the inputs and set of processes used must be kept), the identity needs are very different. We outline several identification strategies that can be used for data manipulation outside of workflows. We evaluate these strategies in terms of time to create and store identity, and the space needed to keep this information. Additionally, we discuss the strengths and weaknesses of each strategy.
Published: 2008

30. Making database systems usable

Author: Aaron Elkiss, H. V. Jagadish, Cong Yu, Adriane Chapman, Yunyao Li, Magesh Jayapandian, and Arnab Nandi
Subjects: Database server, Physical data model, Database, Alias, Computer science, View, Data manipulation language, Database schema, computer.software_genre, Database design, Database testing, Data modeling, Data model, Schema (psychology), Entity–relationship model, Database theory, computer, Intelligent database, Database model, Data administration
Abstract: Database researchers have striven to improve the capability of a database in terms of both performance and functionality. We assert that the usability of a database is as important as its capability. In this paper, we study why database systems today are so difficult to use. We identify a set of five pain points and propose a research agenda to address these. In particular, we introduce a presentation data model and recommend direct data manipulation with a schema later approach. We also stress the importance of provenance and of consistency across presentation models.
Published: 2007

31. Provenance management in curated databases

Author: James Cheney, Adriane Chapman, and Peter Buneman
Subjects: Provenance, Database, Computer science, Overhead (computing), computer.software_genre, computer
Abstract: Curated databases in bioinformatics and other disciplines are the result of a great deal of manual annotation, correction and transfer of data from other sources. Provenance information concerning the creation, attribution, or version history of such data is crucial for assessing its integrity and scientific value. General purpose database systems provide little support for tracking provenance, especially when data moves among databases. This paper investigates general-purpose techniques for recording provenance for data that is copied among databases. We describe an approach in which we track the user's actions while browsing source databases and copying data into a curated database, in order to record the user's actions in a convenient, queryable form. We present an implementation of this technique and use it to evaluate the feasibility of database support for provenance management. Our experiments show that although the overhead of a naive approach is fairly high, it can be decreased to an acceptable level using simple optimizations.
Published: 2006

32. A provenance model for manually curated data

Author: James Cheney, Stijn Vansummeren, Adriane Chapman, and Peter Buneman
Subjects: World Wide Web, Annotation, Provenance, Focus (computing), Information retrieval, Workflow, Computer science, Process (engineering), Grid, Data modeling
Abstract: Many curated databases are constructed by scientists integrating various existing data sources "by hand", that is, by manually entering or copying data from other sources. Capturing provenance in such an environment is a challenging problem, requiring a good model of the process of curation. Existing models of provenance focus on queries/views in databases or computations on the Grid, not updates of databases or Web sites. In this paper we motivate and present a simple model of provenance for manually curated databases and discuss ongoing and future work.
Published: 2006

33. TIMBER

Author: H. V. Jagadish, Yuqing Wu, Laks V. S. Lakshmanan, Andrew Nierman, Divesh Srivastava, Cong Yu, Nuwee Wiwatwattana, Jignesh M. Patel, Adriane Chapman, Stelios Paparizos, and Shurug Al-Khalifa
Subjects: Document Structure Description, XML Encryption, Relational database, View, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, computer.software_genre, Query optimization, Simple API for XML, XML Schema Editor, Streaming XML, XML schema, Database model, computer.programming_language, Information retrieval, Database, XML validation, computer.file_format, XML framework, XML database, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog
Abstract: XML has become ubiquitous, and XML data has to be managed in databases. The current industry standard is to map XML data into relational tables and store this information in a relational database. Such mappings create both expressive power problems and performance problems. In the TIMBER [7] project we are exploring the issues involved in storing XML in native format. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
Published: 2003
