78 results on '"Lipyeow Lim"'
Search Results
52. Expanding CNN-Based Plant Phenotyping Systems to Larger Environments
- Author
-
Kyungim Baek, Lipyeow Lim, and Jonas Krause
- Subjects
Scope (project management) ,Computer science ,Process (engineering) ,business.industry ,Plant phenotyping ,Machine learning ,computer.software_genre ,Convolutional neural network ,Categorization ,Similarity (psychology) ,Key (cryptography) ,Artificial intelligence ,business ,computer
- Abstract
Plant phenotyping systems strive to maintain high categorization accuracy when expanding their scopes to larger environments. In this paper, we discuss problems associated with expanding the plant categorization scope. These problems are particularly complicated due to the increase in the number of species and the inter-species similarity. In our approach, we modify previously trained Convolutional Neural Networks (CNNs) and integrate domain-specific knowledge in the fine-tuning process of these models to maintain high accuracy while expanding the scope. This process is the key idea behind our CNN-based expanding approach resulting in plant-expert models. Experiments described in this paper compare the accuracy of an expanded phenotyping system using different plant-related datasets during the training of the CNN categorization models. Although it takes much longer to train these models, our approach achieves better performance compared to models trained without the integration of domain-specific knowledge, especially when the number of species increases significantly.
- Published
- 2020
- Full Text
- View/download PDF
53. Cloud-based query evaluation for energy-efficient mobile sensing
- Author
-
Archan Misra, Sougata Sen, Lipyeow Lim, Youngki Lee, Rajesh Krishna Balan, and Tianli Mo
- Subjects
Power management ,Exploit ,Association rule learning ,Computer Networks and Communications ,Computer science ,media_common.quotation_subject ,Real-time computing ,Fidelity ,Cloud computing ,02 engineering and technology ,Query optimization ,computer.software_genre ,Correlation ,Query expansion ,Phone ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,media_common ,020203 distributed computing ,business.industry ,Probabilistic logic ,Computer Science Applications ,Hardware and Architecture ,Sargable ,Data mining ,Mobile sensing ,business ,computer ,Software ,Information Systems ,Efficient energy use
- Abstract
In this paper, we reduce the energy overheads of continuous mobile sensing, specifically for the case of context-aware applications that are interested in collective context or events, i.e., events expressed as a set of complex predicates over sensor data from multiple smartphones. We propose a cloud-based query management and optimization framework, called CloQue, that can support thousands of such concurrent queries, executing over a large number of individual smartphones. Our central insight is that the contexts of different individuals and groups are often significantly correlated, and that this correlation can be learned through standard association rule mining on historical data. CloQue exploits such correlation to reduce energy overheads via two key innovations: (i) dynamically reordering predicate processing to preferentially select predicates that not only have lower sensing cost and higher selectivity, but also maximally reduce the uncertainty about other context predicates; and (ii) intelligently propagating the query evaluation results to dynamically update the confidence values of other correlated context predicates. We present techniques for probabilistic processing of context queries (to save significant energy at the cost of a query fidelity loss) and for query partitioning (to scale CloQue to a large number of users while meeting latency bounds). An evaluation, using real cellphone traces from two different datasets, shows significant energy savings (between 30% and 50% compared with traditional short-circuit systems) with little loss in accuracy (5% at most). In addition, we utilize parallel evaluation to reduce overall latency. The experiments show that our approaches reduce latency by up to 70%.
- Published
- 2017
- Full Text
- View/download PDF
54. A Guided Multi-Scale Categorization of Plant Species in Natural Images
- Author
-
Kyungim Baek, Jonas Krause, and Lipyeow Lim
- Subjects
Similarity (geometry) ,Computer science ,business.industry ,Deep learning ,Feature extraction ,Pattern recognition ,Image segmentation ,Convolutional neural network ,Identification (information) ,Categorization ,Plant species ,Artificial intelligence ,business ,Scale (map)
- Abstract
Automatic categorization of plant species in natural images is an important computer vision problem with numerous applications in agriculture and botany. The problem is particularly challenging due to the large number of plant species, the inter-species similarity, the large scale variations in natural images, and the lack of annotated data. In this paper, we present a guided multi-scale approach that segments the regions of interest (containing a plant) from a complex background of the natural image and systematically extracts scale-representative patches based on those regions. These multi-scale patches are used to train state-of-the-art Convolutional Neural Network (CNN) models that analyze a given plant image and determine its species. Focusing specifically on the identification of plant species in natural images, we show that the proposed approach is a very effective way of making deep learning models more robust to scale variations. We perform a comprehensive experimental evaluation of our proposed method over several CNN models. Our best result on the Inception-ResNet-v2 model achieves a top-1 classification accuracy of 89.21% for 100 plant species which represents a 5.4% increase over using random cropping to generate training data.
- Published
- 2019
- Full Text
- View/download PDF
55. Parallelizing String Similarity Join Algorithms
- Author
-
Lipyeow Lim and Ling-Chih Yao
- Subjects
Speedup ,Computer science ,business.industry ,Big data ,Joins ,020207 software engineering ,02 engineering and technology ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Join (sigma algebra) ,String metric ,business ,Algorithm
- Abstract
A key operation in data cleaning and integration is the use of string similarity join (SSJ) algorithms to identify and remove duplicates or similar records within data sets. With the advent of big data, a natural question is how to parallelize SSJ algorithms. There is a large body of existing work on SSJ algorithms and parallelizing each one of them may not be the most feasible solution. In this paper, we propose a parallelization framework for string similarity joins that utilizes existing SSJ algorithms. Our framework partitions the data using a variety of partitioning strategies and then executes the SSJ algorithms on the partitions in parallel. Some of the partitioning strategies that we investigate trade accuracy for speed. We implemented and validated our framework on several SSJ algorithms and data sets. Our experiments show that our framework results in significant speedup with little loss in accuracy.
- Published
- 2018
- Full Text
- View/download PDF
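The partition-then-join idea in the abstract of entry 55 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's framework: the names `local_join`, `partition_by_first_token`, and `parallel_join` are hypothetical, token-set Jaccard similarity stands in for the string metric, and a naive all-pairs join stands in for an existing SSJ algorithm. Partitioning by the smallest token is one simple strategy that trades accuracy for speed, since similar strings that land in different partitions are never compared.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def tokens(s):
    """Lower-cased whitespace tokens of a string, as a set."""
    return frozenset(s.lower().split())


def jaccard(a, b):
    """Jaccard similarity of the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def local_join(records, threshold):
    """Naive all-pairs join; a stand-in for any existing SSJ algorithm."""
    return {
        (a, b)
        for a, b in combinations(sorted(records), 2)
        if jaccard(a, b) >= threshold
    }


def partition_by_first_token(records):
    """Hypothetical lossy strategy: group records by their smallest token."""
    parts = {}
    for r in records:
        key = min(tokens(r), default="")
        parts.setdefault(key, []).append(r)
    return list(parts.values())


def parallel_join(records, threshold, workers=4):
    """Run the local join on every partition and union the results.

    Threads keep the sketch portable; CPU-bound joins in CPython would
    need processes to obtain real parallel speedup.
    """
    parts = partition_by_first_token(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: local_join(p, threshold), parts))
    matches = set()
    for r in results:
        matches |= r
    return matches


records = [
    "data cleaning tools",
    "data cleaning tool",
    "apple pie recipe",
    "apple pie recipes",
    "big data cleaning tools",
]
exact = local_join(records, 0.5)
approx = parallel_join(records, 0.5)
print(sorted(exact - approx))  # pairs lost to partitioning
```

In this toy data set the partitioned join returns a subset of the exact result: the pair involving "big data cleaning tools" is lost because that record falls into its own partition.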
56. Efficient index compression in DB2 LUW
- Author
-
Timothy R. Malkemus, Sherman Lau, Zoltan Toth, Cathy Mcarthur, George A. Mihaila, Lipyeow Lim, Bishwaranjan Bhattacharjee, Reza Sherkat, and Kenneth A. Ross
- Subjects
Unix ,Index (economics) ,Database ,business.industry ,Computer science ,Transaction processing ,Computer data storage ,General Engineering ,Workload ,business ,computer.software_genre ,computer ,Data warehouse
- Abstract
In database systems, the costs of data storage and retrieval are important components of the total cost and response time of the system. A popular mechanism to reduce the storage footprint is to compress the data residing in tables and indexes. Compressing indexes efficiently, while maintaining response time requirements, is known to be challenging. This is especially true when designing for a workload spectrum covering both data warehousing and transaction processing environments. DB2 Linux, UNIX, Windows (LUW) recently introduced index compression for use in both environments. This uses techniques that are able to compress index data efficiently while incurring virtually no performance penalty for query processing. On the contrary, for certain operations, the performance is actually better. In this paper, we detail the design of index compression in DB2 LUW and discuss the challenges that were encountered in meeting the design goals. We also demonstrate its effectiveness by showing performance results on typical customer scenarios.
- Published
- 2009
- Full Text
- View/download PDF
57. Cost-Optimal Execution of Boolean Query Trees with Shared Streams
- Author
-
Yves Robert, Dounia Zaidouni, Henri Casanova, Lipyeow Lim, and Frédéric Vivien
- Subjects
Data stream ,Theoretical computer science ,Computer science ,Heuristic ,Data stream mining ,0102 computer and information sciences ,02 engineering and technology ,Query optimization ,01 natural sciences ,Tree traversal ,Query expansion ,010201 computation theory & mathematics ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,[INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] ,Greedy algorithm ,Heuristics ,Algorithm ,Boolean conjunctive query
- Abstract
The processing of queries expressed as trees of Boolean operators applied to predicates on sensor data streams has several applications in mobile computing. Sensor data must be retrieved from the sensors, which incurs a cost, e.g., an energy expense that depletes the battery of a mobile query processing device. The objective is to determine the order in which predicates should be evaluated so as to shortcut part of the query evaluation and minimize the expected cost. This problem has been studied assuming that each data stream occurs at a single predicate. In this work we remove this assumption since it does not necessarily hold for real-world queries. Our main results are an optimal algorithm for single-level trees and a proof of NP-completeness for DNF trees. For DNF trees, however, we show that there is an optimal predicate evaluation order that corresponds to a depth-first traversal. This result provides inspiration for a class of heuristics. We show that one of these heuristics largely outperforms other sensible heuristics, including a heuristic proposed in previous work.
- Published
- 2014
- Full Text
- View/download PDF
58. Web-Age Information Management : 13th International Conference, WAIM 2012, Harbin, China, August 18-20, 2012. Proceedings
- Author
-
Hong Gao, Lipyeow Lim, Wei Wang, Chuan Li, and Lei Chen
- Subjects
- Database management, Data mining, Data protection, Computer networks, Information storage and retrieval systems, Application software
- Abstract
This book constitutes the refereed proceedings of the 13th International Conference on Web-Age Information Management, WAIM 2012, held in Harbin, China in August 2012. The 32 revised full papers presented together with 10 short papers and three keynotes were carefully reviewed and selected from a total of 178 submissions. The papers are organized in topical sections on wireless sensor networks; data warehousing and data mining; query processing; spatial databases; similarity search and queries; XML and Web data; graph and uncertain data; distributed computing; data security and management; information extraction and integration; and social networks and modern Web services.
- Published
- 2012
59. Energy-Efficient Collaborative Query Processing Framework for Mobile Sensing Services
- Author
-
Archan Misra, Tianli Mo, Lipyeow Lim, Jin Yang, and Kai-Uwe Sattler
- Subjects
business.industry ,Data stream mining ,Computer science ,Distributed computing ,Mobile computing ,Query optimization ,law.invention ,Bluetooth ,law ,Server ,Mobile telephony ,business ,Mobile device ,Dissemination ,Computer network
- Abstract
Many emerging context-aware mobile applications involve the execution of continuous queries over sensor data streams generated by a variety of on-board sensors on multiple personal mobile devices (aka smartphones). To reduce the energy overheads of such large-scale, continuous mobile sensing and query processing, this paper introduces CQP, a collaborative query processing framework that exploits the overlap (in both the sensor sources and the query predicates) across multiple smartphones. The framework automatically identifies the shareable parts of multiple executing queries, and then reduces the overheads of repetitive execution and data transmissions, by having a set of 'leader' mobile nodes execute and disseminate these shareable partial results. To further reduce energy, CQP utilizes lower-energy short-range wireless links (such as Bluetooth) to disseminate such results directly among proximate smartphones. We describe algorithms to support our server-assisted distributed query sharing and optimization strategy. Simulation experiments indicate that this approach can result in a 60% reduction in the energy overhead of continuous query processing; when 'leader' selection is dynamically rotated to equitably share the burden, we observe an increase of up to 65% in operational lifetime.
- Published
- 2013
- Full Text
- View/download PDF
60. Query-aware compression of join results
- Author
-
Christian A. Lang, Christopher M. Mullins, and Lipyeow Lim
- Subjects
Query plan ,Query expansion ,Web search query ,Database ,Web query classification ,View ,Computer science ,Data_CODINGANDINFORMATIONTHEORY ,computer.software_genre ,Query language ,Query optimization ,computer ,Data compression
- Abstract
Client-server database query processing has become an important paradigm in many data processing applications today. In cloud-based data services, for example, queries over structured data are sent to cloud-based servers for processing and the results relayed back to the client devices. Network bandwidth between client devices and cloud-based servers is often a limited resource and the use of data compression to reduce the amount of query result data transmitted would not only conserve bandwidth but also help with battery lifetime in the case of mobile client devices. For query result compression, current data compression methods do not exploit redundancy information that can be inferred from the query structure itself for greater compression. In this paper we propose a novel query-aware compression method for compressing query results sent from database servers to client applications. Our method is based on two key ideas. We exploit redundancy information obtained from the query plan and possibly from the database schema to achieve better compression than standard non-query-aware compressors. We use a collection of memory-limited dictionaries to encode attribute values in a lightweight and efficient manner. Each dictionary in the collection is also dynamically resized to adapt to changing temporal access characteristics. We evaluated our method empirically using the TPC-H benchmark and show that this technique is effective, especially when used in conjunction with standard compressors. Our results show that compression ratios of up to twice that of gzip are possible.
- Published
- 2013
- Full Text
- View/download PDF
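The "memory-limited dictionaries" mentioned in the abstract of entry 60 can be pictured with a small sketch. This is an illustration under stated assumptions rather than the paper's method: the classes `DictEncoder` and `DictDecoder` are hypothetical, each handles a single attribute, the capacity is fixed (the dynamic resizing described in the abstract is not modeled), and least-recently-used eviction is an assumed policy. The encoder and decoder apply the same deterministic updates, so a slot reference always resolves to the same value on both sides.

```python
from collections import OrderedDict


class DictEncoder:
    """Bounded dictionary encoder for one attribute (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # value -> slot, least recently used first

    def encode(self, value):
        if value in self.slots:
            self.slots.move_to_end(value)
            return ("ref", self.slots[value])  # short reference to a slot
        if len(self.slots) < self.capacity:
            slot = len(self.slots)  # next free slot
        else:
            _, slot = self.slots.popitem(last=False)  # reuse the LRU slot
        self.slots[value] = slot
        return ("lit", value)  # literal; the decoder learns it implicitly


class DictDecoder:
    """Mirror of DictEncoder: replays the same insertions and evictions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = OrderedDict()  # slot -> value, least recently used first

    def decode(self, token):
        kind, payload = token
        if kind == "ref":
            self.values.move_to_end(payload)
            return self.values[payload]
        if len(self.values) < self.capacity:
            slot = len(self.values)
        else:
            slot, _ = self.values.popitem(last=False)
        self.values[slot] = payload
        return payload


column = ["US", "US", "DE", "US", "FR", "DE", "US"]
encoder, decoder = DictEncoder(capacity=2), DictDecoder(capacity=2)
encoded = [encoder.encode(v) for v in column]
decoded = [decoder.decode(t) for t in encoded]
print(encoded)
print(decoded == column)
```

Repeated values become small slot references while memory stays bounded by the capacity; a general-purpose compressor could then be applied to the resulting token stream.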
61. The case for cloud-enabled mobile sensing services
- Author
-
Archan Misra, Lipyeow Lim, Sougata Sen, and Rajesh Krishna Balan
- Subjects
business.industry ,Computer science ,Mobile computing ,Cloud computing ,Mobile Web ,Intelligent Network ,Mobile station ,Mobile database ,Mobile search ,Mobile technology ,GSM services ,business ,Mobile device ,Computer network
- Abstract
We make the case for cloud-enabled mobile sensing services that support an emerging application class, one which infers near-real time collective context using sensor data obtained continuously from a large set of consumer mobile devices. We present the high-level architecture and functional requirements for such a mobile sensing service, and argue that such a service can significantly improve the scalability and energy-efficiency of large-scale mobile sensing by coordinating the sensing & processing tasks across multiple devices. We then focus specifically on the problem of energy-efficiency and provide early exemplars of how optimizing query execution jointly over multiple phones can lead to substantial energy savings.
- Published
- 2012
- Full Text
- View/download PDF
62. Semantic Link Discovery over Relational Data
- Author
-
Renée J. Miller, Lipyeow Lim, Anastasios Kementsietsidis, Min Wang, and Oktie Hassanzadeh
- Subjects
Information retrieval ,Computer science ,business.industry ,Relational database ,Semantic search ,computer.file_format ,Ontology (information science) ,Semantics ,Semantic computing ,Semantic Web Stack ,RDF ,business ,Semantic Web ,computer
- Abstract
To make semantic search a reality, we need to be able to efficiently publish large data sets containing rich semantic structure. We have tools for translating relational and semi-structured data into RDF, but such translation tools do not have the goal of adding or providing the kind of semantics necessary to achieve the goals of the Semantic Web and semantic search over the Web. In this chapter, we present LinQuer, a tool for creating semantic links within a data source and between data sources. We focus on link discovery over structured (relational) data since many Semantic Web sources are the result of publishing relational data as RDF and since relational engines provide the scalability and flexibility we need for large scale link discovery. The LinQuer framework is based on the declarative specification of linkage requirements by a user. We present algorithms for translating these requirements to queries that can run over relational data sources, potentially using semantic information (such as a class hierarchy or a more general ontology) to enhance the recall of the link discovery. We show that this framework is flexible enough to permit linking real data, including dirty data (which is commonly found on the Web) and data with a variety of semantic connections.
- Published
- 2012
- Full Text
- View/download PDF
63. Optimizing Sensor Data Acquisition for Energy-Efficient Smartphone-Based Continuous Event Processing
- Author
-
Lipyeow Lim and Archan Misra
- Subjects
Data acquisition ,Asynchronous communication ,Computer science ,Data stream mining ,Real-time computing ,Mobile computing ,Complex event processing ,Overhead (computing) ,Wireless sensor network ,Mobile device
- Abstract
Many pervasive applications, such as activity recognition or remote wellness monitoring, utilize a personal mobile device (aka smart phone) to perform continuous processing of data streams acquired from locally-connected, wearable sensors. To ensure the continuous operation of such applications on a battery-limited mobile device, it is essential to dramatically reduce the energy overhead associated with the process of sensor data acquisition and processing. To achieve this goal, this paper introduces a technique of 'acquisition-cost'-aware continuous query processing, as part of the Acquisition Cost-Aware Query Adaptation (ACQUA) framework. ACQUA replaces the current paradigm, where the data is typically streamed (pushed) from the sensors to the smart phone, with a pull-based asynchronous model, where the phone retrieves appropriate blocks of sensor data from individual sensors, only when the stream elements are judged to be relevant to the query being processed. We describe algorithms that dynamically optimize the sequence (for complex stream queries with conjunctive and disjunctive predicates) in which such sensor data streams are retrieved by the phone, based on a combination of the communication cost and selectivity properties of individual sensor streams. Simulation experiments indicate that this approach can result in a 70% reduction in the energy overhead of continuous query processing, without affecting the fidelity of the processing logic.
- Published
- 2011
- Full Text
- View/download PDF
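The trade-off between acquisition cost and selectivity described in the abstract of entry 63 can be made concrete for the simplest case, a purely conjunctive query with short-circuit evaluation. The sketch below is not ACQUA's algorithm; it uses the textbook rule for statistically independent predicates, which orders them by cost divided by the probability of evaluating to false. The function names and the (cost, probability) pairs are hypothetical.

```python
from itertools import permutations


def expected_cost(order):
    """Expected acquisition cost of a conjunction evaluated in this order.

    Each predicate is a (cost, p_true) pair; evaluation stops at the first
    predicate that is false, so later costs are paid only if all earlier
    predicates were true. Predicates are assumed independent.
    """
    total, p_reach = 0.0, 1.0
    for cost, p_true in order:
        total += p_reach * cost
        p_reach *= p_true
    return total


def greedy_order(preds):
    """Sort by cost / (1 - p_true); always-true predicates go last."""
    def rank(pred):
        cost, p_true = pred
        return cost / (1.0 - p_true) if p_true < 1.0 else float("inf")
    return sorted(preds, key=rank)


# Hypothetical sensors: (acquisition cost, probability the predicate holds).
preds = [(5.0, 0.9), (1.0, 0.5), (3.0, 0.2)]
plan = greedy_order(preds)
brute_force = min(permutations(preds), key=expected_cost)
print(plan, expected_cost(plan), expected_cost(brute_force))
```

A cheap predicate that is rarely true is worth pulling first, because it is likely to end the evaluation before the expensive sensors are queried. Disjunctions and correlated predicates, which the abstracts in this listing also address, need more than this simple rule.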
64. Optimizing Access across Multiple Hierarchies in Data Warehouses
- Author
-
Bishwaranjan Bhattacharjee and Lipyeow Lim
- Subjects
Database ,Computer science ,Resource allocation ,Dimension (data warehouse) ,computer.software_genre ,computer ,Enterprise data management ,Bottleneck ,Data warehouse
- Abstract
In enterprise data warehouses, different users in different business units often define their own application-specific dimension hierarchies tailor-made to their reporting and business performance monitoring needs. Due to resource constraints, only a small number of these hierarchies are precomputed for performance optimization. Consequently, aggregations over hierarchies without precomputations are often less responsive. We report on a performance problem in a very large banking enterprise where the large number of application-specific hierarchies became a performance bottleneck. This paper proposes a novel solution for optimizing the performance of data warehouses with a large number of application-specific hierarchies. We exploit the observation that dimension hierarchies in real data warehouses often contain significant overlaps. Our method detects common sub-structures among hierarchies and provides a rewriting algorithm to exploit any precomputations on these shared sub-structures. Our solution is applicable to data warehouses of large enterprises with a large number of business units and hence a large number of application-specific hierarchies.
- Published
- 2011
- Full Text
- View/download PDF
65. Challenges on Modeling Hybrid XML-Relational Databases
- Author
-
Mirella M. Moro, Lipyeow Lim, and Yuan-Chi Chang
- Subjects
Information retrieval ,computer.internet_protocol ,Relational database ,Computer science ,Redundancy (engineering) ,Business artifacts ,computer ,XML
- Abstract
It is well known that XML has been widely adopted for its flexible and self-describing nature. However, relational data will continue to co-exist with XML for several reasons, one of which is the high cost of transferring everything to XML. In this context, data designers face the problem of modeling both relational and XML data within an integrated environment. This chapter highlights important questions on hybrid XML-relational database design and discusses use cases, requirements, and deficiencies in existing design methodologies, especially in the light of data and schema evolution. The authors’ analysis results in several design guidelines and a series of challenges to be addressed by future research.
- Published
- 2010
- Full Text
- View/download PDF
66. Statistics-based parallelization of XPath queries in shared memory systems
- Author
-
Bryant Wei Lun Kok, Anastasios Kementsietsidis, Lipyeow Lim, and Rajesh Bordawekar
- Subjects
Theoretical computer science ,Exploit ,computer.internet_protocol ,Computer science ,Parallel computing ,computer.software_genre ,Automatic parallelization ,XML database ,Simple API for XML ,Shared memory ,Statistics ,Latency (engineering) ,computer ,XML ,XPath
- Abstract
The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plagued XML query processing. However, simply executing multiple XML queries over multiple cores merely addresses the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. Toward this effort, this paper investigates the parallelization of individual XPath queries over shared-address space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies both on the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these different alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.
- Published
- 2010
- Full Text
- View/download PDF
67. A framework for semantic link discovery over relational data
- Author
-
Min Wang, Anastasios Kementsietsidis, Renée J. Miller, Lipyeow Lim, and Oktie Hassanzadeh
- Subjects
Information retrieval ,Computer science ,business.industry ,Relational database ,Data management ,Linked data ,computer.file_format ,computer.software_genre ,Data modeling ,Data mapping ,Open data ,Data model ,Relational database management system ,Information system ,Semi-structured data ,RDF ,business ,computer
- Abstract
Discovering links between different data items in a single data source or across different data sources is a challenging problem faced by many information systems today. In particular, the recent Linking Open Data (LOD) community project has highlighted the paramount importance of establishing semantic links among web data sources. Currently, LOD sources provide billions of RDF triples, but only millions of links between data sources. Many of these data sources are published using tools that operate over relational data stored in a standard RDBMS. In this paper, we present a framework for discovery of semantic links from relational data. Our framework is based on declarative specification of linkage requirements by a user. We illustrate the use of our framework using several link discovery algorithms on a real world scenario. Our framework allows data publishers to easily find and publish high-quality links to other data sources, and therefore could significantly enhance the value of the data in the next generation of web.
- Published
- 2009
- Full Text
- View/download PDF
68. Semantic queries in databases
- Author
-
Haixun Wang, Lipyeow Lim, and Min Wang
- Subjects
SQL ,View ,Relational database ,Computer science ,Ontology (information science) ,computer.software_genre ,Query language ,Query optimization ,Search-oriented architecture ,Web query classification ,Entity–relationship model ,Upper ontology ,Query by Example ,computer.programming_language ,Information retrieval ,Graph database ,Database ,Ontology-based data integration ,InformationSystems_DATABASEMANAGEMENT ,Spatial query ,Relational calculus ,Ontology ,Relational model ,Sargable ,Conjunctive query ,computer ,Boolean conjunctive query ,RDF query language
- Abstract
Supporting semantic queries in relational databases is essential to many advanced applications. Recently, with the increasing use of ontology in various applications, the need for querying relational data together with its related ontology has become more urgent. In this paper, we identify and discuss the problem of querying relational data with its ontologies. Two fundamental challenges make the problem interesting. First, it is extremely difficult to express queries against graph structured ontology in the relational query language SQL, and second, in many cases where data and its related ontology are complicated, queries are usually not precise, that is, users often have only a vague notion, rather than a clear understanding and definition, of what they query for. We outline a query-by-example approach that enables us to support semantic queries in relational databases with ease. Instead of endeavoring to incorporate ontology into relational form and create new language constructs to express such queries, we ask the user to provide a small number of examples that satisfy the query she has in mind. Using these examples as seeds, the system infers the exact query automatically, and the user is therefore shielded from the complexity of interfacing with the ontology.
- Published
- 2009
- Full Text
- View/download PDF
69. A declarative framework for semantic link discovery over relational data
- Author
-
Min Wang, Anastasios Kementsietsidis, Lipyeow Lim, and Oktie Hassanzadeh
- Subjects
World Wide Web ,Matching (statistics) ,law ,Computer science ,Relational database ,Semantic memory ,Linkage (mechanical) ,Semantic Web Stack ,Linked data ,Semantic data model ,Semantic Web ,Record linkage ,law.invention
- Abstract
In this paper, we present a framework for online discovery of semantic links from relational data. Our framework is based on declarative specification of the linkage requirements by the user, which allows matching data items in many real-world scenarios. These requirements are translated to queries that can run over the relational data source, potentially using the semantic knowledge to enhance the accuracy of link discovery. Our framework lets data publishers easily find and publish high-quality links to other data sources, and therefore could significantly enhance the value of the data in the next generation of web.
- Published
- 2009
- Full Text
- View/download PDF
70. Parallelization of XPath queries using multi-core processors
- Author
-
Lipyeow Lim, Oded Shmueli, and Rajesh Bordawekar
- Subjects
Multi-core processor ,computer.internet_protocol ,XPath 2.0 ,Computer science ,Programming language ,InformationSystems_DATABASEMANAGEMENT ,Parallel computing ,computer.software_genre ,Path expression ,Data modeling ,Set (abstract data type) ,XML Schema (W3C) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,XPath
- Abstract
In this study, we present experiences of parallelizing XPath queries using the Xalan XPath engine on shared-address space multi-core systems. For our evaluation, we consider a scenario where an XPath processor uses multiple threads to concurrently navigate and execute individual XPath queries on a shared XML document. Given the constraints of the XML execution and data models, we propose three strategies for parallelizing individual XPath queries: Data partitioning, Query partitioning, and Hybrid (query and data) partitioning. We experimentally evaluated these strategies on an x86 Linux multi-core system using a set of XPath queries, invoked on a variety of XML documents using the Xalan XPath APIs. Experimental results demonstrate that the proposed parallelization strategies work very effectively in practice; for a majority of XPath queries under evaluation, the execution performance scaled linearly as the number of threads was increased. Results also revealed the pros and cons of the different parallelization strategies for different XPath query patterns.
- Published
- 2009
71. Modeling and Querying E-Commerce Data in Hybrid Relational-XML DBMSs
- Author
-
Min Wang, Haixun Wang, and Lipyeow Lim
- Subjects
Document Structure Description ,XML Encryption ,Database ,Computer science ,Efficient XML Interchange ,XML validation ,computer.file_format ,computer.software_genre ,XML framework ,XML database ,XML Schema Editor ,Streaming XML ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer - Abstract
Data in many industrial application systems are often neither completely structured nor completely unstructured. Consequently, semi-structured data models such as XML have become popular as a lowest common denominator to manage such data. The problem is that although XML is adequate to represent the flexible portion of the data, it fails to exploit the highly structured portion of the data. XML normalization theory could be used to factor out the structured portion of the data at the schema level; however, queries written against the original schema no longer run on the normalized XML data. In this paper, we propose a new approach called eXtricate that stores XML documents in a space-efficient decomposed way while supporting efficient processing of the original queries. Our method exploits the fact that a considerable amount of information is shared among similar XML documents, and by regarding each document as consisting of a shared framework and a small diff script, we can leverage the strengths of both the relational and XML data models at the same time to handle such data effectively. We prototyped our approach on top of DB2 9 pureXML (a commercial hybrid relational-XML DBMS). Our experiments validate the amount of redundancy in real e-catalog data and show the effectiveness of our method.
- Published
- 2008
72. Persisting and querying biometric event streams with hybrid relational-XML DBMS
- Author
-
Lipyeow Lim, Kyu Hyun Kim, Min Wang, and Daby Sow
- Subjects
Biometrics ,Database ,Relational database ,business.industry ,computer.internet_protocol ,Emerging technologies ,Computer science ,Complex event processing ,computer.software_genre ,Stream processing ,Health care ,Leverage (statistics) ,business ,computer ,XML - Abstract
Remote monitoring of patients' biometric data streams offers physicians the possibility to extend and improve their services to chronically ill patients who are away from medical institutions. This emerging technology is a promising way to address important aspects of the cost issues that most health care systems are experiencing. To fulfill its potential, several challenges need to be overcome. First, the data collected needs to be filtered and annotated intelligently to help physicians cope with and navigate the large amount of patient sensor data received as a result of large-scale remote health monitoring deployments. Second, efficient stream persistence and query mechanisms for these data need to be designed to satisfy health care regulations and help physicians track patient health histories accurately and efficiently. In this paper, we concentrate on the second challenge. We leverage emerging hybrid relational-XML database management systems to design a storage sub-system for remote health monitoring. We evaluate this approach by performing a series of performance tests to assess the ability of the proposed system to handle the huge amount of biometric data streams requiring persistence.
- Published
- 2007
73. Schema advisor for hybrid relational-XML DBMS
- Author
-
Lipyeow Lim, Mirella M. Moro, and Yuan-Chi Chang
- Subjects
Document Structure Description ,XML Encryption ,Relational database ,computer.internet_protocol ,Computer science ,Semi-structured model ,Efficient XML Interchange ,XML Signature ,computer.software_genre ,Database design ,Simple API for XML ,Relational database management system ,XML Schema Editor ,Schema (psychology) ,Streaming XML ,XML schema ,computer.programming_language ,Database model ,Database ,Search engine indexing ,Database schema ,XML validation ,computer.file_format ,XML framework ,XML database ,XML Schema (W3C) ,Data model ,Information model ,computer ,XML - Abstract
In response to the widespread use of the XML format for document representation and message exchange, major database vendors support XML in terms of persistence, querying and indexing. Specifically, the recently released IBM DB2 9 (for Linux, Unix and Windows) is a hybrid data server with optimized management of both XML and relational data. With the new option of storing and querying XML in a relational DBMS, data architects face the decision of what portion of their data to persist as XML and what portion as relational data. This problem has not been addressed yet and represents a serious need in the industry. Hence, this paper describes ReXSA, a schema advisor tool that is being prototyped for IBM DB2 9. ReXSA proposes candidate database schemas given an information model of the enterprise data. It has the advantage of considering qualitative properties of the information model such as reuse, evolution and performance profiles for deciding how to persist the data. Finally, we show the viability and practicality of ReXSA by applying it to custom and real use cases.
- Published
- 2007
74. Semantic Data Management: Towards Querying Data with their Meaning
- Author
-
Haixun Wang, Min Wang, and Lipyeow Lim
- Subjects
SQL ,Information retrieval ,computer.internet_protocol ,Relational database ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,Ontology (information science) ,computer.software_genre ,Query language ,Data modeling ,XQuery ,Relational database management system ,Ontology ,Domain knowledge ,computer ,XML ,computer.programming_language ,Semantic matching - Abstract
Relational database management systems are constantly being extended and augmented to accommodate data in different domains. Recently, with the increasing use of ontology in various applications, the need to support ontology, especially the related inferencing operation, in DBMS has become more concrete and urgent. However, manipulating knowledge along with relational data in DBMSs is not a trivial undertaking due to the mismatch in data models. In this paper, we introduce a framework for managing relational data and hierarchical domain knowledge together. Our framework persists taxonomies contained in ontologies by leveraging XML support in hybrid relational-XML DBMSs (e.g., IBM's DB2 v9) and rewrites ontology-based semantic matching queries using the industry-standard query languages, SQL/XML and XQuery. Compared with previous approaches, our approach does not materialize transitive closures of ontological relationships to support inferencing. Consequently, our method has wide applicability and good performance.
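The idea of answering semantic matching queries from a stored taxonomy, without materializing the transitive closure of its relationships, can be sketched in Python. This is an illustration only, not the framework's SQL/XML or XQuery rewriting: the wine taxonomy, the product rows, and the helpers `is_a` and `semantic_match` are hypothetical. The taxonomy is kept as an XML tree, and subsumption is decided by a descendant test at query time.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy stored as XML: nesting encodes the is-a hierarchy.
TAXONOMY = ET.fromstring(
    "<Wine>"
    "<RedWine><Merlot/><Zinfandel/></RedWine>"
    "<WhiteWine><Chardonnay/><Riesling/></WhiteWine>"
    "</Wine>"
)

# Hypothetical relational rows: (product id, concept label).
PRODUCTS = [("p1", "Merlot"), ("p2", "Riesling"), ("p3", "Zinfandel")]


def is_a(taxonomy, concept, ancestor):
    # Find the ancestor concept, then look for the concept in its subtree.
    # iter() includes the node itself, so every concept is-a itself.
    for node in taxonomy.iter(ancestor):
        if any(desc.tag == concept for desc in node.iter()):
            return True
    return False


def semantic_match(rows, taxonomy, ancestor):
    # Return ids of rows whose concept is subsumed by the requested concept.
    return [pid for pid, concept in rows if is_a(taxonomy, concept, ancestor)]


red_wines = semantic_match(PRODUCTS, TAXONOMY, "RedWine")
print(red_wines)
```

No table of (descendant, ancestor) pairs is ever built; the tree navigation plays the role that a descendant-axis path expression would play in an XML-capable DBMS.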
- Published
- 2007
75. Real Time Business Performance Monitoring and Analysis Using Metric Network
- Author
-
Lipyeow Lim, Hui Lei, and Pu Huang
- Subjects
Strategic planning ,Database ,Computer science ,Multi-agent system ,Metric (mathematics) ,Performance monitoring ,Strategic management ,Operational excellence ,Data mining ,Performance indicator ,Construct (python library) ,computer.software_genre ,computer - Abstract
Monitoring and analyzing business performance continuously is now crucial for enterprises to achieve operational excellence and to better align daily operations with long-term business strategies. To do so, performance measures need to be collected from daily operations and aggregated to construct higher-level Key Performance Indicators (KPIs) in nearly real time. We propose a system called metric network for enterprise-wide business performance monitoring and analysis. A metric network consists of metrics, metric repositories, aggregation agents, and knowledge agents. We describe in detail the generic procedure patterns of these metric network entities and their communication pattern. Our loosely coupled design makes it easy to enhance features by adding more metrics and agents. The proposed approach is examined using real metrics on a fictitious scenario.
- Published
- 2006
76. Managing e-commerce catalogs in a DBMS with native XML support
- Author
-
M. Wang and Lipyeow Lim
- Subjects
Database ,business.industry ,Relational database ,computer.internet_protocol ,Computer science ,InformationSystems_DATABASEMANAGEMENT ,E-commerce ,Query optimization ,computer.software_genre ,Schema evolution ,Relational database management system ,Sargable ,IBM ,business ,computer ,XML - Abstract
Electronic commerce is emerging as a major application area for database systems. A large number of e-commerce stores provide electronic product catalogs that allow customers to search for products of interest and store owners to manage various product information. Due to the constant schema evolution and the sparsity of e-commerce data, most commercial e-commerce systems use the so-called vertical schema for data storage. However, query processing for data stored using a vertical schema is extremely inefficient because current RDBMSs, especially their cost-based query optimizers, are specifically designed to deal with traditional horizontal schemas. In this paper, we show that e-catalog management can be naturally supported in IBM's System RX, the first DBMS that truly supports both XML and relational data in their native forms. By leveraging System RX's hybrid nature, we present a novel solution for storing, managing, and querying e-catalog data. In addition to traditional queries, we show that our solution supports semantic queries as well. Our solution does not require a separate query optimization layer, because query optimization is handled within the hybrid DBMS engine itself.
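To make the vertical schema concrete, the following Python sketch stores a small catalog as (object id, attribute, value) rows in an in-memory SQLite database. The table name `vertical`, the sample rows, and the query are assumptions for illustration and do not come from the paper. It shows why such layouts are awkward to query: a two-attribute predicate already needs a self-join, one join per attribute tested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Vertical schema: one row per (object, attribute, value) triple, so sparse
# products with different attribute sets share a single narrow table.
conn.execute("CREATE TABLE vertical (oid INTEGER, attr TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO vertical VALUES (?, ?, ?)",
    [
        (1, "type", "shirt"), (1, "color", "red"), (1, "size", "L"),
        (2, "type", "tv"), (2, "screen", "42in"),
        (3, "type", "shirt"), (3, "color", "red"), (3, "size", "M"),
    ],
)

# "Red products in size L": each attribute in the predicate adds a self-join.
rows = conn.execute(
    """
    SELECT c.oid
    FROM vertical AS c
    JOIN vertical AS s ON c.oid = s.oid
    WHERE c.attr = 'color' AND c.val = 'red'
      AND s.attr = 'size'  AND s.val = 'L'
    ORDER BY c.oid
    """
).fetchall()

matching_oids = [oid for (oid,) in rows]
print(matching_oids)
```

In a horizontal schema the same question would be a single-table filter on two columns, which is the case conventional cost-based optimizers are built for.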
- Published
- 2005
77. Characterizing Web Document Change
- Author
-
Ramesh C. Agarwal, Jeffrey Scott Vitter, Lipyeow Lim, Sriram Padmanabhan, and Min Wang
- Subjects
Web standards ,Web analytics ,medicine.medical_specialty ,Web development ,Web 2.0 ,Computer science ,computer.software_genre ,Social Semantic Web ,World Wide Web ,Search engine ,Web design ,Web page ,medicine ,Web navigation ,Semantic Web Stack ,Data Web ,business.industry ,Web application security ,Web Accessibility Initiative ,Web mining ,Web mapping ,Web service ,business ,Web intelligence ,computer ,Web modeling - Abstract
The World Wide Web is growing and changing at an astonishing rate. For the information in the web to be useful, web information systems such as search engines have to keep up with the growth and change of the web. In this paper we study how web documents change. In particular, we study two important characteristics of web document change that are directly related to keeping web information systems up-to-date: the degree of the change and the clusteredness of the change. We analyze the evolution of web documents with respect to these two measures and discuss the implications for web information systems update.
- Published
- 2001
78. Optimizing Sensor Data Acquisition for Energy-Efficient Smartphone-Based Continuous Event Processing.
- Author
-
Misra, A. and Lipyeow Lim
- Published
- 2011