10 results on '"Inci Cetindil"'
Search Results
2. Similarity query support in big data management systems.
- Author
- Taewoo Kim 0001, Wenhai Li, Alexander Behm, Inci Cetindil, Rares Vernica, Vinayak R. Borkar, Michael J. Carey 0001, and Chen Li 0001
- Published
- 2020
- Full Text
- View/download PDF
3. Efficient instant-fuzzy search with proximity ranking.
- Author
- Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim 0001, and Chen Li 0001
- Published
- 2014
- Full Text
- View/download PDF
4. Analysis of Instant Search Query Logs.
- Author
- Inci Cetindil, Jamshid Esmaelnezhad, Chen Li 0001, and David Newman
- Published
- 2012
5. AsterixDB: A Scalable, Open Source BDMS.
- Author
- Sattam Alsubaiee, Yasser Altowim, Hotham Altwaijry, Alexander Behm, Vinayak R. Borkar, Yingyi Bu, Michael J. Carey 0001, Inci Cetindil, Madhusudan Cheelangi, Khurram Faraaz, Eugenia Gabrielova, Raman Grover, Zachary Heilbron, Young-Seok Kim, Chen Li 0001, Guangqiang Li, Ji Mahn Ok, Nicola Onose, Pouria Pirzadeh, Vassilis J. Tsotras, Rares Vernica, Jian Wen, and Till Westmann
- Published
- 2014
- Full Text
- View/download PDF
6. Interactive and fuzzy search: a dynamic way to explore MEDLINE.
- Author
- Jiannan Wang 0001, Inci Cetindil, Shengyue Ji, Chen Li 0001, Xiaohui Xie, Guoliang Li 0001, and Jianhua Feng
- Published
- 2010
- Full Text
- View/download PDF
7. Similarity query support in big data management systems
- Author
- Michael J. Carey, Alexander Behm, Vinayak Borkar, Wenhai Li, Rares Vernica, Chen Li, Inci Cetindil, and Taewoo Kim
- Subjects
Information retrieval; Computer science; business.industry; Parallel database; Joins; 02 engineering and technology; Query language; Query optimization; Operator (computer programming); Hardware and Architecture; Analytics; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; business; Software; Record linkage; Information Systems
- Abstract
Similarity query processing is becoming increasingly important in many applications such as data cleaning, record linkage, Web search, and document analytics. In this paper we study how to provide end-to-end similarity query support natively in a parallel database system. We discuss how to express a similarity predicate in its query language, how to build indexes, how to answer similarity queries (selections and joins) efficiently in the runtime engine, possibly using indexes, and how to optimize similarity queries. One particular challenge is how to incorporate existing similarity join algorithms, which often require a series of steps to achieve a high efficiency, including collecting token frequencies, finding matching record id pairs, and reassembling result records based on id pairs. We present a novel approach that uses existing runtime operators to implement such complex join algorithms without reinventing the wheel; doing so positions the system to automatically benefit from future improvements to those operators. The approach includes a technique to transform a similarity join plan into an efficient operator-based physical plan during query optimization by using a template expressed largely in the system’s user-level query language; this technique greatly simplifies the specification of such a transformation rule. We use Apache AsterixDB, a parallel Big Data management system, to illustrate and validate our techniques. We conduct an experimental study using several large, real datasets on a parallel computing cluster to assess the similarity query support. We also include experiments involving three other parallel systems and report the efficacy and performance results.
- Published
- 2020
- Full Text
- View/download PDF
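The three stages named in the abstract above (collecting token frequencies, finding matching record id pairs, and reassembling result records) can be sketched in a few lines. This is a minimal single-machine illustration, not the AsterixDB operator plan from the paper. The function names, whitespace tokenization, Jaccard similarity, and the prefix-filtering step are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict
import math


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_self_join(records, threshold):
    """records: dict mapping record id -> text (hypothetical input shape).

    Returns a list of (text_a, text_b, similarity) with Jaccard >= threshold.
    """
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}

    # Stage 1: collect global token frequencies (rare tokens are ordered first).
    freq = Counter(t for toks in tokens.values() for t in toks)

    # Stage 2: find candidate record id pairs with a prefix filter, then verify.
    index = defaultdict(list)  # token -> ids whose prefix contains that token
    candidates = set()
    for rid in sorted(tokens):
        ordered = sorted(tokens[rid], key=lambda t: (freq[t], t))
        n = len(ordered)
        # The small epsilon guards against float error shortening the prefix.
        prefix_len = n - math.ceil(threshold * n - 1e-9) + 1
        for t in ordered[:prefix_len]:
            for other in index[t]:
                candidates.add((other, rid))
            index[t].append(rid)

    id_pairs = []
    for a, b in sorted(candidates):
        sim = jaccard(tokens[a], tokens[b])
        if sim >= threshold:
            id_pairs.append((a, b, sim))

    # Stage 3: reassemble full result records from the matching id pairs.
    return [(records[a], records[b], sim) for a, b, sim in id_pairs]
```

For example, calling `similarity_self_join({1: "big data management systems", 2: "big data management system"}, 0.5)` pairs the two records, since they share three of five distinct tokens.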
8. AsterixDB
- Author
- Yasser Altowim, Khurram Faraaz, Vinayak Borkar, Sattam Alsubaiee, Pouria Pirzadeh, Ji Mahn Ok, Yingyi Bu, Zachary Heilbron, Michael J. Carey, Guangqiang Li, Young-Seok Kim, Vassilis J. Tsotras, Madhusudan Cheelangi, Jian Wen, Chen Li, Till Westmann, Raman Grover, Rares Vernica, Eugenia Gabrielova, Nicola Onose, Hotham Altwaijry, Alexander Behm, and Inci Cetindil
- Subjects
FOS: Computer and information sciences; SQL; Database; business.industry; Computer science; Big data; General Engineering; Databases (cs.DB); computer.software_genre; Query language; NoSQL; Data warehouse; Computer Science - Databases; Data model; Relational database management system; Scalability; business; Database transaction; computer; computer.programming_language
- Abstract
AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.
- Published
- 2014
- Full Text
- View/download PDF
9. Efficient instant-fuzzy search with proximity ranking
- Author
- Taewoo Kim, Chen Li, Jamshid Esmaelnezhad, and Inci Cetindil
- Subjects
Web search query; Information retrieval; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Search engine indexing; Query language; Query optimization; Ranking (information retrieval); Data set; Query expansion; Ranking; Web query classification; Query throughput; Sargable
- Abstract
Instant search is an emerging information-retrieval paradigm in which a system finds answers to a query instantly while a user types in keywords character-by-character. Fuzzy search further improves user search experiences by finding relevant answers with keywords similar to query keywords. A main computational challenge in this paradigm is the high-speed requirement, i.e., each query needs to be answered within milliseconds to achieve an instant response and a high query throughput. At the same time, we also need good ranking functions that consider the proximity of keywords to compute relevance scores. In this paper, we study how to integrate proximity information into ranking in instant-fuzzy search while achieving efficient time and space complexities. We adapt existing solutions on proximity ranking to instant-fuzzy search. A naive solution is computing all answers then ranking them, but it cannot meet this high-speed requirement on large data sets when there are too many answers, so there are studies of early-termination techniques to efficiently compute relevant answers. To overcome the space and time limitations of these solutions, we propose an approach that focuses on common phrases in the data and queries, assuming records with these phrases are ranked higher. We study how to index these phrases and develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers. We conducted a thorough experimental study on real data sets to show the tradeoffs between time, space, and quality of these solutions.
- Published
- 2014
- Full Text
- View/download PDF
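The instant-fuzzy matching described in the abstract above (answering a query as it is typed, while tolerating keywords that are only similar to the data) can be illustrated with a naive sketch. It assumes every typed keyword is treated as a fuzzy prefix and scans all records, so it shows only the matching semantics. It does not implement the paper's indexing, phrase segmentation, early termination, or proximity ranking. The function names and the `max_errors` parameter are hypothetical.

```python
def prefix_edit_distance(query, word):
    """Smallest edit distance between `query` and any prefix of `word`."""
    prev = list(range(len(word) + 1))  # distance from "" to each prefix of word
    for i, qc in enumerate(query, 1):
        cur = [i]  # distance from query[:i] to the empty prefix
        for j, wc in enumerate(word, 1):
            cur.append(min(prev[j] + 1,                 # delete qc
                           cur[j - 1] + 1,              # insert wc
                           prev[j - 1] + (qc != wc)))   # match or substitute
        prev = cur
    return min(prev)  # best over all prefixes of word


def instant_fuzzy_search(records, typed, max_errors=1):
    """Return records in which every typed keyword fuzzily matches a word prefix."""
    keywords = typed.lower().split()
    results = []
    for text in records:
        words = text.lower().split()
        if all(any(prefix_edit_distance(k, w) <= max_errors for w in words)
               for k in keywords):
            results.append(text)
    return results
```

With these assumptions, the partially typed and misspelled query `"instnt se"` still matches a record containing "instant" and "search", because "instnt" is within one edit of "instant" and "se" is an exact prefix of "search".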
10. Interactive and fuzzy search: a dynamic way to explore MEDLINE
- Author
- Jianhua Feng, Shengyue Ji, Jiannan Wang, Xiaohui Xie, Chen Li, Inci Cetindil, and Guoliang Li
- Subjects
Statistics and Probability; PubMed; Computer science; Abstracting and Indexing; MEDLINE; Information Storage and Retrieval; Biochemistry; World Wide Web; Search algorithm; Molecular Biology; Biomedicine; Internet; Information retrieval; business.industry; Approximate string matching; United States; Computer Science Applications; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Index (publishing); The Internet; business; Algorithms; Software
- Abstract
Motivation: The MEDLINE database, consisting of over 19 million publication records, is the primary source of information for biomedicine and health questions. Although the database itself has been growing rapidly, the search paradigm of MEDLINE has remained largely unchanged.
Results: Here, we propose a new system for exploring the entire MEDLINE collection, represented by two unique features: (i) interactive: providing instant feedback to users' query letter by letter, and (ii) fuzzy: allowing approximate search. We develop novel index structures and search algorithms to make such a search model possible. We also develop incremental-update techniques to keep the data up to date.
Availability: Interactive and fuzzy searching algorithms for exploring MEDLINE are implemented in a system called iPubMed, freely accessible over the web at http://ipubmed.ics.uci.edu/ and http://tastier.cs.tsinghua.edu.cn/ipubmed/
Contact: chenli@ics.uci.edu; xhx@ics.uci.edu
- Published
- 2010