Author: "Waas, F. (Florian)" - Searchworks@Jio Institute Digital Library Search Results

1. Database Architecture (R)evolution: New Hardware vs. New Software

Author: Harizopoulos, S., Argyros, T., Boncz, P.A. (Peter), Dietterich, D., Madden, S. (Samuel), and Waas, F. (Florian)
Published: 2010

2. Database Architecture (R)evolution: New Hardware vs. New Software

Author: Harizopoulos, S., Argyros, T., Boncz, P.A. (Peter), Dietterich, D., Madden, S. (Samuel), and Waas, F. (Florian)
Published: 2010

3. MonetDB

Author: Kersten, M.L. (Martin), Boncz, P.A. (Peter), Nes, N.J. (Niels), Manegold, S. (Stefan), Mullender, K.S. (Sjoerd), Groffen, F.E. (Fabian), Ivanova, M.G. (Milena), Zhang, Y. (Ying), Pereira Goncalves, R.A. (Romulo Antonio), Sidirourgos, E. (Eleftherios), Liarou, E. (Erietta), Idreos, S. (Stratos), Rijke, J.A. (Arjen) de, Vries, A.P. (Arjen) de, Alink, W. (Wouter), Cornacchia, R. (Roberto), Akker, J.F.P. (Johan) van den, Ballegooij, A.R. van, Berg, C.A. (Carel) van den, Castelo, J.R., Flokstra, J., Galindo-Legaria, C.A. (César), Grust, T., Héman, S. (Sándor), Hiemstra, D., Ianeva, T., Karlsson, J.S., Keulen, M. van, Konink, S. (Stefan) de, List, J.A., Mamoulis, N. (Nikos), Molenaar, G.J., Modena, G. (Gabriele), Göldner, S., Pellenkoft, A.J. (Jan), Bosch, H.G.P., Quak, W., Ramirez Camps, G. (Georgina), Rittinger, J., Rode, H. (Henning), Scherphof, W., Schmidt, A.R., Tang, N. (Nan), Teubner, J. (Jens), Treijtel, C., Tsikrika, T. (Theodora), Waas, F. (Florian), Westerveld, T.H.W. (Thijs), Windhouwer, M.A. (Menzo), Zukowski, M. (Marcin), Gafriller, A., Singh, A., Scherpenisse, A., Brodbeck, B., Nijs, G. de, Mayr, M., Antonelli, M., Dinther, M.H.M. (Martin) van, Aly, R., Os, R. van, Mayer, S., Kerschbaumer, S., Ressel, T., and Schreiber, T.
Published: 2005

4. MonetDB

Author: Kersten, M.L. (Martin), Boncz, P.A. (Peter), Nes, N.J. (Niels), Manegold, S. (Stefan), Mullender, K.S. (Sjoerd), Groffen, F.E. (Fabian), Ivanova, M.G. (Milena), Zhang, Y. (Ying), Pereira Goncalves, R.A. (Romulo Antonio), Sidirourgos, E. (Eleftherios), Liarou, E. (Erietta), Idreos, S. (Stratos), Rijke, J.A. (Arjen) de, Vries, A.P. (Arjen) de, Alink, W. (Wouter), Cornacchia, R. (Roberto), Akker, J.F.P. (Johan) van den, Ballegooij, A.R. (Alex) van, Berg, C.A. (Carel) van den, Castelo, J.R., Flokstra, J., Galindo-Legaria, C.A. (César), Grust, T., Héman, S. A. B. C. (Sándor), Hiemstra, D., Ianeva, T., Karlsson, J.S. (Jonas), Keulen, M. van, Konink, S. (Stefan) de, List, J.A. (Johan), Mamoulis, N. (Nikos), Molenaar, G.J., Modena, G. (Gabriele), Göldner, S., Pellenkoft, A.J. (Jan), Bosch, H.G.P., Quak, W., Ramirez Camps, G. (Georgina), Rittinger, J., Rode, H. (Henning), Scherphof, W., Schmidt, A.R., Tang, N. (Nan), Teubner, J. (Jens), Treijtel, C., Tsikrika, T. (Theodora), Waas, F. (Florian), Westerveld, T.H.W. (Thijs), Windhouwer, M.A. (Menzo), Zukowski, M. (Marcin), Gafriller, A., Singh, A., Scherpenisse, A., Brodbeck, B., Nijs, G. de, Mayr, M., Antonelli, M., Dinther, M.H.M. (Martin) van, Aly, R., Os, R. van, Mayer, S., Kerschbaumer, S., Ressel, T., and Schreiber, T.
Published: 2005

5. A Look Back on the XML Benchmark Project

Author: Blanken, H.M., Grabs, T., Schek, H., Schenkel, R., Weikum, G., Schmidt, A.R., Waas, F. (Florian), Manegold, S. (Stefan), and Kersten, M.L. (Martin)
Abstract: The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark lays emphasis on engineering aspects as well as on performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point at some of the experience they gathered during the design of the benchmark and while running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, our experiences also reflect how the perception of XML changed during the three years that have passed since we started working on the subject. The chapter comprises an overview of the benchmark as well as discussions of some lessons learned.
Published: 2003

6. A Look Back on the XML Benchmark Project

Author: Blanken, H.M., Grabs, T., Schek, H., Schenkel, R., Weikum, G., Schmidt, A.R., Waas, F. (Florian), Manegold, S. (Stefan), and Kersten, M.L. (Martin)
Abstract: The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark lays emphasis on engineering aspects as well as on performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point at some of the experience they gathered during the design of the benchmark and while running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, our experiences also reflect how the perception of XML changed during the three years that have passed since we started working on the subject. The chapter comprises an overview of the benchmark as well as discussions of some lessons learned.
Published: 2003

7. The Effect of Cost Distributions on Evolutionary Optimization Problems

Author: Galindo-Legaria, C.A. (César), and Waas, F. (Florian)
Published: 2002

8. Assessing XML Data Management with XMark

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Abstract: We discuss some of the experiences we gathered during the development and deployment of XMark, a tool to assess the infrastructure and performance of XML Data Management Systems. Since the appearance of the first XML database prototypes in research institutions and development labs, topics like validation, performance evaluation and optimization of XML query processors have received significant interest. The XMark benchmark follows a tradition in database research and provides a framework to assess the abilities and performance of XML processing systems: it helps users to see how a query component integrates into an application and how it copes with a variety of query types that are typically encountered in real-world scenarios. To this end, XMark offers an application scenario and a set of queries; each query is intended to challenge a particular aspect of the query processor like the performance of full-text search combined with structural information or joins. Furthermore, we have designed and made available a benchmark document generator that allows for efficient generation of databases of different sizes ranging from small to very large. In short, XMark attempts to cover the major aspects of XML query processing ranging from small to large documents and from textual queries to data analysis and ad hoc queries.
Published: 2002

9. XMark: A Benchmark for XML Data Management

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Abstract: While standardization efforts for XML query languages have been progressing, researchers and users increasingly focus on the database technology that has to deliver on the new challenges that the abundance of XML documents poses to data management: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, we provide a framework to assess the abilities of an XML database to cope with a broad range of different query types typically encountered in real-world scenarios. The benchmark can help both implementors and users to compare XML databases in a standardized application scenario. To this end, we offer a set of queries where each query is intended to challenge a particular aspect of the query processor. The overall workload we propose consists of a scalable document database and a concise, yet comprehensive set of queries which covers the major aspects of XML query processing ranging from textual features to data analysis queries and ad hoc queries. We complement our research with results we obtained from running the benchmark on several XML database platforms. These results are intended to give a first baseline and illustrate the state of the art.
Published: 2002

10. Assessing XML Data Management with XMark

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Published: 2002

11. The Effect of Cost Distributions on Evolutionary Optimization Problems

Author: Galindo-Legaria, C.A. (César), and Waas, F. (Florian)
Published: 2002

12. Assessing XML Data Management with XMark

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Abstract: We discuss some of the experiences we gathered during the development and deployment of XMark, a tool to assess the infrastructure and performance of XML Data Management Systems. Since the appearance of the first XML database prototypes in research institutions and development labs, topics like validation, performance evaluation and optimization of XML query processors have received significant interest. The XMark benchmark follows a tradition in database research and provides a framework to assess the abilities and performance of XML processing systems: it helps users to see how a query component integrates into an application and how it copes with a variety of query types that are typically encountered in real-world scenarios. To this end, XMark offers an application scenario and a set of queries; each query is intended to challenge a particular aspect of the query processor like the performance of full-text search combined with structural information or joins. Furthermore, we have designed and made available a benchmark document generator that allows for efficient generation of databases of different sizes ranging from small to very large. In short, XMark attempts to cover the major aspects of XML query processing ranging from small to large documents and from textual queries to data analysis and ad hoc queries.
Published: 2002

13. XMark: A Benchmark for XML Data Management

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Abstract: While standardization efforts for XML query languages have been progressing, researchers and users increasingly focus on the database technology that has to deliver on the new challenges that the abundance of XML documents poses to data management: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, we provide a framework to assess the abilities of an XML database to cope with a broad range of different query types typically encountered in real-world scenarios. The benchmark can help both implementors and users to compare XML databases in a standardized application scenario. To this end, we offer a set of queries where each query is intended to challenge a particular aspect of the query processor. The overall workload we propose consists of a scalable document database and a concise, yet comprehensive set of queries which covers the major aspects of XML query processing ranging from textual features to data analysis queries and ad hoc queries. We complement our research with results we obtained from running the benchmark on several XML database platforms. These results are intended to give a first baseline and illustrate the state of the art.
Published: 2002

14. Assessing XML Data Management with XMark

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Carey, M.J., Manolescu, I., and Busse, R.
Published: 2002

15. FeedbackBypass: A new Approach to Interactive Similarity Query Processing

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Abstract: In recent years, several methods have been proposed for implementing interactive similarity queries on multimedia databases. Common to all these methods is the idea to exploit user feedback in order to progressively adjust the query parameters and to eventually converge to an "optimal" parameter setting. However, all these methods also share the drawback to "forget" user preferences across multiple query sessions, thus requiring the feedback loop to be restarted for every new query, i.e. using default parameter values. Not only is this proceeding frustrating from the user's point of view but it also constitutes a significant waste of system resources. In this paper we present FeedbackBypass, a new approach to interactive similarity query processing. It complements the role of relevance feedback engines by storing and maintaining the query parameters determined with feedback loops over time, using a wavelet-based data structure (the Simplex Tree). For each query, a favorable set of query parameters can be determined and used to either "bypass" the feedback loop completely for already-seen queries, or to start the search process from a near-optimal configuration. FeedbackBypass can be combined well with all state-of-the-art relevance feedback techniques working in high-dimensional vector spaces. Its storage requirements scale linearly with the dimensionality of the query space, thus making even sophisticated query spaces amenable. Experimental results demonstrate both the effectiveness and efficiency of our technique.
Published: 2001

16. FeedbackBypass: A new Approach to Interactive Similarity Query Processing

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Abstract: In recent years, several methods have been proposed for implementing interactive similarity queries on multimedia databases. Common to all these methods is the idea to exploit user feedback in order to progressively adjust the query parameters and to eventually converge to an "optimal" parameter setting. However, all these methods also share the drawback to "forget" user preferences across multiple query sessions, thus requiring the feedback loop to be restarted for every new query, i.e. using default parameter values. Not only is this proceeding frustrating from the user's point of view but it also constitutes a significant waste of system resources. In this paper we present FeedbackBypass, a new approach to interactive similarity query processing. It complements the role of relevance feedback engines by storing and maintaining the query parameters determined with feedback loops over time, using a wavelet-based data structure (the Simplex Tree). For each query, a favorable set of query parameters can be determined and used to either "bypass" the feedback loop completely for already-seen queries, or to start the search process from a near-optimal configuration. FeedbackBypass can be combined well with all state-of-the-art relevance feedback techniques working in high-dimensional vector spaces. Its storage requirements scale linearly with the dimensionality of the query space, thus making even sophisticated query spaces amenable. Experimental results demonstrate both the effectiveness and efficiency of our technique.
Published: 2001

17. The XML benchmark project

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Florescu, D., and Carey, M.J.
Abstract: With standardization efforts of a query language for XML documents drawing to a close, researchers and users increasingly focus their attention on the database technology that has to deliver on the new challenges that the sheer amount of XML documents produced by applications poses to data management: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, the XML Store Benchmark Project provides a framework to assess an XML database's abilities to cope with a broad spectrum of different queries, typically posed in real-world application scenarios. The benchmark is intended to help both implementors and users to compare XML databases independent of their own, specific application scenario. To this end, the benchmark offers a set of queries, each of which is intended to challenge a particular primitive of the query processor or storage engine. The overall workload we propose consists of a scalable document database and a concise, yet comprehensive set of queries, which covers the major aspects of query processing. The queries' challenges range from stressing the textual character of the document to data analysis queries, but also include typical ad-hoc queries. We complement our research with results obtained from running the benchmark on our XML database platform. They are intended to give a first baseline, illustrating the state of the art.
Published: 2001

18. Why and How to Benchmark XML Databases

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Florescu, D., Carey, M.J., Manolescu, I., and Busse, R.
Abstract: Benchmarks belong to the very standard repertory of tools deployed in database development. Assessing the capabilities of a system, analyzing actual and potential bottlenecks, and, naturally, comparing the pros and cons of different system architectures have become indispensable tasks as database management systems grow in complexity and capacity. In the course of the development of XML databases the need for a benchmark framework has become more and more evident: a great many different ways to store XML data have been suggested in the past, each with its genuine advantages, disadvantages and consequences that propagate through the layers of a complex database system and need to be carefully considered. The different storage schemes render the query characteristics of the data variably different. However, no conclusive methodology for assessing these differences is available to date. In this paper, we outline desiderata for a benchmark for XML databases drawing from our own experience of developing an XML repository, involvement in the definition of the standard query language, and experience with standard benchmarks for relational databases.
Published: 2001

19. Memory-Aware Query Routing in Interactive Web-based Information Systems

Author: Waas, F. (Florian), and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2001

20. Memory-Aware Query Routing in Interactive Web-based Information Systems

Author: Waas, F. (Florian), and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2001

21. FeedbackBypass: A new Approach to Interactive Similarity Query Processing

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Abstract: In recent years, several methods have been proposed for implementing interactive similarity queries on multimedia databases. Common to all these methods is the idea to exploit user feedback in order to progressively adjust the query parameters and to eventually converge to an "optimal" parameter setting. However, all these methods also share the drawback to "forget" user preferences across multiple query sessions, thus requiring the feedback loop to be restarted for every new query, i.e. using default parameter values. Not only is this proceeding frustrating from the user's point of view but it also constitutes a significant waste of system resources. In this paper we present FeedbackBypass, a new approach to interactive similarity query processing. It complements the role of relevance feedback engines by storing and maintaining the query parameters determined with feedback loops over time, using a wavelet-based data structure (the Simplex Tree). For each query, a favorable set of query parameters can be determined and used to either "bypass" the feedback loop completely for already-seen queries, or to start the search process from a near-optimal configuration. FeedbackBypass can be combined well with all state-of-the-art relevance feedback techniques working in high-dimensional vector spaces. Its storage requirements scale linearly with the dimensionality of the query space, thus making even sophisticated query spaces amenable. Experimental results demonstrate both the effectiveness and efficiency of our technique.
Published: 2001

22. FeedbackBypass: A new Approach to Interactive Similarity Query Processing

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Abstract: In recent years, several methods have been proposed for implementing interactive similarity queries on multimedia databases. Common to all these methods is the idea to exploit user feedback in order to progressively adjust the query parameters and to eventually converge to an "optimal" parameter setting. However, all these methods also share the drawback to "forget" user preferences across multiple query sessions, thus requiring the feedback loop to be restarted for every new query, i.e. using default parameter values. Not only is this proceeding frustrating from the user's point of view but it also constitutes a significant waste of system resources. In this paper we present FeedbackBypass, a new approach to interactive similarity query processing. It complements the role of relevance feedback engines by storing and maintaining the query parameters determined with feedback loops over time, using a wavelet-based data structure (the Simplex Tree). For each query, a favorable set of query parameters can be determined and used to either "bypass" the feedback loop completely for already-seen queries, or to start the search process from a near-optimal configuration. FeedbackBypass can be combined well with all state-of-the-art relevance feedback techniques working in high-dimensional vector spaces. Its storage requirements scale linearly with the dimensionality of the query space, thus making even sophisticated query spaces amenable. Experimental results demonstrate both the effectiveness and efficiency of our technique.
Published: 2001

23. The XML benchmark project

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Florescu, D., and Carey, M.J.
Abstract: With standardization efforts of a query language for XML documents drawing to a close, researchers and users increasingly focus their attention on the database technology that has to deliver on the new challenges that the sheer amount of XML documents produced by applications poses to data management: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, the XML Store Benchmark Project provides a framework to assess an XML database's abilities to cope with a broad spectrum of different queries, typically posed in real-world application scenarios. The benchmark is intended to help both implementors and users to compare XML databases independent of their own, specific application scenario. To this end, the benchmark offers a set of queries, each of which is intended to challenge a particular primitive of the query processor or storage engine. The overall workload we propose consists of a scalable document database and a concise, yet comprehensive set of queries, which covers the major aspects of query processing. The queries' challenges range from stressing the textual character of the document to data analysis queries, but also include typical ad-hoc queries. We complement our research with results obtained from running the benchmark on our XML database platform. They are intended to give a first baseline, illustrating the state of the art.
Published: 2001

24. Memory-Aware Query Routing in Interactive Web-based Information Systems

Author: Waas, F. (Florian), and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2001

25. Why and How to Benchmark XML Databases

Author: Schmidt, A.R., Waas, F. (Florian), Kersten, M.L. (Martin), Florescu, D., Carey, M.J., Manolescu, I., and Busse, R.
Abstract: Benchmarks belong to the very standard repertory of tools deployed in database development. Assessing the capabilities of a system, analyzing actual and potential bottlenecks, and, naturally, comparing the pros and cons of different system architectures have become indispensable tasks as database management systems grow in complexity and capacity. In the course of the development of XML databases the need for a benchmark framework has become more and more evident: a great many different ways to store XML data have been suggested in the past, each with its genuine advantages, disadvantages and consequences that propagate through the layers of a complex database system and need to be carefully considered. The different storage schemes render the query characteristics of the data variably different. However, no conclusive methodology for assessing these differences is available to date. In this paper, we outline desiderata for a benchmark for XML databases drawing from our own experience of developing an XML repository, involvement in the definition of the standard query language, and experience with standard benchmarks for relational databases.
Published: 2001

26. Memory-Aware Query Routing in Interactive Web-based Information Systems

Author: Waas, F. (Florian) and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2001

27. Join Order Selection - Good Enough is Easy

Author: Waas, F. (Florian) and Pellenkoft, A.J. (Jan)
Abstract: Uniform sampling of join orders is known to be a competitive alternative to transformation-based optimization techniques. However, uniformity of the sampling process is difficult to establish and only for a restricted class of join queries techniques are known. In this paper, we investigate non-uniform sampling devising a simple yet powerful algorithm that is generally applicable. The key element of the algorithm is a mapping of randomly generated sequences of join predicates to query plans. We take advantage of the bottom-up constructing of query plans by simultaneously computing the costs and discarding partial plans as soon as they exceed the best costs found so far, which implements a highly effective cost-bound pruning component. Sampling does not produce the optimal plan but a near-optimal solution which is fully sufficient as the cost function grows more and more inaccurate with increasing query size. In return, our algorithm establishes a well-balanced trade-off between result quality and time invested in the optimization process.
Published: 2000

28. Counting, enumerating and sampling of execution plans in a cost-based query optimizer

Author: Waas, F. (Florian) and Galindo-Legaria, C.A. (César)
Abstract: Testing an SQL database system by running large sets of deterministic or stochastic SQL statements is common practice in commercial database development. However, code defects often remain undetected as the query optimizer's choice of an execution plan is not only depending on the query but strongly influenced by a large number of parameters describing the database and the hardware environment. Modifying these parameters in order to steer the optimizer to select other plans is difficult since this means anticipating often complex search strategies implemented in the optimizer. In this paper we devise algorithms for counting, exhaustive generation, and uniform sampling of plans from the complete search space. Our techniques allow extensive validation of both generation of alternatives, and execution algorithms with plans other than the optimized one---if two candidate plans fail to produce the same results, then either the optimizer considered an invalid plan, or the execution code is faulty. When the space of alternatives becomes too large for exhaustive testing, which can occur even with a handful of joins, uniform random sampling provides a mechanism for unbiased testing. The technique is implemented in Microsoft's SQL Server, where it is an integral part of the validation and testing process.
Published: 2000
Full Text: View/download PDF

29. Extending Iterators for Advanced Query Execution

Author: Waas, F. (Florian)
Abstract: Today's commercial relational database systems use tree-shaped execution plans. The evaluation techniques for these plans are well understood and have been refined over the last decade. However, for queries that contain disjunctive predicates, using the more general class of directed acyclic graphs and splitting data streams can be beneficial. Unfortunately, the iterator based evaluation techniques used for tree-shaped plans do not apply to this case. Iterators implement a breadth first search providing full encapsulation where operators communicate by answered requests in a synchronous manner. In this paper we develop an extension of the conventional iterator based evaluation technique. We introduce request handles that add context information to the data requests which allows for arbitrary plan topologies including cycles. The original problem of evaluating plans with operators that split data streams can then be solved by mere rewriting of the execution plan.
Published: 2000

30. Using the Wavelet Transform to Learn from User Feedback

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Published: 2000

31. Interactive Visualization of Multidimensional Feature Spaces

Author: Liere, R. (Robert) van, Leeuw, W.C. (Wim) de, and Waas, F. (Florian)
Abstract: Image similarity models characterize images as points in high-dimensional feature spaces. Each point is represented by a combination of distinct features, such as brightness, color histograms or texture characteristics of the image, etc. For the design and tuning of features, and thus the effectiveness of the image similarity model, it is important to understand the interrelations of individual features and the implications on the structure of the feature space. In this paper, we discuss an interactive visualization tool for the exploration of multidimensional feature spaces. Our tool uses a graph as an intermediate representation of the points in the feature space. A mass spring algorithm is used to layout the graph in a 2D space in which arrangements of similar images are attracted to each other and dissimilar images are repelled. The emphasis of the visualization tool is on interaction: users may influence the layout by interactively scaling dimensions of the feature space. In this way, the user can explore how a feature behaves in relation to other features.
Published: 2000

32. Efficient Relational Storage and Retrieval of XML Documents (Extended Version)

Author: Schmidt, A.R., Kersten, M.L. (Martin), Windhouwer, M.A. (Menzo), and Waas, F. (Florian)
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
Published: 2000

33. Principles of probabilistic query optimization

Author: Waas, F. (Florian)
Published: 2000

34. Efficient Relational Storage and Retrieval of XML Documents

Author: Schmidt, A.R., Kersten, M.L. (Martin), Windhouwer, M.A. (Menzo), and Waas, F. (Florian)
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
Published: 2000

35. The effect of cost distributions on evolutionary optimization algorithms

Author: Waas, F. (Florian) and Galindo-Legaria, C.A. (César)
Abstract: According to the No-Free-Lunch theorems of Wolpert and Macready, we cannot expect one generic optimization technique to outperform others on average. For every optimization technique there exist "easy" and "hard" problems. However, only little is known as to what criteria determine the particular difficulty of a problem. In this paper, we address this question from an evolutionary computing point of view. We use cost distributions, i.e., the frequencies of the objective function's values occurring in the search spaces, to devise a classification of optimization problems. We scrutinize the influence of cost distributions on the single algorithmic components of evolutionary computing. Our analysis helps identify (1) problems where evolutionary algorithms are overhead, (2) problems where evolutionary algorithms are highly suitable optimization algorithms, as well as (3) problems that pose difficulties for evolutionary techniques.
Published: 2000

36. Memory aware query scheduling in a database cluster

Author: Waas, F. (Florian) and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2000

37. Using the Wavelet Transform to Learn from User Feedback

Author: Bartolini, I., Ciaccia, P., and Waas, F. (Florian)
Published: 2000

38. Join Order Selection - Good Enough is Easy

Author: Waas, F. (Florian) and Pellenkoft, A.J. (Jan)
Abstract: Uniform sampling of join orders is known to be a competitive alternative to transformation-based optimization techniques. However, uniformity of the sampling process is difficult to establish and only for a restricted class of join queries techniques are known. In this paper, we investigate non-uniform sampling devising a simple yet powerful algorithm that is generally applicable. The key element of the algorithm is a mapping of randomly generated sequences of join predicates to query plans. We take advantage of the bottom-up constructing of query plans by simultaneously computing the costs and discarding partial plans as soon as they exceed the best costs found so far, which implements a highly effective cost-bound pruning component. Sampling does not produce the optimal plan but a near-optimal solution which is fully sufficient as the cost function grows more and more inaccurate with increasing query size. In return, our algorithm establishes a well-balanced trade-off between result quality and time invested in the optimization process.
Published: 2000

39. Interactive Visualization of Multidimensional Feature Spaces

Author: Liere, R. (Robert) van, Leeuw, W.C. (Wim) de, and Waas, F. (Florian)
Abstract: Image similarity models characterize images as points in high-dimensional feature spaces. Each point is represented by a combination of distinct features, such as brightness, color histograms or texture characteristics of the image, etc. For the design and tuning of features, and thus the effectiveness of the image similarity model, it is important to understand the interrelations of individual features and the implications on the structure of the feature space. In this paper, we discuss an interactive visualization tool for the exploration of multidimensional feature spaces. Our tool uses a graph as an intermediate representation of the points in the feature space. A mass spring algorithm is used to layout the graph in a 2D space in which arrangements of similar images are attracted to each other and dissimilar images are repelled. The emphasis of the visualization tool is on interaction: users may influence the layout by interactively scaling dimensions of the feature space. In this way, the user can explore how a feature behaves in relation to other features.
Published: 2000

40. Extending Iterators for Advanced Query Execution

Author: Waas, F. (Florian)
Abstract: Today's commercial relational database systems use tree-shaped execution plans. The evaluation techniques for these plans are well understood and have been refined over the last decade. However, for queries that contain disjunctive predicates, using the more general class of directed acyclic graphs and splitting data streams can be beneficial. Unfortunately, the iterator based evaluation techniques used for tree-shaped plans do not apply to this case. Iterators implement a breadth first search providing full encapsulation where operators communicate by answered requests in a synchronous manner. In this paper we develop an extension of the conventional iterator based evaluation technique. We introduce request handles that add context information to the data requests which allows for arbitrary plan topologies including cycles. The original problem of evaluating plans with operators that split data streams can then be solved by mere rewriting of the execution plan.
Published: 2000

41. The effect of cost distributions on evolutionary optimization algorithms

Author: Waas, F. (Florian) and Galindo-Legaria, C.A. (César)
Abstract: According to the No-Free-Lunch theorems of Wolpert and Macready, we cannot expect one generic optimization technique to outperform others on average. For every optimization technique there exist "easy" and "hard" problems. However, only little is known as to what criteria determine the particular difficulty of a problem. In this paper, we address this question from an evolutionary computing point of view. We use cost distributions, i.e., the frequencies of the objective function's values occurring in the search spaces, to devise a classification of optimization problems. We scrutinize the influence of cost distributions on the single algorithmic components of evolutionary computing. Our analysis helps identify (1) problems where evolutionary algorithms are overhead, (2) problems where evolutionary algorithms are highly suitable optimization algorithms, as well as (3) problems that pose difficulties for evolutionary techniques.
Published: 2000

42. Counting, enumerating and sampling of execution plans in a cost-based query optimizer

Author: Waas, F. (Florian) and Galindo-Legaria, C.A. (César)
Abstract: Testing an SQL database system by running large sets of deterministic or stochastic SQL statements is common practice in commercial database development. However, code defects often remain undetected as the query optimizer's choice of an execution plan is not only depending on the query but strongly influenced by a large number of parameters describing the database and the hardware environment. Modifying these parameters in order to steer the optimizer to select other plans is difficult since this means anticipating often complex search strategies implemented in the optimizer. In this paper we devise algorithms for counting, exhaustive generation, and uniform sampling of plans from the complete search space. Our techniques allow extensive validation of both generation of alternatives, and execution algorithms with plans other than the optimized one---if two candidate plans fail to produce the same results, then either the optimizer considered an invalid plan, or the execution code is faulty. When the space of alternatives becomes too large for exhaustive testing, which can occur even with a handful of joins, uniform random sampling provides a mechanism for unbiased testing. The technique is implemented in Microsoft's SQL Server, where it is an integral part of the validation and testing process.
Published: 2000
Full Text: View/download PDF

43. Memory aware query scheduling in a database cluster

Author: Waas, F. (Florian) and Kersten, M.L. (Martin)
Abstract: Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned and entirely executed on one single node avoiding network contention or synchronization effects. However, the actual key to enhanced throughput is a resource efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness when scaling the system beyond hundreds of nodes showing super-linear speedup.
Published: 2000

44. Efficient Relational Storage and Retrieval of XML Documents (Extended Version)

Author: Schmidt, A.R., Kersten, M.L. (Martin), Windhouwer, M.A. (Menzo), and Waas, F. (Florian)
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
Published: 2000

45. Principles of probabilistic query optimization

Author: Waas, F. (Florian)
Published: 2000

46. Efficient Relational Storage and Retrieval of XML Documents

Author: Schmidt, A.R., Kersten, M.L. (Martin), Windhouwer, M.A. (Menzo), and Waas, F. (Florian)
Abstract: In this paper, we present a data and an execution model that allow for efficient storage and retrieval of XML documents in a relational database. The data model is strictly based on the notion of binary associations: by decomposing XML documents into small, flexible and semantically homogeneous units we are able to exploit the performance potential of vertical fragmentation. Moreover, our approach provides clear and intuitive semantics, which facilitates the definition of a declarative query algebra. Our experimental results with large collections of XML documents demonstrate the effectiveness of the techniques proposed.
Published: 2000

47. Probabilistic bottom-up join order selection - breaking the curse of NP-completeness

Author: Waas, F. (Florian) and Pellenkoft, A.J. (Jan)
Abstract: Join-ordering is known to be NP-complete and therefore a variety of heuristics have been devised to tackle large queries which are considered computationally intractable otherwise. However, practitioners often point out that typical problem instances are not difficult to optimize at all. In this paper we address that seeming discrepancy. We present a probabilistic bottom-up join-ordering technique that is distinguished by the high-quality results achieved, the extremely short running time and, most notably, its independence of the search space's size. The subsequent thorough analysis of the algorithm's principle confirms our experimental results.
Published: 1999

48. Cost distributions in a symmetric Euclidean traveling salesman problems: a supplement to TSPLIB

Author: Waas, F. (Florian)
Abstract: We present analytically and experimentally determined cost distributions for all Euclidean two-dimensional symmetric instances of the Traveling Salesman Problem in the TSPLIB library. Results obtained show characteristic cost distributions in all cases and a high stability against degeneration.
Published: 1999

49. Counting, enumerating and sampling of execution plans in a cost-based query optimizer

Author: Waas, F. (Florian) and Galindo-Legaria, C.A. (César)
Abstract: Testing an SQL database system by running large sets of deterministic or stochastic SQL statements is common practice in commercial database development. However, code defects often remain undetected as the query optimizer's choice of an execution plan is not only depending on the query but strongly influenced by a large number of parameters describing the database and the hardware environment. Modifying these parameters in order to steer the optimizer to select other plans is difficult since this means anticipating often complex search strategies implemented in the optimizer. In this paper we devise algorithms for counting, exhaustive generation, and uniform sampling of plans from the complete search space. Our techniques allow extensive validation of both generation of alternatives, and execution algorithms with plans other than the optimized one---if two candidate plans fail to produce the same results, then either the optimizer considered an invalid plan, or the execution code is faulty. When the space of alternatives becomes too large for exhaustive testing, which can occur even with a handful of joins, uniform random sampling provides a mechanism for unbiased testing. The technique is implemented in Microsoft's SQL Server, where it is an integral part of the validation and testing process.
Published: 1999

50. Handling Non-deterministic Data Availability in Parallel Query Execution.

Author: Waas, F. (Florian)
Abstract: The situation of non-deterministic data availability, where it is not known a priori which of two or more processes will respond first, cannot be handled with standard techniques. The consequence is sub-optimal processing because of inefficient resource allocation and unnecessary delays. In this paper we develop an effective solution to the problem by extending the demand-driven evaluation paradigm to the end of using operators with more than just one output stream. We show how inter-process communication and non-deterministic data availability in parallel query processing reduce to cases that can be executed efficiently with the new evaluation paradigm.
Published: 1999

72 results on '"Waas, F. (Florian)"'