Very large collections of documents have become common in many application areas, such as medicine, the social sciences, natural language processing and e-commerce. The web is one example of a large document collection. Such collections grow continuously, both in the number of documents and in the number of search operations (e.g. queries, similarity evaluation, document ranking) executed on them. These two factors affect the scalability of search engines in two ways. The increase in the number of documents directly affects the size of document indexes and the complexity of individual text similarity evaluations. The increase in the rate of evaluations demanded raises the need for throughput. CPUs are optimized for low-latency execution of a moderate number of application threads. In contrast, GPU architectures are designed to deliver high-throughput computing for massive numbers of threads.
In this project, we intend to collaborate on the development of an enhanced parallel version of the WAND ranking algorithm that uses the heterogeneous power of GPUs and CPUs to evaluate documents over massive collections in a scalable and efficient way. We will propose and evaluate a scheduling algorithm that uses the GPU as a static cache to process the most frequent and computationally expensive queries. Additionally, as a case study, we propose to evaluate our approach in online e-commerce systems.
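The idea of using the GPU as a static cache can be illustrated with a minimal sketch. It is not the scheduling algorithm of this project, which is still to be designed. It only assumes that a fixed set of queries is chosen offline from a query log, ranked by frequency times an estimated cost, and that queries in this set are sent to the GPU while all others go to the CPU. The names build_static_gpu_set, route and term_count_cost, the cost proxy (number of query terms) and the sample log are all hypothetical.

```python
from collections import Counter


def build_static_gpu_set(query_log, cost_estimate, capacity):
    """Choose up to `capacity` distinct queries with the highest
    frequency * estimated cost. The set is static: it is computed once
    from a past query log and not updated while queries are served."""
    freq = Counter(query_log)
    # Sort by descending score; break ties by query text for determinism.
    ranked = sorted(freq, key=lambda q: (-freq[q] * cost_estimate(q), q))
    return set(ranked[:capacity])


def route(query, gpu_set):
    """Send queries in the static set to the GPU, all others to the CPU."""
    return "gpu" if query in gpu_set else "cpu"


def term_count_cost(query):
    """Hypothetical cost proxy: the number of terms in the query.
    A real system would need a measured or modelled evaluation cost."""
    return len(query.split())


# Hypothetical query log, for illustration only.
log = [
    "cheap phone", "cheap phone",
    "red running shoes size 10", "red running shoes size 10",
    "tv", "tv", "tv",
    "laptop bag",
]
gpu_set = build_static_gpu_set(log, term_count_cost, capacity=2)
```

In this sketch "tv" is the most frequent query but stays on the CPU, because its estimated cost is low. This is the intended effect of ranking by frequency and cost together rather than by frequency alone.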
As an outcome of the collaboration between the researchers in this proposal, we expect to publish the results in high-ranking conferences and journals in the related research fields. We also expect to collaborate in the thesis supervision of MSc and PhD students.