Scalable Approximate Query Evaluation on Document Inverted Files for GPU based Big-Data Applications

Acronym

ANID PCI

Project Title

Internal ID

5810

Principal Investigator

Marin-Caihuan, J

Start Date

2020

End Date

2021

OpenAIRE ID

STIC190018

Keywords

INFORMATION RETRIEVAL...

PARALLEL COMPUTING

BIG DATA

Description

Very large document collections have become common in many application areas, including medicine, the social sciences, natural language processing, and e-commerce. The web is one example of such a collection. These collections grow continuously, both in the number of documents and in the number of search operations (e.g. queries, similarity evaluation, document ranking) executed on them. These two factors affect the scalability of search engines in two ways. The growing number of documents directly increases the size of document indexes and the complexity of individual text similarity evaluations. The growing rate of requested evaluations increases the throughput the system must sustain. CPUs are optimized for low-latency execution of a moderate number of application threads. In contrast, GPU architectures are designed to deliver high-throughput computing for massive numbers of threads.

In this project, we intend to collaborate on the development of an enhanced parallel version of the WAND ranking algorithm that uses the heterogeneous power of GPUs and CPUs to evaluate documents over massive collections in a scalable and efficient way. We will propose and evaluate a scheduling algorithm that uses the GPU as a static cache to process the most frequent and computationally expensive queries. Additionally, as a case study, we propose to evaluate our approach in online e-commerce systems.
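
The scheduling idea can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes that the static set of GPU queries is chosen once from a query log by ranking each distinct query on frequency multiplied by an estimated cost, and that everything outside the set goes to the CPU. The names `build_static_gpu_set`, `route`, and `term_count_cost` are hypothetical, and the term-count cost estimate is only a placeholder for a real cost model.

```python
from collections import Counter


def term_count_cost(query):
    """Hypothetical cost proxy (assumption): number of terms in the query."""
    return len(query.split())


def build_static_gpu_set(query_log, cost_of, capacity):
    """Select the `capacity` distinct queries with the highest
    frequency * estimated-cost score. The set is built once and never
    updated, which is what makes it a static cache."""
    freq = Counter(query_log)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(freq, key=lambda q: (-freq[q] * cost_of(q), q))
    return frozenset(ranked[:capacity])


def route(query, gpu_set):
    """Send queries in the static set to the GPU, all others to the CPU."""
    return "gpu" if query in gpu_set else "cpu"


# Toy query log (illustrative data, not from the project).
log = [
    "red shoes", "red shoes",
    "cheap red running shoes",
    "tv", "tv", "tv",
]
gpu_set = build_static_gpu_set(log, term_count_cost, capacity=2)
for q in ("red shoes", "tv"):
    print(q, "->", route(q, gpu_set))
```

In this toy log, "tv" is the most frequent query but also the cheapest under the placeholder cost model, so it stays on the CPU, while the longer queries occupy the two GPU slots.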

As an outcome of the collaboration between the researchers in this proposal, we expect to publish the results in high-ranking conferences and journals in the related research fields. We also expect to collaborate on the thesis supervision of MSc and PhD students.
