
Serverless BM25 Search and BERT Reranking.

Mayank Anand, Jiarui Zhang, Shane Ding, Ji Xin, Jimmy Lin

DESIRES (2021)

Abstract
The retrieve–rerank pipeline is a well-established architecture for search applications, typically with first-stage retrieval using keyword search followed by reranking with a transformer-based model. In deploying such an architecture in the cloud, developers must devote considerable effort to resource provisioning and management: typically, the goal is to optimize the infrastructure configuration (number and type of server instances) to achieve certain performance characteristics (latency, throughput, etc.) while reducing operating costs. In this paper, we introduce a serverless prototype of the retrieve–rerank pipeline for search using Amazon Web Services (AWS), comprising BM25 for first-stage retrieval using Lucene followed by reranking with the monoBERT model using Hugging Face Transformers. The advantage of a serverless design is that the cloud provider shoulders the burden of operational management, for example, allocating server instances and scaling with query load. We experimentally show with the popular MS MARCO passage ranking test collection that, compared to a traditional server-based deployment, our serverless implementation (1) retains the same level of effectiveness, (2) can reduce average latency by exploiting massive parallelism, and (3) incurs comparable costs if the service is expected to be idle for some fraction of the time. Our implementation is open-sourced at https://github.com/castorini/serverless-bert-reranking.
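
For illustration, the sketch below shows the retrieve–rerank pattern the abstract describes: BM25 first-stage retrieval over the MS MARCO passage collection (Lucene via Pyserini) followed by monoBERT reranking with Hugging Face Transformers. The index and checkpoint names, and the synchronous single-process structure, are assumptions made for this example; the paper's actual serverless (AWS-based) implementation is in the linked repository.

```python
# Sketch of a retrieve-rerank pipeline: BM25 retrieval (Pyserini/Lucene)
# followed by monoBERT reranking (Hugging Face Transformers).
# Index and model names below are illustrative assumptions.
import json

import torch
from pyserini.search.lucene import LuceneSearcher
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# First stage: BM25 over a prebuilt MS MARCO v1 passage index.
searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")

# Second stage: a monoBERT-style cross-encoder (assumed checkpoint name).
tokenizer = AutoTokenizer.from_pretrained("castorini/monobert-large-msmarco")
model = AutoModelForSequenceClassification.from_pretrained(
    "castorini/monobert-large-msmarco"
)
model.eval()


def retrieve_and_rerank(query: str, k: int = 100, top_n: int = 10):
    """Retrieve k candidates with BM25, then rerank them with monoBERT."""
    hits = searcher.search(query, k=k)
    # The prebuilt passage index stores each document as JSON with a "contents" field.
    passages = [json.loads(searcher.doc(h.docid).raw())["contents"] for h in hits]

    scores = []
    with torch.no_grad():
        for passage in passages:
            inputs = tokenizer(query, passage, return_tensors="pt",
                               truncation=True, max_length=512)
            logits = model(**inputs).logits
            # monoBERT is a two-class classifier; use the "relevant" logit as the score.
            scores.append(logits[0, 1].item())

    reranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)
    return [(hit.docid, score) for hit, score in reranked[:top_n]]


if __name__ == "__main__":
    for docid, score in retrieve_and_rerank("what is a lobster roll"):
        print(f"{docid}\t{score:.4f}")
```

In the serverless setting described in the paper, the reranking stage is where massive parallelism pays off: candidate passages can be scored concurrently across many short-lived function invocations rather than sequentially on a provisioned server, which is how average latency can drop while the provider handles scaling.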