Latent Retrieval for Large-Scale Fact-Checking and Question Answering with NLI Training

Abstract

Passage retrieval is a critical yet often neglected component of fact-checking and question answering systems. Most systems rely solely on traditional sparse retrieval, which can significantly hurt recall, especially when the relevant passages share few words with the query. Recent approaches learn dense representations of queries and passages to better capture the latent semantic content of text. Although dense retrieval models have proven effective in question answering, no prior work has applied them to improving evidence retrieval in fact-checking. In this work, we show that simple training of a dense retriever is sufficient to outperform traditional sparse representations in both question answering and fact-checking. We construct a new artificial dataset called Factual-NLI, comprising factual claims and relevant evidence, and demonstrate that using it to train a dense retriever significantly improves evidence retrieval. Experimental results on the MS MARCO dataset indicate that pre-training with Factual-NLI, and with other NLI datasets, is also effective for large-scale passage retrieval in question answering. Our model is incorporated into a real-world semantic search engine that returns snippets containing evidence related to questions and claims about the COVID-19 pandemic.
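The recall problem described above, where relevant passages share few words with the query, can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' model: `term_overlap` is a hypothetical stand-in for a sparse retriever such as BM25, and the three-dimensional vectors are hypothetical placeholders for embeddings that a trained dense encoder would produce.

```python
def term_overlap(query: str, passage: str) -> int:
    """Sparse-style signal (hypothetical stand-in for BM25):
    number of distinct lowercase tokens shared by query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def dot(u: list[float], v: list[float]) -> float:
    """Inner product, the usual relevance score in dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def dense_top_k(query_vec: list[float], passage_vecs: list[list[float]], k: int) -> list[int]:
    """Rank passages by inner product with the query embedding; return top-k indices."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical toy data: in a real system the vectors come from a trained encoder.
query = "does the virus spread through the air"
passages = [
    "airborne transmission of sars-cov-2 has been documented",
    "stock markets fell sharply in march",
]
query_vec = [0.9, 0.1, 0.0]
passage_vecs = [
    [0.8, 0.2, 0.1],  # placeholder embedding for the related passage
    [0.0, 0.1, 0.9],  # placeholder embedding for the unrelated passage
]

print([term_overlap(query, p) for p in passages])  # [0, 0]
print(dense_top_k(query_vec, passage_vecs, k=1))   # [0]
```

Neither passage shares a token with the query, so the overlap signal cannot separate them, while the (assumed) embeddings still rank the related passage first. In practice, dense retrievers precompute passage vectors offline and use an approximate nearest-neighbor index rather than the exhaustive scan shown here.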

Publication
ICTAI 2020
