How to build a Question Answering System using transformers
A lot of emphasis has been put on transformers for language modelling in the recent past. One challenging problem is finding the answer to a question within a given context. In this blog post, we’ll explore this problem and build a question answering system using the Hugging Face transformers library.
Let’s say we have a context and a question. Our job is to find the answer to the question within this given context. For example, if the context is
“Abraham Lincoln (February 12, 1809 – April 15, 1865) was the sixteenth President of the United States, serving from March 4, 1861 until his assassination. As an outspoken opponent of the expansion of slavery in the United States, “[I]n his short autobiography written for the 1860 presidential campaign, Lincoln would describe his protest in the Illinois legislature as one that ‘briefly defined his position on the slavery question, and so far as it goes, it was then the same that it is now.” This was in reference to the anti-expansion sentiments he had then expressed. Doris Kearns Goodwin, Team of Rivals: The Political Genius of Abraham Lincoln (2005) p. 91. Holzer pg. 232. Writing of the Cooper Union speech, Holzer notes, “Cooper Union proved a unique confluence of political culture, rhetorical opportunity, technological innovation, and human genius, and it brought Abraham Lincoln to the center stage of American politics at precisely the right time and place, and with precisely the right message: that slavery was wrong, and ought to be confined to the areas where it already existed, and placed on the ‘course of ultimate extinction… .'” Lincoln won the Republican Party nomination in 1860 and was elected president later that year. During his term, he helped preserve the United States by leading the defeat of the secessionist Confederate States of America in the American Civil War. He introduced measures that resulted in the abolition of slavery, issuing his Emancipation Proclamation in 1863 and promoting the passage of the Thirteenth Amendment to the Constitution in 1865”
and the question is
“Who was the sixteenth President of the United States?”,
the answer should be
“Abraham Lincoln”.
Let’s see whether a transformer model can answer this question for us. I’ll demonstrate this with the following Python code snippets.
from transformers import pipeline
# Build a question-answering pipeline; with no model specified, a default checkpoint is downloaded on first use
question_answering = pipeline("question-answering")
The pipeline above loads a default model: a DistilBERT-base model fine-tuned on the SQuAD dataset (at the time of writing, the distilbert-base-cased-distilled-squad checkpoint). The model argument of the pipeline function lets you point it to a different model instead. Continuing with the code:
context = """Abraham Lincoln (February 12, 1809 – April 15, 1865) was the sixteenth President of the United States, serving from March 4, 1861 until his assassination. As an outspoken opponent of the expansion of slavery in the United States, "[I]n his short autobiography written for the 1860 presidential campaign, Lincoln would describe his protest in the Illinois legislature as one that 'briefly defined his position on the slavery question, and so far as it goes, it was then the same that it is now." This was in reference to the anti-expansion sentiments he had then expressed. Doris Kearns Goodwin, Team of Rivals: The Political Genius of Abraham Lincoln (2005) p. 91. Holzer pg. 232. Writing of the Cooper Union speech, Holzer notes, "Cooper Union proved a unique confluence of political culture, rhetorical opportunity, technological innovation, and human genius, and it brought Abraham Lincoln to the center stage of American politics at precisely the right time and place, and with precisely the right message: that slavery was wrong, and ought to be confined to the areas where it already existed, and placed on the 'course of ultimate extinction... .'" Lincoln won the Republican Party nomination in 1860 and was elected president later that year. During his term, he helped preserve the United States by leading the defeat of the secessionist Confederate States of America in the American Civil War. He introduced measures that resulted in the abolition of slavery, issuing his Emancipation Proclamation in 1863 and promoting the passage of the Thirteenth Amendment to the Constitution in 1865"""
question = "Who was the sixteenth President of the United States?"
result = question_answering(question=question, context=context)
print(result['answer'])  # Abraham Lincoln
print(result['score'])   # 0.9959080219268799 (may vary slightly across model and library versions)
The two print statements should output “Abraham Lincoln” and 0.9959080219268799, respectively. The score is the model’s confidence, a value between 0 and 1, in the answer it returned for the question.
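To make the score less of a black box, here is a minimal sketch of how an extractive QA model’s output is commonly turned into an answer and a confidence: the model emits a start logit and an end logit for every context token, each set is normalized with softmax, and the span (i, j) with i <= j that maximizes P(start = i) * P(end = j) wins. The function names, the max_answer_len limit of 15 and the toy logits are assumptions for illustration, not real DistilBERT output. The actual pipeline also works on subword tokens, masks out question and special tokens, and may differ in details across library versions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) for the highest-scoring token span.

    The score of a span (i, j) is P(start = i) * P(end = j), restricted to
    spans with i <= j and at most max_answer_len tokens (hypothetical default).
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        # Only consider end positions at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy logits for a 6-token context (hypothetical values, not real model output).
tokens = ["Abraham", "Lincoln", "was", "the", "sixteenth", "President"]
start_logits = [8.0, 1.0, 0.5, 0.2, 0.1, 0.3]
end_logits = [1.0, 7.5, 0.4, 0.2, 0.6, 0.9]

i, j, score = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]), round(score, 3))
```

Multiplying the two probabilities means the score is high only when the model is confident about both boundaries, which is why a clear-cut answer such as “Abraham Lincoln” gets a score close to 1.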
Shortcomings
One shortcoming of this approach is that the model can only extract answers from the given context. If the answer to a question cannot be found directly in the context, the model won’t perform well: it will still respond by lifting a span of text from the context. This needs further exploration and will be covered in future blog posts.
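This limitation follows from how extractive models decode: the output is only a pair of token positions, which are mapped back to character offsets in the context. The pipeline’s result dictionary exposes these offsets as result['start'] and result['end'], so context[result['start']:result['end']] should reproduce result['answer']. The sketch below imitates that final step with a hypothetical whitespace tokenizer (real models use subword tokenizers with their own offset mappings) and hand-picked token indices standing in for a model prediction. Whatever the question, the returned answer is a slice of the context.

```python
import re

def token_offsets(text):
    """Split on whitespace, keeping (start_char, end_char) for each token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def extract_answer(context, start_token, end_token):
    """Map a predicted token span back to a substring of the context."""
    offsets = token_offsets(context)
    start_char = offsets[start_token][0]
    end_char = offsets[end_token][1]
    return context[start_char:end_char], start_char, end_char

context = "Abraham Lincoln was the sixteenth President of the United States."
# An extractive model only predicts a token span (hypothetical indices here),
# so the answer is always a substring of the context, even for a question
# whose true answer does not appear in it.
answer, start, end = extract_answer(context, 0, 1)
print(answer)  # Abraham Lincoln
```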