Georeactor Blog

Continuous Q&A

Tags: ml, reliveqa, whichwiki

In the summer of 2022, a continuous news Q&A benchmark, RealtimeQA, was launched alongside a paper. It is perhaps the most fully realized of the few benchmarks that continuously challenge LLMs with new information.

Drawing on news quizzes from CNN and The Week, the authors kept evaluating GPT-3 until the end of 2023 and published their last questions in January 2024. In March and April, people asked for updates in the project's GitHub Issues.

The challenge of making an effective benchmark here is that a human needs to be consistently involved in writing the questions. A good question should test both reading comprehension and new information from the article. For a breaking news story (e.g. "President Biden legalizes Kinder eggs"), we don't want questions that only require parsing the headline ("Who legalized Kinder eggs?") or general knowledge ("Who is President of the US?"). The ideal question is something you couldn't have answered a week or a month earlier, like "Where was the announcement made?" or "Were all types of Kinder eggs made legal?". I've wanted a more exciting version of this, such as news articles updating a knowledge graph, maybe starting with TV episode reviews updating a mini world database, but for now I'll keep it focused.

I decided to upload the questions as a new dataset named Relive-QA and post a link in the GitHub Issue in case people want to collaborate. A demo script will scrape articles from English Wikinews and use the new GPT-4o to write multiple-choice questions about them. This raises a few issues: