A pair of researchers from the University of Innsbruck in Austria have developed a method to determine how well an artificial intelligence (AI) system understands ‘temporal validity,’ a benchmark that could have significant implications for the use of generative AI products such as ChatGPT in the fintech sector.
Temporal validity describes how relevant one statement remains to another as time passes. Essentially, it is the time-based value of paired statements.
An AI being evaluated on its ability to predict temporal validity would be given a target statement along with a set of candidate statements and asked to choose the candidate most closely related to the target in time.
In their recently published preprint research paper titled “Temporal Validity Change Prediction,” Georg Wenzel and Adam Jatowt use the example of a target statement in which a person is declared to be reading a book on a bus. Of the three candidate context statements in that example, the most valid is “I’ve only got a few more pages left, then I’m done.” As the target statement indicates the bus rider is currently reading a book, the other two are irrelevant by comparison.
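To make the shape of such a task concrete, here is a minimal Python sketch of how one multiple-choice item might be represented and scored. It is an illustration only, not the researchers' dataset format or evaluation code. The names `Item`, `accuracy`, and `always_first` are hypothetical, the target sentence is a paraphrase of the bus example, and the two distractor strings are placeholders, since the actual alternatives are not quoted here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One hypothetical benchmark item: a target plus candidate contexts."""
    target: str
    contexts: Sequence[str]
    answer: int  # index of the context labelled as most valid


# A chooser takes (target, contexts) and returns the index it picks.
Chooser = Callable[[str, Sequence[str]], int]


def accuracy(items: Sequence[Item], choose: Chooser) -> float:
    """Fraction of items on which the chooser picks the labelled context."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(1 for it in items if choose(it.target, it.contexts) == it.answer)
    return correct / len(items)


# Assumption: the target wording is paraphrased and the distractors are
# placeholders; only the first context is quoted from the article.
EXAMPLE = Item(
    target="I am reading a book on the bus.",
    contexts=(
        "I've only got a few more pages left, then I'm done.",
        "<placeholder distractor A>",
        "<placeholder distractor B>",
    ),
    answer=0,
)


def always_first(target: str, contexts: Sequence[str]) -> int:
    """Trivial baseline that always picks the first candidate."""
    return 0
```

In practice the `choose` function would wrap a call to the model under test; the trivial baseline is there only to show how a score would be computed.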
The researchers created a labelled dataset of training examples, which they then used to build a benchmarking task for large language models (LLMs). They chose ChatGPT as a foundational model for testing due to its popularity with end users and found that it underperformed less generalized models by significant margins.
The researchers also demonstrated that incorporating temporal validity change prediction into an LLM’s training cycle has the potential to lead to higher scores on the temporal-change benchmarking task.
Teaching these systems how to determine the most relevant statements across a corpus, with timeliness being a determining factor, could revolutionize the ability of AI models to make strong real-time predictions in massive-scale sectors such as the cryptocurrency and stock markets.