With its human-like conversational abilities, ChatGPT erupted onto the scene in the latter part of last year. The release of its most recent version sparked a crypto rally and calls for a halt to development. However, a recent study suggests that the skills of the top AI bot may be deteriorating.
Extensive Examinations
Researchers at Stanford and UC Berkeley extensively examined the ChatGPT versions from March and June 2023. They created rigorous benchmarks to assess the model’s aptitude for mathematical, coding, and visual reasoning tasks. The results showed that ChatGPT’s performance had not held up over time.
The testing found a surprising decline in performance between versions. In March, ChatGPT correctly answered 488 out of 500 questions on a math task involving the identification of prime numbers, for an accuracy of 97.6%. In June, it answered only 12 of the questions correctly, and its accuracy fell to 2.4%.
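The arithmetic behind those figures can be checked with a few lines of Python. This is only a minimal sketch: the helper names `is_prime`, `accuracy`, and `score_answers` are illustrative assumptions, and the study's actual questions and grading procedure are not reproduced here.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small illustrative inputs."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def accuracy(correct: int, total: int) -> float:
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def score_answers(numbers, model_answers):
    """Count yes/no answers that match the ground truth from is_prime.

    `model_answers` is a hypothetical list of booleans, one per number.
    """
    return sum(is_prime(n) == ans for n, ans in zip(numbers, model_answers))


print(accuracy(488, 500))  # 97.6 (March)
print(accuracy(12, 500))   # 2.4 (June)
```

A grader along these lines would call `score_answers` on the questions and the model's replies, then pass the count to `accuracy`.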
The chatbot’s coding skills dropped particularly sharply. According to the study, “for GPT-4, the proportion of generations that are directly executable decreased from 52.0% in March to 10.0% in June.” The researchers tested the “pure” versions of the models, without any code interpreter plugins.
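To give a feel for what a "directly executable" rate measures, here is a rough proxy in Python. It is an assumption made for illustration, not the study's method: it only checks whether a raw model output parses as Python without any cleanup, and the names `parses_as_python` and `executable_rate` are hypothetical.

```python
import ast


def parses_as_python(generation: str) -> bool:
    """Return True if the raw text is syntactically valid Python as-is."""
    try:
        ast.parse(generation)
        return True
    except (SyntaxError, ValueError):
        return False


def executable_rate(generations) -> float:
    """Percentage of outputs that parse unmodified, to one decimal place."""
    if not generations:
        return 0.0
    ok = sum(parses_as_python(g) for g in generations)
    return round(100 * ok / len(generations), 1)


fence = "`" * 3  # a Markdown code fence, built here to avoid writing it literally
samples = [
    "print(1)",                               # bare code: parses
    fence + "python\nprint(1)\n" + fence,     # code wrapped in a fence: does not parse
    "Here is the code:\nprint(1)",            # code preceded by prose: does not parse
]
print(executable_rate(samples))  # 33.3
```

Under this proxy, any extra text around otherwise correct code counts as a failure, which shows how a change in response style alone could lower such a metric.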
Getting worse over time?
The researchers used visual tasks from the Abstract Reasoning Corpus (ARC) dataset to evaluate reasoning. Here too performance deteriorated, though less sharply. According to the study, “GPT-4 in June made errors on queries on which it was correct for in March.”
What could account for ChatGPT’s apparent decline after only a few months? According to the researchers, it could be a side effect of optimisations made by OpenAI, the company that created it.
One possible cause is the changes made to stop ChatGPT from responding to risky queries. This safety alignment, however, may compromise ChatGPT’s effectiveness on other tasks. The researchers found that the model now tends to give verbose, indirect responses rather than straightforward ones.
“GPT-4 is getting worse over time, not better,” AI expert Santiago Valderrama wrote on Twitter. Valderrama also suggested that the original ChatGPT architecture might have been replaced with a “cheaper and faster” combination of models, a change that could speed up responses to users but lower competency: “Rumours suggest they are using several smaller and specialised GPT-4 models that act similarly to a large model but are less expensive to run.”