GPT-4's New Skill: Predicting Paper Retractions through Peer Review
When papers are submitted to GPT-4 for peer review, its evaluations largely agree with those of human reviewers.
In this context, researchers from universities including Renmin University of China and Zhejiang University submitted more than 10,000 papers to GPT-4, asked it to assess each paper's "risk of retraction," and compared its answers with human predictions.
The results show that GPT-4 performs well at this task.
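As an illustration, here is a minimal sketch of how such a query might be issued with the OpenAI Python client; the prompt wording, the model name, and the assess_retraction_risk helper are assumptions for illustration, not the study's actual protocol.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def assess_retraction_risk(title: str, abstract: str) -> str:
    """Ask GPT-4 to rate a paper's retraction risk (hypothetical prompt)."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a peer reviewer assessing research integrity."},
            {"role": "user",
             "content": (f"Title: {title}\nAbstract: {abstract}\n\n"
                         "Rate the risk that this paper will be retracted "
                         "(low / medium / high) and briefly justify.")},
        ],
        temperature=0,  # deterministic output for repeatable comparison
    )
    return response.choices[0].message.content
```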
Although recent reports on ChatGPT have raised concerns among scholars that AI models may enable academic misconduct, such models can also be used to protect academic integrity. Before using ChatGPT to predict whether papers will be retracted, the research group first asked whether the model has this ability at all.
Prior evidence suggests that many problematic papers are discussed on social media platforms such as Twitter before they are retracted.
To verify this ability, the research group collected 3,505 retracted papers and, using a coarse matching algorithm, paired them with 3,505 non-retracted papers sharing similar features, including the publishing journal, year of publication, and number of authors. For these 7,010 articles, tweet data, including each tweet's publication time and content, were obtained through the Twitter API. After screening, 8,367 English tweets mentioning the retracted papers and 6,180 English tweets mentioning the non-retracted papers remained.
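A minimal pandas sketch of this kind of coarse matching is shown below; the file names, column names, and one-to-one matching rule are assumptions for illustration, not the study's actual algorithm.

```python
import pandas as pd

# Hypothetical inputs: one row per paper, with the matching features
# named in the article (journal, publication year, number of authors).
retracted = pd.read_csv("retracted_papers.csv")        # assumed file name
candidates = pd.read_csv("non_retracted_papers.csv")   # assumed file name

keys = ["journal", "year", "n_authors"]

matched = []
pool = candidates.copy()
for _, paper in retracted.iterrows():
    # Coarse match: same journal, same year, same author count.
    hits = pool[(pool[keys] == paper[keys]).all(axis=1)]
    if not hits.empty:
        match = hits.iloc[0]
        matched.append(match)
        pool = pool.drop(match.name)  # sample controls without replacement

controls = pd.DataFrame(matched)
print(f"Matched {len(controls)} controls to {len(retracted)} retracted papers")
```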
Human prediction (researchers judging from tweets alone whether a paper will be retracted) serves as an important baseline for measuring how closely the model matches human judgment. When humans perceived problems in an article from its tweets, the article was in fact retracted 93% of the time (precision = 93%), indicating that certain tweets can accurately signal an upcoming retraction.
However, the proportion of retracted articles that humans flagged in advance was low, only about 16% (recall = 16%). In other words, only a small fraction of retracted articles show obvious warning signs in tweets beforehand, but those signs do exist.
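As a sanity check on what these two numbers mean, here is a minimal sketch of computing precision and recall from paired labels; the toy data are illustrative, not the study's data.

```python
# Toy labels: 1 = retracted / flagged as problematic, 0 = not.
actual    = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # ground truth per paper
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # human judgment from tweets

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

# Precision: of the papers humans flagged, how many were really retracted
# (93% in the study). Recall: of the retracted papers, how many were
# flagged in advance (16% in the study).
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```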
Researchers found that the tweets that effectively predict retractions fall into two categories (a labeling sketch follows the list):
The first category directly points out errors or academic misconduct in the articles;
The second category casts doubt on the quality of the articles in a critical or ironic tone.
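A minimal sketch of such a two-way keyword-based labeling is shown below; the keyword lists and the classify_tweet function are invented for illustration and are not the researchers' coding scheme.

```python
# Hypothetical keyword cues for the two categories of predictive tweets.
MISCONDUCT_CUES = ["plagiarism", "fabricated", "duplicated", "manipulated", "error"]
QUALITY_CUES    = ["really?", "seriously", "how did this get published", "questionable"]

def classify_tweet(text: str) -> str:
    """Label a tweet as flagging misconduct, questioning quality, or neither."""
    lower = text.lower()
    if any(cue in lower for cue in MISCONDUCT_CUES):
        return "misconduct/error"   # category 1: explicit problems
    if any(cue in lower for cue in QUALITY_CUES):
        return "quality-doubt"      # category 2: critical or ironic doubt
    return "other"

print(classify_tweet("Figure 2 looks manipulated to me."))  # misconduct/error
print(classify_tweet("How did this get published?!"))       # quality-doubt
```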
Such tweets help journals investigate articles, and when serious issues are confirmed, the articles may be retracted. In this sense, critical comments can catalyze retractions, underscoring the value of incorporating them into early warning systems for research integrity.