Clickbait is made to lure you in. It is attractive and yet usually misleading. This kind of headline has been exploited so extensively that bots are now being used to generate them.
From the standpoint of readers who want reliable information, or moderators who want to uphold the integrity of their site's content, spotting these bot-generated headlines would help a lot. Left unchecked, clickbait can become a wildfire of misinformation.
To address this, researchers from Penn State University and Arizona State University collaborated on a study that developed a human-aided AI system to better distinguish bot-generated from human-written headlines.
An AI-human collaboration
The researchers pointed out one challenge in automatically combatting clickbait: the lack of high-quality labeled samples to train an AI system.
To remedy this, the researchers recruited people, such as crowdworkers and journalism students, to write clickbait. At the same time, they used deep generative models to produce clickbait from scratch.
Pooled together, these form what the researchers defined as synthetic clickbait.
The results? The models' area-under-the-curve (AUC) values rose by as much as 14.5%. A higher AUC means the model is better at telling the classes in the data apart.
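To make the AUC metric concrete, here is a minimal sketch in plain Python (not the study's code; the labels and scores are made up for illustration). AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one:

```python
def auc(labels, scores):
    """AUC computed as the fraction of (positive, negative) pairs where
    the positive example gets the higher score (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector output: 1 = clickbait, 0 = legitimate headline
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auc(labels, scores))  # 0.888..., since 8 of the 9 pairs are ranked correctly
```

A perfect classifier ranks every clickbait headline above every legitimate one (AUC = 1.0), while random guessing averages 0.5, so an absolute gain of up to 14.5% is substantial.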
This improved performance allowed the system to outperform some of the best clickbait detection algorithms available today.
Insufficient labeled data for supervised learning is one of the problems faced by AI and machine learning (ML) systems today, including in clickbait detection.
With this study showing that synthetically generated text can supplement existing training samples for ML systems, such data sets could be an interesting domain for other studies to explore further.
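The augmentation idea itself is simple to sketch. The snippet below (a hypothetical illustration, not the study's pipeline or data) shows how scarce human-labeled headlines can be pooled with synthetic clickbait, whether crowdworker-written or model-generated, to enlarge the positive class before training a detector:

```python
# Scarce human-labeled headlines: (text, label), 1 = clickbait, 0 = not
real_headlines = [
    ("Scientists publish new study on sleep patterns", 0),
    ("City council approves budget for road repairs", 0),
    ("You won't BELIEVE what this cat did next", 1),
]

# Synthetic clickbait: written by crowdworkers or generated by a model
synthetic_clickbait = [
    ("10 secrets doctors don't want you to know", 1),
    ("This one trick will change your life forever", 1),
]

# Pooling both sources yields a larger, better-balanced training set
training_set = real_headlines + synthetic_clickbait
positives = sum(label for _, label in training_set)
print(len(training_set), positives)  # 5 examples, 3 of them clickbait
```

The pooled set then feeds whatever supervised classifier the detector uses; the point is that the synthetic examples compensate for the shortage of genuine labeled clickbait.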
Through more developments like this, content moderation can be further improved. With the bulk of bot-generated headlines filtered out automatically, human moderators can focus on combating intentional misinformation.