AI Helps Uncover Russian State-Sponsored Disinformation in Hungary
Arms deliveries. EU sanctions. Ethnic minorities. These were the three topics Hungarian media reported on most frequently between fall 2021 and spring 2022, according to two researchers who analyzed thousands of articles from Hungarian outlets. Benjamin Novak, a doctoral student at Johns Hopkins University who reported for The New York Times in Hungary until 2022, and Martin Wendiggensen, a political scientist and fellow doctoral student at Johns Hopkins, worked together to explore whether Hungarian media narratives matched those of Russian propaganda publications, and found that they largely did.
National sentiment shifted and messages supporting Russian objectives appeared in Hungary in mid-September 2021, months before Russian troops actually invaded Ukraine.
“We can only speculate about the motivation of the Hungarian media to increasingly regurgitate Russian propaganda from that point on,” says Wendiggensen, who presented the results of the investigation at the recent LabsCon security conference.
What is certain, he says, is that from fall 2021 onward, not only did the number of articles covering the three subject areas increase rapidly, but the topics from that point on always followed the same narrative patterns: arms supplies are bad because they prolong wars, Ukraine treats ethnic minorities badly, and European Union sanctions are bad for the Hungarian economy.
Training the ML Model
Novak’s research relied on manually analyzing the articles, while Wendiggensen trained a machine learning (ML) model to analyze the corpus of articles. What is striking about their research is that man and machine arrived at the same result without prior consultation, suggesting that ML can be a reliable method for identifying disinformation campaigns.
Wendiggensen taught the machine to capture the frequency of whole sets of topics, not just individual words, and analyze them to determine the nation’s tone. His application used code blocks provided by colleague and ML specialist Kohei Watanabe. In the first step, the software ingested, without human intervention, all of the press articles, which had previously been downloaded and broken down into components such as headline, date, and body text. The application then represented each of the 26 million words collected as a vector in a multidimensional geometric space. Relationships among the terms were established based on the angles between the vectors and the distances separating them, Wendiggensen says.
To increase the precision of the relationships, this space is not limited to the usual three dimensions. Instead, the software tracks the vectors through hundreds of dimensions.
“Thus, after a while, the model recognizes that, for example, ‘sanctions’ and ‘Brussels’ and ‘negative’ are closely related,” Wendiggensen explains. “By calculating the relationship vectors, we can apply mathematics to words.”
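The idea of reading relatedness from the angle between word vectors can be illustrated with cosine similarity. The following is a minimal sketch, not the researchers’ code: the four-dimensional vectors are invented for illustration (the article describes hundreds of dimensions), and the function name `cosine_similarity` and the word “harvest” are assumptions added here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, made up for this example only.
toy_vectors = {
    "sanctions": [0.9, 0.8, 0.1, 0.0],
    "Brussels":  [0.8, 0.9, 0.2, 0.1],
    "harvest":   [0.0, 0.1, 0.9, 0.8],
}

# Vectors pointing in nearly the same direction score close to 1.
sim_related = cosine_similarity(toy_vectors["sanctions"], toy_vectors["Brussels"])
# Vectors pointing in different directions score close to 0.
sim_unrelated = cosine_similarity(toy_vectors["sanctions"], toy_vectors["harvest"])

print(f"sanctions vs Brussels: {sim_related:.2f}")
print(f"sanctions vs harvest:  {sim_unrelated:.2f}")
```

In a trained model the vectors are learned from the corpus rather than written by hand, so words that appear in similar contexts end up at small angles to one another.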
By the conclusion of this phase, the ML model had identified the same top three topics as the ones Novak had found.
“The goal in working out the machine learning model was to make similarities mathematically expressible and thus statistically reliable,” Wendiggensen says.
Putin Good, EU Bad
In the second phase of his research, Wendiggensen gave the software opposing words, such as “good” and “bad” or “evil” and “benign.” Based on this human-introduced, scored-target dimension, the ML model assigned a score to each article. The ML model did not look at individual words to calculate the score; rather, it worked with sentences to establish relationships among them. The model keeps the statements of the individual sentences as meta-information, so even thoughts spanning several sentences could be captured and scored in their entirety.
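One way to picture scoring along a “good” versus “bad” axis is seed-word polarity scoring, sketched below. This is a simplified, word-level illustration under stated assumptions, not the method described above, which the article says worked with sentences. The two-dimensional vectors, the seed lists, and the function names `word_polarity` and `text_polarity` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def word_polarity(word, vectors, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    v = vectors[word]
    pos = sum(cosine(v, vectors[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(v, vectors[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

def text_polarity(tokens, vectors, pos_seeds, neg_seeds):
    """Average polarity of the tokens that have a vector; 0.0 if none do."""
    known = [t for t in tokens if t in vectors]
    if not known:
        return 0.0
    total = sum(word_polarity(t, vectors, pos_seeds, neg_seeds) for t in known)
    return total / len(known)

# Hypothetical toy embeddings, made up for this example only.
vectors = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "helpful": [0.9, 0.3],
    "harmful": [-0.8, 0.4],
    "sanctions": [0.0, 1.0],
}
pos_seeds, neg_seeds = ["good"], ["bad"]

score_a = text_polarity(["sanctions", "helpful"], vectors, pos_seeds, neg_seeds)
score_b = text_polarity(["sanctions", "harmful"], vectors, pos_seeds, neg_seeds)
print(score_a > 0, score_b < 0)
```

A positive score means the text sits closer to the positive seed words, and a negative score means it sits closer to the negative ones. The human input is limited to choosing the seed words.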
The tipping point for pro-Russian coverage arrived in mid-September 2021, Wendiggensen says. The software takes just 15 minutes to evaluate polarity, allowing researchers to keep checking on the media landscape.
“Even today, the three topics are still dominant,” Wendiggensen says. “No other topic discussed in Hungarian media accounts for more than 15% of all articles on Ukraine.”
One reason pro-Russian messages were able to become so entrenched is that Hungary lacks media pluralism, meaning the ability to get different viewpoints from different media outlets. The current government directly and indirectly controls all reporting: the state-owned media holding company MTVA controls all public broadcasting stations, for example. Government-friendly companies own regional press outlets, and a central holding organization coordinates all of the 500 or so pro-government media companies.
Up Next: Videos, Long-Term Monitoring
While the narratives on arms supplies and ethnic minorities largely correspond to Russian propaganda, the Hungarian media added a bit of local color to the topic of sanctions. Possible and actual sanctions against Russia were used to explain away the poor state of the Hungarian economy.
In the next step, the researchers also want to process videos published by Hungarian TV stations. They already have a good 8,000 hours of footage, with the narration transcribed by software, which added 60 million words to the collection.
Subsequently, Novak and Wendiggensen want to turn to transnational reporting on pan-European right-wing networks. This will not remain a mere media analysis. Rather, anti-European narratives disseminated by political representatives will also be considered.
“Our ultimate goal is to create a dataset that other researchers can analyze at will,” Wendiggensen says.
A structured collection of published words and phrases like this would make it possible, for example, to track how messages change over time, or to answer questions such as, “Do media lose their liberality when the domestic economy is doing worse?”
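Tracking change over time could be as simple as averaging per-article scores by month. The sketch below assumes each article already carries a publication date and a numeric score; the function name `monthly_mean_scores` and the sample values are invented placeholders, not data from the study.

```python
from collections import defaultdict
from datetime import date

def monthly_mean_scores(scored_articles):
    """Group (date, score) pairs by (year, month) and average each group."""
    buckets = defaultdict(list)
    for published, score in scored_articles:
        buckets[(published.year, published.month)].append(score)
    # Sort by (year, month) so the result reads chronologically.
    return {month: sum(scores) / len(scores) for month, scores in sorted(buckets.items())}

# Placeholder values for illustration only, not results from the research.
sample = [
    (date(2020, 1, 3), 0.2),
    (date(2020, 1, 20), 0.4),
    (date(2020, 2, 18), -0.1),
    (date(2020, 2, 25), -0.5),
]

trend = monthly_mean_scores(sample)
for (year, month), mean in trend.items():
    print(f"{year}-{month:02d}: {mean:+.2f}")
```

A sustained change in sign or slope in such a series would be a starting point for closer reading of the articles, not a conclusion in itself.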
“We want to make theoretical relationships measurable,” Wendiggensen says.