PILOT EXPERIMENT ON AUTOMATIC DETECTION OF VIETNAMESE MEDIA NARRATIVES ABOUT THE RUSSIA-UKRAINE WAR UNDER LIMITED RESOURCES

Authors

  • Viktoriia Musiichuk, A. Krymskyi Institute of Oriental Studies, National Academy of Sciences of Ukraine, Ukraine

DOI:

https://doi.org/10.24025/2707-0573.12.2025.345012

Abstract

Background. Narratives in media texts play a crucial role in shaping public opinion on international conflicts, including the Russia–Ukraine war. Systematic investigation of how Vietnamese media narratives are constructed is relevant both from a linguistic perspective and for developing effective strategies of international communication. Traditional narrative analysis is inefficient for large corpora due to its resource intensity, especially for low-resource languages such as Vietnamese (lack of annotated datasets, complex morpheme tokenization, limited access to multi-layered data). Existing studies are largely confined to qualitative discourse analysis of social media and do not employ scalable NLP-based automation.

Purpose. The aim of the article is to develop and test a hybrid methodology for the automated extraction of narratives from Vietnamese media texts under limited computational resources, combining classical narratology with digital humanities methods (NLP, clustering) in order to identify event-centric narrative axes (events, characters, frames) and provide their interpretation.

Methods. The study presents a pilot experiment on a corpus of 160 news items from Báo tin tức, collected via ParseHub and tokenized with Underthesea. An abductive approach (Burch, 2024) was implemented along two complementary strands: (1) an inductive strand using KeyBERT+PhoBERT/SimCSE-Vietnamese (text embeddings, keyphrase extraction) and GPT-4 (grouping into events/characters/themes); (2) a deductive strand using K-means/HDBSCAN+PhoBERT/SimCSE-Vietnamese (embeddings, clustering) and GPT-4 (cluster interpretation). All automatic outputs were subjected to manual verification.
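The keyphrase step of the first strand can be illustrated with a minimal sketch of KeyBERT-style ranking: candidate phrases are scored by cosine similarity between their embedding and the embedding of the whole document. This is not the author's code. The function names (toy_embed, candidate_phrases, rank_keyphrases) are hypothetical, and the hashed bag-of-words embedding is only a stand-in for a real encoder such as PhoBERT or SimCSE-Vietnamese, so the scores have no semantic meaning. The sample sentence is assumed to be already word-segmented, with multi-syllable words joined by underscores.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # size of the toy embedding space (arbitrary choice)


def toy_embed(text):
    """ASSUMPTION: stand-in for a sentence encoder (e.g. PhoBERT).
    Builds an L2-normalised hashed bag-of-words vector."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def candidate_phrases(tokens, max_n=2):
    """Collect unique word n-grams (n = 1..max_n) in order of appearance."""
    seen = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in seen:
                seen.append(phrase)
    return seen


def rank_keyphrases(text, top_k=3, max_n=2):
    """Rank candidate phrases by similarity to the whole document."""
    tokens = text.lower().split()
    doc_vec = toy_embed(text)
    scored = [(p, cosine(doc_vec, toy_embed(p)))
              for p in candidate_phrases(tokens, max_n)]
    scored.sort(key=lambda pair: -pair[1])  # highest similarity first
    return scored[:top_k]


if __name__ == "__main__":
    # Illustrative sentence, not taken from the study corpus.
    sample = "giá dầu tăng do xung_đột Nga Ukraine ảnh_hưởng giá dầu"
    for phrase, score in rank_keyphrases(sample):
        print(f"{score:.3f}  {phrase}")
```

In a real pipeline the toy encoder would be replaced by model embeddings, and the ranked phrases would then be passed on for grouping into events, characters, and themes.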

Results. In the first strand, KeyBERT was used to extract keyphrases, which were subsequently mapped onto narrative labels (events, characters) and aggregated into thematic groups and narrative frames with the assistance of GPT-4. In the second strand, clustering was performed in parallel with K-means and HDBSCAN. The resulting clusters were interpreted and associated with narrative frames and core lexical items. Comparison of the two strands revealed convergence in the extracted semantic axes and narrative frames. In both strands, Vietnamese news coverage was found to prioritise the war’s impact on local and global economies, politics, and the humanitarian sphere over detailed analysis of military operations.
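The clustering strand described above can be sketched, under simplifying assumptions, as plain K-means over document vectors. The code below is a from-scratch illustration rather than the library implementation used in the study. The two-dimensional points are invented stand-ins for PhoBERT/SimCSE-Vietnamese embeddings, the function names are hypothetical, and HDBSCAN is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Basic K-means: alternate assignment and centroid update
    until the labels stop changing or `iters` is reached."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # ASSUMPTION: toy 2-D vectors standing in for document embeddings.
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = kmeans(docs, k=2)
    for doc, lab in zip(docs, labels):
        print(lab, doc)
```

Each resulting cluster would then be inspected by hand (or summarised with a language model, as in the study) and associated with a narrative frame and its core lexical items.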

Discussion. The study demonstrates the effectiveness of the proposed methodology for automated detection of Vietnamese media narratives about the Russia–Ukraine war under limited resources. The abductive design proved methodologically valid, as both strands produced consistent and mutually reinforcing results. The workflow is robust in resource-limited environments such as Google Colab and is scalable to larger corpora. Future research may include systematic comparison of different corpora, integration of NER for character extraction, and dynamic narrative tracking using LSTM-based models, thereby contributing to the analysis of propagandistic and geopolitical discourses.

Author Biography

Viktoriia Musiichuk, A. Krymskyi Institute of Oriental Studies National Academy of Sciences of Ukraine

Ph.D. in Philology, Senior Researcher, Head of the Asia-Pacific Department, A. Krymskyi Institute of Oriental Studies, National Academy of Sciences of Ukraine

Published

2025-12-31

Issue

Section

Articles