Decoding Language in the Digital Age: A Model of Computational Discourse Analysis

Zahra Roozafzai

Cite: Roozafzai, Z. Decoding Language in the Digital Age: A Model of Computational Discourse Analysis. JDS, 7(1), 35-53, (2025). https://doi.org/10.33847/2686-8296.7.1_4

Abstract. This research examines the application of computational methods to discourse analysis in the digital age. As language adapts to new technological contexts, the need for automated, data-driven approaches to understanding language in use becomes increasingly evident. The study investigates computational techniques employed in discourse analysis, including natural language processing, machine learning, and text mining, applied to textual data drawn from social media interactions, online forums, and news articles. It explores how effectively these methods uncover patterns, structures, and meanings within the data corpus. The research also addresses the challenges and limitations of these techniques and evaluates their potential to enhance our understanding of language and communication in an evolving digital landscape. By contributing to the ongoing debate on the role of technology in discourse analysis, this study aims to inform linguistic and social research, highlighting the importance of data-driven approaches in unraveling the complexities of language use in the digital era.
Keywords: Computational discourse analysis, Natural language processing, Machine learning, Text mining, Digital communication. 

Published online 25.06.2025