Evaluating Computation-intensive Large Language Model (LLM) and Data Mining Algorithms on Online Media Datasets
Title: Evaluating Computation-intensive Large Language Model (LLM) and Data Mining Algorithms on Online Media Datasets
DNr: Berzelius-2024-25
Project Type: LiU Berzelius
Principal Investigator: Tianyi Zhou <tzho@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-01-17 – 2024-08-01
Classification: 10201
Homepage: https://cordis.europa.eu/project/id/834862
Keywords:

Abstract

Data science is one of the fastest-growing areas of computer science and attracts considerable research interest. This has led to the development of many algorithms that address variants of related problems, such as online media polarization. Opinion and stance detection is an important problem in the study of online polarization and an active topic in both the natural language processing (NLP) and social computing research communities. The goal of stance detection is to automatically predict the attitude (i.e., favor, against, or neutral) that an opinionated text, such as a tweet, expresses toward a specified target. Traditional approaches to stance detection are limited to small labeled datasets. With the emergence of very large pre-trained language models (VLPLMs) such as ChatGPT (GPT-3.5), it is promising to apply VLPLMs to stance detection on large-scale online media datasets such as news streams. Our goal is to evaluate existing and newly developed algorithms and VLPLMs on a larger number of real-world datasets. We will create novel large-scale real-world datasets by scraping data from the Web or from social networks, such as Twitter or Reddit. We will follow the data minimization principle, i.e., we will only process and store the data that is strictly necessary for evaluating the algorithms.