GPTalyze: open-source Github repository for analyzing short text snippets with ChatGPT

The lab recently released the GPTalyze open-source Github repository, written by lab members Michael Khavkin and Danielle Movsowitz Davidow. The library utilizes ChatGPT’s API to analyze short textual snippets, such as tweets, employing ChatGPT’s zero-shot-like abilities to summarize the discussed topics in a textual corpus and perform other Natural Language Processing (NLP) tasks, such as sentiment analysis and emotion detection.

ChatGPT has recently emerged as a powerful Large Language Model (LLM), enabling unprecedented and innovative public interaction with generative language AI. These capabilities were not overlooked by the research community, who started leveraging ChatGPT for data analysis of various data sources, including textual unstructured data from social networks, such as Twitter.

We demonstrate two very useful use cases: sentiment analysis and topic extraction. We evaluate the interaction with ChatGPT on a publicly available Twitter dataset containing tweets about the COVID-19 pandemic. More information is available on the dataset’s page at Kaggle (filename is “Corona_NLP_train.csv”).

Sentiment analysis: We asked ChatGPT to classify each tweet in a sample of 100 tweets discussing COVID-19 to either negative, neutral, or positive sentiment. Using the following confusion matrix, we can deduce that ChatGPT was only moderate in its accuracy, classifying only ~60% of the tweets correctly.

Topic extraction: We asked ChatGPT to extract the three main topics (and their sub-topics) discussed in a batch of 50 random tweets about COVID-19. This task would have been challenging to a human, but it was performed in seconds by ChatGPT with impressive results (we eyeballed the tweets manually to evaluate the received topics, and they corresponded to the returned topics).

Search this site

Tools

GPTalyze: open-source Github repository for analyzing short text snippets with ChatGPT