Text Mining Project (SS-2020), Uni Passau

Topic: Automating the Creation of Bias Lexica

Our objective is to automate the creation of bias lexica and to use the automatically generated lexica to classify biased statements with a supervised learning algorithm. In this project, we use the word2vec word embedding method to extract biased words from the dataset. These extracted words then serve as seeds for extracting further biased words, which yields a comprehensive lexicon of biased words.
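
The seed-expansion idea can be illustrated with a minimal sketch. It is not the project's implementation: the hand-written two-dimensional vectors stand in for trained word2vec embeddings, and the names TOY_VECTORS and expand_lexicon, as well as the topn and rounds parameters, are hypothetical and chosen only for illustration.

```python
import math

# Toy 2-d vectors standing in for trained word2vec embeddings (assumption:
# a real run would load the vectors learned from the scraped articles).
TOY_VECTORS = {
    "radical": (0.9, 0.1),
    "extremist": (0.85, 0.2),
    "fanatic": (0.8, 0.3),
    "table": (0.1, 0.9),
    "chair": (0.05, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, topn=2, rounds=1):
    """Grow a lexicon by adding the topn nearest neighbours of each seed.

    Words added in one round become the seeds of the next round.
    """
    lexicon = {s for s in seeds if s in vectors}
    frontier = set(lexicon)
    for _ in range(rounds):
        candidates = set()
        for seed in frontier:
            # Rank every other word by similarity to the current seed.
            ranked = sorted(
                ((cosine(vectors[seed], vec), word)
                 for word, vec in vectors.items() if word != seed),
                reverse=True,
            )
            candidates.update(word for _, word in ranked[:topn])
        frontier = candidates - lexicon
        if not frontier:  # nothing new was found, stop early
            break
        lexicon |= frontier
    return lexicon
```

Calling expand_lexicon(["radical"], TOY_VECTORS, topn=2) grows the single seed into a small lexicon of its closest neighbours while leaving unrelated words such as "table" out. With real embeddings, a similarity threshold would probably be needed in addition to topn to keep unrelated words from drifting in over several rounds.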

How to execute the project?

Before running any of the scripts, install the following third-party libraries:

pip3 install nltk pandas newspaper3k
pip3 install numpy
pip3 install gensim scikit-learn

Once these packages are installed, run the scripts in the following order:

code/scrapping/scrapping.py
code/preprocessing/preprocessing.py
code/generate_embeddings/generate_embeddings.py
code/bias_words_generation/generating_bias_words.py
code/evaluation/evaluation.py
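
The sequence above could also be driven by a small runner script. The following is a sketch under two assumptions that the project does not state: each script runs without command-line arguments, and the runner is started from the repository root. The names PIPELINE and run_pipeline are hypothetical.

```python
import subprocess
import sys

# Script order as listed above; paths are relative to the repository root.
PIPELINE = [
    "code/scrapping/scrapping.py",
    "code/preprocessing/preprocessing.py",
    "code/generate_embeddings/generate_embeddings.py",
    "code/bias_words_generation/generating_bias_words.py",
    "code/evaluation/evaluation.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current Python interpreter.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so a failing stage stops the later stages from running.
    The runner parameter exists only so the order can be checked in a dry run.
    """
    for script in scripts:
        print(f"Running {script}")
        runner([sys.executable, script], check=True)
```

Calling run_pipeline() from the repository root executes all five stages. Stopping at the first failure matters here because each stage presumably consumes the output of the previous one.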