Author: harshildarji

Description:
Text Mining Project, SS-2020
Language: Python
Repository: git://github.com/harshildarji/TMP-SS-2020.git
Created: 2020-05-31T21:42:36Z
Project page: https://github.com/harshildarji/TMP-SS-2020


Text Mining Project (SS-2020), Uni Passau

Topic: Automating the Creation of Bias Lexica

Our objective is to automate the creation of bias lexica and to use the automatically generated lexicon to classify biased statements with a supervised learning algorithm. We use the word2vec word embedding method to extract biased words from the dataset; these extracted words then serve as seeds for extracting further biased words, yielding a comprehensive lexicon of biased words.
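The seed-expansion idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `expand_seeds`, the parameters `topn` and `min_similarity`, and the two-dimensional toy vectors are all assumptions made for the example. In practice the vectors would come from a trained word2vec model, where gensim's `model.wv.most_similar(word, topn=...)` performs the same nearest-neighbor lookup.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(vectors, seeds, topn=2, min_similarity=0.8):
    """Run one expansion round: add each seed's nearest neighbors to the lexicon.

    `vectors` maps word -> embedding. `topn` and `min_similarity` are
    hypothetical tuning knobs, not values taken from the project.
    """
    lexicon = {s for s in seeds if s in vectors}
    new_words = set()
    for seed in sorted(lexicon):
        # Score every other word by its similarity to the seed.
        scored = [
            (cosine(vectors[seed], vec), word)
            for word, vec in vectors.items()
            if word != seed
        ]
        scored.sort(reverse=True)
        # Keep only the closest neighbors that clear the threshold.
        for similarity, word in scored[:topn]:
            if similarity >= min_similarity:
                new_words.add(word)
    return lexicon | new_words


# Toy embeddings standing in for real word2vec vectors (illustrative only).
toy_vectors = {
    "radical": (0.9, 0.1),
    "extremist": (0.85, 0.2),
    "fanatic": (0.8, 0.3),
    "table": (0.1, 0.9),
    "chair": (0.05, 0.95),
}

lexicon = expand_seeds(toy_vectors, ["radical"])
print(sorted(lexicon))
```

Calling `expand_seeds` again with the returned lexicon as the new seed list would run a further expansion round; the similarity threshold keeps unrelated words such as "table" out of the lexicon.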

How to execute the project?

Before running any of the scripts, install the following third-party libraries:

  1. pip3 install nltk pandas newspaper3k
  2. pip3 install numpy
  3. pip3 install gensim scikit-learn

Once these packages are installed, run the scripts in the following sequence:

  1. code/scrapping/scrapping.py
  2. code/preprocessing/preprocessing.py
  3. code/generate_embeddings/generate_embeddings.py
  4. code/bias_words_generation/generating_bias_words.py
  5. code/evaluation/evaluation.py
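
The sequence above could also be driven by a small runner script. The sketch below is a convenience example rather than part of the project: it assumes the scripts take no command-line arguments and are launched from the repository root, and the names `SCRIPTS` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Script paths as listed in this README, in execution order.
SCRIPTS = [
    "code/scrapping/scrapping.py",
    "code/preprocessing/preprocessing.py",
    "code/generate_embeddings/generate_embeddings.py",
    "code/bias_words_generation/generating_bias_words.py",
    "code/evaluation/evaluation.py",
]


def run_pipeline(scripts, python=sys.executable):
    """Run each script in order, stopping at the first failure.

    Returns the list of scripts that completed successfully.
    subprocess.run(..., check=True) raises CalledProcessError when a
    script exits with a non-zero status, which halts the sequence.
    """
    completed = []
    for script in scripts:
        print(f"Running {script} ...")
        subprocess.run([python, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline(SCRIPTS)` from the repository root runs all five steps. Stopping at the first failure matters here because each step presumably consumes the output of the previous one.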