A scientific benchmark comparing the performance of NLP sentiment analysis models on small to medium-sized datasets