Victor Johnson

SentimentAnalyser

June 2018

Technologies used: Python • scikit-learn • NumPy • Pandas • Matplotlib • Jupyter
This project might be outdated as the repository is no longer maintained.

This project implements a machine learning sentiment analyzer that classifies financial news text as positive, negative, or neutral, using NLP feature extraction and linear models.


The project focuses on applying natural language processing techniques to financial text, using a cleaned and structured dataset derived from the research study “Good Debt or Bad Debt? Detecting Semantic Orientations in Economic Texts.” It demonstrates the complete workflow of a real-world NLP task, from loading and auditing data to preparing text for modeling and evaluating results.
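
As a rough illustration of the "loading and auditing" step, the sketch below counts missing values, duplicate sentences, and class frequencies with Pandas. The column names `text` and `label`, the sample rows, and the `audit` helper are assumptions made for this example; the notebook's actual schema and checks may differ.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset and its column names may differ.
df = pd.DataFrame({
    "text": [
        "Operating profit rose to EUR 13.1 mn from EUR 8.7 mn.",
        "The company laid off 50 employees.",
        "The company is based in Helsinki.",
        "The company is based in Helsinki.",  # exact duplicate
        None,                                 # missing sentence
    ],
    "label": ["positive", "negative", "neutral", "neutral", "neutral"],
})


def audit(frame):
    """Return basic data-quality counts for a text/label frame."""
    return {
        "rows": len(frame),
        "missing_text": int(frame["text"].isna().sum()),
        "duplicate_text": int(frame["text"].duplicated().sum()),
        "label_counts": frame["label"].value_counts().to_dict(),
    }


report = audit(df)

# Drop rows without text, then drop repeated sentences.
clean = (
    df.dropna(subset=["text"])
      .drop_duplicates(subset="text")
      .reset_index(drop=True)
)
```

Running the audit before and after cleaning shows how much data the cleaning step removes and whether it shifts the class balance.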

The notebook walks through essential steps such as text cleaning, exploratory data analysis, word cloud visualization, and feature extraction using Bag-of-Words and TF-IDF with both word-level and character-level n-grams. Multiple baseline and linear models are trained, with attention given to class imbalance, hyperparameter tuning, and rigorous evaluation through error analysis and interpretability techniques.
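
One way to combine these pieces in scikit-learn is sketched below: word-level and character-level TF-IDF features are joined with `FeatureUnion`, fed to a logistic regression with balanced class weights, and tuned with a small grid search. The toy sentences, the n-gram ranges, the choice of logistic regression, and the `C` grid are all assumptions for illustration, not the notebook's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Toy examples invented for this sketch (not taken from the dataset).
texts = [
    "Net sales increased strongly in the third quarter",
    "Operating profit rose compared with last year",
    "The company raised its full-year earnings guidance",
    "Orders grew and margins improved",
    "Net loss widened in the third quarter",
    "Operating profit fell compared with last year",
    "The company cut its full-year earnings guidance",
    "Sales dropped and margins deteriorated",
    "The company is headquartered in Helsinki",
    "The annual general meeting will be held in March",
    "The report will be published on Friday",
    "The firm has offices in three countries",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Word-level and character-level TF-IDF features, concatenated side by side.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
])

pipeline = Pipeline([
    ("features", features),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Small grid over the regularization strength; macro F1 treats classes equally.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",
    cv=2,
)
search.fit(texts, labels)

predictions = search.predict(["Profit fell sharply in the second quarter"])
```

Macro-averaged F1 is used as the selection metric here because plain accuracy can look good on an imbalanced dataset even when a minority class is mostly misclassified.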

Finally, the project emphasizes practical usability by saving trained models and allowing users to test sentiment predictions on new, unseen financial headlines. The structured project layout and reproducible workflow make it suitable both as a learning resource and as a foundation for more advanced financial NLP applications.
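
A minimal sketch of the save-and-reuse step follows, assuming the model is a scikit-learn pipeline persisted with `joblib`. The file name `sentiment_model.joblib`, the toy training data, and the example headlines are hypothetical; the project may store its models differently.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in training set, invented for this sketch.
train_texts = [
    "Quarterly profit rose sharply",
    "Revenue grew faster than expected",
    "Quarterly loss widened sharply",
    "Revenue fell short of expectations",
    "The company is based in Helsinki",
    "The meeting will be held on Monday",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# Save the whole pipeline so the vectorizer vocabulary travels with the model.
model_path = os.path.join(tempfile.mkdtemp(), "sentiment_model.joblib")
joblib.dump(model, model_path)

# Later, or in another process: load the pipeline and score unseen headlines.
restored = joblib.load(model_path)
new_headlines = [
    "Profit rose in the fourth quarter",
    "The company will publish its report on Monday",
]
restored_predictions = list(restored.predict(new_headlines))
original_predictions = list(model.predict(new_headlines))
```

Persisting the vectorizer and classifier together avoids a mismatch between the vocabulary seen at training time and the one used at prediction time. Files saved with `joblib` should only be loaded from trusted sources and with a compatible scikit-learn version.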