Sentiment Analysis for Social Feed Comments (Machine Learning and Data Analytics using scikit-learn, a Python ML library)
Requirement: A client in the digital marketing analytics space needed to automate its manual sentiment analysis process.
Our Solution: The client is a digital marketing analysis company that helps its clients understand how their brands are performing on social media. It also analyses the campaigns run by those end clients to study their social media impact and trends.
We developed a cloud-hosted sentiment analysis platform to analyse social feeds received from different social channels (Facebook, Twitter, LinkedIn, YouTube), primarily comments and tweets.
The first big challenge was to accommodate feeds received in a variety of file formats. We created a data ingestion layer that encapsulates the conversion of external data formats into a format the application understands. We designed a UI that lets the admin define an incoming file format and map it to the internal data structure, so they can handle as many incoming formats as they need. This had the added advantage of making the system social-platform agnostic and future-proof.
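As an illustration of the mapping idea, here is a minimal sketch that assumes the incoming feed is a CSV file and that the admin-defined mapping is stored as a simple dictionary. The function name, the internal field names and the sample columns are hypothetical and are not taken from the actual platform.

```python
import csv
import io


def ingest_csv(raw_text, column_map, channel):
    """Convert an incoming CSV feed into internal records.

    column_map maps a (hypothetical) internal field name to the column
    name used in the incoming file, as an admin might define it in the UI.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    records = []
    for row in reader:
        record = {"channel": channel}
        for internal_field, source_column in column_map.items():
            # Missing columns become empty strings instead of raising.
            record[internal_field] = (row.get(source_column) or "").strip()
        records.append(record)
    return records


# Example: a feed whose columns are named "user" and "msg".
sample_feed = "user,msg\nann, Great product \n"
records = ingest_csv(sample_feed, {"author": "user", "text": "msg"}, "Twitter")
```

Because the mapping is data rather than code, supporting a new channel or export format in this sketch only requires a new `column_map`, which is the property that makes such a design platform agnostic.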
Our team worked on data cleansing and noise removal, applying pre-processing techniques such as punctuation removal and stemming using NLTK (Natural Language Toolkit) along with other custom pre-processing algorithms. Pre-processing is an essential step because it makes the raw text ready for mining: it eases information extraction from the text and the application of ML algorithms to it.
We used the Term Frequency-Inverse Document Frequency (TF-IDF) method to transform the pre-processed data into a machine-processable data set. After this, we used the following two strategies to perform the sentiment analysis:
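A minimal sketch of this kind of pre-processing is shown below. It uses NLTK's `PorterStemmer` for stemming and the standard library for punctuation removal. The lowercasing and URL-stripping steps are assumptions added for illustration, standing in for the custom algorithms, which are not described here.

```python
import re
import string

from nltk.stem import PorterStemmer

_stemmer = PorterStemmer()
_URL_RE = re.compile(r"https?://\S+")  # assumed noise pattern, for illustration
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment):
    """Lowercase, drop URLs and punctuation, then stem each token."""
    text = comment.lower()
    text = _URL_RE.sub(" ", text)        # remove links before punctuation
    text = text.translate(_PUNCT_TABLE)  # punctuation removal
    tokens = text.split()                # simple whitespace tokenisation
    return " ".join(_stemmer.stem(token) for token in tokens)


cleaned = preprocess("Running, running! http://example.com/x")
```

Whitespace tokenisation is used here so the sketch needs no NLTK data downloads; a production pipeline might use a tokenizer that handles emoji, hashtags and mentions.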
- A proprietary lexicon-based approach, as a first step, to automate the as-is processing methodology of the client.
- A Machine Learning algorithm-based approach to parse the comments/tweets and classify them as “Positive”, “Negative” or “Neutral” by using Natural Language Processing and ML Analysis.
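The lexicon used in the first strategy is proprietary, so the following is only a generic sketch of how a lexicon-based classifier can work. The word lists and the simple count-based scoring rule are hypothetical placeholders, not the client's methodology.

```python
# Hypothetical, tiny word lists; a real lexicon would be far larger
# and might carry per-word weights.
POSITIVE_WORDS = {"love", "great", "good", "excellent"}
NEGATIVE_WORDS = {"hate", "bad", "poor", "terrible"}


def lexicon_sentiment(text):
    """Classify text by counting positive and negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(token in POSITIVE_WORDS for token in tokens)
    score -= sum(token in NEGATIVE_WORDS for token in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


label = lexicon_sentiment("great campaign and excellent support")
```

A rule like this is transparent and easy to audit against the manual process, but it cannot handle negation or context, which is one reason to pair it with a trained model.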
We used more than a million records to train the ML model.
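The TF-IDF and classification steps can be sketched with scikit-learn as follows. This is a toy illustration under stated assumptions: the six training sentences are invented, the hyperparameters are scikit-learn defaults apart from `max_iter`, and the real model's features and settings are not described here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data; the real model was trained on over a million records.
train_texts = [
    "love this brand great service",
    "excellent campaign really good",
    "terrible support very bad",
    "hate the new ad poor quality",
    "the store opens at nine",
    "new product launches next week",
]
train_labels = [
    "Positive", "Positive",
    "Negative", "Negative",
    "Neutral", "Neutral",
]

# TF-IDF turns each comment into a weighted term vector;
# logistic regression then learns one set of weights per class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

predictions = model.predict(["great service", "poor quality ad"])
```

Keeping the vectorizer and the classifier in one pipeline means the vocabulary learned at training time is reused unchanged at prediction time, which avoids a common source of mismatched feature columns.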
To enable system management, we gave the admin the ability to manage users, clients and their brands, clusters/categories, and sentiment tags.
We also provided a reporting component that lets the user create and download detailed and summary sentiment analysis reports to share with the end clients.
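A summary report of this kind can be approximated by counting classified comments per brand and sentiment and writing the result as CSV. The record fields (`brand`, `sentiment`) and the output columns below are assumptions for illustration, not the platform's actual report layout.

```python
import csv
import io
from collections import Counter


def summary_report_csv(classified):
    """Return a CSV string with comment counts per brand and sentiment."""
    counts = Counter((row["brand"], row["sentiment"]) for row in classified)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["brand", "sentiment", "comments"])
    # Sort so the report order is stable between runs.
    for (brand, sentiment), count in sorted(counts.items()):
        writer.writerow([brand, sentiment, count])
    return buffer.getvalue()


report = summary_report_csv([
    {"brand": "Acme", "sentiment": "Positive"},
    {"brand": "Acme", "sentiment": "Positive"},
    {"brand": "Acme", "sentiment": "Negative"},
])
```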
Technology Stack
Frontend: HTML, CSS, JavaScript, jQuery, Bootstrap
Backend: Python 3.7, Django
Machine Learning: Logistic Regression, scikit-learn (Python ML library)
Database: AWS RDS – PostgreSQL
Server: AWS EC2 – NGINX