A Study of the Indian BFSI Sector Based on Classification, Text Mining & Sentiment Analysis of Customer Feedback Using Python – By ICSS Student – Pijush Mandal
Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral. Also known as opinion mining, it derives the opinion or attitude of a speaker. A common use case for this technology is to discover how people feel about a particular topic.
With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably. Creative use of advanced artificial intelligence techniques can be an effective tool for doing in-depth research.
These basic concepts, when used in combination, become a very important tool for analyzing millions of brand conversations with human-level accuracy.
In the era of social media, using social networking data to study customers’ attitudes towards an organization, its services or events has become an increasingly dominant trend in business strategic management research. Sentiment analysis, also called opinion mining, is a field of study that aims to extract opinion and sentiment from natural language text using computational methods. With the growth of the Internet, numerous business websites have been deployed that allow users to review and comment on services, in the form of either business forums or social networks. Mining opinions automatically from the reviews on such online platforms is not only useful for customers seeking advice but also necessary for businesses to understand their customers and improve their services. This paper presents the design and implementation of a system to group, summarize and analyze the sentiment of various customer feedback. Our framework addresses the problems of feedback overload, congestion and difficulty in prioritizing valuable feedback for an organization; we perform text mining, sentiment analysis and classification on a dataset collected from various websites. The accuracy achieved indicates the efficiency and reliability of the approach for future implementation.
Understanding what customers think about business products or services has always been one of the most important issues in business strategic management, particularly in business decision-making. Our beliefs and perceptions of reality, and the choices we make, are to some extent conditioned on how others act. This is true not only for individuals but also for organizations. While consumers hunger for and rely on online advice and recommendations about products and services, businesses demand tools that can transform customers’ thoughts and conversations into customer insights, or that support social media monitoring, reputation management and voice-of-the-customer programs. Traditionally, individuals ask friends and family members for their opinions, while businesses rely on surveys, focus groups, opinion polls, feedback collectors and consultants. In the modern age of Big Data, millions of consumer reviews and discussions flood the internet daily; just as individuals feel overwhelmed by this information, it is impossible for businesses to keep up with it manually. Thus there is a clear need for computational methods that analyze feedback automatically.
In this paper we propose an effective method for managing feedback information that reduces overload by grouping feedback based on users’ activities, analyzing its sentiment and summarizing it. Our technique allows the classifier and summarizer to extract information from feedback messages and to build a model from the most frequent and common words in the messages in order to group them into activities. Several approaches have previously been proposed for classification and sentiment analysis.
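The grouping step described above (building a model from the most frequent words and grouping messages into activities) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the stop-word list, the `group_by_frequent_words` helper and the sample messages are all hypothetical, and each message is simply assigned to the highest-ranked frequent word it contains.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "to", "of",
              "in", "for", "my", "i", "it", "very", "with", "on", "not"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def group_by_frequent_words(messages, top_n=3):
    """Group messages under the corpus's most frequent words.

    The top_n most common words across all messages act as activity labels.
    Each message joins the group of the highest-ranked label it contains,
    or the 'other' group if it contains none of them.
    """
    counts = Counter(w for m in messages for w in tokenize(m))
    labels = [w for w, _ in counts.most_common(top_n)]
    groups = {label: [] for label in labels}
    groups["other"] = []
    for m in messages:
        words = set(tokenize(m))
        label = next((lab for lab in labels if lab in words), "other")
        groups[label].append(m)
    return groups


# Hypothetical feedback messages.
feedback = [
    "Loan approval was quick",
    "Loan interest rate is high",
    "Credit card charges are hidden",
    "Credit card customer care is slow",
    "Loan documentation was easy",
]
for label, msgs in group_by_frequent_words(feedback, top_n=2).items():
    print(label, len(msgs))
```

Using raw frequency keeps the sketch simple; a weighting such as TF-IDF would be a natural refinement when common words dominate.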
Impact of Sentiment Analysis
An organization needs a complete understanding of its customers’ opinions and needs regarding the products or services it offers, but it faces the challenge of dealing with unstructured text from the sources of those opinions and needs. Consumers’ sentiments about products and services are now not just a source of customer reviews and references but also a source for customer service, business intelligence and brand reputation management.
Some of the fundamental voice-of-the-customer questions that organizations want to answer are:
- Are the customers satisfied with our services, products and support?
- What do the customers like?
- What do customers think of the products and services offered by competitors?
- What influences the market and how opinions propagate?
These challenges include handling noise and linking with structured data. Business intelligence involves the use of technologies and methodologies for the collection, integration and analysis of opinion and other sentiment-relevant information in a business, for the purpose of better decision-making.
As far as the benefits of applying sentiment analysis in contemporary companies are concerned, several applications are worth noting: analyzing customer reviews of a company’s or brand’s products and services; providing analytical perspectives to financial investors who want to discover and respond to market opinion; and politics, where campaigns are interested in tracking the sentiment voters express about candidates.
Likewise, sentiment analysis can be used in multiple business areas, such as economics, finance and marketing. In economics, it helps answer the question of how supervised learning methods can learn the association between the polarity of financial news and key financial indicators. In marketing, judging consumer sentiment makes it much easier to gauge the share of heart a new product holds in consumers’ minds.
Previously, one of the most common methods was to manually archive feedback into various folders, with a view to reducing the number of information objects a user must process at any given time. But this is an insufficient solution, as folder names are not necessarily a true reflection of their content, and creating and maintaining folders can impose a significant burden on the user.
Several feedback analysis tools are available, such as:
- Feedier: It collects actionable feedback and helps organizations engage with and add value for their customers.
- Receptive: It makes it easy to collect, measure and understand feedback from customers, internal teams and prospects. It is a specialist product for B2B and SaaS organizations.
- Zonka Feedback: A comprehensive feedback management system with customizable surveys, instant alerts, real-time reports and more.
- Informizely: It quickly gathers customer insights with in-site surveys and polls.
- ai: It makes customer feedback analysis very easy.
Earlier methods for sentiment analysis were mostly based on manually defined rules. With the recent development of deep learning techniques, neural network based approaches have become mainstream. Building on this, many researchers apply linguistic knowledge for better performance in sentiment analysis.
- Traditional Sentiment analysis: Many methods for sentiment analysis focus on feature engineering. The carefully designed features are then fed to machine learning methods in a supervised learning setting. The performance of sentiment classification therefore depends heavily on the choice of feature representation of the text. In terms of features, different kinds of representations have been used in sentiment analysis, including bag-of-words representations, word co-occurrences and syntactic contexts. Despite its effectiveness, feature engineering is labor-intensive and is unable to extract and organize the discriminative information from data.
- Sentiment Analysis by Neural Network: Since the proposal of simple and effective approaches for learning distributed representations of words and phrases, neural network based models have shown great success in many natural language processing (NLP) tasks. Many models have been applied to classification, sentiment analysis and information extraction. Neural network models improve coherence by exploiting the distribution of word co-occurrences through neural word embeddings. Short, coherent pieces of text extracted in this way can alone be sufficient for prediction and classification, and can also be used to explain the prediction or classification.
- Linguistic Knowledge: Linguistic knowledge has been carefully incorporated into models to realize their full potential in terms of prediction accuracy. Classical linguistic knowledge or sentiment resources include sentiment lexicons, negators and intensifiers. Sentiment lexicons are valuable for rule-based or lexicon-based models; there are also studies on the automatic construction of sentiment lexicons from social data or from multiple languages.
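To make the role of sentiment lexicons, negators and intensifiers concrete, here is a minimal lexicon-based scorer. The word lists, weights and the short look-back window are hypothetical choices for illustration; real lexicon-based systems use much larger curated resources and more careful rules for the scope of negation.

```python
import re

# Toy lexicons for illustration only; a real system would use a curated
# sentiment lexicon rather than these hypothetical entries.
LEXICON = {"good": 1.0, "helpful": 1.0, "fast": 0.5,
           "bad": -1.0, "rude": -1.0, "slow": -0.5}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}


def lexicon_score(text):
    """Sum word polarities. An intensifier directly before a sentiment word
    scales it; a negator directly before the word (or before its
    intensifier) flips its sign."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else ""
        if prev in INTENSIFIERS:
            value *= INTENSIFIERS[prev]
            prev = tokens[i - 2] if i > 1 else ""  # look past the intensifier
        if prev in NEGATORS:
            value = -value
        score += value
    return score


def polarity(text):
    """Turn the numeric score into a positive/negative/neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The staff were very helpful."))
print(polarity("Service was not good"))
```

Keeping the look-back window this short is a deliberate simplification: it handles phrases like "not very fast" but will miss negations that sit further from the sentiment word.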
Previously, extracting information from feedback was done manually, but nowadays it can be done with online text mining tools such as ‘Ranks.nl’, ‘Vivisimo/Clusty’ and ‘Wordle’, and with commercial text mining software such as ‘ActivePoint’, ‘Aiaioo Labs’ and ‘AKIN Desktop HyperSearch’.
Classification can be done through various classifiers like:
- RIPPER Text classification: The RIPPER classification algorithm is often used in automatic email filtering; its architecture is based on a rule-based framework. It can automatically generate rules for selecting keywords instead of relying on manual selection, and it is fast and able to deal with a large set of attributes.
- Nearest Neighbour Classification: This approach has been explored in a study based on feature selection using mutual information. It is a very simple numeric algorithm that treats each feature vector as a vector in n-dimensional space and finds the nearest matching vector in terms of distance. Boone found that nearest neighbour is particularly effective when only examples of each folder are presented to the algorithm.
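A nearest-neighbour classifier of the kind described above can be sketched with plain bag-of-words vectors. This version assumes cosine similarity as the closeness measure (Euclidean distance is another common choice) and uses a hypothetical three-example training set; it is not the configuration used in the study cited.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent text as a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_neighbour(text, examples):
    """Return the label of the (text, label) example most similar to `text`."""
    query = bag_of_words(text)
    _, label = max(examples, key=lambda ex: cosine(query, bag_of_words(ex[0])))
    return label


# Hypothetical labelled examples, one per folder/class.
EXAMPLES = [
    ("home loan interest rate", "Loans"),
    ("credit card annual fee", "Cards"),
    ("savings account balance", "Accounts"),
]
print(nearest_neighbour("The interest on my loan is high", EXAMPLES))  # prints: Loans
```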
Statistical algorithms can fill the gaps left by rule-based methods, but at the cost of more processing time. One area where applied research is lacking is the use of Natural Language Processing (NLP) for feature selection. Although tedious to apply, it offers the potential to classify unclassified feedback more effectively, because information extraction through text classification not only provides relative weights between attribute words but also helps in finding the attributes themselves.
The proposed algorithm uses NLP and probabilistic techniques for feedback classification and association, and for recognizing and predicting the class and sentiment of new data.
We use the following classification techniques for better results:
- Naïve Bayes Classification: Naïve Bayes is a statistical algorithm whose decisions and rules are derived from numeric data. It processes a feedback message by matching its words against the words present in each folder (class); a word’s chance of being matched is proportional to the probability of finding that word in the class. Bayes’ rule is then used to determine the likelihood that the feedback under consideration belongs to each class.
- Support Vector Machine classification: Support Vector Machine (SVM) is a supervised machine-learning algorithm that can be used for both classification and regression. Each data item is plotted as a point in n-dimensional space, with the value of each feature being a particular coordinate; classification is then performed by finding the hyperplane that best separates the classes.
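As a rough illustration of the Naïve Bayes step, the sketch below implements a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, which is one common variant; the text does not specify which variant the project used. The tiny training set is invented. An SVM is not sketched here; in practice it would usually come from a library, for example scikit-learn’s `LinearSVC`.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum over words of log P(word | class)
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in tokenize(text):
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data for illustration only.
texts = ["great service and quick loan", "helpful staff great bank",
         "rude staff and slow service", "hidden charges terrible bank"]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("great helpful staff"))  # prints: positive
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.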
Information extraction is performed on unstructured or semi-structured documents. Named-Entity Recognition (NER), also known as entity identification or entity extraction, is well suited to extracting information from data. Using NER, text can easily be classified into predefined categories such as ‘Name of Person’, ‘Organization Name’, ‘Date and Place’, ‘Expressions of Time’, ‘Monetary Value’, ‘Category of Transaction’ and more.
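Trained NER models need separate model files, so as a lightweight stand-in the sketch below uses regular-expression pattern matching for two of the categories listed above, ‘Monetary Value’ and ‘Date’. This is rule-based extraction, not NER: the patterns (rupee amounts written with `Rs.`, `INR` or `₹`, and dates in DD/MM/YYYY form) are assumptions chosen for Indian banking reviews.

```python
import re

# Hypothetical patterns for two NER-style categories; not a trained model.
PATTERNS = {
    "Monetary Value": re.compile(r"(?:Rs\.?|INR|₹)\s?\d+(?:,\d+)*(?:\.\d+)?"),
    "Date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def extract_entities(text):
    """Return (category, matched text) pairs in order of appearance."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), category, match.group()))
    return [(category, span) for _, category, span in sorted(found)]


print(extract_entities("On 05/03/2021 I was charged Rs. 1,500 as a late fee"))
```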
The Internet plays a vital role in this work, as the dataset is collected from various websites (such as ‘www.bankbazzar.com’, ‘www.glassdoor.co.in’, ’www.mouthshut.com’ and ‘www.indeed.co.in’) that contain feedback on BFSI services popularly used in India. Customer feedback and reviews are the statements given by customers who have used these services. Based on the words and star ratings they use, the feedback items are classified by service and passed on to the next step. Naïve Bayes and Support Vector Machine classifiers are then applied; with this supervised learning approach, the star rating of new feedback can be predicted through sentiment analysis, feedback can be classified into its classes, and information can be extracted from it. Scores are calculated and compared between the methods to find the better result and accuracy. Finally, various NER models are used to extract information from the feedback into predefined classes.
For classification and sentiment analysis, each feedback item is processed both with a Naïve Bayes classifier and Naïve Bayes sentiment analysis, and with an SVM classifier and SVM sentiment analysis; a comparative study of the results then determines the better of the two algorithms. The steps are as follows:
- The recorded feedback is imported from the dataset and separated by data class.
- Sentiment analysis and classification algorithms are applied.
- Positive, negative and neutral feedback items are counted and divided into classes.
- Scores are calculated for both methods.
- The methods are compared and their accuracy is judged accordingly.
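The comparison step could be organized as in the sketch below, which computes accuracy and counts positive, negative and neutral predictions for each method. The label lists are made up purely to show the mechanics; they are not the project’s results, and `compare` is a hypothetical helper.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def compare(results, actual):
    """Score each method; `results` maps a method name to its predicted labels."""
    report = {
        name: {"accuracy": accuracy(pred, actual), "counts": Counter(pred)}
        for name, pred in results.items()
    }
    best = max(report, key=lambda name: report[name]["accuracy"])
    return report, best


# Made-up labels, only to show the mechanics.
actual = ["positive", "negative", "neutral", "positive", "negative"]
results = {
    "Naive Bayes": ["positive", "negative", "positive", "positive", "positive"],
    "SVM": ["positive", "negative", "neutral", "negative", "negative"],
}
report, best = compare(results, actual)
for name, r in report.items():
    print(name, r["accuracy"], dict(r["counts"]))
print("better method:", best)
```

Accuracy alone can be misleading when one sentiment class dominates the dataset; per-class precision and recall are common complements.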
Information is extracted from the data using Stanford NER, a Java implementation of a Named Entity Recognizer. NER labels sequences of words in a text that are the names of things, such as person names and company names. Stanford NER provides three models: a 3-class model (Location, Person, Organization), a 4-class model (Location, Person, Organization, Misc) and a 7-class model (Location, Person, Organization, Money, Percent, Date, Time).
The feedback collected from these websites for the BFSI sector falls into three groups. Positive feedback concerns good customer service, beneficial products or services, a pleasant environment and good management. Negative feedback concerns poor customer service, products or services that fall short of the customer’s expectations, bad circumstances, and management that does not meet the customer’s expectations or the standard level in the BFSI sector. Average feedback concerns average services, products and management.
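The three groups above, together with the star ratings mentioned earlier, suggest a simple way to derive training labels from review scores. The thresholds in this sketch (4 stars and above positive, 2 and below negative, anything in between average) are an assumption, not values stated in the project.

```python
from collections import Counter


def label_from_stars(stars):
    """Map a 1-5 star rating to a sentiment class (assumed thresholds)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "average"


# Hypothetical star ratings collected alongside the review text.
ratings = [5, 4, 3, 1, 2, 4.5, 3.5]
print(Counter(label_from_stars(r) for r in ratings))
```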
To visualize the data further, we plot some graphs with the Seaborn library. Seaborn’s FacetGrid creates a grid of histograms placed side by side, which makes it possible to see whether there is any relationship between the variables.
Overview of Python
Python is a general-purpose, dynamic, high-level, interpreted programming language. It supports an object-oriented approach to developing applications. It is relatively simple and easy to learn, because its syntax focuses on readability. Developers can read and translate Python code much more easily than code in many other languages. In turn, this reduces the cost of program maintenance and development, because it allows teams to work collaboratively without significant language and experience barriers.
Unlike some other languages, Python is dynamically typed, so we do not need to declare the data types of variables (for example, if we write a=10, an integer value is automatically assigned to the variable ‘a’). Like most languages, Python has a number of basic types, including integers, floats, booleans and strings. These data types behave in ways that are familiar from other programming languages.
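The dynamic typing described above can be seen directly in the interpreter. This short snippet rebinds the same name to values of different types and lists the basic built-in types mentioned; the variable names are only for illustration.

```python
a = 10
print(type(a))   # <class 'int'>
a = "ten"        # the same name can later be bound to a str
print(type(a))   # <class 'str'>

# The basic built-in types mentioned above
values = [42, 3.14, True, "bank"]
print([type(v).__name__ for v in values])  # ['int', 'float', 'bool', 'str']
```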
Python can also be used to process text, display numbers or images, save data, etc. We therefore used Python as the scripting language for the Natural Language Processing work. Frequently used basic Python statements are:
- The if statement is used to check a condition; if the condition is true, we run a block of statements (called the if-block), else we process another block of statements (called the else-block). Nested if or elif can also be used for multiple conditions.
- The for statement iterates over the members of a sequence in order, executing the block each time. In contrast, the while statement repeats a block of code for as long as a condition, checked before each iteration, remains true.
- The try statement sets up exception-handling blocks in the code. The keywords try and except are used to catch exceptions: when an error occurs within the try block, Python looks for a matching except block to handle it.
- The def statement is used to define a function or method.
- The import statement is used to import modules whose functions can be used in current program.
- The print statement sends output to the standard output of your computer system. In Python 3 it has become a function, print().
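The statements listed above can be seen together in one small script. The `classify_score` function and the sample scores are hypothetical and serve only to exercise each statement.

```python
import math  # import: bring a standard-library module into scope


def classify_score(score):
    """def: define a function; if/elif/else picks exactly one branch."""
    if score > 0:
        return "positive"
    elif score < 0:
        return "negative"
    else:
        return "neutral"


scores = [2, -1, 0]
labels = []
for s in scores:              # for: iterate over a sequence in order
    labels.append(classify_score(s))

count = 0
while count < 3:              # while: repeat while the condition holds
    count += 1

try:                          # try/except: handle a runtime error
    ratio = 1 / 0
except ZeroDivisionError:
    ratio = math.inf

print(labels, count, ratio)   # prints: ['positive', 'negative', 'neutral'] 3 inf
```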
Important Libraries of Python used in Project
One of Python’s greatest assets is its extensive set of libraries. Libraries are sets of routines and functions written in a given language. A robust set of libraries makes it easier for developers to perform complex tasks without rewriting many lines of code. These basic libraries turn Python from a general-purpose programming language into a powerful and robust tool for data analysis and visualization. The libraries used in my project are:
- NumPy is the foundational library for scientific computing in Python, and many of the libraries on this list use NumPy arrays as their basic inputs and outputs. In short, NumPy introduces objects for multidimensional arrays and matrices, as well as routines that allow developers to perform advanced mathematical and statistical functions on those arrays with as little code as possible.
- Pandas adds data structures and tools that are designed for practical data analysis in finance, statistics, social sciences, and engineering. Pandas works well with incomplete, messy, and unlabeled data (i.e., the kind of data you’re likely to encounter in the real world), and provides tools for shaping, merging, reshaping, and slicing datasets.
- SciPy builds on NumPy by adding a collection of algorithms and high-level commands for manipulating and visualizing data. This package includes functions for computing integrals numerically, solving differential equations, optimization, and more.
- NLTK, whose name stands for Natural Language Toolkit, is a suite of libraries designed for Natural Language Processing (NLP). NLTK’s basic functions allow you to tag text, identify named entities, and display parse trees, which are like sentence diagrams that reveal parts of speech and dependencies. From there, you can do more complicated things like sentiment analysis and automatic summarization.
In the modern era, as online interaction has bridged physical distance and allowed companies to pursue profit and expand their business and reputation all over the world, keeping in touch with customers has become increasingly important. To keep their finger on the pulse of the customer, businesses must have access to reliable feedback and be able to analyze it properly.
Sentiment analysis, classification and information extraction remain challenging problems that attract the interest of researchers from many disciplines, and their applications are practical, promising and varied across many industries, including the BFSI sector.
Project Done by ICSS Student – Pijush Mandal