Thursday, 1 December 2016


Problem: Predicting the success of Bank Telemarketing
A Portuguese bank wants to know which type of customer is most likely to subscribe to a term deposit with it as a result of a telemarketing call.
Banks are always looking for ways to invest in higher-gain financial products. One way to raise money for them is to offer customers term deposits: the bank collects money from the client and repurposes it in other types of investments. It is therefore very important for the bank to know which of its clients form its target audience, and who is most likely to subscribe to a term deposit (under what circumstances and with what characteristics) through its telemarketing calls.
Question at Issue
Which types of customers (which combinations of attribute values) are most likely to subscribe to a term deposit? Will a call to a customer end in “success” or “failure” in terms of term deposit subscription? How do individual attributes affect a customer’s decision to subscribe to a term deposit with the bank? Are all the attributes relevant to that decision?
Can the number of times the same customer has been contacted be taken as a determining factor? Could legal considerations external to the collected data be decisive in a customer’s decision to subscribe to a term deposit?
Purpose
Determine the ideal combination of customer characteristics, and the factors, that can be decisive for a customer to subscribe to a term deposit via a telemarketing call.
Information
The data set comes from the UCI Machine Learning Repository in the form of a CSV file. It consists of 45211 instances and 20 attributes, and is multivariate. The data was collected from a series of marketing campaigns run by the Portuguese banking institution. To analyze it, we also referred to a range of information sources, from technical blogs to scholarly papers, which we used to evaluate and brainstorm each algorithm applied to predict the class value.
Assumptions
We assume that the provided data is accurate. We also assume that the call duration attribute is not needed, and have therefore removed it: the duration of a call is only known once the call has ended, so it cannot help the bank decide whom to call. Furthermore, we assume that this model will help the bank decide whether its current customers are likely to subscribe to a term deposit.
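Removing the duration attribute can be sketched in plain Python. This is a minimal sketch under stated assumptions: it assumes the semicolon-separated, double-quoted layout used by the UCI bank marketing files, with the outcome in a column named `y`; the helper `read_rows` and the two sample rows are hypothetical, made up for illustration and not taken from the data set.

```python
import csv
import io

# Hypothetical excerpt in the assumed semicolon-separated layout;
# the values are made up for illustration, not taken from the data set.
SAMPLE = (
    '"age";"job";"housing";"loan";"duration";"campaign";"y"\n'
    '30;"admin.";"yes";"no";120;2;"no"\n'
    '45;"technician";"no";"no";300;1;"yes"\n'
)

def read_rows(stream, drop=("duration",)):
    """Read semicolon-delimited rows as dicts, discarding the columns in `drop`."""
    reader = csv.DictReader(stream, delimiter=";")
    return [{k: v for k, v in row.items() if k not in drop} for row in reader]

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0])
```

The same function accepts an open file object, e.g. one returned by `open(path, newline="")`, in place of the `io.StringIO` sample.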
Concepts
We are dealing with a “classification problem”, because the output takes one of two discrete values: “success” or “failure” in selling the term deposit. A term deposit is an arrangement in which the customer deposits money with the bank for a fixed period of time, and the bank pays the customer an agreed interest rate on the deposited money.
The learning approach we used is supervised learning, which means learning a mapping from inputs to a desired output from training examples whose correct answers are known. The supervised algorithms used were ZeroR (predicts the majority class), OneR (builds one rule per predictor and selects the rule with the smallest error), Logistic Regression (models the probability of the binary class value), Decision Tree (predicts the value of the target variable using simple decision rules inferred from the data features), and Naive Bayes (calculates the posterior probability of each class value, assuming the attributes are independent given the class).
To understand the numeric components of the data, we used Principal Component Analysis, which finds the linear combination of the variables with maximum variance, removes its effect, and repeats this successively to reduce dimensionality. We also applied a univariate analysis, comparing each attribute with the outcome to select those that were statistically significant: Chi-squared tests for the qualitative variables and Logistic Regression for the quantitative variables.
Based on the data’s attributes and our insights, we hypothesize that a client is more likely to subscribe to a term deposit if they have been contacted more often by the bank and do not have a housing loan. Our reasoning is that people are more prone to save when they are not in debt (a term deposit is closer to saving than to spending, and a housing loan is a large liability), and that repeated calls from the bank remind them of the opportunity to earn extra income through a term deposit.
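To make the chi-squared part of the univariate analysis concrete, here is a minimal sketch of Pearson’s chi-squared test of independence for a binary attribute (such as housing loan yes/no) against the binary outcome. It assumes a 2x2 contingency table, where the degrees of freedom are 1 and the p-value has the closed form erfc(sqrt(x / 2)); the function name `chi_squared_2x2` and the counts in the usage lines are hypothetical, not the team’s code or data.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 contingency table.

    table[i][j] = number of clients with attribute value i and outcome j,
    e.g. rows = housing loan (yes, no), columns = subscribed (yes, no).
    Returns (statistic, p_value). No Yates continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if attribute and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With (2 - 1) * (2 - 1) = 1 degree of freedom, the chi-squared
    # survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, not from the data set.
stat, p = chi_squared_2x2([[10, 20], [30, 40]])
print(f"chi2 = {stat:.4f}, p = {p:.4f}")
```

If p exceeds the chosen significance level (commonly 0.05), independence is not rejected and the attribute would be treated as not statistically significant. Attributes with more than two values need the general chi-squared distribution; `scipy.stats.chi2_contingency` handles that case and applies Yates’ correction to 2x2 tables by default.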
Implications and Consequences
By completing this study, we will be able to determine which types of customers are more keen to subscribe to a term deposit. The study can thus pinpoint a customer profile for the bank’s business to target: losing millions of dollars on the wrong type of customers is not acceptable, and this study would help cut down those expenses. On top of that, the money saved could be invested in other markets. If the business adopts the model, it could save money by directing its campaigns at the appropriate customer base (reducing wasted investment), and repurpose the gathered money to grow its business while cutting costs and time. The negative aspect of the project is that the model may produce false predictions or misclassify customers, which could mislead the bank’s targeting.
Point Of View
Our team believes that banks are increasingly interested in understanding customer patterns, so that they can discover consumption habits that could open up new business for them. We also believe that this analysis could uncover other important information hidden in the data, beyond what the bank is looking for. However, from the customer’s perspective, telemarketing may feel like pressure to buy. Some customers may be reluctant to invest over the phone but willing to do so in a face-to-face transaction.
Interpretation and Inferences
The univariate analysis showed that the loan and housing variables were not statistically significant. We applied different classification models, such as logistic regression and decision trees, to help us understand whether the decision boundary is linear or nonlinear. After experimenting, we found that Naive Bayes performed best: it correctly predicted the highest number of YES cases among the algorithms tried (ZeroR, OneR, Logistic Regression and Decision Tree (J48)), making it the right choice for our data.
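Counting correctly predicted YES cases amounts to counting true positives for the “yes” class; on a fixed test set, ranking models by that count is the same as ranking them by recall for “yes”. The sketch below, in plain Python, shows why ZeroR scores zero on this criterion even when its accuracy looks acceptable. The function names and label lists are hypothetical illustrations, not the team’s results.

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR: ignore all attributes and always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: majority

def yes_hits(actual, predicted, positive="yes"):
    """Return (correctly predicted positives, recall for the positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive)
    n_pos = sum(1 for a in actual if a == positive)
    return tp, (tp / n_pos if n_pos else 0.0)

def accuracy(actual, predicted):
    """Fraction of all predictions that match the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up labels and predictions for illustration only.
actual = ["no", "no", "yes", "no", "yes", "no", "no", "yes"]
other_model = ["no", "yes", "yes", "no", "no", "no", "no", "yes"]

predict = zero_r(actual)  # trained and evaluated on the same toy labels
zero_r_preds = [predict(None) for _ in actual]

for name, preds in [("ZeroR", zero_r_preds), ("other model", other_model)]:
    tp, recall = yes_hits(actual, preds)
    print(name, "accuracy:", accuracy(actual, preds), "yes hits:", tp, "recall:", recall)
```

Here ZeroR reaches an accuracy of 0.625 while correctly predicting no YES cases at all, which is why a criterion focused on the YES class separates the models more usefully than accuracy alone when most calls end in “no”.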
Summary
Once we finished analyzing this problem, we learned that banks are institutions trying to extend their financial umbrella. They offer term deposits so that they can reinvest or repurpose their customers’ money in other ways to increase revenue. It is therefore imperative for them to know which customer segment (with what characteristics) to pursue, which would reduce waste and create greater opportunities for growth.