Churn-Modeling

A predictive churn model is a powerful tool for identifying which of your customers are likely to stop engaging with your business. With that information, you can build retention strategies, discount offers, email campaigns, and more that keep your high-value customers buying.

Data Preprocessing

Importing the libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

In [2]:
dataset = pd.read_csv('Churn_Modeling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
In [3]:
dataset.head()
Out[3]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [4]:
print(X)
print('\n')
print(y)
[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


[1 0 1 ... 1 1 0]

Encoding categorical data (Geography, Gender)

In [5]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Label-encode Geography (column 1) and Gender (column 2) to integers
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

# One-hot encode Geography; `categorical_features` is only available on
# older scikit-learn releases (it was removed in 0.22)
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()

# Drop one dummy column to avoid the dummy variable trap
X = X[:, 1:]
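
Because `categorical_features` was removed from OneHotEncoder in scikit-learn 0.22, the cell above only runs on older releases. Below is a minimal sketch of an equivalent preprocessing step using ColumnTransformer (assuming scikit-learn >= 0.22); it should produce the same 11-column layout, with the two remaining Geography dummies first.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

# Label-encode Gender (column 2) to 0/1, as before
X[:, 2] = LabelEncoder().fit_transform(X[:, 2])

# One-hot encode Geography (column 1), dropping the first dummy to avoid
# the dummy variable trap; the remaining columns pass through unchanged
# and end up after the Geography dummies
ct = ColumnTransformer(
    transformers=[('geo', OneHotEncoder(drop='first'), [1])],
    remainder='passthrough'
)
X = ct.fit_transform(X).astype(float)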
In [6]:
print("X -> {}".format(X))
print('\n')
print("y -> {}".format(y))
X -> [[0.0000000e+00 0.0000000e+00 6.1900000e+02 ... 1.0000000e+00
  1.0000000e+00 1.0134888e+05]
 [0.0000000e+00 1.0000000e+00 6.0800000e+02 ... 0.0000000e+00
  1.0000000e+00 1.1254258e+05]
 [0.0000000e+00 0.0000000e+00 5.0200000e+02 ... 1.0000000e+00
  0.0000000e+00 1.1393157e+05]
 ...
 [0.0000000e+00 0.0000000e+00 7.0900000e+02 ... 0.0000000e+00
  1.0000000e+00 4.2085580e+04]
 [1.0000000e+00 0.0000000e+00 7.7200000e+02 ... 1.0000000e+00
  0.0000000e+00 9.2888520e+04]
 [0.0000000e+00 0.0000000e+00 7.9200000e+02 ... 1.0000000e+00
  0.0000000e+00 3.8190780e+04]]


y -> [1 0 1 ... 1 1 0]

Exploratory Data Analysis

Statistical Description of the dataset

In [7]:
dataset.describe()
Out[7]:
RowNumber CustomerId CreditScore Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
count 10000.00000 1.000000e+04 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.00000 10000.000000 10000.000000 10000.000000
mean 5000.50000 1.569094e+07 650.528800 38.921800 5.012800 76485.889288 1.530200 0.70550 0.515100 100090.239881 0.203700
std 2886.89568 7.193619e+04 96.653299 10.487806 2.892174 62397.405202 0.581654 0.45584 0.499797 57510.492818 0.402769
min 1.00000 1.556570e+07 350.000000 18.000000 0.000000 0.000000 1.000000 0.00000 0.000000 11.580000 0.000000
25% 2500.75000 1.562853e+07 584.000000 32.000000 3.000000 0.000000 1.000000 0.00000 0.000000 51002.110000 0.000000
50% 5000.50000 1.569074e+07 652.000000 37.000000 5.000000 97198.540000 1.000000 1.00000 1.000000 100193.915000 0.000000
75% 7500.25000 1.575323e+07 718.000000 44.000000 7.000000 127644.240000 2.000000 1.00000 1.000000 149388.247500 0.000000
max 10000.00000 1.581569e+07 850.000000 92.000000 10.000000 250898.090000 4.000000 1.00000 1.000000 199992.480000 1.000000
In [8]:
dataset.columns
Out[8]:
Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')

Geographical Analysis

(Figure: churn by Geography)

Germany loses the most customers; the company should look into why.

Gender

(Figure: churn by Gender)

Female customers are more likely to leave the bank.

Based on Activity

(Figure: churn by IsActiveMember)

Based on number of products used

(Figure: churn by NumOfProducts)

Age

(Figure: churn by Age)

Churn is highest among customers aged 45-60.

Balance

(Figure: churn by Balance)

Balance does not have a significant impact on churn.

Credit Score

(Figure: churn by CreditScore)

Credit Score does not have a significant impact on churn.
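
The figures themselves are not reproduced here; the sketch below (using the `dataset` DataFrame loaded above) shows the kind of plots behind these observations.

import matplotlib.pyplot as plt

# Churn rate per category for the categorical features discussed above
fig, axes = plt.subplots(1, 4, figsize=(18, 4))
for ax, col in zip(axes, ['Geography', 'Gender', 'IsActiveMember', 'NumOfProducts']):
    dataset.groupby(col)['Exited'].mean().plot(kind='bar', ax=ax)
    ax.set_ylabel('Churn rate')
    ax.set_title(col)

# Distributions of the numeric features for churned vs. retained customers
fig, axes = plt.subplots(1, 3, figsize=(18, 4))
for ax, col in zip(axes, ['Age', 'Balance', 'CreditScore']):
    dataset[dataset['Exited'] == 1][col].plot(kind='hist', bins=30, alpha=0.5, ax=ax, label='Exited')
    dataset[dataset['Exited'] == 0][col].plot(kind='hist', bins=30, alpha=0.5, ax=ax, label='Retained')
    ax.set_xlabel(col)
    ax.legend()
plt.show()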

Splitting the dataset into the Training set and Test set

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
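
Only about 20% of customers churn (the mean of Exited in the summary above is 0.2037), so this plain split leaves the classes imbalanced in both sets. A hedged alternative, not used in the rest of this notebook, is to preserve the churn ratio explicitly:

# stratify=y keeps the ~80/20 class ratio identical in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.2, random_state = 0, stratify = y)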

Feature Scaling

In [13]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
In [14]:
print(X_train)
[[-0.5698444   1.74309049  0.16958176 ...  0.64259497 -1.03227043
   1.10643166]
 [ 1.75486502 -0.57369368 -2.30455945 ...  0.64259497  0.9687384
  -0.74866447]
 [-0.5698444  -0.57369368 -1.19119591 ...  0.64259497 -1.03227043
   1.48533467]
 ...
 [-0.5698444  -0.57369368  0.9015152  ...  0.64259497 -1.03227043
   1.41231994]
 [-0.5698444   1.74309049 -0.62420521 ...  0.64259497  0.9687384
   0.84432121]
 [ 1.75486502 -0.57369368 -0.28401079 ...  0.64259497 -1.03227043
   0.32472465]]

Applying various machine learning algorithms

Deploying Logistic Regression

In [15]:
from sklearn.linear_model import LogisticRegression
lr_classifier = LogisticRegression()
lr_classifier.fit(X_train, y_train)
y_lr_pred = lr_classifier.predict(X_test)
In [16]:
from sklearn.metrics import classification_report, confusion_matrix
print("confusion_matrix:\n {}".format(confusion_matrix(y_test, y_lr_pred)))
print("\nclassification_report: \n {}".format(classification_report(y_test, y_lr_pred)))
confusion_matrix:
 [[1526   69]
 [ 309   96]]

classification_report:
              precision    recall  f1-score   support

          0       0.83      0.96      0.89      1595
          1       0.58      0.24      0.34       405

avg / total       0.78      0.81      0.78      2000
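
Recall on the churn class (1) is only 0.24 here, which is largely a consequence of the ~20/80 class imbalance. One common adjustment, sketched below rather than taken from the original notebook, is to re-weight the classes:

# class_weight='balanced' weights samples inversely to class frequency,
# which usually raises recall on churners at the cost of some precision
lr_balanced = LogisticRegression(class_weight = 'balanced')
lr_balanced.fit(X_train, y_train)
y_lr_bal_pred = lr_balanced.predict(X_test)
print(classification_report(y_test, y_lr_bal_pred))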

Deploying Support Vector Machine Classifier

In [17]:
from sklearn.svm import SVC
svm_classifier = SVC()
svm_classifier.fit(X_train, y_train)
y_svm_pred = svm_classifier.predict(X_test)
In [18]:
from sklearn.metrics import classification_report, confusion_matrix
print("confusion_matrix:\n {}".format(confusion_matrix(y_test, y_svm_pred)))
print("\nclassification_report: \n {}".format(classification_report(y_test, y_svm_pred)))
confusion_matrix:
 [[1547   48]
 [ 225  180]]

classification_report:
              precision    recall  f1-score   support

          0       0.87      0.97      0.92      1595
          1       0.79      0.44      0.57       405

avg / total       0.86      0.86      0.85      2000

Deploying Random Forest Classifier

In [19]:
from sklearn.ensemble import RandomForestClassifier
Rf_classifier = RandomForestClassifier()
Rf_classifier.fit(X_train, y_train)
y_rf_pred = Rf_classifier.predict(X_test)
In [20]:
from sklearn.metrics import classification_report, confusion_matrix
print("confusion_matrix:\n {}".format(confusion_matrix(y_test, y_rf_pred)))
print("\nclassification_report: \n {}".format(classification_report(y_test, y_rf_pred)))
confusion_matrix:
 [[1525   70]
 [ 208  197]]

classification_report:
              precision    recall  f1-score   support

          0       0.88      0.96      0.92      1595
          1       0.74      0.49      0.59       405

avg / total       0.85      0.86      0.85      2000
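
The Random Forest also exposes feature importances, which can hint at what drives churn. The sketch below assumes the encoded column order produced earlier (the two remaining Geography dummies, Germany and Spain, followed by the untouched columns):

# Assumed column order after encoding and dropping the France dummy
feature_names = ['Geography_Germany', 'Geography_Spain', 'CreditScore', 'Gender',
                 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
                 'IsActiveMember', 'EstimatedSalary']

importances = pd.Series(Rf_classifier.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))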

Building an Artificial Neural Network

In [21]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
Using TensorFlow backend.
In [22]:
# Initialising the ANN
classifier = Sequential()
In [23]:
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

Training the ANN

In [24]:
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
In [25]:
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
Epoch 1/100
8000/8000 [==============================] - 1s 110us/step - loss: 0.4912 - acc: 0.7954
Epoch 2/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.4266 - acc: 0.7960
Epoch 3/100
8000/8000 [==============================] - 1s 84us/step - loss: 0.4204 - acc: 0.8091
Epoch 4/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.4146 - acc: 0.8259
Epoch 5/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.4107 - acc: 0.8305
Epoch 6/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.4071 - acc: 0.8335
Epoch 7/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.4048 - acc: 0.8335
Epoch 8/100
8000/8000 [==============================] - 1s 93us/step - loss: 0.4030 - acc: 0.8332
Epoch 9/100
8000/8000 [==============================] - 1s 160us/step - loss: 0.4015 - acc: 0.8336
Epoch 10/100
8000/8000 [==============================] - 1s 111us/step - loss: 0.4006 - acc: 0.8341
Epoch 11/100
8000/8000 [==============================] - 1s 149us/step - loss: 0.3995 - acc: 0.8344
Epoch 12/100
8000/8000 [==============================] - 1s 136us/step - loss: 0.3987 - acc: 0.8350
Epoch 13/100
8000/8000 [==============================] - 1s 96us/step - loss: 0.3984 - acc: 0.8351
Epoch 14/100
8000/8000 [==============================] - 1s 104us/step - loss: 0.3981 - acc: 0.8341
Epoch 15/100
8000/8000 [==============================] - 1s 96us/step - loss: 0.3976 - acc: 0.8354
Epoch 16/100
8000/8000 [==============================] - 1s 106us/step - loss: 0.3979 - acc: 0.8355
Epoch 17/100
8000/8000 [==============================] - 1s 132us/step - loss: 0.3970 - acc: 0.8350
Epoch 18/100
8000/8000 [==============================] - 1s 84us/step - loss: 0.3968 - acc: 0.8347
Epoch 19/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3963 - acc: 0.8341
Epoch 20/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3964 - acc: 0.8345
Epoch 21/100
8000/8000 [==============================] - 1s 81us/step - loss: 0.3964 - acc: 0.8351
Epoch 22/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3962 - acc: 0.8342
Epoch 23/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3959 - acc: 0.8362
Epoch 24/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3959 - acc: 0.8349
Epoch 25/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3957 - acc: 0.8342
Epoch 26/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3957 - acc: 0.8355
Epoch 27/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3954 - acc: 0.8345
Epoch 28/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3951 - acc: 0.8340
Epoch 29/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3946 - acc: 0.8361
Epoch 30/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3947 - acc: 0.8361
Epoch 31/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3944 - acc: 0.8360
Epoch 32/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3947 - acc: 0.8370
Epoch 33/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3940 - acc: 0.8390
Epoch 34/100
8000/8000 [==============================] - 1s 78us/step - loss: 0.3944 - acc: 0.8356
Epoch 35/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3940 - acc: 0.8361
Epoch 36/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3936 - acc: 0.8386
Epoch 37/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3942 - acc: 0.8364
Epoch 38/100
8000/8000 [==============================] - 1s 79us/step - loss: 0.3934 - acc: 0.8380
Epoch 39/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3934 - acc: 0.8381
Epoch 40/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3931 - acc: 0.8375
Epoch 41/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3928 - acc: 0.8385
Epoch 42/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3923 - acc: 0.8391
Epoch 43/100
8000/8000 [==============================] - 1s 85us/step - loss: 0.3918 - acc: 0.8397
Epoch 44/100
8000/8000 [==============================] - 1s 103us/step - loss: 0.3910 - acc: 0.8401
Epoch 45/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3899 - acc: 0.8400
Epoch 46/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3885 - acc: 0.8404
Epoch 47/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3873 - acc: 0.8372
Epoch 48/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3859 - acc: 0.8387
Epoch 49/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3817 - acc: 0.8381
Epoch 50/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3788 - acc: 0.8376
Epoch 51/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3749 - acc: 0.8386
Epoch 52/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3704 - acc: 0.8425
Epoch 53/100
8000/8000 [==============================] - 1s 84us/step - loss: 0.3667 - acc: 0.8436
Epoch 54/100
8000/8000 [==============================] - 1s 81us/step - loss: 0.3642 - acc: 0.8486
Epoch 55/100
8000/8000 [==============================] - 1s 96us/step - loss: 0.3624 - acc: 0.8495
Epoch 56/100
8000/8000 [==============================] - 1s 136us/step - loss: 0.3600 - acc: 0.8500
Epoch 57/100
8000/8000 [==============================] - 1s 108us/step - loss: 0.3576 - acc: 0.8565
Epoch 58/100
8000/8000 [==============================] - 1s 110us/step - loss: 0.3563 - acc: 0.8584
Epoch 59/100
8000/8000 [==============================] - 1s 128us/step - loss: 0.3546 - acc: 0.8587
Epoch 60/100
8000/8000 [==============================] - 1s 133us/step - loss: 0.3533 - acc: 0.8597
Epoch 61/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3532 - acc: 0.8606
Epoch 62/100
8000/8000 [==============================] - 1s 79us/step - loss: 0.3515 - acc: 0.8601
Epoch 63/100
8000/8000 [==============================] - 1s 84us/step - loss: 0.3509 - acc: 0.8627
Epoch 64/100
8000/8000 [==============================] - 1s 90us/step - loss: 0.3510 - acc: 0.8594
Epoch 65/100
8000/8000 [==============================] - 1s 91us/step - loss: 0.3497 - acc: 0.8605
Epoch 66/100
8000/8000 [==============================] - 1s 78us/step - loss: 0.3493 - acc: 0.8614
Epoch 67/100
8000/8000 [==============================] - 1s 81us/step - loss: 0.3488 - acc: 0.8602
Epoch 68/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3483 - acc: 0.8611
Epoch 69/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3486 - acc: 0.8622
Epoch 70/100
8000/8000 [==============================] - 1s 90us/step - loss: 0.3480 - acc: 0.8606
Epoch 71/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3477 - acc: 0.8595
Epoch 72/100
8000/8000 [==============================] - 1s 76us/step - loss: 0.3471 - acc: 0.8606
Epoch 73/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3473 - acc: 0.8616
Epoch 74/100
8000/8000 [==============================] - 1s 96us/step - loss: 0.3460 - acc: 0.8619
Epoch 75/100
8000/8000 [==============================] - 1s 96us/step - loss: 0.3465 - acc: 0.8607
Epoch 76/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3462 - acc: 0.8621
Epoch 77/100
8000/8000 [==============================] - 1s 92us/step - loss: 0.3462 - acc: 0.8607
Epoch 78/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3459 - acc: 0.8582
Epoch 79/100
8000/8000 [==============================] - 1s 74us/step - loss: 0.3452 - acc: 0.8629
Epoch 80/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3449 - acc: 0.8630
Epoch 81/100
8000/8000 [==============================] - 1s 81us/step - loss: 0.3454 - acc: 0.8621
Epoch 82/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3448 - acc: 0.8597
Epoch 83/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3450 - acc: 0.8604
Epoch 84/100
8000/8000 [==============================] - 1s 77us/step - loss: 0.3446 - acc: 0.8626
Epoch 85/100
8000/8000 [==============================] - 1s 78us/step - loss: 0.3437 - acc: 0.8607
Epoch 86/100
8000/8000 [==============================] - 1s 93us/step - loss: 0.3444 - acc: 0.8611
Epoch 87/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3436 - acc: 0.8617
Epoch 88/100
8000/8000 [==============================] - 1s 92us/step - loss: 0.3442 - acc: 0.8616
Epoch 89/100
8000/8000 [==============================] - 1s 75us/step - loss: 0.3430 - acc: 0.8640
Epoch 90/100
8000/8000 [==============================] - 1s 85us/step - loss: 0.3439 - acc: 0.8612
Epoch 91/100
8000/8000 [==============================] - 1s 87us/step - loss: 0.3437 - acc: 0.8625
Epoch 92/100
8000/8000 [==============================] - 1s 78us/step - loss: 0.3426 - acc: 0.8614
Epoch 93/100
8000/8000 [==============================] - 1s 131us/step - loss: 0.3429 - acc: 0.8610
Epoch 94/100
8000/8000 [==============================] - 1s 168us/step - loss: 0.3434 - acc: 0.8611
Epoch 95/100
8000/8000 [==============================] - 1s 113us/step - loss: 0.3426 - acc: 0.8626
Epoch 96/100
8000/8000 [==============================] - 1s 95us/step - loss: 0.3429 - acc: 0.8601
Epoch 97/100
8000/8000 [==============================] - 1s 99us/step - loss: 0.3420 - acc: 0.8621
Epoch 98/100
8000/8000 [==============================] - 1s 79us/step - loss: 0.3425 - acc: 0.8611
Epoch 99/100
8000/8000 [==============================] - 1s 80us/step - loss: 0.3422 - acc: 0.8630
Epoch 100/100
8000/8000 [==============================] - 1s 79us/step - loss: 0.3419 - acc: 0.8612
Out[25]:
<keras.callbacks.History at 0x7f3ad2723940>
In [26]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
In [27]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
In [28]:
print (cm)
[[1525   70]
 [ 203  202]]
In [29]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
             precision    recall  f1-score   support

          0       0.88      0.96      0.92      1595
          1       0.74      0.50      0.60       405

avg / total       0.85      0.86      0.85      2000

In [30]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_train)
In [31]:
X_pca
Out[31]:
array([[ 2.0518478 ,  0.60570437],
       [-1.56368114, -0.55007194],
       [-0.5326157 , -0.06866931],
       ...,
       [-0.48768333,  0.59445388],
       [ 2.10582722,  0.27929122],
       [-2.0070203 ,  0.29731549]])
In [32]:
pca_df = pd.DataFrame(data=X_pca, columns=["pca 1", "pca 2"])
pca_df["pred"] = y_train

Evaluation

It is a close competition, but the Support Vector Machine classifier comes out on top with a weighted-average precision of 0.86, recall of 0.86, and F1-score of 0.85. The Random Forest and the ANN reach essentially the same scores, while Logistic Regression lags behind, mainly because of its low recall on churners.