Classify credit card users based on spending patterns and other metrics. Carry out PCA to reduce dimensionality and then cluster using the K-means algorithm.
Code written in Python using Jupyter Notebook. Open the notebook (.ipynb) for the code and a thorough analysis.
Our main task is to cluster credit card users into different groups and see if we can find any meaningful patterns. We will use Principal Component Analysis (PCA) to reduce the dimension of the feature space and then use the K-means algorithm to find clusters.
The dataset contains 8950 observations and 18 attributes for each observation. The dataset is included in the repository.
The following procedures are carried out in the notebook:
Determine what attributes are useful for the task at hand. Handle missing values, standardize, and normalize the data.
Use PCA to reduce the dimensionality of our data. Select an appropriate number of components and analyze the total variance explained. Interpret the principal components in terms of the original attributes.
Cluster using the K-means algorithm and find the optimum number of clusters. Use the within-cluster sum of squares to arrive at the result.
Visualize the clustered data, draw a decision boundary, and try to interpret the clusters in the context of the problem at hand.
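The procedure above can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the notebook's actual code: the data here is synthetic (a hypothetical stand-in for the credit card dataset), only standardization is shown for preprocessing, and the range of cluster counts tried is an arbitrary choice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for the real data (assumption: the notebook loads the
# dataset shipped with the repository instead).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 6))
X = np.vstack([c + rng.normal(size=(100, 6)) for c in centers])

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components and record how much of
# the total variance they retain.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# Within-cluster sum of squares (exposed by KMeans as `inertia_`) for a
# range of candidate cluster counts.
wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_pca)
    wcss.append(km.inertia_)
```

Plotting `wcss` against the number of clusters and looking for the point where the curve flattens (the "elbow") is one common way to pick the number of clusters.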
We chose 2 principal components in order to visualize results. In this scenario, segregating into 3 clusters was appropriate.
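With two components and three clusters, a decision boundary can be obtained by asking the fitted model which cluster each point of a dense grid belongs to. The sketch below assumes scikit-learn and uses hypothetical 2-D points in place of the PCA-projected data; the grid resolution and padding are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points standing in for the PCA-projected observations.
rng = np.random.default_rng(1)
blob_centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
X_pca = np.vstack([c + rng.normal(size=(100, 2)) for c in blob_centers])

# Fit K-means with three clusters on the two components.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_pca)

# Build a grid that covers the data with a small margin.
x_min, y_min = X_pca.min(axis=0) - 1.0
x_max, y_max = X_pca.max(axis=0) + 1.0
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 200),
    np.linspace(y_min, y_max, 200),
)

# Assign every grid point to its nearest cluster center; the places where
# the label changes form the decision boundary.
grid_points = np.c_[xx.ravel(), yy.ravel()]
grid_labels = kmeans.predict(grid_points).reshape(xx.shape)
```

Passing `xx`, `yy`, and `grid_labels` to a filled contour plot (for example matplotlib's `contourf`) and overlaying the points colored by `kmeans.labels_` gives the usual picture of cluster regions.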