Exploring Depth and Width in Multi-Layer Perceptrons for Credit Risk Prediction

Name: Cynthia Chinenye Udoye

Student No: 22029346

Abstract

This tutorial explores how the architecture of Multi-Layer Perceptrons (MLPs), specifically their depth (number of layers) and width (number of neurons per layer), impacts their performance in predicting credit risk. Using the "Credit-G" dataset, we address class imbalance by applying class weights and test various combinations of layers and neurons. Our findings reveal that while increasing the depth of MLPs enhances their learning capacity, it also increases the risk of overfitting. Similarly, adding more neurons often increases computational time without consistently improving accuracy. The best-performing model achieved an AUC-ROC score of 0.80. This tutorial provides a clear and practical guide for using neural networks to tackle imbalanced data challenges in credit risk prediction.

Introduction

Credit risk prediction is central to lending. It helps banks and other financial institutions identify which applicants are likely to repay their loans, reducing losses from defaults and supporting profitability (Thomas et al., 2002). Credit risk models therefore make lending decisions more reliable.

In this tutorial, we explore the use of Multi-Layer Perceptrons (MLPs), a type of neural network, for classifying credit data into 'good' and 'bad' risk categories. MLPs are particularly well-suited for this task due to their ability to model complex nonlinear relationships in data (LeCun et al., 2015). However, datasets like the "Credit-G" dataset often exhibit class imbalance, with significantly more 'good' cases than 'bad' (He and Garcia, 2009). To address this, we apply class weights to ensure equitable learning from both classes.

The objective of this tutorial is to investigate how the architecture of MLPs, specifically the number of layers (depth) and neurons per layer (width), influences model performance. Deeper networks can capture more intricate patterns and wider layers add representational power, but both choices carry costs, including increased computation and a greater risk of overfitting (Hinton et al., 2012). Performance will be assessed using metrics such as AUC-ROC, accuracy, precision, recall, F1-score, and validation loss, providing a comprehensive evaluation of these trade-offs.

This tutorial will take you through every step of the process, including understanding the dataset, preparing the data, testing different MLP designs, and looking at the results. By the end, we hope to gain valuable insights into building effective neural networks for credit risk prediction and similar challenges in finance.

Dataset and Preprocessing

The "Credit-G" dataset from OpenML contains financial records labelled as 'good' or 'bad' credit risks. It is commonly used in credit risk modelling because it reflects real-world lending data. However, the dataset is imbalanced, with more 'good' cases than 'bad,' making it harder for models to learn from the minority class.

The dataset includes both categorical and numerical features. The following preprocessing steps were applied:

  1. One-Hot Encoding: Categorical features were converted into numerical values to make them usable by the model.
  2. Scaling: Numerical features were standardised using StandardScaler to ensure that no single feature dominated the learning process.
  3. Data Splitting: The dataset was divided into training, validation, and test sets using stratified sampling to maintain the class distribution across all subsets.
  4. Class Weights: Class weights were calculated and applied during training to give equal importance to 'good' and 'bad' cases.
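As a rough illustration of step 4, the sketch below computes class weights with the common "balanced" heuristic, where each class is weighted by n_samples / (n_classes * class_count). This is an assumption: the tutorial does not state which weighting formula was used. The function name balanced_class_weights and the 70/30 label counts are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only (assumption): 1 = 'good', 0 = 'bad', in a 70/30 ratio.
example_labels = [1] * 700 + [0] * 300
weights = balanced_class_weights(example_labels)

# The minority class receives the larger weight.
for cls, weight in sorted(weights.items()):
    print(f"class {cls}: weight {weight:.3f}")
```

With weights like these, each misclassified 'bad' case contributes more to the training loss than a misclassified 'good' case, which is what pushes the model to pay attention to the minority class.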

Key Considerations:
Reproducibility was ensured by using consistent random seeds for data splitting and preprocessing. Ethical considerations, such as avoiding bias amplification, were kept in mind to ensure fair treatment of both classes in the dataset.
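The idea of a seeded, stratified split can be sketched in plain Python as follows. This is not the tutorial's code: the helper stratified_split, the 70/15/15 fractions, the seed value, and the label counts are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test, preserving class proportions."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Illustrative labels only (assumption): 1 = 'good', 0 = 'bad'.
labels = [1] * 700 + [0] * 300
train_idx, val_idx, test_idx = stratified_split(labels)

bad_share = sum(1 for i in train_idx if labels[i] == 0) / len(train_idx)
print(len(train_idx), len(val_idx), len(test_idx), f"bad share in train: {bad_share:.2f}")
```

Because each class is shuffled and cut separately, every subset keeps the same 'good' to 'bad' ratio as the full dataset, and calling the function again with the same seed returns the same indices.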

Model Architecture and Design

A Multi-Layer Perceptron (MLP) is a feedforward neural network that learns patterns in data through layers of interconnected neurons. In this tutorial, the MLP was used to classify credit risks as 'good' or 'bad' based on the features of the "Credit-G" dataset (OpenML, n.d.).

1. Model Overview

Figure 1: The structure of a Multi-Layer Perceptron (MLP) with an input layer, two hidden layers, and an output layer. The layers are interconnected through weights and biases, which determine the depth and width of the network (Datacamp, n.d.).

2. Building the MLP

The build_mlp_model() function creates a multi-layer perceptron (MLP) model using TensorFlow and Keras. It accepts parameters for the number of layers, neurons per layer, activation functions, and optimizer settings. The model is compiled for binary classification with metrics like accuracy and AUC. This design allows flexibility in architecture exploration, enabling customisation of depth and width to optimise performance on the dataset.
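To make depth and width concrete, the following sketch counts the trainable parameters (weights plus biases) of a fully connected MLP whose hidden layers all share the same width. It is a back-of-the-envelope aid, not part of build_mlp_model(); the helper name mlp_parameter_count and the input size of 60 features are assumptions, since the number of features after one-hot encoding is not stated here.

```python
def mlp_parameter_count(n_inputs, depth, width, n_outputs=1):
    """Count weights and biases in an MLP with `depth` hidden layers of size `width`."""
    layer_sizes = [n_inputs] + [width] * depth + [n_outputs]
    # Each dense layer has fan_in * fan_out weights plus fan_out biases.
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:])
    )


N_INPUTS = 60  # hypothetical feature count after one-hot encoding

for depth in (2, 4):
    for width in (32, 64, 128):
        total = mlp_parameter_count(N_INPUTS, depth, width)
        print(f"depth={depth} width={width:>3} parameters={total}")
```

Adding a hidden layer increases the count by roughly width squared, while doubling the width roughly quadruples the cost of every hidden-to-hidden layer, which is one reason wider networks take longer to train.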

Figure 2: Building the MLP

3. Training and Evaluation with Class Weights

The train_and_evaluate_with_class_weights function trains and evaluates the MLP model while addressing class imbalance by applying class weights. It takes the model, training and validation datasets, class weights, and training parameters such as epochs and batch size. The function outputs the training history and evaluation metrics, including accuracy and AUC, to measure the model's performance effectively.
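The effect of class weights on training can be illustrated with a weighted binary cross-entropy, where each sample's loss is multiplied by the weight of its true class. This is a simplified sketch under stated assumptions: the function name, the example weights, and the plain averaging over samples are illustrative, and a framework such as Keras (which accepts weights through the class_weight argument of model.fit) may normalise the weighted loss differently.

```python
import math


def weighted_binary_cross_entropy(y_true, y_prob, class_weight, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        sample_loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += class_weight[y] * sample_loss
    return total / len(y_true)


# Illustrative values only (assumption): 1 = 'good', 0 = 'bad'.
y_true = [1, 1, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6]  # predicted probability of class 1

unweighted = weighted_binary_cross_entropy(y_true, y_prob, {0: 1.0, 1: 1.0})
weighted = weighted_binary_cross_entropy(y_true, y_prob, {0: 2.0, 1: 1.0})
print(f"unweighted loss: {unweighted:.4f}, weighted loss: {weighted:.4f}")
```

In the example, the single 'bad' case is scored too high (0.6), and doubling its class weight doubles its share of the loss, so gradient updates are pulled more strongly toward correcting it.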

4. Experimentation and Hyperparameter Tuning

The run_experiments function automates the process of exploring various configurations of the MLP model. It iterates through combinations of depth (number of layers), width (neurons per layer), and other hyperparameters. The results of each configuration are logged to facilitate comparison and identify the optimal architecture for the dataset. This function simplifies architecture exploration and supports effective model optimisation.
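The overall shape of such an experiment loop might look like the sketch below. It is not the tutorial's run_experiments: the name run_grid, the grid values, and the placeholder scoring function are assumptions. In a real run, the evaluate callback would build and train a model for the given depth and width and return a validation metric such as AUC-ROC.

```python
from itertools import product


def run_grid(depths, widths, evaluate):
    """Evaluate every (depth, width) pair and return results sorted by score."""
    results = []
    for depth, width in product(depths, widths):
        score = evaluate(depth, width)
        results.append({"depth": depth, "width": width, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)


def placeholder_score(depth, width):
    # Stand-in only: it simply prefers smaller networks. Replace it with a
    # function that trains a model and returns its validation metric.
    return -(depth * width)


# Hypothetical grid values, chosen for illustration.
results = run_grid(depths=[2, 3, 4], widths=[16, 32, 64], evaluate=placeholder_score)
for row in results[:3]:
    print(row)
```

Keeping the results as a list of dictionaries makes it straightforward to log each configuration and later turn the log into a table or heatmap.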

5. Improving the Model

6. Impact of Depth and Width

7. Using Class Weights

The "Credit-G" dataset is imbalanced, with more 'good' cases than 'bad'. Class weights were applied to make the model focus more on the minority class ('bad'). This improved the recall for 'bad' cases, balancing the model’s predictions (He and Garcia, 2009).

Experimentation and Results

1. Experimental Setup

To explore how the architecture of Multi-Layer Perceptrons (MLPs) impacts credit risk prediction, different combinations of depth (number of hidden layers) and width (number of neurons per layer) were examined. Figure 3 reports the performance of models with varying depth, width, and number of training epochs, all trained with the Adam optimiser.

Figure 3: Impact of Depth and Width on Model Performance Metrics.

The following metrics were used to evaluate model performance: AUC-ROC, accuracy, precision, recall, F1-score, and validation loss.

Class weights were applied during training to handle the dataset's imbalance, ensuring both classes were equally represented in the learning process.
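AUC-ROC, the headline metric in this tutorial, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition, counting ties as half. The function name auc_roc and the four example scores are made up for illustration; a library routine would normally be used on real data, since this pairwise version is quadratic in the number of samples.

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0  # positive ranked above negative
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(positives) * len(negatives))


# Illustrative labels and scores only (assumption).
y_true = [1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.2]
print(f"AUC-ROC: {auc_roc(y_true, y_score):.2f}")
```

Because it depends only on the ranking of scores and not on a decision threshold, AUC-ROC is less sensitive to class imbalance than plain accuracy.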

2. Presentation of Results

The results are summarised using heatmaps and line plots to show how performance changes with different depths and widths:

Figure 4: Model Performance Across Depths and Widths.
Figure 5: AUC-ROC Trends Across Widths and Depths.

3. Key Observations

Discussion & Evaluation of the Best-Performing Model

The best-performing model used 4 layers and 64 neurons per layer. This setup balanced the capacity to learn complex patterns against the risk of overfitting and unnecessary computational cost. It effectively predicted both 'good' and 'bad' credit risks, even with an imbalanced dataset.

Figure 6: Model for the best depth and width.

1. Key Metrics on the Test Dataset

This model captured complex patterns in the data while avoiding overfitting, thanks to dropout and regularisation techniques. The use of class weights addressed the imbalanced dataset, ensuring that 'bad' credit risks were not ignored.

The precision for 'bad' cases was lower, indicating some false positives, but the high recall meant the model successfully identified most 'bad' credit risks. This trade-off suits scenarios where identifying risky credit cases matters more than avoiding occasional false alarms. By balancing depth, width, and regularisation, this architecture demonstrated strong performance and reliability for credit risk prediction.

2. Confusion Matrix

The confusion matrix for the best model shows:

3. Visualisation

The confusion matrix confirms that while the model performs well overall, it still occasionally misclassifies 'bad' credit as 'good.' This misclassification may be influenced by the inherent imbalance in the dataset. However, the application of class weights during training helps mitigate this issue by emphasising the minority class, thereby balancing the trade-off between precision and recall.

Figure 7: Confusion Matrix for Model Performance Evaluation (Best Layer and Neurons Configuration).
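The link between a confusion matrix and the precision/recall trade-off discussed above can be sketched as follows, treating 'bad' as the positive class. The counts used here are invented purely for illustration and are not the tutorial's results; the function name classification_metrics is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts (assumption), with 'bad' as the positive class:
# tp = 'bad' predicted 'bad', fn = 'bad' predicted 'good',
# fp = 'good' predicted 'bad', tn = 'good' predicted 'good'.
metrics = classification_metrics(tp=8, fp=4, fn=2, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case recall for 'bad' is higher than precision: most risky cases are caught (few false negatives), at the price of flagging some 'good' applicants as risky (false positives).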

4. Insights

Conclusion and Recommendations

This tutorial explored how the architecture of Multi-Layer Perceptrons (MLPs), specifically the number of layers (depth) and neurons per layer (width), affects their performance in predicting credit risk. The best model, with 4 layers and 64 neurons per layer, achieved a good balance between complexity and accuracy. It handled class imbalance effectively using class weights and achieved an AUC-ROC score of 0.80, with high recall for 'bad' credit risks, although precision for that class was lower.

Practical Applications

The findings demonstrate that MLPs can be a valuable tool for credit risk prediction, helping financial institutions assess loan applications more reliably. By identifying patterns in credit data, these models can reduce default risks and improve decision-making in lending processes.

Recommendations for Future Exploration

In conclusion, this tutorial highlights how thoughtful design choices in neural network architecture can lead to efficient and reliable credit risk models. Future work could extend these insights to other financial applications, offering even broader benefits.

References

View Code on GitHub