Post

Loan Insights

Loan Insights

For this project, I had to work with EasyLoan, a company that offers a variety of loan services, including personal loans, car loans, and mortgages. EasyLoan operates in several markets, including Canada, the United Kingdom, and the United States.

The analytics team is responsible for reporting on the performance of these services across different regions. My task was to ensure the data was accessible and reliable before the team began analyzing and preparing reports for the strategy department.

Database Schema
lending

To learn more about me and my work, expand the ABOUT ME section.

ABOUT ME Learn more about me on my ABOUT page. Below, you can find links to explore more of my projects categorized by topics, access my resume, and contact me.
| PROJECTS | RESUME | CONTACT |

Cleaning Client Data for the Analytics Dashboard

Situation:

The analytics team at EasyLoan wanted to create a dashboard that provided detailed client information. However, they needed to ensure that the data in the client table met their expectations in terms of format and accuracy before they could proceed with the dashboard.

Task:

I was assigned the task of writing a query to clean and format the client data to match the specific requirements. This involved:

  • Ensuring the date_of_birth field is in the correct format (YYYY-MM-DD).
  • Cleaning the employment_status field so that it only contains “employed” or “unemployed” in lowercase.
  • Ensuring the country field is correctly listed as either “USA,” “UK,” or “CA” in uppercase.

Action:

To achieve the task, I created a SQL query that:

  1. Checked the format of the date_of_birth column and converted any incorrect formats to YYYY-MM-DD.
  2. Standardized the employment_status column by converting all employment-related statuses to either “employed” or “unemployed,” defaulting unknown values to “unknown.”
  3. Standardized the country column to only accept “USA,” “UK,” or “CA” in uppercase and replaced any invalid country values with “UNKNOWN.”

Here’s the query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
    client_id,
    CASE
        WHEN date_of_birth ~ '^\d{4}-\d{2}-\d{2}T' THEN date_of_birth
        ELSE TO_CHAR(TO_DATE(date_of_birth, 'Month DD, YYYY'), 'YYYY-MM-DD')
    END AS date_of_birth,
    CASE
        WHEN LOWER(employment_status) IN ('full-time', 'fulltime', 'part-time', 'parttime', 'employed', 'emplouyed') THEN 'employed'
        WHEN LOWER(employment_status) = 'unemployed' THEN 'unemployed'
        ELSE 'unknown'
    END AS employment_status,
    CASE
        WHEN UPPER(country) IN ('USA', 'UK', 'CA') THEN UPPER(country)
        ELSE 'UNKNOWN'
    END AS country
FROM 
    client;

Result:

The query successfully standardized the client data by cleaning the format for date_of_birth, employment_status, and country, making the data ready for the analytics dashboard. This allowed the team to move forward with creating the client insights dashboard.

client_iddate_of_birthemployment_statuscountry
011963-07-08unemployedUSA
121957-02-07unemployedUK
231993-02-21employedCA
341978-03-19employedCA
452000-10-02employedUSA
...............
2952961963-05-02employedCA
2962972004-08-25employedUK
2972981997-08-03unemployedUK
2982992004-08-16unemployedCA
2993001965-03-02employedUSA

300 rows × 4 columns

Filling in Missing Repayment Channels

Situation:

The backend system experienced an issue, causing some values in the repayment_channel column to be missing. These missing values are critical to the analysis of loan repayments, and they need to be filled before proceeding with further analysis. Fortunately, a pattern has been identified for the missing values, which allows us to derive the repayment channel based on the repayment_amount.

Task:

I was tasked with writing a query that would fill in the missing or invalid repayment channel values based on the following criteria:

Repayments higher than 4000 dollars should be made via bank account. Repayments lower than 1000 dollars should be made via mail.

Action:

To address this task, I created a query that:

Checks for rows where repayment_channel is either missing (NULL) or contains invalid values (‘-‘). Fills in the missing values based on the repayment amount: If the repayment_amount is greater than 4000, the repayment channel is set to ‘bank account’. If the repayment_amount is less than 1000, the repayment channel is set to ‘mail’. Returns the updated values of the repayment_channel while leaving other valid entries unchanged.

Here’s the query:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
	repayment_id,
	loan_id,
	repayment_date,
	repayment_amount,
	CASE
		WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount > 4000 THEN 'bank account'
		WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount < 1000 THEN 'mail'
		ELSE repayment_channel
	END AS repayment_channel
FROM
	repayment;

Result:

The query successfully identified and replaced the missing or invalid repayment channel values with the appropriate repayment method, ensuring that the repayment table adheres to the required criteria.

repayment_idloan_idrepayment_daterepayment_amountrepayment_channel
013572022-10-16 00:00:00+00:001675.83bank account
128052023-01-12 00:00:00+00:00867.22debit card
238432022-06-02 00:00:00+00:00718.83phone
342432022-12-26 00:00:00+00:001620.97credit card
459912023-03-18 00:00:00+00:002182.17phone
..................
149514961922023-02-11 00:00:00+00:002506.47debit card
149614979992022-11-21 00:00:00+00:001324.61bank account
149714989662023-01-26 00:00:00+00:001375.81debit card
14981499202023-03-30 00:00:00+00:001231.25debit card
149915005602022-04-21 00:00:00+00:00361.11mail

1500 rows × 5 columns

Analyzing Loans for US Clients Using the Online Contract System

Situation:

As of January 1st, 2022, all US clients began using an online system to sign contracts. The analytics team needs to analyze loan data specifically for US clients who used this new system, in order to gain insights on how it has impacted loan agreements.

Task:

I was tasked with writing a query that retrieves the relevant loan data for US clients who signed their contracts using the new online system starting from January 1st, 2022. The output should include the following columns:

client_id: The unique identifier for the client. contract_date: The date the contract was signed. principal_amount: The amount of the loan’s principal. loan_type: The type of loan taken by the client.

Action:

To fulfill the task, I created a query that:

Filters the data to include only rows where the contract_date is on or after January 1st, 2022. Restricts the results to clients who reside in the USA. Selects the relevant columns: client_id, contract_date, principal_amount, and loan_type.

Here’s the query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
    c.client_id,
    ct.contract_date,
    l.principal_amount,
    l.loan_type
FROM 
    loan l
JOIN 
    client c ON l.client_id = c.client_id
JOIN 
    contract ct ON l.contract_id = ct.contract_id
WHERE 
    c.country = 'USA'
    AND ct.contract_date >= '2022-01-01';

Result:

The query successfully retrieved loan data for US clients who used the online contract system starting from January 1st, 2022, providing the analytics team with the necessary information for their analysis.

client_idcontract_dateprincipal_amountloan_type
02672022-03-08 00:00:00+00:00179230personal
1502022-01-13 00:00:00+00:00143729mortgage
22802022-01-02 00:00:00+00:00171122car
3792022-01-24 00:00:00+00:0043784mortgage
42452022-01-03 00:00:00+00:0095003mortgage
...............
8982022-01-13 00:00:00+00:00148282personal
902322022-02-08 00:00:00+00:0078386personal
91952022-03-20 00:00:00+00:0039098personal
921232022-03-06 00:00:00+00:0048213car
932112022-02-23 00:00:00+00:00147004mortgage

94 rows × 4 columns

Comparing Interest Rates Across Countries for Different Loan Types

Situation:

The business strategy team is considering offering more competitive interest rates to the US market. To support this initiative, the analytics team wants to compare the average interest rates for each loan type across different countries. This comparison will help determine if there are significant differences in the interest rates offered in the US compared to other countries.

Task:

I was tasked with writing a query that retrieves the average interest rates for each loan type, broken down by country. The output should include: loan_type: The type of loan being offered. country: The country where the loan was offered. avg_rate: The average interest rate for the corresponding loan type and country.

Action:

To address this task, I wrote a query that: Groups the data by loan_type and country. Calculates the average interest rate (avg_rate) for each combination of loan type and country using the AVG() function. Selects the relevant columns: loan_type, country, and avg_rate.

Here’s the query:

1
2
3
4
5
6
7
8
9
10
SELECT
	loan_type, 
	country,
	AVG(l.interest_rate)as avg_rate
FROM 
	loan l
JOIN
	client c ON l.client_id = c.client_id
GROUP BY l.loan_type, c.country
ORDER BY l.loan_type, c.country;

Result:

The query successfully provided the analytics team with the average interest rates for each loan type across different countries, enabling them to compare rates and support the business strategy team’s decision-making process.

loan_typecountryavg_rate
0carCA0.112039
1carUK0.122613
2carUSA0.103636
3mortgageCA0.044068
4mortgageUK0.042281
5mortgageUSA0.043860
6personalCA0.217253
7personalUK0.198738
8personalUSA0.202721
This post is licensed under CC BY 4.0 by the author.