Loan Insights

Posted Sep 17, 2024 Updated Oct 11, 2024

Loan Insights

By Samir Alikperov

8 min read

Loan Insights

For this project, I had to work with EasyLoan, a company that offers a variety of loan services, including personal loans, car loans, and mortgages. EasyLoan operates in several markets, including Canada, the United Kingdom, and the United States.

The analytics team is responsible for reporting on the performance of these services across different regions. My task was to ensure the data was accessible and reliable before the team began analyzing and preparing reports for the strategy department.

Database Schema

To learn more about me and my work, expand the ABOUT ME section.

ABOUT ME

Learn more about me on my ABOUT page. Below, you can find links to explore more of my projects categorized by topics, access my resume, and contact me.
| PROJECTS | RESUME | CONTACT |

Cleaning Client Data for the Analytics Dashboard

Situation:

The analytics team at EasyLoan wanted to create a dashboard that provided detailed client information. However, they needed to ensure that the data in the client table met their expectations in terms of format and accuracy before they could proceed with the dashboard.

Task:

I was assigned the task of writing a query to clean and format the client data to match the specific requirements. This involved:

Ensuring the date_of_birth field is in the correct format (YYYY-MM-DD).
Cleaning the employment_status field so that it only contains “employed” or “unemployed” in lowercase.
Ensuring the country field is correctly listed as either “USA,” “UK,” or “CA” in uppercase.

Action:

To achieve the task, I created a SQL query that:

Checked the format of the date_of_birth column and converted any incorrect formats to YYYY-MM-DD.
Standardized the employment_status column by converting all employment-related statuses to either “employed” or “unemployed,” defaulting unknown values to “unknown.”
Standardized the country column to only accept “USA,” “UK,” or “CA” in uppercase and replaced any invalid country values with “UNKNOWN.”

Here’s the query:

  
SELECT
    client_id,
    CASE
        WHEN date_of_birth ~ '^\d{4}-\d{2}-\d{2}T' THEN date_of_birth
        ELSE TO_CHAR(TO_DATE(date_of_birth, 'Month DD, YYYY'), 'YYYY-MM-DD')
    END AS date_of_birth,
    CASE
        WHEN LOWER(employment_status) IN ('full-time', 'fulltime', 'part-time', 'parttime', 'employed', 'emplouyed') THEN 'employed'
        WHEN LOWER(employment_status) = 'unemployed' THEN 'unemployed'
        ELSE 'unknown'
    END AS employment_status,
    CASE
        WHEN UPPER(country) IN ('USA', 'UK', 'CA') THEN UPPER(country)
        ELSE 'UNKNOWN'
    END AS country
FROM 
    client;

Result:

The query successfully standardized the client data by cleaning the format for date_of_birth, employment_status, and country, making the data ready for the analytics dashboard. This allowed the team to move forward with creating the client insights dashboard.

	client_id	date_of_birth	employment_status	country
0	1	1963-07-08	unemployed	USA
1	2	1957-02-07	unemployed	UK
2	3	1993-02-21	employed	CA
3	4	1978-03-19	employed	CA
4	5	2000-10-02	employed	USA
...	...	...	...	...
295	296	1963-05-02	employed	CA
296	297	2004-08-25	employed	UK
297	298	1997-08-03	unemployed	UK
298	299	2004-08-16	unemployed	CA
299	300	1965-03-02	employed	USA

300 rows × 4 columns

Filling in Missing Repayment Channels

Situation:

The backend system experienced an issue, causing some values in the repayment_channel column to be missing. These missing values are critical to the analysis of loan repayments, and they need to be filled before proceeding with further analysis. Fortunately, a pattern has been identified for the missing values, which allows us to derive the repayment channel based on the repayment_amount.

Task:

I was tasked with writing a query that would fill in the missing or invalid repayment channel values based on the following criteria:

Repayments higher than 4000 dollars should be made via bank account. Repayments lower than 1000 dollars should be made via mail.

Action:

To address this task, I created a query that:

Checks for rows where repayment_channel is either missing (NULL) or contains invalid values (‘-‘). Fills in the missing values based on the repayment amount: If the repayment_amount is greater than 4000, the repayment channel is set to ‘bank account’. If the repayment_amount is less than 1000, the repayment channel is set to ‘mail’. Returns the updated values of the repayment_channel while leaving other valid entries unchanged.

Here’s the query:

  
SELECT
	repayment_id,
	loan_id,
	repayment_date,
	repayment_amount,
	CASE
		WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount > 4000 THEN 'bank account'
		WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount < 1000 THEN 'mail'
		ELSE repayment_channel
	END AS repayment_channel
FROM
	repayment;

Result:

The query successfully identified and replaced the missing or invalid repayment channel values with the appropriate repayment method, ensuring that the repayment table adheres to the required criteria.

	repayment_id	loan_id	repayment_date	repayment_amount	repayment_channel
0	1	357	2022-10-16 00:00:00+00:00	1675.83	bank account
1	2	805	2023-01-12 00:00:00+00:00	867.22	debit card
2	3	843	2022-06-02 00:00:00+00:00	718.83	phone
3	4	243	2022-12-26 00:00:00+00:00	1620.97	credit card
4	5	991	2023-03-18 00:00:00+00:00	2182.17	phone
...	...	...	...	...	...
1495	1496	192	2023-02-11 00:00:00+00:00	2506.47	debit card
1496	1497	999	2022-11-21 00:00:00+00:00	1324.61	bank account
1497	1498	966	2023-01-26 00:00:00+00:00	1375.81	debit card
1498	1499	20	2023-03-30 00:00:00+00:00	1231.25	debit card
1499	1500	560	2022-04-21 00:00:00+00:00	361.11	mail

1500 rows × 5 columns

Analyzing Loans for US Clients Using the Online Contract System

Situation:

As of January 1st, 2022, all US clients began using an online system to sign contracts. The analytics team needs to analyze loan data specifically for US clients who used this new system, in order to gain insights on how it has impacted loan agreements.

Task:

I was tasked with writing a query that retrieves the relevant loan data for US clients who signed their contracts using the new online system starting from January 1st, 2022. The output should include the following columns:

client_id: The unique identifier for the client. contract_date: The date the contract was signed. principal_amount: The amount of the loan’s principal. loan_type: The type of loan taken by the client.

Action:

To fulfill the task, I created a query that:

Filters the data to include only rows where the contract_date is on or after January 1st, 2022. Restricts the results to clients who reside in the USA. Selects the relevant columns: client_id, contract_date, principal_amount, and loan_type.

Here’s the query:

  
SELECT
    c.client_id,
    ct.contract_date,
    l.principal_amount,
    l.loan_type
FROM 
    loan l
JOIN 
    client c ON l.client_id = c.client_id
JOIN 
    contract ct ON l.contract_id = ct.contract_id
WHERE 
    c.country = 'USA'
    AND ct.contract_date >= '2022-01-01';

Result:

The query successfully retrieved loan data for US clients who used the online contract system starting from January 1st, 2022, providing the analytics team with the necessary information for their analysis.

	client_id	contract_date	principal_amount	loan_type
0	267	2022-03-08 00:00:00+00:00	179230	personal
1	50	2022-01-13 00:00:00+00:00	143729	mortgage
2	280	2022-01-02 00:00:00+00:00	171122	car
3	79	2022-01-24 00:00:00+00:00	43784	mortgage
4	245	2022-01-03 00:00:00+00:00	95003	mortgage
...	...	...	...	...
89	8	2022-01-13 00:00:00+00:00	148282	personal
90	232	2022-02-08 00:00:00+00:00	78386	personal
91	95	2022-03-20 00:00:00+00:00	39098	personal
92	123	2022-03-06 00:00:00+00:00	48213	car
93	211	2022-02-23 00:00:00+00:00	147004	mortgage

94 rows × 4 columns

Comparing Interest Rates Across Countries for Different Loan Types

Situation:

The business strategy team is considering offering more competitive interest rates to the US market. To support this initiative, the analytics team wants to compare the average interest rates for each loan type across different countries. This comparison will help determine if there are significant differences in the interest rates offered in the US compared to other countries.

Task:

I was tasked with writing a query that retrieves the average interest rates for each loan type, broken down by country. The output should include: loan_type: The type of loan being offered. country: The country where the loan was offered. avg_rate: The average interest rate for the corresponding loan type and country.

Action:

To address this task, I wrote a query that: Groups the data by loan_type and country. Calculates the average interest rate (avg_rate) for each combination of loan type and country using the AVG() function. Selects the relevant columns: loan_type, country, and avg_rate.

Here’s the query:

  
SELECT
	loan_type, 
	country,
	AVG(l.interest_rate)as avg_rate
FROM 
	loan l
JOIN
	client c ON l.client_id = c.client_id
GROUP BY l.loan_type, c.country
ORDER BY l.loan_type, c.country;

Result:

The query successfully provided the analytics team with the average interest rates for each loan type across different countries, enabling them to compare rates and support the business strategy team’s decision-making process.

	loan_type	country	avg_rate
0	car	CA	0.112039
1	car	UK	0.122613
2	car	USA	0.103636
3	mortgage	CA	0.044068
4	mortgage	UK	0.042281
5	mortgage	USA	0.043860
6	personal	CA	0.217253
7	personal	UK	0.198738
8	personal	USA	0.202721

SQL

DATA ANALYSIS LOAN

This post is licensed under CC BY 4.0 by the author.

Cleaning Client Data for the Analytics Dashboard

Situation:

Task:

Action:

Result:

Filling in Missing Repayment Channels

Situation:

Task:

Action:

Result:

Analyzing Loans for US Clients Using the Online Contract System

Situation:

Task:

Action:

Result:

Comparing Interest Rates Across Countries for Different Loan Types

Situation:

Task:

Action:

Result:

Trending Tags