Loan Insights
For this project, I had to work with EasyLoan, a company that offers a variety of loan services, including personal loans, car loans, and mortgages. EasyLoan operates in several markets, including Canada, the United Kingdom, and the United States.
The analytics team is responsible for reporting on the performance of these services across different regions. My task was to ensure the data was accessible and reliable before the team began analyzing and preparing reports for the strategy department.
To learn more about me and my work, expand the ABOUT ME section.
ABOUT ME
Learn more about me on my ABOUT page. Below, you can find links to explore more of my projects categorized by topics, access my resume, and contact me.| PROJECTS | RESUME | CONTACT |
Cleaning Client Data for the Analytics Dashboard
Situation:
The analytics team at EasyLoan wanted to create a dashboard that provided detailed client information. However, they needed to ensure that the data in the client
table met their expectations in terms of format and accuracy before they could proceed with the dashboard.
Task:
I was assigned the task of writing a query to clean and format the client
data to match the specific requirements. This involved:
- Ensuring the
date_of_birth
field is in the correct format (YYYY-MM-DD
). - Cleaning the
employment_status
field so that it only contains “employed” or “unemployed” in lowercase. - Ensuring the
country
field is correctly listed as either “USA,” “UK,” or “CA” in uppercase.
Action:
To achieve the task, I created a SQL query that:
- Checked the format of the
date_of_birth
column and converted any incorrect formats toYYYY-MM-DD
. - Standardized the
employment_status
column by converting all employment-related statuses to either “employed” or “unemployed,” defaulting unknown values to “unknown.” - Standardized the
country
column to only accept “USA,” “UK,” or “CA” in uppercase and replaced any invalid country values with “UNKNOWN.”
Here’s the query:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
client_id,
CASE
WHEN date_of_birth ~ '^\d{4}-\d{2}-\d{2}T' THEN date_of_birth
ELSE TO_CHAR(TO_DATE(date_of_birth, 'Month DD, YYYY'), 'YYYY-MM-DD')
END AS date_of_birth,
CASE
WHEN LOWER(employment_status) IN ('full-time', 'fulltime', 'part-time', 'parttime', 'employed', 'emplouyed') THEN 'employed'
WHEN LOWER(employment_status) = 'unemployed' THEN 'unemployed'
ELSE 'unknown'
END AS employment_status,
CASE
WHEN UPPER(country) IN ('USA', 'UK', 'CA') THEN UPPER(country)
ELSE 'UNKNOWN'
END AS country
FROM
client;
Result:
The query successfully standardized the client data by cleaning the format for date_of_birth, employment_status, and country, making the data ready for the analytics dashboard. This allowed the team to move forward with creating the client insights dashboard.
client_id | date_of_birth | employment_status | country | |
---|---|---|---|---|
0 | 1 | 1963-07-08 | unemployed | USA |
1 | 2 | 1957-02-07 | unemployed | UK |
2 | 3 | 1993-02-21 | employed | CA |
3 | 4 | 1978-03-19 | employed | CA |
4 | 5 | 2000-10-02 | employed | USA |
... | ... | ... | ... | ... |
295 | 296 | 1963-05-02 | employed | CA |
296 | 297 | 2004-08-25 | employed | UK |
297 | 298 | 1997-08-03 | unemployed | UK |
298 | 299 | 2004-08-16 | unemployed | CA |
299 | 300 | 1965-03-02 | employed | USA |
300 rows × 4 columns
Filling in Missing Repayment Channels
Situation:
The backend system experienced an issue, causing some values in the repayment_channel column to be missing. These missing values are critical to the analysis of loan repayments, and they need to be filled before proceeding with further analysis. Fortunately, a pattern has been identified for the missing values, which allows us to derive the repayment channel based on the repayment_amount.
Task:
I was tasked with writing a query that would fill in the missing or invalid repayment channel values based on the following criteria:
Repayments higher than 4000 dollars should be made via bank account. Repayments lower than 1000 dollars should be made via mail.
Action:
To address this task, I created a query that:
Checks for rows where repayment_channel is either missing (NULL) or contains invalid values (‘-‘). Fills in the missing values based on the repayment amount: If the repayment_amount is greater than 4000, the repayment channel is set to ‘bank account’. If the repayment_amount is less than 1000, the repayment channel is set to ‘mail’. Returns the updated values of the repayment_channel while leaving other valid entries unchanged.
Here’s the query:
1
2
3
4
5
6
7
8
9
10
11
12
SELECT
repayment_id,
loan_id,
repayment_date,
repayment_amount,
CASE
WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount > 4000 THEN 'bank account'
WHEN (repayment_channel IS NULL OR repayment_channel = '-') AND repayment_amount < 1000 THEN 'mail'
ELSE repayment_channel
END AS repayment_channel
FROM
repayment;
Result:
The query successfully identified and replaced the missing or invalid repayment channel values with the appropriate repayment method, ensuring that the repayment table adheres to the required criteria.
repayment_id | loan_id | repayment_date | repayment_amount | repayment_channel | |
---|---|---|---|---|---|
0 | 1 | 357 | 2022-10-16 00:00:00+00:00 | 1675.83 | bank account |
1 | 2 | 805 | 2023-01-12 00:00:00+00:00 | 867.22 | debit card |
2 | 3 | 843 | 2022-06-02 00:00:00+00:00 | 718.83 | phone |
3 | 4 | 243 | 2022-12-26 00:00:00+00:00 | 1620.97 | credit card |
4 | 5 | 991 | 2023-03-18 00:00:00+00:00 | 2182.17 | phone |
... | ... | ... | ... | ... | ... |
1495 | 1496 | 192 | 2023-02-11 00:00:00+00:00 | 2506.47 | debit card |
1496 | 1497 | 999 | 2022-11-21 00:00:00+00:00 | 1324.61 | bank account |
1497 | 1498 | 966 | 2023-01-26 00:00:00+00:00 | 1375.81 | debit card |
1498 | 1499 | 20 | 2023-03-30 00:00:00+00:00 | 1231.25 | debit card |
1499 | 1500 | 560 | 2022-04-21 00:00:00+00:00 | 361.11 |
1500 rows × 5 columns
Analyzing Loans for US Clients Using the Online Contract System
Situation:
As of January 1st, 2022, all US clients began using an online system to sign contracts. The analytics team needs to analyze loan data specifically for US clients who used this new system, in order to gain insights on how it has impacted loan agreements.
Task:
I was tasked with writing a query that retrieves the relevant loan data for US clients who signed their contracts using the new online system starting from January 1st, 2022. The output should include the following columns:
client_id: The unique identifier for the client. contract_date: The date the contract was signed. principal_amount: The amount of the loan’s principal. loan_type: The type of loan taken by the client.
Action:
To fulfill the task, I created a query that:
Filters the data to include only rows where the contract_date is on or after January 1st, 2022. Restricts the results to clients who reside in the USA. Selects the relevant columns: client_id, contract_date, principal_amount, and loan_type.
Here’s the query:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
c.client_id,
ct.contract_date,
l.principal_amount,
l.loan_type
FROM
loan l
JOIN
client c ON l.client_id = c.client_id
JOIN
contract ct ON l.contract_id = ct.contract_id
WHERE
c.country = 'USA'
AND ct.contract_date >= '2022-01-01';
Result:
The query successfully retrieved loan data for US clients who used the online contract system starting from January 1st, 2022, providing the analytics team with the necessary information for their analysis.
client_id | contract_date | principal_amount | loan_type | |
---|---|---|---|---|
0 | 267 | 2022-03-08 00:00:00+00:00 | 179230 | personal |
1 | 50 | 2022-01-13 00:00:00+00:00 | 143729 | mortgage |
2 | 280 | 2022-01-02 00:00:00+00:00 | 171122 | car |
3 | 79 | 2022-01-24 00:00:00+00:00 | 43784 | mortgage |
4 | 245 | 2022-01-03 00:00:00+00:00 | 95003 | mortgage |
... | ... | ... | ... | ... |
89 | 8 | 2022-01-13 00:00:00+00:00 | 148282 | personal |
90 | 232 | 2022-02-08 00:00:00+00:00 | 78386 | personal |
91 | 95 | 2022-03-20 00:00:00+00:00 | 39098 | personal |
92 | 123 | 2022-03-06 00:00:00+00:00 | 48213 | car |
93 | 211 | 2022-02-23 00:00:00+00:00 | 147004 | mortgage |
94 rows × 4 columns
Comparing Interest Rates Across Countries for Different Loan Types
Situation:
The business strategy team is considering offering more competitive interest rates to the US market. To support this initiative, the analytics team wants to compare the average interest rates for each loan type across different countries. This comparison will help determine if there are significant differences in the interest rates offered in the US compared to other countries.
Task:
I was tasked with writing a query that retrieves the average interest rates for each loan type, broken down by country. The output should include: loan_type: The type of loan being offered. country: The country where the loan was offered. avg_rate: The average interest rate for the corresponding loan type and country.
Action:
To address this task, I wrote a query that: Groups the data by loan_type and country. Calculates the average interest rate (avg_rate) for each combination of loan type and country using the AVG() function. Selects the relevant columns: loan_type, country, and avg_rate.
Here’s the query:
1
2
3
4
5
6
7
8
9
10
SELECT
loan_type,
country,
AVG(l.interest_rate)as avg_rate
FROM
loan l
JOIN
client c ON l.client_id = c.client_id
GROUP BY l.loan_type, c.country
ORDER BY l.loan_type, c.country;
Result:
The query successfully provided the analytics team with the average interest rates for each loan type across different countries, enabling them to compare rates and support the business strategy team’s decision-making process.
loan_type | country | avg_rate | |
---|---|---|---|
0 | car | CA | 0.112039 |
1 | car | UK | 0.122613 |
2 | car | USA | 0.103636 |
3 | mortgage | CA | 0.044068 |
4 | mortgage | UK | 0.042281 |
5 | mortgage | USA | 0.043860 |
6 | personal | CA | 0.217253 |
7 | personal | UK | 0.198738 |
8 | personal | USA | 0.202721 |