In this project, we analyse the data from Prosper and try to find patterns in it.

AhmedKhaled8/Prosper-Loans-Data-Exploration

Prosper Loans Data Exploration

by Ahmed Khaled Mohamed Salah Abdelrahman

Dataset

This is a financial dataset covering loans, borrowers, lenders, interest rates, and related attributes. Prosper, or Prosper Marketplace Inc., is a San Francisco, California based company specializing in low-interest loans to borrowers. This project analyses the Prosper data to find patterns in it. [*]

The dataset is one of the Udacity datasets for this project... Download

Summary of Findings

1. Univariate Visualizations

| Feature | Modification | Comments |
| --- | --- | --- |
| BorrowerState | The states (levels) were sorted by count. | Most of the company's borrowers live in or come from California, with about 8600. New York, Texas, and Florida come next, with more than 4500 borrowers each. Wyoming has the fewest. |
| LoanStatus | No modifications were required for the count plot, as the feature is ordinal. | At the moment the data was collected, based on the new dataset, 45 thousand loans were still active, 15 thousand were completed, 5 thousand were canceled, and around a thousand were past due by 1-15 days. |
| LenderYield | A good bin size should be selected to observe unusual peaks in the histogram. | Across the two visualizations, the largest group of lenders, 30 thousand (43%), has a yield (interest) in the range of 10%-20%. In more detail, around 6000 lenders (8.5%) use exactly 30%, the most common single percentage among all lenders. Few use a percentage higher than 30% or lower than 10%. |
| Occupation | There are 68 occupations in the new dataset, so selecting the top 25 simplifies the visualization. | As illustrated, 6000 (8.5%) of the borrowers are listed as professionals. Executives, administrative assistants, and teachers come second, third, and fourth with about 3000 each (4.25%). Many jobs have about 2000 instances. Tradesman - Mechanic is 25th among all occupations with about 700. |
| EmploymentStatus | No modification is required in either sorting or slicing. | "Employed" is the most common status among borrowers, with about 55 thousand of 70 thousand borrowers (78.5%). "Full-time" was specified separately by about 5 thousand, as was "Self-employed". Few retired or not-employed people were recorded in the dataset. |
| EmploymentStatusDuration | The first visual shows a very strong right skew, so a log scale on the x-axis is very helpful. Values are divided by 12 to work in years instead of months, by convention. | The largest group of borrowers, 17500 of them (25%), worked around 10 years. 15000 (21.5%) worked for 3-10 years and 12500 (18%) worked for 10-30 years. |
| CurrentlyInGroup | A pie chart is a more illustrative visual for the percentages of the True/False levels. | 97.3% of the borrowers are working individually; 2.7% are working in groups. |
| IncomeRange | No modifications were required. | 37% of borrowers have an income in the range of 50K-75K, 34% in the range of 25K-50K, and 22% earn 100K or more. |
| StatedMonthlyIncome | An outlier caused the histogram to accumulate in a single bin, which a KDE plot would show. Focusing on a small range of interest solves the problem. | The largest group of StatedMonthlyIncome values, 13000 borrowers (18.5%), is in the range of 4500-5000. Surprisingly, no borrower stated a monthly income in the range of 7000-8000. |
| Investors | The histogram showed a strong right skew. Using a log scale creates a more reasonable visual. | Apart from borrowers without investors (28.5%), borrowers with more than 68 and fewer than 100 investors form the largest group (10.7%). |
| ProsperRating_numeric | No modifications were required. | The distribution of the Prosper rating is unimodal, peaking at rating 4. On both sides the curve decreases until it reaches a tail at rating 1 with 8.76% and rating 7 with 5.93%. |
| LoanOriginalAmount | The histogram shows a multi-modal curve. This may be reasonable, as the company may provide discrete, specific loan amounts. | The distribution of loan amounts starts at $1000 and increases until it reaches its first peak at $5000; 12000 borrowers were granted a loan of $5000. The curve then decreases but rises sharply at $10000, an amount granted to 8000 borrowers. The curve decreases again and rises suddenly at $15000, which has 8000 borrowers. The same pattern repeats with rises at $20000 and $25000, with 1800 borrowers for each amount. |
| BorrowerRate | The histogram shows a multi-modal curve if a suitable bin size is selected. | From the two figures, we can deduce that there are 3 main interest rates for borrowers: 15%, 26%, and 31.5%. About 5000 borrowers are recorded near each of these rates. |
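
The log-scale modification noted for EmploymentStatusDuration and Investors can be illustrated with a small, dependency-free sketch. It is not the notebook's code: the helper names `log_bin_edges` and `histogram` are hypothetical, and the sample durations are made up. It only shows the idea of converting months to years and counting values in bins whose edges are evenly spaced on a log10 scale.

```python
import math
from bisect import bisect_right


def log_bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 edges spaced evenly on a log10 scale between lo and hi."""
    if lo <= 0 or hi <= lo:
        raise ValueError("log-spaced bins need 0 < lo < hi")
    start = math.log10(lo)
    step = (math.log10(hi) - start) / n_bins
    return [10 ** (start + i * step) for i in range(n_bins + 1)]


def histogram(values, edges):
    """Count values per bin; the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # ignore values outside the binned range
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


# Made-up employment durations in months (illustrative only, not from the dataset).
months = [3, 6, 18, 24, 36, 60, 90, 126, 150, 240, 360]
years = [m / 12 for m in months]  # work in years instead of months

edges = log_bin_edges(0.1, 100, 3)  # edges near 0.1, 1, 10 and 100 years
counts = histogram(years, edges)
print(counts)
```

With linear bins of the same count, most of these values would fall into the first bin; log-spaced edges spread a right-skewed feature across bins so that peaks become visible.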

2. Bivariate Visualizations

| X | Y | Visualization | Comment |
| --- | --- | --- | --- |
| Investors | LoanOriginalAmount | Scatter | There is a direct relationship between the number of investors and the amount of the loan. A line with a positive slope can be fitted to the points. |
| IncomeRange | LoanOriginalAmount | Violin / Box | Borrowers listed as 'Not employed' and borrowers with an income in the range of 1-25K share the same loan amount distribution, with a mean of about $4000. The range of amounts increases with income. A borrower with an income in the range of 50K-75K is less likely to have a loan amount greater than $15K. The distribution of the loan amount for the 100K+ income range is quite flat; the majority in this category receive loans of just under $15K. Still, borrowers in this category are more likely to get larger amounts than those in the 50K-75K range. |
| ProsperRating_numeric | LoanOriginalAmount | Violin / Box | We can deduce that a borrower with a Prosper rating of 4 or above is more likely to get a loan higher than $10K. The majority of borrowers with a rating of 4, 5, 6, or 7 got loans in the range (5000, 15000). Surprisingly, the highest amount listed in the data is in rating 6, not 7. Nearly all borrowers with a rating of 1 get loans of $4500, although some with rating 1 got more than $5000. |
| Investors | BorrowerRate | Scatter | An inverse relationship is noticed between the number of investors and the borrower's interest rate. For the majority of borrowers with 600 or more investors, the interest rate doesn't surpass 15%. |
| ProsperRating_numeric | Investors | Bar Plot | As seen, the higher the Prosper rating, the more investors the borrower gets. Few borrowers got a rating of 7, but they still manage to attract more investors. |
| CurrentlyInGroup | LoanOriginalAmount | Histogram | Being in a group doesn't have much effect on the loan amount. The distribution is much the same for borrowers working individually and borrowers working in a group. |
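
The "direct" and "inverse" relationships in the scatter plots correspond to the sign of the slope of a fitted line. The following is a minimal sketch of an ordinary least-squares fit, assuming made-up sample points; `fit_line` is a hypothetical helper, not a function from the notebook, and the numbers are not taken from the dataset.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up points (illustrative only, not from the dataset).
investors = [1, 20, 50, 100, 200, 400, 600]
loan_amounts = [2000, 4000, 5000, 9000, 12000, 20000, 25000]
borrower_rates = [0.31, 0.28, 0.26, 0.20, 0.17, 0.13, 0.10]

amount_slope, _ = fit_line(investors, loan_amounts)
rate_slope, _ = fit_line(investors, borrower_rates)

print("Investors vs LoanOriginalAmount slope:", amount_slope)  # positive: direct
print("Investors vs BorrowerRate slope:", rate_slope)          # negative: inverse
```

A positive slope matches the direct relationship described for Investors vs LoanOriginalAmount, and a negative slope matches the inverse relationship described for Investors vs BorrowerRate.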

3. Multivariate Visualizations

| X | Y | Hue | Visualization | Comments |
| --- | --- | --- | --- | --- |
| ProsperRating_numeric | Investors | CurrentlyInGroup | Point | Having a higher Prosper rating helps you attract more investors, but combining a high rating with working in a group helps even more. At high ratings, borrowers in a group attract more investors than those working individually. |
| IncomeRange | LoanOriginalAmount | ProsperRating_numeric | Point | Taking each Prosper rating separately, we can deduce that the loan amount changes more with income level for ratings 4, 5, 6, and 7. Ratings 1, 2, and 3 don't show as steep a gradient. Surprisingly, a rating of 7 showed a lower mean loan amount than the lower ratings 5 and 6. |
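
A point plot with a hue shows one mean per combination of the x category and the hue level. The sketch below computes those means by hand, assuming a handful of made-up rows; `group_means` is a hypothetical helper and the values are illustrative, not results from the dataset.

```python
from collections import defaultdict
from statistics import fmean


def group_means(rows, keys, value):
    """Mean of `value` for every combination of the columns named in `keys`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row[value])
    return {combo: fmean(vals) for combo, vals in sorted(buckets.items())}


# Made-up rows (illustrative only, not from the dataset).
rows = [
    {"ProsperRating_numeric": 2, "CurrentlyInGroup": False, "Investors": 40},
    {"ProsperRating_numeric": 2, "CurrentlyInGroup": False, "Investors": 60},
    {"ProsperRating_numeric": 2, "CurrentlyInGroup": True, "Investors": 70},
    {"ProsperRating_numeric": 6, "CurrentlyInGroup": False, "Investors": 100},
    {"ProsperRating_numeric": 6, "CurrentlyInGroup": False, "Investors": 140},
    {"ProsperRating_numeric": 6, "CurrentlyInGroup": True, "Investors": 200},
    {"ProsperRating_numeric": 6, "CurrentlyInGroup": True, "Investors": 300},
]

means = group_means(rows, ["ProsperRating_numeric", "CurrentlyInGroup"], "Investors")
for (rating, in_group), mean_investors in means.items():
    print(rating, in_group, mean_investors)
```

Comparing the two means at each rating (in a group vs. not) is the comparison the hue makes visible in the plot.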

Key Insights for Presentation

The presentation focuses on:

  • Occupation: The most common occupation among borrowers.

    • We found out that 'Professional', 'Executive', 'Administrative Assistant', and 'Teacher' are the most common occupations.
  • Income: The ranges of income of the borrowers.

    • We found out that a large share of borrowers (34%) are in the range of 25K-50K, and 22% are in the range of 100K+. Few people are either not employed or stated that they have no income.
  • Prosper Rating: The rating given by Prosper to the borrowers.

    • We found out that rating 4 is the most common rating and it decreases on both sides until it reaches its tail at ratings 1 and 7.
  • Investors vs Loan Original Amount:

    • A direct relationship is observed between the 2 features.
  • Prosper Rating vs Loan Original Amount:

    • Borrowers with ratings of 4, 5, 6, and 7 are more likely to get loans of more than 10K.
  • Prosper Rating vs Investors:

    • The higher the Prosper rating, the more investors the borrower gets. Few borrowers got a rating of 7, but they still manage to attract more investors.
  • Prosper Rating vs Investors vs Currently In Group

    • A higher Prosper rating helps you attract more investors, but combining a high rating with working in a group helps even more.
  • Income vs Loan Original Amount vs Prosper Rating

    • We can deduce that the loan amount changes more with income level for ratings 4, 5, 6, and 7. Ratings 1, 2, and 3 don't show as steep a gradient.
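
The counts and percentages quoted for occupation, income, and rating are category counts divided by the total number of borrowers. A minimal sketch of that arithmetic, assuming a made-up list of labels; `shares` is a hypothetical helper and the sample does not reflect the real proportions:

```python
from collections import Counter


def shares(labels, top=None):
    """Return (label, count, percent) tuples sorted by count, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common(top)
    ]


# Made-up labels (illustrative only, not from the dataset).
occupations = ["Professional"] * 5 + ["Executive"] * 3 + ["Teacher"] * 2

top2 = shares(occupations, top=2)
print(top2)  # [('Professional', 5, 50.0), ('Executive', 3, 30.0)]
```

Passing `top=25` to the same helper would mirror the idea of keeping only the 25 most common occupations before plotting.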
