Nonbank online lenders now make more microloans than commercial banks. Microfinance institutions worldwide serve more than 70 million borrowers and held a total loan portfolio estimated at AUD $40 billion in 2019 (Microcredit Summit).
This trend is likely to accelerate in the 2020s as the volume of individual consumer loans grows dramatically. Thanks to advanced network technology, conforming loans are now so easy to make that online lenders have been able to outcompete banks on speed and ease of approval.
Understanding the risks of online micro-credit loans
The risks we face are mainly credit risk and fraud risk. Fraud risk can be said to be the most difficult aspect of risk control. On the one hand, a large number of start-up finance companies lack the corresponding risk control experience and capabilities, so they expose themselves to professional fraudsters, who identify business vulnerabilities and, through online engagement, can cause financial companies significant losses. On the other hand, it is also related to the continuous improvement and research strength of the fraud industry chain. Fraud has evolved from individual fraud in the past into organized, large-scale group fraud. Account theft and data leakage within the fraud network supply the basic account databases, and a complete industrial network has developed around them, including black market transactions, ID mapping, and targeted attacks. The division of labour and the technology involved are also highly professional and refined.
Credit risk is defined as the risk that the borrower defaults. In other words, it is the possibility that the borrower fails to repay the debt or loan on time for various reasons. Generally, credit risk can be analysed from two perspectives: ability to repay and willingness to repay. However, in the case of micro-credit loans, since the amount is generally between $2,000 and $50,000, a person who is employed will rarely default through inability to repay. Here, defaults stem more from repayment willingness, that is, the attitude of the borrower towards repaying the loan. Many people will borrow money and not repay it. Yet if we screen borrowers purely on default probability, many people with normal borrowing intentions can still be singled out.
Data types commonly used in big data risk control
We mainly divide the data into four dimensions: identity attributes, credit attributes, behavioural data, and consumption attributes.
Identity attributes are the most basic, including real identity information such as name, ID number, mobile phone number, bank card details, address, marital status, education, and employment history. Credit attributes cover many aspects, such as past performance records, fixed assets, current assets, income information, historical loan applications, repayment records, and overdue records. All of these are used to measure a person's repayment ability and willingness to repay. When we went to the bank to apply for loans in the past, these two dimensions were the traditional data sources for risk control. However, because most people have relatively incomplete records in this area and the process is lengthy and troublesome, only a small number of people can enjoy real-time financial loan services.
Behavioural data is commonly used in big data risk control. It covers a wide range of aspects, mainly the behavioural characteristics reflected by the user's activities in the app, address book information, travel records, and social platform data, including how often and when the user browses different categories, and the user's risk appetite. Consumption attributes are the first area of data expansion. They are mainly found in e-commerce or transaction data. For example, daily shopping products, consumption amounts, and consumption times can be analysed from different angles to assess a person's consumption stability, consumption grade, and risk characteristics such as financial capability.
Based on the data collected above, we can develop a user profile, which is a tagged user model abstracted from the user's social attributes, living habits, and consumption behaviour. "Tagging" the user is the core of the user profile. Each tag is usually a human-defined unique identifier. Using highly concise tags to describe a category of people plays a vital role in the risk control of online microfinance.
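As a rough illustration of tagging, the sketch below turns a few raw attributes into a set of concise tags. The field names, tag names, and thresholds are all hypothetical and chosen only for the example; a real system would define its own.

```python
from dataclasses import dataclass


@dataclass
class UserData:
    # Hypothetical raw attributes drawn from the four data dimensions.
    employed: bool                 # identity attribute
    overdue_count: int             # credit attribute
    monthly_spend: float           # consumption attribute (AUD)
    late_night_sessions: int       # behavioural data


def tag_user(user: UserData) -> set[str]:
    """Map raw attributes to human-defined tags (illustrative thresholds)."""
    tags: set[str] = set()
    if user.employed:
        tags.add("employed")
    if user.overdue_count == 0:
        tags.add("clean_repayment_history")
    elif user.overdue_count >= 3:
        tags.add("frequent_overdue")
    if user.monthly_spend >= 5000:
        tags.add("high_consumption")
    if user.late_night_sessions >= 10:
        tags.add("night_active")
    return tags
```

For instance, tag_user(UserData(True, 0, 6000.0, 0)) yields the tags employed, clean_repayment_history, and high_consumption, which downstream rules can then consume instead of the raw data.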
The primary methods of data acquisition
The first method is through authorization. Suppose a borrower, a blue-collar worker who does not have a credit card, needs to borrow AUD $10,000 on a peer-to-peer (P2P) platform. When the borrower comes to the platform, the platform asks for authorization and forwards the borrower, via a web page, to a third-party data company. After obtaining the person's authorization, the third-party data company starts its data scraping tool. Using an API or a web crawler, it then collects data from a third-party or fourth-party website, such as a credit card company or an e-commerce company.
The second mode is the network mode. In essence, it is the same as the bank credit investigation mode. For example, there are many device fingerprinting and Software Development Kit (SDK) companies on the market; after their code is embedded, some data is collected from the app periodically or in real time, such as the hardware of the device, the IMEI number of the device, which apps have been installed on the device, and even the location of the device.
Application of big data risk control
Thanks to advanced digital technology, many micro enterprises and individuals can obtain financial support without collateral. The challenge is how to truly extract risk representations from big data and further transform this data into real-time financial risk decision-making services.
A common method of big data risk management is face recognition. The principle is to call the API of the police system and compare, in real time, the photos or videos taken by the applicant with the ID card data held for that customer in the police system. Face recognition technology then verifies whether the applicant is the borrower in person. Fingerprint recognition and voiceprint recognition work the same way. Then there is fraud identification, which examines online application information and behaviours. Companies can use a Software Development Kit or JavaScript to collect applicants' behaviours at each step of the application, and calculate the time it takes the consumer to read the terms and conditions, fill in information, and apply for the loan. If the time taken is much shorter than the normal customer application time, such users can be flagged as fraudulent users. The time of day at which users apply is also very important. Generally, applicants who apply for loans after 11pm have a higher rate of fraud and default.
Identification based on mobile device information draws on commonly used consumption records such as bank card spending, e-commerce shopping, public utility payment records, and bulk commodity consumption. You can also refer to travel records, mobile phone bills, and premium membership spending. For example, consumption data such as the number of first-class flights taken, the level of strata fees, golf club spending, yacht club membership fees, luxury memberships, and luxury car dealership records can be used as an important reference for credit scores.
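The two timing signals described here (an application completed far faster than normal, and an application submitted after 11pm) could be sketched as follows. The function name, the 300-second typical duration, the 20% cut-off, and the assumption that "late night" runs until 5am are all illustrative choices, not values from any real lender.

```python
from datetime import datetime


def behaviour_flags(
    started_at: datetime,
    submitted_at: datetime,
    typical_seconds: float = 300.0,  # assumed normal time to complete the form
    fast_ratio: float = 0.2,         # assumed cut-off for "much shorter"
    late_hour: int = 23,             # 11pm, as in the text
    early_hour: int = 5,             # assumed end of the late-night window
) -> list[str]:
    """Return timing-based risk flags for one application."""
    flags: list[str] = []
    elapsed = (submitted_at - started_at).total_seconds()
    if elapsed < typical_seconds * fast_ratio:
        flags.append("completed_too_fast")
    if submitted_at.hour >= late_hour or submitted_at.hour < early_hour:
        flags.append("late_night_application")
    return flags
```

In practice such flags would feed into a wider rule set or model rather than decide an application on their own.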
The data from these sources plus the data provided by the applicant can form a strong profile of the applicant. Using this data set, a risk control system can be established to effectively determine the user's fraud risk, repayment willingness, and repayment ability. Taking cash loans as an example, we can classify common big data risk control rules for cash loans as follows:
(1) Cross-validation comparison
Cross-check comparison is a commonly used term in the accounting industry. In risk control, it mainly refers to using multi-dimensional data to test whether pieces of information are logically consistent with one another. For example, suppose the "income level" the user fills in is variable A, the "work location" is variable B, and the "area" of the IP address at the time of application is variable C. From the perspective of A+B, if the user fills in A with a monthly income of tens of thousands of dollars, but B shows that he works in an orchard in a regional area, we should probably suspect that the user is misrepresenting his income. From the perspective of B+C, if variable B shows that the user's work location is in Sydney, but C shows that the IP address at the time of application is in the Northern Territory, or the user frequently changes IP address while applying, we should perhaps consider the risk of financial fraud. Combining the dimensions A+B+C, if variable B shows that the customer works in the Sydney CBD, variable A shows that the monthly income is $6,000, and the IP address and application location are both in Sydney, then through the cross-validation of the three variables it can be inferred that the applicant is a middle-income office worker resident in Sydney. If you add more variables, such as the location from which the applicant's mobile phone number most frequently makes calls, you can verify the reliability of the data from more angles.
There are subtle differences between cross-checking and cross-validation comparison, but both are methods to verify the authenticity and reliability of users with multi-dimensional data. Here is just a brief example: the applicant provides the address of the company, but external data verification shows that the company is not at this address, so the user may be fraudulent.
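The A, B, and C checks above can be sketched as a small consistency function. The lookup table, income ceilings, region codes, and the threshold of three IP changes are hypothetical values made up for the example; a production system would draw them from verified external data.

```python
from dataclasses import dataclass


@dataclass
class Application:
    monthly_income: float  # variable A, self-reported (AUD)
    work_location: str     # variable B, self-reported
    ip_region: str         # variable C, observed at application time
    ip_changes: int = 0    # IP address changes seen during the application


# Hypothetical lookup: work location -> (region, plausible max monthly income)
LOCATION_INFO = {
    "Sydney CBD": ("NSW", 30000.0),
    "Regional orchard": ("NSW", 8000.0),
}


def cross_validate(app: Application) -> list[str]:
    """Return the inconsistencies found between variables A, B and C."""
    info = LOCATION_INFO.get(app.work_location)
    if info is None:
        return ["work_location_unverified"]
    region, income_ceiling = info
    findings: list[str] = []
    if app.monthly_income > income_ceiling:   # A + B
        findings.append("income_inconsistent_with_work_location")
    if app.ip_region != region:               # B + C
        findings.append("ip_region_differs_from_work_location")
    if app.ip_changes >= 3:                   # assumed threshold
        findings.append("frequent_ip_changes")
    return findings
```

An applicant reporting $6,000 a month, working in the Sydney CBD, and applying from an NSW IP address returns an empty list, which corresponds to the consistent A+B+C case described above.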
(3) Strong feature screening
Some variables carry greater weight in risk control, such as the frequency of long-term borrowing. The more often a user has borrowed, the more serious the long-term borrowing and the higher the risk of default. The location information of the mobile device can also be used for a certain degree of identification. If the device often appears in a gambling venue or gambling area in the middle of the night, the applicant's risk of involvement in gambling is higher.
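One simple way to express "greater weight" is a weighted score over a handful of strong features. The sketch below is only an illustration: the feature names, weights, and caps are invented for the example and are not calibrated on any data.

```python
# Hypothetical weights for two strong features; they sum to 1.0.
WEIGHTS = {"borrowing_count": 0.6, "night_gambling_visits": 0.4}
# Hypothetical caps: values at or above the cap count as maximum risk.
CAPS = {"borrowing_count": 10, "night_gambling_visits": 5}


def strong_feature_score(features: dict[str, int]) -> float:
    """Return a risk score between 0.0 (lowest) and 1.0 (highest)."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # Missing features count as zero; large values are capped.
        value = min(max(features.get(name, 0), 0), CAPS[name])
        score += weight * value / CAPS[name]
    return score
```

A score like this would normally be compared against a threshold, or combined with other rules, to decide whether an application needs manual review.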
(4) Risk relationship
The risk relationship mainly verifies the information of individuals associated with the applicant. For example, whether there are many blacklisted contacts in the applicant's address book, or whether the applicant's phone number or IP address has been used by another applicant.
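These relationship checks amount to set lookups against data the lender already holds. A minimal sketch follows, assuming the blacklist and the previously seen phone numbers and IP addresses are available as in-memory sets; the function name and the threshold of three blacklisted contacts are hypothetical.

```python
def relationship_flags(
    contacts: list[str],
    phone: str,
    ip: str,
    blacklist: set[str],
    used_phones: set[str],
    used_ips: set[str],
    max_blacklisted: int = 3,  # assumed threshold for "many"
) -> list[str]:
    """Return risk flags based on the applicant's associations."""
    flags: list[str] = []
    # Count distinct address book contacts that appear on the blacklist.
    blacklisted = len(set(contacts) & blacklist)
    if blacklisted >= max_blacklisted:
        flags.append("many_blacklisted_contacts")
    if phone in used_phones:
        flags.append("phone_used_by_another_applicant")
    if ip in used_ips:
        flags.append("ip_used_by_another_applicant")
    return flags
```

At scale the same idea is usually backed by a database or graph store instead of Python sets, but the checks themselves stay the same.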
In general, the applications of big data are still in the early stages of development, but we are slowly discovering their great value. Digital technology combined with data brings real-time risk control capabilities. Because of this ability, risk screening, risk control model adjustments, and financial product adjustments can all be carried out in real time, which was hard to imagine in the past. Today's finance can therefore combine with the real economy and real-time data to enhance sustainable financial capabilities. The major concerns at present are that the data is not large enough or complete enough, and how to reconcile the tension between data openness and citizens' privacy. In the future, it will be necessary to combine artificial intelligence, blockchain, the Internet of Things, and other technologies to make data immutable and to collect it in a timely manner, so as to better serve the financial sector.
All dollar amounts are in AUD.