From 28.9 million US households failing to qualify for $2,000 of incremental credit (FDIC, 2015), to 44 million adults being credit-unscorable for lack of a credit history, poor access to credit undermines mobility and equity across society. Such inequity exacerbates racial, ethnic, and economic fractures, as low-income consumers are often unbanked due to inflexible minimum account requirements, hidden service fees, and eroded trust in formal institutions. In addition, over 15% of African-American and Latino consumers lack credit histories, compared to 9% of Caucasian consumers. Such exclusion is principally responsible for the adoption of fringe banking, including loan sharks and usurers who use egregious one-time transaction costs and higher interest rates for basic financial services to perpetuate a cycle of indebtedness. Credit is seen as a cornerstone of sustainable development, as evidenced by a 27.9% rise in US job creation with microcredit (Brown, 2010) and a 176% increase in average income through the Grameen Bank (Shahidul, 2014). Driven by big data analytics, alternate risk modelling in credit analysis promises a more inclusive assessment of credit-worthiness. Given that user behavioral data provides granular insights across social media interaction, utilities, and housing, can analytics help level the gaping inequalities plaguing formal credit markets? Or would it instead abet this very inequity by unearthing and penalizing critical financial pain-points for underserved communities?
Fundamental credit exclusion stems from profit-maximizing banks exhibiting risk-averse behaviour and avoiding the small-loans market. This is motivated by the higher perceived default risk of such communities, owing to inadequate “thin-file” credit histories. The absence of real-estate-based collateral, coupled with higher overhead costs for small-scale lending, creates a business incentive to avoid this market segment, as the estimated risk exceeds the bank's bad-rate cutoffs (thresholds on the probability of default). Alternate risk modelling is vital in “enhancing” behaviour characteristics for such “thin-file” applications, allowing default predictions to differentiate between applicants and identify truly credit-worthy underbanked individuals. Primarily, new consumer-facing data sources yield better insight into payment behaviour than traditional tabular credit histories, which are limited to interactions with formal banking channels. Such alternate data spans utilities (usage and payment patterns for gas, electricity, and water), telecom (billing histories ranging from TV and mobile to broadband), fringe lending (histories of alternate credit arrangements spanning payday loans, cheque-cashing etc.), and social media interaction, amongst others. The relevance of such insights was demonstrated in a pilot initiative by the Commercial International Bank (CIB) to create credit scores for drivers based on their customer ratings and comments (Loufield, Johnson, & Ferenzy, 2018). The pilot not only added a novel segment to the bank's lending portfolio but also achieved the lowest segment-wide default rates observed. Other preliminary successes include the Fair Isaac FICO Expansion Score, which uses membership, bankruptcy, judgment, and asset data to cover almost 70% of traditionally unbanked consumers, and LexisNexis, which leverages address history and professional licensure for over 90% coverage.
Early tests of Fair Isaac's alternate scoring, measuring repayments against defaults within each score bracket, showed promising empirical evidence (Figure 1). As the expansion score increases, the ratio of repayments to defaults increases significantly, so higher scores correspond to more credit-worthy applicants. The newer models therefore exhibit clear risk separation. Oliver Wyman aptly summarizes the intended consequence of alternate data as better default recognition, with more confident and accurate probabilities (Carroll & Rehmani, 2017), as presented in Figure 2. “Thin-file” credit histories yield flatter default risk curves because applicants cannot be distinguished on so few data points, leading to rejection of a majority of the applicant pool. Alternate data, by opening new dimensions for comparison (such as the extent of social media interaction, utility payments, and mobile usage statistics), adds predictive power to existing credit records. This added separation helps achieve a steeper risk curve that identifies true delinquency risk and enables a larger percentage of the lending pool to fall within the bank’s risk tolerance.
Figure 1: Fair Isaac Good:Bad Ratio Results (Schneider & Schutte, 2007)
Figure 2: Risk Separation with Alternate Data (Carroll & Rehmani, 2017)
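The good:bad ratio behind Figure 1 can be illustrated with a minimal sketch. The function name, the bracket width, and the sample records below are hypothetical and are not drawn from Fair Isaac's data; the sketch only shows how repayments and defaults might be tallied per score bracket to check for risk separation.

```python
from collections import defaultdict


def good_bad_ratio_by_bracket(records, bracket_width=100):
    """Return {bracket_lower_bound: repayments / defaults}.

    `records` is an iterable of (score, defaulted) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # bracket -> [goods, bads]
    for score, defaulted in records:
        bracket = (score // bracket_width) * bracket_width
        counts[bracket][1 if defaulted else 0] += 1

    ratios = {}
    for bracket in sorted(counts):
        goods, bads = counts[bracket]
        # A bracket with no defaults has no finite good:bad ratio.
        ratios[bracket] = goods / bads if bads else float("inf")
    return ratios


# Made-up illustrative data: (score, defaulted)
sample = [
    (510, True), (520, True), (530, False), (540, False),
    (610, True), (620, False), (630, False), (640, False),
    (710, False), (720, False), (730, False), (740, True),
    (745, False), (748, False),
]

ratios = good_bad_ratio_by_bracket(sample)
for bracket, ratio in ratios.items():
    print(f"{bracket}-{bracket + 99}: good:bad = {ratio:.1f}")
```

In this toy data the ratio rises from 1.0 to 3.0 to 5.0 across the three brackets, which is the pattern of risk separation described above. A flat sequence of ratios would instead indicate that the score does not distinguish applicants.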
In addition, the improved separation stems from alternate data’s ability to eliminate arbitrary “lumping” policies in formal credit. “Lumping” refers to reporting practices where events with vastly different triggers are assumed to carry equivalent predictive value. For example, foreclosures arising from illness and crippling medical bills are not statistically equivalent to those caused by a national economic collapse or by poor real-estate investing. However, treating them as a single proxy for delinquency unfairly penalizes customers for “Black Swan” events, that is, unpredictable and infrequent events outside their control. Specifically, a study by the Federal Reserve found that most customers with impaired credit did not engage in credit-harming behaviour again (Bos & Nakamura, 2013). Furthermore, in an experimental lending programme for customers with negative details on their credit reports, defaults were under 27% for two years, suggesting that the events causing the impaired credit often involved circumstances beyond the individual's control that do not reflect their true default probability. Alternate data can identify instances where “lumping” has occurred, principally through newly collected variables that differ across events previously reported as identical. Such variables carry high information entropy, separating historic events at a finer granularity to better predict future default.
Additionally, alternate analytics helps shorten the time-frame used to ascertain credit-worthiness. Historically, for applicants with limited credit histories, banks have had to rely on long histories with few data points, a method that can unfairly penalize periods of significant financial distress. As the National Consumer Law Center (NCLC) notes, in the aftermath of the foreclosure crisis of 2008-2009, over 4.5 million families facing foreclosure in the United States had their credit scores significantly degraded, with those mortgage defaults remaining vital factors in traditional credit models for the subsequent decade (Wu, 2013). Traditional modelling hence serves to exclude over 70% of defaulters from credit markets for 10-year horizons (Hedberg & Krainer, 2012), with scores not returning to pre-foreclosure levels for over seven years (Brevoort & Cooper, 2010). By contrast, access to utility and telecom payment patterns and fringe banking histories provides a more recent and detailed perspective on delinquencies, or the lack thereof. This larger pool of information allows alternate models to weight recent behaviour more heavily, which may better characterize the applicant's present credit reliability without being overly skewed by historic outliers that no longer reflect the individual’s financial health.
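One simple way to picture such preferential weighting is an exponential decay on the age of each payment event. The sketch below is an assumption for illustration only: the function name, the 12-month half-life, and the sample history are hypothetical and do not describe any lender's actual model.

```python
def recency_weighted_delinquency(events, half_life_months=12.0):
    """Return the recency-weighted share of missed payments.

    `events` is a list of (months_ago, missed) pairs. Each event's weight
    halves for every `half_life_months` of age, so older events count less.
    """
    if not events:
        raise ValueError("at least one payment event is required")

    weighted_missed = 0.0
    total_weight = 0.0
    for months_ago, missed in events:
        weight = 0.5 ** (months_ago / half_life_months)
        total_weight += weight
        if missed:
            weighted_missed += weight
    return weighted_missed / total_weight


# Hypothetical applicant: two missed payments 3-4 years ago, on time since.
history = [(48, True), (36, True), (12, False), (0, False)]

unweighted = sum(missed for _, missed in history) / len(history)
weighted = recency_weighted_delinquency(history)
print(f"unweighted: {unweighted:.3f}, recency-weighted: {weighted:.3f}")
```

With equal weights, half of this applicant's recorded payments were missed. With the assumed 12-month half-life the weighted share falls to roughly 0.11, because the old delinquencies contribute far less than the recent on-time payments.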
However, the implementation of alternate analytics is often constrained by practical concerns about data accuracy. In an initiative where NCLC reporters ordered their own “alternate” credit data from four major data brokers, 13 of 15 reports studied were riddled with inaccuracies, including in personal identifiers (address, mobile details etc.), employment sector classification, and salary estimates. Such errors stem from the difficulty of linking often-incomplete consumer data across several information sources (“fuzzy matching”). The resulting noise in the dataset causes greater estimation errors in credit models and renders them unfit for deployment at scale. Secondly, alternate modelling risks exacerbating inequality by capturing key financial distress factors for underserved individuals. For example, the inclusion of utility payment patterns can expose late payments within 30-60 day windows, often involving low-income households with volatile incomes that delay paying summer and winter bill spikes to manage other critical expenses like rent and food. In addition, the use of educational data impedes mobility and can discriminate against groups affected by systemic racism: in the United States, over 36% of non-Hispanic whites have a college education, whilst only 16% of Hispanics and 23% of African-Americans do. Similarly, over 40% of non-Hispanic whites have managerial or professional employment histories, as opposed to 30% of African-Americans (Wu, 2013). Moreover, since individuals are likely to interact with others of similar socio-economic, cultural, or racial background, the use of social media histories becomes self-fulfilling, grouping entire social circles into high-risk buckets. The issue was recently illustrated by Lenddo, a credit firm attempting to judge individual credit-worthiness as a function of social media interaction (friendships, messages, networks etc.) with any existing loan defaulter. Given residential housing segregation, the inclusion of telecom location data may also magnify societal inequity and asymmetry by serving as a proxy for race or income.
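The “fuzzy matching” problem raised in this section can be made concrete with a small sketch using Python's standard `difflib` module. The names, addresses, and the 0.85 threshold are made up for illustration, and real data brokers' linkage methods are not described in the sources cited here.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates_above(query, candidates, threshold=0.85):
    """Return every candidate whose similarity to `query` meets the threshold."""
    return [c for c in candidates if similarity(query, c) >= threshold]


def link_record(query, candidates, threshold=0.85):
    """Return the single best candidate at or above the threshold, else None."""
    plausible = candidates_above(query, candidates, threshold)
    if not plausible:
        return None
    return max(plausible, key=lambda c: similarity(query, c))


# Hypothetical broker records for two different people.
broker_records = ["John Smith, 12 Oak St", "Joan Smith, 12 Oak Rd"]
incoming = "Jon Smith, 12 Oak St"  # incomplete or misspelled consumer record

plausible = candidates_above(incoming, broker_records)
linked = link_record(incoming, broker_records)
print(plausible)
print(linked)
```

In this example both broker records clear the assumed threshold, even though they belong to different people. The best-scoring record happens to be the right one here, but a slightly different misspelling or a missing field could attach one person's payment history to another's file, which is the kind of noise that degrades model estimates.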
This tendency for alternate modelling to reproduce societal bias ties into the black-box argument in machine learning. The difficulty of recognizing the exact factors driving probability estimates, combined with ingrained biases in training data, creates vehicles to perpetuate discrimination, or as Cathy O’Neil pointedly puts it, “weapons of math destruction”. Mitigating these pitfalls has prompted novel scoring designs, particularly “second-chance scoring”, pioneered by UltraFICO. “Second-chance scoring” is a voluntary opt-in service that re-evaluates only those applications rejected by traditional modelling, using custom analytics to verify the possibility of approval at better rates. Hence, the new analytics can only benefit, not harm, the applicant's credit-worthiness. By leveraging partnerships with Finicity, Experian, and the National Consumer Telecom and Utilities Exchange, UltraFICO presents an unequivocally revolutionary product to responsibly integrate behavioral analytics to redefine credit.
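The decision flow of “second-chance scoring” as described above can be sketched in a few lines. This is an illustrative assumption rather than UltraFICO's actual implementation: the function name, the single shared cutoff, and the example scores are hypothetical.

```python
def second_chance_decision(traditional_score, alternate_score, cutoff, opted_in):
    """Return a lending decision under a second-chance design.

    The alternate score is consulted only when the traditional score fails
    the cutoff and the applicant has opted in, so it can never turn an
    approval into a decline.
    """
    if traditional_score >= cutoff:
        return "approved"  # alternate data is never consulted
    if opted_in and alternate_score is not None and alternate_score >= cutoff:
        return "approved_second_chance"
    return "declined"


# Hypothetical applicants evaluated against an assumed cutoff of 620.
print(second_chance_decision(700, 550, cutoff=620, opted_in=True))
print(second_chance_decision(600, 650, cutoff=620, opted_in=True))
print(second_chance_decision(600, 650, cutoff=620, opted_in=False))
```

The first applicant is approved on the traditional score even though the alternate score is lower, which reflects the "benefit, not harm" property. The second is approved only through the opt-in re-evaluation, and the third is declined because the alternate data is never used without consent.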
To conclude, by harnessing the volume, velocity, and variety of big data, alternate analytics presents disruptive solutions to the fissures plaguing modern banking. By capturing enhanced behavioural insights, resolving pre-existing “lumpings”, and shortening prediction time-frames, this information paradigm can fundamentally restructure the cornerstones of credit-worthiness and democratize the narrative of social mobility. While model biases, discrimination, and data precision issues temper this optimistic narrative, novel “second-chance scoring” designs refocus attention on the upside that the reimagination of risk modelling heralds. As Fair Isaac, UltraFICO, and LexisNexis begin to dominate this landscape, conventional consumer banking truly stands at the precipice of a paradigm shift.