Machine Learning May Destroy Objective Truth on the Internet Forever

In today’s technological culture, the biggest and loudest voices are talking about artificial intelligence (AI). While it is considered a massive part of the future of tech, there is a lot of confusion about what exactly it is and how it will deliver the future it seems to promise. In actuality, AI is an umbrella concept that refers to anything connected to making machines smarter. Its much less commercialized sibling, machine learning (ML), specializes in the process by which those machines learn. ML essentially lets an algorithm examine past data and use the patterns it finds to predict future outcomes. As such, the philosophy behind AI is handing decision making over to machine learning and using data to automate those decisions at a speed and frequency unlike anything before.

At this point, it is increasingly clear that machine learning is becoming a foundational technology across the globe. With the amount of data embedded in business operations, there has never been more potential for companies to make smarter and faster decisions. It isn’t surprising, then, that modern startups are prioritizing machine learning in their own software solutions, backed by some of the biggest companies in the world. In May 2020, Intel announced an investment of $132 million into 11 disruptive technology startups - all of them focused on building machine learning applications.

However, startups investing in new technology is nothing new. The real kicker comes when people realize that this technological shift is taking place not only in startups, but in legacy institutions - by and large the ones that wait out new technology until it becomes unavoidable. For example, earlier this year, CIBC - one of the oldest banks in Canada, infamously known for its outdated technological stack - announced a “Digital Transformation” which would use “scalable computing power for CIBC's enterprise data lake and [an] AI/ML platform to power smart, innovative client solutions.” While these institutions move notoriously slowly, even their choice to dip a toe into the technology means they know they will soon need to traverse a data-driven landscape one way or another. And they aren’t the only ones. As early as 2019, 70% of companies in the USA either had a digital transformation strategy in play or were working on one, and these companies spent more than $2 trillion on it. Anyone who follows the market can see where the investment is actually ending up.

“In 2020, the global artificial intelligence market size was valued at $62.35 billion and is now expected to expand at a compound annual growth rate (CAGR) of 40.2% from 2021 to 2028.” Not to mention that the global machine learning market itself is projected to grow from $15.50 billion in 2021 to $152.24 billion in 2028 at a CAGR of 38.6%.

The amount of money and manpower being poured into machine learning shows that it is not a fad, but a reality that will manifest in the consumer landscape as a common tool sooner than we know. However, by its very nature, ML is a double-edged sword that calls into question the underlying consequences of increased dependence on algorithmic decision making.

The problem with giving any form of AI control in an already complicated digital age is the difficulty of verifying whether its decisions are neutral and unbiased. Handing the reins to ML may damage that neutrality for good. The way businesses and governments operate, and the way they communicate with the public, would shift in incredibly dangerous ways. While the rise of machine learning plants the seeds of a new technological era, it also builds a foundation with the potential to spread bias and misinformation to a degree unparalleled in human history.

The Black Box of Machine Learning
The successful growth of new technologies is usually a sign of a prospering economy and a forward-minded community. The problem lies not in the renewed interest in transforming the digital landscape, but in using machine learning to do so. As AI becomes more and more powerful, it can certainly drive extremely positive changes and transform economic, social, and political landscapes, but it can also pose significant risks and become a hotbed of unintentional misuse.

What makes machine learning different from traditional technologies is that it essentially hands ownership of decision making over to AI. While the input and output are controlled - developers choose what goes in and what reaches consumers and the company - it is very difficult to determine how the technology arrived at a specific decision. For example, if a banking institution used a machine learning algorithm to determine whether users were likely to close their accounts, its input could be “transaction data” and its output a simple “yes or no”. What goes in and what comes out is controlled and measured, but how the system actually reaches that conclusion is far more complicated.
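The setup above can be sketched in a few lines of Python. Everything here is hypothetical - the feature names, the synthetic “transaction data”, and the tiny perceptron all stand in for whatever a real bank would use - but it shows the asymmetry: the inputs and the yes/no output are fully controlled, while the learned numbers that produce the answer explain nothing on their own.

```python
import random

random.seed(0)

# Hypothetical churn data. Each row is [monthly_logins, avg_balance_in_thousands];
# label 1 means the account was closed. Churners are simulated as low-activity users.
def make_row(churned):
    if churned:
        return [random.uniform(0, 3), random.uniform(0, 2)], 1
    return [random.uniform(4, 10), random.uniform(3, 8)], 0

data = [make_row(i % 2 == 0) for i in range(200)]

# Train a single perceptron. The developer controls what goes in and what
# comes out, but the learned weights are not a human-readable explanation.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):
    for x, y in data:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = y - pred
        w = [w[i] + lr * err * x[i] for i in range(2)]
        b += lr * err

def predict(x):
    return "yes" if w[0] * x[0] + w[1] * x[1] + b > 0 else "no"

print(predict([1.0, 0.5]), predict([8.0, 6.0]))  # a low-activity and a high-activity user
print(w, b)  # the model's "reasoning": three opaque numbers
```

Even in this toy case, the only artifacts left after training are three floating-point numbers; a real model with millions of weights is correspondingly harder to interrogate.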

What this means is that machine learning applications sometimes have the potential to develop in unexpected and undetectable ways, and that makes machine learning, by its very definition, a black box technology.

“A black box is a device, system or object which can be viewed in terms of its outputs without any or limited knowledge of its actual inner workings.” In many cases, deep machine learning can be monitored by professionals. There is a long and arduous process by which ML engineers develop learning algorithms - written code that evaluates the data and draws conclusions from it. But the actual programming behind ML, and how it “learns”, is complex and understood by very few. Hence, even in the hands of an expert, there is cause for concern in blindly introducing this technology into every industry.

Even if you assume that everyone using machine learning software at every company is qualified to maneuver such a complex technology - which is not the case - the inherent nature of the software is about giving AI the reins. The eventuality of that choice is that the system will sooner or later outrun the person, no matter how closely they keep watch. AI is built to operate neutrally, yet sometimes that very neutrality becomes an initiator of bias and misinformation. AI has very little understanding of ethics, which is largely a human concept, and when you give it control over decision making and then lose track of how it makes those decisions, chaos ensues. When this happens, machine learning becomes far more dangerous than people give it credit for.

This is when a phenomenon known as algorithmic bias comes to the forefront, and becomes something that - if undetected and unmanaged - can significantly harm our economic, social, and political landscape as we know it.

Algorithmic Bias
To understand algorithmic bias, one needs to understand algorithms. Algorithms are a set of instructions for accomplishing a task. In machine learning, they help dictate how a machine processes data to produce its predictive outputs. In other words, they are the brain behind machine learning and the method by which machines learn to think for themselves.

Algorithmic bias can be understood as a “deviation from some standard of fairness” - a systematic and repeatable error in a system that results in unfair or wrongful data processing. There are a few common types of algorithmic bias:

Negative Legacy is bias that comes directly from bias present in the input “training data” - the data a machine learning model uses to learn to make its predictions. For example, a study by Princeton researchers found that ML models trained to perform language translation tasks on input texts reflecting traditional gender tropes produced outputs that associated female names with attributes like “parents” and “wedding” and male names with “professional” and “salary”.

Algorithmic Prejudice is the correlation between protected features and other factors. For example, early policing algorithms did not have access to racial data when making predictions, but the models relied heavily on geographic data (e.g. zip code), which is correlated with race, since communities of the same race often congregate in similar areas. In this way, models that are “blind” to demographic data like gender and race can still encode this information through other features that are statistically correlated with protected attributes.

Underestimation is bias resulting from insufficient data. For example, an earlier ML model Amazon used to screen applicants in the hiring process ended up largely favouring male applicants over female ones, because Amazon’s disproportionately male workforce made up most of the model’s sample input.
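The second type above - algorithmic prejudice - is easy to demonstrate with a toy simulation. All of the numbers below are invented: a synthetic population in which a protected attribute correlates strongly with a zip-code indicator, and a scoring rule that never sees the protected attribute yet still scores the two groups very differently.

```python
import random

random.seed(1)

# Invented population: a protected attribute that the model never sees,
# and a zip-code indicator that happens to be strongly correlated with it.
population = []
for _ in range(1000):
    protected = random.random() < 0.5                      # hidden from the model
    zip_a = random.random() < (0.9 if protected else 0.1)  # the proxy feature
    population.append((protected, zip_a))

# A "blind" scoring rule that only ever looks at the zip code...
def risk_score(zip_a):
    return 0.8 if zip_a else 0.2

def avg(xs):
    return sum(xs) / len(xs)

# ...still produces very different average scores for the two groups.
avg_protected = avg([risk_score(z) for p, z in population if p])
avg_other = avg([risk_score(z) for p, z in population if not p])
print(round(avg_protected, 2), round(avg_other, 2))  # roughly 0.74 vs 0.26
```

Dropping the protected column changes nothing here, because the zip-code feature carries almost the same information - which is exactly why “fairness through blindness” fails.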

The Parasitic Impact of Algorithmic Bias
Machine learning is now coming to the forefront of every part of our social lives, affecting political choices, personal finance, health care, education, and pretty much anything else you can think of. Algorithmic bias matters because it can unintentionally inject discrimination and misinformation into our most frequently used systems. The numerous instances of this are often left largely undiscussed, swept behind the headlines of machine learning’s numerous benefits.

For example, a previous iteration of Google’s Vision AI image classifier incorrectly labelled an African American couple as gorillas, and later labelled a picture of a black hand holding up a monocular as a black hand holding up a gun. White couples, and a white hand holding up the same monocular, were classified correctly. Both instances were evidence of underestimation: the sample was not diverse enough for the model to avoid making unintentionally racist predictions. This kind of issue might seem small in scope, but Google’s Vision AI was used in multiple areas before the problem was fixed; it had been deployed by large police forces, schools, supermarkets, and even apartment complexes. The total number of incorrect classifications per year by computer vision and machine learning is still unknown, but some specific and horrifying examples have come to the forefront. One involved a teenager named Ousmane Bah, who was wrongly accused of theft at an Apple Store; another was Amara K. Majeed, who was wrongly accused of involvement in the 2019 Sri Lanka bombings. In both cases, the cause was incorrect facial recognition.

The public sector faces some of the same issues. In the USA, the Department of Justice employs machine learning in criminal risk assessment algorithms to calculate recidivism (the odds of reoffending) and determine different aspects of punishment and parole. At the Data for Black Lives conference in 2018, research revealed that because there were more African American inmates in the prison system, algorithms that used the prison population as input data tended to give that majority harsher recidivism scores. Giving harsher scores to a demographic with greater representation in the data is not in itself unreasonable; the problem is that this group is overrepresented as a result of biased policing and decades of institutional racism and discrimination in the USA. The terrifying realization is that the algorithm would put more African Americans in prison and then use that inherently biased data to make further predictions about other African Americans - a cyclical, racially biased criminal justice system.
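That cycle can be made concrete with a deliberately simple toy model (all numbers invented). Suppose two equally sized groups offend at identical rates, each group’s “risk score” is just its share of past arrest records, and enforcement attention scales with that score - so each cycle’s new arrests grow with score times exposure. The initial overrepresentation then feeds on itself:

```python
# Toy feedback loop: group A starts overrepresented in arrest data (share 0.6,
# inflated by biased policing). Each cycle, its risk score is its data share,
# enforcement scales with the score, and the new arrests go back into the
# data - so arrests ~ share * share for each group.
share_a = 0.6
history = [share_a]
for _ in range(5):
    arrests_a = share_a * share_a                 # score x enforcement exposure
    arrests_b = (1 - share_a) * (1 - share_a)
    share_a = arrests_a / (arrests_a + arrests_b)
    history.append(share_a)
print([round(s, 3) for s in history])  # the share ratchets upward every cycle
```

A real system is far messier than this two-line update rule, but the qualitative behaviour is the point: a model trained on its own downstream effects does not correct an initial bias, it amplifies it.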

Without anyone actively realizing it, ML malpractices are now sending people to jail and keeping them there for all the wrong reasons. Considering that the total reported number of inmates in US prisons was over 3 million in 2019, that is a lot of lives left to the whim of a technology people don’t fully understand. The consequences of technological progression are sometimes forgotten in the desire to progress and grow.

When software like this becomes problematic, it creates a larger systemic issue with the organizations that end up employing it. Companies that have used faulty ML in the past have had to deal with reputational damage, lost opportunities, and a lack of trust from users.

With regard to reputational damage, Amazon’s AI-infused hiring practices came under harsh criticism in 2018 when it was revealed that their ML model was biased against women, having been trained on primarily male resumes.

The Opportunity Cost Conundrum
Economic damage is also a common pitfall when businesses respond to ML and AI related misuse. In January 2020, Facebook was ordered to pay $550 million to settle a class action lawsuit over its unlawful use of facial recognition technology. Companies stand to lose significant sums over long periods as ML causes ethical issues on larger and larger scales, and the financial loss from reputational damage can take a long time to recover from.

Reputational damage and that lack of trust from users stand starkly at odds with the potential millions to be gained from successful ML implementations. Benefits in the “form of revenue growth, time, capital and efficiency savings can range from between $250,000 to $20 million” and the “ROI (Return on Investment) on most standard machine learning projects in the first year is 2-5 times the cost”.

This dilemma is what makes balancing the demand for machine learning against the risks of getting it wrong an open conversation for businesses - not a done deal. With end users likewise torn between the benefits of the software and its potential privacy and ethical concerns, it is clear there is no easy answer for dealing with an innovative technology that can both progress and regress society.

The Chain of Bias
At the end of the day, machine learning, like most technologies, is complicated. It remains a powerful engine of change in this world, but that change will not always be positive. In a world already bursting at the seams with misinformation, the dangers of machine learning accurately reflect the times. If the technology posed only one-off, “one and done” issues, perhaps its progression wouldn’t be a concern. But one big thing a lot of people seem to miss is that machine learning, and more specifically algorithmic bias, forms a powerful chain of events that pervades every other system a company uses.

In essence, bias in machine learning compounds. If biased data is introduced at any point in the ML lifecycle of a product, it will affect every system that product communicates with. Just as humans spread our biases, so does machine learning. And if the technology is truly a reflection of our times, it raises the question of just how wise it is to trust a communicating network of AI technologies to do a job that sometimes we humans can’t even do ourselves.

The answer for how exactly the world should react to the era of machine learning is honestly unclear. While machine learning is without a doubt a complex and potentially dangerous technology, it is one against which there may be no permanent recourse. Instead of discussing solutions against the technology, perhaps people should prepare themselves for a new world order - the same kind of shift that took place when the Internet became as crucial to society as it is now. Perhaps more effort