Introduction
Our interactions with the world increasingly depend on personal data. We expect our food delivery app to know our location, our Google searches to return exactly what we are looking for on the first page of results, and our doctors to have access to our entire medical history during a visit so they can make a timely and accurate diagnosis. These expectations allow companies like Uber, Athenahealth, and Google to become the owners of massive amounts of data, which they collect, store, and process. An investigation by MIT Sloan Management Review estimated that the volume of data stored by data-centric companies is increasing by 40% each year (Short & Todd, 2017). Google alone is estimated to process 3.5 billion searches per day, which equates to 20 petabytes of data (Heshmore, 2017). For comparison, 20 petabytes is roughly one million times the capacity of a 20 GB smartphone.
For data-centric companies, data is not only needed to run their own technology; it is also a secondary source of income and a contributor to the company's value. These companies have physical offices, labs, and equipment, which are obvious tangible assets that can easily be assigned a monetary value based on comparable market prices. However, the true value of a data-centric company lies in its data, which is less tangible and harder to value properly.
Data is arguably the largest overlooked intangible asset in the current technology market. Although the importance and value of data have come to light in recent years, valuing data effectively is not as simple as finding an equivalent asset or transaction from which to derive its value. Society's current view and treatment of data prevent it from being valued correctly: data is not treated as a unique entity, and transactions involving data do not receive the assessment strategies and security measures that other high-value assets receive. The true value of data will only be understood after society fundamentally shifts its understanding and treatment of data. However, this shift requires changes in human behaviour and technological infrastructure, both of which are often slow to change.
Increasing Value Through Structured Sharing
Proprietary data gives companies in certain sectors a competitive advantage. However, when data is “shared” like any other commodity, it becomes easier to place a tangible value on it. Although this idea applies to many kinds of data, such as e-commerce, traffic, and urban development data, healthcare is an excellent candidate to demonstrate this point. Healthcare systems generate a high concentration of publicly funded data through research activities and trials, and this data often resides in “fit-for-purpose silos”, where it is applied only to the research it was collected for and is not used in any other application (Deloitte, 2021). Yet technology companies like Google have shown that this kind of data is valuable in parallel research applications through their use of Google Fit and Fitbit data to advance research within Google Health (Lomas, 2020).
Data marketplaces can help break down data silos and maximize the usefulness of these datasets. A data marketplace is a platform where teams looking for data can purchase existing datasets. The marketplace not only facilitates the connection but also creates a tangible, recorded transaction that places a direct monetary value on the data (Deloitte, 2021). As more of these transactions occur, it becomes easier to place a monetary value on data, because past transactions provide comparable market prices on which to base current offerings. These marketplaces also increase the value of the data for its original holders: the data is valuable not only in the context of their research but also has a secondary, monetary value that can be realized in the marketplace (Deloitte, 2021). Additionally, data marketplaces lower barriers to entry in markets where collecting the required data is expensive and time-consuming, as new entrants can purchase existing data instead of collecting their own.
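The comparable-transactions idea can be illustrated with a small calculation. The sketch below is hypothetical: the `Sale` record, the assumption that price scales with record count, and the use of a median are illustrative choices, not a valuation method described by Deloitte (2021). It estimates a price for a new dataset from the median price per record of past sales in the same category.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sale:
    """A hypothetical record of one completed data-marketplace transaction."""
    category: str   # e.g. "oncology", "traffic"
    records: int    # number of records in the dataset sold
    price: float    # price paid, in dollars


def estimate_price(history: list[Sale], category: str, records: int) -> float:
    """Estimate a dataset's price from comparable past sales.

    Assumes price scales linearly with record count and uses the median
    price per record so that one unusual sale does not dominate.
    """
    per_record = [s.price / s.records for s in history if s.category == category]
    if not per_record:
        raise ValueError(f"no comparable sales for category {category!r}")
    return median(per_record) * records


history = [
    Sale("oncology", 10_000, 50_000.0),    # $5.00 per record
    Sale("oncology", 4_000, 24_000.0),     # $6.00 per record
    Sale("oncology", 20_000, 80_000.0),    # $4.00 per record
    Sale("traffic", 1_000_000, 20_000.0),  # $0.02 per record
]

print(estimate_price(history, "oncology", 8_000))  # 40000.0
```

With no recorded sales in a category the function cannot produce an estimate at all, and with only one sale the estimate rests on a single data point; each additional recorded transaction gives the estimate more comparables to draw on.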
The idea of data marketplaces has started to emerge in the Canadian healthcare system: Canada's Digital Technology Supercluster has funded the Secure Health and Genomics Platform, which will create new digital capabilities for using health and genomic data to improve the diagnosis and treatment of patients (Deloitte, 2021). The platform will focus first on cancer and difficult-to-treat diseases, then expand into other areas of clinical practice, health, and wellness (Deloitte, 2021). Its aim is to make data easier to find, share, and develop into insights that improve the health and wellness of patients (Deloitte, 2021).
Data sharing increases innovation by lowering barriers to entry, and it speeds innovation up because data does not need to be collected from scratch as often.
The most obvious example of how sharing data can expedite innovation lies in the response to COVID-19. Israel's deal with Pfizer is a direct example of a transaction in which data was treated as a valued asset, equivalent to money: Israel agreed to share data in exchange for the opportunity to vaccinate its citizens before many larger and wealthier countries (Laurent, 2021). Through this transaction, Pfizer gained access to “aggregated epidemiological data”, which allowed it to monitor the effectiveness of its vaccine in real time and to determine the vaccine's efficacy against COVID-19 variants faster, and with more confidence, than its competitors (Laurent, 2021).
During the COVID-19 pandemic, data sharing within and across nations allowed for a quick response, more accurate tracking, and record-breaking vaccine development. COVAX, a collaboration of 92 countries that agreed to work together to accelerate testing and vaccine development, operates as part of the ACT Accelerator, which is aimed directly at facilitating the sharing of data among researchers and manufacturers (“Covax”, 2021).
Although COVID-19 data was not shared through data marketplaces, the pandemic response demonstrates the power of breaking down the data silos that currently exist in society. Data marketplaces will not only provide a structured, centralized location for businesses and research teams to share data, which can help accelerate innovation, but will also involve the exchange of money. As the number of data transactions increases, it will become easier to assign data a monetary value based on historical transactions or on transactions for similar data assets, creating a clearer, more accurate valuation of data as an asset.
The Data-Centric Development Approach
Looking at how data is handled at the institutional level also reveals areas where its treatment needs to change before it can be valued correctly. One of the most popular approaches in software development today is domain-driven development, which places business logic at the centre of the application: the application is designed and built around that logic. A simple example is the log-in process for a social media site. When building it, developers focus on the specific text boxes and buttons you need to log in, as well as on security measures that prevent other people from accessing your account. However, because it focuses on business logic, domain-driven development makes storing multiple copies of data a common practice. In the log-in example, multiple copies of your username, first and last name, and phone number could be stored within the application, and developers are free to make more copies whenever they feel it is necessary.
The fundamental problem with this approach is that neither the database nor the data it contains is treated as a unique entity. When a company develops multiple applications or products, each one has its own database, where data may be copied, exported, imported, or changed at will. This degrades the value of the data, as many copies can exist in different states of accuracy (never updated, updated a year ago, or fully up to date), and nothing limits how many further copies the development team may create.
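The staleness problem can be shown with a toy example. This is a hypothetical sketch, not a description of any real system: two applications, whose in-memory stores are named `billing_db` and `support_db` purely for illustration, each keep their own copy of a customer record, and an update made through one never reaches the other.

```python
import copy

# A customer record as first collected at sign-up (fictional data).
original = {"user_id": 42, "name": "Jane Doe", "phone": "555-0100"}

# Each application keeps its own independent copy of the record.
billing_db = {42: copy.deepcopy(original)}
support_db = {42: copy.deepcopy(original)}

# The customer changes their phone number through the billing app only.
billing_db[42]["phone"] = "555-0199"


def conflicting_fields(a: dict, b: dict) -> dict:
    """Return the fields whose values differ between two copies of a record."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


print(conflicting_fields(billing_db[42], support_db[42]))
# {'phone': ('555-0199', '555-0100')}
```

Neither copy records which one is current, so anything reading from `support_db` has no way to tell that its phone number is out of date.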
A thought experiment about data copying that has recently emerged compares data to other valuable assets such as cash or gold. To preserve their value, assets like cash and gold are not freely copied; in fact, producing counterfeit currency is illegal. If the true value of data is to be realized, data also needs to be treated as a valuable asset that should not be freely copied. This realization poses a significant problem for the software community, which has long copied and moved data freely in order to extend existing products or build new ones.
A solution to this problem lies in a fundamental paradigm shift. The data-centric development approach encourages developers to treat individual pieces of data, and the database as a whole, as unique entities. This is achieved by placing the database at the centre of the application(s) and requiring developers to store all of the data in one location. This removes the need to copy data and avoids the problem of inaccurate copies, as there is only one copy at any given time. Under this approach, data is treated as the primary and permanent asset, making it a Single Source of Truth (SSOT).
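A minimal sketch of the single-source-of-truth idea follows, assuming a single in-memory store stands in for the central database. The `UserStore` class and the `billing_contact` and `support_contact` functions are hypothetical names invented for this illustration; a real system would use a shared database with access control. Applications keep only the `user_id` and read every other field from the store, so a change is visible to every application as soon as it is written.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    name: str
    phone: str


class UserStore:
    """Hypothetical central store: the only place user data is kept."""

    def __init__(self) -> None:
        self._users: dict[int, User] = {}

    def put(self, user: User) -> None:
        self._users[user.user_id] = user

    def update_phone(self, user_id: int, phone: str) -> None:
        self._users[user_id].phone = phone

    def get(self, user_id: int) -> User:
        return self._users[user_id]


# Applications hold a reference (the id), never their own copy of the record.
def billing_contact(store: UserStore, user_id: int) -> str:
    return store.get(user_id).phone


def support_contact(store: UserStore, user_id: int) -> str:
    return store.get(user_id).phone


store = UserStore()
store.put(User(42, "Jane Doe", "555-0100"))
store.update_phone(42, "555-0199")

print(billing_contact(store, 42), support_contact(store, 42))  # 555-0199 555-0199
```

Because `get` returns the stored object itself, callers could still modify it directly; a production version would likely restrict writes to the store's own methods so that every change goes through one controlled path.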
This paradigm allows development teams to adopt an overarching philosophy that treats data as the high-value asset it is. This psychological and practical shift allows the true value of data to emerge, as data becomes an asset that needs to be carefully managed and respected by its holders. The data itself gains value because there is only one copy, and the paradigm discourages further copying and uncontrolled changes.
The data-centric approach also increases the value of the data for consumers and stakeholders within the company, because the stored data becomes more reliable and accessible. As an SSOT, the data is easy to access since it is stored in one location, and there is no longer any question about which database is the most representative. Additionally, the data is always up to date, because updates and changes are made directly to the SSOT; this removes the update pipelines and nightly cron jobs that domain-driven development relies on to keep copies in sync. Overall, this approach allows for a faster, more accurate, and more secure product for consumers, and more accurate metrics for company stakeholders.
Realizing the Value of Personal Data
Although awareness of internet safety and the sharing of personal data has risen in recent years, the general public is still largely unaware of the true value of their personal data. This lack of awareness is one of the factors that allows corporations to treat data carelessly and leave it unprotected, as data providers are not informed enough to advocate for the proper treatment and storage of their information. The increased volume and complexity of data flows has strained the traditional knowledge-and-consent system and left individuals without meaningful control over their personal information and privacy (Government of Canada, 2019). One solution is for government to invest more in education and legislation to ensure that data is treated as the high-value commodity it is. Although legislation does require companies to disclose their data collection and storage policies to consumers, the onus is placed on the consumer to understand and accept this information, an approach called Privacy Self-Management (Government of Canada, 2019). The required privacy information is often communicated in long and complex formats that consumers have neither the time nor the legal training to understand (think of the privacy agreement you click ‘I Accept’ on without reading when you start using a new digital product) (Government of Canada, 2019). The Canadian government has recognized this problem and has recommended clarifications to the existing Personal Information Protection and Electronic Documents Act (PIPEDA) to improve both education and protection efforts (Government of Canada, 2019). Educating consumers about the value of their data, and about how corporations could use it, would allow them to make informed decisions about their personal information and to hold corporations accountable.
Additional legislation can also ensure that corporations communicate with consumers appropriately, which further helps consumers make informed decisions.
Overall, increasing education and legislation around the treatment of data will help the true value of data to be recognized. Consumers will be able to make better decisions about when to divulge their data, and corporations will be held accountable for treating that data properly. Although this may decrease the amount of data that consumers are willing to provide, it ensures the proper treatment of data and gives consumers the control they should have.
Conclusion
As society evolves, more aspects of our everyday lives become dependent on data. As that dependency grows, our knowledge, perception, and treatment of data need to align with its importance for its true value to be realized. Data will then be valued more accurately and easily, as comparable transactions begin to exist. The first step toward properly valuing data is to break down the fit-for-purpose silos that exist in research-based industries through data marketplaces. Placing high-value data in marketplaces not only allows owners to realize a monetary value for their assets, but also creates a transaction history that can help value other datasets in the future, further improving the accuracy of data valuation. The second step is a fundamental paradigm shift in how software developers build products. Shifting from a domain-driven to a data-centric approach allows data to be treated like other high-value, unique assets, such as cash and gold, making it an asset that needs to be carefully managed and respected by its holders. Additionally, the data itself gains value because there is only one copy. The final step is education and legislation that give consumers the information and power they need to make informed decisions. Being informed about the value and corporate use of their personal data will allow consumers to advocate for its proper treatment and to think about their data as they would think about other important assets.
Combining these three changes requires time and effort from many different parties, along with perseverance and determination from leaders in government and industry. However, the effort to properly value data will allow the value of data-centric companies to be accurately understood, and consumers will benefit from greater quality, security, and control over their personal assets.