The Data We Cannot See
A number of years ago, the Internet here at the College slowed to a crawl. Tons of calls poured into the Helpdesk. “The Internet is down,” people shouted, raising their pitchforks in protest and pounding on the door. Locating the problem was not hard: a student tucked away in a dormitory near College Creek was downloading an inordinate amount of data. So, I cut his network access.
The source of the data? None other than the Library of Congress. “My Internet went down,” a student said, walking through my door with an annoyed look. I informed him that he had impacted business operations and asked him what he was downloading from the Library of Congress. He replied, “Not what. The. The Library of Congress.” He had been downloading everything they had made available online, which was a lot.
At the College, books are important, so I am going to use them as a measurement. Roughly speaking, all the books in the Library of Congress could be completely housed in 10TB of data. Anyone here remember card catalogs? Anyway, the 500 or so students on campus today download 10TB of data every week. While I would love to brag they are reading that many books, most of it is actually video and images.
Human beings have always created and consumed data. Archaeologists specialize in finding the data left behind and piecing it together. Historians scour records, writings, and accounts to do the same. Over time, what has changed is the capacity for human beings to store data, from the development of language and oral tradition to writing, painting, printing, photography, audio, video and, now, the transition of the world from analog to digital. A good bit of this is quickly summarized in this picture I took of Machu Picchu, showing the juxtaposition of data found in the format of stacked stones and the ultimate personal expression of data-creation, the Selfie!
Throughout history, the value of information has always depended on a number of variables. However, in the beginning of the 21st century, there was a fundamental shift, where technology created the opportunity for the value of any data to exceed the cost of storing it. Coupled with innovations in the creation and management of data, we reached a tipping point. All data now has value.
Much in the same way all gold or silver has value, all data has value. The type of data is largely irrelevant; it could be something as simple as key words in your email, browsing habits and history, something as benign as a family photograph, or as obvious as intellectual property. What is interesting is that the value of the individual piece of data matters not, because the storage of data is so cheap that the value of data outweighs the cost of storing it. This was the beginning of the Great Data Rush.
Organizations that understood that all data has value capitalized on the opportunity. A data cycle began that continues to this day. Data is extracted and stored. From that data, revenue is generated in one of, now, many ways. That revenue is used to fund innovation in technologies that either generate more data, siphon more revenue from data, or both. The flood of Data Wealth built the foundation for data-based services in the connected world.
The allure of free e-mail and social media granted by those profiting most from the Great Data Rush evolved into a symbiotic relationship, where the individual—and later, even the corporation—was economically forced to move away from hosting their own services and data, simply due to cost. The data was pulled away from the small piles, where it was only valuable to the individual or the organization, and into the Cloud where revenue can be extracted. Believe it or not, Nintendo actually discovered this back in the 1980s, but it was just a game to them.
Companies that store data are unregulated banks. Data is not a currency, in most cases, but collecting it allows for it to be treated in a similar manner. Companies can take your data, combined with the data of everyone else utilizing their systems, and generate income. This goes beyond advertising, to selling a part of you that you may not even be aware exists. And, their return is far beyond that of banks, because of the exponential rise of data.
We have entered the zettabyte era. In 2012, the amount of data in the world passed a zettabyte. Now, more than that amount of traffic traverses the world every year, and it is estimated there could be 40 zettabytes of data in existence today. If this much data was stored in books, it would take four billion Libraries of Congress to hold them all. But, where is the data coming from?
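The four-billion figure is simple arithmetic, and a quick Python sketch can check it. It assumes decimal units (1 TB as 10^12 bytes, 1 ZB as 10^21 bytes) and takes the 10TB Library of Congress estimate and the 40-zettabyte estimate from this essay at face value; the variable names are only illustrative.

```python
# Rough check of the "four billion Libraries of Congress" comparison.
# Assumes decimal (SI) units; both estimates come from the essay itself.
TB = 10**12
ZB = 10**21

library_of_congress_bytes = 10 * TB   # essay's estimate for all the books
world_data_bytes = 40 * ZB            # essay's estimate for data in existence

libraries_needed = world_data_bytes // library_of_congress_bytes
print(f"{libraries_needed:,} Libraries of Congress")
```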
Well, the individual is not only the consumer, but now the creator of data. Every one of us today creates more data than we likely know or even imagine. I am not even a fan of social media, and I still take tons of pictures of my adorable son. Here he is waking me up. Here he is going for a walk. Playing on the playground. And, in case you do not trust me when I say how adorable he is, here is a picture of him from the side. Think of how many pictures I take, just to be able to find ones not showing his face or giving away his location. But it is not just this; it is not only the deliberate content creation.
Let’s take a look at my data. This is me. Exciting, right? Taking a closer look, I have both public and private data. I’d like to think it’s more like this, but let’s keep things realistic. If we are being honest, there is a part of me that I do not even know. But, there is more to it, because there is also a part of me that I believe is private and is not. Don’t search for my name on the Internet; if I didn’t want to see what’s out there, you really don’t! Here’s the kicker, though: part of that unknown data, that blind spot, that part of us that we do not even know, has been learned, but not by us. It may not be public knowledge, but it is known by one entity or another. How was it learned?
Going back to those images of my son, if you had the data itself, you could extract the exact latitude and longitude of where the picture was taken. Now, given all the pictures we take, not to mention everyone around us taking pictures, imagine the story that could be written knowing the exact location, date, time, and every individual in all of those images. This is the data we cannot see. Embedded in every new technology, every gadget upgraded, is the capacity to pull more data out of you. Stepping further, the applications themselves not only use these technologies to pull data out of you, they feed off of your input. This is the data we cannot see.
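To give a sense of how little work that extraction takes: photo metadata commonly records location as degrees, minutes, and seconds plus a hemisphere letter. Here is a minimal Python sketch of turning those values into a decimal coordinate. The function name is hypothetical, and the sample values are illustrative numbers chosen to fall near Machu Picchu, not read from any real photo; reading the metadata out of an image file would need an image library and is left out here.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Illustrative sample values only (assumed, not taken from a real image).
latitude = dms_to_decimal(13, 9, 47, "S")
longitude = dms_to_decimal(72, 32, 44, "W")
print(round(latitude, 4), round(longitude, 4))
```

Pair a coordinate like this with the timestamp stored alongside it, and each picture becomes a point on a map at a known moment.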
Social media applications are so powerful because you interact entirely within their sphere of influence. Social media tracks everything you do, from what you post, to what you view, whether you like or dislike something, to how quickly you scroll past or interact with content. A profile is built to keep you engaged, to ensure you not only return to the platform but stay logged in for longer periods of time. Studies are now emerging showing information affects the brain in the same manner as money and other rewards. And, the more time you stay logged in, the more they learn, allowing you to be easily targeted.
With the sheer amount of data available and the tools to store, sort, and manipulate it, information can be tailored, like a designer drug, and targeted like a smart bomb, delivered to individuals as the most effective form of advertising ever deployed. Information is here to stay, here to manipulate, and we have given it so much data about ourselves without thinking of the consequences, that it is startlingly effective. We need to engage with the ethics of the use of information as an addictive substance, and how information unseen is being weaponized to manipulate.
Earlier, I told a story about a student here downloading the Library of Congress. I laughed, some others hopefully did. But, no one batted an eye when I mentioned how quickly I was able to track the student’s location, or determine what he was downloading and from where. Why don’t we ask these questions? DNA tests are widely available, and they are relatively cheap. You can obtain your own biological information for under $100. But, at what cost? Your data, for many of these systems, is both housed and used by the company to generate more revenue. What is the cost of a DNA test where your data is kept private? Hundreds of dollars more, at least. So, at a minimum, your genetic information is worth that, and more as time goes on.
Can you imagine what will be buried under mounds of data for future historians and data archaeologists to uncover? Not only will our biological information and that of our families be exposed, but everywhere we went and every thought we expressed, whether it was a facial expression caught on a camera, a text message, a post made on social media, or even an emoticon we used, is captured. What about your heartbeat?
In all of history, I can recall one heartbeat rate during one event. Can anyone guess? It was in 1969. Neil Armstrong, taking manual control and landing on the moon, 150 beats per minute. How many heartbeat rates are recorded now? All of it can and will be collected, analyzed and used to build a profile, perhaps even a narrative. Built in conjunction with the data we cannot see, and with such aggregation and learning, that narrative will only become more accurate. Accurate and frighteningly useful, in both the right and wrong ways. Will Robert Frost’s famous poem relate to us in the future, or will the future’s inhabitants always be able to look back and see every step they took forward? Which path do we choose now?
Well, as individuals, we can train ourselves to be more cognizant of the data we create and share, and understand how it can be manipulated against us. Simply put, the more information you allow into the world, the more vulnerable you are. But, getting that message out to the masses is an entirely different problem, and getting them to accept it at the expense of convenience is even harder.
We need to reframe the conversation on data, even consider introducing new vocabulary into the discussion, such as “data exploitation.” Given our previous analogy to banks, we could consider language such as “data laundering,” or even go as far as suggesting “data trafficking.” We need to understand that data has value, it can be exceptionally personal and private to the point where it is very much a part of who we are, and it can be given, taken, traded, and used against us, even in the same way our loved ones can be leveraged.
We need to consider how to build a firewall within ourselves. We need to weave logic, privacy, and data into the traditional curriculum and reinforce such a tapestry with critical thinking. We should find a way to help children build the capacity to filter the bombardment of information, instead of allowing them to become addicted to such enormous amounts of input. It may even be necessary to introduce mindfulness, to help them unplug. Such a firewall, I believe, would prove useful for children as they inherit a world that struggles not only with the volume of information, but also with how that information will be used against them.
The amount of data, of information, that we create and consume is incredible. Not only have we lost the ability to be private, it is uncertain that we even have the capacity to handle the sheer volume of information we consume. We need to build a firewall within all of us, to protect not only ourselves, but the World at large, as a very real part of who we are, both singularly and collectively as a species, steps into a binary immortality. An immortality that can be shaped and manipulated, and turned on and off, by a will that does not reside within our own souls.
Thank you for your time.