From prehistoric times until 2003, humanity produced five exabytes (an exabyte is a billion gigabytes) of data, through texts, drawings and paintings, music, measurements, calculations and more. In 2011, the same amount of data was generated every two days. In 2013, the same quantity will be produced every 10 minutes. We are awash in data and would be drowning in it, were it not for the fast emergence of ‘big data’ analysis.
The term ‘big data’ was only coined in 2006 and really became mainstream in 2012, when the White House, the Davos World Economic Forum and other important institutions made it a central issue to tackle. It rests on two main ideas: First, we are now producing data at an exponential rate. Second, new methods of analysis are needed, because the old ones do not work at such a scale.
Data is now being generated in every little corner of our lives: By our computers, our mobiles and smartphones, our cameras, our GPS and other digital devices and all kinds of sensors and scanners, from medical equipment to research tools and security apparatus.
The internet has played a major role in this ‘big data’ revolution: 200 million emails are sent every minute; 20 million tweets are sent out every hour; 100,000 hours of video are uploaded to YouTube every day; and billions of blogs churn out information continuously. In one hour, the internet data flow can fill 10 billion DVDs, which, if stacked, would reach ten times higher than Mount Everest! And this amount is doubling every 20 months.
This brings some good and some bad news. Let us first look at the brighter side.
More information means more potential knowledge, hence more improvements to our lives and more progress for humanity. In particular, recording and analysing consumer behaviour (through store loyalty cards, online browsing and shopping data) as well as monitoring workers’ habits can only lead to increases in productivity and sales.
Indeed, a recent study of 179 large companies showed that those that have made ‘big data’ an essential part of their strategies have increased their productivity by 5 to 6 per cent, all other factors being equal. More importantly for the job market, a report last year concluded that the US alone needs between 140,000 and 190,000 new workers with data-handling expertise and estimated that 1.5 million managers would need to be “data literate”.
In science, ‘big data’ has become part of the research paradigm, particularly with projects like the Human Genome Project, the Sloan Digital Sky Survey and the Large Hadron Collider. Indeed, decoding the human genome took ten years and cost $100 million (Dh367.8 million) the first time it was done, about ten years ago. It can now be done in a week (about 500 times faster) for less than $10,000; soon it will take a day and cost $1,000, that is, 100,000 times cheaper than that first effort. Likewise, researchers at the Large Hadron Collider discovered the Higgs particle only because they could select one in six million events each second and analyse the collection over months: needles in a haystack.
Even fields as diverse as political science and sports now require expertise in data analysis, where a small effect extracted from tonnes of information can give an edge over one’s competitors. For example, analysing voters’ “Like” patterns on Facebook pages or clicks on particular news stories can help develop better campaign tactics.
But ‘big data’ could, if we are not careful, also translate into ‘Big Brother’. Companies that specialise in collecting and selling data (known as “data brokers”) have appeared, making big money from our data without our knowledge. One such company, Acxiom, says it has data on 500 million people worldwide, including almost every US consumer, covering each person’s age, location, education level, purchasing preferences and habits and even whether a woman is pregnant. Another firm has partnered with Facebook and loyalty card companies to aggregate and sell tonnes of consumer-specific behavioural data, which advertisers can then use to target us with specific advertisements on our Facebook pages and Google searches, not to mention email ads and phone calls.
Finally, there are two important ideas I wish to highlight. First, the availability of massive data does not in and of itself translate into knowledge. Indeed, it could produce wrong conclusions, false discoveries or what specialists call “false positives”. Nate Silver’s recent bestseller, appropriately titled The Signal and the Noise, hammers this very point over 500 pages, showing countless examples of how we can be misled into either finding in the data “confirmations” of our own preconceptions, or letting the “noisy” data produce a false “peak” here or there. Analysing data, particularly the massive kind, requires sophisticated and careful understanding.
Secondly, this accelerating data revolution implies new kinds of jobs: Data analysts and consultants, who are already being sought and hired worldwide. And for educators, this means the need to adjust existing curricula. As Mark Lautman, an economic developer, recently wrote: “Eighty per cent of the jobs [we] will have in the future don’t even exist yet.” We need to prepare students for the jobs and the lives of tomorrow, not of today.
Nidhal Guessoum is an associate dean at the American University of Sharjah.