As an addicted Trekkie, sadly, I must quickly disabuse you — even though there are people who consider Big Data to be every bit (byte?) as exciting as the USS Enterprise’s second officer.
Big Data actually refers to immense datasets that are collected in fields as diverse as astronomy and genomics. As Wikipedia tells it, “as of 2012, every day 2.5 quintillion (2.5×10¹⁸) bytes of data were created”, so there are a lot of data about.
The dynamic of Big Data is the search for relationships among these data, teasing out correlations that may not be obvious from any of the constituent datasets on its own. Our technical capacity to search immense data repositories means that correlations can be found in a way never before possible.
In their new book Big Data: a revolution that will transform how we live, work and think, Viktor Mayer-Schönberger, an internet governance academic from Oxford, and Kenneth Cukier, the data editor of The Economist, recount an interesting example of how Big Data, collected by Google from the three billion search requests it receives each day, was used to track influenza in the US.
Google took the 50 million most common search terms used by Americans and compared the list with US Centers for Disease Control and Prevention (CDC) data on the spread of seasonal flu between 2003 and 2008.
After stupendous computer activity, they settled on 45 search terms that were strongly correlated with official figures. These included many obvious terms, such as flu, cough and medications for cough, but also others that were not so obviously linked.
As Mayer-Schönberger and Cukier point out in their book, unlike the CDC, “they could tell it in near real time, not a week or two after the fact”. Although not without its critics and errors, Google Flu Trends is now available for many countries.
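For readers who like to see the idea in miniature, the term-selection step described above can be sketched in a few lines of Python. This is only an illustration of ranking search terms by how closely their weekly frequency tracks official figures. It is not Google's actual method, and the function names (`pearson`, `rank_terms`) and all the numbers in the usage note are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no information about the other
    return cov / sqrt(var_x * var_y)


def rank_terms(term_series, official_series, top_k):
    """Return the top_k (term, correlation) pairs, strongest first.

    term_series maps a search term to its weekly search counts;
    official_series holds the official flu figures for the same weeks.
    """
    scored = [
        (term, pearson(series, official_series))
        for term, series in term_series.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Called with made-up weekly counts such as `{"flu symptoms": [10, 20, 40, 30, 15], "football scores": [30, 28, 31, 29, 30]}` and official figures of `[1.0, 2.0, 4.0, 3.0, 1.5]`, the sketch ranks "flu symptoms" well ahead of "football scores". Note that a high score here shows only that two series rise and fall together, which is exactly the correlation-without-causation point at issue.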
The authors concede that there is no universally accepted definition of Big Data, but rather see the term as referring to “things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and government, and more”.
Our capacity to collect, link and analyse data electronically is growing exponentially. Mayer-Schönberger and Cukier draw a parallel between the present and the era that followed the invention of the Gutenberg printing press around 1439.
They quote an estimate that, in the half-century starting in 1453, eight million books were printed, “more than all the scribes of Europe had produced since the founding of Constantinople 1,200 years earlier”.
In 2003, following a decade of effort, the human genome was sequenced — “Now… a single facility can sequence that much DNA in a day”.
And because Big Data includes all the data available, population samples will no longer be needed in the way they are today and the work of statisticians will be redefined.
There are many features of Big Data to ponder for medicine. How will we practise with more information about correlation and less about causation?
If Big Data shows that people who take regular exercise have better cancer survival, what will we advise our patients? Is the correlation sufficient to advise them to exercise, even though the causal pathway is not known?
This will increase our need, and that of our patients, to live with uncertainty.
What meaning do privacy and even confidentiality have in this new age?
We should surely be thinking about and discussing these things now.
Professor Stephen Leeder is the editor in chief of the MJA and professor of public health and community medicine at the University of Sydney.
Jane McCredie is on leave.
Potential COI: The author’s son Nick is country director of Google France.