These are the results of the EMC-sponsored IDC Digital Universe study, "Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East," which found that, despite the unprecedented expansion of the digital universe driven by the massive amounts of data generated daily by people and machines, only 0.5% of the world's data is being analyzed.
The digital universe will double every two years between now and 2020. A major factor behind this expansion is the growth of machine-generated data, which is projected to rise from 11% of the digital universe in 2005 to over 40% by 2020.
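To put "doubling every two years" in concrete terms, the short sketch below computes the implied growth multiple. It assumes that "now" means 2012, the year the other figures in this summary refer to; the function name and parameters are illustrative and do not come from the study.

```python
def growth_factor(start_year, end_year, doubling_period_years=2.0):
    """Return how many times larger a quantity becomes if it doubles
    once every `doubling_period_years` between the two years."""
    elapsed_years = end_year - start_year
    return 2 ** (elapsed_years / doubling_period_years)


# Assumption: "now" is taken to be 2012.
factor_2012_to_2020 = growth_factor(2012, 2020)
print(f"Implied growth from 2012 to 2020: {factor_2012_to_2020:.0f}x")
```

Under that assumption, eight years contain four doubling periods, so the digital universe would be roughly 16 times larger in 2020 than in 2012. This is a back-of-the-envelope reading of the doubling rule, not a figure quoted from the study.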
Large quantities of useful data are getting lost: The promise of Big Data lies in extracting value from large, untapped pools of data. However, most new data is untagged, file-based, and unstructured, which means little is known about it. In 2012, 23% (643 exabytes) of the digital universe would have been useful for Big Data if tagged and analyzed. However, only 3% of that potentially useful data is currently tagged, and even less is analyzed.
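The percentages above can be turned into rough absolute volumes. The following sketch uses only the figures quoted in this paragraph (23%, 643 exabytes, 3%); the variable and function names are illustrative, and the derived totals are approximations implied by the rounded percentages rather than numbers reported by the study.

```python
USEFUL_SHARE = 0.23            # share of the 2012 digital universe that is potentially useful
USEFUL_EXABYTES = 643          # the same quantity expressed in exabytes
TAGGED_SHARE_OF_USEFUL = 0.03  # share of the useful data that is tagged


def implied_total(part, share):
    """Back out the whole from a part and the share it represents."""
    return part / share


# Size of the whole 2012 digital universe implied by "23% = 643 EB".
total_exabytes = implied_total(USEFUL_EXABYTES, USEFUL_SHARE)

# Volume of potentially useful data that is actually tagged.
tagged_exabytes = USEFUL_EXABYTES * TAGGED_SHARE_OF_USEFUL

print(f"Implied 2012 digital universe: about {total_exabytes:,.0f} EB")
print(f"Tagged useful data: about {tagged_exabytes:.0f} EB")
```

On these figures, the 2012 digital universe works out to roughly 2,800 exabytes, of which only about 19 exabytes is both potentially useful and tagged.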
Much of the digital universe is unprotected: The amount of data that requires protection is growing faster than the digital universe itself. Less than a third of the digital universe required data protection in 2010, but that proportion is expected to exceed 40% by 2020. In 2012, about 35% of the information in the digital universe required some type of data protection, yet less than 20% of the digital universe actually had such protection.
While emerging markets accounted for 23% of the digital universe as recently as 2010, their share had already risen to 36% by 2012. By 2020, IDC predicts that 62% of the digital universe will be attributable to emerging markets. By 2020, China alone is expected to generate 22% of the world's data.
As the infrastructure of the digital universe becomes ever more connected, information won't reside within the region where it is consumed, nor will it need to. By 2020, IDC estimates that nearly 40% of data will be "touched" by cloud computing (private and public), meaning that somewhere between a byte's origination and consumption, it will be stored or processed in a cloud.