Big data is one of THE biggest buzzwords around at the moment, and I believe big data will change the world. Some say it will be even bigger than the Internet. What is certain is that big data will affect everyone’s life. Having said that, I also think that the term ‘big data’ is not very well defined and is, in fact, not well chosen. Let me use this article to explain what’s behind the massive ‘big data’ buzz and demystify some of the hype.
Basically, big data refers to our ability to collect and analyze the vast amounts of data we are now generating in the world. The ability to harness these ever-expanding amounts of data is transforming how we understand the world and everything within it. Advances in analyzing big data allow us to, for example, decode human DNA in minutes, search for cures for cancer, predict human behavior more accurately, foil terrorist attacks, pinpoint marketing efforts and prevent diseases. Take this business example: Wal-Mart is able to take data from your past buying patterns, its internal stock information, your mobile phone location data, social media and external weather information, and analyze all of this in seconds so it can send a voucher for a BBQ cleaner to your phone, but only if you own a barbecue, the weather is nice and you are currently within a three-mile radius of a Wal-Mart store that has the BBQ cleaner in stock. That’s scary stuff, but one step at a time: let’s first look at why we have so much more data than ever before.
In my talks and training sessions on big data I talk about the ‘datafication of the world’. This datafication has a number of causes, including the adoption of social media, the digitization of books, music and videos, the increasing use of the Internet, and cheaper and better sensors that allow us to measure and track almost everything. Just think about it for a minute:
- When you were reading a book in the past, no external data was generated. If you now use a Kindle or Nook device, they track what you are reading, when you are reading it, how often you read it, how quickly you read it, and so on.
- When you were listening to CDs in the past, no data was generated. Now we listen to music on an iPhone or digital music player, and these devices record data on what we are listening to, when and how often, in what order, etc.
- Today, most of us carry smartphones, and they are constantly collecting and generating data by logging our location, tracking our speed, and monitoring what apps we are using as well as who we are ringing or texting.
- Sensors are increasingly used to monitor and capture everything from temperature to power consumption, from ocean movements to traffic flows, from dustbin collections to your heart rate. Your car is full of sensors, and so are smart TVs, smart watches, smart fridges, etc. Take my new scales (which I, as a gadget freak, love!): they measure (and keep a record of) my weight, my % body fat, my heart rate and even the air quality in our bedroom. When I step on the scales they automatically recognize me, take all the measurements and then send them via Bluetooth to my iPhone, which gives me stats on how my Body Mass Index etc. is changing. This information is then also synced with the data collected by my Up band, which tracks how many calories I have consumed and burnt in a day and how well I have slept at night.
- Finally, combine all this now with the billions of internet searches performed daily, the billions of status updates, wall posts, comments and likes generated on Facebook each day, the 400+ million tweets sent on Twitter per day and the 72 hours of video uploaded to YouTube every minute.
I am sure you are getting the point. The volume of data is growing at a frightening rate. Google’s executive chairman Eric Schmidt puts it succinctly: “From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating.” Not only do we have a lot of data, we also have many different and new types of data: text, video, web search logs, sensor data, financial transactions, credit card payments, etc. In the world of ‘Big Data’ we talk about the 4 Vs that characterize big data:
- Volume – the vast amounts of data generated every second
- Velocity – the speed at which new data is generated and moves around (credit card fraud detection is a good example where millions of transactions are checked for unusual patterns in almost real time)
- Variety – the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings)
- Veracity – the messiness of the data (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech)
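To make the Velocity example a little more concrete, here is a minimal Python sketch of checking a stream of transactions for unusual amounts as they arrive. It is an illustration only: the function name `flag_unusual`, the `threshold` and `warmup` parameters, and the simple "distance from the running average" rule are all my assumptions, and real fraud detection systems look at far more than the amount.

```python
import math


def flag_unusual(amounts, threshold=3.0, warmup=5):
    """Return the positions of amounts that sit far from the running average.

    Hypothetical toy rule: an amount is 'unusual' if it is more than
    `threshold` standard deviations away from the mean of the amounts
    accepted so far. Nothing is flagged until `warmup` amounts were seen.
    """
    flagged = []
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)

    for position, amount in enumerate(amounts):
        if count >= warmup:
            std = math.sqrt(m2 / (count - 1))
            if std > 0 and abs(amount - mean) > threshold * std:
                flagged.append(position)
                continue  # keep flagged amounts out of the baseline

        # Update the running mean and spread with the accepted amount.
        count += 1
        delta = amount - mean
        mean += delta / count
        m2 += delta * (amount - mean)

    return flagged


if __name__ == "__main__":
    recent = [20, 22, 19, 21, 20, 23, 950, 21]
    print(flag_unusual(recent))
```

Because the statistics are updated one transaction at a time, each check needs only the current amount and three running numbers, which is what makes this kind of test feasible in almost real time.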
So, we have a lot of data, in different formats, that is often fast moving and of varying quality. Why would that change the world? The reason is that we now have the technology to bring all of this data together and analyze it. In the past we had traditional database and analytics tools that couldn’t deal with extremely large, messy, unstructured and fast-moving data. Without going into too much detail, we now have software like Hadoop and others which enables us to analyze large, messy and fast-moving volumes of structured and unstructured data. Such software does this by breaking the task up between many different computers (which is a bit like how Google breaks up the computation of its search function). As a consequence, companies can now bring together these different and previously inaccessible data sources to generate impressive results. Let’s look at some real examples of how big data is used today to make a difference:
- The FBI is combining data from social media, CCTV cameras, phone calls and texts to track down criminals and predict the next terrorist attack.
- Facebook is using face recognition tools to compare the photos you have uploaded with those of others to find potential friends of yours (see my post on how Facebook is exploiting your private information using big data tools).
- Politicians are using social media analytics to determine where they have to campaign the hardest to win the next election.
- Video analytics and sensor data from baseball or football games are used to improve the performance of players and teams. For example, you can now buy a baseball with over 200 sensors in it that will give you detailed feedback on how to improve your game.
- Artists like Lady Gaga are using data on our listening preferences and sequences to determine the most popular playlist for their live gigs.
- Google’s self-driving car is analyzing a gigantic amount of data from sensors and cameras in real time to stay on the road safely.
- The GPS information on where our phones are and how fast they are moving is now used to provide live traffic updates.
- Companies are using sentiment analysis of Facebook and Twitter posts to determine and predict sales volume and brand equity.
- Supermarkets are combining their loyalty card data with social media information to detect and leverage changing buying patterns. For example, it is easy for retailers to predict that a woman is pregnant simply based on her changing buying patterns. This allows them to target pregnant women with promotions for baby-related goods.
- A hospital unit that looks after premature and sick babies is generating a live stream of every heartbeat. It then analyzes the data to identify patterns. Based on the analysis, the system can now detect infections 24 hours before the baby would show any visible symptoms, which allows early intervention and treatment.
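Many of these examples rely on the idea mentioned earlier of breaking a task up between many computers and then combining the partial results. The Python sketch below illustrates that split-and-merge pattern on a toy word count. It is not Hadoop's API: the function names are hypothetical, and a small pool of threads on one machine stands in for the separate computers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """'Map' step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def distributed_word_count(lines, n_workers=4):
    """Count words by farming chunks out to workers and merging the results.

    Assumption: threads stand in for the separate machines of a cluster.
    """
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))

    # 'Reduce' step: merge the partial counts into one overall result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    posts = ["big data big", "data is big", "hype"]
    print(distributed_word_count(posts, n_workers=2).most_common(2))
```

The point of the design is that each worker only ever sees its own chunk, so adding more workers (or, in a real system, more machines) lets the same code cope with more data.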
And these examples are just the beginning. Companies are barely starting to get to grips with the new world of big data. In conclusion then, big data will change the world. In terms of language, I prefer to talk about the ‘datafication of the world’ in relation to the ever-growing amounts of data, and ‘large-scale analytics’ (or simply ‘analytics’, because what is large now will be normal tomorrow) in relation to our ability to analyze and harness big data.
At the moment I am spending a lot of my time helping companies understand the massive potential as well as the big threats of big data. I work with executive teams of companies spanning all sectors and sizes to help them develop strategies to harness big data, and I find each of these discussions and projects fascinating because they all open up new opportunities.
Here, I would love to hear from you. Do you see opportunities for yourself or your business? Does this new world of big data scare you or excite you? Have you already started harnessing big data? Or have I failed to convince you, and do you believe big data is just hype? Please share your views.
Well done and thanks for an excellent article. My concerns would be twofold:
1. The misuse or exploitation of information that is not in the best interests of the community.
2. Implications for laziness where we lose the skill of using our own minds to make sense of the world.
I’d be interested in your views.
Hi, thanks so much for your great explanation on this hot topic. In fact, what you are describing is not the future but the present we are currently living. We have a lot more to do, that’s for sure.
I work for a pharma company specialized in cancer, and we are already doing this kind of analysis. We’re getting very good results and I bet we will have more success in a few years’ time.
I consider the “big data” term not so well chosen and very opportunistic. We have been applying business intelligence for many years, so for me this supposed new trend is only a matter of quantity rather than a matter of quality.
I see some fears with regard to individual privacy and the manipulation by marketing campaigns that will create in us the need to acquire goods we don’t really need.
But for sure, the opportunities are much bigger than the fears or threats. Let’s see if the IT community treats this important topic properly and we don’t create another yearly hype to make more money out of it, regardless of the real value of the technology.
Thanks for an excellent article. I’ve read a few articles on big data, and I am amazed that none of them mention anything about statisticians. Do they play any role in big data? I’m not a statistician but thought they would.