What is Big Data?
I joined the Learning Services team in June 2013. Coming to e-learning from a librarianship background, I have encountered many words and terms that were unfamiliar to me.
‘Big data’ is a term that I hadn’t heard before I started working in Technology Enhanced Learning (TEL), so when the chance came up to attend a lunch-hour Information Services Session on big data, I booked right away. The hour-long session, titled “Big Data and little ole you”, was presented by Iain Dobson, who is an IT Services Manager at UoE.
Iain explained what big data is by first breaking down what data is. Data is composed of many things, such as words, numbers and symbols. You can get information from data if the data is presented in an understandable context. For example, “010186” looks like a string of digits, but written as “01/01/86” you can see that it is a date. Once the data is understood, knowledge can be extracted from it, which can lead to predictions, opinions and judgements. This, in a nutshell, is what is meant by data.
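To make the “010186” example concrete, here is a minimal Python sketch of how context turns raw digits into information. It assumes the six digits are laid out as day, month and two-digit year; that layout is my assumption for illustration (Iain did not specify one), and the function name `interpret_as_date` is hypothetical.

```python
from datetime import datetime


def interpret_as_date(raw: str) -> datetime:
    """Read a six-digit string as day, month, two-digit year (assumed layout)."""
    return datetime.strptime(raw, "%d%m%y")


raw = "010186"                      # without context: just six digits
parsed = interpret_as_date(raw)     # with context: a calendar date
print(raw, "->", parsed.strftime("%d/%m/%y"))
```

The same digits would mean something quite different if they were a product code or a phone extension, which is why the context matters as much as the data itself.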
As for big data, Wikipedia defines it as:
“The term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
I have inserted a table that shows the progression from a bit to a petabyte, the units in which data is measured. Currently, a petabyte is considered “big data”, but this is likely to change: technology allows us to store and analyse more data all the time, so a petabyte might not count as big data in the not so distant future.
|1 Byte||8 bits|
|1 KB (kilobyte)||1,024 bytes|
|1 MB (megabyte)||1,024 KB||873 pages (plaintext)|
|1 GB (gigabyte)||1,024 MB||341 digital photos|
|1 TB (terabyte)||1,024 GB||349,525 digital photos|
|1 PB (petabyte)||1,024 TB||357,913,941 digital photos|
(Information courtesy of Computer Hope)
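The arithmetic behind the table can be sketched in a few lines of Python. Each unit is 1,024 times the previous one, and the photo counts in the table work out if you assume a digital photo of about 3 MB. That photo size is my own inference from the numbers, not something stated in the source, and the names below are hypothetical.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]
PHOTO_SIZE_MB = 3  # assumption: inferred from the table, not given by the source


def bytes_in(unit: str) -> int:
    """Number of bytes in one unit, where each step up multiplies by 1,024."""
    return 1024 ** (UNITS.index(unit) + 1)


def photos_in(unit: str) -> int:
    """Whole photos of PHOTO_SIZE_MB that fit in one unit."""
    return bytes_in(unit) // (PHOTO_SIZE_MB * bytes_in("MB"))


for unit in ["GB", "TB", "PB"]:
    print(f"1 {unit} = {bytes_in(unit):,} bytes, about {photos_in(unit):,} photos")
```

Under that 3 MB assumption, the sketch gives 341, 349,525 and 357,913,941 photos for a gigabyte, terabyte and petabyte, matching the table.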
To help us comprehend just how massive big data is, Iain presented us with a picture (see below). It gives a sense of how much data is being put onto the internet every day. The mind boggles at how much information is put online, even in a minute! As seen in the image, 571 new websites are created every minute. I don’t even know if I look at 571 different websites in a year!
After a brief introduction to big data, I can start to grasp how much information is being put on to the internet, but what is the importance of all that information? What can be done with it?
Big data can be very useful if it can be broken down and analysed for information. There are two approaches to analysing data: descriptive and exploratory. Descriptive data analysis means quantifying the data and illustrating it meaningfully, like the image to the left. Exploratory data analysis looks for relationships and patterns in the data in order to understand it, which leads to further exploration of what the data means.
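As a small illustration of the difference between the two approaches, here is a Python sketch. The figures are made up purely for the example, and the function names are hypothetical. The first function summarises one set of values (descriptive); the second checks how strongly two sets of values move together, using a Pearson correlation as one possible exploratory measure.

```python
from statistics import mean, median


def describe(values):
    """Descriptive analysis: summarise one variable with a few numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Exploratory analysis: how strongly do two variables move together?"""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


# Made-up example data: shop visits per month and money spent per month.
visits = [1, 2, 3, 4, 5]
spend = [12, 19, 33, 41, 48]

print(describe(spend))
print(round(pearson(visits, spend), 3))
```

A correlation close to 1 would suggest the two values rise together, which is the kind of pattern that prompts further questions rather than answering them outright.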
Currently, only 5% of data is analysed, but advances are being made every day. Big data has already been used for analysis in the retail field where, not surprisingly, retailers use it to gain a better understanding of their customers (Tesco Clubcard ring a bell?). Additionally, some exciting projects are happening at the University of Edinburgh. One such project is being led by Dr. Beatrice Alex, who is analysing over 6 million documents and images in order to gain a better understanding of nineteenth century trade across the British Empire. Further details on the project can be found here:
Hopefully I have given you some insight into what big data is and what can be done with it. It will be exciting to see what the future holds for big data and what secrets can be found.