BIG DATA , IOE, TU
Introduction to Big Data
There is no place where Big Data does not exist! The curiosity about what Big Data is has been soaring in the past few years. Let me tell you some mind-boggling facts! Forbes reports that every minute, users watch 4.15 million YouTube videos, send 456,000 tweets on Twitter, post 46,740 photos on Instagram and there are 510,000 comments posted and 293,000 statuses updated on Facebook!
Just imagine the huge chunk of data that is produced with such activities. This constant creation of data using social media, business applications, telecom and various other domains is leading to the formation of Big Data.
Evolution of Big Data
When was the last time you guys remember using a floppy or a CD to store your data? Let me guess, had to go way back in the early 21st century right? The use of manual paper records, files, floppy and discs have now become obsolete. The reason for this is the exponential growth of data. People began storing their data in relational database systems but with the hunger for new inventions, technologies, applications with quick response time and with the introduction of the internet, even that is insufficient now. This generation of continuous and massive data can be referred to as Big Data.
Forbes reports that there are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating. Internet of Things(IoT) is one such technology which plays a major role in this acceleration. 90% of all data today was generated in the last two years.
What is Big Data ?
One popular interpretation of big data refers to extremely large data sets.
A National Institute of Standards and Technology report defined big data as consisting of “extensive datasets—primarily in the characteristics of volume, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.” Some have defined big data as an amount of data that exceeds a petabyte—one million gigabytes.
Characteristics Of Big Data
Big data can be described by the following characteristics:
The name Big Data itself is related to a size which is enormous. Size of data plays a very crucial role in determining value out of data. Also, whether a particular data can actually be considered as a Big Data or not, is dependent upon the volume of data. Hence, ‘Volume’ is one characteristic which needs to be considered while dealing with Big Data solutions.
The next aspect of Big Data is its variety.Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. During earlier days, spreadsheets and databases were the only sources of data considered by most of the applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also being considered in the analysis applications. This variety of unstructured data poses certain issues for storage, mining and analyzing data.
The term ‘velocity’ refers to the speed of generation of data. How fast the data is generated and processed to meet the demands, determines real potential in the data.Big Data Velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, and social media sites, sensors, Mobile devices, etc. The flow of data is massive and continuous.
This refers to the inconsistency which can be shown by the data at times, thus hampering the process of being able to handle and manage the data effectively.
This refers to the quality of the collected data. If source data is not correct, analyses will be worthless. As the world moves toward automated decision-making, where computers make choices instead of humans, it becomes imperative that organizations be able to trust the quality of the data.
Data must be understandable to nontechnical stakeholders and decision makers. Visualization is the creation of complex graphs that tell the data scientist’s story, transforming the data into information, information into insight, insight into knowledge, and knowledge into advantage.
How can organizations make use of big data to improve decision-making? A McKinsey article about the potential impact of big data on health care in the U.S. suggested that big-data initiatives “could account for $300 billion to $450 billion in reduced health-care spending, or 12 to 17 percent of the $2.6 trillion baseline in US health-care costs.” The secrets hidden within big data can be a goldmine of opportunity and savings.
The three different formats of big data are:
Sources of Big Data
These data come from many sources like
Big Data Applications
These are some of the following domains where Big Data Applications has been revolutionized:
Challenges of Big Data
One of the issues with Big data is the exponential growth of raw data. The data centers and databases store huge amounts of data, which is still rapidly growing. With the exponential growth of data, organizations often find it difficult to rightly store this data.
The next challenge is choosing the right Big Data tool. There are various Big Data tools, however choosing the wrong one can result in wasted effort, time and money too.
Next challenge of Big Data is securing it. Often organizations are too busy understanding and analyzing the data, that they leave the data security for a later stage, and unprotect data ultimately becomes the breeding ground for the hackers.