IOE SYLLABUS: BIG DATA TECHNOLOGIES


Subject code: CT 765 07

Course Objectives:


The course "Big Data Technologies" aims to give students a comprehensive understanding of the key concepts and technologies of big data analytics. The introductory segment covers the background of data analytics, the role of distributed systems, and current trends in big data analytics. The course then examines the Google File System, explaining its architecture, availability, fault tolerance, and optimization strategies for large-scale data handling. The Map-Reduce framework is explored in depth, covering the basics of functional programming, modeling real-world problems in a functional style, scalability goals, fault tolerance, and parallel efficiency. The NoSQL module introduces structured and unstructured data and surveys the taxonomy of NoSQL implementations, focusing on the architectures of HBase, Cassandra, and MongoDB. Searching and indexing big data is addressed next, with emphasis on full-text indexing and searching using tools such as Lucene and Elasticsearch for distributed search. The course concludes with a case study on Hadoop, introducing the Hadoop environment, data flow, Hadoop I/O, query languages for Hadoop, and integration with the Amazon Cloud. Overall, the course equips students with the knowledge and practical skills needed to harness the potential of big data technologies in various applications.


 

  1. Introduction to Big Data [7 hours]
    1. Big Data Overview
    2. Background of Data Analytics
    3. Role of Distributed System in Big Data
    4. Role of Data Scientist
    5. Current Trend in Big Data Analytics
  2. Google File System [7 hours]
    1. Architecture
    2. Availability
    3. Fault tolerance
    4. Optimization for large scale data
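The architecture topics above can be previewed with a small sketch of how a GFS-style file system breaks a file into fixed-size chunks and replicates each chunk across chunkservers. The 64 MB chunk size and replication factor of 3 match the published GFS design; the server names and the simple round-robin placement policy are invented here for illustration only.

```python
# Sketch of GFS-style chunking and replica placement (illustrative only).

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks
REPLICATION = 3                # each chunk is stored on 3 chunkservers

def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return the number of chunks needed for a file of `file_size` bytes."""
    return (file_size + chunk_size - 1) // chunk_size

def place_replicas(num_chunks, servers, replication=REPLICATION):
    """Assign each chunk to `replication` distinct servers (round-robin;
    the real system also considers disk usage and rack placement)."""
    placement = {}
    for chunk_id in range(num_chunks):
        placement[chunk_id] = [
            servers[(chunk_id + r) % len(servers)] for r in range(replication)
        ]
    return placement

servers = ["cs1", "cs2", "cs3", "cs4", "cs5"]      # hypothetical chunkservers
num_chunks = split_into_chunks(200 * 1024 * 1024)  # a 200 MB file
print(num_chunks)                                  # 4
print(place_replicas(num_chunks, servers)[0])      # ['cs1', 'cs2', 'cs3']
```

Replicating every chunk on multiple servers is what gives the system its availability and fault tolerance: losing one chunkserver never loses data.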
  3. Map-Reduce Framework [10 hours]
    1. Basics of functional programming
    2. Fundamentals of functional programming
    3. Real world problems modeling in functional style
    4. Map reduce fundamentals
    5. Data flow (Architecture)
    6. Real world problems
    7. Scalability goal
    8. Fault tolerance
    9. Optimization and data locality
    10. Parallel Efficiency of Map-Reduce
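The functional-style modeling above can be sketched with word count, the canonical map-reduce example, using only the Python standard library. The real framework distributes the map and reduce phases across machines and handles the shuffle for you; this single-process version shows only the programming model.

```python
# Word count expressed in map-reduce style (single-process sketch).
from itertools import groupby

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by key and sum the counts."""
    counts = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[key] = sum(count for _, count in group)
    return counts

documents = ["big data big ideas", "data flows in big systems"]
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))  # {'big': 3, 'data': 2, ...}
```

Because `map_phase` is a pure function of its input, independent documents can be mapped on different machines in any order, which is exactly the property that gives the framework its scalability and its simple fault-tolerance story (failed tasks are just re-run).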
  4. NoSQL [6 hours]
    1. Structured and Unstructured Data
    2. Taxonomy of NoSQL Implementation
    3. Discussion of basic architecture of HBase, Cassandra, and MongoDB
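The structured/unstructured distinction above can be made concrete by comparing the same "user" record as a fixed-schema row versus a schemaless document of the kind stored by document databases such as MongoDB. All field names here are invented for illustration.

```python
# Structured vs. document data (illustrative sketch).

# Structured: every row has the same columns, enforced up front.
structured_rows = [
    ("u1", "Asha", 21),
    ("u2", "Bibek", 24),
]

# Document model: each document may carry different, nested fields.
documents = [
    {"_id": "u1", "name": "Asha", "age": 21},
    {"_id": "u2", "name": "Bibek", "interests": ["hadoop", "nosql"],
     "address": {"city": "Kathmandu"}},
]

# A query over documents must tolerate missing fields
# instead of relying on a fixed schema.
with_interests = [d["_id"] for d in documents if "interests" in d]
print(with_interests)  # ['u2']
```

This schema flexibility is the common thread across the taxonomy of NoSQL implementations, even though HBase (wide-column), Cassandra (wide-column), and MongoDB (document) realize it with very different architectures.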
  5. Searching and Indexing Big Data [7 hours]
    1. Full text Indexing and Searching
    2. Indexing with Lucene
    3. Distributed Searching with Elasticsearch
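Full-text search rests on the inverted index, the core data structure inside engines such as Lucene: each term maps to the list of document ids containing it, and a multi-term query intersects those lists. The minimal sketch below omits the tokenization, stemming, and relevance scoring that real engines layer on top.

```python
# Minimal inverted index with AND-search (illustrative sketch).
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: sorted list of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """AND-search: ids of documents containing every query term."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "big data needs big indexes",
        2: "full text search with Lucene",
        3: "distributed search over big data"}
index = build_index(docs)
print(search(index, "big data"))  # [1, 3]
print(search(index, "search"))    # [2, 3]
```

Distributed search, as in Elasticsearch, partitions this same structure into shards: each shard answers the query over its slice of the documents and the results are merged.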
  6. Case Study: Hadoop [8 hours]
    1. Introduction to Hadoop Environment
    2. Data Flow
    3. Hadoop I/O
    4. Query languages for Hadoop
    5. Hadoop and Amazon Cloud
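The Hadoop data flow above can be sketched in the style of Hadoop Streaming, where the mapper and reducer are ordinary programs that read and write tab-separated lines of text. On a real cluster these would be two separate scripts wired together through the `hadoop-streaming` jar; here a single process pipes one into the other.

```python
# Word count in Hadoop Streaming style (single-process sketch).

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, like a streaming mapper."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts per key; assumes input sorted by key, which is
    what the Hadoop shuffle/sort phase guarantees the reducer."""
    current, total = None, 0
    for line in sorted_lines:
        key, value = line.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

input_lines = ["hadoop streams data", "data flows through hadoop"]
output = list(reducer(sorted(mapper(input_lines))))
print(output)  # ['data\t2', 'flows\t1', 'hadoop\t2', ...]
```

Higher-level query languages for Hadoop (such as Pig and Hive) compile queries down to exactly this kind of map-sort-reduce pipeline, which is also what runs when Hadoop jobs are submitted to managed services on the Amazon Cloud.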