Digital Transformation (ZZ-1103)
Published in School of Digital Science, Universiti Brunei Darussalam, 2022
This lesson draws on Wikipedia and the Coursera course by BCG & the University of Virginia
Introduction
- Big data is a field that treats ways to
- analyze,
- systematically extract information from, or otherwise
- deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.
- The dictionary definition of big data is
- extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
- Big data analysis challenges include
- capturing data,
- data storage,
- data analysis,
- search,
- sharing,
- transfer,
- visualization,
- querying,
- updating,
- information privacy, and
- data source.
- Big data was originally associated with three key concepts:
- volume
- variety
- velocity.
- Current usage of the term big data tends to refer to the use of
- predictive analytics
- user behavior analytics, or
- certain other advanced data analytics methods that
- extract value from big data, and
- seldom to a particular size of data set.
- There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem.
- Analysis of data sets can find new correlations to
- spot business trends
- prevent diseases
- combat crime, and so on.
- Scientists, business executives, medical practitioners, advertisers, and governments alike regularly
- meet difficulties with large data-sets in areas including
- Internet searches, fintech, healthcare analytics, geographic information systems, urban informatics, and business informatics.
- The size and number of available data sets have grown rapidly as data is collected by devices such as
- mobile devices, cheap and numerous information-sensing Internet of things devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.
- Relational database management systems and desktop statistical software packages used to visualize data often have difficulty processing and analyzing big data.
- The processing and analysis of big data may require “massively parallel software running on tens, hundreds, or even thousands of servers”.
- What qualifies as “big data” varies depending on the capabilities of those analyzing it and their tools.
- Furthermore, expanding capabilities make big data a moving target.
- For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.
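To make the "massively parallel software" idea concrete, here is a minimal single-machine sketch of the map-and-merge pattern that such systems scale out across many servers. The word-count task, the chunked data, and the function names are illustrative assumptions, not part of the original text.

```python
from multiprocessing import Pool

def count_words(chunk):
    # Map step: count word occurrences within one chunk of the data set.
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge(results):
    # Reduce step: combine the per-chunk counts into one total.
    total = {}
    for counts in results:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == "__main__":
    # Hypothetical data, pre-split into chunks; real systems would
    # distribute far larger chunks across many worker machines.
    chunks = ["big data big insight", "big value", "data value data"]
    with Pool(processes=2) as pool:
        results = pool.map(count_words, chunks)
    totals = merge(results)
    print(totals["big"], totals["data"])  # 3 3
```

The same split/process/combine structure underlies frameworks such as MapReduce; only the scale and fault tolerance differ.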
Definition
- The term big data has been in use since the 1990s, with some giving credit to John Mashey for popularizing the term.
- Big data usually includes
- data sets with sizes beyond the ability of commonly used software tools
- to capture, curate, manage, and process data within a tolerable elapsed time.
- Big data philosophy encompasses
- unstructured
- semi-structured and
- structured data; however, the main focus is on unstructured data.
- Big data “size” is a constantly moving target;
- as of 2012 ranging from a few dozen terabytes to many zettabytes of data.
- Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data-sets that are diverse, complex, and of a massive scale.
- Vs
- “Variety”, “veracity”, and various other “Vs” are added by some organizations to describe it, a revision challenged by some industry authorities.
- The Vs of big data are often referred to as the “three Vs”, “four Vs”, or “five Vs”.
- They represent the qualities of big data: volume, variety, velocity, veracity, and value.
- Variability is often included as an additional quality of big data.
- A 2018 definition states
- “Big data is where parallel computing tools are needed to handle data.”
Characteristics
- Volume
- The quantity of generated and stored data.
- The size of the data determines the value and potential insight, and whether it can be considered big data or not.
- The size of big data is usually larger than terabytes and petabytes.[30]
- Variety
- The type and nature of the data.
- Earlier technologies like RDBMSs were capable of handling structured data efficiently and effectively.
- However, the change in type and nature from structured to semi-structured or unstructured challenged the existing tools and technologies.
- The big data technologies evolved with the prime intention to
- capture, store, and process
- the semi-structured and unstructured (variety) data
- generated with high speed (velocity), and huge in size (volume).
- Later, these tools and technologies were also explored and used for handling structured data, though preferably for storage.
- The processing of structured data was still kept optional, using either big data tools or traditional RDBMSs.
- This helps in analyzing the data and making effective use of the hidden insights in data collected via social media, log files, sensors, etc.
- Big data draws from text, images, audio, and video; it also completes missing pieces through data fusion.
- Velocity
- The speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
- Big data is often available in real-time.
- Compared to small data, big data is produced more continually.
- Two kinds of velocity related to big data are
- the frequency of generation and
- the frequency of handling, recording, and publishing.[31]
- Veracity
- The truthfulness or reliability of the data, which refers to the data quality and the data value.[32]
- Big data must not only be large in size, but also must be reliable in order to achieve value in the analysis of it.
- The data quality of captured data can vary greatly, affecting an accurate analysis.[33]
- Value
- The worth in information that can be achieved by the processing and analysis of large datasets.
- Value also can be measured by an assessment of the other qualities of big data.[34]
- Value may also represent the profitability of information that is retrieved from the analysis of big data.
- Variability
- The characteristic of the changing formats, structure, or sources of big data.
- Big data can include structured, unstructured, or combinations of structured and unstructured data.
- Big data analysis may integrate raw data from multiple sources.
- The processing of raw data may also involve transformations of unstructured data to structured data.
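As a small illustration of transforming unstructured data into structured data, the sketch below parses free-text log lines into structured records. The log format, field names, and sample line are hypothetical assumptions for the example, not a standard.

```python
import re

# Hypothetical log line format, assumed for illustration:
# "2022-03-01 12:00:05 ERROR disk full on /dev/sda1"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)"
)

def to_structured(line):
    # Turn one unstructured log line into a structured record (a dict),
    # or return None when the line does not match the expected format.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

record = to_structured("2022-03-01 12:00:05 ERROR disk full on /dev/sda1")
print(record["level"])  # ERROR
```

Once records are structured like this, they can be loaded into a table or queried like any other structured data, which is exactly the variability challenge described above.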
Impact
- Netflix
- Netflix’s competitive advantage is not, or not only, making videos available online; it is improving the whole experience of discovering those videos in the first place.
- Netflix collects enormous amounts of data and analyzes customers’ watching habits to generate personalized recommendations and offerings.
- They go even further: they analyze what people like to watch and why, and use this data as a basis to produce their own series.
- Netflix might be an extreme example: its whole business model is built on Big Data as a competitive advantage.
- But Big Data can also bring incremental improvements to existing business across different areas of application.
- The most common use of Big Data is probably the personalization of offerings.
- Companies like Amazon are obvious examples.
- But brick-and-mortar companies also use Big Data to get to know their customers and offer customized solutions.
- You think you got those discount vouchers from your supermarket by chance? Think again.
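A toy sketch of how a retailer might personalize offers from purchase data: recommend the items most often bought together with a given item. The basket data and function names are illustrative assumptions, not any company's actual method.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    # Count how often each pair of items appears in the same basket.
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

def recommend(item, baskets, top_n=2):
    # Recommend the items most often bought together with `item`.
    pairs = co_purchase_counts(baskets)
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

# Hypothetical purchase histories.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "vouchers"],
]
print(recommend("bread", baskets))  # ['milk', 'eggs']
```

Production recommenders use far richer signals and models, but this co-occurrence counting is the simplest version of the same idea.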
- Fraud Reduction is another example of how Big Data can be used to create value.
- Credit card companies like Visa analyze billions of transactions to identify unusual patterns and thereby reduce fraud in real time.
- According to Visa, that saves them $2 billion each year.
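One simple way to "identify unusual patterns" is to flag transactions that deviate strongly from a customer's typical spend. The z-score rule, the threshold, and the sample data below are illustrative assumptions, not Visa's actual method.

```python
from statistics import mean, stdev

def flag_unusual(amounts, threshold=2.0):
    # Flag transaction amounts more than `threshold` standard
    # deviations away from this customer's mean spend.
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical transaction history: small regular purchases,
# then one suspiciously large charge.
history = [12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 10.0, 950.0]
print(flag_unusual(history))  # [950.0]
```

Real fraud systems combine many such signals (merchant, location, timing) and score them in milliseconds, but the core idea of comparing each transaction to a learned baseline is the same.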
- Big Data can also be used for Predictive Maintenance.
- It means that a company can use the data it collects about operations to predict performance issues before they even happen.
- This is extremely valuable especially in asset intensive industries.
- An oil and gas client for example, had hundreds of wells across three continents connected to an analytics platform, which integrates data from those facilities and generates insight.
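A minimal sketch of the predictive-maintenance idea: smooth noisy sensor readings with a moving average and raise an alert when the trend crosses a limit, ideally well before outright failure. The window size, the temperature limit, and the pump data are hypothetical assumptions.

```python
def rolling_mean(values, window):
    # Moving average over the last `window` readings.
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def maintenance_alert(readings, window=3, limit=80.0):
    # Alert when the smoothed sensor reading crosses the limit;
    # return the index of the triggering reading, or None.
    for i, avg in enumerate(rolling_mean(readings, window)):
        if avg > limit:
            return i + window - 1
    return None

# Hypothetical pump temperature readings trending upward.
pump_temps = [70, 71, 72, 74, 79, 85, 91, 99]
print(maintenance_alert(pump_temps))  # 6
```

An analytics platform like the one described would run this kind of check continuously across thousands of sensor streams, scheduling maintenance when trends drift toward failure.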
- Big Data has many more application areas across all industries, all functions.
- We estimate that leaders in Big Data generate an average of 12% more revenue than those who do not maximize their use of analytics.
- Now, how do those leaders better manage to unlock value from Big Data?