How to turn big data into smart data
Data-based decisions require Smart Data. Big data cleansing and precise questions help reduce the load of data.
Data is like tracks in the snow
Corporate business processes have fundamentally changed as a result of globalization and digitalization. The barriers to entry for new market players have been lowered. Digital platforms that serve as networks benefiting millions of people are forcing analog suppliers out of the market or even disrupting entire industries. Today, companies need to adapt and evolve agilely in order to stand out from the competition. Data is the fuel that drives corporate agility and organizational learning cycles.
Imagine the earth as one enormous field under a blanket of fresh snow: men, women, and children leave tracks in the snow with their winter boots. Animal prints also become visible. A complex, multifaceted network of tracks is formed in just a short time. At some point, the smooth white surface becomes a tightly woven tangle of tracks. We can no longer tell which track leads where and which child belongs to which parents.
In order to identify individual tracks and establish relationships in the midst of this apparent chaos, we need to use precise questions to limit the complexity of the tracks and remove many of the footprints. This is the only way to determine patterns and find answers to questions. The supercomputer in The Hitchhiker’s Guide to the Galaxy took millions of years to come up with “42” as the answer to “the ultimate question of life, the universe, and everything”. But no one knew what to do with the answer “42” because no one knew what the question was.
In order to untangle the tracks, we need:
- precise questions that allow for usable responses
- the ability to remove any irrelevant tracks accordingly
Exiled from the data warehouse to the data swamp
Many companies do the exact opposite. They indiscriminately collect data and fill data warehouses and data lakes. The majority of the data is generated in real time. It is collected using sensors, devices, audio-video networks, log files, transactional applications, the web, and social media. Much of this data lands in a data swamp. This term describes data that can no longer be used for meaningful business analysis at a reasonable cost. The tangle of tracks is not only enormous, but also obsolete. Data is only valuable when it can be processed in real time. Obsolete data is practically worthless for companies in an era in which highly dynamic developments are the norm. Your resources would be better used if you concentrated solely on collecting relevant data.
Going back to our snowy field: companies should only record the tracks that are specifically suited to their needs and relevant to their industry.
What is big data?
Big data is the term currently used to refer to the vast quantities of data collected by today’s technologies. It is characterized by a large volume, a great degree of variety (unstructured data), high velocity (data processing frequency), and high veracity (trustworthiness/representative nature of the data). These four characteristics are used to determine the value of the data. After all, not all collected data is valuable. The quality, correctness, and completeness of the data are all important factors for subsequent, accurate analyses. Big data on its own doesn’t tell us anything. Smart data is required in order to derive meaningful insights from the data we collect.
Source: IBM «The Four V’s of Big Data»; McKinsey Global Institute, Twitter, Cisco, Gartner, EMC, SAS, IBM, MEPTEC, QAS; BBVA «The Five V’s of Big Data»
Data streamlining and structuring
The first step to making big data usable is reducing the volume. We only keep the information that is useful for finding solutions to our problems. It may be necessary to limit the variety of data through a screening process. This first step increases the value and veracity of the data (see infographic).
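The screening step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the event records, the field names (source, type, customer_id, timestamp), and the 90-day freshness limit are all assumptions made up for the example. The idea is simply that the business question decides which event types are kept, and that obsolete records are dropped.

```python
from datetime import datetime, timedelta

# Hypothetical raw events; sources, types, and field names are illustrative assumptions.
RAW_EVENTS = [
    {"source": "webshop", "type": "purchase", "customer_id": 1, "timestamp": datetime(2024, 5, 20)},
    {"source": "sensor", "type": "temperature", "customer_id": None, "timestamp": datetime(2024, 5, 21)},
    {"source": "support", "type": "complaint", "customer_id": 2, "timestamp": datetime(2024, 5, 19)},
    {"source": "webshop", "type": "purchase", "customer_id": 3, "timestamp": datetime(2022, 1, 3)},
]


def reduce_volume(events, relevant_types, now, max_age):
    """Keep only events that serve the question and are not obsolete."""
    return [
        event
        for event in events
        if event["type"] in relevant_types and now - event["timestamp"] <= max_age
    ]


# Assumed question: "How satisfied are our recent customers?"
# Only purchases and complaints from the last 90 days are relevant to it.
NOW = datetime(2024, 6, 1)
KEPT = reduce_volume(RAW_EVENTS, {"purchase", "complaint"}, NOW, timedelta(days=90))

for event in KEPT:
    print(event["source"], event["type"], event["customer_id"])
```

Passing the relevant types and the maximum age as parameters keeps the filter tied to the question being asked: a different question means different arguments, not different code.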
In the next step, the unstructured data is structured and cleaned up once again. Processes such as text and language detection (Natural Language Processing) come into play in this step. Now the big data has been turned into smart data. A downstream analysis platform evaluates the data to create knowledge that can be understood immediately. Decision-makers are able to derive insights from this data in order to unlock existing potential.
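To make the structuring step more concrete, here is a rough Python sketch that turns a free-text customer comment into a structured record. It is deliberately simplistic: the keyword lists and the output fields are assumptions invented for this example, and a plain keyword count stands in for what a real Natural Language Processing library would do.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real pipeline would rely on a proper NLP library.
POSITIVE = {"great", "fast", "friendly"}
NEGATIVE = {"slow", "broken", "rude"}


def structure_comment(raw_text):
    """Clean a free-text comment and return it as a structured record."""
    # Cleaning: lowercase the text and keep only word-like tokens.
    tokens = re.findall(r"[a-z']+", raw_text.lower())
    counts = Counter(tokens)

    # Structuring: derive simple fields that can be queried later.
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    if positive_hits > negative_hits:
        sentiment = "positive"
    elif negative_hits > positive_hits:
        sentiment = "negative"
    else:
        sentiment = "neutral"

    return {
        "token_count": len(tokens),
        "positive_hits": positive_hits,
        "negative_hits": negative_hits,
        "sentiment": sentiment,
    }


RECORD = structure_comment("  Delivery was FAST, but the app is broken... really broken!  ")
print(RECORD)
```

The output is a small dictionary with fixed fields, which is the kind of structure a downstream analysis platform can aggregate across thousands of comments.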
An example:
Customer Experience Analytics (Customer Insights)
In this case, big data is generated as part of every transaction and at every point of contact between the customer and the company. Smart data is the data that makes it possible to determine which customers are in which phase of the customer life cycle or to understand customer satisfaction. The results allow the company to tap into existing potential.
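As a rough illustration of how transaction data could be mapped to a phase of the customer life cycle, consider the Python sketch below. The phase names (prospect, new, loyal, at risk), the 180-day threshold, and the sample customers are assumptions chosen for the example; a real company would define its own phases and rules.

```python
from datetime import date


def lifecycle_phase(purchase_dates, today, churn_after_days=180):
    """Assign a hypothetical life cycle phase based on purchase history."""
    if not purchase_dates:
        return "prospect"  # known contact, no purchase yet
    days_since_last = (today - max(purchase_dates)).days
    if days_since_last > churn_after_days:
        return "at risk"  # no purchase for a long time
    if len(purchase_dates) == 1:
        return "new"  # first purchase was recent
    return "loyal"  # repeated, recent purchases


TODAY = date(2024, 6, 1)

# Hypothetical purchase histories per customer.
CUSTOMERS = {
    "A": [],
    "B": [date(2024, 5, 15)],
    "C": [date(2023, 11, 2), date(2024, 4, 20)],
    "D": [date(2023, 3, 1), date(2023, 6, 9)],
}

PHASES = {cid: lifecycle_phase(dates, TODAY) for cid, dates in CUSTOMERS.items()}
print(PHASES)
```

Only two pieces of information per customer are needed here, the number of purchases and the date of the last one, which shows how little of the raw transaction data has to be kept to answer this particular question.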
Identifying challenges and using smart data
In order to benefit from the data they collect, companies are encouraged to formulate questions related to current and future challenges. Data can then be collected, structured, and provided to decision-makers as smart data on the basis of these questions. Companies must offer customers useful solutions and take proactive steps on the market faster than ever before. In order to keep up, findings from data must be continuously and automatically integrated into everyday business and management processes. This is the only way that companies will be able to optimize these processes or introduce new developments.
An example:
Employee experience analytics (Employee Insights)
At the points of contact between employees and the company, only data deemed highly relevant to employee satisfaction, dedication, and motivation (smart data) is collected. With appropriate analysis, companies are able to respond to trends proactively.
Advantages of smart data
- Reduced uncertainty in the decision-making process
- Data-based management and decision-making increase acceptance (full transparency)
- Predictive scenarios help companies understand what comes next
- Reduced costs thanks to greater efficiency in terms of data processing and analysis
- Acceleration of organizational learning cycles thanks to faster decision-making
In the age of data, companies that want to be successful must be able to process data intelligently in order to initiate useful developments.