Saturday, April 22, 2017

The 4Vs of Big Data Veracity - Definition and Examples


Including veracity as the fourth big data attribute highlights the importance of addressing and managing the uncertainty inherent in some types of data.

In addition to the three Vs of big data covered in the previous post, this post expands on the fourth V of big data: veracity.

Most data arrives raw, with missing or incorrect fields. The problem becomes more complex when providers deliver information in different formats and when the data come from different countries, because conventions can differ dramatically depending on local customs and usage. Cleaning this data can be the most challenging step in generating value from it.
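As a rough illustration of the cleaning problem described above, the following Python sketch normalizes records whose date and number formats depend on the country of origin, and reports missing or unparseable fields instead of silently dropping them. The `FORMATS` table, the field names and the `clean_record` helper are assumptions made up for this example, not part of any particular tool.

```python
from datetime import datetime

# Hypothetical per-country conventions; a real feed would need a fuller table.
FORMATS = {
    "US": {"date": "%m/%d/%Y", "decimal": ".", "thousands": ","},
    "DE": {"date": "%d.%m.%Y", "decimal": ",", "thousands": "."},
}


def parse_amount(text, decimal, thousands):
    """Convert a locale-formatted number string to a float."""
    return float(text.replace(thousands, "").replace(decimal, "."))


def clean_record(raw):
    """Return (record, problems); fields that cannot be parsed become None."""
    fmt = FORMATS.get(raw.get("country"))
    if fmt is None:
        return None, ["unknown country"]

    problems = []
    record = {"country": raw["country"], "date": None, "amount": None}

    try:
        record["date"] = datetime.strptime(raw.get("date") or "", fmt["date"]).date()
    except ValueError:
        problems.append("bad or missing date")

    try:
        record["amount"] = parse_amount(
            raw.get("amount") or "", fmt["decimal"], fmt["thousands"]
        )
    except ValueError:
        problems.append("bad or missing amount")

    return record, problems


if __name__ == "__main__":
    rows = [
        {"country": "DE", "date": "22.04.2017", "amount": "1.234,56"},
        {"country": "US", "date": "04/22/2017", "amount": ""},
    ]
    for row in rows:
        print(clean_record(row))
```

Keeping the list of problems next to each record makes the remaining uncertainty visible to whoever uses the data later, which is the point of treating veracity as its own dimension.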

Veracity: uncertainty of data. Veracity refers to the level of reliability associated with certain types of data. Striving for high-quality data is an important requirement and a major big data challenge, but even the best data cleansing methods cannot eliminate the inherent unpredictability of some data, such as the weather, the economy, or a customer's future purchasing decisions. It also covers the biases, noise and abnormality in data. How accurate is that data in predicting business value? We are talking about how to predict and change consumer behavior through the use of big data, but from the perspective of the veracity of the data obtained.

Some data are inherently uncertain, for example: human sentiment and sincerity; GPS signals bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. No amount of data cleansing can correct this kind of data. Even so, and despite the uncertainty, the data still contain valuable information. The need to recognize and address this uncertainty is one of the hallmarks of big data.

Uncertainty manifests itself in big data in many ways. It is in the skepticism that surrounds data created in human environments such as social networks, and in not knowing how the future will unfold or how people, nature or hidden market forces will react to the variability of the world around them.

An example of this uncertainty is found in energy production: the weather is uncertain, but even so, a utility must forecast production. In many countries, regulations require that part of the production come from renewable energy sources, but neither wind nor clouds can be predicted accurately. So how can you plan for it?

To manage uncertainty, analysts have to create context around the data. One way to do this is through data fusion, where combining multiple, less reliable sources results in a more accurate and useful data point, such as social media comments attached to geospatial location information. Another way to manage uncertainty is through advanced mathematics that embraces it, such as robust optimization techniques and fuzzy logic approaches.
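To make the data fusion idea more concrete, here is a minimal Python sketch of one simple fusion rule, inverse-variance weighting. It assumes that each source reports a value together with an estimate of its own error variance and that the errors are independent; the `fuse` function and the sample readings are invented for illustration, and real fusion pipelines are usually more elaborate.

```python
def fuse(estimates):
    """Fuse independent (value, variance) estimates by inverse-variance weighting.

    Returns (fused_value, fused_variance). Sources with a smaller variance
    (more reliable) receive a larger weight.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(variance <= 0 for _, variance in estimates):
        raise ValueError("variances must be positive")

    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


if __name__ == "__main__":
    # Two equally noisy readings of the same quantity (hypothetical numbers).
    readings = [(10.0, 4.0), (12.0, 4.0)]
    value, variance = fuse(readings)
    print(value, variance)
```

Under these assumptions the fused variance is never larger than the smallest input variance, which is the sense in which several less reliable sources can yield a more accurate data point.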

By nature, we humans do not like uncertainty, but ignoring it can create even more problems than the uncertainty itself. In the big data era, managers need to approach the uncertainty dimension differently. They must recognize it, accept it, and determine how to use it to their benefit. The only certainty about uncertainty is that it will not disappear.

According to experts, an average of 60 million transactions can be processed every hour, from nearly 2 billion cards in 220 countries and territories, through more than 40 million businesses. With such a vast data set, it becomes possible to detect changes in the economy in depth. However, this is where the concept of veracity comes into play, because the information must be properly cleaned and filtered before making a decision that could affect millions of people.

To conclude, big data makes it possible to obtain a more complete picture of customers' preferences and demands. Through this deep understanding, companies of all kinds find new ways of interacting with their current and future customers. But all of this always depends on the veracity of the data obtained beforehand.

