Five Things You Should Know About Big Data
Big Data is often used as a buzzword and a trending topic for plenty of organizations around the world, but it is not a new one. The term is generally credited to John Mashey, former Chief Scientist at Silicon Graphics, who popularized it in the 1990s to refer to huge volumes of information. That original definition is now outdated, as new factors have been added over the last decades, and there are different versions of what we call the "V's" of Big Data (volume, velocity, variety, veracity and value). While companies and professionals are fairly aware of these V's, there are still aspects of Big Data that are rarely discussed. Here are five of them.
The misconception about the size
Big Data is wonderful, but it has also contributed to a misconception about the "size" of data and its importance for artificial intelligence development. There is also a common misunderstanding of what these two sets of technologies (Big Data and AI) actually are.
Even if we often say that AI/ML systems get better as we keep adding more data, the reality is that this is not necessarily true. We need more data to cover all the situations we want our systems to predict (e.g., we detect cats and dogs in pictures, and now we want to detect horses too; therefore we need more "data", in this case horse pictures). What we really need is to increase the data scope: a higher data volume alone is not a guarantee of success.
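To make that distinction concrete, here is a minimal Python sketch (the class names and counts are hypothetical, not from any real dataset): doubling the volume of existing classes does not widen what the system can recognize, while even a small number of examples of a new class does.

```python
# Minimal sketch of "data volume" vs. "data scope" for a classifier.
# Labels and counts below are purely illustrative.
from collections import Counter

# Existing training set: plenty of volume, but only two classes in scope.
labels = ["cat"] * 50_000 + ["dog"] * 50_000

def describe(dataset_labels):
    counts = Counter(dataset_labels)
    print(f"volume = {sum(counts.values()):,} images, "
          f"scope = {sorted(counts)} ({len(counts)} classes)")

describe(labels)                       # 100,000 images, scope = ['cat', 'dog']

# Doubling the cat/dog volume changes nothing about what can be detected...
describe(labels * 2)                   # 200,000 images, still 2 classes

# ...while a modest number of horse pictures is what actually extends the scope.
describe(labels + ["horse"] * 5_000)   # 105,000 images, 3 classes
```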
A new player: the Lakehouse
Big Data paradigms and storage approaches are continuously evolving. Data Warehouses and Data Lakes are terms most companies are already aware of. But guess what? There is a new kid in town: the Lakehouse. Companies such as AWS and Databricks adopted this term to describe an approach that combines the flexibility to collect all kinds of data with the ability to keep control of where the data is stored for later use.
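As a rough illustration of the pattern, here is a short sketch using PySpark with the open-source Delta Lake format (the bucket paths are hypothetical, and the cluster is assumed to have the delta-spark package available). Raw, loosely structured data lands in cheap object storage, yet the resulting table keeps warehouse-style guarantees such as ACID writes and schema enforcement.

```python
# Sketch of a lakehouse workflow: flexible ingestion into object storage,
# governed storage and querying via a Delta table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakehouse-sketch")
    # Enable Delta Lake (requires the delta-spark package on the classpath).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Collect "all kinds of data" flexibly, e.g. semi-structured JSON events.
events = spark.read.json("s3://my-bucket/raw/events/")  # hypothetical path

# Store them as a Delta table: same cheap storage, plus ACID transactions,
# schema enforcement and table history for later use.
events.write.format("delta").mode("append").save("s3://my-bucket/lakehouse/events")

# Query it back like a warehouse table.
spark.read.format("delta").load("s3://my-bucket/lakehouse/events") \
    .createOrReplaceTempView("events")
spark.sql("SELECT count(*) AS n FROM events").show()
```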
Evolving at your own pace
Most organizations feel they have not caught up with Big Data technologies. There are no bad companies, no winners or losers, only incremental levels of adoption and maturity. Every company has the right to start its Big Data journey when it wants or can, and to evolve based on its own abilities and internal pace of innovation. Bill Schmarzo, Chief Technology Officer of the Big Data Practice at EMC Global Services, created the Big Data Business Model Maturity Index, a framework to measure how effective an organization is at leveraging data and analytics to power the business.
A team sport
The scope of activities for Big Data, and even AI, is wide. Big Data is not only for technical folks. Technical professionals are key enablers, but other internal stakeholders (e.g., product managers, scrum masters) are also required to implement Big Data technologies successfully.
It's not too late
Contrary to popular belief, it is not "too late" to join the Big Data industry or to start leveraging Big Data technologies. We cannot, and do not need to, learn everything, but we can definitely acquire specific skills that will help us join Big Data teams, or contribute to the adoption of Big Data technologies within our organization. The learning journey depends on different factors (previous academic background, professional experience, ability to work with data, etc.), but everyone can join this technology wave with a bit of effort and tons of curiosity.
Big Data is a game-changing technology, and it’s normal to feel intimidated. If you are curious and want to learn more, join Adrian Gonzalez Sanchez for his upcoming online workshop “Introduction to Big Data” on November 30.