Hadoop helped propel the data revolution by bringing it agility and versatility, but now it has become a huge machine. It continues to increase in complexity, allowing lightweight alternatives to challenge it. And chasing a Hadoop solution is becoming more like chasing ERP (Enterprise Resource Planning). Don’t let the old habits come back. Be sure to embrace the Hadoop philosophy rather than the Hadoop machine.
Choosing a Hadoop distribution has become a project in itself. Hadoop distribution choices now tend to be made around an immutable technological set. This is the opposite of the original philosophy of Hadoop creators. They believed Hadoop should be a flexible framework with pluggable components that would lead to more agility.
What if new technologies better fit the needs?
Some already think the elephant has become too “fat” and over-hyped, especially in the world of fast/real time analytics. New big data alternatives with more speed and flexibility to fit a wider variety of needs are emerging.
Moreover it is important to note that even today, SQL fits the needs of smaller data types in place of Hadoop completely. And has become important to understand when not to use Hadoop rather than using it for everything.
New technology stacks are being implemented to answer specific needs while not rendering Hadoop obsolete. However, IT poses the question of how much you can add to the same stack.
The add-on components and services created in pursuit of more fitting data solutions are complicated to deploy and manage. A complex stack is hard to debug and fine tune if you are not a startup with Stanford or MIT graduates––making you dependent on one or a few key providers.
Everything about this is counterintuitive to the Hadoop philosophy since making the system more complex will additionally slow down a business’ big data crunching. A framework that mines the agility and versatility of the Hadoop technology stack and integrates them seamlessly into a smarter, less involved management of real-time big data is the new ideal.
Some will argue that the massification of the platform is necessary if Hadoop is going to fit into the real enterprise world as opposed to the startup world. All these features are required because of security, auditability, etc. But I would say that (re)-introducing complexity leads to more complexity to deal with it in the first place.
There’s no question whether Hadoop is a good tool for enterprises, especially when database storage for loosely related files in a distributed file system is the primary use. I like Hadoop because of its raison d’être rather than the technology itself. Technology evolves every day, but philosophies remain. Hadoop is more about data processing at scale and low cost than the technology race.