OPEN SOURCE IS THE NEW NORMAL IN DATA AND ANALYTICS
- April 7, 2019
- Posted by: admin
- Scott Gnau is the CTO at Hortonworks, where he is enabling the next-generation data architecture and driving technology vision in the enterprise. Hear what he has to say in this article published in Forbes magazine.
- You hear a lot these days about how the growing deluge of digital data is fundamentally changing how nearly every business operates.
- I’ve argued for a while now that we’re at or near a data tipping point beyond which lies a new world where companies analyze many fundamentally new types of data in real time and use them to make business decisions that were previously impossible.
- But after all tipping points, there are winners and losers. I believe in this case, the winners will share one really important quality: a deliberate choice in favor of leveraging open source technologies at the heart of their modern data architecture.
- With Hadoop (the software platform for distributed processing of large sets of data) at the core, open source data architectures have reached a new level of maturity that some companies may not yet fully appreciate. As several mega-trends have converged (cloud computing, artificial intelligence, the internet of things and streaming), open source technologies have risen to the occasion with projects including Apache NiFi, Apache Storm, and Apache Kafka to drive innovation.
- Open source data architectures are no longer analogous to research projects forever running in test environments for trials and experimentation. They’re now considered mainstream in IT environments and are widely deployed in live production in several industries. In fact, they’ve become so common that if you’re building a modern data architecture, chances are high you’re using an open source stack. For a historical comparison, Hadoop in 2017 is roughly where Linux was in 2005: breaking out from technical curiosity into a mainstream technology used everywhere and driving business outcomes.
- Developers who’ve come of age in the GitHub era look at open source architectures as their preferred choice not only because they cost less to deploy and operate, but because they can drive meaningful value out of the core collaboration model. They’re comfortable with it not only because they can examine and tinker with the underlying source code to fully understand how it works, but also because they can enhance it for specific needs and contribute those enhancements back to the community at large.
- There are a lot of great examples. Ford Motor Company (a client of ours) is using open source at the heart of its Smart Mobility initiative, gathering all kinds of data (averaging about 25 gigabytes of data per hour per car) to help improve the experience of driving and riding in its Ford Fusion hybrids.
- Macy’s, another company we work with, is using open source data technology to get a better understanding of its customers in order to communicate with them more effectively and to craft advertising campaigns that reach the right shoppers at the right moment. And our client Progressive Insurance used Hadoop to analyze more than 15 billion miles’ worth of driving data gathered from its Snapshot devices plugged into the data ports of millions of cars. Drivers who tend to show safer driving habits get discounts on their insurance policies.
- Netflix is another great example. It is both a big user of and a contributor to open source software. Data is key for Netflix to deliver the best experience to customers, and it leverages many open source tools and services to get the most value from its data. Netflix even has its own Open Source Software Center.
- And some companies have found that using open source architectures provides the only cost-effective path to getting something done. U.K.-based utility company Centrica, parent of British Gas and a client of ours, priced out a solution from an existing IT supplier that would have cost 5 million pounds and deployed only 12 computing nodes. By switching to open source options, it spent 750,000 pounds for 250 computing nodes. The operational savings alone from shutting down its legacy solution basically meant the new systems paid for themselves.
- Outside of data and analytics, solutions based on open source technology are challenging well-established commercial vendors. MongoDB’s database is winning converts away from proprietary databases. Web application developers have flocked away from commercial products to open source development tools like Ruby on Rails, which has been used to build Hulu, Airbnb, Shopify and Square. Open source software even lies at the heart of new ways to build and operate massive data centers: The Facebook-sponsored Open Compute Project aims to do nothing less than upend IT infrastructure by creating open standards and enabling rapid commoditization.
- The examples make it clear that the tipping point is more or less here, which means the time is now to decide how you’re going to react. We’re past the point where making small incremental changes or playing it safe with traditional proprietary infrastructures will get you past it successfully. Either choice leaves you at risk of being left behind while your competitors move ahead. These are not incremental decisions, but rather architectural disruptions.
- The losers will be the ones who make incremental moves only to realize they’re on an unsustainable path. They’ll wind up like the proverbial frog in boiling water that doesn’t notice the small temperature increases — until it’s too late.