Disruptive Possibilities: How Big Data Changes Everything – by Jeffrey Needham.
I agreed to write a pre-release review of this brief (70-page), intriguing book for O’Reilly in return for a complimentary copy. I enjoyed it hugely. On the first read-through I was perplexed by the author’s rattlesnake-fast pace, throwing out ideas and theories with self-assurance but little reference or external justification. So I read it again, and now I do not hesitate to say that if you download this book, you will enjoy the author’s challenging, assertive and clearly experienced take on Big Data and what it means for industry.
And if you come from within the industry itself, as Jeff and I do, you will enjoy it all the more for the rear-view-mirror sketch Jeff gives of how IT infrastructures reached where they are now.
Of all the technologies that came and went and still remain stitched together to support commercial information processing as it is today, Jeff is strongest on Oracle and, to a lesser extent, video, file systems and storage networks. I hope that if he revises and improves this book he will add some real-world references to his story, to give the rest of us some concrete examples to enjoy.
I can’t say I remember Jeff Needham from anywhere else before reviewing this book, so I checked out his profile on LinkedIn. If you look him up, you will find he is the Jeff Needham of Hortonworks, a source for Apache Hadoop training and certification. Looking at his résumé, I think I could once have been a customer! Jeff says he ‘owned’ portions of one of the first ANSI-compliant, 64-bit C compilers back in the last century!
An experienced chap, then.
Writing a pre-release review carries some responsibility: I hope these comments and critiques will encourage you to identify how this book relates to your own position, and to download the book. I also have to offer criticism that may swiftly need updating if the author, editor or publisher reacts to it. Namely: the book is remarkably short, without references, and without many supporting case studies beyond the forcefulness and self-confidence of the author.
You know what Hadoop is, then? And how it came into being? This book will certainly explain that. Jeff was there at the birth, deeply involved in solving the challenges faced by Yahoo and the other early internet businesses with high customer and transaction volumes. He focuses on infrastructure, storage, and platform; software, algorithms and design take second place in Jeff’s world view.
Jeff explains how the responses of the pioneers to these issues are re-usable for the other drivers of high volume computing. Future IT requirements are an exponentially growing multiple of:
- large customer numbers
- the move to recording not just entity state and transactions, but every change, view, and interaction around them
- data from sensors
- mobile, of course.
What is ‘Big Data’? The author’s key assertion is that Moore’s Law – the prediction of continued improvement in the computing price/performance ratio – applies more to processing power than to data storage costs. And this, he believes, shapes the way super-computing evolves to embrace massively parallel but ideally very simple information processing operations on cheap hardware: the ‘platform’, he calls it.
(Aside: on the details, I don’t entirely buy everything he says. I bounced the idea off Kelly Sommers (@kellabyte) on Twitter and she commented: “I’ve found scaling RAM far more expensive than storage. Instagram recently saved 75% moving to persisted DB from in-mem DB.”)
The consequence, he explains very convincingly, is nothing short of disruption. Disruption to business models, and disruption to incumbent IT staff and processes.
An extremely convincing and thought-provoking book. Not a source for explaining exactly what Hadoop is built from or how it works, but still well worth the read.
It is also refreshingly different in style from most other books in this area: long, passionate stretches of dense argument that cover the ground as fast as a thriller.