Reading List: Hadoop and Big Data Books

Hadoop Reading List

Hadoop: The Definitive Guide

“Hadoop: The Definitive Guide” is the ideal guide for anyone who wants to know about Apache Hadoop and all that can be done with it. It is a good book on the basics of Hadoop (HDFS, MapReduce and other related technologies) and provides all the details necessary to start working with Hadoop and programming with it.

“Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk.” - Doug Cutting, Hadoop Founder, Yahoo!

Hadoop Operations: A Guide for Developers and Administrators

This book is a great resource for getting Hadoop up and running in a serious production environment.

Hadoop in Action

If you find Hadoop: The Definitive Guide a little intimidating, get your hands on this book and then go ahead with some practical examples.

Hadoop Essentials: A Quantitative Approach

This book adopts a unique approach to helping developers and CS students learn Hadoop MapReduce programming fast. Rather than filling its pages with disjointed, piecemeal code snippets that show Hadoop MapReduce features one at a time, it places the whole learning process in one common application context: mining customer spending patterns from large volumes of credit card transaction records.

Hadoop For Dummies

“Hadoop For Dummies” helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.

Hadoop in Practice

“Hadoop in Practice” collects nearly 100 Hadoop examples and presents them in a problem/solution format.

Big Data Analytics with R and Hadoop

It is a brief introduction to R and Hadoop, and to how to use them together to solve big data problems.

MapReduce Design Patterns

This book brings together a collection of MapReduce design patterns.

“A clear exposition of MapReduce programs for common data processing patterns; this book is indispensable for anyone using Hadoop.” - Tom White

Hadoop Beginner’s Guide

This book is a good starting point for beginners, covering basic Hadoop concepts and tools.

Optimizing Hadoop for MapReduce

Read this book to learn how to configure your Hadoop cluster to run optimal MapReduce jobs.

Hadoop Real-World Solutions Cookbook

“Hadoop Real-World Solutions Cookbook” serves up recipes for working with Hadoop. The book has 10 chapters dealing with basics such as setting up Hadoop, getting data into and out of Hadoop, and working with HDFS.

Pro Hadoop

This book gives the ins and outs of MapReduce: how to structure a cluster; how to design and implement the Hadoop file system; and how to build your first cloud computing tasks using Hadoop.

Mastering Hadoop

Another book that covers the basics of Hadoop MapReduce and shows how to optimize your MapReduce jobs.

Books on Hadoop Ecosystem

Here are a few books focusing on Hadoop ecosystem projects:

HBase: The Definitive Guide

Programming Hive

Programming Pig

Apache Sqoop Cookbook


Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 


Please share your reviews of, or experiences with, the books listed above in the comments.



String Interning – What, Why and When?

What is String Interning?

String interning is a method of storing only one copy of each distinct string value, which must be immutable.

In Java, the String class has a public method, intern(), that returns a canonical representation of the string object. The String class privately maintains a pool of strings, and String literals are automatically interned into it.

When the intern() method is invoked on a String object, it looks up the string contained in that object in the pool. If the string is found there, the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
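
The lookup-or-add behavior described above can be modeled with an ordinary map. This is only a toy sketch: the real pool lives inside the JVM, and the ToyInternPool class and its map-based storage are illustrative assumptions, not how String.intern() is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an intern pool (hypothetical class; the real pool is JVM-internal).
class ToyInternPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to s, adding s itself if none exists yet.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        ToyInternPool pool = new ToyInternPool();
        String first = new String("Test");
        String second = new String("Test");
        System.out.println(pool.intern(first) == first);   // true: first was added to the pool
        System.out.println(pool.intern(second) == first);  // true: the pooled copy is returned
    }
}
```

The second call hands back the object stored by the first call, which is the "one copy per distinct value" property that interning provides.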

The intern() method helps when comparing two String objects with the == operator: once both strings are interned, equal contents imply the same pooled object, so a reference comparison is enough, and it is typically faster than equals(). The string pool is maintained to save space and to allow these faster comparisons. Normally, Java programmers are advised to use equals(), not ==, to compare two strings, because the == operator compares references (memory locations), while equals() compares the contents of the two objects.
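
A minimal sketch of the difference between content equality and reference equality follows. The class name StringCompareDemo and the helper buildAtRuntime are made up for illustration; the helper assembles the string at runtime so that the compiler cannot fold it into a literal.

```java
// Sketch: content equality vs. reference equality for strings.
class StringCompareDemo {

    // Builds the string at runtime so it is a new object, not a pooled literal.
    static String buildAtRuntime(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    public static void main(String[] args) {
        String literal = "Hadoop";
        String built = buildAtRuntime("Had", "oop");

        System.out.println(literal.equals(built));     // true: same characters
        System.out.println(literal == built);          // false: different objects
        System.out.println(literal == built.intern()); // true: both refer to the pooled copy
    }
}
```

The == comparison only becomes reliable once both sides are known to be interned, which is why equals() remains the default advice.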

Why and When to Intern?

Though Java automatically interns all String literals, remember that we only need to intern strings when they are not constants and we want to be able to quickly compare them to other interned strings. The intern() method should be used on strings constructed with new String() if they are to be compared with the == operator.

Let’s take a look at the following Java program to understand the intern() behavior.

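public class TestString {

	public static void main(String[] args) {
		String s1 = "Test";              // literal, interned automatically
		String s2 = "Test";              // same literal, same pooled object
		String s3 = new String("Test");  // new object, distinct from the pooled one
		final String s4 = s3.intern();   // reference to the pooled "Test"
		System.out.println(s1 == s2);    // true
		System.out.println(s2 == s3);    // false
		System.out.println(s3 == s4);    // false
		System.out.println(s1 == s3);    // false
		System.out.println(s1 == s4);    // true
	}
}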
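
The space-saving side of interning shows up when the same value is created many times at runtime. The sketch below is illustrative only: the class InternDedupDemo, its helper methods, and the sample value "IN" are assumptions made for this example. It counts distinct String objects by identity, with and without intern().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Sketch: interning collapses many equal runtime strings into one object.
class InternDedupDemo {

    // Counts distinct String objects by identity, not by content.
    static int distinctObjects(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(values);
        return identities.size();
    }

    // Simulates a value parsed repeatedly at runtime, e.g. a repeated field in a file.
    static List<String> parseRepeated(String value, int copies, boolean intern) {
        List<String> parsed = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            String fresh = new String(value.toCharArray()); // a new object each time
            parsed.add(intern ? fresh.intern() : fresh);
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(parseRepeated("IN", 1000, false))); // 1000
        System.out.println(distinctObjects(parseRepeated("IN", 1000, true)));  // 1
    }
}
```

Whether this is worth doing depends on how often values repeat; for mostly unique strings, interning adds pool lookups without saving memory.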



Recommended Readings for Hadoop

I am writing this series to list some recommended reading for understanding Hadoop, its architecture, the finer details of cluster setup, and so on.

Understanding Hadoop Cluster Setup and Network – Brad Hedlund, with his expertise in networking, provides minute details of cluster setup and of the data exchange mechanisms in a typical Hadoop cluster.

MongoDB and Hadoop – Webinar by Mike O’Brien, Software Engineer at MongoDB, on how MongoDB and Hadoop can be used together, using core MapReduce as well as Pig and Hive.

Please post a comment if you have come across a great article or webinar link that explains things in great detail with ease.