Wednesday, October 21, 2009

Hadoop

Hadoop is an open-source Java software framework for data-intensive distributed applications. It enables applications to interact with thousands of data nodes.

Hadoop is an implementation of MapReduce which is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Hadoop was created by Doug Cutting, currently with Cloudera.

No comments:

Post a Comment