Monday, January 21, 2013

MapReduce Book

The book "MapReduce Design Patterns" by Donald Miner and Adam Shook is a good intermediate resource on MapReduce.

Each pattern is explained in context, with pitfalls and caveats clearly identified to help avoid common design mistakes when modeling large data architectures. It also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop. The patterns are grouped into six categories:
  • Summarization patterns: get a top-level view by summarizing and grouping data
  • Filtering patterns: view data subsets such as records generated from one user
  • Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
  • Join patterns: analyze different datasets together to discover interesting relationships
  • Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
  • Input and output patterns: customize the way you use Hadoop to load or store data
This is not a "recipe" book with step-by-step instructions; by skipping line-by-line breakdowns, it packs a lot of content into its 436 pages. (There is also a usable summary in 30 or so pages.)
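
To make the first two categories a little more concrete, here is a minimal sketch of a summarization job with an optional filter. It is not taken from the book and does not use Hadoop: it simulates the map, shuffle, and reduce phases in memory with plain Python. The record layout (user, bytes sent) and all function names are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical input records: (user, bytes_sent). Invented for illustration.
RECORDS = [
    ("alice", 120),
    ("bob", 300),
    ("alice", 80),
    ("carol", 50),
    ("bob", 100),
]


def map_record(record):
    """Map phase: emit one (key, value) pair per input record."""
    user, nbytes = record
    yield user, nbytes


def shuffle(pairs):
    """Shuffle phase: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase: summarize one group as (record count, total bytes)."""
    return key, (len(values), sum(values))


def summarize(records, keep=lambda record: True):
    """Run the three phases in memory; `keep` is an optional filter predicate."""
    pairs = (
        pair
        for record in records
        if keep(record)  # filtering: drop records before they are mapped
        for pair in map_record(record)
    )
    return dict(reduce_group(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Summarization: one (count, total) pair per user.
    print(summarize(RECORDS))
    # Filtering: only the records generated by one user.
    print(summarize(RECORDS, keep=lambda record: record[0] == "alice"))
```

In a real Hadoop job the framework performs the shuffle between mappers and reducers; the in-memory version only shows how the grouping key drives the summary.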
