A site devoted mostly to everything related to Information Technology under the sun - among other things.

Monday, January 21, 2013

MapReduce Book

The book "MapReduce Design Patterns" by Donald Miner and Adam Shook is a good intermediate resource on MapReduce.

Each pattern is explained in context, with pitfalls and caveats clearly identified to help avoid common design mistakes when modeling large data architectures. The book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop. The patterns are grouped into six categories:
  • Summarization patterns: get a top-level view by summarizing and grouping data
  • Filtering patterns: view data subsets such as records generated from one user
  • Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
  • Join patterns: analyze different datasets together to discover interesting relationships
  • Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
  • Input and output patterns: customize the way you use Hadoop to load or store data
This is not a step-by-step "recipe" book: it skips line-by-line breakdowns and so packs a lot of content into its 436 pages. (There is also a usable summary in 30 or so pages.)
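
To give a flavor of the summarization category listed above, here is a minimal sketch of my own. It is not taken from the book, whose examples are written for Hadoop. It runs in memory in plain Python, and the record layout (user, comment) and the function names are assumptions made up for illustration. The `shuffle` step stands in for the grouping by key that a MapReduce framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (user_id, comment_length) pair per input record."""
    for user_id, comment in records:
        yield user_id, len(comment)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Summarize each user's comment lengths as (count, min, max)."""
    return {
        key: (len(values), min(values), max(values))
        for key, values in grouped.items()
    }


# Hypothetical input: (user, comment) records.
records = [
    ("alice", "hello"),
    ("bob", "hi"),
    ("alice", "map and reduce"),
]

summary = reduce_phase(shuffle(map_phase(records)))
print(summary)
```

A filtering pattern would look similar, except that the map step would drop the records that fail a test and no reduce step would be needed.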

About Me

I am a senior software developer working for General Motors Corporation. I am interested in intelligent computing and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.
