Each pattern is explained in context, with pitfalls and caveats clearly identified to help avoid common design mistakes when modeling large data architecture. It also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop. They are:
- Summarization patterns: get a top-level view by summarizing and grouping data
- Filtering patterns: view data subsets such as records generated from one user
- Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
- Join patterns: analyze different datasets together to discover interesting relationships
- Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
- Input and output patterns: customize the way you use Hadoop to load or store data
No comments:
Post a Comment