Big Data for Chimps: A Guide to Massive-Scale Data Processing in Practice + code

Title: Big Data for Chimps: A Guide to Massive-Scale Data Processing in Practice

Authors: Philip (flip) Kromer, Russell Jurney

Publisher: O'Reilly Media

Year: 2015

Pages: 220

Language: English

Format: PDF + code

Size: 3.5 MB

Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems.

Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.

Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster

Dive into map/reduce mechanics and build your first map/reduce job in Python

Understand how to run chains of map/reduce jobs in the form of Pig scripts

Use a real-world dataset—baseball performance statistics—throughout the book

Work with examples of several analytic patterns, and learn when and where you might use them

Part I. Introduction: Theory and Tools

Chapter 1. Hadoop Basics

Chimpanzee and Elephant Start a Business

Map-Only Jobs: Process Records Individually

Pig Latin Map-Only Job

Setting Up a Docker Hadoop Cluster

Wrapping Up

Chapter 2. MapReduce

Chimpanzee and Elephant Save Christmas

Pygmy Elephants Carry Each Toy Form to the Appropriate Workbench

Example: Reindeer Games

Hadoop Versus Traditional Databases

The MapReduce Haiku

Wrapping Up

Chapter 3. A Quick Look into Baseball

The Data

Acronyms and Terminology

The Rules and Goals

Performance Metrics

Wrapping Up

Chapter 4. Introduction to Pig

Pig Helps Hadoop Work with Tables, Not Records

Fundamental Data Operations

LOAD Locates and Describes Your Data

STORE Writes Data to Disk

Development Aid Commands

Pig Functions

Piggybank

Apache DataFu

Wrapping Up

Part II. Tactics: Analytic Patterns

Chapter 5. Map-Only Operations

Pattern in Use

Eliminating Data

Selecting Records That Satisfy a Condition: FILTER and Friends

Project Only Chosen Columns by Name

Transforming Records

Operations That Break One Table into Many

Operations That Treat the Union of Several Tables as One

Wrapping Up

Chapter 6. Grouping Operations

Grouping Records into a Bag by Key

Group and Aggregate

Calculating the Distribution of Numeric Values with a Histogram

The Summing Trick

Wrapping Up

References

Chapter 7. Joining Tables

Matching Records Between Tables (Inner Join)

How a Join Works

Enumerating a Many-to-Many Relationship

Joining a Table with Itself (Self-Join)

Joining Records Without Discarding Nonmatches (Outer Join)

Selecting Only Records That Lack a Match in Another Table (Anti-Join)

Selecting Only Records That Possess a Match in Another Table (Semi-Join)

Wrapping Up

Chapter 8. Ordering Operations

Preparing Career Epochs

Sorting All Records in Total Order

Sorting Records Within a Group

Numbering Records in Rank Order

Wrapping Up

Chapter 9. Duplicate and Unique Records

Handling Duplicates

Set Operations

Wrapping Up
