Hadoop: The Definitive Guide

E-Book Overview

What I liked most about this book was that I could read the vast majority of it straight through and enjoy the process. It is very well structured, and the running example built around weather station data was an apt choice, giving good perspective on most of the problems. It offers a good mix of practical theory, examples, and code snippets.

E-Book Content

Hadoop: The Definitive Guide
Tom White
Foreword by Doug Cutting
Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

Copyright © 2009 Tom White. All rights reserved. Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938.

Editor: Mike Loukides
Production Editor: Loranah Dimant
Proofreader: Nancy Kotary
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History: June 2009: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Hadoop: The Definitive Guide, the image of an African elephant, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-0-596-52197-4

For Eliane, Emilia, and Lottie

Table of Contents

Foreword
Preface
1. Meet Hadoop
   Data!
   Data Storage and Analysis
   Comparison with Other Systems
      RDBMS
      Grid Computing
      Volunteer Computing
   A Brief History of Hadoop
   The Apache Hadoop Project
2. MapReduce
   A Weather Dataset
      Data Format
   Analyzing the Data with Unix Tools
   Analyzing the Data with Hadoop
      Map and Reduce
      Java MapReduce
   Scaling Out
      Data Flow
      Combiner Functions
      Running a Distributed MapReduce Job
   Hadoop Streaming
      Ruby
      Python
   Hadoop Pipes
      Compiling and Running
3. The Hadoop Distributed Filesystem
   The Design of HDFS
   HDFS Concepts
      Blocks
      Namenodes and Datanodes
   The Command-Line Interface
      Basic Filesystem Operations
   Hadoop Filesystems
      Interfaces
   The Java Interface
      Reading Data from a Hadoop URL
      Reading Data Using the FileSystem API
      Writing Data
      Directories
      Querying the Filesystem
      Deleting Data
   Dat
You might also like

Distributed Computing: Principles, Algorithms, and Systems
Authors: Ajay D. Kshemkalyani, Mukesh Singhal


Introduction to Parallel Computing: A Practical Guide with Examples in C
Authors: W. P. Petersen, P. Arbenz


MRI: Basic Principles and Applications
Authors: Mark A. Brown, Richard C. Semelka


From Gestalt Theory to Image Analysis: A Probabilistic Approach
Authors: Agnès Desolneux, Lionel Moisan, Jean-Michel Morel


Fortran 90 for Scientists and Engineers
Authors: Brian Hahn


Synthesis and Optimization of DSP Algorithms
Authors: Constantinides, Cheung, Luk


Encyclopedia of Physical Science and Technology - Computer Software
Authors: Robert A. Meyers (Editor-in-Chief)