E-Book Overview
Python is one of the top 3 tools that Data Scientists use. One of the tools in their arsenal is the Pandas library. This tool is popular because it gives you so much functionality out of the box. In addition, you can use all the power of Python to make the hard stuff easy!

Learning the Pandas Library is designed to bring developers and aspiring data scientists who are anxious to learn Pandas up to speed quickly. It starts with the fundamentals of the data structures. Then, it covers the essential functionality. It includes many examples, graphics, code samples, and plots from real world examples.

The Content Covers:
Installation
Data Structures
Series CRUD
Series Indexing
Series Methods
Series Plotting
Series Examples
DataFrame Methods
DataFrame Statistics
Grouping, Pivoting, and Reshaping
Dealing with Missing Data
Joining DataFrames
DataFrame Examples

Preliminary Reviews

This is an excellent introduction benefitting from clear writing and simple examples. The pandas documentation itself is large and sometimes assumes too much knowledge, in my opinion. Learning the Pandas Library bridges this gap for new users and even for those with some pandas experience such as me. -Garry C.

I have finished reading Learning the Pandas Library and I liked it... very useful and helpful tips even for people who use pandas regularly. -Tom Z.
E-Book Content
Treading on Python Series
Learning Pandas
Python Tools for Data Munging, Data Analysis, and Visualization
Matt Harrison
Technical Editor:
Copyright © 2016
While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Table of Contents
From the Author
Introduction
Installation
Data Structures
Series
Series CRUD
Series Indexing
Series Methods
Series Plotting
Another Series Example
DataFrames
Data Frame Example
Data Frame Methods
Data Frame Statistics
Grouping, Pivoting, and Reshaping
Dealing With Missing Data
Joining Data Frames
Avalanche Analysis and Plotting
Summary
About the Author
Also Available
One more thing
From the Author
Python is easy to learn. You can learn the basics in a day and be productive with it. With only an understanding of Python, moving to pandas can be difficult or confusing. This book is meant to aid you in mastering pandas.

I have taught Python and pandas to many people over the years, in large corporate environments, small startups, and at Python and data science conferences. I have seen what hangs people up and confuses them. With the correct background, an attitude of acceptance, and a deep breath, much of this confusion evaporates.

Having said this, pandas is an excellent tool. Many are using it around the world to great success. I hope you do as well.

Cheers!
Matt
Introduction
I have been using Python in some professional capacity since the turn of the century. One of the trends that I have seen in that time is the uptake of Python for various aspects of "data science": gathering data, cleaning data, analysis, machine learning, and visualization. The pandas library has seen much uptake in this area.

pandas is a data analysis library for Python that has exploded in popularity in recent years. The website describes it thusly:

“pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.” -pandas.pydata.org

My description of pandas is: pandas is an in-memory NoSQL database that has SQL-like constructs, basic statistical and analytic support, as well as graphing capability. Because it is built on top of Cython, it has less