Python Pandas Tutorial – Data Analysis With Python And Pandas

Hello everyone, this post is about a very important Python library for data analysis: Pandas. So welcome to the Python Pandas Tutorial. In this tutorial you will learn the basics of pandas, dataframes, different ways of creating dataframes, reading CSV and Excel files and much more. So let’s start with Python pandas without wasting any time.

But before proceeding further, we first need to understand the concept of data analysis. So let’s take a quick look at data analysis.

Python Pandas Tutorial – What Is Data Analysis?

Introduction

Data analysis is the process of extracting useful, relevant and meaningful information from observations in a systematic manner.

Data analysis is done for the following purposes –

  • Parameter Estimation (inferring the unknowns)
  • Model Development and Prediction (Forecasting)
  • Feature Extraction (identifying patterns) and classification
  • Hypothesis testing (Verification of postulates)
  • Fault detection (process monitoring)

Types Of Data For Data Analysis

Data analysis mainly deals with two types of data –

  • Deterministic (non-random)
  • Stochastic (non-deterministic)

Data Life Cycle

[Figure: Data Life Cycle]

In the above figure, you can see that data is stored in different formats. It can be a CSV file, an Excel file, an HTML file or any other format. You then have to convert all this data into a single format and store it somewhere. This is called data warehousing.

Once you have stored the data, you can perform certain analyses on it, such as predictive modeling, joining or merging data and many other things. After the analysis, you can even plot the results in a graph. That stage is called data visualization.

So, this is a general overview of Data Life Cycle.

Why Data Analysis?

Now we will see why data analysis is useful, with an example.

Let’s consider a data set that contains weather information across the globe from 2015 to 2018, country by country. For each country, the data set records the percentage of rain.

Now, what if you want to find only a particular country’s data? In this example, let’s say America, and for that country you want to find the percentage of rain between 2016 and 2017. What should you do? You need to perform certain analysis on the given data set, and that analysis should give you the percentage of rain in America between 2016 and 2017. This is called data analysis.
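Here is a minimal sketch of how that analysis could look in pandas. The data set is hypothetical: the column names `Country`, `Year` and `RainPercent` and all the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical weather data set: one row per country per year.
weather = pd.DataFrame({
    "Country": ["America", "America", "America", "America", "India", "India"],
    "Year": [2015, 2016, 2017, 2018, 2016, 2017],
    "RainPercent": [30.0, 42.0, 38.0, 35.0, 60.0, 55.0],
})

# Keep only America's rows for the years 2016 to 2017 (both inclusive).
mask = (weather["Country"] == "America") & weather["Year"].between(2016, 2017)
america_2016_2017 = weather[mask]

print(america_2016_2017)
```

The boolean mask selects the matching rows in one step. This is the kind of filtering that pandas makes easy.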

So this basically explains what data analysis is and why we use it.

So far we have discussed data analysis in general. Now we will discuss how to do data analysis in Python. So let’s move ahead.

Python Pandas Tutorial – Introduction To Pandas

What Is Pandas?

Pandas is an open source Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

  • It is a very popular library for data science.
  • It is built on top of NumPy.
  • The name Pandas is derived from the term “panel data”, an econometrics term for multidimensional data sets.
  • Pandas was developed by Wes McKinney, who started working on it in 2008.
  • The cool thing about Pandas is that it takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a DataFrame, that looks very similar to a table in statistical software (think Excel or SPSS, for example).

Features Of Pandas

Pandas has the following features –

  • High-level data structures (data frames)
  • More streamlined handling of tabular data and rich time series functionality.
  • Data alignment, missing-data friendly statistics, groupby, merge and join methods.
  • You can use pandas data structures, and freely draw on NumPy and SciPy functions to manipulate them.
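A small, hypothetical illustration of three of these features: missing-data friendly statistics, groupby and merge. The tables, column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (NaN).
sales = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Amount": [100.0, np.nan, 250.0, 150.0],
})

# Statistics skip missing values by default (skipna=True).
print(sales["Amount"].mean())  # mean of 100, 250 and 150 only

# groupby: total amount per store (the NaN is ignored by sum).
totals = sales.groupby("Store")["Amount"].sum()
print(totals)

# merge: join the totals with another table on the "Store" column.
stores = pd.DataFrame({"Store": ["A", "B"], "City": ["Delhi", "Mumbai"]})
report = stores.merge(totals.reset_index(), on="Store")
print(report)
```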

Pandas Data Types

Pandas is well suited for many kinds of data such as –

  • Tabular data with heterogeneously-typed columns
  • Arbitrary matrix data with row and column labels
  • Ordered and unordered time series data
  • Any other form of observational / statistical data sets

Python Pandas Tutorial – Getting Started With Pandas

In this section, we will learn how to use pandas in Python.

Creating New Project

First of all, open your IDE and create a new project. Inside this project, create a new Python file. In my case, the project looks like this –

[Figure: project structure in the IDE]

Installing Pandas Module

To work with pandas in Python, you have to install the pandas module.

  • Go to the terminal and run the following command: pip install pandas

Once the command finishes, the pandas module is installed and you can start working with it.

Importing Pandas

Now you have to import the pandas module in your project. So write the following code.

  • pd is an alias for pandas, so you don’t have to type the full module name every time.
  • By importing pandas, you can use all the classes and functions of the pandas module.
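The import described above follows the standard `import pandas as pd` convention. Printing the version is an optional extra, not part of the original step. It is a quick way to check that the installation worked.

```python
# Import pandas under its conventional alias "pd".
import pandas as pd

# Optional: confirm the installation by printing the installed version.
print(pd.__version__)
```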

Implementing Pandas In Python

Now, do the following things to use pandas –

  • First of all, import the pandas module so that you can use all the classes and functions of pandas.
  • Then create a dataframe.
  • The DataFrame is the main object in pandas. It is a data structure used to represent tabular data, such as Excel files, CSV files etc.
  • There are many ways to create a dataframe, and I will discuss them later.
  • Now write the following code snippet.

  • Here I created a dictionary which contains some information about fruits.
  • Then I created a dataframe from the dictionary.
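A sketch of these two steps follows. The `Name` and `Quantity` columns and the four rows match the result described below, but the fruit names and quantities themselves are made-up example values.

```python
import pandas as pd

# A dictionary of fruit data: each key becomes a column header.
# The fruit names and quantities are made-up example values.
fruits = {
    "Name": ["Apple", "Banana", "Mango", "Orange"],
    "Quantity": [10, 25, 8, 15],
}

# Create a dataframe from the dictionary and print it.
df = pd.DataFrame(fruits)
print(df)
```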

Now let’s see the result.

[Figure: output of the fruits dataframe]
  • Here you can see that this dataframe has columns and rows.
  • Name and Quantity are column headers.
  • 0, 1, 2, 3 are the default index labels, assigned to the rows as range(n), where n is the number of rows.

Now you know the basics of working with pandas. Next, we will discuss different ways of creating dataframes. So let’s start.

Python Pandas Tutorial – Different Ways Of Creating Dataframes

We can create pandas dataframes in the following ways, so let’s discuss them one by one.

  • From Python dictionaries
  • From list of tuples
  • From list of dictionaries
  • Using CSV files
  • Using Excel files

From Python Dictionaries

Creating dataframes from python dictionaries is very easy.

  • First of all, create a dictionary.
  • Then pass this dictionary as an argument to pd.DataFrame().
  • Then simply print the dataframe.

Write the following code to create a dataframe from a Python dictionary.

  • If you pass an index, its length must equal the length of the arrays.
  • If you don’t pass an index, the default index is range(n), where n is the array length.
  • Here I have not passed any index.
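Here is a minimal sketch of creating a dataframe from a dictionary without an index. The student names, ages and cities are hypothetical sample data.

```python
import pandas as pd

# Each key is a column name; each value is the list of that column's values.
# All lists must have the same length (here, 4). The data is hypothetical.
students = {
    "Name": ["Asha", "Ravi", "John", "Sara"],
    "Age": [21, 23, 22, 24],
    "City": ["Delhi", "Pune", "Agra", "Goa"],
}

# No index is passed, so pandas uses the default index range(4): 0, 1, 2, 3.
df = pd.DataFrame(students)
print(df)
```

If the lists had different lengths, pd.DataFrame() would raise a ValueError.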

Result

[Figure: dataframe created from a dictionary]
  • Here you can see that 0, 1, 2, 3 are the default indexes, assigned to the rows using range(n).

If you want to create an indexed dataframe, pass the index parameter while creating the dataframe. Write the following code to do this.
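A sketch using the `index` parameter follows. The labels "s1" to "s4" and the student data are hypothetical.

```python
import pandas as pd

# Hypothetical student data.
students = {
    "Name": ["Asha", "Ravi", "John", "Sara"],
    "Age": [21, 23, 22, 24],
}

# The index list must have the same length as the column lists (4 here).
df = pd.DataFrame(students, index=["s1", "s2", "s3", "s4"])
print(df)

# Rows can now be looked up by their label.
print(df.loc["s3"])
```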

Result

[Figure: dataframe with a custom index]

Now you can see that the index parameter assigns a label to each row.

From List Of Tuples

To create a pandas dataframe from a list of tuples, do the following –

  • Create a list in which each element is a tuple.
  • Each tuple is simply one row of your dataframe.
  • Then pass the columns parameter to DataFrame() to specify the column names.
  • Write the following code to try it out.
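A minimal sketch of the list-of-tuples approach follows. The rows are made-up example data.

```python
import pandas as pd

# Each tuple is one row: (name, age, city). The values are made-up examples.
rows = [
    ("Asha", 21, "Delhi"),
    ("Ravi", 23, "Pune"),
    ("John", 22, "Agra"),
]

# The columns parameter names the fields of each tuple, in order.
df = pd.DataFrame(rows, columns=["Name", "Age", "City"])
print(df)
```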

Result

[Figure: dataframe created from a list of tuples]

Once again a dataframe was created successfully. This is the second way of creating pandas dataframes. Let’s move forward and look at the other ways.

From List Of Dictionaries

You can also create a pandas dataframe from a list of dictionaries.

The difference between creating a dataframe from a dictionary and from a list of dictionaries is –

  • When creating a dataframe from a dictionary, each key is a column name and its value holds all the values of that column. When creating a dataframe from a list of dictionaries, each element in the list represents one row, along with its column names.
  • So in the list, each record is a set of column: value pairs.

  • In this example, I have created a list of dictionaries that contains book data.
  • Then I passed this list as an argument to DataFrame().
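A sketch of this approach follows. The book titles, authors and prices are placeholders, not real data. The last record deliberately omits a key, to show that pandas fills a missing column value with NaN.

```python
import pandas as pd

# Each dictionary is one row; its keys are the column names.
# The book data below is illustrative only.
books = [
    {"Title": "Book A", "Author": "Author X", "Price": 250},
    {"Title": "Book B", "Author": "Author Y", "Price": 300},
    {"Title": "Book C", "Author": "Author Z"},  # no "Price" key -> NaN
]

df = pd.DataFrame(books)
print(df)
```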

Result

So the result of creating dataframe from list of dictionaries is here.

[Figure: dataframe created from a list of dictionaries]

Using CSV Files

You can also create dataframes from CSV files. Let’s discuss how to do that.

Write the following code to create a dataframe from a CSV file.

  • To create a dataframe from a CSV file, you first have to read the CSV file. For more details, check the Python CSV Reader Tutorial – Reading CSV Files with Python.
  • read_csv() is a pandas function that reads a CSV file into a dataframe.
  • If your CSV file is not in the same folder as your program file, you have to provide the proper path to the file. Otherwise, just pass the CSV file name as the argument to read_csv().
  • Now run the code and see the result.
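Here is a self-contained sketch of reading a CSV file. It assumes a hypothetical file named "fruits.csv" with `Name` and `Quantity` columns, and it writes that file into a temporary folder first so the example can run anywhere. With your own data, you would just pass your file's name or path to `read_csv()`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical CSV content; the first line is the header row.
csv_text = "Name,Quantity\nApple,10\nBanana,25\nMango,8\n"

with tempfile.TemporaryDirectory() as folder:
    # "fruits.csv" is a hypothetical file name used only for this example.
    path = os.path.join(folder, "fruits.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_text)

    # read_csv() parses the file into a dataframe, using line 1 as the header.
    df = pd.read_csv(path)

print(df)
```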

Result

So this is our dataframe that is created by using CSV file.

[Figure: dataframe created from a CSV file]

Using Excel Files

Creating dataframes from Excel files is very similar to using CSV files.

So write the following code.

  • The read_excel() function reads an Excel file into a dataframe.
  • Here you can pass one extra argument, sheet_name, because an Excel file can contain several sheets. If you omit it, the first sheet is read.
  • Now run the code and check the result.
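Here is a self-contained sketch of reading one sheet of an Excel file. It assumes the optional openpyxl package is installed, because pandas needs it to write and read .xlsx files. The file name "fruits.xlsx" and sheet name "Stock" are hypothetical. The example creates the file itself so that it can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Assumption: the optional "openpyxl" package is installed; pandas uses it
# to write and read .xlsx files.
data = pd.DataFrame({"Name": ["Apple", "Banana"], "Quantity": [10, 25]})

with tempfile.TemporaryDirectory() as folder:
    # "fruits.xlsx" and "Stock" are hypothetical file and sheet names.
    path = os.path.join(folder, "fruits.xlsx")
    data.to_excel(path, sheet_name="Stock", index=False)

    # sheet_name selects which sheet to read; it defaults to 0 (the first sheet).
    df = pd.read_excel(path, sheet_name="Stock")

print(df)
```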
[Figure: dataframe created from an Excel file]

These are the main ways of creating a pandas dataframe, but other ways exist as well. If you want to explore them, follow the pandas documentation.

We have now completed this Python pandas tutorial and learned a lot.


So here, I am wrapping up the Python Pandas Tutorial. In the next tutorial, you will learn about pandas operations, that is, the types of operations you can perform in pandas. Till then, stay tuned with Simplified Python. And if you have any questions about this tutorial, just leave a comment. Happy Coding 🙂
