groupby: Pandas DataFrame

We can combining data based on header and apply different aggregate function to it.

DataFrame.groupby(by,axis, level, as_index, sort, group_keys, squeeze, observed)

by : The column or label or list over which the groups will be formed.
axis : 0 ( default ) over different axis
lavel :groupby particular level or levels
as_index : bool ( default True )
sort: (default True) Sort group keys
group_keys: While using apply we can refer to
squeez: (default False ) reduce the dimension
We used this CSV file for our examples. You can download and run the script.

sales.csv Six columns , sale_id, c_id ( customer id ), p_id (product_id), qty ( quantity sold) ,store ( name )

You can also use this DataFrame for our demo examples here.
import pandas as pd 
my_dict={'sale_id':[1,2,3,4,5,6,7,8,9],
         'c_id':[2,2,1,4,2,3,2,3,2],
         'p_id':[3,4,3,2,3,3,2,2,3],
	 'product':['Monitor','CPU','Monitor','RAM','Monitor','Monitor','RAM','RAM','Monitor'],
         'qty':[2,1,3,2,3,2,3,2,2],
         'store':['ABC','DEF','ABC','DEF','ABC','DEF','ABC','DEF','ABC']}
sales = pd.DataFrame(data=my_dict)
print(sales)
using groupby on product
import pandas as pd 
sales=pd.read_csv("sales.csv") 
my_sale=sales.groupby(['product'])
print(my_sale.first())
Output is here
         sale_id  c_id  p_id  qty store
product                                
CPU            2     2     4    1   DEF
Monitor        1     2     3    2   ABC
RAM            4     4     2    2   DEF
Along with product we are also getting first matching row values.
We can apply groupby on two columns ( product and store )
my_sale=sales.groupby(['product','store'])
Output
               sale_id  c_id  p_id  qty
product store                          
CPU     DEF          2     2     4    1
Monitor ABC          1     2     3    2
        DEF          6     3     3    2
RAM     ABC          7     2     2    3
        DEF          4     4     2    2
We can apply all above parameters
my_sale=sales.groupby(['product'],observed=False,squeeze=False,as_index=False,group_keys=True,sort=False)
Output
   product  sale_id  c_id  p_id  qty store
0  Monitor        1     2     3    2   ABC
1      CPU        2     2     4    1   DEF
2      RAM        4     4     2    2   DEF

Using get_group()

Collect rows of a particular group
my_sale=sales.groupby(['product'])
print(my_sale.get_group('Monitor'))
Output ( rows of all Monitor under product )
   sale_id  c_id  p_id  product  qty store
0        1     2     3  Monitor    2   ABC
2        3     1     3  Monitor    3   ABC
4        5     2     3  Monitor    3   ABC
5        6     3     3  Monitor    2   DEF
8        9     2     3  Monitor    2   ABC

Using sum()

We can get sum of any column data by grouping them under another column. Here we will get total quantity sold against different types of products.
my_sale=sales.groupby(['product'])[['qty']].sum()
Output
         qty
product     
CPU        1
Monitor   12
RAM        7
We can use two columns for grouping and get the breakup of sum of quantity sold
my_sale=sales.groupby(['product','store'])[['qty']].sum()
Output
               qty
product store     
CPU     DEF      1
Monitor ABC     10
        DEF      2
RAM     ABC      3
        DEF      4

Using max()

What is highest quantity sold within every ( group of ) products ?
my_sale=sales.groupby(['product'])[['qty']].max()
Output
         qty
product     
CPU        1
Monitor    3
RAM        3

Using min()

What is the lowest quantity sold within every ( group of ) products ?
my_sale=sales.groupby(['product'])[['qty']].min()
Output
         qty
product     
CPU        1
Monitor    2
RAM        2

Using mean()

Get the mean of each product quantity sold
my_sale=sales.groupby(['product'])[['qty']].mean()
Output
              qty
product          
CPU      1.000000
Monitor  2.400000
RAM      2.333333

Using std()

Using standard deviation function std()
my_sale=sales.groupby(['product'])[['qty']].std()
              qty
product          
CPU           NaN
Monitor  0.547723
RAM      0.577350

Using count()

Total number of sales against each product. This can be treated as total number of rows against each product
my_sale=sales.groupby(['product'])[['qty']].count()
Output
         qty
product     
CPU        1
Monitor    5
RAM        3

Using apply()

my_sale=sales.groupby(['product'])
print(my_sale.apply(lambda x: x['qty'].sum() > 2))
Output
product
CPU        False
Monitor     True
RAM         True
dtype: bool
Pandas DataFrame
Your Rating




Post your comments , suggestion , error , requirements etc here




We use cookies to improve your browsing experience. . Learn more
HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
©2000-2019 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer