Introduction
The GroupBy object in pandas is a powerful tool for grouping and analyzing data. With methods like aggregate(), filter(), transform(), and apply(), you can efficiently perform operations on subsets of your dataset.
The examples below work through each method in turn using a small sales dataset.
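Before looking at the individual methods, it can help to see what a GroupBy object holds. The sketch below is illustrative only: it builds a small sales table with the same values this article uses and iterates over the groups. The variable names are assumptions made for the example.

```python
import pandas as pd

# Small sales table (same values as the article's example data)
df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})

grouped = df.groupby("Category")

# Number of distinct groups found in the 'Category' column
print(grouped.ngroups)

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs
group_sizes = {}
for name, group in grouped:
    group_sizes[name] = len(group)
    print(name, group["Sales"].tolist())
```

Nothing is computed per group until a method such as aggregate() or transform() is called, so creating the GroupBy object itself is cheap.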
aggregate()
The aggregate() method applies aggregation functions like sum, mean, or count to groups of data.
Real-World Example: Calculate the total and average sales by category.
import pandas as pd
# Sales data
data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
        'Sales': [1500, 1200, 1700, 800]}
df = pd.DataFrame(data)
# Grouping by category
grouped = df.groupby('Category')
result = grouped.aggregate(['sum', 'mean'])
print(result)
Output:
            Sales
              sum    mean
Category
Clothing     2000  1000.0
Electronics  3200  1600.0
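Passing a list of function names produces two-level column headers, as in the output above. If flat, descriptive column names are preferred, pandas also supports named aggregation. The following is a minimal sketch; the output column names (total_sales, average_sales, order_count) are hypothetical labels chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation function) pair
summary = df.groupby("Category").agg(
    total_sales=("Sales", "sum"),
    average_sales=("Sales", "mean"),
    order_count=("Sales", "count"),
)
print(summary)
```

The result still has one row per category, but each column carries a single, self-explanatory name.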
filter()
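Use filter() to keep or drop entire groups based on a condition. The function you pass receives each group and returns True or False; only the rows of groups that return True appear in the result.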
Real-World Example: Display categories with total sales greater than 2500.
# Filter categories with total sales > 2500
filtered = grouped.filter(lambda x: x['Sales'].sum() > 2500)
print(filtered)
Output:
      Category  Sales
0  Electronics   1500
2  Electronics   1700
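The same selection can also be expressed as a boolean mask built with transform(), which avoids calling a Python function once per group. This is a sketch of a commonly used alternative, not a replacement for filter(); the variable names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})
grouped = df.groupby("Category")

# filter(): keep the rows of every group whose total sales exceed 2500
filtered = grouped.filter(lambda g: g["Sales"].sum() > 2500)

# Mask alternative: broadcast each group's total to its rows, then compare
mask = grouped["Sales"].transform("sum") > 2500
masked = df[mask]

print(filtered)
print(masked)
```

Both approaches keep the original row index, which is why the surviving rows are labeled 0 and 2 rather than renumbered.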
transform()
The transform() method applies a function to each group and returns a result with the same index and length as the original data, so it can be assigned straight back as a new column.
Real-World Example: Calculate each sale's percentage contribution to total sales in its category.
# Calculate sales percentage
df['Percent'] = grouped['Sales'].transform(lambda x: x / x.sum() * 100)
print(df)
Output:
      Category  Sales  Percent
0  Electronics   1500   46.875
1     Clothing   1200   60.000
2  Electronics   1700   53.125
3     Clothing    800   40.000
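For common reductions, transform() also accepts the name of a built-in function as a string, which is usually faster than a lambda. The sketch below first broadcasts each category's total to its rows and then derives the percentage; the CategoryTotal column is a hypothetical helper added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})

sales_by_category = df.groupby("Category")["Sales"]

# 'sum' is computed once per group, then repeated for every row of that group
df["CategoryTotal"] = sales_by_category.transform("sum")

# Plain column arithmetic now gives each sale's share of its category
df["Percent"] = df["Sales"] / df["CategoryTotal"] * 100
print(df)
```

Because the transformed column lines up row for row with the original DataFrame, no merge or join is needed to combine group totals with individual sales.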
apply()
The apply() method calls a custom function on each group and combines the results into a single Series or DataFrame.
Real-World Example: Find the largest sales difference within each category.
# Calculate the largest sales difference in each category
result = grouped.apply(lambda x: x['Sales'].max() - x['Sales'].min())
print(result)
Output:
Category
Clothing       400
Electronics    200
dtype: int64
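When the custom function only needs one column, selecting that column before calling apply() keeps the function simpler, since it then receives a Series rather than a whole DataFrame. Some recent pandas versions also warn when apply() on a DataFrame GroupBy operates on the grouping column, which this form sidesteps. The helper name sales_range is an assumption made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})


def sales_range(sales: pd.Series) -> int:
    """Return the gap between the largest and smallest sale in one group."""
    return sales.max() - sales.min()


# Select the 'Sales' column first so the function receives a Series per group
ranges = df.groupby("Category")["Sales"].apply(sales_range)
print(ranges)
```

A named function like this is also easier to test and reuse than an inline lambda.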
Summary
Method | Description |
---|---|
aggregate() | Applies aggregation functions like sum and mean on grouped data. |
filter() | Filters groups based on a condition. |
transform() | Applies a function to each group and returns a result aligned row for row with the original data. |
apply() | Applies a custom function to each group of data. |
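One way to see the differences summarized in the table is to compare the size and index of what each method returns on the same grouped column. The following is an illustrative sketch using the article's sales values; the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1500, 1200, 1700, 800],
})
sales = df.groupby("Category")["Sales"]

aggregated = sales.aggregate("sum")                  # one value per group
filtered = sales.filter(lambda s: s.sum() > 2500)    # subset of the original rows
transformed = sales.transform("sum")                 # one value per original row
applied = sales.apply(lambda s: s.max() - s.min())   # function result per group

results = {
    "aggregate": aggregated,
    "filter": filtered,
    "transform": transformed,
    "apply": applied,
}
for name, result in results.items():
    # Length and index show whether the result is per group or per row
    print(name, len(result), result.index.tolist())
```

Results indexed by category come from per-group reductions, while results that keep the original integer index are per-row outputs.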