GroupBy in Python

Learn how to analyze data efficiently using pandas methods like aggregate(), filter(), transform(), and apply().


Introduction

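A GroupBy object, returned by DataFrame.groupby(), splits a DataFrame into groups of rows that share a key value. You then run an operation on each group and pandas combines the results. The methods aggregate(), filter(), transform(), and apply() cover the most common operations: summarizing groups, keeping or dropping whole groups, computing row-level values from group-level statistics, and running arbitrary custom functions.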

The examples below all use one small sales dataset, grouped by product category.
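To make the "split" step concrete, here is a minimal sketch that iterates over a GroupBy object. Iteration yields one (key, sub-DataFrame) pair per group. The variable names `orders` and `group_sizes` are only illustrative; the data mirrors the sales data used in the examples on this page.

```python
import pandas as pd

# Same sales data as the examples on this page (variable name is illustrative)
orders = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# Iterating a GroupBy yields (group key, rows belonging to that group)
group_sizes = {}
for category, rows in orders.groupby('Category'):
    group_sizes[category] = len(rows)
    print(category, rows['Sales'].tolist())
```

Each of the four methods covered here can be thought of as a different way of processing these per-group pieces and combining the results.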

aggregate()

The aggregate() method (also available under the shorter alias agg()) reduces each group to a single value per aggregation function, such as sum, mean, or count. The result has one row per group.

Real-World Example: Calculate the total and average sales by category.


      import pandas as pd
      
      # Sales data
      data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
              'Sales': [1500, 1200, 1700, 800]}
      
      df = pd.DataFrame(data)
      
      # Grouping by category
      grouped = df.groupby('Category')
      result = grouped.aggregate(['sum', 'mean'])
      
      print(result)
          

Output:

                         Sales
                           sum    mean
            Category
            Clothing      2000  1000.0
            Electronics   3200  1600.0
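Passing a list of functions produces two-level column labels, as in the output above. If flat, descriptive column names are preferable, named aggregation is one option. The sketch below assumes the same sales data; the output names `total_sales` and `average_sales` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby('Category').agg(
    total_sales=('Sales', 'sum'),
    average_sales=('Sales', 'mean'),
)
print(summary)
```

The resulting DataFrame is still indexed by Category, but its columns are plain strings rather than (column, function) pairs, which tends to be easier to work with downstream.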
          


filter()

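Use filter() to keep or drop whole groups based on a condition. The function you pass receives each group as a DataFrame and must return a single True or False. The result contains the original rows (not aggregated values) of every group for which the function returned True.

Real-World Example: Keep only the rows of categories whose total sales exceed 2500.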


      # Filter categories with total sales > 2500
      filtered = grouped.filter(lambda x: x['Sales'].sum() > 2500)
      print(filtered)
      
          

Output:

            Category  Sales
      0  Electronics   1500
      2  Electronics   1700
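Note that the surviving rows keep their original index labels. The same selection can also be written as a boolean mask built with transform(), which may be worth considering on large data because it avoids calling a Python function once per group. This is a sketch on the same sales data; the names `by_category`, `group_totals`, and `masked` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})
by_category = df.groupby('Category')

# filter(): one True/False decision per group, original rows returned
filtered = by_category.filter(lambda g: g['Sales'].sum() > 2500)

# Equivalent mask: broadcast each group's total back onto its rows
group_totals = by_category['Sales'].transform('sum')
masked = df[group_totals > 2500]

print(filtered.equals(masked))
```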
          


transform()

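The transform() method applies a function to each group and returns a result with the same index as the original data: a Series when called on a single column, a DataFrame when called on several. Because the result lines up row by row with the input, it can be assigned directly as a new column.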

Real-World Example: Calculate each sale's percentage contribution to total sales in its category.


      # Calculate sales percentage
      df['Percent'] = grouped['Sales'].transform(lambda x: x / x.sum() * 100)
      print(df)
    
      

Output:

            Category  Sales  Percent
      0  Electronics   1500   46.875
      1     Clothing   1200   60.000
      2  Electronics   1700   53.125
      3     Clothing    800   40.000
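The lambda is not strictly required for this calculation. As a sketch on the same sales data, transform('sum') broadcasts each category's total back onto its rows, and the division then happens as ordinary column arithmetic. The names `category_totals` and `percent` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# One total per row, aligned with df's index (not one total per group)
category_totals = df.groupby('Category')['Sales'].transform('sum')

# Plain vectorized arithmetic on aligned Series
percent = df['Sales'] / category_totals * 100
print(percent.round(2))
```

Passing the function by name lets pandas use its built-in implementation, which is usually faster than a Python lambda on large datasets.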
          


apply()

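The apply() method passes each group to a custom function and combines whatever the function returns (a scalar, a Series, or a DataFrame). It is the most flexible of the four methods but typically the slowest, so prefer aggregate() or transform() when they can express the same operation.

Real-World Example: Find the sales range (largest sale minus smallest sale) within each category.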



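      # Calculate the sales range (max - min) in each category.
      # Selecting 'Sales' first passes each group to the function as a Series
      # and avoids the deprecation warning that recent pandas versions raise
      # when apply() operates on the grouping column as well.
      result = grouped['Sales'].apply(lambda s: s.max() - s.min())
      print(result)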
      
          

Output:

      Category
      Clothing       400
      Electronics    200
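For a calculation this simple, apply() is not the only route. The sketch below, on the same sales data, computes the per-category maximum and minimum with built-in aggregations and subtracts them afterwards. The names `stats` and `sales_range` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# Built-in aggregations give one row per category with 'max' and 'min' columns
stats = df.groupby('Category')['Sales'].agg(['max', 'min'])

# Column arithmetic on the aggregated result
sales_range = stats['max'] - stats['min']
print(sales_range)
```

Reserving apply() for logic that aggregate() and transform() cannot express keeps most group operations on pandas' optimized code paths.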
          


Summary

Method        Description
aggregate()   Reduces each group to summary values such as sum and mean; one row per group.
filter()      Keeps or drops whole groups based on a condition; returns the original rows.
transform()   Applies a function to each group and returns a result aligned with the original index.
apply()       Applies a custom function to each group and combines the results.
