GroupBy in Python

Learn how to analyze data efficiently using pandas methods like aggregate(), filter(), transform(), and apply().

Introduction

A pandas GroupBy object, created by calling groupby() on a DataFrame or Series, splits the data into groups that share a key. Its aggregate(), filter(), transform(), and apply() methods then run an operation on each group and combine the results, so you can analyze subsets of a dataset without writing explicit loops.
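
To make the splitting step concrete, here is a minimal sketch that inspects a GroupBy object before any of the four methods is called. It assumes the small sales table used in this article's examples; the variable names n_groups and group_sizes are illustrative only.

```python
import pandas as pd

# The small sales table used throughout this article
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# groupby() only records how to split the rows; no result is computed yet
grouped = df.groupby('Category')

# Number of distinct groups found in the 'Category' column
n_groups = grouped.ngroups

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs
group_sizes = {key: len(part) for key, part in grouped}

print(n_groups)
print(group_sizes)
```

Each of the methods covered in this article works on these per-group pieces and differs mainly in how it combines the per-group results.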

aggregate()

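The aggregate() method (also available under the shorter alias agg()) reduces each group to a single value per aggregation function, such as sum, mean, or count. Passing a list of functions produces one result column per function.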

Real-World Example: Calculate the total and average sales by category.


      import pandas as pd
      
      # Sales data
      data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
              'Sales': [1500, 1200, 1700, 800]}
      
      df = pd.DataFrame(data)
      
      # Grouping by category
      grouped = df.groupby('Category')
      result = grouped.aggregate(['sum', 'mean'])
      
      print(result)
          

Output:

                   Sales
                     sum    mean
      Category
      Clothing      2000  1000.0
      Electronics   3200  1600.0
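
When the two-level column header above is awkward to work with, named aggregation is one possible alternative. The sketch below assumes a pandas version that supports keyword-style named aggregation (0.25 or later); the output column names total_sales and average_sales are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sales table as in the example above
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# Each keyword names an output column; the tuple is (input column, function)
summary = df.groupby('Category').agg(
    total_sales=('Sales', 'sum'),
    average_sales=('Sales', 'mean'),
)

print(summary)
```

The result holds the same numbers as before, but with flat, self-describing column names.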
          

filter()

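Use filter() to keep or drop entire groups based on a condition. The function you pass receives each group as a DataFrame and must return a single True or False; rows from groups that return False are removed from the result.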

Real-World Example: Display categories with total sales greater than 2500.


      # Filter categories with total sales > 2500
      filtered = grouped.filter(lambda x: x['Sales'].sum() > 2500)
      print(filtered)
      
          

Output:

            Category  Sales
      0  Electronics   1500
      2  Electronics   1700
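
The same selection can also be written as a boolean mask built with transform(), which avoids calling a Python function once per group. This is a sketch of a common idiom rather than a replacement for filter(); the names group_totals, filtered_by_mask, and same_rows are illustrative.

```python
import pandas as pd

# Same sales table as in the example above
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# Broadcast each category's total back onto its rows
group_totals = df.groupby('Category')['Sales'].transform('sum')

# Keep only the rows whose category total exceeds the threshold
filtered_by_mask = df[group_totals > 2500]

# The filter() version from the example above, for comparison
filtered = df.groupby('Category').filter(lambda g: g['Sales'].sum() > 2500)

# Both approaches select the same rows
same_rows = filtered_by_mask.equals(filtered)
print(filtered_by_mask)
print(same_rows)
```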
          

transform()

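The transform() method applies a function to each group and returns a result with the same index and length as the original data, so it can be assigned directly as a new column. Called on a single column, as in the example below, it returns a Series.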

Real-World Example: Calculate each sale's percentage contribution to total sales in its category.


      # Calculate sales percentage
      df['Percent'] = grouped['Sales'].transform(lambda x: x / x.sum() * 100)
      print(df)
    
      

Output:

            Category  Sales  Percent
      0  Electronics   1500   46.875
      1     Clothing   1200   60.000
      2  Electronics   1700   53.125
      3     Clothing    800   40.000
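
For common reductions, transform() also accepts the name of a built-in function, which lets pandas use its optimized implementation instead of a lambda. The sketch below uses this to compare every sale with its category average; the column names GroupMean and DiffFromMean are illustrative.

```python
import pandas as pd

# Same sales table as in the example above
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

# 'mean' is computed once per category, then repeated for each row in it
group_mean = df.groupby('Category')['Sales'].transform('mean')

df['GroupMean'] = group_mean
# How far each sale sits above or below its category average
df['DiffFromMean'] = df['Sales'] - group_mean

print(df)
```

Because the result is aligned with the original rows, it combines with existing columns using ordinary arithmetic.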
          

apply()

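The apply() method passes each group to a custom function and combines the results. It is the most flexible of the four methods: the function may return a scalar, a Series, or a DataFrame, and the shape of the combined result depends on what it returns.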

Real-World Example: Find the largest sales difference within each category.



      # Calculate the largest sales difference in each category
      result = grouped.apply(lambda x: x['Sales'].max() - x['Sales'].min())
      print(result)
      
          

Output:

      Category
      Clothing       400
      Electronics    200
      dtype: int64
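
Because this particular function returns one value per group, the same calculation can also be expressed with agg() on the selected column. The sketch below runs both side by side; sales_range is a hypothetical helper name introduced only for this illustration.

```python
import pandas as pd

# Same sales table as in the example above
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})

def sales_range(s):
    """Difference between the largest and smallest value in a Series."""
    return s.max() - s.min()

# Selecting the column first means the function receives a Series per group
by_category = df.groupby('Category')['Sales']

range_apply = by_category.apply(sales_range)
range_agg = by_category.agg(sales_range)

print(range_apply)
print(range_agg)
```

Selecting the Sales column before calling apply() also keeps the grouping column out of the data handed to the function.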
          

| Feature | aggregate() | filter() | transform() | apply() |
| --- | --- | --- | --- | --- |
| Purpose | Reduces each group to summary values using one or more functions. | Keeps or drops whole groups based on a condition. | Applies a function to each group and returns a result aligned with the original rows. | Applies a custom function to each group and combines the results. |
| Typical Use Case | Summing, averaging, or finding min/max values for groups. | Keeping only groups that meet a condition. | Standardizing or normalizing data within groups. | Custom per-group logic that does not fit the other three methods. |
| Returns | A Series or DataFrame with one row per group. | A subset of the original rows. | A Series or DataFrame of the same shape as the input. | A Series or DataFrame, depending on what the function returns. |
| Shape Preservation | Reduces to one row per group. | Keeps only rows from matching groups. | Maintains the original shape. | Can change the shape depending on the function. |
| Supports GroupBy? | Yes | Yes | Yes | Yes |
| Function Behavior | Returns a scalar or a reduced object. | Returns True to keep a group, False to drop it. | Returns a Series of the same length as the group. | Can return a scalar, Series, or DataFrame. |
| Example Usage | df.groupby('Category').agg({'Sales': ['sum', 'mean']}) | df.groupby('Category').filter(lambda x: x['Sales'].sum() > 1000) | df.groupby('Category')['Sales'].transform(lambda x: x / x.mean()) | df.groupby('Category')['Sales'].apply(lambda x: x.max() - x.min()) |
| Performance | Fast for built-in aggregation functions. | Efficient but can be slow for complex conditions. | Faster than apply() for element-wise operations. | Can be slow, because the function runs once per group. |
| Use with Multiple Functions | Yes | No | No | No |
| Maintains Grouping? | No | No | Yes | No |
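
The shape differences summarized above can be checked directly. The following sketch runs all four methods on the article's sales table and prints the shape of each result; the nlargest(1) function passed to apply() is just one illustrative choice of a function that returns a Series per group.

```python
import pandas as pd

# Same sales table as in the examples above
df = pd.DataFrame({
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
    'Sales': [1500, 1200, 1700, 800],
})
grouped = df.groupby('Category')

# aggregate: one value per group
agg_result = grouped['Sales'].agg('sum')

# filter: original rows, restricted to groups that pass the test
filter_result = grouped.filter(lambda g: g['Sales'].sum() > 2500)

# transform: one value per original row
transform_result = grouped['Sales'].transform('sum')

# apply: shape depends on the function; here, the top sale in each group
apply_result = grouped['Sales'].apply(lambda s: s.nlargest(1))

for name, result in [('aggregate', agg_result), ('filter', filter_result),
                     ('transform', transform_result), ('apply', apply_result)]:
    print(name, result.shape)
```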
