Aggregation Framework Overview

MongoDB Aggregation is a powerful framework for processing data documents and returning computed results. It enables complex data analysis and transformation operations including counting, summing, averaging, matching, grouping, and filtering.

MongoDB provides three primary approaches to perform aggregation:

  • Aggregation Pipeline - The most flexible and recommended approach using a series of processing stages
  • Map-Reduce - For complex aggregation operations requiring custom logic
  • Single Purpose Aggregation Methods - Simplified methods for basic counting and averaging

Sample Database - Suzuki Cars Collection

The following examples use a cars collection containing documents with name, type, and price fields:


[
 {"name": "swift","type": "hatchback","price":800000},
 {"name": "ciaz","type": "sedan","price":1000000},
 {"name": "baleno","type": "hatchback","price":850000},
 {"name": "baleno rs","type": "hatchback","price":950000},
 {"name": "scross","type": "crossover","price":11500000},
 {"name": "brezza","type": "suv","price":990000},
 {"name": "gypsy","type": "suv","price":750000}
]    

Group aggregate by Name

$group stage is used to group documents by a specified identifier. It is a fundamental stage in the aggregation pipeline that allows you to group documents based on a specified expression and perform accumulations like sum, average, min, max, etc.

Grouping documents by a specific field to identify unique values.

{ "_id" : "gypsy" }
{ "_id" : "baleno rs" }
{ "_id" : "ciaz" }
{ "_id" : "baleno" }
{ "_id" : "swift" }
{ "_id" : "brezza" }
{ "_id" : "scross" }
db.suzuki.aggregate([{$group:{ _id:'$name' }}])

This aggregation pipeline operation groups documents by the name field, displaying each unique car name as an _id. This is useful for identifying distinct values in your collection.

{ "_id" : "sedan" }
{ "_id" : "hatchback" }
{ "_id" : "crossover" }
{ "_id" : "suv" }
db.suzuki.aggregate([{$group:{ _id:'$type' }}])

This aggregation operation groups documents by the type field to identify all unique vehicle categories (sedan, hatchback, SUV, crossover). It returns distinct values without counting or any additional calculations.


$match

The $match stage filters documents to pass only those that match the specified condition(s) to the next pipeline stage. It is similar to a query filter and is used to select documents based on criteria.

{ "_id" : "brezza", "type" : "suv", "price" : 990000 }
{ "_id" : "gypsy", "type" : "suv", "price" : 750000 }
db.suzuki.aggregate([{$match:{ type:'suv' }}])

This aggregation pipeline uses the $match stage to filter documents where the type field equals 'suv'. Only SUV vehicles are returned in the result set.

{ "_id" : "scross", "type" : "crossover", "price" : 11500000 }
db.suzuki.aggregate([{$match:{ price:{$gt:10000000} }}])

This aggregation filters documents to include only those with a price greater than 10,000,000. The result set contains only the crossover vehicle 'scross' that meets this condition.


$sum Operator

The $sum operator aggregates numeric values by adding them together. It can count documents (using 1) or sum specific field values. This is essential for totaling sales, prices, quantities, and other numeric metrics.

{ "_id" : "suv", "total" : 2 }
{ "_id" : "crossover", "total" : 1 }
{ "_id" : "sedan", "total" : 1 }
{ "_id" : "hatchback", "total" : 3 }
db.suzuki.aggregate([{$group:{ _id:'$type', total:{$sum:1} }}])

This aggregation counts the number of cars (documents) in each vehicle category. The $sum:1 increments the counter by 1 for each document in the group.

{ "_id" : "crossover", "total" : 11500000 }
{ "_id" : "suv", "total" : 1740000 }
{ "_id" : "sedan", "total" : 1000000 }
{ "_id" : "hatchback", "total" : 2600000 }
db.suzuki.aggregate([{$group:{ _id:'$type', total:{$sum:'$price'} }}])

This aggregation calculates the total cumulative price for all cars within each vehicle category. The $sum:'$price' adds up the price field values for all grouped documents.



$avg Operator

The $avg operator calculates the average (mean) value of numeric fields across grouped documents. It's useful for finding average prices, ratings, quantities, and other metrics to understand typical values in your dataset.

{ "_id" : "suv", "total" : 870000 }
{ "_id" : "crossover", "total" : 11500000 }
{ "_id" : "sedan", "total" : 1000000 }
{ "_id" : "hatchback", "total" : 866666.6666666666 }
db.suzuki.aggregate([{$group:{ _id:'$type', total:{$avg:'$price'} }}])

This aggregation calculates the average price for each vehicle category by summing all prices in a group and dividing by the document count. Hatchback average is approximately 866,666.67, while the crossover average is 11,500,000.


$min Operator

The $min operator returns the smallest value of a field across all documents in a group. Use it to find the lowest price, minimum rating, smallest quantity, or other minimum metrics to identify the least expensive or smallest items in your data.

{ "_id" : "sedan", "total" : 1000000 }
{ "_id" : "hatchback", "total" : 800000 }
{ "_id" : "suv", "total" : 750000 }
{ "_id" : "crossover", "total" : 11500000 }
db.suzuki.aggregate([{$group:{ _id:'$type', total:{$min:'$price'} }}])

This aggregation finds the lowest-priced vehicle in each category. For example, the cheapest SUV (gypsy) costs 750,000, while the cheapest hatchback (swift) costs 800,000.


$max Operator

The $max operator returns the largest value of a field across all documents in a group. Use it to find the highest price, maximum rating, largest quantity, or other maximum metrics to identify the most expensive or largest items in your dataset.

{ "_id" : "hatchback", "total" : 950000 }
{ "_id" : "crossover", "total" : 11500000 }
{ "_id" : "sedan", "total" : 1000000 }
{ "_id" : "suv", "total" : 990000 }
db.suzuki.aggregate([{$group:{ _id:'$type', total:{$max:'$price'} }}])

This aggregation identifies the most expensive vehicle in each category. The crossover (scross) is the most expensive at 11,500,000, while the hatchback (baleno rs) peaks at 950,000.


$filter Operator

The $filter operator selects a subset of an array to return based on specified conditions. It is useful for filtering elements within an array field of documents.


db.orders.aggregate([
   { $project:
      {
        item: 1,
        filtered_inventory: {
           $filter: {
              input: "$inventory",          // array to filter
              as: "inv",                    // variable name for each element
              cond: { $gt: [ "$$inv.qty", 50 ] } // condition
           }
        }
      }
   }
])

In this example, the $filter operator is used within a $project stage to filter the inventory array. It returns only those inventory items where the quantity (qty) is greater than 50. The filtered results are stored in a new field called filtered_inventory.


$lookup Operator

The $lookup operator performs a left outer join to another collection in the same database to filter in documents from the "joined" collection for processing.


db.orders.aggregate([
   { $lookup:
      {
        from: "inventory",      // collection to join
        localField: "item",    // field from the orders collection  
        foreignField: "sku",    // field from the inventory collection
        as: "inventory_docs"    // output array field
      }
    }
])

In this example, the $lookup stage joins the orders collection with the inventory collection. It matches documents where the item field in orders equals the sku field in inventory. The resulting matched documents from inventory are added to each orders document in a new array field called inventory_docs.