Posted in Algorithm

Space efficient Counting Sort using Hash Map

Reducing the space complexity of counting sort from O(n+k) to O(n)

Space efficient Counting Sort using Hash Map
Image generated by AI

Among all other sorting algorithms, counting sort holds a glamorous time complexity with a very inefficient auxiliary space requirement which diminish the glamour it deserves. Today, we’re going to experiment and tinker that algorithm to revive its unseen glamour!

TL;DR It is a non comparison-based sorting algorithm which works by counting the occurrences (frequency) of each input. We’re going to use hash map instead of array to store those frequencies.

Radix sort uses counting sort algorithm to sort the digits of given unsorted numbers. This article doesn’t discuss about how counting sort works. It’d be better to first understand how the array based counting sort works before diving into this hash map based solution.

The Problem

Considering the input size as n and max element as k counting sort has a time complexity of O(n + k) which is pretty decent but the main issue with this algorithm is the auxiliary space complexity which is O(n + k) and that’s what it is infamous for. It works well as long as the range of numbers is short. But, the space complexity increases proportionally with the maximum number of the given array. Let’s look into this with examples:

Say, we have an array of 5 numbers to sort: [9, 4, 3, 4, 8], here the maximum number is 9 thus counting sort algorithm will allocate a count array of length 10 to make the indices from 0 to 9 available. Why? Because it will store the count (or frequency) of each number in the corresponding index. For our input array[9, 4, 3, 4, 8], the count array will look like this:

int count[]  
    [0] => 0  
    [1] => 0  
    [2] => 0  
    [3] => 1  
    [4] => 2  
    [5] => 0  
    [6] => 0  
    [7] => 0  
    [8] => 1  
    [9] => 1

Even for a small input array of length 5, count array has lots of unused spaces in indices 0, 1, 2, 5, 6, 7 which are holding 0 all the way. Let’s change just a single number from the unsorted input array. Now, our modified array looks like this: [9, 4, 3, 4, 800], we’ve changed 8 to 800 and now the algorithm will require count array to be of length 801! Like this:

int count[]  
    [0] => 0  
    [1] => 0  
    [2] => 0  
    [3] => 1  
    [4] => 2  
    [5] => 0  
    [6] => 0  
    [7] => 0  
    [8] => 0  
    [9] => 1  
    ...  
    ...  
    [799] => 0  
    [800] => 1

The required size of the array costs 801 * 4 = 3204 bytes of auxiliary space where most of the indices are unused. That’s definitely not very efficient for storing just 5 numbers.

Space efficient Counting Sort using Hash Map Medium Space efficient Counting Sort using Hash Map

The Solution

Instead of using array for storing the count of numbers, we can use Map or HashMap. That should be an <Integer, Integer>HashMap and the hash map will store the count of each numbers at the key of that number.

Consider the problematic input array [9, 4, 3, 4, 800] again: the countMap map will be like this after counting all the numbers of the given array:

HashMap countMap:  
    (9) => 1  
    (4) => 2  
    (3) => 1  
    (800) => 1

Look, the space required has been heavily reduced and now we need to only store the count of the numbers we need, no unused space is required in this process. But the space optimization comes at a cost. Even though the hashing algorithms are well optimized, the insertion and retrieval operation sometimes require O(n) complexity in rare-worst cases. As they say:

The typical and desired time complexity for basic operations like insertion, lookup, and deletion in a well-designed hash map is O(1) on average

We can rest assured that, for our integer arrays, on average we’ll get O(1) time complexity with the bonus of O(n) space complexity instead of O(n + k)!

HashMap based implementation

To implement the algorithm, we’re going to use Java as our programming language.

/* CountSort.java */
import java.util.HashMap;

class CountSort {
  public static void sort(int[] arr) {
    int max = Integer.MIN_VALUE, temp, length = arr.length;
    
    // Map to count the numbers
    HashMap<Integer, Integer> countMap = new HashMap<>();

    // Count the occurences of each number
    for (int i = 0; i < length; i++) {
      // the number is not in the map
      if(countMap.get(arr[i]) == null) {
        countMap.put(arr[i], 1);
      }
      // the number is in the map
      else {
        temp = countMap.get(arr[i]);
        countMap.put(arr[i], temp + 1); // increment the counter
      }

      // store the maximum number
      if (arr[i] > max) {
        max = arr[i];
      }
    }

    int cumulativeSum = 0; // variable to store cumulative sum

    // taverse from 0 to 'max' to calculate cumulative sum
    for (int i = 0; i <= max; i++) {
        temp = countMap.get(i) == null ? 0 : countMap.get(i);
      
        if (temp != 0) {
            cumulativeSum += temp;
          countMap.put(i, cumulativeSum);
        }
    }

    int[] sorted = new int[length]; // output array

    // insert elements in the deserving position of the output array
    for (int i = 0; i < length; i++) {
      temp = countMap.get(arr[i]);
      sorted[temp - 1] = arr[i];
      countMap.put(arr[i], temp - 1);
    }
    
    print(sorted);
  }

  // print array
  private static void print(int[] arr) {
    int length = arr.length;
    for (int i = 0; i < length; i++) {
      System.out.print(arr[i] + " ");
    }
  }
}

Now, a driver class to run the algorithm:

class Main {
  public static void main(String[] args) {
    int[] arr = {4, 2, 2, 8, 3, 3, 1};
    
    CountSort.sort(arr);
  }
}

Output: 1 2 2 3 3 4 8


Feel free to suggest any improvements and modifications to the algorithm in the comment section below!