joey lim

Finding duplicates: algorithmic optimization

26 April, 2021 / 3 minute read

Problem

Given a list of numbers, what is the optimal method to determine whether it contains any duplicates? While learning about algorithms and data structures, I found it helpful to analyze Big O time and space complexities of the solutions to this problem.

"Contains Duplicate" is a simple example of this problem.

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Solution 1: Brute force

Runtime: 884ms, faster than 15.35% of JavaScript submissions

var containsDuplicate = function (nums) {
  for (let i = 0; i < nums.length; i++) {
    for (let j = i + 1; j < nums.length; j++) {
      if (nums[i] === nums[j]) {
        return true;
      }
    }
  }
  return false;
};

Explanation:

First, we loop through each element of the array with index i, and for each one we check every later element, at indices j from i + 1 onward, for a duplicate. If a duplicate is found, we return true immediately. If no duplicates are found after checking every pair, the function returns false.

This brute force solution uses a nested for loop, giving us a runtime of O(n²). Since no extra space is needed, the space complexity is O(1). How could the runtime be improved?
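As a quick sanity check, here is the function above on a couple of sample inputs (the arrays are made up for illustration; the definition is repeated so the snippet is self-contained):

```javascript
var containsDuplicate = function (nums) {
  for (let i = 0; i < nums.length; i++) {
    for (let j = i + 1; j < nums.length; j++) {
      if (nums[i] === nums[j]) {
        return true;
      }
    }
  }
  return false;
};

// 1 appears twice, so this returns true.
console.log(containsDuplicate([1, 2, 3, 1])); // true
// All elements are distinct, so this returns false.
console.log(containsDuplicate([1, 2, 3, 4])); // false
```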

Solution 2: Sort, then iterate

Runtime: 76ms, faster than 98.12% of JavaScript submissions

var containsDuplicate = function (nums) {
  nums.sort(function (a, b) {
    return a - b;
  });
  for (let i = 0; i < nums.length - 1; i++) {
    if (nums[i] === nums[i + 1]) {
      return true;
    }
  }
  return false;
};

Explanation:

By sorting the array first, a nested loop is no longer needed. We only need to iterate through the sorted array and check if consecutive elements are duplicates. Because the array is sorted, any duplicates will be in consecutive order.

What is the time complexity of Array.prototype.sort()? Chrome's V8 uses Timsort, a hybrid of merge sort and insertion sort, which runs in O(n log n) time. Iterating through the sorted array once is O(n), so the overall time complexity of the solution is O(n log n), a significant improvement over the initial solution. The sort happens in place, so from the caller's perspective no extra data structures are allocated, although Timsort itself can use up to O(n) auxiliary space internally.
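As an aside, the comparator passed to sort matters: without one, JavaScript sorts by converting elements to strings, which breaks numeric ordering. A quick illustration:

```javascript
// Default sort compares elements as strings, so "10" sorts before "2".
console.log([10, 1, 2].sort()); // [1, 10, 2]

// A numeric comparator restores the expected order.
console.log([10, 1, 2].sort(function (a, b) { return a - b; })); // [1, 2, 10]
```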

Solution 3: Hash map

Runtime: 88ms, faster than 69.01% of JavaScript submissions

var containsDuplicate = function (nums) {
  const map = {};
  for (let i = 0; i < nums.length; i++) {
    if (!map.hasOwnProperty(nums[i])) {
      map[nums[i]] = 1;
    } else {
      return true;
    }
  }
  return false;
};

Explanation:

In this solution, a hash map is used to keep track of whether the current number has already appeared in the list. Reading from and inserting into a hash map takes O(1) time on average. We iterate through each element of the list once, stopping early if we find one that already exists in our hash map. This gives us a time complexity of O(n).

The trade-off we make in this solution is that the hash map requires extra space proportional to the size of the list, O(n), whereas the previous solutions allocate no auxiliary data structures of their own.
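As a side note, the same hash-based idea can be written more compactly with JavaScript's built-in Set, which stores each distinct value once. This is an equivalent alternative sketch, not the solution benchmarked above, and the name containsDuplicateSet is mine:

```javascript
var containsDuplicateSet = function (nums) {
  // A Set keeps only distinct values, so its size is smaller
  // than the array's length exactly when a duplicate exists.
  return new Set(nums).size !== nums.length;
};

console.log(containsDuplicateSet([1, 2, 3, 1])); // true
console.log(containsDuplicateSet([1, 2, 3, 4])); // false
```

The time and space complexities are the same as the hash map version: O(n) time and O(n) extra space.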

Big O vs real world performance

Remember that Big O describes how running time grows with input size and hides constant factors: an algorithm with a runtime of O(n) will not necessarily be faster than one with a runtime of O(n log n), especially for a smaller dataset. As the small test cases used here show, Solution 2 is actually faster than Solution 3 (76ms vs 88ms).

© 2022 — Designed & developed by Joey Lim