yokolet's notelets

  1. Design
  2. Find Median from Data Stream

Design

Find Median from Data Stream

Problem Description

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10**(-5) of the actual answer will be accepted.

Constraints:

  • -10**5 <= num <= 10**5
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 10**4 calls will be made to addNum and findMedian.

https://leetcode.com/problems/find-median-from-data-stream/

Examples

Example 1

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

How to solve

A stream of data is the input for this problem. In such a case, how to maintain the input data so far is a key to solve the problem. The approach taken here is two heaps which maintain smaller and bigger values compared to the median. When a query comes in, the heap data structure is able to return the answer by O(1).

The solution has two heaps: max heap to maintain smaller values than median, min heap to maintain bigger values then median. In Python, the heap is always non-decreasing order and no way around. A common practice is save (-1) * value to make it non-increasing order.

We want to make min heap (bigger number) to larger or equal to max heap (smaller number). If two heaps have the same size, min heap should have a new value. At first, push -num to max heap, then pop from max heap. Finally, push (-1) * value to min heap. If not (min heap is larger), max heap should have a new value. At first, push num to min heap, then pop from min heap. Finally, push (-1) * value to max heap.

To answer the query, check how many values came in so far. If it is odd, min heap top has the median. If it is even, the average of min and max heap’s top values is the median.

Solution

  • class MedianFinder {
    private:
        int count = 0;
        priority_queue<int> max_heap;
        priority_queue<int, vector<int>, greater<int>> min_heap;
    
    public:
        MedianFinder() {}
    
        void addNum(int num) {
            if (max_heap.size() == min_heap.size()) {
                max_heap.push(num);
                min_heap.push(max_heap.top());
                max_heap.pop();
            } else {
                min_heap.push(num);
                max_heap.push(min_heap.top());
                min_heap.pop();
            }
            ++count;
        }
    
        double findMedian() {
            return count % 2 == 1 ? (float) min_heap.top() : (min_heap.top() + max_heap.top()) / 2.0;
        }
    };
    
  • import java.util.*;
    
    class MedianFinder {
        private int count = 0;
        private Queue<Integer> min_heap = new PriorityQueue<>();
        private Queue<Integer> max_heap = new PriorityQueue<>(Collections.reverseOrder());
    
        public MedianFinder() {}
    
        public void addNum(int num) {
            if (min_heap.size() == max_heap.size()) {
                max_heap.add(num);
                min_heap.add(max_heap.poll());
            } else {
                min_heap.add(num);
                max_heap.add(min_heap.poll());
            }
            ++count;
        }
    
        public double findMedian() {
            return count % 2 == 1 ? (double) min_heap.peek() : (min_heap.peek() + max_heap.peek()) / 2.0;
        }
    }
    
  • var MedianFinder = function() {
        this.count = 0;
        this.min_heap = new MinPriorityQueue(); // already included by leetcode
        this.max_heap = new MaxPriorityQueue(); // already included by leetcode
    };
    
    /**
     * @param {number} num
     * @return {void}
     */
    MedianFinder.prototype.addNum = function(num) {
        if (this.max_heap.size() == this.min_heap.size()) {
            this.max_heap.enqueue(num);
            this.min_heap.enqueue(this.max_heap.dequeue().element);
        } else {
            this.min_heap.enqueue(num);
            this.max_heap.enqueue(this.min_heap.dequeue().element);
        }
        this.count++;
    };
    
    /**
     * @return {number}
     */
    MedianFinder.prototype.findMedian = function() {
        return this.count % 2 == 1 ?
            this.min_heap.front().element.toFixed(1) :
            (this.min_heap.front().element + this.max_heap.front().element) / 2.0;
    };
    
  • class MedianFinder:
    
        def __init__(self):
            self.count = 0
            self.maxheap = []
            self.minheap = []
    
        def addNum(self, num: int) -> None:
            if len(self.maxheap) == len(self.minheap):
                heapq.heappush(self.minheap,
                              -heapq.heappushpop(self.maxheap, -num))
            else:
                heapq.heappush(self.maxheap,
                              -heapq.heappushpop(self.minheap, num))
            self.count += 1
    
        def findMedian(self) -> float:
            if self.count % 2 == 1:
                return float(self.minheap[0])
            else:
                return (-self.maxheap[0] + self.minheap[0]) / 2.0
    
  • # This solution gets Time Limit Exceeded, so doesn't pass
    require 'algorithms'
    
    class MedianFinder
      def initialize()
        @count = 0
        @min_heap = Containers::MinHeap.new
        @max_heap = Containers::MaxHeap.new
      end
    
    =begin
        :type num: Integer
        :rtype: Void
    =end
      def add_num(num)
        if @min_heap.size == @max_heap.size
          @max_heap.push(num)
          @min_heap.push(@max_heap.pop)
        else
          @min_heap.push(num)
          @max_heap.push(@min_heap.pop)
        end
        @count += 1
      end
    
    =begin
        :rtype: Float
    =end
      def find_median()
        @count % 2 == 1 ? @min_heap.min.to_f : (@min_heap.min + @max_heap.max) / 2.0
      end
    end
    

Complexities

  • Time: addNum: O(log(n)), findMedian: O(1)
  • Space: O(n)
Hard
Design
Heap (Priority Queue)
Data Stream
Sorting
Two Pointers