Problem statement

https://leetcode.com/problems/online-majority-element-in-subarray/

Solution

Quite difficult problem if we want to solve it without randomness (see later what I mean by this). The idea is to build segment tree, where on each segment we will keep two values (candidate, frequency of this candidate). Then the idea is to use Boyer-Moore majority vote algorithm to merge two segments.

Indeed, it can be shown, that if we have one segment with candidate a and another segment with candidtate b, then if candidates are the same, we sum their frequencies, if not, we choose the more frequent one and subtract frequencies.

In this way we always have only one candidate for each segment, but no one says that it will be the majority element (just like in Boyer-Moore), so we need another pass in the end to check if it is correct element. How we can do it? For each element we create array of indexes for this element and then we do binary search to check how many elements we have.

Here we use segment trees with signature (t, l, r), where t is number of segment and l and r are left and right points of it. In this way it is easy to traverse the tree. Code for build is taken from cp-algorithms.com, for query it also can be taken from this place, but there is alternative way to do it, look in my code.

Complexity

Time to build tree is O(n), space to build is O(n). As well as O(n) to create self.indexes dictionary of lists. For query we can do it in O(log n) time: first part is find candidate and the same complexity is to check if it is good candidate.

Code

from bisect import bisect, bisect_left

class SegmentTree:
    def __init__(self, n, arr):
        self.nums = [() for _ in range(4*n)]   #can make smaller?
        self.arr = arr

    def merge(self, t1, t2):
        if t1[0] == t2[0]: return (t1[0], t1[1] + t2[1])
        if t1[1] > t2[1]: return (t1[0], t1[1] - t2[1])
        return (t2[0], t2[1] - t1[1])

    def build(self, t, l, r):
        if l == r: 
            self.nums[t] = (self.arr[l], 1)
        else:
            mid = (l + r)//2
            self.build(2*t, l, mid)
            self.build(2*t+1, mid+1, r)
            self.nums[t] = self.merge(self.nums[2*t], self.nums[2*t+1])

    def query(self, t, l, r, ql, qr):
        if l > qr or r < ql: return (-1, 0)
        if l >= ql and r <= qr: return self.nums[t]
        m = (l + r)//2
        return self.merge(self.query(2*t, l, m, ql, qr), self.query(2*t+1, m+1, r, ql, qr))

class MajorityChecker:
    def __init__(self, arr):
        self.n = n = len(arr)
        self.Tree = SegmentTree(n, arr)
        self.Tree.build(1, 0, n - 1)
        self.indexes = defaultdict(list)
        for i in range(n):
            self.indexes[arr[i]].append(i)
        
    def query(self, l, r, tresh):
        cand = self.Tree.query(1, 0, self.n-1, l, r)
        if cand[1] == 0 : return -1
        s = bisect_left(self.indexes[cand[0]], l) 
        e = bisect(self.indexes[cand[0]], r)
        return cand[0] if e - s >= tresh else -1

Remark: other ways to solve

Another smart idea is that if we have majority element, than if we take say 20 elements from our range, then probability that we did not seen it is very small. So, for each of candidates, do binary search to calculate how many times we seen it and check if it is big enough. I am not sure however how to avoid collisions fully, like in rolling hash, where we compare strings element by element. Here it seems we can not do it.

Another idea is if we sort elements by frequency, than we can have complexity O(sqrt(n)*log n) for one query. The idea is that there can be no more than O(sqrt(n)) candidates for each range: given range with size A, for candidate to worth to check its size should be >= A/2.

  1. If A <= sqrt(n), then we can have no more than sqrt(n) candidates.
  2. If A > sqrt(n), then we can have no more than n/(A/2) = O(sqrt(n)) candidates again.
class MajorityChecker:
    def __init__(self, arr):
        self.loc = defaultdict(list)
        for i, n in enumerate(arr):
            self.loc[n].append(i)    
        self.nums = sorted(self.loc.keys(), key = lambda n: len(self.loc[n]))[::-1]
        
    def query(self, left, right, threshold):
        for n in self.nums:
            if len(self.loc[n]) < threshold: return -1
            l, r = bisect.bisect_left(self.loc[n], left), bisect.bisect(self.loc[n], right)
            if r - l >= threshold: return n
        return -1