Problem statement

https://binarysearch.com/problems/Best-Interval-to-Remove/

Solution

  1. First of all, there is a trick, how I handle with empty segments [x, x]. We extend each segment 0.25 to the left and to the right and then multiply by 4. In this way we never have empty segments and also the answer will not change: if segments were separate in original date, they still be separate in new data.
  2. Second trick I use is coordinate compression. We look at segment [1, 5] as segment which cover segments [1, 2] + [2, 3] + [3, 4] + [4, 5]. Also we put as our coordinates not only ends of segment [x, y-1], but also x-1, we will see a bit later why.

Let us consider the case intervals = [[2, 5], [3, 4], [4, 7], [6, 8]]. Then we have:

  1. intervals = [(7, 21), (11, 17), (15, 29), (23, 33)]
  2. d = {7: 0, 11: 1, 15: 2, 17: 3, 21: 4, 23: 5, 29: 6, 33: 7}. What we keep in d are compressed coordinates.
  3. acc = [1, 2, 3, 2, 1, 2, 1]. What we keep in acc is coverage for each segment [0, 1], [1, 2], ...: how many times it is covered: that is first part is covered once, second also once, than twice and so on.
  4. arr = [1, 0, 0, 0, 1, 0, 1] in this array we keep only elements from acc which are not one. We need this, because we only interested in ones: when we remove them, coverege will be zero and number of parst can increase.
  5. arr2 = [1, 2, 2, 2, 3, 4, 5]. Here we calculate special cumulative sum of arr: what we keep here is when we go from 1 to 0 or from 0 to 1, we increase sum. It will help us to underatand how many parts we will have. Imagine the case when we have part [1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 5], then how many parts there will be? There will be one more gap in the left, because 1 becomes equal to 0. Also there will be one gap in the right, because new_acc = [1, 0, 1, 1, 2, 2, 1, 0, 2, 2, 1]. Number of gaps can be calculated, using parity of begin and end: if both of them are odd, it means, that ends of our segments will create gaps as well all group of ones in the middle. If we have both ends even, it means, that we have only gaps between. If one is odd and another is even, it will give one gap. However there can be cases, when gap is not created in the end. Imagine the case acc = [1, 1, 2, 2, 1, 0, 1, 1]. Then if we consider elements from 0th to 4, we will have acc_new = [0, 0, 1, 1, 0, 0, 1, 1]`. What happened here is that in fact on the left gap is not created, because it is the end of array. On the right gap is not created, because there were already gap there. So for each interval we check these conditions and update number of gaps.

Finally, what we do is calculate number of gaps when we take all segments: we look at acc and find how many places we have when we change from 0 to not 0 or from not 0 to 0.

Complexity

It is O(n log n) for time and O(n) for space.

Code

class Solution:
    def solve(self, intervals):
        intervals = [(4*x-1, 4*y+1) for x, y in intervals]
        s = set()
        for x, y in intervals:
            s |= set([x, y])

        d = {x:i for i, x in enumerate(sorted(list(s)))}

        diffs = [0]*(len(d))
        for x, y in intervals:
            diffs[d[x]] += 1
            diffs[d[y]] -= 1

        acc = list(accumulate(diffs))[:-1]
        arr = [i == 1 for i in acc]
        arr2 = list(accumulate((x!=y for x, y in zip([0] + arr, arr))))

        ans, m = 0, len(acc)

        for x, y in intervals:
            A, B = d[x], d[y] - 1
            beg, end = arr2[A], arr2[B]
            cand = (end - beg + beg % 2 + end % 2)//2

            if acc[A] == 1 and (A == 0 or acc[A-1] == 0):
                cand -= 1

            if acc[B] == 1 and (B == m - 1 or acc[B+1] == 0):
                cand -= 1

            ans = max(ans, cand + 1)

        return ans + sum((x*y == 0 and x + y != 0) for x, y in zip(acc, acc[1:]))//2