Problem statement

https://leetcode.com/problems/remove-boxes/

Solution

Very difficult problem, in fact one of the top 10 most difficult problem I met on leetcode.

Let us consider the following example and the following sequence of steps of removal - it is not optimal one and it is just shown for the purpose of example.

132211232111223144 -> 1322232111223144 -> 132223111223144 -> 133111223144 -> 1111223144 -> 11113144 -> 1111144 -> 44 -> "".

Let us look in original string, what intervals we processed:

1 3 2 2 1 1 2 3 2 1 1 1 2 2 3 1 4 4

. . . . _ _ . . . . . . . . . . . .

. . . . . . . . _ . . . . . . . . .

. . _ _ _ _ _ . . . . . . . . . . .

. _ _ _ _ _ _ _ . . . . . . . . . .

. . . . . . . . . . . . _ _ . . . .

. . . . . . . . . . . . . . _ . . .

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ . .

. . . . . . . . . . . . . . . . _ _

Then these intervals have property: either they do not intersect or one lies fully in another. Also there will be interval which have the most left element (because in the end it will be removed, so there can be two cases:

  1. If this interval do not cover all range we consider like in our example, then problem can be divided into two smaller problems.
  2. If it covers full interval (imagine our example, but without two last elements), then problem is not separated into two subproblems. However in this case we can think about what elements inside were already taken or not: we are interested in 1. In our example ones on positoins [4, 5] are already taken, that is there is interval which cover them, and ones on places [9, 10, 11] are not taken. Let us consider the first not taken one. (there can be several not-taken ones, but we do not care) Then, we can say that for interval [1, 8] problem can be solved independently. We have 1--------1112231, where we imagine that we solved this first part already. We can notice that what we need to solve now is almost the problem for the right half but with one extra 1 added to the left.

It means, that we need to consider states dp(i, j, k) is answer for the following problem: given substring B[i:j+1] and we added k symbols B[i] before the start. Taking into account all what we have before we can have two options:

  1. Interval which is covering element B[i] has length 1, then we have (k+1)**2 + dp(i+1, j, 0).
  2. Interval has length more than one, then we can say that problem for interval i+1, m-1 is independent and we can split our problem into two subproblems again: look at 1--------1112231 case once again.

Complexity

It is O(n^4) for time and O(n^3) for space. I think more careful analysis can be made.

Code

class Solution:
    def removeBoxes(self, B):
        
        @lru_cache(None)
        def dp(i, j, k):
            if i > j: return 0
            indx = [m for m in range(i+1, j+1) if B[m] == B[i]]
            ans = (k+1)**2 + dp(i+1, j, 0)
            return max([ans] + [dp(i+1, m-1, 0) + dp(m, j, k+1) for m in indx])   
            
        return dp(0, len(B)-1, 0)

Remark

We can use indx = [m for m in range(i+1, j+1) if B[m] == B[i] and (m == i+1 or B[m] != B[m-1])]

to evaluate our indexes, than we will do some early prunnig: speed will be several times faster. You can look at this as to greedy strategy: if we have range 1111222223333, we will never take only part of group.

Question: is it really O(n^4)? Because if n = 100, it will be too big?

Yes, it is in fact yes, even though constant is quite small. For example for string [1, 2]*(n//2) there will be approximately n^4/94 runs of function dp: we can figure this out if we count number of calls (note that we need to count number of not hashed calles but all calls).

Similar problems:

0312. Burst Ballons

0664. Strange Printer

1246. Palindrome Removal