Leetcode 1397. Find All Good Strings

https://leetcode.com/problems/find-all-good-strings

Solution 1: optimal complexity, but difficult to code and understand

Part 1: Prefix function

Remind the definition of prefix function on a simple example: let s = ababcaba, then prefix funtion is [0,0,1,2,0,1,2,3], because for example for substring ababcab, the longest equal prefix and suffix ab have length 2. There is efficient way to evaluate prefix function, using KMP algorighm. More precisely our function prefix(self, s) will give us the length of longest common prefix and suffix, for example for our s = ababcaba result will be equal to 3 and we can find it in O(len(s)).

Part 2: Preprosess evil

Now, we want to preprocess our evil string and create some useful data, we are going ot use later. Let us againg consider A = evil = ababcaba. Create connections as T by k matrix, where T is the size of our alphabet and k = len(evil). In connections[i][j] we are going to store the length of maximum suffix of string A_0,...,A_{i-1} + char(j) which is at the same time is prefix of A (here char(0) = “a”, char(1) = “b” and so on). A bit difficult to understand, let us consider example: connections[3][0] = 1, because A_0 A_1 A_2 + char(0) = abaa and this string has suffix a which is prefix of A. connections[3][1] = 4, because A_0 A_1 A_2 + char(1) = abab, which is prefix of A. Let us fully fill this table connections:

	a	b	c
k = 0, starts with “”	1	0	0
k = 1, starts with “a”	1	2	0
k = 2, starts with “ab”	3	0	0
k = 3, starts with “aba”	1	4	0
k = 4, starts with “abab”	3	0	5
k = 5, starts with “ababc”	6	0	0
k = 6, starts with “ababca”	1	7	0
k = 7, starts with “ababcab”	8	0	0

Let us note one thing: there is a lot of zeros in this table, we can say that this table is sparse. So, we can write this table in different way, I called this list conn:

conn[0] = [0,1]
conn[1] = [0,1], [1,2]
conn[2] = [0,3]
conn[3] = [0,1], [1,4]
conn[4] = [0,3], [2,5]
conn[5] = [0,6]
conn[6] = [0,1], [1,7]
conn[7] = [0,8]

Looks better, is it? Howerer we still need one more table: connections_num_zeros, where we keep for every k and j number of zero elements in k-th row of matrix connections, which we met before element number j. Why we need this in the future? Because we need to deal with alphabetical order.

Part 3: Dynamic progaramming

Let us reformulate a bit our original problem and introduce funcion findLesserGood(self, n, k, string, N, evil), where n is length of or string, k is length of evil and N is modulo which we use for long arithmetics. This function finds number of strings alphabetically less or equal than string, such that it do not have evil substring inside. Then it is enough to apply this function twice (for s1, s2), get difference and handle border case (string s1).

We have dp_bord[i][j] and dp_notb[i][j] two n x k arrays, first one to handle border cases, where we search for number of strings, starting with i characters of string and such that its longest suffix equal to prefix of evil has length j. Border cases means that next element we can take should be more or equal than corresponding element is string. Second array dp_notb[i][j] is for not border cases with the same logic it is number of all string of length i, such that the longest common suffix of it and prefix of evil has lengh j.

Precalcualted matrices help us efficiently update our tables: when we try to add new symbol we need to understand in O(1), how changes its longes common suffix and prefix of string evil. That is why we evaluated our connections and con tables.

Part 4: Complexity

n is lenth of original string s1 and s2, k is length of evil and T is the size of alphabet.

To create connections in part 2 we have complexity O(k^2T) (I think it can be improved to O(kT) if we do it in smarter way, but it is not the bottleneck, so we keep it like this), because we iterate over k, over all alphabet and apply our O(k) complexity prefix function once inside, same time is needed to evaluate connections_num_zeros. Memory here is O(Tk) to keep couple of matrixes.

Complexity of findLesserGood is O(nkT): there is two loops with size n and k and also inside we go over all conn list, which is sure not more than k. In fact, this complexity is equal to O(nk), because matrix connections is very sparse and conn has O(k) elements instead of O(kT) (if I am wrong, please let me know!). Space complexity is obviously O(nk).

Finally, time complexity is O(nk + k^2T) = O(nk) for our constraints and space complexity is O(Tk + nk) = O(nk) also.

Part 5. Code

Now, if we put all the stuff we discussed togeter, we can have the following code:

class Solution:
    def prefix(self, s):
        p = [0] * len(s)
        for i in range(1, len(s)):
            k = p[i - 1]
            while k > 0 and s[k] != s[i]: 
                k = p[k - 1]
            if s[k] == s[i]:
                k += 1
            p[i] = k
        return p[-1]

    def CreateConnections(self, evil, T):
        k = len(evil)
        connections = [[-1]*T for _ in range(k)]
        conn = [[] for _ in range(k)]
        for i in range(k):
            for letter in self.alphabet:
                curr_max = self.prefix(evil + "#" + evil[0:i] + letter)
               
                if curr_max != k:    #process case when we reached the end of evil string
                    connections[i][ord(letter) - ord("a")] = curr_max

                    if curr_max != 0:
                        conn[i].append([ord(letter) - ord("a"), curr_max])

        connections_num_zeros = [[0] for _ in range(k)]
        for i in range(k):
            for j in range(1, T + 1):
                connections_num_zeros[i] += [connections_num_zeros[i][-1] + (connections[i][j-1] == 0)]

        return connections_num_zeros, conn

    def findLesserGood(self, n, k, string, N, evil):
        dp_bord = [[0]*k for _ in range(n)]
        dp_notb = [[0]*k for _ in range(n)]
          
        dp_bord[0][0] = 1
                  
        for it_n in range(n-1):
            for it_k in range(k):
                ord_num = ord(string[it_n + 1]) - ord("a")

                for letter, s  in self.con[it_k]:
                    dp_notb[it_n+1][s] += dp_notb[it_n][it_k]
                    if letter < ord_num:
                        dp_notb[it_n+1][s] += dp_bord[it_n][it_k]

                dp_notb[it_n +1][0] += self.con_numzero[it_k][-1] * dp_notb[it_n][it_k]     
                dp_notb[it_n +1][0] += self.con_numzero[it_k][ord_num] * dp_bord[it_n][it_k]
                        
                dp_notb[it_n+1][it_k] %= N

            index = self.prefix(evil + "#" + string[0:it_n+2])
            if index != k and sum(dp_bord[it_n]) != 0:
                dp_bord[it_n + 1][index] = 1
   
        return sum(dp_bord[-1]) + sum(dp_notb[-1])

    def findGoodStrings(self, n, s1, s2, evil):
        self.alphabet = string.ascii_lowercase
        N, k = 10**9 + 7, len(evil)
      
        self.con_numzero, self.con = self.CreateConnections(evil, len(self.alphabet))
        res1 = self.findLesserGood(n + 1, k, "#" + s2, N, evil)
        res2 = self.findLesserGood(n + 1, k, "#" + s1, N, evil)
        return (res1 - res2 + 1 - (evil in s1)) % N

Solution 2

Here is much simpler solution with worse theoretical complexity but in fact it works quite fast. Let comm(i, s) be defined as the longest common suffix of evil[:i] + s which is equal to prefix of evil. Why we need it? Because we want to know what is the maximum number of symbols we already used from evil. Then we use function dp(idx, pre, B1, B2), where idx is index we currently on, pre is how far we reached string evil and B1 and B2 are indicators of borders. Then:

If pre == m, it means, that we reached evil, so we not allowed to continue, return 0.
If idx == n, it means, that string is ended, we return 1.
Calculate L and R: range of symbols we can take.
Finally, iterate over these symbols and run function recursively with new parameters.

Complexity:

Time complexity of comm function part is $\mathcal{O}(k^3\cdot T)$ , where $k$ is length of evil and $T$ is size of alphabet. Time complexity of dp is $\mathcal{O}(n\cdot k\cdot T)$ , so final complexity is $\mathcal{O}(k^3\cdot T + n\cdot k\cdot T)$ . Space complexity is $\mathcal{O}(k\cdot T + n\cdot k)$ .

class Solution:
    def findGoodStrings(self, n, s1, s2, evil):
        @lru_cache(None)
        def comm(i, s):   #given evil[:i] + s return common prefix 
            ans = 0
            for j in range(i+2):
                if (evil[:i] + s)[-j:] == evil[:j]:
                    ans = j
            return ans
        
        @lru_cache(None)
        def dp(idx, pre, B1, B2):
            if pre == m: return 0
            if idx == n: return 1
            
            total = 0
            L = s2[idx] if B2 else "z"
            F = s1[idx] if B1 else "a"
            
            for c in alph[alph.index(F):alph.index(L) + 1]:
                total += dp(idx + 1, comm(pre, c), B1 and c == F, B2 and c == L)
                total %= N
            
            return total
         
        m, N, alph = len(evil), 10**9 + 7, string.ascii_lowercase
        return dp(0, 0, 1, 1) % N

If you like the solution, you can upvote it on leetcode discussion section: Problem 1397