dp
string
kmp
]
Leetcode 1397. Find All Good Strings
https://leetcode.com/problems/find-all-good-strings
Solution 1: optimal complexity, but difficult to code and understand
Part 1: Prefix function
Remind the definition of prefix function on a simple example: let s = ababcaba
, then prefix funtion is [0,0,1,2,0,1,2,3]
, because for example for substring ababcab
, the longest equal prefix and suffix ab
have length 2. There is efficient way to evaluate prefix function, using KMP algorighm. More precisely our function prefix(self, s)
will give us the length of longest common prefix and suffix, for example for our s = ababcaba
result will be equal to 3 and we can find it in O(len(s)).
Part 2: Preprosess evil
Now, we want to preprocess our evil
string and create some useful data, we are going ot use later. Let us againg consider A = evil = ababcaba
. Create connections
as T by k
matrix, where T
is the size of our alphabet and k = len(evil)
. In connections[i][j]
we are going to store the length of maximum suffix of string A_0,...,A_{i-1} + char(j)
which is at the same time is prefix of A
(here char(0) = “a”, char(1) = “b” and so on). A bit difficult to understand, let us consider example: connections[3][0] = 1
, because A_0 A_1 A_2 + char(0) = abaa
and this string has suffix a
which is prefix of A
. connections[3][1] = 4
, because A_0 A_1 A_2 + char(1) = abab
, which is prefix of A
.
Let us fully fill this table connections
:
a | b | c | d | e | f | |
---|---|---|---|---|---|---|
k = 0, starts with “” | 1 | 0 | 0 | 0 | 0 | 0 |
k = 1, starts with “a” | 1 | 2 | 0 | 0 | 0 | 0 |
k = 2, starts with “ab” | 3 | 0 | 0 | 0 | 0 | 0 |
k = 3, starts with “aba” | 1 | 4 | 0 | 0 | 0 | 0 |
k = 4, starts with “abab” | 3 | 0 | 5 | 0 | 0 | 0 |
k = 5, starts with “ababc” | 6 | 0 | 0 | 0 | 0 | 0 |
k = 6, starts with “ababca” | 1 | 7 | 0 | 0 | 0 | 0 |
k = 7, starts with “ababcab” | 8 | 0 | 0 | 0 | 0 | 0 |
Let us note one thing: there is a lot of zeros in this table, we can say that this table is sparse. So, we can write this table in different way, I called this list conn
:
conn[0] = [0,1]
conn[1] = [0,1], [1,2]
conn[2] = [0,3]
conn[3] = [0,1], [1,4]
conn[4] = [0,3], [2,5]
conn[5] = [0,6]
conn[6] = [0,1], [1,7]
conn[7] = [0,8]
Looks better, is it? Howerer we still need one more table: connections_num_zeros
, where we keep for every k
and j
number of zero elements in k
-th row of matrix connections
, which we met before element number j
. Why we need this in the future? Because we need to deal with alphabetical order.
Part 3: Dynamic progaramming
Let us reformulate a bit our original problem and introduce funcion findLesserGood(self, n, k, string, N, evil)
, where n
is length of or string
, k
is length of evil
and N
is modulo which we use for long arithmetics. This function finds number of strings alphabetically less or equal than string
, such that it do not have evil
substring inside. Then it is enough to apply this function twice (for s1
, s2
), get difference and handle border case (string s1
).
We have dp_bord[i][j]
and dp_notb[i][j]
two n x k
arrays, first one to handle border cases, where we search for number of strings, starting with i
characters of string
and such that its longest suffix equal to prefix of evil
has length j
. Border cases means that next element we can take should be more or equal than corresponding element is string
. Second array dp_notb[i][j]
is for not border cases with the same logic it is number of all string of length i
, such that the longest common suffix of it and prefix of evil
has lengh j
.
Precalcualted matrices help us efficiently update our tables: when we try to add new symbol we need to understand in O(1)
, how changes its longes common suffix and prefix of string evil
. That is why we evaluated our connections
and con
tables.
Part 4: Complexity
n
is lenth of original string s1
and s2
, k
is length of evil
and T
is the size of alphabet.
To create connections in part 2 we have complexity O(k^2T)
(I think it can be improved to O(kT)
if we do it in smarter way, but it is not the bottleneck, so we keep it like this), because we iterate over k, over all alphabet and apply our O(k)
complexity prefix function once inside, same time is needed to evaluate connections_num_zeros
. Memory here is O(Tk)
to keep couple of matrixes.
Complexity of findLesserGood
is O(nkT)
: there is two loops with size n
and k
and also inside we go over all conn
list, which is sure not more than k
. In fact, this complexity is equal to O(nk)
, because matrix connections
is very sparse and conn
has O(k)
elements instead of O(kT)
(if I am wrong, please let me know!). Space complexity is obviously O(nk)
.
Finally, time complexity is O(nk + k^2T) = O(nk)
for our constraints and space complexity is O(Tk + nk) = O(nk)
also.
Part 5. Code
Now, if we put all the stuff we discussed togeter, we can have the following code:
class Solution:
def prefix(self, s):
p = [0] * len(s)
for i in range(1, len(s)):
k = p[i - 1]
while k > 0 and s[k] != s[i]:
k = p[k - 1]
if s[k] == s[i]:
k += 1
p[i] = k
return p[-1]
def CreateConnections(self, evil, T):
k = len(evil)
connections = [[-1]*T for _ in range(k)]
conn = [[] for _ in range(k)]
for i in range(k):
for letter in self.alphabet:
curr_max = self.prefix(evil + "#" + evil[0:i] + letter)
if curr_max != k: #process case when we reached the end of evil string
connections[i][ord(letter) - ord("a")] = curr_max
if curr_max != 0:
conn[i].append([ord(letter) - ord("a"), curr_max])
connections_num_zeros = [[0] for _ in range(k)]
for i in range(k):
for j in range(1, T + 1):
connections_num_zeros[i] += [connections_num_zeros[i][-1] + (connections[i][j-1] == 0)]
return connections_num_zeros, conn
def findLesserGood(self, n, k, string, N, evil):
dp_bord = [[0]*k for _ in range(n)]
dp_notb = [[0]*k for _ in range(n)]
dp_bord[0][0] = 1
for it_n in range(n-1):
for it_k in range(k):
ord_num = ord(string[it_n + 1]) - ord("a")
for letter, s in self.con[it_k]:
dp_notb[it_n+1][s] += dp_notb[it_n][it_k]
if letter < ord_num:
dp_notb[it_n+1][s] += dp_bord[it_n][it_k]
dp_notb[it_n +1][0] += self.con_numzero[it_k][-1] * dp_notb[it_n][it_k]
dp_notb[it_n +1][0] += self.con_numzero[it_k][ord_num] * dp_bord[it_n][it_k]
dp_notb[it_n+1][it_k] %= N
index = self.prefix(evil + "#" + string[0:it_n+2])
if index != k and sum(dp_bord[it_n]) != 0:
dp_bord[it_n + 1][index] = 1
return sum(dp_bord[-1]) + sum(dp_notb[-1])
def findGoodStrings(self, n, s1, s2, evil):
self.alphabet = string.ascii_lowercase
N, k = 10**9 + 7, len(evil)
self.con_numzero, self.con = self.CreateConnections(evil, len(self.alphabet))
res1 = self.findLesserGood(n + 1, k, "#" + s2, N, evil)
res2 = self.findLesserGood(n + 1, k, "#" + s1, N, evil)
return (res1 - res2 + 1 - (evil in s1)) % N
Solution 2
Here is much simpler solution with worse theoretical complexity but in fact it works quite fast. Let comm(i, s)
be defined as the longest common suffix of evil[:i] + s
which is equal to prefix of evil
. Why we need it? Because we want to know what is the maximum number of symbols we already used from evil
.
Then we use function dp(idx, pre, B1, B2)
, where idx
is index we currently on, pre
is how far we reached string evil
and B1
and B2
are indicators of borders. Then:
- If
pre == m
, it means, that we reachedevil
, so we not allowed to continue, return0
. - If
idx == n
, it means, that string is ended, we return1
. - Calculate
L
andR
: range of symbols we can take. - Finally, iterate over these symbols and run function recursively with new parameters.
Complexity:
Time complexity of comm
function part is $\mathcal{O}(k^3\cdot T)$, where $k$ is length of evil
and $T$ is size of alphabet. Time complexity of dp
is $\mathcal{O}(n\cdot k\cdot T)$, so final complexity is $\mathcal{O}(k^3\cdot T + n\cdot k\cdot T)$. Space complexity is $\mathcal{O}(k\cdot T + n\cdot k)$.
class Solution:
def findGoodStrings(self, n, s1, s2, evil):
@lru_cache(None)
def comm(i, s): #given evil[:i] + s return common prefix
ans = 0
for j in range(i+2):
if (evil[:i] + s)[-j:] == evil[:j]:
ans = j
return ans
@lru_cache(None)
def dp(idx, pre, B1, B2):
if pre == m: return 0
if idx == n: return 1
total = 0
L = s2[idx] if B2 else "z"
F = s1[idx] if B1 else "a"
for c in alph[alph.index(F):alph.index(L) + 1]:
total += dp(idx + 1, comm(pre, c), B1 and c == F, B2 and c == L)
total %= N
return total
m, N, alph = len(evil), 10**9 + 7, string.ascii_lowercase
return dp(0, 0, 1, 1) % N
If you like the solution, you can upvote it on leetcode discussion section: Problem 1397