suffix array
dp
string
]
Leetcode 1977. Number of Ways to Separate Numbers
Problem statement
https://leetcode.com/problems/number-of-ways-to-separate-numbers/
Solution 1
Let us start as many other solutions with the following idea. Let dp[i][k]
be the number of solutions for string s[:i]
(not including) such that the last element has length k
. How we can get dp[i][k]
? We need to look at dp[i-k][1], ..., dp[i-k][k-1]
and probably at dp[i-k][k]
. Can we take the last one or not depends on comparison of two k
-digit strings. Let us for the moment forgot about this and talk about how to find dp[i][k]
efficiently.
We notice, that we need co calculate cumulative sums dp[i-k][1], ..., dp[i-k][k-1]
, so it is good idea keep them as well: by definition
mp[i][k] = dp[i][1] + ... + dp[i][k]
. Then we can say that what we need to find is mp[i-k][k]
. So far, so good, but what to do with comparison of strings. It is quite expensive and as we will see a bit later we can allow to make this comparison only in O(1)
(or probably in O(log n)
in other languages.
For this we want to use suffix array. What it is?
Let us consider for simplicity example banana. Then what we want to construct is list of all suffixes: banana, anana, nana, ana, na, a
and then sort them in increasing order: we have a, ana, anana, banana, na, nana. Actually what we keep is order if indexes: (5, 3, 1, 0, 4, 2)
, this is called suffix array. After we comput prefix array, we can check strings just in O(1)
. Please check this link for more details https://cp-algorithms.com/string/suffix-array.html
Now we are happy, when we can calculate it in O(1)
? Not exaclty - we have complexity O(n^2) = 10^7
with quite big constant and python do not allow us to do this. See solution 2 for more details.
Complexity
Time complexity of suffix array part is O(n*log^2n)
and it is quite fast and tested by me in some other problems. Time complexity of dp
part is O(n^2)
, because we have n
states for i
and not more than n
for j
. Notice however that we have quite expansive comparison function: it is O(1)
, but we do a lot of indexing which is quite slow in python. So, verdict is TLE on the last test.
Code
class Solution:
def numberOfCombinations(self, s):
def ranks(l):
index = {v: i for i, v in enumerate(sorted(set(l)))}
return [index[v] for v in l]
def suffixArray(s):
line = ranks(s)
n, k, ans, sa = len(s), 1, [line], [0]*len(s)
while k < n - 1:
line = ranks(list(zip_longest(line, islice(line, k, None), fillvalue=-1)))
ans, k = ans + [line], k << 1
for i, k in enumerate(ans[-1]): sa[k] = i
return ans, sa
@lru_cache(None)
def compare(i, j, l, k):
a = (c[k][i], c[k][(i+l-(1<<k))%n])
b = (c[k][j], c[k][(j+l-(1<<k))%n])
return 0 if a == b else 1 if a < b else -1
c, _ = suffixArray([int(i) for i in s])
n, M = len(s), 10**9 + 7
dp = [[0]*(n+1) for _ in range(n+1)]
mp = [[0]*(n+1) for _ in range(n+1)]
for k in range(n+1):
dp[0][k] = 1
mp[0][k] = 1
for i in range(1, n+1):
for k in range(1, i + 1):
if s[i-k] == "0": continue
dp[i][k] = mp[i-k][k-1]
if i >= 2*k and compare(i-2*k, i-k, k, floor(log2(k))) >= 0:
dp[i][k] += dp[i-k][k]
for k in range(n + 1):
mp[i][k] = mp[i][k-1] + dp[i][k]
return mp[-1][-1]
Solution 2
I spend like 3
hours making it work: if it do not works with lists, try it with numpy array. However we need to carefully rewrite it in such way that we use it functionality. The logic of code exaclty the same as the first solution, but without original solution it is impossible to understand what is going on, so please check the first solution and compare them for more understanding.
The idea is to replace loops (which are slow) with fast numpy indexing. I was not able to do it every time but what I did was enough to be accepted. I have one strange error I was not able to fix:
f = lambda k: compare(i-2*k, i-k, k, logs[k]) >= 0
check2 = np.array([f(i) for i in range(1, i//2+1)])
can be replaced with
f = lambda k: compare(i-2*k, i-k, k, logs[k]) >= 0
check2 = f(np.arange(1, i//2+1))
but for some reason it is not working like this.
Complexity
It is still O(n^2)
, but now it is AC
Code
import numpy as np
class Solution:
def numberOfCombinations(self, s):
def ranks(l):
index = {v: i for i, v in enumerate(sorted(set(l)))}
return [index[v] for v in l]
def suffixArray(s):
line = ranks(s)
n, k, ans, sa = len(s), 1, [line], [0]*len(s)
while k < n - 1:
line = ranks(list(zip_longest(line, islice(line, k, None), fillvalue=-1)))
ans, k = ans + [line], k << 1
for i, k in enumerate(ans[-1]): sa[k] = i
return ans, sa
def compare(i, j, l, k):
a = (c[k][i], c[k][(i+l-(1<<k))%n])
b = (c[k][j], c[k][(j+l-(1<<k))%n])
return 0 if a == b else 1 if a < b else -1
c, sa = suffixArray([int(i) for i in s])
n, M = len(s), 10**9 + 7
dp = np.zeros([n+1, n+1], dtype = int) #[[0]*(n+1) for _ in range(n+1)]
mp = np.zeros([n+1, n+1], dtype = int) #[[0]*(n+1) for _ in range(n+1)]
for k in range(n+1):
dp[0][k] = 1
mp[0][k] = 1
logs = [0] + [floor(log2(k)) for k in range(1, n+1)]
s_zero = np.array([i != "0" for i in s], dtype = int)
for i in range(1, n+1):
dp[i][1:i+1] = mp[range(i-1,-1,-1), range(i)] * s_zero[i-1::-1]
check1 = dp[range(i-1, (i-1)//2, -1), range(1, i//2 + 1)]
f = lambda k: compare(i-2*k, i-k, k, logs[k]) >= 0
check2 = np.array([f(i) for i in range(1, i//2+1)])
dp[i][1:i//2+1] = dp[i][1:i//2+1] + check1*check2
dp[i][1:i//2+1] = [x % M for x in dp[i][1:i//2+1]]
mp[i] = np.cumsum(dp[i])
return mp[-1][-1] % M