string
trie
rolling hash
suffix array
]
Leetcode 1698 Number of Distinct Substrings in a String
Problem statement
https://leetcode.com/problems/number-of-distinct-substrings-in-a-string/
Solution 1
One way to solve this problem is to calculate hashes for every substring. However this is what I call YOLO approach, which is good to pass contest if we take big random number, but if we want to have proper solution, we need to compare full strings and we can not avoid it.
Complexity
Time complexity is O(n^2)
, space complexity is O(n)
.
Code
class Solution:
def countDistinct(self, s):
def RabinKarp(text, M, q):
h, t, d = (1<<(8*M-8))%q, 0, 256
ans = set()
for i in range(M):
t = (d * t + ord(text[i]))% q
ans.add(t)
for i in range(len(text) - M):
t = (d*(t-ord(text[i])*h) + ord(text[i + M]))% q
ans.add(t)
return len(ans)
q = (1<<64) - 179
ans = 0
for i in range(1, len(s) + 1):
ans += RabinKarp(s, i, q)
return ans
Solution 2
There is also solution using suffix arrays. The idea is to calculate suffix array first and then to calculate lcp
array: this array will consist of biggest common prefixes lengths between pair of adjacent suffixes in suffix array.
Complexity
Time complexity is O(n*log^2n)
, space complexity is O(n * log n)
.
Notice, that there is way to calculate suffix array in O(n)
, so it is possible to have solution with just O(n)
complexity.
Code
class Solution:
def countDistinct(self, s):
def ranks(l):
index = {v: i for i, v in enumerate(sorted(set(l)))}
return [index[v] for v in l]
def suffixArray(s):
line = ranks(s)
n, k, ans, sa = len(s), 1, [line], [0]*len(s)
while k < n - 1:
line = ranks(list(zip_longest(line, islice(line, k, None), fillvalue=-1)))
ans, k = ans + [line], k << 1
for i, k in enumerate(ans[-1]): sa[k] = i
return ans, sa
def lcp(Arr, suffixArr, inv_suff):
n, ans, k = len(arr), [0]*len(arr), 0
for i in range(n):
if inv_suff[i] == n-1:
k = 0
continue
j = suffixArr[inv_suff[i] + 1]
while i + k < n and j + k < n and arr[i+k] == arr[j+k]:
k += 1
ans[inv_suff[i]] = k
if k > 0: k -= 1
return ans
arr = [ord(i) - ord("a") for i in s]
c, sa = suffixArray(arr)
t = lcp(arr, sa, c[-1])
return len(s)*(len(s) + 1)//2 - sum(t)
Remark
It can be solved with trie
as well with O(n^2)
complexity: we just need to put all suffixes to trie and check how many different nodes we have.