Problem statement

https://leetcode.com/problems/number-of-distinct-substrings-in-a-string/

Solution 1

One way to solve this problem is to calculate hashes for every substring. However this is what I call YOLO approach, which is good to pass contest if we take big random number, but if we want to have proper solution, we need to compare full strings and we can not avoid it.

Complexity

Time complexity is O(n^2), space complexity is O(n).

Code

class Solution:
    def countDistinct(self, s):
        def RabinKarp(text, M, q):
            h, t, d = (1<<(8*M-8))%q, 0, 256
            ans = set()

            for i in range(M): 
                t = (d * t + ord(text[i]))% q
            
            ans.add(t)

            for i in range(len(text) - M):
                t = (d*(t-ord(text[i])*h) + ord(text[i + M]))% q
                ans.add(t)
            return len(ans)
        
        q = (1<<64) - 179
        ans = 0
        for i in range(1, len(s) + 1):
            ans += RabinKarp(s, i, q)
            
        return ans

Solution 2

There is also solution using suffix arrays. The idea is to calculate suffix array first and then to calculate lcp array: this array will consist of biggest common prefixes lengths between pair of adjacent suffixes in suffix array.

Complexity

Time complexity is O(n*log^2n), space complexity is O(n * log n). Notice, that there is way to calculate suffix array in O(n), so it is possible to have solution with just O(n) complexity.

Code

class Solution:
    def countDistinct(self, s):
        def ranks(l):
            index = {v: i for i, v in enumerate(sorted(set(l)))}
            return [index[v] for v in l]

        def suffixArray(s):
            line = ranks(s)
            n, k, ans, sa = len(s), 1, [line], [0]*len(s)
            while k < n - 1:
                line = ranks(list(zip_longest(line, islice(line, k, None), fillvalue=-1)))
                ans, k = ans + [line], k << 1
            for i, k in enumerate(ans[-1]): sa[k] = i
            return ans, sa
        
        def lcp(Arr, suffixArr, inv_suff):
            n, ans, k = len(arr), [0]*len(arr), 0

            for i in range(n):
                if inv_suff[i] == n-1:
                    k = 0
                    continue

                j = suffixArr[inv_suff[i] + 1]
                while i + k < n and j + k < n and arr[i+k] == arr[j+k]:
                    k += 1

                ans[inv_suff[i]] = k
                if k > 0: k -= 1

            return ans
        
        arr = [ord(i) - ord("a") for i in s]
        c, sa = suffixArray(arr)
        
        t = lcp(arr, sa, c[-1])
        return len(s)*(len(s) + 1)//2 - sum(t)
        

Remark

It can be solved with trie as well with O(n^2) complexity: we just need to put all suffixes to trie and check how many different nodes we have.