Problem statement

https://leetcode.com/problems/minimum-number-of-operations-to-make-string-sorted/

Solution

Actually, what is asked in this problem is given string understand what is the lexicographical order of this string. For example we have abc, acb, bac, bca, cab, cba, then order of string cab is 4 (we start from zero). However, the main difficulty of this problem is that:

  1. We can have n = 3000, so we expect to have linear or linearithmic complexity.
  2. We have repeating symbols and here we need to use combinatorics.

I tried to sumbit like 5 different O(n^2) approaches, but they are not working on python as you can guess, so I was forces to think about something better. Let us look at string cabbcdba for simplicity. We need to find how many string there is before this string in lexicographcal order.

  1. Strings, start with a or b should be OK for sure, so what we need to do is to find how many symbols less than c we have. Total number of strings is 5 * 7!, because we have two options for the first place and others we can put whatever we want. Also, we have b 3, a 2 times and c 2 times. So, we need to divide total number of options by 2! * 3! * 3!. Google multinomial coefficients for more details. So, total number of options here is 5 * 7!/2!/2!/3!.
  2. Now look at strings starts with c. There is not symbols, less than a, so here answer is 0.
  3. Now look at strings starts with ca, next symbols should be a and we have only one options here. So, total number of options is 1 * 5!/3!, because we have b 3 times among the rest symbols we did not used yet.
  4. And so on

To achieve good complexity we will use the fact that our string consist only 26 letters. Also we start our process from the end: in this way it will be possible to keep cnt array of frequencies and update it and calculate sum(cnt[:ind]): number of elements smaller than given one. Also, we need to work with 10^9 + 7 module, which is hopefully prime, so we can use Fermat little theorem to make fast divisions given this module. To divide number by q is the same as multiply it by q^(N-2) % N.

Complexity

If we have n the length of string s and m is the size of alphabet, we have complexity $O(nm)$ for cnt part. Also we evaluate inverses a lot and potentially we can do it m times for each iteration, so final time complexity will be $O(nm\cdot \log M)$. Space complexity is $O(n)$.

Code

class Solution:
    @lru_cache(None)
    def f(self,i):
        return 1 if i <= 1 else (self.f(i-1)*i) % (10**9+7)

    def makeStringSorted(self, s):
        N, n, out = 10**9 + 7, len(s), 0
        cnt = [0]*26
        for i in range(n-1, -1, -1):
            ind = ord(s[i]) - ord("a")
            cnt[ind] += 1
            ans = sum(cnt[:ind]) * self.f(n-i-1)
            for j in range(26):
                ans = ans * pow(self.f(cnt[j]), N-2, N) % N
            out += ans
        return out % N

Solution 2

Here is solution with the better complexity: $O(mn + n\cdot \log M)$

class Solution:
    @lru_cache(None)
    def f(self,i):
        return 1 if i <= 1 else (self.f(i-1)*i) % (10**9+7)
    
    @lru_cache(None)
    def inv(self, i):
        N = 10**9 + 7
        return pow(i, N-2, N)

    def makeStringSorted(self, s):
        N, n, out = 10**9 + 7, len(s), 0
        cnt = [0]*26
        for i in range(n-1, -1, -1):
            ind = ord(s[i]) - ord("a")
            cnt[ind] += 1
            ans = sum(cnt[:ind]) * self.f(n-i-1)
            for j in range(26):
                ans = ans * self.inv(self.f(cnt[j])) % N
            out += ans
        return out % N

Solution 3

There is smart way how to evaluate inverses of numbers in some range $[1, k]$ modulo $N$ in $O(k)$ time, see patterns page. So, complexity can be improved to $O(mn + \log M + n) = O(mn)$. Also here we use idea that on each step we can recalculate comb_tot in O(1), not in O(m), however it will not improve final $O$ complexity, because of sum(cnt[:num]) step, but it makes it faster like 2 times.

class Solution:
    def makeStringSorted(self, s):
        MOD = 10 ** 9 + 7
        inverses = [1] * (len(s) + 1)
        for a in range(2, len(s) + 1):
            inverses[a] = MOD - ((MOD // a) * inverses[MOD % a]) % MOD

        ans, tot, comb_tot = 0, 0, 1
        cnt = [0] * 26
        for cur in map(ord, reversed(s)):
            num = cur - 97
            cnt[num] += 1
            tot += 1
            comb_tot = (comb_tot * tot * inverses[cnt[num]]) % MOD
            ans += comb_tot * sum(cnt[:num]) * inverses[tot]
        return ans % MOD