Leetcode 1680. Concatenation of Consecutive Binary Numbers

https://leetcode.com/problems/concatenation-of-consecutive-binary-numbers

One way to solve this problem in competition is just create very long binary string and then find result of division by 10**9 + 7 and it will work fine. However in interview setup it is not the best idea and I prefer more honest way. Let us look at first several n to see how it works: 1 110 11011 11011100

Let us try to find answer to n, using information about n-1. 110 = 1 * 100 + 10 (all in binary representation), 11011 = 110 * 100 + 11, 11011100 = 11011 * 1000 + 100 and so on. We can see that on each step we need to multiply number by lenght of new number and add new number (and use %M) and that is all!

Complexity: time complexity is O(n), space is O(1).

class Solution:
    def concatenatedBinary(self, n):
        ans, M = 0, 10**9 + 7
        for x in range(n):
            ans = (ans * (1 << (len(bin(x+1)) - 2)) + x+1) % M
        return ans 

Further thoughts When you found O(n) solution, you should ask a question, can you do better? Here I had very strong intuition that you can do it faster and I came up with O(log^2 n) solution. Then I checked discussions and there is in fact at least two solutions with this complexity: one uses idea of geometrical progressions: idea behind is the following: we can see that in binary representation of desired number there will be O(log n) sequences of ones with equal gaps for each of O(log n) lengths of numbers. You can find this solution in discussion, I will code my own version if I have time. Another solution using matrix multiplications and I was very impressed to see it.

Faster solution with O(log^2 n) complexity

It is easier to chose some n and go into details about this n. Let us choose n = 54 and discuss how my algorithm works:

Function bin_pow will return powers of 2, which sum to given number, for example bin_pow(37) = [1, 4, 32], because 1 + 4 + 32 = 37.

Now, we have n = 54, and we have:

1 number with length 1: just 1
2 numbers with length 2: 10, 11
4 numbers with length 3: 100, 101, 110, 111.
8 numbers with length 4: 1000, ..., 1111
16 numbers with length 5: 10000, ..., 11111. Now, we have 23 more numbers and we keep to split them into groups:
16 numbers with lenght 6, which start with 10...., that is 100000, 100001, ... 101111. Why we choose 16 of them? Because it is the biggest group we can take such that this group is full: given start we have all possible endings.
4 numbers with length 6, which start with 1100, that is 110000, 110001, 110010, 110011
2 numbers with length 6, which start with 11010, that is 110100 and 110101.
1 number with length 6: 110110, which is equal to 54, our last number.

We will keep lenghts of our groups in list B, so we have: B = [1, 2, 4, 8, 16, 16, 4, 2, 1]

We will keep length of numbers in each group in list C, so we have: C = [1, 2, 3, 4, 5, 6, 6, 6, 6]

So far we have the following picture:

[1][10 11][100 101 110 111] ... [110000, 110001, 110010, 110011], [110100, 110101], [110110]

We will keep starting places of each group in list D, in our case we have: D = [266, 262, 250, 218, 138, 42, 18, 6, 0]. Let us look from the end: last group we do not need to multiply by anything, previous group has 1 number with length 6, so we need to multiply it by 2^6. Next we have 2*6 + 1*6 shift, 4*6 + 2*6 + 1*6, 16*6 + 4*6 + 2*6 + 1*6, 16*5 + 16*6 + 4*6 + 2*6 + 1*6 and so on: it can be done efficiently, using accumulate function in python.

Final step is to iterate over our groups and evaluate number modulo 10**9 + 7 in each group:

Let us look at group [110000, 110001, 110010, 110011] for better underastnding. This group will be somewhere in the middle of number, like ...[110000, 110001, 110010, 110011]... and we need to understand what impact it has on our number, we need several values:

Place, where this group is located, for this we use D list, d for given group.
Lenth of elements in each group, for this we use C list, c for given group.
Number of elements in our group, for this we use B list, b for given group.
Also we need to now, from which element we start, in our case it is 110000, we can evaluate it as a - b + 1, where a is corresponding element in accumulate(B): here we have accumulate(B) = [1, 3, 7, 15, 31, 47, 51, 53, 54] and 51-4 + 1 = 48 is current number we start with, a for given group

Finally, we need to do some mathematics: we need to evaluate sum: [(a-b+1) * 2^(b*c) + (a-b+2)*2^(b*(c-1)) + ... + (a-b + c-1) * 2^(b*1) + (a-b + c) * 2^(b*0)]*2^d.

This formula can be written in closed form! It is almost like sum of geometrical progression, but in fact, sum of several of them which lead us to closed form solution. We also need to work using modular arithmetic, so we use powerful pow python function. In my notation t1 = [2^(bc) - 1] % MOD and t2 = [1/(2^c - 1)] % MOD. Here we use Fermat’s little theorem to evaluate inverse number in modular arithmetic.

Complexity: time complexity of evaluating B, C, D lists is just O(log n). However when we work with pow function, complexity to evaluate sum in each group will be also O(log n), which leasd to O(log^2 n) time complexity. Space complexity is O(log n).

class Solution:
    def concatenatedBinary(self, n):
        def bin_pow(num): return [1<<i for i, b in enumerate(bin(num)[:1:-1]) if b == "1"]
        ans, MOD, q = 0, 10**9 + 7, len(bin(n)) - 3

        B = bin_pow((1<<q) - 1) + bin_pow(n - (1<<q) + 1)[::-1]
        C = list(range(1, q+1)) + [q+1]*(len(B) - q)
        D = list(accumulate(i*j for i,j in zip(B[::-1], C[::-1])))[::-1][1:] + [0]
        
        for a, b, c, d in zip(accumulate(B), B, C, D):
            t1 = pow(2, b*c, MOD) - 1
            t2 = pow(pow(2, c, MOD)-1, MOD - 2, MOD)
            ans += t2*((a-b+1+t2)*t1-b)*pow(2, d, MOD)

        return ans % MOD

If you like the solution, you can upvote it on leetcode discussion section: Problem 1680