Determining if two products are equal without multiplying them

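Given four unsigned integers $a$, $b$, $c$ and $d$, we want to determine whether $ab = cd$. The obvious solution is to carry out the multiplications and compare the results. But is it possible to solve this without carrying out the multiplication?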

This is interesting because, if the numbers are stored as 32-bit integers, carrying out the multiplication requires 64-bit arithmetic to avoid overflow. That may not be a problem, but what if the numbers were 64 bits or even larger?
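To make the overflow problem concrete, here is a small sketch assuming 32-bit std::uint32_t operands (products_equal_wrapping and products_equal_64 are hypothetical names). Multiplying in 32 bits wraps modulo $2^{32}$, so two different products can compare equal; widening to 64 bits before multiplying gives the exact answer.

```cpp
#include <cstdint>

// Naive comparison in 32-bit arithmetic: both products silently wrap
// modulo 2^32, so distinct products can compare equal.
constexpr bool products_equal_wrapping(std::uint32_t a, std::uint32_t b,
                                       std::uint32_t c, std::uint32_t d) {
  return a * b == c * d;
}

// Widening one operand to 64 bits first makes both products exact.
constexpr bool products_equal_64(std::uint32_t a, std::uint32_t b,
                                 std::uint32_t c, std::uint32_t d) {
  return std::uint64_t{a} * b == std::uint64_t{c} * d;
}

// 65536 * 65536 = 2^32 wraps to 0, so it "equals" 0 * 0 in 32 bits.
static_assert(products_equal_wrapping(65536u, 65536u, 0u, 0u), "wraps");
static_assert(!products_equal_64(65536u, 65536u, 0u, 0u), "exact");
```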

The usual solution

We can emulate 64-bit multiplication with several 32-bit operations, for instance with Karatsuba multiplication. When 64-bit arithmetic is not available, this is probably the fastest way.
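A minimal sketch of such an emulation for 32-bit operands follows (mul_32x32 and products_equal_emulated are hypothetical names). Note that it uses plain schoolbook splitting into 16-bit halves with four partial products, not Karatsuba: Karatsuba's middle term $(x_0 + x_1)(y_0 + y_1)$ can need 34 bits, while every partial product here fits in 32 bits.

```cpp
#include <cstdint>
#include <utility>

// Full 64-bit product of two 32-bit values using only 32-bit arithmetic.
// Each operand is split into 16-bit halves so every partial product fits
// in 32 bits. Returns {high word, low word}.
constexpr std::pair<std::uint32_t, std::uint32_t>
mul_32x32(std::uint32_t x, std::uint32_t y) {
  const std::uint32_t x0 = x & 0xFFFFu, x1 = x >> 16;
  const std::uint32_t y0 = y & 0xFFFFu, y1 = y >> 16;
  const std::uint32_t p00 = x0 * y0;  // weight 2^0
  const std::uint32_t p01 = x0 * y1;  // weight 2^16
  const std::uint32_t p10 = x1 * y0;  // weight 2^16
  const std::uint32_t p11 = x1 * y1;  // weight 2^32
  // All 16-bit pieces with weight 2^16; the sum is below 3 * 2^16.
  const std::uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);
  const std::uint32_t lo = (mid << 16) | (p00 & 0xFFFFu);
  const std::uint32_t hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);
  return {hi, lo};
}

// Exact comparison of a*b and c*d via the emulated 64-bit products.
constexpr bool products_equal_emulated(std::uint32_t a, std::uint32_t b,
                                       std::uint32_t c, std::uint32_t d) {
  return mul_32x32(a, b) == mul_32x32(c, d);
}
```

Comparing the {high, low} pairs then decides $ab = cd$ exactly.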

An alternative solution

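An alternative is based on the Chinese remainder theorem. It says that if we know a value $x$ modulo several pairwise relatively prime numbers $a_i$, then $x$ is unique up to a multiple of the product $N$ of the $a_i$. We will exploit this by finding several numbers $a_i$ such that:

- the $a_i$ are pairwise relatively prime,
- products modulo each $a_i$ can be computed quickly and without overflow,
- the product $N$ of all the $a_i$ is at least $2^{64}$.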

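First off, multiplication modulo $2^{32}$ is very efficient on computers, since that is exactly what unsigned 32-bit multiplication does. So it is a good idea to select $a_1 = 2^{32}$. If we want to use 32-bit multiplications without overflow for the remaining $a_i$, we need to keep them below $2^{16}$: after both factors have been reduced modulo $a_i$, their product is then below $2^{32}$. Since two numbers below $2^{16}$ multiply to less than $2^{32}$, we have to use at least three more coefficients, $a_2$, $a_3$ and $a_4$, to satisfy point three in the bullet list ($a_2 a_3 a_4 \ge 2^{32}$, so that $N \ge 2^{64}$).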
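Spelled out, the reason a product $N \ge 2^{64}$ suffices for 32-bit inputs is this short derivation:

```latex
\begin{aligned}
0 \le ab,\ cd &\le (2^{32} - 1)^2 < 2^{64}, \\
ab \equiv cd \pmod{a_i} \text{ for all } i
  &\implies ab \equiv cd \pmod{N},
  \quad N = a_1 a_2 a_3 a_4 \text{ (pairwise coprime } a_i), \\
|ab - cd| < 2^{64} \le N
  &\implies ab = cd .
\end{aligned}
```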

Modulo computation

Compilers are able to generate efficient code for calculating modulo when the divisor is known ahead of time. The following C++ code:

template <unsigned M>
unsigned divide(unsigned x) {
    return x % M;
}
template unsigned divide<65519>(unsigned);

compiles into the following x86-64 assembly:

unsigned int divide<65519u>(unsigned int):
        mov     eax, edi
        mov     edx, 2148040849
        imul    rax, rdx
        shr     rax, 47
        imul    edx, eax, 65519
        mov     eax, edi
        sub     eax, edx
        ret

One can see that there is no division instruction, just two multiplications, a subtraction and a shift.
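The same computation can be written out in C++ to make the trick explicit. This sketch (mod_65519 is a hypothetical name) copies the constant 2148040849 and the shift 47 from the assembly above. The constant is $\lceil 2^{47}/65519 \rceil$, so the multiply-and-shift yields $\lfloor x/65519 \rfloor$ for every 32-bit $x$, and the remainder then takes one more multiplication and a subtraction.

```cpp
#include <cstdint>

// x % 65519 without a division instruction, mirroring the compiler output:
// multiply by ceil(2^47 / 65519) = 2148040849 in 64 bits and shift right by
// 47 to get the quotient, then subtract quotient * 65519 from x.
constexpr std::uint32_t mod_65519(std::uint32_t x) {
  const std::uint32_t q =
      static_cast<std::uint32_t>((std::uint64_t{x} * 2148040849u) >> 47);
  return x - q * 65519u;  // q * 65519 <= x, so no overflow
}
```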

For other numbers, slightly different variations are generated. Here are some examples:

unsigned int divide<62297u>(unsigned int):
        mov     edx, edi
        mov     eax, edi
        imul    rdx, rdx, 223307689
        shr     rdx, 32
        sub     eax, edx
        shr     eax
        add     eax, edx
        shr     eax, 15
        imul    edx, eax, 62297
        mov     eax, edi
        sub     eax, edx
        ret
unsigned int divide<65449u>(unsigned int):
        mov     eax, edi
        imul    rax, rax, 1075169127
        shr     rax, 46
        imul    edx, eax, 65449
        mov     eax, edi
        sub     eax, edx
        ret
unsigned int divide<65479u>(unsigned int):
        mov     eax, edi
        imul    rax, rax, 1074676525
        shr     rax, 46
        imul    edx, eax, 65479
        mov     eax, edi
        sub     eax, edx
        ret
unsigned int divide<65497u>(unsigned int):
        mov     eax, edi
        mov     edx, 2148762361
        imul    rax, rdx
        shr     rax, 47
        imul    edx, eax, 65497
        mov     eax, edi
        sub     eax, edx
        ret
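The longer sequence for 62297 can be sketched the same way (mod_62297 is a hypothetical name; the constants are taken from the listing). As far as I can tell, the exact multiplier here would be $2^{32} + 223307689$, which needs 33 bits, so the code multiplies by the low 32 bits only and repairs the quotient with the subtract, shift and add. Computing $\lfloor (x - t)/2 \rfloor + t$ rather than $x + t$ keeps the intermediate within 32 bits.

```cpp
#include <cstdint>

// x % 62297 following the longer instruction sequence above.
constexpr std::uint32_t mod_62297(std::uint32_t x) {
  // High 32 bits of x * 223307689 (the low part of the 33-bit multiplier).
  const std::uint32_t t =
      static_cast<std::uint32_t>((std::uint64_t{x} * 223307689u) >> 32);
  // t <= x, and ((x - t) >> 1) + t cannot overflow, unlike x + t.
  const std::uint32_t q = (((x - t) >> 1) + t) >> 15;
  return x - q * 62297u;
}
```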

Implementation

We need to search for three numbers $a_i$, $i > 1$, which are odd (to avoid a common factor with $a_1 = 2^{32}$), below $2^{16}$ and pairwise relatively prime. They also need to multiply to at least $2^{32}$.

Going downwards from $2^{16}$, the first three fast ones I found are 65485, 65483 and 65481. Computing the modulo takes about 0.38 ns for each of them, compared to 0.46 ns for 65487 and 0.56 ns for 65535.

We need to check that these numbers are relatively prime. Using the factor program, I get:

65485: 5 7 1871
65483: 11 5953
65481: 3 13 23 73

None of these numbers share any factors, so they are pairwise relatively prime. And because they are odd, they are also relatively prime to $a_1 = 2^{32}$.
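The requirements on $a_2$, $a_3$ and $a_4$ can also be checked by the compiler. This is a sketch assuming C++17 (for std::gcd in <numeric>); valid_moduli is a hypothetical helper, not part of the original code. Incidentally, consecutive odd numbers such as 65485, 65483 and 65481 are always pairwise relatively prime, since any common factor of two of them must divide their difference (2 or 4) and also be odd.

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Compile-time check of the requirements on the extra moduli: odd,
// below 2^16, pairwise relatively prime, and multiplying to at least
// 2^32 so that N = 2^32 * a2 * a3 * a4 >= 2^64.
constexpr bool valid_moduli(std::uint32_t a2, std::uint32_t a3,
                            std::uint32_t a4) {
  const bool odd = (a2 & a3 & a4 & 1u) == 1u;
  const bool small = a2 < 65536u && a3 < 65536u && a4 < 65536u;
  const bool coprime = std::gcd(a2, a3) == 1u && std::gcd(a2, a4) == 1u &&
                       std::gcd(a3, a4) == 1u;
  const bool large_enough =
      std::uint64_t{a2} * a3 * a4 >= (std::uint64_t{1} << 32);
  return odd && small && coprime && large_enough;
}

static_assert(valid_moduli(65485, 65483, 65481), "moduli from the text");
```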

Resulting code

template<unsigned M>
constexpr unsigned
modulo(unsigned x)
{
  return x % M;
}

/**
 * multiplies x and y modulo a_i while avoiding internal
 * overflow: both factors are reduced first, so for
 * a_i < 2^16 their product fits in 32 bits
 * @return xy mod a_i
 */
template<unsigned a_i>
constexpr unsigned
multiply_modulo(const unsigned x, const unsigned y)
{
  const auto xm = modulo<a_i>(x);
  const auto ym = modulo<a_i>(y);
  return modulo<a_i>(xm * ym);
}

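/**
 * returns true if a*b == c*d, determined without 64-bit
 * arithmetic and therefore without overflow
 * @param a first factor of the left-hand product
 * @param b second factor of the left-hand product
 * @param c first factor of the right-hand product
 * @param d second factor of the right-hand product
 * @return true if the exact products are equal
 */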
constexpr bool
products_equal(const unsigned a,
               const unsigned b,
               const unsigned c,
               const unsigned d)
{
  // we use the Chinese remainder theorem with four coefficients
  // (assumes unsigned is 32 bits wide)

  // carry out the comparison modulo a_1 = 2**32, which is fast
  // because unsigned multiplication wraps around
  if (a * b != c * d) {
    return false;
  }

  // this coefficient was chosen after benchmarking
  constexpr unsigned a_2 = 65485;
  if (multiply_modulo<a_2>(a, b) != multiply_modulo<a_2>(c, d)) {
    return false;
  }

  // this coefficient was chosen after benchmarking
  constexpr unsigned a_3 = 65483;
  if (multiply_modulo<a_3>(a, b) != multiply_modulo<a_3>(c, d)) {
    return false;
  }

  // this coefficient was chosen after benchmarking
  constexpr unsigned a_4 = 65481;
  if (multiply_modulo<a_4>(a, b) != multiply_modulo<a_4>(c, d)) {
    return false;
  }

  return true;
}
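The three nearly identical checks in products_equal could also be generated from a list of moduli. The following is a hypothetical refactoring, not the original code, assuming C++17 fold expressions and a 32-bit std::uint32_t; the modulus $a_1 = 2^{32}$ is still handled by plain wrapping multiplication.

```cpp
#include <cstdint>

// Compares a*b and c*d modulo M. Reducing the factors first keeps the
// product below M^2, which fits in 32 bits for M < 2^16.
template <std::uint32_t M>
constexpr bool equal_modulo(std::uint32_t a, std::uint32_t b,
                            std::uint32_t c, std::uint32_t d) {
  return (a % M) * (b % M) % M == (c % M) * (d % M) % M;
}

// Chinese-remainder comparison with a_1 = 2^32 plus the moduli Ms...
template <std::uint32_t... Ms>
constexpr bool products_equal_crt(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c, std::uint32_t d) {
  // a * b wraps modulo 2^32; the fold short-circuits on the first mismatch.
  return a * b == c * d && (equal_modulo<Ms>(a, b, c, d) && ...);
}

static_assert(products_equal_crt<65485, 65483, 65481>(6, 4, 3, 8),
              "6 * 4 == 3 * 8");
```

With this form, products_equal_crt<65485, 65483, 65481> corresponds to products_equal, and dropping a modulus from the list is a one-token change.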

Benchmarking

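// assumes Catch2 v3 (Catch2 v2 instead uses <catch2/catch.hpp> with
// CATCH_CONFIG_ENABLE_BENCHMARKING defined) and products_equal from above
#include <catch2/benchmark/catch_benchmark.hpp>
#include <catch2/catch_test_macros.hpp>
#include <cstdint>
#include <random>

bool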
reference_products_equal(const unsigned a,
                         const unsigned b,
                         const unsigned c,
                         const unsigned d)
{
  using U = std::uint64_t;
  return U{ a } * b == U{ c } * d;
}

TEST_CASE("Benchmark equality", "[!benchmark]")
{
  BENCHMARK_ADVANCED("64 bit multiplication")(
    Catch::Benchmark::Chronometer meter)
  {
    // prevent optimization by getting these at runtime
    std::random_device rd;
    const unsigned b = rd();
    const unsigned c = rd();
    const unsigned d = rd();

    meter.measure(
      [=](unsigned int i) { return reference_products_equal(i, b, c, d); });
  };
  BENCHMARK_ADVANCED("chinese remainder")(Catch::Benchmark::Chronometer meter)
  {
    // prevent optimization by getting these at runtime
    std::random_device rd;
    const unsigned b = rd();
    const unsigned c = rd();
    const unsigned d = rd();

    meter.measure([=](unsigned int i) { return products_equal(i, b, c, d); });
  };
}

An interesting observation

While testing the code, I used fuzzing to find test cases. The fuzzer checks that the reference implementation gives the same results as the implementation based on the Chinese remainder theorem. For each $a_i$ in turn, I commented out its check, waited for the fuzzer to find a counterexample and then added it to the test cases.
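A much simpler, non-fuzzing differential test might look like the sketch below. It is an illustration only: the mask parameter (which switches individual modulus checks off to mimic commenting them out) and all function names are hypothetical, and std::mt19937 with a fixed seed stands in for a real fuzzer. Uniformly random inputs almost never satisfy $ab \equiv cd \pmod{2^{32}}$, so they rarely reach the modular checks at all; that is why the sketch also tests the swapped pair $(a, b, b, a)$, and why a coverage-guided fuzzer is needed to find real counterexamples.

```cpp
#include <cstdint>
#include <random>

// Exact reference: widen to 64 bits before multiplying.
bool reference_equal(std::uint32_t a, std::uint32_t b,
                     std::uint32_t c, std::uint32_t d) {
  return std::uint64_t{a} * b == std::uint64_t{c} * d;
}

// Chinese-remainder comparison with moduli 2^32, 65485, 65483, 65481.
// Bit i of mask enables the check for moduli[i].
bool crt_equal(std::uint32_t a, std::uint32_t b, std::uint32_t c,
               std::uint32_t d, unsigned mask) {
  const std::uint32_t moduli[] = {65485, 65483, 65481};
  if (a * b != c * d) {  // comparison modulo 2^32
    return false;
  }
  for (int i = 0; i < 3; ++i) {
    if ((mask & (1u << i)) == 0) {
      continue;  // this check is disabled
    }
    const std::uint32_t m = moduli[i];
    if ((a % m) * (b % m) % m != (c % m) * (d % m) % m) {
      return false;
    }
  }
  return true;
}

// Compares both implementations on random inputs and on the swapped
// (always equal) pair; returns the number of disagreements.
int count_mismatches(unsigned mask, int iterations, std::uint32_t seed) {
  std::mt19937 rng(seed);
  int mismatches = 0;
  for (int i = 0; i < iterations; ++i) {
    const std::uint32_t a = rng(), b = rng(), c = rng(), d = rng();
    if (crt_equal(a, b, c, d, mask) != reference_equal(a, b, c, d)) {
      ++mismatches;
    }
    if (crt_equal(a, b, b, a, mask) != reference_equal(a, b, b, a)) {
      ++mismatches;
    }
  }
  return mismatches;
}
```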

The interesting thing is that the check for $a_2$ could be removed without me finding any valid counterexample. I let the fuzzer run for about 100 CPU hours without any luck. This got me thinking: does this happen because of the particular choice of $a_i$? Excluding $a_2$, the product $N$ of the remaining $a_i$ ends up just below $2^{64}$. Might that mean it is impossible to find values $a, b, c, d$ such that $ab = N + x$ and $cd = x$? (Since $2N > 2^{64}$, a larger multiple of $N$ cannot separate two 32-bit products.) If so, the code could be cut down to three modulus checks instead of four.
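Working out the numbers for this case (excluding $a_2$) shows how close $N$ is to $2^{64}$, and why only a difference of exactly $N$ needs to be considered:

```latex
\begin{aligned}
N &= 2^{32} \cdot 65483 \cdot 65481 = 4287892323 \cdot 2^{32}
   = 2^{64} - 7074973 \cdot 2^{32}, \\
ab \equiv cd \pmod{N},\ ab \ne cd
  &\implies ab - cd = \pm N,
  \quad \text{since } |ab - cd| < 2^{64} < 2N .
\end{aligned}
```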

Update: counterexample found

A friend of mine asked an AI to explain this. It failed, but it generated a brute-force program to search for a counterexample. The program worked and produced the following counterexample:

a=4294940936, b=4289265232, c=372413899, d=15529856
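These values can be checked at compile time. The sketch below (the namespace name is hypothetical) assumes 64-bit std::uint64_t arithmetic. It confirms that $ab - cd$ equals $N = 2^{32} \cdot 65483 \cdot 65481$ exactly, so the checks modulo $2^{32}$, 65483 and 65481 all pass, while the check modulo $a_2 = 65485$ catches the difference.

```cpp
#include <cstdint>

namespace counterexample {
constexpr std::uint64_t a = 4294940936u, b = 4289265232u;
constexpr std::uint64_t c = 372413899u, d = 15529856u;
constexpr std::uint64_t ab = a * b, cd = c * d;  // both below 2^64
// Product of the moduli without a_2 = 65485.
constexpr std::uint64_t N = (std::uint64_t{65483} * 65481) << 32;

static_assert(ab != cd, "the products differ");
static_assert(ab - cd == N, "and differ by exactly N");
static_assert((ab & 0xFFFFFFFFu) == (cd & 0xFFFFFFFFu) &&
                  ab % 65483 == cd % 65483 && ab % 65481 == cd % 65481,
              "so every check except a_2 passes");
static_assert(ab % 65485 != cd % 65485, "only the a_2 check catches it");
}  // namespace counterexample
```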
