Primes generation (algorithms vs real code)

May 13, 2021

Primes is a quite useful tool and have many applications. They are an essential part of RSA cryptography, and while elliptic curve algorithms are gaining dominance nowadays, RSA is still widely used. Plus there are other applications like rolling hash functions, i.e. assigning an unique prime to each character in the required alphabet and multiplying them (and implementing rolling by dividing to the number corresponding to the previous character, and multiplying to the number corresponding to the new character).

There are several algorithms (or even families of algorithms) to calculate primes, and recently I’ve stumbled upon a new one: A new explicit algorithmic method for generating the prime numbers in order. It’s been looking promising, and I’ve programmed it, but unfortunately it wasn’t fast enough as the real code. Authors compared it with sieves of Eratosthenes and Sundaram using their unified framework, and got nice results.

However the implementations do not necessarily fit the frameworks. As actual software developers we want to have fast code now, that works on the real computers with all their limitations. And that’s a reason why nice algorithms are not always the best, e.g. algorithms utilizing linked lists could be not faster than using vectors/arrays as lists could lead to memory fragmentation and CPU cache misses.

Nevertheless, I’ve decided to make a performance comparison, and would like to present to you the code snippets and the numbers I’ve got. First, let me describe the algorithms I’ve used:

Proposed Algorithm (based on the aforementioned article). No optimizations are done except breaking the loop when the generated prime index is more than the limit.
Sieve of Eratosthenes. Used as is.
Sieve of Sundaram. Used as is.
Sieve of Atkin. Optimized to avoid additional calculations (please see details in https://www.geeksforgeeks.org/sieve-of-atkin/)

The performance results are presented on the image below. Sieve of Atkin is a winner (however, I have to admit that first I’ve tried the implementation presented in the original paper, and it was quite slow).

Each implementation consist of several parts:

allocating specialized std::vector<bool> container for the intermediate data (memory consumption is O(N) for all algorithms);
run the actual algorithm;
create the final std::vector<int> with the list of primes (the code is similar for each implementation, thus none of them got an advantage here).

Proposed Algorithm:

std::vector<int> proposed_algo(int limit) {
  std::vector<bool> prime(limit + 1, false);
  for (auto i = 1; i < 1 + (limit - 1) / 2; ++i) {
    prime[2 * i + 1] = true;
  }
  auto k = 1, j = 1;
  const auto sqr = [](int i) { return i * i; };
  const auto max_n = [limit, sqr](int i) {
    return (limit - sqr(2 * i + 1)) / (4 * i + 2);
  };

  while (k > 0) {
    k = max_n(j++);
  }
  k = j;
  for (auto i = 1; i < k + 1; ++i) {
    auto d = max_n(i - 1);
    for (auto n = 0; n < d; ++n) {
      auto index = (2 * i + 1) * (2 * i + 2 * n + 1);
      if (index > limit) {
        break;
      }
      prime[index] = false;
    }
  }

  std::vector<int> result({2});
  for (auto p = 3; p <= limit; ++p) {
    if (prime[p]) {
      result.push_back(p);
    }
  }
  return result;
}

Sieve of Eratosthenes:

std::vector<int> sieve_of_eratosthenes(int limit) {
  std::vector<bool> prime(limit + 1, true);

  for (auto p = 2; p * p <= limit; ++p) {
    if (prime[p]) {
      for (auto i = p * p; i <= limit; i += p) {
        prime[i] = false;
      }
    }
  }

  std::vector<int> result;
  for (auto p = 2; p <= limit; ++p) {
    if (prime[p]) {
      result.push_back(p);
    }
  }
  return result;
}

Sieve of Sundaram:

std::vector<int> sieve_of_sundaram(int limit) {
  const auto k = (limit - 2) / 2;
  std::vector<bool> marked(k + 1, true);

  for (auto i = 1l; i < k + 1; ++i) {
    for (auto j = i; (i + j + 2 * i * j) <= k; ++j) {
      marked[i + j + 2 * i * j] = false;
    }
  }

  std::vector<int> result({2});
  for (auto i = 1; i < k + 1; ++i) {
    if (marked[i]) {
      result.push_back(2 * i + 1);
    }
  }
  return result;
}

Sieve of Atkin:

std::vector<int> sieve_of_atkin(int limit) {
  std::vector<bool> prime(limit + 1, false);

  for (auto x = 1; x * x < limit; ++x) {
    for (auto y = 1; y * y < limit; ++y) {
      auto n = (4 * x * x) + (y * y);
      if (n <= limit && (n % 12 == 1 || n % 12 == 5)) {
        prime[n] = !prime[n];
      }

      n = (3 * x * x) + (y * y);
      if (n <= limit && n % 12 == 7) {
        prime[n] = !prime[n];
      }

      n = (3 * x * x) - (y * y);
      if (x > y && n <= limit && n % 12 == 11) {
        prime[n] = !prime[n];
      }
    }
  }

  for (auto r = 5; r * r < limit; ++r) {
    if (prime[r]) {
      for (auto i = r * r; i < limit; i += r * r) {
        prime[i] = false;
      }
    }
  }

  std::vector<int> result({2, 3});
  for (auto p = 5; p <= limit; ++p) {
    if (prime[p]) {
      result.push_back(p);
    }
  }
  return result;
}

As you could see, algorithm run time complexity doesn’t necessarily directly correlate to its implementation performance. Please keep the open mind, and verify the algorithms using the real code (at least on PoC level). Happy hacking!

Search This Blog

F = T ∇ Sτ

Primes generation (algorithms vs real code)

Comments

Post a Comment

Popular posts from this blog

Web application framework comparison by memory consumption

Trac Ticket Workflow

Shellcode detection using libemu