Calculate Prime Numbers: Sieve of Eratosthenes

Calculate Prime Numbers: Sieve of Eratosthenes

Posted by Brad Wood
Aug 14, 2009 06:34:00 UTC
Ahh... the quintessential math problem-- finding prime numbers. Last week while tinkering with a math challenge I needed to find all of the primes up to a given number. There was a version on cflib.org, but I thought I could do it in less code, so I dug in myself.I turned to Google for an algorithm and found a simple one called the Sieve of Eratosthenes. It works by making a list of all the numbers up to your max and systematically eliminating all the numbers which have have divisors other than 1 and themselves (primes). This is copied from Wikipedia: To find all the prime numbers less than or equal to a given integer n by Eratosthenes' method:
  1. Create a list of consecutive numbers from two to n, (2, 3, 4, ..., n).
  2. Initially, let p equal 2, the first prime number.
  3. Strike from the list all multiples of p less than or equal to n.
  4. Find the first number remaining on the list greater than p (this number is the next prime); replace p with this number.
  5. Repeat steps 3 and 4 until p2 is greater than n.
  6. All the remaining numbers on the list are prime.
My first version populated an array with all the integers up to your max number and deleted out the array elements as it eliminated them. The array shrinks in size until only primes are left.
[code]function getPrimes(num) {
	var i = 1;
	var p = 1;
	var m = 0;
	var t = 0;
	var l = 0;
	var arr = [];
	var arr2 = [];
	while (++i <= num) arr[i-1] = i;
	
	while(p<arrayLen(arr)) {
		i = 0;
		arr2 = arr;
		while (++i <= arrayLen(arr2)) {
			m = arr[p]*arr2[i];
			t = arr.indexOf(m);
			if(t>-1) arrayDeleteAt(arr,t+1); 
			if(m>=num) break;}
		p++;}		
	
	return arr;}
[/code]
This method was pretty effective, but performance absolutely sucked once you got past an upper bound of about 10,000. The function took roughly 30 seconds to find all the primes below 100,000. Firing up SeeFusion and watching stack traces showed 99% of the processing time was spent on this:
[code]java.util.Vector.indexOf(Vector.java:335)[/code]
Apparently searching an array many, many times is a very time consuming thing. Unfortunately I had to search the array if I expected to delete items out of it. My second version of the function left all the array indexes in and simply marked the non-primes as "0". This allowed me to always access specific array elements without searching. The downside is, the array doesn't shrink as you go, and the same non-primes will get marked over and over. At the end of the function it deletes out all 0's from the array and returns what's left (the primes).
[code]function getPrimes2(num) {
	var i = 0;
	var p = 1;
	var m = 0;
	var l = 0;
	var arr = [];
	while (++i <= num) arr[i] = i;
	
	while(++p<arrayLen(arr)) {
		i = 1;
		if(!arr[p]) continue;
		while (++i <= num) {
				m = p*i;
				if(m<=num) arr[m] = 0;
				else break;
				}}
	
	i = arrayLen(arr)+1;
	while (--i > 0) {
		if(!arr[i])
			arrayDeleteAt(arr,i);}
	
	return arr;}
[/code]
My second version was quite a bit faster. It could generate all the primes up to 1 million in about 30 seconds. The stack traces now showed a large amount of processing time doing this:
[code]coldfusion.runtime.CFPage.ArrayDeleteAt(CFPage.java:491)
[/code]
Apparently it is also very costly to delete hundreds of thousands of array indexes. My final version only differed from the second in that I simply looped over the array at the end and built a new array out of the prime numbers in the first array.
[code]function getPrimes3(num) {
	var i = 0;
	var p = 1;
	var m = 0;
	var l = 0;
	var arr = [];
	var arr2 = [];
	while (++i <= num) arr[i] = i;
	
	while(++p<arrayLen(arr)) {
		i = 1;
		if(!arr[p]) continue;
		while (++i <= num) {
				m = p*i;
				if(m<=num) arr[m] = 0;
				else break;
				}}
	
	i = 0;
	while (++i <= arrayLen(arr)) {
		if(arr[i] > 1 && arr[i] < num)
			arr2[arrayLen(arr2)+1] = arr[i];}
	
	return arr2;}
[/code]
Ahh, much better. Now I can generate all primes under 1 million in about 3 seconds. Performance is exponential though-- all the primes up to 10 million takes about 30 seconds still. I was surprised at the cost of searching and deleting from an array when performed millions of times. Who would have guessed it ended up being faster leaving the array in tact and copying out the parts you needed? Of course, these performance differences are completely negligible with small numbers under a few thousand. Regardless, I feel confident I've squeezed out about as much performance as I'm going to get with this algorithm in ColdFusion. It is a little tempting to compile a C++ or Java version just to compare raw performance. Also, in case you were wondering my prime calculator used roughly half the code as the example on cflib.org and was twice as fast at cranking out primes up to 10,000,000. (There's 664,579 of them) Update: I made it even faster!
Please check out this post where I made some more performance modifications to my function at a later date:
Generating Primes Revisited: My Modifications To The Sieve of Eratosthenes

 


Sean Guthrie

Not trying to one up just though you might like this up to 10 million takes 1 second to calc and 3 or 4 to write the answers to a file. Using similar logic in c++ the big diffrence is the bolean vector and its low memory requirments. Compiled and executed in linux. This should compile on a windows box just change or remove the system commands. Sample output

Wed Dec 2 20:43: 20 MST 2009 (start time)

Cacluations Complete Wed Dec 2 20:43: 21 MST 2009 (end time)

Answers Written to File

//code section------ // Name : bitvector.cpp // Author : Sean Guthrie // Version : 3 // Description : Prime Calcuator using a bolean vector array

#include <iostream> #include <fstream> #include <vector> #include <stdlib.h>

using namespace std;

int main(int argc, char *argv[]) { int x=5000000; vector<bool>v1(x); int i=0; int c=0; unsigned long long xz=3; system("date"); cout << "-------------------------------------------------" << endl; for (int b=3;b+b<=x;) {i=c; for (int u=0;u!=x;u++) {i=i+b; v1[i]=1; if (i+b>x) {u=x-1;} } b=b+2; c++; } cout << "Cacluations Complete" << endl; system("date"); cout << "-------------------------------------------------" << endl; ofstream outFile; outFile.open("dat1.dat", ios::app);

	if (!outFile.fail()) //Verify File Open
		 {outFile &lt;&lt; &quot;1&quot; &lt;&lt; endl;
          outFile &lt;&lt; &quot;2&quot; &lt;&lt; endl;
          for (int u=0;u!=x;u++)
		      {if (v1.at(u)==1)
		         {xz=xz+2;}
               else
                  {outFile &lt;&lt; xz &lt;&lt; endl;
                  xz=xz+2;}
              }
          }
	else //Open File Failed
		cout &lt;&lt; &quot;There was and error opening dat1.dat.&quot; &lt;&lt; endl;

cout << "Answers Written to File" << endl; return 0; }

Sean Guthrie

Srry.. I think I see a diffrence in our logic that needs to be mentioned. I am only tracking true or false about the prime condition and am using a counter to caculate the value at that position in the array. And have added logic to skip all even numbers which gave me a 40% increase in speed. I caculated another 30% to add logic to skip multiples of three but havent made it that far yet.

Brad Wood

@Sean: Thanks for the sample C++ code. I ported it over to cfscript and played with it for a bit. One thing my version did was once I marked a number as not being a prime, I short circuited the if (line 12 of the final example) so I would skip it on further iterations.

Skipping the evens isn't nearly that big of a deal then-- since when I go through and mark all the multiples of two as not primes, my short circuit will prevent all those from being evaluated again anyway. Basically, the number of usable indexes in the array shrinks the more multiples you eliminate.

I'm afraid ColdFusion will never be quite as fast for this sort of stuff as C++ or Java. Looking at the stack traces, most of the processing going on is casting srings to doubles over and over again for all the numeric operations and searching the page for scopes. neither of those things would be an issue in a strongly typed language that didn't do scope hunting.

Sean Guthrie

One thing I love about computers... there are so many ways to skin that cat. You got me all fired up about primes again. Just a report my P4 3.00Ghz overclocked @+20% pushed this code from 2 to 100B in 30 seconds. Thats 5.7M primes.

Site Updates

Entry Comments

Entries Search