Re: Prime Numbers

Date: Fri Jun 5 16:14:30 1998
Posted By: Steve Czarnecki, senior technical staff member, Lockheed Martin
Area of science: Computer Science
ID: 895130186.Cs
Message:

Andrea, 

This is a really neat assignment that has a lot more subtlety buried in the 
problem than perhaps you realize.  I hope you have some fun with it.

Behind any computer program is an algorithm, which is the rules or 
procedures for solving a problem.  The program is the implementation of the 
algorithm, or problem-solving strategy.

So let's talk about some prime number finding strategies.  The first 
strategy relies on the brute-force idea that a prime number is one which 
has no divisors other than itself and 1, so all I need to do is check all 
possible divisors.  

Algorithm #1:  Declare N=2 a prime.  For each value of N=3 to N=2,000,000 
check to see if number N is prime by checking whether the division N/K 
leaves a remainder for all integer values of K from K=2 to K=N-1. If all 
values of K leave a remainder, N is prime; if any value of K 
leaves no remainder, N is not prime. 

As I said, algorithm #1 is based on a very straightforward interpretation 
of the definition of a prime number.  It's also horribly inefficient (i.e., 
S-L-O-W).

Algorithm #1a: Similar to #1, but start with K=2 and stop as soon as you 
find that N/K divides evenly (does not leave remainder) and declare "N is 
not a prime" or stop when K = (N/2)-1 and declare "N is prime".   

This is somewhat more efficient; no sense trying all possible divisors 
after you've found one that divides into N without remainder (you've proven 
that N is not prime, so quit, already), and there's
no value in trying numbers larger than N/2, since you know beforehand that 
they can't divide into N without remainder.

Algorithm #1b: similar to 1a, but instead of trying all integer values 
trying K=2 to K=(N/2)-1, keep a list of previously found primes and only 
check to see if THEY divide into N.  

This strategy trims down the list of potential divisors immensely and 
speeds things up at the expense of a little bit more complicated 
bookkeeping.  The reason you only need to check prime numbers as possible 
divisors of N is the following: If K is not a prime number, then it has 
some prime factorization itself.  For argument's sake, let's say K=p1*p2.  
If you get to the point in algorithm 1b where you are trying to see if 
K=p1*p2 is a factor of N, then you have already concluded that neither p1 
nor p2 are factors of N.  And if THAT's the case, why bother with p1*p2?
The same argument applies for values of K that are the product of more than 
two prime factors, regardless of whether these factors are unique or 
repeated in the factorization of K.  For example, 12 is obviously 
not a factor of 29 or 30.  

There's no point checking in whether 12=2*2*3 divides into N=29, because if 
it does, then 4 and 3 must all divided into it as well, but if 4 divides 
into it, so must have 2. Therefore, we would have discarded N=29 as a 
candidate for a prime number back when we were checking 2 or 3 as possible 
factors.  But we haven't yet discarded 29, because neither 2 nor 3 are 
factors of 29 (nor are any other numbers).  12=2*2*3 can't be a factor of 
29 because NONE of its factors are divisors of 29;  there's no need 
to check 12 as a possible factor as it's already been "checked" as a 
consequence of not being a prime number itself.  

Furthermore, after we're done checking 29 (and concluding that it's a prime 
number), we go on to check 30, and find that 2 is a divisor, so we stop and 
declare that 30 is not a prime number, and go on to start checking 31 for 
primehood.  We never have to check whether 12 is a divisor of 30, because 
an earlier test involving one of 12's factors (2) has already "flunked" 30 
as a candidate for primehood.

These versions Algorithm 1 are based on long division.  Now let's try a 
different concept.

Back in times of the ancient Greeks, a mathematician named Eratosthenes 
invented a method to find prime numbers.  His name commemorates his 
algorith, "The Sieve of Eratosthenes".  What Eratosthenes realized is that long 
division is a lot of work, and all it's really doing for us when we're 
looking for primes is identifying numbers which are multiples of other 
numbers (by the way, the name for a non-prime number is "composite 
number").

Algorithm 2 (Sieve of Eratosthenes).  List all numbers between 2 and N.  
Cross out all multiples of 2, but not the number 2 itself.  Then cross out 
all multiples of 3 but not the number 3 itself. Continue the process for 4, 
then 5, etc.  Eventually, you will be ready to cross out multiples of 
(N/2)-1, at which point you will cross out N-2 and stop.  Any number 
remaining on the list is prime.

I thought the Sieve of Eratosthenes was a really cool idea (ahem, "elegant 
concept") when I first stumbled across a description of it in high school 
(not quite in the time of the ancient Greeks, but suffice it to say I was 
using FORTRAN II on an IBM 1620 with a 50 kHz clock speed... yes, that 
really was 50 kilohertz, not megahertz.  Last Sunday's newspaper flyer from 
Walmart says that they're selling a computer that's about 6000 times 
faster, with a 300 MHz clock).  But I digress...

Algorithm 2 is pretty fast, in that you don't ask the computer to do a 
whole bunch of long divisions.  It can be made faster, though:

Algorithm 2a: List all numbers between 1 and N.  Cross out multiples of 2 
except for the number 2 itself.  Find the next number on the list which 
hasn't been crossed out already (this will be 3) and cross out all 
multiples of 3 except for 3 itself.  Look for the next number on the list 
which hasn't been crossed out already (this will be 5) and cross out its 
multiples (except for itself). 

This version is somewhat faster than Algorithm 2 because any starting 
number K which hasn't already been crossed out is prime; any potential 
starting number which HAS already been crossed IS NOT prime.  Because it is 
not prime, it can be factored by some smaller prime numbers. All multiples 
of K have the same prime factorization (although the factors will appear a 
different number of times).  But all multiples of these smaller prime 
factors will have been already crossed off the list, which means all 
multiples of K will have already been crossed off the list, so why bother 
by crossing them off again?

The means Algorithm 2a is even speedier, but you run into a difficulty with 
the Sieve of Eratosthenes that I haven't mentioned yet: your computer may 
have problems dealing with a list of 2 billion numbers.  So how do you deal 
with this?

Algorithm 2b: Imagine executing Algorithm 2a yourself.  Because you can't 
find a sheet of paper big enough write down 2 billion numbers, you will 
list the numbers on multiple pieces of paper.  Page 1 might contain 1 
through 1000, page 2 the numbers 1001 through 2000, etc.  You would apply 
algorithm 2a first to page 1 to build a list of prime numbers between 2 and 
1000.  Then you could use the list of existing prime numbers to cross off 
all multiples of the primes on page 2 (the numbers from 1001 to 2000), and 
any remaining numbers must be prime so you add these to end of your list of 
prime numbers.  (Note that any multiple of a number between 1001 and 2000 
is greater than 2000, and is NOT on the current page).  Now repeat the 
process for page 3, crossing off all multiples of the numbers on the 
existing prime number list and adding what remains to your list of prime 
numbers. Then continue for page 4, then page 5, etc., until you've 
completed your effort with page 2,000,000.  

The parallels between "sheets of paper" and "maximum allowabe array size" 
(be it driven by language, operating system, or hardware memory 
constraints) ought to be fairly obvious.  Also, note that while some 
languages and operating systems might be able to handle a data array with 2 
billion entries, it will probably do so by swapping pages of memory between 
RAM and your hard drive. That would make Algorithm 2a much, much  slower 
than what it's capable of, and much slower than Algorithm 2b, assuming you 
work out the bookeeping so that page swapping never has to happen.

Also, to make this method work efficiently, you have to do a little 
computation to decide where the first and last multiple of each prime 
number is, or if there are any multiples at all on the page.  

So there's a tradeoff: some added bookkeeping complexity lets Algorithm 2b 
achieves very high speed.  On the other hand, Algorithm 1b is slower, but 
it's not a memory hog.  Life is full of hard choices :-)  

Experiment with these ideas and see which you like best (and why). Happy 
Computing!

Steve Czarnecki
Current Queue | Current Queue for Computer Science | Computer Science archives
Try the links in the MadSci Library for more information on Computer Science.