Large Integer Factorization Algorithm and Its Practice
08/02/2023
Last updated
08/02/2023
Last updated
In 1977, RSA (Rivest–Shamir–Adleman), a cryptosystem based on factorizing large integers was born. It was named after its three creators, Ron Rivest, Adi Shamir, and Leonard Adleman, using their surnames' initials.
With the significant contributions of the RSA algorithm to public-key cryptography and its profound impact worldwide, the three creators were then awarded the Turing Award in 2002.
Until today, the integer factorization problem remains one of the most crucial subjects in almost all aspects of public-key cryptography security.
The security of the RSA algorithm depends on large number factorization, which is
to factorize a large integer into a product of several large prime factors.
Some typical large integer factorization algorithms targeting small factors are:
Pollard's rho algorithm
Pollard’s p-1 algorithm
ECM algorithm (known as Lenstra elliptic curve factorization or elliptic-curve factorization method)
In this article, we will first introduce Pollard's p-1 algorithm and the ECM algorithm. Additionally, we will present a specific implementation of the ECM algorithm called GMP-ECM.
Definition
Calculation
To improve this method, the ECM (Elliptic Curve Method) introduces random elliptic curves to transform the original multiplicative group into a group consisting of points on the elliptic curve.
The elliptic curve method is a number-theoretic algorithm that utilizes the properties and arithmetic rules of points on an elliptic curve to search for factors that satisfy certain conditions.
The GMP-ECM implementation makes optimizations based on the original ECM method.
In Stage 1, GMP-ECM has optimized the selection and calculation of the elliptic curve.
In GMP-ECM, certain methods are employed for elliptic curve generation to ensure that the order of the generated elliptic curve satisfies certain properties, for example:
In GMP-ECM, the Montgomery curve is used, and the multiplication or addition of points follows the following rules:
The main idea of Stage 2 is the meet-in-the-middle strategy:
Use the tool which uses GMP-ECM to factorize:
Then, we successfully find the prime factor 1021791499165844943393503.
What's interesting is that, although this is a "well-known" challenge that has been practically demonstrated, it is still unknown whether it is a problem at class or class.
Powersmooth: If all the prime powers of the number are B-smooth, that is ,then is B-smooth.
Pollard’s p-1 algorithm is an integer factorization algorithm using Fermat's little theorem, invented by John Pollard in 1974. According to Fermat's little theorem, , if is a factor of which is to be factorized, then we can express it as . For , if , then is a multiple of , therefore
For data selection, the Pollard p-1 algorithm employs random numbers. If no result is obtained through random selection, it then generates from the prime numbers (less than (B-powersmooth)) until the prime number list is exhausted.
Input: (composite number)
Output: or a failed non-trivial factor
Select a smoothness bound .
, is a prime number.
Randomly select a number that is coprime with .
If , return .
If , choose a larger value for and return to Step 2, or return failure.
If , choose a smaller value for and return to Step 2, or return failure.
In Pollard's p-1 method, it requires that has a factor , and that has relatively small factors, allowing us to find a factor of using . However, when the factors of are large, Pollard's p-1 method is less likely to successfully factorize .
Assuming is a prime factor of , considering the elliptic curve and according to Hasse's theorem, the order of the elliptic curve satisfies the following condition: . When the parameters and of the elliptic curve vary, the order of the elliptic curve will fall within the range .
When using ECM to find factors of , there are three steps:
Select a and calculate .
Select a random curve out of and a random point on the curve.
Calculate . During the calculation, if the addition of points on the corresponding elliptic curve cannot be calculated, then a factor of has been found. Otherwise, you can select another curve for calculation, or return failure.
The principle behind the ECM algorithm finding factors of lies in the fact that when a randomly chosen elliptic curve order is B-smooth, during the calculation of , if we encounter or , it will result in a non-trivial factor of .
The overall algorithm process in GMP-ECM has been optimized. In GMP-ECM, the calculation of factors is divided into two steps, each step selecting two values .
The algorithm is divided into two stages, where calculations are performed using and , respectively. The calculations in stage 1 are similar to the ECM algorithm.
First, calculate the product of the point on the elliptic curve :
During the factorization, when the order of the randomly generated elliptic curve varies within the range , and the random elliptic curve is B1-smooth, it can factorize .
Suyama's form: Ensure that the order of the elliptic curve is divisible by 12.
Montgomery's form: Ensure that the order of the elliptic curve is divisible by 3.
Calculate:
Calculate:
Calculate:
Calculate: , , is a parameter in the elliptic curve.
In the first stage of GMP-ECM, the main task is to calculate the point on the elliptic curve . If it fails, meaning that , then the search range is expanded to in the second stage. In this new range, there exists a prime number , .
In Stage 2, we calculate two points: and .
If , it implies that . Therefore, calculating will obtain the factor .
The ECM method is expected to take time of to find a factor of a number , where .
Here, represents the complexity of multiplication modulo .
In the actual calculation process, due to the algorithm's randomness, the time to find the factor of may vary. Additionally, the ability of the algorithm to factorize and the time required for factorization also depend on the values of and .
For the calculations, you can refer to the recommended values of and as specified in the GMP-ECM documentation.
Take a large integer (2048 bit) as an example: