Integer factorization is the process of decomposing a *composite* number into a product of smaller integers, if these integers are restricted to be prime numbers then the process is called **prime factorization**

The fundamental theorem of arithmetic states that ever positive integer can be written uniquely as a product of primes

example:

\[ 65340 = 2^2 \cdot 3^3 \cdot 5 \cdot 11^2 \]

## Trial division

Trial division is the simplest algorithm for factoring an integer, we assume that \(s\) and \(t\) are factors of a number \(n\) such that \(n = st\) and \(s \leq t\) (note that \(s\) and \(t\) do not need to be prime numbers), when a divisor \(s\) is found then \(n / s\) is also a factor

```
vector<int> trial_division(int n) {
for (int i = 2; i * i <= n; i += 1) {
if (n % i == 0) {
// `n` is a composite number
return vector<int> {i, n / i};
}
}
// n is a prime number
return vector<int> {n};
}
```

## Fermat factorization

Fermat's observation was to write an integer as the difference of squares

\[ \begin{align} n &= x^2 - y^2 \label{fermat} \\ &= (x + y)(x - y) \end{align} \]

Assuming that \(s\) and \(t\) are **odd** factors of \(n\) such that \(n = st\) and \(s \leq t\) we can find \(x\) and \(y\) such that

\[ \begin{align*} s &= x - y \\ t &= x + y \end{align*} \]

Adding both equations

\[ s + t = 2x \\ x = \frac{s + t}{2} \]

Also

\[ y = \frac{t - s}{2} \]

Since we assumed that \(s\) and \(t\) are odd numbers, their difference is an even number which is divisible by \(2\) therefore \(x\) and \(y\) are integers, since \(s > 1\) and \(t \geq s\) we find that \(x \geq 1\) and \(y \geq 0\)

From \eqref{fermat} we also know that \(x = \sqrt{n + y^2}\) and hence \(x \geq \sqrt{n}\), also \(x = \tfrac{s + t}{2}\) and we know that the upper bound of \(s\) happens when \(s\) is as close as \(t\) as possible, given that \(s \leq t\), \(x \leq \tfrac{t + t}{2} \leq n\)

Implementation notes: since \(s\) and \(t\) are odd numbers, their product \(n\) is also an odd number, therefore the implementation below works with odd values of \(n\)

```
/**
* Factorization of an odd number `n` based on Fermat's
* factorization algorithm
*
* @param {int} n
* @return {vector<int>} a vector with two odd integers if `n` is not a
* prime number, a single integer if `n` is a prime number
*/
vector<int> fermat_factorization(int n) {
for (int x = (int) ceil(sqrt(n)); x <= n; x += 1) {
int ySquared = x * x - n;
// check if `y` is the square of some number
int y = (int) sqrt(ySquared);
if (y * y == ySquared) {
int s = x - y;
int t = x + y;
// `s` must be > 1
if (s != 1 && t != n) {
return vector<int> {s, t};
}
}
}
// n is a prime number
return vector<int> {n};
}
```

### Pollard's Rho factorization

Pollard's Rho factorization is a probabilistic factorization algorithm based on the assumption that a number \(n\) is a composite number and the following facts:

- since \(n\) is a
**composite number**there must be a factor \(d\) - let \(a\), \(b\) two positive integers, if \(a \equiv b \pmod{d}\) then the difference \(a - b\) is a multiple of \(d\), since \(n\) is also a multiple of \(d\) some multiple of \(d\) is a divisor of \(n\) and \(a - b\), particularly \(gcd(a - b, n)\) is a divisor of \(n\), let \(gcd(a - b, n) > 1\) then we have found two factors of \(n\) (\(gcd(a - b, n), \tfrac{n}{gcd(a - b, n)}\))

Now the problem is reduced to find \(a\) and \(b\) such that \(gcd(a - b, n) > 1\), we can use the following algorithm which picks random numbers in the range \([1, n - 1]\)

```
let `n` be the number to be factorized
let `x` be an array of integers
x[0] = random integer in the range [1, n - 1]
while we haven't two numbers such that `gcd(x_i, x_j, n) > 1`
x[i] = random integer in the range [1, n - 1]
for all `j < i` and `j >= 0`
if `gcd(x[i] - x[j], n) > 1`
return x[i], x[j]
```

This page has a good explanation on how this algorithm will find \(a\) and \(b\) with a probability \(~50\%\) after \(\sqrt{n}\) iterations, the algorithm above is not very helpful though since at the \(k\) iteration we have to do \(k - 1\) pairwise checks

Here's another algorithm to pick random numbers, let \(x\) be an integer in the range \([1, n - 1]\), a function that will generate a number in the range \([1, n - 1]\) based on a previous number is

\[ f(x) = x^2 + c \pmod{n} \]

Because there are only \(n - 1\) possible values our generator will eventually fall into a cycle, for example let \(n = 55, c = 2, x = 2\)

\[ \begin{align*} x_0 &= 2 \\ x_1 &= (2^2 + 2) \pmod{55} = 6 \\ x_2 &= (6^2 + 2) \pmod{55} = 38 \\ x_3 &= (38^2 + 2) \pmod{55} = 16 \\ x_4 &= (16^2 + 2) \pmod{55} = 38 \text{ which is equal to $x_2$ } \end{align*} \]

Pollard detected the cycle using Floyd's cycle-finding algorithm which is based on two pointers which move through a sequence at different speeds, one moves a unit and the other moves two units each time, if there's a cycle eventually the two pointers will encounter at some element belonging to the cycle, if we've analyzed all the elements of the sequence and saw not a single contiguous pair fullfills \(gcd(x_i - x_{i + 1}, n) > 1\) we need to choose other values for \(x_0, a\) and rerun the algorithm

```
// C++11
#include <random>
/**
* Computes a factor of `n` which is greater than `1`
* @param {long long} n The number to be factorized
* @return {long long} A positive integer which is a factor of `n`
* when the algorithm successfully finds a factor of `n`, -1 otherwise
*/
long long pollard_rho(long long n) {
if (n % 2 == 0) {
return 2;
}
std::random_device rd;
std::mt19937 engine(rd());
std::uniform_int_distribution<long long> dis(1, n - 1);
long long x = dis(engine);
long long c = dis(engine);
long long y = x; // y = x^2 + c (mod n)
do {
// tortoise goes x = f(x)
x = ((x * x) % n + c) % n;
// hare goes y = f(f(y))
y = ((y * y) % n + c) % n;
y = ((y * y) % n + c) % n;
long long gcd = __gcd(abs(x - y), n);
if (gcd > 1) {
return gcd;
}
} while (x != y);
return -1;
}
/**
* Pollard rho factorization runner, it makes multiple calls to
* `pollard_rho` until a factor is found
* @param {long long} n The number to be factorized
* @return {long long} A factor of `n` (it is `n` when `n` is a prime number)
*/
long long pollard_rho_factorization(long long n) {
long long factor;
do {
factor = pollard_rho(n);
} while(factor < 0);
return factor;
}
```

### Eratosthenes Sieve factorization of a range

We can also compute the factorization of a number by modifying the sieve of Erathostenes, remember that each state of the sieve hold a boolean telling if the number is prime or not, this time each state of the sieve will hold a pair of numbers

- the lowest prime that is a divisor of any index
`i`

- the maximum power of the lowest prime computed above (optional)

Let's represent \(n\) as \(p_1^{a_1} \cdot p_2^{a_2} \ldots p_n^{a_n}\), since we're hold for each position the lowest prime and its the maximum power, the state stored at the position \(n\) of the sieve will be \(p_1^{a_1}\), if we divide \(n\) by this number we will move to the state \(p_2^{a_2} \ldots p_n^{a_n}\), this recursive process is run until the current state reached in the sieve is \(1\)

```
// let `p` be the smallest prime factor of the index `i`, each element
// contains a pair `(p, x)` such that `p^x` is a divisor of `i`
// e.g.
//
// 8 = (2, 3)
// 15 = (3, 1)
// 6 = (2, 1)
//
vector<pair<int, int> > lp;
void eratosthenes_sieve_factorization(long long n) {
pair<int, int> unvisited(-1, -1);
// (-1, -1) is an unvisited state
lp.assign(n + 1, unvisited);
for (int i = 2; i * i <= n; i += 1) {
if (lp[i] == unvisited) {
// if an index is in an unvisited state it's a prime number
pair<int, int> base(i, 1);
lp[i] = base;
for (int j = i * i; j <= n; j += i) {
if (lp[j] == unvisited) {
// if a multiple of the prime number is in an unvisited
// state that means that it's lowest prime divisor is
// the current index `i`
lp[j] = base;
if (lp[j / i] != unvisited) {
// `j` is a multiple of `i`, in fact `j = i^x` because
// only numbers which don't have other factors but `i`
// reach this point, to accumulate the powers it's enough
// find out the power of `j / i`
lp[j].second += lp[j / i].second;
}
}
}
}
}
// all the prime numbers > sqrt(n) will have an unvisited state
// changing the unvisited state to prime_number^1
int sqrtN = sqrt(n);
if (sqrtN % 2 == 0) {
sqrtN += 1;
}
for (int i = sqrtN; i <= n; i += 2) {
if (lp[i] == unvisited) {
lp[i] = pair<int, int>(i, 1);
}
}
}
```