I am trying to use RSA to encrypt my data in Python.
I created a small key pair (public and private):
e: 14918179, N: 15372757
d: 7495819,  N: 15372757
I tried to encrypt a small value (10) with those keys, and it worked, but it takes a long time.
For comparison, openssl with a much bigger key and a long string finished in under a second.
I know there are third-party libraries for RSA in Python (I'm not a big fan of them).
I am trying to use this method to encrypt data that will be sent to a server, and it needs to finish in under a second.
How can I do it?
I think your performance questions are answered, in general, by the Wikipedia article on modular exponentiation.
The article describes three approaches: direct exponentiation, memory-efficient exponentiation, and binary exponentiation.
Direct exponentiation
Raise to the power e, then take the modulo. This is straightforward, but the number produced before the modulo step is extremely large.
Memory-efficient exponentiation
Replacing the single power operation with e successive multiplications, reducing modulo N after each step, keeps the accumulated result within the modulus range at all times. This limits the size of the bignums involved and speeds up the operation.
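A sketch of this idea (my own illustration, not the article's exact pseudocode):

def modpow_iterative(base, exponent, modulus):
    # Multiply `exponent` times, reducing after every step, so the
    # accumulator never grows beyond the modulus.
    result = 1
    base %= modulus
    for _ in range(exponent):
        result = (result * base) % modulus
    return result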
Binary exponentiation
If you write the exponent in binary, e.g. e = 13 => 1101 (8 + 4 + 1), then
pow(n, 13) = pow(n, 8) * pow(n, 4) * pow(n, 1)
where each pow(n, 2^i) is obtained from the previous one by a single squaring. So for an m-bit exponent, only about m multiplications need to be done.
Combining the memory-efficient reduction with binary exponentiation solves most of the performance problem.
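The combination is the classic square-and-multiply algorithm; here is a minimal sketch of my own, for illustration:

def modpow_binary(base, exponent, modulus):
    # One squaring per exponent bit, with a reduction after every
    # multiplication so intermediates stay below modulus**2.
    result = 1
    base %= modulus
    while exponent:
        if exponent & 1:   # this bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1
    return result

# Agrees with the built-in three-argument pow:
assert modpow_binary(10, 14918179, 15372757) == pow(10, 14918179, 15372757)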
Python offers an implementation of these improvements via the three-argument form of the built-in pow function, e.g.
>>> import timeit
>>> t = timeit.Timer('print(pow(10, 14918179, 15372757))')
>>> t.timeit(1)
10140931
0.06365180000000237
>>> u = timeit.Timer('print(pow(10, 14918179) % 15372757)')
>>> u.timeit(1)
10140931
15.021656000000007
The three-argument form of pow takes 0.06 seconds, whilst the two-argument version takes 15 seconds.
Related
I was writing a program where I need to calculate insanely huge numbers.
k = int(input())
print(int((2**k) * 5 % (10**9 + 7)))
Here, k can be on the order of 10^9.
As expected, this was rather slow (taking up to 5 seconds to compute), whereas my program needs to finish within 1 second.
After a little research online I found the built-in function pow(), and wrote
p = 10**9 + 7
print(int(pow(2, k - 1, p) * 10))
This works fine for small numbers but breaks for large ones. I can understand why (this isn't exactly the expression I want to calculate, and for small values of k the modulus with such a large number never affects the result, which is why small inputs appear to work).
I also found libraries like gmpy2 and numpy, but I don't know how to use them since I'm just a beginner with Python.
So how can I write an expression for what I want to calculate that is fast enough and also correct for large numbers?
You can optimize the operation by passing the number you want to take the modulus of as the third argument of the built-in pow, multiplying the result by 5, and then reducing modulo p once more:
def func(k):
    p = pow(10, 9) + 7
    # reduce again after the multiply; otherwise the result can be
    # as large as 5 * (p - 1) rather than a value below p
    return pow(2, k, p) * 5 % p
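For example (a quick check of my own), this agrees with the naive expression for small k:

>>> func(20)
5242880
>>> (2**20) * 5 % (10**9 + 7)
5242880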
I was solving a problem I came across: what is the sum of the powers of 3 from 0 to 2009, mod 8?
I got an answer using pen and paper, and tried to verify it with some simple Python:
print(sum(3**k for k in range(2010)) % 8)
I was surprised by how quickly it returned an answer. My question is: what optimisations or tricks does the interpreter use to get the answer so quickly?
None, it's just not a lot of computation for a computer to do.
Your code is equivalent to:
>>> a = sum(3**k for k in range(2010))
>>> a % 8
4
a is a 959-digit number - it's just not a large task to ask of a computer.
Try sticking two zeros on the end of the 2010 and you will see it taking an appreciable amount of time.
The only optimization at work is that each instance of 3**k is evaluated using a number of multiplications proportional to the number of bits in k (it does not multiply 3 by itself k-1 times).
As already noted, if you boost 2010 to 20100 or 201000 or ..., it will take much longer, because 3**k becomes very large. However, in those cases you can speed it enormously again by rewriting it as, e.g.,
print(sum(pow(3, k, 8) for k in range(201000)) % 8)
Internally, pow(3, k, 8) still does a number of multiplications proportional to the number of bits in k, but doesn't need to retain any integers internally larger than about 8**2 (the square of the modulus).
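To illustrate (a check I'm adding), reducing each term keeps every intermediate value tiny while giving the same residue:

>>> sum(pow(3, k, 8) for k in range(2010)) % 8
4
>>> sum(3**k for k in range(2010)) % 8
4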
No fancy optimizations are responsible for the fast response you observed. Computers are just a lot faster in absolute terms than you expected.
Q(x) = [Q(x-1) + Q(x-2)]^2
Q(0) = 0, Q(1) = 1
I need to find Q(29). I wrote some Python code, but it is taking too long. How can I get the output (any language would be fine)?
Here is the code I wrote:
a = 0
b = 1
for i in range(28):
    c = (a + b) * (a + b)
    a = b
    b = c
print(b)
I don't think this problem is tractable by programming. The reason your code is slow is that the numbers involved grow very rapidly, and Python uses arbitrary-precision integers, so it takes its time computing the result.
Try your code with double-precision floats:
a = 0.0
b = 1.0
for i in range(28):
    c = (a + b) * (a + b)
    a = b
    b = c
print(b)
The answer is inf. This is because the result is much, much larger than the largest representable double-precision number, which is roughly 10^308. You could try fixed-width integers instead, but those have an even smaller representable maximum. Note that using doubles leads to loss of precision, but surely you don't want to know every single digit of your huge number (side note: I happen to know that you do, making your job even harder).
So here's some math background for my skepticism: Your recurrence relation goes
Q[k] = (Q[k-2] + Q[k-1])^2
You can formulate a more tractable sequence from the square root of this sequence:
P[k] = sqrt(Q[k])
P[k] = P[k-2]^2 + P[k-1]^2
If you can solve for P, you'll know Q = P^2.
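A quick numeric check of this relation (my own addition):

>>> P = [0, 1]
>>> Q = [0, 1]
>>> for _ in range(5):
...     P.append(P[-2]**2 + P[-1]**2)
...     Q.append((Q[-2] + Q[-1])**2)
...
>>> P
[0, 1, 1, 2, 5, 29, 866]
>>> [p * p for p in P] == Q
True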
Now, consider this sequence:
R[k] = R[k-1]^2
Starting from the same initial values, this will always be smaller than P[k], since
P[k] = P[k-2]^2 + P[k-1]^2 >= P[k-1]^2
(but this will be a "pretty close" lower bound as the first term will always be insignificant compared to the second). We can construct this sequence:
R[k] = R[k-1]^2 = R[k-2]^4 = R[k-3]^8 = ... = R[k-m]^(2^m) = R[0]^(2^k)
Since P reaches the value 2 at index 1, give or take, we should consider
R[k] = 2^(2^k)
as a lower bound for P[k], give or take a few powers of 2 in the exponent. For k = 28 this is
P[28] > 2^(2^28) = 2^(268435456) = 10^(log10(2)*2^28) ~ 10^80807124
That's at least 80807124 digits for the final value of P, which is the square root of the number you're looking for. That makes Q[28] larger than 10^(1.6e8). If you printed that number into a text file, it would take more than 150 megabytes.
If you imagine you're trying to handle these integers exactly, you'll see why it takes so long, and why you should reconsider your approach. What if you could compute that huge number? What would you do with it? How long would it take python to print that number on your screen? None of this is trivial, so I suggest that you try to solve your problem on paper, or find a way around it.
Note that you can use a symbolic math package such as sympy in python to get a feeling of how hard your problem is:
import sympy as sym

b0 = sym.symbols('b0')
a = 0
b = b0
for k in range(28):
    c = (a + b)**2
    a = b
    b = c
print(c)
This will take a while, but it will fill your screen with the explicit expression for Q with only b0 as a parameter. You would "only" have to substitute your value into that monster to obtain the exact result. You could also try sym.simplify on the expression, but I couldn't wait for that to return anything meaningful.
During lunch I let your loop run, and it finished. The result has about 49 million digits:
>>> import math
>>> print(math.log10(c))
49287457.71120789
So my lower bound for k=28 is a bit large, probably due to off-by-one errors in the exponent. The memory needed to store this integer is
>>> import sys
>>> sys.getsizeof(c)
21830612
that is roughly 20 MB.
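Incidentally, you can estimate these sizes without ever materializing the huge integers by iterating on log10 of the sequence instead (a sketch of my own):

import math

# log10(Q[k]) = 2 * log10(Q[k-1] + Q[k-2]); the sum's log follows from
# the terms' logs via log10(a + b) = log10(b) + log10(1 + a/b).
la, lb = -math.inf, 0.0        # log10(Q(0)) = -inf, log10(Q(1)) = 0
for i in range(28):
    lsum = lb + math.log10(1.0 + 10.0**(la - lb))
    la, lb = lb, 2.0 * lsum
print(lb)                      # ~49287457.7, matching math.log10(c) above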
This can be solved with brute force but it is still an interesting problem since it uses two different "slow" operations and there are trade-offs in choosing the correct approach.
There are two places where a native Python implementation of the algorithm is slow: the multiplication of large numbers, and the conversion of large numbers to a string.
Python uses the Karatsuba algorithm for multiplication. It has a running time of O(n^1.585), where n is the length of the numbers. It gets slower as the numbers get larger, but Q(29) is still computable.
The algorithm for converting a Python integer to its decimal representation is much slower: it has running time O(n^2). For large numbers, it is much slower than the multiplication.
Note: the times for conversion to a string also include the actual calculation time.
On my computer, computing Q(25) requires ~2.5 seconds but conversion to a string requires ~3 minutes 9 seconds. Computing Q(26) requires ~7.5 seconds but conversion to a string requires ~12 minutes 36 seconds. As the size of the number doubles, multiplication time increases by a factor of 3 and the running time of string conversion increases by a factor of 4. The running time of the conversion to string dominates. Computing Q(29) takes about 3 minutes and 20 seconds but conversion to a string will take more than 12 hours (I didn't actually wait that long).
One option is the gmpy2 module, which provides access to the very fast GMP library. With gmpy2, Q(26) can be calculated in ~0.2 seconds and converted into a string in ~1.2 seconds. Q(29) can be calculated in ~1.7 seconds and converted into a string in ~15 seconds. Multiplication in GMP is O(n*ln(n)). Conversion to decimal is faster than Python's O(n^2) algorithm but still slower than multiplication.
The fastest option is Python's decimal module. Instead of using a radix-2 (binary) internal representation, it uses a radix-10 (actually a power of 10) internal representation. Calculations are slightly slower, but conversion to a string is very fast; it is just O(n). Calculating Q(29) requires ~9.2 seconds, but calculation and conversion together require only ~9.5 seconds: the time for conversion to a string is only ~0.3 seconds.
Here is an example program using decimal. It also sums the individual digits of the final value.
import decimal

decimal.getcontext().prec = 200000000
decimal.getcontext().Emax = 200000000
decimal.getcontext().Emin = -200000000

def sum_of_digits(x):
    return sum(map(int, str(x)))

a = decimal.Decimal(0)
b = decimal.Decimal(1)
for i in range(28):
    c = (a + b) * (a + b)
    a = b
    b = c

# Convert to a string once, after the loop; this is where decimal shines.
temp = str(b)
print(i, len(temp), sum_of_digits(temp))
I didn't include the time for converting the millions of digits into strings and adding them in the discussion above. That time should be the same for each version.
This WILL take too long, since the values explode toward infinity: the digit count roughly doubles at each step.
Example:

a = 0,  b = 1    ->  c = 1*1 = 1
a = 1,  b = 1    ->  c = 2*2 = 4
a = 1,  b = 4    ->  c = 5*5 = 25
a = 4,  b = 25   ->  c = 29*29 = 841
a = 25, b = 841  ->  ...
You can check whether c % 10 == 0 and divide the factor out, multiplying it back in at the end as many times as you divided, but you will still end up with the same enormous number. If you really need to do this calculation, try using C++; it should run faster than Python.
Here's your code written in C++
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    long long int a = 0;
    long long int b = 1;
    long long int c = 0;
    for (int i = 0; i < 28; i++) {
        // Caveat: c exceeds the range of long long after only a handful
        // of iterations, so this overflows (undefined behavior) long
        // before reaching Q(29).
        c = (a + b) * (a + b);
        a = b;
        b = c;
    }
    cout << c;
    return 0;
}
Problem 48 description from Project Euler:
The series, 1^1 + 2^2 + 3^3 + ... + 10^10 = 10405071317. Find the last
ten digits of the series, 1^1 + 2^2 + 3^3 + ... + 1000^1000.
I've just solved this problem using a one-liner in Python:
print sum([i**i for i in range(1,1001)])%(10**10)
I wrote it that way almost instantly, as I remembered that the modulo operation is very fast in Python. But I still don't understand how this works under the hood (what optimizations does Python apply?) and why it is so fast.
Could you please explain this to me? Is the mod 10**10 operation optimized to be applied at every iteration of the list comprehension instead of to the whole sum?
$ time python pe48.py
9110846700
real 0m0.070s
user 0m0.047s
sys 0m0.015s
Given that
print sum([i**i for i in range(1,1001)])%(10**10)
and
print sum([i**i for i in range(1,1001)])
run equally fast in Python, the answer to your last question is 'no'.
So, Python must be able to do integer exponentiation really fast. And it so happens that integer exponentiation is O(log(n)) multiplications: http://en.wikipedia.org/wiki/Exponentiation#Efficient_computation_of_integer_powers
Essentially, instead of computing 2^100 = 2*2*2*... with 100 multiplications, you notice that 2^100 is also 2^64 * 2^32 * 2^4, and that you can square 2 over and over to get 2^2, then 2^4, then 2^8, and so on; once you've found the values of those three components, you multiply them together for the final answer. This requires far fewer multiplication operations. The specifics of how to go about it are a bit more involved, but Python is mature enough to be well optimized on such a core feature.
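For example (a quick check I'm adding):

>>> (2**64) * (2**32) * (2**4) == 2**100   # 64 + 32 + 4 = 100
True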
No, it's applied to the whole sum. The sum itself is very fast to compute. Doing exponents isn't that hard to do quickly if the arguments are integers.
I need a reversible hash function (obviously the input will be much smaller in size than the output) that maps the input to the output in a random-looking way. Basically, I want a way to transform a number like "123" into a larger number like "9874362483910978", but not in a way that preserves comparisons: it must not always be true that if x1 > x2 then f(x1) > f(x2) (but neither must it always be false).
The use case is that I need to transform small numbers into larger, random-looking ones. They don't actually need to be random (in fact, they need to be deterministic, so the same input always maps to the same output), but they do need to look random (at least when base64-encoded into strings, so shifting by Z bits won't work, as similar numbers would have similar MSBs).
Also, easy (fast) calculation and reversal is a plus, but not required.
I don't know if I'm being clear, or if such an algorithm exists, but I'd appreciate any and all help!
None of the answers provided seemed particularly useful, given the question. I had the same problem, needing a simple, reversible hash for non-security purposes, and decided to go with bit relocation. It's simple, it's fast, and it doesn't require knowing anything about Boolean math or crypto algorithms or anything else that requires actual thinking.
The simplest would probably be to just move half the bits left, and the other half right:
def hash(n):
    return ((0x0000FFFF & n) << 16) + ((0xFFFF0000 & n) >> 16)
This is reversible, in that hash(hash(n)) = n, and has non-sequential pairs {n,m}, n < m, where hash(m) < hash(n).
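For instance (my own quick check), the swap really is its own inverse:

>>> hash(123)
8060928
>>> hash(hash(123))
123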
And to get a much less sequential looking implementation, you might also want to consider an interlace reordering from [msb,z,...,a,lsb] to [msb,lsb,z,a,...] or [lsb,msb,a,z,...] or any other relocation you feel gives an appropriately non-sequential sequence for the numbers you deal with, or even add a XOR on top for peak desequential'ing.
(The above function is safe for numbers that fit in 32 bits; larger numbers are guaranteed to cause collisions and would need wider bit-mask coverage to prevent problems. That said, 32 bits is usually enough for any non-security UID.)
Also have a look at the multiplicative inverse answer given by Andy Hayden, below.
Another simple solution is to use multiplicative inverses (see Eric Lippert's blog):
we showed how you can take any two coprime positive integers x and m and compute a third positive integer y with the property that (x * y) % m == 1, and therefore that (x * z * y) % m == z % m for any positive integer z. That is, there always exists a “multiplicative inverse”, that “undoes” the results of multiplying by x modulo m.
We take a large number e.g. 4000000000 and a large co-prime number e.g. 387420489:
def rhash(n):
    return n * 387420489 % 4000000000
>>> rhash(12)
649045868
We first calculate the multiplicative inverse (e.g. with a modinv helper implementing the extended Euclidean algorithm), which turns out to be 3513180409:
>>> 3513180409 * 387420489 % 4000000000
1
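In Python 3.8+ the built-in pow can compute this inverse directly (a detail I'm adding; earlier versions need an extended-Euclid helper):

>>> pow(387420489, -1, 4000000000)
3513180409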
Now, we can define the inverse:
def un_rhash(h):
    return h * 3513180409 % 4000000000
>>> un_rhash(649045868) # un_rhash(rhash(12))
12
Note: this answer is fast to compute and works for numbers up to 4000000000; if you need to handle larger numbers, choose a sufficiently large modulus (and another co-prime).
You may want to do this with hexadecimal (to pack the int):
def rhash(n):
    return "%08x" % (n * 387420489 % 4000000000)
>>> rhash(12)
'26afa76c'
def un_rhash(h):
    return int(h, 16) * 3513180409 % 4000000000
>>> un_rhash('26afa76c') # un_rhash(rhash(12))
12
If you choose a relatively large co-prime then this will seem random, be non-sequential and also be quick to calculate.
What you are asking for is encryption. A block cipher in its basic mode of operation, ECB, reversibly maps an input block onto an output block of the same size. The input and output blocks can be interpreted as numbers.
For example, AES is a 128 bit block cipher, so it maps an input 128 bit number onto an output 128 bit number. If 128 bits is good enough for your purposes, then you can simply pad your input number out to 128 bits, transform that single block with AES, then format the output as a 128 bit number.
If 128 bits is too large, you could use a 64 bit block cipher, like 3DES, IDEA or Blowfish.
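A minimal sketch of this approach, assuming the third-party cryptography package (my illustration, not part of the original answer):

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)                   # keep this key if you ever want to reverse
cipher = Cipher(algorithms.AES(key), modes.ECB())

def number_to_token(n):
    block = n.to_bytes(16, "big")      # pad the number out to one 128-bit block
    enc = cipher.encryptor()
    return int.from_bytes(enc.update(block) + enc.finalize(), "big")

def token_to_number(t):
    dec = cipher.decryptor()
    return int.from_bytes(dec.update(t.to_bytes(16, "big")) + dec.finalize(), "big")

assert token_to_number(number_to_token(123)) == 123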
ECB mode is considered weak, but its weakness is the constraint that you have postulated as a requirement (namely, that the mapping be "deterministic"). This is a weakness, because once an attacker has observed that 123 maps to 9874362483910978, from then on whenever she sees the latter number, she knows the plaintext was 123. An attacker can perform frequency analysis and/or build up a dictionary of known plaintext/ciphertext pairs.
Basically, you are looking for two-way encryption, probably one that uses a salt.
You have a number of choices:
TripleDES
AES
Here is an example: Simple insecure two-way "obfuscation" for C#
What language are you looking at? If .NET then look at the encryption namespace for some ideas.
Why not just XOR with a nice long number?
Easy. Fast. Reversible.
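A sketch of the XOR approach (the mask constant here is my own arbitrary choice):

MASK = 0x5DEECE66DCAFEBABE   # any long constant works; XOR is its own inverse

def obfuscate(n):
    return n ^ MASK

>>> obfuscate(obfuscate(123))
123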
Or, if this doesn't need to be terribly secure, you could convert from base 10 to some smaller base (like base 8 or base 4, depending on how long you want the numbers to be).