Python: How do I convert file to custom base number and back?

Python: How do I convert file to custom base number and back? - python

I have a file that I want to convert into custom base (base 86 for example, with custom alphabet)
I have try to convert the file with hexlify and then into my custom base but it's too slow... 8 second for 60 Ko..
def HexToBase(Hexa, AlphabetList, OccurList, threshold=10):
number = int(Hexa,16) #base 16 vers base 10
alphabet = GetAlphabet(AlphabetList, OccurList, threshold)
#GetAlphabet return a list of all chars that occurs more than threshold times
b_nbr = len(alphabet) #get the base
out = ''
while number > 0:
out = alphabet[(number % b_nbr)] + out
number = number // b_nbr
return out
file = open("File.jpg","rb")
binary_data = file.read()
HexToBase(binascii.hexlify(binary_data),['a','b'],[23,54])
So, could anyone help me to find the right solution ?
Sorry for my poor English I'm French, and Thank's for your help !

First you can replace:
int(binascii.hexlify(binary_data), 16) # timeit: 14.349809918712538
By:
int.from_bytes(binary_data, byteorder='little') # timeit: 3.3330371951720164
Second you can use the divmod function to speed up the loop:
out = ""
while number > 0:
number, m = divmod(number, b_nbr)
out = alphabet[m] + out
# timeit: 3.8345545611298126 vs 7.472579440019706
For divmod vs %, // comparison and large numbers, see Is divmod() faster than using the % and // operators?.
(Remark: I expected that buildind an array and then making a string with "".join would be faster than out = ... + out but that was not the case with CPython 3.6.)
Everything put together gave me a speed up factor of 6.

Related

int 111 to binary 111(decimal 7)

Problem:Take a number example 37 is (binary 100101).
Count the binary 1s and create a binary like (111) and print the decimal of that binary(7)
num = bin(int(input()))
st = str(num)
count=0
for i in st:
if i == "1":
count +=1
del st
vt = ""
for i in range(count):
vt = vt + "1"
vt = int(vt)
print(vt)
I am a newbie and stuck here.

I wouldn't recommend your approach, but to show where you went wrong:
num = bin(int(input()))
st = str(num)
count = 0
for i in st:
if i == "1":
count += 1
del st
# start the string representation of the binary value correctly
vt = "0b"
for i in range(count):
vt = vt + "1"
# tell the `int()` function that it should consider the string as a binary number (base 2)
vt = int(vt, 2)
print(vt)
Note that the code below does the exact same thing as yours, but a bit more concisely so:
ones = bin(int(input())).count('1')
vt = int('0b' + '1' * ones, 2)
print(vt)
It uses the standard method count() on the string to get the number of ones in ones and it uses Python's ability to repeat a string a number of times using the multiplication operator *.

Try this once you got the required binary.
def binaryToDecimal(binary):
binary1 = binary
decimal, i, n = 0, 0, 0
while(binary != 0):
dec = binary % 10
decimal = decimal + dec * pow(2, i)
binary = binary//10
i += 1
print(decimal)

In one line:
print(int(format(int(input()), 'b').count('1') * '1', 2))
Let's break it down, inside out:
format(int(input()), 'b')
This built-in function takes an integer number from the input, and returns a formatted string according to the Format Specification Mini-Language. In this case, the argument 'b' gives us a binary format.
Then, we have
.count('1')
This str method returns the total number of occurrences of '1' in the string returned by the format function.
In Python, you can multiply a string times a number to get the same string repeatedly concatenated n times:
x = 'a' * 3
print(x) # prints 'aaa'
Thus, if we take the number returned by the count method and multiply it by the string '1' we get a string that only contains ones and only the same amount of ones as our original input number in binary. Now, we can express this number in binary by casting it in base 2, like this:
int(number_string, 2)
So, we have
int(format(int(input()), 'b').count('1') * '1', 2)
Finally, let's print the whole thing:
print(int(format(int(input()), 'b').count('1') * '1', 2))

Is it possible to convert a really large int to a string quickly in python

I am building an encryption program which produces a massive integer.It looks something like this:
a = plaintextOrd**bigNumber
when i do
a = str(a)
it takes over 28 minutes.
Is there any possible way to convert an integer like this quicker that using the built in str() function?
the reason i need it to be a string is because of this function here:
def divideStringIntoParts(parts,string):
parts = int(parts)
a = len(string)//parts
new = []
firstTime = True
secondTime = True
for i in range(parts):
if firstTime:
new.append(string[:a])
firstTime = False
elif secondTime:
new.append(string[a:a+a])
secondTime = False
else:
new.append(string[a*i:a*(i+1)])
string2 = ""
for i in new:
for i in i:
string2 += i
if len(string2) - len(string) != 0:
lettersNeeded = len(string) - len(string2)
for i in range(lettersNeeded):
new[-1] += string[len(string2) + i]
return new

You wrote in the comments that you want to get the length of the integer in decimal format. You don't need to convert this integer to a string, you can use "common logarithm" instead:
import math
math.ceil(math.log(a, 10))
Moreover, if you know that:
a = plaintextOrd**bigNumber
then math.log(a, 10) is equal to math.log(plaintextOrd, 10) * bigNumber, which shouldn't take more than a few milliseconds to calculate:
>>> plaintextOrd = 12345
>>> bigNumber = 67890
>>> a = plaintextOrd**bigNumber
>>> len(str(a))
277772
>>> import math
>>> math.ceil(math.log(a, 10))
277772
>>> math.ceil(math.log(plaintextOrd, 10) * bigNumber)
277772
It should work even if a wouldn't fit on your hard drive:
>>> math.ceil(math.log(123456789, 10) * 123456789012345678901234567890)
998952457326621672529828249600
As mentioned by #kaya3, Python standard floats aren't precise enough to describe the exact length of such a large number.
You could use mpmath (arbitrary-precision floating-point arithmetic) to get results with the desired precision:
>>> from mpmath import mp
>>> mp.dps = 1000
>>> mp.ceil(mp.log(123456789, 10) * mp.mpf('123456789012345678901234567890'))
mpf('998952457326621684655868656199.0')

Some quick notes on the "I need it for this function".
You don't need the first/second logic:
[:a] == [a*0:a*(0+1)]
[a:a+a] == [a*1:a*(1+1)]
So we have
new = []
for i in range(parts):
new.append(string[a*i:a*(i+1)])
or just new = [string[a*i:a*(i+1)] for i in range(parts)].
Note that you have silently discarded the last len(string) % parts characters.
In your second loop, you shadow i with for i in i, which happens to work but is awkward and dangerous. It can also be replaced with string2 = ''.join(new), which means you can just do string2 = string[:-(len(string) % parts)].
You then see if the strings are the same length, and then add the extra letters to the end of the last list. This is a little surprising, e.g. you would have
>>> divideStringIntoParts(3, '0123456789a')
['012', '345', '6789a']
When most algorithms would produce something that favors even distributions, and earlier elements, e.g.:
>>> divideStringIntoParts(3, '0123456789a')
['0124', '4567', '89a']
Regardless of this, we see that you don't really care about the value of the string at all here, just how many digits it has. Thus you could rewrite your function as follows.
def divide_number_into_parts(number, parts):
'''
>>> divide_number_into_parts(12345678901, 3)
[123, 456, 78901]
'''
total_digits = math.ceil(math.log(number + 1, 10))
part_digits = total_digits // parts
extra_digits = total_digits % parts
remaining = number
results = []
for i in range(parts):
to_take = part_digits
if i == 0:
to_take += extra_digits
digits, remaining = take_digits(remaining, to_take)
results.append(digits)
# Reverse results, since we go from the end to the beginning
return results[::-1]
def take_digits(number, digits):
'''
Removes the last <digits> digits from number.
Returns those digits along with the remainder, e.g.:
>>> take_digits(12345, 2)
(45, 123)
'''
mod = 10 ** digits
return number % mod, number // mod
This should be very fast, since it avoids strings altogether. You can change it to strings at the end if you'd like, which may or may not benefit from the other answers here, depending on your chunk sizes.

Faster than function str conversion of int to str is provided by GMPY2
Source of Example Below
import time
from gmpy2 import mpz
# Test number (Large)
x = 123456789**12345
# int to str using Python str()
start = time.time()
python_str = str(x)
end = time.time()
print('str conversion time {0:.4f} seconds'.format(end - start))
# int to str using GMPY2 module
start = time.time()
r = mpz(x)
gmpy2_str = r.digits()
end = time.time()
print('GMPY2 conversion time {0:.4f} seconds'.format(end - start))
print('Length of 123456789**12345 is: {:,}'.format(len(python_str)))
print('str result == GMPY2 result {}'.format(python_str==gmpy2_str))
Results (GMPY2 was 12 times faster in test)
str conversion time 0.3820 seconds
GMPY2 conversion time 0.0310 seconds
Length of 123456789**12345 is: 99,890
str result == GMPY2 result True

Find length of a string that includes its own length?

I want to get the length of a string including a part of the string that represents its own length without padding or using structs or anything like that that forces fixed lengths.
So for example I want to be able to take this string as input:
"A string|"
And return this:
"A string|11"

On the basis of the OP tolerating such an approach (and to provide an implementation technique for the eventual python answer), here's a solution in Java.
final String s = "A String|";
int n = s.length(); // `length()` returns the length of the string.
String t; // the result
do {
t = s + n; // append the stringified n to the original string
if (n == t.length()){
return t; // string length no longer changing; we're good.
}
n = t.length(); // n must hold the total length
} while (true); // round again
The problem of, course, is that in appending n, the string length changes. But luckily, the length only ever increases or stays the same. So it will converge very quickly: due to the logarithmic nature of the length of n. In this particular case, the attempted values of n are 9, 10, and 11. And that's a pernicious case.

A simple solution is :
def addlength(string):
n1=len(string)
n2=len(str(n1))+n1
n2 += len(str(n2))-len(str(n1)) # a carry can arise
return string+str(n2)
Since a possible carry will increase the length by at most one unit.
Examples :
In [2]: addlength('a'*8)
Out[2]: 'aaaaaaaa9'
In [3]: addlength('a'*9)
Out[3]: 'aaaaaaaaa11'
In [4]: addlength('a'*99)
Out[4]: 'aaaaa...aaa102'
In [5]: addlength('a'*999)
Out[5]: 'aaaa...aaa1003'

Here is a simple python port of Bathsheba's answer :
def str_len(s):
n = len(s)
t = ''
while True:
t = s + str(n)
if n == len(t):
return t
n = len(t)
This is a much more clever and simple way than anything I was thinking of trying!
Suppose you had s = 'abcdefgh|, On the first pass through, t = 'abcdefgh|9
Since n != len(t) ( which is now 10 ) it goes through again : t = 'abcdefgh|' + str(n) and str(n)='10' so you have abcdefgh|10 which is still not quite right! Now n=len(t) which is finally n=11 you get it right then. Pretty clever solution!

It is a tricky one, but I think I've figured it out.
Done in a hurry in Python 2.7, please fully test - this should handle strings up to 998 characters:
import sys
orig = sys.argv[1]
origLen = len(orig)
if (origLen >= 98):
extra = str(origLen + 3)
elif (origLen >= 8):
extra = str(origLen + 2)
else:
extra = str(origLen + 1)
final = orig + extra
print final
Results of very brief testing
C:\Users\PH\Desktop>python test.py "tiny|"
tiny|6
C:\Users\PH\Desktop>python test.py "myString|"
myString|11
C:\Users\PH\Desktop>python test.py "myStringWith98Characters.........................................................................|"
myStringWith98Characters.........................................................................|101

Just find the length of the string. Then iterate through each value of the number of digits the length of the resulting string can possibly have. While iterating, check if the sum of the number of digits to be appended and the initial string length is equal to the length of the resulting string.
def get_length(s):
s = s + "|"
result = ""
len_s = len(s)
i = 1
while True:
candidate = len_s + i
if len(str(candidate)) == i:
result = s + str(len_s + i)
break
i += 1

This code gives the result.
I used a few var, but at the end it shows the output you want:
def len_s(s):
s = s + '|'
b = len(s)
z = s + str(b)
length = len(z)
new_s = s + str(length)
new_len = len(new_s)
return s + str(new_len)
s = "A string"
print len_s(s)

Here's a direct equation for this (so it's not necessary to construct the string). If s is the string, then the length of the string including the length of the appended length will be:
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
The idea here is that a direct calculation is only problematic when the appended length will push the length past a power of ten; that is, at 9, 98, 99, 997, 998, 999, 9996, etc. To work this through, 1 + int(log10(len(s))) is the number of digits in the length of s. If we add that to len(s), then 9->10, 98->100, 99->101, etc, but still 8->9, 97->99, etc, so we can push past the power of ten exactly as needed. That is, adding this produces a number with the correct number of digits after the addition. Then do the log again to find the length of that number and that's the answer.
To test this:
from math import log10
def find_length(s):
L1 = len(s) + 1 + int(log10(len(s) + 1 + int(log10(len(s)))))
return L1
# test, just looking at lengths around 10**n
for i in range(9):
for j in range(30):
L = abs(10**i - j + 10) + 1
s = "a"*L
x0 = find_length(s)
new0 = s+`x0`
if len(new0)!=x0:
print "error", len(s), x0, log10(len(s)), log10(x0)

Python's equivalent of perl vec() function

I am new to Python. In Perl, to set specific bits to a scalar variable(integer), I can use vec() as below.
#!/usr/bin/perl -w
$vec = '';
vec($vec, 3, 4) = 1; # bits 0 to 3
vec($vec, 7, 4) = 10; # bits 4 to 7
vec($vec, 11, 4) = 3; # bits 8 to 11
vec($vec, 15, 4) = 15; # bits 12 to 15
print("vec() Has a created a string of nybbles,
in hex: ", unpack("h*", $vec), "\n");
Output:
vec() Has a created a string of nybbles,
in hex: 0001000a0003000f
I was wondering how to achieve the same in Python, without having to write bit manipulation code and using struct.pack manually?

Not sure how the vec function works in pearl (haven't worked with the vec function). However, according to the output you have mentioned, the following code in python works fine. I do not see the significance of the second argument. To call the vec function this way: vec(value, size). Every time you do so, the output string will be concatenated to the global final_str variable.
final_vec = ''
def vec(value, size):
global final_vec
prefix = ''
str_hex = str(hex(value)).replace('0x','')
str_hex_size = len(str_hex)
for i in range (0, size - str_hex_size):
prefix = prefix + '0'
str_hex = prefix + str_hex
final_vec = final_vec + str_hex
return 0
vec(1, 4)
vec(10, 4)
vec(3, 4)
vec(15, 4)
print(final_vec)

If you really want to create a hex string from nibbles, you could solve it this way
nibbles = [1,10,3,15]
hex = '0x' + "".join([ "%04x" % x for x in nibbles])

Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

I have this Python code to do this:
from struct import pack as _pack
def packl(lnum, pad = 1):
if lnum < 0:
raise RangeError("Cannot use packl to convert a negative integer "
"to a string.")
count = 0
l = []
while lnum > 0:
l.append(lnum & 0xffffffffffffffffL)
count += 1
lnum >>= 64
if count <= 0:
return '\0' * pad
elif pad >= 8:
lens = 8 * count % pad
pad = ((lens != 0) and (pad - lens)) or 0
l.append('>' + 'x' * pad + 'Q' * count)
l.reverse()
return _pack(*l)
else:
l.append('>' + 'Q' * count)
l.reverse()
s = _pack(*l).lstrip('\0')
lens = len(s)
if (lens % pad) != 0:
return '\0' * (pad - lens % pad) + s
else:
return s
This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.
Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.
Edit
This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:
def packl(lnum, padmultiple=1):
"""Packs the lnum (which must be convertable to a long) into a
byte string 0 padded to a multiple of padmultiple bytes in size. 0
means no padding whatsoever, so that packing 0 result in an empty
string. The resulting byte string is the big-endian two's
complement representation of the passed in long."""
if lnum == 0:
return b'\0' * padmultiple
elif lnum < 0:
raise ValueError("Can only convert non-negative numbers.")
s = hex(lnum)[2:]
s = s.rstrip('L')
if len(s) & 1:
s = '0' + s
s = binascii.unhexlify(s)
if (padmultiple != 1) and (padmultiple != 0):
filled_so_far = len(s) % padmultiple
if filled_so_far != 0:
s = b'\0' * (padmultiple - filled_so_far) + s
return s
def unpackl(bytestr):
"""Treats a byte string as a sequence of base 256 digits
representing an unsigned integer in big-endian format and converts
that representation into a Python integer."""
return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0
In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.

Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.
import numpy
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
numpy.ctypeslib.ndpointer(numpy.uint8),
ctypes.c_size_t,
ctypes.c_int,
ctypes.c_int]
def packl_ctypes_numpy(lnum):
a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
PyLong_AsByteArray(lnum, a, a.size, 0, 1)
return a
On my machine, this is 15 times faster than your approach.
Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
ctypes.c_char_p,
ctypes.c_size_t,
ctypes.c_int,
ctypes.c_int]
def packl_ctypes(lnum):
a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
PyLong_AsByteArray(lnum, a, len(a), 0, 1)
return a.raw
This is another two times faster, totalling to a speed-up factor of 30 on my machine.

For completeness and for future readers of this question:
Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.

I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.
IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.
Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)
from struct import pack as _pack
Q_size = 64
Q_bitmask = (1L << Q_size) - 1L
def quads_gen(a_long):
while a_long:
yield a_long & Q_bitmask
a_long >>= Q_size
def pack_long_big_endian(a_long, pad = 1):
if lnum < 0:
raise RangeError("Cannot use packl to convert a negative integer "
"to a string.")
qs = list(reversed(quads_gen(a_long)))
# Pack the first one separately so we can lstrip nicely.
first = _pack('>Q', qs[0]).lstrip('\x00')
rest = _pack('>%sQ' % len(qs) - 1, *qs[1:])
count = len(first) + len(rest)
# A little math trick that depends on Python's behaviour of modulus
# for negative numbers - but it's well-defined and documented
return '\x00' * (-count % pad) + first + rest

Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from arbitrarily long bytes object to Python Integer object requires the following (because there is no PyLong_FromByteArray() C API function that I can find):
import binascii
def unpack_bytes(stringbytes):
#binascii.hexlify will be obsolete in python3 soon
#They will add a .tohex() method to bytes class
#Issue 3532 bugs.python.org
return int(binascii.hexlify(stringbytes), 16)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: How do I convert file to custom base number and back? - python

Related

int 111 to binary 111(decimal 7)

Is it possible to convert a really large int to a string quickly in python

Find length of a string that includes its own length?

Python's equivalent of perl vec() function

Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?

Categories

Resources