I'm using this code to generate short URLs: http://code.activestate.com/recipes/576918/
The idea is to encode an integer id using base62, and the enbase function works fine.
class UrlEncoder(object):
    ...
    def enbase(self, x, min_length=0):
        result = self._enbase(x)
        padding = self.alphabet[0] * (min_length - len(result))
        return '%s%s' % (padding, result)
But I don't quite understand what this code is for:
class UrlEncoder(object):
    ...
    def encode_url(self, n, min_length=0):
        return self.enbase(self.encode(n), min_length)
    def decode_url(self, n):
        return self.decode(self.debase(n))
    def encode(self, n):
        return (n & ~self.mask) | self._encode(n & self.mask)
Why encode then enbase? What does that bitwise operation do?
Could someone shed some light on this? Thanks.
Looking at the whole code: The net effect of encode() is to apply _encode() to the least-significant self.block_size-many bits of the value. _encode() appears to reverse those bits. It seems to be just a bit of additional scrambling. The documentation below the code explains why they are doing all these extra shuffles.
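To make the bitwise expression concrete, here is a minimal sketch of what encode() does, assuming (as this answer suggests) that _encode() reverses the bits it is given. The function name scramble_low_bits and the default block_size are made up for illustration and are not taken from the recipe.

```python
def scramble_low_bits(n, block_size=24):
    """Reverse the lowest block_size bits of n and leave the higher bits alone."""
    mask = (1 << block_size) - 1   # block_size ones, e.g. 0b1111 for 4
    high = n & ~mask               # bits above the block, unchanged
    low = n & mask                 # bits inside the block
    reversed_low = 0
    for _ in range(block_size):    # assumed behaviour of _encode(): bit reversal
        reversed_low = (reversed_low << 1) | (low & 1)
        low >>= 1
    return high | reversed_low


print(bin(scramble_low_bits(0b1, block_size=8)))          # 0b10000000
print(bin(scramble_low_bits(0b100000001, block_size=8)))  # 0b110000000
```

Because reversing the bits twice restores the original value, the same function also undoes the scrambling, which is why consecutive ids can be mapped to very different short URLs and still be decoded.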
Here is my code for the implementation of a CRC in python:
import math
divisors = [0b1100000001111, 0b11000000000000101, 0b10001000000100001, 0b1011, 0b10011, 0b00000111, 0b11001]
def get_Length(arg):
    return math.floor(math.log2(arg)) + 1
def CRC(message, type):
    print("Message ", bin(message)[2:], hex(message))
    # int message_length = get_Length(message);
    divisor_length = get_Length(divisors[type])
    divisor = divisors[type]
    print("Divisor: ", bin(divisor)[2:], hex(divisor))
    message = message << (divisor_length-1)
    old_message = message
    while( (message >> (divisor_length-1)) != 0 ):
        ml = get_Length(message)
        divisor_copy = divisor << (ml-divisor_length)
        message = message ^ divisor_copy
        print(bin(message)[2:], hex(message))
    print(bin(old_message | message)[2:], hex(old_message | message), end="\n\n")
def main():
    CRC(0b1101011011, 4)
    CRC(0x34ec, 1)
main()
The first message is from this Wikipedia example and gives the correct result. However, the second one (0x34ec), which is meant to demonstrate a CRC-16, does not give the correct result (correct result).
I would appreciate it if somebody could shed some light on this.
Thanks in advance.
There are many CRC-16's. I count 30 here alone, and I'm sure there are more in use that are not in that catalog.
The one you have implemented is the CRC-16/UMTS in that catalog, also known as CRC-16/BUYPASS and CRC-16/VERIFONE. It is the most straightforward of definitions, with the bits in and bits out not reflected, with an initial value of zero, and with no final exclusive-or.
The CRC that your implementation computes for the message 34 ec can in fact be found directly in your linked "correct result", on the fourth row of the table, which is labeled "CRC-16/BUYPASS".
If you want to implement a different CRC-16, the first thing you need to do is specify which one. That specification is the polynomial, reflection of input and output, initial value, and final exclusive-or value.
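As an illustration of how those four parameters select a particular CRC-16, here is a minimal bitwise sketch. The function names crc16 and reflect are made up for this example, and the parameter sets in the usage lines are the commonly published ones for each variant; check them against the catalog before relying on them.

```python
def reflect(value, width):
    """Return value with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc16(data, poly, init=0x0000, refin=False, refout=False, xorout=0x0000):
    """Bit-by-bit CRC-16 over a bytes object, parameterised like the catalog."""
    crc = init
    for byte in data:
        if refin:                      # reflected input: feed bits LSB first
            byte = reflect(byte, 8)
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    if refout:                         # reflected output
        crc = reflect(crc, 16)
    return crc ^ xorout


check = b"123456789"  # the usual test message for CRC check values
print(hex(crc16(check, 0x8005)))                            # CRC-16/UMTS (BUYPASS)
print(hex(crc16(check, 0x8005, refin=True, refout=True)))   # CRC-16/ARC
print(hex(crc16(check, 0x1021, init=0xFFFF)))               # CRC-16/CCITT-FALSE
```

The same polynomial 0x8005 gives two different results here purely because of the reflection settings, which is why naming the polynomial alone does not identify a CRC-16.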
I am trying to write Python code that increments the version values below by 1; the expected output shows the result I want. I am thinking of splitting the version into its parts, incrementing the last part by 1, and reconstructing the version. Is there a simple way to do this in Python?
version1 = 1151.1
version2 = 4.1.1
version3 = 275.1.2.99
version4 = 379
next_version1 = version1 + 1
print next_version1
next_version2 = version2 + 1
print next_version2
next_version3 = version3 + 1
print next_version3
next_version4 = version4 + 1
print next_version4
EXPECTED OUTPUT:-
1151.2
4.1.2
275.1.2.100
380
Actually, not all of these values are floats.
You should treat them as strings and update the last element.
version1 = '275.1.2.3'
version2 = '279'
version3 = '275.2.3.10'
def updateVersion(version):
    if '.' in version:
        version = version.split('.')
        version[-1] = str(int(version[-1]) + 1)
        version = '.'.join(version)
    else:
        version = str(int(version)+1)
    return version
print(updateVersion(version1))
print(updateVersion(version2))
print(updateVersion(version3))
Output:
275.1.2.4
280
275.2.3.11
First and foremost, please read about Floating Point Arithmetic: Issues and Limitations.
Maybe that was the reason for your question; it is not clear.
However, I suggest saving each part as an integer, e.g.
main_version_number = 1151
minor_version_number = 1
sub_version_number = 0
You could maybe have a data structure with those fields (a Version class maybe?) with appropriate methods.
Do not rely on floating point arithmetic.
First off, the code you outline would most certainly give a syntax error.
A number of the form 2 is an integer; 2.2, a floating point; but a 2.2.2, meaningless.
You are looking for tuples here. For instance,
>>> version3 = (275,1,2,3)
Then you would get
>>> version3
(275, 1, 2, 3)
To dirty-update only the last element of such a tuple, you could do
>>> version3 = version3[:-1] + (version3[-1] + 1,)
>>> version3
(275, 1, 2, 4)
The reason I call this dirty updating is that it does not carry over into the next more significant element.
Here is a relatively simple script that does carry, treating every element as a single decimal digit (0 to 9). Assuming your version number is stored as a tuple called version, try the following:
new_version = list(version)
for pos in range(len(new_version) - 1, -1, -1):
    new_version[pos] = (new_version[pos] + 1) % 10
    if new_version[pos] != 0:
        break  # no carry needed, stop here
else:
    # every element wrapped around, so prepend a new leading 1
    new_version.insert(0, 1)
new_version = tuple(new_version)
Alternatively, take a look at bumpversion, a tool that lets you take care of version-numbering within your project, with git integration.
The variables 'version2' and 'version3' will result in a syntax error, because Python does not know of any number type that has several points in its value. In essence, you are trying to use types in a way they are not meant to be used. More specifically, the floating point number is not suitable for your goals. As the name suggests, a floating point number contains only one point, and that point can be placed anywhere between its digits (it floats).
My advice would be to create your own type/class. This would enable you to store the version number in a way that allows for easy modification of its values and better separation of concerns in your code (i.e. that each part of your code is only concerned with one thing).
Example
class VersionNumber:
    """Represents a version number that can contain parts (major, minor, etc)."""
    def __init__(self, *argv):
        """This is the constructor, i.e. a function that is called when you create a new VersionNumber.
        The '*argv' allows the user of this class to give a variable amount of arguments. This is why
        you can have a version number with only 1 number, and one with 4. The '*argv' is iterable."""
        #Create a class specific field, that stores all the version number parts in a list.
        self.Parts = []
        #Fill it with the supplied arguments.
        for part in argv:
            self.Parts.append(part)
    def __repr__(self):
        """This function is called when the VersionNumber needs to be displayed in the console"""
        return str(self)
    def __str__(self):
        """This function is called when the VersionNumber is parsed to a string"""
        return '.'.join(map(str,self.Parts))
    def incrementVersion(self, position, incrementAmount):
        """This function allows you to increment the version number. It does this by adjusting the list
        we have set in the constructor."""
        self.Parts[position] += incrementAmount
version1 = VersionNumber(1, 23, 45, 0)
print(version1)
#Position -1, takes the last (most right) version number part from the version number.
version1.incrementVersion(-1, 1)
print(version1)
version2 = VersionNumber(346)
print(version2)
version2.incrementVersion(-1, 2)
print(version2)
I'm learning how to write the code for DES encryption in Python. I came across this code on Github (link: https://github.com/RobinDavid/pydes/blob/master/pydes.py) but I'm not able to understand a part of the code. (See line 123 in the Github code, also given below:)
def binvalue(val, bitsize): #Return the binary value as a string of the given size
    binval = bin(val)[2:] if isinstance(val, int) else bin(ord(val))[2:] # this is line 124 I'm not getting
    if len(binval) > bitsize:
        raise "binary value larger than the expected size"
    while len(binval) < bitsize:
        binval = "0"+binval #Add as many 0 as needed to get the wanted size
    return binval
I understand what the function does (as written: #Return the binary value as a string of the given size) but I don't understand how it does it, I don't understand line 124. Thanks for answering.
binval = bin(val)[2:] if isinstance(val, int) else bin(ord(val))[2:]
This line is a conditional (ternary) expression: it returns the binary representation of val if val is an integer; otherwise it does the same on the character code of val, obtained with ord.
This is one way (among others) to be compatible with both Python 2 and Python 3.
In Python 3, val is an integer when it comes from a bytes object, whereas in Python 2 it is a one-character string taken from a str, because Python 2 makes no distinction between binary data and text.
In a nutshell, this is a portable way of converting a byte/character to its binary representation as a string.
Note that the padding loop
while len(binval) < bitsize:
    binval = "0"+binval
could be replaced by binval = binval.zfill(bitsize).
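Putting the points above together, a compact Python 3 variant might look like the sketch below. It is not the code from the pydes repository: the use of format() and of ValueError (instead of raising a plain string, which Python 3 rejects) are choices made for this example.

```python
def binvalue(val, bitsize):
    """Return val as a binary string left-padded with zeros to bitsize characters."""
    # An int (e.g. one element of a bytes object) is used directly;
    # a one-character string is converted to its code point first.
    number = val if isinstance(val, int) else ord(val)
    binval = format(number, "b").zfill(bitsize)
    if len(binval) > bitsize:
        raise ValueError("binary value larger than the expected size")
    return binval


print(binvalue(5, 8))        # 00000101
print(binvalue("A", 8))      # 01000001
print(binvalue(b"A"[0], 8))  # 01000001
```

The last two calls show the portability point: indexing a bytes object gives the integer 65, while the str "A" goes through ord, and both end up with the same bit string.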
I have a script that takes a file and, depending on its size (in the following code, less than 416 bytes), generates a mask through a series of hash operations (the keygen function below); the mask is then XORed with the input file (in the cipher function below). I want the keys to be generated on demand for better memory efficiency, but when I use yield instead of return in keygen, my cipher function raises this error:
CD = bytearray((x ^ y for (x, y) in zip(file, key)))
TypeError: unsupported operand type(s) for ^: 'int' and 'bytearray'
Here is the code:
from hashlib import md5
def keygen(f, pk): #takes file f as input (here data size is less than 4126bytes)
    ck=bytearray(b'')
    l=len(f)
    if l <= 28*16:
        for i in pk:
            a=md5(i.encode())
            ck += a.digest()
    yield ck
The following function does the encryption:
def cipher(file, key):
    out=bytearray(b'')
    out = bytearray((x ^ y for (x, y) in zip(file, key)))
    return out
At the end of my script I have:
if __name__=='__main__':
    file = bytearray(open('C:\\code\\Test.txt', 'rb').read())
    pk = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    key = keygen(file, pk)
    output = cipher(file, key)
    final=open('out.data', 'wb')
    final.write(output)
    final.close()
The whole process works fine if I use return in the keygen function instead of yield, but fails otherwise.
I want the keys to be generated on demand inside the cipher function, according to the data size, but keygen does not seem to be compatible with it when I use yield instead of return.
I read the file as a bytearray and defined ck in keygen as a bytearray too. I also tried yield bytearray(ck) in keygen, but that does not work either.
What am I missing? How can I generate the keys on demand according to the data size?
Long story short:
def keygen(f, pk): #takes file f as input (here data size is less than 4126bytes)
    l=len(f)
    if l <= 28*16:
        for i in pk:
            a=md5(i.encode())
            yield a.digest()
from hashlib import md5
def keygen(f, pk):
    if len(f) > len(pk)*16:
        pk = ''
    return (ord(ch) for letter in pk for ch in md5(letter).digest())
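For Python 3, the underlying issue is that zip(file, key) pairs each file byte (an int) with whatever the generator yields, so the generator has to yield single integers rather than a whole bytearray or digest. The following is a minimal sketch under that reading of the question; the size check mirrors the second snippet above, and the sample data is made up.

```python
from hashlib import md5


def keygen(data, pk):
    """Lazily yield key bytes as ints: one 16-byte MD5 digest per character of pk."""
    if len(data) > len(pk) * 16:
        return  # not enough key material for this input: yield nothing
    for letter in pk:
        # Iterating over a bytes digest in Python 3 yields ints,
        # and each digest is only computed when zip() asks for it.
        yield from md5(letter.encode()).digest()


def cipher(data, key):
    """XOR each data byte with the next key byte; zip stops at the shorter input."""
    return bytearray(x ^ y for x, y in zip(data, key))


data = bytearray(b"some short test message")
pk = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
encrypted = cipher(data, keygen(data, pk))
decrypted = cipher(encrypted, keygen(data, pk))
print(decrypted == data)  # True
```

Because zip stops as soon as the data runs out, only as many digests are computed as the input actually needs.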
I'm using an old version of python on an embedded platform ( Python 1.5.2+ on Telit platform ). The problem that I have is my function for converting a string to hex. It is very slow. Here is the function:
def StringToHexString(s):
    strHex=''
    for c in s:
        strHex = strHex + hexLoookup[ord(c)]
    return strHex
hexLookup is a lookup table (a python list) containing all the hex representation of each character.
I am willing to try everything (a more compact function, some language tricks I don't know about). To be more clear here are the benchmarks (resolution is 1 second on that platform):
N is the number of input characters to be converted to hex and the time is in seconds.
N | Time (seconds)
50 | 1
150 | 3
300 | 4
500 | 8
1000 | 15
1500 | 23
2000 | 31
Yes, I know, it is very slow... but if I could gain something like 1 or 2 seconds it would be a progress.
So any solution is welcomed, especially from people who know about python performance.
Thanks,
Iulian
PS1: (after testing the suggestions offered - keeping the ord call):
def StringToHexString(s):
    hexList=[]
    hexListAppend=hexList.append
    for c in s:
        hexListAppend(hexLoookup[ord(c)])
    return ''.join(hexList)
With this function I obtained the following times: 1/2/3/5/12/19/27 (which is clearly better)
PS2 (can't explain but it's blazingly fast) A BIG thank you Sven Marnach for the idea !!!:
def StringToHexString(s):
    return ''.join( map(lambda param:hexLoookup[param], map(ord,s) ) )
Times:1/1/2/3/6/10/12
Any other ideas/explanations are welcome!
Make your hexLoookup a dictionary indexed by the characters themselves, so you don't have to call ord each time.
Also, don't concatenate to build strings – that used to be slow. Use join on a list instead.
from string import join
def StringToHexString(s):
    strHex = []
    for c in s:
        strHex.append(hexLoookup[c])
    return join(strHex, '')
Building on Petr Viktorin's answer, you could further improve the performance by avoiding global variable and attribute look-ups in favour of local variable look-ups. Local variables are optimized to avoid a dictionary look-up on each access. (They haven't always been, but I just double-checked that this optimization was already in place in 1.5.2, released in 1999.)
from string import join
def StringToHexString(s):
    strHex = []
    strHexappend = strHex.append
    _hexLoookup = hexLoookup
    for c in s:
        strHexappend(_hexLoookup[c])
    return join(strHex, '')
Constantly reassigning and adding strings together using the + operator is very slow. I guess that Python 1.5.2 isn't yet optimizing for this. So using string.join() would be preferable.
Try
import string
def StringToHexString(s):
    listhex = []
    for c in s:
        listhex.append(hexLookup[ord(c)])
    return string.join(listhex, '')
and see if that is any faster.
Try:
from string import join
def StringToHexString(s):
    charlist = []
    for c in s:
        charlist.append(hexLoookup[ord(c)])
    return join(charlist, '')
Each string addition takes time proportional to the length of the string. join also takes time proportional to the length of the entire string, but it only has to do that once.
You could also make hexLookup a dict mapping characters to hex values, so you don't have to call ord for every character. It's a micro-optimization, so probably won't be significant.
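For illustration, here is how such a character-keyed table could be built and used. This sketch is written for current Python 3 (the dict comprehension and str.join method do not exist in 1.5.2), and the two-digit uppercase format is an assumption, since the question does not show what the table contains. The names hex_lookup and string_to_hex_string are made up for this example.

```python
# Hypothetical table: maps each of the 256 single-byte characters
# to a two-digit uppercase hex string (the format is an assumption).
hex_lookup = {chr(i): "%02X" % i for i in range(256)}


def string_to_hex_string(s):
    """Convert s to hex with one dict lookup per character and a single join."""
    return "".join([hex_lookup[c] for c in s])


print(string_to_hex_string("AZ"))  # 415A
```

The table is built once up front, so the per-character work inside the function is a single dictionary lookup with no call to ord.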
def StringToHexString(s):
    return ''.join( map(lambda param:hexLoookup[param], map(ord,s) ) )
Seems like this is the fastest! Thank you Sven Marnach!