How to skip combinations to guess string (python)? - python

I have a 20 byte string, from which I take 5 arrays of four bytes (first four bytes = array 1, etc.).
I have to convert each array into decimals through a specific function.
This way, I'll end up with 5 ints.
I have to add these 5 ints and reach a specific number (4863101420).
Do you have any idea of how to guess one possible combination of the 20 original chars that - going through the process of dividing into arrays and decoding into ints - will add up to 4863101420, without going through something like itertools.combinations_with_replacement?
Since I have 20 chars and each is one of 94 possible characters (printable ASCII), it could take a while to brute-force a string that adds up to 4863101420.
Any insights?
The function I'm using to convert the char into int is:
def convertCharToDec(charInput):
    firstByte = format(ord(charInput[0]), "x")
    secondByte = format(ord(charInput[1]), "x")
    thirdByte = format(ord(charInput[2]), "x")
    fourthByte = format(ord(charInput[3]), "x")
    convertedHex = firstByte + secondByte + thirdByte + fourthByte
    return int(convertedHex, 16)

Do some maths and you will find out that the final integer is equal to A0×16^6 + A1×16^4 + A2×16^2 + A3, where Ai = sum([ord(the_whole_string[i + j]) for j in range(0, 20, 4)]). Each Ai is a sum of 5 bytes, hence 32×5 ≤ Ai ≤ 126×5.
Therefore, you could first try the values of Ai's, then for each of them try the combinations of the 5 bytes. This should greatly reduce the time of computation.
This works only if format(ord(charInput[i]), "x") returns a string of length 2 for any i.
Edit: Here is the math.
Example: 7]f`8^ga:_hb;`id<aje (this is one of the solutions)
7]f` -> int(375d6660, 16) = 37*16^6 + 5d*16^4 + 66*16^2 + 60
8^ga -> int(385e6761, 16) = 38*16^6 + 5e*16^4 + 67*16^2 + 61
:_hb -> int(3a5f6862, 16) = 3a*16^6 + 5f*16^4 + 68*16^2 + 62
;`id -> int(3b606964, 16) = 3b*16^6 + 60*16^4 + 69*16^2 + 64
<aje -> int(3c616a65, 16) = 3c*16^6 + 61*16^4 + 6a*16^2 + 65
___________________________________________________________________
A0*16^6 + A1*16^4 + A2*16^2 + A3
Where A0 = 37 + 38 + 3a + 3b + 3c, A1 = 5d + 5e + 5f + 60 + 61 etc.
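The search described above can be sketched as follows. This is only a sketch, not the answerer's code: the helper names (convert_chunk, find_column_sums, split_sum, find_string) are made up for illustration, and it assumes the 94 printable characters are codes 33 to 126 (no space), so every byte formats to exactly two hex digits.

```python
TARGET = 4863101420
LO, HI = 33, 126                 # assumed character range: printable ASCII without the space
A_MIN, A_MAX = 5 * LO, 5 * HI    # bounds for a sum of 5 bytes
WEIGHTS = (16**6, 16**4, 16**2, 1)


def convert_chunk(chunk):
    """Same computation as convertCharToDec from the question."""
    return int("".join(format(ord(c), "x") for c in chunk), 16)


def find_column_sums(target, weights=WEIGHTS):
    """Return [A0, A1, A2, A3] with sum(Ai * weight_i) == target, or None."""
    if not weights:
        return [] if target == 0 else None
    weight, rest = weights[0], weights[1:]
    lo_rest = A_MIN * sum(rest)  # smallest value the remaining terms can reach
    hi_rest = A_MAX * sum(rest)  # largest value the remaining terms can reach
    for a in range(A_MIN, A_MAX + 1):
        remaining = target - a * weight
        if remaining < lo_rest:
            break                # larger values of a only make it worse
        if remaining > hi_rest:
            continue             # a is still too small
        tail = find_column_sums(remaining, rest)
        if tail is not None:
            return [a] + tail
    return None


def split_sum(total, parts=5):
    """Split total into `parts` near-equal integers (each stays within LO..HI)."""
    base, extra = divmod(total, parts)
    return [base + 1] * extra + [base] * (parts - extra)


def find_string(target=TARGET):
    sums = find_column_sums(target)
    if sums is None:
        return None
    columns = [split_sum(a) for a in sums]
    # Chunk k takes the k-th byte of every column, in column order.
    return "".join(chr(columns[i][k]) for k in range(5) for i in range(4))


solution = find_string()
total = sum(convert_chunk(solution[i:i + 4]) for i in range(0, 20, 4))
print(repr(solution), total == TARGET)
```

Because each Ai can exceed 255, many different (A0, A1, A2, A3) tuples reach the same target; the sketch simply returns the first one it finds, and the pruning on lo_rest and hi_rest keeps the loop to a few hundred iterations per level.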

Related

Get the Excel column label (A, B, ..., Z, AA, ..., AZ, BA, ..., ZZ, AAA, AAB, ...)

Given the letter(s) of an Excel column header I need to output the column number.
It goes A-Z, then AA-AZ then BA-BZ and so on.
I want to go through it like it's base 26, I just don't know how to implement that.
It works fine for simple ones like AA because 26^1 + 26^0 = 26 + 1 = 27.
But with something like ZA, if I do 26^26 (Z is the 26th letter), the output is obviously too large. What am I missing?
If we decode "A" as 0, "B" as 1, ... then "Z" is 25 and "AA" is 26.
So it is not a pure 26-base encoding, as then a prefixed "A" would have no influence on the value, and "AAAB" would have to be the same as "B", just like in the decimal system 0001 is equal to 1. But this is not the case here.
The value of "AA" is 1*26^1 + 0, and "ZA" is 26*26^1 + 0.
We can generalise and say that "A" should be valued 1, "B" 2, etc. (with the exception of a single-letter encoding). So in "AAA", the rightmost "A" represents a coefficient of 0, while the other "A"s represent ones: 1*26^2 + 1*26^1 + 0
This leads to the following code:
def decode(code):
    val = 0
    for ch in code: # base-26 decoding "plus 1"
        val = val * 26 + ord(ch) - ord("A") + 1
    return val - 1
Of course, if we want the column numbers to start with 1 instead of 0, then just replace that final statement with:
    return val
sum of powers
You can sum the multiples of the powers of 26:
def xl2int(s):
    s = s.strip().upper()
    return sum((ord(c)-ord('A')+1)*26**i
               for i,c in enumerate(reversed(s)))
xl2int('A')
# 1
xl2int('Z')
# 26
xl2int('AA')
# 27
xl2int('ZZ')
# 702
xl2int('AAA')
# 703
int builtin
You can use a string translation table and the int builtin with the base parameter.
As you have a broken base you need to add 26**(n-1)+26**(n-2)+...+26**0 for an input of length n, which you can obtain with int('11...1', base=26), where there are as many 1s as the length of the input string.
from string import ascii_uppercase, digits
t = str.maketrans(dict(zip(ascii_uppercase, digits+ascii_uppercase)))
def xl2int(s):
    s = s.strip().upper().translate(t)
    return int(s, base=26)+int('1'*len(s), base=26)
xl2int('A')
# 1
xl2int('Z')
# 26
xl2int('AA')
# 27
xl2int('ZZ')
# 702
xl2int('AAA')
# 703
How the translation works
It shifts each character so that A -> 0, B -> 1... J -> 9, K -> A... Z -> P. Then it converts the result to an integer using int. However, the obtained number is incorrect, as we are missing 26**x for each digit position in the number, so we add as many powers of 26 as there are digits in the input.
Another way to do it, written in VBA:
Function nColumn(sColumn As String) As Integer
    ' Return column number for a given column letter.
    ' 676 = 26^2
    ' 64 = Asc("A") - 1
    nColumn = _
        (IIf(Len(sColumn) < 3, 0, Asc(Left(sColumn, 1)) - 64) * 676) + _
        (IIf(Len(sColumn) = 1, 0, Asc(Left(Right(sColumn, 2), 1)) - 64) * 26) + _
        (Asc(Right(sColumn, 1)) - 64)
End Function
Or you can do it directly in the worksheet:
=(if(len(<clm>) < 3, 0, code(left( <clm> , 1)) - 64) * 676) +
(if(len(<clm>) = 1, 0, code(left(right(<clm>, 2), 1)) - 64) * 26) +
(code( right(<clm> , 1)) - 64)
I've also posted the inverse operation done similarly.

Integer/Round problem in getting the sum of 3 different numbers

import math  # needed for math.floor below
a = 1541
b = 1575
c = 1512
# I want to scale these numbers so that they sum to 128
total = a + b + c
rounded_a = round(a*128/total) # equals 43
rounded_b = round(b*128/total) # equals 44
rounded_c = round(c*128/total) # equals 42
total_of_rounded = rounded_a + rounded_b + rounded_c # equals 129 NOT 128
# I tried the floor
floor_a = math.floor(a*128/total) # equals 42
floor_b = math.floor(b*128/total) # equals 43
floor_c = math.floor(c*128/total) # equals 41
total_of_floor = floor_a + floor_b + floor_c # equals 126 NOT 128
# The exact values
# a: 42.62057044
# b: 43.56093345
# c: 41.81849611
The question is, how can I reach a total of 128?
Note: I have to stay with integers, not floating-point numbers.
Note 2: I could write a correction function that adds +1 to the total, but that doesn't seem right to me.
A possibility: round a and b down, then add the missing parts to c.
a = 1541
b = 1575
c = 1512
total = a + b + c # 4628
ra = a * 128 // total
rb = b * 128 // total
rc = (c * 128 + (a * 128)%total + (b*128)%total) // total
print(ra,rb,rc)
# (42, 43, 43)
print(ra+rb+rc)
# 128
This is the way for it:
a = 1541
b = 1575
c = 1512
# I want to ratio the sum of these numbers to 128
total = a + b + c
total_of_round = round((a*128/total)+(b*128/total)+(c*128/total))
print (total_of_round)
Output:
128
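A more general option, for any number of values, is the largest-remainder method: floor every share, then hand the leftover units to the entries with the largest remainders. A minimal sketch using integer arithmetic only (the function name scale_to_total is made up for illustration):

```python
def scale_to_total(values, target):
    """Scale non-negative integers so they sum exactly to target."""
    total = sum(values)
    # Floor of each exact share, and the remainder left over by the floor.
    shares = [v * target // total for v in values]
    remainders = [v * target % total for v in values]
    leftover = target - sum(shares)
    # Give one extra unit to the entries with the largest remainders.
    by_remainder = sorted(range(len(values)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


print(scale_to_total([1541, 1575, 1512], 128))
# [43, 43, 42]
```

Here a and c are rounded up because their fractional parts (0.62 and 0.82) are larger than that of b (0.56), so no single value absorbs the whole correction.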

Python string function parameter replacing

I would like to ask for your help. I have started learning Python, and there is a task that I cannot figure out how to complete. So here it is.
We have an input.txt file containing the following 4 rows:
f(x, 3*y) * 54 = 64 / (7 * x) + f(2*x, y-6)
x + f(21*y, x - 32/y) + 4 = f(21 ,y)
86 - f(7 + x*10, y+ 232) = f(12*x-4, 2*y-61)*32 + f(2, x)
65 - 3* y = f(2*y/33 , x + 5)
The task is to change the "f" function and its 2 parameters into a division. There can be any number of spaces between the two parameters. For example, f(2, 5) is the same as f(2 , 5) and should become (2 / 5), with exactly one space before and after the division sign once the code has run. Also, if one of the parameters is a multiplication or a division, the parameter must go into brackets. For example, f(3, 5*7) should become (3 / (5*7)). There can be any number of function calls in one row. So the output should look like this:
(x / (3*y)) * 54 = 64 / (7 * x) + ((2*x) / (y-6))
x + ((21*y) / (x - 32/y)) + 4 = (21 / y)
86 - ((7 + x*10) / (y+ 232)) = ((12*x-4) / (2*y-61))*32 + (2 / x)
65 - 3* y = ((2*y/33) / (x + 5))
I would be very happy if anyone could help me.
Thank you in advance,
David
Using re:
In [84]: ss=r'f(x, 3*y) * 54 = 64 / (7 * x) + f(2*x, y-6)'
In [85]: re.sub(r'(f\()(.*?),(.*?)(\))', lambda m: '((%s) / (%s))'%(m.group(2), m.group(3)), ss)
Out[85]: '((x) / ( 3*y)) * 54 = 64 / (7 * x) + ((2*x) / ( y-6))'
Explanation:
re.sub(pattern, repl, string, count=0, flags=0) returns the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.
The () are used to catch the groups;
*? is a non-greedy qualifier, which matches as little text as possible.
Here are some places to start:
You can check if one string is in another string using string1 in string2 (e.g., 'bcd' in 'abcdefg' -> True).
You can identify the insides of f() calls by finding the locations of 'f(' in a string, then adding 1 to a counter (that starts at 1) when you find a '(' and subtracting 1 when you find a ')'. When you hit 0 you're done.
You can break a string into a list around a matching string with string1.split(string2) (e.g., 'a,b'.split(',') -> ['a', 'b']).
You can format a string easily using the .format method (e.g., '({0} % {1})'.format(string1, string2)).
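The hints above can be combined into a small scanner. This is just a sketch with made-up names (replace_f_calls, wrap); it assumes every 'f(' in the input really starts a well-formed two-parameter call, that calls are not nested inside each other, and, judging from the expected output, that a parameter gets brackets whenever it contains any operator.

```python
def wrap(param):
    """Put a parameter in brackets if it contains an operator."""
    if any(op in param for op in "+-*/"):
        return "(" + param + ")"
    return param


def replace_f_calls(line):
    out = []
    i = 0
    while i < len(line):
        if line.startswith("f(", i):
            depth = 1        # we are inside the opening bracket of f(
            comma = None     # position of the comma separating the parameters
            j = i + 2
            while j < len(line) and depth > 0:
                ch = line[j]
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                elif ch == "," and depth == 1 and comma is None:
                    comma = j
                j += 1
            # j is now one past the closing bracket of this call
            left = line[i + 2:comma].strip()
            right = line[comma + 1:j - 1].strip()
            out.append("({} / {})".format(wrap(left), wrap(right)))
            i = j
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


print(replace_f_calls("f(x, 3*y) * 54 = 64 / (7 * x) + f(2*x, y-6)"))
# (x / (3*y)) * 54 = 64 / (7 * x) + ((2*x) / (y-6))
```

Unlike the non-greedy regular expression, the depth counter keeps working when a parameter itself contains brackets, because the comma and the closing bracket are only accepted at depth 1 and 0 respectively.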

Am I missing something or is this Microsoft algorithm for calculating the excel column characters incorrect?

I'm trying to write a function in Python that takes in a column number and outputs the corresponding Excel column code (for example: 5 -> "E", 27 -> "AA"). I tried implementing the algorithm given here: http://support.microsoft.com/kb/833402, which is the following visual basic:
Function ConvertToLetter(iCol As Integer) As String
    Dim iAlpha As Integer
    Dim iRemainder As Integer
    iAlpha = Int(iCol / 27)
    iRemainder = iCol - (iAlpha * 26)
    If iAlpha > 0 Then
        ConvertToLetter = Chr(iAlpha + 64)
    End If
    If iRemainder > 0 Then
        ConvertToLetter = ConvertToLetter & Chr(iRemainder + 64)
    End If
End Function
My python version:
def excelcolumn(colnum):
    alpha = colnum // 27
    remainder = colnum - (alpha*26)
    out = ""
    if alpha > 0:
        out = chr(alpha+64)
    if remainder > 0:
        out = out + chr(remainder+64)
    return out
This works fine until column number 53 which results in "A[", as alpha = 53 // 27 == 1 and thus remainder = 53 - 1*26 == 27 meaning the second character chr(64+27) will be "[". Am I missing something? My VBA skills are quite lackluster so that might be the issue.
edit: I am using Python 3.3.1
The Microsoft formula is incorrect. I'll bet they never tested it beyond 53. When I tested it myself in Excel it gave the same incorrect answer that yours did.
Here's how I'd do it:
def excelcolumn(colnum):
    alpha, remainder = colnum // 26, colnum % 26
    out = "" if alpha == 0 else chr(alpha - 1 + ord('A'))
    out += chr(remainder + ord('A'))
    return out
Note that this assumes a 0-based column number while the VBA code assumes 1-based.
If you need to go beyond column 701 (ZZ in 0-based numbering) you need something slightly different, as noted in the comments:
def excelcolumn(colnum):
    if colnum < 26:
        return chr(colnum + ord('A'))
    return excelcolumn(colnum // 26 - 1) + chr(colnum % 26 + ord('A'))
Here is one way to do it:
def xl_col_to_name(col_num):
    col_str = ''
    while col_num:
        remainder = col_num % 26
        if remainder == 0:
            remainder = 26
        # Convert the remainder to a character.
        col_letter = chr(ord('A') + remainder - 1)
        # Accumulate the column letters, right to left.
        col_str = col_letter + col_str
        # Get the next order of magnitude.
        col_num = int((col_num - 1) / 26)
    return col_str
Which gives:
>>> xl_col_to_name(5)
'E'
>>> xl_col_to_name(27)
'AA'
>>> xl_col_to_name(256)
'IV'
>>> xl_col_to_name(1000)
'ALL'
This is taken from the utility functions in the XlsxWriter module.
I am going to answer your specific question:
... is this Microsoft algorithm for calculating the excel column characters incorrect?
YES.
Generally speaking, when you want to have the integer division (typically called DIV) of two numbers, and the remainder (typically called MOD), you should use the same value as the denominator. Thus, you should use either 26 or 27 in both places.
So, the algorithm is incorrect (and it is easy to see that with iCol=27, where iAlpha=1 and iRemainder=1, while it should be iRemainder=0).
In this particular case, the number should be 26. Since this gives you numbers starting at zero, you should probably add ascii("A") (=65), generically speaking, instead of 64. The double error made it work for some cases.
The (hardly acceptable) confusion may stem from the fact that from A to Z there are 26 columns and from A to ZZ there are 26*27 = 702 columns. The pattern does not continue with further factors of 27, though: from A to ZZZ there are 26 + 26^2 + 26^3 = 18278 columns, not 26*27*27.
Code that works for any column, and non-recursive:
def excelcolumn(colnum):
    if colnum < 1:
        raise ValueError("Index is too small")
    result = ""
    while True:
        if colnum > 26:
            colnum, r = divmod(colnum - 1, 26)
            result = chr(r + ord('A')) + result
        else:
            return chr(colnum + ord('A') - 1) + result
(taken from here).
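To check these claims, the KB algorithm can be compared against a divmod-based conversion and its inverse. This is only a sketch: kb_column is the Python port from the question, while column_name and column_number are names chosen here for illustration, both using 1-based column numbers.

```python
def kb_column(colnum):
    """Python port of the KB 833402 algorithm, as posted in the question."""
    alpha = colnum // 27
    remainder = colnum - alpha * 26
    out = ""
    if alpha > 0:
        out = chr(alpha + 64)
    if remainder > 0:
        out += chr(remainder + 64)
    return out


def column_name(colnum):
    """1-based column number -> letters (bijective base 26)."""
    name = ""
    while colnum > 0:
        colnum, r = divmod(colnum - 1, 26)
        name = chr(r + ord("A")) + name
    return name


def column_number(name):
    """Letters -> 1-based column number (inverse of column_name)."""
    number = 0
    for ch in name:
        number = number * 26 + ord(ch) - ord("A") + 1
    return number


# First column where the KB algorithm disagrees with the correct name.
first_bad = next(n for n in range(1, 1000) if kb_column(n) != column_name(n))
print(first_bad, kb_column(first_bad), column_name(first_bad))
# 53 A[ BA
```

The round trip column_number(column_name(n)) == n is a cheap sanity check for any implementation of this conversion.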

Problems with Smoothing graphs in Python

I have been trying to smooth a plot which is noisy due to the sampling rate I'm using and what it's counting. I've been using the help on here, mainly Plot smooth line with PyPlot (although I couldn't find the "spline" function and so am using UnivariateSpline instead).
However, whatever I do I keep getting errors: either the pyplot error that "x and y are not of the same length", or that scipy's UnivariateSpline has a value for w that is incorrect. I am not quite sure how to fix this (not really a Python person!). I've attached the code, although it's just the plotting bit at the end that is causing problems. Thanks
import os.path
import matplotlib.pyplot as plt
import scipy.interpolate as sci
import numpy as np
def main():
    jcc = "0050"
    dj = "005"
    l = "060"
    D = 20
    hT = 4 * D
    wT1 = 2 * D
    wT2 = 5 * D
    for jcm in ["025","030","035","040","045","050","055","060"]:
        characteristic = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000"
        fingertime1 = []
        fingertime2 = []
        stamp =[]
        finger=[]
        for x in range(0,2500,50):
            if x<10000:
                z=("00"+str(x))
            if x<1000:
                z=("000"+str(x))
            if x<100:
                z=("0000"+str(x))
            if x<10:
                z=("00000"+str(x))
            stamp.append(x)
            path = "LeadersOnly/Jcm" + jcm + "/Jcc" + jcc + "/dJ" + dj + "/lambda" + l + "/Seed000/profile_" + str(z) + ".txt"
            if os.path.exists(path):
                f = open(path, 'r')
                pr1,pr2=np.genfromtxt(path, delimiter='\t', unpack=True)
                p1=[]
                p2=[]
                h1=[]
                h2=[]
                a1=[]
                a2=[]
                finger1 = 0
                finger2 = 0
                for b in range(len(pr1)):
                    p1.append(pr1[b])
                    p2.append(pr2[b])
                for elem in range(len(pr1)-80):
                    h1.append((p1[elem + (2*D)]-0.5*(p1[elem]+p1[elem + (4*D)])))
                    h2.append((p2[elem + (2*D)]-0.5*(p2[elem]+p2[elem + (4*D)])))
                    if h1[elem] >= hT:
                        a1.append(1)
                    else:
                        a1.append(0)
                    if h2[elem]>=hT:
                        a2.append(1)
                    else:
                        a2.append(0)
                for elem in range(len(a1)-1):
                    if (a1[elem] - a1[elem + 1]) != 0:
                        finger1 = finger1 + 1
                finger1 = finger1 / 2
                for elem in range(len(a2)-1):
                    if (a2[elem] - a2[elem + 1]) != 0:
                        finger2 = finger2 + 1
                finger2 = finger2 / 2
                fingertime1.append(finger1)
                fingertime2.append(finger2)
                finger.append((finger1+finger2)/2)
        namegraph = jcm
        stampnew = np.linspace(stamp[0],stamp[-1],300)
        fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
        plt.plot(stampnew,fingernew,label=namegraph)
    plt.show()
main()
For information, the data input files are simply lists of integers (two lists separated by tabs, as the code suggests).
Here is one of the error codes that I get:
0-th dimension must be fixed to 50 but got 300
error Traceback (most recent call last)
/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in <module>()
116
117 if __name__ == '__main__':
--> 118 main()
119
120
/group/data/Cara/JCMMOTFingers/fingercount_jcm_smooth.py in main()
93 #print(len(stamp))
94 stampnew = np.linspace(stamp[0],stamp[-1],300)
---> 95 fingernew = sci.UnivariateSpline(stamp, finger, stampnew)
96 #print(len(stampnew))
97 #print(len(fingernew))
/usr/lib/python2.6/dist-packages/scipy/interpolate/fitpack2.pyc in __init__(self, x, y, w, bbox, k, s)
86 #_data == x,y,w,xb,xe,k,s,n,t,c,fp,fpint,nrdata,ier
87 data = dfitpack.fpcurf0(x,y,k,w=w,
---> 88 xb=bbox[0],xe=bbox[1],s=s)
89 if data[-1]==1:
90 # nest too small, setting to maximum bound
error: failed in converting 1st keyword `w' of dfitpack.fpcurf0 to C/Fortran array
Let's analyze your code a bit, starting from the for x in range(0, 2500, 50):
You define z as a string of 6 digits padded with 0s. You should really use some string formatting like z = "{0:06d}".format(x) or z = "%06d" % x instead of these multiple tests of yours.
At the end of your loop, stamp will have (2500//50)=50 elements.
You check for the existence of your file path, then open it and read it, but you never close it. A more Pythonic way is to do:
try:
    with open(path,"r") as f:
        do...
except IOError:
    do something else
With the with syntax, your file is automatically closed.
pr1 and pr2 are likely to be 1D arrays, right? You can really simplify the construction of your p1 and p2 lists as:
p1 = pr1.tolist()
p2 = pr2.tolist()
Your lists a1, a2 have the same size: you could combine your for elem in range(len(a..)-1) loops in a single one. You could also use the np.diff function.
At the end of the for x in range(...) loop, finger will have 50 elements minus the number of missing files. As you're not saying what to do in case of a missing file, your stamp and finger lists may not have the same number of elements, which will crash your scipy UnivariateSpline. An easy fix would be to update your stamp list only if the path file exists (that way, it always has the same number of elements as finger).
Your stampnew array has 300 elements, while your stamp and finger can have at most 50. That's a second problem: the third positional argument of UnivariateSpline is the weight array w, so stampnew is being passed as weights, and the weights must have the same size as the inputs.
You're eventually trying to plot fingernew vs stampnew. The problem is that fingernew is not an array, it's an instance of UnivariateSpline. You still need to calculate some actual points, for example with fingernew(stampnew), then use that in your plot function.
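The bookkeeping points above (zero-padded names, counting transitions, keeping stamp and finger the same length) can be sketched in plain Python. The helper names and the samples mapping are made up for illustration; a missing file is modelled as None, and the division is true division as in Python 3.

```python
def padded_stamp(x):
    """Six-digit, zero-padded stamp, replacing the chain of if tests."""
    return "{0:06d}".format(x)


def count_fingers(flags):
    """Number of 0/1 transitions in flags, halved as in the question."""
    changes = sum(1 for left, right in zip(flags, flags[1:]) if left != right)
    return changes / 2


def collect(samples):
    """Build stamp and finger in lockstep, skipping missing profiles.

    samples maps a stamp to its list of 0/1 flags, or to None when the
    profile file does not exist (a stand-in for the os.path.exists test).
    """
    stamp, finger = [], []
    for x in sorted(samples):
        flags = samples[x]
        if flags is None:
            continue             # skip the stamp as well, so the lengths match
        stamp.append(x)
        finger.append(count_fingers(flags))
    return stamp, finger


stamp, finger = collect({0: [0, 1, 0], 50: None, 100: [0, 0]})
print(padded_stamp(50), stamp, finger)
# 000050 [0, 100] [1.0, 0.0]
```

With stamp and finger of equal length, the spline can then be built without a weight argument and evaluated on the fine grid, e.g. sci.UnivariateSpline(stamp, finger)(stampnew), and that array is what gets passed to plt.plot.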
