Encode array of integers into unique int - python

I have a fixed amount of int arrays of the form:
[a,b,c,d,e]
for example:
[2,2,1,1,2]
where a and b can be ints from 0 to 2, c and d can be 0 or 1, and e can be ints from 0 to 2.
Therefore there are 3 * 3 * 2 * 2 * 3 = 108 possible arrays of this form.
I would like to assign to each of those arrays a unique integer code from 0 to 107.
I am stuck. I thought of adding up the numbers in each array, but two arrays such as:
[0,0,0,0,1] and [1,0,0,0,0]
would both add to 1.
Any suggestion?
Thank you.

You could use np.ravel_multi_index:
>>> np.ravel_multi_index([1, 2, 0, 1, 2], (3, 3, 2, 2, 3))
65
Validation:
>>> {np.ravel_multi_index(j, (3, 3, 2, 2, 3)) for j in itertools.product(*map(range, (3,3,2,2,3)))} == set(range(np.prod((3, 3, 2, 2, 3))))
True
Going back the other way:
>>> np.unravel_index(65, (3, 3, 2, 2, 3))  # pass the shape positionally; the old dims= keyword is gone from current NumPy
(1, 2, 0, 1, 2)

Just another way, similar to Horner's method for polynomials:
>>> array = [1, 2, 0, 1, 2]
>>> ranges = (3, 3, 2, 2, 3)
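>>> from functools import reduce  # a builtin on Python 2; the import is required on Python 3
>>> reduce(lambda i, ar: i * ar[1] + ar[0], zip(array, ranges), 0)  # ar is an (a, r) pair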
65
Unrolled that's ((((0 * 3 + 1) * 3 + 2) * 2 + 0) * 2 + 1) * 3 + 2 = 65.

This is a little like converting digits from a varying-size number base to a standard integer. In base-10, you could have five digits, each from 0 to 9, and then you would convert them to a single integer via i = a*10000 + b*1000 + c*100 + d*10 + e*1.
Equivalently, for the decimal conversion, you could write i = np.dot([a, b, c, d, e], bases), where bases = [10*10*10*10, 10*10*10, 10*10, 10, 1].
You can do the same thing with your bases, except that your positions introduce multipliers of [3, 3, 2, 2, 3] instead of [10, 10, 10, 10, 10]. So you could set bases = [3*2*2*3, 2*2*3, 2*3, 3, 1] (=[36, 12, 6, 3, 1]) and then use i = np.dot([a, b, c, d, e], bases). Note that this will always give answers in the range of 0 to 107 if a, b, c, d, and e fall in the ranges you specified.
To convert i back into a list of digits, you could use something like this:
digits = []
remainder = i
for base in bases:
    digit, remainder = divmod(remainder, base)
    digits.append(digit)
On the other hand, to keep your life simple, you are probably better off using Paul Panzer's answer, which pretty much does the same thing. (I never thought of an n-digit number as the coordinates of a cell in an n-dimensional grid before, but it turns out they're mathematically equivalent. And np.ravel_multi_index is an easy way to assign a serial number to each cell.)
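As a rough pure-Python sketch of the mixed-radix idea described above (the names RADICES, encode and decode are made up for illustration, and the radices are the ones from the question), the two directions could look like this:

```python
from itertools import product

# Hypothetical helper names; the radices (3, 3, 2, 2, 3) come from the question.
RADICES = (3, 3, 2, 2, 3)

def encode(digits, radices=RADICES):
    """Fold the digits into one integer, most significant digit first."""
    code = 0
    for digit, radix in zip(digits, radices):
        if not 0 <= digit < radix:
            raise ValueError(f"digit {digit} out of range for radix {radix}")
        code = code * radix + digit
    return code

def decode(code, radices=RADICES):
    """Peel the digits off again, least significant digit first."""
    digits = []
    for radix in reversed(radices):
        code, digit = divmod(code, radix)
        digits.append(digit)
    return digits[::-1]

print(encode([1, 2, 0, 1, 2]))  # 65
print(decode(65))               # [1, 2, 0, 1, 2]
```

Encoding multiplies by each radix in turn, which is the same arithmetic as the dot product with [36, 12, 6, 3, 1]; decoding simply undoes it from the last position backwards.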

This data is small enough that you may simply enumerate them:
>>> L = [[a,b,c,d,e] for a in range(3) for b in range(3) for c in range(2) for d in range(2) for e in range(3)]
>>> L[0]
[0, 0, 0, 0, 0]
>>> L[107]
[2, 2, 1, 1, 2]
If you need to go the other way (from the array to the integer) make a lookup dict for it so that you will get O(1) instead of O(n):
>>> lookup = {tuple(x): i for i, x in enumerate(L)}
>>> lookup[1,1,1,1,1]
58

Taking the dot product of each vector with descending powers of 10, as follows:
In [210]: a1
Out[210]: array([2, 2, 1, 1, 2])
In [211]: a2
Out[211]: array([1, 0, 1, 1, 0])
In [212]: a1.dot(np.power(10, np.arange(5,0,-1)))
Out[212]: 221120
In [213]: a2.dot(np.power(10, np.arange(5,0,-1)))
Out[213]: 101100
should produce 108 unique numbers. They are not contiguous, so sort them and use each number's position in the sorted list as its code.
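One way to turn those decimal codes into indices from 0 to 107, sketched here in plain Python under the assumption that all 108 combinations are enumerated up front (decimal_code and index_of are illustrative names, not part of the answer above):

```python
from itertools import product

ranges = (3, 3, 2, 2, 3)
# Same weights as np.power(10, np.arange(5, 0, -1)): [100000, 10000, 1000, 100, 10]
weights = [10 ** k for k in range(5, 0, -1)]

def decimal_code(arr):
    """Dot product of the array with the powers of 10."""
    return sum(d * w for d, w in zip(arr, weights))

# Sort the 108 distinct codes and remember each code's position.
codes = sorted(decimal_code(arr) for arr in product(*map(range, ranges)))
index_of = {code: i for i, code in enumerate(codes)}

print(len(index_of))                            # 108
print(index_of[decimal_code([2, 2, 1, 1, 2])])  # 107
```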

If the array length is not very large, you can compute the weights first, then use a simple formula to get the ID.
The code will look like this:
# Test cases
test1 = [2, 2, 1, 1, 2]
test2 = [0, 2, 1, 1, 2]
test3 = [0, 0, 0, 0, 2]
def getUniqueID(target):
    # Calculate the weights first:
    # when index == 0, weight[0] = 1;
    # when index > 0, weight[index] = weight[index-1] * (number of possible values at the previous index).
    weight = [1, 3, 9, 18, 36]
    return target[0]*weight[0] + target[1]*weight[1] + target[2]*weight[2] + target[3]*weight[3] + target[4]*weight[4]
print('Test Case 1:', getUniqueID(test1))
print('Test Case 2:', getUniqueID(test2))
print('Test Case 3:', getUniqueID(test3))
# Output
# Test Case 1: 107
# Test Case 2: 105
# Test Case 3: 72

Related

Numpy: Function to take arrays a and b and return array c with elements 0:b[0] with value a[0], values b[0]:b[1] with value a[1], and so on

Say I have two arrays:
a = np.asarray([0,1,2])
b = np.asarray([3,7,10])
Is there a fast way to create:
c = np.asarray([0,0,0,1,1,1,1,2,2,2])
# index 3 7 10
This can be done using a for loop but I wonder if there is a fast internal numpy function that achieves the same thing.
You can use diff to get the successive differences, r_ to add the first b value and repeat to duplicate the values:
a = np.asarray([0, 1, 2])
b = np.asarray([3, 7, 10])
c = np.repeat(a, np.r_[b[0], np.diff(b)])
Output: array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
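Two variations on the same idea, offered as a sketch rather than a benchmark: np.diff accepts a prepend argument (assuming NumPy 1.16 or newer), and np.searchsorted can map each output position to the segment it falls in. The names c_repeat and c_search are only for illustration.

```python
import numpy as np

a = np.asarray([0, 1, 2])
b = np.asarray([3, 7, 10])

# prepend=0 makes diff return the run lengths [3, 4, 3] directly.
c_repeat = np.repeat(a, np.diff(b, prepend=0))

# Alternatively, find for each output position which boundary in b it lies below.
positions = np.arange(b[-1])
c_search = a[np.searchsorted(b, positions, side="right")]

print(c_repeat)
print(c_search)
```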

Vectorised index of arrays

Originally I had something like this:
a = 1 # Some randomly generated positive integer
b = -1 # Some randomly generated negative integer
c = 0 # Constant 0
i = 0 # Randomly picked from (0, 1, 2)
d = [a, b, c][i]
I would like to vectorise this so that many samples can be generated.
So I have three arrays of length N and an index array of length N, and I would like to use that index array to pick, element by element, from one of the three arrays:
a = np.array([1, 2, 3, 4])
b = np.array([-1, -2, -3, -4])
c = np.array([0, 0, 0, 0])
i = np.array([2, 1, 2, 0])
d = np.array([a, b, c])[i] # Doesn't work
# Would like the result:
d = np.array([0, -2, 0, 4])
d = a * (i == 0) + b * (i == 1) + c * (i == 2) works, but surely there is a way that looks more like the unvectorised code
Make a 2-D array from the three arrays, then use integer array indexing:
>>> e = np.vstack([a,b,c])
>>> i = np.array([2, 1, 2, 0])
>>> e[(i,np.arange(i.shape[0]))]
array([ 0, -2, 0, 4])
Notice that your answer is on the diagonal of
np.array([a, b, c])[i]
so you can go:
np.array([a, b, c])[i].diagonal()
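For comparison, here is a sketch of two NumPy calls that read closer to the unvectorised [a, b, c][i]. Note that the diagonal approach above builds an N x N intermediate array first, which these avoid. np.choose is limited to a small number of choice arrays, and np.take_along_axis assumes NumPy 1.15 or newer; the names d_choose and d_take are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, -2, -3, -4])
c = np.array([0, 0, 0, 0])
i = np.array([2, 1, 2, 0])

# For each position k, pick element k from the array numbered i[k].
d_choose = np.choose(i, [a, b, c])

# Same selection via take_along_axis on the stacked (3, N) array.
stacked = np.stack([a, b, c])
d_take = np.take_along_axis(stacked, i[np.newaxis, :], axis=0)[0]

print(d_choose)
print(d_take)
```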

Various list concatenation method and their performance

I am working on an algorithm in which we try to write every line so that it contributes to good performance in the final code.
In one situation we have to concatenate lists (more than two, specifically). I know some ways to join more than two lists, and I have looked on StackOverflow, but none of the answers give an account of each method's performance.
Can anyone show the ways to join more than two lists and their respective performance?
Edit: The size of the list varies from 2 to 13 (to be specific).
Edit (duplicate): I am specifically asking for the ways we can concatenate lists and their respective performance; the duplicate question is limited to only 4 methods.
There are multiple ways to join more than two lists.
Assuming that we have three lists,
a = ['1']
b = ['2']
c = ['3']
then, to join two or more lists in Python:
1)
You can simply concatenate them,
output = a + b + c
2)
You can use a list comprehension,
res_list = [y for x in [a,b,c] for y in x]
3)
You can use extend() (note that this modifies a in place),
a.extend(b)
a.extend(c)
print(a)
4)
You can use the * unpacking operator,
res = [*a,*b,*c]
To measure performance, I used the timeit module from the Python standard library.
Ordered by time taken, the methods rank as follows:
4th method < 1st method < 3rd method < 2nd method
That means that if you use the * operator to concatenate more than two lists, you will get the best performance.
Hope you got what you were looking for.
Edit: [Image: performance of all the methods, calculated using timeit]
I did some simple measurements, here are my results:
import timeit
from itertools import chain
a = [*range(1, 10)]
b = [*range(1, 10)]
c = [*range(1, 10)]
tests = ("""output = list(chain(a, b, c))""",
"""output = a + b + c""",
"""output = [*chain(a, b, c)]""",
"""output = a.copy();output.extend(b);output.extend(c);""",
"""output = [*a, *b, *c]""",
"""output = a.copy();output+=b;output+=c;""",
"""output = a.copy();output+=[*b, *c]""",
"""output = a.copy();output += b + c""")
results = sorted((timeit.timeit(stmt=test, number=1, globals=globals()), test) for test in tests)
for i, (t, stmt) in enumerate(results, 1):
    print(f'{i}.\t{t}\t{stmt}')
Prints on my machine (AMD 2400G, Python 3.6.7):
1. 6.010000106471125e-07 output = [*a, *b, *c]
2. 7.109999842214165e-07 output = a.copy();output += b + c
3. 7.720000212430023e-07 output = a.copy();output+=b;output+=c;
4. 7.820001428626711e-07 output = a + b + c
5. 1.0520000159885967e-06 output = a.copy();output+=[*b, *c]
6. 1.4030001693754457e-06 output = a.copy();output.extend(b);output.extend(c);
7. 1.4820000160398195e-06 output = [*chain(a, b, c)]
8. 2.525000127207022e-06 output = list(chain(a, b, c))
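Because number=1 times a single execution, the ranking above can shift from run to run. A hedged sketch of a less noisy harness, using timeit.repeat and taking the minimum, is shown below; best_time and the repetition counts are arbitrary choices for illustration, and only three of the statements are included.

```python
import timeit

a = list(range(1, 10))
b = list(range(1, 10))
c = list(range(1, 10))

statements = (
    "output = [*a, *b, *c]",
    "output = a + b + c",
    "output = a.copy(); output.extend(b); output.extend(c)",
)

def best_time(stmt, number=10_000, repeat=5):
    """Return the best per-call time in seconds over several repeats."""
    runs = timeit.repeat(stmt=stmt, number=number, repeat=repeat,
                         globals={"a": a, "b": b, "c": c})
    return min(runs) / number

# Print the statements from fastest to slowest on this machine.
for seconds, stmt in sorted((best_time(s), s) for s in statements):
    print(f"{seconds:.3e}\t{stmt}")
```

Taking the minimum of several repeats is the approach the timeit documentation suggests, since higher values mostly reflect interference from other processes.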
If you are going to concatenate a variable number of lists together, your input is going to be a list of lists (or some equivalent collection). The performance tests need to take this into account because you are not going to be able to do things like list1+list2+list3.
Here are some test results (1000 repetitions):
option1 += loop 0.00097 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option2 itertools.chain 0.00138 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option3 functools.reduce 0.00174 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option4 comprehension 0.00188 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option5 extend loop 0.00127 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
option6 deque 0.00180 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
This would indicate that a += loop through the list of lists is the fastest approach
And the source to produce them:
allLists = [ list(range(10)) for _ in range(5) ]
def option1():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result += lst
    return result
from itertools import chain
def option2(): return list(chain(*allLists))
from functools import reduce
def option3():
    return list(reduce(lambda a,b:a+b,allLists))
def option4(): return [ e for l in allLists for e in l ]
def option5():
    result = allLists[0].copy()
    for lst in allLists[1:]:
        result.extend(lst)
    return result
from collections import deque
def option6():
    result = deque()
    for lst in allLists:
        result.extend(lst)
    return list(result)
from timeit import timeit
count = 1000
t = timeit(lambda:option1(), number = count)
print(f"option1 += loop {t:.5f}",option1()[:15])
t = timeit(lambda:option2(), number = count)
print(f"option2 itertools.chain {t:.5f}",option2()[:15])
t = timeit(lambda:option3(), number = count)
print(f"option3 functools.reduce {t:.5f}",option3()[:15])
t = timeit(lambda:option4(), number = count)
print(f"option4 comprehension {t:.5f}",option4()[:15])
t = timeit(lambda:option5(), number = count)
print(f"option5 extend loop {t:.5f}",option5()[:15])
t = timeit(lambda:option6(), number = count)
print(f"option6 deque {t:.5f}",option6()[:15])

constructing arithmetic progressions from loop

I am trying to write a program that calculates the diagonal coefficients of Pascal's triangle.
For those who are not familiar with it, the general terms of sequences are written below.
1st row = 1 1 1 1 1....
2nd row = N0(natural number) // 1 = 1 2 3 4 5 ....
3rd row = N0(N0+1) // 2 = 1 3 6 10 15 ...
4th row = N0(N0+1)(N0+2) // 6 = 1 4 10 20 35 ...
The sequence for each subsequent row follows a specific pattern, and my goal is to output those sequences in a for loop, with the number of units as input.
def figurate_numbers(units):
row_1 = str(1) * units
row_1_list = list(row_1)
for i in range(1, units):
sequences are
row_2 = n // i
row_3 = (n(n+1)) // (i(i+1))
row_4 = (n(n+1)(n+2)) // (i(i+1)(i+2))
>>> def figurate_numbers(4): # coefficients for 4 rows and 4 columns
[1, 1, 1, 1]
[1, 2, 3, 4]
[1, 3, 6, 10]
[1, 4, 10, 20] # desired output
How can I iterate for both n and i in one loop such that each sequence of corresponding row would output coefficients?
You can use map or a list comprehension to hide a loop.
def f(i):
    return lambda x: ...
rows = [[1] * k]
for i in range(k):
    rows.append(list(map(f(i), rows[i])))
where f is a function that describes the dependency on the previous element of the row.
Another possibility is to adapt a recursive Fibonacci to rows. The NumPy library allows array arithmetic, so you do not even need map. Python also has predefined library functions for the number of combinations etc., which could perhaps be used.
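One concrete form of that dependency, sketched here as an assumption rather than as the answerer's intended f: each row of the desired output is the running sum of the row before it, so itertools.accumulate can hide the inner loop.

```python
from itertools import accumulate

def figurate_numbers(units):
    """Return `units` rows of `units` coefficients each."""
    row = [1] * units
    rows = [row]
    for _ in range(units - 1):
        row = list(accumulate(row))  # running sum of the previous row
        rows.append(row)
    return rows

for row in figurate_numbers(4):
    print(row)
# [1, 1, 1, 1]
# [1, 2, 3, 4]
# [1, 3, 6, 10]
# [1, 4, 10, 20]
```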
To compute this efficiently, without nested loops, use the rational-number-based solution from
https://medium.com/#duhroach/fast-fun-with-pascals-triangle-6030e15dced0 .
from fractions import Fraction
def pascalIndexInRowFast(row, index):
    lastVal = 1
    halfRow = (row >> 1)
    # early out: is index past the half? if so, use the mirrored index instead
    if index > halfRow:
        index = row - index
    for i in range(0, index):
        # integer division is exact here: the running product is always divisible by (i + 1)
        lastVal = lastVal * (row - i) // (i + 1)
    return lastVal
def pascDiagFast(row, length):
    # compute the fractions of this diag
    fracs = [1] * length
    for i in range(length - 1):
        num = i + 1
        denom = row + 1 + i
        fracs[i] = Fraction(num, denom)
    # now let's compute the values
    vals = [0] * length
    # first figure out the leftmost tail of this diag
    lowRow = row + (length - 1)
    lowRowCol = row
    tail = pascalIndexInRowFast(lowRow, lowRowCol)
    vals[-1] = tail
    # walk backwards!
    for i in reversed(range(length - 1)):
        vals[i] = int(fracs[i] * vals[i + 1])
    return vals
Don't reinvent the triangle:
>>> from scipy.linalg import pascal
>>> pascal(4)
array([[ 1, 1, 1, 1],
[ 1, 2, 3, 4],
[ 1, 3, 6, 10],
[ 1, 4, 10, 20]], dtype=uint64)
>>> pascal(4).tolist()
[[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]]
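If SciPy is not available, the same table can be sketched with the standard library alone, assuming Python 3.8 or newer for math.comb. The entry in row r, column c is the binomial coefficient C(r + c, r); pascal_table is a made-up name for this illustration.

```python
from math import comb  # available from Python 3.8

def pascal_table(n):
    """n x n symmetric Pascal matrix as nested lists."""
    return [[comb(r + c, r) for c in range(n)] for r in range(n)]

print(pascal_table(4))
# [[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]]
```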

Dictionary works for len(string) multiple of 3. Function deletes remainders but now doesn't translate with dictionary. Python 2.7.1

I made a function with a dictionary. The purpose of the function is to split the input string into groups of 3. If the length of the input string is not a multiple of 3, I want to delete the remainder (1 or 2 characters).
My function was working perfectly until I added the part for deleting the remainders:
def func(fx):
    d={'AAA':1,'BBB':2,'CCC':3}
    length=len(fx)
    if length % 3 == 0:
        return fx
    if length % 3 == 1:
        return fx[:-1]
    if length % 3 == 2:
        return fx[:-2]
    Fx=fx.upper()
    Fx3=[Fx[i:i+3] for i in range(0,len(Fx),3)]
    translate=[d[x] for x in Fx3]
    return translate
x='aaabbbcc'
output = func(x)
print output
>>>
aaabbb
The function recognizes that the length of the input sequence is not a multiple of 3, so it deletes the 2 extra characters, which is what I want. However, it is no longer splitting the new string into 3-letter words to be translated with my dictionary. If you delete the if statements, the function works, but only for strings whose length is a multiple of 3.
What am I doing wrong?
You are returning fx when you probably should be reassigning it
def func(fx):
    d={'AAA':1,'BBB':2,'CCC':3}
    length=len(fx)
    if length % 3 == 0:
        pass
    elif length % 3 == 1:
        fx = fx[:-1]
    elif length % 3 == 2:
        fx = fx[:-2]
    Fx=fx.upper()
    Fx3=[Fx[i:i+3] for i in range(0,len(Fx),3)]
    translate=[d[x] for x in Fx3]
    return translate
Here is an alternate function for you to figure out when you know some more Python
def func(fx):
    d = {'AAA':1,'BBB':2,'CCC':3}
    return [d["".join(x).upper()] for x in zip(*[iter(fx)]*3)]
Does this do what you want?
def func(fx):
    d = {'AAA': 1, 'BBB': 2, 'CCC': 3}
    # Slice to an explicit length: fx[:-0] would give an empty string when len(fx) is a multiple of 3
    fx = fx[:len(fx) - len(fx) % 3].upper()
    groups = [fx[i:i+3] for i in range(0, len(fx), 3)]
    translate = [d[group] for group in groups]
    return translate
x='aaabbbcc'
print func(x)
When trimming the end of the string, you were returning the result when you wanted to just store it in a variable or assign it back to fx.
Rather than the if .. elifs you can just use the result of the length modulo 3 directly.
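A short Python 3 sketch of the modulo idea, with the caveat that the function name translate and the optional table parameter are invented for illustration. It computes the usable length explicitly, which also covers the case where the length is already a multiple of 3 (a negative slice such as text[:-0] would return an empty string there).

```python
def translate(text, table=None):
    """Translate complete 3-character groups; drop 1 or 2 trailing characters."""
    if table is None:
        table = {'AAA': 1, 'BBB': 2, 'CCC': 3}
    usable = len(text) - len(text) % 3   # largest multiple of 3 that fits
    text = text[:usable].upper()
    return [table[text[i:i + 3]] for i in range(0, usable, 3)]

print(translate('aaabbbcc'))  # [1, 2]
print(translate('aaabbb'))    # [1, 2]
print(translate('cc'))        # []
```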
There is no need for a function; it can be done in a one-liner that is less complex than gnibbler's.
Acom's solution is nearly the same as mine.
d={'AAA':1,'BBB':2,'CCC':3}
for fx in ('bbbcccaaabbbcccbbbcccaaabbbcc',
           'bbbcccaaabbbaaa','bbbcccaaabbbaa','bbbcccaaabbba',
           'bbbcccaaabbb','bbbcccaaabb','bbbcccaaab',
           'bbbcccaaa','bbbcccaa','bbbccca',
           'bbbccc','bbbcc','bbbc',
           'bbb','bb','b',''):
    print fx
    print tuple( d[fx[i:i+3].upper()] for i in xrange(0, len(fx)-len(fx)%3, 3) )
produces
bbbcccaaabbbcccbbbcccaaabbbcc
(2, 3, 1, 2, 3, 2, 3, 1, 2)
bbbcccaaabbbaaa
(2, 3, 1, 2, 1)
bbbcccaaabbbaa
(2, 3, 1, 2)
bbbcccaaabbba
(2, 3, 1, 2)
bbbcccaaabbb
(2, 3, 1, 2)
bbbcccaaabb
(2, 3, 1)
bbbcccaaab
(2, 3, 1)
bbbcccaaa
(2, 3, 1)
bbbcccaa
(2, 3)
bbbccca
(2, 3)
bbbccc
(2, 3)
bbbcc
(2,)
bbbc
(2,)
bbb
(2,)
bb
()
b
()
()
I assume the strings you have to handle contain only the 3-character strings 'aaa', 'bbb', 'ccc' at the positions 0, 3, 6, 9, etc.
Otherwise, the preceding programs will crash with a KeyError if a different 3-character string appears at one of these positions instead of one of 'aaa', 'bbb', 'ccc'.
In that case, note that you could use the dictionary's get method, which returns a default value when the passed argument isn't a key of the dictionary.
In the following code, I set the default returned value to 0:
d={'AAA':1,'BBB':2,'CCC':3}
for fx in ('bbbcccaaa###bbbccc"""bbbcc',
           'bbb   aaabbbaaa','bbbccc^^^bbbaa','bbbc;;;aabbba',
           'bbbc^caaabbb',']]bccca..bb','bbb%%%aaab',
           'bbbcccaaa','bbb!ccaa','b#bccca',
           'bbbccc','bbbcc','bbbc',
           'b&b','bb','b',''):
    print fx
    print [d.get(fx[i:i+3].upper(), 0) for i in xrange(0, len(fx)-len(fx)%3, 3)]
produces
bbbcccaaa###bbbccc"""bbbcc
[2, 3, 1, 0, 2, 3, 0, 2]
bbb aaabbbaaa
[2, 0, 1, 2, 1]
bbbccc^^^bbbaa
[2, 3, 0, 2]
bbbc;;;aabbba
[2, 0, 0, 2]
bbbc^caaabbb
[2, 0, 1, 2]
]]bccca..bb
[0, 3, 0]
bbb%%%aaab
[2, 0, 1]
bbbcccaaa
[2, 3, 1]
bbb!ccaa
[2, 0]
b#bccca
[0, 3]
bbbccc
[2, 3]
bbbcc
[2]
bbbc
[2]
b&b
[0]
bb
[]
b
[]
[]
By the way, in the first snippet I preferred to create a tuple instead of a list because, for a result made of immutable objects like these, I think it is better not to create a list.
