I have many pairs of strings, and I'm trying to determine the number of matching elements between the two strings in each pair. The rules are: if the strings share a common letter, that's a point; order does not matter; and each letter in the first string can only match one of the letters in the second string. So for the strings 'aaaab' and 'acccc', only 1 point is awarded, because there is only one 'a' to match in the second string. Here are a few examples:
aaabb bbaaa 5
aabbb bbbaa 5
aaabb aabbb 4
aaabb ccaaa 3
aaaaa bbbbb 0
ababa babab 4
aabcc babaf 3
abcde abfgh 2
bacde abdgh 3
Hopefully that gets across how it works.
Here is the most efficient code I've been able to come up with, but it's horribly convoluted. I'm hoping someone can think of something better.
def Score(guess, solution):
    guess = list(guess)
    solution = list(solution)
    c = 0
    for g in guess:
        if g in solution and g != "_":
            c += 1
            solution[solution.index(g)] = "_"
    return c
Surely this isn't the best way to do this, but I haven't been able to figure out anything else. I tried an approach with Counter, doing guess & solution on the counters, which worked but ended up being way slower. Anyone have any ideas?
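For reference, a sketch of the Counter-intersection approach described above (the one the question found slower); the function name is just illustrative:
from collections import Counter

def score_counter(guess, solution):
    # Counter & Counter keeps, for each letter, the minimum of the two counts;
    # summing those minimums is the number of matchable letters.
    return sum((Counter(guess) & Counter(solution)).values())

print(score_counter('aaabb', 'bbaaa'))  # 5
print(score_counter('aaaab', 'acccc'))  # 1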
You could gain a ~10%* speed-up by simply using the remove() method of list instead of the lookup with index().
Also, you don't need to copy guess into a list.
def Score(guess, solution):
    solution = list(solution)
    c = 0
    for g in guess:
        if g in solution:
            c += 1
            solution.remove(g)
    return c
*at least that's what I measured on my machine
You can do it in vectorized form using NumPy!
import numpy as np
counts1 = np.bincount(np.array('aaadez', 'c').view(np.uint8), minlength=128)
counts2 = np.bincount(np.array('eeeedddddaa', 'c').view(np.uint8), minlength=128)
np.min((counts1, counts2), axis=0).sum()
counts1 looks like this:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 3, 0, 0, 1, 1, 0...])
This is an array indexed by ASCII code. The nonzero elements visible here are at positions 97, 100, and 101, which are the ASCII codes of 'a', 'd', and 'e' (the 'z' sits further along, at position 122, in the truncated part). Then we take an element-wise min() of the two count arrays, followed by a sum, to get the score (4 in this example).
Something neat about this solution is that you can apply it to as many strings as you want with little extra cost, and even very long strings will be quite fast, because there are no loops in Python itself, only in compiled NumPy code.
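A sketch of the same computation using np.frombuffer over the ASCII bytes (handy if the 'c' dtype is unavailable in your NumPy/Python version); stacking the per-string counts into one array is how it extends to more than two strings. The helper name is just for illustration:
import numpy as np

def char_counts(s):
    # Count occurrences of each ASCII code in the string.
    return np.bincount(np.frombuffer(s.encode('ascii'), dtype=np.uint8), minlength=128)

counts = np.stack([char_counts(s) for s in ('aaadez', 'eeeedddddaa')])
print(counts.min(axis=0).sum())  # 4, the score from the example above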
Before editing I had a similar but slower and more complex solution using Pandas and SciPy. Here it is:
import scipy.stats
import numpy as np
import pandas
x1 = scipy.stats.itemfreq(np.array('aaade', 'c').view(np.uint8))
x2 = scipy.stats.itemfreq(np.array('bbacadde', 'c').view(np.uint8))
merged = pandas.merge(pandas.DataFrame(x1), pandas.DataFrame(x2), on=0)
np.sum(np.min(merged.values[:,1:], axis=1))
That gives 4.0. The first two lines convert the strings to arrays of integers and run itemfreq() to count how many times each character occurs. In this example, x1 is:
array([[  97.,    3.],
       [ 100.,    1.],
       [ 101.,    1.]])
Then we join the two tables by the 0th column, dropping any characters that do not exist in the other one:
0 1_x 1_y
0 97 3 2
1 100 1 2
2 101 1 1
Then we just do a min and sum to get the final score (2+1+1 in this case).
Here is a simple way to see the counts of the common elements:
list_a = list("aabbb")
list_b = list("bbbaa")
list_c = set(list_b)

counter = 0
for i in list_c:
    if i in list_a:  # only letters that appear in both strings
        counter = list_a.count(i)
        print "counter : %s element : %s" % (counter, i)
I just want to show how to count the common elements; you can adapt the code to sum the counter results into the score.
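A sketch of how those counts could be combined into the final score, taking for each shared letter the smaller of its two counts:
list_a = list("aabbb")
list_b = list("bbbaa")

score = 0
for letter in set(list_a) & set(list_b):
    # Each letter can only match as many times as it appears in the rarer string.
    score += min(list_a.count(letter), list_b.count(letter))
print(score)  # 5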
Here's a pretty simple solution using Counter:
from collections import Counter

def proc(vals):
    for s1, s2 in vals:
        c1, c2 = Counter(s1), Counter(s2)
        same = set(s1) & set(s2)
        print s1, s2, sum(min(c1[c], c2[c]) for c in same)
where vals looks like
vals = [('aaaaa', 'bbbbb'), ...]
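For instance, with a couple of the pairs from the question it would print:
vals = [('aaabb', 'bbaaa'), ('abcde', 'abfgh')]
proc(vals)
# aaabb bbaaa 5
# abcde abfgh 2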
Try this:
a, b = 'aaabb', 'ccaaa'
dict_a, dict_b = {}, {}
for key in a:
    dict_a[key] = dict_a.setdefault(key, 0) + 1
for key in b:
    dict_b[key] = dict_b.setdefault(key, 0) + 1

count = 0
for key, a_val in dict_a.items():
    try:
        b_val = dict_b[key]
        count += min(b_val, a_val)
    except KeyError:
        pass
print count
Same concept as @sloth's answer, but using try instead of if:
def Score(guess, solution):
    solution = list(solution)
    c = 0
    for g in guess:
        try:
            solution.remove(g)
            c += 1
        except ValueError:
            pass
    return c
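A quick check against a few of the example pairs from the question:
print(Score('aaabb', 'bbaaa'))  # 5
print(Score('aabcc', 'babaf'))  # 3
print(Score('abcde', 'abfgh'))  # 2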
Related
For a one-dimensional NumPy array of 1s and 0s, how can I efficiently "mask" the array so that, after the occurrence of a 1, the next n elements of the array are converted to zero? After those n elements have passed, the pattern repeats, so the first subsequent occurrence of a 1 is preserved, again followed by n zeros.
It is important that the first eligible occurrences of 1 are preserved, so a simple fixed mask such as
[True, False, False, True, ...] won't work.
Furthermore, the data set is massive, so efficiency is important.
I've written crude Python code that gives me the desired results, but it is way too slow for what I need.
Here is an example:
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3

newData = []
tail = 0
for x in data:
    if x == 1 and tail <= 0:
        newData.append(1)
        tail = n
    else:
        newData.append(0)
        tail -= 1

print(newData)
newData: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
Is there possibly a vectorized numpy solution to this problem?
I'm processing tens of thousands of arrays, with more than a million elements in each array. So far using numpy functions has been the only way to manage this.
As far as I know, there is no way to do this entirely in NumPy. You can still use NumPy to speed up grabbing the indices of the nonzero elements, though.
import numpy as np

data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3

def get_new_data(data, n):
    new_data = np.zeros(len(data))
    non_zero = np.argwhere(data).ravel()
    idx = non_zero[0]
    new_data[idx] = 1
    idx += n
    for i in non_zero[1:]:
        if i > idx:
            new_data[i] = 1
            idx = i + n  # the next 1 must come more than n positions after this one
    return new_data

get_new_data(data, n)
A function like this should give you a better run time since you are not looping over the whole array.
If this is still not fast enough for you, you can look at using Numba, which works very well with NumPy and is relatively easy to use.
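A minimal sketch of what the Numba route could look like, assuming numba is installed (the function name is illustrative; the loop is the same idea as the question's pure-Python version, just compiled):
import numpy as np
from numba import njit

@njit
def mask_after_ones(data, n):
    out = np.zeros_like(data)
    tail = 0
    for i in range(data.shape[0]):
        if data[i] == 1 and tail <= 0:
            out[i] = 1
            tail = n
        else:
            tail -= 1
    return out

data = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1])
print(mask_after_ones(data, 3))  # [0 0 1 0 0 0 1 0 0 0 0 1]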
You could do it like this:
N = 3
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
newData = data.copy()
i = 0
M = [0 for _ in range(N)]
while i < len(newData) - N:
    if newData[i] == 1:
        newData[i + 1:i + 1 + N] = M
        i += N
    i += 1
print(newData)
I have to slice a Python list/NumPy array from a center index minus dx to the same index plus dx, where dx is a constant. For example (the 1 only marks the center index, for illustration):
A = [0, 0, 0, 0, 1, 0, 0, 0, 0]
dx=3
print(A[4-dx: 4+dx+1]) # 4 is the position of '1'
>>>[0, 0, 0, 1, 0, 0, 0]
But for this case,
B = [0, 1, 0, 0, 0 ,0, 0 ,0, 0]
print(B[1-dx: 1+dx+1])
>>>[] # because 1-dx <0.
But what I need from case B is [0, 1, 0, 0, 0].
So I did something like this to prevent an empty list/array (say n is the center index):
if n - dx < 0:
    result = B[:n + dx + 1]
Although the above method works fine, the original code is quite complicated, and I have to put this if ... #complicated version# everywhere.
Is there another way around this? Maybe I'm missing something.
Thank you!
You can use the max() function to bound the index at 0.
print(A[max(0,4-dx): 4+dx+1])
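A small wrapper along those lines (the helper name is just for illustration), so the clamping doesn't have to be repeated everywhere:
def window(seq, n, dx):
    # Clamp the lower bound at 0; slicing past the end is already safe in Python.
    return seq[max(0, n - dx): n + dx + 1]

A = [0, 0, 0, 0, 1, 0, 0, 0, 0]
B = [0, 1, 0, 0, 0, 0, 0, 0, 0]
print(window(A, 4, 3))  # [0, 0, 0, 1, 0, 0, 0]
print(window(B, 1, 3))  # [0, 1, 0, 0, 0]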
Let's assume I have a list:
list1 = [16620, 22032, 0, 0, 0, 136813, 137899, 0, 199546, 204804]
I am looking for a new list that subtracts every 'non-zero' value from the following 'non-zero' value, e.g. 22032-16620, 137899-136813. 'Zero' values will stay untouched.
In addition, the first 'non-zero' value of each pair (the one that gets subtracted) should change to zero.
The output would look something like:
list2 = [0, 5412, 0, 0, 0, 0, 1086, 0, 0, 5258]
Please note that the numbers, the length of the list and its distribution of elements may vary, e.g. a list could also look like
list1 = [0, 0, 0, 0, 95472, 0, 0, 104538, 0, 0, 0, 0, 187649, 0, 0, 204841, 0, 0, 0, 0, 0, 0, 0, 0]
which should turn into:
list2 = [0, 0, 0, 0, 0, 0, 0, 9066, 0, 0, 0, 0, 0, 0, 0, 17192, 0, 0, 0, 0, 0, 0, 0, 0]
As you can see, the number of elements stays the same for list1 and list2. Also, there is always an even number of 'zero' values and an even number of 'non-zero' values.
Help is greatly appreciated!
What I have so far:
from itertools import cycle, chain
list1 = [11545, 15334, 71341, 73861, 0, 0, 170374, 171671]
newlist = [list1[i + 1] - list1[i] for i in range(len(list1)-1)]
list2 = list(chain.from_iterable(zip(newlist[0::2], cycle([int()]))))
print list2
The output of print list2 looks like what I imagine it should be, yet it won't work for a list like:
list1 = [16620, 22032, 0, 0, 0, 136813, 137899, 0, 199546, 204804]
Copy the list over, but adjust the non-zero values, keeping track of the previous one:
list2 = []
prev = None
for curr in list1:
    if curr:
        if prev:
            curr -= prev
            prev = None
        else:
            prev = curr
            curr = 0
    list2.append(curr)
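The same idea wrapped in a small function (hypothetical name), checked against the first example from the question:
def diff_nonzero_pairs(values):
    # For each pair of non-zero values, the first becomes 0 and the
    # second becomes (second - first); zeros are copied through unchanged.
    out = []
    prev = None
    for curr in values:
        if curr:
            if prev:
                curr -= prev
                prev = None
            else:
                prev = curr
                curr = 0
        out.append(curr)
    return out

list1 = [16620, 22032, 0, 0, 0, 136813, 137899, 0, 199546, 204804]
print(diff_nonzero_pairs(list1))  # [0, 5412, 0, 0, 0, 0, 1086, 0, 0, 5258]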
This question already has answers here: How do I clone a list so that it doesn't change unexpectedly after assignment? (24 answers)
Closed 7 years ago.
I have already solved problem 5 of Project Euler (What is the smallest positive number that is evenly divisible (with no remainder) by all of the numbers from 1 to 20?), but I want to find a faster way (currently 0.000109195709229 seconds).
I tried a dynamic approach, but when I run the code below (it is just the first part), I don't understand why d[var][counter] gets incremented by 1 when I explicitly wrote d[i][counter] += 1.
n = 20
d = {1: [0, 1] + [0]*19}  # a dictionary that assigns to each number a list of its prime factorization
for i in xrange(2, 3):  # I changed n+1 to 3 for simplicity
    var = i
    counter = 2
    notDone = True
    while notDone:
        if var % counter == 0:
            var /= counter
            print var, d[var]
            d[i] = d[var]  # i has the same prime factorization as var ...
            print var, d[var]
            d[i][counter] += 1  # ... except for 1 number (counter)
            print var, d[var]  # wtf?
            notDone = False
        else:
            counter += 2 if counter != 2 else 1
This is the outcome:
1 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Why does this happen?
At the line
d[i] = d[var]
the name d[i] is bound to the very same list object as d[var]; assignment does not copy the list, so mutating one of them mutates the other.
Instead you need a copy of d[var], which you can get e.g. by
d[i] = d[var][:]
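A tiny illustration of the difference between binding and copying:
a = [0, 1, 0]
b = a        # b and a are the same list object
b[2] += 1
print(a)     # [0, 1, 1] -- a changed too

a = [0, 1, 0]
b = a[:]     # b is a copy
b[2] += 1
print(a)     # [0, 1, 0] -- a is unchanged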
NumPy has a library function, np.unpackbits, which will unpack a uint8 into a bit vector of length 8. Is there a correspondingly fast way to unpack larger numeric types, e.g. uint16 or uint32? I am working on a problem that involves frequent translation between numbers (for array indexing) and their bit-vector representations, and the bottleneck is our pack and unpack functions.
You can do this with view and unpackbits (the snippets below assume NumPy's names are in the namespace, e.g. from numpy import *).
Input:
unpackbits(arange(2, dtype=uint16).view(uint8))
Output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
For a = arange(int(1e6), dtype=uint16) this is pretty fast at around 7 ms on my machine
%%timeit
unpackbits(a.view(uint8))
100 loops, best of 3: 7.03 ms per loop
As for endianness, you'll have to look at http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html and apply the suggestions there depending on your needs.
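For instance, on a little-endian machine, a sketch like this (not a drop-in for every platform) gives an MSB-first 16-bit row per value by byte-swapping before the view:
import numpy as np

a = np.arange(2, dtype=np.uint16)
# Swap each element's bytes so the most significant byte comes first in memory,
# then unpack and reshape to one row of 16 bits per value.
bits = np.unpackbits(a.byteswap().view(np.uint8)).reshape(-1, 16)
print(bits)
# [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
#  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]]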
This is the solution I use:
import numpy as np

def unpackbits(x, num_bits):
    if np.issubdtype(x.dtype, np.floating):
        raise ValueError("numpy data type needs to be int-like")
    xshape = list(x.shape)
    x = x.reshape([-1, 1])
    mask = 2**np.arange(num_bits, dtype=x.dtype).reshape([1, num_bits])
    return (x & mask).astype(bool).astype(int).reshape(xshape + [num_bits])
This is a completely vectorized solution that works with ndarrays of any dimension and can unpack however many bits you want.
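For example (note that, unlike np.unpackbits, this returns bits least-significant first):
x = np.array([1, 2, 3], dtype=np.uint16)
print(unpackbits(x, 4))
# [[1 0 0 0]
#  [0 1 0 0]
#  [1 1 0 0]]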
I have not found a built-in function for this either, but maybe Python's built-in struct.unpack can help make the custom function faster than shifting and AND-ing a longer uint (note that I am using uint64 here):
>>> import struct
>>> N = np.uint64(2 + 2**10 + 2**18 + 2**26)
>>> struct.unpack('>BBBBBBBB', N)
(2, 4, 4, 4, 0, 0, 0, 0)
The idea is to convert the number to its uint8 bytes, use unpackbits on those, and concatenate the result. Or, depending on your application, it may be more convenient to use structured arrays.
There is also the built-in bin() function, which produces a string of 0s and 1s, but I am not sure how fast it is, and it requires post-processing too.
This works for arbitrary arrays of arbitrary uint (i.e. also for multidimensional arrays and also for numbers larger than the uint8 max value).
It cycles over the number of bits, rather than over the number of array elements, so it is reasonably fast.
import numpy

def my_ManyParallel_uint2bits(in_intAr, Nbits):
    '''convert (numpy array of uint => array of Nbits bits) for many bits in parallel'''
    inSize_T = in_intAr.shape
    in_intAr_flat = in_intAr.flatten()
    out_NbitAr = numpy.zeros((len(in_intAr_flat), Nbits))
    for iBits in xrange(Nbits):
        out_NbitAr[:, iBits] = (in_intAr_flat >> iBits) & 1
    out_NbitAr = out_NbitAr.reshape(inSize_T + (Nbits,))
    return out_NbitAr

A = numpy.arange(256, 261).astype('uint16')
# array([256, 257, 258, 259, 260], dtype=uint16)
B = my_ManyParallel_uint2bits(A, 16).astype('uint16')
# array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
# [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
# [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
# [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)