I am trying to create several arrays from a big array that I have. What I mean is:
data = [[0, 1, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0, 0, 0,1], [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 1, 0]]
I want to create 10 different arrays - one for each of data's 10 columns - with different names.
data1 = [0, 0, 0, 1, 0, 0, 1, 0, 0],
data2 = [1, 0, 1, 0, 0, 0, 0, 1, 0], and so on
I found a close solution here - I also took the example data from there - however, when I tried the suggested solution:
for d in xrange(0,9):
    exec 'x%s = data[:,%s]' %(d,d-1)
An error message appears:
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 2, in
exec ('x%s = data[:,%s]') %(d,d-1)
File "", line 1
x%s = data[:,%s]
^
SyntaxError: invalid syntax
Please, any comments will be highly appreciated. Regards
Use numpy array indexing:
data = [[0, 1, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 1, 1, 0, 0, 0, 0, 0,1], [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 1, 0]]
import numpy as np

d = np.array(data)
d[:, 0]
#array([0, 0, 0, 1, 0, 0, 1, 0, 0])
d[:, 1]
#array([1, 0, 1, 0, 0, 0, 0, 1, 0])
etc...
d[:, 9]
#array([0, 0, 1, 1, 1, 0, 0, 0, 0])
If you must, then dictionaries are the way to go:
val = {i:d[:,i] for i in range(d.shape[1])}
To access the arrays:
val[0]
#array([0, 0, 0, 1, 0, 0, 1, 0, 0])
...
val[9]
#array([0, 0, 1, 1, 1, 0, 0, 0, 0])
If you really want to create dynamic variables, use the following code (it is also more readable -- for Python 3.x):
for d in range(0,9):
    # exec 'x%s = data[:,%s]' %(d,d-1)
    exec(f"data{d} = {data[d]}")
Either use a numpy array as shown by Scott Boston above, or use a dictionary like this:
a = {}
for i in range(0,9):
    a[i] = data[i][:]
Output:
{0: [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
1: [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
2: [0, 1, 1, 0, 0, 0, 0, 0, 0, 1],...
I don't see proper indentation in your for loop.
Also, I suggest you use %d (number) rather than %s (string) for the second argument, since you need a number to index into your array.
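For illustration, here is a minimal sketch of how that suggestion might look, assuming data has first been converted to a NumPy array so that column slicing works (the x0, x1, ... names are just examples):
import numpy as np

data = np.array(data)  # assumes `data` is the nested list from the question
for d in range(data.shape[1]):
    # %d substitutes the integer index directly into the generated statement
    exec('x%d = data[:, %d]' % (d, d))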
I need to have something like this:
arr = array([[1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]])
Each row contains 36 elements; every group of 6 elements in a row represents a hidden row, and each hidden row needs exactly one 1 and 0 everywhere else. In other words, every block of 6 entries needs exactly one 1. This is my requirement for arr.
I have a table that's going to be used to compute a "fitness" value for each row. That is, I have a
table = np.array([10, 5, 4, 6, 5, 1, 6, 4, 9, 7, 3, 2, 1, 8, 3,
6, 4, 6, 5, 3, 7, 2, 1, 4, 3, 2, 5, 6, 8, 7, 7, 6, 4, 1, 3, 2])
table = table.T
and I'm going to multiply each row of arr with table. The result of that multiplication, a scalar, will be stored as the "fitness" value of the corresponding row, UNLESS the row does not fit the requirement described above, in which case the fitness should be 0.
An example of what should be returned is:
result = array([5,12,13,14,20,34])
I need a way to do this but I'm too new to numpy to know how to.
(I'm assuming you want what you asked for in the first half.)
I believe better or more elegant solutions exist, but this is what I think can do the job.
np.all(arr[:,6] == 1) and np.all(arr[:, :6] == 0) and np.all(arr[:, 7:] == 0)
Alternatively, you can construct the array (with 0's and 1's) and then just compare with it, say using not_equal.
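For example, a minimal sketch of that comparison idea, assuming arr from the question is defined; the reference positions used below are made up purely for illustration:
import numpy as np

# Build one reference block with a single 1 per hidden row (assumed positions)
reference = np.zeros((6, 6), dtype=int)
reference[np.arange(6), [0, 2, 5, 5, 0, 4]] = 1

candidate = arr[0].reshape(6, 6)              # first large row viewed as 6 hidden rows
differs = np.not_equal(candidate, reference)  # elementwise comparison
print(differs.any())                          # True if the row deviates from the reference anywhere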
I'm also not 100% sure of your question, but I'll try to answer to the best of my knowledge.
Since you're saying your matrix has "hidden rows", to check whether it is well formed, the easiest way seems to be to just reshape it:
# First check, returns true if all elements are either 0 or 1
np.in1d(arr, [0,1]).all()
# Second check, provided the above was True, returns True if
# each "hidden row" has exactly one 1 and other 0.
(arr.reshape(6,6,6).sum(axis=2) == 1).all()
Both checks return "True" for your arr.
Now, my understanding is that for each "large" row of 36 elements, you want a scalar product with your "table" vector, unless that "large" row has an ill-formed "hidden small" row. In this case, I'd do something like:
# The following computes the result, not checking for integrity
results = arr.dot(table)
# Now remove the results that are not well formed.
# First, compute "large" rows where at least one "small" subrow
# fails the condition.
mask = (arr.reshape(6,6,6).sum(axis=2) != 1).any(axis=1)
# And set the corresponding answer to 0
results[mask] = 0
However, running this code against your data returns as answer
array([38, 31, 24, 24, 32, 20])
which is not what you mention; did I misunderstand your requirement, or was the example based on different data?
I am working on PageRank for a school project, and I have a matrix where row "i" represents the links from site j to site i. (If it is still unclear I'll explain more.)
The current part is:
Z=[[0,1,1,1,1,0,1,0,0,0,0,0,0,0],
   [1,0,0,0,1,0,0,0,0,0,0,0,0,0],
   [1,1,0,0,0,0,0,0,0,0,0,0,0,0],
   [1,0,1,0,0,0,0,0,0,0,0,0,0,0],
   [1,0,0,1,0,0,0,0,0,0,0,0,0,0],
   [1,0,0,0,0,0,0,1,0,1,0,0,0,0],
   [0,0,0,0,0,1,0,0,0,0,0,0,0,0],
   [0,0,0,0,0,1,1,0,1,0,0,0,0,0],
   [0,0,0,0,0,1,0,0,0,0,0,0,0,0],
   [0,0,0,0,0,0,0,0,1,0,1,1,1,1],
   [0,0,0,0,0,0,0,0,0,1,0,0,0,1],
   [0,0,0,0,0,0,0,0,0,1,1,0,0,0],
   [0,0,0,0,0,0,0,0,0,1,0,1,0,0],
   [0,0,0,0,0,0,0,0,0,1,0,0,1,0]]
A=np.matrix(Z)
G=nx.from_numpy_matrix(A,create_using=nx.MultiDiGraph())
pos=nx.circular_layout(G)
labels={}
for i in range(N):
    labels[i] = i + 1
nx.draw_circular(G)
nx.draw_networkx_labels(G,pos,labels,font_size=15)
The problem I have is that the labels are not where they are supposed to be; it seems that networkx is just placing them clockwise...
Also, how could I easily direct the graph, so that a link from j to i won't be drawn as going from i to j?
Thanks!
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
Z = [[0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]]
G = nx.from_numpy_matrix(np.array(Z), create_using=nx.MultiDiGraph())
pos = nx.circular_layout(G)
nx.draw_circular(G)
labels = {i : i + 1 for i in G.nodes()}
nx.draw_networkx_labels(G, pos, labels, font_size=15)
plt.show()
yields
This result appears correct to me. Notice, for example, that the node labeled 1 has directed edges pointing to 2, 3, 4, 5 and 7. This corresponds to the ones on the first row in the array, Z[0]:
[0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
since the first row corresponds to node 1, and the ones in this row occur in the columns corresponding to nodes 2, 3, 4, 5 and 7.
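Regarding the second part of the question (making a link from site j to site i point in the link's direction): assuming the convention that row i of Z lists the sites linking to site i, one option is to build the graph from the transpose of Z, which reverses every edge. This is only a sketch of that idea, not part of the code above:
# Z[i][j] == 1 means "site j links to site i"; in Z.T that entry sits at [j][i],
# so from_numpy_matrix then creates the directed edge j -> i.
G_rev = nx.from_numpy_matrix(np.array(Z).T, create_using=nx.MultiDiGraph())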
I am still working on a fingerprint image preprocessor in Python. I see that MATLAB has a special function to remove H breaks and spurs:
bwmorph(a , 'hbreak')
bwmorph(a , 'spur')
I have searched scikit-image, OpenCV and others but couldn't find an equivalent for these two uses of bwmorph. Can anybody point me in the right direction, or do I have to implement my own?
Edit October 2017
the skimage module now has at least 2 options:
skeletonize and thin
Example with comparison
from skimage.morphology import thin, skeletonize
import numpy as np
import matplotlib.pyplot as plt
square = np.zeros((7, 7), dtype=np.uint8)
square[1:-1, 2:-2] = 1
square[0, 1] = 1
thinned = thin(square)
skel = skeletonize(square)
f, ax = plt.subplots(2, 2)
ax[0,0].imshow(square)
ax[0,0].set_title('original')
ax[0,0].get_xaxis().set_visible(False)
ax[0,1].axis('off')
ax[1,0].imshow(thinned)
ax[1,0].set_title('morphology.thin')
ax[1,1].imshow(skel)
ax[1,1].set_title('morphology.skeletonize')
plt.show()
Original post
I found this solution by joefutrelle on GitHub.
It seems (visually) to give results similar to the MATLAB version.
Hope that helps!
Edit:
As pointed out in the comments, I'll extend my initial post since the mentioned link might change:
Looking for a Python substitute for MATLAB's bwmorph, I stumbled upon the following code from joefutrelle on GitHub (included at the end of this post, as it's very long).
I have figured out two ways to implement this into my script (I'm a beginner and I'm sure there are better ways!):
1) Copy the whole code into your script and then call the function (but this makes the script harder to read).
2) Copy the code into a new Python file 'foo' and save it. Then copy that file into the Python\Lib folder (e.g. C:\Program Files\Python35\Lib). In your original script you can call the function by writing:
from foo import bwmorph_thin
Then you'll feed the function with your binary image:
skeleton = bwmorph_thin(foo_image, n_iter = math.inf)
import numpy as np
from scipy import ndimage as ndi
# lookup tables for bwmorph_thin
G123_LUT = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1,
0, 0, 0], dtype=np.bool)
G123P_LUT = np.array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0], dtype=np.bool)
def bwmorph_thin(image, n_iter=None):
    """
    Perform morphological thinning of a binary image

    Parameters
    ----------
    image : binary (M, N) ndarray
        The image to be thinned.
    n_iter : int, number of iterations, optional
        Regardless of the value of this parameter, the thinned image
        is returned immediately if an iteration produces no change.
        If this parameter is specified it thus sets an upper bound on
        the number of iterations performed.

    Returns
    -------
    out : ndarray of bools
        Thinned image.

    See also
    --------
    skeletonize

    Notes
    -----
    This algorithm [1]_ works by making multiple passes over the image,
    removing pixels matching a set of criteria designed to thin
    connected regions while preserving eight-connected components and
    2 x 2 squares [2]_. In each of the two sub-iterations the algorithm
    correlates the intermediate skeleton image with a neighborhood mask,
    then looks up each neighborhood in a lookup table indicating whether
    the central pixel should be deleted in that sub-iteration.

    References
    ----------
    .. [1] Z. Guo and R. W. Hall, "Parallel thinning with
           two-subiteration algorithms," Comm. ACM, vol. 32, no. 3,
           pp. 359-373, 1989.
    .. [2] Lam, L., Seong-Whan Lee, and Ching Y. Suen, "Thinning
           Methodologies-A Comprehensive Survey," IEEE Transactions on
           Pattern Analysis and Machine Intelligence, Vol 14, No. 9,
           September 1992, p. 879

    Examples
    --------
    >>> square = np.zeros((7, 7), dtype=np.uint8)
    >>> square[1:-1, 2:-2] = 1
    >>> square[0,1] = 1
    >>> square
    array([[0, 1, 0, 0, 0, 0, 0],
           [0, 0, 1, 1, 1, 0, 0],
           [0, 0, 1, 1, 1, 0, 0],
           [0, 0, 1, 1, 1, 0, 0],
           [0, 0, 1, 1, 1, 0, 0],
           [0, 0, 1, 1, 1, 0, 0],
           [0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
    >>> skel = bwmorph_thin(square)
    >>> skel.astype(np.uint8)
    array([[0, 1, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
    """
    # check parameters
    if n_iter is None:
        n = -1
    elif n_iter <= 0:
        raise ValueError('n_iter must be > 0')
    else:
        n = n_iter

    # check that we have a 2d binary image, and convert it
    # to uint8
    skel = np.array(image).astype(np.uint8)

    if skel.ndim != 2:
        raise ValueError('2D array required')
    if not np.all(np.in1d(image.flat,(0,1))):
        raise ValueError('Image contains values other than 0 and 1')

    # neighborhood mask
    mask = np.array([[ 8,  4,  2],
                     [16,  0,  1],
                     [32, 64,128]], dtype=np.uint8)

    # iterate either 1) indefinitely or 2) up to iteration limit
    while n != 0:
        before = np.sum(skel)  # count points before thinning

        # for each subiteration
        for lut in [G123_LUT, G123P_LUT]:
            # correlate image with neighborhood mask
            N = ndi.correlate(skel, mask, mode='constant')
            # take deletion decision from this subiteration's LUT
            D = np.take(lut, N)
            # perform deletion
            skel[D] = 0

        after = np.sum(skel)  # count points after thinning

        if before == after:
            # iteration had no effect: finish
            break

        # count down to iteration limit (or endlessly negative)
        n -= 1

    return skel.astype(np.bool)
"""
# here's how to make the LUTs
def nabe(n):
return np.array([n>>i&1 for i in range(0,9)]).astype(np.bool)
def hood(n):
return np.take(nabe(n), np.array([[3, 2, 1],
[4, 8, 0],
[5, 6, 7]]))
def G1(n):
s = 0
bits = nabe(n)
for i in (0,2,4,6):
if not(bits[i]) and (bits[i+1] or bits[(i+2) % 8]):
s += 1
return s==1
g1_lut = np.array([G1(n) for n in range(256)])
def G2(n):
n1, n2 = 0, 0
bits = nabe(n)
for k in (1,3,5,7):
if bits[k] or bits[k-1]:
n1 += 1
if bits[k] or bits[(k+1) % 8]:
n2 += 1
return min(n1,n2) in [2,3]
g2_lut = np.array([G2(n) for n in range(256)])
g12_lut = g1_lut & g2_lut
def G3(n):
bits = nabe(n)
return not((bits[1] or bits[2] or not(bits[7])) and bits[0])
def G3p(n):
bits = nabe(n)
return not((bits[5] or bits[6] or not(bits[3])) and bits[4])
g3_lut = np.array([G3(n) for n in range(256)])
g3p_lut = np.array([G3p(n) for n in range(256)])
g123_lut = g12_lut & g3_lut
g123p_lut = g12_lut & g3p_lut
"""`
You will have to implement those on your own since they aren't present in OpenCV or skimage as far as I know.
However, it should be straightforward to check how MATLAB's code works and write your own version in Python/NumPy.
Here is a guide for MATLAB users that describes NumPy functions in detail, with hints on equivalent functions in MATLAB and NumPy:
Link
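If you do end up writing your own, here is a rough sketch of a 'spur'-style operation: it repeatedly deletes endpoint pixels (foreground pixels with at most one 8-connected neighbour). This is only an approximation of bwmorph(a, 'spur'), not a port of MATLAB's lookup tables, and the function name is made up:
import numpy as np
from scipy import ndimage as ndi

def remove_spurs(img, n_iter=1):
    # Remove endpoint pixels (at most one 8-connected neighbour) n_iter times
    skel = np.asarray(img, dtype=bool).copy()
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]], dtype=int)
    for _ in range(n_iter):
        neighbours = ndi.convolve(skel.astype(int), kernel, mode='constant')
        endpoints = skel & (neighbours <= 1)
        if not endpoints.any():
            break
        skel[endpoints] = False
    return skel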
I'm writing a script that I'm going to use quite often, with datasets of different sizes, and I have to do some comparisons that I just can't get straight in Python.
There will be multiple lists (around 20 or more, but I've reduced them to three for example and testing purposes), all with the same number of integer items in a certain order. I want to compare items on the same position in every list to find differences.
For a defined number of lists, this is easy:
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for x,y,z in zip(a,b,c):
    if x != y != z:
        print x, y, z
I've tried wrapping that loop in a function, so the number of arguments can vary, but there I got stuck.
def compare(*args):
    for x in zip(args):
        ???
In the final script I will have not multiple single lists, but all together in one list of list. Would that help? If I loop through a list of lists, I won't get every list at once...
Forget the function; it's not really useful anyway, as it will be part of a bigger script and it's too difficult to define the different arguments.
I'm now comparing two lists at a time, saving those that are identical. That way, I can later easily remove all those from my whole list and keep only the unique ones.
l_o_l = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
duplicates = []  # identifiers of lists that occur more than once
for i in range(0, (len(l_o_l)-1)):
    for j in range((i+1), len(l_o_l)):
        if l_o_l[i] == l_o_l[j]:
            duplicates.append(key_list[i])
            duplicates.append(key_list[j])

dup = list(set(duplicates))
uniques = [x for x in key_list if x not in dup]
where the key_list contains, from a dictionary, identifiers for my lists.
Any suggestions for improvement?
Maybe something like this
def compare(*args):
    for things in zip(*args):
        yield all(x == things[0] for x in things)
You can then use it like this
a = range(10)
b = range(10)
c = range(10)
d = range(11, 20)
for match in compare(a,b,c):
    print match

for match in compare(a,b,c,d):
    print match
Here is a demo using your example (it's a generator, so you have to iterate over it or exhaust it using list):
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print list(compare(a,b,c))
def compare(*args):
    for x in zip(*args):
        values_list = list(x)  # x is a tuple holding one value from each list
        different_values = set(values_list)  # a set does not contain identical values
        if len(different_values) != 1:  # more than 1 distinct value means the lists differ here
            print 'different values', values_list
gives you
a = [0, 0, 1]
b = [0, 1, 1]
c = [1, 1, 1]
compare(a, b, c)
>>> different values [0, 0, 1]
>>> different values [0, 1, 1]
Assuming the lists are similar to the ones in the example, I would use:
def compare(*args):
    for x in zip(*args):
        if min(x) != max(x):
            print x
def compare(elements):
    return len(set(elements)) == bool(elements)
If you want to know whether all the lists are the same you can simply do:
all(compare(elements) for elements in zip(*the_lists))
An alternative could be to transform the lists into tuples and use set there:
len(set(tuple(the_list) for the_list in the_lists)) == bool(the_lists)
If you simply want to remove duplicates this should be faster:
the_lists = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
Example usage:
>>> a = range(100)
>>> b = range(100, 200)
>>> c = range(200, 300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists2 = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
>>> [a,b,c] == sorted(the_lists2) #order is not maintained by set
True
It seems to be pretty fast:
>>> timeit.timeit('[list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]', 'from __main__ import the_lists', number=1000000)
7.949447154998779
Less than 8 seconds for executing 1 million times. (Where the_lists is the same used before.)
Edit:
If you want to remove only the duplicated lists, then the simplest algorithm I can think of is sorting the list-of-lists and using itertools.groupby:
>>> a = range(100)
>>> b = range(100,200)
>>> c = range(200,300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists.sort()
>>> import itertools as it
>>> for key, group in it.groupby(the_lists):
...     if len(list(group)) == 1:
...         print key
...
[200, 201, 202, ..., 297, 298, 299]
I think trying to get clever with *args and zip is just confusing the issue. I would write it something like this:
def compare(list_of_lists):
    # assuming not an empty data set
    inner_len = len(list_of_lists[0])
    for index in range(inner_len):
        expected = list_of_lists[0][index]
        for inner_list in list_of_lists:
            if inner_list[index] != expected:
                pass  # report difference at this index
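A hypothetical way to fill in the reporting step and try it out (the print format and the break are assumptions, not part of the sketch above):
def compare_report(list_of_lists):
    # assuming not an empty data set
    inner_len = len(list_of_lists[0])
    for index in range(inner_len):
        expected = list_of_lists[0][index]
        for inner_list in list_of_lists:
            if inner_list[index] != expected:
                values = [lst[index] for lst in list_of_lists]
                print('difference at index %d: %r' % (index, values))
                break  # report each differing index only once

compare_report([[0, 0, 1], [0, 1, 1], [1, 1, 1]])
# difference at index 0: [0, 0, 1]
# difference at index 1: [0, 1, 1]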