Creating a multidimensional array (D>5) from a dictionary..?

Creating a multidimensional array (D>5) from a dictionary..? - python

I am trying to build a multidimensional array using vectors of different lengths to map out the 'process space' of a problem. I've started by storing values in keys of a dictionary:
d = {'width' : [1,2,3,5,3,5,3],
'height' : [1,2,3,5,5,3],
'length' : [1,3,3,7,8,0,0,7,2,3,6,3,2,3],
'composition' : [1,2,3,5,5,3],
'year' : [7,5,3,2,1,6,4,9,11],
'efficiency' : [1,1,2,3,5,8,13,21,34]}
Is it possible to use these keys to construct a multidimensional (6D) matrix of size
(7,6,14,6,9,9)? (That is, each dictionary key would be represented as a separate dimension of the final array)
EDIT:
I would like to use this matrix as a means of looking at a cross section of the data. For example, I would like to be able to say, "Here are all the efficiency values as a function of 'Length', given:
width = 4
height = 2
composition = 3
year = 7

I think you are naming the columns as dimensions.
Since you have indexes and data, use pandas DataFrames
from pandas import Series, DataFrame
d = {'width' : [1,2,3,5,3,5,3],
'height' : [1,2,3,5,5,3],
'length' : [1,3,3,7,8,0,0,7,2,3,6,3,2,3],
'composition' : [1,2,3,5,5,3],
'year' : [7,5,3,2,1,6,4,9,11],
'efficiency' : [1,1,2,3,5,8,13,21,34]}
Since there is missing data you need a intermediate step until you can turn it into a DataFrame.
intermediate=dict()
for x in d:
intermediate[x]=Series(d[x])
data=DataFrame(intermediate)
then you can query data using normal pandas syntax.
data[data.length>5]
composition efficiency height length width year
3 5 3 5 7 5 2
4 5 5 5 8 3 1
7 NaN 21 NaN 7 NaN 9
10 NaN NaN NaN 6 NaN NaN

Basic principle
The simples and most efficient way would be using NumPy.
d = {'width' : [1,2,3,5,3,5,3],
'height' : [1,2,3,5,5,3],
'length' : [1,3,3,7,8,0,0,7,2,3,6,3,2,3],
'composition' : [1,2,3,5,5,3],
'year' : [7,5,3,2,1,6,4,9,11],
'efficiency' : [1,1,2,3,5,8,13,21,34]}
You need an order of your names:
names = ['width' ,'height', 'length' ,'composition', 'year','efficiency']
Import NumPy:
import numpy as np
Find the shape:
shape = tuple(len(d[name]) for name in names)
shape is:
(7, 6, 14, 6, 9, 9)
Create an array of zeros:
lookup = np.zeros(shape, dtype=np.uint16)
I use very small unsigned integers to save space. You can use larger numbers if needed:
Now lookup can be used like this:
>>> lookup[0, 0, 0, 0, 0, 0]
0
>>> lookup[0, 0, 0, 0, 0, 0] = 12
>>> lookup[0, 0, 0, 0, 0, 0]
12
Lookup all values for efficiency:
>>> lookup[0, 0, 0, 0, 0, :]
array([12, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint16)
All values for year and efficiency:
>>> lookup[0, 0, 0, 0, :, :]
array([[12, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)
Useful class
For convenience, wrap into a class:
class Lookup(object):
def __init__(self, dims, dtype=np.uint16):
self.names = [item[0] for item in dims]
self.shape = [item[1] for item in dims]
self.repr = np.zeros(self.shape, dtype=dtype)
def _make_loc(self, coords):
return [coords.get(name, slice(None)) for name in self.names]
def get_value(self, coords):
return self.repr.__getitem__(self._make_loc(coords))
def set_value(self, coords, value):
return self.repr.__setitem__(self._make_loc(coords), value)
Specify the dimensions:
dims = [('width', 7),
('year', 9),
('composition', 6),
('height', 6),
('efficiency', 9),
('length', 14)]
Make an instance:
lookup = Lookup(dims)
Set a value:
coords1 = {'width': 3,
'height': 1,
'composition': 2,
'year': 6,
'length': 3}
lookup.set_value(coords1, 11)
Get a value back:
coords2 = {'width': 3,
'height': 1,
'composition': 2,
'year': 6}
lookup.get_value(coords2)
Gives you :
array([[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)

Related

Converting an array to a list in Python

I have an array A. I want to identify all locations with element 1 and convert it to a list as shown in the expected output. But I am getting an error.
import numpy as np
A=np.array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
B=np.where(A==1)
B=B.tolist()
print(B)
The error is
in <module>
B=B.tolist()
AttributeError: 'tuple' object has no attribute 'tolist'
The expected output is
[1, 2, 5, 7, 10, 11]

np.where used with only the condition returns a tuple of arrays containing indices; one array for each dimension of the array. According to the docs, this is much like np.nonzero, which is the recommended approach over np.where. So, since your array is one dimensional, np.where will return a tuple with one element, inside of which is the array containing the indices in your expected output. You can resolve your problem by accessing into the tuple like np.where(A == 1)[0].tolist().
However, I recommend using np.flatnonzero instead, which avoids the hassle entirely:
import numpy as np
A = np.array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
B = np.flatnonzero(A).tolist()
B:
[1, 2, 5, 7, 10, 11]
PS: when all other elements are 0, you don't have to explicitly compare to 1 ;).

import numpy as np
A = np.array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
indices = np.where(A == 1)[0]
B = indices.tolist()
print(B)

You should access the first element of this tuple with B[0] :
import numpy as np
A=np.array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
B=np.where(A==1)
B = B[0].tolist()
print(B) # [1, 2, 5, 7, 10, 11]

Error while converting a pandas dataframe to spark Dataframe

My Pandas DataFrame
df4.head()
features
0 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...
1 [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, ...
Each cell is a python list.
mySchema=StructType([StructField("features",ArrayType(IntegerType()),True)])
sdf2=sqlCtx.createDataFrame(df4,schema=mySchema)
While creating spark Dataframe sdf2, I am getting following error. I tried with different datatypes but in vain.
Error: element in array field features: IntegerType can not accept object 0 in type <class 'numpy.int64'>
I want to run BucketedRandomProjectionLSH in Pysark which accepts a single column with data vector.

That is because you have numpy.int64 objects inside your arrays.
Spark does not accept that.
df = pd.DataFrame([
(np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]),),
(np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]),),
], columns = ['features'])
type(df.iloc[0]['features'][0])
> numpy.int64
df = pd.DataFrame([
([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],),
([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],),
], columns = ['features'])
type(df.iloc[0]['features'][0])
> int
Try using a Python list instead.

Replacing part of Numpy 2D Array using list in both rows and columns

Let's import numpy first,
import numpy as np
For example, I have a matrix A as,
A = np.identity(10)
I have two other matrices as,
B = np.random.sample((4, 4))
C = np.random.sample((6, 6))
In addition, I have two lists of indices as,
idx_1 = [1, 2, 4, 7]
idx_2 = [0, 3, 5, 6, 8, 9]
Now I want to replace idx_1 rows and columns of A by B and idx_2 rows and columns of A by C. The final A matrix will be a block diagonal matrix.
What is the efficient way to achieve this?
I tried as follows but I did not change A, I don't know why, but I did not get any error as well.
A[idx_1][:,idx_1] = B

In [99]: A = np.identity(10).astype(int)
In [100]: A
Out[100]:
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
In [101]: idx_1 = [1, 2, 4, 7]
I can select a set of diagonal values with:
In [102]: A[idx_1, idx_1]
Out[102]: array([1, 1, 1, 1])
In [103]: A[idx_1, idx_1] = [10,20,30,40]
In [104]: A
Out[104]:
array([[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 10, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 20, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 30, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 40, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
But it looks like you want to replace a (4,4) block of values. We have to construct a pair on indices that together broadcast to the shape, that is (4,1) array with a (1,4) array.
In [105]: np.ix_(idx_1, idx_1)
Out[105]:
(array([[1],
[2],
[4],
[7]]), array([[1, 2, 4, 7]]))
In [106]: A[np.ix_(idx_1, idx_1)]
Out[106]:
array([[10, 0, 0, 0],
[ 0, 20, 0, 0],
[ 0, 0, 30, 0],
[ 0, 0, 0, 40]])
In [107]: A[np.ix_(idx_1, idx_1)] += 1
In [108]: A
Out[108]:
array([[ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 11, 1, 0, 1, 0, 0, 1, 0, 0],
[ 0, 1, 21, 0, 1, 0, 0, 1, 0, 0],
[ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[ 0, 1, 1, 0, 31, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[ 0, 1, 1, 0, 1, 0, 0, 41, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
The equivalent indexing with nested lists is:
In [109]: A[[[1],[2],[4],[7]],[[1,2,4,7]]]
Out[109]:
array([[11, 1, 1, 1],
[ 1, 21, 1, 1],
[ 1, 1, 31, 1],
[ 1, 1, 1, 41]])
Regarding your indexing attempt:
In [110]: A[idx_1] # A[idx_1,:]
Out[110]:
array([[ 0, 11, 1, 0, 1, 0, 0, 1, 0, 0],
[ 0, 1, 21, 0, 1, 0, 0, 1, 0, 0],
[ 0, 1, 1, 0, 31, 0, 0, 1, 0, 0],
[ 0, 1, 1, 0, 1, 0, 0, 41, 0, 0]])
In [111]: A[idx_1][:,idx_1]
Out[111]:
array([[11, 1, 1, 1],
[ 1, 21, 1, 1],
[ 1, 1, 31, 1],
[ 1, 1, 1, 41]])
In[111] is evaluated in 2 steps; first rows are selected, and then columns.
In
A[idx_1][:,idx_1] = B
the values of B will replace columns in Out[110]. But that's a copy of values from A, not a view. So a good grasp of the difference between view and copy, and between basic and advanced indexing is important when working with numpy.

Python Equivalent for bwmorph

I am still coding a fingerprint image preprocessor on Python. I see in MATLAB there is a special function to remove H breaks and spurs:
bwmorph(a , 'hbreak')
bwmorph(a , 'spur')
I have searched scikit, OpenCV and others but couldn't find an equivalent for these two use of bwmorph. Can anybody point me to right direction or do i have to implement my own?

Edit October 2017
the skimage module now has at least 2 options:
skeletonize and thin
Example with comparison
from skimage.morphology import thin, skeletonize
import numpy as np
import matplotlib.pyplot as plt
square = np.zeros((7, 7), dtype=np.uint8)
square[1:-1, 2:-2] = 1
square[0, 1] = 1
thinned = thin(square)
skel = skeletonize(square)
f, ax = plt.subplots(2, 2)
ax[0,0].imshow(square)
ax[0,0].set_title('original')
ax[0,0].get_xaxis().set_visible(False)
ax[0,1].axis('off')
ax[1,0].imshow(thinned)
ax[1,0].set_title('morphology.thin')
ax[1,1].imshow(skel)
ax[1,1].set_title('morphology.skeletonize')
plt.show()
Original post
I have found this solution by joefutrelle on github.
It seems (visually) to give similar results as the Matlab version.
Hope that helps!
Edit:
As it was pointed out in the comments, I'll extend my initial post as the mentioned link might change:
Looking for a substitute in Python for bwmorph from Matlab I stumbled upon the following code from joefutrelle on Github (at the end of this post as it's very long).
I have figured out two ways to implement this into my script (I'm a beginner and I'm sure there are better ways!):
1) copy the whole code into your script and then call the function (but this makes the script harder to read)
2) copy the code it in a new python file 'foo' and save it. Now copy it in the Python\Lib (eg. C:\Program Files\Python35\Lib) folder. In your original script you can call the function by writing:
from foo import bwmorph_thin
Then you'll feed the function with your binary image:
skeleton = bwmorph_thin(foo_image, n_iter = math.inf)
import numpy as np
from scipy import ndimage as ndi
# lookup tables for bwmorph_thin
G123_LUT = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1,
0, 0, 0], dtype=np.bool)
G123P_LUT = np.array([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0], dtype=np.bool)
def bwmorph_thin(image, n_iter=None):
"""
Perform morphological thinning of a binary image
Parameters
----------
image : binary (M, N) ndarray
The image to be thinned.
n_iter : int, number of iterations, optional
Regardless of the value of this parameter, the thinned image
is returned immediately if an iteration produces no change.
If this parameter is specified it thus sets an upper bound on
the number of iterations performed.
Returns
-------
out : ndarray of bools
Thinned image.
See also
--------
skeletonize
Notes
-----
This algorithm [1]_ works by making multiple passes over the image,
removing pixels matching a set of criteria designed to thin
connected regions while preserving eight-connected components and
2 x 2 squares [2]_. In each of the two sub-iterations the algorithm
correlates the intermediate skeleton image with a neighborhood mask,
then looks up each neighborhood in a lookup table indicating whether
the central pixel should be deleted in that sub-iteration.
References
----------
.. [1] Z. Guo and R. W. Hall, "Parallel thinning with
two-subiteration algorithms," Comm. ACM, vol. 32, no. 3,
pp. 359-373, 1989.
.. [2] Lam, L., Seong-Whan Lee, and Ching Y. Suen, "Thinning
Methodologies-A Comprehensive Survey," IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol 14, No. 9,
September 1992, p. 879
Examples
--------
>>> square = np.zeros((7, 7), dtype=np.uint8)
>>> square[1:-1, 2:-2] = 1
>>> square[0,1] = 1
>>> square
array([[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
>>> skel = bwmorph_thin(square)
>>> skel.astype(np.uint8)
array([[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
"""
# check parameters
if n_iter is None:
n = -1
elif n_iter <= 0:
raise ValueError('n_iter must be > 0')
else:
n = n_iter
# check that we have a 2d binary image, and convert it
# to uint8
skel = np.array(image).astype(np.uint8)
if skel.ndim != 2:
raise ValueError('2D array required')
if not np.all(np.in1d(image.flat,(0,1))):
raise ValueError('Image contains values other than 0 and 1')
# neighborhood mask
mask = np.array([[ 8, 4, 2],
[16, 0, 1],
[32, 64,128]],dtype=np.uint8)
# iterate either 1) indefinitely or 2) up to iteration limit
while n != 0:
before = np.sum(skel) # count points before thinning
# for each subiteration
for lut in [G123_LUT, G123P_LUT]:
# correlate image with neighborhood mask
N = ndi.correlate(skel, mask, mode='constant')
# take deletion decision from this subiteration's LUT
D = np.take(lut, N)
# perform deletion
skel[D] = 0
after = np.sum(skel) # coint points after thinning
if before == after:
# iteration had no effect: finish
break
# count down to iteration limit (or endlessly negative)
n -= 1
return skel.astype(np.bool)
"""
# here's how to make the LUTs
def nabe(n):
return np.array([n>>i&1 for i in range(0,9)]).astype(np.bool)
def hood(n):
return np.take(nabe(n), np.array([[3, 2, 1],
[4, 8, 0],
[5, 6, 7]]))
def G1(n):
s = 0
bits = nabe(n)
for i in (0,2,4,6):
if not(bits[i]) and (bits[i+1] or bits[(i+2) % 8]):
s += 1
return s==1
g1_lut = np.array([G1(n) for n in range(256)])
def G2(n):
n1, n2 = 0, 0
bits = nabe(n)
for k in (1,3,5,7):
if bits[k] or bits[k-1]:
n1 += 1
if bits[k] or bits[(k+1) % 8]:
n2 += 1
return min(n1,n2) in [2,3]
g2_lut = np.array([G2(n) for n in range(256)])
g12_lut = g1_lut & g2_lut
def G3(n):
bits = nabe(n)
return not((bits[1] or bits[2] or not(bits[7])) and bits[0])
def G3p(n):
bits = nabe(n)
return not((bits[5] or bits[6] or not(bits[3])) and bits[4])
g3_lut = np.array([G3(n) for n in range(256)])
g3p_lut = np.array([G3p(n) for n in range(256)])
g123_lut = g12_lut & g3_lut
g123p_lut = g12_lut & g3p_lut
"""`

You will have to implement those on your own since they aren't present in OpenCV or skimage as far as I know.
However, it should be straightforward to check MATLAB's code on how it works and write your own version in Python/NumPy.
Here is a guide describing in detail NumPy functions exclusively for MATLAB users, with hints on equivalent functions in MATLAB and NumPy:
Link

Compare a varying number of lists in Python

I'm writing a script that I'm going to use quite often, with datasets of different sizes, and I have to do some comparisons that I just can't get straight in Python.
There will be multiple lists (around 20 or more, but I've reduced them to three for example and testing purposes), all with the same number of integer items in a certain order. I want to compare items on the same position in every list to find differences.
For a defined number of lists, this is easy:
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for x,y,z in zip(a,b,c):
if x != y != z:
print x, y, z
I've tried wrapping that loop in a function, so the number of arguments can vary, but there I got stuck.
def compare(*args):
for x in zip(args):
???
In the final script I will have not multiple single lists, but all together in one list of list. Would that help? If I loop through a list of lists, I won't get every list at once...
Forget the function, it's not really useful anyway as it will be part of a bigger script and it's too difficult defining the different arguments.
I'm now comparing two lists at a time, saving those that are identical. That way, I can later easily remove all those from my whole list and keep only the unique ones.
l_o_l = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
for i in range(0, (len(l_o_l)-1)):
for j in range((i+1), len(l_o_l)):
if l_o_l[i] == l_o_l[j]:
duplicates.append(key_list[i])
duplicates.append(key_list[j])
dup = list(set(duplicates))
uniques = [x for x in key_list if x not in dup]
where the key_list contains, from a dictionary, identifiers for my lists.
Any suggestions for improvement?

Maybe something like this
def compare(*args):
for things in zip(*args):
yield all(x == things[0] for x in things)
You can then use it like this
a = range(10)
b = range(10)
c = range(10)
d = range(11, 20)
for match in compare(a,b,c):
print match
for match in compare(a,b,c,d):
print match
Here is a demo using your example (its a generator, so you have to iter over it or exhaust it using list)
a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 4, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
c = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print list(compare(a,b,c))

def compare(*args):
for x in zip(args):
values_list = list(x[0]) # x[0] because x is a tuple
different_values = set(values_list) # a set does not contain identical values
if len(different_values) != 1: # if you have more than 1 value you have different values in your list
print 'different values', values_list
gives you
a = [0, 0, 1]
b = [0, 1, 1]
c = [1, 1, 1]
compare(a, b, c)
>>> different values [0, 0, 1]
>>> different values [0, 1, 1]

Assuming the lists are similar to the ones in the example, I would use:
def compare(*args):
for x in zip(args):
if min(x) != max(x):
print x

def compare(elements):
return len(set(elements)) == bool(elements)
If you want to know whether all the lists are the same you can simply do:
all(compare(elements) for elements in zip(the_lists))
An alternative could be to transform the lists into tuples and use set there:
len(set(tuple(the_list) for the_list in the_lists) == bool(the_lists)
If you simply want to remove duplicates this should be faster:
the_lists = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
Example usage:
>>> a = range(100)
>>> b = range(100, 200)
>>> c = range(200, 300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists2 = [list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]
>>> [a,b,c] == sorted(the_lists2) #order is not maintained by set
True
It seems to be pretty fast:
>>> timeit.timeit('[list(elem) for elem in set(tuple(the_list) for the_list in the_lists)]', 'from __main__ import the_lists', number=1000000)
7.949447154998779
Less than 8 seconds for executing 1 million times. (Where the_lists is the same used before.)
Edit:
If you want to remove only the duplicated list then the simplest algorithm I can think of is sorting the list-of-lists and using itertools.groupby:
>>> a = range(100)
>>> b = range(100,200)
>>> c = range(200,300)
>>> d = a[:]
>>> e = b[:]
>>> the_lists = [a,b,c,d,e]
>>> the_lists.sort()
>>> import itertools as it
>>> for key, group in it.groupby(the_lists):
... if len(list(group)) == 1:
... print key
...
[200, 201, 202, ..., 297, 298, 299]

I think trying to get clever with *args and zip is just confusing the issue. I would write it something like this:
def compare(list_of_lists):
# assuming not an empty data set
inner_len = len(list_of_lists[0])
for index in range(inner_len):
expected = list_of_lists[0][index]
for inner_list in list_of_lists:
if inner_list[index] != expected:
# report difference at this index

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Creating a multidimensional array (D>5) from a dictionary..? - python

Related

Converting an array to a list in Python

Error while converting a pandas dataframe to spark Dataframe

Replacing part of Numpy 2D Array using list in both rows and columns

Python Equivalent for bwmorph

Compare a varying number of lists in Python

Categories

Resources