Find an easier way to compare two 2-d array's independence - python

My question
1. Intro
ka and kb are two 2-D arrays, both of shape 31*37.
They contain only two values: 0 and 1.
Independence: the number of grid cells where only ka[i, j] equals 1.
Plotted as masked arrays (np.ma), they look like this:
http://i4.tietuku.com/29adccd90484fe34.png
code here:
ka_select = np.ma.masked_less(ka,0.001)
pa =plt.pcolor(ka_select,cmap="Set1",alpha =0.7,facecolor = "k",edgecolor = 'k',zorder =1)
kb_select = np.ma.masked_less(kb,0.001)
pb =plt.pcolor(kb_select,cmap="Set1",alpha =0.7,facecolor = "k",edgecolor = 'k',zorder =1)
2. My early work
Compare the two arrays ka and kb.
If the values at index [i, j] are both equal to 1, the two arrays overlap in that grid cell.
Count the overlapping cells.
I have written some code that compares two 2-D arrays:
# repeat is the indicator matrix: 1 where ka and kb overlap at [i, j], else 0
repeat = np.zeros(ka.shape)
for i in range(ka.shape[0]):
    for j in range(ka.shape[1]):
        if (ka[i,j] == 1) & (kb[i,j] == 1):
            repeat[i,j] = 1
        else:
            repeat[i,j] = 0
rep.append(repeat.sum())
rep: the list that collects the overlap count for these two 2-D arrays.
http://i4.tietuku.com/7121ee003ce9d034.png
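The double loop in section 2 can likely be replaced by one vectorized expression. A minimal sketch, using small made-up arrays in place of the 31*37 grids (the values below are assumptions for illustration only):

```python
import numpy as np

# Small stand-ins for ka and kb (hypothetical values, not the real grids).
ka = np.array([[1, 0, 1],
               [0, 1, 1]])
kb = np.array([[1, 1, 0],
               [0, 1, 0]])

# True where both arrays hold 1 at the same [i, j] position.
repeat = (ka == 1) & (kb == 1)

# Number of overlapping grid cells.
overlap_count = int(repeat.sum())
print(overlap_count)
```

The boolean array plays the role of the repeat matrix, and its sum is the value that would be appended to rep.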
3. My question
When there are more than two 2-D numpy arrays, all of the same shape and containing only the values 0 and 1, how do I sum the overlap count?
I can compare the arrays pairwise in sequence, but then the repeated grid cells would be counted more than once.
More explanation
I want to count the cells of ka where ka = 1 but (kb & kc & ...) != 1 at grid[i,j] (this is what I call independence in the title).
If ka is only compared with kb, I can use rep to achieve that, but I haven't worked out a method for more than two arrays.

Why not sum the arrays kb, ... and test the resulting elements?
An example with three grids:
import numpy
# some random 0/1 arrays (the upper bound of randint is exclusive)
ka = numpy.random.randint(0, 2, size=(31, 37))
kb = numpy.random.randint(0, 2, size=(31, 37))
kc = numpy.random.randint(0, 2, size=(31, 37))
combined_rest = kb + kc
# count the cells where ka is 1 and kb, kc are not both 1
print("independence:", numpy.sum( (ka == 1) & (combined_rest < 2) ))
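The same idea extends to any number of arrays by stacking the others. The sketch below assumes the stricter reading of "independence", namely that ka is 1 and none of the other arrays is 1 at that cell; the helper name count_independent and the sample arrays are made up for illustration.

```python
import numpy as np

def count_independent(target, others):
    """Count cells where target is 1 and every array in others is not 1."""
    # True where at least one of the other arrays holds 1.
    any_other = np.any(np.stack(others) == 1, axis=0)
    return int(np.sum((target == 1) & ~any_other))

# Small hypothetical grids standing in for the 31*37 arrays.
ka = np.array([[1, 1, 0],
               [1, 0, 1]])
kb = np.array([[1, 0, 0],
               [0, 0, 1]])
kc = np.array([[0, 0, 1],
               [0, 1, 1]])

n_independent = count_independent(ka, [kb, kc])
print("independence:", n_independent)
```

If the intended meaning is instead "not all of the others are 1", replacing np.any with np.all in the helper gives that variant.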

Related

Transforming different arrays into a loop

I was wondering whether it is possible to turn the following process into a loop so that I can use one word for this (not a vector):
Data0 = np.zeros(dem0.shape, dtype=np.int32)
Data0[zipp[0] >= 0 ] = 1
Data1 = np.zeros(dem1.shape, dtype=np.int32)
Data1[zipp[1] >= 0 ] = 1
Data2 = np.zeros(dem2.shape, dtype=np.int32)
Data2[zipp[2] >= 0 ] = 1
Data3 = np.zeros(dem3.shape, dtype=np.int32)
Data3[zipp[3] >= 0 ] = 1
As you can see, there are 4 shapes for each layer (four layers total). I am trying to put a specific/corresponding "zipp" vector position to each dem.shape for each layer I have (in vector zipp[i] each i is an array of each dem).
What I want it to do is to replace with the number 1 those values greater than or equal to zero in the array contained in zipp[i] for each layer/shape/dem.
However, as a result, I must deliver this as a word not a vector or array, so I've been thinking of a loop but haven't been illuminated enough just yet.
Thank you :)
I'm not quite sure what you mean by delivering the result "as a word not a vector or array", but assuming all of these arrays have the same shape you can reduce this to a couple of lines (maybe someone else knows how to do it in 1):
data = np.zeros_like(zipp, dtype=np.int32)
data[zipp >= 0] = 1
If you just want to return a boolean array of where zipp is greater than or equal to 0, you can do that in 1 line like this (the name mask avoids shadowing the built-in bool):
mask = np.greater_equal(zipp, 0)
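If zipp is a list of separate arrays rather than one array, the four repeated blocks from the question can be collapsed with a loop. A minimal sketch, assuming each zipp[i] has the same shape as the corresponding dem; the sample values are invented for illustration:

```python
import numpy as np

# Hypothetical stand-ins for zipp[0] and zipp[1]; real layers may differ in shape.
zipp = [
    np.array([[-1.0, 0.0], [2.5, -3.0]]),
    np.array([[4.0, -0.5], [-2.0, 1.0]]),
]

# One 0/1 int32 array per layer: 1 where the value is >= 0, else 0.
data = [(layer >= 0).astype(np.int32) for layer in zipp]

for i, layer_flags in enumerate(data):
    print(i, layer_flags.tolist())
```

Here data[i] takes the place of Data0, Data1, and so on.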

How do you create multiple copies of the same item and add them to an array?

I'm trying to create an array of shape (1, inter) [i.e. 1 row, inter columns], where inter is user input.
If you look at the code below,
l_o_s, Inter, n_o_s, L, d_o_s are all from user inputs
The n_o_s represents the number of sections across the total length of the shaft that have lengths corresponding to the values in l_o_s and diameters corresponding to the values in d_o_s.
So
Section 1 has a length of 1.5 and diameter 3.75
Section 2 = length of 4.5-1.5 = 3 and diameter 3.5
Section 3 = length of 7.5-4.5 = 3 and diameter 3.75
and so forth...
Here's an image of the shaft arrangement:
This is a shaft of length = 36, with 13 sections that have different size diameters
Inter is the number of intervals I require in the analysis, in this case inter is 3600, so I require a (1,3600) array.
si is an array that is a function (mathematical) of the length of the individual section in l_o_s, the total length (L) of the system and the interval (Inter).
Here's the question
So if you take every value in
si = [ 150. 450. 750. 1050. 1350. 1650. 1950. 2250. 2550. 2850. 3150. 3450. 3600.]
I require an array of shape (1,3600) whose first 150 elements are all equal to the diameter of section 1 (3.75), whose elements from 150 to 450 equal the diameter of the second section (3.5), and so forth.
So I need the first 150 elements to correspond to index 0 in d_o_s, the next 300 elements to index 1 in d_o_s, etc.
Here's the code I began with, but I don't think it's worth discussing. I was creating an array of zeros with inner shapes corresponding to each of the 150, 300, 300, 300, ... elements.
import numpy as np
import math
L = 36
Inter = 3600
n_o_s = 13
l_o_s = np.asarray([1.5,4.5,7.5,10.5,13.5,16.5,19.5,22.5,25.5,28.5,31.5,34.5,36])
d_o_s = np.asarray([3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75])
si = np.asarray((l_o_s/L)*Inter)
print(si)
z = (si.size)
def f(x):
    for i in si:
        zz = np.zeros((x,1,int(i)))
        for j in range(int(z)):
            for p in range(int(d_o_s[j])):
                zz[j][0][p] = np.full((1,int(i)),(math.pi*d_o_s**4)/64)
    return zz
print(f(z))
Any ideas,
Dallan
This is what I ended up with, but I'm only getting 3599 values instead of the required 3600. Any ideas? I used the diameters to compute another variable (basically swapped the diameters in d_o_s for the values in i_o_s).
L = 36
Inter = 3600
n_o_s = 13
l_o_s = np.asarray([0,1.5,4.5,7.5,10.5,13.5,16.5,19.5,22.5,25.5,28.5,31.5,34.5,36])
d_o_s = np.asarray([3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75,3.5,3.75])
i_o_s = (math.pi*d_o_s**4)/64
si = np.asarray((l_o_s/L)*Inter)
lengths = si[1:] - si[:-1]
Iu = np.asarray(sum([[value]*(int(length)) for value, length in zip(i_o_s, lengths)], []))
print(Iu,Iu.shape)
In Python, an operation like 4 * [1] produces [1,1,1,1]. So you need to calculate the lengths of the subarrays, create them, and concatenate them using sum().
lengths = si[1:] - si[:-1]
result = sum([
    [value]*length for value, length in zip(d_o_s, lengths)
], [])
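Also, your si array is of type float, so you get rounding errors when its values are used as lengths or indices: truncating a length such as 299.99999 gives 299, and an element goes missing. Convert si to integers, rounding first so that values just below a whole number are not truncated, by changing
si = np.asarray((l_o_s/L)*Inter)
to
si = np.rint((l_o_s/L)*Inter).astype(int)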
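As an alternative to concatenating Python lists with sum(), NumPy's np.repeat can build the same array directly. A minimal sketch, assuming the version of l_o_s with the leading 0 from the follow-up code and rounding si before the integer cast:

```python
import numpy as np

L = 36
Inter = 3600
l_o_s = np.asarray([0, 1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5,
                    22.5, 25.5, 28.5, 31.5, 34.5, 36])
d_o_s = np.asarray([3.75, 3.5, 3.75, 3.5, 3.75, 3.5, 3.75,
                    3.5, 3.75, 3.5, 3.75, 3.5, 3.75])

# Section boundaries as integer interval indices (rounded, not truncated).
si = np.rint((l_o_s / L) * Inter).astype(int)

# Number of intervals in each section.
lengths = np.diff(si)

# Repeat each diameter lengths[k] times, then shape the result as one row.
diameters = np.repeat(d_o_s, lengths).reshape(1, -1)
print(diameters.shape)
```

Swapping d_o_s for i_o_s in the np.repeat call would give the per-interval values of the derived quantity instead.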

Python: "setting an array element with a sequence" error when adding lists with numpy

I am writing a Python program that adds two lists. The first is a 10*3*11 list (3 dimensions), and the second is also a 10*3*11 list whose elements are all 0. I add them using numpy:
data_split_count = 10
cluster_number = 3
total_center_list = [[[[0] for i in range(11)] for j in range(cluster_number)] for kj in range(data_split_count)]
print("1 len total center list")
print(len(total_center_list))
total_center_data_list = minibatchkmeansClustering_no_gender(data_list)
print("total center list")
print(len(total_center_data_list))
print("total center list 0")
print(len(total_center_data_list[0]))
print("total center list 0 0")
print(len(total_center_data_list[0][0]))
print(total_center_data_list[0][1])
print("sum total center list")
temp_test = numpy.array([total_center_data_list,total_center_list])
total_center_list = temp_test.sum(axis = 0)
print(len(total_center_list))
When running, it shows:
1 len total center list
10
total center list
10
total center list 0
3
total center list 0 0
11
[ 0.07459313 0.05333272 0.01219305 0.32307944 0.16194127 0.00409273
0.34603601 0.33625275 0.06253664 0.1693817 0.08579227]
sum total center list
File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_stability.py", line 36, in dist_cal
temp_test = numpy.array([total_center_data_list,total_center_list])
ValueError: setting an array element with a sequence
Could you please tell me the reason and how to solve it?
If you would like to use numpy, it operates on arrays of data. You have to convert your lists to arrays using asarray. Then you can just add two arrays together element-wise, using "+".
import numpy as np
list1=range(3*5*11) # list1 = your total_center_list
a1=np.asarray(list1).reshape((3,5,11)) # converted to numpy array, reshaped to match your dimensions
print(a1)
list2=range(3*5*11) # list2 = your total_center_data_list
a2=np.asarray(list2).reshape(3,5,11)
a3=a1+a2 # your sum array
print(a3.shape) # checks dimensions
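Judging only from the code in the question, one likely cause of the error is that [0] makes every cell of total_center_list a one-item list, so the zero list has shape (10, 3, 11, 1) while the clustering output has shape (10, 3, 11). The sketch below illustrates that under this assumption; the random data is a made-up stand-in for the output of minibatchkmeansClustering_no_gender.

```python
import numpy as np

data_split_count = 10
cluster_number = 3

# Hypothetical stand-in for the clustering output: 10 x 3 rows of 11 floats.
total_center_data_list = np.random.random(
    (data_split_count, cluster_number, 11)).tolist()

# As in the question: [0] nests one level deeper than a plain 0 would.
nested_zeros = [[[[0] for i in range(11)] for j in range(cluster_number)]
                for kj in range(data_split_count)]
nested_shape = np.asarray(nested_zeros).shape
print(nested_shape)

# Plain zeros with the matching shape can be added element-wise.
total_center_list = np.zeros((data_split_count, cluster_number, 11))
total_center_list = total_center_list + np.asarray(total_center_data_list)
print(total_center_list.shape)
```

Printing the shape of each operand with np.asarray(...).shape before combining them is a quick way to spot this kind of mismatch.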

f2py: how to pass 2 dimension list to fortran 77

I have trouble passing 2D arrays to Fortran. I want to combine a set of non-overlapping spectra. First I select the points on the x-axis, then I interpolate all data onto this new, common grid. I store the spectra in a 2D list in Python.
This works in Python 2.7, but it is very slow:
for i in range(len(wlp)):
    print wlp[i],
    for a in range(len(datax)):
        inrange = 0
        if datax[a][0] >= wlp[i] or datax[a][-1] <= wlp[i]:
            for b in range(len(datax[a])-1):
                if float(datax[a][b]) <= wlp[i] and float(datax[a][b+1]) >= wlp[i]:
                    sp = float(datax[a][b]); ep = float(datax[a][b+1])
                    delx = ep-sp; dely = float(data[a][b+1])-float(data[a][b])
                    ji = (dely/delx)*(wlp[i]-sp)+float(data[a][b])
                    inrange = 1
        if inrange == 0: ji = '?0'
        else: ji = ji * weights[a]
        print ji,
    print
The common x-grid is printed in column one and all the interpolated spectra are printed in subsequent columns. If there are some shorter ones out of range, it prints "?0". This helps to set up proper weights for each datapoints later.
I ended up having this fortran subroutine to speed it up with f2py:
c wlp = x axis points (wavelength)
c lenwlp = length of list wlp, len(wlp)
c datay = 2D python list with flux
c datax = 2D python list with wavelength
c lendatax = number of spectra, len(datax)
c datax_pl = list of the lengths of all spectra
c weights = list of optional weights
c maxi = length of the longest spectrum
C============================================================================80
      SUBROUTINE DOIT(wlp,lenwlp,datay,datax,lendatax,datax_pl,
     .                weights,maxi)
C============================================================================80
      INTEGER I,a,b,lenwlp,inrange,datax_pl(*),maxi,lendatax
      DOUBLE PRECISION WLP(*),SP,EP,DELY,DELX,ji
      DOUBLE PRECISION WEIGHTS(*)
      DOUBLE PRECISION DATAY(lendatax,maxi)
      DOUBLE PRECISION DATAX(lendatax,maxi)
    2 FORMAT (E20.12, 1X, $)
    3 FORMAT (A, $)
    4 FORMAT (1X)
      I = 1
      DO WHILE (I.LE.lenwlp)
        WRITE(*,2) WLP(I)
        DO a=1,lendatax
          inrange = 0
          ji = 0.0
          IF (datax(a,1).ge.WLP(I) .or.
     .        datax(a,datax_pl(a)).le.WLP(I)) THEN
            DO b=1,datax_pl(a)-1
              IF (DATAX(a,b).LE.WLP(I) .and.
     .            DATAX(a,b+1).GE.WLP(I)) THEN
                SP = DATAX(a,b); EP = DATAX(a,b+1)
                DELX = EP - SP; DELY = datay(a,b+1)-datay(a,b)
                if (delx.eq.0.0) then
                  ji = datay(a,b)
                else
                  ji = (DELY/DELX)*(WLP(I)-SP)+datay(a,b)
                end if
                inrange = 1
              END IF
            END DO
          END IF
          IF (inrange.eq.0) THEN
            WRITE(*,3) ' ?0'
          ELSE
            WRITE(*,2) ji*WEIGHTS(a)
          END IF
        END DO
        I = I + 1
        write(*,4)
      END DO
      END
which compiles with gfortran 4.8 fine. Then I import it in the Python code, set up the lists and run the subroutine:
import subroutines
wlp = [...]
data = [[...],[...],[...]]
datax = [[...],[...],[...]]
datax_pl = [...]
weights = [...]
maxi = max(datax_pl)
subroutines.doit(wlp,len(wlp),data,datax,len(datax),datax_pl,weights,maxi)
and it returns:
ValueError: setting an array element with a sequence.
I pass the lists and the length of the longest spectrum (maxi), this should define the maximum dimension in fortran (?).
I don't need return values, everything is printed on stdout.
The problem must be right at the beginning at the array declarations. I don't have experience with this... any advice is appreciated.
As I said in the comment, you cannot pass Python lists to f2py procedures. You MUST use numpy arrays, which are compatible with Fortran or C arrays.
The error message you show comes from this problem.
You can create the array from a list http://docs.scipy.org/doc/numpy/user/basics.creation.html
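A minimal sketch of that conversion, assuming the spectra have different lengths and can be padded with a filler value up to the longest one. The helper name pad_to_array and the sample numbers are made up for illustration, and the call to subroutines.doit is left as a comment because it depends on the compiled module from the question.

```python
import numpy as np

def pad_to_array(rows, fill=0.0):
    """Pack a ragged list of lists into a 2D float array plus row lengths."""
    lengths = np.array([len(r) for r in rows], dtype=np.int32)
    # Fortran (column-major) order, so no reordering copy is needed later.
    out = np.full((len(rows), lengths.max()), fill,
                  dtype=np.float64, order="F")
    for k, row in enumerate(rows):
        out[k, :len(row)] = row
    return out, lengths

# Hypothetical spectra of unequal length.
datax = [[1.0, 2.0, 3.0], [1.5, 2.5]]
data = [[10.0, 20.0, 30.0], [15.0, 25.0]]

datax_arr, datax_pl = pad_to_array(datax)
datay_arr, _ = pad_to_array(data)
wlp = np.asarray([1.0, 2.0, 3.0], dtype=np.float64)
weights = np.ones(len(datax), dtype=np.float64)

# With the compiled module available, the call would then look like:
# subroutines.doit(wlp, len(wlp), datay_arr, datax_arr, len(datax),
#                  datax_pl, weights, datax_arr.shape[1])
print(datax_arr.shape, datax_pl.tolist())
```

The padding cells are never read by the subroutine as long as datax_pl holds the true length of each spectrum.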

Speeding up computations with numpy matrices

I have two matrices. Both are filled with zeros and ones. One is a big one (3000 x 2000 elements), and the other is smaller ( 20 x 20 ) elements. I am doing something like:
newMatrix = (size of bigMatrix), filled with zeros
l = (a constant)
for y in xrange(0, len(bigMatrix[0])):
    for x in xrange(0, len(bigMatrix)):
        for b in xrange(0, len(smallMatrix[0])):
            for a in xrange(0, len(smallMatrix)):
                if (bigMatrix[x, y] == smallMatrix[x + a - l, y + b - l]):
                    newMatrix[x, y] = 1
Which is being painfully slow. Am I doing anything wrong? Is there a smart way to make this work faster?
edit: Basically I am, for each (x,y) in the big matrix, checking all the pixels of both big matrix and the small matrix around (x,y) to see if they are 1. If they are 1, then I set that value on newMatrix. I am doing a sort of collision detection.
I can think of a couple of optimisations there.
As you are using four nested Python "for" statements, you are about as slow as you can be.
I can't figure out exactly what you are looking for,
but for one thing, if the density of "1"s in your big matrix is low, you can use NumPy's "any" method on slices of bigMatrix to quickly check whether there are any set elements there. You could get a several-fold speed increase that way:
step = len(smallMatrix[0])
for y in xrange(0, len(bigMatrix[0]), step):
    for x in xrange(0, len(bigMatrix), step):
        if not bigMatrix[x: x+step, y: y + step].any():
            continue
        (...)
At this point, if you still need to iterate over each element, you use another pair of indexes to walk each position inside the step, but I think you get the idea.
Apart from using built-in NumPy operations like this "any" call, you could add some control flow to break out of the (b, a) loops as soon as the first matching pixel is found
(for example, by inserting a "break" statement inside your last "if" and another if..break pair for the "b" loop).
I really can't figure out exactly what your intent is, so I can't give you more specific code.
Your example code makes no sense, but the description of your problem sounds like you are trying to do a 2d convolution of a small bitarray over the big bitarray. There's a convolve2d function in scipy.signal package that does exactly this. Just do convolve2d(bigMatrix, smallMatrix) to get the result. Unfortunately the scipy implementation doesn't have a special case for boolean arrays so the full convolution is rather slow. Here's a function that takes advantage of the fact that the arrays contain only ones and zeroes:
import numpy as np
def sparse_convolve_of_bools(a, b):
    # make b the smaller array, then loop over its nonzero cells only
    if a.size < b.size:
        a, b = b, a
    offsets = list(zip(*np.nonzero(b)))
    n = len(offsets)
    # smallest signed integer type that can hold a count of up to n
    dtype = np.byte if n < 128 else np.short if n < 32768 else np.int64
    result = np.zeros(np.array(a.shape) + b.shape - (1,1), dtype=dtype)
    for o in offsets:
        result[o[0]:o[0] + a.shape[0], o[1]:o[1] + a.shape[1]] += a
    return result
On my machine it runs in less than 9 seconds for a 3000x2000 by 20x20 convolution. The running time depends on the number of ones in the smaller array, being 20ms per each nonzero element.
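To sanity-check a shift-and-add routine like the one above on small inputs, it can be compared against a slow reference written straight from the definition of a full 2D convolution. This is only a sketch: both function names and the random test arrays are made up here, and treating "convolution result > 0" as a collision is an assumption about what the question intends.

```python
import numpy as np

def convolve_full_reference(a, b):
    """Slow full 2D convolution: each a[i, j] adds a shifted copy of b."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1), dtype=np.int64)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

def convolve_shift_add(a, b):
    """Same result for 0/1 arrays, looping only over the nonzero cells of b."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1), dtype=np.int64)
    for r, c in zip(*np.nonzero(b)):
        out[r:r + a.shape[0], c:c + a.shape[1]] += a
    return out

rng = np.random.default_rng(0)
big = rng.integers(0, 2, size=(12, 9))
small = rng.integers(0, 2, size=(3, 4))

same = np.array_equal(convolve_full_reference(big, small),
                      convolve_shift_add(big, small))
# Assumed meaning of "collision": at least one pair of overlapping ones.
collisions = convolve_shift_add(big, small) > 0
print(same, collisions.shape)
```

The shift-and-add version does one array addition per nonzero cell of the small matrix, which is why its running time scales with the number of ones.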
If your bits are really packed 8 per byte / 32 per int,
and you can reduce your smallMatrix to 20x16,
then try the following, here for a single row.
(newMatrix[x, y] = 1 when any bit of the 20x16 around x,y is 1 ??
What are you really looking for ?)
python -m timeit -s '
""" slide 16-bit mask across 32-bit pairs bits[j], bits[j+1] """
import numpy as np
bits = np.zeros( 2000 // 16, np.uint16 )  # 2000 bits
bits[::8] = 1
mask = 32+16
nhit = 16 * [0]
def hit16( bits, mask, nhit ):
    """
        slide 16-bit mask across 32-bit pairs bits[j], bits[j+1]
        bits: long np.array( uint16 )
        mask: 16 bits, int
        out: nhit[j] += 1 where pair & mask != 0
    """
    left = bits[0]
    for b in bits[1:]:
        pair = (left << 16) | b
        if pair:  # np idiom for non-0 words ?
            m = mask
            for j in range(16):
                if pair & m:
                    nhit[j] += 1
                    # hitposition = jb*16 + j
                m <<= 1
        left = b
    # if any(nhit): print "hit16:", nhit
' \
'
hit16( bits, mask, nhit )
'
# 15 msec per loop, bits[::4] = 1
# 11 msec per loop, bits[::8] = 1
# mac g4 ppc
