I'm using python 2.7 and numpy (import numpy as np).
I have a list of x-y coordinates in the following shape:
coords = np.zeros((100, 2), dtype=np.int)
I have a list of values corresponding to these coordinates:
values = np.zeros(100, dtype=np.int)
My program is populating these arrays.
Now, for each coordinate, I want to find neighbours within radius r that have a non-zero value. What's the most efficient way to do that?
Demo:
import pandas as pd
from scipy.spatial.distance import pdist, squareform
In [101]: np.random.seed(123)
In [102]: coords = np.random.rand(20, 2)
In [103]: r = 0.3
In [104]: d = pd.DataFrame(squareform(pdist(coords)))
In [105]: d
Out[105]:
0 1 2 3 4 5 6 7 8 9 10 11 12 \
0 0.000000 0.539313 0.138885 0.489671 0.240183 0.566555 0.343214 0.541508 0.525761 0.295906 0.566702 0.326087 0.045059
1 0.539313 0.000000 0.509028 0.765644 0.299834 0.212418 0.535287 0.253292 0.378472 0.305322 0.504946 0.501173 0.545672
2 0.138885 0.509028 0.000000 0.369830 0.240542 0.484970 0.459329 0.449965 0.591335 0.217102 0.434730 0.187983 0.100192
3 0.489671 0.765644 0.369830 0.000000 0.579235 0.639118 0.827519 0.585140 0.946945 0.474554 0.383486 0.266724 0.444612
4 0.240183 0.299834 0.240542 0.579235 0.000000 0.364005 0.335128 0.355671 0.368796 0.148598 0.482379 0.327450 0.251218
5 0.566555 0.212418 0.484970 0.639118 0.364005 0.000000 0.676135 0.055591 0.576447 0.272729 0.315123 0.399127 0.555655
6 0.343214 0.535287 0.459329 0.827519 0.335128 0.676135 0.000000 0.679527 0.281035 0.481218 0.813671 0.621056 0.387169
7 0.541508 0.253292 0.449965 0.585140 0.355671 0.055591 0.679527 0.000000 0.602427 0.245620 0.261309 0.350237 0.526773
8 0.525761 0.378472 0.591335 0.946945 0.368796 0.576447 0.281035 0.602427 0.000000 0.498845 0.811462 0.695304 0.559738
9 0.295906 0.305322 0.217102 0.474554 0.148598 0.272729 0.481218 0.245620 0.498845 0.000000 0.333842 0.208528 0.282959
10 0.566702 0.504946 0.434730 0.383486 0.482379 0.315123 0.813671 0.261309 0.811462 0.333842 0.000000 0.254850 0.533784
11 0.326087 0.501173 0.187983 0.266724 0.327450 0.399127 0.621056 0.350237 0.695304 0.208528 0.254850 0.000000 0.288072
12 0.045059 0.545672 0.100192 0.444612 0.251218 0.555655 0.387169 0.526773 0.559738 0.282959 0.533784 0.288072 0.000000
13 0.339648 0.350100 0.407307 0.769145 0.202592 0.501132 0.185248 0.511020 0.186913 0.347808 0.678357 0.527288 0.372879
14 0.530211 0.104003 0.473790 0.689158 0.303486 0.109841 0.589377 0.149459 0.468906 0.257676 0.404710 0.431203 0.527905
15 0.622118 0.178856 0.627453 0.923461 0.391044 0.387645 0.509836 0.431502 0.273610 0.450269 0.683313 0.656742 0.639993
16 0.337079 0.211995 0.297111 0.582175 0.113238 0.251168 0.434076 0.246505 0.403684 0.107671 0.409858 0.316172 0.337886
17 0.271897 0.311029 0.313864 0.668400 0.097022 0.424905 0.252905 0.426640 0.279160 0.243693 0.576241 0.422417 0.296806
18 0.664617 0.395999 0.554151 0.592343 0.504234 0.184188 0.833801 0.157951 0.758223 0.376555 0.212643 0.410605 0.642698
19 0.328445 0.719013 0.238085 0.186618 0.476045 0.642499 0.671657 0.594990 0.828653 0.413697 0.465589 0.245340 0.284878
13 14 15 16 17 18 19
0 0.339648 0.530211 0.622118 0.337079 0.271897 0.664617 0.328445
1 0.350100 0.104003 0.178856 0.211995 0.311029 0.395999 0.719013
2 0.407307 0.473790 0.627453 0.297111 0.313864 0.554151 0.238085
3 0.769145 0.689158 0.923461 0.582175 0.668400 0.592343 0.186618
4 0.202592 0.303486 0.391044 0.113238 0.097022 0.504234 0.476045
5 0.501132 0.109841 0.387645 0.251168 0.424905 0.184188 0.642499
6 0.185248 0.589377 0.509836 0.434076 0.252905 0.833801 0.671657
7 0.511020 0.149459 0.431502 0.246505 0.426640 0.157951 0.594990
8 0.186913 0.468906 0.273610 0.403684 0.279160 0.758223 0.828653
9 0.347808 0.257676 0.450269 0.107671 0.243693 0.376555 0.413697
10 0.678357 0.404710 0.683313 0.409858 0.576241 0.212643 0.465589
11 0.527288 0.431203 0.656742 0.316172 0.422417 0.410605 0.245340
12 0.372879 0.527905 0.639993 0.337886 0.296806 0.642698 0.284878
13 0.000000 0.408426 0.339019 0.274263 0.105627 0.668252 0.643427
14 0.408426 0.000000 0.282070 0.194058 0.345013 0.294029 0.663142
15 0.339019 0.282070 0.000000 0.344028 0.355134 0.568361 0.854775
16 0.274263 0.194058 0.344028 0.000000 0.181494 0.399730 0.513362
17 0.105627 0.345013 0.355134 0.181494 0.000000 0.581128 0.551910
18 0.668252 0.294029 0.568361 0.399730 0.581128 0.000000 0.649183
19 0.643427 0.663142 0.854775 0.513362 0.551910 0.649183 0.000000
result:
In [107]: d[(0 < d) & (d < r)].apply(lambda x: x.dropna().index.tolist())
Out[107]:
0 [2, 4, 9, 12, 17]
1 [4, 5, 7, 14, 15, 16]
2 [0, 4, 9, 11, 12, 16, 19]
3 [11, 19]
4 [0, 1, 2, 9, 12, 13, 16, 17]
5 [1, 7, 9, 14, 16, 18]
6 [8, 13, 17]
7 [1, 5, 9, 10, 14, 16, 18]
8 [6, 13, 15, 17]
9 [0, 2, 4, 5, 7, 11, 12, 14, 16, 17]
10 [7, 11, 18]
11 [2, 3, 9, 10, 12, 19]
12 [0, 2, 4, 9, 11, 17, 19]
13 [4, 6, 8, 16, 17]
14 [1, 5, 7, 9, 15, 16, 18]
15 [1, 8, 14]
16 [1, 2, 4, 5, 7, 9, 13, 14, 17]
17 [0, 4, 6, 8, 9, 12, 13, 16]
18 [5, 7, 10, 14]
19 [2, 3, 11, 12]
dtype: object
You can also do this with only numpy and scipy, which I find faster.
from scipy.spatial.distance import pdist, squareform
import numpy
SIZE=512
N_PARTICLE=100
RADIUS = 15
VALUE_THRESHOLD = 0
coords = numpy.random.randint(0, SIZE, size=(N_PARTICLE, 2))
values = numpy.random.randint(0, 2, (N_PARTICLE))
square_dist = squareform(pdist(coords, metric='euclidean'))
condlist = []
for i, row in enumerate(square_dist):
    # neighbours of point i: non-zero value, within RADIUS, excluding the point itself
    condlist.append(numpy.where((values>VALUE_THRESHOLD) & (row < RADIUS) & (row > 0))[0].tolist())
There must be a better way to do it, though.
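One possible alternative that avoids building the full N x N distance matrix is a k-d tree. The following is only a sketch, assuming SciPy's `cKDTree` is available; the helper name `nonzero_neighbours` is made up for illustration. Two differences from the loop above: `query_ball_point` treats the radius as inclusive (distance <= r), and only the point itself is excluded, not other points that happen to sit at distance 0.

```python
import numpy as np
from scipy.spatial import cKDTree

def nonzero_neighbours(coords, values, r):
    """For each point, list the indices of other points within r whose value is non-zero."""
    # Hypothetical helper: index only the points that carry a non-zero value.
    candidates = np.flatnonzero(values != 0)
    tree = cKDTree(coords[candidates])
    # For every point, positions (into `candidates`) of tree points within distance r.
    hits = tree.query_ball_point(coords, r)
    # Map back to original indices and drop the point itself.
    return [sorted(int(candidates[j]) for j in h if candidates[j] != i)
            for i, h in enumerate(hits)]

# Small made-up example: point 1 has value 0, so it is never reported as a neighbour.
coords = np.array([[0, 0], [1, 0], [5, 5], [0, 2]])
values = np.array([1, 0, 1, 1])
neighbours = nonzero_neighbours(coords, values, r=2.5)
print(neighbours)
```

For a few hundred points the `pdist` approach is probably fine; a tree tends to pay off when the number of points is large and the radius is small relative to the extent of the data.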
I have a pandas dataframe like:
I need to style it using a list of lists like:
[[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
If a value in R1 is in the first list, it should be coloured blue; if a value in R2 is in the second list, it should be coloured blue; and so on.
In other words, colour the numbers in each column whose value is in the corresponding list.
I have tried:
def posclass(val):
    color = 'black'
    for i in range(5):
        if (val in list[i]):
            color = 'blue'
    return 'color: %s' % color
df.style.applymap(posclass, subset=['R1','R2','R3','R4','R5'])
But this does not work properly: it checks every list against every column instead of applying each list to its own column.
The desired result is a dataframe with coloured numbers (those that match, in each column, the corresponding list).
Try something like this:
df = pd.DataFrame(np.arange(40).reshape(-1,4), columns=[f'R{i}' for i in range(1,5)])
Input df:
R1 R2 R3 R4
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
5 20 21 22 23
6 24 25 26 27
7 28 29 30 31
8 32 33 34 35
9 36 37 38 39
and
list_l = [[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
Then:
def f(x):
    # position of this column decides which list to check against
    colpos = df.columns.get_loc(x.name)
    return ['color: blue' if n in list_l[colpos] else '' for n in x]
df.style.apply(f)
Output:
I have a data frame with the temperatures recorded per day/month/year.
Then I find the lowest temperature for each month using groupby and min, which gives a Series with a MultiIndex.
How can I drop the value for a specific year and month, e.g. year 2005, month 12?
# Find the lowest value per each month
[In] low = df.groupby([df['Date'].dt.year,df['Date'].dt.month])['Data_Value'].min()
[In] low
[Out]
Date Date
2005 1 -60
2 -114
3 -153
4 -13
5 -14
6 26
7 83
8 65
9 21
10 36
11 -36
12 -86
2006 1 -75
2 -53
3 -83
4 -30
5 36
6 17
7 85
8 82
9 66
10 40
11 -2
12 -32
2007 1 -63
2 -42
3 -21
4 -11
5 28
6 74
7 73
8 61
9 46
10 -33
11 -37
12 -97
[In] low.index
[Out] MultiIndex(levels=[[2005, 2006, 2007], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Date'])
This works.
#dummy data
mux = pd.MultiIndex.from_arrays([
(2017,)*12 + (2018,)*12,
list(range(1, 13))*2
], names=['year', 'month'])
df = pd.DataFrame({'value': np.random.randint(1, 20, (len(mux)))}, mux)
Then just use drop.
df.drop((2017, 12), inplace=True)
>>> print(df)
value
year month
2017 1 18
2 13
3 14
4 1
5 8
6 19
7 19
8 8
9 11
10 5
11 7 <<<
2018 1 9
2 18
3 9
4 14
5 7
6 4
7 6
8 12
9 12
10 1
11 19
12 10
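The same idea should carry over to a Series like `low` in the question. The sketch below uses made-up values rather than the real temperature data, and also shows a boolean-mask variant that addresses the index levels by position, which may be convenient when both levels share the same name (`Date`, `Date`).

```python
import pandas as pd

# Made-up stand-in for `low`: one value per (year, month), both levels named 'Date'.
idx = pd.MultiIndex.from_product([[2005, 2006], range(1, 13)], names=['Date', 'Date'])
low = pd.Series(range(24), index=idx)

# Option 1: drop the single (year, month) label.
dropped = low.drop((2005, 12))

# Option 2: boolean mask built from the index levels, addressed by position.
years = low.index.get_level_values(0)
months = low.index.get_level_values(1)
masked = low[~((years == 2005) & (months == 12))]

print(len(low), len(dropped), len(masked))
```

Both variants return a new Series and leave `low` unchanged.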
How to change the sign in the series, if I have:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
and need to get:
1, 2, 3, -4, -5, -6, 8, 9, 10, -11, -12, -13
I need to be able to set the period (now it is equal to 3) and the index from which the function starts (now it is equal to 3).
For example, if I specify 2 as the index, I get
1, 2, -3, -4, -5, 6, 8, 9, -10, -11, -12, 13
I need to apply this function sequentially to each column, since applying to the entire DataFrame leads to a memory error.
Use numpy.where with a boolean mask built from integer division (//) and modulo (%):
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
N = 3
#if default RangeIndex
m = (s.index // N) % 2 == 1
#general index
#m = (np.arange(len(s.index)) // N) % 2 == 1
s = pd.Series(np.where(m, -s, s))
print (s)
0 1
1 2
2 3
3 -4
4 -5
5 -6
6 7
7 8
8 9
9 -10
10 -11
11 -12
12 13
dtype: int64
EDIT:
N = 3
M = 1
m = np.concatenate([np.repeat(False, M),
(np.arange(len(s.index) - M) // N) % 2 == 0])
s = pd.Series(np.where(m, -s, s))
print (s)
0 1
1 -2
2 -3
3 -4
4 5
5 6
6 7
7 -8
8 -9
9 -10
10 11
11 12
12 13
dtype: int64
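To make both the period and the starting position parameters, the mask can be wrapped in a small function. This is a sketch only; the name `flip_blocks` and its arguments are hypothetical, and it works on positions rather than index labels.

```python
import numpy as np
import pandas as pd

def flip_blocks(s, period, start):
    """Negate alternating blocks of `period` values, beginning at position `start`."""
    pos = np.arange(len(s))
    # Blocks are counted from `start`; even-numbered blocks get their sign flipped.
    mask = (pos >= start) & (((pos - start) // period) % 2 == 0)
    return pd.Series(np.where(mask, -s.to_numpy(), s.to_numpy()), index=s.index)

s = pd.Series(range(1, 14))
from_three = flip_blocks(s, period=3, start=3)
from_two = flip_blocks(s, period=3, start=2)
print(from_three.tolist())
print(from_two.tolist())
```

Since the function takes one Series at a time, it can be applied column by column, for example with `df.apply(flip_blocks, period=3, start=3)` or in an explicit loop over `df.columns`.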
I have a list A of the form:
A = ['P', 'Q', 'R', 'S', 'T', 'U']
and an array B of the form:
B = [[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]]
now I would like to create a structured array C of the form:
C = [[ P Q R S T U]
[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]]
so that I can extract columns with column names P, Q, R, etc. I tried the following code but it does not create a structured array and gives the following error.
Code
import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C['P']
Error
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
How to create structured array in Python in this case?
Update
Both are variables; their shapes change at runtime, but the list and the array always have the same number of columns.
If you want to do it in pure numpy you can do
A = np.array(['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24]])
# define the structured array with the names from A
C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8','f8','f8','f8','f8','f8']})
# copy the data from B into C
for i,n in enumerate(A):
    C[n] = B[:,i]
C['Q']
array([ 2., 8., 14., 20.])
Edit: you can generate the format list automatically by using instead
C = np.zeros(B.shape[0],dtype={'names':A,'formats':['f8' for x in range(A.shape[0])]})
Furthermore, the names do not appear in C as data but in dtype. In order to get the names from C you can use
C.dtype.names
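Another option, assuming NumPy 1.16 or later, is `numpy.lib.recfunctions.unstructured_to_structured`, which does the column-by-column copy in one call. The sketch below keeps the integer dtype of B instead of converting to `f8`; that is a choice made for this example, not part of the answer above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

A = ['P', 'Q', 'R', 'S', 'T', 'U']
B = np.arange(1, 25).reshape(4, 6)

# One field per name in A, each with the same dtype as B.
dtype = np.dtype([(name, B.dtype) for name in A])
C = rfn.unstructured_to_structured(B, dtype=dtype)

print(C['Q'])
print(C.dtype.names)
```

Because the dtype is built from `A` and `B` at runtime, this keeps working when the number of columns changes, as long as `len(A) == B.shape[1]`.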
This is what the pandas library is for:
>>> A = ['P', 'Q', 'R', 'S', 'T', 'U']
>>> B = np.arange(1, 25).reshape(4, 6)
>>> B
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24]])
>>> import pandas as pd
>>> pd.DataFrame(B, columns=A)
P Q R S T U
0 1 2 3 4 5 6
1 7 8 9 10 11 12
2 13 14 15 16 17 18
3 19 20 21 22 23 24
>>> df = pd.DataFrame(B, columns=A)
>>> df['P']
0 1
1 7
2 13
3 19
Name: P, dtype: int64
>>> df['T']
0 5
1 11
2 17
3 23
Name: T, dtype: int64
>>>
http://pandas.pydata.org/pandas-docs/dev/tutorials.html
Your error occurs on:
D = C['P']
Here is a simple approach, using regular Python lists on the title row.
import numpy as np
A = (['P', 'Q', 'R', 'S', 'T', 'U'])
B = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]])
C = np.vstack((A, B))
print (C)
D = C[0:len(C), list(C[0]).index('P')]
print (D)
So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a 16x2 array called mafs. When I do this in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different than when in python I do:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
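To see why the `order` argument matters here, the two scan orders can be compared side by side on the same array. This is a small illustrative check; the names `r_f` and `r_c` are made up for the example.

```python
import numpy as np

mafs = np.c_[np.arange(1, 17), np.arange(17, 33)]

# Fortran order: the first axis varies fastest, as in MATLAB.
r_f = mafs.reshape((4, 4, 2), order="F")
# Default C order: the last axis varies fastest.
r_c = mafs.reshape((4, 4, 2))

print(r_f[:, :, 0])  # columns run 1..4, 5..8, ... like MATLAB's ans(:,:,1)
print(r_c[:, :, 0])  # rows run 1..4, 5..8, ... i.e. the transpose
```

With the default order the first slice comes out transposed relative to MATLAB's `ans(:,:,1)`, which is the kind of mismatch described in the question.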
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy array, given in (depth, row, col) format, to a single sheet of column vectors (one per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
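As a sanity check, assuming `imStack` has shape (depth, row, col), the two lines above appear to be equivalent to flattening each image in row-major order and putting one image per column. The sketch below uses a small made-up stack to compare the two forms.

```python
import numpy as np

depth, row, col = 3, 2, 4
# Made-up stack of `depth` images, each of shape (row, col).
imStack = np.arange(depth * row * col).reshape(depth, row, col)

# Form used above: transpose, then Fortran-order reshape.
output = np.transpose(imStack).reshape((row * col, depth), order='F')

# Alternative: flatten each image (C order), then put images in columns.
alt = imStack.reshape(depth, row * col).T

print(np.array_equal(output, alt))
print(np.array_equal(output[:, 1], imStack[1].ravel()))
```

Note that each column here holds an image flattened row by row, whereas MATLAB's `reshape` on a (row, col, depth) array flattens column by column, so the ordering within a column can differ between the two environments.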