How to drop a value from data series with multiple index? - python

I have a data frame with temperatures recorded per day/month/year.
I then find the lowest temperature for each month using groupby and min, which gives a Series with a MultiIndex.
How can I drop the value for a specific year and month, e.g. year 2005, month 12?
# Find the lowest value per each month
[In] low = df.groupby([df['Date'].dt.year,df['Date'].dt.month])['Data_Value'].min()
[In] low
[Out]
Date Date
2005 1 -60
2 -114
3 -153
4 -13
5 -14
6 26
7 83
8 65
9 21
10 36
11 -36
12 -86
2006 1 -75
2 -53
3 -83
4 -30
5 36
6 17
7 85
8 82
9 66
10 40
11 -2
12 -32
2007 1 -63
2 -42
3 -21
4 -11
5 28
6 74
7 73
8 61
9 46
10 -33
11 -37
12 -97
[In] low.index
[Out] MultiIndex(levels=[[2005, 2006, 2007], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['Date', 'Date'])

This works.
# dummy data
import numpy as np
import pandas as pd
mux = pd.MultiIndex.from_arrays([
(2017,)*12 + (2018,)*12,
list(range(1, 13))*2
], names=['year', 'month'])
df = pd.DataFrame({'value': np.random.randint(1, 20, (len(mux)))}, mux)
Then just use drop, passing the (year, month) tuple as the label.
df.drop((2017, 12), inplace=True)
>>> print(df)
value
year month
2017 1 18
2 13
3 14
4 1
5 8
6 19
7 19
8 8
9 11
10 5
11 7 <<<
2018 1 9
2 18
3 9
4 14
5 7
6 4
7 6
8 12
9 12
10 1
11 19
12 10
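The same call can be applied directly to a Series like the one in the question. This is a minimal sketch: the small `low` Series below is a hand-built stand-in for the groupby result (only four of its rows), and the level names `year` and `month` are assumptions chosen for readability.

```python
import pandas as pd

# Stand-in for the groupby result: a Series indexed by (year, month).
idx = pd.MultiIndex.from_product([[2005, 2006], [11, 12]],
                                 names=["year", "month"])
low = pd.Series([-36, -86, -2, -32], index=idx)

# Drop one (year, month) entry; drop returns a new Series by default.
trimmed = low.drop((2005, 12))
print(trimmed)
```

Because `drop` returns a new object unless `inplace=True` is passed, the original `low` is left untouched here.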


Filling 0 with previous value at index

I have a df:
1 2 3 4 5 6 7 8 9 10
A 10 0 0 15 0 21 45 0 0 7
I am trying to forward-fill row A, replacing each 0 with the last preceding non-zero value, so that the df would look like this:
1 2 3 4 5 6 7 8 9 10
A 10 10 10 15 15 21 45 45 45 7
I tried:
df.loc[['A']].replace(to_replace=0, method='ffill').values
But this does not work, where is my mistake?
If you want to use your method, you need to work with Series on both sides:
df.loc['A'] = df.loc['A'].replace(to_replace=0, method='ffill')
Alternatively, you can mask the 0 with NaNs, and ffill the data on axis=1:
df.mask(df.eq(0)).ffill(axis=1)
output:
1 2 3 4 5 6 7 8 9 10
A 10.0 10.0 10.0 15.0 15.0 21.0 45.0 45.0 45.0 7.0
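A self-contained version of the mask-and-ffill approach, as a minimal sketch. The integer column labels and the final `astype(int)` are assumptions added here to get whole numbers back; the cast only works when no NaN remains, i.e. when no row starts with 0.

```python
import pandas as pd

df = pd.DataFrame([[10, 0, 0, 15, 0, 21, 45, 0, 0, 7]],
                  index=["A"], columns=range(1, 11))

# Turn zeros into NaN, forward-fill along each row, then cast back to int.
filled = df.mask(df.eq(0)).ffill(axis=1).astype(int)
print(filled)
```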
Well you should change your code a little bit and work with series:
import pandas as pd
df = pd.DataFrame({'1': [10], '2': [0], '3': [0], '4': [15], '5': [0],
'6': [21], '7': [45], '8': [0], '9': [0], '10': [7]},
index=['A'])
print(df.apply(lambda x: pd.Series(x.values).replace(to_replace=0, method='ffill').values, axis=1))
Output:
A [10, 10, 10, 15, 15, 21, 45, 45, 45, 7]
dtype: object
This way, if you have multiple rows, the code still works:
import pandas as pd
df = pd.DataFrame({'1': [10, 11], '2': [0, 12], '3': [0, 0], '4': [15, 0], '5': [0, 3],
'6': [21, 3], '7': [45, 0], '8': [0, 4], '9': [0, 5], '10': [7, 0]},
index=['A', 'B'])
print(df.apply(lambda x: pd.Series(x.values).replace(to_replace=0, method='ffill').values, axis=1))
Output:
A [10, 10, 10, 15, 15, 21, 45, 45, 45, 7]
B [11, 12, 12, 12, 3, 3, 3, 4, 5, 5]
dtype: object
Another option is to replace the zeros with pd.NA element-wise and then forward-fill along each row:
df.applymap(lambda x: pd.NA if x == 0 else x).ffill(axis=1)
1 2 3 4 5 6 7 8 9 10
A 10 10 10 15 15 21 45 45 45 7

Need to find the greatest sum of five adjacent numbers in a 25x25 grid

import numpy as np
Bees = open("BeesData.txt", "r")
Bees = Bees.read()
Bees = Bees.split( )
for i in range(0, len(Bees)):
    Bees[i] = int(Bees[i])
Bees = np.reshape(Bees,(25,25))
def suma5x5(x,y):
    suma=0
This is the BeesData file:
https://drive.google.com/file/d/1aWcLZq2MuGENavoTnCfokXr1Nnyygz23/view?usp=sharing
Screenshot: https://i.stack.imgur.com/G9u5P.png
I took this as kind of a learning exercise for myself. Not a numpy user really.
TL;DR:
import numpy as np
from scipy.signal import convolve2d
search_area = np.arange(25).reshape(5,5)
search_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
results = convolve2d(search_area, search_kernel, mode='same')
max_index = np.argmax(results, axis=None)
max_location = np.unravel_index(max_index, results.shape)
print(max_location)
Assuming that "five adjacent" means a cell plus its up, down, left and right neighbours, you can find the value using a convolution.
Suppose we want to sum, within every 3x3 block, only the positions marked with 1:
[[0, 1, 0],
[1, 1, 1],
[0, 1, 0]]
This shape can be used as the kernel for a convolution: each value in a 3x3 window is multiplied by the corresponding kernel entry (0 or 1) and the products are summed, so only the five cells marked with 1 contribute. E.g. for
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
you would get 1×0 + 2×1 + 3×0 + 4×1 + 5×1 + 6×1 + 7×0 + 8×1 + 9×0 = 25.
SciPy provides a function for this, scipy.signal.convolve2d.
import numpy as np
from scipy.signal import convolve2d
search_area = np.arange(36).reshape(6,6)
search_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
results = convolve2d(search_area, search_kernel)
print(search_area)
print(results)
This outputs:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 0 0 1 2 3 4 5 0]
[ 0 7 10 14 18 22 20 5]
[ 6 25 35 40 45 50 43 11]
[ 12 49 65 70 75 80 67 17]
[ 18 73 95 100 105 110 91 23]
[ 24 97 125 130 135 140 115 29]
[ 30 85 118 122 126 130 98 35]
[ 0 30 31 32 33 34 35 0]]
Because we included the edges as part of the convolution, you'll see that the result size is now 8x8 instead of the original 6x6. For values it couldn't find because they go off the edge of the array, the method assumed a value of zero.
To discard the edges you can pass mode='same', which makes the result the same size as the input:
results = convolve2d(search_area, search_kernel, mode='same')
print(search_area)
print(results)
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 7 10 14 18 22 20]
[ 25 35 40 45 50 43]
[ 49 65 70 75 80 67]
[ 73 95 100 105 110 91]
[ 97 125 130 135 140 115]
[ 85 118 122 126 130 98]]
Now to find the location with the most bees, you can use argmax to get the index of the largest value, and unravel_index to get this as a location in the original shape.
max_index = np.argmax(results, axis=None)
max_location = np.unravel_index(max_index, results.shape)
print(max_location)
(4, 4)
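As a cross-check that does not need SciPy, the same plus-shaped sums can be sketched with zero padding and shifted slices. This swaps the convolution for plain NumPy slicing; the helper name `plus_sums` is made up for illustration, and the grid is the 6x6 example from above rather than the real BeesData file.

```python
import numpy as np

def plus_sums(grid):
    """Sum each cell with its up/down/left/right neighbours (zeros off-grid)."""
    p = np.pad(grid, 1)  # one ring of zeros around the grid
    return (p[1:-1, 1:-1]    # centre
            + p[:-2, 1:-1]   # neighbour above
            + p[2:, 1:-1]    # neighbour below
            + p[1:-1, :-2]   # neighbour to the left
            + p[1:-1, 2:])   # neighbour to the right

search_area = np.arange(36).reshape(6, 6)
sums = plus_sums(search_area)

# Location of the largest sum, converted to plain Python ints.
flat_index = np.argmax(sums)
max_location = tuple(int(i) for i in np.unravel_index(flat_index, sums.shape))
print(max_location, int(sums[max_location]))
```

For the 25x25 data, the same function should apply unchanged to the reshaped `Bees` array.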
Assuming a 5x5 window, based on the last two lines of your code:
def suma5x5(x,y):
    suma=0
I tried a simple approach with basic Python and numpy operations.
arr = np.array([[2, 3, 7, 4, 6, 2, 9],
[6, 6, 9, 8, 7, 4, 3],
[3, 4, 8, 3, 8, 9, 7],
[7, 8, 3, 6, 6, 3, 4],
[4, 2, 1, 8, 3, 4, 6],
[3, 2, 4, 1, 9, 8, 3],
[0, 1, 3, 9, 2, 1, 4]])
w_size = 5
res = max(sum(sum(a)) for a in (arr[row:row+w_size, col:col+w_size] for row in range(arr.shape[0]-w_size+1) for col in range(arr.shape[1]-w_size+1)))
print(res) # 138
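A vectorised variant of the same window search, as a sketch that assumes NumPy 1.20 or later (for `sliding_window_view`). It uses the same 7x7 example array and should give the same maximum as the generator expression above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([[2, 3, 7, 4, 6, 2, 9],
                [6, 6, 9, 8, 7, 4, 3],
                [3, 4, 8, 3, 8, 9, 7],
                [7, 8, 3, 6, 6, 3, 4],
                [4, 2, 1, 8, 3, 4, 6],
                [3, 2, 4, 1, 9, 8, 3],
                [0, 1, 3, 9, 2, 1, 4]])
w_size = 5

# windows has shape (3, 3, 5, 5): one 5x5 view per valid top-left corner.
windows = sliding_window_view(arr, (w_size, w_size))
window_sums = windows.sum(axis=(2, 3))
best = int(window_sums.max())
print(best)  # 138
```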

Sort rows and removing NaN values

I have a dataset that looks like this:
state Item_Number
0 AP 1.0, 4.0, 20.0, 2.0, 11.0, 7.0
1 GOA 1.0, 4.0, nan, 2.0, 8.0, nan
2 GU 1.0, 4.0, 13.0, 2.0, 11.0, 7.0
3 KA 1.0, 23.0, nan, nan, 11.0, 7.0
4 MA 1.0, 14.0, 13.0, 2.0, 19.0, 21.0
I want to remove the NaN values, sort each row's items, and convert the floats to ints. The finished dataset should look like this:
state Item_Number
0 AP 1, 2, 4, 7, 11, 20
1 GOA 1, 2, 4, 8
2 GU 1, 2, 4, 7, 11, 13
3 KA 1, 7, 11, 23
4 MA 1, 2, 13, 14, 19, 21
Another solution using Series.str.split and Series.apply:
df['Item_Number'] = (df.Item_Number.str.split(',')
.apply(lambda x: ', '.join([str(z) for z in sorted([int(float(y)) for y in x if 'nan' not in y])])))
[out]
state Item_Number
0 AP 1, 2, 4, 7, 11, 20
1 GOA 1, 2, 4, 8
2 GU 1, 2, 4, 7, 11, 13
3 KA 1, 7, 11, 23
4 MA 1, 2, 13, 14, 19, 21
Use a list comprehension, removing missing values via the principle that NaN != NaN (so float(y) == float(y) is False only for NaN):
df['Item_Number'] = [sorted([int(float(y)) for y in x.split(',') if float(y) == float(y)]) for x in df['Item_Number']]
print (df)
state Item_Number
0 AP [1, 2, 4, 7, 11, 20]
1 GOA [1, 2, 4, 8]
2 GU [1, 2, 4, 7, 11, 13]
3 KA [1, 7, 11, 23]
4 MA [1, 2, 13, 14, 19, 21]
If you need strings:
df['Item_Number'] = [' '.join(map(str, sorted([int(float(y)) for y in x.split(',') if float(y) == float(y)]))) for x in df['Item_Number']]
print (df)
state Item_Number
0 AP 1 2 4 7 11 20
1 GOA 1 2 4 8
2 GU 1 2 4 7 11 13
3 KA 1 7 11 23
4 MA 1 2 13 14 19 21
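The same cleaning step can be written as a small named helper, which some may find easier to read than the nested comprehension. This is only a sketch: `clean_items` is a hypothetical name, and `math.isnan` is swapped in for the `float(y) == float(y)` test. Two rows of the sample data are used.

```python
import math
import pandas as pd

df = pd.DataFrame({
    "state": ["AP", "GOA"],
    "Item_Number": ["1.0, 4.0, 20.0, 2.0, 11.0, 7.0",
                    "1.0, 4.0, nan, 2.0, 8.0, nan"],
})

def clean_items(text):
    """Parse a comma-separated string, drop NaN entries, return sorted ints."""
    numbers = (float(part) for part in text.split(","))
    return sorted(int(n) for n in numbers if not math.isnan(n))

df["Item_Number"] = df["Item_Number"].map(clean_items)
print(df)
```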

Change the sign of the number in the pandas series

How to change the sign in the series, if I have:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
and need to get:
1, 2, 3, -4, -5, -6, 8, 9, 10, -11, -12, -13
I need to be able to set the period (now it is equal to 3) and the index from which the function starts (now it is equal to 3).
For example, if I specify 2 as the index, I get
1, 2, -3, -4, -5, 6, 8, 9, -10, -11, -12, 13
I need to apply this function sequentially to each column, since applying to the entire DataFrame leads to a memory error.
Use numpy.where with a boolean mask built from integer division (//) and modulo (%):
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
N = 3
#if default RangeIndex
m = (s.index // N) % 2 == 1
#general index
#m = (np.arange(len(s.index)) // N) % 2 == 1
s = pd.Series(np.where(m, -s, s))
print (s)
0 1
1 2
2 3
3 -4
4 -5
5 -6
6 7
7 8
8 9
9 -10
10 -11
11 -12
12 13
dtype: int64
EDIT: To start the pattern at position M instead, prepend M False values to the mask:
N = 3
M = 1
m = np.concatenate([np.repeat(False, M),
(np.arange(len(s.index) - M) // N) % 2 == 0])
s = pd.Series(np.where(m, -s, s))
print (s)
0 1
1 -2
2 -3
3 -4
4 5
5 6
6 7
7 -8
8 -9
9 -10
10 11
11 12
12 13
dtype: int64
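Both variants can be folded into one helper that takes the period and the starting position as parameters. This is a sketch under assumptions: `flip_sign` and its parameter names are made up here, and it uses `Series.where` rather than `numpy.where` so that the original index is kept.

```python
import numpy as np
import pandas as pd

def flip_sign(s, period, start):
    """Negate blocks of `period` values, beginning at position `start`,
    alternating with untouched blocks of the same length."""
    pos = np.arange(len(s))
    mask = (pos >= start) & (((pos - start) // period) % 2 == 0)
    # Keep values where the mask is False, use the negated value elsewhere.
    return s.where(~mask, -s)

s = pd.Series(range(1, 14))
print(flip_sign(s, period=3, start=3).tolist())
print(flip_sign(s, period=3, start=1).tolist())
```

Since the function works on a single Series, it can be applied one column at a time, for example `df[col] = flip_sign(df[col], 3, 3)` inside a loop over the columns, which avoids building a mask for the whole DataFrame at once.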

Radius search in list of coordinates

I'm using python 2.7 and numpy (import numpy as np).
I have a list of x-y coordinates in the following shape:
coords = np.zeros((100, 2), dtype=int)
I have a list of values corresponding to these coordinates:
values = np.zeros(100, dtype=int)
My program is populating these arrays.
Now, for each coordinate, I want to find neighbours within radius r that have a non-zero value. What's the most efficient way to do that?
Demo:
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
In [101]: np.random.seed(123)
In [102]: coords = np.random.rand(20, 2)
In [103]: r = 0.3
In [104]: d = pd.DataFrame(squareform(pdist(coords)))
In [105]: d
Out[105]:
0 1 2 3 4 5 6 7 8 9 10 11 12 \
0 0.000000 0.539313 0.138885 0.489671 0.240183 0.566555 0.343214 0.541508 0.525761 0.295906 0.566702 0.326087 0.045059
1 0.539313 0.000000 0.509028 0.765644 0.299834 0.212418 0.535287 0.253292 0.378472 0.305322 0.504946 0.501173 0.545672
2 0.138885 0.509028 0.000000 0.369830 0.240542 0.484970 0.459329 0.449965 0.591335 0.217102 0.434730 0.187983 0.100192
3 0.489671 0.765644 0.369830 0.000000 0.579235 0.639118 0.827519 0.585140 0.946945 0.474554 0.383486 0.266724 0.444612
4 0.240183 0.299834 0.240542 0.579235 0.000000 0.364005 0.335128 0.355671 0.368796 0.148598 0.482379 0.327450 0.251218
5 0.566555 0.212418 0.484970 0.639118 0.364005 0.000000 0.676135 0.055591 0.576447 0.272729 0.315123 0.399127 0.555655
6 0.343214 0.535287 0.459329 0.827519 0.335128 0.676135 0.000000 0.679527 0.281035 0.481218 0.813671 0.621056 0.387169
7 0.541508 0.253292 0.449965 0.585140 0.355671 0.055591 0.679527 0.000000 0.602427 0.245620 0.261309 0.350237 0.526773
8 0.525761 0.378472 0.591335 0.946945 0.368796 0.576447 0.281035 0.602427 0.000000 0.498845 0.811462 0.695304 0.559738
9 0.295906 0.305322 0.217102 0.474554 0.148598 0.272729 0.481218 0.245620 0.498845 0.000000 0.333842 0.208528 0.282959
10 0.566702 0.504946 0.434730 0.383486 0.482379 0.315123 0.813671 0.261309 0.811462 0.333842 0.000000 0.254850 0.533784
11 0.326087 0.501173 0.187983 0.266724 0.327450 0.399127 0.621056 0.350237 0.695304 0.208528 0.254850 0.000000 0.288072
12 0.045059 0.545672 0.100192 0.444612 0.251218 0.555655 0.387169 0.526773 0.559738 0.282959 0.533784 0.288072 0.000000
13 0.339648 0.350100 0.407307 0.769145 0.202592 0.501132 0.185248 0.511020 0.186913 0.347808 0.678357 0.527288 0.372879
14 0.530211 0.104003 0.473790 0.689158 0.303486 0.109841 0.589377 0.149459 0.468906 0.257676 0.404710 0.431203 0.527905
15 0.622118 0.178856 0.627453 0.923461 0.391044 0.387645 0.509836 0.431502 0.273610 0.450269 0.683313 0.656742 0.639993
16 0.337079 0.211995 0.297111 0.582175 0.113238 0.251168 0.434076 0.246505 0.403684 0.107671 0.409858 0.316172 0.337886
17 0.271897 0.311029 0.313864 0.668400 0.097022 0.424905 0.252905 0.426640 0.279160 0.243693 0.576241 0.422417 0.296806
18 0.664617 0.395999 0.554151 0.592343 0.504234 0.184188 0.833801 0.157951 0.758223 0.376555 0.212643 0.410605 0.642698
19 0.328445 0.719013 0.238085 0.186618 0.476045 0.642499 0.671657 0.594990 0.828653 0.413697 0.465589 0.245340 0.284878
13 14 15 16 17 18 19
0 0.339648 0.530211 0.622118 0.337079 0.271897 0.664617 0.328445
1 0.350100 0.104003 0.178856 0.211995 0.311029 0.395999 0.719013
2 0.407307 0.473790 0.627453 0.297111 0.313864 0.554151 0.238085
3 0.769145 0.689158 0.923461 0.582175 0.668400 0.592343 0.186618
4 0.202592 0.303486 0.391044 0.113238 0.097022 0.504234 0.476045
5 0.501132 0.109841 0.387645 0.251168 0.424905 0.184188 0.642499
6 0.185248 0.589377 0.509836 0.434076 0.252905 0.833801 0.671657
7 0.511020 0.149459 0.431502 0.246505 0.426640 0.157951 0.594990
8 0.186913 0.468906 0.273610 0.403684 0.279160 0.758223 0.828653
9 0.347808 0.257676 0.450269 0.107671 0.243693 0.376555 0.413697
10 0.678357 0.404710 0.683313 0.409858 0.576241 0.212643 0.465589
11 0.527288 0.431203 0.656742 0.316172 0.422417 0.410605 0.245340
12 0.372879 0.527905 0.639993 0.337886 0.296806 0.642698 0.284878
13 0.000000 0.408426 0.339019 0.274263 0.105627 0.668252 0.643427
14 0.408426 0.000000 0.282070 0.194058 0.345013 0.294029 0.663142
15 0.339019 0.282070 0.000000 0.344028 0.355134 0.568361 0.854775
16 0.274263 0.194058 0.344028 0.000000 0.181494 0.399730 0.513362
17 0.105627 0.345013 0.355134 0.181494 0.000000 0.581128 0.551910
18 0.668252 0.294029 0.568361 0.399730 0.581128 0.000000 0.649183
19 0.643427 0.663142 0.854775 0.513362 0.551910 0.649183 0.000000
result:
In [107]: d[(0 < d) & (d < r)].apply(lambda x: x.dropna().index.tolist())
Out[107]:
0 [2, 4, 9, 12, 17]
1 [4, 5, 7, 14, 15, 16]
2 [0, 4, 9, 11, 12, 16, 19]
3 [11, 19]
4 [0, 1, 2, 9, 12, 13, 16, 17]
5 [1, 7, 9, 14, 16, 18]
6 [8, 13, 17]
7 [1, 5, 9, 10, 14, 16, 18]
8 [6, 13, 15, 17]
9 [0, 2, 4, 5, 7, 11, 12, 14, 16, 17]
10 [7, 11, 18]
11 [2, 3, 9, 10, 12, 19]
12 [0, 2, 4, 9, 11, 17, 19]
13 [4, 6, 8, 16, 17]
14 [1, 5, 7, 9, 15, 16, 18]
15 [1, 8, 14]
16 [1, 2, 4, 5, 7, 9, 13, 14, 17]
17 [0, 4, 6, 8, 9, 12, 13, 16]
18 [5, 7, 10, 14]
19 [2, 3, 11, 12]
dtype: object
You can also do this with only numpy and scipy, which I find faster.
from scipy.spatial.distance import pdist, squareform
import numpy
SIZE=512
N_PARTICLE=100
RADIUS = 15
VALUE_THRESHOLD = 0
coords = numpy.random.randint(0, SIZE, size=(N_PARTICLE, 2))
values = numpy.random.randint(0, 2, (N_PARTICLE))
square_dist = squareform(pdist(coords, metric='euclidean'))
condlist = []
for i, row in enumerate(square_dist[:]):
    condlist.append(numpy.where((values>VALUE_THRESHOLD) & (row < RADIUS) & (row > 0))[0].tolist())
There must be a better way to do it, though.
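One possible way to remove the Python-level loop over the distance matrix is to build the whole neighbour mask with broadcasting. This is a minimal NumPy-only sketch, not a benchmarked answer: `neighbours_within` is a hypothetical name, the four sample points are invented, and the full pairwise matrix still costs O(n²) memory. It excludes only the point itself, whereas the `row > 0` test above also excludes any other point at exactly the same coordinates.

```python
import numpy as np

def neighbours_within(coords, values, r):
    """For each point, list indices of other points closer than r
    whose value is non-zero."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise Euclidean distances
    mask = (dist < r) & (values != 0)[None, :]  # column j must have a non-zero value
    np.fill_diagonal(mask, False)               # a point is not its own neighbour
    return [np.flatnonzero(row).tolist() for row in mask]

coords = np.array([[0, 0], [3, 4], [6, 8], [50, 50]])
values = np.array([1, 0, 2, 5])
result = neighbours_within(coords, values, r=11)
print(result)  # [[2], [0, 2], [0], []]
```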
