I have a question about resampling pandas DataFrames.
I have a DataFrame with one observation per day:
import datetime
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(366, 1)), columns=list('A'))
df.index = pd.date_range(datetime.date(2016,1,1),datetime.date(2016,12,31))
If I want to compute the sum (or another aggregate) for every month, I can directly do:
EOM_sum = df.resample(rule="M").sum()
However, I have a specific calendar (irregular frequency):
import datetime
custom_dates = pd.DatetimeIndex([datetime.date(2016,1,13),
datetime.date(2016,2,8),
datetime.date(2016,3,16),
datetime.date(2016,4,10),
datetime.date(2016,5,13),
datetime.date(2016,6,17),
datetime.date(2016,7,12),
datetime.date(2016,8,11),
datetime.date(2016,9,10),
datetime.date(2016,10,9),
datetime.date(2016,11,14),
datetime.date(2016,12,19),
datetime.date(2016,12,31)])
If I want to compute the sum for each period, I currently add a temporary column to df with the end of the period each row belongs to, and then perform the operation with a groupby:
df["period"] = custom_dates[custom_dates.searchsorted(df.index)]
custom_sum = df.groupby(by=['period']).sum()
However this is quite dirty and doesn't look pythonic. Is there a built-in method to do this in Pandas?
Thanks in advance.
Creating a new column is not necessary. You can group by a DatetimeIndex directly, because it has the same length as df:
import datetime
import pandas as pd
import numpy as np
np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,100,size=(366, 1)), columns=list('A'))
df.index = pd.date_range(datetime.date(2016,1,1),datetime.date(2016,12,31))
print (df.head())
A
2016-01-01 8
2016-01-02 24
2016-01-03 67
2016-01-04 87
2016-01-05 79
import datetime
custom_dates = pd.DatetimeIndex([datetime.date(2016,1,13),
datetime.date(2016,2,8),
datetime.date(2016,3,16),
datetime.date(2016,4,10),
datetime.date(2016,5,13),
datetime.date(2016,6,17),
datetime.date(2016,7,12),
datetime.date(2016,8,11),
datetime.date(2016,9,10),
datetime.date(2016,10,9),
datetime.date(2016,11,14),
datetime.date(2016,12,19),
datetime.date(2016,12,31)])
custom_sum = df.groupby(custom_dates[custom_dates.searchsorted(df.index)]).sum()
print (custom_sum)
A
2016-01-13 784
2016-02-08 1020
2016-03-16 1893
2016-04-10 1242
2016-05-13 1491
2016-06-17 1851
2016-07-12 1319
2016-08-11 1348
2016-09-10 1616
2016-10-09 1523
2016-11-14 1793
2016-12-19 1547
2016-12-31 664
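Another solution is to group by the NumPy array of positions returned by searchsorted, then assign custom_dates as the new index. This assumes every period in custom_dates contains at least one row; otherwise the lengths will not match: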
print (custom_dates.searchsorted(df.index))
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7
7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12]
custom_sum = df.groupby(custom_dates.searchsorted(df.index)).sum()
custom_sum.index = custom_dates
print (custom_sum)
A
2016-01-13 784
2016-02-08 1020
2016-03-16 1893
2016-04-10 1242
2016-05-13 1491
2016-06-17 1851
2016-07-12 1319
2016-08-11 1348
2016-09-10 1616
2016-10-09 1523
2016-11-14 1793
2016-12-19 1547
2016-12-31 664
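To make the searchsorted step easier to follow, here is a minimal sketch on a tiny, deterministic frame. The names period_ends and positions are only illustrative, and every value is 1 so that each sum is simply the number of days in the period.

```python
import numpy as np
import pandas as pd

# Ten daily observations, all equal to 1, so each sum counts the days in a period.
idx = pd.date_range("2016-01-01", "2016-01-10")
df = pd.DataFrame({"A": np.ones(len(idx), dtype=int)}, index=idx)

# Hypothetical irregular calendar: each date is the END of a period.
period_ends = pd.DatetimeIndex(["2016-01-03", "2016-01-07", "2016-01-10"])

# searchsorted defaults to side="left", so a day equal to a period end
# is assigned to that period rather than to the next one.
positions = period_ends.searchsorted(df.index)

# Group by the period end each row belongs to; no temporary column needed.
custom_sum = df.groupby(period_ends[positions]).sum()
print(custom_sum)
```

Any row dated after the last period end would get a position equal to len(period_ends) and fail on indexing, so the calendar should cover the whole index.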
I want to make a data frame with columns from 2012 to 2100. Each year column should be based on the reference column Stand_Age: the 2012 column is Stand_Age + 1, the 2013 column is the 2012 column + 1, and so on, up to the 2100 column being the 2099 column + 1. My current code is below.
for i in list(range(0, 90, 1)):
    Stand_Age[i+1] = Stand_Age[i] + 1
You shouldn't use Stand_Age[i+1]. Instead, assign each year as a new column:
df["2012"] = df["Stand_Age"] + 1
To create many columns, use a loop:
for i in range(1, 90):
    df[str(2011+i)] = df["Stand_Age"] + i
Minimal working code:
import pandas as pd
df = pd.DataFrame({
    "Stand_Age": [1,1,2,2,3,3,4,4,5,5]
})
print(df)
for i in range(1, 10):
    df[str(2011+i)] = df["Stand_Age"] + i
print(df)
Result:
Stand_Age
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
8 5
9 5
Stand_Age 2012 2013 2014 2015 2016 2017 2018 2019 2020
0 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10
2 2 3 4 5 6 7 8 9 10 11
3 2 3 4 5 6 7 8 9 10 11
4 3 4 5 6 7 8 9 10 11 12
5 3 4 5 6 7 8 9 10 11 12
6 4 5 6 7 8 9 10 11 12 13
7 4 5 6 7 8 9 10 11 12 13
8 5 6 7 8 9 10 11 12 13 14
9 5 6 7 8 9 10 11 12 13 14
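If the loop over 89 columns becomes slow or triggers fragmentation warnings, one possible alternative is to build all year columns at once with NumPy broadcasting. This is only a sketch: the names years, offsets, and wide are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Stand_Age": [1, 1, 2, 2, 3]})

years = np.arange(2012, 2101)   # 2012 .. 2100 inclusive
offsets = years - 2011          # +1 for 2012, +2 for 2013, ...

# Broadcasting: a (rows, 1) column plus a (years,) row gives a (rows, years) grid.
values = df["Stand_Age"].to_numpy()[:, None] + offsets

year_cols = pd.DataFrame(values, index=df.index,
                         columns=[str(y) for y in years])

# Attach all year columns in one step instead of one assignment per column.
wide = pd.concat([df, year_cols], axis=1)
print(wide.iloc[:, :4])
```

The result should match what the loop produces, since each column is still Stand_Age plus the number of years since 2011.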
I have a dataframe as below
A B C D E F G H I
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
I want to multiply every 3rd column, starting from the 2nd column, in the last 2 rows by 5 to get the output below.
How can I accomplish this?
A B C D E F G H I
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 10 3 4 25 6 7 40 9
1 10 3 4 25 6 7 40 9
I am able to select the cells I need with df.iloc[-2:,1::3],
which results in the df below, but I am not able to proceed further.
B E H
2 5 8
2 5 8
I know that I could select the same cells with loc instead of iloc, and then the calculation would be straightforward, but I am not able to figure it out.
The column names and cell values CANNOT be used, since they change (the df here is just dummy data).
You can assign back to the same selection of rows and columns:
df.iloc[-2:,1::3] = df.iloc[-2:,1::3].mul(5)
#alternative
#df.iloc[-2:,1::3] = df.iloc[-2:,1::3] * 5
print (df)
A B C D E F G H I
0 1 2 3 4 5 6 7 8 9
1 1 2 3 4 5 6 7 8 9
2 1 2 3 4 5 6 7 8 9
3 1 2 3 4 5 6 7 8 9
4 1 2 3 4 5 6 7 8 9
5 1 10 3 4 25 6 7 40 9
6 1 10 3 4 25 6 7 40 9
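For anyone who wants to run this end to end, here is a small self-contained sketch that rebuilds the dummy frame from the question and applies the same positional assignment. The rows and cols names are just illustrative slice objects, equivalent to -2: and 1::3.

```python
import pandas as pd

# Rebuild the dummy frame: seven identical rows holding 1..9 in columns A..I.
df = pd.DataFrame([list(range(1, 10))] * 7, columns=list("ABCDEFGHI"))

rows = slice(-2, None)      # last two rows, same as -2:
cols = slice(1, None, 3)    # every 3rd column starting at position 1, same as 1::3

# Read the block by position, scale it, and write it back to the same positions.
df.iloc[rows, cols] = df.iloc[rows, cols] * 5
print(df)
```

Because the selection is purely positional, it does not depend on the column names or the cell values.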
I'm trying to create a number pyramid in python, and none of the solutions I've found on Stack Overflow are quite what I'm looking for. Here is the code I have so far:
for i in range(1, height+1):
    for j in range(1, height-i+1):
        if j > 9:
            print(len(str(j)) * " ", end=" ")
        else:
            print(" ", end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
And here is the output:
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
From what I can see, the code works fine with heights <= 9, but once double digits come in, the alignment fails. I also need to ensure that the spacing between each number is consistent (ONE space in between each number), but the workarounds that I've looked at involve adding more than one space.
Please let me know if there is anything I should clarify, and thank you in advance for your time!
You can use string formatting to define a fixed width for a field, padded by either whitespace or zeroes.
field_len = len(str(height))
for i in range(1, height+1):
    for j in range(1, height-i+1):
        print(" " * field_len, end=" ")
    for j in range(i, 0, -1):
        print(f"{j:{field_len}}", end=" ")
    for j in range(2, i + 1):
        print(f"{j:{field_len}}", end=" ")
    print()
which produces
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
and which will auto-adjust the spacing if the number of digits changes.
This keeps the slope of the pyramid the same, though the interior numbers look sparser, since each one is padded to two characters.
A solution to that is to use the width of the number that will occupy each position as the number of spaces. We can do that by changing the arguments to range() where it prints the spaces, so that the padding loop covers the numbers from i + 1 up to height.
for i in range(1, height+1):
    for j in range(i, height):
        print(" " * len(str(j + 1)), end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
This produces a pyramid with uneven slopes but even spacing.
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
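The fixed-width variant above relies on a nested format spec, where the width itself comes from a variable. A minimal sketch of that behaviour in isolation (the names width, space_padded, and zero_padded are only for illustration):

```python
# Width of the widest number we expect to print, here 12 -> 2 characters.
width = len(str(12))

# Numbers are right-aligned by default and padded with spaces on the left.
space_padded = [f"{n:{width}}" for n in (1, 9, 10, 12)]
print(space_padded)   # [' 1', ' 9', '10', '12']

# A leading 0 in the spec pads with zeroes instead of spaces.
zero_padded = [f"{n:0{width}}" for n in (1, 9, 10, 12)]
print(zero_padded)    # ['01', '09', '10', '12']
```

Every field then has the same width, which is why the slope stays constant but single-digit numbers look more spread out.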
Just for completeness, I will provide another approach to this problem.
The main idea is to keep track of the length of the current line and use rjust to pad with whatever fill character you wish (I chose the default, a space).
height = 16
max_line_len = len(' '.join([str(i) for i in range(height,0,-1)] + [str(i) for i in range(2,height+1)]))
half_max_line_len = int((max_line_len+1)/2)
list_of_nums = [str(1)]
print('creating pyramid...')
for num in range(1, height+1):
    print(' '.join(list_of_nums).rjust(half_max_line_len))
    list_of_nums = [str(num+1)] + list_of_nums + [str(num+1)]
    half_max_line_len += len(str(num+1))+1
output:
creating pyramid...
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
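A related sketch, not the method above but the same idea of padding whole lines: since every row is symmetric around the central 1, it should also be possible to build each row as a string and centre it to the width of the last row with str.center. The function name pyramid_lines is made up for this example.

```python
def pyramid_lines(height):
    """Return the pyramid rows, each centred to the width of the widest row."""
    rows = []
    for i in range(1, height + 1):
        # i down to 1, then 2 up to i, separated by exactly one space.
        nums = list(range(i, 0, -1)) + list(range(2, i + 1))
        rows.append(" ".join(map(str, nums)))
    width = len(rows[-1])  # the last row is the widest
    # Each row is symmetric, so its length and the total width are both odd
    # and the padding splits evenly; rstrip drops the unneeded right padding.
    return [row.center(width).rstrip() for row in rows]

for line in pyramid_lines(12):
    print(line)
```

This keeps exactly one space between numbers and lines the central 1 up in the same column on every row.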
I have a pandas data frame that looks like this and can be copy-pasted in with pd.read_clipboard():
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
When I reindex, it creates an extra 2, which causes me issues because my code that reads the index gives an error:
In [6025]: lookuptable.reindex(lookuptable[2])
Out[6025]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
As you can see, it created an extra 2 at the top of the index with nothing in the row. I don't need that row at all; I want it to look like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
I tried lookuptable.droplevel(1) and lookuptable.droplevel(0), neither of which worked. Any help getting the reindexed frame to look like the sample above would be appreciated. Thanks in advance.
It's just that lookups[2] has a name, namely 2, so pandas prints the number 2 to show that the new index has a name. It's not an extra row, as you can see from lookups.reindex(lookups[2]).shape.
If you really don't like that number 2, just pass the NumPy array to reindex:
lookups.reindex(lookups[2].values)
Output
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
Another option is to set the name of that axis to None:
lookups.reindex(lookups[2]).rename_axis(None)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
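To see that the 2 is only the index name and not a row, here is a small sketch on a hypothetical 3x3 lookup table (the data is made up; only the reindex, .values/to_numpy, and rename_axis calls mirror the answer above):

```python
import pandas as pd

# Tiny stand-in for the lookup table; column labels are the integers 0, 1, 2.
lookups = pd.DataFrame({0: [0, 1, 2], 1: [1, 0, 2], 2: [2, 0, 1]})

# Reindexing by a Series carries the Series name over as the index name.
named = lookups.reindex(lookups[2])
print(named.index.name)   # 2
print(named.shape)        # (3, 3), so there is no extra row

# Passing a plain NumPy array leaves the index unnamed.
unnamed = lookups.reindex(lookups[2].to_numpy())

# Alternatively, clear the name after the fact.
cleared = named.rename_axis(None)
print(unnamed.index.name, cleared.index.name)
```

Both variants hold the same rows in the same order; only the index name differs.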
n=20
a=""
for i in range(1,n+1):
    a+=str(i)+" "
    print (a)
I don't know about lambda expressions. Please help me.
If you are looking for a lambda, you'll need one which returns a string. This means you'll need a generator expression to build your string.
Consequently, you'll need 2 levels of str.join:
In [856]: f = lambda x: '\n'.join(' '.join(map(str, range(1, i + 1))) for i in range(1, x + 1))
In [857]: print(f(20))
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
It looks complicated, but it is the same as a loop, condensed into a generator expression. We generate each line using ' '.join(map(str, range(1, i + 1))) for each i, and then all such lines are joined by the newline \n.
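To make the "same as a loop" point concrete, here is a sketch that puts a lambda of this shape next to an explicit loop and checks that they agree. The names triangle and triangle_loop are made up for the comparison, and the inner range runs to i + 1 so that row i ends at i.

```python
# Lambda form: the inner join builds one row, the outer join stacks the rows.
triangle = lambda n: "\n".join(
    " ".join(map(str, range(1, i + 1))) for i in range(1, n + 1)
)

def triangle_loop(n):
    """The same computation written as an explicit loop."""
    lines = []
    for i in range(1, n + 1):
        # Row i holds the numbers 1..i separated by single spaces.
        lines.append(" ".join(str(k) for k in range(1, i + 1)))
    return "\n".join(lines)

print(triangle(5))
print(triangle(20) == triangle_loop(20))   # True
```

The lambda only returns the string; printing stays outside it, which keeps it a pure expression.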
I suggest
[print(*range(1, i+1)) for i in range(1, 21)] and None