Removing unnecessary row when re-indexing in pandas

I have a pandas DataFrame that looks like this (it can be copied and loaded with pd.read_clipboard()):
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
When I reindex it, an extra 2 appears above the index, which causes problems because my code that reads the index raises an error:
In [6025]: lookuptable.reindex(lookuptable[2])
Out[6025]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
As you can see, there is an extra 2 at the top of the index with nothing else in its row. I don't need that row at all; I want the result to look like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
I tried lookuptable.droplevel(1) and lookuptable.droplevel(0), but neither worked. How can I make the reindexed frame look like the sample above? Thanks in advance.

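The column lookups[2] is a Series with a name, namely 2 (lookups here is the frame called lookuptable in the question). When you reindex by it, the new index inherits that name, and pandas prints the name on its own line above the index values. It is not an extra row, as you can confirm with lookups.reindex(lookups[2]).shape, which is still (16, 16).
If you really don't want that 2, pass the underlying NumPy array to reindex instead: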
lookups.reindex(lookups[2].values)
Output
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2

Another option is to set the name of that axis to None:
lookups.reindex(lookups[2]).rename_axis(None)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13
3 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14
6 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9
7 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
4 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11
5 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10
10 10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5
11 11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4
8 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7
9 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6
14 14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1
15 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
12 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3
13 13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2
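To check that the 2 is only the index name, here is a small self-contained sketch. It rebuilds a table like the one in the question under the assumption that entry (i, j) equals i XOR j, which matches the rows shown, and it keeps the frame name lookups used in the answers above. The other variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: the table in the question is a 16x16 XOR table (entry = row ^ column).
n = 16
lookups = pd.DataFrame(np.bitwise_xor.outer(np.arange(n), np.arange(n)))

# Reindexing by a Series carries the Series name (2) over as the index name.
named = lookups.reindex(lookups[2])
print(named.index.name)   # the name that pandas prints above the index
print(named.shape)        # the number of rows is unchanged

# Either approach leaves the index without a name.
by_array = lookups.reindex(lookups[2].to_numpy())
by_rename = lookups.reindex(lookups[2]).rename_axis(None)
print(by_array.index.name, by_rename.index.name)
```

Both variants return the same rows in the same order; they differ only in whether the index keeps its name.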

Related

pandas insert values in order plus one

I want to make a data frame with one column per year from 2012 to 2100. The 2012 column should be the reference column Stand_Age plus 1, the 2013 column should be the 2012 column plus 1, and so on, up to the 2100 column, which should be the 2099 column plus 1. The code and the frame are below.
for i in list(range(0, 90, 1)):
    Stand_Age[i+1] = Stand_Age[i] + 1

You shouldn't use Stand_Age[i+1]; create each new column from the reference column instead:
df["2012"] = df["Stand_Age"] + 1
For many columns, use a loop:
for i in range(1, 90):
    df[str(2011+i)] = df["Stand_Age"] + i
Minimal working code:
import pandas as pd
df = pd.DataFrame({
    "Stand_Age": [1,1,2,2,3,3,4,4,5,5]
})
print(df)
for i in range(1, 10):
    df[str(2011+i)] = df["Stand_Age"] + i
print(df)
Result:
Stand_Age
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
8 5
9 5
Stand_Age 2012 2013 2014 2015 2016 2017 2018 2019 2020
0 1 2 3 4 5 6 7 8 9 10
1 1 2 3 4 5 6 7 8 9 10
2 2 3 4 5 6 7 8 9 10 11
3 2 3 4 5 6 7 8 9 10 11
4 3 4 5 6 7 8 9 10 11 12
5 3 4 5 6 7 8 9 10 11 12
6 4 5 6 7 8 9 10 11 12 13
7 4 5 6 7 8 9 10 11 12 13
8 5 6 7 8 9 10 11 12 13 14
9 5 6 7 8 9 10 11 12 13 14
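As an alternative to adding the columns one at a time, the sketch below builds all year columns first and attaches them with a single concat. It uses the same sample Stand_Age values as the minimal example above; the names year_columns and result are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Stand_Age": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]})

# Build every year column first, then attach them in one step.
# "2012" is Stand_Age + 1, "2013" is Stand_Age + 2, ..., "2100" is Stand_Age + 89.
year_columns = pd.DataFrame(
    {str(year): df["Stand_Age"] + (year - 2011) for year in range(2012, 2101)}
)
result = pd.concat([df, year_columns], axis=1)
print(result.shape)
print(result[["Stand_Age", "2012", "2013", "2100"]].head())
```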

How to reference Column data in a rolling window calculation? Error ValueError: window must be an integer 0 or greater

I am currently working on a large DataFrame and need to use the values in a column as the window size for a rolling calculation. Every row has its own window size, so I need to reference the column, but I get this error:
ValueError: window must be an integer 0 or greater
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,size=(20, 4)), columns=list('abcd'))
df['op'] = (np.random.randint(0,20, size=20))
a b c d op
0 6 17 3 5 9
1 8 3 13 7 2
2 19 12 18 3 8
3 8 8 5 4 17
4 0 5 9 3 19
5 0 5 19 9 11
6 7 7 13 8 10
7 7 5 12 0 4
8 13 17 4 4 17
9 7 0 16 9 7
10 7 8 13 10 13
11 18 3 1 11 16
12 4 4 5 13 4
13 9 8 14 19 9
14 13 10 10 7 10
15 9 16 11 16 3
16 5 7 3 0 11
17 13 14 10 1 16
18 6 14 13 4 18
19 1 9 8 0 19
I am trying to use the value in df['op'] as the window for a rolling average:
df['SMA'] = df.a.rolling(window=df.op).mean()
This produces ValueError: window must be an integer 0 or greater.
As mentioned, I am working on a large data frame, so the above is only example code.

Solution
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,size=(20, 4)),
                  columns=list('abcd'))
df['op'] = (np.random.randint(0,20, size=20))
def lookback_window(row, values, lookback, method='mean', *args, **kwargs):
    # Position of the current row and its own lookback length
    loc = values.index.get_loc(row.name)
    lb = lookback.loc[row.name]
    # Aggregate the current row plus the lb rows before it (lb + 1 values when loc >= lb)
    return getattr(values.iloc[loc - lb: loc + 1], method)(*args, **kwargs)
df['SMA'] = df.apply(lookback_window, values=df['a'], lookback=df['op'], axis=1)
df
a b c d op SMA
0 17 19 11 9 0 17.000000
1 0 10 9 11 19 NaN
2 13 8 11 2 16 NaN
3 9 2 4 4 8 NaN
4 11 10 0 17 18 NaN
5 14 19 17 10 17 NaN
6 6 12 17 1 4 10.600000
7 10 1 3 18 2 10.000000
8 7 6 12 3 19 NaN
9 1 9 7 5 9 8.800000
10 17 1 3 13 1 9.000000
11 19 17 0 2 7 10.625000
12 18 5 2 4 12 10.923077
13 18 5 4 2 1 18.000000
14 5 11 17 11 11 11.250000
15 16 9 2 11 16 NaN
16 15 17 1 8 14 11.933333
17 15 2 0 3 6 15.142857
18 18 3 18 3 10 13.545455
19 7 0 12 15 3 13.750000
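In the output above, the rows whose op value is larger than their row position show NaN, because the slice start becomes negative and the selection comes back empty. As a variation, and not the method used in the solution above, the following sketch computes the same lookback mean with prefix sums instead of a row-wise apply, which may be faster on a large frame. It also makes a different choice for windows that would start before the first row: it truncates them at row 0 rather than returning NaN. The function name variable_window_mean and the small sample frame are hypothetical.

```python
import numpy as np
import pandas as pd

def variable_window_mean(values, lookback):
    """Mean of each row's value and the `lookback` rows before it.

    Assumption: windows reaching past the first row are truncated at row 0.
    """
    vals = values.to_numpy(dtype=float)
    lb = lookback.to_numpy()
    # prefix[k] is the sum of the first k values, so a window sum is a difference.
    prefix = np.concatenate(([0.0], np.cumsum(vals)))
    pos = np.arange(len(vals))
    start = np.maximum(pos - lb, 0)   # first position in the window
    stop = pos + 1                    # one past the current row
    return pd.Series((prefix[stop] - prefix[start]) / (stop - start), index=values.index)

df = pd.DataFrame({"a": [1, 2, 3, 4], "op": [0, 1, 5, 2]})
df["SMA"] = variable_window_mean(df["a"], df["op"])
print(df)
```

Row 2 has op = 5 but only two earlier rows, so its window is cut down to rows 0 through 2. If NaN is preferred there, the truncated positions can be masked afterwards.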

how to update each series field in a dataframe

I have a DataFrame which holds two columns like below:
player_id days
0 None 1
1 None 1
2 None 1
3 None 1
4 None 1
5 None 1
6 None 2
7 None 2
8 None 2
9 None 2
10 None 2
.
.
82 None 13
83 None 14
83 None 14
83 None 14
83 None 14
83 None 14
83 None 14
In the output, I need to replace None with player ids that cycle from 1 to 11, like this:
player_id days
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 2
10 11 2
11 1 2
12 2 2
13 3 2
14 4 2
.
.
82 5 13
83 6 14
83 7 14
83 8 14
83 9 14
83 10 14
83 11 14
This is my code:
for index in range(len(df)):
    for i in range(1, 11):
        df.iloc[index, 0] = i
print(df)
However, I get the following dataframe:
player_id days
0 11 1
1 11 1
2 11 1
3 11 1
4 11 1
5 11 1
6 11 2
7 11 2
8 11 2
9 11 2
10 11 2
11 11 2
12 11 2
13 11 2
14 11 2
.
.
82 11 13
83 11 14
83 11 14
83 11 14
83 11 14
83 11 14
83 11 14
I also tried to assign a new Series as follows, but it does not work:
for index in range(len(df)):
    for i in range(1, 11):
        df.iloc[index, 0] = pd.Series([i, df['day']], index=['player_id', 'day'])
print(df)
I am not sure whether editing a field in a DataFrame is possible at all. I skipped itertuples and iterrows so that I could edit the rows efficiently.

Try the % (modulo) operator:
import numpy as np
df['player_id'] = 1 + np.arange(len(df))%11
df
output
player_id days
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
5 6 1
6 7 2
7 8 2
8 9 2
9 10 2
10 11 2
82 1 13
83 2 14
83 3 14
83 4 14
83 5 14
83 6 14
83 7 14
Edit: using the index
If the df's index (the first column in the output above) is not sequential and you want the same pattern based on the index values, you can do:
df['player_id'] = 1 + df.index%11
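The difference between the two variants only shows up when the index is not 0, 1, 2, and so on. The sketch below illustrates it on a small made-up frame whose index labels run from 100 to 114; the labels and the names by_position and by_label are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a non-sequential index (labels 100..114 are an assumption).
df = pd.DataFrame(
    {"player_id": [None] * 15, "days": np.repeat([1, 2, 3], 5)},
    index=range(100, 115),
)

# Position-based: always starts at 1, whatever the index labels are.
by_position = 1 + np.arange(len(df)) % 11
# Label-based: depends on the index values, so it starts at 1 + 100 % 11 here.
by_label = 1 + df.index.to_numpy() % 11

df["player_id"] = by_position
print(df.head(12))
print(by_label[:3])
```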

This can also be done with an explicit loop:
i=0
for index in range(len(df)):
    df.iloc[index, 0] = 1+i%11
    i+=1
print(df)
player_id days
0 1 1
1 2 1
2 3 1
3 4 1
4 5 1
5 6 1
6 7 1
7 8 1
8 9 1
9 10 1
10 11 1
11 1 2
12 2 2
13 3 2
14 4 2
15 5 2
16 6 2
17 7 2
18 8 2
19 9 2
20 10 2
21 11 2
22 1 3
23 2 3
24 3 3
25 4 3
26 5 3
27 6 3
28 7 3
29 8 3
30 9 3
31 10 3
32 11 3

How to ensure consistent spacing in creating number pyramid (Python)

I'm trying to create a number pyramid in python, and none of the solutions I've found on Stack Overflow are quite what I'm looking for. Here is the code I have so far:
for i in range(1, height+1):
    for j in range(1, height-i+1):
        if j > 9:
            print(len(str(j)) * " ", end=" ")
        else:
            print(" ", end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
And here is the output:
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
From what I can see, the code works fine with heights <= 9, but once double digits come in, the alignment fails. I also need to ensure that the spacing between each number is consistent (ONE space in between each number), but the workarounds that I've looked at involve adding more than one space.
Please let me know if there is anything I should clarify, and thank you in advance for your time!

You can use string formatting to define a fixed width for a field, padded by either whitespace or zeroes.
field_len = len(str(height))
for i in range(1, height+1):
    for j in range(1, height-i+1):
        print(" " * field_len, end=" ")
    for j in range(i, 0, -1):
        print(f"{j:{field_len}}", end=" ")
    for j in range(2, i + 1):
        print(f"{j:{field_len}}", end=" ")
    print()
which produces
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
and it adjusts the spacing automatically when the number of digits changes.
This keeps the slope of the pyramid the same, though the interior numbers look sparser because each one is padded to two characters.
One fix is to use the width of the corresponding number as the number of leading spaces. We can do that by changing the arguments to range() in the loop that prints the spaces, so that it covers the numbers that are still missing from the current row:
for i in range(1, height+1):
    for j in range(i, height):
        print(" " * len(str(j + 1)), end=" ")
    for j in range(i, 0, -1):
        print(j, end=" ")
    for j in range(2, i + 1):
        print(j, end=" ")
    print()
This produces a pyramid with uneven slopes but even spacing.
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
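The even-spacing idea can also be written as a function that returns the rows as strings, which makes the alignment easy to check. This is only a sketch of the same padding rule (pad each row by the width of the numbers it is still missing on the left); the names pyramid_lines and left_width are hypothetical.

```python
def pyramid_lines(height):
    """Return the pyramid rows with single spaces and the central 1 aligned."""

    def left_width(i):
        # Characters taken by "i ... 3 2 " to the left of the central 1 in row i.
        return sum(len(str(k)) + 1 for k in range(2, i + 1))

    center = left_width(height)  # column of the central 1 in every row
    lines = []
    for i in range(1, height + 1):
        numbers = list(range(i, 0, -1)) + list(range(2, i + 1))
        row = " ".join(str(k) for k in numbers)
        lines.append(" " * (center - left_width(i)) + row)
    return lines

print("\n".join(pyramid_lines(15)))
```

Because every row is built with " ".join, there is exactly one space between neighbouring numbers, and only the leading padding varies from row to row.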

Just for completeness, here is another approach to this problem.
The main idea is to keep track of the length of the current line and use rjust to pad it with whatever fill character you wish (I chose the default, a space):
height = 16
max_line_len = len(' '.join([str(i) for i in range(height,0,-1)] + [str(i) for i in range(2,height+1)]))
half_max_line_len = int((max_line_len+1)/2)
list_of_nums = [str(1)]
print('creating pyramid...')
for num in range(1, height+1):
    print(' '.join(list_of_nums).rjust(half_max_line_len))
    list_of_nums = [str(num+1)] + list_of_nums + [str(num+1)]
    half_max_line_len += len(str(num+1))+1
output:
creating pyramid...
1
2 1 2
3 2 1 2 3
4 3 2 1 2 3 4
5 4 3 2 1 2 3 4 5
6 5 4 3 2 1 2 3 4 5 6
7 6 5 4 3 2 1 2 3 4 5 6 7
8 7 6 5 4 3 2 1 2 3 4 5 6 7 8
9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9
10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10
11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11
12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12
13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13
14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Reducing code with a for loop to one line in Python

n=20
a=""
for i in range(1,n+1):
    a+=str(i)+" "
    print (a)
I don't know much about lambda expressions. Can this be reduced to one line? Please help.

If you are looking for a lambda, you'll need one that returns a string. This means you'll need a generator expression to build that string.
Consequently, you'll need two levels of str.join:
In [856]: f = lambda x: '\n'.join(' '.join(map(str, range(1, i))) for i in range(1, x + 1))
In [857]: print(f(20))
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 7 8 9 10 11 12 13
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
It looks complicated, but it is the same loop condensed into a generator expression. We generate each line using ' '.join(map(str, range(1, i))) for each i, and then all the lines are joined with the newline character \n. Note that range(1, i) stops at i - 1, so the first line is empty and the last line of f(20) ends at 19.
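If the goal is to reproduce the question's loop exactly, so that row i holds the numbers 1 through i and the last row ends at n, the inner range needs to run to i + 1. The sketch below assumes the print call in the question sits inside the loop body; the name triangle is illustrative.

```python
# Assumption: the original loop prints inside the loop body, so row i holds 1..i.
triangle = lambda n: "\n".join(
    " ".join(map(str, range(1, i + 1))) for i in range(1, n + 1)
)

print(triangle(20))
```

Unlike the string built in the loop, these lines have no trailing space, because str.join only puts the separator between items.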

I suggest
[print(*range(1, i+1)) for i in range(1, 20)] and None
