By default, pandas shows only the top and bottom 5 rows of a DataFrame in Jupyter when there are too many rows to display:
>>> df.shape
(100, 4)
    col0  col1  col2  col3
0      7    17    15     2
1      6     5     5    12
2     10    15     5    15
3      6    19    19    14
4     12     7     4    12
...  ...   ...   ...   ...
95     2    14     8    16
96     8     8     5    16
97     6     8     9     1
98     1     5    10    15
99    15     9     1    18
I know that this setting exists:
pd.set_option("display.max_rows", 20)
However, that yields the same result. Using df.head(10) and df.tail(10) in two consecutive cells is an option, but less clean. The same goes for concatenation. Is there another pandas setting like display.max_rows for this default view? How can I expand it to, let's say, the top and bottom 10?
IIUC, use display.min_rows:
pd.set_option("display.min_rows", 20)
print(df)
# Output:
0 1 2 3
0 18 8 12 2
1 2 13 13 14
2 8 7 9 2
3 17 19 9 3
4 14 18 12 3
5 11 5 9 18
6 4 5 12 3
7 12 8 2 7
8 11 2 14 13
9 6 6 3 6
.. .. .. .. ..
90 8 2 1 9
91 7 19 4 6
92 4 3 17 12
93 19 6 5 18
94 3 5 15 5
95 16 3 13 13
96 11 3 18 8
97 1 9 18 4
98 13 10 18 15
99 16 3 5 9
[100 rows x 4 columns]
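As far as I understand the pandas display options, this explains why changing display.max_rows alone had no visible effect. Once a frame has more rows than display.max_rows, the truncated view shows display.min_rows rows (default 10, so 5 + 5), capped at max_rows. The sketch below checks that behaviour on random data. The helper name truncated_repr and the sample frame are made up for illustration, and pd.option_context is used so the settings do not leak into the rest of the session.

```python
import numpy as np
import pandas as pd


def truncated_repr(frame, min_rows, max_rows=60):
    """Return repr(frame) rendered with temporary display settings."""
    with pd.option_context("display.min_rows", min_rows,
                           "display.max_rows", max_rows):
        return repr(frame)


# Hypothetical sample data with the same shape as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 20, size=(100, 4)),
                  columns=["col0", "col1", "col2", "col3"])

# 100 rows exceed max_rows (60), so min_rows decides how many are shown.
text = truncated_repr(df, min_rows=20)
print(text)
```

To make the change permanent in a notebook, pd.set_option("display.min_rows", 20) as in the answer above does the same thing globally.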
I need to duplicate each row 3 times and add two new columns. The new column values are different for each row.
import pandas as pd
df = {'A': [ 8,9,12],
'B': [ 1,11,3],
'C': [ 7,9,13],
'D': [81,92,121]}
df = pd.DataFrame(df)
#####################################################
#input
A B C D
8 1 7 81
9 11 9 92
12 3 13 121
####################################################
#expected output
A B C D E F
8 1 7 81 9 8 E=A+1, F= C+1
8 1 7 81 8 7 E=A, F= C
8 1 7 81 7 6 E=A-1, F= C-1
9 11 9 92 10 10
9 11 9 92 9 9
9 11 9 92 8 8
12 3 13 121 13 14
12 3 13 121 12 13
12 3 13 121 11 12
To repeat the DataFrame you can use np.repeat().
Afterwards you can create a list to add to "A" and "C".
import numpy as np
df = pd.DataFrame(np.repeat(df.to_numpy(), 3, axis=0), columns=df.columns)
# one [1, 0, -1] block per original row (3 original rows here)
extra = [1, 0, -1] * 3
df['E'] = df['A']+extra
df['F'] = df['C']+extra
This gives:
A B C D E F
0 8 1 7 81 9 8
1 8 1 7 81 8 7
2 8 1 7 81 7 6
3 9 11 9 92 10 10
4 9 11 9 92 9 9
5 9 11 9 92 8 8
6 12 3 13 121 13 14
7 12 3 13 121 12 13
8 12 3 13 121 11 12
Use Index.repeat with DataFrame.loc to repeat the rows, then repeat the integers [1, 0, -1] with numpy.tile and create the new columns E and F:
df1 = df.loc[df.index.repeat(3)]
g = np.tile([1,0,-1], len(df))
df1[['E','F']] = df1[['A','C']].add(g, axis=0).to_numpy()
df1 = df1.reset_index(drop=True)
print (df1)
A B C D E F
0 8 1 7 81 9 8
1 8 1 7 81 8 7
2 8 1 7 81 7 6
3 9 11 9 92 10 10
4 9 11 9 92 9 9
5 9 11 9 92 8 8
6 12 3 13 121 13 14
7 12 3 13 121 12 13
8 12 3 13 121 11 12
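Both answers hard-code the repeat count of 3. A minimal sketch of a reusable variant follows. The function name expand_rows and its offsets parameter are hypothetical, and it repeats rows by position with iloc so it does not depend on the index being unique. Unlike the np.repeat(df.to_numpy(), ...) approach, selecting rows this way keeps each column's dtype when the frame mixes types.

```python
import numpy as np
import pandas as pd


def expand_rows(frame, offsets=(1, 0, -1)):
    """Repeat every row len(offsets) times and add E = A + offset, F = C + offset."""
    k = len(offsets)
    # Row positions 0,0,0,1,1,1,... select each row k times.
    positions = np.repeat(np.arange(len(frame)), k)
    out = frame.iloc[positions].reset_index(drop=True)
    # Offsets cycle once per original row: 1,0,-1,1,0,-1,...
    tiled = np.tile(offsets, len(frame))
    out["E"] = out["A"] + tiled
    out["F"] = out["C"] + tiled
    return out


df = pd.DataFrame({"A": [8, 9, 12],
                   "B": [1, 11, 3],
                   "C": [7, 9, 13],
                   "D": [81, 92, 121]})
result = expand_rows(df)
print(result)
```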
I want to create the following dataframe, where m is the number of rows and n is the number of columns.
In R, this would be generated by:
ia=array((1:m),c(m,n))
But I do not know how I can achieve the same in Python.
Kind regards,
Use numpy.broadcast_to with DataFrame constructor:
m = 24
n = 13
df = pd.DataFrame(np.broadcast_to(np.arange(1, m + 1)[:, None], (m, n)))
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 5 5 5 5
5 6 6 6 6 6 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 9 9 9 9 9
9 10 10 10 10 10 10 10 10 10 10 10 10 10
10 11 11 11 11 11 11 11 11 11 11 11 11 11
11 12 12 12 12 12 12 12 12 12 12 12 12 12
12 13 13 13 13 13 13 13 13 13 13 13 13 13
13 14 14 14 14 14 14 14 14 14 14 14 14 14
14 15 15 15 15 15 15 15 15 15 15 15 15 15
15 16 16 16 16 16 16 16 16 16 16 16 16 16
16 17 17 17 17 17 17 17 17 17 17 17 17 17
17 18 18 18 18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 19 19 19 19 19 19
19 20 20 20 20 20 20 20 20 20 20 20 20 20
20 21 21 21 21 21 21 21 21 21 21 21 21 21
21 22 22 22 22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 23 23 23 23 23 23 23
23 24 24 24 24 24 24 24 24 24 24 24 24 24
df = df.rename(index = lambda x: x+1, columns=lambda x: x+1)
print (df)
1 2 3 4 5 6 7 8 9 10 11 12 13
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19 19 19 19
20 20 20 20 20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22 22 22 22 22
23 23 23 23 23 23 23 23 23 23 23 23 23 23
24 24 24 24 24 24 24 24 24 24 24 24 24 24
You can use np.repeat or np.tile:
n = 5 # 13
m = 8 # 24
# Enhanced by #mozway
df = pd.DataFrame(np.tile(np.arange(1, m+1),(n, 1)).T)
# OR
df = pd.DataFrame(np.repeat(np.arange(1, m+1), n).reshape(-1, n))
print(df)
# Output
0 1 2 3 4
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 5 5 5 5 5
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8
I am trying to add a new column in which every block of 6 rows is filled with the numbers 1 to 6, repeating for all the rows in the dataframe. The illustration below shows what the output should look like:
input
ID
0 20
1 20
2 20
3 20
4 20
5 20
6 34
7 34
8 34
9 34
10 34
11 34
12 67
13 67
14 67
15 67
16 67
17 67
output
ID 6_months
0 20 1
1 20 2
2 20 3
3 20 4
4 20 5
5 20 6
6 34 1
7 34 2
8 34 3
9 34 4
10 34 5
11 34 6
12 67 1
13 67 2
14 67 3
15 67 4
16 67 5
17 67 6
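No answer is included for this question, so here is a minimal sketch of two possible approaches, assuming the row count is a multiple of 6 and the rows of each ID are contiguous as in the example. The first counts by row position and the second restarts the counter within each ID group. The variable name by_id is made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example input: three IDs, six rows each.
df = pd.DataFrame({"ID": np.repeat([20, 34, 67], 6)})

# Option 1: counter based purely on row position, restarting every 6 rows.
df["6_months"] = np.arange(len(df)) % 6 + 1

# Option 2: counter that restarts for each ID (cumcount starts at 0).
by_id = df.groupby("ID").cumcount() + 1

print(df)
```

The two options agree here. They would differ if some ID had more or fewer than 6 rows, in which case the choice depends on whether the counter should follow position or ID.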
My question is related to pivot tables and merging.
I have a main dataframe that I use to create a pivot table. Later, I perform some calculations on that pivot and add a new column. Finally, I want to merge this new column back into the main dataframe, but I am not getting the desired result.
The steps I performed are as follows:
Step 1.
df:
items cat section weight factor1
0 1 7 abc 3 80
1 1 7 abc 3 80
2 2 7 xyz 5 60
3 2 7 xyz 5 60
4 2 7 xyz 5 60
5 2 7 xyz 5 60
6 3 7 abc 3 80
7 3 7 abc 3 80
8 3 7 abc 3 80
9 1 8 abc 2 80
10 1 8 abc 2 60
11 2 8 xyz 6 60
12 2 8 xyz 6 60
12 2 8 xyz 6 60
13 2 8 xyz 6 60
14 3 8 abc 2 80
15 1 9 abc 4 80
16 2 9 xyz 9 60
17 2 9 xyz 9 60
18 3 9 abc 4 80
The main dataframe (df) contains a number of items, each identified by a number. Each item belongs to a dedicated section and has a weight that varies by category (cat) and section. In addition, there is another column named 'factor' whose value is constant for a given section.
Step 2.
I need to create a pivot as follows from the above df.
pivot = df.pivot_table(index=['section'], values=['weight','factor','items'], columns=['cat'], aggfunc={'weight':'max','factor':'max','items':'sum'})
pivot:
weight factor items
cat 7 8 9 7 8 9 7 8 9
section
abc 3 2 4 80 80 80 5 3 2
xyz 5 6 9 60 60 60 4 4 2
Step 3:
Now I want to perform some calculations on that pivot and add the result as a new column, as follows:
pivot['w_n',7] = pivot['factor', 7]/pivot['items', 7]
pivot['w_n',8] = pivot['factor', 8]/pivot['items', 8]
pivot['w_n',9] = pivot['factor', 9]/pivot['items', 9]
pivot:
weight factor items w_n
cat 7 8 9 7 8 9 7 8 9 7 8 9
section
abc 3 2 4 80 80 80 5 3 2 16 27 40
xyz 5 6 9 60 60 60 4 4 2 15 15 30
Step 4:
Finally, I want to merge that new column back into the main df. The desired result is a single column 'w_n', but instead I am getting 3 columns, one for each cat.
Current result:
df:
items cat section weight factor1 w_n_7 w_n,8 w_n,9
0 1 7 abc 3 80 16 27 40
1 1 7 abc 3 80 16 27 40
2 2 7 xyz 5 60 15 15 30
3 2 7 xyz 5 60 15 15 30
4 2 7 xyz 5 60 15 15 30
5 2 7 xyz 5 60 15 15 30
6 3 7 abc 3 80 16 27 40
7 3 7 abc 3 80 16 27 40
8 3 7 abc 3 80 16 27 40
9 1 8 abc 2 80 16 27 40
10 1 8 abc 2 60 16 27 40
11 2 8 xyz 6 60 15 15 30
12 2 8 xyz 6 60 15 15 30
12 2 8 xyz 6 60 15 15 30
13 2 8 xyz 6 60 15 15 30
14 3 8 abc 2 80 16 27 40
15 1 9 abc 4 80 16 27 40
16 2 9 xyz 9 60 15 15 30
17 2 9 xyz 9 60 15 15 30
18 3 9 abc 4 80 16 27 40
Desired result:
df:
items cat section weight factor1 w_n
0 1 7 abc 3 80 16
1 1 7 abc 3 80 16
2 2 7 xyz 5 60 15
3 2 7 xyz 5 60 15
4 2 7 xyz 5 60 15
5 2 7 xyz 5 60 15
6 3 7 abc 3 80 16
7 3 7 abc 3 80 16
8 3 7 abc 3 80 16
9 1 8 abc 2 80 27
10 1 8 abc 2 60 27
11 2 8 xyz 6 60 15
12 2 8 xyz 6 60 15
12 2 8 xyz 6 60 15
13 2 8 xyz 6 60 15
14 3 8 abc 2 80 27
15 1 9 abc 4 80 40
16 2 9 xyz 9 60 30
17 2 9 xyz 9 60 30
18 3 9 abc 4 80 40
Use DataFrame.join with a MultiIndex Series created by DataFrame.unstack:
df = df.join(pivot['w_n'].unstack().rename('W_n'), on=['cat','section'])
print (df)
items cat section weight factor W_n
0 1 7 abc 3 80 7.272727
1 1 7 abc 3 80 7.272727
2 2 7 xyz 5 60 7.500000
3 2 7 xyz 5 60 7.500000
4 2 7 xyz 5 60 7.500000
5 2 7 xyz 5 60 7.500000
6 3 7 abc 3 80 7.272727
7 3 7 abc 3 80 7.272727
8 3 7 abc 3 80 7.272727
9 1 8 abc 2 80 16.000000
10 1 8 abc 2 60 16.000000
11 2 8 xyz 6 60 7.500000
12 2 8 xyz 6 60 7.500000
12 2 8 xyz 6 60 7.500000
13 2 8 xyz 6 60 7.500000
14 3 8 abc 2 80 16.000000
15 1 9 abc 4 80 20.000000
16 2 9 xyz 9 60 15.000000
17 2 9 xyz 9 60 15.000000
18 3 9 abc 4 80 20.000000
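To make the unstack-and-join step easier to follow, here is a small self-contained sketch on made-up data (the six rows below are not the asker's data). pivot['factor'] / pivot['items'] gives a frame indexed by section with one column per cat. Calling unstack on it produces a Series indexed by (cat, section), which join can then look up row by row.

```python
import pandas as pd

# Hypothetical miniature of the main dataframe.
df = pd.DataFrame({
    "cat":     [7, 7, 7, 8, 8, 8],
    "section": ["abc", "abc", "xyz", "abc", "xyz", "xyz"],
    "factor":  [80, 80, 60, 80, 60, 60],
    "items":   [1, 3, 2, 4, 1, 2],
})

pivot = df.pivot_table(index="section", columns="cat",
                       values=["factor", "items"],
                       aggfunc={"factor": "max", "items": "sum"})

# One value per (section, cat) cell, still in wide form.
w_n_wide = pivot["factor"] / pivot["items"]

# Wide -> long: Series with MultiIndex levels (cat, section).
w_n_long = w_n_wide.unstack().rename("w_n")

# Look up each row's (cat, section) pair in the long Series.
out = df.join(w_n_long, on=["cat", "section"])
print(out)
```

The order of the keys in on= has to match the level order of the Series index, which is (cat, section) after unstack.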
I have created a days-difference column in a pandas dataframe, and I'm looking to add a column that holds the sum of a specific value over a backward-looking window of a given number of days.
Note that I can supply a date column for each row if needed, but diff was created as the difference in days from the first day of the data.
Example
df = pd.DataFrame.from_dict({'diff': [0,0,1,2,2,2,2,10,11,15,18],
'value': [10,11,15,2,5,7,8,9,23,14,15]})
df
Out[12]:
diff value
0 0 10
1 0 11
2 1 15
3 2 2
4 2 5
5 2 7
6 2 8
7 10 9
8 11 23
9 15 14
10 18 15
I want to add a 5_days_back_sum column that sums the past 5 days, including the same day, so the result would look like this:
Out[15]:
5_days_back_sum diff value
0 21 0 10
1 21 0 11
2 36 1 15
3 58 2 2
4 58 2 5
5 58 2 7
6 58 2 8
7 9 10 9
8 32 11 23
9 46 15 14
10 29 18 15
How can I achieve that? I originally used a date column to create the diff column; it is available if that helps.
Use a custom function with boolean indexing to filter the range, then sum:
def f(x):
    return df.loc[(df['diff'] >= x - 5) & (df['diff'] <= x), 'value'].sum()
df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff value 5_days_back_sum
0 0 10 21
1 0 11 21
2 1 15 36
3 2 2 58
4 2 5 58
5 2 7 58
6 2 8 58
7 10 9 9
8 11 23 32
9 15 14 46
10 18 15 29
A similar solution with between:
def f(x):
    return df.loc[df['diff'].between(x - 5, x), 'value'].sum()
df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff value 5_days_back_sum
0 0 10 21
1 0 11 21
2 1 15 36
3 2 2 58
4 2 5 58
5 2 7 58
6 2 8 58
7 10 9 9
8 11 23 32
9 15 14 46
10 18 15 29
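Both versions above rescan the whole frame for every row, which can get slow on large data. One possible alternative, sketched here under the assumption that diff holds integer day counts, is to sum the values per day once, take a cumulative sum, and find each window's bounds with numpy.searchsorted. The function name window_sum and the days_back parameter are made up for illustration. The window is diff - 5 to diff inclusive, matching the answers above.

```python
import numpy as np
import pandas as pd


def window_sum(frame, days_back=5):
    """Sum of 'value' over rows whose diff lies in [diff - days_back, diff]."""
    # Total per distinct day; groupby returns the days in sorted order.
    daily = frame.groupby("diff")["value"].sum()
    days = daily.index.to_numpy()
    # csum[i] = total of the first i distinct days (csum[0] == 0).
    csum = np.concatenate([[0], daily.to_numpy().cumsum()])
    d = frame["diff"].to_numpy()
    hi = np.searchsorted(days, d, side="right")             # one past the current day
    lo = np.searchsorted(days, d - days_back, side="left")  # first day inside the window
    return csum[hi] - csum[lo]


df = pd.DataFrame({"diff": [0, 0, 1, 2, 2, 2, 2, 10, 11, 15, 18],
                   "value": [10, 11, 15, 2, 5, 7, 8, 9, 23, 14, 15]})
df["5_days_back_sum"] = window_sum(df)
print(df)
```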