I have a dataframe like the following example:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
I want to repeat the whole dataframe as if it were one block.
For example, I want to repeat the dataframe above 3 times, with every element in each repeat increased by 3 relative to the previous block.
The desired dataframe:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
My real df is like:
0 1 2 3 4 5 6 7 8 9 10 11 12
11 CONECT 12 9 13
12 CONECT 13 12 14 15 16
13 CONECT 14 13
14 CONECT 15 13
15 CONECT 16 13 17 18 19
16 CONECT 17 16
code:
import pandas as pd
df = pd.read_csv('connect_part.txt', names=['A'])
df = df.A.str.split(expand=True)
df.fillna('', inplace=True)
repeats = 3
step = 3
df1 = df.set_index([0]) # add all non-numeric columns here
df2 = pd.concat([df1+i for i in range(0, len(df1)*repeats, step)]).reset_index()
print(df2)
error:
TypeError: can only concatenate str (not "int") to str
res = pd.concat([df + 3*i for i in range(3)], ignore_index=True)
Output:
>>> res
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
Setup:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
'D': [10, 11, 12],
'E': [13, 14, 15],
'F': [16, 17, 18]
})
Assuming df as input, use pandas.concat:
repeats = 3
step = 3
df2 = pd.concat([df+i for i in range(0, repeats*step, step)],
ignore_index=True)
output:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
update: non-numeric columns:
repeats = 3
step = 3
df1 = df.set_index([0]) # add all non-numeric columns here
df2 = pd.concat([df1+i for i in range(0, repeats*step, step)]).reset_index()
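The TypeError in the question arises because str.split returns strings, so the remaining columns are still text even after the label column is moved into the index. A minimal sketch of one way around this, assuming the non-label columns hold integers or are empty; the sample records and the names raw, labels and numeric are placeholders, not part of the original data:

```python
import pandas as pd

# Hypothetical stand-in for the CONECT file: one string per record.
raw = pd.DataFrame({"A": ["CONECT 12 9 13",
                          "CONECT 13 12 14 15 16",
                          "CONECT 14 13"]})
df = raw["A"].str.split(expand=True)

repeats = 3
step = 3

# Keep the text label in the index so it is not touched by the addition.
labels = df.set_index(0)

# Convert the remaining columns to nullable integers; missing cells become <NA>.
numeric = labels.apply(pd.to_numeric, errors="coerce").astype("Int64")

out = pd.concat(
    [numeric + offset for offset in range(0, repeats * step, step)]
).reset_index()
print(out)
```

Missing cells stay missing in every block, so they can be replaced with empty strings only at the end, when the frame is written back out.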
Suppose I have the following data frame:
data = {'age' :[10,11,12,11,11,10,11,13,13,13,14,14,15,15,15],
'num1':[10,11,12,13,14,15,16,17,18,19,20,21,22,23,24],
'num2':[20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]}
df = pd.DataFrame(data)
I want to sum the rows for ages 14 and 15 and keep those new values as age 14. My expected output would be like this:
age num1 num2
1 10 10 20
2 11 11 21
3 12 12 22
4 11 13 23
5 11 14 24
6 10 15 25
7 11 16 26
8 13 17 27
9 13 18 28
10 13 19 29
11 14 110 160
In the code below, I tried to group by age, but it does not work for me:
df1 =df.groupby(age[age >=14])['num1', 'num2'].apply(', '.join).reset_index(drop=True).to_frame()
limit_age = 14
new = df.query("age < @limit_age").copy()
new.loc[len(new)] = [limit_age,
*df.query("age >= @limit_age").drop(columns="age").sum()]
This first takes the rows with age below 14, then appends one new row in which age is 14 and the other values are the column-wise sums over the rows with age 14 or above, to get
>>> new
age num1 num2
0 10 10 20
1 11 11 21
2 12 12 22
3 11 13 23
4 11 14 24
5 10 15 25
6 11 16 26
7 13 17 27
8 13 18 28
9 13 19 29
10 14 110 160
(new.index += 1 can be used for a 1-based index at the end.)
I would use a mask and concat:
m = df['age'].isin([14, 15])
out = pd.concat([df[~m],
df[m].agg({'age': 'min', 'num1': 'sum', 'num2': 'sum'})
.to_frame().T
], ignore_index=True)
Output:
age num1 num2
0 10 10 20
1 11 11 21
2 12 12 22
3 11 13 23
4 11 14 24
5 10 15 25
6 11 16 26
7 13 17 27
8 13 18 28
9 13 19 29
10 14 110 160
I need to duplicate each row 3 times and add two new columns. The new column values are different for each row.
import pandas as pd
df = {'A': [ 8,9,12],
'B': [ 1,11,3],
'C': [ 7,9,13],
'D': [81,92,121]}
df = pd.DataFrame(df)
#####################################################
#input
A B C D
8 1 7 81
9 11 9 92
12 3 13 121
####################################################
#expected output
A B C D E F
8 1 7 81 9 8 E=A+1, F= C+1
8 1 7 81 8 7 E=A, F= C
8 1 7 81 7 6 E=A-1, F= C-1
9 11 9 92 10 10
9 11 9 92 9 9
9 11 9 92 8 8
12 3 13 121 13 14
12 3 13 121 12 13
12 3 13 121 11 12
To repeat the DataFrame you can use np.repeat().
Afterwards you can create a list to add to "A" and "C".
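import numpy as np

n_rows = len(df)  # number of original rows, taken before repeating
df = pd.DataFrame(np.repeat(df.to_numpy(), 3, axis=0), columns=df.columns)
extra = [1, 0, -1] * n_rows  # one (+1, 0, -1) triple per original row
df['E'] = df['A'] + extra
df['F'] = df['C'] + extra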
This gives:
A B C D E F
0 8 1 7 81 9 8
1 8 1 7 81 8 7
2 8 1 7 81 7 6
3 9 11 9 92 10 10
4 9 11 9 92 9 9
5 9 11 9 92 8 8
6 12 3 13 121 13 14
7 12 3 13 121 12 13
8 12 3 13 121 11 12
Use Index.repeat with DataFrame.loc to repeat the rows, then tile the integers [1,0,-1] with numpy.tile and create the new columns E and F:
df1 = df.loc[df.index.repeat(3)]
g = np.tile([1,0,-1], len(df))
df1[['E','F']] = df1[['A','C']].add(g, axis=0).to_numpy()
df1 = df1.reset_index(drop=True)
print (df1)
A B C D E F
0 8 1 7 81 9 8
1 8 1 7 81 8 7
2 8 1 7 81 7 6
3 9 11 9 92 10 10
4 9 11 9 92 9 9
5 9 11 9 92 8 8
6 12 3 13 121 13 14
7 12 3 13 121 12 13
8 12 3 13 121 11 12
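Another way to express the same idea is a cross join between the original rows and a small frame of offsets. This is only a sketch: it assumes pandas 1.2 or later (for how="cross"), and the column name offset is a placeholder that is dropped at the end.

```python
import pandas as pd

df = pd.DataFrame({"A": [8, 9, 12],
                   "B": [1, 11, 3],
                   "C": [7, 9, 13],
                   "D": [81, 92, 121]})

# One row per offset to apply to A and C.
offsets = pd.DataFrame({"offset": [1, 0, -1]})

# Cross join: every original row is paired with every offset.
out = df.merge(offsets, how="cross")
out["E"] = out["A"] + out["offset"]
out["F"] = out["C"] + out["offset"]
out = out.drop(columns="offset")
print(out)
```

The offsets live in one place here, so changing the number of repeats only means editing that list.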
I want to create a dataframe with m rows and n columns in which every column contains the values 1 to m.
In R, this would be generated by:
ia=array((1:m),c(m,n))
But I do not know how I can achieve the same in Python.
Use numpy.broadcast_to with DataFrame constructor:
m = 24
n = 13
df = pd.DataFrame(np.broadcast_to(np.arange(1, m + 1)[:, None], (m, n)))
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 5 5 5 5
5 6 6 6 6 6 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 9 9 9 9 9
9 10 10 10 10 10 10 10 10 10 10 10 10 10
10 11 11 11 11 11 11 11 11 11 11 11 11 11
11 12 12 12 12 12 12 12 12 12 12 12 12 12
12 13 13 13 13 13 13 13 13 13 13 13 13 13
13 14 14 14 14 14 14 14 14 14 14 14 14 14
14 15 15 15 15 15 15 15 15 15 15 15 15 15
15 16 16 16 16 16 16 16 16 16 16 16 16 16
16 17 17 17 17 17 17 17 17 17 17 17 17 17
17 18 18 18 18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 19 19 19 19 19 19
19 20 20 20 20 20 20 20 20 20 20 20 20 20
20 21 21 21 21 21 21 21 21 21 21 21 21 21
21 22 22 22 22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 23 23 23 23 23 23 23
23 24 24 24 24 24 24 24 24 24 24 24 24 24
df = df.rename(index = lambda x: x+1, columns=lambda x: x+1)
print (df)
1 2 3 4 5 6 7 8 9 10 11 12 13
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19 19 19 19
20 20 20 20 20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22 22 22 22 22
23 23 23 23 23 23 23 23 23 23 23 23 23 23
24 24 24 24 24 24 24 24 24 24 24 24 24 24
You can use np.repeat or np.tile:
n = 5 # 13
m = 8 # 24
# Enhanced by @mozway
df = pd.DataFrame(np.tile(np.arange(1, m+1), (n, 1)).T)
# OR (repeat each value n times, then reshape to n columns)
df = pd.DataFrame(np.repeat(np.arange(1, m+1), n).reshape(-1, n))
print(df)
# Output
0 1 2 3 4
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 5 5 5 5 5
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8
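As a quick sanity check, the three constructions can be compared directly. This sketch assumes m is the number of rows and n the number of columns, as in the outputs above; for the np.repeat variant both the repeat count and the reshape width have to be n.

```python
import numpy as np
import pandas as pd

m, n = 8, 5  # m rows, n columns

# Column vector 1..m broadcast across n columns (read-only view, no copy).
a = np.broadcast_to(np.arange(1, m + 1)[:, None], (m, n))

# Row vector 1..m stacked n times, then transposed.
b = np.tile(np.arange(1, m + 1), (n, 1)).T

# Each value repeated n times, then reshaped into rows of width n.
c = np.repeat(np.arange(1, m + 1), n).reshape(-1, n)

print(np.array_equal(a, b) and np.array_equal(b, c))

df = pd.DataFrame(a)
print(df.shape)
```

Because np.broadcast_to returns a read-only view, it may be safer to call .copy() on the array before building a frame that will be modified later.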
By default, pandas shows you the top and bottom 5 rows of a dataframe in Jupyter when there are too many rows to display:
>>> df.shape
(100, 4)
    col0  col1  col2  col3
0      7    17    15     2
1      6     5     5    12
2     10    15     5    15
3      6    19    19    14
4     12     7     4    12
..   ...   ...   ...   ...
95     2    14     8    16
96     8     8     5    16
97     6     8     9     1
98     1     5    10    15
99    15     9     1    18
I know that this setting exists:
pd.set_option("display.max_rows", 20)
However, that yields the same result. Using df.head(10) and df.tail(10) in two consecutive cells is an option, but less clean. The same goes for concatenation. Is there another pandas setting like display.max_rows for this default view? How can I expand this to, let's say, the top and bottom 10?
IIUC, use display.min_rows:
pd.set_option("display.min_rows", 20)
print(df)
# Output:
0 1 2 3
0 18 8 12 2
1 2 13 13 14
2 8 7 9 2
3 17 19 9 3
4 14 18 12 3
5 11 5 9 18
6 4 5 12 3
7 12 8 2 7
8 11 2 14 13
9 6 6 3 6
.. .. .. .. ..
90 8 2 1 9
91 7 19 4 6
92 4 3 17 12
93 19 6 5 18
94 3 5 15 5
95 16 3 13 13
96 11 3 18 8
97 1 9 18 4
98 13 10 18 15
99 16 3 5 9
[100 rows x 4 columns]
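If the setting should apply only to one display rather than globally, pd.option_context can scope it. A minimal sketch, assuming a random 100x4 frame as a stand-in for the real data; display.max_rows is set explicitly to its default of 60 so that truncation is triggered and display.min_rows decides how many rows remain:

```python
import numpy as np
import pandas as pd

# Placeholder data: 100 rows, 4 columns of random integers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 20, size=(100, 4)))

# The options are restored to their previous values when the block exits.
with pd.option_context("display.max_rows", 60, "display.min_rows", 20):
    text = repr(df)

print(text)
```

With these values the truncated view keeps 10 rows from the top and 10 from the bottom, which is why changing display.max_rows alone did not help.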
I've got the following list of 25 mini black-and-white images representing patterns:
imgs.shape
(25, 3, 3, 1)
I.e. there are 25 different 3x3 black-and-white image patterns. What I want to do is create a single large image that is a 5x5 grid of these 3x3 blocks.
My intention is then to have something of shape (15, 15, 1) that I can display and view. I'm using numpy and opencv with Python. I am looking for something quite efficient for real-time processing, so I thought numpy's reshape might make sense.
Solution:
imgs.reshape(5, 5, 3, 3, 1).swapaxes(1, 2).reshape(15, 15, 1)
Examples:
import numpy as np

# test data
# each 3x3 image consists of 9 identical digits
A = np.stack([
np.full((3, 3, 1), i)
for i in range(1, 26)
])
with_swap = A.reshape(5, 5, 3, 3, 1).swapaxes(1, 2).reshape(15, 15, 1)
print(with_swap[...,-1])
without_swap = A.reshape(15, 15, 1)
print(without_swap[...,-1])
With swap:
[[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]]
Without swap:
[[ 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2]
[ 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4]
[ 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5]
[ 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7]
[ 7 7 7 8 8 8 8 8 8 8 8 8 9 9 9]
[ 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10]
[11 11 11 11 11 11 11 11 11 12 12 12 12 12 12]
[12 12 12 13 13 13 13 13 13 13 13 13 14 14 14]
[14 14 14 14 14 14 15 15 15 15 15 15 15 15 15]
[16 16 16 16 16 16 16 16 16 17 17 17 17 17 17]
[17 17 17 18 18 18 18 18 18 18 18 18 19 19 19]
[19 19 19 19 19 19 20 20 20 20 20 20 20 20 20]
[21 21 21 21 21 21 21 21 21 22 22 22 22 22 22]
[22 22 22 23 23 23 23 23 23 23 23 23 24 24 24]
[24 24 24 24 24 24 25 25 25 25 25 25 25 25 25]]
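The same reshape and swapaxes pattern can be wrapped in a small helper for other grid and tile sizes. This is a sketch only; tile_images and its parameter names are hypothetical, and it assumes the images are stored in row-major grid order with shape (N, h, w, c).

```python
import numpy as np

def tile_images(imgs, grid_rows, grid_cols):
    """Arrange (N, h, w, c) images into one (grid_rows*h, grid_cols*w, c) mosaic."""
    n, h, w, c = imgs.shape
    if n != grid_rows * grid_cols:
        raise ValueError("number of images must equal grid_rows * grid_cols")
    return (
        imgs.reshape(grid_rows, grid_cols, h, w, c)
        .swapaxes(1, 2)  # -> (grid_rows, h, grid_cols, w, c)
        .reshape(grid_rows * h, grid_cols * w, c)
    )

# Same test data as above: image i is filled with the value i.
imgs = np.stack([np.full((3, 3, 1), i) for i in range(1, 26)])
mosaic = tile_images(imgs, 5, 5)
print(mosaic.shape)
```

The swap is what moves each image's pixel rows next to the pixel rows of its horizontal neighbours before the final reshape flattens the grid.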