create dataframe with increasing numbers in python

I want to create the following dataframe, where m is the number of rows and n is the number of columns.
In R, this would be generated by:
ia=array((1:m),c(m,n))
But I do not know how I can achieve the same in Python.
Kind regards,

Use numpy.broadcast_to with the DataFrame constructor:
import numpy as np
import pandas as pd
m = 24
n = 13
df = pd.DataFrame(np.broadcast_to(np.arange(1, m + 1)[:, None], (m, n)))
print(df)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 5 5 5 5
5 6 6 6 6 6 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7 7 7 7 7 7
7 8 8 8 8 8 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 9 9 9 9 9
9 10 10 10 10 10 10 10 10 10 10 10 10 10
10 11 11 11 11 11 11 11 11 11 11 11 11 11
11 12 12 12 12 12 12 12 12 12 12 12 12 12
12 13 13 13 13 13 13 13 13 13 13 13 13 13
13 14 14 14 14 14 14 14 14 14 14 14 14 14
14 15 15 15 15 15 15 15 15 15 15 15 15 15
15 16 16 16 16 16 16 16 16 16 16 16 16 16
16 17 17 17 17 17 17 17 17 17 17 17 17 17
17 18 18 18 18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 19 19 19 19 19 19
19 20 20 20 20 20 20 20 20 20 20 20 20 20
20 21 21 21 21 21 21 21 21 21 21 21 21 21
21 22 22 22 22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 23 23 23 23 23 23 23
23 24 24 24 24 24 24 24 24 24 24 24 24 24
To start the row and column labels at 1 instead of 0, rename both axes:
df = df.rename(index=lambda x: x + 1, columns=lambda x: x + 1)
print(df)
1 2 3 4 5 6 7 8 9 10 11 12 13
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19 19 19 19
20 20 20 20 20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22 22 22 22 22
23 23 23 23 23 23 23 23 23 23 23 23 23 23
24 24 24 24 24 24 24 24 24 24 24 24 24 24

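You can use np.repeat or np.tile:
import numpy as np
import pandas as pd
n = 5 # number of columns (13 in the question)
m = 8 # number of rows (24 in the question)
# Enhanced by #mozway
df = pd.DataFrame(np.tile(np.arange(1, m+1), (n, 1)).T)
# OR: repeat each value n times, then reshape to n columns
df = pd.DataFrame(np.repeat(np.arange(1, m+1), n).reshape(-1, n))
print(df)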
# Output
0 1 2 3 4
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
4 5 5 5 5 5
5 6 6 6 6 6
6 7 7 7 7 7
7 8 8 8 8 8

Related

Repeat and concatenate a DataFrame with constant step value increase

I have a dataframe like the following example:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
I want to repeat the whole dataframe as one block.
For example, I want to repeat the above dataframe 3 times, with every element of each copy increased by 3 relative to the previous copy.
The desired dataframe:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
My real df is like:
0 1 2 3 4 5 6 7 8 9 10 11 12
11 CONECT 12 9 13
12 CONECT 13 12 14 15 16
13 CONECT 14 13
14 CONECT 15 13
15 CONECT 16 13 17 18 19
16 CONECT 17 16
code:
import pandas as pd
df = pd.read_csv('connect_part.txt', 'sample_file.csv', names =['A'])
df = df.A.str.split(expand=True)
df.fillna('', inplace=True)
repeats = 3
step = 3
df1 = df.set_index([0]) # add all non-numeric columns here
df2 = pd.concat([df1+i for i in range(0, len(df1)*repeats, step)]).reset_index()
print(df2)
error:
TypeError: can only concatenate str (not "int") to str

res = pd.concat([df + 3*i for i in range(3)], ignore_index=True)
Output:
>>> res
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
Setup:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
'D': [10, 11, 12],
'E': [13, 14, 15],
'F': [16, 17, 18]
})

Assuming df as input, use pandas.concat:
repeats = 3
step = 3
df2 = pd.concat([df+i for i in range(0, repeats*step, step)],
ignore_index=True)
output:
A B C D E F
0 1 4 7 10 13 16
1 2 5 8 11 14 17
2 3 6 9 12 15 18
3 4 7 10 13 16 19
4 5 8 11 14 17 20
5 6 9 12 15 18 21
6 7 10 13 16 19 22
7 8 11 14 17 20 23
8 9 12 15 18 21 24
update: with non-numeric columns, move them into the index first so that only the numeric columns are incremented:
repeats = 3
step = 3
df1 = df.set_index([0]) # add all non-numeric columns here
df2 = pd.concat([df1+i for i in range(0, repeats*step, step)]).reset_index()
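
The TypeError in the question likely arises because str.split returns strings (and fillna('') keeps the columns as text), so adding an integer to them fails. One possible workaround, sketched below under the assumption that every column except the first should be numeric, is to convert those columns with pd.to_numeric before adding the offset. The sample rows are taken from the question; the names raw, labels, numbers, and blocks are hypothetical.

```python
import pandas as pd

# A few rows shaped like the question's CONECT data
raw = pd.DataFrame({"A": ["CONECT 12 9 13",
                          "CONECT 13 12 14 15 16",
                          "CONECT 14 13"]})
df = raw["A"].str.split(expand=True)

labels = df[[0]]  # non-numeric column(s), left unchanged
# Convert the rest to nullable integers; missing cells become <NA>
numbers = (df.drop(columns=0)
             .apply(pd.to_numeric, errors="coerce")
             .astype("Int64"))

repeats = 3
step = 3
blocks = [pd.concat([labels, numbers + i], axis=1)
          for i in range(0, repeats * step, step)]
df2 = pd.concat(blocks, ignore_index=True)
print(df2)
```

Missing cells stay missing in every copy because adding an integer to <NA> yields <NA>.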

How to change the default number of top and bottom rows

By default, pandas shows you top and bottom 5 rows of a dataframe in jupyter, given that there are too many rows to display:
>>> df.shape
(100, 4)
    col0  col1  col2  col3
0      7    17    15     2
1      6     5     5    12
2     10    15     5    15
3      6    19    19    14
4     12     7     4    12
...  ...   ...   ...   ...
95     2    14     8    16
96     8     8     5    16
97     6     8     9     1
98     1     5    10    15
99    15     9     1    18
I know that this setting exists:
pd.set_option("display.max_rows", 20)
However, that yields the same result. Using df.head(10) and df.tail(10) in two consecutive cells is an option, but it is less clean. The same goes for concatenation. Is there another pandas setting like display.max_rows for this default view? How can I expand it to, say, the top and bottom 10 rows?

IIUC, use display.min_rows:
pd.set_option("display.min_rows", 20)
print(df)
# Output:
0 1 2 3
0 18 8 12 2
1 2 13 13 14
2 8 7 9 2
3 17 19 9 3
4 14 18 12 3
5 11 5 9 18
6 4 5 12 3
7 12 8 2 7
8 11 2 14 13
9 6 6 3 6
.. .. .. .. ..
90 8 2 1 9
91 7 19 4 6
92 4 3 17 12
93 19 6 5 18
94 3 5 15 5
95 16 3 13 13
96 11 3 18 8
97 1 9 18 4
98 13 10 18 15
99 16 3 5 9
[100 rows x 4 columns]
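
As a brief illustration of how the two options interact: display.max_rows (default 60) decides when a frame is truncated, and display.min_rows (default 10) decides how many rows are shown once it is truncated, split between top and bottom. That is why changing only display.max_rows to 20 still showed 5 + 5 rows. The sketch below uses randomly generated data (an assumption, not the asker's frame) and pd.option_context so the settings apply only inside the with block.

```python
import numpy as np
import pandas as pd

# Illustrative 100 x 4 frame; the values are arbitrary
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 20, size=(100, 4)),
                  columns=["col0", "col1", "col2", "col3"])

# 100 rows exceed max_rows (60), so the repr is truncated
# and shows min_rows (20) rows: the first 10 and the last 10
with pd.option_context("display.max_rows", 60, "display.min_rows", 20):
    text = repr(df)

print(text)
```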

Is there a pandas function to compare column values within each row?

How can I drop the rows in which the POS, BDP, and TMS columns all hold the same value?
The data looks like this:
cod sto POS BDP TMS
30C0 A89R 29 30 30
30C0 A89R 27 27 27
30C0 A89S 10 12 12
30C0 A89S 8 8 8
30C0 A89T 6 9 9
30C0 A89U 15 15 15
30C0 A89V 7 8 8
30C0 A89V 6 13 13
30C0 A89W 6 6 6
30C0 A89W 4 4 4
30C0 A89X 18 15 15
30C0 A89Y 25 27 27
30C0 A89Y 13 13 13
30C0 A89Z 15 17 17
30C0 A89Z 9 10 10
30C0 A900 6 6 6
Desired output:
30C0 A89R 29 30 30
30C0 A89S 10 12 12
30C0 A89T 6 9 9
30C0 A89V 7 8 8
30C0 A89V 6 13 13
30C0 A89X 18 15 15
30C0 A89Y 25 27 27
30C0 A89Z 15 17 17
30C0 A89Z 9 10 10

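You can check with nunique, which counts the distinct values in each row; keep the rows where that count is greater than 1:
yourdf = df[df.iloc[:, 2:].nunique(axis=1).gt(1)].copy()
yourdf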
Out[565]:
cod sto POS BDP TMS
0 30C0 A89R 29 30 30
2 30C0 A89S 10 12 12
4 30C0 A89T 6 9 9
6 30C0 A89V 7 8 8
7 30C0 A89V 6 13 13
10 30C0 A89X 18 15 15
11 30C0 A89Y 25 27 27
13 30C0 A89Z 15 17 17
14 30C0 A89Z 9 10 10
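
For readers who want to try this without the original data, here is a small self-contained sketch using the first four rows from the question. It also shows an alternative mask that compares every value column against the first one; that alternative is an addition for illustration and not part of the answer above.

```python
import pandas as pd

# First four rows of the question's data
df = pd.DataFrame({
    "cod": ["30C0", "30C0", "30C0", "30C0"],
    "sto": ["A89R", "A89R", "A89S", "A89S"],
    "POS": [29, 27, 10, 8],
    "BDP": [30, 27, 12, 8],
    "TMS": [30, 27, 12, 8],
})

values = df[["POS", "BDP", "TMS"]]

# Rows with more than one distinct value across the three columns
keep = values.nunique(axis=1).gt(1)

# Alternative: any column differs from the first value column
alt = values.ne(values.iloc[:, 0], axis=0).any(axis=1)

result = df[keep].copy()
print(result)
```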

Python numpy how to reshape this list of arrays/images into a collage?

I've got the following list of 25 mini black-and-white images representing patterns:
imgs.shape
(25, 3, 3, 1)
I.e. there are 25 different 3x3 black and white image patterns. What I want to do is create a single large image that's 5x5 of these 3x3 blocks, does that make sense? Kind of like this below:
My intention is then to have something of shape (15, 15, 1) that I can display and view like this. I'm using numpy and opencv with Python. I am looking to do something quite efficient for real-time processing, so I thought numpy's reshape might make sense.

Solution:
imgs.reshape(5, 5, 3, 3, 1).swapaxes(1, 2).reshape(15, 15, 1)
Examples:
import numpy as np
# test data
# each 3x3 image consists of 9 identical values
A = np.stack([
    np.full((3, 3, 1), i)
    for i in range(1, 26)
])
with_swap = A.reshape(5, 5, 3, 3, 1).swapaxes(1, 2).reshape(15, 15, 1)
print(with_swap[...,-1])
without_swap = A.reshape(15, 15, 1)
print(without_swap[...,-1])
With swap:
[[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[ 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[11 11 11 12 12 12 13 13 13 14 14 14 15 15 15]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[16 16 16 17 17 17 18 18 18 19 19 19 20 20 20]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]
[21 21 21 22 22 22 23 23 23 24 24 24 25 25 25]]
Without swap:
[[ 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2]
[ 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4]
[ 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5]
[ 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7]
[ 7 7 7 8 8 8 8 8 8 8 8 8 9 9 9]
[ 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10]
[11 11 11 11 11 11 11 11 11 12 12 12 12 12 12]
[12 12 12 13 13 13 13 13 13 13 13 13 14 14 14]
[14 14 14 14 14 14 15 15 15 15 15 15 15 15 15]
[16 16 16 16 16 16 16 16 16 17 17 17 17 17 17]
[17 17 17 18 18 18 18 18 18 18 18 18 19 19 19]
[19 19 19 19 19 19 20 20 20 20 20 20 20 20 20]
[21 21 21 21 21 21 21 21 21 22 22 22 22 22 22]
[22 22 22 23 23 23 23 23 23 23 23 23 24 24 24]
[24 24 24 24 24 24 25 25 25 25 25 25 25 25 25]]
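
The same reshape and swapaxes idea can be generalized to other grid and tile sizes. The helper below is a hypothetical sketch (make_collage is not a NumPy or OpenCV function): it splits the image axis into grid rows and grid columns, moves the tile-height axis next to the grid-row axis, and then merges the axes pairwise.

```python
import numpy as np

def make_collage(imgs, rows, cols):
    """Arrange imgs of shape (rows*cols, h, w, c) into a (rows*h, cols*w, c) grid."""
    n, h, w, c = imgs.shape
    if n != rows * cols:
        raise ValueError("rows * cols must equal the number of images")
    return (imgs.reshape(rows, cols, h, w, c)   # (grid row, grid col, y, x, c)
                .swapaxes(1, 2)                 # (grid row, y, grid col, x, c)
                .reshape(rows * h, cols * w, c))

# 25 distinct 3x3 single-channel tiles
imgs = np.arange(25 * 3 * 3).reshape(25, 3, 3, 1)
collage = make_collage(imgs, 5, 5)
print(collage.shape)
```

Tile k ends up at grid row k // cols and grid column k % cols, which matches the "With swap" output above.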

Columns located within a column

I am trying to extract a dataframe from a web api and can't seem to work out how to break columns out. For Home and Away, they have breakdowns inside them, so should read Home Wins, Home Draws etc.
url = "http://api.football-data.org/v1/soccerseasons/398/leagueTable/?matchday=38"
response = requests.get(url)
response_json = response.content
result = json.loads(response_json)
football = pd.DataFrame(result['standing'], columns=['position','teamName','playedGames','wins','draws','losses','goals',
'goalsAgainst','home','away','goalDifference','points'])
football
football.home
this shows the problem:
0 {u'wins': 12, u'losses': 1, u'draws': 6, u'goa...

I think you can use json_normalize (available as pd.json_normalize since pandas 1.0):
import pandas as pd
import json
import requests
url = "http://api.football-data.org/v1/soccerseasons/398/leagueTable/?matchday=38"
result = json.loads(requests.get(url).text)
#print (result)
# flatten nested dicts such as "home" and "away" into dotted column names
df = pd.json_normalize(result["standing"])
print(df)
_links.team.href away.draws away.goals \
0 http://api.football-data.org/v1/teams/338 6 33
1 http://api.football-data.org/v1/teams/57 7 34
2 http://api.football-data.org/v1/teams/73 7 34
3 http://api.football-data.org/v1/teams/65 7 24
4 http://api.football-data.org/v1/teams/66 4 22
5 http://api.football-data.org/v1/teams/340 6 20
6 http://api.football-data.org/v1/teams/563 7 31
7 http://api.football-data.org/v1/teams/64 4 30
8 http://api.football-data.org/v1/teams/70 5 19
9 http://api.football-data.org/v1/teams/61 5 27
10 http://api.football-data.org/v1/teams/62 9 24
11 http://api.football-data.org/v1/teams/72 5 22
12 http://api.football-data.org/v1/teams/346 3 20
13 http://api.football-data.org/v1/teams/74 8 14
14 http://api.football-data.org/v1/teams/354 6 20
15 http://api.football-data.org/v1/teams/1044 4 22
16 http://api.football-data.org/v1/teams/71 6 25
17 http://api.football-data.org/v1/teams/67 3 12
18 http://api.football-data.org/v1/teams/68 2 13
19 http://api.football-data.org/v1/teams/58 3 13
away.goalsAgainst away.losses away.wins \
0 18 2 11
1 25 4 8
2 20 3 9
3 20 5 7
4 26 8 7
5 19 6 7
6 25 5 7
7 28 7 8
8 31 8 6
9 23 7 7
10 25 5 5
11 32 10 4
12 31 10 6
13 22 7 4
14 28 8 5
15 33 9 6
16 42 10 3
17 41 14 2
18 37 14 3
19 41 15 1
crestURI draws goalDifference \
0 http://upload.wikimedia.org/wikipedia/en/6/63/... 12 32
1 http://upload.wikimedia.org/wikipedia/en/5/53/... 11 29
2 http://upload.wikimedia.org/wikipedia/de/b/b4/... 13 34
3 http://upload.wikimedia.org/wikipedia/de/f/fd/... 9 30
4 http://upload.wikimedia.org/wikipedia/de/d/da/... 9 14
5 http://upload.wikimedia.org/wikipedia/de/c/c9/... 9 18
6 http://upload.wikimedia.org/wikipedia/de/e/e0/... 14 14
7 http://upload.wikimedia.org/wikipedia/de/0/0a/... 12 13
8 http://upload.wikimedia.org/wikipedia/de/a/a3/... 9 -14
9 http://upload.wikimedia.org/wikipedia/de/5/5c/... 14 6
10 http://upload.wikimedia.org/wikipedia/de/f/f9/... 14 4
11 http://upload.wikimedia.org/wikipedia/de/a/ab/... 11 -10
12 https://upload.wikimedia.org/wikipedia/en/e/e2... 9 -10
13 http://upload.wikimedia.org/wikipedia/de/8/8b/... 13 -14
14 http://upload.wikimedia.org/wikipedia/de/b/bf/... 9 -12
15 https://upload.wikimedia.org/wikipedia/de/4/41... 9 -22
16 http://upload.wikimedia.org/wikipedia/de/6/60/... 12 -14
17 http://upload.wikimedia.org/wikipedia/de/5/56/... 10 -21
18 http://upload.wikimedia.org/wikipedia/de/8/8c/... 7 -28
19 http://upload.wikimedia.org/wikipedia/de/9/9f/... 8 -49
goals ... home.goals home.goalsAgainst home.losses home.wins \
0 68 ... 35 18 1 12
1 65 ... 31 11 3 12
2 69 ... 35 15 3 10
3 71 ... 47 21 5 12
4 49 ... 27 9 2 12
5 59 ... 39 22 5 11
6 65 ... 34 26 3 9
7 63 ... 33 22 3 8
8 41 ... 22 24 7 8
9 59 ... 32 30 5 5
10 59 ... 35 30 8 6
11 42 ... 20 20 5 8
12 40 ... 20 19 7 6
13 34 ... 20 26 8 6
14 39 ... 19 23 10 6
15 45 ... 23 34 9 5
16 48 ... 23 20 7 6
17 44 ... 32 24 5 7
18 39 ... 26 30 8 6
19 27 ... 14 35 12 2
losses playedGames points position teamName wins
0 3 38 81 1 Leicester City FC 23
1 7 38 71 2 Arsenal FC 20
2 6 38 70 3 Tottenham Hotspur FC 19
3 10 38 66 4 Manchester City FC 19
4 10 38 66 5 Manchester United FC 19
5 11 38 63 6 Southampton FC 18
6 8 38 62 7 West Ham United FC 16
7 10 38 60 8 Liverpool FC 16
8 15 38 51 9 Stoke City FC 14
9 12 38 50 10 Chelsea FC 12
10 13 38 47 11 Everton FC 11
11 15 38 47 12 Swansea City FC 12
12 17 38 45 13 Watford FC 12
13 15 38 43 14 West Bromwich Albion FC 10
14 18 38 42 15 Crystal Palace FC 11
15 18 38 42 16 AFC Bournemouth 11
16 17 38 39 17 Sunderland AFC 9
17 19 38 37 18 Newcastle United FC 9
18 22 38 34 19 Norwich City FC 9
19 27 38 17 20 Aston Villa FC 3
[20 rows x 22 columns]
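
Because the API call depends on a remote service, here is a small offline sketch of the same flattening. The team names and numbers are made up for illustration and only mimic the shape of the "standing" records. The sep argument controls the separator used in the flattened column names.

```python
import pandas as pd

# Hypothetical records with nested "home" and "away" dictionaries
standing = [
    {"teamName": "Team A", "points": 81,
     "home": {"wins": 12, "draws": 6, "losses": 1},
     "away": {"wins": 11, "draws": 6, "losses": 2}},
    {"teamName": "Team B", "points": 71,
     "home": {"wins": 12, "draws": 4, "losses": 3},
     "away": {"wins": 8, "draws": 7, "losses": 4}},
]

# Default separator gives columns such as "home.wins"
df = pd.json_normalize(standing)

# A custom separator gives columns such as "home_wins"
df_underscore = pd.json_normalize(standing, sep="_")

print(df.columns.tolist())
print(df_underscore.columns.tolist())
```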
