Is it possible to append columns from a dataframe into an empty list?
A random example df is produced with:
import numpy as np, pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
The output is:
A B C D
0 25 27 34 77
1 85 62 39 49
2 90 51 2 97
3 39 19 86 59
4 33 79 64 73
5 36 66 29 78
6 22 27 84 41
7 0 26 22 22
8 44 57 29 37
9 0 31 96 90
If I had an empty list or lists, could I append the columns row by row? So A, C to one list and B, D to another. An example output would be:
empty_list = [[],[]]
empty_list[0] = [[25,34],
[85,39],
[90,2],
[39,86],
[33,64],
[36,29],
[22,84],
[0,22],
[44,29],
[0,96]]
Or would you have to go through and convert each column to a list with df['A'].tolist() and then go through an append by row?
Try this
d=df[['A','C']]
d.values.tolist()
Output
[[0, 93], [58, 14], [79, 18], [40, 26], [91, 14], [25, 18], [22, 25], [35, 99], [12, 82], [48, 72]]
So the solution would be :
empty_list = [[],[]]
empty_list[0]=df[['A','C']].values.tolist()
empty_list[1]=df[['B','D']].values.tolist()
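The same idea can be written for any number of column groups. This is only a sketch: the seed, the name column_groups and the use of to_numpy() in place of .values are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
df = pd.DataFrame(rng.integers(0, 100, size=(10, 4)), columns=list("ABCD"))

# Hypothetical grouping: each inner list names the columns that go together.
column_groups = [["A", "C"], ["B", "D"]]

# One list of [row values] per group, e.g. row_lists[0][0] is [A0, C0].
row_lists = [df[cols].to_numpy().tolist() for cols in column_groups]

print(row_lists[0][:3])
```

Each entry of row_lists has one inner list per row of df, so there is no need to convert the columns one at a time and append in a loop.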
My df was :
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
df
A B C D
0 0 60 93 94
1 58 52 14 33
2 79 84 18 1
3 40 21 26 32
4 91 19 14 8
5 25 34 18 68
6 22 37 25 10
7 35 58 99 80
8 12 38 82 8
9 48 56 72 66
If I create a dataframe like so:
import random
import pandas as pd, numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('AB'))
replace_1=[i+random.randint(0, 50) for i in range(16)]
How would I change the entries in column A to the values of the replace_1 list for rows 0-15, for example? In other words, how do I replace specific cell values from a list of values based purely on index?
here is one way to do it
# update the column in DF with series, based on the index value
df['A'].update(replace_1)
result
A B
0 7 17
1 26 70
2 13 81
3 48 64
4 45 74
... ... ...
95 74 3
96 18 94
97 81 4
98 37 11
99 65 29
replace_1
[7, 26, 13, 48, 45, 51, 35, 53, 20, 11, 38, 16, 36, 14, 63, 24]
Starting DF
A B
0 75 17
1 84 70
2 57 81
3 88 64
4 78 74
... ... ...
95 74 3
96 18 94
97 81 4
98 37 11
99 65 29
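An alternative is to assign through .loc on the frame itself. The update call above works on a Series taken from the frame, and as far as I know that change may not propagate back to df when pandas copy-on-write is enabled. The sketch below assumes the default 0..n-1 index; original_tail is a hypothetical name used only to check that the other rows are untouched.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # seed chosen only to make the sketch repeatable
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=list("AB"))
replace_1 = [i + random.randint(0, 50) for i in range(16)]

original_tail = df["A"].iloc[16:].copy()  # kept only for the check below

# .loc slices by label and includes the end label, so 0..15 covers 16 rows.
df.loc[0:len(replace_1) - 1, "A"] = replace_1

print(df.head(3))
```

Only rows 0 to 15 of column A change; column B and the remaining rows stay as they were.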
So basically, I need a numpy function which will do this or something similar to this:
correct_answers = np.array([scores[i][y[i]] for i in range(num_train)])
but using numpy, because Python list comprehension is too slow for me
scores is a num_train X columns matrix and y is an array of length num_train and takes values from 0 to columns - 1 inclusive
Is there a workaround using arange or something similar? Thanks.
import numpy as np
y = np.arange(81).reshape(9, 9)
correct_answers = y[np.arange(9), np.arange(9)]
output:
y =
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]
[27 28 29 30 31 32 33 34 35]
[36 37 38 39 40 41 42 43 44]
[45 46 47 48 49 50 51 52 53]
[54 55 56 57 58 59 60 61 62]
[63 64 65 66 67 68 69 70 71]
[72 73 74 75 76 77 78 79 80]]
correct_answers =
[ 0 10 20 30 40 50 60 70 80]
correct_answers = scores[np.arange(num_train), y]
This does the thing I wanted to do, props to the other dude who gave me the idea.
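A small check that the indexing expression matches the original list comprehension. The sizes, the seed and the name num_classes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, num_classes = 5, 4  # hypothetical sizes

scores = rng.random((num_train, num_classes))
y = rng.integers(0, num_classes, size=num_train)  # one column index per row

# Pair row i with column y[i]: picks scores[0, y[0]], scores[1, y[1]], ...
correct_answers = scores[np.arange(num_train), y]

# The slow version from the question, kept for comparison.
expected = np.array([scores[i][y[i]] for i in range(num_train)])

print(np.array_equal(correct_answers, expected))
```

Both index arrays have length num_train, so NumPy pairs them element by element and returns a 1-D array of that length.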
I have a pandas dataframe df1 that looks like this:
import pandas as pd
d = {'node1': [47, 24, 19, 77, 24, 19, 77, 24, 56, 92, 32, 77], 'node2': [24, 19, 77, 24, 19, 77, 24, 19, 92, 32, 77, 24], 'user': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']}
df1 = pd.DataFrame(data=d)
df1
node1 node2 user
47 24 A
24 19 A
19 77 A
77 24 A
24 19 A
19 77 B
77 24 B
24 19 B
56 92 C
92 32 C
32 77 C
77 24 C
And a second pandas dataframe df2 that looks like this:
d2 = {'way_id': [4, 3, 1, 8, 5, 2, 7, 9, 6, 10], 'source': [24, 19, 84, 47, 19, 16, 77, 56, 32, 92], 'target': [19, 43, 67, 24, 77, 29, 24, 92, 77, 32]}
df2 = pd.DataFrame(data=d2)
df2
way_id source target
4 24 19
3 19 43
1 84 67
8 47 24
5 19 77
2 16 29
7 77 24
9 56 92
6 32 77
10 92 32
In a new dataframe I would like to count how often the value pairs per row in the columns node1 and node2 in df1 occur in the rows of the source and target columns in df2. The order is relevant, but also the corresponding user should be added to a new column. That's why the desired output should be like this:
way_id source target count user
4 24 19 2 A
3 19 43 0 A
1 84 67 0 A
8 47 24 1 A
5 19 77 1 A
2 16 29 0 A
7 77 24 1 A
9 56 92 0 A
6 32 77 0 A
10 92 32 0 A
4 24 19 1 B
3 19 43 0 B
1 84 67 0 B
8 47 24 0 B
5 19 77 1 B
2 16 29 0 B
7 77 24 1 B
9 56 92 0 B
6 32 77 0 B
10 92 32 0 B
4 24 19 0 C
3 19 43 0 C
1 84 67 0 C
8 47 24 0 C
5 19 77 0 C
2 16 29 0 C
7 77 24 1 C
9 56 92 1 C
6 32 77 1 C
10 92 32 1 C
If the direction of a source/target pair does not matter, duplicate the data with the two columns swapped, then merge:
(pd.concat([df1.rename(columns={'node1':'source','node2':'target'}),
df1.rename(columns={'node2':'source','node1':'target'})]
)
.merge(df2, on=['source','target'], how='outer')
.groupby(['source','target','user'], as_index=False)['way_id'].count()
)
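Since the question says the order of the pair is relevant and the desired output lists every way for every user, including zero counts, here is a sketch that aims at exactly that table. It is one possible approach, not the answer's method: the names counts, users and grid are illustrative, and how='cross' assumes pandas 1.2 or later.

```python
import pandas as pd

d = {'node1': [47, 24, 19, 77, 24, 19, 77, 24, 56, 92, 32, 77],
     'node2': [24, 19, 77, 24, 19, 77, 24, 19, 92, 32, 77, 24],
     'user': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']}
df1 = pd.DataFrame(data=d)
d2 = {'way_id': [4, 3, 1, 8, 5, 2, 7, 9, 6, 10],
      'source': [24, 19, 84, 47, 19, 16, 77, 56, 32, 92],
      'target': [19, 43, 67, 24, 77, 29, 24, 92, 77, 32]}
df2 = pd.DataFrame(data=d2)

# How often each ordered (node1, node2) pair occurs per user.
counts = (df1.groupby(['user', 'node1', 'node2']).size()
             .reset_index(name='count')
             .rename(columns={'node1': 'source', 'node2': 'target'}))

# Every way paired with every user, so pairs that never occur still get a row.
users = pd.DataFrame({'user': sorted(df1['user'].unique())})
grid = users.merge(df2, how='cross')

# Attach the counts; combinations without a match become 0.
result = (grid.merge(counts, on=['user', 'source', 'target'], how='left')
              .fillna({'count': 0})
              .astype({'count': int})
              [['way_id', 'source', 'target', 'count', 'user']])

print(result)
```

The cross join keeps the rows of df2 in their original order within each user, which is the layout shown in the desired output.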
I have a DataFrame holding the numbers 1 to 80. How can I randomly pick 20 elements and save the result to another DataFrame? I can't save each list as a row; the elements are saved as columns instead. Later I want to try to predict each set of random elements with sklearn.
a = np.arange(1,81).reshape(8,10)
pd.DataFrame(a)
I need to get 20 unique numbers and write them as one row. For example, in Python:
from random import sample
for x in range(1,20):
    i=sample(range(1,81), k=20)
    i.sort()
    print(x,'-',i)
It returns a list of 20 elements such as [1,3,5,8,34,45,12,76,45...], and I want it to look like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 ... 20
0 1 5 10 14 20 55 67 34 ...... 20 elements
1
.
.
Use df.sample() to get samples of data from a dataframe:
a = np.arange(1,81).reshape(8,10)
df = pd.DataFrame(a)
df1= df.sample(frac=.25)
>>df1
0 1 2 3 4 5 6 7 8 9
5 51 52 53 54 55 56 57 58 59 60
3 31 32 33 34 35 36 37 38 39 40
For a random permutation, use np.random.permutation():
df.iloc[np.random.permutation(len(df))].head(2)
0 1 2 3 4 5 6 7 8 9
6 61 62 63 64 65 66 67 68 69 70
1 11 12 13 14 15 16 17 18 19 20
EDIT : To get 20 elements in a list use:
import itertools
list(itertools.chain.from_iterable(df.sample(frac=.25).values))
#[71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frac=.25 means 25% of the rows. Since the frame has 8 rows of 10 elements, 25% is 2 rows, which gives you 20 elements. You can adjust the fraction depending on how many elements you have and how many you want.
EDIT1: Further to your edit in the question: print(df.values) gives you an array:
[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]
[31 32 33 34 35 36 37 38 39 40]
[41 42 43 44 45 46 47 48 49 50]
[51 52 53 54 55 56 57 58 59 60]
[61 62 63 64 65 66 67 68 69 70]
[71 72 73 74 75 76 77 78 79 80]]
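You would need to shuffle this array with np.random.shuffle. In this case, do it on df.T.values since you also want to shuffle the columns. Note that this relies on df.T.values being a writable view of the frame's data, which is not guaranteed in every pandas version: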
np.random.shuffle(df.T.values)
Then do a reshape:
df1 = pd.DataFrame(np.reshape(df.values,(4,20)))
>>df1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 4 3 10 2 8 7 1 5 6 9 14 13 20 12 18 17 11 15 16 19
1 24 23 30 22 28 27 21 25 26 29 34 33 40 32 38 37 31 35 36 39
2 44 43 50 42 48 47 41 45 46 49 54 53 60 52 58 57 51 55 56 59
3 64 63 70 62 68 67 61 65 66 69 74 73 80 72 78 77 71 75 76 79
This is a simple way using existing stackoverflow answers:
1- Flatten the array so it looks more like a list; this lets you deal with a single index instead of two array indexes
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flatten.html
aflat = a.flatten()
2- Choose random items from the flattened array using any of the answers here
How to randomly select an item from a list?
3- With the selected data, build your dataframe
You can also use numpy.random.choice and you can specify exact rows you want from the sample:
In [263]: a = np.arange(1,81).reshape(8,10)
In [265]: b = pd.DataFrame(a)
In [268]: b.iloc[np.random.choice(np.arange(len(b)), 5, False)]
Out[268]:
0 1 2 3 4 5 6 7 8 9
5 51 52 53 54 55 56 57 58 59 60
7 71 72 73 74 75 76 77 78 79 80
3 31 32 33 34 35 36 37 38 39 40
1 11 12 13 14 15 16 17 18 19 20
4 41 42 43 44 45 46 47 48 49 50
You can change 5 to the number of rows you need, up to len(b) (8 here), because the third argument False means sampling without replacement. You need not worry about the fraction.
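For the layout asked for in the question (one row per draw, each row holding 20 unique numbers from 1 to 80), a sketch along these lines may be closer. The seed, the name n_draws and the use of NumPy's Generator API are my own assumptions, not taken from the answers above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
n_draws = 19                    # hypothetical number of rows to generate

# Each draw: 20 distinct numbers from 1..80, sorted, stored as one row.
rows = [np.sort(rng.choice(np.arange(1, 81), size=20, replace=False))
        for _ in range(n_draws)]
draws = pd.DataFrame(rows)

print(draws.shape)
```

Because replace=False is applied per draw, numbers are unique within a row but can repeat across rows, which matches the sample() loop in the question.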
Let's assume I have a dataframe df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, 100, size=(12, 4)))
print(df)
0 1 2 3
0 71 64 84 20
1 48 60 83 61
2 48 78 71 46
3 65 88 66 77
4 71 22 42 58
5 66 76 64 80
6 67 28 74 87
7 32 90 55 78
8 80 42 52 14
9 54 76 73 17
10 32 89 42 36
11 85 78 61 12
How do I shuffle the rows of df three-by-three, i.e., how do I randomly shuffle the first three rows (0, 1, 2) with either the second (3, 4, 5), third (6, 7, 8) or fourth (9, 10, 11) group? This could be a possible outcome:
print(df)
0 1 2 3
3 65 88 66 77
4 71 22 42 58
5 66 76 64 80
9 54 76 73 17
10 32 89 42 36
11 85 78 61 12
6 67 28 74 87
7 32 90 55 78
8 80 42 52 14
0 71 64 84 20
1 48 60 83 61
2 48 78 71 46
Thus, the new order has the second group of 3 rows from original dataframe, then the last one, then the third one and finally the first group.
You can reshape the values into a 3D array by splitting the first axis in two, with the second of the two having length 3 (the group length). np.random.shuffle then shuffles in place along the first axis only; since that axis has one entry per group, whole groups are moved while the rows inside each group keep their order, like so -
np.random.shuffle(df.values.reshape(-1,3,df.shape[1]))
Explanation
To give it a bit of explanation, let's use np.random.permutation to generate those random indices along the first axis and then index into the 3D array version.
1] Input df :
In [199]: df
Out[199]:
0 1 2 3
0 71 64 84 20
1 48 60 83 61
2 48 78 71 46
3 65 88 66 77
4 71 22 42 58
5 66 76 64 80
6 67 28 74 87
7 32 90 55 78
8 80 42 52 14
9 54 76 73 17
10 32 89 42 36
11 85 78 61 12
2] Get 3D array version :
In [200]: arr_3D = df.values.reshape(-1,3,df.shape[1])
In [201]: arr_3D
Out[201]:
array([[[71, 64, 84, 20],
[48, 60, 83, 61],
[48, 78, 71, 46]],
[[65, 88, 66, 77],
[71, 22, 42, 58],
[66, 76, 64, 80]],
[[67, 28, 74, 87],
[32, 90, 55, 78],
[80, 42, 52, 14]],
[[54, 76, 73, 17],
[32, 89, 42, 36],
[85, 78, 61, 12]]])
3] Get shuffling indices and index into the first axis of 3D version :
In [202]: shuffle_idx = np.random.permutation(arr_3D.shape[0])
In [203]: shuffle_idx
Out[203]: array([0, 3, 1, 2])
In [204]: arr_3D[shuffle_idx]
Out[204]:
array([[[71, 64, 84, 20],
[48, 60, 83, 61],
[48, 78, 71, 46]],
[[54, 76, 73, 17],
[32, 89, 42, 36],
[85, 78, 61, 12]],
[[65, 88, 66, 77],
[71, 22, 42, 58],
[66, 76, 64, 80]],
[[67, 28, 74, 87],
[32, 90, 55, 78],
[80, 42, 52, 14]]])
Then, we are assigning these values back to input dataframe.
With np.random.shuffle, we are just doing everything in-place and hiding away the work needed to explicitly generate shuffling indices and assigning back.
Sample run -
In [181]: df = pd.DataFrame(np.random.randint(11,99,(12,4)))
In [182]: df
Out[182]:
0 1 2 3
0 82 49 80 20
1 19 97 74 81
2 62 20 97 19
3 36 31 14 41
4 27 86 28 58
5 38 68 24 83
6 85 11 25 88
7 21 31 53 19
8 38 45 14 72
9 74 63 40 94
10 69 85 53 81
11 97 96 28 29
In [183]: np.random.shuffle(df.values.reshape(-1,3,df.shape[1]))
In [184]: df
Out[184]:
0 1 2 3
0 85 11 25 88
1 21 31 53 19
2 38 45 14 72
3 82 49 80 20
4 19 97 74 81
5 62 20 97 19
6 36 31 14 41
7 27 86 28 58
8 38 68 24 83
9 74 63 40 94
10 69 85 53 81
11 97 96 28 29
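If you would rather not depend on df.values being a writable view of the frame (I believe this is not the case in every pandas version), the same group-wise shuffle can be done with explicit row positions. This is a sketch with made-up data; group_len, order and positions are illustrative names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(np.arange(48).reshape(12, 4))

group_len = 3
n_groups = len(df) // group_len  # assumes len(df) is a multiple of group_len

# Random order of the groups, e.g. some permutation of [0, 1, 2, 3].
order = rng.permutation(n_groups)

# Expand each group number g into its row positions 3g, 3g+1, 3g+2.
positions = (order[:, None] * group_len + np.arange(group_len)).ravel()

shuffled = df.iloc[positions]
print(shuffled)
```

This returns a new frame and leaves df unchanged, and the original index labels are kept so you can see where each group came from.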
Similar solution to @Divakar's, probably simpler as I directly shuffle the index of the dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([np.arange(0, 12)]*4).T
len_group = 3
index_list = np.array(df.index)
np.random.shuffle(np.reshape(index_list, (-1, len_group)))
shuffled_df = df.loc[index_list, :]
Sample output:
shuffled_df
Out[82]:
0 1 2 3
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
This is doing the same as the other two answers, but using integer division to create a group column.
nrows_df = len(df)
nrows_group = 3
shuffled = (
    df
    .assign(group_var=df.index // nrows_group)
    .set_index("group_var")
    # integer division: np.random.permutation needs an int, not a float
    .loc[np.random.permutation(nrows_df // nrows_group)]
)