How do I convert a list of lists to a pandas DataFrame?
It is not in the form of columns but instead in the form of rows.
#!/usr/bin/env python
from random import randrange
import pandas
data = [[[randrange(0,100) for j in range(0, 12)] for y in range(0, 12)] for x in range(0, 5)]
print(data)
df = pandas.DataFrame(data[0], columns=['B','P','F','I','FP','BP','2','M','3','1','I','L'])
print(df)
for example:
data[0][0] == [64, 73, 76, 64, 61, 32, 36, 94, 81, 49, 94, 48]
I want it to be shown as rows and not columns.
Currently it shows something like this:
B P F I FP BP 2 M 3 1 I L
0 64 73 76 64 61 32 36 94 81 49 94 48
1 57 58 69 46 34 66 15 24 20 49 25 98
2 99 61 73 69 21 33 78 31 16 11 77 71
3 41 1 55 34 97 64 98 9 42 77 95 41
4 36 50 54 27 74 0 8 59 27 54 6 90
5 74 72 75 30 62 42 90 26 13 49 74 9
6 41 92 11 38 24 48 34 74 50 10 42 9
7 77 9 77 63 23 5 50 66 49 5 66 98
8 90 66 97 16 39 55 38 4 33 52 64 5
9 18 14 62 87 54 38 29 10 66 18 15 86
10 60 89 57 28 18 68 11 29 94 34 37 59
11 78 67 93 18 14 28 64 11 77 79 94 66
I want the rows and columns to be switched. Moreover, how do I do this for all 5 main lists?
This is how I want the output to look, with the other columns also filled in:
B P F I FP BP 2 M 3 1 I L
0 64
1 73
2 76
3 64
4 61
5 32
6 36
7 94
8 81
9 49
10 94
11 48
However, df.transpose() won't help.
This is what I came up with
data = [[[randrange(0,100) for j in range(0, 12)] for y in range(0, 12)] for x in range(0, 5)]
print(data)
df = pandas.DataFrame(data[0], columns=['B','P','F','I','FP','BP','2','M','3','1','I','L'])
print(df)
df1 = df.transpose()
df1.columns = ['B','P','F','I','FP','BP','2','M','3','1','I','L']
print(df1)
import numpy
# x is the index of the main list to convert (0 to 4)
df = pandas.DataFrame(numpy.asarray(data[x]).T.tolist(),
                      columns=['B','P','F','I','FP','BP','2','M','3','1','I','L'])
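To cover the "all 5 main lists" part of the question, the NumPy transpose above can be applied to each main list in turn. This is only a sketch: the names `frames` and `block` are made up for illustration, and keeping the results in a plain list (rather than one combined frame) is an assumption about what is wanted.

```python
from random import randrange

import numpy as np
import pandas as pd

columns = ['B', 'P', 'F', 'I', 'FP', 'BP', '2', 'M', '3', '1', 'I', 'L']
data = [[[randrange(0, 100) for j in range(12)] for y in range(12)] for x in range(5)]

# Transpose each 12x12 block so that every inner list becomes a column
# instead of a row, then build one DataFrame per main list.
frames = [pd.DataFrame(np.asarray(block).T, columns=columns) for block in data]

# The first inner list of the first block now fills column 'B'.
print(frames[0]['B'].tolist() == data[0][0])
```

`frames[k]` then holds the frame for `data[k]`.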
I have a DataFrame and I need to create a new column which contains the second-largest value of each row in the original DataFrame.
Sample:
df = pd.DataFrame(np.random.randint(1,100, 80).reshape(8, -1))
Desired output:
0 1 2 3 4 5 6 7 8 9 penultimate
0 52 69 62 7 20 69 38 10 57 17 62
1 52 94 49 63 1 90 14 76 20 84 90
2 78 37 58 7 27 41 27 26 48 51 58
3 6 39 99 36 62 90 47 25 60 84 90
4 37 36 91 93 76 69 86 95 69 6 93
5 5 54 73 61 22 29 99 27 46 24 73
6 71 65 45 9 63 46 4 93 36 18 71
7 85 7 76 46 65 97 64 52 28 80 85
How can this be done in as little code as possible?
You could use NumPy for this:
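import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,100, 80).reshape(8, -1))
df['penultimate'] = np.sort(df.values, axis=1)[:, -2]
print(df)
Using NumPy is faster than a row-wise apply, because the whole array is sorted in one vectorized call.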
Here is a simple lambda function!
# Input
df = pd.DataFrame(np.random.randint(1,100, 80).reshape(8, -1))
# Output
out = df.apply(lambda x: x.sort_values().unique()[-2], axis=1)
df['penultimate'] = out
print(df)
Cheers!
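The two answers above can disagree when a row's maximum occurs more than once. In the desired output, row 0 contains 69 twice and the expected value is 62, which suggests the second-largest distinct value is wanted. A small sketch with hand-picked data (the frame below is made up for illustration) shows the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[52, 69, 62, 7, 69],
                   [5, 54, 73, 61, 22]])

# Second-to-last element of each sorted row; a repeated maximum counts twice.
with_ties = np.sort(df.values, axis=1)[:, -2]

# Second-largest distinct value; np.unique returns the sorted unique values.
distinct = df.apply(lambda row: np.unique(row)[-2], axis=1)

print(with_ties.tolist())  # [69, 61]
print(distinct.tolist())   # [62, 61]
```

Which of the two is right depends on whether ties should count as separate values.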
I am trying to understand the tf.data.experimental.group_by_window() method in Tensorflow 2 but I have some difficulties.
For a reproducible example I use the one presented in the documentation:
import numpy as np
import tensorflow as tf
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: els.batch(10),
    window_size=100))
i = 0
for elem in dataset20:
    print('i is {0}\n'.format(i))
    print('elem is {0}'.format(elem.numpy()))
    i += 1
    print('\n--------------------------------\n')
i is 0
elem is [0 2 4 6 8]
--------------------------------
i is 1
elem is [1 3 5 7 9]
--------------------------------
Part of the confusion may be that the output doesn't correspond to the example code. The actual output from this:
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: els.batch(10),
    window_size=100))
for i, d in enumerate(dataset20):
    print(i, d.numpy())
is
0 [ 0 2 4 6 8 10 12 14 16 18]
1 [20 22 24 26 28 30 32 34 36 38]
2 [40 42 44 46 48 50 52 54 56 58]
3 [60 62 64 66 68 70 72 74 76 78]
4 [80 82 84 86 88 90 92 94 96 98]
5 [ 1 3 5 7 9 11 13 15 17 19]
6 [21 23 25 27 29 31 33 35 37 39]
7 [41 43 45 47 49 51 53 55 57 59]
8 [61 63 65 67 69 71 73 75 77 79]
9 [81 83 85 87 89 91 93 95 97 99]
As described in the documentation, key_func separates the data into groups with associated key values. In the example, key_func separates the integers 0 to 99 into an even group and an odd group. reduce_func then operates on each (key, group) pair to produce another dataset. Note, though, that reduce_func only ever sees at most window_size elements of a group at a time. In the example, window_size (100) is larger than either group (50 elements), so it has no effect: all evens are emitted in batches of 10, followed by all odds. If window_size is reduced below 50, it does have an effect. For example, if window_size is changed to 5 and the batching is moved outside the group_by_window call:
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: els,
    window_size=5)).batch(10)
then the following output is produced:
0 [0 2 4 6 8 1 3 5 7 9]
1 [10 12 14 16 18 11 13 15 17 19]
2 [20 22 24 26 28 21 23 25 27 29]
3 [30 32 34 36 38 31 33 35 37 39]
4 [40 42 44 46 48 41 43 45 47 49]
5 [50 52 54 56 58 51 53 55 57 59]
6 [60 62 64 66 68 61 63 65 67 69]
7 [70 72 74 76 78 71 73 75 77 79]
8 [80 82 84 86 88 81 83 85 87 89]
9 [90 92 94 96 98 91 93 95 97 99]
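The windowing behaviour described above can be imitated in plain Python. This is a simplified model and not TensorFlow's implementation: the helper names `group_by_window` and `batch` below are ordinary Python functions written for this sketch, and the order in which partly filled windows are flushed at the end is an assumption chosen to match the outputs shown.

```python
from collections import defaultdict


def group_by_window(elements, key_func, window_size):
    """Yield lists of up to window_size elements that share a key."""
    buffers = defaultdict(list)
    for element in elements:
        key = key_func(element)
        buffers[key].append(element)
        if len(buffers[key]) == window_size:
            # A full window is handed over as soon as it fills up.
            yield buffers.pop(key)
    # Input exhausted: hand over whatever is left, in key insertion order.
    for window in buffers.values():
        yield window


def batch(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# window_size=5, batching applied after the windows are joined together.
flat = [x for w in group_by_window(range(100), lambda x: x % 2, 5) for x in w]
print(batch(flat, 10)[0])   # [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

# window_size=100, batching applied inside each window.
windows = list(group_by_window(range(100), lambda x: x % 2, 100))
print(batch(windows[0], 10)[0])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

With a window of 5, the even and odd buffers fill up alternately, which is why each batch of 10 holds five evens followed by five odds.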
I have a DataFrame holding the numbers 1 to 80. How can I randomly pick 20 elements and save the result to another DataFrame? I can't save each list as a row; the elements are saved as columns instead. Later I want to try predicting the random elements with sklearn.
a = np.arange(1,81).reshape(8,10)
pd.DataFrame(a)
I need to get 20 unique numbers and write them in one row. For example, in plain Python:
from random import sample
for x in range(1,20):
    i = sample(range(1,81), k=20)
    i.sort()
    print(x,'-',i)
It returns a list of 20 elements such as [1,3,5,8,34,45,12,76,45...], and I want it to look like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 ... 20
0 1 5 10 14 20 55 67 34 ...... 20 elements
1
.
.
Use df.sample() to get a random sample of rows from a DataFrame:
a = np.arange(1,81).reshape(8,10)
df = pd.DataFrame(a)
df1= df.sample(frac=.25)
>>df1
0 1 2 3 4 5 6 7 8 9
5 51 52 53 54 55 56 57 58 59 60
3 31 32 33 34 35 36 37 38 39 40
For a random permutation, use np.random.permutation():
df.iloc[np.random.permutation(len(df))].head(2)
0 1 2 3 4 5 6 7 8 9
6 61 62 63 64 65 66 67 68 69 70
1 11 12 13 14 15 16 17 18 19 20
EDIT : To get 20 elements in a list use:
import itertools
list(itertools.chain.from_iterable(df.sample(frac=.25).values))
#[71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frac=.25 samples 25% of the rows. With 8 rows of 10 elements each, that is 2 rows, or 20 elements. You can adjust the fraction depending on how many elements you have and how many you want.
EDIT1: Further to your edit in the question: print(df.values) gives you an array:
[[ 1 2 3 4 5 6 7 8 9 10]
[11 12 13 14 15 16 17 18 19 20]
[21 22 23 24 25 26 27 28 29 30]
[31 32 33 34 35 36 37 38 39 40]
[41 42 43 44 45 46 47 48 49 50]
[51 52 53 54 55 56 57 58 59 60]
[61 62 63 64 65 66 67 68 69 70]
[71 72 73 74 75 76 77 78 79 80]]
You need to shuffle this array with np.random.shuffle. In this case, apply it to df.T.values, since you also want to shuffle the columns:
np.random.shuffle(df.T.values)
Then do a reshape:
df1 = pd.DataFrame(np.reshape(df.values,(4,20)))
>>df1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 4 3 10 2 8 7 1 5 6 9 14 13 20 12 18 17 11 15 16 19
1 24 23 30 22 28 27 21 25 26 29 34 33 40 32 38 37 31 35 36 39
2 44 43 50 42 48 47 41 45 46 49 54 53 60 52 58 57 51 55 56 59
3 64 63 70 62 68 67 61 65 66 69 74 73 80 72 78 77 71 75 76 79
This is a simple way using existing stackoverflow answers:
1- Flatten the array so it behaves like a list; this lets you work with a single index instead of two array indexes.
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flatten.html
aflat = a.flatten()
2- Choose random items from the flattened array using any of the answers here:
How to randomly select an item from a list?
3- With the selected data, build your dataframe
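The three steps above might look like the following. This is a sketch under assumptions: `n_rows`, `draws` and `result` are names invented here, the seed is arbitrary, and NumPy's `Generator.choice` with `replace=False` stands in for whichever random-selection answer is preferred.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 81).reshape(8, 10)
aflat = a.flatten()                  # step 1: one flat array of 80 values

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
n_rows = 5                           # hypothetical number of draws to keep

# Step 2: each draw picks 20 distinct values; sorting mirrors i.sort()
# in the question.
draws = [np.sort(rng.choice(aflat, size=20, replace=False))
         for _ in range(n_rows)]

# Step 3: one draw per row, 20 columns.
result = pd.DataFrame(draws)
print(result.shape)  # (5, 20)
```

Each row of `result` is one independent draw, which is the row-wise layout asked for in the question.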
You can also use numpy.random.choice and you can specify exact rows you want from the sample:
In [263]: a = np.arange(1,81).reshape(8,10)
In [265]: b = pd.DataFrame(a)
In [268]: b.iloc[np.random.choice(np.arange(len(b)), 5, False)]
Out[268]:
0 1 2 3 4 5 6 7 8 9
5 51 52 53 54 55 56 57 58 59 60
7 71 72 73 74 75 76 77 78 79 80
3 31 32 33 34 35 36 37 38 39 40
1 11 12 13 14 15 16 17 18 19 20
4 41 42 43 44 45 46 47 48 49 50
Change 5 to the number of rows you need. It can be at most len(b), because the sampling is done without replacement; 2 rows give you 20 elements. You need not work out a fraction.
The following code prints number sequences up to around 100 from a list. A fair number of the sequences end above 100. I want to know how to print only the sequences that add up to exactly 100. I have tried saving the results to a list, and I have tried filtering them with if and else statements, but with no luck. I looked at list comprehensions, but I know those don't use while loops, so I don't know how to get the same results with a for loop. The only information I can find online is basic lessons on using a while loop to print out a list of numbers. I could not find anything about how to filter the sequences that get printed.
Here is the code:
import itertools
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in list1:
    a = 0
    num1 = 2
    num2 = i
    seq = ([a])
    it = itertools.cycle((num1,num2))
    while a < 100:
        a += next(it)
        print(a, end = " ")
        seq.append(a)
    print()
    print("Here are the numbers", num1, "&", num2, "added together in a sequence")
    print()
Output:
2 3 5 6 8 9 11 12 14 15 17 18 20 21 23 24 26 27 29 30 32 33 35 36 38 39 41 42 44 45 47 48 50 51 53 54 56 57 59 60 62 63 65 66 68 69 71 72 74 75 77 78 80 81 83 84 86 87 89 90 92 93 95 96 98 99 101
Here are the numbers 2 & 1 added together in a sequence
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
Here are the numbers 2 & 2 added together in a sequence
2 5 7 10 12 15 17 20 22 25 27 30 32 35 37 40 42 45 47 50 52 55 57 60 62 65 67 70 72 75 77 80 82 85 87 90 92 95 97 100
Here are the numbers 2 & 3 added together in a sequence
2 6 8 12 14 18 20 24 26 30 32 36 38 42 44 48 50 54 56 60 62 66 68 72 74 78 80 84 86 90 92 96 98 102
Here are the numbers 2 & 4 added together in a sequence
2 7 9 14 16 21 23 28 30 35 37 42 44 49 51 56 58 63 65 70 72 77 79 84 86 91 93 98 100
Here are the numbers 2 & 5 added together in a sequence
2 8 10 16 18 24 26 32 34 40 42 48 50 56 58 64 66 72 74 80 82 88 90 96 98 104
Here are the numbers 2 & 6 added together in a sequence
2 9 11 18 20 27 29 36 38 45 47 54 56 63 65 72 74 81 83 90 92 99 101
Here are the numbers 2 & 7 added together in a sequence
2 10 12 20 22 30 32 40 42 50 52 60 62 70 72 80 82 90 92 100
Here are the numbers 2 & 8 added together in a sequence
2 11 13 22 24 33 35 44 46 55 57 66 68 77 79 88 90 99 101
Here are the numbers 2 & 9 added together in a sequence
2 12 14 24 26 36 38 48 50 60 62 72 74 84 86 96 98 108
Here are the numbers 2 & 10 added together in a sequence
What I want is:
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100
Here are the numbers 2 & 2 added together in a sequence
2 5 7 10 12 15 17 20 22 25 27 30 32 35 37 40 42 45 47 50 52 55 57 60 62 65 67 70 72 75 77 80 82 85 87 90 92 95 97 100
Here are the numbers 2 & 3 added together in a sequence
2 7 9 14 16 21 23 28 30 35 37 42 44 49 51 56 58 63 65 70 72 77 79 84 86 91 93 98 100
Here are the numbers 2 & 5 added together in a sequence
2 10 12 20 22 30 32 40 42 50 52 60 62 70 72 80 82 90 92 100
Here are the numbers 2 & 8 added together in a sequence
Any and all help on this will be greatly appreciated.
Well, you only know if your sequence addition adds up to 100 once you are done, so you can't start printing before that point. This should do the job:
import itertools
list1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in list1:
    a = 0
    num1 = 2
    num2 = i
    seq = []  # start empty so the initial 0 is not printed
    it = itertools.cycle((num1,num2))
    while a < 100:
        a += next(it)
        seq.append(a)
    if seq[-1] == 100:  # -1 as an index gets the last entry in a list
        print(" ".join([str(val) for val in seq]))
        print("Here are the numbers", num1, "&", num2, "added together in a sequence")
        print()
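The same idea can be packaged as a function, so that the filtering happens on the finished sequences rather than inside the loop. This is one possible arrangement, not the only one: `alternating_sums` and `exact` are names made up for this sketch, and it assumes both numbers are positive so that the running total always reaches the target.

```python
import itertools


def alternating_sums(num1, num2, target=100):
    """Running totals of num1, num2, num1, ... up to the first total >= target."""
    totals = []
    total = 0
    for step in itertools.cycle((num1, num2)):
        total += step
        totals.append(total)
        if total >= target:
            return totals


# Build every sequence first, then keep only those that end exactly on 100.
sequences = {n: alternating_sums(2, n) for n in range(1, 11)}
exact = {n: seq for n, seq in sequences.items() if seq[-1] == 100}

for n, seq in exact.items():
    print(" ".join(str(val) for val in seq))
    print("Here are the numbers", 2, "&", n, "added together in a sequence")
    print()
```

For the list in the question, the pairs that survive the filter are 2 & 2, 2 & 3, 2 & 5 and 2 & 8.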
I have a DataFrame of 100 columns and I want to multiply the columns at positions 6 to 74 by the value of one column ('Count'). Please tell me how to do that. I have been trying
df = df.ix[0, 6:74].multiply(df["Count"], axis="index")
df = df[df.columns[6:74]]*df["Count"]
Neither of them works.
The resulting DataFrame should have all 100 original columns, with the columns at positions 6 to 74 holding the multiplied values in every row.
Assuming the same DataFrame provided by @MaxU
Not easier, but a perspective on how to use other api elements.
pd.DataFrame.update and pd.DataFrame.mul
df.update(df.iloc[:, 3:7].mul(df.Count, 0))
df
0 1 2 3 4 5 6 7 8 9 Count
0 89 38 89 15.366436 1.355862 7.231264 4.971494 12 70 69 0.225977
1 49 1 38 1.004190 1.095480 2.829990 0.273870 57 93 64 0.030430
2 2 53 49 49.749460 50.379200 54.157640 16.373240 22 31 41 0.629740
3 38 44 23 28.437516 73.545300 41.185368 73.545300 19 99 57 0.980604
4 45 2 60 10.093230 4.773825 10.502415 6.274170 43 63 55 0.136395
5 65 97 15 10.375760 57.066680 38.260615 14.915155 68 5 21 0.648485
6 95 90 45 52.776000 16.888320 22.517760 50.664960 76 32 75 0.703680
7 60 31 65 63.242210 2.976104 26.784936 38.689352 72 73 94 0.744026
8 64 96 96 7.505370 37.526850 11.007876 10.007160 68 56 39 0.500358
9 78 54 74 8.409275 25.227825 16.528575 9.569175 97 63 37 0.289975
Demo:
Sample DF:
In [6]: df = pd.DataFrame(np.random.randint(100,size=(10,10))) \
.assign(Count=np.random.rand(10))
In [7]: df
Out[7]:
0 1 2 3 4 5 6 7 8 9 Count
0 89 38 89 68 6 32 22 12 70 69 0.225977
1 49 1 38 33 36 93 9 57 93 64 0.030430
2 2 53 49 79 80 86 26 22 31 41 0.629740
3 38 44 23 29 75 42 75 19 99 57 0.980604
4 45 2 60 74 35 77 46 43 63 55 0.136395
5 65 97 15 16 88 59 23 68 5 21 0.648485
6 95 90 45 75 24 32 72 76 32 75 0.703680
7 60 31 65 85 4 36 52 72 73 94 0.744026
8 64 96 96 15 75 22 20 68 56 39 0.500358
9 78 54 74 29 87 57 33 97 63 37 0.289975
Let's multiply columns 3-6 by df['Count']:
In [8]: df.iloc[:, 3:6+1]
Out[8]:
3 4 5 6
0 68 6 32 22
1 33 36 93 9
2 79 80 86 26
3 29 75 42 75
4 74 35 77 46
5 16 88 59 23
6 75 24 32 72
7 85 4 36 52
8 15 75 22 20
9 29 87 57 33
In [9]: df[df.columns[3:6+1]] = df.iloc[:, 3:6+1].mul(df['Count'], axis=0)
In [10]: df
Out[10]:
0 1 2 3 4 5 6 7 8 9 Count
0 89 38 89 15.366436 1.355862 7.231264 4.971494 12 70 69 0.225977
1 49 1 38 1.004190 1.095480 2.829990 0.273870 57 93 64 0.030430
2 2 53 49 49.749460 50.379200 54.157640 16.373240 22 31 41 0.629740
3 38 44 23 28.437516 73.545300 41.185368 73.545300 19 99 57 0.980604
4 45 2 60 10.093230 4.773825 10.502415 6.274170 43 63 55 0.136395
5 65 97 15 10.375760 57.066680 38.260615 14.915155 68 5 21 0.648485
6 95 90 45 52.776000 16.888320 22.517760 50.664960 76 32 75 0.703680
7 60 31 65 63.242210 2.976104 26.784936 38.689352 72 73 94 0.744026
8 64 96 96 7.505370 37.526850 11.007876 10.007160 68 56 39 0.500358
9 78 54 74 8.409275 25.227825 16.528575 9.569175 97 63 37 0.289975
The easiest thing to do here would be to extract the values, multiply, and then assign.
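cols = df.columns[6:75]  # positions 6 to 74 inclusive
u = df[cols].values
v = df[['Count']].values  # shape (n, 1), broadcasts across the selected columns
df[cols] = u * v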
By using combine_first
df.iloc[:, 3:6+1].mul(df['Count'],axis=0).combine_first(df)
You need to concatenate the data frame resulting from the multiplication with the remaining columns:
df = pd.concat([df.iloc[:, :6], df.iloc[:, 6:74+1].multiply(df['Count'], axis=0), df.iloc[:, 75:]], axis=1)
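A point shared by the working answers is the axis along which 'Count' is aligned. A plain `*` between a DataFrame and a Series matches the Series index against the column labels, whereas `mul(..., axis=0)` matches it against the row index. The small frame below is made up for illustration (string column names are used on purpose so the mismatch is obvious), and `cols` and `misaligned` are names invented for this sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(3, 4), columns=list('abcd'))
df['Count'] = [10, 100, 1000]

cols = df.columns[1:3]  # positions 1 and 2, i.e. columns 'b' and 'c'

# `*` lines up the Series index (0, 1, 2) with the column labels ('b', 'c');
# nothing matches, so every value in the result is NaN.
misaligned = df[cols] * df['Count']
print(misaligned.isna().all().all())  # True

# mul(..., axis=0) lines up on the row index, one Count value per row.
df[cols] = df[cols].mul(df['Count'], axis=0)
print(df)
```

Selecting by position with `df.columns[start:stop]` and assigning back by label keeps all other columns in place.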