Yes, this has been much discussed, and similar questions have been downvoted multiple times, but I still can't figure this one out.
Say I have a dataframe like this:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
I want to end up with four separate lists (a, b, c and d) with the data from each column.
Logically (to me anyway) I would do:
list_of_lst = df.values.T.astype(str).tolist()
for column in df.columns:
    i = 0
    while i < len(df.columns) - 1:
        column = list_of_lst[1]
        i = i + 1
But assigning variable names in a loop is not doable/recommended...
Any suggestions how I can get what I need?
I think the best approach is to create a dictionary of lists with DataFrame.to_dict:
np.random.seed(456)
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
print (df)
A B C D
0 5 9 4 5
1 7 1 8 3
2 5 2 4 2
3 2 8 4 8
4 5 6 0 9
5 8 2 3 6
6 7 0 0 3
7 3 5 6 6
8 3 8 9 6
9 5 1 6 1
d = df.to_dict('list')
print (d['A'])
[5, 7, 5, 2, 5, 8, 7, 3, 3, 5]
If you really want separate A, B, C and D lists:
for k, v in df.to_dict('list').items():
    globals()[k] = v
print (A)
[5, 7, 5, 2, 5, 8, 7, 3, 3, 5]
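If the column names are known up front, a small sketch that avoids writing into globals() is plain tuple unpacking (this assumes the columns are exactly A, B, C, D, in that order):

```python
import numpy as np
import pandas as pd

np.random.seed(456)
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))

# One list per column, bound to ordinary names instead of globals()
A, B, C, D = (df[c].tolist() for c in df.columns)
print(A)  # [5, 7, 5, 2, 5, 8, 7, 3, 3, 5]
```

This keeps the namespace explicit, which is usually the reason mutating globals() in a loop is discouraged.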
retList = dict()
for i in df.columns:
    iterator = df[i].tolist()
    retList[i] = iterator
You'd get a dictionary with the column names as keys and the list of that column's values as values.
Modify it into any data structure you want.
list(retList.values()) will give you a list of size 4, with each inner list holding one column's values.
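For example, with a tiny frame (the values here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

retList = dict()
for i in df.columns:
    retList[i] = df[i].tolist()

# dict preserves insertion order (Python 3.7+), so this matches column order
list_of_lists = list(retList.values())
print(list_of_lists)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```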
You can transpose your dataframe and use df.T.values.tolist(). But if you are manipulating numeric arrays afterwards, it's advisable to skip the tolist() part.
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), columns=list('ABCD'))
# A B C D
# 0 17 56 57 31
# 1 3 44 15 0
# 2 94 36 87 30
# 3 44 49 56 76
# 4 29 5 35 24
list_of_lists = df.T.values.tolist()
# [[17, 3, 94, 44, 29],
# [56, 44, 36, 49, 5],
# [57, 15, 87, 56, 35],
# [31, 0, 30, 76, 24]]
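For instance, if the next step is numeric, keeping the transposed NumPy array avoids a round-trip through Python lists (a small sketch with made-up values):

```python
import pandas as pd

df = pd.DataFrame({'A': [17, 3], 'B': [56, 44]})

arrays = df.T.values             # shape (n_columns, n_rows), still a NumPy array
col_means = arrays.mean(axis=1)  # vectorised per-column mean, no tolist() needed
print(col_means)  # [10. 50.]
```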
Related
I'm trying to move the last two rows up:
import pandas as pd
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [5, 6, 7, 8],
    "D": [9, 10, 11, 12],
    "E": [13, 14, 15, 16],
})
print(df)
Output:
A C D E
0 1 5 9 13
1 2 6 10 14
2 3 7 11 15
3 4 8 12 16
Desired output:
A C D E
0 3 7 11 15
1 4 8 12 16
2 1 5 9 13
3 2 6 10 14
I was able to move the last row using
df = df.reindex(np.roll(df.index, shift=1))
But I can't get the second-to-last row to move as well. Any advice on the most efficient way to do this without creating a copy of the dataframe?
Using your code, you can just change the roll's shift value.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [5, 6, 7, 8],
    "D": [9, 10, 11, 12],
    "E": [13, 14, 15, 16],
})
df = df.reindex(np.roll(df.index, shift=2), copy=False)
df.reset_index(inplace=True, drop=True)
print(df)
A C D E
0 3 7 11 15
1 4 8 12 16
2 1 5 9 13
3 2 6 10 14
The shift value will change how many rows are affected by the roll, and afterwards we just reset the index of the dataframe so that it goes back to 0,1,2,3.
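To see what the shift does, you can inspect the rolled index directly (a quick sketch):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4]})

# shift=1 moves only the last label to the front; shift=2 moves the last two
print(np.roll(df.index, shift=1))  # [3 0 1 2]
print(np.roll(df.index, shift=2))  # [2 3 0 1]
```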
Based on the comment about wanting to swap indexes 0 and 1 around, we can use an answer in @CatalinaChou's link to do that. I chose to do it after the roll so as to only have to contend with indexes 0 and 1 once everything has been shifted.
# continuing from where the last code fence ends
swap_indexes = {1: 0, 0: 1}
df.rename(swap_indexes, inplace=True)
df.sort_index(inplace=True)
print(df)
A C D E
0 4 8 12 16
1 3 7 11 15
2 1 5 9 13
3 2 6 10 14
A notable difference is the use of inplace=True, which means the methods can't be chained; but this is to fulfil the goal of not copying the dataframe at all (or as nearly as possible; I'm not sure whether df.reindex makes an internal copy even with copy=False).
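If chaining matters more than avoiding copies, the same steps can be written as one expression (a sketch; this may well copy internally):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [5, 6, 7, 8]})

rolled = (df.reindex(np.roll(df.index, shift=2))
            .reset_index(drop=True)
            .rename({1: 0, 0: 1})  # swap the first two rows after the roll
            .sort_index())
print(rolled["A"].tolist())  # [4, 3, 1, 2]
```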
I have a dataframe with three different types (name-wise) of variables. I want to use the names of the variables as filter and split the dataframe in 3 new dataframes. In my first dataframe I want PERIOD, A, B, C. In my second I want PERIOD, A_TREND, B_TREND, C_TREND and in my third I want PERIOD, A_Seasonally_Adjusted, B_Seasonally_Adjusted, C_Seasonally_Adjusted.
I'm looking for a specific solution that uses the information in the column names (_Trend and _Seasonally_Adjusted) in a generic way to achieve the above.
df = pd.DataFrame(np.array([['2021-04', 1, 12, 33, 2, 35, 6, 3, 8, 90],
                            ['2021-05', 4, 98, 9, 5, 82, 94, 82, 9, 21],
                            ['2021-06', 81, 9, 8, 8, 9, 9, 8, 3, 72]]),
                  columns=['PERIOD', 'A', 'B', 'C',
                           'A_Trend', 'A__Seasonally Adjusted',
                           'B_Trend', 'B__Seasonally Adjusted',
                           'C_Trend', 'C__Seasonally Adjusted'])
You can use .filter with like:
trend_df = df.filter(like='Trend')
season_df = df.filter(like='Seasonally Adjusted')
print(trend_df)
A_Trend B_Trend C_Trend
0 2 6 8
1 5 94 9
2 8 9 3
print(season_df)
A__Seasonally Adjusted B__Seasonally Adjusted C__Seasonally Adjusted
0 35 3 90
1 82 82 21
2 9 8 72
Then take the complement to get your original columns:
df.loc[:,~df.columns.isin(season_df.columns.tolist() + trend_df.columns.tolist())]
PERIOD A B C
0 2021-04 1 12 33
1 2021-05 4 98 9
2 2021-06 81 9 8
If you want PERIOD in all the dataframes, first set it as the index:
df = df.set_index('PERIOD')  # or
df = df.set_index('PERIOD', append=True)
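Putting it together, a sketch of the PERIOD-as-index variant (with a cut-down frame for brevity):

```python
import pandas as pd

df = pd.DataFrame({'PERIOD': ['2021-04', '2021-05'],
                   'A': [1, 4],
                   'A_Trend': [2, 5],
                   'A__Seasonally Adjusted': [35, 82]})

# filter only looks at column labels, so the PERIOD index survives the split
indexed = df.set_index('PERIOD')
trend_df = indexed.filter(like='Trend').reset_index()
print(trend_df.columns.tolist())  # ['PERIOD', 'A_Trend']
```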
I have a dataframe and want to create a column based on a condition that populates the row with the value of a row in another column.
df = pd.DataFrame({'parent': [32, 3, 88, 9, 10, 23, 99, 23],
                   'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'flag': [True, True, False, True, False, True, True, True]})
I have tried to do this using np.where(), but it doesn't update the values row by row; instead it replaces all of the values in the column.
df['res'] = np.where(df['flag'] == True, output['parent'], output['id'])
The dataframe I want to create looks as follows:
df = pd.DataFrame({'parent': [32, 3, 88, 9, 10, 23, 99, 23],
                   'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'flag': [True, True, False, True, False, True, True, True],
                   'res': [32, 3, 3, 9, 5, 23, 99, 23]})
Any ideas what I'm doing wrong? I'm new to python, so any help is much appreciated.
Just change this:
df['res'] = np.where(df['flag'] == True, output['parent'], output['id'])
to this:
df['res'] = np.where(df['flag'] == True, df['parent'], df['id'])
Fix your code by changing output to df:
df['res1'] = np.where(df['flag'] == True, df['parent'], df['id'])
df
Out[176]:
parent id flag res res1
0 32 1 True 32 32
1 3 2 True 3 3
2 88 3 False 3 3
3 9 4 True 9 9
4 10 5 False 5 5
5 23 6 True 23 23
6 99 7 True 99 99
7 23 8 True 23 23
As others have pointed out, you had a typo in your code. An alternative way to achieve the desired output is to use the apply method:
df['res'] = df.apply(lambda x: x['parent'] if x['flag'] else x['id'], axis=1)
Or
df['res'] = np.where(df['flag'], df['parent'], df['id'])
Output:
parent id flag res
0 32 1 True 32
1 3 2 True 3
2 88 3 False 3
3 9 4 True 9
4 10 5 False 5
5 23 6 True 23
6 99 7 True 99
7 23 8 True 23
I want to create a set of n columns in a DataFrame each assigned a separate value using a list comprehension.
#My original dataframe
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
A B
0 1 4
1 2 5
2 3 6
#Expected output -
pd.concat([df, pd.DataFrame(np.tile(np.array([5,10,15,20,25])[:,None], 3).T)], axis=1)
A B 0 1 2 3 4
0 1 4 5 10 15 20 25
1 2 5 5 10 15 20 25
2 3 6 5 10 15 20 25
I need to do it in this fashion -
#ROUGH structure of the code that I am looking for -
n = "number of columns i want to add"
df[[i for i in range(n)]] = numpyarray #whose shape is (n,3)
The error that I face is quite obvious -
KeyError: "None of [Int64Index([0, 1, 2], dtype='int64')] are in the [columns]"
#AND
SyntaxError: can't assign to list comprehension
I have read other solutions that allow adding multiple columns, but this one specifically needs a loop with an iterator of n because:
The data frame may need 25 columns added, and that doesn't depend on the array of values.
The array of values can be (3, 15), which means the last 10 columns will not take their values from the array.
The preferred solution would be a list comprehension, since the list of columns I would be creating (25, for example) comes from a list-comprehension-based iterator.
Here's an updated version of the solution.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
print(df)
n = 10
df = pd.concat([df, pd.DataFrame(
    np.tile([5*(i+1) for i in range(n)], len(df)).reshape(len(df), n),
    columns=[i+1 for i in range(n)])], axis=1)
print(df)
The output from this is as follows:
Original DataFrame:
A B
0 1 4
1 2 5
2 3 6
Merged dataframe
A B 1 2 3 4 5 6 7 8 9 10
0 1 4 5 10 15 20 25 30 35 40 45 50
1 2 5 5 10 15 20 25 30 35 40 45 50
2 3 6 5 10 15 20 25 30 35 40 45 50
We need to get a table with values [5,10,15,...,n*5]. To achieve this, I am using:
np.tile([5*(i+1) for i in range(n)],len(df))
This will give me an array like this:
array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 5, 10, 15, 20, 25, 30, 35,
40, 45, 50, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
Now we need to switch this to 3 rows by n columns where n=10 in this example. I am doing that using:
reshape(len(df),n)
Here len(df) = 3 and n = 10
The result of
np.tile([5*(i+1) for i in range(n)],len(df)).reshape(len(df),n)
will be :
array([[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50],
[ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]])
Now that I have the values listed, I just need to get the column names. I am using a list comprehension to create the column names.
columns=[i+1 for i in range(n)]
And of course we have to use axis=1, otherwise it will not concatenate correctly.
Putting all this together gives you the final result set.
I went back and tried to use Akshay's logic. Here's what I got; this also works.
df2 = pd.concat([df, pd.DataFrame(
    np.tile(np.array([[5*i] for i in range(1, n+1)]), len(df)).T,
    columns=[i+1 for i in range(n)])], axis=1)
print(df2)
If you think there are easier ways to do this, please let me know so I can learn as well.
The previous response is below:
I am fairly new to pandas and still learning to figure things out. Here's what I tried and it looks like this is what you want.
import pandas as pd
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
lst = [5,10,15,20,25]
n = 6
for i in range(1, n):
    df[i] = lst[i-1]
print(df)
This gave me the following output:
A B 1 2 3 4 5
0 1 4 5 10 15 20 25
1 2 5 5 10 15 20 25
2 3 6 5 10 15 20 25
Does this make sense and is this what you are looking for?
One idea to create the columns with a list comprehension, tested in pandas 1.1.1:
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
#list created by list comprehension
L = [i + 1 for i in range(5)]
print (L)
[1, 2, 3, 4, 5]
n = len(L)
df[list(range(n))] = L
print (df)
A B 0 1 2 3 4
0 1 4 1 2 3 4 5
1 2 5 1 2 3 4 5
2 3 6 1 2 3 4 5
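For the original case of a 2-D array of per-column values, one version-safe sketch is to assign the array column by column (the array values here are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
arr = np.tile(np.array([5, 10, 15, 20, 25]), (3, 1))  # shape (3, n) with n = 5

# Assign each array column to a new integer-labelled dataframe column
for i in range(arr.shape[1]):
    df[i] = arr[:, i]

print(df.iloc[0].tolist())  # [1, 4, 5, 10, 15, 20, 25]
```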
I have a dictionary like this:
dictionary = {100:[1, 2], 110:[3, 4],
120:[5, 6, 7], 130:[8, 9],
140:[10, 11, 12, 13],
150:[14, 15]}
I already have an existing column 'A' and would like to create a new column 'B' and map the values back to each key but display this in a data frame format. Here is the output I wish to receive:
#Column A is the existing column. Column B is the new one to be created
A B
4 110
8 130
15 150
7 120
7 120
4 110
9 130
1 100
2 100
How can this be done? Help would be appreciated. Thanks in advance!
Reverse your dictionary via a comprehension. Then use pd.Series.map with your new dictionary:
d = {100:[1, 2], 110:[3, 4], 120:[5, 6, 7], 130:[8, 9],
140:[10, 11, 12, 13], 150:[14, 15]}
d_rev = {w: k for k, v in d.items() for w in v}
df['B'] = df['A'].map(d_rev)
print(df)
A B
0 4 110
1 8 130
2 15 150
3 7 120
4 7 120
5 4 110
6 9 130
7 1 100
8 2 100
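One thing to watch for: values in A that have no entry in the reversed dictionary come back as NaN (and the column becomes float). A quick sketch:

```python
import pandas as pd

d = {100: [1, 2], 110: [3, 4]}
d_rev = {w: k for k, v in d.items() for w in v}

df = pd.DataFrame({'A': [4, 1, 99]})
df['B'] = df['A'].map(d_rev)    # 99 has no reverse entry, so it maps to NaN
print(df['B'].isna().tolist())  # [False, False, True]
```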