How to add multiple columns to a dataframe

I have a function that returns a list of lists, and I'd like to add multiple columns to my dataframe based on the return value. Here is what the return value of my function looks like:
[[1,2,3],[3,4,3],[1,6,7],[4,7,6]]
I would like to add three columns to my dataframe. I have the following code:
col_names = ['A','B','C']
df[col_names] = func()
but it gives me an error. How can I add 3 new columns?

You can pass the list directly to the DataFrame constructor:
pd.DataFrame([[1,2,3],[3,4,3],[1,6,7],[4,7,6]],columns=['A','B','C'])
A B C
0 1 2 3
1 3 4 3
2 1 6 7
3 4 7 6
Or, if you have defined the list as l = [[1,2,3],[3,4,3],[1,6,7],[4,7,6]], pass it to the DataFrame constructor:
df = pd.DataFrame(l, columns=['A','B','C'])

Here is one way to do it:
df = pd.DataFrame({'foo': ['bar', 'buzz', 'fizz']})
col_names = ['A', 'B', 'C']
def buzz():
    return [[1,2,3],[3,4,3],[1,6,7],[4,7,6]]
# concat aligns on the index, so the extra fourth row gets NaN in 'foo'
pd.concat([df, pd.DataFrame.from_records(buzz(), columns=col_names)], axis=1)
foo A B C
0 bar 1 2 3
1 buzz 3 4 3
2 fizz 1 6 7
3 NaN 4 7 6

Here is an example with a dummy function that returns a list of lists. Since the function returns a plain list, you need to pass it to the pd.DataFrame constructor before assigning it to the existing dataframe.
def fn(l1,l2,l3):
    return [l1,l2,l3]
df = pd.DataFrame({'col': ['a', 'b', 'c']})
col_names = ['A','B','C']
df[col_names] = pd.DataFrame(fn([1,2,3], [3,4,3], [4,7,6]))
You get
col A B C
0 a 1 2 3
1 b 3 4 3
2 c 4 7 6
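One detail worth spelling out: when a DataFrame is assigned or joined, pandas lines rows up by index label, not by position. The following is a minimal sketch, assuming the function returns exactly one inner list per existing row; func and the non-default index are hypothetical stand-ins used only for illustration.

```python
import pandas as pd

def func():
    # Hypothetical stand-in for the question's function: one inner list per row
    return [[1, 2, 3], [3, 4, 3], [4, 7, 6]]

# A non-default index, to show that the rows still line up
df = pd.DataFrame({'col': ['a', 'b', 'c']}, index=[10, 20, 30])
col_names = ['A', 'B', 'C']

# Build the new columns on df's own index so that rows match by label
new_cols = pd.DataFrame(func(), columns=col_names, index=df.index)
df = df.join(new_cols)
print(df)
```

Passing index=df.index is the design choice here: without it the new frame would get the labels 0, 1, 2, which do not exist in df, and the joined columns would be all NaN.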

Related

What is the most efficient way to swap the values of two columns of a 2D list in python when the number of rows is in the tens of thousands?

For example, if I have an original list:
A B
1 3
2 4
to be turned into
A B
3 1
4 2

Two cents' worth: 3 ways to do it.
1. You could add a 3rd column C, copy A to C, then delete A. This would take more memory.
2. You could create a swap function for the values in a row, then wrap it in a loop.
3. You could just swap the labels of the columns. This is probably the most efficient way.

You could use rename:
df2 = df.rename(columns={'A': 'B', 'B': 'A'})
output:
B A
0 1 3
1 2 4
If order matters:
df2 = df.rename(columns={'A': 'B', 'B': 'A'})[df.columns]
output:
A B
0 3 1
1 4 2

Use DataFrame.rename with a dictionary to swap the column names, then restore the original column order by selecting df.columns:
df = df.rename(columns=dict(zip(df.columns, df.columns[::-1])))[df.columns]
print (df)
A B
0 3 1
1 4 2

You can also simply assign the swapped values directly. Using .values strips the column labels, so the assignment is purely positional.
import pandas as pd
df = pd.DataFrame({"A":[1,2],"B":[3,4]})
df[["A","B"]] = df[["B","A"]].values
df
A B
0 3 1
1 4 2

for more than 2 columns:
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9], 'D':[10,11,12]})
print(df)
'''
A B C D
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
'''
df = df.set_axis(df.columns[::-1],axis=1)[df.columns]
print(df)
'''
A B C D
0 10 7 4 1
1 11 8 5 2
2 12 9 6 3
'''

I assume that your list is like this:
my_list = [[1, 3], [2, 4]]
So you can use this code:
print([[each_element[1], each_element[0]] for each_element in my_list])
The output is:
[[3, 1], [4, 2]]
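The comprehension above builds a new list. If the rows are mutable lists and a copy is not needed, an in-place variant avoids allocating a second outer list. This is only a sketch, assuming each row has at least two elements; rows is an illustrative name.

```python
rows = [[1, 3], [2, 4]]

# Swap the first two values of every row in place using tuple unpacking
for row in rows:
    row[0], row[1] = row[1], row[0]

print(rows)
```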

How to append rows to pandas dataframe with for loop and if statements

I am trying to copy and append rows to the dataframe: each row that meets the condition should be repeated as many times as indicated in 'qty1'.
Here is the code I have attempted so far:
import pandas as pd
row_list = [['a', 1, 4], ['b', 2, 5], ['c', 3, 6]]
columns = ['item', 'qty1', 'qty2']
df = pd.DataFrame(row_list, columns = columns)
for index, row in df.iterrows():
    if df.loc[index, 'qty1'] != 1: # apply only on lines where 'qty1' is different from 1
        df.append([row]*df.loc[index,'qty1']) # multiply and append the rows as many times as the column 'qty1' indicates
    else:
        pass
df
I get the following result (nothing changes):
item qty1 qty2
0 a 1 4
1 b 2 5
2 c 3 6
While what I am looking for is this:
item qty1 qty2
0 a 1 4
1 b 2 5
2 b 2 5
3 c 3 6
4 c 3 6
5 c 3 6
I don't understand what is wrong with this code and I'm not sure how to fix it.

You don't need a loop here, just use Index.repeat passing in the qty1 field as the repetitions. Then use loc to return the rows.
df.loc[df.index.repeat(df['qty1'])].reset_index(drop=True)
[out]
item qty1 qty2
0 a 1 4
1 b 2 5
2 b 2 5
3 c 3 6
4 c 3 6
5 c 3 6
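As for why the loop in the question changed nothing: DataFrame.append returned a new dataframe instead of modifying df, and that return value was thrown away. The method was also removed in pandas 2.0. If a loop is still wanted, a minimal sketch (assuming the same df as in the question; pieces and result are illustrative names) collects the repeated rows and concatenates them once at the end:

```python
import pandas as pd

df = pd.DataFrame([['a', 1, 4], ['b', 2, 5], ['c', 3, 6]],
                  columns=['item', 'qty1', 'qty2'])

pieces = []
for index, row in df.iterrows():
    # df.loc[[index]] is a one-row DataFrame; repeat it qty1 times
    pieces.extend([df.loc[[index]]] * int(row['qty1']))

# A single concat is cheaper than growing a dataframe inside the loop
result = pd.concat(pieces, ignore_index=True)
print(result)
```

The Index.repeat one-liner remains the simpler option; the loop form is mainly useful when each row needs extra per-row logic.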

Sorting a dataframe by a column

Hi I need to sort a data frame. My data frame looks like below.
A B
2 5
3 9
2 7
I want to sort this by column A.
A B
2 5
2 7
3 9
When column A contains duplicates,
sorted_data=data.sort_values(by=['A'], inplace=True)
doesn't work. Any suggestions on how I can fix this?

It has worked correctly. The problem is that with inplace=True the sorting is done on your original DataFrame (data in your case) and sort_values returns None, so sorted_data ends up as None.
If you want the sorted dataframe stored in sorted_data, do the following:
sorted_data=data.sort_values(by=['A'])
For example:
>>> df = pd.DataFrame({'A': [2,3,2], 'B': [5,9,7]})
>>> df.sort_values(by=['A'],inplace=True)
>>> df
A B
0 2 5
2 2 7
1 3 9
The other way:
>>> df = pd.DataFrame({'A': [2,3,2], 'B': [5,9,7]})
>>> sorted_df = df.sort_values(by=['A'])
>>> sorted_df
A B
0 2 5
2 2 7
1 3 9
>>> df
A B
0 2 5
1 3 9
2 2 7
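The two behaviours can be checked side by side. The sketch below reuses the question's data; returned and sorted_data are illustrative names. It also passes two optional sort_values arguments that go beyond the original answer: kind='mergesort', a stable sort that keeps tied rows in their original order, and ignore_index=True, which renumbers the result from 0.

```python
import pandas as pd

data = pd.DataFrame({'A': [2, 3, 2], 'B': [5, 9, 7]})

# With inplace=True the call sorts the frame itself and returns None
returned = data.copy().sort_values(by=['A'], inplace=True)
print(returned)

# Without inplace a new sorted frame is returned and data is left untouched
sorted_data = data.sort_values(by=['A'], kind='mergesort', ignore_index=True)
print(sorted_data)
```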

delimit/split row values and form individual rows

reproducible code for data:
import pandas as pd
dict = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
dict = pd.DataFrame(list(dict.items()))
dict
0 1
0 a [1,2,3,4]
1 b [1,2,3,4]
I wanted to split/delimit "column 1" and create individual rows for each split values.
expected output:
0 1
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
Should I remove the brackets first and then split the values? I have no idea how to do this. Is there any reference that would help me solve it?

Based on the logic from that answer (d is the dataframe built in the question):
s = d[1]\
.apply(lambda x: pd.Series(eval(x)))\
.stack()
s.index = s.index.droplevel(-1)
s.name = "split"
d.join(s).drop(1, axis=1)

Because you have strings containing a list (and not lists) in your cells, you can use eval:
dict_v = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
df = pd.DataFrame(list(dict_v.items()))
df = (df.rename(columns={0:'l'}).set_index('l')[1]
        .apply(lambda x: pd.Series(eval(x))).stack()
        .reset_index().drop('level_1', axis=1).rename(columns={'l':0,0:1}))
Another way, probably faster, is to build a DataFrame from the parsed lists:
df = (pd.DataFrame(df[1].apply(eval).tolist(),index=df[0])
.stack().reset_index(level=1, drop=True)
.reset_index(name='1'))
your output is
0 1
0 a 1
1 a 2
2 a 3
3 a 4
4 b 1
5 b 2
6 b 3
7 b 4
All the rename calls are only there to reproduce the exact column labels of your input and output.
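Since eval executes whatever code the string contains, a more cautious variant is sketched below. It is not part of the answers above: it swaps in ast.literal_eval, which only parses Python literals, and DataFrame.explode, which is assumed to be available (pandas 1.1 or later for ignore_index). The names df and out are illustrative.

```python
import ast
import pandas as pd

dict_v = {"a": "[1,2,3,4]", "b": "[1,2,3,4]"}
df = pd.DataFrame(list(dict_v.items()))

# Parse each string into a real list without executing arbitrary code
df[1] = df[1].apply(ast.literal_eval)

# explode turns every list element into its own row, repeating column 0
out = df.explode(1, ignore_index=True)
print(out)
```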

Replace values in pandas datatable if in list

How can I replace values in the datatable data with information in fillist if a value is in varlist?
import pandas as pd
data = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 10]})
varlist = (5,7,9,10)
fillist = ('a', 'b', 'c', 'd')
data[data.isin(varlist)==True] = 'is in varlist!'
Returns data as:
A B
0 is in varlist! 1
1 6 2
2 3 3
3 4 is in varlist!
But I want:
A B
0 a 1
1 6 2
2 3 3
3 4 d

Use the replace method of the dataframe.
replace_map = dict(zip(varlist, fillist))
data.replace(replace_map)
this gives
A B
0 a 1
1 6 2
2 3 3
3 4 d
The documentation is here in case you want to use it in a different way:
replace method documentation
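Note that replace returns a new dataframe and leaves data unchanged, so the result has to be assigned to keep it. A self-contained sketch with the question's data:

```python
import pandas as pd

data = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 10]})
varlist = (5, 7, 9, 10)
fillist = ('a', 'b', 'c', 'd')

# Pair each value to look for with its replacement: {5: 'a', 7: 'b', ...}
replace_map = dict(zip(varlist, fillist))

# Assign the result back, because replace does not modify data in place
data = data.replace(replace_map)
print(data)
```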
