Select columns of pandas dataframe using a dictionary list value - python

I have column names in a dictionary and would like to select those columns from a dataframe.
In the example below, how do I select the columns named by the dictionary values 'b' and 'c' and save them into df_out?
import pandas as pd
ds = {'cols': ['b', 'c']}
d = {'a': [2, 3], 'b': [3, 4], 'c': [4, 5]}
df_in = pd.DataFrame(data=d)
print(ds)
print(df_in)
df_out = df_in[[ds['cols']]]
print(df_out)
TypeError: unhashable type: 'list'

Remove the nested list, i.e. the extra pair of brackets [], because ds['cols'] is already a list:
df_out = df_in[ds['cols']]
print(df_out)
b c
0 3 4
1 4 5

According to the reference, you only need to drop one set of brackets:
df_out = df_in[ds['cols']]
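Why the double brackets fail, roughly: df_in[[ds['cols']]] passes a list whose only element is another list, and pandas tries to use that inner list as a column label, which must be hashable. A minimal sketch of the working selection, plus two equivalent alternatives that are not from the original answers (.loc and DataFrame.filter); 'not_a_column' is a made-up name used only to show that filter skips labels that don't exist:

```python
import pandas as pd

ds = {'cols': ['b', 'c']}
df_in = pd.DataFrame({'a': [2, 3], 'b': [3, 4], 'c': [4, 5]})

# ds['cols'] is already a list, so it goes directly inside one pair of brackets
df_out = df_in[ds['cols']]

# Equivalent label-based selection: all rows, only the listed columns
df_loc = df_in.loc[:, ds['cols']]

# filter(items=...) keeps only the listed names that exist, skipping unknown ones
df_filtered = df_in.filter(items=ds['cols'] + ['not_a_column'])

print(df_out)
```

Plain indexing and .loc raise a KeyError for a missing column, while filter silently drops it; pick whichever behavior you want when the dictionary might list columns that aren't present.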

Related

How to update a dataframe with values from another dataframe when indexes and columns don't match

I want to update the dataframe df with values from another dataframe df_new if some condition holds true.
The indexes and the column names of the two dataframes do not match. How can this be done?
import pandas as pd
names = ['a', 'b', 'c']
df = pd.DataFrame({
'val': [10, 10, 10],
}, index=names)
new_names = ['a', 'c', 'd']
df_new = pd.DataFrame({
'profile': [5, 15, 22],
}, index=new_names)
above_max = df_new['profile'] >= 7
# This works only if indexes of df and df_new match
#df.loc[above_max, 'val'] = df_new['profile']
# expected df:
# val
# a 10
# b 10
# c 15
One idea is to use Series.reindex so that the index of the mask (and of the values) matches the index of df:
s = df_new['profile'].reindex(df.index)
above_max = s >= 7
df.loc[above_max, 'val'] = s
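A self-contained version of the reindex answer, with comments on why the unmatched rows are left alone (the comments are my reading of pandas' alignment rules, not part of the original answer):

```python
import pandas as pd

names = ['a', 'b', 'c']
df = pd.DataFrame({'val': [10, 10, 10]}, index=names)
df_new = pd.DataFrame({'profile': [5, 15, 22]}, index=['a', 'c', 'd'])

# Align the source column to df's index: 'b' has no match and becomes NaN,
# and 'd' is dropped because df has no such row
s = df_new['profile'].reindex(df.index)

# Comparisons with NaN are False, so unmatched rows are never selected
above_max = s >= 7

# The assignment aligns on the index, so only row 'c' receives 15
df.loc[above_max, 'val'] = s
print(df)
```

Row 'a' exists in both frames but its profile (5) is below the threshold, so it keeps 10, matching the expected result in the question.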

python dataframe how to convert set column to list

I tried to convert a set column to a list column in a pandas DataFrame, but it failed. I'm not sure what the best way to do this is. Thanks.
Here is the example:
I tried to create a column 'c' that converts the set column 'b' to lists, but 'c' still contains sets.
import pandas as pd
data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)
tdf['c'] = list(tdf['b'])
tdf
a b c
0 [1, 2, 3] {33, 11, 22} {33, 11, 22}
1 [2, 3, 4] {222, 111} {222, 111}
You could do:
import pandas as pd
data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)
tdf['c'] = [list(e) for e in tdf.b]
print(tdf)
Use apply:
tdf['c'] = tdf['b'].apply(list)
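Your attempt failed because list(tdf['b']) converts the whole column at once (producing a plain list of the sets), rather than converting each cell one by one; apply calls list on every cell individually.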
Or do:
tdf['c'] = tdf['b'].map(list)
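A small sketch contrasting the whole-column conversion with the per-cell ones. The sorted variant in column 'd' is an addition not in the answers above, useful because sets have no guaranteed element order:

```python
import pandas as pd

data = [{'a': [1, 2, 3], 'b': {11, 22, 33}}, {'a': [2, 3, 4], 'b': {111, 222}}]
tdf = pd.DataFrame(data)

# list() on the whole Series iterates over its cells, so each item is still a set
whole = list(tdf['b'])

# apply()/map() call the function on each cell, so every cell becomes a list
tdf['c'] = tdf['b'].apply(list)

# Sets have no guaranteed order; sorted() gives a deterministic list per cell
tdf['d'] = tdf['b'].map(sorted)
print(tdf)
```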

Pandas: renaming columns that have the same name

I have a dataframe that has duplicated column names a, b and b. I would like to rename the second b to c.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "b1": [7, 8, 9]})
df = df.rename(columns={'b1': 'b'})  # now there are two columns named 'b'
I tried this, with no success:
df.rename(index=str, columns={2 : "c"})
Try:
>>> df.columns = ['a', 'b', 'c']
>>> df
a b c
0 1 4 7
1 2 5 8
2 3 6 9
You can always just manually rename all the columns.
df.columns = ['a', 'b', 'c']
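If you only want to change the second b without retyping every name, one option (a sketch, not from the original answers) is to edit the labels by position in a plain list. rename cannot do this, because it matches labels, and both b columns share the same label:

```python
import pandas as pd

# Build a frame that really has two columns labelled 'b'
df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]], columns=['a', 'b', 'b'])

# Copy the labels into a plain list so one position can be changed
cols = df.columns.tolist()
cols[2] = 'c'  # position 2 is the second 'b'
df.columns = cols
print(df)
```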
If your columns are ordered and you want lettered columns, don't type the names out manually; that is error-prone.
You can use string.ascii_lowercase, assuming you have at most 26 columns:
import pandas as pd
from string import ascii_lowercase
df = pd.DataFrame(columns=['a', 'b', 'b1'])
df.columns = list(ascii_lowercase[:len(df.columns)])
print(df.columns)
Index(['a', 'b', 'c'], dtype='object')
These solutions don't account for the problem of having many columns.
Here is a solution that, regardless of the number of columns, renames every column with the duplicated name (here 'name') to a unique name by appending its position:
df.columns = [f'name{i}' if col == 'name' else col for i, col in enumerate(df.columns)]
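A more general sketch that handles every repeated name at once instead of one hard-coded name. dedupe_columns is a hypothetical helper, not a pandas function: it keeps the first occurrence of each name and suffixes later repeats with 1, 2, and so on. It does not guard against a suffixed name colliding with a column that already exists (for example an existing 'b1'):

```python
import pandas as pd

def dedupe_columns(columns):
    """Hypothetical helper: suffix repeats of a name with 1, 2, ... in order."""
    counts = {}
    result = []
    for name in columns:
        if name in counts:
            counts[name] += 1
            result.append(f"{name}{counts[name]}")
        else:
            counts[name] = 0
            result.append(name)
    return result

df = pd.DataFrame([[1, 4, 7, 10], [2, 5, 8, 11]], columns=['a', 'b', 'b', 'b'])
df.columns = dedupe_columns(df.columns)
print(df.columns.tolist())  # ['a', 'b', 'b1', 'b2']
```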

get pandas dataframe records according to a specific column quantiles

I would like to get the records of dataframe df whose values in column C equal any of a list of specified quantiles.
For a single quantile, this works:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'], 'C': [1, 2, 3, 4, 5]})
print(df[df['C'] == df['C'].quantile(q = 0.25)])
and outputs:
A C
1 b 2
But it looks clunky to me, and it also fails when there are multiple quantiles: print(df[df['C'] == df['C'].quantile(q = [0.25, 0.75])]) raises ValueError: Can only compare identically-labeled Series objects
This is related to "Retrieve the Kth quantile within each group in Pandas".
You can do it this way:
Keep your desired quantiles in a list, select the matching rows for each quantile, and concatenate them.
The result ends up in final_df:
quantile_list = [0.1, 0.5, 0.4]
frames = []
for q in quantile_list:
    # rows whose value in column C equals this quantile exactly
    frames.append(df[df['C'] == df['C'].quantile(q=q)])
final_df = pd.concat(frames)
final_df.reset_index(drop=True, inplace=True)  # optional, in case you want to reset the index
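A loop-free alternative, offered as a sketch rather than one of the original answers: compute all quantiles at once and match with Series.isin, which compares values only and so sidesteps the index mismatch behind the ValueError. It assumes you want actual data points; with the default linear interpolation a quantile such as 2.6 never equals any row, so this sketch passes interpolation='nearest':

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'], 'C': [1, 2, 3, 4, 5]})

# With a list of levels, quantile() returns one value per level.
# interpolation='nearest' returns actual data points instead of interpolated
# values, which could never equal a row's value.
qs = df['C'].quantile([0.25, 0.75], interpolation='nearest')

# isin() compares values only, so the index of qs no longer matters
result = df[df['C'].isin(qs)]
print(result)
```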

What to do when pandas column renaming creates column name duplicates

Why doesn't a pandas.DataFrame object complain when I rename a column if the new column name already exists?
This means that referencing the new column afterwards returns a pandas.DataFrame instead of a pandas.Series, which can cause further errors.
Secondly, is there a suggested way to handle such a situation?
Example:
import pandas as pd
df = pd.DataFrame( {'A' : ['foo','bar'] ,'B' : ['bar','foo'] } )
df.B.map( {'bar':'foo','foo':'bar'} )
# 0 foo
# 1 bar
# Name: B, dtype: object
df.rename(columns={'A':'B'},inplace=True)
Now, the following will fail:
df.B.map( {'bar':'foo','foo':'bar'} )
#AttributeError: 'DataFrame' object has no attribute 'map'
Let's say you had a dictionary mapping old column names to new column names. When renaming your DataFrame, you could use a dictionary comprehension to apply only the renames whose new name v is not already a column of the DataFrame:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
d = {'a': 'b', 'b': 'B'}  # 'a' -> 'b' is skipped because 'b' already exists
df.rename(columns={k: v for k, v in d.items() if v not in df}, inplace=True)
>>> df
a B
0 1 3
1 2 4
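The comprehension only checks new names against columns that already exist, so two old names mapped to the same new name (for example {'a': 'B', 'b': 'B'}) both pass the check and still produce duplicates. A sketch of a hypothetical helper, safe_rename (not a pandas function), that also treats names assigned earlier in the same mapping as taken:

```python
import pandas as pd

def safe_rename(df, mapping):
    """Hypothetical helper: apply a rename only if its new name is still free.

    A name counts as taken if it is already a column or was assigned by an
    earlier entry of the mapping, so (given unique input columns) the result
    has no duplicate labels.
    """
    taken = set(df.columns)
    applied = {}
    for old, new in mapping.items():
        if old in df.columns and new not in taken:
            applied[old] = new
            taken.add(new)
    return df.rename(columns=applied)

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
out = safe_rename(df, {'a': 'B', 'b': 'B'})
print(out.columns.tolist())  # ['B', 'b']
```

Renames are applied in the mapping's insertion order, so the first entry that claims a name wins; the rest are silently skipped, as in the comprehension above.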
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
d = {'a': 'b'}
df.rename(columns={k: v for k, v in d.items() if v not in df}, inplace=True)
>>> df
a b
0 1 3
1 2 4
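On the first question (why pandas doesn't complain): pandas allows duplicate labels by default. Since pandas 1.2 there is an opt-in, documented as experimental, that makes operations introducing duplicates raise pandas.errors.DuplicateLabelError. A minimal sketch, assuming a pandas version that has DataFrame.set_flags:

```python
import pandas as pd

# Opt in to duplicate-label checking (experimental flag, pandas >= 1.2)
df = pd.DataFrame({'A': ['foo', 'bar'], 'B': ['bar', 'foo']}).set_flags(
    allows_duplicate_labels=False
)

try:
    # Renaming 'A' to the existing name 'B' would create a duplicate column label
    df = df.rename(columns={'A': 'B'})
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True

print(raised)  # True
```

The flag is carried along by most operations that return a new object, so the check keeps applying as you keep working with the frame.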
