I have the following dataframe in Pandas
OfferPreference_A OfferPreference_B OfferPreference_C
A B A
B C C
C S G
I have the following dictionary of unique values under all the columns
dict1={A:1, B:2, C:3, S:4, G:5, D:6}
I also have a list of the columnames
columnlist=['OfferPreference_A', 'OfferPreference_B', 'OfferPreference_C']
I Am trying to get the following table as the output
OfferPreference_A OfferPreference_B OfferPreference_C
1 2 1
2 3 3
3 4 5
How do I do this.
Use:
#if value not match get NaN
df = df[columnlist].applymap(dict1.get)
Or:
#if value not match get original value
df = df[columnlist].replace(dict1)
Or:
#if value not match get NaN
df = df[columnlist].stack().map(dict1).unstack()
print (df)
OfferPreference_A OfferPreference_B OfferPreference_C
0 1 2 1
1 2 3 3
2 3 4 5
You can use map for this like shown below, assuming the values will match always
for col in columnlist:
df[col] = df[col].map(dict1)
Related
lets say we have.
df = pd.DataFrame({"A":[1,2,3],"B":[44,66,77]})
print(df)
the dataframe I get from API request
but I am expecting col C , D and E
---- since they are not there I want to add columns C,D and E with empty string values
----but first I should check if these columns dont exist
A straight forward dict comprehension using **kwargs to assign()
df = pd.DataFrame({"A":[1,2,3]})
wanted = {"A":20,"B":"a value","C":np.nan}
df = df.assign(**{c:v for c,v in wanted.items() if c not in df.columns})
A
B
C
0
1
a value
nan
1
2
a value
nan
2
3
a value
nan
I have the following dataframe:
df
Group Dist
0 A 5
1 B 2
2 A 3
3 B 1
4 B 0
5 A 5
I am trying to drop all rows that match Group if the Dist column equals zero. This works to delete row 4:
df = df[df.Dist != 0]
however I also want to delete rows 1 and 3 so I am left with:
df
Group Dist
0 A 5
2 A 3
5 A 5
Any ideas on how to drop the group based off this condition?
Thanks!
First get all Group values for Entry == 0 and then filter out them by check column Group with inverted mask by ~:
df1 = df[~df['Group'].isin(df.loc[df.Dist == 0, 'Group'])]
print (df1)
Group Dist
0 A 5
2 A 3
5 A 5
Or you can use GroupBy.transform with GroupBy.all for test if groups has no 0 values:
df1 = df[(df.Dist != 0).groupby(df['Group']).transform('all')]
EDIT: For remove all groups with missing values:
df2 = df[df['Dist'].notna().groupby(df['Group']).transform('all')]
For test missing values:
print (df[df['Dist'].isna()])
if return nothing there are no missing values NaN or no None like Nonetype.
So is possible check scalar, e.g. if this value is in row with index 10:
print (df.loc[10, 'Dist'])
print (type(df.loc[10, 'Dist']))
You can use groupby and the method filter:
df.groupby('Group').filter(lambda x: x['Dist'].ne(0).all())
Output:
Group Dist
0 A 5
2 A 3
5 A 5
If you want to filter out groups with missing values:
df.groupby('Group').filter(lambda x: x['Dist'].notna().all())
I have a pandas dataframe df1 = {'A':['a','b','c','d','e'],'no.':[0,1,2,3,4]}, df1 = pd.DataFrame(df1,columns=['A','no.']) where I would like to reverse in place the content of the second column with the result being like that: df2 = {'A':['a','b','c','d','e'],'no.':[4,3,2,1,0]} df2 = pd.DataFrame(df2,columns=['A','no.'])
Convert values to numpy and then indexing for change order:
df1['no.'] = df1['no.'].to_numpy()[::-1]
print (df1)
A no.
0 a 4
1 b 3
2 c 2
3 d 1
4 e 0
I just need one column of my dateframe, but in the original order. When I take it off, it is sorted by the values, and I can't understand why. I tried different ways to pick out one column but all the time it was sorted by the values.
this is my code:
import pandas
data = pandas.read_csv('/data.csv', sep=';')
longti = data.iloc[:,4]
To return the first Column your function should work.
import pandas as pd
df = pd.DataFrame(dict(A=[1,2,3,4,5,6], B=['A','B','C','D','E','F']))
df = df.iloc[:,0]
Out:
0 1
1 2
2 3
3 4
4 5
5 6
If you want to return the second Column you can use the following:
df = df.iloc[:,1]
Out:
0 A
1 B
2 C
3 D
4 E
5 F
I would like to merge nine Pandas dataframes together into a single dataframe, doing a join on two columns, controlling the column names. Is this possible?
I have nine datasets. All of them have the following columns:
org, name, items,spend
I want to join them into a single dataframe with the following columns:
org, name, items_df1, spend_df1, items_df2, spend_df2, items_df3...
I've been reading the documentation on merging and joining. I can currently merge two datasets together like this:
ad = pd.DataFrame.merge(df_presents, df_trees,
on=['practice', 'name'],
suffixes=['_presents', '_trees'])
This works great, doing print list(aggregate_data.columns.values) shows me the following columns:
[org', u'name', u'spend_presents', u'items_presents', u'spend_trees', u'items_trees'...]
But how can I do this for nine columns? merge only seems to accept two at a time, and if I do it sequentially, my column names are going to end up very messy.
You could use functools.reduce to iteratively apply pd.merge to each of the DataFrames:
result = functools.reduce(merge, dfs)
This is equivalent to
result = dfs[0]
for df in dfs[1:]:
result = merge(result, df)
To pass the on=['org', 'name'] argument, you could use functools.partial define the merge function:
merge = functools.partial(pd.merge, on=['org', 'name'])
Since specifying the suffixes parameter in functools.partial would only allow
one fixed choice of suffix, and since here we need a different suffix for each
pd.merge call, I think it would be easiest to prepare the DataFrames column
names before calling pd.merge:
for i, df in enumerate(dfs, start=1):
df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')},
inplace=True)
For example,
import pandas as pd
import numpy as np
import functools
np.random.seed(2015)
N = 50
dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)),
columns=['org', 'name', 'items', 'spend']) for i in range(9)]
for i, df in enumerate(dfs, start=1):
df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')},
inplace=True)
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, dfs)
print(result.head())
yields
org name items_df1 spend_df1 items_df2 spend_df2 items_df3 \
0 2 4 4 2 3 0 1
1 2 4 4 2 3 0 1
2 2 4 4 2 3 0 1
3 2 4 4 2 3 0 1
4 2 4 4 2 3 0 1
spend_df3 items_df4 spend_df4 items_df5 spend_df5 items_df6 \
0 3 1 0 1 0 4
1 3 1 0 1 0 4
2 3 1 0 1 0 4
3 3 1 0 1 0 4
4 3 1 0 1 0 4
spend_df6 items_df7 spend_df7 items_df8 spend_df8 items_df9 spend_df9
0 3 4 1 3 0 1 2
1 3 4 1 3 0 0 3
2 3 4 1 3 0 0 0
3 3 3 1 3 0 1 2
4 3 3 1 3 0 0 3
Would doing a big pd.concat() and then renaming all the columns work for you? Something like:
desired_columns = ['items', 'spend']
big_df = pd.concat([df1, df2[desired_columns], ..., dfN[desired_columns]], axis=1)
new_columns = ['org', 'name']
for i in range(num_dataframes):
new_columns.extend(['spend_df%i' % i, 'items_df%i' % i])
bid_df.columns = new_columns
This should give you columns like:
org, name, spend_df0, items_df0, spend_df1, items_df1, ..., spend_df8, items_df8
I've wanted this as well at times but been unable to find a built-in pandas way of doing it. Here is my suggestion (and my plan for the next time I need it):
Create an empty dictionary, merge_dict.
Loop through the index you want for each of your data frames and add the desired values to the dictionary with the index as the key.
Generate a new index as sorted(merge_dict).
Generate a new list of data for each column by looping through merge_dict.items().
Create a new data frame with index=sorted(merge_dict) and columns created in the previous step.
Basically, this is somewhat like a hash join in SQL. Seems like the most efficient way I can think of and shouldn't take too long to code up.
Good luck.