Dictionaries to Data frame keep key integrity - python

This is more a question about the best way to achieve something.
For example, if I have 3 dictionaries:
A = {'key1': 1, 'key2': 2, 'key3': 3}
B = {'key2': 2, 'key3': 3, 'key1': 1}
C = {'key3': 'you', 'key2': 'are', 'key1': 'how'}
Ideally I would like to turn these into a DataFrame with 4 columns (Key, A, B, C), with each dictionary becoming a column and each entry placed in the row for the correct key.
Additionally, suppose there is a 4th dictionary D that only has the following entries:
D = {'key2': 'some', 'key3': 'data'}
Is it possible to add D as a 5th column, with any missing entries given a NaN value?

Let's try:
import pandas as pd
df = (pd.DataFrame({'A': A, 'B': B, 'C': C})
        .rename_axis(index='Key')
        .reset_index()
     )
# add D; keys that are missing from D become NaN
df['D'] = df['Key'].map(D)
Output:
    Key  A  B    C     D
0  key1  1  1  how   NaN
1  key2  2  2  are  some
2  key3  3  3  you  data
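As a possible variation on the answer above, all four dictionaries can be passed to the constructor in one go. This is a minimal sketch, assuming string keys as in the question; pandas aligns the inner dictionaries on their keys, so a key missing from D ends up as NaN without a separate map step. The name df_all is only illustrative.

```python
import pandas as pd

A = {"key1": 1, "key2": 2, "key3": 3}
B = {"key2": 2, "key3": 3, "key1": 1}
C = {"key3": "you", "key2": "are", "key1": "how"}
D = {"key2": "some", "key3": "data"}  # has no entry for key1

# Each outer key becomes a column; the inner keys are aligned into the index.
df_all = (
    pd.DataFrame({"A": A, "B": B, "C": C, "D": D})
    .rename_axis(index="Key")  # name the index so reset_index() yields a "Key" column
    .reset_index()
)
print(df_all)
```

The insertion order of keys inside each dictionary does not matter here, because alignment is done by key rather than by position.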

Related

Dataframe - column don't exist create one

Let's say we have:
df = pd.DataFrame({"A":[1,2,3],"B":[44,66,77]})
print(df)
This is the DataFrame I get from an API request, but I am expecting columns C, D and E as well.
Since they are not there, I want to add columns C, D and E with empty string values, but first I should check that these columns don't already exist.
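A straightforward dict comprehension, unpacked with ** as keyword arguments to assign():
import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[1,2,3]})
wanted = {"A":20,"B":"a value","C":np.nan}
# only columns that are not already in df are added; the existing A is left untouched
df = df.assign(**{c:v for c,v in wanted.items() if c not in df.columns})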
   A        B   C
0  1  a value NaN
1  2  a value NaN
2  3  a value NaN
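Applied to the exact case in the question (columns C, D and E filled with empty strings), the same idea might look like the sketch below. The names expected and missing are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by the API request.
df = pd.DataFrame({"A": [1, 2, 3], "B": [44, 66, 77]})

expected = ["C", "D", "E"]  # columns the rest of the code relies on

# Check first, then add only the columns that are actually absent.
missing = [c for c in expected if c not in df.columns]
df = df.assign(**{c: "" for c in missing})
print(df)
```

Because existing columns are filtered out before assign() is called, running this on a response that already contains C would leave that column's values as they are.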

How can one merge or concatenate Pandas series with different lengths and empty value?

I have a number of series in which some values are blank. Something like this:
import pandas as pd
serie_1 = pd.Series(['a','','b','c','',''])
serie_2 = pd.Series(['','d','','','e','f','g'])
Filtering the blanks out of each series is no problem, e.g. serie_1 = serie_1[serie_1 != '']
However, when I combine them into one DataFrame, either by building it from the series directly or by building two one-column DataFrames and concatenating them, I don't get what I'm looking for.
I'm looking for a table like this:
col1 col2
0 a d
1 b e
2 c f
3 nan g
But I am getting something like this:
0 a nan
1 nan d
2 b nan
3 c nan
4 nan e
5 nan f
6 nan g
How could I obtain the table I'm looking for?
Thanks in advance
Here is one approach, if I understand correctly:
pd.concat([
    serie_1[lambda x: x != ''].reset_index(drop=True).rename('col1'),
    serie_2[lambda x: x != ''].reset_index(drop=True).rename('col2')
], axis=1)
col1 col2
0 a d
1 b e
2 c f
3 NaN g
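The logic is: select the non-empty entries (with the lambda expression), restart the index numbering from 0 (with reset_index(drop=True)), set the column names (with rename), and build a wide table (with axis=1 in pd.concat). Resetting the index is the key step: concat aligns on the index, so without it the values stay on their original rows.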
One way using pandas.concat:
ss = [serie_1, serie_2]
# pass axis by keyword; recent pandas versions no longer accept it positionally
df = pd.concat([s[s.ne("")].reset_index(drop=True) for s in ss], axis=1)
print(df)
Output:
0 1
0 a d
1 b e
2 c f
3 NaN g
I would just filter out the blank values before creating the dataframe like this:
import pandas as pd
def filter_blanks(string_list):
    return [e for e in string_list if e]
serie_1 = pd.Series(filter_blanks(['a','','b','c','','']))
serie_2 = pd.Series(filter_blanks(['','d','','','e','f','g']))
pd.concat([serie_1, serie_2], axis=1)
Which results in:
0 1
0 a d
1 b e
2 c f
3 NaN g
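The answers above share the same steps (drop blanks, reset the index, concatenate side by side). A minimal sketch of those steps wrapped in a reusable helper is shown below; the function name stack_non_blank and the dictionary-based interface are assumptions made for illustration, not part of any answer.

```python
import pandas as pd

def stack_non_blank(named_series):
    """Drop '' entries from each Series and place the results side by side.

    named_series maps the desired column name to a Series (hypothetical interface).
    """
    cols = [
        s[s.ne("")].reset_index(drop=True).rename(name)
        for name, s in named_series.items()
    ]
    # Columns of different lengths are padded with NaN at the bottom.
    return pd.concat(cols, axis=1)

serie_1 = pd.Series(["a", "", "b", "c", "", ""])
serie_2 = pd.Series(["", "d", "", "", "e", "f", "g"])

table = stack_non_blank({"col1": serie_1, "col2": serie_2})
print(table)
```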

Set value in separate pandas column when mapping dictionary

I have a dictionary:
d = {"A":1, "B":2, "C":3}
I also have a pandas dataframe:
col1
A
G
E
B
C
I'd like to create a new column by mapping the dictionary onto col1. Simultaneously I'd like to set the values in another column to indicate whether the value in that row has been mapped. The desired output would look like this:
col1 col2 col3
A 1 1
G NaN 0
E NaN 0
B 2 1
C 3 1
I know that col2 can be created using df.col1.map(d), but how can I simultaneously create col3?
You can create both columns in a single assign call: the first with map, the second with isin, which gives a boolean mask that is then cast to integers:
df = df.assign(col2=df.col1.map(d), col3=df.col1.isin(d.keys()).astype(int))
print (df)
col1 col2 col3
0 A 1.0 1
1 G NaN 0
2 E NaN 0
3 B 2.0 1
4 C 3.0 1
Another solution, in two steps, uses a different boolean mask that checks for non-missing values in the mapped column:
df['col2'] = df.col1.map(d)
df['col3'] = df['col2'].notnull().astype(int)
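For reference, here is a self-contained sketch that rebuilds the question's data and computes the indicator both ways, so the two masks can be compared. The names via_isin and via_notnull are illustrative only.

```python
import pandas as pd

d = {"A": 1, "B": 2, "C": 3}
df = pd.DataFrame({"col1": ["A", "G", "E", "B", "C"]})

# Map once, then derive the indicator in two different ways.
df["col2"] = df["col1"].map(d)
via_isin = df["col1"].isin(list(d)).astype(int)   # was the key found in the dictionary?
via_notnull = df["col2"].notnull().astype(int)    # did the mapping produce a value?

df["col3"] = via_isin
print(df)
```

With this data both masks agree. They would differ only if the dictionary itself mapped some key to a missing value, in which case isin would still report the key as mapped.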

code multiple columns based on lists and dictionaries in Python

I have the following dataframe in Pandas
OfferPreference_A OfferPreference_B OfferPreference_C
A B A
B C C
C S G
I have the following dictionary of unique values under all the columns
dict1 = {'A': 1, 'B': 2, 'C': 3, 'S': 4, 'G': 5, 'D': 6}
I also have a list of the column names:
columnlist=['OfferPreference_A', 'OfferPreference_B', 'OfferPreference_C']
I am trying to get the following table as the output:
OfferPreference_A OfferPreference_B OfferPreference_C
1 2 1
2 3 3
3 4 5
How do I do this?
Use:
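# values with no match in the dictionary come back as missing
# (applymap is deprecated since pandas 2.1, where DataFrame.map replaces it)
df = df[columnlist].applymap(dict1.get)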
Or:
# values with no match in the dictionary keep their original value
df = df[columnlist].replace(dict1)
Or:
# values with no match in the dictionary become NaN
df = df[columnlist].stack().map(dict1).unstack()
print (df)
OfferPreference_A OfferPreference_B OfferPreference_C
0 1 2 1
1 2 3 3
2 3 4 5
You can use map for this as shown below, assuming every value has a match in the dictionary:
for col in columnlist:
    df[col] = df[col].map(dict1)
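A runnable version of the column-wise map approach, with the question's data rebuilt as a DataFrame, could look like the sketch below. The result name coded is an assumption for illustration; using apply with Series.map avoids the explicit loop and leaves the original df unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "OfferPreference_A": ["A", "B", "C"],
    "OfferPreference_B": ["B", "C", "S"],
    "OfferPreference_C": ["A", "C", "G"],
})
dict1 = {"A": 1, "B": 2, "C": 3, "S": 4, "G": 5, "D": 6}
columnlist = ["OfferPreference_A", "OfferPreference_B", "OfferPreference_C"]

# Series.map is applied to each selected column in turn;
# a value that is absent from dict1 would become NaN.
coded = df[columnlist].apply(lambda col: col.map(dict1))
print(coded)
```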

How do I delete columns where the average of the column already exists

In the example below, column C should be deleted because its average is the same as that of column A (column A should remain).
type(df): pandas.core.frame.DataFrame
A B C
1 2 1
0 2 0
3 2 3
I tried creating a dictionary to later delete repeated values, but got stuck:
dict_test = {}
for each_column in df:
    dict_test[each_column] = df[[each_column]].mean()
dict_test
The result came out as 'A': A 1.33333, dtype: float64
The problem is that each dictionary value is a Series holding both the column label and the mean, so I can't compare the values to one another.
You can use df.mean().drop_duplicates() and pandas indexing:
In [30]: df[df.mean().drop_duplicates().index]
Out[30]:
A B
0 1 2
1 0 2
2 3 2
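The one-liner above can be unpacked into steps, as in the sketch below, which rebuilds the example data. The names means, keep, by_mean and by_content are illustrative. Note that two different columns can share the same mean, so the sketch also shows a content-based alternative that is not part of the answer, in case only truly identical columns should be dropped.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 3], "B": [2, 2, 2], "C": [1, 0, 3]})

# Step 1: one mean per column, as a Series indexed by column label.
means = df.mean()

# Step 2: keep the first column for each distinct mean.
keep = means.drop_duplicates().index
by_mean = df[keep]
print(by_mean)

# Alternative (an assumption about intent): drop columns whose full contents
# repeat an earlier column, rather than comparing means only.
by_content = df.loc[:, ~df.T.duplicated()]
print(by_content)
```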
