Let's say we have:
df = pd.DataFrame({"A":[1,2,3],"B":[44,66,77]})
print(df)
This is the DataFrame I get from an API request, but I am expecting columns C, D and E.
Since they are not there, I want to add columns C, D and E with empty string values,
but first I should check that these columns don't already exist.
A straightforward approach is a dict comprehension passed as **kwargs to assign():
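import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3]})
wanted = {"A": 20, "B": "a value", "C": np.nan}
# only add the columns that are not already present
df = df.assign(**{c: v for c, v in wanted.items() if c not in df.columns})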
   A        B   C
0  1  a value NaN
1  2  a value NaN
2  3  a value NaN
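Applied to the DataFrame from the question, the same idea might look like the sketch below. The name `expected` is only an illustrative label for the list of columns the API was supposed to return; the empty-string default comes from the question.

```python
import pandas as pd

# DataFrame as returned by the API request in the question
df = pd.DataFrame({"A": [1, 2, 3], "B": [44, 66, 77]})

# hypothetical name: the columns we expected to receive
expected = ["C", "D", "E"]

# add each expected column with an empty string, but only if it is missing
df = df.assign(**{c: "" for c in expected if c not in df.columns})

print(df.columns.tolist())
```

Columns that already exist are left untouched, so running this more than once should not overwrite real data.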
This is more a question of what the best way to achieve something is.
For example, if I have 3 dictionaries:
A = {'key1': 1, 'key2': 2, 'key3': 3}
B = {'key2': 2, 'key3': 3, 'key1': 1}
C = {'key3': 'you', 'key2': 'are', 'key1': 'how'}
Ideally I would like to turn these into a DataFrame with 4 columns: Key, A, B, C,
with each of the dictionaries becoming a column and each entry placed on the row for its key.
Additionally, suppose there were a 4th dictionary D that only had the following entries:
D = {'key2': 'some', 'key3': 'data'}
Is it possible to add D as a 5th column, with any missing entries given a NaN value?
Let's try:
df = (pd.DataFrame({'A': A, 'B': B, 'C': C})
        .rename_axis(index='Key')
        .reset_index()
     )
# add D
df['D'] = df['Key'].map(D)
Output:
Key A B C D
0 key1 1 1 how NaN
1 key2 2 2 are some
2 key3 3 3 you data
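As a possible variation, D can also be passed to the constructor together with the other dictionaries. A DataFrame built from a dict of dicts aligns the inner keys as row labels, so keys missing from D should come out as NaN. A minimal sketch using the dictionaries from the question:

```python
import pandas as pd

A = {"key1": 1, "key2": 2, "key3": 3}
B = {"key2": 2, "key3": 3, "key1": 1}
C = {"key3": "you", "key2": "are", "key1": "how"}
D = {"key2": "some", "key3": "data"}  # no entry for key1

# the inner dict keys become the index; missing keys are filled with NaN
df = (pd.DataFrame({"A": A, "B": B, "C": C, "D": D})
        .rename_axis(index="Key")
        .reset_index())

print(df)
```

The map approach above is probably the better fit when D arrives later and the DataFrame already exists.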
I have a number of series with blanks as some values. Something like this
import pandas as pd
serie_1 = pd.Series(['a','','b','c','',''])
serie_2 = pd.Series(['','d','','','e','f','g'])
There is no problem filtering the blanks out of each series, with something like serie_1 = serie_1[serie_1 != '']
However, when I combine them into one df, either by building the df from them or by building two one-column dfs and concatenating them, I don't get what I'm looking for.
I'm looking for a table like this:
col1 col2
0 a d
1 b e
2 c f
3 nan g
But I am obtaining something like this
0 a nan
1 nan d
2 b nan
3 c nan
4 nan e
5 nan f
6 nan g
How could I obtain the table I'm looking for?
Thanks in advance
Here is one approach, if I understand correctly:
pd.concat([
serie_1[lambda x: x != ''].reset_index(drop=True).rename('col1'),
serie_2[lambda x: x != ''].reset_index(drop=True).rename('col2')
], axis=1)
col1 col2
0 a d
1 b e
2 c f
3 NaN g
The logic is: select the non-empty entries (with the lambda expression), restart the index numbering from 0 (with reset_index), set the column names (with rename), and build a wide table (with axis=1 in concat).
One way using pandas.concat:
ss = [serie_1, serie_2]
df = pd.concat([s[s.ne("")].reset_index(drop=True) for s in ss], axis=1)
print(df)
Output:
0 1
0 a d
1 b e
2 c f
3 NaN g
I would just filter out the blank values before creating the dataframe like this:
import pandas as pd
def filter_blanks(string_list):
    return [e for e in string_list if e]
serie_1 = pd.Series(filter_blanks(['a','','b','c','','']))
serie_2 = pd.Series(filter_blanks(['','d','','','e','f','g']))
pd.concat([serie_1, serie_2], axis=1)
Which results in:
0 1
0 a d
1 b e
2 c f
3 NaN g
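The answers above share the same steps (drop the blanks, reset the index, concatenate along axis=1), so they could be wrapped in a small helper. This is only a sketch; `combine_non_blank` is a made-up name, and passing a dict to pd.concat is used here so that the keys become the column names.

```python
import pandas as pd

def combine_non_blank(series_by_name):
    """Drop empty strings from each Series, renumber from 0, and place them side by side."""
    cleaned = {
        name: s[s.ne("")].reset_index(drop=True)
        for name, s in series_by_name.items()
    }
    # dict keys become the column names; shorter columns are padded with NaN
    return pd.concat(cleaned, axis=1)

serie_1 = pd.Series(['a', '', 'b', 'c', '', ''])
serie_2 = pd.Series(['', 'd', '', '', 'e', 'f', 'g'])

result = combine_non_blank({"col1": serie_1, "col2": serie_2})
print(result)
```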
I have the following dataframe in Pandas
OfferPreference_A OfferPreference_B OfferPreference_C
A B A
B C C
C S G
I have the following dictionary of unique values under all the columns
dict1 = {'A': 1, 'B': 2, 'C': 3, 'S': 4, 'G': 5, 'D': 6}
I also have a list of the column names
columnlist=['OfferPreference_A', 'OfferPreference_B', 'OfferPreference_C']
I am trying to get the following table as the output
OfferPreference_A OfferPreference_B OfferPreference_C
1 2 1
2 3 3
3 4 5
How do I do this?
Use:
# unmatched values become missing
# (applymap was renamed to DataFrame.map in pandas 2.1)
df = df[columnlist].applymap(dict1.get)
Or:
# unmatched values keep their original value
df = df[columnlist].replace(dict1)
Or:
# unmatched values become NaN
df = df[columnlist].stack().map(dict1).unstack()
print(df)
OfferPreference_A OfferPreference_B OfferPreference_C
0 1 2 1
1 2 3 3
2 3 4 5
You can use map for this, as shown below, assuming every value has a match in the dictionary:
for col in columnlist:
    df[col] = df[col].map(dict1)
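The difference between the approaches only shows up when a value has no entry in the dictionary. The sketch below uses a made-up value 'X' (not part of the question's data) to illustrate it: map with the dictionary gives NaN, while map with a function that falls back to the original value keeps it, which is similar in effect to replace.

```python
import pandas as pd

df = pd.DataFrame({
    "OfferPreference_A": ["A", "B", "C"],
    "OfferPreference_B": ["B", "C", "X"],  # 'X' is a hypothetical unmatched value
})
dict1 = {"A": 1, "B": 2, "C": 3}

# strict: anything missing from dict1 becomes NaN
strict = df.apply(lambda col: col.map(dict1))

# lenient: anything missing from dict1 is kept as it was
lenient = df.apply(lambda col: col.map(lambda v: dict1.get(v, v)))

print(strict)
print(lenient)
```

Which one is preferable probably depends on whether an unmatched value should be treated as an error to spot later (NaN) or passed through unchanged.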
I'd like to find the highest values in each row and return the column headers for those values in Python. For example, I'd like to find the top two in each row:
df =
A B C D
5 9 8 2
4 1 2 3
I'd like my output to look like this:
df =
B C
A D
You can use a dictionary comprehension to generate the largest_n values in each row of the dataframe. I transposed the dataframe and then applied nlargest to each of the columns. I used .index.tolist() to extract the desired top_n columns. Finally, I transposed this result to get the dataframe back into the desired shape.
top_n = 2
>>> pd.DataFrame({n: df.T[col].nlargest(top_n).index.tolist()
...               for n, col in enumerate(df.T)}).T
0 1
0 B C
1 A D
I decided to go with an alternative way: Apply the pd.Series.nlargest() function to each row.
Path to Solution
>>> df.apply(pd.Series.nlargest, axis=1, n=2)
A B C D
0 NaN 9.0 8.0 NaN
1 4.0 NaN NaN 3.0
This gives us the highest values for each row but keeps the original columns, resulting in ugly NaN values wherever a column is not among the top n values for that row. What we actually want is the index of the nlargest() result.
>>> df.apply(lambda s, n: s.nlargest(n).index, axis=1, n=2)
0 Index(['B', 'C'], dtype='object')
1 Index(['A', 'D'], dtype='object')
dtype: object
Almost there. Only thing left is to convert the Index objects into Series.
Solution
df.apply(lambda s, n: pd.Series(s.nlargest(n).index), axis=1, n=2)
0 1
0 B C
1 A D
Note that I'm not using the Index.to_series() function since I do not want to preserve the original index.
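For larger frames, a NumPy-based variant might be worth trying, since apply with axis=1 runs a Python function per row. This is a different technique from the answers above, not a rewrite of them: it sorts the positions of each row in descending order with argsort and looks up the column labels. Note that ties are not guaranteed to resolve the same way as with nlargest.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 4], "B": [9, 1], "C": [8, 2], "D": [2, 3]})
top_n = 2

# negate so that argsort orders each row from largest to smallest,
# then keep the first top_n column positions per row
order = np.argsort(-df.to_numpy(), axis=1)[:, :top_n]

# translate column positions into column labels
result = pd.DataFrame(df.columns.to_numpy()[order], index=df.index)

print(result)
#    0  1
# 0  B  C
# 1  A  D
```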
I have 2 data frames with one column each. Index of the first is [C,B,F,A,Z] not sorted in any way. Index of the second is [C,B,Z], also unsorted.
I use pd.concat([df1,df2],axis=1) and get a data frame with 2 columns and NaN in the second column where there is no appropriate value for the index.
The problem I have is that the index automatically becomes sorted in alphabetical order.
I have tried pd.concat([df1, df2], axis=1, names=my_list) where my_list = [C, B, F, A, Z], but that didn't change anything.
How can I keep the index from being sorted?
This seems to be by design. The only thing I'd suggest is to call reindex on the concatenated df and pass it the index of the first df:
In [56]:
df = pd.DataFrame(index=['C','B','F','A','Z'], data={'a':np.arange(5)})
df
Out[56]:
a
C 0
B 1
F 2
A 3
Z 4
In [58]:
df1 = pd.DataFrame(index=['C','B','Z'], data={'b':np.random.randn(3)})
df1
Out[58]:
b
C -0.146799
B -0.227027
Z -0.429725
In [67]:
pd.concat([df,df1],axis=1).reindex(df.index)
Out[67]:
a b
C 0 -0.146799
B 1 -0.227027
F 2 NaN
A 3 NaN
Z 4 -0.429725
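If the first frame's index is always the one to keep, a left join may be an alternative to concat plus reindex, since DataFrame.join defaults to how='left' and keeps the calling frame's index order. A sketch under that assumption, with made-up values for column b:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": np.arange(5)}, index=["C", "B", "F", "A", "Z"])
df2 = pd.DataFrame({"b": [0.1, 0.2, 0.3]}, index=["C", "B", "Z"])  # illustrative values

# left join: rows follow df1's index order, unmatched labels get NaN in b
joined = df1.join(df2)

print(joined)
```

Unlike concat with axis=1, this drops any labels that appear only in the second frame, so it only fits when the second index is a subset of the first.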