The gist of this post is that I have "23" in my original data, and I want "23" in my resulting dict (not "23.0"). Here's how I've tried to handle it with Pandas.
My Excel worksheet has a coded Region column:
23
11
27
(blank)
25
Initially, I created a dataframe, and Pandas set the dtype of Region to float64:
import pandas as pd
filepath = 'data_file.xlsx'
df = pd.read_excel(filepath, sheet_name=0, header=0)
df
23.0
11.0
27.0
NaN
25.0
Pandas will convert the dtype to object if I use fillna() to replace NaNs with blanks, which seems to eliminate the decimals.
df.fillna('', inplace=True)
df
23
11
27
(blank)
25
Except I still get decimals when I convert the dataframe to a dict:
data = df.to_dict('records')
data
[{'region': 23.0},
 {'region': 11.0},
 {'region': 27.0},
 {'region': ''},
 {'region': 25.0}]
Is there a way I can create the dict without the decimal places? By the way, I'm writing a generic utility, so I won't always know the column names and/or value types, which means I'm looking for a generic solution (vs. explicitly handling Region).
Any help is much appreciated, thanks!
The problem is that after fillna('') the underlying values are still floats, even though the column dtype is now object:
import numpy as np
import pandas as pd

s = pd.Series([23., 11., 27., np.nan, 25.])
s.fillna('').iloc[0]
23.0
Instead, apply a formatter, then replace the 'nan' strings:
s.apply('{:0.0f}'.format).replace('nan', '').to_dict()
{0: '23', 1: '11', 2: '27', 3: '', 4: '25'}
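The same formatter-then-replace idea extends to a whole DataFrame when the column names aren't known in advance, which matches the question's generic-utility requirement. A minimal sketch, assuming the data has already been loaded into df:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data loaded from Excel
df = pd.DataFrame({'region': [23.0, 11.0, 27.0, np.nan, 25.0]})

# Format every float column, then blank out the 'nan' strings
for col in df.select_dtypes(include='float').columns:
    df[col] = df[col].apply('{:0.0f}'.format).replace('nan', '')

data = df.to_dict('records')
# data[0] == {'region': '23'}, data[3] == {'region': ''}
```

Non-float columns are left untouched, so mixed-type frames pass through safely.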
Using a custom function takes care of integers and keeps strings as strings:
import pprint
import pandas as pd

def func(x):
    try:
        return int(x)
    except ValueError:
        return x

df = pd.DataFrame({'region': [1, 2, 3, float('nan')],
                   'col2': ['a', 'b', 'c', float('nan')]})
df.fillna('', inplace=True)
pprint.pprint(df.applymap(func).to_dict('records'))
Output:
[{'col2': 'a', 'region': 1},
{'col2': 'b', 'region': 2},
{'col2': 'c', 'region': 3},
{'col2': '', 'region': ''}]
A variation that also keeps floats as floats:
import pprint
import pandas as pd

def func(x):
    try:
        if int(x) == x:
            return int(x)
        else:
            return x
    except ValueError:
        return x

df = pd.DataFrame({'region1': [1, 2, 3, float('nan')],
                   'region2': [1.5, 2.7, 3, float('nan')],
                   'region3': ['a', 'b', 'c', float('nan')]})
df.fillna('', inplace=True)
pprint.pprint(df.applymap(func).to_dict('records'))
Output:
[{'region1': 1, 'region2': 1.5, 'region3': 'a'},
{'region1': 2, 'region2': 2.7, 'region3': 'b'},
{'region1': 3, 'region2': 3, 'region3': 'c'},
{'region1': '', 'region2': '', 'region3': ''}]
You could add dtype=str:
import pandas as pd
filepath = 'data_file.xlsx'
df = pd.read_excel(filepath, sheet_name=0, header=0, dtype=str)
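With dtype=str every cell is read in as a string (missing cells still come back as NaN until you fillna them), so no decimals ever appear. A sketch of the effect, using an in-memory CSV as a stand-in for the Excel file:

```python
import io
import pandas as pd

# Stand-in for the Excel file; read_excel(..., dtype=str) behaves the same way
csv_data = io.StringIO("region,other\n23,a\n11,b\n27,c\n,d\n25,e\n")
df = pd.read_csv(csv_data, dtype=str)
df.fillna('', inplace=True)

data = df.to_dict('records')
# data[0]['region'] == '23' (a string), data[3]['region'] == ''
```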
Related
I am currently importing a file as so:
df= pd.read_csv(r"Test.csv")
And the output looks like
Type Value
0 Food_Place_1 1
1 Food_Place_2 2
2 Car_Type_1 3
3 Car_Type_2 4
I would like to iterate through this df and depending on the Type column allocated to a dictionary like this
food_type_dict = {'Type': ['Food_Place_1', 'Food_Place_2'], 'Value': [1, 2]}
car_type_dict = {'Type': ['Car_Type_1', 'Car_Type_2'], 'Value': [3, 4]}
My plan was to convert the entire dataframe into a dictionary and filter from there. However, when I try to convert using the code below, the output is not what I was expecting: I can't seem to remove the Value header from the dictionary.
df_dict = df.set_index(['Type']).T.to_dict('dict')
Output
{'Food_Place_1': {'Value': 1}, ...}
Create a category column for aggregating lists into a nested dictionary:
# If the category is obtained by removing the digits
cat = df['Type'].str.replace(r'\d', '', regex=True)
# If the category is the first letter
#cat = df['Type'].str[0]
d = df.rename(columns={'Type':'Component'}).groupby(cat).agg(list).to_dict('index')
print (d)
{'A': {'Component': ['A1', 'A2'], 'Value': [1, 2]},
'B': {'Component': ['B1', 'B2'], 'Value': [3, 4]}}
Then, instead of a_type_dict, use d['A'], and instead of b_type_dict, use d['B'].
EDIT:
cat = df['Type'].str.split('_').str[0]
d = df.rename(columns={'Type':'Component'}).groupby(cat).agg(list).to_dict('index')
print (d)
{'Car': {'Component': ['Car_Type_1', 'Car_Type_2'], 'Value': [3, 4]},
'Food': {'Component': ['Food_Place_1', 'Food_Place_2'], 'Value': [1, 2]}}
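An end-to-end sketch with the question's data, keeping the original Type key rather than renaming it to Component (the .rename('cat') on the key series is only there to avoid any name clash with the Type column):

```python
import pandas as pd

df = pd.DataFrame({'Type': ['Food_Place_1', 'Food_Place_2', 'Car_Type_1', 'Car_Type_2'],
                   'Value': [1, 2, 3, 4]})

# Category is everything before the first underscore
cat = df['Type'].str.split('_').str[0].rename('cat')
d = df.groupby(cat).agg(list).to_dict('index')

food_type_dict = d['Food']  # {'Type': ['Food_Place_1', 'Food_Place_2'], 'Value': [1, 2]}
car_type_dict = d['Car']    # {'Type': ['Car_Type_1', 'Car_Type_2'], 'Value': [3, 4]}
```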
I have a pandas dataframe as below. I'm just wondering if there's any way to have my column values as the keys of the JSON.
df:
| symbol | price |
|:-------|------:|
| a      |   120 |
| b      |   100 |
| c      |   200 |
I expect the json to look like {'a': 120, 'b': 100, 'c': 200}
I've tried the below and got the result as {symbol: 'a', price: 120}{symbol: 'b', price: 100}{symbol: 'c', price: 200}
df.to_json('price.json', orient='records', lines=True)
Let's start by creating the dataframe that the OP mentions:
import pandas as pd
df = pd.DataFrame({'symbol': ['a', 'b', 'c'], 'price': [120, 100, 200]})
Considering that OP doesn't want the JSON values as a list (as OP commented here), the following will do the work
df.groupby('symbol').price.apply(lambda x: x.iloc[0]).to_dict()
[Out]: {'a': 120, 'b': 100, 'c': 200}
If one wants the JSON values as a list, the following will do the work
df.groupby('symbol').price.apply(list).to_json()
[Out]: {"a":[120],"b":[100],"c":[200]}
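If the symbol values are known to be unique, a simpler route (not in the answer above, just a common alternative) is to make symbol the index and take the price column directly:

```python
import pandas as pd

df = pd.DataFrame({'symbol': ['a', 'b', 'c'], 'price': [120, 100, 200]})

# Assumes unique symbols; with duplicates the groupby approaches above are safer
result = df.set_index('symbol')['price'].to_dict()
# result == {'a': 120, 'b': 100, 'c': 200}

# To write it to a file instead:
# df.set_index('symbol')['price'].to_json('price.json')
```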
Try it like this:
import pandas as pd
d = {'symbol': ['a', 'b', 'c'], 'price': [120, 100, 200]}
df = pd.DataFrame(data=d)
print(df)
print (df.set_index('symbol').rename(columns={'price':'json_data'}).to_json())
# EXPORT TO FILE
df.set_index('symbol').rename(columns={'price':'json_data'}).to_json('price.json')
Output:
symbol price
0 a 120
1 b 100
2 c 200
{"json_data":{"a":120,"b":100,"c":200}}
I have a dataframe:
values
NaN
NaN
[1,2,5]
[2]
[5]
And a dictionary
{nan: nan,
 '1': '10',
 '2': '11',
 '5': '12'}
The dataframe contains keys from the dictionary.
How can I replace these keys with the corresponding values from the same dictionary?
Output:
values
NaN
NaN
[10,11,12]
[11]
[12]
I have tried
so_df['values'].replace(my_dictionary, inplace=True)
so_df.head()
You can use the apply() method of the pandas DataFrame. Check the implementation below:
import pandas as pd
import numpy as np
df = pd.DataFrame([np.nan,
                   np.nan,
                   ['1', '2', '5'],
                   ['2'],
                   ['5']], columns=['values'])

my_dict = {np.nan: np.nan,
           '1': '10',
           '2': '11',
           '5': '12'}

def update(row):
    if isinstance(row['values'], list):
        row['values'] = [my_dict.get(val) for val in row['values']]
    else:
        row['values'] = my_dict.get(row['values'])
    return row

df = df.apply(update, axis=1)
It's a simple implementation. Just make sure that if your dataframe contains strings, your dictionary keys are strings as well.
Try:
df['values'] = pd.to_numeric(df['values'].explode().astype(str).map(my_dict), errors='coerce').groupby(level=0).agg(list)
Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'values': [np.nan, np.nan, [1, 2, 5], [2], [5]]})
my_dict = {np.nan: np.nan, '1': '10', '2': '11', '5': '12'}
Use Series.explode with Series.map:
df['values'] = (df['values'].explode()
                            .astype(str)
                            .map(my_dict)
                            .dropna()
                            .astype(int)
                            .groupby(level=0)
                            .agg(list))
If there are other strings in your values column, you need pd.to_numeric with errors='coerce'; to keep those original values, you should do:
df['values'] = (pd.to_numeric(df['values'].explode()
                                          .astype(str)
                                          .replace(my_dict),
                              errors='coerce')
                  .dropna()
                  .groupby(level=0)
                  .agg(list)
                  .fillna(df['values']))
Output
values
0 NaN
1 NaN
2 [10, 11, 12]
3 [11]
4 [12]
UPDATE
A solution without explode:
df['values'] = (pd.to_numeric(df['values'].apply(pd.Series)
                                          .stack()
                                          .reset_index(level=1, drop=True)
                                          .astype(str)
                                          .replace(my_dict),
                              errors='coerce')
                  .dropna()
                  .groupby(level=0)
                  .agg(list)
                  .fillna(df['values']))
I have a simple DataFrame:
Name Format
0 cntry int
1 dweight str
2 pspwght str
3 pweight str
4 nwspol str
I want a dictionairy as such:
{
"cntry":"int",
"dweight":"str",
"pspwght":"str",
"pweight":"str",
"nwspol":"str"
}
Where dict["cntry"] would return int or dict["dweight"] would return str.
How could I do this?
How about this:
import pandas as pd
df = pd.DataFrame({'col_1': ['A', 'B', 'C', 'D'], 'col_2': [1, 1, 2, 3], 'col_3': ['Bla', 'Foo', 'Sup', 'Asdf']})
res_dict = dict(zip(df['col_1'], df['col_3']))
Contents of res_dict:
{'A': 'Bla', 'B': 'Foo', 'C': 'Sup', 'D': 'Asdf'}
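Applied to the Name/Format frame from the question, the same zip pattern would look like this:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['cntry', 'dweight', 'pspwght', 'pweight', 'nwspol'],
                   'Format': ['int', 'str', 'str', 'str', 'str']})

# Zip the two columns together and build the dict directly
fmt = dict(zip(df['Name'], df['Format']))
# fmt['cntry'] == 'int', fmt['dweight'] == 'str'
```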
You're looking for DataFrame.to_dict()
From the documentation:
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
col1 col2
row1 1 0.50
row2 2 0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can always invert an internal dictionary if it's not mapped how you'd like it to be:
inv_dict = {v: k for k, v in original_dict['Name'].items()}
I think what you want is:
df.set_index('Name').to_dict()['Format']
Since you want to use the values in the Name column as the keys to your dict.
Note that you might want to do:
df.set_index('Name').astype(str).to_dict()['Format']
if you want the values of the dictionary to be strings.
I'm trying to change the values of only certain values in a dataframe:
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a':2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr[x]), test.col2)
However, this doesn't seem to work because even though I'm looking only at the values in col1 that are 'a', the error says
KeyError: 'b'
Implying that it also looks at the values of col1 with values 'b'. Why is this? And how do I fix it?
The error is originating from the test.col1.map(lambda x: dict_curr[x]) part. You look up the values from col1 in dict_curr, which only has an entry for 'a', not for 'b'.
You can also just index the dataframe:
test.loc[test.col1 == 'a', 'col2'] = 2
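As another generic alternative (not from the answers above, just a common pandas idiom), Series.map plus fillna avoids the KeyError entirely: map() yields NaN for keys missing from the dict, and fillna restores the original values. Note the column comes back as float because of the intermediate NaNs:

```python
import pandas as pd

test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}

# Keys missing from dict_curr map to NaN; fillna puts the original col2 values back
test['col2'] = test['col1'].map(dict_curr).fillna(test['col2'])
# test['col2'] is now [2.0, 2.0, 3.0, 4.0]
```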
The problem is that when you call np.where all of its parameters are evaluated first, and then the result is decided depending on the condition. So the dictionary is queried also for 'b' and 'c', even if those values will be discarded later. Probably the easiest fix is:
import pandas as pd
import numpy as np
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = np.where(test.col1 == 'a', test.col1.map(lambda x: dict_curr.get(x, 0)), test.col2)
This will give the value 0 for keys not in the dictionary, but since it will be discarded later it does not matter which value you use.
Another easy way of getting the same result is:
import pandas as pd
test = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
dict_curr = {'a': 2}
test['col2'] = test.apply(lambda x: dict_curr.get(x.col1, x.col2), axis=1)