import pandas as pd
dict = {
'1': 'Alb',
'2': 'Bnk',
'3': 'Cd'
}
df = pd.DataFrame(
{
'col1': {
0: 20,
1: 2,
2: 10,
3: 2,
4: 44
},
'col2': {
0:'a',
1:'b',
2:'c',
3:'b',
4:20
}
}
)
I want to replace col1 value 2 with 'Bnk' if col2 value == 'b'
How can this be done?
Thanks
There are several ways to do this, but for clarity you can use apply:
import pandas as pd
dict = {
1: 'Alb',
2: 'Bnk',
3: 'Cd'
}
df = pd.DataFrame(
{
'col1': {
0: 20,
1: 2,
2: 10,
3: 2,
4: 44
},
'col2': {
0:'a',
1:'b',
2:'c',
3:'b',
4:20
}
}
)
def change(data, col2val, dict):
    if data['col2'] == col2val:
        data['col1'] = dict[data['col1']]
    return data
new_df = df.apply(change, axis = 1, col2val = 'b', dict = dict)
print(new_df)
I also modified the dict to have integer keys for simplicity.
Output:
col1 col2
0 20 a
1 Bnk b
2 10 c
3 Bnk b
4 44 20
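As one of the other ways mentioned above, here is a minimal vectorized sketch using a boolean mask and Series.map instead of a row-wise apply. The names `mapping` and `mask` are my own (`mapping` avoids shadowing the built-in dict), and I am assuming that rows whose col1 value has no entry in the mapping should be left untouched.

```python
import pandas as pd

# Hypothetical name: "mapping" replaces "dict" so the built-in is not shadowed.
mapping = {1: 'Alb', 2: 'Bnk', 3: 'Cd'}

df = pd.DataFrame({
    'col1': [20, 2, 10, 2, 44],
    'col2': ['a', 'b', 'c', 'b', 20],
})

# Rows where col2 is 'b' AND col1 has an entry in the mapping.
mask = df['col2'].eq('b') & df['col1'].isin(list(mapping))

# col1 will hold both ints and strings, so store it as object first.
df['col1'] = df['col1'].astype(object)
df.loc[mask, 'col1'] = df.loc[mask, 'col1'].map(mapping)

print(df)
```

Unlike the apply version, this does not raise a KeyError when a matching row has a col1 value that is missing from the mapping; whether that is desirable depends on your data.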
I have a DF that looks like this.
df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3}, 'Value': {0: 'a', 1: 'b', 2: 'c'}})
   ID Value
0   1     a
1   2     b
2   3     c
I'd like to create a dictionary out of it.
So if I run df.to_dict('records'), it gives me
[{'ID': 1, 'Value': 'a'},
{'ID': 2, 'Value': 'b'},
{'ID': 3, 'Value': 'c'}]
However, what I want is the following.
{
1: 'a',
2: 'b',
3: 'c'
}
All of the rows in the DF are unique, so it shouldn't run into duplicate-key issues.
Try with
d = dict(zip(df.ID, df.Value))
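For reference, a small self-contained sketch of the zip approach, next to an equivalent that goes through set_index. The sample frame mirrors the one in the question; the names `d_zip` and `d_series` are mine. Note that dict here must be the Python built-in, so it should not have been reassigned earlier in the session.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'Value': ['a', 'b', 'c']})

# Pair each ID with its Value and build the dictionary directly.
d_zip = dict(zip(df['ID'], df['Value']))

# Equivalent: make ID the index, then convert the Value column to a dict.
d_series = df.set_index('ID')['Value'].to_dict()

print(d_zip)
print(d_series)
```

If ID ever contained duplicates, both variants would silently keep only the last value for each key.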
Hello, I have a dataframe such as:
COL1 COL2 COL3
G1 1 [[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]
G1 2 [[(JU3_+__BO,UJ3_-__GET)]]
How can I use re.sub(r'.*__', '') within the COL3 sublist
and get a new column without everything before '__'?
COL1 COL2 COL3 COL4
G1 1 [[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]] [[(HELLO,BY),(HOLLA,BY)]]
G1 2 [[(JU3_+__BO,UJ3_-__GET)]] [[(BO,GET)]]
here is the data :
data= {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]", 1: "[[(JU3_+__BO,UJ3_-__GET)]]"}}
df = pd.DataFrame.from_dict(data)
Updated data solution
data= {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]", 1: "[[(JU3_+__BO,UJ3_-__GET)]]"}}
df = pd.DataFrame.from_dict(data)
df['COL4'] = df['COL3 '].str.replace(r"([,(])[^(),]*__", r"\1", regex=True)
df['COL4']
# => 0 [[(HELLO,BY),(HOLLA,BY)]]
# 1 [[(BO,GET)]]
# Name: COL4, dtype: object
See the regex demo.
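To see what the pattern does outside pandas, here is a minimal sketch with the plain re module on one of the sample strings. The group `([,(])` captures the comma or opening parenthesis that starts an item, `[^(),]*__` consumes everything up to the `__` inside that item, and the replacement `\1` puts the captured delimiter back. The variable names are mine.

```python
import re

# Same pattern as in the str.replace call above.
pattern = re.compile(r"([,(])[^(),]*__")

s = "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]"

# Keep the delimiter (group 1) and drop the prefix that ends in "__".
cleaned = pattern.sub(r"\1", s)

print(cleaned)
# [[(HELLO,BY),(HOLLA,BY)]]
```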
Old data solution
You can use ast.literal_eval to turn the strings in the COL3 column into lists of lists and iterate over them while modifying the tuple items:
import ast
import re
import pandas as pd
data= {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[('OK2_+__HELLO','OJ_+__BY'),('LO_-__HOLLA','KUOJ_+__BY')]]", 1: "[[('JU3_+__BO','UJ3_-__GET')]]"}}
df = pd.DataFrame.from_dict(data)
def repl(m):
    result = []
    # Parse the string into a real list of lists of tuples
    for l in ast.literal_eval(m):
        ll = []
        for x, y in l:
            # Drop everything up to and including the last '__' in each item
            ll.append(tuple([re.sub(r'.*__', '', x), re.sub(r'.*__', '', y)]))
        result.append(ll)
    return str(result)
df['COL4'] = df['COL3 '].apply(repl)
df['COL4']
# => 0 [[('HELLO', 'BY'), ('HOLLA', 'BY')]]
# 1 [[('BO', 'GET')]]
You do not need to use str(result) if you are OK to keep the result as a list of lists.
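A minimal sketch of that list-of-lists variant, assuming the cells are quoted tuple strings as in the old data. The function name `strip_prefixes` is hypothetical, and unlike repl it accepts tuples of any length.

```python
import ast
import re

def strip_prefixes(cell):
    """Parse a cell string and return a list of lists of tuples
    with everything up to the last '__' removed from each item."""
    return [
        [tuple(re.sub(r'.*__', '', item) for item in tup) for tup in inner]
        for inner in ast.literal_eval(cell)
    ]

cell = "[[('OK2_+__HELLO','OJ_+__BY'),('LO_-__HOLLA','KUOJ_+__BY')]]"
parsed = strip_prefixes(cell)
print(parsed)
# [[('HELLO', 'BY'), ('HOLLA', 'BY')]]
```

With a dataframe this could be applied as df['COL3 '].apply(strip_prefixes), which leaves real Python lists in the new column rather than strings.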
My dictionary looks like this :
dict1 = { '2020-10-11' : {
'group1':{
1 : 2356,
21 : 10001,
34 : 234
},
'group2':{
11 : 999,
2 : 101,
13 : 1234
}
},
'2020-10-12' : {
'group1':{
11 : 236,
21 : 100,
34 : 34
},
'group2':{
1 : 99,
3 : 121,
2 : 12
}
}
}
I wanted my output to look something like this :
The requirement is : for every date, the color should be different.
I have tried this using this method:
reform = {(level1_key, level2_key, level3_key): values
for level1_key, level2_dict in dict1.items()
for level2_key, level3_dict in level2_dict.items()
for level3_key, values in level3_dict.items()}
out = pd.DataFrame(reform,index = ['amount']).T
names=['date', 'group', 'id']
out.index.set_names(names, inplace=True)
out in xls :
After this how am I supposed to proceed for the color formatting in excel using python?
The first step is to entirely flatten the structure so that a 2-dimensional representation of the nested values emerges:
dict1 = {'2020-10-11': {'group1': {1: 2356, 21: 10001, 34: 234}, 'group2': {11: 999, 2: 101, 13: 1234}}, '2020-10-12': {'group1': {11: 236, 21: 100, 34: 34}, 'group2': {1: 99, 3: 121, 2: 12}}}
def flatten(d, c = []):
    flag = True
    for a, b in d.items():
        if isinstance(b, dict):
            yield from flatten(b, c=c+[a] if flag or not c else [*c[:-2],'',a])
        else:
            yield c+[a, b] if flag or not c else [*(['']*(len(c))),a, b]
        flag = False
data = list(flatten(dict1))
#[['2020-10-11', 'group1', 1, 2356], ['', '', 21, 10001], ['', '', 34, 234], ['', 'group2', 11, 999], ['', '', 2, 101], ['', '', 13, 1234], ['2020-10-12', 'group1', 11, 236], ['', '', 21, 100], ['', '', 34, 34], ['', 'group2', 1, 99], ['', '', 3, 121], ['', '', 2, 12]]
Next, create a pd.DataFrame from the results and apply the coloring:
import pandas as pd
df = pd.DataFrame(data, columns=['Date', 'Group', 'ID', 'Amount'])
# xlsxwriter produces .xlsx files, so the output name uses that extension
writer = pd.ExcelWriter('test_rsults12.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
c_pool = iter([workbook.add_format({'bg_color': '#fff0c1'}), workbook.add_format({'bg_color': '#d5e6f5'})])
fmt = None
for i in range(len(data)):
    # A non-empty first column marks the first row of a new date
    if data[i][0]:
        fmt = next(c_pool)
    worksheet.set_row(i+1, cell_format=fmt)
writer.close()  # writer.save() was removed in pandas 2.0
Result:
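One caveat: c_pool holds exactly two formats, so next(c_pool) raises StopIteration as soon as a third date appears. A minimal sketch of the color assignment with itertools.cycle, which reuses the palette instead, is below. It is plain Python so the logic can be checked without writing a file; the function name `row_colors` and the third sample date are my own additions for illustration.

```python
from itertools import cycle

def row_colors(rows, palette=('#fff0c1', '#d5e6f5')):
    """Return one background color per row, switching to the next
    palette entry whenever a row starts a new date (non-empty first cell)."""
    colors = cycle(palette)
    current = None
    assigned = []
    for row in rows:
        if row[0]:
            current = next(colors)
        assigned.append(current)
    return assigned

# Flattened rows in the same shape as `data`, with a hypothetical third date.
rows = [
    ['2020-10-11', 'group1', 1, 2356],
    ['', '', 21, 10001],
    ['2020-10-12', 'group1', 11, 236],
    ['', '', 21, 100],
    ['2020-10-13', 'group1', 5, 7],
]
print(row_colors(rows))
```

In the Excel code, the same idea would amount to wrapping the list of workbook formats in cycle(...) instead of iter(...).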
I have a pandas dataframe that has information about a user with multiple orders, and within each order there are multiple purchased items. An example of the dataframe format:
user_id | order_num | item_id | item_desc
1 1 1 red
1 1 2 blue
1 1 3 green
I want to convert it to a JSONB object in a column so that I can query it in PostgreSQL.
Currently I am using the following code:
j = (reg_test.groupby(['user_id', 'order_num'], as_index=False)
.apply(lambda x: x[['item_id','item_desc']].to_dict('records'))
.reset_index()
.rename(columns={0:'New-Data'})
.to_json(orient='records'))
This is the result I am getting:
'''
[
{
"New-Data": [
{
"item_id": "1",
"item_desc": "red",
},
{
"item_id": "2",
"item_desc": "blue",
},
{
"item_id": "3",
"item_desc": "green",
}
],
"order_number": "1",
"user_id": "1"
}
]
'''
While that is correct json format, I want the result to look like this:
'''
[
{
"New-Data": [{
"1":
{
"item_id": "1",
"item_desc": "red",
},
"2": {
"item_id": "2",
"item_desc": "blue",
},
"3":
{
"item_id": "3",
"item_desc": "green",
}
}
],
"order_number": "1",
"user_id": "1"
}
]
'''
As an alternative to @rpanai's solution, I moved the processing into vanilla Python:
convert dataframe to dict :
M = df.to_dict("records")
create the dict for the items
items = [
{key: value
for key, value in entry.items()
if key not in ("user_id", "order_num")}
for entry in M
]
item_details = [{str(num + 1): entry}
for num, entry
in enumerate(items)]
print(item_details)
[{'1': {'item_id': 1, 'item_desc': 'red'}},
{'2': {'item_id': 2, 'item_desc': 'blue'}},
{'3': {'item_id': 3, 'item_desc': 'green'}}]
Initialize dict and add the remaining data
d = dict()
d['New-Data'] = item_details
d['order_number'] = M[0]['order_num']
d['user_id'] = M[0]['user_id']
wrapper = [d]
print(wrapper)
[{'New-Data': [{'1': {'item_id': 1, 'item_desc': 'red'}},
{'2': {'item_id': 2, 'item_desc': 'blue'}},
{'3': {'item_id': 3, 'item_desc': 'green'}}],
'order_number': 1,
'user_id': 1}]
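Note that the steps above read order_number and user_id from M[0], so they assume the frame holds a single order. A minimal sketch that extends the same vanilla-Python idea to several orders, and nests all items of an order in one dict keyed by item_id as in the desired output, could look like this. The sample records (including the second order) and the names `grouped`, `result` and `payload` are my own.

```python
import json
from collections import defaultdict

# Records in the shape produced by df.to_dict("records");
# the second order is hypothetical sample data.
records = [
    {'user_id': 1, 'order_num': 1, 'item_id': 1, 'item_desc': 'red'},
    {'user_id': 1, 'order_num': 1, 'item_id': 2, 'item_desc': 'blue'},
    {'user_id': 1, 'order_num': 1, 'item_id': 3, 'item_desc': 'green'},
    {'user_id': 1, 'order_num': 2, 'item_id': 4, 'item_desc': 'black'},
]

# Collect the items of each (user_id, order_num) pair, keyed by item_id.
grouped = defaultdict(dict)
for rec in records:
    key = (rec['user_id'], rec['order_num'])
    grouped[key][str(rec['item_id'])] = {
        'item_id': rec['item_id'],
        'item_desc': rec['item_desc'],
    }

result = [
    {'New-Data': [items], 'order_number': order_num, 'user_id': user_id}
    for (user_id, order_num), items in grouped.items()
]

# Serialize to a JSON string, e.g. for a jsonb column.
payload = json.dumps(result)
print(payload)
```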
Have you considered using a custom function?
import pandas as pd
df = pd.DataFrame({'user_id': {0: 1, 1: 1, 2: 1},
'order_num': {0: 1, 1: 1, 2: 1},
'item_id': {0: 1, 1: 2, 2: 3},
'item_desc': {0: 'red', 1: 'blue', 2: 'green'}})
out = df.groupby(['user_id', 'order_num'])[["item_id", "item_desc"]]\
.apply(lambda x: x.to_dict("records"))\
.apply(lambda x: [{str(l["item_id"]):l for l in x}])\
.reset_index(name="New-Data")\
.to_dict("records")
where out returns
[{'user_id': 1,
'order_num': 1,
'New-Data': [{'1': {'item_id': 1, 'item_desc': 'red'},
'2': {'item_id': 2, 'item_desc': 'blue'},
'3': {'item_id': 3, 'item_desc': 'green'}}]}]
I have a df such as follows:
data = [['a', 10, 1], ['b', 15,12], ['c', 14,12]]
df = pd.DataFrame(data, columns = ['Name', 'x', 'y'])
Name x y
0 a 10 1
1 b 15 12
2 c 14 12
Now I want to convert it to a dict where x and y are nested inside a key called total,
so the final dict would look like this:
{
'Name': 'a',
"total": {
"x": 308,
"y": 229
},
}
I know I can use df.to_dict('records') to get this dict:
{
'Name': 'a',
"x": 308,
"y": 229
}
Any tips?
You could try
my_dict = [{'Name': row['Name'], 'total': {'x': row['x'], 'y': row['y']}} for row in df.to_dict('records')]
Result:
[{'Name': 'a', 'total': {'x': 10, 'y': 1}}, {'Name': 'b', 'total': {'x': 15, 'y': 12}}, {'Name': 'c', 'total': {'x': 14, 'y': 12}}]
Or, if you wish to move all columns except 'Name' under 'total', and provided that there are no repetitions in 'Name':
df.set_index('Name', inplace=True)
result = [{'Name': name, 'total': total} for name, total in df.to_dict('index').items()]
With the same result as before.
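If 'Name' may contain repetitions, or you would rather not modify df in place, a small helper built on to_dict('records') is another option. This is only a sketch; the function name `nest_columns` and its parameters are hypothetical.

```python
import pandas as pd

def nest_columns(frame, key_col, nested_key):
    """Return one dict per row, keeping key_col at the top level and
    nesting every other column under nested_key."""
    value_cols = [c for c in frame.columns if c != key_col]
    return [
        {key_col: rec[key_col], nested_key: {c: rec[c] for c in value_cols}}
        for rec in frame.to_dict('records')
    ]

data = [['a', 10, 1], ['b', 15, 12], ['c', 14, 12]]
df = pd.DataFrame(data, columns=['Name', 'x', 'y'])

result = nest_columns(df, 'Name', 'total')
print(result)
```

Because it works row by row, duplicate names simply produce several entries instead of overwriting each other.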