Pulling a value from DataFrame based on another value

I'm playing around with the Facebook Ads API and have pulled campaign data for one of my campaigns. Suppose I have this DataFrame:
[<Insights> {
"actions": [
{
"action_type": "custom_event_abc",
"value": 50
},
{
"action_type": "custom_event_def",
"value": 42
},]
How would I go about getting the value for custom_event_def out?
In my wider results, I first used df.loc[0]['actions'][1]['value'] in my code, which worked. My issue with that is that custom_event_abc doesn't always appear, so the position of custom_event_def can change, meaning my solution only works some of the time.
Can the value (42) be pulled out using a reference to the action_type?

This first reads the list stored under "actions" into the variable actions, then iterates over its elements to find custom_event_def and prints the corresponding value:
actions = df.loc[0]['actions']
for elem in actions:
    if elem['action_type'] == "custom_event_def":
        print(elem['value'])
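If the lookup is needed in more than one place, the same loop can be wrapped in a small helper. This is only a minimal sketch: the function name get_action_value and its default parameter are hypothetical, and the sample list stands in for whatever df.loc[0]['actions'] returns.

```python
def get_action_value(actions, action_type, default=None):
    """Return the 'value' of the first entry whose 'action_type' matches."""
    for elem in actions:
        if elem.get("action_type") == action_type:
            return elem.get("value")
    # No entry matched, so fall back to the caller-supplied default.
    return default


# Sample data shaped like the "actions" list in the question.
actions = [
    {"action_type": "custom_event_abc", "value": 50},
    {"action_type": "custom_event_def", "value": 42},
]

print(get_action_value(actions, "custom_event_def"))      # 42
print(get_action_value(actions[1:], "custom_event_def"))  # 42, position no longer matters
print(get_action_value(actions, "custom_event_xyz"))      # None
```

Returning a default instead of raising keeps the call safe for rows where the event is missing entirely.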

Related

How to normalize complex JSON structure with 4 levels of nested arrays?

As I'm fairly new to Python, I've tried various approaches based on answers found here but can't manage to normalize my JSON file.
As I checked in Postman, it has 4 levels of nested arrays. For suppliers I want to expand all the levels of data. The problem I have is with the score_card subtree, from which I want to pull out all risk_groups with their name and risk_score, then all risks with their name and risk_score, and finally the indicators at the deepest level, which again contain name and risk_score.
I'm not sure whether this is possible in one go, given that the same names are used at different levels.
Below I'm sharing a reproducible example:
data = {"suppliers": [
{
"id": 1260391,
"name": "2712270 MANITOBA LTD",
"duns": "rm-071-7291",
"erp_number": "15189067;15189067",
"material_group_ids": [
176069
],
"business_unit_ids": [
13728
],
"responsible_user_ids": [
37761,
37760,
37759,
37758,
37757,
37756,
36520,
37587,
36494,
22060,
37742,
36446,
36289
],
"address": {
"address1": "BOX 550 NE 16-20-26",
"address2": None,
"zip_code": "R0J 1W0",
"city": "RUSSELL",
"state": None,
"country_code": "ca",
"latitude": 50.7731176,
"longitude": -101.2862461
},
"score_card": {
"risk_score": 26.13,
"risk_groups": [
{
"name": "Viability",
"risk_score": 43.33,
"risks": [
{
"name": "Financial stability supplier",
"risk_score": None,
"indicators": [
{
"name": "Acquisitions",
"risk_score": None
}
]
}
]
},
]
}
}
]
}
And here is how it should look:
expected = [[1260391,'2712270 MANITOBA LTD','rm-071-7291',176069,13728,[37761,37760,
37759,37758,37757,37756,36520,37587,36494,22060,37742,36446,36289],
'BOX 550 NE 16-20-26','None','R0J 1W0','RUSSELL','None','ca',50.7731176,
-101.2862461,26.13,'Viability',43.33,'Financial stability supplier','None',
'Acquisitions','None']]
df = pd.DataFrame(expected,columns=['id','name','duns','material_groups_ids',
'business_unit_ids','responsible_user_ids',
'address.address1','address.address2','address.zip_code',
'address.city','address.state','address.country_code','address.latitude',
'address.longitude','risk_score','risk_groups.name',
'risk_groups.risk_score','risk_groups.risks.name',
'risk_groups.risks.risk_score','risk_groups.risks.indicators.name',
'risk_groups.risks.indicators.risk_score'])
What I have tried for the last 3 days is to use json_normalize:
example = pd.json_normalize(data['suppliers'],
record_path=['score_card','risk_groups','risks',
'indicators'],meta = ['id','name','duns','erp_number','material_group_ids',
'business_unit_ids','responsible_user_ids','address',['score_card','risk_score']],record_prefix='_')
but when I specify the fields for record_path, which seems to be a vital parameter here, it always returns this error:
TypeError: {'name': 'Acquisitions', 'risk_score': None} has non list value Acquisitions for path name. Must be list or null.
What I want to get is a table with columns as presented below:
So it seems that Python doesn't know how to treat the most granular level, which contains null values.
I've tried the approach from here: Use json_normalize to normalize json with nested arrays
but without success.
I would like to get the same view that I was able to 'unpack' in Power BI, but the way the data looks at the moment does not satisfy me.
All risk indicators are components of risks, which are part of risk groups, as you can see in the attached pictures.
Going to the most granular level, it goes this way: Risk groups -> Risks -> Indicators, so I would like to unpack it all.
Any help is more than desperately needed. Thank you kindly for any support here.
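One way to unpack the Risk groups -> Risks -> Indicators chain without json_normalize is to walk the levels with plain nested loops. The following is a sketch under assumptions, not a confirmed answer to the question: flatten_suppliers is a hypothetical helper, the sample is a cut-down copy of the data above, and only a few of the supplier-level fields are carried along for brevity.

```python
def flatten_suppliers(suppliers):
    """Yield one flat dict per indicator, repeating the parent fields."""
    for supplier in suppliers:
        card = supplier.get("score_card") or {}
        base = {
            "id": supplier.get("id"),
            "name": supplier.get("name"),
            "risk_score": card.get("risk_score"),
        }
        # "or []" treats a missing or null list as empty.
        for group in card.get("risk_groups") or []:
            for risk in group.get("risks") or []:
                for indicator in risk.get("indicators") or []:
                    yield {
                        **base,
                        "risk_groups.name": group.get("name"),
                        "risk_groups.risk_score": group.get("risk_score"),
                        "risk_groups.risks.name": risk.get("name"),
                        "risk_groups.risks.risk_score": risk.get("risk_score"),
                        "risk_groups.risks.indicators.name": indicator.get("name"),
                        "risk_groups.risks.indicators.risk_score": indicator.get("risk_score"),
                    }


# Cut-down sample with the same nesting as the data in the question.
sample = {"suppliers": [{
    "id": 1260391,
    "name": "2712270 MANITOBA LTD",
    "score_card": {
        "risk_score": 26.13,
        "risk_groups": [{
            "name": "Viability",
            "risk_score": 43.33,
            "risks": [{
                "name": "Financial stability supplier",
                "risk_score": None,
                "indicators": [{"name": "Acquisitions", "risk_score": None}],
            }],
        }],
    },
}]}

rows = list(flatten_suppliers(sample["suppliers"]))
print(rows[0]["risk_groups.risks.indicators.name"])  # Acquisitions
```

The resulting list of dicts can be passed to pd.DataFrame(rows). Note that in this sketch a risk group without risks, or a risk without indicators, produces no rows at all; whether that is acceptable depends on the data.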

Google Sheets Python batchUpdate repeatCell -> issue with range and number format

I am trying to use the Google Sheets API for Python to format only a specific column's results as a "NUMBER" type, but I am struggling to get it to work properly. Am I doing something wrong with the "range" block? Values are being appended to the column (via a different API), and when they are appended they do not come back as formatted numbers that produce a numeric sum when the entire column is highlighted.
id_sampleforstackoverflow = 'abcdefg123xidjadsfh192810'
cost_sav_body = {
"requests": [
{
"repeatCell": {
"range": {
"sheetId": 0,
"startRowIndex": 2,
"endRowIndex": 6,
"startColumnIndex": 0,
"endColumnIndex": 6
},
"cell": {
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#.0#;#.0#"
}
}
},
"fields": "userEnteredFormat.numberFormat"
}
}
]
}
cost_sav_sum = service.spreadsheets().batchUpdate(spreadsheetId=id_sampleforstackoverflow, body=cost_sav_body).execute()
So when I run the above with the rest of my code, the values get appended. However, when I highlight the column, it simply gives me a count of the objects rather than a sum of the values (i.e. there are three values of -24, but I only see a "Count" of 3 instead of -72).
I am using the GCP Recommendations API for machineType to append the cost projection -> costs -> units value to the column (the values are appended as, for example, -24).
Can someone help?
Documentation I have already gone through:
https://cloud.google.com/blog/products/application-development/formatting-cells-with-the-google-sheets-api
https://developers.google.com/sheets/api/guides/formats
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#GridRange

I was able to figure out the problem. When reporting the cost values (the objective explained above), I was converting the output to a string using Python's str() function. I removed the str() call, kept the rest of the code you see above, and now the values are posted correctly:
#spend = str(element.primary_impact.cost_projection.cost.units)
spend = element.primary_impact.cost_projection.cost.units
So FYI for anyone else wondering: make sure str() is not used if you need to apply a custom number format to those particular cells!

Getting index of a value inside a json file PYTHON

I have a sizable JSON file and I need to get the index of a certain value inside it. Here's what my JSON file looks like:
data.json
[{...many more elements here...
},
{
"name": "SQUARED SOS",
"unified": "1F198",
"non_qualified": null,
"docomo": null,
"au": "E4E8",
"softbank": null,
"google": "FEB4F",
"image": "1f198.png",
"sheet_x": 0,
"sheet_y": 28,
"short_name": "sos",
"short_names": [
"sos"
],
"text": null,
"texts": null,
"category": "Symbols",
"sort_order": 167,
"added_in": "0.6",
"has_img_apple": true,
"has_img_google": true,
"has_img_twitter": true,
"has_img_facebook": true
},
{...many more elements here...
}]
How can I get the index of the value "FEB4F" whose key is "google", for example?
My only idea was this, but it doesn't work:
print(data.index('FEB4F'))

Your basic data structure is a list, so there's no way to avoid looping over it.
Loop through all the items, keeping track of the current position. If the current item has the desired key/value, print the current position.
position = 0
for item in data:
    if item.get('google') == 'FEB4F':
        print('position is:', position)
        break
    position += 1
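The same loop can be written more compactly with enumerate and next. This is a sketch rather than a replacement for the answer above: find_index is a hypothetical helper name, and the two-element list is a stand-in for the real data.json contents.

```python
import json


def find_index(items, key, value):
    """Return the index of the first dict whose `key` equals `value`, else None."""
    return next(
        (i for i, item in enumerate(items) if item.get(key) == value),
        None,
    )


# Tiny stand-in for the list loaded from data.json.
data = json.loads(
    '[{"name": "OTHER", "google": null},'
    ' {"name": "SQUARED SOS", "google": "FEB4F"}]'
)

print(find_index(data, "google", "FEB4F"))  # 1
print(find_index(data, "google", "XXXXX"))  # None
```

Like the explicit loop, this stops at the first match, so it still scans the list only once.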

Assuming your data can fit in a table, I recommend using pandas for that. Here is the summary:
Read the data using pandas.read_json
Identify which column to filter on
Filter using pandas.DataFrame.loc
For example:
import pandas as pd
data = pd.read_json("path_to_json.json")
print(data)
# let's assume you want to filter using the 'unified' column
filtered = data.loc[data['unified'] == 'something']
print(filtered)
Of course, the steps will differ depending on the JSON structure.

How to compare a json with a CSV file

I have a JSON payload which is used for a service request. After processing, that payload (JSON) is stored in S3, and through Athena we can download the data in CSV format. In the actual scenario there are more than 100 fields. I want to verify their values through an automated script instead of manually.
Say my sample payload is similar to the following:
{
"BOOK": {
"serialno": "123",
"author": "xyz",
"yearofpublish": "2015",
"price": "16"
}, "Author": [
{
"isbn": "xxxxx", "title": "first", "publisher": "xyz", "year": "2020"
}, {
"isbn": "yyyy", "title": "second", "publisher": "zmy", "year": "2019"
}
]
}
The sample CSV will look like the following:
Can anyone please help me with how exactly I can do this in Python? Maybe with a library or a dictionary?

It looks like you just want to flatten out the JSON structure. It'll be easiest to loop over the "Author" list. Since the CSV has renamed the columns, you'll need some way to represent that mapping. Based only on the example, this works:
import json

# some_json_file is a placeholder for the path to the JSON payload
with open(some_json_file, 'r') as fin:
    j = json.load(fin)

result = []
for author in j['Author']:
    # One output row per author, repeating the BOOK fields on each row
    val = {'book_serialno': j['BOOK']['serialno'],
           'book_author': j['BOOK']['author'],
           'book_yearofpublish': j['BOOK']['yearofpublish'],
           'book_price': j['BOOK']['price'],
           'author_isbn': author['isbn'],
           'author_title': author['title'],
           'author_publisher': author['publisher'],
           'author_year': author['year']}
    result.append(val)
This uses a dictionary to show the mapping of data points to the new column names. You might be able to get away with using a list as well; it depends on how you want to use it later on. To write to a CSV:
import csv

# some_csv_file is a placeholder for the output path;
# newline='' avoids blank lines between rows on Windows
with open(some_csv_file, 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerow(result[0].keys())
    writer.writerows(r.values() for r in result)
This writes the column names in the first row, then the data. If you don't want the column names, just leave out the writerow(...) line.
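Since the question is about verifying the CSV against the payload, the flattened rows can also be compared directly with the downloaded file instead of being written out. The sketch below is one possible approach under assumptions: diff_rows is a hypothetical helper, the column names follow the mapping used in this answer rather than the real Athena export, and rows are assumed to appear in the same order on both sides.

```python
import csv
import io


def diff_rows(expected_rows, csv_text):
    """Return (row_index, column, expected, actual) for every mismatch."""
    actual_rows = list(csv.DictReader(io.StringIO(csv_text)))
    mismatches = []
    if len(expected_rows) != len(actual_rows):
        mismatches.append((None, "row_count", len(expected_rows), len(actual_rows)))
    for i, (exp, act) in enumerate(zip(expected_rows, actual_rows)):
        for column, exp_value in exp.items():
            act_value = act.get(column)  # None if the CSV lacks the column
            # CSV values are always strings, so compare as strings.
            if str(exp_value) != act_value:
                mismatches.append((i, column, exp_value, act_value))
    return mismatches


# Rows as produced by the flattening loop (shortened to three columns).
expected = [
    {"book_serialno": "123", "author_isbn": "xxxxx", "author_year": "2020"},
    {"book_serialno": "123", "author_isbn": "yyyy", "author_year": "2019"},
]
# Stand-in for the CSV downloaded through Athena, with one wrong value.
csv_text = "book_serialno,author_isbn,author_year\n123,xxxxx,2020\n123,yyyy,2018\n"

print(diff_rows(expected, csv_text))
# [(1, 'author_year', '2019', '2018')]
```

An empty list means every expected field was found with the same value. For a real file, the text can be read with open(...).read() or the function adapted to take a file object.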

Setting column in Google Sheets API (with Python) to be number-formatted

I'm trying to format a column of numbers in Google Sheets using the API (Sheets API v.4 and Python 3.6.1, specifically). A portion of my non-functional code is below. I know it's executing, as the background color of the column gets set, but the numbers still show as text, not numbers.
Put another way, I'm trying to get the equivalent of clicking on a column header (A, B, C, or whatever) then choosing the Format -> Number -> Number menu item in the GUI.
def sheets_batch_update(SHEET_ID, data):
    print("Sheets: Batch update")
    service.spreadsheets().batchUpdate(spreadsheetId=SHEET_ID, body=data).execute()  # ,valueInputOption='RAW'
data={
"requests": [
{
"repeatCell": {
"range": {
"sheetId": all_sheets['Users'],
"startColumnIndex": 19,
"endColumnIndex": 20
},
"cell": {
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#,##0",
},
"backgroundColor": {
"red": 0.0,
"green": 0.4,
"blue": 0.4
},
}
},
"fields": "userEnteredFormat(numberFormat,backgroundColor)"
}
},
]
}
sheets_batch_update(SHEET_ID, data)

The problem is likely that your data is currently stored as strings and therefore not affected by the number format.
"userEnteredValue": {
"stringValue": "1000"
},
"formattedValue": "1000",
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#,##0"
}
},
When you set a number format via the UI (Format > Number > ...) it's actually doing two things at once:
Setting the number format.
Converting string values to number values, if possible.
Your API call is only doing #1, so any cells that are currently set with a string value will remain a string value and will therefore be unaffected by the number format. One solution would be to go through the affected values and move the stringValue to a numberValue if the cell contains a number.
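Moving a stringValue to a numberValue can be sketched on the cell data itself, before it is written back. This is an illustration only: to_number_cell is a hypothetical helper, it works on plain dicts shaped like the fragment above, and the request that writes the converted cells back to the sheet is not shown.

```python
import math


def to_number_cell(cell):
    """Return a copy of the cell with a numeric stringValue stored as numberValue."""
    text = cell.get("userEnteredValue", {}).get("stringValue")
    if text is None:
        return cell  # nothing stored as a string
    try:
        number = float(text.replace(",", ""))
    except ValueError:
        return cell  # not a number: leave the string as it is
    if not math.isfinite(number):
        return cell  # float() also parses "nan" and "inf"; keep those as text
    converted = dict(cell)
    converted["userEnteredValue"] = {"numberValue": number}
    return converted


cell = {"userEnteredValue": {"stringValue": "1000"}}
print(to_number_cell(cell))
# {'userEnteredValue': {'numberValue': 1000.0}}
```

Any other keys on the cell, such as userEnteredFormat, are carried over unchanged, so the number format set earlier still applies once the value is numeric.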

To flesh out the answer from Eric Koleda a bit more, I ended up solving this two ways, depending on how I was getting the data for the Sheet:
First, if I was appending cells to the sheet, I used a function:
import re

def set_cell_type(cell_contents):
    # Strip thousands separators before testing for a number
    current_cell_contents = str(cell_contents).replace(',', '')
    # Note: these patterns do not match negative numbers
    float_cell = re.compile(r"^\d+\.\d+$")
    int_cell = re.compile(r"^\d+$")
    if int_cell.search(current_cell_contents):
        data = {"userEnteredValue": {"numberValue": int(current_cell_contents)}}
    elif float_cell.search(current_cell_contents):
        data = {"userEnteredValue": {"numberValue": float(current_cell_contents)}}
    else:
        data = {"userEnteredValue": {"stringValue": str(cell_contents)}}
    return data
This formats the cells properly. Here's the call that actually did the appending:
rows = [{"values": [set_cell_type(cell) for cell in row]} for row in daily_data_output]
data = { "requests": [ { "appendCells": { "sheetId": all_sheets['Daily record'], "rows": rows, "fields": "*", } } ], }
sheets_batch_update(SHEET_ID,data)
Second, if I was replacing a whole sheet, I did:
# convert the ints to ints and floats to floats (requires `import re`)
float_cell = re.compile(r"^\d+\.\d+$")
int_cell = re.compile(r"^\d+$")
row_list = error_message.split("\t")
i = 0
while i < len(row_list):
    current_cell = row_list[i].replace(',', '')  # remove the commas from any numbers
    if int_cell.search(current_cell):
        row_list[i] = int(current_cell)
    elif float_cell.search(current_cell):
        row_list[i] = float(current_cell)
    i += 1
error_output.append(row_list)
Then I used the following to actually save error_output to the sheet:
data = {'values': [row for row in error_output]}
sheets_update(SHEET_ID,data,'Errors!A1')
Those two techniques, coupled with the formatting calls I had already figured out in my initial question, did the trick.
