Getting index of a value inside a json file PYTHON - python

I have a sizable json file and i need to get the index of a certain value inside it. Here's what my json file looks like:
data.json
[{...many more elements here...
},
{
"name": "SQUARED SOS",
"unified": "1F198",
"non_qualified": null,
"docomo": null,
"au": "E4E8",
"softbank": null,
"google": "FEB4F",
"image": "1f198.png",
"sheet_x": 0,
"sheet_y": 28,
"short_name": "sos",
"short_names": [
"sos"
],
"text": null,
"texts": null,
"category": "Symbols",
"sort_order": 167,
"added_in": "0.6",
"has_img_apple": true,
"has_img_google": true,
"has_img_twitter": true,
"has_img_facebook": true
},
{...many more elements here...
}]
How can i get the index of the value "FEB4F" whose key is "google", for example?
My only idea was this but it doesn't work:
print(data.index('FEB4F'))

Your basic data structure is a list, so there's no way to avoid looping over it.
Loop through all the items, keeping track of the current position. If the current item has the desired key/value, print the current position.
position = 0
for item in data:
if item.get('google') == 'FEB4F':
print('position is:', position)
break
position += 1

Assuming your data can fit in an table, I recommend using pandas for that. Here is the summary:
Read de data using pandas.read_json
Identify witch column to filter
Filter using pandas.DataFrame.loc
IE:
import pandas as pd
data = pd.read_json("path_to_json.json")
print(data)
#lets assume you want to filter using the 'unified' column
filtered = data.loc[data['unified'] == 'something']
print(filtered)
Of course the steps would be different depending on the JSON structure

Related

Create JSON from specific lines of data

I have this test data that I am trying to use to create a JSON for just select items. I have the items listed and would just like to output a JSON list with select item.
What I have:
import json
# Data to be written
dictionary ={
"id": "04",
"name": "sunil",
"department": "HR"
}
# Serializing json
x= json.dumps(dictionary,indent=0)
# JSON String
y = json.loads(x)
# Goal is to print:
{
"id": "04",
"name": "sunil"
}
If you don't need to save the department key you can use this:
del y['department']
Then your y variable will print what you wanted:
{"id": "04", "name": "sunil"}
Other ways to solve the same issue
for key in list(y.keys()):
#add all potential keys you want to remain in the final dictionary
if key == "id" or key == "name":
continue
else:
del y[key]
However, iterating over a dictionary is pretty slow. You could assign values to temporary variables and then remake the dictionary like this:
temp_id = y['id']
temp_name = y['name']
y.clear()
y['id'] = temp_id
y['name'] = temp_name
This should be faster that iterating over a dictionary.

Converting csv to nested Json using python

I want to convert csv file to json file.
I have large data in csv file.
CSV Column Structure
This is my column structure in csv file . I has 200+ records.
id.oid libId personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1 _v updatedAt.date
Expected Json
[{
"id":
{
"$oid": "00001"
},
"libId":11111,
"personalinfo":
{
"Name":"xyz",
"Roll_NO":101,
"addr":"aa bb cc ddd",
"marks":
[
"maths":80,
"physic":90
.....
]
},
"clginfo"
{
"clgName":"pqr",
"clgAddr":"qwerty",
"haveCert":true, //this is boolean true or false
"certNo":1, //this could be 1-10
"certificates":
[
{
"cert_name_1":"xxx",
"cert_no_1":12345,
"cert_exp.1":"20/2/20202"
},
{
"cert_name_2":"xxx",
"cert_no_2":12345,
"cert_exp_2":"20/2/20202"
},
......//could be up to 10
],
"isDept":true, //this is boolean true or false
"NoofDept":1 , //this could be 1-10
"DeptDetails":
[
{
"DeptName_1":"yyy",
"location_1":"zzz",
"establish_date_1":"1/1/1919"
},
......//up to 10 records
]
},
"__v": 1,
"updatedAt":
{
"$date": "2022-02-02T13:35:59.843Z"
}
}]
I have tried using pandas but I'm getting output as
My output
[{
"id.$oid": "00001",
"libId":11111,
"personalinfo.Name":"xyz",
"personalinfo.Roll_NO":101,
"personalinfo.addr":"aa bb cc ddd",
"personalinfo.marks.maths":80,
"personalinfo.marks.physic":90,
"clginfo.clgName":"pqr",
"clginfo.clgAddr":"qwerty",
"clginfo.haveCert":true,
"clginfo.certNo":1,
"clginfo.certificates.cert_name_1":"xxx",
"clginfo.certificates.cert_no_1":12345,
"clginfo.certificates.cert_exp.1":"20/2/20202"
"clginfo.certificates.cert_name_2":"xxx",
"clginfo.certificates.cert_no_2":12345,
"clginfo.certificates.cert_exp_2":"20/2/20202"
"clginfo.isDept":true,
"clginfo.NoofDept":1 ,
"clginfo.DeptDetails.DeptName_1":"yyy",
"clginfo.DeptDetails.location_1":"zzz",
"eclginfo.DeptDetails.stablish_date_1":"1/1/1919",
"__v": 1,
"updatedAt.$date": "2022-02-02T13:35:59.843Z",
}]
I am new to python I only know the basic Please help me getting this output.
200+ records is really tiny, so even naive solution is good.
It can't be totally generic because I don't see how it can be seen from the headers that certificates is a list, unless we rely on all names under certificates having _N at the end.
Proposed solution using only basic python:
read header row - split all column names by period. Iterate over resulting list and create nested dicts with appropriate keys and dummy values (if you want to handle lists: create array if current key ends with _N and use N as an index)
for all rows:
clone dictionary with dummy values
for each column use split keys from above to put the value into the corresponding dict. same solution from above for lists.
append the dictionary to list of rows

How to convert dynamic nested json into csv?

I have some dynamically generated nested json that I want to convert to a CSV file using python. I am trying to use pandas for this. My question is - is there a way to use this and flatten the json data to put in the csv without knowing the json keys that need flattened in advance? An example of my data is this:
{
"reports": [
{
"name": "report_1",
"details": {
"id": "123",
"more info": "zyx",
"people": [
"person1",
"person2"
]
}
},
{
"name": "report_2",
"details": {
"id": "123",
"more info": "zyx",
"actions": [
"action1",
"action2"
]
}
}
]
}
More nested json objects can be dynamically generated in the "details" section that I do not know about in advance but need to be represented in their own cell in the csv.
For the above example, I'd want the csv to look something like this:
Name, Id, More Info, People_1, People_2, Actions_1, Actions_2
report_1, 123, zxy, person1, person2, ,
report_2, 123, zxy , , , action1 , action2
Here's the code I have:
data = json.loads('{"reports": [{"name": "report_1","details": {"id": "123","more info": "zyx","people": ["person1","person2"]}},{"name": "report_2","details": {"id": "123","more info": "zyx","actions": ["action1","action2"]}}]}')
df = pd.json_normalize(data['reports'])
df.to_csv("test.csv")
And here is the outcome currently:
,name,details.id,details.more info,details.people,details.actions
0,report_1,123,zyx,"['person1', 'person2']",
1,report_2,123,zyx,,"['action1', 'action2']"
I think what your are lookig for is:
https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
If using pandas doesn't work for you, here's the more canonical Python way of doing it.
You're trying to write out a CSV file, and that implicitly means you must write out a header containing all the keys.
The constraint that you don't know the keys in advance means you can't do this in a single pass.
def convert_record_to_flat_dict(record):
# You need to figure out exactly how you want to do this; everything
# should be strings.
record.update(record.pop('details'))
return record
header = {}
rows = [[]] # Leave the header row blank for now.
csv_out = csv.writer(buffer)
for record in report:
record = convert_record_to_flat_dict(record)
for key in record.keys():
if key not in header:
header[key] = len(header)
rows[0].append(key)
row = [''] * len(header)
for key, index in header.items():
row[index] = record.get(key, '')
rows.append(row)
# And you can go back to ensure all rows have the same number of keys:
for row in rows:
row.extend([''] * (len(row) - len(header)))
Now you have a list of lists that's ready to be sent to csv.csvwriter() or the like.
If memory is an issue, another technique is to write out a temporary file and then reprocess it once you know the header.

How to extract data from JSON file for each entry?

So I am using the JSON package in python to extract data from generated JSON which would essentially fetched data from a firebase database which was then generated as a JSON file.
Within the given data set I want to extract all of the data corresponding to bills in each entry within the JSON file. For that I created a separate dictionary to add all of the elements corresponding to bills in the dataset.
When converted to CSV, the dataset looks like this:
csv for one entry
So I have the following code to do above operation. But as I create a new dictionary, there are certain entries which have null values designated as [] (see the csv file). I assigned list to store all those bills which would have the data in the bills column (essentially avoiding all the null entries). But as a I create a new list the required output is only getting stored in the first index of the new list or array. Please see the code below.
My code is as below:
filedata = open('requireddataset.json','r') data = json.load(filedata)
listoffields = [] # To produce it into a list with fields for dic
for dic in data:
try:
listoffields.append(dic['bills']) # only non-essential bill categories.
except KeyError:
pass
#print (listoffields[3]) # This would return the first payment entry within
# the JSON Array of objects.
for val in listoffields:
if val!=[]:
x = val[0] # only val[0] would contain data
#print (x)
myarray = np.array(val)
print(myarray[0]) # All of the data stored in only one index, any way to change this?
This is the output : output
This is how the original JSON file looks like : requireddataset.json
Essentially my question is the list listoffields would contain all the fields in it(from the JSON file), and bills in one of the fields. And within the column bills each entry again contains id, value, role and many other entries. Is there any way to extract only values from this and produce sum .
In the JSON file this is how it looks like for one entry :
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]

Convert python nested JSON-like data to dataframe

My records looks like this and I need to write it to a csv file:
my_data={"data":[{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}}]}
which looks like json, but the next record starts with "data" and not "data1" which forces me to read each record separately. Then, I convert it to a dict using eval(), to iterate thru keys and values for a certain path to get to the values I need. Then, I generate a list of keys and values based on the keys I need. Then, a pd.dataframe() converts that list into a dataframe which I know how to convert to csv. My code that works is below. But I am sure there are better ways to do this. Mine scales poorly. Thx.
counter=1
k=[]
v=[]
res=[]
m=0
for line in f2:
jline=eval(line)
counter +=1
for items in jline:
k.append(jline[u'data'][0].keys())
v.append(jline[u'data'][0].values())
print 'keys are:', k
i=0
j=0
while i <3 :
while j <3:
if k[i][j]==u'id':
res.append(v[i][j])
j += 1
i += 1
#res is my result set
del k[:]
del v[:]
Changing my_data to be:
my_data = [{"id":"xyz","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data One
{"id":"xyz2","type":"book","attributes":{"doc_type":"article","action":"cut"}}, # Data Two
{"id":"xyz3","type":"book","attributes":{"doc_type":"article","action":"cut"}}] # Data Three
You can dump this directly into a dataframe as so:
mydf = pd.DataFrame(my_data)
It's not clear what your data path would be, but if you are looking for specific combinations of id, type, etc. You could explicitly search
def find_my_way(data, pattern):
# pattern = {'id':'someid', 'type':'sometype'...}
res = []
for row in data:
if row.get('id') == pattern.get('id'):
res.append(row)
return row
mydf = pd.DataFrame(find_my_way(mydata, pattern))
EDIT:
Without going into how the api works, in pseudo-code, you'll want to do something like the following:
my_objects = []
calls = 0
while calls < maximum:
my_data = call_the_api(params)
data = my_data.get('data')
if not data:
calls+=1
continue
# Api calls to single objects usually return a dictionary, to group objects they return lists. This handles both cases
if isinstance(data, list):
my_objects = [*data, *my_objects]
elif isinstance(data, {}):
my_objects = [{**data}, *my_objects]
# This will unpack the data response into a list that you can then load into a DataFrame with the attributes from the api as the columns
df = pd.DataFrame(my_objects)
Assuming your data from the api looks like:
"""
{
"links": {},
"meta": {},
"data": {
"type": "FactivaOrganizationsProfile",
"id": "Goog",
"attributes": {
"key_executives": {
"source_provider": [
{
"code": "FACSET",
"descriptor": "FactSet Research Systems Inc.",
"primary": true
}
]
}
},
"relationships": {
"people": {
"data": {
"type": "people",
"id": "39961704"
}
}
}
},
"included": {}
}
"""
per the documentation, which is why I'm using my_data.get('data').
That should get you all of the data (unfiltered) into a DataFrame
Saving the DataFrame for the last bit is a bit more memory friendly

Categories