How to loop over data in JSON with Python?

I had a MySQL database stored this way: Company_name, employee1, employee2, employee3.
When I input a company name, the code looks for that company in my database, then loops over employee1, employee2, and employee3 to check whether one of them is free in my calendar.
This was my code to check the employees:
for i in range(3):
    employee = row[i + 1]
How do I translate this loop so it can read a JSON structure?
Example of my structure:
[
    {
        "id": 1,
        "name_company": "Acier Michel",
        "inspecteur1": "Hou, L",
        "inspecteur2": "Caana, C",
        "inspecteur3": "Luc, C",
        "type": "Water",
        "location": "Laval"
    },
    {
        "id": 2,
        "name_company": "Aciers ABC Inc.",
        "inspecteur1": "Vali, M",
        "inspecteur2": "Alemane, K",
        "inspecteur3": "laszik, M",
        "type": "NA",
        "location": "St-Joseph de Sorel"
    }
]
I want to be able to iterate through inspecteur1, inspecteur2 and inspecteur3.

First parse the JSON into a Python object with
import json
userList = json.loads(yourJsonString)
Then iterate over the list:
for user in userList:
    print(user)
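To mirror the original range(3) loop over employee columns, you can generate the key names inspecteur1 to inspecteur3 per record. A minimal sketch with the question's JSON inlined:

```python
import json

json_str = '''[
  {"id": 1, "name_company": "Acier Michel",
   "inspecteur1": "Hou, L", "inspecteur2": "Caana, C", "inspecteur3": "Luc, C"},
  {"id": 2, "name_company": "Aciers ABC Inc.",
   "inspecteur1": "Vali, M", "inspecteur2": "Alemane, K", "inspecteur3": "laszik, M"}
]'''
companies = json.loads(json_str)

def inspectors_for(rows, company_name):
    # mimic the old positional row[i + 1] loop by building the key names
    for row in rows:
        if row["name_company"] == company_name:
            return [row[f"inspecteur{i}"] for i in range(1, 4)]
    return []

print(inspectors_for(companies, "Acier Michel"))  # ['Hou, L', 'Caana, C', 'Luc, C']
```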

The data is a list of dictionaries.
Use pandas.
This assumes your list of dictionaries is in a file:
import pandas as pd
import json
from pathlib import Path

# path to file
p = Path(r'c:\path_to_file\test.json')

# read the file
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# load into pandas
df = pd.DataFrame(data)
print(df)
   id     name_company inspecteur1 inspecteur2 inspecteur3   type            location
0   1     Acier Michel      Hou, L    Caana, C      Luc, C  Water               Laval
1   2  Aciers ABC Inc.     Vali, M  Alemane, K   laszik, M     NA  St-Joseph de Sorel
# search the dataframe
search = df[['inspecteur1', 'inspecteur2', 'inspecteur3']][df.name_company == 'Aciers ABC Inc.']
print(search)
  inspecteur1 inspecteur2 inspecteur3
1     Vali, M  Alemane, K   laszik, M
Note addressing comment:
With search you have access to the desired values of inspecteur1-3.
search.values returns a NumPy array, which can be iterated through.
There is not enough information in the question to offer a more comprehensive solution.
for name in search.values[0]:
    print(name)
Vali, M
Alemane, K
laszik, M
Additionally, the dataframe can be updated with additional columns and/or rows and saved back into a file:
df.to_json('test.json', orient='records')


open JSON file with pandas DataFrame

Sorry for this trivial question:
I have a JSON file first.json and I want to open it with pandas.read_json:
df = pandas.read_json('first.json') does not give me the result I need.
The result I need is one row, with the keys ('name', 'street', 'geo', 'servesCuisine', etc.) as columns. I tried different "orient" params but it doesn't help. How can I achieve the desired DataFrame format?
This is the data in my json file:
{
    "name": "La Continental (San Telmo)",
    "geo": {
        "longitude": "-58.371852",
        "latitude": "-34.616099"
    },
    "servesCuisine": "Italian",
    "containedInPlace": {},
    "priceRange": 450,
    "currenciesAccepted": "ARS",
    "address": {
        "street": "Defensa 701",
        "postalCode": "C1065AAM",
        "locality": "Autonomous City of Buenos Aires",
        "country": "Argentina"
    },
    "aggregateRatings": {
        "thefork": {
            "ratingValue": 9.3,
            "reviewCount": 3
        },
        "tripadvisor": {
            "ratingValue": 4,
            "reviewCount": 350
        }
    },
    "id": "585777"
}
You can try:
import json
import pandas as pd

with open("test.json") as fp:
    s = json.load(fp)

# flattened df, where nested keys -> column as `key1.key2.key_last`
df = pd.json_normalize(s)

# rename cols to the innermost key only (be sure you don't overwrite cols)
cols = {col: col.split(".")[-1] for col in df.columns}
df = df.rename(columns=cols)
Output:
                         name servesCuisine  priceRange currenciesAccepted      id  ...    country ratingValue reviewCount ratingValue reviewCount
0  La Continental (San Telmo)       Italian         450                ARS  585777  ...  Argentina         9.3           3           4         350
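If the dotted column names are the only concern, pd.json_normalize also accepts a sep argument, so the separator can be chosen up front and the rename step skipped. A minimal sketch on a trimmed-down dict:

```python
import pandas as pd

s = {"name": "La Continental (San Telmo)",
     "geo": {"longitude": "-58.371852", "latitude": "-34.616099"}}

# nested keys become e.g. "geo_longitude" instead of "geo.longitude"
df = pd.json_normalize(s, sep="_")
print(sorted(df.columns))  # ['geo_latitude', 'geo_longitude', 'name']
```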
You can read the JSON file with Python, convert it to a dict object, then hand-pick data items to create a new dataframe from it.
import json
import pandas as pd

# open/read the json data file
with open("test11.json", "r") as fo:
    inp_json = json.load(fo)  # json.load is safer than eval() and handles null/true/false
# Or
# inp_json = your_json_data

# prepare 1 row of data
axis1 = [[inp_json["name"], inp_json["address"]["street"], inp_json["geo"], inp_json["servesCuisine"],
          inp_json["aggregateRatings"]["tripadvisor"]["ratingValue"],
          inp_json["id"],
          ], ]  # for data
axis0 = ['row_1', ]  # for index
heads = ["name", "add_.street", "geo", "servesCuisine",
         "agg_.tripadv_.ratingValue", "id", ]

# create a dataframe using the prepped values above
df0 = pd.DataFrame(axis1, index=axis0, columns=heads)

# see data in selected columns only
df0[["name", "add_.street", "id"]]
                             name  add_.street      id
row_1  La Continental (San Telmo)  Defensa 701  585777

How to assign variables using JSON data in python

I wrote some Python code using MySQL data, but then I decided to use JSON as a "database" rather than MySQL.
This is the MySQL code:
mydb = mysql.connector.connect(host="localhost", user="nn", passwd="passpass")
mycursor = mydb.cursor()
event_fabricant = input('Inscrivez le nom de la compagnie : ')
mycursor.execute("""SELECT name_company, inspecteur1, inspecteur2, inspecteur3, ville, email FROM listedatabase.entreprises_inspecteurs WHERE name_company = %s""", (event_fabricant,))
data = mycursor.fetchall()
if data:
    row = data[0]
    event_location = row[4]
    event_email = row[5]
How do I assign data like I did with MySQL but with JSON?
This is a sample of my JSON data, and below what I did so far.
JSON SAMPLE :
[
    {
        "id": 1,
        "name_company": "Acier Michel",
        "inspecteur1": "Hou, L",
        "inspecteur2": "Caana, C",
        "inspecteur3": "Luc, C",
        "type": "Water",
        "location": "Laval"
    },
    {
        "id": 2,
        "name_company": "Aciers ABC Inc.",
        "inspecteur1": "Vali, M",
        "inspecteur2": "Alemane, K",
        "inspecteur3": "laszik, M",
        "type": "NA",
        "location": "St-Joseph de Sorel"
    }
]
This is what I did so far, but it's not exactly what I want:
import json

database = "convertcsv.json"
data = json.loads(open(database).read())
name_company = input("type company name: ")
for item in data:
    if item['nom_entreprise'] == name_company:
        print(item['inspecteur1'])
    else:
        print("Not Found")
What I need instead is to be able to assign the inspecteur1 name to variable1.
If you want to assign the data, just use: variable1 = item["inspecteur1"].
One issue with your JSON code above is that it will print Not Found for every record that does NOT match, which I don't think is what you want. Try:
found = False
for item in data:
    if item['nom_entreprise'] == name_company:
        print(item['inspecteur1'])
        found = True
if not found:
    print("Not Found")
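Python's for/else fits this pattern without the flag: the else branch runs only when the loop ends without break. A small sketch with inline data standing in for the parsed JSON:

```python
# inline stand-in for the parsed JSON list
data = [
    {"nom_entreprise": "Acier Michel", "inspecteur1": "Hou, L"},
    {"nom_entreprise": "Aciers ABC Inc.", "inspecteur1": "Vali, M"},
]

name_company = "Acier Michel"
for item in data:
    if item["nom_entreprise"] == name_company:
        variable1 = item["inspecteur1"]
        print(variable1)
        break  # first match found, stop looking
else:
    # runs only if the loop completed without hitting break
    print("Not Found")
```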
If you feel like MySQL is too complex for your needs, may I suggest SQLite? It's supported out of the box in Python, there's no server process (just a file), and you get all the database features that JSON by itself does not provide.
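A minimal sketch of the SQLite route using the standard-library sqlite3 module (the in-memory database and the table layout here are just illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # or a file path such as "companies.db"
conn.execute("""CREATE TABLE entreprises_inspecteurs
                (name_company TEXT, inspecteur1 TEXT, inspecteur2 TEXT,
                 inspecteur3 TEXT, location TEXT)""")
conn.execute("INSERT INTO entreprises_inspecteurs VALUES (?, ?, ?, ?, ?)",
             ("Acier Michel", "Hou, L", "Caana, C", "Luc, C", "Laval"))

# parameterised query, same idea as the mysql.connector version
cur = conn.execute(
    "SELECT inspecteur1, inspecteur2, inspecteur3 "
    "FROM entreprises_inspecteurs WHERE name_company = ?",
    ("Acier Michel",))
row = cur.fetchone()
print(row)  # ('Hou, L', 'Caana, C', 'Luc, C')
```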

python transform data from csv to array of dictionaries and group by field value

I have a CSV like this:
id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz
I want an array of dictionaries to import into Firebase. I need to group the different country information for the same company into a nested list of dictionaries. This is the desired output:
[ {'id': '1', 'agency_name': 'batstop', 'countries': [{'country': 'usa', 'country_id': 'xx'}, {'country': 'italy', 'country_id': 'yy'}]},
  {'id': '2', 'agency_name': 'biorice', 'countries': [{'country': 'italy', 'country_id': 'yy'}]},
  {'id': '3', 'agency_name': 'legstart', 'countries': [{'country': 'canada', 'country_id': 'zz'}]} ]
Recently I had a similar task; the groupby function from itertools and the itemgetter function from operator (both in the Python standard library) helped me a lot. Here's the code for your csv; note how important it is to define the primary keys of your csv dataset.
import csv
import json
from operator import itemgetter
from itertools import groupby

primary_keys = ['id', 'company_name']

# Start extraction
with open('input.csv', 'r') as file:
    # Read data from csv
    reader = csv.DictReader(file)
    # Sort data according to primary keys
    reader = sorted(reader, key=itemgetter(*primary_keys))

# Create a list of tuples
# Each tuple contains a dict of the group primary keys and their values, and a list of the group's ordered dicts
groups = [(dict(zip(primary_keys, _[0])), list(_[1])) for _ in groupby(reader, key=itemgetter(*primary_keys))]

# Create formatted dicts to be converted into firebase objects
group_dicts = []
for group in groups:
    group_dict = {
        "id": group[0]['id'],
        "agency_name": group[0]['company_name'],
        "countries": [
            dict(country=_['country'], country_id=_['country_id']) for _ in group[1]
        ],
    }
    group_dicts.append(group_dict)

print("\n".join([json.dumps(_, indent=2) for _ in group_dicts]))
Here's the output:
{
  "id": "1",
  "agency_name": "batstop",
  "countries": [
    {
      "country": "usa",
      "country_id": " xx"
    },
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "2",
  "agency_name": "biorice",
  "countries": [
    {
      "country": "italy",
      "country_id": " yy"
    }
  ]
}
{
  "id": "3",
  "agency_name": "legstart",
  "countries": [
    {
      "country": "canada",
      "country_id": " zz"
    }
  ]
}
No external libraries are needed. Hope it suits you well!
You can try this; you may have to change a few parts to get it working with your csv, but hopefully it's enough to get you started:
csv = [
    "1,batstop,usa, xx",
    "2,biorice,italy, yy",
    "1,batstop,italy, yy",
    "3,legstart,canada, zz"
]

output = {}  # dictionary useful to avoid searching the list for existing ids

# Parse each row
for line in csv:
    cols = line.split(',')
    id = int(cols[0])
    agency_name = cols[1]
    country = cols[2]
    country_id = cols[3]
    if id in output:
        # append a dict (not a list) so 'countries' stays a flat list of dicts
        output[id]['countries'].append({'country': country,
                                        'country_id': country_id})
    else:
        output[id] = {'id': id,
                      'agency_name': agency_name,
                      'countries': [{'country': country,
                                     'country_id': country_id}]
                      }

# Put into list
json_output = []
for key in output.keys():
    json_output.append(output[key])

# Check output
for row in json_output:
    print(row)
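A variant sketch of the same grouping that leans on the standard csv module (which also copes with quoted fields) and strips the stray spaces after the commas; the input is inlined here for illustration:

```python
import csv
import io

# inlined stand-in for the input.csv from the question
raw = """id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz
"""

output = {}  # id -> grouped record, so no list searching is needed
for rec in csv.DictReader(io.StringIO(raw)):
    entry = output.setdefault(rec["id"], {
        "id": rec["id"],
        "agency_name": rec["company_name"],
        "countries": [],
    })
    # strip() removes the space left after each comma in the source rows
    entry["countries"].append({"country": rec["country"].strip(),
                               "country_id": rec["country_id"].strip()})

json_output = list(output.values())
for row in json_output:
    print(row)
```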

JSON file modification with Python

I'm trying to write a Python script that reads data from a JSON file, does some calculations with it, and then writes the output to a new JSON file. But I can't seem to automate the JSON reading process; I get the error below. Could you please help me with this issue?
Thank you very much
print([a[0]][b[1]][c[1]])
TypeError: list indices must be integers or slices, not str
test.json
{
    "male": {
        "jack": {
            "id": "001",
            "telephone": "+31 2225 345",
            "address": "10 Street, Aukland",
            "balance": "1500"
        },
        "john": {
            "id": "002",
            "telephone": "+31 6542 365",
            "address": "Main street, Hanota",
            "balance": "2500"
        }
    },
    "female": {
        "kay": {
            "id": "00",
            "telephone": "+31 6542 365",
            "address": "Main street, Kiro",
            "balance": "500"
        }
    }
}
test.py
import json

with open("q.json") as datafile:
    data = json.load(datafile)

a = ['male', 'female']
b = ['jack', 'john', 'kay']
c = ['id', 'telephone', 'address', 'balance']
print([a[1]][b[1]][c[1]])
If I understand you correctly, you really want to print data from the JSON, not your intermediary arrays.
So:
print(data['male'])                      # will print the entire male subsection
print(data['male']['jack'])              # will print the entire jack record
print(data['male']['jack']['telephone']) # will print jack's telephone
But to relate that with your intermediary arrays too:
print(data[a[0]])              # will print the entire male subsection
print(data[a[0]][b[0]])        # will print the entire jack record
print(data[a[0]][b[0]][c[1]])  # will print jack's telephone
assuming the strings in a, b and c exactly match the keys in your JSON (they are all lowercase in your file).
I don't know how you access the data in your code, because you wrote hard-coded values directly into a, b and c. Also note that print(a[1], b[1], c[1]) only prints those strings themselves; to reach the JSON values you have to index into data, e.g. data[a[1]][b[2]][c[1]].

How to convert any nested json into a pandas dataframe

I'm currently working on a project that will be analyzing multiple data sources for information. The other data sources are fine, but I am having a lot of trouble with JSON and its sometimes deeply nested structure. I have tried to turn the JSON into a Python dictionary, but without much luck, as that approach starts to struggle as the structure gets more complicated. For example, with this sample JSON file:
{
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani#gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani#gmail.com"
        }
    ]
}
after converting to a dictionary, calling dict.keys() only returns "Employees".
I then resorted to a pandas dataframe instead, and I could achieve what I wanted by calling json_normalize(dict['Employees'], sep="_"), but my problem is that it must work for ALL JSONs, and looking at the data beforehand is not an option, so my method of normalizing this way will not always work. Is there some way I could write a function that would take in any JSON and convert it into a nice pandas dataframe? I have searched for about 2 weeks for answers, but with no luck regarding my specific problem. Thanks
I've had to do that in the past (flatten out a big nested JSON). This blog was really helpful. Would something like this work for you?
Note, as the others have stated, making this work for EVERY JSON is a tall task; I'm merely offering a way to get started if you have a wider range of JSON objects. I'm assuming they will be relatively close to your posted example, with hopefully similar structures.
jsonStr = '''{
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani#gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani#gmail.com"
        }
    ]
}'''
It flattens the entire JSON into a single row, which you can then put into a dataframe; in this case it creates 1 row with 18 columns. It then iterates through those columns, using the number embedded in each column name to reconstruct multiple rows. If you had a different nested JSON, it should in theory still work, but you'll have to test it out.
import json
import pandas as pd
import re

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)

results = pd.DataFrame()
columns_list = list(flat.keys())
for item in columns_list:
    row_idx = re.findall(r'_(\d+)_', item)[0]
    column = item.replace('_' + row_idx + '_', '_')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

print(results)
Output:
print (results)
Employees_userId ... Employees_emailAddress
0 rirani ... romin.k.irani#gmail.com
1 nirani ... neilrirani#gmail.com
[2 rows x 9 columns]
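For this particular shape (a single key holding a list of flat records), pandas' pd.json_normalize with record_path does the flattening in one call. A sketch on a trimmed-down version of the same structure:

```python
import json
import pandas as pd

# trimmed-down version of the Employees JSON from the question
jsonStr = '{"Employees": [{"userId": "rirani", "region": "CA"}, {"userId": "nirani", "region": "CA"}]}'
jsonObj = json.loads(jsonStr)

# record_path points at the list whose entries become the rows
df = pd.json_normalize(jsonObj, record_path="Employees")
print(df.shape)  # (2, 2)
```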
d = {
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani#gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani#gmail.com"
        }
    ]
}

import pandas as pd

# wrap each record's values in list() so pandas receives plain lists
df = pd.DataFrame([list(x.values()) for x in d["Employees"]],
                  columns=d["Employees"][0].keys())
print(df)
Output
   userId jobTitleName firstName  ... region  phoneNumber             emailAddress
0  rirani    Developer     Romin  ...     CA  408-1234567  romin.k.irani#gmail.com
1  nirani    Developer      Neil  ...     CA  408-1111111     neilrirani#gmail.com

[2 rows x 9 columns]
For the particular JSON data given, my approach, which uses the pandas package only, follows:
import pandas as pd

# json as python's dict object
jsn = {
    "Employees": [
        {
            "userId": "rirani",
            "jobTitleName": "Developer",
            "firstName": "Romin",
            "lastName": "Irani",
            "preferredFullName": "Romin Irani",
            "employeeCode": "E1",
            "region": "CA",
            "phoneNumber": "408-1234567",
            "emailAddress": "romin.k.irani#gmail.com"
        },
        {
            "userId": "nirani",
            "jobTitleName": "Developer",
            "firstName": "Neil",
            "lastName": "Irani",
            "preferredFullName": "Neil Irani",
            "employeeCode": "E2",
            "region": "CA",
            "phoneNumber": "408-1111111",
            "emailAddress": "neilrirani#gmail.com"
        }
    ]
}

# get the main key, here 'Employees' with index '0'
emp = list(jsn.keys())[0]
# when you have several keys at this level, e.g. an extra 'Employers' key,
# .. you need to handle all of them too (your task)

# get all the sub-keys of the main key[0]
all_keys = jsn[emp][0].keys()

# build dataframe
result_df = pd.DataFrame()  # init a dataframe
for key in all_keys:
    col_vals = []
    for ea in jsn[emp]:
        col_vals.append(ea[key])
    # add a new column to the dataframe using the sub-key as its header
    # it is possible that the values here are nested objects
    # .. such as dicts, lists, or json
    result_df[key] = col_vals

print(result_df.to_string())
Output:
   userId lastName jobTitleName  phoneNumber             emailAddress employeeCode preferredFullName firstName region
0  rirani    Irani    Developer  408-1234567  romin.k.irani#gmail.com           E1       Romin Irani     Romin     CA
1  nirani    Irani    Developer  408-1111111     neilrirani#gmail.com           E2        Neil Irani      Neil     CA
