I'm trying to extract the values from this JSON file, but I having some trouble to extract the data inside from lists in the dict values. For example, in the city and state, I would like to get only the name values and create a Pandas Dataframe and select only some keys like this.
I tried using some for with get methods techniques, but without success.
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': 0001, 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [123]: data
Out[123]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [124]: data2 = {k:v for k,v in data.items() if k in required}
In [125]: data2
Out[125]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'gender': ['Feminino'],
'id': 11111111,
'name': 'My User',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}]}
In [126]: pd.DataFrame(data2).assign(
...: city_name=lambda x: x['city'].str.get('name'),
...: state_name=lambda x: x['state'].str.get('name'),
...: state_fs=lambda x: x['state'].str.get('fs')
...: ).drop(['state', 'city'], axis=1)
Out[126]:
birthday gender id name city_name state_name state_fs
0 1987-07-13T00:00:00.000Z Feminino 11111111 My User Porto Alegre Rio Grande do Sul RS
reason why data2 is required is that you can't have columns that differ in length. So in this case, pd.DataFrame(data) won't work as rg has 0 items but birthday has 1 item.
Also something to look at if you are directly dealing with json files is pd.json_normalize
Related
Imagine I have the following dictionary.For every record (row of data), I want to merge the dictionaries of sub fields into a single dictionary. So in the end I have a list of dictionaries. One per each record.
Data = [{'Name': 'bob', 'age': '40’}
{'Name': 'tom', 'age': '30’},
{'Country’: 'US', 'City': ‘Boston’},
{'Country’: 'US', 'City': ‘New York},
{'Email’: 'bob#fake.com', 'Phone': ‘bob phone'},
{'Email’: 'tom#fake.com', 'Phone': ‘none'}]
Output = [
{'Name': 'bob', 'age': '40’,'Country’: 'US', 'City': ‘Boston’,'Email’: 'bob#fake.com', 'Phone': ‘bob phone'},
{'Name': 'tom', 'age': '30’,'Country’: 'US', 'City': ‘New York', 'Email’: 'tom#fake.com', 'Phone': ‘none'}
]
Related: How do I merge a list of dicts into a single dict?
I understand you know which dictionary relates to Bob and which dictionary relates to Tom by their position: dictionaries at even positions relate to Bob, while dictionaries at odd positions relate to Tom.
You can check whether a number is odd or even using % 2:
Data = [{'Name': 'bob', 'age': '40'},
{'Name': 'tom', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Email': 'bob#fake.com', 'Phone': 'bob phone'},
{'Email': 'tom#fake.com', 'Phone': 'none'}]
bob_dict = {}
tom_dict = {}
for i,d in enumerate(Data):
if i % 2 == 0:
bob_dict.update(d)
else:
tom_dict.update(d)
Output=[bob_dict, tom_dict]
Or alternatively:
Output = [{}, {}]
for i, d in enumerate(Data):
Output[i%2].update(d)
This second approach is not only shorter to write, it's also faster to execute and easier to scale if you have more than 2 people.
Splitting the list into more than 2 dictionaries
k = 4 # number of dictionaries you want
Data = [{'Name': 'Alice', 'age': '40'},
{'Name': 'Bob', 'age': '30'},
{'Name': 'Charlie', 'age': '30'},
{'Name': 'Diane', 'age': '30'},
{'Country': 'US', 'City': 'Boston'},
{'Country': 'US', 'City': 'New York'},
{'Country': 'UK', 'City': 'London'},
{'Country': 'UK', 'City': 'Oxford'},
{'Email': 'alice#fake.com', 'Phone': 'alice phone'},
{'Email': 'bob#fake.com', 'Phone': '12345'},
{'Email': 'charlie#fake.com', 'Phone': '0000000'},
{'Email': 'diane#fake.com', 'Phone': 'none'}]
Output = [{} for j in range(k)]
for i, d in enumerate(Data):
Output[i%k].update(d)
# Output = [
# {'Name': 'Alice', 'age': '40', 'Country': 'US', 'City': 'Boston', 'Email': 'alice#fake.com', 'Phone': 'alice phone'},
# {'Name': 'Bob', 'age': '30', 'Country': 'US', 'City': 'New York', 'Email': 'bob#fake.com', 'Phone': '12345'},
# {'Name': 'Charlie', 'age': '30', 'Country': 'UK', 'City': 'London', 'Email': 'charlie#fake.com', 'Phone': '0000000'},
# {'Name': 'Diane', 'age': '30', 'Country': 'UK', 'City': 'Oxford', 'Email': 'diane#fake.com', 'Phone': 'none'}
#]
Additionally, instead of hardcoding k = 4:
If you know the number of fields but not the number of people, you can compute k by dividing the initial number of dictionaries by the number of dictionary types:
fields = ['Name', 'Country', 'Email']
assert(len(Data) % len(fields) == 0) # make sure Data is consistent with number of fields
k = len(Data) // len(fields)
Or alternatively, you can compute k by counting how many occurrences of the 'Names' field you have:
k = sum(1 for d in Data if 'Name' in d)
I have a json dictionary of thousands of json objects called 'businesses' that contains the following nested JSON in an array. Each 'business' has a 'unique-request-id' that comes back as well. Sometimes, one request could come back with multiple busineses.
{'principals': [{'addresses': [{'city': 'DENVILLE', 'state': 'NJ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '07834', 'address': '87 JILLOW AVE'}], 'titles': ['SECRETARY'], 'names': [{'suffix': None, 'firstName': 'JAMES', 'middleName': None, 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}, {'suffix': None, 'firstName': 'LAURA', 'middleName': 'A', 'lastName': 'VU', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320219031584'], 'lastSeenDate': '2008-01-01T00:00:00.000Z', 'firstSeenDate': None}, {'addresses': [{'city': 'LAKE HAVASU CITY', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '86403', 'address': '1887 WILLOW AVE'}], 'titles': ['MANAGER'], 'names': [{'suffix': None, 'firstName': 'JASON', 'middleName': None, 'lastName': 'ROBERTS', 'salutation': None, 'pids': [], 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}}], 'pids': ['1000320147115755'], 'lastSeenDate': '1999-07-28T00:00:00.000Z', 'firstSeenDate': None}], 'tradeNames': ['DORI LAURA DVM'], 'addresses': [{'city': 'HAVASU CITY', 'state': 'NY', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '16403', 'address': '299 PASEO DEL SOL'}], 'searchType': 'Name+Address', 'contacts': [], 'phones': ['19284532022'], 'registrationDate': '1999-07-29T00:00:00.000Z', 'websites': ['WWW.DORIVET.COM'], 'eid': '39281818', 'sources': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 5}, 'dataSource': 'iData', 'names': ['DORI VETERINARY CENTER INC', 'DORI-ROBERTS INC'], 'registrationAddress': {'city': 'PHOENIX', 'state': 'AZ', 'metadata': {'sources': [], 'lastSeen': None, 'firstSeen': None, 'sourceCount': 0}, 'zip': '85007', 'address': '1200 W WASHINGTON'}, 'duns': ['042512348'], 'relatedBusinesses': [], 'industryNames': ['VETERINARIAN, ANIMAL SPECIALTIES'], 'officeAddress': None, 'ein': ['77711133333']}
I would like the ultimate CSV from DataFrame to look like this ... with all the flattened columns to the right.
RequestId BusinessName BusinessAddress ....
ABCD DORI VETERINARY CENTER INC. 87 JILLOW AVENUE
ABCD SECONDARY VETERINARY CENTER INC. 3 MAIN ST.
XYZV. xxxbusiness 20 AUTO ST.
Is this possible using Pandas and DataFrames?
You'd have to do something like below using json_normalize, the problem with this is, you'll have to do it separately for each record path - one for principals, one for tradeNames and so on.
with open('test-so.json') as f:
data = json.load(f)
df1 = json_normalize(data,
record_path=['principals', 'addresses'],
meta=[['principals', 'lastSeenDate']],
errors='ignore')
city state zip address metadata.sources metadata.lastSeen metadata.firstSeen metadata.sourceCount principals.lastSeenDate
0 DENVILLE NJ 07834 87 JILLOW AVE [] None None 0 2008-01-01T00: 00: 00.000Z
1 LAKE HAVASU CITY AZ 86403 1887 WILLOW AVE [] None None 0 1999-07-28T00: 00: 00.000Z
Or you could use flatten_json directly, it flattens the entire json into one row, check the documentation here - Flatten JSON
This question already has answers here:
How can I convert JSON to CSV?
(26 answers)
Closed 3 years ago.
I am trying to write my JSON output to CSV, but I'm not sure how to separate my values into individual columns
This is my current code
with open('dict.csv', 'w') as csv_file:
writer = csv.writer(csv_file)
for key, value in response.json().items():
writer.writerow([value])
print(value)
This is the csv file I am getting:
current csv file
This is the desired csv file/output I want to get:
desired output
This is an example of my JSON Output
[{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'}, {'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
how about this JSON format? (a more complicated one)
[{'id': '777', 'custom_id': '000112', 'company': 28, 'company_name':
'Weddings Inc', 'delivery_address': '25 olive park terrace, 61234', 'delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'sender_name': 'Joline', 'sender_email': '', 'sender_contact': '91234567', 'removed': None, 'recipient_name': 'Joline', 'recipient_contact': '91866655', 'notes': '', 'items': [{'id': 21668, 'name': 'Loose hair flowers', 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 21667, 'name': "Groom's Boutonniere", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 21666, 'name': 'Bridal Bouquet', 'quantity': 1, 'metadata': {}, 'removed': None}], 'latitude': '1.1234550920764211111', 'longitude': '103.864352476201000000', 'created': '2019-08-15T05:40:30.385467Z', 'updated': '2019-08-15T05:41:27.930110Z', 'status': 'pending', 'verbose_status': 'Pending', 'logs': [{'id': 56363, 'order': '50c402', 'order_custom_id': '000112', 'order_delivery_address': '25 olive park terrace, 61234', 'order_delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'message': 'Order was created.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-15T05:40:30.431790Z', 'removed': None}, {'id': 56364, 'order': '50c402d8-7c76-45b5-b883-e2fb887a507e', 'order_custom_id': 'INV-000112', 'order_delivery_address': '25 olive park terrace, 61234', 'order_delivery_timeslot': {'lower': '2019-12-06T10:00:00Z', 'upper': '2019-12-06T13:00:00Z', 'bounds': '[)'}, 'message': 'Order is pending.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-15T05:40:30.433139Z', 'removed': None}], 'reschedule_requests': [], 'signature': None},
{'id': '241', 'custom_id': '000123', 'company': 22, 'company_name': 'Pearl Pte Ltd', 'delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'sender_name': 'Vera Smith', 'sender_email': '', 'sender_contact': '81234567', 'removed': None, 'recipient_name': 'Vera Smith', 'recipient_contact': '81234561', 'notes': '', 'items': [{'id': 22975, 'name': 'Custom wrapped bouquet', 'quantity': 2, 'metadata': {}, 'removed': None}, {'id': 22974, 'name': "Parents' boutonniere x 3", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22973, 'name': "Groom's boutonniere", 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22972, 'name': 'Loose hair flowers', 'quantity': 1, 'metadata': {}, 'removed': None}, {'id': 22971, 'name': 'Bridal Bouquet', 'quantity': 1, 'metadata': {}, 'removed': None}], 'latitude': '1.28821802835873000000', 'longitude': '103.84569230314800000000', 'created': '2019-08-30T03:20:17.477528Z', 'updated': '2019-08-30T03:29:25.307856Z', 'status': 'pending', 'verbose_status': 'Pending', 'logs': [{'id': 59847, 'order': '24117085-9104-4442-841b-4a734f801d39', 'order_custom_id': 'INV-000123', 'order_delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'order_delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'message': 'Order was created.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-30T03:20:17.511250Z', 'removed': None}, {'id': 59848, 'order': '24117085-9104-4442-841b-4a734f801d39', 'order_custom_id': 'INV-000123', 'order_delivery_address': '90 Merchant Road, Hotel Royal, 223344', 'order_delivery_timeslot': {'lower': '2019-11-29T10:00:00Z', 'upper': '2019-11-29T13:00:00Z', 'bounds': '[)'}, 'message': 'Order is pending.', 'failure_reason': None, 'success_code': None, 'success_description': None, 'created': '2019-08-30T03:20:17.513132Z', 'removed': None}], 'reschedule_requests': [], 'signature': None}]
Use pandas library:
df.to_csv() - Write object to a comma-separated values (csv) file.
Ex.
import pandas as pd
data = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'},
{'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
df = pd.DataFrame(data)
df.to_csv('sample.csv')
Try:
import csv
csv_file = 'my_file.csv'
csv_columns = ['id', 'custom_id', 'company', 'company_name']
dict_data = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'}, {'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'}, {'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
try:
with open(csv_file, 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
writer.writeheader()
for data in dict_data:
writer.writerow(data)
except IOError:
print("I/O error")
Given your response data in json format
response = [{'id': '123', 'custom_id': '12', 'company': 28, 'company_name': 'Sunshine'},
{'id': '224', 'custom_id': '14', 'company': 38, 'company_name': 'Flowers'},
{'id': '888', 'custom_id': '10', 'company': 99, 'company_name': 'Fields'}]
You can convert it to a list of lists using
header = [response[0].keys()]
data = [row.values() for row in response]
csv_list = header + data
And then save it to csv using
with open('dict.csv', "w") as f:
for row in csv_list:
f.write("%s\n" % ','.join(str(col) for col in row))
This should yield your desired output
How can I convert this json into dataframe in python, by removing fields. I just need employess data in my dataframe.
{'fields': [{'id': 'displayName', 'type': 'text', 'name': 'Display name'},
{'id': 'firstName', 'type': 'text', 'name': 'First name'},
{'id': 'gender', 'type': 'gender', 'name': 'Gender'}],
'employees': [{'id': '123', 'displayName': 'abc', 'firstName': 'abc','gender': 'Female'},
{'id': '234', 'displayName': 'xyz.', 'firstName': 'xyz','gender': 'Female'},
{'id': '345', 'displayName': 'pqr', 'firstName': 'pqr', 'gender': 'Female'}]}
If you wan the employee information you can
JSON = {var:[...],'employees':[{}]}
employee_info = JSON['employees']
employee_info with be a list of dictionaries which you will be able to create a dataframe from by this answer: Convert list of dictionaries to a pandas DataFrame
Below is result I got from API query.
[{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
How do I specify in code to get value pairs of title, price, & status only?
So result will be like:
[{'title': 'example1', 'price': '8.20', 'status': 'available'},
{'title': 'example2', 'price': '10.50', 'status': 'none'}]
You can use a dictionary comprehension within a list comprehension:
L = [{'type':'book','title': 'example1', 'id': 12456, 'price': '8.20', 'qty': '12', 'status': 'available'},
{'type':'book','title': 'example2', 'id': 12457, 'price': '10.50', 'qty': '5', 'status': 'none'}]
keys = ['title', 'price', 'status']
res = [{k: d[k] for k in keys} for d in L]
print(res)
[{'price': '8.20', 'status': 'available', 'title': 'example1'},
{'price': '10.50', 'status': 'none', 'title': 'example2'}]