[
{
"id": 1,
"name": "Lea",
"username": "Bret",
"email": "hhaa#gma",
"address": {
"street": "Light",
"suite": "Apt. 5",
"city": "Gwen",
"zipcode": "3874",
"geo": {
"lat": "-37.3159",
"lng": "81.1496"
}
},
"phone": "1-770",
"website": "hilde.org",
"company": {
"name": "Roma",
"catchPhrase": "net",
"bs": "markets"
}
},
{
"id": 2,
"name": "Er",
"username": "Ant",
"email": "Sh",
"address": {
"street": "Vis",
"suite": "89",
"city": "Wibrugh",
"zipcode": "905",
"geo": {
"lat": "-43.9509",
"lng": "-34.4618"
}
},
"phone": "010-69",
"website": "ansia.net",
"company": {
"name": "Deist",
"catchPhrase": "contingency",
"bs": " supply-chains"
}
}
]
I am getting this data from web scraping and I would like to store it in a Netezza database. Can you please give me sample code? Do I need to correct the JSON first? If yes, how would I do it?
Also, when I try to iterate over the items in the list, I only get the details of the last user id.
I would suggest a different approach, which scales better:
1) load the raw text data into a (temporary) table using Netezza's 'external table' syntax
2) use these functions to parse the JSON data into table columns: https://developer.ibm.com/articles/i-json-table-trs/
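If you'd rather stay in Python, here is a minimal sketch of the flattening step. The insert itself is commented out because the DSN and table layout are assumptions about your environment. Note that collecting one tuple per user inside the loop (rather than reassigning a single variable) is what avoids the "only the last user" problem:

```python
import json

# A tiny valid sample standing in for the corrected scrape output.
raw_text = '''[
  {"id": 1, "name": "Lea", "email": "hhaa#gma",
   "address": {"city": "Gwen"}, "company": {"name": "Roma"}},
  {"id": 2, "name": "Er", "email": "Sh",
   "address": {"city": "Wibrugh"}, "company": {"name": "Deist"}}
]'''

users = json.loads(raw_text)

# Build one tuple per user; accumulating in a list keeps every user
# instead of overwriting a variable and keeping only the last one.
rows = [
    (u["id"], u["name"], u["email"], u["address"]["city"], u["company"]["name"])
    for u in users
]

# Hypothetical insert via pyodbc; the DSN and table names are assumptions.
# import pyodbc
# conn = pyodbc.connect("DSN=NZSQL")
# cur = conn.cursor()
# cur.executemany(
#     "INSERT INTO users (id, name, email, city, company) VALUES (?, ?, ?, ?, ?)",
#     rows)
# conn.commit()
```

For large volumes the external-table route above will still be much faster than row-by-row inserts.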
I need to convert a complex JSON file to CSV using Python. I have tried a lot of code without success, so I came here for help. I have updated the question; the JSON file contains about a million records that I need to convert to CSV format.
{
"_id": {
"$oid": "2e3230"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "10005",
},
"name": "Evgiya Kovava",
"address2": {
"country": "US",
"country_name": "NY",
}
}
}
{
"_id": {
"$oid": "2d118c8bo"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "52805",
},
"name": "Eiya tceva",
"address2": {
"country": "US",
"country_name": "TX",
}
}
}
import pandas as pd
null = 'null'
data = {
"_id": {
"$oid": "2e3230s314i5dc07e118c8bo"
},
"add": {
"address": {
"address_type": "Door",
"address": "kvartira 14",
"city": "new york",
"region": null,
"zipcode": "10005",
},
"name": "Evgeniya Kovantceva",
"type": "Private person",
"code": null,
"additional_phone_nums": null,
"email": null,
"notifications": [],
"address": {
"address": "kvartira 14",
"city": "new york",
"region": null,
"zipcode": "10005",
"country": "US",
"country_name": "NY",
}
}
}
df = pd.json_normalize(data)
df.to_csv('yourpath.csv')
Beware the null value. Also, the "address" nested dictionary appears inside "add" twice, almost identically; is that intentional?
EDIT
Ok, after your added information it looks like json.JSONDecoder() is what you need.
Originally posted by @pschill at this link:
how to analyze json objects that are NOT separated by comma (preferably in Python)
I tried his code on your data:
import json
import pandas as pd
data = """{
"_id": {
"$oid": "2e3230"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "10005"
},
"name": "Evgiya Kovava",
"address2": {
"country": "US",
"country_name": "NY"
}
}
}
{
"_id": {
"$oid": "2d118c8bo"
},
"add": {
"address1": {
"address": "kvartira 14",
"zipcode": "52805"
},
"name": "Eiya tceva",
"address2": {
"country": "US",
"country_name": "TX"
}
}
}"""
Keep in mind that your data also has trailing commas, which make it invalid JSON (the last comma right before every closing bracket).
You have to remove them with a regex or some other approach. For the purpose of this answer I removed them manually.
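As an illustration, one possible cleanup is a regex that drops any comma immediately followed by a closing brace or bracket. This is a rough sketch: it would also mangle string values that themselves happen to contain `,}` or `,]`, so use it with care on real data:

```python
import json
import re

# Invalid JSON with trailing commas before the closing delimiters.
bad = '{"a": {"b": 1,}, "c": [1, 2,],}'

# Remove a comma that sits directly before a closing brace/bracket.
fixed = re.sub(r',\s*([}\]])', r'\1', bad)

parsed = json.loads(fixed)  # now parses cleanly
```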
So after that I tried this:
content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
value, new_start = decoder.raw_decode(content)
content = content[new_start:].strip()
# You can handle the value directly in this loop:
# print("Parsed:", value)
# Or you can store it in a container and use it later:
parsed_values.append(value)
which gave me an error, but the list still seems to get populated with all the values:
parsed_values
[{'_id': {'$oid': '2e3230'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
'name': 'Evgiya Kovava',
'address2': {'country': 'US', 'country_name': 'NY'}}},
{'_id': {'$oid': '2d118c8bo'},
'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
'name': 'Eiya tceva',
'address2': {'country': 'US', 'country_name': 'TX'}}}]
next I did:
df = pd.json_normalize(parsed_values)
which worked fine.
You can always save that to a csv with:
df.to_csv('yourpath.csv')
Tell me if that helped.
Your JSON is quite problematic after all: duplicate keys, null values, trailing commas, and dicts not separated by commas... It didn't catch the eye at first :P
I have dumped data into a parquet file.
When I use
SELECT * FROM s3object s LIMIT 1
it gives me the following result.
{
"name": "John",
"age": "45",
"country": "USA",
"experience": [{
"company": {
"name": "ABC",
"years": "10",
"position": "Manager"
}
},
{
"company": {
"name": "BBC",
"years": "2",
"position": "Assistant"
}
}
]
}
I want to filter the result where company.name = "ABC",
so the output should look like the following.
{
"name": "John",
"age": "45",
"country": "USA",
"experience": [{
"company": {
"name": "ABC",
"years": "10",
"position": "Manager"
}
}
]
}
or this
{
"name": "John",
"age": "45",
"country": "USA",
"experience.company.name": "ABC",
"experience.company.years": "10",
"experience.company.position": "Manager"
}
Any support is highly appreciated.
Thanks.
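S3 Select aside, one client-side workaround is to fetch the record and filter the `experience` array in plain Python. A minimal sketch, with the record inlined for illustration:

```python
record = {
    "name": "John", "age": "45", "country": "USA",
    "experience": [
        {"company": {"name": "ABC", "years": "10", "position": "Manager"}},
        {"company": {"name": "BBC", "years": "2", "position": "Assistant"}},
    ],
}

# Shallow-copy the record, then keep only the matching experience entries.
filtered = dict(record)
filtered["experience"] = [
    e for e in record["experience"] if e["company"]["name"] == "ABC"
]
```

This produces the first of your two desired output shapes; flattening into dotted keys would be a separate step.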
I have a sample JSON file from a webhook response and I want to extract just two values from it using Python. How can I do that, assuming I want to get the subscription_code and plan_code values? Thanks in anticipation.
{
"event": "subscription.create",
"data": {
"domain": "test",
"status": "active",
"subscription_code": "SUB_vsyqdmlzble3uii",
"amount": 50000,
"cron_expression": "0 0 28 * *",
"next_payment_date": "2016-05-19T07:00:00.000Z",
"open_invoice": null,
"createdAt": "2016-03-20T00:23:24.000Z",
"plan": {
"name": "Monthly retainer",
"plan_code": "PLN_gx2wn530m0i3w3m",
"description": null,
"amount": 50000,
"interval": "monthly",
"send_invoices": true,
"send_sms": true,
"currency": "NGN"
},
"authorization": {
"authorization_code": "AUTH_96xphygz",
"bin": "539983",
"last4": "7357",
"exp_month": "10",
"exp_year": "2017",
"card_type": "MASTERCARD DEBIT",
"bank": "GTBANK",
"country_code": "NG",
"brand": "MASTERCARD"
},
"customer": {
"first_name": "BoJack",
"last_name": "Horseman",
"email": "bojack@horsinaround.com",
"customer_code": "CUS_xnxdt6s1zg1f4nx",
"phone": "",
"metadata": {},
"risk_action": "default"
},
"created_at": "2016-10-01T10:59:59.000Z"
}
}
You can use the built-in json library. For example:
import json

# if your JSON is in a file
with open("foo.json") as f:
    dict_from_file = json.load(f)

# if your JSON is in a string
dict_from_string = json.loads(string)
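Applied to your payload, a minimal sketch (with the JSON abbreviated to the relevant fields) would be:

```python
import json

# Abbreviated webhook body; the real payload has many more fields.
payload = '''{
  "event": "subscription.create",
  "data": {
    "subscription_code": "SUB_vsyqdmlzble3uii",
    "plan": {"name": "Monthly retainer", "plan_code": "PLN_gx2wn530m0i3w3m"}
  }
}'''

body = json.loads(payload)
subscription_code = body["data"]["subscription_code"]
plan_code = body["data"]["plan"]["plan_code"]  # plan_code is nested under "plan"
```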
I am trying to use a JSON file in Mapbox studio for network analysis but it gives me an error:
Input failed. "type" member required on line 1.
The representative sample of JSON file is:
{
"version": 0.6,
"generator": "Overpass API 0.7.56.2 b688b00f",
"osm3s": {
"timestamp_osm_base": "2020-03-27T11:58:01Z",
"copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
},
"elements": [
{
"type": "node",
"id": 123458059,
"lat": -38.3344495,
"lon": 143.5394486
},
{
"type": "node",
"id": 123458066,
"lat": -38.3394461,
"lon": 143.5923655,
"tags": {
"crossing": "traffic_signals",
"highway": "traffic_signals"
}
},
{
"type": "way",
"id": 769574290,
"nodes": [
7183581936,
681081177,
1561328098,
1539139562,
448021781
],
"tags": {
"highway": "trunk",
"lanes": "2",
"maxspeed": "80",
"name": "Princes Highway",
"ref": "A1",
"source:maxspeed:sign": "mapillary"
}
},
{
"type": "way",
"id": 776227225,
"nodes": [
1017428185,
317738200
],
"tags": {
"alt_name": "Princes Highway",
"highway": "trunk",
"lanes": "2",
"maxspeed": "50",
"name": "Murray Street",
"ref": "A1",
"source:maxspeed:sign": "OpenStreetCam",
"source:name": "services.land.vic.gov.au"
}
}
]
}
Does the error occur because of the specification of the format? Do we need to reformat the features or types?
To upload data to Mapbox you will need to convert your JSON file to GeoJSON, a geospatial format built on JSON. For example:
{"type": "FeatureCollection",
"features": [
{
"geometry": {
"type": "Point",
"coordinates": [
-76.9750541388,
38.8410857803
]
},
"type": "Feature",
"properties": {
"description": "Southern Ave",
"marker-symbol": "rail-metro",
"title": "Southern Ave",
"url": "http://www.wmata.com/rider_tools/pids/showpid.cfm?station_id=107",
"lines": [
"Green"
],
"address": "1411 Southern Avenue, Temple Hills, MD 20748"
}
},
{
"geometry": {
"type": "Point",
"coordinates": [
-76.935256783,
38.9081784965
]
},
"type": "Feature",
"properties": {
"description": "Deanwood",
"marker-symbol": "rail-metro",
"title": "Deanwood",
"url": "http://www.wmata.com/rider_tools/pids/showpid.cfm?station_id=65",
"lines": [
"Orange"
],
"address": "4720 Minnesota Avenue NE, Washington, DC 20019"
}
}
]}
Mapbox accepts GeoJSON, so you'll need to ask Overpass to return the data as GeoJSON.
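If you already have the plain Overpass JSON, a rough conversion sketch might look like the following. It handles only `node` elements; `way` elements would need their node coordinates resolved first, which tools like osmtogeojson do for you:

```python
# Abbreviated Overpass output, standing in for the full response.
overpass = {
    "elements": [
        {"type": "node", "id": 123458059, "lat": -38.3344495, "lon": 143.5394486},
        {"type": "way", "id": 769574290, "nodes": [7183581936, 681081177]},
    ]
}

# GeoJSON uses [lon, lat] ordering; any OSM tags become feature properties.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [el["lon"], el["lat"]]},
        "properties": el.get("tags", {}),
    }
    for el in overpass["elements"]
    if el["type"] == "node"
]

geojson = {"type": "FeatureCollection", "features": features}
```

The top-level "type" member this produces is exactly what the Mapbox error message was asking for.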
Everything in my script runs fine until I try to run it through a for loop, specifically when I attempt to index a specific array within the object. The script is intended to grab the delivery date for each tracking number in my list.
This is my script:
import requests
import json

TrackList = ['1Z3X756E0310496105', '1ZX0373R0303581450', '1ZX0373R0103574417']
url = 'https://onlinetools.ups.com/rest/Track'
para1 = '...beginning of JSON request string...'
para2 = '...end of JSON request string...'

for TrackNum in TrackList:
    parameters = para1 + TrackNum + para2
    resp = requests.post(url=url, data=parameters, verify=False)
    data = json.loads(resp.text)
    DelDate = data['TrackResponse']['Shipment']['Package'][0]['Activity'][0]['Date']
    print(DelDate)
JSON API Response (if needed):
{
"TrackResponse": {
"Response": {
"ResponseStatus": {
"Code": "1",
"Description": "Success"
},
"TransactionReference": {
"CustomerContext": "Analytics Inquiry"
}
},
"Shipment": {
"InquiryNumber": {
"Code": "01",
"Description": "ShipmentIdentificationNumber",
"Value": "1ZX0373R0103574417"
},
"Package": {
"Activity": [
{
"ActivityLocation": {
"Address": {
"City": "OKLAHOMA CITY",
"CountryCode": "US",
"PostalCode": "73128",
"StateProvinceCode": "OK"
},
"Code": "M3",
"Description": "Front Desk",
"SignedForByName": "CUMMINGS"
},
"Date": "20190520",
"Status": {
"Code": "9E",
"Description": "Delivered",
"Type": "D"
},
"Time": "091513"
},
{
"ActivityLocation": {
"Address": {
"City": "Oklahoma City",
"CountryCode": "US",
"StateProvinceCode": "OK"
},
"Description": "Front Desk"
},
"Date": "20190520",
"Status": {
"Code": "OT",
"Description": "Out For Delivery Today",
"Type": "I"
},
"Time": "085943"
},
{
"ActivityLocation": {
"Address": {
"City": "Oklahoma City",
"CountryCode": "US",
"StateProvinceCode": "OK"
},
"Description": "Front Desk"
},
"Date": "20190520",
"Status": {
"Code": "DS",
"Description": "Destination Scan",
"Type": "I"
},
"Time": "011819"
},
{
"ActivityLocation": {
"Address": {
"City": "Oklahoma City",
"CountryCode": "US",
"StateProvinceCode": "OK"
},
"Description": "Front Desk"
},
"Date": "20190519",
"Status": {
"Code": "AR",
"Description": "Arrival Scan",
"Type": "I"
},
"Time": "235100"
},
{
"ActivityLocation": {
"Address": {
"City": "DFW Airport",
"CountryCode": "US",
"StateProvinceCode": "TX"
},
"Description": "Front Desk"
},
"Date": "20190519",
"Status": {
"Code": "DP",
"Description": "Departure Scan",
"Type": "I"
},
"Time": "195500"
},
{
"ActivityLocation": {
"Address": {
"City": "DFW Airport",
"CountryCode": "US",
"StateProvinceCode": "TX"
},
"Description": "Front Desk"
},
"Date": "20190517",
"Status": {
"Code": "OR",
"Description": "Origin Scan",
"Type": "I"
},
"Time": "192938"
},
{
"ActivityLocation": {
"Address": {
"CountryCode": "US"
},
"Description": "Front Desk"
},
"Date": "20190517",
"Status": {
"Code": "MP",
"Description": "Order Processed: Ready for UPS",
"Type": "M"
},
"Time": "184621"
}
],
"PackageWeight": {
"UnitOfMeasurement": {
"Code": "LBS"
},
"Weight": "2.00"
},
"ReferenceNumber": [
{
"Code": "01",
"Value": "8472745558"
},
{
"Code": "01",
"Value": "5637807:1007379402:BN81-17077A:1"
},
{
"Code": "01",
"Value": "5637807"
}
],
"TrackingNumber": "1ZX0373R0103574417"
},
"PickupDate": "20190517",
"Service": {
"Code": "001",
"Description": "UPS Next Day Air"
},
"ShipmentAddress": [
{
"Address": {
"AddressLine": "S 600 ROYAL LN",
"City": "COPPELL",
"CountryCode": "US",
"PostalCode": "750193827",
"StateProvinceCode": "TX"
},
"Type": {
"Code": "01",
"Description": "Shipper Address"
}
},
{
"Address": {
"City": "OKLAHOMA CITY",
"CountryCode": "US",
"PostalCode": "73128",
"StateProvinceCode": "OK"
},
"Type": {
"Code": "02",
"Description": "ShipTo Address"
}
}
],
"ShipmentWeight": {
"UnitOfMeasurement": {
"Code": "LBS"
},
"Weight": "2.00"
},
"ShipperNumber": "X0373R"
}
}
}
Below is the error I receive:
Traceback (most recent call last):
File "/Users/***/Library/Preferences/PyCharmCE2019.1/scratches/UPS_API.py", line 15, in <module>
DelDate = data['TrackResponse']['Shipment']['Package'][0]['Activity'][0]['Date']
KeyError: 0
You're trying to index "Package" at index 0, but in this response it's an object, not an array, so you should be accessing ['Package']['Activity'] directly.
Just take away the [0] after ['Package']; there is no [0] (or [1] or [2]) to index in an object.
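If some shipments might come back with `Package` as a list (multiple packages) while others return a single object, a defensive sketch can normalize the shape before indexing. The response below is abbreviated for illustration:

```python
# Abbreviated UPS-style response where Package is a lone object.
response = {
    "TrackResponse": {
        "Shipment": {
            "Package": {
                "Activity": [{"Date": "20190520"}, {"Date": "20190519"}]
            }
        }
    }
}

package = response["TrackResponse"]["Shipment"]["Package"]
# Wrap a lone object in a list so the same code handles both shapes.
packages = package if isinstance(package, list) else [package]

# First activity of the first package; activities are newest-first here.
DelDate = packages[0]["Activity"][0]["Date"]
```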