How to insert JSON from an API into a Snowflake database using Python?

I am getting data from the LinkedIn Ads API using Python.
I get the data as a JSON string.
How can I insert this JSON into a Snowflake table with a VARIANT column?
Instead of using a VARIANT, the fields inside "elements" could also be inserted as normal columns.
I am new to both JSON and Python, so I would love to get some help on this.
Here is the sample json string I am getting.
{
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 4
},
"end": {
"month": 3,
"year": 2019,
"day": 4
}
},
"clicks": 4,
"impressions": 816,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}

The documentation might be helpful here.
In particular:
INSERT INTO myTable (myColumn)
SELECT ('{"key3": "value3", "key4": "value4"}'::VARIANT);
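Just insert your JSON string in the appropriate place. Note that casting a string with ::VARIANT stores it as a string value inside the VARIANT; wrap the string in PARSE_JSON(...) instead if you want it parsed into an object you can query by key.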

Here is an example in Python of how to insert JSON data:
https://github.com/snowflakedb/snowflake-connector-python/blob/master/test/test_cursor.py#L456
I imagine you're missing the parse_json function from your insert.
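A minimal sketch of that idea, assuming a cursor from snowflake-connector-python with its default pyformat parameter style (%s placeholders). The helper name insert_json and the table and column names in the usage note are hypothetical; the function only needs an object with an execute(sql, params) method:

```python
import json


def insert_json(cursor, table, column, payload):
    """Insert one JSON document into a VARIANT column.

    `cursor` is assumed to be a snowflake-connector-python cursor (any
    DB-API cursor using %s placeholders is called the same way).
    `table` and `column` must be trusted identifiers, because identifiers
    cannot be bound as query parameters.
    """
    # Accept either the raw JSON string from the API or an already parsed dict.
    text = payload if isinstance(payload, str) else json.dumps(payload)
    # Fail early on malformed JSON instead of inside the database.
    json.loads(text)
    sql = f"INSERT INTO {table} ({column}) SELECT PARSE_JSON(%s)"
    cursor.execute(sql, (text,))
    return sql
```

With a real connection this could be called as insert_json(conn.cursor(), "my_table", "my_column", response_text), where conn and response_text stand for your own connection object and the string returned by the API.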

Related

Python - trying to convert time from utc to cst in api response

Below is the code I am using to get data from an API, followed by the response. I am trying to convert datetime from UTC to CST and then present the data in that time zone instead, but I am having trouble isolating the datetime field.
import requests
import json
weather = requests.get('...')
j = json.loads(weather.text)
print (json.dumps(j, indent=2))
Response:
{
"metadata": null,
"data": [
{
"datetime": "2022-12-11T05:00:00Z",
"is_day_time": false,
"icon_code": 5,
"weather_text": "Clear with few low clouds and few cirrus",
"temperature": {
"value": 45.968,
"units": "F"
},
"feels_like_temperature": {
"value": 39.092,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 4,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 5.144953471725125,
"units": "mi/h"
},
"direction": 25
},
"wind_gust": {
"value": 9.014853256979242,
"units": "mi/h"
},
"pressure": {
"value": 29.4171829577118,
"units": "inHg"
},
"visibility": {
"value": 6.835083114610673,
"units": "mi"
},
"dew_point": {
"value": 31.01,
"units": "F"
},
"cloud_cover": 31
},
{
"datetime": "2022-12-11T06:00:00Z",
"is_day_time": false,
"icon_code": 4,
"weather_text": "Clear with few low clouds",
"temperature": {
"value": 45.068,
"units": "F"
},
"feels_like_temperature": {
"value": 38.066,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 5,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 5.167322834645669,
"units": "mi/h"
},
"direction": 27
},
"wind_gust": {
"value": 8.724051539012168,
"units": "mi/h"
},
"pressure": {
"value": 29.4213171559632,
"units": "inHg"
},
"visibility": {
"value": 5.592340730136005,
"units": "mi"
},
"dew_point": {
"value": 30.2,
"units": "F"
},
"cloud_cover": 13
},
{
"datetime": "2022-12-11T07:00:00Z",
"is_day_time": false,
"icon_code": 4,
"weather_text": "Clear with few low clouds",
"temperature": {
"value": 44.33,
"units": "F"
},
"feels_like_temperature": {
"value": 37.364,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 4,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 4.988367931281317,
"units": "mi/h"
},
"direction": 28
},
"wind_gust": {
"value": 8.254294917680744,
"units": "mi/h"
},
"pressure": {
"value": 29.4165923579616,
"units": "inHg"
},
"visibility": {
"value": 7.456454306848007,
"units": "mi"
},
"dew_point": {
"value": 29.714,
"units": "F"
},
"cloud_cover": 22
}
],
"error": null
}
I am assuming you want to present the data in the current local time of the Central Time zone. As of the date this question was asked, that would be CST (Central Standard Time). At other times of the year it will be CDT (Central Daylight Time), depending on the daylight saving time rules followed in the location for which you wish to localize the data. These rules are all kept in the IANA Time Zone Database.
So the trick is to pick the region/city name from the time zone database that follows the rules applying to your time zone. For Central Time, America/Chicago usually works, but YMMV.
There are a lot of ways to do this. This example inefficiently iterates through the dictionary created by json.loads and replaces each time string with a converted string. The key is using the dateutil library to parse the timestamp string and convert it using the proper UTC offset as defined for the time zone in the IANA database.
Hopefully this example has enough pieces that you can copy and adapt to your own needs.
from dateutil.parser import parse
from dateutil import tz
import json
# weather is assumed to hold the JSON response text (e.g. the .text of the requests response)
j = json.loads(weather)
central = tz.gettz("America/Chicago")
# Loop through each data entry, reformatting the time
for entry in j["data"]:
    if "datetime" in entry:
        parsed_dt = parse(entry["datetime"])
        converted = parsed_dt.astimezone(central)
        entry["datetime"] = converted.isoformat()
print(json.dumps(j, indent=2))
The resulting JSON has datetime fields that contain an ISO timestamp for the CST time.
{
"metadata": null,
"data": [{
"datetime": "2022-12-10T23:00:00-06:00",
"is_day_time": false,
"icon_code": 5,
"weather_text": "Clear with few low clouds and few cirrus",
"temperature": {
"value": 45.968,
"units": "F"
},
"feels_like_temperature": {
"value": 39.092,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 4,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 5.144953471725125,
"units": "mi/h"
},
"direction": 25
},
"wind_gust": {
"value": 9.014853256979242,
"units": "mi/h"
},
"pressure": {
"value": 29.4171829577118,
"units": "inHg"
},
"visibility": {
"value": 6.835083114610673,
"units": "mi"
},
"dew_point": {
"value": 31.01,
"units": "F"
},
"cloud_cover": 31
},
{
"datetime": "2022-12-11T00:00:00-06:00",
"is_day_time": false,
"icon_code": 4,
"weather_text": "Clear with few low clouds",
"temperature": {
"value": 45.068,
"units": "F"
},
"feels_like_temperature": {
"value": 38.066,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 5,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 5.167322834645669,
"units": "mi/h"
},
"direction": 27
},
"wind_gust": {
"value": 8.724051539012168,
"units": "mi/h"
},
"pressure": {
"value": 29.4213171559632,
"units": "inHg"
},
"visibility": {
"value": 5.592340730136005,
"units": "mi"
},
"dew_point": {
"value": 30.2,
"units": "F"
},
"cloud_cover": 13
},
{
"datetime": "2022-12-11T01:00:00-06:00",
"is_day_time": false,
"icon_code": 4,
"weather_text": "Clear with few low clouds",
"temperature": {
"value": 44.33,
"units": "F"
},
"feels_like_temperature": {
"value": 37.364,
"units": "F"
},
"relative_humidity": 56,
"precipitation": {
"precipitation_probability": 4,
"total_precipitation": {
"value": 0.0,
"units": "in"
}
},
"wind": {
"speed": {
"value": 4.988367931281317,
"units": "mi/h"
},
"direction": 28
},
"wind_gust": {
"value": 8.254294917680744,
"units": "mi/h"
},
"pressure": {
"value": 29.4165923579616,
"units": "inHg"
},
"visibility": {
"value": 7.456454306848007,
"units": "mi"
},
"dew_point": {
"value": 29.714,
"units": "F"
},
"cloud_cover": 22
}
],
"error": null
}
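As an alternative to dateutil, the same conversion can be sketched with the standard library only. This swaps in the zoneinfo module (Python 3.9+), which reads the same IANA database; the helper name utc_to_central is made up for illustration. The second call shows how the offset changes once daylight saving time applies:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

CENTRAL = ZoneInfo("America/Chicago")


def utc_to_central(timestamp):
    """Convert an ISO 8601 UTC string such as '2022-12-11T05:00:00Z'."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so spell the UTC offset out explicitly.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(CENTRAL).isoformat()


print(utc_to_central("2022-12-11T05:00:00Z"))  # 2022-12-10T23:00:00-06:00 (CST)
print(utc_to_central("2022-07-11T05:00:00Z"))  # 2022-07-11T00:00:00-05:00 (CDT)
```

On systems without a system time zone database, zoneinfo needs the tzdata package from PyPI to be installed.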

Convert a MultiIndex pandas DataFrame to a nested JSON

I have the following Dataframe with MultiIndex rows in pandas.
               time  available_slots     status
month day
1     1    10:00:00                1  AVAILABLE
      1    12:00:00                1  AVAILABLE
      1    14:00:00                1  AVAILABLE
      1    16:00:00                1  AVAILABLE
      1    18:00:00                1  AVAILABLE
      2    10:00:00                1  AVAILABLE
...             ...              ...        ...
2     28   12:00:00                1  AVAILABLE
      28   14:00:00                1  AVAILABLE
      28   16:00:00                1  AVAILABLE
      28   18:00:00                1  AVAILABLE
      28   20:00:00                1  AVAILABLE
And I need to transform it to a hierarchical nested JSON as this:
[
{
"month": 1,
"days": [
{
"day": 1,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
...
]
},
{
"day": 2,
"slots": [
...
]
}
]
},
{
"month": 2,
"days":[
{
"day": 1,
"slots": [
...
]
}
]
},
...
]
Unfortunately, it is not as easy as doing df.to_json(orient="index").
Does anyone know if there is a method in pandas to perform this kind of transformation, or how I could iterate over the DataFrame to build the final object?
Here's one way. Basically repeated groupby + apply(to_dict) + reset_index until we get the desired shape:
out = (df.groupby(level=[0,1])
         .apply(lambda x: x.to_dict('records'))
         .reset_index()
         .rename(columns={0:'slots'})
         .groupby('month')
         .apply(lambda x: x[['day','slots']].to_dict('records'))
         .reset_index()
         .rename(columns={0:'days'})
         .to_json(orient='records', indent=True)
      )
Output:
[
{
"month":1,
"days":[
{
"day":1,
"slots":[
{
"time":"10:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"12:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"14:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"16:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"18:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
},
{
"day":2,
"slots":[
{
"time":"10:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
}
]
},
{
"month":2,
"days":[
{
"day":28,
"slots":[
{
"time":"12:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"14:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"16:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"18:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"20:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
}
]
}
]
You can use a double loop for each level of your index:
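data = []
# Outer loop: one entry per month (first index level)
for month, df1 in df.groupby(level=0):
    data.append({'month': month, 'days': []})
    # Inner loop: one entry per day (second index level) within that month
    for day, df2 in df1.groupby(level=1):
        data[-1]['days'].append({'day': day, 'slots': df2.to_dict('records')})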
Output:
import json
print(json.dumps(data, indent=2))
[
{
"month": 1,
"days": [
{
"day": 1,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "14:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "16:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "18:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
},
{
"day": 2,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
}
]
},
{
"month": 2,
"days": [
{
"day": 28,
"slots": [
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "14:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "16:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "18:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "20:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
}
]
}
]
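For comparison, the same nesting can be sketched without pandas, assuming the rows are available as plain tuples of (month, day, time, available_slots, status), for example from df.reset_index().itertuples(index=False). The function name nest_slots and the four sample rows are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter
import json


def nest_slots(rows):
    """Nest (month, day, time, available_slots, status) rows by month, then day."""
    # groupby only merges adjacent items, so sort by (month, day) first.
    # sorted() is stable, which keeps the original time order within a day.
    rows = sorted(rows, key=itemgetter(0, 1))
    result = []
    for month, month_rows in groupby(rows, key=itemgetter(0)):
        days = []
        for day, day_rows in groupby(month_rows, key=itemgetter(1)):
            slots = [
                {"time": time, "available_slots": available, "status": status}
                for _, _, time, available, status in day_rows
            ]
            days.append({"day": day, "slots": slots})
        result.append({"month": month, "days": days})
    return result


rows = [
    (1, 1, "10:00:00", 1, "AVAILABLE"),
    (1, 1, "12:00:00", 1, "AVAILABLE"),
    (1, 2, "10:00:00", 1, "AVAILABLE"),
    (2, 28, "12:00:00", 1, "AVAILABLE"),
]
print(json.dumps(nest_slots(rows), indent=2))
```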

How to eliminate duplicate items while adding them to their own structure

I have a list of dictionary items, with each dictionary containing a list of presentation items. The sample dictionaries below are a small prototype of my real data set.
I need to remove duplicate presentations based on day (one presentation per day) and store them in a new dictionary with the same structure within the existing list.
So starting with:
[
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "ABC",
"day": 7,
},
{
"presentation": "DEF",
"day": 7,
},
{
"presentation": "GHI",
"day": 8,
},
{
"presentation": "JKL",
"day": 8,
},
{
"presentation": "MNO",
"day": 9,
},
{
"presentation": "PQR",
"day": 9,
},
{
"presentation": "STU",
"day": 9,
}
]
} #only one dictionary item in the list for simplicity
]
The end result should be three dictionaries containing lists of presentations where there is one presentation for a given day:
[
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "ABC",
"day": 7
},
{
"presentation": "DEF",
"day": 8
},
{
"presentation": "GHI",
"day": 9
}
]
},
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "JKL",
"day": 7
},
{
"presentation": "MNO",
"day": 8
},
{
"presentation": "PQR",
"day": 9
}
]
},
{
"time": "04:00-20:59",
"category": 1,
"presentations": [
{
"presentation": "STU",
"day": 9
}
]
}
]
I don't know how to go about removing these duplicates (based on day) while adding them to their own dictionary.
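One possible approach, sketched under the assumption that every presentation keeps its original day: place each presentation into the first copy of the dictionary that does not yet contain its day, and create a new copy when none is free. With the sample input this yields three dictionaries, although the exact grouping differs from the expected output above, which appears to pair some presentations with different days than the input does. The function name split_by_day is hypothetical.

```python
def split_by_day(items):
    """Split each dict so that no copy holds two presentations for one day."""
    result = []
    for item in items:
        # Everything except the presentations list is copied into each bucket.
        base = {k: v for k, v in item.items() if k != "presentations"}
        buckets = []  # list of (days already used, output dict) pairs
        for pres in item["presentations"]:
            for used_days, bucket in buckets:
                if pres["day"] not in used_days:
                    break
            else:
                # No existing bucket has room for this day: start a new one.
                used_days, bucket = set(), {**base, "presentations": []}
                buckets.append((used_days, bucket))
            used_days.add(pres["day"])
            bucket["presentations"].append(pres)
        result.extend(bucket for _, bucket in buckets)
    return result


data = [{
    "time": "04:00-20:59",
    "category": 1,
    "presentations": [
        {"presentation": "ABC", "day": 7},
        {"presentation": "DEF", "day": 7},
        {"presentation": "GHI", "day": 8},
        {"presentation": "JKL", "day": 8},
        {"presentation": "MNO", "day": 9},
        {"presentation": "PQR", "day": 9},
        {"presentation": "STU", "day": 9},
    ],
}]
for entry in split_by_day(data):
    print([p["presentation"] for p in entry["presentations"]])
```

The number of output dictionaries equals the largest number of presentations sharing one day, which is three for the sample (day 9).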

Extracting data from JSON File to CSV

I have a big JSON file with a very complex structure.
You can look at it here: https://drive.google.com/file/d/1tBVJ2xYSCpTTUGPJegvAz2ZXbeN0bteX/view?usp=sharing
It contains more than 7 million lines, and I want to extract only the "text" field.
I have written Python code to extract all the values of the "text" key in the whole file, but it extracted only 12 values, while when I open the JSON file in Visual Studio I can see more than 19000 values!
You can see the code here:
import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)
fname = "outputText8.csv"
with open(fname, "w") as file:
    csv_file = csv.writer(file,lineterminator='\n')
    csv_file.writerow(["text"])
    for item in data[i]["turns"]:
        csv_file.writerow([item['text']])
Please take a look at the JSON file itself: it is very large and has a complex structure, so I cannot paste all of it here in an understandable way.
This is a part of the JSON file:
[
{
"user_id": "U22HTHYNP",
"turns": [
{
"text": "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
"labels": {
"acts": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"acts_without_refs": [
{
"args": [
{
"val": "book",
"key": "intent"
}
],
"name": "inform"
},
{
"args": [
{
"val": "Atlantis",
"key": "dst_city"
},
{
"val": "Caprica",
"key": "or_city"
},
{
"val": "Saturday, August 13, 2016",
"key": "str_date"
},
{
"val": "8",
"key": "n_adults"
},
{
"val": "1700",
"key": "budget"
}
],
"name": "inform"
}
],
"active_frame": 1,
"frames": [
{
"info": {
"intent": [
{
"val": "book",
"negated": false
}
],
"budget": [
{
"val": "1700.0",
"negated": false
}
],
"dst_city": [
{
"val": "Atlantis",
"negated": false
}
],
"or_city": [
{
"val": "Caprica",
"negated": false
}
],
"str_date": [
{
"val": "august 13",
"negated": false
}
],
"n_adults": [
{
"val": "8",
"negated": false
}
]
},
"frame_id": 1,
"requests": [],
"frame_parent_id": null,
"binary_questions": [],
"compare_requests": []
}
]
},
"author": "user",
"timestamp": 1471272019730.0
},
{
"db": {
"result": [
[
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2118.81,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 12,
"year": 2016,
"day": 10,
"min": 37,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 10,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 2,
"min": 37
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 4,
"min": 37,
"month": 8
},
"departure": {
"hour": 22,
"year": 2016,
"day": 3,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 7
},
"price": 2369.83,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Sunway Hostel",
"country": "Argentina",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Rosario",
"category": "2.0 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 24,
"min": 0,
"month": 8
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 51,
"month": 8
},
"departure": {
"hour": 0,
"year": 2016,
"day": 16,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 9
},
"price": 2375.72,
"hotel": {
"gst_rating": 7.15,
"vicinity": [],
"name": "Scarlet Palms Resort",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_PARKING",
"FREE_WIFI"
],
"dst_city": "Goiania",
"category": "3.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 11,
"year": 2016,
"day": 1,
"min": 30,
"month": 9
},
"departure": {
"hour": 10,
"year": 2016,
"day": 1,
"min": 0,
"month": 9
}
},
"seat": "BUSINESS",
"leaving": {
"duration": {
"hours": 1,
"min": 30
},
"arrival": {
"hour": 18,
"year": 2016,
"day": 19,
"min": 30,
"month": 8
},
"departure": {
"hour": 17,
"year": 2016,
"day": 19,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 13
},
"price": 2492.95,
"hotel": {
"gst_rating": 0,
"vicinity": [],
"name": "Hotel Mundo",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI",
"FREE_PARKING"
],
"dst_city": "Manaus",
"category": "2.5 star hotel"
}
},
{
"trip": {
"returning": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 51,
"month": 8
},
"departure": {
"hour": 10,
"year": 2016,
"day": 31,
"min": 0,
"month": 8
}
},
"seat": "ECONOMY",
"leaving": {
"duration": {
"hours": 0,
"min": 51
},
"arrival": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 51,
"month": 8
},
"departure": {
"hour": 19,
"year": 2016,
"day": 27,
"min": 0,
"month": 8
}
},
"or_city": "Porto Alegre",
"duration_days": 4
},
"price": 2538.0,
"hotel": {
"gst_rating": 8.22,
"vicinity": [],
"name": "The Glee",
"country": "Brazil",
"amenities": [
"FREE_BREAKFAST",
"FREE_WIFI"
],
"dst_city": "Recife",
"category": "4.0 star hotel"
}
}
],
[],
[],
[],
[],
[],
[]
],
"search": [
{
"ORIGIN_CITY": "Porto Alegre",
"PRICE_MIN": "2000",
"NUM_ADULTS": "2",
"timestamp": 1471271949.995,
"PRICE_MAX": "3000",
"ARE_DATES_FLEXIBLE": "true",
"NUM_CHILDREN": "5",
"START_TIME": "1470110400000",
"MAX_DURATION": 2592000000.0,
"DESTINATION_CITY": "Brazil",
"RESULT_LIMIT": "10",
"END_TIME": "1472616000000"
},
{
"ORIGIN_CITY": "Atlantis",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272148.124,
"PRICE_MAX": "1700",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "NaN",
"END_TIME": "NaN"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272189.07,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MAX": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272205.436,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272278.72,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1470715200000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272454.542,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
},
{
"ORIGIN_CITY": "Caprica",
"PRICE_MIN": "1700",
"NUM_ADULTS": "8",
"RESULT_LIMIT": "10",
"timestamp": 1471272466.008,
"DESTINATION_CITY": "Atlantis",
"NUM_CHILDREN": "",
"ARE_DATES_FLEXIBLE": "true",
"START_TIME": "1471060800000",
"END_TIME": "1472011200000"
}
]
},
How can it be modified to extract all the "text" values from the JSON file to a CSV file?
This is a potential solution using pandas:
import pandas as pd
# Import the data
dj = pd.read_json("frames2.json")
dtext = dj[["user_id","turns"]]
# Save the text of every turn of every dialogue in a list
list_ = []
for record in dtext["turns"].values:
    for r in record:
        list_.append(r["text"])
# Export the CSV
out = pd.Series(list_,name="text")
out.to_csv("text.csv")
Try:
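import json
import csv
with open("/Users/zahraa-maher/rasa-init-demo/venv/Tickie/external_data/frames2.json") as file:
    data = json.load(file)
fname = "outputText8.csv"
with open(fname, "w") as file:
    csv_file = csv.writer(file,lineterminator='\n')
    csv_file.writerow(["text"])
    # The top level of the file is a list of dialogues, so iterate over it
    # directly (a list has no .items()) and visit every turn of every dialogue.
    for dialogue in data:
        for turn in dialogue["turns"]:
            csv_file.writerow([turn["text"]])
Now it is up to you which of the fields you want to save. If you use a debugger, you can inspect the keys and values of each dialogue and turn.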
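If "text" keys may also occur deeper in the structure than the entries of "turns", a recursive walk collects them regardless of nesting. This is a sketch using only the standard library; the function name iter_key and the tiny sample document are made up for illustration, and an in-memory buffer stands in for the output file:

```python
import csv
import io


def iter_key(node, key="text"):
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            else:
                yield from iter_key(value, key)
    elif isinstance(node, list):
        for child in node:
            yield from iter_key(child, key)


sample = [
    {"user_id": "U1", "turns": [{"text": "hello", "labels": {}}, {"text": "bye"}]},
    {"user_id": "U2", "turns": [{"text": "third"}]},
]

buffer = io.StringIO()  # stands in for open("outputText8.csv", "w", newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["text"])
writer.writerows([value] for value in iter_key(sample))
print(buffer.getvalue())
```

Because json.load reads the whole file into memory, this assumes the file fits in RAM.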

Converting a nested JSON to CSV in Python

I would like to convert nested JSON to a CSV file.
I am receiving the JSON from a REST API.
The fields in the CSV should look like the following:
daterange_start,daterange_end,clicks,impressions,pivotvalues
I am new to Python and JSON, so I would love to get some help.
Here is the sample JSON.
{
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}
You could use json_normalize. The only issue is that "pivotValues" is a list, so I am not sure what you would want there, or whether those lists can contain more than one element. If it is just one element, you can easily process that column. If there can be multiple elements, you can either create a new row for each element (meaning you have multiple rows with the same data except for different pivotValues), or you could extend each row with one column per pivotValues entry, but then you would have nulls wherever the lists have different lengths.
Seeing that the pivotValues all share the same prefix, I also added a step that splits out the trailing value, in case you need it.
Given:
data = {
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}
Code:
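import pandas as pd
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(data['elements'])
# Each pivotValues list holds a single element here, so this yields a single column
df['pivotValues'] = df.pivotValues.apply(pd.Series).add_prefix('pivotValues_')
# Split off the trailing ID after the last colon (n is passed by keyword, as newer pandas versions require)
df['pivotValues_stripped'] = df['pivotValues'].str.rsplit(':', n=1, expand=True)[1]
df.to_csv('path/filename.csv', index=False)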
Output:
print(df.to_string())
clicks dateRange.end.day dateRange.end.month dateRange.end.year dateRange.start.day dateRange.start.month dateRange.start.year impressions pivotValues pivotValues_stripped
0 11 3 3 2019 3 3 2019 2453 urn:li:sponsoredCampaign:1234567 1234567
1 1 7 3 2019 7 3 2019 629 urn:li:sponsoredCampaign:1234565 1234565
2 3 21 3 2019 21 3 2019 154 urn:li:sponsoredCampaign:1323516 1323516
You can load and parse the JSON in Python with:
import json
y = json.loads(x)  # x is the JSON string received from the API
y will be a Python dict. Now loop over y['elements'] and create a list with your desired fields. For example, extract the year of the start and end dates:
list_for_csv = []
for e in y['elements']:
    # Keys are case-sensitive: the field is named 'dateRange'
    list_for_csv.append([e['dateRange']['start']['year'], e['dateRange']['end']['year']])
Then use numpy to save as CSV:
import numpy as np
for_csv = np.asarray(list_for_csv)
# fmt="%d" writes plain integers instead of the default scientific notation
np.savetxt("your_file.csv", for_csv, delimiter=",", fmt="%d")
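The exact column layout asked for in the question (daterange_start, daterange_end, clicks, impressions, pivotvalues) can also be sketched with the standard csv module alone. The helper name element_to_row is hypothetical, and joining several pivot values with "|" is an assumption, since the question does not say how multi-element lists should be handled; an in-memory buffer stands in for the output file:

```python
import csv
import io
from datetime import date

FIELDS = ["daterange_start", "daterange_end", "clicks", "impressions", "pivotvalues"]


def element_to_row(element):
    """Flatten one entry of data["elements"] into a flat dict."""
    start = element["dateRange"]["start"]
    end = element["dateRange"]["end"]
    return {
        "daterange_start": date(start["year"], start["month"], start["day"]).isoformat(),
        "daterange_end": date(end["year"], end["month"], end["day"]).isoformat(),
        "clicks": element["clicks"],
        "impressions": element["impressions"],
        # Assumption: several pivot values are joined with "|" in one cell.
        "pivotvalues": "|".join(element["pivotValues"]),
    }


data = {"elements": [{
    "dateRange": {"start": {"month": 3, "year": 2019, "day": 3},
                  "end": {"month": 3, "year": 2019, "day": 3}},
    "clicks": 11,
    "impressions": 2453,
    "pivotValues": ["urn:li:sponsoredCampaign:1234567"],
}]}

buffer = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(element_to_row(e) for e in data["elements"])
print(buffer.getvalue())
```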
