I'm stuck trying to read files in Google Colab. The file should parse as simple JSON, but I can't even call json.dumps(file) without getting hundreds of errors.
Uploading the file:
import json
import csv
from google.colab import files
uploaded = files.upload()
Printing works and shows the content of the file:
print(uploaded)
data = json.dumps(uploaded)
But I get "Object of type 'bytes' is not JSON serializable" when calling json.dumps(uploaded).
Shouldn't the file be read as JSON rather than as bytes? In some other cases I tested, it was also read as a dictionary.
JSON file:
[
{
"type": "message",
"subtype": "channel_join",
"ts": "123",
"user": "DWADAWD",
"text": "<#DWADAWD> has joined the channel"
},
{
"type": "message",
"subtype": "channel_join",
"ts": "123",
"user": "DWADAWD",
"text": "<#DWADAWD> has joined the channel"
},
{
"text": "Let's chat",
"user_profile": {
"display_name": "XASD",
"team": "TDF31231",
"name": "XASD",
"is_restricted": false,
"is_ultra_restricted": false
},
"blocks": [
{
"type": "rich_text",
"block_id": "2N1",
"elements": [
{
"type": "rich_text_section",
"elements": [
{
"type": "text",
"text": "Let's chat"
}
]
}
]
}
]
}
]
If you upload just one file, you can get its content from values():
data = next(iter(uploaded.values()))
Then you can convert the JSON string to a dict:
d = json.loads(data.decode())
Here's an example notebook
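Putting the two steps together (a minimal sketch; uploaded is simulated here as the filename-to-bytes dict that files.upload() returns, so it runs outside Colab too):

```python
import json

# Simulated result of files.upload(): a dict mapping filename -> raw bytes
uploaded = {"export.json": b'[{"type": "message", "user": "DWADAWD"}]'}

# Take the content of the single uploaded file
data = next(iter(uploaded.values()))

# Decode the bytes to a str, then parse the JSON into Python objects
messages = json.loads(data.decode("utf-8"))

print(messages[0]["user"])  # -> DWADAWD
```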
JSON handles Unicode strings, not byte sequences, and uploaded is a dict mapping filenames to bytes, so decode the file's content first. Also note that json.dumps serializes a Python object, while json.loads parses a JSON string. Try:
json.loads(uploaded['file.json'].decode("utf-8"))
I prefer to use io and files.
First, I import them (and pandas):
import io
import pandas as pd
from google.colab import files
Then, I use a file widget to upload the file:
uploaded = files.upload()
To load the data into a dataframe:
df = pd.read_json(io.StringIO(uploaded.get('file.json').decode('utf-8')))
The dataframe df has all json data.
I've struck out trying to find a suitable script to iterate through a folder of .json files and update a single line.
Below is an example JSON file from one such folder. I would like to iterate through the JSON files in a folder containing several files like this with various information, update "seller_fee_basis_points" from 0 to, say, 500, and save each file.
Would really appreciate the assistance.
{
"name": "Solflare X NFT",
"symbol": "",
"description": "Celebratory Solflare NFT for the Solflare X launch",
"seller_fee_basis_points": 0,
"image": "https://www.arweave.net/abcd5678?ext=png",
"animation_url": "https://www.arweave.net/efgh1234?ext=mp4",
"external_url": "https://solflare.com",
"attributes": [
{
"trait_type": "web",
"value": "yes"
},
{
"trait_type": "mobile",
"value": "yes"
},
{
"trait_type": "extension",
"value": "yes"
}
],
"collection": {
"name": "Solflare X NFT",
"family": "Solflare"
},
"properties": {
"files": [
{
"uri": "https://www.arweave.net/abcd5678?ext=png",
"type": "image/png"
},
{
"uri": "https://watch.videodelivery.net/9876jkl",
"type": "unknown",
"cdn": true
},
{
"uri": "https://www.arweave.net/efgh1234?ext=mp4",
"type": "video/mp4"
}
],
"category": "video",
"creators": [
{
"address": "SOLFLR15asd9d21325bsadythp547912501b",
"share": 100
}
]
}
}
Updated with an answer thanks to @JCaesar's help:
import json
import glob
import os
SOURCE_DIRECTORY = r'my_favourite_directory'
KEY = 'seller_fee_basis_points'
NEW_VALUE = 500
for file in glob.glob(os.path.join(SOURCE_DIRECTORY, '*.json')):
    with open(file, encoding="utf8") as f:
        json_data = json.load(f)
    # note that using the update method means
    # that if KEY does not exist then it will be created
    # which may not be what you want
    json_data.update({KEY: NEW_VALUE})
    with open(file, 'w', encoding="utf8") as f:
        json.dump(json_data, f, indent=4)
I recommend using glob to find the files you're interested in. Then utilise the json module for reading and writing the JSON content.
This is very concise and has no sanity checking / exception handling but you should get the idea:
import json
import glob
import os
SOURCE_DIRECTORY = 'my_favourite_directory'
KEY = 'seller_fee_basis_points'
NEW_VALUE = 500
for file in glob.glob(os.path.join(SOURCE_DIRECTORY, '*.json')):
    with open(file) as f:
        json_data = json.load(f)
    # note that using the update method means
    # that if KEY does not exist then it will be created
    # which may not be what you want
    json_data.update({KEY: NEW_VALUE})
    with open(file, 'w') as f:
        json.dump(json_data, f, indent=4)
I have a nested JSON file and I am trying to get the data into a data frame. I have to extract sensor-time, then elements, and finally sensor info.
Here is how the json file looks like:
{
"sensor-time": {
"timezone": "America/Los_Angeles",
"time": "2019-11-21T01:00:04-08:00"
},
"status": {
"code": "OK"
},
"content": {
"element": [{
"element-id": 0,
"element-name": "Line 0",
"sensor-type": "SINGLE_SENSOR",
"data-type": "LINE",
"from": "2019-11-21T00:00:00-08:00",
"to": "2019-11-21T01:00:00-08:00",
"resolution": "ONE_HOUR",
"measurement": [{
"from": "2019-11-21T00:00:00-08:00",
"to": "2019-11-21T01:00:00-08:00",
"value": [{
"value": 0,
"label": "fw"
}, {
"value": 0,
"label": "bw"
}
]
}
]
}
]
},
"sensor-info": {
"serial-number": "D8:80:39:D9:6B:9B",
"ip-address": "192.168.0.3",
"name": "XD01",
"group": "Boost Mobile",
"device-type": "PC2"
}
}
And here is my code so far:
import json
from pandas.io.json import json_normalize
import glob
import urllib
import sqlalchemy as sa
# Create empty dataframe
# Drill through each file with json extension in the folder, open it, load it and parse it into dataframe
file = 'C:/Test/Loading/testfile.json'
with open(file) as json_file:
    json_data = json.load(json_file)

df = json_normalize(json_data, meta=['sensor-time'])
df
and here is the output when I run my code:
I tried using the flatten_json library, and the best I can get is with this code:
with open(file) as json_file:
    json_data = json.load(json_file)

flat = flatten_json(json_data)
df = json_normalize(flat)
And I get output with one row and 33 columns. Since I have multiple values under the measurement part of the JSON file, I am getting a column for each measurement. What I need is 3 rows with 24 columns: one row for each measurement.
So how do I modify this now?
The simplest way, I think, would be to use pandas.DataFrame(json_data); then you can access the information with:
pandas.DataFrame(json_data)['sensor-time']['time']
pandas.DataFrame(json_data)['content']['element']
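If the goal is one row per measurement value rather than one row per top-level key, pd.json_normalize (the top-level function in pandas >= 1.0) can flatten the nested list with a record_path. This is a sketch against the structure shown in the question (trimmed here for brevity); note that meta paths are given relative to the same nesting as the record path:

```python
import pandas as pd

json_data = {
    "sensor-time": {"timezone": "America/Los_Angeles", "time": "2019-11-21T01:00:04-08:00"},
    "content": {"element": [{
        "element-id": 0,
        "element-name": "Line 0",
        "measurement": [{
            "from": "2019-11-21T00:00:00-08:00",
            "value": [{"value": 0, "label": "fw"}, {"value": 0, "label": "bw"}],
        }],
    }]},
}

# One output row per entry in content.element[*].measurement[*].value
df = pd.json_normalize(
    json_data,
    record_path=["content", "element", "measurement", "value"],
    meta=[["content", "element", "element-name"],
          ["content", "element", "measurement", "from"]],
)

# Top-level fields such as sensor-time can simply be added as columns afterwards
df["sensor-time.time"] = json_data["sensor-time"]["time"]
```

With this sample the frame has two rows, one for the "fw" value and one for the "bw" value, with the element and measurement metadata repeated on each row.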
I want to create a JSON file from a CSV file using a generic Python script.
I found the hone package on GitHub (csv to json), but some of the functionality is missing from that code.
I want to write a generic CSV-to-JSON template. hone handles this type of JSON:
[
{
"birth": {
"day": "7",
"month": "May",
"year": "1985"
},
"name": "Bob",
"reference": "TRUE",
"reference name": "Smith"
}
]
It only handles the above type of JSON.
[
{
"Type": "AwsEc2Instance",
"Id": "i-cafebabe",
"Partition": "aws",
"Region": "us-west-2",
"Tags": {
"billingCode": "Lotus-1-2-3",
"needsPatching": "true"
},
"Details": {
"AwsEc2Instance": {
"Type": "i3.xlarge",
"ImageId": "ami-abcd1234",
"IpV4Addresses": [ "54.194.252.215", "192.168.1.88" ],
"IpV6Addresses": [ "2001:db812341a2b::123" ],
"KeyName": "my_keypair",
"VpcId": "vpc-11112222",
"SubnetId": "subnet-56f5f633",
"LaunchedAt": "2018-05-08T16:46:19.000Z"
}
}
}
]
I also want to handle nested arrays and objects, as in the example above.
I have done something like this before; the code below can be modified, as I have not seen your dataset.
import json
import pandas as pd

dataframe = pd.read_excel('dataframefilepath', encoding='utf-8', header=0)

'''Adding to list to finally save it as JSON'''
df = []
# df_to_Save is assumed to be defined earlier in the original script
for (columnName, columnData) in dataframe.iteritems():
    if dataframe.columns.get_loc(columnName) > 0:
        for indata, rwdata in dataframe.iterrows():
            for insav, rwsave in df_to_Save.iterrows():
                if rwdata.Selected_Prediction == rwsave.Selected_Prediction:
                    df_to_Save.loc[insav, 'Value_to_Save'] = rwdata[dataframe.columns.get_loc(columnName)]
        df.append(df_to_Save.set_index('Selected_Prediction').T.to_dict('records'))

'''Saving in JSON format'''
path_to_save = '\\your path'
with open(path_to_save, 'w') as json_file:
    json.dump(df, json_file)
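To sketch the generic direction instead (assuming, as hone does, that nesting is expressed in the CSV headers with dotted paths such as birth.day; the helper below is hypothetical, not part of hone):

```python
import csv
import io
import json

def row_to_nested(row):
    """Fold a flat CSV row with dotted headers into a nested dict."""
    out = {}
    for key, value in row.items():
        parts = key.split(".")
        node = out
        for part in parts[:-1]:
            # Walk/create intermediate objects for each dotted segment
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out

# Hypothetical input matching the first JSON shape above
csv_text = (
    "name,reference,reference name,birth.day,birth.month,birth.year\n"
    "Bob,TRUE,Smith,7,May,1985\n"
)

records = [row_to_nested(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(json.dumps(records, indent=2))
```

This only covers nested objects; nested arrays would need an additional convention in the headers (for example an index segment such as attributes.0.value).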
I am very new to JSON parsing. Below is my JSON:
[
{
"description": "Newton",
"exam_code": {
"date_added": "2015-05-13T04:49:54+00:00",
"description": "Production",
"exam_tags": [
{
"date_added": "2012-01-13T03:39:17+00:00",
"descriptive_name": "Production v0.1",
"id": 1,
"max_count": "147",
"name": "Production"
}
],
"id": 1,
"name": "Production",
"prefix": "SA"
},
"name": "CM"
},
{
"description": "Opera",
"exam_code": {
"date_added": "2015-05-13T04:49:54+00:00",
"description": "Production",
"test_tags": [
{
"date_added": "2012-02-22T12:44:55+00:00",
"descriptive_name": "Production v0.1",
"id": 1,
"max_count": "147",
"name": "Production"
}
],
"id": 1,
"name": "Production",
"prefix": "SA"
},
"name": "OS"
}
]
Here I am trying to print the description value if the name value is CM, and likewise print the description value if the name value is OS.
Please help me understand how this JSON parsing can be done.
Assuming you have already read the JSON string from somewhere (a file, stdin, or any other source), you can deserialize it into a Python object by doing:
import json
# ...
json_data = json.loads(json_str)
Where json_str is the JSON string that you want to parse.
In your case, json_str will get deserialized into a Python list, so you can do any operation on it as you'd normally do with a list.
Of course, this includes iterating over the elements:
for item in json_data:
    if item.get('name') in ('CM', 'OS'):
        print(item['description'])
As you can see, the items in json_data have been deserialized into dict, so you can access the actual fields using dict operations.
Note
You can also deserialize a JSON from the source directly, provided you have access to the file handler/descriptor or stream:
# Loading from a file
import json
with open('my_json.json', 'r') as fd:
    # Note that we're using json.load, not json.loads
    json_data = json.load(fd)
# Loading from stdin
import json, sys
json_data = json.load(sys.stdin)
I'm new to Python and I would like to search and replace titles of IDs in a JSON file. Normally I would use R for this task, but how do I do it in Python? Here is a sample of my JSON (with a service ID and layer IDs); I'm interested in replacing the titles of the layer IDs:
...{"services": [
  {
    "id": "service",
    "url": "http://...",
    "title": "GEW",
    "layers": [
      {
        "id": "0",
        "title": "wrongTitle"
      },
      {
        "id": "1",
        "title": "againTitleWrong"
      }
    ],
    "options": {}
  }
]}...
For the replace I would use a table/csv like this:
serviceID layerID oldTitle newTitle
service 0 wrongTitle newTitle1
service 1 againTitleWrong newTitle2
....
Do you have ideas? Thanks
Here's a working example on repl.it.
Code:
import json
import io
import csv
### json input
input = """
{
"layers": [
{
"id": "0",
"title": "wrongTitle"
},
{
"id": "1",
"title": "againTitleWrong"
}
]
}
"""
### parse the json
parsed_json = json.loads(input)
#### csv input
csv_input = """serviceID,layerID,oldTitle,newTitle
service,0,wrongTitle,newTitle1
service,1,againTitleWrong,newTitle2
"""
### parse csv and generate a correction lookup
parsed_csv = csv.DictReader(io.StringIO(csv_input))
lookup = {}
for row in parsed_csv:
    lookup[row["layerID"]] = row["newTitle"]

### correct and print json
layers = parsed_json["layers"]
for layer in layers:
    layer["title"] = lookup[layer["id"]]

parsed_json["layers"] = layers
print(json.dumps(parsed_json))
You don't say which version of Python you are using, but there are built-in JSON parsers for the language.
For 2.x: https://docs.python.org/2.7/library/json.html
For 3.x: https://docs.python.org/3.4/library/json.html
These should be able to help you to parse the JSON and replace what you want.
As other users suggested, checking the json module will be helpful.
Here's a basic example on Python 2.7:
import json
j = '''{
"services":
[{
"id": "service",
"url": "http://...",
"title": "GEW",
"options": {},
"layers": [
{
"id": "0",
"title": "wrongTitle"
},
{
"id": "1",
"title": "againTitleWrong"
}
]
}]
}'''
s = json.loads(j)
s["services"][0]["layers"][0]["title"] = "new title"
# save json object to file
with open('file.json', 'w') as f:
    json.dump(s, f)
You can index each element and change its title according to your CSV file, which requires the csv module.
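As a sketch of that combination (the CSV layout is taken from the question's table; the lookup is keyed on serviceID and layerID together, so the same layer id in different services can get different titles):

```python
import csv
import io
import json

json_text = """{"services": [{"id": "service", "url": "http://...", "title": "GEW",
  "layers": [{"id": "0", "title": "wrongTitle"}, {"id": "1", "title": "againTitleWrong"}],
  "options": {}}]}"""

csv_text = """serviceID,layerID,oldTitle,newTitle
service,0,wrongTitle,newTitle1
service,1,againTitleWrong,newTitle2
"""

# Build a (serviceID, layerID) -> newTitle lookup from the correction table
lookup = {
    (row["serviceID"], row["layerID"]): row["newTitle"]
    for row in csv.DictReader(io.StringIO(csv_text))
}

data = json.loads(json_text)
for service in data["services"]:
    for layer in service["layers"]:
        key = (service["id"], layer["id"])
        if key in lookup:  # leave titles alone when no correction exists
            layer["title"] = lookup[key]

print(json.dumps(data, indent=2))
```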