I have a command-line tool that takes strings representing dictionaries and runs traffic:
warp_cli -j '{"warp_ip":"172.18...
Robot Framework is generating parts of these dictionaries and injecting unicode strings such as u'TCP'. If a unicode entry ends up in the dictionary, I get:
ValueError: No JSON object could be decoded
Example:
import json

streams = [
    {
        "burst_loop_count": "500",
        "protocol": u"TCP",
        "tcp_src_port": "10000",
    },
    {
        "burst_loop_count": "500",
        "protocol": u"TCP",
        "tcp_src_port": "10000"
    }
]

def _create_streams_arg(streams):
    return str(streams).replace("'", '"')

print(json.loads(_create_streams_arg(streams)))
Is there a way to map str(x) or x.encode("ascii", "ignore") in place over all values in streams, or is there a way to have json.loads() properly load something that holds unicode?
I got it working with:
def _encode(streams):
    encoded = []
    for stream in streams:
        encoded.append({str(k): str(v) for k, v in stream.iteritems()})
    return encoded
But I hoped there might be a cleaner solution.
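For what it's worth, json.dumps serializes both str and unicode values to valid, double-quoted JSON, so the str()/replace() round-trip (and the _encode pass) can be dropped entirely. A minimal sketch under Python 2, matching the iteritems usage above:

import json

def _create_streams_arg(streams):
    # json.dumps handles unicode values natively and always emits
    # double-quoted, valid JSON -- no manual quote replacement needed
    return json.dumps(streams)

print(json.loads(_create_streams_arg(streams)))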
My CSV headers look something like
from/email
to/0/email
personalization/0/email/
personalization/0/data/first_name
personalization/0/data/company_name
personalization/0/data/job_title
template_id
Output should be:
[
{
"from": {
"email": "me#x.com",
"name": "Me"
},
"to": [
{
"email": "mike#x.com"
}
],
"personalization": [
{
"email": "mike#x.com",
"data": {
"first_name": "Mike",
"company_name": "X.com",
"job_title": "Chef"
}
}
],
"template_id": "123456"
},
I tried
csvjson input.csv output.csv
csvtojson input.csv output.csv
csv2json input.csv output.csv
python3 app.py
import csv
import json

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    # read csv file
    with open(csvFilePath, encoding='utf-8') as csvf:
        # load csv file data using csv library's dictionary reader
        csvReader = csv.DictReader(csvf)
        # convert each csv row into python dict
        for row in csvReader:
            # add this python dict to json array
            jsonArray.append(row)

    # convert python jsonArray to JSON string and write to file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)

csvFilePath = r'outputt1.csv'
jsonFilePath = r'outputt1.json'
csv_to_json(csvFilePath, jsonFilePath)
node app.js
const CSVToJSON = require('csvtojson');

// convert the CSV file to a JSON array
CSVToJSON().fromFile('outputt1.csv')
    .then(from => {
        // from is a JSON array; log it
        console.log(from);
    }).catch(err => {
        // log error if any
        console.log(err);
    });
All of them output some variation of single-line JSON with no nesting.
The only thing that worked was uploading it to https://www.convertcsv.com/csv-to-json.htm and converting each file by hand, but that is obviously not a solution.
I have seen a post recommending Choetl.Json for this exact purpose, but I was unable to install it on a Mac.
Your problem should be broken down into two parts: parsing CSV data for conversion into JSON, and building a JSON structure following path-like descriptors.
For the first part, it is necessary to clarify the formatting of the CSV input, as there is no general standard for CSV, just a fundamental description in RFC 4180 and many adaptations tailored to specific use cases or data types. As you didn't provide any actual CSV content, let's assume for the sake of simplicity that records are separated by newlines, fields are separated by commas, and quoted fields do not occur, as the data itself never contains any of these separators. Let's further assume that all records have exactly the same number of fields, and that exactly one of them (namely the first) holds the headers. You may want to adjust these assumptions to your actual CSV data.
cat input.csv
from/email,to/0/email,personalization/0/email,personalization/0/data/first_name,personalization/0/data/company_name,personalization/0/data/job_title,template_id
me#x.com,mike#x.com,mike#x.com,Mike,X.com,Chef,123456
Based on this formatting, you can read in the CSV data using the --raw-input or -R option which streams in each newline-separated segment of raw text as a JSON string input. Ideally, your filter should then convert each input string record into an array of string fields by splitting at the comma, e.g. using the / operator:
jq -R '. / ","' input.csv
[
"from/email",
"to/0/email",
"personalization/0/email",
"personalization/0/data/first_name",
"personalization/0/data/company_name",
"personalization/0/data/job_title",
"template_id"
]
[
"me#x.com",
"mike#x.com",
"mike#x.com",
"Mike",
"X.com",
"Chef",
"123456"
]
As for the second part, you can now easily process these JSON arrays. In order to treat the first one (the headers) separately, you could use the --slurp or -s option which turns the input stream into an array whose elements can then be accessed using indices. Also, the setpath builtin comes in handy as it can set values within a JSON structure described as an array of strings and integers representing object fields and array indices, just as you do in your headers. This leaves you turning the header strings into such arrays by splitting at "/" and converting number-like segments into actual numbers. Finally, to successively build up your JSON objects you could iterate through the record fields using a reduce statement and align the record fields to their corresponding header fields using transpose:
… | jq -s '
(.[0] | map(. / "/" | map(tonumber? // .))) as $headers
| .[1:] | map(
reduce ([$headers, .] | transpose[]) as [$path, $value] (
{}; setpath($path; $value)
)
)
'
[
{
"from": {
"email": "me#x.com"
},
"to": [
{
"email": "mike#x.com"
}
],
"personalization": [
{
"email": "mike#x.com",
"data": {
"first_name": "Mike",
"company_name": "X.com",
"job_title": "Chef"
}
}
],
"template_id": "123456"
}
]
Notes
My showcase disregards the fact that your sample JSON output also provides an additional field name under the top-level field from, because your sample CSV input headers don't include a matching field from/name.
To emphasize the bipartite nature of this approach, I concluded with two cascading invocations of jq. These generally could (and mostly should) be combined into one. However, as combining the options --raw-input and --slurp would alter jq's read-in behaviour, you'd rather add the --null-input or -n option with [inputs | …] in the first filter, which lets you drop the --slurp option in the second: jq -Rn '[inputs / ","] | …'
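For reference, the combined single invocation could look like this (a sketch assembled from the two filters above):

jq -Rn '
  [inputs / ","]
  | (.[0] | map(. / "/" | map(tonumber? // .))) as $headers
  | .[1:] | map(
      reduce ([$headers, .] | transpose[]) as [$path, $value] (
        {}; setpath($path; $value)
      )
    )
' input.csv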
I am using the csv module to convert JSON to CSV and store it in a file or print it to stdout.
import csv
import sys

def write_csv(data: list, header: list, path: str = None):
    # data is the parsed JSON: a list of dicts
    output_file = open(path, 'w') if path else sys.stdout
    out = csv.writer(output_file)
    out.writerow(header)
    for row in data:
        out.writerow([row[attr] for attr in header])
    if path:
        output_file.close()
I want to store the converted csv to a variable instead of sending it to a file or stdout.
Say I want to create a function like this:

def json_to_csv(data: list, header: list):
    # convert json data into csv string
    return string_csv
NOTE: the format of data is simple: it is a list of dictionaries mapping strings to strings:
[
{
"username":"srbcheema",
"name":"Sarbjit Singh"
},
{
"username":"testing",
"name":"Test, user"
}
]
I want csv output to look like:
username,name
srbcheema,Sarbjit Singh
testing,"Test, user"
Converting JSON to CSV is not a trivial operation. There is also no standardized way to translate between them...
For example
my_json = {
    "one": 1,
    "two": 2,
    "three": {
        "nested": "structure"
    }
}
Could be represented in a number of ways...
These are all (to my knowledge) valid CSVs that contain all the information from the JSON structure.
data
'{"one": 1, "two": 2, "three": {"nested": "structure"}}'

one,two,three
1,2,'{"nested": "structure"}'

one,two,three__nested
1,2,structure
In essence, you will have to figure out the best translation between the two based on your knowledge of the data. There is no right answer on how to go about this.
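For instance, the third representation above (nested keys joined into flat headers) can be produced with a small recursive helper. A sketch, where flatten and the "__" separator are my own naming:

def flatten(obj, parent_key="", sep="__"):
    # recursively collapse nested dicts into a single level,
    # joining path segments with `sep`
    items = {}
    for key, value in obj.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

my_json = {"one": 1, "two": 2, "three": {"nested": "structure"}}
print(flatten(my_json))   # {'one': 1, 'two': 2, 'three__nested': 'structure'}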
I'm relatively new to Python, so there's probably a better way, but this works:
def get_safe_string(string):
    return '"' + string + '"' if "," in string else string

def json_to_csv(data):
    csv_keys = data[0].keys()
    header = ",".join(csv_keys)
    res = list(",".join(get_safe_string(row.get(k)) for k in csv_keys) for row in data)
    res.insert(0, header)
    return "\n".join(r for r in res)
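Alternatively, the write_csv function from the question can be adapted to write into an in-memory buffer instead of a file. A sketch using io.StringIO (an approach not shown above), which also gets quoting of embedded commas for free:

import csv
import io

def json_to_csv(data: list, header: list) -> str:
    # write into an in-memory text buffer instead of a file;
    # lineterminator="\n" matches the desired output exactly
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in data:
        writer.writerow([row[attr] for attr in header])
    return buffer.getvalue()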
I'm receiving a string server side which I then convert to JSON:
127.0.0.1:8000/devices/f751/?json={ "DeviceId":"192-2993-2993", "Date":"1/4/2019 9:52:2", "Location":"-1.000000000,-1.000000000", "Key":"{XXXX-XXXX-XXXX}", "Data":" { \"Value0\":\"{ \"ReferenceValue\":\"Elevation\", \"Prediction\":\"22.216558464\"}\", \"Value1\":\"{ \"ReferenceValue\":\"Wind Speed\", \"Prediction\":\"42.216558464\"}\" } "}
After conversion using json.loads() I get the following output:
>>> updatedRequest = json.loads(jsonRequest)
>>> updatedRequest
{'DeviceId': '192-2993-2993',
 'Date': '1/4/2019 9:52:2',
 'Location': '-1.000000000,-1.000000000',
 'Key': '{XXXX-XXXX-XXXX}',
 'Data': '{ "Value0":"{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}", "Value1":"{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}" }'}
So far so good, I can access the Data value via updatedRequest['Data'].
>>> updatedRequest['Data']
'{ "Value0":"{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}", "Value1":"{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}" }'
My issue arises when attempting to convert this into a usable Python dictionary (e.g. updatedRequest['Data']['Value0']['ReferenceValue']). Because there is an unknown number of 'Value' keys, I'm uncertain what the best procedure would be to turn this into workable data.
You have received a JSON document with a nested JSON document, itself containing further JSON documents, inside one another like a Matryoshka doll.
Unfortunately, you can only decode one level, because the next level is broken. There should be \ escapes in front of the " quote characters used for the 3rd level of JSON documents, just like the second level quotes were escaped when it was embedded in the top-level JSON document. Those are missing so no JSON parser can decode it anymore. The delimiters around JSON strings have been derailed by stray, unescaped " characters that were meant to be part of a JSON string value.
You either need to repair the client sending this data, or discard these malformed values as an invalid request.
For completeness' sake, a valid document would look like this:
>>> import json
>>> v0 = '''{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}'''
>>> v1 = '''{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}'''
>>> data_value = json.dumps({'Value0': v0, 'Value1': v1})
>>> print(json.dumps({'Data': data_value, 'Date': '1/4/2019 9:52:2', 'DeviceId': '192-2993-2993', 'Key': '{XXXX-XXXX-XXXX}', 'Location': '-1.000000000,-1.000000000'}, indent=4))
{
    "Data": "{\"Value0\": \"{ \\\"ReferenceValue\\\":\\\"Elevation\\\", \\\"Prediction\\\":\\\"22.216558464\\\"}\", \"Value1\": \"{ \\\"ReferenceValue\\\":\\\"Wind Speed\\\", \\\"Prediction\\\":\\\"42.216558464\\\"}\"}",
    "Date": "1/4/2019 9:52:2",
    "DeviceId": "192-2993-2993",
    "Key": "{XXXX-XXXX-XXXX}",
    "Location": "-1.000000000,-1.000000000"
}
Note the \" and \\\" escapes in the Data value. On decoding, the string value for Data will have one level of escape sequences removed, forming " and \" sequences, where the " quotes are part of the JSON syntax and \" are part of the string values, which in turn can be decoded to " used in the innermost JSON document.
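Once the client is fixed, each level can then be peeled off with its own json.loads() call. A sketch, assuming the repaired top-level document is held in the string document:

import json

outer = json.loads(document)           # level 1: the request body
data = json.loads(outer['Data'])       # level 2: the Data string
value0 = json.loads(data['Value0'])    # level 3: each Value* string
print(value0['ReferenceValue'])        # -> Elevation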
It really depends what you want to do with the data. You can loop through the 'Data' dictionary with:
for k, v in updatedRequest['Data'].items():
    # do some stuff
This will allow you to process without having to deal with the variable number of items in this dictionary. Hard to say what is best without knowing exactly what you wish to do though!
I'm trying to loop through JSON data to find values for specific keys. My data is coming from a http request and the data looks like:
{'1': {'manufacturername': 'SVLJ',
'modelid': 'TCL014',
'name': 'Fling'},
'10': {'manufacturername': 'SONY',
'modelid': 'BLL4554',
'name': 'ACQ'}}
My current goal is to loop through each item number (1, 10, etc.) and get the name value for each light ('Fling', 'ACQ', etc.). My latest attempt is:
import requests

RESOURCE_URL = 'xxx/xxx/'

def get_json(url):
    raw_response = requests.get(url)
    data = raw_response.json()
    return data

def get_SMR():
    url = "{}SMR/".format(RESOURCE_URL)
    return get_json(url)

smr_json = get_SMR()
for SMR in smr_json:
    print(SMR['name'])
When I try running this, I get the error:
TypeError: string indices must be integers
I've also tried importing the json library, and using json.loads(raw_response.text); however, it's still being recognized as a string, rather than an iterable json object (that can be referenced by key). Any and all insight would be greatly appreciated.
When you are doing for SMR in smr_json:, you are iterating over the keys of the dictionary. In other words, SMR is a string, which does not allow indexing by a string:
In [1]: SMR = 'test'
In [2]: SMR['string']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: string indices must be integers
You meant to iterate over both the keys and the values:
for key, SMR in smr_json.items():
    print(SMR['name'])
Or, just values:
for SMR in smr_json.values():
    print(SMR['name'])
You are probably getting a string because that is not valid JSON. JSON requires double quotes (") for strings, not single quotes (').
See json.org:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.
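A quick demonstration of that rule (json.JSONDecodeError is the exception raised since Python 3.5):

import json

print(json.loads('{"name": "Fling"}'))   # double quotes: parses fine
try:
    json.loads("{'name': 'Fling'}")      # single quotes: rejected
except json.JSONDecodeError as e:
    print("rejected:", e)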
I think the problem is in the JSON file; single quotes are not allowed.
I'd first replace the single quotes (') with double quotes (") to get something like this:
{
"1": {
"manufacturername": "SVLJ",
"modelid": "TCL014",
"name": "Fling"
},
"10": {
"manufacturername": "SONY",
"modelid": "BLL4554",
"name": "ACQ"
}
}
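If the payload really is a Python dict repr rather than JSON (which the single quotes suggest), the standard library's ast.literal_eval parses it directly instead of rewriting quotes. A minimal sketch of that alternative:

import ast

raw = "{'1': {'manufacturername': 'SVLJ', 'modelid': 'TCL014', 'name': 'Fling'}}"
data = ast.literal_eval(raw)   # safely evaluates Python literal structures only
print(data['1']['name'])       # -> Fling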
I made a big mistake when I chose how to dump my data.
Now I have a text file that consists of:
{ "13234134": ["some", "strings", ...]}{"34545345": ["some", "strings", ...]} ...and so on
How can I read it into python?
edit:
I have tried json. When I manually add curly braces at the beginning and end of the file, I get "ValueError: Expecting property name:"; maybe the "13234134" string is invalid for json, and I do not know how to avoid it.
edit 1:
with open('new_file.txt', 'w') as outfile:
    for index, user_id in enumerate(users):
        json.dump(get_user_tweets(user_id), outfile)
It looks like what you have is an undelimited stream of JSON objects. As if you'd called json.dump over and over on the same file, or ''.join(json.dumps(…) for …). And, in fact, the first one is exactly what you did. :)
So, you're in luck. JSON is a self-delimiting format, which means you can read up to the end of the first JSON object, then read from there up to the end of the next JSON object, and so on. The raw_decode method essentially does the hard part.
There's no stdlib function that wraps it up, and I don't know of any library that does it, but it's actually very easy to do yourself:
import json

def loads_multi(s):
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(s):
        # raw_decode returns (object, end_index), not (end_index, object)
        obj, pos = decoder.raw_decode(s, pos)
        # skip any whitespace before the next object
        while pos < len(s) and s[pos].isspace():
            pos += 1
        yield obj
So, instead of doing this:
obj = json.loads(s)
do_stuff_with(obj)
… you do this:
for obj in loads_multi(s):
    do_stuff_with(obj)
Or, if you want to combine all the objects into one big list:
objs = list(loads_multi(s))
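Putting it together with the dump file from the question (do_stuff_with is a stand-in for your own processing):

with open('new_file.txt') as f:
    text = f.read()

for obj in loads_multi(text):
    do_stuff_with(obj)   # hypothetical handler for each decoded dict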
Consider simply rewriting it to something that is valid JSON. If indeed your bad data only contains the format you've shown (a series of JSON structures that are not comma-separated), then just add commas and square brackets:
import json

with open('/tmp/sto/junk.csv') as f:
    data = f.read()

print(data)

s = "[ {} ]".format(data.strip().replace("}{", "},{"))
print(s)

data = json.loads(s)
print(type(data))
Output:
{ "13234134": ["some", "strings"]}{"34545345": ["some", "strings", "like", "this"]}
[ { "13234134": ["some", "strings"]},{"34545345": ["some", "strings", "like", "this"]} ]
<class 'list'>