I am requesting URLs that return JSON objects in Python.
Some of these URLs return several JSON objects (i.e. several parts, each enclosed by these brackets: []).
When I tried to load these with json.loads() I got the error:
JSONDecodeError: Extra data: line 494 column 1 (char 50502)
Therefore I tried to split the JSON objects and write them into a list like this:
response = requests.get(url)
textinhalt = response.text
ref = textinhalt.rsplit('[')
tev = []
for line in ref:
    daten = json.loads(line[line.find(r"{"):line.rfind("}")+1])
    tev.append(daten)
But I get this error:
JSONDecodeError: Expecting ',' delimiter: line 10 column 29 (char 16011)
For example, here is the part of the JSON that causes the Extra Data error (line 494):
474. "originalbild" : {
475. "alt" : "",
476. "height" : "3601",
477. "quelle" : "",
478. "src" : "/imgs/65/2/7/6/2/8/0/4/IMG_5630-3a13bb38ae440652.jpeg",
479. "untertitel" : "",
480. "width" : "5401"
481. },
482. "b" : "",
483. "redakteur_bid" : "",
484. "redakteur_email" : "",
485. "redakteur_inline" : "0",
486. "redakteur_kategorie" : "1",
487. "redakteur_kuerzel" : "",
488. "redakteur_nachname" : "",
489. "redakteur_redaktion" : "",
490. "redakteur_vorname" : "",
491. "ressort" : "",
492. "seitennavigation_liste" : [
493. {
494. "_baseurl" : "/Macoun-2019-Von-SwiftUI-bis-NFC-4547400.html",
495. "artikelseite" : 1,
496. "container_id" : 4547400,
497. "titel" : "Macoun 2019: Von SwiftUI bis NFC"
498. },
499. {
500. "artikelseite" : 2,
501. "titel" : "Motion Capturing in ARKit und RealityKit"
502. },
503. {
504. "artikelseite" : 3,
505. "titel" : "Bring deine Tests zum Rennen"
506. },
507. {
508. "artikelseite" : 4,
509. "titel" : "Tipps f\u00fcr Existenzgr\u00fcnder"
510. },
511. {
512. "artikelseite" : 5,
513. "titel" : "Cross Platform Entwicklung mit Kotlin"
514. }
515. ],
516. "seo_beschreibung" : "",
517. "seo_no_meta_description" : 0,
518. "seo_titel" : "",
519. "show_webdev_chooser" : null,
520. "socialbookmarks_keywords_ph_data" : "Apple, Entwickler, Programmieren, iOS, macOS",
521. "speakingurl_primitive" : "Macoun-2019-Von-SwiftUI-bis-NFC",
522. "teaser_anrisstext" : "",
523. "teaser_titel" : "",
524. "teaser_untertitel" : "",
525. "texte_anzahl_zeichen" : [
526. 12499
527. ],
What am I doing wrong?
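For reference, a common way to handle a response body that contains several JSON documents back to back is to decode them one at a time with json.JSONDecoder.raw_decode rather than splitting on brackets. A minimal sketch, assuming url is defined as in the question:
import json
import requests

def load_all_json(text):
    """Decode every JSON document that appears back to back in a string."""
    decoder = json.JSONDecoder()
    idx = 0
    results = []
    while idx < len(text):
        # Skip whitespace between documents.
        if text[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the decoded object and the index where it ended.
        obj, idx = decoder.raw_decode(text, idx)
        results.append(obj)
    return results

response = requests.get(url)  # url as in the question
tev = load_all_json(response.text)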
I've exported Fitbit sleep data and got a JSON file with nested variables and dicts. I would like to convert the JSON file to a CSV file that displays all "regular" variables, e.g. "dateOfSleep", but also the nested variables, e.g. "deep" and "wake", with all their dictionary information.
I tried json_normalize, but I can only make it work for the first nested variable, e.g. "levels". Does anybody have an idea?
Much appreciated.
[{
"logId" : 32072056107,
"dateOfSleep" : "2021-05-08",
"startTime" : "2021-05-07T23:22:00.000",
"endTime" : "2021-05-08T08:05:30.000",
"duration" : 31380000,
"minutesToFallAsleep" : 0,
"minutesAsleep" : 468,
"minutesAwake" : 55,
"minutesAfterWakeup" : 0,
"timeInBed" : 523,
"efficiency" : 93,
"type" : "stages",
"infoCode" : 0,
"levels" : {
"summary" : {
"deep" : {
"count" : 5,
"minutes" : 85,
"thirtyDayAvgMinutes" : 68
},
"wake" : {
"count" : 30,
"minutes" : 55,
"thirtyDayAvgMinutes" : 56
},
"light" : {
"count" : 30,
"minutes" : 267,
"thirtyDayAvgMinutes" : 235
},
"rem" : {
"count" : 10,
"minutes" : 116,
"thirtyDayAvgMinutes" : 94
}
},
.....
Use pd.json_normalize; all nested levels are flattened and joined with dots (by default):
import pandas as pd
import json
with open('data.json') as fp:
    data = json.load(fp)
df = pd.json_normalize(data)
Output:
>>> df
logId dateOfSleep startTime ... levels.summary.rem.count levels.summary.rem.minutes levels.summary.rem.thirtyDayAvgMinutes
0 32072056107 2021-05-08 2021-05-07T23:22:00.000 ... 10 116 94
[1 rows x 25 columns]
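Since the stated goal is a CSV file, the flattened frame can then be written out directly (the output filename here is an assumption):
df.to_csv('sleep_data.csv', index=False)  # filename is an assumption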
Content of data.json:
[{
"logId" : 32072056107,
"dateOfSleep" : "2021-05-08",
"startTime" : "2021-05-07T23:22:00.000",
"endTime" : "2021-05-08T08:05:30.000",
"duration" : 31380000,
"minutesToFallAsleep" : 0,
"minutesAsleep" : 468,
"minutesAwake" : 55,
"minutesAfterWakeup" : 0,
"timeInBed" : 523,
"efficiency" : 93,
"type" : "stages",
"infoCode" : 0,
"levels" : {
"summary" : {
"deep" : {
"count" : 5,
"minutes" : 85,
"thirtyDayAvgMinutes" : 68
},
"wake" : {
"count" : 30,
"minutes" : 55,
"thirtyDayAvgMinutes" : 56
},
"light" : {
"count" : 30,
"minutes" : 267,
"thirtyDayAvgMinutes" : 235
},
"rem" : {
"count" : 10,
"minutes" : 116,
"thirtyDayAvgMinutes" : 94
}
}
}
}]
I have been working on an educational project. A small part of it requires me to convert a single line of JSON data, which I receive from Domoticz (an external open-source software), into a variable in Python 3. However, due to my skill level with JSON I have experienced some issues, and I am not exactly sure what I'm doing wrong. I did get the 200 response every time, so from what I understood that means the connection isn't the issue but rather the Python code. (I censored the addresses, but they are correct.)
The code I'm using:
import time
import re
import requests
from ctypes import c_int, c_char_p, byref, sizeof, c_uint16, c_int32, c_byte
from ctypes import c_void_p
from datetime import datetime
import os
import urllib.request
import json
import logging
import sys
from requests.exceptions import HTTPError
logger = logging.getLogger(__name__)
domoticzserver='ip'
switchid='3'
device='5'
tempbed=str(4)
def domoticzrequest(url):
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    return response.read()

import urllib.request, json

with urllib.request.urlopen("http://domoticzip/json.htm?type=devices&rid=4") as url:
    data = json.loads(url.read().decode())
print(data)
The JSON I get back, which I can see by clicking the URL:
{
"ActTime" : 1606722346,
"AstrTwilightEnd" : "18:37",
"AstrTwilightStart" : "06:23",
"CivTwilightEnd" : "17:14",
"CivTwilightStart" : "07:47",
"DayLength" : "08:08",
"NautTwilightEnd" : "17:56",
"NautTwilightStart" : "07:04",
"ServerTime" : "2020-11-30 08:45:46",
"SunAtSouth" : "12:30",
"Sunrise" : "08:26",
"Sunset" : "16:34",
"app_version" : "2020.2",
"result" :
[
{
"AddjMulti" : 1.0,
"AddjMulti2" : 1.0,
"AddjValue" : 0.0,
"AddjValue2" : 0.0,
"BatteryLevel" : 255,
"CustomImage" : 0,
"Data" : "Normal",
"Description" : "",
"Favorite" : 0,
"HardwareID" : 1,
"HardwareName" : "Domoticz Internal",
"HardwareType" : "Domoticz Internal interface",
"HardwareTypeVal" : 67,
"HaveDimmer" : false,
"HaveGroupCmd" : false,
"HaveTimeout" : false,
"ID" : "148702",
"LastUpdate" : "2020-10-19 15:10:02",
"MaxDimLevel" : 0,
"Name" : "Domoticz Security Panel",
"Notifications" : "false",
"PlanID" : "0",
"PlanIDs" :
[
0
],
"Protected" : false,
"ShowNotifications" : true,
"SignalLevel" : "-",
"Status" : "Normal",
"StrParam1" : "",
"StrParam2" : "",
"SubType" : "Security Panel",
"SwitchType" : "Security",
"SwitchTypeVal" : 0,
"Timers" : "false",
"Type" : "Security",
"TypeImg" : "security",
"Unit" : 0,
"Used" : 0,
"XOffset" : "0",
"YOffset" : "0",
"idx" : "2"
},
{
"AddjMulti" : 1.0,
"AddjMulti2" : 1.0,
"AddjValue" : 0.0,
"AddjValue2" : 0.0,
"BatteryLevel" : 255,
"CustomImage" : 0,
"Data" : "-5.0 C",
"Description" : "",
"Favorite" : 1,
"HardwareID" : 2,
"HardwareName" : "Test sensor",
"HardwareType" : "Dummy (Does nothing, use for virtual switches only)",
"HardwareTypeVal" : 15,
"HaveTimeout" : true,
"ID" : "14053",
"LastUpdate" : "2020-11-09 09:03:34",
"Name" : "Temperatuur Kachel",
"Notifications" : "false",
"PlanID" : "0",
"PlanIDs" :
[
0
],
"Protected" : false,
"ShowNotifications" : true,
"SignalLevel" : "-",
"SubType" : "LaCrosse TX3",
"Temp" : -5.0,
"Timers" : "false",
"Type" : "Temp",
"TypeImg" : "temperature",
"Unit" : 1,
"Used" : 1,
"XOffset" : "0",
"YOffset" : "0",
"idx" : "3",
"trend" : 0
},
{
"AddjMulti" : 1.0,
"AddjMulti2" : 1.0,
"AddjValue" : 0.0,
"AddjValue2" : 0.0,
"BatteryLevel" : 255,
"CustomImage" : 0,
"Data" : "17.5",
"Description" : "",
"Favorite" : 1,
"HardwareID" : 3,
"HardwareName" : "Test switch",
"HardwareType" : "Dummy (Does nothing, use for virtual switches only)",
"HardwareTypeVal" : 15,
"HaveTimeout" : true,
"ID" : "0014054",
"LastUpdate" : "2020-11-06 11:51:09",
"Name" : "Temperatuur gewenst",
"Notifications" : "false",
"PlanID" : "0",
"PlanIDs" :
[
0
],
"Protected" : false,
"SetPoint" : "17.5",
"ShowNotifications" : true,
"SignalLevel" : "-",
"SubType" : "SetPoint",
"Timers" : "false",
"Type" : "Thermostat",
"TypeImg" : "override_mini",
"Unit" : 1,
"Used" : 1,
"XOffset" : "0",
"YOffset" : "0",
"idx" : "4"
}
],
"status" : "OK",
"title" : "Devices"
}
Basically, I want SetPoint (in the last block of text right above this), which in this instance is 17.5, as a variable in Python, and I will make the Python code loop so it grabs that JSON URL every time to update the status of the SetPoint. But I'm having issues grabbing only the 17.5 to make it into a variable: I end up getting the entire JSON like this code is doing, or I end up getting everything past and including the SetPoint if I change some stuff. Does anyone know what I'm doing wrong and possibly where I should be looking for a solution? I am a bit inexperienced with the JSON part of Python, and I have no clue where to get started, as the codes I found and have tried seem to not work or give me errors.
Thank you very much for your time!
json.loads returns a Python dictionary, so maybe something like this would do:
result = data['result']  # data is the dict returned by json.loads in the question
set_point = 0.0
for res in result:
    if 'SetPoint' in res:
        set_point = res['SetPoint']
You are getting your data stored as a dictionary, data = {"key": argument}.
If you want to access a certain value you have to call for it. In your case:
SetPoint = float(data["result"][-1]["SetPoint"])
To break it down:
data["result"]   # gives you a list whose elements are dictionaries
[-1]             # takes the last element of that list, which contains the SetPoint
["SetPoint"]     # then fetches the SetPoint value, which is the string "17.5"
float(...)       # converts the string to a float value
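Putting the pieces together, a minimal polling loop might look like the sketch below; the URL is the placeholder from the question, and the 10-second interval is an assumption:
import json
import time
import urllib.request

while True:
    # Placeholder URL from the question; replace with the real Domoticz address.
    with urllib.request.urlopen("http://domoticzip/json.htm?type=devices&rid=4") as url:
        data = json.loads(url.read().decode())
    for res in data["result"]:
        if "SetPoint" in res:
            set_point = float(res["SetPoint"])  # e.g. "17.5" -> 17.5
            print(set_point)
    time.sleep(10)  # polling interval is an assumption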
I need to prepend a JSON object with "Users" : but I can't figure out how to handle the ":". The closest I've gotten is getting the colon within the quotes, and then it spits out an extra comma. Any ideas? So the issue is that the colon sits inside the quotes and it adds in a comma, which the API endpoint won't accept.
Here is what it should look like:
["users" : [{
"email": "hallbeth#placeholder.email",
"dataFields": {
"favoriteTomatoe": "Green Zebra",
"daysSinceLastOrder": "137",
"city": "Lake Michaelberg",
"firstName": "Richard",
"zip": "58570",
"lastName": "Tyler",
"age": "50",
"state": "UT",
"totalTomatoOrders": "23",
"streetAddress": "925 Holland Burgs Suite 652",
"phoneNumber": "+67(4)7940410189",
"gender": "male",
"customMessageOne": "Esse magnam voluptatibus id ex ipsam assumenda excepturi tenetur."
}
}]
And here is what the output looks like:
["users :", [{
"email": "hallbeth#placeholder.email",
"dataFields": {
"favoriteTomatoe": "Green Zebra",
"daysSinceLastOrder": "137",
"city": "Lake Michaelberg",
"firstName": "Richard",
"zip": "58570",
"lastName": "Tyler",
"age": "50",
"state": "UT",
"totalTomatoOrders": "23",
"streetAddress": "925 Holland Burgs Suite 652",
"phoneNumber": "+67(4)7940410189",
"gender": "male",
"customMessageOne": "Esse magnam voluptatibus id ex ipsam assumenda excepturi tenetur."
}
}]
Here is my code:
import requests
import json
import csv
import pdb
limit = 2
curVal = 0
user_list = []
user_list_2 = [
    ("users" + ' ' + ':')]
with open('john.csv', 'r') as csv_file:
    csv_file = csv.reader(csv_file)
    next(csv_file)
    for line in csv_file:
        user_list.append(
            [{
                "email" : line[2],
                "dataFields" : {
                    "firstName": line[0],
                    "lastName" : line[1],
                    "favoriteTomatoe" : line[3],
                    "totalTomatoOrders" : line[4],
                    "daysSinceLastOrder" : line[5],
                    "zip" : line[6],
                    "phoneNumber" : line[7],
                    "age" : line[8],
                    "streetAddress" : line[9],
                    "city" : line[10],
                    "state" : line[11],
                    "customMessageOne" : line[12],
                    "gender" : line[13]
                }
            }])
        if curVal == limit:
            body = json.dumps(user_list_2 + user_list)
            print(body)
            headers = {
                "Content-Type": "application/json",
                "Accept": "application/json"}
            res = requests.request("POST",
                "https://api.iterable.com/api/users/bulkUpdate?apiKey=<key>",  # API key censored
                headers=headers, data=body)
            curVal = 0
            user_list = []
            print(res.url + "\n\n" + str(res.status_code) + res.text)
        else:
            curVal = curVal + 1
It seems there are multiple misunderstandings in your code.
First, each user is added to user_list as a single-item list containing one user dictionary. You could skip the single-item-list level and simply append the dictionary:
user_list.append({
    "email" : line[2],
    "dataFields" : {
        "firstName": line[0],
        "lastName" : line[1],
        "favoriteTomatoe" : line[3],
        "totalTomatoOrders" : line[4],
        "daysSinceLastOrder" : line[5],
        "zip" : line[6],
        "phoneNumber" : line[7],
        "age" : line[8],
        "streetAddress" : line[9],
        "city" : line[10],
        "state" : line[11],
        "customMessageOne" : line[12],
        "gender" : line[13]
    }
})
Then you can consider a sort of mapping between Python and JSON types:
Python list = JSON array
Python dict = JSON object
So user_list can be interpreted as a JSON array, and if you want it assigned as the Users property of a JSON object, you just have to assign user_list as the value of a Python dict's Users key. Then passing that dict to the json.dumps function should return the wanted JSON data:
body = json.dumps({'Users': user_list})
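For example, with a single appended user the serialized body comes out in the wanted shape (values shortened for the sketch):
>>> import json
>>> json.dumps({'Users': [{'email': 'hallbeth#placeholder.email'}]})
'{"Users": [{"email": "hallbeth#placeholder.email"}]}'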
I am trying to use upsert to create a new badges record if it doesn't exist, and to append to the names array if it does. I have tried to follow Elasticsearch upserting and appending to array and the documentation, without success. So far I have:
es.update(index='.people',
          doc_type='badges',
          id=match['badgeNumber'],
          body={
              "script": {
                  "inline": "if(ctx._source.names.contains(nm)) {ctx.op = 'none'} else {ctx._source.names += params.nm}",
                  "lang": "painless",
                  "params": {
                      "nm": name
                  }
              },
              "upsert": {
                  "names": name
              }
          })
The code works fine to add new documents such as:
{
"_index" : ".people",
"_type" : "badges",
"_id" : "12345",
"_score" : 1.0,
"_source" : {
"names" : [
"John Smith"
]
}
},
{
"_index" : ".people",
"_type" : "badges",
"_id" : "7896",
"_score" : 1.0,
"_source" : {
"names" : [
"Amy Wexler"
]
}
}
but if I try to update the list:
match = {'badge': '12345'}
name = 'Johnny Smith'
update_names(name, match)
I get the error:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 525, in update
doc_type, id, '_update'), params=params, body=body)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
self._raise_error(response.status, raw_data)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'illegal_argument_exception', '[9wu5eiG][127.0.0.1:9300][indices:data/write/update[s]]')
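For reference, two details in the call above commonly produce this kind of illegal_argument_exception: the Painless script reads a bare nm where it should read params.nm, and the upsert document sets names to a plain string while the script treats it as a list. A sketch of the call with both adjusted (an assumption based on the Painless parameter rules, not a tested fix):
es.update(index='.people',
          doc_type='badges',
          id=match['badgeNumber'],
          body={
              "script": {
                  # Parameters must be referenced as params.<name> in Painless.
                  "inline": "if (ctx._source.names.contains(params.nm)) "
                            "{ ctx.op = 'none' } else "
                            "{ ctx._source.names.add(params.nm) }",
                  "lang": "painless",
                  "params": {"nm": name}
              },
              # Upsert a list so the script's list operations work on it later.
              "upsert": {"names": [name]}
          })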
output_file = open("processed.txt", "a")
with open('text.json') as json_file:
    object_list = json.load(json_file)
receive_txids = []
for object in object_list:
    if object['category'] == 'receive':
        receive_txids.append(object['txid'])
with open('list.txt', "r") as list_file:
    for txid in receive_txids:
        if txid not in list_file:
            print "for ea txid " + txid
            print "NEW TRANSACTION"
            output_file.write(txid + '\n')
That's my code.
This is the contents of list.txt:
db8cd79b4a0eafc6368bb65d2ff34d7d9c3d2016bee8528ab945a3d4bbad982d
2a93476e65b5500bc3d69856ddd512854d4939f1aabf488ac2806ec346a898a3
45b629a779e6e0bf6d160c37833a27f1f2cc1bfa34632d166cccae83e69eb6fe
bf5b7bc1aeaf7fb43f5d39b549278bee6665872bd74274dd5fad80d043002a3e
1e5f49fa1d0df059b9d7da8452cde9fb5a312c823401f5ed4ed4eafb5f98c1b0
7dc7cd4afcebaf8f17575be8b9acf06adcaadfe7fa5528453246307aa36e6ea0
aefdb89b461c118529bec78b35fed46cc5d7050b39902552fa2408361284c746
ec6abb67828c79cbf0b74131f0acfddc509efc9743bed0811d2316007cdcc482
text.json looks something like this:
[
{
"account" : "",
"address" : "D8xWhR8LqSdSLTxRWwouQ3EiSnvcjLmdo6",
"category" : "receive",
"amount" : 1000.00000000,
"confirmations" : 1963,
"blockhash" : "4569322b4c8c98fba3ef4c7bda91b53b4ee82d268eae2ff7658bc0d3753c00ff",
"blockindex" : 2,
"blocktime" : 1394242415,
"txid" : "45b629a779e6e0bf6d160c37833a27f1f2cc1bfa34632d166cccae83e69eb6fe",
"time" : 1394242265,
"timereceived" : 1394242265
},
{
"account" : "",
"address" : "D8xWhR8LqSdSLTxRWwouQ3EiSnvcjLmdo6",
"category" : "receive",
"amount" : 11.00000000,
"confirmations" : 1194,
"blockhash" : "eff3d32177bf19629fe0f8076807acbb02b34aedcbce1c27a19ce9872daecb7c",
"blockindex" : 6,
"blocktime" : 1394290663,
"txid" : "bf5b7bc1aeaf7fb43f5d39b549278bee6665872bd74274dd5fad80d043002a3e",
"time" : 1394290582,
"timereceived" : 1394290582
},
{
"account" : "",
"address" : "DKLMkLZmiSVXtEavDpQ4dasjZvC178QoM9",
"category" : "receive",
"amount" : 1.00000000,
"confirmations" : 1183,
"blockhash" : "7b5d3ebeb994dbff0940504db9e407bd90cad8a5a1ace05dcba4bc508ca27aff",
"blockindex" : 9,
"blocktime" : 1394291510,
"txid" : "1e5f49fa1d0df059b9d7da8452cde9fb5a312c823401f5ed4ed4eafb5f98c1b0",
"time" : 1394291510,
"timereceived" : 1394291578
},
{
"account" : "",
"address" : "DKLMkLZmiSVXtEavDpQ4dasjZvC178QoM9",
"category" : "receive",
"amount" : 1.00000000,
"confirmations" : 1179,
"blockhash" : "4d9bd6d2988bc749022c41d125f1134796aa314e0d0bde34eba855ad88e76a7f",
"blockindex" : 21,
"blocktime" : 1394291642,
"txid" : "7dc7cd4afcebaf8f17575be8b9acf06adcaadfe7fa5528453246307aa36e6ea0",
"time" : 1394291629,
"timereceived" : 1394291629
},
{
"account" : "",
"address" : "DKLMkLZmiSVXtEavDpQ4dasjZvC178QoM9",
"category" : "receive",
"amount" : 1.00000000,
"confirmations" : 1179,
"blockhash" : "4d9bd6d2988bc749022c41d125f1134796aa314e0d0bde34eba855ad88e76a7f",
"blockindex" : 20,
"blocktime" : 1394291642,
"txid" : "aefdb89b461c118529bec78b35fed46cc5d7050b39902552fa2408361284c746",
"time" : 1394291637,
"timereceived" : 1394291637
},
{
"account" : "",
"address" : "DKLMkLZmiSVXtEavDpQ4dasjZvC178QoM9",
"category" : "receive",
"amount" : 11.00000000,
"confirmations" : 34,
"blockhash" : "df34d9d44e87cd3315755d3e7794b10729fc3f5853c218ec237c43a89d918eb7",
"blockindex" : 5,
"blocktime" : 1394364125,
"txid" : "ec6abb67828c79cbf0b74131f0acfddc509efc9743bed0811d2316007cdcc482",
"time" : 1394348464,
"timereceived" : 1394348464
}
]
I cannot for the life of me find out why this isn't working. It's printing "NEW TRANSACTION" every time it iterates through.
I want to check, for each txid (transaction id) in text.json, whether it already exists in list.txt. If not, I want to write it to processed.txt.
Containment tests against a file object don't do what you expect: each line still ends in a newline, and the first in test reads the file to its end, so later tests see nothing. You'd need to read your text file properly.
It'd be easiest for you to put all transaction ids in a set, then remove all txids found in your list.txt using set operations. Whatever is left are the new transactions to write to the file:
with open('text.json') as json_file:
    receive_txids = {o['txid'] for o in json.load(json_file) if o['category'] == 'receive'}
with open('list.txt', "r") as list_file:
    receive_txids -= {l.strip() for l in list_file}
with open('processed.txt', "w") as output_file:
    for txid in receive_txids:
        output_file.write(txid + '\n')
If you needed access to the original JSON objects still, use a dictionary with the txid as the key, then remove all elements from the dictionary found in the file:
with open('text.json') as json_file:
    receive_txids = {o['txid']: o for o in json.load(json_file) if o['category'] == 'receive'}
with open('list.txt', "r") as list_file:
    for line in list_file:
        txid = line.strip()
        if txid in receive_txids:
            del receive_txids[txid]
The open() function returns an iterator, not a list, so you can use the in operator only once. Repeatedly using in on the same iterator only checks an empty sequence, because the file has already reached its end.
Furthermore, each line still contains the trailing line-separator characters, so you should use strip('\n\r') to get rid of them.
And to quickly check whether an item is in a collection, you should use a set.
Something like this should work:
transaction_ids = set()
with open('list.txt', 'r') as list_file:
    for line in list_file:
        transaction_ids.add(line.rstrip('\n\r'))
for txid in receive_txids:
    if txid not in transaction_ids:
        print "NEW TRANSACTION"