I am running Python 3 code that consumes JSON data from a source I don't control. While reading the JSON data I get the following error:
simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2
Here is the code:
import logging
import simplejson as json
from kafka import KafkaConsumer

logging.basicConfig(level=logging.INFO)
consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    api_version=(1,0,0))
consumer.subscribe(['Test_Topic-1'])
for message in consumer:
    msg_str=message.value
    y = json.loads(msg_str)
    print(y["city_name"])
Since I cannot change the source, I need to fix this on my end. I found this post helpful, because my data contains timestamps with : in them: How to Fix JSON Key Values without double-quotes?
But it still fails for some values in my JSON data, because those values also contain : characters, e.g.
address:"1600:3050:rf02:hf64:h000:0000:345e:d321"
Is there a way to add double quotes to the keys in my JSON data?
You can try the dirtyjson module, which can fix some mistakes.
import dirtyjson
d = dirtyjson.loads('{address:"1600:3050:rf02:hf64:h000:0000:345e:d321"}')
print( d['address'] )
d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')
print( d['abc'] )
It creates an AttributedDict, so you may need dict() to get a normal dictionary:
d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')
print( d )
print( dict(d) )
Result:
AttributedDict([('abc', '1:2:3:4'), ('efg', '5:6:7:8'), ('hij', 'foo')])
{'abc': '1:2:3:4', 'efg': '5:6:7:8', 'hij': 'foo'}
I think your problem is that you have strings like this:
{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}
which are not valid JSON. You could try to repair it with a regular expression substitution:
import re
jtxt_bad ='{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"\n}'
jtxt = re.sub(r'\b([a-zA-Z]+):("[^"]+"[,\n}])', r'"\1":\2', jtxt_bad)
print(f'Original: {jtxt_bad}\nRepaired: {jtxt}')
The output of this is:
Original: {abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"
}
Repaired: {"abc":"1:2:3:4", "efg":"5:6:7:8", "hij":"foo", "klm":"bar"
}
The regular expression \b([a-zA-Z]+):("[^"]+"[,\n}]) means: a word boundary, followed by one or more letters, followed by a :, followed by a double-quoted string, followed by one of ,, \n or }. However, this will fail if there is an escaped quote inside the string, such as "1:\"2:3". It also only matches keys made up purely of letters.
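One way around the escaped-quote limitation is to let the regular expression consume whole string literals first, so nothing inside a string is ever mistaken for a key. This is only a sketch: the helper name quote_bare_keys and the sample text are made up for illustration, and it assumes that bare keys look like identifiers (letters, digits, underscores) and that values are otherwise valid JSON.

```python
import json
import re

# Match either a complete double-quoted string (backslash escapes allowed)
# or a bare identifier that is followed by a colon. The string alternative
# comes first, so text inside a string is consumed and never seen as a key.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|([A-Za-z_][A-Za-z0-9_]*)(?=\s*:)')

def quote_bare_keys(text):
    """Wrap unquoted object keys in double quotes (hypothetical helper)."""
    def fix(match):
        key = match.group(1)
        # group(1) is only set when the bare-identifier alternative matched
        return f'"{key}"' if key is not None else match.group(0)
    return TOKEN.sub(fix, text)

bad = '{abc:"1:\\"2:3", efg:"5:6:7:8", "hij":"foo", address:"1600:3050:rf02"}'
repaired = quote_bare_keys(bad)
data = json.loads(repaired)
print(data["abc"], data["address"])
```

Keys that are already quoted are left alone, because they are matched as ordinary strings.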
I have a JSON file which is inserted into a SQLite database.
After inserting, all non-breaking spaces are automatically converted to whitespace, which is good!
The JSON file looks like: [{'john' : "6\u00a0500\u00a0\u20ac" , 'dams' : "7\u00a0500\u00a0\u20ac"}, {'john' : "10\u00a0900\u00a0\u20ac" , 'dams' : "13\u00a0980\u00a0\u20ac"}]
sqlite file looks like:
My goal is to remove the whitespace and the '€', and cast the value to an integer.
I used trim, ltrim, rtrim, replace, and combinations of trim and replace to remove the whitespace, but it doesn't work.
First off, I would suggest that you ensure that you're using double quotes throughout your JSON files. This is the standard for JSON syntax and moreover, not having things consistent will cause more of a headache later on.
With that out of the way, here's my solution:
import json

# jsonFile is the path to your JSON file (with double-quoted keys)
with open(jsonFile, "r") as file:
    jsonRecords = json.load(file)

cleanJsonLines = []
for jsonDict in jsonRecords:
    for key in jsonDict:
        almostCleanJson = jsonDict[key].replace("\u00a0", "")
        cleanJson = almostCleanJson.replace("\u20ac", "")
        cleanJsonLines.append({key: cleanJson})
print(cleanJsonLines)
Output for the sample data in the question:
[{'john': '6500'}, {'dams': '7500'}, {'john': '10900'}, {'dams': '13980'}]
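The question also asks for integers, and the cleaned values above are still strings. A minimal sketch of the cast, using the sample records from the question inline; the helper name euro_to_int is made up, and it assumes every value is a whole number of euros with no decimal part:

```python
def euro_to_int(value):
    """Strip non-breaking spaces, plain spaces and the euro sign, then cast."""
    cleaned = (
        value.replace("\u00a0", "")   # non-breaking space
             .replace(" ", "")        # ordinary space, in case it was converted
             .replace("\u20ac", "")   # euro sign
    )
    return int(cleaned)

records = [
    {"john": "6\u00a0500\u00a0\u20ac", "dams": "7\u00a0500\u00a0\u20ac"},
    {"john": "10\u00a0900\u00a0\u20ac", "dams": "13\u00a0980\u00a0\u20ac"},
]
converted = [{key: euro_to_int(value) for key, value in record.items()}
             for record in records]
print(converted)
```

This keeps one dictionary per record, which is probably easier to insert into a table row by row.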
I want to convert a string into a dictionary. I saved this dictionary previously in a text file.
The problem is that I am not sure how the keys are structured. The values were generated with Counter(dictionaryName). The dictionary is really large, so I cannot check every key to see what it might contain.
The keys can contain single quotes ', double quotes ", commas and maybe other characters. So is there any way to convert it back into a dictionary?
For example this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found previous solutions using, for example, json, but I have problems with the double quotes, and I cannot simply split on the commas.
If you trust the source, run from collections import Counter and then eval() the string.
How about something like:
>>> from collections import Counter
>>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>>> D = eval(line)
>>> D
Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
from collections import Counter

def parse_Counter_string(s):
    s = s.strip()
    if not (s.startswith('Counter(') and s.endswith(')')):
        raise ValueError('String does not match expected format')
    # Counter( is 8 characters
    # 12345678
    s = s[8:-1]
    return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
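For example, JSON round-trips this kind of data without any quoting trouble, because json takes care of escaping quotes and commas inside the keys. A small sketch, assuming all keys are strings and all counts are integers (the sample Counter is the one from the question):

```python
import json
from collections import Counter

counts = Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})

# Counter is a dict subclass, so json can serialize it directly.
text = json.dumps(counts)

# json.loads returns a plain dict; wrap it to get a Counter back.
restored = Counter(json.loads(text))
print(restored == counts)
```

To store it in a file, json.dump(counts, f) and json.load(f) work the same way on an open file object.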
You can use the demjson library for this. If the text is directly in your program:
import demjson
counter = demjson.decode("enter your text here")
If it is in a file, you can do the following:
from os.path import dirname, join, realpath
WD = dirname(realpath(__file__))
with open(join(WD, "filename"), "r") as file:
    counter = demjson.decode(file.read())
I am receiving data over a socket, a bunch of JSON strings. However, I receive a set amount of bytes, so sometimes the last of my JSON strings is cut-off. I will typically get the following:
{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}
{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}
{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}
{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}
{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}
{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}
{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}
{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}
{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}
{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}
{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}
{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}
{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}
{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}
{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}
{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}
{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}
{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}
{"pitch":-30.816765,"yaw":-125
With Python, I would like to create a string array of the first 18 complete { data... } strings.
Here is what I have tried: cleanData = re.search('{.*}', data) but it seems like this is only giving me the very first { data... } entry. How can I get the full string array of complete { } sets?
To get all, you can use re.finditer or re.findall.
>>> re.findall(r'{.*}', s)
['{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}', '{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}', '{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}', '{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}', '{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}', '{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}', '{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}', '{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}', '{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}', '{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}', '{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}', '{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}', '{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}', '{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}', '{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}', '{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}', '{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}', '{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}']
>>>
OR
>>> [x.group() for x in re.finditer(r'{.*}', s)]
['{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}', '{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}', '{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}', '{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}', '{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}', '{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}', '{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}', '{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}', '{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}', '{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}', '{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}', '{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}', '{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}', '{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}', '{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}', '{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}', '{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}', '{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}']
>>>
You need re.findall() (or re.finditer):
>>> import re
>>> for r in re.findall(r'{.*}', data)[:18]:
...     print(r)
...
{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}
{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}
{"pitch":-31.574106,"yaw":-124.65623,"roll":-7.911794}
{"pitch":-30.479567,"yaw":-124.24301,"roll":-8.730827}
{"pitch":-29.30239,"yaw":-123.97949,"roll":-8.134723}
{"pitch":-29.84712,"yaw":-124.584465,"roll":-8.588374}
{"pitch":-31.072054,"yaw":-124.707466,"roll":-8.877062}
{"pitch":-31.493435,"yaw":-124.75457,"roll":-9.019922}
{"pitch":-29.591925,"yaw":-124.960815,"roll":-9.379437}
{"pitch":-29.37105,"yaw":-125.14427,"roll":-9.642341}
{"pitch":-29.483717,"yaw":-125.16528,"roll":-9.687177}
{"pitch":-30.903332,"yaw":-124.603935,"roll":-9.423098}
{"pitch":-30.211857,"yaw":-124.471664,"roll":-9.116135}
{"pitch":-30.837414,"yaw":-125.18984,"roll":-9.824204}
{"pitch":-30.526165,"yaw":-124.85788,"roll":-9.158611}
{"pitch":-30.333513,"yaw":-123.68705,"roll":-7.9481263}
{"pitch":-30.903502,"yaw":-123.78847,"roll":-8.209373}
{"pitch":-31.194769,"yaw":-124.79708,"roll":-8.709783}
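Note that {.*} only returns one match per object here because . does not match a newline and each object sits on its own line. If the objects ever arrive without newlines between them, the greedy .* would run from the first { to the last }. A sketch of a stricter pattern, assuming flat objects with no nested braces and no braces inside string values (the sample data is shortened from the question and joined without newlines on purpose):

```python
import json
import re

# Two complete objects followed by a truncated one, with no separators.
data = (
    '{"pitch":-30.778193,"yaw":-124.63285,"roll":-8.977466}'
    '{"pitch":-30.856342,"yaw":-124.57556,"roll":-7.7220345}'
    '{"pitch":-30.816765,"yaw":-125'
)

# [^{}]* cannot cross a brace, so each match is exactly one complete object
# and the truncated tail (which has no closing brace) is skipped.
complete = re.findall(r'\{[^{}]*\}', data)
records = [json.loads(s) for s in complete]
print(len(records), records[0]["pitch"])
```

Parsing each match with json.loads also confirms that what was extracted really is a complete object.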
Extracting lines that start and end with a specific character can be done without any regex; use the str.startswith and str.endswith methods when iterating through the lines in a file:
results = []
with open(filepath, 'r') as f:
    for line in f:
        if line.startswith('{') and line.rstrip('\n').endswith('}'):
            results.append(line.rstrip('\n'))
Note the .rstrip('\n') is used before .endswith to make sure the final newline does not interfere with the } check at the end of the string.
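Since the data in the question comes from a socket in fixed-size chunks rather than from a file, the same idea can be applied with a small buffer: keep whatever follows the last newline and prepend it to the next chunk, so the cut-off object is completed instead of thrown away. This is a sketch only; split_complete_lines is a made-up name, it assumes the sender ends every object with a newline, and the chunks are assumed to be decoded to str already.

```python
import json

def split_complete_lines(buffer, chunk):
    """Return (parsed complete objects, leftover partial text)."""
    buffer += chunk
    # Everything before the last newline is complete; the rest is a partial
    # object (or an empty string if the chunk ended exactly on a newline).
    *complete, remainder = buffer.split("\n")
    records = [json.loads(line) for line in complete if line.strip()]
    return records, remainder

# Two chunks as they might come out of successive recv() calls.
chunk1 = '{"pitch":1.0,"yaw":2.0,"roll":3.0}\n{"pitch":4.0,"ya'
chunk2 = 'w":5.0,"roll":6.0}\n'

records1, leftover = split_complete_lines("", chunk1)
records2, leftover2 = split_complete_lines(leftover, chunk2)
print(records1, records2)
```

In a receive loop you would pass the leftover from one call into the next call, together with the newly received chunk.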
So I have a bunch of lines of code like these in a row in my program:
str = str.replace('ten', '10s')
str = str.replace('twy', '20s')
str = str.replace('fy', '40s')
...
I want to make it so that I don't have to manually open my source file to add new cases, for example ('sy', '70'). I know I have to put all of these in a function somehow, but I'd like to map cases that are not in my "mapper lib" from the command line. A configuration file, maybe? How?
Thanks!
You could use a config file in JSON format like this:
[
["ten", "10s"],
["twy", "20s"],
["fy", "40s"]
]
Save it as 'replacements.json' and then use it this way:
import json

with open('replacements.json') as i:
    replacements = json.load(i)
text = 'ten, twy, fy'
for r in replacements:
    text = text.replace(r[0], r[1])
Then, when you need to change the values, just edit the replacements.json file without touching any Python code.
The format for your replacements file could be anything, but JSON is easy to use and edit.
A simple solution could be to put those in a file, read them in your program, and do your replacements in a loop.
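Since the question also mentions the command line, extra pairs could be accepted there and appended to whatever was loaded from the file. A sketch using argparse from the standard library; the --map option name and the OLD=NEW syntax are made up for this example, and the base list is written inline instead of being read from replacements.json so the snippet runs on its own:

```python
import argparse

def parse_pair(arg):
    """Turn 'OLD=NEW' into an (old, new) tuple."""
    old, sep, new = arg.partition("=")
    if not sep or not old:
        raise argparse.ArgumentTypeError(f"expected OLD=NEW, got {arg!r}")
    return old, new

def build_parser():
    parser = argparse.ArgumentParser()
    # --map may be given several times; each use adds one pair to args.extra
    parser.add_argument("--map", dest="extra", action="append",
                        type=parse_pair, default=[])
    return parser

def apply_replacements(text, replacements):
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# Equivalent to running: python script.py --map sy=70s
args = build_parser().parse_args(["--map", "sy=70s"])
replacements = [("ten", "10s"), ("twy", "20s"), ("fy", "40s")] + args.extra
result = apply_replacements("ten, twy, fy, sy", replacements)
print(result)
```

In a real script you would call parse_args() with no arguments so that it reads sys.argv.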
There are many ways to do this. If the mappings rarely change, you could consider a Python dict:
mappings = {
    'ten': '10s',
    'twy': '20s',
    'fy': '40s',
}

def replace(str_):
    for s, r in mappings.items():
        # str.replace returns a new string, so the result must be reassigned
        str_ = str_.replace(s, r)
    return str_
Alternatively, store them in a text file (make sure you use a safe delimiter that isn't used in any of the keys!):
mappings.txt
ten|10s
twy|20s
fy|40s
And the Python part:
mappings = {}
with open('mappings.txt') as f:
    for line in f:
        # strip the trailing newline so it doesn't end up in the value
        k, v = line.rstrip('\n').split('|', 1)
        mappings[k] = v
And use the replace from above :)
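One thing to keep in mind with a chain of str.replace calls is that a later replacement can match text produced by an earlier one. If that matters for your data, a single pass with re.sub avoids it. This is a different technique from the loop above, shown as a sketch with the same mappings; replace_all is a made-up name.

```python
import re

mappings = {"ten": "10s", "twy": "20s", "fy": "40s"}

# Build one alternation of all keys. Longer keys go first so that a key
# which is a prefix of another key cannot shadow it.
pattern = re.compile(
    "|".join(re.escape(key) for key in sorted(mappings, key=len, reverse=True))
)

def replace_all(text):
    """Replace every key in one pass; replaced text is never rescanned."""
    return pattern.sub(lambda match: mappings[match.group(0)], text)

print(replace_all("ten, twy, fy"))
```

re.escape is there so that keys containing regex metacharacters are treated literally.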
You could use csv to store the replacements in a human-editable form in a file:
import csv

# in Python 3, open csv files in text mode with newline=''
with open('replacements.csv', newline='') as f:
    replacements = list(csv.reader(f))
for old, new in replacements:
    your_string = your_string.replace(old, new)
where replacements.csv contains:
ten,10s
twy,20s
fy,40s
It avoids unnecessary markup such as ", [] in the JSON format, and, unlike the plain text format from @WoLpH's answer, allows the delimiter (,) to be present in a string itself.
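To illustrate the point about the delimiter: the csv module lets a field contain a comma as long as the field is quoted. A small sketch that reads from an in-memory string instead of replacements.csv so it runs on its own; the second row is made up for the example.

```python
import csv
import io

# The second row has commas inside both fields, so the fields are quoted.
sample = 'ten,10s\n"a,b","x,y"\n'

rows = list(csv.reader(io.StringIO(sample)))
print(rows)

text = "ten and a,b"
for old, new in rows:
    text = text.replace(old, new)
print(text)
```

Quotes are only needed on the rows that actually contain the delimiter, so simple files stay as plain as the one above.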