How to find all dictionaries from a long string in python

I am trying to retrieve all JSON like dictionaries from a long string.
For example,
{"uri": "something"} is referencing {"link": "www.aurl.com"}
I want to get {"uri": "something"} and {"link": "www.aurl.com"} as result. Is there a way to do this through regex in python?

Probably the "nicest" way to do this is to let a real JSON decoder do the work, not using horrible regexes. Find all open braces as "possible object start points", then try to parse them with JSONDecoder's raw_decode method (which returns the object parsed and number of characters consumed on success making it possible to skip successfully parsed objects efficiently). For example:
import json
def get_all_json(teststr):
    decoder = json.JSONDecoder()
    # Find first possible JSON object start point
    sliceat = teststr.find('{')
    while sliceat != -1:
        # Slice off the non-object prefix
        teststr = teststr[sliceat:]
        try:
            # See if we can parse it as a JSON object
            obj, consumed = decoder.raw_decode(teststr)
        except ValueError:
            # If we couldn't (json.JSONDecodeError is a ValueError), find the next open brace to try again
            sliceat = teststr.find('{', 1)
        else:
            # If we could, yield the parsed object and skip the text it was parsed from
            yield obj
            # Resume at the next open brace so stray numbers or lists in between are not parsed
            sliceat = teststr.find('{', consumed)
This is a generator function, so you can either iterate the objects one by one e.g. for obj in get_all_json(mystr): or if you need them all at once for indexing, iterating multiple times or the like, all_objs = list(get_all_json(mystr)).
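As a quick check, here is a minimal, self-contained sketch of the same idea applied to the string from the question. It is a variation rather than the answer's exact code: the name iter_json_objects is made up for illustration, and it passes a start index as the second argument to raw_decode (which CPython's implementation accepts) instead of re-slicing the string on every step.

```python
import json

def iter_json_objects(text):
    """Yield every JSON object that starts at a '{' somewhere in text."""
    decoder = json.JSONDecoder()
    pos = text.find('{')
    while pos != -1:
        try:
            # raw_decode returns the parsed value and the index where it ended
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace: move on to the next one
            pos = text.find('{', pos + 1)
        else:
            yield obj
            # Skip the text that was just parsed
            pos = text.find('{', end)

sample = '{"uri": "something"} is referencing {"link": "www.aurl.com"}'
found = list(iter_json_objects(sample))
print(found)  # [{'uri': 'something'}, {'link': 'www.aurl.com'}]
```

Nested objects come back as part of their enclosing object, because the search resumes after the end of each successful parse.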

Related

Is there a way to search for a string and copy text in front until it reaches a comma?

I am new to python and wanted to store the recentAveragePrice inside a variable (from a string like this one)
{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}
My current solution is this:
var = sampleStr[78] + sampleStr[79] + sampleStr[80]
It works for the current string, but if the recentAveragePrice were above 999 it would stop working. I was wondering whether, instead of reading fixed positions, I could search for the value inside the string.
Your replit code shows that you're acquiring JSON data from some website. Here's an example based on the URL that you're using. It shows how you check the response status, acquire the JSON data as a Python dictionary then print a value associated with a particular key. If the key is missing, it will print None:
import requests
(r := requests.get('https://economy.roblox.com/v1/assets/10159617728/resale-data')).raise_for_status()
jdata = r.json()
print(jdata.get('recentAveragePrice'))
Output:
640
Since this is JSON, you should just be able to parse it and access recentAveragePrice:
import json
sample_string = '''{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}'''
data = json.loads(sample_string)
recent_price = data['recentAveragePrice']
print(recent_price)
outputs:
731
Your data is in a popular format called JSON (JavaScript Object Notation). It's commonly used to exchange data between different systems like a server and a client, or a Python program and JavaScript program.
Now Python doesn't use JSON per se, but it has a data type called a dictionary that maps naturally onto a JSON object. You can access elements of a dictionary as simply as:
print(my_dictionary["recentAveragePrice"])
Python has a built-in library meant specifically to handle JSON data, and it includes a function called loads() that can convert a string into a Python dictionary. We'll use that.
Finally, putting all that together, here is a more robust program to help parse your string and pick out the data you need. Dictionaries can do a lot more cool stuff, so make sure you take a look at the links above.
# import the JSON library
# specifically, we import the `loads()` function, which will convert a JSON string into a Python object
from json import loads
# let's store your string in a variable
original_string = """
{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}
"""
# convert the string into a dictionary object
dictionary_object = loads(original_string)
# access the element you need
print(dictionary_object["recentAveragePrice"])
Output upon running this program:
$ python exp.py
731
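Once the string is parsed, the nested lists and dictionaries can be walked in the same way. The following is a minimal sketch, using a shortened copy of the string from the question (only three price points are kept and volumeDataPoints is left out, purely for readability); the variable names are illustrative.

```python
import json

# Shortened copy of the question's string (shortened here for brevity)
sample = (
    '{"sales":250694,"recentAveragePrice":731,"originalPrice":null,'
    '"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},'
    '{"value":592,"date":"2022-08-10T05:00:00Z"},'
    '{"value":443,"date":"2022-08-09T05:00:00Z"}]}'
)

data = json.loads(sample)

recent_price = data["recentAveragePrice"]   # an int, however many digits it has
original_price = data.get("originalPrice")  # JSON null becomes Python None
missing = data.get("noSuchKey", 0)          # .get() returns the default instead of raising KeyError

# priceDataPoints is a list of dicts: collect the values and find the highest one
values = [point["value"] for point in data["priceDataPoints"]]
highest = max(data["priceDataPoints"], key=lambda point: point["value"])

print(recent_price, original_price, missing)  # 731 None 0
print(values)                                 # [661, 592, 443]
print(highest["date"])                        # 2022-08-11T05:00:00Z
```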

Python: Json.load gives list and can't parse data from it

I have a data.json file, which looks like this:
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
I am trying to get "Event" from this file using python and miserably failing at this.
with open('data.json', 'r') as json_file:
    data = json.load(json_file)
print(data['Event'])
I get the following error:
TypeError: list indices must be integers or slices, not str
And even when I try
print (data[0]['Event'])
then I get this error:
TypeError: string indices must be integers
One more thing:
print(type(data))
gives me "list"
I have searched all over and have not found a solution to this. I would really appreciate your suggestions.
You could use the ast module for this:
import ast
mydata = ["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
data = ast.literal_eval(mydata[0])
data
{'Day': 'Today', 'Event': '1', 'Date': '2019-03-20'}
data['Event']
'1'
Edit
Your original code does load the data into a list, but that list contains a single string entry rather than a dict, even though the file is syntactically valid JSON. ast.literal_eval, like json.loads, will parse that string entry into a Python dict (json.loads is the safer choice in general, because literal_eval does not understand JSON's true, false and null).
As it sits, indexing that list with a string is not the same as looking up a key in a dict, hence the error saying that list indices cannot be str:
alist = [{'a':1, 'b':2, 'c':3}]
alist['a']
TypeError
# need to grab the dict entry from the list
adict = alist[0]
adict['a']
1
You need to convert the elements in data to dict using json module.
Ex:
import json
with open(filename) as infile:
    data = json.load(infile)
for d in data:
    print(json.loads(d)['Event'])
Or:
data = list(map(json.loads, data))
print(data[0]["Event"])
Output:
1
Your problem is that you are parsing it as a list that consists of a single element that is a string.
["{\"Day\":\"Today\",\"Event\":\"1\", \"Date\":\"2019-03-20\"}"]
See how the entire content of the list is surrounded by " on either side and every other " is preceded by a \? The backslash is an escape character: it tells the parser to treat the quote that follows as a literal character inside the string rather than as the end of the string.
If you have control over the file's contents, the easiest solution would be to adjust it. You will want it to be in a format like this:
[{"Day":"Today", "Event": "1", "Date": "2019-03-20"}]
Edit: As others have suggested, you can also parse it in its current state. Cleaning the data is tedious but often worth the effort, though this may not be one of those cases. I'm leaving this answer up anyway because it may help explain why the OP's initial attempt did not work and why they received those error messages.
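A file like this is typically produced when each record is passed through json.dumps before the list itself is dumped. That is only a guess at the cause, since the question does not show how data.json was written. The sketch below reproduces the double encoding in memory, decodes it in two steps, and shows the single-encoding alternative; the variable names are illustrative.

```python
import json

record = {"Day": "Today", "Event": "1", "Date": "2019-03-20"}

# Double encoding: the record is turned into a string first, then the list is dumped
double_encoded = json.dumps([json.dumps(record)])

# Reading it back therefore takes two decoding steps
outer = json.loads(double_encoded)  # a list containing one str
inner = json.loads(outer[0])        # the actual dict
print(type(outer[0]).__name__, inner["Event"])  # str 1

# Single encoding: dump the list of dicts directly
single_encoded = json.dumps([record])
print(json.loads(single_encoded)[0]["Event"])   # 1
```

If the writer of the file can be changed, the single-encoding form lets the original data[0]['Event'] lookup work without any extra decoding.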

Python List to JSON

I am trying to convert the output of my function which returns a list into a JSON object.
The function outputs the following list: [b'E28011600000208', b'E28023232083', b'3000948484']
I would like to create a JSON object that has the following attributes:
{"tag": ["E28011600000208", "E28023232083", "3000948484"]}
Decoding a list item was not shown in the similar example, so I need help with that, if that's the right approach to solving this problem.
The function that I am calling is as follows :
reader.read(timeout=500)
Performs a synchronous read, and then returns a list of TagReadData objects resulting from the search. If no tags were found then the list will be empty.
For example:
print(reader.read())
[b'E2002047381502180820C296', b'0000000000000000C0002403']
In my code I have done the following:
tags = reader.read()
data = json.dumps({tag: tags}, separator=(',','b'))
print (data)
I get the error:
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'3000948484' is not JSON serializable
I tried the solution below to remove the byte string my code is as follows:
tags = reader.read()
tags = list(map(lambda x:x.decode('utf-8'),tags))
data = json.dumps({'tag':tags})
print(data)
I get the error:
AttributeError: 'mercury.TagReadData' object has no attribute 'decode'
The output is now JSON but I still have the b' string in my JSON file. I have the following code:
tag = list(map(lambda x: str(x), tag))
data = json.dumps({'tag': tag})
print(data)
The code outputs the following:
{"tag": ["b'30000000321'", "b'300000000'"]}
How do I go about removing the b? Calling str(x) in Python 3.5 was supposed to decode the bytes, but it didn't.
A Python dict must have unique keys, so repeating a key will not work: the dict will hold just one value for it. Keeping a single key, tag, with a list as its value should work.
Having said that, json.dumps cannot serialize a dict that contains objects JSON has no representation for, such as bytes or set. In your example the list holds byte strings, and that is what causes the serialization error.
If you don't need to keep the byte strings, you can decode them as UTF-8, which converts them to ordinary strings:
import json
l = [b'E28011600000208', b'E28023232083', b'3000948484']
# decode each bytes item to str so that json can serialize it
l = list(map(lambda x: x.decode('utf-8'), l))
data = json.dumps({'tag': l})
print(data)
# Out: {"tag": ["E28011600000208", "E28023232083", "3000948484"]}
But if you want to keep the byte strings, look at how to handle them while serializing and deserializing the object with a custom JSON encoder class, passed via the cls parameter:
json.dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
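A minimal sketch of such an encoder is shown here. The class name BytesEncoder is hypothetical, and it assumes the byte strings are valid UTF-8. It overrides JSONEncoder.default, which json calls for any object it cannot serialize on its own.

```python
import json

class BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes bytes objects as UTF-8 text."""

    def default(self, o):
        if isinstance(o, bytes):
            return o.decode("utf-8")
        # Anything else: defer to the base class, which raises TypeError
        return super().default(o)

tags = [b'E28011600000208', b'E28023232083', b'3000948484']
data = json.dumps({"tag": tags}, cls=BytesEncoder)
print(data)  # {"tag": ["E28011600000208", "E28023232083", "3000948484"]}
```

Note that the values come back as str when the JSON is loaded again; turning them back into bytes would need a matching step on the decoding side.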
pickle is another useful library for packing and unpacking data, and it handles most Python objects. However, it returns the packed data as a byte string, which is not useful if you need to share it with a third-party application. It is very useful when caching arbitrary Python objects, for example in memcached.
pickle.loads(pickle.dumps(l))

django stream a dictionary multiple times w/o breaking JSON

I want to stream a response using Django. In order to do so,
I have a view like this:
def flights(request, query):
    req_obj = Flights.RequestObject(query)
    return StreamingHttpResponse(req_obj.make_response(), content_type="application/json")
In order to produce the data for the stream, I have a generator function "make_response",
which is a method of the class "Flights", instantiated as "req_obj".
The generator function yields, at particular moments, a pure python dictionary.
def make_response(self):
    for _ in range(0,3):
        yield some_dict
        time.sleep(1)
This results in the following behaviour:
after the first yield, the JSON content returned is valid;
after the second (and following) yields, the JSON content returned is NOT valid.
If the dictionary returned is something like this:
{"data": "some_data"}
then after the second yield, the response the user receives is:
{"data": "some_data"}{"data": "some_data"}
which is NOT valid JSON.
Any suggestions on how to solve this problem?
Have you tried something like
req_obj.update(req_obj.make_response())
which will update your initial dict with the values newly yielded from your method?
The solution is on the client side (e.g. the browser). You need to interpret the JSON result as it arrives. By definition of a stream, you never know when it has finished, so every time you receive some data on the client side, you need to interpret it as a self-contained JSON message.
So in your case, the client would probably need to append the received JSON dictionary to an array.
You might need to manually "start" and "end" the json output. Example:
# assumes `import json` and `import time` at the top of the module
def make_response(self):
    yield '['
    for i in range(0, 3):
        if i:
            # separate items with a comma; a trailing comma before ']'
            # would NOT be valid JSON, so none is emitted after the last item
            yield ','
        # yield serialized text rather than the dict itself
        yield json.dumps(some_dict)
        time.sleep(1)
    yield ']'
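An alternative that avoids the bracket and comma bookkeeping is newline-delimited JSON: every chunk is one complete JSON document followed by a newline, and the client parses the body line by line. This is a different technique from the bracketed array above, sketched here in plain Python so that it runs without Django. The function names ndjson_chunks and parse_ndjson are made up for illustration, and the client would have to be written to expect this format.

```python
import json

def ndjson_chunks(items):
    """Yield one self-contained JSON document per line."""
    for item in items:
        # json.dumps never emits a raw newline, so "\n" is a safe separator
        yield json.dumps(item) + "\n"

def parse_ndjson(text):
    """Client-side counterpart: parse each non-empty line on its own."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

some_dict = {"data": "some_data"}
body = "".join(ndjson_chunks([some_dict] * 3))
print(body, end="")
print(parse_ndjson(body))
```

A generator like ndjson_chunks could be handed to StreamingHttpResponse in place of make_response; each line is then usable by the client as soon as it arrives, without waiting for a closing bracket.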

Python development - elementtree XML and string operations

I am using ElementTree to load up a series of XML files and parse them. As a file is parsed, I am grabbing a few bits of data from it ( a headline and a paragraph of text). I then need to grab some file names that are stored in the XML. They are contained in an element called ContentItem.
My code looks a bit like this:
for item in dirlist:
    newsML = ET.parse(item)
    NewsLines = newsML.getroot()
    HeadLine = NewsLines.getiterator("HeadLine")
    result.append(HeadLine)
    p = NewsLines.getiterator("p")
    result.append(p)
    ci = NewsLines.getiterator("ContentItem")
    for i in ci:
        result.append(i.attrib)
Now, if there was only one type of file, this would have been fine, but it contains 3 types (jpg, flv and a mp4). So as I loop through them in the view, it spits them out, but how do I just grab the flv if I only want that one? or just the mp4? They don't always appear in the same order in the list either.
Is there a way to say if it ends in .mp4 then do this action, or is there a way to do that in the template even?
If I try to do this:
url = i.attrib
if url.get("Href", () ).endswith('jpg'):
    result.append(i.attrib)
I get an error tuple object has no attribute endswith. Why is this a tuple? I thought it was a dict?
You get a tuple because you supply a tuple (the parentheses) as the default return value for url.get(). Supply an empty string, and you can use its .endswith() method. Also note that the element itself has a get() method to retrieve attribute values (you do not have to go via .attrib). Example:
if i.get('Href', '').endswith('.jpg'):
    result.append(i.attrib)
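To pick out only one file type (or a few), the same check can be wrapped in a small helper. This is a minimal sketch under stated assumptions: the XML snippet is a made-up stand-in for a NewsML file (only the ContentItem element and its Href attribute come from the question), the helper name hrefs_ending_with is hypothetical, and it uses iter() because getiterator() was removed from ElementTree in Python 3.9.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for one NewsML file
xml_text = """
<NewsLines>
  <ContentItem Href="clip.flv"/>
  <ContentItem Href="photo.jpg"/>
  <ContentItem Href="clip.mp4"/>
  <ContentItem/>
</NewsLines>
"""

root = ET.fromstring(xml_text)

def hrefs_ending_with(root, suffix):
    """Return the Href of every ContentItem whose Href ends with suffix.

    suffix may be a string or a tuple of strings, as accepted by str.endswith.
    """
    return [
        item.get("Href")
        for item in root.iter("ContentItem")
        # the '' default keeps elements without an Href from raising
        if item.get("Href", "").lower().endswith(suffix)
    ]

print(hrefs_ending_with(root, ".mp4"))            # ['clip.mp4']
print(hrefs_ending_with(root, (".flv", ".mp4")))  # ['clip.flv', 'clip.mp4']
```

Because the filter looks at each element's own attribute, the order in which the files appear in the XML does not matter.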
