How can I ensure that my Python regular expression outputs a dictionary? - python

I'm using Beej's Python Flickr API to ask Flickr for JSON. The unparsed string Flickr returns looks like this:
jsonFlickrApi({'photos': 'example'})
I want to access the returned data as a dictionary, so I have:
photos = "jsonFlickrApi({'photos': 'test'})"
# to match {'photos': 'example'}
response_parser = re.compile(r'jsonFlickrApi\((.*?)\)$')
parsed_photos = response_parser.findall(photos)
However, parsed_photos is a list, not a dictionary (according to type(parsed_photos). It outputs like:
["{'photos': 'test'}"]
How can I ensure that my parsed data ends up as a dictionary type?

If you're using Python 2.6, you can just use the JSON module to parse JSON stuff.
import json
json.loads(dictString)
If you're using an earlier version of Python, you can download the simplejson module and use that.
Example:
>>> json.loads('{"hello" : 4}')
{u'hello': 4}

You need to use a JSON parser to convert the string representation to actual Python data structure. Take a look at the documentation of the json module in the standard library for some examples.
In other words you'd have to add the following line at the end of your code
photos = json.loads(parsed_photos[0])
PS. In theory you could also use eval to achieve the same effect, as JSON is (almost) compatible with Python literals, but doing that would open a huge security hole. Just to let you know.

Related

Is there a way to search for a string and copy text in front until it reaches a comma?

I am new to python and wanted to store the recentAveragePrice inside a variable (from a string like this one)
{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}
My current solution is this:
var = sampleStr[78] + sampleStr[79] + sampleStr[80]
It works for the current string but if the recentAveragePrice was above 999 it would stop working and i was wondering if instead of getting a fixed number i could search for it inside the string.
Your replit code shows that you're acquiring JSON data from some website. Here's an example based on the URL that you're using. It shows how you check the response status, acquire the JSON data as a Python dictionary then print a value associated with a particular key. If the key is missing, it will print None:
import requests
(r := requests.get('https://economy.roblox.com/v1/assets/10159617728/resale-data')).raise_for_status()
jdata = r.json()
print(jdata.get('recentAveragePrice'))
Output:
640
Since this is json you should just be able to parse it and access recentAveragePrice:
import json
sample_string = '''{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}'''
data = json.loads(sample_string)
recent_price = data['recentAveragePrice']
print(recent_price)
outputs:
731
Your data is in a popular format called JSON (JavaScript Object Notation). It's commonly used to exchange data between different systems like a server and a client, or a Python program and JavaScript program.
Now Python doesn't use JSON per-se, but it has a data type called a dictionary that behaves very similarly to JSON. You can access elements of a dictionary as simply as:
print(my_dictionary["recentAveragePrice"])
Python has a built-in library meant specifically to handle JSON data, and it includes a function called loads() that can convert a string into a Python dictionary. We'll use that.
Finally, putting all that together, here is a more robust program to help parse your string and pick out the data you need. Dictionaries can do a lot more cool stuff, so make sure you take a look at the links above.
# import the JSON library
# specifically, we import the `loads()` function, which will convert a JSON string into a Python object
from json import loads
# let's store your string in a variable
original_string = """
{"assetStock":null,"sales":250694,"numberRemaining":null,"recentAveragePrice":731,"originalPrice":null,"priceDataPoints":[{"value":661,"date":"2022-08-11T05:00:00Z"},{"value":592,"date":"2022-08-10T05:00:00Z"},{"value":443,"date":"2022-08-09T05:00:00Z"}],"volumeDataPoints":[{"value":155,"date":"2022-08-11T05:00:00Z"},{"value":4595,"date":"2022-08-10T05:00:00Z"},{"value":12675,"date":"2022-08-09T05:00:00Z"},{"value":22179,"date":"2022-08-08T05:00:00Z"},{"value":15181,"date":"2022-08-07T05:00:00Z"},{"value":14541,"date":"2022-08-06T05:00:00Z"},{"value":15310,"date":"2022-08-05T05:00:00Z"},{"value":14146,"date":"2022-08-04T05:00:00Z"},{"value":13083,"date":"2022-08-03T05:00:00Z"},{"value":14460,"date":"2022-08-02T05:00:00Z"},{"value":16809,"date":"2022-08-01T05:00:00Z"},{"value":17571,"date":"2022-07-31T05:00:00Z"},{"value":23907,"date":"2022-07-30T05:00:00Z"},{"value":39007,"date":"2022-07-29T05:00:00Z"},{"value":38823,"date":"2022-07-28T05:00:00Z"}]}
"""
# convert the string into a dictionary object
dictionary_object = loads(original_string)
# access the element you need
print(dictionary_object["recentAveragePrice"])
Output upon running this program:
$ python exp.py
731

How to extract only wanted property from JSON object

When I run the code:
import requests
import json
def get_fact():
catFact = requests.get("https://catfact.ninja/fact?max_length=140")
json_data = json.loads(catFact.text)
return json_data
print(get_fact())
The output is like
{'fact': "Cats are the world's most popular pets, outnumbering dogs by as many as three to one", 'length': 84}
However I just want the fact.
How do I get rid of the 'fact:' at the front and 'length:' at the back?
What you want is to access the key in the python dict you made with the json.loads call. We actually don't need the json library as requests can read and deserialize JSON itself.
This code also checks if the response was OK and fails with informative error message. It follows PEP 20 – The Zen of Python.
import requests
def get_fact():
# Get the facts dictionary in a JSON serialized form.
cat_fact_response = requests.get("https://catfact.ninja/fact?max_length=140")
# Let the response raise the exception if something bad happened to the cat facts server connection.
cat_fact_response.raise_for_status()
# Deserialize the json (make a Python dict from the text we got). requests can do that on it's own:
cat_fact_dict = cat_fact_response.json()
# Access the fact from the json from the dictionary
return cat_fact_dict['fact']
print(get_fact())
When called you get following output as wanted:
# python3 script.py
The cat's tail is used to maintain balance.
Short answer:
you need to use either get_fact()['fact'] or get_fact().get('fact'). The former will throw an exception if fact doesn't exist whereas the latter will return None.
Why:
In your code sample you fetch some json data, and then print out the entire bit of json. When you parse json, the output is a key/value map called a dictionary (or map or object in other languages). The dictionary in this case contains two keys: fact and length. If you only one want of the values, then you need to tell python that you want only a single value -- fact in this case.
Remember though: this wouldn't apply to every json object you read. Not every one is going to have a fact key.
What you are returning in get_fact is a complete JSON object which you are then printing.
To get just its property fact (without the length) use a reference to that key or property like:
return json_data["fact"]
Below is also a link to a tutorial on using JSON in Python:
w3schools: Python JSON
To extract fact field from the response, use:
import requests
import json
def get_fact():
catFact = requests.get("https://catfact.ninja/fact?max_length=140")
json_data = json.loads(catFact.text)
return json_data['fact'] # <- HERE
print(get_fact())
Output:
Cats have "nine lives" thanks to a flexible spine and powerful leg and back muscles
Note: you don't need json module here, use json() method of Response instance returned by requests:
import requests
def get_fact():
catFact = requests.get("https://catfact.ninja/fact?max_length=140").json()
return catFact['fact']
print(get_fact())

Overflow error when reading json file

I am trying to read a json which includes a number of tweets, but I get the following error.
OverflowError: int too large to convert
The script filters multiple json files to get specific tweets, and it crashes when reaching to a specific json.
The line that creates the error is this one :
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Here is the error in the cmd
Just store the user id as a String, and treat it like it is one (this is actually what you should do when dealing with this kind of ids). If you can't change the json input format, you can always parse it like a string before parsing it like a json object, and add the quotes to the id code, using for instance regexes: Regex in python.
I don't know with which library you are parsing the json, but maybe also implicit casting will work: either try the "getString" method on the number instead of the "getInt" method, or force python to treat the object like a string, with something like x = "" + json.getId()
Python is pretty loose on typing and may let you do it.

Turning Python Object into a JSON string

I have been working on this bot that takes the twitter api and brings it into my database. Before I was grabbing 1 tweet at a time which wasn't efficient considering I was using 1 request out of the limit they had. So instead I decided to grab 150 results. I get these results back:
[Status(ID=780587171757625344, ScreenName=Ampsx, Created=Tue Sep 27 01:57:39 +0000 2016, Text='You know who you are #memes').
I get about 150 of these. Is there a library where I can turn this into JSON?
If you're using 2.6+ there's a bundled library you can use (docs), just:
import json
json_string = json.dumps(object)
We use this a lot for quick API endpoints, you just need to be careful about having functions or complex nesting in the objects you're trying to serialize, it's quite configurable (so you can skip fields, customize output of some, etc.) but can get messy pretty quickly.
Yes, a quick Google search would have revealed the json module.
import json
# instead of an empty list, create a list of dict objects
# representing the statues as you'd like to see them in JSON.
statuses = { 'statuses': [] }
with open('file.json', 'w') as f:
f.write(json.dumps(statuses).encode('utf-8'))
import json
jsondata = json.dumps(TwitterStatusObject.__dict__)

What is the difference between json.JSONDecoder().decode() and json.loads()

I am using urllib2 to grab the html of a url and then a regex to extract a JSON that I need from there. I want to get the usual "dictionary of dictionaries" Python object and both of the following work:
my_json #a correctly formatted json string
json_dict1 = json.JSONDecoder().decode(my_json)
json_dict2 = json.loads(my_json)
What is the difference and which is better in what circumstances (besides mine, but that one in particular)?
json.loads() essentially creates a json.JSONDecoder() instance and calls decode on it. As such your first line is exactly the same thing as the second line. See the json.loads() source code.
The module offers you flexibility; a simple function API or a full OO API that you can subclass if needed.

Categories