Output of python code is one character per line - python

I'm new to Python and having some trouble with some API scraping I'm attempting. What I want to do is pull a list of book titles using this code:
import requests
import json

r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print(title)
This works to pull the titles, but most (not all) of them are printed one character per line. I've tried adding .splitlines(), but that doesn't fix the problem. Any advice would be appreciated!

The problem is that the response contains two types of title: some are plain strings ("Germain the wizard") and others are arrays of strings (['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']). In this particular case all of the lists happen to have length one, but I wouldn't assume that will always be true. To illustrate what you might need to do, I used a join here instead of just taking title[0].
import requests
import json

r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
In my opinion that shouldn't happen at all: an API should return predictable types, otherwise things get messy on the client side.
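If titles get handled in more than one place, a tiny helper that always returns a list keeps that branching in one spot. A minimal sketch along the same lines (the as_list name is mine, not part of the API):
def as_list(value):
    # Wrap a bare string in a list so callers can always iterate over titles.
    return value if isinstance(value, list) else [value]

for doc in data["docs"]:
    for title in as_list(doc["sourceResource"]["title"]):
        print(title)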

Related

transform JSON file to be usable

Long story short, I query the Spotify API and get back JSON with data about the newest albums. How do I get specific info out of it, like, say, every band name or every album title? I've tried a lot of approaches I found on the internet and nothing seems to work for me, and after a couple of hours I'm getting kind of frustrated.
The JSON data is on jsfiddle; here is the request:
endpoint = "https://api.spotify.com/v1/browse/new-releases"
lookup_url = f"{endpoint}"
r = requests.get(lookup_url, headers=headers)
print(r.json())
When you make this request, as the comments have mentioned, you get back a dictionary whose keys and values you can then access. For example, if you want to get the album_type you could do the following:
print(data["albums"]["items"][0]["album_type"])
Since items contains a list, you need to index the first element (0) before you can access album_type.
Output:
single
Here is a link to the code I used with your json.
I suggest you look into how to deal with JSON data in Python; this is a good place to start.
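To get every band name and album title (what the question actually asks for), loop over the whole items list instead of indexing only the first element. A rough sketch, assuming the new-releases payload has the usual albums -> items -> name / artists shape:
data = r.json()

for album in data["albums"]["items"]:
    # Each item can list several artists, so join their names.
    artist_names = ", ".join(artist["name"] for artist in album["artists"])
    print(album["name"], "-", artist_names)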
I copied the data from the jsfiddle link.
Now try the following code:
import ast

# str_cop_from_src holds the text copied from the jsfiddle source.
pyobj = ast.literal_eval(str_cop_from_src)
Later you can look up values by key:
pyobj["albums"]["items"][0]["album_type"]
pyobj will be a Python dictionary with all the data.
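One caveat (my note): ast.literal_eval expects Python literals, so if the copied text is raw JSON (true/false/null rather than True/False/None) it will fail. In that case json.loads is the more direct tool; a minimal sketch with a made-up stand-in for the copied text:
import json

# Hypothetical stand-in for the JSON text copied from the jsfiddle page.
str_cop_from_src = '{"albums": {"items": [{"album_type": "single"}]}}'
pyobj = json.loads(str_cop_from_src)
print(pyobj["albums"]["items"][0]["album_type"])   # -> single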

Neatly Print Comments Through Python Reddit API

I am trying to do some text analysis with Reddit comments. The script I have currently prints out the body and upvote count of all comments on a given subreddit's "hot" posts with more than 5 upvotes:
import praw

reddit = praw.Reddit(client_id=ID,
                     client_secret=SECRET, password=PWORD,
                     user_agent=UAGENT, username=UNAME)

subreddit = reddit.subreddit('cryptocurrency')

for submission in subreddit.hot(limit=10):
    submission.comments.replace_more(limit=10)
    for comment in submission.comments.list():
        submission.comment_sort = 'top'
        if comment.ups > 5:
            print(comment.body, comment.ups)
However, the outputs look something like this:
(u'Just hodl and let the plebs lose money on scamcoin ICO\'s that don\'t even have a working product. I don\'t understand some of these "traders" and "investors".', 9)
(u"Good idea imho but it's gonna be abused af. Think about it. It will be the sexual go to app real soon. If they will 'ban' nudity on it, then you will simply get the instagram chicks on there with all the horny guys liking their photos and giving them free money. 'if this gets 1000 likes I will post a pic of me in bikini' ", 7)
(u"But but but, I just sold a kidney and bought in at the top, now I can't afford to get the stitches removed!\n\n/s just in case.", 7)
Two questions:
Is there any way to convert the outputs to JSON using python?
If not, how can I get rid of all of the excess characters other than the body and the upvote count?
My ultimate goal is to have this output neatly organized so that I can analyze keywords vs. upvote count (what keywords get the most upvotes, etc).
Thank you!
Answer to question 2: It looks like you are writing in Python 2, but are using Python 3 print syntax. To get rid of the tuple notation in your print call you need
from __future__ import print_function
at the top of your program.
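To see why the tuples show up in the first place, here is a small sketch (mine, not from the original answer): in Python 2, print is a statement, so the parentheses build a tuple, and that tuple is what gets printed.
# Python 2 without the future import: print is a statement, so
#     print(comment.body, comment.ups)   ->  (u'...', 9)
from __future__ import print_function

# With the import, print is a function and the arguments come out
# space-separated, with no tuple notation:
print('Just hodl and let the plebs lose money...', 9)   # -> Just hodl and let the plebs lose money... 9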
1) Is there any way to convert the outputs to JSON using python?
It's almost as simple as this
output_string = json.dumps(comments)
Except that a few of the values raise TypeError: Object of type Foo is not JSON serializable.
We can solve this: PRAW objects that are not serializable can simply be stored as their string representation.
def is_serializable(k, v):
    try:
        json.dumps({k: v})
    except TypeError:
        return False
    return True

for comment in comments:
    for k, v in comment.items():
        if is_serializable(k, v):
            comment[k] = v
        else:
            comment[k] = str(v)
Now saving works.
json.dumps(comments)
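For the above to work, comments needs to be a list of plain dicts rather than PRAW Comment objects. One way to build it from the script in the question (a sketch on my part; vars() simply exposes each fetched comment's attribute dict):
comments = []
for submission in subreddit.hot(limit=10):
    submission.comments.replace_more(limit=10)
    for comment in submission.comments.list():
        if comment.ups > 5:
            # Copy the attribute dict so the string conversion above
            # doesn't mutate the PRAW object itself.
            comments.append(dict(vars(comment)))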
2) If not, how can I get rid of all of the excess characters other than the body and the upvote count?
I think you're asking how to remove keys you do not want. You can use:
save_keys = ['body', 'ups']
for k in list(comment):
    if k not in save_keys:
        del comment[k]
We use list(dict) to iterate over a copy of dict's keys. This prevents you from mutating the same thing you are iterating on.
list(dict) is the same as list(dict.keys()).
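If body and ups are all you ever need for the keyword analysis, a simpler route (my suggestion, not part of the answer above) is to build dicts with only those two keys in the first place and dump them straight to JSON:
import json

# Inside the submission loop from the question: keep only the fields
# needed for the keyword-vs-upvote analysis.
records = [{'body': c.body, 'ups': c.ups}
           for c in submission.comments.list()
           if c.ups > 5]
print(json.dumps(records, indent=2))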

Django assert that response contains one of a list of possible strings

I'm writing tests for my Django app using the built-in testing tools. Right now I'm trying to write a test for a page that displays a list of a user's followers. When a user has no followers the page displays a message randomly picked from a list of strings. As an example:
NO_FOLLOWERS_MESSAGES = [
"You don't have any followers.",
"Sargent Dan, you ain't got no followers!"
]
So now I want to write a test that asserts that the response contains one of those strings. If I was only using one string, I could just use self.assertContains(request, "You don't have any followers.") but I'm stuck on how to write the test with multiple possible outcomes. Any help would be appreciated.
Try this:
if not any(x in response.content for x in NO_FOLLOWERS_MESSAGES):
    raise AssertionError("Did not match any of the messages in the response")
About any(): https://docs.python.org/2/library/functions.html#any
Would something like this work?
found_quip = [quip in response.content for quip in NO_FOLLOWERS_MESSAGES]
self.assertTrue(any(found_quip))
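One thing to watch if you're on Python 3 (my note, not from the answers): response.content is bytes there, so checking str messages against it won't work; decode first. A minimal sketch:
content = response.content.decode()   # bytes -> str before substring checks
self.assertTrue(
    any(message in content for message in NO_FOLLOWERS_MESSAGES),
    "None of the no-followers messages were found in the response",
)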
Internally, assertContains() uses the count returned by _assert_contains().
So if you want to preserve exactly the same behaviour as assertContains(), and given that the implementation of _assert_contains() isn't trivial, you can take inspiration from that source and adapt it to your needs.
Here is an assertContainsAny() inspired by assertContains():
def assertContainsAny(self, response, texts, status_code=200,
                      msg_prefix='', html=False):
    total_count = 0
    for text in texts:
        text_repr, real_count, msg_prefix = self._assert_contains(
            response, text, status_code, msg_prefix, html)
        total_count += real_count
    self.assertTrue(total_count != 0,
                    "None of the text options were found in the response")
Use by passing the argument texts as a list, e.g.
self.assertContainsAny(response, NO_FOLLOWERS_MESSAGES)

BioPython Pubmed Eutils url?

I'm trying to run some queries against Pubmed's Eutils service. If I run them on the website I get a certain number of records returned, in this case 13126 (link to pubmed).
A while ago I bodged together a python script to build a query to do much the same thing, and the resultant url returns the same number of hits (link to Eutils result).
Of course, not having any formal programming background, I made it all a bit kludgy, so I'm trying to do the same thing using Biopython. I think the following code should do the same thing, but it returns a greater number of hits, 23303.
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]", datetype="pdat", mindate="2012", maxdate="2012")
record = Entrez.read(handle)
print(record["Count"])
I'm fairly sure it's just down to some subtlety in how the url is being generated, but I can't work out how to see what url is being generated by Biopython. Can anyone give me some pointers?
Thanks!
EDIT:
It's something to do with how the url is being generated, as I can get back the original number of hits by modifying the code to include double quotes around the search term, thus:
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]', datetype='pdat', mindate='2012', maxdate='2012')
I'm still interested in knowing what URL is being generated by Biopython, as it'll help me work out how I have to structure the search term when I want to do more complicated searches.
handle = Entrez.esearch(db="pubmed", term="stem+cell[All Fields]", datetype="pdat", mindate="2012", maxdate="2012")
print(handle.url)
You've solved this already (Entrez likes explicit double quoting round combined search terms), but currently the URL generated is not exposed via the API. The simplest trick would be to edit the Bio/Entrez/__init__.py file to add a print statement inside the _open function.
Update: Recent versions of Biopython now save the URL as an attribute of the returned handle, i.e. in this example try doing print(handle.url)
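Putting the two pieces together, a quick way to check what is actually being sent (assuming a Biopython version recent enough for handle.url to exist) looks like this:
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"
handle = Entrez.esearch(db='pubmed', term='"stem+cell"[ALL]',
                        datetype='pdat', mindate='2012', maxdate='2012')
print(handle.url)               # the exact Eutils URL Biopython generated
record = Entrez.read(handle)
print(record["Count"])          # should now match the count from the website query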

Python splitting values from urllib in string

I'm trying to get IP location and other stuff from ipinfodb.com, but I'm stuck.
I want to split all of the values into new strings that I can format how I want later. What I wrote so far is:
import urllib2

resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?key=mykey&ip=someip').read()
out = resp.replace(";", " ")
print out
Before I replaced the string into new one the output was:
OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00
So I made it show only
OK someip somecountry somecountrycode somecity somecity - 42.1975;23.3342 +05:00
But the problem is that this is pretty clumsy, because I don't want everything in one string; I want to use the values separately. Right now I just print out and get the line above, but I'd like to be able to do something like print country or print city and have it output the country, the city, and so on. I checked their site and there's a class for this, but it's for a different API version (v2, mine is v3), so I can't use it. Does anyone have an idea how to do that?
PS. Sorry if the answer is obvious or I'm mistaken, I'm new with Python :s
You need to split the resp text by ;:
out = resp.split(';')
Now out is a list of values instead, use indexes to access various items:
print 'Country: {}'.format(out[3])
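If you'd rather refer to every field by name instead of by index, you can zip the split values against a list of field names. The order below is only guessed from the sample line in the question, so adjust it if your response lays the values out differently:
# Field order guessed from the sample response in the question.
fields = ['status', 'message', 'ip', 'country', 'country_code', 'region',
          'city', 'zip', 'latitude', 'longitude', 'timezone']
info = dict(zip(fields, resp.split(';')))

print 'Country: {}'.format(info['country'])
print 'City: {}'.format(info['city'])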
Alternatively, add format=json to your query string and receive a JSON response from that API:
import json
import urllib2

resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?format=json&key=mykey&ip=someip')
data = json.load(resp)
print data['countryName']
