Python splitting values from urllib in string - python

I'm trying to get IP location and other stuff from ipinfodb.com, but I'm stuck.
I want to split all of the values into new strings that I can format how I want later. What I wrote so far is:
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?key=mykey&ip=someip').read()
out = resp.replace(";", " ")
print out
Before I replaced the semicolons, the output was:
OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00
After the replace it shows:
OK  someip somecountry somecountrycode somecity somecity - 42.1975 23.3342 +05:00
But this isn't very useful: everything is still in one string, and all I can do is print it as a whole. I want separate values, so that I can do something like print country or print city and get just the country, the city and so on. Their site has a class for this, but it's for a different API version (v2, while mine is v3), so I can't use it. Does anyone have an idea how to do that?
PS. Sorry if the answer is obvious or I'm mistaken; I'm new to Python.

You need to split the resp text on ';':
out = resp.split(';')
Now out is a list of values instead; use indexes to access individual items (in your sample, out[3] is the country):
print 'Country: {}'.format(out[3])
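If you'd rather refer to the values by name than remember indexes, you can pair each position with a label using zip and dict. This is a minimal sketch that parses the sample line from the question offline; the field names in FIELDS are hypothetical labels chosen to match the placeholder output, not names defined by ipinfodb.

```python
# Sample response copied from the question; in practice this is the text
# returned by urlopen(...).read().
resp = "OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00"

# Hypothetical labels, one per ';'-separated position in the sample above.
FIELDS = ["status", "message", "ip", "country", "country_code",
          "region", "city", "zip", "latitude", "longitude", "timezone"]

values = resp.split(";")
info = dict(zip(FIELDS, values))   # {'status': 'OK', 'message': '', ...}

print("Country: {}".format(info["country"]))
print("City: {}".format(info["city"]))
```

Note that zip stops at the shorter sequence, so if the API ever returns a different number of fields, some values would be silently dropped; checking len(values) first is a cheap safeguard.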
Alternatively, add format=json to your query string and receive a JSON response from that API:
import json
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?format=json&key=mykey&ip=someip')
data = json.load(resp)
print data['countryName']
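On Python 3, urllib2 was split into urllib.request and urllib.parse. Here is a minimal sketch of the JSON approach under that assumption: urlencode builds the query string (format, key and ip, as in the answer) and json.loads parses the body. The network call is left as a comment so the parsing can be tried offline; the sample body is made up, and its keys other than countryName are assumptions rather than a copy of a real response.

```python
import json
from urllib.parse import urlencode

API_URL = "http://api.ipinfodb.com/v3/ip-city/"

def build_url(key, ip):
    # urlencode escapes the values and joins the pairs with '&'
    return API_URL + "?" + urlencode({"format": "json", "key": key, "ip": ip})

def parse_location(body):
    # body is the JSON text returned by the API
    return json.loads(body)

# Fetching for real (needs a valid key and network access):
# from urllib.request import urlopen
# with urlopen(build_url("mykey", "someip")) as resp:
#     data = parse_location(resp.read().decode("utf-8"))

sample = '{"statusCode": "OK", "countryName": "somecountry", "cityName": "somecity"}'
data = parse_location(sample)
print(data["countryName"])
```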

Related

Output of python code is one character per line

I'm new to Python and having some trouble with an API scraping I'm attempting. What I want to do is pull a list of book titles using this code:
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print(title)
Which works to pull the titles, but most (not all) titles are outputting as one character per line. I've tried adding .splitlines() but this doesn't fix the problem. Any advice would be appreciated!
The problem is that the response contains two types of title: some are plain strings, such as "Germain the wizard", and others are lists of strings, such as ['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']. Your inner loop iterates over the title; for a list that yields whole strings, but for a plain string it yields one character at a time, which is why those titles print one character per line. In this particular case all the lists seem to have length one, but I doubt that will always hold, so to illustrate what you might need to do I used a join here instead of just taking title[0].
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
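If several fields in the response can be either a string or a list, a small helper keeps that check in one place. This is a minimal sketch assuming the only two shapes are str and list of str; as_text is a hypothetical name, and the sample docs mimic the two title shapes quoted above rather than a live API response.

```python
def as_text(value, sep=" "):
    """Return value as one string, whether the API sent a str or a list of str."""
    if isinstance(value, list):
        return sep.join(value)
    return value

# Iterating over a plain string yields characters, which caused the original bug:
print(list("wiz"))  # ['w', 'i', 'z']

docs = [
    {"sourceResource": {"title": "Germain the wizard"}},
    {"sourceResource": {"title": ["Joe Strong, the boy wizard : or, The mysteries of magic exposed /"]}},
]
for doc in docs:
    print(as_text(doc["sourceResource"]["title"]))
```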
In my opinion that should never happen: an API should return predictable types, otherwise handling the data gets messy on the user's side.

Python - Format a string from a list not working

I want to crawl a webpage for some information. What I've done so far works, but I also need to request another URL taken from the website. I'm trying to build that URL with format, but it's not working. This is what I have so far:
import requests
from lxml import html

# champ_list is defined elsewhere in the script (not shown in the question)
name = input("> ")
page = requests.get("http://www.mobafire.com/league-of-legends/champions")
tree = html.fromstring(page.content)
for index, champ in enumerate(champ_list):
    if name == champ:
        y = tree.xpath(".//*[@id='browse-build']/a[{}]/@href".format(index + 1))
        print(y)
        guide = requests.get("http://www.mobafire.com{}".format(y))
        builds = html.fromstring(guide.content)
        print(builds)
        for title in builds.xpath(".//table[@class='browse-table']/tr[2]/td[2]/div[1]/a/text()"):
            print(title)
The user enters a name; if it matches an entry in champ_list, the script prints a URL path, formats it into the guide URL and requests that page to get more information, but I'm getting errors such as "Invalid IPv6 URL".
This is the printed value of y (one example; the others look similar): ['/league-of-legends/champion/ivern-133']
I tried slicing, but it didn't change anything; I'm probably using it wrong, or it doesn't apply here. I also tried replace, but it doesn't work on lists, so I tried it like this:
y = [y.replace("'", "") for y in y] to see whether it would at least remove the quotes, but that didn't work either. What other approach would format this properly?
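I take it y is the list you want to insert into the string? tree.xpath() returns a list, so format(y) inserts the list's printed form, brackets and quotes included, and that is what breaks the URL. Join the list items into a plain string first:
"http://www.mobafire.com{}".format('/'.join(y))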

How do you slice part of an input statement in Python 3?

I have a question that asks me to get a user's email address and then return the URL it is associated with, for example: 'abc123@address.com' --> 'http://www.address.com'
I came up with this:
def main():
    email_address = input('Enter your email address (eg. abc123@address.com): ').strip()
    strip_username = email_address.split('@', 1)[-1]
    the_url(strip_username)

def the_url(url_ending):
    print('Your associated URL is: http://www.' + str(url_ending))

main()
This does what I want, but split('@', ...) is something I haven't learned yet; I just found it online. I need to use indexing and slicing for this program, but how can I slice if I don't know the length of the user's email? I need to get rid of everything up to and including the '@' symbol so that I'm left with just 'address.com', but I don't know in advance what the address will be: it could be hotmail, gmail, etc. Thanks. I'm really new to Python, so I'm trying to use only what I've learned in class so far.
The split method just splits the string at each occurrence of the separator you give it, so:
"Hello@cat".split("@")
will give you
["Hello", "cat"]
Then take index 1 of that list to get the part after the @ symbol.
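If you don't want to use str.split, you can do it with indexing and slicing instead: str.index finds the position of the @, and slicing from one past that position takes the rest of the string.
>>> email = 'abc123@address.com'
>>> 'http://www.' + email[email.index('@')+1:]
'http://www.address.com'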
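If str.index is also off-limits, the same idea works with nothing but a loop, indexing and one slice. This is a minimal sketch; url_from_email is a hypothetical name, and returning None when there is no @ is an assumption about how you want to handle bad input.

```python
def url_from_email(email):
    """Return 'http://www.<domain>' for an address, or None if it has no '@'."""
    for i in range(len(email)):
        if email[i] == "@":
            # slice from just after the '@' to the end of the string
            return "http://www." + email[i + 1:]
    return None

print(url_from_email("abc123@address.com"))  # http://www.address.com
```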

Python Regex to pull multiple pieces of data out of a data structure

I need a regex to pull tidbits out of the following data structure. This data is in a javascript variable. I'm using BeautifulSoup and Mechanize to make the request and parse the page but I don't see how I can get what I need without a regex. More details follow below.
raw data:
var d = [[909.0546875,842.3125,32429,'TownID: 32429','GREY','circle_grey.png',970,'goldpimp\'s city','','N/A'],[1434.8890625,1365.41484375,32143,'TownID: 32143','GREY','circle_grey.png',899,'1..','','N/A'],[1553.92265625,1117.43046875,32326,'TownID: 32326','GREY','circle_grey.png',522,'Avacyns Pantheon','','N/A'],[1305.17265625,1328.6421875,28927,'TownID: 28927','GREY','circle_grey.png',3554,'Furiocity','','N/A'],...(cont.)
For example on the first line I need to pull TownID: 32429, 970, and goldpimp\'s city
I need to do this for the whole data structure to get each townID and associated information. Sorry for the newbie question but regex really racks my brain.
d is a list, and you can access list items by indexing. So why the regex? You don't need it.
To get your result:
for city in d:
    print "%s %s %s" % (city[3], city[6], city[7])
The print statement prints the text to the console. Each %s is replaced, in order, by a value from the tuple on the right: the first %s by city[3], the second by city[6] and the third by city[7].
EDIT
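OK, if d comes from a JavaScript source, you need to convert it to Python data first, store the result in a variable and access it with the earlier method. json.loads does that for valid JSON (see the json module documentation for Python 2.7 and 3.3), but note that the sample above uses single-quoted strings, which JSON does not allow, so json.loads would reject it as-is.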
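One way around the single quotes is ast.literal_eval, swapped in here for json.loads, since this particular array also happens to be valid Python literal syntax. This is a minimal sketch assuming you already have the script's text as a string (for example from the script tag BeautifulSoup found) and that it contains var d = [...]; exactly as shown; the sample is cut down to two entries, and a small regex is used only to cut the array out of the script. literal_eval would fail on JavaScript-only values such as true, false or null.

```python
import ast
import re

# Cut-down sample of the script text; in practice take it from the page.
script = ("var d = [[909.0546875,842.3125,32429,'TownID: 32429','GREY','circle_grey.png',"
          "970,'goldpimp\\'s city','','N/A'],"
          "[1434.8890625,1365.41484375,32143,'TownID: 32143','GREY','circle_grey.png',"
          "899,'1..','','N/A']];")

def extract_js_array(text, name):
    """Return the Python value of `var <name> = [...];` found in text."""
    # Assumes the literal ends at the first '];' after the '=' sign.
    match = re.search(r"var\s+" + re.escape(name) + r"\s*=\s*(\[.*?\]);", text, re.S)
    if match is None:
        raise ValueError("no array named %r found" % name)
    return ast.literal_eval(match.group(1))

d = extract_js_array(script, "d")
for city in d:
    print("%s %s %s" % (city[3], city[6], city[7]))
```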

Retrieving a lot of URL addresses

Edit: Just to clarify, I am using Python and would like to do this within Python.
I am in the middle of collecting data for a research project at our university. Basically, I need to scrape a lot of information from a website that monitors the European Parliament. Here is an example of what the URL of one page looks like:
http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-0190&language=EN
The parts of the reference= value in the address mean:
A7 = Parliament in session (previous parliaments are A6 etc.),
2010 = year,
0190 = number of the file.
What I want to do is to create a variable that has all the urls for different parliaments, so I can loop over this variable and scrape the information from the websites.
P.S.: I have tried this:
number = range(1,190,1)
for i in number:
    search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-" + str(number[i]) +"&language=EN"
    results = search_url
    print results
but this gives me the following error:
Traceback (most recent call last):
File "", line 7, in
IndexError: list index out of range
If I understand correctly, you just want to be able to loop over the parliaments?
i.e. you want A7, A6, A5...?
If that's what you want, a simple loop could handle it:
for p in xrange(7, 0, -1):
    parliament = "A%d" % p
    print parliament
For the other values, similar loops would work just as well:
for year in xrange(2010, 2000, -1):
    print year
for filenum in xrange(100, 200):
    fnum = "%.4d" % filenum
    print fnum
You could easily nest your loops in the proper order to generate the combination(s) you need. HTH!
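Nesting the loops, or letting itertools.product do the nesting, gives every combination. This is a minimal Python 3 sketch using str.format instead of %; the session, year and number ranges are placeholders, and not every combination necessarily exists on the site, so expect some requests to fail when you actually fetch them.

```python
from itertools import product

URL_TEMPLATE = ("http://www.europarl.europa.eu/sides/getDoc.do"
                "?type=REPORT&mode=XML&reference=A{session}-{year}-{number:04d}&language=EN")

def report_urls(sessions, years, numbers):
    """Yield one URL per (session, year, number) combination."""
    # product() behaves like three nested for loops, outermost first
    for session, year, number in product(sessions, years, numbers):
        yield URL_TEMPLATE.format(session=session, year=year, number=number)

urls = list(report_urls([7], [2010], range(1, 190)))
print(len(urls))  # 189
print(urls[0])
```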
Edit:
String formatting is super useful, and here's how you can use it with your example:
# Just create a string with the format specifier in it: %.4d - a [d]ecimal with a
# precision (minimum number of digits) of 4 - so instead of 3 you'll get 0003
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-%.4d&language=EN"
# xrange returns a lazy sequence object (not strictly a generator): it produces
# the numbers on demand, and you can iterate over it just like a collection.
# 1 is the default step, so there's no need for it in this case
for number in xrange(1,190):
    print search_url % number
String formatting takes a string with a variety of specifiers (you'll recognize them because they start with %), followed by the % operator and either a single value or a tuple containing the arguments for the specifiers, in order.
If you want to add the parliament and the year as well, change the string to this:
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A%d-%d-%.4d&language=EN"
where the important changes are here:
reference=A%d-%d-%.4d&language=EN
That means you'll need to pass three integers, like so (use the integer p from the parliament loop; the string "A7" would not work, because %d needs a number):
print search_url % (p, year, number)
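A quick way to check the % rules from this answer is to try them on small values. This sketch only uses the reference fragment of the template; it shows that %.4d zero-pads, that arguments are consumed left to right from the tuple, and that %d refuses a string such as "A7", so the parliament number has to be passed as an integer rather than as the "A7" string.

```python
template = "reference=A%d-%d-%.4d&language=EN"

print(template % (7, 2010, 190))  # reference=A7-2010-0190&language=EN
print("%.4d" % 3)                 # 0003

try:
    template % ("A7", 2010, 190)  # a string where %d expects a number
except TypeError as exc:
    print("TypeError:", exc)
```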
Could you use Python and wget? Loop through the sessions that exist and build a command string to pass to wget. Or is that overkill?
Sorry I can't give this as a comment, but I don't have a high enough score yet.
Looking at the code you quoted in the comment above, your problem is that you are trying to add a string and an integer. While some languages do on-the-fly conversion (useful when it works but confusing when it doesn't), in Python you have to convert explicitly with str(). Also note that for i in number already gives you the values themselves, so use i directly: number[i] treats each value as an index and runs past the end of the list, which is what raises the IndexError.
It should be something like:
"http://firstpartofurl" + str(i) + "restofurl"
or you can use string formatting (using % etc., as in Wayne's answer).
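In the question's loop, i already holds the values 1 to 189, so number[i] uses a value as an index and runs off the end of the 189-item list once i reaches 189. This is a minimal sketch of looping over the values directly and converting them with str(); zfill(4) is added on the assumption that the site expects four-digit file numbers like 0190.

```python
number = list(range(1, 190))   # 189 values; valid indexes are 0..188

try:
    for i in number:
        number[i]              # i is a value, not an index: number[189] fails
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range

BASE = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-"
urls = [BASE + str(i).zfill(4) + "&language=EN" for i in number]
print(urls[0])
```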
Use Selenium. Since it controls a real browser, it can handle sites that rely on complex JavaScript. Many language bindings are available, including Python.
