I am little bit confused about creating full url.
I have such code :
def flats(self):
return [JsonFlatPage(property_data = flat, url = flat['propertyUrl'])
for flat in self.data['properties']]
in flat['propertyUrl'] I have '/properties/75599853', but I need to get like this one:
'https://www.rightmove.co.uk/properties/75599853#/'
with full path and # at the end.
I know that I should make constant URI in settings file, but then how I can convert it? Should I use f-strings?
I think since the base url https://www.rightmove.co.uk/ is fixed, you can do something like below to get what you need:
def flats(self):
baseUrl = 'https://www.rightmove.co.uk/'
return [JsonFlatPage(property_data = flat, url = baseUrl + flat['propertyUrl'] + "#/")
for flat in self.data['properties']]
You can also use f-strings as you mentioned as:
def flats(self):
baseUrl = 'https://www.rightmove.co.uk/'
return [JsonFlatPage(property_data = flat, url = f"{baseUrl}{flat['propertyUrl']}#/")
for flat in self.data['properties']]
Related
I'm a beginner programmer, I'm adventuring with APIs and python.
I would like to know if I can get all types of pokémon withou passing a number as I did here:
import requests
name = "charizard"
url = f'https://pokeapi.co/api/v2/pokemon/%7Bname%7D'
poke_request = requests.get(url)
poke_request = poke_request.json()
types = poke_request['types'][0]['type']['name']
I tried doing some loops or passing variables but I always end up with some "slices" error.
If there's a way to print the types inside a list, that would be great!
PokeAPI has direct access to the pokemon types:
import requests
url = "https://pokeapi.co/api/v2/type/"
poke_request = requests.get(url)
types = poke_request.json()
for typ in types["results"]:
print(typ)
EDIT:
You can save the names as a list using
names = [typ["name"] for typ in types["results"]]
The user #JustLearning helped me out and I was able to adapt the solution to the scenario I wanted:
import requests
url = "https://pokeapi.co/api/v2/pokemon/charizard"
poke_request = requests.get(url)
types = poke_request.json()
names = [typ["type"]["name"] for typ in types["types"]]
print(names)
i want to build some function that read a url from txt file, then save it to some variable, then add some values inside the url between another values
example of the url: https://domains.livedns.co.il/API/DomainsAPI.asmx/NewDomain?UserName=apidemo#livedns.co.il&Password=demo
lets say i want to inject some values between UserName and Password and save it into file again and use it later.
i started to write the function and play with urllib parser but i still doesnt understand how to do that.
what i tried until now:
def dlastpurchase():
if os.path.isfile("livednsurl.txt"):
apikeyfile = open("livednsurl.txt", "r")
apikey = apikeyfile.read()
url_parse = urlsplit(apikey)
print(url_parse.geturl())
dlastpurchase()
Thanks in advance for every tip and help
A little bit more complex example that I believe you will find interesting and also enjoy improving it (while it takes care of some scenarios, it might be lacking in some). Also functional to enable reuse in other cases. Here we go
assuming we have a text file, named 'urls.txt' that contains this url
https://domains.livedns.co.il/API/DomainsAPI.asmx/NewDomain?UserName=apidemo#livedns.co.il&Password=demo
from os import error
from urllib.parse import urlparse, parse_qs, urlunparse
filename = 'urls.txt'
function to parse the url and return its query parameters as well as the url object, which will be used to reconstruct the url later on
def parse_url(url):
"""parse a given url and return its query parameters
Args:
url (string): url string to parse
Returns:
parsed (tupple): the tupple object returned by urlparse
query_parameters (dictionary): dictionary containing the query parameters as keys
"""
try :
# parse the url and get the queries parameters from there
parsed = urlparse(url)
# parse the queries and return the dictionary containing them
query_result = parse_qs(parsed.query)
return (query_result, parsed)
except(error):
print('something failed !!!')
print(error)
return False
function to add a new query parameter or to replace an existing one
def insert_or_replace_word(query_dic, word,value):
"""Insert a value for the query parameters of a url
Args:
query_dic (object): the dictionary containing the query parameters
word (string): the query parameter to replace or insert values for
value (string): the value to insert or use as replacement
Returns:
result (string):the result of the insertion or replacement
"""
try:
query_dic[word] = value
return query_dic
except (error):
print('Something went wrong {0}'.format(error))
function to format the query parameter and get them ready to reconstruct the new url
def format_query_strings(query_dic):
"""format the final query dictionaries ready to be used to construct a new url and construct the new url
Args:
query_dic (dictionary): final query dictionary after insertion or update
"""
final_string = ''
for key, value in query_dic.items():
#unfortunatly, query params from parse_qs are in list, so remove them before creating the final string
if type(value) == list:
query_string = '{0}={1}'.format(key, value[0])
final_string += '{0}&'.format(query_string)
else:
query_string = '{0}={1}'.format(key, value)
final_string += '{0}&'.format(query_string)
# this is to remove any extra & inserted at the end of the loop above
if final_string.endswith('&'):
final_string = final_string[:len(final_string)-1]
return final_string
we check out everything works by reading in text file, performing above operation and then saving the new url to a new file
with open(filename) as url:
lines = url.readlines()
for line in lines:
query_params,parsed = parse_url(line)
new_query_dic = insert_or_replace_word(query_params,'UserName','newUsername')
final = format_query_strings(new_query_dic)
#here you have to pass an iterable of lenth 6 in order to reconstruct the url
new_url_object = [parsed.scheme,parsed.netloc,parsed.path,parsed.params,final,parsed.fragment]
#this reconstructs the new url
new_url = urlunparse(new_url_object)
#create a new file and append the link inside of it
with open('new_urls.txt', 'a') as new_file:
new_file.writelines(new_c)
new_file.write('\n')
You don't have to use fancy tools to do that. Just split the url based on "?" Character. Then, split the second part based on "&" character. Add your new params to the list you have, and merge them with the base url you get.
url = "https://domains.livedns.co.il/API/DomainsAPI.asmx/NewDomain?UserName=apidemo#livedns.co.il&Password=demo"
base, params = url.split("?")
params = params.split("&")
params.insert(2, "new_user=yololo&new_passwd=hololo")
for param in params:
base += param + "&"
base = base.strip("&")
print(base)
I did it like this since you asked for inserting to a specific location. But url params are not depends on the order, so you can just append at the end of the url for ease. Or, you can edit the parameters from the list I show.
I want to get the top artists from a specific country from the last fm API in JSON and save the name and url in the name and url variables. But it always appears "TypeError: byte indices must be integers". Do you know where is the issue?
Working example:
import requests
api_key = "xxx"
for i in range(2,5):
artists = requests.get('http://ws.audioscrobbler.com/2.0/?method=geo.gettopartists&country=spain&format=json&page='+str(i)+'&api_key='+api_key)
for artist in artists:
print(artist)
#name = artist['topartists']['artist']['name']
#url = artist['topartists']['artist']['url']
You want:
response = requests.get(...)
data = response.json()
for artist in data["topartists"]["artist"]:
name = artist["name"]
# etc
Explanation: requests.get() returns a response object. Iterating over the response object is actually iterating over the raw textual response content, line by line. Since this content is actually json, you want to first decode it to Python (response.json() is mainly a shortcut for json.loads(response.content)). You then get a python dict with, in this case, a single key "topartists" which points to a list of "artist" dicts.
A couple hints:
First you may want to learn to use string formatting instead of string concatenation. This :
'http://ws.audioscrobbler.com/2.0/?method=geo.gettopartists&country=spain&format=json&page='+str(i)+'&api_key='+api_key
is ugly and hardly readable. Using string formatting:
urltemplate = "http://ws.audioscrobbler.com/2.0/?method=geo.gettopartists&country=spain&format=json&page={page}&api_key={api_key}"
url = urltemplate.format(page=i, api_key=api_key)
but actually requests knows how to build a querystring from a dict, so you should really use this instead:
query = {
"method": "geo.gettopartists",
"country":"spain",
"format":"json",
"api_key": api_key
}
url = "http://ws.audioscrobbler.com/2.0/"
for pagenum in range(x, y):
query["page"] = pagenum
response = requests.get(url, params=query)
# etc
Then, you may also want to handle errors - there are quite a few things that can go wrong doing an HTTP request.
I create a url with {} format to change the url on the fly.
It works totally fine on my PC.
But once I upload and run it from scrapinghub one(state) of the many substitutions(others work fine) does not work, it returns %7B%7D& in the url which is encoded curly braces.
Why does this happen? What do I miss when referencing State variable?
This is the url from my code:
def __init__(self):
self.state = 'AL'
self.zip = '35204'
self.tax_rate = 0
self.years = [2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017]
def parse_m(self, response):
r = json.loads(response.text)
models = r['models']
year = response.meta['year']
make = response.meta['make']
for model in models:
for milage in [40000,50000,60000,70000,80000,90000,100000]:
url = '****/vehicles/?year={}&make={}&model={}&state={}&mileage={}&zip={}'.format(year,make, model, self.state, milage, self.zip)
and this is the url i see in the log of scrapinghub:
***/vehicles/?year=2010&make=LOTUS&model=EXIGE%20S&state=%7B%7D&mileage=100000&zip=35204
This is not a scrapinghub issue. It has to be your code only. If I do below
>>> "state={}".format({})
'state={}'
This would end up being
state=%7B%7D
I would add
assert type(self.state) is str
to my code to ensure this situation doesn't happen and if it does then you get an AssertionError
Example - I have the following dictionary...
URLDict = {'OTX2':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=OTX2&action=view_all',
'RAB3GAP':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=RAB3GAP1&action=view_all',
'SOX2':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=SOX2&action=view_all',
'STRA6':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=STRA6&action=view_all',
'MLYCD':'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=MLYCD&action=view_all'}
I would like to use urllib to call each url in a for loop, how can this be done?
I have successfully done this with with the urls in a list format like this...
OTX2 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=OTX2&action=view_all'
RAB3GAP = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=RAB3GAP1&action=view_all'
SOX2 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=SOX2&action=view_all'
STRA6 = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=STRA6&action=view_all'
MLYCD = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=MLYCD&action=view_all'
URLList = [OTX2,RAB3GAP,SOX2,STRA6,PAX6,MLYCD]
for URL in URLList:
sourcepage = urllib.urlopen(URL)
sourcetext = sourcepage.read()
but I want to also be able to print the key later when returning data. Using a list format the key would be a variable and thus not able to access it for printing, I would lonly be able to print the value.
Thanks for any help.
Tom
Have you tried (as a simple example):
for key, value in URLDict.iteritems():
print key, value
Doesn't look like a dictionary is even necessary.
dbs = ['OTX2', 'RAB3GAP', 'SOX2', 'STRA6', 'PAX6', 'MLYCD']
urlbase = 'http://lsdb.hgu.mrc.ac.uk/variants.php?select_db=%s&action=view_all'
for db in dbs:
sourcepage = urllib.urlopen(urlbase % db)
sourcetext = sourcepage.read()
I would go about it like this:
for url_key in URLDict:
URL = URLDict[url_key]
sourcepage = urllib.urlopen(URL)
sourcetext = sourcepage.read()
The url is obviously URLDict[url_key] and you can retain the key value within the name url_key. For exemple:
print url_key
On the first iteration will printOTX2.