All I want to do is send a request to the 511 API and get back the train times for a station. I can do that with the full URL, but I would like to set the values separately instead of pasting together a string and sending that, so that I can have the API return train times for different stations. I see other requests that use headers, but I don't know how to use headers with a request, and the documentation confuses me.
This works...
request = urllib2.Request("http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token=xxxx-xxx&stopcode=70142")
response = urllib2.urlopen(request)
the_page = response.read()
I want to be able to set values like this...
token = "xxx-xxxx"
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
... and then put them together like this...
urllib2.Request(url,token, stopcode)
and get the same result.
The string formatting documentation would be a good place to start to learn more about different ways to plug in values.
val1 = 'test'
val2 = 'test2'
url = "https://www.example.com/{0}/blah/{1}".format(val1, val2)
urllib2.Request(url)
The missing piece is that urllib needs to be used along with urllib2. Specifically, urllib.urlencode() returns the URL-encoded version of the values.
From the urllib documentation:
import urllib
query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args
url = 'http://localhost:8080/?' + encoded_args
print urllib.urlopen(url).read()
So the corrected code is as follows:
import urllib
import urllib2
token = "xxx-xxxx"
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
query_args = {"token":token, "stopcode":stopcode}
encoded_args = urllib.urlencode(query_args)
request = urllib2.Request(url+encoded_args)
response = urllib2.urlopen(request)
print(response.read())
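For readers on Python 3, where urllib2 was merged into urllib.request and urlencode moved to urllib.parse, the same idea can be sketched roughly as follows. The token is a placeholder and build_departures_url is a hypothetical helper name; nothing is sent over the network here.

```python
from urllib.parse import urlencode

BASE_URL = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"

def build_departures_url(token, stopcode):
    """Return the request URL with the query string produced by urlencode."""
    query_args = {"token": token, "stopcode": stopcode}
    return BASE_URL + "?" + urlencode(query_args)

# Placeholder token; a real one comes from the 511 service.
url = build_departures_url("xxxx-xxx", 70142)
print(url)
```

Passing the resulting string to urllib.request.urlopen(url) would perform the actual request.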
Actually, it is far easier to use the requests package than urllib and urllib2. All of the code above can be replaced with this:
import requests
token = "xxx-xxxx"
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"
query_args = {"token": token, "stopcode": stopcode}
r = requests.get(url, params=query_args)
print(r.text)
I have been successfully implementing python Requests module to send out POST requests to server with specified
resp = requests.request("POST", url, proxies, data, headers, params, timeout)
However, for a certain reason, I now need to use python urllib2 module to query. For urllib2.urlopen's parameter "data," what I understand is that it helps to form the query string (which is the same as Requests "params"). requests.request's parameter "data," on the other hand, is used to fill the request body.
After searching and reading many posts, examples, and the documentation, I still have not been able to figure out which urllib2 parameter corresponds to requests.request's "data".
Any advice is much appreciated! Thanks.
-Janton
The name doesn't matter; what matters is passing it in at the right place. In the example below, the POST data starts out as a dictionary (the name can be anything).
The dictionary is URL-encoded, and the result (again, any name will do; I picked "postdata") is passed as the second argument to urllib2.Request, which makes it the body of the POST.
import urllib # for the urlencode
import urllib2
searchdict = {'q' : 'urllib2'}
url = 'https://duckduckgo.com/html'
postdata = urllib.urlencode(searchdict)
req = urllib2.Request(url, postdata)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
If your POST data is already a URL-encoded string (rather than a Python type such as a dictionary), it works without urllib.urlencode:
import urllib2
searchstring = 'q=urllib2'
url = 'https://duckduckgo.com/html'
req = urllib2.Request(url, searchstring)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
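To make the mapping to Requests explicit: the data argument of urllib2.Request is the request body, so it corresponds to Requests' "data", while the counterpart of "params" is a query string appended to the URL. A minimal Python 3 sketch of the two cases (urllib.request and urllib.parse are the Python 3 homes of these functions; no request is actually sent):

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://duckduckgo.com/html"
fields = {"q": "urllib2"}

# Query-string style (what Requests calls params): values go into the URL.
get_req = Request(url + "?" + urlencode(fields))

# Body style (what Requests calls data): values go into the request body.
# Python 3 requires the body to be bytes, hence the encode() call.
post_req = Request(url, data=urlencode(fields).encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Supplying a body is what switches the method from GET to POST.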
This is my code thus far.
import json
import urllib
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = urllib.urlopen(url)
print response
data = json.load(response)
print data
The problem is that when I look at the json in the browser it is long and contains more features than I see when printing it.
To be more exact, I'm looking for the 'points' part which should be
data['points']['points']
however
data['points']
has only 2 attributes and doesn't contain the second 'points' that I do see in the url in the browser.
Could it be that I can only load 1 "layer" deep and not 2?
You need to add a user-agent to your request.
Using requests (which urllib documentation recommends over directly using urllib), you can do:
import requests
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = requests.get(url, headers={'user-agent': 'Mozilla 5.0'})
print(response.json())
# long output....
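If installing requests is not an option, the same header can be attached with the standard library. This is a sketch for Python 3 (urllib.request); it only builds the request object and inspects it, so the user-agent value is simply the one used above.

```python
from urllib.request import Request

url = "https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682"

# Headers are passed as a plain dictionary when the request is created.
req = Request(url, headers={"user-agent": "Mozilla 5.0"})

# urllib normalizes header names with str.capitalize(), so look it up as "User-agent".
print(req.get_header("User-agent"))
print(req.header_items())
```

Passing req to urllib.request.urlopen would then send the request with that header.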
To learn how POST parameters work with urllib, I'm trying to grab the table values for a particular date using the parameters in the code below. However, it doesn't return the values for 12 September; the response shows the date as 12 October instead.
Using Postman, the response comes back for the correct date, but with Python I'm unable to obtain values for anything other than the current month. What could be causing this? Any help or suggestion is appreciated.
import urllib
import urllib2
url = ''
data = urllib.urlencode({'priceDate.month' : '09', 'priceDate.date' : '12','priceDate.year':'2016','submit':'Show Prices'})
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
d = response.read()
print d
Just use the requests module. Note that this sends the day as 'priceDate.day', whereas the code in the question uses 'priceDate.date'.
import requests
url = "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
params = {'priceDate.month':'09','priceDate.day':'12','priceDate.year':'2016','submit':'CSV+Format'}
resp = requests.post(url, data=params)
print(resp.content)
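When a form POST behaves differently from Postman, it can help to look at the encoded body before sending it. The following Python 3 sketch only encodes and decodes the form fields locally; the field names are the ones used in this thread, not verified against the server.

```python
from urllib.parse import parse_qs, urlencode

form = {
    "priceDate.month": "09",
    "priceDate.day": "12",
    "priceDate.year": "2016",
    "submit": "Show Prices",
}

# This is the string that ends up in the POST body.
body = urlencode(form)
print(body)

# Decode it again to confirm the field names and values the server will see.
decoded = parse_qs(body)
print(decoded["priceDate.day"], decoded["submit"])

# A literal "+" inside a value is percent-encoded, so it does not become a space.
plus_body = urlencode({"submit": "CSV+Format"})
print(plus_body)
```

Comparing this output with the body Postman sends is a quick way to spot a misspelled field name.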
I have two Python scripts. One uses the urllib2 library and one uses the Requests library.
I have found Requests easier to implement, but I can't find an equivalent for urllib2's read() function. For example:
...
response = url.urlopen(req)
print response.geturl()
print response.getcode()
data = response.read()
print data
Once I have built up my post url, data = response.read() gives me the content - I am trying to connect to a vcloud director api instance and the response shows the endpoints that I have access to. However if I use the Requests library as follows.....
....
def post_call(username, org, password, key, secret):
    endpoint = '<URL ENDPOINT>'
    post_url = endpoint + 'sessions'
    get_url = endpoint + 'org'
    headers = {'Accept': 'application/*+xml;version=5.1',
               'Authorization': 'Basic ' + base64.b64encode(username + "#" + org + ":" + password),
               'x-id-sec': base64.b64encode(key + ":" + secret)}
    print headers
    post_call = requests.post(post_url, data=None, headers=headers)
    print post_call, "POST call"
    print post_call.text, "TEXT"
    print post_call.content, "CONTENT"
    print post_call.status_code, "STATUS CODE"
....
....the print post_call.text and print post_call.content returns nothing, even though the status code equals 200 in the requests post call.
Why isn't my response from Requests returning any text or content?
Requests doesn't have a direct equivalent to urllib2's read(); the body is available as response.content (and response.text) instead:
>>> import requests
>>> response = requests.get("http://www.google.com")
>>> print response.content
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head>....'
>>> print response.content == response.text
True
It looks like the POST request you are making returns no content, which is often the case with a POST request. Perhaps it set a cookie? The status code tells you that the POST succeeded after all.
Edit for Python 3:
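Python 3 distinguishes bytes from text. response.content is a bytes object (the raw body), while response.text is a str (the body decoded using response.encoding). Calling str() on the bytes gives their repr ("b'...'"), not the decoded text, so decode them instead (this assumes response.encoding is set).
Thus,
>>> print(response.content == response.text)
False
>>> print(response.content.decode(response.encoding, errors="replace") == response.text)
True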
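The bytes/str distinction can be checked without any network access. In this Python 3 sketch, raw stands in for response.content and text for response.text; the HTML snippet is made up for illustration.

```python
# Stand-ins for response.content (bytes) and response.text (str).
raw = "<!doctype html><title>caf\u00e9</title>".encode("utf-8")
text = raw.decode("utf-8")

print(type(raw).__name__, type(text).__name__)

# A bytes object never compares equal to a str.
print(raw == text)

# str() on bytes returns the repr ("b'...'"), not the decoded text.
print(str(raw) == text)

# Decoding with the right encoding recovers the text.
print(raw.decode("utf-8") == text)
```

In Python 3, decoding the bytes explicitly is the reliable way to turn content into text.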
If the response is JSON, you could do something like this (Python 3):
import json
import requests as reqs
# Make the HTTP request.
response = reqs.get('http://demo.ckan.org/api/3/action/group_list')
# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.text)
for i in response_dict:
    print("key: ", i, "val: ", response_dict[i])
To see everything in the response you can use .__dict__:
print(response.__dict__)
If you push something (an image, for example) to an API and want the resulting address back in the response, you could do:
import requests
url = 'https://uguu.se/api.php?d=upload-tool'
data = {"name": filename}
files = {'file': open(full_file_path, 'rb')}
response = requests.post(url, data=data, files=files)
current_url = response.text
print(response.text)
If the response is JSON, you can call the method below directly in Python 3; there is no need to import json or call json.loads():
response.json()
There are three different ways to get at the contents of a response.
Content (response.content) - the raw bytes; libraries like BeautifulSoup accept binary input
JSON (response.json()) - the parsed body; most API calls respond in this format
Text (response.text) - the decoded string; suits regex-based searches, dumping data to a file, and so on
Depending on the type of page or API you are working with, use the corresponding attribute.
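The relationship between the three views can be sketched with the standard library alone (Python 3). The payload below is invented for illustration; content, text, and data play the roles of response.content, response.text, and response.json().

```python
import json

# Made-up body, standing in for what a server might return.
content = b'{"points": {"points": [{"lat": 1.5}, {"lat": 2.5}]}}'

# Decoding the bytes gives the text view.
text = content.decode("utf-8")

# Parsing the text gives nested Python dictionaries and lists.
data = json.loads(text)

print(type(content).__name__, type(text).__name__, type(data).__name__)

# Nested keys are reachable at any depth once the JSON is parsed.
print(data["points"]["points"][0]["lat"])
```

Parsing does not stop at the first level, so a missing nested key usually means the server sent a different body, not that the parser truncated it.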
I have a urllib2 opener, and wish to use it for a POST request with some data.
I am looking to receive the content of the page that I am POSTing to, and also the URL of the page that is returned (I think this is just a 30x code; so something along those lines would be awesome!).
Think of this as the code:
anOpener = urllib2.build_opener(???,???)
anOpener.addheaders = [(???,???),(???,???),...,(???,???)]
# do some other stuff with the opener
data = urllib.urlencode(dictionaryWithPostValues)
pageContent = anOpener.THE_ANSWER_TO_THIS_QUESTION
pageURL = anOpener.THE_SECOND_PART_OF_THIS_QUESTION
This is such a silly question once one realizes the answer.
Just use:
anOpener.open(URL, data)
for the first part and, as Rachel Sanders mentioned, call
geturl()
on the response it returns for the second part.
I really can't figure out how the whole Request/opener thing works though; I couldn't find any nice documentation :/
This page should help you out:
http://www.voidspace.org.uk/python/articles/urllib2.shtml#data
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
the_url = response.geturl() # <- doc claims this gets the redirected url
It looks like you can also use response.info() to get the Location header directly instead of using .geturl().
Hope that helps!
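The opener-based variant from the question can be tried end to end against a throwaway local server. This is a Python 3 sketch under stated assumptions: DemoHandler, the /register and /done paths, and the header value are all invented for the demo, and the server simply answers every POST with a 302 redirect.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: redirects every POST to /done, serves text on GET."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the posted body
        self.send_response(302)
        self.send_header("Location", "/done")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"registered"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# An empty ProxyHandler keeps any configured proxy out of the local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
opener.addheaders = [("User-Agent", "demo-client/0.1")]

data = urllib.parse.urlencode({"name": "Michael Foord"}).encode("ascii")
response = opener.open(base + "/register", data)  # data present, so this is a POST
page_content = response.read()
page_url = response.geturl()  # URL after the redirect was followed

print(page_content, page_url)
server.shutdown()
server.server_close()
```

The opener follows the 302 on its own, so read() returns the body of the final page and geturl() reports where the request ended up.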
If you add data to the request the method gets automatically changed to POST. Check out the following example:
import urllib2
import json
url = "http://server.local/x/y"
data = {"name":"JackBauer"}
method = "PUT"
request = urllib2.Request(url)
request.add_header("Content-Type", "application/json")
request.get_method = lambda: method
if data: request.add_data(json.dumps(data))
response = urllib2.urlopen(request)
if response: print response.read()
As mentioned above, the get_method lambda is only needed for other methods such as PUT; for GET or POST you can leave it out.
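In Python 3 the override is unnecessary, because urllib.request.Request accepts a method argument. A minimal sketch, reusing the URL and payload from the example above and only building the request objects without sending them:

```python
import json
from urllib.request import Request

url = "http://server.local/x/y"
payload = json.dumps({"name": "JackBauer"}).encode("utf-8")

# Explicit method: the body is sent with PUT instead of the default POST.
put_req = Request(url, data=payload, method="PUT",
                  headers={"Content-Type": "application/json"})

# No method given: having data implies POST, having none implies GET.
post_req = Request(url, data=payload)
get_req = Request(url)

print(put_req.get_method(), post_req.get_method(), get_req.get_method())
```

Passing any of these objects to urllib.request.urlopen would send it with the method shown.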