While learning how to send POST parameters with urllib, I'm trying to grab table values for a particular date, entered via the parameters in the code below. However, it doesn't return the values for 12th September; instead the response shows the date as 12th October.
Using POSTMAN, the response is returned for the correct date, but with Python I'm unable to obtain values for anything other than the current month. Any explanation of what could be causing this? Any help/suggestion is appreciated.
import urllib
import urllib2
url = ''
data = urllib.urlencode({'priceDate.month' : '09', 'priceDate.date' : '12','priceDate.year':'2016','submit':'Show Prices'})
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
d = response.read()
print d
Just use the requests module; note that the form field is priceDate.day, not priceDate.date as in your code:
import requests

url = "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
# the day field is priceDate.day (the question used priceDate.date)
parms = {'priceDate.month': '09', 'priceDate.day': '12', 'priceDate.year': '2016', 'submit': 'CSV+Format'}
resp = requests.post(url, parms)
print(resp.content)
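If you want to stay with urllib2, the question's code appears to work once the field name is fixed; a minimal sketch, assuming the URL above and that the field name was the only problem:
import urllib
import urllib2

url = 'https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm'
# the form expects priceDate.day, not priceDate.date
data = urllib.urlencode({'priceDate.month': '09',
                         'priceDate.day': '12',
                         'priceDate.year': '2016',
                         'submit': 'Show Prices'})
req = urllib2.Request(url, data)  # supplying data makes this a POST
response = urllib2.urlopen(req)
print response.read()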
I'm trying to make a python script that will scrape data from this website
https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315
and download the CSV from yesterday. As you can see, it's got two menu options for dates, a radio button for CSV and then a submit button.
I thought perhaps I could use the requests library? Not looking for someone to do it for me, but if anyone could point me in the right direction that would be great!
I know this is too simple but here is what I have so far:
import requests

print('Download Starting...')
url = 'https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315'
r = requests.get(url)
filename = url.split('/')[-1]  # use the last segment of the URL as the filename
with open(filename, 'wb') as output_file:
    output_file.write(r.content)
print('done')
You first need to use requests.Session() in order to store cookies and re-send them in subsequent requests. The process is as follows:
get the original URL first to obtain the cookies (session id)
make a POST request to /reports/ci_report/server/request.php with some parameters, including the date and output format. The result is JSON with an id like this:
{'jrId': 'jr_13879611'}
make a GET request to /reports/ci_report/server/streamReport.php?jrId=jr_13879611, which returns the csv data
One parameter of the POST request needs the menuitem query param value from your original url, so we parse the query params to extract it using urlparse:
import requests
import time
import urllib.parse as urlparse
from urllib.parse import parse_qs
from datetime import datetime, timedelta

# yesterday's date in the DD-Mon-YYYY format the report form expects
yesterday = datetime.now() - timedelta(1)
yesterday_date = f'{yesterday.strftime("%d")}-{yesterday.strftime("%B")[:3]}-{yesterday.strftime("%Y")}'

original_url = "https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315"
parsed = urlparse.urlparse(original_url)

target_url = "https://noms.wei-pipeline.com/reports/ci_report/server/request.php"
stream_report_url = "https://noms.wei-pipeline.com/reports/ci_report/server/streamReport.php"

s = requests.Session()

# load the cookies (session id)
s.get(original_url)

# get the report id
r = s.post(target_url,
    params={
        "request.preventCache": int(round(time.time() * 1000))
    },
    data={
        "ReportProc": "CIPR_DAILY_BULLETIN",
        "p_ci_id": parse_qs(parsed.query)['menuitem'][0],  # menuitem value from the original url
        "p_opun": "PL",
        "p_gas_day_from": yesterday_date,
        "p_gas_day_to": yesterday_date,
        "p_output_option": "CSV"
    })

# stream the csv, using the returned {'jrId': ...} JSON as query params
r = s.get(stream_report_url, params=r.json())
print(r.text)
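If you want the CSV written to disk rather than printed, a minimal follow-up (the filename pattern is just an example):
# write yesterday's report to a CSV file instead of printing it
with open('report_{}.csv'.format(yesterday_date), 'w') as f:
    f.write(r.text)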
I have been successfully using the python Requests module to send POST requests to a server with specified parameters:
resp = requests.request("POST", url, proxies=proxies, data=data, headers=headers, params=params, timeout=timeout)
However, for a certain reason I now need to use the python urllib2 module for the query. For urllib2.urlopen's "data" parameter, my understanding is that it helps form the query string (the same as Requests' "params"). requests.request's "data" parameter, on the other hand, is used to fill the request body.
After searching and reading many posts, examples, and documentation pages, I still have not been able to figure out what the corresponding parameter of requests.request's "data" is in urllib2.
Any advice is much appreciated! Thanks.
-Janton
It doesn't matter what it is called; what matters is passing it in at the right place. In the example below, the POST data is a dictionary (the name can be anything).
The dictionary is urlencoded, and the urlencoded result can again be bound to any name, but I've picked "postdata" since it is the data that is POSTed:
import urllib   # for the urlencode
import urllib2

searchdict = {'q': 'urllib2'}
url = 'https://duckduckgo.com/html'

postdata = urllib.urlencode(searchdict)
req = urllib2.Request(url, postdata)  # the second argument becomes the request body
response = urllib2.urlopen(req)

print response.read()
print response.getcode()
If your POST data is already a URL-encoded string rather than a Python type such as a dictionary, it works without urllib.urlencode:
import urllib2
searchstring = 'q=urllib2'
url = 'https://duckduckgo.com/html'
req = urllib2.Request(url, searchstring)
response = urllib2.urlopen(req)
print response.read()
print response.getcode()
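To make the correspondence with requests explicit: requests' "data" keyword fills the request body for you, which is exactly what urllib2.Request's second argument does once you urlencode the dictionary yourself (requests' "params", by contrast, becomes the query string). A side-by-side sketch using the search example above:
import urllib
import urllib2
import requests

url = 'https://duckduckgo.com/html'
payload = {'q': 'urllib2'}

# requests: the dictionary goes in via data= and is encoded for you
resp = requests.request("POST", url, data=payload)

# urllib2: you encode the dictionary yourself and pass it as Request's data argument
req = urllib2.Request(url, urllib.urlencode(payload))
resp2 = urllib2.urlopen(req)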
This is my code thus far.
import json
import urllib

url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = urllib.urlopen(url)
print response

data = json.load(response)
print data
The problem is that when I look at the JSON in the browser it is long and contains more fields than I see when printing it.
To be more exact, I'm looking for the 'points' part, which should be
data['points']['points']
however
data['points']
has only 2 attributes and doesn't contain the second 'points' that I do see in the browser.
Could it be that I can only load 1 "layer" deep and not 2?
You need to add a user-agent to your request.
Using requests (which the urllib documentation itself recommends over using urllib directly), you can do:
import requests
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = requests.get(url, headers={'user-agent': 'Mozilla 5.0'})
print(response.json())
# long output....
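If you need to stay on the standard library, the same header can be attached via urllib2.Request (a sketch assuming Python 2, as in the question; the header value mirrors the answer above):
import json
import urllib2

url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla 5.0'})
data = json.load(urllib2.urlopen(req))
print data['points']['points']   # the nested field the question was after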
All I want to do is send a request to the 511 API and return the train times from a train station. I can do that using the full URL request, but I would like to be able to set the values without pasting together a string, so that I can have the API return the train times for different stations. I see other requests that use headers, but I don't know how to use headers with a request and am confused by the documentation.
This works...
urllib2.Request("http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?token=xxxx-xxx&stopcode=70142")
response = urllib2.urlopen(request)
the_page = response.read()
I want to be able to set values like this...
token = xxx-xxxx
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"
... and then put them together like this...
urllib2.Request(url,token, stopcode)
and get the same result.
The string formatting documentation would be a good place to start to learn more about different ways to plug in values.
val1 = 'test'
val2 = 'test2'
url = "https://www.example.com/{0}/blah/{1}".format(val1, val2)
urllib2.Request(url)
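Applied to the URL from the question, that might look like this (the token value is a placeholder):
import urllib2

token = 'xxx-xxxx'    # placeholder; substitute your real token
stopcode = 70142
url = ("http://services.my511.org/Transit2.0/"
       "GetNextDeparturesByStopCode.aspx?token={0}&stopcode={1}").format(token, stopcode)
request = urllib2.Request(url)
response = urllib2.urlopen(request)
the_page = response.read()
If the values can contain characters that need escaping, use urllib.urlencode as shown in the next answer.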
The missing piece is that "urllib" needs to be used along with "urllib2". Specifically, the function urllib.urlencode() returns the encoded version of the values.
From the urllib documentation:
import urllib
query_args = { 'q':'query string', 'foo':'bar' }
encoded_args = urllib.urlencode(query_args)
print 'Encoded:', encoded_args
url = 'http://localhost:8080/?' + encoded_args
print urllib.urlopen(url).read()
So the corrected code is as follows:
import urllib
import urllib2

token = 'xxx-xxxx'    # placeholder for your real token
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx?"

query_args = {"token": token, "stopcode": stopcode}
encoded_args = urllib.urlencode(query_args)

request = urllib2.Request(url + encoded_args)
response = urllib2.urlopen(request)
print(response.read())
Actually, it is a million times easier to use the requests package instead of urllib/urllib2. All the code above can be replaced with this:
import requests

token = 'xxx-xxxx'    # placeholder for your real token
stopcode = 70142
url = "http://services.my511.org/Transit2.0/GetNextDeparturesByStopCode.aspx"

query_args = {"token": token, "stopcode": stopcode}
r = requests.get(url, params=query_args)
print(r.text)
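As a sanity check, you can inspect the URL requests built from those params:
print(r.url)          # the full URL requests built, with the query string escaped
print(r.status_code)  # quick check that the call succeeded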
I have a urllib2 opener, and wish to use it for a POST request with some data.
I am looking to receive the content of the page that I am POSTing to, and also the URL of the page that is returned (I think this just involves a 30x redirect, so something along those lines would be awesome!).
Think of this as the code:
anOpener = urllib2.build_opener(???,???)
anOpener.addheaders = [(???,???),(???,???),...,(???,???)]
# do some other stuff with the opener
data = urllib.urlencode(dictionaryWithPostValues)
pageContent = anOpener.THE_ANSWER_TO_THIS_QUESTION
pageURL = anOpener.THE_SECOND_PART_OF_THIS_QUESTION
This is such a silly question once one realizes the answer.
Just use:
anOpener.open(URL, data)
for the first part and, like Rachel Sanders mentioned,
response.geturl()
for the second part.
I really can't figure out how the whole Request/opener thing works though; I couldn't find any nice documentation :/
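For what it's worth, here is a minimal sketch of the complete opener flow described above (the header and POST values are placeholders):
import urllib
import urllib2

anOpener = urllib2.build_opener()                      # add handlers here if you need them
anOpener.addheaders = [('User-Agent', 'Mozilla 5.0')]  # placeholder header

data = urllib.urlencode({'key': 'value'})              # placeholder POST values
response = anOpener.open('http://www.example.com/', data)  # data present -> POST

pageContent = response.read()   # first part: the page content
pageURL = response.geturl()     # second part: the final URL, after any redirects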
This page should help you out:
http://www.voidspace.org.uk/python/articles/urllib2.shtml#data
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
the_url = response.geturl() # <- doc claims this gets the redirected url
It looks like you can also use response.info() to get the Location header directly instead of using .geturl().
Hope that helps!
If you add data to the request the method gets automatically changed to POST. Check out the following example:
import urllib2
import json

url = "http://server.local/x/y"
data = {"name": "JackBauer"}
method = "PUT"

request = urllib2.Request(url)
request.add_header("Content-Type", "application/json")
request.get_method = lambda: method   # force the HTTP verb to PUT
if data: request.add_data(json.dumps(data))

response = urllib2.urlopen(request)
if response: print response.read()
As I mentioned, the lambda is not needed if you use GET/POST.
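To illustrate that first point: if data is attached and get_method is not overridden, urllib2 switches the verb to POST on its own (a minimal sketch):
import urllib2
import json

url = "http://server.local/x/y"   # example URL from the answer above
request = urllib2.Request(url)
request.add_header("Content-Type", "application/json")
request.add_data(json.dumps({"name": "JackBauer"}))
print request.get_method()   # prints "POST" -- set automatically because data is present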