I am receiving the following response from the server:
ctrlDateTime%24txtSpecifyFromDate=05%2F02%2F2015&
ctrlDateTime%24rgApplicable=rdoApplicableFor&
ctrlDateTime%24txtSpecifyToDate=05%2F02%2F2015&
I am trying with:
br["ctrlDateTime%24txtSpecifyFromDate"]="05%2F02%2F2015";
br["ctrlDateTime%24rgApplicable"]="rdoApplicableFor";
br["ctrlDateTime%24txtSpecifyToDate"]="05%2F02%2F2015";
How can I fix the ControlNotFoundError? Here is my code:
import mechanize
import re
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0')]
response = br.open("http://marketinformation.natgrid.co.uk/gas/frmDataItemExplorer.aspx")
br.select_form(nr=0)
br.set_all_readonly(False)
html = response.read()
mnext = re.search("""<a id="lnkNext" href="javascript:__doPostBack('(.*?)','(.*?)')">XML""", html)
br["tvDataItem_ExpandState"]="cccccccceennncennccccccccc";
br["tvDataItem_SelectedNode"]="";
br["__EVENTTARGET"]="lbtnCSVDaily";
br["__EVENTARGUMENT"]="";
br["tvDataItem_PopulateLog"]="";
br["__VIEWSTATE"]="%2FwEP.....SNIP....%2F90SB9E%3D";
br["__VIEWSTATEGENERATOR"]="B2D04314";
br["__EVENTVALIDATION"]="%2FwEW...SNIP...uPceSw%3D%3D";
br["txtSearch"]="";
br["tvDataItemn11CheckBox"]="on";
br["tvDataItemn15CheckBox"]="on";
br["ctrlDateTime%24txtSpecifyFromDate"]="05%2F02%2F2015";
br["ctrlDateTime%24rgApplicable"]="rdoApplicableFor";
br["ctrlDateTime%24txtSpecifyToDate"]="05%2F02%2F2015";
br["btnViewData"]="View+Data+for+Data+Items";
br["hdnIsAddToList"]="";
response = br.submit()
print(response.read());
Thanks in advance.
P.
This was solved in two steps: 1) I replaced %24 with '$' in the control names; 2) some of the parameters had to be passed as plain strings and some as lists, e.g. ['',].
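For illustration, here is a minimal sketch of what the failing assignments from the question might look like under that fix. The control names come from the question; passing the dates decoded and the radio-group value as a list are assumptions about this particular form, not something confirmed by the original answer.
# Sketch: replaces the three failing lines in the question's script
# (assumes br.select_form(nr=0) and br.set_all_readonly(False) were already called)
br["ctrlDateTime$txtSpecifyFromDate"] = "05/02/2015"      # text control: plain string, '$' instead of %24
br["ctrlDateTime$txtSpecifyToDate"] = "05/02/2015"        # mechanize URL-encodes values itself on submit
br["ctrlDateTime$rgApplicable"] = ["rdoApplicableFor"]    # radio group: value passed as a list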
Input:
import requests
from http import cookiejar
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64;rv:57.0) Gecko/20100101 Firefox/57.0'}
url = "http://www.baidu.com/"
session = requests.Session()
req = session.put(url = url,headers=headers)
cookie = requests.utils.dict_from_cookiejar(req.cookies)
print(session.cookies.get_dict())
print(cookie)
Gives output:
{'BAIDUID': '323CFCB910A545D7FCCDA005A9E070BC:FG=1', 'BDSVRTM': '0'}
{'BAIDUID': '323CFCB910A545D7FCCDA005A9E070BC:FG=1'}
I am trying to use this code to get all the cookies from the Baidu website, but it only returns the first cookie. When I compare it with the cookies the browser shows (in the picture), there are 9 cookies. How can I get all of them?
You didn't maintain your session, so it terminated after the second cookie. Use the Session as a context manager and read the cookies from the response:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}
url = "http://www.baidu.com/"

with requests.Session() as s:
    req = s.get(url, headers=headers)
    print(req.cookies.get_dict())

>>> print(req.cookies.get_dict().keys())
['BDSVRTM', 'BAIDUID', 'H_PS_PSSID', 'BIDUPSID', 'PSTM', 'BD_HOME']
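If you need more than the names and values, you can also iterate over the session's cookie jar directly. This is a small sketch along the same lines (same URL and headers as above), not part of the original answer:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}
url = "http://www.baidu.com/"

with requests.Session() as s:
    s.get(url, headers=headers)
    # The session jar accumulates cookies from every response, including redirects.
    for cookie in s.cookies:
        print(cookie.name, cookie.value, cookie.domain)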
I am using Python 3.5.2. I want to scrape a webpage where cookies are required, but when I use requests.session() the cookies maintained in the session are not updated, so my scraping keeps failing. The following is my code snippet.
import requests
from bs4 import BeautifulSoup
import time
import requests.utils
session = requests.session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"})
print(session.cookies.get_dict())
url = "http://www.beianbaba.com/"
session.get(url)
print(session.cookies.get_dict())
Do you guys have any idea about this? Thank you so much in advance.
It seems like that website is simply not setting any cookies in its response. I used the exact same code but requested google.com instead:
import requests
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"})
print(session.cookies.get_dict())
url = "http://google.com/"
session.get(url)
print(session.cookies.get_dict())
And got this output:
{}
{'NID': 'a cookie that i removed'}
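If you want to confirm what a server is actually setting, one way (a sketch, not part of the original answer) is to print the raw Set-Cookie response header next to the session jar:
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"})

resp = session.get("http://www.beianbaba.com/")
# If the server sent no Set-Cookie header, the jar stays empty.
print(resp.headers.get("Set-Cookie"))
print(session.cookies.get_dict())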
I am trying to use the code below to access websites in Python 3 using urllib:
url = "http://www.goal.com"
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
r = urllib.request.Request(url=url, headers=headers)
urllib.request.urlopen(r).read(1000)
It works fine when accessing "yahoo.com", but it always returns error 403 when accessing sites such as "goal.com" and "hkticketing.com.hk", and I cannot figure out what I am missing. I appreciate your help.
In Python 2.x you can use urllib2 to fetch the contents. Build an opener and set its addheaders attribute to add the header information, then invoke the open method and read the contents. Finally, print them.
import urllib2
import sys
print sys.version
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0')]
print opener.open('http://hkticketing.com.hk').read()
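Since the question uses Python 3, a roughly equivalent sketch with urllib.request (assuming the User-Agent header is what matters here) would be:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0')]
# Read the first 1000 bytes, as in the question.
print(opener.open('http://hkticketing.com.hk').read(1000))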
I am attempting to use Mechanize to get some data and I get the error "'NoneType' object does not support item assignment". I have copied the code I am using below.
import mechanize
url = "http://www.tropicos.org"
br = mechanize.Browser()
br.form["ct100_MainContentPlaceHolder_acNameControl_textBox"] = "poa annua"
response = br.submit()
print response.read()
Your problem is that you are not calling open on br before you access form. Thus, try the following:
import mechanize
url = "http://www.tropicos.org"
br = mechanize.Browser()
br.open(url) #RIGHT HERE
br.form["ct100_MainContentPlaceHolder_acNameControl_textBox"] = "poa annua"
response = br.submit()
print response.read()
And it should work.
Try adding a user agent and selecting the form by number. Your code will look like:
import random
import mechanize

useragents = ['Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)']
url = "http://www.tropicos.org"
br = mechanize.Browser()
br.addheaders = [('User-agent', random.choice(useragents))]
site = br.open(url)
br.select_form(nr=0)
br.form["ct100_MainContentPlaceHolder_acNameControl_textBox"] = "poa annua"
response = br.submit()
print response.read()
Usually select_form is set to 0; in some cases the form is hidden.
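If you are not sure which form index to pass to select_form, a quick way to check (a sketch, assuming the page loads) is to enumerate the browser's forms and their controls:
import mechanize

br = mechanize.Browser()
br.open("http://www.tropicos.org")
# Print every form with its index so the right nr= value can be chosen.
for i, form in enumerate(br.forms()):
    print("form %d: name=%r" % (i, form.name))
    for control in form.controls:
        print("  %s control: %r" % (control.type, control.name))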
I am having an issue with mechanize's timeout feature. On most pages it works perfectly, if the URL fails to load in a reasonable amount of time it raises an error: urllib2.URLError: <urlopen error timed out>. However, on certain pages the timer does not work and the program becomes unresponsive even to a keyboard interrupt. Here is an example page where that occurs:
import mechanize
url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Firefox')]
html = br.open(url, timeout=0.01).read() #hangs on this page, timeout set extremely low to trigger timeout on all pages for debugging
First, does this script hang for other people for this particular URL? Second, what could be going wrong/how do I debug?
I don't know why that URL request hangs for mechanize, but with urllib2 the request comes back fine. Maybe they have some code that recognizes mechanize despite robots handling being set to False.
I think urllib2 should be a good solution for your situation:
import mechanize
import urllib2

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'
try:
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    html = br.open(url).read()
except Exception:
    # Fall back to urllib2 if mechanize fails.
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16'})
    con = urllib2.urlopen(req)
    html = con.read()
print html
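If you still want the timeout behaviour from the question on the fallback path, urllib2.urlopen also accepts a timeout argument. A short sketch (the 10-second value is just an example):
import urllib2

url = 'https://web.archive.org/web/20141104183547/http://www.dallasnews.com/'
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
try:
    # timeout is in seconds; a stalled server raises URLError instead of hanging forever.
    html = urllib2.urlopen(req, timeout=10).read()
except urllib2.URLError as e:
    print('request failed: %s' % e)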