At this link, when you hover over any row, there is an image box labeled "i" that you can click to get extra data. Then navigate to Lines History. Where is that information coming from? I can't find the URL that it is connected with.
I used dev tools in Chrome and found out that there's an AJAX POST being made:
Request URL:http://www.sbrforum.com/ajax/?a=[SBR.Odds.Modules]OddsEvent_GetLinesHistory
Form Data: UserId=0&Sport=basketball&League=NBA&EventId=259672&View=LH&SportsbookId=238&DefaultBookId=238&ConsensusBookId=19&PeriodTypeId=&StartDate=2014-03-24&MatchupLink=http%3A%2F%2Fwww.sbrforum.com%2Fnba-basketball%2Fmatchups%2F20140324-602%2F&Key=de2f9e1485ba96a69201680d1f7bace4&theme=default
but when I try to visit this URL in the browser I get "Invalid Ajax Call -- from host:"
Any idea?
Like you say, it's probably an HTTP POST request.
When you navigate to the URL with the browser, the browser issues a GET request, without all the form data.
Try curl, wget, or the JavaScript console in your browser to do a POST.
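If you want to do it from Python instead, here is a rough sketch with the requests library, reusing the form data captured above (the Key and the various IDs come from the captured request and are probably session- or event-specific, so they may need to be refreshed; the server may also check headers such as X-Requested-With or Referer):

import requests

url = 'http://www.sbrforum.com/ajax/?a=[SBR.Odds.Modules]OddsEvent_GetLinesHistory'
form_data = {
    'UserId': '0',
    'Sport': 'basketball',
    'League': 'NBA',
    'EventId': '259672',
    'View': 'LH',
    'SportsbookId': '238',
    'DefaultBookId': '238',
    'ConsensusBookId': '19',
    'PeriodTypeId': '',
    'StartDate': '2014-03-24',
    'MatchupLink': 'http://www.sbrforum.com/nba-basketball/matchups/20140324-602/',
    'Key': 'de2f9e1485ba96a69201680d1f7bace4',
    'theme': 'default',
}
# POST the form data (a plain GET without it is what triggers "Invalid Ajax Call")
response = requests.post(url, data=form_data, headers={'X-Requested-With': 'XMLHttpRequest'})
print(response.text)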
Related
1 - How do websites send cookies to the browser?
2 - How does the website know the browser address to send it cookies?
3 - How do websites detect visits?
4 - How does the browser send cookies back to the website?
Cookies aren't just 'sent' once; this can also be done in JavaScript, for example (through an API or user actions), but normally it is done on the first page load.
1. Cookies are set in HTTP headers.
You receive these when you first load the page. You can inspect this in the "Network" tab when you press F12 in your browser. Click on an item and check out its headers. They look something like this:
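(A representative example; the exact headers vary per site and the cookie names and values below are made up.)

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Set-Cookie: session_id=abc123; Path=/; HttpOnly
Set-Cookie: theme=dark; Max-Age=31536000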
2. They are sent along with the document; the server simply includes them in its response to the request your browser made, so it doesn't need a separate address. HTTP isn't a stream; to keep it simple, it's just UTF-8 text.
3. Whenever you visit a website, you send a request to the server.
When the server receives the request (along with its headers, and extra data if you submit a form), it can perform some logic with it: extract data from the request headers and body, or maybe even set a cookie that tells the server you've visited! Anyway, here is an example of a request:
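(A representative GET request; the host and path are placeholders.)

GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0
Accept: text/html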
4. Whenever you send a request, the headers, which include your cookies, get sent along with it.
They look like:
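(Again representative; the cookie values are made up.)

GET /profile HTTP/1.1
Host: www.example.com
Cookie: session_id=abc123; theme=dark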
I have a link: https://uchebnik.mos.ru/exam/test/test_by_binding/2511452/homework/152004494/variant/45906101/num/1?generation_context_type=lesson&external_binding_id=2230159&referer=homework&registration={my_token}
If I am logged in to the site, then I should be automatically redirected to the link: https://uchebnik.mos.ru/exam/test/training_spec/124085/task/1?registration={my_token}
I use cookies for authorization: "auth_token", "profile_id", "udacl".
How can I get this redirect link?
I use urllib.open and request.url, but the link does not change.
My guess is that this is due to JavaScript authorization; I tried the requests_html library with JS support, but the result does not change.
Preferably without using webdrivers.
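A rough sketch of the attempt described above, using the requests library and passing the three cookies by hand (the cookie values are placeholders, and whether the redirect happens over HTTP at all is exactly what this checks):

import requests

# Placeholder values: take the real ones from the browser after logging in.
cookies = {
    'auth_token': '<auth_token>',
    'profile_id': '<profile_id>',
    'udacl': '<udacl>',
}

url = ('https://uchebnik.mos.ru/exam/test/test_by_binding/2511452/homework/'
       '152004494/variant/45906101/num/1?generation_context_type=lesson'
       '&external_binding_id=2230159&referer=homework&registration={my_token}')

r = requests.get(url, cookies=cookies, allow_redirects=True)
print(r.history)  # HTTP redirects that were followed, if any
print(r.url)      # final URL after HTTP redirects

# If r.url is still the original link, the redirect is probably done in
# JavaScript on the page, so it will never show up in the HTTP response.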
I'm trying to make a sneaker bot for the Nike Brazil sneakers site (nike.com.br/snkrs) using scraping. First, I tried to make a spider that logs into the site. I see that there are some login requests on the network, so I tried to send the ones that I think are needed.
Here is the full code of the spider:
<https://pastebin.com/Xke6kP1P>
But when I try to run the spider to test it, I get some HTTP request errors like 400 Bad Request, 401 (POST HTTP request not handled or allowed), or 403 Forbidden.
Here is the full error code:
<https://pastebin.com/ge9pr6qx>
I'm already using proxy IP and user-agent rotation middlewares but still get these errors.
One way to try to solve this problem is editing your settings.py.
You should copy the headers that the browser sends with its requests.
To get access successfully, copy all these parameters from a regular web browser.
Open the developer tools (F12), go to the Network tab, navigate around the site, and copy the header information from a request to that domain.
settings.py
USER_AGENT = '' #copy user-agent from browser
# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
'Accept': '', #copy this info from browser
'Accept-Language': '', #copy this info from browser
}
Here is a good tutorial (2017) that explains in detail how you can handle navigation errors; it's old, but you can get the main idea. This tutorial is also linked in the resources section of the Scrapy website.
Hope it helps.
You get 403 Forbidden because of your _abck cookie (the cookie set by Akamai's bot protection).
I'm not sure if such a thing is possible, but I am trying to submit a form such as the one at https://lambdaschool.com/contact using a POST request.
I currently have the following:
import requests
payload = {"name":"MyName","lastname":"MyLast","email":"someemail#gmail.com","message":"My message"}
r = requests.post('http://lambdaschool.com/contact',params=payload)
print(r.text)
But I get the following error:
<title>405 Method Not Allowed</title>
etc.
Is such a thing possible to submit using a POST request?
If it were that simple, you'd see a lot of bots attacking every login form ever.
That URL obviously doesn't accept POST requests, and that doesn't necessarily mean the submit button is POSTing to that page (though clicking the button also gives that same error...).
You need to open the Chrome/Firefox dev tools, watch the request to see what happens on form submit, and replicate that data in Python.
Another option would be the mechanize or Selenium WebDriver libraries, to simulate a browser and fill out the form.
params is for query parameters. You either want data, for a form-encoded body, or json, for a JSON body.
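For example, a minimal tweak to the snippet from the question, sending the payload as a form-encoded body instead of query parameters (the field names are just the ones from the question; the real form may expect different names or an extra token):

import requests

payload = {"name": "MyName", "lastname": "MyLast",
           "email": "someemail@gmail.com", "message": "My message"}
# data= sends an application/x-www-form-urlencoded body;
# use json=payload instead if the endpoint expects JSON.
r = requests.post('https://lambdaschool.com/contact', data=payload)
print(r.status_code, r.text)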
I think the URL should be 'http://lambdaschool.com/contact-form'.
I am trying to fetch the HTML content of a website using urllib2. The site has a body onload event that submits a form, which takes it to a destination site that renders the details I need.
response = urllib2.urlopen('http://www.xyz.com?var=999-999')
www.xyz.com contains a form that is posted to www.abc.com. This action value varies depending upon the content in the URL ('var=999-999'), which means the action value will change if the var value changes to '888-888'.
response.read()
this still gives me the HTML content of www.xyz.com, but I want that of the resulting action URL. Any suggestions for fetching the HTML content from the final page?
Thanks in advance
You have to figure out the call to that second page, including the parameters sent, so you can make that call yourself from your Python code. The best way is to navigate the first page with the Google Chrome page inspector open, then go to the Network tab, where the POST call will be captured and you can see the parameters sent. Then just recreate that same POST call from urllib2.
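A rough sketch of that last step (Python 2, since the question uses urllib2); the action URL and field names are placeholders that have to be taken from the captured request in the Network tab:

import urllib
import urllib2

# Placeholder values: copy the real action URL and form fields from the
# captured POST request in the Network tab.
action_url = 'http://www.abc.com/destination'
form_fields = {'var': '999-999'}

# Passing a data argument makes urllib2 issue a POST instead of a GET.
request = urllib2.Request(action_url, data=urllib.urlencode(form_fields))
response = urllib2.urlopen(request)
print response.read()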