Headers while running an EC2 Ubuntu instance - Python

I am attempting to run my code on an AWS EC2 (Ubuntu) instance. The code works perfectly fine on my local machine, but it doesn't seem to be able to connect to the website from inside the server.
I'm assuming it has something to do with the headers. I have installed Firefox and Chrome on the server, but that doesn't seem to do anything.
Any ideas on how to fix this problem would be appreciated.
import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'}

# Making a GET request
response = requests.get("https://us.louisvuitton.com/eng-us/products/pocket-organizer-monogram-other-nvprod2380073v", headers=HEADERS)  # hangs here; can't make the request on the server

# Print the response
print(response.status_code)
Output:
It doesn't give me one; it just stays blank until I KeyboardInterrupt.
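A way to confirm what is happening, as a minimal hedged sketch: pass a timeout so the call fails loudly instead of blocking forever. (A plausible, though unconfirmed, explanation for the symptom is that the site silently drops traffic from datacenter IP ranges such as EC2's, which no header change would fix.)

import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'}

URL = "https://us.louisvuitton.com/eng-us/products/pocket-organizer-monogram-other-nvprod2380073v"

try:
    # timeout=(connect, read) in seconds: fail fast instead of hanging
    response = requests.get(URL, headers=HEADERS, timeout=(5, 15))
    print(response.status_code)
except requests.exceptions.Timeout:
    # A timeout here points at the connection being dropped server-side,
    # not at malformed headers
    print("Request timed out - the site may be filtering this IP")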

Related

Some websites won't respond (timeout) to requests.get. Can headers resolve this?

I'm new to using requests, and I can't seem to get https://www.costco.com to respond to a simple requests.get command.
I don't understand why; I believe it is because it knows I'm not on a browser.
I don't get any response, not even a 404.
So I tried with a simple header, and it worked on my local machine.
headers = {"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
page = requests.get("https://www.costco.com", headers=headers, timeout=10)
But when I put this in an AWS Lambda function, it went back to no response.
Is there a way to know why it won't respond at all, like what headers it is after?
Note that I have no trouble getting a response from https://google.com.
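One hedged thing to try from Lambda, sketched below: send a fuller set of standard browser headers, since some sites inspect more than the User-Agent. Whether this helps here is uncertain, because a site that drops requests at the WAF level based on the source IP will not respond regardless of headers.

import requests

# A fuller browser-like header set; all header names are standard
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
}

try:
    page = requests.get("https://www.costco.com", headers=headers, timeout=10)
    print(page.status_code)
except requests.exceptions.Timeout:
    # No response at all (rather than a 403) usually means the request
    # is dropped before it reaches the application
    print("Timed out - likely IP-level filtering rather than headers")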

Scraping Yell with Python requests gives 403 error

I have this code:
from requests import Session

url = "https://www.yell.com/s/launderettes-birmingham.html"
s = Session()
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
}
r = s.get(url, headers=headers)
print(r.status_code)
but I get a 403 response instead of 200.
I can scrape this data with Selenium, but is there a way to scrape it with requests?
If you modify your code like so:
print(r.text)
print(r.status_code)
you will see that the reason you are getting a 403 error code is that Yell uses a Cloudflare browser check.
As the check relies on JavaScript, there is no way to reliably use the requests module alone.
Since you mentioned you are going to use Selenium, make sure to use the undetected-chromedriver package, as sketched below.
Also, be sure to rotate your IP to avoid getting it blocked.
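A minimal sketch of that Selenium route, assuming undetected-chromedriver is installed (pip install undetected-chromedriver) and a local Chrome is available; the waits and selectors needed on top of this depend on the page:

import undetected_chromedriver as uc

url = "https://www.yell.com/s/launderettes-birmingham.html"

# undetected-chromedriver patches Chrome so Cloudflare's browser
# check sees an ordinary browser
driver = uc.Chrome()
try:
    driver.get(url)
    html = driver.page_source  # page HTML once the JS check has passed
    print(len(html))
finally:
    driver.quit()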

Python: trying to send a request with the requests library, but nothing happens?

As the title says, I'm trying to send a request to a URL using requests with headers, but when I try to print the status code, nothing prints in the terminal. I checked my internet connection and switched to another to test, but nothing changed.
Here's my code:
import requests
from bs4 import BeautifulSoup
from requests.exceptions import ReadTimeout

link = "https://www.exampleurl.com"
header = {
    "accept-language": "tr,en;q=0.9,en-GB;q=0.8,en-US;q=0.7",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.36",
}
r = requests.get(link)
print(r.status_code)
When I execute this, nothing appears, and I don't know why. If someone can help me, I will be so glad.
You can use requests.head(link) like below:
r = requests.head(link)
print(r.status_code)
I get the same problem: the get() never returns.
Since you have created a header variable, I thought about using it:
r = requests.get(link, headers=header)
Now I get status 200 returned.
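Putting both suggestions together, a small hedged sketch of a more defensive version: forward the header dict that was defined but never used, and add a timeout so a silent server raises an exception instead of blocking forever.

import requests
from requests.exceptions import ReadTimeout

link = "https://www.exampleurl.com"
header = {
    "accept-language": "tr,en;q=0.9,en-GB;q=0.8,en-US;q=0.7",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.36",
}

try:
    # headers= actually sends the dict; timeout= bounds the wait
    r = requests.get(link, headers=header, timeout=10)
    print(r.status_code)
except ReadTimeout:
    print("Connected, but the server never sent a response")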

Request returns 200 in Python, but an exact copy in Node.js returns 401 - what's wrong?

I am trying to send a GET request to an API using Node.js. I don't have control over the server side. The API requires two things to authenticate. I get those two values by logging in manually and then copying them over from Chrome to my script:
A cookie
The user-agent that was used to perform the login
While this whole thing used to work a couple of weeks or months ago, I now keep getting a status 401 (unauthorized). I asked a friend for help; he isn't a pro in Node, but he's pretty good with Python. He tried to build the same request with Python and, to our mutual surprise, it works perfectly fine.
So here I am with two scripts that are supposed to perform an absolutely identical action, but each has a different outcome. The request headers are identical in both, and since the Python request works fine, it's also confirmed that they are valid and sufficient to authenticate the request. Both scripts run on the same machine under Windows 10.
Script in Node.JS (returns a 401 - unauthorized):
const request = require("request");
const url = "https://api.rollbit.com/steam/market?query&order=1&showTradelocked=false&showCustomPriced=true&min=0&max=4294967295"
const headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'Cookie': '__Secure-RollbitSession=JWDEFp________HaLLfT'
};

request.get(url, {headers: headers, json: true}, function(err, resp, body){
    console.log(" > Response status in JS: " + resp.statusCode);
});
Same script in Python (returns a 200 - success):
import requests
url = "https://api.rollbit.com/steam/market?query&order=1&showTradelocked=false&showCustomPriced=true&min=0&max=4294967295"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'Cookie': '__Secure-RollbitSession=JWDEFp________HaLLfT'
}
r = requests.request("GET", url, headers=headers)
print(" > Response status: in PY:", r.status_code)
Things I've tried:
I intercepted both requests from the scripts above with HTTP Toolkit to see if Python was adding something to the headers.
Node.js request - returned 401
Python request - returned 200
As seen in the intercepted results, Python adds some accept-encoding and accept headers. I tried copying the full, exact set of headers Python sends into my Node.js script, but I still get the same result (401), even though the (once again) intercepted requests now look identical.
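For that comparison, a small hedged sketch: requests exposes the headers it actually sent via response.request.headers, which can be printed and diffed against the Node output. Note this still cannot reveal transport-level differences (header ordering, HTTP version, TLS fingerprint) that some APIs reportedly use to tell clients apart.

import requests

url = "https://api.rollbit.com/steam/market?query&order=1&showTradelocked=false&showCustomPriced=true&min=0&max=4294967295"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    'Cookie': '__Secure-RollbitSession=JWDEFp________HaLLfT'
}

r = requests.get(url, headers=headers)

# r.request is the PreparedRequest that went over the wire;
# its .headers include anything requests added automatically
for name, value in r.request.headers.items():
    print(f"{name}: {value}")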
I'm on the newest Python and tried Node 10.x, 12.18.0, and also the latest release.
At this point I don't know what else to try. I don't really need it, but it completely bugs me that it fails for mysterious reasons, and I would really like to find out what is happening.
Thank you!

GET request with Python requests returning a different page than seen in browser

I have been trying to do a GET request on a YouTube video page in order to read simple information off of the page. I have done this many times before, and generally it is quite easy to reverse engineer a GET request with the help of Google Chrome's developer tools.
To demonstrate, here is a screenshot of the request I get when I reload a YouTube video in a fresh incognito window (to prevent cookies from being sent), as seen from the developer menu:
[Chrome screenshot]
Every time I close the window and reload the page I receive nearly identical HTML (apart from authorization keys and the like), the bottom of which can be seen here: [another Chrome screenshot]
First I tried recreating this request using a header-less GET with requests in Python:
import requests

sesh = requests.Session()
print(sesh.get("https://www.youtube.com/watch?v=5eA8IVrQWn8").content)
this returns a different page, which still contains some of the data present on the page I get from Chrome, but not nearly all of it. Next I tried including all the headers I saw in the Chrome request, using the following code:
import requests

sesh = requests.Session()
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.8",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"}
print(sesh.get("https://www.youtube.com/watch?v=5eA8IVrQWn8", headers=headers).content)
However, this strangely returns a seemingly random, short run of unicode characters of varying length, sometimes around 10 characters, sometimes closer to 50. I couldn't think of any other way to make this closer to the request I was seeing from Chrome. I fiddled with it for a couple of hours, doing things like running the request multiple times in the same session and tweaking the headers, but to no avail.
Finally, out of desperation, I tried dropping everything except the user agent, using the following code:
import requests

sesh = requests.Session()
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"}
print(sesh.get("https://www.youtube.com/watch?v=5eA8IVrQWn8", headers=headers).content)
And this got me the page I wanted.
However, I am left unsatisfied, knowing that somehow replicating the GET I was seeing in Chrome didn't work. What am I missing from my second attempt?
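A hedged guess at the missing piece: the full header set advertises accept-encoding: gzip, deflate, br, and requests/urllib3 cannot decode Brotli (br) responses unless a separate brotli package is installed, so a Brotli-compressed body would surface as exactly the kind of short, garbled unicode described above. A minimal sketch of that theory, dropping only the br token:

import requests

sesh = requests.Session()
# Same headers as the second attempt, but only advertise encodings
# that requests can always decode; if Brotli was the culprit,
# the full page should come back
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-encoding": "gzip, deflate",  # 'br' removed
    "accept-language": "en-US,en;q=0.8",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"}
print(sesh.get("https://www.youtube.com/watch?v=5eA8IVrQWn8", headers=headers).content)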
