python requests.get() keeps getting 401

The task I want to complete is very simple: make an HTTP GET request using Python.
Below is the code I used:
import requests

url = 'http://www.costcobusinessdelivery.com/AjaxWarehouseBrowseLookupView?storeId=11301&catalogId=11701&langId=-1&parentGeoNode=10112'
requests.get(url)
Then I got:
<Response [401]>
I am new to Python; can someone help? Thanks!
Update:
Based on the comments, the code itself seems fine, yet I still get the 401 response. I suspect my company's network has some restrictions, but I can access the URL and get a valid response through a browser. Is there a way to get past my company's firewall/proxy, or to pretend in Python that I am using a browser? Thanks again!

If your browser accesses the web via a proxy server, look it up in your browser settings and use the same proxy in Python:
import requests

r = requests.get(url,
                 proxies={"http": "http://61.233.25.166:80"})
Your proxy server will have a different address.
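Since the update also asks about pretending to be a browser: below is a minimal sketch that sends a browser-like User-Agent header with the request. The header value here is a placeholder; copy the exact string your own browser sends (visible in the browser's developer tools).

import requests

url = 'http://www.costcobusinessdelivery.com/AjaxWarehouseBrowseLookupView?storeId=11301&catalogId=11701&langId=-1&parentGeoNode=10112'

# Placeholder User-Agent: replace with the string your browser actually sends.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

r = requests.get(url, headers=headers)
print(r.status_code)

Some servers return 401/403 purely because of a missing or non-browser User-Agent, so this is worth trying before assuming the problem is the corporate proxy.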

Related

urllib.error.HTTPError: HTTP Error 403: Forbidden with urllib.request

I am trying to read an image URL from the internet and download the image onto my machine via Python. I used the example from this blog post https://www.geeksforgeeks.org/how-to-open-an-image-from-the-url-in-pil/, which uses https://media.geeksforgeeks.org/wp-content/uploads/20210318103632/gfg-300x300.png, and that works. However, when I try my own example it just doesn't seem to work; I've tried the HTTP version and it still gives me the 403 error. Does anyone know what the cause could be?
import urllib.request

urllib.request.urlretrieve(
    "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
    "gfg.png")
Output:
urllib.error.HTTPError: HTTP Error 403: Forbidden
The server at prntscr.com is actively rejecting your request. There are many possible reasons for that. Some sites check the user agent of the caller to see whether it is a browser. In my case, I used httpie to test whether the server would allow me to download through a non-browser app. It worked, so I simply made up a User-Agent header to see if the problem was just the missing user agent.
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'MyApp/1.0')]
urllib.request.install_opener(opener)
urllib.request.urlretrieve(
    "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
    "gfg.png")
It worked! I don't know exactly what logic the server uses; for instance, a standard Mozilla/5.0 user agent did not work. You won't always encounter this issue (most sites are pretty lax about what they allow as long as you are reasonable), but when you do, try playing with the User-Agent header. If nothing else works, try using the exact same user agent string as your browser.
I had the same problem, and it was due to an expired URL. I checked the response text and found "URL signature expired", a message you wouldn't normally see unless you inspected the response body.
This means some URLs simply expire, usually for security purposes. Fetch the URL again and update it in your script. If there isn't a new URL for the content you're trying to scrape, then unfortunately you can't scrape it.
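If you want to see such messages yourself, here is a minimal sketch of inspecting the body of the error response when urllib raises an HTTPError (the URL is the one from the question; what the body actually contains depends on the server):

import urllib.error
import urllib.request

try:
    urllib.request.urlretrieve(
        "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
        "gfg.png")
except urllib.error.HTTPError as e:
    # HTTPError is file-like; the body often carries a human-readable
    # explanation such as "URL signature expired".
    print(e.code)
    print(e.read().decode('utf-8', errors='replace'))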

Star Citizen Python API Request

I have worked with a few APIs, but I'm not sure how to get started with sending requests for Star Citizen. Does anyone know how to use Python to send a GET request for, say, some data on game items? Here is their official API documentation, but I'm not sure where to start:
https://starcitizen-api.com/gamedata.php#get-items
Could anyone post an example GET request that returns data?
From the docs, the URL seems to be /xxxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger or some such, where I guess the xxx part is your personal key or account.
Try this:

import requests

# Note: this URL is missing the host; see the EDIT below for the full address.
url = '/xxxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger'
response = requests.get(url, verify=False)
contents = response.json()

Just make sure the URL is complete; this should work the same way for any web API, really.
EDIT:
From the docs it looks like the full URL should be this (since the host is listed as Host: api.starcitizen-api.com):
https://api.starcitizen-api.com/xxxxxxx/v1/gamedata/get/3.6.1/ship?name=Avenger
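Putting the pieces together, a minimal sketch of the complete request (xxxxxxx stands for your personal API key, exactly as elided in the docs; the version and ship name are taken from the example above):

import requests

API_KEY = 'xxxxxxx'  # placeholder for your personal key

url = 'https://api.starcitizen-api.com/' + API_KEY + '/v1/gamedata/get/3.6.1/ship'
response = requests.get(url, params={'name': 'Avenger'})
response.raise_for_status()  # fail early on a 4xx/5xx response
contents = response.json()
print(contents)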

Generating a Cookie in Python Requests

I'm relatively new to Python, so excuse any errors or misconceptions I may have. I've done hours and hours of research and have hit a stopping point.
I'm using the Requests library to pull data from a website that requires a login. I was initially successful logging in with a session.post(payload) followed by a session.get; I got a [200] response. But once I tried to view the JSON data that sits behind the login, I hit a [403] response. Long story short, I can make it work by logging in through a browser, inspecting the web elements to find the current session cookie, and then defining the headers in Requests to pass along that exact cookie with session.get.
My question is: is it possible to set/generate/find this cookie through Python after logging in? After logging in and out a few times, I can see that some components of the cookie remain the same but others do not. The website I'm using is Garmin Connect.
Any and all help is appreciated.
If your issue is about logging in, you can use a session object. It stores the cookies it receives and generally handles them for you across requests. Here is an example:
import requests

s = requests.Session()
# All cookies received will be stored in the session object
# and sent back automatically on subsequent requests.
s.post('http://www...', data=payload)
s.get('http://www...')
Furthermore, with the requests library, you can read the cookies from a response, like this:
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies
But you can also send cookies back to the server on subsequent requests, like this:
url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
I hope this helps!
Reference: How to use cookies in Python Requests
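To the specific question of finding the cookie after logging in: a minimal sketch, assuming the session-based login above succeeded, of listing what the session's cookie jar is holding (the URL and payload are placeholders, as in the example above):

import requests

payload = {'username': 'userxx', 'password': 'passwordxx'}  # placeholder credentials

s = requests.Session()
s.post('http://www...', data=payload)  # login URL elided, as above

# The jar now holds whatever cookies the server set during login.
for name, value in s.cookies.get_dict().items():
    print(name, '=', value)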

Sniffing HTTP then using Python requests to make the same call

Instagram Insights isn't offered on desktop, so to get around this I set up mitmproxy and sniffed the HTTP requests that take place between Instagram and my phone when I log in to Insights.
How do I re-engineer this so I can make the same call with the requests library in Python?
This is what I've tried so far... but I keep getting a 'failed' response because I'm not logged in.
import requests

# Note: data= sends form-encoded data even though the header claims JSON.
headers = {"Content-Type": "application/json"}
payload = {"username": "souXXX", "password": "XXXX", "_csrftoken": "XXXXXXn2fIQfrQXXXXXn"}
r = requests.post("https://i.instagram.com/api/v1/accounts/login/", data=payload, headers=headers)
I've spent ages trying to get this to work, and I'm not really sure what I'm doing wrong. As such, I'd love some advice/guidance.
Thanks in advance!
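For anyone attempting this: a general pattern when replaying a sniffed request is to copy the captured headers and cookies verbatim rather than reconstructing them by hand. A minimal sketch of that approach (every value below is a placeholder for what mitmproxy actually captured; Instagram's private endpoints are undocumented and may reject anything that doesn't match the app's traffic exactly):

import requests

# Placeholders: substitute the exact values mitmproxy captured.
headers = {
    "User-Agent": "<user-agent string captured from the app>",
    "X-CSRFToken": "<csrf token captured from the app>",
}
cookies = {
    "sessionid": "<session id captured after the app logged in>",
}

r = requests.get(
    "https://i.instagram.com/api/v1/...",  # the sniffed insights endpoint, elided
    headers=headers,
    cookies=cookies,
)
print(r.status_code, r.text[:200])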

Scraping a website behind a web proxy using Python

I am working on scraping databases that I have access to through the Duke library web proxy. The issue I've encountered is that, because the database is accessed through a proxy server, I can't scrape it directly the way I would if it did not require proxy authentication.
I tried several things:
I wrote a script that logs into the Duke network (https://shib.oit.duke.edu/idp/AuthnEngine).
I then hardcode in my login data:
import urllib.parse

login_data = urllib.parse.urlencode({'j_username': 'userxx',
                                     'j_password': 'passwordxx',
                                     'Submit': 'Enter'}).encode()  # POST data must be bytes
I then log in:

import urllib.request

opener = urllib.request.build_opener()
resp = opener.open('https://shib.oit.duke.edu/idp/AuthnEngine', login_data)
Then I create a cookie jar object to hold the cookies from the proxy website.
Then I try to access the database with my script, and it still tells me authentication is required. I want to know how I can get past the authentication required by the proxy server.
If you have any suggestions please let me know.
Thank you,
Jan
A proxy login does not store cookies; instead it uses the Proxy-Authorization header. This header needs to be sent with every request, similar to cookies. It has the same format as regular Basic Authentication, although other schemes are possible (Digest, NTLM). I suggest you check the headers of a normal login and copy the Proxy-Authorization header that was sent.
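A minimal sketch of what that looks like in practice. The Basic scheme is base64("user:password"); with the requests library, embedding the credentials in the proxy URL makes it attach the header for you (all hosts and credentials below are placeholders):

import base64
import requests

# Basic scheme: base64-encode "user:password", the same format as
# regular Basic Authentication (placeholder credentials).
creds = base64.b64encode(b'userxx:passwordxx').decode()
print('Proxy-Authorization: Basic ' + creds)

# requests builds this header on each request when the credentials
# are embedded in the proxy URL (placeholder proxy address).
proxies = {'http': 'http://userxx:passwordxx@proxy.example.edu:8080',
           'https': 'http://userxx:passwordxx@proxy.example.edu:8080'}
r = requests.get('https://example.com/database', proxies=proxies)
print(r.status_code)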
