Python trying to send a request with the requests library, but nothing happens?

Like the title says, I'm trying to send a request to a URL using requests with headers, but when I try to print the status code, nothing is printed in the terminal. I checked my internet connection and even switched networks to test, but nothing changed.
Here's my code:
import requests
from bs4 import BeautifulSoup
from requests.exceptions import ReadTimeout

link = "https://www.exampleurl.com"
header = {
    "accept-language": "tr,en;q=0.9,en-GB;q=0.8,en-US;q=0.7",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.36"
}
r = requests.get(link)
print(r.status_code)
When I run this, nothing appears, and I don't know why. If someone can help me, I will be so glad.

You can use requests.head(link) like below:
r = requests.head(link)
print(r.status_code)

I get the same problem: the get() never returns.
Since you have created a header variable, I thought about using it:
r = requests.get(link, headers=header)
Now I get status 200 returned.
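If get() simply hangs, it can also help to pass a timeout so the call fails fast instead of blocking forever. A minimal sketch under that assumption (the 10-second value is arbitrary; requests.exceptions.Timeout covers both connect and read timeouts):

import requests

link = "https://www.exampleurl.com"
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
}

try:
    # timeout makes requests raise instead of waiting indefinitely
    r = requests.get(link, headers=header, timeout=10)
    print(r.status_code)
except requests.exceptions.Timeout:
    print("request timed out")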

Related

Scraping Yell with Python requests gives 403 error

I have this code:
from requests.sessions import Session

url = "https://www.yell.com/s/launderettes-birmingham.html"
s = Session()
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
}
r = s.get(url, headers=headers)
print(r.status_code)
but I get a 403 response instead of 200.
I can scrape this data with Selenium, but is there a way to scrape it with requests?
If you modify your code like so:
print(r.text)
print(r.status_code)
you will see that the reason you are getting the 403 error code is that Yell uses a Cloudflare browser check.
As the check relies on JavaScript, there is no way to reliably get past it with the requests module alone.
Since you mentioned you are going to use Selenium, make sure to use the undetected-chromedriver package.
Also, be sure to rotate your IP addresses to avoid getting blocked.
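A minimal sketch of that approach, assuming the undetected-chromedriver package is installed (pip install undetected-chromedriver) and a local Chrome is available:

import undetected_chromedriver as uc

# Start a Chrome instance patched to avoid common bot-detection checks
driver = uc.Chrome()
driver.get("https://www.yell.com/s/launderettes-birmingham.html")
print(driver.page_source[:500])  # first 500 characters of the rendered page
driver.quit()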

KeyError: 'data' error while parsing Subreddit JSON

I am trying to read the titles of the latest posts from the r/chrome subreddit using Python.
But when I execute the Python file, I get a KeyError: 'data' error.
Here's my code:
import json, requests

def getReddit():
    redditLatest = requests.get('https://www.reddit.com/r/chrome/new/.json').json()
    print(redditLatest['data']['children'][0]['data']['title'])

getReddit()
The terminal shows a traceback ending in KeyError: 'data'.
Please help with a solution.
As Mikhail Beliansky already mentioned, debug your response:
import requests
redditLatest = requests.get('https://www.reddit.com/r/chrome/new/.json').json()
print(redditLatest)
# {'message': 'Too Many Requests', 'error': 429}
You can see that Reddit recognizes that you are not a "normal" client/browser, especially because requests sends a user-agent like "python-requests/2.25.1" by default.
You can add a common browser user-agent to your request. If you don't make too many requests, this may work for you:
redditLatest = requests.get(
    'https://www.reddit.com/r/chrome/new/.json',
    headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'}
).json()
print(redditLatest)
# {'kind': 'Listing', 'data': {...}}
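Since the rate limit can still hit, a defensive check on the payload avoids the KeyError entirely; a minimal sketch (looping over all titles is illustrative):

import requests

response = requests.get(
    'https://www.reddit.com/r/chrome/new/.json',
    headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36'}
).json()

# Reddit returns {'message': 'Too Many Requests', 'error': 429} when
# rate limited, so check for 'error' before indexing into 'data'
if 'error' in response:
    print('Reddit returned an error:', response)
else:
    for child in response['data']['children']:
        print(child['data']['title'])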

Access denied [403] when accessing site with BeautifulSoup in Python

I want to scrape https://www.jdsports.it/ using BeautifulSoup, but I get access denied.
In my browser I have no problem accessing the site, and I'm using the same user agent as in the Python program, but in the program the result is different; you can see the output below.
EDIT:
I think I need cookies to gain access to the site. How can I get them and use them so the Python program can access and scrape the site?
- The script works if I use "https://www.jdsports.com", which is the same site but for a different region.
Thanks!
import time
import requests
from bs4 import BeautifulSoup
import smtplib

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
url = 'https://www.jdsports.it/'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
status = soup.get_text()  # soup.findAll.get_text() is not callable; get_text() matches the output below
print(status)
The output is:
<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>
You don't have permission to access "http://www.jdsports.it/" on this server.<p>
Reference #18.35657b5c.1589627513.36921df8
</p></body>
</html>
I suspected HTTP/2 at first, but wasn't able to get that working either. Perhaps you'll have more luck; here's an HTTP/2 starting point:
import asyncio
import httpx
import logging

logging.basicConfig(format='%(message)s', level=logging.DEBUG)

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
}
url = 'https://www.jdsports.it/'

async def f():
    client = httpx.AsyncClient(http2=True)
    r = await client.get(url, allow_redirects=True, headers=headers)
    print(r.text)

asyncio.run(f())
(Tested both on Windows and Linux.) Could this have something to do with TLS 1.2? That's where I'd look next, since curl works.
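If TLS really is the differentiator, one way to experiment from requests is to pin the TLS version with a custom HTTPAdapter. This is only a sketch of where to poke, not a confirmed fix for this site; the TLS12Adapter class name is mine:

import ssl
import requests
from requests.adapters import HTTPAdapter

class TLS12Adapter(HTTPAdapter):
    """Force TLS 1.2 for all connections made through this adapter."""
    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
        ctx.maximum_version = ssl.TLSVersion.TLSv1_2
        kwargs['ssl_context'] = ctx
        super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount('https://', TLS12Adapter())
r = session.get('https://www.jdsports.it/',
                headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'})
print(r.status_code)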

Unable to log in to Twitter using the Python requests library

I have tried to log in to a Twitter account using the requests library, but I am getting a 400 response; it is not working. I used all the required payload parameters and headers, but I am still unable to figure out how to log in.
import requests
from bs4 import BeautifulSoup

payload = {
    "session[username_or_email]": "***************",
    "session[password]": "*************",
    "authenticity_token": "*************",
    "ui_metrics": '{"rf":{"a4f2d7a8e3d9736f0815ae7b34692191bca9f114a7d5602c7758a3e6087b6b30":0,"ad92fc8b83fb5dec3f720f89a7f0fb415a26130516362f230b02251edd96a54a":0,"a011babb5c5df598f93bcc4a38dfad0276f69df36faff48eea95bac67cefeffe":0,"a75214752b7e90fd50725fce21cc26761ef3613173b0f8764d52c8b53f136bbf":0},"s":"mTArUSdNtTOm6WaGwNeRjMAU3EhNA3VGbFeCIZeEkjjLTAbccFDTJjcTEB2tQ9iuNJUzniFKyvhZNOGdH1LIwmi1YSMcFTOHu2Wi49yKvONv0obfg1dW27znR_C2n-ev2zMvN5166j1ccsxWKIheiWw-eHM7oXA54U40cWHvdCrunJJKj2INkTrcVph-y2fccu1m3hp31vngqBiL-XmeLWYiyZ-NYOmV8f5iXW9WWMvISTcSwzz9vd_n9-tLSKociT-1ap5ZVFWNUWIycSflj8WcOmmRFzq4kwa-NsS0FRp-DQ2FOkozhhhQi9HDvSODUlGsdQWBPkGKKtDWbtnj9gAAAWEty4Xv"}',
    "scribe_log": "",
    "redirect_after_login": "",
    # note: duplicate dict key; this entry silently overwrites the
    # "authenticity_token" defined above
    "authenticity_token": "******************",
    "return_to_ssl": "",
    "remember_me": "1",
    "lang": ""
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "max-age=0",
    "cookie": 'moments_profile_moments_nav_tooltip_self=true; syndication_guest_id=v1%3A150345116906281638; eu_cn=1; kdt=QErLcBT9OjM5gjEznmsRcHlMTK6biDyAw4gfI5ro; remember_checked_on=1; _ga=GA1.2.1923324433.1496571570; tfw_exp=0; _gid=GA1.2.106381927.1516638134; __utma=43838368.1923324433.1496571570.1516764481.1516764481.1; __utmz=43838368.1516764481.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); lang=en; ct0=7ceea26f7fd3d186152512d26365cddf; _twitter_sess=BAh7CiIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXRlZF9hdGwrCL8wyy1hAToMY3NyZl9p%250AZCIlNjJjODQ1MjZiZWQzOGUyODZlOWUxNmNkMWJhZTZjYjc6B2lkIiU4MmZm%250AYWQ3Mzc1OGFhNmJjOTIxZjlmOGEyMzk3MjE1NToJdXNlcmwrCQAAVbhKiEIN--32d967262e1de8852d20ace15ec93d87b9a902a8; personalization_id="v1_snKt6bqCONQsnFuE8EOZDA=="; guest_id=v1%3A151689245583269291; _gat=1; ads_prefs="HBERAAA="; twid="u=955475925457502208"; auth_token=50decb38f16f3c264f480b0cd1cc30a9bcce9f08',
    "referer": "https://twitter.com/login",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}
res = requests.get("https://twitter.com/login", data=payload, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
print(res.status_code)
print(res.url)
for item in soup.find_all(class_="title"):
    print(item.text)
How do I log in to Twitter? Which parameters did I miss? Please help me out with this.
Note: I am not using the API or a Selenium driver; I want to do it using the requests library. Thanks in advance.
You're using the GET method to access an auth endpoint; usually the POST method is used for such purposes. Try using requests.post instead of requests.get.
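A minimal sketch of that change, reusing the payload and headers from the question (whether Twitter accepts the login afterwards is a separate matter; the endpoint and form fields are the asker's assumptions):

# Same payload and headers as above; only the HTTP method changes.
res = requests.post("https://twitter.com/login", data=payload, headers=headers)
print(res.status_code)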

Still cannot access website by POST

I would like to get store info from the website (http://www.hilife.com.tw/storeInquiry_street.aspx).
The method I found via Chrome DevTools is POST.
Using the method below, I still cannot access it.
Could someone give me a hint?
import requests
from bs4 import BeautifulSoup

head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
}
payload = {
    '__EVENTTARGET': 'AREA',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': '/wEPDwULLTE0NjI2MjI3MjMPZBYCAgcPZBYMAgEPZBYCAgEPFgIeBFRleHQFLiQoJyNzdG9yZUlucXVpcnlfc3RyZWV0JykuYXR0cignY2xhc3MnLCdzZWwnKTtkAgMPEA8WBh4NRGF0YVRleHRGaWVsZAUJY2l0eV9uYW1lHg5EYXRhVmFsdWVGaWVsZAUJY2l0eV9uYW1lHgtfIURhdGFCb3VuZGdkEBUSCeWPsOWMl+W4ggnln7rpmobluIIJ5paw5YyX5biCCeWunOiYree4ownmlrDnq7nnuKMJ5qGD5ZyS5biCCeiLl+agl+e4ownlj7DkuK3luIIJ5b2w5YyW57ijCeWNl+aKlee4ownlmInnvqnnuKMJ6Zuy5p6X57ijCeWPsOWNl+W4ggnpq5jpm4TluIIJ5bGP5p2x57ijCemHkemWgOe4ownmlrDnq7nluIIJ5ZiJ576p5biCFRIJ5Y+w5YyX5biCCeWfuumahuW4ggnmlrDljJfluIIJ5a6c6Jit57ijCeaWsOeruee4ownmoYPlnJLluIIJ6IuX5qCX57ijCeWPsOS4reW4ggnlvbDljJbnuKMJ5Y2X5oqV57ijCeWYiee+qee4ownpm7LmnpfnuKMJ5Y+w5Y2X5biCCemrmOmbhOW4ggnlsY/mnbHnuKMJ6YeR6ZaA57ijCeaWsOerueW4ggnlmInnvqnluIIUKwMSZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnFgECB2QCBQ8QDxYGHwEFCXRvd25fbmFtZR8CBQl0b3duX25hbWUfA2dkEBUWBuS4reWNgAbmnbHljYAG5Y2X5Y2ABuilv+WNgAbljJfljYAJ5YyX5bGv5Y2ACeilv+Wxr+WNgAnljZflsa/ljYAJ5aSq5bmz5Y2ACeWkp+mHjOWNgAnpnKfls7DljYAJ54OP5pel5Y2ACeixkOWOn+WNgAnlkI7ph4zljYAJ5r2t5a2Q5Y2ACeWkp+mbheWNgAnnpZ7lsqHljYAJ5aSn6IKa5Y2ACeaymem5v+WNgAnmoqfmo7LljYAJ5riF5rC05Y2ACeWkp+eUsuWNgBUWBuS4reWNgAbmnbHljYAG5Y2X5Y2ABuilv+WNgAbljJfljYAJ5YyX5bGv5Y2ACeilv+Wxr+WNgAnljZflsa/ljYAJ5aSq5bmz5Y2ACeWkp+mHjOWNgAnpnKfls7DljYAJ54OP5pel5Y2ACeixkOWOn+WNgAnlkI7ph4zljYAJ5r2t5a2Q5Y2ACeWkp+mbheWNgAnnpZ7lsqHljYAJ5aSn6IKa5Y2ACeaymem5v+WNgAnmoqfmo7LljYAJ5riF5rC05Y2ACeWkp+eUsuWNgBQrAxZnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnFgECBGQCBw8PFgIfAAUJ5Y+w5Lit5biCZGQCCQ8PFgIfAAUG5YyX5Y2AZGQCCw8WAh4LXyFJdGVtQ291bnQCAhYEZg9kFgJmDxUFBEg2NDYP5Y+w5Lit5aSq5bmz5bqXIOWPsOS4reW4guWMl+WNgDQwNOWkquW5s+i3rzcy6JmfCzA0LTIyMjkwOTI4Azg3MGQCAQ9kFgJmDxUFBDQzMTgP5Y+w5Lit5rC45aSq5bqXM+WPsOS4reW4guWMl+WNgDQwNOWkquWOn+i3r+S6jOautTI0MOiZn+S4gOaok+WFqOmDqAswNC0yMzY5MDA1NwM4NzFkZFHxmtQaBu2Yr9cvskfEZMWn57JLRfjPYBFYDy+tHr6X',
    '__VIEWSTATEGENERATOR': 'B77476FC',
    '__EVENTVALIDATION': '/wEdACtWrrgS52/ojbuYEYvRDXHZ2ryV+Ed5kWYedGp5zjHs3Neeeo/9TTvNTdElW+hiVA25mZnLEQUYPOZFLnuVu9jOT+Zq1/xceVgC7GxWRM+A8tOS3xZBjlhgzlx5UN3H3D0UrdtoyeScvRqxFL8L3gGKRyCJu029oItLX7X6c7SW7C7IVzuAeZ6t9kFMeOQus7MtrV7YeOXrlOP8inI96UkaJEU7Ro3FtK29+B+NamR2j4qInKVwJ4+JD3cjWm5buZdnOhT/ISzrljaf+F9GnVjm4dGchVglf1PxMMHl7EEoLjs20TZ856RDCGXvzK/6J+tEFp7zDvFTYGoeHtuHy+YF/IoR/CRFBAaEkys48FIAUCSUKnxACPyW6Ar2guIADjOqYue7v4fhV1jIq65P/lwanoaJpIsboCbjakbTYnqK8BLngMayrRehyT58dmj3SbzY1mOtzSNnakdpUxaC0EpOJ7rhB52A2FKsxy5EbP0PwHHuHNMa9dit0AxPMfYUP1/LWuYPWMX0W8tyEMKxoUcYsCb+qJLF9yXPgM6c8sIQTRxcBokm1PGzFN4M6vnSF8OfFSC+c0frLZ4GH6l497B/5oDIjq7Bz4/cPeGCavvh9NUqPcmzJIr8Abx9vjtMGpZSwBdVY3bR/ARswIDrmWLt1qMD4jcRvGPxBa8nsRR8HNdVINbR+iOSFLwVhBCg+s+mV5NeTdOKvAeggfOsJHmJKL0ApQSCyjY5kEiOvo2JAI07C08ENIFF7HpDTaGCi93i2WnmdDrYoaoLZi96dRTlk4xoWV9tc7rd9X/wE6QoKHxFtADSz9WkgtbUn88lAhY2++OiqWCaQZobh7K26ndH1z34JXVB7C/AiOEV+CCb97oVyooxWullV44iFQ0isVBjYC1XWS3eGf1PwMS++A+EjQTkl9VJhIRDoS6sg2mD7mikimBjQGvZX/lcYtKSrjY=',
    'CITY': '台中市',
    'AREA': '北區'
}
res = requests.post('http://www.hilife.com.tw/storeInquiry_street.aspx', data=payload, headers=head)
res.encoding = 'utf-8'
print(res.text)
I see that you are missing the Content-Type: application/x-www-form-urlencoded header; you have to send it, and the data must be in x-www-form-urlencoded format. I recommend testing with Postman before writing the code. Also make sure you have the relevant permission before crawling a third-party website. Happy crawling.
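A minimal sketch of the suggested change, reusing the payload from the question. (For what it's worth, requests already form-encodes a dict passed to data= and sets this Content-Type automatically, so the explicit header mainly makes the intent visible.)

# Add the Content-Type header the answer recommends
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
}
res = requests.post('http://www.hilife.com.tw/storeInquiry_street.aspx',
                    data=payload, headers=head)
print(res.status_code)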
