Input:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}
url = "http://www.baidu.com/"

session = requests.Session()
req = session.put(url, headers=headers)
cookie = requests.utils.dict_from_cookiejar(req.cookies)
print(session.cookies.get_dict())
print(cookie)
Gives output:
{'BAIDUID': '323CFCB910A545D7FCCDA005A9E070BC:FG=1', 'BDSVRTM': '0'}
{'BAIDUID': '323CFCB910A545D7FCCDA005A9E070BC:FG=1'}
I am trying to use this code to get all the cookies from the Baidu website, but it only returns the first ones. Comparing with the cookies the browser shows (in the picture), there are 9 cookies. How can I get all of them?
You aren't maintaining your session, so it only ever holds the cookies from that single response. Keep the session open (for example as a context manager) and fetch the page with GET:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}
url = "http://www.baidu.com/"

with requests.Session() as s:
    req = s.get(url, headers=headers)
    print(req.cookies.get_dict())
>>> print(list(req.cookies.get_dict().keys()))
['BDSVRTM', 'BAIDUID', 'H_PS_PSSID', 'BIDUPSID', 'PSTM', 'BD_HOME']
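The flattening step can be checked offline: `requests.utils.dict_from_cookiejar` simply converts whatever jar you hand it into a plain dict, so a jar that holds all the cookies yields all of them. A small sketch (the cookie names mirror Baidu's, but the values are made up):

```python
import requests

# Build a jar by hand to stand in for cookies a server would set.
jar = requests.cookies.RequestsCookieJar()
jar.set("BAIDUID", "323CFCB910A545D7", domain="baidu.com", path="/")
jar.set("BDSVRTM", "0", domain="baidu.com", path="/")

# dict_from_cookiejar flattens every cookie in the jar into a plain dict.
flat = requests.utils.dict_from_cookiejar(jar)
print(sorted(flat))  # ['BAIDUID', 'BDSVRTM']
```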
I am trying to access a Comfort Panel running Windows CE from Windows 10 and fetch an audittrail.csv with Python. After logging in with a username and password, I try to download the CSV file with the audit-trail info, but a file containing the HTML gets downloaded instead. How do I download the actual file?
import requests
from bs4 import BeautifulSoup

loginurl = "http://10.70.148.11/FormLogin"
secure_url = "http://10.70.148.11/StorageCardSD?UP=TRUE&FORCEBROWSE"
downloadurl = "http://10.70.148.11/StorageCardSD/AuditTrail0.csv?UP=TRUE&FORCEBROWSE"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}
payload = {
    'Login': 'Admin',
    'Password': 'Pass'
}

with requests.Session() as s:
    s.post(loginurl, data=payload)
    r = s.get(secure_url)
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())

    req = requests.get(downloadurl, headers=headers, allow_redirects=True)
    url_content = req.content
    csv_file = open('audittrail.csv', 'wb')
    csv_file.write(url_content)
    csv_file.close()
When you try to get the file with a plain requests.get, you are no longer using your requests session, so the request goes out without the login cookies.
Make that download request through the logged-in session instead, i.e. s.get(downloadurl, ...) rather than requests.get(...).
It should work.
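The difference can be seen without touching the device: a `Session` carries its cookie jar into every request it prepares, while a bare `requests` call starts from an empty jar. A sketch with a made-up cookie value, comparing the `Cookie` header both ways:

```python
import requests

s = requests.Session()
# Pretend the login response had set a session cookie:
s.cookies.set("sessionid", "abc123")

# A request prepared through the session inherits the jar...
via_session = s.prepare_request(
    requests.Request("GET", "http://10.70.148.11/StorageCardSD/AuditTrail0.csv"))
print(via_session.headers.get("Cookie"))  # sessionid=abc123

# ...while a request prepared outside the session has no cookies at all.
bare = requests.Request("GET", "http://10.70.148.11/StorageCardSD/AuditTrail0.csv").prepare()
print(bare.headers.get("Cookie"))  # None
```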
I am trying to replicate an AJAX request from a web page (https://droughtmonitor.unl.edu/Data/DataTables.aspx). The AJAX request fires when values are selected from the dropdowns.
I am sending the following request with Python, but I am not able to get the response I see in the Network tab of the browser.
import bs4
import requests

ses = requests.Session()
ses.get('https://droughtmonitor.unl.edu/Data/DataTables.aspx')

headers_dict = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
url = 'https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban'
req_data = {'area': '00064', 'statstype': '1'}
resp = ses.post(url, data=req_data, headers=headers_dict)
soup = bs4.BeautifulSoup(resp.content, 'lxml')
print(soup)
You need to add a couple of things to your request to get an answer from the server:
You need to convert your dict to JSON so it is passed as a string rather than as form data.
You also need to declare the payload type by setting the request header Content-Type: application/json; charset=utf-8.
With those changes I was able to request the correct data.
import bs4
import json
import requests

ses = requests.Session()
ses.get('https://droughtmonitor.unl.edu/Data/DataTables.aspx')

headers_dict = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
                'Content-Type': 'application/json; charset=utf-8'}
url = 'https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban'
req_data = json.dumps({'area': '00037', 'statstype': '1'})
resp = ses.post(url, data=req_data, headers=headers_dict)
soup = bs4.BeautifulSoup(resp.content, 'lxml')
print(soup)
Quite a tricky problem, I must say.
From the requests documentation:
Instead of encoding the dict yourself, you can also pass it directly
using the json parameter (added in version 2.4.2) and it will be
encoded automatically:
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
Then call r.json() on the response and you will get the data you are looking for.
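What the `json=` parameter does can be verified offline: requests serializes the dict for you and sets the Content-Type header automatically. A sketch using a prepared request (nothing is actually sent):

```python
import requests

req = requests.Request(
    "POST",
    "https://droughtmonitor.unl.edu/Ajax2018.aspx/ReturnTabularDMAreaPercent_urban",
    json={"area": "00037", "statstype": "1"},
)
p = req.prepare()

# json= serialized the payload and set the header for us:
print(p.headers["Content-Type"])  # application/json
print(p.body)                     # the JSON-encoded bytes
```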
I want to log in to douban.com with a Python requests session.
import requests

url = 'https://www.douban.com/'
logurl = 'https://accounts.douban.com/passport/login_popup'
data = {'username': 'abc#gmail.com',
        'password': 'abcdef'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'}

se = requests.Session()
request = se.post(logurl, data=data, headers=headers)
request1 = se.get(url)
print(request1.content)
This prints b'', so I can't tell whether the login worked or not!
You're getting an empty response, which means the request isn't working properly. Debug it by looking further into your response; check the Requests library documentation. request1.status_code and request1.headers might interest you.
The b'' is just Python 3's bytes-literal prefix, not part of the actual content.
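To make the b'' part concrete, a quick sketch: it is just Python's literal syntax for an empty bytes object, i.e. a response body of zero bytes:

```python
# b'' is the literal for an empty bytes object, not an error value.
body = b''
print(type(body).__name__)  # bytes
print(len(body))            # 0
# A real response body decodes to text; an empty one decodes to "".
print(repr(body.decode('utf-8')))  # ''
```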
I'm using the Python requests library and trying to log in to https://www.udemy.com/join/login-popup/. The problem is that when I use the following header:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
it returns CSRF verification failed. Request aborted.
When I change it to:
headers = {'Referer': url}
it returns Please verify that you are a human.
Any suggestions?
My code:
import requests

with requests.Session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/74.0.3729.131 Safari/537.36'}
    request = s.get(url, headers=headers)
    cookies = dict(cookies=request.cookies)
    csrf = request.cookies['csrftoken']
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail',
                  'password': 'maypassword'}
    request = s.post(url, data=data_login, headers={'Referer': url}, cookies=cookies['cookies'])
    print(request.content)
There are a couple of issues with your current code:
The header you are using is missing a few things.
The value you are passing for csrfmiddlewaretoken isn't correct: it has to come from the hidden form field, not from the cookie.
As you're using requests.Session(), you shouldn't pass cookies manually (in this case).
Try this code:
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = 'https://www.udemy.com/join/login-popup/'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0",
               "Referer": "https://www.udemy.com/join/login-popup/",
               "Upgrade-Insecure-Requests": "1"}
    request = s.get(url, headers=headers)
    soup = BeautifulSoup(request.text, "lxml")
    csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
    data_login = {'csrfmiddlewaretoken': csrf, 'locale': 'en_US', 'email': 'myemail#test.com',
                  'password': 'maypassword'}
    request = s.post(url, data=data_login, headers=headers)
    print(request.content)
(PS: I'm using the BeautifulSoup library in order to find the value of csrfmiddlewaretoken.)
Hope this helps.
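The token-extraction step works on any Django-style form markup; here's an offline sketch with a made-up HTML fragment and token value, using the stdlib html.parser backend so no page fetch is needed:

```python
from bs4 import BeautifulSoup

# A made-up fragment shaped like a Django login form:
html = '''
<form method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123token">
  <input type="email" name="email">
</form>
'''
soup = BeautifulSoup(html, "html.parser")
# Grab the hidden input's value attribute -- this is the token the
# server expects back in the POST body.
csrf = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]
print(csrf)  # abc123token
```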
I am using Python 3.5.2. I want to scrape a webpage where cookies are required, but when I use requests.Session() the cookies maintained in the session are not updated, so my scraping fails constantly. Here is my code snippet:
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"})
print(session.cookies.get_dict())

url = "http://www.beianbaba.com/"
session.get(url)
print(session.cookies.get_dict())
Do you guys have any idea about this? Thank you so much in advance.
It seems that this website simply isn't setting any cookies in its response. I used the exact same code but requested Google instead:
import requests
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0"})
print(session.cookies.get_dict())
url = "http://google.com/"
session.get(url)
print(session.cookies.get_dict())
And got this output:
{}
{'NID': 'a cookie that i removed'}
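The session jar only changes when a response actually carries a Set-Cookie header. The mechanics can be seen offline by injecting a cookie into the jar by hand (the name mirrors Google's NID cookie, the value is made up):

```python
import requests

session = requests.Session()
print(session.cookies.get_dict())  # {} -- empty until a response sets something

# A response with a Set-Cookie header would do this for us; simulate one:
session.cookies.set("NID", "a-made-up-value", domain="google.com")
print(session.cookies.get_dict())  # {'NID': 'a-made-up-value'}
```

So if both prints in the question show {}, the server sent no Set-Cookie header at all, and the session is behaving correctly.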