Logging into Facebook with requests (Python, 2020)

Hello, I would like to make a bot that automatically logs into Facebook and makes a post in a specific group. I think I will use Selenium for the posting part, which should be easy, so I am only asking for help with the login. My problem is that some of the form data shown in the Network tab of the developer tools is hidden and does not appear in the website's HTML, and I don't know how to find it. Here is my code so far:
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}
data = {
    'email': '----------',
    'pass': '---------',
    'timezone': '-60',
    'locale': 'pl_PL',
    'next': 'https://www.facebook.com/',
    'login_source': 'login_bluebar',
    'prefill_contact_point': '512250794',
    'prefill_source': 'browser_onload',
    'prefill_type': 'password',
    'skstamp': 'eyJoYXNoIjoiYThiN2EyOTMwNTJhZTUzODg0YjZiNWNlOWQ1NzZjZjUiLCJoYXNoMiI6IjQ3ZWI4M2U1ZjVmYTQxMTQ4MDIxYWVlZTgzNTk3ZWJmIiwicm91bmRzIjo1LCJzZWVkIjoiYjU0NWE4MzczOTgwYTZhODViZjUzYmE3ZmM0OWIyOWYiLCJzZWVkMiI6IjdiNTU0NzBjM2M5NjlhMTY3YmZkZmIwZjE5ODlmNDdhIiwidGltZV90YWtlbiI6ODA3OTAsInN1cmZhY2UiOiJsb2dpbiJ9'
}

with requests.Session() as s:
    url = 'https://www.facebook.com/'
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    # Copy the hidden inputs from the login form into the payload
    data['jazoest'] = soup.find('input', attrs={'name': 'jazoest'})['value']
    data['lsd'] = soup.find('input', attrs={'name': 'lsd'})['value']
    data['lgnrnd'] = soup.find('input', attrs={'name': 'lgnrnd'})['value']
    data['lgndim'] = soup.find('input', attrs={'name': 'lgndim'})['value']
    data['ab_test_data'] = soup.find('input', attrs={'name': 'ab_test_data'})['value']
    data['lgnjs'] = soup.find('input', attrs={'name': 'lgnjs'})['value']
    data['guid'] = soup.find('input', attrs={'name': 'guid'})['value']
    r = s.post(url, data=data, headers=headers)
    print(r.content)
I would be very happy if someone could help me with this. Is there a better way to do this kind of thing in 2020? Yes, I know there are posts about logging into Facebook with requests and bs4, but they are from 2018, and I think Facebook has changed a lot since then; for example, some headers disappear or change their names every time I log in.

You can use the Facebook API, which is available at developers.facebook.com
Rather than using a third-party library, you could post directly to the group using the API (see here for more details).
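As a rough illustration, publishing to a group through the Graph API comes down to a single authenticated POST to the group's feed edge. A minimal sketch, assuming you have already obtained an access token with the permissions required for group publishing (typically publish_to_groups) from developers.facebook.com; ACCESS_TOKEN and GROUP_ID are hypothetical placeholders, and the API version string may need updating:

import requests

# Hypothetical placeholders -- supply a real token and group ID
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'
GROUP_ID = 'YOUR_GROUP_ID'

resp = requests.post(
    f'https://graph.facebook.com/v8.0/{GROUP_ID}/feed',
    data={'message': 'Hello from the bot', 'access_token': ACCESS_TOKEN},
)
# On success the Graph API returns the id of the new post; otherwise an error object
print(resp.json())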

Related

requests returning HTTP 404 when logging in after POST

I am trying to scrape this course review website for my college, but to do so I need to log in. I think I'm doing everything right in the login process:
The payload is complete with all of the relevant information. I used Inspect Element and the Network tab to verify that I hadn't missed any input fields, and get_authenticity_token successfully scrapes the relevant string.
Maybe I'm doing something wrong in my header? I just copied someone else's code for that. Might not even need a header.
import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers = {'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36')}
payload = {'email': 'person#email.com',
           'password': 'secret',
           'utf8': '✓',
           'commit': 'Sign In'
           }

def get_authenticity_token(html):
    soup = BeautifulSoup(html, "html.parser")
    token = soup.find('input', attrs={'name': 'authenticity_token'})
    if not token:
        print('could not find `authenticity_token` on login form')
    return token.get('value').strip()

s = session.get("https://pomonastudents.org/login")
payload.update({
    'authenticity_token': get_authenticity_token(s.text)
})
s = session.post("https://pomonastudents.org/login", data=payload)
print(s.text)
print(payload)
Why might this not be working? What steps can I take to investigate possible causes?
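Before changing anything, one generic way to investigate such failures is to dump every form and hidden input the login page actually serves, then compare the field names and the form's action URL against the payload being sent. A short sketch of that idea, using the URL from the question:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    res = s.get("https://pomonastudents.org/login")
    soup = BeautifulSoup(res.text, "html.parser")
    # Print each form's action URL and every input it carries, so the
    # POST target and the payload can be checked field by field.
    for form in soup.find_all("form"):
        print("form action:", form.get("action"))
        for inp in form.find_all("input"):
            print(" ", inp.get("name"), "=>", inp.get("value"))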
The following is what I meant. Try this:
import requests
from bs4 import BeautifulSoup

payload = {
    'utf8': '✓',
    'authenticity_token': '',
    'email': 'person#email.com',
    'password': 'secret',
    'commit': 'Sign In'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    s.headers['X-Requested-With'] = 'XMLHttpRequest'
    res = s.get("https://pomonastudents.org/login")
    soup = BeautifulSoup(res.text, "html.parser")
    payload['authenticity_token'] = soup.select_one("[name='authenticity_token']")["value"]
    s.headers['X-CSRF-Token'] = soup.select_one("[name='csrf-token']")["content"]
    resp = s.post('https://pomonastudents.org/login/credentials', data=payload)
    print(resp.status_code)
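Two details differ from the original attempt: the credentials are posted to /login/credentials rather than /login, and the page's csrf-token meta tag is echoed back in an X-CSRF-Token header alongside X-Requested-With: XMLHttpRequest. Rails-style backends typically validate that header in addition to the authenticity_token form field.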

How do I log into a web page using Python requests? Having problems with headers [duplicate]

This question already has answers here: Logging in to LinkedIn with python requests sessions.
I'm trying to log into LinkedIn using Python to get some data from there.
But after six hours of sending requests I keep getting the same "You must be authenticated to access this page." response.
I'm guessing the problem is with the headers, but I wasn't able to make it work.
Here is what I have come up with:
import requests
from bs4 import BeautifulSoup

payload = {
    'session_key': EMAIL,
    'session_password': PASSWORD
}
headerSet = {
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Mobile Safari/537.36'
}
feed = 'https://www.linkedin.com/feed/'
url = 'https://www.linkedin.com/login/checkpoint/lg/login-submit'

with requests.Session() as s:
    p = s.post(url, data=payload, headers=headerSet)
    print(p.text)
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())
import requests
from bs4 import BeautifulSoup

def Login():
    session = requests.Session()  # create a session to hold the cookies
    url = 'https://www.linkedin.com/login'  # the page that contains the POST parameters
    Get_Headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en,en-US;q=0.9,ar;q=0.8",
    }
    main_html = session.get(url, headers=Get_Headers, verify=False).text
    soup = BeautifulSoup(main_html, 'html.parser')
    # Collect the hidden fields LinkedIn expects back with the credentials
    csrfToken = soup.find('input', {'name': 'csrfToken'}).get('value')
    sIdString = soup.find('input', {'name': 'sIdString'}).get('value')
    controlId = soup.find('input', {'name': 'controlId'}).get('value')
    parentPageKey = soup.find('input', {'name': 'parentPageKey'}).get('value')
    trk = soup.find('input', {'name': 'trk'}).get('value')
    session_redirect = soup.find('input', {'name': 'session_redirect'}).get('value')
    loginCsrfParam = soup.find('input', {'name': 'loginCsrfParam'}).get('value')
    pageInstance = soup.find('input', {'name': 'pageInstance'}).get('value')
    Post_Data = {
        'csrfToken': csrfToken,
        'session_key': 'email here',
        'ac': 0,
        'sIdString': sIdString,
        'controlId': controlId,
        'parentPageKey': parentPageKey,
        'pageInstance': pageInstance,
        'trk': trk,
        'session_redirect': session_redirect,
        'loginCsrfParam': loginCsrfParam,
        # 'fp_data': {...},  # a very long browser-fingerprint blob, omitted from this listing
        '_d': 'd',
        'session_password': 'password here'
    }
    print(Post_Data)
    Get_Headers['referer'] = url
    source = session.post(url='https://www.linkedin.com/checkpoint/lg/login-submit', data=Post_Data, headers=Get_Headers, verify=False, allow_redirects=True).text
Also check this out: Logging in to LinkedIn with python requests sessions.
This is quite common, but there is a quick workaround:
1. Open your web browser and log out of LinkedIn.
2. Press F12, go to the Network tab, and clear any requests that are there.
3. Log in to LinkedIn as you usually would.
4. In the Network tab you'll see your HTTP POST request.
5. Take every HTTP header from that request and add them to your Python request headers (a sketch follows after this list).
6. Try your script again. From then on, LinkedIn has no way to tell the difference.
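A minimal sketch of that idea. The header names below are typical of what a browser sends, but the exact set and values should be copied verbatim from your own captured POST request; everything here is an illustrative placeholder, and the Cookie string in particular is session-specific:

import requests

# Paste the headers captured from the browser's Network tab here.
# These names/values are placeholders, not working credentials.
browser_headers = {
    'User-Agent': 'Mozilla/5.0 ... (copy yours exactly)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://www.linkedin.com/login',
    'Cookie': '<copied from the captured request>',
}

with requests.Session() as s:
    s.headers.update(browser_headers)
    resp = s.get('https://www.linkedin.com/feed/')
    print(resp.status_code)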

Logging in to a website using Python's requests

So I am currently doing a project for my school and I need to log into our canteen website using Python. I am using requests, but the code is not working: it just redirects me to the starting page instead of the user page. I have tried this code on another website and it worked just fine. I have found out that this website uses JavaServer Pages; might that be the problem?
I have tried a few tutorials on YouTube and even searched here, but nothing worked for me.
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36 OPR/58.0.3135.53'
}
login_data = {
    'j_username': '**',
    'j_password': '**',
    'terminal': 'false',
    'type': 'web',
    '_spring_security_remember_me': 'on'
}

with requests.Session() as c:
    url = 'https://jidelna.mgo.opava.cz:6204/faces/secured/info.jsp?terminal=false&keyboard=false&printer=false'
    r = c.get(url)
    soup = BeautifulSoup(r.content, features="html.parser")
    login_data['_csrf'] = soup.find('input', attrs={'name': '_csrf'})['value']
    login_data['targetUrl'] = soup.find('input', attrs={'name': 'targetUrl'})['value']
    r = c.post(url, data=login_data, headers=headers)
You are sending the POST request to the wrong URL. If you use developer tools to inspect the login form, you can get the action attribute of the form.
In the Network tab in developer tools you can see the POST request being made and its parameters. You should make the POST request to https://jidelna.mgo.opava.cz:6204/j_spring_security_check
If none of that works, also consider emulating the browser's headers as closely as possible. There is a cookie being sent, so you will need to keep using a Session with requests.
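A sketch of the suggested fix, untested and reusing the field names and URLs already shown in the question: the login page is fetched first for the session cookie and CSRF fields, and the credentials are then posted to the j_spring_security_check action instead of the info page.

import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0'}  # or copy the full UA string from the question
login_data = {
    'j_username': '**',
    'j_password': '**',
    'terminal': 'false',
    'type': 'web',
    '_spring_security_remember_me': 'on',
}
info_url = 'https://jidelna.mgo.opava.cz:6204/faces/secured/info.jsp?terminal=false&keyboard=false&printer=false'
login_url = 'https://jidelna.mgo.opava.cz:6204/j_spring_security_check'

with requests.Session() as c:
    # GET the login page first so the session cookie is set
    # and the hidden CSRF fields can be scraped
    r = c.get(info_url)
    soup = BeautifulSoup(r.content, features='html.parser')
    login_data['_csrf'] = soup.find('input', attrs={'name': '_csrf'})['value']
    login_data['targetUrl'] = soup.find('input', attrs={'name': 'targetUrl'})['value']
    # POST the credentials to the form's real action URL
    r = c.post(login_url, data=login_data, headers=headers)
    print(r.url)  # should land on the user page rather than back on the starting page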
If everything else fails, there is always Selenium.

Python Screen Scraping Forbes.com

I'm writing a Python program to extract and store metadata from interesting online tech articles: "og:title", "og:description", "og:image", "og:url", and "og:site_name".
This is the code I'm using...
import urllib3
from bs4 import BeautifulSoup

# Setup Headers
headers = {}
headers['Accept'] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
headers['Accept-Charset'] = 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
headers['Accept-Encoding'] = 'none'
headers['Accept-Language'] = "en-US,en;q=0.8"
headers['Connection'] = 'keep-alive'
headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

# Create the Request
http = urllib3.PoolManager()

# Create the Response (headers must be passed by keyword, not positionally)
response = http.request('GET', url, headers=headers)

# BeautifulSoup - Construct
soup = BeautifulSoup(response.data, 'html.parser')

# Scrape <meta property="og:title" content=" x x x ">
title = ''
for tag in soup.find_all('meta'):
    if tag.get("property", None) == "og:title":
        if len(tag.get("content", None)) > len(title):
            title = tag.get("content", None)
The program runs fine on all but one site. On "forbes.com", I can't get to the articles using Python:
url = https://www.forbes.com/consent/?toURL=https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/#72c3b4e21086
I can't bypass this consent page, which seems to be the "Cookie Consent Manager" solution from "TrustArc". In a browser, you basically provide your consent once... and on each subsequent run you're able to access the articles.
If I reference the "toURL" URL:
https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/#72c3b4e21086
and try to bypass the "https://www.forbes.com/consent/" page, I'm redirected back to it.
I've tried to see if there is a cookie I could set in the header, but couldn't find the magic key.
Can anyone help me?
There is a required cookie, notice_gdpr_prefs, that needs to be sent to view the data:
import requests
from bs4 import BeautifulSoup

src = requests.get(
    "https://www.forbes.com/sites/shermanlee/2018/07/31/privacy-revolution-how-blockchain-is-reshaping-our-economy/",
    headers={
        "cookie": "notice_gdpr_prefs"
    })
soup = BeautifulSoup(src.content, 'html.parser')
title = soup.find("meta", property="og:title")
print(title["content"])

POST URL Encoded vs Line-based text data via Python Requests

I'm trying to scrape some data from a website and I can't get the POST to work; it acts as though I didn't give it the input data ("appnote").
When I examine the POST data it looks much the same, except that the actual web form's POST is labeled "URL Encoded" and lists each form input, whereas mine is labeled "Line-based text data".
Here's my code; the search text (appnote) and the Search button (Search) are the most relevant pieces I need:
import requests
import http.cookiejar as cookielib  # 'cookielib' on Python 2

jar = cookielib.CookieJar()
url = 'http://www.vivotek.com/faq/'
headers = {'content-type': 'application/x-www-form-urlencoded'}
post_data = {  # '__EVENTTARGET': '',
    # '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': '',
    '__VIEWSTATEGENERATOR': '',
    '__VIEWSTATEENCRYPTED': '',
    '__PREVIOUSPAGE': '',
    '__EVENTVALIDATION': '',
    'ctl00$HeaderUc1$LanguageDDLUc1$ddlLanguage': 'en',
    'ctl00$ContentPlaceHolder1$CategoryDDLUc1$DropDownList1': '-1',
    'ctl00$ContentPlaceHolder1$ProductDDLUc1$DropDownList1': '-1',
    'ctl00$ContentPlaceHolder1$Content': 'appnote',
    'ctl00$ContentPlaceHolder1$Search': 'Search'
}
response = requests.get(url, cookies=jar)
response = requests.post(url, cookies=jar, data=post_data, headers=headers)
print(response.text)
Links to images of what I'm talking about in Wireshark:
Wireshark Form
Wireshark Line
I also tried it using wget with the same results.
The main problem is that you are not setting the important hidden field values, like __VIEWSTATE.
For this to work using requests, you need to parse the page html and get the appropriate input values.
Here's the solution using the BeautifulSoup HTML parser and requests:
from bs4 import BeautifulSoup
import requests

url = 'http://www.vivotek.com/faq/'
query = 'appnote'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36'}

session = requests.Session()
response = session.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Echo the ASP.NET hidden fields scraped from the page back in the POST
post_data = {'__EVENTTARGET': '',
             '__EVENTARGUMENT': '',
             '__LASTFOCUS': '',
             '__VIEWSTATE': soup.find('input', id='__VIEWSTATE')['value'],
             '__VIEWSTATEGENERATOR': soup.find('input', id='__VIEWSTATEGENERATOR')['value'],
             '__VIEWSTATEENCRYPTED': '',
             '__PREVIOUSPAGE': soup.find('input', id='__PREVIOUSPAGE')['value'],
             '__EVENTVALIDATION': soup.find('input', id='__EVENTVALIDATION')['value'],
             'ctl00$HeaderUc1$LanguageDDLUc1$ddlLanguage': 'en',
             'ctl00$ContentPlaceHolder1$CategoryDDLUc1$DropDownList1': '-1',
             'ctl00$ContentPlaceHolder1$ProductDDLUc1$DropDownList1': '-1',
             'ctl00$ContentPlaceHolder1$Content': query,
             'ctl00$ContentPlaceHolder1$Search': 'Search'
             }

response = session.post(url, data=post_data, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

for item in soup.select('a#ArticleShowLink'):
    print(item.text.strip())
This prints the results specific to the appnote query:
How to troubleshoot when you can't watch video streaming?
Recording performance benchmarking tool
...
