I am making a program in Python 2.7 in which I want to POST a request and open the result in the browser. Here is my code, which should explain it better than I can:
import requests, json, webbrowser
pid = "AQ6723"
size = "660"
recaptcha = ""
baseURL = 'http://www.adidas.co.uk/on/demandware.store/Sites-adidas-GB-Site/en_GB/Cart-MiniAddProduct'
payload = {
'dwfrm_cart_continueShopping': 'Continue+Shopping',
'layer': 'Add+To+Bag+overlay',
'pid': '%20' + pid + '_' + size,
'g-recaptcha-response': recaptcha,
'Quantity': "1",
'masterPid': pid,
'ajax': "true"
}
headers = {
'Host': 'www.adidas.co.uk',
'Connection': 'keep-alive',
'Content-Length': '85',
'Accept': '*/*',
'Origin': 'http://www.adidas.co.uk',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8,de;q=0.6',
}
print(pid)
finishedProduct = requests.post(baseURL, data = payload, headers = headers)  # form-encoded body, so pass the dict directly
webbrowser.open(finishedProduct)
This obviously isn't correct, but how could I achieve it? I just want to be able to see the result of the POST request in a browser, which ultimately would be a product in the cart.
There is no way to do this natively in Python. If you are looking to automate your browser, look into Selenium; it has Python bindings.
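If the goal is only to inspect the returned HTML, one workaround is to write the response body to a temporary file and open that file. Note this is not a real logged-in browser session: the browser opens a local file and does not share requests' cookies, so the cart itself won't carry over. A sketch:

```python
import tempfile
import webbrowser

def show_in_browser(html_text, launch=True):
    # Dump the response body to a temporary .html file...
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as f:
        f.write(html_text)
        path = f.name
    # ...and ask the default browser to open it. The browser loads a local
    # file, so the session's cookies are NOT carried over to the site.
    if launch:
        webbrowser.open('file://' + path)
    return path

# After the POST above you would call:
#   show_in_browser(finishedProduct.text)
path = show_in_browser('<html><body>Added to bag</body></html>', launch=False)
```

`launch=False` here only suppresses the browser call for the demo; in real use you would leave it on.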
Related
I'm writing a Python script to scrape a table from this site (this is public information about ocean tide levels).
One of the stations I'd like to scrape is Punta del Este, code 83.0, on any given day. But my script returns a different table than the browser, even though the POST request seems to have the same input.
When I fill the form in my browser, the headers and data sent to the server are these:
So I wrote my script to make a POST request as it follows:
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta

url = 'https://www.ambiente.gub.uy/SIH-JSF/paginas/sdh/consultaHDMCApublic.xhtml'
s = requests.Session()
r = s.get(url, verify=False)
soupGet = BeautifulSoup(r.content, 'lxml')
#JSESSIONID = s.cookies['JSESSIONID']
javax_faces_ViewState = soupGet.find("input", {"type": "hidden", "name":"javax.faces.ViewState"})['value']
headersSih = {
'Accept': 'application/xml, text/xml, */*; q=0.01',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'es-ES,es;q=0.6',
'Connection': 'keep-alive',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
# 'Cookie': 'JSESSIONID=FBE5ZdMQVFrgQ-P6K_yTc1bw.dinaguasihappproduccion',
'Faces-Request': 'partial/ajax',
'Origin': 'https://www.ambiente.gub.uy',
'Referer': url,
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'Sec-GPC': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
}
fecha0 = datetime(2022, 12, 1)  # day to query (hypothetical example date)
ini_date = datetime.strftime(fecha0, '%d/%m/%Y %H:%M')
end_date = datetime.strftime(fecha0 + timedelta(days=1), '%d/%m/%Y %H:%M')
codigo = 830
dataSih = {
'javax.faces.partial.ajax': 'true',
'javax.faces.source': 'formConsultaHorario:j_idt64',
'javax.faces.partial.execute': '#all',
'javax.faces.partial.render': 'formConsultaHorario:pnlhorarioConsulta',
'formConsultaHorario:j_idt64': 'formConsultaHorario:j_idt64',
'formConsultaHorario': 'formConsultaHorario',
'formConsultaHorario:estacion_focus': '',
'formConsultaHorario:estacion_input': codigo,
'formConsultaHorario:fechaDesde_input': ini_date,
'formConsultaHorario:fechaHasta_input': end_date,
'formConsultaHorario:variables_focus': '',
'formConsultaHorario:variables_input': '26', # Variable: H,Nivel
'formConsultaHorario:fcal_focus': '',
'formConsultaHorario:fcal_input': '7', # Tipo calculo: Ingresado
'formConsultaHorario:ptiempo_focus': '',
'formConsultaHorario:ptiempo_input': '2', #Paso de tiempo: Escala horaria
'javax.faces.ViewState': javax_faces_ViewState,
}
page = s.post(url, headers=headersSih, data=dataSih)
However, when I do it via the browser I get a table full of data, while the Python request returns (in page.text) a table saying "No data was found".
Is there something I'm missing? I've tried changing lots of things but nothing seems to do the trick.
Maybe JavaScript loads the data on this website; requests doesn't execute JavaScript. If you want to get data from there, use Selenium.
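One more thing worth knowing if you stick with plain requests: the reply to a `Faces-Request: partial/ajax` POST like the one above is not a normal HTML page but a small XML envelope, with the updated component HTML wrapped in CDATA sections. A sketch of pulling that markup out (the sample response body below is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical partial-response body, shaped like what JSF returns
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<partial-response><changes>'
    '<update id="formConsultaHorario:pnlhorarioConsulta">'
    '<![CDATA[<table><tr><td>1.23</td></tr></table>]]>'
    '</update>'
    '</changes></partial-response>'
)

root = ET.fromstring(sample)
# Each <update> carries the new HTML for one component; its text is the
# CDATA payload, ready to hand to an HTML parser such as BeautifulSoup.
updates = {u.get('id'): u.text for u in root.iter('update')}
table_html = updates['formConsultaHorario:pnlhorarioConsulta']
```

In the real script you would feed `page.text` to `ET.fromstring` instead of the sample string.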
I want to send a reset link to email addresses, but I can't pass the captcha. I have a CapMonster account to solve captchas; I tried Selenium before but couldn't get it to work.
This is my code:
import requests
import json
s = requests.Session()
Grab = s.get("https://www.instagram.com/accounts/login/")
Headd = {
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7',
'content-length': '104',
'content-type': 'application/x-www-form-urlencoded',
'origin': 'https://www.instagram.com',
'referer': 'https://www.instagram.com/accounts/password/reset/',
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36',
'x-csrftoken': Grab.cookies.get_dict()['csrftoken'],
'x-instagram-ajax': 'c6160c6b689a',
'x-requested-with': 'XMLHttpRequest'
}
LoginData = {
"email_or_username": "example@outlook.com",
"recaptcha_challenge_field": ""
}
AccLogin = s.post('https://www.instagram.com/accounts/account_recovery_send_ajax/', headers=Headd, data=LoginData)
res = json.loads(AccLogin.text)
print(res)
This is the result:
{'message': 'checkpoint_required', 'checkpoint_url': 'https://www.instagram.com/challenge/AXHQIDuh8SBT-M1AVt14AvFB8HLczbgGoyMMvnp86BsPApnJhDJkWE04ZvwjjnczcaLk_g/Afxv1hZK6GoZ_gqxVubIGNLbEyMAAMo6gVAokxxs2ScpC72bLEz6kjkjmJPi33BZdcL-SZ8ZNpy9dw/?challenge_node_id=18315435868046003&challenge_context=%7B%22step_name%22:+%22%22,+%22nonce_code%22:+%22bpjtu8gd1a%22,+%22user_id%22:+%22AXGMD9Ch0rKgE6Zo5g91rV1qjm2JFFwxQC1axVNqoGW6heLiXhcW5lqRNcT3aP-73-y_7g%22,+%22cni%22:+%2218315435868046003%22,+%22is_stateless%22:+false,+%22present_as_modal%22:+false%7D', 'lock': False, 'flow_render_type': 0, 'status': 'fail'}
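`checkpoint_required` with `status: fail` means Instagram flagged the attempt and wants extra verification at the `checkpoint_url`, which usually has to be completed in a browser; a recaptcha token alone will not clear it. A small helper for branching on that response (the sample dict mirrors the output above):

```python
def needs_checkpoint(res):
    # Instagram signals a blocked/verify flow with message == 'checkpoint_required'
    return res.get('status') == 'fail' and res.get('message') == 'checkpoint_required'

sample = {'message': 'checkpoint_required',
          'checkpoint_url': 'https://www.instagram.com/challenge/...',
          'status': 'fail'}

if needs_checkpoint(sample):
    # this URL generally needs to be opened and completed manually
    challenge = sample['checkpoint_url']
```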
I'm working on scraping from barchart.com using modified code from this stack overflow question:
The header and payload information are from the XHR of the website I was attempting to scrape.
import requests
from urllib.parse import unquote
geturl=r'https://www.barchart.com/options/highest-implied-volatility'
apiurl=r'https://www.barchart.com/proxies/core-api/v1/quotes/get'
getheaders={
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'cache-control': 'max-age=0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay={
'page': 'all'
}
s=requests.Session()
r=s.get(geturl,params=getpay, headers=getheaders)
headersIV = {
'method': 'GET',
'scheme': 'https',
'authority': 'www.barchart.com',
'Host' : 'www.barchart.com',
'Accept': 'application/json',
'Accept-Encoding': 'gzip, deflate, br',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
'Accept-Language': 'en-us',
'Referer': 'https://www.barchart.com/options/highest-implied-volatility',
'Connection': 'keep-alive',
'X-XSRF-TOKEN': 'eyJpdiI6Ik8vQTBkcGxZVVF1aG5QeE9TUnk5L3c9PSIsInZhbHVlIjoiMDd6STJyM1FPZEtMMFdLNEcrVjNNWUMva1l3WWxwblMvdEFZMEIzSllzalFySGFoblcyRzgrRmNZa1RMRHdZcTlBVExQTjBQUEhVdTVaNWhMZlJ0ZFM4c3ZaeHMvVmptM2FGQXJobnM1WTl1REx1d3M1eDI2RUc2SEtHY2wzTnUiLCJtYWMiOiIyNGExYjI3N2JkOGRiZGEwYjY4MTQ3OGFiYmYxZGE3ZmJhZmQyMDQwM2NiZTc0YTMzZDFkNjI4ZGIwZmY2YTU0In0=',
'path': '/proxies/core-api/v1/options/get?fields=symbol%2CbaseSymbol%2CbaseLastPrice%2CbaseSymbolType%2CsymbolType%2CstrikePrice%2CexpirationDate%2CdaysToExpiration%2CbidPrice%2Cmidpoint%2CaskPrice%2ClastPrice%2Cvolume%2CopenInterest%2CvolumeOpenInterestRatio%2Cvolatility%2CtradeTime%2CsymbolCode%2ChasOptions&orderBy=volatility&baseSymbolTypes=stock&between(lastPrice%2C.10%2C)=&between(daysToExpiration%2C15%2C)=&between(tradeTime%2C2021-10-21%2C2021-10-22)=&orderDir=desc&between(volatility%2C60%2C)=&limit=200&between(volume%2C500%2C)=&between(openInterest%2C100%2C)=&in(exchange%2C(AMEX%2CNASDAQ%2CNYSE))=&meta=field.shortName%2Cfield.type%2Cfield.description&hasOptions=true&raw=1',
}
payloadIV={
'fields': 'symbol,baseSymbol,baseLastPrice,baseSymbolType,symbolType,strikePrice,expirationDate,daysToExpiration,bidPrice,midpoint,askPrice,lastPrice,volume,openInterest,volumeOpenInterestRatio,volatility,tradeTime,symbolCode,hasOptions',
'orderBy': 'volatility',
'baseSymbolTypes': 'stock',
'between(lastPrice,.10,)':'',
'between(daysToExpiration,15,)':'',
'between(tradeTime,2021-10-21,2021-10-22)':'',
'orderDir': 'desc',
'between(volatility,60,)':'',
'limit': '200',
'between(volume,500,)':'',
'between(openInterest,100,)':'',
'in(exchange,(AMEX,NASDAQ,NYSE))':'',
'meta': 'field.shortName,field.type,field.description',
'hasOptions': 'true',
'raw': '1'
}
r=s.get(apiurl,params=payloadIV,headers=headersIV)
j=r.json()
print(j)
It returns this error message: {'error': {'message': 'Internal error.', 'code': 500}}
I am pretty new to scraping data using APIs and XHR data. I think I'm doing many things correctly, but I don't know where the mistake is.
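One hedged guess: the hard-coded `X-XSRF-TOKEN` value is almost certainly stale. The site sets a fresh `XSRF-TOKEN` cookie on the first GET, and that cookie value arrives URL-encoded, which would also explain the otherwise-unused `unquote` import. A sketch of refreshing the header from the session cookie (the cookie value below is made up; real ones are long base64 blobs):

```python
from urllib.parse import unquote

# Hypothetical value, as it would sit in s.cookies['XSRF-TOKEN']
raw_cookie = 'eyJpdiI6IkFCQyJ9%3D'

# The cookie is URL-encoded ('%3D' is '='), so decode it before sending
token = unquote(raw_cookie)

# In the real script, after r = s.get(geturl, ...):
#   headersIV['X-XSRF-TOKEN'] = unquote(s.cookies['XSRF-TOKEN'])
# then retry the API GET with the refreshed header.
```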
I would like to get the JSON data from, for instance, https://app.weathercloud.net/d0838117883#current using the Python requests module.
I tried:
import re
import requests
device='0838117883'
URL='https://app.weathercloud.net'
URL1=URL+'/d'+device
URL2=URL+'/device/stats'
headers={'Content-Type':'text/plain; charset=UTF-8',
'Referer':URL1,
'User-Agent':'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
'Accept':'application/json, text/javascript,*/*'}
with requests.Session() as s:
    # get html from URL1 in order to get the CSRF token
    page = s.get(URL1)
    CSRF = re.findall('WEATHERCLOUD_CSRF_TOKEN:"(.*)"},', page.text)[0]
    # create parameters for URL2, in order to get the json file
    params = {'code': device, 'WEATHERCLOUD_CSRF_TOKEN': CSRF}
    page_stats = requests.get(URL2, params=params, headers=headers)
    print(page_stats.url)
    print(page_stats)         # <Response [200]>
    print(page_stats.text)    # empty
    print(page_stats.json())  # error
But page_stats is empty.
How can I get the stats data from weathercloud?
Inspecting the page with DevTools, you'll find a useful endpoint:
https://app.weathercloud.net/device/stats
You can "replicate" the original web request made by your browser with the requests library:
import requests
cookies = {
'PHPSESSID': '************************',
'WEATHERCLOUD_CSRF_TOKEN':'***********************',
'_ga': '**********',
'_gid': '**********',
'__gads': 'ID=**********',
'WeathercloudCookieAgreed': 'true',
'_gat': '1',
'WEATHERCLOUD_RECENT_ED3C8': '*****************',
}
headers = {
'Connection': 'keep-alive',
'sec-ch-ua': '^\\^Google',
'Accept': 'application/json, text/javascript, */*; q=0.01',
'X-Requested-With': 'XMLHttpRequest',
'sec-ch-ua-mobile': '?0',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
'sec-ch-ua-platform': '^\\^Windows^\\^',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Dest': 'empty',
'Referer': 'https://app.weathercloud.net/d0838117883',
'Accept-Language': 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7,es;q=0.6',
}
params = (
('code', '0838117883'),
('WEATHERCLOUD_CSRF_TOKEN', '****************'),
)
response = requests.get('https://app.weathercloud.net/device/stats', headers=headers, params=params, cookies=cookies)
# Parse the JSON body
json_object = response.json()
JSON output:
{'last_update': 1632842172,
'bar_current': [1632842172, 1006.2],
'bar_day_max': [1632794772, 1013.4],
'bar_day_min': [1632845772, 1006.2],
'bar_month_max': [1632220572, 1028],
'bar_month_min': [1632715572, 997.3],
'bar_year_max': [1614418512, 1038.1],
'bar_year_min': [1615434432, 988.1],
'wdir_current': [1632842172, 180],
..............}
That's it.
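Each value in that payload is a `[unix_timestamp, value]` pair, so turning a reading into something human-readable is one `datetime` call away (the dict below copies a few fields from the sample output above):

```python
from datetime import datetime, timezone

# A few fields copied from the sample stats payload above
stats = {
    'last_update': 1632842172,
    'bar_current': [1632842172, 1006.2],   # [unix timestamp, hPa]
    'bar_day_max': [1632794772, 1013.4],
}

ts, pressure = stats['bar_current']
when = datetime.fromtimestamp(ts, tz=timezone.utc)
print(when.isoformat(), pressure)  # 2021-09-28T15:16:12+00:00 1006.2
```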
I am working on my own project to create an Instagram bot. The first thing I need to do is log in. I wrote the Python script below to log in to Instagram; however, it returns a 403 status code. Can anyone give me advice on what's wrong?
import requests
import json
import random
import time
Base_url = 'https://www.instagram.com/'
Login_url = 'https://www.instagram.com/accounts/login/'
Username = 'username'
Password = 'password'
User_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
login_data = {'username': Username, 'password': Password}
session = requests.Session()
session.headers.update({
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive',
'Content-Length': '0',
'Host': 'www.instagram.com',
'Origin': 'https://www.instagram.com',
'Referer': 'https://www.instagram.com/',
'User-Agent': User_agent,
'X-Instagram-AJAX': '1',
'Content-Type': 'application/x-www-form-urlencoded',
'X-Requested-With': 'XMLHttpRequest'
})
req = session.get(Base_url)
session.headers.update({'X-CSFRToken': req.cookies['csrftoken']})
time.sleep(5 * random.random())
login = session.post(Login_url, data=login_data, allow_redirects = True )
session.headers.update({'X-CSFRToken': login.cookies['csrftoken']})
csrftoken = login.cookies['csrftoken']
#ig_vw=1536; ig_pr=1.25; ig_vh=772; ig_or=landscape-primary;
session.cookies['ig_vw'] = '1536'
session.cookies['ig_pr'] = '1.25'
session.cookies['ig_vh'] = '772'
session.cookies['ig_or'] = 'landscape-primary'
time.sleep(5 * random.random())
print(login.status_code)
if login.status_code == 200:
    #login_text_notjson = login.text
    print('successfully logged in')
    try:
        login_text = json.loads(login.text)
    except Exception:
        print('there is an error')
    else:
        #print(login_text_notjson)
        print(login_text)
else:
    print('failed to log in')
I'd be very grateful if anyone can give me information about creating a bot for this. I still have no idea why my user agent didn't work.
This will allow you to log in to Instagram (Python 3) and can easily be adapted for your class:
self.s.headers.update({
'accept': '*/*',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'content-type': 'application/x-www-form-urlencoded',
'origin': 'https://www.instagram.com',
'referer': 'https://www.instagram.com/accounts/login/',
'user-agent': 'Mozilla/5.0 (Linux; U; Android 2.3.3; en-us; HTC_DesireS_S510e Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1',
'x-instagram-ajax': '1',
'x-requested-with': 'XMLHttpRequest'
})
Grab = self.s.get("https://www.instagram.com/accounts/login/")
self.s.headers.update({"x-csrftoken": Grab.cookies.get_dict()['csrftoken']})
LoginData = {
'username': Username,
'password': Password,
'queryParams': '{}'
}
AccLogin = self.s.post("https://www.instagram.com/accounts/login/ajax/",
data=LoginData)
if AccLogin.json()['authenticated']:
    self.s.headers.update({"x-csrftoken": AccLogin.cookies.get_dict()['csrftoken']})