import requests
import json

api_url = "https://en.coinjinja.com/api/events/search"
headers = {
    'origin': 'https://en.coinjinja.com',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'authority': 'en.coinjinja.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36',
    'content-type': 'application/json',
    'content-length': '126',  # note: requests normally computes this itself
    'accept': '*/*',
    'referer': 'https://en.coinjinja.com/events/time/next_week/tags/hardfork+airdrop+burn+exchange+partnership'
}
data = {"start": "2020-02-17", "end": "2020-02-24", "symbol": "",
        "types": ["hardfork", "airdrop", "burn", "exchange", "partnership"]}
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.headers)
print(api_request.encoding)
print(api_request.content.decode('utf-8', 'ignore'))
You need to install the brotli package so that requests can decode responses with 'Content-Encoding': 'br'. This is a duplicate of "unable to decode Python web request".
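After pip install brotli, the request above should decode transparently. If installing brotli isn't an option, a quick workaround (a sketch, reusing the variables from the code above) is to stop advertising brotli support so the server falls back to an encoding requests can already decode:

# Workaround sketch: only advertise encodings requests decodes out of the box.
headers['accept-encoding'] = 'gzip, deflate'
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.text)  # readable JSON instead of compressed bytes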
Related: Scraping an AJAX web page using python and requests
I used the script in the above link to get a table on the Barchart website, and it somehow stopped working recently with the error message {'error': {'message': 'The payload is invalid.', 'code': 400}}. I guess some of the field names have been changed, but I am pretty new to web scraping and I couldn't figure out how to fix it. Any suggestions?
import requests

geturl = r'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
apiurl = r'https://www.barchart.com/proxies/core-api/v1/quotes/get'

getheaders = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay = {
    'page': 'all'
}

# First request: load the page so the session picks up the XSRF-TOKEN cookie
s = requests.Session()
r = s.get(geturl, params=getpay, headers=getheaders)

# Second request: hit the internal API, echoing the token back as a header
headers = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
}
payload = {
    'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
    'list': 'futures.contractInRoot',
    'root': 'CL',
    'meta': 'field.shortName,field.type,field.description',
    'hasOptions': 'true',
    'raw': '1'
}
r = s.get(apiurl, params=payload, headers=headers)
j = r.json()
print(j)
OUT: {'error': {'message': 'The payload is invalid.', 'code': 400}}
This happened to me too. The site serves the table from an internal API, and the XSRF token it sets as a cookie is URL-encoded; it has to be decoded before being sent back as a header, or you get this error.
Try this solution:
1- Import the unquote function at the beginning of your code:
from urllib.parse import unquote
2- Change this line:
'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
to this:
'x-xsrf-token': unquote(unquote(s.cookies.get_dict()['XSRF-TOKEN']))
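Why unquote twice? On this site the cookie value comes back double-encoded: the percent signs in the already-encoded token are encoded once more (%3D becomes %253D). A tiny sketch with a made-up value:

from urllib.parse import unquote

token = 'abc%253D%253D'           # hypothetical double-encoded cookie value
print(unquote(token))             # abc%3D%3D  -- still encoded once
print(unquote(unquote(token)))    # abc==      -- fully decoded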
I want to scrape Facebook company pages for their data (if they have any).
The problem is that when I try to retrieve the HTML, I get the Hebrew version of it (I'm located in Israel).
This is part of the result:
�1u�9X�/.������~�O+$B\^����y�����e�;�+
Code:
import requests
from bs4 import BeautifulSoup

headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'referer': 'https',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}

url = 'https://www.facebook.com/pg/google/about/'

def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Is there a way to get the EN version of the site? Any subdomain? Or should I use a proxy in the request?
What you are seeing isn't the Hebrew version of the site, but a compressed response from the server. As a quick solution, you can remove the accept-encoding header from the request:
import requests
from bs4 import BeautifulSoup

headers = {
    'accept': '*/*',
    # 'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'referer': 'https',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}

url = 'https://www.facebook.com/pg/google/about/'

def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Prints the uncompressed page:
<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
<head><meta charset="utf-8" /><meta name="referrer" content="origin-when-crossorigin" id="meta_referrer" /><script>window._cstart=+new Date();</script><script>function envFlush(a){function b(b){for(var c in a)b[
...and so on.
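Removing the header works because requests then sends its own Accept-Encoding (gzip, deflate), which it decompresses automatically; only brotli (br) needs an extra package. If you would rather keep the original header, installing brotli (pip install brotli) should also fix it. A sketch:

# With the brotli package installed, requests transparently decodes
# 'Content-Encoding: br' responses as well.
response = requests.get(url, headers=headers)
print(response.headers.get('content-encoding'))  # e.g. 'br'
print(response.text)  # decoded HTML, not compressed bytes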
So I have been having some trouble with requests and I don't know why this is happening. I noticed this before, but I thought it was something on my part (it probably still is, but I don't know what, hence the question). I was working on a script for Instagram. I got it to create the Instagram account just fine; however, I also want it to follow a chosen person. This is the code I used:
# link has to be defined before it is referenced in the headers below
link = 'https://www.instagram.com/web/friendships/' + e + '/follow/'
pat = '/web/friendships/' + e + '/follow/'
headers = {
    'authority': 'www.instagram.com',
    'method': 'POST',
    'path': pat,
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,es;q=0.8,zh-CN;q=0.7,zh;q=0.6',
    'content-length': '0',
    'origin': 'https://www.instagram.com',
    'referer': link,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'x-csrftoken': 'eARRsWAqOvY73Xi9Fe3WLrWK9Rdxjy6o',  # hardcoded token captured from the browser
    'x-instagram-ajax': 'a6abd1200036',
    'x-requested-with': 'XMLHttpRequest'
}
w = s.post(link, headers=headers, proxies=proxies)
print(Fore.YELLOW + time.strftime("[%Y-%m-%d %H:%M:%S]") + Fore.GREEN + 'Followed ' + e)
So this request runs without any errors. I have already set s = requests.Session() at the beginning of the script. AFAIK this replicates the POST request exactly as when you follow someone on Instagram. Then why is it not working? I got no errors when the script ran.
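One thing worth checking (an assumption on my part, not a confirmed answer): the x-csrftoken above is hardcoded from an old browser capture, while Instagram issues a fresh csrftoken cookie per session, so reading it from the live session may be needed. A sketch:

# Hypothetical fix sketch: send the token Instagram set on THIS session,
# not one captured from an earlier browser session.
headers['x-csrftoken'] = s.cookies.get('csrftoken')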
Another example of this problem is in a Slickdeals script I was working on. I got the account login to work, but when I go to a listing and try to vote it up using requests, the same thing happens (no errors in the script, but it doesn't vote up). The code I used is this:
cool = {
    'authority': 'slickdeals.net',
    'method': 'POST',
    'path': '/forums/sdthreadrate_ajax.php',
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en,en-US;q=0.9',
    'origin': 'https://slickdeals.net',
    'referer': link,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
vote_up_payload = {
    'ajax': '1',
    'do': 'sdthreadratevote',
    'postid': postid,
    'vote': '1',
    'votetypeid': '1',
    'controltype': 'modern',
    'securitytoken': sec_token,
    'where_from': '/forums/sdthreadrate_ajax.php',
}
voteu = s.post('https://slickdeals.net/forums/sdthreadrate_ajax.php',
               headers=cool, data=vote_up_payload)
print('done')
So what is the problem? Why does the script run but not do what it should?
[EDIT]: OK, the Instagram issue makes sense. Any reason why the Slickdeals one doesn't work?
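A useful first step when a request "succeeds" silently is to look at what the server actually said; the request finishing without an exception only means the HTTP exchange completed. A sketch, using the variables from the code above:

# A 200 with an error body, a redirect to a login page, or a 403 all
# "run without errors" here -- the response itself usually names the problem.
print(voteu.status_code)
print(voteu.text[:500])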
I have the simplest POST request code:
import requests

headers = {
    'origin': 'https://jet.com',
    'accept-encoding': 'gzip, deflate, br',
    'x-csrf-token': 'IzaENk9W-Xzv9I5NcCJtIf9h_nT24p5fU-Tk',
    'jet-referer': '/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'x-requested-with': 'XMLHttpRequest',
    'accept-language': 'en-US,en;q=0.8',
'cookie': 'akacd_phased_release=3673158615~rv=53~id=041cdc832c1ee67c7be18df3f637ad43; jet.csrf=_JKKPyR5fKD-cPDGmGv8AJk5; jid=7292a61d-af8f-4d6f-a339-7f62afead9a0; jet-phaser=%7B%22experiments%22%3A%5B%7B%22variant%22%3A%22a%22%2C%22version%22%3A1%2C%22id%22%3A%22a_a_test16%22%7D%2C%7B%22variant%22%3A%22slp_categories%22%2C%22version%22%3A1%2C%22id%22%3A%22slp_categories%22%7D%2C%7B%22variant%22%3A%22on_cat_nav_clicked%22%2C%22version%22%3A1%2C%22id%22%3A%22catnav_load%22%7D%2C%7B%22variant%22%3A%22zipcode_table%22%2C%22version%22%3A1%2C%22id%22%3A%22zipcode_table%22%7D%5D%2C%22id%22%3A%222982c0e7-287e-42bb-8858-564332ada868%22%7D; ak_bmsc=746D16A88CE3AE7088B0CD38DB850B694F8C5E56B1650000DAA82659A1D56252~plJIR8hXtAZjTSjYEr3IIpW0tW+u0nQ9IrXdfV5GjSfmXed7+tD65YJOVp5Vg0vdSqkzseD0yUZUQkGErBjGxwmozzj5VjhJks1AYDABrb2mFO6QqZyObX99GucJA834gIYo6/8QDIhWMK1uFvgOZrFa3SogxRuT5MBtC8QBA1YPOlK37Ecu1WRsE2nh55E24F0mFDx5hXcfBAhWdMne6NrQ88JE9ZDxjW5n8qsh+QAHo=; _sdsat_landing_page=https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ|1495705823651; _sdsat_session_count=1; AMCVS_A7EE579F557F617B7F000101%40AdobeOrg=1; AMCV_A7EE579F557F617B7F000101%40AdobeOrg=-227196251%7CMCIDTS%7C17312%7CMCMID%7C11996417004070294145733272597342763775%7CMCAID%7CNONE%7CMCAAMLH-1496310624%7C3%7CMCAAMB-1496310625%7Chmk_Lq6TPIBMW925SPhw3Q%7CMCOPTOUT-1495713041s%7CNONE; __qca=P0-949691368-1495705852397; mm_gens=Rollout%20SO123%20-%20PDP%20Grid%20Image%7Ctitle%7Chide%7Cattr%7Chide%7Cprice%7Chide~SO19712%20HP%20Rec%20View%7Clast_viewed%7Cimage-only~SO17648%20-%20PLA%20PDP%7Cdesc%7CDefault%7Cbuybox%7Cmodal%7Cexp_cart%7Chide-cart%7Ctop_caro%7CDefault; jcmp_productSku=882b1010309d48048b8f3151ddccb3cf; _sdsat_all_pages_canary_variants=a_a_test16:a|slp_categories:slp_categories|catnav_load:on_cat_nav_clicked|zipcode_table:zipcode_table; _sdsat_all_pages_native_pay_eligible=No; _uetsid=_uet6ed8c6ab; _tq_id.TV-098163-1.3372=ef52068e069c26b9.1495705843.0.1495705884..; _ga=GA1.2.789964406.1495705830; _gid=GA1.2.1682210002.1495705884; s_cc=true; __pr.NaN=6jvgorz8tb; mm-so17648=gen; __pr.11xw=xqez1m3cvl; _sdsat_all_pages_login_status=logged-out; _sdsat_jid_cookie=7292a61d-af8f-4d6f-a339-7f62afead9a0; _sdsat_phaser_id=2982c0e7-287e-42bb-8858-564332ada868; _sdsat_all_pages_jet_platform=desktop; _sdsat_all_pages_site_version=3.860.1495036770896|2017-05-16 20:35:36 UTC; _sdsat_all_pages_canary_variants_2=a_a_test16:a~slp_categories:slp_categories~catnav_load:on_cat_nav_clicked~zipcode_table:zipcode_table; jet=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6WyJmMmUwMjI1NS1iODFkLTRlOTktOGU1Yi0yZGI1MjU0ZTdjNzUiXSwiamNtcEhpc3RvcnkiOltbXV0sImlwWmlwY29kZSI6WyIyMTA2MSJdLCJjbGllbnRUaWNrZXQiOlsiZXlKMGVYQWlPaUpLVjFRaUxDSmhiR2NpT2lKSVV6STFOaUo5LmV5SmpiR2xsYm5SZmFXUWlPaUl3Tm1JMlkyTTNaVGRtTnpVME16TmhPREU0T0RjelpUWmpZMkV4WTJRelppSXNJbWx6Y3lJNkltcGxkQzVqYjIwaUxDSmhkV1FpT2lKM1pXSmpiR2xsYm5RaWZRLnlKMXdoYklDVml4TE1iblliV0xQY1RvdF9EWUo3MjFYQkdFMzBpUktpdTQiXSwicHJvbW9jb2RlIjpbIlNQUklORzE1Il0sInBsYSI6W3RydWVdLCJmcmVlU2hpcHBpbmciOltmYWxzZV0sImpjbXAiOlt7ImpjbXAiOiJwbGE6Z2dsOm5qX2R1cl9nZW5fcGF0aW9fX2dhcmRlbl9hMjpwYXRpb19fZ2FyZGVuX2dyaWxsc19fb3V0ZG9vcl9jb29raW5nX2dyaWxsX2NvdmVyc19hMjpuYTpwbGFfNzg0NzQ0NTQyXzQwNTY4Mzg3NzA2X3BsYS0yOTM2MjcyMDMzNDE6bmE6bmE6bmE6Mjo4ODJiMTAxMDMwOWQ0ODA0OGI4ZjMxNTFkZGNjYjNjZiIsImNvZGUiOiJQTEExNSIsInNrdSI6Ijg4MmIxMDEwMzA5ZDQ4MDQ4YjhmMzE1MWRkY2NiM2NmIn1dLCJpYXQiOjE0OTU3MDU4OTh9.6OEM9e9fTyUZdFGju19da4rEnFh8kPyg8wENmKyhYgc; 
bm_sv=360FA6B793BB42A17F395D08A2D90484~BLAlpOUET7ALPzcGziB9dbZNvjFjG3XLQPFGCRTk+2bnO/ivK7G+kOe1WXpHgIFmyZhniWIzp2MpGel1xHNmiYg0QOLNqourdIffulr2J9tzacGPmXXhD6ieNGp9PAeTqVMi+2kSccO1+JzO+CaGFw==; s_tps=30; s_pvs=173; mmapi.p.pd=%221759837076%7CDwAAAApVAgDxP2Qu1Q4AARAAAUJz0Q1JAQAmoW6kU6PUSKeaIXVTo9RIAAAAAP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FAAZEaXJlY3QB1Q4BAAAAAAAAAAAAt8wAAH0vAQC3zAAABQDZlQAAAmpAfP%2FVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc9dAQCNFgIAADyXAABuDVKACdUOAP%2F%2F%2F%2F8B1Q7VDv%2F%2FCgAAAQAAAAABs2ABANweAgAAiY0AAMCzlXtx1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8GAAABAAAAAAORSwEATPoBAJJLAQBO%2BgEAk0sBAFD6AQABt8wAAAYAAADYlQAAHMPK3ZbVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc5dAQCJFgIAAbfMAAAGAAAAmpgAAFAf9YUU1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8EAAABAAAAAAR0YwEA1R4CAHVjAQDWHgIAdmMBANgeAgB3YwEA2x4CAAG3zAAABAAAAAAAAAAAAUU%3D%22; mmapi.p.srv=%22fravwcgus04%22; mmapi.e.PLA=%22true%22; mmapi.p.uat=%7B%22PLATraffic%22%3A%22true%22%7D; _sdsat_lt_pages_viewed=6; _sdsat_pages_viewed=6; _sdsat_traffic_source=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'content-type': 'application/json',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'referer': 'https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'authority': 'jet.com',
    'dnt': '1',
}

data = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
r = requests.post('https://jet.com/api/product/v2', headers=headers, data=data)
print(r)
It returns 200.
And I want to convert this simple request to a Scrapy Request:

body = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
yield Request(url='https://jet.com/api/product/v2', callback=self.parse_jet_page,
              meta={'data': data}, method="POST", body=body, headers=self.jet_headers)

It returns 400; it looks like the headers are being overwritten or something. Or is there a bug?
I guess the error is caused by the cookies.
By default, the "cookie" entry in your HTTP headers is overridden by a built-in downloader middleware, CookiesMiddleware. Scrapy expects you to pass cookies via Request.cookies.
If you do need to pass cookies directly in Request.headers (instead of using Request.cookies), you'll need to disable the built-in CookiesMiddleware. You can simply set COOKIES_ENABLED=False in the settings.
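A sketch of both options inside the spider (the cookie names and values below are placeholders parsed out of the original cookie string, and jet_headers_without_cookie is assumed to be the headers dict minus its 'cookie' entry):

# Option 1: let Scrapy manage cookies via Request.cookies
yield Request(
    url='https://jet.com/api/product/v2',
    method='POST',
    body=body,
    headers=self.jet_headers_without_cookie,
    cookies={'jid': '...', 'jet.csrf': '...'},
    callback=self.parse_jet_page,
)

# Option 2: keep the raw 'cookie' header and disable the middleware,
# by adding this line to settings.py:
# COOKIES_ENABLED = False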
I have been using Fiddler to inspect an HTTP POST that an application makes, and then trying to replicate that POST using requests in Python.
The link I am posting to: http://www.example.com/ws/for/250/buy
Now, in Fiddler I can clearly see the headers, which are easy to replicate using requests. However, when I look at the TextView tab in Fiddler, I see this:
tuples=4,421&flows=undefined
To replicate that, I think I need to use the data parameter, which I found in the docs, but I am not sure how to write this in Python. Do I pass it as a dictionary, split up according to the & sign, or do I have to specify it as a string, etc.?
My current code:
url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False)
Something like

url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False,
                   data={'tuples': '4,421', 'flows': 'undefined'})

should work.
You provide a dictionary to the data argument:
r = requests.post(url, data={'tuples': '4,421', 'flows': 'undefined'})
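If you want to confirm that the dictionary really produces the body you saw in Fiddler, you can inspect the prepared request before sending it; a small sketch (note that requests percent-encodes the comma):

import requests

url = 'http://www.example.com/ws/for/250/buy'
req = requests.Request('POST', url, data={'tuples': '4,421', 'flows': 'undefined'})
prepared = req.prepare()
print(prepared.body)                     # tuples=4%2C421&flows=undefined
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded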