import requests
import json

api_url = "https://en.coinjinja.com/api/events/search"
headers = {
    'origin': 'https://en.coinjinja.com',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'authority': 'en.coinjinja.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36',
    'content-type': 'application/json',
    'content-length': '126',  # note: requests normally computes this itself
    'accept': '*/*',
    'referer': 'https://en.coinjinja.com/events/time/next_week/tags/hardfork+airdrop+burn+exchange+partnership'
}
data = {"start": "2020-02-17", "end": "2020-02-24", "symbol": "",
        "types": ["hardfork", "airdrop", "burn", "exchange", "partnership"]}
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.headers)
print(api_request.encoding)
print(api_request.content.decode('utf-8', 'ignore'))
You need to install the brotli package so that requests can decode responses with 'Content-Encoding': 'br'. This is a duplicate of "unable to decode Python web request".
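After pip install brotli, the request above should decode transparently. If installing brotli isn't an option, a quick workaround (a sketch, reusing the variables from the code above) is to stop advertising brotli support so the server falls back to an encoding requests can already decode:

# Workaround sketch: only advertise encodings requests decodes out of the box.
headers['accept-encoding'] = 'gzip, deflate'
api_request = requests.post(api_url, headers=headers, data=json.dumps(data))
print(api_request.text)  # readable JSON instead of compressed bytes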
Related: Scraping an AJAX web page using python and requests
I used the script in the above link to get a table on the Barchart website, and it somehow stopped working recently with the error message {'error': {'message': 'The payload is invalid.', 'code': 400}}. I guess some of the field names have been changed, but I am pretty new to web scraping and I couldn't figure out how to fix it. Any suggestions?
import requests

geturl = r'https://www.barchart.com/futures/quotes/CLJ19/all-futures'
apiurl = r'https://www.barchart.com/proxies/core-api/v1/quotes/get'

getheaders = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}
getpay = {
    'page': 'all'
}

# First request: load the page so the session picks up the XSRF-TOKEN cookie
s = requests.Session()
r = s.get(geturl, params=getpay, headers=getheaders)

# Second request: hit the internal API, echoing the token back as a header
headers = {
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.barchart.com/futures/quotes/CLJ19/all-futures?page=all',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
}
payload = {
    'fields': 'symbol,contractSymbol,lastPrice,priceChange,openPrice,highPrice,lowPrice,previousPrice,volume,openInterest,tradeTime,symbolCode,symbolType,hasOptions',
    'list': 'futures.contractInRoot',
    'root': 'CL',
    'meta': 'field.shortName,field.type,field.description',
    'hasOptions': 'true',
    'raw': '1'
}
r = s.get(apiurl, params=payload, headers=headers)
j = r.json()
print(j)
OUT: {'error': {'message': 'The payload is invalid.', 'code': 400}}
This happened to me too. The site serves the table from an internal API, and the XSRF token it sets as a cookie is URL-encoded; it has to be decoded before being sent back as a header, or you get this error.
Try this solution:
1- Import the unquote function at the beginning of your code:
from urllib.parse import unquote
2- Change this line:
'x-xsrf-token': s.cookies.get_dict()['XSRF-TOKEN']
to this:
'x-xsrf-token': unquote(unquote(s.cookies.get_dict()['XSRF-TOKEN']))
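Why unquote twice? On this site the cookie value comes back double-encoded: the percent signs in the already-encoded token are encoded once more (%3D becomes %253D). A tiny sketch with a made-up value:

from urllib.parse import unquote

token = 'abc%253D%253D'           # hypothetical double-encoded cookie value
print(unquote(token))             # abc%3D%3D  -- still encoded once
print(unquote(unquote(token)))    # abc==      -- fully decoded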
I want to scrape Facebook company pages for their data (if they have any).
The problem is that when I try to retrieve the HTML, I get the Hebrew version of it (I'm located in Israel).
This is part of the result:
�1u�9X�/.������~�O+$B\^����y�����e�;�+
Code:
import requests
from bs4 import BeautifulSoup

headers = {
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'referer': 'https',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}

url = 'https://www.facebook.com/pg/google/about/'

def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Is there a way to get the EN version of the site? Any subdomain? Or should I use a proxy in the request?
What you are seeing isn't the Hebrew version of the site, but a compressed response from the server. As a quick solution, you can remove the accept-encoding header from the request:
import requests
from bs4 import BeautifulSoup

headers = {
    'accept': '*/*',
    # 'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
    'cache-control': 'no-cache',
    'dnt': '1',
    'pragma': 'no-cache',
    'referer': 'https',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
}

url = 'https://www.facebook.com/pg/google/about/'

def fetch(URL):
    try:
        response = requests.get(url=URL, headers=headers).text
        print(response)
    except:
        print('Could not retrieve data, or connect')

fetch(url)
Prints the uncompressed page:
<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
<head><meta charset="utf-8" /><meta name="referrer" content="origin-when-crossorigin" id="meta_referrer" /><script>window._cstart=+new Date();</script><script>function envFlush(a){function b(b){for(var c in a)b[
...and so on.
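Removing the header works because requests then sends its own Accept-Encoding (gzip, deflate), which it decompresses automatically; only brotli (br) needs an extra package. If you would rather keep the original header, installing brotli (pip install brotli) should also fix it. A sketch:

# With the brotli package installed, requests transparently decodes
# 'Content-Encoding: br' responses as well.
response = requests.get(url, headers=headers)
print(response.headers.get('content-encoding'))  # e.g. 'br'
print(response.text)  # decoded HTML, not compressed bytes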
So I have been having some trouble with requests and I don't know why this is happening. I noticed this before, but I thought it was something on my part (it probably still is, but I don't know what, hence the question). I was working on a script for Instagram. I got it to create the Instagram account just fine; however, I also want it to follow a chosen person. This is the code I used:
# link has to be defined before it is referenced in the headers below
link = 'https://www.instagram.com/web/friendships/' + e + '/follow/'
pat = '/web/friendships/' + e + '/follow/'
headers = {
    'authority': 'www.instagram.com',
    'method': 'POST',
    'path': pat,
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,es;q=0.8,zh-CN;q=0.7,zh;q=0.6',
    'content-length': '0',
    'origin': 'https://www.instagram.com',
    'referer': link,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'x-csrftoken': 'eARRsWAqOvY73Xi9Fe3WLrWK9Rdxjy6o',  # hardcoded token captured from the browser
    'x-instagram-ajax': 'a6abd1200036',
    'x-requested-with': 'XMLHttpRequest'
}
w = s.post(link, headers=headers, proxies=proxies)
print(Fore.YELLOW + time.strftime("[%Y-%m-%d %H:%M:%S]") + Fore.GREEN + 'Followed ' + e)
So this request runs without any errors. I have already set s = requests.Session() at the beginning of the script. AFAIK this replicates the POST request exactly as when you follow someone on Instagram. Then why is it not working? I got no errors when the script ran.
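One thing worth checking (an assumption on my part, not a confirmed answer): the x-csrftoken above is hardcoded from an old browser capture, while Instagram issues a fresh csrftoken cookie per session, so reading it from the live session may be needed. A sketch:

# Hypothetical fix sketch: send the token Instagram set on THIS session,
# not one captured from an earlier browser session.
headers['x-csrftoken'] = s.cookies.get('csrftoken')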
Another example of this problem is in a Slickdeals script I was working on. I got the account login to work, but when I go to a listing and try to vote it up using requests, the same thing happens (no errors in the script, but it doesn't vote up). The code I used is this:
cool = {
    'authority': 'slickdeals.net',
    'method': 'POST',
    'path': '/forums/sdthreadrate_ajax.php',
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en,en-US;q=0.9',
    'origin': 'https://slickdeals.net',
    'referer': link,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
vote_up_payload = {
    'ajax': '1',
    'do': 'sdthreadratevote',
    'postid': postid,
    'vote': '1',
    'votetypeid': '1',
    'controltype': 'modern',
    'securitytoken': sec_token,
    'where_from': '/forums/sdthreadrate_ajax.php',
}
voteu = s.post('https://slickdeals.net/forums/sdthreadrate_ajax.php',
               headers=cool, data=vote_up_payload)
print('done')
So what is the problem? Why does the script run but not do what it should?
[EDIT]: OK, the Instagram issue makes sense. Any reason why the Slickdeals one doesn't work?
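A useful first step when a request "succeeds" silently is to look at what the server actually said; the request finishing without an exception only means the HTTP exchange completed. A sketch, using the variables from the code above:

# A 200 with an error body, a redirect to a login page, or a 403 all
# "run without errors" here -- the response itself usually names the problem.
print(voteu.status_code)
print(voteu.text[:500])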
I have the simplest POST request code:
import requests

headers = {
    'origin': 'https://jet.com',
    'accept-encoding': 'gzip, deflate, br',
    'x-csrf-token': 'IzaENk9W-Xzv9I5NcCJtIf9h_nT24p5fU-Tk',
    'jet-referer': '/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'x-requested-with': 'XMLHttpRequest',
    'accept-language': 'en-US,en;q=0.8',
'cookie': 'akacd_phased_release=3673158615~rv=53~id=041cdc832c1ee67c7be18df3f637ad43; jet.csrf=_JKKPyR5fKD-cPDGmGv8AJk5; jid=7292a61d-af8f-4d6f-a339-7f62afead9a0; jet-phaser=%7B%22experiments%22%3A%5B%7B%22variant%22%3A%22a%22%2C%22version%22%3A1%2C%22id%22%3A%22a_a_test16%22%7D%2C%7B%22variant%22%3A%22slp_categories%22%2C%22version%22%3A1%2C%22id%22%3A%22slp_categories%22%7D%2C%7B%22variant%22%3A%22on_cat_nav_clicked%22%2C%22version%22%3A1%2C%22id%22%3A%22catnav_load%22%7D%2C%7B%22variant%22%3A%22zipcode_table%22%2C%22version%22%3A1%2C%22id%22%3A%22zipcode_table%22%7D%5D%2C%22id%22%3A%222982c0e7-287e-42bb-8858-564332ada868%22%7D; ak_bmsc=746D16A88CE3AE7088B0CD38DB850B694F8C5E56B1650000DAA82659A1D56252~plJIR8hXtAZjTSjYEr3IIpW0tW+u0nQ9IrXdfV5GjSfmXed7+tD65YJOVp5Vg0vdSqkzseD0yUZUQkGErBjGxwmozzj5VjhJks1AYDABrb2mFO6QqZyObX99GucJA834gIYo6/8QDIhWMK1uFvgOZrFa3SogxRuT5MBtC8QBA1YPOlK37Ecu1WRsE2nh55E24F0mFDx5hXcfBAhWdMne6NrQ88JE9ZDxjW5n8qsh+QAHo=; _sdsat_landing_page=https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ|1495705823651; _sdsat_session_count=1; AMCVS_A7EE579F557F617B7F000101%40AdobeOrg=1; AMCV_A7EE579F557F617B7F000101%40AdobeOrg=-227196251%7CMCIDTS%7C17312%7CMCMID%7C11996417004070294145733272597342763775%7CMCAID%7CNONE%7CMCAAMLH-1496310624%7C3%7CMCAAMB-1496310625%7Chmk_Lq6TPIBMW925SPhw3Q%7CMCOPTOUT-1495713041s%7CNONE; __qca=P0-949691368-1495705852397; mm_gens=Rollout%20SO123%20-%20PDP%20Grid%20Image%7Ctitle%7Chide%7Cattr%7Chide%7Cprice%7Chide~SO19712%20HP%20Rec%20View%7Clast_viewed%7Cimage-only~SO17648%20-%20PLA%20PDP%7Cdesc%7CDefault%7Cbuybox%7Cmodal%7Cexp_cart%7Chide-cart%7Ctop_caro%7CDefault; jcmp_productSku=882b1010309d48048b8f3151ddccb3cf; _sdsat_all_pages_canary_variants=a_a_test16:a|slp_categories:slp_categories|catnav_load:on_cat_nav_clicked|zipcode_table:zipcode_table; _sdsat_all_pages_native_pay_eligible=No; _uetsid=_uet6ed8c6ab; _tq_id.TV-098163-1.3372=ef52068e069c26b9.1495705843.0.1495705884..; _ga=GA1.2.789964406.1495705830; _gid=GA1.2.1682210002.1495705884; s_cc=true; __pr.NaN=6jvgorz8tb; mm-so17648=gen; __pr.11xw=xqez1m3cvl; _sdsat_all_pages_login_status=logged-out; _sdsat_jid_cookie=7292a61d-af8f-4d6f-a339-7f62afead9a0; _sdsat_phaser_id=2982c0e7-287e-42bb-8858-564332ada868; _sdsat_all_pages_jet_platform=desktop; _sdsat_all_pages_site_version=3.860.1495036770896|2017-05-16 20:35:36 UTC; _sdsat_all_pages_canary_variants_2=a_a_test16:a~slp_categories:slp_categories~catnav_load:on_cat_nav_clicked~zipcode_table:zipcode_table; jet=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6WyJmMmUwMjI1NS1iODFkLTRlOTktOGU1Yi0yZGI1MjU0ZTdjNzUiXSwiamNtcEhpc3RvcnkiOltbXV0sImlwWmlwY29kZSI6WyIyMTA2MSJdLCJjbGllbnRUaWNrZXQiOlsiZXlKMGVYQWlPaUpLVjFRaUxDSmhiR2NpT2lKSVV6STFOaUo5LmV5SmpiR2xsYm5SZmFXUWlPaUl3Tm1JMlkyTTNaVGRtTnpVME16TmhPREU0T0RjelpUWmpZMkV4WTJRelppSXNJbWx6Y3lJNkltcGxkQzVqYjIwaUxDSmhkV1FpT2lKM1pXSmpiR2xsYm5RaWZRLnlKMXdoYklDVml4TE1iblliV0xQY1RvdF9EWUo3MjFYQkdFMzBpUktpdTQiXSwicHJvbW9jb2RlIjpbIlNQUklORzE1Il0sInBsYSI6W3RydWVdLCJmcmVlU2hpcHBpbmciOltmYWxzZV0sImpjbXAiOlt7ImpjbXAiOiJwbGE6Z2dsOm5qX2R1cl9nZW5fcGF0aW9fX2dhcmRlbl9hMjpwYXRpb19fZ2FyZGVuX2dyaWxsc19fb3V0ZG9vcl9jb29raW5nX2dyaWxsX2NvdmVyc19hMjpuYTpwbGFfNzg0NzQ0NTQyXzQwNTY4Mzg3NzA2X3BsYS0yOTM2MjcyMDMzNDE6bmE6bmE6bmE6Mjo4ODJiMTAxMDMwOWQ0ODA0OGI4ZjMxNTFkZGNjYjNjZiIsImNvZGUiOiJQTEExNSIsInNrdSI6Ijg4MmIxMDEwMzA5ZDQ4MDQ4YjhmMzE1MWRkY2NiM2NmIn1dLCJpYXQiOjE0OTU3MDU4OTh9.6OEM9e9fTyUZdFGju19da4rEnFh8kPyg8wENmKyhYgc; 
bm_sv=360FA6B793BB42A17F395D08A2D90484~BLAlpOUET7ALPzcGziB9dbZNvjFjG3XLQPFGCRTk+2bnO/ivK7G+kOe1WXpHgIFmyZhniWIzp2MpGel1xHNmiYg0QOLNqourdIffulr2J9tzacGPmXXhD6ieNGp9PAeTqVMi+2kSccO1+JzO+CaGFw==; s_tps=30; s_pvs=173; mmapi.p.pd=%221759837076%7CDwAAAApVAgDxP2Qu1Q4AARAAAUJz0Q1JAQAmoW6kU6PUSKeaIXVTo9RIAAAAAP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FAAZEaXJlY3QB1Q4BAAAAAAAAAAAAt8wAAH0vAQC3zAAABQDZlQAAAmpAfP%2FVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc9dAQCNFgIAADyXAABuDVKACdUOAP%2F%2F%2F%2F8B1Q7VDv%2F%2FCgAAAQAAAAABs2ABANweAgAAiY0AAMCzlXtx1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8GAAABAAAAAAORSwEATPoBAJJLAQBO%2BgEAk0sBAFD6AQABt8wAAAYAAADYlQAAHMPK3ZbVDgD%2F%2F%2F%2F%2FAdUO1Q7%2F%2FwYAAAEAAAAAAc5dAQCJFgIAAbfMAAAGAAAAmpgAAFAf9YUU1Q4A%2F%2F%2F%2F%2FwHVDtUO%2F%2F8EAAABAAAAAAR0YwEA1R4CAHVjAQDWHgIAdmMBANgeAgB3YwEA2x4CAAG3zAAABAAAAAAAAAAAAUU%3D%22; mmapi.p.srv=%22fravwcgus04%22; mmapi.e.PLA=%22true%22; mmapi.p.uat=%7B%22PLATraffic%22%3A%22true%22%7D; _sdsat_lt_pages_viewed=6; _sdsat_pages_viewed=6; _sdsat_traffic_source=',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'content-type': 'application/json',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'referer': 'https://jet.com/product/detail/87e89b3ce17f4742ab6d72aeaaa5480d?gclid=CPzS982CgdMCFcS1wAodABwIOQ',
    'authority': 'jet.com',
    'dnt': '1',
}

data = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
r = requests.post('https://jet.com/api/product/v2', headers=headers, data=data)
print(r)
It returns 200.
And I want to convert this simple request to a Scrapy Request:

body = '{"zipcode":"21061","sku":"87e89b3ce17f4742ab6d72aeaaa5480d","origination":"PDP"}'
yield Request(url='https://jet.com/api/product/v2', callback=self.parse_jet_page,
              meta={'data': data}, method="POST", body=body, headers=self.jet_headers)

It returns 400; it looks like the headers are being overwritten or something. Or is there a bug?
I guess the error is caused by the cookies.
By default, the "cookie" entry in your HTTP headers is overridden by a built-in downloader middleware, CookiesMiddleware. Scrapy expects you to pass cookies via Request.cookies.
If you do need to pass cookies directly in Request.headers (instead of using Request.cookies), you'll need to disable the built-in CookiesMiddleware. You can simply set COOKIES_ENABLED=False in the settings.
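A sketch of both options inside the spider (the cookie names and values below are placeholders parsed out of the original cookie string, and jet_headers_without_cookie is assumed to be the headers dict minus its 'cookie' entry):

# Option 1: let Scrapy manage cookies via Request.cookies
yield Request(
    url='https://jet.com/api/product/v2',
    method='POST',
    body=body,
    headers=self.jet_headers_without_cookie,
    cookies={'jid': '...', 'jet.csrf': '...'},
    callback=self.parse_jet_page,
)

# Option 2: keep the raw 'cookie' header and disable the middleware,
# by adding this line to settings.py:
# COOKIES_ENABLED = False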
I have been using Fiddler to inspect an HTTP POST that an application makes, and then trying to replicate that POST using requests in Python.
The link I am posting to: http://www.example.com/ws/for/250/buy
Now, in Fiddler I can clearly see the headers, which are easy to replicate using requests. However, when I look at the TextView tab in Fiddler, I see this:
tuples=4,421&flows=undefined
To replicate that, I think I need to use the data parameter, which I found in the docs, but I am not sure how to write this in Python. Do I pass it as a dictionary, split up according to the & sign, or do I have to specify it as a string, etc.?
My current code:
url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False)
Something like

url = 'http://www.example.com/ws/for/250/buy'
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1003.1 Safari/535.19 Awesomium/1.7.1',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate',
    'Accept-Language': 'en',
    'Accept-Charset': 'iso-8859-1,*,utf-8',
}
r6 = requests.post(url, headers=headers, verify=False,
                   data={'tuples': '4,421', 'flows': 'undefined'})

should work.
You provide a dictionary to the data argument:
r = requests.post(url, data={'tuples': '4,421', 'flows': 'undefined'})
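If you want to confirm that the dictionary really produces the body you saw in Fiddler, you can inspect the prepared request before sending it; a small sketch (note that requests percent-encodes the comma):

import requests

url = 'http://www.example.com/ws/for/250/buy'
req = requests.Request('POST', url, data={'tuples': '4,421', 'flows': 'undefined'})
prepared = req.prepare()
print(prepared.body)                     # tuples=4%2C421&flows=undefined
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded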