Polldaddy - PD_buttonXXXXX.className='pds-vote-button';alert("This poll did not load properly."); [closed]

Working on a project to create a bot to vote in a school contest.
REQUEST:
https://polls.polldaddy.com/vote-js.php?va=50&pt=0&r=0&p=XXXX&a=YYYYY%2C&o=&t=24136&token=e987a94442b462982294c5a918bb69d6&pz=181
HEADER:
{
    'Authority': 'polls.polldaddy.fm',
    'method': 'GET',
    'path': '/vote-js.php?va=50&pt=0&r=0&p=xxxxxx&a=YYYY%2C&o=&t=24136&token=e987a94442b462982294c5a918bb69d6&pz=181',
    'scheme': 'https',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15',
    'referer': 'https://poll.fm/XXXX',
    'Upgrade-Insecure-Requests': '1',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8'
}
RESULT:
PD_buttonXXXXX.className='pds-vote-button';alert("This poll did not load properly.");
Did anyone have the same problem? Were you able to get around this issue?
I'm getting a 200 OK, but my vote is not being processed.
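For reference, a minimal sketch of what this request looks like when sent through requests (the poll id, answer id, and token below are the placeholders from the question, not working values; the token is presumably single-use and tied to the page load that generated it):
import requests

# Placeholders (XXXX, YYYYY) copied from the question; the token likely
# has to come from a fresh load of the poll page.
params = {
    'va': '50', 'pt': '0', 'r': '0',
    'p': 'XXXX',       # poll id
    'a': 'YYYYY,',     # chosen answer id(s)
    'o': '', 't': '24136',
    'token': 'e987a94442b462982294c5a918bb69d6',
    'pz': '181',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.15) '
                  'Gecko/20080623 Firefox/2.0.0.15',
    'Referer': 'https://poll.fm/XXXX',
}
r = requests.get('https://polls.polldaddy.com/vote-js.php',
                 params=params, headers=headers)
print(r.status_code, r.text)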

Related

How can I fix the 400 status code with this data? [closed]

import requests

kikikicz_post = 'https://wtb.kikikickz.com/v1/integrations/airtable/b9586bc6-4151-4c84-a65f-b2d3443c928f/appZLS7at5DuMRxBe/WTB Softr/records?block_id=89e7021d-8d6d-434a-8803-7f64e519831f'
headers2 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'Content-Type': 'application/json; charset=utf-8'
}
data2 = {
    "page_size": 100,
    "view": "Grid view",
    "filter_by_formula": "OR(SEARCH(\"dz4709-001\", LOWER(ARRAYJOIN(dz4709-001))),SEARCH(\"dz4709-001\", LOWER(ARRAYJOIN(dz4709-001))),SEARCH(\"dz4709-001\", LOWER(ARRAYJOIN(Nike))))",
    "sort_resources": [
        {
            "field": "Nom",
            "direction": "asc"
        }
    ],
    "rows": 0,
    "airtable_response_formatting": {
        "format": "string"
    }
}
session = requests.Session()
res2 = session.post(kikikicz_post, json=data2, headers=headers2)
print(res2)
I am trying to make a POST request but keep getting a 400 error. I tried changing the payload, but the result is the same. Should I change something?
I can't find any API documentation. It's Airtable; I just searched for an item and found this POST request with its response.
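One way to narrow down a 400 like this is to print the response body rather than just the Response object; the server's validation message usually says what it rejected. (As a hedged guess: in Airtable formulas, field names containing hyphens usually have to be wrapped in curly braces, e.g. ARRAYJOIN({dz4709-001}).) Continuing the snippet above:
# Inspect the body, not just the status: a 400 body usually names the
# field or formula the server rejected.
print(res2.status_code)
print(res2.text)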

Error status code 403 even with headers, Python Requests

I am sending a request to some URL. I copied the cURL command into a curl-to-Python converter tool, so all the headers are included, but my request is not working: I receive status code 403 when printing, and error code 1020 in the HTML output. The code is:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get('https://v2.gcchmc.org/book-appointment/', headers=headers)
print(response.status_code)
print(response.cookies.get_dict())

with open("test.html", 'w') as f:
    f.write(response.text)
I also get cookies, but not the desired response. I know I can do it with Selenium, but I want to know the reason behind this. Thanks in advance.
Note:
I have installed all the libraries that requests depends on, at the same versions as on my computer, and it is still not working and throwing a 403 error.
The site is protected by Cloudflare, which aims to block, among other things, unauthorized data scraping. From What is data scraping?:

The process of web scraping is fairly simple, though the implementation can be complex. Web scraping occurs in 3 steps:

1. First the piece of code used to pull the information, which we call a scraper bot, sends an HTTP GET request to a specific website.
2. When the website responds, the scraper parses the HTML document for a specific pattern of data.
3. Once the data is extracted, it is converted into whatever specific format the scraper bot's author designed.
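As a minimal illustration of those three steps (the URL and the pattern below are arbitrary examples, not related to the protected site):
import re
import requests

# Step 1: the scraper bot sends an HTTP GET request to a specific website.
resp = requests.get('https://example.com/')

# Step 2: parse the HTML document for a specific pattern of data.
titles = re.findall(r'<h1>(.*?)</h1>', resp.text)

# Step 3: convert the extracted data into the designed format.
print({'titles': titles})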
You can use urllib instead of requests; it seems to be able to deal with Cloudflare:
import urllib.request

req = urllib.request.Request('https://v2.gcchmc.org/book-appointment/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
req.add_header('Accept-Language', 'en-US,en;q=0.5')

r = urllib.request.urlopen(req).read().decode('utf-8')
with open("test.html", 'w', encoding="utf-8") as f:
    f.write(r)
It works on my machine, so I am not sure what the problem is.
However, when a request does not work, I often check whether it works with Playwright. Playwright uses a browser driver and thus mimics your actual browser when visiting the page. It can be installed using pip install playwright. The first time you try it, it may give an error telling you to install the drivers; just follow the instructions to do so.
With Playwright you can try the following:
from playwright.sync_api import sync_playwright

url = 'https://v2.gcchmc.org/book-appointment/'
ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/69.0.3497.100 Safari/537.36"
)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(user_agent=ua)
    page.goto(url)
    page.wait_for_timeout(1000)
    html = page.content()

print(html)
A downside of Playwright is that it requires installing the Chromium (or another) browser. This may complicate deployment, since the browser cannot simply be added to requirements.txt and a container image may be required.
Try running Burp Suite's proxy to see all the headers and other data, like cookies. Then you can mimic the request from Python. That's what I always do.
Good luck!
Had the same problem recently.
Using the JavaScript fetch API with Selenium-Profiles worked for me.
Example JS:
fetch('http://example.com/movies.json')
    .then((response) => response.json())
    .then((data) => console.log(data));
Example Python with Selenium-Profiles:
import json

# "profile" and "driver" come from the Selenium-Profiles setup.
headers = {
    "accept": "application/json",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": profile["cdp"]["useragent"]["acceptLanguage"],
    "content-type": "application/json",
    # "cookie": cookie_str,  # optional
    "sec-ch-ua": "'Google Chrome';v='107', 'Chromium';v='107', 'Not=A?Brand';v='24'",
    "sec-ch-ua-mobile": "?0",  # "?1" for mobile
    "sec-ch-ua-platform": "'" + profile['cdp']['useragent']['userAgentMetadata']['platform'] + "'",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "user-agent": profile['cdp']['useragent']['userAgent']
}

answer = driver.requests.fetch(
    "https://www.example.com/",
    options={
        "body": json.dumps(post_data),
        "headers": headers,
        "method": "POST",
        "mode": "same-origin"
    }
)
I don't know why this occurs, but I assume Cloudflare and others are able to detect whether a request is made with JavaScript.

Downloading an image with requests does not work correctly

I have a question.
Is it possible to download an image that is currently shown on a website through requests, but without using its URL?
In my case it does not work, because the image on the website differs from the one under the link I am downloading from: the image changes every time the link is requested. I want to download exactly what's on the page so I can copy the code from it.
Previously I used Selenium and its screenshot option for this, but I have already rewritten all the code to requests, and this is the only piece missing.
Does anyone have an idea how to download the photo that is currently on the site?
Below is the code with links:
import requests
from requests_html import HTMLSession

headers = {
    'Content-Type': 'image/png',
    'Host': 'www.oglaszamy24.pl',
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-GPC': '1',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://www.oglaszamy24.pl/dodaj-ogloszenie2.php?c1=8&c2=40&at=1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7'
}

session = HTMLSession()
r = session.get('https://www.oglaszamy24.pl/dodaj-ogloszenie2.php?c1=8&c2=40&at=1')
r.html.render(sleep=2, timeout=20)
links = r.html.find("#captcha_img")
result = str(links)
results = result.split("src=")[1].split("'")[1]
resultss = "https://www.oglaszamy24.pl/" + results
with open('image.png', 'wb') as f:
    f.write(requests.get(resultss, headers=headers).content)
I'd rather use PIL (Python Imaging Library) and take a screenshot of the element's bounding box (coordinates and size). You can get those with libraries like BS4 (BeautifulSoup) or Selenium.
Then you'd have a local copy (screenshot) of exactly what the user would see.
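A rough sketch of that approach (an assumption, not tested against this site; Selenium 4 can screenshot an element's bounding box directly, so the PIL cropping step can even be skipped):
from selenium import webdriver
from selenium.webdriver.common.by import By

# Render the page in a real browser, then save just the captcha element;
# element.screenshot() crops to the element's bounding box (Selenium 4).
driver = webdriver.Chrome()
driver.get('https://www.oglaszamy24.pl/dodaj-ogloszenie2.php?c1=8&c2=40&at=1')
driver.find_element(By.ID, 'captcha_img').screenshot('image.png')
driver.quit()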
A lot of sites have protection against scrapers, and captcha services usually do not allow their resources to be downloaded, either via requests or otherwise.
But as that NFT joke goes: you don't download a screenshot...

Python - Is there a way to bypass illegal characters in dictionary keys?

I was wondering if there is a way to bypass illegal characters in dictionary keys.
I am making a "Famous Birthdays" boost faker, essentially a bot that keeps hitting the Famous Birthdays API to make it look like someone boosted the profile the person chooses. I pulled the API call from clicking the "Boost" button while watching the Network tab in the browser's inspector.
Here are the dictionary keys in question:
':authority': 'www.famousbirthdays.com',
':method': 'POST',
':path': '/api/people/boost',
':scheme': 'https',
When I run my program, it throws an error telling me that I need to remove the illegal characters in question:
requests.exceptions.InvalidHeader:
Invalid leading whitespace, reserved character(s), or return character(s) in header name: ':authority'
Is there any way to bypass this? The illegal characters cannot be removed, because they pertain to the headers, and if I change the headers, it will break the program.
EDIT: Here is my code so it is easier to solve the issue:
# https://www.famousbirthdays.com/api/people/boost
import requests
from hyper.contrib import HTTP20Adapter

def main():
    person = input('Insert page link (e.g kristian-ramey)\n')

    def getHeaders(person):
        global headers
        headers = {
            ':authority': 'www.famousbirthdays.com',
            ':method': 'POST',
            ':path': '/api/people/boost',
            ':scheme': 'https',
            'accept': '*/*',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'en-US,en;q=0.9,es;q=0.8,fr;q=0.7',
            'content-length': '23',
            'content-type': 'application/x-www-form-urlencoded',
            'cookie': '__aaxsc=1; _ga=GA1.2.1274875939.1657675345; _gid=GA1.2.109078920.1657675345; lookup=las+vegas; XSRF-TOKEN=eyJpdiI6InZvNmdIXC9hSmhPRzZVelpRMVwvNUV4Zz09IiwidmFsdWUiOiJrbGJYZWloaUpjbnp4cVFCXC8xSndoSmxudmxLbjU5aG9cL2NCWW5qSWVWZ0lJZjVZbzY3ZWRwWlI3ZFZiTnJLZHYiLCJtYWMiOiJkYWViNjNlZmIxODc1ZTllOWM5OGEzMmM0ZTkxMGMzODA2ZTA4MDMyODQ1OWFjYzA4MzQyNDgzMmZjNjQ4ODY0In0%3D; laravel_session=eyJpdiI6Ino1QkR6TTBGaU1kSFBhSjZNMUhMWFE9PSIsInZhbHVlIjoicFwvaVh0TUpReWxmS0pJVzVsV3JzUUlBOGpUKzYySjlwcEdzcTN1ZTlLNUVTenpqazFuZkRsa2xBNWFRT0JRbnkiLCJtYWMiOiJlZDhlODBhMTVjOTYxYzRlZDAxM2JhMGZjYzVkMWE0NmY0NGQyOTkwMmIwNWJhMmNmYmEyYzc0MGVhYTU3OWYzIn0%3D; aasd=8%7C1657676628018',
            'origin': 'https://www.famousbirthdays.com',
            'referer': f'https://www.famousbirthdays.com/people/{person}.html',
            'sec-ch-ua': "\".Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"103\", \"Chromium\";v=\"103\"",
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': "\"Windows\"",
            'sec-fetch-dest': 'empty',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'same-origin',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
        }
        return headers

    session = requests.session()
    session.mount('http://www.famousbirthdays.com/api/people/boost', HTTP20Adapter())
    r = session.post('http://www.famousbirthdays.com/api/people/boost', headers=getHeaders(person))
    print(r.text)
    print(r.status_code)

main()
Just use a dict comprehension and the lstrip() method:
raw_dict = {':authority': 'www.famousbirthdays.com', ':method': 'POST'}
good_dict = {key.lstrip(":"): value for key, value in raw_dict.items()}
print(good_dict)
The output is:
{'authority': 'www.famousbirthdays.com', 'method': 'POST'}
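Note that the ':'-prefixed names are HTTP/2 pseudo-headers, and requests speaks HTTP/1.1, where they have no meaning, so rather than renaming them you can simply drop them (a small follow-up sketch):
# HTTP/2 pseudo-headers start with ':'; keep only real header names.
raw_dict = {':authority': 'www.famousbirthdays.com', ':method': 'POST',
            'accept': '*/*'}
filtered = {key: value for key, value in raw_dict.items()
            if not key.startswith(':')}
print(filtered)  # {'accept': '*/*'}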

Getting 403 with python requests

I have a scraper that worked without an issue for 18 months until today. Now I get a 403 response from hltv.org and don't seem to be able to fix the issue. My code is below, so the answer is not the usual "just add headers". If I print response.text, it says something about captchas. So I assume I'd have to bypass the captcha, or my IP is blocked? Please help :)
import requests

url = 'https://www.hltv.org/matches'
headers = {
    "Accept-Language": "en-US,en;q=0.5",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Referer": "http://thewebsite.com",
    "Connection": "keep-alive"
}
response = requests.get(url, headers=headers)
print(response)
EDIT: This remains a mystery to me, but today my code started working again on my main PC. Did not make any changes to the code.
KokoseiJ could not reproduce the problem, but Booboo did. The code also worked on my old PC, which I dug from storage, but not on my main PC. Anyways, thanks to all who tried to help me with this issue.
I am posting this not as a solution but as something that did not work, but may be useful information.
I went to https://www.hltv.org/matches then brought up Chrome's Inspector and reloaded the page and looked at the request headers Chrome (supposedly) used for the GET request. Some of the header names began with a ':', which requests considers illegal. But looking around Stack Overflow, I found a way to get around that (supposedly for Python 3.7 and greater). See the accepted answer and comments here for details.
This still resulted in a 403 error. Perhaps somebody might spot an error in this (or not).
These were the headers shown by the Inspector:
:authority: www.hltv.org
:method: GET
:path: /matches
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
cookie: MatchFilter={%22active%22:false%2C%22live%22:false%2C%22stars%22:1%2C%22lan%22:false%2C%22teams%22:[]}
dnt: 1
pragma: no-cache
sec-ch-ua: " Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36
And the code:
import requests
import http.client
import re

# Loosen http.client's header-name check so names beginning with ':' are accepted.
http.client._is_legal_header_name = re.compile(rb'\S[^:\r\n]*').fullmatch

url = 'https://www.hltv.org/matches'
headers = {
    ':authority': 'www.hltv.org',
    ':method': 'GET',
    ':path': '/matches',
    ':scheme': 'https',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'cookie': 'MatchFilter={%22active%22:false%2C%22live%22:false%2C%22stars%22:1%2C%22lan%22:false%2C%22teams%22:[]}',
    'dnt': '1',
    'pragma': 'no-cache',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.text)
print(response)
Also came across this issue recently.
My solution was using the js-fetch library (see answer).
I assume Cloudflare and others found some way to detect whether a request is made by a browser (JS) or by another programming language.
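For illustration, the idea boils down to running fetch() inside a real browser session, so the request carries the browser's TLS fingerprint and cookies. A sketch with plain Selenium (not the js-fetch library's actual API):
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.hltv.org/matches')

# Run fetch() in the page; Selenium passes a callback as the last argument
# of an async script, and we resolve it with the response text.
script = """
const done = arguments[arguments.length - 1];
fetch('/matches').then(r => r.text()).then(done);
"""
html = driver.execute_async_script(script)
print(html[:200])
driver.quit()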
