I tried to create a program that will create an account on this site: https://www.kostkuj.cz/register
My main problem is that I don't really know how it works, so I rebuilt an existing project to fit my requirements.
I also tried sending requests with the login data as parsed text:
email: "email#gmail.com"
plainPassword: {first: "pass1", second: "pass1"}
first: "pass1"
second: "pass1"
username: "username3"
But I don't know what I am doing wrong.
This is my whole code:
import requests
headers = {
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 OPR/63.0.3368.71'
}
register_data = {
{"username":"username3","email":"email#gmail.com","plainPassword":{"first":"pass1","second":"pass1"}}
}
with requests.Session() as s:
    url = 'https://api.kostkuj.cz/register'
    r = s.get(url, headers=headers)
    r = s.post(url, data=register_data, headers=headers)
    print(r.content)
I am getting an error like:
Traceback (most recent call last):
File "C:\Users\jiris\Desktop\spamkostkuj.py", line 9, in <module>
{"username":"username3","email":"email#gmail.com","plainPassword":{"first":"pass1","second":"pass1"}}
TypeError: unhashable type: 'dict'
First, headers needs to be a dict of key-value pairs. In your case, something like:
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36 OPR/63.0.3368.71'
}
Secondly, you are wrapping the dict in an extra pair of braces, which makes it a set containing a dict; dicts are unhashable, hence the TypeError. Send a plain dict instead:
register_data = {"username":"username3","email":"email#gmail.com","plainPassword":{"first":"pass1","second":"pass1"}}
Try this. Hopefully it will work for you.
Also, since plainPassword is a nested dict, send the payload as JSON rather than form data:
r = s.post(url, json=register_data, headers=headers)
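To see why data= silently mangles the nested dict while json= preserves it, you can inspect a PreparedRequest without sending anything (example.com below is just a placeholder URL):

```python
import json
import requests

register_data = {"username": "username3", "email": "email#gmail.com",
                 "plainPassword": {"first": "pass1", "second": "pass1"}}

# data= form-encodes the payload: nested dict *values* are dropped,
# only the inner keys survive
form = requests.Request('POST', 'https://example.com', data=register_data).prepare()
print(form.body)   # ...&plainPassword=first&plainPassword=second

# json= serializes the whole structure and sets the Content-Type header
js = requests.Request('POST', 'https://example.com', json=register_data).prepare()
print(js.body)
print(js.headers['Content-Type'])  # application/json
```

So with json= the server receives the passwords; with data= it never does, which alone can make the registration fail.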
I would like to scrape tabular content from the landing page of this website. There are 100 rows on its first page. When I observe network activity in dev tools, I can see a GET request being issued to this url https://io6.dexscreener.io/u/ws3/screener3/ with appropriate parameters, which ends up producing JSON content.
However, when I try to mimic that request with the following:
import requests
url = 'https://io6.dexscreener.io/u/ws3/screener3/'
params = {
'EIO': '4',
'transport': 'polling',
't': 'NwYSrFK',
'sid': 'ztAOHWOb-1ulTq-0AQwi',
}
headers = {
'accept': '*/*',
'referer': 'https://dexscreener.com/',
'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(url, params=params)
    print(res.content)
I get this response:
`{"code":3,"message":"Bad request"}`
How can I get response having tabular content from that webpage?
Here is a very quick and dirty piece of Python code that does the initial handshake, sets up the websocket connection, and downloads the data in JSON format indefinitely. I haven't tested this code extensively and I am not sure exactly which handshake steps are necessary, but I have mimicked the browser behaviour and it seems to work fine:
import requests
from websocket import create_connection
import json
s = requests.Session()
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://dexscreener.com/ethereum'
resp = s.get(url,headers=headers)
print(resp)
step1 = s.get('https://io3.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Os')
step2 = s.get('https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-S5')
obj = json.loads(step2.text[1:])
code = obj['sid']
payload = '40/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'
step3 = s.post(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xt&sid={code}',data=payload)
step4 = s.get(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xu&sid={code}')
d = step4.text.replace('','').replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,','').replace(payload,'')
start = '["screener",'
end = ']["latestBlock",'
dirty = d[d.find(start)+len(start):d.rfind(end)].strip()
clean = json.loads(dirty)
print(clean)
# Initialize the headers needed for the websocket connection
headers = json.dumps({
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
'Cache-Control':'no-cache',
'Connection':'Upgrade',
'Host':'io3.dexscreener.io',
'Origin':'https://dexscreener.com',
'Pragma':'no-cache',
'Sec-WebSocket-Extensions':'permessage-deflate; client_max_window_bits',
'Sec-WebSocket-Key':'ssklBDKxAOUt3D47SoEttQ==',
'Sec-WebSocket-Version':'13',
'Upgrade':'websocket',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
})
# Then create a connection to the tunnel
ws = create_connection(f"wss://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=websocket&sid={code}",headers=headers)
# Then send the initial messages through the tunnel
ws.send('2probe')
ws.send('5')
# Here you will view the message return from the tunnel
while True:
    try:
        json_data = json.loads(ws.recv().replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,', ''))
        print(json_data)
    except Exception:
        # non-JSON keep-alive frames (e.g. '2' pings) land here; ignore them
        pass
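For context, the step2.text[1:] slice works because Engine.IO polling responses prefix the JSON body with a packet-type digit (0 is the "open" packet that carries the sid). A sketch of that parse, using a made-up payload (the sid value here is invented):

```python
import json

# an example Engine.IO v4 "open" packet, as returned by the first
# polling request: a packet-type digit followed by a JSON body
frame = '0{"sid":"abc123","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":20000}'

packet_type, body = frame[0], frame[1:]   # this is what step2.text[1:] strips
obj = json.loads(body)
print(packet_type)   # '0' -> "open" packet
print(obj['sid'])    # the session id reused in every later request
```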
Can someone help me solve the problem in this code?
CODE:
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
s = BeautifulSoup(resp.content, features='lxml')
product_title = s.select("#productTitle")[0].get_text().strip()
print(product_title)
If you print what you get as the response, you will see why you run into this error.
import requests
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
url = "https://www.amazon.com/RUNMUS-Surround-Canceling-Compatible-Controller/dp/B07GRM747Y"
resp = requests.get(url, headers=headers)
print(resp.content)
The output you are getting from this request:
b'<!--\n To discuss automated access to Amazon data please contact api-services-support#amazon.com.\n For information about migrating to our APIs refer to our Marketplace APIs...
The site you are sending requests to is not allowing you to access the content with the provided headers. So s.select("#productTitle") returns an empty list, and therefore you are getting an index error.
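A guard against the empty select result makes the failure mode obvious. Here is a small sketch using a stand-in snippet for the blocked page (no network involved):

```python
from bs4 import BeautifulSoup

# a stand-in for the bot-check page Amazon returns: it has no #productTitle node
blocked_html = "<html><body><p>To discuss automated access to Amazon data...</p></body></html>"
soup = BeautifulSoup(blocked_html, "html.parser")

nodes = soup.select("#productTitle")
print(nodes)      # [] -- indexing nodes[0] here is the IndexError from the question
title = nodes[0].get_text().strip() if nodes else None
print(title)      # None instead of a crash
```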
I am new to python and trying to build a data scraper.
When trying to run:
schedule_data = requests.get('https://nsmmhl.goalline.ca/schedule.php?league_id=264&from_date=2019-09-12&to_date=2020-02-02')
I get:
<bound method Response.raise_for_status of <Response [403]>>
Can anybody tell me why? The link works fine.
Set the headers of the request:
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/80.0.3987.149 Safari/537.36'
}
schedule_data = requests.get(
    'https://nsmmhl.goalline.ca/schedule.php?league_id=264&from_date=2019-09-12&to_date=2020-02-02',
    headers=headers
)
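As an aside, the `<bound method Response.raise_for_status ...>` output in the question means raise_for_status was printed without being called. You can see the difference without any network traffic by building a Response by hand (the status code and URL here just mirror the question):

```python
import requests

resp = requests.models.Response()
resp.status_code = 403
resp.url = 'https://nsmmhl.goalline.ca/schedule.php'

print(resp.raise_for_status)      # the bound method object, as printed in the question

try:
    resp.raise_for_status()       # actually *calling* it checks the status
    failed = None
except requests.HTTPError as err: # 4xx/5xx codes raise HTTPError
    failed = err
print('request failed:', failed)
```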
I want to log in to douban.com with a Python session:
import requests
url = 'https://www.douban.com/'
logurl = 'https://accounts.douban.com/passport/login_popup'
data = {'username': 'abc#gmail.com',
'password': 'abcdef'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'}
se = requests.session()
request = se.post(logurl, data=data, headers=headers)
request1 = se.get(url)
print(request1.content)
This displays "b''", and I can't tell whether it worked or not!
You're getting an empty response, meaning the request is not working properly. You will want to debug further by inspecting the response; check the Requests library documentation. request1.status_code and request1.headers might interest you.
The b'' is just an empty bytes literal; the b prefix is described in the Python 3 documentation.
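To illustrate, the b prefix only marks a bytes literal, so b'' is simply an empty byte string, not an error in itself:

```python
raw = b''                # what print(request1.content) showed
print(type(raw))         # <class 'bytes'>
print(len(raw))          # 0 -- the response body really is empty
print(raw.decode())      # '' -- decodes to an empty str, so nothing came back
```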
I've seen some similar threads, but none gave me the answer. I simply need to get the HTML content from one website. I'm sending a POST request with the data for a particular case, and then with a GET request I want to scrape the text from the HTML. The problem is that I always receive the first page's content. I'm not sure what I am doing wrong.
import requests
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
'Connection':'keep-alive',
'Content-Type':'application/x-www-form-urlencoded',
'Origin':'https://przegladarka-ekw.ms.gov.pl',
'Referer':'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
data = {
'kodWydzialu':'PT1R',
'nrKw':'00037314',
'cyfraK':'9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r = requests.session()
r.post(url, data=data, headers=headers)
final_content = r.get(url, headers=headers)
print(final_content.text)
The GET requests come from https://przegladarka-ekw.ms.gov.pl/eukw_prz/eukw201906070952/js/jquery-1.11.0_min.js, but that returns a wall of code. My goal is to scrape the page that appears after submitting the data above through the search form.
Try this:
import json
import urllib.request
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
'Connection':'keep-alive',
'Content-Type':'application/x-www-form-urlencoded',
'Origin':'https://przegladarka-ekw.ms.gov.pl',
'Referer':'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
data = {
'kodWydzialu':'PT1R',
'nrKw':'00037314',
'cyfraK':'9',
}
url = 'https://przegladarka-ekw.ms.gov.pl/eukw_prz/KsiegiWieczyste/wyszukiwanieKW'
r = urllib.request.urlopen(url, data=bytes(json.dumps(data), encoding="utf-8"))
final_content = r
for i in r:
    print(i)
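Note that the question's headers declare Content-Type: application/x-www-form-urlencoded, while this snippet posts a JSON body. Form-encoding the payload with urllib.parse.urlencode may be closer to what the server expects; a sketch of that encoding step (untested against this endpoint):

```python
from urllib.parse import urlencode

data = {'kodWydzialu': 'PT1R', 'nrKw': '00037314', 'cyfraK': '9'}
body = urlencode(data).encode('utf-8')   # bytes suitable for urlopen's data=
print(body)   # b'kodWydzialu=PT1R&nrKw=00037314&cyfraK=9'
```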