I want to parse product data from this page, but `requests.get` doesn't work. So I inspected the page's network traffic and found an interesting link:
I tried sending a POST request to this link with the correct form data, but in the response I only got {"message":"Expecting value (near 1:1)","status":400}
How can I get the correct product data from this page?
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Host": "cgrd9wlxe4-dsn.algolia.net",
    "Origin": "https://www.eprice.it",
    "Referer": "https://www.eprice.it/",
    "Content-Type": "application/x-www-form-urlencoded",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "sec-ch-ua": "Not A;Brand",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
}
form_data = {
    "requests": [
        {
            "indexName": "prd_products_suggest",
            "params": {
                "highlightPreTag": "<strong>",
                "highlightPostTag": "</strong>",
                "query": 6970995781939,
                "hitsPerPage": 36,
                "clickAnalytics": 1,
                "analyticsTags": ["main", "desktop"],
                "ruleContexts": ["ovr", "desktop", "t1"],
                "facetingAfterDistinct": 1,
                "getRankingInfo": 1,
                "page": 0,
                "maxValuesPerFacet": 10,
                "facets": ["manufacturer", "offer.price", "scegliPer", "offer.shopType",
                           "reviews.avgRatingInt",
                           "navigation.lvl0,navigation.lvl1,navigation.lvl2,navigation.lvl3"],
                "tagFilters": ""
            }
        },
        {
            "indexName": "prd_products_suggest_b",
            "params": {
                "query": 6970995781939,
                "hitsPerPage": 10,
                "clickAnalytics": 1,
                "analyticsTags": ["car_offerte_oggi", "desktop"],
                "ruleContexts": ["ovr", "car_offerte_oggi", "desktop"],
                "getRankingInfo": 1,
                "page": 0,
                "maxValuesPerFacet": 10,
                "minProximity": 2,
                "facetFilters": [],
                "facets": ["manufacturer", "offer.price", "scegliPer", "offer.shopType",
                           "reviews.avgRatingInt",
                           "navigation.lvl0,navigation.lvl1,navigation.lvl2,navigation.lvl3"],
                "tagFilters": ""
            }
        }
    ]
}
response = requests.post(
    url="https://cgrd9wlxe4-dsn.algolia.net/1/indexes/*/queries?"
        "x-algolia-agent=Algolia%20for%20JavaScript%20(4.11.0)%3B%20Browser%20(lite)&"
        "x-algolia-api-key=e9c9895532cb88b620f96f3e6617c00f&"
        "x-algolia-application-id=CGRD9WLXE4",
    headers=headers,
    data=form_data
)
print(response.text)
Algolia is a hosted search API: a retail company can index its product list into Algolia and then integrate its front end with it to query for products to display to customers.
When you inspect the page's network traffic, you're seeing the calls to this search API made by the owner of the website (the retail company), likely using the client you can download from Algolia directly.
I'm not sure why the form data isn't working; if you install the Python client, you'll find Algolia's own example of how to integrate with it. But the point of the context above is that the hosted search API accepts requests in different ways, so you can take the request body you have, set the Content-Type header to 'application/json', and you'll get a response.
(Screenshot of a working Postman call.)
The full API documentation - https://www.algolia.com/doc/rest-api/search/#search-index-post
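Building on the answer above, here is a minimal sketch of sending the body as JSON. The URL, API key, and index name are taken from the question; note that, per Algolia's REST API, each entry's `params` should be a single URL-encoded string rather than a nested object (the actual network call is left commented out):

```python
import json
import requests

url = ("https://cgrd9wlxe4-dsn.algolia.net/1/indexes/*/queries?"
       "x-algolia-api-key=e9c9895532cb88b620f96f3e6617c00f&"
       "x-algolia-application-id=CGRD9WLXE4")

# "params" is one URL-encoded string, not a dict, in Algolia's multi-query API
payload = {
    "requests": [
        {
            "indexName": "prd_products_suggest",
            "params": "query=6970995781939&hitsPerPage=36&page=0",
        }
    ]
}

body = json.dumps(payload)  # the endpoint expects a JSON body, not form data
# response = requests.post(url, data=body,
#                          headers={"Content-Type": "application/json"})
# print(response.json())
```

Passing `json=payload` instead of `data=form_data` does the same serialization and sets the Content-Type header for you.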
Related
I am a beginner at Python and web scraping, and I've run into an issue I can't solve. There is a site, tenchat.ru. Personal pages contain hidden information such as phone numbers and email addresses, which I tried to extract with GET and POST requests. Reloading the page with the Network tab of DevTools open, I see an XHR response named after the person ('venss') whose preview has fields like individualConnection: false and hiddenFields: ["phoneWithPrefixInnerSto", "contactPhone"]. It is a GET request. There is also a field 'current' whose preview shows my own data as JSON, and it likewise contains individualConnection: false. I guess that because this field is false, the server doesn't allow me to see the phone number.
Another issue is that the URLs shown in the headers always give me a 401 response. Only the URL https://tenchat.ru/venss gives a 200. The main address is https://tenchat.ru, but there is authentication via phone number and SMS.
Maybe it is not possible at all to get the data I want, but maybe there is a way I don't know about.
I can provide more info if needed.
Please advise.
I used requests, httpx and requests_html for sending POST and GET requests. I also used Insomnia to verify whether the JSON is available, and curlconverter.
My code:
import httpx

url = 'https://tenchat.ru/prisyazhnaya_o'
url1 = 'https://tenchat.ru/gostinder/api/web/auth/account/username/venss'
url2 = 'https://tenchat.ru/gostinder/api/web/auth/account'
params = {"cache-control": "no-cache, no-store, max-age=0, must-revalidate", "content-encoding": "gzip",
          "content-type": "application/json", "date": "Thu, 05 Jan 2023 09:11:25 GMT",
          "expires": 0, "pragma": "no-cache", "referrer-policy": "no-referrer", "server": "nginx",
          "strict-transport-security": "max-age=31536000", "vary": "Accept-Encoding",
          "x-content-type-options": "nosniff", "x-frame-options": "ACCEPT",
          "x-xss-protection": "1; mode=block"}
cookies = {
    '_ym_uid': '167240745648292064',
    '_ym_d': '1672407456',
    'tmr_lvid': 'bd33e7b65db07187d68680e428694394',
    'tmr_lvidTS': '1672407474003',
    '_ym_isad': '1',
    '_ym_visorc': 'w',
    'TCAF': '0pVpAvxaxLhSt1pjk_op4SZmcBs',
    'TCRF': 'OKF7KxB0Wz1l17BIMzPEopsIKPM',
    'SESSION': '225fb0ff-0503-4a86-b459-5ceab7fca181',
}
headers = {
    'authority': 'tenchat.ru',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    # 'cookie': '_ym_uid=167240745648292064; _ym_d=1672407456; tmr_lvid=bd33e7b65db07187d68680e428694394; tmr_lvidTS=1672407474003; _ym_isad=1; _ym_visorc=w; TCAF=0pVpAvxaxLhSt1pjk_op4SZmcBs; TCRF=OKF7KxB0Wz1l17BIMzPEopsIKPM; SESSION=225fb0ff-0503-4a86-b459-5ceab7fca181',
    'dnt': '1',
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
}
response = httpx.post(url, cookies=cookies, headers=headers)
print(response.status_code)
I'm getting stuck trying to pull P2P selling data from Binance using Python. Running the code below I can get the information from the BUY section, but I'm not able to see the information from the SELL section. Can you help me?
The following code runs fine, but it only shows the BUY section of Binance P2P. When I try to use this URL, for example (https://p2p.binance.com/es/trade/sell/BUSD?fiat=ARS&payment=ALL), nothing changes.
import json

import pandas as pd
import requests

url_2 = 'https://p2p.binance.com/bapi/c2c/v2/friendly/c2c/adv/search'
p2p = requests.get(url)  # url is the page URL defined earlier (not shown)
q = p2p.text
w = json.loads(q)
e = w['data']
df = pd.json_normalize(e)
df
To access the p2p data you need to POST to https://p2p.binance.com/bapi/c2c/v2/friendly/c2c/adv/search
So, for example:
headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Content-Length": "123",  # requests recomputes this from the body, so it can be omitted
    "content-type": "application/json",
    "Host": "p2p.binance.com",
    "Origin": "https://p2p.binance.com",
    "Pragma": "no-cache",
    "TE": "Trailers",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
data = {
    "asset": "USDT",
    "fiat": "ZAR",
    "merchantCheck": True,
    "page": 1,
    "payTypes": ["BANK"],
    "publisherType": None,
    "rows": 20,
    "tradeType": "Sell",
}
r = requests.post('https://p2p.binance.com/bapi/c2c/v2/friendly/c2c/adv/search', headers=headers, json=data).json()
You can change data according to your needs (e.g. change "tradeType" to "Buy"). Unfortunately the API isn't documented, so working out the parameters requires some trial and error. This question has a good list of the parameters.
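Once the POST succeeds, the JSON can be flattened the way the question already does. A minimal sketch using a mocked response; the `data`/`adv`/`advertiser` field names are assumptions based on the response shape this endpoint is commonly observed to return, not official documentation:

```python
# Mocked response standing in for r = requests.post(...).json()
# (field names are assumptions; inspect a real response to confirm).
mock_response = {
    "data": [
        {"adv": {"price": "18.50", "tradeType": "SELL"},
         "advertiser": {"nickName": "seller1"}},
        {"adv": {"price": "18.75", "tradeType": "SELL"},
         "advertiser": {"nickName": "seller2"}},
    ]
}

# Flatten to (seller, price) pairs, similar to what pd.json_normalize exposes
offers = [(ad["advertiser"]["nickName"], float(ad["adv"]["price"]))
          for ad in mock_response["data"]]
print(offers)
```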
I am trying to extend this repo with support for cryptocurrency trading using Python (will create a PR once completed).
I have all the API methods working with the exception of actually placing trades.
The endpoint for placing crypto orders is https://nummus.robinhood.com/orders/
This endpoint expects a POST request to be made with the body in JSON format along with the following headers:
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en;q=1, fr;q=0.9, de;q=0.8, ja;q=0.7, nl;q=0.6, it;q=0.5",
"Content-Type": "application/json",
"X-Robinhood-API-Version": "1.0.0",
"Connection": "keep-alive",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
"Origin": "https://robinhood.com",
"Authorization": "Bearer <access_token>"
The payload I'm sending looks like this:
{
    'account_id': <account id>,
    'currency_pair_id': '3d961844-d360-45fc-989b-f6fca761d511',  # this is BTC
    'price': <BTC price derived using quotes API>,
    'quantity': <BTC quantity>,
    'ref_id': str(uuid.uuid4()),  # I'm not sure why this is needed, but I saw someone else use the uuid library to derive this value like this
    'side': 'buy',
    'time_in_force': 'gtc',
    'type': 'market'
}
The response I get is as follows:
400 Client Error: Bad Request for url: https://nummus.robinhood.com/orders/
I can confirm that I am able to authenticate successfully since I am able to use the https://nummus.robinhood.com/accounts/ and https://nummus.robinhood.com/holdings/ endpoints to view my account data and holdings.
I also believe that my access_token in the Authorization header is correct, because if I set it to some random value (Bearer abc123, for instance) I get a 401 Client Error: Unauthorized response.
I think the issue has to do with the payload but I am not able to find good documentation for the nummus.robinhood.com API.
Does anyone see how/whether my request payload is malformed and/or can point me in the right direction to documentation for the nummus.robinhood.com/orders endpoint?
You need to pass the JSON payload as the value of the json parameter in the requests post call:
import uuid

import requests

json_payload = {
    'account_id': <account id>,
    'currency_pair_id': '3d961844-d360-45fc-989b-f6fca761d511',  # this is BTC
    'price': <BTC price derived using quotes API>,
    'quantity': <BTC quantity>,
    'ref_id': str(uuid.uuid4()),
    'side': 'buy',
    'time_in_force': 'gtc',
    'type': 'market'
}
headers = {
    "Accept": "application/json",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en;q=1, fr;q=0.9, de;q=0.8, ja;q=0.7, nl;q=0.6, it;q=0.5",
    "Content-Type": "application/json",
    "X-Robinhood-API-Version": "1.0.0",
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Origin": "https://robinhood.com",
    "Authorization": "Bearer <access_token>"
}
url = "https://nummus.robinhood.com/orders/"
s = requests.Session()
res = s.request("post", url, json=json_payload, timeout=10, headers=headers)
print(res.status_code)
print(res.text)
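The distinction matters because data= form-encodes a dict while json= serializes it to a JSON body and sets the Content-Type header for you. A quick offline check using a prepared request (the URL is a placeholder):

```python
import requests

payload = {"side": "buy", "type": "market"}

# data= produces a form-encoded body with Content-Type application/x-www-form-urlencoded
form_req = requests.Request("POST", "https://example.invalid/orders/", data=payload).prepare()

# json= produces a JSON body with Content-Type application/json
json_req = requests.Request("POST", "https://example.invalid/orders/", json=payload).prepare()

print(form_req.body)                     # side=buy&type=market
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(json_req.body)                     # b'{"side": "buy", "type": "market"}'
print(json_req.headers["Content-Type"])  # application/json
```

An API that parses its request body as JSON will reject the form-encoded variant, which is consistent with the 400 Bad Request seen in the question.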
When I open the portal in a browser, it automatically logs me in through SSO.
Once logged in, I want to retrieve the Authorization token, which I can use to automate a few other tasks, like applying for leave on a particular date, filling timesheets, etc., from Python code.
I am currently able to perform these actions by manually copying the token from the Chrome inspect window and hardcoding it in my Python code (using the requests module). However, to truly automate the process, I need to retrieve the authorization token dynamically, since it refreshes every day.
Here is the snippet of the code I am using.
url = 'mycompanyportal url for leaves module'
headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9,hi;q=0.8",
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json",
    "Cookie": "_ga=GA1.2.18131865....................................",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
    "Authorization": "Bearer eyJ0eXAiOiJKV1...................................."
}
for leaveDate in leaveDates:
    r = requests.post(url, allow_redirects=True, data={
        'comments': "COVID-19 BCP",
        'fromDate': leaveDate,
        'isHalfDay': "N",
        'leaveType': "WFH",
        'leaveTypeId': 3,
        'toDate': leaveDate
    }, headers=headers)
I am trying to log in to a site called grailed.com and follow a certain product. The code below is what I have tried.
The code succeeds in logging in with my credentials. However, whenever I try to follow a product (the id in the payload is the id of the product), the code runs without any errors but fails to follow the product. I am confused by this behavior. Is it a similar case to Instagram, which blocks any attempt to interact programmatically with its site and forces you to use its API? (grailed.com does not have a public API, AFAIK.)
I tried the following code (which looks exactly like the POST request sent when you follow on the site).
# headers/data defined here
r = requests.Session()
v = r.post("https://www.grailed.com/api/sign_in", json=data, headers=headers)
headers = {
    'authority': 'www.grailed.com',
    'method': 'POST',
    "path": "/api/follows",
    'scheme': 'https',
    'accept': 'application/json',
    'accept-encoding': 'gzip, deflate, br',
    "content-type": "application/json",
    "x-amplitude-id": "1547853919085",
    "x-api-version": "application/grailed.api.v1",
    "x-csrf-token": "9ph4VotTqyOBQzcUt8c3C5tJrFV7VlT9U5XrXdbt9/8G8I14mGllOMNGqGNYlkES/Z8OLfffIEJeRv9qydISIw==",
    "origin": "https://www.grailed.com",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
payload = {
    "id": "7917017"
}
b = r.post("https://www.grailed.com/api/follows", json=payload, headers=headers)
If the API is not designed to be public, you are most likely missing a CSRF token in your follow headers.
You have to find the CSRF token and add it to the /api/follows POST.
Taking a quick look at the code, this might be hard, as everything happens inside JavaScript.
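Many sites embed the current token in a <meta name="csrf-token"> tag in the page HTML (a Rails convention; whether grailed.com does this is an assumption, so inspect the real page source to confirm). A minimal sketch of extracting it and reusing it, with sample markup standing in for the fetched page:

```python
import re

# Hypothetical markup standing in for the HTML fetched with the logged-in session.
html = '<head><meta name="csrf-token" content="9ph4Vot...==" /></head>'

# Pull the token out of the meta tag
match = re.search(r'name="csrf-token"\s+content="([^"]+)"', html)
token = match.group(1) if match else None
print(token)

# The fresh token would then go into the follow request, e.g.:
# session.post("https://www.grailed.com/api/follows",
#              json={"id": "7917017"},
#              headers={"x-csrf-token": token, "content-type": "application/json"})
```

The key point is to fetch the token from the same session you logged in with, rather than hardcoding a value copied from DevTools, since CSRF tokens are typically tied to the session and expire.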