Where to get header values for request while webscraping? - python

I am trying to web scrap "https://pricehistoryapp.com/" to obtain the product's highest and lowest prices. I am using python requests library for this.
I have observed that this site obtains the next address to go by using a request named 'getSlugFromUrl' made to server. This is also first xhr request made as search button is pressed. I understood some part of pay load and headers but not able to figure out others:
Header:
:authority: ph.pricetoolkit.com
:method: POST
:path: /api/product/history/getSlugFromUrl
:scheme: https
accept: application/json
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
auth: 8nV5lXoVu/Qk2z5BhZmbPj4rxSdd35ThEbiRgK1kCCz+wU0Guh+6qal03DS3J6HG
cache-control: no-cache
content-length: 738
content-type: application/x-www-form-urlencoded
origin: https://pricehistoryapp.com
pragma: no-cache
referer: https://pricehistoryapp.com/
sec-ch-ua: "Microsoft Edge";v="107", "Chromium";v="107", "Not=A?Brand";v="24"
sec-ch-ua-mobile: ?1
sec-ch-ua-platform: "Android"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Mobile Safari/537.36 Edg/107.0.1418.24
Payload:
purl:https://www.flipkart.com/viewsonic-vx-series-24-inch-wqhd-led-backlit-ips-panel-frameless-monitor-vx2480-2k-shd/p/itmedaf0773f47ba?pid=MONG5KEK2GDSGTSY
lid: LSTMONG5KEK2GDSGTSYAQGKIT
marketplace: FLIPKART
store: 6bo/g0i/9no
srno: b_1_5
otracker: hp_omu_Best+of+Electronics_4_3.dealCard.OMU_NOBMPKW1HQ7A_3
iid: 083d5b0d-6840-426e-811b-28b45d6e6ea7.MONG5KEK2GDSGTSY.SEARCH
ssid: d5n99toygg0000001667914476777
For instance from where is auth obtained from in header, or from where lid, iid, ssid obtained for payload. I know the question is really stupid, but please guide me towards a solution. Thanks in advance.

Related

How to get response with this request in python?

Hi I have a request as so
GET https://dropezy.goadda.in/master/v1/id/products/search?q=buah&storeId=619e02f0f580166d48277fc0&sortBy=relevance HTTP/2.0
accept: application/json
enro-api-key: rhcvwcdd-7aza-zvv7qskyu-4b8jm8kwppkf
cache-control: max-age=31536000
store-id: 619e02f0f580166d48277fc0
user-agent: Mozilla/5.0 (Linux; Android 11; sdk_gphone_x86 Build/RSR1.201013.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/83.0.4103.106 Mobile Safari/537.36
content-type: application/json; charset=utf-8
origin: https://www.dropezy.com
x-requested-with: com.enroco.dropezy
sec-fetch-site: cross-site
sec-fetch-mode: cors
sec-fetch-dest: empty
referer: https://www.dropezy.com/id/search/
accept-encoding: gzip, deflate
accept-language: en-US,en;q=0.9
content-length: 0
When I tried using request in python it returns message not found
This is my current code
url = 'https://dropezy.goadda.in/master/v1/id/products/search?q=buah&storeId=619e02f0f580166d48277fc0&sortBy=relevance HTTP/2.0'
accept='application/json'
jsonData = requests.post(url, accept).json()
Can somebody help? How to insert the other components such as user-agent, enro-api-key into requests?
The two requests aren't the same. The first is a GET request, but the second is a POST.
Use requests.get to make a GET request.
If you need to supply headers in a request, yes, you can do that: https://docs.python-requests.org/en/latest/user/quickstart/#custom-headers.
Generally, the docs for Requests are quite good: https://docs.python-requests.org/en/latest/

Python Script to post Authorization Header to specific IP/port

I need a python script that posts a given Authorization Bearer header to a specific ipand port.
This is what I have so far.
#!/usr/bin/python
import urllib3
import certifi
http = urllib3.PoolManager(ca_certs=certifi.where())
url = 'http://172.10.10.2:3000'
req = http.request('POST', url, fields={'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbWQiOiJscyR7SUZTfS1sYSR7SUZTfS90bXAvbmMifQ.EziCTtJn1PpPXvemJllF36w7ADNkhKiktZ5sv9ADR3o'})
print(req.data.decode('utf-8'))
I currently get an error when running this stating that the Required authorization token is not found
The bearer code is created manually and then input into the script, if there was a way to import that from the site I create it on that would be helpful.
What I need the output to be in this -
GET / HTTP/1.1
Host: 172.10.10.2:3000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: close
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbWQiOiJpcCR7SUZTfWEifQ.RkoZinBcg2_5HRGgv1AtErhscIVBRv2hUcGIF08ZlCM

How to mimic export CSV functionality via Python code (where

When I use the export CSV functionality in testrail, I see that it does a POST request to the following API : /index.php?/plans/export_csv/:plan_id.
_token: <APIToken>
section_ids:
section_include:
columns: tests:id,cases:title,tests:assignedto_id,tests:original_case_id,tests:elapsed,cases:priority_id,tests:run_name,tests:tested_by,tests:tested_on
layout: results
separator_hint: 1
format: excel
Along with the following request headers.
authority: <Authority>
:method: POST
:path: /index.php?/plans/export_csv/:plan_id
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-GB,en-US;q=0.9,en;q=0.8
cache-control: max-age=0
content-length: 320
content-type: application/x-www-form-urlencoded
cookie: notificationbar=12345; tr_session=<session_id>; tr_rememberme=<id>
origin: <Origin>
referer: <Referer>
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: same-origin
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36
Is there a way for me to mimic this via Python?
You should take a look at the requests package.
Requests is a library that helps you make HTTP1.1 requests.

Using Python and requests module to post

There are similar questions posted, but I still seem to have a problem. I am expecting to receive a registration email after running this. I receive nothing. Two questions. What is wrong? How would I even know if the data was successfully submitted as opposed to the page just loading normally?
serviceurl = 'https://signup.com/'
payload = {'register-fname': 'Peter', 'register-lname': "Parker", 'register-email': 'xyz#email.com', 'register-password': '9dlD313kF'}
r2 = requests.post(serviceurl, data=payload)
print(r2.status_code)
The url for the POST request is actually https://signup.com/api/users, and it returns 200 (in my browser).
You need to replicate what your browser does. This might include certain request headers.
You will want to use your browser's dev tools/network inspector to gather this information.
The information below it from my Firefox on my computer:
Request headers:
Host: signup.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Content-Type: application/json;charset=utf-8
Content-Length: 107
Origin: https://signup.com
Connection: keep-alive
Referer: https://signup.com/
Cookie: _vspot_session_id=ce1937cf52382239112bd4b98e0f1bce; G_ENABLED_IDPS=google; _ga=GA1.2.712393353.1584425227; _gid=GA1.2.1095477818.1584425227; __utma=160565439.712393353.1584425227.1584425227.1584425227.1; __utmb=160565439.2.10.1584425227; __utmc=160565439; __utmz=160565439.1584425227.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; __qca=P0-1580853344-1584425227133; _gat=1
Pragma: no-cache
Cache-Control: no-cache
Payload:
{"status":true,"code":null,"email":"TestEmail#hotmail.com","user":{"id":20540206,"email":"TestEmail#hotmail.com","name":"TestName TestSurname","hashedpassword":"4ffdbb1c33d14ed2bd02164755c43b4ad8098be2","salt":"700264767700800.7531319164902858","accesskey":"68dd25c3ae0290be69c0b59877636a5bc5190078","isregistered":true,"activationkey":"f1a6732b237379a8a1e6c5d14e58cf4958bf2cea","isactivated":false,"chgpwd":false,"timezone":"","phonenumber":"","zipcode":"","gender":"N","age":-1,"isdeferred":false,"wasdeferred":false,"deferreddate":null,"registerdate":"2020/03/17 06:09:27 +0000","activationdate":null,"addeddate":"2020/03/17 06:09:27 +0000","admin":false,"democount":0,"demodate":null,"invitationsrequest":null,"isvalid":true,"timesinvalidated":0,"invaliddate":null,"subscribe":0,"premium":false,"contributiondate":null,"contributionamount":0,"premiumenddate":null,"promo":"","register_token":"","premiumstartdate":null,"premiumsubscrlength":0,"initial_reg_type":"","retailmenot":null,"sees":null,"created_at":"2020/03/17 06:09:27 +0000","updated_at":"2020/03/17 06:09:27 +0000","first_name":"TestName","last_name":"TestSurname"},"first_name":"TestName","last_name":"TestSurname","mobile_redirect":false}
There's a lot to replicate. Things like the hashed password, salt, dates, etc would have been generated by JavaScript executed by your browser.
Keep in mind, the website owner might not appreciate a bot creating user accounts.

Python sending AMF

I'm Learning Python And for one of my project I need to POST data to server which uses AMF messaging.
Captured headers looks like this:
POST (info hided)/amfgateway.php HTTP/1.1
Host: (info hided)
Connection: keep-alive
Content-Length: 52
Origin: (info hided)
X-Requested-With: ShockwaveFlash/16.0.0.235
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Content-Type: application/x-amf
Accept: */*
Referer: (info hided)
Accept-Encoding: gzip, deflate
Accept-Language: lt,en-US;q=0.8,en;q=0.6,ru;q=0.4,pl;q=0.2
Cookie: (info hided)
bcAmfService.addFriend /1
Aa$
And it's not a problem for me to POST headers but how do I format data that is sended to server:
I know there is a PyAmf library and I looked at documentation but it's very abstract and for beginner like me it's hard to put pieces together in one code.
So how do I format this data in Python?

Categories