How to make a POST request in Scrapy that requires a request payload - Python

I am trying to parse data from this website.
In the Network section of the browser's developer tools I found the link https://busfor.pl/api/v1/searches, which is used for a POST request that returns the JSON I am interested in.
But making this POST request requires a request payload containing a dictionary.
I assumed it worked like the normal form data we use to build a FormRequest in Scrapy, but it returns a 403 error.
I have already tried the following.
url = "https://busfor.pl/api/v1/searches"
formdata = {"from_id" : d_id
,"to_id" : a_id
,"on" : '2019-10-10'
,"passengers" : 1
,"details" : []
}
yield scrapy.FormRequest(url, callback=self.parse, formdata=formdata)
This returns a 403 error.
I also tried this by referring to one of the Stack Overflow posts.
url = "https://busfor.pl/api/v1/searches"
payload = [{"from_id" : d_id
,"to_id" : a_id
,"on" : '2019-10-10'
,"passengers" : 1
,"details" : []
}]
yield scrapy.Request(url, self.parse, method = "POST", body = json.dumps(payload))
But even this returns the same error.
Can someone help me figure out how to parse the required data using Scrapy?

The way to send POST requests with JSON data is the latter, but you are passing the wrong JSON to the site: it expects a dictionary, not a list of dictionaries.
So instead of:
payload = [{
    "from_id": d_id,
    "to_id": a_id,
    "on": '2019-10-10',
    "passengers": 1,
    "details": [],
}]
You should use:
payload = {
    "from_id": d_id,
    "to_id": a_id,
    "on": '2019-10-10',
    "passengers": 1,
    "details": [],
}
Another thing you didn't notice is the headers passed to the POST request. Sometimes a site uses IDs and hashes to control access to its API, and in this case I found two values that appear to be needed: X-CSRF-Token and X-NewRelic-ID. Luckily for us, these two values are available on the search page.
Here is a working spider; the search result is available in the method self.parse_search.
import json

import scrapy


class BusForSpider(scrapy.Spider):
    name = 'busfor'
    start_urls = ['https://busfor.pl/autobusy/Sopot/Gda%C5%84sk?from_id=62113&on=2019-10-09&passengers=1&search=true&to_id=3559']
    search_url = 'https://busfor.pl/api/v1/searches'

    def parse(self, response):
        payload = {
            "from_id": '62113',
            "to_id": '3559',
            "on": '2019-10-10',
            "passengers": 1,
            "details": [],
        }
        # Both tokens are embedded in the search page itself.
        csrf_token = response.xpath('//meta[@name="csrf-token"]/@content').get()
        newrelic_id = response.xpath('//script/text()').re_first(r'xpid:"(.*?)"')
        headers = {
            'X-CSRF-Token': csrf_token,
            'X-NewRelic-ID': newrelic_id,
            'Content-Type': 'application/json; charset=UTF-8',
        }
        yield scrapy.Request(
            self.search_url,
            callback=self.parse_search,
            method="POST",
            body=json.dumps(payload),
            headers=headers,
        )

    def parse_search(self, response):
        data = json.loads(response.text)
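From there you can pull whatever fields you need out of data. As a minimal sketch only (the response structure is not shown here, so every key below is hypothetical; inspect the real JSON in your browser's Network tab):

    def parse_search(self, response):
        data = json.loads(response.text)
        # Hypothetical keys -- replace them with the actual structure of the response.
        for trip in data.get("trips", []):
            yield {
                "departure": trip.get("departure_time"),
                "price": trip.get("price"),
            }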

Related

Gravity Forms API with Python

The documentation of the API is here, and I am trying to implement this line in Python:
//retrieve entries created on a specific day (use the date_created field)
//this example returns entries created on September 10, 2019
https://localhost/wp-json/gf/v2/entries?search={"field_filters": [{"key":"date_created","value":"09/10/2019","operator":"is"}]}
But when I try to do this with Python in the following code, I get an error:
import json
import oauthlib
from requests_oauthlib import OAuth1Session

consumer_key = ""
client_secret = ""

session = OAuth1Session(consumer_key,
                        client_secret=client_secret,
                        signature_type=oauthlib.oauth1.SIGNATURE_TYPE_QUERY)

url = 'https://localhost/wp-json/gf/v2/entries?search={"field_filters": [{"key":"date_created","value":"09/10/2019","operator":"is"}]}'
r = session.get(url)
print(r.content)
The error message is :
ValueError: Error trying to decode a non urlencoded string. Found invalid characters: {']', '['} in the string: 'search=%7B%22field_filters%22:%20[%7B%22key%22:%22date_created%22,%22value%22:%2209/01/2023%22,%22operator%22:%22is%22%7D]%7D'. Please ensure the request/response body is x-www-form-urlencoded.
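The characters oauthlib complains about are the literal [ and ] that survive when the JSON is pasted straight into the URL. A small standard-library sketch of the difference:

    import urllib.parse

    search = '{"field_filters": [{"key":"date_created","value":"09/01/2023","operator":"is"}]}'

    # Building the URL by hand leaves characters like [ and ] unencoded,
    # which is what oauthlib rejects when signing via the query string.
    print('search=' + search)

    # urlencode percent-encodes the whole value, brackets included.
    print(urllib.parse.urlencode({'search': search}))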
One solution is to parameterize the URL:
import requests
import json

url = 'https://localhost/wp-json/gf/v2/entries'
params = {
    "search": {"field_filters": [{"key": "date_created", "value": "09/01/2023", "operator": "is"}]}
}
headers = {'Content-type': 'application/json'}

response = session.get(url, params=params, headers=headers)
print(response.json())
But in the retrieved entries, the data is not filtered by the specified date.
In the official documentation they give a date in the format "09/01/2023", but in my dataset the format is "2023-01-10 19:16:59".
Do I have to transform the format? I tried a different format for the date:
date_created = "09/01/2023"
date_created = datetime.strptime(date_created, "%d/%m/%Y").strftime("%Y-%m-%d %H:%M:%S")
What alternative solutions can I test?
What if you use the urllib.parse.urlencode function? Your code would then look like:
import json
import urllib.parse

import oauthlib
from requests_oauthlib import OAuth1Session

consumer_key = ""
client_secret = ""

session = OAuth1Session(consumer_key,
                        client_secret=client_secret,
                        signature_type=oauthlib.oauth1.SIGNATURE_TYPE_QUERY)

params = {
    "search": {"field_filters": [{"key": "date_created", "value": "09/01/2023", "operator": "is"}]}
}
encoded_params = urllib.parse.urlencode(params)
url = f'https://localhost/wp-json/gf/v2/entries?{encoded_params}'

r = session.get(url)
print(r.content)
Hope that helps.
I had the same problem and found a solution with this code:
params = {
    'search': json.dumps({
        'field_filters': [
            {'key': 'date_created', 'value': '2023-01-01', 'operator': 'is'}
        ],
        'mode': 'all'
    })
}
encoded_params = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
url = 'http://localhost/depot_git/wp-json/gf/v2/forms/1/entries?' + encoded_params + '&paging[page_size]=999999999'  # number of results per page forced manually
I'm not really sure what made it work, as I'm an absolute beginner with Python, but I found that you need double quotes ( " ) in the URL instead of single quotes ( ' ), so the solution by William Castrillon wasn't enough.
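The double-quote requirement comes down to JSON versus Python dict syntax: str() on a dict produces single quotes, which is not valid JSON, whereas json.dumps produces double quotes. A quick illustration:

    import json

    filters = {'field_filters': [{'key': 'date_created', 'value': '2023-01-01', 'operator': 'is'}]}

    print(str(filters))         # single quotes: not valid JSON, the API cannot parse it
    print(json.dumps(filters))  # double quotes: valid JSON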
As for the date format, Gravity Forms seems to understand DD/MM/YYYY. It doesn't need a time either.

How to pass the customer ID dynamically in the Tap payment method to save the card

I am sending a POST request to the Tap payment gateway in order to save the card. The URL expects two parameters: the source (the recently generated token) in the payload, and the {customer_id} inside the URL itself. I am trying string concatenation, but it returns an error like "Invalid JSON request".
views.py:
ifCustomerExits = CustomerIds.objects.filter(email=email)
totalData = ifCustomerExits.count()
if totalData > 1:
    for data in ifCustomerExits:
        customerId = data.customer_id
        print("CUSTOMER_ID CREATED ONE:", customerId)

tokenId = request.session.get('generatedTokenId')
payload = {
    "source": tokenId
}
headers = {
    'authorization': "Bearer sk_test_**********************",
    'content-type': "application/json"
}
# HERE DOWN IS THE url of TAP COMPANY'S API:
url = "https://api.tap.company/v2/card/%7B" + customerId + "%7D"
response = requests.request("POST", url, data=payload, headers=headers)
json_data3 = json.loads(response.text)
card_id = json_data3["id"]
return sponsorParticularPerson(request, sponsorProjectId)
Their expected URL: https://api.tap.company/v2/card/{customer_id}
Their documentation link: https://tappayments.api-docs.io/2.0/cards/create-a-card
Try this: first convert the dict into JSON, then send the POST request with requests.post:
import json
...
customerId = str(data.customer_id)
print("CUSTOMER_ID CREATED ONE:", customerId)
tokenId = request.session.get('generatedTokenId')
payload = {
    'source': tokenId
}
headers = {
    'authorization': "Bearer sk_test_**************************",
    'content-type': "application/json"
}
pd = json.dumps(payload)
# HERE DOWN IS THE url of TAP COMPANY'S API:
url = "https://api.tap.company/v2/card/%7B" + customerId + "%7D"
response = requests.post(url, data=pd, headers=headers)
json_data3 = json.loads(response.text)
card_id = json_data3["id"]
return sponsorParticularPerson(request, card_id)
Please tell me whether this works or not.
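One more thing worth checking, as an assumption on my part: in REST documentation, {customer_id} is conventionally a placeholder, so the braces themselves (and their percent-encoded form %7B/%7D) probably should not appear in the final URL at all. A sketch of building the URL with the raw ID (the ID and token values below are hypothetical):

    import requests

    customer_id = "cus_xxxxxxxxxxxx"   # hypothetical customer ID
    token_id = "tok_xxxxxxxxxxxx"      # hypothetical token

    # Substitute the raw ID for the {customer_id} placeholder -- no braces,
    # literal or percent-encoded, in the final URL.
    url = f"https://api.tap.company/v2/card/{customer_id}"
    headers = {
        "authorization": "Bearer sk_test_**********************",
        "content-type": "application/json",
    }
    # requests' json= keyword serializes the payload and sets the content type.
    response = requests.post(url, json={"source": token_id}, headers=headers)
    print(response.json())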

How to match an exact word in a JSON soup?

I am parsing through patient metadata scraped from a URL, and I am trying to access the 'PatientID' field. However, there is also an 'OtherPatientIDs' field, which is also grabbed by my search.
I have tried looking into regular expressions, but I am unclear on how to match an EXACT string or how to incorporate one into my code.
So at the moment, I have done:
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
PatientID = "PatientID"
lines = soup.decode('utf8').split("\n")
for line in lines:
    if "PatientID" in line:
        PatientID = line.split(':')[1].split('\"')[1].split('\"')[0]
        print(PatientID)
This successfully finds the values of both the PatientID and OtherPatientIDs fields. How do I specify that I only want the PatientID field?
EDIT:
I was asked to give an example of what I get with response.text, and it's of the form:
{
    "ID" : "shqowihdojcoughwoeh",
    "LastUpdate" : "20190507",
    "MainTags" : {
        "OtherPatientIDs" : "0304992098",
        "PatientBirthDate" : "29/04/1803",
        "PatientID" : "92879837",
        "PatientName" : "LASTNAME^FIRSTNAME"
    },
    "Type" : "Patient"
}
Why not use the json library instead?
import json
import requests

response = requests.get(url)
data = json.loads(response.text)
print(data['MainTags']['PatientID'])
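As a small variant of the same idea, requests can decode the body itself, and dict.get avoids a KeyError when a tag is absent:

    import requests

    response = requests.get(url)
    data = response.json()  # equivalent to json.loads(response.text)
    patient_id = data.get('MainTags', {}).get('PatientID')
    print(patient_id)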

Issue Updating Confluence Page with Python and API

I am working on updating a Confluence page via Python and the Confluence API.
I have found a function to write the data to my page; unfortunately, it creates a new page with my data and the old page becomes an archived version.
I have been searching the reference material and cannot see why I am getting a new page instead of the data being appended to the end of the page. The only solution I can think of is to copy the body, append the new data to it, and then create the new page ... but I am thinking there should be a way to append.
The write function code I found / am leveraging is as follows:
def write_data(auth, html, pageid, title=None):
    info = get_page_info(auth, pageid)
    print("after get page info")
    ver = int(info['version']['number']) + 1
    ancestors = get_page_ancestors(auth, pageid)
    anc = ancestors[-1]
    del anc['_links']
    del anc['_expandable']
    del anc['extensions']
    if title is not None:
        info['title'] = title
    data = {
        'id': str(pageid),
        'type': 'page',
        'title': info['title'],
        'version': {'number': ver},
        'ancestors': [anc],
        'body': {
            'storage': {
                'representation': 'storage',
                'value': str(html),
            }
        }
    }
    data = json.dumps(data)
    pprint(data)
    url = '{base}/{pageid}'.format(base=BASE_URL, pageid=pageid)
    r = requests.put(
        url,
        data=data,
        auth=auth,
        headers={'Content-Type': 'application/json'}
    )
    r.raise_for_status()
I am starting to think copying / appending to the body is the only option, but I hope someone else has encountered this issue.
Not an elegant solution, but I went with the "copy the old body and append" option.
Basically, I wrote a simple function to return the existing body:
def get_page_content(auth, pageid):
    url = '{base}/{pageid}?expand=body.storage'.format(
        base=BASE_URL,
        pageid=pageid)
    r = requests.get(url, auth=auth)
    r.raise_for_status()
    return r.json()['body']['storage']['value']
In this example, I just append (+=) a new string to the existing body.
def write_data(auth, html, pageid, title=None):
    info = get_page_info(auth, pageid)
    page_content = get_page_content(auth, pageid)
    ver = int(info['version']['number']) + 1
    ancestors = get_page_ancestors(auth, pageid)
    anc = ancestors[-1]
    del anc['_links']
    del anc['_expandable']
    del anc['extensions']

    page_content += "\n\n"
    page_content += html

    if title is not None:
        info['title'] = title
    data = {
        'id': str(pageid),
        'type': 'page',
        'title': info['title'],
        'version': {'number': ver},
        'ancestors': [anc],
        'body': {
            'storage': {
                'representation': 'storage',
                'value': str(page_content),
            }
        }
    }
    data = json.dumps(data)
    url = '{base}/{pageid}'.format(base=BASE_URL, pageid=pageid)
    r = requests.put(
        url,
        data=data,
        auth=auth,
        headers={'Content-Type': 'application/json'}
    )
    r.raise_for_status()
    print("Wrote '%s' version %d" % (info['title'], ver))
    print("URL: %s%d" % (VIEW_URL, pageid))
I should note that since this posts to a Confluence body, the text being passed is HTML. '\n' for a newline does not work; you need to pass '<br />' etc.
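For example, a minimal sketch of that point, using write_data as defined above:

    # '\n' is ignored when Confluence renders storage-format content;
    # use markup for the line break instead.
    markup = "Line one<br />Line two"    # renders as two lines
    write_data(auth, "<p>%s</p>" % markup, pageid)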
If anyone has a more elegant solution, would welcome the suggestions.
Dan.

How to create an MR using the GitLab API?

I am trying to create a merge request using the GitLab Merge Request API with Python and the Python requests package. This is a snippet of my code:
import requests, json

MR = 'http://www.gitlab.com/api/v4/projects/317/merge_requests'
id = '317'
gitlabAccessToken = 'MySecretAccessToken'
sourceBranch = 'issue110'
targetBranch = 'master'
title = 'title'
description = 'description'

header = {
    'PRIVATE-TOKEN': gitlabAccessToken,
    'id': id,
    'title': title,
    'source_branch': sourceBranch,
    'target_branch': targetBranch
}

reply = requests.post(MR, headers=header)
status = json.loads(reply.text)
but I keep getting the following message in the reply:
{'error': 'title is missing, source_branch is missing, target_branch is missing'}
What should I change in my request to make it work?
Apart from PRIVATE-TOKEN, all the parameters should be passed as form-encoded parameters, meaning:
header = {'PRIVATE-TOKEN': gitlabAccessToken}
params = {
    'id': id,
    'title': title,
    'source_branch': sourceBranch,
    'target_branch': targetBranch
}

reply = requests.post(MR, data=params, headers=header)
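Alternatively, a sketch under the assumption that your GitLab instance accepts JSON bodies (the v4 API generally does): requests' json= keyword serializes the parameters and sets the Content-Type header for you:

    reply = requests.post(
        MR,
        headers={'PRIVATE-TOKEN': gitlabAccessToken},
        json={
            'id': id,
            'title': title,
            'source_branch': sourceBranch,
            'target_branch': targetBranch,
        },
    )
    status = reply.json()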
