I am using an API which takes HTML code as input. Let's say it is accessible at http://10.21.2.80:8000/Application/validate_content.php
validate_content.php
$html_data = trim(urldecode($_POST['html'])); // html is key
validate($html_data);
access.py
I am sending a request to this API using the Python requests library like this:
import requests

openfile = open('file.txt')
html_data = openfile.read()
openfile.close()

url = 'http://10.21.2.80:8000/Application/validate_content.php?id=12&offset=10'
response = requests.post(url, data={'html': html_data})
validate() checks whether the HTML code follows Section 508 compliance rules. If it follows the rules it returns PASS, otherwise it returns the errors in the code.
When I make the request using Postman, the API gives the right response (validating and returning errors), but with the Python code it always returns PASS.
I don't know what went wrong. Can anyone suggest the right way to do it?
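One way to narrow this down (a sketch reusing the file and endpoint names above; splitting the query string out into params= is an assumption) is to print exactly what is being sent and what comes back, so the Python request can be compared with the Postman one:

import requests

# Read the HTML exactly as it is stored on disk
with open('file.txt') as openfile:
    html_data = openfile.read()

url = 'http://10.21.2.80:8000/Application/validate_content.php'
response = requests.post(url,
                         params={'id': 12, 'offset': 10},
                         data={'html': html_data})

# Compare these against what Postman sends and receives
print(len(html_data))
print(response.status_code)
print(response.text)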
Related
I tried the following script, but unfortunately the output file is identical to the input file. I'm not sure what's wrong with it.
import requests

url_lines = open('banana1.txt').read().splitlines()

remove_from_urls = []
for url in url_lines:
    remove_url = requests.get(url)
    print(remove_url.status_code)
    if remove_url.status_code == 404:
        remove_from_urls.append(url)
        continue

url_lines = [url for url in url_lines if url not in remove_from_urls]
print(url_lines)

# Save urls example
with open('banana2.txt', 'w+') as file:
    for item in url_lines:
        file.write(item + '\n')
There seems to be no error in your code, but there are a few things that would help make it more readable and consistent. The first course of action should be to make sure there is at least one URL that actually returns a 404 status code.
Edit: After providing the actual URL.
The 404 problem
In your case, the problem is that Twitter does not actually return a 404 error for your "Not found" URL. You can test it using curl:
$ curl -o /dev/null -w "%{http_code}" "https://twitter.com/davemeltzerWON/status/1321279214365016064"
200
Or using Python:
import requests
response = requests.get("https://twitter.com/davemeltzerWON/status/1321279214365016064")
print(response.status_code)
The output for both should be 200.
Since Twitter is a JavaScript application that loads its content after it has been processed in the browser, you cannot find the information you are looking for in the HTML response. You would need to use something like Selenium to actually process the JavaScript for you, and then you would be able to look for actual text like "not found" on the rendered page.
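A minimal sketch of that approach, assuming Selenium and a matching Chrome driver are installed (the crude sleep and the exact "not found" wording are assumptions, not verified Twitter behaviour):

import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://twitter.com/davemeltzerWON/status/1321279214365016064")
time.sleep(5)  # crude wait for the JavaScript to render the page
page_text = driver.page_source.lower()
print("not found" in page_text or "doesn't exist" in page_text)
driver.quit()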
Code review
Please make sure to close the file properly. Also, a file object is an iterator over lines, so you can convert it to a list or a set very easily. Another trick to make the code more readable is to use a Python set. So you may read the file like this:
with open("banana1.txt") as fid:
url_lines = set(fid)
Then you simply remove all the links that do not work:
not_working = set()
for url in url_lines:
    if requests.get(url).status_code == 404:
        not_working.add(url)

working = url_lines - not_working
with open("banana2.txt", "w") as fid:
    fid.write("\n".join(working))
Also, if some of the links point to the same server, you should make use of the requests.Session class:
from requests import Session
session = Session()
Then replace requests.get with session.get; you should get some performance boost, since the Session uses keep-alive connections, among other things.
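Put together with the earlier snippet, the check loop might look like this (a sketch built from the pieces above):

from requests import Session

session = Session()

not_working = set()
for url in url_lines:
    if session.get(url).status_code == 404:
        not_working.add(url)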
I'm trying to see if I'm able to get the response data, as I'm learning how to use regex in Locust. I'm trying to reproduce my JMeter test script using Locust.
This is the part of the code that I'm having a problem with.
import time, csv, json
from locust import HttpUser, task, between, tag

class ResponseGet(HttpUser):
    response_data = ""
    wait_time = between(1, 1.5)
    host = "https://portal.com"
    username = "NA"
    password = "NA"

    @task
    def portal(self):
        print("Portal Task")
        response = self.client.post('/login', json={'username': 'user', 'password': '123'})
        print(response)
        self.response_data = json.loads(response.text)
        print(response_data)
I've tried this suggestion and I somehow can't make it work.
My idea is: get the response data > use regex to extract a string > pass the string to the next task to use.
For example:
Get login response data > use regex to extract token > use the token for the next task.
Is there any better way to do this?
The way you're doing it should work, but Locust's HttpUser client is based on Requests, so if you want to access the response data as JSON you should be able to do that with just self.response_data = response.json(). That will only work if the response body is valid JSON, but your current code will also fail if the response body is not JSON.
If your problem is in parsing the response text as JSON, it's likely that the response just isn't JSON, possibly because you're getting an error or something. You could print the response body before your attempt to load it as JSON, but your current print(response) won't do that because it will just print the Response object returned by Requests. You'd need to print(response.text) instead.
As far as whether a regex would be the right solution for getting at the token returned in the response, that will depend on how exactly the response is formatted.
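If the login response is JSON, no regex is needed at all: a token can be pulled straight out of the parsed body and reused on the next request. A minimal sketch (the "token" field name and the /data endpoint are assumptions, not taken from your portal):

from locust import HttpUser, task, between

class ResponseGet(HttpUser):
    wait_time = between(1, 1.5)
    host = "https://portal.com"
    token = ""

    @task
    def portal(self):
        response = self.client.post('/login', json={'username': 'user', 'password': '123'})
        if response.ok:
            # Assumes the body looks like {"token": "..."}
            self.token = response.json().get('token', '')
        # Reuse the extracted value on a follow-up request
        self.client.get('/data', headers={'Authorization': 'Bearer ' + self.token})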
I am attempting to do a bulk download of a series of PDFs from a site that requires login authentication. I am able to log in successfully, but when I attempt a GET request for '/transcripts/transcript.pdf?user_id=3007', the request returns the content for '/transcripts/transcript.pdf'.
Does anyone have any idea why the URL parameter is not being sent? Or why the request would be rerouted?
I have tried passing the parameter 'user_id' as data, params, and hardcoded in the URL.
I have removed the actual domain from the strings below just for privacy
import requests
import lxml.html

with requests.Session() as s:
    login = s.get('<domain>/login/canvas')
    # print the html returned or something more intelligent to see if it's a successful login page.
    print(login.text)

    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    print("form: ", form)
    form['pseudonym_session[unique_id]'] = username
    form['pseudonym_session[password]'] = password

    response = s.post('<domain>/login/canvas', data=form)
    print(response.url, response.status_code)  # gets <domain>?login_success=1 200

    # An authorised request.
    data = {'user_id': '3007'}
    r = s.get('<domain>/transcripts/transcript.pdf?user_id=3007', data=data)
    print(r.url)  # gets <domain>/transcripts/transcript.pdf
    print(r.status_code)  # gets 200

    with open('test.pdf', 'wb') as f:
        f.write(r.content)
GET response returns /transcripts/transcript.pdf and not /transcripts/transcript.pdf?user_id=3007
From the looks of it, you are trying to use Canvas. I'm pretty sure that in Canvas you can bulk download all test attachments.
If that's not the case, there are a few things to try:
After logging in, try typing the URL with user_id into a browser. Does that take you directly to the PDF file, or to a link to one?
If so, look at the URL; it may simply not display the parameters. Some websites do this, so don't worry about it.
If not, GET may not be enough; perhaps the site uses JavaScript, etc.
After looking through the .history of the request, I found a series of 302 redirects.
The first was to '/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf'
In a desperate attempt, I tried: s.get('/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D3007') and this still rerouted me a few times but ultimately got me the file I wanted!
If anyone has a more elegant solution to this or any resources that I can read I would greatly appreciate it!
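A slightly less hand-rolled version of the same trick (a sketch reusing the session s from above; the path and parameter names come from the redirect seen in .history):

from urllib.parse import quote

# Percent-encode the real target, query string included, so it survives being
# passed as the target_uri parameter of the login redirect.
target = quote('/transcripts/transcript.pdf?user_id=3007', safe='')
r = s.get('<domain>/login?force_login=0&target_uri=' + target)

with open('test.pdf', 'wb') as f:
    f.write(r.content)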
I have been working on some code that will grab emergency incident information from a service called PulsePoint. It works with software built into computer-controlled dispatch centers.
This is an app that empowers CPR-trained citizen heroes to help before a first responder arrives on scene. I'm merely using it to get other emergency incidents.
I reverse-engineered their app, as they have no documentation on how to make your own requests. Because of this, I have knowingly left in the API key and auth info, since it's in plain text in the Android manifest file.
I will definitely make a Python module for interfacing with this service eventually; for now it's just messy.
Anyhow, sorry for that long boring intro.
My real question is: how can I simplify this function so that it looks and runs a bit cleaner while making a timed request and returning a JSON object that can be accessed through subscripts?
import requests, time, json

def getjsonobject(agency):
    startsecond = time.strftime("%S")
    url = REDACTED
    body = []
    currentagency = requests.get(url=url, verify=False, stream=True,
                                 auth=requests.auth.HTTPBasicAuth(REDACTED, REDACTED),
                                 timeout=13)
    for chunk in currentagency.iter_content(1024):
        body.append(chunk)
        if int(startsecond) + 5 < int(time.strftime("%S")):  # Shitty internet proof, with timeout above
            raise Exception("Server sent too much data")
    jsonstringforagency = str(b''.join(body))[2:][:-1]  # Removes characters that define the response body so that the next line doesn't error
    currentagencyjson = json.loads(jsonstringforagency)  # Loads response as decodable JSON
    return currentagencyjson

currentincidents = getjsonobject("lafdw")
for inci in currentincidents["incidents"]["active"]:
    print(inci["FullDisplayAddress"])
Requests handles acquiring the body data, checking for JSON, and parsing the JSON for you automatically, and since you're giving the timeout argument I don't think you need separate timeout handling. Requests also handles constructing the URL for GET requests, so you can put your query information into a dictionary, which is much nicer. Combining those changes and removing unused imports gives you this:
import requests

params = dict(both=1,
              minimal=1,
              apikey=REDACTED)

url = REDACTED

def getjsonobject(agency):
    myParams = dict(params, agency=agency)
    return requests.get(url, verify=False, params=myParams, stream=True,
                        auth=requests.auth.HTTPBasicAuth(REDACTED, REDACTED),
                        timeout=13).json()
Which gives the same output for me.
A while ago, I made a Python function which took the URL of an image and passed it to Imgur's API v2. Since I've been notified that the v2 API is going to be deprecated, I've attempted to remake it using API v3.
As they say in the Imgur API documentation:
[Sending] an authorization header with your client_id along with your requests [...] also works if you'd like to upload images anonymously (without the image being tied to an account). This lets us know which application is accessing the API.
Authorization: Client-ID YOURCLIENTID
It's unclear to me (especially with the italics they put) whether they mean that the header should be {'Authorization': 'Client-ID ' + clientID}, or {'Authorization: Client-ID ': clientID}, or {'Authorization:', 'Client-ID ' + clientID}, or some other variation...
Either way, I tried and this is what I got (using Python 2.7.3):
def sideLoad(imgURL):
    img = urllib.quote_plus(imgURL)
    req = urllib2.Request('https://api.imgur.com/3/image',
                          urllib.urlencode([('image', img),
                                            ('key', clientSecret)]))
    req.add_header('Authorization', 'Client-ID ' + clientID)
    response = urllib2.urlopen(req)
    return response.geturl()
This seems to me like it does everything Imgur wants me to do: I've got the right endpoint; passing data to urllib2.Request makes it a POST request according to the Python docs; I'm passing the image parameter with the form-encoded URL; and I also tried giving it my client secret as a POST parameter, since I got an error saying I need an ID (even though there is no mention of needing my client secret anywhere in the relevant documentation). I add the Authorization header and it seems to be in the right form, so... why am I getting an Error 400: Bad Request?
Side-question: I might be able to debug it myself if I could see the actual error Imgur returns, but because it returns an erroneous HTTP status, Python dies and gives me one of those nauseating stack traces. Is there any way I could have Python stop whining and give me the error message JSON that I know Imgur returns?
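For the side-question: in Python 2, urlopen() raises urllib2.HTTPError on a 4xx/5xx status, and that exception object is itself a file-like response whose body can be read. A minimal sketch of catching it, reusing the req object built above (the error body is simply whatever JSON Imgur actually returns):

import json
import urllib2

try:
    response = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # The exception doubles as a response object; its body holds Imgur's error JSON
    print(e.code)
    print(json.loads(e.read()))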
Well, I'll be damned. I tried taking out the encoding functions and just straight up forming the string, and I got it to work. I guess Imgur's API expects the non-form-encoded URL?
Oh... or was it because I used both quote_plus() and urlencode(), encoding the URL twice? That seems even more likely...
This is my working solution, at long last, for something that took me a day when I thought it'd take an hour at most:
def sideLoad(imgURL):
    img = urllib.quote_plus(imgURL)
    req = urllib2.Request('https://api.imgur.com/3/image', 'image=' + img)
    req.add_header('Authorization', 'Client-ID ' + clientID)
    response = urllib2.urlopen(req)
    response = json.loads(response.read())
    return str(response[u'data'][u'link'])
It's not a final version, mind you; it still lacks some testing (I'll see whether I can get rid of quote_plus(), or if it's perhaps preferable to use urlencode() alone) as well as error handling (especially for big GIFs, the most frequent case of failure).
I hope this helps! I searched all over Google, Imgur and Stack Overflow, and the information about anonymous usage of API v3 was confusing (and drowned in a sea of utterly horrifying OAuth2 stuff).
In Python 3.4, using urllib, I was able to do it like this:
import urllib.request
import json

opener = urllib.request.build_opener()
opener.addheaders = [("Authorization", "Client-ID " + yourClientId)]
jsonStr = opener.open("https://api.imgur.com/3/image/" + pictureId).read().decode("utf-8")
jsonObj = json.loads(jsonStr)
# jsonObj is a python dictionary of the imgur json response.
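For example, the direct image link can then be read out of that dictionary (assuming the same data/link layout used in the earlier answer):

print(jsonObj["data"]["link"])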