I'm downloading a JSON file in Python 3 with Requests:
data = requests.get(addr).text
But some characters at the end sometimes get cut off. For example, only 4570 of 4630 characters were in the string. What is the reason for this behaviour? Spamming F5 in the browser does not reproduce the problem; it only happens in the script.
EDIT:
OK, I see that the server sometimes responds with HTTP 304 (Not Modified) and the Content-Length header is missing. Is there any way to read the whole response without this header?
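One way to sidestep the missing Content-Length is to stream the raw body and decode it yourself instead of relying on .text. A minimal sketch; the Cache-Control header and the fallback encoding are assumptions, not a confirmed fix:
import requests

# Ask for a fresh copy and read the body in chunks, so we do not rely on a
# declared Content-Length or on .text's encoding guess.
resp = requests.get(addr, headers={"Cache-Control": "no-cache"},
                    stream=True, timeout=30)
resp.raise_for_status()

raw = b"".join(resp.iter_content(chunk_size=8192))
data = raw.decode(resp.encoding or "utf-8")
print(len(data))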
I have two Python apps running on separate ports using web.py. I am trying to send JSON strings in the range of 30,000-40,000 characters from one app to another. The JSON contains all the information necessary to generate a PowerPoint report. I tried enabling this communication using requests like so:
import requests
template = <long JSON string>
url = 'http://0.0.0.0:6060/api/getPpt?template={}'.format(template)
resp = requests.get(url).text
I notice that on the receiving end the JSON has been truncated to 803 characters, so when it decodes the JSON I get:
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 780 (char 779)
I assume this has to do with a limitation on how long a URL request can be, coming from either web.py or requests, or that this is a standardised thing. Is there a way around this, or do I need to find another way of enabling communication between these two Python apps? If sending such long JSONs via HTTP isn't possible, could you please suggest alternatives. Thanks!
Do not put that much data into a URL. Most browsers limit the total length of a URL (including the query string) to about 2000 characters, servers to about 8000.
See What is the maximum length of a URL in different browsers?, which quotes the HTTP/1.1 standard, RFC7230:
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.
You need to send that much data in the request body instead. Use POST or PUT as the method.
The requests library itself does not place any limits on the URL length; it sends the URL to the server without truncating it. It is your server that has truncated it here, instead of giving you a 414 URI Too Long status code.
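For example, a minimal sketch of sending the template in the body with POST; it assumes the receiving web.py handler is changed to read the request body (e.g. via web.data()) rather than a query parameter:
import requests

# template is the long JSON string from the question; it now travels in the
# request body, so URL length limits no longer apply.
resp = requests.post('http://0.0.0.0:6060/api/getPpt',
                     data=template,
                     headers={'Content-Type': 'application/json'},
                     timeout=30)
print(resp.status_code)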
I am trying to use requests to pull information from the NPI API but it is taking on average over 20 seconds to pull the information. If I try and access it via my web browser it takes less than a second. I'm rather new to this and any help would be greatly appreciated. Here is my code.
import json
import sys
import requests
url = "https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip="
htmlfile = requests.get(url)
data = htmlfile.json()
for i in data["results"]:
    print(i)
This might be due to the response being incorrectly formatted, or due to requests taking longer than necessary to set up the request. To solve these issues, read on:
Server response formatted incorrectly
A possible issue might be that parsing the response is actually the offending step. You can check this by not reading the response you receive from the server. If the code is still slow, this is not your problem, but if this fixes it, the problem might lie with parsing the response.
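For example, a quick sketch that times the download and the parsing separately (url is the NPI URL from the question; the split is just a diagnostic, not a fix):
import time
import requests

t0 = time.time()
r = requests.get(url)   # network round trip plus body download
t1 = time.time()
data = r.json()         # parsing only
t2 = time.time()
print("request: %.2fs, parse: %.2fs" % (t1 - t0, t2 - t1))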
In case some headers are set incorrectly, this can lead to parsing errors which prevent chunked transfer (source).
In other cases, setting the encoding manually might resolve parsing problems (source).
To fix those, try:
r = requests.get(url)
r.raw.chunked = True  # Fix issue 1
r.encoding = 'utf-8'  # Fix issue 2
print(r.text)
Setting up the request takes a long time
This is mainly applicable if you're sending multiple requests in a row. To prevent requests from having to set up the connection each time, you can use a requests.Session. This keeps the connection to the server open and configured, and also persists cookies as a nice benefit. Try this (source):
import requests

session = requests.Session()
for _ in range(10):
    session.get(url)
Didn't solve your issue?
If that did not solve your issue, I have collected some other possible solutions here.
I have some Ring routes which I'm running in one of two ways:
lein ring server, with the lein-ring plugin
using org.httpkit.server, like (hs/run-server app {:port 3000})
It's a web app (being consumed by an Angular.js browser client).
I have some API tests written in Python using the Requests library:
my_r = requests.post(MY_ROUTE,
                     data=MY_DATA,
                     headers={"Content-Type": "application/json"},
                     timeout=10)
When I use lein ring server, this request works fine in the JS client and the Python tests.
When I use httpkit, this works fine in the JS client but the Python client times out with
socket.timeout: timed out
I can't figure out why the Python client is timing out. It happens with httpkit but not with lein-ring, so I can only assume that the cause is related to the difference.
I've looked at the traffic in WireShark and both look like they give the correct response. Both have the same Content-Length field (15 bytes).
I've raised the number of threads to 10 (shouldn't need to) and no change.
Any ideas what's wrong?
I found how to fix this, but no satisfactory explanation.
I was using wrap-json-response Ring middleware to take a HashMap and convert it to JSON. I switched to doing my own conversion in my handler with json/write-str, and this fixes it.
At a guess it might be something to do with the server handling output buffering, but that's speculation.
I've combed through the Wireshark dumps and I can't see any relevant differences between the two. The sent Content-Length fields are identical. The 'bytes in flight' differ, at 518 and 524.
I have no clue as to why the web browser was happy with this but Python Requests wasn't, or whether this is a bug in Requests, httpkit, ring-middleware-format or my own code.
I wonder how I can send a cookie and the PHPSESSID with urllib2 in Python?
Actually, I want to read a page I've logged into with my browser, but when I try to read it with this script I get text which seems to say that something is missing.
My script :
#!/usr/bin/python
import urllib2
f = urllib2.urlopen('http://mywebsite.com/sub/create.php?id=20')
content = f.read()
file = open('file.txt', 'w')
file.write(content)
file.close()
The error message I save instead of the real page:
Warning: session_start() [function.session-start]: Cannot send session cookie - headers already sent by (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/app/mywidgets.php on line 23
Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/app/mywidgets.php on line 23
Warning: Cannot modify header information - headers already sent by (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/lib/webservice.php on line 0
What is the exact problem? (Please give me a simple way to implement what I want.)
Thanks in advance
For the SID, one of the ways to send that is as part of the query string, and you're already doing that. At least I assume that's what the id=20 part of your URL is.
For cookies, everything you want is in cookielib.
Just creating a CookieJar to use for a session with the server is trivial. If you want to import cookies from your browser, there are three possibilities:
If your browser uses the old Netscape cookie file format, you can use FileCookieJar.
If your browser uses a sqlite database (as at least Firefox and Safari/Chrome do), use the sqlite3 module to read it, and populate a CookieJar manually.
If worst comes to worst, copy and paste the cookies from your browser into your script as hardcoded strings and populate a CookieJar manually.
If you don't want to read the docs on how to use cookielib, just see the examples at the end, which show how to use a CookieJar with urllib2, which is exactly what you want to do.
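For illustration, a minimal sketch along those lines; the login URL is a placeholder, so adapt it to however the site actually establishes the session:
import urllib2
import cookielib

# An opener that stores and resends cookies across requests.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# First request: the server's session cookie (e.g. PHPSESSID) ends up in cj.
opener.open('http://mywebsite.com/login.php')

# Later requests through the same opener send those cookies back automatically.
content = opener.open('http://mywebsite.com/sub/create.php?id=20').read()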
If you have a problem, read the docs.
Meanwhile, what you're showing us are (a) warnings, not errors, and (b) obviously a problem on the server side, not your script. The server should never be spewing out a bunch of warnings and an otherwise-blank page. If you, or one of your coworkers, is responsible for the server code, that needs to be fixed first (and your current simple Python script can serve as a great regression test case).
How can my python cgi return a specific http status code, such as 403 or 418?
I tried the obvious (print "Status:403 Forbidden") but it doesn't work.
print 'Status: 403 Forbidden'
print
Works for me. You do need the second print though, as you need a double-newline to end the HTTP response headers. Otherwise your web server may complain you aren't sending it a complete set of headers.
sys.stdout.write('Status: 403 Forbidden\r\n\r\n')
may be technically more correct, according to RFC (assuming that your CGI script isn't running in text mode on Windows). However both line endings seem to work everywhere.
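Put together, a minimal sketch of a CGI script that returns a 403 this way (the plain-text body is just an illustration):
#!/usr/bin/env python
import sys

# The Status pseudo-header must come before the blank line that ends the headers.
sys.stdout.write("Status: 403 Forbidden\r\n")
sys.stdout.write("Content-Type: text/plain\r\n")
sys.stdout.write("\r\n")
sys.stdout.write("Forbidden\n")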
I guess you're looking for send_error. It is located in http.server in Python 3.
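That only applies if you are running the server yourself with http.server (a BaseHTTPRequestHandler subclass) rather than as a plain CGI script; a minimal sketch, with the port chosen arbitrarily:
from http.server import BaseHTTPRequestHandler, HTTPServer

class ForbiddenHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # send_error writes the status line, headers and a small HTML error body.
        self.send_error(403, "Forbidden")

if __name__ == "__main__":
    HTTPServer(("", 8000), ForbiddenHandler).serve_forever()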