How to send images in django api? - python

I'm trying to send my image from django server to app.There are two appoaches for above problem:
1)send url of the image
2)send encoded image
I'm not sure which one to use due to following:-
If i choose the first way. Json resoponse will be quicker but if a user returns a page twice, it will make two request on server. Moreever it would be impossible to cache the image.
In case of second approach, i can cache the image at client app, but does encoding increase overall response time from the server?
Which is the recommended approach for sending image from api.
Currently i'm sending image by following:-
return JsonResponse({'image':model.image.url})

The answer is approach 1. Encoding images will destroy your server response time unless they’re very tiny like thumbnails or avatars, even then I wouldn’t make a habit of it. I’ve seen apps rendered unusable by this practice. Most browsers cache images during sessions automatically. If server performance is a big concern and images are dragging it down, Generally I’ll store images in some kind of static file host though like s3 and use an edge cache anyway in real world environments.

You probably want to go with #1 and send over the URL. Usually you would not deliver the image files directly from your server but from a cloud storage like S3 or GCS. More advanced setups even include CDNs (Content Delivery Networks) like Fastly or Cloudfront to enable easy caching and serve traffic on a global scale.
If you should choose to send over the encoded image (in base64 for example), be aware that the increased response size of the body will increase the response times incredibly high and might even lead to complete timeouts at around 30s. Users will ultimately have longer response times, are more likely to churn and you are paying way more for your servers.

Related

How does a server detect whether sort of automated http request is being made

I was making some test http requests using Python's request library. When searching for Walmart's Canadian site (www.walmart.ca), I got this:
How do servers like Walmart's detect that my request is being made programatically? I understand browsers send all sorts of metadata to the server. I was hoping to get a few specific examples of how this is commonly done. I've found a similar question, albeit related to Selenium Web Driver, here where it claims that there are some vendors that provide this service but I was hoping to get something a bit more specific.
Appreciate any insights, thanks.
As mentioned in the comments, a real browsers send many different values - headers, cookies, data. It reads from server not only HTML but also images, CSS, JS, fonts. Browser can also run JavaScript which can get other information about browser - version, extensions, data in local storage, etc (i.e how you move mouse). And real human loads/visits pages with random delays and in rather in random order. And all these elements can be used to detect a script. Servers may use very complex systems even Machine Learning (Artificial Intelligence) and use data from few mintues or hours to compare your behavior.

Slow access to Django's request.body

Sometimes this line of Django app (hosted using Apache/mod_wsgi) takes a lot of time to execute (eg. 99% of eg. 6 seconds of request handling, as measured by New Relic), when submitted by some mobile clients:
raw_body = request.body
(where request is an incoming request)
The questions I have:
What could have slowed down access to request.body so much?
What would be the correct configuration for Apache to wait before invoking Django until client sends whole payload? Maybe the problem is in Apache configuration.
Django's body attribute in HttpRequest is a property, so that really resolves on what is really being done there and how to make it happen outside of the Django app, if possible. I want Apache to wait for full request before sending it to Django app.
Regarding (1), Apache passes control to the mod_wsgi handler as soon as the request's headers are available, and mod_wsgi then passes control on to Python. The internal implementation of request.body then calls the read() method which eventually calls the implementation within mod_wsgi, which requests the request's body from Apache and, if it hasn't been completely received by Apache yet, blocks until it is available.
Regarding (2), this is not possible with mod_wsgi alone. At least, the hook processing incoming requests doesn't provide a mechanism to block until the full request is available. Another poster suggested to use nginx as a proxy in a response to this duplicate question.
There are two ways you can fix this in Apache.
You can use mod_buffer, available in >=2.3, and change BufferSize to the maximum expected payload size. This should make Apache hold the request in memory until it's either finished sending, or the buffer is reached.
For older Apache versions < 2.3, you can use mod_proxy combined with ProxyIOBufferSize, ProxyReceiveBufferSize and a loopback vhost. This involves putting your real vhost on a loopback interface, and exposing a proxy vhost which connects back to the real vhost. The downside to this is that it uses twice as many sockets, and can make resource calculation difficult.
However, the most ideal choice would be to enable request/response buffering at your L4/L7 load balancer. For example, haproxy lets you add rules based on req_len and same goes for nginx. Most good commercial load balancers also have an option to buffer requests before sending.
All three approaches rely on buffering the full request/response payload, and there are performance considerations depending on your use case and available resources. You could cache the entire payload in memory but this may dramatically decrease your maximum concurrent connections. You could choose to write the payload to local storage (preferably SSD), but you are then limited by IO capacity.
You also need to consider file uploads, because these are not a good fit for memory based payload buffering. In most cases, you would handle upload requests in your webserver, for example HttpUploadModule, then query nginx for the upload progress, rather than handling it directly in WSGI. If you are buffering at your load balancer, then you may wish to exclude file uploads from the buffering rules.
You need to understand why this is happening, and that this problem exists both when sending a response and receiving a request. It's also a good idea to have these protections in place, not just for scalability, but for security reasons.
I'm afraid the problem could be in the amount of data you are transferring and possibly a slow connection. Also note that upload bandwidth is typically much less than download bandwidth.
As already pointed out, when you use request.body Django will wait for the whole body to be fully transferred from the client and available in-memory (or on disk, according to configurations and size) on the server.
I would suggest you to try what happens with the same request if the client is connected to a WiFi access point which is wired to the server itself, and see if it improves grately. If this is not possible, perhaps just run a tool like speedtest.net on the client, get the request size and do the math to see how much time it would require theoretically (I'd expect the mesured time to be more or less 20% more). Be careful that network speed is often mesured in bits per second, while file size is mesured in Bytes.
In some cases, if a lot of processing is needed on the data, it may be convinient to read() the request and do computations on-the-go, or perhaps directly pass the request object to any function that can read from a so-called "file-like object" instead of a string.
In your specific case, however, I'm afraid this would only affect that 1% of time that is not spent in receiving the body from the network.
Edit:
Sorry, ony now I've noticed the extra description in the bounty. I'm afraid I can't help you but, may I ask, what is the point? I'd guess this would only save a tiny bit of server resources for keeping a python thread idle for a while, without any noticable performance gain on the request...
Looking at the Django source, it looks like what actually happens when you call request.body is the the request body is loaded into memory by being read from a stream.
https://github.com/django/django/blob/stable/1.4.x/django/http/init.py#L390-L392
It's likely that if the request is large the time being taken is actually just loading it into memory. Django has methods on the request to handle acting on the body as a stream, which depending on what exactly the content being consumed is could allow you to process the request more efficiently.
https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read
You could for example read one line at a time.

Google App Engine: upload file and other fields in the same request

I want to upload an image to the blobstore, because I want to support files larger than 1MB. Now, the only way I can find is for the client to issue a POST where it sends the metadata, like geo-location, tags, and what-not, which the server puts in an entity. In this entity the server also puts the key of a blob where the actual image data is going to be stored, and the server concludes the request by returning to the client the url returned by create_upload_url(). This works fine, however I can get inconsistency, such as if the second request is never issued, and hence the blob is never filled. The entity is now pointing to an empty blob.
The only solution to this problem I can see is to trigger a deferred task which is going to check whether the blob was ever filled with an upload. I'm not a big fan of this solution, so I'm guessing if anybody has a better solution in mind.
I went through exactly the same thought process, but in Java, and ended up using Apache Commons FileUpload. I'm not familiar with Python, but you'll just need a way of handling a multipart/form-data upload.
I upload the image and my additional fields together, using JQuery to assemble the multipart form data, which I then POST to my server.
On the server side I then take the file and write it to Google Cloud Storage using the Google Cloud Storage client library (Python link). This can be done in one chunk, or 'streamed' if it's a large file. Once it's in GCS, your App Engine app can read it using the same library, or you can serve it directly with a public URL, depending on the ACL you set.

How do you restrict large file uploads in wsgi?

I'm trying to get an understanding of the best way of handling file uploads safely in a wsgi app. It seems a lot of solutions involve using FieldStorage from the cgi module to parse form data. From what I understand about FieldStorage it performs a bit of 'magic' behind the scenes by streaming data into a tempfile.
What I'm not 100% clear on is how to restrict a request containing a file greater than a specified amount (say 10MB). If someone uploads a file which is several GB in size you obviously want to block the request before it chews through your server's disk space right?
What is the best way to restrict file uploads in a wsgi application?
It would depend on your front-end server. If it has any configuration to block big request even before it goes into your app, use it.
If you want to block this with your code I see two approaches:
Look ate the Content-Length HTTP Header. If it's bigger than you can handle, deny the request right away.
Don't trust the headers and start reading the request body, until you reach your limit. Note that this is not a very clever way, but could work. =)
Trusting the HTTP header could lead you to some problems. Supose some one send a request with a Content-Length: 1024 but sends a 1GB request body. If your front-end server trusts the header, it will start do read this request and would find out later that the request body is actually much bigger that it should be. This situation could still fill your server disk, even being a request that "passes" the "too big check".
Although this could happen, I think trusting the Header would be a good start point.
You could use the features of the HTTP server you probably have in front of your WSGI application. For example lighttpd has many options for traffic shaping.

Writing a Python Music Streamer

I would like to implement a server in Python that streams music in MP3 format over HTTP. I would like it to broadcast the music such that a client can connect to the stream and start listening to whatever is currently playing, much like a radio station.
Previously, I've implemented my own HTTP server in Python using SocketServer.TCPServer (yes I know BaseHTTPServer exists, just wanted to write a mini HTTP stack myself), so how would a music streamer be different architecturally? What libraries would I need to look at on the network side and on the MP3 side?
The mp3 format was designed for streaming, which makes some things simpler than you might have expected. The data is essentially a stream of audio frames with built-in boundary markers, rather than a file header followed by raw data. This means that once a client is expecting to receive audio data, you can just start sending it bytes from any point in an existing mp3 source, whether it be live or a file, and the client will sync up to the next frame it finds and start playing audio. Yay!
Of course, you'll have to give clients a way to set up the connection. The de-facto standard is the SHOUTcast (ICY) protocol. This is very much like HTTP, but with status and header fields just different enough that it isn't directly compatible with Python's built-in http server libraries. You might be able to get those libraries to do some of the work for you, but their documented interfaces won't be enough to get it done; you'll have to read their code to understand how to make them speak SHOUTcast.
Here are a few links to get you started:
https://web.archive.org/web/20220912105447/http://forums.winamp.com/showthread.php?threadid=70403
https://web.archive.org/web/20170714033851/https://www.radiotoolbox.com/community/forums/viewtopic.php?t=74
https://web.archive.org/web/20190214132820/http://www.smackfu.com/stuff/programming/shoutcast.html
http://en.wikipedia.org/wiki/Shoutcast
I suggest starting with a single mp3 file as your data source, getting the client-server connection setup and playback working, and then moving on to issues like live sources, multiple encoding bit rates, inband meta-data, and playlists.
Playlists are generally either .pls or .m3u files, and essentially just static text files pointing at the URL for your live stream. They're not difficult and not even strictly necessary, since many (most?) mp3 streaming clients will accept a live stream URL with no playlist at all.
As for architecture, the field is pretty much wide open. You have as many options as there are for HTTP servers. Threaded? Worker processes? Event driven? It's up to you. To me, the more interesting question is how to share the data from a single input stream (the broadcaster) with the network handlers serving multiple output streams (the players). In order to avoid IPC and synchronization complications, I would probably start with a single-threaded event-driven design. In python 2, a library like gevent will give you very good I/O performance while allowing you to structure your code in a very understandable way. In python 3, I would prefer asyncio coroutines.
Since you already have good python experience (given you've already written an HTTP server) I can only provide a few pointers on how to extend the ground-work you've already done:
Prepare your server for dealing with Request Headers like: Accept-Encoding, Range, TE (Transfer Encoding), etc. An MP3-over-HTTP player (i.e. VLC) is nothing but an mp3 player that knows how to "speak" HTTP and "seek" to different positions in the file.
Use wireshark or tcpdump to sniff actual HTTP requests done by VLC when playing an mp3 over HTTP, so you know how what request headers you'll be receiving and implement them.
Good luck with your project!
You'll want to look into serving m3u or pls files. That should give you a file format that players understand well enough to hit your http server looking for mp3 files.
A minimal m3u file would just be a simple text file with one song url per line. Assuming you've got the following URLs available on your server:
/playlists/<playlist_name/playlist_id>
/songs/<song_name/song_id>
You'd serve a playlist from the url:
/playlists/myfirstplaylist
And the contents of the resource would be just:
/songs/1
/songs/mysong.mp3
A player (like Winamp) will be able to open the URL to the m3u file on your HTTP server and will then start streaming the first song on the playlist. All you'll have to do to support this is serve the mp3 file just like you'd serve any other static content.
Depending on how many clients you want to support you may want to look into asynchronous IO using a library like Twisted to support tons of simultaneous streams.
Study these before getting too far:
http://wiki.python.org/moin/PythonInMusic
Specifically
http://edna.sourceforge.net/
You'll want to have a .m3u or .pls file that points at a static URI (e.g. http://example.com/now_playing.mp3) then give them mp3 data starting wherever you are in the song when they ask for that file. Probably there are a bunch of minor issues I'm glossing over here...However, at least as forest points out, you can just start streaming the mp3 data from any byte.

Categories