Django-imagekit, which I'm using to process user uploaded images on a social media website, uses an unacceptably high level of memory. I'm looking for ideas on how to get around this problem.
We are using django-imagekit to resize user-uploaded images into three predefined sizes and save the four copies (the 3 processed versions plus the original) into our Amazon S3 bucket.
This operation is quickly causing us to go over our memory limit on our Heroku dynos. On the django-imagekit github page, I've seen a few suggestions for hacking the library to use less memory.
I see three options:
Try to hack django-imagekit, and deal with the ensuing update problems from using a modified third party library
Use a different imaging processing library
Do something different entirely -- resize the images in the browser, perhaps? Or use a third-party service? Or...?
I'm looking for advice on which of these routes to take. In particular, if you are familiar with django-imagekit, or if you know of / are using a different image processing library in a Django app, I'd love to hear your thoughts.
Thanks a lot!
Clay
Try changing the image size with PIL from the console and see if the memory usage is acceptable. Image resizing is a simple task; I don't believe you need a separate application for it. Also, consider splitting your work into 3 tasks (one per image?).
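For reference, a minimal Pillow resize looks something like this (the filenames and target size are just placeholders):

    from PIL import Image

    # Open the uploaded original and produce one of the predefined sizes.
    # thumbnail() resizes in place and preserves the aspect ratio.
    img = Image.open("original.jpg")
    img.thumbnail((800, 600))
    img.save("resized_800x600.jpg", quality=85)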
TLDR: Is there a Python library that allows me to get an application window's frame as an image and write it back to that application?
So the whole story is that I want to write an application using Python that does something similar to Lossless Scaling and Magpie. I want to grab an application window (a video game window, for example), get the current frame as an image, use some machine learning/deep learning algorithm (like FSR or DLSS) to upscale that image, and then replace the application's current frame with the upscaled image.
So far, I have been playing around with some upscaling algorithms like the one from Real-ESRGAN, but now my main problem is how to upscale the video game images in real time. The only thing I have found that does something related to what I need is PyAutoGUI, but that package only lets you take screenshots of an application; it cannot write back to the application's graphics.
I hope I have clarified my problem; feel free to comment if you still have any questions.
Thank you for reading this post, and have a good day.
Doing this with Python is going to be very difficult. A lot of the performance involved in this sort of thing is in avoiding as many memory copies as possible, and Python's idiom for string and bytes processing unfortunately makes quite a few additional copies in the course of any idiomatic program. I say this as a die-hard Python fan who is constantly trying to cram Python in everywhere it doesn't belong: you'd be better off doing this in Rust.
Update: After receiving feedback from some folks with more direct experience in this area, I may have overstated the difficulty. Many ML tools in Python provide zero-copy access: you can easily access and manipulate memory-mapped data from numpy, and there is even a CUDA protocol for doing the same with data in GPU memory. So while it's not exactly easy, as long as your operations are implemented as numpy operations rather than pure-Python pixel-by-pixel logic, it shouldn't be much harder than other Python machine learning applications that require native APIs to access their source data.
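To illustrate the zero-copy point, here is a small sketch (the buffer and dimensions are made up) showing how numpy can wrap an existing frame buffer and modify it in place without copying:

    import numpy as np

    # Illustration only: stand-in for a captured BGRA frame buffer (1080p).
    height, width = 1080, 1920
    raw = bytearray(height * width * 4)

    # np.frombuffer creates a view over the existing memory -- no copy is made.
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)

    # Vectorized operations like this run in C; a pure-Python per-pixel loop
    # is where the real performance cost would come from.
    frame[..., :3] //= 2  # darken the colour channels in place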
However, there's no way to access framebuffer data directly from python, so step 1 is going to be writing your own bindings over the relevant DirectX APIs. Since Magpie is open source, you can see which APIs it's using, for example, in its various C++ "Frame Source" backends. For example, this looks relevant: https://github.com/Blinue/Magpie/blob/42cfcba1222b07e4cec282eaff639aead229f123/Runtime/GraphicsCaptureFrameSource.cpp#L87
You can then look those APIs up on MSDN; that one, for example, is here: https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool.createfreethreaded?view=winrt-22621
CFFI is a good choice for writing native wrappers: https://cffi.readthedocs.io/en/latest/
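As a rough idea of what CFFI's ABI mode looks like (the library and function here are just stand-ins; the real capture code would instead bind the relevant Direct3D/WinRT APIs):

    from cffi import FFI

    ffi = FFI()

    # Declare only the C signatures you need; cffi parses the declarations.
    ffi.cdef("""
        int abs(int x);
    """)

    # ABI mode: load a shared library at runtime and call straight into it.
    # Here it's just the Microsoft C runtime on Windows; the real capture
    # code would load the relevant graphics libraries instead.
    C = ffi.dlopen("msvcrt.dll")

    print(C.abs(-42))  # prints 42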
Gluing these together appropriately is left as an exercise for the reader :).
I am planning to create a file upload website where users register as members and then upload files through both a file upload form and an FTP account (each file can be up to 10 GB).
For each uploaded file, the member is given a link which he can share with other users. Unfortunately I am just an average Django coder/Linux user and have not worked on any similar project before.
Problem 1
The storage space used will potentially grow quickly to thousands of TBs; how do I optimise the server and its storage for this? Should I use a cloud service, or which type of hosting would be most suitable? How would you set up the infrastructure to make this run smoothly?
I was planning to run FreeBSD as the OS and Django/Python for development ...
Appreciate your input and all ideas!
From what you describe, I would start with a cloud service and see how the actual usage turns out. That might be the cheapest and most scalable version.
For setting things up, you have several options (surprise! :-) ). AFAIK, Amazon has some preconfigured images that might take you a long way. Since you're doing python, you could also look at Google and see how their services play together.
As you described yourself primarily as a coder, I would stay away from Puppet, Chef, Ansible and the like. While those are great tools, they add a layer of abstraction to managing the actual servers. I might be wrong, of course, and such tools may be just the help you need in order to set things up.
For many admin-tools, there are ready-to-use modules or templates that might help you achieve your goal.
As a simple battle-plan suggestion:
look at cloud-providers to determine which one suits you well.
look at tools to interact with the cloud-provider you are thinking about using.
try to find user-groups for the cloud-provider/admin tool you chose to learn more about them or get help from other people.
I am developing a rather large Python application (wxPython) that supports a data-analysis workflow. Performing all steps of the workflow can take quite a long time, and the user is not likely to do everything at once. More likely, he would prefer to do different parts of the processing at different points in time. It would therefore be very handy to be able to store the application's current status with some sort of "save project" functionality. Opening the application and loading a project file would set up the application as it was previously and allow one to continue where he/she left off last time.
However, I have a large number of objects to save, most of which carry attributes coming from wxPython. This causes pickle to fail with the following error:
TypeError: can't pickle PySwigObject objects
Does anyone have experience with this? What would be the best practice to obtain the required functionality? Are there libraries devoted to this?
Thank you.
wxPython is a wrapper around a C++ library known as wxWidgets, so you cannot use normal Python serialization to save its state. However, you can use the persist library to save most widgets' state: http://wxpython.org/Phoenix/docs/html/lib.agw.persist.html
I'm not sure when this library was added to wxPython, but I'm guessing it was with 2.9 or perhaps the latest version of 2.8. Otherwise you can probably find it in the latest version of 2.8's source.
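A rough, untested sketch of how the persist module is typically wired up, based on the linked docs (the frame name and file path are placeholders):

    import wx
    import wx.lib.agw.persist as persist

    class MainFrame(wx.Frame):
        def __init__(self):
            # Persisted windows need a stable name so their saved state can
            # be matched up between runs.
            wx.Frame.__init__(self, None, name="MainFrame", title="My App")

            self._persist_mgr = persist.PersistenceManager.Get()
            self._persist_mgr.SetPersistenceFile("app_state.ini")
            # Restore this window and any registered children once the frame
            # has finished constructing.
            wx.CallAfter(self._persist_mgr.RegisterAndRestoreAll, self)

            self.Bind(wx.EVT_CLOSE, self.on_close)

        def on_close(self, event):
            # Save widget state back to the file before the frame goes away.
            self._persist_mgr.SaveAndUnregister()
            event.Skip()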
As others have said, it's usually better to just save the process's state and then load that information back to the GUI when it's started.
I am using Python inside another application (CINEMA 4D) to create a nice connection to our issue tracker (Jira) inside the application. The rationale behind this is to make it really easy for our plugin users to report and track bugs, and to have things like machine specs, screenshots or scene file attachments (including textures) handled automatically.
So far it has been a really smooth ride and the integration is coming along great. I started grabbing the icons for issue priorities, projects, issue types, etc. from Jira as well, so they can be displayed for a better overview. To read the image files I am using CINEMA 4D functionality that is available inside its Python binding.
The problem now is, that most icons from Jira come in GIF format and the CINEMA 4D SDK doesn't read GIF files directly (actually it does read them, but only through a back door so users can load them, but I can't use that functionality through Python or the SDK). So I need another way to read the GIF files.
There are a few questions on stackoverflow that go in this direction, but they all seem to recommend PIL. This doesn't feel like the right solution, for a few reasons:
While that looks nice, it's not part of the standard distribution and seems to be really only maintained for Windows (even though there are builds for Mac OS X).
It also seems to install itself into the current system installation of Python, but CINEMA 4D comes with its own, so I'd have to rip it apart and distribute it with my plugin.
And then it is quite large, while I really only want a compact script for a compact solution (preferably something out of the box, but that doesn't seem to be an option).
I was wondering if there is a simpler or at least more compact way. Since GIF seems to be a relatively simple file format, I am wondering if there may even be a simple parser as a python function/class.
I found a link where somebody disassembles a GIF file's embedded frames, but doesn't actually grab the image contents: Python, how i can get gif frames
I'm fine with putting in some time on my own, and I would've already been coding away if the file format was something uncompressed, but I am a little reluctant since the compression seems to raise the bar slightly.
Is there any standard method of watermarking videos of some format in Python?
And how about still images?
I'd suggest checking out pyffmpeg or pymedia, but that's about as good as it gets. Try to find a way to leverage ffmpeg proper if you can.
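If you do shell out to ffmpeg, its overlay filter handles the watermarking; a minimal sketch from Python (filenames and the offset are placeholders):

    import subprocess

    # Burn a PNG watermark into a video with ffmpeg's overlay filter,
    # 10 pixels in from the top-left corner; the audio stream is copied as-is.
    subprocess.check_call([
        "ffmpeg",
        "-i", "input.mp4",
        "-i", "watermark.png",
        "-filter_complex", "overlay=10:10",
        "-codec:a", "copy",
        "output.mp4",
    ])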
For still images, simply use PIL, the Python Imaging Library.
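With PIL that's roughly a paste using the watermark's alpha channel as the mask (filenames and the offset are placeholders):

    from PIL import Image

    base = Image.open("photo.jpg").convert("RGBA")
    mark = Image.open("watermark.png").convert("RGBA")

    # Paste the watermark into the bottom-right corner, using its own alpha
    # channel as the mask so transparency is preserved.
    bw, bh = base.size
    mw, mh = mark.size
    base.paste(mark, (bw - mw - 10, bh - mh - 10), mark)

    base.convert("RGB").save("photo_watermarked.jpg")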
If you're looking for a robust (for-pay) service, I've had a very nice experience with Zencoder. The python api module is easy to use and fairly well documented.
Transloadit provides image & video conversion via web services; it works well and is very cheap. If you need to do this on a large scale and don't want to buy a bunch of hardware, they are great. Someone mentioned Zencoder. I don't have the experience to understand all the tradeoffs between Transloadit and Zencoder. However, in their current pricing models, Transloadit charges per GB of video and Zencoder charges per minute of video. If you are doing enough volume to worry about scalable pricing, then for the scenarios I've looked at, Transloadit is cheaper for smaller / lower-resolution videos. Perhaps obviously :)