Integration testing and images

I'm writing an app that converts different images to JPG. It operates over a complex directory structure. There, a directory may include other directories, image files (JPG, GIF, PNG, TIFF), PDF files, RAR/ZIP archives, which in turn may include anything of the above. The app finds everything that can be converted to an image and places the resulting JPGs into a separate folder.
How do I write integration tests to test the conversion of images? Specifically, how should I fake the complex directory structure with all the files?
Currently I just store a sample directory structure, which I manually assembled out of various image, PDF and archive files, in a tests/ directory. In a setUp method I put this sample directory in place of the actual data and run the code. I had an idea to generate all these sample files myself (generate JPGs via Imagemagick, for example), but it proved hard.
How is integration testing on images usually done?

Do you write your own library to convert images, or do you just use an existing library? In the latter case you simply do not test the conversion itself; the author has already tested it somehow. You just need to create an abstraction layer between your code and the image library you use. Then you can simply check that your code calls the library with the desired parameters.
If you really insist on testing pictures, then you need to either make the transformation deterministic (and compare the actual result with the expected result) or make the comparison a bit less strict (anything from ignoring date fields to running OCR on the image).
Testing files is way easier (you do not need probability-based OCR). Check that your program placed all files in the expected location.
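
A minimal sketch of both suggestions, under stated assumptions: `convert_tree`, `build_sample_tree` and the `backend` parameter are hypothetical names made up for illustration, not part of the asker's app or of any library. The sample tree is generated in a temporary directory with placeholder files, the image library is replaced by a `unittest.mock.Mock`, and the check only looks at which output files the backend was asked to produce.

```python
import os
import tempfile
import zipfile
from unittest import mock

IMAGE_EXTS = {".jpg", ".gif", ".png", ".tiff"}

def convert_tree(src_dir, out_dir, backend):
    """Hypothetical code under test: walk src_dir and hand every image to backend."""
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() in IMAGE_EXTS:
                backend(os.path.join(root, name), os.path.join(out_dir, stem + ".jpg"))

def build_sample_tree(base):
    """Create a small nested tree; file contents are placeholders, not real images."""
    os.makedirs(os.path.join(base, "nested"))
    for rel in ("a.png", os.path.join("nested", "b.gif"), "notes.txt"):
        with open(os.path.join(base, rel), "wb") as fh:
            fh.write(b"placeholder")
    # An archive fixture can be generated the same way instead of being stored.
    with zipfile.ZipFile(os.path.join(base, "pack.zip"), "w") as zf:
        zf.writestr("inner/c.png", b"placeholder")

def run_example():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        out = os.path.join(tmp, "out")
        os.makedirs(src)
        build_sample_tree(src)
        backend = mock.Mock()  # stands in for the real image library
        convert_tree(src, out, backend)
        # Collect only the output file names the backend was asked to produce.
        return sorted(os.path.basename(c.args[1]) for c in backend.call_args_list)
```

This toy `convert_tree` does not unpack archives, so `pack.zip` is ignored here; a real test would assert on whatever the app is expected to do with it.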

Related

How to insert vector graphics (SVG) to a raster image such as JPG/TIF programmatically (Python/JS/C++)?

I need a programmatic way to embed a clipping path (saved as an SVG file) in another image such as a JPG/TIF/PSD. Using a tool such as Photoshop, this can be done easily and the path is inserted into the image's 8BIM profile, but there seems to be no way to do it programmatically. ImageMagick allows you to extract a vector image, for example with the following command:
identify -format "%[8BIM:1999,2998:#1]" test.jpg > test.svg
But it seems not possible to do the reverse operation and add a vector image. Can anyone suggest any libraries which allow this operation?

It's a bit more code than I feel like writing for the moment, but it should be possible to put an 8BIM into a JPEG using the following information.
The anatomy of a JPEG is described here and here.
You can use PIL or OpenCV to encode a JPEG into a memory buffer and then locate and modify/add segments (such as an 8BIM) using code like this. Or you could just read() in an existing JPEG that you want to modify. To insert a segment, just write the first few segments to disk, then write your new one followed by the remaining segments from the existing file that you read at the start.
You can construct an 8BIM segment to insert using this answer.
You can use exiftool -v -v -v to see where an 8BIM appears in a JPEG created by Photoshop and then put yours in a similar place. You can also, obviously, equally use exiftool to see where/how your own attempt has landed.
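
The insertion step can be sketched roughly as follows. This is a hedged illustration, not the code the answer links to: the function names are invented here, the resource block layout (`8BIM` signature, 2-byte ID, even-padded Pascal name, 4-byte size, even-padded data) and the `Photoshop 3.0` APP13 header reflect my understanding of the Photoshop resource format, and the path data itself is treated as opaque bytes that you must build separately.

```python
import struct

def build_8bim_block(resource_id, data, name=b""):
    """Pack one Photoshop image resource block; `data` is treated as opaque bytes."""
    pascal = bytes([len(name)]) + name
    if len(pascal) % 2:
        pascal += b"\x00"  # the name field is padded to an even length
    body = data + (b"\x00" if len(data) % 2 else b"")  # data is padded too
    return (b"8BIM" + struct.pack(">H", resource_id) + pascal
            + struct.pack(">I", len(data)) + body)

def insert_app13(jpeg, blocks):
    """Return a copy of `jpeg` with an APP13 segment placed after the leading APPn segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    payload = b"Photoshop 3.0\x00" + blocks
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("payload too large for a single segment")
    pos = 2
    # Skip existing APP0..APP15 segments (markers 0xFFE0..0xFFEF).
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length  # the length field counts itself but not the marker
    segment = b"\xff\xed" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + segment + jpeg[pos:]
```

After writing the result to disk, `exiftool -v -v -v` on the new file is a reasonable way to check that the segment landed where a Photoshop-written one would.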

Displaying lots of png files to a webpage

Hopefully this question won't be asking for too much and is understandable; any help would be amazing. I am currently doing [astronomy] research, and I am required to construct a webpage of quasar spectra that looks like this: Sample of final product
This is to be done by downloading each individual spectrum from this source: https://data.sdss.org/sas/dr13/eboss/spectro/redux/images/v5_9_0/v5_9_0/3590-55201/.
The problem is that I am struggling to find a way to download large quantities of png files all at once. For some reason, none of the spectra at this link have their coordinates (right ascension and declination) in the file name, whereas the code provided to me as an example assumes they do.
If I have the png "00:14:53.206-09:12:17.70-4536-55857-0770.png" downloaded, it should be displayed. However, as mentioned before, none of the files I have seen when trying to do this myself include the coordinates. My page shows the raw code and no actual images, because it cannot load spectra that have not been downloaded, and I would prefer to have them sorted by their coordinates.
Downloading a FITS file which contains the quasar catalog was suggested to me. Presumably, the coords would in some way have to be appended to the names of the downloaded png files. Apparently this is all supposed to be easy.
In summary: how do I download large quantities of png files whose names do not include their coordinates? I also need a method of renaming the image files so that their file names correspond to the coordinates, and then display them on a webpage.

When displaying images on a website (regardless of where you sourced the images from, or the format, jpg/png etc.), it is advisable to compress your images. This is especially true when the images are big and when there are many images on the page (pages like yours!). There are online image compressors like tinypng (where you can upload ~30 images at a time, and which compresses both jpgs and pngs) and command-line tools like pngcrush.
Compressing images this way will reduce the file size (greatly in some cases) while the image appears the same. This will very much improve the load time of your site.
When you download a file (any file, not just an image file), you can save it under any name you want, so you can rename the files on download. You will need to upload all the [preferably compressed] images to a web server in order to display them on a webpage. If you don't know ANY web scripting, start by learning basic html (you won't need a lot for this project), but the best way to display the images would probably be to loop through the image folder using either javascript or php.
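
The answer suggests javascript or php for the loop; the same idea can be sketched in Python by generating a static page once, which is an alternative swapped in here rather than what the answer proposes. `build_gallery` is a hypothetical helper name, and the sketch assumes the png files have already been downloaded and renamed into one folder. File names are URL-quoted in `src` because names containing colons (such as the coordinate-based one in the question) would otherwise be misread as a URL scheme.

```python
import html
from pathlib import Path
from urllib.parse import quote

def build_gallery(image_dir, out_file="index.html"):
    """Write an HTML page with one <img> per PNG in image_dir, sorted by file name."""
    image_dir = Path(image_dir)
    names = sorted(p.name for p in image_dir.glob("*.png"))
    rows = [
        '<figure><img src="{src}" alt="{label}"><figcaption>{label}</figcaption></figure>'.format(
            src=quote(name),                      # percent-encode characters such as ':'
            label=html.escape(name, quote=True),  # safe to show as text
        )
        for name in names
    ]
    page = "<!DOCTYPE html>\n<html><body>\n" + "\n".join(rows) + "\n</body></html>\n"
    (image_dir / out_file).write_text(page, encoding="utf-8")
    return names
```

Because the files are sorted by name, naming them by right ascension and declination gives the coordinate ordering the question asks for, as long as the coordinate strings are zero-padded consistently.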

Custom file structure to save multiple images in python

I am experimenting with packaging of data, and since most of my data is stored as images, graphs and similar data, I am looking for a more efficient way to store these images.
I have read about saving them in a DB as blobs, and others are more inclined to save them in the file system, but what I would like is for the images not to be visible outside the application. This is essential because when I run analysis on instruments, I am not interested in showing users all the images, only the ones related to their particular instrument.
Plus, it is convenient to pack data in one single file, compared to a folder with 20-30 images in it.
I was thinking of storing the images in a custom structure, a sort of bin file, using python, unless there is something that already covers that functionality. In my search I didn't notice any specific structure for saving images; the most common solutions were either a folder in the file system or the DB approach.

If you can convert your images to raster arrays, you can store them in an HDF5 file: Add raster image to HDF5 file using h5py

"logging" images

I'm writing a scientific program that has some intermediate results (plots and images) that I'd like to log (in addition to the usual text messages).
I like python's logging interface a lot, so I'm wondering if there is a possibility to use it to create log files that include images.
The first idea that came to my mind was creating the log file as an SVG, so the log text is machine readable and the images can be included easily.
Is there a better approach to make this possible?

You could use SVG, but I'm not sure how compact the SVG would be since it would probably (in general) store the bitmap rather than vector information. An alternative would be to base64-encode the image and store it using a structured format, as documented here - the linked example uses JSON, which might be handy to e.g. store metadata about the image, but you could use a simpler scheme if all you're storing is the image and the format is always the same.
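
A minimal sketch of the base64-plus-JSON idea using only the standard library. The class name `JsonImageFormatter`, the logger name, and the `image` / `image_format` keys passed through `extra` are assumptions chosen for this example, not something prescribed by the `logging` module or by the linked documentation.

```python
import base64
import io
import json
import logging

class JsonImageFormatter(logging.Formatter):
    """Format each record as one JSON line; an optional `image` attribute is base64-encoded."""

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        image = getattr(record, "image", None)  # set via extra={"image": ...}
        if image is not None:
            entry["image_b64"] = base64.b64encode(image).decode("ascii")
            entry["image_format"] = getattr(record, "image_format", "png")
        return json.dumps(entry)

def make_logger(stream):
    """Build a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("image_log_example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonImageFormatter())
    logger.addHandler(handler)
    return logger

stream = io.StringIO()
log = make_logger(stream)
# Placeholder bytes; in practice this would be the encoded plot or image.
log.info("intermediate plot", extra={"image": b"\x89PNG placeholder bytes"})
log.info("plain text message")
```

Each line of the log is then a self-contained JSON object, so a reader can decode `image_b64` back to the original bytes and ignore entries that carry no image.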

Checking if an image format is Lossless in Python?

I am working on an application that requires images submitted to it to be lossless. Currently I am opening the image with PIL and checking if the "format" attribute is a lossless format. This requires me to manually keep a list of formats, and I have no idea if, for instance, a jpeg that was submitted just happens to have the lossless variant applied.
import PIL
import PIL.Image

def validate_image(path):
    # Open the file and reject it if its format is not on the lossless list.
    img = PIL.Image.open(path)
    if img.format.lower() not in ['bmp', 'gif', 'png', ...]:
        raise Exception("File %s has invalid image format %s" % (path, img.format))
Is there a better way to check if the image file is lossless?

I think I now understand things: You want to open the images via PIL. You want to reject lossy images because you're doing scientific processing of some kind that needs all that lost data because information that's unimportant for human visual processing is important for your algorithms.
PIL does not have any kind of interface at the top level to distinguish different types of compression. You could reach inside the image decoders and assume that anything that uses the "raw" decoder is lossless, but even if you wanted to do that, that's too limited—it'll rule out GIF, LZW-compressed TIFF, etc. along with JPEG, JPEG-compressed TIFF, etc.
Keep in mind that the real problem here is messaging and documentation, that is, managing user expectations. The check for lossy images is really just a heuristic, a way to catch the more obvious mistakes and remind the user what the requirements are. So, you don't need something perfect, but having something pretty good may be helpful anyway.
So, there are only a few options, none of them very good:
1. Hack up PIL's decoder source to retain the encoding information and pass it up to the top level. This is, obviously, going to take some non-trivial work, in 30 different importers, possibly involving C as well as Python, and it will result in a patch that you have to maintain against a (slowly-)evolving codebase, although of course you can always submit it upstream and hope that it makes it into future versions of PIL.
2. Dig into the decoders themselves to get the information at runtime. The only semi-standard thing you can really find is whether they use the raw decoder or the bit decoder, which isn't useful at all (many lossless formats will need the bit decoder), so you'll probably end up reading all 30 importers and writing a dozen or so pieces of code to extract information from them.
3. Use another library along with (or in place of) PIL. For example, while ImageMagick is definitely not significantly easier than PIL, it does have an API to tell you what type of compression an image file uses. Basically, if it's UndefinedCompression or JPEGCompression it's lossy, anything else, it's lossless. The major downside (besides needing to install two image libraries) is that there will be files that PIL can open but IM can't, and vice-versa, and multi-image files that PIL and IM handle differently, and so on.
4. Do what you're already doing. Read through the 30 importers to make a list of which are lossy and which are lossless. To handle cases like JPEG and TIFF that are sometimes lossless, you may want to write code that doesn't flat-out reject them, but instead gives a warning saying "These files may be lossy. Are you sure you want to import them?" (Or, alternatively, just offer an "I know what I'm doing" override for all lossy formats, and then just consider JPEG and TIFF lossy.)
For many use cases, I'd be very wary of going with #4, but for yours, it actually seems pretty reasonable. You're not trying to block lossy images because your code will crash, or for security reasons, or anything like that; you're just trying to warn people that they're going to waste a lot of time getting useless information if they submit a JPEG, right?
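
The fourth option (keep a list of formats, but warn instead of rejecting for JPEG and TIFF) could look roughly like this. It is a sketch under assumptions: `check_format` and `allow_ambiguous` are names invented here, the two sets are deliberately short and would need to be extended after reading the importers, and the format string is assumed to come from the `format` attribute of an image opened with PIL.

```python
import warnings

# Formats treated as always lossless in this sketch; extend as needed.
LOSSLESS = {"BMP", "GIF", "PNG"}
# Formats that may or may not be lossy depending on how the file was written.
SOMETIMES_LOSSY = {"JPEG", "TIFF"}

def check_format(fmt, allow_ambiguous=False):
    """Return True if the format is accepted; raise ValueError otherwise.

    With allow_ambiguous=True (the "I know what I'm doing" override),
    JPEG and TIFF are accepted with a warning instead of being rejected.
    """
    name = (fmt or "").upper()
    if name in LOSSLESS:
        return True
    if name in SOMETIMES_LOSSY and allow_ambiguous:
        warnings.warn("%s files may be lossy" % name)
        return True
    raise ValueError("format %r is not accepted as lossless" % fmt)
```

Inside `validate_image`, this would be called as `check_format(img.format)`, with the override exposed to the user in whatever way fits the application.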
