Is it possible to read a path in a JPEG image with Python?

If you use Save As > JPEG in Adobe Photoshop, a path (selection) is stored in the file.
Is it possible to read that path in Python, for example to create a composition with PIL?
EDIT
ImageMagick seems to help, example

This code (by /F, AKA the effbot, author of PIL and a generally wondrous Python contributor) shows how to walk through the 8BIM resource blocks (but it's looking for 0x0404, the IPTC/NAA data, so of course you'll need to edit it).
Per Tom Ruark's post to this thread, paths will have IDs of 2000 to 2999 (the latter gives the name of the clipping path, so it's different from the others) and the data is a series of 26-byte "point records" (so the resource length is always a multiple of 26).
Read the rest of Tom's post for all the gory details -- it's a pesky and very detailed binary format that will take substantial experimentation (and skill with struct, bitwise manipulation, etc.) to read and interpret just right (not helped by the fact that the fields can be big-endian or little-endian -- little-endian on Windows, if I read the post correctly).
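For concreteness, here is a rough sketch (not the effbot's code) of that walk, assuming the path data sits in an APP13 "Photoshop 3.0" segment of the JPEG; the file name is a placeholder and the 26-byte point records are kept as raw bytes, since decoding them still needs the details from Tom's post.

    import struct

    def photoshop_segments(path):
        """Yield the payloads of APP13 'Photoshop 3.0' segments in a JPEG."""
        with open(path, "rb") as f:
            data = f.read()
        pos = 2                                   # skip the SOI marker (FF D8)
        while pos + 4 <= len(data) and data[pos] == 0xFF:
            marker = data[pos + 1]
            if marker in (0xD9, 0xDA):            # EOI / start-of-scan: stop
                break
            length = struct.unpack(">H", data[pos + 2:pos + 4])[0]
            segment = data[pos + 4:pos + 2 + length]
            if marker == 0xED and segment.startswith(b"Photoshop 3.0\x00"):
                yield segment[len(b"Photoshop 3.0\x00"):]
            pos += 2 + length

    def iter_8bim_blocks(blob):
        """Yield (resource_id, data) for each 8BIM resource block."""
        pos = 0
        while pos + 12 <= len(blob) and blob[pos:pos + 4] == b"8BIM":
            resource_id = struct.unpack(">H", blob[pos + 4:pos + 6])[0]
            name_len = blob[pos + 6]
            name_end = pos + 7 + name_len
            if (name_len + 1) % 2:                # Pascal-style name is padded to even length
                name_end += 1
            size = struct.unpack(">I", blob[name_end:name_end + 4])[0]
            yield resource_id, blob[name_end + 4:name_end + 4 + size]
            pos = name_end + 4 + size + (size % 2)   # data is also padded to even length

    for blob in photoshop_segments("image.jpg"):       # placeholder file name
        for rid, data in iter_8bim_blocks(blob):
            if 2000 <= rid < 2999:                     # path resources; 2999 holds the clipping-path name
                records = [data[i:i + 26] for i in range(0, len(data), 26)]
                print("path resource %d: %d point records" % (rid, len(records)))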

Are you sure the path is stored in the jpg? That seems unlikely. Paths would be stored in the native Photoshop format, but not in the jpg.
Do you know of any other tools that can read the path? Can you try saving the item as a jpg, closing Photoshop, reopening only the jpg, and seeing if you still have the path? I doubt it'd be there.

Related

the difference between .bin and .mat files

Can TensorFlow read a file containing normal images, for example JPG, ..., or does TensorFlow only read .bin files containing images?
What is the difference between a .mat file and a .bin file?
Also, when I rename a .bin file to .mat, does the data in the file change?
Sorry if my wording isn't clear; I can't speak English very well.
A file-name suffix is just a suffix (it sometimes helps to get information about the file; e.g. Windows uses it to decide which tool is launched on double-click). A suffix does not need to be correct, and of course changing the suffix will not change the content.
Every format needs its own decoder: JPG, PNG, MAT and co.
To some extent, these are picked automatically by reading the metadata (under some assumptions!). Many image tools have an imread function that works for jpg and png even if there is no suffix (because they check for common, supported image formats).
I'm not sure what tensorflow does automatically, but (see the sketch after this list):
jpg, png, bmp should be no problem
worst case: use scipy to read and convert
mat is usually a matrix (with infinitely many different encodings) and often MATLAB-based
scipy can read many MATLAB-based formats
bin can be anything (it usually just stands for binary; there is no clear mapping like the above)
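As a small, hedged illustration of the points above (file names are made up): TensorFlow's generic image decoder picks the format from the bytes themselves, while a .mat file is read with scipy rather than with an image decoder.

    import tensorflow as tf
    import scipy.io

    # JPEG/PNG/BMP: decode_image inspects the bytes, not the suffix
    raw = tf.io.read_file("photo.jpg")
    img = tf.io.decode_image(raw)
    print(img.shape)

    # .mat: a MATLAB container, read with scipy.io rather than an image decoder
    mat = scipy.io.loadmat("data.mat")    # dict of variable name -> array
    print(list(mat.keys()))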
Don't get me wrong, but I expect someone trying to use tensorflow (not a small, not a simple tool) to know that changing a suffix will never magically transform the content into the new format (especially in lossless/lossy cases like png vs. jpg). I hope you evaluated this decision and are not running blindly into using a popular tool.
A '.mat' file contains MATLAB-formatted data (not MATLAB code, like you would expect from a '.m' file). I'm not sure if you're even using MATLAB, since you didn't include the tag in your question. '.mat' files are associated with the MATLAB workspace; if you wanted to save your current workspace in MATLAB, you would save it as a '.mat' file.
A '.bin' file is a binary file read by the computer. In general, executable (ready-to-run) programs are often identified as binary files. I think this is what you would want to use. I am unsure what you really want, though, because the wording of the question is difficult to understand and it seems like you have two questions here.
Changing the suffix of a file just changes what will open the file. For example, if I were to rename test.txt to test.py, the data inside the text file would remain the same, but the way the file is opened would change: the file was a text file usually opened with Notepad (or some variation), and after renaming it would be opened by Python. If you were to change a .jpg file to a .txt file, you wouldn't be able to view it as a picture anymore; instead, you would open a text file with a bunch of seemingly random characters which describe the picture. The picture data never changes, but the way you see it and are able to use it does.
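A tiny sanity check of that point (file names are placeholders): renaming changes which program opens the file by default, not a single byte of its contents.

    import os

    with open("picture.jpg", "rb") as f:
        before = f.read()

    os.rename("picture.jpg", "picture.txt")   # only the name changes

    with open("picture.txt", "rb") as f:
        after = f.read()

    print(before == after)   # True: identical bytes, still JPEG data inside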
Take a look at this website, which describes the .bin extension pretty well. Also, a quick Google search goes a long way, especially with questions like this.

Modify EXIF/IPTC info in .dng (rawfiles) via Python?

Is anyone aware of a Python module or library capable of modifying EXIF and IPTC data in Adobe RAW files (.dng)? Until some eight years ago, I used JPEG and could rather easily do such modifications with Python. Since switching to RAW, I have had to use image tools to modify EXIF info.
Primarily the EXIF Taken date is of interest, but some IPTC fields are also candidates for modification.
(I'm geotagging photos from my cameras, each of which has an RTC that drifts in its own direction and by its own amount. My 'worst' camera gains ~2.4 sec per day. Before matching photo dates with .gpx data from a GPS logger, I need to shift the Taken date by an amount that depends on the number of days since the camera clock was last set.)
In one of my projects I use GExiv2 (https://wiki.gnome.org/Projects/gexiv2) with the PyGObject bindings (https://wiki.gnome.org/Projects/PyGObject). GExiv2 is a wrapper around exiv2, which can read and write Exif, IPTC and XMP metadata in DNG files: http://www.exiv2.org/manpage.html
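As a rough sketch of the kind of Taken-date correction described in the question, using GExiv2 through PyGObject (the file name, drift rate and day count are placeholders, and the typelib version may differ on your system):

    from datetime import datetime, timedelta
    import gi
    gi.require_version("GExiv2", "0.10")
    from gi.repository import GExiv2

    DRIFT_PER_DAY = 2.4      # seconds the camera clock gains per day (placeholder)
    DAYS_SINCE_SET = 30      # days since the camera clock was last set (placeholder)

    md = GExiv2.Metadata()
    md.open_path("IMG_0001.dng")

    taken = datetime.strptime(md.get_tag_string("Exif.Photo.DateTimeOriginal"),
                              "%Y:%m:%d %H:%M:%S")
    corrected = taken - timedelta(seconds=DRIFT_PER_DAY * DAYS_SINCE_SET)

    md.set_tag_string("Exif.Photo.DateTimeOriginal",
                      corrected.strftime("%Y:%m:%d %H:%M:%S"))
    md.save_file("IMG_0001.dng")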

create pdf from python

I'm looking to generate PDFs from a Python application.
They start relatively simple, but some may become more complex (essentially letter-like documents, but they will include watermarks, for example, later).
I've worked in raw PostScript before, and provided I can generate the correct headers etc. and a valid file at the end of it, I want to avoid complex libraries that may not do entirely what I want. Some seem to have suffered bitrot and are no longer supported (pypdf and pypdf2), especially when I know PDF/PostScript can do exactly what I need. PDF content really isn't that complex.
I can generate EPS (Encapsulated PostScript) fine by just writing the appropriate text headers and my PostScript code to a file. But inspecting PDFs, there is a little binary header I'm not sure how to generate.
I could generate an EPS and convert it. I'm not overly happy with that, as the production environment is a Windows 2008 server (dev is Ubuntu 12.04), and generating something and then converting it seems very silly.
Has anyone done this before?
Am I being pedantic by not wanting to use a library?
borrowed from ask.yahoo
A PDF file starts with "%PDF-1.1" if it is a version 1.1 type of PDF file. You can read PDF files ok when they don't have binary data objects stored in them, and you could even make one using Notepad if you didn't need to store a binary object like a Paint bitmap in it.
But after seeing the "%PDF-1.1" you ignore what's after that (Adobe Reader does, too) and go straight to the end of the file to where there is a line that says "%%EOF". That's always the last thing in the file; and if that's there you know that just a few characters before that place in the file there's the word "startxref" followed by a number. This number tells a reader program where to look in the file to find the start of the list of items describing the structure of the file. These items in the list can be page objects, dictionary objects, or stream objects (like the binary data of a bitmap), and each one has "obj" and "endobj" marking out where its description starts and ends.
For fairly simple PDF files, you might be able to type the text in just like you did with Notepad to make a working PDF file that Adobe Reader and other PDF viewer programs could read and display correctly.
Doing something like this is a challenge, even for a simple file, and you'd really have to know what you're doing to get any binary data into the file where it's supposed to go; but for character data, you'd just be able to type it in. And all of the commands used in the PDF are in the form of strings that you could type in. The hardest part is calculating those numbers that give the file offsets for items in the file (such as the number following "startxref").
If the way the file format is laid out intrigues you, go ahead and read the PDF manual, which tells the whole story.
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
but really you should probably just use a library
Thanks to @LukasGraf for providing this link, http://www.gnupdf.org/Introduction_to_PDF, which shows how to create a simple hello-world PDF from scratch.
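For the curious, here is a minimal sketch of writing such a hello-world PDF by hand in Python, tracking the byte offsets for the xref table as described above; object numbering, page size and the output name are arbitrary choices for illustration.

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        None,   # content stream, filled in below
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    stream = b"BT /F1 24 Tf 72 720 Td (Hello World) Tj ET"
    objects[3] = b"<< /Length %d >>\nstream\n%s\nendstream" % (len(stream), stream)

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                  # byte offset of this object
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"               # the mandatory free entry
    for off in offsets:
        out += b"%010d 00000 n \n" % off          # each xref entry is exactly 20 bytes
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))

    with open("hello.pdf", "wb") as f:
        f.write(out)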
As long as you're working in Python 2.7, ReportLab seems to be the best solution out there at the moment. It's quite full-featured and can be a little complex to work with, depending on exactly what you're doing with it, but since you seem to be familiar with PDF internals in general, hopefully the learning curve won't be too steep.
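A short, hedged sketch of the ReportLab route for a letter-like page (text and file name are placeholders):

    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas("letter.pdf", pagesize=letter)
    c.setFont("Helvetica", 12)
    c.drawString(72, 720, "Dear Sir or Madam,")
    c.drawString(72, 700, "This line was placed at fixed page coordinates.")
    c.showPage()
    c.save()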
I recommend using a library. I spent a lot of time creating pdfme and learned a lot of things along the way, but it's not something you would do for a single project. If you want to use my library, check the docs here.

unrar archive while downloading it

I've got a program that downloads part01, then part02, etc., of a RAR file split across the internet.
My program downloads part01 first, then part02, and so on.
After some tests, I found out that using, for example, UnRAR2 for Python, I can extract the first part of the file (an .avi file) contained in the archive, and I'm able to play its first minutes. When I add another part, it extracts a bit more, and so on. What I wonder is: is it possible to make it extract single files WHILE downloading them?
I'd need it to start extracting part01 without having to wait for it to finish downloading... is that possible?
Thank you very much!
Matteo
You are talking about an .avi file inside the rar archives. Are you sure the archives are actually compressed? Video files released by the warez scene do not use compression:
Ripped movies are still packaged due to the large filesize, but compression is disallowed and the RAR format is used only as a container. Because of this, modern playback software can easily play a release directly from the packaged files, and even stream it as the release is downloaded (if the network is fast enough).
(I'm thinking VLC, BSPlayer, KMPlayer, Dziobas Rar Player, rarfilesource, rarfs,...)
You can check for the compression as follows (a rough programmatic check with the rarfile module is sketched after these steps):
Open the first .rar archive in WinRAR (name.part01.rar, or name.rar for old-style volume names).
Click the Info button.
If Version to extract indicates 2.0, then the archive uses no compression (unless you have decade-old rars). You can see that Total size and Packed size will be equal.
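The sketch of that check, using the rarfile module mentioned below (the archive name is a placeholder): for stored (uncompressed) entries the packed and unpacked sizes match.

    import rarfile

    rf = rarfile.RarFile("movie.part01.rar")
    for info in rf.infolist():
        stored = info.compress_size == info.file_size
        print(info.filename, "stored (no compression)" if stored else "compressed")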
is it possible to make it extract single files WHILE downloading them?
Yes. When no compression is used, you can write your own program to extract the files. (I know of someone who wrote a script to directly download the movie from external rar files, but it's not public and I don't have it.) Because you mentioned Python, I suggest you take a look at rarfile 2.2 by Marko Kreen, like the author of pyarrfs did. The archive is just the file chopped up, with headers (rar blocks) added. Extraction is a copy operation that you need to pause until the next archive is downloaded.
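A sketch of that copy operation with rarfile (names are placeholders): open the entry as a stream and copy it out in chunks. rarfile expects the volumes it touches to be on disk, so pausing until the next part has finished downloading is a loop you would add around this.

    import shutil
    import rarfile

    rf = rarfile.RarFile("movie.part01.rar")
    name = rf.namelist()[0]                         # the .avi inside the archive

    with rf.open(name) as src, open("movie.avi", "wb") as dst:
        shutil.copyfileobj(src, dst, 1024 * 1024)   # copy in 1 MiB chunks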
I strongly believe it is also possible for compressed files. Your approach here will be different because you must use unrar to extract the compressed files. I have to add that there is also a free RARv3 implementation to extract rars implemented in The Unarchiver.
I think this parameter for (un)rar will make it possible:
-vp Pause before each volume
By default RAR asks for confirmation before creating
or unpacking next volume only for removable disks.
This switch forces RAR to ask such confirmation always.
It can be useful if disk space is limited and you wish
to copy each volume to another media immediately after
creation.
It will give you the possibility to pause the extraction until the next archive is downloaded.
I believe that this won't work if the rar was created with the 'solid' option enabled.
When the solid option is used for rars, all packed files are treated as one big file stream. This should not cause any problems if you always start from the first file even if it doesn't contain the file you want to extract.
I also think it will work with passworded archives.
I highly doubt it. By the nature of compression (from my understanding), every bit is needed to decompress it. It seems that the source you are downloading from has intentionally broken the avi into pieces before compression, but by the time you apply compression, whatever you compressed is now one atomic unit. So they kindly broke the whole avi into parts, but each part is still an atomic unit.
But I'm not an expert in compression.
The only test I can currently think of is something like: curl http://example.com/Part01 | unrar.
I don't know if this was asked with a specific language in mind, but it is possible to stream a compressed RAR directly from the internet and have it decompressed on the fly. I can do this with my C# library http://sharpcompress.codeplex.com/
The RAR format is actually kind of nice. It has headers preceding each entry and the compressed data itself does not require random access on the stream of bytes.
For multi-part files, you'd have to fully extract part 1 first, then continue writing when part 2 is available.
All of this is possible with my RarReader API. Solid archives are also streamable (in fact, they're only streamable: you can't randomly access files in a solid archive; you pretty much have to extract them all at once).

Extending a PIL decoder

I have a file which contains a single image of a specific format at a specific offset. I can already get a file-like object for the embedded image which supports read(), seek(), and tell(). I want to take advantage of an existing PIL decoder to handle the embedded image, but be able to treat the entire file as an "image file" in its own right.
I have not been able to figure out how to do this given the available documentation, and was wondering if anyone had any insights as to how I could do this.
The relevant chapter of the docs is this one, and I think it's fairly clear: if, for example, you want to decode image files in the new .zap format, you write a ZapImagePlugin.py module which must do a couple of things:
have a class ZapImageFile(ImageFile.ImageFile): with string attributes format and format_description, and a hook method def _open(self) (of which more later);
at module level, call Image.register_open('ZAP', ZapImageFile) and Image.register_extension('ZAP', '.zap').
The specs for the _open method are very clearly laid out in the chapter -- it must read image data and metadata from the open binary file-like object self.fp, raise SyntaxError (or another exception) as soon as possible if it detects that the file is not actually in the right format, set at least the self.size and self.mode attributes and, in order to allow reading the image, also self.tile, a list of tile descriptors, again in the format specified in that chapter (including the file offset, which you say you know, and a decoder -- if the raw or bit decoders, documented in the chapter, don't meet your needs, the chapter recommends studying the sources of some of the many supplied decoders, such as JPEG, PNG, etc.).
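A minimal sketch of that skeleton for a made-up .zap format, assuming (purely for illustration) a header of a 4-byte magic b"ZAP0", two big-endian 16-bit integers for width and height, and raw RGB data after it. This follows the classic PIL plugin API described above; newer Pillow releases set self._size and self._mode instead.

    import struct
    from PIL import Image, ImageFile

    class ZapImageFile(ImageFile.ImageFile):
        format = "ZAP"
        format_description = "ZAP raster image (hypothetical example format)"

        def _open(self):
            header = self.fp.read(8)
            if header[:4] != b"ZAP0":
                raise SyntaxError("not a ZAP file")
            width, height = struct.unpack(">HH", header[4:8])
            self.size = (width, height)
            self.mode = "RGB"
            # one tile: the built-in raw decoder, covering the whole image,
            # with pixel data starting right after the 8-byte header
            self.tile = [("raw", (0, 0, width, height), 8, ("RGB", 0, 1))]

    Image.register_open(ZapImageFile.format, ZapImageFile)
    Image.register_extension(ZapImageFile.format, ".zap")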
What I did to solve this was to derive from the ImageFile.ImageFile subclass belonging to the embedded format, instead of from ImageFile.ImageFile directly. Then in _open() I replaced self.fp with the file-like object for the embedded image and called the parent's _open(). I can't say that I'm particularly happy doing it this way, but it seems to have worked.
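A rough sketch of that approach, assuming the embedded image happens to be a JPEG; get_embedded_stream() is a hypothetical stand-in for whatever already produces the file-like object for the embedded image, and the wrapper format name and extension are made up.

    from PIL import Image, JpegImagePlugin

    def get_embedded_stream(fp):
        # hypothetical helper: return a file-like object (read/seek/tell)
        # positioned on the embedded JPEG inside the container opened as fp
        raise NotImplementedError

    class EmbeddedJpegImageFile(JpegImagePlugin.JpegImageFile):
        format = "EMBJPEG"
        format_description = "JPEG embedded in a container file"

        def _open(self):
            # swap in the embedded image's stream, then let the parent
            # plugin parse it as if it were a plain JPEG file
            self.fp = get_embedded_stream(self.fp)
            super()._open()

    Image.register_open(EmbeddedJpegImageFile.format, EmbeddedJpegImageFile)
    Image.register_extension(EmbeddedJpegImageFile.format, ".embj")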
