How can I use Python to convert a qcow2 image file into a raw image file?
I know of qemu-img, but I'm curious about any Python libraries that might allow me to avoid asking my users to install that tool. It's not packaged with a default Fedora install, and that's what I'm developing for. If there are no other options however, I'll use qemu-img.
It seems that qemu-img is necessary for converting qcow2 image files to raw images; I did not find a solution that avoids calling this tool. This isn't a big issue though, because qemu-img is widely available in distros' repositories and is sometimes installed by default. To use it from Python, ensure that it's installed on the system and then call it programmatically via the subprocess module, like so:
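import subprocess

# Assuming file_path is the path to a local qcow2 file
if file_path.endswith('.qcow2'):
    # Strip the '.qcow2' extension and append '.raw'
    raw_file_path = file_path[:-len('.qcow2')] + '.raw'
    # -O raw names the output format explicitly
    subprocess.call(['qemu-img', 'convert', '-O', 'raw', file_path, raw_file_path])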
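A slightly more defensive variant of the snippet above might check that qemu-img is actually present before calling it, and fail loudly if the conversion goes wrong. This is only a sketch: the helper names raw_path_for, build_convert_command and convert_to_raw are made up for illustration, and it assumes the input really is a qcow2 image.

```python
import shutil
import subprocess
from pathlib import Path


def raw_path_for(qcow2_path):
    """Return the same path with its extension replaced by .raw."""
    return str(Path(qcow2_path).with_suffix(".raw"))


def build_convert_command(qcow2_path, raw_path):
    """Build the qemu-img argument list; -f and -O name the input and output formats."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", qcow2_path, raw_path]


def convert_to_raw(qcow2_path):
    """Convert qcow2_path to a raw image next to it and return the new path."""
    if shutil.which("qemu-img") is None:
        raise RuntimeError("qemu-img was not found on PATH; please install it first")
    raw_path = raw_path_for(qcow2_path)
    # check=True raises CalledProcessError if qemu-img exits with an error
    subprocess.run(build_convert_command(qcow2_path, raw_path), check=True)
    return raw_path
```

Calling convert_to_raw("disk.qcow2") would then write disk.raw alongside the original, or raise an error that the application can turn into an "install qemu-img" message for the user.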
Related
Some important background upfront: I am using a computer that does not give me access to pip. In fact, I do not have access to the command prompt at all. This makes it impossible for me to install additional libraries, unfortunately (at least the standard way).
My question is whether I can run a Python library without formally installing it. Could I download the library, store it in the same directory as my main script, and then import it as I would in a multi-file project with functions defined in other .py files, almost as if I had written the library myself on my computer?
Specifically, I would like to use pdfminer.six. Apparently it is written completely in python, however, I realize that may not mean what I think it does. It may be similar to numpy which I understand has C++ code associated with it.
You can import any script or lib from your current folder (example). You can find any lib you want by googling 'lib_name github'. Download the zip and unpack it in your folder, it should work.
You can also go to your python Lib folder on another computer and copy libs from there (By default: C:\Users\User\AppData\Local\Programs\Python\Python310\Lib)
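If the downloaded library sits in a subfolder rather than directly beside the script, one way to make it importable is to add that folder to sys.path before importing. The sketch below assumes a hypothetical folder name, vendor, and a hypothetical helper, add_vendor_dir. It only works for pure-Python packages, and any libraries the package itself depends on would have to be copied in the same way.

```python
import sys
from pathlib import Path


def add_vendor_dir(vendor_dir):
    """Put vendor_dir at the front of sys.path so packages inside it can be imported."""
    path = str(Path(vendor_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Hypothetical layout: main.py and a "vendor" folder holding the unpacked library.
# add_vendor_dir("vendor")
# import pdfminer  # would now be looked up in vendor/ first
```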
Maybe you can use a web-based-interpreter solution like Google Colab and work in your browser.
https://colab.research.google.com
I am developing a full-text search engine for indexing popular binary formats. I know that there are hundreds of such questions (and solutions) already, but I found it tough to find one that is:
cross platform
supports DOC, DOCX and PDF formats at once
easy to use with python
can be set up in a major shared host
For PDFs, I recommend PDFminer.
Try the docx module (I have not used it myself)
I am not aware of any pure python module that can read .doc files.
There are command-line tools to extract text from .doc files: antiword and catdoc (and probably others). If the packages are installed on your shared host, you could use subprocess to shell out to these tools. Available on Windows via Cygwin.
Apache POI is a Java library that can extract text from Office documents. If your shared host has Java installed, you could write a bit of Java (or Jython) code and execute using subprocess.
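The command-line route for .doc files mentioned above could look roughly like this. It is a sketch rather than a tested recipe: find_doc_tool and extract_doc_text are hypothetical names, and it assumes that whichever of antiword or catdoc is installed prints the extracted text to standard output when given a file path.

```python
import shutil
import subprocess

# Tools to try, in order of preference
DOC_TOOLS = ("antiword", "catdoc")


def find_doc_tool(which=shutil.which):
    """Return the name of the first available tool, or None if neither is installed."""
    for name in DOC_TOOLS:
        if which(name):
            return name
    return None


def extract_doc_text(doc_path):
    """Run the available tool on doc_path and return the text it prints."""
    tool = find_doc_tool()
    if tool is None:
        raise RuntimeError("neither antiword nor catdoc was found on PATH")
    result = subprocess.run([tool, doc_path], capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the lookup function in as a parameter keeps the tool detection easy to test on a machine where neither program is installed.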
If you can use OpenOffice on the server side, then you can use unoconv, which converts between any document formats supported by OpenOffice.
One possible solution is to use google documents to extract the text contents from binary .doc-files. You upload the document to google docs and then download the text contents. It is a fairly slow process, but it is the only "pure Python" solution I know of since it doesn't require any external tools except for network access. An external tool such as catdoc or antiword is a much better solution if you are allowed to install it on your host.
Textract uses the default tools for every kind of file.
https://github.com/deanmalmgren/textract
Does anyone know of a way to read and write the National Instruments binary file type (TDMS) in python under linux? I know that NI has a C DLL available, but I don't know how to access that through python, or if I even can do so under linux.
It looks like TDMS isn't directly supported under Linux (see here).
Your options currently are to use the G-based functions directly in LabVIEW (It's possible that you can wrap them in a .so file), calling LabVIEW from Python, or building your own file parser from the TDMS spec.
Sorry, no really easy options.
Edit: It looks like there may be an open source project to try to do this at http://sourceforge.net/projects/pytdms/. Worth a try, at least.
You have to install Python 2.7 (that's the only version that works with the TDMS package for LabVIEW, at least), then run:
sudo pip install npTDMS
Link to the tdms package page
and just follow the example on the page.
I've found, via Google, numerous people asking the same question, but no solutions. The Python Image Library (PIL) has tools for stepping through an already existing multi-page TIFF, but nothing about creating them.
Libraries would hopefully be available on Windows, for Python 2.6.
If there's some freeware out there which will do the trick, I wouldn't mind seeing it, but I was hoping I could accomplish this in Python.
You can use ImageMagick for this (available on Unix and Windows).
A linux shell command would be
$ convert *.tif multipage.tif
where *.tif are all your individual tif files.
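Since the question asks for something usable from Python, the same command can be launched with subprocess. This is a minimal sketch, assuming ImageMagick's convert is on PATH; the function names are hypothetical, and the glob results are sorted to mimic the order the shell would produce.

```python
import glob
import subprocess


def build_multipage_command(pages, output_path, tool="convert"):
    """Build the argument list: tool, every input page in order, then the output file."""
    if not pages:
        raise ValueError("no input pages given")
    return [tool, *pages, output_path]


def make_multipage_tiff(pattern, output_path):
    """Combine all files matching pattern into one multi-page TIFF."""
    # Skip the output file itself in case it matches the pattern on a second run
    pages = sorted(p for p in glob.glob(pattern) if p != output_path)
    subprocess.run(build_multipage_command(pages, output_path), check=True)
    return pages
```

For example, make_multipage_tiff("*.tif", "multipage.tif") would run the equivalent of the shell command shown above.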
A freeware option: Irfanview can do it, even via the command line; this allows you to call it from Python.
From changes version 3.90:
New command line option:
/multitif=(tifname,file1,...,fileN)
Example to create multipage TIF test.tif from 2 other files:
i_view32 /multitif=(c:\test.tif,c:\test1.bmp,c:\dummy.jpg)
New command line option:
/append=tiffile
Example to open c:\test.jpg and append it as (TIF) page to c:\test.tif
i_view32 c:\test.jpg /append=c:\test.tif
I have used it once and know it works, though limitations on command-line length apply.
You can use the command-line utility "tiffutil".
I'm attempting to use Python to convert a multi-page PDF into a series of JPEGs. I can split the PDF up into individual pages easily enough with available tools, but I haven't been able to find anything that can convert PDFs to images.
PIL does not work, as it can't read PDFs. The two options I've found are using either GhostScript or ImageMagick through the shell. This is not a viable option for me, since this program needs to be cross-platform, and I can't be sure either of those programs will be available on the machines it will be installed and used on.
Are there any Python libraries out there that can do this?
ImageMagick has Python bindings.
Here's what worked for me using the Python ghostscript module (installed with '$ pip install ghostscript'):
import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    args = ["pdf2jpeg",  # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)
I also installed Ghostscript 9.18 on my computer and it probably wouldn't have worked otherwise.
You can't avoid the Ghostscript dependency. Even Imagemagick relies on Ghostscript for its PDF reading functions. The reason for this is the complexity of the PDF format: a PDF doesn't just contain bitmap information, but mostly vector shapes, transparencies etc.
Furthermore it is quite complex to figure out which of these objects appear on which page.
So the correct rendering of a PDF Page is clearly out of scope for a pure Python library.
The good news is that Ghostscript is pre-installed on many Windows and Linux systems, because it is also needed by all those PDF printers (except Adobe Acrobat).
If you're using Linux, some distributions come with a command-line utility called 'pdftopbm' out of the box. Check out netpbm.
Perhaps relevant: http://www.swftools.org/gfx_tutorial.html