feed uncompressed file to command line argument - python

Let's say I have a gzipped file, but my script only takes in an uncompressed file.
Without modifying the script to take in a compressed file, could I uncompress the file on the fly with bash?
For example:
python ../scripts/myscript.py --in (gunzip compressed_file.txt.gz)

You can use a process substitution, as long as the Python script doesn't try to seek backwards in the file:
python ../scripts/myscript.py --in <(gunzip -c compressed_file.txt.gz)
Python receives a file name as an argument; the name just doesn't refer to a simple file on disk. It can only be opened in read-only mode, and attempts to use the seek method will fail.
If you were using zsh instead of bash, you could use
python ../scripts/myscript.py --in =(gunzip -c compressed_file.txt.gz)
and Python would receive the name of an actual (temporary) file that could be used like any other file. Said file would be deleted by the shell after python exits, though.
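To illustrate why seeking is off the table, here is a minimal Python sketch, assuming a POSIX-like system. It uses an anonymous pipe from os.pipe() as a stand-in for the path that <(...) hands to the script; the helper name describe_pipe_reader is hypothetical and not part of the original answer.

```python
import os


def describe_pipe_reader(payload):
    """Write payload into a pipe, then read it back the way a script would.

    Returns (seekable, lines). The pipe stands in for the /dev/fd/N path
    produced by process substitution; this helper is illustrative only.
    """
    read_fd, write_fd = os.pipe()
    # A small payload fits in the pipe buffer, so writing before reading
    # is safe here. Large inputs would need a separate writer.
    with os.fdopen(write_fd, "w") as writer:
        writer.write(payload)
    with os.fdopen(read_fd, "r") as reader:
        can_seek = reader.seekable()  # pipes cannot be repositioned
        lines = [line.rstrip("\n") for line in reader]  # one forward pass
    return can_seek, lines


if __name__ == "__main__":
    print(describe_pipe_reader("first\nsecond\n"))
```

A script that only iterates over its input once, as in the list comprehension above, is the kind that works with a process substitution.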

Related

In Python 3 on Windows, how can I set NTFS compression on a file? Nothing I've googled has gotten me even close to an answer.

(Background: On an NTFS partition, files and/or folders can be set to "compressed", like it's a file attribute. They'll show up in blue in Windows Explorer, and will take up less disk space than they normally would. They can be accessed by any program normally, compression/decompression is handled transparently by the OS - this is not a .zip file. In Windows, setting a file to compressed can be done from a command line with the "Compact" command.)
Let's say I've created a file called "testfile.txt", put some data in it, and closed it. Now, I want to set it to be NTFS compressed. Yes, I could shell out and run Compact, but is there a way to do it directly in Python code instead?
In the end, I cheated a bit and simply shelled out to the command-line Compact utility. Here is the function I wrote. Errors are ignored, and it returns the output text from the Compact command, if any.
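import subprocess

def ntfscompress(filename):
    # Shell out to Compact.exe; /C requests compression of the given file.
    # Passing a list avoids quoting problems with unusual file names.
    _compactcommand = ['Compact.exe', '/C', '/I', '/A', filename]
    try:
        _result = subprocess.run(_compactcommand, timeout=86400,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT, text=True)
        return _result.stdout
    except (OSError, subprocess.SubprocessError):
        # Errors are deliberately ignored; report no output.
        return ''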

Avoid temp file in call to subprocess.run()

In a Python project, I need the output of an external (non-Python) command.
Let's call it identify_stuff.*
Command-line scenario
When called from command-line, this command requires a file name as an argument.
If its input is generated dynamically, we cannot pipe it into the command – this doesn't work:
cat input/* | ./identify_stuff > output.txt
cat input/* | ./identify_stuff - > output.txt
... it strictly requires a file name it can open, so one needs to create a temporary file on disk for the output of the first command, from where the second command can read the data.
However, the identify_stuff program really iterates over the input lines only once, no seeking or re-reading is involved.
So in Bash we can avoid the temporary file with the <(...) construct.
This works:
./identify_stuff <(cat input/*) > output.txt
This pipes the output of the first command to some device at a path /dev/fdX, which can be used for opening a stream like the path to a regular file on disk.
Actual scenario: call from within Python
Now, instead of just cat input/*, the input text is created inside a Python program, which continues to run after the output of identify_stuff has been captured.
The natural choice for calling an external command is the standard library's subprocess.run().
For performance reasons, I would like to avoid creating a file on-disk.
Is there any way to do this with the subprocess tools?
The stdin and input parameters of subprocess.run won't work, because the external command ignores STDIN and specifically requires a file-name argument.
*Actually, it's this tool: https://github.com/jakelever/Ab3P/blob/master/identify_abbr.C
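One possible approach, sketched here under assumptions rather than taken from the thread: on Linux and other systems that expose /dev/fd, the <(...) trick can be imitated by creating a pipe with os.pipe(), keeping its read end open in the child via the pass_fds parameter of subprocess.run(), and handing the child the path /dev/fd/N as its file-name argument. The function name run_with_fd_path is hypothetical, and the sketch assumes the tool accepts the input path as its last argument.

```python
import os
import subprocess
import threading


def run_with_fd_path(command, text):
    """Run command with a /dev/fd/N path appended; feed text through a pipe.

    Assumes a system that provides /dev/fd (for example Linux) and a tool
    that reads its input file sequentially. No file is created on disk.
    """
    read_fd, write_fd = os.pipe()

    def feed():
        # Write from a separate thread so that a full pipe buffer cannot
        # deadlock the parent while it waits for the child.
        try:
            with os.fdopen(write_fd, "w") as writer:
                writer.write(text)
        except BrokenPipeError:
            pass  # the child stopped reading early

    feeder = threading.Thread(target=feed)
    feeder.start()
    try:
        result = subprocess.run(
            command + [f"/dev/fd/{read_fd}"],
            pass_fds=(read_fd,),  # keep the read end open in the child
            stdout=subprocess.PIPE,
            text=True,
            check=True,
        )
    finally:
        os.close(read_fd)
        feeder.join()
    return result.stdout


if __name__ == "__main__":
    # "cat" is only a stand-in for ./identify_stuff in this sketch.
    print(run_with_fd_path(["cat"], "line one\nline two\n"), end="")
```

A named pipe created with os.mkfifo() would be an alternative if the tool refuses /dev/fd paths, at the cost of a directory entry that has to be cleaned up afterwards.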

Opening python file with an input

Here is the code I have so far:
How can I make this program open another python file, using this method or similar (you have to open it from a variable)?
You can use the exec function to execute an external script:
file = "test.py"
exec(open(file).read())
you get,
File Opened!

Why doesn't my bash script read lines from a file when called from a python script?

I am trying to write a small program in bash and part of it needs to be able to get some values from a txt file where the different files are separated by a line, and then either add each line to a variable or add each line to one array.
So far I have tried this:
FILE=$"transfer_config.csv"
while read line
do
    MYARRAY[$index]="$line"
    index=$(($index+1))
done < $FILE
echo ${MYARRAY[0]}
This just produces a blank line though, and not what was on the first line of the config file.
I am not returned with any errors which is why I am not too sure why this is happening.
The bash script is called through a Python script using os.system("$HOME/bin/mcserver_config/server_transfer/down/createRemoteFolder"), but if I simply call it after the Python program has made the file which the bash script reads, it works.
I am almost 100% sure it is not an issue with the directories, because pwd at the top of the bash script shows it in the correct directory, and the python program is also creating the data file in the correct place.
Any help is much appreciated.
EDIT:
I also tried subprocess.call("path_to_script", shell=True) to see whether it would make a difference. I knew it was unlikely, and it didn't.
I suspect that when calling the bash script from python, having just created the file, you are not really finished with that file: you should either explicitly close the file or use a with construct.
Otherwise, the written data may still be sitting in a buffer (in the file object, in the OS, or elsewhere). Only closing (or at least flushing) the file ensures the data has actually reached the file.
BTW, instead of os.system, you should use the subprocess module...
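A minimal sketch of that advice, assuming the Python side writes the config file itself: the with block closes the file before the external command starts, and subprocess.run() replaces os.system. The function name and its parameters are hypothetical.

```python
import subprocess


def write_config_then_run(config_path, lines, command):
    """Write lines to config_path, close the file, then run command.

    Returns the command's standard output as text.
    """
    with open(config_path, "w") as config:
        for line in lines:
            config.write(line + "\n")
    # Leaving the with block has flushed and closed the file, so the
    # child process sees the complete contents.
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # "cat" stands in for the bash script from the question.
    print(write_config_then_run("transfer_config.csv",
                                ["first,row", "second,row"],
                                ["cat", "transfer_config.csv"]), end="")
```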

AppDailySales: Works, but the downloaded gzip file is corrupted

I am trying to use the appdailysales.py module to download the daily sales reports for our iPhone apps. I am a .NET developer, so I tried running this using IronPython in a C# solution using the following code:
using IronPython.Hosting;
var ipy = Python.CreateRuntime();
dynamic appSales = ipy.UseFile("appdailysales.py");
appSales.main();
Because I didn't have gzip, I took out the references to that module. I was going to use the GZipStream C# class to decompress the file (Apple provides their downloads as .gz files). So, I commented out lines 75 and 429-435.
I have tried executing appdailysales.py in my C# solution, directly from IronPython and using Python 2.7 (installed ActivePython last night); all with the same results: When I try to open the .gz file using 7zip, I get the following error:
CRC Failed ... file is broken
When I try using the GZipStream class I get:
The CRC in GZip footer does not match the CRC calculated from the decompressed data
If I download the .gz file manually, I can decompress the file just fine using 7Zip or GZipStream.
I am fluent in C#, but new to Python. Any help you can provide would be much appreciated.
Thanks for your time.
Looks like line 444 is the problem. Here are lines 444-446:
downloadFile = open(filename, 'w')
downloadFile.write(filebuffer)
downloadFile.close()
At this stage, IF you have deleted lines 429-435 OR selected not to unzip, then filebuffer refers to the raw gzipped stream that you got from the web. The output file is opened in TEXT mode, and you are on Windows, so every \n in the BINARY gzipped stream will be converted to \r\n -- CORRUPTION, like the error message said.
So: for the module to be used portably on both Windows and other platforms, the open mode must be "wb" (b for binary). If the gunzipped result file is also a binary file, "wb" can be hardcoded in the open call. However if the gunzipped file is a text file (meant to be capable of being opened in a text editor), then you need just "w" for that purpose, and you should set a variable mode to either "wb" or "w" as appropriate, and use mode in the open call.
Big question: I understand why you removed the gzip references for IronPython usage. Did you remove those lines for Python 2.7? Or did you run it under Python 2.7 with those lines still in, but set options.unzipFile to False?
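The open-mode point can be sketched as follows. This is an illustration written for Python 3, not the module's actual code (the thread concerns Python 2.7 and IronPython), and both helper names are hypothetical. The raw gzipped bytes are written with "wb" so that no newline translation can touch them, and the saved file then decompresses cleanly.

```python
import gzip


def save_raw_download(filename, filebuffer):
    """Write the still-compressed bytes to disk without any translation."""
    # "wb" stores the bytes exactly as received; text mode on Windows
    # would rewrite every \n byte as \r\n and break the CRC.
    with open(filename, "wb") as download_file:
        download_file.write(filebuffer)


def read_gzipped_text(filename, encoding="utf-8"):
    """Decompress a saved .gz file and return its contents as text."""
    with gzip.open(filename, "rb") as compressed:
        return compressed.read().decode(encoding)


if __name__ == "__main__":
    # Made-up data standing in for a downloaded report.
    raw = gzip.compress(b"col_a\tcol_b\n1\t2\n")
    save_raw_download("report.txt.gz", raw)
    print(read_gzipped_text("report.txt.gz"), end="")
```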
