I'm working on a Python program that will automatically combine sets of files based on their names.
Being a newbie, I wasn't quite sure how to go about it, so I decided to just brute-force it with the win32api.
So I'm attempting to do everything with virtual keys. I run the script, it selects the top file (after arranging them by name), sends a right-click command, selects 'Combine as Adobe PDF', and then pushes Enter. This launches the Acrobat combine window, where I send another Enter command. Here's where I hit the problem.
The folder where I'm converting these things loses focus, and I'm unsure how to get it back. Sending Alt+Tab commands seems somewhat unreliable; it sometimes switches to the wrong window.
A much bigger issue for me: different combinations of files take different times to combine. Though I haven't gotten this far in my code, my plan was to set some arbitrarily long time.sleep() call before sending the last Enter to confirm the file name and finish the combination process. Is there a way to monitor another program's progress? Is there a way to have Python not execute any more code until something else has finished?
I would suggest using a command-line tool like pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/): it does exactly what you want, it's cross-platform, it's free, and it's a small download.
You can easily call it from Python with (for example) subprocess.Popen.
Edit: sample code below:
import subprocess
import os

def combine_pdfs(infiles, outfile, basedir=''):
    """
    Accept a list of PDF filenames,
    merge the files,
    and save the result as outfile.
    @param infiles: list of strings, names of the PDF files to combine
    @param outfile: string, name of the merged PDF file to create
    @param basedir: string, base directory for the PDFs (if the filenames are not absolute)
    """
    # From the pdftk documentation:
    # Merge Two or More PDFs into a New Document:
    #   pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
    if basedir:
        infiles = [os.path.join(basedir, i) for i in infiles]
        outfile = os.path.join(basedir, outfile)
    pdftk = [r'C:\Program Files (x86)\Pdftk\pdftk.exe']  # or wherever you installed it
    args = pdftk + infiles + ['cat', 'output', outfile]
    return subprocess.call(args)

combine_pdfs(
    ['p1.pdf', 'p2.pdf'],
    'p_total.pdf',
    'C:\\Users\\Me\\Downloads'
)
I send many files over TCP from a PC (Windows) to a server (Linux).
When I process the files on the server I sometimes get an error because a file is corrupted or has zero size: it is still being saved to disk.
I process the files in Python, grabbing them like this:
from glob import glob
import os

file_list = sorted(glob('*.bin'))
for file in file_list:
    file_size = os.path.getsize(file)
    if file_size > min_file_size:
        do_process(file)
How do I do this properly, i.e. make sure that a file is complete?
I can't choose the right min_file_size, since the files have different sizes.
Maybe I should copy them to another folder and then process them?
I'm using SCP to copy the files. On the server side, how can I be sure (some Linux hints?) that a file is complete before moving it to the directory that will be processed? Sometimes when I type ls I see files which have not been fully sent yet, so how can I rename them?
You can use the fuser command to check whether the file is currently being accessed by any process, as follows:
import subprocess
...
file_list = sorted(glob('*.bin'))
for file in file_list:
    result = subprocess.run(['fuser', '--silent', file])
    if result.returncode != 0:
        do_process(file)
The fuser command terminates with a non-zero return code if the file is not being accessed.
This has nothing to do with TCP. You are basically asking how to synchronize two processes so that, if one writes a file, the other uses it only once it has been completely written and closed.
One common way is to let the first process (the writer) use a temporary file name which is not expected by the second process (the reader), and to rename the file to the expected name after it has been closed. Other ways involve file locking. One can also set up communication between the two processes (such as a pipe or socketpair) to explicitly inform the reader once the writer has finished and which file was written.
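A minimal sketch of the rename approach on the writer's side (the file names here are hypothetical): write under a temporary suffix that the reader's glob pattern does not match, then atomically rename once everything is flushed.

```python
import os

def write_complete(data, final_path):
    """Write to a temporary name first, then atomically rename to the final name."""
    tmp_path = final_path + '.part'  # readers glob for '*.bin', so '.part' files are ignored
    with open(tmp_path, 'wb') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes actually reach the disk
    os.replace(tmp_path, final_path)  # atomic within one filesystem

write_complete(b'payload', 'sample.bin')
```

With SCP you can get the same effect by copying to file.bin.part and then issuing mv file.bin.part file.bin over ssh; mv within one filesystem is a rename, and therefore atomic.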
I have been using OneDrive to store a large number of images, and now I need to process them, so I have synced my OneDrive folder to my computer, which takes up almost no space on disk. However, since I have to open() them in my code, they all get downloaded, which would take far more than the space available on my computer. I can manually use the Free up space action in the right-click context menu, which keeps them synced without taking up space.
I'm looking for a way to do the same thing from my code instead, after every image I process.
Trying to find how to get the commands of contextual menu items led me to these two places in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\*\shellex\ContextMenuHandlers
However, I couldn't find anything related to it, and those trees have far too many keys to check blindly. Also, this forum post (outside link) shows a few ways to free up space automatically, but it seems to affect all files and is limited to full-day intervals.
So is there any way to either invoke that command or free up the space in Python?
According to this Microsoft post, it is possible to call Attrib.exe to do this sort of manipulation on files.
This little snippet does the job on a per-file basis. As shown in the linked post, it's also possible to apply it to the full contents of a folder using the /s argument, and much more.
import subprocess

def process_image(path):
    # Open the file, which downloads it automatically
    with open(path, 'r') as img:
        print(img)
    # Free up space (OneDrive) after usage
    subprocess.run('attrib +U -P "' + path + '"')
The download and freeing up space are fairly quick, but in the case of running this heavily in parallel, it is possible that some disk space will be consumed for a short amount of time. In general though, this is pretty instantaneous.
In addition to Mat's answer: if you are working on a Mac, you can replace Attrib.exe with "/Applications/OneDrive.App/Contents/MacOS/OneDrive /unpin" to make the file online-only.
import subprocess
path = "/Users/OneDrive/file.png"
subprocess.run(["/Applications/OneDrive.App/Contents/MacOS/OneDrive", "/unpin", path])
Free up space for multiple files:
import os
import subprocess

path = r"C:\Users\yourUser\Folder"
for root, dirs, files in os.walk(path):
    for file in files:
        arquivos = os.path.join(root, file)
        print(arquivos)
        subprocess.run('attrib +U -P "' + arquivos + '"')
I am using HTCondor to generate some data (txt, png). When my program runs, it creates a directory next to the .sub file, named Datasets, where the datasets are stored. Unfortunately, Condor does not give this created data back to me when finished. In other words, my goal is to get the created data into a "Datasets" subfolder next to the .sub file.
I tried:
1) Not putting the data under the Datasets subfolder, and I got the files back as expected. However, this is not a smooth solution, since I generate around 100 files, which end up mixed with the .sub file and everything else.
2) Setting this up in the .sub file, leading to this:
notification = Always
should_transfer_files = YES
RunAsOwner = True
When_To_Transfer_Output = ON_EXIT_OR_EVICT
getenv = True
transfer_input_files = main.py
transfer_output_files = Datasets
universe = vanilla
log = log/test-$(Cluster).log
error = log/test-$(Cluster)-$(Process).err
output = log/test-$(Cluster)-$(Process).log
executable = Simulation.bat
queue
This time I get the error that Datasets was not found (the spelling was already checked).
3) Another option would be to pack everything into a zip, but since I have to run hundreds of jobs, I do not want to unpack all these files afterwards.
I hope somebody comes up with a good idea of how to solve this.
Just for the record: HTCondor does not transfer directories created during the run (or their contents) at the end of the job. The best way to get the content back is to write a wrapper script that runs your executable and then compresses the created directory at the root of the working directory; that file will be transferred with all the other files. For example, create run.exe:
./Simulation.bat
tar zcf Datasets.tar.gz Datasets
and in your condor submission script put:
executable = run.exe
However, if you do not want to do this, and if HTCondor is using a common shared filesystem like AFS, you can simply copy the whole directory out:
./Simulation.bat
cp -r Datasets <AFS location>
The other alternative is to define an initialdir as described at the end of: https://research.cs.wisc.edu/htcondor/manual/quickstart.html
But one must create the directory structure by hand.
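For reference, an initialdir line in the submit description file might look like the fragment below (the directory name run_$(Process) is hypothetical; each such directory must exist before submission, and it then serves as that job's file-transfer sandbox):

```
initialdir = run_$(Process)
executable = Simulation.bat
transfer_output_files = Datasets
queue
```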
Also, look around page 65 of: https://indico.cern.ch/event/611296/contributions/2604376/attachments/1471164/2276521/TannenbaumT_UserTutorial.pdf
This document is, in general, a very useful one for beginners.
The purpose of this program is to zip a directory or folder as simply as possible and write the generated .tar.gz to one of my USB flash drives (or any other location). Plans are to add a function that will also use GnuPG to encrypt the folder, and another that will let the user input a time so the task runs daily, weekly, monthly, etc. I also want the user to be able to choose the destination of the zipped folder. I just wanted to post this now to see if it worked on the first attempt and to get a bit of feedback.
My main question is why I lose the main folder upon extraction of the compressed files. For example, say I compress "Documents", which contains the two folders "Videos" and "Pictures" and the file "manual.txt". When I extract the archive it does not dump "Documents" into the extraction point; it dumps "Videos", "Pictures", and "manual.txt". Which is fine and all, no data loss and everything is still intact, but it creates a bit of clutter, and I would like to keep the original directory.
I'm also wondering why in the world this program takes so long to convert the files, and why in some cases the .tar.gz file is just as large as the original folder. This happens with video files; it does seem to compress text files well, and much more quickly.
Are video files just hard to compress? It takes like 5 minutes to process 2 GB of video files, and then they are the same as the original size. Kinda pointless.
Also, would it make sense to use regex to validate user input in this case? I could just use a couple of if statements instead, no? The preferred input in this program is 'root', not '/root'; couldn't I just have it cut the '/' off if the input starts with one?
I mainly want to see if this is the right, most efficient way of doing things. I'd rather not be given the answer in the usual Stack Overflow copy/paste way; let's get a discussion going.
So why is this program so slow when processing larger amounts of data? I expect a reduction in speed, but not by that much.
#!/usr/bin/env python3
'''
author: ryan st***
date: 12/5/2015
time: 18:55 Eastern time (GMT -5)
language: python 3.4
'''
# Import, import, import.
import os
import shutil
import time

# Backup (zip) files
def zipDir():
    try:
        # Get the directory to be zipped and the output destination from the user
        home = "~"
        str1 = input('Input directory to be zipped (eg. Documents, Downloads, Desktop/programs): ')
        # an input example that works: "bin/mans"
        str2 = input('Zipped output directory (eg. root, myBackups): ')
        # an output example that works: "bin2/test"
        zipName = input("What would you like to name your zipped folder? ")
        # Build absolute paths and zip it up
        zipDirTo = os.path.expanduser(os.path.join(home, str2, zipName))
        zipDirFrom = os.path.expanduser(os.path.join(home, str1))
        print('Directory "', zipDirFrom, '" will be zipped and saved to the location: "', zipDirTo, '.tar.gz"')
        shutil.make_archive(zipDirTo, 'gztar', zipDirFrom)
        print("file zipped")
    # In case of a mistake
    except Exception:
        print("Something went wrong in compression.\n",
              "Ending task, please try again")
        quit()

# Execute the program
def main():
    print("It will be a fucking miracle if this succeeds.")
    zipDir()
    print("Success!!!!!!")
    time.sleep(2)
    quit()

# Wrap it all up
if __name__ == '__main__':
    main()
Video files are normally compressed already, and recompressing them doesn't help. For image and video files, use tar alone.
My main question is why I lose the main folder upon extraction of the compressed files
Because you're not storing that folder's name in the archive. The paths you're using don't include Documents; they start with the names of the items inside Documents.
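A small sketch of the fix with shutil.make_archive (the paths here are made up for the demo): root_dir is the directory the archiving starts from, and base_dir is the folder name that gets recorded inside the archive, so extraction recreates the folder itself.

```python
import os
import shutil
import tempfile

# Build a throwaway Documents/ tree to archive
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'Documents', 'Pictures'))
with open(os.path.join(base, 'Documents', 'manual.txt'), 'w') as f:
    f.write('hello')

# base_dir='Documents' stores paths as 'Documents/...' inside the archive,
# so extraction recreates the Documents folder itself.
archive = shutil.make_archive(
    os.path.join(base, 'backup'), 'gztar',
    root_dir=base, base_dir='Documents')

# Extract somewhere else and confirm the top folder survives
dest = tempfile.mkdtemp()
shutil.unpack_archive(archive, dest)
print(sorted(os.listdir(dest)))  # → ['Documents']
```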
Are video files just hard to compress?
Any file that is already compressed, such as most video and audio formats, will be hard to compress further, and it will take quite a bit of time to find that out if the size is large. You might consider detecting compressed files and storing them in the zip file without further compression using the ZIP_STORED constant.
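A sketch of that idea using the zipfile module directly (the extension list is just an illustration): files whose formats are already compressed get stored as-is, everything else gets deflated.

```python
import os
import zipfile

# Formats that are already compressed; recompressing them wastes time
ALREADY_COMPRESSED = {'.mp4', '.mkv', '.avi', '.jpg', '.png', '.mp3', '.gz', '.zip'}

def add_smart(zf, path):
    """Add path to the open ZipFile, skipping deflate for already-compressed formats."""
    ext = os.path.splitext(path)[1].lower()
    method = zipfile.ZIP_STORED if ext in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED
    zf.write(path, arcname=os.path.basename(path), compress_type=method)

# demo: a text file should end up deflated
with open('notes.txt', 'w') as f:
    f.write('some text ' * 1000)
with zipfile.ZipFile('backup.zip', 'w') as zf:
    add_smart(zf, 'notes.txt')
```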
let[']s get a discussion going.
Stack Overflow's format is not really suited to discussions.
I'm writing two Python scripts that both parse files. One is a standard unix logfile and the other is a binary file. I'm trying to figure out the best way to monitor these so I can read data as soon as they're updated. Most of the solutions I've found thus far are linux specific, but I need this to work in FreeBSD.
Obviously one approach would be to just run my script every X amount of time, but this seems gross and inefficient. If I want my Python app running continuously in the background monitoring a file and acting on it once it's changed/updated, what's my best bet?
Have you tried KQueue events?
http://docs.python.org/library/select.html#kqueue-objects
kqueue is the FreeBSD / OS X version of inotify (a file-change notification service). I haven't used it, but I think it's what you want.
I once made a sort of daemon process for a parser built in Python. I needed to watch a series of files and process them in Python, and it had to be a truly multi-OS solution (Windows and Linux in this case). I wrote a program that watches a list of files by checking their modification times: it sleeps for a while, then checks the modification time of each watched file. If the modification time is newer than the one previously registered, the file has changed, and so something has to be done with it.
Something like this:
import os
import time

path = os.path.dirname(os.path.abspath(__file__))
print("Looking for files in", path, "...")

# get interesting files
files = [{"file": os.path.join(path, f)} for f in os.listdir(path)
         if os.path.isfile(os.path.join(path, f))
         and os.path.splitext(f)[1].lower() == ".src"]
for f in files:
    f["output"] = os.path.splitext(f["file"])[0] + ".out"
    f["modtime"] = os.path.getmtime(f["file"]) - 10
    print("  watching:", f["file"])

while True:
    # sleep for a while
    time.sleep(0.5)
    # check if anything changed
    for f in files:
        # is the mod time of the file newer than the one registered?
        if os.path.getmtime(f["file"]) > f["modtime"]:
            # store the new time and...
            f["modtime"] = os.path.getmtime(f["file"])
            print(f["file"], "has changed...")
            # do your stuff here
It does not look like top-notch code, but it works quite well.
There are other SO questions about this, but I don't know if they'll provide a direct answer to your question:
How to implement a pythonic equivalent of tail -F?
How do I watch a file for changes?
How can I "watch" a file for modification / change?
Hope this helps!