OneDrive free up space with Python - python

I have been using OneDrive to store a large number of images and now I need to process them, so I have synced my OneDrive folder to my computer, which takes essentially no space on disk. However, since I have to open() them in my code, they all get downloaded, which would take far more space than is available on my computer. I can manually use the Free up space action in the right-click context menu, which keeps them synced without taking space.
I'm looking for a way to do the same thing but in my code instead, after every image I process.
Trying to find how to get the commands of contextual menu items led me to these two places in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\*\shellex\ContextMenuHandlers
However, I couldn't find anything related to it, and those trees have far too many keys to check blindly. Also, this forum post (outside link) shows a few ways to free up space automatically, but it seems to affect all files and is limited to full-day intervals.
So is there any way to either access that command or to free up the space in Python?

According to this Microsoft post it is possible to call Attrib.exe to do that sort of manipulation on files.
This little snippet does the job on a per-file basis. As shown in the linked post, it's also possible to do it on the full contents of a folder using the /s argument, and much more (see the sketch below).
import subprocess

def process_image(path):
    # Open the file, which downloads it automatically
    with open(path, 'r') as img:
        print(img)
    # Free up space (OneDrive) after usage
    subprocess.run('attrib +U -P "' + path + '"')
The download and the freeing up of space are fairly quick, but if you run this heavily in parallel, some disk space may be consumed for a short amount of time. In general, though, this is pretty much instantaneous.
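For the whole-folder variant mentioned above, here is a minimal sketch (the folder path is just an example, and it assumes the +U/-P attributes behave the same when applied recursively with /s):
import subprocess

folder = r"C:\Users\you\OneDrive\images"  # example path, adjust to your OneDrive folder
# +U marks files online-only, -P clears "always keep on this device"; /s recurses into subfolders
subprocess.run('attrib +U -P "' + folder + '\\*" /s')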

In addition to Mat's answer: if you are working on a Mac, you can replace Attrib.exe with "/Applications/OneDrive.App/Contents/MacOS/OneDrive /unpin" to make the file online-only.
import subprocess
path = "/Users/OneDrive/file.png"
subprocess.run(["/Applications/OneDrive.App/Contents/MacOS/OneDrive", "/unpin", path])

Free up space for multiple files.
import os
import subprocess

path = r"C:\Users\yourUser\Folder"
diret = os.listdir(path)
for di in diret:
    dir_atual = path + "\\" + di
    for root, dirs, files in os.walk(dir_atual):
        for file in files:
            arquivos = os.path.join(root, file)
            print(arquivos)
            subprocess.run('attrib +U -P "' + arquivos + '"')

Related

Why is my original folder not kept after compression? Why is my compression so slow? - python 3.4

The purpose of this program is to zip a directory or a folder as simply as possible and write the generated .tar.gz to one of my USB flash drives (or any other location). Plans are to add a function that will also use GnuPG to encrypt the folder, and another that will allow the user to input a time in order to perform this task daily, weekly, monthly, etc. I also want the user to be able to choose the destination of the zipped folder. I just wanted to post this up now to see if it worked on the first attempt and to get a bit of feedback.
My main question is why I lose the main folder upon extraction of the compressed files. For example, if I compress "Documents", which contains the two folders "Videos" and "Pictures" and the file "manual.txt", then when I extract the archive it does not dump "Documents" into the extraction point; it dumps "Videos", "Pictures", and "manual.txt". Which is fine and all, no data loss and everything is still intact, but it creates a bit of clutter, and I would like to keep the original directory.
I am also wondering why in the world this program takes so long to convert the files, and why in some cases the resulting .tar.gz file is just as large as the original folder. This happens with video files; it does seem to compress text files well, and much more quickly.
Are video files just hard to compress? It takes about 5 minutes to process 2 GB of video files and then they end up the same as the original size, which seems kind of pointless.
Also, would it make sense to use regex to validate user input in this case? I could just use a couple of if statements instead, no? The preferred input in this program is 'root', not '/root'; couldn't I just have it cut the '/' off if the input starts with a '/'?
I mainly want to see if this is the right, most efficient way of doing things. I'd rather not be given the answer in the usual Stack Overflow copy/paste way; let's get a discussion going.
So why is this program so slow when processing larger amounts of data? I expect a reduction in speed, but not by that much.
#!/usr/bin/env python3
'''
author: ryan st***
date: 12/5/2015
time: 18:55 Eastern time (GMT -5)
language: python 3.4
'''
# Import, import, import.
import os, subprocess, sys, zipfile, re
import shutil
import time

# Backup (zip) files
def zipDir():
    try:
        # Get file to be zipped and zip file destination from user
        Dir = "~"
        str1 = input('Input directory to be zipped (eg. Documents, Downloads, Desktop/programs): ')
        # an input example that works "bin/mans"
        str2 = input('Zipped output directory (eg. root, myBackups): ')
        # an output example that works "bin2/test"
        zipName = input("What would you like to name your zipped folder? ")
        path1 = Dir, str1, "/"
        path2 = Dir, str2, "/"
        # Zip it up
        # print (zipFile, ".tar.gz will be created from the folder ", path1[0]+path1[1]+path1[2])
        # "and placed into the folder ", path2[0]+path2[1]+path2[2])
        zipDirTo = os.path.expanduser(os.path.join(path2[0], path2[1] + path2[2], zipName))
        zipDir = os.path.expanduser(os.path.join(path1[0], path1[1]))
        print('Directory "', zipDir, '" will be zipped and saved to the location: "', zipDirTo, '.tar.gz"')
        shutil.make_archive(zipDirTo, 'gztar', zipDir)
        print("file zipped")
    # In case of mistake
    except:
        print("Something went wrong in compression.\n",
              "Ending task, please try again")
        quit()

# Execute the program
def main():
    print("It will be a fucking miracle if this succeeds.")
    zipDir()
    print("Success!!!!!!")
    time.sleep(2)
    quit()

# Wrap it all up
if __name__ == '__main__':
    main()
Video files are normally compressed themselves, so recompressing them doesn't help. For image and video files, use tar only.
My main question is why I lose the main folder upon extraction of the compressed files
Because you're not storing that folder's name in the zip file. The paths you're using don't include Documents, they start with the name of the items inside Documents.
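One way to keep that top-level folder with shutil.make_archive (a rough sketch; the paths here are just examples) is to pass root_dir and base_dir so the folder name itself is recorded inside the archive:
import os
import shutil

# archive ~/Documents so that "Documents/" is the top-level entry inside the archive
src = os.path.expanduser("~/Documents")
shutil.make_archive("backup", "gztar",
                    root_dir=os.path.dirname(src),   # parent of the folder being archived
                    base_dir=os.path.basename(src))  # store paths as "Documents/..."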
Are video files just hard to compress?
Any file that is already compressed, such as most video and audio formats, will be hard to compress further, and it will take quite a bit of time to find that out if the size is large. You might consider detecting compressed files and storing them in the zip file without further compression using the ZIP_STORED constant.
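A small sketch of that idea (the extension list and file names are illustrative, not exhaustive): pick the compression method per file and store already-compressed formats as-is.
import os
import zipfile

already_compressed = {".mp4", ".avi", ".mkv", ".jpg", ".png", ".gz", ".zip"}  # example list
with zipfile.ZipFile("backup.zip", "w") as zf:
    for path in ["manual.txt", "video.mp4"]:  # example file names
        ext = os.path.splitext(path)[1].lower()
        # deflate text-like files, store already-compressed ones unchanged
        method = zipfile.ZIP_STORED if ext in already_compressed else zipfile.ZIP_DEFLATED
        zf.write(path, compress_type=method)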
let's get a discussion going.
Stack Overflow's format is not really suited to discussions.

In Paraview, how to change the datafile filename of the state to create a snapshot from a given datafile and state?

I am currently able to visualize correctly in ParaView a .vtp file for each time step of a simulation, and to print a screenshot for each. I want to do that in batch, but keeping the same state for each one (viewpoint, filters applied, etc.). I have already saved the state into a .pvsm file, and I tried to write a Python script which, after being run by pvbatch, will (hopefully) print the screenshots. Unfortunately, it is not working. I tried to change the filename in the state by processing the state file and doing a search and replace, but it is still not working. For instance, it keeps plotting the first data input only, even if the current file is different (although GetSources() shows an ever-increasing list of sources). I use ParaView 3.14.0 on Snow Leopard. I am sure this is easy, but I am overwhelmed with the large amount of info about Python and ParaView with no reference to this particular issue. Please, any advice is greatly welcome, and I am sorry if this has been answered previously (I looked at Google, the ParaView mailing list, and here). Below is my script, which can also be found at http://pastebin.com/4xiLNrS0 . Furthermore, you can find some example files and a state in http://goo.gl/XjPpE .
#!/bin/python
import glob, string, os, commands
from paraview.simple import *
#help(servermanager)

# vtp files are inside the local subdir DISPLAY
files = (commands.getoutput("ls DISPLAY/data-*.vtp | grep -v contacts")).split()

# process each file
for filename in files:
    fullfn = commands.getoutput("ls $PWD/" + filename)
    fn = filename.replace('DISPLAY/', '')
    #os.system("cp ../dem_git/addons/paraview_state.pvsm tmp.pvsm")
    os.system("cp ~/Desktop/state.pvsm tmp.pvsm")
    os.system("sed -i.bck 's/DATA.vtp/" + fullfn.replace('/', '\/') + "/1' tmp.pvsm")  # replace first instance with full path
    os.system("sed -i.bck 's/DATA.vtp/" + fullfn.replace('/', '\/') + "/1' tmp.pvsm")  # replace second instance with full path
    os.system("sed -i.bck 's/DATA.vtp/" + fn + "/1' tmp.pvsm")  # replace third with just the filename
    servermanager.LoadState("tmp.pvsm")
    pm = servermanager.ProxyManager()
    reader = pm.GetProxy('sources', fullfn)
    reader.FileName = fullfn
    reader.FileNameChanged()
    reader.UpdatePipeline()
    view = servermanager.GetRenderView()
    SetActiveView(view)
    view.StillRender()
    WriteImage(filename.replace(".vtp", ".png"))
    os.system("rm -f tmp.pvsm")
    os.system("rm -f tmp.pvsm.bck")
    Delete(reader)
I realise this is an old question, but I had exactly the same problem recently and couldn't find any answers either. All you need to do is add Delete(view) after Delete(reader) for your script to work.
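With that change, the end of the per-file loop would look roughly like this:
    WriteImage(filename.replace(".vtp", ".png"))
    os.system("rm -f tmp.pvsm")
    os.system("rm -f tmp.pvsm.bck")
    Delete(reader)
    Delete(view)  # also delete the render view so the next iteration starts clean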

Python File System Reader Performance

I need to scan a file system for a list of files and log those that don't exist. Currently I have an input file with a list of the 13 million files that need to be investigated. This script needs to be run from a remote location, as I do not have access to / cannot run scripts directly on the storage server.
My current approach works, but is relatively slow. I'm still fairly new to Python, so I'm looking for tips on speeding things up.
import sys, os
from pz import padZero  # prepends 0's to string until desired length

output = open('./out.txt', 'w')
input = open('./in.txt', 'r')
rootPath = '\\\\server\\share'  # UNC path to storage
for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8)   # extracts/formats fileName
    dir = padZero(str(ifid)[:-3], 5)    # extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath)   # don't actually need size, better approach?
    except:
        output.write(ifid + '\n')
Thanks.
import collections, glob, os

root = r'\\server\share'  # UNC path to the storage server
input = open('./in.txt', 'r')

dirs = collections.defaultdict(set)
for file_path in input:
    file_path = file_path.strip().rjust(8, "0")
    dir, name = file_path[:-3], file_path + ".tif"
    dirs[dir].add(name)

for dir, files in dirs.iteritems():
    # list the .tif files actually present in that directory on the server
    present = set(os.path.basename(p) for p in glob.glob(os.path.join(root, dir, "*.tif")))
    for missing_file in files - present:
        print missing_file
Explanation
First read the input file into a dictionary of directory: filename. Then for each directory, list all the TIFF files in that directory on the server, and (set) subtract this from the collection of filenames you should have. Print anything that's left.
EDIT: Fixed silly things. Too late at night when I wrote this!
That padZero and string concatenation stuff looks to me like it would take a good percentage of the time.
What you want it to do is spend all its time reading the directory, and very little else.
Do you have to do it in Python? I've done similar stuff in C and C++. Java should be pretty good too.
You're going to be I/O bound, especially on a network, so any changes you can make to your script will result in very minimal speedups, but off the top of my head:
import os

input, output = open("in.txt"), open("out.txt", "w")
root = r'\\server\share'
for fid in input:
    fid = fid.strip().rjust(8, "0")
    dir = fid[:-3]  # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")
I don't really expect that to be any faster, but it is arguably easier to read.
Other approaches may be faster. For example, if you expect to touch most of the files, you could just pull a complete recursive directory listing from the server, convert it to a Python set(), and check for membership in that rather than hitting the server for many small requests. I will leave the code as an exercise...
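A rough sketch of that exercise (assuming os.walk can traverse the UNC share and the directory layout matches the paths built above): pull the listing once, build a set, and test membership locally.
import os

root = r'\\server\share'
present = set()
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.lower().endswith(".tif"):
            present.add(name)

with open("in.txt") as infile, open("out.txt", "w") as outfile:
    for fid in infile:
        fid = fid.strip().rjust(8, "0")
        if fid + ".tif" not in present:
            outfile.write(fid + "\n")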
I would probably use a shell command to get the full listing of files in all directories and subdirectories in one hit. Hopefully this will minimise the amount of requests you need to make to the server.
You can get a listing of the remote server's files by doing something like:
Linux: mount the shared drive as /shared/directory/ and then do ls -R /shared/directory > ~/remote_file_list.txt
Windows: Use Map Network Drive to mount the shared drive as drive letter X:, then do dir /S X:\shared_directory > C:\remote_file_list.txt
Use the same methods to create a listing of your local folder's contents as local_file_list.txt. Your Python script will then reduce to an exercise in text processing.
Note: I did actually have to do this at work.
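That text-processing step might look something like this (a sketch that assumes each listing line ends with the file name and that names contain no spaces):
# build a set of the .tif names present in the remote listing
present = set()
with open("remote_file_list.txt") as listing:
    for line in listing:
        parts = line.split()
        if parts and parts[-1].lower().endswith(".tif"):
            present.add(parts[-1].split("\\")[-1])  # drop any leading path

# report the ids whose .tif is not in the listing
with open("in.txt") as infile, open("out.txt", "w") as outfile:
    for fid in infile:
        fid = fid.strip().rjust(8, "0")
        if fid + ".tif" not in present:
            outfile.write(fid + "\n")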

Reading updated files on the fly in Python

I'm writing two Python scripts that both parse files. One is a standard Unix logfile and the other is a binary file. I'm trying to figure out the best way to monitor these so I can read data as soon as they're updated. Most of the solutions I've found thus far are Linux-specific, but I need this to work on FreeBSD.
Obviously one approach would be to just run my script every X amount of time, but this seems gross and inefficient. If I want my Python app running continuously in the background monitoring a file and acting on it once it's changed/updated, what's my best bet?
Have you tried KQueue events?
http://docs.python.org/library/select.html#kqueue-objects
kqueue is the FreeBSD / OS X version of inotify (a file change notification service). I haven't used this, but I think it's what you want.
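A minimal sketch of a kqueue-based watcher (BSD/macOS only; the path is just an example): register a vnode filter on the file and block until it is written to or grows.
import os
import select

path = "/var/log/messages"  # example file to watch
fd = os.open(path, os.O_RDONLY)
kq = select.kqueue()
watch = select.kevent(fd,
                      filter=select.KQ_FILTER_VNODE,
                      flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                      fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND)
try:
    while True:
        # blocks until the file is written to or grows
        for event in kq.control([watch], 1, None):
            print("file changed: " + path)  # read/parse the new data here
finally:
    kq.close()
    os.close(fd)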
I once made a sort of daemon process for a parser built in Python. I needed to watch a series of files and process them in Python, and it had to be a truly multi-OS solution (Windows & Linux in this case). I wrote a program that watches over a list of files by checking their modification times. The program sleeps for a while and then checks the modification times of the files being watched. If a modification time is newer than the one previously registered, then the file has changed and so stuff has to be done with it.
Something like this:
import os
import time

path = os.path.dirname(__file__)
print "Looking for files in", path, "..."

# get interesting files
files = [{"file": f} for f in os.listdir(path) if os.path.isfile(f) and os.path.splitext(f)[1].lower() == ".src"]
for f in files:
    f["output"] = os.path.splitext(f["file"])[0] + ".out"
    f["modtime"] = os.path.getmtime(f["file"]) - 10
    print " watching:", f["file"]

while True:
    # sleep for a while
    time.sleep(0.5)
    # check if anything changed
    for f in files:
        # is the mod time of the file newer than the one registered?
        if os.path.getmtime(f["file"]) > f["modtime"]:
            # store the new time and...
            f["modtime"] = os.path.getmtime(f["file"])
            print f["file"], "has changed..."
            # do your stuff here
It does not look like top-notch code, but it works quite well.
There are other SO questions about this, but I don't know if they'll provide a direct answer to your question:
How to implement a pythonic equivalent of tail -F?
How do I watch a file for changes?
How can I "watch" a file for modification / change?
Hope this helps!

Python programming - Windows focus and program process

I'm working on a python program that will automatically combine sets of files based on their names.
Being a newbie, I wasn't quite sure how to go about it, so I decided to just brute force it with the win32api.
So I'm attempting to do everything with virtual keys. I run the script, it selects the top file (after arranging them by name), then sends a right-click command, selects 'combine as Adobe PDF', and then has it push Enter. This launches the Acrobat combine window, where I send another 'Enter' command. Here's where I hit the problem.
The folder where I'm converting these things loses focus and I'm unsure how to get it back. Sending alt+tab commands seems somewhat unreliable; it sometimes switches to the wrong thing.
A much bigger issue for me: different combinations of files take different times to combine. Though I haven't gotten that far in my code, my plan was to set some arbitrarily long time.sleep() command before it finally sent the last 'Enter' command to finish and confirm the file name, completing the combination process. Is there a way to monitor another program's progress? Is there a way to have Python not execute any more code until something else has finished?
I would suggest using a command-line tool like pdftk http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ - it does exactly what you want, it's cross-platform, it's free, and it's a small download.
You can easily call it from Python with (for example) subprocess.Popen.
Edit: sample code as below:
import subprocess
import os

def combine_pdfs(infiles, outfile, basedir=''):
    """
    Accept a list of pdf filenames,
    merge the files,
    save the result as outfile
    #param infiles: list of string, names of PDF files to combine
    #param outfile: string, name of merged PDF file to create
    #param basedir: string, base directory for PDFs (if filenames are not absolute)
    """
    # From the pdftk documentation:
    # Merge Two or More PDFs into a New Document:
    # pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
    if basedir:
        infiles = [os.path.join(basedir, i) for i in infiles]
        outfile = [os.path.join(basedir, outfile)]
    else:
        outfile = [outfile]  # pdftk expects the output name as a separate argument
    pdftk = [r'C:\Program Files (x86)\Pdftk\pdftk.exe']  # or wherever you installed it
    op = ['cat']
    outcmd = ['output']
    args = pdftk + infiles + op + outcmd + outfile
    res = subprocess.call(args)
    return res

combine_pdfs(
    ['p1.pdf', 'p2.pdf'],
    'p_total.pdf',
    'C:\\Users\\Me\\Downloads'
)
