Time module and file changes - python

I need to write a script that does the following
Write a python script to list all of the files and directories in the current directory and all subdirectories that have been modified in the last X minutes.
X should be taken in as a command-line argument.
Check that this argument exists, and exit with a suitable error message if it doesn’t.
X should be an int which is less than or equal to 120. If not, exit with a suitable error message.
For each of these files and directories, list the time of modification, whether it is a file or directory,
and its size.
I have come up with this
#!/usr/bin/python
import os,sys,time
total = len(sys.argv)
if total < 2:
    print "You need to enter a value in minutes"
    sys.exit()
var = int(sys.argv[1])
if var < 1 or var > 120 :
    print "The value has to be between 1 and 120"
    sys.exit()
past = time.time() - var * 60
result = []
dir = os.getcwd()
for p, ds, fs in os.walk(dir):
    for fn in fs:
        filepath = os.path.join(p, fn)
        status = os.stat(filepath).st_mtime
        if os.path.getmtime(filepath) >= past:
            size = os.path.getsize(filepath)
            result.append(filepath)
            created = os.stat(fn).st_mtime
            asciiTime = time.asctime( time.gmtime( created ) )
            print "Files that have changed are %s"%(result)
            print "Size of file is %s"%(size)
So it reports back with something like this
Files that have changed are ['/home/admin/Python/osglob2.py']
Size of file is 729
Files that have changed are ['/home/admin/Python/osglob2.py', '/home/admin/Python/endswith.py']
Size of file is 285
Files that have changed are ['/home/admin/Python/osglob2.py', '/home/admin/Python/endswith.py', '/home/admin/Python/glob3.py']
Size of file is 633
How can I get this to stop repeating the files?

The reason your code builds a list of all the files it's encountered is
result.append(filepath)
and the reason it prints out that whole list every time is
print "Files that have changed are %s"%(result)
So you will need to change one of those lines: either replace the list, rather than appending to it, or (much more sensible IMO) just print out the one latest filename found, rather than the whole list.
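A minimal sketch of the second suggestion, assuming Python 3 (the question's script is Python 2). It reports each entry once, together with its modification time, type and size, instead of printing a growing list. The function name recently_modified and its now parameter are made up for illustration and are not part of the original script.

```python
import os
import time

def recently_modified(root, minutes, now=None):
    """Yield (path, kind, size, mtime) for entries under root changed in the last `minutes`."""
    now = time.time() if now is None else now
    cutoff = now - minutes * 60
    for dirpath, dirnames, filenames in os.walk(root):
        # Tag each name with its kind so directories are listed as well as files.
        entries = [(d, "directory") for d in dirnames] + [(f, "file") for f in filenames]
        for name, kind in entries:
            path = os.path.join(dirpath, name)  # full path, so stat works in subdirectories
            try:
                info = os.stat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if info.st_mtime >= cutoff:
                # One tuple per entry: nothing accumulates, so nothing repeats.
                yield path, kind, info.st_size, info.st_mtime
```

To print the report, loop over recently_modified(os.getcwd(), var) and format each tuple, for example with time.ctime(mtime) for the timestamp.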

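You aren't clearing your result list at the end of each iteration. Try something like result.clear() after your second print statement. Note that list.clear() only exists in Python 3.3 and later; the script in the question is Python 2, where del result[:] does the same thing. Make sure it is on the same indent as the for though, not the print.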

Related

How to take multiple VLC snapshot into specific folder

This code is working well, but it keeps replacing the previous snapshot, and in a location I don't want.
How can I keep taking a snapshot every second without replacing the previous one, and how can I specify the folder where these PNG files are saved?
import time
import vlc
player=vlc.MediaPlayer('rtsp://admin:888999@thesport.fujiko.biz:554/unicast/c3/s0/live')
player.play()
while 1:
    time.sleep(1)
    player.video_take_snapshot(0, '.snapshot.tmp.png', 0, 0)
It's easy, every time you get a frame, store it in different variable, like this
As the one comment says, you need to change the filename for each subsequent save. I would keep a counter in your loop and format its value into the filename string. For example:
player=vlc.MediaPlayer('rtsp://admin:888999@thesport.fujiko.biz:554/unicast/c3/s0/live')
player.play()
i = 0
while 1:
    time.sleep(1)
    player.video_take_snapshot(0, '.snapshot_{}.tmp.png'.format(i), 0, 0)
    i += 1
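The question also asks how to choose the folder. A small sketch of one way to do that, assuming you pass a full path as the second argument of video_take_snapshot; the helper name snapshot_path and the folder name "snapshots" are hypothetical, and the block itself does not import vlc so it runs without a player.

```python
import os

def snapshot_path(folder, index):
    """Build a unique PNG path inside `folder` for the index-th snapshot."""
    os.makedirs(folder, exist_ok=True)  # create the target folder if it is missing
    # Zero-padding keeps the files in order when sorted by name.
    return os.path.join(folder, "snapshot_{:05d}.png".format(index))

# Hypothetical use inside the loop above (needs python-vlc and a running player):
# player.video_take_snapshot(0, snapshot_path("snapshots", i), 0, 0)
```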
If you simply specify a directory name, rather than a filename, vlc will create a unique file name for you, based on the date and time.
i.e.
file:///home/rolf/vlcsnap-2020-08-14-10h43m06s020.png
file:///home/rolf/vlcsnap-2020-08-14-10h43m08s936.png
#Video Snapshot
def OnSnapShot(self,evt):
    media_state = self.player.get_state()
    if media_state.value < 3 or media_state.value > 4:
        return
    if os.path.isfile(self.currentlyplaying):
        dir_name = os.path.dirname(self.currentlyplaying)
    else:
        dir_name = self.home_dir
    snapshot_size = self.player.video_get_size(0)
    x=self.player.video_take_snapshot(0, dir_name,snapshot_size[0],snapshot_size[1])
    if x == 0:
        Notify(self,"Snapshot","Image saved in "+dir_name)

Is there a memory-efficient way to use a tuple to iterate over very large os.scandir() objects?

EDITED TO FOCUS ON FILENAME (PARTIAL) MATCHING:
I am working with approximately 1.8 million files in one directory on a remote server. I am using os.scandir() in python3 to generate the full list of files in that directory, then checking each file name against an existing tuple, and, if there is a match, copying that file to a separate directory (also on the remote server).
The tuple I am using to check for the proper filenames is ~100,000 items long. Further, each item in the tuple is only a partial match for the actual filename -- for example, a tuple item might be '2019007432' and I want it to match a filename such as '2019007432_longunpredictablefilename.doc'. So I've used .startswith when searching filenames, rather than looking for exact matches.
I have successfully been able to run this code one time, but the script slows down progressively as it goes on, maxing out my computer's RAM -- and it took about 24 hours to run. As I will be adding to the 1.8 million files in the future, and I may have additional (longer) tuples with which to find and copy files, I'm looking for ways to streamline the code so it will run more efficiently. Does anyone have any suggestions about how to make this work any better?
import os
import shutil
from variables import file_tuple
srcpath = 'path/to/source/directory'
destpath = 'path/to/destination/directory'
counter = 1
copy_counter = 1
error_list = []
all_files = os.scandir(srcpath)
for file in all_files:
    try:
        if file.name.startswith(file_tuple):
            shutil.copy(srcpath + '/' + file.name, destpath)
            print('copied ' + str(counter) + ' -- ' + str(copy_counter))
            copy_counter +=1
        else:
            if counter % 5000 == 0:
                percent = "{0:.0%}".format(counter/1860000)
                print(str(counter) + ' -- ' + str(percent))
    except Exception as e:
        print(e)
        error_list.append(file.name)
    counter +=1
print(error_list)
So now let's talk about the algorithm. In my opinion, one of the best ideas is to shrink the list of files you have to check. Try to find a pattern shared by the names in the tuple: for example, they all start with a digit, all end with a digit, contain only digits, or fall within a specific length range. Once you have filtered the files down to that subset, you can search a much smaller list. It will still be O(N^2), although it might be significantly more efficient. In effect it is one additional loop over all files that looks for the shared pattern.
In case this ends up being useful to others: following the advice from @juanpa.arrivillaga, I converted my tuple into a set, and altered my code to split the filenames generated by scandir so that they would be an exact match with the items in the set. This ran in about 6 hours (compared to 24+ hours the original way). See code below.
I haven't yet tried @polkas's suggestion, which is to break down the tuple into smaller chunks and run them separately, but I suspect it will be very useful a) for unstable internet connections, to allow me to run only sections at a time without losing my place when the internet drops, and/or b) when the filenames cannot be easily split at a known character.
import sys  # needed for sys.exc_info() below
file_set = set(file_tuple)  # set membership is a hash lookup, not a scan
files = os.scandir(srcpath)
for file in files:
    try:
        UID = file.name.split("_")[0]
        if UID in file_set:
            shutil.copy(srcpath + '/' + file.name, destpath)
            print('copied ' + str(counter) + ' -- ' + str(copy_counter))
            copy_counter +=1
        else:
            if counter % 5000 == 0:
                percent = "{0:.0%}".format(counter/1860000)
                print(str(counter) + ' -- ' + str(percent))
    except Exception as e:
        print('Error on line {}'.format(sys.exc_info()[-1].tb_lineno), type(e).__name__, e)
        error_list.append(file.name)
    counter +=1
print(error_list)
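The set-based matching can be isolated into a small generator, which makes it easy to check on a handful of names before pointing it at 1.8 million files. This is only a sketch: the name files_matching_ids and the sep parameter are invented for illustration, and it assumes every wanted ID is exactly the part of the filename before the first underscore. That is stricter than startswith: a file named '20190074321_x.doc' would match the prefix '2019007432' under startswith but not here.

```python
def files_matching_ids(names, wanted_ids, sep="_"):
    """Yield the names whose part before the first `sep` is in wanted_ids."""
    wanted = set(wanted_ids)  # build the set once; lookups are O(1) on average
    for name in names:
        # Only the leading ID is compared, so each name costs one hash lookup
        # instead of one startswith test per tuple item.
        if name.split(sep, 1)[0] in wanted:
            yield name
```

With os.scandir it could be fed lazily, for example files_matching_ids((entry.name for entry in os.scandir(srcpath)), file_tuple), so the full directory listing is never held in memory at once.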

Using a generator in a while loop and evaluate after every yield

I'm using Python to open some files in a CAD program. Since the program will crash when I open too many files at once, I want my script to stop opening files from a list I generated when the sum of their file sizes exceeds a certain value.
Here is what I have so far:
I'm converting the log file to a list. It contains the file paths separated by commas:
from contextlib import suppress
fList = []
with open('C:/Users/user/Desktop/log.txt', 'r') as f:
    fList = f.read().split(',')
with suppress(ValueError, AttributeError):
    fList.remove('')
fcount = len(fList)
This is the generator that I use to iterate over the partList:
def partGenerator(partList):
    for file in partList:
        yield file
Here I try to loop over the files while the sum of their sizes is smaller than 2500000 bytes:
count = 0
progression = 0
storage = 0
while storage < 2500000:
    for file in partGenerator(fList):
        name = os.path.basename(file)
        storage += os.path.getsize(file)
        print(f'Auslastung: {storage} bite / 2500000 bite')
        oDoc = oApp.Documents.Open(file)
        progression += 1
        percent = round(100 * progression / fcount)
        print(f'Fortschritt: {progression} / {fcount} ({percent} %) - {name}')
What happens is that the files open properly in the CAD software, but they don't stop opening after the while condition is exceeded. My guess is that the while condition is evaluated after the list runs out of entries, and not after every entry like I want it to.
Help on the correct syntax would be great!
What I'm looking for ultimately:
I would like to use this script in a way that it opens some files and, whenever I manually close one in the CAD program, it opens the next one from my list until the list is exhausted.
Your while condition is never checked, no, because the for loop never lets Python check. That the for loop takes elements from a generator function is neither here nor there.
You need to check inside your for loop if your condition still holds:
for file in partGenerator(fList):
    name = os.path.basename(file)
    storage += os.path.getsize(file)
    if storage >= 2500000:
        # wait for input before continuing, then reset the storage amount
        input("Please close some files to continue, then press ENTER")
        storage = 0
Python doesn't check while conditions until the full suite (series of statements) in the block under the while ...: statement has completed running or executes a continue statement, so a while condition really isn't suitable here.
In the above example I used the low-tech input() function to ask whoever is running the script to press ENTER afterwards. It'll depend on what oApp.Documents actually offers as an API to see if you could use that to detect that files have been closed.
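To see the per-item check in isolation, here is a small sketch that groups files into batches by total size, with no CAD program and no input() involved. The name batches_by_size and the (name, size) pairs are assumptions made for the example; in the real script the sizes would come from os.path.getsize.

```python
def batches_by_size(items, limit):
    """Group (name, size) pairs into batches whose total size stays below `limit`."""
    batch, total = [], 0
    for name, size in items:
        # The budget is tested on every iteration, which is exactly what an
        # outer `while` condition around the whole for loop cannot do.
        if batch and total + size >= limit:
            yield batch
            batch, total = [], 0
        batch.append(name)
        total += size
    if batch:
        yield batch  # hand out whatever is left at the end
```

Each yielded batch could then be opened in one go, pausing between batches until some documents have been closed.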
If you wanted to use a generator function, have it track file sizes. You can even have it read those from the CSV file. I'd use the csv module to handle splitting and progress, by the way:
import csv
import os
def parts(logfile):
    with open(logfile, newline='') as f:
        reader = csv.reader(f)
        files = [column for row in reader for column in row if column]
    fcount = len(files)
    storage = 0
    # start=1 so the progress line counts from 1 up to fcount
    for i, filename in enumerate(files, start=1):
        storage += os.path.getsize(filename)
        if storage >= 2500000:
            input("Please close some files to continue, then press ENTER")
            storage = 0
        print(f'Auslastung: {storage} bite / 2500000 bite')
        yield filename
        name = os.path.basename(filename)
        print(f'Fortschritt: {i} / {fcount} ({i / fcount:.2%}) - {name}')
then just use
for file in parts('C:/Users/user/Desktop/log.txt'):
    oDoc = oApp.Documents.Open(file)
Note that the absolute number of files open is what your OS limits on, not how large those files are.
With the input from Martijn Pieters I came up with something that works perfectly for me. I'm a noob in programming so it took me a while to understand the problem. Here is what worked just fine in the end:
import os
import time
from contextlib import suppress
fList = []
with open('C:/Users/jhoefler/Desktop/log.txt', 'r') as f:
    fList = f.read().split(',')
with suppress(ValueError, AttributeError):
    fList.remove('')
fcount = len(fList)
count = 0
progression = 0
for file in fList:
    name = os.path.basename(file)
    if oApp.Documents.Count < 10:
        oDoc = oApp.Documents.Open(file)
    else:
        pCount = oApp.Documents.LoadedCount
        fCount = oApp.Documents.LoadedCount
        while fCount == pCount:
            time.sleep(1)
            pCount = oApp.Documents.LoadedCount
        oDoc = oApp.Documents.Open(file)
    progression += 1
    percent = round(100 * progression / fcount)
    print(f'Fortschritt: {progression} / {fcount} ({percent} %) - {name}')
I'm sure there is a more elegant way to solve the problem, but it works for my needs just fine.

python worm how to make it more complex?

Please be kind, this is my second post and I hope you all like it.
Here I have made a program that makes directories inside directories,
but the problem is I would like a way to make it self-replicate.
Any ideas and help are greatly appreciated.
Before:
user/scripts
After:
user/scripts/worm1/worm2/worm3
The script is as follows:
import os, sys, string, random
worms_made = 0
stop = 20
patha = ''
pathb = '/'
pathc = ''
def fileworm(worms_made, stop, patha, pathb, pathc):
    filename = (''.join(random.choice(string.ascii_lowercase
        +string.ascii_uppercase + string.digits) for i in range(8)))
    pathc = patha + filename + pathb
    worms_made = worms_made + 1
    os.system("mkdir %s" % filename)
    os.chdir(pathc)
    print "Worms made: %r" % worms_made
    if worms_made == stop:
        print "*Done"
        exit(0)
    elif worms_made != stop:
        pass
    fileworm(worms_made, stop, patha, pathb, pathc)
fileworm(worms_made, stop, patha, pathb, pathc)
To create a variable depth, you could do something like this:
import os
depth = 3
worms = ['worm{}'.format(x) for x in range(1, depth+1)]
path = os.path.join(r'/user/scripts', *worms)
os.makedirs(path)
As mentioned, os.makedirs() will create all the required folders in one call. You just need to build the full path.
Python has a function to help with creating paths called os.path.join(). This makes sure the correct / or \ is automatically added for the current operating system between each part.
worms is a list containing ["worm1", "worm2", "worm3"]; it is created using a Python feature called a list comprehension. This is passed to the os.path.join() function using *, meaning that each element of the list is passed as a separate parameter.
I suggest you try adding print worms or print path to see how it works.
The result is that a string looking something like as follows is passed to the function to create your folder structure:
/user/scripts/worm1/worm2/worm3
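A runnable variant of the same idea, assuming Python 3.2 or later for the exist_ok argument. The function name make_worm_dirs is made up for this sketch, and it takes the base folder as a parameter so it can be tried in a scratch directory rather than /user/scripts.

```python
import os

def make_worm_dirs(base, depth):
    """Create base/worm1/worm2/.../worm<depth> and return the deepest path."""
    names = ["worm{}".format(n) for n in range(1, depth + 1)]
    path = os.path.join(base, *names)  # * unpacks the list into separate arguments
    os.makedirs(path, exist_ok=True)   # creates every missing level in one call
    return path
```

Because exist_ok=True is set, calling it a second time with the same arguments does not raise an error.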

how to skip the rest of a sequence

I have a couple of functions that are being called recursively inside nested loops. The ultimate objective of my program is to:
a) loop through each year,
b) within each year, loop through each month (12 total),
c) within each month, loop through each day (using a self generated day counter),
d) and read 2 files and merge them together into another file.
In each instance, I am going down into the directory only if it exists. Otherwise, I just want to skip it and go to the next one. My code does a pretty good job when all the files are present, but when one of the files is missing, I would like to simply skip the whole process of creating a merged file and continue the loops. The problem I am getting is a syntax error that states that continue is not properly in the loop. I am only getting this error in the function definitions, and not outside of them.
Can someone explain why I'm getting this error?
import os, calendar
file01 = 'myfile1.txt'
file02 = 'myfile2.txt'
output = 'mybigfile.txt'
def main():
    #ROOT DIRECTORY
    top_path = r'C:\directory'
    processTop(top_path)
def processTop(path):
    year_list = ['2013', '2014', '2015']
    for year in year_list:
        year_path = os.path.join(path, year)
        if not os.path.isdir(year_path):
            continue
        else:
            for month in range(1, 13):
                month_path = os.path.join(year_path, month)
                if not os.path.isdir(month_path):
                    continue
                else:
                    numDaysInMth = calendar.monthrange(int(year), month)[1]
                    for day in range(1, numDaysInMth+1):
                        processDay(day, month_path)
    print('Done!')
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue
    else:
        createDailyFile(day_path, output)
def createDailyFile(path, dailyFile):
    data01 = openFile(file01, path)
    data02 = openFile(file02, path)
    if len(data01) == 0 or len(data02) == 0:
        # either file is missing
        continue
    else:
        # merge the two datalists into a single list
        # create a file with the merged list
        pass
def openFile(filename, path):
    # return a list of contents of filename
    # returns an empty list if file is missing
    pass
if __name__ == "__main__": main()
You can only use continue directly inside a loop (otherwise, what guarantee would you have that the function was called from a loop in the first place?). If you need stack unwinding, consider using exceptions (see Python exception handling).
I think you can get away with having your functions return a value that would say if operation was completed successfully:
def processDay(day, path):
    do_some_job()
    if should_continue:
        return False
    return True
And then in your main code simply say
if not processDay(day, path):
    continue
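Put together, the return-value pattern might look like the sketch below. It is self-contained for illustration only: process_day and process_month are hypothetical stand-ins for the question's processDay and the day loop in processTop, and the "work" is reduced to checking that the directory exists.

```python
import os

def process_day(day_path):
    """Return True if the day directory exists and was processed, else False."""
    if not os.path.isdir(day_path):
        return False  # `return` leaves the function; the caller decides what to skip
    # ... merge the two files here ...
    return True

def process_month(month_path, days):
    """Process each day directory under month_path; return the days that were handled."""
    processed = []
    for day in days:
        # str(day) because os.path.join needs strings, not ints
        day_path = os.path.join(month_path, str(day))
        if not process_day(day_path):
            continue  # legal here: this statement sits directly inside the for loop
        processed.append(day)
    return processed
```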
You are probably getting that error in processDay and createDailyFile, right? That's because there is no loop in these functions, and yet you use continue. I'd recommend using return or pass in them.
The continue statement only applies in loops, as the error message implies. If your functions are structured as you show, you can just use pass.
continue can only appear in a loop since it tells Python not to execute the lines below and to go to the next iteration. Hence, this syntax is not valid:
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue # <============ this continue is not inside a loop !
    else:
        createDailyFile(day_path, output)
Same for your createDailyFile function.
You may want to replace it with a return?
