Get number of files in directory with pathlib (Python)

I have two directories with CSV files. Both should contain the same number of files, because I loop over both of them with zip. Therefore I check whether their lengths are the same. The code looks like this:
from pathlib import Path
import pandas as pd
def check():
    base = Path('home/user/src/log').rglob('*.csv')
    test = Path('home/user/src/log').rglob('*.csv')
    print(list(base))
    if len(list(base)) != len(list(test)):
        print(f"Wrong number of files in {str(base)} and {str(test)}")
        return -1
    for base, test in zip(base, test):
        x = pd.read_csv(base)
        y = pd.read_csv(test)
        print(x)
        print(y)
if __name__ == '__main__':
    check()
list(base) gives the list of files, but it also silently kills the program. So if I have print(list(base)), it prints the files in base and then the program terminates.
str(base) does not work either, but that is because I haven't found a way to print the directory path without the program terminating afterwards. Any tips on getting the length of the list and printing the directory without killing the program?
Note: I know I can use os, but I would like to use pathlib if possible.

rglob returns a generator. Calling list() on the generator consumes all of its items, so a second list() call on the same generator returns an empty list.
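A small sketch of that behaviour, using a throwaway temporary directory with hypothetical file names (a.csv, b.csv, c.csv) in place of your real folders: the first list() call drains the iterator returned by rglob, and the second call finds nothing left.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical CSV files, just so rglob has something to find
    for name in ("a.csv", "b.csv", "c.csv"):
        (Path(tmp) / name).write_text("col\n1\n")

    files = Path(tmp).rglob("*.csv")  # lazy iterator, nothing has been listed yet
    first = list(files)               # walks the directory and yields all 3 paths
    second = list(files)              # the iterator is already exhausted
    print(len(first), len(second))    # 3 0
```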
You could however convert it to a list initially and then keep working with the list afterwards:
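from pathlib import Path
import pandas as pd
def check():
    base_dir = Path('home/user/src/log')
    test_dir = Path('home/user/src/log')
    # Materialize each generator once so the result can be reused
    base = list(base_dir.rglob('*.csv'))
    test = list(test_dir.rglob('*.csv'))
    print(base)
    if len(base) != len(test):
        # Print the directory paths rather than the lists of files
        print(f"Wrong number of files in {base_dir} and {test_dir}")
        return -1
    for base_file, test_file in zip(base, test):
        x = pd.read_csv(base_file)
        y = pd.read_csv(test_file)
        print(x)
        print(y)
if __name__ == '__main__':
    check()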
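If all you need is the number of files (as in the title), you can count the items while the generator yields them instead of keeping a list. A minimal sketch; count_csv_files is a hypothetical helper name, and the temporary directory stands in for your real folder.

```python
import tempfile
from pathlib import Path

def count_csv_files(directory):
    """Recursively count *.csv files under directory without keeping a list."""
    return sum(1 for _ in Path(directory).rglob("*.csv"))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one CSV at the top level, one in a subfolder, one non-CSV file
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.csv").write_text("col\n1\n")
    (Path(tmp) / "sub" / "b.csv").write_text("col\n2\n")
    (Path(tmp) / "notes.txt").write_text("ignored")
    n = count_csv_files(tmp)
    print(n)  # 2
```

Because each call creates a fresh generator, calling count_csv_files twice gives the same answer, unlike calling list() twice on one generator.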

Related

How to pass a list from one function to another function in Python3 across three files?

I've been banging my head against this for some time, and I know it might end up being something stupid, but my import game is non-existent.
Basically, I have four files in the same directory: fs.py, z.py, match.py, y.py.
The functions from fs.py and z.py feed into match.py. From there, the results of match.py feed into y.py (this is where I get an issue).
When I feed the list from match.py into y.py, the list is empty.
Each file has a bit of code, but the general flow of logic is below. I'm thinking it has to do with the way I'm importing in my last file: y.py.
I'm able to get what I want when I run match.py, so I know the results from fs.py and z.py are being fed in properly. I've printed out the list that goes from match.py into y.py, and it's not empty, so I'm not sure why it arrives as an empty list in the function in y.py. Am I supposed to also import fs and z in y.py?
fs.py:
def get_fs():
    y_params = [1, 2, 4]
    return y_params
if __name__ == "__main__":
    get_fs()
z.py:
def get_z():
    y_params = [3, 5, 4]
    return y_params
if __name__ == "__main__":
    get_z()
match.py:
from fs import get_fs
from z import get_z
def create_list(fs=[], z=[]):
    match_list = []
    match_list = fs + z
    return match_list
if __name__ == "__main__":
    fs = get_fs()
    z = get_z()
    create_list(fs, z)
y.py:
from match import create_list
def create_new_list(match_list=[]):
    print(match_list)
if __name__ == "__main__":
    match_list = create_list()
    create_new_list(match_list)
If anyone has any ideas, I would really appreciate it. I've been trying to figure this out for the last 2 hours and I'm fairly new to Python. T_T I can also add the original code.
I guess you have misinterpreted the concept of if __name__ == '__main__', which you have used in your code. This condition evaluates to True only when you run the file directly (python file.py); if you import file.py from some other module, the block is not executed.
In your case, match.py builds fs and z inside if __name__ == "__main__", which runs only when you execute python match.py. When y.py imports match, that block is skipped, and y.py then calls create_list() with no arguments, so it falls back to the empty default lists and returns an empty list.
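One possible fix, sketched in a single file for brevity: y.py obtains the lists itself (by importing get_fs and get_z) and passes them to create_list explicitly. The functions below are stand-ins for the ones in fs.py, z.py and match.py; dropping the default arguments from create_list is a suggested change, not part of the original code.

```python
# Stand-ins for the functions in fs.py, z.py and match.py, inlined so this runs as one file.
def get_fs():
    return [1, 2, 4]

def get_z():
    return [3, 5, 4]

def create_list(fs, z):
    # No default arguments: forgetting to pass the lists now raises a TypeError
    # instead of silently returning an empty list
    return fs + z

# --- y.py ---
# In the real project: from fs import get_fs / from z import get_z / from match import create_list
def create_new_list(match_list):
    print(match_list)

if __name__ == "__main__":
    # Build the inputs here, because match.py's __main__ block does not run on import
    match_list = create_list(get_fs(), get_z())
    create_new_list(match_list)  # prints [1, 2, 4, 3, 5, 4]
```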

Python worm: how to make it more complex?

Please be kind; this is my second post, and I hope you all like it.
Here I have made a program that makes directories inside directories,
but I would like a way to make it self-replicate.
Any ideas and help are greatly appreciated.
Before:
user/scripts
After:
user/scripts/worm1/worm2/worm3
The script is as follows:
import os, sys, string, random
worms_made = 0
stop = 20
patha = ''
pathb = '/'
pathc = ''
def fileworm(worms_made, stop, patha, pathb, pathc):
    filename = (''.join(random.choice(string.ascii_lowercase
                + string.ascii_uppercase + string.digits) for i in range(8)))
    pathc = patha + filename + pathb
    worms_made = worms_made + 1
    os.mkdir(filename)
    os.chdir(pathc)
    print("Worms made: %r" % worms_made)
    if worms_made == stop:
        print("*Done")
        sys.exit(0)
    elif worms_made != stop:
        pass
    fileworm(worms_made, stop, patha, pathb, pathc)
fileworm(worms_made, stop, patha, pathb, pathc)
To create a variable depth, you could do something like this:
import os
depth = 3
worms = ['worm{}'.format(x) for x in range(1, depth+1)]
path = os.path.join(r'/user/scripts', *worms)
os.makedirs(path)
os.makedirs() creates all the required folders, including missing intermediate ones, in one call. You just need to build the full path.
Python has a function to help with creating paths called os.path.join(). It makes sure the correct separator (/ or \) for the current operating system is inserted between the parts.
worms is a list containing ["worm1", "worm2", "worm3"]; it is created using a Python feature called a list comprehension. It is passed to os.path.join() with *, which means each element of the list is passed as a separate argument.
I suggest you try adding print worms or print path to see how it works.
The result is that a string like the following is passed to the function that creates your folder structure:
/user/scripts/worm1/worm2/worm3
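A self-contained Python 3 version of the snippet above, as a sketch: it builds the nested path under a temporary directory (standing in for /user/scripts) so it can be run safely, and make_nested is a hypothetical helper name.

```python
import os
import tempfile

def make_nested(root, depth):
    """Create root/worm1/.../worm<depth> with a single os.makedirs call."""
    worms = ['worm{}'.format(x) for x in range(1, depth + 1)]
    path = os.path.join(root, *worms)  # * passes each name as its own argument
    os.makedirs(path)                  # also creates every missing parent folder
    return path

with tempfile.TemporaryDirectory() as root:  # stands in for /user/scripts
    created = make_nested(root, 3)
    exists = os.path.isdir(created)
    rel = os.path.relpath(created, root)
    print(rel)  # worm1/worm2/worm3 on Linux/macOS
```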

how to skip the rest of a sequence

I have a couple of functions that are being called recursively inside nested loops. The ultimate objective of my program is to:
a) loop through each year,
b) within each year, loop through each month (12 total),
c) within each month, loop through each day (using a self-generated day counter),
d) and read 2 files and merge them together into another file.
In each case, I go down into a directory only if it exists; otherwise, I just skip it and go on to the next one. My code does a pretty good job when all the files are present, but when one of the files is missing, I would like to simply skip the whole process of creating a merged file and continue the loops. The problem I am getting is a syntax error stating that continue is not properly in the loop. I only get this error in the function definitions, not outside of them.
Can someone explain why I'm getting this error?
import os, calendar
file01 = 'myfile1.txt'
file02 = 'myfile2.txt'
output = 'mybigfile.txt'
def main():
    #ROOT DIRECTORY
    top_path = r'C:\directory'
    processTop(top_path)
def processTop(path):
    year_list = ['2013', '2014', '2015']
    for year in year_list:
        year_path = os.path.join(path, year)
        if not os.path.isdir(year_path):
            continue
        else:
            for month in range(1, 13):
                month_path = os.path.join(year_path, month)
                if not os.path.isdir(month_path):
                    continue
                else:
                    numDaysInMth = calendar.monthrange(int(year), month)[1]
                    for day in range(1, numDaysInMth+1):
                        processDay(day, month_path)
    print('Done!')
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue
    else:
        createDailyFile(day_path, output)
def createDailyFile(path, dailyFile):
    data01 = openFile(file01, path)
    data02 = openFile(file02, path)
    if len(data01) == 0 or len(data02) == 0:
        # either file is missing
        continue
    else:
        # merge the two datalists into a single list
        # create a file with the merged list
        pass
def openFile(filename, path):
    # return a list of contents of filename
    # returns an empty list if file is missing
    pass
if __name__ == "__main__": main()
You can use continue only directly inside a loop (otherwise, what guarantee do you have that the function was called in a loop in the first place?). If you need stack unwinding, consider using exceptions (Python exception handling).
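A minimal sketch of the exception-based approach, assuming a hypothetical MissingInputError class and using in-memory lists in place of the files read by openFile: the function that cannot finish raises, and only the loop that owns the iteration decides to continue.

```python
class MissingInputError(Exception):
    """Hypothetical exception: one of a day's input files is missing."""

def create_daily_file(data01, data02):
    if not data01 or not data02:
        raise MissingInputError("either file is missing")
    return data01 + data02  # stand-in for the real merge-and-write step

# Hypothetical per-day contents of myfile1.txt / myfile2.txt; day 2 lacks file 1
days = {1: (["a"], ["b"]), 2: ([], ["c"]), 3: (["d"], ["e"])}
merged = {}
for day, (data01, data02) in days.items():
    try:
        merged[day] = create_daily_file(data01, data02)
    except MissingInputError:
        continue  # legal here: we are inside the for loop
print(merged)  # {1: ['a', 'b'], 3: ['d', 'e']}
```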
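I think you can get away with having your functions return a value that says whether the operation completed successfully:
def processDay(day, path):
    day_path = os.path.join(path, str(day))
    if not os.path.isdir(day_path):
        return False  # tell the caller to skip this day
    createDailyFile(day_path, output)
    return True
And then, in your loop, simply write:
for day in range(1, numDaysInMth+1):
    if not processDay(day, month_path):
        continue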
You are probably getting that error in processDay and createDailyFile, right? That's because there is no loop in these functions, and yet you use continue. I'd recommend using return or pass in them.
The continue statement only applies inside loops, as the error message implies. If your functions are structured as you show, you can just use pass.
continue can only appear in a loop, since it tells Python to skip the rest of the current iteration and go to the next one. Hence, this syntax is not valid:
def processDay(day, path):
    day_path = os.path.join(path, day)
    if not os.path.isdir(day_path):
        continue # <============ this continue is not inside a loop !
    else:
        createDailyFile(day_path, output)
The same goes for your createDailyFile function.
You may want to replace it with a return.
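A runnable sketch of the return-based version, assuming (hypothetically) that day folders are named with unpadded numbers such as "2"; a temporary directory stands in for a month folder in which only day 2 exists. Note that os.path.join needs strings, so the day number is converted with str().

```python
import os
import tempfile

def process_day(day, month_path):
    """Return True if the day folder was processed, False if it was skipped."""
    day_path = os.path.join(month_path, str(day))  # os.path.join needs strings
    if not os.path.isdir(day_path):
        return False  # plain return instead of continue: there is no loop here
    # ... read the two files and write the merged file here ...
    return True

with tempfile.TemporaryDirectory() as month_path:
    os.mkdir(os.path.join(month_path, "2"))  # only day 2 exists
    processed = []
    for day in range(1, 4):
        if not process_day(day, month_path):
            continue  # legal: this continue is inside the for loop
        processed.append(day)
    print(processed)  # [2]
```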

Why does adding one print kill my code (Python)?

I was playing with this Sudoku solver that I found.
As quoted here it works perfectly, but if I uncomment the single print a that I commented out (the #print a in the else branch of r), it stops before finding a full solution...?
import sys
from datetime import datetime # for datetime.now()
def same_row(i,j): return (i/9 == j/9)
def same_col(i,j): return (i-j) % 9 == 0
def same_block(i,j): return (i/27 == j/27 and i%9/3 == j%9/3)
def r(a):
    i = a.find('.')
    if i == -1: # All solved !
        print a
    else:
        #print a
        excluded_numbers = set()
        for j in range(81):
            if same_row(i,j) or same_col(i,j) or same_block(i,j):
                excluded_numbers.add(a[j])
        for m in '123456789':
            if m not in excluded_numbers:
                # At this point, m is not excluded by any row, column, or block, so let's place it and recurse
                r(a[:i]+m+a[i+1:])
if __name__ == '__main__':
    if len(sys.argv) == 2:
        filI = open(sys.argv[1])
        for pusI in filI:
            pusI.strip()
            print "pussle:\n",pusI
            timStart = datetime.now()
            r(pusI) # <- Calling the recursive solver ...
            timEnd = datetime.now()
            print "Duration (h:mm:ss.dddddd): "+str(timEnd-timStart)
    else:
        print str(len(sys.argv))
        print 'Usage: python sudoku.py puzzle'
The program needs to be called with a file name. That file should hold one Sudoku per line.
For testing I used this:
25...1........8.6...3...4.1..48.6.9...9.4.8...1..29.4.9.53.7....6..5...7.........
QUESTION:
I can't understand how that single 'print a' manages to break the recursion before it's done. Can anyone give an explanation?
Credit: I originally found the above sudoku solver code here:
http://www.scottkirkwood.com/2006/07/shortest-sudoku-solver-in-python.html
it's also shown here on StackOverflow:
Shortest Sudoku Solver in Python - How does it work?
It actually does find the solution.
I ran the program and got the solution
256491738471238569893765421534876192629143875718529643945387216162954387387612954
If you run it with that line uncommented, as you suggested, and redirect the output to a file:
python solver.py file.txt > output.txt
and then search for the solution string, it is there. It's not the last line; for me it shows up about 67% of the way into the file.
This happens because the solver is a brute-force backtracking search: with print a enabled it prints every partial grid it visits, and after it finds the solution it keeps going as long as there are unexplored branches, so the output ends with the last dead end it tried rather than with the solution.
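To make that concrete, here is a Python 3 sketch (a rewrite for illustration, not the code from the linked post) that searches in the same order as r() but stops at the first complete grid and counts the grids it examines instead of printing them. PEERS, solve and the stats dictionary are names introduced here; the puzzle is the test line from the question.

```python
def same_row(i, j): return i // 9 == j // 9
def same_col(i, j): return (i - j) % 9 == 0
def same_block(i, j): return i // 27 == j // 27 and i % 9 // 3 == j % 9 // 3

# Precompute, for every cell, the cells sharing its row, column or 3x3 block
PEERS = [[j for j in range(81) if same_row(i, j) or same_col(i, j) or same_block(i, j)]
         for i in range(81)]

def solve(a, stats):
    """Depth-first backtracking in the same order as r(); returns the first full grid or None."""
    stats["visited"] += 1          # count every grid examined (what 'print a' would show)
    i = a.find('.')
    if i == -1:
        return a                   # all solved
    excluded = {a[j] for j in PEERS[i]}
    for m in '123456789':
        if m not in excluded:
            result = solve(a[:i] + m + a[i + 1:], stats)
            if result is not None:
                return result      # stop here instead of exploring the remaining branches
    return None

puzzle = "25...1........8.6...3...4.1..48.6.9...9.4.8...1..29.4.9.53.7....6..5...7........."
stats = {"visited": 0}
solution = solve(puzzle, stats)
print(solution)
print("grids examined:", stats["visited"])
```

Returning as soon as a full grid is found is the design change that makes the solution the final output; the original r() has no such early exit, which is why its verbose output does not end with the solution.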

execute python script multiple times

I'm not sure about the best way to do this, but I have a Python script saved as a .py file. The final output of this script is two files, x1.txt and y1.txt.
Basically, I want to run this script, say, 1000 times, and have each run write my two text files with new names, i.e. x1.txt and y1.txt on the first run, then x2.txt and y2.txt on the second run, and so on.
Thinking about this, it seems it might be better to start the whole script with something like
runs = xrange(1000)
for i in runs:
    pass  ## run the script
and then finish with something like
# grouper, values and prefix come from the existing script
for i in runs:
    filnameA = prefix + "a" + str(i)
    open(filnameA + ".txt", "w").write('\n'.join('\t'.join(x for x in g if x) for g in grouper(7, values)))
for i in runs:
    filnameB = prefix + "b" + str(i)
    open(filnameB + ".txt", "w").write('\n'.join('\t'.join(x for x in g if x) for g in grouper(7, values)))
Is this really the best way to do it? I bet it's not. Better ideas?
I know I could import time and write a filename that matches the current time, but that would be annoying to process later.
If your computer has the resources to run these in parallel, you can use multiprocessing to do it. Otherwise use a loop to execute them sequentially.
Your question isn't quite explicit about which part you're stuck with. Do you just need advice about whether you should use a loop? If yes, my answer is above. Or do you also need help with forming the filenames? You can do that part like this:
import sys
def myscript(iteration_number):
    xfile_name = "x%d.txt" % iteration_number
    yfile_name = "y%d.txt" % iteration_number
    with open(xfile_name, "w") as xf:
        with open(yfile_name, "w") as yf:
            # ... whatever your script does goes here, writing to xf and yf
            pass
def main(unused_command_line_args):
    # Runs 1..1000 produce x1.txt/y1.txt ... x1000.txt/y1000.txt
    for i in range(1, 1001):
        myscript(i)
    return 0
if __name__ == '__main__':
    sys.exit(main(sys.argv))
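For the parallel option mentioned above, a minimal sketch using multiprocessing.Pool: myscript here is a hypothetical stand-in that only writes placeholder text, it takes the output folder as an extra argument, and a temporary folder plus a small run count are used so the sketch is safe to try. The __main__ guard is required because worker processes may re-import the module.

```python
import os
import tempfile
from functools import partial
from multiprocessing import Pool

def myscript(out_dir, iteration_number):
    """Hypothetical stand-in for one run of the script: writes x<N>.txt and y<N>.txt."""
    x_path = os.path.join(out_dir, "x%d.txt" % iteration_number)
    y_path = os.path.join(out_dir, "y%d.txt" % iteration_number)
    with open(x_path, "w") as xf, open(y_path, "w") as yf:
        xf.write("x output of run %d\n" % iteration_number)  # placeholder content
        yf.write("y output of run %d\n" % iteration_number)
    return os.path.basename(x_path), os.path.basename(y_path)

if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()   # stands in for your real output folder
    n_runs = 10                    # use 1000 for the real job
    with Pool() as pool:           # one worker process per CPU core by default
        written = pool.map(partial(myscript, out_dir), range(1, n_runs + 1))
    print(written[0], written[-1])  # ('x1.txt', 'y1.txt') ('x10.txt', 'y10.txt')
```

pool.map returns results in input order, so the file names line up with the run numbers even though the runs execute concurrently.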
import subprocess
import sys
script_name = 'dummy_file.py'
output_prefix = 'out'
n_iter = 5
for i in range(n_iter):
    output_file = output_prefix + '_' + str(i) + '.txt'
    # Send the child's stdout and stderr to this run's file, and close it afterwards
    with open(output_file, 'w') as out:
        subprocess.call([sys.executable, script_name], stdout=out, stderr=subprocess.STDOUT)
On running this, you'll get 5 output text files (out_0.txt, ..., out_4.txt).
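A self-contained variant of the subprocess approach, as a sketch: it writes a hypothetical child script into a temporary folder (standing in for dummy_file.py) and passes the run number as a command-line argument, which the child could use to name its own output files. subprocess.run with check=True raises if a run fails.

```python
import os
import subprocess
import sys
import tempfile

work_dir = tempfile.mkdtemp()
# Hypothetical child script standing in for dummy_file.py: prints its run number
script_name = os.path.join(work_dir, "dummy_file.py")
with open(script_name, "w") as f:
    f.write("import sys\nprint('run', sys.argv[1])\n")

n_iter = 3
for i in range(n_iter):
    output_file = os.path.join(work_dir, "out_%d.txt" % i)
    with open(output_file, "w") as out:
        # sys.executable runs the child with the same interpreter as this script
        subprocess.run([sys.executable, script_name, str(i)],
                       stdout=out, stderr=subprocess.STDOUT, check=True)

with open(os.path.join(work_dir, "out_2.txt")) as f:
    last = f.read().strip()
print(last)  # run 2
```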
I'm not sure, but maybe this can help:
Suppose I want to print 'hello' 10 times without writing the print statement 10 times. To do this, I can define a function:
# Function for printing a message 10 times:
def func(x):
    for _ in range(10):
        print(x)
func("hello")
