I am working on a project that involves downloading a daily .csv. I have successfully written the code to download the .csv file through Selenium. However, I am having trouble changing the directory when running the entire code.
The code in question is as follows:
download_purchases = driver.find_element_by_xpath('/html/body/div[1]/div/div/div/div[3]/div/div[2]')
download_purchases.click()
fp = os.path.expanduser('~')+'/Desktop/Export_Purchasing/CSV/'
os.chdir(fp)
files = [f for f in os.listdir(fp)]
When I run the whole script up to this point, the files list comprehension produces an empty list. However, when I rerun it (after having tried to run the whole code from the start), the list comprehension is able to detect the downloaded .csv.
How can I make it so that the files are detected on the first pass? I tried quitting the driver with:
driver.quit()
but this didn't fix the problem.
It looks like the files are probably not downloaded by the time you are hitting your last line: files = [f for f in os.listdir(fp)]. To test this, you can add in a sleep like:
import time
download_purchases = driver.find_element_by_xpath('/html/body/div[1]/div/div/div/div[3]/div/div[2]')
download_purchases.click()
fp = os.path.expanduser('~')+'/Desktop/Export_Purchasing/CSV/'
os.chdir(fp)
print("Printed immediately.")
time.sleep(10)
files = [f for f in os.listdir(fp)]
If that works then you know that it is merely a timing issue and you can employ a more sophisticated solution to continue after the download is complete.
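If a fixed sleep confirms the timing issue, one common approach is to poll the download directory until the expected .csv shows up, with a timeout. A minimal sketch, assuming the finished file ends in .csv and (if Chrome is the browser) partial downloads end in .crdownload; wait_for_csv is a hypothetical helper, not part of the original code:
import os
import time

def wait_for_csv(directory, timeout=60):
    # Poll until at least one .csv is present and no partial .crdownload files remain.
    deadline = time.time() + timeout
    while time.time() < deadline:
        names = os.listdir(directory)
        done = [n for n in names if n.endswith('.csv')]
        partial = [n for n in names if n.endswith('.crdownload')]
        if done and not partial:
            return done
        time.sleep(0.5)
    raise TimeoutError('No finished .csv appeared in {} within {}s'.format(directory, timeout))

fp = os.path.expanduser('~') + '/Desktop/Export_Purchasing/CSV/'
files = wait_for_csv(fp)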
I'm trying to use a while loop to loop through .xml files in a folder. However, files are being added to the folder whilst the while loop is running. This is a shortened version of the code I currently use:
import os
my_folder = "d:\\xml\\"
while True:
    files = [f for f in os.listdir(my_folder) if f.endswith(".xml")]
    while files:
        for file in files:
            # do whatever with the xml file
            os.remove(my_folder + file)
        files = [f for f in os.listdir(my_folder) if f.endswith(".xml")]
What I would like to do is tidy up the code by only having one line filling the files list. I'd like to have something like:
while files = [f for f in os.listdir(my_folder) if f.endswith(".xml")]:
But, I know this won't work. I would imagine that Python is capable of this, but I don't know the correct syntax.
Added note:
I'm using Windows 10 with Python 3.7.6
You could simplify your code by removing the inner while loop and the second assignment to files. This will loop indefinitely, see if there are xml files in the directory, and if so process and delete them, before continuing to loop. (You might also add a short sleep in case of no new files.)
while True:
    files = [f for f in os.listdir(my_folder) if f.endswith(".xml")]
    for file in files:
        # do whatever with the xml file
        os.remove(my_folder + file)
As shown in the other answer, you could also use the := operator and something like the following...
while True:
    while (files := [...]):
        ...
... but this would behave exactly the same as the version without the inner while. It only makes a difference if you want code in the outer loop that is not in the inner loop, e.g. to do something whenever there are temporarily no files left.
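For completeness, a sketch of that walrus version with the listdir expression from above spelled out, plus a short sleep in the outer loop for the "temporarily no files left" case (note that := requires Python 3.8+, while the question mentions Python 3.7.6):
import os
import time

my_folder = "d:\\xml\\"

while True:
    while (files := [f for f in os.listdir(my_folder) if f.endswith(".xml")]):
        for file in files:
            # do whatever with the xml file
            os.remove(my_folder + file)
    time.sleep(1)  # folder is empty right now; wait a moment before polling again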
I'm setting up a script that takes .jpg images named {number}.jpg from a folder and compares that number, multiplied by the framerate, to ranges given by a csv file. The jpg is then copied into the same folder as the csv that contained the range it fits in.
So the csv data looks like:
477.01645354635303,1087.1628371628808
1191.5980219780615,1777.622457542435
1915.5956043956126,2525.6515684316387
2687.7457042956867,3299.803336663285
3429.317892107908,4053.6603896103848
4209.835924075932,4809.700129870082
(there are many files but this is one full example)
Each number is compared to each of these ranges and the image is placed in the corresponding folder. If I just print the target file and destination, everything works fine and the results are as expected. But if I try to use any of the shutil copy functions (copy, copyfile, copy2), the loop breaks.
The file structure looks:
Data
|-Training
|--COMPRESSION (CPR)
|---COMPRESSION (CPR).csv
|---Where the image data would go
|--More folders..
|-Validation
|--Same as Training
|-Test
|--Same as Training
This is Python 3. I'm running VS Code on an Ubuntu (Pop!OS) machine. I've tried each of the different shutil copy functions that fit this case (copy, copy2, copyfile). I've tried copying to different folders, and that works. If I copy the files to the parent folder (i.e. Training in the above hierarchy) instead of the subdirectories, it works fine. However, I need them in the subdirectory for labeling purposes.
for cur in file_list:
    with open(cur, 'r') as img:
        filename = ntpath.basename(cur)
        frame_num = int(filename[:-4]) # get number from filename
        frame_num = (frame_num - 1) * (30000./1001.) # it's one second from each frame in a video
        training = get_folders(train_path)
        for folder in training:
            train_csvfile = get_files(train_path + folder)
            if len(train_csvfile) > 0:
                with open(train_csvfile[0], 'r', encoding='latin-1', newline='') as source:
                    train_reader = csv.reader(source, delimiter = ',')
                    for trdata in train_reader:
                        if frame_num > float(trdata[0]) and frame_num < float(trdata[1]):
                            tr_path = os.path.join(train_path + folder, ntpath.basename(cur))
                            copy2(cur,tr_path)
                            print('Copied {} to training folder {}.'.format(filename, tr_path))
Code for getting the files and folders:
def get_folders(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

def get_files(a_dir):
    a_dir = Path(a_dir)
    return [f for f in a_dir.glob('**/*') if f.is_file()]

file_list = get_files('/media/username/Seagate Expansion Drive/EXP 3/S1 C2/frames')
The full output is:
Copied 000017.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000017.jpg.
Copied 000018.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000018.jpg.
Copied 000019.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000019.jpg.
Copied 000021.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000021.jpg.
Traceback (most recent call last):
  File "tfinput.py", line 39, in <module>
    for trdata in train_reader:
_csv.Error: line contains NULL byte
The files listed are copied correctly, as stated (but ONLY those four out of hundreds).
The csv files are not altered at all in this script. The script gets through four images and crashes with the above error. It correctly places these four images. If I try to run the script again without regenerating the data, it crashes immediately. However, if I don't use the copy function, everything works fine and all of the correct input and output directories are printed by my print statements. The script can also be rerun without regeneration when there is no copy statement. This makes me think there must be some kind of overwrite issue, but since I don't actually edit the csv files I can't put my finger on it.
I expect that it should simply copy the files from the source to destination.
EDIT: I went ahead and printed the whole file it gets stuck on. What it seems to do is read the first line and then crash. I tested this on another file and confirmed it just copies the files within the first range and then crashes.
EDIT 2: I was able to get this working by using a try-except block around the block starting with for trdata in train_reader:, however it skipped a lot of entries.
EDIT 3: For those curious, I never figured out the issue, although I suspect it was an overwrite issue, as checking for NULL values without the copy statement came up with nothing. I refactored the code so that I first created a text file of the folder and file names and then read that file and copied the files. That worked perfectly (a rough sketch of that two-pass approach is below).
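For reference, a rough sketch of that two-pass idea under my own assumptions (the manifest file name and the pairs argument are hypothetical, not from the original script): first record (source image, destination folder) pairs to a text file instead of copying inside the CSV-reading loop, then copy in a separate pass once all the CSV files are closed.
import csv
from shutil import copy2

def write_manifest(pairs, manifest_path='manifest.txt'):
    # pairs: list of (source_image_path, destination_folder) tuples collected by the
    # range-comparison loop, instead of calling copy2 there directly.
    with open(manifest_path, 'w', newline='') as manifest:
        csv.writer(manifest).writerows(pairs)

def copy_from_manifest(manifest_path='manifest.txt'):
    # Second pass: all CSV reading is finished, so copying cannot interfere with it.
    with open(manifest_path, 'r', newline='') as manifest:
        for src, dst_folder in csv.reader(manifest):
            copy2(src, dst_folder)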
Thank you for any help!!
I don't think it's a problem with the copy. From the error message it looks like there's NULL byte in the CSV file that is being read. Write some print statements and observe that file.
You may find this helpful: "Line contains NULL byte" in CSV reader (Python).
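For example, a small helper (a sketch; find_null_bytes is not from the original code) that could be called on train_csvfile[0] just before the reader loop to see whether, and where, a NULL byte is present:
def find_null_bytes(path):
    # Print every line number in the file at 'path' that contains a NULL byte.
    with open(path, 'rb') as source:
        for line_no, raw_line in enumerate(source, start=1):
            if b'\x00' in raw_line:
                print('NULL byte on line', line_no, 'of', path)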
I have a Python script that reads a file, extracts certain data from it, stores the data in a dictionary, and finally inserts that into a MySQL table. The first line of my code is where I enter my filename:
filename = "examplelogfile"
which is a text file, and is later referred to as 'filename'. But I have to run this code on almost a thousand files of this type, and I've moved all of them to a certain server (I'm not running the script in the terminal since I need to run it on the same server where the MySQL database is located). What's the easiest way to run my code on all of the files?
Edit: I have tried this but it isn't giving me any output (it is supposed to print all the dictionaries):
import glob
files = glob.glob('~/Desktop/pythoncode/*logfile')
for filename in files:
    rest of code
    print dict
Try the glob module to get all files like this:
import glob
path = 'C:/Users/telli/Desktop/Test Shapes/Shapes/Squares'
filenames = glob.glob(path + '/*.gif')
for filename in filenames:
    # Do something
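Applied to the paths from the question, and keeping in mind that glob.glob does not expand ~ on its own, a sketch might look like this (the parsing step is left as a placeholder):
import glob
import os

# glob does not expand '~', so expand it explicitly first.
path = os.path.expanduser('~/Desktop/pythoncode')
files = glob.glob(os.path.join(path, '*logfile'))

for filename in files:
    # run the existing parsing / MySQL-insert code on `filename` here
    print(filename)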
I am trying to make a minor modification to a Python script made by my predecessor, and I have bumped into a problem. I have studied programming, but coding is not my profession.
The Python script processes SQL queries and writes them to an Excel file; there is a folder where all the queries are kept in .txt format. The script creates a list of the queries found in the folder and goes through them one by one in a for loop.
My problem is that if I rename or add a query in the folder, I get a "[Errno 2] No such file or directory" error. The script uses a relative path, so I am puzzled why it keeps raising errors for non-existent files.
queries_pathNIC = "./queriesNIC/"

def queriesDirer():
    global filelist
    l = 0
    filelist = []
    for file in os.listdir(queries_pathNIC):
        if file.endswith(".txt"):
            l += 1
            filelist.append(file)
    return(l)
Where the problem arises in the main function:
for round in range(0,queriesDirer()):
    print ("\nQuery :",filelist[round])
    file_query = open(queries_pathNIC+filelist[round],'r'); # problem on this line
    file_query = str(file_query.read())
Contents of queriesNIC folder
00_1_Hardware_WelcomeNew.txt
00_2_Software_WelcomeNew.txt
00_3_Software_WelcomeNew_AUTORENEW.txt
The script runs without a problem, but if I change the first query's name to
"00_1_Hardware_WelcomeNew_sth.txt" or anything different, I get the following error message:
FileNotFoundError: [Errno 2] No such file or directory: './queriesNIC/00_1_Hardware_WelcomeNew.txt'
I have also tried adding new text files to the folder (example: "00_1_Hardware_Other.txt"), and the script simply skips the files I added altogether and only processes the original ones.
I am using Python 3.4.
Does anyone have any suggestions what might be the problem?
Thank you
The following approach would be an improvement. The glob module can produce the list of .txt files quite easily, without needing to build the list manually or use a global.
import glob, os

queries_pathNIC = "./queriesNIC/"

def queriesDirer(directory):
    return glob.glob(os.path.join(directory, "*.txt"))

for file_name in queriesDirer(queries_pathNIC):
    print ("Query :", file_name)
    with open(file_name, 'r') as f_query:
        file_query = f_query.read()
From the sample you have given, it is not clear if you need further access to the round variable or the file list.
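If the index is still needed (for example to keep the round numbering from the original loop), enumerate can be wrapped around the same call; a sketch using the queriesDirer defined above:
for round, file_name in enumerate(queriesDirer(queries_pathNIC)):
    print("\nQuery", round, ":", file_name)
    with open(file_name, 'r') as f_query:
        file_query = f_query.read()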
I'm downloading Excel files from a website using beautifulsoup4.
I only need to download the files. I don't need to rename them, just download them to a folder relative to where the code is.
The function takes in a BeautifulSoup object, searches for <a> tags, then makes a request to each link.
def save_excel_files(sfile):
    print("starting")
    for link in sfile.find_all("a"):
        candidate_link = link.get("href")
        if (candidate_link is not None
                and "Flat.File" in candidate_link):
            xfile = requests.get(candidate_link)
            if xfile:
                ### I just don't know what to do...
I've tried using os.path, with open("xtest", "wb") as f:, and many other variations. Been at this for two evenings and totally stuck.
The first issue is that I can't even get the files to download and save anywhere. xfile comes back as a 200 response, so that part is working; I'm just having a hard time coding the actual download and save.
Something like this should've worked:
xfile = requests.get(candidate_link)
file_name = candidate_link.split('/')[-1]
if xfile:
    with open(file_name, "wb") as f:
        f.write(xfile.content)
Tested with the following link I found randomly on Google:
candidate_link = "http://berkeleycollege.edu/browser_check/samples/excel.xls"
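Putting that back into the original function, a sketch of the whole thing might look like the following (the out_dir folder name is my assumption; everything else follows the question's code):
import os
import requests

def save_excel_files(sfile, out_dir="downloads"):  # out_dir name is an assumption
    os.makedirs(out_dir, exist_ok=True)
    for link in sfile.find_all("a"):
        candidate_link = link.get("href")
        if candidate_link is not None and "Flat.File" in candidate_link:
            xfile = requests.get(candidate_link)
            if xfile:
                # name the saved file after the last path segment of the link
                file_name = candidate_link.split('/')[-1]
                with open(os.path.join(out_dir, file_name), "wb") as f:
                    f.write(xfile.content)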