Using shutil copy breaks loop over csv data - python

I'm setting up a script that takes .jpg images named {number}.jpg from a folder and compares that number, multiplied by the framerate, to a range given by a csv file. The jpg is then copied into the same folder as the csv that contained the range it fits in.
So the csv data looks like:
477.01645354635303,1087.1628371628808
1191.5980219780615,1777.622457542435
1915.5956043956126,2525.6515684316387
2687.7457042956867,3299.803336663285
3429.317892107908,4053.6603896103848
4209.835924075932,4809.700129870082
(there are many files but this is one full example)
Each number would be compared to each of these ranges and the image placed in the corresponding folder. If I just print the target file and destination, everything works fine and the results are as expected. But if I try to use any of the shutil copy functions (copy, copyfile, copy2), the loop is broken.
The file structure looks like:
Data
|-Training
|--COMPRESSION (CPR)
|---COMPRESSION (CPR).csv
|---Where the image data would go
|--More folders..
|-Validation
|--Same as Training
|-Test
|--Same as Training
This is Python 3. I'm running VS Code on an Ubuntu (Pop!_OS) machine. I've tried each of the different shutil copy functions that fit this case (copy, copy2, copyfile). I've tried copying to different folders, and that works. If I copy the files to the parent folder (i.e. Training in the above hierarchy) instead of the subdirectories, it works fine. However, I need them in the subdirectory for labeling purposes.
for cur in file_list:
    with open(cur, 'r') as img:
        filename = ntpath.basename(cur)
        frame_num = int(filename[:-4]) # get number from filename
        frame_num = (frame_num - 1) * (30000./1001.) # it's one second from each frame in a video
        training = get_folders(train_path)
        for folder in training:
            train_csvfile = get_files(train_path + folder)
            if len(train_csvfile) > 0:
                with open(train_csvfile[0], 'r', encoding='latin-1', newline='') as source:
                    train_reader = csv.reader(source, delimiter = ',')
                    for trdata in train_reader:
                        if frame_num > float(trdata[0]) and frame_num < float(trdata[1]):
                            tr_path = os.path.join(train_path + folder, ntpath.basename(cur))
                            copy2(cur,tr_path)
                            print('Copied {} to training folder {}.'.format(filename, tr_path))
Code for getting the files and folders:
def get_folders(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

def get_files(a_dir):
    a_dir = Path(a_dir)
    return [f for f in a_dir.glob('**/*') if f.is_file()]

file_list = get_files('/media/username/Seagate Expansion Drive/EXP 3/S1 C2/frames')
The full output is:
Copied 000017.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000017.jpg.
Copied 000018.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000018.jpg.
Copied 000019.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000019.jpg.
Copied 000021.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000021.jpg.
Traceback (most recent call last):
  File "tfinput.py", line 39, in <module>
    for trdata in train_reader:
_csv.Error: line contains NULL byte
The files are copied correctly as stated (but ONLY those four out of hundreds).
The csv files are not altered at all in this script. The script gets through four images and crashes with the above error. It correctly places these four images. If I try to run the script again without regenerating the data, it crashes immediately. However, if I don't use the copy function, everything works fine and all of the correct input and output directories are given in my print statements. The script can also be rerun without regenerating the data when there is no copy statement. This makes me think there must be some kind of overwrite issue, but since I don't actually edit the csv files I can't put my finger on it.
I expect that it should simply copy the files from the source to destination.
EDIT: I went ahead and printed the whole file it gets stuck on, and what it seems to do is read the first line and then crash. I tested this on another file and confirmed it just copies the files within the first range and then crashes.
EDIT 2: I was able to get this working by wrapping the block starting with for trdata in train_reader: in a try-except block; however, it skipped a lot of entries.
EDIT 3: For those curious, I never figured out the issue, although I suspect it was an overwrite issue, as checking for NULL values without the copy statement came up with nothing. I refactored the code so that it first writes a text file of the folder and file names and then reads that file and copies the files. That worked perfectly.
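Roughly, the two-pass version looks like this (the plan file name and the helper names here are just illustrative, not the exact script):

from shutil import copy2

def record_copy(plan_path, src, dst):
    # called where copy2(cur, tr_path) used to be, while the CSVs are still being read
    with open(plan_path, 'a') as plan:
        plan.write('{}\t{}\n'.format(src, dst))

def perform_copies(plan_path):
    # run only after every CSV has been read, so nothing is written into a
    # folder whose CSV might still be in use
    with open(plan_path) as plan:
        for line in plan:
            src, dst = line.rstrip('\n').split('\t')
            copy2(src, dst)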
Thank you for any help!!

I don't think it's a problem with the copy. From the error message it looks like there's a NULL byte in the CSV file that is being read. Write some print statements and observe that file.
You may find this helpful: "Line contains NULL byte" in CSV reader (Python)
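For example, a rough way to do that check (a sketch; pass it the same train_csvfile[0] path from the question):

def has_null_bytes(path):
    # read the raw bytes so nothing gets decoded or silently replaced
    with open(path, 'rb') as source:
        raw = source.read()
    if b'\x00' in raw:
        print('first NULL byte at offset {} in {}'.format(raw.index(b'\x00'), path))
        return True
    return False

# e.g. call has_null_bytes(train_csvfile[0]) just before csv.reader is used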

Related

Run only if the "if" statement is true

So I have a question. I'm reading a fits file and then using information from its header to define the other files which are related to the original fits file. But for some of the fits files, the other files (blaze_file, bis_file, ccf_table) are not available, and because of that my code gives the pretty obvious error that there is no such file or directory.
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits

PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        e2ds_hdu = fits.open(filename)
        e2ds_header = e2ds_hdu[0].header
        date = e2ds_header['DATE-OBS']
        date2 = date = date[0:19]
        blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
        bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
        ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
        if not all(file in os.listdir(PATH) for file in [blaze_file,bis_file,ccf_table]):
            continue
So what I want to do is make my code run only if all the files are available, and otherwise skip. But the problem is that I'm defining the other files as variables inside the for loop, since I'm using the header information. So how can I define them before the for loop and then use something like...
Can anyone help me out with this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        filepath = os.path.join(PATH, filename)
        e2ds_hdu = fits.open(filepath)
        …
Let the filenames be ['a', 'b', 'a_ed2ds_A.fits', 'b_ed2ds_A.fits']. The code now excludes the first two names and then prepends the file path to the remaining two.
a_ed2ds_A.fits becomes /home/Desktop/2d_spectra/a_ed2ds_A.fits and
b_ed2ds_A.fits becomes /home/Desktop/2d_spectra/b_ed2ds_A.fits.
Now they can be accessed from everywhere, not just from the given file path.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentioned only arises if the script is not started from inside the said directory. Nevertheless, applying the fix will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on information from that first file.
There are several ways to accomplish your goal:
Just extend your loop with the proper tests.
Pseudo code:
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if all files exist:
            proceed
or
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if not all files exist:
            continue # actual keyword, no pseudo code!
        proceed
Put some functionality into functions (variation of 1.)
Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over them to actually work with the data.
If I am still missing some points or am not detailed enough, please let me know.
Since you have to read the fits file to know the names of the other dependent files, there's no way you can avoid reading the fits file first. The only thing you can do is test for the dependent files' existence before trying to read them and skip the rest of the loop (using continue) if they are not there.
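As a rough sketch of that check (reusing the header keyword and glob patterns from the question, which I have not verified, and assuming the dependent files live under PATH):

import os
from glob import glob
from astropy.io import fits

PATH = os.path.join("home", "Desktop", "2d_spectra")

for filename in os.listdir(PATH):
    if not filename.endswith("_e2ds_A.fits"):
        continue
    e2ds_hdu = fits.open(os.path.join(PATH, filename))
    e2ds_header = e2ds_hdu[0].header
    date2 = e2ds_header['DATE-OBS'][0:19]
    blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
    # glob returns (possibly empty) lists of matching paths
    bis_files = glob(os.path.join(PATH, 'HARPS.' + date2 + '*_bis_G2_A.fits'))
    ccf_tables = glob(os.path.join(PATH, 'HARPS.' + date2 + '*_ccf_G2_A.tbl'))
    if not (os.path.exists(os.path.join(PATH, blaze_file)) and bis_files and ccf_tables):
        continue  # at least one dependent file is missing; skip this fits file
    # ... proceed with blaze_file, bis_files[0] and ccf_tables[0] here ...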
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))

Iterating through subdirectories to add unique strings to each file

My goal: To build a program that:
Opens a folder (provided by the user) from the user's computer
Iterates through that folder, opening each document in each subdirectory (named according to language codes; "AR," "EN," "ES," etc.)
Substitutes a new string for an old string in each document. Crucially, the new string will change with each document (though the old string will not), according to the language code in the folder name.
My level of experience: Minimal; been learning python for a few months but this is the first program I'm building that's not paint-by-numbers. I'm building it to make a process at work faster. I'm sure I'm not building this as efficiently as possible; I've been throwing it together from my own knowledge and from reading stackexchange religiously while building it.
Research I've done on my own: I've been living in stackexchange the past few days, but I haven't found anyone doing quite what I'm doing (which was very surprising to me). I'm not sure if this is just because I lack the vocabulary to search (tried out a lot of search terms, but none of them totally match what I'm doing) or if this is just the wrong way of going about things.
The issue I'm running into:
I'm getting this error:
Traceback (most recent call last):
  File "test5.py", line 52, in <module>
    for f in os.listdir(src_dir):
OSError: [Errno 20] Not a directory: 'ExploringEduTubingEN(1).txt'
I'm not sure how to iterate through every file in the subdirectories and update a string within each file (not the file names) with a new and unique string. I thought I had it, but this error has totally thrown me off. Prior to this, I was getting an error on the same line that said "Not a file or directory: 'ExploringEduTubingEN(1).txt'", and it's surprising to me that the first error could ask for a file or a directory, yet once I fixed that, it asked for just a directory; it seems like it should've just asked for a directory from the beginning.
With no further ado, the code (placed at the bottom because it's long, so the context comes first):
import os

ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
#Asking for a PDF to which we'll iteratively append the language codes from below.
lst = ['_ar.pdf', '_cs.pdf', '_de.pdf', '_el.pdf', '_en_gb.pdf', '_es.pdf', '_es_419.pdf',
       '_fr.pdf', '_id.pdf', '_it.pdf', '_ja.pdf', '_ko.pdf', '_nl.pdf', '_pl.pdf', '_pt_br.pdf', '_pt_pt.pdf', '_ro.pdf', '_ru.pdf',
       '_sv.pdf', '_th.pdf', '_tr.pdf', '_vi.pdf', '_zh_tw.pdf', '_vn.pdf', '_zh_cn.pdf']
#list of language code PDF appending strings.
pdf_list=open('pdflist.txt','w+')
#creating a document to put this group of PDF filepaths in.
pdf2='pdflist.txt'
#making this an actual variable.
for word in lst:
    pdf_list.write(ex + word + "\n")
    #creating a version of the PDF example for every item in the language list, and then appending the language codes.
pdf_list.seek(0)
langlist=pdf_list.readlines()
#creating a list of the PDF paths so that I can use it below.
for i in langlist:
    i=i.rstrip("\n")
    #removing the line breaks.
pdf_list.close()
#closing the file after removing the line breaks.
file1=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
#the folder provided by the user to iterate through.
folder1=os.listdir(file1)
#creating a list of the files within the folder
pdfpath1="example.pdf"
langfile="example2.pdf"
#setting variables for below
#my thought here is that i'd need to make the variable the initial folder, then make it a list, then iterate through the list.
for ogfile in folder1:
    #want to iterate through all the files in the directory, including in subdirectories
    src_dir=ogfile.split("/",6)
    src_dir="/".join(src_dir[:6])
    #goal here is to cut off the language code folder name and then join it again, w/o language code.
    for f in os.listdir(src_dir):
        f = os.path.join(src_dir, f)
        #i admit this got a little convoluted–i'm trying to make sure the files put the right code in, I.E. that the document from the folder ending in "AR" gets the PDF that will now end in "AR"
        #the perils of pulling from lots of different questions in stackexchange
        with open(ogfile, 'r+') as f:
            content = f.read()
            f.seek(0)
            f.truncate()
            for langfile in langlist:
                f.write(content.replace(pdfpath1, langfile))
            #replacing the placeholder PDF link with the created PDF links from the beginning of the code
If you read this far, thanks. I've tried to provide as much information as possible, especially about my thought process. I'll keep trying things and reading, but I'd love to have more eyes on it.
You have to specify the full path to your directories/files. Use os.path.join to create a valid (and platform-independent) path to your file or directory.
For replacing your string, simply modify your example string using the subfolder name. Assuming that ex has the format filename.pdf, you could use: newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'. That way, you do not have to specify the list of replacement strings nor loop through this list.
Solution
To iterate over your directory and replace the content of your files as you'd like, you can do the following:
# Get the name of the file: "example.pdf" (note the .pdf is assumed here)
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
# Get the folder to go through
folderpath=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
# Get all subfolders and go through them (named: 'AR', 'DE', etc.)
subfolders=os.listdir(folderpath)
for subfolder in subfolders:
    # Get the full path to the subfolder
    fullsubfolder = os.path.join(folderpath,subfolder)
    # If it is a directory, go through it
    if os.path.isdir(fullsubfolder):
        # Find all files in subdirectory and go through each of them
        files = os.listdir(fullsubfolder)
        for filename in files:
            # Get full path to the file
            fullfile = os.path.join(fullsubfolder, filename)
            # If it is a file, process it (note: we do not check if it is a text file here)
            if os.path.isfile(fullfile):
                with open(fullfile, 'r+') as f:
                    content = f.read()
                    f.seek(0)
                    f.truncate()
                    # Create the replacing string based on the subdirectory name. Ex: 'example_ar.pdf'
                    newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'
                    f.write(content.replace(ex, newstring))
Note
Instead of asking the user to type in the folder path, you could ask them to pick the directory with a dialog box. See this question for more info: Use GUI to open directory in Python 3
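A minimal sketch of that idea, assuming tkinter is available (in Python 2 the modules are named Tkinter and tkFileDialog instead):

import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()  # hide the empty main window
folderpath = filedialog.askdirectory(title="Select the folder to convert")
print(folderpath)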

How to change the output file name in Python in a batch process?

I am running a model in batches. I have included my code below.
I have 1000+ data files. Each time I run the script I get one output for each input file; however, the name of each output comes out as OUTPUT.0001.nc, where nc is the extension and 0001 is the iteration number. If the number of iterations is 3, I will get output files OUTPUT.0001.nc, OUTPUT.0002.nc, OUTPUT.0003.nc.
Initially I wrote the script for a much smaller number of grids and did some manual work to analyse the results, but now I have to do the same things for 1000+ grids.
For a smaller number of grids (say 12), I was running the code, saving the results of each input file in a new folder with the same name as the input file, and renaming them at the end after deleting all the iteration results but the last one.
However, since the number is now large, I need to change the output file name (to match the input) through the script, after deleting all the iterations except the last one.
How to do this?
Example: input file = 1.dat, output file = 1.nc (renamed from OUTPUT.0005.nc).
Names such as OUTPUT.0000.nc are created by the model's code; I can't alter this.
#!/usr/bin/python
import numpy as np
import os
import shutil
import glob

# Name of the current working directory
current_directory=os.getcwd()

# Loop for creating the file name
for point in np.arange(1,101):
    x=point
    fname=str(x)+".dat"
    if os.path.exists(fname):
        command="./driver.exe"+" "+str(fname)
        # Invoke command in the terminal
        os.system(command)
        # create directories
        directory=str(x)
        os.makedirs(directory)
        # path of created directory
        destination = os.path.abspath(directory)
        # path of existing file
        source = os.path.abspath(fname)
        # Move all files with ".nc" extension to relevant directory
        files = glob.iglob(os.path.join(current_directory, "*.nc"))
        for file in files:
            if os.path.isfile(file):
                shutil.move(file, destination)
        # Move file which is used for execution in relevant directory
        shutil.move(source, destination)
What I want to do is run the model in batch:
Loop 1 will take input file 1.dat
1. Delete all the iteration outputs except the last one
2. Rename that output to match the input file name, i.e. 1.nc
Loop 2 will take input file 2.dat, and so on.
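One way to sketch steps 1 and 2, assuming the last iteration is simply the highest-numbered OUTPUT.NNNN.nc file (this sketch is not from the thread):

import glob
import os

def keep_last_and_rename(input_fname):
    # input_fname is e.g. "1.dat"; the model has just written OUTPUT.0001.nc, OUTPUT.0002.nc, ...
    nc_files = sorted(glob.glob("OUTPUT.*.nc"))  # zero-padded numbers sort correctly as strings
    if not nc_files:
        return
    for f in nc_files[:-1]:  # delete every iteration except the last one
        os.remove(f)
    os.rename(nc_files[-1], input_fname[:-4] + ".nc")  # e.g. OUTPUT.0005.nc -> 1.nc

Calling keep_last_and_rename(fname) right after os.system(command) would leave a single 1.nc to move into the per-grid directory.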

Python - [Errno 2] No such file or directory

I am trying to make a minor modification to a python script made by my predecessor and I have bumped into a problem. I have studied programming, but coding is not my profession.
The python script processes SQL queries and writes them to an Excel file; there is a folder where all the queries are kept in .txt format. The script creates a list of the queries found in the folder and goes through them one by one in a for loop.
My problem is that if I rename or add a query in the folder, I get a "[Errno 2] No such file or directory" error. The script uses relative paths, so I am puzzled as to why it keeps raising errors for non-existent files.
queries_pathNIC = "./queriesNIC/"

def queriesDirer():
    global filelist
    l = 0
    filelist = []
    for file in os.listdir(queries_pathNIC):
        if file.endswith(".txt"):
            l+=1
            filelist.append(file)
    return(l)
Where the problem arises in the main function:
for round in range(0,queriesDirer()):
    print ("\nQuery :",filelist[round])
    file_query = open(queries_pathNIC+filelist[round],'r'); # problem on this line
    file_query = str(file_query.read())
Contents of the queriesNIC folder:
00_1_Hardware_WelcomeNew.txt
00_2_Software_WelcomeNew.txt
00_3_Software_WelcomeNew_AUTORENEW.txt
The script runs without a problem, but if I change the first query name to "00_1_Hardware_WelcomeNew_sth.txt" or anything different, I get the following error message:
FileNotFoundError: [Errno 2] No such file or directory: './queriesNIC/00_1_Hardware_WelcomeNew.txt'
I have also tried adding new text files to the folder (for example "00_1_Hardware_Other.txt"), and the script simply skips the ones I added altogether and only processes the original files.
I am using Python 3.4.
Does anyone have any suggestions what might be the problem?
Thank you
The following approach would be an improvement. The glob module can produce a list of the files ending with .txt quite easily, without you needing to build the list yourself.
import glob, os

queries_pathNIC = "./queriesNIC/"

def queriesDirer(directory):
    return glob.glob(os.path.join(directory, "*.txt"))

for file_name in queriesDirer(queries_pathNIC):
    print ("Query :", file_name)
    with open(file_name, 'r') as f_query:
        file_query = f_query.read()
From the sample you have given, it is not clear if you need further access to the round variable or the file list.
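If the round counter is still needed, one option is to enumerate over the same list (a small sketch reusing queriesDirer from above):

for round, file_name in enumerate(queriesDirer(queries_pathNIC)):
    print ("\nQuery", round, ":", file_name)
    with open(file_name, 'r') as f_query:
        file_query = f_query.read()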

Taking data from files which are in a folder

How do I get the data from multiple .txt files placed in a specific folder? I started with the code below but could not fix it. It gives an error like "No such file or directory: '.idea'" (??)
(Let's say I have a folder A and in it there are x.txt, y.txt, z.txt and so on. I am trying to get and print the information from all the files x, y, z.)
def find_get(folder):
    for file in os.listdir(folder):
        f = open(file, 'r')
        for data in open(file, 'r'):
            print data

find_get('filex')
Thanks.
If you just want to print each line:
import glob
import os
def find_get(path):
    for f in glob.glob(os.path.join(path,"*.txt")):
        with open(f) as data:  # glob already returns the path joined with the filename
            for line in data:
                print(line)
glob will find only the .txt files in the specified path.
Your error comes from not joining the path to the filename; unless the file is in the same directory you are running the code from, Python will not be able to find it without the full path. Another issue is that you seem to have a directory named .idea, which would also give you an error when you try to open it as a file. This also presumes you actually have permission to read the files in the directory.
If your files were larger, I would avoid reading everything into memory and/or storing the full content.
First of all, make sure you add the folder name to the file name, so the file can be found relative to where the script is executed.
To do so you want to use os.path.join, which, as its name suggests, joins paths. So, using a generator:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield f.read()

# this consumes the generator to a list
files_data = list(find_get('filex'))
See what we got in the list that consumed the generator:
print files_data
It may be more convenient to produce tuples which can be used to construct a dict:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield (relative_file_path, f.read(), )

# this consumes the generator to a dict
files_data = dict(find_get('filex'))
You will now have a mapping from the file's relative path to its content.
Also, take a look at the answer by @Padraic Cunningham. He brought up the glob module, which is suitable in this case.
The error you're facing is simple: listdir returns filenames, not full pathnames. To turn them into pathnames you can access from your current working directory, you have to join them to the directory path:
for filename in os.listdir(directory):
    pathname = os.path.join(directory, filename)
    with open(pathname) as f:
        # do stuff
So, in your case, there's a file named .idea in the folder directory, but you're trying to open a file named .idea in the current working directory, and there is no such file.
There are at least four other potential problems with your code that you also need to think about and possibly fix after this one:
You don't handle errors. There are many very common reasons you may not be able to open and read a file--it may be a directory, you may not have read access, it may be exclusively locked, it may have been moved since your listdir, etc. And those aren't logic errors in your code or user errors in specifying the wrong directory, they're part of the normal flow of events, so your code should handle them, not just die. Which means you need a try statement.
You don't do anything with the files but print out every line. Basically, this is like running cat folder/* from the shell. Is that what you want? If not, you have to figure out what you want and write the corresponding code.
You open the same file twice in a row, without closing in between. At best this is wasteful, at worst it will mean your code doesn't run on any system where opens are exclusive by default. (Are there such systems? Unless you know the answer to that is "no", you should assume there are.)
You don't close your files. Sure, the garbage collector will get to them eventually--and if you're using CPython and know how it works, you can even prove the maximum number of open file handles that your code can accumulate is fixed and pretty small. But why rely on that? Just use a with statement, or call close.
However, none of those problems are related to your current error. So, while you have to fix them too, don't expect fixing one of them to make the first problem go away.
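To illustrate how the first, third, and fourth points can be handled together, here is a rough sketch (the folder name is just a placeholder):

import os

def print_all_files(folder):
    for filename in os.listdir(folder):
        pathname = os.path.join(folder, filename)
        try:
            # "with" guarantees the file is closed, and it is opened only once
            with open(pathname) as f:
                for line in f:
                    print(line)
        except (IOError, OSError) as err:
            # directories, unreadable files, files removed after listdir, ...
            print('skipping {}: {}'.format(pathname, err))

print_all_files('filex')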
Full variant:
import os

def find_get(path):
    files = {}
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path,file)):
            with open(os.path.join(path,file), "r") as data:
                files[file] = data.read()
    return files

print(find_get("filex"))
Output:
{'1.txt': 'dsad', '2.txt': 'fsdfs'}
After that, you could generate one file from that content, etc.
Key things:
os.listdir returns a list of names without the full path, so you need to join the initial path with each found item before operating on it.
a dict is ideal for collecting the results :)
os.listdir returns both files and folders, so you need to check whether each list item is really a file
You should check whether the file is actually a file and not a folder, since you can't open folders for reading. Also, you can't just open the bare file name, since the file is under a folder, so you should build the correct path with os.path.join. Check below:
import os

def find_get(folder):
    for file in os.listdir(folder):
        if not os.path.isfile(os.path.join(folder, file)):
            continue # skip other directories
        f = open(os.path.join(folder, file), 'r')
        for line in f:
            print line
