How to change the output file name in a Python batch process? - python

I am running a model in batches. I have included my code below.
I have 1000+ data files. Each time I run the script I get an output for each input file, but the output for every input is named OUTPUT.0001.nc, where nc is the extension and 0001 is the iteration number. If I set the number of iterations to 3, I get output files OUTPUT.0001.nc, OUTPUT.0002.nc, OUTPUT.0003.nc.
Initially I wrote the script for a much smaller number of grids and did some manual work to analyse the results, but now I have to do the same for 1000+ grids.
For the smaller number of grids (say 12), I ran the code, saved the results of each input in a new folder with the same name as the input file, deleted all the iteration results except the last one, and finally renamed that file.
Now that the number is large, I need the script itself to delete all the iterations except the last one and rename the remaining output after the input file.
How can I do this?
Example: input file = 1.dat, output file = 1.nc, renamed from OUTPUT.0005.nc.
Names like OUTPUT.0000.nc are produced by the model's code; I can't alter this.
#!/usr/bin/python
import numpy as np
import os
import shutil
import glob

# Name of the current working directory
current_directory = os.getcwd()

# Loop for creating the file name
for point in np.arange(1, 101):
    x = point
    fname = str(x) + ".dat"
    if os.path.exists(fname):
        command = "./driver.exe" + " " + str(fname)
        # Invoke the command in the terminal
        os.system(command)
        # Create a directory named after the input file
        directory = str(x)
        os.makedirs(directory)
        # Path of the created directory
        destination = os.path.abspath(directory)
        # Path of the existing input file
        source = os.path.abspath(fname)
        # Move all files with the ".nc" extension to the relevant directory
        files = glob.iglob(os.path.join(current_directory, "*.nc"))
        for file in files:
            if os.path.isfile(file):
                shutil.move(file, destination)
        # Move the file used for execution into the relevant directory
        shutil.move(source, destination)
What I want to do is run the model in batch:
Loop 1 takes input file 1.dat, then
1. delete all the iteration outputs except the last one
2. rename that output after the input file name, i.e. 1.nc
Loop 2 takes input file 2.dat, and so on.
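One way to do this (a minimal sketch; keep_last_output is a hypothetical helper, and it assumes the iteration outputs land in the current directory, all match the pattern OUTPUT.*.nc, and the highest-numbered file is the final iteration):

import glob
import os

def keep_last_output(input_name):
    # Zero-padded numbers sort correctly as strings, so the last element
    # of the sorted list is the final iteration (e.g. OUTPUT.0005.nc)
    outputs = sorted(glob.glob("OUTPUT.*.nc"))
    if not outputs:
        return
    # Delete every iteration output except the last one
    for old in outputs[:-1]:
        os.remove(old)
    # Rename the last output after the input file: 1.dat -> 1.nc
    base = os.path.splitext(input_name)[0]
    os.rename(outputs[-1], base + ".nc")

Calling keep_last_output(fname) right after os.system(command) in the script above would leave a single 1.nc (then 2.nc, and so on) for the move step to pick up.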

Related

Program to check all last modified files in a folder using python?

import glob
import os
import time

Path = 'Aabmatica/*'  # Folder path: every file inside the Aabmatica folder
list_of_files = glob.glob(Path)
latest_file = max(list_of_files, key=os.path.getmtime)
print()
print("last modified/added file is:", end=" ")
print(latest_file)
print()
modification_time = os.path.getmtime(latest_file)
local_time = time.ctime(modification_time)
print("modified time: ", local_time)
I made a Python program (above) which gives me the name of the last modified file in a folder.
The program runs well: if I create a new file or edit an existing file in the folder, it gives the correct output. But if I copy a file into the folder, that copy is not reported.
Also, how can I show all the recently modified files in the folder using this program?
So there are basically two problems with this program: copied files are not detected, and I am unable to show all the recently modified files in the folder.
In Windows a copy of a file probably has a new creation time. You can look at os.path.getctime() to check the creation time for the copy of the file.
If that works as expected then you could include os.path.getctime() as an additional check in the key to max().
def latest_change(filename):
    return max(os.path.getmtime(filename), os.path.getctime(filename))

latest_file = max(list_of_files, key=latest_change)
The key function just grabs whichever of the modification or creation time is greatest, and then uses that greatest time as the key.
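To show all the recently modified files rather than just one (the second problem), a minimal sketch, reusing list_of_files and latest_change from above, is to sort the whole list by the same key, newest first:

for f in sorted(list_of_files, key=latest_change, reverse=True):
    print(f, time.ctime(latest_change(f)))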

Operating within the file list from sub-directories in Python

I have been trying to operate on the individual files contained in a list of files. Each file contains two-column data which I want to integrate to get a value. The main folder (main directory) contains multiple subfolders (sub-directories) named after each day. All files are in '.csv' format, so I don't have to worry about selecting specific file formats. I am trying to read the contents of each file within the sub-directories with the following code.
file_path = r"C:\Users\......\Totalfiles"
read_files = glob.glob(os.path.join(file_path, "*.csv"))
np_array_values = []
for (root, dirs, files) in os.walk(file_path):
    for filenames in files:
        values = pd.read_csv(filenames, encoding='utf-8', header=0)
        np_array_values.append(values)
        print(filenames)
There are multiple errors here.
If I skip the second for loop, only a list of 8 files gets printed, as follows:
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
2019-10-12-18-28.csv
But if the inner for loop is used, the full list of 100+ files is generated. However, an error is raised from the sixth line, where I try to assign the variable called values; this is displayed:
File b'2019-08-10_05-58.csv' does not exist: b'2019-08-10_05-58.csv'
This program runs totally fine for a single directory if I don't use the outer for loop. I couldn't find examples of this kind of problem (probably due to incorrect keywords). I assume this would be useful for everyone working with loads of measurement files. Help on this would be much appreciated.
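The error message points at the likely cause: os.walk yields bare file names, so pd.read_csv looks for them relative to the current working directory rather than the sub-directory they live in. A minimal sketch of the usual fix is to join each name with its root before reading:

for (root, dirs, files) in os.walk(file_path):
    for filename in files:
        full_path = os.path.join(root, filename)  # includes the sub-directory
        values = pd.read_csv(full_path, encoding='utf-8', header=0)
        np_array_values.append(values)
        print(full_path)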

Using shutil copy breaks loop over csv data

I'm setting up a script that takes .jpg images named {number}.jpg from a folder and compares that number, multiplied by the framerate, to ranges given by a csv file. The jpg is then copied into the same folder as the csv that contained the range it fits in.
So the csv data looks like:
477.01645354635303,1087.1628371628808
1191.5980219780615,1777.622457542435
1915.5956043956126,2525.6515684316387
2687.7457042956867,3299.803336663285
3429.317892107908,4053.6603896103848
4209.835924075932,4809.700129870082
(there are many files but this is one full example)
Each number would be compared to each of these ranges and the image placed in the corresponding folder. If I just print the target file and destination, everything works fine and the results are as expected. But if I try to use any of the shutil copy functions (copy, copyfile, copy2), the loop breaks.
The file structure looks like:
Data
|-Training
|--COMPRESSION (CPR)
|---COMPRESSION (CPR).csv
|---Where the image data would go
|--More folders..
|-Validation
|--Same as Training
|-Test
|--Same as Training
This is Python 3. I'm running VS Code on an Ubuntu-based (Pop!_OS) machine. I've tried each of the shutil copy functions that fit this case (copy, copy2, copyfile). I've tried copying to different folders and that works. If I copy the files to the parent folder (i.e. Training in the above hierarchy) instead of the sub-directories, it works fine. However, I need them in the sub-directory for labeling purposes.
for cur in file_list:
    with open(cur, 'r') as img:
        filename = ntpath.basename(cur)
        frame_num = int(filename[:-4])  # get number from filename
        frame_num = (frame_num - 1) * (30000./1001.)  # it's one second from each frame in a video
        training = get_folders(train_path)
        for folder in training:
            train_csvfile = get_files(train_path + folder)
            if len(train_csvfile) > 0:
                with open(train_csvfile[0], 'r', encoding='latin-1', newline='') as source:
                    train_reader = csv.reader(source, delimiter=',')
                    for trdata in train_reader:
                        if frame_num > float(trdata[0]) and frame_num < float(trdata[1]):
                            tr_path = os.path.join(train_path + folder, ntpath.basename(cur))
                            copy2(cur, tr_path)
                            print('Copied {} to training folder {}.'.format(filename, tr_path))
Code for getting the files and folders:
def get_folders(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

def get_files(a_dir):
    a_dir = Path(a_dir)
    return [f for f in a_dir.glob('**/*') if f.is_file()]

file_list = get_files('/media/username/Seagate Expansion Drive/EXP 3/S1 C2/frames')
The full output is:
Copied 000017.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000017.jpg.
Copied 000018.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000018.jpg.
Copied 000019.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000019.jpg.
Copied 000021.jpg to training folder /home/username/Downloads/Event Data CSV/Data/Training/CPR (COMPRESSION)/000021.jpg.
Traceback (most recent call last):
  File "tfinput.py", line 39, in <module>
    for trdata in train_reader:
_csv.Error: line contains NULL byte
The files are copied correctly, as shown, but ONLY those four out of hundreds.
The csv files are not altered at all by this script. The script gets through four images, placing them correctly, and then crashes with the above error. If I try to run the script again without regenerating the data, it crashes immediately. However, if I don't use the copy function, everything works fine and all the correct input and output directories appear in my print statements. The script can also be rerun without regenerating the data when there is no copy statement. This makes me think there must be some kind of overwrite issue, but since I don't actually edit the csv files I can't put my finger on it.
I expect it to simply copy the files from the source to the destination.
EDIT: I went ahead and printed the whole file it gets stuck on. It seems to read the first line and then crash. I tested this on another file and confirmed that it copies the files within the first range and then crashes.
EDIT 2: I was able to get this working by wrapping the block starting with for trdata in train_reader: in a try-except, but it skipped a lot of entries.
EDIT 3: For those curious, I never figured out the issue, although I suspect it was an overwrite issue, as checking for NULL values without the copy statement turned up nothing. I refactored the code to first write a text file of the folder and file names, then read that file back and copy the files. That worked perfectly.
Thank you for any help!!
I don't think it's a problem with the copy. From the error message it looks like there's a NULL byte in the CSV file that is being read. Add some print statements and observe that file.
You may find this helpful: "Line contains NULL byte" in CSV reader (Python)
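One plausible cause, though this is an assumption not confirmed in the thread: get_files globs '**/*' and returns every file, so once the first .jpg images have been copied into a folder, train_csvfile[0] can be a .jpg rather than the .csv, and feeding a binary jpg to csv.reader yields exactly "line contains NULL byte". That would also explain why a rerun crashes immediately. A minimal guard would filter for real .csv files:

# Hypothetical guard: only treat actual .csv files as the range file,
# so jpgs copied into the folder on earlier passes are ignored
csv_files = [f for f in get_files(train_path + folder)
             if str(f).lower().endswith('.csv')]
if csv_files:
    with open(csv_files[0], 'r', encoding='latin-1', newline='') as source:
        train_reader = csv.reader(source, delimiter=',')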

How do I find the average of the numbers in multiple text files?

I have multiple (around 50) text files in a folder and I wish to find the mean average of all these files. Is there a way for Python to automatically add up all the numbers in each of these files and find the average?
I assume you don't want to type the names of all the files in manually, so the first step is to collect the file names in Python so that you can use them in the next step.
import os
import numpy as np

Initial_directory = "<the full address to the 50 files you have, ending with />"
Files = []
for file in os.listdir(Initial_directory):
    Files.append(Initial_directory + file)
Now the list called "Files" has all the 50 files. Let's make another list to save the average of each file.
Reading the data from each file depends on how the data is stored, but I assume that each line holds a single value.
Averages = []
for i in range(len(Files)):
    Data = np.loadtxt(Files[i])
    Averages.append(np.average(Data))
Looping over all the files, Data stores the values in each file and their average is appended to the list Averages.
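Note that Averages holds one mean per file. If what you want is a single mean over every number in all 50 files (another reading of "the mean average of all these files"), a minimal sketch is to concatenate the data first:

# atleast_1d guards against files that contain only a single value,
# for which np.loadtxt returns a 0-d array
All_data = np.concatenate([np.atleast_1d(np.loadtxt(f)) for f in Files])
print(np.average(All_data))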
This can be done if we unpack the steps needed to accomplish it.
Steps:
Python has a module called os that allows you to interact with the file system. You'll need this for accessing the files and reading from them.
Declare a few variables as counters to be used for the duration of your script, including the name of the directory where the files reside.
Loop over the files in the directory, incrementing the file_count variable by 1 (to get the total number of files, used for averaging at the end of the script).
Join each file's name with the directory to create a path, so the open function finds the right file.
Read each file and add each line (assuming it's a number) to the running total (used for averaging at the end of the script), removing the newline character.
Finally, print the average or continue using it in the script for whatever you need.
You could try something like the following:
#!/usr/bin/env python
import os

file_count = 0
total = 0
dir_name = 'your_directory_path_here'
for file_name in os.listdir(dir_name):
    file_count += 1
    # Build the full path so open() finds the file inside dir_name
    file_path = os.path.join(dir_name, file_name)
    with open(file_path, 'r') as file:
        for line in file.readlines():
            total += int(line.strip('\n'))
avg = total / file_count
print(avg)
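One caveat (my reading, not stated in the answer): avg here is the grand total divided by the number of files, as the steps above describe. If the mean of every individual number is wanted instead, count values rather than files, for example:

import os

value_count = 0
total = 0
dir_name = 'your_directory_path_here'
for file_name in os.listdir(dir_name):
    file_path = os.path.join(dir_name, file_name)
    with open(file_path, 'r') as file:
        for line in file:
            total += int(line.strip('\n'))
            value_count += 1  # count every value, not every file
print(total / value_count)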

How to input all files within a directory with a certain ending? Python

I have a folder called Test: '/Desktop/Test/'
I have several files in the folder (e.g. 1.fa, 2.fa, 3.fa, X.fa),
or '/Desktop/Test/1.fa', '/Desktop/Test/2.fa', '/Desktop/Test/3.fa', '/Desktop/Test/X.fa'.
I'm trying to write a function that opens every file in the directory '/Desktop/Test/' ending in .fa, without making a function with 20 or so variables (the only way I know how to do it).
Example:
def simple(input):
    # input would be the directory '/Desktop/Test/'
    # for each .fa in the directory (e.g. 1.fa, 2.fa, 3.fa, X.fa) I need it to create a list of all the strings within each file
    # query = open(input, 'r').read().split('\n') is what I would use in my simple codes that only have one input
How can one input all files within a directory with a certain ending (.fa)?
You can do:
import glob
import os

def simple(input):
    os.chdir(input)
    for file in glob.glob("*.fa"):
        with open(file, 'r+') as f:
            print(f.readlines())  # or do whatever
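For example, simple('/Desktop/Test/') would print the lines of every .fa file in that folder. If changing the working directory is undesirable, an equivalent sketch (my variant, not the answer's) joins the directory into the glob pattern instead:

import glob
import os

def simple(input):
    # Match *.fa inside the given directory without chdir
    for file in glob.glob(os.path.join(input, "*.fa")):
        with open(file, 'r') as f:
            print(f.readlines())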
