I have multiple (around 50) text files in a folder, and I wish to find the mean average of all these files. Is there a way for Python to add up all the numbers in each of these files automatically and find the average for them?
I assume you don't want to type the names of all the files manually, so the first step is to get the file names in Python so that you can use them in the next step.
import os
import numpy as np

Initial_directory = "<the full path to the folder with your 50 files, ending with />"
Files = []
for file in os.listdir(Initial_directory):
    Files.append(Initial_directory + file)
Now the list called "Files" holds the paths of all 50 files. Let's make another list to store the average of each file.
Reading the data from each file depends on how the data is stored, but I assume that each line contains a single value.
Averages = []
for i in range(len(Files)):
    Data = np.loadtxt(Files[i])
    Averages.append(np.average(Data))
Looping over all the files, Data stores the values from each file, and their average is then appended to the list Averages.
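Putting the two pieces together, a minimal sketch (assuming every file holds one number per line, and using os.path.join() to build the paths; the folder path is a placeholder):

import os
import numpy as np

Initial_directory = "/path/to/your/folder/"  # placeholder

Files = [os.path.join(Initial_directory, f) for f in os.listdir(Initial_directory)]
Averages = [np.average(np.loadtxt(f)) for f in Files]

# Mean of the per-file averages
print(np.average(Averages))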
This can be done if we unpack the steps needed to accomplish it.
Steps:
1. Python has a module called os that allows you to interact with the file system. You'll need this for accessing the files and reading from them.
2. Declare a few variables as counters to be used for the duration of your script, including the directory name where the files reside.
3. Loop over the files in the directory, incrementing the file_count variable by 1 (to get the total number of files, used for averaging at the end of the script).
4. Join each file's name with the directory to create a path for the open function to find the correct file.
5. Read each file and add each line (assuming it's a number) to the running total (used for averaging at the end of the script), removing the newline character.
6. Finally, print the average or continue using it in the script for whatever you need.
You could try something like the following:
#!/usr/bin/env python
import os

file_count = 0
total = 0
dir_name = 'your_directory_path_here'

for file_name in os.listdir(dir_name):
    file_count += 1
    file_path = os.path.join(dir_name, file_name)
    with open(file_path, 'r') as file:
        for line in file.readlines():
            total += int(line.strip('\n'))

avg = total / file_count
print(avg)
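Note that avg here is the grand total divided by the number of files. If you instead want the mean of every number across all files, a sketch (still assuming one integer per line, and the same placeholder directory) could count the values rather than the files:

#!/usr/bin/env python
import os

dir_name = 'your_directory_path_here'  # placeholder
total = 0
value_count = 0

for file_name in os.listdir(dir_name):
    with open(os.path.join(dir_name, file_name), 'r') as f:
        for line in f:
            line = line.strip()
            if line:                 # skip blank lines
                total += int(line)
                value_count += 1

print(total / value_count)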
I have an output folder where all the files get dumped. I need to check the folder every five minutes and pick up the list of recently added files using Python.
One way of doing this is using sets and taking the non-intersected files; is there any other, better approach?
A code snippet would be much appreciated.
Thanks
To solve this, you can make use of listdir() from the os module and sleep() from the time module.
import os
from time import sleep

path = "/path/to/folder/with/csv/files"

with open("log.txt", "a+") as log_file:
    while True:
        log_file.seek(0)
        existing = [f.strip() for f in log_file]
        csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
        if len(csvs) > 0:
            print(f"Found {len(csvs)} new file(s):")
            for f in csvs:
                print(f)
            print("\n")
        else:
            print("Found 0 new files.")
        log_file.writelines([f"{f}\n" for f in csvs])
        sleep(300)
We will be storing the existing file names in a .txt file. You could use a .json file or any other file type you like. Firstly, we open the file using with/open (in append/read mode) and get a list of the file names that have previously been stored in the text file. We then get a list of all of the .csv files in that directory that are not in the file:
csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
os.listdir(path) lists all of the files and folders in the given directory.
The following if/else statement is simply for output purposes and is not required. It is only saying: if new csv files were found, print how many and the names of each. If none were found, print that zero were found.
All that's left to do is write the newly discovered file names into the .txt file so that on the next iteration, they will be marked as existing and not new:
log_file.writelines([f"{f}\n" for f in csvs])
The final line, sleep(300), makes the program wait 300 seconds, or 5 minutes, to iterate again.
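Since you mentioned sets: a minimal in-memory variant looks like the sketch below. The trade-off is that the set of seen files is lost when the script restarts, which is why the answer above persists the names to log.txt.

import os
from time import sleep

path = "/path/to/folder/with/csv/files"
seen = set(os.listdir(path))  # snapshot of what is already there

while True:
    sleep(300)
    current = {f for f in os.listdir(path) if f.endswith(".csv")}
    new_files = current - seen
    for f in new_files:
        print(f"New file: {f}")
    seen |= current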
Let's say start.py is located in C:\.
import os

path = "C:\\Users\\Downloads\\00005.tex"
file = open(path, "a+")
file.truncate(0)
file.write("""Hello
""")
file.close()
os.startfile("C:\\Users\\Downloads\\00005.tex")
In the subdirectory there could be some files, for example: 00001.tex, 00002.tex, 00003.tex, 00004.tex.
I first want to search the subdirectory for the file with the highest number (00004.tex), create a new one with the next number (00005.tex), write "Hello" to it, and save it in the subdirectory as 00005.tex.
Are the leading zeros necessary, or can I just name the files 1.tex, 2.tex, 3.tex, ...?
Textually, "2" is greater than "100", but of course numerically it's the opposite. The reason for writing files as, say, "00002.txt" and "00100.txt" is that for files numbered up to 99999, the lexical sorting is the same as the numerical sorting. If you write them as "2.txt" and "100.txt", then you need to convert the non-extension part of the name to an integer before sorting.
In your case, since you want the next highest number, you need to convert the filenames to integers anyway so that you can take the maximum and add 1. Since you are converting to an integer regardless, your program doesn't care whether you prepend zeroes or not.
So the choice is based on external reasons. Is there some reason to make it so that a textual sort works? If not, then the choice is arbitrary, and you can do whatever you think looks better.
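The difference is easy to see in the interpreter:

names = ["2.tex", "100.tex", "30.tex"]

print(sorted(names))
# ['100.tex', '2.tex', '30.tex']   <- lexical order

print(sorted(names, key=lambda n: int(n.split(".", 1)[0])))
# ['2.tex', '30.tex', '100.tex']   <- numerical order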
You can use glob:
import glob, os
os.chdir(r"Path")
files = glob.glob("*.tex")
entries = sorted([int(entry.split(".", 1)[0]) for entry in files])
next_entry = entries[-1]+1
next_entry can be used as the new file number. You can then create a new file with this name and write your new content to it, as sketched below.
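For example, a sketch that continues the snippet above and writes "Hello" into the new file (the five-digit zero padding is optional, as discussed in the other answer):

new_name = "{:05d}.tex".format(next_entry)  # e.g. 00005.tex
with open(new_name, "w") as f:
    f.write("Hello\n")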
I have the following setup:
I have a directory with subdirectories which are filled with files. The structure is: /periodic_table/{Element}_lj_dat/lj_dat_sim.dat
Each file consists of two rows (the first one is a comment) and 12 columns of data.
What I would like to do is go through all the element folders (e.g. Al, Cu, etc.), open a created file (for example named "mergedlj.dat" in the periodic_table directory) and store all the data from each file in it, adding the element name from the parent directory as the first (or last) column of the merged file.
The best way is to ignore the first row in each file and save only the data from the second row.
I am very inexperienced in bash/shell scripting, but I think this is the best way to go (Python is acceptable too!). Unfortunately, I have only worked with files that are in the same folder as the script, so this is a new experience for me.
Here is the code just to find these files, but it doesn't actually do what I need:
find ../periodic_table/*_lj_dat/ -name lj_dat_sim.dat -print0 | while read -d $'\0' file; do
    echo "Processing $file"
done
Any help will be highly appreciated!!
Here's a Python solution.
You can use glob() to get a list of the matching files and then iterate over them with fileinput.input(). fileinput.filename() lets you get the name of the file that is currently being processed, and this can be used to determine the current element whenever processing begins on a new file, as determined by fileinput.isfirstline().
The current element is added as the first column of the merge file. I've assumed that the field separator in the input files is a single space, but you can change that by altering ' '.join() below.
import re
import fileinput
from glob import glob

dir_prefix = '.'
glob_pattern = '{}/periodic_table/*_lj_dat/lj_dat_sim.dat'.format(dir_prefix)
element_pattern = re.compile(r'.*periodic_table/(.+)_lj_dat/lj_dat_sim.dat')

with open('mergedlj.dat', 'w') as outfile:
    element = ''
    for line in fileinput.input(glob(glob_pattern)):
        if fileinput.isfirstline():
            # extract the element name from the file name
            element = element_pattern.match(fileinput.filename()).groups()[0]
        else:
            print(' '.join([element, line]), end='', file=outfile)
You can use os.path.join() to construct the glob and element regex patterns, but I've omitted that above to avoid cluttering up the answer.
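For completeness, a sketch of how those patterns could be built with os.path.join(); the regex would need to accept either path separator if this runs on Windows:

import os
import re

glob_pattern = os.path.join(dir_prefix, 'periodic_table', '*_lj_dat', 'lj_dat_sim.dat')
element_pattern = re.compile(r'.*periodic_table[/\\](.+)_lj_dat[/\\]lj_dat_sim\.dat')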
I am running a model in batches. I have included my code below.
I have 1000+ data files. Each time I run the script I get the output for each input file; however, the name of each output comes out as OUTPUT.0001.nc, where nc is the extension and 0001 is the iteration number. If I set the number of iterations to 3, I will get the output files OUTPUT.0001.nc, OUTPUT.0002.nc, OUTPUT.0003.nc.
Initially I wrote the script for a much smaller number of grids and did some manual work to analyse the results, but now I have to do the same thing for 1000+ grids.
For a smaller number of grids (say 12), I was running the code, saving the results of each file in a new folder with the same name as the input file, and finally renaming them after deleting all the iteration results except the last one.
However, since the number is now large, I need to change the output file name (to match the input) through the script, after deleting all the iterations except the last one.
How can I do this?
input file = 1.dat, output file = 1.nc (renamed from OUTPUT.0005.nc)
Names like OUTPUT.0000.nc are created by the model's code; I can't alter this.
#!/usr/bin/python
import numpy as np
import os
import shutil
import glob

# Name of the current working directory
current_directory = os.getcwd()

# Loop for creating the file name
for point in np.arange(1, 101):
    x = point
    fname = str(x) + ".dat"
    if os.path.exists(fname):
        command = "./driver.exe" + " " + str(fname)
        # Invoke command in the terminal
        os.system(command)
        # create directories
        directory = str(x)
        os.makedirs(directory)
        # path of created directory
        destination = os.path.abspath(directory)
        # path of existing file
        source = os.path.abspath(fname)
        # Move all files with ".nc" extension to the relevant directory
        files = glob.iglob(os.path.join(current_directory, "*.nc"))
        for file in files:
            if os.path.isfile(file):
                shutil.move(file, destination)
        # Move the file used for execution to the relevant directory
        shutil.move(source, destination)
What I want to do is run the model in batch:
Loop 1 will take input file 1.dat, then
1. delete all the iteration outputs except the last one, and
2. change the name of that output to match the input file name (1.nc).
Loop 2 will take input file 2.dat, and so on.
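There is no single right way to do this, but one possible sketch is to add something like the following inside the existing if block, right after os.system(command) finishes. It assumes the highest-numbered OUTPUT.*.nc is always the last iteration and that the zero padding keeps the lexical sort correct:

# Keep only the highest-numbered OUTPUT.*.nc and rename it after the
# input file (1.dat -> 1.nc); earlier iterations are deleted.
outputs = sorted(glob.glob("OUTPUT.*.nc"))
if outputs:
    for old in outputs[:-1]:
        os.remove(old)
    os.rename(outputs[-1], str(x) + ".nc")

The later "*.nc" move in the script would then move only the single renamed file into the per-grid directory.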
I have a folder called Test: '/Desktop/Test/'
I have several files in the folder (e.g. 1.fa, 2.fa, 3.fa, X.fa),
or '/Desktop/Test/1.fa', '/Desktop/Test/2.fa', '/Desktop/Test/3.fa', '/Desktop/Test/X.fa'
I'm trying to have my function open every file in the directory '/Desktop/Test/' that ends with .fa, without making a function with 20 or so variables (the only way I know how to do it).
Example:
def simple(input):
    # input would be the directory '/Desktop/Test/'
    # for each .fa in the directory (e.g. 1.fa, 2.fa, 3.fa, X.fa) I need it to create
    # a list of all the strings within each file
    # query = open(input, 'r').read().split('\n') is what I would use in my simple
    # code that only has one input
How can one input all files within a directory with a certain ending (.fa)?
You can do:
import glob
import os

def simple(input):
    os.chdir(input)
    for file in glob.glob("*.fa"):
        with open(file, 'r+') as f:
            print(f.readlines())  # or do whatever
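If you would rather not change the working directory, a variant (just a sketch) that collects the lines of every .fa file into a dictionary keyed by file name:

import glob
import os

def simple(input_dir):
    data = {}
    for path in glob.glob(os.path.join(input_dir, "*.fa")):
        with open(path, 'r') as f:
            data[os.path.basename(path)] = f.read().split('\n')
    return data

# Example usage:
# sequences = simple('/Desktop/Test/')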