I want to add increasing numbers to the names of files in a directory, ordered by date. For example, add "01_", "02_", "03_", ... to these files:
test1.txt (oldest text file)
test2.txt
test3.txt
test4.txt (newest text file)
Here's the code so far. I can get the file names, but each character in the file name seems to be its own item in a list.
import os
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print(file)
The EXPECTED results are:
01_test1.txt
02_test2.txt
03_test3.txt
04_test4.txt
with test1 being the oldest and test4 being the newest.
How do I add a 01_, 02_, 03_, 04_ to each file name?
I've tried something like this, but it produces a '01_' for every single character in the file name:
new_test_names = ['01_'.format(i) for i in file]
print (new_test_names)
If you want to number your files by age, you'll need to sort them first. Call sorted and pass os.path.getmtime as the key parameter; it returns each file's last modification time, so the sorted list runs from the oldest to the newest file (by modification time, not creation time).
Use glob.glob to get all text files in a given directory. It is not recursive by default, but in Python 3.5+ a recursive search only needs a ** pattern and recursive=True.
Use str.zfill to build zero-padded prefixes of the form 0x_.
Use os.rename to rename your files.
import glob
import os
sorted_files = sorted(
    glob.glob('path/to/your/directory/*.txt'), key=os.path.getmtime)
for i, f in enumerate(sorted_files, 1):
    try:
        head, tail = os.path.split(f)
        os.rename(f, os.path.join(head, str(i).zfill(2) + '_' + tail))
    except OSError:
        print('Invalid operation')
It helps to wrap the rename in try-except, so that an error such as a missing file or a permission problem is reported instead of crashing the loop.
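The recursive extension mentioned above can be sketched as follows. This is a minimal sketch, assuming Python 3.5+ (for recursive=True) and that numbering should run across all matched files from oldest to newest modification time; plan_renames and apply_renames are hypothetical names. Separating the plan from the renames lets you print the new names first and only rename once they look right.

```python
import glob
import os


def plan_renames(directory, recursive=False):
    """Return (old_path, new_path) pairs, numbering .txt files oldest first.

    Hypothetical helper: nothing is renamed here, so the plan can be checked first.
    """
    if recursive:
        # '**' matches zero or more subdirectories when recursive=True (Python 3.5+)
        pattern = os.path.join(directory, "**", "*.txt")
    else:
        pattern = os.path.join(directory, "*.txt")
    # Ascending modification time: the oldest file gets number 1
    files = sorted(glob.glob(pattern, recursive=recursive), key=os.path.getmtime)
    plan = []
    for i, path in enumerate(files, 1):
        head, tail = os.path.split(path)
        # Each file stays in its own directory; only the base name gets the prefix
        plan.append((path, os.path.join(head, str(i).zfill(2) + "_" + tail)))
    return plan


def apply_renames(plan):
    """Hypothetical helper: perform the renames produced by plan_renames."""
    for old_path, new_path in plan:
        os.rename(old_path, new_path)
```

For example, print the pairs returned by plan_renames("/Users/Admin/Documents/Test") and, if they look right, pass the same list to apply_renames.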
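This should work. glob.glob returns full paths in arbitrary order, so sort the files by modification time first and use os.path.basename to keep only the file name:
import glob
import os
files = sorted(glob.glob("/Users/Admin/Documents/Test/*.txt"), key=os.path.getmtime)
new_test_names = ["{:02d}_{}".format(i, os.path.basename(filename)) for i, filename in enumerate(files, start=1)]
Or without list comprehension:
for i, filename in enumerate(files, start=1):
    print("{:02d}_{}".format(i, os.path.basename(filename)))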
Three things to learn about here:
glob, which makes this sort of file matching easier.
enumerate, which lets you write a loop with an index variable.
format, specifically the 02d modifier, which prints two-digit numbers (zero-padded).
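The 02d modifier means "at least two digits", so with 100 or more files the prefixes stop lining up: 100_ sorts before 99_ in a file listing because "1" < "9". A minimal sketch that picks the width from the number of files instead (numbered_names is a hypothetical helper; it uses an f-string, so Python 3.6+):

```python
def numbered_names(names):
    """Prefix names with 1, 2, 3, ... zero-padded to a common width (at least 2 digits)."""
    # Width of the largest number, but never narrower than the original 02d
    width = max(2, len(str(len(names))))
    # Nested format spec: {i:0{width}d} pads i with zeros to `width` digits
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(names, start=1)]
```

With four files this gives the same 01_ to 04_ prefixes as above; with 150 files it switches to 001_ through 150_, so the names still sort correctly as plain strings.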
Two ways to format an integer with a leading zero:
1. Use .format
import os
i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print('{0:02d}'.format(i) + '_' + file)
        i += 1
2. Use .zfill
import os
i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print(str(i).zfill(2) + '_' + file)
        i += 1
The easiest way is to simply have a variable, such as i, which will hold the number and prepend it to the string using some kind of formatting that guarantees it will have at least 2 digits:
import os
i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print('%02d_%s' % (i, file))  # %02d means your number will have at least 2 digits
        i += 1
You can also take a look at enumerate and glob to make your code even shorter (but make sure you understand the fundamentals before using them).
import os
test_dir = '/Users/Admin/Documents/Test'
txt_files = [file
             for file in os.listdir(test_dir)
             if file.endswith('.txt')]
numbered_files = ['%02d_%s' % (i + 1, file)
                  for i, file in enumerate(txt_files)]
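os.listdir returns names in arbitrary order, so the numbering above follows directory order rather than age. A minimal sketch of sorting the names by modification time first; because os.listdir returns bare names, each one is joined with the directory before calling os.path.getmtime. The helpers txt_files_by_age and numbered_by_age are hypothetical names.

```python
import os


def txt_files_by_age(test_dir):
    """Return the .txt file names in test_dir, oldest modification time first."""
    txt_files = [f for f in os.listdir(test_dir) if f.endswith(".txt")]
    # Join with the directory; a bare name would be looked up in the current working directory
    return sorted(txt_files, key=lambda f: os.path.getmtime(os.path.join(test_dir, f)))


def numbered_by_age(test_dir):
    """Prefix the age-sorted names with 01_, 02_, ..."""
    return ['%02d_%s' % (i, f) for i, f in enumerate(txt_files_by_age(test_dir), start=1)]
```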
Related
I have multiple text files containing different text.
They all contain a single appearance of the same 2 lines I am interested in:
================================================================
Result: XX/100
I am trying to write a script to collect all those XX values (numerical values between 0 and 100), and paste them in a CSV file with the text file name in column A and the numerical value in column B.
I have considered using Python or PowerShell for this purpose.
How can I identify the line where "Result" appears under the string of "===..", collect its content up to '\n', and then strip "Result: " and "/100" from it?
"Result" and other numerical values can appear elsewhere in the files, but never in the quoted format directly below a "=====" line, like the line I'm interested in.
Thank you!
Edit: I have written this poor naive attempt to collect the numerical values.
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
for filename in os.listdir(dir_path):
    if filename.endswith(".txt"):
        with open(os.path.join(dir_path, filename), "r") as f:
            lineFound = False
            for index, line in enumerate(f):
                if lineFound:
                    line = line.replace("Result: ", "")
                    line = line.replace("/100", "")
                    line = line.strip()  # strip() returns a new string; it does not change line in place
                    grade = line
                    lineFound = False
                    print(grade)
                    continue
                if index > 3:
                    if "================================================================" in line:
                        lineFound = True
I'd still be happy to learn if there's a simple way to do this with PowerShell tbh
For the output, I used csv writer to append the results to a file one by one.
So there are three steps involved here. First, get a list of files. There are a ton of answers for that one on Stack Overflow, but the one linked in the docstring of get_files_from_path is stupidly complete.
Once you have the list of files, load the files one by one and use a couple of simple str.split() calls to get the value you want.
Finally, write the results into a CSV file. Since the CSV file is a simple one, you don't need to use the csv module for this.
See the code example below. Note that I copied the function for generating the list of files from my personal GitHub repo. I reuse that one a lot.
import os
from typing import List, Optional, Union
def get_files_from_path(path: str = ".", ext: Optional[Union[str, List[str]]] = None) -> list:
    """Find files in path and return them as a list.
    Gets all files in folders and subfolders
    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296
    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.
    Returns:
        list: list of file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = os.path.join(subdir, fname)
            if ext is None:
                result.append(filepath)
            elif isinstance(ext, str) and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif isinstance(ext, list):
                for item in ext:
                    if fname.lower().endswith(item.lower()):
                        result.append(filepath)
                        break  # stop once one extension matches, so a file is not added twice
    return result
filelist = get_files_from_path("path/to/files/", ext=".txt")
split1 = "================================================================\nResult: "
split2 = "/100"
with open("output.csv", "w") as outfile:
    outfile.write('filename, value\n')
    for filename in filelist:
        with open(filename) as infile:
            value = infile.read().split(split1)[1].split(split2)[0]
            print(value)
            outfile.write(f'"{filename}", {value}\n')
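If you would rather let the csv module handle quoting (for example, file names that contain commas), here is a minimal sketch. The function name write_results is hypothetical; it writes only the base file name, and it skips files that lack the marker instead of raising IndexError the way split(split1)[1] does.

```python
import csv
import os

# 64 '=' characters, a newline, then the start of the Result line
MARKER = "=" * 64 + "\nResult: "


def write_results(filelist, csv_path):
    """Write (base filename, score) rows; files without the marker are skipped."""
    with open(csv_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["filename", "value"])
        for filename in filelist:
            with open(filename) as infile:
                text = infile.read()
            if MARKER not in text:
                continue
            value = text.split(MARKER, 1)[1].split("/100", 1)[0].strip()
            writer.writerow([os.path.basename(filename), value])
```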
You could try this.
In this example the filename written to the CSV will be its full (absolute) path. You may just want the base filename.
This uses the same (arguably unnecessary) mechanism as the question for deriving the source directory, although it would be unusual to keep your Python script in the same directory as your data.
import os
import glob
equals = '=' * 64
dir_path = os.path.dirname(os.path.realpath(__file__))
outfile = os.path.join(dir_path, 'foo.csv')
with open(outfile, 'w') as csv:
    print('A,B', file=csv)
    for file in glob.glob(os.path.join(dir_path, '*.txt')):
        prev = None
        with open(file) as indata:
            for line in indata:
                t = line.split()
                # prev is None on the first line, so check it before calling startswith
                if len(t) == 2 and t[0] == 'Result:' and prev is not None and prev.startswith(equals):
                    v = t[1].split('/')
                    if len(v) == 2 and v[1] == '100':
                        print(f'{file},{v[0]}', file=csv)
                        break
                prev = line
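Another option is a regular expression instead of tracking the previous line by hand. A minimal sketch, assuming the marker line consists only of = characters and the Result line directly follows it; RESULT_RE and extract_result are hypothetical names, and the function returns None when no such pair of lines is found.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
# [ \t]* (instead of \s*) keeps the match from spilling across extra lines.
RESULT_RE = re.compile(r"^=+[ \t]*\n[ \t]*Result:[ \t]*(\d+)[ \t]*/100[ \t]*$", re.MULTILINE)


def extract_result(text):
    """Return XX from a 'Result: XX/100' line directly below a '====' line, else None."""
    match = RESULT_RE.search(text)
    return int(match.group(1)) if match else None
```

A "Result" line that is not directly below a line of = characters is ignored, which matches the constraint described in the question.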
I have a bunch of images that have filenames that represent a range of values that I need to split into individual images. For example, for an image with the filename 1000-1200.jpg, I need 200 individual copies of the image named 1000.jpg, 1001.jpg, 1002.jpg, etc.
I know a bit of python but any suggestions on the quickest way to go about this would be much appreciated.
EDIT: Here's what I have so far. The only issue is that it strips leading zeros from the filename and I'm not quite sure how to fix that.
import os
from shutil import copyfile
fileList = []
filePath = 'C:\\AD\\Scripts\\to_split'
for file in os.listdir(filePath):
    if file.endswith(".jpg"):
        fileList.append(file)
for file in fileList:
    fileName = os.path.splitext(file)[0].split("-")
    rangeStart = fileName[0]
    rangeEnd = fileName[1]
    for part in range(int(rangeStart), int(rangeEnd)+1):
        copyfile(os.path.join(filePath, file), os.path.join(filePath, str(part) + ".jpg"))
Let's break the problem down:
Step 1. Get all files in the folder.
Step 2. For each file, get the name without its extension.
Step 3. Split that string with str.split("-") and convert the two parts to ints a and b.
Step 4. For x in range(a, b + 1) (range excludes its upper bound), copy the file and name the copy str(x).
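The question's EDIT asks how to keep leading zeros: int() drops them, so one fix is to pad each number back to the width of the original start string with str.zfill. A minimal sketch, assuming names of the form START-END.jpg where both ends are included in the range (matching the + 1 in the question's code); split_range_image is a hypothetical name.

```python
import os
from shutil import copyfile


def split_range_image(file_path, file_name):
    """Copy e.g. '0098-0102.jpg' to '0098.jpg' ... '0102.jpg' in the same folder."""
    stem, ext = os.path.splitext(file_name)
    range_start, range_end = stem.split("-")
    # int() drops leading zeros, so remember how many digits the start number had
    width = len(range_start)
    src = os.path.join(file_path, file_name)
    created = []
    # range() excludes its upper bound, hence + 1 for an inclusive range
    for part in range(int(range_start), int(range_end) + 1):
        new_name = str(part).zfill(width) + ext
        copyfile(src, os.path.join(file_path, new_name))
        created.append(new_name)
    return created
```

To process a whole folder, call split_range_image(filePath, file) for each name in the question's fileList.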
I am trying to get the number of files in a certain directory, but I want it to count only text files, because the accounts directory also contains another directory and a .DS_Store file. What should I modify to only get the number of text files?
import os
list = os.listdir("data/accounts/")
number_files = len(list)
print(number_files)
From the question "Count number of files with certain extension in Python":
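Solution 1.
import os
fileCounter = 0
for root, dirs, files in os.walk("data/accounts/"):
    for file in files:
        if file.endswith('.txt'):
            fileCounter += 1
Note that os.walk also descends into subdirectories, so .txt files inside them are counted as well.
Solution 2.
import glob
fileCounter = len(glob.glob1("data/accounts/", "*.txt"))
glob.glob1 is an undocumented helper and is deprecated as of Python 3.13; on Python 3.10+ the documented equivalent is len(glob.glob("*.txt", root_dir="data/accounts/")). Read about glob in the Python documentation.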
An alternative module that may work for you is glob.
It will allow you to use wildcards so you can capture just the files you are interested in.
from glob import glob
filenames = glob("data/accounts/*.txt")
number_of_files = len(filenames)
print(number_of_files)
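Assuming your text files end in ".txt", you could use something like:
import os
folder = "data/accounts/"
files = [x for x in os.listdir(folder) if os.path.isfile(os.path.join(folder, x)) and x.endswith('.txt')]
number_files = len(files)
print(number_files)
This uses os.path.isfile() to ignore directories and str.endswith() to determine text file-ness. os.listdir() returns bare names, so each one is joined with the folder before calling isfile().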
import os
number_of_files = sum(f.endswith('.txt') for f in os.listdir("data/accounts/"))
str.endswith() returns True or False
True and False have numeric values of one and zero, respectively.
sum() will consume the generator expression.
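For a pathlib-based version that also skips subdirectories, here is a minimal sketch (count_txt_files is a hypothetical name). Unlike the generator expression above, it would not count a directory whose name happens to end in .txt, and like the glob answers it does not look inside subdirectories.

```python
from pathlib import Path


def count_txt_files(directory):
    """Count regular files ending in .txt directly inside directory (no recursion)."""
    # is_file() skips directories, even one named e.g. 'archive.txt'
    return sum(1 for p in Path(directory).glob("*.txt") if p.is_file())
```

For the question's folder this would be count_txt_files("data/accounts/").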
I have a folder with many text files (EPA10.txt, EPA55.txt, EPA120.txt..., EPA150.txt). I have 2 strings that are to be searched in each file and the result of the search is written in a text file result.txt. So far I have it working for a single file. Here is the working code:
if 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in open('C:\\Temp\\lamip\\EPA150.txt').read():
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('Current MW in node is EPA150')
else:
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('NOT EPA150')
Now I want this to be repeated for all the text files in the folder. Please help.
Given that you have files named EPA1.txt to EPA150.txt but don't know all of their names, you can put them all in one folder and then use os.listdir() to get a list of the file names in that folder, e.g. listdir("C:/Temp/lamip").
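Also, your if statement is wrong. Because in binds more tightly than and, 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in text is evaluated as 'LZY_201_335_R10A01' and ('LZY_201_186_R5U01' in text); a non-empty string is always true, so only the second string is actually checked. You should do this instead:
text = file.read()
if "string1" in text and "string2" in text: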
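A quick check of the two conditions, using a made-up sample text that contains only the second code, makes the difference visible. The all() form at the end is an alternative that scales to any number of search strings.

```python
# Made-up sample text containing only the second code
text = "header LZY_201_186_R5U01 footer"

# Parsed as: 'LZY_201_335_R10A01' and ('LZY_201_186_R5U01' in text)
# The non-empty string on the left is truthy, so `and` returns the result of the `in` test
buggy = 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in text
# Each string gets its own membership test
fixed = 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text
# Equivalent form for any number of search strings
codes = ('LZY_201_335_R10A01', 'LZY_201_186_R5U01')
also_fixed = all(code in text for code in codes)
print(buggy, fixed, also_fixed)  # True False False
```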
Here's the code:
from os import listdir
with open("C:/Temp/lamip/result.txt", "w") as f:
    for filename in listdir("C:/Temp/lamip"):
        # only search the .txt files, and never the result file being written
        if not filename.endswith(".txt") or filename == "result.txt":
            continue
        with open('C:/Temp/lamip/' + filename) as currentFile:
            text = currentFile.read()
        if ('LZY_201_335_R10A01' in text) and ('LZY_201_186_R5U01' in text):
            f.write('Current MW in node is ' + filename[:-4] + '\n')
        else:
            f.write('NOT ' + filename[:-4] + '\n')
PS: Windows accepts / as a path separator too, so you can use / instead of \\ in your paths and avoid escaping the backslashes.
Modularise! Modularise!
Well, not in the sense of writing separate Python modules, but isolate the different tasks at hand:
Find the files you wish to search.
Read the file and locate the text.
Write the result into a separate file.
Each of these tasks can be solved independently. For example, to list the files you have os.listdir, whose result you might want to filter.
For step 2, it does not matter whether you have 1 or 1,000 files to search. The routine is the same; you merely have to iterate over each file found in step 1. This suggests that step 2 could be implemented as a function that takes the filename (and possibly the search strings) as arguments and returns True or False.
Step 3 is the combination of each element from step 1 and the result of step 2.
The result:
import os
files = [fn for fn in os.listdir('C:/Temp/lamip') if fn.endswith('.txt')]
# perhaps filter `files`
def does_fn_contain_string(filename):
    with open('C:/Temp/lamip/' + filename) as blargh:
        content = blargh.read()
    # use `or` instead of `and` if finding either string is enough
    return 'string1' in content and 'string2' in content
with open('results.txt', 'w') as output:
    for fn in files:
        if does_fn_contain_string(fn):
            output.write('Current MW in node is {0}\n'.format(fn[:-4]))
        else:
            output.write('NOT {0}\n'.format(fn[:-4]))
You can do this by creating a for loop that runs through all your .txt files in the current working directory.
import os
with open("result.txt", "w") as resultfile:
    # result.txt itself is created by the open() above, so leave it out of the search
    for result in [txt for txt in os.listdir(os.getcwd()) if txt.endswith(".txt") and txt != "result.txt"]:
        with open(result) as infile:
            text = infile.read()
        if 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text:
            resultfile.write('Current MW in node is {0}\n'.format(result[:-4]))
        else:
            resultfile.write('NOT {0}\n'.format(result[:-4]))
I am writing a Python code and would like some more insight on how to approach this issue.
I am trying to read, in order, multiple files whose names end with .log. From them, I hope to write specific values to a .csv file.
Within the text file, there are X/Y values that are extracted below:
Textfile.log:
X/Y = 5
X/Y = 6
Textfile.log.2:
X/Y = 7
X/Y = 8
DesiredOutput in the CSV file:
5
6
7
8
Here is the code I've come up with so far:
def readfile():
    import os
    i = 0
    for file in os.listdir("\mydir"):
        if file.endswith(".log"):
            return file
def main ():
    import re
    list = []
    list = readfile()
    for line in readfile():
        x = re.search(r'(?<=X/Y = )\d+', line)
        if x:
            list.append(x.group())
        else:
            break
    f = csv.write(open(output, "wb"))
    while 1:
        if (i>len(list-1)):
            break
        else:
            f.writerow(list(i))
            i += 1
if __name__ == '__main__':
    main()
I'm confused about how to make it read the .log file, then the .log.2 file.
Is it possible to just have it automatically read all the files in 1 directory without typing them in individually?
Update: I'm using Windows 7 and Python V2.7
The simplest way to read files sequentially is to build a list and then loop over it. Something like:
for fname in list_of_files:
    with open(fname, 'r') as f:
        for line in f:
            pass  # do all the stuff you do to each line of each file
This way, whatever you do to read each file is repeated for every file in list_of_files. Since lists are ordered, the files are processed in the list's order.
Borrowing from @The2ndSon's answer, you can pick up the files with os.listdir(dir). This simply lists all files and directories within dir, in arbitrary order. From that list you can pull out and order all of your log files like this:
import os
allFiles = os.listdir(some_dir)
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
logFiles.sort(key = lambda x: x.split('.')[-1])
logFiles.insert(0, logFiles.pop())
The above code works with files named like "somename.log", "somename.log.2" and so on. You can then take logFiles and plug it in as list_of_files. Note that the last line is only necessary if the first file is "somename.log" instead of "somename.log.1". If the first file has a number on the end, just leave out the last step.
Line By Line Explanation:
allFiles = os.listdir(some_dir)
This line takes all files and directories within some_dir and returns them as a list.
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
This list comprehension gathers all of the files that have log as one of their period-delimited extensions. "something.log.somethingelse" will be included, "log_something.somethingelse" will not.
logFiles.sort(key = lambda x: x.split('.')[-1])
Sort the list of log files in place by the last extension. x.split('.')[-1] splits the file name into a list of period-delimited values and takes the last entry. If the name is "name.log.5", it will be sorted as "5". If the name is "name.log", it will be sorted as "log". These keys are compared as strings, so "name.log.10" sorts before "name.log.2".
logFiles.insert(0, logFiles.pop())
Move the last entry of the list to the front. This is necessary because the sort puts "name.log" last (letters sort after digits) and "name.log.1" first, and the remaining numbered files must keep their order.
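Because the sort above compares suffixes as strings, "name.log.10" lands before "name.log.2". A minimal sketch of a numeric key instead, assuming all log files share one base name; log_sort_key is a hypothetical helper. It maps "name.log" to 0, so no move-to-front step is needed afterwards.

```python
def log_sort_key(fname):
    """Return 0 for 'name.log' and N for 'name.log.N', so files sort numerically."""
    suffix = fname.split(".")[-1]
    return int(suffix) if suffix.isdigit() else 0


files = ["Textfile.log.10", "Textfile.log.2", "Textfile.log", "Textfile.log.1"]
print(sorted(files, key=log_sort_key))
# ['Textfile.log', 'Textfile.log.1', 'Textfile.log.2', 'Textfile.log.10']
```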
If you change the naming scheme for your log files, you can easily return a list of files that have the ".log" extension. For example, if you change the file names to Textfile1.log and Textfile2.log, you can update readfile() to be:
import os
def readfile():
    my_list = []
    for file in os.listdir("."):
        if file.endswith(".log"):
            my_list.append(file)
    return my_list
readfile() will then return ['Textfile1.log', 'Textfile2.log'] (in whatever order os.listdir() yields them, so sort the list if order matters). Avoid using list as a variable name, because it shadows Python's built-in list type.
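Putting the pieces together for the original task, here is a minimal sketch that reads name.log, name.log.1, name.log.2, ... in numeric order, pulls out every X/Y value, and writes one value per row to a CSV file. It is written for Python 3 (under the question's Python 2.7, open the CSV with "wb" and drop newline=""); collect_xy_values and write_column are hypothetical names, and the regular expression assumes lines of the exact form X/Y = <digits>.

```python
import csv
import os
import re

XY_RE = re.compile(r"X/Y = (\d+)")


def collect_xy_values(log_dir):
    """Return the X/Y values from name.log, name.log.1, name.log.2, ... in that order."""
    log_files = [f for f in os.listdir(log_dir) if "log" in f.split(".")]

    def numeric_suffix(fname):
        # 'name.log' -> 0, 'name.log.N' -> N, so .log.10 comes after .log.2
        suffix = fname.split(".")[-1]
        return int(suffix) if suffix.isdigit() else 0

    values = []
    for fname in sorted(log_files, key=numeric_suffix):
        with open(os.path.join(log_dir, fname)) as f:
            for line in f:
                match = XY_RE.search(line)
                if match:
                    values.append(match.group(1))
    return values


def write_column(values, csv_path):
    """Write each value on its own row of a one-column CSV file."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for value in values:
            writer.writerow([value])
```

For the sample files in the question, collect_xy_values would return ['5', '6', '7', '8'], which write_column turns into the desired four-row CSV.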