I'm trying to retrieve the number from a file, and determine the padding of it, so I can apply it to the new file name, but with an added number. I'm basically trying to do a file saver sequencer.
Ex.:
fileName_0026
0026 = 4 digits
Add 1 to the current number and keep the same number of digits.
The result should be 0027 and on.
What I'm trying to do is retrieve the padding number from the file and use the '%04d'%27 string formatting. I've tried everything I know (my knowledge is very limited), but nothing works. I've looked everywhere to no avail.
What I'm trying to do is something like this:
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
'%0 P d' % NN
Result=fileName_0027
I hope this is clear enough, I'm having a hard time trying to articulate this.
Thanks in advance for any help.
Cheers!
There's a few things going on here, so here's my approach and a few comments.
import os

def get_next_filename(existing_filename):
    name, extension = os.path.splitext(existing_filename)  # extension is '' when there is none
    prefix, int_string = name.rsplit("_", 1)  # split on the last _ : text before it, number (as a string) after it
    int_new = int(int_string) + 1  # integer value of new filename
    digits = len(int_string)  # number of characters/padding in name
    formatter = "%0" + str(digits) + "d"  # builds a format such as "%04d" that pads with leading zeros
    int_string_new = formatter % (int_new,)  # applies that format
    return prefix + "_" + int_string_new + extension  # put it all together

# since we only want to do this when the file already exists, check if it exists and execute function if so
our_filename = 'file_0026.txt'
while os.path.isfile(our_filename):
    our_filename = get_next_filename(our_filename)  # loop until a unique filename found
Here are some hints to achieve that. It's unclear exactly what you want to achieve, though.
import re
oldName = "fileName_0026.txt"  # the existing file name
name = re.split(r"[_.]", oldName)  # Output:: ['fileName', '0026', 'txt']
n = str(int(name[1]) + 1)  # '27'
s = n.zfill(len(name[1]))  # '0027' (pad to the original width)
newName = "_".join([name[0], s]) + "." + name[2]  # 'fileName_0027.txt'
fh = open(newName, "w")  # create the new file
fh.close()
Use the rjust function from string
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
new_padding = str(NN).rjust(P, '0')
Result=fileName_ + new_padding
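The pseudocode above could be made runnable roughly as follows. This is only a sketch in Python 3: the helper name next_name is made up for illustration, and it assumes the number is everything after the last underscore and that there is no file extension.

```python
def next_name(original):
    # Split "fileName_0026" into the prefix and the digit string after the last underscore.
    prefix, digits = original.rsplit("_", 1)
    padding = len(digits)          # P: width of the original number (4)
    new_number = int(digits) + 1   # NN: current number plus one (27)
    # rjust pads on the left with '0' up to the original width.
    return prefix + "_" + str(new_number).rjust(padding, "0")

print(next_name("fileName_0026"))  # fileName_0027
```

If the incremented number needs more digits than the original (99 becoming 100), rjust simply leaves it unpadded.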
import re
m = re.search(r".*_(0*)(\d*)", "filenName_00023")
print m.groups()
print("fileName_{0:04d}".format(int(m.groups()[1])+1))
{0:04d} means pad out to four digits wide with leading zeros.
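The width does not have to be hard-coded to 4. As a sketch (Python 3; the function name bump and the slightly different pattern are my own choices, not part of the answer above), a nested replacement field can take the width from the original digit string:

```python
import re

def bump(name):
    # Capture everything up to the last underscore, then the digit run after it.
    m = re.match(r"(.*_)(\d+)$", name)
    if m is None:
        raise ValueError("no trailing number in %r" % name)
    head, digits = m.groups()
    # The nested {width} field makes the padding follow the original digit count.
    return "{}{:0{width}d}".format(head, int(digits) + 1, width=len(digits))

print(bump("fileName_00023"))  # fileName_00024
```

Because the digits are captured as one group, a number made only of zeroes is handled without a special case.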
As you can see there are a few ways to do this that are quite similar. One thing the other answers haven't mentioned is leading zeroes. int() with its default base of 10 accepts them (int('0026') is 26), but a number written with a leading zero as a literal in Python 2 source, or parsed with int(s, 0), is interpreted as octal, so the code below strips any leading zeroes before converting, to be safe.
edit
I just realised that my previous code crashes if the file number is zero!
Here's a better version that also copes with a missing file number and names with multiple or no underscores.
#! /usr/bin/env python

def increment_filename(s):
    parts = s.split('_')
    #Handle names without a number after the final underscore
    if not parts[-1].isdigit():
        parts.append('0')
    tail = parts[-1]
    try:
        n = int(tail.lstrip('0'))
    except ValueError:
        #Tail was all zeroes
        n = 0
    parts[-1] = str(n + 1).zfill(len(tail))
    return '_'.join(parts)

def main():
    for s in (
        'fileName_0026',
        'data_042',
        'myfile_7',
        'tricky_99',
        'myfile_0',
        'bad_file',
        'worse_file_',
        '_lead_ing_under_score',
        'nounderscore',
    ):
        print "'%s' -> '%s'" % (s, increment_filename(s))

if __name__ == "__main__":
    main()
output
'fileName_0026' -> 'fileName_0027'
'data_042' -> 'data_043'
'myfile_7' -> 'myfile_8'
'tricky_99' -> 'tricky_100'
'myfile_0' -> 'myfile_1'
'bad_file' -> 'bad_file_1'
'worse_file_' -> 'worse_file__1'
'_lead_ing_under_score' -> '_lead_ing_under_score_1'
'nounderscore' -> 'nounderscore_1'
Some additional refinements possible:
An optional arg to specify the number to add to the current file number,
An optional arg to specify the minimum width of the file number string,
Improved handling of names with weird number / position of underscores.
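The first two refinements might look something like this. It is a sketch in Python 3, and the parameter names step and min_width are assumptions of mine rather than anything from the answer:

```python
def increment_filename(s, step=1, min_width=0):
    parts = s.split('_')
    # Names without a number after the final underscore get a fresh counter.
    if not parts[-1].isdigit():
        parts.append('0')
    tail = parts[-1]
    n = int(tail)  # base-10 int() accepts leading zeroes
    # Never shrink the existing padding, but allow a wider minimum.
    width = max(len(tail), min_width)
    parts[-1] = str(n + step).zfill(width)
    return '_'.join(parts)

print(increment_filename('data_042', step=10))     # data_052
print(increment_filename('myfile_7', min_width=3)) # myfile_008
```

With the defaults it behaves like the version above for the sample names.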
I have seen the basic Python code for a filename replacement in a directory but they are always for known strings, but how would you remove random characters of a certain length?
Would this work?
newFileName = file.replace([-5:], "")
As I am trying to remove the last five characters from the filename without removing the extension.
Here is an update:
I am trying to do this:
DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml
to
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
which removes DMC- and _014-00_EN-US from the end.
I need to add this to a code that will fix a directory of files.
This problem (if I understand it correctly) has a clear separation. Remove extension, remove X characters from beginning and end, and then add the extension again to get the final answer.
import os
oldFileName = 'xxxx-filename-xxxxx.XML'
# remove n chars in beginning, m chars at end
n = 5
m = 6
name, ext = os.path.splitext(oldFileName)
# splice away the chars, and add the extension
newFileName = '{}{}'.format(name[0:-m][n:], ext)
# newFileName == 'filename.XML'
So in your case, you would use n=4 and m=13.
If you didn't know the length, but you knew you wanted everything up to and including the first dash out, and likewise everything after the first underscore (which would mean there couldn't be underscores in the normal filename or the first part of it), this would work also:
import os
oldFileName = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
name, ext = os.path.splitext(oldFileName)
newFileName = '{}{}'.format(name[name.index('-')+1:name.index('_')], ext)
# newFileName == 'CIWS15-AAAA-A00-00-0000-00A-018A-D.xml'
And even if the pattern is something else, but there is a pattern, you can code to match it, like I have here.
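Since the question mentions fixing a whole directory, here is one way the dash/underscore rule could be applied to every file in a folder. Treat it as a sketch in Python 3 under assumptions: the helper names clean_name and rename_all are hypothetical, only .xml files are touched, and files that do not fit the pattern or whose new name already exists are skipped.

```python
import os

def clean_name(filename):
    name, ext = os.path.splitext(filename)
    start = name.find('-')   # first dash: everything up to it is dropped
    end = name.find('_')     # first underscore: everything from it on is dropped
    if start == -1 or end == -1 or end <= start + 1:
        return None          # does not fit the pattern
    return name[start + 1:end] + ext

def rename_all(directory):
    for filename in os.listdir(directory):
        if not filename.lower().endswith('.xml'):
            continue
        new_name = clean_name(filename)
        if new_name is None:
            continue  # leave files that do not fit the pattern alone
        target = os.path.join(directory, new_name)
        if os.path.exists(target):
            continue  # never overwrite an existing file
        os.rename(os.path.join(directory, filename), target)

print(clean_name('DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'))
# CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
```

It is worth running it on a copy of the directory first, since renames are not easy to undo.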
It's not pretty, but I hope this works for you.
If you know the files that you want to rename all have the same length, you can try:
>>>file = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
>>>ext = file[51:]
>>>newFile = file[4:38]+ext
when you print the newFile you now have:
>>>print(newFile)
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
I was writing a little program that finds all files with given prefix, let's say 'spam' for this example, in a folder and locates gaps in numbering and renames subsequent folders to fill the gap. Below illustrates a portion of the program that locates the files using a regex and renames it:
prefix = 'spam'
newNumber = 005
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\1%s\4' % newNumber, 'spam006.txt')
What I was expecting from above was spam005.txt, but instead I got #5.txt
I figured out I could use r'%s%s\4' % (prefix, newNumber) instead and then it does work as intended, but I'd like to understand why this error is happening. Does it have something to do with the %s used during re.compile()?
There are two problems here:
Your newNumber needs to be a string if you want it to be 005, as the first two zeroes are dropped when it is interpreted as an integer.
Your next problem is indeed in your substitution. By using string formatting you effectively create the replacement template \15\4 (see the 5 in there, that was your newNumber). When Python sees this it tries to get capturing group 15, not group 1 followed by a literal 5. You can enclose the reference in a g like this to get your desired behavior: \g<1>5\4
So your code needs to be changed to this:
prefix = 'spam'
newNumber = '005'
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\g<1>%s\4' % newNumber, 'spam006.txt')
More information about the \g<n> behavior can be found at the end of the re.sub documentation.
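To see the difference in isolation, here is a small Python 3 sketch. I simplified the pattern to three groups and added re.escape on the prefix, so the group numbers differ from the code above; the variable names are mine as well.

```python
import re

prefix = 'spam'
new_number = '005'
# Group 1: prefix, group 2: old numbering, group 3: file extension.
pattern = re.compile(r'(%s)(\d+)(\.[a-zA-Z0-9]+)' % re.escape(prefix))

# \g<1> ends the group reference explicitly, so the digits that follow are literal text.
safe = pattern.sub(r'\g<1>%s\g<3>' % new_number, 'spam006.txt')
print(safe)  # spam005.txt

# A callable replacement avoids template parsing altogether.
also_safe = pattern.sub(lambda m: m.group(1) + new_number + m.group(3), 'spam006.txt')
print(also_safe)  # spam005.txt
```

The callable form is handy when the replacement text comes from user input and might itself contain backslashes.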
I am using this code to find the differences between two csv lists and have some formatting questions. This is probably an easy fix, but I am new, trying to learn, and having a lot of problems.
import difflib
diff=difflib.ndiff(open('test1.csv',"rb").readlines(), open('test2.csv',"rb").readlines())
try:
    while 1:
        print diff.next(),
except:
    pass
the code works fine and I get the output I am looking for as:
Group,Symbol,Total
- Adam,apple,3850
? ^
+ Adam,apple,2850
? ^
bob,orange,-45
bob,lemon,66
bob,appl,-56
bob,,88
My question is: how do I clean the formatting up? Can I make Group, Symbol, Total into separate columns and line up the text below them?
Also, can I change the ? to text that I choose, such as test 1 and test 2, to show which sheet each line comes from?
Thanks for any help.
Using difflib.unified_diff gives much cleaner output, see below.
Also, both difflib.ndiff and difflib.unified_diff return a generator, which you can use directly in a for loop; it knows when to stop, so you don't have to handle exceptions yourself. N.B.: the comma after line prevents print from adding another newline.
import difflib
s1 = ['Adam,apple,3850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
      'bob,appl,-56\n', 'bob,,88\n']
s2 = ['Adam,apple,2850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
      'bob,appl,-56\n', 'bob,,88\n']
for line in difflib.unified_diff(s1, s2, fromfile='test1.csv',
                                 tofile='test2.csv'):
    print line,
This gives:
--- test1.csv
+++ test2.csv
@@ -1,4 +1,4 @@
-Adam,apple,3850
+Adam,apple,2850
bob,orange,-45
bob,lemon,66
bob,appl,-56
So you can clearly see which lines were changed between test1.csv and test2.csv.
To line up the columns, you must use string formatting.
E.g. print "%-20s %-20s %-20s" % (row[0],row[1],row[2]).
To change the ? into any text you like, you'd use s.replace('?', 'any text I like').
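Putting the two ideas together, a sketch in Python 3 might look like this. The labels in LABELS, the function name labelled_diff and the choice to drop the ? guide lines are all assumptions for illustration, and the plain split(',') only suits simple CSV without quoted fields.

```python
import difflib

# Hypothetical labels: which sheet a changed line comes from.
LABELS = {'-': 'test1', '+': 'test2', ' ': ''}

def labelled_diff(lines1, lines2, width=12):
    out = []
    for line in difflib.ndiff(lines1, lines2):
        code, body = line[0], line[2:]
        if code == '?':
            continue  # guide lines no longer line up once columns are padded
        fields = body.rstrip('\r\n').split(',')
        cells = [LABELS[code]] + fields
        # %-*s left-justifies each cell in a column of the given width.
        out.append(''.join('%-*s' % (width, c) for c in cells).rstrip())
    return out

s1 = ['Adam,apple,3850\n', 'bob,,88\n']
s2 = ['Adam,apple,2850\n', 'bob,,88\n']
for row in labelled_diff(s1, s2):
    print(row)
```

Unchanged lines get an empty label, so their fields still start in the second column.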
Your problem has more to do with the CSV format, since difflib has no idea it's looking at columnar fields. What you need is to figure out into which field the guide is pointing, so that you can adjust it when printing the columns.
If your CSV files are simple, i.e. they don't contain any quoted fields with embedded commas or (shudder) newlines, you can just use split(',') to separate them into fields, and figure out where the guide points as follows:
def align(line, guideline):
    """
    Figure out which field the guide (^) points to, and the offset within it.
    E.g., if the guide points 3 chars into field 2, return (2, 3)
    """
    fields = line.split(',')
    guide = guideline.index('^')
    f = p = 0
    while p + len(fields[f]) < guide:
        p += len(fields[f]) + 1  # +1 for the comma
        f += 1
    offset = guide - p
    return f, offset
Now it's easy to show the guide properly. Let's say you want to align your columns by printing everything 12 spaces wide:
diff=difflib.ndiff(...)
for line in diff:
    code = line[0]  # The diff prefix
    print code,
    if code == '?':
        fld, offset = align(lastline, line[2:])
        for f in range(fld):
            print "%-12s" % '',
        print ' '*offset + '^'
    else:
        fields = line[2:].rstrip('\r\n').split(',')
        for f in fields:
            print "%-12s" % f,
        print
    lastline = line[2:]
Be warned that the only reliable way to parse CSV files is to use the csv module (or a robust alternative); but getting it to play well with the diff format (in full generality) would be a bit of a headache. If you're mainly interested in readability and your CSV isn't too gnarly, you can probably live with an occasional mix-up.
This seems fairly trivial but I can't seem to work it out
I have a text file with the contents:
B>F
I am reading this with the code below, stripping the '>' and trying to convert the strings into their corresponding ASCII value, minus 65 to give me a value that will correspond to another list index
def readRoute():
    routeFile = open('route.txt', 'r')
    for line in routeFile.readlines():
        route = line.strip('\n' '\r')
        route = line.split('>')
        #startNode, endNode = route
        startNode = ord(route[0])-65
        endNode = ord(route[1])-65
        # Debug (this comment was for my use to explain below the print values)
        print 'Route Entered:'
        print line
        print startNode, ',', endNode, '\n'
    return[startNode, endNode]
However I am having slight trouble doing the conversion nicely, because the text file only contains one line at the moment but ideally I need it to be able to support more than one line and run an amount of code for each line.
For example it could contain:
B>F
A>D
C>F
E>D
So I would want to run the same code outside this function 4 times with the different inputs
Anyone able to give me a hand
Edit:
Not sure I made my issue that clear, sorry
What I need to do is parse the text file (possibly containing one line or multiple lines, like above). I am able to do it for one line with these lines:
startNode = ord(route[0])-65
endNode = ord(route[1])-65
But I get errors when trying to do more than one line because the ord() is expecting different inputs
If I have (below) in the route.txt
B>F
A>D
This is the error it gives me:
line 43, in readRoute endNode = ord(route[1])-65
TypeError: ord() expected a character, but string of length 2 found
My code above should read the route.txt file and see that B>F is the first route, strip the '>' - convert the B & F to ASCII, so 66 & 70 respectively then minus 65 from both to give 1 & 5 (in this example)
The 1 & 5 are corresponding indexes for another "array" (list of lists) to do computations and other things on
Once the other code has completed it can then go to the next line in route.txt which could be A>D and perform the above again
Perhaps this will work for you. I turned the fileread into a generator so you can do as you please with the parsed results in the for-i loop.
def readRoute(file_name):
    with open(file_name, 'r') as r:
        for line in r:
            yield (ord(line[0])-65, ord(line[2])-65)

filename = 'route.txt'
for startnode, endnode in readRoute(filename):
    print startnode, endnode
If you can't change readRoute, change the contents of the file before each call. Better yet, make readRoute take the filename as a parameter (default it to 'route.txt' to preserve the current behavior) so you can have it process other files.
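As for the TypeError itself: in the question's code the stripped text is stored in route, but the split is then done on the original line, so route[1] is 'F\n' (two characters) whenever the line ends with a newline. A sketch of a version that takes the file name as a parameter and strips before splitting could look like this (Python 3; the name read_route and the list-of-tuples return value are my own choices):

```python
def read_route(file_name='route.txt'):
    routes = []
    with open(file_name) as route_file:
        for line in route_file:
            line = line.strip()   # drop '\n' or '\r\n' before splitting
            if not line:
                continue          # ignore blank lines
            start, end = line.split('>')
            # 'A' is 65 in ASCII, so 'B' -> 1, 'F' -> 5, and so on.
            routes.append((ord(start) - 65, ord(end) - 65))
    return routes
```

The caller can then loop over the returned pairs and run the rest of the code once per route.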
What about something like this? It takes the routes defined in your file and turns them into path objects with start and end member variables. As an added bonus PathManager.readFile() allows you to load multiple route files without overwriting the existing paths.
import re

class Path:
    def __init__(self, start, end):
        self.start = ord(start) - 65  # Scale the values as desired
        self.end = ord(end) - 65  # Scale the values as desired

class PathManager:
    def __init__(self):
        self.expr = re.compile("^([A-Za-z])[>]([A-Za-z])$")  # looks for string "C>C"
                                                             # where C is a char
        self.paths = []

    def do_logic_routine(self, start, end):
        # Do custom logic here that will execute before the next line is read
        # Return True for 'continue reading' or False to stop parsing file
        return True

    def readFile(self, path):
        with open(path, "r") as file:  # the file is closed when the block ends
            for line in file:
                item = self.expr.match(line.strip())  # strip whitespaces before parsing
                if item:
                    '''
                    item.group(0) is *not* used here; it matches the whole expression
                    item.group(1) matches the first parenthesis in the regular expression
                    item.group(2) matches the second
                    '''
                    self.paths.append(Path(item.group(1), item.group(2)))
                    # do_logic_routine is a method, so it must be called through self
                    if not self.do_logic_routine(self.paths[-1].start, self.paths[-1].end):
                        break

# Running the example
MyManager = PathManager()
MyManager.readFile('route.txt')
for path in MyManager.paths:
    print "Start: %s End: %s" % (path.start, path.end)
Output is:
Start: 1 End: 5
Start: 0 End: 3
Start: 2 End: 5
Start: 4 End: 3
I'm pretty new to Python programming and would appreciate some help to a problem I have...
Basically I have multiple text files which contain velocity values as such:
0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00
etc for many lines...
What I need to do is convert all the values in the text file that are less than 1 (e.g. 0.137865E+00 above) to an arbitrary value of 0.100000E+01. While it seems pretty simple to replace specific values with the 'replace()' method and a while loop, how do you do this if you want to replace a range?
thanks
I think when you are beginning programming, it's useful to see some examples; and I assume you've tried this problem on your own first!
Here is a break-down of how you could approach this:
contents='0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00'
The split method works on strings. It returns a list of strings. By default, it splits on whitespace:
string_numbers=contents.split()
print(string_numbers)
# ['0.259515E+03', '0.235095E+03', '0.208262E+03', '0.230223E+03', '0.267333E+03', '0.217889E+03', '0.156233E+03', '0.144876E+03', '0.136187E+03', '0.137865E+00']
The map command applies its first argument (the function float) to each of the elements of its second argument (the list string_numbers). The float function converts each string into a floating-point object.
float_numbers=map(float,string_numbers)
print(float_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 0.13786499999999999]
You can use a list comprehension to process the list, converting numbers less than 1 into the number 1. The conditional expression (1 if num<1 else num) equals 1 when num is less than 1, otherwise, it equals num.
processed_numbers=[(1 if num<1 else num) for num in float_numbers]
print(processed_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 1]
This is the same thing, all in one line:
processed_numbers=[(1 if num<1 else num) for num in map(float,contents.split())]
To generate a string out of the elements of processed_numbers, you could use the str.join method:
comma_separated_string=', '.join(map(str,processed_numbers))
# '259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1'
typical technique would be:
read file line by line
split each line into a list of strings
convert each string to the float
compare converted value with 1
replace when needed
write back to the new file
As I don't see you having any code yet, I hope that this would be a good start
def float_filter(input):
    for number in input.split():
        if float(number) < 1.0:
            yield "0.100000E+01"
        else:
            yield number

input = "0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00"
print " ".join(float_filter(input))
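The steps listed above, applied to a whole file, could be sketched as follows (Python 3). The function names and the floor and replacement parameters are made up for illustration; values that are left alone keep their original text, so their formatting is untouched.

```python
def clamp_line(line, floor=1.0, replacement="0.100000E+01"):
    # Split on whitespace, compare each value as a float, keep the original text otherwise.
    return " ".join(replacement if float(tok) < floor else tok
                    for tok in line.split())

def clamp_file(src_path, dst_path):
    # Read line by line and write the processed line to a new file.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(clamp_line(line) + "\n")

print(clamp_line("0.259515E+03 0.137865E+00"))  # 0.259515E+03 0.100000E+01
```

Note that columns are re-joined with single spaces, so any fixed-width alignment in the input is not preserved.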
import numpy as np
a = np.genfromtxt('file.txt') # read file
a[a<1] = 1.0 # replace (0.100000E+01 is 1.0)
np.savetxt('converted.txt', a) # save to file
You could use regular expressions for parsing the string. I'm assuming here that the mantissa is never larger than 1 (ie, begins with 0). This means that for the number to be less than 1, the exponent must be either 0 or negative. The following regular expression matches '0', '.', unlimited number of decimal digits (at least 1), 'E' and either '+00' or '-' and two decimal digits.
0\.\d+E(-\d\d|\+00)
Assuming that you have the file read into variable 'text', you can use the regexp with the following python code:
result = re.sub(r"0\.\d+E(-\d\d|\+00)", "0.100000E+01", text)
Edit: Just realized that the description doesn't limit the valid range of input numbers to positive numbers. Negative numbers can be matched with the following regexp:
-0\.\d+E[-+]\d\d
This can be alternated with the first one using the (pattern1|pattern2) syntax which results in the following Python code:
result = re.sub(r"(0\.\d+E(-\d\d|\+00)|-0\.\d+E[-+]\d\d)", "0.100000E+01", text)
Also, if there's a chance that the exponent goes past 99, the regexp can be further modified by adding a '+' sign after the '\d\d' patterns. This allows matching exponents of two OR MORE digits.
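A quick way to check the combined pattern is to run it over a sample line. This Python 3 sketch uses a non-capturing group and names of my own choosing (SMALL, clamp_text), and it relies on the same assumption as above: every mantissa starts with 0.

```python
import re

# Small positive values (exponent +00 or negative), or any negative value.
SMALL = re.compile(r"0\.\d+E(?:-\d\d|\+00)|-0\.\d+E[-+]\d\d")

def clamp_text(text, replacement="0.100000E+01"):
    return SMALL.sub(replacement, text)

sample = "0.259515E+03 0.137865E+00 -0.500000E+02 0.123456E-01"
print(clamp_text(sample))
# 0.259515E+03 0.100000E+01 0.100000E+01 0.100000E+01
```

If a file can contain mantissas that do not start with 0, converting to float and comparing is the safer route.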
I've got the script working as I want now...thanks people.
When writing the list to a new file I used the replace method to get rid of the brackets and commas - is there a simpler way?
ftext = open("C:\\Users\\hhp06\\Desktop\\out.grd", "r")
otext = open("C:\\Users\\hhp06\\Desktop\\out2.grd", "w+")
for line in ftext:
    stringnum = line.split()
    floatnum = map(float, stringnum)
    procnum = [(1.0 if num<1 else num) for num in floatnum]
    stringproc = str(procnum)
    s = (stringproc).replace(",", " ").replace("[", " ").replace("]", "")
    otext.writelines(s + "\n")
otext.close()
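One simpler option is to join the formatted numbers directly instead of converting the list to a string and removing the brackets and commas. A sketch in Python 3 (the helper name format_line is made up); note that "%.6E" writes 2.595150E+02 rather than the 0.259515E+03 style of the input file, so this assumes whatever reads the .grd file accepts standard scientific notation:

```python
def format_line(values):
    # Format each float in scientific notation and separate them with single spaces.
    return " ".join("%.6E" % v for v in values)

print(format_line([259.515, 1.0]))  # 2.595150E+02 1.000000E+00
```

In the loop above this would replace the str(procnum) and replace() lines with s = format_line(procnum).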