strip() function doesn't remove trailing numbers - python

I tried the following code, but it fails to remove the trailing digits (Python 3.4.3):
file_name = "48athens22.jpg"
result = file_name.strip("0123456789")
print (result)
Output:
athens22.jpg
What has gone wrong?

strip() only removes characters from the two ends of a string; the 22 is not at the end of the string, because the extension comes after it.
Here's how to do what you want:
import os
def strip_filename(filename):
    root, ext = os.path.splitext(filename)
    root = root.strip('0123456789')
    return root + ext
print(strip_filename('48athens22.jpg'))  # athens.jpg
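To make the behaviour above concrete, here is a small sketch using only built-in str methods. The argument to strip() is treated as a set of characters, and removal stops at the first character that is not in that set, working inwards from each end.

```python
import string

file_name = "48athens22.jpg"

# The argument is a set of characters, not a prefix or a suffix.
both_ends = file_name.strip(string.digits)    # "48" goes; the string ends with "g", so the right end is untouched
left_only = file_name.lstrip(string.digits)   # same result for this input
right_only = file_name.rstrip(string.digits)  # unchanged, because the last character is "g"

print(both_ends)   # athens22.jpg
print(left_only)   # athens22.jpg
print(right_only)  # 48athens22.jpg
```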

strip only removes from the beginning and end of the string. If you want to remove every occurrence of a substring or a pattern, try re.sub instead.
E.g.
import re
re.sub('[0-9]', '', file_name)

Those numbers aren't trailing. They come before the '.jpg'.
file_name = "48athens22.jpg"
name, *extension = file_name.rpartition('.')
result = name.strip("0123456789") + ''.join(extension)
print (result)

Works for me:
file_name = "48athens22.jpg1234"
result = file_name.strip("0123456789")
print(result)
Gives:
athens22.jpg
If you want to remove all digits, try:
import re
file_name = "48athens22.jpg1234"
result = re.sub(r'\d+', "", file_name)
print(result)
Gives:
athens.jpg
If you only want to remove digits before the ".", try:
result = re.sub(r'\d+\.', ".", file_name)
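If the aim is the one from the original question (drop the digits at the start of the name and the digits right before the extension, but nothing else), one possible pattern is sketched below. The helper name strip_edge_digits is made up for illustration, and the sketch assumes the extension is whatever follows the last dot.

```python
import re

# Hypothetical helper: removes digits at the very start of the name and
# digits that sit immediately before the final ".ext" part.
_EDGE_DIGITS = re.compile(r'^\d+|\d+(?=\.[^.]*$)')

def strip_edge_digits(file_name):
    return _EDGE_DIGITS.sub('', file_name)

print(strip_edge_digits("48athens22.jpg"))  # athens.jpg
print(strip_edge_digits("48ath3ns22.jpg"))  # ath3ns.jpg
```

Digits in the middle of the name are kept, which is the difference from the r'\d+' version above.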

There are no trailing numbers: the last character in your string is 'g', and 22 is actually in the middle. If you do not want to consider the extension when stripping, you will have to first split file_name on '.', then strip the first part, and then rejoin them.
Code:
filenames = file_name.split('.')
result = filenames[0].strip('0123456789') + '.' + '.'.join(filenames[1:])
print(result)
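The same split, strip and rejoin idea can also be sketched with pathlib, which splits on the last dot only. The helper name strip_stem_digits is hypothetical, and this assumes a bare file name with no directory part.

```python
from pathlib import PurePath

def strip_stem_digits(file_name):
    # Hypothetical helper: stem is everything before the last dot,
    # suffix is the last dot plus what follows it.
    path = PurePath(file_name)
    return path.stem.strip('0123456789') + path.suffix

print(strip_stem_digits("48athens22.jpg"))     # athens.jpg
print(strip_stem_digits("48athens22.tar.gz"))  # athens22.tar.gz
```

Note the second case: splitting on the first dot, as in the code above, would give athens.tar.gz instead, so which one is right depends on what you count as the extension.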

Related

Python: Remove character only from end of string if character is ="/"

I add different values to Houdini variables with Python.
Some of these variables are file paths and end with a "/"; others are just names and do not end with a "/".
In my current code I use [:-1] to remove the last character of the file path, so I don't have the "/".
The problem is that if I add a value like "Var_ABC", the result will be "Var_AB", since it also removes the last character.
How can I remove the last character only if the last character is a "/"?
That's what I have, and it works so far:
def set_vars():
    count = hou.evalParm('vars_names')
    user_name = hou.evalParm('user_name')
    for idx in range(1, count + 1):
        output = hou.evalParm('vars_' + str(idx))
        vars_path_out = hou.evalParm('vars_path_' + str(idx))
        vars_path = vars_path_out[:-1]
        hou.hscript("setenv -g " + output + "=" + vars_path)
        final_vars = hou.hscript("setenv -g " + output + "=" + vars_path)
    hou.ui.displayMessage(user_name + ", " + "all variables are set.")
Thank you
As @jasonharper mentioned in the comments, you should probably use rstrip here. It is built in and, IMO, more readable than the conditional one-liner:
vars_path_out.rstrip('/')
This returns the string with any trailing / characters removed (all of them, not just one). A string that does not end with / is returned as-is.
Try this in your code:
vars_path_out = hou.evalParm('vars_path_' + str(idx))
if vars_path_out[-1] == '/':
    vars_path = vars_path_out[:-1]
else:
    vars_path = vars_path_out
Or, based on jasonharper's comment:
vars_path = vars_path_out.rstrip('/')
This is much better than the first version.
Use the endswith method to check whether it ends with /:
if vars_path_out.endswith('/'):
Or simply check the last character:
if vars_path_out[-1] == '/':
Like this:
vars_path = vars_path_out[:-1] if vars_path_out.endswith('/') else vars_path_out
Or like this:
if vars_path_out.endswith('/'):
    vars_path = vars_path_out[:-1]
else:
    vars_path = vars_path_out
Another way is the rstrip method:
vars_path = vars_path_out.rstrip('/')
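The options above differ slightly on edge cases, which the sketch below tries to show. The helper name drop_one_trailing_slash and the sample values are made up for illustration; no Houdini calls are involved.

```python
def drop_one_trailing_slash(value):
    # Hypothetical helper: removes at most one trailing "/" and is safe on "".
    return value[:-1] if value.endswith('/') else value

for value in ["/proj/shots/", "Var_ABC", "/proj//", ""]:
    # Columns: input, endswith-based result, rstrip('/') result
    print(repr(value), repr(drop_one_trailing_slash(value)), repr(value.rstrip('/')))
```

The two approaches only disagree when there are several trailing slashes: rstrip('/') removes all of them. Also note that indexing with [-1] raises an IndexError on an empty string, while endswith and rstrip do not.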

Camel Casing and Underscore addition in filename with text, number and date

I'm relatively new to Python and pandas, so I'd appreciate some input here.
I have multiple files whose names combine text, numbers and a date. I want to rename them to a standard format: each word capitalized, words joined with underscores, and white space trimmed. For example:
FileName- ARA Inoc Start Times V34 20200418.xlsx to be named as Ara_Inoc_Start_Time_V34_20200418.xlsx
FileName- Batch Start Time V3 20200418.xlsx to be named as Batch_Start_Time_V3_20200418.xlsx
The challenges I'm facing are:
1) How do I add an underscore before the date?
2) For a filename containing words like ARA Inoc Start, my code converts it to A_R_A _Inoc _Start. How can I adapt it to give Ara_Inoc? This would involve trimming the white space as well. How do I add that to my current code?
import os
def change_case(str):
    res = [str[0].upper()]
    for c in str[1:]:
        if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
            res.append('_')
            res.append(c.upper())
        else:
            res.append(c)
    return ''.join(res)
# Driver code
for filename in os.listdir("C:\\Users\\t\\Documents\\DummyData\\"):
    str = filename
    print(change_case(str))
Split the strings using str.split(), capitalize the first letter of each part using str.upper(), then join them using str.join():
import os
for filename in [
    ' ARA Inoc Start Times V34 20200418.xlsx ',
    ' Batch_Start_Time_V3_20200418.xlsx '
]:  # os.listdir('C:\\Users\\t\\Documents\\DummyData\\')
    new_filename = '_'.join([i[:1].upper()+i[1:].lower() for i in filename.strip().split()])
    print(new_filename)
Output:
Ara_Inoc_Start_Times_V34_20200418.xlsx
Batch_start_time_v3_20200418.xlsx
Note the use of i[:1].upper()+i[1:].lower() instead of str.title(). You can use the latter, but it will convert the file extension to title case as well, which is why I used the above instead. Alternatively, you can split the filename and the extension before doing the conversion:
import os
for filename in [
    ' ARA Inoc Start Times V34 20200418.xlsx ',
    ' Batch_Start_Time_V3_20200418.xlsx '
]:
    # strip first, so the extension does not keep the trailing space
    filename, ext = filename.strip().rsplit('.', 1)
    filename = '_'.join([i.title() for i in filename.lower().split()])
    new_filename = '.'.join([filename, ext])
    print(new_filename)
Output:
Ara_Inoc_Start_Times_V34_20200418.xlsx
Batch_Start_Time_V3_20200418.xlsx
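For what it's worth, the two ideas above (split off the extension, then capitalize each word) can be combined into one function. This is only a sketch: the name normalize_filename is made up, and it assumes words are separated by white space or underscores.

```python
import os
import re

def normalize_filename(name):
    # Hypothetical helper: split off the extension, then capitalize each word.
    stem, ext = os.path.splitext(name.strip())
    words = re.split(r'[\s_]+', stem.strip())
    return '_'.join(word.capitalize() for word in words if word) + ext

print(normalize_filename(' ARA Inoc Start Times V34 20200418.xlsx '))
# Ara_Inoc_Start_Times_V34_20200418.xlsx
print(normalize_filename(' Batch_Start_Time_V3_20200418.xlsx '))
# Batch_Start_Time_V3_20200418.xlsx
```

Because the date is just another white-space-separated word, it gets its underscore from the join like everything else.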

Can't figure out where function is going wrong

I have a text file, IDlistfix, which contains a list of YouTube video IDs. I'm trying to make a new text file, newlist.txt, which contains the IDs from the first file with apostrophes around them and a comma between the IDs. This is what I've written to accomplish this:
n = open('IDlistfix','r+')
j = open('newlist.txt','w')
line = n.readline()
def listify(rd):
    return '\'' + rd + '\','
for line in n:
    j.write(listify(line))
This gives me an output of ','rUfg2SLliTQ where I'd expect the output to be 'rUfg2SLliTQ',. Where is my function going wrong?
You just have to strip it of newlines:
j.write(listify(line.strip())) # Notice the call of the .strip() method on the String
Try to remove trailing whitespace and return a formatted string:
n = open('IDlistfix','r+')
j = open('newlist.txt','w')
line = n.readline()
def listify(rd):
    # remove trailing whitespace
    rd = rd.rstrip()
    # return a formatted string
    # this is generally preferable to '+'
    return "'{0}',".format(rd)
for line in n:
    j.write(listify(line))
The problem must be in
return '\'' + rd + '\','
because rd ends with '\n'.
Remove the '\n' from rd and it should be fine.
The problem is the line break.
Change:
for line in n:
    j.write(listify(line.replace('\n','')))
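Putting the suggestions together, a possible sketch is shown below. It uses io.StringIO as a stand-in for the opened IDlistfix file, the function name quote_ids is made up, and the second ID is invented sample data. Unlike the expected output in the question, it joins the IDs with commas, so there is no trailing comma.

```python
import io

def quote_ids(lines):
    # Strip the newline (and other surrounding whitespace) from each line,
    # skip blank lines, then wrap each ID in single quotes.
    ids = (line.strip() for line in lines)
    return ','.join("'{}'".format(video_id) for video_id in ids if video_id)

# Stand-in for the real file; the second ID is made up.
fake_file = io.StringIO("rUfg2SLliTQ\nabc123XYZ_-\n")
print(quote_ids(fake_file))  # 'rUfg2SLliTQ','abc123XYZ_-'
```

With a real file you would pass the file object itself, ideally opened in a with block so it gets closed.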

Replace single quotes with double quotes in python, for use with insert into database

Was wondering whether anyone has a clever solution for fixing bad
insert statements in Python, exported by a not so clever program. It didn't add
two single quotes for strings with a single quote in the string. To
make it a bit easier all the values being inserted are strings.
So it has:
INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');
instead of:
INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');
Obviously there are multiple lines of this and I don't want to replace
the enclosing single quotes as well.
Was thinking of using split and join, but I'm not sure how to easily update the split values while looping in a loop. Sorry I'm a noob. Something like the below, where I'm not sure how to do #update bit
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    bits = line.split("','")
    for bit in bits:
        if bit.find("'") > -1:
            #update bit
            pass
    line_out = "','".join(bits)
    sys.stdout.write(line_out)
    line = fileIN.readline()
Thanks
Based on katrielalex's suggestion, how about this:
>>> import re
>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
>>> def repl(m):
...     if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
...         return m.group(0)
...     return m.group(1) + "''" + m.group(2)
...
>>> re.sub("(.)'(.)", repl, s)
"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"
and if you're into negative lookbehind assertions, this is the headache inducing pure regex version:
re.sub("((?<![(,])'(?![,)]))", "''", s)
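Here is the lookaround version as a runnable sketch, for anyone who wants to try it. The names INNER_QUOTE and escape_inner_quotes are made up. It assumes no value starts or ends with a quote, that no quote inside a value sits next to a comma or parenthesis, and that the input has not been escaped already (running it twice would double the quotes again).

```python
import re

# A quote is left alone if it follows "(" or "," or comes right before "," or ")".
INNER_QUOTE = re.compile(r"(?<![(,])'(?![,)])")

def escape_inner_quotes(statement):
    return INNER_QUOTE.sub("''", statement)

s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
print(escape_inner_quotes(s))
# INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');
```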
while line:
    # Restrict line2 to the text inside the parentheses
    line1, rest = line.split('(')
    line2, line3 = rest.split(')')
    # A bit cleaner
    new_bits = []
    for bit in line2.split(','):
        # Remove border ' characters
        bit = bit[1:-1]
        # Duplicate the ones inside
        if "'" in bit:
            bit = bit.replace("'", "''")
        # Re-add border '
        new_bits.append("'" + bit + "'")
    sys.stdout.write(line1 + '(' + ','.join(new_bits) + ')' + line3)
    line = fileIN.readline()
Warning: This depends way too much on the formatting of the SQL statement. However, if your input is only ever going to have the format "statements (params) end" then this will work every time.
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    # split out the parameters (between the ()'s)
    start, temp = line.split("(")
    params, end = temp.split(")")
    # replace the "'"s in the parameters (without the start and end quote)
    newParams = "','".join([x.replace("'", "''") for x in params[1:-1].split("','")])
    # join the statement back together
    line_out = start + "('" + newParams + "')" + end
    # next line
    sys.stdout.write(line_out)
    line = fileIN.readline()
Explanation:
Split the string into 3 parts: The query start, the parameters, and the end.
The list comprehension takes the parameters (without the starting/ending 's), splits them on ',', and, for every element in the list the split generates (the individual data entries), replaces the 's with ''s.
The last line then joins the query start, the new params (with the parenthesis and quotes that were removed previously), and the end of the statement.
Another answer:
a = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
open_par = a.find("(")
close_par = a.find(")")
b = a[open_par+1:close_par]
c = b.split(",")
d = map(lambda x: '"' + x.strip().strip("'") + '"',c)
result = a[:open_par+1] + ",".join(d) + a[close_par:]
Went with:
import sys
import re
def repl(m):
    if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
        return m.group(0)
    return m.group(1) + "''" + m.group(2)
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
    line_out = re.sub("(.)'(.)", repl, line)
    sys.stdout.write(line_out)
    # Next line.
    line = fileIN.readline()

Python RegEx Woes

I'm not sure why this isn't working:
import re
import csv
def check(q, s):
    match = re.search(r'%s' % q, s, re.IGNORECASE)
    if match:
        return True
    else:
        return False
tstr = []
# test strings
tstr.append('testthisisnotworking')
tstr.append('This is a TEsT')
tstr.append('This is a TEST mon!')
f = open('testwords.txt', 'rU')
reader = csv.reader(f)
for type, term, exp in reader:
    for i in range(2):
        if check(exp, tstr[i]):
            print exp + " hit on " + tstr[i]
        else:
            print exp + " did NOT hit on " + tstr[i]
f.close()
testwords.txt contains this line:
blah, blah, test
So essentially 'test' is the RegEx pattern. Nothing complex, just a simple word. Here's the output:
test did NOT hit on testthisisnotworking
test hit on This is a TEsT
test hit on This is a TEST mon!
Why does it NOT hit on the first string? I also tried \s*test\s* with no luck. Help?
By default, the csv module keeps the blank spaces around words in the input (this can be changed by using a different "dialect"). So exp contains " test" with a leading space.
A quick way to fix this would be to add:
exp = exp.strip()
after you read from the CSV file.
Adding a print repr(exp) to the top of the first for loop shows that exp is ' test'; note the leading space.
This isn't that surprising, since csv.reader() splits on commas and keeps the space after each one. Try changing your code to the following:
for type, term, exp in reader:
    exp = exp.strip()
    for s in tstr:
        if check(exp, s):
            print exp + " hit on " + s
        else:
            print exp + " did NOT hit on " + s
Note that in addition to the strip() call, which removes the leading and trailing whitespace, I changed your second for loop to loop directly over the strings in tstr instead of over a range. There was actually a bug in your code: tstr contains three values, but you only checked the first two, because for i in range(2) only gives you i=0 and i=1.
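As an alternative to calling strip() on each field, the csv module has a skipinitialspace option that drops spaces right after the delimiter. Below is a sketch in Python 3 syntax (the code above is Python 2), with io.StringIO standing in for testwords.txt. Trailing spaces are not removed by this option, so strip() is still the more general fix.

```python
import csv
import io
import re

# io.StringIO stands in for testwords.txt, which holds the line from the question.
rows = list(csv.reader(io.StringIO("blah, blah, test\n"), skipinitialspace=True))
kind, term, exp = rows[0]
print(repr(exp))  # 'test'

strings = ['testthisisnotworking', 'This is a TEsT', 'This is a TEST mon!']
hits = [bool(re.search(exp, s, re.IGNORECASE)) for s in strings]
print(hits)  # [True, True, True]
```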
