python strip function is not giving expected output

I have the code below, in which the filenames are FR1.1.csv, FR2.0.csv, etc. I use these names in the header row, but I want to shorten them to FR1.1, FR2.0 and so on, so I am using the strip function to remove .csv. When I tried it at the command prompt it worked fine, but after adding it to the main script the name does not change.
for fname in filenames:
    print "fname : ", fname
    fname.strip('.csv');
    print "after strip fname: ", fname
    headerline.append(fname+' Compile');
    headerline.append(fname+' Run');
Output I am getting:
fname :FR1.1.csv
after strip fname: FR1.1.csv
Required output:
fname :FR1.1.csv
after strip fname: FR1.1
I guess there is some indentation problem in my code after the for loop.
Please tell me the correct way to achieve this.

Strings are immutable, so string methods can't change the original string; they return a new one, which you need to assign back:
fname = fname.strip('.csv') # no semicolons needed in Python!
But this call doesn't do what you probably expect it to. It will remove all the leading and trailing characters c, s, v and . from your string:
>>> "cross.csv".strip(".csv")
'ro'
So you probably want to do
import re
fname = re.sub(r"\.csv$", "", fname)
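A minimal sketch of the difference, written in Python 3 syntax. The helper name remove_suffix is made up for illustration and is not a standard library function; it only assumes that the unwanted text is an exact suffix.

```python
def remove_suffix(name, suffix=".csv"):
    """Return name without suffix if it ends with it, otherwise unchanged."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

# strip() treats its argument as a set of characters, not as a suffix
print("cross.csv".strip(".csv"))   # ro
print(remove_suffix("cross.csv"))  # cross
print(remove_suffix("FR1.1.csv"))  # FR1.1
```

Unlike the regex, this needs no escaping of the dot, at the cost of handling only one fixed suffix per call.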

Strings are immutable. strip() returns a new string.
>>> "FR1.1.csv".strip('.csv')
'FR1.1'
>>> m = "FR1.1.csv".strip('.csv')
>>> print(m)
FR1.1
You need to do fname = fname.strip('.csv').
And get rid of the semicolons at the end!
P.S - Please see Jon Clement's comment and Tim Pietzcker's answer to know why this code should not be used.

You probably should use os.path for path manipulations:
import os
#...
for fname in filenames:
    print "fname : ", fname
    fname = os.path.splitext(fname)[0]
    #...
The particular reason why your code fails is provided in other answers.
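For reference, here is how the whole loop might look with os.path.splitext, as a Python 3 sketch. The sample filenames come from the question; starting headerline as an empty list is an assumption, since the question does not show how it is created.

```python
import os

filenames = ["FR1.1.csv", "FR2.0.csv"]  # sample names from the question
headerline = []                          # assumed to start empty

for fname in filenames:
    # splitext splits at the last dot only: "FR1.1.csv" -> ("FR1.1", ".csv")
    base = os.path.splitext(fname)[0]
    headerline.append(base + " Compile")
    headerline.append(base + " Run")

print(headerline)
# ['FR1.1 Compile', 'FR1.1 Run', 'FR2.0 Compile', 'FR2.0 Run']
```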

Change
fname.strip('.csv')
to
fname = fname.strip('.csv')

Related

Does Python have a string function to find a section of a string

I'm trying to filter some log files that are in the format of a table/dataset, but .endswith() and .startswith() are not meeting my requirements. I'm using an anonymous function but need to adapt my Python code to check whether a string contains .jpg
logfilejpg = sc.textFile("/loudacre/logs/*.log").filter(lambda line: line.endswith('.jpg'))

Use in:
'.jpg' in 'something.jpg foo'
Out: True
You can also put it in your lambda expression:
lambda line: '.jpg' in line
Example:
list(filter(lambda line: '.jpg' in line, ["foo", "foo.jpg.bar", "bar.jpg"]))
Out: ['foo.jpg.bar', 'bar.jpg']
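To see why endswith() falls short here, a small side-by-side sketch may help. The sample lines are invented stand-ins for log entries, not real data from the question.

```python
# Made-up sample lines standing in for log entries
lines = ["GET /a.jpg HTTP/1.0", "GET /b.html HTTP/1.0", "photo.jpg"]

# endswith() only matches when ".jpg" is the very end of the line
ends_with_jpg = [line for line in lines if line.endswith(".jpg")]

# "in" matches ".jpg" anywhere in the line
contains_jpg = [line for line in lines if ".jpg" in line]

print(ends_with_jpg)  # ['photo.jpg']
print(contains_jpg)   # ['GET /a.jpg HTTP/1.0', 'photo.jpg']
```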

To get the index of where the ".jpg" starts at:
hello = "world.jpg"
print(hello.find(".jpg"))

You can split the initial string by " " (space), then by ".", and take the second value in the resulting array. Of course, it depends on what your initial string looks like. The basic idea is to isolate the extension ("jpg" after the split) and compare it for equality.
To verify that the file is actually a JPEG, you can try to open it. If that fails, the file is either in another format or corrupt; the exception you get will tell you more.

Using str.find() and len(), you could find the substring like so:
a_string = 'there is a .jpg here.'
start = a_string.find('.jpg') # The lowest index in a_string where '.jpg' is found
end = start + len('.jpg')
print(a_string[start:end])
# .jpg
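One caveat: str.find() returns -1 when the substring is absent, so the slice above would then not be the substring you were looking for. A hedged sketch that checks for this case first (the function name extract is made up for illustration):

```python
def extract(text, needle):
    """Return the part of text equal to needle, or None if it is absent."""
    start = text.find(needle)
    if start == -1:  # str.find signals "not found" with -1
        return None
    return text[start:start + len(needle)]

print(extract("there is a .jpg here.", ".jpg"))  # .jpg
print(extract("no image here", ".jpg"))          # None
```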

Python match '.\' at start of string

I need to identify where some powershell path strings cross over into Python.
How do I detect if a path in Python starts with .\ ??
Here's an example:
import re
file_path = ".\reports\dsReports"
if re.match(r'.\\', file_path):
    print "Pass"
else:
    print "Fail"
This Fails, in the debugger it lists
expression = .\\\\\\
string = .\\reports\\\\dsReports
If I try using replace like so:
import re
file_path = ".\reports\dsReports"
testThis = file_path.replace(r'\', '&jkl$ff88')
if re.match(r'.&jkl$ff88', file_path):
    print "Pass"
else:
    print "Fail"
the testThis variable ends up like this:
testThis = '.\\reports&jkl$ff88dsReports'
Quite aggravating.

This happens because \r is an escape sequence (a carriage return). You will need to either escape the backslashes by doubling them, or use a raw string literal like this:
file_path = r".\reports\dsReports"
And then check if it starts with ".\\":
if file_path.startswith('.\\'):
    do_whatever()
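The effect of the escape sequence can be made visible with repr(). This is a small Python 3 sketch; the shortened path ".\reports" is used only to keep the example free of other escape sequences.

```python
import re

plain = ".\reports"   # "\r" is read as a single carriage-return character
raw = r".\reports"    # raw literal: the backslash stays a backslash

print(repr(plain))              # '.\reports'  (dot, carriage return, "eports")
print(repr(raw))                # '.\\reports' (dot, backslash, "reports")
print(plain.startswith(".\\"))  # False
print(raw.startswith(".\\"))    # True

# The regex equivalent: escape the dot, and double the backslash in the pattern
print(bool(re.match(r"\.\\", raw)))  # True
```

If the paths come from a file or from user input rather than from a literal in the source code, no escape processing takes place, so the startswith() check works on them directly.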

removing a string of four characters from the front and thirteen characters from the end of a filename

I have seen basic Python code for replacing parts of filenames in a directory, but it always targets known strings. How would you remove arbitrary characters of a known length?
Would this work?
newFileName = file.replace([-5:], "")
I am trying to remove the last five characters from the filename without removing the extension.
Here is an update:
I am trying to do this:
DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml
to
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
which removes DMC- from the front and _014-00_EN-US from the end.
I need to add this to a code that will fix a directory of files.

This problem (if I understand it correctly) separates cleanly into steps: remove the extension, remove X characters from the beginning and the end, and then add the extension back to get the final name.
import os
oldFileName = 'xxxx-filename-xxxxx.XML'
# remove n chars in beginning, m chars at end
n = 5
m = 6
name, ext = os.path.splitext(oldFileName)
# slice away the chars, and add the extension
newFileName = '{}{}'.format(name[0:-m][n:], ext)
# newFileName == 'filename.XML'
So in your case, you would use n=4 and m=13.
If you didn't know the length, but you knew you wanted everything up to and including the first dash out, and likewise everything after the first underscore (which would mean there couldn't be underscores in the normal filename or the first part of it), this would work also:
import os
oldFileName = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
name, ext = os.path.splitext(oldFileName)
newFileName = '{}{}'.format(name[name.index('-')+1:name.index('_')], ext)
# newFileName == 'CIWS15-AAAA-A00-00-0000-00A-018A-D.xml'
And even if the pattern is something else, but there is a pattern, you can code to match it, like I have here.
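Since the question asks about fixing a whole directory, here is one way the fixed-length approach could be applied to every file in a folder. This is only a sketch: the function names trimmed_name and rename_all are made up, and it assumes that every file follows the same naming scheme and that no two files end up with the same new name.

```python
import os

def trimmed_name(filename, n=4, m=13):
    """Drop n leading and m trailing characters of the stem, keep the extension."""
    stem, ext = os.path.splitext(filename)
    # len(stem) - m rather than -m, so that m=0 also works
    return stem[n:len(stem) - m] + ext

def rename_all(directory, n=4, m=13):
    """Rename every regular file in directory; return the (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        src = os.path.join(directory, old)
        stem = os.path.splitext(old)[0]
        # Skip subdirectories and names too short to trim safely
        if not os.path.isfile(src) or len(stem) <= n + m:
            continue
        new = trimmed_name(old, n, m)
        os.rename(src, os.path.join(directory, new))
        renamed.append((old, new))
    return renamed

print(trimmed_name("DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml"))
# CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
```

It is probably worth printing the pairs from trimmed_name for a few files first, before letting rename_all touch the real directory.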

It's not pretty, but I hope this works for you.
If you know that the files you want to rename all have the same length, you can try:
>>> file = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
>>> ext = file[51:]
>>> newFile = file[4:38] + ext
When you print newFile you now have:
>>> print(newFile)
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml

adding extra information to filename - python

I use the following line to name my file; it adds a timestamp and replaces any spaces with dashes (-). Now I would like to add extra information, such as a label, to the name.
filename = ("%s_%s.mp4" %(pfile, time.strftime("%Y-%m-%d_%H:%M:%S",time.localtime()))).replace(" ", "-")
The current output looks like
testfile_2016-07-25_12:17:14.mp4
I'm looking to have the file output as
testfile_2016-07-25_12:17:14-MediaFile.mp4
I tried the following:
filename = ("%s_%s_%s.mp4" %(pfile, time.strftime("%Y-%m-%d_%H:%M:%S","Mediafile",time.localtime()))).replace(" ", "-")
What did I miss here?

You're using the function strftime incorrectly. time.strftime takes at most two arguments (a format string and an optional time tuple), and you're passing it three.
You would need to generate the string from the time and apply some string operations to append the extra info.
If you want to add MediaFile to the end of the filename simply do something like this.
filename = ("%s_%s-MediaFile.mp4" %(pfile, time.strftime("%Y-%m-%d_%H:%M:%S",time.localtime()))).replace(" ", "-")

filename = ("%s_%s-%s.mp4" %(pfile, time.strftime("%Y-%m-%d_%H:%M:%S",time.localtime()), 'MediaFile')).replace(' ', '-')
# 'testfile_2016-07-25_10:29:28-MediaFile.mp4'
To understand better how this works and slightly improve readability, you can define your time stamp in a separate variable:
timestr = time.strftime("%Y-%m-%d_%H:%M:%S", time.localtime()) # 2016-07-25_10:31:03
filename = ("%s_%s-%s.mp4" %(pfile, timestr, 'MediaFile')).replace(' ', '-')
# 'testfile_2016-07-25_10:31:03-MediaFile.mp4'
or
filename = ("%s_%s-MediaFile.mp4" %(pfile, timestr)).replace(' ', '-')
For completeness, you can also use the format() method:
filename = '{0}_{1}-MediaFile.mp4'.format(pfile, timestr).replace(' ', '-')
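Wrapped into a small function, the same idea might look like the sketch below. The name stamped_name and its when parameter are made up for illustration; passing a fixed time (here the Unix epoch in UTC) just makes the printed result reproducible.

```python
import time

def stamped_name(pfile, label, when=None):
    """Build '<pfile>_<timestamp>-<label>.mp4' with spaces replaced by dashes."""
    if when is None:
        when = time.localtime()
    timestr = time.strftime("%Y-%m-%d_%H:%M:%S", when)
    return "{0}_{1}-{2}.mp4".format(pfile, timestr, label).replace(" ", "-")

# A fixed time tuple instead of "now", so the output is always the same
print(stamped_name("test file", "MediaFile", time.gmtime(0)))
# test-file_1970-01-01_00:00:00-MediaFile.mp4
```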

What you are looking for should be :
filename = ("%s_%s_%s.mp4" %(pfile, time.strftime("%Y-%m-%d_%H:%M:%S",time.localtime()),"Mediafile")).replace(" ", "-")
In your original code, the 'Mediafile' string was not in the right place: you passed it as an argument to strftime(), when it should be one of the values substituted into the format string, in the second level of parentheses.

Two regex functions together do not work

I am trying to get the index for the start of a tag and the end of another tag. However, when I use one regex it works absolutely fine but for two regex functions, it gives an error for the second one.
Kindly help in explaining the reason
The below code works fine:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
opentag = re.search('<TEXT>',f.read())
begin = opentag.start()+6
print begin
But when I add another similar regex it gives me the error
AttributeError: 'NoneType' object has no attribute 'start'
which I understand is due to the start() function returning None
Below is the code:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
opentag = re.search('<TEXT>',f.read())
begin = opentag.start()+6
print begin
closetag = re.search('</TEXT>',f.read())
end = closetag.start() - 1
print end
Please provide a solution to how can I get this working. Also I am a newbie here so please don't mind if I ask more questions on the solution.

f.read() reads the whole file, which leaves the file position at the end of the file, so there is nothing left to read when you call f.read() the next time.
If you need to search on the same text again, save the output of f.read(), and then do a regular expression search on it as below:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
text = f.read()
opentag = re.search('<TEXT>',text)
begin = opentag.start()+6
print begin
closetag = re.search('</TEXT>',text)
end = closetag.start() - 1
print end

f.read() reads the whole file. So there's nothing left to read on the second f.read() call.
See https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
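The behaviour is easy to reproduce without a real file. In this Python 3 sketch, io.StringIO stands in for the opened try.xml and its content is invented; it also shows that f.seek(0) rewinds the position if you really do need to read twice.

```python
import io
import re

# In-memory stand-in for the opened XML file (content is made up)
f = io.StringIO("<DOC><TEXT>hello</TEXT></DOC>")

first = f.read()
second = f.read()      # the position is already at the end
print(repr(second))    # ''

f.seek(0)              # rewind to read the same content again
again = f.read()
print(again == first)  # True

opentag = re.search(r"<TEXT>", first)
closetag = re.search(r"</TEXT>", first)
if opentag and closetag:  # re.search returns None when there is no match
    print(first[opentag.end():closetag.start()])  # hello
```

Checking the match objects before calling start() or end() avoids the AttributeError from the question when a tag is missing.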

First of all, once f.read() has read the file, the position is at EOF, so calling f.read() again gives you an empty string ''. Secondly, it is good practice to put r before the string passed as the pattern to re.search: it marks a raw string literal, in which Python leaves backslashes untouched instead of interpreting them as escape sequences (it does not escape regex special characters for you). So you can do something like this:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
data = f.read()
opentag = re.search(r'<TEXT>',data)
begin = opentag.start()+6
print begin
closetag = re.search(r'</TEXT>',data)
end = closetag.start() - 1
print end
gl & hf with Python :)
