Python rstrip() (for tabs) not working as expected - python

I was trying out the rstrip() function, but it doesn't work as expected.
For example, if I run this:
lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
for line in lines:
    line.rstrip('\t')
print(lines)
It returns
['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
whereas I want it to return:
['tra\tla\tla\n', 'tri\tli\tli\n', 'tro\tlo\tlo\n']
What is the problem here?

The function returns the new, stripped string, but you discard that return value.
Use a list comprehension to replace the whole lines list. Each string ends in '\n', so rstrip('\t') on its own finds nothing to strip: it stops at the first trailing character that is not a tab. Slice the newline off first, strip the tabs, then add the newline back:
lines = [line[:-1].rstrip('\t') + '\n' for line in lines]
Demo:
>>> lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t\n']
>>> [line[:-1].rstrip('\t') + '\n' for line in lines]
['tra\tla\tla\n', 'tri\tli\tli\n', 'tro\tlo\tlo\n']
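A slightly more defensive variant, as a sketch: the slice line[:-1] assumes every line ends with a newline, which may not hold for the last line of a file. The helper name rstrip_tabs below is made up for illustration; it only puts the newline back when one was actually there.

```python
def rstrip_tabs(line):
    """Remove trailing tabs, keeping a final newline if the line has one."""
    if line.endswith('\n'):
        return line[:-1].rstrip('\t') + '\n'
    return line.rstrip('\t')


# The last sample line deliberately has no trailing newline.
lines = ['tra\tla\tla\t\t\t\n', 'tri\tli\tli\t\t\t\n', 'tro\tlo\tlo\t\t\t']
cleaned = [rstrip_tabs(line) for line in lines]
print(cleaned)
```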

Related

Removes white spaces while reading in a file

with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
Can anyone break down the line where whitespaces get removed?
I understand line.strip().split() first removes leading and trailing spaces from line then the resulting string gets split on whitespaces and stores all words in a list.
But what does the remaining code do?
The line ' '.join(line.strip().split()) creates a string consisting of all the list elements separated by exactly one whitespace character. Applying split() method on this string again returns a list containing all the words in the string which were separated by a whitespace character.
Here's a breakdown:
# Opens the file
with open(filename, "r") as f:
    # Iterates through each line
    for line in f:
        # Rewriting this line, below:
        # line = (' '.join(line.strip().split())).split()
        # Assuming line was " foo bar quux "
        stripped_line = line.strip()   # "foo bar quux"
        parts = stripped_line.split()  # ["foo", "bar", "quux"]
        joined = ' '.join(parts)       # "foo bar quux"
        parts_again = joined.split()   # ["foo", "bar", "quux"]
Is this what you were looking for?
That code is pointlessly complicated is what it is.
There is no need to strip if you call split with no arguments next (a no-argument split already drops leading and trailing whitespace), so line.strip().split() can simplify to line.split().
The join and re-split don't change a thing: join sticks the pieces from the first split back together with single spaces, then split re-splits on those very same spaces. So you could save the time spent joining only to split again and just keep the results of the first split, changing it to:
line = line.split()
and it would be functionally identical to the original:
line = (' '.join(line.strip().split())).split()
and faster to boot. I'm guessing the code you were handed was written by someone who didn't understand splitting and joining either, and just threw stuff at their problem without understanding what it did.
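To check the equivalence claim rather than take it on faith, here is a quick sketch that runs both forms over a few made-up sample lines (the samples are my own, not from the question):

```python
# Sample lines with mixed spaces, tabs, an all-whitespace line and an empty string.
samples = ["  foo   bar \t quux  \n", "one\ttwo\n", "   \n", ""]

for line in samples:
    original = (' '.join(line.strip().split())).split()
    simplified = line.split()
    # Both forms should produce the same list of words.
    assert original == simplified

results = [line.split() for line in samples]
print(results)
```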
Here is an explanation of the code:
with open(filename, "r") as f:
    for line in f:
        line = (' '.join(line.strip().split())).split()
First, line.strip() removes leading and trailing whitespace from line, and .split() breaks the result into a list on runs of whitespace.
Then .join converts that list back into a single string with the items separated by one space each. Finally, .split converts it into a list again.
The line line = (' '.join(line.strip().split())).split() is superfluous. It should be:
line = line.split()
If you still want to strip each item explicitly (redundant here, since the items from a no-argument split contain no whitespace), use:
line = [word.strip() for word in line.split()]
I think they are doing this to maintain a constant amount of whitespace. The split removes every run of whitespace (could be 5 spaces and a tab), and the join then adds back a single space in its place.

Removing newline characters in a txt files

I'm working through the Project Euler problems and am at problem #8. I wanted to copy the huge 1000-digit number into a numberToProblem8.txt file and read it into my script, but I can't find a good way to remove the newlines from it. This is my code:
hugeNumberAsStr = ''
with open('numberToProblem8.txt') as f:
    for line in f:
        aSingleLine = line.strip()
        hugeNumberAsStr.join(aSingleLine)
print(hugeNumberAsStr)
I'm using print() only to check whether it works, and it doesn't: it prints nothing. What's wrong with my code? I remove all the trash with strip() and then use join() to add the cleaned line to hugeNumberAsStr (I need a string to join those lines and will use int() later on), and this is repeated for every line.
Here is the .txt file with a number in it.
What about something like:
hugeNumberAsStr = open('numberToProblem8.txt').read()
hugeNumberAsStr = hugeNumberAsStr.strip().replace('\n', '')
Or even:
hugeNumberAsStr = ''.join([d for d in hugeNumberAsStr if d.isdigit()])
I was able to simplify it to the following to get the number from that file:
>>> int(open('numberToProblem8.txt').read().replace('\n',''))
731671765313306249192251196744265747423553491949349698352031277450632623957831801698480186947885184385861560789112949495459501737958331952853208805511125406987471585238630507156932909632952274430435576689664895044524452316173185640309871112172238311362229893423380308135336276614282806444486645238749303589072962904915604407723907138105158593079608667017242712188399879790879227492190169972088809377665727333001053367881220235421809751254540594752243525849077116705560136048395864467063244157221553975369781797784617406495514929086256932197846862248283972241375657056057490261407972968652414535100474821663704844031998900088952434506585412275886668811642717147992444292823086346567481391912316282458617866458359124566529476545682848912883142607690042242190226710556263211111093705442175069416589604080719840385096245544
You need to do hugeNumberAsStr += aSingleLine instead of hugeNumberAsStr.join(..)
str.join() joins the items of the iterable you pass in, using the string it is called on as the separator, and returns the result. It doesn't update hugeNumberAsStr as you expect, because strings are immutable. To build a new string without the \n characters, you have to store the result yourself by appending each cleaned line to the string.
The join method for strings simply takes an iterable object and concatenates each part together. It then returns the resulting concatenated string. As stated in help(str.join):
join(...)
    S.join(iterable) -> str

    Return a string which is the concatenation of the strings in the
    iterable. The separator between elements is S.
Thus the join method really does not do what you want.
The concatenation line should be more like:
hugeNumberAsStr += aSingleLine
Or even:
hugeNumberAsStr += line.strip()
which gets rid of the extra line of code doing the strip.
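Putting the fix together, as a sketch: the temporary file and its three short lines below are stand-ins I made up so the example runs on its own; they are not the real 1000-digit number. It also shows what join actually does, and how join can do the whole job when it is given all the lines at once.

```python
import os
import tempfile

# str.join returns a new string; the string it is called on is the separator.
dashed = '-'.join('123')   # '1-2-3'
plain = ''.join('123')     # '123'

# Write a small stand-in file (hypothetical contents, not the Euler number).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('73167\n17653\n13306\n')
    path = tmp.name

# Fixed version of the loop: append each stripped line to the string.
hugeNumberAsStr = ''
with open(path) as f:
    for line in f:
        hugeNumberAsStr += line.strip()

# Alternative: hand join every stripped line in one go.
with open(path) as f:
    joined = ''.join(line.strip() for line in f)

os.remove(path)
print(hugeNumberAsStr)
print(int(hugeNumberAsStr))
```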

Replace part of a matched string in python

I have the following matched strings:
punctacros="Tasla"_TONTA
punctacros="Tasla"_SONTA
punctacros="Tasla"_JONTA
punctacros="Tasla"_BONTA
I want to replace only a part (before the underscore) of the matched strings, and the rest of it should remain the same in each original string.
The result should look like this:
TROGA_TONTA
TROGA_SONTA
TROGA_JONTA
TROGA_BONTA
Edit:
This should work:
from re import sub
with open("/path/to/file") as myfile:
    lines = []
    for line in myfile:
        line = sub('punctacros="Tasla"(_.*)', r'TROGA\1', line)
        lines.append(line)
with open("/path/to/file", "w") as myfile:
    myfile.writelines(lines)
Result:
TROGA_TONTA
TROGA_SONTA
TROGA_JONTA
TROGA_BONTA
Note however, if your file is exactly like the sample given, you can replace the re.sub line with this:
line = "TROGA_"+line.split("_", 1)[1]
eliminating the need of Regex altogether. I didn't do this though because you seem to want a Regex solution.
mystring = mystring.replace('punctacros="Tasla"', 'TROGA')
where mystring is a string containing those four lines. replace returns a new string with the values replaced, so assign the result.
If you want to replace everything before the first underscore, try this:
#! /usr/bin/python3
data = ['punctacros="Tasla"_TONTA',
        'punctacros="Tasla"_SONTA',
        'punctacros="Tasla"_JONTA',
        'punctacros="Tasla"_BONTA',
        'somethingelse!="Tucku"_CONTA']
for s in data:
    print('TROGA' + s[s.find('_'):])
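One caveat with the find-based slice: when a string has no underscore, s.find('_') returns -1 and s[-1:] keeps only the last character. A sketch of two alternatives that leave such strings untouched follows; the helper name replace_prefix and the third sample string are my own, for illustration only.

```python
import re

data = ['punctacros="Tasla"_TONTA',
        'punctacros="Tasla"_SONTA',
        'no-underscore-here']


def replace_prefix(s, new_prefix='TROGA'):
    """Replace everything before the first underscore; return s unchanged if there is none."""
    head, sep, tail = s.partition('_')
    if not sep:
        return s
    return new_prefix + sep + tail


via_partition = [replace_prefix(s) for s in data]

# Regex equivalent: everything from the start up to (not including) the first underscore.
via_regex = [re.sub(r'^[^_]*(?=_)', 'TROGA', s) for s in data]

print(via_partition)
```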

Stripping line endings before appending to a list?

I am writing a program that reads text files and goes through the lines one by one. The problem I have run into is the line endings (\n). My aim is to read the text file line by line, strip each line ending, and append the result to a list.
I have tried this:
thelist = []
inputfile = open('text.txt','rU')
for line in inputfile:
    line.rstrip()
    thelist.append(line)
Strings are immutable in Python. All string methods return new strings, and don't modify the original one, so the line
line.rstrip()
effectively does nothing. You can use a list comprehension to accomplish this:
with open("text.txt", "rU") as f:
    lines = [line.rstrip("\n") for line in f]
Also note that it is strongly recommended to use the with statement to open (and implicitly close) files.
with open('text.txt', 'rU') as f:  # Use with block to close file on block exit
    thelist = [line.rstrip() for line in f]
rstrip doesn't change the string it is called on; it returns a modified copy. That's why you must write it like this:
thelist.append(line.rstrip())
But you can write your code more simply:
with open('text.txt', 'rU') as inputfile:
    thelist = [x.rstrip() for x in inputfile]
Use rstrip('\n') on each line before appending to your list.
I think you need something like this.
s = s.strip(' \t\n\r')
This will strip whitespace from both the beginning and the end of your string.
In Python, strings are immutable, which means that operations return a new string and don't modify the existing one. That is, you've nearly got it right, but you need to re-assign (or name a new variable) using line = line.rstrip().
rstrip returns a new string. It should be line = line.rstrip(). However, the whole code could be shorter:
thelist = list(map(str.rstrip, open('text.txt','rU')))
UPD: Note that just calling rstrip() trims all trailing whitespace, not just newline. But there is a concise way to do that too:
thelist = open('text.txt','rU').read().splitlines()
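To make the difference between these options concrete, here is a small sketch. io.StringIO stands in for the opened file, and the sample text is made up. Note also that the 'U' mode flag used in the answers above was, as far as I know, removed in Python 3.11, where plain 'r' already handles universal newlines.

```python
import io

# Stand-in for the contents of text.txt: trailing spaces, a trailing tab,
# an empty line, and a last line without a newline.
text = "alpha  \nbeta\t\n\ngamma"

# rstrip() with no argument removes all trailing whitespace.
stripped_all = [line.rstrip() for line in io.StringIO(text)]

# rstrip('\n') removes only the line ending.
stripped_newline = [line.rstrip('\n') for line in io.StringIO(text)]

# splitlines() also removes only the line endings.
via_splitlines = text.splitlines()

print(stripped_all)
print(stripped_newline)
```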

Need to reverse File but last line has no Carriage Return, so first item of Reversed List has 2 lines in it

Here is the file
303620.43,6187793.62
303663.61,6187757.08
303652.22,6187702.51
303580.10,6187685.43
303551.63,6187737.15
303574.88,6187775.11
303610.94,6187773.69
When it is reversed I get
303610.94,6187773.69303574.88,6187775.11
303551.63,6187737.15
303580.10,6187685.43
303652.22,6187702.51
303663.61,6187757.08
303620.43,6187793.62
How do I ensure that the Last line when reversed has a '\n' ?
Use rstrip to remove the newline (and other trailing whitespace) off all lines, then rely on print to put it back in.
a = [ln.rstrip() for ln in open('datafile.txt')]
a.reverse()
for ln in a:
    print(ln)
In your code, after you've read a line, check that it ends in '\n'. If it doesn't, append a '\n' and carry on as you're doing already.
The endswith method will probably come in handy.
+1 for larsmans' response. Another solution is to use the splitlines method of string objects:
myFile = open(filePath, 'r')
lines = myFile.read().splitlines()  # splitlines() drops the line endings by default
myFile.close()
lines.reverse()
for line in lines:
    print(line)
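If the reversed lines need to go back into a file or a single string rather than being printed one by one, a sketch along the same lines: strip every line ending, reverse, then add exactly one '\n' per line. io.StringIO stands in for the data file here, using the first three coordinates from the question with the last line left unterminated.

```python
import io

# Stand-in for the data file; the last line has no trailing '\n'.
source = io.StringIO(
    "303620.43,6187793.62\n"
    "303663.61,6187757.08\n"
    "303652.22,6187702.51"
)

# Normalise first: no line keeps its ending, whether it had one or not.
lines = [ln.rstrip('\n') for ln in source]
lines.reverse()

# Re-attach a newline to every line, including the new last one.
reversed_text = '\n'.join(lines) + '\n'
print(reversed_text, end='')
```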
