Problem with reading a file in Python

I have a file: Alus.txt
File content: (each name in new line)
Margus
Mihkel
Daniel
Mark
Juri
Victor
Marek
Nikolai
Pavel
Kalle
Problem: when my program reads this file, there is a \n after each name (['Margus\n', 'Mihkel\n', 'Daniel\n', 'Mark\n', 'Juri\n', 'Victor\n', 'Marek\n', 'Nikolai\n', 'Pavel\n', 'Kalle']). How can I remove the \n and get a list of names? What am I doing wrong? Thank you.
alus = []
file = open('alus.txt', 'r')
while True:
    rida = file.readline()
    if (rida == ''):
        break
    else:
        alus.append(rida)

You can remove the linebreaks with rstrip:
alus = []
with open('alus.txt', 'r') as f:
    for rida in f:
        rida = rida.rstrip()
        if rida:
            alus.append(rida)
        else:
            break
By the way, the usual way to test if a string is empty is
if not rida:
rather than
if (rida == ''):
And if you have an if...else block, you should consider the non-negated form:
if rida:
since it is usually easier to read and understand.
Edit: My previous comment about removing break was wrong. (I was mistaking break with continue.) Since break stops the loop, it needs to be kept to preserve the behavior of your original code.
Edit 2: A.L. Flanagan rightly points out that rstrip removes all trailing whitespace, not just the ending newline character(s). If you'd like to remove the newline characters only, you could use A.L. Flanagan's method, or list the characters you wish to remove as an argument to rstrip:
rida = rida.rstrip('\r\n')
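Why the argument to rstrip should be an ordinary string literal: rstrip treats its argument as a set of characters, and in the raw literal r'\r\n' the backslashes are kept, so the set is backslash, 'r' and 'n' rather than carriage return and line feed. A small sketch with made-up sample strings (the name "Stephen" is just an illustration):

```python
# rstrip() treats its argument as a *set* of characters to remove, not a suffix.
normal = "\r\n"   # two characters: carriage return and line feed
raw = r"\r\n"     # four characters: '\\', 'r', '\\', 'n'

print(len(normal), len(raw))             # 2 4

# With the normal string, only the line ending is removed.
print(repr("Kalle\r\n".rstrip(normal)))  # 'Kalle'

# With the raw string, the newline stays and trailing 'n'/'r' letters are eaten.
print(repr("Kalle\n".rstrip(raw)))       # 'Kalle\n'
print(repr("Stephen".rstrip(raw)))       # 'Stephe'
```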

alus = [line.rstrip() for line in open('alus.txt', 'r')]

open('alus.txt').read().splitlines()

One possible problem with rstrip() is that it will remove any whitespace. If you want to preserve whitespace, you can use slices:
if line.endswith('\n'):
    line = line[:-1]
If you could be sure all the lines end with '\n', you could speed it up by removing the if. However, in general, you can't be sure the last line in a text file has a newline.
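A side-by-side sketch of the approaches in these answers, run on an in-memory sample so no file is needed. io.StringIO stands in for the open file, and the sample text and the chomp helper are illustrative assumptions, not code from the answers. chomp also handles "\r\n", which only matters if the file was opened with newline='' (Python 3 text mode normally translates "\r\n" to "\n").

```python
import io

# Hypothetical file contents: trailing spaces after "Mark" and no newline
# after the last name, as in the question's output.
sample = "Margus\nMark  \nKalle"

def chomp(line):
    """Remove one trailing line ending ('\\r\\n' or '\\n') if present."""
    if line.endswith("\r\n"):
        return line[:-2]
    if line.endswith("\n"):
        return line[:-1]
    return line  # last line may have no newline at all

with io.StringIO(sample) as f:
    lines = list(f)  # ['Margus\n', 'Mark  \n', 'Kalle']

print([line.rstrip() for line in lines])      # ['Margus', 'Mark', 'Kalle']  (all trailing whitespace)
print([line.rstrip("\n") for line in lines])  # ['Margus', 'Mark  ', 'Kalle']
print([chomp(line) for line in lines])        # ['Margus', 'Mark  ', 'Kalle']
print(sample.splitlines())                    # ['Margus', 'Mark  ', 'Kalle']
```

rstrip() is the shortest but also drops meaningful trailing spaces; the other three keep them.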

Related

How to make everything in a string lowercase

I am trying to write a function that will print a poem reading the words backwards and make all the characters lower case. I have looked around and found that .lower() should make everything in the string lowercase; however I cannot seem to make it work with my function. I don't know if I'm putting it in the wrong spot or if .lower() will not work in my code. Any feedback is appreciated!
Below is my code before entering .lower() anywhere into it:
def readingWordsBackwards( poemFileName ):
    inputFile = open(poemFileName, 'r')
    poemTitle = inputFile.readline().strip()
    poemAuthor = inputFile.readline().strip()
    inputFile.readline()
    print ("\t You have to write the readingWordsBackwards function \n")
    lines = []
    for line in inputFile:
        lines.append(line)
    lines.reverse()
    for i, line in enumerate(lines):
        # remove_punctuation is a helper defined elsewhere (not shown in the question)
        reversed_line = remove_punctuation(line).strip().split(" ")
        reversed_line.reverse()
        print(len(lines) - i, " ".join(reversed_line))
    inputFile.close()
As per the official documentation,
str.lower()
Return a copy of the string with all the cased characters converted to lowercase.
So you could use it at several different places, e.g.
lines.append(line.lower())
reversed_line = remove_punctuation(line).strip().lower().split(" ")
or
print(len(lines) - i, " ".join(reversed_line).lower())
(this would not store the result, but only print it, so it is likely not what you want).
Note that, depending on the language of the source text, lowercasing may need a little caution.
See also the other answers to "How to convert string to lowercase in Python".
I think changing the second to last line to this may work
print(len(lines) - i, " ".join(reversed_line).lower())
You could probably insert it here, for instance:
lines.append(line.lower())
Note that line.lower() does not do anything to line itself (strings are immutable!), but returns a new string object. To make line hold that lowercase string, you'd do:
line = line.lower()
Read a line from the file into a variable, then reassign the variable to its .lower() result, like so:
fileContents = inputFile.readline()
fileContents = fileContents.lower()
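Putting these suggestions together, a minimal sketch of the reverse-and-lowercase step, assuming the poem's body lines are already read into a list. The question does not show remove_punctuation, so a stand-in based on str.translate is used here (hypothetical, not the asker's helper), and split() without an argument replaces split(" ") so that runs of spaces don't produce empty words. The function returns (line number, text) pairs instead of printing, which makes it easy to test.

```python
import string

# Hypothetical stand-in for the asker's remove_punctuation helper:
# it deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuation(text):
    return text.translate(_PUNCT_TABLE)

def reversed_lowercase_lines(poem_lines):
    """Return the lines last-to-first, each with its words reversed and lowercased."""
    lines = list(poem_lines)
    lines.reverse()
    result = []
    for i, line in enumerate(lines):
        # lower() returns a new string, so its result must be used (here, chained).
        words = remove_punctuation(line).strip().lower().split()
        words.reverse()
        result.append((len(lines) - i, " ".join(words)))
    return result

for number, text in reversed_lowercase_lines(["Roses are Red,\n", "Violets are Blue!\n"]):
    print(number, text)
# 2 blue are violets
# 1 red are roses
```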

How to extract last line of text in Python (excluding new lines)?

Textfile:
1
2
3
4
5
6
\n
\n
I know lines[-1] gets you the last line, but I want to disregard any new lines and get the last line of text (6 in this case).
In terms of memory, the best approach is to iterate through the whole file one line at a time, remembering the last non-empty line. Something like this:
with open('file.txt') as f:
    last = None
    for line in (line for line in f if line.rstrip('\n')):
        last = line
print(last)
It can be done more elegantly though. A slightly different approach:
with open('file.txt') as f:
    last = None
    for last in (line for line in f if line.rstrip('\n')):
        pass
print(last)
For a small file you can just read all of the lines, discarding any empty ones. Notice that I've used an inner generator to strip the lines before excluding them in the outer one.
with open(textfile) as fp:
    last_line = [l2 for l2 in (l1.strip() for l1 in fp) if l2][-1]
with open('file') as f:
    print([i for i in f.read().split('\n') if i != ''][-1])
This is just an edit to Avinash Raj's answer (but since I'm a new account, I can't comment on it). This will preserve any None values in your data (i.e. if the data in your last line is "None" it will work, though depending on your input this may not be an issue).
with open('path/to/file') as infile:
    for line in infile:
        if not line.strip('\n'):
            continue
        answer = line
print(answer)
This will print 6 with a newline at the end. You can decide how to strip that. Following are some options:
answer.rstrip('\n') removes trailing newlines
answer.rstrip() removes trailing whitespaces
answer.strip() removes any surrounding whitespaces
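with open('file.txt') as myfile:
    for num, line in enumerate(myfile):
        pass
# num is now the 0-based index of the last line (blank lines included);
# line holds that line's text
print(num)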
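A sketch of the line-by-line approach from the first answer, wrapped in a function (last_nonblank_line is a hypothetical name) and run on io.StringIO so it can be tried without a file. Unlike rstrip('\n'), it uses strip() to decide blankness, so lines containing only spaces are skipped too; that is a design choice, not something the question requires.

```python
import io

def last_nonblank_line(f):
    """Return the last non-blank line of f without its newline, or None if there is none."""
    last = None
    for line in f:
        # One line at a time, so memory use does not grow with file size.
        if line.strip():
            last = line
    return None if last is None else last.rstrip("\n")

sample = io.StringIO("1\n2\n3\n4\n5\n6\n\n\n")
print(last_nonblank_line(sample))  # 6
```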

\n appending at the end of each line

I am writing lines one by one to an external file. Each line has 9 columns separated by a tab delimiter. If I split each line in that file and output the last column, I can see \n appended to the end of the 9th column. My code is:
#!/usr/bin/python
with open("temp", "r") as f:
    for lines in f:
        hashes = lines.split("\t")
        print(hashes[8])
The last column values are integers, either 1 or 2. When I run this program, the output I get is:
['1\n']
['2\n']
I should only get 1 or 2. Why is '\n' being appended here?
I tried the following check to remove the problem.
with open("temp", "r") as f:
for lines in f:
if lines != '\n':
hashes = lines.split("\t")
print hashes[8]
This is not working either. I also tried if lines != ' '. How can I make this go away? Thanks in advance.
Try using strip on the lines to remove the \n (the newline character). strip removes leading and trailing whitespace characters and returns a new string, so split the stripped result:
with open("temp", "r") as f:
    for lines in f.readlines():
        stripped = lines.strip()
        if stripped:
            hashes = stripped.split("\t")
            print(hashes[8])
\n is the newline character; it is how the computer knows to display the following data on the next line. If you remove the last character from the last item of the list, hashes[-1], that should be fine.
Depending on the platform, your line ending may be more than one character; DOS/Windows uses "\r\n", for example.
def clean(file_handle):
    for line in file_handle:
        yield line.rstrip()

with open('temp', 'r') as f:
    for line in clean(f):
        hashes = line.split('\t')
        print(hashes[-1])
I prefer rstrip() when I want to preserve leading whitespace, and I like using generator functions to clean up my input.
Because each line has 9 columns, the item at index 8 (the 9th column) ends with the line break that separates it from the next line. Just slice it off:
print(hashes[8][:-1])
This assumes every line, including the last one, ends with '\n'.
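Another option, not used in the answers above, is the standard csv module with delimiter="\t": the reader consumes the line ending itself, so no field ends in '\n', and blank lines come back as empty lists. A minimal sketch, with a hypothetical two-row sample standing in for the "temp" file; if the data may contain double quotes, passing quoting=csv.QUOTE_NONE is probably wise.

```python
import csv
import io

# Hypothetical nine-column, tab-separated sample with a trailing blank line.
sample = "a\tb\tc\td\te\tf\tg\th\t1\na\tb\tc\td\te\tf\tg\th\t2\n\n"

def last_columns(f):
    """Yield the 9th column of each non-empty tab-separated row."""
    for row in csv.reader(f, delimiter="\t"):
        if row:              # csv.reader yields [] for a blank line
            yield row[8]     # the line ending is not part of the field

print(list(last_columns(io.StringIO(sample))))  # ['1', '2']
```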

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
The problem I am getting is that the output looks like this:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open = open('sample.txt', 'r')
file_read = file_open.read()
file_open2 = open('output.txt', 'w+')
counter = 0
for i in file_read:
    if '!' in i:
        if counter == 1:
            file_open2.write('\n')
            counter = counter - 1
        counter = counter + 1
    file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
data=f.read().replace("\r\n","") #replace the newlines with ""
#the newline can be "\n" in your system instead of "\r\n"
ans=filter(None,data.split("!")) #split the data at '!', then filter out empty lines
for x in ans:
print "!"+x #or write to some other file
.....:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = [part for part in file_read.split('!') if part]
Now lines is a list which holds the split data (the comprehension drops the empty string that split produces before the leading '!'). These are almost the lines you want to write; the only differences are that they don't have trailing newlines and they don't have '!' at the start. We can add those easily with string formatting, e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression, which we pass to file.writelines to write the data to a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output.
One other point: when opening files, it's nice to use a context manager. This makes sure that the files are closed properly:
with open('inputfile') as fin:
    lines = fin.read().split('!')
with open('outputfile', 'w') as fout:
    fout.writelines('!{0}\n'.format(line.replace('\n', '')) for line in lines if line)
Another option, using replace instead of split, since you know the starting and ending characters of each line:
In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
In [15]: print(data.replace('+0013!', "+0013\n!"))
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re

outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
    for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
        outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
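The same regular expression can be tried on an in-memory string, which makes it easy to check what it matches. A minimal sketch; the shortened sample records and the split_records name are made up for illustration:

```python
import re

# Hypothetical shortened input: three records, wrapped across lines like the question's data.
data = "!,A,1,+0013!,A,2,+00\n13!,A,3,37N22."

def split_records(text):
    """Return every '!'-started record with the wrapping newlines removed."""
    # Lazy match from '!' up to (not including) the next '!' or the end of the text;
    # re.DOTALL lets '.' also match the embedded newlines.
    return [m.replace("\n", "") for m in re.findall(r"!.+?(?=!|$)", text, re.DOTALL)]

print(split_records(data))  # ['!,A,1,+0013', '!,A,2,+0013', '!,A,3,37N22.']
```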
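Let's remove the existing line breaks, add a \n before every "!", and then let Python's splitlines do the rest :-) (the [1:] drops the empty string produced by the leading "!"):
file_read.replace("\n", "").replace("!", "\n!").splitlines()[1:]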
I would implement this as a generator so that you can work on the data stream rather than the entire content of the file. This is quite memory-friendly when working with huge files. (The yielded parts do not include the '!' separator, as the output shows.)
def split_on_stream(it, sep="!"):
    prev = ""
    for line in it:
        line = (prev + line.strip()).split(sep)
        for parts in line[:-1]:
            if parts:  # skip the empty piece before the leading '!'
                yield parts
        prev = line[-1]
    yield prev

with open("test.txt") as fin:
    for parts in split_on_stream(fin):
        print(parts)
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.
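A variation on the generator above, sketched under the assumption that the data always starts with '!': it puts the separator back in front of each record, strips only the line endings (so spaces inside a record survive), and skips the empty piece before the first '!'. The records name and the shortened sample are hypothetical.

```python
import io

def records(lines, sep="!"):
    """Yield sep-delimited records from an iterable of wrapped lines, with sep kept in front."""
    buffer = ""  # holds only the current, still-incomplete record
    for line in lines:
        buffer += line.rstrip("\r\n")
        *complete, buffer = buffer.split(sep)
        for part in complete:
            if part:  # skip the empty piece before the first separator
                yield sep + part
    if buffer:
        yield sep + buffer

sample = io.StringIO("!,A,1,+0013!,A,2,+00\n13!,A,3,37N\n22.")
for record in records(sample):
    print(record)
# !,A,1,+0013
# !,A,2,+0013
# !,A,3,37N22.
```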

Stripping line endings before appending to a list?

OK, I am writing a program that reads text files and goes through the lines. The problem I have encountered, however, is line endings (\n). My aim is to read the text file line by line, write each line to a list, and remove the line endings before they are appended to the list.
I have tried this:
thelist = []
inputfile = open('text.txt','rU')
for line in inputfile:
    line.rstrip()
    thelist.append(line)
Strings are immutable in Python. All string methods return new strings, and don't modify the original one, so the line
line.rstrip()
effectively does nothing. You can use a list comprehension to accomplish this:
with open("text.txt", "rU") as f:
lines = [line.rstrip("\n") for line in f]
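Also note that it is strongly recommended to use the with statement to open (and implicitly close) files. The 'U' mode flag is unnecessary in Python 3, where text mode already uses universal newlines, and it was removed in Python 3.11, so plain 'r' is enough.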
with open('text.txt', 'r') as f:  # Use with block to close file on block exit
    thelist = [line.rstrip() for line in f]
rstrip doesn't change the string it is called on; it returns a modified copy. That's why you must write:
thelist.append(line.rstrip())
But you can write your code more simply:
with open('text.txt', 'r') as inputfile:
    thelist = [x.rstrip() for x in inputfile]
Use rstrip('\n') on each line before appending to your list.
I think you need something like this:
s = s.strip(' \t\n\r')
This will strip whitespace from both the beginning and the end of your string.
In Python, strings are immutable, which means that operations return a new string and don't modify the existing one. In other words, you've picked the right method, but you need to reassign the result (or bind it to a new variable), e.g. line = line.rstrip().
rstrip returns a new string, so it should be line = line.rstrip(). However, the whole code could be shorter:
thelist = list(map(str.rstrip, open('text.txt', 'r')))
UPD: Note that calling rstrip() without arguments trims all trailing whitespace, not just the newline. There is a concise way to remove only the line endings too:
thelist = open('text.txt', 'r').read().splitlines()
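A minimal demonstration of the immutability point made in these answers, using io.StringIO in place of text.txt (read_stripped is a hypothetical helper name):

```python
import io

def read_stripped(f):
    """Return the lines of f with their trailing newline removed."""
    return [line.rstrip("\n") for line in f]  # keep the new strings rstrip returns

line = "text\n"
line.rstrip("\n")        # the result is thrown away...
print(repr(line))        # ...so line is still 'text\n'
line = line.rstrip("\n")
print(repr(line))        # 'text'

print(read_stripped(io.StringIO("first line\nsecond line\n")))  # ['first line', 'second line']
```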
