python split string but keep delimiter - python

In python I can easily read a file line by line into a set, just be using:
file = open("filename.txt", 'r')
content = set(file)
each of the elements in the set consists of the actual line and also the trailing line-break.
Now I have a string with multiple lines, which I want to compare to the content by using the normal set operations.
Is there any way of transforming a string into a set just the same way, such, that it also contains the line-breaks?
Edit:
The question "In Python, how do I split a string and keep the separators?" deals with a similar problem, but the answer doesn't make it easy to adopt to other use-cases.
import re
content = re.split("(\n)", string)
doesn't have the expected effect.

The str.splitlines() method does exactly what you want if you pass True as the optional keepends parameter. It keeps the newlines on the end of each line, and doesn't add one to the last line if there was no newline at the end of the string.
text = "foo\nbar\nbaz"
lines = text.splitlines(True)
print(lines) # prints ['foo\n', 'bar\n', 'baz']

Here's a simple generator that does the job:
content = set(e + "\n" for e in s.split("\n"))
This solution adds an additional newline at the end though.

you can also do it the other way round, remove line endings when reading file lines, assuming you open the file with U for universal line endings:
file = open("filename.txt", 'rU')
content = set(line.rstrip('\n') for line in file)

Could this be what you mean?
>>> from io import StringIO
>>> someLines=StringIO('''\
... line1
... line2
... line3
... ''')
>>> content=set(someLines)
>>> content
{'line1\n', 'line2\n', 'line3\n'}

Related

Send keylogger log files to e-mail [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one-line
with open('data.txt', 'r') as file:
data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when it exits as part of the garbage collection.
But other python implementations won't. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and have prefer to use use read in combination with rstrip. Without rstrip("\n"), Python adds a newline to the end of the string, which in most cases is not very useful.
with open("myfile.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
Here are four codes for you to choose one:
with open("my_text_file.txt", "r") as file:
data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
data = "".join([line for line in file])
you can compress this into one into two lines of code!!!
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
data = data + line.strip();
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
line = f.readline()
if not line:break
string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
List is faster than generator but heavier on memory. Generators are slower than lists and is lighter for memory like iterating over lines. In case of "".join(), I think both should work well. .join() function should be removed to get list or generator respectively.
Note: close() / closing of file descriptor probably not needed
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '' you ended up creating a list. If you have a variable of x and print it out just by
x
or print(x)
or str(x)
You will see the entire list with the brackets. If you call each element of the (array of sorts)
x[0]
then it omits the brackets. If you use the str() function you will see just the data and not the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
Is the best way to get all the lines of a file, the '\n' are already stripped by the splitlines() (which smartly recognize win/mac/unix lines types).
But if nonetheless you want to strip each lines:
line_lst = [line.strip() for line in txt = Path("to/the/file.txt").read_text().splitlines()]
strip() was just a useful exemple, but you can process your line as you please.
At the end, you just want concatenated text ?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
print word # Prints each word in file on a different line
Or:
print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
data=myfile.readline()
list=data.split(" ")
word=list[0]
This code will help you to read the first line and then using the list and split option you can convert the first line word separated by space to be stored in a list.
Than you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
str = '' #string declaration
for i in range(len(lines)):
str += lines[i].rstrip('\n') + ' '
print str
Try the following:
with open('data.txt', 'r') as myfile:
data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

reading in file python says its a string [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one-line
with open('data.txt', 'r') as file:
data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when it exits as part of the garbage collection.
But other python implementations won't. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and have prefer to use use read in combination with rstrip. Without rstrip("\n"), Python adds a newline to the end of the string, which in most cases is not very useful.
with open("myfile.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
Here are four codes for you to choose one:
with open("my_text_file.txt", "r") as file:
data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
data = "".join([line for line in file])
you can compress this into one into two lines of code!!!
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
data = data + line.strip();
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
line = f.readline()
if not line:break
string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
List is faster than generator but heavier on memory. Generators are slower than lists and is lighter for memory like iterating over lines. In case of "".join(), I think both should work well. .join() function should be removed to get list or generator respectively.
Note: close() / closing of file descriptor probably not needed
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '' you ended up creating a list. If you have a variable of x and print it out just by
x
or print(x)
or str(x)
You will see the entire list with the brackets. If you call each element of the (array of sorts)
x[0]
then it omits the brackets. If you use the str() function you will see just the data and not the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
Is the best way to get all the lines of a file, the '\n' are already stripped by the splitlines() (which smartly recognize win/mac/unix lines types).
But if nonetheless you want to strip each lines:
line_lst = [line.strip() for line in txt = Path("to/the/file.txt").read_text().splitlines()]
strip() was just a useful exemple, but you can process your line as you please.
At the end, you just want concatenated text ?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
print word # Prints each word in file on a different line
Or:
print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
data=myfile.readline()
list=data.split(" ")
word=list[0]
This code will help you to read the first line and then using the list and split option you can convert the first line word separated by space to be stored in a list.
Than you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
str = '' #string declaration
for i in range(len(lines)):
str += lines[i].rstrip('\n') + ' '
print str
Try the following:
with open('data.txt', 'r') as myfile:
data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

Python I/O not creating new line

I have the following code:
lines[usernum] = str(user) + "\n"
f = open('users.txt','w')
userstr = str(lines)
f.write(userstr)
Effectively I am modifying the lines list at usernum, then writing it back to the original file 'users.txt'. However, even adding the "\n", everything gets written to one line rather than an individual line for each line in lines. Why is this?
That is because str(lines) will not do what you expect it to do:
>>> lst = ['This', 'is a\n', 'fancy\n', 'list']
>>> print(str(lst))
['This', 'is a\n', 'fancy\n', 'list']
As you can see, it will write a representation of the list, and not print out the individual lines. You can use str.join to combine the list elements:
lines[usernum] = str(user)
with open('users.txt', 'w') as f:
f.write('\n'.join(lines))
Using '\n' as the join character will make sure that a newline is inserted between each list element, so you do not need to take care of adding it to every item manually.
Furthermore, when working with files, it is recommended to use the with statement when opening them, to make sure that they are closed correctly when you are done working with them.
You are writing the string representation of the list to the file.
String representations of Python containers use the representation of the contents, not directly write strings.
Use the writelines() function to write a sequence of strings:
f.writelines(lines)
or write each line separately:
for line in lines:
f.write(line)
or join the lines into one long string:
f.write(''.join(lines))
The latter also allows you to add the newlines at the time of writing, by using \n as the joining string:
f.write('\n'.join(lines))
where lines contains strings without newlines.
Use a for-loop and '\n' to write each item on a new line:
with open('users.txt','w') as f:
for line in lines:
f.write('{}\n'.format(line))
Calling str() on a python object like list or dict simply returns the str version of that object, to write it's content to a file you should loop over it.
Demo:
lines = [1, 'foo', 3, 'bar']
with open('users.txt','w') as f:
for line in lines:
f.write('{}\n'.format(line))
...
>>> !cat users.txt
1
foo
3
bar

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
Problem which I am getting:
Output is like :
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
if '!' in i:
if counter == 1:
file_open2.write('\n')
counter= counter -1
counter= counter +1
file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
data=f.read().replace("\r\n","") #replace the newlines with ""
#the newline can be "\n" in your system instead of "\r\n"
ans=filter(None,data.split("!")) #split the data at '!', then filter out empty lines
for x in ans:
print "!"+x #or write to some other file
.....:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. This is almost the lines you want to write -- The only difference is that they don't have trailing newlines and they don't have '!' at the start. We can put those in easily with string formatting -- e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output.
A few other points, when opening files, it's nice to use a context manager -- This makes sure that the file is closed properly:
with open('inputfile') as fin:
lines = fin.read()
with open('outputfile','w') as fout:
fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
Another option, using replace instead of split, since you know the starting and ending characters of each line:
In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
In [15]: print data.replace('+0013!', "+0013\n!")
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
Let's try to add a \n before every "!"; then let python splitlines :-) :
file_read.replace("!", "!\n").splitlines()
I will actually implement as a generator so that you can work on the data stream rather than the entire content of the file. This will be quite memory friendly if working with huge files
>>> def split_on_stream(it,sep="!"):
prev = ""
for line in it:
line = (prev + line.strip()).split(sep)
for parts in line[:-1]:
yield parts
prev = line[-1]
yield prev
>>> with open("test.txt") as fin:
for parts in split_on_stream(fin):
print parts
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Stripping line edings before appending to a list?

Ok I am writing a program that reads text files and goes through the different lines, the problem that I have encountered however is line endings (\n). My aim is to read the text file line by line and write it to a list and remove the line endings before it is appended to the list.
I have tried this:
thelist = []
inputfile = open('text.txt','rU')
for line in inputfile:
line.rstrip()
thelist.append(line)
Strings are immutable in Python. All string methods return new strings, and don't modify the original one, so the line
line.rstrip()
effectively does nothing. You can use a list comprehension to accomplish this:
with open("text.txt", "rU") as f:
lines = [line.rstrip("\n") for line in f]
Also note that it is stringly recommended to use the with statement to open (and implicitly close) files.
with open('text.txt', 'rU') as f: # Use with block to close file on block exit
thelist = [line.rstrip() for line in f]
rstrip doesn't change its argument, it returns modified string, that's why you must write it so:
thelist.append(line.rstrip())
But you can write your code simpler:
with open('text.txt', 'rU') as inputfile:
thelist = [x.rstrip() for x in inputfile]
Use rstrip('\n') on each line before appending to your list.
I think you need something like this.
s = s.strip(' \t\n\r')
This will strip white spaces from both the beginning and the end of you string
In Python - strings are immutable - which means that operations return a new string, and don't modify the existing string. ie, you've got it right, but need to re-assign (or name a new variable) using line = line.rstrip().
rstrip returns a new string. It should be line = line.rstrip(). However, the whole code could be shorter:
thelist = list(map(str.rstrip, open('text.txt','rU')))
UPD: Note that just calling rstrip() trims all trailing whitespace, not just newline. But there is a concise way to do that too:
thelist = open('text.txt','rU').read().splitlines()

Categories