Splitting a file into lines in Python using re.split

I'm trying to split a file with a list comprehension using code similar to:
lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]
However, the lines list always has an empty string as the last element. Does anyone know a way to avoid this (excluding the kludge of putting a pop() afterwards)?

Put the regular expression hammer away :-)
You can iterate over a file directly; readlines() is almost obsolete these days.
Read about str.strip() (and its friends, lstrip() and rstrip()).
Don't use file as a variable name. It's bad form, because file is a built-in name (the file type in Python 2).
You can write your code as:
lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())
If you are still getting blank lines in there, you can add in a test:
lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())
If you really want it in one line:
lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]
Finally, if you're on Python 2.6 or later, look at the with statement to improve things a little more: it closes the file for you.
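Here is a minimal sketch of what the with version might look like. The function name read_lines and the sample file contents are illustrative assumptions, not part of the answer above; the filter is the same startswith('com') test.

```python
import os
import tempfile


def read_lines(path):
    """Return stripped, non-blank lines that do not start with 'com'."""
    # The with block closes the file even if an exception is raised.
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith('com')]


# Demo on a throwaway file (contents are made up for illustration).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha\ncomment line\n\nbeta\n")

result = read_lines(tmp.name)
os.remove(tmp.name)
print(result)
```

Wrapping the logic in a function keeps the file handle from leaking into the surrounding scope.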

lines = file.readlines()
edit:
or if you didn't want blank lines in there, you can do
lines = filter(lambda a:(a!='\n'), file.readlines())
edit^2:
to remove trailing newlines, you can do
lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]

Another handy trick, especially when you need the line number, is to use enumerate:
fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
    dosomethingwith(n, line)
I only found out about enumerate quite recently, but it has come in handy quite a few times since then.
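A small variation on the enumerate trick, as a sketch: iterate over the file object directly instead of calling readlines(), and pass start=1 so the counter matches human-style line numbers. The io.StringIO object and its contents are stand-ins for a real open file.

```python
import io

# io.StringIO stands in for an open text file so the example is self-contained.
fp = io.StringIO("first\nsecond\nthird\n")

numbered = []
# Iterating the file object directly avoids building the readlines() list;
# start=1 makes the counter begin at 1 instead of 0.
for n, line in enumerate(fp, start=1):
    numbered.append((n, line.rstrip('\n')))

print(numbered)
```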

This should work, and eliminate the regular expressions as well:
all_lines = (line.rstrip()
             for line in open(filename)
             if "com" not in line)
# filter out the empty lines
lines = filter(lambda x : x, all_lines)
Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids code to filter out empty lines:
lines = [line
         for line in open(filename).read().splitlines()
         if "com" not in line]
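To see where the trailing empty string in the question comes from, and what splitlines() does differently, here is a short comparison on a made-up string. Note that splitlines() only avoids the empty element after the final newline; blank lines in the middle of the text are still returned as empty strings, so a filter may still be needed if those can occur.

```python
import re

text = "one\n\ntwo\ncomment\n"   # made-up sample with a trailing newline

# re.split (like str.split('\n')) leaves an empty string after the final newline.
by_regex = re.split(r"\n+", text)

# splitlines() does not, but it keeps blank lines in the middle of the text.
by_splitlines = text.splitlines()

print(by_regex)
print(by_splitlines)
```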

Related

Enumerate using python

I'm a new coder and am currently trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on.
I've opened the file and stripped it of all its commas. I found online that you can use a function called enumerate() to get the line number. However, when I run the code, instead of getting numbers like 1, 2, 3 I get information like: 0x113a2cff0. Any idea how to fix this problem, or what the actual problem is? The code for how I used enumerate is below.
my_document = open("data.txt")
readDocument = my_document.readlines()
invalidData = []
for data in readDocument:
    stripDocument = data.strip()
    if stripDocument.isnumeric() == False:
        data = (enumerate(stripDocument))
        invalidData.append(data)
First of all, open the document and read its contents. It's good practice to use with, as it closes the file after use. The readlines function gathers all the lines (this assumes the data.txt file is in the same folder as your .py file):
with open("data.txt") as f:
    lines = f.readlines()
Next, use enumerate to pair each line with its index, so you can read the indexes, use them, or even save them:
for index, line in enumerate(lines):
    print(index, line)
As a last point, if your data.txt has line breaks, each line will end with a \n, which you can remove with line.strip() if you need to.
The full code would be:
with open("data.txt") as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    print(index, line.strip())
Taking your problem statement:
trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on
You're using enumerate incorrectly, as @roganjosh was trying to explain:
with open("data.txt") as my_document:
    for i, data in enumerate(my_document):
        print(i, data)
The way you're doing it now, you're not removing the commas. The strip() method without arguments only removes leading and trailing whitespace from the line. If you only want the data, this would work:
invalidData = []
for row_number, data in enumerate(readDocument):
    stripped_line = ''.join(data.split(','))
    if not stripped_line.isnumeric():
        invalidData.append((row_number, data))
You can use the enumerate() function to number the items of a list. It returns an iterator that yields tuples containing the index first, then the line string. Like this:
(0, 'first line')
Your readDocument is a list of the lines, so it might be a good idea to name it accordingly.
lines = my_document.readlines()
for i, line in enumerate(lines):
    print(i, line)
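The 0x113a2cff0 in the question is most likely part of the printed form of an enumerate object that was stored without being consumed. A small sketch of the difference, using a made-up list in place of my_document.readlines():

```python
lines = ["12", "abc", "7"]   # stand-in for my_document.readlines()

counter = enumerate(lines)
# Printing the object itself shows something like <enumerate object at 0x...>.
print(counter)

# Consuming it yields (index, line) tuples; start=1 gives 1-based numbers.
pairs = list(enumerate(lines, start=1))
print(pairs)

# Line numbers of the entries that are not purely numeric.
invalid = [n for n, line in pairs if not line.strip().isnumeric()]
print(invalid)
```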

generating list by reading from file

I want to generate a list of server addresses and credentials by reading from a file, as a single list split on the newlines in the file.
The file is in this format:
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
The output I want looks like this:
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
I tried this:
with open('file') as f:
    credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns lists within a list:
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
I am new to Python. How can I split on the newline character and create a single list? Thank you in advance.
You could do it like this
with open('servers.dat') as f:
    L = [[line.strip() for line in f]]
print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n as the regular file iterator reads line by line. The double list is a bit unconventional, just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of IP addresses joined in one string. That wasn't clear, as it's off the screen in the question and you make no attempt to do it in your own code.
To do that read the first three lines individually using next then just join up the remaining lines using ; as your delimiter.
def reader(f):
    yield next(f)
    yield next(f)
    yield next(f)
    yield ';'.join(ip.strip() for ip in f)
with open('servers.dat') as f:
    L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly as there is a typo 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir' from the data.
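If the file might not always have exactly three lines before the addresses, one alternative is to pick out the ip lines by their prefix instead of by position. This is only a sketch: the function name parse_credentials and the assumption that every address line starts with "ip:" are mine, and io.StringIO stands in for the real file.

```python
import io


def parse_credentials(f):
    """Collect non-ip lines in order, then all ip lines joined with ';'."""
    other, ips = [], []
    for raw in f:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: address lines are recognizable by the 'ip:' prefix.
        (ips if line.startswith('ip:') else other).append(line)
    return [other + [';'.join(ips)]]


# io.StringIO stands in for open('servers.dat').
sample = io.StringIO(
    "login:username\npass:password\ndestPath:/directory/subdir/\n"
    "ip:10.95.64.211\nip:10.95.64.215\n"
)
L3 = parse_credentials(sample)
print(L3)
```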
This should work
def read_lines(path):  # illustrative name; the return below needs a function
    arr = []
    with open(path) as f:
        for line in f:
            arr.append(line)
    return [arr]
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
    for line in f:
        arr.append(line.strip('\n'))

How to extract last line of text in Python (excluding new lines)?

Textfile:
1
2
3
4
5
6
\n
\n
I know lines[-1] gets you the last line, but I want to disregard any new lines and get the last line of text (6 in this case).
The best approach regarding memory is to exhaust the file. Something like this:
with open('file.txt') as f:
    last = None
    for line in (line for line in f if line.rstrip('\n')):
        last = line
print(last)
It can be done more elegantly though. A slightly different approach:
with open('file.txt') as f:
    last = None
    for last in (line for line in f if line.rstrip('\n')):
        pass
print(last)
For a small file you can just read all of the lines, discarding any empty ones. Notice that I've used an inner generator to strip the lines before excluding them in the outer one.
with open(textfile) as fp:
    last_line = [l2 for l2 in (l1.strip() for l1 in fp) if l2][-1]
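One caveat with indexing by [-1] is that it raises IndexError when the file contains no non-blank lines at all. A possible variant that returns a default instead is sketched below; the name last_nonblank and the sample data are illustrative assumptions.

```python
import io


def last_nonblank(f, default=None):
    """Return the last line containing non-whitespace text, stripped."""
    last = default
    for line in f:
        stripped = line.strip()
        if stripped:
            last = stripped
    return last


# io.StringIO stands in for an open file.
found = last_nonblank(io.StringIO("1\n2\n6\n\n\n"))
missing = last_nonblank(io.StringIO("\n\n"))
print(found, missing)
```

This also reads the file one line at a time, so it does not need the whole file in memory.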
with open('file') as f:
    print([i for i in f.read().split('\n') if i != ''][-1])
This is just an edit to Avinash Raj's answer (but since I'm a new account, I can't comment on it). This will preserve any None values in your data (i.e. if the data in your last line is "None" it will work, though depending on your input this may not be an issue).
with open('path/to/file') as infile:
    for line in infile:
        if not line.strip('\n'):
            continue
        answer = line
print(answer)
This will print 6 with a newline at the end. You can decide how to strip that. Following are some options:
answer.rstrip('\n') removes trailing newlines
answer.rstrip() removes trailing whitespaces
answer.strip() removes any surrounding whitespaces
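The three options can be compared side by side on a made-up value that has whitespace on both sides:

```python
answer = "  6 \t\n"   # made-up line with surrounding whitespace

only_newline = answer.rstrip('\n')  # only the trailing newline is removed
trailing = answer.rstrip()          # all trailing whitespace is removed
both_sides = answer.strip()         # leading and trailing whitespace is removed

print(repr(only_newline), repr(trailing), repr(both_sides))
```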
with open ('file.txt') as myfile:
    for num,line in enumerate(myfile):
        pass
print(num)

Stripping line endings before appending to a list?

Ok I am writing a program that reads text files and goes through the different lines, the problem that I have encountered however is line endings (\n). My aim is to read the text file line by line and write it to a list and remove the line endings before it is appended to the list.
I have tried this:
thelist = []
inputfile = open('text.txt','rU')
for line in inputfile:
    line.rstrip()
    thelist.append(line)
Strings are immutable in Python. All string methods return new strings, and don't modify the original one, so the line
line.rstrip()
effectively does nothing. You can use a list comprehension to accomplish this:
with open("text.txt", "rU") as f:
    lines = [line.rstrip("\n") for line in f]
Also note that it is strongly recommended to use the with statement to open (and implicitly close) files.
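A quick way to convince yourself that a bare rstrip() call changes nothing is to check the string before and after. The sample string here is made up:

```python
line = "some text\n"

line.rstrip()             # the returned string is discarded
unchanged = line          # line still ends with the newline

stripped = line.rstrip()  # keep the returned string instead
print(repr(unchanged), repr(stripped))
```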
with open('text.txt', 'rU') as f: # Use with block to close file on block exit
    thelist = [line.rstrip() for line in f]
rstrip doesn't change its argument, it returns modified string, that's why you must write it so:
thelist.append(line.rstrip())
But you can write your code simpler:
with open('text.txt', 'rU') as inputfile:
    thelist = [x.rstrip() for x in inputfile]
Use rstrip('\n') on each line before appending to your list.
I think you need something like this.
s = s.strip(' \t\n\r')
This will strip whitespace from both the beginning and the end of your string.
In Python, strings are immutable, which means that operations return a new string and don't modify the existing one. I.e., you've got it right, but you need to re-assign (or name a new variable) using line = line.rstrip().
rstrip returns a new string. It should be line = line.rstrip(). However, the whole code could be shorter:
thelist = list(map(str.rstrip, open('text.txt','rU')))
UPD: Note that just calling rstrip() trims all trailing whitespace, not just newline. But there is a concise way to do that too:
thelist = open('text.txt','rU').read().splitlines()
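The difference between the two approaches shows up when lines carry trailing spaces or tabs, or use \r\n endings. In this sketch (the sample text is made up), rstrip() removes all trailing whitespace, while splitlines() removes only the line terminator:

```python
text = "a  \r\nb\t\nc"   # mixed line endings, trailing spaces and a tab

# rstrip() drops every trailing whitespace character, including '\r'.
with_rstrip = [line.rstrip() for line in text.split('\n')]

# splitlines() drops only the line ending and keeps other trailing whitespace.
with_splitlines = text.splitlines()

print(with_rstrip)
print(with_splitlines)
```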

Best method for reading newline delimited files and discarding the newlines?

I am trying to determine the best way to handle getting rid of newlines when reading in newline delimited files in Python.
What I've come up with is the following code, including throwaway code to test.
import os
def getfile(filename,results):
    f = open(filename)
    filecontents = f.readlines()
    for line in filecontents:
        foo = line.strip('\n')
        results.append(foo)
    return results
blahblah = []
getfile('/tmp/foo',blahblah)
for x in blahblah:
    print(x)
lines = open(filename).read().splitlines()
Here's a generator that does what you requested. In this case, using rstrip is sufficient and slightly faster than strip.
lines = (line.rstrip('\n') for line in open(filename))
However, you'll most likely want to use this to get rid of trailing whitespaces too.
lines = (line.rstrip() for line in open(filename))
What do you think about this approach?
with open(filename) as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for line in datalines:
        ...do something awesome...
The generator expression avoids loading the whole file into memory, and with ensures the file is closed.
for line in open('/tmp/foo'):
    print(line.strip('\n'))
Just use generator expressions:
blahblah = (l.rstrip() for l in open(filename))
for x in blahblah:
    print(x)
Also, I want to advise you against reading the whole file into memory: looping over generators is much more efficient on big datasets.
I use this
def cleaned( aFile ):
    for line in aFile:
        yield line.strip()
Then I can do things like this.
lines = list( cleaned( open("file","r") ) )
Or, I can extend cleaned with extra functions to, for example, drop blank lines or skip comment lines or whatever.
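As a rough sketch of what such an extension might look like, the generators below can be chained after cleaned. The names without_blanks and without_comments, and the choice of '#' as the comment prefix, are assumptions for illustration; io.StringIO stands in for an open file.

```python
import io


def cleaned(a_file):
    """Yield each line with surrounding whitespace removed."""
    for line in a_file:
        yield line.strip()


def without_blanks(lines):
    """Skip lines that are empty after cleaning."""
    for line in lines:
        if line:
            yield line


def without_comments(lines, prefix='#'):
    """Skip lines starting with the (assumed) comment prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


source = io.StringIO("# header\nalpha\n\n  beta  \n# note\n")
lines = list(without_comments(without_blanks(cleaned(source))))
print(lines)
```

Each stage stays lazy, so nothing is read until list() (or a for loop) consumes the chain.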
I'd do it like this:
f = open('test.txt')
l = [l for l in f.readlines() if l.strip()]
f.close()
print(l)
