error when reading a file with blank lines to the end - python

I'm trying to read a file line by line and do some stuff. The problem is that if I add a bunch of blank lines to the end of the file, I'm getting an exception (list index out of range).
def check_ip(address):
try:
socket.inet_aton(address)
return True
except:
return False
def myfunction:
with open(filename, 'r') as f:
for line in f.readlines():
if not line: continue
tokens = line.strip().split()
if not check_ip( tokens[0] ): continue
// do some stuff

Your empty line test doesn't take into account whitespace.
Use:
if not line.strip(): continue
otherwise you end up with an empty tokens list for those lines.
You don't have to call str.strip() when also using str.split() with no arguments; that call already strips leading and trailing whitespace:
tokens = line.split()
Note that you don't need (nor want) to use f.readlines(); you can iterate over the file object directly:
for line in f:

Related

How to open a file in python, read the comments ("#"), find a word after the comments and select the word after it?

I have a function that loops through a file that Looks like this:
"#" XDI/1.0 XDAC/1.4 Athena/0.9.25
"#" Column.4: pre_edge
Content
That is to say that after the "#" there is a comment. My function aims to read each line and if it starts with a specific word, select what is after the ":"
For example if I had These two lines. I would like to read through them and if the line starts with "#" and contains the word "Column.4" the word "pre_edge" should be stored.
An example of my current approach follows:
with open(file, "r") as f:
for line in f:
if line.startswith ('#'):
word = line.split(" Column.4:")[1]
else:
print("n")
I think my Trouble is specifically after finding a line that starts with "#" how can I parse/search through it? and save its Content if it contains the desidered word.
In case that # comment contain str Column.4: as stated above, you could parse it this way.
with open(filepath) as f:
for line in f:
if line.startswith('#'):
# Here you proceed comment lines
if 'Column.4' in line:
first, remainder = line.split('Column.4: ')
# Remainder contains everything after '# Column.4: '
# So if you want to get first word ->
word = remainder.split()[0]
else:
# Here you can proceed lines that are not comments
pass
Note
Also it is a good practice to use for line in f: statement instead of f.readlines() (as mentioned in other answers), because this way you don't load all lines into memory, but proceed them one by one.
You should start by reading the file into a list and then work through that instead:
file = 'test.txt' #<- call file whatever you want
with open(file, "r") as f:
txt = f.readlines()
for line in txt:
if line.startswith ('"#"'):
word = line.split(" Column.4: ")
try:
print(word[1])
except IndexError:
print(word)
else:
print("n")
Output:
>>> ['"#" XDI/1.0 XDAC/1.4 Athena/0.9.25\n']
>>> pre_edge
Used a try and except catch because the first line also starts with "#" and we can't split that with your current logic.
Also, as a side note, in the question you have the file with lines starting as "#" with the quotation marks so the startswith() function was altered as such.
with open('stuff.txt', 'r+') as f:
data = f.readlines()
for line in data:
words = line.split()
if words and ('#' in words[0]) and ("Column.4:" in words):
print(words[-1])
# pre_edge

Line.strip() is not working in python?

I try to remove a whitespace line in my file; But it is not removing my white space line.
def removeWhiteSpaceLine():
fp = open("singleDataFile.csv")
for i, line in enumerate(fp, 1):
if i == 2:
line.strip()
fp.close()
My sample file is like:(i want to remove the 2nd line which is white space)
Name,Address,Age
John,Melbourne,28
Kati,Brisbane,35
.....
line.strip() do not modify line but returns a new string.
Calling line.strip() alone on its line has no effects. You must reassign the result to your variable:
line = line.strip()
However, it looks like you shouldn't use strip anyway:
strip()
Return a copy of the string with leading and trailing characters removed.
To me, it's unclear what you're asking:
1) fp = open("singleDataFile.csv") opens the file in read only mode. If you expect to update the file, that won't work. If you want to modify the file, open it in write mode ("w" or "r+").
2) Maybe you don't want to modify the file but only ignore the second line? In that case, you should add all the lines in a list and ignore that second line.
with open("singleDataFile.csv", "r+") as f:
content = f.readlines() # read content
f.seek(0) # go back to the begin of the file
for i, line in enumerate(content):
if i != 1: # the condition could also be if line.strip()
f.write(line) # write all lines except second line
f.truncate() # end of file
Try like this, Here modifying the same file like rewriting same contents to same file except blank lines:
with open("singleDataFile.csv","r") as f:
lines=f.readlines()
with open("singleDataFile.csv","w") as f:
for line in lines:
if line.strip():
f.write(line)
try using pandas library.
pd.colname.str.strip()

how can i convert surname:name to name:surname? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
If you want to avoid this you can add a newline at the end of file:
with open(the_file, 'r+') as f:
f.seek(-1, 2) # go at the end of the file
if f.read(1) != '\n':
# add missing newline if not already present
f.write('\n')
f.flush()
f.seek(0)
lines = [line[:-1] for line in f]
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
Which exploits the fact that the return value of or isn't a boolean, but the object that was evaluated true or false.
The readlines method is actually equivalent to:
def readlines(self):
lines = []
for line in iter(self.readline, ''):
lines.append(line)
return lines
# or equivalently
def readlines(self):
lines = []
while True:
line = self.readline()
if not line:
break
lines.append(line)
return lines
Since readline() keeps the newline also readlines() keeps it.
Note: for symmetry to readlines() the writelines() method does not add ending newlines, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
temp = open(filename,'r').read().split('\n')
Reading file one row at the time. Removing unwanted chars from end of the string with str.rstrip(chars).
with open(filename, 'r') as fileobj:
for row in fileobj:
print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
lines = Path(filename).read_text().splitlines()
This it auto-closes the file, no need for with open()...
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
Try this:
u=open("url.txt","r")
url=u.read().replace('\n','')
print(url)
To get rid of trailing end-of-line (/n) characters and of empty list values (''), try:
f = open(path_sample, "r")
lines = [line.rstrip('\n') for line in f.readlines() if line.strip() != '']
You can read the file as a list easily using a list comprehension
with open("foo.txt", 'r') as f:
lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
if line[-1:] == "\n":
print(line[:-1])
else:
print(line)
my_file.close()
This script here will take lines from file and save every line without newline with ,0 at the end in file2.
file = open("temp.txt", "+r")
file2 = open("res.txt", "+w")
for line in file:
file2.writelines(f"{line.splitlines()[0]},0\n")
file2.close()
if you looked at line, this value is data\n, so we put splitlines()
to make it as an array and [0] to choose the only word data
import csv
with open(filename) as f:
csvreader = csv.reader(f)
for line in csvreader:
print(line[0])

Reading a file and storing contents into a dictionary - Python

I'm trying to store contents of a file into a dictionary and I want to return a value when I call its key. Each line of the file has two items (acronyms and corresponding phrases) that are separated by commas, and there are 585 lines. I want to store the acronyms on the left of the comma to the key, and the phrases on the right of the comma to the value. Here's what I have:
def read_file(filename):
infile = open(filename, 'r')
for line in infile:
line = line.strip() #remove newline character at end of each line
phrase = line.split(',')
newDict = {'phrase[0]':'phrase[1]'}
infile.close()
And here's what I get when I try to look up the values:
>>> read_file('acronyms.csv')
>>> acronyms=read_file('acronyms.csv')
>>> acronyms['ABT']
Traceback (most recent call last):
File "<pyshell#65>", line 1, in <module>
acronyms['ABT']
TypeError: 'NoneType' object is not subscriptable
>>>
If I add return newDict to the end of the body of the function, it obviously just returns {'phrase[0]':'phrase[1]'} when I call read_file('acronyms.csv'). I've also tried {phrase[0]:phrase[1]} (no single quotation marks) but that returns the same error. Thanks for any help.
def read_acronym_meanings(path:str):
with open(path) as f:
acronyms = dict(l.strip().split(',') for l in f)
return acronyms
First off, you are creating a new dictionary at every iteration of the loop. Instead, create one dictionary and add elements every time you go over a line. Second, the 'phrase[0]' includes the apostrophes which turn make it a string instead of a reference to the phrase variable that you just created.
Also, try using the with keyword so that you don't have to explicitly close the file later.
def read(filename):
newDict = {}
with open(filename, 'r') as infile:
for line in infile:
line = line.strip() #remove newline character at end of each line
phrase = line.split(',')
newDict[phrase[0]] = phrase[1]}
return newDict
def read_file(filename):
infile = open(filename, 'r')
newDict = {}
for line in infile:
line = line.strip() #remove newline character at end of each line
phrase = line.split(',', 1) # split max of one time
newDict.update( {phrase[0]:phrase[1]})
infile.close()
return newDict
Your original creates a new dictionary every iteration of the loop.

How can I use readline() to begin from the second line?

I'm writing a short program in Python that will read a FASTA file which is usually in this format:
>gi|253795547|ref|NC_012960.1| Candidatus Hodgkinia cicadicola Dsem chromosome, 52 lines
GACGGCTTGTTTGCGTGCGACGAGTTTAGGATTGCTCTTTTGCTAAGCTTGGGGGTTGCGCCCAAAGTGA
TTAGATTTTCCGACAGCGTACGGCGCGCGCTGCTGAACGTGGCCACTGAGCTTACACCTCATTTCAGCGC
TCGCTTGCTGGCGAAGCTGGCAGCAGCTTGTTAATGCTAGTGTTGGGCTCGCCGAAAGCTGGCAGGTCGA
I've created another program that reads the first line(aka header) of this FASTA file and now I want this second program to start reading and printing beginning from the sequence.
How would I do that?
so far i have this:
FASTA = open("test.txt", "r")
def readSeq(FASTA):
"""returns the DNA sequence of a FASTA file"""
for line in FASTA:
line = line.strip()
print line
readSeq(FASTA)
Thanks guys
-Noob
def readSeq(FASTA):
"""returns the DNA sequence of a FASTA file"""
_unused = FASTA.next() # skip heading record
for line in FASTA:
line = line.strip()
print line
Read the docs on file.next() to see why you should be wary of mixing file.readline() with for line in file:
you should show your script. To read from second line, something like this
f=open("file")
f.readline()
for line in f:
print line
f.close()
You might be interested in checking BioPythons handling of Fasta files (source).
def FastaIterator(handle, alphabet = single_letter_alphabet, title2ids = None):
"""Generator function to iterate over Fasta records (as SeqRecord objects).
handle - input file
alphabet - optional alphabet
title2ids - A function that, when given the title of the FASTA
file (without the beginning >), will return the id, name and
description (in that order) for the record as a tuple of strings.
If this is not given, then the entire title line will be used
as the description, and the first word as the id and name.
Note that use of title2ids matches that of Bio.Fasta.SequenceParser
but the defaults are slightly different.
"""
#Skip any text before the first record (e.g. blank lines, comments)
while True:
line = handle.readline()
if line == "" : return #Premature end of file, or just empty?
if line[0] == ">":
break
while True:
if line[0]!=">":
raise ValueError("Records in Fasta files should start with '>' character")
if title2ids:
id, name, descr = title2ids(line[1:].rstrip())
else:
descr = line[1:].rstrip()
id = descr.split()[0]
name = id
lines = []
line = handle.readline()
while True:
if not line : break
if line[0] == ">": break
#Remove trailing whitespace, and any internal spaces
#(and any embedded \r which are possible in mangled files
#when not opened in universal read lines mode)
lines.append(line.rstrip().replace(" ","").replace("\r",""))
line = handle.readline()
#Return the record and then continue...
yield SeqRecord(Seq("".join(lines), alphabet),
id = id, name = name, description = descr)
if not line : return #StopIteration
assert False, "Should not reach this line"
good to see another bioinformatician :)
just include an if clause within your for loop above the line.strip() call
def readSeq(FASTA):
for line in FASTA:
if line.startswith('>'):
continue
line = line.strip()
print(line)
A pythonic and simple way to do this would be slice notation.
>>> f = open('filename')
>>> lines = f.readlines()
>>> lines[1:]
['TTAGATTTTCCGACAGCGTACGGCGCGCGCTGCTGAACGTGGCCACTGAGCTTACACCTCATTTCAGCGC\n', 'TCGCTTGCTGGCGAAGCTGGCAGCAGCTTGTTAATGCTAGTG
TTGGGCTCGCCGAAAGCTGGCAGGTCGA']
That says "give me all elements of lines, from the second (index 1) to the end.
Other general uses of slice notation:
s[i:j] slice of s from i to j
s[i:j:k] slice of s from i to j with step k (k can be negative to go backward)
Either i or j can be omitted (to imply the beginning or the end), and j can be negative to indicate a number of elements from the end.
s[:-1] All but the last element.
Edit in response to gnibbler's comment:
If the file is truly massive you can use iterator slicing to get the same effect while making sure you don't get the whole thing in memory.
import itertools
f = open("filename")
#start at the second line, don't stop, stride by one
for line in itertools.islice(f, 1, None, 1):
print line
"islicing" doesn't have the nice syntax or extra features of regular slicing, but it's a nice approach to remember.

Categories