I am a beginner at Python and am trying to solve the problem below:
I have a text file in which each line starts like this:
<18:12:53.972>
<18:12:53.975>
<18:12:53.975>
<18:12:53.975>
<18:12:54.008>
etc
Instead of the above, I would like to add the elapsed time in seconds to the beginning of each line, but only if the line starts with '<':
<0.0><18:12:53.972>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.036><18:12:54.008>
etc
Here comes a try :-)
#import datetime
from datetime import timedelta
from sys import argv

#get filename as argument
run, input, output = argv
#get number of lines for textfile
nr_of_lines = sum(1 for line in open(input))
#read in file
f = open(input)
lines = f.readlines()
f.close()
#declarations
do_once = True
time = []
delta_to_list = []
i = 0
#read in and translate all timevalues from logfile to delta time.
while i < nr_of_lines:
    i += 1
    if lines[i-1].startswith('<'):
        get_lines = lines[i-1]        #get one line
        get_time = (get_lines[1:13])  #get the time from that line
        h = int(get_time[0:2])
        m = int(get_time[3:5])
        s = int(get_time[6:8])
        ms = int(get_time[9:13])
        time = timedelta(hours = h, minutes = m, seconds = s, microseconds = 0, milliseconds = ms)
        sec_time = time.seconds + (ms/1000)
        if do_once:
            start_value = sec_time
            do_once = False
        delta = float("{0:.3f}".format(sec_time - start_value))
        delta_to_list.append(delta)
#write back values to logfile.
k = 0
s = str(delta_to_list[k])
with open(output, 'w') as out_file:
    with open(input, 'r') as in_file:
        for line in in_file:
            if line.startswith('<'):
                s = str(delta_to_list[k])
                out_file.write("<" + s + ">" + line)
            else:
                out_file.write(line)
            k += 1
As it is now, it mostly works, but the last two lines are not written to the new file. It fails with:
s = str(delta_to_list[k])
IndexError: list index out of range
First I would like to get my code working, and second I would welcome suggestions for improvements. Thank you!
First point: never read a full file into memory when you don't have to (and especially when you don't know whether you have enough free memory).
Second point: learn to use Python's for loop and iteration protocol. The way to iterate over a list or any other iterable is:
for item in some_iterable:
    do_something_with(item)
This avoids messing with indexes and getting it wrong ;)
One of the nice things with Python file objects is that they actually are iterables, so to iterate over a file's lines, the simplest way is:
for line in my_opened_file:
    do_something_with(line)
Here's a simple yet working and mostly pythonic (nb: python 2.7.x) way to write your program:
# -*- coding: utf-8 -*-
import os
import sys
import datetime
import re
import tempfile

def totime(timestr):
    """ returns a datetime object for a "HH:MM:SS.mmm" string """
    # we actually need datetime objects for subtraction
    # so let's use the first available bogus date
    # notes:
    # `hms.split(":")` returns a list `["HH", "MM", "SS"]`
    # `map(int, ...)` applies `int()` to each item of the
    # sequence (second argument) and returns the resulting
    # list, ie `map(int, ["01", "02", "03"])` => `[1, 2, 3]`
    hms, millis = timestr.split(".")
    hours, minutes, seconds = map(int, hms.split(":"))
    return datetime.datetime(1900, 1, 1, hours, minutes, seconds,
                             int(millis) * 1000)

def process(instream, outstream):
    # some may consider that regexps are not that pythonic
    # but as far as I'm concerned it seems like a sensible
    # use case.
    time_re = re.compile(r"^<(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})>")
    first = None
    # iterate over our input stream lines
    for line in instream:
        # should we handle this line at all?
        # (nb: a bit redundant but faster than re.match)
        if not line.startswith("<"):
            outstream.write(line)
            continue
        # looks like a candidate, let's try and
        # extract the 'time' value from it
        match = time_re.search(line)
        if not match:
            # starts with '<' BUT not followed by 'HH:MM:SS.mmm>'?
            # unexpected from the sample source but well, we
            # can't do much about it either
            outstream.write(line)
            continue
        # retrieve the captured "time" (HH:MM:SS.mmm) part
        current = totime(match.group("time"))
        # store the first occurrence so we can
        # compute the elapsed time
        if first is None:
            first = current
        # `(current - first)` yields a `timedelta` object;
        # its `total_seconds()` method gives us a float
        seconds = round((current - first).total_seconds(), 3)
        # inject the elapsed seconds before the line
        # and write the whole thing to our output stream
        newline = "<{}>{}".format(seconds, line)
        outstream.write(newline)

def usage(err=None):
    if err:
        print >> sys.stderr, err
    print >> sys.stderr, "usage: python retime.py <filename>"
    # Unix standard process exit codes
    return 2 if err else 0

def main(*args):
    # our entry point...
    # gets the source filename, processes it
    # (storing the results in a temporary file),
    # and if everything's ok replaces the source file
    # with the temporary file.
    try:
        sourcename = args[0]
    except IndexError:
        return usage("missing <filename> argument")
    # `delete=False` prevents the tmp file from being
    # deleted on closing.
    dest = tempfile.NamedTemporaryFile(delete=False)
    with open(sourcename) as source:
        try:
            process(source, dest)
        except Exception:
            dest.close()
            os.remove(dest.name)
            raise
    # ok done
    dest.close()
    os.rename(dest.name, sourcename)
    return 0

if __name__ == "__main__":
    # only execute main() if we are called as a script
    # (so we can also import this file as a module)
    sys.exit(main(*sys.argv[1:]))
It gives the expected results on your sample data (running on Linux - but it should be ok on any other supported OS afaict).
Note that I wrote it to work like your original code (replacing the source file with the processed one), but if it were my code I would instead either explicitly provide a destination filename or default to writing to sys.stdout (and redirect stdout to another file). The process function can deal with any of those solutions FWIW - it's only a matter of a couple of edits in main().
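For reference, here is a minimal sketch of the stdout variant mentioned above - same process function, no temp file; only main() changes:

def main(*args):
    # stdout variant: process() writes straight to sys.stdout,
    # so no temporary file and no in-place replacement
    try:
        sourcename = args[0]
    except IndexError:
        return usage("missing <filename> argument")
    with open(sourcename) as source:
        process(source, sys.stdout)
    return 0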
Related
The following code looks through 2500 markdown files with a total of 76475 lines, to check each one for the presence of two strings.
#!/usr/bin/env python3
# encoding: utf-8
import re
import os

zettelkasten = '/Users/will/Dropbox/zettelkasten'

def zsearch(s, *args):
    for x in args:
        r = (r"(?=.* " + x + ")")
        p = re.search(r, s, re.IGNORECASE)
        if p is None:
            return None
    return s

for filename in os.listdir(zettelkasten):
    if filename.endswith('.md'):
        with open(os.path.join(zettelkasten, filename), "r") as fp:
            for line in fp:
                result_line = zsearch(line, "COVID", "vaccine")
                if result_line != None:
                    UUID = filename[-15:-3]
                    print(f'›[[{UUID}]] OR', end=" ")
This correctly gives output like:
›[[202202121717]] OR ›[[202003311814]] OR
but it takes almost two seconds to run on my machine, which I think is much too slow. What, if anything, can be done to make it faster?
The main bottleneck is the regular expressions you're building.
If we print(f"{r=}") inside the zsearch function:
>>> zsearch("line line covid line", "COVID", "vaccine")
r='(?=.* COVID)'
r='(?=.* vaccine)'
The (?=.*) lookahead is what is causing the slowdown - and it's also not needed.
You can achieve the same result by searching for:
r=' COVID'
r=' vaccine'
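A minimal revision of zsearch along those lines (same signature and behaviour, just without the lookahead wrapper) would be:

def zsearch(s, *args):
    # plain substring patterns - no lookahead needed when each
    # word is tested independently
    for x in args:
        if re.search(" " + x, s, re.IGNORECASE) is None:
            return None
    return s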
As part of a program that decodes a communication protocol (EDIFACT MSCONS) I have a class that gives me the next 'segment' of the message. The segments are delimited by an apostrophe "'". There may be newlines after the "'" or not.
Here's the code for that class:
import sys

class SegmentGenerator:
    def __init__(self, filename):
        try:
            fh = open(filename)
        except IOError:
            print ("Error: file " + filename + " not found!")
            sys.exit(2)
        lines = []
        for line in fh:
            line = line.rstrip()
            lines.append(line)
        if len(lines) == 1:
            msg = lines[0]
        else:
            msg = ''
            for line in lines:
                msg = msg + line.rstrip()
        self.segments = msg.split("'")
        self.iterator = iter(self.segments)

    def next(self):
        try:
            return next(self.iterator)
        except StopIteration:
            return None

if __name__ == '__main__': #testing only
    sg = SegmentGenerator('MSCONS_21X000000001333E_20X-SUD-STROUM-M_20180807_000026404801.txt')
    for i in range(210436):
        if i > 8940:
            break
        print(sg.next())
To give an idea what the file looks like here's an excerpt of it:
UNB+UNOC:3+21X000000001333E:020+20X-SUD-STROUM-M:020+180807:1400+000026404801++TL'UNH+000026404802+MSCONS:D:04B:UN:1.0'BGM+7+000026404802+9'DTM+137:201808071400:203'RFF+AGI:6HYR67925RZUD_000000257860_00_E27'NAD+MS+21X000000001333E::020'NAD+MR+20X-SUD-STROUM-M::020'UNS+D'NAD+DP'LOC+172+LU0000010496200000000000050287886::89'DTM+163:201701010000?+01:303'DTM+164:201702010000?+01:303'LIN+1'PIA+5+1-1?:1.29.0:SRW'QTY+220:9.600'DTM+163:201701010000?+01:303'DTM+164:201701010015?+01:303'QTY+220:10.400'DTM+163:201701010015?+01:303'DTM+164:201701010030?+01:303'QTY+220:10.400'DTM+163:201701010030?+01:303'DTM+164:201701010045?+01:303'QTY+220:10.400'DTM+163:201701010045?+01:303'DTM+164:201701010100?+01:303'QTY+220:10.400'DTM+163:201701010100?+01:303'DTM+164:201701010115?+01:303'QTY+220:10.400'DTM+163:201701010115?+01:303'DTM+164:201701010130?+01:303'QTY+220:10.400'DTM+163:201701010130?+01:303'DTM+164:201701010145?+01:303'QTY+220:10.400'DTM+163:201701010145?+01:303'DTM+164:201701010200?+01:303'QTY+220:11.200'DTM+163:201701010200?+01:303' ...
The file I have a problem with has 210000 of those segments. I tested the code and everything works fine. The list of segments is complete and I get one segment after the other correctly until the end of the list.
I use the segments as input to a statemachine that gets new segments from an instance of SegmentGenerator.
Here's an excerpt:
def DTMstarttransition(self, segment):
    match = re.search(r'DTM\+(.*?):(.*?):(.*?)($|\+.*|:.*)', segment)
    if match:
        if match.group(1) == '164':
            self.currentendtime = self.dateConvert(match.group(2), match.group(3))
            return ('DTMend', self.sg.next())
    return ('Error', segment + "\nExpected DTM segment didn't match")
The method returns the name of the next state and the next segment sg.next(), sg being an instance of SegmentGenerator.
However, at the 8942nd segment the call to sg.next() doesn't give me the next segment but the second-to-last one in the list of segments!
I traced the function calls (with the autologging module):
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'DTM+164:201702010000?+01:303'
TRACE:__main__.MSCONSparser:QTYtransition:RETURN ('DTMstart', 'DTM+164:201702010000?+01:303')
TRACE:__main__.MSCONSparser:DTMstarttransition:CALL *('DTM+164:201702010000?+01:303',) **{}
TRACE:__main__.MSCONSparser:dateConvert:CALL *('201702010000?+01', '303') **{}
TRACE:__main__.MSCONSparser:dateConvert:RETURN datetime.datetime(2017, 2, 1, 0, 0)
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'UNT+17872+000026404802'
TRACE:__main__.MSCONSparser:DTMstarttransition:RETURN ('DTMend', 'UNT+17872+000026404802')
TRACE:__main__.MSCONSparser:DTMendtransition:CALL *('UNT+17872+000026404802',) **{}
UNT+... isn't the next segment; it should be a LIN segment.
But how is this possible? Why does SegmentGenerator work when I test it with the main function in its module and doesn't work correctly after thousands of calls from the other module?
All the segments are there from beginning to end. I can verify this from the interpreter, since the list sg.segments stays available after program stop. len(sg.segments) is 210435 but my program stops after 8942. So it is clearly a problem with the iterator.
The files (3 Python files and a data example) can be found on GitHub in branch 'next' if you'd like to test the whole thing.
I think it's possible there is a double apostrophe '' in your data file, near the 8942nd apostrophe.
In this case your code will still read the whole file, collecting all 210435 segments.
But if you have a condition that tests the result of sg.next(), the empty segment produced by the double apostrophe would be falsy on the 8942nd iteration, and I'm guessing this is causing your program to abort.
eg:
while sg.next():
    # some processing here
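A quick interpreter session shows why a doubled apostrophe would produce such a falsy value - str.split keeps the empty string between the two delimiters, and an empty string is falsy:

>>> "SEG1''SEG2".split("'")
['SEG1', '', 'SEG2']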
If I'm completely wrong, then I'd be interested in seeing the behaviour of the following, where the number of segments and the number of iterations should be equal:
if __name__ == '__main__':
    fn = sys.argv[1]
    sg = SegmentGenerator(fn)
    print("Num segments:", len(sg.segments))
    i = 0
    value = 'x'
    while value:
        value = sg.next()
        i += 1
        print(i, value)
    print("Num iterations:", i)
It turned out that the segment 'DTM+164:201702010000?+01:303' occurred a second time further down in the file, and that occurrence is indeed followed by a UNT segment. So the problem was with the protocol states themselves, and the iterator was working correctly.
Sorry that I bothered you with my wrong assumption. Thanks for wanting to help!
I have two files A and B in FASTQ format, which are basically several hundred million lines of text organized in groups of 4 lines starting with an # as follows:
#120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
GCCAATGGCATGGTTTCATGGATGTTAGCAGAAGACATGAGACTTCTGGGACAGGAGCAAAACACTTCATGATGGCAAAAGATCGGAAGAGCACACGTCTGAACTCN
+120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
bbbeee_[_ccdccegeeghhiiehghifhfhhhiiihhfhghigbeffeefddd]aegggdffhfhhihbghhdfffgdb^beeabcccabbcb`ccacacbbccB
I need to compare the
5:1101:1156:2031#0/
part between files A and B and write the groups of 4 lines in file B that matched to a new file. I got a piece of code in python that does that, but only works for small files as it parses through the entire #-lines of file B for every #-line in file A, and both files contain hundreds of millions of lines.
Someone suggested that I should create an index for file B; I have googled around without success and would be very grateful if someone could point out how to do this or let me know of a tutorial so I can learn. Thanks.
==EDIT==
In theory each group of 4 lines should only exist once in each file. Would it increase the speed enough if I stopped parsing after each match, or do I need a different algorithm altogether?
An index is just a shortened version of the information you are working with. In this case, you will want the "key" - the text between the first colon (':') on the #-line and the final slash ('/') near the end - as well as some kind of value.
Since the "value" in this case is the entire contents of the 4-line block, and since our index is going to store a separate entry for each block, we would be storing the entire file in memory if we used the actual value in the index.
Instead, let's use the file position of the beginning of the 4-line block. That way, you can move to that file position, print 4 lines, and stop. Total cost is the 4 or 8 or however many bytes it takes to store an integer file position, instead of however-many bytes of actual genome data.
Here is some code that does the job, but also does a lot of validation and checking. You might want to throw stuff away that you don't use.
import sys

def build_index(path):
    index = {}
    for key, pos, data in parse_fastq(path):
        if key not in index:
            # Don't overwrite duplicates - use first occurrence.
            index[key] = pos
    return index

def error(s):
    sys.stderr.write(s + "\n")

def extract_key(s):
    # This much is fairly constant:
    assert(s.startswith('#'))
    (machine_name, rest) = s.split(':', 1)
    # Per wikipedia, this changes in different variants of FASTQ format:
    (key, rest) = rest.split('/', 1)
    return key

def parse_fastq(path):
    """
    Parse the 4-line FASTQ groups in path.
    Validate the contents, somewhat.
    """
    f = open(path)
    i = 0
    # Note: iterating a file is incompatible with fh.tell(). Fake it.
    pos = offset = 0
    for line in f:
        offset += len(line)
        lx = i % 4
        i += 1
        if lx == 0:    # #machine: key
            key = extract_key(line)
            len1 = len2 = 0
            data = [line]
        elif lx == 1:
            data.append(line)
            len1 = len(line)
        elif lx == 2:  # +machine: key or something
            assert(line.startswith('+'))
            data.append(line)
        else:          # lx == 3 : quality data
            data.append(line)
            len2 = len(line)
            if len2 != len1:
                error("Data length mismatch at line "
                      + str(i - 2)
                      + " (len: " + str(len1) + ") and line "
                      + str(i)
                      + " (len: " + str(len2) + ")")
            #print "Yielding #%i: %s" % (pos, key)
            yield key, pos, data
            pos = offset
    if i % 4 != 0:
        error("EOF encountered in mid-record at line " + str(i))

def match_records(path, index):
    results = []
    for key, pos, d in parse_fastq(path):
        if key in index:
            # found a match!
            results.append(key)
    return results

def write_matches(inpath, matches, outpath):
    rf = open(inpath)
    wf = open(outpath, 'w')
    for m in matches:
        rf.seek(m)
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
    rf.close()
    wf.close()

#import pdb; pdb.set_trace()
index = build_index('afile.fastq')
matches = match_records('bfile.fastq', index)
posns = [index[k] for k in matches]
write_matches('afile.fastq', posns, 'outfile.fastq')
Note that this code goes back to the first file to get the blocks of data. If your data is identical between files, you would be able to copy the block from the second file when a match occurs.
Note also that depending on what you are trying to extract, you may want to change the order of the output blocks, and you may want to make sure that the keys are unique, or perhaps make sure the keys are not unique but are repeated in the order they match. That's up to you - I'm not sure what you're doing with the data.
These guys claim to parse files of a few gigs using a dedicated library (Biopython's SeqIO); see http://www.biostars.org/p/15113/
from Bio import SeqIO

fastq_parser = SeqIO.parse(fastq_filename, "fastq")
wanted = (rec for rec in fastq_parser if ...)
SeqIO.write(wanted, output_file, "fastq")
A better approach IMO would be to parse it once and load the data into some database instead of that output_file (e.g. MySQL) and later run the queries there.
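For instance, here is a rough sketch of that idea using sqlite3 as a stand-in for MySQL; the database filename and table schema are made up for illustration:

import sqlite3
from Bio import SeqIO

# build the index once; look-ups then become SQL queries
conn = sqlite3.connect("reads.db")
conn.execute("CREATE TABLE IF NOT EXISTS reads (key TEXT PRIMARY KEY, record TEXT)")
for rec in SeqIO.parse("bfile.fastq", "fastq"):
    # rec.format("fastq") re-serializes the whole 4-line block
    conn.execute("INSERT OR IGNORE INTO reads VALUES (?, ?)",
                 (rec.id, rec.format("fastq")))
conn.commit()
conn.close()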
I was wondering how the read() function can be used to read between two offsets that are given in hex.
I tried using the code below to convert the offset values to int, but I get a syntax error on the read() line. Any ideas?
OFFSETS = ('3AF7','3ECF')
OFFSETE = ('3B04','3EDE')
for r, d, f in os.walk("."):
    for hahahoho, value in enumerate(OFFSETS and OFFSETE):
        try:
            with open(os.path.join(r,f), 'rb' ) as fileread:
                texttoprint = fileread.seek(int(OFFSETS[hahahoho], 16) -1)
                yeeha = texttoprint.read[int(OFFSETS[hahahoho], 16) -1 : int(OFFSETE[damn],16)]
                print (yeeha)
                hahahoho + 1
This is not the entire code though, just the part I need help with =(
EDIT:
Alright, I think I should listen to the advice of you people. Here is the entire code:
nost = 1
OFFSETS = ('3AF7','3ECF')
OFFSETE = ('3B04','3EDE')
endscript = 'No'
nooffile = 1
import os, glob, sys, tempfile

try:
    directory = input('Enter your directory:').replace('\\','\\\\')
    os.chdir(directory)
except FileNotFoundError:
    print ('Directory not found!')
    endscript = 'YES!'
if endscript == 'YES!':
    sys.exit('Error. Be careful of what you do to your computer!')
else:
    if os.path.isfile('Done.txt') == True:
        print ('The folder has already been modified!')
    else:
        print ('Searching texts...\r\n')
        print ('Printing...')
        for r, d, f in os.walk("."):
            for HODF in f:
                if HODF.endswith(".hod") or "." not in HODF:
                    for damn, value in enumerate(OFFSETS and OFFSETE):
                        try:
                            with open(os.path.join(r,HODF), 'rb' ) as fileread:
                                fileread.seek(int(OFFSETS[damn],16) -1)
                                yeeha = fileread.read(int(OFFSETE[damn], 16) - (int(OFFSETS[damn],16) -1))
                                if b'?\x03\x00\x00\x00\x01\x00\x00\x00Leg2.' not in yeeha and b'?\x03\x00\x00\x00\x01\x00\x00\x00Leg2_r.' not in yeeha:
                                    print (yeeha)
                            damn + 1
                        except FileNotFoundError:
                            print('Invalid file path!')
                            os._exit(1)
                        except IndexError:
                            print ('File successfully modified!')
                    nooffile = nooffile + 1
                    nost = 1
        print ('\r\n'+str(nooffile)+' files read.',)
        print ('\tANI_list.txt, End.dat, Group.txt, Head.txt, Tail.dat files ignored.')
        print ('\r\nFiles successfully read! Hope you found what you are looking for!')
May I know what's wrong with it? Because it works just fine for me.
There are other problems with your code, but it sounds like you want to solve that yourself. When it comes to reading a particular byte range from a file, you can do that like this:
start = 1000
end = 1020 # Just examples
fileread.seek(start)
stuff = fileread.read(end - start)
That is, you start by seeking to the start position, and then you read as many bytes as you need (that is 20, in this example).
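Applied to the hex-string offsets from the question, that looks roughly like this (assuming, as in the question's working version, that the start offset needs the -1 adjustment; the filename is just an example):

with open("some.hod", "rb") as fp:
    start = int('3AF7', 16) - 1   # convert the hex string, make it 0-based
    end = int('3B04', 16)
    fp.seek(start)
    chunk = fp.read(end - start)  # the bytes between the two offsets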
EDIT:
The only real "problem" with your code is that you're using enumerate in a strange fashion that makes it completely unnecessary. The expression OFFSETS and OFFSETE will simply evaluate to OFFSETE, making the "OFFSETS and" part superfluous. Then, you're only actually using the first value from enumerate (the index), which makes enumerate itself superfluous: you could just have used range(len(OFFSETE)) instead.
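You can see this in the interpreter - a non-empty tuple is truthy, so `and` simply yields its second operand:

>>> OFFSETS = ('3AF7', '3ECF')
>>> OFFSETE = ('3B04', '3EDE')
>>> OFFSETS and OFFSETE
('3B04', '3EDE')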
More proper, however, would be to loop directly over the values instead of going via an index, like this:
for start, end in zip(OFFSETS, OFFSETE):
    # snip
    fileread.seek(int(start, 16) - 1)
    yeeha = fileread.read(int(end, 16) - (int(start, 16) - 1))
The other things are more like slight uglinesses that could be eliminated to make your code much nicer, but aren't strictly speaking wrong. Among them: you don't need to represent your offsets as strings, but could use hexadecimal literals instead; you open the file multiple times for no reason; the hahahoho + 1 expression is completely superfluous; and you could just bake the -1 adjustment directly into your actual offsets instead of applying it later.
I would write it closer to this instead:
OFFSETS = [0x3AF7 - 1, 0x3ECF - 2]
OFFSETE = [0x3B04 - 1, 0x3EDE - 2]

for r, d, f in os.walk("."):
    for fn in f:
        with open(os.path.join(r, fn), "rb") as fp:
            for start, end in zip(OFFSETS, OFFSETE):
                fp.seek(start)
                yeeha = fp.read(end - start)
                # Do whatever it is you need with yeeha
This seems fairly trivial, but I can't seem to work it out.
I have a text file with the contents:
B>F
I am reading this with the code below, stripping the '>' and trying to convert the strings into their corresponding ASCII values, minus 65, to give me a value that will correspond to another list index.
def readRoute():
    routeFile = open('route.txt', 'r')
    for line in routeFile.readlines():
        route = line.strip('\n' '\r')
        route = line.split('>')
        #startNode, endNode = route
        startNode = ord(route[0])-65
        endNode = ord(route[1])-65
        # Debug (this comment was for my use, to explain the print values below)
        print 'Route Entered:'
        print line
        print startNode, ',', endNode, '\n'
    return [startNode, endNode]
However, I am having slight trouble doing the conversion nicely, because the text file only contains one line at the moment, but ideally I need it to support more than one line and run some code for each line.
For example it could contain:
B>F
A>D
C>F
E>D
So I would want to run the same code outside this function 4 times with the different inputs.
Anyone able to give me a hand?
Edit:
Not sure I made my issue that clear, sorry.
What I need it to do is parse the text file (possibly containing one line or multiple lines like above). I am able to do it for one line with the lines:
startNode = ord(route[0])-65
endNode = ord(route[1])-65
But I get errors when trying to do more than one line because ord() is given the wrong input.
If I have (below) in the route.txt
B>F
A>D
This is the error it gives me:
line 43, in readRoute
    endNode = ord(route[1])-65
TypeError: ord() expected a character, but string of length 2 found
My code above should read the route.txt file and see that B>F is the first route, strip the '>', convert the B & F to ASCII (66 & 70 respectively), then subtract 65 from both to give 1 & 5 (in this example).
The 1 & 5 are corresponding indexes for another "array" (list of lists) to do computations and other things on
Once the other code has completed, it can then go to the next line in route.txt, which could be A>D, and perform the above again.
Perhaps this will work for you. I turned the file reading into a generator so you can do as you please with the parsed results in the for loop.
def readRoute(file_name):
    with open(file_name, 'r') as r:
        for line in r:
            yield (ord(line[0])-65, ord(line[2])-65)

filename = 'route.txt'
for startnode, endnode in readRoute(filename):
    print startnode, endnode
If you can't change readRoute, change the contents of the file before each call. Better yet, make readRoute take the filename as a parameter (default it to 'route.txt' to preserve the current behavior) so you can have it process other files.
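A minimal sketch of that suggested signature change (the body stays the same as in the question):

def readRoute(filename='route.txt'):
    # the default preserves the current behaviour; pass another
    # name to process a different file
    routeFile = open(filename, 'r')
    # ... rest of the original function unchanged ...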
What about something like this? It takes the routes defined in your file and turns them into path objects with start and end member variables. As an added bonus PathManager.readFile() allows you to load multiple route files without overwriting the existing paths.
import re

class Path:
    def __init__(self, start, end):
        self.start = ord(start) - 65  # Scale the values as desired
        self.end = ord(end) - 65      # Scale the values as desired

class PathManager:
    def __init__(self):
        # looks for string "C>C" where C is a char
        self.expr = re.compile("^([A-Za-z])[>]([A-Za-z])$")
        self.paths = []

    def do_logic_routine(self, start, end):
        # Do custom logic here that will execute before the next line is read
        # Return True for 'continue reading' or False to stop parsing file
        return True

    def readFile(self, path):
        file = open(path, "r")
        for line in file:
            item = self.expr.match(line.strip())  # strip whitespace before parsing
            if item:
                '''
                item.group(0) is *not* used here; it matches the whole expression
                item.group(1) matches the first parenthesis in the regular expression
                item.group(2) matches the second
                '''
                self.paths.append(Path(item.group(1), item.group(2)))
                if not self.do_logic_routine(self.paths[-1].start, self.paths[-1].end):
                    break

# Running the example
MyManager = PathManager()
MyManager.readFile('route.txt')
for path in MyManager.paths:
    print "Start: %s End: %s" % (path.start, path.end)
Output is:
Start: 1 End: 5
Start: 0 End: 3
Start: 2 End: 5
Start: 4 End: 3