Python Iterator strangely skipping items - python

As part of a program that decodes a communication protocol (EDIFACT MSCONS) I have a class that gives me the next 'segment' of the message. The segments are delimited by an apostrophe "'". There may be newlines after the "'" or not.
Here's the code for that class:
class SegmentGenerator:
def __init__(self, filename):
try:
fh = open(filename)
except IOError:
print ("Error: file " + filename + " not found!")
sys.exit(2)
lines=[]
for line in fh:
line = line.rstrip()
lines.append(line)
if len(lines) == 1:
msg = lines[0]
else:
msg = ''
for line in lines:
msg = msg + line.rstrip()
self.segments=msg.split("'")
self.iterator=iter(self.segments)
def next(self):
try:
return next(self.iterator)
except StopIteration:
return None
if __name__ == '__main__': #testing only
sg = SegmentGenerator('MSCONS_21X000000001333E_20X-SUD-STROUM-M_20180807_000026404801.txt')
for i in range(210436):
if i > 8940:
break
print(sg.next())
To give an idea what the file looks like here's an excerpt of it:
UNB+UNOC:3+21X000000001333E:020+20X-SUD-STROUM-M:020+180807:1400+000026404801++TL'UNH+000026404802+MSCONS:D:04B:UN:1.0'BGM+7+000026404802+9'DTM+137:201808071400:203'RFF+AGI:6HYR67925RZUD_000000257860_00_E27'NAD+MS+21X000000001333E::020'NAD+MR+20X-SUD-STROUM-M::020'UNS+D'NAD+DP'LOC+172+LU0000010496200000000000050287886::89'DTM+163:201701010000?+01:303'DTM+164:201702010000?+01:303'LIN+1'PIA+5+1-1?:1.29.0:SRW'QTY+220:9.600'DTM+163:201701010000?+01:303'DTM+164:201701010015?+01:303'QTY+220:10.400'DTM+163:201701010015?+01:303'DTM+164:201701010030?+01:303'QTY+220:10.400'DTM+163:201701010030?+01:303'DTM+164:201701010045?+01:303'QTY+220:10.400'DTM+163:201701010045?+01:303'DTM+164:201701010100?+01:303'QTY+220:10.400'DTM+163:201701010100?+01:303'DTM+164:201701010115?+01:303'QTY+220:10.400'DTM+163:201701010115?+01:303'DTM+164:201701010130?+01:303'QTY+220:10.400'DTM+163:201701010130?+01:303'DTM+164:201701010145?+01:303'QTY+220:10.400'DTM+163:201701010145?+01:303'DTM+164:201701010200?+01:303'QTY+220:11.200'DTM+163:201701010200?+01:303' ...
The file I have a problem with has 210000 of those segments. I tested the code and everything works fine. The list of segments is complete and I get one segment after the other correctly until the end of the list.
I use the segments as input to a statemachine that gets new segments from an instance of SegmentGenerator.
Here's an excerpt:
def DTMstarttransition(self,segment):
match=re.search('DTM\+(.*?):(.*?):(.*?)($|\+.*|:.*)',segment)
if match:
if match.group(1) == '164':
self.currentendtime=self.dateConvert(match.group(2),match.group(3))
return('DTMend',self.sg.next())
return('Error',segment + "\nExpected DTM segment didn't match")
The method returns the name of the next state and the next segment sg.next(), sg being an instance of SegmentGenerator.
However at the 8942st segment the call to sg.next() doesn't give me the next segment but the second last of the list of segments!
I traced the function calls (with the autologging module):
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'DTM+164:201702010000?+01:303'
TRACE:__main__.MSCONSparser:QTYtransition:RETURN ('DTMstart', 'DTM+164:201702010000?+01:303')
TRACE:__main__.MSCONSparser:DTMstarttransition:CALL *('DTM+164:201702010000?+01:303',) **{}
TRACE:__main__.MSCONSparser:dateConvert:CALL *('201702010000?+01', '303') **{}
TRACE:__main__.MSCONSparser:dateConvert:RETURN datetime.datetime(2017, 2, 1, 0, 0)
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'UNT+17872+000026404802'
TRACE:__main__.MSCONSparser:DTMstarttransition:RETURN ('DTMend', 'UNT+17872+000026404802')
TRACE:__main__.MSCONSparser:DTMendtransition:CALL *('UNT+17872+000026404802',) **{}
UNT+... isn't the next segment it should be a LIN segment.
But how is this possible? Why does SegmentGenerator work when I test it with the main function in its module and doesn't work correctly after thousands of calls from the other module?
All the segments are there from beginning to end. I can verify this from the interpreter, since the list sg.segments stays available after program stop. len(sg.segments) is 210435 but my program stops after 8942. So it is clearly a problem with the iterator.
The files (3 python files and data example) can be found on Github in branch 'next' if you like to test the whole thing.

I think it's possible there is a double apostrophe '' in your data file, near the 8942th apostrophe.
In this case your code will continue to read the whole file reading all 210435 segments.
But if you have the condition that tests the result of sg.next(), then that would be falsey on the 8942th iteration, and I'm guessing this is causing your program to abort.
eg:
while sg.next():
# some processing here
If I'm completely wrong then I'd be interested in seeing the behaviour of this: - where len and iterations should equal.
if __name__ == '__main__':
fn = sys.argv[1]
sg = SegmentGenerator(fn)
print("Num segments:", len(sg.segments))
i = 0
value = 'x'
while value:
value = sg.next()
i += 1
print(i, value)
print("Num iterations:", i)

It turned out that a segment 'DTM+164:201702010000?+01:303' existed a second time further down in the file and that indeed that one is followed by a UTM segment. So the problem is with the protocol states themselves and the iterator was working correctly.
So sorry that I bothered you with my wrong assumption. Thanks for wanting to help!

Related

What causes this return() to create a SyntaxError?

I need this program to create a sheet as a list of strings of ' ' chars and distribute text strings (from a list) into it. I have already coded return statements in python 3 but this one keeps giving
return(riplns)
^
SyntaxError: invalid syntax
It's the return(riplns) on line 39. I want the function to create a number of random numbers (randint) inside a range built around another randint, coming from the function ripimg() that calls this one.
I see clearly where the program declares the list I want this return() to give me. I know its type. I see where I feed variables (of the int type) to it, through .append(). I know from internet research that SyntaxErrors on python's return() functions usually come from mistype but it doesn't seem the case.
#loads the asciified image ("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
#creates a sheet "foglio1", same number of lines as the asciified image, and distributes text on it on a randomised line
#create the sheet foglio1
def create():
ref = open("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
charcount = ""
field = []
for line in ref:
for c in line:
if c != '\n':
charcount += ' '
if c == '\n':
charcount += '*' #<--- YOU GONNA NEED TO MAKE THIS A SPACE IN A FOLLOWING FUNCTION IN THE WRITER.PY PROGRAM
for i in range(50):#<------- VALUE ADJUSTMENT FROM WRITER.PY GOES HERE(default : 50):
charcount += ' '
charcount += '\n'
break
for line in ref:
field.append(charcount)
return(field)
#turn text in a list of lines and trasforms the lines in a list of strings
def poemln():
txt = open("/home/gcg/Documents/Programmazione/Python projects/imgascii/writer/poem")
arrays = []
for line in txt:
arrays.append(line)
txt.close()
return(arrays)
#rander is to be called in ripimg()
def rander(rando, fldepth):
riplns = []
for i in range(fldepth):
riplns.append(randint((rando)-1,(rando)+1)
return(riplns) #<---- THIS RETURN GIVES SyntaxError upon execution
#opens a rip on the side of the image.
def ripimg():
upmost = randint(160, 168)
positions = []
fldepth = 52 #<-----value is manually input as in DISTRIB function.
positions = rander(upmost,fldepth)
return(positions)
I omitted the rest of the program, I believe these functions are enough to get the idea, please tell me if I need to add more.
You have incomplete set of previous line's parenthesis .
In this line:-
riplns.append(randint((rando)-1,(rando)+1)
You have to add one more brace at the end. This was causing error because python was reading things continuously and thought return statement to be a part of previous uncompleted line.

Python text file manipulation, add delta time to each line in seconds

I am a beginner at python and trying to solve the below:
I have a text file that each line starts like this:
<18:12:53.972>
<18:12:53.975>
<18:12:53.975>
<18:12:53.975>
<18:12:54.008>
etc
Instead of above I would like to add the elapsed time in seconds in the beginning of each line, but only if the line starts with '<'.
<0.0><18:12:53.972>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.036><18:12:54.008>
etc
Here comes a try :-)
#import datetime
from datetime import timedelta
from sys import argv
#get filename as argument
run, input, output = argv
#get number of lines for textfile
nr_of_lines = sum(1 for line in open(input))
#read in file
f = open(input)
lines = f.readlines()
f.close
#declarations
do_once = True
time = []
delta_to_list = []
i = 0
#read in and translate all timevalues from logfile to delta time.
while i < nr_of_lines:
i += 1
if lines[i-1].startswith('<'):
get_lines = lines[i-1] #get one line
get_time = (get_lines[1:13]) #get the time from that line
h = int(get_time[0:2])
m = int(get_time[3:5])
s = int(get_time[6:8])
ms = int(get_time[9:13])
time = timedelta(hours = h, minutes = m, seconds = s, microseconds = 0, milliseconds = ms)
sec_time = time.seconds + (ms/1000)
if do_once:
start_value = sec_time
do_once = False
delta = float("{0:.3f}".format(sec_time - start_value))
delta_to_list.append(delta)
#write back values to logfile.
k=0
s = str(delta_to_list[k])
with open(output, 'w') as out_file:
with open(input, 'r') as in_file:
for line in in_file:
if line.startswith('<'):
s = str(delta_to_list[k])
out_file.write("<" + s + ">" + line)
else:
out_file.write(line)
k += 1
As it is now, it works fine, but the last two lines is not written to the new file. It says: "s = str(delta_to_list[k]) IndexError: list index out of range.
At first I would like to get my code working, and second a suggestions for improvements. Thank you!
First point: never read a full file in memory when you don't have too (and specially when you don't know whether you have enough free memory).
Second point: learn to use python's for loop and iteration protocol. The way to iterate over a list and any other iterable is:
for item in some_iterable:
do_something_with(item)
This avoids messing with indexes and getting it wrong ;)
One of the nice things with Python file objects is that they actually are iterables, so to iterate over a file lines, the simplest way is:
for line in my_opened_file:
do_something_with(line)
Here's a simple yet working and mostly pythonic (nb: python 2.7.x) way to write your program:
# -*- coding: utf-8 -*-
import os
import sys
import datetime
import re
import tempfile
def totime(timestr):
""" returns a datetime object for a "HH:MM:SS" string """
# we actually need datetime objects for substraction
# so let's use the first available bogus date
# notes:
# `timestr.split(":")` will returns a list `["MM", "HH", "SS]`
# `map(int, ...)` will apply `int()` on each item
# of the sequence (second argument) and return
# the resulting list, ie
# `map(int, "01", "02", "03")` => `[1, 2, 3]`
return datetime.datetime(1900, 1, 1, *map(int, timestr.split(":")))
def process(instream, outstream):
# some may consider that regexps are not that pythonic
# but as far as I'm concerned it seems like a sensible
# use case.
time_re = re.compile("^<(?P<time>\d{2}:\d{2}:\d{2})\.")
first = None
# iterate over our input stream lines
for line in instream:
# should we handle this line at all ?
# (nb a bit redundant but faster than re.match)
if not line.startswith("<"):
continue
# looks like a candidate, let's try and
# extract the 'time' value from it
match = time_re.search(line)
if not match:
# starts with '<' BUT not followed by 'HH:MM:SS.' ?
# unexpected from the sample source but well, we
# can't do much about it either
continue
# retrieve the captured "time" (HH:MM:SS) part
current = totime(match.group("time"))
# store the first occurrence so we can
# compute the elapsed time
if first is None:
first = current
# `(current - first)` yields a `timedelta` object
# we now just have to retrieve it's `seconds` attribute
seconds = (current - first).seconds
# inject the seconds before the line
# and write the whole thing tou our output stream
newline = "{}{}".format(seconds, line)
outstream.write(newline)
def usage(err=None):
if err:
print >> sys.stderr, err
print >> sys.stderr, "usage: python retime.py <filename>"
# unix standards process exit codes
return 2 if err else 0
def main(*args):
# our entry point...
# gets the source filename, process it
# (storing the results in a temporary file),
# and if everything's ok replace the source file
# by the temporary file.
try:
sourcename = args[0]
except IndexError as e:
return usage("missing <filename> argument")
# `delete=False` prevents the tmp file to be
# deleted on closing.
dest = tempfile.NamedTemporaryFile(delete=False)
with open(sourcename) as source:
try:
process(source, dest)
except Exception as e:
dest.close()
os.remove(dest)
raise
# ok done
dest.close()
os.rename(dest.name, sourcename)
return 0
if __name__ == "__main__":
# only execute main() if we are called as a script
# (so we can also import this file as a module)
sys.exit(main(*sys.argv[1:]))
It gives the expected results on your sample data (running on linux - but it should be ok on any other supported OS afaict).
Note that I wrote it to work like your original code (replace the source file with the processed one), but if it were my code I would instead either explicitely provide a destination filename or as a default write to sys.stdout instead (and redirect stdout to another file). The process function can deal with any of those solution FWIW - it's only a matter of a couple edits in main().

Python: How to compare string from two text files and retrieve an additional line of one in case of match

I have found so much information from previous search on this website but I seem to be stuck on the following issue.
I have two text files that looks like this
Inter.txt ( n-lines but only showed 4 lines,you get the idea)
7275
30000
6693
855
....
rules.txt (2n-lines)
7275
8500
6693
7555
....
3
1000
8
5
....
I want to compare the first line of Inter.txt with rules.txt and in case of a match, I jump for n-lines in order to get the score of that line. (E.g. with 7275, there is a match, I jump n to get the score 3)
I produced the following code but for some reasons, I only have the ouput of the first line when I should have one for each match from my first file. With the previous example, I should have 8 as an output for 6693.
import linecache
inter = open("Inter.txt", "r")
rules = open("rules.txt", "r")
iScore = 0
jump = 266
i=0
for lineInt in inter:
#i = i+1
#print(i)
for lineRul in rules:
i = i+1
#print(i)
if lineInt == lineRul:
print("Match")
inc = linecache.getline("rules.txt", i + jump)
#print(inc)
iScore = iScore + int(inc)
print(iScore)
#break
else:
continue
All the print(i) are there because I checked that all the lines were read. I am a novice in Python.
To sum up, I don't understand why I only have one output. Thanks in advance !
Ok, I think the main thing that blocks you from getting forward is that the for loops on files gets the pointer to the end of the file, and doesn't resets when you starts the loops again.
So when you only open rules.txt once, and uses its intance in the inner loop it only goes through all the lines at the first iteration of the outer loop, the second time it tries to go over the remains lines, which are non.
The solution is to close and open the file outside the inner loop.
This code worked for me.
import linecache
inter = open("Inter.txt", "r")
iScore = 0
jump = 4
for lineInt in inter:
i=0
#i = i+1
#print(i)
rules = open("rules.txt", "r")
for lineRul in rules:
i = i+1
#print(i)
if lineInt == lineRul:
print("Match")
inc = linecache.getline("rules.txt", i + jump)
#print(inc)
iScore = iScore + int(inc)
print(iScore)
#break
else:
continue
rules.close()
I also moved where you set the i to 0 to the beginning of the outer loop, but I guess you'd find it yourself.
And I changed jump to 4 to fit the example files your gave :p
Can you please try this solution:
def get_rules_values(rules_file):
with open(rules_file, "r") as rules:
return map(int, rules.readlines())
def get_rules_dict(rules_values):
return dict(zip(rules_values[:len(rules_values)/2], rules_values[len(rules_values)/2:]))
def get_inter_values(inter_file):
with open(inter_file, "r") as inter:
return map(int, inter.readlines())
rules_dict = get_rules_dict(get_rules_values("rules.txt"))
inter_values = get_inter_values("inter.txt")
for inter_value in inter_values:
print inter_value, rules_dict[inter_value]
Hope it's working for you!

For loop inside for loop causing fails after second loop

Good evening all,
I'm working on a code that checks disk addresses to see if there is a disk present. I verify which disk number it is by executing a scan, disabling the disk, scanning again, and then compare the results to determine the disk, and then re-enable the disk before the loop repeats to scan the next address. As you can see below in my output, it goes through the process, gets scan1 and scan2 for slot 1, which I have printed to show that disk 0 was removed, so that must be the disk in the slot. The loop repeats for the next slot and gets scan1 and scan2 to show that after the removal, disk1 has disappeared, which implies it is in that slot.
However, when I do my second for loop inside to compare the two strings and actually save that difference to a variable, the output for the variable result is just a blank string ' '. Because it is just a blank string, i'm getting the string index out of range error message, which makes sense. I just don't understand how the second for loop inside the main for loop can be fine for one loop, but then when comparing scan1 and scan2 for the second time (even though i can see they're different) it just stores a blank string to result.
# Addresses of each populated slot
Slot1PackedAddress = '+-01.0-[82]----00.0'
Slot2PackedAddress = '+-03.0-[84]----00.0'
Slot3PackedAddress = '+-03.0-[08]----00.0'
Slot4PackedAddress = '+-01.0-[01]----00.0'
Addresses = [Slot1PackedAddress, Slot2PackedAddress, Slot3PackedAddress, Slot4PackedAddress]
InitialChecks = [None]*5
diskchange = [0]*5
Slot = [None]*5
SlotOn = ['4/1/', '5/1/', '6/1/', '7/1']
SlotOff = ['4/0/', '5/0/', '6/0/', '7/0']
for i in range(1,3):
InitialChecks[i] = ['Slot%i = 1' %i]
InitialChecks[i] = str(InitialChecks[i]).replace('[\'', '').replace('\']', '')
with open('lspci.sh', 'rb') as file:
script = file.read()
subprocess.call(script, shell=True)
if Addresses[i-1] in open('results').read():
result = ' '
print("Device in Slot %d, checking to see what drive number it is..." %i)
scan1 = ''
# Initial disk scan
os.system("sudo parted -l > scan1.txt")
with open('scan1.txt', 'rb') as file:
for line in file:
for part in line.split():
if "nvme" in part:
scan1 = scan1 + part
print scan1
# Disable the slot to get ready to record which drive number disappeared
with open('removeslot%d.sh' %i, 'rb') as file:
script = file.read()
subprocess.call(script, shell=True)
scan2 = ''
# Initial disk scan
os.system("sudo parted -l > scan2.txt")
with open('scan2.txt', 'rb') as file:
for line in file:
for part in line.split():
if "nvme" in part:
scan2 = scan2 + part
print scan2
for nvme in scan1:
if nvme not in scan2:
result = result + nvme
print("result is " + result)
disk = filter(str.isdigit, result)
strdiskchange = str(disk)
diskchange[i] = int(strdiskchange[0])
print diskchange[1]
print("The new disk added to slot %i is /dev/nvme%dn1" %(i, diskchange[i]))
# Rescan to re-enable the drive that was disabled.
with open('rescan.sh', 'rb') as file:
script = file.read()
subprocess.call(script, shell=True)
# Represents that Slot 1 is populated and on
Slot[i] = 1
Here is the error output:
Device in Slot 1, checking to see what drive number it is...
/dev/nvme0n1:/dev/nvme1n1:
/dev/nvme1n1:
drive that disappear is 0
The new disk added to slot 1 is /dev/nvme0n1
Device in Slot 2, checking to see what drive number it is...
/dev/nvme0n1:/dev/nvme1n1:
/dev/nvme0n1:
drive that disappear is
Traceback (most recent call last):
File "GUIArduino.py", line 75, in <module>
diskchange[i] = int(strdiskchange[0])
IndexError: string index out of range
pciedev3ubuntu#PCIeDev3ubuntu:~/Documents$
Thanks for the help guys
you can use the tryCatch function to ignore errors in a loop (such a blank or empty cell/vector) and continue where it left off
http://mazamascience.com/WorkingWithData/?p=912
I figured it out. I went back to using the lsblk code that was working but just added the 'and 'p' not in line' portion to skip any lines that had a p since only partitioned lines contain a p.
lsblk = subprocess.Popen(['lsblk'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
scan2 = [line.strip() for line in lsblk.stdout if 'nvme' in line and 'p' not in line]
that just replaced
os.system("sudo parted -l > scan1.txt")
with open('scan1.txt', 'rb') as file:
for line in file:
for part in line.split():
if "nvme" in part:
scan1 = scan1 + part

Conversion of Multiple Strings To ASCII

This seems fairly trivial but I can't seem to work it out
I have a text file with the contents:
B>F
I am reading this with the code below, stripping the '>' and trying to convert the strings into their corresponding ASCII value, minus 65 to give me a value that will correspond to another list index
def readRoute():
routeFile = open('route.txt', 'r')
for line in routeFile.readlines():
route = line.strip('\n' '\r')
route = line.split('>')
#startNode, endNode = route
startNode = ord(route[0])-65
endNode = ord(route[1])-65
# Debug (this comment was for my use to explain below the print values)
print 'Route Entered:'
print line
print startNode, ',', endNode, '\n'
return[startNode, endNode]
However I am having slight trouble doing the conversion nicely, because the text file only contains one line at the moment but ideally I need it to be able to support more than one line and run an amount of code for each line.
For example it could contain:
B>F
A>D
C>F
E>D
So I would want to run the same code outside this function 4 times with the different inputs
Anyone able to give me a hand
Edit:
Not sure I made my issue that clear, sorry
What I need it do it parse the text file (possibly containing one line or multiple lines like above. I am able to do it for one line with the lines
startNode = ord(route[0])-65
endNode = ord(route[1])-65
But I get errors when trying to do more than one line because the ord() is expecting different inputs
If I have (below) in the route.txt
B>F
A>D
This is the error it gives me:
line 43, in readRoute endNode = ord(route[1])-65
TypeError: ord() expected a character, but string of length 2 found
My code above should read the route.txt file and see that B>F is the first route, strip the '>' - convert the B & F to ASCII, so 66 & 70 respectively then minus 65 from both to give 1 & 5 (in this example)
The 1 & 5 are corresponding indexes for another "array" (list of lists) to do computations and other things on
Once the other code has completed it can then go to the next line in route.txt which could be A>D and perform the above again
Perhaps this will work for you. I turned the fileread into a generator so you can do as you please with the parsed results in the for-i loop.
def readRoute(file_name):
with open(file_name, 'r') as r:
for line in r:
yield (ord(line[0])-65, ord(line[2])-65)
filename = 'route.txt'
for startnode, endnode in readRoute(filename):
print startnode, endnode
If you can't change readRoute, change the contents of the file before each call. Better yet, make readRoute take the filename as a parameter (default it to 'route.txt' to preserve the current behavior) so you can have it process other files.
What about something like this? It takes the routes defined in your file and turns them into path objects with start and end member variables. As an added bonus PathManager.readFile() allows you to load multiple route files without overwriting the existing paths.
import re
class Path:
def __init__(self, start, end):
self.start = ord(start) - 65 # Scale the values as desired
self.end = ord(end) - 65 # Scale the values as desired
class PathManager:
def __init__(self):
self.expr = re.compile("^([A-Za-z])[>]([A-Za-z])$") # looks for string "C>C"
# where C is a char
self.paths = []
def do_logic_routine(self, start, end):
# Do custom logic here that will execute before the next line is read
# Return True for 'continue reading' or False to stop parsing file
return True
def readFile(self, path):
file = open(path,"r")
for line in file:
item = self.expr.match(line.strip()) # strip whitespaces before parsing
if item:
'''
item.group(0) is *not* used here; it matches the whole expression
item.group(1) matches the first parenthesis in the regular expression
item.group(2) matches the second
'''
self.paths.append(Path(item.group(1), item.group(2)))
if not do_logic_routine(self.paths[-1].start, self.paths[-1].end):
break
# Running the example
MyManager = PathManager()
MyManager.readFile('route.txt')
for path in MyManager.paths:
print "Start: %s End: %s" % (path.start, path.end)
Output is:
Start: 1 End: 5
Start: 0 End: 3
Start: 2 End: 5
Start: 4 End: 3

Categories