python infinite loop and numpy delete do not work properly - python

I wrote a function and it never terminates. Logically, len(array) should keep decreasing, but it gets stuck at 227. I think numpy delete does not work properly, or did I make a mistake somewhere?
def segmenting(file, threshold):
    segments = []
    check = True
    count = 0
    while check == True:
        if len(file) <= 2:
            check = False
        sequence = []
        ids = []
        for i in range(1, len(file)):
            vector = [file[i,1] - file[0,1], file[i,2] - file[0,2]]
            magnitude = math.sqrt(vector[0]**2 + vector[1]**2)
            print(i)
            if magnitude <= threshold:
                sequence.append(file[i])
                ids.append(i)
            if i == len(file) and len(sequence) == 0:
                file = np.delete(file, 0, axis=0)
                break
        if len(ids) > 0 and len(sequence) > 0:
            segments.append(sequence)
            file = np.delete(file, ids, axis=0)
            print('sequence after :', sequence)
            sequence = []
            ids = []
        print(len(file))
    return segments

The following (simplified) logic will never be executed:
for i in range(1, len(file)):
    if i == len(file):
        file = np.delete(file, 0, axis=0)
Without having a way to remove the first line of the file, you have no way to exhaust your array. This check is superfluous anyway since after each iteration you won't need the first line anymore.
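A quick way to convince yourself that the branch cannot fire: range(1, n) stops at n - 1, so the loop variable never equals n. The helper name below is made up purely for illustration.

```python
def indices_equal_to_length(n):
    """Collect every loop index i from range(1, n) that satisfies i == n."""
    return [i for i in range(1, n) if i == n]


# The last value produced by range(1, 5) is 4, so nothing is ever collected.
print(indices_equal_to_length(5))
print(list(range(1, 5)))
```

The same holds for any n, which is why the first row is never deleted inside the loop.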
As a first fix, you can put the check outside the loop and only check whether you've found any matches:
for i in range(1, len(file)):
    ...
if len(sequence) == 0:
    file = np.delete(file, 0, axis=0)
But that way you would have one iteration where you find (and remove) matches and then one more with no more matches where you then remove it. Therefore, as said above, you should always remove the first line after each iteration.
With more simplifications, your code can be reduced down to:
def segmenting(file, threshold):
    segments = []
    while len(file) > 2:
        idx = np.sqrt(np.sum((file[1:,1:3] - file[0,1:3])**2, axis=1)) <= threshold
        file = file[1:]
        segments.append(list(file[idx]))
        file = file[np.logical_not(idx)]
    return segments
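For readers who want to step through the same idea without NumPy, here is a minimal pure-Python sketch. It is not the answer's code: it assumes each row is an (id, x, y) tuple, swaps the array operations for list comprehensions, and, unlike the version above, only records non-empty segments.

```python
import math


def segmenting(rows, threshold):
    """Group rows by distance to the current first row (pure-Python sketch)."""
    rows = list(rows)
    segments = []
    while len(rows) > 2:
        ref, rest = rows[0], rows[1:]
        # True where a remaining row lies within `threshold` of the reference row
        near = [math.hypot(r[1] - ref[1], r[2] - ref[2]) <= threshold for r in rest]
        matched = [r for r, hit in zip(rest, near) if hit]
        if matched:
            segments.append(matched)
        # The reference row is dropped on every pass, so `rows` always shrinks
        rows = [r for r, hit in zip(rest, near) if not hit]
    return segments
```

Because the reference row is discarded on every pass, the loop terminates even when no row falls within the threshold.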

It's likely because you are removing elements from the file array inside a for loop whose range is also derived from the file array. Try iterating over an unmodified version of the file array and doing the deletions on a copy.
For example, one possible solution is to change this line:
for i in range(1, len(file)):
to the following:
N = len(file)
for i in range(1, N):
You could also remove the flag variable 'check' and replace it with a break statement.

Related

Not Parsing Through

I tried to parse through a text file and find the index of the character where the four characters before it are all different. Like this:
wxrgh
The h would be the marker, since it comes after the four different characters, and the index would be 4. I find the index by converting the text into an array. It works for the test but not for the actual input. Does anyone know what is wrong?
def Repeat(x):
    _size = len(x)
    repeated = []
    for i in range(_size):
        k = i + 1
        for j in range(k, _size):
            if x[i] == x[j] and x[i] not in repeated:
                repeated.append(x[i])
    return repeated
with open("input4.txt") as f:
    text = f.read()
    test_array = []
    split_array = list(text)
    woah = ""
    for i in split_array:
        first = split_array[split_array.index(i)]
        second = split_array[split_array.index(i) + 1]
        third = split_array[split_array.index(i) + 2]
        fourth = split_array[split_array.index(i) + 3]
        test_array.append(first)
        test_array.append(second)
        test_array.append(third)
        test_array.append(fourth)
        print(test_array)
        if Repeat(test_array) != []:
            test_array = []
        else:
            woah = split_array.index(i)
            print(woah)
    print(woah)
I tried a test document and unit tests but that still does not work
You can utilise a set to help you with this.
Read the entire file into a string (buffer). Iterate over the buffer starting at offset 4. Create a set of the 4 characters that precede the current position. If the length of the set is 4 (i.e., they're all different) and the character at the current position is not in the set, then you've found the index you're interested in.
W = 4
with open('input4.txt') as data:
    buffer = data.read()
for i in range(W, len(buffer)):
    if len(s := set(buffer[i-W:i])) == W and buffer[i] not in s:
        print(i)
Note:
If the input data are split over multiple lines you may want to remove newline characters.
You will need to be using Python 3.8+ to take advantage of the assignment expression (walrus operator)
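The set-based check above can also be sketched as a function that avoids the walrus operator, which may help on Python versions older than 3.8. The name find_markers and the width parameter are illustrative assumptions, not part of the original answer.

```python
def find_markers(buffer, width=4):
    """Return every index whose preceding `width` characters are all distinct
    and do not include the character at that index."""
    hits = []
    for i in range(width, len(buffer)):
        window = set(buffer[i - width:i])
        if len(window) == width and buffer[i] not in window:
            hits.append(i)
    return hits


# The example from the question: 'h' follows four different characters.
print(find_markers("wxrgh"))
```

Returning a list rather than printing makes the function easy to check against a small test string before running it on the real input.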

Longest chain of last word of line/first word of next

Okay, so I am trying to find, in a text file, the longest chain in which the last word of one line is the first word of the next (works well for poetry). The Python script I have so far works, but it still takes an immensely long time. I am no coding expert and have really no idea about optimization. Am I running through more options than necessary?
How can I reduce the time it takes to run through a longer text?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
import sys
# Opening the source text
with open("/text.txt") as g:
    all_lines = g.readlines()
def last_word(particular_line):
    if particular_line != "\n":
        particular_line = re.sub(ur'^\W*|\W*$', "",particular_line)
        if len(particular_line) > 1:
            return particular_line.rsplit(None, 1)[-1].lower()
def first_word(particular_line):
    if particular_line != "\n":
        particular_line = re.sub(ur'^\W*|\W*$', "",particular_line)
        if len(particular_line) > 1:
            return particular_line.split(None, 1)[0].lower()
def chain(start, lines, depth):
    remaining = list(lines)
    del remaining[remaining.index(start)]
    possibles = [x for x in remaining if (len(x.split()) > 2) and (first_word(x) == last_word(start))]
    maxchain = []
    for c in possibles:
        l = chain(c, remaining, depth)
        sys.stdout.flush()
        sys.stdout.write(str(depth) + " of " + str(len(all_lines)) + " \r")
        sys.stdout.flush()
        if len(l) > len(maxchain):
            maxchain = l
            depth = str(depth) + "." + str(len(maxchain))
    return [start] + maxchain
#Start
final_output = []
#Finding the longest chain
for i in range (0, len(all_lines)):
    x = chain(all_lines[i], all_lines, i)
    if len(x) > 2:
        final_output.append(x)
final_output.sort(key = len)
#Output on screen
print "\n\n--------------------------------------------"
if len(final_output) > 1:
    print final_output[-1]
else:
    print "Nothing found"
import itertools
def matching_lines(line_pair):
    return line_pair[0].split()[-1].lower() == line_pair[1].split()[0].lower()
line_pairs = ((line,next_line) for line,next_line in itertools.izip(all_lines,all_lines[1:]))
grouped_pairs = itertools.groupby(line_pairs,matching_lines)
print max([len(list(y))+1 for x,y in grouped_pairs if x])
Although I'm not sure it will be faster (I think it will be, since it only iterates once and uses mostly builtins).
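A Python 3 sketch of the pairwise grouping idea above, since itertools.izip does not exist there and the built-in zip is already lazy. The function name longest_run and the handling of blank lines (they simply break a chain) are assumptions made for illustration; punctuation is not stripped.

```python
from itertools import groupby


def longest_run(lines):
    """Number of lines in the longest run where each line starts with the
    last word of the line before it (0 if there is no such run)."""
    rows = [line.split() for line in lines]

    def linked(pair):
        prev_words, next_words = pair
        # Blank lines have no words, so they can never be part of a link
        return (bool(prev_words) and bool(next_words)
                and prev_words[-1].lower() == next_words[0].lower())

    # Consecutive linked pairs form one group; k linked pairs span k + 1 lines
    runs = [len(list(group)) + 1
            for is_link, group in groupby(zip(rows, rows[1:]), key=linked)
            if is_link]
    return max(runs, default=0)
```

This only considers lines in their original order, exactly like the snippet above, so it answers a narrower question than the recursive search in the question.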
Yes, this code has a complexity of at least $O(n^2)$. If your file has n lines, the number of iterations your code performs is (n-1) for the first line, then (n-2) for the second line, and so on, with n such terms. For large n, this grows like $n^2$. Actually, there's a bug in the code in this line:
del remaining[remaining.index(start)]
where you probably meant to run this:
del remaining[:remaining.index(start)]
(notice the ':' in the square brackets). As written, the line removes only start, which expands the runtime: you now have (n-1) + (n-1) + ... + (n-1) = n*(n-1), which is slightly bigger than (n-1) + (n-2) + (n-3) + ....
You can optimize the code like so: begin with maxchainlen = 0 and curchainlen = 0. Now iterate through the lines, each time comparing the first word of the current line to the last word of the previous line. If they match, increase curchainlen by 1. If they don't, set maxchainlen to the larger of maxchainlen and curchainlen, and reset curchainlen to 0. After you finish iterating through the lines, do this check for maxchainlen once more. Example:
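lw = last_word(lines[0])
curchainlen = 0
maxchainlen = 0
for l in lines[1:]:
    # blank lines give None, which must not count as a match
    if lw is not None and lw == first_word(l):
        curchainlen = curchainlen + 1
    else:
        maxchainlen = max(maxchainlen, curchainlen)
        curchainlen = 0
    # remember this line's last word for the next comparison
    lw = last_word(l)
maxchainlen = max(maxchainlen, curchainlen)
print(maxchainlen)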
I'd try splitting this job into two phases: first finding the chains and then comparing them. That will simplify the code a lot. Since chains will be a small subset of all the lines in the file, finding them first and then sorting them will be quicker than trying to process the whole thing in one big go.
The first part of the problem is a lot easier if you use the python yield keyword, which is similar to return but doesn't end a function. This lets you loop over your content one line at a time and process it in small bites without needing to hold the whole thing in memory at all times.
Here's a basic way to walk through a file one line at a time. It uses yield to hand back the chains as it finds them:
def get_chains(*lines):
    # these hold the last token and the
    # members of this chain
    previous = None
    accum = []
    # walk through the lines,
    # seeing if they can be added to the existing chain in `accum`
    for each_line in lines:
        # split the line into words, ignoring case & whitespace at the ends
        pieces = each_line.lower().strip().split(" ")
        if pieces[0] == previous:
            # match? add to accum
            accum.append(each_line)
        else:
            # no match? yield our chain
            # if it links at least two lines
            if len(accum) > 1:
                yield accum
            # this line starts the next candidate chain
            accum = [each_line]
        # update our idea of the last, and try the next line
        previous = pieces[-1]
    # at the end of the file we need to kick out anything
    # still in the accumulator
    if len(accum) > 1:
        yield accum
When you feed this function a sequence of lines, it will yield chains as it finds them and then continue. Whoever calls the function can capture the yielded chains and do things with them.
Once you've got the chains, it's easy to sort them by length and pick the longest. Since Python has built-in list sorting, just collect a list of (chain length, chain) pairs and sort it. The longest chain will be the last item:
def longest_chain(filename):
    with open(filename, 'rt') as file_handle:
        # if you loop over an open file, you'll get
        # back the lines in the file one at a time
        incoming_chains = get_chains(*file_handle)
        # collect the results into a list, keyed by lengths
        all_chains = [(len(chain), chain) for chain in incoming_chains]
    if all_chains:
        all_chains.sort()
        length, lines = all_chains[-1]
        # found the longest chain; lines read from a file
        # already end with their own newline
        return "".join(lines)
    else:
        # for some reason there are no chains of connected lines
        return []

Making list of adjacent node pairs from Cube-formatted line file (using Python)

My files are formatted like this:
LINE NAME="FirstLine", MODE=15, ONEWAY=T, HEADWAY[1]=20, HEADWAY[2]=30,
HEADWAY[3]=20, HEADWAY[4]=30, HEADWAY[5]=30, VEHICLETYPE=2,
XYSPEED=20, N=-20609, -22042, -20600, 20601, 22839, 22838,
-20602, -20607, -20606, -20605, -20896, -20895, -20897, 20898,
-20899, -20905, -20906, -20910, 21104, -20911, -20912, 25065,
-21375
LINE NAME="SecondLine", MODE=15, ONEWAY=T, HEADWAY[1]=25, HEADWAY[2]=35,
[ETC]
I need to extract the lists of numbers that come after N= (one list for each N=), get rid of the minus-signs, and append each pair of adjacent numbers (e.g. [[20609, 22042], [22042, 20600]]) into a list of pairs. The major sticking part for Python-noob me is just extracting the lists of numbers as the first step (i.e. making what comes after each N= a list of its own).
If Python lists aren't ordered, I may have to make the lists strings and write each one as a line in a new file.
I was able to solve this by using the find method for LINE and N=. Finding LINE would increase an index and make a new item in a dictionary corresponding to that index. Finding N= would give the "definition" to that item in the dictionary -- a list with a single string element. Then for each item in the dictionary, I stripped spaces, replaced the - with '' (i.e. nothing), and used the split method with argument ',' to cut up the lists.
Then I zipped those lists Li[:-1] into themselves Li[1:] to get the adjacent-node pairs I needed.
Probably no one will ever find this useful (and I know it's probably convoluted), but here's my code:
with open(path + filename) as f:
    i = 0
    L = {}
    for line in f:
        existL = line.find("LINE")
        existN = line.find("N=")
        if existL > -1:
            i = i + 1
            L["Line" + str(i)] = []
        if existN > -1:
            go = 0
            while go == 0:
                # everything after the last '=' is node text; on
                # continuation lines rfind returns -1, so the whole line is used
                txtNodes = line[line.rfind('=')+1:].strip()
                nodes = txtNodes.split(',')
                for node in nodes:
                    node = node.strip()
                    node = node.replace('-','')
                    if len(node) > 3:
                        L["Line" + str(i)].append(node)
                try:
                    line = f.next()
                    if line.find("LINE") > -1:
                        go = go + 1
                        i = i + 1
                        L["Line" + str(i)] = []
                except StopIteration:
                    # end of file
                    go = go + 1
Li = []
# walk back through every line, including Line1
while i > 0:
    L1 = L["Line" + str(i)][:-1]
    L2 = L["Line" + str(i)][1:]
    Lx = zip(L1,L2)
    i = i-1
    Li.extend(Lx)
I hate when people come to forums and don't follow up, so here's my follow-up. Sorry for posting in the wrong place initially.
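The steps described above (find each N= list, drop the minus signs, pair up neighbours) can also be sketched in Python 3 with regular expressions. This is only a sketch under stated assumptions: the function name node_pairs is made up, each record is assumed to start with LINE at the beginning of a line, and everything after N= up to the next record is assumed to be node numbers.

```python
import re


def node_pairs(text):
    """Return one list of adjacent-node pairs per LINE record."""
    result = []
    # Split the text into records, each starting with the keyword LINE
    for record in re.split(r"(?m)^LINE\b", text):
        # Assumption: the node list runs from "N=" to the end of the record
        match = re.search(r"\bN=(.*)", record, flags=re.DOTALL)
        if match is None:
            continue
        # Keep the digits only, which discards the minus signs
        nodes = [int(token) for token in re.findall(r"\d+", match.group(1))]
        result.append(list(zip(nodes[:-1], nodes[1:])))
    return result


sample = 'LINE NAME="A", MODE=15, N=-1, 2,\n       -3\nLINE NAME="B", N=4, -5\n'
print(node_pairs(sample))
```

Python lists keep their order, so the pairs come out in the same sequence as the nodes appear in the file.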

Unbroken chain? Python iteration not being processed

So I've written a bit of code to stack integers in a list from the zeroth position. For some reason I cannot decipher, the while loop below is not being processed. I have followed all good style and syntax requirements that I know, and the while loop works when run by itself.
def row(line):
    """
    Function that merges a single row or column.
    """
    result_length = len(line)
    print result_length
    # Create a list of zeros the same length as the 'line' argument
    pts_alloc = 0
    dummy = 0
    result = line
    result[0:] = [pts_alloc for dummy in range(len(result))]
    print result
    #Iterate over the 'line' list looking for non-zero entries and
    #stack them from 'result[0]'
    line_count = 0
    result_place = 0
    while (line_count <= (len(line)-1)):
        if (line[line_count] > 0):
            result[result_place] = line[line_count]
            print result
            result_place += 1
        line_count += 1
    return result
print row([4, 0, 0, 5])
Is there a major error in this code that I've missed? Is there some syntax requirement that I am unaware of?
The problem seems to be this part:
result = line
result[0:] = [pts_alloc for dummy in range(len(result))]
After result = line, replacing a slice of result replaces that same slice in line, too, because result is just another reference to the same list, not a copy. By the time the while loop runs, line contains only zeros, so nothing is ever copied.
Since the slice is the entire list anyway, replace both lines with:
result = [pts_alloc for dummy in range(len(line))]
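A small Python 3 sketch of the difference between aliasing and building a new list. The two function names are hypothetical and exist only to contrast the behaviours.

```python
def zeros_aliased(line):
    """Mimics the question: `result` is the same list object as `line`."""
    result = line
    result[0:] = [0] * len(result)   # overwrites the caller's list in place
    return result


def zeros_copied(line):
    """Builds a brand-new list, leaving `line` untouched."""
    return [0] * len(line)


original = [4, 0, 0, 5]
zeros_copied(original)
print(original)      # still holds the input values
zeros_aliased(original)
print(original)      # now every entry is zero
```

Slice assignment mutates the existing list, whereas plain assignment to a name only rebinds that name.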
Also, you are declaring a lot of unnecessary variables. You could shorten your code to this:
def row(line):
    result = [0] * len(line)
    result_place = 0
    for x in line:
        if x > 0:
            result[result_place] = x
            result_place += 1
    return result
Or even this:
def row(line):
    non_zero = [x for x in line if x > 0] # take non-zero values
    return non_zero + [0] * (len(line) - len(non_zero)) # pad with zeros

Loop not iterating fully

I'm creating a simple RPG as a learning experience. In my code I have an array of tiles that displays on a 25x25 grid just fine, and a separate array that contains the True/False values for whether each tile is solid. The latter is not working; in my code below I have put a print statement exactly where execution is not reaching, and I'm not quite sure what the problem is.
Also, the data for the level is simply a text file with a grid of 25x25 characters representing blocks.
def loadLevel(self, level):
    fyle = open("levels/" + level,'r')
    count = 0
    for lyne in fyle:
        if lyne.startswith("|"):
            dirs = lyne.split('|')
            self.north = dirs[1]
            self.south = dirs[2]
            self.east = dirs[3]
            self.west = dirs[4]
            continue
        for t in range(25):
            tempTile = Tiles.Tile()
            tempTile.value = lyne[t]
            tempTile.x = t
            tempTile.y = count
            self.levelData.append(tempTile)
        count += 1
    rowcount = 0
    colcount = 0
    for rows in fyle:
        print('Doesnt get here!')
        for col in rows:
            if col == 2:
                self.collisionLayer[rowcount][colcount] = False
            else:
                self.collisionLayer[rowcount][colcount] = True
            colcount += 1
            print(self.collisionLayer[rowcount[colcount]])
        if rows == 2:
            self.collisionLayer[rowcount][colcount] = False
        else:
            self.collisionLayer[rowcount][colcount] = True
        rowcount += 1
    print(self.collisionLayer)
Where exactly is the problem? I feel as though it is a quick fix but I'm simply not seeing it. Thanks!
You read through the file once with your first for loop, so there isn't anything left to read for the second loop. Seek back to the beginning of the file before starting the second loop:
fyle.seek(0)
Although I'd just cache the lines as a list, if possible:
with open('filename.txt', 'r') as handle:
    lines = list(handle)
Also, you can replace this:
if rows == 2:
    self.collisionLayer[rowcount][colcount] = False
else:
    self.collisionLayer[rowcount][colcount] = True
With:
self.collisionLayer[rowcount][colcount] = rows != 2
The loop:
for lyne in fyle:
... reads all of fyle and leaves nothing to be read by the loop:
for rows in fyle:
I think you just need to reopen the file. If I recall correctly, Python will just keep going from where you left off; if there is nothing left, it can't read anything.
You can either reopen it or use fyle.seek(0) to go back to the first character of the first line.
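The exhaustion behaviour described in these answers can be reproduced without a real file. The sketch below uses io.StringIO as a stand-in for the level file (an assumption made to keep the example self-contained); the function name read_twice is hypothetical.

```python
import io


def read_twice(handle):
    """Iterate a file-like object twice, with and without rewinding."""
    first_pass = [line.rstrip("\n") for line in handle]
    # The read position is now at the end, so this loop sees nothing
    second_pass_no_seek = [line.rstrip("\n") for line in handle]
    # Rewinding puts the read position back at the first character
    handle.seek(0)
    second_pass_after_seek = [line.rstrip("\n") for line in handle]
    return first_pass, second_pass_no_seek, second_pass_after_seek


fake_level = io.StringIO("ab\ncd\n")
print(read_twice(fake_level))
```

Caching the lines in a list once, as suggested above, avoids the rewind entirely and lets both loops share the same data.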
