Python: looping faster using inbuilt functions

I took a Python test where I had to code a function to solve the problem below. It passed some test cases but exceeded the time limit on others. The function feels bloated. How can I make it faster?
Here is the problem:
A truck fleet dispatcher is trying to determine which routes are still accessible after heavy rains flood certain highways. During their trips, trucks must follow linear, ordered paths between 26 waypoints labeled A through Z; in other words, they must traverse waypoints in either standard or reverse alphabetical order.
The only data the dispatcher can use is the trip logbook, which contains a record of the recent successful trips. The logbook is represented as a list of strings, where each string (corresponding to one entry) has two characters corresponding to the trip origin and destination waypoints respectively. If the logbook contains a record of a successful trip between two points, it can be assumed that all of the waypoints between those points are also accessible. Note that logbook entries imply that both directions of the traversal are valid. For example, an entry of RP means that trucks can move along both R --> Q --> P and P --> Q --> R.
Given an array of logbook entries, your task is to write a function to return the length of the longest consecutive traversal possible; in other words, compute the maximum number of consecutive edges known to be safe.
Example
For logbook = ["BG", "CA", "FI", "OK"], the output should be solution(logbook) = 8.
Because we can get both from A to C and from B to G, we can thus get from A to G. Because we can also get from F to I, and F lies within A..G, the covered stretches overlap, so we can traverse A --> I. This corresponds to a traversal length of 8, since 8 edges connect these 9 waypoints. O through K is a length-4 traversal. These two paths are disjoint, so no longer consecutive traversal can be found and the answer is 8.
Conditions:
The run time should be less than 4 seconds
def solution(logbook):
    tpf = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    tpf_list = list(tpf)
    flag = 0
    longest_route = [0 for x in range(26)]
    for i in logbook:
        p = sorted(i)
        st_idx = p[0]
        ed_idx = p[1]
        # Mark every edge covered by this entry with a scan over all 26 letters
        for j in range(26):
            if tpf_list[j] == st_idx:
                flag = 1
            if flag == 1 and tpf_list[j] != ed_idx:
                longest_route[j] = 1
            if flag == 1 and tpf_list[j] == ed_idx:
                flag = 0
    # Find the longest run of marked edges
    summ = 1
    list2 = []
    for i in range(25):
        if longest_route[i] == 0:
            summ = 1
        if longest_route[i] == 1 and longest_route[i+1] == 1:
            summ += 1
        if longest_route[i] == 1 and longest_route[i+1] == 0:
            list2.append(summ)
    return max(list2)
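For reference, one way to trim the work: the inner scan over all 26 letters only exists to locate the start and end positions, which ord() gives directly, and the final pass can track the longest run of covered edges as it goes. A minimal sketch of that idea (my own, not the test's reference solution; it also returns 0 instead of crashing on an empty logbook):

def solution(logbook):
    covered = [False] * 25  # covered[j]: the edge between waypoints j and j+1 is safe
    for entry in logbook:
        lo, hi = sorted(entry)
        for j in range(ord(lo) - ord('A'), ord(hi) - ord('A')):
            covered[j] = True
    best = run = 0
    for edge in covered:
        run = run + 1 if edge else 0
        best = max(best, run)
    return best

On the example, ["BG", "CA", "FI", "OK"] marks edges 0-7 and 10-13, and the longest run is 8, as expected.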

Related

Strictly Increasing Path in Grid with Python

At each step we can move to one of the left, right, up, or down cells, and only if that cell is strictly greater than our current cell. (We cannot move diagonally.) We want to find all the paths we can take from the top-left cell to the bottom-right cell.
[[1,4,3],
[5,6,7]]
In our example, these paths are 1->4->6->7 and 1->5->6->7.
How can I find all such paths?
First, note that the number of such paths is exponential (in the size of the graph).
For example, look at the graph:
1 2 3 ... n
2 3 4 ... n+1
3 4 5 ... n+2
...
n (n+1) (n+2) ... 2n
Here, every path consists of roughly n moves right and n moves down, which gives an exponential number of choices. That makes an "efficient" solution irrelevant (as you stated, you are looking for all paths).
One approach to find all solutions, is using a variation of DFS, where you can go to a neighbor cell if and only if it's strictly higher.
It should look something like:
def Neighbors(grid, current_location):
    """Returns a list of up to 4 tuples: the cells adjacent to current_location."""
    r, c = current_location
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates
            if 0 <= i < len(grid) and 0 <= j < len(grid[i])]

# Note: using a list as a default value is problematic (it is shared between
# calls), so the default here is None and the list is created inside.
def PrintPaths(grid, current_path=None, current_location=(0, 0)):
    """Will print all paths with strictly increasing values."""
    if current_path is None:
        current_path = [(0, 0)]
    # Stop clause: if we got to the end - print the current path.
    if current_location == (len(grid) - 1, len(grid[-1]) - 1):
        print(current_path)
        return
    # Iterate over all valid neighbors:
    for nxt in Neighbors(grid, current_location):
        if grid[current_location[0]][current_location[1]] < grid[nxt[0]][nxt[1]]:
            current_path.append(nxt)
            PrintPaths(grid, current_path, nxt)
            # When getting back here, all paths that go through prefix + nxt
            # have been printed, so remove nxt from the current path.
            current_path.pop()
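For instance, on the example grid from the question (with the Neighbors implementation sketched above), this prints both paths:

grid = [[1, 4, 3],
        [5, 6, 7]]
PrintPaths(grid)
# Prints (order depends on the neighbor ordering):
# [(0, 0), (1, 0), (1, 1), (1, 2)]
# [(0, 0), (0, 1), (1, 1), (1, 2)]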

Indegrees in topological sort to solve CourseSchedule with Kahn's algorithm

I am learning to solve a topological sort problem on LeetCode:
There are a total of n courses you have to take, labeled from 0 to n-1.
Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]
Given the total number of courses and a list of prerequisite pairs, is it possible for you to finish all courses?
Example 1:
Input: 2, [[1,0]]
Output: true
Explanation: There are a total of 2 courses to take.
To take course 1 you should have finished course 0. So it is possible.
Example 2:
Input: 2, [[1,0],[0,1]]
Output: false
Explanation: There are a total of 2 courses to take.
To take course 1 you should have finished course 0, and to take course 0 you should
also have finished course 1. So it is impossible.
Note:
The input prerequisites is a graph represented by a list of edges, not adjacency matrices. Read more about how a graph is represented.
You may assume that there are no duplicate edges in the input prerequisites.
I read the following toposort solution in discussion area
from collections import defaultdict

class Solution5:
    def canFinish(self, numCourses, prerequirements):
        """
        :type numCourses: int
        :type prerequirements: List[List[int]]
        :rtype: bool
        """
        if not prerequirements:
            return True
        count = []
        in_degrees = defaultdict(int)
        graph = defaultdict(list)
        for u, v in prerequirements:
            graph[v].append(u)
            in_degrees[u] += 1  # Confused here
        queue = [u for u in graph if in_degrees[u] == 0]
        while queue:
            s = queue.pop()
            count.append(s)
            for v in graph[s]:
                in_degrees[v] -= 1
                if in_degrees[v] == 0:
                    queue.append(v)
        # check whether a cycle exists
        for u in in_degrees:
            if in_degrees[u]:
                return False
        return True
I am confused about in_degrees[u] += 1
for u, v in prerequirements:
    graph[v].append(u)
    in_degrees[u] += 1  # Confused here
For a directed edge (u, v), u -----> v, node u gains one outdegree while node v gains one indegree.
So I think in_degrees[u] += 1 should be changed to in_degrees[v] += 1,
because if the edge (u, v) exists, then v has at least one incoming edge, i.e., an indegree of at least one.
In Degree: This is applicable only for directed graph. This represents the number of edges incoming to a vertex.
However, the original solution works.
What's the problem with my understanding?
Look at the line above it: graph[v].append(u). The edges actually go in the reverse direction to your assumption and to the input format. This is because, for a topological sort, we want the items with no dependencies/incoming edges to end up at the front of the resulting order, so we direct the edges according to the interpretation "is a requirement for" rather than "requires". E.g., the input pair (0, 1) means 0 requires 1, so in the graph we draw a directed edge (1, 0) so that 1 can precede 0 in our sort. Thus 0 gains indegree from this pair.
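To make that concrete, here is a minimal trace (my own sketch) of the first loop on Example 1's input:

from collections import defaultdict

graph = defaultdict(list)
in_degrees = defaultdict(int)
for u, v in [[1, 0]]:        # "to take course 1 you must first take course 0"
    graph[v].append(u)       # edge 0 -> 1: course 0 "is a requirement for" course 1
    in_degrees[u] += 1       # course 1 gains one incoming edge

print(dict(graph))        # {0: [1]}
print(dict(in_degrees))   # {1: 1}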

Calculate height of an arbitrary (non-binary) tree

I'm currently taking on online data structures course and this is one of the homework assignments; please guide me towards the answer rather than giving me the answer.
The prompt is as follows:
Task. You are given a description of a rooted tree. Your task is to compute and output its height. Recall that the height of a (rooted) tree is the maximum depth of a node, or the maximum distance from a leaf to the root. You are given an arbitrary tree, not necessarily a binary tree.
Input Format. The first line contains the number of nodes n. The second line contains integer numbers from −1 to n−1 parents of nodes. If the i-th one of them (0 ≤ i ≤ n−1) is −1, node i is the root, otherwise it’s 0-based index of the parent of i-th node. It is guaranteed that there is exactly one root. It is guaranteed that the input represents a tree.
Constraints. 1 ≤ n ≤ 10^5.
My current solution works, but is very slow when n > 10^2. Here is my code:
# python3
import sys
import threading

# In Python, the default limit on recursion depth is rather low,
# so raise it here for this problem. Note that to take advantage
# of the bigger stack, we have to launch the computation in a new thread.
sys.setrecursionlimit(10**7)  # max depth of recursion
threading.stack_size(2**27)   # new thread will get a stack of this size

# returns all indices of item in seq
def listOfDupes(seq, item):
    start = -1
    locs = []
    while True:
        try:
            loc = seq.index(item, start + 1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start = loc
    return locs

def compute_height(node, parents):
    if node not in parents:
        return 1
    else:
        return 1 + max(compute_height(i, parents) for i in listOfDupes(parents, node))

def main():
    n = int(input())
    parents = list(map(int, input().split()))
    print(compute_height(parents.index(-1), parents))

threading.Thread(target=main).start()
Example input:
>>> 5
>>> 4 -1 4 1 1
This will yield a solution of 3, because the root is 1, 3 and 4 branch off of 1, then 0 and 2 branch off of 4 which gives this tree a height of 3.
How can I improve this code to get it under the time benchmark of 3 seconds? Also, would this have been easier in another language?
Python will be fine as long as you get the algorithm right. Since you're only looking for guidance, consider:
1) We know the depth of a node if and only if we know the depth of its parent; and
2) We're not interested in the tree's structure, so we can throw irrelevant information away.
The root node pointer has the value -1. Suppose that we replaced its children's pointers to the root node with the value -2, their children's pointers with -3, and so forth. The greatest absolute value of these is the height of the tree.
If we traverse the tree from an arbitrary node N(0), we can stop as soon as we encounter a negative value at some node N(k), at which point we can replace each node on the path with the value of its parent, less one: N(k-1) = N(k) - 1, N(k-2) = N(k-1) - 1, ..., N(0) = N(1) - 1. As more and more pointers are replaced by their depth, each traversal is more likely to terminate early by encountering a node whose depth is already known. In fact, this algorithm takes essentially linear time.
So: load your data into an array, start with the first element and traverse the pointers until you encounter a negative value. Build another array of the nodes traversed as you go. When you encounter a negative value, use the second array to replace the original values in the first array with their depth. Do the same with the second element and so forth. Keep track of the greatest depth you've encountered: that's your answer.
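A minimal sketch of that marking scheme (my own illustration, not the course's reference solution; it reuses the root's -1 marker as "depth 1"):

def tree_height(parents):
    depths = parents[:]  # a negative value -d at index i means "node i has depth d"
    best = 1
    for start in range(len(depths)):
        # Walk towards the root until we hit a node whose depth is known.
        path = []
        node = start
        while depths[node] >= 0:
            path.append(node)
            node = depths[node]
        depth = -depths[node]
        # Walk back down, filling in each node's depth as its parent's depth + 1.
        for n in reversed(path):
            depth += 1
            depths[n] = -depth
        best = max(best, depth)
    return best

print(tree_height([4, -1, 4, 1, 1]))  # 3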
The structure of this question looks like it would be better solved bottom up rather than top down. Your top-down approach spends time seeking, which is unnecessary, e.g.:
def height(tree):
    for n in tree:
        l = 1
        while n != -1:
            l += 1
            n = tree[n]
        yield l
In []:
tree = '4 -1 4 1 1'
max(height(list(map(int, tree.split()))))
Out[]:
3
Or if you don't like a generator:
def height(tree):
    d = [1]*len(tree)
    for i, n in enumerate(tree):
        while n != -1:
            d[i] += 1
            n = tree[n]
    return max(d)
In []:
tree = '4 -1 4 1 1'
height(list(map(int, tree.split())))
Out[]:
3
The above is brute force, since it doesn't take advantage of reusing the parts of the tree you've already visited; it shouldn't be too hard to add that.
Your algorithm spends a lot of time searching the input for the locations of numbers. If you just iterate over the input once, you can record the locations of each number as you come across them, so you don't have to keep searching over and over later. Consider what data structure would be effective for recording this information.
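For instance (one possible data structure, shown as a hint rather than a full solution), a dict mapping each parent to its children can be built in a single pass:

from collections import defaultdict

def build_children(parents):
    # One pass over the input: children[p] lists the nodes whose parent is p.
    children = defaultdict(list)
    for node, parent in enumerate(parents):
        children[parent].append(node)
    return children

children = build_children([4, -1, 4, 1, 1])
# children[-1] == [1] (the root); children[1] == [3, 4]; children[4] == [0, 2].
# compute_height can now look up children[node] directly instead of rescanning
# the whole parents list with listOfDupes.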

Creating lists of mutual neighbor elements

Say, I have a set of unique, discrete parameter values, stored in a variable 'para'.
para=[1,2,3,4,5,6,7,8,9,10]
Each element in this list has 'K' number of neighbors (given: each neighbor ∈ para).
EDIT: This 'K' is obviously not the same for each element.
And to clarify the actual size of my problem: I need a neighborhood of close to 50-100 neighbors on average, given that my para list is around 1000 elements large.
NOTE: A neighbor of an element, is another possible 'element value' to which it can jump, by a single mutation.
neighbors_of_1 = [2,4,5,9] #contains all possible neighbors of 1 (i.e para[0])
Question: How can I define each of the other elements' neighbors randomly from 'para', while keeping the previously assigned neighbors/relations in mind?
eg:
neighbors_of_5=[1,3,7,10] #contains all possible neighbors of 5 (i.e para[4])
NOTE: '1' has been assigned as a neighbor of '5', keeping the values of 'neighbors_of_1' in mind. They are 'mutual' neighbors.
I know the inefficient way of doing this would be to keep looping through the previously assigned lists, checking if the current state is a neighbor of another state, and if True, storing the value of that state as one of the new neighbors.
Is there a cleaner/more pythonic way of doing this? (By maybe using the concept of linked-lists or any other method? Or are lists redundant?)
This solution does what you want, I believe. It is not the most efficient, as it generates quite a lot of extra elements and data, but the run time was still short on my computer, and I assume you won't run this repeatedly in a tight inner loop?
import itertools as itools
import random

# Generating a random para variable:
# para = [1,2,3,4,5,6,7,8,9,10]
para = list(range(10000))
random.shuffle(para)
para = para[:1000]

# Generate all pairs in para (in random order)
pairs = [(a, b) for a, b in itools.product(para, para) if a < b]
random.shuffle(pairs)

K = 50               # average number of neighbors
N = len(para)*K//2   # total connections

# Generating a neighbors dict, holding all the neighbors of an element
neighbors = dict()
for elem in para:
    neighbors[elem] = []

# Append the neighbors to each other
for pair in pairs[:N]:
    neighbors[pair[0]].append(pair[1])
    neighbors[pair[1]].append(pair[0])

# Sort each neighbor list
for neighbor in neighbors.values():
    neighbor.sort()
I hope you understand my solution. Otherwise feel free to ask for a few pointers.
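As a quick sanity check of the mutuality property the question asks for (using the neighbors dict built above):

for elem, nbrs in neighbors.items():
    for other in nbrs:
        assert elem in neighbors[other], "neighbor relation must be mutual"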
A neighborhood can be represented by a graph. If A being a neighbor of B does not necessarily imply that B is a neighbor of A, the graph is directed; otherwise it is undirected. I'm guessing you want an undirected graph, since you want to "keep in mind the relationship between the nodes".
Besides the obvious choice of using a third-party library for graphs, you can solve your issue by using a set of edges between the graph vertices. An edge can be represented by the pair of its two endpoints. Since edges are undirected, you can either use a tuple (A, B) such that A < B, or a frozenset((A, B)).
Note there are considerations about which neighbor to choose randomly in the middle of the algorithm, such as discouraging picking nodes that already have many neighbors, to avoid going over your limits.
Here is a pseudo-code of what I'd do.
import random

edges = set()
arities = [0 for p in para]
for i in range(len(para)):
    n = random.randrange(50, 100)  # desired number of neighbors for this node
    k = n
    while k > 0:
        # Weight candidates inversely to their current arity (+1 avoids
        # division by zero); note: test what random scheme suits you best.
        w = [1 / (a + 1) for a in arities]
        j = random.choices(range(len(para)), weights=w)[0]
        # Note: I'm storing the vertices' indices in the edges rather than
        # the nodes. But if the nodes are unique, you could store the nodes.
        e = frozenset((i, j))
        if i != j and e not in edges:
            edges.add(e)
            # Instead of arities, you could keep a list of lists of the
            # neighbours; arities[i] would then be len(neighbors[i]).
            arities[i] += 1
            arities[j] += 1
        k -= 1

k-greatest double selection

Imagine you have two sacks (A and B) with N and M balls respectively in them, each ball with a known numeric value (profit). You are asked to extract (with replacement) the pair of balls with the maximum total profit, given by the product of the two selected balls' values.
The best extraction is obvious: Select the greatest valued ball from A as well as from B.
The problem comes when you are asked to give the 2nd or kth best selection. Following the previous approach you should select the greatest valued balls from A and B without repeating selections.
This can be solved clumsily by calculating the value of every possible selection and sorting the results (example in Python):
def solution(A, B, K):
    if K < 1:
        return 0
    pool = []
    for a in A:
        for b in B:
            pool.append(a*b)
    pool.sort(reverse=True)
    if K > len(pool):
        return 0
    return pool[K-1]
This works, but its worst-case time complexity is O(N*M*log(N*M)), and I bet there are better solutions.
I reached a solution based on a table in which the elements of A and B are sorted from highest to lowest value, and each value has an associated index representing the next value to test from the other column. Initially this table would look like:
[table image from the original post]
The first element from A is 25, and it has to be tested (index 2, select from B = 0) against 20, so 25*20 = 500 is the first greatest selection. After increasing the indexes to check, the table changes to:
[table image from the original post]
Using these indexes we have a swift way to get the best selection candidates:
25 * 20 = 500 #first from A and second from B
20 * 20 = 400 #second from A and first from B
I tried to code this solution:
def solution(A, B, K):
    if K < 1:
        return 0
    sa = sorted(A, reverse=True)
    sb = sorted(B, reverse=True)
    for k in xrange(K):
        i = xfrom
        j = yfrom
        if i >= n and j >= n:
            ret = 0
            break
        best = None
        while i < n and j < n:
            selected = False
            # From left
            nexti = i
            nextj = sa[i][1]
            a = sa[nexti][0]
            b = sb[nextj][0]
            if best is None or best[2] < a*b:
                selected = True
                best = [nexti, nextj, a*b, 'l']
            # From right
            nexti = sb[j][1]
            nextj = j
            a = sa[nexti][0]
            b = sb[nextj][0]
            if best is None or best[2] < a*b:
                selected = True
                best = [nexti, nextj, a*b, 'r']
            # Keep looking?
            if not selected or abs(best[0]-best[1]) < 2:
                break
            i = min(best[:2]) + 1
            j = i
            print("Continue with: ", best, selected, i, j)
        # go, go, go
        print(best)
        if best[3] == 'l':
            dx[best[0]][1] = best[1] + 1
            dy[best[1]][1] += 1
        else:
            dx[best[0]][1] += 1
            dy[best[1]][1] = best[0] + 1
        if dx[best[0]][1] >= n:
            xfrom = best[0] + 1
        if dy[best[1]][1] >= n:
            yfrom = best[1] + 1
        ret = best[2]
    return ret
But it did not work for the online Codility judge. (Did I mention this is part of the solution to an already-expired Codility challenge, Sillicium 2014?)
My questions are:
Is the second approach an unfinished good solution? If that is the case, any clue on what I may be missing?
Do you know any better approach for the problem?
You need to maintain a priority queue.
You start with (sa[0], sb[0]), then move on to (sa[0], sb[1]) and (sa[1], sb[0]). If (sa[0] * sb[1]) > (sa[1] * sb[0]), can we say anything about the comparative sizes of (sa[0], sb[2]) and (sa[1], sb[0])?
The answer is no. Thus we must maintain a priority queue: after removing each (sa[i], sb[j]) (such that sa[i] * sb[j] is the biggest in the queue), we must add (sa[i + 1], sb[j]) and (sa[i], sb[j + 1]) to the priority queue, and repeat this k times.
Incidentally, I gave this algorithm as an answer to a different question. The algorithm may seem to be different at first, but essentially it's solving the same problem.
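A minimal sketch of that priority-queue idea (my own illustration, assuming non-negative values and 1 <= k <= len(A)*len(B); kth_largest_product is a hypothetical name, not from the challenge):

import heapq

def kth_largest_product(A, B, k):
    sa = sorted(A, reverse=True)
    sb = sorted(B, reverse=True)
    # Max-heap via negated products, seeded with the best pair (0, 0).
    heap = [(-(sa[0] * sb[0]), 0, 0)]
    seen = {(0, 0)}
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(sa) and nj < len(sb) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(sa[ni] * sb[nj]), ni, nj))
    return -heap[0][0]

print(kth_largest_product([2, 3, 5], [4, 8], 2))  # 24, since products sorted are 40, 24, 20, ...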
I'm not sure I understand the "with replacement" bit...
...but assuming this is in fact the same as "How to find pair with kth largest sum?", then the key to the solution is to consider the matrix S of all the sums (or products, in your case), constructed from A and B (once they are sorted) -- this paper (referenced by @EvgenyKluev) gives this clue.
(You want A*B rather than A+B... but the answer is the same -- though negative numbers complicate matters, they (I think) do not invalidate the approach.)
An example shows what is going on:
for A = (2, 3, 5, 8, 13)
and B = (4, 8, 12, 16)
we have the (notional) array S, where S[r, c] = A[r] + B[c], in this case:
6 ( 2+4), 10 ( 2+8), 14 ( 2+12), 18 ( 2+16)
7 ( 3+4), 11 ( 3+8), 15 ( 3+12), 19 ( 3+16)
9 ( 5+4), 13 ( 5+8), 17 ( 5+12), 21 ( 5+16)
12 ( 8+4), 16 ( 8+8), 20 ( 8+12), 24 ( 8+16)
17 (13+4), 21 (13+8), 25 (13+12), 29 (13+16)
(As the referenced paper points out, we don't need to construct the array S, we can generate the value of an item in S if or when we need it.)
The really interesting thing is that each column of S contains values in ascending order (of course), so we can extract the values from S in descending order by doing a merge of the columns (reading from the bottom).
Of course, merging the columns can be done using a priority queue (heap) -- hence the max-heap solution. The simplest approach is to start the heap with the bottom row of S, marking each heap item with the column it came from. Then pop the top of the heap, and push the next item from the same column as the one just popped, until you pop the kth item. (Since the bottom row is sorted, it is a trivial matter to seed the heap with it.)
The complexity of this is O(k log n) -- where 'n' is the number of columns. The procedure works equally well if you process the rows instead... so if there are 'm' rows and 'n' columns, you can choose the smaller of the two!
NB: the complexity is not O(k log k)... and since for a given pair of A and B the 'n' is constant, O(k log n) is really O(k)!
If you want to do many probes for different 'k', then the trick might be to cache the state of the process every now and then, so that future 'k's can be done by restarting from the nearest check-point. In the limit, one would run the merge to completion and store all possible values, for O(1) lookup !
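To make the column-merge concrete, here is a small sketch (my own, using the sums version to match the example above; kth_largest_sum is a hypothetical name, and 1 <= k <= len(A)*len(B) is assumed):

import heapq

def kth_largest_sum(A, B, k):
    A = sorted(A)  # rows, ascending
    B = sorted(B)  # columns
    # Seed the max-heap with the bottom row of S: A's largest paired with each B.
    heap = [(-(A[-1] + b), len(A) - 1, c) for c, b in enumerate(B)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, r, c = heapq.heappop(heap)
        if r > 0:  # push the next item up the same column
            heapq.heappush(heap, (-(A[r - 1] + B[c]), r - 1, c))
    return -heap[0][0]

print(kth_largest_sum([2, 3, 5, 8, 13], [4, 8, 12, 16], 3))  # 24 (after 29 and 25)

The heap never holds more than n items at a time, which is where the O(k log n) bound comes from.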
