I have a list of elements that share a certain attribute. First I need to check whether all elements have the same value for that attribute; if not, I need to set the attribute of every element to the highest value.
Writing such a loop is not difficult, but I would like to know the most efficient way to do it.
My current approach works fine, but it loops through the list twice, uses two local variables, and just does not feel efficient enough.
I simplified the code. This is basically what I've got:
biggest_value = 0
re_calc = 0
for element in element_list:
    if element.value > biggest_value:
        biggest_value = element.value
        re_calc += 1
if re_calc > 1:
    for i, element in enumerate(element_list):
        element.value = adjust_value(biggest_value)
        element_list[i] = element
The thing annoying me is the necessity of the "re_calc" variable. A simple check for the biggest value is no big deal. But this task consists of three steps:
"Compare Attributes --> Find Biggest Value --> Possibly Adjust Others". However, I do not want to loop over this list three times, and not even twice as my current attempt does.
There has to be a more efficient way. Any ideas? Thanks in advance.
The first loop just determines the largest value in element_list. So an approach can be:
transform element_list into a NumPy array. Unfortunately you do not say what the list looks like, but if it contains numbers, then
L = np.array(element_list)
can probably do it.
After that, use np.max(L). NumPy operations without explicit for loops are usually much faster.
import numpy as np

nl = 10
L = np.random.rand(nl)    # example data: 10 random numbers
biggest_value = np.max(L)
L, biggest_value
gives
(array([0.70047074, 0.14160459, 0.75061621, 0.89013494, 0.70587705,
0.50218377, 0.31197993, 0.42670057, 0.67869183, 0.04415816]),
0.8901349369179461)
In the second for loop it is not obvious what you want to achieve. Unfortunately you give neither an input nor a desired output, and you do not say what adjust_value has to do. A minimal running example with data would make it easier to help.
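If the list holds plain Python objects rather than bare numbers, the built-ins max() and min() can do the check without NumPy. A minimal sketch, where Element and adjust_value are stand-ins for the poster's own class and function:

class Element(object):
    def __init__(self, value):
        self.value = value

def adjust_value(biggest):          # stand-in for the poster's function
    return biggest

element_list = [Element(3), Element(7), Element(5)]  # hypothetical data

values = [e.value for e in element_list]
biggest_value = max(values)
if min(values) != biggest_value:    # the values differ, so adjust every element
    for e in element_list:
        e.value = adjust_value(biggest_value)

print([e.value for e in element_list])  # -> [7, 7, 7]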
I have the following situation:
I am generating combinations of size 3, made from n values. Each kth combination [0...n] is pulled from a pool of values located at the kth index of a list of n sets. Each value can appear 3 times. So if I have 10 values, I have a list of size 10, and each index holds a set of values 0-10.
So it seems to me that a good way to do this is to have something keeping count of all the available values across all the sets. Then, if a value is rare (let's say there is only 1 left) and I had a structure where I could look up the rarest value and have the structure tell me which index it is located in, generating the possible combinations would be much easier.
How could I do this? What about one structure to keep count of elements, and a dictionary to keep track of the list indices that contain each value?
edit: I should add the specific problem I am looking to solve here: how to update the set at every index of the list (or whatever other structures I end up using), so that when I use a value 3 times, it becomes unavailable for every other combination.
Thank you.
Another edit:
It seems this may be too abstract to ask for solutions when it's hard to understand what I am even asking for. I will come back with some code soon; please check back in 1.5-2 hours if you are interested.
"how to update the set at every index of the list (or whatever other structures I end up using), so that when I use a value 3 times, it becomes unavailable for every other combination"
I assume you want to sample the values truly randomly, right? What if you put 3 of each value into a list, shuffle it with random.shuffle, and then just keep popping values from the end of the list when you're building your combination? If I'm understanding your problem right, here's example code:
from random import shuffle

valid_values = list(range(10))  # the valid values are 0 through 9 in my example, update accordingly for yours
vals = 3 * valid_values         # I have 3 of each valid value
shuffle(vals)                   # randomly shuffle them
while vals:
    combination = (vals.pop(), vals.pop(), vals.pop())  # combinations are 3 values
    print(combination)
EDIT: Updated code based on the added information that you have sets of values (but this still assumes you can use more than one value from a given set):
from random import shuffle

my_sets_of_vals = [......]  # list of sets (fill in your data)
valid_values = [(i, val)
                for i, val_set in enumerate(my_sets_of_vals)
                for val in val_set]
vals = 3 * valid_values  # I have 3 of each (index, value) pair
shuffle(vals)            # randomly shuffle them
while vals:
    combination = (vals.pop()[1], vals.pop()[1], vals.pop()[1])  # combinations are 3 values
    print(combination)
Based on the edit, you could make an object for each value that holds both the element itself and the number of times you have used it. When you find you have used an element three times, remove it from the list.
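A minimal sketch of that counting idea, using collections.Counter instead of a custom object (the pool of values here is hypothetical):

from collections import Counter

remaining = Counter({v: 3 for v in range(10)})  # hypothetical pool: each value usable 3 times

def use_value(v):
    """Consume one use of v; drop it once all 3 uses are spent."""
    remaining[v] -= 1
    if remaining[v] == 0:
        del remaining[v]  # now unavailable for every other combination

use_value(4); use_value(4); use_value(4)  # use the value 3 times
print(4 in remaining)  # -> False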
I often find myself writing code like:
mylist = [247]
while mylist:
    nextlist = []
    for element in mylist:
        print element
        if element % 2 == 0:
            nextlist.append(element / 2)
        elif element != 1:
            nextlist.append(3 * element + 1)
    mylist = nextlist
Okay - it's generally not this simple (and usually it really is with long lists; I just chose this, see xkcd, for fun), but the pattern is the same: I create a list and iterate over it, doing things with those elements. While doing this, I discover new things that I will need to iterate over, so I put them into a new list and then iterate over that.
It appears to be possible to write:
mylist = [247]
for element in mylist:
    print element
    if element % 2 == 0:
        mylist.append(element / 2)
    elif element != 1:
        mylist.append(element * 3 + 1)
I know that it's considered dangerous to modify a list while iterating over it, but in this case I want to iterate over the new elements.
Are there dangers from doing this? The only one I can think of is that the list may grow and take up a lot of memory (in many of my cases I actually want to have the whole list at the end). Are there others I'm ignoring?
Please note: Python: Adding element to list while iterating is related, but explains ways to create a copy of the list so that we can avoid iterating over the original. I'm asking about whether there is anything wrong in my specific case where I actually want my iteration to be extended.
edit: here is something closer to the real problem. Say we want to generate the "k-core" of a network: delete all nodes with degree less than k; from the remaining network, delete all nodes with degree less than k; repeat until there are none left to delete. The algorithm finds all nodes with degree less than k to begin with and puts them in a to_delete list. Then, as nodes are deleted, any neighbor whose degree drops to k-1 is added to the list. This could be done by:
delete_list = [node for node in G.nodes() if G.degree(node) < k]
for node in delete_list:
    nbrs = G.neighbors(node)
    for nbr in nbrs:
        if G.degree(nbr) == k:
            delete_list.append(nbr)
    G.remove_node(node)
Yes, it's fairly safe to append to a list you're iterating over, at least in the way that you're doing it. The only issue would be if the list grew so large that it caused memory issues, though that's only going to be an issue for you with very large numbers.
That said, I would probably use a while loop in this case, whether or not you want to have the entire list at the end.
current = 247
result_list = [current]
while current != 1:
    if current % 2 == 0:
        current //= 2
    else:
        current = current * 3 + 1
    result_list.append(current)
Though really I would probably use a generator.
def collatz(start):
    current = start
    yield current
    while current != 1:
        if current % 2 == 0:
            current //= 2
        else:
            current = current * 3 + 1
        yield current
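A usage sketch for the generator; the expected output is just the Collatz sequence starting at 6:

seq = list(collatz(6))   # consume the whole generator at once
print(seq)               # -> [6, 3, 10, 5, 16, 8, 4, 2, 1]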
Shout-out to the Collatz conjecture! :D
As it's (currently) implemented, yes; as it's specified, no.
That means it's a risky idea to modify the list while iterating through it and to rely on that behaviour remaining. One could of course argue that there is no reason why the behaviour would change in this case, but that is relying on the assumption that changes need a reason to happen.
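For example, in current CPython the loop does pick up appended elements; this is an observation about the implementation, not a promise from the language specification:

items = [1]
for x in items:          # the iteration keeps going as the list grows
    if len(items) < 5:
        items.append(x + 1)
print(items)             # -> [1, 2, 3, 4, 5]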
Here is merge sort logic in Python (this is the first part; ignore the function merge()). The point in question is converting the recursive logic to a while loop.
Code courtesy: Rosettacode Merge Sort
def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
Is it possible to do this dynamically in a while loop, where a pointer keeps increasing based on the number of left and right arrays, breaking them apart until only single-length lists remain?
Because at every split, going down both the left and right sides, the array keeps breaking down, so the number of left-side (left-left, left-right) and right-side (right-left, right-right) pieces keeps increasing until every list reaches size 1.
One possible implementation might be this:
def merge_sort(m):
    l = [[x] for x in m]                 # split each element into its own list
    while len(l) > 1:                    # while there's merging to be done
        for x in range(len(l) >> 1):     # take the first len/2 lists
            l[x] = merge(l[x], l.pop())  # and merge with the last len/2 lists
    return l[0] if len(l) else []
Stack frames in the recursive version are used to store progressively smaller lists that need to be merged. You correctly identified that at the bottom of the stack, there's a one-element list for each element in whatever you're sorting. So, by starting from a series of one-element lists, we can iteratively build up larger, merged lists until we have a single, sorted list.
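For a self-contained run of the above, heapq.merge (which merges sorted iterables) can stand in for the merge() function the question left undefined; a sketch:

from heapq import merge

def merge_sort(m):
    l = [[x] for x in m]                       # each element starts as its own list
    while len(l) > 1:                          # keep merging until one list remains
        for x in range(len(l) >> 1):           # first half of the lists...
            l[x] = list(merge(l[x], l.pop()))  # ...each merged with one from the back
    return l[0] if len(l) else []

print(merge_sort([5, 2, 4, 1, 3]))  # -> [1, 2, 3, 4, 5]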
Reposted from alternative to recursion based merge sort logic at the request of a reader:
One way to eliminate recursion is to use a queue to manage the outstanding work. For example, using the built-in collections.deque:
from collections import deque
from heapq import merge

def merge_sorted(iterable):
    """Return a list consisting of the sorted elements of 'iterable'."""
    queue = deque([i] for i in iterable)
    if not queue:
        return []
    while len(queue) > 1:
        queue.append(list(merge(queue.popleft(), queue.popleft())))
    return queue[0]
It's said that every recursive function can be written in a non-recursive manner, so the short answer is: yes, it's possible. One solution is the stack-based approach. When a recursive function invokes itself, it puts some context (its arguments and a return address) on the internal call stack, which isn't available to you. Essentially, what you need to do to eliminate recursion is write your own stack, and every time you would make a recursive call, push the arguments onto that stack.
For more information you can read this article, or refer to the section named 'Eliminating Recursion' in Robert Lafore's "Data Structures and Algorithms in Java" (although all the examples in this book are given in Java, it's pretty easy to grasp the main idea).
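To make that concrete for the merge sort above, here is one possible explicit-stack sketch (merge_sort_iter is a made-up name, and heapq.merge stands in for the omitted merge()):

from heapq import merge

def merge_sort_iter(m):
    # Phase 1: an explicit stack replaces the recursive calls;
    # each "call" is a slice still waiting to be split.
    work = [m]
    runs = []                     # the recursion's base cases: lists of length <= 1
    while work:
        part = work.pop()
        if len(part) <= 1:
            runs.append(part)
        else:
            mid = len(part) // 2
            work.append(part[:mid])
            work.append(part[mid:])
    # Phase 2: merge the runs pairwise until one sorted list remains.
    while len(runs) > 1:
        paired = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                paired.append(list(merge(runs[i], runs[i + 1])))
            else:
                paired.append(runs[i])   # odd one out carries over as-is
        runs = paired
    return runs[0] if runs else []

print(merge_sort_iter([5, 2, 4, 1, 3]))  # -> [1, 2, 3, 4, 5]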
Going with Dan's solution above and taking the advice on pop(), I still tried to eliminate the while loop and other not-so-Pythonic constructs. Here is the solution I would suggest:
PS: l = len
My doubt about Dan's solution: what if L.pop() and L[x] are the same element and a conflict is created, as in the case of an odd length after iterating over half the length of L?
def merge_sort(m):
    L = [[x] for x in m]  # split each element into its own list
    for x in xrange(l(L)):
        if x > 0:
            L[x] = merge(L[x - 1], L[x])
    return L[-1]
The academic discussion could go on, but I got my answer: an alternative to the recursive method.
I've done a lot of Googling, but haven't found anything, so I'm really sorry if I'm just searching for the wrong things.
I am writing an implementation of the word game Ghost for MIT's Introduction to Programming, assignment 5.
As part of this, I need to determine whether a string of characters is the start of any valid word. I have a list of valid words ("wordlist").
Update: I could use something that iterated through the list each time, such as Peter's simple suggestion:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
I previously had:
wordlist = [w for w in wordlist if w.startswith(word_fragment)]
(from here) to narrow the list down to the list of valid words that start with that fragment and consider it a loss if wordlist is empty. The reason that I took this approach was that I (incorrectly, see below) thought that this would save time, as subsequent lookups would only have to search a smaller list.
It occurred to me that this is going through each item in the original wordlist (38,000-odd words) checking the start of each. This seems silly when wordlist is ordered, and the comprehension could stop once it hits something that is after the word fragment. I tried this:
newlist = []
for w in wordlist:
    if w[:len(word_fragment)] > word_fragment:
        # Take advantage of the fact that the list is sorted
        break
    if w.startswith(word_fragment):
        newlist.append(w)
return newlist
but that is about the same speed, which I thought might be because list comprehensions run as compiled code?
I then thought that more efficient again would be some form of binary search in the list to find the block of matching words. Is this the way to go, or am I missing something really obvious?
Clearly it isn't really a big deal in this case, but I'm just starting out with programming and want to do things properly.
UPDATE:
I have since tested the below suggestions with a simple test script. While Peter's binary search/bisect would clearly be better for a single run, I was interested in whether the narrowing list would win over a series of fragments. In fact, it did not:
The totals for all strings "p", "py", "pyt", "pyth", "pytho" are as follows:
In total, Peter's simple test took 0.175472736359
In total, Peter's bisect left test took 9.36985015869e-05
In total, the list comprehension took 0.0499348640442
In total, Neil G's bisect took 0.000373601913452
The overhead of creating a second list etc. clearly took more time than searching the longer list. In hindsight, searching the full list each time was likely the better approach regardless, since the "reducing list" approach made the first run slower, and the first run is the worst-case scenario.
Thanks all for some excellent suggestions, and well done Peter for the best answer!!!
Generator expressions are evaluated lazily, so if you only need to determine whether or not your word is valid, I would expect the following to be more efficient since it doesn't necessarily force it to build the full list once it finds a match:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
Note that the lack of square brackets is important for this to work.
However this is obviously still linear in the worst case. You're correct that binary search would be more efficient; you can use the built-in bisect module for that. It might look something like this:
from bisect import bisect_left

def word_exists(wordlist, word_fragment):
    try:
        return wordlist[bisect_left(wordlist, word_fragment)].startswith(word_fragment)
    except IndexError:
        return False  # word_fragment is greater than all entries in wordlist
bisect_left runs in O(log(n)) so is going to be considerably faster for a large wordlist.
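For example, with a small hypothetical wordlist, the bisect-based word_exists above behaves like this:

words = sorted(["apple", "banana", "cherry"])
print(word_exists(words, "ban"))  # True: "banana" starts with "ban"
print(word_exists(words, "bat"))  # False: no word starts with "bat"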
Edit: I would guess that the example you gave loses out if your word_fragment is something really common (like 't'), in which case it probably spends most of its time assembling a large list of valid words, and the gain from only having to do a partial scan of the list is negligible. Hard to say for sure, but it's a little academic since binary search is better anyway.
You're right that you can do this more efficiently given that the list is sorted.
I'm building off of Peter's answer, which returns a single element. I see that you want all the words that start with a given prefix. Here's how you do that:
from bisect import bisect_left
wordlist[bisect_left(wordlist, word_fragment):
         bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))]
This returns the slice from your original sorted list.
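The second bisect bound works because bumping the fragment's last character yields the smallest string that sorts after every word with that prefix (this assumes word_fragment is non-empty). A small self-contained sketch:

from bisect import bisect_left

# Hypothetical mini wordlist; word_fragment must be non-empty.
wordlist = sorted(["ab", "abc", "abd", "ace", "b"])
word_fragment = "ab"
lo = bisect_left(wordlist, word_fragment)
hi = bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))
print(wordlist[lo:hi])  # -> ['ab', 'abc', 'abd']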
As Peter suggested, I would use the bisect module, especially if you're reading from a large file of words.
If you really need speed, you could make a daemon (How do you create a daemon in Python?) with a pre-processed data structure suited to the task.
I suggest you use a trie:
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries
There are many algorithms and data structures to index and search strings inside a text; some of them are included in the standard libraries, but not all of them. The trie data structure is a good example of one that isn't.

Let word be a single string and let dictionary be a large set of words. If we have a dictionary and we need to know if a single word is inside the dictionary, tries are a data structure that can help us. But you may be asking yourself, "Why use tries if sets and hash tables can do the same?" There are two main reasons:

Tries can insert and find strings in O(L) time (where L is the length of a single word). This is much faster than a set, and a bit faster than a hash table.

Sets and hash tables can only find words in a dictionary that match exactly the single word we are looking for; a trie lets us find words that differ by a single character, share a common prefix, have a character missing, etc.

Tries can be useful in TopCoder problems, but also have a great number of applications in software engineering. For example, consider a web browser. Do you know how the web browser can autocomplete your text or show you many possibilities of the text you could be writing? Yes, with a trie you can do it very fast. Do you know how a spell checker can verify that every word you type is in a dictionary? Again, a trie. You can also use a trie for suggesting corrections for words that are present in the text but not in the dictionary.
An example would be:

start = {'a': nodea, 'b': nodeb, 'c': nodec, ...}
nodea = {'a': nodeaa, 'b': nodeab, 'c': nodeac, ...}
nodeb = {'a': nodeba, 'b': nodebb, 'c': nodebc, ...}
etc.

Then if you want all the words starting with "ab", you would just traverse start['a']['b'], and that subtree would hold all the words you want.
To build it, you could iterate through your wordlist and, for each word, iterate through its characters, adding a new defaultdict where required.
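A minimal sketch of that construction, using nested defaultdicts (the three-word wordlist here is just an illustration):

from collections import defaultdict

def make_trie():
    return defaultdict(make_trie)   # children are created on first access

trie = make_trie()
for word in ["abc", "abd", "bce"]:  # hypothetical wordlist
    node = trie
    for ch in word:
        node = node[ch]             # walk/extend the path for this word
    node["end"] = True              # mark that a complete word stops here

# Every word starting with "ab" lives under trie['a']['b']:
subtree = trie["a"]["b"]
print(sorted(subtree))              # -> ['c', 'd']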
In case of binary search (assuming wordlist is sorted), I'm thinking of something like this:
wordlist = "ab", "abc", "bc", "bcf", "bct", "cft", "k", "l", "m"
fragment = "bc"
a, m, b = 0, 0, len(wordlist)-1
iterations = 0
while True:
if (a + b) / 2 == m: break # endless loop = nothing found
m = (a + b) / 2
iterations += 1
if wordlist[m].startswith(fragment): break # found word
if wordlist[m] > fragment >= wordlist[a]: a, b = a, m
elif wordlist[b] >= fragment >= wordlist[m]: a, b = m, b
if wordlist[m].startswith(fragment):
print wordlist[m], iterations
else:
print "Not found", iterations
It will find one matching word, or none; you will then have to look to the left and right of it to find other matching words. My algorithm might be incorrect, it's just a rough version of my thoughts.
Here's my fastest way to narrow wordlist down to a list of valid words starting with a given fragment:
sect() is a generator function that uses Peter's excellent idea to employ bisect, together with the islice() function:
from bisect import bisect_left
from itertools import islice
from time import clock

A, B = [], []
iterations = 5
repetition = 30

with open('words.txt') as f:
    wordlist = f.read().split()
wordlist.sort()
print 'wordlist[0:10]==', wordlist[0:10]

def sect(wordlist, word_fragment):
    lgth = len(word_fragment)
    for w in islice(wordlist, bisect_left(wordlist, word_fragment), None):
        if w[0:lgth] == word_fragment:
            yield w
        else:
            break

def hooloo(wordlist, word_fragment):
    usque = len(word_fragment)
    for w in wordlist:
        if w[:usque] > word_fragment:
            break
        if w.startswith(word_fragment):
            yield w

for rep in xrange(repetition):
    te = clock()
    for i in xrange(iterations):
        newlistA = list(sect(wordlist, 'VEST'))
    A.append(clock() - te)

    te = clock()
    for i in xrange(iterations):
        newlistB = list(hooloo(wordlist, 'VEST'))
    B.append(clock() - te)

print '\niterations =', iterations, ' number of tries:', repetition, '\n'
print newlistA, '\n', min(A), '\n'
print newlistB, '\n', min(B), '\n'
result
wordlist[0:10]== ['AA', 'AAH', 'AAHED', 'AAHING', 'AAHS', 'AAL', 'AALII', 'AALIIS', 'AALS', 'AARDVARK']
iterations = 5 number of tries: 30
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.0286089433154
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.415578236899
sect() is 14.5 times faster than hooloo().
PS: I know about timeit, but here, for such a result, clock() is entirely sufficient.
Doing a binary search in the list is not going to guarantee you anything, and I am not sure how it would work here either.
You have a list which is ordered, which is good news. The algorithmic complexity of both of your approaches is O(n), which is not bad: you just have to iterate through the whole wordlist once.
But in the second case the practical performance should be better, because you break out as soon as you know the remaining entries cannot match. Try a list where the first element is a match and the remaining 38000 - 1 elements do not match: the second will beat the first.
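One way to test that claim is a quick timeit comparison; a sketch under the stated assumption that the first element matches and the remaining entries sort after the fragment:

import timeit

setup = """
wordlist = ['aa'] + ['zz'] * 37999   # first word matches, the rest sort after it
frag = 'aa'
"""
full_scan = "[w for w in wordlist if w.startswith(frag)]"
early_break = """
out = []
for w in wordlist:
    if w[:len(frag)] > frag:
        break
    if w.startswith(frag):
        out.append(w)
"""
print(timeit.timeit(full_scan, setup, number=100))    # scans all 38000 words per run
print(timeit.timeit(early_break, setup, number=100))  # stops after 2 words per run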