Here’s a quick example of what I’m trying to do and the error I’m getting:
import itertools

for symbol in itertools.product(list_a, repeat=8):
    list_b.append(symbol)
Afterwards, I also exclude combinations from that list like so:
for combination in list_b:
    valid_b = True
    for symbols in range(len(list_exclude)):
        if list_exclude[symbols] in combination:
            valid_b = False
        else:
            pass
    if valid_b:
        new_list.append(combination)
I’ve heard that chunking the process might help, but I’m not sure how that could be done here.
I’m using multiprocessing for this as well.
When I run it, I get a “MemoryError”.
How would you go about it?
Don't pre-compute anything, especially not the first full list:
import itertools

def symbols(lst, exclude):
    for symbol in map(''.join, itertools.product(lst, repeat=8)):
        if any(map(symbol.__contains__, exclude)):
            continue
        yield symbol
Now use the generator as you need to lazily evaluate the elements. Keep in mind that since it's pre-filtering the data, even list(symbols(list_a, list_exclude)) will be much cheaper than what you originally wrote.
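A minimal sketch of consuming the generator lazily with itertools.islice. The tiny inputs and the extra `repeat` parameter are assumptions added for the demo only (the answer's version hard-codes repeat=8):

```python
import itertools

def symbols(lst, exclude, repeat=8):
    """Lazily yield every length-`repeat` string over `lst` containing none of `exclude`."""
    for symbol in map(''.join, itertools.product(lst, repeat=repeat)):
        if any(map(symbol.__contains__, exclude)):
            continue  # skip strings that contain an excluded substring
        yield symbol

# Hypothetical inputs, kept tiny so the results are easy to check by hand.
list_a = ['a', 'b', 'c']
list_exclude = ['aa', 'cc']

gen = symbols(list_a, list_exclude, repeat=3)
first_three = list(itertools.islice(gen, 3))  # only pulls as many items as needed
print(first_three)  # ['aba', 'abb', 'abc']

# Counting without ever building a list of all results.
total = sum(1 for _ in symbols(list_a, list_exclude, repeat=3))
print(total)  # 17
```

The first three candidates ('aaa', 'aab', 'aac') are generated and rejected on the fly; nothing is stored except the item currently being checked.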
Here is a breakdown of what happens:
itertools.product returns an iterator, so it produces combinations one at a time without keeping references to the ones it has already produced. (It does store its input iterables, but those are small here.) Each element it returns is a tuple containing some combination of the input elements.
Since you want to compare strings, you need to convert the tuples. Hence, ''.join. Mapping it onto each of the tuples that itertools.product produces converts those elements into strings. For example:
>>> ''.join(('$', '$', '&', '&', '♀', '#', '%', '$'))
'$$&&♀#%$'
Filtering each symbol thus created can be done by checking whether any of the items in exclude are contained in it. You can do this with something like
[ex in symbol for ex in exclude]
The operation ... in symbol is implemented via the magic method symbol.__contains__. You can therefore map that method to every element of exclude.
Since the first element of exclude that is contained in symbol invalidates it, you don't need to check the remainder. This is called short-circuiting, and it is implemented by the any function. Notice that because map returns a lazy iterator (in Python 3), the remaining checks are never actually computed once a match is found. This is different from using a list comprehension, which computes all the elements up front.
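To make the short-circuiting visible, here is a small sketch assuming Python 3 (where map is lazy). The helper `contains` is hypothetical; it behaves like `ex in symbol` but records each check it performs:

```python
calls = []

def contains(symbol, ex):
    """Hypothetical helper: same result as `ex in symbol`, but logs each check."""
    calls.append(ex)
    return ex in symbol

symbol = '$$&&♀#%$'
exclude = ['&&', '!!', '??']

# map() is lazy, so any() stops pulling from it at the first True.
lazy_found = any(map(lambda ex: contains(symbol, ex), exclude))
lazy_calls = list(calls)
print(lazy_found, lazy_calls)    # True ['&&']

calls.clear()
# The list comprehension runs every check before any() even starts.
eager_found = any([contains(symbol, ex) for ex in exclude])
eager_calls = list(calls)
print(eager_found, eager_calls)  # True ['&&', '!!', '??']
```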
Putting yield into your function turns it into a generator function. That means that when you call symbols(...), it returns a generator object that you can iterate over. This object does not pre-compute anything until you call next on it. So if you write the data to a file (for example), only the current result will be in memory at once. It may take a long time to write out a large number of results but your memory usage should not spike at all from it.
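A sketch of streaming results straight to a file, assuming the same generator with a hypothetical `repeat` parameter for a small demo; io.StringIO stands in for a real file opened with open(path, 'w'):

```python
import io
import itertools

def symbols(lst, exclude, repeat=8):
    for symbol in map(''.join, itertools.product(lst, repeat=repeat)):
        if any(map(symbol.__contains__, exclude)):
            continue
        yield symbol

out = io.StringIO()  # stand-in for open('combinations.txt', 'w')
written = 0
for s in symbols(['x', 'y'], ['yy'], repeat=4):
    out.write(s + '\n')  # each string is written, then becomes garbage
    written += 1
print(written)  # 8
```

Memory use stays flat regardless of how many strings are produced, because only the current one is alive at any time.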
This small change could save you a bit of RAM usage:
for combination in list_b:
    valid_b = True
    for symbols in list_exclude:
        if symbols in combination:
            valid_b = False
        else:
            pass
    if valid_b:
        new_list.append(combination)
I want to perform calculations on a list and assign this to a second list, but I want to do this in the most efficient way possible as I'll be using a lot of data. What is the best way to do this? My current version uses append:
output = []
for i, f in enumerate(time_series_data):
    if f > x:
        output.append(calculation(f))  # calculation with f
    # etc etc
Should I use append, or declare the output list as a list of zeros at the beginning?
Appending values is no slower than the other possible ways of accomplishing this.
The code looks fine, and creating a list of zeroes would not help any further. It can also create problems, because you might not know in advance how many values will pass the condition f > x.
Since you wrote etc etc, I am not sure how long your loop is or what operations you need to do there. If possible, try using a list comprehension; that would be a little faster.
Have a look at the article below, which compares the speed of three ways of creating a list: list comprehension, append, and pre-initialization.
https://levelup.gitconnected.com/faster-lists-in-python-4c4287502f0a
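To compare the three approaches on your own data, a rough benchmark sketch could look like this. The data, the threshold x and calculation() are hypothetical stand-ins for the question's placeholders, and timings will vary by machine:

```python
import timeit

# Hypothetical stand-ins for the question's data, threshold and calculation.
time_series_data = list(range(100_000))
x = 50_000

def calculation(f):
    return f * 2.0

def with_append():
    output = []
    for f in time_series_data:
        if f > x:
            output.append(calculation(f))
    return output

def with_comprehension():
    return [calculation(f) for f in time_series_data if f > x]

def with_preallocation():
    # Preallocating needs the final length up front, so the filter runs twice.
    n = sum(1 for f in time_series_data if f > x)
    output = [0.0] * n
    j = 0
    for f in time_series_data:
        if f > x:
            output[j] = calculation(f)
            j += 1
    return output

assert with_append() == with_comprehension() == with_preallocation()
for fn in (with_append, with_comprehension, with_preallocation):
    print(fn.__name__, timeit.timeit(fn, number=10))
```

The preallocation variant shows the problem mentioned above: with a filter, you have to do extra work just to know how big the list should be.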
Please look at this piece of code:
sig_array = []
...
for i in range(0, 2):
    ...
    temp = []
    for k in range(0, len(sig)):
        # print(k)
        temp.append(downsample(sig[k], sampl, new_freq))
    sig_array.append(temp)
In other words, temp is a list of arrays (my downsample function, as its name may suggest, returns an array), and then each temp is aggregated into sig_array, so it ends up as a list of lists of arrays!
My questions are: how do I deal with that (indexing, ...), and is there a simpler way to proceed, i.e. generating a list of arrays in a loop while keeping it in a data structure?
Thanks
Regarding indexing, you'd just refer to elements like sig_array[0], sig_array[1][2] or sig_array[1][0][2], etc.
Regarding better data structures, it really depends on your use case. As @smagnan asks in the comments: are you using it for easy access to data? For matrix processing? If so, have a look at NumPy ndarrays. You say that you need it for big-data time series analysis; in that case, the pandas module will be quite helpful.
Also, as @Bazingaa says, you can make your code less verbose by using list comprehensions:
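If every downsampled array has the same length, the nested lists can be stacked into a single 3-D ndarray. A sketch, where downsample() is a hypothetical stand-in with a simplified signature (it just keeps every `factor`-th sample):

```python
import numpy as np

def downsample(signal, factor):
    # Hypothetical stand-in for the question's function: keep every factor-th sample.
    return np.asarray(signal)[::factor]

sig = [np.arange(10.0), np.arange(10.0, 20.0), np.arange(20.0, 30.0)]
sig_array = [[downsample(s, 2) for s in sig] for _ in range(2)]

# Stack inner lists into 2-D arrays, then stack those into one 3-D array.
arr = np.stack([np.stack(row) for row in sig_array])
print(arr.shape)  # (2, 3, 5)

# Nested-list indexing and ndarray indexing reach the same element.
print(sig_array[1][0][2], arr[1, 0, 2])
```

With the ndarray you also get slicing across dimensions, e.g. arr[:, 0, :] for the first signal of every repetition. np.stack raises an error if the arrays have different lengths, in which case the list of lists is the simpler structure.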
sig_array = [[downsample(sig[i], sampl, new_freq) for i in range(len(sig))] for _ in range(2)]
With list comprehensions, it's best to read from the outside, and from the end. The for _ in range(2) will run twice (I've replaced your i with _ as I couldn't see you using it anywhere; if you need it, replace _ with a relevant variable name). In each iteration, the list built by the inner list comprehension is appended to sig_array. Inside the inner listcomp, the result of the downsample() function is appended to the temporary list on each iteration of its for loop.
This will have exactly the same output as your code, but is clearly way shorter :)
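A quick check that the loop and the comprehension really build the same structure. The downsample() body and the sample data are hypothetical, chosen only so the sketch runs:

```python
def downsample(signal, sampl, new_freq):
    # Hypothetical stand-in: naive decimation by sampl // new_freq.
    step = max(1, sampl // new_freq)
    return signal[::step]

sig = [list(range(8)), list(range(8, 16))]
sampl, new_freq = 4, 2

# Original loop version
loop_result = []
for _ in range(2):
    temp = []
    for k in range(len(sig)):
        temp.append(downsample(sig[k], sampl, new_freq))
    loop_result.append(temp)

# Nested comprehension: the trailing `for _ in range(2)` is the outer loop,
# and the bracketed comprehension inside builds one `temp` per iteration.
comp_result = [[downsample(sig[i], sampl, new_freq) for i in range(len(sig))] for _ in range(2)]

print(loop_result == comp_result)  # True
```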
I have a list of ~30 floats. I want to see if a specific float is in my list. For example:
1 >> # For the example below my list has integers, not floats
2 >> list_a = range(30)
3 >> 5.5 in list_a
False
4 >> 1 in list_a
True
The bottleneck in my code is line 3. I search if an item is in my list numerous times, and I require a faster alternative. This bottleneck takes over 99% of my time.
I was able to speed up my code by making list_a a set instead of a list. Are there any other ways to significantly speed up this line?
If the list is not sorted, the best possible time to check whether an element is in it is O(n): the element may be anywhere, so you need to look at each item and check whether it is the one you are looking for.
If the list were sorted, you could use binary search for O(log n) lookup time. You can also use a hash map for average O(1) lookup time (or the built-in set, which is essentially a hash table of keys without values and does the same job).
That does not make much sense for a list of length 30, though.
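For reference, both options look like this with the standard library. The list of floats is made up for the sketch; note that exact float equality only works if the values are computed identically:

```python
import bisect

data = [0.5 * i for i in range(30)]  # hypothetical sorted list of 30 floats

def in_sorted(sorted_list, value):
    """Binary search: O(log n) membership test on a sorted list."""
    i = bisect.bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value

lookup = set(data)  # hash-based membership: O(1) on average

print(in_sorted(data, 5.5), 5.5 in lookup)    # True True
print(in_sorted(data, 5.25), 5.25 in lookup)  # False False
```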
In my experience, Python indeed slows down when we search something in a long list.
To complement the suggestion above, I suggest splitting the list into subsets, provided the list can be split this way and each query can easily be assigned to the correct subset.
For example, to search for a word in an English dictionary, first split the dictionary into 26 sections (A, B, C, D, ...) based on each word's initial letter. If the query is "apple", you only need to search the "A" section. This greatly limits the search space, which is where the speed boost comes from.
For a numerical list, subset it either by value range or by first digit.
Hope this helps.
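A sketch of the range-based subsetting idea for numbers. The bucket width of 10.0 and the helper names are hypothetical; each bucket is a plain list, so a lookup scans only the one bucket the query falls into:

```python
from collections import defaultdict

def build_buckets(values, width=10.0):
    """Group values by range: bucket k holds values in [k*width, (k+1)*width)."""
    buckets = defaultdict(list)
    for v in values:
        buckets[int(v // width)].append(v)
    return buckets

def bucket_contains(buckets, value, width=10.0):
    # Only the single bucket that `value` maps to is searched.
    return value in buckets.get(int(value // width), [])

values = [1.5, 12.0, 17.25, 33.0, 48.5]
buckets = build_buckets(values)
print(bucket_contains(buckets, 17.25), bucket_contains(buckets, 18.0))  # True False
```

Using .get() instead of indexing keeps lookups from inserting empty buckets into the defaultdict.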
Say I have a list x of unknown length from which I want to randomly pop one element, so that the list no longer contains that element afterwards. What is the most Pythonic way to do this?
I can do it using a rather clumsy combination of pop, random.randint, and len, and would like to see shorter or nicer solutions:
import random
x = [1,2,3,4,5,6]
x.pop(random.randint(0,len(x)-1))
What I am trying to achieve is to consecutively pop random elements from a list (i.e., randomly pop one element and move it to a dictionary, randomly pop another element and move it to another dictionary, ...).
Note that I am using Python 2.6 and did not find any solutions via the search function.
What you seem to be up to doesn't look very Pythonic in the first place. You shouldn't remove stuff from the middle of a list, because lists are implemented as arrays in all Python implementations I know of, so this is an O(n) operation.
If you really need this functionality as part of an algorithm, you should check out a data structure like the blist that supports efficient deletion from the middle.
In pure Python, what you can do if you don't need access to the remaining elements is just shuffle the list first and then iterate over it:
import random

lst = [1, 2, 3]
random.shuffle(lst)
for x in lst:
    ...  # do something with x
If you really need the remainder (which is a bit of a code smell, IMHO), at least you can pop() from the end of the list now (which is fast!):
while lst:
    x = lst.pop()
    # do something with the element
In general, you can often express your programs more elegantly if you use a more functional style, instead of mutating state (like you do with the list).
You won't get much better than that, but here is a slight improvement:
x.pop(random.randrange(len(x)))
Documentation on random.randrange():
random.randrange([start], stop[, step])
Return a randomly selected element from range(start, stop, step). This is equivalent to choice(range(start, stop, step)), but doesn’t actually build a range object.
To remove a single element at random index from a list if the order of the rest of list elements doesn't matter:
import random
L = [1,2,3,4,5,6]
i = random.randrange(len(L)) # get random index
L[i], L[-1] = L[-1], L[i] # swap with the last element
x = L.pop() # pop last element O(1)
The swap is used to avoid the O(n) cost of deleting from the middle of a list.
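The same swap-and-pop trick wrapped in a reusable helper (the name pop_random is hypothetical). It raises ValueError on an empty list, because random.randrange(0) has no valid result:

```python
import random

def pop_random(lst):
    """Remove and return a random element in O(1); the order of the rest is not kept."""
    i = random.randrange(len(lst))     # random valid index
    lst[i], lst[-1] = lst[-1], lst[i]  # move the chosen element to the end
    return lst.pop()                   # popping the last element is O(1)

L = [1, 2, 3, 4, 5, 6]
drawn = []
while L:
    drawn.append(pop_random(L))
print(sorted(drawn))  # [1, 2, 3, 4, 5, 6]
```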
Despite many answers suggesting random.shuffle(x) followed by x.pop(), this was very slow on large data in my test: on a list of 10,000 elements it took about 6 seconds with the shuffle enabled, versus 0.2 s with the shuffle disabled.
After testing all the methods given above, the fastest turned out to be the one written by @jfs:
import random
L = [1,"2",[3],(4),{5:"6"},'etc'] #you can take mixed or pure list
i = random.randrange(len(L)) # get random index
L[i], L[-1] = L[-1], L[i] # swap with the last element
x = L.pop() # pop last element O(1)
In support of my claim, here is the time complexity chart from this source.
If there are no duplicates in the list, you can achieve your purpose using sets too (converting the list to a set removes any duplicates). Removing by value and set.pop() both cost O(1) on average, i.e. they are very efficient. Note, however, that set.pop() removes an arbitrary element, not a uniformly random one, so don't rely on it when you need real randomness. This is the cleanest method I could come up with.
L = set([1, 2, 3, 4, 5, 6])  # ...or pass any list directly to the built-in set()
while L:
    r = L.pop()
    # do something with r, an arbitrary element of the initial list
Unlike lists, which support only A+B (concatenation), sets support A-B (A minus B), A|B (A union B) and A.intersection(B, C, D). This is very useful when you want to perform logical operations on the data.
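A quick sketch of the set operators mentioned above (the three sets are made up). The union operator is |; using + on two sets raises TypeError:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6}

print(A - B)                 # difference: {1, 2}
print(A | B)                 # union: {1, 2, 3, 4, 5}
print(A.intersection(B, C))  # elements in all three: {4}

try:
    A + B
except TypeError:
    print("sets do not support +")
```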
OPTIONAL
If you want speed for operations on the head and tail of a list, use Python's deque (double-ended queue, from the collections module). In support of my claim, here is the image; an image is worth a thousand words.
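A minimal sketch of the deque operations at both ends, each of which is O(1):

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)     # O(1) at the head
d.append(4)         # O(1) at the tail
head = d.popleft()  # O(1); list.pop(0) would be O(n)
tail = d.pop()
print(head, tail, list(d))  # 0 4 [1, 2, 3]
```

Indexing into the middle of a deque is O(n), so it does not help with the random-pop case; it only pays off when you work at the ends.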
Here's another alternative: why don't you shuffle the list first, and then pop elements off it until none remain? Like this:
import random

x = [1, 2, 3, 4, 5, 6]
random.shuffle(x)
while x:
    p = x.pop()
    # do your stuff with p
I know this is an old question, but just for documentation's sake:
If you (the person googling the same question) are doing what I think you are doing, which is selecting k items randomly from a list (where k <= len(yourlist)) while making sure each item is never selected more than once (i.e. sampling without replacement), you could use random.sample as @j-f-sebastian suggests. But without knowing more about the use case, I don't know if this is what you need.
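A minimal random.sample sketch with made-up data; it returns k elements from distinct positions and leaves the original list untouched:

```python
import random

items = [1, 2, 3, 4, 5]
k = 2
picked = random.sample(items, k)  # sampling without replacement; items is not modified
print(picked)
```

Unlike shuffling the whole list and slicing, this does not reorder items in place.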
One way to do it is:
x.remove(random.choice(x))
Although this doesn't pop from the list, I found this question on Google while trying to get X random items from a list without duplicates. Here's what I eventually used:
from random import shuffle

items = [1, 2, 3, 4, 5]
items_needed = 2
shuffle(items)
for item in items[:items_needed]:
    print(item)
This may be slightly inefficient, since you're shuffling the entire list but only using a small portion of it; I'm not an optimisation expert, though, so I could be wrong.
This answer comes courtesy of @niklas-b:
"You probably want to use something like pypi.python.org/pypi/blist "
To quote the PyPI page:
...a list-like type with better asymptotic performance and similar performance on small lists
The blist is a drop-in replacement for the Python list that provides better performance when modifying large lists. The blist package also provides sortedlist, sortedset, weaksortedlist, weaksortedset, sorteddict, and btuple types.
One would expect somewhat lower performance for random access, since it is a "copy on write" data structure. This violates many common assumptions about Python lists, so use it with care.
However, if your main use case is to do something weird and unnatural with a list (as in the forced example given by the OP, or my Python 2.6 FIFO queue-with-pass-over issue), then this will fit the bill nicely.