Preallocate very large array in Python leads to MemoryError

I am trying to preallocate a list in Python:
c = [1] * mM  # preallocate array
My problem is that I run into a MemoryError, since
mM = 4999999950000000
What is the best way to deal with this? I am thinking about creating a new object where I split my list at a value of about 500000000.
Is this what I should do, or is there a best practice for creating an array with a lot of entries?

Using a Generator
You are attempting to create an object that you very likely will not be able to fit into your computer's memory. If you truly need to represent a list of that length, you can use a generator that dynamically produces values as they are needed.
def ones_generator(length):
    for _ in range(length):
        yield 1

gen = ones_generator(4999999950000000)
for i in gen:
    print(i)  # prints 1, a lot
Note: The question is tagged for Python 3, but if you are using Python 2.7, you will want to use xrange instead of range.
Using a Dictionary
By the sound of your question, you do not actually need to preallocate a list of that length, but you want to store values very sparsely at indexes that are very large. This pattern matches the dict type in Python more so than the list. You can simply store values in a dictionary, without pre-allocating the keys/space; Python handles that under the hood for you.
dct = {}
dct[100000] = "A string"
dct[592091] = 123
dct[4999999950000000] = "I promise, I need to be at this index"
print(dct[4999999950000000])
# I promise, I need to be at this index
In that example, I just stored str and int values, but they can be any object in Python. The best part about this is that this dictionary will not consume memory based on the maximum index (like a list would) but instead based on how many values are stored within it.
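A quick way to see that in action, as a minimal sketch not taken from the original answer, is to check the size of a very sparse dictionary:
import sys

sparse = {}
sparse[4999999950000000] = "only one entry"

# The dict only pays for the entries it actually holds (a few hundred bytes
# here), no matter how large the keys are.
print(sys.getsizeof(sparse))

# A list that covered index 4999999950000000 would need that many slots
# (roughly 8 bytes per slot just for the references), far beyond typical RAM.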

Related

How do I get every 6-element permutation with repetition in Python?

I want to create a list of all possible 6-element permutations from "abcdefghijklmnopqrstuvwxyz0123456789" so for example it should output:
['aaaaaa','aaaaab','aaaaac'...,'aaaaa0','aaaaa1'...,'aaaaba','aaaabb'...] and so on.
This is what I tried:
import itertools
dictionary = 'abcdefghijklmnopqrstuvwxyz0123456789'
print(list(itertools.product(dictionary, repeat=6)))
but I ran into a MemoryError and then my computer froze up completely, so is there a more efficient way to compute this list?
(I'm using Python 3.8 64-bit)
Do you know how long your list would be? It is 36**6 = 2176782336 items. A bit too much to hold in memory. You should have used a generator:
import itertools

dictionary = 'abcdefghijklmnopqrstuvwxyz0123456789'
for x in itertools.product(dictionary, repeat=6):
    print(''.join(x))
The number of combinations is huge: 36^6, that's 2,176,782,336 strings. A 6-character string in Python is already relatively large due to how Python stores separate objects.
from sys import getsizeof
getsizeof('aaaaaa') # 55
At 55 bytes per string, the whole list is almost 120 gigabytes. You probably don't have that much memory on your machine.
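As a rough back-of-the-envelope check (my own arithmetic, assuming the 55 bytes per string measured above):
n_strings = 36 ** 6              # 2,176,782,336 six-character strings
total_bytes = n_strings * 55     # the string objects alone
print(total_bytes / 10**9)       # ~119.7 GB, before counting the list's own
                                 # ~8 bytes of pointer per element on top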
If you try to convert this iterator to a list, it will generate all permutations at once. What you can do instead is to use the iterator returned by itertools.product(dictionary, repeat=6) without converting it to a list.
for s in itertools.product(dictionary, repeat=6):
    # Do something with the string, such as writing it to a file.
    ...
Without knowing what you are trying to do with the product, I can't specifically tell you how to optimize this. But I can still say that trying to convert this iterator to a list is a bad idea.
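For example, if the goal is simply to have every string available later, one option is to stream them straight to disk. This is a sketch under the assumption that a roughly 15 GB text file is acceptable; the filename is made up:
import itertools

dictionary = 'abcdefghijklmnopqrstuvwxyz0123456789'

# Writes each 6-character combination on its own line; nothing is held in
# memory beyond the current tuple of characters.
with open('permutations.txt', 'w') as out:   # hypothetical output file
    for chars in itertools.product(dictionary, repeat=6):
        out.write(''.join(chars) + '\n')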

How to define multidimensional matrix with strings in Python?

I wish to store strings in a multidimensional array. I tried using the numpy package along with the following line:
co_entity = np.zeros((5000,4))
However, I need to store strings later on. This matrix cannot be used to store strings, as it holds floats/ints. I tried using a list to store the strings, but since the number of inputs is dynamic, I have to use a multidimensional array with an upper limit.
Any ideas for this?
You could try the object dtype with the empty() function, like so:
co_entity = np.empty((5000,4), dtype='object')
This will allow you to store a string in each of the elements generated.
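A small usage sketch (variable names are just illustrative):
import numpy as np

co_entity = np.empty((5000, 4), dtype='object')   # every cell starts as None
co_entity[0, 0] = "some string"
co_entity[0, 1] = 42                              # object arrays can mix types

print(co_entity[0])   # ['some string' 42 None None]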

Formatting a variable with another variable in Python

Let's suppose I have a for loop. After each iteration, I get a result that I want to store in a list.
I want to write something like this, but it's not working:
for i in range(50):
    "result_%d" % i = result
Suppose that result is a list containing the results of each iteration.
I want to do this in order to have a different list for each result, so I can use them after the loop is finished.
Is there any way to do that?
Note: I thought about storing all the result lists in one big list. But won't that be heavy for the code? Note that each result list has a size of 60.
I thought about storing all the result arrays in one big array. But won't that be heavy for the code?
In Python, everything is an object and names and list contents are just references to them. Creating new list objects containing references to existing values is quite lightweight really.
Don't try to create dynamic variables, just store your results in another list or a dictionary.
In this case that's as easy as:
results = []
for i in range(50):
    # do things
    results.append(result)
and results is then a list with 50 references to other list objects. That's no different from having 50 names referencing those 50 list objects, other than that it is much easier to address them now.
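If you specifically want to look results up by the index you would have baked into the variable name, a dictionary keyed by i does the same job. A sketch, with a made-up placeholder for the per-iteration result:
results = {}
for i in range(50):
    result = [0] * 60      # stand-in for the real per-iteration list
    results[i] = result

print(results[17])         # the list produced on iteration 17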

How to grow a list to fit a given capacity in Python

I'm a Python newbie. I have a series of objects that need to be inserted at specific indices of a list, but they come out of order, so I can't just append them. How can I grow the list whenever necessary to avoid IndexErrors?
def set(index, item):
    if len(nodes) <= index:
        ...  # Grow list to index+1
    nodes[index] = item
I know you can create a list with an initial capacity via nodes = (index+1) * [None] but what's the usual way to grow it in place? The following doesn't seem efficient:
for _ in xrange(len(nodes), index+1):
    nodes.append(None)
In addition, I suppose there's probably a class in the Standard Library that I should be using instead of built-in lists?
This is the best way of doing it.
>>> lst.extend([None]*additional_size)
Oops, seems like I misunderstood your question at first. If you are asking how to expand the length of a list so you can insert something at an index larger than the current length of the list, then lst.extend([None]*(new_size - len(lst))) would probably be the way to go, as others have suggested. Of course, if you know in advance what the maximum index you will be needing is, it would make sense to create the list in advance and fill it with Nones.
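Putting that together with the asker's set() idea, a sketch might look like this (the None padding value and the helper's name are assumptions, not part of the answer):
def set_at(nodes, index, item):
    # Pad with None until the list is long enough, then assign in place.
    if len(nodes) <= index:
        nodes.extend([None] * (index + 1 - len(nodes)))
    nodes[index] = item

nodes = []
set_at(nodes, 3, "x")
print(nodes)   # [None, None, None, 'x']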
For reference, I leave the original text: to insert something in the middle of the existing list, the usual way is not to worry about growing the list yourself. List objects come with an insert method that will let you insert an object at any point in the list. So instead of your set function, just use
lst.insert(index, item)
or you could do
lst[index:index] = [item]
which does the same thing. Python will take care of resizing the list for you.
There is not necessarily any class in the standard library that you should be using instead of list, especially if you need this sort of random-access insertion. However, there are some classes in the collections module which you should be aware of, since they can be useful for other situations (e.g. if you're always appending to one end of the list, and you don't know in advance how many items you need, deque would be appropriate).
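For example, a deque (only worth reaching for if you are appending or popping at the ends, not inserting in the middle) looks like this:
from collections import deque

d = deque()
d.append("right end")       # O(1) at the right end
d.appendleft("left end")    # O(1) at the left end too
print(d)                    # deque(['left end', 'right end'])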
Perhaps something like:
lst += [None] * additional_size
(you shouldn't call your list variable list, since it is also the name of the list constructor).

Parsing indeterminate amount of data into a python tuple

I have a config file that contains a list of strings. I need to read these strings in order and store them in memory and I'm going to be iterating over them many times when certain events take place. Since once they're read from the file I don't need to add or modify the list, a tuple seems like the most appropriate data structure.
However, I'm a little confused about the best way to first construct the tuple, since it's immutable. Should I parse them into a list and then put them in a tuple? Is that wasteful? Is there a way to get them into a tuple first, without the overhead of copying/destroying the tuple every time I add a new element?
As you said, you're going to read the data gradually - so a tuple isn't a good idea after all, as it's immutable.
Is there a reason for not using a simple list for holding the strings?
Since your data is changing, I am not sure you need a tuple. A list should do fine.
Look at the following, which should provide further information. Assigning a tuple is much faster than assigning a list, but if you are modifying elements every now and then, creating a tuple may not make sense.
Are tuples more efficient than lists in Python?
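As a rough illustration of that linked point (exact numbers will vary by machine), a tuple literal of constants can be reused by the interpreter while a list literal is rebuilt on every pass:
import timeit

print(timeit.timeit("x = (1, 2, 3, 4, 5)"))   # tuple: typically much faster
print(timeit.timeit("x = [1, 2, 3, 4, 5]"))   # list: rebuilt each time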
I wouldn't worry about the overhead of first creating a list and then a tuple from that list. My guess is that the overhead will turn out to be negligible if you measure it.
On the other hand, I would stick with the list and iterate over that instead of creating a tuple. Tuples should be used for struct like data and list for lists of data, which is what your data sounds like to me.
with open("config") as infile:
    config = tuple(infile)
You may want to try using chained generators to create your tuple. You can use the generators to perform multiple filtering and transformation operations on your input without creating intermediate lists. All of the generator processing is delayed until iteration. In the example below the processing/iteration all happens on the last line.
Like so:
f = open('settings.cfg')
# Keep only lines that look like "key: value" and split them on the first ':'.
step1 = (tuple(i.strip() for i in l.split(':', 1)) for l in f if len(l) > 2 and ':' in l)
# For "Tag..." keys with comma-separated values, split the value into a list.
step2 = ((l[0], ',' in l[1] and 'Tag' in l[0] and l[1].split(',') or l[1]) for l in step1)
t = tuple(step2)
