Let's suppose I have a for loop. After each loop, I get a result that I want to store in a list.
I want to write something like this but it's not working:
for i in range(50):
    "result_%d" % i = result
Here, result is a list containing the results of each iteration.
I want a different list for each result so that I can use them after the loop has finished.
Is there any way to do that?
Note: I thought about storing all the result lists in one big list, but won't that be heavy for the code? Each result list has 60 elements.
I thought about storing all the result arrays in one big array. But won't that be heavy for the code?
In Python, everything is an object and names and list contents are just references to them. Creating new list objects containing references to existing values is quite lightweight really.
Don't try to create dynamic variables, just store your results in another list or a dictionary.
In this case that's as easy as:
results = []
for i in range(50):
    # do things
    results.append(result)
and results is then a list with 50 references to other list objects. That's no different from having 50 names referencing those 50 list objects, other than that it is much easier to address them now.
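As a minimal sketch of the advice above: the compute_result helper below is hypothetical and only stands in for whatever produces each 60-element result. The outer list (or a dict, if name-like keys are wanted) holds references to the result lists rather than copies of them.

```python
def compute_result(i):
    """Hypothetical stand-in for the real per-iteration work."""
    return [i * j for j in range(60)]

results = []         # one list holding every result, indexed by loop number
named_results = {}   # alternative: a dict with "result_7"-style keys

for i in range(50):
    result = compute_result(i)
    results.append(result)
    named_results["result_%d" % i] = result

# Both containers refer to the same list objects; nothing was copied.
print(results[7] is named_results["result_7"])  # True
print(len(results), len(results[0]))            # 50 60
```

After the loop, results[7] plays the role that a variable named result_7 would have played.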
I want to perform calculations on a list and assign this to a second list, but I want to do this in the most efficient way possible as I'll be using a lot of data. What is the best way to do this? My current version uses append:
f = time_series_data
output = []
for i, f in enumerate(time_series_data):
    if f > x:
        output.append(calculation(f))  # placeholder for the calculation with f
    # etc etc
Should I use append, or declare the output list as a list of zeros at the beginning?
Appending the values is not slower than the other ways of accomplishing this.
The code looks fine, and creating a list of zeros would not help. It could even cause problems, since you do not know in advance how many values will pass the condition f > x.
Since you wrote etc etc, I am not sure how long the loop body is or what operations it needs to perform. If possible, try a list comprehension, which would be a little faster.
You can have a look at the article below, which compares the speed of list creation using three methods: list comprehension, append, and pre-initialization.
https://levelup.gitconnected.com/faster-lists-in-python-4c4287502f0a
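For illustration, here is a rough sketch of the append loop next to the equivalent list comprehension. The sample data, the threshold x, and the calculation function are all made up for the example, since the question does not show them.

```python
time_series_data = [0.5, 2.0, 3.5, 1.0, 4.0]  # made-up sample data
x = 1.5                                       # made-up threshold

def calculation(value):
    """Hypothetical placeholder for the real calculation."""
    return value * 2

# Loop with append: the output grows only when the condition holds.
output_append = []
for f in time_series_data:
    if f > x:
        output_append.append(calculation(f))

# Equivalent list comprehension: same filter, same calculation.
output_comprehension = [calculation(f) for f in time_series_data if f > x]

print(output_append == output_comprehension)  # True
```

Neither version needs to know in advance how many values will pass the filter, which is what makes preallocating zeros awkward here.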
I am trying to preallocate a list in Python:
c = [1] * mM  # preallocate array
My problem is that I run into a MemoryError, since
mM = 4999999950000000
What is the best way to deal with this? I am thinking about creating a new object where I split my list at a value of about 500000000.
Is this what I should do, or is there a best practice for creating an array with a lot of elements?
Using a Generator
You are attempting to create an object that you very likely will not be able to fit into your computer's memory. If you truly need to represent a list of that length, you can use a generator that dynamically produces values as they are needed.
def ones_generator(length):
    for _ in range(length):
        yield 1
gen = ones_generator(4999999950000000)
for i in gen:
    print(i)  # prints 1, a lot
Note: The question is tagged for Python 3, but if you are using Python 2.7, you will want to use xrange instead of range.
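To see that the generator really is lazy, a small sketch (assuming Python 3) can pull just a handful of values with itertools.islice instead of looping over all of them. Creating the generator costs almost nothing, however large the requested length is.

```python
from itertools import islice

def ones_generator(length):
    for _ in range(length):
        yield 1

# No list of 4999999950000000 items is ever built here.
gen = ones_generator(4999999950000000)

# Take only the first five values; the rest are never produced.
first_five = list(islice(gen, 5))
print(first_five)  # [1, 1, 1, 1, 1]
```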
Using a Dictionary
By the sound of your question, you do not actually need to preallocate a list of that length; rather, you want to store values very sparsely at very large indexes. This pattern matches the dict type in Python more closely than the list. You can simply store values in a dictionary without pre-allocating the keys or space; Python handles that under the hood for you.
dct = {}
dct[100000] = "A string"
dct[592091] = 123
dct[4999999950000000] = "I promise, I need to be at this index"
print(dct[4999999950000000])
# I promise, I need to be at this index
In that example, I just stored str and int values, but they can be any object in Python. The best part about this is that this dictionary will not consume memory based on the maximum index (like a list would) but instead based on how many values are stored within it.
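If the original list was meant to start out filled with 1s, a sparse dictionary can imitate that with a default value for indexes that were never written. This is only a sketch: the names sparse, DEFAULT, and value_at are invented for the example.

```python
sparse = {}
sparse[100000] = "A string"
sparse[4999999950000000] = 123

DEFAULT = 1  # assumed fill value, mirroring the [1] * mM preallocation

def value_at(index):
    """Return the stored value, or the default for untouched indexes."""
    return sparse.get(index, DEFAULT)

print(len(sparse))       # 2: only the stored entries take up space
print(value_at(42))      # 1
print(value_at(100000))  # A string
```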
I am using Python 3.5 to create a set of generators that parse a set of opened files, cherry-picking data from those files to construct an object I plan to export later. I was originally parsing the entirety of each file and creating a list of dictionary objects before doing any analysis, but this process would sometimes take up to 30 seconds, and since I only need to work with each line of each file once, I figure it's a great opportunity to use a generator. However, I feel that I am missing something conceptually about generators, and perhaps about the mutability of objects within a generator.
My original code that makes a list of dictionaries goes as follows:
parsers = {}
# iterate over files in the file_name file to get their attributes
for dataset, data_file in files.items():
    # Store each dataset as a list of dictionaries with keys that
    # correspond to the attributes of that dataset
    parsers[dataset] = [{attributes[dataset][i]: value.strip('~')
                         for i, value in enumerate(line.strip().split('^'))}
                        for line
                        in data_file]
And I access the list by calling:
>>> parsers['definitions']
And it works as expected, returning a list of dictionaries. However, when I convert this list into a generator, all sorts of weirdness happens.
parsers = {}
# iterate over files in the file_name file to get their attributes
for dataset, data_file in files.items():
    # Store each dataset as a generator of dictionaries with keys that
    # correspond to the attributes of that dataset
    parsers[dataset] = ({attributes[dataset][i]: value.strip('~')
                         for i, value in enumerate(line.strip().split('^'))}
                        for line
                        in data_file)
And I call it by using:
>>> next(parsers['definitions'])
Running this code raises an index out of range error.
The main difference I can see between the two code segments is that in the list comprehension version, Python constructs the list from the file and moves on without needing to store the comprehension's variables for later use.
Conversely, in the generator expression the variables defined within the generator need to be stored with the generator, as they affect each successive call of the generator later in my code. I am thinking that perhaps the variables inside the generator are sharing a namespace with the other generators my code creates, so each generator behaves erratically depending on whichever generator expression ran last and therefore set the values of the variables last.
I appreciate any thoughts as to the reason for this issue!
I assume that the problem is when you're building the dictionaries.
attributes[dataset][i]
Note that with the list version, dataset is whatever dataset was at that particular turn of the for loop. However, with the generator, that expression isn't evaluated until after the for loop has completed, so dataset will have the value of the last dataset from the files.items() loop...
Here's a super simple demo that hopefully elaborates on the problem:
results = []
for a in [1, 2, 3]:
    results.append(a for _ in range(3))
for r in results:
    print(list(r))
Note that we always get [3, 3, 3] because when we take the values from the generator, the value of a is 3.
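One common way around this late binding is to create each generator inside a function call, so the loop value is captured as that call's own local variable. The sketch below contrasts the two; make_gen is a name invented for the example, and the same idea would apply to dataset in the question's code.

```python
def make_gen(a):
    # 'a' is a local of this call, fixed at the moment make_gen is called.
    return (a for _ in range(3))

late = []    # generators that look up the loop variable when consumed
bound = []   # generators that captured the value when created

for a in [1, 2, 3]:
    late.append(a for _ in range(3))
    bound.append(make_gen(a))

late_values = [list(g) for g in late]
bound_values = [list(g) for g in bound]

print(late_values)   # [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
print(bound_values)  # [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
```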
I'm working with the Flask framework in Python, and need to hand off a list of lists to a renderer.
I step through a loop and create a list, sort it, append it to another list, then call the render function with the masterlist, like so:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
renderPage(multiCandidateArray)
My problem is that I need to clear the candidateArray and create a new one each time through the loop, but it looks like the candidateArray that I append to the multiCandidateArray is actually a pointer, not the values themselves.
When I do this:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
    del candidateArray[:]
renderPage(multiCandidateArray)
I end up with no values.
Is there a way to handle this situation that I'm missing?
I would probably go with something like:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    multiCandidateArray.append(sorted(candidateArray))
No need to del anything here, and sorted returns a new list, so even if findLowestPrices is for some reason returning references to the same list (which is unlikely), you'll still have unique lists in the multiCandidateArray (although your unique lists could hold references to the same objects).
Your code already creates a new one each time through the loop.
candidateArray = findLowestPrices(...)
This assigns a new list to the variable, candidateArray. It should work fine.
When you do this:
del candidateArray[:]
...you're deleting the contents of the same list you just appended to the master list.
Don't think about pointers or variables; just think about objects, and remember nothing in Python is ever implicitly copied. A list is an object. At the end of the loop, candidateArray names the same list object as multiCandidateArray[-1]. They're different names for the same thing. On the next run through the loop, candidateArray becomes a name for a new list as produced by findLowestPrices, and the list at the end of the master list is unaffected.
I've written about this before; the C way of thinking about variables as being predetermined blocks of memory just doesn't apply to Python at all. Names are moved onto values, rather than values being copied into some fixed number of buckets.
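A small sketch with made-up values shows the difference between rebinding a name and mutating the object it refers to, and why sorted sidesteps the issue by building a new list.

```python
master = []

candidates = [3, 1, 2]
master.append(candidates)        # stores a reference, not a copy
print(master[-1] is candidates)  # True: two names for one list object

candidates = [9, 8]              # rebinding: the list inside master is untouched
after_rebind = [list(item) for item in master]
print(after_rebind)              # [[3, 1, 2]]

master.append(candidates)
del candidates[:]                # mutation: empties the list inside master too
print(master)                    # [[3, 1, 2], []]

original = [3, 1, 2]
sorted_copy = sorted(original)   # sorted() builds a new list object
del original[:]
print(sorted_copy)               # [1, 2, 3]
```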
(Also, nitpicking, but Python code generally uses under_scores and doesn't bother with types in names unless it's really ambiguous. So you might have candidates and multi_candidates. Definitely don't call anything an "array", since there's an array module in the standard library that does something different and generally not too useful. :))
I have a config file that contains a list of strings. I need to read these strings in order and store them in memory and I'm going to be iterating over them many times when certain events take place. Since once they're read from the file I don't need to add or modify the list, a tuple seems like the most appropriate data structure.
However, I'm a little confused about the best way to first construct the tuple, since it's immutable. Should I parse the strings into a list and then put them in a tuple? Is that wasteful? Is there a way to get them into a tuple directly, without the overhead of copying/destroying the tuple every time I add a new element?
As you said, you're going to read the data gradually - so a tuple isn't a good idea after all, as it's immutable.
Is there a reason for not using a simple list for holding the strings?
Since your data is changing, I am not sure you need a tuple. A list should do fine.
Look at the following question, which should give you further information. Creating a tuple is generally faster than creating a list, but if you need to modify elements every now and then, a tuple may not make sense.
Are tuples more efficient than lists in Python?
I wouldn't worry about the overhead of first creating a list and then a tuple from that list. My guess is that the overhead will turn out to be negligible if you measure it.
On the other hand, I would stick with the list and iterate over that instead of creating a tuple. Tuples should be used for struct-like data and lists for lists of data, which is what your data sounds like to me.
with open("config") as infile:
    # Each element keeps its trailing newline; strip the lines if that matters.
    config = tuple(infile)
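Iterating over a file yields each line with its trailing newline, so a variant that strips lines and skips blank ones may be closer to what is wanted. In this sketch an io.StringIO object stands in for the opened config file, and its contents are made up.

```python
import io

# Stand-in for open("config"); a real file object iterates the same way.
infile = io.StringIO("alpha\nbeta\n\ngamma\n")

# The generator expression feeds tuple() directly: no intermediate list,
# and the tuple is built exactly once.
config = tuple(line.strip() for line in infile if line.strip())

print(config)  # ('alpha', 'beta', 'gamma')
```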
You may want to try using chained generators to create your tuple. You can use the generators to perform multiple filtering and transformation operations on your input without creating intermediate lists. All of the generator processing is delayed until iteration. In the example below the processing/iteration all happens on the last line.
Like so:
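with open('settings.cfg') as f:
    # Keep only "key: value" lines and split each into a stripped (key, value) pair.
    step1 = (tuple(i.strip() for i in l.split(':', 1)) for l in f if len(l) > 2 and ':' in l)
    # Split comma-separated values into a list, but only for keys containing 'Tag'.
    step2 = ((l[0], l[1].split(',') if ',' in l[1] and 'Tag' in l[0] else l[1]) for l in step1)
    # Nothing is read from the file until the tuple is built here.
    t = tuple(step2)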