I have a really large list, and while processing it I want to know which index I am currently on.
A simple example:
l = ['a','b','c']
[ print(i), char.capitalize() for i,char in enumerate(l)]
But this is invalid syntax. Is it possible to print and run some logic inside a list comprehension?
Update: it seems a normal for loop is the way to go. FYI, my motivation is to use asyncio.gather, for which I've only seen examples using a list comprehension, e.g.
async def gather():
    await asyncio.gather(*[slowtask() for _ in range(10)])
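A minimal sketch of that for-loop route, printing the index as the coroutines are created (the body of slowtask here is just a stand-in for the real coroutine):
import asyncio

async def slowtask():
    await asyncio.sleep(0.1)  # stand-in for the real work

async def gather():
    tasks = []
    for i in range(10):
        print(i)                  # know which index we are on
        tasks.append(slowtask())  # create the coroutine, await later
    await asyncio.gather(*tasks)

asyncio.run(gather())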
All of the comments are good, but you must consider that print is an I/O operation at the operating-system level and is quite slow. If your list is very large, calling it on every iteration will make your code very slow.
I suspect the long runtime is the very reason you want to see the index of your list.
If that is the case, I advise you to use the pandas package: convert your list to a pandas Series and use pandas methods for your computation. Pandas is heavily optimized and will speed up your computation.
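A minimal sketch of that idea, using the example list from the question (pandas string methods are vectorized, so no explicit loop or per-item print is needed):
import pandas as pd

l = ['a', 'b', 'c']
s = pd.Series(l)
capitalized = s.str.capitalize()  # vectorized; no per-item loop
print(capitalized.tolist())       # ['A', 'B', 'C']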
Related
I want to perform calculations on a list and assign the result to a second list, and I want to do this in the most efficient way possible, as I'll be using a lot of data. What is the best way to do it? My current version uses append:
output = []
for i, f in enumerate(time_series_data):
    if f > x:
        output.append(...)  # calculation with f
    # etc etc
Should I use append, or declare the output list as a list of zeros at the beginning?
Appending the values is not slower than the other possible ways to accomplish this.
Your code looks fine, and pre-creating a list of zeros would not help. It could even create problems, since you might not know in advance how many values will pass the condition f > x.
Since you wrote etc etc, I am not sure how long the loop body is or what operations you need to do there. If possible, try using a list comprehension; that would be a little faster (see the sketch below).
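For example, something like this, where some_calculation stands in for whatever your loop body computes:
output = [some_calculation(f) for f in time_series_data if f > x]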
You can have a look at the article below, which compares the speed of building a list in three ways: list comprehension, append, and pre-initialization.
https://levelup.gitconnected.com/faster-lists-in-python-4c4287502f0a
Please look at this piece of code:
sig_array = []
...
for i in range(0, 2):
    ...
    temp = []
    for k in range(0, len(sig)):
        # print(k)
        temp.append(downsample(sig[k], sampl, new_freq))
    sig_array.append(temp)
In other words, temp is a list of arrays (my downsample function, as its name suggests, returns an array), and temp itself then gets aggregated, so sig_array ends up being a list of lists of arrays!
My questions are: how do I deal with that (indexing, ...), and is there a simpler way to proceed? I want to generate a list of arrays in a loop, but how do I keep the result in a proper data structure?
Thanks
Regarding indexing, you'd just refer to elements like sig_array[0], sig_array[1][2] or sig_array[3][0][2], etc.
Regarding better data structures, it really depends on your use case. As @smagnan says in the comments, are you using it for easily accessing data? Matrix processing? If so, have a look at numpy ndarrays. You say you need it for big data in time series analysis; in that case, the pandas module will be quite helpful.
Also, as @Bazingaa says, you can make your code less verbose by using list comprehensions:
sig_array = [[downsample(sig[i], sampl, new_freq) for i in range(len(sig))] for _ in range(2)]
With list comprehensions, it's best to read from the outside in, starting at the end. The for _ in range(2) will run twice (I've replaced your i with _ since I couldn't see you using it anywhere; if you need it, replace _ with a relevant variable name). On each iteration, the inner list comprehension is appended to sig_array. Inside the inner comprehension, the result of the downsample() call is appended to the temporary list on each iteration of the for loop.
This will have exactly the same output as your code, but is clearly way shorter :)
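If every downsampled signal ends up with the same length, the numpy route mentioned above is also worth a look; a sketch, assuming downsample, sig, sampl and new_freq are exactly as in the question:
import numpy as np

# Stack everything into one 3-D array of shape (2, len(sig), n_samples).
# This only works if all the downsampled arrays have the same length.
sig_array = np.array([[downsample(sig[k], sampl, new_freq)
                       for k in range(len(sig))]
                      for _ in range(2)])

print(sig_array[0, 2])  # third downsampled signal of the first pass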
This statement runs quite slowly, and I have run out of ideas to optimize it. Could someone help me out?
[dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
The small_lists contain only about 6 elements.
A really_huge_list_of_list of size 209,510 took approximately 16.5 seconds to finish executing.
Thank you!
Edit:
really_huge_list_of_list is actually a generator; apologies for any confusion.
The size was obtained from the resulting list.
Possible minor improvement (Python 2 only; itertools.izip avoids building an intermediate list, and in Python 3 the built-in zip is already lazy):
[dict(itertools.izip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
Also, you may consider using a generator instead of a list comprehension.
To expand on what the comments are trying to say, you should use a generator instead of that list comprehension. Your code currently looks like this:
[dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list]
and you should change it to this instead:
def my_generator(input_list_of_lists):
    small_list1 = ["wherever", "small_list1", "comes", "from"]
    for small_list2 in input_list_of_lists:
        yield dict(zip(small_list1, small_list2))
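You would then consume it lazily, one dict at a time, for example (process is a stand-in for whatever you do with each result):
for d in my_generator(really_huge_list_of_list):
    process(d)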
What you're doing right now is taking ALL the results of iterating over your really huge list, and building up a huge list of the results, before doing whatever you do with that list of results. Instead, you should turn that list comprehension into a generator so that you never have to build up a list of 200,000 results. It's building that result list that's taking up so much memory and time.
... Or better yet, just turn that list comprehension into a generator comprehension by changing its outer brackets into parentheses:
(dict(zip(small_list1, small_list2)) for small_list2 in really_huge_list_of_list)
That's really all you need to do. The syntax for list comprehensions and generator comprehensions is almost identical, on purpose: if you understand a list comprehension, you'll understand the corresponding generator comprehension. (In this case, I wrote out the generator in "long form" first so that you'd see what that comprehension expands to).
Hope this helps you add another useful tool to your Python toolbox!
It is illegal to use assignment inside a lambda passed to map, such as:
map(lambda i: test[i] += value[i], somelist)
So what is a good alternative? You could use a for loop to do this, but it seems to me that, at large scale, the for loop solution would be very slow. Is there a better way?
Use this, it's preferred:
for i in somelist:
    test[i] += value[i]
And anyway, your example is not a good case for using map. You use map (or, even better, a list comprehension) when you want to create a new list as a result. In this case an assignment is being performed on each item, so there's no point in creating a new list here!
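For contrast, if a new list were the goal, a comprehension would be the idiomatic tool (sums is just an illustrative name):
sums = [test[i] + value[i] for i in somelist]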
If you don't mind using numpy (and I don't see why you would mind), then this should be a lot more performant:
test[somelist] += value[somelist]
assuming you have first converted your lists to numpy arrays (negligible overhead):
import numpy as np
test = np.array(test)
value = np.array(value)
Note that this would also work with the plain assignment operator =, as well as -=, *=, etc.
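Put together, a small self-contained demo with made-up numbers (one caveat: with fancy indexing, duplicate indices in somelist are applied only once, not accumulated):
import numpy as np

test = np.array([10, 20, 30, 40])
value = np.array([1, 2, 3, 4])
somelist = [0, 2]

test[somelist] += value[somelist]
print(test)  # [11 20 33 40]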
print sum(1 for x in alist if x[1] == 8)
This code runs fine, but it is slow. My list is very large, so the computation takes a lot of time. Do you know a better and faster way to do it?
You'd have to create indexes or cached counts to speed up such code; trade memory for speed.
Wherever you handle your list (add to it, remove from it, edit entries), you also maintain your indices. For example, if you had a counts dict with ids as keys and their frequencies as values, all you would have to do is look the count up directly, and ensure that the counts stay up to date as you manipulate alist.
The best way to manage this is to encapsulate your list in a custom type, so that you can control all manipulation of the data structure and maintain the extra information, as sketched below.
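A minimal sketch of such a wrapper, assuming (as in the question) the entries are sequences whose second element is what you count; the class and method names here are mine:
from collections import Counter

class CountingList:
    """List wrapper that keeps per-key counts of item[1] up to date."""

    def __init__(self, items=()):
        self._items = []
        self._counts = Counter()
        for item in items:
            self.append(item)

    def append(self, item):
        self._items.append(item)
        self._counts[item[1]] += 1

    def remove(self, item):
        self._items.remove(item)   # raises ValueError if absent
        self._counts[item[1]] -= 1

    def count_of(self, key):
        return self._counts[key]   # O(1) instead of a full scan

alist = CountingList([(1, 8), (2, 3), (3, 8)])
print(alist.count_of(8))  # 2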
Not sure how much faster it would be but
len([x for x in alist if x[1] == 8])
is a little clearer.
I would use numpy (this assumes your problem can use numpy arrays). For a flat array, (np_array == 8).sum() gives the count directly; in your case, with the value in the second position of each entry, it would be (np_array[:, 1] == 8).sum().
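A quick sketch with made-up data:
import numpy as np

np_array = np.array([(1, 8), (2, 3), (3, 8)])  # hypothetical (id, value) rows
print((np_array[:, 1] == 8).sum())  # 2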