I have a non-empty set S and every s in S has an attribute s.x which I know is independent of the choice of s. I'd like to extract this common value a=s.x from S. There is surely something better than
s = S.pop()
a = s.x
S.add(s)
-- maybe that code is fast but surely I shouldn't be changing S?
Clarification: some answers and comments suggest iterating over all of S. The reason I want to avoid this is that S might be huge; my method above will, I think, run quickly however large S is. My only issue with it is that S changes, and I see no reason why I should need to change S.
This is almost, but not quite, the same as this question on getting access to an element of a set when there's only one -- there are solutions which apply there that won't work here, and others that work but are inefficient. But the general trick of using next(iter(something_iterable)) to non-destructively get an element still applies:
>>> S = {1+2j, 2+2j, 3+2j}
>>> next(iter(S))
(2+2j) # Note: could have been any element
>>> next(iter(S)).imag
2.0
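Applied back to the original question (assuming the elements of S really do share an attribute x), the same idiom reads the common value without mutating S:

a = next(iter(S)).x  # S is left unchanged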
I'm a sucker for reducing code to its bare minimum and love keeping it short and slim, but occasionally I get into the dilemma of whether I'm doing more harm than good. Below is an example of a situation I frequently encounter and where I start pondering if I am minifying at the expense of speed.
str = "my name is john"
##Alternative 1
for el in str.split(" "):
print(el)
##Alternative 2
splittedStr = str.split(" ")
for el in splittedStr:
print(el)
Which is faster? I'd assume the second one, because we don't split the string on every iteration (though I'm not even sure that actually happens)?
str.split(" ") does the exact same thing in both cases: it creates an anonymous list of the split strings. In the second case you have the minor overhead of assigning it to a variable and then fetching the value of that variable. It's wasted time if you don't need to keep the object for other reasons, but it's a trivial amount compared to the other object referencing taking place in the same loop. Alternative 2 also keeps the list alive in memory, which is another small cost.
The real reason Alternative 1 is better than 2, IMHO, is that it doesn't leave the hint that splittedStr is going to be needed later.
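If you want to verify this, here is a minimal timeit sketch (the function and variable names are my own); both alternatives split the string exactly once per call, so the timings come out essentially the same:

import timeit

sentence = "my name is john"

def alt1():
    # split inline in the for statement
    for el in sentence.split(" "):
        pass

def alt2():
    # split once into a variable, then loop
    words = sentence.split(" ")
    for el in words:
        pass

print(timeit.timeit(alt1, number=100_000))
print(timeit.timeit(alt2, number=100_000))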
If you want to reduce the run time in general, you can loop over a tuple instead of a list. Assigning the result to a variable and then using the variable is not the fastest approach, since you reserve a name just to hold the value, but sometimes it is worth doing purely for the sake of clean code, for example when several operations are packed into one line:
min(str.split(" ")[3:10])
In that case it is better to bind the result to a variable named, say, min_value, just to make things clearer.
Returning to the performance question, you can actually notice a difference when you loop over a tuple instead of a list.
Looping over a tuple:
for i in (1, 2, 3):
    print(i)
And looping over a list:
for i in [1, 2, 3]:
    print(i)
You may find that the tuple version is a bit faster.
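If you want to check that claim yourself, here is a quick timeit sketch. In CPython the tuple literal is typically cached as a constant while the list literal is rebuilt on each pass, so the tuple loop usually comes out slightly ahead, though the gap is small:

import timeit

print(timeit.timeit("for i in (1, 2, 3): pass"))  # tuple literal
print(timeit.timeit("for i in [1, 2, 3]: pass"))  # list literal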
Many of us know that enumerate is used when you write a for loop and need to know the index. However, it has its downsides. According to my tests with the timeit module, just using enumerate makes the code about 2x slower, and adding the tuple unpacking on top of that makes it up to 3x slower. Those numbers may sound fast enough for most programmers, but people dealing with algorithms know that every bit of code you can optimize is an advantage. Now to my question.
An example of this usage would be the need to find the indexes of multiple elements in a list. Say there are two elements we need to find. The first two solutions that occur to me are these:
x, y = 0, 0
for ind, val in enumerate(lst):
    if x and y:
        break
    if val == "a":
        x = ind
    elif val == "b":
        y = ind
The solution above iterates the list once, assigns the indexes, and breaks once both are found.
x = lst.index("a")
y = lst.index("b")
This is the other solution, which I didn't want to use because it seemed really naive: it iterates over the same list twice to find two elements, whereas the first solution does it in a single pass. So in complexity terms, even though we make extra assignments in the first solution, it should be faster than the second one on larger lists. But my assumption failed.
Here is the code I used to test the performance: https://codeshare.io/XfvGA
The second solution was 2x to 10x faster than the first one, depending on the positions of the two elements. There are several possible explanations:
There is an optimization in the index() method that I am unaware of.
Lower-level assignments being made inside index(), possibly in C/C++ code.
The conditions and extra assignments in the first solution make it slower than expected.
Even so, these reasons fall short of explaining why iterating the list twice beats iterating it once. Although languages differ greatly in how fast they run code, the iteration process itself is independent of the programming language: if you need to check a million elements, you still have to check a million elements (as exemplified by map() not being much faster than a plain loop for transforming values).
So, while I'd like you to examine the cases I presented, the question can be summarized like this: we know that Python's for loop is effectively a while loop running behind the scenes (possibly in C?), which means the index is being incremented and stored somewhere in memory as the loop runs. If there were a way to access it, that would eliminate the cost of calling and unpacking enumerate. My question is:
Does such a way exist? If not, could one be added (and why, or why not)?
The sources I used for more information on the subject:
Python speed
Python objects time complexity
Performance tips for Python
I don't think that enumerate is the problem. To prove this, you can do:
x, y = 0, 0
for val in a:
    if x and y:
        break
    if val == "a":
        x = val
    elif val == "b":
        y = val
This doesn't do the same thing you wanted in the first place (you don't get the index), but if you measure it with timeit, you will find that the difference is not so significant, meaning that enumerate is not the source of the problem (in my case it was 0.185 vs 0.155 when running your example, so it is faster, but the second solution got 0.055 on my computer).
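For reference, here is a self-contained sketch of that kind of measurement (the test list and the positions of "a" and "b" are my own choices, so the exact ratios will differ):

import timeit

lst = ["x"] * 10000 + ["a"] + ["x"] * 10000 + ["b"]

def with_enumerate():
    # single pass, tracking the index manually via enumerate
    x = y = 0
    for ind, val in enumerate(lst):
        if x and y:
            break
        if val == "a":
            x = ind
        elif val == "b":
            y = ind
    return x, y

def with_index():
    # two passes, each delegated to the C-level list.index
    return lst.index("a"), lst.index("b")

print(timeit.timeit(with_enumerate, number=1000))
print(timeit.timeit(with_index, number=1000))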
The reason that lst.index is faster is that it is implemented in C.
You can see it's source code here:
https://svn.python.org/projects/python/trunk/Objects/listobject.c
The index function is called listindex in that file and is defined like this:
static PyObject *
listindex(PyListObject *self, PyObject *args)
(I couldn't find a way to link directly to the function.)
You are trying to be un-Pythonic, which isn't going to end terribly well for you. If you really need to have that iterator count information available, there is a well-known and optimized way to do that: enumerate(). If you need to find an item in a list, there is a well-known and optimized way to do that: lst.index(). As DorElias showed above/below, enumerate is not the problem, it's that you're attempting to reinvent the wheel with the rest of your for loop. enumerate is going to be the best-supported (clearest, fastest, etc.) way to maintain an iteration count in every situation where an iteration count is actually the thing you need.
Suppose I have a list called icecream_flavours, and two lists called new_flavours and unavailable. I want to remove the elements of icecream_flavours that appear in unavailable, and add those in new_flavours to the original one. I wrote the following program:
for i in unavailable:
    icecream_flavours.remove(i)
for j in new_flavours:
    icecream_flavours.append(j)
The append part is fine, but the first part keeps raising 'ValueError: list.remove(x): x not in list'. What's the problem?
thanks
There are two possibilities here.
First, maybe there should never be anything in unavailable that wasn't in icecream_flavours, but, because of some bug elsewhere in your program, that isn't true. In that case, you're going to need to debug where things first go wrong, whether by running under the debugger or by adding print calls all over the code. At any rate, since the problem is most likely in code that you haven't shown us here, we can't help if that's the problem.
Alternatively, maybe it's completely reasonable for something to appear in unavailable even though it's not in icecream_flavours, and in that case you just want to ignore it.
That's easy to do, you just need to write the code that does it. As the docs for list.remove explain, it:
raises ValueError when x is not found in s.
So, if you want to ignore cases when i is not found in icecream_flavours, just use a try/except:
for i in unavailable:
    try:
        icecream_flavours.remove(i)
    except ValueError:
        # We already didn't have that one... which is fine
        pass
That being said, there are better ways to organize your code.
First, using the right data structure always makes things easier. Assuming you don't want duplicate flavors, and the order of flavors doesn't matter, what you really want here is sets, not lists. And if you had sets, this would be trivial:
icecream_flavours -= unavailable
icecream_flavours |= new_flavours
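For instance, assuming all three collections are sets (the sample data here is made up), the whole thing stays safe even when unavailable mentions a flavour that was never offered:

icecream_flavours = {"vanilla", "chocolate", "mint"}
unavailable = {"mint", "durian"}      # "durian" was never offered; set difference doesn't mind
new_flavours = {"pistachio"}

icecream_flavours -= unavailable
icecream_flavours |= new_flavours
print(icecream_flavours)              # {'vanilla', 'chocolate', 'pistachio'}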
Even if you can't do that, it's usually simpler to create a new list than to mutate one in-place:
unavailable_set = set(unavailable)
icecream_flavours = [flavour for flavour in icecream_flavours
                     if flavour not in unavailable_set]
(Notice that I converted unavailable to a set, so we don't have to brute-force search for each flavor in a list.)
Either one of these changes makes the code shorter, and makes it more efficient. But, more importantly, they both make the code easier to reason about, and eliminate the possibility of bugs like the one you're trying to fix.
To add all the new_flavours that are not unavailable, you can use a list comprehension, then use the += operator to add it to the existing flavors.
icecream_flavours += [i for i in new_flavours if i not in unavailable]
If there are already flavors in the original list you want to remove, you can remove them in the same way
icecream_flavours = [i for i in icecream_flavours if i not in unavailable]
If you first want to remove all the unavailable flavours from icecream_flavours and then add the new flavours, you can use this list comprehension:
icecream_flavours = [i for i in icecream_flavours if i not in unavailable] + new_flavours
Your error is caused because unavailable contains flavours that are not in icecream_flavours.
Unless order is important, you could use set instead of list, as sets have operations for differences and unions and you don't need to worry about duplicates.
If you must use lists, a list comprehension is a better way to filter the list:
icecream_flavours = [x for x in icecream_flavours if x not in unavailable]
You can extend the list of flavours like this
icecream_flavours += new_flavours
assuming there are no duplicates.
I have been looking at Pandas: run length of NaN holes, and this code fragment from the comments in particular:
Series([len(list(g)) for k, g in groupby(a.isnull()) if k])
As a python newbie, I am very impressed by the conciseness but not sure how to read this. Is it short for something along the lines of
myList = []
for k, g in groupby(a.isnull()):
    if k:
        myList.append(len(list(g)))
Series(myList)
In order to understand what is going on, I tried to play around with it but got an error:
list object is not callable
so not much luck there.
It would be lovely if someone could shed some light on this.
Thanks,
Anne
You've got the translation correct. However, the code you give cannot be run because a is a free variable.
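To see the pattern in isolation, here is a minimal example using itertools.groupby on a plain boolean list standing in for a.isnull(); it computes the run lengths of consecutive True values:

from itertools import groupby

mask = [False, True, True, False, True, True, True]
run_lengths = [len(list(g)) for k, g in groupby(mask) if k]
print(run_lengths)  # [2, 3]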
My guess is that you are getting the error because you have assigned a list object to the name list. Don't do that, because list is a global name for the type of a list.
Also, in future please always provide a full stack trace, not just one part of it. Please also provide sufficient code that at least there are no free variables.
If that is all of your code, then you have only a few possibilities:
myList.append is really a list
len is really a list
list is really a list
isnull is really a list
groupby is really a list
Series is really a list
The error exists somewhere behind groupby.
I'm going to go ahead and strike out myList.append (because that is impossible unless you are using your own groupby function for some reason) and Series. Unless you are importing Series from somewhere strange, or you are re-assigning the variable, we know Series can't be a list. A similar argument can be made for a.isnull.
So that leaves us with two real possibilities. Either you have re-assigned something somewhere in your script to be a list where it shouldn't be, or the error is behind groupby.
I think you're using the wrong groupby: itertools.groupby takes an array or list as an argument, while groupby in pandas may evaluate the first argument as a function. I especially think this because isnull() returns an array-like object.
I was trying to write an answer to this question and was quite surprised to find out that there is no find method for lists, lists have only the index method (strings have find and index).
Can anyone tell me the rationale behind that?
Why do strings have both?
I don't know why (maybe it's buried in some PEP somewhere), but I do know two very basic "find" mechanisms for lists: list.index() and the in operator. You can always make use of these two to find your items. (Also the re module, etc.)
I think the rationale for not having separate 'find' and 'index' methods is they're not different enough. Both would return the same thing in the case the sought item exists in the list (this is true of the two string methods); they differ in case the sought item is not in the list/string; however you can trivially build either one of find/index from the other. If you're coming from other languages, it may seem bad manners to raise and catch exceptions for a non-error condition that you could easily test for, but in Python, it's often considered more pythonic to shoot first and ask questions later, er, to use exception handling instead of tests like this (example: Better to 'try' something and catch the exception or test if its possible first to avoid an exception?).
I don't think it's a good idea to build 'find' out of 'index' and 'in', like
if foo in my_list:
    foo_index = my_list.index(foo)
else:
    foo_index = -1  # or do whatever else you want
because both in and index will require an O(n) pass over the list.
Better to build 'find' out of 'index' and try/except, like:
try:
    foo_index = my_list.index(foo)
except ValueError:
    foo_index = -1  # or do whatever else you want
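If you need this in more than one place, you could wrap the try/except version in a small helper (the function name and the default value are my own choices):

def find(seq, item, default=-1):
    """Return the first index of item in seq, or default if it is absent."""
    try:
        return seq.index(item)
    except ValueError:
        return default

find(["a", "b", "c"], "b")  # 1
find(["a", "b", "c"], "z")  # -1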
Now, as to why list was built this way (with only index), and string was built the other way (with separate index and find)... I can't say.
The "find" method for lists is index.
I do consider the inconsistency between string.find and list.index to be unfortunate, both in name and behavior: string.find returns -1 when no match is found, where list.index raises ValueError. This could have been designed more consistently. The only irreconcilable difference between these operations is that string.find searches for a string of items, where list.index searches for exactly one item (which, alone, doesn't justify using different names).
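A quick illustration of that difference in behaviour:

"spam".find("x")                  # -1
"spam".index("x")                 # raises ValueError
["s", "p", "a", "m"].index("x")   # raises ValueError; lists have no find()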