I'm a new learner of python/programming. Here is a question on top of head about the use of function in python.
If I had a list called myList.
(a) If I were to sort it, I would use myList.sort()
(b) If I were to sort it temporarily, I would use sorted(myList)
Note the difference between the use of two functions, one is to apply the function to myList, the other one is use myList as a parameter to the function.
My question is, each time when I use a function.
How do I know if the function should be used as an "action" to be applied to an object (in (a)), or
should an object passed to the function as a parameter,(in (b)).
I have been confused with this for quite long time. appreciate any explanations.
Thanks.
There are two big differences between list.sort and sorted(list)
The list.sort() sorts the list in-place, which means it modifies the
list. The sorted function does not modify original list but returns
a sorted list
The list.sort() only applies to list (it is a method), but sorted built-in function can take any iterable object.
Please go through this useful documentation.
Only sorted is a function - list.sort is a method of the list type.
Functions such as sorted are applicable to more than a specific type. For example, you can get a sorted list, set, or even a temporary generator. Only the output is concrete (you always get a new list) but not the input.
Methods such as sort are applicable only to the type that holds them. For example, there is a list.sort method but not a dict.sort method. Even for types whose methods have the same name, switching them is not sensible - for example, set.copy cannot be used to copy a dict.
An easy way to distinguish the two is that functions live in regular namespaces, such as modules. On the other hand, methods only live inside classes and their instances.
sorted # function
list.sort # method
import math
math.sqrt # function
math.pi.as_integer_ratio # method
Conventionally, Python usually uses functions for immutable actions and methods for mutating actions. For example, sorted provides a new sorted list leaving the old one untouched; my_list.sort() sorts the existing list, providing no new one.
my_list = [4, 2, 3, 1]
print(sorted(my_list)) # prints [1, 2, 3, 4]
print(my_list) # prints [4, 2, 3, 1] - unchanged by sorted
print(my_list.sort()) # prints None - no new list produced
print(my_list) # prints [1, 2, 3, 4] - changed by sort
sort() is an in-place function whereas sorted() will return a sorted list, but will not alter your variable in place. The following demonstrates the difference:
l = [1, 2, 1, 3, 2, 4]
l.sort()
print(l) --returns [1, 1, 2, 2, 3, 4]
l = [1, 2, 1, 3, 2, 4]
new_l = sorted(l)
print(new_l) -- returns [1, 1, 2, 2, 3, 4]
print(l) -- [1, 2, 1, 3, 2, 4]
If you want to maintain the original order of your list use sorted, otherwise you can use sort().
Related
I am trying to sort a list by frequency of its elements.
>>> a = [5, 5, 4, 4, 4, 1, 2, 2]
>>> a.sort(key = a.count)
>>> a
[5, 5, 4, 4, 4, 1, 2, 2]
a is unchanged. However:
>>> sorted(a, key = a.count)
[1, 5, 5, 2, 2, 4, 4, 4]
Why does this method not work for .sort()?
What you see is the result of a certain CPython implementation detail of list.sort. Try this again, but create a copy of a first:
a.sort(key=a.copy().count)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
.sort modifies a internally, so a.count is going to produce un-predictable results. This is documented as an implementation detail.
What copy call does is it creates a copy of a and uses that list's count method as the key. You can see what happens with some debug statements:
def count(x):
print(a)
return a.count(x)
a.sort(key=count)
[]
[]
[]
...
a turns up as an empty list when accessed inside .sort, and [].count(anything) will be 0. This explains why the output is the same as the input - the predicates are all the same (0).
OTOH, sorted creates a new list, so it doesn't have this problem.
If you really want to sort by frequency counts, the idiomatic method is to use a Counter:
from collections import Counter
a.sort(key=Counter(a).get)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
It doesn't work with the list.sort method because CPython decides to "empty the list" temporarily (the other answer already presents this). This is mentioned in the documentation as implementation detail:
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.
The source code contains a similar comment with a bit more explanation:
/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/
The explanation isn't straight-forward but the problem is that the key-function and the comparisons could change the list instance during sorting which is very likely to result in undefined behavior of the C-code (which may crash the interpreter). To prevent that the list is emptied during the sorting, so that even if someone changes the instance it won't result in an interpreter crash.
This doesn't happen with sorted because sorted copies the list and simply sorts the copy. The copy is still emptied during the sorting but there's no way to access it, so it isn't visible.
However you really shouldn't sort like this to get a frequency sort. That's because for each item you call the key function once. And list.count iterates over each item, so you effectively iterate the whole list for each element (what is called O(n**2) complexity). A better way would be to calculate the frequency once for each element (can be done in O(n)) and then just access that in the key.
However since CPython has a Counter class that also supports most_common you could really just use that:
>>> from collections import Counter
>>> [item for item, count in reversed(Counter(a).most_common()) for _ in range(count)]
[1, 2, 2, 5, 5, 4, 4, 4]
This may change the order of the elements with equal counts but since you're doing a frequency count that shouldn't matter to much.
list.sort() sorts the list and replaces the original list, whereas sorted(list) returns a sorted copy of the list, without changing the original list.
When is one preferred over the other?
Which is more efficient? By how much?
Can a list be reverted to the unsorted state after list.sort() has been performed?
Please use Why do these list operations (methods) return None, rather than the resulting list? to close questions where OP has inadvertently assigned the result of .sort(), rather than using sorted or a separate statement. Proper debugging would reveal that .sort() had returned None, at which point "why?" is the remaining question.
sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in-place, mutating the list indices, and returns None (like all in-place operations).
sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you'll get the keys), generators, etc., returning a list containing all elements, sorted.
Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.
For lists, list.sort() is faster than sorted() because it doesn't have to create a copy. For any other iterable, you have no choice.
No, you cannot retrieve the original positions. Once you called list.sort() the original order is gone.
What is the difference between sorted(list) vs list.sort()?
list.sort mutates the list in-place & returns None
sorted takes any iterable & returns a new list, sorted.
sorted is equivalent to this Python implementation, but the CPython builtin function should run measurably faster as it is written in C:
def sorted(iterable, key=None):
new_list = list(iterable) # make a new list
new_list.sort(key=key) # sort it
return new_list # return it
when to use which?
Use list.sort when you do not wish to retain the original sort order
(Thus you will be able to reuse the list in-place in memory.) and when
you are the sole owner of the list (if the list is shared by other code
and you mutate it, you could introduce bugs where that list is used.)
Use sorted when you want to retain the original sort order or when you
wish to create a new list that only your local code owns.
Can a list's original positions be retrieved after list.sort()?
No - unless you made a copy yourself, that information is lost because the sort is done in-place.
"And which is faster? And how much faster?"
To illustrate the penalty of creating a new list, use the timeit module, here's our setup:
import timeit
setup = """
import random
lists = [list(range(10000)) for _ in range(1000)] # list of lists
for l in lists:
random.shuffle(l) # shuffle each list
shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
"""
And here's our results for a list of randomly arranged 10000 integers, as we can see here, we've disproven an older list creation expense myth:
Python 2.7
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]
Python 3
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]
After some feedback, I decided another test would be desirable with different characteristics. Here I provide the same randomly ordered list of 100,000 in length for each iteration 1,000 times.
import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""
I interpret this larger sort's difference coming from the copying mentioned by Martijn, but it does not dominate to the point stated in the older more popular answer here, here the increase in time is only about 10%
>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]
I also ran the above on a much smaller sort, and saw that the new sorted copy version still takes about 2% longer running time on a sort of 1000 length.
Poke ran his own code as well, here's the code:
setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''
t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'
length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)
for t in t1, t2:
print(t)
print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))
He found for 1000000 length sort, (ran 100 times) a similar result, but only about a 5% increase in time, here's the output:
10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655
Conclusion:
A large sized list being sorted with sorted making a copy will likely dominate differences, but the sorting itself dominates the operation, and organizing your code around these differences would be premature optimization. I would use sorted when I need a new sorted list of the data, and I would use list.sort when I need to sort a list in-place, and let that determine my usage.
The main difference is that sorted(some_list) returns a new list:
a = [3, 2, 1]
print sorted(a) # new list
print a # is not modified
and some_list.sort(), sorts the list in place:
a = [3, 2, 1]
print a.sort() # in place
print a # it's modified
Note that since a.sort() doesn't return anything, print a.sort() will print None.
Can a list original positions be retrieved after list.sort()?
No, because it modifies the original list.
Here are a few simple examples to see the difference in action:
See the list of numbers here:
nums = [1, 9, -3, 4, 8, 5, 7, 14]
When calling sorted on this list, sorted will make a copy of the list. (Meaning your original list will remain unchanged.)
Let's see.
sorted(nums)
returns
[-3, 1, 4, 5, 7, 8, 9, 14]
Looking at the nums again
nums
We see the original list (unaltered and NOT sorted.). sorted did not change the original list
[1, 2, -3, 4, 8, 5, 7, 14]
Taking the same nums list and applying the sort function on it, will change the actual list.
Let's see.
Starting with our nums list to make sure, the content is still the same.
nums
[-3, 1, 4, 5, 7, 8, 9, 14]
nums.sort()
Now the original nums list is changed and looking at nums we see our original list has changed and is now sorted.
nums
[-3, 1, 2, 4, 5, 7, 8, 14]
Note: Simplest difference between sort() and sorted() is: sort()
doesn't return any value while, sorted() returns an iterable list.
sort() doesn't return any value.
The sort() method just sorts the elements of a given list in a specific order - Ascending or Descending without returning any value.
The syntax of sort() method is:
list.sort(key=..., reverse=...)
Alternatively, you can also use Python's in-built function sorted()
for the same purpose. sorted function return sorted list
list=sorted(list, key=..., reverse=...)
The .sort() function stores the value of new list directly in the list variable; so answer for your third question would be NO.
Also if you do this using sorted(list), then you can get it use because it is not stored in the list variable. Also sometimes .sort() method acts as function, or say that it takes arguments in it.
You have to store the value of sorted(list) in a variable explicitly.
Also for short data processing the speed will have no difference; but for long lists; you should directly use .sort() method for fast work; but again you will face irreversible actions.
With list.sort() you are altering the list variable but with sorted(list) you are not altering the variable.
Using sort:
list = [4, 5, 20, 1, 3, 2]
list.sort()
print(list)
print(type(list))
print(type(list.sort())
Should return this:
[1, 2, 3, 4, 5, 20]
<class 'NoneType'>
But using sorted():
list = [4, 5, 20, 1, 3, 2]
print(sorted(list))
print(list)
print(type(sorted(list)))
Should return this:
[1, 2, 3, 4, 5, 20]
[4, 5, 20, 1, 3, 2]
<class 'list'>
I'm new to writing code, and I understand the behavior of lists to an extent. Whenever a list is modified within a function's scope, the one in global scope changes too.
For example,
def modify_list(lst):
lst.append(5)
lst = [1, 2, 3, 4]
#This output is [1, 2, 3, 4]
print(lst)
modify_list(lst)
#This output is [1, 2, 3, 4, 5] because of the function.
print(lst)
I don't understand why this example won't work:
def modify_list(lst):
lst = [1, 2, 3, 4, 5]
lst = [1, 2, 3, 4]
#Output is [1, 2, 3, 4]
print(lst)
modify_list(lst)
#Output is [1, 2, 3, 4]
print(lst)
Why doesn't lst get modified in the second example? Is it because I'm creating a new object within the function's scope? Using the global keyword works instead of passing a parameter, but I want to avoid using global unless absolutely necessary.
I'm using this in an initialization function and want to revert the list back to its original state whenever the function is called. Again, using global works, I'm just wondering why this doesn't work.
Thanks! (Sorry if I'm not good at explaining things well)
The id function comes in handy here. Basically, the id function returns an integer that is guaranteed to be unique and constant for the lifetime of whatever object it was called on. In fact, in CPython (regular Python), id returns the address of the object in memory.
So if you run your code, printing the id of lst before and after running your modify_list, you'll find that the id changes when you assign to lst but not when you append.
Calling append on lst won't change the id of lst because lists in Python are mutable, and appending simply mutates the list. But, when you assign [1, 2, 3, 4, 5] to lst, you are creating a brand-new list object and assigning it to lst. This is not a mutation, and doesn't change the original in anyway. In general in Python, you can mutate arguments within a function to modify the original copy, but assigning a new object to it is not a mutation and won't change the original copy.
In
def modify_list(lst):
lst = [1, 2, 3, 4, 5]
you make local variable lst point to a completely different object than the one you passed as the argument in modify_list(lst) call. Maybe this article will help you understand: https://medium.com/school-of-code/passing-by-assignment-in-python-7c829a2df10a
list.sort() sorts the list and replaces the original list, whereas sorted(list) returns a sorted copy of the list, without changing the original list.
When is one preferred over the other?
Which is more efficient? By how much?
Can a list be reverted to the unsorted state after list.sort() has been performed?
Please use Why do these list operations (methods) return None, rather than the resulting list? to close questions where OP has inadvertently assigned the result of .sort(), rather than using sorted or a separate statement. Proper debugging would reveal that .sort() had returned None, at which point "why?" is the remaining question.
sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in-place, mutating the list indices, and returns None (like all in-place operations).
sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you'll get the keys), generators, etc., returning a list containing all elements, sorted.
Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.
For lists, list.sort() is faster than sorted() because it doesn't have to create a copy. For any other iterable, you have no choice.
No, you cannot retrieve the original positions. Once you called list.sort() the original order is gone.
What is the difference between sorted(list) vs list.sort()?
list.sort mutates the list in-place & returns None
sorted takes any iterable & returns a new list, sorted.
sorted is equivalent to this Python implementation, but the CPython builtin function should run measurably faster as it is written in C:
def sorted(iterable, key=None):
new_list = list(iterable) # make a new list
new_list.sort(key=key) # sort it
return new_list # return it
when to use which?
Use list.sort when you do not wish to retain the original sort order
(Thus you will be able to reuse the list in-place in memory.) and when
you are the sole owner of the list (if the list is shared by other code
and you mutate it, you could introduce bugs where that list is used.)
Use sorted when you want to retain the original sort order or when you
wish to create a new list that only your local code owns.
Can a list's original positions be retrieved after list.sort()?
No - unless you made a copy yourself, that information is lost because the sort is done in-place.
"And which is faster? And how much faster?"
To illustrate the penalty of creating a new list, use the timeit module, here's our setup:
import timeit
setup = """
import random
lists = [list(range(10000)) for _ in range(1000)] # list of lists
for l in lists:
random.shuffle(l) # shuffle each list
shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
"""
And here's our results for a list of randomly arranged 10000 integers, as we can see here, we've disproven an older list creation expense myth:
Python 2.7
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]
Python 3
>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]
After some feedback, I decided another test would be desirable with different characteristics. Here I provide the same randomly ordered list of 100,000 in length for each iteration 1,000 times.
import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""
I interpret this larger sort's difference coming from the copying mentioned by Martijn, but it does not dominate to the point stated in the older more popular answer here, here the increase in time is only about 10%
>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]
I also ran the above on a much smaller sort, and saw that the new sorted copy version still takes about 2% longer running time on a sort of 1000 length.
Poke ran his own code as well, here's the code:
setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''
t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'
length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)
for t in t1, t2:
print(t)
print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))
He found for 1000000 length sort, (ran 100 times) a similar result, but only about a 5% increase in time, here's the output:
10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655
Conclusion:
A large sized list being sorted with sorted making a copy will likely dominate differences, but the sorting itself dominates the operation, and organizing your code around these differences would be premature optimization. I would use sorted when I need a new sorted list of the data, and I would use list.sort when I need to sort a list in-place, and let that determine my usage.
The main difference is that sorted(some_list) returns a new list:
a = [3, 2, 1]
print sorted(a) # new list
print a # is not modified
and some_list.sort(), sorts the list in place:
a = [3, 2, 1]
print a.sort() # in place
print a # it's modified
Note that since a.sort() doesn't return anything, print a.sort() will print None.
Can a list original positions be retrieved after list.sort()?
No, because it modifies the original list.
Here are a few simple examples to see the difference in action:
See the list of numbers here:
nums = [1, 9, -3, 4, 8, 5, 7, 14]
When calling sorted on this list, sorted will make a copy of the list. (Meaning your original list will remain unchanged.)
Let's see.
sorted(nums)
returns
[-3, 1, 4, 5, 7, 8, 9, 14]
Looking at the nums again
nums
We see the original list (unaltered and NOT sorted.). sorted did not change the original list
[1, 2, -3, 4, 8, 5, 7, 14]
Taking the same nums list and applying the sort function on it, will change the actual list.
Let's see.
Starting with our nums list to make sure, the content is still the same.
nums
[-3, 1, 4, 5, 7, 8, 9, 14]
nums.sort()
Now the original nums list is changed and looking at nums we see our original list has changed and is now sorted.
nums
[-3, 1, 2, 4, 5, 7, 8, 14]
Note: Simplest difference between sort() and sorted() is: sort()
doesn't return any value while, sorted() returns an iterable list.
sort() doesn't return any value.
The sort() method just sorts the elements of a given list in a specific order - Ascending or Descending without returning any value.
The syntax of sort() method is:
list.sort(key=..., reverse=...)
Alternatively, you can also use Python's in-built function sorted()
for the same purpose. sorted function return sorted list
list=sorted(list, key=..., reverse=...)
The .sort() function stores the value of new list directly in the list variable; so answer for your third question would be NO.
Also if you do this using sorted(list), then you can get it use because it is not stored in the list variable. Also sometimes .sort() method acts as function, or say that it takes arguments in it.
You have to store the value of sorted(list) in a variable explicitly.
Also for short data processing the speed will have no difference; but for long lists; you should directly use .sort() method for fast work; but again you will face irreversible actions.
With list.sort() you are altering the list variable but with sorted(list) you are not altering the variable.
Using sort:
list = [4, 5, 20, 1, 3, 2]
list.sort()
print(list)
print(type(list))
print(type(list.sort())
Should return this:
[1, 2, 3, 4, 5, 20]
<class 'NoneType'>
But using sorted():
list = [4, 5, 20, 1, 3, 2]
print(sorted(list))
print(list)
print(type(sorted(list)))
Should return this:
[1, 2, 3, 4, 5, 20]
[4, 5, 20, 1, 3, 2]
<class 'list'>
Normally, list comprehensions are used to derive a new list from an existing list. Eg:
>>> a = [1, 2, 3, 4, 5]
>>> [i for i in a if i > 2]
[3, 4, 5]
Should we use them to perform other procedures? Eg:
>>> a = [1, 2, 3, 4, 5]
>>> b = []
>>> [b.append(i) for i in a]
[None, None, None, None, None]
>>> print b
[1, 2, 3, 4, 5]
or should I avoid the above and use the following instead?:
for i in a:
b.append(i)
You should indeed avoid using list comprehensions (along with dictionary comprehensions, set comprehensions and generator expressions) for side effects. Apart from the fact that they'd accumulate a bogus list and thus waste memory, it's also confusing. I expect a list comprehension to generate a (meaningful) value, and many would agree. Loops, on the other hand, are clearly a sequence of statements. They are expected to kick off side effects and generate no result value - no surprise.
From python documentation:
List comprehensions provide a concise way to create lists. Common
applications are to make new lists
Perhaps you want to learn more about reduce(), filter() and map() functions.
In the example you give it would make the most sense to do:
b = [i for i in a]
if for some reason you wanted to create b. In general, there is some common sense that must be employed. If using a comprehension makes your code unreadable, don't use it. Otherwise go for it.
Only use list comprehensions if you plan to use the created list. Otherwise you create it just for the GC to throw it again without ever being used.
So instead of [b.append(i) for i in a] you should use a proper for loop:
for i in a:
b.append(i)
Another solution would be through a generator expression:
b += (i for i in a)
However, if you want to append the whole list, you can simply do
b += a
And if you just need to apply a function to the elements before adding them to the list, you can always use map:
b += map(somefunc, a)
b = []
a = [1, 2, 3, 4, 5]
b.extend (a)