Understanding the set() function - python

In Python, a set is an unordered collection with no duplicate elements. However, I am not able to understand how set() generates its output.
For example, consider the following:
>>> x = [1, 1, 2, 2, 2, 2, 2, 3, 3]
>>> set(x)
set([1, 2, 3])
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> set(y)
set([8, 1, 6])
>>> z = [1, 1, 6, 6, 6, 6, 6, 7, 7]
>>> set(z)
set([1, 6, 7])
Shouldn't the output of set(y) be: set([1, 6, 8])? I tried the above two in Python 2.6.

Sets are unordered, as you say. Even though one way to implement sets is using a tree, they can also be implemented using a hash table (meaning getting the keys in sorted order may not be that trivial).
If you'd like to sort them, you can simply perform:
sorted(set(y))
which will produce a sorted list containing the set's elements. (Not a set. Again, sets are unordered.)
Otherwise, the only thing guaranteed by set is that it makes the elements unique (nothing will be there more than once).
Hope this helps!

As an unordered collection type, set([8, 1, 6]) is equivalent to set([1, 6, 8]).
While it might be nicer to display the set contents in sorted order, that would make the repr() call more expensive.
Internally, the set type is implemented using a hash table: a hash function is used to separate items into a number of buckets to reduce the number of equality operations needed to check if an item is part of the set.
To produce the repr() output it just outputs the items from each bucket in turn, which is unlikely to be the sorted order.
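To make the bucket idea concrete, here is a toy model, not CPython's actual implementation. It assumes a table of 8 slots, puts each value at hash(value) % 8, steps to the next slot on a collision, and then reads the slots in order. The name toy_bucket_order and the linear probing are made up for illustration; the real probing scheme and table sizes differ.

```python
def toy_bucket_order(values, table_size=8):
    """Toy model of a hash-table set: returns values in slot order.

    Assumes fewer distinct values than table_size (no resizing here).
    """
    slots = [None] * table_size
    for v in values:
        i = hash(v) % table_size
        # Hypothetical linear probing: skip slots taken by a different value.
        while slots[i] is not None and slots[i] != v:
            i = (i + 1) % table_size
        slots[i] = v  # duplicates simply overwrite themselves
    return [v for v in slots if v is not None]


print(toy_bucket_order([1, 1, 6, 6, 6, 6, 6, 8, 8]))  # [8, 1, 6]
print(toy_bucket_order([1, 1, 6, 6, 6, 6, 6, 7, 7]))  # [1, 6, 7]
```

In this model 8 lands in slot 0 because 8 % 8 == 0, which is why it comes out before 1 and 6.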

As +Volatility and yourself pointed out, sets are unordered. If you need the elements to be in order, just call sorted on the set:
>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> sorted(set(y))
[1, 6, 8]

Python's sets (and dictionaries) will iterate and print out in some order, but exactly what that order will be is arbitrary, and not guaranteed to remain the same after additions and removals.
Here's an example of a set changing order after a lot of values are added and then removed:
>>> s = set([1,6,8])
>>> print(s)
{8, 1, 6}
>>> s.update(range(10,100000))
>>> for v in range(10, 100000):
...     s.remove(v)
...
>>> print(s)
{1, 6, 8}
This is implementation dependent though, and so you should not rely upon it.

After reading the other answers, I still had trouble understanding why the set comes out un-ordered.
I mentioned this to my partner and he came up with this metaphor: take marbles. Put them in a tube a tad wider than a marble and you have a list. A set, however, is a bag. Even though you feed the marbles into the bag one by one, when you pour them from the bag back into the tube they will not be in the same order (because they got all mixed up in the bag).

Related

If I have a list of characters, how would I return its two most frequent elements with the lowest time complexity?

I'm using Python.
A clear algorithm will be enough for me.
I've tried using a dictionary and counting the occurrences of each character, adding it if it is not already present.
But I'm not sure whether this has the lowest possible complexity.
Use the built-in Counter(list).most_common(n) method, as below.
from collections import Counter
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 5, 7, 3, 1]
most_common_values = [value[0] for value in Counter(input_list).most_common(2)]
print(most_common_values)
This outputs: [1, 2].
The advantages of this approach are that it is fast, simple, and returns a list of the items in order of frequency. In addition, if there is a tie in the counts, it returns the element that was encountered first, as the example above shows.
Use the built-in Counter from the collections library.
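For comparison with the dictionary approach mentioned in the question, here is a minimal sketch of doing the count by hand. It makes one pass over the list to build the counts (linear in the list length) and then picks the two largest with heapq.nlargest. The function name two_most_frequent is just an illustration.

```python
import heapq


def two_most_frequent(chars):
    """Return the two most frequent items of chars, most frequent first."""
    counts = {}
    for ch in chars:
        # One dictionary update per element: a single linear pass.
        counts[ch] = counts.get(ch, 0) + 1
    # Pick the two keys with the largest counts without sorting everything.
    return heapq.nlargest(2, counts, key=counts.get)


print(two_most_frequent(list("aaabbc")))  # ['a', 'b']
```

This is roughly what Counter does for you, so in practice Counter(...).most_common(2) is the shorter way to write it.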

Python Faster 'If' Usage

I have a list:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
There are multiple if statements that check whether a number is in the list a.
while True:
    if 3 in a:
        some_work1 #(different)
    if 4 in a:
        some_work2 #(different)
    if 8 in a:
        some_work3 #(different)
    if 11 in a:
        some_work4 #(different)
    if 12 in a:
        some_work5 #(different)
Are there any faster (less CPU usage) methods for these multiple if statements? (List a is always the same and does not change over iterations.) There are no duplicate items in a. The pieces of work do not overlap.
Python 3.8.7
Use a set, which has average-case constant-time insertion and membership tests. In comparison, the in operator on a list performs a linear search through a on every check.
I'm not exactly sure what your use-case is without seeing your larger code. I'm assuming your use-case treats a as a list of flags. As such, a set fits the bill.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
a = set(a) # pass an iterable
# or simply
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
# or built at runtime
a = set()
a.add(1)
a.add(2)
if 3 in a:
    some_work1
If you want a more efficient switch statement, you have already found it. Python uses if..elif for this, which evaluates each condition in sequence and stops at the first match. If you could match multiple outcomes, use a dict (e.g. {3: functor3, 4: functor4, ...}). A functor is a callable, i.e. it has a __call__() method defined. A lambda also satisfies this.
A set is an unordered collection that does not allow duplicates. It's like a dictionary but with the values removed, leaving just keys. As you know, dictionary keys are unique, and likewise members of a set are unique. Here we just want a set for performance.
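If you want to check the difference on your own machine, a rough timing sketch with timeit could look like the following. The names a_list, a_set, list_time and set_time are just for this example. With only 12 elements the gap may be small; it grows with the size of the collection.

```python
import timeit

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
a_set = set(a_list)

# 12 is the last element, so the list lookup has to scan the whole list.
list_time = timeit.timeit(lambda: 12 in a_list, number=100_000)
set_time = timeit.timeit(lambda: 12 in a_set, number=100_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The exact numbers depend on the machine and the Python version, so treat them as a rough indication only.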
Option 1
You could use a dictionary whose keys are the number in a and values the corresponding function. Then you loop over them once and store the needed functions in an array (to_call). In the while loop you simply iterate over this array and call its members.
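def some_work1():
    print("work1")
def some_work2():
    print("work2")
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# map each number to the function that should run when it is in a
func = {3: some_work1, 4: some_work2}
# a never changes, so do the membership checks once, outside the loop
to_call = []
for k in func:
    if k in a:
        to_call.append(func[k])
while True:
    for f in to_call:
        f()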
Option 2
Write some kind of code generator that reads a and generates a .py file containing the function calls.

How to sort iterable with Python without using stable sort?

So, I have an iterable in Input like this:
[4, 6, 2, 2, 6, 4, 4, 4]
And I want to sort it in order of decreasing frequency, so that the result will be this:
[4, 4, 4, 4, 6, 6, 2, 2]
So what happened here is that, when two elements have the same frequency, they keep the order of their first appearance (6 appeared first, so the 6s go before the 2s).
I tried to implement this mechanism using the sorted function but I have a big problem.
def frequency_sort(items):
    return sorted(items, key=lambda elem: sum([True for i in items if i == elem]), reverse=True)
I know this short form is difficult to read, but it just sorts the list using the key parameter to extract the frequency of each number. But the output is this:
[4, 4, 4, 4, 6, 2, 2, 6]
As you can see the output is a little different from what it should be. And that happened (I think) because sorted() is a function that does a "stable sort" i.e. a sort that will keep the order as it is if there are same keys.
So what is happening here is like a strong stable sort. I want more like a soft-sort that will take into account the order but will put the same elements next to each other.
You could use collections.Counter and use most_common that returns in descending order of frequency:
from collections import Counter
def frequency_sorted(lst):
    counts = Counter(lst)
    return [k for k, v in counts.most_common() for _ in range(v)]
result = frequency_sorted([4, 6, 2, 2, 6, 4, 4, 4])
print(result)
Output
[4, 4, 4, 4, 6, 6, 2, 2]
From the documentation on most_common:
Return a list of the n most common elements and their counts from the
most common to the least. If n is omitted or None, most_common()
returns all elements in the counter. Elements with equal counts are
ordered in the order first encountered
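As an alternative sketch, sorted() can still be used if the key also carries a tie-breaker. The version below sorts by (negative count, index of first appearance), so equal elements end up next to each other and ties between different elements follow first appearance. The helper dictionary first_seen is a name introduced here for illustration.

```python
from collections import Counter


def frequency_sort(items):
    """Sort by decreasing frequency; ties are broken by first appearance."""
    counts = Counter(items)
    first_seen = {}
    for index, item in enumerate(items):
        # setdefault keeps only the index of the first occurrence.
        first_seen.setdefault(item, index)
    return sorted(items, key=lambda x: (-counts[x], first_seen[x]))


print(frequency_sort([4, 6, 2, 2, 6, 4, 4, 4]))  # [4, 4, 4, 4, 6, 6, 2, 2]
```

Counting once up front also avoids re-scanning the whole list for every element, which is what the lambda in the question does.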

Order of variables in python using the .update method of sets

How is the order of elements decided when updating a set?
A = {1, 2, 3, 4, 5}
A.add(8)
print(A)
A.update({-1, -2, -3})
print(A)
Why is the order {1, 2, 3, 4, 5, 8, -2, -3, -1} and not {1, 2, 3, 4, 5, 8, -1, -2, -3}?
When you write {1, 2, 3} you create a set, which is an unordered object.
You can't expect the set to preserve the order, because it uses a hash table to avoid duplicate entries of the same value. This is faster than using a list for this purpose.
When the order matters to you, then you have to use a list, on which you can call .append(element) or .insert(position, element).
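If you need both a stable order and no duplicates, one possible sketch is to build the list through dict.fromkeys, which relies on dictionaries keeping insertion order (Python 3.7 and later). The helper name ordered_unique is made up for this example.

```python
def ordered_unique(values):
    """Drop duplicates while keeping the order of first appearance."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


items = ordered_unique([1, 2, 3, 4, 5])
items = ordered_unique(items + [8])
items = ordered_unique(items + [-1, -2, -3, 1])  # the repeated 1 is dropped
print(items)  # [1, 2, 3, 4, 5, 8, -1, -2, -3]
```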
The order of elements in a set is undefined. Unlike a list, a set cannot be indexed, so you can't refer to an element as A[0].
Moreover, the set elements are usually not stored in order of appearance; this allows checking whether an element belongs to the set faster than going through all of its elements.
You can check the link below:
https://snakify.org/en/lessons/sets/

Why does list(set([2,1,3,6,5,3,6,4])) automatically order the list?

I was experimenting with set in python and while I understood that it is unsorted, based on hashes, I find it strange that it automatically sorts these numbers, both in Python 2 and in 3:
>>> list(set([2,1,3,6,5,3,6,4]))
[1, 2, 3, 4, 5, 6]
>>> list(set([2,1,3,6,5,3,6,4,0,7]))
[0, 1, 2, 3, 4, 5, 6, 7]
I googled for some time but didn't find the answer to the behavior of these two functions combined.
It does not; here is an example:
In [2]: list(set([1,9,5,10]))
Out[2]: [1, 10, 5, 9]
Also, from the docs: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
A set object is an unordered collection of distinct hashable objects.
Whether you see sorted output depends on how the hash is computed, and sometimes on which REPL is being used; this behaviour is well described in this answer.
For example, in IPython the way the set is printed changes when we enable doctest_mode, which disables IPython's pretty-printing:
In [1]: set([1,6,8,4])
Out[1]: {1, 4, 6, 8}
In [2]: %doctest_mode
Exception reporting mode: Plain
Doctest mode is: ON
>>> set([1,6,8,4])
{8, 1, 4, 6}
This is not a feature of sets and only a coincidence stemming from how sets are created.
The hashes of the items in your list are the numbers themselves. So under some circumstances the created set will show this behaviour, but it is in no way reliable.
Looking at this answer you can read more about it.
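A quick way to see the coincidence in CPython: small integers hash to themselves, so in a table with more slots than the largest value each number lands in the slot matching its own value. The 8-slot table below is an assumption for illustration only; real table sizes vary and collisions are resolved differently.

```python
values = [2, 1, 3, 6, 5, 3, 6, 4]

# In CPython, small non-negative ints are their own hash.
print(all(hash(v) == v for v in values))  # True

# Assumed 8-slot table: each value maps to the slot with its own number,
# so reading the slots in order looks "sorted".
slot_of = {v: hash(v) % 8 for v in set(values)}
print(sorted(slot_of, key=slot_of.get))  # [1, 2, 3, 4, 5, 6]

# Once values exceed the table size, slots wrap around and can collide.
print({v: hash(v) % 8 for v in [1, 9, 5, 10]})  # {1: 1, 9: 1, 5: 5, 10: 2}
```

Here 1 and 9 compete for the same slot, which is the kind of situation where the output stops looking sorted.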
