Encode Python lists as indexes of unique values

I'd like to represent an arbitrary list as two other lists. The first, call it values, contains the unique elements of the original list; the second, call it codes, contains, for each element of the original list, its index in values, so that the original list can be reconstructed as
orig_list = [values[c] for c in codes]
(Note: this is similar to how pandas.Categorical represents series)
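For example, a small hypothetical list and its decomposition might look like this:
orig_list = ['b', 'a', 'b', 'c', 'a']
values = ['a', 'b', 'c']   # unique elements
codes = [1, 0, 1, 2, 0]    # index in values of each original element
# [values[c] for c in codes] == ['b', 'a', 'b', 'c', 'a'] == orig_list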
I've created the function below to do this decomposition:
def decompose(x):
    values = sorted(list(set(x)))
    codes = [0 for _ in x]
    for i, value in enumerate(values):
        codes = [i if elem == value else code for elem, code in zip(x, codes)]
    return values, codes
This works, but I would like to know if there is a better/more efficient way of achieving this (no double loop?), or if there's something in the standard library that could do this for me.
Update:
The answers below are great and a big improvement over my function. I've timed all the ones that worked as intended:
import random

test_list = [random.randint(1, 10) for _ in range(10000)]
functions = [decompose, decompose_boris1, decompose_boris2,
             decompose_alexander, decompose_stuart1, decompose_stuart2,
             decompose_dan1]

for f in functions:
    print("-- " + f.__name__)
    # test
    values, codes = f(test_list)
    decoded_list = [values[c] for c in codes]
    if decoded_list == test_list:
        print("Test passed")
        %timeit f(test_list)
    else:
        print("Test failed")
Results:
-- decompose
Test passed
12.4 ms ± 269 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-- decompose_boris1
Test passed
1.69 ms ± 21.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_boris2
Test passed
1.63 ms ± 18.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_alexander
Test passed
681 µs ± 2.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart1
Test passed
1.7 ms ± 3.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart2
Test passed
682 µs ± 5.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_dan1
Test passed
896 µs ± 19.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I'm accepting Stuart's answer for being the simplest and one of the fastest.

I’m quite happy with this solution, although I am still trying to find a better one.
Code
from typing import Any, Dict, List, Tuple

def decompose(original_list: List[Any]) -> Tuple[List[int], Dict[int, Any]]:
    code_to_elem = dict(enumerate(set(original_list)))
    elem_to_code = {v: k for k, v in code_to_elem.items()}
    encoded_list = [elem_to_code[elem] for elem in original_list]
    return encoded_list, code_to_elem
Test run
# t_list for test_list
t_list = [1, 2, 19, 3, 2, 19, 2, 3, 19, 1, 1, 3]
t_encoded, t_decoder = decompose(t_list)
t_decoded = [t_decoder[curr_code] for curr_code in t_encoded]
Here are the contents of the important variables:
t_list: [1, 2, 19, 3, 2, 19, 2, 3, 19, 1, 1, 3]
t_encoded: [1, 2, 3, 0, 2, 3, 2, 0, 3, 1, 1, 0]
t_decoder: {0: 3, 1: 1, 2: 2, 3: 19}
t_decoded: [1, 2, 19, 3, 2, 19, 2, 3, 19, 1, 1, 3]
Let me know if you have any questions :)

This would count as an answer even if it is merely an improvement on Boris's answer.
I would use index_of_values.append(values.setdefault(elem, len(values))) as the loop body, as that reduces three dict lookups to one and keeps the branch outside the interpreter. One might even create locals for the two methods to avoid repeatedly looking them up. But it seems that the savings from doing both are only about 7%.
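A quick sketch of that variant (the cached-method locals _append and _setdefault are just illustrative names):
def decompose(x):
    values = {}
    index_of_values = []
    _append = index_of_values.append        # cache the bound methods locally
    _setdefault = values.setdefault
    for elem in x:
        _append(_setdefault(elem, len(values)))
    return list(values), index_of_values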
But using the insane-looking values = defaultdict(lambda: len(values)) gives a 23% improvement:
from collections import defaultdict

def decompose(x):
    values = defaultdict(lambda: len(values))
    index_of_values = []
    _append = index_of_values.append
    for elem in x:
        _append(values[elem])
    return list(values), index_of_values
It is even better if the loop is replaced by a map:
def decompose(x):
    values = defaultdict(lambda: len(values))
    index_of_values = list(map(values.__getitem__, x))
    return list(values), index_of_values
That gives a 57% improvement. I would have caught that earlier if I had been looking at the function's output. Also, get evidently doesn't trigger the factory (the factory is only invoked via __getitem__'s __missing__ hook), which is why __getitem__ is used in the map.
If the dict does not retain insertion order:
return sorted(values, key=values.get), index_of_values

You can use a simple index lookup:
def decompose(x):
    values = sorted(set(x))
    return values, [values.index(v) for v in x]
If more time-efficiency is needed (because x is very large) then this can be achieved (in exchange for some memory overhead) by representing values as a dictionary:
def decompose(x):
    values = sorted(set(x))
    d = {value: index for index, value in enumerate(values)}
    return values, [d[v] for v in x]
If sorting is not needed (or not possible for some reason) then replace sorted with list in the above.
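For instance, a hypothetical decompose_unsorted along those lines might look like:
def decompose_unsorted(x):
    # Same as above, but without sorting (values come out in arbitrary set order).
    values = list(set(x))
    d = {value: index for index, value in enumerate(values)}
    return values, [d[v] for v in x]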

You can do this, IIUC, with cummin:
df['newgroup'] = df.reset_index().groupby('group')['index'].cummin()
In [1579]: df
Out[1579]:
    group  newgroup
0       5         0
1       4         1
2       5         0
3       6         3
4       7         4
5       8         5
6       5         0
7       3         7
8       2         8
9       5         0
10      6         3
11      7         4
12      8         5
13      8         5
14      5         0
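A minimal, self-contained sketch of that approach, assuming a DataFrame with a single 'group' column (the pandas import and example data are assumptions here):
import pandas as pd

df = pd.DataFrame({'group': [5, 4, 5, 6, 7, 8, 5, 3, 2, 5, 6, 7, 8, 8, 5]})
# For each row, take the smallest positional index seen so far within its group,
# i.e. the index of the group's first occurrence.
df['newgroup'] = df.reset_index().groupby('group')['index'].cummin()
print(df)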

Create a dictionary whose keys are the unique values, each mapping to its index among the keys (dictionaries keep insertion order starting with CPython 3.6). Build it by iterating over the list: if an element is not yet in the dictionary, add it, mapping it to the dictionary's length at the time it was added. Then look up each element's index in the dictionary and append it to the list of codes. Finally, return just the keys, along with the list of indexes.
def decompose(x):
    values = {}
    index_of_values = [values.setdefault(elem, len(values)) for elem in x]
    return list(values), index_of_values
This is linear time and space complexity. Use it like this:
>>> decompose([2, 1, 1, 1, 131, 42, 2])
([2, 1, 131, 42], [0, 1, 1, 1, 2, 3, 0])
Using a side effect inside a list comprehension is generally frowned upon, so you might want to write this function out more explicitly:
def decompose(x):
    values = {}
    index_of_values = []
    for elem in x:
        if elem not in values:
            values[elem] = len(values)
        index_of_values.append(values[elem])
    return list(values), index_of_values

If you need something like pandas.Categorical:
arbi_arr = [1, 2, 3, 1, 2, 3]
value = list(dict.fromkeys(arbi_arr))
code = [value.index(elem) for elem in arbi_arr]
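For reference, pandas itself exposes this decomposition directly via pandas.factorize; a small sketch, assuming pandas is installed:
import pandas as pd

codes, uniques = pd.factorize([1, 2, 3, 1, 2, 3])
# codes   -> array([0, 1, 2, 0, 1, 2])
# uniques -> array([1, 2, 3])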

Related

Low Complexity Python Sort (Small - Large - Small) by a Class Attribute

I am looking to sort my list (of class instances) into the order small - large - small; for example, if it were purely numeric and the list was [1,5,3,7,7,3,2], the sort would look like [1,3,7,7,5,3,2].
The basic class structure is:
class LaneData:
    def __init__(self):
        self.Name = "Random"
        self.laneWeight = 5
So essentially the sort function would work from the LaneData.laneWeight variable.
I found this answer here, which I'm not entirely sure will work in this instance, since the list holds class instances.
My second idea is like so (pseudocode below):
def sortByWeight(e):
    return e.getLaneWeight()

newList = []                                 # create a new list to store the result
lanes.sort(key=sortByWeight)                 # Python default sort by our class attribute
newList = lanes[:len(lanes) // 2]            # get the first half of the sorted list (low-high)
lanes.sort(key=sortByWeight, reverse=True)   # reverse the sort
newList = newList + lanes[:len(lanes) // 2]  # get the first half of the sorted list (high-low)
If possible I'd like to keep the sort small and efficient, I don't mind building a small algorithm for it if needs be.
What's your thoughts team?
Your solution works, but you sort the end of the list in ascending order first and in descending order afterwards.
You could optimize it by: looking for the index of the max, swapping the max with the element in the middle position, and finally sorting the first half of the list (in ascending order) and the second half (in descending order) separately.
Instead of doing a reverse sort of the second half, you can simply reverse it (with the reverse function). This is less complex (O(n) instead of O(n log n)).
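A rough sketch of that suggestion (middle_sort_swap is a hypothetical name; the key parameter is assumed for parity with the versions below):
def middle_sort_swap(seq, key=None):
    k = key if key is not None else (lambda v: v)
    mid = len(seq) // 2
    i_max = max(range(len(seq)), key=lambda i: k(seq[i]))  # index of the maximum
    seq[mid], seq[i_max] = seq[i_max], seq[mid]            # move the max to the middle
    left = sorted(seq[:mid], key=key)                      # first half ascending
    right = sorted(seq[mid:], key=key)
    right.reverse()                                        # reverse instead of a reverse sort
    return left + right

print(middle_sort_swap([1, 5, 3, 7, 7, 3, 2]))
# [1, 3, 5, 7, 7, 3, 2]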
Discussion
Given that you can just use the key parameter, I would just ignore it for the time being.
Your algorithm for a given sequence looks like:
def middle_sort_flip_OP(seq, key=None):
    result = []
    length = len(seq)
    seq.sort(key=key)
    result = seq[:length // 2]
    seq.sort(key=key, reverse=True)
    result.extend(seq[:length // 2])
    return result
print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip_OP([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]
The second sorting step is completely unnecessary (but has the same computational complexity as a simple reversing for the Timsort algorithm implemented in Python), since you can simply slice the sorted sequence backward (making sure to compute the correct offset for the "middle" element):
def middle_sort_flip(seq, key=None):
    length = len(seq)
    offset = length % 2
    seq.sort(key=key)
    return seq[:length // 2] + seq[:length // 2 - 1 + offset:-1]
print(middle_sort_flip([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 2, 3, 3, 9, 8, 7, 5]
print(middle_sort_flip([1, 5, 3, 7, 3, 2, 8]))
# [1, 2, 3, 8, 7, 5]
Another approach, which is theoretically more efficient, consists of ordering the left and right sides of the sequence separately. This is more efficient because each sorting step is O(N/2 log N/2), which, when combined, gives O(N log N/2) (instead of O(N + N log N)):
def middle_sort_half(seq, key=None):
    length = len(seq)
    return \
        sorted(seq[:length // 2], key=key) \
        + sorted(seq[length // 2:], key=key, reverse=True)
However, those approaches either give a largely unbalanced result where the whole right side is larger than the left side (middle_sort_flip()), or have a balancing which is dependent on the initial ordering of the input (middle_sort_half()).
A more balanced result can be obtained by extracting and recombining the odd and even subsequences. This is simple enough in Python thanks to the slicing operations and has the same asymptotic complexity as middle_sort_flip() but with much better balancing properties:
def middle_sort_mix(seq, key=None):
    length = len(seq)
    offset = length % 2
    seq.sort(key=key)
    result = [None] * length
    result[:length // 2] = seq[::2]
    result[length // 2 + offset:] = seq[-1 - offset::-2]
    return result
print(middle_sort_mix([1, 5, 3, 7, 3, 2, 9, 8]))
# [1, 3, 5, 8, 9, 7, 3, 2]
print(middle_sort_mix([1, 5, 3, 7, 3, 2, 8]))
# [1, 3, 5, 8, 7, 3, 2]
Benchmarks
Speedwise they are all very similar when the key parameter is not used, because the execution time is dominated by the copying around:
import random

nums = [10 ** i for i in range(1, 7)]
funcs = middle_sort_flip_OP, middle_sort_flip, middle_sort_half, middle_sort_mix

print(nums)
# [10, 100, 1000, 10000, 100000, 1000000]

def gen_input(num):
    return list(range(num))

for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end=" ")
        %timeit func(seq.copy())
    print()
...
N = 1000000
middle_sort_flip_OP 542 ms ± 54.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 510 ms ± 49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 546 ms ± 4.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 539 ms ± 63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On the other hand, when the key parameter is non-trivial, your approach makes a much larger number of function calls than the other approaches, which may result in a significant increase in execution time for middle_sort_flip_OP():
def gen_input(num):
    return list(range(num))

def key(x):
    return x ** 2

for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end=" ")
        %timeit func(seq.copy(), key=key)
    print()
...
N = 1000000
middle_sort_flip_OP 1.33 s ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 1.09 s ± 23.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 1.1 s ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 1.11 s ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
or, closer to your use-case:
class Container():
    def __init__(self, x):
        self.x = x

    def get_x(self):
        return self.x

def gen_input(num):
    return [Container(x) for x in range(num)]

def key(c):
    return c.get_x()

for num in nums:
    print(f"N = {num}")
    for func in funcs:
        seq = gen_input(num)
        random.shuffle(seq)
        print(f"{func.__name__:>24s}", end=" ")
        %timeit func(seq.copy(), key=key)
    print()
...
N = 1000000
middle_sort_flip_OP 1.27 s ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_flip 1.13 s ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_half 1.24 s ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
middle_sort_mix 1.16 s ± 8.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which seems to be a bit less dramatic.

Find if a value exists in a list of lists in Python

I have a list of lists like this:
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I now want to find if a value exists in my_list, and I need to do this in the most efficient way. I will use it in the following way:
if my_value in my_list:
    # do something
I have tried two versions, shown below.
# 1
if any(my_value in sublist for sublist in my_list):
    # do something

# 2
for sublist in my_list:
    if my_value in sublist:
        return True
I find version 1 easier to read and doesn't require a separate function call. But does the any() function stop when it finds the value, or does it loop through the entire list? And is there a better, or more pythonic, way of doing this lookup?
The difference seems to be negligible, with the second version only slightly faster.
import numpy as np

my_list = [np.random.randint(1, 10000, 1000) for v in range(1000)]
my_value = np.random.randint(1, 10000, 1)[0]

def v1(my_list):
    if any(my_value in sublist for sublist in my_list):
        return True
    return False

def v2(my_list):
    for sublist in my_list:
        if my_value in sublist:
            return True
    return False

print(my_value)
%timeit v1(my_list)
%timeit v2(my_list)
>> 3012
>> 56.9 µs ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>> 51.2 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
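As for whether any() stops early: it does short-circuit as soon as a truthy result is produced, which a small sketch can confirm (the counter is just for illustration):
checked = 0

def contains(sublist, value):
    global checked
    checked += 1
    return value in sublist

my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(any(contains(sublist, 5) for sublist in my_list))  # True
print(checked)  # 2 -- the third sublist was never checked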

Most efficient way to convert a list of counts to a list of numbers

I have a list of counts where each index represents a number and its count represents how many of that number are in the list:
a = [3,5,1,2]
turns into
b = [0,0,0,1,1,1,1,1,2,3,3]
I was thinking we could do something like:
b = []
for ix, el in enumerate(a):
    b.extend([ix] * a[ix])
print(b)
But if I am not mistaken, this takes k (the count value) time to put each group into list b, since extend takes k time, and it has to be done n times, giving a runtime of n*k, where n is the number of indices and k is the count for each index.
Another idea: instead of having an array of counts, we can have an array of the elements themselves:
a = [[0,0,0],[1,1,1,1,1],[2],[3,3]]
but flattening it still takes quite some time (I believe n*k time):
b = [item for sublist in a for item in sublist]
Is there a way to make this more efficient? Maybe converting to a string, removing all brackets, and converting back into a list?
You can use numpy's np.repeat for a performant approach:
np.repeat(np.arange(len(a)), a)
# array([0, 0, 0, 1, 1, 1, 1, 1, 2, 3, 3])
Here are the timings -
import numpy as np

a_large = np.concatenate([a] * 10_000, axis=0)

def op(a):
    b = []
    for ix, el in enumerate(a):
        b.extend([ix] * a[ix])

def yatu(a):
    np.repeat(np.arange(len(a)), a)
%timeit op(a_large)
# 17.1 ms ± 422 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit yatu(a_large)
# 368 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Convert list values to boolean [closed]

What I want to convert is something like this
a = [0, 10, 3, 2, 0, 2]

def covert_to_boolean(a):
    ...
    return a_converted

a_converted = [0, 1, 1, 1, 0, 1]
what would be the easiest way to convert like this?
To convert to true Booleans, you could just use:
def covert_to_boolean(a):
    return [bool(x) for x in a]
This returns
[False, True, True, True, False, True]
If you'd prefer them as 0s and 1s, then:
return [int(bool(x)) for x in a]
Would return:
[0, 1, 1, 1, 0, 1]
Not actually suggesting this unless the code is the hottest code in your program, but there are ways to improve on:
def covert_to_boolean(a):
    return [bool(x) for x in a]
    # Or the straightforward way of converting back to 1/0:
    return [int(bool(x)) for x in a]
First off, if a is large enough, since int/bool are built-ins implemented in C, you can use map to remove byte code interpreter overhead:
def covert_to_boolean(a):
    return [*map(bool, a)]
    # Or converting back to 1/0:
    return [*map(int, map(bool, a))]
Another savings can come from not using the bool constructor (C constructor calls have unavoidable overhead on CPython, even when the result doesn't actually "construct" anything). Replacing it with operator.truth, a plain function taking exactly one argument which CPython heavily optimizes, can reduce overhead by about 40%:
>>> import random
>>> from operator import truth
>>> a = random.choices([*[0] * 100, *range(1, 101)], k=1000)
>>> %%timeit -r5
... [bool(x) for x in a]
...
...
248 µs ± 7.82 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map(bool, a)]
...
...
140 µs ± 2.5 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [*map(truth, a)]
...
...
81.3 µs ± 3.91 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
map(bool improved on the list comprehension by about 45%, and was in turn beaten by map(truth by another 40% (map(truth took almost exactly one third the time of the list comprehension).
If the result must be an int, we could expand it to [*map(int, map(truth, a))], but again, int is a constructor, and even though it returns singleton values (CPython caches single copies of -5 through 256 as an implementation detail), it still pays constructor overhead (worse, because it can take keyword arguments). There is no equivalent "convert to true int" function like bool has operator.truth, but you can cheat your way into one by "adding to 0":
>>> %%timeit -r5
... [int(bool(x)) for x in a]
...
...
585 µs ± 65.2 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map(int, map(bool, a))]
...
...
363 µs ± 58.6 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %%timeit -r5
... [*map((0).__add__, map(truth, a))]
...
...
168 µs ± 2.2 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
(0).__add__ just takes advantage of the fact that adding a bool to 0 produces either 0 or 1, and __add__ has far lower overhead than a constructor; in this case, the switch from list comprehension to map (even nested map) saved nearly 40%, switching from int/bool to (0).__add__/truth saved nearly 55% off what remained, for a total reduction in runtime of over 70%.
Again, to be clear, don't do this unless:
You've profiled, and converting really is the critical path in your code, speed-wise, and
The inputs aren't too small (if a were only five elements, the setup overhead for calling map would outweigh the tiny savings from avoiding bytecode per loop)
but when it comes up, it's good to know about. bool is one of the slowest things in Python, in terms of overhead:productive work ratio; int of already int-like things is similarly bad.
There is one last thing to check though. Maybe pushing things to syntax, avoiding function calls, might save more. As it happens, the answer is "it does, for one of them":
>>> %%timeit -r5
... [not not x for x in a] # Worse than map
...
...
122 µs ± 6.6 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [0 + (not not x) for x in a] # BETTER than map!!!
...
...
158 µs ± 22.4 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
>>> %%timeit -r5
... [0 + x for x in map(truth, a)] # Somehow not the best of both worlds...
...
...
177 µs ± 5.77 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)
While [not not x for x in a] lost to [*map(truth, a)], [0 + (not not x) for x in a] actually beat [*map((0).__add__, map(truth, a))] (as it happens, there is some overhead in (0).__add__ being invoked through a wrapper around the tp_add slot, which can be avoided by actually using + at the Python layer). Mixing the best of each solution (map(truth, ...) with 0 + in the list comprehension) didn't actually benefit us though (re-adding the bytecode overhead was roughly a fixed cost, and not not beats even operator.truth). The point is, none of this is worth it unless you actually need it, and performance can be unintuitive. I had code that needed it, once upon a time, so you benefit from my testing.
You can use the and operator in a list comprehension to keep the code both fast and readable:
def covert_to_boolean(a):
    return [i and 1 for i in a]
This approach is faster than ShadowRanger's fastest approach, as demonstrated here:
https://repl.it/#blhsing/NeglectedClientsideLanserver
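A small harness to reproduce that comparison locally (timeit from the standard library; the test list is just an assumption, and the numbers will depend on your machine, so none are shown here):
from timeit import timeit

a = [0, 10, 3, 2, 0, 2] * 1000

print(timeit(lambda: [i and 1 for i in a], number=1000))
print(timeit(lambda: [0 + (not not i) for i in a], number=1000))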
Not sure if you wanted b or c, so here's both
>>> a = [ 0, 10, 3, 2, 0, 2 ]
>>> b = [bool(i) for i in a]
>>> b
[False, True, True, True, False, True]
>>> c = [int(bool(i)) for i in a]
>>> c
[0, 1, 1, 1, 0, 1]
Never mind the lapses in terminology; here is a solution using list comprehension that you can study (assuming you are a student):
a = [2, 0, 12, 45, 0, 0, 99]
b = [1 if i != 0 else 0 for i in a]
print(b)
# [1, 0, 1, 1, 0, 0, 1]
If you are trying to convert your values to 0 and 1, I think the most elegant way would be:
a_converted = [1 if e else 0 for e in a]
where you basically check whether e is truthy (non-zero) and assign 1, versus it being zero and assigning 0, for each e in a.
Two half-line solutions:
def covert_to_boolean(a):
    return [1 if i != 0 else 0 for i in a]
    # [0, 1, 1, 1, 0, 1]

# OR

def covert_to_boolean(a):
    return [bool(i) * 1 for i in a]
    # [0, 1, 1, 1, 0, 1]

creating a new list with subset of list using index in python

A list:
a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]
I want a new list built from a subset of a, taken from a[0:2], a[4], and a[6:];
that is, I want the list ['a', 'b', 4, 6, 7, 8].
Suppose
a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]
and the list of indexes is stored in
b = [0, 1, 4, 6, 7, 8]
then a simple one-line solution will be
c = [a[i] for i in b]
Try new_list = a[0:2] + [a[4]] + a[6:].
Or more generally, something like this:
from itertools import chain
new_list = list(chain(a[0:2], [a[4]], a[6:]))
This works with other sequences as well, and is likely to be faster.
Or you could do this:
def chain_elements_or_slices(*elements_or_slices):
    new_list = []
    for i in elements_or_slices:
        if isinstance(i, list):
            new_list.extend(i)
        else:
            new_list.append(i)
    return new_list
new_list = chain_elements_or_slices(a[0:2], a[4], a[6:])
But beware, this would lead to problems if some of the elements in your list were themselves lists.
To solve this, either use one of the previous solutions, or replace a[4] with a[4:5] (or more generally a[n] with a[n:n+1]).
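For instance, using only slices keeps everything a sequence and avoids the nested-list pitfall:
from itertools import chain

a = ['a', 'b', 'c', 3, 4, 'd', 6, 7, 8]
new_list = list(chain(a[0:2], a[4:5], a[6:]))
# ['a', 'b', 4, 6, 7, 8]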
The following definition might be more efficient than the first solution proposed:
def new_list_from_intervals(original_list, *intervals):
    n = sum(j - i for i, j in intervals)
    new_list = [None] * n
    index = 0
    for i, j in intervals:
        for k in range(i, j):
            new_list[index] = original_list[k]
            index += 1
    return new_list
Then you can use it like below:
new_list = new_list_from_intervals(original_list, (0,2), (4,5), (6, len(original_list)))
This thread is years old and I do not know if the method existed at the time, but the fastest solution I found in 2022 is not mentioned in the answers so far.
My example list contains integers from 1 to 6, and I want to retrieve 4 items from it.
I used the %timeit functionality of Jupyter Notebook / iPython on a Windows 10 system with Python 3.7.4 installed.
I added a numpy approach just to see how fast it is. It might take more time with the mixed type collection from the original question.
The fastest solution appears to be itemgetter from the operator module (standard library). If it does not matter whether the result is a tuple or a list, use itemgetter as is; otherwise, wrap it in a list conversion. Both cases are faster than the other solutions.
from itertools import chain
import numpy as np
from operator import itemgetter
#
my_list = [1,2,3,4,5,6]
item_indices = [2, 0, 1, 5]
#
%timeit itemgetter(*item_indices)(my_list)
%timeit list(itemgetter(*item_indices)(my_list))
%timeit [my_list[item] for item in item_indices]
%timeit list(np.array(my_list)[item_indices])
%timeit list(chain(my_list[2:3], my_list[0:1], my_list[1:2], my_list[5:6]))
and the output is:
184 ns ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
251 ns ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
283 ns ± 85.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
4.3 µs ± 260 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
663 ns ± 49.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I would be interested in possible deviations of which solution is fastest depending on the size of the list and the number of items we want to extract, but this is my typical use case for my current project.
If someone finds the time to investigate this further, please let me know.
