dropping the lowest values from a list - python

I'm trying to write a Python program that drops the lowest 25% of the values from a list and returns the remaining values in their original (unsorted) order. For example:
Input : [1,5,6,72,3,4,9,11,3,8]
Output : [5,6,72,4,9,11,8]
I tried to do:
import math
l = [1,5,6,72,3,4,9,11,3,8]
def drop(k):
    while len(l)!=0 and k > 0:
        k = k - 1
        l.sort(reverse = True)
        l.pop()
    return l
k = math.ceil(len(l) * 0.25)
drop (k)
It returns [72, 11, 9, 8, 6, 5, 4], but is there a way to do it without sorting?

You don't need to reverse-sort the list to find the smallest element. Use min(l), which returns the smallest value in l, and remove it with l.remove().
import math
l = [1,5,6,72,3,4,9,11,3,8]
def drop(k):
    while len(l)!=0 and k > 0:
        k = k - 1
        l.remove(min(l))
    return l
k = math.ceil(len(l) * 0.25)
print(drop (k))
# [5, 6, 72, 4, 9, 11, 8]

You could use heapq and keep popping elements until 25% of the container has been removed, then filter the contents of the original list:
import heapq, copy
s = [1,5,6,72,3,4,9,11,3,8]
new_s = copy.deepcopy(s)
heapq.heapify(s)
count = 0
last_items = set()
while count/float(len(new_s)) <= 0.25:
    last_items.add(heapq.heappop(s))
    count += 1
final_s = [i for i in new_s if i not in last_items]
Output:
[5, 6, 72, 4, 9, 11, 8]
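A variant of the same idea that leaves the input list untouched and drops exactly k elements, even when the cut falls inside a run of equal values. This is only a sketch: the function name drop_lowest and its fraction parameter are made up for illustration, and k is computed with math.ceil as in the question.

```python
import heapq
import math

def drop_lowest(values, fraction=0.25):
    """Return a new list without the lowest `fraction` of values, order preserved."""
    k = math.ceil(len(values) * fraction)
    # Positions of the k smallest values; among equal values the earlier position wins.
    drop = set(heapq.nsmallest(k, range(len(values)), key=values.__getitem__))
    return [v for i, v in enumerate(values) if i not in drop]

data = [1, 5, 6, 72, 3, 4, 9, 11, 3, 8]
print(drop_lowest(data))  # [5, 6, 72, 4, 9, 11, 8]
```

Because positions are selected rather than values, duplicates are only dropped as often as needed to reach k.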

There are O(n) solutions to this problem. One of those, introselect, is implemented in numpy in the partition and argpartition functions:
>>> data = [1,5,6,72,3,4,9,11,3,8]
>>>
>>> k = int(round(len(data) / 4))
>>>
>>> import numpy as np
>>> dnp = np.array(data)
>>> drop_them = np.argpartition(dnp, k)[:k]
>>> keep_them = np.ones(dnp.shape, dtype=bool)
>>> keep_them[drop_them] = False
>>> result = dnp[keep_them].tolist()
>>>
>>> result
[5, 6, 72, 4, 9, 11, 3, 8]
Note that this method keeps one of the 3s and drops the other one in order to get the split at exactly k elements.
If instead you want to treat all 3s the same, you could do
>>> boundary = np.argpartition(dnp, k)[k]
>>> result = dnp[dnp > dnp[boundary]]
>>>
>>> result
array([ 5, 6, 72, 4, 9, 11, 8])

One way of doing this, which is very slow, especially for longer lists:
quart_len = int(0.25*len(l))
for i in range(quart_len):
    l.remove(min(l))
A much faster way of doing this:
import numpy as np
from math import ceil
l = [1,5,6,72,3,4,9,11,3,8]
sorted_values = np.array(l).argsort()
l_new = [l[i] for i in range(len(l)) if i in sorted_values[int(ceil(len(l)/4.)):]]
Another approach:
l = np.array(l)
l = list(l[l > sorted(l)[len(l)//4]])  # integer division: list indices must be ints

l1=[1,5,6,72,3,4,9,11,3,8]
l2=sorted(l1)
ln=round(len(l1)*0.25)
[i for i in l1 if i not in l2[:ln+1]]
Output:
[5, 6, 72, 4, 9, 11, 8]


Find consecutive and nonconsecutive ordered sequences of items in a list

I have two lists:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
I want to count how many times the lookup_list appeared in my_list with the following logic:
The order should be 1 -> 2 -> 3
In my_list, the lookup_list items don't have to be next to each other: 1,4,2,1,5,3 should generate a match, since a 2 comes after a 1 and a 3 comes after the 2.
The matches based on this logic:
1st match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
2nd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
3rd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
4th match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
The lookup_list is dynamic; it could be defined as [1,2] or [1,2,3,4], etc. How can I solve it? All the answers I've found are about finding matches where 1,2,3 appear next to each other in order, like this one: Find matching sequence of items in a list
I can find the count of consecutive sequences with the below code but it doesn't count the nonconsecutive sequences:
from collections import Counter
from nltk import ngrams
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
all_counts = Counter(ngrams(my_list, len(lookup_list)))
counts = {k: all_counts[k] for k in [tuple(lookup_list)]}
counts
>>> {(1, 2, 3): 2}
I tried using pandas rolling window functions but they don't have a custom reset option.
def find_all_sequences(source, sequence):
    def find_sequence(source, sequence, index, used):
        for i in sequence:
            while True:
                index = source.index(i, index + 1)
                if index not in used:
                    break
            yield index
    first, *rest = sequence
    index = -1
    used = set()
    while True:
        try:
            index = source.index(first, index + 1)
            indexes = index, *find_sequence(source, rest, index, used)
        except ValueError:
            break
        else:
            used.update(indexes)
            yield indexes
Usage:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
print(*find_all_sequences(my_list, lookup_list), sep="\n")
Output:
(0, 1, 2)
(6, 7, 11)
(9, 10, 15)
(14, 16, 17)
The generator function find_all_sequences() yields tuples with the indexes of each sequence match. It loops until a list.index() call raises ValueError. The inner generator function find_sequence() yields the index of each remaining sequence item, skipping indexes that were already used by an earlier match.
According to this benchmark, my method is about 60% faster than one from Andrej Kesely's answer.
The function find_matches() returns indices where the matches from lookup_list are:
def find_matches(lookup_list, lst):
    buckets = []
    def _find_bucket(i, v):
        for b in buckets:
            if lst[b[-1]] == lookup_list[len(b) - 1] and v == lookup_list[len(b)]:
                b.append(i)
                if len(b) == len(lookup_list):
                    buckets.remove(b)
                    return b
                break
        else:
            if v == lookup_list[0]:
                buckets.append([i])
    rv = []
    # iterate over the lst argument rather than the global my_list
    for i, v in enumerate(lst):
        b = _find_bucket(i, v)
        if b:
            rv.append(b)
    return rv
lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(find_matches(lookup_list, my_list))
Prints:
[[0, 1, 2], [6, 7, 11], [9, 10, 15], [14, 16, 17]]
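If only the number of matches is needed, the bucket idea can be reduced to counters. The following is a rough sketch, not part of the answer above: count_matches is a hypothetical name, and it assumes the items in lookup_list are distinct, so each value can advance at most one stage.

```python
def count_matches(lookup, items):
    """Count ordered, possibly non-consecutive matches of `lookup` in `items`.

    Assumes the values in `lookup` are distinct.
    """
    if not lookup:
        return 0
    position = {value: j for j, value in enumerate(lookup)}
    # waiting[j] = number of partial matches that have consumed lookup[:j + 1]
    waiting = [0] * len(lookup)
    for value in items:
        j = position.get(value)
        if j is None:
            continue
        if j == 0:
            waiting[0] += 1          # start a new partial match
        elif waiting[j - 1] > 0:
            waiting[j - 1] -= 1      # advance one partial match by a stage
            waiting[j] += 1
    return waiting[-1]               # partial matches that reached the last stage

lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]
print(count_matches(lookup_list, my_list))  # 4
```

This makes a single pass over my_list and keeps only len(lookup_list) counters, at the cost of not reporting the indices.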
Here is a recursive solution:
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
def find(my_list, continue_from_index):
    if continue_from_index > (len(my_list) - 1):
        return 0
    last_found_index = 0
    found_indizes = []
    first_occuring_index = 0
    found = False
    for l in lookup_list:
        for m_index in range(continue_from_index, len(my_list)):
            # compare values with ==; `is` only works by accident for small ints
            if my_list[m_index] == l and m_index >= last_found_index:
                if not found:
                    found = True
                    first_occuring_index = m_index
                last_found_index = m_index
                found += 1
                found_indizes.append(str(m_index))
                break
    if len(found_indizes) == len(lookup_list):
        return find(my_list, first_occuring_index+1) + 1
    return 0
print(find(my_list, 0))
my_list = [5, 6, 3, 8, 2, 1, 7, 1]
lookup_list = [8, 2, 7]
counter =0
result =False
for i in my_list:
    if i in lookup_list:
        counter+=1
if(counter==len(lookup_list)):
    result=True
print (result)

finding the indices of the most maximum values in a list effectively

Suppose we have a list: [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
I want to write a function that returns the indices of the first 3 maximum values in order.
For example in this case the results would be: [8, 5, 7]
I know one way is to write nested-loops, however, can there be any other effective way to achieve this?
Use heapq.nlargest to find the n largest values.
from heapq import nlargest
from operator import itemgetter
l = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
indices_of_3_largest = [i for i,_ in nlargest(3, enumerate(l), itemgetter(1))]
print(indices_of_3_largest)
# [8, 5, 7]
Zip the list with the indices, sort it in descending order, and take the first 3 entries:
lst = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
indices = [i for _, i in sorted(zip(lst, range(len(lst))), reverse=True)[:3]]
print(indices) # [8, 5, 7]
This generates the result in one step, although it uses the less elegant dunder method as the key:
>>> import heapq
>>> lst = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
>>> heapq.nlargest(3, range(len(lst)), lst.__getitem__)
[8, 5, 7]
A lambda function makes this easier.
Code:
def sort_index(lst, rev=True):
    index = range(len(lst))
    s = sorted(index, reverse=rev, key=lambda i: lst[i])
    return s
score = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
sort_index(score)[:3]
Result:
[8, 5, 7]
l = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
list(map(lambda x: l.index(x), sorted(l, reverse=True)[:3] )) # [8, 5, 7]
Using max repeatedly on a copy of the list:
lst = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
lst_cp = lst.copy()
indeces_max = []
for _ in range(3):
    m = max(lst_cp)
    indeces_max.append(lst.index(m))
    lst_cp.remove(m)
del lst_cp # the copy is no longer needed
print(indeces_max)
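One caveat for the approaches that look a position up with lst.index(...): list.index always returns the first occurrence, so when one of the largest values appears more than once the same index is reported repeatedly. A small sketch of the difference, using a made-up list with a duplicated maximum (top_n_indices is a hypothetical helper name):

```python
import heapq

def top_n_indices(values, n=3):
    # Rank positions by their value; among equal values the earlier position comes first.
    return heapq.nlargest(n, range(len(values)), key=values.__getitem__)

with_ties = [1, 31, 7, 31, 20]

# Value-based lookup reports index 1 twice because both 31s map to the first one.
by_value = [with_ties.index(v) for v in sorted(with_ties, reverse=True)[:3]]
print(by_value)                  # [1, 1, 4]
print(top_n_indices(with_ties))  # [1, 3, 4]
```

Ranking the positions themselves avoids the problem without any extra bookkeeping.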
You could use np.argsort:
import numpy as np
ar = [1, 3.5, -1, 7, 10, 20, 5, 17, 31, -5]
ar_argsort = np.argsort(ar)
reversed_argsort = ar_argsort[::-1]
indices_3_max = reversed_argsort[0:3]
print(indices_3_max)
Result:
[8 5 7]
Concise version of above in one line:
ar_argsort = np.argsort(ar)[::-1][0:3]
While all 6 answers (right now available after ~24h) are effective in providing the requested result I was wondering which solution is the most efficient one.
I used timeit for the comparison (only runtime, no memory usage, 5k runs with an increased input list - the same for all).
Results in increasing order of runtime:
consecutive_max: 2.604 s
heapq_oneStep: 3.052 s
np_argsort: 3.234 s
heapq_nlargest: 4.522 s
lambda_unnamed: 6.182 s
lambda_function: 9.960 s
zip_sort: 15.402 s
Code of the evaluation:
import timeit
timeit_runs = 5_000
# heapq_oneStep by Mechanic Pic
# https://stackoverflow.com/a/73568080/11815313
def heapq_oneStep():
    timeit_setup_heapq_oneStep = '''
import heapq
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_heapq_oneStep = '''
heapq.nlargest(3, range(len(lst)), lst.__getitem__)
'''
    return timeit.timeit(stmt=testcode_heapq_oneStep,
                         setup=timeit_setup_heapq_oneStep, number = timeit_runs)
# heapq_nlargest by Stef
# https://stackoverflow.com/a/73567876/11815313
def heapq_nlargest():
    timeit_setup_heapq_nlargest = '''
from heapq import nlargest
from operator import itemgetter
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_heapq_nlargest = '''
[i for i,_ in nlargest(3, enumerate(lst), itemgetter(1))]
'''
    return timeit.timeit(stmt=testcode_heapq_nlargest,
                         setup=timeit_setup_heapq_nlargest, number = timeit_runs)
# zip_sort by Guy
# https://stackoverflow.com/a/73567874/11815313
def zip_sort():
    timeit_setup_zip_sort = '''
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_zip_sort = '''
[i for _, i in sorted(zip(lst, range(len(lst))), reverse=True)[:3]]
'''
    return timeit.timeit(stmt=testcode_zip_sort,
                         setup=timeit_setup_zip_sort, number = timeit_runs)
# lambda_function by Mehmaam
# https://stackoverflow.com/a/73568027/11815313
def lambda_function():
    timeit_setup_lambda_function = '''
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_lambda_function = '''
def sort_index(lst, rev=True):
    index = range(len(lst))
    s = sorted(index, reverse=rev, key=lambda i: lst[i])
    return s
sort_index(lst)[:3]
'''
    return timeit.timeit(stmt=testcode_lambda_function,
                         setup=timeit_setup_lambda_function, number = timeit_runs)
# lambda_unnamed by uozcan12
# https://stackoverflow.com/a/73569339/11815313
def lambda_unnamed():
    timeit_setup_lambda_unnamed = '''
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_lambda_unnamed = '''
list(map(lambda x: lst.index(x), sorted(lst, reverse=True)[:3] ))
'''
    return timeit.timeit(stmt=testcode_lambda_unnamed,
                         setup=timeit_setup_lambda_unnamed, number = timeit_runs)
# np_argsort by MagnusO_O
# https://stackoverflow.com/a/73567884/11815313
def np_argsort():
    timeit_setup_np_argsort = '''
import numpy as np
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
lst_nparr = np.array(lst)
'''
    testcode_np_argsort = '''
lst_nparr_max3 = np.argsort(lst_nparr)[::-1][0:3]
lst_nparr_max3.tolist()
'''
    return timeit.timeit(stmt=testcode_np_argsort,
                         setup=timeit_setup_np_argsort, number = timeit_runs)
# consecutive_max by cards
def consecutive_max():
    timeit_setup_consecutive_max = '''
import random
random.seed(42)
integer_list = random.sample(range(-10_000, 10_000), 10_000)
lst = [x/100 for x in integer_list]
'''
    testcode_consecutive_max = '''
lst_cp = lst.copy()
indeces_max = []
for _ in range(3):
    m = max(lst_cp)
    indeces_max.append(lst.index(m))
    lst_cp.remove(m)
'''
    return timeit.timeit(stmt=testcode_consecutive_max,
                         setup=timeit_setup_consecutive_max, number = timeit_runs)
time_heapq_oneStep = heapq_oneStep()
time_heapq_nlargest = heapq_nlargest()
time_zip_sort = zip_sort()
time_lambda_function = lambda_function()
time_lambda_unnamed = lambda_unnamed()
time_np_argsort = np_argsort()
time_consecutive_max = consecutive_max()
print(f'''consecutive_max: {time_consecutive_max:.3f} s''')
print(f'''np_argsort: {time_np_argsort:.3f} s''')
print(f'''heapq_oneStep: {time_heapq_oneStep:.3f} s''')
print(f'''lambda_unnamed: {time_lambda_unnamed:.3f} s''')
print(f'''heapq_nlargest: {time_heapq_nlargest:.3f} s''')
print(f'''lambda_function: {time_lambda_function:.3f} s''')
print(f'''zip_sort: {time_zip_sort:.3f} s''')

how to make sure that two numbers next to each other in a list are different

I have a simple code that generates a list of random numbers.
x = [random.randrange(0,11) for i in range(10)]
The problem I'm having is that, since it's random, it sometimes produces duplicate numbers right next to each other. How do I change the code so that it never happens? I'm looking for something like this:
[1, 7, 2, 8, 7, 2, 8, 2, 6, 5]
So that every time I run the code, all the numbers that are next to each other are different.
import random
x = []
while len(x) < 10:
    r = random.randrange(0,11)
    if not x or x[-1] != r:
        x.append(r)
x[-1] contains the last inserted element, which we check is not the same as the new random number. With not x we check whether the list is still empty, since x[-1] would raise an IndexError during the first iteration of the loop.
Here's an approach that doesn't rely on retrying:
>>> import random
>>> x = [random.choice(range(11))]
>>> for _ in range(9):
...     x.append(random.choice([*range(x[-1]), *range(x[-1]+1, 11)]))
...
>>> x
[6, 2, 5, 8, 1, 8, 0, 4, 6, 0]
The idea is to choose each new number by picking from a list that excludes the previously picked number.
Note that having to re-generate a new list to pick from each time keeps this from actually being an efficiency improvement. If you were generating a very long list from a relatively short range, though, it might be worthwhile to generate different pools of numbers up front so that you could then select from the appropriate one in constant time:
>>> pool = [[*range(i), *range(i+1, 3)] for i in range(3)]
>>> x = [random.choice(random.choice(pool))]
>>> for _ in range(10000):
...     x.append(random.choice(pool[x[-1]]))
...
>>> x
[0, 2, 0, 2, 0, 2, 1, 0, 1, 2, 0, 1, 2, 1, 0, ...]
An O(n) solution: add a random offset from [1, stop) to the last element, modulo stop:
import random
x = [random.randrange(0,11)]
x.extend((x[-1]+random.randrange(1,11)) % 11 for i in range(9))
x
Output
[0, 10, 4, 5, 10, 1, 4, 8, 0, 9]
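The same offset trick written as an explicit loop, so it does not depend on reading x[-1] from inside a generator while extend is still growing the list. A minimal sketch; the function name and its parameters are made up for illustration.

```python
import random

def random_no_adjacent_repeats(length, stop):
    """Return `length` values from range(stop) with no two equal neighbours."""
    if length <= 0:
        return []
    if stop < 2 and length > 1:
        raise ValueError("need at least two distinct values to avoid repeats")
    result = [random.randrange(stop)]
    for _ in range(length - 1):
        # An offset in [1, stop) can never map a value back onto itself modulo stop,
        # so every candidate is accepted and no retry is needed.
        result.append((result[-1] + random.randrange(1, stop)) % stop)
    return result

print(random_no_adjacent_repeats(10, 11))
```

Each of the stop - 1 other values is equally likely as the next element, which matches what the retry-based loops produce.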
from random import randrange
from itertools import islice, groupby
# Make an infinite amount of randrange's results available
pool = iter(lambda: randrange(0, 11), None)
# Use groupby to squash consecutive values into one and islice to at most 10 in total
result = [v for v, _ in islice(groupby(pool), 10)]
A function-based solution that doesn't scan the list for repeats; it only checks each new number against the last number in the list:
import random
def get_random_list_without_neighbors(lower_limit, upper_limit, length):
    res = []
    # add the first number
    res.append(random.randrange(lower_limit, upper_limit))
    while len(res) < length:
        x = random.randrange(lower_limit, upper_limit)
        # check that the new number x doesn't match the last number in the list
        if x != res[-1]:
            res.append(x)
    return res
>>> print(get_random_list_without_neighbors(0, 11, 10))
[10, 1, 2, 3, 1, 8, 6, 5, 6, 2]
import random
def random_sequence_without_same_neighbours(n, min, max):
    x = [random.randrange(min, max + 1)]
    uniq_value_count = max - min + 1
    next_choices_count = uniq_value_count - 1
    for i in range(n - 1):
        circular_shift = random.randrange(0, next_choices_count)
        # shift relative to min so the result stays in [min, max] and never equals x[-1]
        x.append(min + (x[-1] - min + circular_shift + 1) % uniq_value_count)
    return x
random_sequence_without_same_neighbours(n=10, min=0, max=10)
It's not very Pythonic, but you can do something like this:
import random
def random_numbers_generator(n):
    "Generate a list of n random numbers without two equal numbers in a row"
    result = []
    # loop until n numbers are collected, so a rejected duplicate does not shorten the list
    while len(result) < n:
        number = random.randint(1, n)
        if result and number == result[-1]:
            continue
        result.append(number)
    return result
print(random_numbers_generator(10))
Result:
[3, 6, 2, 4, 2, 6, 2, 1, 4, 7]

Can anyone help me handle ties in a Python list while replacing its elements with their ranks?

I have a list that looks something like this:
lst_A = [32,12,32,55,12,90,32,75]
I want to replace the numbers with their rank. I am using this function to do this:
def obtain_rank(lstC):
    sort_data = [(x,i) for i,x in enumerate(lstC)]
    sort_data = sorted(sort_data,reverse=True)
    result = [0]*len(lstC)
    for i,(_,idx) in enumerate(sort_data,1):
        result[idx] = i
    return result
I am getting the following output while I use this:
[6, 8, 5, 3, 7, 1, 4, 2]
But what I want from this is:
[4, 7, 5, 3, 8, 1, 6, 2]
How can I go about this?
Try this:
import pandas as pd
def obtain_rank(a):
    s = pd.Series(a)
    return [int(x) for x in s.rank(method='first', ascending=False)]
#[4, 7, 5, 3, 8, 1, 6, 2]
You could use 2 loops:
l = [32,12,32,55,12,90,32,75]
d = list(enumerate(sorted(l, reverse = True), start = 1))
res = []
for i in range(len(l)):
    for j in range(len(d)):
        if d[j][1] == l[i]:
            res.append(d[j][0])
            del d[j]
            break
print(res)
#[4, 7, 5, 3, 8, 1, 6, 2]
Here you go. If you are not already familiar with defaultdict and deque, see https://docs.python.org/3.7/library/collections.html
from collections import defaultdict, deque
def obtain_rank(listC):
    sorted_list = sorted(listC, reverse=True)
    d = defaultdict(deque) # deques are efficient at appending/popping elements at both ends of the sequence
    for i, ele in enumerate(sorted_list):
        d[ele].append(i+1)
    result = []
    for ele in listC:
        result.append(d[ele].popleft()) # for repeated numbers the lowest rank is at the start of the deque, hence popleft
    return result
Update: Without using defaultdict and deque
def obtain_rank(listC):
    sorted_list = sorted(listC, reverse=True)
    d = {}
    for i, ele in enumerate(sorted_list):
        d[ele] = d.get(ele, []) + [i + 1] # As suggested by Joshua Nixon
    result = []
    for ele in listC:
        result.append(d[ele][0])
        del d[ele][0]
    return result
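The function in the question reverses the ties because sorting (value, index) tuples with reverse=True also sorts the indices in descending order among equal values. A smaller change to that function, sketched here under the assumption that the values are numeric (so they can be negated), is to sort only by descending value and let the stable sort keep equal values in their original order:

```python
def obtain_rank(values):
    # sorted() is stable: positions with equal values stay in their original order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

lst_A = [32, 12, 32, 55, 12, 90, 32, 75]
print(obtain_rank(lst_A))  # [4, 7, 5, 3, 8, 1, 6, 2]
```

This keeps the single sort of the original function and needs no extra dictionary.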

Python non-linear slicing

Given a list a, I can slice it by doing:
a[start:end:step]
However this is only linear slicing.
For instance, I would like to select the indexes that are powers of 2.
Unfortunately this does not work:
a[slice(2**x for x in range(len(a))]
Is there a way to avoid loops when non-linear slicing is required?
EDIT:
mainly I need this to modify the list. E.g.
a[*non-linear-slicing*] = [*list-with-new-values*]
You could use a list comprehension:
is_power_of_two = lambda num: num != 0 and ((num & (num - 1)) == 0)
print([item for i, item in enumerate(a) if is_power_of_two(i)])
EDIT:
mainly I need this to modify the list. E.g.
a[*non-linear-slicing*] = [*list-with-new-values*]
Great. Then we can use that list of new values to drive this thing. Demo:
>>> a = list(range(17))
>>> newvals = ['foo', 'bar', 'baz', 'qux']
>>> for i, val in enumerate(newvals):
...     a[2**i] = val
...
>>> a
[0, 'foo', 'bar', 3, 'baz', 5, 6, 7, 'qux', 9, 10, 11, 12, 13, 14, 15, 16]
Or if you also have the indexes in a list already, then use zip instead of enumerate:
>>> a = list(range(17))
>>> indexes = [1, 2, 4, 8]
>>> newvals = ['foo', 'bar', 'baz', 'qux']
>>> for i, val in zip(indexes, newvals):
...     a[i] = val
...
>>> a
[0, 'foo', 'bar', 3, 'baz', 5, 6, 7, 'qux', 9, 10, 11, 12, 13, 14, 15, 16]
And since you just mentioned coming from R, maybe you want to use NumPy, which does support such indexing:
>>> import numpy as np
>>> a = np.arange(13)
>>> a[[1, 2, 4, 8]] = [111, 222, 444, 888]
>>> a
array([ 0, 111, 222, 3, 444, 5, 6, 7, 888, 9, 10, 11, 12])
Is this what you are trying to do?
a = [0,1,2,3,4,5,6,7,8,9]
a[1],a[2],a[4],a[8] = [11,12,14,18]
# a = [0,11,12,3,14,5,6,7,18,9]
I know of no generic way to build the a[1],a[2],a[4],a[8] target, because comprehensions and generator expressions do not allow assignment.
I am not sure if this is what you want, but you can try this:
# Example list
l = [1,2,3,4,5,6,7,8]
result = list(map(lambda y: l[y], list(filter(lambda x: (x>0 and ((x & (x-1)) == 0)),list(range(0,len(l)))))))  # parentheses needed: == binds tighter than &
print(result)
The result is:
[2, 3, 5]
But if I were you, I wouldn't use that because it is barely readable.
Why iterate over the whole list when you only need the elements at indexes 2 ** i? Visiting just those indexes improves the complexity dramatically, from O(n) to O(log n).
a = range(100)
def two_power_slice(lst):
    result = []
    for i in range(len(lst)):
        j = 2 ** i
        if j > len(lst) - 1:
            break
        result.append(lst[j])
    return result
print(two_power_slice(a))
I think you can also use a logarithm to avoid the if altogether.
Edit: Improved version, fixed errors.
import math
def two_power_slice(lst):
    if not lst: return lst
    result = []
    # math.log2 is exact for powers of two, unlike math.log(n, 2)
    for i in range(int(math.ceil(math.log2(len(lst))))):
        result.append(lst[2 ** i])
    return result
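For the assignment case from the question's edit, the power-of-two indexes can be generated by doubling and paired with the new values. This is only a sketch; power_of_two_indexes is a hypothetical helper name, and zip stops at whichever runs out first, the indexes or the new values.

```python
def power_of_two_indexes(n):
    """Yield 1, 2, 4, ... for as long as the index is valid for a list of length n."""
    i = 1
    while i < n:
        yield i
        i *= 2

a = list(range(17))
new_values = ['foo', 'bar', 'baz', 'qux', 'quux']
for i, val in zip(power_of_two_indexes(len(a)), new_values):
    a[i] = val

print(a)
# [0, 'foo', 'bar', 3, 'baz', 5, 6, 7, 'qux', 9, 10, 11, 12, 13, 14, 15, 'quux']
```

Doubling an integer avoids both the floating-point logarithm and the explicit bounds check inside the loop body.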
