GOAL: Filter a list of lists using a dictionary as reference in Python 3.8+
USE CASE: When reviewing a nested list (a series of survey responses), filter out responses based on control questions. In the dictionary, the answers to questions 3 (index 2 in the list) and 7 (index 6) should both have the value 5. If both answers in a response are not 5, that response should not be added to the filtered_responses list.
Open to interpretation on how to solve this. I have reviewed several resources on filtering dictionaries using lists. This approach is preferred because some survey responses may contain the same array of values, so the list structure is retained.
no_of_survey_questions = 10
no_of_participants = 5
min_score = 1
max_score = 10
control_questions = {3: 5,
7: 5, }
unfiltered_responses = [[4, 5, 4, 5, 4, 5, 4, 5, 4, 5], # omit
[9, 8, 7, 6, 5, 4, 3, 2, 1, 1], # omit
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5], # include
[5, 2, 5, 2, 5, 2, 5, 9, 1, 1], # include
[1, 2, 5, 1, 2, 1, 2, 1, 2, 1]] # omit
for response in unfiltered_responses:
    print(response)
print()
filtered_responses = [] # should contain only unfiltered_responses values marked 'include'
for response in filtered_responses:
    # INSERT CODE HERE
    print(response)
Thanks in advance!
You can use list comprehension + all():
control_questions = {3: 5,
7: 5}
unfiltered_responses = [[4, 5, 4, 5, 4, 5, 4, 5, 4, 5], # omit
[9, 8, 7, 6, 5, 4, 3, 2, 1, 1], # omit
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5], # include
[5, 2, 5, 2, 5, 2, 5, 9, 1, 1], # include
[1, 2, 5, 1, 2, 1, 2, 1, 2, 1]] # omit
filtered_responses = [subl for subl in unfiltered_responses if all(subl[k-1] == v for k, v in control_questions.items())]
print(filtered_responses)
Prints:
[
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[5, 2, 5, 2, 5, 2, 5, 9, 1, 1]
]
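The comprehension above can also be wrapped in a small helper. This is only a sketch: the function name filter_responses is hypothetical (it does not appear in the answer), and it assumes the dictionary keys are 1-based question numbers, as in the question.

```python
def filter_responses(responses, control_questions):
    """Keep only responses whose control answers all match the expected values.

    Assumes the keys of control_questions are 1-based question numbers.
    """
    return [
        response
        for response in responses
        if all(response[question - 1] == expected
               for question, expected in control_questions.items())
    ]


control_questions = {3: 5, 7: 5}
unfiltered_responses = [
    [4, 5, 4, 5, 4, 5, 4, 5, 4, 5],  # omit
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 1],  # omit
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],  # include
    [5, 2, 5, 2, 5, 2, 5, 9, 1, 1],  # include
    [1, 2, 5, 1, 2, 1, 2, 1, 2, 1],  # omit
]

filtered_responses = filter_responses(unfiltered_responses, control_questions)
for response in filtered_responses:
    print(response)
```

Because the original inner lists are reused rather than copied, duplicate responses stay as separate entries in the result.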
I have data for example
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([3, 4, 5, 6])
I want to duplicate each item in each vector as many times as the length of the vector. The results should be
>>> a2 = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
>>> b2 = np.array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])
Using np.tile(b, len(b)) can output b2. However, how can I get a2?
The two replications are a bit different. The first one can be obtained with .repeat(..) [numpy-doc]:
>>> a.repeat(len(a))
array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
the second one with .tile(..) [numpy-doc]:
>>> np.tile(b, len(b))
array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])
You can do both in one go using np.meshgrid
A,B = map(np.ravel,np.meshgrid(a,b,indexing='ij'))
A
# array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
B
# array([3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6])
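To make the difference between the two replication patterns concrete, here is a plain-Python sketch that does not use NumPy at all. It is an illustration only, not a replacement for np.repeat or np.tile; the variable names are chosen to mirror the question.

```python
from itertools import product

a = [1, 2, 3, 4]
b = [3, 4, 5, 6]

# "repeat": each element is emitted len(a) times before moving on
a2 = [x for x in a for _ in range(len(a))]

# "tile": the whole sequence is emitted len(b) times
b2 = b * len(b)

# Both at once: the Cartesian product pairs every element of a with every
# element of b, which is the same ordering as meshgrid with indexing='ij'
pairs = list(product(a, b))
A = [x for x, _ in pairs]
B = [y for _, y in pairs]

print(a2)
print(b2)
```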
I want to know how to do weighted random selection in Python.
1:10%, 2:10%, 3:10%, 4:50%, 5:20%
Then I choose the random number without duplication. How should I code this? Generally, we would write something like this:
from random import *
sample(range(1,6),1)
You should have a look at random.choices (https://docs.python.org/3/library/random.html#random.choices), which allows you to define a weighting, if you are using Python 3.6 or newer.
Example:
import random
choices = [1,2,3,4,5]
random.choices(choices, weights=[10,10,10,50,20], k=20)
Output:
[3, 5, 2, 4, 4, 4, 5, 3, 5, 4, 5, 4, 5, 4, 2, 4, 5, 2, 4, 4]
Try this:
from numpy.random import choice
list_of_candidates = [1,2,5,4,12]
number_of_items_to_pick = 120
probability_distribution = [0.1, 0, 0.3, 0.6, 0]
choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)
If you really wanted a sample-version you can prepare the range accordingly:
import random
nums = [1,2,3,4,5]
w = [10,10,10,50,20] # total of 100%
d = [x for y in ( [n]*i for n,i in zip(nums,w)) for x in y]
a_sample = random.sample(d,k=5)
print(a_sample)
print(d)
Output:
# 5 samples
[4, 2, 3, 1, 4]
# the whole sample input:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
If you need just 1 number, you can use random.choices. Because it draws with replacement, only a single draw (k=1) is guaranteed to be free of duplicates.
import random
from collections import Counter
# draw and count 10k to show distribution works
print(Counter( random.choices([1,2,3,4,5], weights=[10,10,10,50,20], k=10000)).most_common())
Output:
[(4, 5019), (5, 2073), (3, 1031), (1, 978), (2, 899)]
Combining "sample without replacement" with "weighted" is (to me) weird: each successive draw would change the weighting, because the numbers already drawn are removed from the range (that is just my intuition; the math behind it might say otherwise).
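One possible reading of "weighted without duplication" is successive weighted draws, where each picked item is removed before the next draw. The sketch below implements that reading only; the helper name weighted_sample_without_replacement is hypothetical, and the resulting distribution differs from sampling the expanded list d shown earlier.

```python
import random


def weighted_sample_without_replacement(items, weights, k):
    """Draw k distinct items; each draw is weighted over the items still left."""
    items = list(items)
    weights = list(weights)
    if k > len(items):
        raise ValueError("k cannot exceed the number of items")
    picked = []
    for _ in range(k):
        # Weighted draw of one index among the remaining items
        index = random.choices(range(len(items)), weights=weights, k=1)[0]
        picked.append(items.pop(index))
        weights.pop(index)  # the remaining weights are implicitly renormalised
    return picked


result = weighted_sample_without_replacement([1, 2, 3, 4, 5], [10, 10, 10, 50, 20], k=3)
print(result)
```

With k equal to the number of items, this returns a weighted random ordering of all of them.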
I have a list containing n lists.
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3]
]
How can I efficiently compare them and generate a list which always contains the highest value at the current position?
I don't know how I can do this since the boundaries for each list are different.
The output for the above example should be a list with these values:
[8,9,4,5,9,6,7,8,9,4,5]
The most idiomatic approach would be transposing the 2D list and calling max on each row in the transposed list. But in your case, you're dealing with ragged lists, so zip cannot be directly applied here (it zips up to the shortest list only).
Instead, use itertools.zip_longest (izip_longest for python 2), and then apply max using map -
from itertools import zip_longest
r = list(map(max, zip_longest(*data, fillvalue=-float('inf'))))
Or, using @Peter DeGlopper's suggestion, with a list comprehension -
r = [max(x) for x in zip_longest(*data, fillvalue=-float('inf'))]
print(r)
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Here, I use a fillvalue parameter to fill missing values with negative infinity. The intermediate result looks something like this -
list(zip_longest(*data, fillvalue=-float('inf')))
[(1, 2, 8, 3),
(2, 6, 1, 9),
(3, 3, 4, 1),
(4, 5, 1, 2),
(5, 9, 2, 2),
(6, 1, 3, 1),
(7, 1, 4, 1),
(8, 1, 2, 5),
(-inf, 2, 5, 9),
(-inf, 4, -inf, 3),
(-inf, 5, -inf, -inf)]
Now, applying max becomes straightforward - just do it over each row and you're done.
zip_longest is your friend in this case.
from itertools import zip_longest
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
output = list()
for x in zip_longest(*data, fillvalue=0):
    output.append(max(x))
print(output)
>>> [8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Adding a pandas solution
import pandas as pd
pd.DataFrame(data).max().astype(int).tolist()
Out[100]: [8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
You don't need any external module; a little logic is enough:
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3]
]
new_data={}
for j in data:
    for k,m in enumerate(j):
        if k not in new_data:
            new_data[k] = [m]
        else:
            new_data[k].append(m)
final_data=[0]*len(new_data.keys())
for key,value in new_data.items():
    final_data[key]=max(value)
print(final_data)
output:
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
You can use itertools.izip_longest (itertools.zip_longest in Python3):
Python2:
import itertools
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
new_data = [max(filter(lambda x:x, i)) for i in itertools.izip_longest(*data)]
Output:
[8, 9, 4, 5, 9, 6, 7, 8, 9, 4, 5]
Python3:
import itertools
data = [
[1, 2, 3, 4, 5, 6, 7, 8],
[2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
[8, 1, 4, 1, 2, 3, 4, 2, 5],
[3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
new_data = [max(filter(None, i)) for i in itertools.zip_longest(*data)]
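One caveat, offered as a sketch rather than a correction: filter(None, ...) drops every falsy value, so genuine zeros in the data are discarded along with the None padding. That does not matter for the sample data, which contains no zeros. A variant that removes only the padding could look like this (column_max is a hypothetical name):

```python
from itertools import zip_longest


def column_max(rows):
    """Per-position maximum over ragged rows, ignoring only the None padding."""
    return [
        max(value for value in column if value is not None)
        for column in zip_longest(*rows)
    ]


data = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 6, 3, 5, 9, 1, 1, 1, 2, 4, 5],
    [8, 1, 4, 1, 2, 3, 4, 2, 5],
    [3, 9, 1, 2, 2, 1, 1, 5, 9, 3],
]
print(column_max(data))

# Data containing zeros: max(filter(None, (0, 0))) would raise ValueError
# because nothing is left after filtering, while this variant keeps the zeros.
with_zeros = [[0, -1], [0]]
print(column_max(with_zeros))
```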
I'm trying to generate Langford numbers with Python. I have already written the following code which works well for getting the fourth Langford number. Here is my code :
import itertools
n=0
for l in set(itertools.permutations(["1", "1", "2", "2", "3", "3", "4", "4"])):
    t1, t2, t3, t4 = [i for i, j in enumerate(l) if j == "1"], [i for i, j in enumerate(l) if j == "2"], [i for i, j in enumerate(l) if j == "3"], [i for i, j in enumerate(l) if j == "4"]
    if abs(t1[1]-t1[0]) == 2 and abs(t2[1]-t2[0]) == 3 and abs(t3[1]-t3[0]) == 4 and abs(t4[1]-t4[0]) == 5:
        print("".join(l))
        n+=1
    else:
        pass
print(n)
I have two questions:
First, are there techniques to make this code quicker? (For the moment it finds the result in 0.1s.)
Second, could you give me hints on how I could adapt the code to get any n-th Langford number?
If you wonder, here is the wikipedia page for Langford numbers.
Thank you very much if you take the time to answer me!
Yes, there are a few faster ways to make Langford sequences.
Firstly, here's a fairly simple way. Rather than testing all of the permutations containing the pairs of numbers from 1 to n, we generate the permutations of numbers from 1 to n and then try to build Langford sequences from those permutations by placing each pair of numbers into the next available pair of slots. If no pair of slots is available we abandon that permutation and go onto the next one.
Building a sequence is a little slower than simply testing if a full permutation of 2n items is valid, but it means we need to test far fewer permutations when n is large. E.g., if n=7 there are 7! = 5040 permutations, but if we test the permutations of 7 pairs, that's 14! = 87178291200 permutations!
We can reduce that number though, because it contains a lot of duplicates. For 7 pairs, the number of unique permutations is 14! / (2**7) = 681080400 since swapping the 2 items in any of the 7 pairs produces a duplicate permutation. Unfortunately, itertools.permutations doesn't care about duplicates, but my answer here has code for a permutation generator that doesn't produce duplicate permutations. But still, 681 million permutations is a large number, and it takes a long time to test them. So it's better if we can avoid doing that.
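The counts quoted above are easy to check. A short sketch of the arithmetic for n = 7 (the variable names are illustrative only):

```python
from math import factorial

n = 7

# Orderings of the numbers 1..n, as used by the slot-filling approach
orderings_of_n = factorial(n)

# Orderings of all 2n items when the two copies of each number are
# treated as distinct, which is what itertools.permutations does
all_permutations = factorial(2 * n)

# Swapping the two copies within any of the n pairs gives the same
# sequence, so divide by 2**n to count distinct sequences
unique_permutations = all_permutations // 2 ** n

print(orderings_of_n)       # 5040
print(all_permutations)     # 87178291200
print(unique_permutations)  # 681080400
```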
import sys
from itertools import permutations
def place(t):
    slen = 2 * len(t)
    seq = [0] * slen
    for u in t:
        # Find next vacant slot
        for i, v in enumerate(seq):
            if v == 0:
                break
        else:
            # No vacant slots
            return
        j = i + u + 1
        if j >= slen or seq[j]:
            return
        seq[i] = seq[j] = u
    return tuple(seq)
def langford(n):
    count = 0
    for t in permutations(range(1, n+1)):
        seq = place(t)
        #if seq and seq < seq[::-1]:
        if seq:
            count += 1
            print(seq, count)
    return count // 2
def main():
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    count = langford(n)
    print(count)
if __name__ == '__main__':
    main()
output for n=7
(1, 4, 1, 5, 6, 7, 4, 2, 3, 5, 2, 6, 3, 7) 1
(1, 4, 1, 6, 7, 3, 4, 5, 2, 3, 6, 2, 7, 5) 2
(1, 5, 1, 4, 6, 7, 3, 5, 4, 2, 3, 6, 2, 7) 3
(1, 5, 1, 6, 3, 7, 4, 5, 3, 2, 6, 4, 2, 7) 4
(1, 5, 1, 6, 7, 2, 4, 5, 2, 3, 6, 4, 7, 3) 5
(1, 5, 1, 7, 3, 4, 6, 5, 3, 2, 4, 7, 2, 6) 6
(1, 6, 1, 3, 5, 7, 4, 3, 6, 2, 5, 4, 2, 7) 7
(1, 6, 1, 7, 2, 4, 5, 2, 6, 3, 4, 7, 5, 3) 8
(1, 7, 1, 2, 5, 6, 2, 3, 4, 7, 5, 3, 6, 4) 9
(1, 7, 1, 2, 6, 4, 2, 5, 3, 7, 4, 6, 3, 5) 10
(2, 3, 6, 2, 7, 3, 4, 5, 1, 6, 1, 4, 7, 5) 11
(2, 3, 7, 2, 6, 3, 5, 1, 4, 1, 7, 6, 5, 4) 12
(2, 4, 7, 2, 3, 6, 4, 5, 3, 1, 7, 1, 6, 5) 13
(2, 5, 6, 2, 3, 7, 4, 5, 3, 6, 1, 4, 1, 7) 14
(2, 6, 3, 2, 5, 7, 3, 4, 6, 1, 5, 1, 4, 7) 15
(2, 6, 3, 2, 7, 4, 3, 5, 6, 1, 4, 1, 7, 5) 16
(2, 6, 7, 2, 1, 5, 1, 4, 6, 3, 7, 5, 4, 3) 17
(2, 7, 4, 2, 3, 5, 6, 4, 3, 7, 1, 5, 1, 6) 18
(3, 4, 5, 7, 3, 6, 4, 1, 5, 1, 2, 7, 6, 2) 19
(3, 4, 6, 7, 3, 2, 4, 5, 2, 6, 1, 7, 1, 5) 20
(3, 5, 7, 2, 3, 6, 2, 5, 4, 1, 7, 1, 6, 4) 21
(3, 5, 7, 4, 3, 6, 2, 5, 4, 2, 7, 1, 6, 1) 22
(3, 6, 7, 1, 3, 1, 4, 5, 6, 2, 7, 4, 2, 5) 23
(3, 7, 4, 6, 3, 2, 5, 4, 2, 7, 6, 1, 5, 1) 24
(4, 1, 6, 1, 7, 4, 3, 5, 2, 6, 3, 2, 7, 5) 25
(4, 1, 7, 1, 6, 4, 2, 5, 3, 2, 7, 6, 3, 5) 26
(4, 5, 6, 7, 1, 4, 1, 5, 3, 6, 2, 7, 3, 2) 27
(4, 6, 1, 7, 1, 4, 3, 5, 6, 2, 3, 7, 2, 5) 28
(4, 6, 1, 7, 1, 4, 5, 2, 6, 3, 2, 7, 5, 3) 29
(4, 6, 3, 5, 7, 4, 3, 2, 6, 5, 2, 1, 7, 1) 30
(5, 1, 7, 1, 6, 2, 5, 4, 2, 3, 7, 6, 4, 3) 31
(5, 2, 4, 6, 2, 7, 5, 4, 3, 1, 6, 1, 3, 7) 32
(5, 2, 4, 7, 2, 6, 5, 4, 1, 3, 1, 7, 6, 3) 33
(5, 2, 6, 4, 2, 7, 5, 3, 4, 6, 1, 3, 1, 7) 34
(5, 2, 7, 3, 2, 6, 5, 3, 4, 1, 7, 1, 6, 4) 35
(5, 3, 6, 4, 7, 3, 5, 2, 4, 6, 2, 1, 7, 1) 36
(5, 3, 6, 7, 2, 3, 5, 2, 4, 6, 1, 7, 1, 4) 37
(5, 6, 1, 7, 1, 3, 5, 4, 6, 3, 2, 7, 4, 2) 38
(5, 7, 1, 4, 1, 6, 5, 3, 4, 7, 2, 3, 6, 2) 39
(5, 7, 2, 3, 6, 2, 5, 3, 4, 7, 1, 6, 1, 4) 40
(5, 7, 2, 6, 3, 2, 5, 4, 3, 7, 6, 1, 4, 1) 41
(5, 7, 4, 1, 6, 1, 5, 4, 3, 7, 2, 6, 3, 2) 42
(6, 1, 5, 1, 7, 3, 4, 6, 5, 3, 2, 4, 7, 2) 43
(6, 2, 7, 4, 2, 3, 5, 6, 4, 3, 7, 1, 5, 1) 44
(7, 1, 3, 1, 6, 4, 3, 5, 7, 2, 4, 6, 2, 5) 45
(7, 1, 4, 1, 6, 3, 5, 4, 7, 3, 2, 6, 5, 2) 46
(7, 2, 4, 5, 2, 6, 3, 4, 7, 5, 3, 1, 6, 1) 47
(7, 2, 4, 6, 2, 3, 5, 4, 7, 3, 6, 1, 5, 1) 48
(7, 2, 6, 3, 2, 4, 5, 3, 7, 6, 4, 1, 5, 1) 49
(7, 3, 1, 6, 1, 3, 4, 5, 7, 2, 6, 4, 2, 5) 50
(7, 3, 6, 2, 5, 3, 2, 4, 7, 6, 5, 1, 4, 1) 51
(7, 4, 1, 5, 1, 6, 4, 3, 7, 5, 2, 3, 6, 2) 52
26
That takes about 0.2 seconds on my old 2GHz machine.
Conventionally, 2 Langford sequences are considered to be the same if one is a reversal of the other. One way to deal with that is to compare a sequence with its reversed version, and only print it if it's less than the reversed version. You can change the above code to do that by commenting out
if seq:
in the langford function and un-commenting the following line:
#if seq and seq < seq[::-1]:
The above code is an improvement, but we can do better. My next solution uses a technique known as recursive backtracking. This technique can be elegantly implemented in Python through the use of a recursive generator function.
We start with a sequence of zeros. Starting with the highest number, we try to place a pair of numbers into each legal pair of slots. If we're successful, we recurse to place the next pair of numbers. If there are no numbers left to place, we've found a solution and we can yield it.
import sys
def langford(n, seq):
    ''' Generate Langford sequences by recursive backtracking '''
    # The next n
    n1 = n - 1
    # Test each valid pair of positions for this n
    for i in range(0, len(seq) - n - 1):
        j = i + n + 1
        if not (seq[i] or seq[j]):
            # Insert this n into the sequence
            seq[i] = seq[j] = n
            if n1:
                # Recurse to add the next n
                yield from langford(n1, seq)
            else:
                # Nothing left to insert
                yield seq
            # Remove this n from the sequence in preparation
            # for trying the next position
            seq[i] = seq[j] = 0
def main():
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    for i, t in enumerate(langford(n, [0] * 2 * n), 1):
        print(t, i)
        if i % 1000 == 0:
            print(' ', i, end='\r', flush=True)
    print('\n', i // 2)
if __name__ == '__main__':
    main()
The if i % 1000 == 0: stuff lets you see the progress when n is large. This is handy if you comment-out the print(t, i) line.
This code can generate the 35584 = 2*17792 sequences for n=11 in under 25 seconds on my machine.
If you want to collect the sequences yielded by langford into a list, rather than just printing them, you can do it like this:
n = 7
results = list(langford(n, [0] * 2 * n))
However, if you want to do that you must make a slight change to the langford function. Where it says
yield seq
change it to
yield seq[:]
so that it yields a copy of seq rather than the original seq list.
If you just want to get the count of sequences (not counting reversals), you can do this:
n = 7
count = sum(1 for _ in langford(n, [0] * 2 * n)) // 2
print(count)
That will work ok with yield seq.
The above code will be slow for larger values of n. There are faster techniques for calculating the number of Langford sequences, using fairly advanced mathematics, but there's no known simple formula. The OEIS lists the number of Langford sequences for each n at A014552.
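If only the counts are of interest, the same backtracking idea can be reduced to a counter that never yields the sequences. This is a compact sketch under that assumption; count_langford is a hypothetical name and is not part of the code above.

```python
def count_langford(n):
    """Count Langford sequences for n pairs, treating reversals as the same."""
    seq = [0] * (2 * n)

    def place(k):
        # All numbers placed: one complete sequence found
        if k == 0:
            return 1
        total = 0
        # Try every legal pair of slots (i, i + k + 1) for the number k
        for i in range(len(seq) - k - 1):
            j = i + k + 1
            if not (seq[i] or seq[j]):
                seq[i] = seq[j] = k
                total += place(k - 1)
                seq[i] = seq[j] = 0  # backtrack
        return total

    # Every sequence is found together with its reversal
    return place(n) // 2


counts = {n: count_langford(n) for n in range(1, 8)}
print(counts)
```

For n from 1 to 7 the counts should agree with the first terms listed in A014552 (0, 0, 1, 1, 0, 0, 26).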
You are complicating things: all that is needed is to generate all permutations and eliminate those that are not Langford sequences.
1- there is no need to wrap the permutations in set(itertools...) up front. Note that itertools.permutations does yield duplicate tuples when the input contains repeated values, but the valid sequences are collected in a set below, which discards the duplicates.
2- for each permutation, check whether it is a Langford sequence.
3- if not, break and check the next one
4- if it is, check that its reverse has not yet been collected, and save it in a set of unique elements
5- return the resulting unique Langford sequences
This code is fast for n=4, and can find sequences for an arbitrary n; however, the number of permutations to test, (2n)!, grows extremely fast; past n=6, it will require quite a bit of time to finish.
import itertools
def langfort(n):
    seq = [_ for _ in range(1, n+1)] * 2
    lang = set()
    for s in itertools.permutations(seq):
        for elt in seq:
            first = s.index(elt)
            if s[first+1:].index(elt) == elt:
                continue
            else:
                break
        else:
            if s[::-1] not in lang:
                lang.add(s)
    return lang
langfort(4)
output:
{(4, 1, 3, 1, 2, 4, 3, 2)}
Performance:
on a 2011 mac book air:
%timeit langfort(4)
10 loops, best of 3: 53.4 ms per loop
more output:
langfort(5)
set() # there are no langfort(5) sequences
langfort(6)
set() # there are no langfort(6) sequences
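The validity check from step 2 can also be written as a standalone function, which is handy for testing the output of any of the generators in this thread. A minimal sketch, assuming the sequence holds the integers 1..n (is_langford is a hypothetical name):

```python
def is_langford(seq):
    """Return True if seq holds each of 1..n twice with k items between the two k's."""
    n, remainder = divmod(len(seq), 2)
    if remainder:
        return False
    # Record the positions at which each value occurs
    positions = {}
    for index, value in enumerate(seq):
        positions.setdefault(value, []).append(index)
    # Exactly the values 1..n must be present
    if sorted(positions) != list(range(1, n + 1)):
        return False
    # Each value k occurs twice, with exactly k other items in between
    return all(
        len(found) == 2 and found[1] - found[0] == value + 1
        for value, found in positions.items()
    )


print(is_langford((4, 1, 3, 1, 2, 4, 3, 2)))
print(is_langford((1, 1, 2, 2, 3, 3, 4, 4)))
```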