Filtering two lists simultaneously - python

I have three lists:
del_ids = [2, 4]
ids = [3, 2, 4, 1]
other = ['a', 'b', 'c', 'd']
and my goal is to remove del_ids with the result being
ids = [3, 1]
other = ['a', 'd']
I have tried to do a mask for elements to keep (mask = [id not in del_ids for id in ids]) and I plan to apply this mask on both lists.
But I feel that this is not a pythonic solution. Can you please tell me how I can do this better?
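The mask idea from the question can be applied to both lists with itertools.compress from the standard library. The following is a minimal sketch, assuming Python 3; the names drop, new_ids and new_other are introduced here for illustration, and del_ids is converted to a set only to make the membership test cheaper.

```python
from itertools import compress

del_ids = [2, 4]
ids = [3, 2, 4, 1]
other = ['a', 'b', 'c', 'd']

drop = set(del_ids)                      # set membership is O(1) on average
mask = [i not in drop for i in ids]      # True for the elements to keep

# compress() yields the items whose mask entry is truthy
new_ids = list(compress(ids, mask))
new_other = list(compress(other, mask))

print(new_ids)    # [3, 1]
print(new_other)  # ['a', 'd']
```

Building the mask once and reusing it keeps the two lists in step without relying on statement order.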

zip, filter and unzip again:
ids, other = zip(*((id, other) for id, other in zip(ids, other) if id not in del_ids))
The zip() call pairs each id with the corresponding other element, the generator expression filters out any pair whose id is listed in del_ids, and the zip(*...) call then separates the remaining pairs into two sequences again. Note that these are tuples, not lists; wrap each in list() if you need lists.
Demo:
>>> del_ids = [2, 4]
>>> ids = [3, 2, 4, 1]
>>> other = ['a', 'b', 'c', 'd']
>>> list(zip(*((id, other) for id, other in zip(ids, other) if id not in del_ids)))
[(3, 1), ('a', 'd')]
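One caveat with the zip(*...) idiom: if every id is removed, zip(*[]) yields nothing and unpacking into two names raises ValueError. Below is a sketch of a guarded version, assuming Python 3 and that lists are wanted back; filter_parallel is a hypothetical helper name, not something from the answer.

```python
def filter_parallel(keys, values, drop):
    """Return (keys, values) as lists, without the pairs whose key is in drop."""
    drop = set(drop)
    kept = [(k, v) for k, v in zip(keys, values) if k not in drop]
    if not kept:
        # zip(*[]) produces nothing, so unpacking into two names would fail
        return [], []
    new_keys, new_values = zip(*kept)
    return list(new_keys), list(new_values)


print(filter_parallel([3, 2, 4, 1], ['a', 'b', 'c', 'd'], [2, 4]))  # ([3, 1], ['a', 'd'])
print(filter_parallel([2, 4], ['b', 'c'], [2, 4]))                  # ([], [])
```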

zip, filter, unzip:
ids, other = zip(*filter(lambda pair: pair[0] not in del_ids, zip(ids, other)))

In order to avoid learning tricky syntax, do it in two steps.
other = [o for my_id, o in zip(ids, other) if my_id not in del_ids]
ids = [my_id for my_id in ids if my_id not in del_ids]
Drawback
You must execute the statements in correct order, so there's risk of bugs if for some reason the order changes.
Advantage
It's straightforward, so you don't have to search Stack Overflow the next time you want to do it.
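To make the drawback concrete, here is a small sketch of what goes wrong if the two statements are swapped (Python 3; the wrong_ids and wrong_other names are only for illustration).

```python
del_ids = [2, 4]
ids = [3, 2, 4, 1]
other = ['a', 'b', 'c', 'd']

# Swapped order: ids is filtered first ...
wrong_ids = [i for i in ids if i not in del_ids]
# ... so zip() now pairs the already-shortened ids with the unfiltered other
wrong_other = [o for i, o in zip(wrong_ids, other) if i not in del_ids]

print(wrong_ids)    # [3, 1]
print(wrong_other)  # ['a', 'b']  (the wanted result is ['a', 'd'])
```

No exception is raised, which is what makes this kind of ordering bug easy to miss.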

Converting to a pandas DataFrame and applying the mask:
import pandas as pd
del_ids = [2, 4]
ids = [3, 2, 4, 1]
other = ['a', 'b', 'c', 'd']
df = pd.DataFrame({'ids':ids,'other':other})
df = df[~df.ids.isin(del_ids)]
ids = df['ids'].tolist()
other = df['other'].tolist()

Related

How to efficiently get shared key subsets of multiple dicts as multiple arrays?

Is there an efficient way to get the intersection of (the keys of) multiple dictionaries?
Similar to iterating over shared keys in two dictionaries, except the idea is not to iterate but rather to get the set of shared keys, so it can be used to take the matching subset of each dict.
d1 = {'a':[1,2], 'b':[2,2]}
d2 = {'e':[3,2], 'b':[5,1], 'a':[5,5]}
d3 = {'b':[8,2], 'a':[3,3], 'c': [1,2]}
So intersection manually is simple
d1.keys() & d2.keys() & d3.keys()
but what about a list of n dictionaries? I feel like there is a better way than this:
d_list = [d1, d2, d3]
inter_keys = {}
for i in range(len(d_list)):
    if i == 0:
        inter_keys = d_list[i].keys()
    inter_keys = inter_keys & d_list[i].keys()
Then getting a subset
subsets = []
for n in d_list:
    subsets.append( {k: n[k] for k in inter_keys} )
and finally use it to get the value subset
v = [ x.values() for x in subsets ]
really the last part is formatted as v = np.array([ np.array(list(x.values())) for x in subsets ]) to get the ndarray as:
[[[2 2] [1 2]]
[[5 1] [5 5]]
[[8 2] [3 3]]]
I was thinking there may be an approach using something like the numpy where to more efficiently get the subset, but not sure.
I think your code can be simplified to:
In [383]: d_list=[d1,d2,d3]
In [388]: inter_keys = d_list[0].keys()
In [389]: for n in d_list[1:]:
     ...:     inter_keys &= n.keys()
     ...:
In [390]: inter_keys
Out[390]: {'a', 'b'}
In [391]: np.array([[n[k] for k in inter_keys] for n in d_list])
Out[391]:
array([[[1, 2],
        [2, 2]],
       [[5, 5],
        [5, 1]],
       [[3, 3],
        [8, 2]]])
That is, iteratively get the intersection of keys, followed by extraction of the values into a list of lists, which can be made into an array.
inter_keys starts as a dict.keys object, but becomes a set; both work with &=.
I don't think there's a way around the double loop with dict indexing, n[k] as the core. Unless you can use the values or items lists, there isn't a way around accessing dict items one by one.
The subsets list of dicts is an unnecessary intermediate step.
All the keys and values can be extracted into a list of lists, but that doesn't help with selecting a common subset:
In [406]: big_list = [list(d.items()) for d in d_list]
In [407]: big_list
Out[407]:
[[('a', [1, 2]), ('b', [2, 2])],
[('e', [3, 2]), ('b', [5, 1]), ('a', [5, 5])],
[('b', [8, 2]), ('a', [3, 3]), ('c', [1, 2])]]
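The iterative intersection above can also be written as a single call to set.intersection. This is a minimal sketch, assuming Python 3 and a non-empty d_list; the names common and rows are introduced here, and the keys are sorted only so that the column order is the same on every run.

```python
d1 = {'a': [1, 2], 'b': [2, 2]}
d2 = {'e': [3, 2], 'b': [5, 1], 'a': [5, 5]}
d3 = {'b': [8, 2], 'a': [3, 3], 'c': [1, 2]}
d_list = [d1, d2, d3]

# Iterating a dict yields its keys, so set(d) is the key set of d.
# Sorting fixes the column order, which a bare set does not guarantee.
common = sorted(set.intersection(*(set(d) for d in d_list)))

# One row per dict, one entry per shared key
rows = [[d[k] for k in common] for d in d_list]

print(common)  # ['a', 'b']
print(rows)    # [[[1, 2], [2, 2]], [[5, 5], [5, 1]], [[3, 3], [8, 2]]]
```

Passing rows to np.array gives the same 3-D array as in the session above, with a fixed key order.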
Assuming that the lists of values in your dictionaries are of the same length, you can use this approach:
import numpy as np
d1 = {'a':[1,2], 'b':[2,2]}
d2 = {'e':[3,2], 'b':[5,1], 'a':[5,5]}
d3 = {'b':[8,2], 'a':[3,3], 'c':[1,2]}
d_list = [d1, d2, d3]
inter_map = {} if len(d_list) == 0 else d_list[0]
for d_it in d_list[1:]:
    # combine element lists based on the current intersection. keys that do not match once are removed from inter_map
    inter_map = {k: inter_map[k] + d_it[k] for k in d_it.keys() & inter_map.keys()}
# inter_map holds a key->value list mapping at this point
values = np.array([item for sublist in inter_map.values() for item in sublist]).reshape([len(inter_map.keys()),
                                                                                         2 * len(d_list)])
# real_values restructures the values into the order used in your program, assumes you always have 2 values per sublist
real_values = np.zeros(shape=[len(d_list), 2 * len(inter_map.keys())])
for i, k in enumerate(inter_map.keys()):
    real_values[:, 2*i:2*(i+1)] = values[i].reshape([len(d_list), 2])
Please note that this code is not deterministic, since the order of keys in your map is not guaranteed to be the same for different runs of the program.

How would I achieve this dict comprehension of two keys with list values using zip?

I've been trying for the past while to convert a code snippet I have into a comprehension, not because it is faster or more beneficial, but because it would help me understand Python better.
I have a generator that I'm reading from that contains 2 values, 1 value for EN, and 1 value for AR (Languages). I want to map the values inside into a new dictionary with 2 keys EN and AR that contains the list of values per language. The languages are specified from a config file under the name SupportedLanguages: SupportedLanguages = ["en", "ar"].
I know that in Javascript something similar can be done using the map and reduce functions, but I'm unsure how to do that here in Python, and nothing of what I've tried seems to be doing what I want.
Here is the current code snippet that I have:
reader = ([i, str(i)] for i in range(10))
rows = [row for row in reader]
SupportedLanguages = ["en", "ar"]
dict_ = {
    "en": [],
    "ar": []
}
for row in rows:
    for i, _ in enumerate(row):
        dict_[SupportedLanguages[i]].append(row[i])
reader is a dummy generator to replicate the behavior of my csv reader (since I have n rows, where n depends on the number of languages I have).
The working solution that I have come up with is this comprehension:
{SupportedLanguages[i]: [row[i] for row in rows] for i in range(len(row))}
However, this is looping over rows twice, and I don't want this behavior, I just need a single loop given that the generator will only loop once as well (csv.reader). I know that I can store all the values into a list before looping over them, but then again this creates n loops, and I only want 1.
Edit:
As I mentioned above it is creating 2 loops, but what I should've said is n loops, as in 1 loop per language.
I think I understand what you're trying to achieve. Let's say you have a reader that looks like this:
reader = ([en_value, ar_value] for en_value, ar_value in enumerate("abcdefg"))
for pair in reader: # Don't actually do this part - print just for demonstration
    print(pair)
Output:
[0, 'a']
[1, 'b']
[2, 'c']
[3, 'd']
[4, 'e']
[5, 'f']
[6, 'g']
Then you can construct your dict_ in the following way:
supported_languages = ["en", "ar"]
dict_ = dict(zip(supported_languages, zip(*reader)))
print(dict_)
Output:
{'en': (0, 1, 2, 3, 4, 5, 6), 'ar': ('a', 'b', 'c', 'd', 'e', 'f', 'g')}
Based on the observation that...:
a, b = zip(*reader)
print(a)
print(b)
...yields:
(0, 1, 2, 3, 4, 5, 6)
('a', 'b', 'c', 'd', 'e', 'f', 'g')
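Because zip(*reader) has to unpack the whole generator before it can produce the first column, it consumes reader exactly once, but it also holds every row in memory at that moment, and the columns come back as tuples. If lists are wanted, an explicit single pass is an alternative. This is a minimal sketch, assuming Python 3.7+ and one value per language in every row; columns is a name introduced here.

```python
supported_languages = ["en", "ar"]
reader = ([i, str(i)] for i in range(3))  # stand-in for the csv reader

# One empty list per language, filled during a single pass over reader
columns = {lang: [] for lang in supported_languages}
for row in reader:
    for lang, value in zip(supported_languages, row):
        columns[lang].append(value)

print(columns)  # {'en': [0, 1, 2], 'ar': ['0', '1', '2']}
```

The inner loop runs over the languages, not over the rows again, so each row is still read only once.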

Python create a list that contains tuples [duplicate]

For example, I have three lists (of the same length)
A = [1,2,3]
B = [a,b,c]
C = [x,y,z]
and I want to merge them into something like:
[[1,a,x],[2,b,y],[3,c,z]].
Here is what I have so far:
define merger(A,B,C):
    answer =
    for y in range (len(A)):
        a = A[y]
        b = B[y]
        c = C[y]
        temp = [a,b,c]
        answer = answer.extend(temp)
    return answer
Received error:
'NoneType' object has no attribute 'extend'
It looks like your code is meant to say answer = [], and leaving that out will cause problems. But the major problem you have is this:
answer = answer.extend(temp)
extend modifies answer and returns None. Leave this as just answer.extend(temp) and it will work. You likely also want to use the append method rather than extend - append puts one object (the list temp) at the end of answer, while extend appends each item of temp individually, ultimately giving the flattened version of what you're after: [1, 'a', 'x', 2, 'b', 'y', 3, 'c', 'z'].
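The difference between the two methods, and the None return value behind the error, can be seen in a few lines. This is an illustrative sketch in Python 3; flat, nested and result are names introduced here.

```python
temp = [1, 'a', 'x']

flat = []
result = flat.extend(temp)   # extend mutates flat in place and returns None
print(result)                # None
print(flat)                  # [1, 'a', 'x']

nested = []
nested.append(temp)          # append adds temp itself as a single element
print(nested)                # [[1, 'a', 'x']]
```

Assigning result back to the list variable is what turns it into None and triggers the AttributeError on the next iteration.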
But, rather than reinventing the wheel, this is exactly what the builtin zip is for:
>>> A = [1,2,3]
>>> B = ['a', 'b', 'c']
>>> C = ['x', 'y', 'z']
>>> list(zip(A, B, C))
[(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z')]
Note that in Python 2, zip returns a list of tuples; in Python 3, it returns a lazy iterator (i.e., it builds the tuples as they're requested, rather than precomputing them). If you want the Python 2 behaviour in Python 3, pass it through list as I've done above. If you want the Python 3 behaviour in Python 2, use the function izip from itertools.
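A practical consequence of the lazy behaviour in Python 3 is that a zip object can be consumed only once. A short sketch, assuming Python 3; pairs is a name introduced here.

```python
A = [1, 2, 3]
B = ['a', 'b', 'c']

pairs = zip(A, B)      # no tuples have been built yet
print(next(pairs))     # (1, 'a')
print(list(pairs))     # [(2, 'b'), (3, 'c')]
print(list(pairs))     # []  (the iterator is exhausted)
```

Wrapping the call in list() up front avoids this when the result is needed more than once.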
To get a list of lists, you can use the built-in function zip() and a list comprehension to convert each element of the result of zip() from a tuple to a list:
A = [1, 2, 3]
B = [4, 5, 6]
C = [7, 8, 9]
X = [list(e) for e in zip(A, B, C)]
print(X)
# prints [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Assuming you are doing this for a class and are not yet learning all of the tricks that make Python a great tool, here is what you need. You had two problems. First, extend works in place, so its result should not be assigned back. Second, your desired result shows that you want append, not extend.
def merger(A,B,C):
    answer = []
    for y in range (len(A)):
        a=A[y]
        b=B[y]
        c=C[y]
        temp = [a,b,c]
        answer.append(temp)
    return answer
>>> merger([1, 2, 3], ['a', 'b', 'c'], ['x', 'y', 'z'])
[[1, 'a', 'x'], [2, 'b', 'y'], [3, 'c', 'z']]
I was just wondering the same thing. I'm a total noob using Codecademy. This is what I came up with to combine two lists index by index:
toppings = ['pepperoni', 'pineapple', 'cheese', 'sausage', 'olives', 'anchovies', 'mushrooms']
prices = [2,6,1,3,2,7,2]
num_pizzas = len(toppings)
print("We sell "+str(num_pizzas)+" different kinds of pizza!")
pizzas = list(zip(toppings, prices))
print (pizzas)
The list pizzas printed out: [('pepperoni', 2), ('pineapple', 6), ('cheese', 1), ('sausage', 3), ('olives', 2), ('anchovies', 7), ('mushrooms', 2)]

Python: Insert a string after each element of an array with an exception

I have the following Python script. What I want to do is add several strings to random.shuffle(scenArray). Specifically, there will be a string after each element of the array, however, the 8th element in the array will need a different string.
E.g. if the array is
1,2,3,4,5,6,7,8,9 I want to make it 1,A,2,A,3,A,4,A,5,A,6,A,7,A,8,B,9,A
Any help greatly appreciated.
import random
# General comment: some of the script might be confusing because python
# uses zero-based numbering to index arrays
# read in the full list of scenario x conditions
f = open('scenarioList.xml', 'U')
data = f.read()
f.close()
inst = data.split("\n\n")
# This specifies which scenarios are in which counterbalancing group
cGroups = [[0,1,2,3,4],
[5,6,7,8,9],
[10,11,12,13,14]]
conds = [inst[0:15],inst[15:30],inst[30:45]] # the xml strings divided up by condition
# this is the counterbalancing scheme (latin square)
cScheme = [[1,2,3],
[1,3,2],
[2 ,1 , 3],
[2 , 3 , 1],
[3 , 1 , 2],
[3, 2 , 1]]
# change the second index in the range to loop up to 60; set to 12 now for testing
for subj in range(1,12):
    cRow = cScheme[(subj-1)%6] # use the modulus operator to find out which row to use in counterbalancing table
    scenArray = []
    # loop across scenario groups and look up their assigned interruption condition for this subj
    for group in range(0,3):
        #conds[cScheme[group]][i]
        scenArray.extend([ conds[cRow[group]-1][i] for i in cGroups[group]]) # use extend and not append here
    # randomize order of arrays---this is something you might modify to control this a bit more
    random.shuffle(scenArray)
    f = open('scenarios' + str(subj) + 'xml', 'w')
    f.write('\r\n\r\n'.join(scenArray))
    f.close()
Ignoring all your code, but from your description and example:
E.g. if the array is 1,2,3,4,5,6,7,8,9
I want to make it 1,A,2,A,3,A,4,A,5,A,6,A,7,A,8,B,9,A
You could do something like:
lst1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lst2 = sum(([x,'A'] if i != 7 else [x,'B'] for (i,x) in enumerate(lst1)), [])
print(lst2) # [1, 'A', 2, 'A', 3, 'A', 4, 'A', 5, 'A', 6, 'A', 7, 'A', 8, 'B', 9, 'A']
EDIT
The one-liner that is assigned to lst2 can be equivalently re-written as:
lst3 = [] # Initialize an empty list
for (i,x) in enumerate(lst1):
    if i != 7:
        lst3 += [x,'A'] # This just concatenates the list [x,'A'] onto lst3
    else:
        lst3 += [x,'B']
print(lst3) # [1, 'A', 2, 'A', 3, 'A', 4, 'A', 5, 'A', 6, 'A', 7, 'A', 8, 'B', 9, 'A']
Note that lst3 += [x, 'A'] could also be written as
lst3.append(x)
lst3.append('A')
Also, sum() is used with a generator expression and its optional start argument.
Finally, enumerate returns a generator-like object that, when iterated over, produces a (index, value) tuple at each iteration -- see the docs I linked for a small example.
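sum(..., []) builds a new list on every addition, so it gets slow for long inputs. Here is a sketch of the same interleaving with itertools.chain.from_iterable, assuming Python 3; special_index and lst4 are names introduced here, with special_index marking the position that receives 'B'.

```python
from itertools import chain

lst1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
special_index = 7  # zero-based index of the 8th element

# Each element becomes a (value, marker) pair; chain flattens the pairs
lst4 = list(chain.from_iterable(
    (x, 'B' if i == special_index else 'A') for i, x in enumerate(lst1)
))

print(lst4)  # [1, 'A', 2, 'A', 3, 'A', 4, 'A', 5, 'A', 6, 'A', 7, 'A', 8, 'B', 9, 'A']
```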

indexing and finding values in list of namedtuples

I have a namedtuple like the following,
tup = myTuple (
a=...,
b=...,
c=...,
)
where ... could be any value (string, number, date, time, etc.). Now I make a list of these namedtuples and want to find, let's say, the records with c=1 and the corresponding values of a and b. Is there a pythonic way of doing this?
Use a list comprehension as a filter, like this:
[[record.a, record.b] for record in records if record.c == 1]
For example,
>>> from collections import namedtuple
>>> myTuple = namedtuple("Test", ['a', 'b', 'c', 'd'])
>>> records = [myTuple(3, 2, 1, 4), myTuple(5, 6, 7, 8)]
>>> [[record.a, record.b] for record in records if record.c == 1]
[[3, 2]]
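Two variations on the comprehension above may be useful: stopping at the first match, and grouping once when many values of c will be looked up. This is a minimal sketch, assuming Python 3; Record, first and by_c are names introduced here for illustration.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", ["a", "b", "c"])
records = [Record(3, 2, 1), Record(5, 6, 7), Record(9, 8, 1)]

# First match only; None if no record has c == 1
first = next(((r.a, r.b) for r in records if r.c == 1), None)
print(first)  # (3, 2)

# Group once by c when many different values of c will be looked up
by_c = defaultdict(list)
for r in records:
    by_c[r.c].append((r.a, r.b))
print(by_c[1])  # [(3, 2), (9, 8)]
```

The grouping costs one pass up front, after which each lookup is a dictionary access instead of a scan of the list.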
