Python: vertical binning of two lists - python

I have two lists of the same size:
A = [1, 1, 2, 2, 3, 3, 4, 5]
B = [a, b, c, d, e, f, g, h] # numeric values
How do I do a vertical binning?
Output desired:
C = [ 1, 2, 3, 4, 5] # len = 5
D = [a + b, c + d, e + f, g, h] # len = 5
i.e. a mapping of A list to its cumulative sum (vertical binning?) where it occurs in list B.

I assume a, b, ... are numeric variables:
bins = dict()
for b, x in zip(A,B):
    bins[b] = bins.setdefault(b, 0) + x
C = [key for key in bins]
D = [bins[key] for key in bins]
If a, b, ... are of another type, you would have to adjust the default value in bins.setdefault(b, ...).
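As a possible variation on the same idea, collections.defaultdict lets the default value come from a factory instead of setdefault. This is only a sketch; the numbers in B are made up for illustration:

```python
from collections import defaultdict

A = [1, 1, 2, 2, 3, 3, 4, 5]
B = [10, 20, 30, 40, 50, 60, 70, 80]  # made-up numeric values, not from the question

# defaultdict(int) starts every missing key at 0, replacing setdefault(b, 0).
bins = defaultdict(int)
for key, value in zip(A, B):
    bins[key] += value

C = list(bins)            # keys in first-seen order (guaranteed in Python 3.7+)
D = list(bins.values())   # the per-key sums, in the same order

print(C)  # [1, 2, 3, 4, 5]
print(D)  # [30, 70, 110, 70, 80]
```

For strings, defaultdict(str) should work the same way, since str() returns an empty string.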

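This is a good case for itertools.groupby, provided equal values in A are adjacent (as they are here), because groupby only merges consecutive items with the same key: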
from itertools import groupby
from operator import itemgetter
fst = itemgetter(0)
A = [1,1,2,2,3,3,4,5]
B = [1,3,4,6,7,7,8,8]
C = []
D = []
for k, v in groupby(zip(A, B), key=fst):
    C.append(k)
    D.append(sum(item[-1] for item in v))
C
>>[1, 2, 3, 4, 5]
D
>>[4, 10, 14, 8, 8]
If B is a list of strings then your summation operation becomes:
D.append(''.join(item[-1] for item in v))
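If equal values in A are not next to each other, groupby would start a new group each time the key changes. A minimal sketch of one way around that, assuming it is acceptable to sort the pairs first (the data below is made up):

```python
from itertools import groupby
from operator import itemgetter

A = [2, 1, 2, 1, 3]      # made-up data: equal keys are NOT adjacent
B = [10, 1, 20, 2, 5]

# Sort the pairs by key first; sorted() is stable, so values keep
# their original relative order within each key.
pairs = sorted(zip(A, B), key=itemgetter(0))

C, D = [], []
for key, group in groupby(pairs, key=itemgetter(0)):
    C.append(key)
    D.append(sum(value for _, value in group))

print(C)  # [1, 2, 3]
print(D)  # [3, 30, 5]
```

Note that C then comes out in sorted key order rather than first-seen order.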

You can use a dictionary. Insertion order is preserved (an implementation detail of CPython 3.6, guaranteed by the language since Python 3.7), so you get C as the keys and D as the values:
A = [1,1,2,2,3,3,4,5]
B = ["a","b","c","d","e","f","g","h"]
from random import randint
rename_to_B_for_numeric = [randint(0, 255) for _ in A]
result = {}
for idx, item in enumerate(A):
    if item not in result:
        # not sure about the type, so...
        result[item] = "" if isinstance(B[idx], str) else 0
    result[item] += B[idx]
print(result)
# {1: 'ab', 2: 'cd', 3: 'ef', 4: 'g', 5: 'h'}
print(list(result.keys()))
# [1, 2, 3, 4, 5]
print(list(result.values()))
# ['ab', 'cd', 'ef', 'g', 'h']
Obviously, if the items in B are neither strings nor numbers (ints in this case), you'll need to modify the code a little to pick a suitable default. Or avoid the default altogether and use else inside the loop:
if item not in result:
    result[item] = B[idx]
else:
    result[item] += B[idx]

Here, C is the unique values of A:
C = sorted(set(A))
gives:
[1, 2, 3, 4, 5]
Now, D is the vertical binning of B w.r.t. A. If B's elements are strings:
D = [''.join(B[i] for i in range(len(B)) if A[i] == j) for j in C]
If B's elements are numbers:
D = [sum(B[i] for i in range(len(B)) if A[i] == j) for j in C]
gives:
['ab', 'cd', 'ef', 'g', 'h']
Note:
A = [1,1,2,2,3,3,4,5]
B = ['a','b','c','d','e','f','g','h']
If a, b, c, ... are numeric, use the second expression (the one with sum) instead.

Related

Compare different elements in two different lists

I need to compare if 2 different data are matching from different lists.
I have these 2 lists and I need to count the number of babies with:
first_name_baby = S AND age_baby = 1
age_baby = [ 2, 1, 3, 1, 4, 2, 4, 1, 1, 3, 4, 2, 2, 3]
first_name_baby = [ T, S, R, T, O, A, L, S, F, S, Z, U, S, P]
There are actually 2 cases where first_name_baby = S AND age_baby = 1, but I need to write a Python program to count them.
Use zip to combine corresponding list entries and then .count
>>> age_baby = [ 2, 1, 3, 1, 4, 2, 4, 1, 1, 3, 4, 2, 2, 3]
>>> first_name_baby = "T, S, R, T, O, A, L, S, F, S, Z, U, S, P".split(', ')
>>> list(zip(first_name_baby, age_baby)).count(('S', 1))
2
Alternatively, you could use numpy. This would allow a solution very similar to what you have tried:
>>> import numpy as np
>>>
>>> age_baby = np.array(age_baby)
>>> first_name_baby = np.array(first_name_baby)
>>>
>>> np.count_nonzero((first_name_baby == 'S') & (age_baby == 1))
2
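If counts for several (name, age) combinations are needed, one option might be to count every pair once with collections.Counter and then look up whichever pairs are of interest. A small sketch using the lists from the question (pair_counts is a name made up here):

```python
from collections import Counter

age_baby = [2, 1, 3, 1, 4, 2, 4, 1, 1, 3, 4, 2, 2, 3]
first_name_baby = "T, S, R, T, O, A, L, S, F, S, Z, U, S, P".split(", ")

# Count each (name, age) pair in a single pass over both lists.
pair_counts = Counter(zip(first_name_baby, age_baby))

print(pair_counts[("S", 1)])  # 2
print(pair_counts[("S", 3)])  # 1
print(pair_counts[("X", 9)])  # 0, missing pairs count as zero
```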
You can just take the sum of 1 whenever the conditions match, iterating over the lists simultaneously using zip:
# need to make sense of the names
T, S, R, O, A, L, F, Z, U, S, P = 'T, S, R, O, A, L, F, Z, U, S, P'.split(', ')
age_baby = [2, 1, 3, 1, 4, 2, 4, 1, 1, 3, 4, 2, 2, 3]
first_name_baby = [T, S, R, T, O, A, L, S, F, S, Z, U, S, P]
sum(1 for age, name in zip(age_baby, first_name_baby)
if age == 1 and name == S)
Thanks to Austin, a more elegant version of this:
sum(age == 1 and name == S for age, name in zip(age_baby, first_name_baby))
This works because bool in Python is a subclass of int: True is basically 1 (with overloaded __str__ and __repr__) and False is 0. The booleans can therefore just be summed, and the result is the number of True comparisons.
Try this:
>>> count = 0
>>>
>>>
>>> for i in range(len(first_name_baby)):
...     if first_name_baby[i] == 'S' and age_baby[i] == 1:
...         count += 1
...
>>> count
2
x = len([item for idx, item in enumerate(age_baby) if item == 1 and first_name_baby[idx] == 'S'])
2
Expanded:
l = []
for idx, item in enumerate(age_baby):
    if item == 1 and first_name_baby[idx] == 'S':
        l.append(item)
x = len(l)

Python: Unpacking arrays of tuples or of arrays - unpacking more than 2 elements per array or tuple

My aim is to get a more elegant unpacking of a sub-tuple or sub-list for longer tuples or longer lists.
For example, I have an array with sub-arrays
s = [['yellow', 1,5,6], ['blue', 2,8,3], ['yellow', 3,4,7], ['blue',4,9,1], ['red', 1,8,2,11]]
Experimenting with an array and sub-tuple or sub-list with 2 elements,I have the following:
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
OR
s = [['yellow', 1], ['blue', 2], ['yellow', 3], ['blue', 4], ['red', 1]]
I can unpack 's' whether it has tuples or lists:
for k, v in s:
    print('k = {0}, v = {1}'.format(k,v))
Produces the result
k = yellow, v = 1
k = blue, v = 2
k = yellow, v = 3
k = blue, v = 4
k = red, v = 1
Suppose I have the following array with sub-arrays of four elements each:
bongo = [[1, 2, 3, 4], [6, 3, 2, 3], [5, 7, 11, 15], [2, 4, 7, 8]]
I can unpack 'bongo' using the variables a,b,c,d
for a,b,c,d in bongo:
    print('a = {0}, b = {1}, c={2}, d={3}'.format(a,b,c,d))
a = 1, b = 2, c=3, d=4
a = 6, b = 3, c=2, d=3
a = 5, b = 7, c=11, d=15
a = 2, b = 4, c=7, d=8
Despite being able to unpack the two-element mixed string/number sub-arrays above, I seem to have a problem unpacking longer mixed 'chr' and number sub-lists (or sub-tuples, not shown, which give the same result):
s = [['yellow', 1,5,6], ['blue', 2,8,3], ['yellow', 3,4,7], ['blue',
4,9,1], ['red', 1,8,2,11]]
That is, unpacking gives the desired result for the first four rows and then an error:
for a,b,c,d in s:
    print('a = {0}, b = {1}, c = {2}, d = {3} '.format(a,b,c,d))
a = yellow, b = 1, c = 5, d = 6
a = blue, b = 2, c = 8, d = 3
a = yellow, b = 3, c = 4, d = 7
a = blue, b = 4, c = 9, d = 1
Traceback (most recent call last):
File "<pyshell#288>", line 1, in <module>
for a,b,c,d in s:
ValueError: too many values to unpack (expected 4)
My question: Is there a more elegant way of unpacking, so that I get the first element, say as a key, and the rest separately?
To illustrate with pseudo-code (it does not work directly in Python):
for k[0][0], v[0][1:4] in s:
    print('k[0][0] = {0}, v[0][1:4] = {1}'.format(k[0][0],v[0][1:4]))
Such as to get the following output:
a = yellow, b = 1, c = 5, d = 6
a = blue, b = 2, c = 8, d = 3
a = yellow, b = 3, c = 4, d = 7
a = blue, b = 4, c = 9, d = 1
Inspiration:
Experimenting with the defaultdict at para 3.4.1 https://docs.python.org/3/library/collections.html#collections.defaultdict particularly the unpacking of an array with a sub-tuple.
Thank you,
Anthony of Sydney
You can convert to your desired format first:
>>> ss = {x[0]: x[1:] for x in s}
>>> ss
{'blue': [4, 9, 1], 'red': [1, 8, 2, 11], 'yellow': [3, 4, 7]}
>>> for s, v in ss.items():
...     print("a = {0} b = {1} c = {2} d = {3}".format(s, *v))
...
a = blue b = 4 c = 9 d = 1
a = red b = 1 c = 8 d = 2
a = yellow b = 3 c = 4 d = 7
>>>
Further to Mr Azim's answer: in its 5th line he used *v. This inspired me to experiment with applying * to an array/tuple/list instead of a dictionary.
This code produces the same result:
s = [('yellow', 1, 5, 6), ('blue', 2, 8, 3), ('green', 4, 9, 1), ('red', 1, 8, 2)]
for x, *y in s:
    temparray = [b for b in y]  # Note we don't use *y
    print('x = {0}, temparray = {1}'.format(x, temparray))
as
for x, *y in s:
    print('x = {0}, y = {1}'.format(x,y))  # note we don't use *y
x = yellow, y = [1, 5, 6]
x = blue, y = [2, 8, 3]
x = green, y = [4, 9, 1]
x = red, y = [1, 8, 2]
type(y)
<class 'list'>
Conclusion: the * operator can be applied not only to dictionary values, but also to arrays/tuples/lists. When applied in a 'for' loop, as in
for var1, *var2 in aListorTupleorArray:
    # var1 gets the first element of each sub-list or sub-tuple
    # var2 gets the remaining elements, collected into a list
    print('var1 = {0}, var2 = {1}'.format(var1, var2))  # Note we don't use the * in *var2, just var2
Thanks,
Anthony of exciting Sydney
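Applied to the list from the original question, where the last row is longer than the others, the same starred target seems to avoid the ValueError, because the starred name absorbs however many elements remain. A short sketch (the names key, rest and unpacked are made up here):

```python
s = [['yellow', 1, 5, 6], ['blue', 2, 8, 3], ['red', 1, 8, 2, 11]]

unpacked = []
for key, *rest in s:
    # rest is always a list, whatever the length of the row
    unpacked.append((key, rest))
    print('key = {0}, rest = {1}'.format(key, rest))
# key = yellow, rest = [1, 5, 6]
# key = blue, rest = [2, 8, 3]
# key = red, rest = [1, 8, 2, 11]

# The starred name does not have to come last.
first, *middle, last = s[2]
print(first, middle, last)  # red [1, 8, 2] 11
```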
Here is a subtle difference between printing *v and v.
# printing v in the loop
for s,v in ss.items():
    print("s = {0}, v = {1}".format(s,v))  # printing s & v
s = yellow, v = [3, 4, 7]
s = blue, v = [4, 9, 1]
s = red, v = [1, 8, 2]
Then we have
# printing *v in the loop
for s,v in ss.items():
    print("s = {0}, *v = {1}".format(s,*v))  # printing s & *v
s = yellow, *v = 3
s = blue, *v = 4
s = red, *v = 1
Here *v spreads the list into separate arguments of format(), so {1} refers only to its first element.
Note the subtlety when *v is used in the 'for' loop itself: each item of ss.items() is a (key, value) pair, so v becomes a one-element list holding the whole value.
# printing v in the loop
for s,*v in ss.items():
    print("s = {0}, v = {1}".format(s,v))  # printing s & v
s = yellow, v = [[3, 4, 7]]
s = blue, v = [[4, 9, 1]]
s = red, v = [[1, 8, 2]]
# printing *v in the loop
for s,*v in ss.items():
    print("s = {0}, v = {1}".format(s,*v))  # printing s & *v
s = yellow, v = [3, 4, 7]
s = blue, v = [4, 9, 1]
s = red, v = [1, 8, 2]
Thank you,
Anthony of Sydney

Python 3 lists to dictionary

I have three lists and I would like to build a dictionary using them:
a = [a,b,c]
b = [1,2,3]
c = [4,5,6]
I expect to have:
{'a':1,4, 'b':2,5, 'c':3,6}
All I can do now is:
{'a':1,'b':2, 'c':3}
What should I do?
You can try this:
a = ["a","b","c"]
b = [1,2,3]
c = [4,5,6]
new_dict = {i:[j, k] for i, j, k in zip(a, b, c)}
Output:
{'b': [2, 5], 'c': [3, 6], 'a': [1, 4]}
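If you need the keys to keep the order of list a on older Python versions (before 3.7, plain dicts do not guarantee order), you can try this:
from collections import OrderedDict
d = OrderedDict()
for i, j, k in zip(a, b, c):
    d[i] = [j, k]
Now you have an OrderedDict object whose keys are in insertion order, which here happens to be alphabetical.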
Check this: how to zip two lists into new one
I suggest first zipping the b and c lists and then mapping them into a dictionary, again using zip:
a = ['a','b','c']
b = [1,2,3]
c = [4,5,6]
vals = zip(b,c)
d = dict(zip(a,vals))
print(d)
a = ['a','b','c']
b = [1,2,3]
c = [4,5,6]
result = { k:v for k,*v in zip(a,b,c)}
RESULT
print(result)
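The same zip idea could probably be extended to any number of value lists by zipping them together first. A minimal sketch; value_lists is a hypothetical container that is not part of the question:

```python
a = ['a', 'b', 'c']
b = [1, 2, 3]
c = [4, 5, 6]

value_lists = [b, c]  # hypothetical: append further lists here as needed

# zip(*value_lists) yields one tuple of values per key position.
result = {key: list(values) for key, values in zip(a, zip(*value_lists))}

print(result)  # {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
```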

How to specifically select item from a list with index information and assign them to library

I am working on a function to exclude all occurrences of 0 in a list and return a tuple/list with index information that will be assigned to a dictionary. For example:
for a list input:
x = [0,0,1,2,3,0,0,]
output:
{"inds":[2,3,4],"vals":[1,2,3]}
My current solution is very ugly:
def function(x):
    b = list()
    c = list()
    d = {'inds': [], 'vals': []}
    a = list(enumerate(x))
    for i in a:
        if i[1]!=0:
            b.append(i[1])
            c.append(i[0])
    d["inds"] = c
    d["vals"] = b
    return d
I am looking for a more concise solution.
You're basically there, you have the concept in mind. There's just a few ways to clean up your code.
There's no need to create lists b and c, when you can simply append the new data into the dictionary:
x = [0, 0, 1, 2, 3, 0, 0]
d = {'inds': [], 'vals': []}
for i, j in enumerate(x):
    if j != 0:
        d['inds'].append(i)
        d['vals'].append(j)
print(d)
# Prints: {'vals': [1, 2, 3], 'inds': [2, 3, 4]}
There's also no need to call list() around enumerate(). I'm going to assume you use Python 3 here and that when you do enumerate(), you see something like:
<enumerate object at 0x102c579b0>
This is ok! This is because enumerate returns a special object of its own which is iterable just like a list, so you can simply loop through a. Also, since the list will have two values per item, you can do for i, j like I have.
idx, vals = zip(*[[n, v] for n, v in enumerate(x) if v])
d = {"inds": list(idx), "vals": list(vals)}
>>> d
{'inds': [2, 3, 4], 'vals': [1, 2, 3]}
Your solution is ok, but you have some superfluous lines.
def function(x):
    d = {'inds': [], 'vals': []}
    for index, value in enumerate(x):
        if value != 0:
            d['inds'].append(index)
            d['vals'].append(value)
    return d
If performance is an issue for very long arrays you could also use numpy:
import numpy as np
def function(x):
    x_arr = np.array(x)
    mask = x_arr != 0
    indices = np.argwhere(mask)[:,0]
    values = x_arr[mask]
    return {'inds': list(indices), 'vals': list(values)}
You can do it also like this:
d = dict((i, v) for i, v in enumerate(x) if v)
d = {'inds': list(d.keys()), 'vals': list(d.values())}
EDIT:
If order matters on Python versions older than 3.7, then like this (thanks to comments):
import collections
d = collections.OrderedDict((i, v) for i, v in enumerate(x) if v)
d = {'inds': list(d.keys()), 'vals': list(d.values())}
It can be done by this ugly functional one-liner:
{'inds': list(map(lambda p: p[0], filter(lambda p: p[1] != 0, enumerate(x)))), 'vals': list(filter(lambda v: v != 0, x))}
output:
{'inds': [2, 3, 4], 'vals': [1, 2, 3]}
this gives you inds:
list(map(lambda p: p[0], filter(lambda p: p[1] != 0, enumerate(x))))
this gives you values:
list(filter(lambda v: v != 0, x))
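For comparison, a comprehension-based sketch that filters the (index, value) pairs once and also handles an empty list or a non-zero first element. The function name nonzero_items is made up for this illustration:

```python
def nonzero_items(x):
    """Return the indices and values of the non-zero entries of x."""
    # Filter the (index, value) pairs once, then split them into two lists.
    pairs = [(i, v) for i, v in enumerate(x) if v != 0]
    return {'inds': [i for i, _ in pairs], 'vals': [v for _, v in pairs]}

print(nonzero_items([0, 0, 1, 2, 3, 0, 0]))  # {'inds': [2, 3, 4], 'vals': [1, 2, 3]}
print(nonzero_items([5, 0]))                 # {'inds': [0], 'vals': [5]}
print(nonzero_items([]))                     # {'inds': [], 'vals': []}
```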

Getting items from a list of lists if the list contains any keywords?

In my Python code, there are two objects, (x, y).
x is a numpy array from a separate function containing an x, y, and z coordinate. Each x, y and z coordinate corresponds to an object in list 'y'.
and 'y' would be a list of letters between a - j in random order.
There are multiple instances of each letter, e.g.: a b b c d a a f b d e e f e c a and so on. For every value of 'x', there is a corresponding 'y' letter. Each line is different.
I want to get the x that corresponds a list of chosen letters, say a, c, and f.
How can I do this? I've tried looking into slices and indices but I'm not sure where to begin.
Trying to grab an item from array x, that corresponds to the same line in list y, if that makes any sense?
You wanted the values corresponding to 'a', 'c', and 'f':
>>> x = [ 1, 2, 3, 4, 5, 6 ]
>>> y = 'cgadfh'
>>> d = dict(zip(y, x))
>>> d['a']
3
>>> [d[char] for char in 'acf']
[3, 1, 5]
'a' is the third character in y and 3 is the third number in x, so d['a'] returns 3.
Incidentally, this approach works the same whether y is a string or a list:
>>> x = [ 1, 2, 3, 4, 5, 6 ]
>>> y = ['c', 'g', 'a', 'd', 'f', 'h']
>>> d = dict(zip(y, x))
>>> [d[char] for char in 'acf']
[3, 1, 5]
You can use collections.defaultdict and enumerate function to achieve this
X = [ 1, 2, 3, 4, 5, 6]
Y = ["a", "f", "c", "a", "c", "f"]
from collections import defaultdict
result = defaultdict(list)
for idx, y in enumerate(Y):
    result[y].append(X[idx])
print(result)
Output
defaultdict(<class 'list'>, {'a': [1, 4], 'f': [2, 6], 'c': [3, 5]})
If X is just the indices starting from 1, you can do the following:
exp = ['a', 'c', 'f']
output = [idx+1 for idx, ch in enumerate(Y) if ch in exp]
Otherwise you can try zip (in Python 2, also itertools.izip or izip_longest):
import string
import random
a = range(15)
b = random.sample(string.ascii_lowercase, 15)
exp = random.sample(b, 3)
output = [k for k, v in zip(a, b) if v in exp]
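Closer to the question as asked, where each row of x is an (x, y, z) coordinate, the same zip filter might look like the sketch below. Plain tuples stand in for the numpy rows, and the coordinates are made up:

```python
# Made-up coordinates; in the question these would be rows of a numpy array.
x = [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 1.0, 1.0)]
y = ['a', 'b', 'f', 'c']

wanted = {'a', 'c', 'f'}  # a set makes each membership test cheap

# Keep the rows whose letter on the same line is one of the wanted letters.
selected = [row for row, letter in zip(x, y) if letter in wanted]

print(selected)  # [(0.0, 1.0, 2.0), (6.0, 7.0, 8.0), (9.0, 1.0, 1.0)]
```

With a real numpy array, a boolean mask such as x[np.isin(y, list(wanted))] should give the same selection, though that is untested here.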
