Getting unique values in Python using the list comprehension technique

I want to get the values that appear in one of the lists but not in the other. I even tried using '<>', but it says invalid syntax (Python 3 only accepts != for inequality). I am trying to do this with list comprehensions.
come_list = []
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
come_list = [a for a in a1 for b in b1 if a != b ]
Output:
[1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
My expected output would be [3, 5, 6]

What you want is called the symmetric difference. You can do:
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
set(a1).symmetric_difference(b1)
# {3, 5, 6}
which you can also write as:
set(a1) ^ set(b1)
If you really want a list in the end, just convert it:
list(set(a1) ^ set(b1))
# [3, 5, 6]
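The iteration order of a set is an implementation detail, so the list above is not guaranteed to come out in ascending order for arbitrary data. A minimal sketch, assuming you want the result sorted: wrap the set in sorted(), which accepts any iterable and always returns a list.

```python
a1 = [1, 2, 3, 4, 5]
b1 = [6, 4, 2, 1]

# sorted() returns a new ascending list, so the output no longer
# depends on the order in which the set happens to yield its items.
sym_diff = sorted(set(a1) ^ set(b1))
print(sym_diff)  # [3, 5, 6]
```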
If you really want to do that using list comprehensions, well, here it is, but it's really not the right thing to do here. Starting from the same lists:
a1 = [1,2,3,4,5]
b1 = [6,4,2,1]
A totally inefficient version:
# Don't do that !
sym_diff = [x for x in a1+b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
It would be a bit better using sets to test membership efficiently:
# Don't do that either
a1 = set([1,2,3,4,5])
b1 = set([6,4,2,1])
sym_diff = [x for x in a1|b1 if x in a1 and x not in b1 or x in b1 and x not in a1]
print(sym_diff)
# [3, 5, 6]
But if you start using sets, which is the right thing to do here, use them properly all the way and call symmetric_difference.
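If you do need a list comprehension, for example to keep the items in their original order (values from a1 first, then from b1), a sketch that still uses sets only for fast membership tests could look like this. The names sa and sb are illustrative. It assumes hashable values; unlike symmetric_difference, duplicates within one list are kept.

```python
a1 = [1, 2, 3, 4, 5]
b1 = [6, 4, 2, 1]

# Build each set once so every membership test is O(1) on average.
sa, sb = set(a1), set(b1)

# Items of a1 missing from b1, followed by items of b1 missing from a1,
# each in the order they appear in their original list.
sym_diff = [x for x in a1 if x not in sb] + [x for x in b1 if x not in sa]
print(sym_diff)  # [3, 5, 6]
```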

You can do
come_list = [i for i in list(set(a1) - set(b1)) + list(set(b1) - set(a1))]
print(come_list)
Output
[3, 5, 6]
This new list contains the numbers that appear in exactly one of the two lists.
The problem with the line come_list = [a for a in a1 for b in b1 if a != b] is that it pairs every item of the first list with every item of the second list and keeps a whenever the two differ, so each a is emitted once per non-matching b. It never checks whether a is missing from b1 as a whole, and it never looks at the items of b1 that are missing from a1.
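One way to repair the original idea while keeping the != comparison is to require that a differ from every element of b1 (using all()), and then do the same in the other direction. This is a sketch of the logic only: it is still O(len(a1) * len(b1)), so sets remain the better tool.

```python
a1 = [1, 2, 3, 4, 5]
b1 = [6, 4, 2, 1]

# Keep a only if it differs from ALL elements of b1 (i.e. a is not in b1),
# then keep b only if it differs from ALL elements of a1.
come_list = (
    [a for a in a1 if all(a != b for b in b1)]
    + [b for b in b1 if all(b != a for a in a1)]
)
print(come_list)  # [3, 5, 6]
```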

Related

Merging lists horizontally into a new list

Python newbie here struggling with numpy.
I have 18 lists
L1 = [1,2,3,4]
L2 = [5,6,7,8]
L3 = [5,7,8,5]
..........
......
L18 = [6,4,7,8]
I want to merge them into a new list (let's say L_ALL) so that a single row holds all the lists (1 row, 18 columns, each column containing one list).
I have tried
L_ALL = [L1,L2,....L18]
but this merges them adding new rows, so I end up with a list with 18 rows.
Things like hstack, np.concatenate and sum do not help because they do something like:
L_ALL = [1,2,3,4,5,6,7,8,5,7....]
and I need the lists as separate lists in different columns (same row), not a single list (column) with all the elements.
Does this make sense?
Thanks in advance
import pandas as pd
L1 = [1,2,3,4]
L2 = [5,6,7,8]
L3 = [5,7,8,5]
l_all = [[L1,L2,L3]]
df = pd.DataFrame(l_all)
This creates a single row with one column per list.
Now if you do
df.values[0]
you get
array([list([1, 2, 3, 4]), list([5, 6, 7, 8]), list([5, 7, 8, 5])],
dtype=object)
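If you only need the 1 x N container and not a DataFrame, NumPy can hold it directly in an object array. A sketch, assuming NumPy is available: np.array([L1, L2, L3]) would instead build a 3 x 4 integer matrix because the lists have equal length, so here the object array is created empty first and then filled one cell at a time.

```python
import numpy as np

L1 = [1, 2, 3, 4]
L2 = [5, 6, 7, 8]
L3 = [5, 7, 8, 5]
lists = [L1, L2, L3]  # extend with the remaining lists as needed

# One row, one column per list; dtype=object lets each cell hold a list.
L_ALL = np.empty((1, len(lists)), dtype=object)
for col, lst in enumerate(lists):
    # A full (row, col) index addresses a single cell, so the list is stored as-is.
    L_ALL[0, col] = lst

print(L_ALL.shape)  # (1, 3)
print(L_ALL[0, 1])  # [5, 6, 7, 8]
```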
You can try it this way:
L1 = [1,2,3,4]
L2 = [5,6,7,8]
L3 = [5,7,8,5]
..........
......
L18 = [6,4,7,8]
finalList = []  # start empty; [''] would add a stray empty string at the front
finalList.append(L1)
finalList.append(L2)
.
.
.
finalList.append(L18)
print(" ".join(map(str, finalList)))
You will get output as follows
[1, 2, 3, 4] [5, 6, 7, 8] [5, 7, 8, 5] ... [6, 4, 7, 8]

Numpy array insert every second element from second array

I have two arrays (the first one element longer than the second) and want to combine them so that element 0 and every odd-indexed element come from the first array, and every even-indexed element after 0 comes from the second array, keeping the original order.
E.g.:
a = ([0,1,3,5])
b = ([2,4,6])
c = ([0,1,2,3,4,5,6])
I tried something using modulo to identify the odd indices:
import numpy as np
a = ([0,1,3,5])
b = ([2,4,6])
c = a
i = 0
j = 2
l = 0
for i in range(1,22):
    k = (i+j) % 2
    if k > 0:
        c = np.insert(c, i, b[l])
        l+=1
    else:
        continue
I guess there is some easier/faster slicing option, but I can't figure it out.
np.insert would work well:
>>> A = np.array([1, 3, 5, 7])
>>> B = np.array([2, 4, 6, 8])
>>> np.insert(B, np.arange(len(A)), A)
array([1, 2, 3, 4, 5, 6, 7, 8])
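Applied to the arrays from the question, where a has one more element than b and a[0] and a[1] stay in front, the same np.insert call just needs different positions: each b[k] goes before a[k + 2], and the last one lands at index len(a), i.e. at the end. A sketch, assuming len(b) == len(a) - 1 as in the example:

```python
import numpy as np

a = np.array([0, 1, 3, 5])
b = np.array([2, 4, 6])

# Positions refer to the ORIGINAL array a: b[0] goes before a[2],
# b[1] before a[3], and b[2] at index len(a), which appends it.
# Assumes len(b) == len(a) - 1, as in the question's example.
positions = np.arange(2, len(a) + 1)
c = np.insert(a, positions, b)
print(c)  # [0 1 2 3 4 5 6]
```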
Alternatively, in plain Python you can interleave with zip (this does not depend on the values being sorted either):
>>> A = np.array([5, 3, 1])
>>> B = np.array([1, 2, 3])
>>> C = []
>>> for element in zip(A, B):
...     C.extend(element)
...
>>> C
[5, 1, 3, 2, 1, 3]
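The same zip-based interleaving can be written without an explicit loop by flattening the pairs with itertools.chain.from_iterable from the standard library. A sketch using plain lists; like zip, it stops at the end of the shorter input.

```python
from itertools import chain

A = [5, 3, 1]
B = [1, 2, 3]

# zip yields (A[0], B[0]), (A[1], B[1]), ...; chain.from_iterable flattens them.
C = list(chain.from_iterable(zip(A, B)))
print(C)  # [5, 1, 3, 2, 1, 3]
```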
Read the documentation of range:
for i in range(0,10,2):
    print(i)
will print 0, 2, 4, 6 and 8, one per line.
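The start/stop/step idea behind range also drives slice notation, which is what the slicing answers in this thread rely on: c[1::2] takes every second element starting at index 1. A small illustration on the question's target list:

```python
steps = list(range(0, 10, 2))
print(steps)  # [0, 2, 4, 6, 8]

c = [0, 1, 2, 3, 4, 5, 6]
# start=1, no stop, step=2 -> indices 1, 3, 5
odd = c[1::2]
# start=2, no stop, step=2 -> indices 2, 4, 6
even_after_first = c[2::2]
print(odd, even_after_first)  # [1, 3, 5] [2, 4, 6]
```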
From what I understand, the first element of a always comes first and the rest are just interleaved. If that is the case, then some clever use of stacking and reshaping is probably enough.
import numpy as np
a = np.array([0,1,3,5])
b = np.array([2,4,6])
c = np.hstack([a[:1], np.vstack([a[1:], b]).T.reshape((-1, ))])
You could try something like this:
import numpy as np
A = [0,1,3,5]
B = [2,4,6]
lst = np.zeros(len(A)+len(B), dtype=int)  # dtype=int keeps integers instead of the default floats
lst[0] = A[0]
lst[1::2] = A[1:]
lst[2::2] = B
Even though I don't understand why you would make it so complicated.

Giving indices to list entries

I have a Python list looking like this:
A1 = ['a','a','a','foo','c','d','a','e','bar','bar','bar','e','d','d']
I want to transform it into this...
A2 = [1,1,1,2,3,4,1,5,6,6,6,5,4,4]
...where entries in A1 are taken in order and given an incremental index in A2.
Is there a straightforward way to do this in Python?
index_map = {}
result = []
i = 1  # first id; use 0 for zero-based ids
for value in A1:
    if value not in index_map:
        index_map[value] = i
        i = i + 1
    result.append(index_map[value])
One way of doing it:
>>> A1 = ['a','a','a','foo','c','d','a','e','bar','bar','bar','e','d','d']
>>> ref = []
>>> for i in A1:
...     if i not in ref:
...         ref.append(i)
...
>>> [ref.index(i)+1 for i in A1]
[1, 1, 1, 2, 3, 4, 1, 5, 6, 6, 6, 5, 4, 4]
Logic
We remove the duplicate values from the original list (while preserving order). Then, for each item of the original list, we look up its index in the de-duplicated list.
Advantages
Simple concepts/ Beginner level
Very straightforward.
Disadvantages
Slow, as it is O(n²)
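The O(n²) cost comes from ref.index(i) scanning a list for every item. A sketch of the same first-appearance numbering in a single pass uses a dict as the lookup table: dict.setdefault returns the stored id if the key already exists, and otherwise stores and returns len(ids) + 1, which is evaluated before the new key is added.

```python
A1 = ['a', 'a', 'a', 'foo', 'c', 'd', 'a', 'e', 'bar', 'bar', 'bar', 'e', 'd', 'd']

ids = {}
# setdefault evaluates len(ids) + 1 before inserting, so a new value
# gets the next id, while a value seen before keeps its existing id.
A2 = [ids.setdefault(x, len(ids) + 1) for x in A1]
print(A2)  # [1, 1, 1, 2, 3, 4, 1, 5, 6, 6, 6, 5, 4, 4]
```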
Use collections.defaultdict and itertools.count to create a dictionary that produces unique ids on demand for each new key (in Python 3 the method is __next__; Python 2 called it next):
>>> import collections, itertools
>>> unique_ids = collections.defaultdict(itertools.count(start=1).__next__)
>>> [unique_ids[item] for item in A1]
[1, 1, 1, 2, 3, 4, 1, 5, 6, 6, 6, 5, 4, 4]
Though similar to Bhargav Rao's answer, this may be faster for longer lists (especially with many unique elements), since it uses hashing.
A1 = ['a','a','a','foo','c','d','a','e','bar','bar','bar','e','d','d']
uniqueEntries = 0
ref = {}
A2 = []
for x in A1:
    if x not in ref:
        uniqueEntries += 1
        ref[x] = uniqueEntries
    A2.append(ref[x])
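This could hurt your eyes, but it works if any consistent numbering will do; just use sets:
[list(set(your_list)).index(x) for x in your_list]
Note that set order is arbitrary, so the ids will not follow the order of first appearance (and they start at 0). To get the numbering from the question, use dict.fromkeys, which keeps insertion order (Python 3.7+):
[list(dict.fromkeys(your_list)).index(x) + 1 for x in your_list]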

Multidimensional list match in python

This has caused me a serious headache today.
Suppose I have two instances of my object, instance A and instance B. These come with properties in the form of lists. Say the two properties for A are
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
and those for B:
b1 = [5, 7, 3, 1]
b2 = [50, 20, 30, 20]
What I want is simply to find the indices i in b1 and b2 where the pair (b1[i], b2[i]) equals some pair of corresponding values in a1 and a2. In this example, these would be indices 0 and 2, since for those we have
b1[0] = 5 and b2[0] = 50
which we find in a1 and a2 as the last entries. Same for index 2 for which we find (3, 30) in (b1, b2) which is also in (a1, a2).
Note that a1 and a2 always have the same length, as do b1 and b2.
Any help? 😊
You can use a combination of zip, set and enumerate:
>>> a1 = [1, 2, 3, 4, 5]
>>> a2 = [10, 20, 30, 40, 50]
>>> b1 = [5, 7, 3, 1]
>>> b2 = [50, 20, 30, 20]
>>> a12 = set(zip(a1, a2))
>>> [i for i, e in enumerate(zip(b1, b2)) if e in a12]
[0, 2]
With zip, you group the pairs together, and with set you turn them into a set, since order does not matter and sets have faster lookup. Then enumerate gives you pairs of indices and elements, and the list comprehension keeps the indices of those pairs from zip(b1, b2) that are in a12.
I think another structure would be better?
A list of tuples, or a set of keys...
a = [(1,10),(2,20)] and so on
Edit:
well... tobias_k shows you how :)
Try this
In [38]: [k for i in zip(a1,a2) for k, j in enumerate(zip(b1,b2)) if i==j]
Out[38]: [2, 0]
You can also check, for each pair in (a1, a2), where it occurs in (b1, b2). This returns all matches as a list per pair, so it also handles duplicates:
a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1, 5]
b2 = [50, 20, 30, 20, 50]
# Construct list of tuples for easier matching
pair_a = [(i, k) for i, k in zip(a1, a2)]
pair_b = [(i, k) for i, k in zip(b1, b2)]
# Get matching indices (for each entry in pair_a get the indices in pair_b)
indices = [[i for i, j in enumerate(pair_b) if j == k] for k in pair_a]
gives
[[], [], [2], [], [0, 4]]
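For long lists, the nested comprehension above compares every pair in pair_a with every pair in pair_b. A sketch that produces the same nested result with one pass over each list first groups the indices of the b pairs in a dict (collections.defaultdict from the standard library); the name positions is illustrative.

```python
from collections import defaultdict

a1 = [1, 2, 3, 4, 5]
a2 = [10, 20, 30, 40, 50]
b1 = [5, 7, 3, 1, 5]
b2 = [50, 20, 30, 20, 50]

# Map each (b1[i], b2[i]) pair to every index i where it occurs.
positions = defaultdict(list)
for i, pair in enumerate(zip(b1, b2)):
    positions[pair].append(i)

# .get does not trigger the default factory, so pairs of a that never
# occur in b just yield [] without being added to the dict.
indices = [positions.get(pair, []) for pair in zip(a1, a2)]
print(indices)  # [[], [], [2], [], [0, 4]]
```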

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
That is, it groups a list (in Python terms) according to the values of another list, with the groups ordered by the factor levels.
Is there anything handy like that in Python, apart from the itertools.groupby approach?
From your example, it looks like each element in b gives the 1-indexed list in which the corresponding element of a will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can use zero-indexed labels and you only need two lists (i.e., for your R use case 1 and 2 are the only values; in Python they'll be 0 and 1), then you can use itertools.compress:
import itertools
def split(x, f):
    # elements labelled 0 (falsy) first, then those labelled 1
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
Edit: warning, this is a groupby solution, which is not what the OP asked for, but it may be of use to someone looking for a less specific way to do an R-style split in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
b = list(itertools.islice(itertools.cycle((1,2)), len(a)))  # 1, 2, 1, 2, ... with len(a) items
{k: list(zip(*g))[1] for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.
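The sorted() call is essential here: itertools.groupby only merges consecutive items with equal keys, so on unsorted pairs it starts a new group every time the factor changes. A quick check of the group keys with and without sorting (the helper name first is illustrative):

```python
import itertools

a = range(1, 11)
b = [1, 2] * 5
pairs = list(zip(b, a))

def first(pair):
    # Group key: the factor label in position 0 of each (label, value) pair.
    return pair[0]

# Unsorted: a new group starts every time the key changes.
unsorted_keys = [k for k, _ in itertools.groupby(pairs, first)]
# Sorted: equal keys are adjacent, so each key forms exactly one group.
sorted_keys = [k for k, _ in itertools.groupby(sorted(pairs), first)]

print(unsorted_keys)  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(sorted_keys)    # [1, 2]
```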
As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
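One difference from R: split() orders the groups by factor level (sorted values for a default factor), while the defaultdict above keeps keys in order of first appearance. If you want R-like ordering, a sketch is to sort the items at the end and return a plain dict, which keeps insertion order from Python 3.7 on:

```python
from collections import defaultdict

def split(x, f):
    # Group values of x by the matching label in f.
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    # Sort by label to mimic R's level order; returns a plain dict.
    return dict(sorted(res.items()))

groups = split([1, 2, 3, 4, 5, 6], ['y', 'x', 'y', 'x', 'y', 'x'])
print(groups)  # {'x': [2, 4, 6], 'y': [1, 3, 5]}
```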
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To generalize this, you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]
