How to match a nested list with list - python

The problem is that I'm trying to compare a nested list with a list and find the values that don't match.
lst3 = [1, 6, 7, 10, 13, 28]
lst4 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
lst5 = [list(filter(lambda x: x not in lst3, sublist)) for sublist in lst4]
which returns:
[[17, 18, 21, 32], [11, 14], [5, 8, 15, 16]]
but I would like to get the numbers from lst3 that don't appear in each sublist. I would like the results to be:
[[1,6,7,10,28],[1,6,10],[1,7,13,28]]

In your example you are comparing each element in each sublist with lst3.
lst5 = [list(filter(lambda x: x not in lst3, sublist)) for sublist in lst4]
The problem is that you are asking whether each x from sublist is not in lst3, which gives you the leftover elements of the sublist. You want to do it the other way around:
lst5 = [list(filter(lambda x: x not in sublist, lst3)) for sublist in lst4]
Not only does this give you the answer you want, it also shows that you made a mistake in your expected results:
[[1, 6, 7, 10, 28], [1, 6, 10], [7, 10, 13, 28]]
Compared to yours:
[[1, 6, 7, 10, 28], [1, 6, 10], [1, 7, 13, 28]]
(See the last nested array)
Online example:
https://onlinegdb.com/Hy8K8GPSB

Rather than using things like filter and lambda, you could more readably just use a list comprehension:
lst5 = [[x for x in lst3 if not x in sublist] for sublist in lst4]
Which gives:
[[1, 6, 7, 10, 28], [1, 6, 10], [7, 10, 13, 28]]
This differs slightly from what you gave as your expected output, but I think that you made a typographical error in the third sublist of that expected output.

I would take John Coleman's answer but tweak the word order for readability.
lst5 = [[x for x in lst3 if x not in sublist] for sublist in lst4]
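Each `x not in sublist` test scans the whole sublist, so long sublists make it slow. Below is a minimal sketch that avoids this, assuming the values are hashable (like the integers here). It converts each sublist to a set once, so each membership check becomes an average O(1) lookup, and it still keeps lst3's order:

```python
lst3 = [1, 6, 7, 10, 13, 28]
lst4 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

# Build a set per sublist once, then keep the items of lst3 (in lst3's order)
# that are missing from that set.
lst5 = []
for sublist in lst4:
    seen = set(sublist)
    lst5.append([x for x in lst3 if x not in seen])

print(lst5)  # [[1, 6, 7, 10, 28], [1, 6, 10], [7, 10, 13, 28]]
```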

I have two two-dimensional lists, each with at least 100 rows. I would like to match c1 to c2 or vice versa. The real problem is that I don't want to type in each row of c1 by hand to match it against c2. Is there a faster way to loop through all the rows of c1 and match them against all the rows of c2?
I tried c1[0], c1[1] and c1[2]. This works, but I would have to do a lot of typing row by row, which is too much when there are many rows.
Here are my two two-dimensional lists:
c1 = [[2, 6, 7],[2,4,6],[3,6,8]].....
c2 = [[13, 17, 18], [7, 11, 13], [5, 6, 8]].......
[list(filter(lambda x: x in c3, sublist)) for sublist in c2]
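You can avoid indexing row by row by letting zip pair up the rows for you. Below is a minimal sketch, assuming "match" means keeping the elements of each c2 row that also appear in a c1 row. The truncated lists are shortened to their first three rows here. The first comprehension compares rows at the same position; the second compares every row of c1 with every row of c2, in case that is what you need:

```python
c1 = [[2, 6, 7], [2, 4, 6], [3, 6, 8]]
c2 = [[13, 17, 18], [7, 11, 13], [5, 6, 8]]

# zip pairs row 0 of c1 with row 0 of c2, row 1 with row 1, and so on,
# so no manual c1[0], c1[1], ... indexing is needed.
matches = [[x for x in row2 if x in row1] for row1, row2 in zip(c1, c2)]
print(matches)  # [[], [], [6, 8]]

# All-pairs variant: for each row of c1, compare it with every row of c2.
all_pairs = [[[x for x in row2 if x in row1] for row2 in c2] for row1 in c1]
print(all_pairs)  # [[[], [7], [6]], [[], [], [6]], [[], [], [6, 8]]]
```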


How to merge multiple lists by their indices

I want to merge a nested list (whose inner lists have different lengths) into one flat list. First I would like to group the elements by index, then sort each group by size.
Example:
lsts = [
[7, 23, 5, 2],
[3, 8, 1],
[99, 23, 9, 23, 74]
]
merged = [3, 7, 99, 8, 23, 23, 1, 5, 9, 2, 23, 74]
I would like to solve this without importing.
Assuming your list of lists cannot contain None values, you can do this with itertools.zip_longest:
from itertools import zip_longest
result = []
for row in zip_longest(*lsts):
    row = (x for x in row if x is not None)
    for x in sorted(row):
        result.append(x)
print(result)
Here is a one-liner:
import functools
import itertools
functools.reduce(lambda x,y: x+y, [sorted(x for x in p if x is not None) for p in itertools.zip_longest(*lsts)])
Output:
[3, 7, 99, 8, 23, 23, 1, 5, 9, 2, 23, 74]
I'll explain the solution step by step, with each step building on the result of the previous one.
To group items from each list by their index, itertools.zip_longest() is the right tool:
>>> import itertools as it
>>> MISSING = object() # a sentinel
>>> lsts = [
[7, 23, 5, 2],
[3, 8, 1],
[99, 23, 9, 23, 74]
]
>>> it.zip_longest(*lsts, fillvalue=MISSING)
>>> list(_)
[(7, 3, 99), (23, 8, 23), (5, 1, 9), (2, <object object at 0x7f529e9b4260>, 23), (<object object at 0x7f529e9b4260>, <object object at 0x7f529e9b4260>, 74)]
This groups list elements into n-tuples using the MISSING fill value where needed, because lists might not be of equal length.
The next step is to iterate over each n-tuple and sort it internally (while skipping the MISSING values). The built-in function sorted() comes in handy here:
>>> list(
sorted(x for x in ntuple if x is not MISSING)
for ntuple in it.zip_longest(*lsts, fillvalue=MISSING)
)
[[3, 7, 99], [8, 23, 23], [1, 5, 9], [2, 23], [74]]
The final step is to flatten this sequence of lists, and we'll use itertools.chain.from_iterable():
>>> list(it.chain.from_iterable(
sorted(x for x in ntuple if x is not MISSING)
for ntuple in it.zip_longest(*lsts, fillvalue=MISSING)
))
[3, 7, 99, 8, 23, 23, 1, 5, 9, 2, 23, 74]
The good thing about chain.from_iterable() is that it doesn't repeatedly concatenate smaller lists into the longer and longer final list, making it efficient. It also does this at the C level, AFAIK.
It's worth noting that None can also be used instead of the MISSING sentinel, but I used MISSING to also demonstrate how fillvalue works (e.g. you might want to use a zero instead or something else, if you wish).
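zip_longest does the work; the rest is cleaning/formatting. Note that this version does not sort each index group, so its output order differs from the requested merged list: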
In [1]: from itertools import zip_longest, chain
In [2]: lsts = [
...: [7, 23, 5, 2],
...: [3, 8, 1],
...: [99, 23, 9, 23, 74]
...: ]
In [3]: [v for v in chain.from_iterable(zip_longest(*lsts)) if v is not None]
Out[3]: [7, 3, 99, 23, 8, 23, 5, 1, 9, 2, 23, 74]
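The question asks for a solution without importing. Below is a minimal sketch of the same group-by-index-then-sort idea using only built-ins. It walks index positions up to the length of the longest inner list and skips lists that are too short, so no fill value (and no assumption about None) is needed:

```python
lsts = [
    [7, 23, 5, 2],
    [3, 8, 1],
    [99, 23, 9, 23, 74],
]

merged = []
longest = max((len(lst) for lst in lsts), default=0)
for i in range(longest):
    # Collect the i-th element of every list that is long enough, then sort that group.
    group = [lst[i] for lst in lsts if i < len(lst)]
    merged.extend(sorted(group))

print(merged)  # [3, 7, 99, 8, 23, 23, 1, 5, 9, 2, 23, 74]
```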

Getting a list of lists for all elements in list of lists greater than a certain threshold

I have a list of lists
l = [[1,2,3],[4,5,6],[7,8,9,10],[11,12,13,14,15]]
Now I want to make a new list of lists containing elements greater than 5. It should look like
k=[[],[6],[7,8,9,10],[11,12,13,14,15]]
I found a similar question here but it returns a list like j=[6, 7, 8, 9, 10, 11, 12, 13, 14, 15]. How to get a list that looks like k from l?
You can definitely use a list comprehension here:
k = [[i for i in subl if i > 5] for subl in l]
You can also use the filter function along with a list comprehension:
>>> l = [[1,2,3],[4,5,6],[7,8,9,10],[11,12,13,14,15]]
>>> [list(filter(lambda x: x > 5, i)) for i in l]
[[], [6], [7, 8, 9, 10], [11, 12, 13, 14, 15]]

Loop selecting one item per list in nested list

list = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
I would like a loop that randomly selects only ONE item from each of the 3 inner lists. For example, the loop would pick 3, then 7, then 9, and then stop instead of continuing to pick more items. I only want 3 repetitions.
I have managed to do this
(with:
for i in list:
item = list[0].pop((random.choice(list)[0])))
but instead of picking once per inner list, it goes through all of the items of the first inner list, then moves on to the second one, and so on.
Any help is appreciated!
You seem to be indexing the list at 0 in each iteration, which will only give you random values from the first inner list. Instead, call random.choice on each inner list, for example with map:
list(map(random.choice, my_list))
# [3, 8, 11]
Equivalently:
[random.choice(i) for i in my_list]
Based on the comments, if you want to remove the item you've randomly selected from the list, use instead:
[i.pop(random.randint(0, len(i) - 1)) for i in my_list]
# [4, 6, 9]
print(my_list)
# [[1, 2, 3], [5, 7, 8], [10, 11, 12]]
This is my code:
print("before random sample",pos)
pos = random.sample(pos, len(pos))
print("after random sample", pos)
test = [i.pop(random.randint(0,len(i))) for i in pos]
This is the output:
#before random sample [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
#after random sample [[6, 7, 8, 9, 10], [1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
#.......
#test = [i.pop(random.randint(0,len(i))) for i in pos]
#IndexError: pop index out of range
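The IndexError comes from random.randint(a, b), which includes both endpoints. It can therefore return len(i), which is one past the last valid index. Below is a minimal sketch of a fix that uses random.randrange(len(i)), which returns an index from 0 to len(i) - 1. The list name pos is taken from the code above; the seed only makes the demo repeatable:

```python
import random

pos = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
random.seed(0)  # only for a repeatable demo
pos = random.sample(pos, len(pos))  # shuffle the order of the inner lists

# randrange(n) picks from 0..n-1, so the pop index is always valid.
picked = [row.pop(random.randrange(len(row))) for row in pos]
print(picked)
print(pos)
```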

Python - Comparing each item of a list to every other item in that list

I need to compare every item in a very long list (12471 items) to every other item in the same list. Below is my list:
[array([3, 4, 5])
array([ 6, 8, 10])
array([ 9, 12, 15])
array([12, 16, 20])
array([15, 20, 25])
...] #12471 items long
I need to compare the second item of each array to the first item of every other array to see if they're equal. And preferably, in a very efficient way. Is there a simple and efficient way to do this in Python 2.x?
I worked up a very crude method here, but it is terribly slow:
ls=len(myList) #12471
l=ls
k=0
for i in myList:
    k+=1
    while l>0:
        l-=1
        if i[1]==myList[l][0]:
            pass  # Do stuff
    l=ls
While this is still theoretically N^2 time (worst case), it should make things a bit better:
import collections
inval = [[3, 4, 5],
[ 6, 8, 10],
[ 9, 12, 15],
[ 12, 14, 15],
[12, 16, 20],
[ 6, 6, 10],
[ 8, 8, 10],
[15, 20, 25]]
by_first = collections.defaultdict(list)
by_second = collections.defaultdict(list)
for item in inval:
    by_first[item[0]].append(item)
    by_second[item[1]].append(item)
for k, vals in by_first.items():
    if k in by_second:
        print "by first:", vals, "by second:", by_second[k]
Output of my simple, short case:
by first: [[6, 8, 10], [6, 6, 10]] by second: [[6, 6, 10]]
by first: [[8, 8, 10]] by second: [[6, 8, 10], [8, 8, 10]]
by first: [[12, 14, 15], [12, 16, 20]] by second: [[9, 12, 15]]
Though this DOES NOT handle duplicates.
We can do this in O(N) with an assumption that python dict takes O(1) time for insert and lookup.
In the first scan, we build a map from each row's first number to the indices of the rows that start with it.
In the second scan, we check whether the map contains the second element of each row. If it does, the map's value is the list of row indices that meet the required criterion.
myList = [[3, 4, 5], [ 6, 8, 10], [ 9, 12, 15], [12, 16, 20], [15, 20, 25]]
first_column = dict()
for idx, row in enumerate(myList):
    if row[0] in first_column:
        first_column[row[0]].append(idx)
    else:
        first_column[row[0]] = [idx]
for idx, row in enumerate(myList):
    if row[1] in first_column:
        print ('rows matching for element {} from row {} are {}'.format(row[1], idx, first_column[row[1]]))
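The same two-pass idea can return the matches instead of printing them. Below is a minimal sketch, assuming each row has at least two elements; the function name rows_matching is hypothetical. It uses collections.defaultdict in place of the if/else bookkeeping:

```python
from collections import defaultdict

def rows_matching(rows):
    """Map each row index to the indices of rows whose first element
    equals that row's second element (hypothetical helper)."""
    # First pass: index rows by their first element.
    by_first = defaultdict(list)
    for idx, row in enumerate(rows):
        by_first[row[0]].append(idx)
    # Second pass: look up each row's second element (average O(1) per lookup).
    # Using "in" on the defaultdict does not create new keys.
    result = {}
    for idx, row in enumerate(rows):
        if row[1] in by_first:
            result[idx] = by_first[row[1]]
    return result

my_list = [[3, 4, 5], [6, 8, 10], [9, 12, 15], [12, 16, 20], [15, 20, 25]]
print(rows_matching(my_list))  # {2: [3]}
```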

Fully enumerate range from list of breakpoints

This is a bit of a Python 101 question, but I can't think of a pythonic way to expand a list of breakpoints into all the integers between those breakpoints.
Say I have:
breaks = [4, 7, 13, 15, 18]
and I want
enumerated = [[4,5,6],[7,8,9,10,11,12],[13,14],[15,16,17],[18]]
(My actual use case involves breakpoints that are years; I want all years in each range).
I could loop through breaks with a counter, create a range for each interval and store it in a list, but I suspect there is a simple one-liner for this sort of enumeration. Efficiency is a concern since I am working with millions of records.
You can use zip
>>> enumerated = [range(start, end) for start,end in zip(breaks, breaks[1:])] + [[breaks[-1]]]
>>> enumerated
[[4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14], [15, 16, 17], [18]]
Zipping a list with itself offset by one (zip(breaks, breaks[1:])) is a well-known trick to get all consecutive pairs. This drops the last breakpoint, so I added it manually.
You can create a generator, which is memory efficient:
def f(b):
    if not b:
        return
    x = b[0]
    for y in b[1:]:
        yield xrange(x, y)
        x = y
    yield [x]
print list(f(breaks))
You can use a simple list comprehension and range()
breaks = [4, 7, 13, 15, 18]
new = [range(breaks[i],breaks[i+1]) for i in xrange(len(breaks)-1)]+[[breaks[-1]]]
print new
[[4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14], [15, 16, 17], [18]]
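The answers above are written for Python 2 (print statements, xrange, and range returning a list). In Python 3, range returns a lazy range object and xrange no longer exists. Below is a minimal Python 3 sketch of the same zip-based idea that wraps each range in list(). You can drop the list() call to keep the ranges lazy, which saves memory with millions of records:

```python
breaks = [4, 7, 13, 15, 18]

# Pair each breakpoint with the next one, expand the half-open interval,
# then append the final breakpoint on its own.
enumerated = [list(range(start, end)) for start, end in zip(breaks, breaks[1:])]
if breaks:
    enumerated.append([breaks[-1]])

print(enumerated)  # [[4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14], [15, 16, 17], [18]]
```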
