Comparing values in a list to create a matching list - python

I have a list in this style:
[ [x,y,z] , [x1,y1,z1] , [...]].
My problem is that I want to make a new list with the values grouped in a certain way. As a first step I want to make a list where, for each distinct pair of x and y, I have the corresponding z values. Here is an example:
raw data:
[[1,2,5],[1,2,6],[1,2,7],[2,2,10],[2,2,11]]
processed data:
[[1,2,[5,6,7]],[2,2,[10,11]]]
In the final step I would like to have a list like this:
[[x values], [y values], [z minimum values], [z length]]
[[1,2],[2,2],[5,10],[3,2]]
First I tried to make a list with all possible combinations of x and y (there are not infinitely many in my data), but then I thought that just comparing consecutive values would be easier. However, I didn't figure it out.

If you are starting out, it would be useful to break the problem into smaller problems and tackle it bit by bit.
For the first step, the easiest approach might be to first collect the information using a dictionary.
You go through all triplets in the original list and create a dictionary where each key is a distinct (x, y) pair and each value is the list of z values for that pair.
import collections
l = [[1,2,5],[1,2,6],[1,2,7],[2,2,10],[2,2,11]]
mapping = collections.defaultdict(list)
for x, y, z in l:
    mapping[(x, y)].append(z)
# >> defaultdict(<type 'list'>, {(1, 2): [5, 6, 7], (2, 2): [10, 11]})
We are using a defaultdict with list as its default factory, so that we don't have to manually check whether a key already exists in the dictionary.
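For comparison, here is a sketch of the same grouping with a plain dict, first with the explicit key check that defaultdict saves you, then with the standard dict.setdefault method (the variable names are illustrative):

```python
raw = [[1, 2, 5], [1, 2, 6], [1, 2, 7], [2, 2, 10], [2, 2, 11]]

# Manual version: create the empty list the first time a key is seen
mapping = {}
for x, y, z in raw:
    if (x, y) not in mapping:
        mapping[(x, y)] = []
    mapping[(x, y)].append(z)

# Same result with dict.setdefault, which inserts [] only if the key is missing
mapping2 = {}
for x, y, z in raw:
    mapping2.setdefault((x, y), []).append(z)

print(mapping)  # {(1, 2): [5, 6, 7], (2, 2): [10, 11]}
```

The defaultdict version does exactly what the first loop does, without the explicit membership test.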
Now that we have a dictionary, it is easy to build the first list. We just have to go through all keys and values and create a proper list format.
intermediate_list = [[x, y, zs] for (x, y), zs in mapping.items()]
# >> [[1, 2, [5, 6, 7]], [2, 2, [10, 11]]]
In the final step we can again utilize our dictionary. The first entries in the list will be all keys from the dictionary, and then we need to append the list of minimum values and the list of lengths.
final_list = []
minimums = []
lengths = []
for (x, y), zs in mapping.items():
    final_list.append([x, y])
    minimums.append(min(zs))
    lengths.append(len(zs))
final_list.append(minimums)
final_list.append(lengths)
# >> [[1, 2], [2, 2], [5, 10], [3, 2]]
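The question also mentions comparing consecutive values. That idea maps onto itertools.groupby, which groups runs of consecutive items that share a key. This sketch assumes rows with the same (x, y) should be merged even if they are not adjacent, so it sorts by (x, y) first; without the sort, the same pair could show up in several groups.

```python
from itertools import groupby
from operator import itemgetter

raw = [[1, 2, 5], [1, 2, 6], [1, 2, 7], [2, 2, 10], [2, 2, 11]]

# groupby only merges *adjacent* rows with equal keys, so sort by (x, y) first
raw.sort(key=itemgetter(0, 1))

intermediate = []
for (x, y), group in groupby(raw, key=itemgetter(0, 1)):
    zs = [z for _, _, z in group]  # collect the z values of this run
    intermediate.append([x, y, zs])
# intermediate == [[1, 2, [5, 6, 7]], [2, 2, [10, 11]]]

# Same final layout as the dictionary-based answer
final_list = [[x, y] for x, y, _ in intermediate]
final_list.append([min(zs) for _, _, zs in intermediate])
final_list.append([len(zs) for _, _, zs in intermediate])
print(final_list)  # [[1, 2], [2, 2], [5, 10], [3, 2]]
```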

Related

Get one resulting list with the max or min by index in a nested list

Let's say I have this structure:
[[[1,2],[3,4]],[[8,9],[7,7]]]
I want to iterate the list and have this result:
[[3,2],[8,7]]
This would reduce each list of arrays at the first level, e.g. [[1,2],[3,4]], to a single array in which the maximum is selected for the first element and the minimum for the second.
I have already done it manually: I iterate over the groups, iterate again over each group, store the first value and check whether the next one is bigger or smaller, then store the result in a list and build another list from those.
I would like to find a more elegant method with list comprehensions and so on, I'm pretty sure I can use zip here to group the values in the same group but I haven't been successful so far.
You can use zip, and by unpacking the result into individual values it is pretty easy to do what you are looking for, e.g.:
>>> x = [[[1,2],[3,4]],[[8,9],[7,7]]]
>>> [[max(a), min(b)] for k in x for a, b in [zip(*k)]]
[[3, 2], [8, 7]]
An alternative way without unpacking is to have a cycling function iterable (max, min, max, min, ...) and use nested list comprehensions, e.g.:
>>> import itertools as it
>>> maxmin = it.cycle([max, min])
>>> [[next(maxmin)(a) for a in zip(*k)] for k in x]
[[3, 2], [8, 7]]
Or index into a list of functions:
>>> maxmin = [max, min]
>>> [[maxmin[i](a) for i, a in enumerate(zip(*k))] for k in x]
[[3, 2], [8, 7]]
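The list-of-functions idea generalizes: pairing each column with its reducer via zip, instead of indexing, works for any number of columns. A sketch (reduce_columns is an illustrative name); note that zip stops at the shorter input, so columns without a matching function are silently dropped:

```python
def reduce_columns(groups, funcs):
    """Apply funcs[i] to column i of every group (a list of equal-length rows)."""
    # zip(*group) turns the rows of a group into columns;
    # zip(funcs, ...) pairs each column with the function that reduces it
    return [[f(col) for f, col in zip(funcs, zip(*group))] for group in groups]

x = [[[1, 2], [3, 4]], [[8, 9], [7, 7]]]
print(reduce_columns(x, [max, min]))  # [[3, 2], [8, 7]]
```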
This will work without zip:
mylist = [[[1,2],[3,4]],[[8,9],[7,7]]]
[[max(y[0] for y in x), min(y[1] for y in x)] for x in mylist]
The main disadvantage of this is that it looks through each sub-list twice, once to find the maximum (of the first items) and once to find the minimum (of the second items).

Get n highest y values from list of coordinate pairs in Python 3

I'm aware of how to get the n highest values in a list but I want a quick function that gets the highest y values of a coordinate pair.
I know the following gives the n highest values in a list
[a[i] for i in np.argsort(a)[-n:]]
But what I really want is to feed it coordinates of the form
[[1,2], [3,4], [6,2], [6,11], [1,5]]
and get n coordinates with the highest y values from it, i.e. for n = 2
[[6,11], [1,5]]
But I can't quite get past the last step. Not having to use external libraries other than numpy would be a bonus. I tried changing the axis of argsort(list, axis=0) but that didn't work. Thanks
As an alternative, terser answer, you could use nlargest from heapq (it returns the pairs in descending order of y):
from heapq import nlargest
nlargest(n, your2DList, key=lambda x: x[-1])
The sorted function takes a keyword argument key, which is used to compute a comparison key for each value:
>>> sorted([[1,2], [3,4], [6,2], [6,11], [1,5]], key=lambda pair: pair[1])[-n:]
[[1, 5], [6, 11]]
Just sort the list using the second coordinate as the key, and print the last two items:
z = [[1,2], [3,4], [6,2], [6,11], [1,5]]
print(sorted(z,key=lambda x:x[1])[-2:])
result:
[[1, 5], [6, 11]]
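One pitfall of the slicing answers: when n is 0, [-n:] becomes [-0:], which is the same as [0:] and returns the whole list. A small sketch that guards against this (the helper names are made up for illustration), next to heapq.nlargest, which returns an empty list for n = 0 and gives its results in descending order of y:

```python
from heapq import nlargest
from operator import itemgetter

def top_n_by_y(pairs, n):
    """Return the n pairs with the largest y values, in ascending y order."""
    if n <= 0:
        return []  # sorted(...)[-0:] would return the whole list here
    return sorted(pairs, key=itemgetter(1))[-n:]

def top_n_by_y_heap(pairs, n):
    """Same selection via heapq.nlargest, in descending y order."""
    return nlargest(n, pairs, key=itemgetter(1))

coords = [[1, 2], [3, 4], [6, 2], [6, 11], [1, 5]]
print(top_n_by_y(coords, 2))       # [[1, 5], [6, 11]]
print(top_n_by_y_heap(coords, 2))  # [[6, 11], [1, 5]]
print(top_n_by_y(coords, 0))       # []
```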
Here's one NumPy-based approach with np.argsort:
import numpy as np
n = 2 # Number of entries to keep
arr = np.asarray(input_list) # Convert to array for further processing
out = arr[arr[:,1].argsort()[-n:]].tolist()
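Another NumPy-based approach, which should be faster on large inputs, uses np.argpartition (the n rows it returns are not sorted, and n must be smaller than the number of rows):
out = arr[(-arr[:,1]).argpartition(n,axis=0)[:n]].tolist()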

Calling functions on lists

I have a spectrum of wavelengths as a list and a number of other lists that I use in a formula (using tmm.tmm_core). Is there something more efficient than iterating through the wavelengths if I'm basically doing the same thing for every wavelength?
Example
def go(n, thk, theta):
    # do stuff (the real calculation uses tmm.tmm_core)
    something = ...  # placeholder for the actual result
    return something
wv = [1, 2, 3, 4]
a_vec = [3, 7, 3, 9]
b_vec = [6, 5, 9, 3]
c_vec = [0, 1, 8, 9]
theta = 0
th = [10, 1, 10]
final = []
for i in range(len(wv)):
    n = [a_vec[i], b_vec[i], c_vec[i]]
    answer = go(n, th, theta)
    final.append(answer)
In reality there are maybe 5000-10000 rows. It just seems to lag a bit when I press go, and I assume it's because of the iteration. I'm pretty new to optimizing, so I haven't used any benchmarking tools.
I think you're looking for the map function in Python!
>>> list1 = [1,2,3,4]
>>> list2 = [5,6,7,8]
>>> list(map(lambda x,y: x+y, list1, list2))
[6, 8, 10, 12]
It takes a function (in the above case, an anonymous lambda function) and one or more iterables, and calls the function on the corresponding elements of each. In Python 3, map returns a lazy iterator, so wrap it in list() when you need a list. You don't need to limit yourself to the expressive power of a lambda expression; you can also use globally defined functions, as in the case below:
>>> def go(a,b,c):
...     return a+b+c
...
>>> list(map(go, list1, list2, range(9,13)))
[15, 18, 21, 24]
You can put all of your lists into one list, e.g. C_list = [a_vec, b_vec, c_vec], use map to get the length of each list, and then use a list comprehension to build final (stopping at the shortest length, so no index goes out of range):
shortest = min(map(len, C_list))
final = [go([a_vec[i], b_vec[i], c_vec[i]], th, theta) for i in range(shortest)]
Also, if a_vec, b_vec and c_vec have the same length, you can use the zip function to pair them up and avoid the repeated indexing:
final = [go(list(n), th, theta) for n in zip(a_vec, b_vec, c_vec)]
If you have to perform an operation on every item in the list, then you have to go through every item in the list. However, you could gain some speed by using list comprehensions.
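A self-contained sketch for checking where the time goes. The body of go() below is a made-up stand-in (the real one calls tmm), and the sample data is taken from the question. The loop, the zip-based comprehension and map produce the same results; timing them with timeit shows how much the looping style matters compared with the work done inside go(). With a real tmm calculation, the per-call cost of go() is likely to dominate, so these rewrites may only shave off loop overhead.

```python
import timeit
from functools import partial

def go(n, thk, theta):
    # Hypothetical stand-in for the real tmm-based calculation
    return sum(layer * t for layer, t in zip(n, thk)) + theta

a_vec = [3, 7, 3, 9]
b_vec = [6, 5, 9, 3]
c_vec = [0, 1, 8, 9]
theta = 0
th = [10, 1, 10]

def with_loop():
    final = []
    for i in range(len(a_vec)):
        final.append(go([a_vec[i], b_vec[i], c_vec[i]], th, theta))
    return final

def with_comprehension():
    return [go(list(n), th, theta) for n in zip(a_vec, b_vec, c_vec)]

def with_map():
    # partial() fixes the keyword arguments; map() supplies one row per call
    rows = map(list, zip(a_vec, b_vec, c_vec))
    return list(map(partial(go, thk=th, theta=theta), rows))

assert with_loop() == with_comprehension() == with_map()

# Compare the three styles; the absolute numbers depend on the machine
for f in (with_loop, with_comprehension, with_map):
    print(f.__name__, timeit.timeit(f, number=1000))
```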

How to pick the smallest value in a list when an iterative process is applied to the list

I need to find the smallest value in a series of lists. I understand the code for the smallest value in just one list:
>>> x = [1, 2, 3, 4, 5]
>>> print (min(x))
1
Simple enough. However, I would like to know if there is a way to write code that finds the smallest value for each list I have without stopping and adjusting the code (whether by an iterative process or some other means). Any help would be greatly appreciated. Thanks in advance!
First, make a list of lists out of your separate lists. For example, if you have lists A, B and C, the list of lists would be [A,B,C].
Now, to get a list of all the minimum values for each list in a list of lists lst:
[min(x) for x in lst]
To get the global minimum:
min(x for sublist in lst for x in sublist)
Demo:
>>> lst
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> [min(x) for x in lst]
[1, 4, 7]
>>> min(x for sublist in lst for x in sublist)
1
Edit in reaction to OP's comment:
I just want the minimum value for each list. I don't need to compile all of the minimum values together into a new list
If you just want to print the minimum values for each list A, B, C, ... , you can do:
for lst in (A,B,C): # put as many lists as you like into the parentheses
    print(min(lst))
Edit in reaction to this comment:
I am only accessing one list (in this case, the values are well depths) at a time. [..] What I would like to know is how to write a code that finds the smallest value in a list given that the iterator essentially changes the values in said list.
Just print(min(lst)) each time lst has changed.
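A minimal sketch of that pattern: update_depths is a hypothetical stand-in for whatever process changes the well depths, and min() is simply re-evaluated after every update.

```python
def update_depths(depths, step):
    # Hypothetical update rule standing in for the real iterative process
    return [d - step for d in depths]

depths = [12.5, 9.0, 15.25]
minima = []
for step in range(3):
    depths = update_depths(depths, step)
    minima.append(min(depths))  # re-evaluate the minimum after each change
    print(step, min(depths))
```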
Assuming you've got a list of lists, this should do it:
minimums = [min(l) for l in lists]

How do I subtract one list from another?

I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]
Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)
    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y
Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.
If duplicates and ordering matter:
a = [1,2,3,3,3,3,4]
b = [1,3]
[i for i in a if i not in b or b.remove(i)]  # note: removes the matched items from b as a side effect
result: [2, 3, 3, 3, 4]
That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])
For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {frozenset(item.items()) for item in y}  # frozenset ignores key order; tuple(item.items()) would not
[item for item in x if frozenset(item.items()) not in ys]
If your types are a bit more complicated (e.g., you're often dealing with JSON-compatible values, which are either hashable scalars or lists or dicts whose values are recursively of the same kind), you can still use this solution by converting them recursively. But some types just can't be converted into anything hashable.
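A sketch of the recursive conversion for JSON-compatible data, assuming values are only dicts, lists and hashable scalars (str, int, float, bool, None); freeze and subtract are illustrative names. Dicts become frozensets of (key, frozen value) pairs, so key order does not matter, and lists become tuples, so element order does:

```python
def freeze(value):
    """Recursively convert JSON-compatible data into a hashable equivalent."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value  # str, int, float, bool and None are already hashable

def subtract(xs, ys):
    ys_frozen = {freeze(item) for item in ys}  # built once: O(len(ys))
    return [item for item in xs if freeze(item) not in ys_frozen]

x = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}, {"id": 3, "meta": {"k": 1}}]
y = [{"tags": ["a", "b"], "id": 1}, {"id": 3, "meta": {"k": 1}}]
print(subtract(x, y))  # [{'id': 2, 'tags': []}]
```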
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect
ys = sorted(y)
def bisect_contains(seq, item):
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if not bisect_contains(ys, item)]
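For example, lists are unhashable but compare element by element, so they fit this case. A self-contained sketch (bisect_contains is redefined here using bisect.bisect_left; subtract_sorted is an illustrative name):

```python
import bisect

def bisect_contains(seq, item):
    """True if item is in the sorted list seq, using binary search."""
    index = bisect.bisect_left(seq, item)  # first position where item could go
    return index < len(seq) and seq[index] == item

def subtract_sorted(x, y):
    ys = sorted(y)  # O(M log M), done once
    return [item for item in x if not bisect_contains(ys, item)]

# Lists can't go into a set, but they can be sorted and compared
x = [[1, 2], [3, 4], [5, 6], [3, 4]]
y = [[3, 4], [7, 8]]
print(subtract_sorted(x, y))  # [[1, 2], [5, 6]]
```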
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]
The other solutions have one of a few problems:
1. They don't preserve order, or
2. They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
3. They do O(m * n) work, where an optimal solution can do O(m + n) work
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter
x = [1,2,3,4,3,2,1]
y = [1,2,2]
remaining = Counter(y)
out = []
for val in x:
if remaining[val]:
remaining[val] -= 1
else:
out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
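The loop above, wrapped into a function with a flag for the remove-the-last-copies variant just described (remove_counted and from_end are illustrative names):

```python
from collections import Counter

def remove_counted(x, y, from_end=False):
    """Remove each value of y from x as many times as it occurs in y."""
    remaining = Counter(y)
    out = []
    for val in (reversed(x) if from_end else x):
        if remaining[val]:       # still copies of val left to remove
            remaining[val] -= 1
        else:
            out.append(val)
    if from_end:
        out.reverse()  # restore the original order of x
    return out

x = [1, 2, 3, 4, 3, 2, 1]
y = [1, 2, 2]
print(remove_counted(x, y))                 # [3, 4, 3, 1]
print(remove_counted(x, y, from_end=True))  # [1, 3, 4, 3]
```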
Constructing the Counter is O(n) in y's length n, iterating x is O(m) in x's length m, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(m), but over many appends the cost averages out to O(1), since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).
You can also determine whether any elements of y were not removed from x by testing:
remaining = +remaining # Removes all keys with zero counts from Counter
if remaining:
    # remaining contained elements with non-zero counts
    print("Not found in x:", list(remaining.elements()))
Looking up values in sets is faster than looking them up in lists, but build the set once, outside the comprehension; otherwise set(y) is rebuilt for every element of x:
ys = set(y)
[item for item in x if item not in ys]
This scales better than:
[item for item in x if item not in y]
Both preserve the order of x.
We can use set methods as well to find the difference between two lists (the order of the result is not guaranteed):
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]
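Try this. It removes each item of b once, but it only works if the items of b appear in a in the same relative order; otherwise a.index raises ValueError.
def subtract_lists(a, b):
    """ Subtracts two lists. Throws ValueError if b contains items not in a """
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]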
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
>>>
The answer provided by @aaronasterling looks good; however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). Thus, the code below is more consistent with Python's list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)
    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y
from collections import Counter
y = Counter(y)
x = Counter(x)
print(list((x - y).elements()))
Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once (xs - ys == {2, 4}):
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences (xs - ys == [2, 4, 2]):
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only¹ ys into a set for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove the same number of occurrences (xs - ys == [2, 4, 2, 1]):
from collections import Counter
def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
¹ Converting xs to a set and taking the set difference is unnecessary (and slower, as well as order-destroying), since we only need to iterate once over xs.
This example subtracts two lists:
# List of pairs of points
pairs = []
pairs.append([(602, 336), (624, 365)])
pairs.append([(635, 336), (654, 365)])
pairs.append([(642, 342), (648, 358)])
pairs.append([(644, 344), (646, 356)])
pairs.append([(653, 337), (671, 365)])
pairs.append([(728, 13), (739, 32)])
pairs.append([(756, 59), (767, 79)])
items_to_remove = []
items_to_remove.append([(642, 342), (648, 358)])
items_to_remove.append([(644, 344), (646, 356)])
print("Initial List Size: ", len(pairs))
# Build a new list instead of removing items from the list being iterated over
pairs = [pair for pair in pairs if pair not in items_to_remove]
print("Final List Size: ", len(pairs))
list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This modifies list2 in place. If you want to keep list2 intact, run this code on a copy of it (e.g. list2[:]) instead.
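A sketch of the copying approach wrapped in a function (remove_each is an illustrative name), which leaves the caller's list untouched and returns the items that could not be removed instead of printing them:

```python
def remove_each(source, to_remove):
    """Remove one occurrence of each item in to_remove, without mutating source."""
    result = list(source)  # shallow copy; the caller's list is left intact
    missing = []
    for e in to_remove:
        try:
            result.remove(e)   # removes the first matching occurrence
        except ValueError:
            missing.append(e)  # e was not (or no longer) present
    return result, missing

list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
result, missing = remove_each(list2, list1)
print(result)   # ['a', 'a', 'c', 'd', 'e', 'f']
print(missing)  # ['k']
```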
def listsubtraction(parent, child):
    answer = []
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer
I think this should work. I am a beginner, so pardon me for any mistakes.
