Python: List comparison to find the unique element count of lists

I am trying to implement a string comparison algorithm in Python for one of my projects. As I am new to Python, I'm learning on the go, but I'm stuck at one step of the algorithm.
At the moment I have a list of lists, sorted and grouped by length:
import itertools
mylist = list(list(i[1]) for i in itertools.groupby(sorted(mylist, key=len), len))
>>> mylist
[
 [['pat'],['cut'],['rat']],
 [['sat','pat'],['cut','pat']],
 [['rat','cut','pat'],['put','cut','bat'],['mat','gut','lit']],
 [[...]]...
]
If we lay out the elements of mylist[2] as rows, the columns line up like this:
>>> mylist[2]
[['rat','cut','pat'],
 ['put','cut','bat'],
 ['mat','gut','lit']]
I want to compare each column and return the number of distinct elements in it, i.e. at index zero it is 3 (all three are different), at index one it is 2 (since 'cut' appears twice), and at index two it is 3 again. Likewise, I need to repeat the process for all the lists in mylist.
I'm stuck here. Can somebody suggest a suitable method, perhaps a list comprehension?
Thank you.

You can use set to extract the unique elements, and zip(*list_of_lists) as a trick to "transpose" a list of lists. Try this:
lst = [
    [['pat'],['cut'],['rat']],
    [['sat','pat'],['cut','pat']],
    [['rat','cut','pat'],['put','cut','bat'],['mat','gut','lit']]
]
print(list(map(lambda ll: [len(set(l)) for l in zip(*ll)], lst)))
Output:
[[3], [2, 1], [3, 2, 3]]
Edit: To get the minimum value of each list, a trivial addition to the above will do:
print(list(map(lambda ll: min([len(set(l)) for l in zip(*ll)]), lst)))
Output:
[3, 1, 2]
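Since the question asks for a list comprehension, the same idea can be written without map and lambda. This is a sketch under assumptions: the ungrouped input `words` is hypothetical, chosen so that the question's groupby step reproduces `lst` from the answer above; the code then counts the distinct words in each column of each group.

```python
import itertools

# Hypothetical ungrouped input: each entry is a list of words.
words = [['rat', 'cut', 'pat'], ['pat'], ['sat', 'pat'], ['cut'],
         ['put', 'cut', 'bat'], ['rat'], ['cut', 'pat'], ['mat', 'gut', 'lit']]

# Group by length as in the question; sorted() is stable, so order inside a group is kept.
grouped = [list(g) for _, g in itertools.groupby(sorted(words, key=len), len)]

# zip(*group) turns rows into columns; len(set(column)) counts the distinct words in a column.
distinct_counts = [[len(set(column)) for column in zip(*group)] for group in grouped]
print(distinct_counts)  # [[3], [2, 1], [3, 2, 3]]

# Smallest distinct count per group, matching the edit in the answer.
min_counts = [min(counts) for counts in distinct_counts]
print(min_counts)  # [3, 1, 2]
```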

Related

Index error trying to deduplicate a list only using if and for

My goal is to write a program that removes duplicate elements from a list, e.g. [2,3,4,5,4,6,5] → [2,3,4,5,6], without using set (only if and for).
I got stuck at the end of my code. I have tried changing everything in the if statement, but it got me nowhere; the same error keeps appearing here:
n=int(input('enter the number of elements in your list '))
mylist=[]
for i in range (n):
    for j in range (n):
        ele=input(' ')
        if mylist[i]!=mylist[j]:  # <- the error is raised exactly here; I don't understand what the out-of-range problem has to do with this if statement
            mylist.append(ele)
print(mylist)
However, I changed nearly everything and I still got the following error:
if mylist[i]!=mylist[j]:
IndexError: list index out of range
Why does this issue keep coming back? P.S. I can't use set because I am required to use only if and for.
You can do this all with a list comprehension that checks if the value has previously appeared in the list slice that you have already looped over.
mylist = [1,2,3,3,4,5,6,4,1,2,4,3]
outlist = [x for index, x in enumerate(mylist) if x not in mylist[:index]]
print(outlist)
Output
[1, 2, 3, 4, 5, 6]
In case you're not familiar with list comprehensions yet, the above comprehension is functionally equivalent to:
outlist = []
for index, x in enumerate(mylist):
    if x not in mylist[:index]:
        outlist.append(x)
print(outlist)
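The IndexError in the question happens because mylist is still empty when the if statement first runs: with i and j both 0, mylist[0] does not exist yet, so the comparison fails before anything has been appended. Assuming the values have been read into a list first, a version restricted to for and if (no set, and no `in` membership test) could look like this sketch; dedupe_with_loops is a hypothetical name.

```python
def dedupe_with_loops(items):
    """Return items without duplicates, keeping first occurrences.
    Uses only for loops and if statements: no set() and no `in` membership test."""
    result = []
    for item in items:
        already_kept = False
        for kept in result:  # linear scan over the values kept so far
            if kept == item:
                already_kept = True
        if not already_kept:
            result.append(item)
    return result

print(dedupe_with_loops([2, 3, 4, 5, 4, 6, 5]))  # [2, 3, 4, 5, 6]
```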
A set would be more efficient, but you can also keep the seen items in a list:
>>> def dedupe(L):
...     seen = []
...     # seen.append(e) returns None, so "seen.append(e) or e" records e and evaluates to e
...     return [seen.append(e) or e for e in L if e not in seen]
...
>>> dedupe([2,3,4,5,4,6,5])
[2, 3, 4, 5, 6]
This is not just a toy solution. Because sets can only contain hashable types, you might really have reason to resort to this kind of thing. This approach performs a linear search of the seen items for every element (quadratic time overall), which is fine for small cases but slow compared to hashing.
It may still be possible to do better than this in some cases. Even if you can't hash the elements, if you can at least sort the seen elements, you can speed up the search to logarithmic time by using a binary search (see the bisect module).
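A sketch of the bisect idea, assuming the items are mutually orderable but not hashable (for example lists of numbers); dedupe_sorted_seen is a hypothetical name. The lookup is a binary search, but note that list.insert still shifts elements, so each insertion remains linear in the worst case.

```python
import bisect

def dedupe_sorted_seen(items):
    """Keep first occurrences. `seen` is kept sorted and searched with bisect.
    Assumes items can be compared with < (they need not be hashable)."""
    seen = []    # always sorted
    result = []
    for item in items:
        pos = bisect.bisect_left(seen, item)   # O(log n) search
        if pos < len(seen) and seen[pos] == item:
            continue                           # already seen
        seen.insert(pos, item)                 # keeps `seen` sorted (O(n) shift)
        result.append(item)
    return result

print(dedupe_sorted_seen([[1, 2], [0], [1, 2], [3], [0]]))  # [[1, 2], [0], [3]]
```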

Check number not a sum of 2 ints on a list

Given a list of integers, I want to check a second list and remove from the first list only those values that cannot be made as the sum of two numbers from the second. So given a = [3,19,20] and b = [1,2,17], I'd want [3,19].
It seems like a cinch with nested loops, except that I've gotten stuck with break and continue statements.
Here's what I have:
def myFunction(list_a, list_b):
    for i in list_a:
        for a in list_b:
            for b in list_b:
                if a + b == i:
                    break
            else:
                continue
            break
        else:
            continue
        list_a.remove(i)
    return list_a
I know what I need to do; the syntax just seems unnecessarily confusing. Can someone show me an easier way? Thanks in advance!
You can do it like this:
In [13]: from itertools import combinations
In [15]: [item for item in a if item in [sum(i) for i in combinations(b,2)]]
Out[15]: [3, 19]
combinations(b, 2) yields every pair of elements of b; summing each pair gives the list of possible sums, and then you just keep the items of a that are present in that list.
Edit
If you don't want to use itertools, you can write the function yourself, like this:
def comb(s):
    # yield every pair [s[i], s[j]] with i < j, like combinations(s, 2)
    for i, v1 in enumerate(s):
        for j in range(i+1, len(s)):
            yield [v1, s[j]]
result = [item for item in a if item in [sum(i) for i in comb(b)]]
Comments on code:
It's very dangerous to delete elements from a list while iterating over it. Perhaps you could append items you want to keep to a new list, and return that.
Your current algorithm is O(nm^2), where n is the size of list_a, and m is the size of list_b. This is pretty inefficient, but a good start to the problem.
There are also a lot of unnecessary continue and break statements, which make the code complicated and hard to debug.
You also put everything into one function. Consider splitting the work into separate functions, for example one that finds pairs and one that checks each item in list_a against list_b. This breaks the problem into smaller problems that you then combine to solve the bigger one.
Overall I think your function is doing too much, and the logic could be condensed into much simpler code by breaking down the problem.
Another approach:
Since I found this task interesting, I decided to try it myself. My outlined approach is illustrated below.
1. You can first check if a list has a pair of a given sum in O(n) time using hashing:
def check_pairs(lst, sums):
    lookup = set()
    for x in lst:
        current = sums - x
        if current in lookup:
            return True
        lookup.add(x)
    return False
2. Then you could use this function to check, for each number in list_a, whether any pair in list_b sums to it:
def remove_first_sum(list_a, list_b):
    new_list_a = []
    for x in list_a:
        check = check_pairs(list_b, x)
        if check:
            new_list_a.append(x)
    return new_list_a
This keeps the numbers in list_a that equal the sum of two numbers in list_b.
3. The above can also be written with a list comprehension:
def remove_first_sum(list_a, list_b):
    return [x for x in list_a if check_pairs(list_b, x)]
Both versions work as follows:
>>> remove_first_sum([3,19,20], [1,2,17])
[3, 19]
>>> remove_first_sum([3,19,20,18], [1,2,17])
[3, 19, 18]
>>> remove_first_sum([1,2,5,6],[2,3,4])
[5, 6]
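Note: checking a single number with check_pairs is O(m) time, where m is the length of list_b, so the whole function is O(nm) for n items in list_a, compared with O(nm^2) for the original nested loops. It doesn't require anything too complicated, but it uses O(m) extra auxiliary space, because a set is kept to record which items have been seen.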
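A different trade-off, not taken from the answers: compute every pair sum of list_b once into a set, which costs O(m^2) time and space, after which each item of list_a is checked in O(1) on average. This tends to pay off when list_a is long and list_b is short. keep_pair_sums is a hypothetical name, and pairs use two different positions of list_b, as combinations(b, 2) does.

```python
from itertools import combinations

def keep_pair_sums(list_a, list_b):
    """Keep the items of list_a that equal the sum of two different positions of list_b."""
    # Built once: every x + y for x, y taken from distinct positions in list_b.
    pair_sums = {x + y for x, y in combinations(list_b, 2)}
    # Set membership is O(1) on average, so this pass is linear in len(list_a).
    return [item for item in list_a if item in pair_sums]

print(keep_pair_sums([3, 19, 20], [1, 2, 17]))  # [3, 19]
print(keep_pair_sums([1, 2, 5, 6], [2, 3, 4]))  # [5, 6]
```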
You can do it by first creating all possible pair sums, then filtering out the elements that don't appear in that list of sums.
Define the input lists:
>>> a = [3,19,20]
>>> b = [1,2,17]
Next, compute the sums of all pairs of elements:
>>> y = [i+j for k,j in enumerate(b) for i in b[k+1:]]
Next, apply a function to every element of a that checks whether it is present in the list computed above. The lambda passed to map uses a conditional expression that returns None when the element is not found, so we then filter out the None values. (Note that filter(None, ...) would also drop a legitimate 0 from a.)
>>> list(filter(None, map(lambda x: x if x in y else None,a)))
The above operation will output:
[3, 19]
You can also write a one-liner by combining all these lines into one, but I don't recommend it.
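You can try something like this (note that final_data collects the elements of a that cannot be formed as a sum, i.e. the ones to remove; test `if i in n_m_s` instead to keep [3, 19]):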
a = [3,19,20]
b = [1,2,17,5]
# every sum i + j of two values from b that also appears in a (i and j may be the same element)
n_m_s = {i + j for i in b for j in b if i + j in a}
print(n_m_s)
print("after remove")
final_data = []
for i in a:
    if i not in n_m_s:
        final_data.append(i)
print(final_data)
output:
{19, 3}
after remove
[20]

How to optimize code for deleting position n from all the lists in a list of lists

I searched the site for this issue and found many posts about deleting specific values from a list of lists.
However, this doesn't answer my question.
Let's say we have:
mylist=[[1,2,3,4],[100,374,283,738]]
In my mind the two lists are linked: list 1 numbers the items (1, 2, 3, 4, ...) and list 2 holds a feature of each item (for example its price: $100, $374, etc.).
I want to delete both the number and the price of an item if its value in list 2 is higher than a certain threshold (for example, if the item is too expensive, more than $300).
I have been trying and I got this:
n=0 # counter for position in the list
for i in mylist[1]:
    if i>300:
        for j in mylist:
            del j[n]
    n=n+1
result:
[[1,3],[100,283]]
This actually works, but it doesn't look very efficient: I have to access the list several times, create extra variables, and use too many loops.
Since Python has list comprehensions, I was wondering if there is a more efficient and elegant way to get the same result.
Thanks
Use zip with a filtering generator expression:
>>> mylist = [[1,2,3,4], [100,374,283,738]]
>>> mylist[:] = list(map(list, zip(*((a,b) for a,b in zip(*mylist) if b<300))))
>>> mylist
[[1, 3], [100, 283]]
Note that the slice assignment mylist[:] = ... modifies the existing list object in place (any other reference to it sees the change), mimicking the way your code modifies the original list.
It seems like you're trying to have a mapping from the elements of mylist[0] to the elements of mylist[1]. If so, I would suggest using a dictionary. Moving your data into one, your script might look like this:
mydict = { 1: 100, 2: 374, 3: 283, 4: 738 }
mykeys = list(mydict.keys())
for key in mykeys:
    if mydict[key] > 300:
        del mydict[key]
That's a little verbose, and we have to copy the keys into a list because we can't delete from a dictionary while iterating over it. However, there is a short one-liner alternative of the type you may be looking for.
Comprehensions can also be used with dictionaries. For this example, it would look like:
mydict = {k: v for k, v in mydict.items() if v <= 300}
Edit:
There were some syntax issues in my original answer, but the corrected snippets should work.
If numpy is available for you to use, you can try the following code:
>>> import numpy as np
>>> mylist=[[1,2,3,4],[100,374,283,738]]
>>> arr = np.array(mylist)
>>> price = np.array(mylist[1])
>>> np.delete(arr, np.where(price>=300)[0], 1).tolist()
[[1, 3], [100, 283]]
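Given that you're only checking the price, a simple list comprehension over the zipped pairs works (note that it returns (number, price) pairs rather than two separate lists):
mylist=[[1,2,3,4],[100,374,283,738]]
print([item for item in zip(*mylist) if item[1] <= 300])
# [(1, 100), (3, 283)]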
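One caveat about the loop in the question: it deletes from mylist[1] while iterating over it, so the element that shifts into the deleted slot is never checked. If n is incremented on every iteration, as the result shown suggests, it happens to give the right answer for these prices, but with prices [100, 374, 375, 283] it would keep 375. The sketch below builds new lists instead of deleting in place and returns two lists like the question's; drop_expensive and its limit parameter are hypothetical names, and a price equal to the limit is kept.

```python
def drop_expensive(numbers, prices, limit=300):
    """Keep (number, price) pairs whose price is at most `limit`; return [numbers, prices]."""
    kept = [(n, p) for n, p in zip(numbers, prices) if p <= limit]
    kept_numbers = [n for n, _ in kept]
    kept_prices = [p for _, p in kept]
    return [kept_numbers, kept_prices]

print(drop_expensive([1, 2, 3, 4], [100, 374, 283, 738]))  # [[1, 3], [100, 283]]
print(drop_expensive([1, 2, 3, 4], [100, 374, 375, 283]))  # [[1, 4], [100, 283]]
```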

Understanding Python's behavior when finding the minimum in a list of lists

I have the following list of lists of values and I want to find the min value among all the values.
Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
[8.047707185696948, 7.140707521433818, 7.150610818529693],
[7.5326340018228555, 7.065307672838521, 6.862894377422498]]
I was planning to do something like:
min(min(Q))
I tried this approach on a smaller example and it works:
>>> b = [[2,2],[1,9]]
>>> min(b)
[1, 9]
>>> min(min(b))
1
But on my original list Q it returns the wrong result:
>>> min(Q)
[7.5326340018228555, 7.065307672838521, 6.862894377422498]
>>> min(min(Q))
6.862894377422498
Why is this approach wrong?
Lists are compared using their lexicographical order[1] (i.e. first elements compared, then the second, then the third and so on), so just because list_a < list_b doesn't mean that the smallest element in list_a is less than the smallest element in list_b, which is why your approach doesn't work in the general case.
For example, consider this:
>>> l1 = [3, 0]
>>> l2 = [2, 1]
>>>
>>> min(l1, l2)
[2, 1]
The reason min(l1, l2) is [2, 1] is because the first element of l1 (3) is initially compared with that of l2 (2). Now, 2 < 3, so l2 is returned as the minimum without any further comparisons. However, it is l1 that really contains the smallest number out of both lists (0) which occurs after the initial element. Therefore, taking the min of min(l1, l2) gives us the incorrect result of 1.
A good way to address this would be to find the minimum of the "flattened" list, which can be obtained with a generator:
>>> Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
... [8.047707185696948, 7.140707521433818, 7.150610818529693],
... [7.5326340018228555, 7.065307672838521, 6.862894377422498]]
>>>
>>> min(a for sub in Q for a in sub) # <--
4.129896248976861
(+1 to @Ffisegydd for posting a solution along these lines first.)
[1] From http://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types:
Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one.
Your approach didn't work because of how Python compares sequences: lexicographically, element by element.
I want to find the min value among all the values.
If you want to find the minimum of all the values, you can do something like this
print(min(map(min, Q)))
# 4.129896248976861
You can use a generator expression coupled with the min function to find the answer:
Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
[8.047707185696948, 7.140707521433818, 7.150610818529693],
[7.5326340018228555, 7.065307672838521, 6.862894377422498]]
minimum = min(i for j in Q for i in j)
print(minimum)  # 4.129896248976861
This generator expression flattens your list of lists and then simply returns the minimum value.
min(map(min,Q)) is the command you're looking for.
min(Q) returns the "minimum" list in Q, which is the list with the smallest first element (ties are broken by the following elements).
Therefore, min(min(Q)) returns the smallest element of the list with the smallest first element, which is not what you want.
You could use
min(min(x) for x in Q)
instead, which returns the smallest of the minimums of all lists in Q.
What you really want is to flatten that list and then find the minimum:
min(value for row in Q for value in row)
There are lots of answers, but the easiest way IMHO is to flatten the 'list of lists' into a single iterable using itertools.chain.from_iterable:
from itertools import chain
min(chain.from_iterable(Q))
or the shorter and just as easy to read (to me) version:
min(chain(*Q))
I think I found out why:
min applied to a list of lists compares the sublists lexicographically, starting with the first value of each sublist.
>>> b=[[3,1],[2,5]]
>>> min(b)
[2, 5]
min(Q) does not always return the list that contains the minimum of all values. That's why your approach is wrong.
Instead, find the minimum of each sublist, collect those into another list, and then take the minimum of that list.
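A short script comparing the behaviors discussed above on the question's Q, using only the built-in min and map plus itertools.chain. The last line, min(Q, key=min), is an extra not mentioned in the answers: it ranks each row by its own smallest value and returns the row that holds the overall minimum.

```python
from itertools import chain

Q = [[8.85008011807927, 4.129896248976861, 5.556804136197901],
     [8.047707185696948, 7.140707521433818, 7.150610818529693],
     [7.5326340018228555, 7.065307672838521, 6.862894377422498]]

# min(Q) compares rows lexicographically: the row with the smallest first element wins.
lexicographic_row = min(Q)  # the third row of Q

# Flattening first finds the true minimum; the three flattening styles agree.
overall_min = min(chain.from_iterable(Q))
assert overall_min == min(map(min, Q)) == min(v for row in Q for v in row)
print(overall_min)  # 4.129896248976861

# key=min ranks rows by their own minimum, returning the row that contains overall_min.
row_with_min = min(Q, key=min)  # the first row of Q
```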

Printing specific items out of a list

I'm wondering how to print specific items from a list, e.g. given:
li = [1,2,3,4]
I want to print just the 3rd and 4th items within a loop, and I have been trying to use some kind of for loop like the following:
for i in range (li(3,4)):
    print (li[i])
However, I'm getting all kinds of errors, such as:
TypeError: list indices must be integers, not tuple.
TypeError: 'list' object is not callable
I've tried changing () to [] and shuffling the words around, but nothing has worked so far.
Using slice notation you can get the sublist of items you want:
>>> li = [1,2,3,4]
>>> li[2:]
[3, 4]
Then just iterate over the sublist:
>>> for item in li[2:]:
...     print(item)
...
3
4
You should do:
for i in [2, 3]:
    print(li[i])
range(n) gives you 0, 1, 2, ..., n-1.
range(m, n) gives you m, m+1, ..., n-1.
That is why range is used to get a sequence of indices; here range(2, 4) would give exactly the indices 2 and 3.
Slicing, as the other answers show, is usually the recommended approach.
li(3,4) will try to call whatever li is with the arguments 3 and 4. As a list is not callable, this will fail. If you want to iterate over a certain list of indexes, you can just specify it like that:
for i in [2, 3]:
    print(li[i])
Note that indexes start at zero, so if you want to get the 3 and 4 you will need to access list indexes 2 and 3.
You can also slice the list and iterate over the lists instead. By doing li[2:4] you get a list containing the third and fourth element (i.e. indexes i with 2 <= i < 4). And then you can use the for loop to iterate over those elements:
for x in li[2:4]:
    print(x)
Note that iterating over a list will give you the elements directly but not the indexes.
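Since the question started from range, here is a small sketch putting the index-based loop next to the slice-based one; range(2, 4) yields exactly the indices 2 and 3. The enumerate call with start=2 is an optional extra for when you want both the original index and the item.

```python
li = [1, 2, 3, 4]

# Index-based: range(2, 4) produces 2 and 3 (the stop value is excluded).
by_index = [li[i] for i in range(2, 4)]

# Slice-based: li[2:4] is a new list holding the 3rd and 4th items.
by_slice = li[2:4]

for i in range(2, 4):
    print(li[i])

# enumerate(..., start=2) keeps the original indices alongside the items.
for i, item in enumerate(li[2:4], start=2):
    print(i, item)
```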
