I want to get the intersection of two lists without eliminating duplicates.
Ideally the method should be fast and should not use explicit loops.
Below is my attempt, but it fails because duplicates are removed.
a = ['a','b','c','f']
b = ['a','b','b','o','k']
tmp = list(set(a) & set(b))
>>> tmp
['b', 'a']
I want the result to be ['a', 'b', 'b'].
Here, a is a fixed list and b is a variable list; the idea is to extract the values of a from b.
Is there a way to get the common values as a list without removing duplicates?
A solution could be
good = set(a)
result = [x for x in b if x in good]
There are two loops here: one is the set-building loop inside set() (implemented in C, and typically much faster than anything you can write in Python); the other is the comprehension, which runs in the interpreter.
The first loop is there to avoid a linear search in a for each element of b (if a becomes big, this can be a serious problem).
Note that using filter instead is probably not going to gain much (if anything): although the filter loop runs in C, for each element it has to go back to the interpreter to call the filtering function.
Note also that if you care about speed, Python is probably not the best choice. PyPy, for example, may do better here, and in that case just writing an optimal algorithm explicitly should be fine (e.g. avoiding a new lookup in a for every element of a run of consecutive duplicates in b, as happens in your example):
good = set(a)
res = []
i = 0
while i < len(b):
    x = b[i]
    if x in good:
        # append the whole run of consecutive copies of x after a single set lookup
        while i < len(b) and b[i] == x:
            res.append(x)
            i += 1
    else:
        i += 1
Of course, in performance optimization the only reliable approach is to try and measure with real data on the real system; guessing works less and less as technology advances and becomes more complicated.
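A minimal measurement harness using the standard-library timeit module, comparing the set-based comprehension, filter with a lambda, and a plain list lookup. The function names are hypothetical, and the toy lists should be replaced with your real data: timings on five-element lists say little about large inputs.

```python
import timeit

# Toy data from the question; substitute your real lists before trusting any numbers.
a = ['a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k']


def with_set_comprehension(a, b):
    good = set(a)                      # one C-level loop to build the set
    return [x for x in b if x in good]


def with_filter(a, b):
    good = set(a)
    return list(filter(lambda x: x in good, b))


def with_list_lookup(a, b):
    return [x for x in b if x in a]    # linear search in a for every element of b


if __name__ == "__main__":
    for func in (with_set_comprehension, with_filter, with_list_lookup):
        seconds = timeit.timeit(lambda: func(a, b), number=10_000)
        print(f"{func.__name__}: {seconds:.4f} s")
```

All three return the same list; only the measured times differ.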
If you insist on not using for explicitly, this will work:
>>> list(filter(a.__contains__, b))
['a', 'b', 'b']
But calling magic methods like __contains__ directly is, to the best of my knowledge, not recommended practice, so consider this instead:
>>> list(filter(lambda x: x in a, b))
['a', 'b', 'b']
And if you want to improve each lookup in a from O(n) to O(1) on average, create a set from it first:
>>> a_set = set(a)
>>> list(filter(lambda x: x in a_set, b))
['a', 'b', 'b']
>>> a = ['a','b','c','f']
>>> b = ['a','b','b','o','k']
>>> items = set(a)
>>> found = [i for i in b if i in items]
>>> items
{'f', 'a', 'c', 'b'}
>>> found
['a', 'b', 'b']
This should do what you need.
I doubt it is faster than a loop, and in the end you probably still need a loop to extract the result. Anyway...
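from collections import Counter
a = ['a','a','b','c','f']
b = ['a','b','b','o','k']
count_b = Counter(b)
# full counts of the elements that occur only in b (not just one copy of each)
count_only_b = Counter({x: count_b[x] for x in set(b) - set(a)})
res = count_b - count_only_b
res
#=> Counter({'b': 2, 'a': 1})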
That is, if res holds the resulting Counter, you still need to expand it into a list:
[val for sublist in [[s] * n for s, n in res.items()] for val in sublist]
#=> ['a', 'b', 'b']
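The expansion step can also be done with Counter.elements(), which repeats each key as many times as its count. A small sketch (the Counter is rebuilt here with a comprehension so the block stands alone); it also shows, for contrast, that Counter's & operator keeps the minimum count of each element, which is the usual multiset intersection and not what the question asks for.

```python
from collections import Counter

a = ['a', 'a', 'b', 'c', 'f']
b = ['a', 'b', 'b', 'o', 'k']

good = set(a)
res = Counter(x for x in b if x in good)   # multiplicities as they occur in b

# elements() yields each key count times, in insertion order
expanded = list(res.elements())
print(expanded)  # ['a', 'b', 'b']

# & keeps min(count in a, count in b): here 'b' would appear only once
minimum = Counter(a) & Counter(b)
print(minimum)
```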
It isn't clear how duplicates should be handled when intersecting lists that contain duplicate elements, since you have given only one test case with its expected result and did not explain how duplicates are treated.
Judging from your expected output, the common elements are 'a' and 'b', and the intersection lists 'a' once and 'b' twice. Note that 'a' occurs once in both a and b, but 'b' occurs twice in b. So the intersection lists each common element with the maximum multiplicity it has in any of the lists.
The answer is yes, although a loop will still run implicitly even if your code contains no explicit loop statements: this algorithm is inherently iterative.
Step 1: Create the intersection set, Intersect, which does not contain duplicates (you have already done that). Convert it to a list so it can be indexed.
Step 2: Create a second list, IntersectD. For each common element, compute Freq, the maximum number of times it occurs in any of the lists, using count. Then use Intersect and Freq to append the element Intersect[k] Freq[k] times.
Example code with 3 lists (the order of the result depends on set ordering and may vary):
a = ['a','b','c','1','1','1','1','2','3','o']
b = ['a','b','b','o','1','o','1']
c = ['a','a','a','b','1','2']
intersect = list(set(a) & set(b) & set(c))  # 3-set case
intersectD = []
for k in range(len(intersect)):
    cmn = intersect[k]
    freq = max(a.count(cmn), b.count(cmn), c.count(cmn))  # 3-set case
    for i in range(freq):  # Can be done with itertools
        intersectD.append(cmn)
>>> intersectD
['b', 'b', 'a', 'a', 'a', '1', '1', '1', '1']
For more than two lists, freq for each common element is computed by extending the set intersection and the max expression with more arguments. With a list of lists, freq can be computed with an inner loop. You can also replace the inner i-loop with an itertools expression, as in How can I count the occurrences of a list item?.
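A sketch generalizing the two steps to any number of lists, under the "maximum multiplicity" interpretation inferred above. The function name is hypothetical. It swaps the repeated list.count calls (each a full scan of a list) for one collections.Counter per list, uses itertools.repeat for the inner loop, and orders the result by first appearance in the first list so the output is deterministic.

```python
from collections import Counter
from itertools import chain, repeat


def intersect_max_multiplicity(*lists):
    """Hypothetical helper: common elements of all lists, each repeated by the
    maximum number of times it occurs in any single list.

    Order follows the first appearance of each common element in the first list.
    """
    if not lists:
        return []
    counts = [Counter(lst) for lst in lists]          # one counting pass per list
    common = set(lists[0]).intersection(*lists[1:])   # elements present in every list
    # dict.fromkeys drops repeats but keeps first-appearance order
    ordered = [x for x in dict.fromkeys(lists[0]) if x in common]
    # repeat(x, k) yields x k times; chain.from_iterable flattens the repeats
    return list(chain.from_iterable(
        repeat(x, max(cnt[x] for cnt in counts)) for x in ordered
    ))


a = ['a', 'b', 'c', '1', '1', '1', '1', '2', '3', 'o']
b = ['a', 'b', 'b', 'o', '1', 'o', '1']
c = ['a', 'a', 'a', 'b', '1', '2']
print(intersect_max_multiplicity(a, b, c))
# ['a', 'a', 'a', 'b', 'b', '1', '1', '1', '1']
```

With the question's two lists, intersect_max_multiplicity(['a','b','c','f'], ['a','b','b','o','k']) gives ['a', 'b', 'b'].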
I have a list of lists as follows:
list=[]
# ...some code to append elements to list...
list=[['a','bob'],['a','bob'],['a','john']]
I want to go through this list, change all instances of 'bob' to 'b', and leave the others unchanged.
for x in list:
    for a in x:
        if "bob" in a:
            a.replace("bob", 'b')
After running the loop, list is still the same; it does not become:
list=[['a','b'],['a','b'],['a','john']]
Why is the change not being reflected in list?
Because str.replace doesn't work in place: it returns a copy. Since strings are immutable, you need to assign the new strings to the elements of your list of lists.
You can assign directly to your list of lists if you extract indexing integers via enumerate:
L = [['a','bob'],['a','bob'],['a','john']]
for i, x in enumerate(L):
    for j, a in enumerate(x):
        if 'bob' in a:
            L[i][j] = a.replace('bob', 'b')
Result:
[['a', 'b'], ['a', 'b'], ['a', 'john']]
More Pythonic would be to use a list comprehension to create a new list. For example, if only the second of two values contains names which need checking:
L = [[i, j if j != 'bob' else 'b'] for i, j in L]
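To see the in-place point concretely, a small sketch: str.replace hands back a new string and leaves the original untouched, so its result has to be stored somewhere, for example back into the list.

```python
s = 'bob'
t = s.replace('bob', 'b')
print(s)  # bob  (the original string is unchanged)
print(t)  # b    (replace returned a new string)

# Inside a loop, the returned value must be assigned back, e.g. into the list:
row = ['a', 'bob']
row[1] = row[1].replace('bob', 'b')
print(row)  # ['a', 'b']
```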
You can try using a Python dictionary:
import numpy as np
L = [['a','bob'],['a','bob'],['a','john']]
dic = {'bob':'b'} # you can specify more changes here
new_list = [dic.get(n, n) for n in np.concatenate(L)]
print(np.reshape(new_list, [-1, 2]).tolist())  # assumes every sublist has exactly 2 items
Result is
[['a', 'b'], ['a', 'b'], ['a', 'john']]
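The same dictionary idea also works without NumPy, and without assuming that every sublist has exactly two elements (the reshape to [-1, 2] relies on that). In this sketch the third sublist is deliberately longer to show the difference.

```python
L = [['a', 'bob'], ['a', 'bob'], ['a', 'john', 'bob']]
dic = {'bob': 'b'}  # add more replacements here

# dic.get(n, n) returns the replacement if there is one, otherwise n itself;
# the outer comprehension keeps each sublist's own length
new_list = [[dic.get(n, n) for n in row] for row in L]
print(new_list)  # [['a', 'b'], ['a', 'b'], ['a', 'john', 'b']]
```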
I'm going to use a simple example, but basically x is a separate variable that isn't linked to the list element. You have to change the list element directly in order to alter the list.
l = [1, 2, 3, 4]
for x in l:
    x = x + 1
This doesn't change the list.
l = [1, 2, 3, 4]
for i, x in enumerate(l):
    l[i] = x + 1
This changes the list.
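The same rule explains why a list of lists can be fixed without outer indices: the loop variable for the outer list is bound to the very inner list object stored in L, so item assignment through it changes L, while rebinding a plain name does not. A sketch applied to the question's data:

```python
L = [['a', 'bob'], ['a', 'bob'], ['a', 'john']]

for inner in L:
    # `inner` is the list object stored in L, so assigning to inner[j]
    # changes L; only rebinding a plain name (like `a` or `x`) has no effect.
    for j, s in enumerate(inner):
        if 'bob' in s:
            inner[j] = s.replace('bob', 'b')

print(L)  # [['a', 'b'], ['a', 'b'], ['a', 'john']]
```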
I might be a little late to the party, but a more Pythonic way of doing this is to use map inside a list comprehension. It works on inner lists with any number of values.
l = [['a','bob'],['a','bob'],['a','john']]
[list(map(lambda x: x if x != 'bob' else 'b', i)) for i in l]
It gives you the desired output:
[['a', 'b'], ['a', 'b'], ['a', 'john']]
The main idea is that map iterates over each inner list, and the simple lambda function performs the replacement.
I hope this helps anyone else who is looking for something similar.
This is the case because you are only changing the temporary variable a.
list = [1,2,3]
for i in list:
    i += 1
list will still be [1,2,3].
You have to edit the string based on its index in the list.
This is a function I saw for finding the unique items of an array in order. I am new to Python, but it seemed very elegant.
unique_in_order = lambda l: [z for i, z in enumerate(l) if i == 0 or l[i - 1] != z]
How exactly does this for loop work?
z for i,z in enumerate(l)
enumerate(...) is a built-in function that takes an iterable (l here) as input and yields, for each element, a tuple containing its index and the element.
So enumerate([1,4,2,5]) emits the tuples (0,1), (1,4), (2,2), (3,5). If you use a comma-separated list of identifiers in the head of the for loop, each tuple is unpacked. So:
for i, z in enumerate([1,4,2,5]):
    pass
will iterate four times: the first time i will be 0 and z 1; the next iteration i will be 1 and z 4; the next iteration i will be 2 and z 2; the last iteration i will be 3 and z 5.
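The tuples can be inspected directly by turning the enumerate object into a list, and the same loop with a print shows the unpacking:

```python
pairs = list(enumerate([1, 4, 2, 5]))
print(pairs)  # [(0, 1), (1, 4), (2, 2), (3, 5)]

for i, z in enumerate([1, 4, 2, 5]):
    print(i, z)  # each (index, element) tuple is unpacked into i and z
```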
Your statement is also a list comprehension: the first z in z for i,z in enumerate(l) means it emits the z values. Notice furthermore that there is a condition (the if part), so not all values are emitted.
You should start with the concept of list comprehensions in Python to understand what this lambda function does. In short, it creates a list of the z elements that meet the condition on the right side of the expression.
Another important piece is the built-in enumerate function, which simply yields tuples consisting of each element's index and the element itself.
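Putting both pieces together, the lambda can be written out as an ordinary function with an explicit loop; the behaviour is the same. The string argument in the usage line is just an illustrative input.

```python
def unique_in_order(l):
    result = []
    for i, z in enumerate(l):
        # keep the first element, and any element that differs from its predecessor
        if i == 0 or l[i - 1] != z:
            result.append(z)
    return result


print(unique_in_order('AAABBBCCDAAB'))  # ['A', 'B', 'C', 'D', 'A', 'B']
```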
enumerate() helps you iterate over both the indices and the items of a sequence at once.
Here is an example:
>>> l = ['a','b','c']
>>> for index, value in enumerate(l):
...     print(index, value)
...
0 a
1 b
2 c
The solution you've posted is wrong and doesn't return unique elements, because it only checks each item against the previous one (l[i-1] != z).
To elaborate on what I meant, here is a test run:
>>> unique_in_order = lambda l: [z for i, z in enumerate(l) if i == 0 or l[i - 1] != z]
>>> l=[1,1,123,5,6,123]
>>> unique_in_order(l)
[1, 123, 5, 6, 123]
You can see that 123 occurs twice because it was tested only against its previous element 6.
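If collapsing runs of consecutive duplicates is actually what you want, the standard library's itertools.groupby produces the same result as the lambda, since it groups only adjacent equal items:

```python
from itertools import groupby

l = [1, 1, 123, 5, 6, 123]
# groupby groups *consecutive* equal items; keeping one key per group
# removes adjacent repeats only
print([key for key, _ in groupby(l)])  # [1, 123, 5, 6, 123]
```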
Before I provide a simple solution, we need to be clear whether we want each distinct item of a list once, in order, or want to get rid of duplicated items entirely.
A simple and elegant solution is the list.count method, which returns the number of times an item occurs in the list. (Calling it for every item makes this O(n²), which matters only for long lists.)
>>> l=['a', 'a',2,5,6,'b', 'c', 'd', 'e','e',2,2,6]
>>> [x for x in l if l.count(x)<2]
[5, 'b', 'c', 'd']
If you did not mean to discard the duplicates entirely, and instead wanted the list to keep a single occurrence of each duplicated item, you can do this:
>>> l=['a', 'a',2,5,6,'b', 'c', 'd', 'e','e',2,2,6]
>>> dups=set()
>>> [x for x in l if x not in dups and (dups.add(x) or True)]
['a', 2, 5, 6, 'b', 'c', 'd', 'e']
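Both versions above are fine for short lists, but l.count rescans the whole list for every item. A sketch of linear-time equivalents using only the standard library: collections.Counter counts everything in one pass, and dict.fromkeys keeps the first occurrence of each item because dicts preserve insertion order (guaranteed since Python 3.7).

```python
from collections import Counter

l = ['a', 'a', 2, 5, 6, 'b', 'c', 'd', 'e', 'e', 2, 2, 6]

# Items that occur exactly once, in order: one counting pass instead of l.count per item
counts = Counter(l)
singles = [x for x in l if counts[x] < 2]
print(singles)  # [5, 'b', 'c', 'd']

# First occurrence of every item, in order, without the dups.add trick
first_seen = list(dict.fromkeys(l))
print(first_seen)  # ['a', 2, 5, 6, 'b', 'c', 'd', 'e']
```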
Assume you have a list
>>> m = ['a','b','c']
I'd like to make a new list n that has everything except for a given item in m (for example the item 'a'). However, when I use
>>> m.remove('a')
>>> m
['b', 'c']
the original list is mutated (the value 'a' is removed from it). Is there a way to get a new list without 'a' without mutating the original? That is, m should still be ['a', 'b', 'c'], and I should get a new list, ['b', 'c'].
I assume you mean that you want to create a new list without a given element, instead of changing the original list. One way is to use a list comprehension:
m = ['a', 'b', 'c']
n = [x for x in m if x != 'a']
n is now a copy of m, but without the 'a' element.
Another way would, of course, be to copy the list first:
m = ['a', 'b', 'c']
n = m[:]
n.remove('a')
If removing a value by index, it is even simpler:
n = m[:index] + m[index+1:]
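One caveat with the slicing version: with a negative index such as -1, index + 1 becomes 0, so the second slice is the whole list and the result is wrong. A small sketch (the helper name is hypothetical) that normalizes the index first and still leaves the original list untouched:

```python
def without_index(seq, index):
    """Hypothetical helper: return a new list with the item at `index` removed.

    range(len(seq))[index] turns a negative index into the matching positive one
    and raises IndexError if it is out of range; seq itself is not modified.
    """
    index = range(len(seq))[index]
    return seq[:index] + seq[index + 1:]


m = ['a', 'b', 'c']
print(without_index(m, 1))   # ['a', 'c']
print(without_index(m, -1))  # ['a', 'b']
print(m)                     # ['a', 'b', 'c']
```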
There is a simple way to do that using the built-in function filter.
Here is an example:
a = [1, 2, 3, 4]
b = list(filter(lambda x: x != 3, a))  # filter returns an iterator in Python 3
If the order is unimportant, you can use a set (set difference is also fast):
list(set(m) - set(['a']))
Note, though, that this also drops duplicate elements from the result.
We can do it with the list's built-in copy() method, but the copy must be assigned to a new name:
m = ['a','b','c']
m_copy = m.copy()
m_copy.remove('a')
print(m)
['a', 'b', 'c']
print(m_copy)
['b', 'c']
You can create a new list without the offending element with a list-comprehension. This will preserve the value of the original list.
l = ['a', 'b', 'c']
[s for s in l if s != 'a']
Another alternative to a list comprehension is numpy:
>>> import numpy
>>> a = [1, 2, 3, 4]
>>> numpy.delete(a, a.index(3)).tolist()
[1, 2, 4]
We can also do it without the built-in remove method and without creating a new list variable:
Code:
# List m
m = ['a', 'b', 'c']
# Rebind m to the filtered list instead of creating a new variable
m = [x for x in m if x != 'a']
print(m)
Output:
['b', 'c']
The question is useful: I sometimes have a list that I use throughout a script, but at a certain step I need to apply some logic to a subset of its elements. In that case I found it useful to use the same list and exclude only the unneeded element for that individual step, without creating a totally new list with a different name. For this you can use either:
list comprehension: say you have l=['a','b','c'] to exclude b, you can have [x for x in l if x!='b']
set [only if order is unimportant]: list(set(l) - set(['b'])); note that 'b' is passed wrapped in a list, ['b']
I need to build a list using a list comprehension. This is basically what it has to do:
pattern = []
for c in range(3):
    for r in range(3):
        if r == c:
            pattern.append(a)
        else:
            pattern.append(b)
But this all somehow needs to be condensed to only one line! I have never used a list comprehension before, so please explain your solution.
Thank you!
Edit:
If I wanted the new list to consist of sublists, could that be put into the list comprehension too? Above I used a range of 3, so in the produced list every sublist would consist of 3 elements, i.e.
pattern = [['a','b','b'],['b','a','b'],['b','b','a']]
The general form of the list comprehension can be understood like this
[(the task to be done) (how long it has to be done)]
We normally use a for loop in the how long it has to be done part, and the task to be done part can contain if conditions as well. The important thing to note is that the task to be done part must be an expression that produces a value (even None is a valid value). So you cannot use Python statements (return, print in Python 2.x, etc.) in a list comprehension.
Now to answer your first question,
['a' if r == c else 'b' for c in range(3) for r in range(3)]
# ['a', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'a']
This creates exactly the list your for loop version builds.
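As a quick check, the one-liner can be compared against the original loop. The question leaves a and b undefined, so here they are assumed to be the strings 'a' and 'b'. Note that the for clauses in the comprehension appear in the same order as the nested loops.

```python
a, b = 'a', 'b'  # assumed values; the question does not define a and b

pattern = []
for c in range(3):
    for r in range(3):
        if r == c:
            pattern.append(a)
        else:
            pattern.append(b)

# outer loop first, inner loop second, exactly as in the nested for statements
one_liner = [a if r == c else b for c in range(3) for r in range(3)]
print(pattern == one_liner)  # True
```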
'a' if r == c else 'b'
This is the same as
if r == c:
    'a'
else:
    'b'
First, for c in range(3) starts iterating over 0, 1, 2, and for each value of c, for r in range(3) iterates over 0, 1, 2. On each iteration of r, the conditional expression we saw above is evaluated, and its result becomes the next element of the new list.
To answer your second question, you can certainly use a list comprehension there too.
Our basic understanding from the above example is that a list comprehension generates a list. Now, let's try nesting list comprehensions (we are going to use a list comprehension in the task to be done part), like this:
[['a' if r == c else 'b' for r in range(3)] for c in range(3)]
# [['a', 'b', 'b'], ['b', 'a', 'b'], ['b', 'b', 'a']]
Now, in our nested list comprehension, for c in range(3) runs first, and for each c the part ['a' if r == c else 'b' for r in range(3)] is executed, which generates one row of the nested list. The important thing to note here is that c is available inside the nested list comprehension.
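The nested comprehension generalizes naturally to any size by replacing the literal 3 with a parameter. A sketch with a hypothetical function name and hypothetical parameters for the diagonal and off-diagonal values:

```python
def diagonal_pattern(n, on='a', off='b'):
    """Hypothetical helper: n x n nested list with `on` on the diagonal, `off` elsewhere."""
    # inner comprehension builds row c; outer comprehension collects the n rows
    return [[on if r == c else off for r in range(n)] for c in range(n)]


print(diagonal_pattern(3))        # [['a', 'b', 'b'], ['b', 'a', 'b'], ['b', 'b', 'a']]
print(diagonal_pattern(2, 1, 0))  # [[1, 0], [0, 1]]
```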