Get index from a list where the key changes, groupby - python

I have a list that looks like this:
myList = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
What I want to do is record the indexes where the items in the list change value. For my list above, that would be 3 and 6.
I know that using groupby like this:
[len(list(group)) for key, group in groupby(myList)]
will result in:
[4, 3, 3]
but what I want is the index where a group starts/ends rather than just the number of items in each group. I know I could keep a running sum of the successive group counts and subtract 1 to get each index, but I thought there might be a cleaner way of doing so.
Thoughts appreciated.

Just use enumerate to generate indexes along with the list.
from operator import itemgetter
from itertools import groupby
myList = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
[next(group) for key, group in groupby(enumerate(myList), key=itemgetter(1))]
# [(0, 1), (4, 2), (7, 3)]
This gives pairs of (start_index, value) for each group.
If you really just want [3, 6], you can use
[tuple(group)[-1][0] for key, group in
groupby(enumerate(myList), key=itemgetter(1))][:-1]
or
indexes = (next(group)[0] - 1 for key, group in
groupby(enumerate(myList), key=itemgetter(1)))
next(indexes)
indexes = list(indexes)
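For comparison, the running-sum idea mentioned in the question can be sketched with itertools.accumulate. This is only an illustrative sketch; the helper name change_indexes is an assumption and not part of the answer above.

```python
from itertools import accumulate, groupby


def change_indexes(values):
    """Return the last index of every group except the final one."""
    # Length of each run of equal consecutive values, e.g. [4, 3, 3].
    lengths = [len(list(group)) for _, group in groupby(values)]
    # Running totals are one past the end of each run; subtract 1 for the last index.
    ends = [total - 1 for total in accumulate(lengths)]
    # The final group does not end at a change of value, so drop it.
    return ends[:-1]


myList = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(change_indexes(myList))  # [3, 6]
```

This makes two passes over the group lengths, so the enumerate-based versions are likely the tidier choice when only the indexes are needed.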

[i for i in range(len(myList)-1) if myList[i] != myList[i+1]]
In Python 2, replace range with xrange.

>>> x0 = myList[0]
>>> for i, x in enumerate(myList):
...     if x != x0:
...         print(i - 1)
...         x0 = x
...
3
6

Related

Last index of duplicate items in a python list

Does anyone know how I can get the last index position of duplicate items in a python list containing duplicate and non-duplicate items?
I have a list sorted in ascending order with [1, 1, 1, 2, 2, 3, 3, 4, 5]
I want it to print the last index of each duplicated item and the index of each non-duplicate item, like this:
2
4
6
7
8
I tried it this way, but it only prints the starting index of duplicate elements and misses the non-duplicate items.
id_list = [1, 1, 1, 2, 2, 3, 3, 4, 5]
for i in range(len(id_list)):
    for j in range(i+1,len(id_list)):
        if id_list[i]==id_list[j]:
            print(i)
Loop over the list using enumerate to get indexes and values, and store them in a dictionary so that the last index "wins" when there are duplicates. In the end, sort the indexes, since plain dictionaries aren't ordered, or use an OrderedDict as below to keep the order in which the values were first seen:
import collections
lst = [1, 1, 1, 2, 2, 3, 3, 4, 5]
d = collections.OrderedDict()
for i,v in enumerate(lst):
    d[v] = i
print(list(d.values()))
prints:
[2, 4, 6, 7, 8]
The advantage of this solution is that it works even if the duplicates aren't consecutive.
Python 3.7 guarantees the order of the base dictionaries so a simple dict comprehension solves it:
{v:i for i,v in enumerate(lst)}.values()
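To illustrate the point about non-consecutive duplicates, here is a small sketch. The function name last_indexes is hypothetical, and it assumes Python 3.7+ for ordered dicts. The dictionary values come out in the order each value was first seen, so sorted() is applied to get ascending indexes.

```python
def last_indexes(items):
    """Return the last index of each distinct value, in ascending order."""
    # Later indexes overwrite earlier ones, so each value keeps its last index.
    last_seen = {value: index for index, value in enumerate(items)}
    # Values are in first-seen key order; sort them to get ascending indexes.
    return sorted(last_seen.values())


print(last_indexes([1, 1, 1, 2, 2, 3, 3, 4, 5]))  # [2, 4, 6, 7, 8]
print(last_indexes([1, 2, 1, 3]))  # [1, 2, 3]
```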
You can use enumerate and check the next index in the list. If an element is not equal to the element in the next index, it is the last duplicate:
lst = [1, 1, 1, 2, 2, 3, 3, 4, 5]
result = [i for i, x in enumerate(lst) if i == len(lst) - 1 or x != lst[i + 1]]
print(result)
# [2, 4, 6, 7, 8]
You can use a list comprehension with enumerate and zip. The last index is always part of the result, so we can append it explicitly at the end.
L = [1, 1, 1, 2, 2, 3, 3, 4, 5]
res = [idx for idx, (i, j) in enumerate(zip(L, L[1:])) if i != j] + [len(L) - 1]
print(res)
# [2, 4, 6, 7, 8]

How to find common elements inside a list

I have a list l1 that looks like [1,2,1,0,1,1,0,3..]. I want to find, for each element the indexes of elements which have same value as the element.
For example, for the first value in the list, 1, it should list all indexes where 1 is present in the list, and it should do the same for every element in the list. I can write a function that does this by iterating through the list, but I wanted to check whether there is a predefined function.
I am getting the list from pandas DataFrame columns, so it would be good to know whether the Series/DataFrame API offers any such function.
You can use numpy.unique, which can return the inverse too. This can be used to reconstruct the indices using numpy.where:
In [49]: a = [1,2,1,0,1,1,0,3,8,10,6,7]
In [50]: uniq, inv = numpy.unique(a, return_inverse=True)
In [51]: r = [(uniq[i], numpy.where(inv == i)[0]) for i in range(uniq.size)]
In [52]: print(r)
[(0, array([3, 6])), (1, array([0, 2, 4, 5])), (2, array([1])), (3, array([7])), (6, array([10])), (7, array([11])), (8, array([8])), (10, array([9]))]
I tried a brute-force approach; it can probably be optimized.
Here is the Python 3 code:
L = [1,2,1,0,1,1,0,3]
D = dict()
for i in range(len(L)):
    n = []
    if L[i] not in D.keys():
        for j in range(len(L)):
            if L[i] == L[j]:
                n.append(j)
        D[L[i]] = n
for j in D.keys():
    print(j,"->",D.get(j))
You could achieve this using a defaultdict.
from collections import defaultdict
input = [1,2,1,0,1,1,0,3]
# Dictionary to store our indices for each value
index_dict = defaultdict(list)
# Store index for each item
for i, item in enumerate(input):
    index_dict[item].append(i)
If you want a list which contains the indices of elements which are the same as the corresponding element in your input list, you can just create a reference to the dictionary:
same_element_indices = [index_dict[x] for x in input]
This has the advantage of only referencing the one object for each identical element.
Output would be:
[[0, 2, 4, 5],
[1],
[0, 2, 4, 5],
[3, 6],
[0, 2, 4, 5],
[0, 2, 4, 5],
[3, 6],
[7]]
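The shared-reference point can be checked with a short sketch. The names values, shares_object and independent are illustrative assumptions; the list is called values here only to avoid shadowing the built-in input.

```python
from collections import defaultdict

values = [1, 2, 1, 0, 1, 1, 0, 3]

# Map each value to the list of positions where it occurs.
index_dict = defaultdict(list)
for i, item in enumerate(values):
    index_dict[item].append(i)

same_element_indices = [index_dict[x] for x in values]

# Entries for equal values are the very same list object, not copies.
shares_object = same_element_indices[0] is same_element_indices[2]
print(shares_object)  # True

# Copy each list if the entries must be safe to modify independently.
independent = [list(indices) for indices in same_element_indices]
```

Sharing saves memory, but mutating one entry changes every entry for that value, which is why the copy at the end may be worth its cost.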
You can also try something like this:
import pandas as pd
df = pd.DataFrame({'A': [1,2,1,0,1,1,0,3]})
uni = df['A'].unique()
for i in uni:
    lists = df[df['A'] == i].index.tolist()
    print(i, '-->', lists)
Output:
1 --> [0, 2, 4, 5]
2 --> [1]
0 --> [3, 6]
3 --> [7]

Rapid compression of multiple lists with value addition

I am looking for a pythonic way to iterate through a large number of lists and use the index of repeated values from one list to calculate a total value from the values with the same index in another list.
For example, say I have two lists
a = [ 1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [ 1, 2, 3, 4, 5, 6, 7, 8, 9]
What I want to do is find the unique values in a, and then add together the corresponding values from b with the same index. My attempt, which is quite slow, is as follows:
a1=list(set(a))
b1=[0 for y in range(len(a1))]
for m in range(len(a)):
    for k in range(len(a1)):
        if a1[k]==a[m]:
            b1[k]+=b[m]
and I get
a1=[1, 2, 3]
b1=[12, 15, 18]
Please let me know if there is a faster, more pythonic way to do this.
Thanks
Use the zip() function and a defaultdict dictionary to collect values per unique value:
from collections import defaultdict
try:
    # Python 2 compatibility
    from future_builtins import zip
except ImportError:
    # Python 3, already there
    pass
values = defaultdict(int)
for key, value in zip(a, b):
    values[key] += value
a1, b1 = zip(*sorted(values.items()))
zip() pairs up the values from your two input lists; all you then have to do is sum up each value from b per unique value of a.
The last line pulls out the keys and values from the resulting dictionary, sorts these, and puts just the keys and just the values into a1 and b1, respectively.
Demo:
>>> from collections import defaultdict
>>> a = [ 1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> b = [ 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> values = defaultdict(int)
>>> for key, value in zip(a, b):
...     values[key] += value
...
>>> zip(*sorted(values.items()))
[(1, 2, 3), (12, 15, 18)]
If you don't care about output order, you can drop the sorted() call altogether.
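As a hedged sketch for Python 3, where zip() returns an iterator instead of a list, the same idea can be wrapped in a function that returns two plain lists. The name sum_by_key and the empty-input handling are assumptions added for illustration.

```python
from collections import defaultdict


def sum_by_key(keys, values):
    """Sum the entries of values that share the same entry in keys."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value
    if not totals:
        # zip(*[]) yields nothing to unpack, so handle empty input explicitly.
        return [], []
    unique_keys, sums = zip(*sorted(totals.items()))
    return list(unique_keys), list(sums)


a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sum_by_key(a, b))  # ([1, 2, 3], [12, 15, 18])
```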

Search min value within a list of tuples

I have a list of tuples; each tuple contains a list and an integer value.
E.g.
Mylist = [([1,1,3], 3),([1,1,3], 30),([2,2,3], 15),([1,3,3], 2)]
I want this list to return this tuple ([1,3,3], 2)
since Mylist[i][1] = 2 that is the min in the list.
Now, the built-in function min() doesn't really do that; it compares on the basis of the actual list, that is, Mylist[i][0].
I can only do this if the list contains two items:
def min(a,x,b,y):
    t = a
    if x >= y:
        t = b
    return t
But I have not figured out how to do it for a list of, say, 10 items!
Mylist = [([1,1,3], 3),([1,1,3], 30),([2,2,3], 15),([1,3,3], 2)]
print(min(Mylist, key=lambda x: x[1]))
You can provide a key to the min function using a lambda.
Output: ([1, 3, 3], 2)
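An equivalent sketch uses operator.itemgetter in place of the lambda. The tied list is an assumption added here to show that min() returns the first minimal item it encounters when several tuples share the smallest value.

```python
from operator import itemgetter

Mylist = [([1, 1, 3], 3), ([1, 1, 3], 30), ([2, 2, 3], 15), ([1, 3, 3], 2)]

# itemgetter(1) picks the integer in each tuple, just like lambda x: x[1].
smallest = min(Mylist, key=itemgetter(1))
print(smallest)  # ([1, 3, 3], 2)

# With a tie on the key, the first minimal item in list order is returned.
tied = Mylist + [([1, 1, 3], 2)]
first_of_tie = min(tied, key=itemgetter(1))
print(first_of_tie)  # ([1, 3, 3], 2)
```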
If you store your list with the value first, then you can use min and sorted directly:
Mylist = [(3, [1,1,3]), (30, [1,1,3]), (15, [2,2,3]),(2, [1,3,3])]
min(Mylist)
Output: (2, [1, 3, 3])
My solution:
myList = [([1, 1, 3], 3), ([1, 1, 3], 30), ([2, 2, 3], 15), ([1, 3, 3], 2)]
minValue = [i for i in myList if i[1] == min([x[1] for x in myList])]
This returns a list of all items with the minimum value:
[([1, 3, 3], 2)]
For example, if you have a list like
myList = [([1, 1, 3], 3), ([1, 1, 3], 30), ([2, 2, 3], 15), ([1, 3, 3], 2), ([1, 1, 3], 2)]
the result will be
[([1, 3, 3], 2), ([1, 1, 3], 2)]
I don't know if you need this but works :D
Just for interest's sake, here's a functional approach:
def get_min_tuple(l):
    def get_index(lst, num, index=0):
        if num in lst[index]:
            return index
        else:
            return get_index(lst, num, index + 1)
    def find_min(l, smallest=None, assigned=False):
        if l == []:
            return smallest
        else:
            if not assigned:
                smallest = l[0][1]
                assigned = True
            else:
                if l[0][1] < smallest:
                    smallest = l[0][1]
            return find_min(l[1:], smallest, assigned)
    return l[get_index(l, find_min(l))]
While the one-liner of supplying a key to the min function is of course more useful in a practical sense, I thought I'd share this for educational purposes.
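In the same educational spirit, a non-recursive functional variant can be sketched with functools.reduce. This is a swapped-in technique rather than the approach above, and the name min_tuple_reduce is hypothetical. It avoids Python's recursion limit on long lists, but raises TypeError on an empty list.

```python
from functools import reduce


def min_tuple_reduce(pairs):
    """Return the tuple whose second element is smallest."""
    # Carry the best tuple seen so far; replace it only on a strictly smaller value,
    # so the first minimal tuple wins on ties.
    return reduce(lambda best, item: item if item[1] < best[1] else best, pairs)


Mylist = [([1, 1, 3], 3), ([1, 1, 3], 30), ([2, 2, 3], 15), ([1, 3, 3], 2)]
print(min_tuple_reduce(Mylist))  # ([1, 3, 3], 2)
```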
Time complexity: O(n)
Mylist = [([1,1,3], 3),([1,1,3], 30),([2,2,3], 15),([1,3,3], 2)]
# Start with the first tuple as the current minimum
minv = Mylist[0][1]
minv1 = Mylist[0][0]
for lst in Mylist:
    if lst[1] < minv:
        minv = lst[1]
        minv1 = lst[0]
print((minv1, minv))

python: sum similar values in list

Is there an easy way to sum all similar values in a list using list comprehensions?
i.e. input:
[1, 2, 1, 3, 3]
expected output:
[6, 2, 2] (sorted)
I tried using zip, but it only works for max 2 similar values:
[x + y for (x, y) in zip(l[:-1], l[1:]) if x == y]
You can use Counter.
from collections import Counter
[x*c for x,c in Counter([1, 2, 1, 3, 3]).items()]
from itertools import groupby
a=[1, 2, 1,1,4,5,5,5,5, 3, 3]
print(sorted([sum(g) for i,g in groupby(sorted(a))],reverse=True))
# output: [20, 6, 4, 3, 2]
Explanation of the code:
first, sort the list using sorted(a)
then use groupby to make groups of equal elements
finally, apply sum() to each group
You can use collections.Counter for this; it takes O(N) time:
>>> from collections import Counter
>>> lst = [1, 2, 1, 3, 3]
>>> [k*v for k, v in Counter(lst).items()]
[2, 2, 6]
Here Counter() returns the count of each unique item, and then we multiply those numbers with their count to get the sum.
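To match the sorted output requested in the question, the Counter approach can be combined with sorted(). This is a minimal sketch; the function name sum_similar is an assumption.

```python
from collections import Counter


def sum_similar(values):
    """Sum equal values and return the sums in descending order."""
    # value * count equals the sum of all occurrences of that value.
    sums = (value * count for value, count in Counter(values).items())
    return sorted(sums, reverse=True)


print(sum_similar([1, 2, 1, 3, 3]))  # [6, 2, 2]
```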
