Slicing sublists with different lengths - python

I have a list of lists. Each sublist has a length that varies between 1 and 100. Each sublist contains a particle ID at different times in a set of data. I would like to form lists of all particle IDs at a given time. To do this I could use something like:
list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
list2 = [item[0] for item in list]
list2 would contain the first element of each sublist in list. I would like to do this operation not just for the first element, but for every element between 1 and 100. My problem is that element number 100 (or 66 or 77 or whatever) does not exist in every sublist.
Is there some way of creating a list of lists, where each sublist is the list of all particle IDs at a given time?
I have thought about trying to use numpy arrays to solve this problem, as if the lists were all the same length this would be trivial. I have tried adding -1's to the end of each list to make them all the same length, and then masking the negative numbers, but this hasn't worked for me so far. I will use the list of IDs at a given time to slice another separate array:
pos = pos[satIDs]

lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
func = lambda x: [line[x] for line in lst if len(line) > x]
func(3)
[4, 8, 7]
func(4)
[5, 8]
--update--
func = lambda x: [ (line[x],i) for i,line in enumerate(lst) if len(line) > x]
func(4)
[(5, 0), (8, 2)]
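To get the full list of lists the question asks for, the same lookup can be called once per time index, up to the length of the longest sublist. This is a minimal sketch that reuses the sample data above. The helper name ids_at is illustrative, and it is written as a def rather than a lambda:

```python
lst = [[1, 2, 3, 4, 5], [2, 6, 7, 8], [1, 3, 6, 7, 8]]

def ids_at(t):
    """IDs at time index t, skipping sublists that are too short."""
    return [line[t] for line in lst if len(line) > t]

# One inner list per time index, up to the longest sublist
longest = max(len(line) for line in lst)
by_time = [ids_at(t) for t in range(longest)]
print(by_time)  # [[1, 2, 1], [2, 6, 3], [3, 7, 6], [4, 8, 7], [5, 8]]
```
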

You could use itertools.zip_longest. This will zip the lists together and insert None when one of the lists is exhausted.
>>> import itertools
>>> lst = [[1,2,3,4,5],['A','B','C'],['a','b','c','d','e','f','g']]
>>> list(itertools.zip_longest(*lst))
[(1, 'A', 'a'),
(2, 'B', 'b'),
(3, 'C', 'c'),
(4, None, 'd'),
(5, None, 'e'),
(None, None, 'f'),
(None, None, 'g')]
If you don't want the None elements, you can filter them out:
>>> [[x for x in sublist if x is not None] for sublist in itertools.zip_longest(*lst)]
[[1, 'A', 'a'], [2, 'B', 'b'], [3, 'C', 'c'], [4, 'd'], [5, 'e'], ['f'], ['g']]
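Filtering on "is not None" would also drop genuine None values from your data. If that can happen, one option is to pass a private sentinel object as fillvalue and filter on that instead. This is a sketch, and the name _MISSING is only illustrative:

```python
import itertools

lst = [[1, 2, 3, 4, 5], ['A', None, 'C'], ['a', 'b', 'c', 'd', 'e', 'f', 'g']]

# A fresh object() is never identical to anything in the data, unlike None
_MISSING = object()

by_position = [
    [x for x in column if x is not _MISSING]
    for column in itertools.zip_longest(*lst, fillvalue=_MISSING)
]
print(by_position)
# [[1, 'A', 'a'], [2, None, 'b'], [3, 'C', 'c'], [4, 'd'], [5, 'e'], ['f'], ['g']]
```
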

Approach #1
One almost* vectorized approach is to give each element its position within its sublist, stable-sort the flattened data by that position, and split wherever the position changes, like so -
import numpy as np

def position_based_slice(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Form ID array that has *ramping* IDs within an element starting from 1
    # and restarts at 1 with each new element
    id_arr = np.ones(lens.sum(), int)
    id_arr[lens[:-1].cumsum()] = -lens[:-1] + 1
    ranks = id_arr.cumsum()
    # Get order-maintained sorted indices for sorting the flattened version of the list
    ids = np.argsort(ranks, kind='mergesort')
    # Get sorted version and split wherever the position (rank) changes;
    # this also works when the sublists are sorted by increasing length
    vals = np.take(np.concatenate(L), ids)
    cut_idx = np.flatnonzero(np.diff(ranks[ids])) + 1
    return np.split(vals, cut_idx)
*There is a list comprehension at the start, but since it only collects the lengths of the sublists, its effect on the total runtime should be minimal.
Sample run -
In [76]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [77]: position_based_slice(input_list)
Out[77]:
[array([1, 2, 1, 3]), # position 0 of every sublist
array([2, 6, 3, 2]), # position 1
array([3, 7, 6]), # position 2
array([4, 8, 7]), # position 3
array([5, 8])] # position 4
Approach #2
Here's another approach that creates a 2D array, which is easier to index and trace back to the original input elements. It uses NumPy broadcasting along with boolean indexing. The implementation would look something like this -
import numpy as np

def position_based_slice_2Dgrid(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Create a mask of valid places in a 2D grid mapped version of list
    mask = lens[:, None] > np.arange(lens.max())
    # Pad with -1, then scatter the flattened values into the valid places
    out = np.full(mask.shape, -1, dtype=int)
    out[mask] = np.concatenate(L)
    return out
Sample run -
In [126]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [127]: position_based_slice_2Dgrid(input_list)
Out[127]:
array([[ 1, 2, 3, 4, 5],
[ 2, 6, 7, 8, -1],
[ 1, 3, 6, 7, 8],
[ 3, 2, -1, -1, -1]])
Each row of the output corresponds to one input sublist and each column to one time index, so column t holds the IDs at time t, with -1 as padding.
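Since the question pads with -1 and then wants pos[satIDs], here is a sketch of pulling one column out of the grid and masking the padding before indexing. The pos array is a hypothetical stand-in for the per-particle data. The mask matters because -1 is a valid NumPy index (it selects the last element), so unmasked padding would silently pick pos[-1] instead of raising an error:

```python
import numpy as np

def position_based_slice_2Dgrid(L, pad=-1):
    # Same construction as Approach #2: pad short rows with `pad`
    lens = np.array([len(item) for item in L])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, pad, dtype=int)
    out[mask] = np.concatenate(L)
    return out

input_list = [[1, 2, 3, 4, 5], [2, 6, 7, 8], [1, 3, 6, 7, 8], [3, 2]]
grid = position_based_slice_2Dgrid(input_list)

# Hypothetical per-particle data, indexed by particle ID
pos = np.arange(10) * 10.0

t = 2
col = grid[:, t]          # IDs at time index t, including -1 padding
sat_ids = col[col >= 0]   # keep only real IDs
print(sat_ids)            # [3 7 6]
print(pos[sat_ids])       # [30. 70. 60.]
```
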

If you want it as a one-line loop (a nested list comprehension), you can do this:
list2 = [[item[i] for item in list if len(item) > i] for i in range(0, 100)]
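And if you want to know which sublist each ID came from, you can do this (enumerate gives each sublist's index directly, whereas list.index(item) would return the wrong index if two sublists were equal):
list2 = [{j: item[i] for j, item in enumerate(list) if len(item) > i} for i in range(0, 100)]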
list2 would be like this:
[{0: 1, 1: 2, 2: 1}, {0: 2, 1: 6, 2: 3}, {0: 3, 1: 7, 2: 6}, {0: 4, 1: 8, 2: 7},
{0: 5, 2: 8}, {}, {}, ... ]

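You could pad your short lists with numpy.nan and afterwards create a NumPy array:
import itertools
import numpy
lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8,9]]
# zip_longest is called izip_longest in Python 2
arr = numpy.array(list(itertools.zip_longest(*lst, fillvalue=numpy.nan)))
Afterwards you can use NumPy slicing as usual. Row t of arr holds the IDs at time index t. Note that the NaN padding makes the array float, so convert the IDs back to int before using them as indices.
print(arr)
print(arr[1, :])  # [2., 6., 3.]
print(arr[4, :])  # [5., nan, 8.]
print(arr[5, :])  # [nan, nan, 9.]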

Related

How to find common positions in list of lists where the elements are always duplicates and then remove those duplicates?

I have a list of lists, where the lists are always ordered in the same way, and within each list several of the elements are duplicates. I would therefore like to remove duplicates from the lists, but it's important that I retain the structure of each list.
For example, if the elements at indices 0, 1 and 2 are all duplicates in a given list, two of them would be removed from that list, but the elements at the same positions would also have to be removed from all the other lists to retain the ordered structure.
Crucially, however, the elements at indices 0, 1 and 2 may not be duplicates in the other lists, so I only want to do this if I am sure that, across all the lists, the elements at indices 0, 1 and 2 are always duplicated.
As an example, say I had this list of lists
L = [ [1,1,1,3,3,2,4,6,6],
[5,5,5,4,5,6,5,7,7],
[9,9,9,2,2,7,8,10,10] ]
After applying my method I would like to be left with
L_new = [ [1,3,3,2,4,6],
[5,4,5,6,5,7],
[9,2,2,7,8,10] ]
where you can see that the elements at indices 1, 2 and 8 have been removed from every list because they are consistently duplicated across all lists, whereas the elements at indices 3 and 4 have not, because they are not always duplicated.
My attempt so far (I suspect it is not the best approach, which is why I am asking for help):
def check_duplicates_in_same_position(arr_list):
    check_list = []
    for arr in arr_list:
        duplicate_positions_list = []
        positions = {}
        # Map each value to the list of indices where it occurs
        for i in range(len(arr)):
            item = arr[i]
            if item in positions:
                positions[item].append(i)
            else:
                positions[item] = [i]
        # Keep only values that occur more than once
        duplicate_positions = {k: v for k, v in positions.items() if len(v) > 1}
        for _, item in duplicate_positions.items():
            duplicate_positions_list.append(item)
        check_list.append(duplicate_positions_list)
    return check_list
This returns a list of lists of lists: for each input list, a list of groups, where each group holds the indices of one duplicated value in that list:
[[[0, 1, 2], [3, 4], [7, 8]],
[[0, 1, 2, 4, 6], [7, 8]],
[[0, 1, 2], [3, 4], [7, 8]]]
I then thought I could somehow compare these lists and, for example, remove the elements at indices 1, 2 and 8, because those duplicates are common to every list.
Assuming all sub-lists will have the same length, this should work:
l = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
# Output: [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
Explanation:
zip(*l) - This transposes the data: the nth item is a tuple containing the nth element of every original sublist:
[(1, 5, 9),
(1, 5, 9),
(1, 5, 9),
(3, 4, 2),
(3, 5, 2),
(2, 6, 7),
(4, 5, 8),
(6, 7, 10),
(6, 7, 10)]
From the previous list, we only want to keep the tuples that have not appeared before. There are various ways of achieving this; if you search for how to remove duplicates while maintaining order, a common answer is dict.fromkeys(<list>). Since dict keys must be unique (and dicts preserve insertion order in Python 3.7+), this removes duplicates and generates the following output:
{(1, 5, 9): None,
(3, 4, 2): None,
(3, 5, 2): None,
(2, 6, 7): None,
(4, 5, 8): None,
(6, 7, 10): None}
We now want to unzip those keys to the original 2-dimensional array. For that, we can use zip again:
zip(*dict.fromkeys(zip(*l)))
Since zip returns tuples, we have to finally convert the tuples to list using a list comprehension:
[list(x) for x in zip(*dict.fromkeys(zip(*l)))]
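The same order-preserving deduplication can also be written as an explicit loop with a set of already-seen columns. That keeps each membership test constant time, unlike checking a list. This is a sketch that assumes all rows have the same length, and the function name is illustrative. Like the dict.fromkeys version, it drops a column whenever the identical column appeared anywhere earlier, not only when it is adjacent:

```python
def drop_repeated_columns(rows):
    """Remove every column that repeats an earlier column (rows must be equal length)."""
    seen = set()
    kept = []
    for column in zip(*rows):  # column = tuple of the i-th element of every row
        if column not in seen:
            seen.add(column)
            kept.append(column)
    # Transpose the kept columns back into rows
    return [list(row) for row in zip(*kept)]

l = [[1, 1, 1, 3, 3, 2, 4, 6, 6],
     [5, 5, 5, 4, 5, 6, 5, 7, 7],
     [9, 9, 9, 2, 2, 7, 8, 10, 10]]
print(drop_repeated_columns(l))
# [[1, 3, 3, 2, 4, 6], [5, 4, 5, 6, 5, 7], [9, 2, 2, 7, 8, 10]]
```
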
I would go with something like this. It is not very fast (each "not in" check scans temp_L, so the loop is quadratic in the number of columns), but depending on the size of your lists it could be sufficient.
L = [ [1,1,1,3,3,2,4,6,6], [5,5,5,4,5,6,5,7,7], [9,9,9,2,2,7,8,10,10] ]
azip = zip(*L)
temp_L = []
for zz in azip:
    if zz not in temp_L:
        temp_L.append(zz)
# Transpose back to one tuple per original list
new_L = list(zip(*temp_L))
First, we zip the three (or more) lists in L. Then we iterate over each resulting column tuple and check whether it already exists; if not, we add it to our temporary list temp_L. In the end we restructure temp_L into the original format. It returns
>>> new_L
[(1, 3, 3, 2, 4, 6), (5, 4, 5, 6, 5, 7), (9, 2, 2, 7, 8, 10)]

Sorting a list of lists by every list and return the final index

I want to sort a list containing an arbitrary number of lists, using each of those lists as a sort key.
Furthermore, I do not want to use any libraries (neither from the Python standard library nor third-party).
data = [['a', 'b', 'a', 'b', 'a'], [9, 8, 7, 6, 5]]
I know I can achieve this by doing
list(zip(*sorted(zip(*data))))
# [('a', 'a', 'a', 'b', 'b'), (5, 7, 9, 6, 8)]
but I would also like to get the sorting index from that process.
In this case:
index = [4, 2, 0, 3, 1]
I found several answers for a fixed number of inside lists, or such that only want to sort by a specific list. Neither case is what I am looking for.
Add a temporary index list to the end before sorting. The result will show you the pre-sorted indices in the appended list:
data = [['a', 'b', 'a', 'b', 'a'], [9, 8, 7, 6, 5]]
assert all(len(sublist) == len(data[0]) for sublist in data)
data.append(range(len(data[0])))
*sorted_data, indices = list(zip(*sorted(zip(*data))))
print(sorted_data)
# [('a', 'a', 'a', 'b', 'b'), (5, 7, 9, 6, 8)]
print(indices)
# (4, 2, 0, 3, 1)
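A variant that does not mutate data and still uses no libraries is to sort the positions themselves, keyed by the tuple of values at each position across all the lists. Python's sort is stable, so fully equal tuples keep their original order. That gives the same result as appending an index row. A minimal sketch, assuming all inner lists have the same length:

```python
data = [['a', 'b', 'a', 'b', 'a'], [9, 8, 7, 6, 5]]

n = len(data[0])
# Sort positions 0..n-1 by the values found at that position in every list
index = sorted(range(n), key=lambda i: tuple(row[i] for row in data))
# Reorder every list with the same index
sorted_data = [[row[i] for i in index] for row in data]

print(index)        # [4, 2, 0, 3, 1]
print(sorted_data)  # [['a', 'a', 'a', 'b', 'b'], [5, 7, 9, 6, 8]]
```
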
Try this. Note that it sorts each sublist independently, so it returns a separate index list for every sublist rather than one combined sorting index:
data = [["a", "b", "a", "b", "a"], [9, 8, 7, 6, 5]]
def sortList(inputList):
    # Pair each value with its original index, then sort the pairs
    masterList = [[value, index] for index, value in enumerate(inputList)]
    masterList.sort()
    values = []
    indices = []
    for item in masterList:
        values.append(item[0])   # get the item
        indices.append(item[1])  # get the index
    return values, indices
sortedData = []
sortedIndices = []
for subList in data:
    sortedList, indices = sortList(subList)
    sortedData.append(sortedList)
    sortedIndices.append(indices)
print(sortedData)
print(sortedIndices)

How to find common elements inside a list

I have a list l1 that looks like [1,2,1,0,1,1,0,3,...]. For each element, I want to find the indices of all elements that have the same value as that element.
For example, for the first value in the list, 1, it should list all the indices where 1 is present in the list, and it should repeat the same for every element in the list. I could write a function that does this by iterating through the list, but I wanted to check whether there is any predefined function.
I am getting the list from a Pandas DataFrame column, so it would be good to know if Series/DataFrame offer any such functions.
You can use numpy.unique, which can return the inverse too. This can be used to reconstruct the indices using numpy.where:
In [49]: a = [1,2,1,0,1,1,0,3,8,10,6,7]
In [50]: uniq, inv = numpy.unique(a, return_inverse=True)
In [51]: r = [(uniq[i], numpy.where(inv == i)[0]) for i in range(uniq.size)]
In [52]: print(r)
[(0, array([3, 6])), (1, array([0, 2, 4, 5])), (2, array([1])), (3, array([7])), (6, array([10])), (7, array([11])), (8, array([8])), (10, array([9]))]
I tried brute force (it is O(n²)); maybe you can optimize it.
Here is Python 3 code:
L = [1,2,1,0,1,1,0,3]
D = dict()
for i in range(len(L)):
    n = []
    if L[i] not in D:
        # Collect every index holding the same value as L[i]
        for j in range(len(L)):
            if L[i] == L[j]:
                n.append(j)
        D[L[i]] = n
for j in D.keys():
    print(j, "->", D.get(j))
You could achieve this using a defaultdict.
from collections import defaultdict
values = [1,2,1,0,1,1,0,3]
# Dictionary to store our indices for each value
index_dict = defaultdict(list)
# Store index for each item
for i, item in enumerate(values):
    index_dict[item].append(i)
If you want a list that contains, for each element of your input list, the indices of all elements equal to it, you can look each element up in the dictionary:
same_element_indices = [index_dict[x] for x in values]
This has the advantage that equal elements share the same list object, so no copies are made.
Output would be:
[[0, 2, 4, 5],
[1],
[0, 2, 4, 5],
[3, 6],
[0, 2, 4, 5],
[0, 2, 4, 5],
[3, 6],
[7]]
You can also try something like this:
import pandas as pd
df = pd.DataFrame({'A': [1,2,1,0,1,1,0,3]})
uni = df['A'].unique()
for i in uni:
    lists = df[df['A'] == i].index.tolist()
    print(i, '-->', lists)
Output:
1 --> [0, 2, 4, 5]
2 --> [1]
0 --> [3, 6]
3 --> [7]

How to get items in a list before and after a specified index

Suppose I have the list f=[1,2,3] and an index i, and I want to iterate over f, excluding i. Is there a way I can use i to split the list, something like f[:i:], where I would get a new list [1,3] when run with i=1?
Code I'm trying to fit this into:
# permutations, excluding self addition
# <something here> would be f excluding f[x]
f = [1,2,3]
r = [x + y for x in f for y in <something here>]
# Expected Output (notice absence of any f[i]+f[i])
[3, 4, 3, 5, 4, 5]
Use enumerate() to get access to the index while iterating:
[item for i, item in enumerate(f) if i != 3]
In this case you can skip the unwanted index or, if you have a set of indices, you can check membership with in:
[item for i, item in enumerate(f) if i not in {3, 4, 5}]
If you want to remove an item in a certain index you can use del statement:
>>> l = ['a', 'b', 'c', 'd', 'e']
>>>
>>> del l[3]
>>> l
['a', 'b', 'c', 'e']
>>>
If you want to create a new list without that item and keep the original list intact, you can use simple slicing (starting again from the original l = ['a', 'b', 'c', 'd', 'e']):
>>> new = l[:3] + l[4:]
>>> new
['a', 'b', 'c', 'e']
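The same slicing fits directly into the comprehension from the question. f[:i] + f[i+1:] is f without the element at index i. Because it works on positions rather than values, duplicates in f are still paired with each other. A minimal sketch using the question's data:

```python
f = [1, 2, 3]
# For each position i, add f[i] to every element except the one at position i
r = [x + y for i, x in enumerate(f) for y in f[:i] + f[i + 1:]]
print(r)  # [3, 4, 3, 5, 4, 5]
```
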
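Iterate over the indices and skip the one that matches the current position (comparing indices rather than values keeps this correct when f contains duplicates):
f = [10,20,30,40,50,60]
r = [x + f[j] for i, x in enumerate(f) for j in range(len(f)) if j != i]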
Probably not the most elegant solution, but this might work:
f = [1,2,3,4,5]
for i, x in enumerate(f):
    if i == 0:
        new_list = f[1:]
    elif i == len(f) - 1:
        new_list = f[:-1]
    else:
        new_list = f[:i] + f[i+1:]
    print(i, new_list)
prints:
0 [2, 3, 4, 5]
1 [1, 3, 4, 5]
2 [1, 2, 4, 5]
3 [1, 2, 3, 5]
4 [1, 2, 3, 4]
Well, it may seem scary, but here is a one-liner that does the work (it produces the same sums as the expected output, in a different order):
>>> from numpy import array
>>> import itertools
>>> list(itertools.chain(*(i+array(l) for i,l in zip(reversed(f), itertools.combinations(f, len(f)-1)))))
[4, 5, 3, 5, 3, 4]
If you look at it slowly, it's not so complicated:
itertools.combinations gives every combination of len(f)-1 elements:
>>> list(itertools.combinations(f, len(f)-1))
[(1, 2), (1, 3), (2, 3)]
You wrap it with zip and reversed(f) so you can get each combination together with the missing value:
>>> [(i,l) for i,l in zip(reversed(f), itertools.combinations(f, len(f)-1))]
[(3, (1, 2)), (2, (1, 3)), (1, (2, 3))]
Then you convert l to a numpy.array so you can add the missing value:
>>> list((i+array(l) for i,l in zip(reversed(f), itertools.combinations(f, len(f)-1))))
[array([4, 5]), array([3, 5]), array([3, 4])]
And finally you use itertools.chain to flatten these arrays into the result.
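If the exact order of the expected output matters, itertools.permutations(f, 2) may be a simpler fit. It yields every ordered pair of elements taken from two different positions, so an element is never added to itself. It also treats elements as unique by position, so duplicate values are still paired. A minimal sketch:

```python
from itertools import permutations

f = [1, 2, 3]
# (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2): no element paired with itself
r = [x + y for x, y in permutations(f, 2)]
print(r)  # [3, 4, 3, 5, 4, 5]
```
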

Interleave two lists of different lengths in Python 2?

I am trying to write a Python function that takes two lists as arguments and interleaves them. The order of the component lists should be preserved. If the lists do not have the same length, the elements of the longer list should end up at the end of the resulting list.
For example, I'd like to type this in the shell:
interleave(["a", "b"], [1, 2, 3, 4])
And get this back:
["a", 1, "b", 2, 3, 4]
If you can help me I'd appreciate it.
Here's how I'd do it, using various bits of the itertools module. It works for any number of iterables, not just two:
from itertools import chain, izip_longest  # or zip_longest in Python 3
def interleave(*iterables):
    sentinel = object()  # fill value that cannot collide with real items
    z = izip_longest(*iterables, fillvalue=sentinel)
    c = chain.from_iterable(z)
    f = filter(lambda x: x is not sentinel, c)
    return list(f)
You could try this:
In [30]: from itertools import izip_longest
In [31]: l = ['a', 'b']
In [32]: l2 = [1, 2, 3, 4]
In [33]: [item for slist in izip_longest(l, l2) for item in slist if item is not None]
Out[33]: ['a', 1, 'b', 2, 3, 4]
izip_longest 'zips' the two lists together, but instead of stopping at the length of the shortest list, it continues until the longest one is exhausted:
In [36]: list(izip_longest(l, l2))
Out[36]: [('a', 1), ('b', 2), (None, 3), (None, 4)]
You then add items by iterating through each item in each pair in the zipped list, omitting those that are None. As pointed out by @Blckknight, this will not work properly if your original lists already contain None values. If that is possible in your situation, you can use the fillvalue argument of izip_longest to fill with something other than None (as @Blckknight does in his answer).
Here is the above example as a function:
In [37]: def interleave(*iterables):
....: return [item for slist in izip_longest(*iterables) for item in slist if item is not None]
....:
In [38]: interleave(l, l2)
Out[38]: ['a', 1, 'b', 2, 3, 4]
In [39]: interleave(l, l2, [44, 56, 77])
Out[39]: ['a', 1, 44, 'b', 2, 56, 3, 77, 4]
Not a very elegant solution, but it may still be helpful:
def interleave(lista, listb):
    # Reversed copies, so pop() takes items from the front of the originals
    (tempa, tempb) = (list(reversed(lista)), list(reversed(listb)))
    result = []
    while tempa or tempb:
        if tempa:
            result.append(tempa.pop())
        if tempb:
            result.append(tempb.pop())
    return result
or in a single expression (Python 2; in Python 3, import reduce from functools and use range instead of xrange):
def interleave2(lista, listb):
    return reduce(lambda x, y: x + y,
                  map(lambda x: x[0] + x[1],
                      [(lista[i:i+1], listb[i:i+1])
                       for i in xrange(max(len(lista), len(listb)))]))
Another solution is based on the question: how would I do it by hand? Well, almost by hand: use the built-in zip() to pair items up to the length of the shorter list, then append the tail of the longer one:
#!python2
def interleave(lst1, lst2):
    minlen = min(len(lst1), len(lst2))    # find the length of the shorter list
    tail = lst1[minlen:] + lst2[minlen:]  # get the tail (one of the two slices is empty)
    result = []
    for t in zip(lst1, lst2):             # use a standard zip
        result.extend(t)                  # expand each tuple into two items
    return result + tail                  # result of zip() plus the tail
print interleave(["a", "b"], [1, 2, 3, 4])
print interleave([1, 2, 3, 4], ["a", "b"])
print interleave(["a", None, "b"], [1, 2, 3, None, 4])
It prints the results:
['a', 1, 'b', 2, 3, 4]
[1, 'a', 2, 'b', 3, 4]
['a', 1, None, 2, 'b', 3, None, 4]
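For Python 3, a lazy alternative that needs no fill value at all is a generator adapted from the roundrobin recipe in the itertools documentation. It takes one item from each iterable in turn and drops an iterable from the cycle once it runs out, so None values in the data survive. It works for any number of iterables. This is a sketch; in Python 2 the bound method would be iter(it).next instead of __next__:

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    """Yield one item from each iterable in turn, skipping exhausted ones."""
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # One iterable ran out: rebuild the cycle from the remaining ones
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

print(list(roundrobin(["a", "b"], [1, 2, 3, 4])))
# ['a', 1, 'b', 2, 3, 4]
```
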
