-Python- Ordering lists based on a format

-Python- Ordering lists based on a format - python

I'm new to programming in general, so looking to really expand my skills here. I'm trying to write a script that will grab a list of strings from an object, then order them based on a template of my design. Any items not in the template will be added to the end.
Here's how I'm doing it now, but could someone suggest a better/more efficient way?
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
listFinal = []
for thing in listTemplate:
if thing in originalList:
listFinal.append(thing)
originalList.pop(originalList.index(thing))
for thing in originalList:
listFinal.append(thing)
originalList.pop(originalList.index(thing))

Try this:
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
order = { element:index for index, element in enumerate(listTemplate) }
sorted(originalList, key=lambda element: order.get(element, float('+inf')))
=> ['a', 'b', 'c', 'd', 'z']
This is how it works:
First, we build a dictionary indicating, for each element in listTemplate, its relative order with respect to the others. For example a is 0, b is 1 and so on
Then we sort originalList. If one of its elements is present in the order dictionary, then use its relative position for ordering. If it's not present, return a positive infinite value - this will guarantee that the elements not in listTemplate will end up at the end, with no further ordering among them.
The solution in the question, although correct, is not very pythonic. In particular, whenever you have to build a new list, try to use a list comprehension instead of explicit looping/appending. And it's not a good practice to "destroy" the input list (using pop() in this case).

You can create a dict using the listTemplate list, that way the expensive(O(N)) list.index operations can be reduced to O(1) lookups.
>>> lis1 = ['b', 'a', 'c', 'z', 'd']
>>> lis2 = ['a', 'b', 'c', 'd']
Use enumerate to create a dict with the items as keys(Considering that the items are hashable) and index as values.
>>> dic = { x:i for i,x in enumerate(lis2) }
Now dic looks like:
{'a': 0, 'c': 2, 'b': 1, 'd': 3}
Now for each item in lis1 we need to check it's index in dic, if the key is not found we return float('inf').
Function used as key:
def get_index(key):
return dic.get(key, float('inf'))
Now sort the list:
>>> lis1.sort(key=get_index)
>>> lis1
['a', 'b', 'c', 'd', 'z']

For the final step, you can just use:
listFinal += originalList
and it will add these items to the end.

There is no need to create a new dictionary at all:
>>> len_lis1=len(lis1)
>>> lis1.sort(key = lambda x: lis2.index(x) if x in lis2 else len_lis1)
>>> lis1
['a', 'b', 'c', 'd', 'z']

Here is a way that has better computational complexity:
# add all elements of originalList not found in listTemplate to the back of listTemplate
s = set(listTemplate)
listTemplate.extend(el for el in originalList if el not in s)
# now sort
rank = {el:index for index,el in enumerate(listTemplate)}
listFinal = sorted(originalList, key=rank.get)

Related

How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
for e in my_list:
if e in my_list_2:
c.append(e)
...
If the my_list is long, this would be very inefficient. If I convert both lists into two sets, then use set's intersection() function to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?

dict is already a hashmap, so lookups are practically as efficient as a set, so you may not need to do any extra work collecting the values - if it wasn't, you could pack the values into a set to check before checking the dict
However, a large improvement may be to make a generator for the values, rather than creating a new intermediate list, to iterate over where you actually want the values
def foo(src_dict, check_list):
for value in check_list:
if value in my_dict:
yield value
With the edit, you may find you're better off packing all the inputs into a set
def foo(src_list, check_list):
hashmap = set(src_list)
for value in check_list:
if value in hashmap:
yield value
If you know a lot about the inputs, you can do better, but that's an unusual case (for example if the lists are ordered you could bisect, or if you have a huge verifying list or very very few values to check against it you may find some efficiency in the ordering and if you make a set)

I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']

First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
if e in my_list_2:
c.append(e)
print(c)
But when the lists are significantly larger, the code block utilizing the set.intersection() method will be significantly more efficient (faster).

sorry for not reading the post carefully and now it is not possible to delete.. however, it is an attempt for solution.
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) -", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) -> (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
or can be
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']

return a list of items without any elements with the same value next to each other

Implement the function unique_in_order which takes as argument a sequence and returns a list of items without any elements with the same value next to each other and preserving the original order of elements.
For example:
unique_in_order('AAAABBBCCDAABBB') == ['A', 'B', 'C', 'D', 'A', 'B']
unique_in_order('ABBCcAD') == ['A', 'B', 'C', 'c', 'A', 'D']
unique_in_order([1,2,2,3,3]) == [1,2,3]
my code return the correct output:
def unique_in_order(iterable):
list = []
for i in range(0, len(iterable)):
if iterable[i] != iterable[i-1]:
list.append(iterable[i])
return list
pass on test but it fails on attempt, saying:
should work with one element:
[] should equal ['A']
should reduce duplicates:
[] should equal ['A']
I wanna know what is wrong with my code, thanks

Use existing libraries to perform that task, like itertools.groupby
import itertools
def unique_in_order(iterable):
return [k for k,_ in itertools.groupby(iterable)]
print(unique_in_order('AAAABBBCCDAABBB')) # ['A', 'B', 'C', 'D', 'A', 'B']
print(unique_in_order(['A'])) # ['A']
With the default group key, groupby groups identical consecutive elements, yielding tuples with the value and the group of values (that we ignore here, we just need the key)

Getting specific indexed distinct values in nested lists

I have a nested list of around 1 million records like:
l = [['a', 'b', 'c', ...], ['d', 'b', 'e', ...], ['f', 'z', 'g', ...],...]
I want to get the distinct values of inner lists on second index, so that my resultant list be like:
resultant = ['b', 'z', ...]
I have tried nested loops but its not fast, any help will be appreciated!

Since you want the unique items you can use collections.OrderedDict.fromkeys() in order to keep the order and unique items (because of using hashtable fro keys) and use zip() to get the second items.
from collections import OrderedDict
list(OrderedDict.fromkeys(zip(my_lists)[2]))
In python 3.x since zip() returns an iterator you can do this:
colls = zip(my_lists)
next(colls)
list(OrderedDict.fromkeys(next(colls)))
Or use a generator expression within dict.formkeys():
list(OrderedDict.fromkeys(i[1] for i in my_lists))
Demo:
>>> lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
>>>
>>> list(OrderedDict().fromkeys(sub[1] for sub in lst))
['b', 'z']

You can unzip the list of lists then choice the second tuple with set like below :
This code take 4.05311584473e-06 millseconds, in my laptop
list(set(zip(*lst)[1]))
Input :
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
Out put :
['b', 'z']

Would that work for you?
result = set([inner_list[1] for inner_list in l])

I can think of two options.
Set comprehension:
res = {x[1] for x in l}
I think numpy arrays work faster than list/set comprehensions, so converting this list to an array and then using array functions can be faster. Here:
import numpy as np
res = np.unique(np.array(l)[:, 1])
Let me explain: np.array(l) converts the list to a 2d array, then [:, 1] take the second column (starting to count from 0) which consists of the second item of each sublist in the original l, and finally taking only unique values using np.unique.

Removing duplicates (not by using set)

My data look like this:
let = ['a', 'b', 'a', 'c', 'a']
How do I remove the duplicates? I want my output to be something like this:
['b', 'c']
When I use the set function, I get:
set(['a', 'c', 'b'])
This is not what I want.

One option would be (as derived from Ritesh Kumar's answer here)
let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]
which gives
>>> onlySingles
['b', 'c']

Try this,
>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>

Sort the input, then removing duplicates becomes trivial:
data = ['a', 'b', 'a', 'c', 'a']
def uniq(data):
last = None
result = []
for item in data:
if item != last:
result.append(item)
last = item
return result
print uniq(sorted(data))
# prints ['a', 'b', 'c']
This is basically the shell's cat data | sort | uniq idiom.
The cost is O(N * log N), same as with a tree-based set.

Instead of sorting, or linearly scanning and re-counting the main list for its occurrences each time.
Count the number of occurrences and then filter on items that appear once...
>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']
You have to look at the sequence at least once regardless - although it makes sense to limit the amount of times you do so.
If you really want to avoid any type or set or otherwise hashed container (because you perhaps can't use them?), then yes, you can sort it, then use:
>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']

Python: filtering lists by indices

In Python I have a list of elements aList and a list of indices myIndices. Is there any way I can retrieve all at once those items in aList having as indices the values in myIndices?
Example:
>>> aList = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> myIndices = [0, 3, 4]
>>> aList.A_FUNCTION(myIndices)
['a', 'd', 'e']

I don't know any method to do it. But you could use a list comprehension:
>>> [aList[i] for i in myIndices]

Definitely use a list comprehension but here is a function that does it (there are no methods of list that do this). This is however bad use of itemgetter but just for the sake of knowledge I have posted this.
>>> from operator import itemgetter
>>> a_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> my_indices = [0, 3, 4]
>>> itemgetter(*my_indices)(a_list)
('a', 'd', 'e')

Indexing by lists can be done in numpy. Convert your base list to a numpy array and then apply another list as an index:
>>> from numpy import array
>>> array(aList)[myIndices]
array(['a', 'd', 'e'],
dtype='|S1')
If you need, convert back to a list at the end:
>>> from numpy import array
>>> a = array(aList)[myIndices]
>>> list(a)
['a', 'd', 'e']
In some cases this solution can be more convenient than list comprehension.

You could use map
map(aList.__getitem__, myIndices)
or operator.itemgetter
f = operator.itemgetter(*aList)
f(myIndices)

If you do not require a list with simultaneous access to all elements, but just wish to use all the items in the sub-list iteratively (or pass them to something that will), its more efficient to use a generator expression rather than list comprehension:
(aList[i] for i in myIndices)

Alternatively, you could go with functional approach using map and a lambda function.
>>> list(map(lambda i: aList[i], myIndices))
['a', 'd', 'e']

I wasn't happy with these solutions, so I created a Flexlist class that simply extends the list class, and allows for flexible indexing by integer, slice or index-list:
class Flexlist(list):
def __getitem__(self, keys):
if isinstance(keys, (int, slice)): return list.__getitem__(self, keys)
return [self[k] for k in keys]
Then, for your example, you could use it with:
aList = Flexlist(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
myIndices = [0, 3, 4]
vals = aList[myIndices]
print(vals) # ['a', 'd', 'e']

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

-Python- Ordering lists based on a format - python

For the final step, you can just use: listFinal += originalList and it will add these items to the end.

There is no need to create a new dictionary at all: >>> len_lis1=len(lis1) >>> lis1.sort(key = lambda x: lis2.index(x) if x in lis2 else len_lis1) >>> lis1 ['a', 'b', 'c', 'd', 'z']

Related

How to efficiently get common items from two lists that may have duplicates?

return a list of items without any elements with the same value next to each other

Getting specific indexed distinct values in nested lists

Removing duplicates (not by using set)

Python: filtering lists by indices

Categories

Resources