Python: How to custom order a list? - python

Obs: I know lists in python are not order-fixed, but think that this one will be.
And I'm using Python 2.4
I have a list, like (for example) this one:
mylist = [ ( u'Article', {"...some_data..."} ) ,
( u'Report' , {"...some_data..."} ) ,
( u'Book' , {"...another_data..."} ) ,
...#continue
]
The variable mylist is returned by a function, and the order of the returned list varies. Sometimes it looks like the example; sometimes 'Report' comes before 'Article', and so on.
I have a fixed order that I want for this list (and it isn't alphabetical).
Let's say that my fixed order is: 'Report', 'Article', 'Book', ...
So whatever order mylist arrives in, I want to reorder it so that 'Report' comes first, 'Article' second, etc.
What's the best approach to reorder my list (using the first element of each tuple) according to my custom order?
Answer:
I ended up with this:
mylist became a list of dicts, like this:
mylist = [{'id':'Article', "...some_data..."} ,
...etc
]
each dict having a 'id' that had to be sorted.
Saving the correct order in a list:
correct_order = ['Report', 'Article', 'Book', ...]
and doing:
results = sorted([item for item in results], cmp=lambda x,y:cmp(correct_order.index(x['id']), correct_order.index(y['id'])))
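On Python 3, sorted no longer accepts a cmp argument, so the same idea is usually written with key. A minimal sketch, assuming every 'id' appears in correct_order; the 'data' field is a made-up placeholder for the elided payload:

```python
# Hypothetical sample data; the 'data' values stand in for the elided payloads.
results = [
    {'id': 'Article', 'data': 1},
    {'id': 'Report', 'data': 2},
    {'id': 'Book', 'data': 3},
]
correct_order = ['Report', 'Article', 'Book']

# list.index is a linear scan, which is fine for a short order list.
# An 'id' missing from correct_order would raise ValueError here.
results = sorted(results, key=lambda item: correct_order.index(item['id']))
print([item['id'] for item in results])
```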

You could use a dictionary that would map every first element to its "weight" and then check this dictionary inside a sorting function.
Something like:
d = { "Report": 1,
"Article": 2,
"Book": 3 }
result = sorted(mylist, key=lambda x:d[x[0]])

You could use a dictionary, which would allow you to access "Book", "Article", etc. without having to care about the order. I would put the data from that list into a dict that looks like this:
mydict = { u'Article': "somedata",
u'Report': "someotherdata", ...}
If you really want to sort your list in the way you described, you can use list.sort with a key function that represents your particular sort order. You need the key function because only the first element of each tuple should be compared and your sort order is not alphabetical.
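As a sketch of that last suggestion, the key function can look up each tuple's first element in a rank table. The payload dicts and the names rank and by_custom_order are illustrative assumptions, not part of the question; a label missing from rank would raise KeyError.

```python
# Hypothetical payloads; only the first element of each tuple matters for ordering.
mylist = [
    (u'Article', {'n': 1}),
    (u'Report', {'n': 2}),
    (u'Book', {'n': 3}),
]

# Lower rank sorts first.
rank = {u'Report': 0, u'Article': 1, u'Book': 2}

def by_custom_order(item):
    """Return the rank of the tuple's first element."""
    return rank[item[0]]

mylist.sort(key=by_custom_order)  # sorts in place
print([label for label, _ in mylist])
```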

This way creates a dict and pulls the items from it in order
mylist = [ ( u'Article', {"...some_data..."} ) ,
( u'Report' , {"...some_data..."} ) ,
( u'Book' , {"...another_data..."} ) ,
]
mydict = dict(mylist)
ordering = [u'Report', u'Article', u'Book']
print [(k,mydict[k]) for k in ordering]
This way uses sort with O(1) lookups for the ordering
mylist = [ ( u'Article', {"...some_data..."} ) ,
( u'Report' , {"...some_data..."} ) ,
( u'Book' , {"...another_data..."} ) ,
]
mydict = dict(mylist)
ordering = dict((k,v) for v,k in enumerate([u'Report', u'Article', u'Book']))
print sorted(mydict.items(), key=lambda (k,v): ordering[k])

More generally, there could be elements of the mylist that are not in the specified fixed order. This will order according to the rule, but leave alone the relative order of everything outside of the rule:
def orderListByRule(alist, orderRule, listKeys=None, dropIfKey=None):
    """Reorder alist according to the order specified in orderRule.

    orderRule lists the order to be imposed on a set of keys. The keys are
    the items of alist if listKeys is None, or listKeys otherwise; in that
    case listKeys must have the same length as alist, because its entries
    are the tags on alist which determine the ordering. orderRule is a list
    of those same keys (and maybe more) which specifies the desired ordering.

    There is an optional dropIfKey which lists keys of items that should be
    dropped outright.
    """
    maxOR = len(orderRule)
    orDict = dict(zip(orderRule, range(maxOR)))
    alDict = dict(zip(range(maxOR, maxOR+len(alist)),
                      zip(alist if listKeys is None else listKeys, alist)))
    outpairs = sorted( [[orDict.get(b[0],a),(b)] for a,b in alDict.items()] )
    if dropIfKey is None: dropIfKey=[]
    outL = [b[1] for a,b in outpairs if b[0] not in dropIfKey]
    return outL

def test_orderListByRule():
    L1 = [1,2,3,3,5]
    L2 = [3,4,5,10]
    assert orderListByRule(L1, L2) == [3, 3, 5, 1, 2]
    assert orderListByRule(L1, L2, dropIfKey=[2,3]) == [5, 1,]
    Lv = [c for c in 'abcce']
    assert orderListByRule(Lv, L2, listKeys=L1) == ['c', 'c', 'e', 'a', 'b']
    assert orderListByRule(Lv, L2, listKeys=L1, dropIfKey=[2,3]) == ['e','a']

This function sorts by a custom order. An element that is not in the custom order does not raise an error; it is placed at the end of the result instead.
def sort_custom(ordem_custom: list, origin: list) -> list:
    list_order_equals = [c for c in ordem_custom if c in origin]
    list_no_equals = [c for c in origin if c not in ordem_custom]
    list_order = list_order_equals + list_no_equals
    return list_order

# Example
custom_order = ('fa','a','b','c','d','e')
my_list = ('e','c','fa','h','h','g')
result = sort_custom(custom_order,my_list)
print(result)
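One thing to watch, as far as I can tell: sort_custom emits each value from the custom order at most once, so a ranked value that occurs several times in origin is collapsed to a single entry. A possible alternative sketch uses sorted with a key and a fallback rank, which keeps duplicates and leaves unranked values in their original relative order because the sort is stable. The name sort_custom_stable is made up for this example.

```python
def sort_custom_stable(custom_order, origin):
    """Sort origin by position in custom_order; unranked values go last."""
    rank = {value: i for i, value in enumerate(custom_order)}
    fallback = len(rank)  # larger than every real rank
    return sorted(origin, key=lambda value: rank.get(value, fallback))

custom_order = ('fa', 'a', 'b', 'c', 'd', 'e')
my_list = ('e', 'c', 'fa', 'h', 'h', 'g', 'e')  # note the repeated 'e'
result = sort_custom_stable(custom_order, my_list)
print(result)  # ['fa', 'c', 'e', 'e', 'h', 'h', 'g']
```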

Related

What is the correct way to write this function?

I was making a program where first parameter is a list and second parameter is a list of dictionaries. I want to return a list of lists like this:
As an example, if this were a function call:
make_lists(['Example'],
[{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]
)
the expected return value would be:
[ ['Made-up'] ]
As a second example, if this were a function call:
make_lists(['Hint', 'Num'],
[{'Hint': 'Length 2 Not Required', 'Num' : 8675309},
{'Num': 1, 'Hint' : 'Use 1st param order'}]
)
the expected return value would be:
[ ['Length 2 Not Required', 8675309],
['Use 1st param order', 1]
]
I have written a code for this but my code does not return a list of lists, it just returns a single list. Please can someone explain?
def make_lists(s,lod):
    a = []
    lol =[]
    i = 0
    for x in lod:
        for y in x:
            for k in s:
                if(y==k):
                    lol.append(x.get(y))
                    i = i+1
    return lol
Expected Output:
[ ['Length 2 Not Required', 8675309],['Use 1st param order', 1] ]
Output:
['Length 2 Not Required', 8675309, 1, 'Use 1st param order']
The whole point of dictionaries is that you can access them by key:
def make_lists(keys, dicts):
    result = []
    for d in dicts:
        vals = [d[k] for k in keys if k in d]
        if len(vals) > 0:
            result.append(vals)
    return result
Let's have a look at what happens here:
We still have a list that accumulates the answers, but now it's called result instead of lol
Next we iterate through every dictionary:
for d in dicts:
For each dictionary d, we create a list, which is a lookup in that dictionary for the keys in keys, if the key k is in the dictionary d:
vals = [d[k] for k in keys if k in d]
The specs don't detail this, but I assume that if none of the keys are in the dictionary, you don't want an empty list added to the result. For that, we check whether vals has any entries, and only then add it to result:
if len(vals) > 0:
    result.append(vals)
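A quick check of this version against the two calls from the question. Because the inner comprehension iterates over keys rather than over each dictionary, the values come out in the order of the first parameter:

```python
def make_lists(keys, dicts):
    result = []
    for d in dicts:
        # Look up the requested keys in order, skipping any that are absent.
        vals = [d[k] for k in keys if k in d]
        if len(vals) > 0:
            result.append(vals)
    return result

first = make_lists(['Example'],
                   [{'Example': 'Made-up', 'Extra Keys': 'Possible'}])
second = make_lists(['Hint', 'Num'],
                    [{'Hint': 'Length 2 Not Required', 'Num': 8675309},
                     {'Num': 1, 'Hint': 'Use 1st param order'}])
print(first)   # [['Made-up']]
print(second)  # [['Length 2 Not Required', 8675309], ['Use 1st param order', 1]]
```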
Try this code - I've modified your existing code slightly and added explanations in the comments. Essentially, you just need to create a sub-list for each element of lod, add it to the master list lol, and then append to that sub-list instead of the outermost list.
def make_lists(s,lod):
    a = []
    lol =[]
    i = 0
    for x in lod:
        ## Added
        # Here we want to create a new list, and add it as a sub-list
        # within 'lol'
        lols = []
        lol.append(lols)
        ## Done
        for y in x:
            for k in s:
                if(y==k):
                    # Changed 'lol' to 'lols' here
                    lols.append(x.get(y))
                    i = i+1
    return lol
print(make_lists(['Example'], [{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Hint': 'Length 2 Not Required', 'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Prints:
[['Made-up']]
[['Length 2 Not Required', 8675309], [1, 'Use 1st param order']]
A simpler solution
For a cleaner (and potentially more efficient) approach, I'd suggest using builtins like map together with a list comprehension to tackle this problem:
def make_lists(s, lod):
    return [[*map(dict_obj.get, s)] for dict_obj in lod]
But note that this approach includes None for any key in s that is not present in a dictionary in lod.
To work around that, you can pass the result of map to the filter builtin so that None values (which represent missing keys) are stripped from the result. Be aware that filter(None, ...) drops every falsy value, such as 0 or an empty string, not only None:
def make_lists(s, lod):
    return [[*filter(None, map(dict_obj.get, s))] for dict_obj in lod]
print(make_lists(['Example'], [{'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Output:
[[]]
[[8675309], ['Use 1st param order', 1]]
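If legitimate values such as 0 or '' must survive, one option is to use a unique sentinel as the default for dict.get and drop only that sentinel. This is a sketch of an alternative, not the answer's own code; the name _MISSING is made up here.

```python
_MISSING = object()  # hypothetical sentinel: can never equal a real value

def make_lists(s, lod):
    return [
        [value
         for value in (dict_obj.get(key, _MISSING) for key in s)
         if value is not _MISSING]          # drop only truly missing keys
        for dict_obj in lod
    ]

rows = make_lists(['Hint', 'Num'], [{'Num': 0}, {'Num': 1, 'Hint': ''}])
print(rows)  # [[0], ['', 1]]
```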

Sort a list of dictionaries by multiple keys/values, where the order of the values should be specific

I have a list of dictionaries and want each item to be sorted by a specific property values.
The list:
init_list = [
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'name': 'gama', 'status': 'par'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'aeta', 'status': 'unknown'},
    {'pname': 'boc', 'status': 'run'},
]
I know I can do:
newlist = sorted(init_list, key=lambda k: (k['name'], k['status']))
but there are some additional conditions:
If the key name is not present in a dict, the value of the pname key should be used as the name instead.
The status order should be ['out', 'in', 'par', 'run'].
If the status value is not in that list, ignore it (see unknown).
The result should be:
[
    {'name': 'aeta', 'status': 'unknown'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'pname': 'boc', 'status': 'run'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'gama', 'status': 'par'},
]
Use
from itertools import count
# Use count() instead of range(4) so that we
# don't need to worry about the length of the status list.
newlist = sorted(init_list,
                 key=lambda k: (k.get('name', k.get('pname')),
                                dict(zip(['out', 'in', 'par', 'run'], count())
                                     ).get(k['status'], -1)
                                )
                 )
If k['name'] doesn't exist, fall back to k['pname'] (or None if that doesn't exist either). Likewise, if there is no known integer for the given status, default to -1.
I deliberately put this all in one logical line to demonstrate that at this point, you may want to just define the key function using a def statement.
def list_order(k):
    name_to_use = k.get('name')
    if name_to_use is None:
        name_to_use = k['pname']  # Here, just assume pname is available
    # Explicit definition; you might still write
    # status_orders = dict(zip(['out', ...], count())),
    # or better yet a dict comprehension like
    # { status: rank for rank, status in enumerate(['out', ...]) }
    status_orders = {
        'out': 0,
        'in': 1,
        'par': 2,
        'run': 3
    }
    status_to_use = status_orders.get(k['status'], -1)
    return name_to_use, status_to_use

newlist = sorted(init_list, key=list_order)
The first condition is simple: make the first value of the ordering tuple fall back to pname, i.e.
lambda k: (k.get('name', k.get('pname')), k['status'])
For the second and third rules I would define an order dict for the statuses
status_order = {key: i for i, key in enumerate(['out', 'in', 'par', 'run'])}
and then use it in the key function, with -1 as the default so that an unknown status sorts first and never yields None (which cannot be compared with an int on Python 3)
lambda k: (k.get('name', k.get('pname')), status_order.get(k['status'], -1))
I haven't tested it, so it might need some tweaking
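A small check of the key-function approach against the question's data (with the dict syntax corrected). The -1 default for unknown statuses is what places 'unknown' ahead of 'run' for 'aeta'; the helper name sort_key is made up for this sketch.

```python
init_list = [
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'name': 'gama', 'status': 'par'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'aeta', 'status': 'unknown'},
    {'pname': 'boc', 'status': 'run'},
]

status_order = {key: i for i, key in enumerate(['out', 'in', 'par', 'run'])}

def sort_key(k):
    # Fall back to 'pname' when 'name' is absent; unknown statuses rank -1.
    return (k.get('name', k.get('pname')), status_order.get(k['status'], -1))

newlist = sorted(init_list, key=sort_key)
for entry in newlist:
    print(entry)
```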

How to sort a list a string of two list path in python?

I have two list that contains the path of files
lst_A =['/home/data_A/test_AA_123.jpg',
'/home/data_A/test_AB_234.jpg',
'/home/data_A/test_BB_321.jpg',
'/home/data_A/test_BC_112.jpg',
]
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_BC_777.jpg',
]
Based on lst_A, I want to sort lst_B so that, position by position, the first two parts of the basename (test_xx here) are the same in A and B. So the expected sorted list B is
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_BC_777.jpg',
]
In addition, I want to indicate at which positions the two lists share the same first two parts of the basename (such as test_xx), so the indicator array should be
array_same =[1,1,0,1]
How should I do this in Python? I have tried the .sort() method, but it returns an unexpected result. Thanks
Update: This is my solution
import os
lst_A =['/home/data_A/test_AA_123.jpg',
'/home/data_A/test_AB_234.jpg',
'/home/data_A/test_BB_321.jpg',
'/home/data_A/test_BC_112.jpg',
]
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_BC_777.jpg']
lst_B_sort=[]
same_array=[]
for ind_a, a_name in enumerate(lst_A):
    for ind_b, b_name in enumerate(lst_B):
        print (os.path.basename(b_name).split('_')[1])
        if os.path.basename(b_name).split('_')[1] in os.path.basename(a_name):
            lst_B_sort.append(b_name)
            same_array.append(1)
print(lst_B_sort)
print(same_array)
Output: ['/home/data_B/test_AA_222.jpg', '/home/data_B/test_AB_555.jpg', '/home/data_B/test_BC_777.jpg']
[1, 1, 1]
The result is incomplete because I never add the elements that have no matching name.
We will discuss the issue with a SIMPLE technique followed by an APPLIED solution.
SIMPLE
We just focus on sorting the names given a key.
Given
Simple names and a key list:
lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()
key_list = [1, 1, 0, 1, 0]
Code
same = sorted(set(lst_a) & set(lst_b))
diff = sorted(set(lst_b) - set(same))
isame, idiff = iter(same), iter(diff)
[next(isame) if x else next(idiff) for x in key_list]
# ['AA', 'AB', 'CC', 'BC', 'DD']
lst_b gets sorted according to elements shared with lst_a first. Remnants are inserted as desired.
Details
This problem is mainly reduced to sorting the intersection of names from both lists. The intersection is a set of common elements called same. The remnants are in a set called diff. We sort same and diff and here's what they look like:
same
# ['AA', 'AB', 'BC']
diff
# ['CC', 'DD']
Now we just want to pull a value from either list, in order, according to the key. We start by iterating the key_list. If 1, pull from the isame iterator. Otherwise, pull from idiff.
Now that we have the basic technique, we can apply it to the more complicated path example.
APPLIED
Applying this idea to more complicated path-strings:
Given
import pathlib
lst_a = "foo/t_AA_a.jpg foo/t_AB_a.jpg foo/t_BB_a.jpg foo/t_BC_a.jpg foo/t_EE_a.jpg".split()
lst_b = "foo/t_AA_b.jpg foo/t_DD_b.jpg foo/t_CC_b.jpg foo/t_AB_b.jpg foo/t_BC_b.jpg".split()
key_list = [1, 1, 0, 1, 0]
# Helper
def get_name(s_path):
    """Return the shared 'name' from a string path.

    Examples
    --------
    >>> get_name("foo/test_xx_a.jpg")
    'test_xx'

    """
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]
Code
Map the names to paths:
name_path_a = {get_name(p): p for p in lst_a}
name_path_b = {get_name(p): p for p in lst_b}
Names are in dict keys, so directly substitute sets with dict keys:
same = sorted(name_path_a.keys() & name_path_b.keys())
diff = sorted(name_path_b.keys() - set(same))
isame, idiff = iter(same), iter(diff)
Get the paths via names pulled from iterators:
[name_path_b[next(isame)] if x else name_path_b[next(idiff)] for x in key_list]
Output
['foo/t_AA_b.jpg',
'foo/t_AB_b.jpg',
'foo/t_CC_b.jpg',
'foo/t_BC_b.jpg',
'foo/t_DD_b.jpg']
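The examples above take key_list as given. One way it could be derived from the two lists, and used to align lst_b to lst_a position by position, is sketched below with the question's data. This is a variation on the technique rather than the same code: matches follow the order of lst_a instead of sorted order, and it assumes both lists have the same length and that each name occurs at most once per list.

```python
import pathlib

def get_name(s_path):
    """Return the shared 'name' part, e.g. 'test_AA' from '.../test_AA_123.jpg'."""
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]

lst_a = ['/home/data_A/test_AA_123.jpg', '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg', '/home/data_A/test_BC_112.jpg']
lst_b = ['/home/data_B/test_AA_222.jpg', '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg', '/home/data_B/test_BC_777.jpg']

name_path_b = {get_name(p): p for p in lst_b}
names_a = [get_name(p) for p in lst_a]

# 1 where lst_b has a path with the same name as lst_a at that position.
key_list = [1 if name in name_path_b else 0 for name in names_a]

# Paths in lst_b with no partner in lst_a fill the remaining slots.
leftovers = iter(sorted(name_path_b.keys() - set(names_a)))
sorted_b = [name_path_b[name] if flag else name_path_b[next(leftovers)]
            for name, flag in zip(names_a, key_list)]

print(key_list)  # [1, 1, 0, 1]
print(sorted_b)
```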
Loop through lst_A, get the filename prefix, then append the element from lst_B with the same prefix to the result list.
Create a set of all the elements from lst_B, and when you add a path to the result, remove it from the set. Then at the end you can go through this set, filling in the blank spaces in the result where there were no matches.
import os

lst_A =['/home/data_A/test_AA_123.jpg',
        '/home/data_A/test_AB_234.jpg',
        '/home/data_A/test_BB_321.jpg',
        '/home/data_A/test_BC_112.jpg',
        ]
lst_B =['/home/data_B/test_AA_222.jpg',
        '/home/data_B/test_CC_444.jpg',
        '/home/data_B/test_AB_555.jpg',
        '/home/data_B/test_BC_777.jpg',
        ]
new_lst_B = []
same_array = []
set_B = set(lst_B)
for fn in lst_A:
    prefix = "_".join(os.path.basename(fn).split('_')[:-1])+'_' # This gets test_AA_
    try:
        found_B = next(x for x in lst_B if os.path.basename(x).startswith(prefix))
        new_lst_B.append(found_B)
        same_array.append(1)
        set_B.remove(found_B)
    except StopIteration: # No match found
        new_lst_B.append(None) # Placeholder to fill in
        same_array.append(0)
# Fill each placeholder with a path from lst_B that had no match
for missed in set_B:
    index = new_lst_B.index(None)
    new_lst_B[index] = missed

List of dicts: Getting list of matching dictionary based on id

I'm trying to get the matching IDs and store the data into one list. I have a list of dictionaries:
list = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
Expected output would be something like
# {'id':'123','name':'Jason','location': ['McHale', 'Tompson Hall']},
# {'id':'432','name':'Tom','location': 'Sydney'},
How can I get matching data based on dict ID value? I've tried:
for item in mylist:
    list2 = []
    row = any(list['id'] == list.id for id in list)
    list2.append(row)
This doesn't work (it throws: TypeError: tuple indices must be integers or slices, not str). How can I get all items with the same ID and store into one dict?
First, you're iterating through the list of dictionaries in your for loop, but never referencing the current dictionary, which is stored in item. I think when you wrote list['id'] you meant item['id'].
Second, any() returns a boolean (true or false), which isn't what you want. Instead, maybe try row = [dic for dic in list if dic['id'] == item['id']]
Third, if you define list2 within your for loop, it will go away every iteration. Move list2 = [] before the for loop.
That should give you a good start. Remember that row is just a list of all dictionaries that have the same id.
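Putting those three points together gives roughly the following. It is a sketch of the corrected loop only, using mylist as the variable name instead of list; it collects the matching dictionaries but does not merge them yet.

```python
mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]

list2 = []  # created once, before the loop, so it survives every iteration
for item in mylist:
    # All dictionaries that share the current item's id (including item itself).
    row = [dic for dic in mylist if dic['id'] == item['id']]
    list2.append(row)

print([len(row) for row in list2])  # [2, 1, 2]
```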
I would use kdopen's approach along with a merging method after converting the dictionary entries I expect to become lists into lists. Of course if you want to avoid redundancy then make them sets.
mylist = [
{'id':'123','name':['Jason'],'location': ['McHale']},
{'id':'432','name':['Tom'],'location': ['Sydney']},
{'id':'123','name':['Jason'],'location':['Tompson Hall']}
]
def merge(mylist,ID):
    matches = [d for d in mylist if d['id']== ID]
    shell = {'id':ID,'name':[],'location':[]}
    for m in matches:
        shell['name']+=m['name']
        shell['location']+=m['location']
        mylist.remove(m)
    mylist.append(shell)
    return mylist

updated_list = merge(mylist,'123')
Given this input
mylist = [
{'id':'123','name':'Jason','location': 'McHale'},
{'id':'432','name':'Tom','location': 'Sydney'},
{'id':'123','name':'Jason','location':'Tompson Hall'}
]
You can just extract it with a comprehension
matched = [d for d in mylist if d['id'] == '123']
Then you want to merge the locations. Assuming matched is not empty
final = matched[0]
final['location'] = [d['location'] for d in matched]
Here it is in the interpreter
In [1]: mylist = [
...: {'id':'123','name':'Jason','location': 'McHale'},
...: {'id':'432','name':'Tom','location': 'Sydney'},
...: {'id':'123','name':'Jason','location':'Tompson Hall'}
...: ]
In [2]: matched = [d for d in mylist if d['id'] == '123']
In [3]: final=matched[0]
In [4]: final['location'] = [d['location'] for d in matched]
In [5]: final
Out[5]: {'id': '123', 'location': ['McHale', 'Tompson Hall'], 'name': 'Jason'}
Obviously, you'd want to replace '123' with a variable holding the desired id value.
Wrapping it all up in a function:
def merge_all(df):
    ids = {d['id'] for d in df}
    result = []
    for id in ids:
        matches = [d for d in df if d['id'] == id]
        combined = matches[0]
        combined['location'] = [d['location'] for d in matches]
        result.append(combined)
    return result
Also, please don't use list as a variable name. It shadows the builtin list class.
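Two details of merge_all may matter in practice: it overwrites 'location' on the first matching dict of the input, and it returns the groups in set iteration order, which is arbitrary. A single-pass sketch that leaves the input untouched and keeps first-seen order (on Python 3.7+, where dicts preserve insertion order) could look like this; merge_by_id is a made-up name, and every 'location' becomes a list, even for a single entry.

```python
def merge_by_id(records):
    """Group records by 'id', collecting each group's locations into a list."""
    merged = {}
    for record in records:
        # Copy the first record seen for an id, starting with an empty location list.
        entry = merged.setdefault(record['id'], {**record, 'location': []})
        entry['location'].append(record['location'])
    return list(merged.values())

mylist = [
    {'id': '123', 'name': 'Jason', 'location': 'McHale'},
    {'id': '432', 'name': 'Tom', 'location': 'Sydney'},
    {'id': '123', 'name': 'Jason', 'location': 'Tompson Hall'},
]
merged = merge_by_id(mylist)
print(merged)
```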

Python 2.7 : delete item from list by value

After performing some operations I get a list as following :
FreqItemset(items=[u'A_String_0'], freq=303)
FreqItemset(items=[u'A_String_0', u'Another_String_1'], freq=302)
FreqItemset(items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301)
I'd like to remove from the list every entry whose items start with A_String_0, but keep the other entries (it doesn't matter if A_String_0 appears in the middle or at the end of an entry's items).
So in the example above, delete lines 1 and 2 and keep line 3.
I tried
filter(lambda a: a != 'A_String_0', result)
and
result.remove('A_String_0')
Neither of these helps.
It is as simple as this:
from pyspark.mllib.fpm import FPGrowth
sets = [
FPGrowth.FreqItemset(
items=[u'A_String_0'], freq=303),
FPGrowth.FreqItemset(
items=[u'A_String_0', u'Another_String_1'], freq=302),
FPGrowth.FreqItemset(
items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301)
]
[x for x in sets if x.items[0] != 'A_String_0']
## [FreqItemset(items=['B_String_1', 'A_String_0', 'A_OtherString_1'], freq=301)]
In practice it would be better to filter before collecting:
filtered_sets = (model
    .freqItemsets()
    .filter(lambda x: x.items[0] != 'A_String_0')
    .collect())
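To try the list-comprehension filter without a Spark installation, a namedtuple with the same two fields can stand in for FreqItemset. The stand-in is an assumption made only for this sketch. It also shows why the question's attempts had no effect: they compared whole FreqItemset objects with a string, whereas the test has to look at the first element of each object's items.

```python
from collections import namedtuple

# Stand-in for pyspark's FreqItemset so the example runs without Spark (assumption).
FreqItemset = namedtuple('FreqItemset', ['items', 'freq'])

result = [
    FreqItemset(items=[u'A_String_0'], freq=303),
    FreqItemset(items=[u'A_String_0', u'Another_String_1'], freq=302),
    FreqItemset(items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301),
]

# Comparing each object with a string is always "not equal", so nothing is removed.
unchanged = [a for a in result if a != 'A_String_0']

# Compare the first element of items instead.
kept = [x for x in result if x.items[0] != 'A_String_0']
print(kept)
```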
How about result = result if result[0] != 'A_String_0' else result[1:]?
It seems that you are using a list of FreqItemset objects. However, the name suggests that a set-like structure, rather than a list, would be a better fit.
With a dictionary, you could have a collection of searchable (string, frequency) pairs. For example:
>>> d = { "the": 2, "a": 3 }
>>> d[ "the" ]
2
>>> d[ "the" ] = 4
>>> d[ "a" ]
3
>>> del d[ "a" ]
>>> d
{'the': 4}
You can easily access each word (which is a key of the dictionary), change its value (its frequency of occurrence), or remove it. None of these operations needs to scan every element, since it is a dictionary, so performance is good (better than using a list, anyway).
Just my two cents.
