Sorting a 2D list with a custom key in Python
So I have a 2D list and want to sort it using a second file of keys. Does anyone know how I would go about doing that?
Here's an example input file:
first_nm,last_nm,gender,cwid,cred_hrs,qual_pts,gpa
John,Roe,M,44444444,40,150,3.75
Jane,Roe,F,66666666,100,260,2.6
John,Doe,M,22222222,50,140,2.8
Jane,Doe,F,88888888,80,280,3.5
Penny,Lowe,F,55555555,40,140,3.5
Lenny,Lowe,M,11111111,100,280,2.8
Denny,Lowe,M,99999999,80,260,3.25
Benny,Lowe,M,77777777,120,90,0.75
Jenny,Lowe,F,33333333,50,90,1.8
Zoe,Coe,F,0,50,130,2.6
Here are the keys to sort it by (there could be more or fewer, depending on how you want to sort):
gender,ascend,string
gpa,descend,float
last_nm,ascend,string
And here would be the output for that input and keys:
first_nm,last_nm,gender,cwid,cred_hrs,qual_pts,gpa
Jane,Doe,F,88888888,80,280,3.5
Penny,Lowe,F,55555555,40,140,3.5
Zoe,Coe,F,00000000,50,130,2.6
Jane,Roe,F,66666666,100,260,2.6
Jenny,Lowe,F,33333333,50,90,1.8
John,Roe,M,44444444,40,150,3.75
Denny,Lowe,M,99999999,80,260,3.25
John,Doe,M,22222222,50,140,2.8
Lenny,Lowe,M,11111111,100,280,2.8
Benny,Lowe,M,77777777,120,90,0.75
I was thinking of just using the built-in sort(), but I wasn't sure whether I could use it when sorting three different times. I think I would have to sort backwards? (last_nm, then gpa, then gender)
You can return a tuple from your key function to create complex sorts. And as a quick trick, multiply numeric values by -1 for a reverse sort. Your example would look something like this:
lists.sort(key=lambda x: (x[2], x[6] * -1, x[1]))
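As a fuller sketch, assuming the rows were read from the CSV as strings, so gpa needs a float() cast (the filename is made up):

    import csv

    with open("students.csv", newline="") as f:  # hypothetical filename
        reader = csv.reader(f)
        header = next(reader)  # skip the header row
        lists = list(reader)

    # gender ascending, gpa descending (negated), last name ascending
    lists.sort(key=lambda x: (x[2], -float(x[6]), x[1]))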
The list sort() method takes a boolean parameter reverse, but it applies to the whole key; you can't say that you want some parts of the key to use ascending sort and others to use descending. Sadly, there isn't a simple way to extend g.d.d.c's trick of multiplying by -1 to non-numeric data.
So if you need to handle arbitrary combinations of ascending and descending then yes, you will have to sort multiple times, working backwards over your list of keys, like you mention in your question. The built-in Python sorting algorithm, timsort, is a stable sort, which means each time you sort your 2D list with a different key the previous sort results won't get scrambled.
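For example, here is a rough sketch of that multi-pass approach, assuming the key file lines look like gpa,descend,float as in the question, and that header and lists were read as in the earlier sketch (the filename and the cast table are assumptions):

    # Map column names to indices and type names to conversion functions.
    col = {name: i for i, name in enumerate(header)}
    casts = {"string": str, "float": float}

    with open("keys.txt") as f:  # hypothetical filename
        specs = [line.strip().split(",") for line in f if line.strip()]

    # Sort by the lowest-priority key first; because the sort is stable, each
    # later (higher-priority) pass preserves the earlier ordering within ties.
    for name, direction, typ in reversed(specs):
        lists.sort(key=lambda r, i=col[name], c=casts[typ]: c(r[i]),
                   reverse=(direction == "descend"))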
Related
How to split a dataframe and select all possible pairs?
I have a dataframe that I want to separate in order to apply a certain function. I have the fields df['beam'], df['track'], df['cycle'] and want to separate it by the unique values of each of these three. Then I want to apply this function (it works between two individual dataframes) to each pair of groups where df['track'] differs between the two. Also, the result doesn't change if you switch the order of the pair, so I'd like to avoid unnecessary calls to the function if possible. I currently work through it with four nested for loops feeding an if conditional, but I'm absolutely sure there's a better, cleaner way. I'd appreciate all help!

Edit: I ended up solving it like this:

1. I split the original dataframe into multiple frames using df.groupby():

    dfsplit = df.groupby(['beam','track','cycle'])

This generates a dictionary where the keys are all the unique ['beam','track','cycle'] combinations as tuples.

2. I combined all possible ['beam','track','cycle'] pairs with itertools.combinations():

    keys = list(itertools.combinations(dfsplit.keys(), 2))

This generates a list of 2-element tuples where each element is one ['beam','track','cycle'] tuple itself, and it doesn't include the tuple with the order swapped, so I avoid calling the function twice for what would be the same case.

3. I removed the combinations where 'track' was the same through a for loop:

    for k in keys.copy():
        if k[0][1] == k[1][1]:
            keys.remove(k)

4. Now I can call my function by looping through the list of combinations:

    for k in keys:
        function(dfsplit[k[0]], dfsplit[k[1]])

Step 3 is taking a long time, probably because I have a very large number of unique ['beam','track','cycle'] combinations so the list is very long, but probably also because I'm doing it sub-optimally. I'll keep the question open in case someone realizes a better way to do this last step.

EDIT 2: Solved the problem with step 3, once again with itertools, just by doing:

    keys = list(itertools.filterfalse(lambda k: k[0][1] == k[1][1], keys))

itertools.filterfalse returns all elements of the list for which the function returns false, so it does the same as the previous for loop but selects the false instead of removing the true. It's very fast, and I believe this solves my problem for good.
I don't know how to mark the question as solved, so I'll just repeat the solution here:

    dfsplit = df.groupby(['beam','track','cycle'])
    keys = list(itertools.combinations(dfsplit.keys(), 2))
    keys = list(itertools.filterfalse(lambda k: k[0][1] == k[1][1], keys))
    for k in keys:
        function(dfsplit[k[0]], dfsplit[k[1]])
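Here is a self-contained sketch of that recap with made-up data and a stand-in function; note that the snippets above use dfsplit.keys() and dfsplit[k], which the standard GroupBy API spells dfsplit.groups.keys() and dfsplit.get_group(k):

    import itertools
    import pandas as pd

    # Made-up data for illustration.
    df = pd.DataFrame({
        "beam":  [1, 1, 2, 2],
        "track": ["a", "b", "a", "b"],
        "cycle": [1, 1, 1, 1],
        "value": [0.1, 0.2, 0.3, 0.4],
    })

    def function(df1, df2):  # stand-in for the real pairwise function
        return len(df1) + len(df2)

    dfsplit = df.groupby(["beam", "track", "cycle"])
    keys = list(itertools.combinations(dfsplit.groups.keys(), 2))
    keys = list(itertools.filterfalse(lambda k: k[0][1] == k[1][1], keys))
    for k in keys:
        function(dfsplit.get_group(k[0]), dfsplit.get_group(k[1]))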
Tuple-key dictionary in python: Accessing a whole block of entries
I am looking for an efficient Python method to utilise a hash table that has two keys, e.g.:

    (1,5) --> {a}
    (2,3) --> {b,c}
    (2,4) --> {d}

Further, I need to be able to retrieve whole blocks of entries, for example all entries that have 2 at the 0-th position (here: (2,3) as well as (2,4)). In another post it was suggested to use a comprehension, i.e.:

    sum(val for key, val in dict.items() if key[0] == 'B')

I learned that dictionaries are (probably?) the most efficient way to retrieve a value from an object of key:value pairs. However, querying with an incomplete tuple key is a bit different from querying with the whole key, where I either get a value or nothing. I want to ask: can Python still return the values in time proportional to the number of key:value pairs that match? Or alternatively, is the tuple dictionary (plus list comprehension) better than using pandas.df.groupby() (though that would occupy rather a lot of memory)?
The "standard" way would be something like d = {(randint(1,10),i):"something" for i,x in enumerate(range(200))} def byfilter(n,d): return list(filter(lambda x:x==n, d.keys())) byfilter(5,d) ##returns a list of tuples where x[0] == 5 Although in similar situations I often used next() to iterate manually, when I didn't need the full list. However there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by key first element, and you know the dict keys are not changing meanwhile. Then you can extract the keys in a list and sort it, and make use of some itertools functions, namely dropwhile() and takewhile(): ls = [x for x in d.keys()] ls.sort() ##I do not know why but this seems faster than ls=sorted(d.keys()) def bysorted(n,ls): return list(takewhile(lambda x: x[0]==n, dropwhile(lambda x: x[0]!=n, ls))) bysorted(5,ls) ##returns the same list as above This can be up to 10x faster in the best case (i=1 in my example) and more or less take the same time in the worst case (i=10) because we are trimming the number of iterations needed. Of course you can do the same for accessing keys by x[1], you just need to add a key parameter to the sort() call
Is it possible to sort a list of strings that represent Filipino numbers?
I have list = ["dalawa", "tatlo", "apat", "siyam", "isa"]. Is there a way to sort this to list = ["isa", "dalawa", "tatlo", "apat", "siyam"]? I am new to Python, so I don't have any idea about this.
The Python sort() method sorts a list in alphabetical order. What you can do is map each Filipino number word to its numeric value in a dictionary, and then sort by value. That can be done as follows (I'm making up the values):

    numbers = {"dalawa": 2, "tatlo": 3, "apat": 4, "siyam": 5, "isa": 1}

    # Look up lambda functions to better understand this: the key parameter
    # makes sorted() order the words by their values in the dictionary above
    # instead of alphabetically.
    result = sorted(numbers, key=lambda x: numbers[x])
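With the made-up values above, this prints the words in numeric order:

    print(result)  # ['isa', 'dalawa', 'tatlo', 'apat', 'siyam']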
Sorting a dictionary in Python with several parameters?
I have a dictionary that, when printed with each key and value, looks like this:

    a__________a_______:1
    ___________________:2
    _______________a___:1
    ___________a____a__:1
    ___________a_a_____:1
    __________a__a_____:1
    ________a______a___:1
    ____a____________a_:1
    _____a_____________:1
    __a_______a________:1
    __________a________:2
    ____________a____a_:2
    _______a___________:1

I am trying to sort the dictionary so that it is sorted by value, which is pretty simple. However, each group within the same value has to be sorted by the number of '_'s; and if the number of underscores is the same, then the entries are sorted alphabetically. So, after sorting, it would look like this:

    ___________a____a__:1
    ___________a_a_____:1
    __________a__a_____:1
    ________a______a___:1
    ____a____________a_:1
    __a_______a________:1
    a__________a_______:1
    _______________a___:1
    _______a___________:1
    _____a_____________:1
    ____________a____a_:2
    __________a________:2
    ___________________:2

I understand sorting by value; I am just having trouble sorting within each value group as stated.
You can sort using a tuple that lists your ordering criteria in priority order:

    res = dict(sorted(d.items(), key=lambda it: (it[1], it[0].count('_'), it[0])))
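For example, on a three-entry subset of the data (a minimal check; since Python 3.7, dicts preserve insertion order, so the rebuilt dict stays sorted):

    d = {"a__a_": 1, "_____": 2, "__a__": 1}
    res = dict(sorted(d.items(), key=lambda it: (it[1], it[0].count('_'), it[0])))
    print(res)  # {'a__a_': 1, '__a__': 1, '_____': 2}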
numpy.unique has a problem with frozensets
Just run the code:

    import numpy as np

    a = [frozenset({1,2}), frozenset({3,4}), frozenset({1,2})]
    print(set(a))        # out: {frozenset({3, 4}), frozenset({1, 2})}
    print(np.unique(a))  # out: [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]

The first output is correct, the second is not. The problem, exactly, is here:

    a[0] == a[-1]  # out: True

But the output of np.unique has 3 elements, not 2. I have been using np.unique to work with duplicates (using return_index=True and others). What can you advise me to use instead of np.unique for these purposes?
numpy.unique operates by sorting, then collapsing runs of identical elements. Per the doc string:

    Returns the sorted unique elements of an array.

The "sorted" part implies it's using a sort-collapse-adjacent technique (similar to what the *NIX sort | uniq pipeline accomplishes).

The problem is that while frozenset does define __lt__ (the overload for <, which most Python sorting algorithms use as their basic building block), it's not using it for the purposes of a total ordering like numbers and sequences use it. It's overloaded to test "is a proper subset of" (not including direct equality). So frozenset({1,2}) < frozenset({3,4}) is False, and so is frozenset({3,4}) > frozenset({1,2}).

Because the expected sort invariant is broken, sorting sequences of set-like objects produces implementation-specific and largely useless results. Uniquifying strategies based on sorting will typically fail under those conditions. One possible outcome is that the algorithm decides the sequence is already sorted, in order or in reverse order (since each element is "less than" both the prior and subsequent elements); if it determines the data to be in order, nothing changes, and if in reverse order, it swaps the element order (which in this case is indistinguishable from preserving order). It then removes adjacent duplicates (since post-sort, all duplicates should be grouped together), finds none (the duplicates aren't adjacent), and returns the original data.

For frozensets, you probably want to use hash-based uniquification, e.g. via set or (to preserve the original order of appearance on Python 3.7+) dict.fromkeys; the latter would be simply:

    a = [frozenset({1,2}), frozenset({3,4}), frozenset({1,2})]
    uniqa = list(dict.fromkeys(a))  # works on CPython/PyPy 3.6 as an implementation detail, and on 3.7+ everywhere

It's also possible to use sort-based uniquification, but numpy.unique doesn't support a key function, so it's easier to stick to Python built-in tools:

    from itertools import groupby  # with no key argument, works much like the uniq command line tool

    a = [frozenset({1,2}), frozenset({3,4}), frozenset({1,2})]
    uniqa = [k for k, _ in groupby(sorted(a, key=sorted))]

That second line is a little dense, so I'll break it up:

- sorted(a, key=sorted) returns a new list based on a where each element is ordered by its sorted list form (so the < comparison actually does put like with like)
- groupby(...) returns an iterator of key/group-iterator pairs; with no key argument to groupby, each key is a unique value, and the group-iterator produces that value as many times as it was seen
- [k for k, _ in ...]: since we don't care how many times each duplicate value was seen, we ignore the group-iterator (assigning to _ means "ignored" by convention) and have the list comprehension produce only the keys (the unique values)
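A quick check of both approaches on the example data (outputs shown as comments):

    from itertools import groupby

    a = [frozenset({1,2}), frozenset({3,4}), frozenset({1,2})]
    print(list(dict.fromkeys(a)))                           # [frozenset({1, 2}), frozenset({3, 4})]
    print([k for k, _ in groupby(sorted(a, key=sorted))])   # [frozenset({1, 2}), frozenset({3, 4})]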