List of dictionaries - stack one value of dictionary - python

I have trouble adding up one value of a dictionary when conditions are met. For example, I have this list of dictionaries:
[{'plu': 1, 'price': 150, 'quantity': 2, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 7, 'stock': 10},
{'plu': 1, 'price': 150, 'quantity': 6, 'stock': 5},
{'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 3, 'stock': 10}
]
Then output should look like this:
[{'plu': 1, 'price': 150, 'quantity': 8, 'stock': 5},
{'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
{'plu': 2, 'price': 150, 'quantity': 10, 'stock': 10}
]
Quantity should be added only if plu and price are the same; other key:value pairs (e.g. stock) should be ignored. What is the most efficient way to do that?
Edit:
I tried:
import itertools as it
keyfunc = lambda x: x['plu']
groups = it.groupby(sorted(new_data, key=keyfunc), keyfunc)
x = [{'plu': k, 'quantity': sum(x['quantity'] for x in g)} for k, g in groups]
But it groups only on plu, and when I build the HTML table in Django I get only the quantity value; the other columns are empty.

You need to sort/groupby on the combined key, not just one key. The easiest and most efficient way to build that key is with operator.itemgetter. To preserve an arbitrary stock value, you'll need to use each group twice, so you'll need to convert it to a sequence:
from operator import itemgetter

keyfunc = itemgetter('plu', 'price')
# Unpack the key and listify g so it can be reused
groups = ((plu, price, list(g))
          for (plu, price), g in it.groupby(sorted(new_data, key=keyfunc), keyfunc))
x = [{'plu': plu, 'price': price, 'stock': g[0]['stock'],
      'quantity': sum(x['quantity'] for x in g)}
     for plu, price, g in groups]
Alternatively, if stock is guaranteed to be the same for each unique plu/price pair, you can include it in the key to simplify matters, so you don't need to listify the groups:
keyfunc = itemgetter('plu', 'price', 'stock')
groups = it.groupby(sorted(new_data, key=keyfunc), keyfunc)
x = [{'plu': plu, 'price': price, 'stock': stock,
      'quantity': sum(x['quantity'] for x in g)}
     for (plu, price, stock), g in groups]
Optionally, you could create getquantity = itemgetter('quantity') at top level (like the keyfunc) and change sum(x['quantity'] for x in g) to sum(map(getquantity, g)), which pushes the work to the C layer in CPython and can be faster if your groups are large.
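For instance, a minimal sketch of that variant (assuming groups has been rebuilt as in the first snippet, since the generator is consumed on use):
from operator import itemgetter

getquantity = itemgetter('quantity')  # hoisted to top level, like the keyfunc

x = [{'plu': plu, 'price': price, 'stock': g[0]['stock'],
      'quantity': sum(map(getquantity, g))}  # map pushes the per-item lookups to the C layer
     for plu, price, g in groups]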
The other approach is to avoid sorting entirely using collections.Counter (or collections.defaultdict(int), though Counter makes the intent more clear here):
from collections import Counter

grouped = Counter()
for plu, price, stock, quantity in map(itemgetter('plu', 'price', 'stock', 'quantity'), new_data):
    grouped[plu, price, stock] += quantity
then convert back to your preferred form with:
x = [{'plu': plu, 'price': price, 'stock': stock, 'quantity': quantity}
     for (plu, price, stock), quantity in grouped.items()]
This should be faster for large inputs, since it replaces O(n log n) sorting work with O(n) dict operations (which are amortized O(1) each).
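If you want to check the crossover on your own data, here is a rough, self-contained timeit sketch (the input is invented for illustration):
import timeit
import itertools as it
from collections import Counter
from operator import itemgetter
from random import randrange

# Invented input: many rows, relatively few distinct plu/price/stock combinations
big = [{'plu': randrange(50), 'price': randrange(100, 300),
        'quantity': randrange(1, 10), 'stock': 5}
       for _ in range(100000)]
keyfunc = itemgetter('plu', 'price', 'stock')

def with_sort():
    return [{'plu': plu, 'price': price, 'stock': stock,
             'quantity': sum(x['quantity'] for x in g)}
            for (plu, price, stock), g in it.groupby(sorted(big, key=keyfunc), keyfunc)]

def with_counter():
    grouped = Counter()
    for plu, price, stock, quantity in map(itemgetter('plu', 'price', 'stock', 'quantity'), big):
        grouped[plu, price, stock] += quantity
    return [{'plu': plu, 'price': price, 'stock': stock, 'quantity': quantity}
            for (plu, price, stock), quantity in grouped.items()]

print('sort/groupby:', timeit.timeit(with_sort, number=10))
print('Counter:     ', timeit.timeit(with_counter, number=10))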

Using pandas makes this a trivial problem:
import pandas as pd
data = [{'plu': 1, 'price': 150, 'quantity': 2, 'stock': 5},
        {'plu': 2, 'price': 150, 'quantity': 7, 'stock': 10},
        {'plu': 1, 'price': 150, 'quantity': 6, 'stock': 5},
        {'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
        {'plu': 2, 'price': 150, 'quantity': 3, 'stock': 10}]
df = pd.DataFrame.from_records(data)
# df
#
#    plu  price  quantity  stock
# 0    1    150         2      5
# 1    2    150         7     10
# 2    1    150         6      5
# 3    1    200         4      5
# 4    2    150         3     10
new_df = df.groupby(['plu','price','stock'], as_index=False).sum()
new_df = new_df[['plu','price','quantity','stock']] # Optional: reorder the columns
# new_df
#
#    plu  price  quantity  stock
# 0    1    150         8      5
# 1    1    200         4      5
# 2    2    150        10     10
And finally, if you want to, port it back to a list of dicts (though I would argue pandas gives you a lot more functionality for handling the data):
new_data = new_df.to_dict(orient='records')
# new_data
#
# [{'plu': 1, 'price': 150, 'quantity': 8, 'stock': 5},
#  {'plu': 1, 'price': 200, 'quantity': 4, 'stock': 5},
#  {'plu': 2, 'price': 150, 'quantity': 10, 'stock': 10}]
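As an aside, if stock could in fact differ within a plu/price pair and you would rather keep the first value seen than group by it, pandas named aggregation is one option. A sketch (requires pandas 0.25+):
new_df = (df.groupby(['plu', 'price'], as_index=False)
            .agg(quantity=('quantity', 'sum'),   # sum quantities per plu/price pair
                 stock=('stock', 'first')))      # keep the first stock value seen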

Related

Repeating values when we use append function in list or update in dict

When I try to append a dictionary's values to a list, the same value gets appended multiple times. When I inspect what is happening with a print statement, the values iterate properly, but the append keeps adding the same value over and over.
Here is my code
from random import randint

x = {'1': {'item_id': 6,
           'item_name': 'burger',
           'item_price': 10,
           'item_quantity': 2},
     '2': {'item_id': 7,
           'item_name': 'pizza',
           'item_price': 15,
           'item_quantity': 4},
     '3': {'item_id': 8,
           'item_name': 'Biryani',
           'item_price': 20,
           'item_quantity': 6}}

cart = []
items = {}
for y in x.values():
    items['name'] = y['item_name']
    items['price'] = y['item_price']
    items['quantity'] = y['item_quantity']
    print(items)
    cart.append(items)
print(cart)
And This is the Output:
{'name': 'burger', 'price': 10, 'quantity': 2}
{'name': 'pizza', 'price': 15, 'quantity': 4}
{'name': 'Biryani', 'price': 20, 'quantity': 6}
[{'name': 'Biryani', 'price': 20, 'quantity': 6}, {'name': 'Biryani', 'price': 20, 'quantity': 6}, {'name': 'Biryani', 'price': 20, 'quantity': 6}]
You are mutating and appending the same dict object to cart again and again. Instantiate a new dict with items = {} in each iteration of the for loop.
Try initializing the dictionary inside the loop every time:
cart = []
for y in x.values():
    items = {}
    items['name'] = y['item_name']
    items['price'] = y['item_price']
    items['quantity'] = y['item_quantity']
    print(items)
    cart.append(items)
print(cart)
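Alternatively, since each pass builds an independent dict anyway, a list comprehension (a sketch equivalent to the loop above) sidesteps the aliasing problem entirely:
# Each iteration of the comprehension creates a brand-new dict
cart = [{'name': y['item_name'], 'price': y['item_price'], 'quantity': y['item_quantity']}
        for y in x.values()]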

Correlation in Apache Spark and groupBy with Python

I'm new to Python and Apache Spark, and I'm trying to understand how the function pyspark.sql.functions.corr(val1, val2) works.
I have a big DataFrame with auto brand, age and price. I want to get the correlation between age and price for each auto brand.
I have 2 solutions:
import pyspark.sql.functions as F

# get all brands
get_all_maker = data.groupBy("brand").agg(F.count("*").alias("counts")).collect()
for row in get_all_maker:
    print(row["brand"], ": ", data.filter(data["brand"] == row["brand"]).corr("age", "price"))
This solution is slow, because it calls corr once per brand. So I tried to do it with one aggregation:
get_all_maker_corr = data.groupBy("brand").agg(
    F.count("*").alias("counts"),
    F.corr("age", "price").alias("correlation")).collect()
for row in get_all_maker_corr:
    print(row["brand"], ": ", row["correlation"])
If I compare the results, they are different. But why?
I tried it with simple examples. Here I generate a simple data frame:
d = [
    {'name': 'a', 'age': 1, 'price': 2},
    {'name': 'a', 'age': 2, 'price': 4},
    {'name': 'b', 'age': 1, 'price': 1},
    {'name': 'b', 'age': 2, 'price': 2}
]
b = spark.createDataFrame(d)
Let's test two methods:
# first version
get_all_maker = b.groupBy("name").agg(F.count("*").alias("counts")).collect()
print("Correlation (1st)")
for row in get_all_maker:
    print(row["name"], "(", row["counts"], "):", b.filter(b["name"] == row["name"]).corr("age", "price"))

# second version
get_all_maker_corr = b.groupBy("name").agg(
    F.count("*").alias("counts"),
    F.corr("age", "price").alias("correlation")).collect()
print("Correlation (2nd)")
for row in get_all_maker_corr:
    print(row["name"], "(", row["counts"], "):", row["correlation"])
Both of them give me the same answer:
Correlation (1st)
b ( 2 ): 1.0
a ( 2 ): 1.0
Let's add another entry to data frame with None-value:
d = [
    {'name': 'a', 'age': 1, 'price': 2},
    {'name': 'a', 'age': 2, 'price': 4},
    {'name': 'a', 'age': 3, 'price': None},
    {'name': 'b', 'age': 1, 'price': 1},
    {'name': 'b', 'age': 2, 'price': 2}
]
b = spark.createDataFrame(d)
With the first version you get these results:
Correlation (1st)
b ( 2 ): 1.0
a ( 3 ): -0.5
and the second version brings you different results:
Correlation (2nd)
b ( 2 ): 1.0
a ( 3 ): 1.0
I think that DataFrame.filter followed by corr treats the None value as 0,
while DataFrame.groupBy with F.corr inside agg ignores rows with a None value.
So these two methods are not equivalent. I don't know if this is a bug or a feature of Spark, but if you want to compute a correlation, the data should first be cleaned of None values.
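If you want the two approaches to agree, one option (a sketch, reusing the toy DataFrame b and the F alias from above) is to drop the rows with nulls up front:
clean = b.na.drop(subset=["age", "price"])  # discard rows where age or price is null
get_all_maker_corr = clean.groupBy("name").agg(
    F.count("*").alias("counts"),
    F.corr("age", "price").alias("correlation")).collect()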

Creating a complex nested dictionary from multiple lists in Python

I am struggling to create a nested dictionary with the following data:
Team, Group, ID, Score, Difficulty
OneTeam, A, 0, 0.25, 4
TwoTeam, A, 1, 1, 10
ThreeTeam, A, 2, 0.64, 5
FourTeam, A, 3, 0.93, 6
FiveTeam, B, 4, 0.5, 7
SixTeam, B, 5, 0.3, 8
SevenTeam, B, 6, 0.23, 9
EightTeam, B, 7, 1.2, 4
Once imported as a pandas DataFrame, I turn each feature into these lists:
teams, group, id, score, diff.
Using this Stack Overflow answer, Create a complex dictionary using multiple lists, I can create the following dictionary:
{'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
 'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
 'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
 'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
 'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
 'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
 'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
 'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}
using the code:
{team: {'id': i, 'score': s, 'diff': d} for team, i, s, d in zip(teams, id, score, diff)}
But what I'm after is having 'Group' as the main key, then team, and then id, score and difficulty within the team (as above).
I have tried:
{g: {team: {'id': i, 'score': s, 'diff': d}} for g, team, i, s, d in zip(group, teams, id, score, diff)}
but this doesn't work and results in only one team per group within the dictionary:
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93}},
 'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2}}}
Below is how the dictionary should look, but I'm not sure how to get there - any help would be much appreciated!
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
       'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
       'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
       'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}},
 'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
       'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
       'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
       'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3}}}
A dict comprehension may not be the best way of solving this if your data is stored in a table like this.
Try something like
from collections import defaultdict

groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, id, score, diff):
    groups[g][team] = {'id': i, 'score': s, 'diff': d}
By using defaultdict, if groups[g] already exists, the new team is added as a key; if it doesn't, an empty dict is automatically created and the new team is inserted into it.
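If you need a plain dict at the end (for example to pprint or serialize it), the defaultdict converts trivially; a quick sketch using the lists from the question:
groups = dict(groups)  # drops the default factory; the nested dicts are already plain
print(groups['A']['OneTeam'])  # -> {'id': 0, 'score': 0.25, 'diff': 4}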
Edit: you edited your question to say that your data is in a pandas DataFrame. You can definitely skip the step of turning the columns into lists. Instead you could, for example, do:
from collections import defaultdict

groups = defaultdict(dict)
for row in df.itertuples():
    groups[row.Group][row.Team] = {'id': row.ID, 'score': row.Score, 'diff': row.Difficulty}
If you absolutely want to use comprehension, then this should work:
z = list(zip(teams, group, id, score, diff))  # materialize; a bare zip would be exhausted after the first group
s = set(group)
d = {  # outer dict, one entry for each different group
    group: {  # inner dict, one entry per team, filtered by group
        team: {'id': i, 'score': s, 'diff': d}
        for team, g, i, s, d in z
        if g == group
    }
    for group in s
}
I added linebreaks for clarity
EDIT:
After the comment, to better clarify my intention and out of curiosity, I ran a comparison:
from collections import defaultdict
import timeit

teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam', 'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
id = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]

def no_comprehension():
    global group, teams, id, score, diff
    groups = defaultdict(dict)
    for g, team, i, s, d in zip(group, teams, id, score, diff):
        groups[g][team] = {'id': i, 'score': s, 'diff': d}

def comprehension():
    global group, teams, id, score, diff
    z = list(zip(teams, group, id, score, diff))  # materialized, as above
    s = set(group)
    d = {group: {team: {'id': i, 'score': s, 'diff': d} for team, g, i, s, d in z if g == group} for group in s}

print("no comprehension:")
print(timeit.timeit(lambda: no_comprehension(), number=10000))
print("comprehension:")
print(timeit.timeit(lambda: comprehension(), number=10000))
Output:
no comprehension:
0.027287796139717102
comprehension:
0.028979241847991943
They look the same in terms of performance. With my sentence above, I was just highlighting this as an alternative solution to the one already posted by @JohnO.

Grouping python list of dictionaries and aggregation value data

I have an input list:
inlist = [{"id": 123, "hour": 5, "groups": "1"},
          {"id": 345, "hour": 3, "groups": "1;2"},
          {"id": 65, "hour": -2, "groups": "3"}]
I need to group the dictionaries by their 'groups' value. After that I need to add the min and max hour of each group to the grouped dicts. The output should look like this:
outlist = [(1, [{"id": 123, "hour": 5, "min_group_hour": 3, "max_group_hour": 5},
                {"id": 345, "hour": 3, "min_group_hour": 3, "max_group_hour": 5}]),
           (2, [{"id": 345, "hour": 3, "min_group_hour": 3, "max_group_hour": 3}]),
           (3, [{"id": 65, "hour": -2, "min_group_hour": -2, "max_group_hour": -2}])]
So far I have managed to group the input list:
import itertools
from operator import itemgetter

new_list = []
for domain in inlist:
    for group in domain['groups'].split(';'):
        d = dict()
        d['id'] = domain['id']
        d['group'] = group
        d['hour'] = domain['hour']
        new_list.append(d)
# note: groupby only groups consecutive items, so new_list must already be sorted by 'group' (it is, for this input)
for k, v in itertools.groupby(new_list, key=itemgetter('group')):
    print(int(k), max(list(v), key=itemgetter('hour')))
And the output is:
('1', [{'group': '1', 'id': 123, 'hour': 5}])
('2', [{'group': '2', 'id': 345, 'hour': 3}])
('3', [{'group': '3', 'id': 65, 'hour': -2}])
I don't know how to aggregate the values by group. And is there a more pythonic way of grouping dictionaries by a key value that needs to be split?
Start by creating a dict that maps group numbers to dictionaries:
from collections import defaultdict

dicts_by_group = defaultdict(list)
for dic in inlist:
    groups = map(int, dic['groups'].split(';'))
    for group in groups:
        dicts_by_group[group].append(dic)
This gives us a dict that looks like
{1: [{'id': 123, 'hour': 5, 'groups': '1'},
     {'id': 345, 'hour': 3, 'groups': '1;2'}],
 2: [{'id': 345, 'hour': 3, 'groups': '1;2'}],
 3: [{'id': 65, 'hour': -2, 'groups': '3'}]}
Then iterate over the grouped dicts and set the min_group_hour and max_group_hour for each group:
outlist = []
for group in sorted(dicts_by_group.keys()):
    dicts = dicts_by_group[group]
    min_hour = min(dic['hour'] for dic in dicts)
    max_hour = max(dic['hour'] for dic in dicts)
    dicts = [{'id': dic['id'], 'hour': dic['hour'],
              'min_group_hour': min_hour, 'max_group_hour': max_hour}
             for dic in dicts]
    outlist.append((group, dicts))
Result:
[(1, [{'id': 123, 'hour': 5, 'min_group_hour': 3, 'max_group_hour': 5},
      {'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 5}]),
 (2, [{'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 3}]),
 (3, [{'id': 65, 'hour': -2, 'min_group_hour': -2, 'max_group_hour': -2}])]
IIUC: Here is another way to do it in pandas:
import pandas as pd
inlist = [{"id": 123, "hour": 5, "group": "1"},
          {"id": 345, "hour": 3, "group": "1;2"},
          {"id": 65, "hour": -2, "group": "3"}]
df = pd.DataFrame(inlist)
# Group-wise minimum of hour, renamed to min_hour
dfmi = df.groupby('group', as_index=False)['hour'].min().rename(columns={'hour': 'min_hour'})
# Group-wise maximum of hour, renamed to max_hour
dfmx = df.groupby('group', as_index=False)['hour'].max().rename(columns={'hour': 'max_hour'})
# Merge min df with main df
df = df.merge(dfmi, on='group', how='outer')
# Merge max df with main df
df = df.merge(dfmx, on='group', how='outer')
output = list(df.apply(lambda x: x.to_dict(), axis=1))
# Dictionary of dictionaries
dict_out = df.to_dict(orient='index')
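To get from there to the outlist shape in the question, a sketch (note this answer never splits the "1;2" string, so the group keys stay as raw strings):
outlist = [
    (grp, sub.drop(columns='group')
             .rename(columns={'min_hour': 'min_group_hour', 'max_hour': 'max_group_hour'})
             .to_dict(orient='records'))
    for grp, sub in df.groupby('group')  # iterate (group key, sub-DataFrame) pairs
]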

python sort with adjacent difference

I have a list of items:
items = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'e', 'e', 'e', 'e', 'e', 'e']
I would like to sort it into a mixed-up order, so that at most two duplicates end up adjacent, like:
['a', 'a', 'b', 'a', 'b', 'b', 'c', 'c', 'b', 'b', 'c', 'e', 'c', 'c', 'e', 'e', 'e', 'e', 'e']
Because there are no more items that could be shuffled in with e, the remaining e's stay adjacent.
Is there any quick way to sort this?
EDIT
To make it clear, here is a real-life example: in a laptop category I have 100 products from IBM, 10 products from Acer and 6 products from Apple. I want the same brands to be as mixed up as possible.
For example, the unsorted list I have:
[{"brand": "ibm", "id": 1}, {"brand": "ibm", "id": 2}, {"brand": "ibm", "id": 3}, {"brand": "ibm", "id": 4}, {"brand": "ibm", "id": 5}, {"brand": "ibm", "id": 6}, {"brand": "acer", "id": 7}, {"brand": "acer", "id": 8}, {"brand": "acer", "id": 9}, {"brand": "acer", "id": 10}, {"brand": "apple", "id": 11}, {"brand": "apple", "id": 12}]
Target result: as long as the same brand is not heavily clustered (like the first 10 all being one brand), it is OK for 2-3 items of the same brand to be adjacent:
[{"brand": "ibm", "id": 1}, {"brand": "acer", "id": 7}, {"brand": "ibm", "id": 2}, {"brand": "ibm", "id": 3}, {"brand": "acer", "id": 8}, {"brand": "apple", "id": 12}, {"brand": "ibm", "id": 4}, {"brand": "acer", "id": 9}, {"brand": "ibm", "id": 5}, {"brand": "ibm", "id": 6}, {"brand": "acer", "id": 10}]
It would be good not to use random, but a deterministic sort, so the user always sees the same order. However, it is not a must, since the result could be saved into a cache.
Thanks
SECOND EDIT
Ok, well now I get it. You made this sound like a shuffle when it's really not like that. Here's an answer, a little more involved.
First I want to introduce pprint. This is just a version of print that formats things nicely:
from pprint import pprint
pprint(items)
#>>> [{'brand': 'ibm', 'id': 1},
#>>>  {'brand': 'ibm', 'id': 2},
#>>>  {'brand': 'ibm', 'id': 3},
#>>>  {'brand': 'ibm', 'id': 4},
#>>>  {'brand': 'ibm', 'id': 5},
#>>>  {'brand': 'ibm', 'id': 6},
#>>>  {'brand': 'acer', 'id': 7},
#>>>  {'brand': 'acer', 'id': 8},
#>>>  {'brand': 'acer', 'id': 9},
#>>>  {'brand': 'acer', 'id': 10},
#>>>  {'brand': 'apple', 'id': 11},
#>>>  {'brand': 'apple', 'id': 12}]
With that out of the way, here we go.
We want to group the items by brand:
from collections import defaultdict

brand2items = defaultdict(list)
for item in items:
    brand2items[item["brand"]].append(item)
pprint(brand2items)
#>>> {'acer': [{'brand': 'acer', 'id': 7},
#>>>           {'brand': 'acer', 'id': 8},
#>>>           {'brand': 'acer', 'id': 9},
#>>>           {'brand': 'acer', 'id': 10}],
#>>>  'apple': [{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>>  'ibm': [{'brand': 'ibm', 'id': 1},
#>>>          {'brand': 'ibm', 'id': 2},
#>>>          {'brand': 'ibm', 'id': 3},
#>>>          {'brand': 'ibm', 'id': 4},
#>>>          {'brand': 'ibm', 'id': 5},
#>>>          {'brand': 'ibm', 'id': 6}]}
We can then get the values, 'cause we don't care about the key:
items_by_brand = list(brand2items.values())
pprint(items_by_brand)
#>>> [[{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>>  [{'brand': 'ibm', 'id': 1},
#>>>   {'brand': 'ibm', 'id': 2},
#>>>   {'brand': 'ibm', 'id': 3},
#>>>   {'brand': 'ibm', 'id': 4},
#>>>   {'brand': 'ibm', 'id': 5},
#>>>   {'brand': 'ibm', 'id': 6}],
#>>>  [{'brand': 'acer', 'id': 7},
#>>>   {'brand': 'acer', 'id': 8},
#>>>   {'brand': 'acer', 'id': 9},
#>>>   {'brand': 'acer', 'id': 10}]]
Now we want to interleave the results. The basic idea is that we want to take from the largest pool more often, because it's going to take the longest to exhaust. So each iteration we want to take the largest group and pop one of its items. Only we don't want to repeat ourselves, so we take two different groups, the two largest, and interleave their results.
We stop when none of the groups have any items left.
from heapq import nlargest

shufflatored = []
while any(items_by_brand):
    items1, items2 = nlargest(2, items_by_brand, key=len)
    if items1: shufflatored.append(items1.pop())
    if items2: shufflatored.append(items2.pop())
The heapq module is a little known but bloody brilliant module. In fact with a fair bit of effort this could be made more efficient by keeping items_by_brand as a heap. However it's not really worth the effort because the other tools for working with heaps don't take keys, which requires obscure workarounds.
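For the curious, here's a rough sketch of that heap-based variant anyway, using negated lengths as keys (heapq is a min-heap and takes no key argument) and an enumeration index as a tie-breaker so the lists of dicts themselves are never compared:
from heapq import heapify, heappop, heappush

# (-length, index, group): the index breaks ties so dicts are never compared
heap = [(-len(g), i, g) for i, g in enumerate(items_by_brand) if g]
heapify(heap)

shufflatored = []
while heap:
    neg1, i1, g1 = heappop(heap)   # largest remaining group
    shufflatored.append(g1.pop())
    if heap:                       # interleave with the second-largest group
        neg2, i2, g2 = heappop(heap)
        shufflatored.append(g2.pop())
        if g2:
            heappush(heap, (-len(g2), i2, g2))
    if g1:
        heappush(heap, (-len(g1), i1, g1))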
So that's it. If you want to allow doubling-up, you can replace
if items1: shufflatored.append(items1.pop())
if items2: shufflatored.append(items2.pop())
with
if items1: shufflatored.append(items1.pop())
if items1: shufflatored.append(items1.pop())
if items2: shufflatored.append(items2.pop())
if items2: shufflatored.append(items2.pop())
EDIT
You want something deterministic? Well why didn't you say so?
lst = list(range(20))
lst[::2], lst[1::2] = lst[1::2], lst[::2]
lst
#>>> [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
Magic, isn't it?
Hopefully you know about this method to swap values in-place:
a = 1
b = 2
a, b = b, a
a
#>>> 2
b
#>>> 1
Well, lst[::2] is every other value
lst[::2]
#>>> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
and lst[1::2] is all of the other other values,
lst[1::2]
#>>> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
so lst[::2], lst[1::2] = lst[1::2], lst[::2] swaps every other value with every other other value!
import random

items = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
[
    iv[1] for iv in
    sorted(
        enumerate(items),
        key=lambda iv: iv[0] + random.choice([-1, 1])
    )
]
#>>> [1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]

[
    iv[1] for iv in
    sorted(
        enumerate(range(20)),
        key=lambda iv: iv[0] + random.choice([-1, 1])
    )
]
#>>> [0, 2, 1, 4, 3, 5, 6, 7, 9, 8, 11, 10, 12, 14, 13, 15, 17, 16, 18, 19]
This is a random shuffle, so the first list won't show up on most runs; the result shown was picked by hand from the possible outcomes.
Basically, this algorithm takes a list and indexes it:
items    a  b  c  d  e  f  g  h  i  j
indexes  0  1  2  3  4  5  6  7  8  9
It then sorts by the index + a random choice from [-1, 1]:
items    a  b  c  d  e  f  g  h  i  j
indexes  0  1  2  3  4  5  6  7  8  9
sort by  1  0  3  2  5  4  5  6  9  8
And results in
items    b  a  d  c  f  e  g  h  j  i
indexes  1  0  3  2  5  4  6  7  9  8
sort by  0  1  2  3  4  5  5  6  8  9
And it's shuffled. To change the type of shuffle, say to make it shuffle more or less, change the specifics of the list [-1, 1]. You can also try [-1, 0, 1], [0, 1] and other variations.
The algorithm in steps:
indexed = enumerate(items)
shuffled = sorted(indexed, key=lambda iv: iv[0]+random.choice([-1, 1]))
# Remove the index, extract the values out again
result = [iv[1] for iv in shuffled]
Now, efficiency.
If you're quite astute, you might realise that sorting is traditionally O(n log n). Python uses Timsort, a wonderful sorting algorithm. Although any comparison sort (one that compares values) has a worst case of at least O(n log n), it can also have a best case as low as O(n)!
This is because sorting an already-sorted list is trivial, as long as you check whether it's sorted. Timsort has a localised idea of "sorted", and it detects very quickly when values are already in order. Because the values here are only somewhat shuffled, Timsort performs something closer to O(kn), where k is the "shuffled-ness" of the list, which is much less than log n!
