Summing over an array and then multiplying by a dictionary [closed] - python

Fruits = ['apple', 'orange', 'banana', 'kiwi']
A = [4, 3, 10, 8]
B = {'apple': {'Bill': 4, 'Jan': 3, 'Frank': 5},
'orange': {'Bill': 0, 'Jan': 1, 'Frank': 5},
'banana': {'Bill': 8, 'Jan': 6, 'Frank': 2},
'kiwi': {'Bill': 4, 'Jan': 2, 'Frank': 7}}
I am trying to sum over all the fruits of A and multiply that by B. I am having trouble doing this: A is an array of just numbers and B is a dictionary, and this is where I am getting confused. I am a new Python user. The numbers in A are in the same position relative to Fruits (the first number in A is the number of apples). Would this involve using sum(A)?
Sorry folks for the lack of detail. Here is some clarity: I have fruits, and I have the number of each type of fruit that each person has. I want to sum all of the values of each fruit type in B so that I get:
apple = 12
orange = 6
banana = 16
kiwi = 13
Now, I want to multiply these numbers by A, keeping in mind that the first number in A is apples, then oranges, and so on, to get a new array:
Solution = [48, 18, 160, 104]  # solution order is apple, orange, banana, kiwi

Assuming that you want to multiply the sum of the fruits for each person (in B) by the cost in A, you can use the following list comprehension:
>>> [cost * sum(B[fruit].values()) for cost, fruit in zip(A, Fruits)]
[48, 18, 160, 104]
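If you'd rather keep the fruit names attached to the totals, the same zip can build a dict instead of a list (a small variant of the comprehension above):
>>> {fruit: cost * sum(B[fruit].values()) for cost, fruit in zip(A, Fruits)}
{'apple': 48, 'orange': 18, 'banana': 160, 'kiwi': 104}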

fruit_costs = {fruit_name: fruit_cost for fruit_name, fruit_cost in zip(Fruits, A)}
for fruit in Fruits:
    print("Fruit:", fruit, "=", sum(B[fruit].values()) * fruit_costs[fruit])
I guess?

Merge everything into one big dictionary; everything here is just properties of fruits:
>>> for i, fruit in enumerate(Fruits):
...     B[fruit]['cost'] = A[i]
...
>>> B
{'banana': {'Frank': 2, 'Jan': 6, 'Bill': 8, 'cost': 10}, 'apple': {'Frank': 5, 'Jan': 3, 'Bill': 4, 'cost': 4}, 'orange': {'Frank': 5, 'Jan': 1, 'Bill': 0, 'cost': 3}, 'kiwi': {'Frank': 7, 'Jan': 2, 'Bill': 4, 'cost': 8}}
Rename "B" to "fruits" (losing the old value of "fruits"):
>>> fruits = B
Calculate fruit cost for each fruit:
>>> for fruitname in fruits:
...     fruit = fruits[fruitname]
...     fruit['total'] = fruit['Frank'] + fruit['Bill'] + fruit['Jan']
...     fruit['total cost'] = fruit['cost'] * fruit['total']
...
>>> fruits
{'banana': {'total': 16, 'Frank': 2, 'Jan': 6, 'total cost': 160, 'Bill': 8, 'cost': 10}, 'apple': {'total': 12, 'Frank': 5, 'Jan': 3, 'total cost': 48, 'Bill': 4, 'cost': 4}, 'orange': {'total': 6, 'Frank': 5, 'Jan': 1, 'total cost': 18, 'Bill': 0, 'cost': 3}, 'kiwi': {'total': 13, 'Frank': 7, 'Jan': 2, 'total cost': 104, 'Bill': 4, 'cost': 8}}
Calculate total cost:
>>> total = sum(fruits[fruit]['total cost'] for fruit in fruits)
Or if that last line is awkward since you're new to Python, you can expand it out into:
>>> total = 0
>>> for fruitname in fruits:
...     fruit = fruits[fruitname]
...     total += fruit['total cost']
...
Either way:
>>> total
330
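For anyone who wants to run the whole thing end to end, here is the same sequence of steps collapsed into one plain script (a sketch, starting again from the original Fruits, A and B definitions):
Fruits = ['apple', 'orange', 'banana', 'kiwi']
A = [4, 3, 10, 8]
B = {'apple': {'Bill': 4, 'Jan': 3, 'Frank': 5},
     'orange': {'Bill': 0, 'Jan': 1, 'Frank': 5},
     'banana': {'Bill': 8, 'Jan': 6, 'Frank': 2},
     'kiwi': {'Bill': 4, 'Jan': 2, 'Frank': 7}}

# attach the cost to each fruit's dict
for i, fruit in enumerate(Fruits):
    B[fruit]['cost'] = A[i]

fruits = B
# total units and total cost per fruit
for fruit in fruits.values():
    fruit['total'] = fruit['Frank'] + fruit['Bill'] + fruit['Jan']
    fruit['total cost'] = fruit['cost'] * fruit['total']

# grand total across all fruits
total = sum(f['total cost'] for f in fruits.values())
print(total)  # 330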

Related

DataFrame to list of dictionaries

I have the following df:
df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})
I wish to convert it to a list of dictionaries such that the year is mentioned only once in each item. Example:
[{'year':2020,'europe':1,'USA':6,'africa':5,}...]
When I do:
df.set_index('year').to_dict('records')
I lose the years.
Another approach that uses pivot before to_dict(orient='records')
df.pivot(
index='year',
columns='region',
values='volume'
).reset_index().to_dict(orient='records')
#Output:
#[{'year': 2020, 'USA': 6, 'africa': 5, 'europe': 1},
# {'year': 2021, 'USA': 8, 'africa': 7, 'europe': 3},
# {'year': 2022, 'USA': 3, 'africa': 5, 'europe': 6}]
Try:
d = [
    {"year": y, **dict(zip(x["region"], x["volume"]))}
    for y, x in df.groupby("year")
]
print(d)
Prints:
[
{"year": 2020, "europe": 1, "USA": 6, "africa": 5},
{"year": 2021, "europe": 3, "USA": 8, "africa": 7},
{"year": 2022, "europe": 6, "USA": 3, "africa": 5},
]
you can use groupby on year and then zip region and volume
import pandas as pd
df = pd.DataFrame({"year":[2020,2020,2020,2021,2021,2021,2022,2022, 2022],"region":['europe','USA','africa','europe','USA','africa','europe','USA','africa'],'volume':[1,6,5,3,8,7,6,3,5]})
year_dfs = df.groupby("year")
records = []
for year, year_df in year_dfs:
    year_dict = {key: value for key, value in zip(year_df["region"], year_df["volume"])}
    year_dict["year"] = year
    records.append(year_dict)
""" Answer
[{'europe': 1, 'USA': 6, 'africa': 5, 'year': 2020},
{'europe': 3, 'USA': 8, 'africa': 7, 'year': 2021},
{'europe': 6, 'USA': 3, 'africa': 5, 'year': 2022}]
"""
To break down each step, you could use pivot to group your df: the years are aggregated into the index, your columns become regions, and volume becomes your values:
df.pivot(index='year', columns='region', values='volume')
region USA africa europe
year
2020 6 5 1
2021 8 7 3
2022 3 5 6
To get this into dictionary format, you can use .to_dict('index') (all in one line):
x = df.pivot(index='year', columns='region', values='volume').to_dict('index')
{2020: {'USA': 6, 'africa': 5, 'europe': 1}, 2021: {'USA': 8, 'africa': 7, 'europe': 3}, 2022: {'USA': 3, 'africa': 5, 'europe': 6}}
Finally, you could use a list comprehension to get it into your desired format:
output = [dict(x[y], **{'year':y}) for y in x]
[{'USA': 6, 'africa': 5, 'europe': 1, 'year': 2020}, {'USA': 8, 'africa': 7, 'europe': 3, 'year': 2021}, {'USA': 3, 'africa': 5, 'europe': 6, 'year': 2022}]

Compare for min values in nested dict

I have a list of pairwise dictionaries that goes like this:
[{'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Ben': {'star': 5, 'banana': 12, 'melon': 1}},
{'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Cam': {'star': 65, 'melon': 1}},
{'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Den': {'juice': 0, 'cake': 4}}, ...]
I need to compare each pair for the minimum values (as fractions), but we only focus on the items of the focal person, in this case Anna.
Take the first pair for example,
the items that 'Anna' and 'Ben' have in common are 'star' and 'banana'. Since we only care about the focal person 'Anna', we just need to find the min of 'star', 'banana', and 'bag'.
Then, subtract from 1 after comparing the pair for min values:
Ans = 1 - min('star': [5/24, 5/18], 'banana': [12/24, 12/18], 'bag': [7/24, 0])
So the ideal result will be
Anna-Ben = Ans1
Anna-Cam = Ans2
Anna-Den = Ans3
...
Any idea how to accomplish this? Thank you so much, and sorry for my English!
*Edit:
Hi, thanks for your reply, but the thing I want is 1 minus the min of each item. Like in the 'Anna-Ben' pair,
the min of 'star' between [5/24, 5/18] is 5/24,
the min of 'banana' between [12/24, 12/18] is 12/24, and
the min of 'bag' between [7/24, 0] is 0 (only Anna has a bag; Ben doesn't have one, so it's zero).
And we ignore the 'melon' item in 'Ben' because we only care about the focal person 'Anna'.
So the final result should be [1 - 5/24 - 12/24 - 0 = 7/24] for the 'Anna-Ben' pair.
I hope I understood your problem correctly.
data = [
    {'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Ben': {'star': 5, 'banana': 12, 'melon': 1}},
    {'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Cam': {'star': 65, 'melon': 1}},
    {'Anna': {'star': 5, 'banana': 12, 'bag': 7}, 'Den': {'juice': 0, 'cake': 4}}
]

results = {}
# iterate over each pair
for pair in data:
    anna_data = pair.pop("Anna")
    other_name, other_data = pair.popitem()  # get the comparing person's data
    result = 1
    anna_sum = float(sum(anna_data.values()))
    other_sum = float(sum(other_data.values()))
    # iterate over each of Anna's items
    for item, anna_val in anna_data.items():
        other_val = other_data.get(item, 0)  # 0 if the item is not found in other_data
        min_item = min(anna_val / anna_sum, other_val / other_sum)
        result -= min_item
    # save the result to a wonderful dict
    key = "Anna-%s" % other_name
    results[key] = result

print(results)
Result:
{'Anna-Ben': 0.29166666666666663, 'Anna-Cam': 0.7916666666666666, 'Anna-Den': 1.0}
By the way, this destroys the data list (pop and popitem mutate it); if you want to keep it intact, make a copy() of it before computing this.
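If you'd rather not mutate data at all, a non-destructive variant (a sketch) reads the entries instead of popping them:
results = {}
for pair in data:
    anna_data = pair["Anna"]
    other_name = next(name for name in pair if name != "Anna")  # the non-Anna key
    other_data = pair[other_name]
    anna_sum = float(sum(anna_data.values()))
    other_sum = float(sum(other_data.values()))
    result = 1
    for item, anna_val in anna_data.items():
        other_val = other_data.get(item, 0)
        result -= min(anna_val / anna_sum, other_val / other_sum)
    results["Anna-%s" % other_name] = result
print(results)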

Creating a complex nested dictionary from multiple lists in Python

I am struggling to create a nested dictionary with the following data:
Team, Group, ID, Score, Difficulty
OneTeam, A, 0, 0.25, 4
TwoTeam, A, 1, 1, 10
ThreeTeam, A, 2, 0.64, 5
FourTeam, A, 3, 0.93, 6
FiveTeam, B, 4, 0.5, 7
SixTeam, B, 5, 0.3, 8
SevenTeam, B, 6, 0.23, 9
EightTeam, B, 7, 1.2, 4
Once imported as a pandas DataFrame, I turn each feature into these lists:
teams, group, id, score, diff.
Using this stack overflow answer Create a complex dictionary using multiple lists I can create the following dictionary:
{'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}
using the code:
{team: {'id': i, 'score': s, 'diff': d} for team, i, s, d in zip(teams, id, score, diff)}
But what I'm after is having 'Group' as the main key, then team, and then id, score and difficulty within the team (as above).
I have tried:
{g: {team: {'id': i, 'score': s, 'diff': d}} for g, team, i, s, d in zip(group, teams, id, score, diff)}
but this doesn't work and results in only one team per group within the dictionary:
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93}},
'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2}}}
Below is how the dictionary should look, but I'm not sure how to get there - any help would be much appreciated!
{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}},
'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3}}}
A dict comprehension may not be the best way of solving this if your data is stored in a table like this.
Try something like
from collections import defaultdict
groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, id, score, diff):
    groups[g][team] = {'id': i, 'score': s, 'diff': d}
By using defaultdict, if groups[g] already exists, the new team is added as a key; if it doesn't, an empty dict is automatically created and the new team is inserted into it.
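A tiny illustration of that behaviour (hypothetical keys, just to show the difference from a plain dict):
from collections import defaultdict

plain = {}
auto = defaultdict(dict)
auto['A']['OneTeam'] = {'id': 0}      # works: auto['A'] is created as {} on first access
# plain['A']['OneTeam'] = {'id': 0}   # would raise KeyError: 'A'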
Edit: you edited your question to say that your data is in a pandas DataFrame. You can definitely skip the step of turning the columns into lists. Instead, you could for example do:
from collections import defaultdict
groups = defaultdict(dict)
for row in df.itertuples():
    groups[row.Group][row.Team] = {'id': row.ID, 'score': row.Score, 'diff': row.Difficulty}
If you absolutely want to use a comprehension, then this should work:
z = list(zip(teams, group, id, score, diff))  # materialise the zip so it can be re-used for every group
s = set(group)
d = {  # outer dict, one entry for each different group
    group: ({  # inner dict, one entry per team, filtered for this group
        team: {'id': i, 'score': s, 'diff': d}
        for team, g, i, s, d in z
        if g == group
    })
    for group in s
}
I added linebreaks for clarity
EDIT:
After the comment, to better clarify my intention and out of curiosity, I ran a comparison:
# your code goes here
from collections import defaultdict
import timeit
teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam', 'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
id = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]
def no_comprehension():
    global group, teams, id, score, diff
    groups = defaultdict(dict)
    for g, team, i, s, d in zip(group, teams, id, score, diff):
        groups[g][team] = {'id': i, 'score': s, 'diff': d}

def comprehension():
    global group, teams, id, score, diff
    z = list(zip(teams, group, id, score, diff))
    s = set(group)
    d = {group: ({team: {'id': i, 'score': s, 'diff': d} for team, g, i, s, d in z if g == group}) for group in s}
print("no comprehension:")
print(timeit.timeit(lambda : no_comprehension(), number=10000))
print("comprehension:")
print(timeit.timeit(lambda : comprehension(), number=10000))
Output:
no comprehension:
0.027287796139717102
comprehension:
0.028979241847991943
They do look the same in terms of performance. With my sentence above, I was just highlighting this as an alternative solution to the one already posted by @JohnO.

Using reduce on a list of dictionaries of dictionaries

Here is the given list.
Pets = [{'f1': {'dogs': 2, 'cats': 3, 'fish': 1},
'f2': {'dogs': 3, 'cats': 2}},
{'f1': {'dogs': 5, 'cats': 2, 'fish': 3}}]
I need to use the map and reduce functions so that I can get a final result of
{'dogs': 10, 'cats': 7, 'fish': 4}
I have written a function using map
def addDict(d):
    d2 = {}
    for outKey, inKey in d.items():
        for inVal in inKey:
            if inVal in d2:
                d2[inVal] += inKey[inVal]
            else:
                d2[inVal] = inKey[inVal]
    return d2

def addDictN(L):
    d2 = list(map(addDict, L))
    print(d2)
That returns
[{'dogs': 5, 'cats': 5, 'fish': 1}, {'dogs': 5, 'cats': 2, 'fish': 3}]
It combines the f1 and f2 of the first and second dictionaries, but I am unsure of how to use reduce on the dictionaries to get the final result.
You can use collections.Counter to sum your list of counter dictionaries.
Moreover, your dictionary flattening logic can be optimised via itertools.chain.
from itertools import chain
from collections import Counter
Pets = [{'f1': {'dogs': 2, 'cats': 3, 'fish': 1},
'f2': {'dogs': 3, 'cats': 2}},
{'f1': {'dogs': 5, 'cats': 2, 'fish': 3}}]
lst = list(chain.from_iterable([i.values() for i in Pets]))
lst_sum = sum(map(Counter, lst), Counter())
# Counter({'cats': 7, 'dogs': 10, 'fish': 4})
This works for an arbitrary length list of dictionaries, with no key matching requirements across dictionaries.
The second parameter of sum is a start value. It is set to an empty Counter object to avoid TypeError.
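Since the question specifically asks for reduce, the same Counter summation can also be written with functools.reduce (a sketch, reusing lst from above):
from functools import reduce
from collections import Counter

lst_sum = reduce(lambda acc, d: acc + Counter(d), lst, Counter())
# Counter({'dogs': 10, 'cats': 7, 'fish': 4})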
Without using map and reduce, I would be inclined to do something like this:
from collections import defaultdict

result = defaultdict(int)
for fdict in Pets:
    for f in fdict.keys():
        for pet, count in fdict[f].items():
            result[pet] += count
Using reduce (which really is not the right function for the job, and is no longer a builtin in Python 3; it lives in functools) on your current progress would be something like this:
from functools import reduce
from collections import Counter

pets = [{'dogs': 5, 'cats': 5, 'fish': 1}, {'dogs': 5, 'cats': 2, 'fish': 3}]
result = reduce(lambda x, y: x + Counter(y), pets, Counter())
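To plug this into the question's own structure, you could keep the map step and reduce over its output (a hedged sketch that reuses addDict from the question and returns the result instead of printing it):
from functools import reduce
from collections import Counter

def addDictN(L):
    # map flattens each family dict with addDict, reduce sums the per-family counts
    return dict(reduce(lambda x, y: x + Counter(y), map(addDict, L), Counter()))

print(addDictN(Pets))  # {'dogs': 10, 'cats': 7, 'fish': 4}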
You can use purely map and reduce like so (in Python 3 you need to import reduce from functools, and the inner map has to be materialised with list() so it can be consumed more than once):
from functools import reduce

Pets = [{'f1': {'dogs': 2, 'cats': 3, 'fish': 1},
         'f2': {'dogs': 3, 'cats': 2}},
        {'f1': {'dogs': 5, 'cats': 2, 'fish': 3}}]
new_pets = reduce(lambda x, y: [b.items() for _, b in x.items()] + [b.items() for _, b in y.items()], Pets)
final_pets = dict(reduce(lambda x, y: list(map(lambda c: (c, dict(x).get(c, 0) + dict(y).get(c, 0)), ['dogs', 'cats', 'fish'])), new_pets))
Output:
{'fish': 4, 'cats': 7, 'dogs': 10}

python sort with adjacent difference

I have a list of items:
items = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'e', 'e', 'e', 'e', 'e', 'e']
I would like to sort it into a mixed-up order, allowing at most two adjacent duplicates, like
['a', 'a', 'b', 'a', 'b', 'b', 'c', 'c', 'b', 'b', 'c', 'e', 'c', 'c', 'e', 'e', 'e', 'e', 'e']
Because there are no more items that can be interleaved with 'e', the remaining 'e's stay adjacent.
Is there any quick way to sort this?
EDIT
To make it clear, here is a real-life example: in a laptop category I have 100 products from IBM, 10 products from Acer, and 6 products from Apple, and I want to sort the list so that the same brands are mixed up as much as possible.
for example,
the unsorted list I have is:
[{"brand": "ibm", "id": 1}, {"brand": "ibm", "id": 2}, {"brand": "ibm", "id": 3}, {"brand": "ibm", "id": 4}, {"brand": "ibm", "id": 5}, {"brand": "ibm", "id": 6}, {"brand": "acer", "id": 7}, {"brand": "acer", "id": 8}, {"brand": "acer", "id": 9}, {"brand": "acer", "id": 10}, {"brand": "apple", "id": 11}, {"brand": "apple", "id": 12}]
Target result: the same brand should not all be adjacent to each other (like the first 10 all being from the same brand), but 2-3 adjacent items of the same brand are OK:
[{"brand": "ibm", "id": 1}, {"brand": "acer", "id": 7}, {"brand": "ibm", "id": 2}, {"brand": "ibm", "id": 3}, {"brand": "acer", "id": 8}, {"brand": "apple", "id": 12}, {"brand": "ibm", "id": 4}, {"brand": "acer", "id": 9}, {"brand": "ibm", "id": 5}, {"brand": "ibm", "id": 6}, {"brand": "acer", "id": 10}]
It would be good not to use random but a deterministic sort, so the user sees the same order every time; however, it is not a must, since the result could be saved into a cache.
Thanks
SECOND EDIT
Ok, well now I get it. You made this sound like a shuffle when it's really not like that. Here's an answer, a little more involved.
First I want to introduce pprint. This is just a version of print that formats things nicely:
from pprint import pprint
pprint(items)
#>>> [{'brand': 'ibm', 'id': 1},
#>>> {'brand': 'ibm', 'id': 2},
#>>> {'brand': 'ibm', 'id': 3},
#>>> {'brand': 'ibm', 'id': 4},
#>>> {'brand': 'ibm', 'id': 5},
#>>> {'brand': 'ibm', 'id': 6},
#>>> {'brand': 'acer', 'id': 7},
#>>> {'brand': 'acer', 'id': 8},
#>>> {'brand': 'acer', 'id': 9},
#>>> {'brand': 'acer', 'id': 10},
#>>> {'brand': 'apple', 'id': 11},
#>>> {'brand': 'apple', 'id': 12}]
With that out of the way, here we go.
We want to group the items by brand:
from collections import defaultdict
brand2items = defaultdict(list)
for item in items:
    brand2items[item["brand"]].append(item)
pprint(brand2items)
#>>> {'acer': [{'brand': 'acer', 'id': 7},
#>>> {'brand': 'acer', 'id': 8},
#>>> {'brand': 'acer', 'id': 9},
#>>> {'brand': 'acer', 'id': 10}],
#>>> 'apple': [{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>> 'ibm': [{'brand': 'ibm', 'id': 1},
#>>> {'brand': 'ibm', 'id': 2},
#>>> {'brand': 'ibm', 'id': 3},
#>>> {'brand': 'ibm', 'id': 4},
#>>> {'brand': 'ibm', 'id': 5},
#>>> {'brand': 'ibm', 'id': 6}]}
We can then get the values, 'cause we don't care about the key:
items_by_brand = list(brand2items.values())
pprint(items_by_brand)
#>>> [[{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>> [{'brand': 'ibm', 'id': 1},
#>>> {'brand': 'ibm', 'id': 2},
#>>> {'brand': 'ibm', 'id': 3},
#>>> {'brand': 'ibm', 'id': 4},
#>>> {'brand': 'ibm', 'id': 5},
#>>> {'brand': 'ibm', 'id': 6}],
#>>> [{'brand': 'acer', 'id': 7},
#>>> {'brand': 'acer', 'id': 8},
#>>> {'brand': 'acer', 'id': 9},
#>>> {'brand': 'acer', 'id': 10}]]
Now we want to interleave the results. The basic idea is that we want to take from the largest pool more often because it's going to take the longest to exhaust. So each iteration we want to take the longest and pop one of its items; only we don't want to repeat. We can do this by taking two different groups, the two largest, and interleaving their results.
We stop when none of the groups have any items left.
from heapq import nlargest
shufflatored = []
while any(items_by_brand):
    items1, items2 = nlargest(2, items_by_brand, key=len)
    if items1: shufflatored.append(items1.pop())
    if items2: shufflatored.append(items2.pop())
The heapq module is a little-known but bloody brilliant module. In fact, with a fair bit of effort, this could be made more efficient by keeping items_by_brand as a heap. However, it's not really worth the effort, because the other tools for working with heaps don't take keys, which requires obscure workarounds.
So that's it. If you want to allow doubling-up, you can replace
if items1: shufflatored.append(items1.pop())
if items2: shufflatored.append(items2.pop())
with
if items1: shufflatored.append(items1.pop())
if items1: shufflatored.append(items1.pop())
if items2: shufflatored.append(items2.pop())
if items2: shufflatored.append(items2.pop())
!
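As for the remark about keeping items_by_brand as a heap, here is a rough sketch of that idea (my own sketch, not part of the answer above; it assumes a fresh items_by_brand). It stores negative group sizes because heapq is a min-heap, and an index as a tie-breaker so the dicts themselves are never compared:
from heapq import heapify, heappush, heappop

heap = [(-len(grp), i, grp) for i, grp in enumerate(items_by_brand) if grp]
heapify(heap)

shufflatored = []
while heap:
    _, i1, grp1 = heappop(heap)               # largest remaining group
    shufflatored.append(grp1.pop())
    if heap:                                  # interleave with the second largest, if any
        _, i2, grp2 = heappop(heap)
        shufflatored.append(grp2.pop())
        if grp2:
            heappush(heap, (-len(grp2), i2, grp2))
    if grp1:
        heappush(heap, (-len(grp1), i1, grp1))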
EDIT
You want something deterministic? Well why didn't you say so?
lst = list(range(20))
lst[::2], lst[1::2] = lst[1::2], lst[::2]
lst
#>>> [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]
Magic, isn't it?
Hopefully you know about this method to swap values in-place:
a = 1
b = 2
a, b = b, a
a
#>>> 2
b
#>>> 1
Well, lst[::2] is every other value
lst[::2]
#>>> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
and lst[1::2] is all of the other other values,
lst[1::2]
#>>> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
so lst[::2], lst[1::2] = lst[1::2], lst[::2] swaps every other value with every other other value!
import random
items = [1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,4]
[
    iv[1] for iv in
    sorted(
        enumerate(items),
        key=lambda iv: iv[0] + random.choice([-1, 1])
    )
]
#>>> [1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
[
    iv[1] for iv in
    sorted(
        enumerate(range(20)),
        key=lambda iv: iv[0] + random.choice([-1, 1])
    )
]
#>>> [0, 2, 1, 4, 3, 5, 6, 7, 9, 8, 11, 10, 12, 14, 13, 15, 17, 16, 18, 19]
This is a random shuffle, so the first list doesn't show up on most of the shuffles; the result shown was picked by hand from all the possibilities.
Basically, this algorithm takes a list and indexes it:
items a b c d e f g h i j
indexes 0 1 2 3 4 5 6 7 8 9
It then sorts by the index + a random choice from [-1, 1]:
items a b c d e f g h i j
indexes 0 1 2 3 4 5 6 7 8 9
sort by 1 0 3 2 5 4 5 6 9 8
And results in
items b a d c f e g h j i
indexes 1 0 3 2 5 4 6 7 9 8
sort by 0 1 2 3 4 5 5 6 8 9
And it's shuffled. To change the type of shuffle, say to make it shuffle more or less, change the specifics of the list [-1, 1]. You can also try [-1, 0, 1], [0, 1] and other variations.
The algorithm in steps:
indexed = enumerate(items)
shuffled = sorted(indexed, key=lambda iv: iv[0]+random.choice([-1, 1]))
# Remove the index, extract the values out again
result = [iv[1] for iv in shuffled]
Now, efficiency.
If you're quite astute you might realise that sorting is traditionally O(n log n). Python uses TimSort, a wonderful sorting algorithm. Although any comparison sort (i.e. a sort that compares values) must take at least O(n log n) comparisons in the worst case, it can run in as little as O(n) on favourable input!
This is because sorting an already-sorted list is trivial as long as you check whether it's sorted. TimSort has a localised idea of "sorted" and it will detect very quickly when the values are sorted. This means that because they're only somewhat-shuffled TimSort would perform something closer to O(kn) where k is the "shuffled-ness" of the list, which is much less than log n!
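A rough, hypothetical way to see this effect for yourself is to time sorted() on a nearly-sorted list (built with the same ±1 key trick) against a fully shuffled list of the same size:
import random
import timeit

n = 10 ** 6
nearly_sorted = [i + random.choice([-1, 1]) for i in range(n)]  # only locally out of order
fully_shuffled = list(range(n))
random.shuffle(fully_shuffled)

print(timeit.timeit(lambda: sorted(nearly_sorted), number=5))   # fast: long ascending runs
print(timeit.timeit(lambda: sorted(fully_shuffled), number=5))  # slower: no exploitable order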
