General:
I need help finding a way in Python to get the top N items in a nested (multi-dimensional) Python dictionary. For example:
things = {
"car": { "weight": 100 },
"apple": { "weight": 1 },
"spanner": { "weight": 10 }
}
In this case, I would want to find the 2 highest-weighted items in the dictionary, specifically the keys of these items. So in this case, it should return ["car", "spanner"]
Actual Problem:
Note: This is my first attempt at a genetic algorithm, so I might not be doing it correctly. At all.
As I am British, I am searching for the best cup of tea I can imagine, so I am writing a python program that generates 10 random cups of tea, then uses natural selection to find the top 5 in that ten and so on.
A cup of tea is modelled as a python dictionary, with 5 keys:
{
"brew_time": Some Number,
"milk": Some Number,
"sweeteners": Some Number,
"fitness": Some Number (This is what I'm interested in),
"name": Some randomly generated name (Doesn't really matter)
}
A cup of tea my program will spit out will look something like this:
{'brew_time': 2.0, 'milk': 0.5, 'sweeteners': 3.0, 'name': 'bold cup', 'fitness': 0}
It then generates 10 cups of tea, stored in the teas variable. This is an example of an output of that:
{0: {'brew_time': 2.0, 'milk': 0.4, 'sweeteners': 1.0, 'name': 'unafraid brew', 'fitness': 0}, 1: {'brew_time': 3.0, 'milk': 0.5, 'sweeteners': 3.0, 'name': 'fire-eating blend', 'fitness': 0}, 2: {'brew_time': 2.0, 'milk': 0.6, 'sweeteners': 2.0, 'name': 'fearless drink', 'fitness': 0}, 3: {'brew_time': 2.0, 'milk': 0.9, 'sweeteners': 3.0, 'name': 'fire-eating blend', 'fitness': 0}, 4: {'brew_time': 2.0, 'milk': 0.8, 'sweeteners': 2.0, 'name': 'fire-eating cuppa', 'fitness': 0}, 5: {'brew_time': 3.0, 'milk': 0.3, 'sweeteners': 1.0, 'name': 'fire-eating drink', 'fitness': 0}, 6: {'brew_time': 4.0, 'milk': 0.7, 'sweeteners': 2.0, 'name': 'dauntless medley', 'fitness': 0}, 7: {'brew_time': 3.0, 'milk': 0.3, 'sweeteners': 2.0, 'name': 'dauntless cuppa', 'fitness': 0}, 8: {'brew_time': 3.0, 'milk': 0.9, 'sweeteners': 2.0, 'name': 'epic drink', 'fitness': 0}, 9: {'brew_time': 2.0, 'milk': 0.4, 'sweeteners': 2.0, 'name': 'gusty drink', 'fitness': 0}}
I'm now trying to code a function called selection() that will remove the 5 least fit teas from the dictionary. (The fitness of each tea is set by me, using the rank_tea() function, which takes an array and sets every tea's fitness to a number between 0 and 1 that represents the quality of the tea.)
This is what I've got so far, but it doesn't work:
def selection():
    teaCopy = teas.copy()
    fitnesses = []
    for i in range(0, len(teaCopy)):
        fitnesses.append(teas[i]["fitness"])
    print(fitnesses)
    max_fitnesses_indicies = sorted(range(len(fitnesses)), key=lambda x: fitnesses[x])
    print(max_fitnesses_indicies)
    len_array = []
    print(len_array)
    for i in range(0, len(teas)):
        len_array.append(i)
    to_be_del = list( set(max_fitnesses_indicies) - set(len_array) )
    print(to_be_del)
This is the full code. Sorry for the length of the question, I just didn't want to miss anything.
Any help would be appreciated
To obtain a list of the keys sorted by weight, you can simply use:
>>> sorted(things.keys(),key=lambda x:things[x]['weight'],reverse=True)
['car', 'spanner', 'apple']
Here reverse=True sorts in descending order, so that the heaviest things come first. So if you call:
>>> sorted(things.keys(),key=lambda x:things[x]['weight'],reverse=True)[:2]
['car', 'spanner']
you get the two heaviest. But this will run in O(n log n). If the number of values k you wish to obtain is small (compared to the total number), you can use heapq:
from heapq import nlargest
result = nlargest(k,things.keys(),key=lambda x:things[x]['weight'])
which will - as far as I know - run in O(n log k) (k the numbers of items you want to pick).
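Applied to the tea problem, the same idea could look like the sketch below. It assumes teas is a dict keyed by integers, as in the question, and that rank_tea() has already filled in the fitness values; the fitness numbers here are made up purely for illustration, and passing teas in as a parameter is my own choice rather than something from the original code.

```python
from heapq import nlargest

def selection(teas, keep=5):
    """Return a new dict holding only the `keep` fittest teas."""
    # nlargest iterates over the keys and ranks them by each tea's fitness
    fittest_keys = nlargest(keep, teas, key=lambda k: teas[k]["fitness"])
    return {k: teas[k] for k in fittest_keys}

# Hypothetical fitness values standing in for whatever rank_tea() assigns
teas = {i: {"name": f"tea {i}", "fitness": f}
        for i, f in enumerate([0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6, 0.5, 0.0])}

survivors = selection(teas)
print(sorted(survivors))  # [1, 3, 5, 7, 8]
```

Building a new dict avoids deleting entries from teas while working out which ones to keep.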
I have this list of dictionaries:
[{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
I would like to map and reduce (or group) to have a result like this:
[
{
'topic_id': 1,
'count': 2,
'variance': 3.0,
'global_average': 6.5
},
{
'topic_id': 2,
'count': 1,
'variance': 5.0,
'global_average': 5.0
}
]
I need something that calculates the variance (max average - min average) and also sums the count of items.
What I have already tried:
At first I just tried to sum the counts by changing the structure of the dictionary, making the key the topic_id and the value the count. My result was:
result = sorted(dict(functools.reduce(operator.add, map(collections.Counter, data))).items(), reverse=True)
this was just the first try.
You could achieve this with some comprehensions, a map, and the mean function from the built-in statistics module.
from statistics import mean
data = [
{
'topic_id': 1,
'average': 5.0,
'count': 1
}, {
'topic_id': 1,
'average': 8.0,
'count': 1
}, {
'topic_id': 2,
'average': 5.0,
'count': 1
}
]
# a set of unique topic_id's
keys = set(i['topic_id'] for i in data)
# a list of list of averages for each topic_id
averages = [[i['average'] for i in data if i['topic_id'] == j] for j in keys]
# a map of tuples of (counts, variances, averages) for each topic_id
stats = map(lambda x: (len(x), max(x) - min(x), mean(x)), averages)
# finally reconstruct it back into a list
result = [
{
'topic_id': key,
'count': count,
'variance': variance,
'global_average': average
} for key, (count, variance, average) in zip(keys, stats)
]
print(result)
Returns
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5}, {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
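Note that the code above counts the entries per topic rather than summing their 'count' fields. If the sum is what you are after, one possible variation is to collect everything in a single pass with collections.defaultdict. This is only a sketch: global_average is still the plain (unweighted) mean of the averages, which is an assumption about what you want.

```python
from collections import defaultdict
from statistics import mean

data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
]

# Collect the averages and the summed counts per topic_id in one pass
averages = defaultdict(list)
counts = defaultdict(int)
for row in data:
    averages[row['topic_id']].append(row['average'])
    counts[row['topic_id']] += row['count']

result = [
    {
        'topic_id': topic_id,
        'count': counts[topic_id],
        'variance': max(values) - min(values),
        'global_average': mean(values),
    }
    for topic_id, values in averages.items()
]
print(result)
```

For the sample data this prints the same result as above, because every 'count' is 1.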
Here is an attempt using itertools.groupby to group the data based on the topic_id. Note that groupby only groups consecutive items, so the data is sorted by topic_id first:
import itertools
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# groupby only merges adjacent items, so sort by the grouping key first
data = sorted(data, key=lambda x: x['topic_id'])
grouper = itertools.groupby(data, key=lambda x: x['topic_id'])
# holder for output
output = []
# iterate over grouper to calculate things
for key, group in grouper:
    # variables for calculations
    count = 0
    maxi = float('-inf')  # also works if the averages are negative
    mini = float('inf')
    total = 0
    # one pass over each dictionary
    for g in group:
        avg = g['average']
        maxi = avg if avg > maxi else maxi
        mini = avg if avg < mini else mini
        total += avg
        count += 1
    # write to output
    output.append({'topic_id': key,
                   'count': count,
                   'variance': maxi - mini,
                   'global_average': total / count})
Giving this output:
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5},
 {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
Note that the 'variance' for the second group is 0.0 here instead of 5.0; this is different from your expected output, but I would guess this is what you want?
If you are willing to use pandas, this seems like an appropriate use case:
import pandas as pd
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# move to dataframe
df = pd.DataFrame(data)
# groupby and get all desired metrics
grouped = df.groupby('topic_id')['average'].describe()
grouped['variance'] = grouped['max'] - grouped['min']
# rename columns and remove unneeded ones
grouped = grouped.reset_index().loc[:, ['topic_id', 'count', 'mean', 'variance']].rename({'mean':'global_average'}, axis=1)
# back to list of dicts
output = grouped.to_dict('records')
output is:
[{'topic_id': 1, 'count': 2.0, 'global_average': 6.5, 'variance': 3.0},
{'topic_id': 2, 'count': 1.0, 'global_average': 5.0, 'variance': 0.0}]
You can also try the agg functionality of a pandas DataFrame, like this:
import pandas as pd
# data is the list of dicts from the question
f = pd.DataFrame(data).set_index('topic_id')
def var(x):
    return x.max() - x.min()
out = f.groupby(level=0).agg(count=('count', 'sum'),
                             global_average=('average', 'mean'),
                             variance=('average', var))
I have created a for loop, and I want the end result of each iteration to be stored as a dictionary (tfDict). What I now need is for all of those dicts to be combined into one final dict.
for i in range(0,len(sep)):
    n=len(sep[i])
    tfDict = dict.fromkeys(setwords,0)
    for word in sep[i]:
        tfDict[word]+=1
        tfDict[word] = tfDict[word]/n
    x=fin.values()
    for word,val in tfDict.items():
        for w,v in fin.items():
            x = v
            if(word==w):
                tfDict[word]=val*x
    print(tfDict)
When I print this inside the loop, I get the output I need:
{'and': 0.0, 'document': 0.23783346831109634, 'first': 0.0, 'is': 0.16666666666666666, 'one': 0, 'second': 0.16666666666666666, 'the': 0.16666666666666666, 'third': 0, 'this': 0.16666666666666666}
{'and': 0.3193817886456925, 'document': 0.0, 'first': 0.0, 'is': 0.16666666666666666, 'one': 0.16666666666666666, 'second': 0, 'the': 0.16666666666666666, 'third': 0.16666666666666666, 'this': 0.16666666666666666}
{'and': 0.0, 'document': 0.24462871026284194, 'first': 0.3021651247531982, 'is': 0.2, 'one': 0, 'second': 0, 'the': 0.2, 'third': 0, 'this': 0.2}
Now I want all of this available outside the loop as well, in the form of a dict of dicts or a pandas DataFrame. Is there a way I can do this?
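One possible approach is to create an empty dict before the loop and store each iteration's tfDict in it under the loop index. The sketch below uses small stand-in values for sep and setwords, since the real ones are not shown, and it leaves out the weighting by fin; the name all_tf is my own.

```python
# Stand-in inputs: `sep` is assumed to be a list of tokenised documents
sep = [["this", "is", "the", "first"], ["this", "is", "the", "second", "one"]]
setwords = sorted({word for doc in sep for word in doc})

all_tf = {}  # outer dict: document index -> that document's tfDict
for i, doc in enumerate(sep):
    tfDict = dict.fromkeys(setwords, 0)
    for word in doc:
        tfDict[word] += 1
    # turn the raw counts into term frequencies
    tfDict = {word: count / len(doc) for word, count in tfDict.items()}
    all_tf[i] = tfDict  # keep this iteration's result

print(all_tf[0]["first"])  # 0.25
print(all_tf[1]["one"])    # 0.2
```

If a table is more convenient, pd.DataFrame.from_dict(all_tf, orient='index') should give one row per document.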
I have a dictionary as
ex_dict_tot={'recency': 12, 'frequency': 12, 'money': 12}
another count dictionary as
ex_dict_count= {'recency': {'current': 4, 'savings': 2, 'fixed': 6},
'frequency': {'freq': 10, 'infreq': 2},
'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1}}
I would like to calculate the proportion of each value within its key, as follows:
In key - recency,
current=4/12,
savings=2/12,
fixed=6/12
Similarly - in key - frequency,
freq=10/12
infreq=2/12
And the required output would be,
{'recency': {'current': 0.3, 'savings': 0.16, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.16},
'money': {'med': 0.16, 'high': 0.6, 'low': 0.08, 'md': 0.08}}
Could you please write your suggestions/inputs on it?
You can do this with dict comprehension.
out = {key:{k:v/ex_dict_tot[key] for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.3333333333333333, 'savings': 0.16666666666666666, 'fixed': 0.5},
'frequency': {'freq': 0.8333333333333334, 'infreq': 0.16666666666666666},
'money': {'med': 0.16666666666666666, 'high': 0.6666666666666666, 'low': 0.08333333333333333, 'md': 0.08333333333333333}}
Use round to round the values to 2 decimal places.
out = {key:{k:round(v/ex_dict_tot[key],2) for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.33, 'savings': 0.17, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.17},
'money': {'med': 0.17, 'high': 0.67, 'low': 0.08, 'md': 0.08}}
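If the totals are always just the sum of each inner dict (as they are in your example, where every group adds up to 12), you could also compute them on the fly instead of keeping ex_dict_tot around. A minimal sketch under that assumption; the function name proportions and the zero-total guard are my own additions.

```python
ex_dict_count = {
    'recency': {'current': 4, 'savings': 2, 'fixed': 6},
    'frequency': {'freq': 10, 'infreq': 2},
    'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1},
}

def proportions(counts, ndigits=2):
    """Divide each inner count by the sum of its own group."""
    out = {}
    for key, group in counts.items():
        total = sum(group.values())
        # an all-zero group would otherwise divide by zero
        out[key] = {k: round(v / total, ndigits) if total else 0.0
                    for k, v in group.items()}
    return out

print(proportions(ex_dict_count)['recency'])
# {'current': 0.33, 'savings': 0.17, 'fixed': 0.5}
```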
I'm fairly new to programming, and am trying to get my head around parsing JSON. Specifically, I'm working with a string that describes football betting markets, and contains (amongst many others) a value for individual matches (marketId), child values for each possible result (home/away/draw) (selectionId), and further child values for the price that you can back/lay at (price).
I've run the string through json.loads, and I've got this to work with, which I have assigned to the variable "output".
[{"jsonrpc":"2.0","result":[{"marketId":"1.139185909","isMarketDataDelayed":true,"status":"OPEN","betDelay":0,"bspReconciled":false,"complete":true,"inplay":false,"numberOfWinners":1,"numberOfRunners":3,"numberOfActiveRunners":3,"lastMatchTime":"2018-02-12T10:56:09.726Z","totalMatched":645229.98,"totalAvailable":1039329.11,"crossMatching":true,"runnersVoidable":false,"version":2045792715,"runners":[{"selectionId":55190,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":1.4,"totalMatched":0.0,"ex":{"availableToBack":[{"price":1.39,"size":56703.76}],"availableToLay":[{"price":1.4,"size":35537.54}],"tradedVolume":[]}},{"selectionId":1703,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":11.0,"totalMatched":0.0,"ex":{"availableToBack":[{"price":10.5,"size":3592.64}],"availableToLay":[{"price":11.0,"size":5913.05}],"tradedVolume":[]}},{"selectionId":58805,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":5.3,"totalMatched":0.0,"ex":{"availableToBack":[{"price":5.2,"size":9136.62}],"availableToLay":[{"price":5.3,"size":5361.48}],"tradedVolume":[]}}]},{"marketId":"1.139782182","isMarketDataDelayed":true,"status":"OPEN","betDelay":0,"bspReconciled":false,"complete":true,"inplay":false,"numberOfWinners":1,"numberOfRunners":3,"numberOfActiveRunners":3,"lastMatchTime":"2018-02-12T10:25:33.842Z","totalMatched":1715.46,"totalAvailable":39526.8,"crossMatching":true,"runnersVoidable":false,"version":2044817355,"runners":[{"selectionId":18567,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":2.3,"totalMatched":0.0,"ex":{"availableToBack":[{"price":2.22,"size":148.27}],"availableToLay":[{"price":2.32,"size":10.1}],"tradedVolume":[]}},{"selectionId":62683,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":3.85,"totalMatched":0.0,"ex":{"availableToBack":[{"price":3.8,"size":76.9}],"availableToLay":[{"price":3.9,"size":20.57}],"tradedVolume":[]}},{"selectionId":58805,"handicap":0.0,"status":"ACTIVE","lastPriceTraded":3.25,"totalMatched":0.0,"ex":{"availableToBack":[{"price":3.2,"size":21.19}],"availableToLay":[{"price":3.5,"size":85.41}],"tradedVolume":[]}}]}], "id":1}]
I'm trying to extract the value of 'marketId', followed by their corresponding child values of 'selectionId' and 'price', which should look like this:
1.139185909 (marketId 0)
55190 (selectionId 0 under the first market)
1.39 (selectionId 0's back price)
1.4 (selectionId 0's lay price)
1703 (selectionId 1 under the first market)
10.5 (selectionId 1's back price)
11 (selectionId 1's lay price)
58805 (selectionId 2 under the first market)
5.2 (selectionId 2's back price)
5.3 (selectionId 2's lay price)
1.139782182 (marketId 1)
18567 (selectionId 0 under the second market)
2.22 (selectionId 0's back price)
2.32 (selectionId 0's lay price)
62683 (selectionId 1 under the second market)
cont...
I've used for loops to print these values:
for i in output[0]['result']:
    print(i.get('marketId'))
    for j in output[0]['result'][0]['runners']:
        print(j.get('selectionId'))
        for k in output[0]['result'][0]['runners'][0]['ex']['availableToBack']:
            print(k.get('price'))
            for l in output[0]['result'][0]['runners'][0]['ex']['availableToLay']:
                print(l.get('price'))
When I run it though, it returns:
1.139185909
55190
1.39
1.4
1703
1.39
1.4
58805
1.39
1.4
1.139782182
55190
1.39
1.4
1703
1.39
1.4
58805
1.39
1.4
The marketId values are OK here, but when I try to return other nested values, the program keeps returning the first set of values it comes across. I can't seem to find an answer anywhere - how can I get it to return the correct values?
You are indexing into the first element of each of your lists. Your for loops do give you access to the other objects each time, but your nested loops then ignore those elements.
When you use
for i in output[0]['result']:
i is bound to each element in the 'result' list, one by one, and you print out the marketId value for those.
But your next loop then ignores all but the first one of those dictionaries:
for j in output[0]['result'][0]['runners']:
Here, output[0]['result'][0] is the first object bound to i in the outer loop. So for each i, you ignore the object (apart from using the 'marketId' key), and then only look at the runners for the first such object.
Use the 'runners' key of i instead:
for i in output[0]['result']:
    print(i.get('marketId'))
    for j in i['runners']:  # i is the object from the outer loop
        print(j.get('selectionId'))
        # ...
i is first bound to output[0]['result'][0], then to output[0]['result'][1], etc., so i['runners'] now follows along and lets you process the correct substructure.
Do so for each nesting level; j is another dictionary, so use j['ex']['availableToBack'], etc.
You don't need to nest your for loops for the availableToBack and availableToLay entries either. ex contains just one dictionary object, and it has those two keys (which reference lists), so you don't need to produce output for all availableToLay prices for each price in availableToBack.
It's easier to see all this if you pretty-print the Python data structure:
>>> from pprint import pprint
>>> pprint(output)
[{'id': 1,
'jsonrpc': '2.0',
'result': [{'betDelay': 0,
'bspReconciled': False,
'complete': True,
'crossMatching': True,
'inplay': False,
'isMarketDataDelayed': True,
'lastMatchTime': '2018-02-12T10:56:09.726Z',
'marketId': '1.139185909',
'numberOfActiveRunners': 3,
'numberOfRunners': 3,
'numberOfWinners': 1,
'runners': [{'ex': {'availableToBack': [{'price': 1.39,
'size': 56703.76}],
'availableToLay': [{'price': 1.4,
'size': 35537.54}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 1.4,
'selectionId': 55190,
'status': 'ACTIVE',
'totalMatched': 0.0},
{'ex': {'availableToBack': [{'price': 10.5,
'size': 3592.64}],
'availableToLay': [{'price': 11.0,
'size': 5913.05}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 11.0,
'selectionId': 1703,
'status': 'ACTIVE',
'totalMatched': 0.0},
{'ex': {'availableToBack': [{'price': 5.2,
'size': 9136.62}],
'availableToLay': [{'price': 5.3,
'size': 5361.48}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 5.3,
'selectionId': 58805,
'status': 'ACTIVE',
'totalMatched': 0.0}],
'runnersVoidable': False,
'status': 'OPEN',
'totalAvailable': 1039329.11,
'totalMatched': 645229.98,
'version': 2045792715},
{'betDelay': 0,
'bspReconciled': False,
'complete': True,
'crossMatching': True,
'inplay': False,
'isMarketDataDelayed': True,
'lastMatchTime': '2018-02-12T10:25:33.842Z',
'marketId': '1.139782182',
'numberOfActiveRunners': 3,
'numberOfRunners': 3,
'numberOfWinners': 1,
'runners': [{'ex': {'availableToBack': [{'price': 2.22,
'size': 148.27}],
'availableToLay': [{'price': 2.32,
'size': 10.1}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 2.3,
'selectionId': 18567,
'status': 'ACTIVE',
'totalMatched': 0.0},
{'ex': {'availableToBack': [{'price': 3.8,
'size': 76.9}],
'availableToLay': [{'price': 3.9,
'size': 20.57}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 3.85,
'selectionId': 62683,
'status': 'ACTIVE',
'totalMatched': 0.0},
{'ex': {'availableToBack': [{'price': 3.2,
'size': 21.19}],
'availableToLay': [{'price': 3.5,
'size': 85.41}],
'tradedVolume': []},
'handicap': 0.0,
'lastPriceTraded': 3.25,
'selectionId': 58805,
'status': 'ACTIVE',
'totalMatched': 0.0}],
'runnersVoidable': False,
'status': 'OPEN',
'totalAvailable': 39526.8,
'totalMatched': 1715.46,
'version': 2044817355}]}]
Your code would be more readable if you used more descriptive names:
for response in output:  # loop over the JSONRPC responses
    for market in response['result']:  # each is a market
        print(market['marketId'])
        for runner in market['runners']:
            print(runner['selectionId'])
            for entry in runner['ex']['availableToBack']:
                print(entry['price'])
            for entry in runner['ex']['availableToLay']:
                print(entry['price'])
This outputs:
1.139185909
55190
1.39
1.4
1703
10.5
11.0
58805
5.2
5.3
1.139782182
18567
2.22
2.32
62683
3.8
3.9
58805
3.2
3.5
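If you would rather collect the values than print them, the same traversal can be wrapped in a generator. This is just a sketch: the function name runner_prices and the tuple layout are my own choices, and the sample data is a trimmed-down copy of the first market from the question.

```python
def runner_prices(output):
    """Yield (marketId, selectionId, back prices, lay prices) per runner."""
    for response in output:
        for market in response['result']:
            for runner in market['runners']:
                ex = runner['ex']
                yield (
                    market['marketId'],
                    runner['selectionId'],
                    [entry['price'] for entry in ex['availableToBack']],
                    [entry['price'] for entry in ex['availableToLay']],
                )

# Trimmed-down sample with the same shape as the response in the question
output = [{'result': [{'marketId': '1.139185909', 'runners': [
    {'selectionId': 55190,
     'ex': {'availableToBack': [{'price': 1.39, 'size': 56703.76}],
            'availableToLay': [{'price': 1.4, 'size': 35537.54}]}},
    {'selectionId': 1703,
     'ex': {'availableToBack': [{'price': 10.5, 'size': 3592.64}],
            'availableToLay': [{'price': 11.0, 'size': 5913.05}]}},
]}]}]

rows = list(runner_prices(output))
for row in rows:
    print(row)
```

Each row keeps the market and runner together with their prices, which tends to be easier to work with later than a flat stream of printed values.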
I have written a program which gives me the following output for five nodes, namely the shortest path lengths from each node to every other node:
G1= {'D': 3.0, 'E': 4.0, 'B': 1.0, 'C': 5.0, 'A': 0}
G1={'D': 2.0, 'E': 3.0, 'B': 0, 'C': 4.0, 'A': 1.0}
G1={'D': 2.0, 'E': 3.0, 'B': 4.0, 'C': 0, 'A': 5.0}
G1={'D': 0, 'E': 1.0, 'B': 2.0, 'C': 2.0, 'A': 3.0}
G1={'D': 1.0, 'E': 0, 'B': 3.0, 'C': 3.0, 'A': 4.0}
I am trying to find the mean of all of the values in the above output. I tried the following code:
for s in G:
    G1=ShortestPaths(G,s)#this gives the output i mentioned above
    mean= sum([G1[s] for s in G1])/(len(G1)-1)# this is where i am not getting result
return float(mean)
But it gives the mean of only the last line. I need the sum of all the values in the dictionaries (the sum of 25 values) divided by 20 (since there is a zero in every line of my output, which I should not count). Can anyone help me with this with some simple code? I am not supposed to use .items and other built-in functions.
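Accumulate the totals inside the loop and calculate the mean at the end, after the loop:
total = 0.0
count = 0
for s in G:
    G1 = ShortestPaths(G, s)
    # use a different loop variable so the source node s is not shadowed
    total += sum([G1[node] for node in G1])
    count += len(G1) - 1
return float(total / count) if count else None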
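Here is a self-contained version you can run to check the numbers. Since ShortestPaths and G are not shown, the dict results below simply stands in for the five outputs from the question, keyed by the source node; the function name mean_path_length is made up. It uses plain loops and indexing only, with no .items and no sum.

```python
# Stand-in for ShortestPaths(G, s): the five results from the question
results = {
    'A': {'D': 3.0, 'E': 4.0, 'B': 1.0, 'C': 5.0, 'A': 0},
    'B': {'D': 2.0, 'E': 3.0, 'B': 0, 'C': 4.0, 'A': 1.0},
    'C': {'D': 2.0, 'E': 3.0, 'B': 4.0, 'C': 0, 'A': 5.0},
    'D': {'D': 0, 'E': 1.0, 'B': 2.0, 'C': 2.0, 'A': 3.0},
    'E': {'D': 1.0, 'E': 0, 'B': 3.0, 'C': 3.0, 'A': 4.0},
}

def mean_path_length(results):
    total = 0.0
    count = 0
    for source in results:
        distances = results[source]
        for target in distances:
            if target != source:  # skip the zero distance to itself
                total += distances[target]
                count += 1
    return total / count if count else None

print(mean_path_length(results))  # 2.8
```

The 25 values add up to 56, and dividing by the 20 non-self entries gives 2.8.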