Matrix weight algorithm - python

I'm trying to work out how to write an algorithm that calculates the weights across different lists in the most efficient way. I have a dict which contains various ids:
x["Y"]=[id1,id2,id3...]
x["X"]=[id2,id3....]
x["Z"]=[id3]
.
.
I have an associated weight for each of the elements:
w["Y"]=10
w["X"]=10
w["Z"]=5
Given an input, e.g. "Y","Z", I want to get the output:
(id1,10),(id2,10),(id3,15)
id3 gets 15 because it's in both x["Y"] and x["Z"].
Is there a way I can do this with vector matrices?

You can use the itertools library to group together common terms in a list. Note that groupby only groups consecutive items with the same key, so the list must be sorted by id first:
import itertools
import operator
a = {'x': [2,3], 'y': [1,2,3], 'z': [3]}
b = {'x': 10, 'y': 10, 'z': 5}
def matrix_weight(letter1, letter2):
    final_list = []
    for i in a[letter1]:
        final_list.append((i, b[letter1]))
    for i in a[letter2]:
        final_list.append((i, b[letter2]))
    # final_list = [(1,10), (2,10), (3,10), (3,5)]
    # sort by id so that equal ids are adjacent for groupby
    final_list.sort(key=operator.itemgetter(0))
    it = itertools.groupby(final_list, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)
print(list(matrix_weight('y', 'z')))
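If more than two keys may be requested, the same idea can accept any number of them and accumulate the weights in a collections.Counter, which removes the sort-then-groupby step. A minimal sketch; the function name weighted_ids is hypothetical, and a and b are the same data as in the itertools answer:

```python
from collections import Counter

a = {'x': [2, 3], 'y': [1, 2, 3], 'z': [3]}
b = {'x': 10, 'y': 10, 'z': 5}

def weighted_ids(*letters):
    """Sum the weight of every requested list that contains each id (hypothetical helper)."""
    totals = Counter()
    for letter in letters:
        for i in a[letter]:
            totals[i] += b[letter]  # each list adds its weight to every member
    return sorted(totals.items())

print(weighted_ids('y', 'z'))       # [(1, 10), (2, 10), (3, 15)]
print(weighted_ids('x', 'y', 'z'))  # [(1, 10), (2, 20), (3, 25)]
```

Counter returns 0 for missing keys, so no "if id not in result" check is needed.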

I'll use string ids as in your example, but integer ids work the same way.
def id_weights(x, w, keys):
    result = {}
    for key in keys:
        for id in x[key]:
            if id not in result:
                result[id] = 0
            result[id] += w[key]
    return [(id, result[id]) for id in sorted(result.keys())]
x = {"Y": ["id1","id2","id3"],
     "X": ["id2", "id3"],
     "Z": ["id3"]}
w = {"Y": 10, "X": 10, "Z": 5}
if __name__ == "__main__":
    keys = ["Y", "Z"]
    print(id_weights(x, w, keys))
gives
[('id1', 10), ('id2', 10), ('id3', 15)]
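Since the question asks about vector matrices, the same totals can also be read as a vector-matrix product. A sketch assuming NumPy is available: build a 0/1 incidence matrix M with one row per list and one column per id, put the requested lists' weights in a selection vector s (0 for lists not requested), and s @ M gives each id's total weight. The names names, ids, M and weights_for are illustrative, not taken from either answer.

```python
import numpy as np

x = {"Y": ["id1", "id2", "id3"], "X": ["id2", "id3"], "Z": ["id3"]}
w = {"Y": 10, "X": 10, "Z": 5}

# Fix an ordering for the lists (rows) and the ids (columns).
names = sorted(x)                                          # ['X', 'Y', 'Z']
ids = sorted({i for members in x.values() for i in members})  # ['id1', 'id2', 'id3']
col = {i: c for c, i in enumerate(ids)}

# Incidence matrix: M[r, c] == 1 when ids[c] is a member of x[names[r]].
M = np.zeros((len(names), len(ids)), dtype=int)
for r, name in enumerate(names):
    for i in x[name]:
        M[r, col[i]] = 1

def weights_for(keys):
    # Selection vector: a list's weight if it was requested, else 0.
    s = np.array([w[n] if n in keys else 0 for n in names])
    totals = s @ M  # one weighted sum per id
    return [(i, int(t)) for i, t in zip(ids, totals) if t > 0]

print(weights_for({"Y", "Z"}))  # [('id1', 10), ('id2', 10), ('id3', 15)]
```

Building M costs one pass over the data; after that each query is a single product, which mainly pays off when many different key combinations are queried against the same lists.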

Related

Adding "sequential" information to python list using dictionaries

The problem
I would like to create a dictionary of dicts out of a flat list I have in order to add a "sequentiality" piece of information, but I am having some trouble finding a solution.
The list is something like
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
and I am shooting for a dict like:
dictionary = {
    'Step_1': {
        'Q=123',
        'W=456',
        'E=789'
    },
    'Step_2': {
        'Q=753',
        'W=159',
        'E=888'
    }
}
I would like to end up with a function that handles an arbitrary number of steps, so I can apply it to my dataset. Suppose the dataset contains lists like a with 1 <= n < 6 steps each.
My idea
Up to now, I came up with this:
nsteps = a.count("Q")
data = {}
for i in range(nsteps):
    stepi = {}
    for element in a:
        new = element.split("=")
        if new[0] not in stepi:
            stepi[new[0]] = new[1]
        else:
            pass
    data[f"Step_{i}"] = stepi
but it doesn't work as intended: both steps in the final dictionary contain the data of Step_1.
Any idea on how to solve this?
One way would be:
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
indices = [i for i, v in enumerate(a) if v[0:2] == 'Q=']
dictionary = {f'Step_{idx+1}': {k: v for k, v in (el.split('=') for el in a[s:e])}
              for idx, (s, e) in enumerate(zip(indices, indices[1:] + [len(a)]))}
print(dictionary)
{'Step_1': {'Q': '123', 'W': '456', 'E': '789'},
'Step_2': {'Q': '753', 'W': '159', 'E': '888'}}
Details:
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
# Get indices where a step starts.
# This also handles steps with a variable number of elements, and keys that start with 'Q' but are not exactly 'Q'.
indices = [i for i, v in enumerate(a) if v[0:2] == 'Q=']
# Get the slices of the list starting at Q and ending before the next Q.
slices = list(zip(indices, indices[1:] + [len(a)]))
print(slices)
# [(0, 3), (3, 6)]
# Get step index and (start, end) pair for each slice.
idx_slices = list(enumerate(slices))
print(idx_slices)
# [(0, (0, 3)), (1, (3, 6))]
# Split the strings in the list slices and use the result as key-value pair for a given start:end.
# Here an example for step 1:
step1 = idx_slices[0][1] # This is (0, 3).
dict_step1 = {k: v for k, v in (el.split('=') for el in a[step1[0]:step1[1]])}
print(dict_step1)
# {'Q': '123', 'W': '456', 'E': '789'}
# Do the same for each slice.
step_dicts = {f'Step_{idx+1}': {k: v for k, v in (el.split('=') for el in a[s:e])}
              for idx, (s, e) in idx_slices}
print(step_dicts)
# {'Step_1': {'Q': '123', 'W': '456', 'E': '789'}, 'Step_2': {'Q': '753', 'W': '159', 'E': '888'}}
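You were almost there. The way you were counting the number of "Q"s was wrong (a.count("Q") counts elements equal to "Q", which is 0 here), some lines of code had a wrong indentation (for instance data[f"Step_{i}"] = stepi), and the inner loop scanned the whole list for every step, so every step picked up the first Q, W and E again. Looping only over the slice that belongs to step i fixes that (this assumes all steps have the same number of elements):
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
def main():
    nsteps = len([s for s in a if s.startswith("Q=")])
    step_len = len(a) // nsteps
    data = {}
    for i in range(nsteps):
        stepi = {}
        # only look at the elements that belong to step i
        for element in a[i * step_len:(i + 1) * step_len]:
            new = element.split("=")
            if new[0] not in stepi:
                stepi[new[0]] = new[1]
        data[f"Step_{i + 1}"] = stepi
    return data
if __name__ == "__main__":
    data = main()
    print(data)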
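If steps can have different lengths, a single pass that starts a new step at every 'Q=' element avoids any assumption about step sizes. A minimal sketch, assuming each step begins with a Q entry; the function name split_steps and its first_key parameter are hypothetical:

```python
def split_steps(items, first_key="Q"):
    """Group 'key=value' strings into Step_1, Step_2, ... dicts (hypothetical helper)."""
    data = {}
    current = None
    for element in items:
        key, value = element.split("=", 1)
        if key == first_key:  # a new step starts at every Q entry
            current = {}
            data[f"Step_{len(data) + 1}"] = current
        if current is None:
            raise ValueError(f"{element!r} appears before the first {first_key!r} entry")
        current[key] = value
    return data

a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
print(split_steps(a))
# {'Step_1': {'Q': '123', 'W': '456', 'E': '789'}, 'Step_2': {'Q': '753', 'W': '159', 'E': '888'}}

print(split_steps(['Q=1', 'W=2', 'Q=3']))
# {'Step_1': {'Q': '1', 'W': '2'}, 'Step_2': {'Q': '3'}}
```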
First group the items by their first character (the key) like this:
from itertools import groupby
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
o = groupby(sorted(a, key=lambda x: x[0]), key=lambda x: x[0])
then create a dictionary like this:
d = {i: list(g) for i, g in o}
then iterate over them and make your result:
result = {f"step_{i+1}": [v[i] for v in d.values() if i < len(v)] for i in range(len(max(d.values(), key=len)))}
the result will be:
{'step_1': ['E=789', 'Q=123', 'W=456'], 'step_2': ['E=888', 'Q=753', 'W=159']}
From what I understood from your question:
We can split the list into groups of three elements and loop through them one group at a time.
With some help from this answer:
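from itertools import zip_longest
a = ['Q=123', 'W=456', 'E=789', 'Q=753', 'W=159', 'E=888']
def grouper(n, iterable):
    # n references to the same iterator, so zip_longest pulls n items per tuple
    # (the last tuple is padded with None if len(iterable) is not a multiple of n)
    args = [iter(iterable)] * n
    return zip_longest(*args)
result = dict()
for i, d in enumerate(grouper(3, a), start=1):
    result.update({f"Step_{i}": set(d)})
print(result)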
{
'Step_1': {'E=789', 'Q=123', 'W=456'},
'Step_2': {'E=888', 'Q=753', 'W=159'}
}

how to calculate percentage with nested dictionary

I'm stuck on how to calculate percentages with a nested dictionary. I have a dictionary defined as old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}, and I know the percentages of X and Y, which are in this table:
import pandas as pd
input= {"name":['X','Y'],"percentage":[0.9,0.1]}
table = pd.DataFrame(input)
OUTPUT:
name percentage
0 X 0.9
1 Y 0.1
But I want to multiply the percentages of X and Y by a, b and c separately. That is, X*a = 0.9*0.69, X*b = 0.9*0.31, Y*a = 0.1*0.96, Y*c = 0.1*0.04... so that I can find the mixed percentage of a, b and c (e.g. a = 0.9*0.69 + 0.1*0.96 = 0.717), and finally get a new dictionary new_dict = {'a': 0.717, 'b': 0.279, 'c': 0.004}.
I'm struggling with how to break through the nested dictionary and how to link X and Y with the corresponding value in the table. Can anyone help me? Thank you!
You could use a DataFrame for the first dictionary and a Series for the second and perform an aligned multiplication, then sum:
import pandas as pd
old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}
df = pd.DataFrame(old_dict)
inpt = {"name":['X','Y'],"percentage":[0.9,0.1]}
table = pd.DataFrame(inpt)
# convert table to series:
ser = table.set_index('name')['percentage']
# alternative build directly a Series:
# ser = pd.Series(dict(zip(*inpt.values())))
# compute expected values:
out = (df*ser).sum(axis=1).to_dict()
output: {'a': 0.717, 'b': 0.279, 'c': 0.004}
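For comparison, the same weighted sum can be computed without pandas by walking the nested dictionary directly. A sketch assuming the percentages are first turned into a plain dict keyed by name (percentage is a hypothetical name, built here with dict(zip(...)) from the question's input):

```python
old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}
inpt = {"name": ['X', 'Y'], "percentage": [0.9, 0.1]}

# Map each name to its percentage: {'X': 0.9, 'Y': 0.1}
percentage = dict(zip(inpt["name"], inpt["percentage"]))

new_dict = {}
for name, shares in old_dict.items():
    for component, share in shares.items():
        # Accumulate percentage[name] * share into the component's running total.
        new_dict[component] = new_dict.get(component, 0) + percentage[name] * share

print({k: round(v, 3) for k, v in new_dict.items()})
# {'a': 0.717, 'b': 0.279, 'c': 0.004}
```

A component missing from one of the inner dicts (like b under Y) simply contributes nothing, which matches how the pandas version treats the resulting NaN in sum.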

How to avoid overwriting the data from a list to a dictionary using for loop?

I have a nested dictionary as follows:
bus = dict()
pvsystem = dict()
for j in range(500):
    bus[j] = {'vm': {'a': 1, 'b': 1, 'c': 1}, 'va': {'a': 1, 'b': 1, 'c': 1}}
nw = dict()
for step in range(24):
    c = step + 1
    nw[str(c)] = {'bus': bus}
solution = {'nw': nw}
results = {'solution': solution}
I am using a for loop to fill up the values in the nested dictionary as follows:
for step in range(10):
    c = step + 1
    for b in range(20):
        AllpuVmVa = dss.Bus.puVmagAngle()
        results['solution']['nw'][str(c)]['bus'][b]["vm"]['a'] = AllpuVmVa[0]
        results['solution']['nw'][str(c)]['bus'][b]["va"]['a'] = AllpuVmVa[1]
    print("Step: ", c, " Voltages: ", AllpuVmVa)
AllpuVmVa is a list. Its values change whenever the step changes (it is computed by a function outside the loop).
The print output clearly shows that AllpuVmVa differs at each step, but the values stored in results['solution']['nw'][str(c)]['bus'][b]["vm"]['a'] and results['solution']['nw'][str(c)]['bus'][b]["va"]['a'] are the same for all steps, equal to the values from the last step. It seems the data is being overwritten.
Is there any idea to fix this issue?
The problem is that you assign the same dictionary object stored in bus to every value in the nw dict, so all steps share one bus dict and each step overwrites the previous one. To fix the issue, create a new bus dictionary on every iteration. For example:
bus = dict()
pvsystem = dict()
def get_bus():
    return {j: {'vm': {'a': 1, 'b': 1, 'c': 1}, 'va': {'a': 1, 'b': 1, 'c': 1}} for j in range(500)}
nw = dict()
for step in range(24):
    c = step + 1
    nw[str(c)] = {'bus': get_bus()}  # <-- use get_bus() here
solution = {'nw': nw}
results = {'solution': solution}
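A scaled-down demonstration of why the original loop lost data: every nw entry refers to the same dict object, so writing through one step writes through all of them. copy.deepcopy (standard library) is an alternative to a get_bus()-style factory when you already have a template dict. The names template, independent and shared are illustrative:

```python
import copy

template = {'vm': {'a': 1, 'b': 1, 'c': 1}, 'va': {'a': 1, 'b': 1, 'c': 1}}

# Each step gets its own deep copy, so the nested dicts are not shared.
independent = {str(c): {'bus': copy.deepcopy(template)} for c in range(1, 4)}
independent['1']['bus']['vm']['a'] = 99
print(independent['3']['bus']['vm']['a'])  # 1

# Every step references the very same dict object (the bug from the question).
shared = {str(c): {'bus': template} for c in range(1, 4)}
shared['1']['bus']['vm']['a'] = 99
print(shared['3']['bus']['vm']['a'])  # 99
print(shared['1']['bus'] is shared['3']['bus'])  # True
```

A shallow copy (dict(template) or template.copy()) would not be enough here, because the inner 'vm' and 'va' dicts would still be shared.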

Python: How to find arrays that have a certain element efficiently

Given several lists (an element may appear in more than one list) and a string, I want to find the names of all lists that contain the given string.
I could simply go through all the lists with if statements, but I feel there is a more efficient way to do so.
Any suggestion and advice would be appreciated. Thank you.
Example of the simple method I came up with:
arrayA = ['1','2','3','4','5']
arrayB = ['3','4','5']
arrayC = ['1','3','5']
arrayD = ['7']
def find_arrays(givenString):
    foundArrays = []
    if givenString in arrayA:
        foundArrays.append('arrayA')
    if givenString in arrayB:
        foundArrays.append('arrayB')
    if givenString in arrayC:
        foundArrays.append('arrayC')
    if givenString in arrayD:
        foundArrays.append('arrayD')
    return foundArrays
Membership lookup in a list is not very efficient, because it checks the elements one by one (O(n)); a set uses hashing and takes O(1) time on average.
Let's define your data like
data = {  # a dict of sets
    "a": {1, 2, 3, 4, 5},
    "b": {3, 4, 5},
    "c": {1, 3, 5},
    "d": {7}
}
then we can search like
search_for = 3 # for example
in_which = {label for label,values in data.items() if search_for in values}
# -> in_which = {'a', 'b', 'c'}
If you are going to repeat this often, it may be worth pre-processing your data like
from collections import defaultdict
lookup = defaultdict(set)
for label, values in data.items():
    for v in values:
        lookup[v].add(label)
Now you can simply
in_which = lookup[search_for] # -> {'a', 'b', 'c'}
The simple one-liner is:
result = [lst for lst in [arrayA, arrayB, arrayC, arrayD] if givenString in lst]
or, if you prefer a more functional style (in Python 3, filter returns an iterator, so wrap it in list()):
result = list(filter(lambda lst: givenString in lst, [arrayA, arrayB, arrayC, arrayD]))
Note that neither of these gives you the NAME of the list. You shouldn't ever need to know that, though.
Array names?
If you really need them, try something like this with eval(), although using eval() is generally considered bad practice:
arrayA = [1,2,3,4,5,'x']
arrayB = [3,4,5]
arrayC = [1,3,5]
arrayD = [7,'x']
array_names = ['arrayA', 'arrayB', 'arrayC', 'arrayD']
givenString = 'x'
result = [arr for arr in array_names if givenString in eval(arr)]
print(result)
['arrayA', 'arrayD']
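The names can also be kept without eval() by storing the lists in a dict keyed by name and converting each list to a set once, which combines the dict-of-sets idea with the question's data. A sketch under that assumption; arrays, array_sets and find_array_names are hypothetical names:

```python
arrays = {
    'arrayA': ['1', '2', '3', '4', '5'],
    'arrayB': ['3', '4', '5'],
    'arrayC': ['1', '3', '5'],
    'arrayD': ['7'],
}

# Convert each list to a set once, so every membership test is O(1) on average.
array_sets = {name: set(values) for name, values in arrays.items()}

def find_array_names(given_string):
    """Return the names of all arrays containing given_string, in insertion order."""
    return [name for name, values in array_sets.items() if given_string in values]

print(find_array_names('3'))  # ['arrayA', 'arrayB', 'arrayC']
print(find_array_names('7'))  # ['arrayD']
```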

Get average value from list of dictionary

I have a list of dictionaries, say:
total = [{"date": "2014-03-01", "value": 200}, {"date": "2014-03-02", "value": 100}, {"date": "2014-03-03", "value": 400}]
I need to get the maximum, minimum and average values from it. I can get the max and min values with the code below:
print(min(d['value'] for d in total))
print(max(d['value'] for d in total))
But now I need to get the average value. How can I do it?
Just divide the sum of values by the length of the list:
print(sum(d['value'] for d in total) / len(total))
Note that in Python 2, dividing two integers returns an integer, so the average of [5, 5, 0, 0] would be 2 instead of 2.5. If you need a more precise result there, convert to float() first (in Python 3, / always returns a float):
print(float(sum(d['value'] for d in total)) / len(total))
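On Python 3.4 and later, the standard-library statistics module can compute the mean directly, and extracting the values into a list once lets min, max and mean share it. A small sketch using the question's data:

```python
import statistics

total = [{"date": "2014-03-01", "value": 200},
         {"date": "2014-03-02", "value": 100},
         {"date": "2014-03-03", "value": 400}]

# Extract the numbers once, then reuse the list for every statistic.
values = [d['value'] for d in total]
lowest, highest = min(values), max(values)
average = statistics.mean(values)  # true division, even for integer data

print(lowest, highest, average)  # 100 400 233.33333333333334
```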
I needed a more general implementation that averages every key of the dictionaries, so here is one simple option:
def dict_mean(dict_list):
    mean_dict = {}
    for key in dict_list[0].keys():
        mean_dict[key] = sum(d[key] for d in dict_list) / len(dict_list)
    return mean_dict
Testing:
dicts = [{"X": 5, "value": 200}, {"X": -2, "value": 100}, {"X": 3, "value": 400}]
dict_mean(dicts)
{'X': 2.0, 'value': 233.33333333333334}
from functools import reduce
reduce(lambda x, y: x + y, [d['value'] for d in total]) / len(total)
catavaran's answer is simpler; you don't need a lambda.
An extension of dsalaj's answer for when the values are numeric lists instead:
import numpy as np
def dict_mean(dict_list):
    mean_dict = {}
    for key in dict_list[0].keys():
        mean_dict[key] = np.mean([d[key] for d in dict_list], axis=0)
    return mean_dict
