Reconstruct / Build new array based on array items - python

I have an array (a list of dicts) composed of varied sub-items like:
[{'x':'xvalue', 'y':'yvalue', 'group':'groupname'}...{'x':'xnvalue', 'y':'ynvalue', 'group':'groupnname'}]
I want to create a new array, or serialize the same array, in the form of:
[{'groupa':{'x':'xvalue', 'y':'yvalue'}}...{'groupn':{'x':'xnvalue', 'y':'ynvalue'}}]
Apologies for putting the question in a very weird way, but I didn't have any better explanation of the problem.
My preferred scripting language here is python.
Sample data:
{"id":"jMGTsJXWiI","key":"s1","value":{'group' : "x", 't':'45', 'xs':'x5e8'}}
{"id":"545sJXWiI","key":"s3","value":{'group' : "x", 't':'415', 'xs':'xr58'}}
{"id":"xjMdT45","key":"s2","value":{'group' : "y", 't':'405', 'xs':'xs58'}}

Assuming your data is really a list of dictionaries, this would work:
>>> groups
[{'y': 'yvalue', 'x': 'xvalue', 'group': 'groupname'}, {'y': 'ynvalue', 'x': 'xnvalue', 'group': 'groupnname'}]
>>> final_groups = {grp.pop('group'):grp for grp in groups}
>>> final_groups
{'groupname': {'y': 'yvalue', 'x': 'xvalue'}, 'groupnname': {'y': 'ynvalue', 'x': 'xnvalue'}}
Note that pop removes the 'group' key from the original dicts in groups. This assumes Python 2.7+ because of the dictionary comprehension. On 2.6 or earlier, use
>>> final_groups = dict((grp.pop('group'),grp) for grp in groups)
EDIT
To answer the question in your comment.
No, there is no import group. Here is the complete script:
>>> groups = [{'x':'xvalue', 'y':'yvalue', 'group':'groupname'},{'x':'xnvalue', 'y':'ynvalue', 'group':'groupnname'}]
>>> final_groups = dict((grp.pop('group'),grp) for grp in groups)
>>> final_groups
{'groupname': {'y': 'yvalue', 'x': 'xvalue'}, 'groupnname': {'y': 'ynvalue', 'x': 'xnvalue'}}
The {...} syntax is 2.7+ specific. It is called a dictionary comprehension, and if your Python version is less than 2.7 then you can't write it that way; instead you can use the dict() form listed above.
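Since pop mutates the original dicts in groups, here is a non-mutating variant (a small sketch of an alternative, not part of the original answer) if you need to keep groups intact:
groups = [{'x': 'xvalue', 'y': 'yvalue', 'group': 'groupname'},
          {'x': 'xnvalue', 'y': 'ynvalue', 'group': 'groupnname'}]

# build the new mapping without touching the originals
final_groups = {grp['group']: {k: v for k, v in grp.items() if k != 'group'}
                for grp in groups}
# {'groupname': {'x': 'xvalue', 'y': 'yvalue'}, 'groupnname': {'x': 'xnvalue', 'y': 'ynvalue'}}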
EDIT 2
How about something like:
final_groups = dict(
    [('%s.%s' % (item['value'].pop('group'), item['key']), item['value'])
     for item in groups]
)
OUTPUT
{'y.s2': {'xs': 'xs58', 't': '405'}, 'x.s3': {'xs': 'xr58', 't': '415'}, 'x.s1': {'xs': 'x5e8', 't': '45'}}
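If you would rather keep each group's records together in a list (instead of flattening them to 'group.key' keys), a sketch along these lines should work; this is just one possible shape, not from the original answer:
from collections import defaultdict

records = [
    {"id": "jMGTsJXWiI", "key": "s1", "value": {'group': "x", 't': '45', 'xs': 'x5e8'}},
    {"id": "545sJXWiI", "key": "s3", "value": {'group': "x", 't': '415', 'xs': 'xr58'}},
    {"id": "xjMdT45", "key": "s2", "value": {'group': "y", 't': '405', 'xs': 'xs58'}},
]

by_group = defaultdict(list)
for item in records:
    value = dict(item['value'])   # copy so the original record is untouched
    group = value.pop('group')    # remove the group marker from the copy
    by_group[group].append({item['key']: value})

# by_group == {'x': [{'s1': {'t': '45', 'xs': 'x5e8'}}, {'s3': {'t': '415', 'xs': 'xr58'}}],
#              'y': [{'s2': {'t': '405', 'xs': 'xs58'}}]}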

One-liner:
>>> lis=[{'x':'xvalue', 'y':'yvalue', 'group':'groupname'},{'x':'xnvalue', 'y':'ynvalue', 'group':'groupnname'}]
>>> [{x['group']:{y:x[y] for y in x if y !='group'} for x in lis}]
[{'groupname': {'y': 'yvalue', 'x': 'xvalue'}, 'groupnname': {'y': 'ynvalue', 'x': 'xnvalue'}}]
using a for loop:
lis = [{'x':'xvalue', 'y':'yvalue', 'group':'groupname'}, {'x':'xnvalue', 'y':'ynvalue', 'group':'groupnname'}]
lis1 = [{} for _ in range(len(lis))]  # lis1 = [{}, {}]
for i, x in enumerate(lis):
    lis1[i][x['group']] = {}  # creates lis1 = [{'groupname': {}}, ...]
    for y in x:
        if y != 'group':
            lis1[i][x['group']][y] = x[y]  # add the non-group values under the group key
print(lis1)

Related

how to calculate percentage with nested dictionary

I'm stuck on how to calculate percentages with a nested dictionary. I have a dictionary defined by old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}, and I know the percentages of X and Y given in the table:
input= {"name":['X','Y'],"percentage":[0.9,0.1]}
table = pd.DataFrame(input)
OUTPUT:
  name  percentage
0    X         0.9
1    Y         0.1
But I hope to use the percentages of X and Y to multiply a, b, c separately. That is, X*a = 0.9*0.69, X*b = 0.9*0.31, Y*a = 0.1*0.96, Y*c = 0.1*0.04... so that I can find the mixed percentages of a, b, and c, and finally get a new dictionary new_dict = {'a': 0.717, 'b': 0.279, 'c': 0.004}.
I'm struggling with how to break through the nested dictionary and how to link X and Y with the corresponding value in the table. Can anyone help me? Thank you!
You could use a DataFrame for the first dictionary and a Series for the second and perform an aligned multiplication, then sum:
import pandas as pd

old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}
df = pd.DataFrame(old_dict)

inpt = {"name": ['X', 'Y'], "percentage": [0.9, 0.1]}
table = pd.DataFrame(inpt)

# convert table to a Series indexed by name:
ser = table.set_index('name')['percentage']
# alternative: build a Series directly:
# ser = pd.Series(dict(zip(*inpt.values())))

# compute the expected values (aligned multiply, then sum across X/Y):
out = (df * ser).sum(axis=1).to_dict()
output: {'a': 0.717, 'b': 0.279, 'c': 0.004}
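If you'd rather avoid pandas, a plain-Python sketch of the same weighted sum (same data, standard library only) could look like:
old_dict = {'X': {'a': 0.69, 'b': 0.31}, 'Y': {'a': 0.96, 'c': 0.04}}
weights = {'X': 0.9, 'Y': 0.1}

new_dict = {}
for name, inner in old_dict.items():
    for key, value in inner.items():
        # accumulate weight-of-name * value for each inner key
        new_dict[key] = new_dict.get(key, 0) + weights[name] * value

print(new_dict)  # roughly {'a': 0.717, 'b': 0.279, 'c': 0.004}, up to float rounding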

python naive string token matcher

I've written a very naive token string search matcher. It's a little too naive though: with the following code, it brings back every artist in the artists list, due to how 'a r i z o n a' is tokenised.
import collections
import re
def __tokenised_match(artist, search_artist):
    matches = []
    if len(re.split(r'[\\\s/-]', search_artist)) > 1:
        a = [artist['sanitisedOne'], search_artist]
        bag_of_words = [collections.Counter(re.findall(r'\w+', words)) for words in a]
        sumbags = sum(bag_of_words, collections.Counter())
        print(sumbags)
        for key, value in sumbags.items():
            if len(re.findall(r'\b({k})\b'.format(k=key), search_artist)) > 0 and value > 1:
                matches.append(artist)
    if len(matches):
        return matches

artists = [
    {'artist': 'A R I Z O N A', 'sanitisedOne': 'a r i z o n a'},
    {'artist': 'Wutang Clan', 'sanitisedOne': 'wutang clan'},
]
search_artist = 'a r i z o n a'
for artist in artists:
    print(__tokenised_match(artist, search_artist))
This creates sumbags like this:
Counter({'a': 4, 'r': 2, 'i': 2, 'z': 2, 'o': 2, 'n': 2})
Counter({'a': 2, 'wutang': 1, 'clan': 1, 'r': 1, 'i': 1, 'z': 1, 'o': 1, 'n': 1})
This is a bit of an edge case, but I wonder how I can tighten up against it. It would be fine for 'wutang clan' to match, but when it's single letters like this it's a little much, and it brings back every artist because 'a' matches twice.
The basic problem is that you return success on only a single match. This will kill your accuracy for any artist with an easily matched token in the name. We could tune your algorithm for matching a certain percentage of words, or for doing a bag-of-letters, intersection-over-union ratio, but ...
I recommend that you use something a bit stronger, such as string similarity, which is easily found in Python code. Being already packaged, it's much easier to use than coding your own solution.
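For example, here is a minimal sketch using difflib.SequenceMatcher from the standard library; the 0.8 threshold is just an illustrative starting point to tune, not a recommendation from the original answer:
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    # ratio() returns a similarity score between 0.0 and 1.0
    return SequenceMatcher(None, a, b).ratio() >= threshold

artists = [
    {'artist': 'A R I Z O N A', 'sanitisedOne': 'a r i z o n a'},
    {'artist': 'Wutang Clan', 'sanitisedOne': 'wutang clan'},
]
search_artist = 'a r i z o n a'

matches = [artist for artist in artists if similar(artist['sanitisedOne'], search_artist)]
print(matches)  # only the A R I Z O N A entry scores above the threshold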

Python How to find arrays that has a certain element efficiently

Given several lists (a list can have elements that also appear in another list) and a string, I want to find the names of all lists that contain the given string.
I could simply go through all the lists with if statements, but I feel there is a more efficient way to do this.
Any suggestions and advice would be appreciated. Thank you.
Example of Simple Method I came up with
arrayA = ['1','2','3','4','5']
arrayB = ['3','4','5']
arrayC = ['1','3','5']
arrayD = ['7']
foundArrays = []
if givenString in arrayA:
    foundArrays.append('arrayA')
if givenString in arrayB:
    foundArrays.append('arrayB')
if givenString in arrayC:
    foundArrays.append('arrayC')
if givenString in arrayD:
    foundArrays.append('arrayD')
return foundArrays
Lookup in a list is not very efficient; a set is much better.
Let's define your data like
data = {  # a dict of sets
    "a": {1, 2, 3, 4, 5},
    "b": {3, 4, 5},
    "c": {1, 3, 5},
    "d": {7}
}
then we can search like
search_for = 3 # for example
in_which = {label for label,values in data.items() if search_for in values}
# -> in_which = {'a', 'b', 'c'}
If you are going to repeat this often, it may be worth pre-processing your data like
from collections import defaultdict

lookup = defaultdict(set)
for label, values in data.items():
    for v in values:
        lookup[v].add(label)
Now you can simply
in_which = lookup[search_for] # -> {'a', 'b', 'c'}
The simple one-liner is:
result = [lst for lst in [arrayA, arrayB, arrayC, arrayD] if givenString in lst]
or if you prefer a more functional style:
result = filter(lambda lst: givenString in lst, [arrayA, arrayB, arrayC, arrayD])
Note that neither of these gives you the NAME of the list. You shouldn't ever need to know that, though.
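If you do want the names, one common pattern (a sketch, not from the original answers) is to keep the lists in a dict keyed by name instead of relying on variable names:
arrays = {
    'arrayA': ['1', '2', '3', '4', '5'],
    'arrayB': ['3', '4', '5'],
    'arrayC': ['1', '3', '5'],
    'arrayD': ['7'],
}

givenString = '3'
found_names = [name for name, values in arrays.items() if givenString in values]
print(found_names)  # ['arrayA', 'arrayB', 'arrayC'] (order may differ on older Pythons)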
Array names?
Try something like this with eval(), though keep in mind that using eval() is evil:
arrayA = [1,2,3,4,5,'x']
arrayB = [3,4,5]
arrayC = [1,3,5]
arrayD = [7,'x']
foundArrays = []
array_names = ['arrayA', 'arrayB', 'arrayC', 'arrayD']
givenString = 'x'
result = [arr for arr in array_names if givenString in eval(arr)]
print result
['arrayA', 'arrayD']

Matrix weight algorithm

I'm trying to work out how to write an algorithm to calculate the weights across different lists in the most efficient way. I have a dict which contains various ids:
x["Y"] = [id1, id2, id3, ...]
x["X"] = [id2, id3, ...]
x["Z"] = [id3]
...
I have an associated weight for each of the elements:
w["Y"]=10
w["X"]=10
w["Z"]=5
Given an input, e.g. "Y", "Z", I want to get an output that gives me:
(id1,10),(id2,10),(id3,15)
id3 gets 15 because it's in both x["Y"] and x["Z"].
Is there a way I can do this with vectors/matrices?
You can use the itertools library to group together common terms in a list:
import itertools
import operator
a = {'x': [2,3], 'y': [1,2,3], 'z': [3]}
b = {'x': 10, 'y': 10, 'z': 5}
def matrix_weight(letter1, letter2):
    final_list = []
    for i in a[letter1]:
        final_list.append((i, b[letter1]))
    for i in a[letter2]:
        final_list.append((i, b[letter2]))
    # final_list = [(1, 10), (2, 10), (3, 10), (3, 5)]
    final_list.sort()  # groupby only groups consecutive items, so sort by id first
    it = itertools.groupby(final_list, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)

print list(matrix_weight('y', 'z'))
I'll use string ids as in your example, but integer ids work similarly.
def id_weights(x, w, keys):
    result = {}
    for key in keys:
        for id in x[key]:
            if id not in result:
                result[id] = 0
            result[id] += w[key]
    return [(id, result[id]) for id in sorted(result.keys())]

x = {"Y": ["id1", "id2", "id3"],
     "X": ["id2", "id3"],
     "Z": ["id3"]}
w = {"Y": 10, "X": 10, "Z": 5}

if __name__ == "__main__":
    keys = ["Y", "Z"]
    print id_weights(x, w, keys)
gives
[('id1', 10), ('id2', 10), ('id3', 15)]
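As for the vector/matrix part of the question, one possible sketch (assuming numpy is available; not from the original answers) is to build an incidence matrix of ids against the selected keys and multiply it by the weight vector:
import numpy as np

x = {"Y": ["id1", "id2", "id3"], "X": ["id2", "id3"], "Z": ["id3"]}
w = {"Y": 10, "X": 10, "Z": 5}
keys = ["Y", "Z"]

ids = sorted({i for k in keys for i in x[k]})        # row labels
M = np.array([[1 if i in x[k] else 0 for k in keys]  # incidence matrix: ids x keys
              for i in ids])
weights = np.array([w[k] for k in keys])             # weight vector
totals = M.dot(weights)                              # matrix-vector product

print(list(zip(ids, totals.tolist())))  # [('id1', 10), ('id2', 10), ('id3', 15)]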

Accessing dictionary key with multiple values

I have an assignment and for the first part, I am to access a text file which will have a list of production rules. I created a list of dictionaries from this text file:
x = y
x = y x
y = 0
y = 1
that looks like this:
myList = [{'x': 'y'}, {'x':'y x'}, {'y': 0}, {'y': 1}]
I want to find all the possible outputs when applying these rules. I am going to attempt to write code later that will go through and replace the nonterminal values and output a bunch of binary. However, for this dictionary:
{'x': 'y x'}
'y x' is all one string so I cannot replace y or x with anything unless I explicitly say
'y x' = some value
I have written this code and written a really bad test code to see if the computer can see if a value for a key exists:
myList = []
for line in open('name of file', "r"):
    line = line.strip()
    lhs, rhs = line.split(' = ')
    myList.append({lhs: rhs})

if 'y' in myList[0].values():
    print True
Now if I run this it will print True and I could move on, but I can't seem to write code where if I wrote:
if 'y' in myList[1].values():
    print True
that it would be True.
I tried writing
myList.append({lhs:rhs.split()})
But that didn't help and I couldn't check for any values at all. Is there any way that I could have the list look like this:
myList = [{'x': 'y'}, {'x': 'y', 'x'}, {'y': 0}, {'y': 1}]
So that if I wrote
if 'y' in myList[1].values():
    print True
it would return True?
If this sounds confusing, please let me know so I can try to clarify more.
I also tried to make a dictionary instead of a list of dictionaries by doing this:
for line in open('file.txt', "r"):
line = line.strip()
lhs, rhs = line.split(' = ')
myDict[lhs] = rhs
but when I printed the dictionary, I only got this:
{'y': 1, 'x': 'x y'}
I'm sure there is a better way to do this but I can't seem to figure out a way that works.
I looked over the above code again and I was just looking at the list of values and not the values themselves.
My question now is how do I make the dictionary with multiple values for one key? When I run this code:
for line in open(fileName, "r"):
line = line.strip()
lhs, rhs = line.split(' = ')
prodList[lhs] = rhs.split()
print prodList
I end up with just this:
{'y': [1], 'x': ['y', 'x']}
I'm not sure how to get this:
myList = {'x': ['y'], 'x':['y', 'x'], 'y': [0], 'y':[1]}
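A dict cannot hold duplicate keys, so one common workaround (a sketch, not from the original thread; 'rules.txt' is just a placeholder file name) is to map each left-hand side to a list of all of its right-hand sides:
from collections import defaultdict

rules = defaultdict(list)
with open('rules.txt', 'r') as f:       # hypothetical file containing the productions
    for line in f:
        line = line.strip()
        if not line:
            continue
        lhs, rhs = line.split(' = ')
        rules[lhs].append(rhs.split())  # keep every production for the same key

# For the sample rules this gives:
# {'x': [['y'], ['y', 'x']], 'y': [['0'], ['1']]}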
