Statistics on a list of dictionaries considering multiple keys - python

I have a list of dicts:
input = [{'name':'A', 'Status':'Passed','id':'x1'},
{'name':'A', 'Status':'Passed','id':'x2'},
{'name':'A','Status':'Failed','id':'x3'},
{'name':'B', 'Status':'Passed','id':'x4'},
{'name':'B', 'Status':'Passed','id':'x5'}]
I want an output like :
output = [{'name':'A', 'Passed':'2', 'Failed':'1', 'Total':'3', '%Pass':'66%'},
{'name':'B', 'Passed':'2', 'Failed':'0', 'Total':'2', '%Pass':'100%'},
{'name':'Total', 'Passed':'4', 'Failed':'1', 'Total':'5', '%Pass':'80%'}]
I started by retrieving the different names using a lookup:
lookup = {(d["name"]): d for d in input [::-1]}
names= [e for e in lookup.values()]
names= names[::-1]
and then tried a list comprehension, something like:
for name in names :
    name_passed = sum(["Passed" and "name" for d in input if 'Status' in d and name in d])
    name_failed = sum(["Failed" and "name" for d in input if 'Status' in d and name in d])
But I am not sure whether there is a smarter way. Would a simple loop that compares dict values be simpler?

Assuming your input entries will always be grouped according to the "name" key-value pair:
entries = [
{"name": "A", "Status": "Passed", "id": "x1"},
{"name": "A", "Status": "Passed", "id": "x2"},
{"name": "A", "Status": "Failed", "id": "x3"},
{"name": "B", "Status": "Passed", "id": "x4"},
{"name": "B", "Status": "Passed", "id": "x5"}
]
from itertools import groupby
from operator import itemgetter
def to_grouped(entries):
    # groupby only merges adjacent entries, so the input must already be grouped by "name"
    for key, group_iter in groupby(entries, key=itemgetter("name")):
        group = list(group_iter)
        total = len(group)
        passed = sum(1 for entry in group if entry["Status"] == "Passed")
        failed = total - passed
        # multiply before dividing so that e.g. 3 of 3 gives 100 rather than 99
        perc_pass = 100 * passed // total
        yield {
            "name": key,
            "Passed": str(passed),
            "Failed": str(failed),
            "Total": str(total),
            "%Pass": f"{perc_pass}%"
        }
print(list(to_grouped(entries)))
Output:
[{'name': 'A', 'Passed': '2', 'Failed': '1', 'Total': '3', '%Pass': '66%'}, {'name': 'B', 'Passed': '2', 'Failed': '0', 'Total': '2', '%Pass': '100%'}]
This does not create the final entry you're looking for, the one that sums the statistics of all the other entries, though that shouldn't be too hard to add.
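One way to add that summary row is to accumulate the per-group counts while iterating. This is only a minimal sketch: `with_total` and its inner `row` helper are hypothetical names, the entries are sorted first so they need not arrive grouped, and the percentage is truncated to an integer to match the requested '66%'.

```python
from itertools import groupby
from operator import itemgetter

entries = [
    {"name": "A", "Status": "Passed", "id": "x1"},
    {"name": "A", "Status": "Passed", "id": "x2"},
    {"name": "A", "Status": "Failed", "id": "x3"},
    {"name": "B", "Status": "Passed", "id": "x4"},
    {"name": "B", "Status": "Passed", "id": "x5"},
]

def with_total(entries):
    """Per-name statistics followed by one overall 'Total' row (hypothetical helper)."""
    def row(name, passed, total):
        return {
            "name": name,
            "Passed": str(passed),
            "Failed": str(total - passed),
            "Total": str(total),
            "%Pass": f"{100 * passed // total}%",  # integer percentage, truncated
        }

    rows = []
    all_passed = all_total = 0
    by_name = itemgetter("name")
    # groupby only merges adjacent items, so sort by name first
    for name, group in groupby(sorted(entries, key=by_name), key=by_name):
        statuses = [entry["Status"] for entry in group]
        passed = statuses.count("Passed")
        rows.append(row(name, passed, len(statuses)))
        all_passed += passed
        all_total += len(statuses)
    if all_total:  # avoid dividing by zero on empty input
        rows.append(row("Total", all_passed, all_total))
    return rows

result = with_total(entries)
print(result)
```

Keeping the running totals inside the same loop means the data is only traversed once after sorting.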

How to convert string to valid json or yaml

I have a large script that parses js with a dataframe entry, but to shorten the question, I put what I need in a separate variable.
My variable contains the following value
value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"
I apply the following script and get data like this
import json
import re
value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"
def parse_json(value):
    arr = value.split("},")
    arr = [x+"}" for x in arr]
    arr[-1] = arr[-1][:-1]
    return json.dumps({str(i):add_quotation_marks(x) for i, x in enumerate(arr)})
def add_quotation_marks(value):
    words = re.findall(r'(\w+:)', value)
    for word in words:
        value = value.replace(word[:-1], f'"{word[:-1]}"')
    return json.loads(value)
print(parse_json(value))
{"0": {"from": [3, 4], "to": [7, 4], "color": 2}, "1": {"from": [3, 6], "to": [10, 6], "color": 3}}
The script executes correctly, but I need to get a slightly different result.
This is what the result I want to get looks like:
{
"0": {
"from": {
"0": "3",
"1": "4"
},
"to": {
"0": "7",
"1": "4"
},
"color": "2"
},
"1": {
"from": {
"0": "3",
"1": "6"
},
"to": {
"0": "10",
"1": "6"
},
"color": "3"
}
}
This is valid JSON and valid YAML. How can I do this?
I'd suggest a regex approach in this case:
import re
res = []
# iterate over each "{from:...,to:...,color:...}" group separately
for obj in re.findall(r'\{([^}]+)}', value):
    item = {}
    # iterate over each "...:..." key-value pair separately
    for k, v in re.findall(r'(\w+):(\[[^]]+]|\d+)', obj):
        if v.startswith('['):
            v = v.strip('[]').split(',')
        item[k] = v
    res.append(item)
This produces this output in res:
[{'from': ['3', '4'], 'to': ['7', '4'], 'color': '2'}, {'from': ['3', '6'], 'to': ['10', '6'], 'color': '3'}]
Since your values can contain commas, trying to split on commas or other markers is fairly tricky, and using these regexes to match your desired values instead is more stable.
Here's the code that converts the value to your desired output.
import json5 # pip install json5
value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"
def convert(str_value):
    str_value = f"[{str_value}]" # added [] to make it a valid json
    parsed_value = json5.loads(str_value) # convert to python object
    output = {} # create empty dict
    # Loop through the list of dicts. For each item, create a new dict
    # with the index as the key and the value as the value. If the value
    # is a list, convert it to a dict with the index as the key and the
    # value as the value. If the value is not a list, just add it to the dict.
    for i, d in enumerate(parsed_value):
        output[i] = {}
        for k, v in d.items():
            output[i][k] = {j: v[j] for j in range(len(v))} if isinstance(v, list) else v
    return output
print(json5.dumps(convert(value)))
Output
{
"0": {
"from": {
"0": 3,
"1": 4
},
"to": {
"0": 7,
"1": 4
},
"color": 2
},
"1": {
"from": {
"0": 3,
"1": 6
},
"to": {
"0": 10,
"1": 6
},
"color": 3
}
}
The json5 package can parse a JavaScript object literal (with unquoted keys) into a Python dictionary, so you don't have to split on "},{".
First, wrap the string in [ and ] so that it becomes a valid array.
Then load the string using json5.loads().
Finally, loop through the resulting list of dictionaries and convert it to the desired output format.
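If installing json5 is not an option, the same idea can be sketched with only the standard library. The sketch below rests on two assumptions: keys are plain words, and no string value contains a word followed by a colon. It quotes the bare keys with a regex, parses the result with json, and converts every scalar to a string, because the requested output shows "3" rather than 3. `to_indexed` is a hypothetical helper name.

```python
import json
import re

value = "{from:[3,4],to:[7,4],color:2},{from:[3,6],to:[10,6],color:3}"

def to_indexed(obj):
    """Recursively turn lists into index-keyed dicts and scalars into strings."""
    if isinstance(obj, list):
        return {str(i): to_indexed(item) for i, item in enumerate(obj)}
    if isinstance(obj, dict):
        return {key: to_indexed(val) for key, val in obj.items()}
    return str(obj)

# Quote bare keys: from: -> "from": (assumption: keys are plain \w+ words)
quoted = re.sub(r'(\w+)\s*:', r'"\1":', value)
# Wrap in [] so the comma-separated objects form one JSON array
parsed = json.loads(f"[{quoted}]")
result = to_indexed(parsed)
print(json.dumps(result, indent=4))
```

Because the conversion is recursive, deeper nesting of lists inside lists would be handled the same way.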

Get max length of value inside a list which contains other lists

I have a list of dictionaries, some of which contain further lists. I want to write a function that finds the longest string stored under the key 'value' and returns its length as a number. Only the strings under the key 'value' need to be checked. I found nothing useful on the internet.
Output: the number of characters in the longest 'value' string.
Hope you can help me.
List:
[{'name': 'title', 'value': 'titel{TM} D3', 'is_on_label': 1},
{'name': 'DK in', 'value': '24V max 2.5A', 'is_on_label': 1,
'id_configuration': 79,
'options': [{'value': '30V max 3A', 'id_configuration_v': '1668'},
{'value': 'none', 'id_configuration_v': '1696'}]}]
function:
def checkLenFromConfigs(self, configs):
    max_length = max(map(len, configs))
    return max_length
You could recursively search for all values in your data structure:
data = [{
"name": "title",
"value": "titel{TM} D3",
"is_on_label": 1
},
[{
"name": "title",
"value": "titel{TM} D3",
"is_on_label": 1,
"sub_options": [
{
"value": "30V max 3A",
"id_configuration_v": "1668"
},
{
"value": "none none none none",
"id_configuration_v": "1696"
}
]
}],
{
"name": "DK in",
"value": "24V max 2.5A",
"is_on_label": 1,
"id_configuration": 79,
"options": [{
"value": "30V max 3A",
"id_configuration_v": "1668"
},
{
"value": "none",
"id_configuration_v": "1696"
}
]
}
]
def recur(data, count):
    if isinstance(data, list):
        for item in data:
            count = recur(item, count)
    elif isinstance(data, dict):
        for k, v in data.items():
            if k == 'value':
                count.append(len(v))
            else:
                count = recur(v, count)
    return count
result = recur(data, [])
print(max(result))
Out:
19
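As a variation on the recursive search, a generator avoids threading the `count` list through every call, and `max(..., default=0)` copes with data that has no 'value' strings at all. This is a minimal sketch run on the list from the question. `iter_value_lengths` is a hypothetical name, and skipping non-string values is an added assumption.

```python
def iter_value_lengths(node):
    """Yield len() of every string stored under a 'value' key, at any depth."""
    if isinstance(node, list):
        for item in node:
            yield from iter_value_lengths(item)
    elif isinstance(node, dict):
        for key, val in node.items():
            if key == "value" and isinstance(val, str):
                yield len(val)
            else:
                # descend into nested lists/dicts such as 'options'
                yield from iter_value_lengths(val)

configs = [
    {"name": "title", "value": "titel{TM} D3", "is_on_label": 1},
    {
        "name": "DK in",
        "value": "24V max 2.5A",
        "is_on_label": 1,
        "id_configuration": 79,
        "options": [
            {"value": "30V max 3A", "id_configuration_v": "1668"},
            {"value": "none", "id_configuration_v": "1696"},
        ],
    },
]

longest = max(iter_value_lengths(configs), default=0)
print(longest)  # 12
```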

Convert dict to list with same keys, values and layout

I'm trying to extract several keys/values out of a list.
My list:
a = [
{
"id": "1",
"system": "2",
},
{
"id": "3",
"system": "4",
}
]
Now I need to pass this to a function (a "next" function), which returns a[current] or a[0].
But a[current] or a[0] is a dict.
The next step is to extract just the id and its value. However, the code below only works if a is a list, so I need to convert a[current] or a[0] into a list. The code below has to stay the same because it is part of a function that I cannot change for several reasons, so I need to convert the dict into a list.
c = list()
for data in a:
    value = dict()
    value["id"] = data.get("id")
    c.append(value)
This is where I am stuck. I tried several methods such as .keys() and .values(), but I can't put the results back together into a list. It needs to be scalable/configurable because a changes from time to time (so not a hard-coded a[0]["id"], ...). Currently a[0] looks like this: {'id': '1', 'system': '2'}, but it needs to look like this: [{'id': '1', 'system': '2'},], so that I can pass it to my search function.
I need a new list like c:
c = [
{
"id": "1",
},
{
"id": "3",
}
]
Is this your expected output?
a = [
{
"id": "1",
"system": "2",
},
{
"id": "3",
"system": "4",
}
]
c = list()
for data in a:
    value={}
    value["id"]=data.get("id")
    c.append([value])
    # c.extend([value])
print(c)
# [[{'id': '1'}], [{'id': '3'}]]
# print(c) # Extend's output
# [{'id': '1'},{'id': '3'}]
Or you can try a one-line solution:
print([[{"id":val.get("id")}] for val in a])
# [[{'id': '1'}], [{'id': '3'}]]
Or, going by your comment, if you just want
[{'id': '1', 'system': '2'},]
then:
print([a[0]])
Updated code:
a = [
{
"id": "1",
"system": "2",
},
{
"id": "3",
"system": "4",
}
]
print([[value] for value in a ])
Result:
[[{'id': '1', 'system': '2'}], [{'id': '3', 'system': '4'}]]
Here is a function to filter your dicts list:
def filter_dicts(key, a):
    return [{key: element[key]} for element in a]
Use it like this:
c = filter_dicts("id", a)
Note that this will cause an error if there is a dict without the specified key, which may or may not be what you want. To avoid this, replace element[key] with element.get(key, None).
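The two ways of handling a missing key can be sketched side by side. Both function names below are hypothetical, and the third dict in `a` is made-up sample data added only to show the difference.

```python
a = [
    {"id": "1", "system": "2"},
    {"id": "3", "system": "4"},
    {"system": "5"},  # made-up entry without an "id" key
]

def filter_dicts_safe(key, dicts):
    """Keep only `key` from each dict; a missing key becomes None."""
    return [{key: d.get(key)} for d in dicts]

def filter_dicts_skip(key, dicts):
    """Alternative: leave out dicts that lack `key` entirely."""
    return [{key: d[key]} for d in dicts if key in d]

print(filter_dicts_safe("id", a))  # [{'id': '1'}, {'id': '3'}, {'id': None}]
print(filter_dicts_skip("id", a))  # [{'id': '1'}, {'id': '3'}]
```

Which variant fits depends on whether downstream code expects one output dict per input dict.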

Remove duplicate dictionary from a list on the basis of key value by priority

Suppose I have the following type of list of dictionaries:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"},
{"Name": "Zeke", "Type": "Reader"}
]
I want to remove duplicates of "Name" on the basis of "Type" by the following priority (Admin > Writer > Reader), so the end result should be:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Writer"}
]
I found a similar question but it removes duplicates for one explicit type of key-value: Link
Can someone please guide me on how to move forward with this?
This is a modified form of the solution suggested by @azro. Their solution and the other solution do not take into account the priority you mentioned; you can handle that with the following code, which uses a priority dict.
iterlist = [
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"}
]
from itertools import groupby
# map each type to its rank; a lower number means a higher priority
priorities = {i:idx for idx, i in enumerate(['Admin', 'Writer', 'Reader'])}
sort_key = lambda x:(x['Name'], priorities[x['Type']])
groupby_key = lambda x:x['Name']
result = [next(i[1]) for i in groupby(sorted(iterlist, key=sort_key), key=groupby_key)]
print(result)
Output
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
You can also use pandas in the following way.
Transform the list of dictionaries into a DataFrame:
import pandas as pd
df = pd.DataFrame(iterlist)
Create a mapping dict:
m = {'Admin': 3, 'Writer': 2, 'Reader': 1}
Create a priority column using map:
df['pri'] = df['Type'].map(m)
Sort by pri with sort_values, group by Name with groupby, and keep only the first element of each group:
df = df.sort_values('pri', ascending=False).groupby('Name').first().reset_index()
Drop the pri column and convert back to a list of dictionaries using to_dict:
df.drop('pri', axis='columns').to_dict(orient='records')
This will give you the following:
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
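Here is a solution you can try. Note that it keeps the first entry seen for each name, so it only honours the priority if the entries for each name are already listed in priority order:
unique = {}
for v in iterlist:
    # store the entry only if this name has not been seen yet
    if not unique.get(v['Name']):
        unique[v['Name']] = v
print(unique.values())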
dict_values([{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}])
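Since the dict-based loop keeps whichever entry comes first, it can be extended to compare priorities explicitly, so that the input order no longer matters. This is a minimal sketch and not part of the original answers: the `PRIORITY` mapping and the `best` dict are illustrative names, and the input is deliberately listed out of priority order.

```python
iterlist = [
    {"Name": "Mike", "Type": "Writer"},
    {"Name": "Mike", "Type": "Reader"},
    {"Name": "Mike", "Type": "Admin"},
    {"Name": "Zeke", "Type": "Reader"},
    {"Name": "Zeke", "Type": "Writer"},
]

# Lower number means higher priority (Admin > Writer > Reader)
PRIORITY = {"Admin": 0, "Writer": 1, "Reader": 2}

best = {}
for entry in iterlist:
    current = best.get(entry["Name"])
    # Keep the entry if the name is new or it outranks the stored one
    if current is None or PRIORITY[entry["Type"]] < PRIORITY[current["Type"]]:
        best[entry["Name"]] = entry

result = list(best.values())
print(result)
# [{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]
```

This runs in a single pass without sorting, and names appear in the result in the order they were first seen.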

How to use for loop along with if inside lambda python?

I have a dataframe df that has a column tags. Each element of the column tags is a list of dictionaries and looks like this:
[
{
"id": "leena123",
"name": "LeenaShaw",
"slug": null,
"type": "UserTag",
"endIndex": 0,
"startIndex": 0
},
{
"id": "1234",
"name": "abc ltd.",
"slug": "5678",
"type": "StockTag",
"endIndex": 0,
"startIndex": 0
}
]
The list can have any number of elements.
Sample dataset:
0 some_data [{'id': 'leena123', 'name': 'leenaShaw', 'slug': None, 'type...
1 some data [{'id': '6', 'name': 'new', 'slug': None, 'type...
I want to create a list of all the ids from the tags column where the type is UserTag.
Sample output:
['leena123', 'saily639', ...]
I am trying this:
list(df['tags'].apply(lambda x: d['name'] if any(d['type'] == 'UserTag' for d in x)))
but it doesn't work. Kindly help with this.
Use a list comprehension inside apply on the tags column:
df['id'] = df.tags.apply(lambda x: [i['id'] for i in x if i.get('type') == 'UserTag'])
Create a flat list from the id column:
import itertools
l = df['id'].values.tolist()
output_id_list = list(itertools.chain(*l))
If you want to drop the id column from df afterwards, do:
df.drop(columns='id', inplace=True)
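If the intermediate id column is not needed, the flat list can also be built directly with one nested comprehension. In the sketch below a plain list, `tags_column`, stands in for `df['tags']` (iterating over a pandas Series yields the same per-row elements); the second row's tag and the empty row are made-up sample data.

```python
# Stand-in for df['tags']: each element is one row's list of tag dicts
tags_column = [
    [
        {"id": "leena123", "name": "LeenaShaw", "slug": None, "type": "UserTag"},
        {"id": "1234", "name": "abc ltd.", "slug": "5678", "type": "StockTag"},
    ],
    [{"id": "saily639", "name": "saily", "slug": None, "type": "UserTag"}],
    [],  # a row without any tags
]

# Outer loop walks the rows, inner loop walks the tags of each row
user_ids = [
    tag["id"]
    for tags in tags_column
    for tag in tags
    if tag.get("type") == "UserTag"
]
print(user_ids)  # ['leena123', 'saily639']
```

With a real dataframe, replacing `tags_column` by `df['tags']` should give the same result, provided every row holds a list.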
