I'm looking to turn a long dataset into a wide one using functional and iterative tools, and my understanding is that this is a task for groupby. I've asked a couple of questions about this before, and thought I had it, but not quite in this case, which ought to be simpler:
Python functional transformation of JSON list of dictionaries from long to wide
Correct use of a fold or reduce function to long-to-wide data in python or javascript?
Here's the data I have:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
>>> longdat=[
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
Here's the format I want it in:
>>> widedat=[
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9},
{"id":"dog", "smelly": 9, "dumb": 9},
]
Here are my failed attempts:
# WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> list(gh)
[('cat', <itertools._grouper object at 0x5d0b550>), ('dog', <itertools._grouper object at 0x5d0b210>)]
OK, need to get the second item out of the iterator, fair enough.
#WRONG
>>> gh = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> for g,v in gh:
... {"id":i["id"], i["name"]:i["value"] for i in v}
^
SyntaxError: invalid syntax
Weird, it looked valid. Let's unwind those loops to make sure.
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = {}
for g,v in gb:
    data[g] = {}
    for i in v:
        data[g] = i
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
data = []
for g,v in gb:
    for i in v:
        data[g] = i
Ah! OK, let's go back to the one-line form
#WRONG
>>> gb = groupby(sorted(longdat,key=id),itemgetter('id'))
>>> [{"id":g, i["name"]:i["value"]} for i in k for g,k in gb]
[]
What? Why empty?! Let's unwind basically exactly this again:
#WRONG
gb = groupby(sorted(longdat,key=id),itemgetter('id'))
for g,k in gb:
    for i in k:
        print(g, i["name"],i["value"])
cat best meower 10
cat fanciest 9
cat cleanest paws 8
dog smelly 9
dog dumb 9
Now, this last one is obviously the worst---it's clear my data is basically right back where it started, as if I didn't even groupby.
Why is this not working and how can I get this in the format I'm seeking?
Also, is it possible to phrase this entirely iteratively such that I could do
>>> result[0]
{"id":"cat", "best meower": 10, "cleanest paws": 8, "fanciest": 9}
and only get the first result without processing the entire list (beyond having to look at /all/ where id == 'cat'?)
The key function passed to sorted is the built-in id, which returns a different value (the object's identity) for every list item, so the data never gets sorted by its "id" field.
It should be itemgetter('id') or lambda x: x['id'].
>>> id(longdat[0])
41859624L
>>> id(longdat[1])
41860488L
>>> id(longdat[2])
41860200L
>>> itemgetter('id')(longdat[1])
'cat'
>>> itemgetter('id')(longdat[2])
'cat'
>>> itemgetter('id')(longdat[3])
'dog'
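The sort key matters because groupby only merges consecutive items that share a key, so the input has to be sorted by the same key that is used for grouping. A minimal sketch with a hypothetical three-row list (rows is made up for illustration and is not the question's data):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: the two "cat" rows are deliberately not adjacent.
rows = [
    {"id": "cat", "name": "fanciest", "value": 9},
    {"id": "dog", "name": "smelly", "value": 9},
    {"id": "cat", "name": "best meower", "value": 10},
]
getid = itemgetter("id")

# Unsorted input: "cat" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=getid)]

# Sorted by the grouping key: exactly one group per id.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=getid), key=getid)]

print(unsorted_keys)  # ['cat', 'dog', 'cat']
print(sorted_keys)    # ['cat', 'dog']
```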
from itertools import groupby
from operator import itemgetter
longdat = [
{"id":"cat", "name" : "best meower", "value": 10},
{"id":"cat", "name" : "cleanest paws", "value": 8},
{"id":"cat", "name" : "fanciest", "value": 9},
{"id":"dog", "name" : "smelly", "value": 9},
{"id":"dog", "name" : "dumb", "value": 9},
]
getid = itemgetter('id')
result = [
    dict([['id', key]] + [[d['name'], d['value']] for d in grp])
    for key, grp in groupby(sorted(longdat, key=getid), key=getid)
]
print(result)
output:
[{'best meower': 10, 'fanciest': 9, 'id': 'cat', 'cleanest paws': 8},
{'dumb': 9, 'smelly': 9, 'id': 'dog'}]
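On the last part of the question (getting only the first result without processing the whole list), one possible approach is a generator. This is only a sketch, and it assumes the records already arrive ordered by "id", since sorting would force the whole input to be read first. The names widen and first are illustrative and do not come from the question:

```python
from itertools import groupby
from operator import itemgetter

longdat = [
    {"id": "cat", "name": "best meower", "value": 10},
    {"id": "cat", "name": "cleanest paws", "value": 8},
    {"id": "cat", "name": "fanciest", "value": 9},
    {"id": "dog", "name": "smelly", "value": 9},
    {"id": "dog", "name": "dumb", "value": 9},
]

def widen(records):
    """Lazily yield one wide dict per run of consecutive records sharing an id."""
    for key, grp in groupby(records, key=itemgetter("id")):
        wide = {"id": key}
        # Each long row contributes one name -> value pair.
        wide.update((d["name"], d["value"]) for d in grp)
        yield wide

# iter(longdat) stands in for any lazy source that is already ordered by id.
first = next(widen(iter(longdat)))
print(first)
# {'id': 'cat', 'best meower': 10, 'cleanest paws': 8, 'fanciest': 9}
```

With this shape, asking for the first item consumes the "cat" rows plus the first "dog" row (groupby has to see the key change to know the group has ended), and nothing beyond that.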
I want to get every 'name' from this JSON and join them with commas:
{
    "example": [
        {
            "id": 1,
            "name": "blah"
        },
        {
            "id": 2,
            "name": "nah"
        },
        {
            "id": 5,
            "name": "krah"
        },
        {
            "id": 10,
            "name": "ugh"
        }
    ]
}
When I try:
example = r_json['example']['name']
print(example)
I get this error:
example = r_json['example']['name']
TypeError: list indices must be integers or slices, not str
The output I want is something like:
blah, nah, krah, ugh
The value under "example" is a list, so you need an index to access its items, e.g.:
>>> r_json["example"][0]["name"]
'blah'
If you want to get all of the names, you need to loop through each item in the list.
>>> for i in range(len(r_json["example"])):
...     print(r_json["example"][i]["name"])
blah
nah
krah
ugh
A simpler way to do this would be to directly iterate over the list and not use an index:
>>> for example in r_json["example"]:
...     print(example["name"])
blah
nah
krah
ugh
To put them in a list you can do:
>>> names = []
>>> for example in r_json["example"]:
...     names.append(example["name"])
>>> names
['blah', 'nah', 'krah', 'ugh']
An even easier way is to use a comprehension:
>>> names = [example["name"] for example in r_json["example"]]
>>> names
['blah', 'nah', 'krah', 'ugh']
Once you have the names you can use str.join to make your final result:
>>> ", ".join(names)
'blah, nah, krah, ugh'
As a one liner just for fun:
>>> ", ".join(example["name"] for example in r_json["example"])
'blah, nah, krah, ugh'
An even more fun one-liner!
>>> from operator import itemgetter
>>> ", ".join(map(itemgetter("name"), r_json["example"]))
'blah, nah, krah, ugh'
Just as the error says, the value under r_json['example'] is a list of dictionaries, so you can't access it like: r_json['example']['name']. One way to get the names is to use a list comprehension:
example = [d['name'] for d in r_json['example']]
print(*example, sep=', ')
Output:
blah, nah, krah, ugh
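For completeness, here is a self-contained sketch of the whole path from JSON text to the comma-separated string. The raw string is an assumption standing in for wherever r_json actually comes from; strict JSON does not allow trailing commas, so the string has none:

```python
import json

# Assumed input: the question's data as a JSON string.
raw = '''
{
    "example": [
        {"id": 1, "name": "blah"},
        {"id": 2, "name": "nah"},
        {"id": 5, "name": "krah"},
        {"id": 10, "name": "ugh"}
    ]
}
'''

r_json = json.loads(raw)

# r_json["example"] is a list of dicts, so iterate over it and pick "name".
joined = ", ".join(item["name"] for item in r_json["example"])
print(joined)  # blah, nah, krah, ugh
```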
I have a list in the below format.
['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
I need to drop the "data" field from each entry and add the numeric prefix of each string as a "score" field.
I need to transform the list into the structure below.
Desired output:
[
    {
        "score": 111,
        "id": "de80ca97"
    },
    {
        "score": 222,
        "id": "8916a167"
    },
    {
        "score": 333,
        "id": "12966e98"
    }
]
Any suggestions or ideas most welcome.
You can use a for loop or you can also use a list comprehension as follows:
>>> import json
>>> l = ['111: {"id":"de80ca97","data":"test"}', '222: {"id":"8916a167","data":"touch"}', '333: {"id":"12966e98","data":"tap"}']
>>> [{'score': int(e.split()[0][:-1]), 'id': json.loads(e.split()[1])['id']} for e in l]
If you prefer to use a for loop:
new_l = []
for e in l:
    key, json_str = e.split()
    new_l.append({'score': int(key[:-1]), 'id': json.loads(json_str)['id']})
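Note that e.split() splits on every run of whitespace, so both versions assume the JSON part contains no spaces. A hedged variant that splits only at the first ": " separator is sketched below; parse_entry is a hypothetical helper name, and the fourth entry in the usage line is made up to show a value containing spaces:

```python
import json

l = ['111: {"id":"de80ca97","data":"test"}',
     '222: {"id":"8916a167","data":"touch"}',
     '333: {"id":"12966e98","data":"tap"}']

def parse_entry(entry):
    """Turn 'SCORE: {json}' into {'score': SCORE, 'id': <id from the json>}."""
    # partition splits once, at the first ": "; the remainder is parsed as JSON.
    score, _, json_str = entry.partition(": ")
    return {"score": int(score), "id": json.loads(json_str)["id"]}

new_l = [parse_entry(e) for e in l]
print(new_l)

# Made-up entry with spaces inside the JSON part.
spaced = parse_entry('444: {"id": "abc", "data": "two words"}')
print(spaced)
```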
I am trying to use a list of keys and indices to look up a particular value in a nested dictionary.
I am not able to figure out a pythonic way to do it.
I tried converting the list to a string and passing it as a key to the dictionary, but that is not working because the list also contains integer values.
l = ['tsGroups', 0, 'testCases', 0, 'parameters', 'GnbControlAddr', 'ip']
d = {
    "tsGroups": [{
        "tsId": 19,
        "testCases": [{
            "name": "abcd",
            "type": "xyz",
            "parameters": {
                "GnbControlAddr": {
                    "ip": "192.1.1.1",
                    "mac": "",
                    "mtu": 1500,
                    "phy": "eth2",
                }
            }
        }]
    }]
}
print(d["tsGroups"][0]["testCases"][0]["parameters"]["GnbControlAddr"]["ip"])
I need to turn the input list 'l' into the equivalent of
d["tsGroups"][0]["testCases"][0]["parameters"]["GnbControlAddr"]["ip"]
In [5]: d={
...: "tsGroups": [{"tsId": 19,"testCases": [{"name": "abcd","type": "xyz",
...: "parameters": {"GnbControlAddr": {
...: "ip": "192.1.1.1",
...: "mac": "",
...: "mtu": 1500,
...: "phy": "eth2",
...: }
...: }}]}]}
In [6]: L = ['tsGroups', 0, 'testCases', 0, 'parameters', 'GnbControlAddr', 'ip']
In [7]: functools.reduce?
Docstring:
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Type: builtin_function_or_method
In [8]: t = d
In [9]: for e in L: t = t[e]
In [10]: t
Out[10]: '192.1.1.1'
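Since the session above looks up functools.reduce, the same walk can also be written as a fold. This is a minimal sketch using operator.getitem as the two-argument function and d as the initial value; the dictionary is the one from the question:

```python
from functools import reduce
from operator import getitem

d = {
    "tsGroups": [{
        "tsId": 19,
        "testCases": [{
            "name": "abcd",
            "type": "xyz",
            "parameters": {
                "GnbControlAddr": {
                    "ip": "192.1.1.1",
                    "mac": "",
                    "mtu": 1500,
                    "phy": "eth2",
                }
            },
        }],
    }]
}
L = ['tsGroups', 0, 'testCases', 0, 'parameters', 'GnbControlAddr', 'ip']

# Equivalent to getitem(getitem(getitem(d, 'tsGroups'), 0), 'testCases') and so on.
value = reduce(getitem, L, d)
print(value)  # 192.1.1.1
```

A missing key or an out-of-range index raises KeyError or IndexError, exactly as the explicit loop would.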
Can't speak to how pythonic this is, but looping through the list and updating a reference to a new data structure appears to work:
current = d
for key in l:
    current = current[key]
print(current)
I have a nested list whose items are dictionaries with lists as values.
Here it is:
data = [
    {"folder1": [
        {"file1": 5},
        {"folder3": [{"file2": 7},
                     {"file3": 10}]},
        {"file4": 9}
    ]
    },
    {"folder2": [
        {"folder4": []},
        {"folder5": [
            {"folder6": [{"file5": 17}]},
            {"file6": 6},
            {"file7": 5}
        ]},
        {"file8": 10}
    ]
    }
]
I need to extract the path of each file, the way a directory tree is stored on a hard drive.
Sample output:
folder1/file1
folder1/file4
folder1/folder3/file2
folder1/folder3/file3
folder2/file8
folder2/folder4
folder2/folder5/file6
folder2/folder5/file7
folder2/folder5/folder6/file5
Please help; I have been struggling and could not find a way.
Thank you
You can use recursion with yield:
def get_paths(d, seen):
    for a, b in d.items():
        if not isinstance(b, list) or not b:
            yield '{}/{}'.format("/".join(seen), a)
        else:
            for c in b:
                for t in get_paths(c, seen+[a]):
                    yield t
print('\n'.join([i for b in data for i in get_paths(b, [])]))
Output:
folder1/file1
folder1/folder3/file2
folder1/folder3/file3
folder1/file4
folder2/folder4
folder2/folder5/folder6/file5
folder2/folder5/file6
folder2/folder5/file7
folder2/file8
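The output above comes out in traversal order, whereas the sample in the question is in sorted order. A self-contained variant of the same recursive idea is sketched below, written with yield from (Python 3.3+) and sorted at the end; iter_paths and parents are illustrative names, not part of the original answer:

```python
data = [
    {"folder1": [
        {"file1": 5},
        {"folder3": [{"file2": 7}, {"file3": 10}]},
        {"file4": 9},
    ]},
    {"folder2": [
        {"folder4": []},
        {"folder5": [
            {"folder6": [{"file5": 17}]},
            {"file6": 6},
            {"file7": 5},
        ]},
        {"file8": 10},
    ]},
]

def iter_paths(node, parents=()):
    """Yield 'a/b/c' paths for every file and every empty folder under node."""
    for name, children in node.items():
        path = parents + (name,)
        if isinstance(children, list) and children:
            # Non-empty folder: recurse into each child dictionary.
            for child in children:
                yield from iter_paths(child, path)
        else:
            # A file (non-list value) or an empty folder: emit its path.
            yield "/".join(path)

paths = sorted(p for top in data for p in iter_paths(top))
print("\n".join(paths))
```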
I have a school dictionary as follows:
{
    ID('6a15ce'): {
        'count': 5,
        'amount': 0,
        'r_amount': None,
        'sub': < subobj >
    },
    ID('464ba1'): {
        'count': 2,
        'amount': 120,
        'r_amount': None,
        'sub': < subobj2 >
    }
}
I want to find the sum of amount, and tried the following:
{k:sum(v['amount']) for k,v in school.items()}
but I get the error TypeError: 'int' object is not iterable. What would be an efficient way to achieve this?
You can do:
result = sum(v["amount"] for v in school.values())
You can also do it using the map function:
result = sum(map(lambda i: i['amount'], school.values()))
print(result)
Output:
120
This is a functional solution:
from operator import itemgetter
res = sum(map(itemgetter('amount'), school.values()))
sum(map(lambda school_amount: school_amount['amount'], school.values()))
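The original error comes from calling sum on v['amount'], which is a single int rather than an iterable. A self-contained sketch follows; it uses plain strings as stand-ins for the ID(...) keys and leaves out the 'sub' objects, both of which are assumptions made only so the example runs:

```python
# Stand-in data: string keys replace ID(...), and 'sub' is omitted.
school = {
    "6a15ce": {"count": 5, "amount": 0, "r_amount": None},
    "464ba1": {"count": 2, "amount": 120, "r_amount": None},
}

# One grand total: sum takes an iterable of the individual amounts.
total = sum(entry["amount"] for entry in school.values())

# If a per-key mapping was the goal, no sum is needed at all.
per_id = {key: entry["amount"] for key, entry in school.items()}

print(total)   # 120
print(per_id)  # {'6a15ce': 0, '464ba1': 120}
```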