I have a situation where I need to print the frequencies of data appearing in one of the columns of my dataframe.
Suppose my column is status, then performing
df['status'].value_counts().to_dict()
outputs
{
"Deleted": 56,
"New": 25,
"Draft": 24,
"Assigned": 11,
"Job Complete": 10,
"Active": 8,
"Requested": 3,
"Cancelled": 3,
"Footage Provided": 1
}
I want to format the output as a list of records:
[
    {status: "Deleted", value: 56},
    {status: "New", value: 25},
    {status: "Draft", value: 24},
    ...
]
I'm new to pandas. Please help.
You can use the following list comprehension:
print([{'status': k, 'value': v} for k, v in df['status'].value_counts().to_dict().items()])
This prints the list of {'status': ..., 'value': ...} records you asked for.
You can reformat the pandas output dictionary into your desired list format: iterate over the dictionary and append each key/value pair as a small dictionary to your list:
d1 = df['status'].value_counts().to_dict()
l = []
for k, v in d1.items():
    l.append({'status': k, 'value': v})
print(l)
Output
[{'status': 'Deleted', 'value': 56},
{'status': 'New', 'value': 25},
{'status': 'Draft', 'value': 24},
...
]
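If you would rather stay in pandas end to end, the same list of records can be produced without building an intermediate dict. This is a sketch on a small made-up frame (the column name `status` follows the question; `rename_axis` and `reset_index` are standard pandas API):

```python
import pandas as pd

# Hypothetical frame standing in for the asker's data
df = pd.DataFrame({'status': ['Deleted', 'Deleted', 'New', 'Draft', 'New', 'Deleted']})

records = (df['status'].value_counts()   # Series: status -> count
           .rename_axis('status')        # name the index so reset_index keeps it
           .reset_index(name='value')    # two-column frame: status, value
           .to_dict('records'))          # list of {'status': ..., 'value': ...}
print(records)
```

`to_dict('records')` is the idiomatic way to get a row-per-dict list out of a DataFrame.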
I've got two lists:
lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
and a second one that contains some of the same name values under a different key:
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]
I would like to merge them, with this result:
lst = [{"name": "Hanna", "age": 3},
       {"name": "Kris", "age": 18, "Town": "Big City"},
       {"name": "Dom", "age": 15, "Town": "NYC"},
       {"name": "Tom", "age": 5}]
The easiest way is to go one by one (for each element of lst1, check whether a match exists in lst2), but for big lists this is quite inefficient (my lists have a few hundred elements each). What is the most efficient way to achieve this?
To avoid iterating over another list again and again, you can build a name index first.
lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]

name_index = {dic['username']: idx for idx, dic in enumerate(lst2) if dic.get('username')}
for dic in lst1:
    name = dic.get('name')
    if name in name_index:
        dic.update(lst2[name_index[name]])  # update in place to further save time
        dic.pop('username')
print(lst1)
One way to do this a lot more efficiently than with lists is to create an intermediate dictionary from lst1, keyed by name, so that you're searching a dictionary rather than a list.
d1 = {elem['name']: dict(elem) for elem in lst1}
for elem in lst2:
    d1[elem['username']].update({k: v for k, v in elem.items() if k != 'username'})
lst = list(d1.values())
Output:
[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]
Use the zip function to pair both lists. Both lists must be ordered by the same criterion: sort lst1 by its name key and lst2 by its username key, since those values are the condition for the update; that is why sorted is called with the key parameter. Sorting them is what makes the matches line up.
lst2 also needs a little extra preparation: it is expanded with lst2 * abs(len(lst1) - len(lst2)) so that zip, which stops at the shorter iterable, does not cut lst1 short. You then iterate only once over the zipped pairs. Be aware that this alignment trick depends on how the repeated entries happen to sort, so it works for this example but is fragile in general.
for d1, d2 in zip(sorted(lst1, key=lambda d: d['name']),
                  sorted(lst2 * abs(len(lst1) - len(lst2)), key=lambda d: d['username'])):
    if d1['name'] == d2['username']:
        d1.update(d2)
        # drop the now-redundant username key
        del d1['username']
print(lst1)
Output:
[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]
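For large lists, a pandas left merge is another option. This is a sketch using the sample lists above; the `rename` call aligns the two join keys, and a comprehension drops the NaN placeholders pandas inserts for unmatched rows:

```python
import pandas as pd

lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]

df = pd.merge(pd.DataFrame(lst1),
              pd.DataFrame(lst2).rename(columns={'username': 'name'}),
              on='name', how='left')
# Drop NaN values so unmatched rows keep only their original keys
lst = [{k: v for k, v in row.items() if pd.notna(v)}
       for row in df.to_dict('records')]
print(lst)
```

The merge itself is hash-based, so this avoids the quadratic element-by-element scan the question worries about.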
I have a list of lists, where the first list is a 'header' list and the rest are data lists. Each of these 'sub' lists is the same size, as shown below:
list1 = [
["Sno", "Name", "Age", "Spirit", "Status"],
[1, "Rome", 43, "Gemini", None],
[2, "Legolas", 92, "Libra", None]
]
I want to merge all of these 'sub' lists into one dictionary, using that first 'sub' list as a header row that maps each value in the row to a corresponding value in subsequent rows.
This is how my output should look:
result_dict = {
1: {"Name": "Rome", "Age": 43, "Spirit": "Gemini", "Status": None},
2: {"Name": "Legolas", "Age": 92, "Spirit": "Libra", "Status": None}
}
As you can see, 1, 2, etc. are unique row numbers (they correspond to the Sno column in the header list).
So far, I am able to get the first element of each sub-list as the key using this code:
list1_as_dict = {p[0]: p[1:] for p in list1}
print(list1_as_dict)
Which outputs:
{
'Sno': ['Name', 'Age', 'Spirit', 'Status'],
1: ['Rome', 43, 'Gemini', None],
2: ['Legolas', 92, 'Libra', None]
}
But I don't know how to make each of the data 'sub' lists a dictionary mapped to the corresponding headers.
How can I get my desired output?
Something like
spam = [["Sno", "Name", "Age", "Spirit", "Status"],
        [1, "Rome", 43, "Gemini", None],
        [2, "Legolas", 92, "Libra", None]]
keys = spam[0][1:] # get the keys from first sub-list, excluding element at index 0
# use dict comprehension to create the desired result by iterating over
# sub-lists, unpacking sno and rest of the values
# zip keys and values and create a dict and sno:dict pair respectively
result = {sno:dict(zip(keys, values)) for sno, *values in spam[1:]}
print(result)
output
{1: {'Name': 'Rome', 'Age': 43, 'Spirit': 'Gemini', 'Status': None}, 2: {'Name': 'Legolas', 'Age': 92, 'Spirit': 'Libra', 'Status': None}}
I'm sure you'd love pandas. There is a bit of a learning curve, but it offers a lot in exchange: most operations over this dataset can be done in a single line of code.
import pandas as pd
pd.DataFrame(list1[1:], columns=list1[0]).set_index('Sno')
Output:
Name Age Spirit Status
Sno
1 Rome 43 Gemini None
2 Legolas 92 Libra None
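To get from that frame back to the dictionary the question asks for, `to_dict(orient='index')` keys the result by the index (here `Sno`):

```python
import pandas as pd

list1 = [["Sno", "Name", "Age", "Spirit", "Status"],
         [1, "Rome", 43, "Gemini", None],
         [2, "Legolas", 92, "Libra", None]]

df = pd.DataFrame(list1[1:], columns=list1[0]).set_index('Sno')
result_dict = df.to_dict(orient='index')
print(result_dict)
```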
my_dict = {p[0]: dict(zip(list1[0][1:], p[1:])) for p in list1[1:]}
print(my_dict)
Results in:
{1: {'Name': 'Rome', 'Age': 43, 'Spirit': 'Gemini', 'Status': None}, 2: {'Name': 'Legolas', 'Age': 92, 'Spirit': 'Libra', 'Status': None}}
I have a tricky task here. I want to compare an unknown number of lists inside a list of lists, where each inner list contains dictionaries. I want to compare the dictionaries across these lists based on their 'name' key: if a name already exists in a list, nothing happens; if not, the whole dictionary should be copied into the lists that don't have it, with its 'balance' value set to 0.
For example let's assume we have list of lists like this :
list_of_lists=[[{'name': u'Profit','balance': 10},{'name': u'Income','balance': 30},{'name': u'NotIncome','balance': 15}],[{'name': u'Profit','balance': 20},{'name': u'Income','balance': 10}]]
So the result should be :
list_of_lists=[[{'name': u'Profit','balance': 10},{'name': u'Income','balance': 30},{'name': u'NotIncome','balance': 15}],[{'name': u'Profit','balance': 20},{'name': u'Income','balance': 10},{'name': u'NotIncome','balance': 0}]]
Here is my code, but I can't get it to work with two or more lists (I don't know how many lists the outer list will contain; maybe 2, 3, 4, etc.):
for line in lines:
    for d1, d2 in zip(line[0], line[1]):
        for key, value in d1.items():
            if value != d2[key]:
                print(key, value, d2[key])
You could first create a set containing all the names and then iterate the sublists one by one, adding the missing dicts:
import pprint
l = [
[
{'name': u'Profit','balance': 10},
{'name': u'Income','balance': 30},
{'name': u'NotIncome','balance': 15}
],
[
{'name': u'Profit','balance': 20},
{'name': u'Income','balance': 10}
],
[]
]
all_names = {d['name'] for x in l for d in x}
for sub_list in l:
    for name in (all_names - {d['name'] for d in sub_list}):
        sub_list.append({'name': name, 'balance': 0})
pprint.pprint(l)
Output:
[[{'balance': 10, 'name': u'Profit'},
{'balance': 30, 'name': u'Income'},
{'balance': 15, 'name': u'NotIncome'}],
[{'balance': 20, 'name': u'Profit'},
{'balance': 10, 'name': u'Income'},
{'balance': 0, 'name': u'NotIncome'}],
[{'balance': 0, 'name': u'Profit'},
{'balance': 0, 'name': u'Income'},
{'balance': 0, 'name': u'NotIncome'}]]
That said, you should consider converting the sublists to dicts whose keys are names and whose values are balances, to ease the processing.
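A sketch of that name-to-balance conversion, filling missing names with 0 along the way (variable names are illustrative):

```python
list_of_lists = [[{'name': u'Profit', 'balance': 10},
                  {'name': u'Income', 'balance': 30},
                  {'name': u'NotIncome', 'balance': 15}],
                 [{'name': u'Profit', 'balance': 20},
                  {'name': u'Income', 'balance': 10}]]

# One {name: balance} dict per sublist -- membership tests become O(1)
as_dicts = [{d['name']: d['balance'] for d in sub} for sub in list_of_lists]
all_names = set().union(*as_dicts)

# Every sublist gets every name; missing ones default to 0
filled = [{name: sub.get(name, 0) for name in all_names} for sub in as_dicts]
print(filled)
```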
I have a python dict:
{'John': 23, 'Matthew': 8, 'Peter': 45}
I want to create a D3 pie chart and need to move my data from the keys so that I can access the values. So I want to end up with:
[
{name: 'John', age: 23},
{name: 'Matthew', age: 8},
{name: 'Peter', age: 45}
]
How can I do this dynamically (given that I may not know what the current key is, eg. 'John')?
data = [{"name": key, "age": value} for key, value in my_dict.items()]
An example:
>>> my_dict = {'John': 23, 'Matthew': 8, 'Peter': 45}
>>> data = [{"name": key, "age": value} for key, value in my_dict.items()]
>>> data
[{'age': 8, 'name': 'Matthew'}, {'age': 23, 'name': 'John'}, {'age': 45, 'name': 'Peter'}]
If you are trying to create a javascript friendly representation of the data, then you will need to convert the list of dictionaries to json.
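That JSON step is a single call with the standard library:

```python
import json

my_dict = {'John': 23, 'Matthew': 8, 'Peter': 45}
data = [{"name": k, "age": v} for k, v in my_dict.items()]
as_json = json.dumps(data)
print(as_json)  # a JSON array that D3 can consume directly
```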
I've got a list of Tokens which looks something like:
[{
Value: "Blah",
StartOffset: 0,
EndOffset: 4
}, ... ]
What I want to do is get a count of how many times each value occurs in the list of tokens.
In VB.Net I'd do something like...
Tokens = Tokens.
GroupBy(Function(x) x.Value).
Select(Function(g) New With {
.Value = g.Key,
.Count = g.Count})
What's the equivalent in Python?
IIUC, you can use collections.Counter:
>>> from collections import Counter
>>> tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5}, {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]
>>> Counter(tok['Value'] for tok in tokens)
Counter({'Blah': 3, 'zoom': 1})
if you only need a count. If you want them grouped by the value, you could use itertools.groupby and something like:
>>> from itertools import groupby
>>> def keyfn(x):
...     return x['Value']
...
>>> [(k, list(g)) for k, g in groupby(sorted(tokens, key=keyfn), keyfn)]
[('Blah', [{'SO': 0, 'Value': 'Blah'}, {'SO': 2, 'Value': 'Blah'}, {'SO': 3, 'Value': 'Blah'}]), ('zoom', [{'SO': 5, 'Value': 'zoom'}])]
although it's a little trickier because groupby requires the grouped terms to be contiguous, and so you have to sort by the key first.
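If the sort bothers you, a `collections.defaultdict` groups in one pass without requiring the items to be contiguous (a sketch on the same sample data as above):

```python
from collections import defaultdict

tokens = [{"Value": "Blah", "SO": 0}, {"Value": "zoom", "SO": 5},
          {"Value": "Blah", "SO": 2}, {"Value": "Blah", "SO": 3}]

groups = defaultdict(list)
for tok in tokens:
    groups[tok['Value']].append(tok)  # bucket each token under its Value

print({k: len(v) for k, v in groups.items()})  # counts per value
```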
Let's assume this is your Python list containing dictionaries:
my_list = [{'Value': 'Blah',
'StartOffset': 0,
'EndOffset': 4},
{'Value': 'oqwij',
'StartOffset': 13,
'EndOffset': 98},
{'Value': 'Blah',
'StartOffset': 6,
'EndOffset': 18}]
A one-liner to count occurrences of a single value:
len([i for i in my_list if i['Value'] == 'Blah'])  # returns 2
import collections

# example token list
tokens = [{'Value': 'Blah', 'Start': 0}, {'Value': 'BlahBlah'}]
count = collections.Counter([d['Value'] for d in tokens])
print(count)
shows
Counter({'BlahBlah': 1, 'Blah': 1})
token = [{
    'Value': "Blah",
    'StartOffset': 0,
    'EndOffset': 4
}, ...]

value_counter = {}
for t in token:
    v = t['Value']
    if v not in value_counter:
        value_counter[v] = 0
    value_counter[v] += 1
print(value_counter)
Another efficient way is to convert the data to a pandas DataFrame and then aggregate. For the token list above:
import pandas as pd

df = pd.DataFrame(tokens)
print(df['Value'].value_counts())  # count of each distinct Value
print(df.groupby('Value').size())  # the same counts via groupby