I have a df such as:
Letter  Stats
B       0
B       1
C       22
B       0
C       0
B       3
How can I filter for a value in the Letter column and then convert the Stats column for that value into an array?
Basically I want to filter for B and convert the Stats column to an array. Thanks!
Here is one way to do it:
# function receives a dataframe and a letter as parameters
# returns the Stats values as a list for the passed letter
def grp(df, letter):
    return df.loc[df['Letter'].eq(letter), 'Stats'].values.tolist()

# pass the dataframe and the letter
result = grp(df, 'B')
print(result)
[0, 1, 0, 3]
Data used:
import pandas as pd

data = {'Letter': {0: 'B', 1: 'B', 2: 'C', 3: 'B', 4: 'C', 5: 'B'},
        'Stats': {0: 0, 1: 1, 2: 22, 3: 0, 4: 0, 5: 3}}
df = pd.DataFrame(data)
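For a quick check, the same filter can also be written inline without a helper function; this is a minimal sketch using the data above:

```python
import pandas as pd

data = {'Letter': {0: 'B', 1: 'B', 2: 'C', 3: 'B', 4: 'C', 5: 'B'},
        'Stats': {0: 0, 1: 1, 2: 22, 3: 0, 4: 0, 5: 3}}
df = pd.DataFrame(data)

# boolean mask on Letter, then pull Stats out as a plain Python list
result = df.loc[df['Letter'].eq('B'), 'Stats'].tolist()
print(result)  # [0, 1, 0, 3]
```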
Although I believe the solution proposed by @Naveed is enough for this problem, one small extension could be suggested.
If you would like to get the result as a pandas Series and obtain some statistics for it:
data ={'Letter': {0: 'B', 1: 'B', 2: 'C', 3: 'B', 4: 'C', 5: 'B'},
'Stats': {0: 0, 1: 1, 2: 22, 3: 0, 4: 0, 5: 3}}
df = pd.DataFrame(data)
letter = 'B'
ser = pd.Series(name=letter, data=df.loc[df['Letter'].eq(letter), 'Stats'].values)
print(f"Max value: {ser.max()} | Min value: {ser.min()} | Median value: {ser.median()}")
Output:
Max value: 3 | Min value: 0 | Median value: 0.5
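The same Series can also be built directly with .loc (avoiding the intermediate frame of chained indexing), and describe() gives a fuller summary in one call; a small sketch under the same data:

```python
import pandas as pd

data = {'Letter': {0: 'B', 1: 'B', 2: 'C', 3: 'B', 4: 'C', 5: 'B'},
        'Stats': {0: 0, 1: 1, 2: 22, 3: 0, 4: 0, 5: 3}}
df = pd.DataFrame(data)

letter = 'B'
# select with .loc and name the Series after the letter
ser = df.loc[df['Letter'].eq(letter), 'Stats'].rename(letter)
print(ser.describe())  # count, mean, std, min, quartiles, max
```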
From a dataframe, I build a dictionary whose keys are the distinct values of a given column.
The value of each key is a nested dictionary, whose keys are the distinct values of another column.
The values in the nested dictionary are updated by iterating over the dataframe (third column).
Example:
import pandas as pd
data = [['computer',1, 10]
,['computer',2,20]
,['computer',4, 40]
,['laptop',1, 100]
,['laptop',3, 30]
,['printer',2, 200]
]
df = pd.DataFrame(data,columns=['Product','id', 'qtt'])
print (df)
    Product  id  qtt
0  computer   1   10
1  computer   2   20
2  computer   4   40
3    laptop   1  100
4    laptop   3   30
5   printer   2  200
kdf_key_dic = {key: None for key in df['id'].unique().tolist()}
product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}
print ("product_key_dic: ", product_key_dic)
product_key_dic: {
'computer': {1: None, 2: None, 4: None, 3: None},
'laptop': {1: None, 2: None, 4: None, 3: None},
'printer': {1: None, 2: None, 4: None, 3: None}
}
Now I'd like to update the product_key_dic dictionary, but I can't get it right, it always uses the same key-dict for each key in the main dictionary!
for index, row in df.iterrows():
    product_key_dic[row['Product']].update({row['id']: row['qtt']})
print("\n product_key_dic:\n", product_key_dic)
I get:
product_key_dic:
{ 'computer': {1: 100, 2: 200, 4: 40, 3: 30},
'laptop': {1: 100, 2: 200, 4: 40, 3: 30},
'printer': {1: 100, 2: 200, 4: 40, 3: 30}
}
I expect:
{ 'computer': {1: 10, 2: 20, 4: 40, 3: None},
'laptop': {1: 100, 2: None, 4: None, 3: 30},
'printer': {1: None, 2: 200, 4: None, 3: None}
}
I can't understand the problem; somehow it's like each key has the same nested dictionary..?
We can try a different approach: create a MultiIndex.from_product based on the unique values from Product and id, then reshape so we can call DataFrame.to_dict directly:
cols = ['Product', 'id']
product_key_dic = (
df.set_index(cols).reindex(
pd.MultiIndex.from_product(
[df[col].unique() for col in cols],
names=cols
)
) # Reindex to ensure all pairs are present in the DF
.replace({np.nan: None}) # Replace nan with None
.unstack('Product') # Create Columns from Product
.droplevel(0, axis=1) # Remove qtt from column MultiIndex
.to_dict()
)
product_key_dic:
{
'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
'printer': {1: None, 2: 200.0, 3: None, 4: None}
}
Methods Used:
DataFrame.set_index
DataFrame.reindex
MultiIndex.from_product
Series.unique
DataFrame.replace
DataFrame.unstack
DataFrame.droplevel
DataFrame.to_dict
Setup and imports:
import numpy as np
import pandas as pd
data = [['computer', 1, 10], ['computer', 2, 20], ['computer', 4, 40],
['laptop', 1, 100], ['laptop', 3, 30], ['printer', 2, 200]]
df = pd.DataFrame(data, columns=['Product', 'id', 'qtt'])
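For comparison, here is a shorter sketch of the same reshaping using groupby and plain dict lookups (`nested` and `ids` are illustrative names, not from the answers above):

```python
import pandas as pd

data = [['computer', 1, 10], ['computer', 2, 20], ['computer', 4, 40],
        ['laptop', 1, 100], ['laptop', 3, 30], ['printer', 2, 200]]
df = pd.DataFrame(data, columns=['Product', 'id', 'qtt'])

ids = df['id'].unique()
# one inner dict per product; .get returns None for ids the product lacks
nested = {prod: {i: grp.set_index('id')['qtt'].to_dict().get(i) for i in ids}
          for prod, grp in df.groupby('Product')}
print(nested)
```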
The initial solution could be modified by adding a copy call to the dictionary in the comprehension to make them separate dictionaries rather than multiple references to the same one (How to copy a dictionary and only edit the copy). However, iterating over DataFrames is discouraged (Does pandas iterrows have performance issues?):
kdf_key_dic = {key: None for key in df['id'].unique().tolist()}
product_key_dic = {key: kdf_key_dic.copy()
for key in df['Product'].unique().tolist()}
for index, row in df.iterrows():
    product_key_dic[row['Product']].update({row['id']: row['qtt']})
product_key_dic:
{
'computer': {1: 10.0, 2: 20.0, 3: None, 4: 40.0},
'laptop': {1: 100.0, 2: None, 3: 30.0, 4: None},
'printer': {1: None, 2: 200.0, 3: None, 4: None}
}
This is because you are reusing the same dict object. Let's take these two statements:
kdf_key_dic = {key: None for key in df['id'].unique().tolist()}
product_key_dic = {key: kdf_key_dic for key in df['Product'].unique().tolist()}
You are passing kdf_key_dic as the value (in the second statement), which is the same object on each iteration.
So instead you can pass a copy of kdf_key_dic while constructing product_key_dic:
product_key_dic = {key: kdf_key_dic.copy() for key in df['Product'].unique().tolist()}
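A minimal sketch of the aliasing problem, outside pandas: when every key points at one shared dict, a single update shows up under every key, while .copy() keeps them independent.

```python
shared = {1: None, 2: None}
aliased = {k: shared for k in ('a', 'b')}        # same object under every key
copied = {k: shared.copy() for k in ('a', 'b')}  # independent copies

aliased['a'][1] = 10
copied['a'][1] = 10

print(aliased['b'][1])  # 10 -- the update leaked to 'b'
print(copied['b'][1])   # None -- the copy is unaffected
```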
My code is below. I apply pd.to_numeric to the columns that are supposed to be int or float but come in as object. Can we convert this into a more pandas-like form, e.g. by applying np.where?
if df.dtypes.all() == 'object':
    df = df.apply(pd.to_numeric, errors='coerce').fillna(df)
else:
    df = df
A simple one-liner is assign with select_dtypes, which will reassign the existing columns:
df.assign(**df.select_dtypes('O').apply(pd.to_numeric,errors='coerce').fillna(df))
With np.where:
df[:] = np.where(df.dtypes == 'object',
                 df.apply(pd.to_numeric, errors='coerce').fillna(df), df)
Example (check the Price column):
d = {'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: '24000', 1: 'a', 2: '900'}}
df = pd.DataFrame(d)
print(df)
CusID Name Shop Price
0 1 Paul Pascal 24000
1 2 Mark Casio a
2 3 Bill Nike 900
df.to_dict()
{'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: '24000', 1: 'a', 2: '900'}}
(df.assign(**df.select_dtypes('O').apply(pd.to_numeric,errors='coerce')
.fillna(df)).to_dict())
{'CusID': {0: 1, 1: 2, 2: 3},
'Name': {0: 'Paul', 1: 'Mark', 2: 'Bill'},
'Shop': {0: 'Pascal', 1: 'Casio', 2: 'Nike'},
'Price': {0: 24000.0, 1: 'a', 2: 900.0}}
The equivalent of your if/else is df.mask:
df_out = df.mask(df.dtypes == 'O',
                 df.apply(pd.to_numeric, errors='coerce').fillna(df))
I have two dictionaries. A is empty and B is a dictionary whose values I want to feed into A, feeding different values on different loop iterations.
A = {'format': None,
'items' : None,
'status' : None,
'name': None}
B = {'format': 'json',
'items' : ['A', 'B', 'C'],
'status' : [1, 2, 3],
'name': 'test'}
I have a stupid method to get this answer, but actually I want something like this:
while not finish:
    for key, values in B.items():
        if type(values) != list:
            A[key] = values
        else:
            for items in values:
                A[key] = items
                # do something here
But it seems this can't achieve the targets I want, i.e.:
A-1, A-2, A-3, B-1, B-2, B-3 ... C-3
First iteration:
A = {'format': 'json',
'items' : 'A',
'status' : 1,
'name': 'test'}
Second iteration:
A = {'format': 'json',
'items' : 'A',
'status' : 2,
'name': 'test'}
and so on...
Final iteration:
A = {'format': 'json',
'items' : 'C',
'status' : 3,
'name': 'test'}
Use pandas:
Given B as shown in the question, but with every value wrapped in a list:
import pandas as pd
from itertools import product
B = {'format': ['json'],
'items' : ['A', 'B', 'C'],
'status' : [1, 2, 3],
'name': ['test']}
df = pd.DataFrame(product(*(v for _, v in B.items())), columns=B.keys())
Now the data is in a form useful for further analysis:
format items status name
0 json A 1 test
1 json A 2 test
2 json A 3 test
3 json B 1 test
4 json B 2 test
5 json B 3 test
6 json C 1 test
7 json C 2 test
8 json C 3 test
Data can easily be saved to a file:
df.to_json('test.json')
{'format': {0: 'json', 1: 'json', 2: 'json', 3: 'json', 4: 'json', 5: 'json', 6: 'json', 7: 'json', 8: 'json'},
'items': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B', 6: 'C', 7: 'C', 8: 'C'},
'name': {0: 'test', 1: 'test', 2: 'test', 3: 'test', 4: 'test', 5: 'test', 6: 'test', 7: 'test', 8: 'test'},
'status': {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 3}}
Data can be read back in:
df1 = pd.read_json('test.json')
With list comprehension - no pandas:
If all the values in B are lists:
B = {'format': ['json'],
'items' : ['A', 'B', 'C'],
'status' : [1, 2, 3],
'name': ['test']}
list_dicts = [dict(zip(B.keys(), x)) for x in product(*(v for _, v in B.items()))]
If some values in B are plain str:
B = {'format': 'json',
'items' : ['A', 'B', 'C'],
'status' : [1, 2, 3],
'name': 'test'}
list_dicts = [dict(zip(B.keys(), x)) for x in product(*([v] if type(v) == str else v for _, v in B.items()))]
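As a quick check of the str-handling variant, this sketch wraps bare strings in a one-element list so product treats them as a single fixed choice (using isinstance rather than type comparison):

```python
from itertools import product

B = {'format': 'json',
     'items': ['A', 'B', 'C'],
     'status': [1, 2, 3],
     'name': 'test'}

# wrap bare strings so every value product sees is an iterable of choices
list_dicts = [dict(zip(B.keys(), x))
              for x in product(*([v] if isinstance(v, str) else v for v in B.values()))]
print(len(list_dicts))  # 9
print(list_dicts[0])    # {'format': 'json', 'items': 'A', 'status': 1, 'name': 'test'}
```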
You don't need A, just the values you'd like to iterate over, and use itertools.product:
from itertools import product
items = ['A', 'B', 'C']
status = [1, 2, 3]
for i, s in product(items, status):
print({'format': 'json',
'items' : i,
'status' : s,
'name': 'test'})
Outputs
{'format': 'json', 'items': 'A', 'status': 1, 'name': 'test'}
{'format': 'json', 'items': 'A', 'status': 2, 'name': 'test'}
{'format': 'json', 'items': 'A', 'status': 3, 'name': 'test'}
{'format': 'json', 'items': 'B', 'status': 1, 'name': 'test'}
{'format': 'json', 'items': 'B', 'status': 2, 'name': 'test'}
{'format': 'json', 'items': 'B', 'status': 3, 'name': 'test'}
{'format': 'json', 'items': 'C', 'status': 1, 'name': 'test'}
{'format': 'json', 'items': 'C', 'status': 2, 'name': 'test'}
{'format': 'json', 'items': 'C', 'status': 3, 'name': 'test'}
I want to generate all possible ways of using dicts, based on the values in them. To explain in code, I have:
a = {'name' : 'a', 'items': 3}
b = {'name' : 'b', 'items': 4}
c = {'name' : 'c', 'items': 5}
I want to be able to pick (say) exactly 7 items from these dicts, and all the possible ways I could do it in.
So:
x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = filter(lambda i: sum(i) == 7, x)  # itertools.ifilter on Python 2
would give me:
(0, 3, 4)
(1, 2, 4)
(1, 3, 3)
...
What I'd really like is:
({'name' : 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 2}, {'name': 'c', 'picked': 4})
({'name' : 'a', 'picked': 1}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 3})
....
Any ideas on how to do this, cleanly?
Here it is:
import itertools
import operator

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}
dcts = [a, b, c]

x = itertools.product(range(a['items']), range(b['items']), range(c['items']))
y = filter(lambda i: sum(i) == 7, x)  # itertools.ifilter on Python 2
z = (tuple([dct, operator.setitem(dct, 'picked', vval)][0]
           for dct, vval in zip(dcts, val)) for val in y)
for zz in z:
    print(zz)
You can modify it to create copies of the dictionaries. If you need a new dict instance on every iteration, change the z line to:
z = (tuple([dct, operator.setitem(dct, 'picked', vval)][0]
           for dct, vval in zip(map(dict, dcts), val)) for val in y)
An easy way is to generate new dicts:
names = [x['name'] for x in [a, b, c]]
ziped = map(lambda x: zip(names, x), y)
maped = map(lambda el: [{'name': name, 'picked': count} for name, count in el],
            ziped)
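Putting the pieces together, a self-contained Python 3 sketch of the whole task; each pick becomes a fresh dict, so the originals are never mutated (`combos` and `result` are illustrative names):

```python
import itertools

a = {'name': 'a', 'items': 3}
b = {'name': 'b', 'items': 4}
c = {'name': 'c', 'items': 5}
dcts = [a, b, c]

# every way of picking counts, then keep only those summing to 7
combos = itertools.product(*(range(d['items']) for d in dcts))
result = [tuple({'name': d['name'], 'picked': n} for d, n in zip(dcts, combo))
          for combo in combos if sum(combo) == 7]
print(result[0])
# ({'name': 'a', 'picked': 0}, {'name': 'b', 'picked': 3}, {'name': 'c', 'picked': 4})
```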