I converted the following dictionary to a dataframe:
dic = {'US':{'Traffic':{'new':1415, 'repeat':670}, 'Sales':{'new':67068, 'repeat':105677}},
'UK': {'Traffic':{'new':230, 'repeat':156}, 'Sales':{'new':4568, 'repeat':10738}}}
import pandas as pd
from collections import defaultdict

d1 = defaultdict(dict)
for k, v in dic.items():
    for k1, v1 in v.items():
        for k2, v2 in v1.items():
            d1[(k, k2)].update({k1: v2})

df = pd.DataFrame.from_dict(d1)
df.insert(loc=2, column=' ', value=None)
df.insert(loc=0, column='Mode', value='Website')
df.columns = df.columns.rename("Metric", level=1)
The dataframe currently looks like:
How do I move the column header 'Mode' to the following row?
To get an output of this sort:
Change this:
df.insert(loc=0, column='Mode', value='Website')
to this:
df.insert(loc=0, column=('', 'Mode'), value='Website')
then your full code looks like this:
import pandas as pd
from collections import defaultdict
dic = {'US':{'Traffic':{'new':1415, 'repeat':670}, 'Sales':{'new':67068, 'repeat':105677}},
'UK': {'Traffic':{'new':230, 'repeat':156}, 'Sales':{'new':4568, 'repeat':10738}}}
d1 = defaultdict(dict)
for k, v in dic.items():
    for k1, v1 in v.items():
        for k2, v2 in v1.items():
            d1[(k, k2)].update({k1: v2})
df = pd.DataFrame.from_dict(d1)
df.insert(loc=0, column=('', 'Mode'), value='Website')
and this is your df
Rinse and repeat with your empty column between US and UK.
(though, admittedly, this looks like a strange way of handling stuff)
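"Rinse and repeat" means inserting the spacer with a tuple name as well. A sketch of the full sequence (loc=3 assumes the US columns come first, as they do in the dict above):

```python
import pandas as pd
from collections import defaultdict

dic = {'US': {'Traffic': {'new': 1415, 'repeat': 670}, 'Sales': {'new': 67068, 'repeat': 105677}},
       'UK': {'Traffic': {'new': 230, 'repeat': 156}, 'Sales': {'new': 4568, 'repeat': 10738}}}

# flatten to {(country, metric): {row_label: value}} so the columns become a MultiIndex
d1 = defaultdict(dict)
for k, v in dic.items():
    for k1, v1 in v.items():
        for k2, v2 in v1.items():
            d1[(k, k2)].update({k1: v2})

df = pd.DataFrame.from_dict(d1)
df.insert(loc=0, column=('', 'Mode'), value='Website')
# spacer between US and UK: position 3 is after ('', 'Mode') and the two US columns
df.insert(loc=3, column=('', ' '), value=None)
```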
I'm not understanding how the k: v syntax works. I've read that k: v pairs the items: k is the key and v is the value. If I want an additional field called 'cusip' in addition to 'lastPrice', how would I add that? Thanks
response_dict = response.json()
new_dict = {k: v['lastPrice'] for k, v in response_dict.items()}
df = pd.DataFrame.from_dict(new_dict, orient='index', columns=['lastPrice'])
You just need to build the appropriate tuple for each value, and pass both column names when building the DataFrame:
new_dict = {k: (v['lastPrice'], v['cusip']) for k, v in response_dict.items()}
df = pd.DataFrame.from_dict(new_dict, orient='index', columns=['lastPrice', 'cusip'])
In a dictionary comprehension {key_expr: value_expr for ...}, both key_expr and value_expr are allowed to be arbitrary expressions.
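Putting that together, a runnable sketch (the ticker keys and CUSIP values below are invented for illustration, and this assumes each entry in the response really has a 'cusip' field):

```python
import pandas as pd

# stand-in for response.json(); the values here are made up for the example
response_dict = {
    'AAPL': {'lastPrice': 189.95, 'cusip': '037833100'},
    'MSFT': {'lastPrice': 420.72, 'cusip': '594918104'},
}

# the value expression builds a tuple, which from_dict(orient='index')
# spreads across the two named columns
new_dict = {k: (v['lastPrice'], v['cusip']) for k, v in response_dict.items()}
df = pd.DataFrame.from_dict(new_dict, orient='index', columns=['lastPrice', 'cusip'])
```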
Your question is vague, but I'll simulate something that may be useful or close to your issue.
Instead of response, I defined a new dict with initial values:
import pandas as pd
response_dict = {"price":[10,5,9],"Products":["shoes","clothes","hat"], "lastprices":[5,6,7]}
new_dict = {k: v for k, v in response_dict.items()}
df = pd.DataFrame.from_dict(new_dict)
df
If you want to add a new key with new values, just modify the dictionary, for instance:
new_dict["cusip"]=[1,2,3]
df = pd.DataFrame.from_dict(new_dict)
df
I've this type of dictionary:
{'xy': [['value1', 'value2'], ['value3', 'value4']],
'yx': [['value5', 'value6'], ['value7', 'value8']]}
I would like to create a PySpark DataFrame with 3 columns and 2 rows, where every key of the dict gets a row. For example, the first row:
First column: xy
Second column: ["value1", "value2"]
Third column: ["value3", "value4"]
What's the best way to do this? I'm only able to create 2 columns: one with the key and one with all the lists together, which is not my desired result.
This is your data dictionary:
data = {
    'xy': [['value1', 'value2'], ['value3', 'value4']],
    'yx': [['value5', 'value6'], ['value7', 'value8']]
}
You can just use a list comprehension to build one row per key:
df = spark.createDataFrame(
    [[k] + v for k, v in data.items()],
    schema=['col1', 'col2', 'col3']
)
df.show(10, False)
+----+----------------+----------------+
|col1|col2 |col3 |
+----+----------------+----------------+
|xy |[value1, value2]|[value3, value4]|
|yx |[value5, value6]|[value7, value8]|
+----+----------------+----------------+
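If no Spark session is at hand, the row-building comprehension can be sanity-checked with plain pandas (this is only a stand-in; the question itself is about PySpark):

```python
import pandas as pd

data = {
    'xy': [['value1', 'value2'], ['value3', 'value4']],
    'yx': [['value5', 'value6'], ['value7', 'value8']]
}

# [[k] + v ...] turns each (key, [list1, list2]) pair into one three-element row
rows = [[k] + v for k, v in data.items()]
pdf = pd.DataFrame(rows, columns=['col1', 'col2', 'col3'])
```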
I'm trying to merge the values of dictionary entries that share the same key, keeping only one copy of any duplicated value.
data = {"test1":["data1", "data2"],
"test1":["data3", "data4", "data2"],
"test2":["1data", "2data"],
"test2":["3data", "4data", "2data"]
}
desired_result = {"test1":["data1", "data2", "data3", "data4"],
"test2":["1data", "2data", "3data", "4data"]
}
any ideas how to get result?
First you need to create a list of dicts (because you can't have a dictionary with duplicate keys), then iterate over them and extend a per-key list, then use set to delete the duplicates, like below:
from collections import defaultdict

data = [{"test1": ["data1", "data2"]}, {"test1": ["data3", "data4", "data2"]},
        {"test2": ["1data", "2data"]}, {"test2": ["3data", "4data", "2data"]}]

rslt_out = defaultdict(list)
for dct in data:
    for k, v in dct.items():
        rslt_out[k].extend(v)

for k, v in rslt_out.items():
    rslt_out[k] = list(set(v))

print(rslt_out)
output:
defaultdict(list,
{'test1': ['data3', 'data4', 'data2', 'data1'],
'test2': ['2data', '3data', '1data', '4data']})
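One caveat: set throws away the order the values arrived in (the output above shows them shuffled). If order matters, dict.fromkeys removes duplicates while keeping first-seen order, a small variation on the same approach:

```python
from collections import defaultdict

data = [{"test1": ["data1", "data2"]}, {"test1": ["data3", "data4", "data2"]},
        {"test2": ["1data", "2data"]}, {"test2": ["3data", "4data", "2data"]}]

rslt_out = defaultdict(list)
for dct in data:
    for k, v in dct.items():
        rslt_out[k].extend(v)

# dict.fromkeys keeps first-seen order while dropping later duplicates
result = {k: list(dict.fromkeys(v)) for k, v in rslt_out.items()}
```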
I want to merge a few dictionaries with the same keys together without deleting any key / value pairs and convert to a DataFrame.
I have tried dict.update(), but it replaces former values with new ones for duplicate keys.
dict3[1]
{'DB': 'M',
'TITLE': 'ACM Journal of Computer Documentation ',
'ISSN': '1527-6805',
'e-ISSN': '1557-9441',
'ISBN': nan,
'e-ISBN': nan}
dict4[0]
{'DB': 'D',
'TITLE': 'ACM Computing Surveys ',
'ISSN': '0360-0300',
'e-ISSN': '1557-7341',
'ISBN': nan,
'e-ISBN': nan}
I would like the result to keep all the keys in the same row, even though they overlap, no matter whether the values are the same or different.
The table should look as follows:
DB TITLE ISSN e-ISSN ... DB TITLE ISSN ...
0 M ACM Journal... 1527-6805 1557-9441 ... D ACM Comput... 0360-0300...
You could concatenate the keys of each dict to represent your row columns, and then concatenate the values of each dict into a single row (passed to the DataFrame constructor as a nested dict to create a row rather than a single column). For example:
import pandas as pd
nan = float('nan')
d1 = {'DB': 'M', 'TITLE': 'ACM Journal of Computer Documentation', 'ISSN': '1527-6805', 'e-ISSN': '1557-9441', 'ISBN': nan, 'e-ISBN': nan}
d2 = {'DB': 'D', 'TITLE': 'ACM Computing Surveys', 'ISSN': '0360-0300', 'e-ISSN': '1557-7341', 'ISBN': nan, 'e-ISBN': nan}
columns = [*d1.keys(), *d2.keys()]
row = [*d1.values(), *d2.values()]
df = pd.DataFrame([row], columns=columns)
print(df)
# DB TITLE ... DB TITLE
# 0 M ACM Journal of Computer Documentation ... D ACM Computing Surveys
You could create a simple function to convert an arbitrary number of dicts to a single row DataFrame using the same basic approach. For example:
def dicts_to_single_row_df(*args):
    columns = [k for d in args for k in d.keys()]
    row = [v for d in args for v in d.values()]
    return pd.DataFrame([row], columns=columns)
df = dicts_to_single_row_df(d1, d2)
I have a snippet of data from which I need to extract specific information. The Data looks like this:
pid log Date
91 json D1
189 json D2
276 json D3
293 json D4
302 json D5
302 json D6
343 json D7
The log is a JSON string stored in a column of an Excel file, which looks something like this:
{"Before":{"freq_term":"Daily","ideal_pmt":"246.03","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}
{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
Now, what I want is an output something like this:
{
    "pid": 91,
    "Date": "2016-05-15 03:54:24",
    "Before": {
        "freq_term": "Daily"
    },
    "After": {
        "freq_term": "Weekly"
    }
}
Basically I want only the "freq_term" and "datetime" of "Before" and "After" from the log file. So far I have the following code; after this, whatever I do gives me the error: 'list' object is not callable. Any help appreciated. Thanks.
import pandas as pd
data = pd.read_excel("C:\\Users\\Desktop\\dealChange.xlsx")
df = pd.DataFrame(data, columns = ['pid', 'log', 'date'])
li = df.to_dict('records')
dict(kv for d in li for kv in d.iteritems()) # error: list obj is not callable
How do I convert the list into a dictionary so that I can access only the data required?
I believe you need:
df = pd.DataFrame({'log':['{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}','{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}']})
print (df)
log
0 {"Before":{"freq_term":"Daily","ideal_pmt":"63...
1 {"Before":{"buy_rate":"1.180","irr":"31.63","u...
First convert the values to nested dictionaries, and then filter with a nested dict comprehension:
import json
df['log'] = df['log'].apply(json.loads)
L1 = ['Before','After']
L2 = ['freq_term','datetime']
f = lambda x: {k:{k1:v1 for k1,v1 in v.items() if k1 in L2} for k,v in x.items() if k in L1}
df['new'] = df['log'].apply(f)
print (df)
log \
0 {'After': {'ideal_pmt': '3346.88', 'freq_term'...
1 {'After': {'ideal_pmt': '2583.33', 'freq_term'...
new
0 {'After': {'freq_term': 'Weekly', 'datetime': ...
1 {'After': {'freq_term': 'Bi-Monthly'}, 'Before...
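The filtering logic can also be checked on a plain dictionary, outside pandas, using one of the log entries from the question (already parsed):

```python
# one parsed log entry from the question
log = {"Before": {"freq_term": "Daily", "ideal_pmt": "637.5", "datetime": "2015-01-08 06:26:11"},
       "After": {"freq_term": "Weekly", "ideal_pmt": "3346.88", "datetime": "2015-02-02 06:16:07"}}

L1 = ['Before', 'After']        # outer keys to keep
L2 = ['freq_term', 'datetime']  # inner keys to keep

# outer comprehension filters on L1, inner one on L2
filtered = {k: {k1: v1 for k1, v1 in v.items() if k1 in L2}
            for k, v in log.items() if k in L1}
```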
EDIT:
To find all rows with unparseable values, you can use:
import ast

def f(x):
    try:
        return ast.literal_eval(x)
    except Exception:
        return 1

print(df[df['log'].apply(f) == 1])