Create a dataframe from a dictionary with multiple keys and values - python

So I have a dictionary with 20 keys. Each value is a DataFrame with columns X, Y, Z, and all of them have the same length:
{'head': X Y Z
0 -0.203363 1.554352 1.102800
1 -0.203410 1.554336 1.103019
2 -0.203449 1.554318 1.103236
3 -0.203475 1.554299 1.103446
4 -0.203484 1.554278 1.103648
... ... ... ...
7441 -0.223008 1.542740 0.598634
7442 -0.222734 1.542608 0.599076
7443 -0.222466 1.542475 0.599520
7444 -0.222207 1.542346 0.599956
7445 -0.221962 1.542225 0.600375
I'm trying to convert this dictionary to a DataFrame, but I'm having trouble getting the output I want. What I want is a DataFrame structured like so: columns = [headX, headY, headZ, etc.] and rows 0-7445.
Is that possible? I've tried:
df = pd.DataFrame.from_dict(mydict, orient="columns")
And different variations of that, but can't get the desired output.
Any help will be great!
EDIT: The output I want has 60 columns in total, i.e. an X, a Y and a Z column for each of the 20 keys. So the columns would be: [key1X, key1Y, key1Z, key2X, key2Y, key2Z, ...], and the DataFrame will be 7446 rows x 60 columns.

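Use concat with axis=1, which turns the dict keys into the outer level of a column MultiIndex, and then flatten the MultiIndex with f-strings:
import pandas as pd
df = pd.concat(mydict, axis=1)
# use f'{x[0]}{x[1]}' instead to get headX rather than head_X
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')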
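A minimal runnable sketch of the same idea, assuming each dict value is a DataFrame with columns X, Y, Z sharing one index. The two-key mydict below is hypothetical sample data, and the levels are joined without a separator to match the headX naming asked for in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dict: each value is a DataFrame
# with the same index and columns X, Y, Z (the real one has 20 keys).
mydict = {
    "head": pd.DataFrame(np.arange(9.0).reshape(3, 3), columns=["X", "Y", "Z"]),
    "hand": pd.DataFrame(np.arange(9.0, 18.0).reshape(3, 3), columns=["X", "Y", "Z"]),
}

# Dict keys become the outer level of a column MultiIndex: ('head', 'X'), ...
df = pd.concat(mydict, axis=1)

# Join both levels without a separator to get headX, headY, ...
df.columns = [f"{key}{col}" for key, col in df.columns]
print(df)
```

Because concat aligns the frames on their index, the real frames should all share the same 0-7445 index; with 20 keys this gives the 7446 rows x 60 columns described in the EDIT.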

Related

Concatenate Lists in one DataFrame

I have the following lists:
rBruta = [76843339.93, 68564200.34, 114946898.37, 75687842.36, 34530505.68, 116481217.14, 95696528.10000002, 40015273.68, 33416618.4, 34530505.68, 33416618.4, 81118744.08]
rLiquida = [417648532.25, 362509251.24, 410746539.59, 365572296.03, 335338029.26, 416780171.86, 423577376.06, 385353312.36, 380507243.23, 404170649.16, 380269620.17, 426637510.38]
rEmpres = [1169415.89, 1015025.9, 1150090.31, 1023602.43, 938946.48, 1166984.48, 1186016.65, 1078989.27, 1065420.28, 1131677.82, 1064754.94, 1194585.03]
And I need to concatenate those 3 lists into a single DataFrame, stacking one on top of another.
I tried turning each one into a DataFrame column and then used .T to transpose the columns.
That worked, but I have 16 lists with different names to concatenate.
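The following should do the job:
import pandas as pd
# append the remaining lists to the outer list; each list becomes one row
df = pd.DataFrame([rBruta, rLiquida, rEmpres], columns=[f"a{i}" for i in range(1, 13)])
# a1..a12 because each list has 12 values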
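Since there are 16 differently named lists, a sketch that keeps each list's name as its row label may help. It assumes the lists are first collected in a dict (the name lists is hypothetical), and uses only the first three values of each list from the question to keep it short:

```python
import pandas as pd

# First three values of each list from the question (truncated for brevity)
rBruta = [76843339.93, 68564200.34, 114946898.37]
rLiquida = [417648532.25, 362509251.24, 410746539.59]
rEmpres = [1169415.89, 1015025.9, 1150090.31]

# Collect the lists in a dict so each list's name becomes its row label
lists = {"rBruta": rBruta, "rLiquida": rLiquida, "rEmpres": rEmpres}

# orient="index": each dict value becomes one row; columns default to 0, 1, 2, ...
df = pd.DataFrame.from_dict(lists, orient="index")
df.columns = [f"a{i}" for i in range(1, df.shape[1] + 1)]
print(df)
```

Adding the other lists is then just a matter of adding entries to the dict; no transpose is needed.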

How to split a column into many columns when the column names change

I build a DataFrame inside a function, and the column names change from call to call, so I can't refer to a column by name. For example, I can't write df['name'] and then split that column into several columns. The number of rows and columns of these DataFrames is not constant either. I need to split every column that contains more than one item into multiple columns, one per component.
For example:
This is one of the dataframes which I have:
name/one name/three
(192.26949,) (435.54,436.65,87.3,5432)
(189.4033245,) (45.51,56.612, 54253.543, 54.321)
(184.4593252,) (45.58,56.6412,654.876,765.66543)
I want to convert it to:
name/one name/three1 name/three2 name/three3 name/three4
192.26949 435.54 436.65 87.3 5432
189.4033245 45.51 56.612 54253.543 54.321
184.4593252 45.58 56.6412 654.876 765.66543
If every cell in every column is a tuple, use concat with the DataFrame constructor and DataFrame.add_prefix:
df = pd.concat([pd.DataFrame(df[c].tolist()).add_prefix(c) for c in df.columns], axis=1)
print (df)
name/one0 name/three0 name/three1 name/three2 name/three3
0 192.269490 435.54 436.6500 87.300 5432.00000
1 189.403324 45.51 56.6120 54253.543 54.32100
2 184.459325 45.58 56.6412 654.876 765.66543
If the cells hold string representations of tuples instead, parse them with ast.literal_eval first:
import ast
L = [pd.DataFrame([ast.literal_eval(y) for y in df[c]]).add_prefix(c) for c in df.columns]
df = pd.concat(L, axis=1)
print (df)
name/one0 name/three0 name/three1 name/three2 name/three3
0 192.269490 435.54 436.6500 87.300 5432.00000
1 189.403324 45.51 56.6120 54253.543 54.32100
2 184.459325 45.58 56.6412 654.876 765.66543
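The add_prefix output uses 0-based suffixes on every column, while the question asked for name/one unchanged and name/three1 ... name/three4. A sketch that produces exactly those names, assuming every cell in a given column is a tuple of the same length (split_tuple_columns is a hypothetical helper name):

```python
import pandas as pd


def split_tuple_columns(df):
    """Expand every tuple-valued column into one column per tuple element.

    Single-element columns keep their original name; longer ones get a
    1-based suffix (name/three1, name/three2, ...). Assumes every cell in
    a given column is a tuple of the same length.
    """
    parts = []
    for col in df.columns:
        # Reuse the original index so the pieces line up in concat
        expanded = pd.DataFrame(df[col].tolist(), index=df.index)
        if expanded.shape[1] == 1:
            expanded.columns = [col]
        else:
            expanded.columns = [f"{col}{i}" for i in range(1, expanded.shape[1] + 1)]
        parts.append(expanded)
    return pd.concat(parts, axis=1)


df = pd.DataFrame({
    "name/one": [(192.26949,), (189.4033245,), (184.4593252,)],
    "name/three": [(435.54, 436.65, 87.3, 5432),
                   (45.51, 56.612, 54253.543, 54.321),
                   (45.58, 56.6412, 654.876, 765.66543)],
})
out = split_tuple_columns(df)
print(out)
```

Because it loops over df.columns rather than hard-coded names, it works no matter what the columns are called.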

flatten a dataframe into dictionary with key given by index and column names

I have a dataframe:
v1 v2 v3
c1
a 1.593979 1.679763 1.613202
n 1.327004 2.551197 1.492442
z 1.615528 1.156273 1.817987
I would like to create a dictionary from the dataframe.
I know I can do
d = {str(i): v for i, v in enumerate(var.values.flatten())}
which creates a dictionary with the values 1.593979, 1.679763, ... and the keys '0', '1', '2', etc.
However, I would like the keys of my new dictionary to be a combination of the original columns and index names e.g.
dict_result={'v1_a':1.593979,'v2_a':1.679763,...,'v3_z':1.817987}
How to achieve this using pandas?
Use unstack to get a Series indexed by (column, index) pairs, flatten that MultiIndex, and finally call to_dict:
s = df.unstack()
s.index = s.index.map('_'.join)
#alternative
#s.index = ['{}_{}'.format(x, y) for x, y in s.index ]
d = s.to_dict()
print (d)
{'v2_n': 2.551197, 'v1_a': 1.593979, 'v1_n': 1.327004,
'v2_a': 1.6797630000000001, 'v3_z': 1.8179869999999998,
'v3_n': 1.492442, 'v1_z': 1.615528, 'v2_z': 1.1562729999999999, 'v3_a': 1.613202}
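A self-contained version using the data from the question. It swaps '_'.join for an f-string join, which also works when the index or column labels are not strings (for example integers):

```python
import pandas as pd

df = pd.DataFrame(
    {"v1": [1.593979, 1.327004, 1.615528],
     "v2": [1.679763, 2.551197, 1.156273],
     "v3": [1.613202, 1.492442, 1.817987]},
    index=pd.Index(["a", "n", "z"], name="c1"),
)

# unstack() turns the frame into a Series indexed by (column, row) pairs
s = df.unstack()
# Build "column_row" keys; the f-string copes with non-string labels too
s.index = [f"{col}_{row}" for col, row in s.index]
d = s.to_dict()
print(d["v1_a"], d["v3_z"])
```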

Dictionary in Pandas DataFrame, how to split the columns

I have a DataFrame that consists of one column ('Vals') which is a dictionary. The DataFrame looks more or less like this:
In[215]: fff
Out[213]:
Vals
0 {u'TradeId': u'JP32767', u'TradeSourceNam...
1 {u'TradeId': u'UUJ2X16', u'TradeSourceNam...
2 {u'TradeId': u'JJ35A12', u'TradeSourceNam...
When looking at an individual row the dictionary looks like this:
In[220]: fff['Vals'][100]
Out[218]:
{u'BrdsTraderBookCode': u'dffH',
u'Measures': [{u'AssetName': u'Ie0',
u'DefinitionId': u'6dbb',
u'MeasureValues': [{u'Amount': -18.64}],
u'ReportingCurrency': u'USD',
u'ValuationId': u'669bb'}],
u'SnapshotId': 12739,
u'TradeId': u'17304M',
u'TradeLegId': u'31827',
u'TradeSourceName': u'xxxeee',
u'TradeVersion': 1}
How can I split the columns and create a new DataFrame, so that I get one column with TradeId and another one with MeasureValues?
Try this:
l = []
# Series.iteritems() was removed in pandas 2.0; items() does the same thing
for idx, row in fff['Vals'].items():
    temp_df = pd.DataFrame(row['Measures'][0]['MeasureValues'])
    temp_df['TradeId'] = row['TradeId']
    l.append(temp_df)
pd.concat(l, axis=0)
Here's a way to get TradeId and MeasureValues (using your sample row twice to illustrate the iteration):
new_df = pd.DataFrame()
for idx, data in fff.iterrows():
    vals = data['Vals']  # .ix was removed from pandas; select the column by label
    d = {'TradeId': vals['TradeId']}
    d.update(vals['Measures'][0]['MeasureValues'][0])
    new_df = pd.concat([new_df, pd.DataFrame.from_dict(d, orient='index').T])
Amount TradeId
0 -18.64 17304M
0 -18.64 17304M
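An alternative not taken from either answer is pd.json_normalize, which walks the nested lists for you and emits one row per MeasureValues entry instead of only the first one. A sketch, assuming every row has the Measures/MeasureValues structure shown in the question; the second row and its amounts are hypothetical, and only the keys needed here are included:

```python
import pandas as pd

# Rows shaped like the sample dictionary in the question
rows = [
    {"TradeId": "17304M",
     "Measures": [{"AssetName": "Ie0",
                   "MeasureValues": [{"Amount": -18.64}],
                   "ReportingCurrency": "USD"}]},
    {"TradeId": "JP32767",  # hypothetical row with two amounts
     "Measures": [{"AssetName": "Ie0",
                   "MeasureValues": [{"Amount": 5.0}, {"Amount": 7.5}],
                   "ReportingCurrency": "USD"}]},
]
fff = pd.DataFrame({"Vals": rows})

# Walk Vals -> Measures -> MeasureValues, producing one row per
# MeasureValues entry and copying the top-level TradeId onto each row.
new_df = pd.json_normalize(
    fff["Vals"].tolist(),
    record_path=["Measures", "MeasureValues"],
    meta=["TradeId"],
)
print(new_df)
```

This also avoids growing a DataFrame with concat inside a loop, which copies the accumulated data on every iteration.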

Given a pandas dataframe, is there an easy way to print out a command to generate it?

After running some commands I have a pandas dataframe, eg.:
>>> print df
B A
1 2 1
2 3 2
3 4 3
4 5 4
I would like to print this out so that it produces simple code that would recreate it, eg.:
DataFrame([[2,1],[3,2],[4,3],[5,4]],columns=['B','A'],index=[1,2,3,4])
I tried pulling out each of the three pieces (data, columns and rows):
[[e for e in row] for row in df.iterrows()]
[c for c in df.columns]
[r for r in df.index]
but the first line fails because iterrows() yields (index, Series) pairs, so e is the index label or a Series rather than a value.
Is there a pre-built command to do this, and if not, how do I do it? Thanks.
You can get the values of the DataFrame as a NumPy array through the df.values attribute:
df = pd.DataFrame([[2,1],[3,2],[4,3],[5,4]],columns=['B','A'],index=[1,2,3,4])
arrays = df.values
cols = df.columns
index = df.index
df2 = pd.DataFrame(arrays, columns = cols, index = index)
Based on @Woody Pride's approach, here is the full solution I am using. It handles hierarchical indices and index names.
from pandas import DataFrame, MultiIndex

def _gencmd(df, pandas_as='pd'):
    """
    With this addition to DataFrame's methods, you can use:
        df.command()
    to get the command required to regenerate the dataframe df.
    """
    prefix = pandas_as + '.' if pandas_as else ''
    if isinstance(df.index, MultiIndex):
        index_cmd = 'MultiIndex.from_tuples({0!r}, names={1!r})'.format(
            df.index.tolist(), list(df.index.names))
    else:
        # Use the generic Index constructor: RangeIndex and friends
        # do not accept a plain list of labels
        index_cmd = 'Index({0!r}, name={1!r})'.format(df.index.tolist(), df.index.name)
    # tolist() yields plain Python scalars, so the repr is valid source code
    return '{0}DataFrame({1!r}, index={0}{2}, columns={3!r})'.format(
        prefix, df.values.tolist(), index_cmd, df.columns.tolist())

# In Python 3, assigning a plain function to the class makes it a method
# (the three-argument MethodType form only exists in Python 2)
DataFrame.command = _gencmd
I have only tested it on a few cases so far and would love a more general solution.
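For the simple case in the question, a shorter sketch can lean on DataFrame.to_dict(orient="split"), which already returns the data, index and columns as plain Python lists. frame_to_command is a hypothetical helper name, and this version does not preserve index names or MultiIndex levels:

```python
import pandas as pd


def frame_to_command(df, pandas_as="pd"):
    # "split" gives {'index': [...], 'columns': [...], 'data': [[...], ...]}
    parts = df.to_dict(orient="split")
    return "{}.DataFrame({!r}, index={!r}, columns={!r})".format(
        pandas_as, parts["data"], parts["index"], parts["columns"])


df = pd.DataFrame([[2, 1], [3, 2], [4, 3], [5, 4]],
                  columns=["B", "A"], index=[1, 2, 3, 4])
cmd = frame_to_command(df)
print(cmd)
# pd.DataFrame([[2, 1], [3, 2], [4, 3], [5, 4]], index=[1, 2, 3, 4], columns=['B', 'A'])
```

If index and column names matter, orient="tight" (pandas 1.4+) also records them, and the resulting dict can be passed back to DataFrame.from_dict(..., orient="tight").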
