I have a dataframe like the sample data below. I'm trying to convert one row from the dataframe into a dict like the desired output below, but when I use to_dict I get the index along with the column value. Does anyone know how to convert the row to a dict like the desired output? Any tips greatly appreciated.
Sample data:
print(catStr_df[['Bottle Volume (ml)', 'Pack']][:5])
      Bottle Volume (ml)  Pack
595                  750    12
1889                 750    12
3616                1000    12
4422                 750    12
5022                 750    12
Code:
v = catStr_df[catStr_df['Item Number']==34881][['Bottle Volume (ml)', 'Pack']]\
    .drop_duplicates(keep='first').to_dict()
v
Output:
{'Bottle Volume (ml)': {9534: 1000}, 'Pack': {9534: 12}}
Desired output:
{'Bottle Volume (ml)': 1000, 'Pack': 12}
Try adding .to_dict('records')[0] to the row you want:
catStr_df[catStr_df['Item Number']==34881].to_dict('records')[0]
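To illustrate the difference, here is a minimal sketch using a small made-up frame in place of catStr_df (the second row and the index labels are assumptions, not the real data). The default to_dict() nests each column under its index label, while 'records' returns one flat dict per row.

```python
import pandas as pd

# Hypothetical stand-in for catStr_df; only the shape mirrors the question.
catStr_df = pd.DataFrame(
    {
        "Item Number": [34881, 11111],
        "Bottle Volume (ml)": [1000, 750],
        "Pack": [12, 12],
    },
    index=[9534, 595],
)

# Select the matching row and the two columns of interest.
row = catStr_df.loc[catStr_df["Item Number"] == 34881, ["Bottle Volume (ml)", "Pack"]]

nested = row.to_dict()              # {column: {index label: value}}
flat = row.to_dict("records")[0]    # {column: value} for the first matching row
also_flat = row.iloc[0].to_dict()   # same result, going through the row as a Series

print(nested)
print(flat)
```

Note that [0] raises an IndexError if no row matches, so it may be worth checking that the selection is not empty first.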
Use df.to_dict(orient='index') to get the index values as keys, which makes it easy to look up the data for a given row.
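A short sketch of the orient='index' approach, assuming a made-up two-row frame (the values and index labels are illustrative only): each index label maps to a flat dict for that row.

```python
import pandas as pd

# Hypothetical data; index labels mimic those in the question.
df = pd.DataFrame(
    {"Bottle Volume (ml)": [750, 1000], "Pack": [12, 12]},
    index=[595, 9534],
)

by_index = df.to_dict(orient="index")  # {index label: {column: value}}
row_dict = by_index[9534]              # flat dict for the row labelled 9534

print(row_dict)
```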
Taking a different tack, this works, but you need a list of the columns. It assumes you want the index number as a dict item.
import pandas as pd

def row_converter(row, listy):
    # Convert a pandas row to a dictionary.
    # Requires a list of columns and a row as a tuple (as yielded by itertuples).
    count = 1
    pictionary = {}
    pictionary['Index'] = row[0]  # itertuples puts the index first
    for item in listy:
        pictionary[item] = row[count]
        count += 1
    print(pictionary)
    return pictionary

df = pd.read_csv("yourFile", dtype=object, delimiter=",", na_filter=False)
listy = df.columns
for row in df.itertuples():
    rowDict = row_converter(row, listy)
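A possibly shorter variant of the same idea, as a sketch only: the rows yielded by itertuples() are namedtuples, so _asdict() can build the dictionary without a separate column list. The frame below is made up, and this assumes the column names are valid Python identifiers (otherwise pandas replaces them with positional names).

```python
import pandas as pd

# Hypothetical data with identifier-friendly column names.
df = pd.DataFrame({"volume": [750, 1000], "pack": [12, 6]})

# Each namedtuple row carries the index under the field name 'Index'.
row_dicts = [row._asdict() for row in df.itertuples()]

print(row_dicts[0])
```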
I defined a function to find the correlation between 'day' and 'sales' for each store:
def store_corr(store_num):
    subset = df[df['store'] == store_num]
    subset = subset[['day', 'dollar_sales']]
    subset_corr = subset.corr()
    corr_val = subset_corr.iloc[0, 1]
    return corr_val
The next step is running a for loop over each store, like this:
store = df.store.unique()
for i in store:
    corre = store_corr(i)
    print(corre)
But all I need is a table with columns for store_num and corr for each store, so how can I code it to get output like this?
store  corre
1       0.26
2       0.12
3      -0.96
Thank you
I think you could do this:
store = df.store.unique()
values = {"store": [], "corre": []}
for i in store:
    corre = store_corr(i)
    values["store"].append(i)
    values["corre"].append(corre)
    print(i, corre)
df = pd.DataFrame(values)
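The df dataframe should now have the output you want. Note that this overwrites the original df, so use a different name if you still need the raw data.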
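An alternative sketch that avoids the explicit loop by using groupby. The data below is invented so that the result is easy to check by eye (store 1 rises perfectly with day, store 2 falls perfectly); the column names follow the question.

```python
import pandas as pd

# Made-up sales data for two stores.
df = pd.DataFrame(
    {
        "store": [1, 1, 1, 2, 2, 2],
        "day": [1, 2, 3, 1, 2, 3],
        "dollar_sales": [10.0, 20.0, 30.0, 30.0, 20.0, 10.0],
    }
)

# Compute the day/dollar_sales correlation within each store,
# then turn the resulting Series into a two-column frame.
result = (
    df.groupby("store")[["day", "dollar_sales"]]
    .apply(lambda g: g["day"].corr(g["dollar_sales"]))
    .rename("corre")
    .reset_index()
)

print(result)
```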
I have an empty Pandas dataframe and I'm trying to add a row to it. Here's what I mean:
text_img_count = len(BeautifulSoup(html, "lxml").find_all('img'))
print 'img count: ', text_img_count
keys = ['text_img_count', 'text_vid_count', 'text_link_count', 'text_par_count', 'text_h1_count',
'text_h2_count', 'text_h3_count', 'text_h4_count', 'text_h5_count', 'text_h6_count',
'text_bold_count', 'text_italic_count', 'text_table_count', 'text_word_length', 'text_char_length',
'text_capitals_count', 'text_sentences_count', 'text_middles_count', 'text_rows_count',
'text_nb_digits', 'title_char_length', 'title_word_length', 'title_nb_digits']
values = [text_img_count, text_vid_count, text_link_count, text_par_count, text_h1_count,
text_h2_count, text_h3_count, text_h4_count, text_h5_count, text_h6_count,
text_bold_count, text_italic_count, text_table_count, text_word_length,
text_char_length, text_capitals_count, text_sentences_count, text_middles_count,
text_rows_count, text_nb_digits, title_char_length, title_word_length, title_nb_digits]
numeric_df = pd.DataFrame()
for key, value in zip(keys, values):
    numeric_df[key] = value
print numeric_df.head()
However, the output is this:
img count: 2
Empty DataFrame
Columns: [text_img_count, text_vid_count, text_link_count, text_par_count, text_h1_count, text_h2_count, text_h3_count, text_h4_count, text_h5_count, text_h6_count, text_bold_count, text_italic_count, text_table_count, text_word_length, text_char_length, text_capitals_count, text_sentences_count, text_middles_count, text_rows_count, text_nb_digits, title_char_length, title_word_length, title_nb_digits]
Index: []
[0 rows x 23 columns]
This makes it seem like numeric_df is empty after I just assigned values for each of its columns.
What's going on?
Thanks for the help!
What I usually do to add a column to the empty data frame is to append the information into a list and then give it a data frame structure. For example:
df=pd.DataFrame()
L=['a','b']
df['SomeName']=pd.DataFrame(L)
And you have to use pd.Series() if the list is made of numbers.
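As for why the frame in the question stays empty: assigning a scalar to a column broadcasts it over the existing index, and a fresh pd.DataFrame() has no index entries, so no rows are created. A minimal sketch, using two of the question's keys and made-up values, that builds a one-row frame from the keys and values in one step instead:

```python
import pandas as pd

keys = ["text_img_count", "text_vid_count"]
values = [2, 0]  # made-up counts for illustration

# Reproduces the problem: scalars broadcast over an empty index give zero rows.
empty_df = pd.DataFrame()
for key, value in zip(keys, values):
    empty_df[key] = value

# One possible fix: pass a list containing a single row dict.
numeric_df = pd.DataFrame([dict(zip(keys, values))])

print(len(empty_df), len(numeric_df))
```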
I have looked at many similar questions, yet I still cannot get pandas to rename the rows of a df from a list of values from another df. What am I doing wrong?
def calculate_liabilities(stakes_df):
    if not stakes_df.empty:
        liabilities_df = pd.DataFrame( decimal_odds_lay.values * stakes_df.values ) #makes df with stakes rows, decimal odds columns
        stakes_list = stakes_df.to_dict()
        print(stakes_list)
        liabilities_df = liabilities_df.rename(stakes_list)
        return liabilities_df
    else:
        print ("Failure to calculate liabilities")
stakes_list = stakes_df.to_dict() gives the following dict:
{'Stakes': {0: 3.7400000000000002, 1: 5.5999999999999996, 2: 7.0700000000000003}}
I want the rows of liabilities_df to be renamed 3.7400000000000002, 5.5999999999999996 and 7.0700000000000003 respectively.
If you want to rename liabilities_df's row names (index) to stakes_df's values, you need to pass a dict, not a dict of dicts:
liabilities_df = liabilities_df.rename(stakes_list['Stakes'])
example:
df = pd.DataFrame([1, 2, 3])
   0
0  1
1  2
2  3
df.rename({0: 3.7400000000000002, 1: 5.5999999999999996, 2: 7.0700000000000003})
      0
3.74  1
5.60  2
7.07  3
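A self-contained sketch of the same idea, with invented liabilities values since the real data was not posted. It shows the flat-dict rename, plus an alternative that assigns the index directly and skips the dictionary altogether.

```python
import pandas as pd

stakes_df = pd.DataFrame({"Stakes": [3.74, 5.60, 7.07]})
# Made-up liabilities; only the number of rows matters here.
liabilities_df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Option 1: rename with a flat {old label: new label} mapping.
renamed = liabilities_df.rename(index=stakes_df["Stakes"].to_dict())

# Option 2: replace the index directly (rows must be in the same order).
relabeled = liabilities_df.copy()
relabeled.index = stakes_df["Stakes"].to_numpy()

print(renamed)
```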
rename expects a flat mapping from old label to new label; here you have a nested dictionary, which is why the rows are not renamed.
It would be better if you gave us the data, but here you don't have to make a dictionary from stakes_df at all.
I have a DataFrame that consists of one column ('Vals') which is a dictionary. The DataFrame looks more or less like this:
In[215]: fff
Out[213]:
Vals
0 {u'TradeId': u'JP32767', u'TradeSourceNam...
1 {u'TradeId': u'UUJ2X16', u'TradeSourceNam...
2 {u'TradeId': u'JJ35A12', u'TradeSourceNam...
When looking at an individual row the dictionary looks like this:
In[220]: fff['Vals'][100]
Out[218]:
{u'BrdsTraderBookCode': u'dffH',
u'Measures': [{u'AssetName': u'Ie0',
u'DefinitionId': u'6dbb',
u'MeasureValues': [{u'Amount': -18.64}],
u'ReportingCurrency': u'USD',
u'ValuationId': u'669bb'}],
u'SnapshotId': 12739,
u'TradeId': u'17304M',
u'TradeLegId': u'31827',
u'TradeSourceName': u'xxxeee',
u'TradeVersion': 1}
How can I split the columns and create a new DataFrame, so that I get one column with TradeId and another one with MeasureValues?
Try this:
l = []
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for idx, row in df['Vals'].items():
    temp_df = pd.DataFrame(row['Measures'][0]['MeasureValues'])
    temp_df['TradeId'] = row['TradeId']
    l.append(temp_df)
pd.concat(l, axis=0)
Here's a way to get TradeId and MeasureValues (using twice your sample row above to illustrate the iteration):
new_df = pd.DataFrame()
for idx, data in fff.iterrows():
    # .ix has been removed from pandas; .iloc[0] picks the first (only) column, 'Vals'
    d = {'TradeId': data.iloc[0]['TradeId']}
    d.update(data.iloc[0]['Measures'][0]['MeasureValues'][0])
    new_df = pd.concat([new_df, pd.DataFrame.from_dict(d, orient='index').T])
   Amount TradeId
0  -18.64  17304M
0  -18.64  17304M
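Another possible approach, sketched under the assumption that every dictionary has the same nesting as the sample row: collect plain row dicts in a comprehension and build the DataFrame once, which avoids concatenating inside a loop. The two records below are trimmed copies of the sample, and the second TradeId and Amount are made up.

```python
import pandas as pd

# Trimmed, partly invented records with the same nesting as the question.
records = [
    {"TradeId": "17304M", "Measures": [{"MeasureValues": [{"Amount": -18.64}]}]},
    {"TradeId": "JP32767", "Measures": [{"MeasureValues": [{"Amount": 5.0}]}]},
]
fff = pd.DataFrame({"Vals": records})

# One output row per MeasureValues entry, tagged with its TradeId.
rows = [
    {"TradeId": rec["TradeId"], "Amount": mv["Amount"]}
    for rec in fff["Vals"]
    for measure in rec["Measures"]
    for mv in measure["MeasureValues"]
]
new_df = pd.DataFrame(rows)

print(new_df)
```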
I parsed an .xlsx file into a pandas dataframe and want to convert it to a list of tuples. The pandas dataframe has two columns.
The list of tuples requires the product_ids grouped with their transaction_id. I saw a post on converting a pandas dataframe to a list of tuples, but that code returned one tuple per row, pairing each transaction_id with a single product_id.
How can I get the list of tuples in the desired format on the bottom of the page?
import pandas as pd
import xlrd
#Import data
trans = pd.ExcelFile('/Users/Transactions.xlsx')
#parse xlsx file into dataframe
transdata = trans.parse('Orders')
#view dataframe
#print transdata
   transaction_id  product_id
0           20001       48165
1           20001       48162
2           20001       48166
3           20004       48815
4           20005       48165
transdata = trans.parse('Orders')
#Create tuple
trans_set = [tuple(x) for x in transdata.values]
print trans_set
[(20001, 48165), (20001, 48162), (20001, 48166), (20004, 48815), (20005, 48165)]
Desired Result:
[(20001, [48165, 48162, 48166]), (20004, 48815), (20005, 48165)]
# group by a single column name (not a list) so each key is a scalar rather than a 1-tuple
trans_set = [(key, list(grp)) for key, grp in
             transdata.groupby('transaction_id')['product_id']]
In [268]: trans_set
Out[268]: [(20001, [48165, 48162, 48166]), (20004, [48815]), (20005, [48165])]
This is a little different from your desired result (note the (20004, [48815]), for example), but I think it is more consistent. The second item in each tuple is a list of all the product_ids which are associated with the transaction_id. It might consist of only one element, but it is always a list.
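The same grouping can also be written with agg(list), shown here as a sketch on the five sample rows from the question so that it runs without the Excel file:

```python
import pandas as pd

# The sample rows from the question, typed in by hand.
transdata = pd.DataFrame(
    {
        "transaction_id": [20001, 20001, 20001, 20004, 20005],
        "product_id": [48165, 48162, 48166, 48815, 48165],
    }
)

# Collect each transaction's product_ids into a list, then pair key and list.
grouped = transdata.groupby("transaction_id")["product_id"].agg(list)
trans_set = list(grouped.items())

print(trans_set)
```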
To write trans_set to a CSV, you could use the csv module:
import csv
# Python 3: open in text mode with newline='' (on Python 2, use 'wb' instead)
with open('/tmp/data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for key, grp in trans_set:
        writer.writerow([key] + grp)
yields a file, /tmp/data.csv, with content:
20001,48165,48162,48166
20004,48815
20005,48165