Say that I'm given a dataframe that summarizes different companies:
summary=pandas.DataFrame(columns=['Company Name', 'Formation Date', 'Revenue', 'Profit', 'Loss'])
And then say each company in that dataframe has its own corresponding dataframe, named after the company, giving a more in-depth picture of the company's history and stats. Something like:
exampleco=pandas.DataFrame(columns=['Date', 'Daily Profit', 'Daily Loss', 'Daily Revenue'])
I have a script that processes each row of the summary dataframe, but I would like to grab the name from row['Company Name'] and use it to access the company's own dataframe.
In other words I'd love it if there was something that worked like this:
>>> company=row['Company Name']
>>> pandas.get_dataframe_from_variable(company)
Empty DataFrame
Columns: ['Date', 'Daily Profit', 'Daily Loss', 'Daily Revenue']
Index: []
[0 rows x 4 columns]
Any ideas of how I might get this to work would be much appreciated.
Thanks in advance!
You can use a dictionary to contain your DataFrames and use strings as the keys.
companies = {'company1': pandas.DataFrame(columns=['Date', 'Daily Profit',
                                                   'Daily Loss', 'Daily Revenue']),
             'company2': pandas.DataFrame(columns=['Date', 'Daily Profit',
                                                   'Daily Loss', 'Daily Revenue'])}
company=row['Company Name'] # Get your company name as a string from your summary.
company_details = companies[company] # Returns a DataFrame.
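Putting this together with the summary loop from the question, a minimal sketch might look like the following (the company names, the summary contents, and the `detail_cols` list here are made up for illustration):

```python
import pandas as pd

# Hypothetical per-company DataFrames, keyed by the same strings that
# appear in the summary's 'Company Name' column.
detail_cols = ['Date', 'Daily Profit', 'Daily Loss', 'Daily Revenue']
companies = {
    'ExampleCo': pd.DataFrame(columns=detail_cols),
    'OtherCo': pd.DataFrame(columns=detail_cols),
}

# A toy summary frame standing in for the real one.
summary = pd.DataFrame({'Company Name': ['ExampleCo', 'OtherCo'],
                        'Revenue': [100, 200]})

for _, row in summary.iterrows():
    # Look up the detail DataFrame by the name string from the summary row.
    company_details = companies[row['Company Name']]
    # ...process company_details here...
```

The dictionary lookup replaces the hypothetical `get_dataframe_from_variable` call: the key is just the string from the summary row.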
Apologies, I didn't even know how to title/describe the issue I am having, so bear with me. I have the following code:
import pandas as pd
data = {'Invoice Number':[1279581, 1279581,1229422, 1229422, 1229422],
'Project Key':[263736, 263736, 259661, 259661, 259661],
'Project Type': ['Visibility', 'Culture', 'Spend', 'Visibility', 'Culture']}
df= pd.DataFrame(data)
How do I get the output to basically group the Invoice Numbers so that there is only 1 row per Invoice Number and combine the multiple Project Types (per that 1 Invoice) into 1 row?
Code for the desired output is below.
Thanks, much appreciated.
import pandas as pd
data = {'Invoice Number':[1279581,1229422],
'Project Key':[263736, 259661],
'Project Type': ['Visibility_Culture', 'Spend_Visibility_Culture']
}
output = pd.DataFrame(data)
output
>>> (df
...     .groupby(['Invoice Number', 'Project Key'])['Project Type']
...     .apply(lambda x: '_'.join(x))
...     .reset_index()
... )
Invoice Number Project Key Project Type
0 1229422 259661 Spend_Visibility_Culture
1 1279581 263736 Visibility_Culture
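Equivalently, the lambda can be replaced by passing the join method straight to `.agg`. A small self-contained sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Invoice Number': [1279581, 1279581, 1229422, 1229422, 1229422],
    'Project Key': [263736, 263736, 259661, 259661, 259661],
    'Project Type': ['Visibility', 'Culture', 'Spend', 'Visibility', 'Culture'],
})

output = (df
          .groupby(['Invoice Number', 'Project Key'])['Project Type']
          .agg('_'.join)   # same result as .apply(lambda x: '_'.join(x))
          .reset_index())
```

Note that groupby sorts the keys by default, which is why invoice 1229422 comes out first in the answer's output.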
I have a dataframe that looks something like this
import pandas as pd
sectors = [['Industrials', 'Health Care', 'Information Technology', 'Industrials'], ['Health Care', 'Health Care', 'Information Technology'], ['Industrials', 'Information Technology', 'Health Care', 'Information Technology', 'Information Technology'], ['Information Technology', 'Health Care']]
some_date = ['2015-12-01', '2016-01-05', '2016-02-01', '2016-03-01']
somelist = []
for i in range(len(some_date)):
    somelist.append((some_date[i], sectors[i]))
df = pd.DataFrame(somelist, columns=['date', 'sectors'])
I would like to create a plt.stackplot where the X-axis is the date and the Y-axis is number of times any sector is mentioned.
The problem is that these are strings, not integers. One approach would be to iterate through each row of the DataFrame and count how many times each sector is mentioned for each date, but I don't always know the names of the sectors in advance, so I'm wondering if there's a more efficient way to solve this?
I tried to plot a plt.pie by using df['sectors'].sum() to check how many times throughout the complete date-range each sector is mentioned, but for this I would also somehow need to convert the strings.
Not sure how efficient this is, but I fixed the data as shown here:
plot_sectors = list(set(df['sectors'].sum()))
plot_sectors = {key: [0] * df.shape[0] for key in plot_sectors}
for i in range(df.shape[0]):
    for sector in df.iloc[i]['sectors']:
        plot_sectors[sector][i] += 1
For the stacked plot, I used (with numpy and matplotlib imported):
import numpy as np
import matplotlib.pyplot as plt

y = list(plot_sectors.values())
x = np.arange(df.shape[0])
plt.stackplot(x, y, labels=plot_sectors.keys())
And for the pie plot I used:
plt.pie([sum(values) for key, values in plot_sectors.items()], autopct='%1.1f%%',
labels=plot_sectors.keys())
plt.axis('equal')
plt.show()
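A more pandas-native way to build those per-date counts, without knowing the sector names in advance, is to explode the lists and cross-tabulate. This is a sketch that rebuilds the question's df and computes the same counts; the resulting table can then feed `plt.stackplot` (one row per date) or, via `counts.sum()`, a pie chart:

```python
import pandas as pd

sectors = [['Industrials', 'Health Care', 'Information Technology', 'Industrials'],
           ['Health Care', 'Health Care', 'Information Technology'],
           ['Industrials', 'Information Technology', 'Health Care',
            'Information Technology', 'Information Technology'],
           ['Information Technology', 'Health Care']]
some_date = ['2015-12-01', '2016-01-05', '2016-02-01', '2016-03-01']
df = pd.DataFrame({'date': some_date, 'sectors': sectors})

# explode() gives each sector its own row; crosstab counts per (date, sector).
exploded = df.explode('sectors')
counts = pd.crosstab(exploded['date'], exploded['sectors'])
```

`counts` has one row per date and one column per sector, with zeros filled in automatically, so no manual dictionary bookkeeping is needed.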
I have a dataframe that is dynamically created.
I create my first set of rows as:
df['tourist_spots'] = pd.Series(<A list of tourist spots in a city>)
To this df I add:
df['city'] = <City Name>
So far so good. A bunch of rows are created with the same city name for multiple tourist spots.
I want to add a new city. So I do:
df['tourist_spots'].append(pd.Series(<new data>))
Now, when I append a new city with:
df['city'].append('new city')
the previously added city data is gone. It is as if the rows are replaced each time instead of appended.
Here's an example of what I want:
Step 1:
df['tourist_spot'] = pd.Series('Golden State Bridge' + a bunch of other spots)
For all the rows created by the above data I want:
df['city'] = 'San Francisco'
Step 2:
df['tourist_spot'].append(pd.Series('Times Square' + a bunch of other spots))
For all the rows created by the above data, I want:
df['city'] = 'New York'
How can I achieve this?
Use a dictionary to add rows to your DataFrame; it is a faster method.
Here is an example.
STEP 1
Create a dictionary:
dict_df = [{'tourist_spots': 'Jones LLC', 'City': 'Boston'},
           {'tourist_spots': 'Alpha Co', 'City': 'Boston'},
           {'tourist_spots': 'Blue Inc', 'City': 'Singapore'}]
STEP 2
Convert the dictionary to a dataframe:
df = pd.DataFrame(dict_df)
STEP 3
Add new entries to the dataframe in dictionary format:
df = df.append({'tourist_spots': 'New_Blue', 'City': 'Singapore'}, ignore_index=True)
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
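One caveat: `DataFrame.append` was deprecated and then removed in pandas 2.0, so on recent versions the same row-addition is done with `pd.concat`. A sketch of the equivalent, using the answer's sample data:

```python
import pandas as pd

dict_df = [{'tourist_spots': 'Jones LLC', 'City': 'Boston'},
           {'tourist_spots': 'Alpha Co', 'City': 'Boston'},
           {'tourist_spots': 'Blue Inc', 'City': 'Singapore'}]
df = pd.DataFrame(dict_df)

# pd.concat replaces the removed df.append for adding rows.
new_row = pd.DataFrame([{'tourist_spots': 'New_Blue', 'City': 'Singapore'}])
df = pd.concat([df, new_row], ignore_index=True)
```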
Why am I getting a TypeError: unhashable type: 'numpy.ndarray' error? Also, I don't recall importing numpy in my code, so what is numpy.ndarray doing here? The error is on the last line of the code.
import pandas as pd
import matplotlib.pyplot as plt
entries_csv = "C:\\Users\\Asus\\Desktop\\Entries.csv"
listofaccounts_csv = "C:\\Users\\Asus\\Desktop\\List of Accounts.csv"
data_entries = pd.read_csv(entries_csv)
data_listofaccounts = pd.read_csv(listofaccounts_csv)
i = 0
summary_name = [0]*len(data_listofaccounts)
summary = [0]*1*len(data_listofaccounts)
for account_name in data_listofaccounts['Account Name']:
    summary_name[i] = account_name
    for debit_account in data_entries['DEBIT ACCOUNT']:
        if account_name == debit_account:
            summary[i] += data_entries['DEBIT AMOUNT']
    i += 1
plt.bar(list(summary_name), list(summary))
These are the data:
1.) Entries: (table attached as an image)
2.) List of Accounts: (table attached as an image)
Basically for each item in list of accounts, I want to make a summary where all the debit amounts will sum for each type of account
I think in this case you really want to use the pd.merge functionality between your two dataframes. See here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.merge.html. Once you have joined the two tables, group by the Account Name and perform your aggregations. So for example:
list_of_accounts_df = pd.DataFrame({
'Account Name': ['ACCOUNT PAYABLE', 'OUTSIDE SERVICE'],
'Type': ['CURRENT LIABILITY', 'EXPENSE']
})
entries_df = pd.DataFrame({
'DEBIT ACCOUNT':['OUTSIDE SERVICE', 'OUTSIDE SERVICE'],
'DEBIT AMOUNT': [46375.8, 42091.42] ,
'CREDIT ACCOUNT':['CASH IN BANK', 'CASH ON HAND'],
'CREDIT AMOUNT':[46375.8, 42091.42]
})
x = pd.merge(list_of_accounts_df, entries_df, left_on='Account Name', right_on='DEBIT ACCOUNT', how='left').fillna(0).groupby('Account Name')['DEBIT AMOUNT'].sum()
The output becomes a Series where each index entry is the Account Name and the value is the sum of all the debit amounts for that account. So in this case:
Account Name
ACCOUNT PAYABLE 0.00
OUTSIDE SERVICE 88467.22
And then regarding your question of how to plot it: for bar plots, older versions of matplotlib do not directly accept string values for the x-axis, so the usual pattern is to plot against numeric positions and label the ticks afterwards.
Using this example: https://pythonspot.com/matplotlib-bar-chart/, in our case you can just do:
objects = x.index.values
y_pos = range(len(objects))
vals = x.values
plt.bar(y_pos, vals, align='center')
plt.xticks(y_pos, objects)
plt.ylabel('Sum of Debits')
plt.title('Total Debits Per Account')
plt.show()
Which gives this in our simple example:
I have a csv merge that has many columns. I am having trouble formatting the price columns. I need them to follow this format: $1,000.00. Is there a function I can use to achieve this for just two columns (Sales Price and Payment Amount)? Here is my code so far:
df3 = pd.merge(df1, df2, how='left', on=['Org ID', 'Org Name'])
cols = ['Org Name', 'Org Type', 'Chapter', 'Join Date', 'Effective Date', 'Expire Date',
'Transaction Date', 'Product Name', 'Sales Price',
'Invoice Code', 'Payment Amount', 'Add Date']
df3 = df3[cols]
df3 = df3.fillna("-")
out_csv = root_out + "report-merged.csv"
df3.to_csv(out_csv, index=False)
A solution that I thought was going to work but I get an error (ValueError: Unknown format code 'f' for object of type 'str')
df3['Sales Price'] = df3['Sales Price'].map('${:,.2f}'.format)
Based on your error ("Unknown format code 'f' for object of type 'str'"), the columns that you are trying to format are being treated as strings. So using .astype(float) in the code below addresses this.
There is not a great way to set this formatting during (within) your to_csv call. However, in an intermediate line you could use:
cols = ['Sales Price', 'Payment Amount']
df3.loc[:, cols] = df3[cols].astype(float).applymap('${:,.2f}'.format)
Then call to_csv.
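One further caveat: the earlier `df3.fillna("-")` puts the string `'-'` into the price columns, and that string cannot be cast back with `.astype(float)`. One way around this is to format the price columns first, while they are still numeric, and fill the remaining NaNs afterwards. A sketch with made-up column data standing in for the merged frame:

```python
import pandas as pd

# Toy stand-in for df3: one missing price to show the interaction with fillna.
df3 = pd.DataFrame({'Org Name': ['A', 'B'],
                    'Sales Price': [1000.0, None],
                    'Payment Amount': [2500.5, 99.0]})

cols = ['Sales Price', 'Payment Amount']
# Format while the columns are still floats; skip NaNs so fillna can
# handle them afterwards.
df3[cols] = df3[cols].apply(
    lambda s: s.map(lambda v: '${:,.2f}'.format(v) if pd.notna(v) else v))
df3 = df3.fillna('-')
```

With this ordering, the formatted values come out as '$1,000.00' style strings and any missing prices still end up as '-'.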