My code takes a bank statement from Excel and creates a dataframe that categorises each transaction based on description:
import pandas as pd
import openpyxl
import datetime as dt
import numpy as np
dff = pd.DataFrame({'Date': ['20221003', '20221005'],
'Tran Type': ['BOOK TRANSFER CREDIT', 'ACH DEBIT'],
'Debit Amount': [0.00, -220000.00],
'Credit Amount': [182.90, 0.0],
'Description': ['BOOK TRANSFER CREDIT FROM ACCOUNT 98743987', 'USREF2548 ACH OFFSET'],
'Amount': [-220000.00, 182.90]})
import re
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)
But the following pivot does not work. Any ideas why?
pivotf = dff
pivotf = pd.pivot_table(pivotf,
index=["Date"], columns="Category",
values=['Amount'],
margins=False, margins_name="Total")
The error message is TypeError: unhashable type: 'list'
When I change columns from "Category" to anything else, it works fine.
Thanks!
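Add this line before building the pivot table (untested):
import numpy as np
dff['Category'] = [x[0] if x else np.nan for x in dff['Category']]
str.findall returns a list of matches for every row, and a list can't be hashed, so pivot_table cannot use it as a column label. This line keeps only the first match (or NaN when nothing matched), so each category is a plain string rather than a list.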
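As an alternative to post-processing the lists, here is a minimal sketch that uses str.extract instead of str.findall, so the column holds strings from the start. The reduced dataframe is an assumption made for brevity (only the columns the pivot needs are kept), and it has not been run against the full bank statement.

```python
import re
import pandas as pd

# Reduced copy of the question's data (assumption: only these columns matter).
dff = pd.DataFrame({
    'Date': ['20221003', '20221005'],
    'Description': ['BOOK TRANSFER CREDIT FROM ACCOUNT 98743987',
                    'USREF2548 ACH OFFSET'],
    'Amount': [-220000.00, 182.90],
})

# The capture group makes str.extract return the first match as a string
# (NaN when nothing matches) instead of a list of all matches.
dff['Category'] = dff['Description'].str.extract(
    r'(Ref|BCA|Fund|Transfer)', flags=re.IGNORECASE, expand=False
)

# Strings are hashable, so they can now serve as pivot column labels.
pivot = pd.pivot_table(dff, index='Date', columns='Category', values='Amount')
print(pivot)
```

Note that the extracted text keeps the case found in the description (for example TRANSFER), so you may want to normalise it with .str.upper() or similar before pivoting.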
I want to make the names of some stock symbols the actual name of a pandas dataframe.
import pandas as pd
import pandas_datareader.data as pdr
choices = ['ROK', 'HWM', 'PYPL', 'V', 'KIM', 'FISV', 'REG', 'EMN', 'GS', 'TYL']
for c in choices:
    pdr.DataReader(c, data_source='yahoo', start=datetime(2000,1,1),
                   end=datetime(2020,1,1)).to_csv(f'Data/{c}.csv')
    f'{c}'['Price'] = pd.read_csv(f'Data/{c}.csv', index_col='Date')['Adj Close']
I'm getting this error:
TypeError: 'str' object does not support item assignment
Is there a way to go about doing this? Perhaps using the stock symbol as the name of the dataframe is not the best convention.
Thank you
You can store the dataframes in a dictionary keyed by stock symbol.
from datetime import datetime
import pandas as pd
import pandas_datareader.data as pdr
choices = ['ROK', 'HWM', 'PYPL', 'V', 'KIM', 'FISV', 'REG', 'EMN', 'GS', 'TYL']
dataframes = {}
for c in choices:
    # Download the price history and cache it as a CSV file
    pdr.DataReader(c, data_source='yahoo', start=datetime(2000,1,1),
                   end=datetime(2020,1,1)).to_csv(f'Data/{c}.csv')
    # Store the adjusted close under the symbol instead of a variable name
    dataframes[c] = pd.read_csv(f'Data/{c}.csv', index_col='Date')['Adj Close']
So, you will get a structure like the one below:
>>> print(dataframes)
{'ROK': <your_ROK_dataframe_here>,
'HWM': <your_HWM_dataframe_here>,
...
}
Then, you can access a specific dataframe by using dataframes['XXXX'] where XXXX is one of the choices.
You shouldn't create variable names from strings, as it can get quite messy down the line. If you want to keep your convention, I'd advise storing your dataframes in a dictionary with the stock symbols as keys:
from datetime import datetime
import pandas as pd
import pandas_datareader.data as pdr
choices = ['ROK', 'HWM', 'PYPL', 'V', 'KIM', 'FISV', 'REG', 'EMN', 'GS', 'TYL']
choices_dict = {}
for c in choices:
    pdr.DataReader(c, data_source='yahoo', start=datetime(2000,1,1),
                   end=datetime(2020,1,1)).to_csv(f'Data/{c}.csv')
    csv_pd = pd.read_csv(f'Data/{c}.csv', index_col='Date')['Adj Close']
    # to_frame turns the 'Adj Close' series into a one-column dataframe named 'Price'
    choices_dict[c] = csv_pd.to_frame(name='Price')
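To illustrate what the dictionary approach buys you, here is a small sketch with made-up prices in place of the downloaded data (the numbers, the two symbols used, and the names frames and prices are hypothetical). It shows looking up one frame by symbol and combining all of them into a single table.

```python
import pandas as pd

# Hypothetical prices standing in for the downloaded 'Adj Close' data.
dates = pd.date_range('2020-01-01', periods=3, name='Date')
frames = {
    'ROK': pd.DataFrame({'Price': [100.0, 101.5, 99.8]}, index=dates),
    'HWM': pd.DataFrame({'Price': [20.0, 20.4, 21.1]}, index=dates),
}

# Look up a single dataframe by its symbol.
rok = frames['ROK']
print(rok['Price'].iloc[0])

# Combine every 'Price' column into one dataframe, one column per symbol.
prices = pd.DataFrame({symbol: frame['Price'] for symbol, frame in frames.items()})
print(prices)
```

Because the symbols are ordinary dictionary keys, looping over all stocks is just a loop over frames.items(), with no need to build variable names dynamically.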
Apologies, I didn't even know how to title/describe the issue I am having, so bear with me. I have the following code:
import pandas as pd
data = {'Invoice Number':[1279581, 1279581,1229422, 1229422, 1229422],
'Project Key':[263736, 263736, 259661, 259661, 259661],
'Project Type': ['Visibility', 'Culture', 'Spend', 'Visibility', 'Culture']}
df= pd.DataFrame(data)
How do I get the output to basically group the Invoice Numbers so that there is only 1 row per Invoice Number and combine the multiple Project Types (per that 1 Invoice) into 1 row?
Code for the desired output is below.
Thanks, much appreciated.
import pandas as pd
data = {'Invoice Number':[1279581,1229422],
'Project Key':[263736, 259661],
'Project Type': ['Visibility_Culture', 'Spend_Visibility_Culture']
}
output = pd.DataFrame(data)
output
>>> (df
.groupby(['Invoice Number', 'Project Key'])['Project Type']
.apply(lambda x: '_'.join(x))
.reset_index()
)
   Invoice Number  Project Key              Project Type
0         1229422       259661  Spend_Visibility_Culture
1         1279581       263736        Visibility_Culture
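The result above is sorted by the group keys, whereas the desired output in the question lists invoice 1279581 first. If the original order matters, a possible variation (a sketch, not the only way) is to pass sort=False to groupby so groups appear in order of first occurrence:

```python
import pandas as pd

df = pd.DataFrame({
    'Invoice Number': [1279581, 1279581, 1229422, 1229422, 1229422],
    'Project Key': [263736, 263736, 259661, 259661, 259661],
    'Project Type': ['Visibility', 'Culture', 'Spend', 'Visibility', 'Culture'],
})

# sort=False keeps groups in the order they first appear in df;
# '_'.join is applied to the Project Type values of each group.
output = (
    df.groupby(['Invoice Number', 'Project Key'], sort=False)['Project Type']
      .agg('_'.join)
      .reset_index()
)
print(output)
```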
Why am I getting a TypeError: unhashable type: 'numpy.ndarray' error? Also, I don't recall importing numpy into my code, so what is numpy.ndarray doing there? The error is raised by the last line of the code.
import pandas as pd
import matplotlib.pyplot as plt
entries_csv = "C:\\Users\\Asus\\Desktop\\Entries.csv"
listofaccounts_csv = "C:\\Users\\Asus\\Desktop\\List of Accounts.csv"
data_entries = pd.read_csv(entries_csv)
data_listofaccounts = pd.read_csv(listofaccounts_csv)
i = 0
summary_name = [0]*len(data_listofaccounts)
summary = [0]*1*len(data_listofaccounts)
for account_name in data_listofaccounts['Account Name']:
    summary_name[i] = account_name
    for debit_account in data_entries['DEBIT ACCOUNT']:
        if account_name == debit_account:
            summary[i] += data_entries['DEBIT AMOUNT']
    i += 1
plt.bar(list(summary_name), list(summary))
These are the data:
1.) Entries:
2.) List of Accounts:
Basically, for each item in the list of accounts, I want a summary that sums all the debit amounts for that account.
I think in this case you really want to utilize the pd.merge functionality between your two dataframes. See here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.merge.html . Once you have joined the two tables you want to groupby according to the Account Name and perform your aggregations. So for example:
list_of_accounts_df = pd.DataFrame({
'Account Name': ['ACCOUNT PAYABLE', 'OUTSIDE SERVICE'],
'Type': ['CURRENT LIABILITY', 'EXPENSE']
})
entries_df = pd.DataFrame({
'DEBIT ACCOUNT':['OUTSIDE SERVICE', 'OUTSIDE SERVICE'],
'DEBIT AMOUNT': [46375.8, 42091.42] ,
'CREDIT ACCOUNT':['CASH IN BANK', 'CASH ON HAND'],
'CREDIT AMOUNT':[46375.8, 42091.42]
})
x = (
    pd.merge(list_of_accounts_df, entries_df,
             left_on='Account Name', right_on='DEBIT ACCOUNT', how='left')
    .fillna(0)
    .groupby('Account Name')['DEBIT AMOUNT']
    .sum()
)
The result x is a Series whose index is the Account Name and whose values are the sum of all the debit amounts for that account. So in this case:
Account Name
ACCOUNT PAYABLE 0.00
OUTSIDE SERVICE 88467.22
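On the error itself: in the question's loop, summary[i] += data_entries['DEBIT AMOUNT'] adds the entire column rather than one row's amount, so summary ends up holding whole pandas objects instead of numbers. Since pandas columns are backed by NumPy arrays, that is a likely source of the numpy.ndarray in the message even though numpy was never imported directly. As a sketch of a merge-free alternative (a different technique from the one above, with shortened hypothetical variable names), you can group the entries on their own and then reindex by the account list so accounts without debits show up as 0:

```python
import pandas as pd

accounts = pd.DataFrame({'Account Name': ['ACCOUNT PAYABLE', 'OUTSIDE SERVICE']})
entries = pd.DataFrame({
    'DEBIT ACCOUNT': ['OUTSIDE SERVICE', 'OUTSIDE SERVICE'],
    'DEBIT AMOUNT': [46375.8, 42091.42],
})

totals = (
    entries.groupby('DEBIT ACCOUNT')['DEBIT AMOUNT'].sum()
    # Put every listed account in the index; accounts with no debits get 0.
    .reindex(accounts['Account Name'].tolist(), fill_value=0.0)
    .rename_axis('Account Name')
)
print(totals)
```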
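And then regarding your question of how to plot it: older versions of matplotlib do not accept string values directly for the bar positions, so the safe approach is to plot against numeric positions and use the account names as tick labels.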
Using this example: https://pythonspot.com/matplotlib-bar-chart/, in our case you can just do:
# x is the Series of summed debits per account computed above
objects = x.index.values
y_pos = range(len(objects))
vals = x.values
plt.bar(y_pos, vals, align='center')
plt.xticks(y_pos, objects)
plt.ylabel('Sum of Debits')
plt.title('Total Debits Per Account')
plt.show()
Which gives this in our simple example:
I have a list:
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
I built this list using itertools groupby.
I want to form a new list by adding up the quant, amt and current fields for entries with the same name and inv_name, something like: [{'name':'xyz','inv_name':'asd','quant':500,'amt':22000,'current':33000}]
Any suggestions on how to achieve this?
If you are happy using a 3rd party library, pandas accepts a list of dictionaries:
import pandas as pd
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},
{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
df = pd.DataFrame(a)
res = df.groupby(['name', 'inv_name'], as_index=False).sum().to_dict(orient='records')
# [{'amt': 22000,
# 'current': 33000,
# 'inv_name': 'asd',
# 'name': 'xyz',
# 'quant': 500}]
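If you would rather stay in the standard library, the same aggregation can be sketched with itertools.groupby, which the question already uses. The names key, sum_fields and result are illustrative. Note that groupby only merges adjacent items, so the list is sorted by the grouping key first.

```python
from itertools import groupby
from operator import itemgetter

a = [{'name': 'xyz', 'inv_name': 'asd', 'quant': 300, 'amt': 20000, 'current': 30000},
     {'name': 'xyz', 'inv_name': 'asd', 'quant': 200, 'amt': 2000, 'current': 3000}]

key = itemgetter('name', 'inv_name')       # fields that identify a group
sum_fields = ('quant', 'amt', 'current')   # fields to add up within a group

result = []
# Sort first: groupby only groups consecutive items that share a key.
for (name, inv_name), group in groupby(sorted(a, key=key), key=key):
    rows = list(group)
    merged = {'name': name, 'inv_name': inv_name}
    for field in sum_fields:
        merged[field] = sum(row[field] for row in rows)
    result.append(merged)

print(result)
```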
I am trying to get the mean value for a list of percentages from an Excel file which has data. My current code is as follows:
import numpy as pd
data = pd.DataFrame =({'Percentages': [.20, .10, .05], 'Nationality':['American', 'Mexican', 'Russian'],
'Gender': ['Male', 'Female'], 'Question': ['They have good looks']})
pref = data[data.Nationality == 'American']
prefPref = pref.pivot_table(data.Percentage.mean(), index=['Question'], column='Gender')
The error comes from the line where I try to get the .mean() of my ['Percentage'] list. So, how can I get the mean from the list of Percentages? Do I need to create a variable for the mean value, and if so, how do I implement that in the code?
["Percentage"] is a list containing the single string item "Percentage". It isn't possible to calculate a mean from a list of text.
In addition, generic Python lists have no .mean() method; have a look at numpy for calculating means and other mathematical operations.
For example:
import numpy
numpy.array([4,2,6,5]).mean()
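For completeness, a plain list of numbers can also be averaged without converting it to an array first. This short sketch compares the standard library's statistics.mean with numpy.mean on the same values (the variable names are illustrative):

```python
import statistics
import numpy as np

values = [4, 2, 6, 5]

mean_stdlib = statistics.mean(values)  # standard library, works on plain lists
mean_numpy = np.mean(values)           # numpy converts the list internally

print(mean_stdlib, mean_numpy)
```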
Here is a reworked version of your pd.pivot_table. See also How to pivot a dataframe.
import pandas as pd, numpy as np
data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
'Nationality': ['American', 'American', 'Russian'],
'Gender': ['Male', 'Female', 'Male'],
'Question': ['Q1', 'Q2', 'Q3']})
pref = data[data['Nationality'] == 'American']
prefPref = pref.pivot_table(values='Percentages', index='Question',
                            columns='Gender', aggfunc='mean')
# Gender    Female  Male
# Question
# Q1           NaN   0.2
# Q2           0.1   NaN
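If all you need is the mean itself rather than a pivoted table, a simpler sketch is to call .mean() on the column, either for the whole dataframe or per group. The data is the same small example as above; overall_mean and mean_by_gender are illustrative names.

```python
import pandas as pd

data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
                     'Nationality': ['American', 'American', 'Russian'],
                     'Gender': ['Male', 'Female', 'Male'],
                     'Question': ['Q1', 'Q2', 'Q3']})

# Mean of the whole column, stored in a variable for later use.
overall_mean = data['Percentages'].mean()

# Mean of the column within each Gender group.
mean_by_gender = data.groupby('Gender')['Percentages'].mean()

print(overall_mean)
print(mean_by_gender)
```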