It's my first time working with pandas, so I am trying to wrap my head around all of its functionality.
Essentially, I want to download my bank statements in CSV and search for a keyword (e.g. steam) and compute the money I spent.
I was able to use pandas to locate the lines that contain my keyword, but I do not know how to iterate through them and add the cost of each purchase to a running total.
If you look at the image I uploaded, I am able to find the lines containing my keyword in the dataframe, but for each line found I want to take the content of col1 and sum it all together.
Attempt At Code
# importing pandas module
import pandas as pd
keyword = input("Enter the keyword you wish to search in the statement: ")
# reading the csv file (it has no header row)
df = pd.read_csv('accountactivity.csv',header=None)
dff=df.loc[df[1].str.contains(keyword,case=False)]
value=df.values[68][2] #Fetches value of a specific cell in the CSV/dataframe created
print(dff)
print(value)
EDIT:
I essentially was almost able to complete the code I wanted, using only the CSV reader, but I can't get that code to find substrings. It only works if I enter the exact same string, meaning if I enter netflix it doesn't work, I would need to write it exactly as it appears on the statement like NETFLIX.COM _V. Here is another screenshot of that working code. I essentially want to mimic that with the capabilities of just finding substrings.
Working Code using CSV reader
import csv
data=[]
with open("accountactivity.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
keyword = input("Enter the keyword you wish to search in the statement: ")
col = [x[1] for x in data]
Sum = 0
if keyword in col:
    for x in range(0, len(data)):
        if keyword == data[x][1]:
            PartialSum=float(data[x][2])
            Sum=Sum+PartialSum
            print(data[x][1])
    print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
else:
    print("Keyword returned no results.")
The format of the CSV is the following: CSV Format
column 0 Date of transaction
column 1 Name of transaction
column 2 Money spent from account
column 3 Money received to account
The CSV file downloaded directly from my bank has no headers. So I refer to columns using col[0] etc...
Thanks for your help, I will continue meanwhile to look at how to potentially do this.
dff[dff.columns[col_index]].sum()
where col_index is the index of the column you want to sum together.
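For the csv-reader version in the question's edit, substring matching could be sketched roughly as below. This is only an illustration, not the asker's code: the helper name `sum_matching` and the sample rows are made up, and the column positions (name in column 1, money spent in column 2) follow the CSV layout described in the question. The key change is testing `keyword in cell` on lowercased text instead of comparing with `==`.

```python
import csv
import io


def sum_matching(lines, keyword, name_col=1, amount_col=2):
    """Sum amount_col over rows whose name_col contains keyword (case-insensitive)."""
    needle = keyword.lower()
    total = 0.0
    matched = []
    for row in csv.reader(lines):
        # Skip rows that are too short to hold both columns.
        if len(row) <= max(name_col, amount_col):
            continue
        # Substring test instead of exact equality; ignore empty amounts.
        if needle in row[name_col].lower() and row[amount_col].strip():
            total += float(row[amount_col])
            matched.append(row[name_col])
    return total, matched


# Hypothetical statement lines standing in for accountactivity.csv.
sample = io.StringIO(
    "01/02/2020,NETFLIX.COM _V,13.99,\n"
    "01/03/2020,PAYROLL DEPOSIT,,1500.00\n"
    "01/05/2020,STEAM GAMES,25.00,\n"
    "02/02/2020,NETFLIX.COM _V,13.99,\n"
)
total, matched = sum_matching(sample, "netflix")
print("The sum for expenses at netflix is of: ", round(total, 2), "$", sep="")
```

With a real file, the same function could be called as `sum_matching(open("accountactivity.csv", newline=""), keyword)`.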
Thanks everyone for your help. I ended up understanding more about how dataframes work in pandas, and I used the command dff[dff.columns[col_index]].sum() (which was suggested to me by Jonny Kong) with the column of interest (in my case column 2, which contains my expenses). It computes the sum of my expenses for the searched keyword, which is what I need!
#Importing pandas module
import pandas as pd
#Keyword searched through bank statement
keyword = input("Enter the keyword you wish to search in the statement: ")
#Reading the bank statement CSV file
df = pd.read_csv('accountactivity.csv',header=None)
#Creating dataframe from bank statement with lines that match search keyword
dff=df.loc[df[1].str.contains(keyword,case=False)]
#Sum the column which contains total money spent on the keyword searched
Sum=dff[dff.columns[2]].sum()
#Prints the created dataframe
print("\n",dff,"\n")
#Prints the sum of expenses for the keyword searched
print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
Working Code!
Again, thanks everyone for helping and supporting me through my first post on SO!
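One caveat that may be worth knowing: `str.contains` treats the pattern as a regular expression by default, so a keyword such as `netflix.com` or one containing `(` could match more than intended or raise an error, and missing values in the name column produce NaN in the mask. A hedged variant of the filter, using made-up sample rows in place of the real statement, might look like this:

```python
import io

import pandas as pd

# Hypothetical rows standing in for accountactivity.csv (no header row).
sample = io.StringIO(
    "01/02/2020,NETFLIX.COM _V,13.99,\n"
    "01/03/2020,PAYROLL DEPOSIT,,1500.00\n"
    "01/05/2020,STEAM GAMES,25.00,\n"
    "02/02/2020,NETFLIX.COM _V,13.99,\n"
)
df = pd.read_csv(sample, header=None)

keyword = "netflix.com"
# regex=False matches the keyword literally; na=False treats missing names as non-matches.
mask = df[1].str.contains(keyword, case=False, regex=False, na=False)
dff = df.loc[mask]

# Column 2 holds the money spent; sum() skips missing values.
spent = dff[2].sum()
print("The sum for expenses at ", keyword, " is of: ", round(spent, 2), "$", sep="")
```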
Related
I have a CSV file, which has several columns and several rows. Please, see the picture above. In the picture is shown just the two first baskets, but in the original CSV -file I have hundreds of them.
[1]: https://i.stack.imgur.com/R2ZTo.png
I would like to calculate average for every Fruit in every Basket using Python. Here is my code but it doesn't seem to work as it should be. Better ideas? I have tried to fix this also importing and using numpy but I didn't succeed with it.
I would appreciate any help or suggestions! I'm totally new in this.
import csv
from operator import itemgetter
fileLineList = []
averageFruitsDict = {} # Creating an empty dictionary here.
with open('Fruits.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)
for column in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column
    average = total / 3
    averageFruitsDict[row[0]] = [highest, lowest, round(average)]
averageFruitsList = []
for key, value in averageFruitsDict.items():
    averageFruitsList.append([key, value[2]])
print('\nFruits in Baskets\n')
print(averageFruitsList)
--- So I'm now trying this code:
import pandas as pd
fruits = pd.read_csv('fruits.csv', sep=';')
print(list(fruits.columns))
fruits['Unnamed: 0'].fillna(method='ffill', inplace = True)
fruits.groupby('Unnamed: 0').mean()
fruits.groupby('Bananas').mean()
fruits.groupby('Apples').mean()
fruits.groupby('Oranges').mean()
fruits.to_csv('results.csv', index=False)
It creates a new CSV file for me and it looks correct. I don't get any errors, but I can't make it calculate the mean of every fruit for every basket. Thankful for all help!
So using the image you posted and replicating/creating an identical test csv called fruit - I was able to create this quick solution using pandas.
import pandas as pd
fruit = pd.read_csv('fruit.csv')
With the unnamed column containing the basket numbers with NaNs in between, we fill each gap with the preceding value. By doing so we are then able to group by the basket number (using the 'Unnamed: 0' column) and apply the mean to all other columns.
fruit['Unnamed: 0'] = fruit['Unnamed: 0'].ffill()
fruit.groupby('Unnamed: 0').mean()
This gets you your desired output of a fruit average for each basket (please note I made up values for basket 3)
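As a self-contained illustration of the forward-fill and group-by steps, here is a small sketch. The basket labels and fruit counts are invented for the example and are not the values from the posted image; the only assumption carried over is that the first column has no header (so pandas names it 'Unnamed: 0') and is blank on every row except the first of each basket.

```python
import io

import pandas as pd

# Invented data: the basket label appears only on the first row of each basket.
sample = io.StringIO(
    ",Apples,Oranges,Bananas\n"
    "Basket 1,1,2,3\n"
    ",3,4,5\n"
    "Basket 2,10,20,30\n"
    ",20,40,60\n"
)
fruit = pd.read_csv(sample)

# Fill each blank label with the label above it, then assign the result back.
fruit["Unnamed: 0"] = fruit["Unnamed: 0"].ffill()

# One row per basket, one mean per fruit column.
means = fruit.groupby("Unnamed: 0").mean()
print(means)

# The grouped result, not the original frame, is what would be saved:
# means.to_csv("results.csv")
```

Note that `groupby(...).mean()` returns a new DataFrame; writing the original `fruit` frame to CSV, as in the question, would save the unaveraged rows.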
I have a Pandas CSV file and I would like to know how to create a Python user search function to find a row. Below is a sample of the CSV -
I would like to create a function whereby it ask the user for the ICAO code, which is one of the columns, then it returns the whole row of information. For example if someone typed EHAM it would return all the information in that row (Position, ICAO, Airport, Country, Total Movements, Position Change in the last 24hrs)
As a bonus but I am not sure it is possible, I would also love to show the 2 rows above and 2 rows below the requested search when displaying the results. So for example it would show not only EHAM, but also EDDF, EGKK (2 rows above) and also KBOS and KATL (2 rows below)
Try the following code for the exact match. Suppose your CSV file is stored in a data frame called df
icao = input("Enter the ICAO code: ")
print(df[df["ICAO"] == icao])
Moreover, if you believe that the input value has at least one instance in the data frame, and indices are in an incremental order, you can try the following one:
icao = input("Enter the ICAO code: ")
requested = df[df["ICAO"] == icao]
print(df.iloc[requested.index[0] - 2 : requested.index[0] + 3, :])
Furthermore, you have to decide what you want to do when the match is in the first two or last two rows, since there are fewer than two neighbours on one side.
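One possible way to handle those edge cases is to work with row positions and clip the start of the window at zero. The sketch below is an assumption-laden illustration: `rows_around` is a hypothetical helper name, and the small table (including the EGLL row and the Position values) is invented rather than taken from the question's file.

```python
import pandas as pd


def rows_around(df, icao, context=2):
    """Return the first row whose ICAO matches, plus `context` rows on each side."""
    # Positions (0-based) of matching rows, independent of the index labels.
    positions = (df["ICAO"] == icao).to_numpy().nonzero()[0]
    if len(positions) == 0:
        return df.iloc[0:0]  # empty frame with the same columns
    pos = int(positions[0])
    start = max(pos - context, 0)  # avoid a negative start near the top
    stop = pos + context + 1       # iloc slices past the end are clipped
    return df.iloc[start:stop]


# Invented sample data for illustration only.
airports = pd.DataFrame(
    {
        "Position": [1, 2, 3, 4, 5, 6],
        "ICAO": ["EGLL", "EDDF", "EGKK", "EHAM", "KBOS", "KATL"],
    }
)
print(rows_around(airports, "EHAM"))
```

Returning an empty frame for an unknown code is just one choice; printing a message or raising an error would work equally well.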
Hi, I am trying to make a program that takes any CSV file and puts it into a list to read from. A row of the CSV file will be displayed to the user so they can choose what they would like to do. Their decision will take them to a specific row, and this repeats until the user ends the program. The row they are taken to needs to be decided by the CSV file and not by user input: in the CSV file there will be numbers at the end of each row showing which row to print based on the user's choice. For example eat, sleep, run, walk 1, 2, 3, 4. So if they choose eat, print row 1; if they choose sleep, print row 2; and so on. But it cannot be hard-coded, because different CSV files can be used in the program.
So I'm not sure if I should take the CSV file and separate the integers from the text into two separate lists and then have them reference each other?
Hello, your plan is a good way to think about it.
import csv
# For the average
from statistics import mean
def calculate_averages(input_file_name, output_file_name):
    with open("pr.file1.csv") as f:
        red=csv.reader(f)
        for row in red:
            name=row[0]
            aray1=row[1:]
            aray2=[]
            for number in aray1:
                aray2.append(int(number))
            return mean(aray2)
If this helped you please mark it as the answer to your question!
First, read the CSV file into a DataFrame; then you'll have all the info in a list-like data structure. Use pandas to go from CSV to DataFrame, then use df.iloc[i] to get a row and df.iloc[i, j] to get a specific cell. This is what I understood from your question. It will go something like below:
import pandas as pd
def getRow(df, i, j): # use this function loop over it to get a specific row
    print(df.iloc[i])
    print(df.iloc[i, j])
df = pd.read_csv("filename.csv")
while(True):
    i = int(input("Row Index")) # Loop over these two lines to continuously do that
    j = int(input("Column Index"))
    getRow(df, i, j)
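The idea raised in the question, separating each row's text choices from its trailing row numbers and pairing them up, could be sketched as follows. This assumes every choice and every number sits in its own CSV cell and that they appear in matching order; `parse_menu_row` is a hypothetical helper name, and the sample row simply reuses the eat/sleep/run/walk example.

```python
import csv
import io


def parse_menu_row(row):
    """Map each text option in a row to the row number listed for it."""
    cells = [cell.strip() for cell in row if cell.strip()]
    options = [cell for cell in cells if not cell.isdigit()]
    targets = [int(cell) for cell in cells if cell.isdigit()]
    # Pair the first option with the first number, and so on.
    return dict(zip(options, targets))


# In-memory stand-in for a CSV file with one menu row.
sample = io.StringIO("eat,sleep,run,walk,1,2,3,4\n")
rows = list(csv.reader(sample))

menu = parse_menu_row(rows[0])
print(menu)
```

A program built on this could look up the user's choice in `menu` to find the next row to display, so nothing about the choices is hard-coded.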
I am trying to make a leader board for my python program
I have sorted out writing the different scores to the leader board already
However, I am having trouble finding a way that I can sort this data
(Highest score at the top and lowest at the bottom)
Also, I am sorry but I do not have any code that is even vaguely functional, everything I have tried has just been incorrect
Also, I only have limited access to modules, as it is for a school project, which makes it even harder for me (I have csv, random and time).
Thank you so much
I would really appreciate any help I can receive
You can read in the file with pandas, sort it by a column, and overwrite the old csv with the new values. The code would look similar to this:
import pandas as pd
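path = "your_file_path.csv"  # replace with the path to your CSV file
df = pd.read_csv(path)
df = df.sort_values(by=["column_name"], ascending=False)
df.to_csv(path, index=False)  # index=False avoids writing the row index as an extra column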
This problem can be done in 3 parts using standard Python:
1. Read all of the data (assuming it has a header row). A csv.reader() is used to parse your file and read in each row as a list of values. By calling list() it will read all rows as a list of rows.
2. Sort the data.
3. Write all of the data (add back the header first), this time using a csv.writer() to automatically take your list of rows and write the correct format to the file.
This can be done using Python's csv library which you say you can use. Secondly you need to tell the sort() function how to sort your rows. In this example it assumes the scores are in the second column. The csv library will read each row as a list of values (starting from 0), so the score in this example is column 1. The key parameter gives sort() a function to call for each row that it is sorting. The function receives a row and returns which parts of the row to sort on, that way you don't have to sort on the first column. lambda is just shorthand for writing a single line function, it takes a parameter x and returns the elements from the row to sort on. Here we use a Python tuple to return two elements, the score and the name. First convert the score string x[1] into an integer. Adding a - will make the highest score sort to the top. x[0] then uses the Name column to sort for cases where two scores are the same:
import csv
with open('scores.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    data = list(csv_input)
data.sort(key=lambda x: (-int(x[1]), x[0]))
with open('scores_sorted.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    csv_output.writerows(data)
So for a sample CSV file containing:
name,score
fred,5
wilma,10
barney,8
betty,4
dino,10
You would get a sorted output CSV looking like:
name,score
dino,10
wilma,10
barney,8
fred,5
betty,4
Note, dino and wilma both have the same score, but dino is alphabetically first.
This assumes you are using Python 3.x
I'm attempting to load numerical data from CSV files in order to loop through the calculated data of each stock (file) separately and determine if the calculated value is greater than a specific number (731 in this case). However, the method I am using seems to make Python repeat the list as well as add quotation marks around the numbers ('500', as an example), making them strings. Unfortunately, I think the final "if" statement can't handle this, and as a result it doesn't seem to function appropriately. I'm not sure what's going on or what I need to do to get this code running properly.
import csv
stocks = ['JPM','PG','GOOG','KO']
for stock in stocks:
    Data = open("%sMin.csv" % (stock), 'r')
    stockdata = []
    for row in Data:
        stockdata.extend(map(float, row.strip().split(',')))
        stockdata.append(row.strip().split(',')[0])
    if any(x > 731 for x in stockdata):
        print "%s Minimum" % (stock)
Currently you're adding all columns of each row to a list, then adding the first column of the row again (as a string) to the end of that. So are all columns significant, or just the first?
You're also loading all data from the file before the comparison but don't appear to be using it anywhere, so I guess you can shortcut earlier...
If I understand correctly, your code should be this (or amend to only compare first column).
Are you basically writing this?
import csv
STOCKS = ['JPM', 'PG', 'GOOG', 'KO']
for stock in STOCKS:
    with open('{}Min.csv'.format(stock)) as csvin:
        for row in csv.reader(csvin):
            # convert every cell to float before comparing with the threshold
            if any(col > 731 for col in map(float, row)):
                print('{} minimum'.format(stock))
                break
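For anyone on Python 3, the same check could be sketched as a small function. This is an illustration under stated assumptions rather than a drop-in replacement: `any_above` is a hypothetical helper name, the in-memory text stands in for files such as JPMMin.csv, and empty cells are skipped because `float('')` would raise an error.

```python
import csv
import io


def any_above(lines, threshold=731.0):
    """Return True as soon as any numeric cell exceeds threshold."""
    for row in csv.reader(lines):
        # Convert each non-empty cell to float before comparing,
        # so strings like '500' are never compared as text.
        if any(float(cell) > threshold for cell in row if cell.strip()):
            return True
    return False


# Invented sample data standing in for per-stock CSV files.
samples = {
    "JPM": io.StringIO("500,600\n700,800\n"),
    "PG": io.StringIO("100,200\n300,400\n"),
}
flagged = [stock for stock, lines in samples.items() if any_above(lines)]
for stock in flagged:
    print("{} minimum".format(stock))
```

With real files, each `io.StringIO(...)` could be replaced by `open("{}Min.csv".format(stock), newline="")` inside a `with` block.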