I have a CSV file loaded with pandas and I would like to know how to write a Python search function that finds a row from user input. Below is a sample of the CSV -
I would like to create a function that asks the user for the ICAO code, which is one of the columns, and then returns the whole row of information. For example, if someone typed EHAM, it would return all the information in that row (Position, ICAO, Airport, Country, Total Movements, Position Change in the last 24hrs).
As a bonus, though I am not sure it is possible, I would also love to show the 2 rows above and 2 rows below the matched row when displaying the results. So, for example, it would show not only EHAM but also EDDF and EGKK (2 rows above) and KBOS and KATL (2 rows below).
Try the following code for the exact match. Suppose your CSV file is stored in a data frame called df
icao = input("Enter the ICAO code: ")
print(df[df["ICAO"] == icao])
Moreover, if you believe that the input value has at least one instance in the data frame, and indices are in an incremental order, you can try the following one:
icao = input("Enter the ICAO code: ")
requested = df[df["ICAO"] == icao]
print(df.iloc[requested.index[0] - 2 : requested.index[0] + 3, :])
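Furthermore, you have to decide what you want to do when the match is in the first two or last two rows. As written, a match in the first two rows makes the start of the slice negative, which iloc interprets as counting from the end, so the result typically comes back empty.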
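One way to handle those edge rows is to clamp the slice to the bounds of the table. The following is a minimal sketch, not a definitive implementation: the helper name rows_around and the sample frame are made up for illustration, and it assumes the ICAO codes are stored in upper case, so the user's input is upper-cased before comparing.

```python
import pandas as pd

def rows_around(df, icao, n=2):
    """Return the first row matching `icao` plus up to `n` rows on either side."""
    target = icao.strip().upper()  # assumption: codes are stored in upper case
    # Work with positions (not index labels) so any index ordering is fine.
    positions = [i for i, code in enumerate(df["ICAO"]) if code == target]
    if not positions:
        return df.iloc[0:0]  # empty frame with the same columns
    pos = positions[0]
    start = max(pos - n, 0)            # clamp at the top of the table
    stop = min(pos + n + 1, len(df))   # clamp at the bottom of the table
    return df.iloc[start:stop]

# Made-up sample using codes from the question; the real file has more columns.
sample = pd.DataFrame({
    "Position": [1, 2, 3, 4, 5],
    "ICAO": ["EDDF", "EGKK", "EHAM", "KBOS", "KATL"],
})

print(rows_around(sample, "eham"))   # match in the middle: two rows each side
print(rows_around(sample, "EDDF"))   # match in the first row: only rows below
```

An unknown code returns an empty frame here, so the caller can check `.empty` and print a "not found" message instead of raising an error.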
It's my first time working with pandas, so I am trying to wrap my head around all of its functionality.
Essentially, I want to download my bank statements in CSV and search for a keyword (e.g. steam) and compute the money I spent.
I was able to use pandas to locate the lines that contain my keyword, but I do not know how to iterate through them and add the cost of each purchase to a running total.
If you look at the image I uploaded, I am able to find the lines containing my keyword in the dataframe, but what I want is, for each line found, to take the content of col1 and sum it all together.
Attempt At Code
# importing pandas module
import pandas as pd
keyword = input("Enter the keyword you wish to search in the statement: ")
# reading the CSV file from disk
df = pd.read_csv('accountactivity.csv',header=None)
dff=df.loc[df[1].str.contains(keyword,case=False)]
value=df.values[68][2] #Fetches value of a specific cell in the CSV/dataframe created
print(dff)
print(value)
EDIT:
I essentially was almost able to complete the code I wanted, using only the CSV reader, but I can't get that code to find substrings. It only works if I enter the exact same string, meaning if I enter netflix it doesn't work, I would need to write it exactly as it appears on the statement like NETFLIX.COM _V. Here is another screenshot of that working code. I essentially want to mimic that with the capabilities of just finding substrings.
Working Code using CSV reader
import csv
data=[]
with open("accountactivity.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
keyword = input("Enter the keyword you wish to search in the statement: ")
col = [x[1] for x in data]
Sum = 0
if keyword in col:
    for x in range(0, len(data)):
        if keyword == data[x][1]:
            PartialSum=float(data[x][2])
            Sum=Sum+PartialSum
            print(data[x][1])
    print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
else:
    print("Keyword returned no results.")
The format of the CSV is the following: CSV Format
column 0 Date of transaction
column 1 Name of transaction
column 2 Money spent from account
column 3 Money received to account
The CSV file downloaded directly from my bank has no headers. So I refer to columns using col[0] etc...
Thanks for your help, I will continue meanwhile to look at how to potentially do this.
dff[dff.columns[col_index]].sum()
where col_index is the index of the column you want to sum together.
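For the csv-reader version from the question's edit, substring matching can be done by lower-casing both sides and using the in operator on each transaction name, rather than comparing whole strings. This is only a sketch under the column layout described in the question (name in column 1, money spent in column 2); the function name sum_matching and the sample statement lines are invented for illustration.

```python
import csv
import io

def sum_matching(rows, keyword, name_col=1, amount_col=2):
    """Sum amount_col over rows whose name_col contains keyword, ignoring case."""
    needle = keyword.lower()
    total = 0.0
    matched = []
    for row in rows:
        if len(row) <= max(name_col, amount_col):
            continue  # skip short or blank lines
        if needle in row[name_col].lower():
            matched.append(row[name_col])
            if row[amount_col].strip():  # the "spent" cell may be empty
                total += float(row[amount_col])
    return total, matched

# Made-up statement lines: date, name, money spent, money received.
sample = io.StringIO(
    "01/02/2020,NETFLIX.COM _V,13.99,\n"
    "01/03/2020,STEAM PURCHASE,25.00,\n"
    "01/04/2020,PAYROLL,,1500.00\n"
    "02/02/2020,NETFLIX.COM _V,13.99,\n"
)
total, matched = sum_matching(csv.reader(sample), "netflix")
print(matched)
print("Total spent:", round(total, 2))
```

With a real statement, the io.StringIO object would be replaced by the file opened with open("accountactivity.csv").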
Thanks everyone for your help. I ended up understanding better how dataframes work in pandas, and I used the command dff[dff.columns[col_index]].sum() (which was suggested to me by Jonny Kong) with the column of interest (in my case column 2, which contains my expenses). It computes the sum of my expenses for the searched keyword, which is what I need!
#Importing pandas module
import pandas as pd
#Keyword searched through bank statement
keyword = input("Enter the keyword you wish to search in the statement: ")
#Reading the bank statement CSV file
df = pd.read_csv('accountactivity.csv',header=None)
#Creating dataframe from bank statement with lines that match search keyword
dff=df.loc[df[1].str.contains(keyword,case=False)]
#Sum the column which contains total money spent on the keyword searched
Sum=dff[dff.columns[2]].sum()
#Prints the created dataframe
print("\n",dff,"\n")
#Prints the sum of expenses for the keyword searched
print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
Working Code!
Again, thanks everyone for helping and supporting me through my first post on SO!
Let's say I have a CSV such as:
make,model,UCity,UHighway,year
Alfa Romeo,Spider Veloce 2000,23.3333,35,1985
Ferrari,Testarossa,11,19,1985
Dodge,Charger,29,47,1985
Dodge,B150/B250 Wagon 2WD,12.2222,16.6667,1985
I want to access the 'UCity' and 'UHighway' columns of specific rows based on user input.
So let's say the user inputs 'dodge' as the make, 'charger' as the model and '1985' as the year. Using the user's input, how would I then access '29' and '47', which are the respective 'UCity' and 'UHighway' values, if possible? I appreciate any and all feedback, thank you!
you can use pandas to read in the csv, and go from there
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
The code should be something along the lines of the following. Please forgive me for not having much time to do thorough testing to see if the code works, but you can refer to the documentation above possibly to see how you can subset and extract data from dataframes
import pandas as pd
df = pd.read_csv('/path/to/file.csv')
make = input('enter the make:')
model = input('enter the model:')
subset = df[(df['make'] == make) & (df['model'] == model)]
The above code subsets the dataframe to the row with the desired data
we can now call the specific columns as follows
# We know that there will only be one row
ucity = subset.UCity.values[0]
uhwy = subset.UHighway.values[0]
Now I have the desired values stored in variables and I can do whatever I need with them, for example store them in a dictionary/JSON, depending on what you need to do with the output.
{
"UCity" : ucity,
"UHighway": uhwy
}
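Note that the comparison with == is case-sensitive, so an input of 'dodge' would not match 'Dodge' in the file, and the year from the question is not used in the filter. A possible variant, offered as a sketch rather than the answerer's method, lower-cases both sides and adds the year to the mask; the helper name lookup_mileage is hypothetical, and the CSV text is the sample from the question.

```python
import io
import pandas as pd

csv_text = """make,model,UCity,UHighway,year
Alfa Romeo,Spider Veloce 2000,23.3333,35,1985
Ferrari,Testarossa,11,19,1985
Dodge,Charger,29,47,1985
Dodge,B150/B250 Wagon 2WD,12.2222,16.6667,1985
"""
df = pd.read_csv(io.StringIO(csv_text))

def lookup_mileage(df, make, model, year):
    """Return {'UCity': ..., 'UHighway': ...} for the first match, or None."""
    mask = (
        (df["make"].str.lower() == make.strip().lower())
        & (df["model"].str.lower() == model.strip().lower())
        & (df["year"] == int(year))  # input() returns a string, so convert
    )
    subset = df.loc[mask, ["UCity", "UHighway"]]
    if subset.empty:
        return None  # nothing matched; avoids an IndexError on .values[0]
    return subset.iloc[0].to_dict()

print(lookup_mileage(df, "dodge", "charger", "1985"))
print(lookup_mileage(df, "dodge", "viper", "1985"))
```

In an interactive script the three arguments would come from input() calls, as in the code above.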
Hi, I am trying to make a program that takes any CSV file and puts it into a list to read from. A row from the CSV file will be displayed to the user so they can choose what they would like to do. Their decision will take them to a specific row, and this will repeat until the user ends the program. The row they are taken to needs to be decided by the CSV file and not by user input. In the CSV file there will be numbers at the end of each row showing which row to print out based on the user's choice. For example: eat, sleep, run, walk, 1, 2, 3, 4. So if they choose eat, print row 1; if they choose sleep, print row 2; and so on. But it cannot be hard-coded, because different CSV files can be used in the program.
So I'm not sure if I should take the CSV file and separate the integers from the text into two separate lists and then have them reference each other?
Hello you have a good plan for thinking
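import csv
# For the average
from statistics import mean

def calculate_averages(input_file_name, output_file_name):
    # output_file_name is accepted but not used in this snippet
    averages = {}
    with open(input_file_name) as f:
        red = csv.reader(f)
        for row in red:
            name = row[0]
            aray1 = row[1:]
            aray2 = []
            # Convert the remaining cells of the row to integers
            for number in aray1:
                aray2.append(int(number))
            # Store one average per row, keyed by the first cell
            averages[name] = mean(aray2)
    return averages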
If this helped you please mark it as the answer to your question!
First, read the CSV file into a data frame; then you'll have all the info in a list-like data structure. Use pandas to go from CSV to a dataframe with pd.read_csv, then use df.iloc[i] to get a row and df.iloc[i, j] to get a specific cell. This is what I understood from your question. It will go something like below:
import pandas as pd

def getRow(df, i, j): # use this function loop over it to get a specific row
    print(df.iloc[i])
    print(df.iloc[i, j])

df = pd.read_csv("filename.csv")
while True:
    i = int(input("Row Index")) # Loop over these two lines to continuously do that
    j = int(input("Column Index"))
    getRow(df, i, j)
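To let the file itself decide the next row, as the question asks, the text and the integers in a row can be split into two parallel lists. The sketch below rests on a loud assumption about the layout that the question does not confirm: the first half of each row holds the option labels and the second half holds the matching row numbers, counted from 0. The names parse_row and next_row and the sample rows are made up.

```python
import csv
import io

def parse_row(row):
    """Split a row into option labels and the row numbers they lead to.

    Assumed layout: label_1, ..., label_k, target_1, ..., target_k
    """
    cells = [c.strip() for c in row]
    half = len(cells) // 2
    labels = cells[:half]
    targets = [int(c) for c in cells[half:half * 2]]
    return labels, targets

def next_row(rows, current, choice):
    """Return the row number that `choice` leads to from row `current`, or None."""
    labels, targets = parse_row(rows[current])
    lowered = [label.lower() for label in labels]
    wanted = choice.strip().lower()
    if wanted not in lowered:
        return None  # not one of the options on this row
    return targets[lowered.index(wanted)]

# Made-up file contents; any CSV following the assumed layout would work.
sample = io.StringIO(
    "eat,sleep,1,2\n"
    "cook,order,0,2\n"
    "nap,wake,2,0\n"
)
rows = list(csv.reader(sample))

print(parse_row(rows[0]))
print(next_row(rows, 0, "sleep"))
```

An interactive version would show the labels of the current row, read the choice with input(), call next_row, and repeat until the user types a quit word or next_row returns None.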
I have two CSV files which contain user names and their different IDs. Based on the user's input, I should be able to switch between the IDs and retrieve the one requested. E.g. if the user inputs the student ID and wants the employee ID 1 or the name of that particular person, I should be able to retrieve it. I'm still very new to programming, so I don't know much about it. How do I do this? Please help me.
A sample of my file
Name, Student_id,EmployeeId1, Employee_Id2
import pandas as pd
df = pd.read_csv("my1.csv")
colls = input("Enter the column")
roww = input("Enter the row")
df.loc ["roww","colls"]
The user will not know the rows and columns but I just wanted to try but it didn't work.
You are looking for the row with label "roww" rather than what was input in the variable roww.
Try removing the quote marks, i.e.:
df.loc [roww, colls]
You might also have a problem that your row might be a number, and by default the value from input is a string. In this case try:
df.loc [int(roww),colls]
So for example, if your csv file was:
Name, Student_id,EmployeeId1, Employee_Id2
Bob, 1, 11, 12
Steve, 2, 21, 22
Then with input Name for column and 1 for row, you'd get Steve as the answer (since the rows start from 0).
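Since the question says the user will know an ID rather than a row number, another option is to filter on the known ID column and read the wanted column from the matching row. This is a sketch using the sample rows above; the helper name translate_id is hypothetical. It also passes skipinitialspace=True to read_csv, because the header has a space after some commas and the column would otherwise be named " Student_id" with a leading space.

```python
import io
import pandas as pd

csv_text = """Name, Student_id,EmployeeId1, Employee_Id2
Bob, 1, 11, 12
Steve, 2, 21, 22
"""
# skipinitialspace=True drops the blank that follows each comma
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

def translate_id(df, known_col, known_value, wanted_col):
    """Find the row where known_col equals known_value and return wanted_col."""
    matches = df.loc[df[known_col] == known_value, wanted_col]
    if matches.empty:
        return None  # no such ID in the file
    return matches.iloc[0]

print(translate_id(df, "Student_id", 2, "EmployeeId1"))
print(translate_id(df, "Student_id", 2, "Name"))
```

When the ID comes from input() it is a string, so it would need int(...) before the comparison, as with the row number above.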
I have many files in a folder that look like this one:
(screenshot of a sample file)
and I'm trying to implement a dictionary for the data. I'm interested in creating it with 2 keys (the first one is the http address and the second is the third field, the plugin used, like adblock). The values refer to different metrics, so my intention is to compute, for each site and plugin, the mean, median and variance of each metric once the dictionary has been implemented. For example, for the mean, my intention is to consider all the values of the 4th field in the file, etc. I tried to write this code but, first of all, I'm not sure that it is correct.
(screenshot of the attempted code)
I read other posts but none solved my problem, since they either deal with only one key or don't show how to access the different values inside the dictionary to compute the mean, median and variance.
The problem is simple: assuming that the dictionary implementation is OK, how do I access the different values for key1: www.google.it -> key2: adblock?
Any kind of help is welcome, and I'm happy to answer any further questions.
You can do what you want using a dictionary, but you should really consider using the Pandas library. This library is centered around a tabular data structure called "DataFrame" that excels in column-wise and row-wise calculations such as the ones that you seem to need.
To get you started, here is the Pandas code that reads one text file using the read_fwf() method. It also displays the mean and variance for the fourth column:
# import the Pandas library:
import pandas as pd
# Read the file 'table.txt' into a DataFrame object. Assume
# a header-less, fixed-width file like in your example:
df = pd.read_fwf("table.txt", header=None)
# Show the content of the DataFrame object:
print(df)
# Print the fourth column (zero-indexed):
print(df[3])
# Print the mean for the fourth column:
print(df[3].mean())
# Print the variance for the fourth column:
print(df[3].var())
There are different ways of selecting columns and rows from a DataFrame object. The square brackets [ ] in the previous examples selected a column in the data frame by column number. If you want to calculate the mean of the fourth column only from those rows that contain adblock in the third column, you can do it like so:
# Print those rows from the data frame that have the value 'adblock'
# in the third column (zero-indexed):
print(df[df[2] == "adblock"])
# Print only the fourth column (zero-indexed) from that data frame:
print(df[df[2] == "adblock"][3])
# Print the mean of the fourth column from that data frame:
print(df[df[2] == "adblock"][3].mean())
EDIT:
You can also calculate the mean or variance for more than one column at the same time:
# Use a list of column numbers to calculate the mean for all of them
# at the same time:
l = [3, 4, 5]
print(df[l].mean())
END EDIT
If you want to read the data from several files and do the calculations for the concatenated data, you can use the concat() method. This method takes a list of DataFrame objects and concatenates them (by default, row-wise). Use the following line to create a DataFrame from all *.txt files in your directory:
import glob
# Read every *.txt file in the current directory and stack the rows:
df = pd.concat([pd.read_fwf(file, header=None) for file in glob.glob("*.txt")],
               ignore_index=True)
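To get the statistics for every site and plugin pair at once, which is what the two-key dictionary in the question was meant for, groupby() can be applied to the first and third columns. The sketch below uses invented whitespace-separated rows, because the real file layout was only shown in a screenshot; the column positions (0 for the site, 2 for the plugin, 3 for the first metric) follow the description in the question.

```python
import io
import pandas as pd

# Made-up rows: site, run number, plugin, two metrics
text = """www.google.it 1 adblock 10.0 3.0
www.google.it 2 adblock 14.0 5.0
www.google.it 3 ghostery 20.0 7.0
www.repubblica.it 1 adblock 30.0 9.0
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Group by site (column 0) and plugin (column 2), then aggregate column 3
stats = df.groupby([0, 2])[3].agg(["mean", "median", "var"])
print(stats)

# Access one pair the way a two-key dictionary would be accessed
print(stats.loc[("www.google.it", "adblock"), "mean"])
```

Groups with a single row have an undefined sample variance, so pandas reports NaN for them in the var column.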