I have two CSV files which contain user names and their different IDs. Based on the user's input I should be able to switch between the IDs and retrieve the right one. For example, if the user enters a student ID and wants the Employee_Id1 of a particular person, I should be able to retrieve it. I'm still very new to programming, so I don't know much yet. How do I do this?
A sample of my file
Name, Student_id,EmployeeId1, Employee_Id2
import pandas as pd
df = pd.read_csv("my1.csv")
colls = input("Enter the column")
roww = input("Enter the row")
df.loc ["roww","colls"]
The user will not know the row and column labels, but I just wanted to try this, and it didn't work.
You are looking for the row with label "roww" rather than what was input in the variable roww.
Try removing the quote marks, i.e.:
df.loc[roww, colls]
You might also have a problem: your row label might be a number, and by default the value from input is a string. In that case try:
df.loc[int(roww), colls]
So for example, if your csv file was:
Name, Student_id,EmployeeId1, Employee_Id2
Bob, 1, 11, 12
Steve, 2, 21, 22
Then with input Name for column and 1 for row, you'd get Steve as the answer (since the rows start from 0).
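If the goal is to look the row up by the person's Name rather than by its positional label, a boolean mask is one option. A minimal sketch, assuming the same my1.csv layout as in the question (skipinitialspace=True just strips the stray spaces after the commas in the header):

import pandas as pd

# Read the file; skipinitialspace removes the spaces after the commas in the header
df = pd.read_csv("my1.csv", skipinitialspace=True)

name = input("Enter the name: ")          # e.g. Bob
col = input("Enter the ID column: ")      # e.g. EmployeeId1

# Select the row(s) whose Name matches, then the requested ID column
print(df.loc[df["Name"] == name, col])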
It's my first time working with pandas, so I am trying to wrap my head around all of its functionality.
Essentially, I want to download my bank statements in CSV and search for a keyword (e.g. steam) and compute the money I spent.
I was able to use pandas to locate the lines that contain my keyword, but I do not know how to iterate through them and add the cost of each matching purchase to a variable that I can sum up as the iteration grows.
If you look at the image I uploaded, I am able to find the lines containing my keyword in the dataframe, but what I want to do is, for each line found, take the content of col1 and sum it all together.
Attempt At Code
# importing pandas module
import pandas as pd
keyword = input("Enter the keyword you wish to search in the statement: ")
# reading csv file from url
df = pd.read_csv('accountactivity.csv',header=None)
dff=df.loc[df[1].str.contains(keyword,case=False)]
value=df.values[68][2] #Fetches value of a specific cell in the CSV/dataframe created
print(dff)
print(value)
EDIT:
I was essentially almost able to complete the code I wanted using only the csv reader, but I can't get that code to find substrings; it only works if I enter the exact string as it appears on the statement. So entering netflix doesn't work, I would need to write it exactly as NETFLIX.COM _V. Here is another screenshot of that working code. I essentially want to mimic it, but with the ability to find substrings.
Working Code using CSV reader
import csv
data=[]
with open("accountactivity.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
keyword = input("Enter the keyword you wish to search in the statement: ")
col = [x[1] for x in data]
Sum = 0
if keyword in col:
    for x in range(0, len(data)):
        if keyword == data[x][1]:
            PartialSum=float(data[x][2])
            Sum=Sum+PartialSum
            print(data[x][1])
    print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
else:
    print("Keyword returned no results.")
The format of the CSV is the following:
column 0 Date of transaction
column 1 Name of transaction
column 2 Money spent from account
column 3 Money received to account
The CSV file downloaded directly from my bank has no headers. So I refer to columns using col[0] etc...
Thanks for your help, I will continue meanwhile to look at how to potentially do this.
dff[dff.columns[col_index]].sum()
where col_index is the index of the column you want to sum together.
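If you would rather stay with the csv module from the edit above, here is a minimal sketch of the same loop with case-insensitive substring matching (so netflix would match NETFLIX.COM _V), assuming the same four-column accountactivity.csv layout described in the question:

import csv

keyword = input("Enter the keyword you wish to search in the statement: ").lower()

Sum = 0
found = False
with open("accountactivity.csv") as csvfile:
    for row in csv.reader(csvfile):
        # Substring match instead of an exact comparison
        if keyword in row[1].lower():
            found = True
            if row[2]:  # skip rows with no "money spent" value
                Sum += float(row[2])
            print(row[1])

if found:
    print("The sum for expenses at ", keyword, " is of: ", Sum, "$", sep='')
else:
    print("Keyword returned no results.")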
Thanks everyone for your help. I ended up understanding more about how dataframes work with pandas, and I used the command dff[dff.columns[col_index]].sum() (which was suggested to me by Jonny Kong) with the column of interest (in my case column 2, which contains my expenses). It computes the sum of my expenses for the searched keyword, which is what I need!
#Importing pandas module
import pandas as pd
#Keyword searched through bank statement
keyword = input("Enter the keyword you wish to search in the statement: ")
#Reading the bank statement CSV file
df = pd.read_csv('accountactivity.csv',header=None)
#Creating dataframe from bank statement with lines that match search keyword
dff=df.loc[df[1].str.contains(keyword,case=False)]
#Sum the column which contains total money spent on the keyword searched
Sum=dff[dff.columns[2]].sum()
#Prints the created dataframe
print("\n",dff,"\n")
#Prints the sum of expenses for the keyword searched
print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
Again, thanks everyone for helping and supporting me through my first post on SO!
I have a script in Excel which cleans data for a phone number contact list, which I am trying to convert into Python.
I have figured out how spaces are removed, but what I am struggling with is getting data moved into the correct position. Below is an example of the data as it comes across in a spreadsheet. What I want to achieve is to move the mobile number in the Phone column across to the Mobile Phone column. It also needs to remove the row if the Mobile Phone column is NaN and the Phone column is not a mobile number, because this is for a mobile-number-only list; the way to tell the difference is the first two characters of the number, so if it starts with 07 keep it, otherwise remove it.
So what I want to achieve is the following:
Below is the code I have so far; I can replace the NaN with a 0, but I can't shift a cell across.
import pandas as pd
import numpy as np
import xlrd as xlrd
UsrName = input("Please enter the path of the excel file: ")
df = pd.read_excel(UsrName)
# This cleans up the spaces etc
Data = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
Data['Mobile Phone'] = Data['Mobile Phone'].str.replace(" ","")
Data["Phone"] = Data["Phone"].str.replace(" ","")
#This now shifts empty phone numbers
Data['Mobile Phone'] = Data['Mobile Phone'].replace(np.NaN,0)
print(Data)
it also needs to remove the row if the Mobile Phone column is NaN and the Phone column is not a mobile number
I'm not quite sure what the parameters would be for determining if 'Phone' is not a mobile number. I can see that the 'Frank' row was dropped in the example, but I'm not entirely sure why.
As far as transferring data from the 'Phone' column to the 'Mobile Phone' column, this should work:
Edit:
Added a line for filtering non-phone numbers from the 'Phone' column.
df = df[(df['Phone'].str.startswith('07',na=False))|(df['Mobile Phone'].notna())]
df['Mobile Phone'] = np.where(((df['Mobile Phone'].isna())&(df['Phone'].notna())),df['Phone'],df['Mobile Phone'])
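Here is a minimal, self-contained sketch of those two lines in action; the sample names and numbers are made up, not taken from the original sheet:

import pandas as pd
import numpy as np

# Hypothetical sample data standing in for the spreadsheet
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Frank'],
    'Phone': ['07123 456789', '01234 567890', '01234 111222'],
    'Mobile Phone': [np.nan, '07999 888777', np.nan],
})

# Strip spaces from the number columns
df['Phone'] = df['Phone'].str.replace(' ', '')
df['Mobile Phone'] = df['Mobile Phone'].str.replace(' ', '')

# Keep rows that either have a mobile in 'Phone' (starts with 07) or already have a 'Mobile Phone'
df = df[(df['Phone'].str.startswith('07',na=False))|(df['Mobile Phone'].notna())]

# Where 'Mobile Phone' is empty but 'Phone' has a number, copy it across
df['Mobile Phone'] = np.where(((df['Mobile Phone'].isna())&(df['Phone'].notna())),df['Phone'],df['Mobile Phone'])

print(df)  # the 'Frank' row is dropped, Alice's number moves into 'Mobile Phone'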
I have a CSV file loaded into a pandas DataFrame, and I would like to know how to create a user search function in Python to find a row. Below is a sample of the CSV:
I would like to create a function that asks the user for the ICAO code, which is one of the columns, and then returns the whole row of information. For example, if someone typed EHAM it would return all the information in that row (Position, ICAO, Airport, Country, Total Movements, Position Change in the last 24hrs).
As a bonus, though I am not sure it is possible, I would also love to show the 2 rows above and 2 rows below the requested search when displaying the results. So for example it would show not only EHAM, but also EDDF and EGKK (2 rows above) and KBOS and KATL (2 rows below).
Try the following code for an exact match. Suppose your CSV file is stored in a DataFrame called df:
icao = input("Enter the ICAO code: ")
print(df[df["ICAO"] == icao]])
Moreover, if you are sure that the input value occurs at least once in the DataFrame and that the indices are in incremental order, you can try the following:
icao = input("Enter the ICAO code: ")
requested = df[df["ICAO"] == icao]]
print(df.iloc[requested.index[0] - 2 : requested.index[0] + 3, :])
Furthermore, you have to decide what you want to do when the matching instance falls within the first two or last two rows.
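Building on that, one way to handle those edge cases, sketched here under the assumption that df is the same DataFrame and uses the default 0..n-1 integer index, is to clip the window before slicing:

icao = input("Enter the ICAO code: ")
requested = df[df["ICAO"] == icao]

if requested.empty:
    print("No row found for that ICAO code.")
else:
    i = requested.index[0]
    start = max(i - 2, 0)        # don't run past the first row
    stop = min(i + 3, len(df))   # don't run past the last row
    print(df.iloc[start:stop, :])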
I have an excel file that contains 1000+ company names in one column and about 20,000 company names in another column.
The goal is to match as many names as possible. The problem is that the names in column one (1000+) are poorly formatted, meaning that the "Company Name" string can look something like "9Com(panynAm9e00". I'm trying to figure out the best way to solve this. (Only 12 names match exactly.)
After trying different methods, I've ended up attempting to match 4-5 or more characters in each name, depending on the length of each string, using regex. But I'm struggling to find the most efficient way to do this.
For instance:
Column 1
1. 9Com(panynAm9e00
2. NikE4
3. Mitrosof2
Column 2
1. Microsoft
2. Company Name
3. Nike
Take first element in Column 1 and look for a match in Column 2. If no exact match, then look for a string with 4-5 same characters.
Any suggestions?
I would suggest reading your Excel file with pandas and pd.read_excel(), and then using fuzzywuzzy to perform your matching, for example:
import pandas as pd
from fuzzywuzzy import process, fuzz

df = pd.DataFrame([['9Com(panynAm9e00'],
                   ['NikE4'],
                   ['Mitrosof2']],
                  columns=['Name'])

known_list = ['Microsoft','Company Name','Nike']

def find_match(x):
    match = process.extractOne(x, known_list, scorer=fuzz.partial_token_sort_ratio)[0]
    return match

df['match found'] = [find_match(row) for row in df['Name']]
Yields:
Name match found
0 9Com(panynAm9e00 Company Name
1 NikE4 Nike
2 Mitrosof2 Microsoft
I imagine numbers are not very common in actual company names, so an initial filter step will help immensely going forward, but here is one implementation that should work relatively well even without this. A bag-of-letters (bag-of-words) approach, if you will:
1. Convert everything (col 1 and 2) to lowercase.
2. For each known company in column 2, store each unique letter and how many times it appears (count) in a dictionary.
3. Do the same (step 2) for each entry in column 1.
4. For each entry in col 1, find the closest bag-of-letters (dictionary from step 2) from the list of real company names.
The dictionary-distance implementation is up to you.
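For illustration, a minimal sketch of that idea using collections.Counter, with the distance taken as the total per-letter count difference and the toy names from the question:

from collections import Counter

messy = ['9Com(panynAm9e00', 'NikE4', 'Mitrosof2']   # column 1
known = ['Microsoft', 'Company Name', 'Nike']        # column 2

def bag(name):
    # Count each letter, ignoring case and non-alphabetic characters
    return Counter(c for c in name.lower() if c.isalpha())

def distance(a, b):
    # Sum of absolute count differences over all letters seen in either bag
    return sum(abs(a[c] - b[c]) for c in set(a) | set(b))

known_bags = {name: bag(name) for name in known}

for name in messy:
    best = min(known, key=lambda k: distance(bag(name), known_bags[k]))
    print(name, '->', best)   # 9Com(panynAm9e00 -> Company Name, etc.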
I would like to write a CSV that outputs the following:
John
Titles Values
color black
age 15
Laly
Titles Values
color pink
age 20
total age 35
And so far I have:
import csv
students_file = open('./students_file', 'w')
file_writer = csv.DictWriter(students_file)
…
file_writer.writeheader(name_title) #John
file_writer.writeheader('Titles':'Values')
file_writer.writerow(color_title:color_val) #In first column: color, in second column: black
file_writer.writerow(age_title:age_val) #In first column: age, in second column: 15
file_writer.writerow('total age': total_val) #In first column: 'total age', in second column: total_val
But it seems like it just writes everything on new rows rather than putting corresponding values next to each other.
What is the proper way of creating the example above?
I don't think you've got the concept of .csv quite right. The C stands for comma (or any other delimiter); it means Comma-Separated Values.
Think of it as an excel sheet or a relational database table (eg. MySQL).
Usually the document starts with a line of column names, then the values follow. You don't need to write e.g. age twice.
Example structure of a .csv
firstname,lastname,age,favorite_meal
Jim,Joker,33,"fried hamster"
Bert,Uber,19,"salad, vegan"
Text is often enclosed in quotes (") too, so that it can itself contain commas without disturbing your .csv structure.
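That said, if you really want the block-per-student layout from the question, plain csv.writer can produce it. A minimal sketch, with the student data hard-coded as a hypothetical dict and an example output file name:

import csv

# Hypothetical data standing in for whatever your program collects
students = {'John': {'color': 'black', 'age': 15},
            'Laly': {'color': 'pink', 'age': 20}}

with open('students_file.csv', 'w', newline='') as students_file:
    file_writer = csv.writer(students_file)
    total_age = 0
    for name, info in students.items():
        file_writer.writerow([name])                     # John
        file_writer.writerow(['Titles', 'Values'])       # header for this block
        file_writer.writerow(['color', info['color']])   # color, black
        file_writer.writerow(['age', info['age']])       # age, 15
        total_age += info['age']
    file_writer.writerow(['total age', total_age])       # total age, 35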