Replace character in multiple columns of pandas DataFrame


I have a DataFrame that I create by reading an XLSX file, and I need to replace many dots with commas: Excel shows a comma as the decimal separator, but the pandas DataFrame shows a dot instead, and I don't know why. There are 29 columns to fix, so I figured it would be better to store the column names in a list and use a for loop to iterate through the columns where I want to replace the dots with commas.
Unfortunately, I got an error when I tried the following code:
import tkinter as tk
from tkinter import filedialog
from tkinter import messagebox
import win32com.client
import pandas as pd
import pathlib
root = tk.Tk()
canvas1 = tk.Canvas(root, width=300, height=300, bg='lightsteelblue2', relief='raised')
canvas1.pack()
label1 = tk.Label(root, text='File Conversion Tool', bg='lightsteelblue2')
label1.config(font=('helvetica', 20))
canvas1.create_window(150, 60, window=label1)
read_file = pd.DataFrame()
def get_excel_onefolder():
    global read_file
    import_dir_path = filedialog.askdirectory()
    file_ext = "*.xlsx"
    list_xlsx_file = list(pathlib.Path(import_dir_path).glob(file_ext))
    lst_rpl = ['col24', 'col25', 'col26', 'col45', 'col46', 'col47', 'col69', 'col75', 'col76', 'col77', 'col105', 'col106',
               'col107', 'col108', 'col109', 'col110', 'col111', 'col112', 'col254', 'col255', 'col256', 'col257', 'col258',
               'col259', 'col260', 'col261', 'col262', 'col352', 'col353']
    len_lst = len(lst_rpl)
    for xlsx_file_path in list_xlsx_file:
        read_file = pd.read_excel(xlsx_file_path)
        read_file['Time'] = read_file['Time'].str.replace(',', '.')
        for i in range(len_lst):
            read_file[lst_rpl[i]] = read_file[lst_rpl[i]].str.replace('.', ',')
        output_path = str(xlsx_file_path) + ".csv"
        read_file.to_csv(output_path, index=None, header=True, decimal=',', sep=';')
    tk.messagebox.showinfo(title="Import success", message="CSV file import successful !")
XLSX_to_CSV = tk.Button(text="Import Excel File & Convert to CSV", command=get_excel_onefolder, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(150, 180, window=XLSX_to_CSV)
root.mainloop()
The error I got is KeyError: 'col24'
Edit:
I fixed my problem by passing decimal=',' together with na_values (to declare the NaN markers) to .read_excel, and it works fine now:
read_file = pd.read_excel(xlsx_file_path, decimal=',', na_values=['#NV', ' '])
My problem was that some columns were not recognized as float because of their NaN values.
The working function is now:
# Note: this version also needs "import os" at the top of the script.
def get_excel_onefolder():
    global read_file
    import_dir_path = filedialog.askdirectory()
    file_ext = "*.xlsx"
    list_xlsx_file = list(pathlib.Path(import_dir_path).glob(file_ext))
    for xlsx_file_path in list_xlsx_file:
        read_file = pd.read_excel(xlsx_file_path, decimal=',', na_values=['#NV', ' '])
        read_file['Time'] = read_file['Time'].str.replace(',', '.')
        path_without_ext = os.path.splitext(str(xlsx_file_path))[0]
        output_path = path_without_ext + ".csv"
        read_file.to_csv(output_path, index=None, header=True, decimal=',', sep=';')
    tk.messagebox.showinfo(title="Import success", message="CSV file import successful !")
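To illustrate why the NaN markers mattered, here is a minimal sketch. It uses read_csv on an in-memory string instead of read_excel so it runs without a workbook, and the sample data is invented, so treat it as an analogy rather than a reproduction of the original file.

```python
import io

import pandas as pd

# Hypothetical sample: ';'-separated text with ',' decimals and a '#NV' marker.
raw_text = "col24;col25\n1,5;#NV\n2,5;3,0\n"

# Without na_values, the '#NV' entry keeps col25 from being parsed as numbers.
as_text = pd.read_csv(io.StringIO(raw_text), sep=";", decimal=",")

# Declaring the markers lets the whole column become float, with NaN for '#NV'.
as_float = pd.read_csv(
    io.StringIO(raw_text), sep=";", decimal=",", na_values=["#NV", " "]
)

print(as_text["col25"].tolist())
print(as_float.dtypes)
```

Once the columns are real floats, to_csv(..., decimal=',') can write the commas, so no string replacement is needed.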

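pandas displays floats with a dot as the decimal separator, regardless of how Excel shows them. For values stored as text with a comma, you can tell read_excel how to parse them with the decimal parameter. Change
read_file = pd.read_excel(xlsx_file_path)
to
read_file = pd.read_excel(xlsx_file_path, decimal=",")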
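As a rough sketch of the output side: if the columns stay numeric, the decimal argument of to_csv writes the commas on export. The small frame below is made up for illustration; only the to_csv arguments come from the question.

```python
import io

import pandas as pd

# Hypothetical frame: one text column and one float column.
df = pd.DataFrame({"Time": ["00:01", "00:02"], "col24": [1.5, 2.25]})

# Write to an in-memory buffer instead of a file so the result can be inspected.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=";", decimal=",")
csv_text = buffer.getvalue()

print(csv_text)
```

The DataFrame itself still holds floats, so arithmetic on col24 keeps working; only the written text uses commas.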

for col in list_col:
    df[col] = df[col].apply(lambda x: str(x).replace('.', ',') if '.' in str(x) else x)
I'm not sure whether this will help, but this snippet replaces dots with commas in every column named in list_col, leaving values without a dot unchanged.
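A possible variant, sketched with invented column names: check which of the wanted columns actually exist before touching them (a KeyError like the one in the question means the name is not a column of the DataFrame), and use the vectorized .str.replace with regex=False so the dot is treated literally.

```python
import pandas as pd

# Hypothetical data; 'col999' is deliberately absent from the frame.
df = pd.DataFrame(
    {"col24": [1.5, 2.0], "col25": ["3.75", "x"], "other": [0.1, 0.2]}
)
wanted = ["col24", "col25", "col999"]

# Split the wanted names into those present and those missing.
present = [c for c in wanted if c in df.columns]
missing = [c for c in wanted if c not in df.columns]

# Convert each present column to text, then swap '.' for ',' literally.
for col in present:
    df[col] = df[col].astype(str).str.replace(".", ",", regex=False)

print(df)
print("Skipped (not in the file):", missing)
```

Note that the converted columns become text, so this is only suitable as a last step before export.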

Related

How to resolve DtypeWarning: Columns (23) have mixed types. Specify dtype option on import or set low_memory=False?

I'm getting this warning in my script: "DtypeWarning: Columns (23) have mixed types. Specify dtype option on import or set low_memory=False". The file is still created, but I feel the warning should be addressed. I tried inserting this line in my code, but it doesn't work:
new_df = pd.read_csv('partial.csv', low_memory=False)
What should I add to my code, and where?
import os
import pandas as pd
import tkinter
from tkinter import messagebox
root = tkinter.Tk()
root.withdraw()
directory = 'C:/path'
ext = ('.csv')
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if f.endswith(ext):
        head_tail = os.path.split(f)
        head_tail1 = 'C:/Output'
        k = head_tail[1]
        r = k.split(".")[0]
        p = head_tail1 + "/" + r + " - .csv"
        mydata = pd.read_csv(f)
        # to pull columns and values
        new = mydata[["A","B","C","D"]]
        new = new.rename(columns={'D': 'F'})
        new['F'] = 1
        print(new.columns)
        new["B"] = (pd.to_datetime(new["B"], format="%d-%b", errors="coerce").dt.strftime("%#m-%#d").fillna(new["B"]))
        new.to_csv(p, index=False)
        # to merge columns and values
        merge_columns = ['A', 'B', 'C']
        merged_col = ''.join(merge_columns).replace('ABC', 'G')
        new[merged_col] = new[merge_columns].astype(str).apply(lambda x: '.'.join(x), axis=1)
        new.drop(merge_columns, axis=1, inplace=True)
        new = new.groupby(merged_col).count().reset_index()
        new.to_csv(p, index=False)
messagebox.showinfo("Parts has been counted", "New csv file can be found in C:/Path")
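One thing that may be worth trying, going only by what the warning text itself suggests: put the option on the read_csv call that actually loads the data (mydata = pd.read_csv(f) in the script above), not on a separate call. The sketch below uses a tiny in-memory CSV with invented values, and the choice of column "A" as the mixed-type column is an assumption.

```python
import io

import pandas as pd

# Hypothetical CSV where column A mixes numbers and text.
csv_text = "A,B,C,D\n1,01-Jan,x,5\ntwo,02-Feb,y,6\n"

# Declare the dtype for the mixed column; low_memory=False is the
# alternative the warning message mentions.
mydata = pd.read_csv(io.StringIO(csv_text), dtype={"A": str}, low_memory=False)

print(mydata["A"].tolist())
```

With an explicit dtype, pandas no longer has to guess the column type chunk by chunk.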

How to make a dropdown list from an existing list in python (pandas and tkinter)

I want the user to submit their CSV file and choose, from a dropdown menu, the columns they want to analyze.
import pandas as pd
import os
import sys
from tkinter import *
root = Tk()
root.title('Eng3')
filepath = input('Enter filepath: ')
assert os.path.exists(filepath), "I did not find the file at, " + str(filepath)
f = open(filepath, 'r+')
print("Hooray we found your file!")
f.close()
file = pd.read_csv(filepath, encoding='latin1', delimiter=',')
column_list = file.columns.tolist()
print(column_list)
So I turned the column names from the file into a list. How can I make a dropdown menu from this list (column_list) that shows all the column names? When I tried:
tkvar = StringVar(column_list)
menu = OptionMenu(root, tvkar, column_list)
I get this error:
AttributeError: 'list' object has no attribute '_root'

I looked around and found this post: How can I create a dropdown menu from a List in Tkinter? It was very useful.
file = pd.read_csv(filepath, encoding='latin1', delimiter=',')
column_list = file.columns.tolist() #convert pandas dataframe to simple python list
OPTIONS = column_list #this is what solved my problem
master = Tk()
master.title('Eng3')
variable = StringVar(master)
variable.set(OPTIONS[0]) # default value
w = OptionMenu(master, variable, *OPTIONS)
w.pack()
def ok():
    print("value is:" + variable.get())
button = Button(master, text="OK", command=ok)
button.pack()

How to display pandas dataframe properly using tkinter?

I am fairly new to Python and I am attempting to create a tool which displays the number of rows and columns of all sheets of the Excel workbooks in a folder. I want to display a DataFrame as the final result using tkinter, but the display does not come out correctly: the last two columns of the DataFrame appear on a new line. How can I fix this? I have tried PyQt5, but it kept crashing my kernel, and I have tried Treeviews, but I can't figure out how to write this DataFrame properly to a Treeview. Below is my current code:
import pandas as pd
import tkinter as tk
import glob
import os
import xlrd
def folder_row_count():
    folder_path = f_path_entry.get()
    file_extension = file_ext_var.get()
    window = tk.Tk()
    t1 = tk.Text(window)
    t1.grid()
    if file_extension == "xlsx":
        filenames = []
        sheetnames = []
        sheetrows = []
        sheetcols = []
        for fname in glob.glob(os.path.join(folder_path, f"*.{file_extension}")):
            wb = xlrd.open_workbook(fname)
            filename = []
            sheetname = []
            sheetrow = []
            sheetcol = []
            for sheet in wb.sheets():
                filename.append(os.path.basename(fname))
                sheetname.append(sheet.name)
                sheetrow.append(sheet.nrows)
                sheetcol.append(sheet.ncols)
            filenames.append(filename)
            sheetnames.append(sheetname)
            sheetrows.append(sheetrow)
            sheetcols.append(sheetcol)
        flat_filenames = [item for filename in filenames for item in filename]
        flat_sheetnames = [item for sheetname in sheetnames for item in sheetname]
        flat_sheetrows = [item for sheetrow in sheetrows for item in sheetrow]
        flat_sheetcols = [item for sheetcol in sheetcols for item in sheetcol]
        df = pd.DataFrame({'File Name': flat_filenames,
                           'Sheet Name': flat_sheetnames,
                           'Number Of Rows': flat_sheetrows,
                           'Number Of Columns': flat_sheetcols
                           })
        main_df = df.append(df.sum(numeric_only = True).rename('Total'))
        t1.insert(tk.END, main_df)
    window.mainloop()
file_ext_list = ["xlsx"]
window = tk.Tk()
window.title("Row Counter")
tk.Label(window, text = "Choose File Type:").grid(row = 1, column = 0)
file_ext_var = tk.StringVar(window)
file_ext_dd = tk.OptionMenu(window, file_ext_var, *file_ext_list)
file_ext_dd.config(width = 10)
file_ext_dd.grid(row = 1, column = 1)
tk.Label(window, text = "Folder Path:").grid(row = 2, column = 0)
f_path_entry = tk.Entry(window)
f_path_entry.grid(row = 2, column = 1)
tk.Button(window, text = "Count Rows", command = folder_row_count).grid(row = 4, column = 1)
window.mainloop()
Secondly, I would greatly appreciate any commentary on how I can improve upon this code and make it more efficient.
Thanks in advance.

You just need to iterate over your df by iterrows and insert them into your Treeview. Below is a basic sample:
import tkinter as tk
from tkinter import ttk
import pandas as pd
root = tk.Tk()
sample = {"File Name":[f"file_{i}" for i in range(5)],
'Sheet Name': [f"sheet_{i}" for i in range(5)],
'Number Of Rows': [f"row_{i}" for i in range(5)],
'Number Of Columns': [f"col_{i}" for i in range(5)]
}
df = pd.DataFrame(sample)
cols = list(df.columns)
tree = ttk.Treeview(root)
tree.pack()
tree["columns"] = cols
for i in cols:
    tree.column(i, anchor="w")
    tree.heading(i, text=i, anchor='w')
for index, row in df.iterrows():
    tree.insert("",0,text=index,values=list(row))
root.mainloop()
Also, I see you are using xlrd to read your Excel files before building the DataFrame. Why don't you use pandas.read_excel instead?
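A rough sketch of what the read_excel route could look like. pd.read_excel(path, sheet_name=None) returns a dict mapping sheet names to DataFrames; to keep the example runnable without a workbook, the dict here is built by hand, and the helper name summarize_sheets and the file name book.xlsx are made up for illustration.

```python
import pandas as pd


def summarize_sheets(file_name, sheets):
    """Build one summary row per sheet from a {sheet_name: DataFrame} mapping."""
    rows = [
        {
            "File Name": file_name,
            "Sheet Name": name,
            # len(frame) counts data rows only; the header row is not included.
            "Number Of Rows": len(frame),
            "Number Of Columns": frame.shape[1],
        }
        for name, frame in sheets.items()
    ]
    return pd.DataFrame(rows)


# In real use the mapping would come from:
#   sheets = pd.read_excel(path, sheet_name=None)
sheets = {
    "Scoring": pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
    "Sheet1": pd.DataFrame({"x": [1.0]}),
}
summary = summarize_sheets("book.xlsx", sheets)

# Append a 'Total' row with pd.concat (DataFrame.append was removed in pandas 2.0).
total = summary[["Number Of Rows", "Number Of Columns"]].sum().rename("Total")
summary_with_total = pd.concat([summary, total.to_frame().T])

print(summary_with_total)
```

The resulting frame can then be fed to the Treeview loop above row by row.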

How do I call a file imported through tkinter filedialog?

I am trying to create a very simple GUI that will import a file, run it through some data formatting code, and export it as an .xlsx file.
The input would be an Excel file. An example would be:
col1
a
b
c
and my current python script does this:
df = read_excel('file.xlsx')
mapping = {'a':'apple','b':'banana','c':'carrot'}
df = df.replace({"col1":mapping}, regex=True)
and it returns:
col1
apple
banana
carrot
but now I am trying to create a GUI that will run it instead.
This is the code I have so far (I get the error ValueError: DataFrame constructor not properly called!):
import tkinter as tk
from tkinter import filedialog
import pandas as pd
from datetime import datetime, date
def UploadAction(event=None):
    filename = filedialog.askopenfilename()
    print('Selected:', filename)
    df = pd.DataFrame(eval(data=filename))
    mmapping = {'a':'apple','b':'banana','c':'carrot'}
    df = df.replace({"col1":mapping}, regex=True)
    print(df['col1'])
root = tk.Tk()
button = tk.Button(root, text='Open', command=UploadAction)
button.pack()
root.mainloop()
For the excel export, I know the code should be:
writer = pd.ExcelWriter("newfile.xlsx",
engine='xlsxwriter',
datetime_format='yyyymmdd',
date_format='yyyymmdd')
df.to_excel(writer, sheet_name = ('Sheet1'))
workbook = writer.book
worksheet = writer.sheets['Sheet1']
worksheet.set_column('B:C', 20)
writer.save()
but I am not sure of how to include it in the GUI program.
So what would you suggest?

There you go:
import pandas as pd
import tkinter as tk
from tkinter import filedialog
def open_file():
    # open file
    filename = filedialog.askopenfilename()
    # load data into data frame
    data = pd.read_csv(filename, sep=" ", header=None)
    return data
root = tk.Tk()
button = tk.Button(root, text='Open', command=open_file)
button.pack()
# do something with data
df_data = open_file()
df = df_data.drop(0, axis=1)
# save data to excel
df.to_excel("output.xlsx")
root.mainloop()
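Since the question is about an Excel file, here is one way the pieces might fit together, as a sketch only. The formatting step is kept in a plain function (format_frame, a name invented here) so it can be tried without a GUI; upload_action would be passed to the button as command=upload_action. Exact-match replacement is used instead of regex=True, which is an assumption about what the mapping is meant to do.

```python
import pandas as pd

MAPPING = {"a": "apple", "b": "banana", "c": "carrot"}


def format_frame(df):
    """Return a copy of df with col1 values replaced according to MAPPING."""
    out = df.copy()
    out["col1"] = out["col1"].replace(MAPPING)
    return out


def upload_action():
    """Button callback: pick a workbook, format it, write newfile.xlsx."""
    # Imported here so format_frame can be used on machines without a display.
    from tkinter import filedialog

    filename = filedialog.askopenfilename(filetypes=[("Excel files", "*.xlsx")])
    if not filename:  # the user cancelled the dialog
        return
    df = format_frame(pd.read_excel(filename))
    df.to_excel("newfile.xlsx", index=False)


# Quick check of the formatting step on the example from the question.
demo = format_frame(pd.DataFrame({"col1": ["a", "b", "c"]}))
print(demo)
```

Doing the work inside the callback means the DataFrame is only built after the user has actually picked a file.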

Python pass variable values in functions

This is my full code. All I want is to append the Excel files from a specific folder into one Excel file, sheet by sheet. It is a GUI with 3 buttons: browse, append, and quit. How do I get the path value of the browsed folder (filename)? Thanks.
from tkinter import *
from tkinter.filedialog import askdirectory
import tkinter as tk
import glob
import pandas as pd
import xlrd
root = Tk()
def browsefunc():
    filename = askdirectory()
    pathlabel.config(text=filename)
    return filename
def new_window():
    all_data = pd.DataFrame()
    all_data1 = pd.DataFrame()
    path = browsefunc()+"/*.xlsx"
    for f in glob.glob(path):
        df = pd.read_excel(f,sheetname='Scoring',header=0)
        df1 = pd.read_excel(f,sheetname='Sheet1',header=0)
        all_data = all_data.append(df,ignore_index=False)
        all_data1 = all_data1.append(df1,ignore_index=True)
    writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
    all_data.to_excel(writer, sheet_name='Scoring')
    all_data1.to_excel(writer, sheet_name='Sheet1')
    writer.save()
browsebutton = Button(root, text="Browse", command=browsefunc).pack()
Button(root, text='Append', command=new_window).pack()
Button(root, text='quit', command=root.destroy).pack()
pathlabel = Label(root)
pathlabel.pack()
mainloop()

It is not entirely clear what you are asking, so can you edit the question to be more specific?
I think you are trying to get the local variable filename (from inside the function browsefunc) able to be accessed outside the function as a global variable. Use return. This tutorial explains it nicely.
At the end of browsefunc you add
return filename
and when you call browsefunc you run
path = browsefunc()
That assigns to the variable path whatever you return from browsefunc. It can be an integer, float, string, list, etc.
So, final code is:
def browsefunc():
    filename = askdirectory()
    pathlabel.config(text=filename)
    return filename
def new_window():
    path = browsefunc()
I would recommend using more explicit variable and function names.
