Open file dialog for pandas dataframe within tkinter app - python

I'm fairly new to Python, so I hope this isn't a bad question. I wrote a script to trim a very large CSV down to a usable file for reporting. The script works well, but I'd like to turn it into a Windows app using tkinter. I'm having a hard time figuring out whether I can define a pandas DataFrame from an open-file dialog within tkinter (prompting the user for the file and location instead of hard-coding the file path).
I'm also interested in whether I can have a user input box that provides a text value for filtering the data. (For example, if a user enters "CA", it would filter the column for State = "CA".)
Below is the full script that I'm trying to turn into a tkinter app.
import pandas as pd
import tkinter as tk
import numpy as np
import openpyxl as xl
from tkinter import *
from tkinter.ttk import *
from tkinter import filedialog
window = Tk()
# DataFrame 1 Can this be changed to a button with askopenfilename for Tkinter App?
df1 = pd.read_csv(r'G:\1GEO\GiganticFile.csv', encoding='ISO-8859-1')
# DataFrame 2 Can this be changed to a button with askopenfilename for Tkinter App?
df2 = pd.read_csv(r'G:\1GEO\FIPS LU.csv')
# Change DF2 FIPS & DF1 BlockCode column to string, create block length field
df2['FIPS']=df2['FIPS'].apply(str)
df1['Block_LEN'] = df1['BlockCode'].apply(len)
df1['BlockCode']=df1['BlockCode'].apply(str)
#Slice based on character length of BlockCode field
df1.loc[df1['Block_LEN'] == 15, 'FIPS'] = df1['BlockCode'].str.slice(0,5)
df1.loc[df1['Block_LEN'] == 14, 'FIPS'] = df1['BlockCode'].str.slice(0,4)
# Create DataFrame 3 from merging DF1 & DF2
df3 = pd.merge(df1, df2, on ='FIPS', how ='inner')
# Filter on State & County Name, Can these two fields be user input fields in the Tkinter App?
County_Results = df3[(df3['StateAbbr']=='CA') & (df3['County Name']=='Modoc')]
# Print head for testing purpose, replace with Excel Export on finalizing App
County_Results.head()
window.mainloop()
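One way the hard-coded path and the hard-coded filter value above might be replaced is sketched below. This is a minimal sketch, not a definitive implementation: the helper names `filter_rows`, `main`, `open_csv`, and `apply_filter` are hypothetical, and it assumes the loaded file has a `StateAbbr` column as in the script above. `filedialog.askopenfilename` returns the chosen path (or an empty string if the user cancels), and that path can be passed straight to `pd.read_csv`.

```python
import pandas as pd


def filter_rows(df, column, value):
    """Return the rows of df whose `column` equals `value`, compared as text."""
    return df[df[column].astype(str) == value.strip()]


def main():
    # tkinter is imported here so filter_rows can be reused without a display.
    import tkinter as tk
    from tkinter import filedialog, messagebox

    window = tk.Tk()
    window.title("CSV filter")
    state = {"df": None}  # holds the DataFrame loaded by the button below

    def open_csv():
        # askopenfilename returns the chosen path, or an empty string on cancel.
        path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
        if path:
            state["df"] = pd.read_csv(path, encoding="ISO-8859-1")

    def apply_filter():
        if state["df"] is None:
            messagebox.showinfo("No data", "Open a CSV file first.")
            return
        # Assumption: the column is named StateAbbr, as in the script above.
        result = filter_rows(state["df"], "StateAbbr", state_entry.get())
        print(result.head())

    tk.Button(window, text="Open CSV...", command=open_csv).pack()
    state_entry = tk.Entry(window)
    state_entry.pack()
    tk.Button(window, text="Filter", command=apply_filter).pack()
    window.mainloop()
```

Calling `main()` opens the window. Keeping the filtering in a plain function means the pandas logic can be checked without the GUI, and the second file and the county filter could be handled with another button and another `Entry` in the same way.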

Related

Python Script saves files one directory above user input directory

I am writing a Python script that uses tkinter to take a .xlsx file from the user, groups the data in it by location, and then exports an individual CSV file for each unique value of location, keeping only the columns I tell it to keep. The issue is that when the user selects the directory to store the files in, the script saves the files one directory above it.
For example, if the user selects \Desktop\XYZ\Test as the directory for the files, the script saves the exported files one directory above it, i.e. in \Desktop\XYZ, and adds the name of the subdirectory, Test, to the exported file name. The code I'm using is attached below.
This is probably a simple issue, but being a newbie I'm at my wits' end trying to resolve it, so any help is appreciated.
Code:
import pandas as pd
import csv
import locale
import os
import sys
import unicodedata
import tkinter as tk
from tkinter import simpledialog
from tkinter.filedialog import askopenfilename
from tkinter import *
from tkinter import ttk
ROOT = tk.Tk()
ROOT.withdraw()
data_df = pd.read_excel(askopenfilename())
grouped_df = data_df.groupby('LOCATION')
folderpath = filedialog.askdirectory()
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
    f=pd.read_csv(folderpath+filename+".csv", sep=',')
    #print f
    keep_col = ['ID','NAME','DATA1','DATA4']
    new_f = f[keep_col]
    new_f.to_csv(folderpath+data[0]+".csv", index=False)
Sample data
P.S. There will be data in the DATA3 and DATA4 columns, but I just didn't enter it here.
How the Script is giving the output:
Thanks in Advance!
It seems that the return value of filedialog.askdirectory() is the folder the user selected, without a trailing slash, i.e.:
\Desktop\XYZ\Test
The full path created by folderpath+data[0]+".csv", with an example value of "potato" for data[0], will therefore be
\Desktop\XYZ\Testpotato.csv
You need to at least append the \ manually:
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+"\\"+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
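As an alternative to appending the separator by hand, the path can be built with `os.path.join`, which inserts the separator for the current platform. The following is a minimal sketch under stated assumptions: the function name `export_groups` and the demo data are hypothetical, and a temporary folder stands in for the value returned by `filedialog.askdirectory()`.

```python
import os
import tempfile

import pandas as pd


def export_groups(df, folderpath, key="LOCATION", keep_cols=("ID", "NAME")):
    """Write one CSV per unique value of `key` into folderpath; return the paths."""
    written = []
    for location, group in df.groupby(key):
        # os.path.join adds the separator that plain string concatenation omits.
        out_path = os.path.join(folderpath, f"{location}.csv")
        group[list(keep_cols)].to_csv(out_path, index=False, encoding="utf-8")
        written.append(out_path)
    return written


# Demo data; a temporary folder stands in for the askdirectory() result.
demo = pd.DataFrame({
    "ID": [1, 2, 3],
    "NAME": ["a", "b", "c"],
    "LOCATION": ["potato", "potato", "leek"],
})
with tempfile.TemporaryDirectory() as folder:
    paths = export_groups(demo, folder)
    folder_used = folder
    parents = {os.path.dirname(p) for p in paths}
    names = sorted(os.path.basename(p) for p in paths)

print(names)
```

Selecting the columns before the first write also avoids writing each file, reading it back, and writing it again.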

Write excel Function FILTER with Python

I am trying to write a formula into my Excel sheet from Python, because after I run my code all the existing formulas break (they start with {=FILTER instead of =FILTER( ).
That is why I am writing the function directly into the cell.
from tkinter import filedialog, scrolledtext, messagebox, ttk
from tkinter import *
import pandas
from decimal import Decimal
import openpyxl
import os
import re
book = openpyxl.load_workbook(filename,keep_vba=True,data_only=True)
book['POF Workbook']['Q33']= '=COUNTIF(B2:B10, ">0")'  # works
book['POF Workbook']['D33']= 'FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")"  # does not work
In Excel I get:
FILTER(Table1,Table1[POF "#]='POF Workbook'!A3,"No data")
When I try to write book['POF Workbook']['D33']= '=FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")" nothing happens.
I have not been able to find a solution in 3 days :(

Tkinter/Pandas: Selecting a CSV and editing with Pandas

I'm trying to grab a file path using a drop-down menu with tkinter, then edit that CSV using pandas.
Here's the code:
from tkinter import Tk
from tkinter.filedialog import askdirectory, askopenfilename
import tkinter.messagebox as tkmb
import pandas as pd
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
def wrapQuotes(fileString):
    return "'{}'".format(fileString)
Tk().withdraw() #get rid of the tkinter window
tkmb.showinfo(title=' ', message='Select File')
filePath = askopenfilename() #dialogue box for original file
wrapQuotes(filePath)
testPandas(filePath)
print(filePath)
Here's the CSV:
The process I'm going with is:
(1) Using tkinter, I create a drop-down menu, the user selects the file, and I get the file path
(2) I pass the file path to my testPandas function, which should delete the first two rows
At first I thought the CSV wasn't being edited because the path wasn't wrapped in quotation marks, which is why I added wrapQuotes, but that didn't seem to do anything.
When I run my program, the csv stays the same.
Any help would be appreciated!
The issue appears to be that you use pandas.DataFrame.drop as if it were an in-place method. By default, most pandas methods are not in-place: you need to assign the returned object to a variable in order to use it, or pass the inplace=True argument in the method call:
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data = data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
or
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]], inplace=True)
    data.to_csv(filePath, index = False)
Either version should fix your issue. The wrapQuotes function is not needed.
You make the same mistake twice: you have to assign the result to a variable
data = data.drop(...)
and
filePath = wrapQuotes(filePath)
BTW: you don't need to wrap the file path in quotes to read it.
Alternatively, you can use the option inplace=True
data.drop(..., inplace=True)
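The difference between discarding and keeping the result of drop can be seen on a small in-memory frame. This is an illustrative sketch with made-up data; it assumes that "the first two rows" means positions 0 and 1 (the question's data.index[[1,2]] selects the second and third data rows instead).

```python
import pandas as pd

data = pd.DataFrame({"value": [10, 20, 30, 40]})

# The returned frame is discarded here, so `data` keeps all four rows.
data.drop(data.index[[0, 1]])
unchanged_len = len(data)

# Keeping the returned frame gives the trimmed result.
trimmed = data.drop(data.index[[0, 1]])

print(unchanged_len, len(trimmed))
print(trimmed["value"].tolist())
```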

Pyviz panel: can't work with FileInput widget

I must be missing something basic about how FileInput widget works in pyviz panel.
In the following code, I let the user select a csv file and the number of rows to display. If a file isn't selected, I generate some random data.
import pandas as pd; import numpy as np; import matplotlib.pyplot as plt
import panel as pn
import panel.widgets as pnw
pn.extension()
datafile = pnw.FileInput()
head = pnw.IntSlider(name='head', value=3, start=1, end=60)
@pn.depends(datafile, head)
def f(datafile, head):
    if datafile is None:
        data = pd.DataFrame({'x': np.random.rand(10)})
    else:
        data = pd.read_csv(datafile)
    return pn.Column(f'## {head} first rows', data.head(head))
widgets = pn.Column(datafile, head)
col = pn.Column(widgets, f)
col
Here's the problem. If I don't select a file and play with the head widget, the pane acts as expected: the number of displayed rows changes as I change the head widget, and I can see that the data is different after each update.
However, once I select a file, two problems occur. First, the data isn't loaded. Secondly, the column stops reacting to my interactions.
Can anybody tell me what my problem is?
The problem in the code above is that the datafile argument of function f is not a file name but the file contents, as a bytes object. Passing it directly to pd.read_csv makes the function throw an unhandled exception that, unfortunately, isn't reported anywhere.
Thus, after adding import io, the data-reading line should be
data = pd.read_csv(io.BytesIO(datafile))
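The same idea can be tried outside Panel. In this minimal sketch the helper name `read_uploaded_csv` and the sample bytes are hypothetical stand-ins for the value the FileInput widget passes to f; `io.BytesIO` wraps the bytes in a file-like object that `pd.read_csv` can read.

```python
import io

import pandas as pd


def read_uploaded_csv(raw):
    """Parse CSV content given as bytes (not as a file name)."""
    return pd.read_csv(io.BytesIO(raw))


# Stand-in for the bytes a file upload would provide.
raw = b"x,y\n1,2\n3,4\n"
df = read_uploaded_csv(raw)
print(df.head(1))
```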

Python not reading all the information from Excel properly

I am trying to open a few Excel files inside a directory and then do things with the data (like taking the average of one row across three files).
My main goal right now is just to display the information in each Excel file. I used the following code to do so. But when I display it, it prints out element 0 to element 29, then it skips 30-50 and prints out 51-80.
Here is a snip of my output on python:
import numpy as np
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import os
import pandas as pd
from tkinter import filedialog
from tkinter import *
import matplotlib.image as image
import xlsxwriter
import openpyxl
import xlwt
import xlrd
#GUI
root=Tk()
root.withdraw() #closes tkinter window pop-up
path=filedialog.askdirectory(parent=root,title='Choose a file')
path=path+'/'
print('Folder Selected',path)
files=os.listdir(path)
length=len(files)
print('Files inside the folder',files)
Files=[]
for s in os.listdir(path):
    Files.append(pd.read_excel(path+s))
print (Files)
I'm quite sure your data is being read correctly. The dots between rows 29 and 51 indicate that there is more data there; pandas elides these rows so that your console output stays readable. If you want to see all the rows, you can use pd.option_context:
with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
    print(Files)
Here None removes the display limit on rows and 3 sets the display limit on columns. The pandas documentation on options has more details.
This is actually the standard way pandas prints the data. Notice the ellipsis between 29 and 51:
29 7.8000 [cont.]
...
51 12.19999 [cont.]
You can still operate on every row. To get the number of rows in a dataframe, you can call
len(df.index)
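Both points can be checked on a small frame. This is a sketch with made-up data: the row limit is set to 10 explicitly so the truncated output does not depend on local display settings, and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})

# With a row limit, pandas prints the head and tail with an ellipsis row between.
with pd.option_context("display.max_rows", 10):
    truncated_text = repr(df)

# With the limit removed, every row is printed.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)

print(truncated_text)
print(len(full_text.splitlines()))

# The data itself is complete in both cases.
print(len(df.index))
```

The display option only changes what is printed; operations such as averaging still see every row.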
