Write Excel function FILTER with Python

I'm trying to write the function into my Excel sheet from Python, because after I run the code all the existing functions break (they start with {=FILTER instead of =FILTER( ).
That's why I write the function directly into the cell.
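from tkinter import filedialog, scrolledtext, messagebox, ttk
from tkinter import *
import pandas
from decimal import Decimal
import openpyxl
import os
import re
# filename: path of the macro-enabled workbook (defined elsewhere in the script)
# Note: data_only=True loads each formula cell's last cached value instead of
# the formula, so saving this workbook writes those values back in place of the formulas
book = openpyxl.load_workbook(filename, keep_vba=True, data_only=True)
book['POF Workbook']['Q33'] = '=COUNTIF(B2:B10, ">0")'  # works
book['POF Workbook']['D33'] = 'FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")"  # doesn't work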
In Excel I have:
FILTER(Table1,Table1[POF "#]='POF Workbook'!A3,"No data")
When I try to write book['POF Workbook']['D33']= '=FILTER(Table1,Table1[POF "'"#]='POF Workbook'!A3,"+'"No data"'+")" (with the leading =), nothing happens.
I haven't been able to find a solution for 3 days :(
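A minimal sketch of one commonly reported workaround, under stated assumptions: Excel stores newer dynamic-array functions in the file under a prefixed name (FILTER is saved as `_xlfn._xlws.FILTER`), while openpyxl copies the formula text into the file unchanged, so a plain `=FILTER(...)` may not be recognized when Excel opens the workbook. The helper `filter_formula` is hypothetical, and the criteria string is copied from the question. Because openpyxl writes an ordinary cell formula, Excel may show it with an implicit-intersection `@` or not spill the results, so check the behaviour in your own Excel version.

```python
from openpyxl import Workbook


def filter_formula(array: str, include: str, if_empty: str = '"No data"') -> str:
    """Build a FILTER formula string for openpyxl (hypothetical helper).

    Assumption: Excel expects the stored function name _xlfn._xlws.FILTER,
    and openpyxl writes the formula text into the file exactly as given.
    """
    return f"=_xlfn._xlws.FILTER({array},{include},{if_empty})"


# Criteria copied from the formula shown in the question
criteria = """Table1[POF "#]='POF Workbook'!A3"""
formula = filter_formula("Table1", criteria)

# Demonstrated on an in-memory workbook with the same sheet name
book = Workbook()
sheet = book.active
sheet.title = "POF Workbook"
sheet["D33"] = formula  # a string starting with "=" is stored as a formula
print(sheet["D33"].value)
```

With the real file, load it with `openpyxl.load_workbook(filename, keep_vba=True)` (leaving `data_only` at its default of `False` so the existing formulas are kept), assign the string to `book['POF Workbook']['D33']`, and call `book.save(filename)`.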

Related

Open file dialog for pandas dataframe within tkinter app

I'm fairly new to Python, so I hope this isn't a bad question. I wrote a script to trim down a very large CSV to a usable file for reporting purposes. The script works well, but I'd like to turn it into a Windows app using tkinter. I'm having a hard time figuring out whether you can define a pandas DataFrame from an open-file dialog within tkinter (prompting the user for the file and location instead of hard-coding the file location).
I'm also interested in whether I can have a user input box that provides a text value for filtering the data (e.g. if a user enters "CA", it filters the column for State = "CA").
Below is my full script that I'm trying to turn into a tkinter app.
import pandas as pd
import tkinter as tk
import numpy as np
import openpyxl as xl
from tkinter import *
from tkinter.ttk import *
from tkinter import filedialog
window = Tk()
# DataFrame 1 Can this be changed to a button with askopenfilename for Tkinter App?
df1 = pd.read_csv(r'G:\1GEO\GiganticFile.csv', encoding='ISO-8859-1')
# DataFrame 2 Can this be changed to a button with askopenfileneame for Tkinter App?
df2 = pd.read_csv(r'G:\1GEO\FIPS LU.csv')
# Change DF2 FIPS & DF1 BlockCode columns to string, then create the block length field
# (len() needs strings, so the conversion must happen before the length is taken)
df2['FIPS']=df2['FIPS'].apply(str)
df1['BlockCode']=df1['BlockCode'].apply(str)
df1['Block_LEN'] = df1['BlockCode'].apply(len)
#Slice based on character length of BlockCode field
df1.loc[df1['Block_LEN'] == 15, 'FIPS'] = df1['BlockCode'].str.slice(0,5)
df1.loc[df1['Block_LEN'] == 14, 'FIPS'] = df1['BlockCode'].str.slice(0,4)
# Create DataFrame 3 from merging DF1 & DF2
df3 = pd.merge(df1, df2, on ='FIPS', how ='inner')
# Filter on State & County Name, Can these two fields be user input fields in the Tkinter App?
County_Results = df3[(df3['StateAbbr']=='CA') & (df3['County Name']=='Modoc')]
# Print head for testing purposes, replace with Excel export when finalizing the app
print(County_Results.head())
window.mainloop()
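Both parts of the question can be sketched with `filedialog.askopenfilename` (which returns the chosen path, or an empty string if the dialog is cancelled) and `Entry` widgets whose text is read when a button is pressed. This is a sketch, not a finished app: the helper names `prepare_fips`, `filter_county` and `build_app` are hypothetical, the column names and steps come from the script above, and tkinter is imported inside `build_app` so the pandas helpers can be used without a display.

```python
import pandas as pd


def prepare_fips(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Derive FIPS from BlockCode and merge with the lookup table (steps from the question)."""
    df1 = df1.copy()
    df2 = df2.copy()
    df2["FIPS"] = df2["FIPS"].astype(str)
    df1["BlockCode"] = df1["BlockCode"].astype(str)
    df1["Block_LEN"] = df1["BlockCode"].str.len()
    df1.loc[df1["Block_LEN"] == 15, "FIPS"] = df1["BlockCode"].str.slice(0, 5)
    df1.loc[df1["Block_LEN"] == 14, "FIPS"] = df1["BlockCode"].str.slice(0, 4)
    return pd.merge(df1, df2, on="FIPS", how="inner")


def filter_county(df3: pd.DataFrame, state: str, county: str) -> pd.DataFrame:
    """Keep rows matching the user-entered state abbreviation and county name."""
    return df3[(df3["StateAbbr"] == state) & (df3["County Name"] == county)]


def build_app():
    """Create the window; call build_app().mainloop() to run it."""
    import tkinter as tk
    from tkinter import filedialog, messagebox

    window = tk.Tk()
    paths = {}  # file paths chosen through the dialogs

    def choose(key):
        # askopenfilename returns "" if the user cancels the dialog
        path = filedialog.askopenfilename(filetypes=[("CSV files", "*.csv")])
        if path:
            paths[key] = path

    tk.Button(window, text="Open data CSV", command=lambda: choose("data")).pack()
    tk.Button(window, text="Open FIPS lookup CSV", command=lambda: choose("fips")).pack()

    state_entry = tk.Entry(window)
    county_entry = tk.Entry(window)
    tk.Label(window, text="State (e.g. CA)").pack()
    state_entry.pack()
    tk.Label(window, text="County name").pack()
    county_entry.pack()

    def run():
        if "data" not in paths or "fips" not in paths:
            messagebox.showwarning("Missing file", "Choose both CSV files first.")
            return
        df1 = pd.read_csv(paths["data"], encoding="ISO-8859-1")
        df2 = pd.read_csv(paths["fips"])
        result = filter_county(prepare_fips(df1, df2),
                               state_entry.get().strip(), county_entry.get().strip())
        print(result.head())  # replace with an Excel export when finalizing

    tk.Button(window, text="Filter", command=run).pack()
    return window
```

Keeping the data steps in plain functions separates them from the widgets, so they can be tried in a console before the GUI is wired up.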

Python Script saves files one directory above user input directory

I am writing a Python script that uses tkinter to take user input for an .xlsx file, group the data in it by location, and export an individual CSV file for each unique location value, keeping only the columns I specify. The issue is that when the user chooses the directory to store the files in, the script saves them one directory above it.
For example, let's say the user selects \Desktop\XYZ\Test as the directory for the files: the script saves the exported file one directory above it, i.e. in \Desktop\XYZ, and adds the subdirectory name Test to the exported file name. The code I'm using is below.
This is probably a simple issue, but being a newbie I'm at my wits' end trying to resolve it, so any help is appreciated.
Code:
import pandas as pd
import csv
import locale
import os
import sys
import unicodedata
import tkinter as tk
from tkinter import simpledialog
from tkinter import filedialog
from tkinter.filedialog import askopenfilename
from tkinter import *
from tkinter import ttk
ROOT = tk.Tk()
ROOT.withdraw()
data_df = pd.read_excel(askopenfilename())
grouped_df = data_df.groupby('LOCATION')
folderpath = filedialog.askdirectory()
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
    f=pd.read_csv(folderpath+filename+".csv", sep=',')
    #print f
    keep_col = ['ID','NAME','DATA1','DATA4']
    new_f = f[keep_col]
    new_f.to_csv(folderpath+data[0]+".csv", index=False)
Sample data
P.S. There will be data in the DATA3 and DATA4 columns, but I just didn't enter it here.
The output the script currently produces:
Thanks in Advance!
It seems like the return value of filedialog.askdirectory() ends with the folder the user selected, without a trailing slash, i.e.:
\Desktop\XYZ\Test
Your full path created by folderpath+data[0]+".csv", with an example value of "potato" for data[0], will be
\Desktop\XYZ\Testpotato.csv
You need to at least append the \ manually:
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+"\\"+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
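A slightly more portable variant, offered as a sketch: `os.path.join` inserts the separator between the folder and the file name for you, so the result no longer depends on whether `askdirectory()` returns a trailing slash. The function name `export_by_location` is hypothetical; the `LOCATION` grouping and the kept columns come from the question. It also writes only the kept columns in one step instead of writing, re-reading and rewriting each CSV.

```python
import os

import pandas as pd

KEEP_COLS = ["ID", "NAME", "DATA1", "DATA4"]


def export_by_location(data_df: pd.DataFrame, folderpath: str) -> list:
    """Write one CSV per LOCATION value into folderpath; return the paths written."""
    written = []
    for location, group in data_df.groupby("LOCATION"):
        # os.path.join adds the separator between the folder and the file name
        out_path = os.path.join(folderpath, f"{location}.csv")
        group[KEEP_COLS].to_csv(out_path, index=False, encoding="utf-8")
        written.append(out_path)
    return written
```

In the original script this would replace the loop, e.g. `export_by_location(data_df, filedialog.askdirectory())`.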

How to select multiple files or an entire folder (and display the names of all the files it contains) in Python using tkinter?

I have written code to display the contents of a file using tkinter's askopenfile() method. Now I need to either select an entire folder (directory) and print the names of the files it contains, or select multiple files.
I'm new to tkinter and am having a hard time understanding this. Is there any method to do this?
Thanks in advance.
I assume you are using Python 2, so here you go:
from Tkinter import *
import Tkinter, Tkconstants, tkFileDialog
root = Tk()
root.filename = tkFileDialog.askopenfilename(initialdir = "/")
print(root.filename)
Hope this helps!
FYI: I would suggest you update to Python 3. Python 2 was sunset on January 1st, 2020.
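For the question as asked (several files, or every file in a folder), a Python 3 sketch: `filedialog.askopenfilenames` returns the selected paths, and `filedialog.askdirectory` returns a folder path whose contents `os.listdir` can list. The helper names `files_in_folder` and `choose_and_print` are hypothetical, and tkinter is imported inside `choose_and_print` so the listing helper can be used on its own.

```python
import os


def files_in_folder(folder: str) -> list:
    """Return the names of the regular files directly inside folder, sorted."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def choose_and_print(select_folder: bool = True) -> None:
    """Open a dialog and print either a folder's file names or the chosen files."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window
    if select_folder:
        folder = filedialog.askdirectory(title="Choose a folder")
        if folder:  # an empty string means the dialog was cancelled
            for name in files_in_folder(folder):
                print(name)
    else:
        # This dialog lets the user select several files at once
        for path in filedialog.askopenfilenames(title="Choose files"):
            print(path)
    root.destroy()
```

Calling `choose_and_print()` lists a folder's files; `choose_and_print(select_folder=False)` prints the paths of the selected files.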

No lines of output shown after importing pandas

I installed pandas through pip, but when I import it, the script runs yet shows no output at all after the import statement.
Here's a sample of my code:
import xlrd, xlwt
print("1")
import pandas as pd
print("2")
from math import trunc
1 is printed, but 2 isn't. After 1 is printed, the script just hangs for a few seconds and then terminates. This happens regardless of the code written below the import statement. I also seem to get the same error with the openpyxl module. Does anyone know a fix for this?

Python not reading all the information from Excel properly

I am trying to open a few Excel files inside a directory and then be able to do things with the data (like taking the average of one row across three files).
My main goal right now is just to display the information in each Excel file. I used the following code to do so. But when I display it, it prints out the '0' element to the '29' element... then it skips 30-50 and prints out 51-80.
Here is a snip of my output on python:
import numpy as np
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import os
import pandas as pd
from tkinter import filedialog
from tkinter import *
import matplotlib.image as image
import xlsxwriter
import openpyxl
import xlwt
import xlrd
#GUI
root=Tk()
root.withdraw() #closes tkinter window pop-up
path=filedialog.askdirectory(parent=root,title='Choose a file')
path=path+'/'
print('Folder Selected',path)
files=os.listdir(path)
length=len(files)
print('Files inside the folder',files)
Files=[]
for s in os.listdir(path):
    Files.append(pd.read_excel(path+s))
print (Files)
I'm quite sure your data is being read correctly. The dots between rows 29 and 51 show that there is more data there: pandas elides these rows so your console output looks cleaner. If you want to see all the rows, you could use the solution from this answer:
with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
    print(Files)
Here, None sets the display limit on rows (no limit) and 3 sets the display limit on columns. Here you can find more info on options.
This is actually the standard way pandas prints the data; notice the ellipsis between 29 and 51:
29 7.8000 [cont.]
...
51 12.19999 [cont.]
You can still operate on every row. To get the number of rows in a dataframe, you can call
len(df.index)
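As a sketch of the same idea applied to the question's folder loop (the helper names `read_excel_folder` and `row_counts` are hypothetical): build each path with `os.path.join`, skip anything that is not an Excel workbook, including the `~$` lock files Excel creates while a workbook is open, and report `len(df.index)` per file instead of relying on the truncated printout.

```python
import os

import pandas as pd


def read_excel_folder(path: str) -> dict:
    """Read every .xlsx/.xls workbook in path into a dict of DataFrames keyed by file name."""
    frames = {}
    for name in sorted(os.listdir(path)):
        # Skip non-Excel files and Excel's temporary "~$" lock files
        if name.lower().endswith((".xlsx", ".xls")) and not name.startswith("~$"):
            frames[name] = pd.read_excel(os.path.join(path, name))
    return frames


def row_counts(frames: dict) -> dict:
    """Map each file name to the number of rows read from it."""
    return {name: len(df.index) for name, df in frames.items()}
```

For example, `print(row_counts(read_excel_folder(path)))` after the `askdirectory` call shows how many rows each file really contains.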
