I'm trying to grab a filepath using a drop down menu with Tkinter, then edit that CSV using Pandas.
Here's the code:
from tkinter import Tk
from tkinter.filedialog import askdirectory, askopenfilename
import tkinter.messagebox as tkmb
import pandas as pd
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
def wrapQuotes(fileString):
    return "'{}'".format(fileString)
Tk().withdraw() #get rid of the tkinter window
tkmb.showinfo(title=' ', message='Select File')
filePath = askopenfilename() #dialogue box for original file
wrapQuotes(filePath)
testPandas(filePath)
print(filePath)
Here's the CSV:
The process I'm going with is:
(1) Using tkinter, I create a drop down menu, the user selects the file and I get the filepath
(2) I read the filepath in through my testPandas function and I would delete the first two rows
At first I was thinking the CSV wasn't getting edited because the path wasn't wrapped in quotation marks, so that's why I added wrapQuotes, but that didn't look like it did anything.
When I run my program, the csv stays the same.
Any help would be appreciated!
The issue appears to be that you are calling pandas.DataFrame.drop as if it modified the DataFrame in place. By default, most pandas methods are not in-place: you need to assign the returned object to a name in order to use it, or pass the inplace=True argument in the method call:
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data = data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
or
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]], inplace=True)
    data.to_csv(filePath, index = False)
Either version should fix your issue. The wrapQuotes function is not needed.
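To see the difference in isolation, here is a minimal sketch on a small made-up frame (the column name `value` is hypothetical). It also shows that `data.index[[1, 2]]` selects the second and third rows; if the goal is the first two rows, as the question describes, `data.index[[0, 1]]` would be the positions to drop.

```python
import pandas as pd

data = pd.DataFrame({"value": [10, 20, 30, 40]})

# drop() returns a new frame; without assignment the original is unchanged.
data.drop(data.index[[1, 2]])
rows_after_unassigned_drop = len(data)

# Assigning the result keeps the change. Positions 1 and 2 are the
# second and third rows, so rows 0 and 3 remain.
second_third_removed = data.drop(data.index[[1, 2]])

# Dropping positions 0 and 1 removes the first two rows instead.
first_two_removed = data.drop(data.index[[0, 1]])
```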
You make the same mistake twice: you have to assign the result to a variable
data = data.drop(...)
and
filePath = wrapQuotes(filePath)
BTW: you don't need to wrap the file path in quotes to read it.
Alternatively, you can use the inplace=True option
data.drop(..., inplace=True)
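As for the quotes: the string returned by askopenfilename() is already a usable path, and adding literal quote characters makes it point at a file that does not exist. A small sketch, with a temporary file standing in for the selected CSV (the file is hypothetical):

```python
import os
import tempfile

# Create a throwaway file to stand in for the one picked in the dialog.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    file_path = tmp.name

quoted_path = "'{}'".format(file_path)

plain_exists = os.path.exists(file_path)     # the real path
quoted_exists = os.path.exists(quoted_path)  # the quotes become part of the name

os.remove(file_path)
```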
Related
I've searched for about an hour for an answer to this and none of the solutions I've found are working. I'm trying to get a folder full of CSVs into a single dataframe, to output to one big csv. Here's my current code:
import os
import pandas as pd
sourceLoc = "SOURCE"
destLoc = sourceLoc + "MasterData.csv"
masterDF = pd.DataFrame([])
for file in os.listdir(sourceLoc):
    workingDF = pd.read_csv(sourceLoc + file)
    print(workingDF)
    masterDF.append(workingDF)
print(masterDF)
The SOURCE is a folder path but I've had to remove it as it's a work network path. The loop is reading the CSVs to the workingDF variable as when I run it it prints the data into the console, but it's also finding 349 rows for each file. None of them have that many rows of data in them.
When I print masterDF it prints Empty DataFrame Columns: [] Index: []
My code is from this solution but that example is using xlsx files and I'm not sure what changes, if any, are needed to get it to work with CSVs. The Pandas documentation on .append and read_csv is quite limited and doesn't indicate anything specific I'm doing wrong.
Any help would be appreciated.
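There are a couple of things wrong with your code, but the main one is that DataFrame.append returns a new DataFrame instead of modifying the existing one in place (note that it was deprecated in pandas 1.4 and removed in 2.0, where pd.concat is the replacement). So you would have to do: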
masterDF = masterDF.append(workingDF)
I also like the approach taken by I_Al-thamary - concat will probably be faster.
One last thing I would suggest: instead of using glob, check out pathlib.
import pandas as pd
from pathlib import Path
path = Path("your path")
df = pd.concat(map(pd.read_csv, path.rglob("*.csv")))
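For reference, here is a self-contained sketch of the concat approach. It assumes every file in the folder shares the same columns; the temporary folder and the file names are made up for the demonstration. Passing `ignore_index=True` renumbers the rows of the combined frame instead of repeating each file's own 0, 1, 2, ... index.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a throwaway folder with two small CSVs to stand in for the source folder.
source = Path(tempfile.mkdtemp())
pd.DataFrame({"a": [1, 2]}).to_csv(source / "first.csv", index=False)
pd.DataFrame({"a": [3]}).to_csv(source / "second.csv", index=False)

# Read every CSV (sorted for a predictable order) and stack them once.
frames = [pd.read_csv(csv_file) for csv_file in sorted(source.glob("*.csv"))]
master_df = pd.concat(frames, ignore_index=True)
```

If the combined file is written back into the same folder, as destLoc in the question suggests, it may be worth excluding it from the next run so it is not read in as input.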
You can use glob:
import glob
import pandas as pd
import os
path = "your path"
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(path,'*.csv'))))
print(df)
You can store them all in a list and pd.concat them at the end.
dfs = [
    pd.read_csv(os.path.join(sourceLoc, file))
    for file in os.listdir(sourceLoc)
]
masterDF = pd.concat(dfs)
I am writing a Python script which uses tkinter to take the user's input for a .xlsx file and segregate the data present in it by grouping the data by location, then exporting an individual CSV file for each unique value of location, keeping only the columns I tell it to keep. The issue is that, when taking the user's input for the directory to store the files in, the script saves the files one directory above it.
For example, let's say the user selects \Desktop\XYZ\Test as the directory for the files to be saved in. The script saves the exported file one directory above it, i.e. in \Desktop\XYZ, while adding the name of the subdirectory Test to the exported file name. The code I'm using is attached below.
This is probably a simple issue, but being a newbie I'm at my wits' end trying to resolve it, so any help is appreciated.
Code:
import pandas as pd
import csv
import locale
import os
import sys
import unicodedata
import tkinter as tk
from tkinter import simpledialog
from tkinter.filedialog import askopenfilename
from tkinter import *
from tkinter import ttk
ROOT = tk.Tk()
ROOT.withdraw()
data_df = pd.read_excel(askopenfilename())
grouped_df = data_df.groupby('LOCATION')
folderpath = filedialog.askdirectory()
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
    f=pd.read_csv(folderpath+filename+".csv", sep=',')
    #print f
    keep_col = ['ID','NAME','DATA1','DATA4']
    new_f = f[keep_col]
    new_f.to_csv(folderpath+data[0]+".csv", index=False)
Sample data
P.S. There will be data in the DATA3 and DATA4 columns, but I just didn't enter it here
How the Script is giving the output:
Thanks in Advance!
It seems that the return value of filedialog.askdirectory() ends with the folder the user selected, without a trailing slash, i.e.:
\Desktop\XYZ\Test
Your full path, created by folderpath+data[0]+".csv" with an example value of "potato" for data[0], will be
\Desktop\XYZ\Testpotato.csv
You need to at least append the \ manually:
for data in grouped_df.LOCATION:
    grouped_df.get_group(data[0]).to_csv(folderpath+"\\"+data[0]+".csv",encoding='utf-8', mode='w+')
    filename =data[0]
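A variant that avoids hard-coding the separator is to let pathlib (or os.path.join) build the path. The sketch below also iterates over the groups directly, which removes the need to write each file and read it back before selecting columns. It assumes the column names from the question; the sample rows and the temporary output folder are invented.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented sample rows using the column names from the question.
data_df = pd.DataFrame({
    "ID": [1, 2, 3],
    "NAME": ["a", "b", "c"],
    "LOCATION": ["north", "south", "north"],
    "DATA1": [0.1, 0.2, 0.3],
    "DATA4": [7, 8, 9],
})
keep_col = ["ID", "NAME", "DATA1", "DATA4"]

# Stand-in for the folder returned by filedialog.askdirectory().
folderpath = Path(tempfile.mkdtemp())

for location, group in data_df.groupby("LOCATION"):
    # Path's "/" operator inserts the separator between folder and file name.
    group[keep_col].to_csv(folderpath / f"{location}.csv", index=False)

written = sorted(p.name for p in folderpath.iterdir())
```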
I am currently working on importing and formatting a large number of excel files (all the same format/scheme, but different values) with Python.
I have already read in and formatted one file and everything worked fine so far.
I would now do the same for all the other files and combine everything in one dataframe, i.e. read in the first excel in one dataframe, add the second at the bottom of the dataframe, add the third at the bottom the dataframe, and so on until I have all the excel files in one dataframe.
So far my script looks something like this:
import pandas as pd
import numpy as np
import xlrd
import os
path = os.getcwd()
path = "path of the directory"
wbname = "name of the excel file"
files = os.listdir(path)
files
wb = xlrd.open_workbook(path + wbname)
# I only need the second sheet
df = pd.read_excel(path + wbname, sheet_name="sheet2", skiprows = 2, header = None,
                   skipfooter=132)
# here is where all the formatting is happening ...
df
So, "files" is a list of all the relevant file names. Now I need to loop (?) over the files one after the other so that they all eventually end up in df.
Has anyone done something like this, or can anyone help me here?
Something like this might work:
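import os
import pandas as pd
folder = 'path_to_all_xlsx'
list_dfs = []
for file in os.listdir(folder):
    # os.listdir returns bare file names, so join them with the folder.
    # The remaining arguments are the parsing options from the question.
    df = pd.read_excel(os.path.join(folder, file), sheet_name="sheet2",
                       skiprows=2, header=None, skipfooter=132)
    list_dfs.append(df)
all_dfs = pd.concat(list_dfs)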
You read all the dataframes and add them to a list, and then the concat method joins them all together into one big dataframe.
I am trying to run a script on over 900 files using the Spyder platform; it aims to delete the first 3 rows of data and certain columns. I looked into other similar questions but was unable to achieve the intended results.
My code for one text file is as follows:
import pandas as pd
mydataset = pd.read_csv('vectors_0001.txt')
df = pd.DataFrame(mydataset)
df.drop(df.iloc[:,:2], inplace = True, axis = 1)
df.drop([0,1,3], axis = 0, inplace = True)
df = df.dropna(axis = 0, subset=['Column3','Column4'])
Then I want to modify the code above so it can be applied to the consecutive text files; the file names are vectors_0001, vectors_0002, ..., vectors_0900. I tried to do something similar but I keep getting errors. Take the one below as an example:
(Note that 'u [m/s]' and 'v [m/s]' are the columns I want to keep for further data analysis; I want to get rid of the other columns.)
import glob
import os.path
import sys
import pandas as pd
dir_of_interest = sys.argv[1] if len(sys.argv) > 1 else '.'
files = glob.glob(os.path.join(dir_of_interest, "*.txt"))
for file in files:
    with open('file.txt', 'w') as f:
        f.writelines(3:)
    df = pd.read_csv("*.txt")
    df_new = df[['u [m/s]', 'v [m/s]']
    df_new.to_csv('*.txt', header=True, index=None)
    with open('file.txt','r+') as f:
        print(f.read())
However I tried to run it and I got the error:
    f.writelines(3:)
                  ^
SyntaxError: invalid syntax
I really want to get this figured out and move onto my data analysis. Please and thank you in advance.
I'm not totally sure what you are trying to achieve here, but you're using the writelines function incorrectly. It accepts a list of strings as its argument:
https://www.w3schools.com/python/ref_file_writelines.asp
You're giving it "3:", which is not valid Python on its own. Maybe you want to give it a slice of an existing list?
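If the intent was to drop the first three lines of each file, one possible reading is to load the lines into a list and pass a slice of it to writelines. A minimal sketch with a made-up file; in the real loop, each `file` from `glob.glob(...)` would take the place of the temporary path.

```python
import os
import tempfile

# Made-up file with three header lines followed by data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("header 1\nheader 2\nheader 3\n1.0,2.0\n3.0,4.0\n")

with open(path) as f:
    lines = f.readlines()

# writelines takes an iterable of strings; lines[3:] skips the first three.
with open(path, "w") as f:
    f.writelines(lines[3:])

with open(path) as f:
    remaining = f.read()

os.remove(path)
```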
I must be missing something basic about how FileInput widget works in pyviz panel.
In the following code, I let the user select a csv file and the number of rows to display. If a file isn't selected, I generate some random data.
import pandas as pd; import numpy as np; import matplotlib.pyplot as plt
import panel as pn
import panel.widgets as pnw
pn.extension()
datafile = pnw.FileInput()
head = pnw.IntSlider(name='head', value=3, start=1, end=60)
@pn.depends(datafile, head)
def f(datafile, head):
    if datafile is None:
        data = pd.DataFrame({'x': np.random.rand(10)})
    else:
        data = pd.read_csv(datafile)
    return pn.Column(f'## {head} first rows', data.head(head))
widgets = pn.Column(datafile, head)
col = pn.Column(widgets, f)
col
Here's the problem. If I don't select a file and play with the head widget, the pane acts as expected: the number of displayed rows changes as I change the head widget, and I can see that the data is different after each update.
However, once I select a file, two problems occur. First, the data isn't loaded. Secondly, the column stops reacting to my interactions.
Can anybody tell me what my problem is?
The problem in the code above is that the datafile variable in function f is not a file name but the file contents, as a bytes string. Due to the error, the function throws an unhandled exception that, unfortunately, isn't registered anywhere.
Thus, after adding import io at the top, the data-reading line should be
data = pd.read_csv(io.BytesIO(datafile))
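A sketch of that conversion outside of Panel, assuming the widget's value arrives as raw bytes as described above. The helper name `load_data` and the sample bytes are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def load_data(datafile):
    """Return a DataFrame from uploaded bytes, or random data if nothing was uploaded."""
    if datafile is None:
        return pd.DataFrame({"x": np.random.rand(10)})
    # read_csv needs a path or a file-like object, so wrap the bytes.
    return pd.read_csv(io.BytesIO(datafile))


uploaded = b"x,y\n1,2\n3,4\n"  # stands in for the widget's value
from_upload = load_data(uploaded)
fallback = load_data(None)
```

Keeping the loading step in its own function also makes it easy to call directly and see any exception that would otherwise be swallowed inside the callback.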