Pyviz panel: can't work with FileInput widget - python

I must be missing something basic about how FileInput widget works in pyviz panel.
In the following code, I let the user select a csv file and the number of rows to display. If a file isn't selected, I generate some random data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import panel as pn
import panel.widgets as pnw
pn.extension()
datafile = pnw.FileInput()
head = pnw.IntSlider(name='head', value=3, start=1, end=60)
@pn.depends(datafile, head)
def f(datafile, head):
    if datafile is None:
        data = pd.DataFrame({'x': np.random.rand(10)})
    else:
        data = pd.read_csv(datafile)
    return pn.Column(f'## {head} first rows', data.head(head))
widgets = pn.Column(datafile, head)
col = pn.Column(widgets, f)
col
Here's the problem. If I don't select a file and play with the head widget, the pane acts as expected: the number of displayed rows changes as I change the head widget, and I can see that the data is different after each update.
However, once I select a file, two problems occur. First, the data isn't loaded. Secondly, the column stops reacting to my interactions.
Can anybody tell me what my problem is?

The problem in the code above is that the datafile argument of function f is not a file name but the file's contents, as a bytes object. pd.read_csv does not accept raw bytes, so the function raises an exception that, unfortunately, isn't reported anywhere.
Thus, after adding import io, the data reading line should be
data = pd.read_csv(io.BytesIO(datafile))
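A minimal sketch of that fix outside of Panel, assuming the widget value arrives either as None or as the raw bytes of a CSV file. The helper name load_data and the sample bytes are made up for illustration:

```python
import io

import numpy as np
import pandas as pd


def load_data(contents):
    """Return a DataFrame from uploaded CSV bytes, or random data if nothing was uploaded."""
    if contents is None:
        # No file selected yet: fall back to random data, as in the question.
        return pd.DataFrame({'x': np.random.rand(10)})
    # The upload is bytes, not a path, so wrap it in a file-like object.
    return pd.read_csv(io.BytesIO(contents))


sample = b"a,b\n1,2\n3,4\n"  # hypothetical upload contents
uploaded = load_data(sample)
fallback = load_data(None)
```

Calling load_data from inside f would keep the widget callback itself short.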

Related

Run a Python script on every text file in a folder using Spyder

I am trying to run a script on over 900 files in Spyder. The script should delete the first 3 rows of data and certain columns. I looked into other similar questions but was unable to achieve the intended results.
My code for one text file is as follows:
import pandas as pd
mydataset = pd.read_csv('vectors_0001.txt')
df = pd.DataFrame(mydataset)
df.drop(df.iloc[:,:2], inplace = True, axis = 1)
df.drop([0,1,3], axis = 0, inplace = True)
df = df.dropna(axis = 0, subset=['Column3','Column4'])
Then I want to modify the code above so it can be applied to the consecutive text files. The text file names are: vectors_0001, vectors_0002, ..., vectors_0900. I tried to do something similar but I keep getting errors. Take the one below as an example:
(Note that 'u [m/s]' and 'v [m/s]' are the columns I want to keep for further data analysis; I want to get rid of the other columns.)
import glob
import os.path
import sys
import pandas as pd
dir_of_interest = sys.argv[1] if len(sys.argv) > 1 else '.'
files = glob.glob(os.path.join(dir_of_interest, "*.txt"))
for file in files:
    with open('file.txt', 'w') as f:
        f.writelines(3:)
    df = pd.read_csv("*.txt")
    df_new = df[['u [m/s]', 'v [m/s]']
    df_new.to_csv('*.txt', header=True, index=None)
    with open('file.txt','r+') as f:
        print(f.read())
However I tried to run it and I got the error:
f.writelines(3:)
^
SyntaxError: invalid syntax
I really want to get this figured out and move onto my data analysis. Please and thank you in advance.
I'm not totally sure what you are trying to achieve here, but you're using the writelines function incorrectly. It accepts a list of strings as an argument:
https://www.w3schools.com/python/ref_file_writelines.asp
You're giving it "3:", which is not valid Python. Maybe you want to give it a slice of an existing list (for example, some_list[3:])?
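For the overall goal, one possible approach is to let pandas do the row and column selection and only loop over the file names. The sketch below assumes each file is comma-separated with a header row that contains 'u [m/s]' and 'v [m/s]', and that the cleaned copies go to a separate output folder. The function names and the output folder are hypothetical:

```python
import glob
import os

import pandas as pd


def clean_file(path, keep=('u [m/s]', 'v [m/s]'), n_skip=3):
    """Read one text file, drop the first n_skip data rows, keep selected columns."""
    df = pd.read_csv(path)
    df = df.iloc[n_skip:]            # drop the first n_skip data rows
    return df[list(keep)].dropna()   # keep only the wanted columns, without NaNs


def clean_folder(folder, out_folder):
    """Apply clean_file to every .txt file in folder and write results to out_folder."""
    os.makedirs(out_folder, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        out_path = os.path.join(out_folder, os.path.basename(path))
        clean_file(path).to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

A call such as clean_folder('.', 'cleaned') would then process every .txt file in the current directory and leave the originals untouched.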

Tkinter/Pandas: Selecting a CSV and editing with Pandas

I'm trying to grab a filepath using a drop-down menu with Tkinter, then edit that CSV using Pandas.
Here's the code:
from tkinter import Tk
from tkinter.filedialog import askdirectory, askopenfilename
import tkinter.messagebox as tkmb
import pandas as pd
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
def wrapQuotes(fileString):
    return "'{}'".format(fileString)
Tk().withdraw() #get rid of the tkinter window
tkmb.showinfo(title=' ', message='Select File')
filePath = askopenfilename() #dialogue box for original file
wrapQuotes(filePath)
testPandas(filePath)
print(filePath)
Here's the CSV:
The process I'm going with is:
(1) Using tkinter, I create a drop-down menu, the user selects the file, and I get the filepath
(2) I read the filepath in through my testPandas function, which should delete the first two rows
At first I thought the CSV wasn't getting edited because the path wasn't wrapped in quotation marks, which is why I added wrapQuotes, but that didn't seem to do anything.
When I run my program, the CSV stays the same.
Any help would be appreciated!
The issue appears to be that you use pandas.DataFrame.drop as if it were an in-place method. By default, most pandas methods are not in-place, so you need to assign the returned object to a variable in order to use it, or pass inplace=True in the method call:
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data = data.drop(data.index[[1,2]])
    data.to_csv(filePath, index = False)
or
def testPandas(filePath):
    data = pd.read_csv(filePath)
    data.drop(data.index[[1,2]], inplace=True)
    data.to_csv(filePath, index = False)
Either version should fix your issue. The quote-wrapping function is not needed.
You make the same mistake twice: you have to assign the result to a variable
data = data.drop(...)
and
filePath = wrapQuotes(filePath)
BTW: you don't need to wrap the file path in quotes to read it.
Alternatively, you can use the inplace=True option
data.drop(..., inplace=True)
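The difference between ignoring and assigning the result of drop can be checked on a small DataFrame. This is only an illustrative sketch with made-up data:

```python
import pandas as pd

data = pd.DataFrame({'value': [10, 20, 30, 40]})  # made-up data

# drop() returns a new DataFrame; without assignment the original is unchanged.
data.drop(data.index[[1, 2]])
rows_after_ignored_drop = len(data)

# Assigning the result keeps the change.
data = data.drop(data.index[[1, 2]])
rows_after_assigned_drop = len(data)
remaining = data['value'].tolist()
```

Note that data.index[[1, 2]] selects the second and third rows. If the first two rows are the ones to remove, data.index[[0, 1]] would be the selection.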

Loading a CSV file where the data is all in one column

I am importing a CSV file that contains data which is all in a single column (the TXT file has the data separated by ";").
Is there any way to get the data to load into Anaconda (using pandas) so that it is in separate columns, or can it be manipulated afterwards into columns?
The data can be found at the following web-address (this is data about sunspots):
http://www.sidc.be/silso/INFO/snmtotcsv.php
From this website http://www.sidc.be/silso/datafiles
I have managed to do this so far:
#Start code by loading the pandas command set
from pandas import *
#Initial setup commands
import warnings
warnings.simplefilter('ignore', FutureWarning)
import matplotlib
matplotlib.rcParams['axes.grid'] = True # show gridlines by default
%matplotlib inline
from scipy.stats import spearmanr
#load data from CSV file
startdata = read_csv('SN_m_tot_V2.0.csv',header=None)
startdata = startdata.reset_index()
I received an answer elsewhere; the lines of code that take into account the lack of column headings AND the semicolon separator are:
colnames=['Year','Month','Year (fraction)','Sunspot number','Std dev.','N obs.','Provisional']
ssdata=read_csv('SN_m_tot_V2.0.csv',sep=';',header=None,names=colnames)
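The same call can be tried without downloading the file. The sketch below uses two made-up rows in a semicolon-separated, header-less layout; the values are illustrative and are not taken from the real dataset:

```python
import io

import pandas as pd

# Made-up sample rows: semicolon-separated, no header line.
sample = io.StringIO(
    "2000;01;2000.042;10.5;1.2;30;1\n"
    "2000;02;2000.123;12.0;1.4;28;1\n"
)
colnames = ['Year', 'Month', 'Year (fraction)', 'Sunspot number',
            'Std dev.', 'N obs.', 'Provisional']

# sep=';' splits the fields; header=None plus names= supplies the column labels.
ssdata = pd.read_csv(sample, sep=';', header=None, names=colnames)
```

Without sep=';', each whole line would end up as a single string in one column, which matches the behaviour described in the question.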

Python not reading all the information from Excel properly

I am trying to open a few Excel files inside a directory and then do things with the data (like take the average of one row across three files).
My main goal right now is just to display the information in each Excel file. I used the following code to do so. But when I display it, it prints element '0' through element '29', then skips 30-50 and prints 51-80.
Here is a snip of my output in Python:
import numpy as np
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import os
import pandas as pd
from tkinter import filedialog
from tkinter import *
import matplotlib.image as image
import xlsxwriter
import openpyxl
import xlwt
import xlrd
#GUI
root=Tk()
root.withdraw() #closes tkinter window pop-up
path=filedialog.askdirectory(parent=root,title='Choose a file')
path=path+'/'
print('Folder Selected',path)
files=os.listdir(path)
length=len(files)
print('Files inside the folder',files)
Files=[]
for s in os.listdir(path):
    Files.append(pd.read_excel(path+s))
print(Files)
I'm quite sure your data is being read correctly. The dots between rows 29 and 51 show that there is more data there. pandas elides these rows so that your console output stays compact. If you want to see all the rows, you could use the solution from this answer:
with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
    print(Files)
Here None removes the display limit on rows and 3 sets the display limit on columns. You can find more info in the pandas options documentation.
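The effect of the row limit can be seen on a small example. This sketch uses a made-up 100-row DataFrame and compares the printed form with and without a limit:

```python
import pandas as pd

df = pd.DataFrame({'value': range(100)})  # made-up data, 100 rows

# With a small row limit, the printed form is shortened with an ellipsis row.
with pd.option_context('display.max_rows', 10):
    truncated = repr(df)

# With the limit removed, every row appears in the printed form.
with pd.option_context('display.max_rows', None):
    full = repr(df)

print(len(truncated.splitlines()), len(full.splitlines()))
```

In both cases df itself still holds all 100 rows; only the printed representation differs.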
This is actually the standard way pandas prints the data; notice the ellipsis between 29 and 51:
29 7.8000 [cont.]
...
51 12.19999 [cont.]
You can still operate on every row. To get the number of rows in a dataframe, you can call
len(df.index)

plotting using pandas in python

What I am trying to do is fairly basic; however, I am very new to Python and am having trouble.
Goal: plot the yellow highlighted row (which I have highlighted, though it will not be highlighted when I need to read the data) on the Y-axis and plot the "Time" column on the X-axis.
Here is a photo of the data, and then the code that I have tried along with its error.
Code
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
#Reading CSV and converting it to a df(Data_Frame)
df1 = pd.read_csv('Test_Sheet_1.csv', skiprows = 8)
#Creating a list from df1 and labeling it 'Time'
Time = df1['Time']
print(Time)
#Reading CSV and converting it to a df(Data_Frame)
df2 = pd.read_csv('Test_Sheet_1.csv').T
#From here I need to know how to skip 4 lines.
#I need to skip 4 lines AFTER the transposition and then we can plot DID and Time
DID = df2['Parameters']
print(DID)
Error
As you can see from the code, right now I am just trying to print the data so that I can see it, and then I would like to put it onto a graph.
I think I need to use the 'skiplines' function after the transposition, so that Python knows where to read the "column" labeled Parameters (it's only a column after the transposition). However, I do not know how to use the skip lines function after the transposition unless I transpose it to a new Excel document, and that is not an option.
Any help is very much appreciated,
Thank you!
Update
This is the output I get when I add print(df2.columns.tolist())
