I just started learning Python and am trying to read a CSV file with pandas.
import os
import pandas as pd
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "C:\\Anaconda\\SPY.csv"))
But I get the error:
File data\SPY.csv does not exist
I tried single and double slashes, both / and \, and ' instead of ".
This is the file path: C:\Anaconda\SPY.csv
(This is a file from Yahoo Finance. I first tried to query Yahoo directly but could not, so I downloaded the file and saved it as a CSV.)
The error occurs because the path is being built relative to your current directory, which is named "data", while your file is actually in "Anaconda".
Try simply:
import pandas as pd
df = pd.read_csv("C:\\Anaconda\\SPY.csv")
If you really want to use os.path.join, this should do:
import pandas as pd
import os
path = os.path.join("C:\\", "Anaconda", "SPY.csv")  # "C:\\", not "C:": a bare drive letter yields the drive-relative path C:Anaconda\SPY.csv
df = pd.read_csv(path)
Also, if your SPY.csv file is in the same directory your script runs from, you can replace the path with just "SPY.csv".
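To see why the drive letter needs its root separator, here is a minimal sketch. It uses ntpath, the standard-library module that implements Windows path rules, so it gives the same output on any operating system. The variable names are only illustrative.

```python
import ntpath  # Windows path rules, importable on any platform
from pathlib import PureWindowsPath

# A bare drive letter is not a root: the result is relative to the
# current directory on drive C:, with no backslash after the colon.
drive_relative = ntpath.join("C:", "Anaconda", "SPY.csv")

# Including the root separator gives the absolute path.
absolute = ntpath.join("C:\\", "Anaconda", "SPY.csv")

# pathlib builds the same absolute path with the / operator.
with_pathlib = str(PureWindowsPath("C:/Anaconda") / "SPY.csv")

print(drive_relative)  # C:Anaconda\SPY.csv
print(absolute)        # C:\Anaconda\SPY.csv
print(with_pathlib)    # C:\Anaconda\SPY.csv
```

On Windows itself, os.path.join follows the same rules as ntpath.join.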
Related
I would like an Excel file to be stored in .xlsx format in a specific folder that I called data. The file is in the same folder as the running program.
The program creates a new mydict every hour, which is why I put the timestamp in the file name, so I can work on it later.
import pandas as pd
from pandas import ExcelWriter
import datetime
mydict = self._detailed_cost
todays_date = str(datetime.datetime.now().strftime("%Y-%m-%d-%H%M"))
df = pd.DataFrame.from_dict(mydict, orient='index')
with ExcelWriter('data/' + todays_date + '-cost_function' + '.xlsx') as writer:
    df.to_excel(writer, sheet_name='costs', index=True)
Running this code, I get the following error:
OSError: Cannot save file into a non-existent directory: '..\data'
Ideally I wouldn't give an absolute path, since I'm coding on one PC and would like the program to run on another one with a different path.
with ExcelWriter(r'data/' + todays_date + '-cost_function'+'.xlsx') as writer:
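Note that the r prefix above only marks a raw string literal (backslashes are not treated as escape characters); it does not make a path relative. A path like 'data/...' is already relative to the current working directory, so the error means that no data folder exists there. Create the folder before writing, for example with os.makedirs('data', exist_ok=True), or build the path from the script's own location so it does not depend on where the program is started.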
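A minimal sketch of that idea, assuming the output should go into a data folder next to the script. The helper name build_output_path is hypothetical; the timestamp format is the one from the question.

```python
import datetime
from pathlib import Path


def build_output_path(base_dir, now=None):
    """Return <base_dir>/data/<timestamp>-cost_function.xlsx.

    The data folder is created if it does not exist yet.
    """
    now = now or datetime.datetime.now()
    data_dir = Path(base_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)  # no error if already present
    stamp = now.strftime("%Y-%m-%d-%H%M")
    return data_dir / f"{stamp}-cost_function.xlsx"


# Hypothetical usage inside the script from the question:
#   out = build_output_path(Path(__file__).resolve().parent)
#   df.to_excel(out, sheet_name="costs", index=True)
```

Anchoring the path to the script's location rather than the working directory keeps the result the same on another PC, whatever folder the program is launched from.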
I am downloading an excel file from a website.
If I just use pandas to open the file
import pandas as pd
df = pd.read_excel('filepath')
I get the error CompDocError: Workbook corruption: seen[2] == 4
If I re-save the file before opening it, everything works fine:
import pandas as pd
import win32com.client
def resave_excel(filename):
    xcl = win32com.client.Dispatch('Excel.Application')
    wb = xcl.workbooks.open(filename)
    xcl.DisplayAlerts = False
    wb.Save()
    xcl.Quit()
resave_excel('filepath')
df = pd.read_excel('filepath')
The problem with this approach is that it launches the Excel application, which is not very safe, especially if I want to run the full script on an automated basis or on a different platform.
Is there a different approach that I am missing?
The only solution that I found is discussed on https://github.com/python-excel/xlrd/issues/149.
Instead of pandas you need to use xlrd and make changes to xlrd/compdoc.py.
My code is:
import pandas as pd
df = pd.read_csv('Project_Wind_Data.csv', usecols=['U100', 'V100'])
with open('Project_Wind_Data.csv', "r") as csvfile:
I am trying to access certain columns within the CSV file. I receive an error message saying that the data file does not exist.
My data is in the following form:
This must be a trivial issue, but help would be much appreciated.
If your CSV file is in the same working directory as your .py code, you can use the file name directly:
import pandas as pd
df = pd.read_csv('Project_Wind_Data.csv', usecols=['U100', 'V100'])
If the file is in another directory, replace 'Project_Wind_Data.csv' with the full path to the file, like C:/User/Documents/file.txt
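When it is unclear which directory Python is actually looking in, it can help to check before calling read_csv. The following is a rough sketch; locate_file is a hypothetical helper, not part of pandas. It looks in the current working directory first and then in any extra directories you pass.

```python
from pathlib import Path


def locate_file(name, *extra_dirs):
    """Return the first existing path for `name`.

    The current working directory is checked first, then each
    directory in `extra_dirs`, in order.
    """
    candidates = [Path.cwd() / name] + [Path(d) / name for d in extra_dirs]
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()
    searched = ", ".join(str(c.parent) for c in candidates)
    raise FileNotFoundError(f"{name} not found; searched: {searched}")


# Hypothetical usage with pandas:
#   import pandas as pd
#   df = pd.read_csv(locate_file("Project_Wind_Data.csv"),
#                    usecols=["U100", "V100"])
```

The error message lists every directory that was searched, which usually makes a wrong working directory obvious.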
I can't use the read_excel method from the pandas library in my IPython notebook.
After some testing and cleaning of the Excel file, I found that there is a complete column of drawings (or images). When I deleted this column, the error message went away. Does somebody know how to configure the read_excel options to collect only the data? This is my code:
import pandas as pd
import os
# File selection
userfilepath = r'C:\Temp'
filename = "exportCS12.xlsx"
filenameCS12 = os.path.join(userfilepath, filename)
print(filenameCS12)
# workbook upload
df = pd.read_excel(filenameCS12, sheet_name='Sheet1')  # the keyword is sheet_name in current pandas (formerly sheetname)
The pandas import was not working because the Excel file was not clean. The problem was solved with openpyxl, which makes it possible to navigate only the valid areas of the workbook.
I have been trying for a while to save a pandas DataFrame to an HDF5 file. I tried various approaches, e.g. df.to_hdf, but to no avail. I am running this in a Python virtual environment. Even without the virtual environment I get the same error. The following script produces the error below:
''' This script reads in a pickled dictionary, converts it to a pandas
DataFrame and then saves it to an HDF file. The arguments are the
file names of the pickle files.
'''
import numpy as np
import pandas as pd
import pickle
import sys
# read in filename arguments
for fn in sys.argv[1:]:
    print('converting file %s to hdf format...' % fn)
    fl = open(fn, 'rb')  # pickle files must be opened in binary mode
    data = pickle.load(fl)
    fl.close()
    frame = pd.DataFrame(data)
    fnn = fn.split('.')[0]+'.h5'
    store = pd.HDFStore(fnn)
    store.put(fn.split('.')[0], frame)  # the key must be a string, not a list
    store.close()
    frame = 0
    data = 0
Error is:
$ ./p_to_hdf.py LUT_*.p
converting file LUT_0.p to hdf format...
Traceback (most recent call last):
  File "./p_to_hdf.py", line 22, in <module>
    store = pd.HDFStore(fnn)
  File "/usr/lib/python2.7/site-packages/pandas/io/pytables.py", line 270, in __init__
    raise Exception('HDFStore requires PyTables')
Exception: HDFStore requires PyTables
pip list shows that both pandas and tables are installed, at their latest versions:
pandas (0.16.2)
tables (3.2.0)
The solution had nothing to do with the code; the problem was how the virtual environment was activated. What worked for me was . venv/bin/activate instead of source ~/venv/bin/activate. Now which python shows the interpreter installed under ~/venv/bin/python, and the code runs correctly.
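A quick way to confirm which interpreter a script is really running under is to ask Python itself. The sketch below assumes Python 3 (sys.base_prefix does not exist in the Python 2.7 shown in the traceback). environment_report is a hypothetical helper name.

```python
import importlib.util
import sys


def environment_report(modules=("pandas", "tables")):
    """Describe the running interpreter and whether `modules` can be found."""
    return {
        # Full path of the interpreter executing this script.
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv while
        # sys.base_prefix points at the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level module cannot be located.
        "modules": {m: importlib.util.find_spec(m) is not None for m in modules},
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, "->", value)
```

If the reported executable is not the one under the virtual environment, packages installed there (such as tables) will not be visible to the script, which matches the "HDFStore requires PyTables" symptom.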