Saving an excel file using pandas in a specific file - python

I would like an Excel file to be stored in .xlsx format in a specific folder that I called data. The data folder is in the same folder as the running program.
The program creates a new mydict every hour, which is why I put the timestamp in the file name, so I can work on the files later.
import pandas as pd
from pandas import ExcelWriter
import datetime
mydict = self._detailed_cost
todays_date = datetime.datetime.now().strftime("%Y-%m-%d-%H%M")
df = pd.DataFrame.from_dict(mydict, orient='index')
with ExcelWriter('data/' + todays_date + '-cost_function' + '.xlsx') as writer:
    df.to_excel(writer, sheet_name='costs', index=True)
Running this code I get the following error:
OSError: Cannot save file into a non-existent directory: '..\data'
Ideally I wouldn't give an absolute path, since I'm coding on one PC and I'd like it to run on another one with a different path.

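with ExcelWriter(r'data/' + todays_date + '-cost_function'+'.xlsx') as writer:
As above, you can put r at the start of the path to make it a raw string, so that backslashes in it are not treated as escape characters. Note that the r prefix does not change how the path is resolved: 'data/...' is already a relative path, resolved against the current working directory, and the data folder must exist there before the file can be written.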
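Since the OSError says the directory does not exist, one option is to build the path from a known base folder and create data/ before writing. This is only a minimal sketch: build_output_path is a hypothetical helper name, and it assumes the timestamp format from the question.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(base_dir, now=None):
    """Return <base_dir>/data/<timestamp>-cost_function.xlsx, creating data/ if needed."""
    now = now or datetime.now()
    data_dir = Path(base_dir) / "data"
    # parents=True and exist_ok=True make this safe to call every hour.
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / (now.strftime("%Y-%m-%d-%H%M") + "-cost_function.xlsx")


# In a script, base_dir would typically be the script's own folder, so the
# result does not depend on the current working directory:
#     out_path = build_output_path(Path(__file__).resolve().parent)
#     df.to_excel(out_path, sheet_name="costs", index=True)
```

Because the base folder is derived at run time, the same code should work unchanged on another PC with a different absolute path.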

Related

Python: how to make a loop to copy data from different Excel files into a new one in an iterative way with pandas

I need to copy data from different Excel files into a new one. I would like to just tell the program to take all the files in a specific folder and copy two columns from each of them into a new Excel file. I tried a for loop, but it overwrites the data coming from the different files, and I get a new Excel file with just one sheet containing the data copied from the last file read by the program. Could you help me, please?
Here is my code:
import os.path
import pandas as pd
folder=r'C:\\Users\\PycharmProjects\\excelfile\\'
for fn in os.listdir(folder):
    fx = pd.read_excel(os.path.join(folder, fn), usecols='H,E')
    with pd.ExcelWriter('Output.xlsx') as writer:
        ws = os.path.splitext(fn)[0]
        fx.to_excel(writer, sheet_name=ws)
You should open the output file in append mode like so:
with pd.ExcelWriter("Output.xlsx", engine='openpyxl', mode='a') as writer:
    ws = os.path.splitext(fn)[0]
    fx.to_excel(writer, sheet_name=ws)
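Note that append mode needs Output.xlsx to exist already. An alternative that avoids this is to open the writer once, outside the loop, and add one sheet per file. The following is a rough sketch rather than a drop-in fix: write_one_sheet_per_frame is a hypothetical helper, and the two small frames stand in for what pd.read_excel would return for each workbook.

```python
import pandas as pd


def write_one_sheet_per_frame(frames, output_path):
    """Write each DataFrame in `frames` (dict of sheet name -> DataFrame) to its own sheet."""
    # The writer is opened once, outside the loop, so earlier sheets are kept.
    with pd.ExcelWriter(output_path) as writer:
        for sheet_name, frame in frames.items():
            # Excel limits sheet names to 31 characters.
            frame.to_excel(writer, sheet_name=sheet_name[:31], index=False)


# Hypothetical stand-ins for the frames read from each workbook.
frames = {
    "file_a": pd.DataFrame({"E": [1, 2], "H": [3, 4]}),
    "file_b": pd.DataFrame({"E": [5, 6], "H": [7, 8]}),
}
```

Calling write_one_sheet_per_frame(frames, "Output.xlsx") would then produce one workbook with a sheet for each entry.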

Opening multiple `.xls` files from a folder in a different directory and creating one dataframe using Pandas

I am trying to open multiple xls files in a folder from a particular directory. I want to read these files and combine all of them into one data frame. So far I am able to access the directory and put all the xls file names into a list like this:
import os
import pandas as pd
path = ('D:\Anaconda Hub\ARK analysis\data\year2021\\february')
files = os.listdir(path)
files
# outputting the variable files which appears to be a list.
Output:
['ARK_Trade_02012021_0619PM_EST_601875e069e08.xls',
'ARK_Trade_02022021_0645PM_EST_6019df308ae5e.xls',
'ARK_Trade_02032021_0829PM_EST_601b2da2185c6.xls',
'ARK_Trade_02042021_0637PM_EST_601c72b88257f.xls',
'ARK_Trade_02052021_0646PM_EST_601dd4dc308c5.xls',
'ARK_Trade_02082021_0629PM_EST_6021c739595b0.xls',
'ARK_Trade_02092021_0642PM_EST_602304eebdd43.xls',
'ARK_Trade_02102021_0809PM_EST_6024834cc5c8d.xls',
'ARK_Trade_02112021_0639PM_EST_6025bf548f5e7.xls',
'ARK_Trade_02122021_0705PM_EST_60270e4792d9e.xls',
'ARK_Trade_02162021_0748PM_EST_602c58957b6a8.xls']
I am now trying to get it into one dataframe like this:
frame = pd.DataFrame()
for f in files:
    data = pd.read_excel(f, 'Sheet1')
    frame.append(data)
df = pd.concat(frame, axis=0, ignore_index=True)
However, when doing this I sometimes obtain a blank data frame or it throws an error like this:
FileNotFoundError: [Errno 2] No such file or directory: 'ARK_Trade_02012021_0619PM_EST_601875e069e08.xls'
Help would truly be appreciated with this task.
Thanks in advance.
The error happens because a bare file name is looked up in the current working directory, so you need to join it with the folder path using the os module:
import os
import pandas as pd
path = r'D:\Anaconda Hub\ARK analysis\data\year2021\february'  # raw string, so backslashes are kept literally
files = os.listdir(path)
# frame = pd.DataFrame()  # this will not work
frame = []  # collect the DataFrames in a list instead
for f in files:
    data = pd.read_excel(os.path.join(path, f), sheet_name='Sheet1')  # join the file name with the folder location
    frame.append(data)
df = pd.concat(frame, axis=0, ignore_index=True)
The other issue is that frame should be a list or some other iterable of DataFrames, because that is what pd.concat expects. DataFrame.append returned a new DataFrame instead of modifying frame in place (and was removed in pandas 2.0), so the original loop left frame empty.
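To see what the concat step does in isolation, here is a small sketch with two made-up frames standing in for the per-file results of pd.read_excel (the column names are invented for illustration).

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by pd.read_excel for each file.
frames = [
    pd.DataFrame({"ticker": ["AAA", "BBB"], "shares": [10, 20]}),
    pd.DataFrame({"ticker": ["CCC"], "shares": [30]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index.
df = pd.concat(frames, axis=0, ignore_index=True)
print(df)
```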
You can add a middle step to check whether the path exists. I suspect this is an isolated issue with your server: from memory, when working on older Windows servers (namely 2012), I would sometimes have issues where the path couldn't be found even though it definitely existed.
import pandas as pd
from pathlib import Path
# assuming you want xls and xlsx.
files = Path('folder_location').glob('*.xls*')
dfs = []
for file in files:
    if file.is_file():
        df = pd.read_excel(file, sheet_name='sheet')
        dfs.append(df)
final_df = pd.concat(dfs)

Permission error when a pandas dataframe is written to an xlsx file

I get this error when I try to save my dataframe to an Excel file named pandas_simple.xlsx.
Below is my error:
This is my code:
import pandas as pd
df = pd.DataFrame({'Car': [101, 20, 350, 20, 15, 320, 454]})
writer = pd.ExcelWriter('pandas_simple.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()  # close() saves the file; the separate writer.save() call was removed in pandas 2.0
Can anyone share some ideas?
This error can also occur if the same file (pandas_simple.xlsx in this case) is already open on your desktop. In that case, Python will not have permission to overwrite the file. Closing the Excel file and re-running the script should resolve the issue.
You are trying to write to a folder that requires administrator rights. Change:
writer = pd.ExcelWriter("pandas_simple.xlsx")
to:
writer = pd.ExcelWriter("C:\\...\\pandas_simple.xlsx")
with the full path and you will not have a problem.
The documentation of pandas.DataFrame.to_excel says that the first argument can be a string that represents the file path. In your case I would drop all lines with writer and just try
df.to_excel('pandas_simple.xlsx')
That should write pandas_simple.xlsx to your current working directory. If that does not work, try providing the full path (e.g. C:\\Users\\John\\pandas_simple.xlsx). Also make sure that you are not trying to write to a directory that needs administrator rights.
What if the path is correct?
Try closing the xlsx file if it is open in Excel and run the code again. It worked for me and should work for you too.
I am attaching my code snippet for your reference
import pandas as pd
file='C:/Users/Aladahalli/Desktop/Book1.xlsx'
xls = pd.ExcelFile(file)
df = pd.read_excel(xls, sheet_name='Sheet1')
# Create a column named Final that stores the concatenated columns.
df["Final"] = df["Name"] + " " + df["Rank/Designation"] + " " + df["PS"]
print(df.head())
# Create a pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('C:/Users/Aladahalli/Desktop/Final.xlsx', engine='xlsxwriter')
# Write the dataframe to the XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the pandas Excel writer, which saves the Excel file (writer.save() was removed in pandas 2.0).
writer.close()
Make sure you don't have the file open that you are trying to write to.
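If you want the script to notice this situation before pandas fails, a rough check is to try opening the target for appending first. This is only a heuristic sketch: is_probably_locked is a hypothetical helper, and it relies on the Windows behaviour where a workbook open in Excel is locked against writes, so it may report nothing useful on other platforms.

```python
from pathlib import Path


def is_probably_locked(path):
    """Return True if an existing file cannot be opened for writing right now."""
    path = Path(path)
    if not path.exists():
        # A file that does not exist yet cannot be held open by Excel.
        return False
    try:
        # Append mode leaves the existing contents untouched.
        with open(path, "ab"):
            return False
    except PermissionError:
        return True
```

A script could call is_probably_locked('pandas_simple.xlsx') and ask the user to close the workbook before writing.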

Extracting and manipulating data from excel worksheet with python

Scenario: I am trying to come up with a python code that reads all the workbooks in a given folder, gets the data of each and puts it to a single data frame (each workbook becomes a dataframe, so I can manipulate them individually).
Issue1: With this code, even though I am using the proper path and file types, I keep getting the error:
File "<ipython-input-3-2a450c707fbe>", line 14, in <module>
    f = open(file,'r')
FileNotFoundError: [Errno 2] No such file or directory: '(1)Copy of Preisanfrage_17112016.xlsx'
Issue2: The reason for me to create different data frames is that each workbook has an individual format (rows are my identifiers and columns are dates). My problem is that some of these workbooks have data on a sheet named "Closing" or "Opening", or the name is not specified. So I will try to configure each data frame individually and then join them afterwards.
Issue3: Considering the final output once the data frame data is already unified, my objective is to output them in a format like:
date 1    identifier 1    value
date 1    identifier 2    value
date 1    identifier 3    value
date 1    identifier 4    value
date 2    identifier 1    value
date 2    identifier 4    value
date 2    identifier 5    value
Obs1: For the output, not all dates have the same array of identifiers.
Question 1: Any ideas why the code is yielding this error? Is there a better way to extract data from excel?
Question 2: Is it possible to create a unique dataframe for each worksheet? Is this a good practice?
Question 3: Can I do this type of output using a loop? Is this a good practice?
Obs2: I don't know how relevant this is, but I am using Python 3.6 with Anaconda.
Code so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob, os
import datetime as dt
from datetime import datetime
import matplotlib as mpl
directory = os.path.join("C:\\","Users\\Dgms\\Desktop\\final 2")
for root,dirs,files in os.walk(directory):
    for file in files:
        print(file)
        f = open(file,'r')
        df1 = pd.read_excel(file)
I think you do not need the open call; the file name also has to be joined with root, because os.walk yields bare file names. I would store the dataframes in a list. You can then either use pd.concat(list_of_dfs) or make some manual changes.
list_of_dfs = []
for root,dirs,files in os.walk(directory):
    for file in files:
        f = os.path.join(root, file)
        print(f)
        list_of_dfs.append(pd.read_excel(f))
or using glob:
import glob
import os
list_of_dfs = []
for file in glob.iglob(os.path.join(directory, '*.xlsx')):
    print(file)
    list_of_dfs.append(pd.read_excel(file))
or, as jackie suggests, you can read specific sheets: list_of_dfs.append(pd.concat([pd.read_excel(file, 'Opening'), pd.read_excel(file, 'Closing')])). If only one of them is available in a given workbook, you could even change this to
try:
    list_of_dfs.append(pd.read_excel(file, 'Opening'))
except Exception:
    pass
try:
    list_of_dfs.append(pd.read_excel(file, 'Closing'))
except Exception:
    pass
(Of course, you should catch the specific exception instead, but I can't test that at the moment.)
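Another way to handle workbooks that have only some of these sheets is to look at the sheet names first instead of catching exceptions. The sketch below assumes the "Opening" and "Closing" names from the question; pick_sheets and read_workbook are hypothetical helpers, and falling back to the first sheet when neither name exists is my own assumption about the "name is not specified" case.

```python
import pandas as pd


def pick_sheets(available, wanted=("Opening", "Closing")):
    """Return the wanted sheet names that exist, or the first sheet as a fallback."""
    found = [name for name in wanted if name in available]
    # Assumed fallback: use the first sheet when neither wanted name is present.
    return found or list(available[:1])


def read_workbook(path):
    """Read the selected sheets of one workbook into a single DataFrame."""
    with pd.ExcelFile(path) as xls:
        sheets = pick_sheets(xls.sheet_names)
        return pd.concat([xls.parse(name) for name in sheets], ignore_index=True)
```

In the loops above, list_of_dfs.append(read_workbook(f)) could then replace the try/except pairs.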
Issue 1: If you are using an IDE or Jupyter, use the absolute path to the file.
Or add the project folder to the system path (a workaround, not recommended).
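For Issue 3 in the question (one row per date and identifier), a loop is probably not needed once the data is in a frame with identifiers as rows and dates as columns. Here is a small sketch with made-up values; the dates, identifiers, and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: rows are identifiers, columns are dates.
wide = pd.DataFrame(
    {"2016-11-17": [1.0, 2.0, None], "2016-11-18": [4.0, None, 6.0]},
    index=pd.Index(["id1", "id2", "id3"], name="identifier"),
)

tidy = (
    wide.reset_index()
    .melt(id_vars="identifier", var_name="date", value_name="value")
    .dropna(subset=["value"])  # not every date has every identifier
    .sort_values(["date", "identifier"])
    .reset_index(drop=True)[["date", "identifier", "value"]]
)
print(tidy)
```

Each workbook could be reshaped this way before the pieces are concatenated, which keeps dates with different sets of identifiers from causing trouble.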

File data\SPY.csv does not exist

I just started learning Python and am trying to read a CSV file with pandas.
import os
import pandas as pd
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "C:\\Anaconda\\SPY.csv"))
But I get the error:
File data\SPY.csv does not exist
I tried with both one and two / and \, and with ' instead of ".
This is the file path: C:\Anaconda\SPY.csv
(This is a file from yahoo finance. I first tried to call to yahoo but was unable so instead I just downloaded the file and saved it as a CSV)
The error occurs because you are joining the file path onto your current directory, which is named "data", but your file is actually in "Anaconda".
Try simply:
import pandas as pd
df = pd.read_csv("C:\\Anaconda\\SPY.csv")
If you really want to use os.path.join, this should do:
import pandas as pd
import os
path = os.path.join("C:\\", "Anaconda", "SPY.csv")  # "C:\\" rather than "C:", which would give the drive-relative path C:Anaconda\SPY.csv
df = pd.read_csv(path)
Also, if your SPY.csv file is in the same directory as your Python file (and you run the script from that directory), you can replace the path with a simple SPY.csv
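To check how Windows-style joining treats these pieces without touching any files, the standard-library ntpath module (what os.path is on Windows) can be called directly on any platform. This is just an illustration of the joining rules, using the path from the question.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

# A bare drive letter yields a path relative to the current directory on that drive.
drive_relative = ntpath.join("C:", "Anaconda", "SPY.csv")

# Including the root separator yields an absolute path.
absolute = ntpath.join("C:\\", "Anaconda", "SPY.csv")

# An absolute later component discards everything before it.
overridden = ntpath.join("data", "C:\\Anaconda\\SPY.csv")

print(drive_relative)  # C:Anaconda\SPY.csv
print(absolute)        # C:\Anaconda\SPY.csv
print(overridden)      # C:\Anaconda\SPY.csv
```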
