I have an issue with my code and I'm not sure what is going wrong.
The background of my issue:
I use pandas to query share price data from the web (multiple stocks).
Then I export the data into an existing Excel file.
The DataFrame does contain data.
But the file has no data after the script completes (I tried both ExcelWriter and itertuples, without success).
Please help, much appreciated. Please see the code below:
import os
import openpyxl as op
import pandas as pd
wb = op.load_workbook(file_location)
full_path = os.path.abspath(file_location)
for stock in stocklist:
    if stock in avail_sheets:
        # Delete existing tabs for a fresh start.
        wb.remove(wb[stock])
    wb.create_sheet(stock)
    symbol = stock + ".AX"  # to specify an ASX stock
    url = get_url(symbol, start_date, end_date)
    stock_data = pd.read_csv(url)
    writer = pd.ExcelWriter(full_path)
    stock_data.to_excel(writer, sheet_name=stock, index=False, header=True)
    writer.save()
    # current_sheet = wb[stock]
    # for row in stock_data.itertuples(index=False):
    #     current_sheet.append(row)
wb.save(file_location)
As per the pandas documentation, you should use ExcelWriter as a context manager, especially if you want to save to multiple sheets, and you have to specify the mode for writing the file:
'w' = write (the default; creates a new file, replacing any existing one).
'a' = append (adds to an existing file; for .xlsx files only the openpyxl engine supports it).
If you only need one sheet and don't mind replacing the whole file, just pass the output .xlsx path to the .to_excel() method and specify the sheet name.
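```
# for a single sheet (this replaces the whole file)
stock_data.to_excel('output.xlsx', sheet_name=stock, index=False, header=True)
# for multiple sheets, or a single sheet added to an existing file;
# if_sheet_exists='replace' (pandas >= 1.3) overwrites a sheet that is already there
with pd.ExcelWriter('output.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    stock_data.to_excel(writer, sheet_name=stock, index=False, header=True)
```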
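The underlying problem in the question's loop is probably that a new ExcelWriter in the default 'w' mode recreates the file on every iteration, so only the last stock survives, and the final wb.save(file_location) then overwrites that file again with the openpyxl workbook loaded at the start, whose newly created sheets are empty. A minimal sketch of the whole job with a single writer, assuming pandas >= 1.3 and openpyxl; write_stocks and the frames dictionary are hypothetical names, not part of the original code:

```python
import os

import pandas as pd


def write_stocks(path, frames):
    """Write each DataFrame in `frames` (sheet name -> DataFrame) to the workbook
    at `path`, replacing sheets with the same name and keeping all other sheets."""
    if os.path.exists(path):
        # append mode edits the existing file; 'replace' swaps out a stale sheet
        options = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        # append mode needs an existing file, so create it on the first run
        options = {"mode": "w"}
    # one writer for all stocks, saved once when the with-block closes it
    with pd.ExcelWriter(path, engine="openpyxl", **options) as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False, header=True)
```

In the question's script, frames could be built as {stock: pd.read_csv(get_url(stock + ".AX", start_date, end_date)) for stock in stocklist}, with get_url, stocklist, start_date and end_date taken from the original code; the separate openpyxl workbook (wb) is then no longer needed.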
Related
I have a CSV file which has multiple sheets in it. I want to read it sheet by sheet, filter some data, and create a CSV file in the same format. How can I do that? Please suggest. I was trying it with pandas.ExcelReader, but it's not working for CSV files.
You can use the following code for this; it may help!
import pandas as pd
def read_excel_sheets(xls_path):
    """Read all sheets of an Excel workbook and return a single DataFrame"""
    print(f'Loading {xls_path} into pandas')
    xl = pd.ExcelFile(xls_path)
    frames = []
    columns = None
    for idx, name in enumerate(xl.sheet_names):
        print(f'Reading sheet #{idx}: {name}')
        sheet = xl.parse(name)
        if idx == 0:
            # Save column names from the first sheet to match for append
            columns = sheet.columns
        sheet.columns = columns
        frames.append(sheet)
    # Stack the sheets with a fresh index (DataFrame.append was removed in pandas 2.0)
    return pd.concat(frames, ignore_index=True)
The source for this code is linked here.
For converting it back to CSV, you can follow the post linked here.
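A CSV file cannot hold more than one sheet, so "the same format" presumably means an Excel workbook whose sheets each become one CSV file. A sketch under that assumption; sheets_to_csv and its keep filter argument are hypothetical names:

```python
import os

import pandas as pd


def sheets_to_csv(xls_path, out_dir, keep=None):
    """Write every sheet of an Excel workbook to its own CSV file in `out_dir`.

    `keep` is an optional filter: a function that takes a sheet's DataFrame and
    returns a boolean mask of the rows to keep. Returns the paths written.
    """
    # sheet_name=None reads every sheet into a dict: sheet name -> DataFrame
    sheets = pd.read_excel(xls_path, sheet_name=None)
    written = []
    for name, df in sheets.items():
        if keep is not None:
            df = df[keep(df)]
        out_path = os.path.join(out_dir, f"{name}.csv")
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

For example, sheets_to_csv("data.xlsx", "out", keep=lambda df: df["Price"] > 0) would write one filtered CSV per sheet, where "data.xlsx", "out" and the "Price" column are placeholders.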
I'm trying to write a script for an Excel VBA workbook that has two tabs: the first is a graph and the second is a backend sheet. The backend is updated from a master file. In the master file there is a City column; I want to loop through all the unique cities, write each city's rows into the VBA file, and save the VBA file under the city's name.
master_backend = pd.read_excel(path)
city = master_backend[(master_backend["City"]=="NY")]
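def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    # Written for older pandas: assigning writer.book and calling writer.save()
    # are no longer possible in pandas 2.0
    from openpyxl import load_workbook
    import pandas as pd
    # ignore the [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define FileNotFoundError if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # open the existing workbook, keeping its VBA project
        writer.book = load_workbook(filename, keep_vba=True)
        # start after the last used row if startrow was not given
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # optionally empty the target sheet, keeping its position
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            idx = writer.book.sheetnames.index(sheet_name)
            writer.book.remove(writer.book.worksheets[idx])
            writer.book.create_sheet(sheet_name, idx)
        # copy the existing sheets into the writer
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, it will be created
        pass
    if startrow is None:
        startrow = 0
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    writer.save()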
Essentially, what I want is 5 files, since there are 5 cities, each file named after its city.
As I don't know VBA and you posted this under the python tag, I'll provide my take on this.
Assuming the path of your template workbook is stored in file, you could try something like this:
import shutil
for city in master_backend.City.unique():
    df = master_backend.loc[master_backend.City == city]
    # copy the macro-enabled template, keeping the .xlsm extension so the VBA survives
    shutil.copy(file, f"{city}.xlsm")
    append_df_to_excel(f"{city}.xlsm", df, sheet_name='Backend')
Cracking function, by the way! I would put some docstrings in it for ease of use :)
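Since pandas 2.0 no longer allows assigning ExcelWriter.book (which append_df_to_excel relies on), the same copy-then-write idea can be sketched with append mode instead, assuming pandas 1.4 or later for if_sheet_exists='overlay'. The helper name split_by_city, the template argument and the 'Backend' default are assumptions; engine_kwargs={'keep_vba': True} is forwarded to openpyxl.load_workbook so the macros of an .xlsm template are kept:

```python
import os
import shutil

import pandas as pd


def split_by_city(master, template, out_dir, sheet_name="Backend"):
    """Copy `template` once per unique City and write that city's rows of
    `master` into `sheet_name` of the copy. Returns the paths written."""
    ext = os.path.splitext(template)[1]  # keep .xlsm for macro workbooks
    keep_vba = ext.lower() == ".xlsm"
    written = []
    for city in master["City"].unique():
        out_path = os.path.join(out_dir, f"{city}{ext}")
        shutil.copy(template, out_path)
        # engine_kwargs is passed on to openpyxl.load_workbook;
        # 'overlay' writes into the existing sheet instead of raising an error
        with pd.ExcelWriter(out_path, engine="openpyxl", mode="a",
                            if_sheet_exists="overlay",
                            engine_kwargs={"keep_vba": keep_vba}) as writer:
            master[master["City"] == city].to_excel(
                writer, sheet_name=sheet_name, index=False)
        written.append(out_path)
    return written
```

Because 'overlay' writes over existing cells without clearing the rest of the sheet, this assumes the template's Backend sheet holds no old rows below the header. openpyxl also does not read every kind of object in a workbook (its documentation warns that shapes, for example, are lost), so it is worth checking that the chart tab survives.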
I think you can simplify this script significantly by understanding that pandas will create a DataFrame for you when you read the Excel file. Then it's just a matter of collecting the information you want from the DataFrame and re-writing it to a file. It's unclear what you want in your new file, but suppose you just want to filter the second sheet and keep everything in the first sheet; it might look like this.
# Open the file.
# NOTE: pass sheet_name=None so that every sheet is read; the result is then
# a dictionary of DataFrames keyed on the sheet name
master_data = pd.read_excel(file_path, sheet_name=None, ....)
# Assuming the second sheet's name is 'City'
city_df = master_data['City']
# Replace 'columnName' with the name of the column (if the sheet has headers) or the column number
for city in pd.unique(city_df['columnName']):
    with pd.ExcelWriter(f'{city}.xlsx') as writer:
        master_data['Sheet1'].to_excel(writer, sheet_name='Sheet1')
        city_df[city_df['columnName'] == city].to_excel(writer, sheet_name='City')
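The filtering loop can also be written with groupby, which yields each city together with its rows in one pass. A sketch, assuming sheets is the dictionary returned by pd.read_excel(file_path, sheet_name=None) and that the city column is literally named 'City'; write_city_files and its keyword defaults are hypothetical:

```python
import os

import pandas as pd


def write_city_files(sheets, out_dir, city_sheet="City", city_col="City",
                     other_sheet="Sheet1"):
    """Write one workbook per city: `other_sheet` copied unchanged and
    `city_sheet` filtered to that city's rows. Returns the paths written."""
    written = []
    # groupby yields each distinct value with its matching rows, sorted by value
    for city, rows in sheets[city_sheet].groupby(city_col):
        out_path = os.path.join(out_dir, f"{city}.xlsx")
        with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
            sheets[other_sheet].to_excel(writer, sheet_name=other_sheet, index=False)
            rows.to_excel(writer, sheet_name=city_sheet, index=False)
        written.append(out_path)
    return written
```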
I want to merge multiple Excel files (1.xlsm, 2.xlsm, ...) into a macro-enabled [A.xlsm] file with 3 sheets,
so I tried to merge them like this:
# input_file = (./*.xlsx)
all_data = pd.DataFrame()
for f in input_file:
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True, sort=False)
writer = pd.ExcelWriter('A.xlsm', engine='openpyxl')
all_data.to_excel(writer, 'Sheet1')
writer.save()
The code doesn't raise an error,
but the resulting file [A.xlsm] fails to open,
so I changed the extension to A.xlsx and opened it.
It opens fine, but all the sheets and the macro have disappeared.
How can I merge multiple xlsx files into an xlsm file with macros?
I believe that if you want to use macro-enabled workbooks, you need to load them with keep_vba=True:
from openpyxl import load_workbook
XlMacroFile = load_workbook('A.xlsm', keep_vba=True)
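The [A.xlsm] file most likely fails to open because ExcelWriter builds a brand-new workbook without a VBA project and saves it under the .xlsm extension, replacing the original sheets and macro. A sketch that instead opens the existing workbook with keep_vba=True and rewrites only one sheet; merge_into_workbook and the 'Sheet1' target sheet are assumptions, and pd.concat stands in for the removed DataFrame.append:

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def merge_into_workbook(input_files, target, sheet_name="Sheet1"):
    """Stack the first sheet of every input file and write the result into
    `sheet_name` of the existing workbook `target`, leaving its other sheets
    (and, for .xlsm files, its macros) in place."""
    merged = pd.concat([pd.read_excel(f) for f in input_files],
                       ignore_index=True, sort=False)
    wb = load_workbook(target, keep_vba=target.lower().endswith(".xlsm"))
    if sheet_name in wb.sheetnames:
        # clear the old data by recreating the sheet at the same position
        idx = wb.sheetnames.index(sheet_name)
        wb.remove(wb[sheet_name])
        ws = wb.create_sheet(sheet_name, idx)
    else:
        ws = wb.create_sheet(sheet_name)
    # header row first, then one worksheet row per DataFrame row
    for row in dataframe_to_rows(merged, index=False, header=True):
        ws.append(row)
    wb.save(target)
    return merged
```

For the question this would be called as merge_into_workbook(['1.xlsm', '2.xlsm'], 'A.xlsm'), where 'A.xlsm' is the original macro-enabled workbook rather than a fresh file.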
To preserve separate sheets, you can do something like this:
df_list = ...   # list of your DataFrames
filename = ...  # name of your output file
with pd.ExcelWriter(filename) as writer:
    for i, df in enumerate(df_list, start=1):
        # every DataFrame needs its own sheet name, or they all land on one sheet
        df.to_excel(writer, sheet_name=f'Sheet{i}')
This will write each dataframe in a separate sheet in your output excel file.
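When the inputs are files rather than DataFrames already in memory, a variant of that loop can name each sheet after its source file. A sketch; files_to_sheets is a hypothetical helper, and sheet names are cut to Excel's 31-character limit:

```python
from pathlib import Path

import pandas as pd


def files_to_sheets(input_files, output):
    """Copy the first sheet of each input file into its own sheet of `output`,
    named after the file's stem (e.g. '1.xlsm' -> '1')."""
    with pd.ExcelWriter(output, engine="openpyxl") as writer:
        for f in input_files:
            df = pd.read_excel(f)
            df.to_excel(writer, sheet_name=Path(f).stem[:31], index=False)
```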
I need to write a program that scrapes a daily quote from a certain web page and collects the quotes into a single Excel file. I wrote something that finds the next empty row and starts writing new quotes there, but it deletes the previous rows too:
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
.
.
.
z = 1
x = sheet['A{}'.format(z)].value
while x != None:
    x = sheet['A{}'.format(z)].value
    z += 1
writer = pd.ExcelWriter('quote.xlsx')
df.to_excel(writer, sheet_name='Sheet1', na_rep='', float_format=None,
            columns=['Date', 'Time', 'Price'], header=True, index=False,
            index_label=None, startrow=z-1, startcol=0, engine=None,
            merge_cells=True, encoding=None, inf_rep='inf', verbose=True,
            freeze_panes=None)
writer.save()
Question: How to write to an existing Excel file without losing previous information
With openpyxl, use append() to write after the last used row:
import openpyxl
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb['Sheet1']  # get_sheet_by_name() is deprecated
rowData = ['2017-08-01', '16:31', 1.23]
sheet.append(rowData)
wb.save('gold_quote.xlsx')
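To make a pandas ExcelWriter created with engine='openpyxl' write into this same workbook, attach the workbook to it (older pandas only; pandas 2.0 no longer allows assigning ExcelWriter.book):
writer.book = wb
writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)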
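With pandas 1.4 or later, the same "write after the last used row" idea can be sketched with append mode and if_sheet_exists='overlay', so the existing rows are never rewritten. append_rows is a hypothetical helper, and it assumes the sheet already exists and has a header row:

```python
import pandas as pd
from openpyxl import load_workbook


def append_rows(path, df, sheet_name="Sheet1"):
    """Write the rows of `df` below the last used row of `sheet_name`,
    leaving everything already in the workbook untouched."""
    # openpyxl's max_row is the 1-based number of the last used row, which is
    # exactly the 0-based startrow of the first empty row
    start = load_workbook(path)[sheet_name].max_row
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet_name, startrow=start,
                    index=False, header=False)
```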
I figured it out: first define a reader to read the existing data in the Excel file, then concatenate the data newly extracted from the web and write everything back with a writer. We should drop duplicates, otherwise every run of the program adds many duplicated rows. Then we can write the previous and new data together:
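import pandas as pd
# read every existing sheet into memory first (sheet name -> DataFrame),
# so the file is no longer needed when the writer recreates it
existing = pd.read_excel('gold_quote.xlsx', sheet_name=None)
to_update = {"Sheet1": df}
with pd.ExcelWriter('gold_quote.xlsx') as excel_writer:
    for sheet, sheet_df in existing.items():
        append_df = to_update.get(sheet)
        if append_df is not None:
            # add the new rows and drop rows already saved by earlier runs
            sheet_df = pd.concat([sheet_df, append_df]).drop_duplicates()
        sheet_df.to_excel(excel_writer, sheet_name=sheet, index=False)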
I have a web scraper that creates an Excel file for the current month's scrapes. Every time it runs, I want to add that scrape to the month's file as a new sheet. My issue, however, is that it overwrites the existing sheet with a new one instead of adding it as a separate sheet. I've tried to do it with xlrd, xlwt, pandas, and openpyxl.
I'm still brand new to Python, so simplicity is appreciated!
Below is just the code dealing with writing the Excel file.
# My relevant time variables
ts = time.time()
date_time = datetime.datetime.fromtimestamp(ts).strftime('%y-%m-%d %H_%M_%S')
HourMinuteSecond = datetime.datetime.fromtimestamp(ts).strftime('%H_%M_%S')
month = datetime.datetime.now().strftime('%m-%y')
# Creates a writer for this month and year
writer = pd.ExcelWriter(
    'C:\\Users\\G\\Desktop\\KickstarterLinks(%s).xlsx' % (month),
    engine='xlsxwriter')
# Creates dataframe from my data, d
df = pd.DataFrame(d)
# Writes to the excel file
df.to_excel(writer, sheet_name='%s' % (HourMinuteSecond))
writer.save()
Update:
This functionality has been added to pandas 0.24.0:
ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH3441)
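Applied to the question's code, a sketch of that mode option might look like this. save_scrape, the folder argument and the optional when timestamp are hypothetical, while the file and sheet name patterns follow the question; the engine switches to openpyxl because xlsxwriter cannot append to an existing file:

```python
import datetime as dt
import os

import pandas as pd


def save_scrape(df, folder, when=None):
    """Add `df` as a new sheet (named HH_MM_SS) to this month's workbook,
    creating the workbook on the first run of the month."""
    when = dt.datetime.now() if when is None else when
    path = os.path.join(folder, when.strftime("KickstarterLinks(%m-%y).xlsx"))
    # mode='a' needs an existing file, so the first run of a month uses 'w'
    mode = "a" if os.path.exists(path) else "w"
    with pd.ExcelWriter(path, engine="openpyxl", mode=mode) as writer:
        df.to_excel(writer, sheet_name=when.strftime("%H_%M_%S"))
    return path
```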
Previous version:
Pandas has an open feature request for this.
In the meantime, here is a function which adds a pandas.DataFrame to an existing workbook:
Code:
def add_frame_to_workbook(filename, tabname, dataframe, timestamp):
    """
    Save a dataframe to a workbook tab with the filename and tabname
    coded to timestamp
    :param filename: filename to create, can use strftime formatting
    :param tabname: tabname to create, can use strftime formatting
    :param dataframe: dataframe to save to workbook
    :param timestamp: timestamp associated with dataframe
    :return: None
    """
    filename = timestamp.strftime(filename)
    sheet_name = timestamp.strftime(tabname)
    # create a writer for this month and year
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # copy existing sheets
        writer.sheets = dict(
            (ws.title, ws) for ws in writer.book.worksheets)
    except IOError:
        # file does not exist yet, we will create it
        pass
    # write out the new sheet
    dataframe.to_excel(writer, sheet_name=sheet_name)
    # save the workbook
    writer.save()
Test Code:
import datetime as dt
import pandas as pd
from openpyxl import load_workbook
data = [x.strip().split() for x in """
Date Close
2016-10-18T13:44:59 2128.00
2016-10-18T13:59:59 2128.75
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
name_template = './sample-%m-%y.xlsx'
tab_template = '%d_%H_%M'
now = dt.datetime.now()
in_an_hour = now + dt.timedelta(hours=1)
add_frame_to_workbook(name_template, tab_template, df, now)
add_frame_to_workbook(name_template, tab_template, df, in_an_hour)
(Source)