Passing a file that hasn't been created as input -- Python

I'm attempting to create a script that pulls an HTML table using pandas, does some other intermediary steps, and then transposes the data into an Excel file.
The problem is, I want to pass the website, the ship's name, and then the subsequent filename that is created into the script but it keeps erroring out stating the file doesn't exist. I know it doesn't exist because it hasn't been created by the program.
Is there a way to run through the script passing the intended filename to be created as input? Thanks!
import pandas as pd
import os
import openpyxl
#This segment of code initially grabs the webpage from the website
webpage = input("Enter the webpage here: ")
ship_name = input("Enter the name of the ship here: ")
df = pd.read_html(webpage, skiprows=[7,14,15,16], index_col=None)
df[0].to_excel(ship_name + ".xlsx")
#This code segment does an initial clean up of the data: Gets rid of copied column data that comes over due to the colspan=3 tag in the original html source code
filename = ("aase.xlsx")
wb = openpyxl.load_workbook(filename)
sheet = wb['Sheet1']
status = sheet.cell(sheet.min_row, 1).value
print(status)
sheet.delete_rows(1)
sheet.delete_cols(3,2)
wb.save(filename)

Instead of calling this:
wb = openpyxl.load_workbook(filename)
Create a blank workbook in memory:
from openpyxl import Workbook
wb = Workbook()
Then save it later.
You could also refer to the "Simple Usage" documentation for a clearer example of this.
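Another way to avoid the "file doesn't exist" error, sketched under the assumption that the workbook being cleaned up is the same one the script has just written: build the output path once from ship_name and reuse it, falling back to a blank workbook when nothing is on disk yet. The helper name open_or_create_workbook and the sample ship name are illustrative and not part of the original script.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def open_or_create_workbook(path):
    """Load the workbook at `path` if it exists, otherwise start a blank one."""
    path = Path(path)
    if path.exists():
        return load_workbook(path)
    return Workbook()  # exists only in memory until wb.save(path) is called


# Hypothetical ship name; the real script would take it from input().
ship_name = "aase"

with tempfile.TemporaryDirectory() as tmp:
    filename = Path(tmp) / f"{ship_name}.xlsx"  # one path, reused everywhere

    wb = open_or_create_workbook(filename)  # nothing on disk yet: blank workbook
    wb.active["A1"] = "status"
    wb.save(filename)  # this call is what creates the file

    reopened = open_or_create_workbook(filename)  # now loaded from disk
    first_cell = reopened.active["A1"].value
    reopened.close()
```

The temporary directory only keeps the sketch from leaving files behind; in the real script the path would simply be ship_name + ".xlsx".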

Related

Pandas dataframe to specific sheet in a excel file without losing formatting

I have a dataframe like as shown below
Date,cust,region,Abr,Number,
12/01/2010,Company_Name,Somecity,Chi,36,
12/02/2010,Company_Name,Someothercity,Nyc,156,
df = pd.read_clipboard(sep=',')
I would like to write this dataframe to a specific sheet (called temp_data) in the file output.xlsx
Therefore I tried the below
import pandas
from openpyxl import load_workbook
book = load_workbook('output.xlsx')
writer = pandas.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
I also tried the below
path = 'output.xlsx'
with pd.ExcelWriter(path) as writer:
    writer.book = openpyxl.load_workbook(path)
    final_df.to_excel(writer, sheet_name='temp_data',startrow=10)
writer.save()
But I am not sure whether I am overcomplicating it. I get the error shown below, even though I verified in Task Manager that no Excel file/task is running.
BadZipFile: File is not a zip file
Moreover, I also lose my formatting of the output.xlsx file when I manage to write the file based on below suggestions. I already have a neatly formatted font,color file etc and just need to put the data inside.
Is there any way to write the pandas dataframe to a specific sheet in an existing Excel file WITHOUT LOSING THE FORMATTING OF THE DESTINATION FILE?
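On the BadZipFile error above: an .xlsx file is a ZIP archive, so openpyxl raises this when the file it is asked to load is empty or is not an archive at all. One plausible cause in the second attempt (an assumption, not verified against the asker's setup) is that pd.ExcelWriter(path) opens the file for writing before load_workbook(path) gets to read it. A standard-library check like the sketch below can confirm whether the file on disk is still a valid archive; looks_like_xlsx is a hypothetical helper name.

```python
import tempfile
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` exists and is a ZIP archive, as every .xlsx file is."""
    path = Path(path)
    return path.is_file() and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty.xlsx"
    empty.touch()  # zero-byte file, like a workbook truncated before it was read

    archive = Path(tmp) / "archive.xlsx"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("placeholder.txt", "any zip member")  # not a real workbook

    empty_ok = looks_like_xlsx(empty)
    archive_ok = looks_like_xlsx(archive)
    missing_ok = looks_like_xlsx(Path(tmp) / "missing.xlsx")
```

The check only tells you the container is intact, not that it holds a valid workbook, but it is enough to tell a truncated output.xlsx apart from a locked or corrupted one.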
You need to just use to_excel from pandas dataframe.
Try below snippet:
df1.to_excel("output.xlsx",sheet_name='Sheet_name')
If there is existing data please try below snippet:
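# Note: assigning writer.book / writer.sheets is a pattern for older pandas releases
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('output.xlsx')
# register the existing sheets so pandas writes into them
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# read the existing rows to find out where to start writing
reader = pd.read_excel('output.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
writer.close()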
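On recent pandas releases the writer can open the existing workbook itself. Here is a minimal sketch, assuming pandas 1.4 or later (where if_sheet_exists="overlay" is available) and the openpyxl engine. The template built below is only a stand-in for the asker's formatted file, and the column names are taken from the sample data in the question.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

df = pd.DataFrame({"cust": ["Company_Name", "Company_Name"], "Number": [36, 156]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.xlsx"

    # Stand-in for the pre-formatted file: one sheet with a bold header row.
    template = Workbook()
    ws = template.active
    ws.title = "temp_data"
    ws.append(["cust", "Number"])
    for cell in ws[1]:
        cell.font = Font(bold=True)
    template.save(path)

    # mode="a" opens the existing workbook; "overlay" writes into the existing sheet.
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        df.to_excel(
            writer, sheet_name="temp_data", index=False, header=False, startrow=1
        )

    result = load_workbook(path)["temp_data"]
    header_still_bold = result["A1"].font.bold
    first_row = [c.value for c in result[2]]
```

Features that openpyxl cannot read from the original file may still be lost on save, so it is worth trying this on a copy of the formatted workbook first.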
Are you restricted to using pandas or openpyxl?
Because if you're comfortable using other libraries, the easiest way is probably using win32com to puppet excel as if you were a user manually copying and pasting the information over.
import pandas as pd
import io
import win32com.client as win32
import os
csv_text = """Date,cust,region,Abr,Number
12/01/2010,Company_Name,Somecity,Chi,36
12/02/2010,Company_Name,Someothercity,Nyc,156"""
df = pd.read_csv(io.StringIO(csv_text),sep = ',')
temp_path = r"C:\Users\[User]\Desktop\temp.xlsx" #temporary location where to write this dataframe
df.to_excel(temp_path,index = False) #temporarily write this file to excel, change the output path as needed
excel = win32.Dispatch("Excel.Application")
excel.Visible = True #Switch these attributes to False if you'd prefer Excel to be invisible while executing this script
excel.ScreenUpdating = True
temp_wb = excel.Workbooks.Open(temp_path)
temp_ws = temp_wb.Sheets("Sheet1")
output_path = r"C:\Users\[User]\Desktop\output.xlsx" #Path to your output excel file
output_wb = excel.Workbooks.Open(output_path)
output_ws = output_wb.Sheets("Output_sheet")
temp_ws.Range('A1').CurrentRegion.Copy(Destination = output_ws.Range('A1')) # Feel free to modify the Cell where you'd like the data to be copied to
input('Check that output looks like you expected\n') # Added pause here to make sure script doesn't overwrite your file before you've looked at the output
temp_wb.Close()
output_wb.Close(True) #Close output workbook and save changes
excel.Quit() #Close excel
os.remove(temp_path) #Delete temporary excel file
Let me know if this achieves what you were after.
I spent all day on this (and a co-worker of mine spent even longer). Thankfully, it seems to work for my purposes: pasting a dataframe into an Excel sheet without changing any of the Excel source formatting. It requires the pywin32 package, which "drives" Excel as if it were a user, using VBA.
import pandas as pd
from win32com import client
# Grab your source data any way you please - I'm defining it manually here:
df = pd.DataFrame([
['LOOK','','','','','','','',''],
['','MA!','','','','','','',''],
['','','I pasted','','','','','',''],
['','','','into','','','','',''],
['','','','','Excel','','','',''],
['','','','','','without','','',''],
['','','','','','','breaking','',''],
['','','','','','','','all the',''],
['','','','','','','','','FORMATTING!']
])
# Copy the df to clipboard, so we can later paste it as text.
df.to_clipboard(index=False, header=False)
excel_app = client.gencache.EnsureDispatch("Excel.Application") # Initialize instance
wb = excel_app.Workbooks.Open("Template.xlsx") # Load your (formatted) template workbook
ws = wb.Worksheets(1) # First worksheet becomes active - you could also refer to a sheet by name
ws.Range("A3").Select() # Only select a single cell using Excel nomenclature, otherwise this breaks
ws.PasteSpecial(Format='Unicode Text') # Paste as text
wb.SaveAs("Updated Template.xlsx") # Save our work
excel_app.Quit() # End the Excel instance
In general, when using the win32com approach, it's helpful to record yourself (with a macro) doing what you want to accomplish in Excel, then reading the generated macro code. Often this will give you excellent clues as to what commands you could invoke.
The solution to your problem exists here: How to save a new sheet in an existing excel file, using Pandas?
To add a new sheet from a df:
import pandas as pd
from openpyxl import load_workbook
import os
import numpy as np
os.chdir(r'C:\workdir')
path = 'output.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book
### replace with your df ###
x = np.random.randn(100, 2)
df = pd.DataFrame(x)
df.to_excel(writer, sheet_name = 'x')
writer.save()
writer.close()
You can try xltpl.
Create a template file based on your output.xlsx file.
Render a file with your data.
from xltpl.writerx import BookWriterx
writer = BookWriterx('template.xlsx')
d = {'rows': df.values}
d['tpl_name'] = 'tpl_sheet'
d['sheet_name'] = 'temp_data'
writer.render_sheet(d)
d['tpl_name'] = 'other_sheet'
d['sheet_name'] = 'other'
writer.render_sheet(d)
writer.save('out.xls')
See examples.

Python code not writing output in excel sheet but is able to take input from another sheet in same workbook

Background:
I am fetching Option chain for a symbol from web and then writing it to an excel sheet. I also have another excel sheet in the same workbook from which I take inputs for the program to run. All of this I am doing with excel 2016.
Sample of the code from program as the whole program is pretty long:
import pandas as pd
import xlwings as xw
excel_file = 'test.xlsx'
wb = xw.Book(excel_file)
wb.save()
# Fetching User input for Script/Ticker else it will be set to NIFTY as default
try:
    Script_Input = pd.read_excel(excel_file, sheet_name = 'Input_Options', usecols = 'C')
    script = Script_Input.iloc[0,0]
except:
    script = 'NIFTY'
# Writing data in the sheet
sht_name = script + '_OC'
try:
    wb.sheets.add(sht_name)
    print('new sheet added')
    wb.save()
except:
    pass
    # print('sheet already present')
# directing pointer towards current sheet to be written
sheet = wb.sheets(sht_name)
sheet.range('A4').options(index = False, header = False).value = df
sheet.range('B1').value = underlying
sheet.range('C1').value = underlying_Value
# sheet.range('A3').options(index = False, header = False).value = ce_data_final
# sheet.range('J3').options(index = False, header = False).value = pe_data_final
wb.save()
Problem: Since yesterday, I am able to open my excel workbook with excel 2016 and change inputs for my program but, I do not get any data written in the sheet that takes output from the program. The program runs perfectly as I can test the output on terminal. Also, once I delete the sheet no new sheet is being created as it should.
What I tried: I have uninstalled every other version of excel I had, so now only excel 2016 is present.
I have made sure that all the respective file formats use excel 2016 as the default app.
Also note that, 2 days ago I was able to write data perfectly in the respective sheet but now I am not able to do so.
Any help appreciated...
Sorry to everyone who tried to solve this question.
After #buran asked about 'df', I looked into my code and found that I had a return statement before the line that writes 'df' into the sheet (I have created a separate function to write data in Excel). Now that I have moved that statement to its proper place, the code is working fine. I am extremely sorry, as I did not realise what the problem was in the first place and assumed it had to do with Excel and Python. Now the program runs perfectly and I am getting the output I want.
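For anyone hitting the same symptom (no error, but nothing is written), here is a minimal illustration of that kind of bug. The function names are made up for this sketch, and a plain list stands in for the worksheet so that it runs without Excel.

```python
def write_rows_buggy(rows, sheet):
    """`sheet` is a plain list standing in for a worksheet."""
    cleaned = [row for row in rows if row]
    return cleaned  # returns here, so the write below is never reached
    sheet.extend(cleaned)


def write_rows_fixed(rows, sheet):
    cleaned = [row for row in rows if row]
    sheet.extend(cleaned)  # write first
    return cleaned  # then return


rows = [["NIFTY", 1], [], ["BANKNIFTY", 2]]
buggy_sheet, fixed_sheet = [], []
write_rows_buggy(rows, buggy_sheet)
write_rows_fixed(rows, fixed_sheet)
print(len(buggy_sheet), len(fixed_sheet))  # prints: 0 2
```

Python does not warn about statements placed after a return, so the function still returns the right data to the terminal while the sheet stays empty.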

Is there a way to save data in named Excel cells using Python?

I have used openpyxl for outputting values in Excel in my Python code. However, now I find myself in a situation where the cell locations in excel file may change based on the user. To avoid any problems with the program, I want to name the cells where the code can save the output to. Is there any way to have Python interact with named ranges in Excel?
For a workbook level defined name
import openpyxl
wb = openpyxl.load_workbook("c:/tmp/SO/namerange.xlsx")
ws = wb["Sheet1"]
mycell = wb.defined_names['mycell']
for title, coord in mycell.destinations:
    ws = wb[title]
    ws[coord] = "Update"
wb.save('updated.xlsx')
print("{} {} updated".format(ws,coord))
I was able to find the parameters of the named range using defined_names. After that I just worked like it was a normal Excel cell.
from openpyxl import load_workbook
openWB=load_workbook('test.xlsx')
rangeDestination = openWB.defined_names['testCell']
print(rangeDestination)
sheetName=str(rangeDestination.attr_text).split('!')[0]
cellName = str(rangeDestination.attr_text).split('!')[1]
sheetToWrite=openWB[sheetName]
cellToWrite=sheetToWrite[cellName]
sheetToWrite[cellName]='TEST-A3'
print(sheetName)
print(cellName)
openWB.save('test.xlsx')
openWB.close()

Using openpyxl is it possible to preserve the print area?

I use the following to write pandas dataframes to a preformatted excel template and then save with a different name:
def writer(self):
    '''
    Calls the excel writer function to create an object for writing out the
    report to excel. It loads an excel template, populates it and then
    saves the file. '''
    book = load_workbook(os.path.join(self.env.REPORT_TEMPLATE_PATH
                                      , self.env.REPORT_TEMPLATE))
    writer = pd.ExcelWriter(self.filename()
                            , engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    return writer
Named Ranges are used to avoid broken links.
The issue I have is that the code when run outputs this:
\Continuum\Anaconda3\lib\sitepackages\openpyxl\workbook\names\named_range.py:124: UserWarning: Discarded range with reserved name
warnings.warn("Discarded range with reserved name")
Everything seems to be fine in terms of the results but the print area is reset.
Is there a way to preserve the print area from the template in the new file?
Is there a way to see what is being discarded?
This will be possible in openpyxl 2.4. Until this is released you will have to manually recreate the print settings.
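Recreating the print area by hand can be as short as setting the worksheet's print_area attribute before saving. This is a sketch that assumes you know the range used in the template; the range and file name below are hypothetical.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.xlsx"

    wb = Workbook()
    ws = wb.active
    ws["A1"] = "report"
    ws.print_area = "A1:D10"  # hypothetical range; use the template's real one
    wb.save(path)

    # Read the setting back to confirm it was stored with the file.
    saved_area = str(load_workbook(path).active.print_area)
```

The exact form of the value read back (a string or a list, with or without the sheet name) differs between openpyxl versions, so the sketch only converts it to a string.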

How to get an Excel sheet by its code name property with Python

I want to get an Excel sheet with Python. I can do this with the sheet's name, but I want to get it by its Code Name property. The following code uses the sheet's name:
from openpyxl import load_workbook
wb_donnees = load_workbook("Données.xlsm", read_only = True)
name_ws_1 = wb_donnees.sheetnames[0]
ws_1 = wb_donnees[name_ws_1]
But I want to get the sheet by its Code Name property. Is it possible?
Charlie Clark's answer works for me in read mode.
I'm not sure whether OP needed this, but when writing a new workbook, you cannot get the codename this way. Instead, you will need to specify it yourself, otherwise the function returns None, and sheets will only be codenamed 'Sheet1' etc at workbook creation.
from openpyxl import load_workbook
wb = load_workbook('input.xlsm')
wsx = wb.create_sheet('New Worksheet')
wsx.sheet_properties.codeName = 'wsx'
wb.save('output.xlsm')
The following will only work if the file is not opened in read-only mode:
from openpyxl import load_workbook
wb = load_workbook("Données.xlsm")
for n in wb.sheetnames:
    ws = wb[n]
    print(n, ws.sheet_properties.codeName)
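Building on that loop, a lookup by code name could be sketched as follows. The helper sheet_by_codename is a hypothetical name, and the workbook is built in memory with hand-set code names so that the example runs without the original .xlsm file.

```python
from openpyxl import Workbook


def sheet_by_codename(wb, code_name):
    """Return the first worksheet whose codeName matches, or None."""
    for ws in wb.worksheets:
        if ws.sheet_properties.codeName == code_name:
            return ws
    return None


# In-memory stand-in for a loaded workbook; the code names are set by hand here.
wb = Workbook()
wb.active.title = "Données"
wb.active.sheet_properties.codeName = "wsDonnees"
extra = wb.create_sheet("Autre")
extra.sheet_properties.codeName = "wsAutre"

found = sheet_by_codename(wb, "wsAutre")
missing = sheet_by_codename(wb, "wsAbsent")
```

With a real file you would pass the result of load_workbook("Données.xlsm") opened without read_only=True, as noted above.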
