I am writing a script to automate a process that updates an Excel file, plots a graph from data in that file, and then inserts the graph into the same Excel file.
I use openpyxl to read and write the Excel file and matplotlib to draw the graph, which I then insert into the same file. The data in the Excel file is updated once or twice a week, and every time it changes I need to plot an updated graph and insert it into the file. Right now my script updates the values automatically and plots the graph for the updated data, but when I insert the graph it does not overwrite the previous one. Instead, each run adds a new graph on top of the previous graph, so the size of the Excel file keeps increasing.
The code I am currently using to plot and insert the graph is:
fig = plt.figure(figsize=(8,4))
PLT = fig.add_axes([0.04, 0.08, 0.79, 0.8])
plt.xlabel("WORKING WEEK (WW)",fontsize=7)
plt.ylabel("UTILIZATION [%]",fontsize=7)
plt.title("PATCH UTILIZATION",fontsize=9)
#PLT.subplots_adjust(right=0.8)
for i in range(len(p)):
    PLT.plot(x,p[i],label = '%s'%row[0],marker="s",markersize=2)
PLT.legend(bbox_to_anchor=(1.21,1),borderaxespad=0,prop={'size':6})
PLT.tick_params(axis='both', which='major', labelsize=4)
plt.savefig("myplot.png",dpi=160)
wb=load_workbook('Excel.xlsm',read_only=False,keep_vba=True)
ws=wb['Patch Util']
img = openpyxl.drawing.image.Image("myplot.png")
img.anchor='D50'
ws.add_image(img)
wb.save('Excel.xlsm')
"x" and "p" are two lists (p is a list of lists) that contain the data and are updated whenever the data in the Excel file is updated.
What I want is to plot the graph and insert it once. Then, whenever the data is updated, I want to access the same graph in the Excel file, re-plot it for the updated data, and re-insert it, instead of inserting a new graph on top of the previous one every time, so that the size of the Excel file stays the same.
It would be a great help if anyone could help me with this.
Comment: No. I am using version 2.5.6, and in my case every graph and chart is retained.
Show me the output of the following:
from openpyxl import load_workbook
wb = load_workbook(<your file path>)
ws = wb.worksheets[<your sheet index>]
for image in ws._images:
    print("_id:{}, img.path:{}".format(image._id, image.path))
Comment: The output I got is: _id:1, img.path:/xl/media/image1.png
Question: Can we access and replace an image in an xlsx file?
You can do it by replacing the Image object in ws._images.
The first time, you have to initialise the ref data as usual with ws.add_image(...). If an image already exists (len(ws._images) == 1), you can replace it with a new Image object.
For example:
if len(ws._images) == 0:
    # Initialise the `ref` data, do ws.add_image(...)
    img = openpyxl.drawing.image.Image("myplot.png")
    img.anchor = 'D50'
    ws.add_image(img)
elif len(ws._images) == 1:
    # Replace the first image; do **only** the following:
    ws._images[0] = openpyxl.drawing.image.Image("myplot.png")
    # Update the default anchor `A1` to your needs
    ws._images[0].anchor = 'D50'
else:
    raise ValueError("Found more than 1 Image!")
Note: You are using a private class attribute, which could result in unexpected side effects.
Working with openpyxl Version 2.5.6
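The branching above can be wrapped in a small helper so the weekly script calls one function. This is only a sketch: `upsert_single_image` and `refresh_plot` are hypothetical names, and it relies on the same private `ws._images` attribute, which is assumed to behave like a plain Python list (so that appending to it has the same effect as `ws.add_image`), as it did in the version mentioned above.

```python
def upsert_single_image(images, new_image):
    """Keep at most one image in `images`: append the first, replace an existing one."""
    if len(images) == 0:
        images.append(new_image)
    elif len(images) == 1:
        images[0] = new_image
    else:
        raise ValueError("Found more than 1 image!")
    return images


def refresh_plot(workbook_path, sheet_name, png_path, anchor="D50"):
    """Hypothetical wrapper: load the workbook, swap in the new PNG, save."""
    # Imported here so the helper above can be used without openpyxl installed.
    from openpyxl import load_workbook
    from openpyxl.drawing.image import Image

    wb = load_workbook(workbook_path, read_only=False, keep_vba=True)
    ws = wb[sheet_name]
    img = Image(png_path)
    img.anchor = anchor
    # ws._images is private; this mirrors the answer above and may break
    # in other openpyxl versions.
    upsert_single_image(ws._images, img)
    wb.save(workbook_path)


# Example call (file and sheet names taken from the question):
# refresh_plot("Excel.xlsm", "Patch Util", "myplot.png")
```

Keeping the list logic separate from the openpyxl calls makes it easy to check the replace-or-append behaviour without touching a real workbook.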
Related
I have an Excel workbook that uses functions like OFFSET, UNIQUE, and FILTER, which spill into other cells. I'm using Python to analyze and write some data to the workbook, but after doing so these formulas revert into normal arrays. This means they now take up a fixed number of cells (however many they took up before opening the file in Python) instead of adjusting to fit all of the data. I can revert the change by selecting the formula and hitting enter, but there are so many of these formulas that it's more work to fix them than to just print the data to a text file and paste it into Excel manually. Is there any way to prevent this behavior?
I've been using openpyxl to open and save the workbook, but after encountering this issue also tried xlsxwriter and the dataframe to excel function from pandas. Both of them had the same issue as openpyxl. For context I am on python 3.11 and using the most recent version of these modules. I believe this issue is on the Python side and not the Excel side, so I don't think changing Excel settings will help, but maybe there is something there I missed.
Example:
I've created an empty workbook with two sheets, one called 'main' and one called 'input'. The 'main' sheet will analyze data from the 'input' sheet which will be entered with openpyxl. The data will just be values in the first column.
In cell A1 of the 'main' sheet, enter =OFFSET(input!A1,0,0,COUNTA(input!A:A),1).
This formula will just show a copy of the data. Since there currently isn't any data it gives a #REF! error, so it only takes up one cell.
Now I'll run the following python code to add the numbers 0-9 into the first column of the input sheet:
from openpyxl import load_workbook
wb = load_workbook('workbook.xlsx')
ws = wb['input']
for i in range(10):
    ws.append([i])
wb.save('workbook_2.xlsx')
When opening the new file, cell A1 on the 'main' sheet only has the first value, 0, instead of the range 0-9. When selecting the cell, you can see the formula is now {=OFFSET(input!A1,0,0,COUNTA(input!A:A),1)}. The curly brackets make it an array formula, so it won't spill. Hitting enter in the formula removes the array and the sheet properly shows the full range.
If I can get this simple example to work, then expanding it to the data I'm using shouldn't be a problem.
I have an excel file with a series of formatted charts on a tab called Charts. I have named the charts, Figure1, Figure2, Figure3, etc.
I have an existing PowerPoint template. The template has 2 placeholders per slide (so that it can accommodate 2 charts per slide).
I would like to paste Figure1 in the left placeholder of slide 3, and Figure2 in the right placeholder of slide 3. I want to do this in python as the data analysis is done in python and excel is used to share stored results with colleagues.
Attempt 1:
Attempt 1 uses win32com.client. I am following this example: How to copy chart from excel and paste it as chart into powerpoint (not image) using python
but I cannot get the syntax right to insert the chart into the placeholder. When I follow the syntax in the solution, nothing happens and I get a message
<bound method Paste of <COMObject >>
Current code:
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
pptApp = win32.Dispatch('PowerPoint.Application')
ppt = pptApp.Presentations.Open(template)
# attempt 1
wb.sheets('Charts').ChartObjects('Figure1').Copy
ppt.slides[2].Shapes.Paste
# attempt 2
wb.sheets('Charts').ChartObjects('Figure1').Copy
ppt.slides[2].placeholders[1].Paste
Attempt 2:
Attempt 2 uses python-pptx. I looked at the documentation here:
https://python-pptx.readthedocs.io/en/latest/user/placeholders-using.html
but the example involves creating an excel chart from scratch in PowerPoint (I am not sure why you would ever do that), and I can't figure out the syntax to insert an existing chart from excel.
Current code:
from pptx import Presentation
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
prs = Presentation(template)
slide = prs.slides[3]
for shape in slide.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
placeholder = prs.slides[3].placeholders[1]
placeholder.name
placeholder.placeholder_format.type
placeholder.insert_chart(wb.sheets('Charts').ChartObjects('Figure1').Copy)
Requirements:
I would like to paste the excel chart as an excel object (rather than as a picture) as my colleague likes to be able to click on data series to get the underlying values etc.
I would like to paste the chart inside a placeholder (rather than on top of it).
I don't want to generate a new chart in PowerPoint as most examples do. A colleague has prepared an excel dashboard and formatted a number of charts as desired, and doing it all from scratch would be laborious.
I don't want to do this in VBA. I would like to do it in python as it is part of a broader program.
I don't want to use something like plotnine, seaborn, matplotlib etc. As per 1, my colleague likes excel objects that he can click on to show underlying values. This can't be done in these programs.
Python should be able to do this. Any ideas?
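You're very close! Copy and Paste are methods, so to call them you need to add brackets after them, e.g. Copy(). Without the brackets Python only returns the bound method, which is the message you saw.
To get a slide, you need to use the Item method of the Slides class, e.g. ppt.Slides.Item(3) for slide 3 (COM collections are 1-based, unlike Python lists).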
import win32com.client as win32
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
pptApp = win32.Dispatch('PowerPoint.Application')
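ppt = pptApp.Presentations.Open(template)
# `window` was not defined in the original snippet; assumed here to be the active document window
window = pptApp.ActiveWindow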
slide_num = 3
LEFT_PLACEHOLDER = 3
RIGHT_PLACEHOLDER = 2
# Figure1
window.View.GotoSlide(slide_num)
wb.sheets('Charts').ChartObjects('Figure1').Copy()
ppt.Slides.Item(slide_num).Shapes.Paste().Select()
window.Selection.Cut()
ppt.Slides.Item(slide_num).Shapes(LEFT_PLACEHOLDER).Select()
window.View.Paste()
# Figure2
window.View.GotoSlide(slide_num)
wb.sheets('Charts').ChartObjects('Figure2').Copy()
ppt.Slides.Item(slide_num).Shapes.Paste().Select()
window.Selection.Cut()
ppt.Slides.Item(slide_num).Shapes(RIGHT_PLACEHOLDER).Select()
window.View.Paste()
EDIT: I was able to paste the chart directly into the placeholder after fiddling around a bit trying to implement the solution in this answer. It's a bit hacky, but it works.
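Since the Figure1 and Figure2 blocks above differ only in the chart name and the shape index, they could be folded into one function. The following is a sketch under assumptions: `paste_chart_into_placeholder` is a hypothetical name, `window` is assumed to be a PowerPoint document window (for example `pptApp.ActiveWindow`), and the COM calls are exactly the ones used in the answer above.

```python
def paste_chart_into_placeholder(window, workbook, presentation,
                                 slide_num, chart_name, shape_index,
                                 sheet_name="Charts"):
    """Copy an Excel chart and paste it into an existing placeholder shape."""
    slide = presentation.Slides.Item(slide_num)
    window.View.GotoSlide(slide_num)
    workbook.sheets(sheet_name).ChartObjects(chart_name).Copy()
    # Paste onto the slide, select the pasted shape, and cut it again
    # (the same steps as in the answer above).
    slide.Shapes.Paste().Select()
    window.Selection.Cut()
    # Select the target placeholder, then paste into the view.
    slide.Shapes(shape_index).Select()
    window.View.Paste()


# Example calls, reusing the names from the answer above:
# paste_chart_into_placeholder(window, wb, ppt, 3, 'Figure1', LEFT_PLACEHOLDER)
# paste_chart_into_placeholder(window, wb, ppt, 3, 'Figure2', RIGHT_PLACEHOLDER)
```

Passing the COM objects in as arguments keeps the function free of any module-level state, so the same helper can be looped over a list of (chart name, slide, placeholder) tuples.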
This question already has answers here:
How to save a pandas DataFrame table as a png
I am working with the pandas describe function. The code is simple enough:
df['Revenue'].describe()
Output is:
Perfect. My issue is that I want to be able to save this data as either a PNG or a table so that I can place it on a single page. This is for my EDA (exploratory data analysis): I have 6 main charts or pieces of information that I want to evaluate for each feature. Each chart will be a separate PNG file, and I will then combine them into one PDF file. I iterate over 300+ features, so doing one at a time is not an option, especially since it is done monthly.
If you know of a way to save this table as a PNG or another similar file format, that would be great. Thanks for taking a look.
Saving as a csv or xlsx file
You may use the to_csv("filename.csv") or to_excel("filename.xlsx") methods to save the result as a CSV or Excel file and then manipulate/format it in Excel however you want.
Example:
df['Revenue'].describe().to_csv("my_description.csv")
Saving as a png file
As mentioned in the comments, this post explains how to save a pandas DataFrame to a PNG file via matplotlib. In your case, this should work:
import matplotlib.pyplot as plt
from pandas.plotting import table
desc = df['Revenue'].describe()
#create a subplot without frame
plot = plt.subplot(111, frame_on=False)
#remove axis
plot.xaxis.set_visible(False)
plot.yaxis.set_visible(False)
#create the table plot and position it in the upper right corner
table(plot, desc,loc='upper right')
#save the plot as a png file
plt.savefig('desc_plot.png')
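Because the question mentions iterating over 300+ features, the same idea can be wrapped in a function and called once per column. This is a minimal sketch: `save_describe_png` is a hypothetical helper, the sample DataFrame is made up for illustration, and the headless "Agg" backend and temporary output directory are assumptions chosen so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table


def save_describe_png(series, path):
    """Render series.describe() as a table and save it to `path`."""
    desc = series.describe().round(2)
    fig, ax = plt.subplots(figsize=(3, 2.5))
    ax.axis("off")  # hide the axes so only the table is drawn
    table(ax, desc, loc="center")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure when looping over many features
    return path


# Made-up sample data standing in for the real DataFrame.
df = pd.DataFrame({"Revenue": [10.0, 12.5, 9.0, 14.2],
                   "Cost": [4.0, 5.5, 3.0, 6.1]})
out_dir = tempfile.mkdtemp()
paths = [save_describe_png(df[col], os.path.join(out_dir, col + "_describe.png"))
         for col in df.columns]
```

Creating a new figure per feature and closing it afterwards avoids tables from earlier features piling up on the same axes.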
I am using xlsxwriter to add charts to different worksheets in IPython and everything works, except that my graphs never show up in the worksheets. There are no error messages.
When I tested the code from the documentation, I also got an empty Excel workbook.
I've tried it with xlsxwriter.Workbook and pd.ExcelWriter('test.xlsx', engine='xlsxwriter'); in both cases the workbook is generated but no graphs are added.
How can I make the graphs show up?
Code from the documentation:
http://xlsxwriter.readthedocs.org/en/latest/working_with_charts.html
import xlsxwriter
workbook = xlsxwriter.Workbook('chart_line.xlsx')
worksheet = workbook.add_worksheet()
# Add the worksheet data to be plotted.
data = [10, 40, 50, 20, 10, 50]
worksheet.write_column('A1', data)
# Create a new chart object.
chart = workbook.add_chart({'type': 'line'})
# Add a series to the chart.
chart.add_series({'values': '=Sheet1!$A$1:$A$6'})
# Insert the chart into the worksheet.
worksheet.insert_chart('C1', chart)
workbook.close()
The results for
print(xlsxwriter.__version__)
0.5.7
print(zipfile.ZipFile("chart_line.xlsx").namelist())
['xl/worksheets/sheet1.xml', 'xl/workbook.xml', 'xl/charts/chart1.xml', 'xl/drawings/drawing1.xml', 'docProps/app.xml', 'docProps/core.xml', '[Content_Types].xml', 'xl/styles.xml', 'xl/theme/theme1.xml', '_rels/.rels', 'xl/_rels/workbook.xml.rels', 'xl/worksheets/_rels/sheet1.xml.rels', 'xl/drawings/_rels/drawing1.xml.rels']
There haven't been any reported issues of charts not displaying in Excel in any version of XlsxWriter that supported charts.
There are also almost 300 chart comparison tests in the XlsxWriter codebase that test the charts that it produces byte for byte against files produced by Excel. These are all passing.
Also, the output from zipfile in your post clearly shows the chart elements are there. If they were present but incorrect Excel would complain when it loaded the file.
And the code that you link to has a screenshot of the output that clearly shows a chart.
I also ran the code and see the chart in 3 versions of Excel and 1 version of LibreOffice.
So you need to go back and verify your results. If you think there is an issue then create a small working program that demonstrates it and submit a bug report.
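One way to verify the results is to repeat the zipfile check from the question programmatically. The sketch below is an illustration only: `chart_parts` is a hypothetical helper, and it assumes chart XML parts live under `xl/charts/`, as they do in the file listing shown in the question.

```python
import zipfile


def chart_parts(xlsx_path):
    """Return the chart XML parts stored inside an .xlsx file."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.startswith("xl/charts/") and name.endswith(".xml")
        )


# Example call on the file produced by the documentation snippet:
# print(chart_parts("chart_line.xlsx"))
```

If the returned list is non-empty, the chart was written to the file, and the problem is more likely in how or where the file is being opened than in the code that generated it.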
I have been trying to generate data in Excel.
I generated .CSV file.
So up to that point it's easy.
But generating a graph is quite hard in Excel...
I am wondering, is Python able to generate data AND a graph in Excel?
If there are examples or code snippets, feel free to post them :)
Alternatively, a workaround that uses Python to generate the graph in an image format such as .jpg, or as a .pdf file, is also OK, as long as the workaround doesn't add dependencies such as needing to install the Boost library.
Yes, XlsxWriter has a lot of utility for creating Excel charts in Python. However, you will need to use the xlsx file format, there is not much feedback for incorrect parameters, and you cannot read your output back (XlsxWriter only writes files).
import xlsxwriter
import random
# Example data
# Try to do as much processing as possible outside of initializing the workbook
# Everything between Workbook() and close() gets trapped in an exception
random_data = [random.random() for _ in range(10)]
# Data location inside excel
data_start_loc = [0, 0] # xlsxwriter requires list, no tuple
# Last row is start + len - 1 so the range covers exactly the written cells
data_end_loc = [data_start_loc[0] + len(random_data) - 1, 0]
workbook = xlsxwriter.Workbook('file.xlsx')
# Charts are independent of worksheets
chart = workbook.add_chart({'type': 'line'})
chart.set_y_axis({'name': 'Random jiggly bit values'})
chart.set_x_axis({'name': 'Sequential order'})
chart.set_title({'name': 'Insecure randomly jiggly bits'})
worksheet = workbook.add_worksheet()
# A chart requires data to reference data inside excel
worksheet.write_column(*data_start_loc, data=random_data)
# The chart needs to explicitly reference data
chart.add_series({
    'values': [worksheet.name] + data_start_loc + data_end_loc,
    'name': "Random data",
})
worksheet.insert_chart('B1', chart)
workbook.close() # Write to file
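The list form used for 'values' above, [sheet name, first row, first column, last row, last column], uses zero-indexed rows and columns, which is easy to get wrong by one. As a rough illustration, the equivalent A1-style string can be built with `xl_range_abs` from `xlsxwriter.utility`. The helper name `series_range` is hypothetical, and the sketch assumes a sheet name without spaces (names with spaces would need quoting).

```python
from xlsxwriter.utility import xl_range_abs


def series_range(sheet_name, first_row, first_col, n_values):
    """Build an absolute range string for a column of n_values cells."""
    # n values starting at first_row end at first_row + n_values - 1
    last_row = first_row + n_values - 1
    cells = xl_range_abs(first_row, first_col, last_row, first_col)
    return "={}!{}".format(sheet_name, cells)


print(series_range("Sheet1", 0, 0, 10))  # =Sheet1!$A$1:$A$10
```

Either form can be passed to chart.add_series; the string form is sometimes easier to compare against what Excel shows in the chart's data source dialog.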
You have 2 options:
If you are on Windows, you can use the pywin32 library (included in ActivePython) to automate Excel using OLE automation.
from win32com.client import Dispatch
ex = Dispatch("Excel.Application")
# you can use the ex object to invoke Excel methods etc.
If all you want is to generate basic plots etc., you can use matplotlib.
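For the matplotlib option, a minimal sketch of going from CSV data to an image file might look like the following. The function name `plot_csv`, the column names, and the sample values are all made up for illustration; the headless "Agg" backend and the temporary output path are assumptions so the example runs without a display.

```python
import csv
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt


def plot_csv(csv_file, out_path, x_col, y_col):
    """Read two numeric columns from a CSV file object and save a line plot."""
    xs, ys = [], []
    for row in csv.DictReader(csv_file):
        xs.append(float(row[x_col]))
        ys.append(float(row[y_col]))
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    fig.savefig(out_path)  # the format follows the extension, e.g. .png or .pdf
    plt.close(fig)
    return len(xs)


# In-memory sample standing in for the generated .CSV file.
sample = io.StringIO("week,utilization\n1,40\n2,55\n3,47\n")
out = os.path.join(tempfile.mkdtemp(), "plot.png")
n_points = plot_csv(sample, out, "week", "utilization")
```

For a real file, pass an open file handle instead of the StringIO object, for example `open("data.csv", newline="")`.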
I suggest trying gnuplot for drawing graphs from data files.
If you do decide to use matplotlib, check out my excel to python class PyWorkbooks to get the data. It lets you retrieve data efficiently and easily as numpy arrays (the native datatype for matplotlib).
https://sourceforge.net/projects/pyworkbooks/
@David Gao, I am looking at doing something similar. Currently I am looking at using the raw CSV, or converting it to JSON, and just dropping it in a folder that is read by jqPlot, a jQuery plotting and graphing library. Then all I need to do is have the user or myself display the plot in any web browser.