use python to generate graph in excel - python

I have been trying to generate data in Excel.
I generated a .csv file, so up to that point it's easy.
But generating a graph is quite hard in Excel...
I am wondering, is Python able to generate data AND a graph in Excel?
If there are examples or code snippets, feel free to post them :)
Alternatively, a workaround would be to use Python to generate the graph in a graphical format such as .jpg, or a .pdf file is also OK, as long as the workaround doesn't need a dependency such as installing the Boost library.

Yes, XlsxWriter [docs][pypi] has a lot of utilities for creating Excel charts in Python. However, you will need to use the xlsx file format, there is not much feedback for incorrect parameters, and you cannot read your output back (XlsxWriter only writes files).
import xlsxwriter
import random
# Example data
# Try to do as much processing as possible before creating the workbook
# Everything between Workbook() and close() gets trapped in an exception
random_data = [random.random() for _ in range(10)]
# Data location inside Excel, as zero-indexed [row, col]
data_start_loc = [0, 0] # xlsxwriter requires a list, not a tuple
# The last row is inclusive, hence the - 1
data_end_loc = [data_start_loc[0] + len(random_data) - 1, data_start_loc[1]]
workbook = xlsxwriter.Workbook('file.xlsx')
# Charts are independent of worksheets
chart = workbook.add_chart({'type': 'line'})
chart.set_y_axis({'name': 'Random jiggly bit values'})
chart.set_x_axis({'name': 'Sequential order'})
chart.set_title({'name': 'Insecure randomly jiggly bits'})
worksheet = workbook.add_worksheet()
# A chart requires data to reference data inside excel
worksheet.write_column(*data_start_loc, data=random_data)
# The chart needs to explicitly reference data
chart.add_series({
    'values': [worksheet.name] + data_start_loc + data_end_loc,
    'name': "Random data",
})
worksheet.insert_chart('B1', chart)
workbook.close() # Write to file

You have 2 options:
If you are on Windows, you can use the pywin32 library (included in ActivePython) to automate Excel through OLE automation.
from win32com.client import Dispatch
ex = Dispatch("Excel.Application")
# you can use the ex object to invoke Excel methods etc.
If all you want is to generate basic plots, you can use matplotlib.
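For the matplotlib route, a minimal sketch might look like this. It assumes a CSV with a header row and one numeric column; the file names, the column name "reading" and the helper plot_csv_column are made up for illustration. The Agg backend is selected so it runs without a display.

```python
import csv
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt


def plot_csv_column(csv_path, column, out_paths):
    """Plot one numeric CSV column against its row order and save the figure."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    fig, ax = plt.subplots()
    ax.plot(range(len(values)), values, marker="o")
    ax.set_xlabel("Row")
    ax.set_ylabel(column)
    ax.set_title(f"{column} by row")
    for path in out_paths:
        fig.savefig(path)  # output format is inferred from the file extension
    plt.close(fig)
    return values


# Hypothetical sample data, written to a temporary directory for the demo
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "data.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["reading"])
    writer.writerows([[v] for v in (1.0, 4.0, 2.5, 3.0)])

png_path = os.path.join(workdir, "plot.png")
pdf_path = os.path.join(workdir, "plot.pdf")
plotted = plot_csv_column(csv_path, "reading", [png_path, pdf_path])
```

PNG and PDF output work with matplotlib alone. Saving as .jpg usually needs Pillow installed as well, so treat that as an extra dependency.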

I suggest you to try gnuplot while drawing graph from data files.

If you do decide to use matplotlib, check out my excel to python class PyWorkbooks to get the data. It lets you retrieve data efficiently and easily as numpy arrays (the native datatype for matplotlib).
https://sourceforge.net/projects/pyworkbooks/

@David Gao, I am looking at doing something similar. Currently I am looking at using the raw CSV, or converting it to JSON, and just dropping it in a folder that is read by jqPlot, a jQuery plotting and graphing library. Then all I need to do is have the user or myself display the plot in any web browser.
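The CSV-to-JSON step could be sketched with the standard library alone. The sample columns and the function name csv_to_json are made up here, and the exact JSON shape jqPlot expects depends on how the page loads it, so treat the output layout as an assumption.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Hypothetical sample input
sample = "x,y\n1,10\n2,20\n"
json_text = csv_to_json(sample)
print(json_text)
```

Note that csv.DictReader leaves every value as a string, so numeric columns would still need converting (for example with float()) before plotting.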

Related

How to paste an Excel chart into PowerPoint placeholder using Python?

I have an excel file with a series of formatted charts on a tab called Charts. I have named the charts, Figure1, Figure2, Figure3, etc.
I have an existing PowerPoint template. The template has 2 placeholders per slide (so that it can accommodate 2 charts per slide).
I would like to paste Figure1 in the left placeholder of slide 3, and Figure2 in the right placeholder of slide 3. I want to do this in python as the data analysis is done in python and excel is used to share stored results with colleagues.
Attempt 1:
Attempt 1 uses win32com.client. I am following this example: How to copy chart from excel and paste it as chart into powerpoint (not image) using python
but I cannot get the syntax right to insert the chart into the placeholder. When I follow the syntax in the solution, nothing happens and I get a message
<bound method Paste of <COMObject >>
Current code:
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
pptApp = win32.Dispatch('PowerPoint.Application')
ppt = pptApp.Presentations.Open(template)
# attempt 1
wb.sheets('Charts').ChartObjects('Figure1').Copy
ppt.slides[2].Shapes.Paste
# attempt 2
wb.sheets('Charts').ChartObjects('Figure1').Copy
ppt.slides[2].placeholders[1].Paste
Attempt 2:
Attempt 2 uses python-pptx. I looked at the documentation here:
https://python-pptx.readthedocs.io/en/latest/user/placeholders-using.html
but the example involves creating an excel chart from scratch in PowerPoint (I am not sure why you would ever do that), and I can't figure out the syntax to insert an existing chart from excel.
Current code:
from pptx import Presentation
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
prs = Presentation(template)
slide = prs.slides[3]
for shape in slide.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
placeholder = prs.slides[3].placeholders[1]
placeholder.name
placeholder.placeholder_format.type
placeholder.insert_chart(wb.sheets('Charts').ChartObjects('Figure1').Copy)
Requirements:
1) I would like to paste the excel chart as an excel object (rather than as a picture) as my colleague likes to be able to click on data series to get the underlying values etc.
2) I would like to paste the chart inside a placeholder (rather than on top of it).
3) I don't want to generate a new chart in PowerPoint as most examples do. A colleague has prepared an excel dashboard and formatted a number of charts as desired, and doing it all from scratch would be laborious.
4) I don't want to do this in VBA. I would like to do it in python as it is part of a broader program.
5) I don't want to use something like plotnine, seaborn, matplotlib etc. As per 1, my colleague likes excel objects that he can click on to show underlying values. This can't be done in these programs.
Python should be able to do this. Any ideas?
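You're very close! Copy and Paste are methods, so to call them you need to add parentheses after them, e.g. Copy(). Without the parentheses, Python only returns the bound method object, which is the message you are seeing.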
To get slide 2, you need to use the Item method of the Slides class: ppt.Slides.Item(2)
import win32com.client as win32
xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(outputPath+'Chart Pack.xlsb')
pptApp = win32.Dispatch('PowerPoint.Application')
ppt = pptApp.Presentations.Open(template)
slide_num = 3
LEFT_PLACEHOLDER = 3
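RIGHT_PLACEHOLDER = 2
# `window` was not defined in the original snippet; assuming the
# presentation's first document window
window = ppt.Windows(1)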
# Figure1
window.View.GotoSlide(slide_num)
wb.sheets('Charts').ChartObjects('Figure1').Copy()
ppt.Slides.Item(slide_num).Shapes.Paste().Select()
window.Selection.Cut()
ppt.Slides.Item(slide_num).Shapes(LEFT_PLACEHOLDER).Select()
window.View.Paste()
# Figure2
window.View.GotoSlide(slide_num)
wb.sheets('Charts').ChartObjects('Figure2').Copy()
ppt.Slides.Item(slide_num).Shapes.Paste().Select()
window.Selection.Cut()
ppt.Slides.Item(slide_num).Shapes(RIGHT_PLACEHOLDER).Select()
window.View.Paste()
EDIT: I was able to paste the chart directly into the placeholder after fiddling around a bit trying to implement the solution in this answer. It's a bit hacky, but it works.

Splitting up a CSV into separate excel files based on a columns values, then change formatting of excel before saving

Hi, I am very new to Python but have been tasked with creating a tool that does the following:
1) opens a csv file
2) splits the data frame up by the values of a single column
3) It will then save those groupings to individual excel workbooks and manipulate the formatting (may add a chart to one of the worksheets based on the newly added Data)
I have found this code, which groups and saves to csv. I can change it to the Excel format, but I'm really struggling to do the formatting and chart bit. Any help would be much appreciated.
gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)
One easy way to create nicely formatted excel sheets is to pre-format a template, and use openpyxl to fill in the rows as you need.
At a high level, your project should include a template, which will be an xlsx file (excel). If you named your project my_project, for example, the structure of your project should look like this:
my_project
--__init__.py
--templates
----formated_excel.xlsx
--main.py
where templates is a directory, formated_excel.xlsx is an xlsx file (the pre-formatted template), and main.py is your code.
In main.py, the basic logic of your code would work like this:
import os
import openpyxl
TEMPLATE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                        'templates', 'formated_excel.xlsx')
wb = openpyxl.load_workbook(TEMPLATE)
# to use wb[VALUE], your template must have a sheet called VALUE
data_sheet = wb['Data']
# placeholder values (not in the original); replace with your own data
data = [1, 2, 3]
# have enumerate start at 2, as in most cases row 1 of a sheet
# is the header
for row, value in enumerate(data, start=2):
    data_sheet[f'A{row}'] = value
wb.save('my_output.xlsx')
This example is a very, very basic explanation of how to use openpyxl.
Note that I've assumed you are using Python 3 (f-strings need 3.6 or later). If not, you'll have to use the appropriate string formatting when setting the data_sheet row that you are writing to. Openpyxl also has Chart Support, which you can read up on to help you in formatting your chart.
You haven't provided much detail in exactly what you want to do or the data you are using, so you will have to extend this example to fit your dataset.
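To tie the grouping and the chart together, here is a rough sketch that writes one workbook per group and adds a line chart with openpyxl. It starts from a blank workbook rather than a template, and the column names (CloneID, Value), the helper write_group_workbooks and the chart position are assumptions for illustration only.

```python
import os
import tempfile

import openpyxl
import pandas as pd
from openpyxl.chart import LineChart, Reference
from openpyxl.utils.dataframe import dataframe_to_rows


def write_group_workbooks(df, key, value_col, out_dir):
    """Write one .xlsx per distinct value of `key`, each with a line chart."""
    paths = []
    for group_value, group_df in df.groupby(key):
        wb = openpyxl.Workbook()
        ws = wb.active
        ws.title = "Data"
        for row in dataframe_to_rows(group_df, index=False, header=True):
            ws.append(row)

        # Chart the value column; row 1 holds the header, used as series title
        col_idx = list(group_df.columns).index(value_col) + 1
        chart = LineChart()
        chart.title = f"{value_col} for {key} {group_value}"
        values = Reference(ws, min_col=col_idx, min_row=1, max_row=ws.max_row)
        chart.add_data(values, titles_from_data=True)
        ws.add_chart(chart, "E2")

        path = os.path.join(out_dir, f"{key}{group_value}.xlsx")
        wb.save(path)
        paths.append(path)
    return paths


# Hypothetical data standing in for the CSV from the question
df = pd.DataFrame({"CloneID": [1, 1, 2, 2, 2],
                   "Value": [0.5, 0.7, 1.1, 0.9, 1.3]})
out_dir = tempfile.mkdtemp()
paths = write_group_workbooks(df, "CloneID", "Value", out_dir)
```

With a real file you would build df with pd.read_csv(...) first. Swapping openpyxl.Workbook() for openpyxl.load_workbook(TEMPLATE) should let you keep the pre-formatted template approach.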

From Python web app: insert data into spreadsheet (e.g. LibreOffice / Excel), calculate and save as pdf

I am facing the problem that I would like to push data (one large dataframe and one image) from my Python web app (running on Tornado Webserver and Ubuntu) into a spreadsheet, calculate, save as pdf and then deliver it to the frontend.
I took a look at several libs like openpyxl for writing sheets in MS Excel, but that would solve just one part. I was thinking about using LibreOffice and pyoo, but it seems that I need the same Python version on my backend as the one shipped with LibreOffice when importing pyuno.
Has somebody solved a similar issue and can recommend how to solve this?
Thanks
I came up with a solution that is, let's say, not pretty and rather unusual, but it works very flexibly for me.
use openpyxl to open an existing Excel workbook that includes layout (Template)
insert the dataframe into a separate sheet in that workbook
use openpyxl to save as temporary_file.xlsx
call LibreOffice with --headless --convert-to pdf temporary_file.xlsx
While executing the last call, all integrated formulas are recalculated/updated and the pdf is created (you have to configure Calc so that automatic recalculation is enabled when files are opened)
deliver pdf to frontend or process as you like
delete temporary_file.xlsx
import datetime
import openpyxl
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows
from subprocess import call
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
# timestamp prefix keeps temporary file names unique
now = datetime.datetime.now().strftime("%Y%m%d_%H%M_%f")
wb_template_name = 'Template.xlsx'
wb_temp_name = now + wb_template_name
wb = openpyxl.load_workbook(wb_template_name)
ws = wb['dataframe_sheet']
pdf_convert_cmd = 'soffice --headless --convert-to pdf ' + wb_temp_name
for r in dataframe_to_rows(df, index=True, header=True):
    ws.append(r)
wb.save(wb_temp_name)
# LibreOffice recalculates the formulas and writes the PDF
call(pdf_convert_cmd, shell=True)
The reason I'm doing this is that I would like to be able to style the layout of the pdf independently from the data. I use named ranges or lookups that reference the separate dataframe sheet in Excel.
I haven't tried the image insertion yet, but it should work similarly. I think there could be a way to improve performance by simply dumping the dataframe into the xlsx file (which is a zipped collection of XML files), so that you don't need openpyxl.
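If you would rather not go through a shell for the conversion call, the same command can be passed as an argument list, which also copes with spaces in file names. This is only a sketch: the helper names build_convert_command and convert_to_pdf, the example paths and the timeout are my own choices, and it assumes soffice is on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(xlsx_path, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command as an argument list."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(xlsx_path),
    ]


def convert_to_pdf(xlsx_path, out_dir, timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    cmd = build_convert_command(xlsx_path, out_dir)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("soffice was not found on PATH")
    subprocess.run(cmd, check=True, timeout=timeout)
    return Path(out_dir) / (Path(xlsx_path).stem + ".pdf")


# Hypothetical file name containing a space; no shell quoting is needed
cmd = build_convert_command("20240101_report Template.xlsx", "/tmp/out")
print(cmd)
```

Calling convert_to_pdf(...) raises if LibreOffice exits with an error or takes longer than the timeout, which is easier to handle in a web app than a silent failure.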

python pandas formula to dataframe

I am creating a dataframe with a bunch of calculations and adding new columns using these formulas (calculations). Then I am saving the dataframe to an Excel file.
I lose the formula after I save the file and open the file again.
For example, I am using something like:
total = 16
for s in range(total):
    df_summary['Slopes(avg)' + str(s)]= df_summary[['Slope_S' + str(s)]].mean(axis=1)*df_summary['Correction1']/df_summary['Correction2'].mean(axis=1)
How can I make sure this formula appears in my excel file I write to, similar to how we have a formula in an excel worksheet?
You can write formulas to an Excel file using the XlsxWriter module. Use worksheet.write_formula(): https://xlsxwriter.readthedocs.org/worksheet.html#worksheet-write-formula. If you're not attached to using an Excel file to store your dataframe, you might want to look into the pickle module, which saves and restores the dataframe itself (the values, not Excel formulas).
import pickle
# to save
with open('saved_df.p', 'wb') as f:
    pickle.dump(df, f)
# to load
with open('saved_df.p', 'rb') as f:
    df = pickle.load(f)
I think my answer here may be responsive. The short of it is you need to use openpyxl (or possibly xlrd if they've added support for it) to extract the formula, and then xlsxwriter to write the formula back in. It can definitely be done.
This assumes, of course, as @jay s pointed out, that you first write Excel formulas into the DataFrame. (This solution is an alternative to pickling.)
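As a rough illustration of write_formula(), the sketch below writes a few numbers and then a formula cell per row, so the calculation shows up as a live formula in Excel. The column layout, the sample numbers and the formula itself are invented for the example and are not the exact calculation from the question.

```python
import os
import tempfile
import zipfile

import xlsxwriter

out_path = os.path.join(tempfile.mkdtemp(), "formulas.xlsx")
workbook = xlsxwriter.Workbook(out_path)
worksheet = workbook.add_worksheet("Summary")

# Hypothetical columns: a slope and two correction factors
worksheet.write_row(0, 0, ["Slope", "Correction1", "Correction2", "Adjusted"])
rows = [(2.0, 3.0, 4.0), (5.0, 1.0, 2.0)]
for i, (slope, c1, c2) in enumerate(rows, start=1):
    worksheet.write_row(i, 0, [slope, c1, c2])
    excel_row = i + 1  # Excel row numbers are 1-based
    # Third argument is an optional cell format, fourth the cached result
    worksheet.write_formula(i, 3, f"=A{excel_row}*B{excel_row}/C{excel_row}",
                            None, slope * c1 / c2)
workbook.close()
```

XlsxWriter does not evaluate formulas itself. The optional last argument stores a precomputed value for viewers that do not recalculate, and Excel recomputes the cell when it opens the file.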

Efficient way of exporting large R dataset to excel

As the title says, I have a dataset with about 13000 rows and 255 columns (actually I have more than 255 columns, but the RODBC package seems to limit the number of columns exported to 255, so I trimmed it a bit) that needs to be exported to an xls/xlsx file.
I tried the RODBC and xlsx packages; both take more than 5 minutes to export. I wonder if there is a more efficient way of doing this?
I know a little bit of Python (using Python to connect to Outlook for listing emails in a mailbox), so if there is a way to export using Python instead, that is also welcome.
update 01
Quite a few people suggested using csv. That may not be feasible in my case because one field contains free text, and I cannot control which characters are entered in it, which makes choosing a separator difficult.
update 02
Thank you for the suggestions, but I found that the R packages are fine only if the data frame is relatively small, and they are slow even for a data frame in which all columns are character. Any suggestions?
There are lots of options:
Use xlsx with multiple sheets (you've tried this and it's too slow, I know)
Use write.csv, which should be faster and is readable by Excel
Use odbcConnectExcel2007 within RODBC
Use the package bigmemory to help you manage the large dataframe, especially if you can make it into a sparse matrix
XLConnect which worked for this guy with the same problem
Write it to a SQL database with RODBC or RPostgreSQL, etc. and then make a connection to the DB within Excel. I do this a lot. Here's a related resource.
Use Pandas
Create a tab-delimited text file and then import it to Excel: write.table (table,sep="\t",quote=FALSE,row.names=FALSE,file=file.name)
Use fread (note that fread itself only reads files; its data.table counterpart for writing is fwrite)
Try a cloud-based solution (I'm not sure if this will actually be faster, but it would at least be a trendy solution with extra benefits such as providing a nice way to store your data safely and let you query whatever you need from it using Excel on any computer)
RExcel
XLLoop
Finally, here's a nice little article on "A Million Ways to Connect R and Excel" which you may find useful, though I think I've actually given you more options than the article does.
I would start with the most simple solutions, like fread, then work your way to the relatively more complicated solutions if you're still not getting the results you want.
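On the worry about free text breaking a delimited file: quoting, rather than the choice of separator, is what normally protects such fields. Here is a small Python sketch of the idea using the standard csv module. The sample rows are invented, and how a given Excel version imports multi-line fields is something to check on your side.

```python
import csv
import io

# Hypothetical rows with awkward free text: commas, tabs, quotes and a newline
rows = [
    ["id", "comment"],
    [1, 'He said "hello", then left'],
    [2, "tab\there and a\nline break"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerows(rows)

# Reading the text back recovers every field unchanged (as strings)
buffer.seek(0)
round_trip = list(csv.reader(buffer))
```

Embedded quotes are doubled and every field is wrapped in quotes, so the reader can tell separators inside the text from real ones.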
Depending on the exact nature of your project, you might even benefit from parallelism or multicore processing. Those don't boost your I/O speed in most cases, but it could speed up any processing/transformation of your data which takes place in your process, thus making your overall data pipeline faster.
Python is also very well-equipped to handle this problem, but there are so many solutions within R, hopefully you won't need to resort to switching languages just to write out data. Still, you could try
XlsxWriter in Constant Memory mode, or
Optimized Reader and Writer of the openpyxl package
if you want to try a Python-based solution.
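A sketch of the XlsxWriter constant-memory route is below. The data is generated on the fly and scaled down (1000 rows by 50 columns) to keep the example quick; the helper name write_rows_constant_memory and the cell contents are made up, and I have not benchmarked it against the R packages.

```python
import os
import tempfile

import xlsxwriter


def write_rows_constant_memory(path, header, rows):
    """Stream rows to an .xlsx file without holding the whole sheet in memory."""
    workbook = xlsxwriter.Workbook(path, {"constant_memory": True})
    worksheet = workbook.add_worksheet()
    worksheet.write_row(0, 0, header)
    count = 0
    # In this mode rows must be written in increasing row order
    for row_idx, row in enumerate(rows, start=1):
        worksheet.write_row(row_idx, 0, row)
        count += 1
    workbook.close()
    return count


# Hypothetical, scaled-down stand-in for the 13000 x 255 dataset
n_rows, n_cols = 1000, 50
header = [f"col{c}" for c in range(n_cols)]
rows = ([f"r{r}c{c}" for c in range(n_cols)] for r in range(n_rows))
out_path = os.path.join(tempfile.mkdtemp(), "large.xlsx")
written = write_rows_constant_memory(out_path, header, rows)
```

Because rows is a generator, only one row is built at a time. In constant-memory mode, once a later row is started the earlier rows can no longer be changed.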
Try the openxlsx package; it's quite fast.
https://cran.r-project.org/web/packages/openxlsx/openxlsx.pdf
Install package openxlsx
load the library openxlsx
use write.xlsx() or writeData() command to write into xlsx file
A small example of basic operations using openxlsx library
taken from openxlsx documentation
## setup a workbook with 3 worksheets
wb <- createWorkbook()
addWorksheet(wb = wb, sheetName = "Sheet 1", gridLines = FALSE)
writeDataTable(wb = wb, sheet = 1, x = iris)
addWorksheet(wb = wb, sheetName = "mtcars (Sheet 2)", gridLines = FALSE)
writeData(wb = wb, sheet = 2, x = mtcars)
addWorksheet(wb = wb, sheetName = "Sheet 3", gridLines = FALSE)
writeData(wb = wb, sheet = 3, x = Formaldehyde)
worksheetOrder(wb)
names(wb)
worksheetOrder(wb) <- c(1,3,2) # switch position of sheets 2 & 3
writeData(wb, 2, 'This is still the "mtcars" worksheet', startCol = 15)
worksheetOrder(wb)
names(wb) ## ordering within workbook is not changed
saveWorkbook(wb, "worksheetOrderExample.xlsx", overwrite = TRUE)
worksheetOrder(wb) <- c(3,2,1)
saveWorkbook(wb, "worksheetOrderExample2.xlsx", overwrite = TRUE)
Gani
