Creating publication quality tables in python - python

I'd like to create publication quality tables for output as svg or jpg or png images using python.
I'm familiar with the texttable module which produces nice text tables but if I have for example
data = [['Head 1','Head 2','Head 3'],['Sample Set Type 1',12.8,True],['Sample Set Type 2',15.7,False]]
and I wanted to produce something that looked like
Is there a module I can turn to, or can you point me to a process for going about it?

There are many options available to you.
You can convert a Pandas dataframe to Latex as per https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_latex.html
You can also use Tabular to output latex source as per http://en.wikibooks.org/wiki/LaTeX/Tables
You can use ReportLab, as per Python reportlab inserting image into table
You could also just write an HTML table file and style it with css.
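# Assumes data is the list of rows from the question; row 0 holds the headers.
with open("example.html", "w") as of:
    of.write("<html><table>")
    for index, row in enumerate(data):
        # Every row is wrapped in <tr>; header cells use <th>, body cells use <td>.
        tag = "th" if index == 0 else "td"
        of.write("<tr>")
        for cell in row:
            # str() is needed because cells may be floats or booleans.
            of.write("<" + tag + ">" + str(cell) + "</" + tag + ">")
        of.write("</tr>")
    of.write("</table></html>")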
You can do something similar with Latex tables as an output.
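As a rough illustration of the LaTeX variant, the same loop can build a tabular environment instead of HTML. This is only a sketch: the function name to_latex_tabular is made up for this example, all columns are left-aligned, and cell text is not escaped, so characters such as & or % in the data would need extra handling.

```python
def to_latex_tabular(rows):
    """Return LaTeX tabular source for a list of rows; the first row is the header."""
    n_cols = len(rows[0])
    lines = ["\\begin{tabular}{" + "l" * n_cols + "}", "\\hline"]
    for index, row in enumerate(rows):
        # Join the cells with '&' and end the row with a LaTeX line break.
        lines.append(" & ".join(str(cell) for cell in row) + " \\\\")
        if index == 0:
            # Rule between the header row and the body.
            lines.append("\\hline")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


data = [['Head 1', 'Head 2', 'Head 3'],
        ['Sample Set Type 1', 12.8, True],
        ['Sample Set Type 2', 15.7, False]]
latex_source = to_latex_tabular(data)
print(latex_source)
```

The resulting source can be compiled with LaTeX and then converted to an image format with whatever tool you already use for that step.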

Related

How to write Pandas Dataframe as a table to PDF using fpdf python

I want to create a pdf that contains customized tables. For that I have converted a text file into dataframe using pandas as this text file contains some string and numerical data ( abc = 1234 format).
# input2_filename, patterns and er4 (a list) are defined earlier in the script
with open(input2_filename,"rt") as ip1:
    for line in ip1:
        for i in patterns:
            match1 = re.match(i,line)
            if match1 is not None:
                er4.append(match1.groups())
df4 = pd.DataFrame(er4, columns =['Attribute', 'Values'])
Issues:
How to convert dataframe into customized table that is presentable?
How to add this table directly in the PDF without saving as an image?
Does anyone have a solution for this? If anyone knows a better way to do it, that would be appreciated.
I have tried the tabulate module for converting the dataframe into a table, but I was unable to add it directly to the PDF. I also tried saving the table as an image and then adding it to the PDF, but that didn't work for me either.

Some Hyperlinks not opening with Openpyxl

I have a few hundred files with data and hyperlinks in them that I was trying to upload and append to a single DataFrame when I realized that Pandas was not reading any of the hyperlinks.
I then tried to use Openpyxl to read the hyperlinks in the input Excel files and write a new column into the excels with the text of the hyperlink that hopefully Pandas can read into my dataframe.
However, I am running into issues when testing the openpyxl code. It is able to read and write some of the hyperlinks but not the others.
My sample file has three rows and looks like this:
My actual data has hyperlinks in the way that I have it for "Google" in my test data set.
The other two hyperlinks in my test data I inserted by right-clicking on the cell and pasting the link.
Sample Test file here: Text.xlsx
Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
column_indices = [1]
max_col = ws.max_column
ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2,ws.max_row+1):
    for col in column_indices:
        print(ws.cell(row, column=1).hyperlink.target)
        ws.cell(column=max_col+1,row=row).value = ws.cell(row, column=1).hyperlink.target
wb.save('test.xlsx')
Cells that use the HYPERLINK function (like the google.com one) will not have a hyperlink object attached, so their hyperlink attribute is None. You will need to process cells that use the HYPERLINK function with re or a similar approach.
The values look like this:
>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'
Suggested code to handle HYPERLINK :
val = ws.cell(row, column).value
# HYPERLINK formulas are stored as plain strings, so check the type first
if isinstance(val, str) and val.startswith("=HYPERLINK"):
    hyplink = val  # or use the re module for a more robust check
Note : The second for loop to iterate over columns seems not required since you are always using column=1.
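The "more robust check" with re could look roughly like this. It is a sketch rather than a complete parser: the helper name parse_hyperlink_formula is invented for this example, and the pattern only handles formulas whose arguments are quoted string literals (not cell references or nested functions).

```python
import re

# Matches =HYPERLINK("target") or =HYPERLINK("target","label").
HYPERLINK_RE = re.compile(
    r'^=HYPERLINK\(\s*"([^"]*)"\s*(?:,\s*"([^"]*)"\s*)?\)$',
    re.IGNORECASE,
)


def parse_hyperlink_formula(value):
    """Return (target, label) for a HYPERLINK formula string, else None."""
    if not isinstance(value, str):
        return None
    match = HYPERLINK_RE.match(value.strip())
    if match is None:
        return None
    # label is None when the formula has only one argument
    return match.group(1), match.group(2)


print(parse_hyperlink_formula('=HYPERLINK("www.google.com","google")'))
print(parse_hyperlink_formula('China'))
```

In the loop from the question, you would use cell.hyperlink.target when cell.hyperlink is not None, and fall back to this helper on cell.value otherwise.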

When reading excel files with pandas, what determines the datatype of the cells being read?

I am reading an excel sheet and plucking data from rows containing the given PO.
import pandas as pd
xlsx = pd.ExcelFile('Book2.xlsx')
df = pd.read_excel(xlsx)
PO_arr = ['121121','212121']
for i in PO_arr:
    PO = i
    PO_DATA = df.loc[df['PONUM'] == PO]
    for i in range(1, max(PO_DATA['POLINENUM'].values) +1):
        ...
When I take this Excel sheet straight from its source, my code works fine. But when I cut out only the rows I want and paste them to a new spreadsheet with the exact same formatting and read this new spreadsheet, I have to change PO_DATA to look for an integer instead of a string as such:
PO_DATA = df.loc[df['PONUM'] == int(PO)]
If not, I get an error, and calling PO_DATA returns an empty dataframe.
C:\...\pandas\core\ops\array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
I checked the cell formatting in Excel and in both cases, they are formatted as 'General' cells.
What is going on that makes it so when I chop up my spreadsheet, I have to look for an integer and not a string? What do I have to do to make it work for sheets I've created and pasted relevant data into instead of only sheets from the source?
Excel can change cell formatting in unexpected ways when you copy and paste with Ctrl+C and Ctrl+V.
I am sure you tried these, but:
A) Copy with Ctrl+C, then on the new sheet or file press Ctrl+Alt+V, then V, then Enter (Paste Special, values only).
B) Use the Format Painter in Excel (it looks like a paintbrush on the Home tab): select the properly formatted cells first, double-click Format Painter, move to your new file or sheet, and select the cells whose format you want to conform.
C) Select the table you pasted into your new file, select the purple eraser icon from the top options in Excel, and clear all formats.
Update: I found an old related thread that didn't necessarily answer the question but solved the problem.
You can force pandas to import values as a certain datatype when reading from Excel by using the converters argument of read_excel.
df = pd.read_excel(xlsx, converters={'POLINENUM':int,'PONUM':int})
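An alternative that keeps the original string lookup is to normalise the key column to strings after reading, so the comparison works whether the column came in as text or as numbers (which appears to be the difference between the two files here). The sketch below uses a small in-memory DataFrame in place of Book2.xlsx; the sample values are invented.

```python
import pandas as pd

# Invented sample standing in for pd.read_excel('Book2.xlsx');
# here PONUM arrived as integers, as in the pasted spreadsheet.
df = pd.DataFrame({"PONUM": [121121, 212121, 121121],
                   "POLINENUM": [1, 1, 2]})

# Normalise the key column to strings so string lookups always match.
df["PONUM"] = df["PONUM"].astype(str)

po_data = df.loc[df["PONUM"] == "121121"]
print(po_data)
```

One caveat: if the column was read as floats (for example because it contains blank cells), astype(str) yields values like '121121.0', so in that case the column would need to be cleaned before converting.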

Splitting up a CSV into separate excel files based on a columns values, then change formatting of excel before saving

Hi, I am very new to python but have been tasked with creating a tool that does the following:
1) opens a csv file
2) splits the data frame up by the values of a single column
3) It will then save those groupings to individual excel workbooks and manipulate the formatting (may add a chart to one of the worksheets based on the newly added Data)
I have found this code, which groups and saves to csv. I can change it to the excel format, but I’m really struggling with the formatting and chart part. Any help would be very much appreciated.
gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)
One easy way to create nicely formatted excel sheets is to pre-format a template, and use openpyxl to fill in the rows as you need.
At a high level, your project should include a template, which will be an xlsx file (excel). If you named your project my_project, for example, the structure of your project should look like this:
my_project
--__init__.py
--templates
----formated_excel.xlsx
--main.py
where templates is a directory, formated_excel.xlsx is an xlsx file, and main.py is your code.
In main.py, the basic logic of your code would work like this:
import os
import openpyxl
TEMPLATE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                        'templates', 'formated_excel.xlsx')
wb = openpyxl.load_workbook(TEMPLATE)
# to use wb[VALUE], your template must have a sheet called VALUE
data_sheet = wb['Data']
# data is assumed to be an iterable of cell values prepared elsewhere
# have enumerate start at 2, as in most cases row 1 of a sheet
# is the header
for row, value in enumerate(data, start=2):
    data_sheet[f'A{row}'] = value
wb.save('my_output.xlsx')
This example is a very, very basic explanation of how to use openpyxl.
Note that I've assumed you are using Python 3.6 or later (the example uses an f-string); if not, you'll have to use the appropriate string formatting when setting the data_sheet row that you are writing to. Openpyxl also has chart support, which you can read up on to help you with formatting your chart.
You haven't provided much detail in exactly what you want to do or the data you are using, so you will have to extend this example to fit your dataset.
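For the chart part, openpyxl's chart support could be used roughly as follows. This is a minimal sketch, not a finished tool: it builds a fresh workbook instead of loading a template, and the sample rows, the chart title, and the anchor cell D2 are placeholders invented for this example.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

# Hypothetical sample rows standing in for one group of the split CSV.
rows = [("CloneID", "Count"), ("A", 3), ("B", 5), ("C", 2)]

wb = Workbook()
ws = wb.active
ws.title = "Data"
for row in rows:
    ws.append(row)

chart = BarChart()
chart.title = "Count per CloneID"
# Values: column B including the header row, which becomes the series title.
values = Reference(ws, min_col=2, min_row=1, max_row=len(rows))
# Categories: column A without the header row.
categories = Reference(ws, min_col=1, min_row=2, max_row=len(rows))
chart.add_data(values, titles_from_data=True)
chart.set_categories(categories)
ws.add_chart(chart, "D2")  # D2 is the top-left corner of the chart
```

Calling wb.save with an output path afterwards writes the workbook, chart included; in the grouped loop from the question you would do this once per group with a different path each time.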

How to apply conditional formatting in openpyxl?

I am using openpyxl to manipulate a Microsoft Excel Worksheet.
What I want to do is to add a Conditional Formatting Rule that fills the rows with a given colour if the row number is even, leaves the row blank if not.
In Excel this can be done by selecting all the worksheet, creating a new formatting rule with the text =MOD(ROW();2)=0 or =EVEN(ROW()) = ROW().
I tried to implement this behaviour with the following lines of code (considering for example the first 10 rows):
redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws2.conditional_formatting.add('A1:A10', FormulaRule(formula=['MOD(ROW();2) = 0'], stopIfTrue=False, fill=redFill))
My program runs correctly but when I try to open the output Excel file, it tells me that the file contains unreadable content and it asks me if I want to recover the worksheet content. By clicking yes, the worksheet is what I expect but there is no formatting.
What is the correct way to apply such a formatting in openpyxl (possibly to the entire worksheet)?
Unfortunately, the way formulae are handled in conditional formatting is particularly opaque. The best thing to do is to create a file with the relevant conditional format and inspect the relevant file by unzipping it. The rules are stored in the relevant worksheet files and the formats in the styles file.
However, I suspect that the problem may simply be that you are using ";" to separate parameters in the function: you must always use commas for this.
A sample formula from one of my projects:
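from openpyxl.formatting.rule import Rule
from openpyxl.styles import Font, PatternFill
from openpyxl.styles.differential import DifferentialStyle

green_text = Font(color="006100")
green_fill = PatternFill(bgColor="C6EFCE")
dxf2 = DifferentialStyle(font=green_text, fill=green_fill)
r3 = Rule(type="expression", dxf=dxf2)
r3.formula = ["AND(ISNUMBER(C2), C2>=400)"]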
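Applying the comma fix to the code from the question gives something like the following. This is a sketch under the assumption that the semicolon was the only problem; the workbook is created from scratch here, the cell values are filler, and the range A1:A10 is the one used in the question.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws2 = wb.active
# Filler values so the formatted cells are not empty.
for row in range(1, 11):
    ws2.cell(row=row, column=1, value=row)

redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
# Arguments inside the formula are separated by commas, not semicolons.
rule = FormulaRule(formula=['MOD(ROW(),2)=0'], stopIfTrue=False, fill=redFill)
ws2.conditional_formatting.add('A1:A10', rule)
```

Saving the workbook with wb.save and opening it in Excel should then show the even rows of A1:A10 filled; widening the range string extends the rule to more of the sheet.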
