I have a decision tree output in a 'text' format which is very hard to read and interpret. There are ton of pipes and indentation to follow the tree/nodes/leaf. I was wondering if there are tools out there where I can feed in a decision tree like below and get a tree diagram like Weka, Python, ...etc does?
Since my decision tree is very large, below is the sample/partial decision to give an idea of my text decision tree. Thanks a bunch!
"bio" <= 0.5:
| "ml" <= 0.5:
| | "algorithm" <= 0.5:
| | | "bioscience" <= 0.5:
| | | | "microbial" <= 0.5:
| | | | | "assembly" <= 0.5:
| | | | | | "nano-tech" <= 0.5:
| | | | | | | "smith" <= 0.5:
| | | | | | | | "neurons" <= 0.5:
| | | | | | | | | "process" <= 1.5:
| | | | | | | | | | "program" <= 1.5:
| | | | | | | | | | | "mammal" <= 1.0:
| | | | | | | | | | | | "lab" <= 0.5:
| | | | | | | | | | | | | "human-machine" <= 1.5:
| | | | | | | | | | | | | | "tech" <= 0.5:
| | | | | | | | | | | | | | | "smith" <= 0.5:
I'm not aware of any tool to interpret that format so I think you're going to have to write something, either to interpret the text format or to retrieve the tree structure using the DecisionTree class in MALLET's Java API.
Interpreting the text in Python shouldn't be too hard: for example, if
line = '| | | | | "assembly" <= 0.5:'
then you can get the indent level, the predictor name and the split point with
parts = line.split('"')
indent = parts[0].count('| ')
predictor = parts[1]
splitpoint = float(parts[2][-1-parts[2].rfind(' '):-1])
To create graphical output, I would use GraphViz. There are Python APIs for it, but it's simple enough to build a file in its text-based dot format and create a graphic from it with the dot command. For example, the file for a simple tree might look like
digraph MyTree {
Node_1 [label="Predictor1"]
Node_1 -> Node_2 [label="< 0.335"]
Node_1 -> Node_3 [label=">= 0.335"]
Node_2 [label="Predictor2"]
Node_2 -> Node_4 [label="< 1.42"]
Node_2 -> Node_5 [label=">= 1.42"]
Node_3 [label="Class1
(p=0.897, n=26)", shape=box,style=filled,color=lightgray]
Node_4 [label="Class2
(p=0.993, n=17)", shape=box,style=filled,color=lightgray]
Node_5 [label="Class3
(p=0.762, n=33)", shape=box,style=filled,color=lightgray]
}
and the resulting output from dot
Related
I'm trying to scrape the data from the table in the specifications section of this webpage:
Lochinvar Water Heaters
I'm using beautiful soup 4. I've tried searching for it by class - for example - (class="Table__Cell-sc-1e0v68l-0 kdksLO") but bs4 can't find the class on the webpage. I listed all the available classes that it could find and it doesn't find anything useful. Any help is appreciated.
Here's the code I tried to get the classes
import requests
from bs4 import BeautifulSoup
URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", class_='Table__Wrapper-sc-1e0v68l-3 iFOFNW')
classes = [value
for element in soup.find_all(class_=True)
for value in element["class"]]
classes = sorted(classes)
for cass in classes:
print(cass)
The page is populated with javascript, but fortunately in this case, much of the data [including the specs table you want] seems to be inside a script tag within the fetched html. The script just has one statement, so it's fairly easy to extract it as json
import json
### copied from your q ####
import requests
from bs4 import BeautifulSoup
URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
###########################
wrInf = soup.find(lambda l: l.name == 'script' and '__routeInfo' in l.text)
wrInf = wrInf.text.replace('window.__routeInfo = ', '', 1) # remove variable name
wrInf = wrInf.strip()[:-1] # get rid of ; at end
wrInf = json.loads(wrInf) # convert to python dictionary
specsTables = wrInf['data']['product']['specifications'][0]['table'] # get table (tsv string)
specsTables = [tuple(row.split('\t')) for row in specsTables.split('\n')] # convert rows to tuples
To view it, you could use pandas,
import pandas
headers = specsTables[0]
st_df = pandas.DataFrame([dict(zip(headers, r)) for r in specsTables[1:]])
# or just
# st_df = pandas.DataFrame(specsTables[1:], columns=headers)
print(st_df.head())
or you could simply print it
for i, r in enumerate(specsTables):
print(" | ".join([f'{c:^18}' for c in r]))
if i == 0: print()
output:
Model Number | Btu/Hr Input | Thermal Efficiency | GPH # 100ºF Rise | A | B | C | D | E | F | G | H | I | J | K | L | M | Gas Conn. | Water Conn. | Air Inlet | Vent Size | Ship. Wt.
AWH0400NPM | 399,000 | 99% | 479 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 326
AWH0500NPM | 500,000 | 99% | 600 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 333
AWH0650NPM | 650,000 | 98% | 772 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 424
AWH0800NPM | 800,000 | 98% | 950 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 434
AWH1000NPM | 999,000 | 98% | 1,187 | 45" | 24" | 48" | 62" | 30-1/2" | 15-3/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2-1/2" | 6" | 6" | 494
AWH1250NPM | 1,250,000 | 98% | 1,485 | 51-1/2" | 34" | 49" | 59" | 5-1/2" | 5-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,568
AWH1500NPM | 1,500,000 | 98% | 1,782 | 51-1/2" | 34" | 52-3/4" | 62-3/4" | 4-1/2" | 4-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,649
AWH2000NPM | 1,999,000 | 98% | 2,375 | 51-1/2" | 34" | 65-1/2" | 75-1/2" | 7" | 5-3/4" | 14-3/4" | 7-1/4" | 46-3/4" | 6-3/4" | 18-3/4" | 23" | 23-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,911
AWH3000NPM | 3,000,000 | 98% | 3,564 | 67-1/4" | 48-1/4" | 79-3/4" | 93-3/4" | 4-3/4" | 6-3/4" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2" | 4" | 10" | 10" | 3,147
AWH4000NPM | 4,000,000 | 98% | 4,752 | 67-1/4" | 48-1/4" | 96" | 110" | 5" | 7-1/2" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2-1/2" | 4" | 12" | 12" | 3,694
If you wanted a specific models specs:
modelNo = 'AWH1000NPM'
mSpecs = [r for r in specsTables if r[0] == modelNo]
mSpecs = [[]] if mSpecs == [] else mSpecs # in case there is no match
mSpecs = dict(zip(specsTables[0], mSpecs[0])) # convert to dictionary
print(mSpecs)
output:
{'Model Number': 'AWH1000NPM', 'Btu/Hr Input': '999,000', 'Thermal Efficiency': '98%', 'GPH # 100ºF Rise': '1,187', 'A': '45"', 'B': '24"', 'C': '48"', 'D': '62"', 'E': '30-1/2"', 'F': '15-3/4"', 'G': '12"', 'H': '20"', 'I': '38"', 'J': '3-1/2"', 'K': '10-1/2"', 'L': '19-1/4"', 'M': '20"', 'Gas Conn.': '1-1/4"', 'Water Conn.': '2-1/2"', 'Air Inlet': '6"', 'Vent Size': '6"', 'Ship. Wt.': '494'}
The contents for constructing the table are within a script tag. You can extract the relevant string and re-create the table through string manipulation.
import requests, re
import pandas as pd
r = requests.get('https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater/').text
s = re.sub(r'\\"', '"', re.search(r'table":"([\s\S]+?)(?:","tableFootNote)', r).groups(1)[0])
lines = [i.split('\\t') for i in s.split('\\n')]
df = pd.DataFrame(lines[1:], columns = lines[:1])
df.head(5)
I have 2 views as below:
experiments:
select * from experiments;
+--------+--------------------+-----------------+
| exp_id | exp_properties | value |
+--------+--------------------+-----------------+
| 1 | indicator:chemical | phenolphthalein |
| 1 | base | NaOH |
| 1 | acid | HCl |
| 1 | exp_type | titration |
| 1 | indicator:color | faint_pink |
+--------+--------------------+-----------------+
calculations:
select * from calculations;
+--------+------------------------+--------------+
| exp_id | exp_report | value |
+--------+------------------------+--------------+
| 1 | molarity:base | 0.500000000 |
| 1 | volume:acid:in_ML | 23.120000000 |
| 1 | volume:base:in_ML | 5.430000000 |
| 1 | moles:H | 0.012500000 |
| 1 | moles:OH | 0.012500000 |
| 1 | molarity:acid | 0.250000000 |
+--------+------------------------+--------------+
I managed to pivot each of these views individually as below:
experiments_pivot:
+-------+--------------------+------+------+-----------+----------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|
+-------+--------------------+------+------+-----------+----------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink |
+------+---------------------+------+------+-----------+----------------+
calculations_pivot:
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
|exp_id | molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
| 1 | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+------+---------------------+------+------+-----------+----------------------------------------------------+
My question is how to get these two pivot results as a single row? Desired result is as below:
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+------+---------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
Database Used: Mysql
Important Note: Each of these views can have increasing number of rows. Hence I considered "dynamic pivoting" for each of the view individually.
For reference -- Below is a prepared statement I used to pivot experiments in MySQL(and a similar statement to pivot the other view as well):
set #sql = Null;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(IF(exp_properties = ''',
exp_properties,
''', value, NULL)) AS ',
concat("`",exp_properties, "`")
)
)into #sql
from experiments;
set #sql = concat(
'select exp_id, ',
#sql,
' from experiment group by exp_id'
);
prepare stmt from #sql;
execute stmt;
I came across the concept of flags in python on some occasions, for example in wxPython. An example is the initialization of a frame object.
The attributes that are passed to "style".
frame = wx.Frame(None, style=wx.MAXIMIZE_BOX | wx.RESIZE_BORDER | wx.SYSTEM_MENU | wx.CAPTION | wx.CLOSE_BOX)
I don't really understand the concept of flags. I haven't even found a solid explanation what exactly the term "flag" means in Python. How are all these attributes passed to one variable?
The only thing i can think of is that the "|" character is used as a boolean operator, but in that case wouldn't all the attributes passed to style just evaluate to a single boolean expression?
What is usually meant with flags in this sense are bits in a single integer value. | is the ususal bit-or operator.
Let's say wx.MAXIMIZE_BOX=8 and wx.RESIZE_BORDER=4, if you or them together you get 12. In this case you can actually use + operator instead of |.
Try printing the constants print(wx.MAXIMIZE_BOX) etc. and you may get a better understanding.
Flags are not unique to Python; the are a concept used in many languages. They build on the concepts of bits and bytes, where computer memory stores information using, essentially, a huge number of flags. Those flags are bits, they either are off (value 0) or on (value 1), even though you usually access the computer memory in groups of at least 8 of such flags (bytes, and for larger groups, words of a multiple of 8, specific to the computer architecture).
Integer numbers are an easy and common representation of the information stored in bytes; a single byte can store any integer number between 0 and 255, and with more bytes you can represent bigger integers. But those integers still consist of bits that are either on or off, and so you can use those as switches to enable or disable features. You pass in specific integer values with specific bits enabled or disabled to switch features on and off.
So a byte consists of 8 flags (bits), and enabling one of these means you have 8 different integers; 1, 2, 4, 8, 16, 32, 64 and 128, and you can pass a combination of those numbers to a library like wxPython to set different options. For multi-byte integers, the numbers just go up by doubling.
But you a) don't want to remember what each number means, and b) need a method of combining them into a single integer number to pass on.
The | operator does the latter, and the wx.MAXIMIZE_BOX, wx.RESIZE_BORDER, etc names are just symbolic constants for the integer values, set by the wxWidget project in various C header files, and summarised in wx/toplevel.h and wx/defs.h:
/*
Summary of the bits used (some of them are defined in wx/frame.h and
wx/dialog.h and not here):
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|15|14|13|12|11|10| 9| 8| 7| 6| 5| 4| 3| 2| 1| 0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | \_ wxCENTRE
| | | | | | | | | | | | | | \____ wxFRAME_NO_TASKBAR
| | | | | | | | | | | | | \_______ wxFRAME_TOOL_WINDOW
| | | | | | | | | | | | \__________ wxFRAME_FLOAT_ON_PARENT
| | | | | | | | | | | \_____________ wxFRAME_SHAPED
| | | | | | | | | | \________________ wxDIALOG_NO_PARENT
| | | | | | | | | \___________________ wxRESIZE_BORDER
| | | | | | | | \______________________ wxTINY_CAPTION_VERT
| | | | | | | \_________________________
| | | | | | \____________________________ wxMAXIMIZE_BOX
| | | | | \_______________________________ wxMINIMIZE_BOX
| | | | \__________________________________ wxSYSTEM_MENU
| | | \_____________________________________ wxCLOSE_BOX
| | \________________________________________ wxMAXIMIZE
| \___________________________________________ wxMINIMIZE
\______________________________________________ wxSTAY_ON_TOP
...
*/
and
/*
Summary of the bits used by various styles.
High word, containing styles which can be used with many windows:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | \_ wxFULL_REPAINT_ON_RESIZE
| | | | | | | | | | | | | | \____ wxPOPUP_WINDOW
| | | | | | | | | | | | | \_______ wxWANTS_CHARS
| | | | | | | | | | | | \__________ wxTAB_TRAVERSAL
| | | | | | | | | | | \_____________ wxTRANSPARENT_WINDOW
| | | | | | | | | | \________________ wxBORDER_NONE
| | | | | | | | | \___________________ wxCLIP_CHILDREN
| | | | | | | | \______________________ wxALWAYS_SHOW_SB
| | | | | | | \_________________________ wxBORDER_STATIC
| | | | | | \____________________________ wxBORDER_SIMPLE
| | | | | \_______________________________ wxBORDER_RAISED
| | | | \__________________________________ wxBORDER_SUNKEN
| | | \_____________________________________ wxBORDER_{DOUBLE,THEME}
| | \________________________________________ wxCAPTION/wxCLIP_SIBLINGS
| \___________________________________________ wxHSCROLL
\______________________________________________ wxVSCROLL
...
*/
The | operator is the bitwise OR operator; it combines the bits of two integers, each matching bit is paired up and turned into an output bit according to the boolean rules for OR. When you do this for those integer constants, you get a new integer number with multiple flags enabled.
So the expression
wx.MAXIMIZE_BOX | wx.RESIZE_BORDER | wx.SYSTEM_MENU | wx.CAPTION | wx.CLOSE_BOX
gives you an integer number with the bits numbers 9, 6, 11, 29, and 12 set; here I used '0' and '1' strings to represent the bits and int(..., 2) to interpret a sequence of those strings as a single integer number in binary notation:
>>> fourbytes = ['0'] * 32
>>> fourbytes[9] = '1'
>>> fourbytes[6] = '1'
>>> fourbytes[11] = '1'
>>> fourbytes[29] = '1'
>>> fourbytes[12] = '1'
>>> ''.join(fourbytes)
'00000010010110000000000000000100'
>>> int(''.join(fourbytes), 2)
39321604
On the receiving end, you can use the & bitwise AND operator to test if a specific flag is set; that return 0 if the flag is not set, or the same integer as assigned to the flag constant if the flag bit had been set. In both C and in Python, a non-zero number is true in a boolean test, so testing for a specific flag is usually done with:
if ( style & wxMAXIMIZE_BOX ) {
for determining that a specific flag is set, or
if ( !(style & wxBORDER_NONE) )
to test for the opposite.
It is a boolean operator - not logical one, but bitwise one. wx.MAXIMIZE_BOX and the rest are typically integers that are powers of two - 1, 2, 4, 8, 16... which makes it so that only one bit in them is 1, all the rest of them are 0. When you apply bitwise OR (x | y) to such integers, the end effect is they combine together: 2 | 8 (0b00000010 | 0b00001000) becomes 10 (0b00001010). They can be pried apart later using the bitwise AND (x & y) operator, also calling a masking operator: 10 & 8 > 0 will be true because the bit corresponding to 8 is turned on.
Is there a more efficient way to parse dependency version requirement specification referenced here and extract the version in python. (https://maven.apache.org/pom.html#Dependency_Version_Requirement_Specification)
This is what I've got so far and it feels not the most efficient way. and I guess it's buggy.
for version in versions:
pattern = re.findall("\(,[.0-9]+|[.0-9]+\)|[.0-9]+|\([.0-9]+", version)
if pattern:
for matches in pattern:
if ([match for match in re.findall("[.0-9]+\)", matches)]):
# this is the less pattern
pattern_version = "<" + str(matches.decode('utf8')[:-1])
elif ([match for match in re.findall("\(,[.0-9]+", matches)]):
pattern_version = ">" + str(matches.decode('utf8')[2:])
elif ([match for match in re.findall("\([.0-9]+", matches)]):
pattern_version = ">" + str(matches.decode('utf8')[1:])
else:
pattern_version = str(matches.decode('utf8'))
expected output would be:
(,1.0],[1.2,) parse to: x <= 1.0 or x >= 1.2
(?P<eq>^[\d.]+$)|(?:^\[(?P<heq>[\d.]+)\]$)|(?:(?P<or>(?<=\]|\)),(?=\[|\())|,|(?:(?<=,)(?:(?P<lte>[\d.]+)\]|(?P<lt>[\d.]+)\)))|(?:(?:\[(?P<gte>[\d.]+)|\((?P<gt>[\d.]+))(?=,)))+
Regex will proceed to match version in this order:
First try to match "Soft" requirement using:
(?P<eq>^[\d.]+$)
Then try to match "Hard" requirement using:
(?:^\[(?P<heq>[\d.]+)\]$)
Otherwise try to match ranges in this order:
First determine if this is a multiple set using:
(?:(?P<or>(?<=\]|\)),(?=\[|\())
which will only match the comma separating sets.
Then try to match the comma separating ranges within the same set using:
,.
Then proceeds to match the actual ranges:
start matching the upper bound value using
(?:(?<=,)(?:(?P<lte>[\d.]+)\]|(?P<lt>[\d.]+)\)))
then the lower bound value using
(?:(?:\[(?P<gte>[\d.]+)|\((?P<gt>[\d.]+))(?=,))
The result of versions included in the specification will be:
| version | eq | heq | gte | gt | or | lte | lt |
| ------------- | --- | --- | --- | --- | -- | --- | --- |
| 1.0 | 1.0 | | | | | | |
| [1.0] | | 1.0 | | | | | |
| (,1.0] | | | | | | 1.0 | |
| [1.2,1.3] | | | 1.2 | | | 1.3 | |
| [1.0,2.0) | | | 1.0 | | | | 2.0 |
| [1.5,) | | | 1.5 | | | | |
| (,1.0],[1.2,) | | | 1.2 | | , | 1.0 | |
| (,1.1),(1.1,) | | | | 1.1 | , | | 1.1 |
so I have the following class which prints the text and header you call with it. I also provided the code I am using to call the Print function. I have a string 'outputstring' which contains the text I want to print. My expected output is below and my actual output is below. It seems to be removing spaces which are needed for proper legibility. How can I print while keeping the spaces?
Class:
#Printer Class
class Printer(HtmlEasyPrinting):
def __init__(self):
HtmlEasyPrinting.__init__(self)
def GetHtmlText(self,text):
"Simple conversion of text. Use a more powerful version"
html_text = text.replace('\n\n','<P>')
html_text = text.replace('\n', '<BR>')
return html_text
def Print(self, text, doc_name):
self.SetHeader(doc_name)
self.PrintText(self.GetHtmlText(text),doc_name)
def PreviewText(self, text, doc_name):
self.SetHeader(doc_name)
HtmlEasyPrinting.PreviewText(self, self.GetHtmlText(text))
Expected Print:
+-------------------+---------------------------------+------+-----------------+-----------+
| Domain: | Mail Server: | TLS: | # of Employees: | Verified: |
+-------------------+---------------------------------+------+-----------------+-----------+
| bankofamerica.com | ltwemail.bankofamerica.com | Y | 239000 | Y |
| | rdnemail.bankofamerica.com | Y | | Y |
| | kcmemail.bankofamerica.com | Y | | Y |
| | rchemail.bankofamerica.com | Y | | Y |
| citigroup.com | mx-b.mail.citi.com | Y | 248000 | N |
| | mx-a.mail.citi.com | Y | | N |
| bnymellon.com | cluster9bny.us.messagelabs.com | ? | 51400 | N |
| | cluster9bnya.us.messagelabs.com | Y | | N |
| usbank.com | mail1.usbank.com | Y | 65565 | Y |
| | mail2.usbank.com | Y | | Y |
| | mail3.usbank.com | Y | | Y |
| | mail4.usbank.com | Y | | Y |
| us.hsbc.com | vhiron1.us.hsbc.com | Y | 255200 | Y |
| | vhiron2.us.hsbc.com | Y | | Y |
| | njiron1.us.hsbc.com | Y | | Y |
| | njiron2.us.hsbc.com | Y | | Y |
| | nyiron1.us.hsbc.com | Y | | Y |
| | nyiron2.us.hsbc.com | Y | | Y |
| pnc.com | cluster5a.us.messagelabs.com | Y | 49921 | N |
| | cluster5.us.messagelabs.com | ? | | N |
| tdbank.com | cluster5.us.messagelabs.com | ? | 0 | N |
| | cluster5a.us.messagelabs.com | Y | | N |
+-------------------+---------------------------------+------+-----------------+-----------+
Actual Print:
The same thing as expected but the spaces are removed making it very hard to read.
Function call:
def printFile():
outputstring = txt_tableout.get(1.0, 'end')
print(outputstring)
app = wx.PySimpleApp()
p = Printer()
p.Print(outputstring, "Data Results")
For anyone else struggling, this is the modified class function I used to generate a nice table with all rows and columns.
def GetHtmlText(self,text):
html_text = '<h3>Data Results:</h3><p><table border="2">'
html_text += "<tr><td>Domain:</td><td>Mail Server:</td><td>TLS:</td><td># of Employees:</td><td>Verified</td></tr>"
for row in root.ptglobal.to_csv():
html_text += "<tr>"
for x in range(len(row)):
html_text += "<td>"+str(row[x])+"</td>"
html_text += "</tr>"
return html_text + "</table></p>"
maybe try
`html_text = text.replace(' ',' ').replace('\n','<br/>')`
that would replace your spaces with html space characters ... but it would still not look right since it is not a monospace font ... this will be hard to automate ... you really want to probably put it in a table structure ... but that would require some work
you probably want to invest a little more time in your html conversion ... perhaps something like (making assumptions based on what you have shown)
def GetHtmlText(self,text):
"Simple conversion of text. Use a more powerful version"
text_lines = text.splitlines()
html_text = "<table>"
html_text += "<tr><th> + "</th><th>".join(text_lines[0].split(":")) + "</th></tr>
for line in text_lines[1:]:
html_text += "<tr><td>"+"</td><td>".join(line.split()) +"</td></tr>
return html_text + "</table>"