Beautiful Soup returning None - python

I am trying to scrape the th element, but the result keeps returning None. What am I doing wrong?
This is the code I have tried:
import requests
import bs4
import urllib3
dateList = []
openList = []
closeList = []
highList = []
lowList = []
r = requests.get(
'https://coinmarketcap.com/currencies/bitcoin/historical-data/')
soup = bs4.BeautifulSoup(r.text, 'lxml')
td = soup.find('th')
print(td)

There's an API endpoint so you can fetch the data from there.
Here's how:
import pandas as pd
import requests
from tabulate import tabulate
api_endpoint = "https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=1&convert=USD&time_start=1609804800&time_end=1614902400"
bitcoin = requests.get(api_endpoint).json()
df = pd.DataFrame([q["quote"]["USD"] for q in bitcoin["data"]["quotes"]])
print(tabulate(df, headers="keys", showindex=False, disable_numparse=True, tablefmt="pretty"))
Output:
+----------------+----------------+----------------+----------------+--------------------+-------------------+--------------------------+
| open | high | low | close | volume | market_cap | timestamp |
+----------------+----------------+----------------+----------------+--------------------+-------------------+--------------------------+
| 34013.614533 | 36879.69856854 | 33514.03374162 | 36824.36441009 | 75289433810.59091 | 684671246323.6501 | 2021-01-06T23:59:59.999Z |
| 36833.87435728 | 40180.3679073 | 36491.18981083 | 39371.04235311 | 84762141031.49448 | 732062681138.1346 | 2021-01-07T23:59:59.999Z |
| 39381.76584266 | 41946.73935079 | 36838.63599637 | 40797.61071993 | 88107519479.50471 | 758625941266.7522 | 2021-01-08T23:59:59.999Z |
| 40788.64052286 | 41436.35000639 | 38980.87690625 | 40254.54649816 | 61984162837.0747 | 748563483043.1383 | 2021-01-09T23:59:59.999Z |
| 40254.21779758 | 41420.19103255 | 35984.62712175 | 38356.43950662 | 79980747690.35463 | 713304617760.9486 | 2021-01-10T23:59:59.999Z |
| 38346.52950301 | 38346.52950301 | 30549.59876946 | 35566.65594049 | 123320567398.62296 | 661457321418.0524 | 2021-01-11T23:59:59.999Z |
| 35516.36114084 | 36568.52697414 | 32697.97662163 | 33922.9605815 | 74773277909.4566 | 630920422745.0479 | 2021-01-12T23:59:59.999Z |
| 33915.11958124 | 37599.96059774 | 32584.66767186 | 37316.35939997 | 69364315979.27992 | 694069582193.7559 | 2021-01-13T23:59:59.999Z |
| 37325.10763475 | 39966.40524241 | 36868.5632453 | 39187.32812109 | 63615990033.01017 | 728904366964.3611 | 2021-01-14T23:59:59.999Z |
| 39156.7080858 | 39577.71118833 | 34659.58974449 | 36825.36585131 | 67760757880.723885 | 685005864471.3622 | 2021-01-15T23:59:59.999Z |
| 36821.64873201 | 37864.36887891 | 35633.55401669 | 36178.13890106 | 57706187875.104546 | 673000645230.8221 | 2021-01-16T23:59:59.999Z |
| 36163.64923243 | 36722.34987621 | 34069.32218533 | 35791.27792129 | 52359854336.21185 | 665831621390.9865 | 2021-01-17T23:59:59.999Z |
| 35792.23666766 | 37299.28580604 | 34883.84404829 | 36630.07568284 | 49511702429.3542 | 681470030572.0747 | 2021-01-18T23:59:59.999Z |
| 36642.23272357 | 37755.89185872 | 36069.80639361 | 36069.80639361 | 57244195485.50075 | 671081200699.8711 | 2021-01-19T23:59:59.999Z |
and so on ...
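The time_start and time_end query parameters in the endpoint above are UNIX timestamps (seconds since the epoch, UTC). If you want a different window, a small helper (a sketch; the function name is made up) can build them:

```python
from datetime import datetime, timezone

def unix_ts(year, month, day):
    """Seconds since the epoch for UTC midnight of the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

# the window used in the URL above: 2021-01-05 .. 2021-03-05 (UTC)
start = unix_ts(2021, 1, 5)   # 1609804800
end = unix_ts(2021, 3, 5)     # 1614902400
api_endpoint = (
    "https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical"
    f"?id=1&convert=USD&time_start={start}&time_end={end}"
)
```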

The Response object does have a text attribute, so that isn't the problem. The historical-data table on that page is rendered by JavaScript after the page loads, so the HTML that requests fetches contains no th elements, and soup.find('th') returns None. Switching to soup = bs4.BeautifulSoup(r.content, 'lxml') won't change that; use the API endpoint from the answer above, or a browser-automation tool such as Selenium.

Related

Scrape table from url

I am trying to scrape data from "https://www.investing.com/equities/pre-market". Here is the element I need:
class="datatable_table__D_jso PreMarketMostActiveStocksTable_preMarketMostActiveStocksTable__9yGOv datatable_table--mobile-basic__W2ilt datatable_table--freeze-column__7YoIE"
This HTML element appears to contain the table. I tried to scrape it using
soup.find, but I get no result.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = "https://www.investing.com/equities/pre-market"
html = requests.get(url).content
soup = BeautifulSoup(html)
table = soup.find('table', {'class': 'datatable_row__qHMpQ'})
print(soup)
Thanks!
The class you're using belongs to the header row of the table, not the table tag itself. (It's indicated by the class name itself - "datatable_row__qHMpQ"...)
You can use one of the table classes instead (like datatable_table__D_jso) or you could use the data-test attribute:
table = soup.find('table', {'data-test': "pre-market-most-active-stocks-table"})
import pandas
print(pandas.read_html(table.prettify())[0].to_markdown(index=False))
prints
| Name | Symbol | Last | Chg. | Chg. % | Vol. | Time |
|:--------------------|:---------|-------:|-------:|:---------|:-------|:---------|
| Jiuzi Holdings Inc | JZXN | 0.24 | 0.092 | +62.09% | 19.14M | 09:27:41 |
| OpGen Inc | OPGN | 0.245 | 0.12 | +96.00% | 16.41M | 09:27:41 |
| Powerbridge | PBTS | 0.1001 | 0.0084 | +9.16% | 12.07M | 09:26:40 |
| Faraday Future Int. | FFIE | 0.57 | 0.11 | +24.59% | 12.03M | 09:27:56 |
| Magenta Therapeuti. | MGTA | 1.45 | 0.3 | +26.09% | 9.12M | 09:27:58 |
| Starry Holdings | STRY | 0.122 | 0.022 | +21.50% | 8.34M | 09:26:57 |
| Netcapital Inc | NCPL | 3.03 | 1.64 | +117.99% | 6.51M | 09:27:59 |
| China Pharma | CPHI | 0.1449 | 0.0044 | +3.13% | 3.55M | 09:26:52 |
| 111 Inc | YI | 3.81 | 0.27 | +7.63% | 2.98M | 09:28:00 |
| Amesite | AMST | 0.369 | 0.059 | +19.03% | 2.45M | 09:21:45 |
EDIT: full code with some additions for debugging and/or error handling:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import os

url = "https://www.investing.com/equities/pre-market"
resp = requests.get(url)
# resp.raise_for_status()  # halt the program right here on a bad response
soup = BeautifulSoup(resp.content, 'html.parser')
table = soup.find('table', {'data-test': "pre-market-most-active-stocks-table"})
if table is None and resp.status_code == 200:  # OK response, but no table
    hfn = 'MISSING_DATA investing-com_equities_premarket.html'
    with open(hfn, 'wb') as f:
        f.write(resp.content)
    print('no such table found - inspect in an editor:', os.path.abspath(hfn))
elif table:
    print(pd.read_html(table.prettify())[0].to_markdown(index=False))
else:
    print(f'{resp.status_code} {resp.reason} - failed to scrape {url}')

Web Scrape table data from this webpage

I'm trying to scrape the data from the table in the specifications section of this webpage:
Lochinvar Water Heaters
I'm using beautiful soup 4. I've tried searching for it by class - for example - (class="Table__Cell-sc-1e0v68l-0 kdksLO") but bs4 can't find the class on the webpage. I listed all the available classes that it could find and it doesn't find anything useful. Any help is appreciated.
Here's the code I tried to get the classes
import requests
from bs4 import BeautifulSoup
URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", class_='Table__Wrapper-sc-1e0v68l-3 iFOFNW')
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]
classes = sorted(classes)
for cass in classes:
    print(cass)
The page is populated with JavaScript, but fortunately in this case much of the data [including the specs table you want] is inside a script tag within the fetched HTML. The script has just one statement, so it's fairly easy to extract as JSON:
import json
### copied from your q ####
import requests
from bs4 import BeautifulSoup
URL = "https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
###########################
wrInf = soup.find(lambda l: l.name == 'script' and '__routeInfo' in l.text)
wrInf = wrInf.text.replace('window.__routeInfo = ', '', 1) # remove variable name
wrInf = wrInf.strip()[:-1] # get rid of ; at end
wrInf = json.loads(wrInf) # convert to python dictionary
specsTables = wrInf['data']['product']['specifications'][0]['table'] # get table (tsv string)
specsTables = [tuple(row.split('\t')) for row in specsTables.split('\n')] # convert rows to tuples
To view it, you could use pandas,
import pandas
headers = specsTables[0]
st_df = pandas.DataFrame([dict(zip(headers, r)) for r in specsTables[1:]])
# or just
# st_df = pandas.DataFrame(specsTables[1:], columns=headers)
print(st_df.head())
or you could simply print it
for i, r in enumerate(specsTables):
    print(" | ".join([f'{c:^18}' for c in r]))
    if i == 0: print()
output:
Model Number | Btu/Hr Input | Thermal Efficiency | GPH # 100ºF Rise | A | B | C | D | E | F | G | H | I | J | K | L | M | Gas Conn. | Water Conn. | Air Inlet | Vent Size | Ship. Wt.
AWH0400NPM | 399,000 | 99% | 479 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 326
AWH0500NPM | 500,000 | 99% | 600 | 45" | 24" | 30-1/2" | 42-1/2" | 29-3/4" | 20-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1" | 2" | 4" | 4" | 333
AWH0650NPM | 650,000 | 98% | 772 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 424
AWH0800NPM | 800,000 | 98% | 950 | 45" | 24" | 41" | 53" | 30-1/2" | 15-1/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2" | 4" | 6" | 434
AWH1000NPM | 999,000 | 98% | 1,187 | 45" | 24" | 48" | 62" | 30-1/2" | 15-3/4" | 12" | 20" | 38" | 3-1/2" | 10-1/2" | 19-1/4" | 20" | 1-1/4" | 2-1/2" | 6" | 6" | 494
AWH1250NPM | 1,250,000 | 98% | 1,485 | 51-1/2" | 34" | 49" | 59" | 5-1/2" | 5-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,568
AWH1500NPM | 1,500,000 | 98% | 1,782 | 51-1/2" | 34" | 52-3/4" | 62-3/4" | 4-1/2" | 4-1/2" | 13-1/2" | 6-3/4" | 46-3/4" | 5-3/4" | 19-3/4" | 23" | 22-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,649
AWH2000NPM | 1,999,000 | 98% | 2,375 | 51-1/2" | 34" | 65-1/2" | 75-1/2" | 7" | 5-3/4" | 14-3/4" | 7-1/4" | 46-3/4" | 6-3/4" | 18-3/4" | 23" | 23-1/2" | 1-1/2" | 2-1/2" | 8" | 8" | 1,911
AWH3000NPM | 3,000,000 | 98% | 3,564 | 67-1/4" | 48-1/4" | 79-3/4" | 93-3/4" | 4-3/4" | 6-3/4" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2" | 4" | 10" | 10" | 3,147
AWH4000NPM | 4,000,000 | 98% | 4,752 | 67-1/4" | 48-1/4" | 96" | 110" | 5" | 7-1/2" | 17-3/4" | 8-3/4" | 60-1/4" | 8-1/2" | 25-1/2" | 29-1/2" | 40" | 2-1/2" | 4" | 12" | 12" | 3,694
If you wanted a specific model's specs:
modelNo = 'AWH1000NPM'
mSpecs = [r for r in specsTables if r[0] == modelNo]
mSpecs = [[]] if mSpecs == [] else mSpecs # in case there is no match
mSpecs = dict(zip(specsTables[0], mSpecs[0])) # convert to dictionary
print(mSpecs)
output:
{'Model Number': 'AWH1000NPM', 'Btu/Hr Input': '999,000', 'Thermal Efficiency': '98%', 'GPH # 100ºF Rise': '1,187', 'A': '45"', 'B': '24"', 'C': '48"', 'D': '62"', 'E': '30-1/2"', 'F': '15-3/4"', 'G': '12"', 'H': '20"', 'I': '38"', 'J': '3-1/2"', 'K': '10-1/2"', 'L': '19-1/4"', 'M': '20"', 'Gas Conn.': '1-1/4"', 'Water Conn.': '2-1/2"', 'Air Inlet': '6"', 'Vent Size': '6"', 'Ship. Wt.': '494'}
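If you need several lookups, it may be cleaner to index every row by model number once instead of scanning the list each time. A small sketch, using a shortened stand-in for the specsTables parsed above:

```python
# stand-in for the header + rows parsed above (first row is the header)
specsTables = [
    ("Model Number", "Btu/Hr Input", "Thermal Efficiency"),
    ("AWH0400NPM", "399,000", "99%"),
    ("AWH1000NPM", "999,000", "98%"),
]

# map model number -> {column: value}, built in a single pass
specs_by_model = {row[0]: dict(zip(specsTables[0], row)) for row in specsTables[1:]}

print(specs_by_model["AWH1000NPM"]["Btu/Hr Input"])  # 999,000
```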
The contents for constructing the table are within a script tag. You can extract the relevant string and re-create the table through string manipulation.
import requests, re
import pandas as pd
r = requests.get('https://www.lochinvar.com/products/commercial-water-heaters/armor-condensing-water-heater/').text
s = re.sub(r'\\"', '"', re.search(r'table":"([\s\S]+?)(?:","tableFootNote)', r).groups(1)[0])
lines = [i.split('\\t') for i in s.split('\\n')]
df = pd.DataFrame(lines[1:], columns=lines[0])
df.head(5)

What does NavigableString refer to in this error, and why did it happen?

from bs4 import BeautifulSoup
import numpy as np
import requests
from selenium import webdriver
from nltk.tokenize import sent_tokenize,word_tokenize
html = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
html.get("https://www.tsa.gov/coronavirus/passenger-throughput")

def TSA_travel_numbers(html):
    soup = BeautifulSoup(html, 'lxml')
    for i, rows in enumerate(soup.find('div', class_='view-content'), 1):
        # print(rows.content)
        for header in rows.find('tr'):
            number = rows.find_all('td', class_='views-field views-field field-2021-throughput views-align-center')
            print(number.text)

TSA_travel_numbers(html.page_source)
My error as follows :
Traceback (most recent call last):
File "TSA_travel.py", line 23, in <module>
TSA_travel_numbers(html.page_source)
File "TSA_travel.py", line 15, in TSA_travel_numbers
for header in rows.find('tr'):
TypeError: 'int' object is not iterable
What is happening here?
I can't iterate through the 'tr' tags; please help me solve this problem.
Thanks for your time, and thanks in advance!
As the error says, you can't iterate over an int. Iterating over the div yields its children, and some of those are NavigableString objects (the stray text between tags). NavigableString subclasses str, so rows.find('tr') calls str.find, which returns an int position rather than a Tag, and the inner for loop then tries to iterate over that int.
Also, there's no need for a webdriver as data on the page is static.
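The key point is that NavigableString subclasses Python's str, so .find on it is str.find and returns an integer index, not a Tag. A stdlib-only sketch of the same failure:

```python
text = "just some text"        # stands in for a NavigableString child
pos = text.find('tr')          # str.find: returns an int (-1 here), not a Tag
print(pos)                     # -1
try:
    for header in pos:         # what the inner loop in the question does
        pass
except TypeError as e:
    print(e)                   # 'int' object is not iterable
```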
Here's my take on it:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
def get_page(url):
    return requests.get(url).text

def get_data(page):
    soup = BeautifulSoup(page, 'lxml')
    return [
        item.getText(strip=True) for item in soup.select(".views-align-center")
    ]

def build_table(table_rows):
    t = [table_rows[i:i + 4] for i in range(0, len(table_rows[1:]), 4)]
    h = t[0]
    return t[1:], h

if __name__ == '__main__':
    source = "https://www.tsa.gov/coronavirus/passenger-throughput"
    table, header = build_table(get_data(get_page(source)))
    print(tabulate(table, headers=header, tablefmt="pretty"))
Output:
+------------+--------------------------+--------------------------+--------------------------+
| Date | 2021 Traveler Throughput | 2020 Traveler Throughput | 2019 Traveler Throughput |
+------------+--------------------------+--------------------------+--------------------------+
| 5/9/2021 | 1,707,805 | 200,815 | 2,419,114 |
| 5/8/2021 | 1,429,657 | 169,580 | 1,985,942 |
| 5/7/2021 | 1,703,267 | 215,444 | 2,602,631 |
| 5/6/2021 | 1,644,050 | 190,863 | 2,555,342 |
| 5/5/2021 | 1,268,938 | 140,409 | 2,270,662 |
| 5/4/2021 | 1,134,103 | 130,601 | 2,106,597 |
| 5/3/2021 | 1,463,672 | 163,692 | 2,470,969 |
| 5/2/2021 | 1,626,962 | 170,254 | 2,512,598 |
| 5/1/2021 | 1,335,535 | 134,261 | 1,968,278 |
| 4/30/2021 | 1,558,553 | 171,563 | 2,546,029 |
| 4/29/2021 | 1,526,681 | 154,695 | 2,499,461 |
| 4/28/2021 | 1,184,326 | 119,629 | 2,256,442 |
| 4/27/2021 | 1,077,199 | 110,913 | 2,102,068 |
| 4/26/2021 | 1,369,410 | 119,854 | 2,412,770 |
| 4/25/2021 | 1,571,220 | 128,875 | 2,506,809 |
| 4/24/2021 | 1,259,724 | 114,459 | 1,990,464 |
| 4/23/2021 | 1,521,393 | 123,464 | 2,521,897 |
| 4/22/2021 | 1,509,649 | 111,627 | 2,526,961 |
| 4/21/2021 | 1,164,099 | 98,968 | 2,254,209 |
| 4/20/2021 | 1,082,443 | 92,859 | 2,227,475 |
| 4/19/2021 | 1,412,500 | 99,344 | 2,594,171 |
| 4/18/2021 | 1,572,383 | 105,382 | 2,356,802 |
| 4/17/2021 | 1,277,815 | 97,236 | 1,988,205 |
| 4/16/2021 | 1,468,218 | 106,385 | 2,457,133 |
| 4/15/2021 | 1,491,435 | 95,085 | 2,616,158 |
| 4/14/2021 | 1,152,703 | 90,784 | 2,317,381 |
| 4/13/2021 | 1,085,034 | 87,534 | 2,208,688 |
| 4/12/2021 | 1,468,972 | 102,184 | 2,484,580 |
| 4/11/2021 | 1,561,495 | 90,510 | 2,446,801 |
and so on ...
Or, an even shorter approach: just use pandas.
import pandas as pd
import requests
from tabulate import tabulate

if __name__ == '__main__':
    source = "https://www.tsa.gov/coronavirus/passenger-throughput"
    df = pd.read_html(requests.get(source).text, flavor="bs4")[0]
    print(tabulate(df.head(10), tablefmt="pretty", showindex=False))
Output:
+-----------+-----------+--------+---------+
| 5/9/2021 | 1707805.0 | 200815 | 2419114 |
| 5/8/2021 | 1429657.0 | 169580 | 1985942 |
| 5/7/2021 | 1703267.0 | 215444 | 2602631 |
| 5/6/2021 | 1644050.0 | 190863 | 2555342 |
| 5/5/2021 | 1268938.0 | 140409 | 2270662 |
| 5/4/2021 | 1134103.0 | 130601 | 2106597 |
| 5/3/2021 | 1463672.0 | 163692 | 2470969 |
| 5/2/2021 | 1626962.0 | 170254 | 2512598 |
| 5/1/2021 | 1335535.0 | 134261 | 1968278 |
| 4/30/2021 | 1558553.0 | 171563 | 2546029 |
+-----------+-----------+--------+---------+

pivot multiple views into single result table/view

I have 2 views as below:
experiments:
select * from experiments;
+--------+--------------------+-----------------+
| exp_id | exp_properties | value |
+--------+--------------------+-----------------+
| 1 | indicator:chemical | phenolphthalein |
| 1 | base | NaOH |
| 1 | acid | HCl |
| 1 | exp_type | titration |
| 1 | indicator:color | faint_pink |
+--------+--------------------+-----------------+
calculations:
select * from calculations;
+--------+------------------------+--------------+
| exp_id | exp_report | value |
+--------+------------------------+--------------+
| 1 | molarity:base | 0.500000000 |
| 1 | volume:acid:in_ML | 23.120000000 |
| 1 | volume:base:in_ML | 5.430000000 |
| 1 | moles:H | 0.012500000 |
| 1 | moles:OH | 0.012500000 |
| 1 | molarity:acid | 0.250000000 |
+--------+------------------------+--------------+
I managed to pivot each of these views individually as below:
experiments_pivot:
+-------+--------------------+------+------+-----------+----------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|
+-------+--------------------+------+------+-----------+----------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink |
+-------+--------------------+------+------+-----------+----------------+
calculations_pivot:
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
|exp_id | molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
| 1 | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
My question is how to get these two pivot results as a single row? Desired result is as below:
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
Database Used: Mysql
Important note: each of these views can have an increasing number of rows, hence I considered "dynamic pivoting" for each view individually.
For reference -- below is a prepared statement I used to pivot experiments in MySQL (and a similar statement pivots the other view):
SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(exp_properties = ''',
      exp_properties,
      ''', value, NULL)) AS ',
      CONCAT('`', exp_properties, '`')
    )
  ) INTO @sql
FROM experiments;

SET @sql = CONCAT(
  'SELECT exp_id, ',
  @sql,
  ' FROM experiments GROUP BY exp_id'
);

PREPARE stmt FROM @sql;
EXECUTE stmt;
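One way to get the single desired row, assuming the two dynamic pivots are materialized as views (the names experiments_pivot and calculations_pivot below are placeholders for however you store them), is simply to join them on exp_id; a sketch:

```sql
-- combine the two pivoted views into one row per experiment
SELECT e.*,
       c.`molarity:base`, c.`molarity:acid`,
       c.`moles:H`, c.`moles:OH`,
       c.`volume:acid:in_ML`, c.`volume:base:in_ML`
FROM experiments_pivot e
JOIN calculations_pivot c ON c.exp_id = e.exp_id;
```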

wxPython print string with formatting

So I have the following class, which prints the text and header you call it with; I've also provided the code I use to call the Print function. I have a string, outputstring, which contains the text I want to print. My expected output and my actual output are below. Printing seems to remove the spaces that are needed for proper legibility. How can I print while keeping the spaces?
Class:
# Printer class
class Printer(HtmlEasyPrinting):
    def __init__(self):
        HtmlEasyPrinting.__init__(self)

    def GetHtmlText(self, text):
        "Simple conversion of text. Use a more powerful version"
        html_text = text.replace('\n\n', '<P>')
        html_text = html_text.replace('\n', '<BR>')
        return html_text

    def Print(self, text, doc_name):
        self.SetHeader(doc_name)
        self.PrintText(self.GetHtmlText(text), doc_name)

    def PreviewText(self, text, doc_name):
        self.SetHeader(doc_name)
        HtmlEasyPrinting.PreviewText(self, self.GetHtmlText(text))
Expected Print:
+-------------------+---------------------------------+------+-----------------+-----------+
| Domain: | Mail Server: | TLS: | # of Employees: | Verified: |
+-------------------+---------------------------------+------+-----------------+-----------+
| bankofamerica.com | ltwemail.bankofamerica.com | Y | 239000 | Y |
| | rdnemail.bankofamerica.com | Y | | Y |
| | kcmemail.bankofamerica.com | Y | | Y |
| | rchemail.bankofamerica.com | Y | | Y |
| citigroup.com | mx-b.mail.citi.com | Y | 248000 | N |
| | mx-a.mail.citi.com | Y | | N |
| bnymellon.com | cluster9bny.us.messagelabs.com | ? | 51400 | N |
| | cluster9bnya.us.messagelabs.com | Y | | N |
| usbank.com | mail1.usbank.com | Y | 65565 | Y |
| | mail2.usbank.com | Y | | Y |
| | mail3.usbank.com | Y | | Y |
| | mail4.usbank.com | Y | | Y |
| us.hsbc.com | vhiron1.us.hsbc.com | Y | 255200 | Y |
| | vhiron2.us.hsbc.com | Y | | Y |
| | njiron1.us.hsbc.com | Y | | Y |
| | njiron2.us.hsbc.com | Y | | Y |
| | nyiron1.us.hsbc.com | Y | | Y |
| | nyiron2.us.hsbc.com | Y | | Y |
| pnc.com | cluster5a.us.messagelabs.com | Y | 49921 | N |
| | cluster5.us.messagelabs.com | ? | | N |
| tdbank.com | cluster5.us.messagelabs.com | ? | 0 | N |
| | cluster5a.us.messagelabs.com | Y | | N |
+-------------------+---------------------------------+------+-----------------+-----------+
Actual Print:
The same as expected, but with the spaces removed, making it very hard to read.
Function call:
def printFile():
    outputstring = txt_tableout.get(1.0, 'end')
    print(outputstring)
    app = wx.PySimpleApp()
    p = Printer()
    p.Print(outputstring, "Data Results")
For anyone else struggling, this is the modified class function I used to generate a nice table with all rows and columns.
def GetHtmlText(self, text):
    html_text = '<h3>Data Results:</h3><p><table border="2">'
    html_text += "<tr><td>Domain:</td><td>Mail Server:</td><td>TLS:</td><td># of Employees:</td><td>Verified</td></tr>"
    for row in root.ptglobal.to_csv():
        html_text += "<tr>"
        for x in range(len(row)):
            html_text += "<td>" + str(row[x]) + "</td>"
        html_text += "</tr>"
    return html_text + "</table></p>"
maybe try
`html_text = text.replace(' ', '&nbsp;').replace('\n', '<br/>')`
that would replace your spaces with HTML non-breaking-space characters ... but it would still not look right, since it is not a monospace font ... this will be hard to automate ... you really want to put it in a table structure ... but that would require some work
you probably want to invest a little more time in your html conversion ... perhaps something like this (making assumptions based on what you have shown)
def GetHtmlText(self, text):
    "Simple conversion of text. Use a more powerful version"
    text_lines = text.splitlines()
    html_text = "<table>"
    html_text += "<tr><th>" + "</th><th>".join(text_lines[0].split(":")) + "</th></tr>"
    for line in text_lines[1:]:
        html_text += "<tr><td>" + "</td><td>".join(line.split()) + "</td></tr>"
    return html_text + "</table>"
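Since the table in the question is already aligned with spaces, another low-effort option (a sketch; it assumes the wx HTML printer honors the pre tag, which wxHTML does) is to escape the text and wrap it in <pre>, which preserves whitespace and renders in a monospace font:

```python
import html

def text_to_pre(text):
    # escape <, > and & so the table borders survive as literal text,
    # then let <pre> preserve the original spacing
    return "<pre>" + html.escape(text) + "</pre>"

sample = "| Domain:           | TLS: |\n| bankofamerica.com | Y    |"
print(text_to_pre(sample))
```

You would return this from GetHtmlText instead of doing the replace calls.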
