beautiful soup - get tag desired text - python

I'm very new to Beautiful Soup and am trying to get the text between tags.
databs.txt
<p>$343,343</p><h3>Single</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
Python
#!/usr/bin/python
from bs4 import BeautifulSoup
# Read the saved HTML fragment and close the file afterwards
with open("databs.txt", "r") as f:
    text = f.read()
soup = BeautifulSoup(text, 'html.parser')
# find() returns only the first matching tag
page1 = soup.find('p').getText()
print("P1:", page1)
page2 = soup.find('h3').getText()
print("H3:", page2)
Question:
How do I get the text "$101,900, Multi, $201,900, Single"?

If you want only the tags that carry attributes (here, a class attribute), you can use a lambda function to select them as follows:
from bs4 import BeautifulSoup
html = """
<p>$343,343</p>
<h3>Single</h3>
<p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
"""
soup = BeautifulSoup(html, 'lxml')
tags_with_attribute = soup.find_all(attrs=lambda x: x is not None)
clean_text = ", ".join([tag.get_text() for tag in tags_with_attribute])
Output would look like:
'$101,900, Multi, $201,900, Single'

Use the find_all method to find all tags:
for p, h3 in zip(soup.find_all('p'), soup.find_all('h3')):
    print("P:", p.getText())
    print("H3:", h3.getText())
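A further possibility, not taken from the answers above: the `=3D` sequences in the file look like quoted-printable encoding (as found in raw email bodies), where `=3D` stands for `=`. Assuming that is the case, a minimal sketch could decode the text first with the standard-library `quopri` module and then select the tags by class. The shortened `style` attributes in the sample string are a simplification of the original data.

```python
import quopri

from bs4 import BeautifulSoup

# Shortened stand-in for databs.txt; the style values are trimmed for readability.
raw = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p class=3D'highlight-price' style=3D\"margin: 0;\">$101,900</p>"
    "<h3 class=3D\"highlight-title\" style=3D\"margin: 0;\">Multi</h3>"
    "<p class=3D'highlight-price' style=3D\"margin: 0;\">$201,900</p>"
    "<h3 class=3D\"highlight-title\" style=3D\"margin: 0;\">Single</h3>"
)

# Assumption: the input is quoted-printable, so "=3D" decodes to "=".
html = quopri.decodestring(raw.encode("utf-8")).decode("utf-8")

soup = BeautifulSoup(html, "html.parser")

# Select only the tags that carry the two classes, in document order.
tags = soup.select("p.highlight-price, h3.highlight-title")
clean_text = ", ".join(tag.get_text() for tag in tags)
print(clean_text)
```

Selecting by class name avoids relying on the position of the unstyled first `<p>` and `<h3>` tags.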


How to use css's :after for widgets in pyqt

I am planning to create a PyQt6 application.
I want the application to look more polished and modern, so I found some good-looking CSS buttons and chose this CodePen button:
* {
margin: 0;
padding: 0;
}
html,
body {
box-sizing: border-box;
height: 100%;
width: 100%;
}
body {
background: #FFF;
font-family: 'Noto Sans JP', sans-serif;
font-weight: 400;
}
.buttons {
display: flex;
flex-direction: row;
flex-wrap: wrap;
justify-content: center;
text-align: center;
width: 100%;
height: 100%;
margin: 0 auto;
/* padding: 2em 0em; */
}
.container {
align-items: center;
display: flex;
flex-direction: column;
justify-content: center;
text-align: center;
background-color: #FFF;
padding: 40px 0px;
width: 240px;
}
h1 {
text-align: left;
color: #444;
letter-spacing: 0.05em;
margin: 0 0 0.4em;
font-size: 1em;
}
p {
text-align: left;
color: #444;
letter-spacing: 0.05em;
font-size: 0.8em;
margin: 0 0 2em;
}
.btn {
letter-spacing: 0.1em;
cursor: pointer;
font-size: 14px;
font-weight: 400;
line-height: 45px;
max-width: 160px;
position: relative;
text-decoration: none;
text-transform: uppercase;
width: 100%;
}
.btn:hover {
text-decoration: none;
}
/*btn_background*/
.effect01 {
color: #FFF;
border: 4px solid #000;
box-shadow:0px 0px 0px 1px #000 inset;
background-color: #000;
overflow: hidden;
position: relative;
transition: all 0.3s ease-in-out;
}
.effect01:hover {
border: 4px solid #666;
background-color: #FFF;
box-shadow:0px 0px 0px 4px #EEE inset;
}
/*btn_text*/
.effect01 span {
transition: all 0.2s ease-out;
z-index: 2;
}
.effect01:hover span{
letter-spacing: 0.13em;
color: #333;
}
/*highlight*/
.effect01:after {
background: #FFF;
border: 0px solid #000;
content: "";
height: 155px;
left: -75px;
opacity: .8;
position: absolute;
top: -50px;
-webkit-transform: rotate(35deg);
transform: rotate(35deg);
width: 50px;
transition: all 1s cubic-bezier(0.075, 0.82, 0.165, 1);/*easeOutCirc*/
z-index: 1;
}
.effect01:hover:after {
background: #FFF;
border: 20px solid #000;
opacity: 0;
left: 120%;
-webkit-transform: rotate(40deg);
transform: rotate(40deg);
}
<div class="buttons">
<div class="container">
<h1>光の反射</h1>
<p>Light reflection</p>
<span>Hover</span>
</div>
</div>
My question is: how can I use PyQt's functions to reproduce the :after effect?
I guess I can't do it with a widget's stylesheet alone.
Here is the Python code that I have written so far:
from PyQt6.QtCore import QRect, Qt
from PyQt6.QtGui import QEnterEvent
from PyQt6.QtWidgets import QApplication, QGraphicsOpacityEffect, QMainWindow, QPushButton
class QButton(QPushButton):
    def __init__(self, main_window: QMainWindow):
        super().__init__(main_window)
        self.main_window = main_window
        self.setText("Welcome")
        self.render()
    def render(self):
        self.setStyleSheet("""
            QPushButton{
                background-color: #2E3440;
                letter-spacing: 0.1px;
                font-size: 14px;
                font-weight: 400;
                line-height: 45px;
                text-decoration: none;
                color: #FFF;
                border: 4px solid #4C566A;
            }
        """)
        self.setFlat(True)
        self.setCursor(Qt.CursorShape.PointingHandCursor)
        self.setGeometry(QRect(200, 200, 200, 50))
    def setText(self, text: str):
        text = text.upper()
        super().setText(text)
    def enterEvent(self, event: QEnterEvent) -> None:
        self.setStyleSheet("""
            background-color: #D8DEE9;
            letter-spacing: 0.13em;
            color: #2E3440;
            border: 0px;
        """)
        opacity = QGraphicsOpacityEffect()
        opacity.setOpacity(.8)
        self.setGraphicsEffect(opacity)
        return super().enterEvent(event)
class Renderer:
    def __init__(self, main_window: QMainWindow) -> None:
        self.main_window = main_window
    def render(self):
        button = QButton(self.main_window)
if __name__ == "__main__":
    app = QApplication([])
    window = QMainWindow()
    renderer = Renderer(window)
    renderer.render()
    window.show()
    exit(app.exec())
Unfortunately, you can't. Qt Style Sheets are fundamentally based on CSS 2.1, and only support its basic implementation (see the syntax documentation).
If you want more customized and advanced behavior, you have only two options: subclass the widget and/or use QProxyStyle, or use QML.

Creating matrix in python with pandas and numpy

I am trying to create a matrix in Python that divides the values of two dataframes, but my code multiplies them instead.
I have two dataframes:
import numpy as np
import pandas as pd
df_in = pd.DataFrame([[77.279999], [80.099998]], index=[2019, 2020], columns=['Price'])
df_out = pd.DataFrame([[71.849998], [77.400002]], index=[2019, 2020], columns=['Price'])
Now I will create the matrix:
df_matrix = pd.DataFrame(np.outer(df_in, df_out), df_in.index, df_out.index)
The output I get is:
      2019         2020
2019  5552.567794  5755.184768
2020  5981.472023  6199.740004
It is multiplying instead of dividing. The next problem I am facing is that if
df_in.index > df_out.index
then value should be 0.
The result that I would like to see is:
      2019      2020
2019  1,075574  1,114822
2020  0         1,034883
Thanks in advance for any advice.
You can do divide.outer:
pd.DataFrame(np.divide.outer(df_in,df_out)[:,0,:,0], df_in.index, df_out.index)
Output:
          2019      2020
2019  1.075574  0.998450
2020  1.114823  1.034884
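The answer above does not cover the second requirement (a value of 0 where df_in.index > df_out.index). A minimal sketch of one way to add it, assuming rows correspond to df_in years and columns to df_out years, and applying the stated rule literally; which orientation the asker intended is not fully clear from the tables, so treat the mask direction as an assumption. Converting to plain NumPy arrays first also avoids depending on how pandas handles `ufunc.outer`.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame([[77.279999], [80.099998]], index=[2019, 2020], columns=['Price'])
df_out = pd.DataFrame([[71.849998], [77.400002]], index=[2019, 2020], columns=['Price'])

# Outer division on 1-D arrays: ratios[i, j] = in_price[i] / out_price[j]
ratios = np.divide.outer(df_in['Price'].to_numpy(), df_out['Price'].to_numpy())

# Broadcast the years into a (rows, columns) grid for comparison
row_years = df_in.index.to_numpy()[:, None]   # shape (2, 1)
col_years = df_out.index.to_numpy()[None, :]  # shape (1, 2)

# Assumed rule: set the cell to 0 where the df_in year is later than the df_out year
ratios = np.where(row_years > col_years, 0.0, ratios)

df_matrix = pd.DataFrame(ratios, index=df_in.index, columns=df_out.index)
print(df_matrix)
```

Flipping the comparison to `row_years < col_years` would zero the opposite triangle if the intended orientation is the other way round.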

I need a div to be pushed down when several options from a dcc.dropdown menu have been selected (dash component, python)

I am having trouble getting the div below the 'multi' dropdown (dcc.Dropdown from Dash) to be pushed down when several options have been selected. Instead, the dropdown overlaps it, as in the image below (or is sent to the back, depending on the z-index). The dropdown is inside another div. I've tried changing the CSS display and position properties, with no success so far.
The code looks something like this:
html.Div([
    html.Div(children=html.H2('SIMILAR PLAYERS', className='titulo_ventanah2'), className='titulo_ventana'),
    html.Div(children=[html.I(className='search'), 'Search by:'], style={'display':'inline-block','padding-left':'15%', 'font-size':'13px'}),
    html.Div(children=dcc.Dropdown(style={'height':'20px', 'font-size':'14px'}, persistence_type='session'), style={'display':'inline-block', 'padding':'0px 0px 0px 10px', 'width':'200px', 'margin-top':'5px'}),
    html.Div(children=dcc.Dropdown(multi=True, style={'height':'20px', 'font-size':'14px'}), style={'padding':'0px 0px 0px 10px', 'width':'400px', 'margin-top':'5px'}, className='similardiv'),
    html.Div([
        html.Div(children=[
            html.Div(html.H3('Top 15 most similar players', className='titulo_ventanah3'), className='titulo_ventanaint'),
            html.Hr(),
            html.Div(children=dcc.Graph()),
        ], className='similar_players'),
    ]),
], className='container1')
There is also CSS code for some components:
.container1 {
position: fixed;
width:80%;
height:100%;
display:inline-block;
left:20%;
margin: 0 auto;
padding: 0 0px;
box-sizing: border-box;
overflow: hidden;
top:0;
overflow:auto;
background-color: #f5f5f5;
}
.titulo_ventana {
top:0;
display:inline-block;
background-color: #f5f5f5;
padding-left:2%;
margin: 0px 0px 0px 0px;
}
.titulo_ventanah2 {
text-decoration: none;
font-size: 20px;
line-height:45px;
color: #8f8f8f;
margin: 0px 0px 0px 0px;
display: inline-block;
letter-spacing: 0.01em;
}
.titulo_ventanaint {
overflow: hidden;
z-index:5;
height:39px;
display:inline-block;
background-color: white;
}
.titulo_ventanah3 {
max-width:100%;
text-decoration: none;
font-size: 14px;
color: black;
line-height:45px;
margin: 0px 0px 0px 0px;
padding: 0px 0px 0px 10px;
text-align: left;
overflow:hidden;
letter-spacing: 0.01em;
}
.similar_players {
width: 97%;
overflow: auto;
height:540px;
font-size: 15px;
color: #002e5c;
background-color: white;
margin-top:1.5%;
margin-left:1.5%;
display: block;
position:relative;
z-index:1;
box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
}
.similardiv {
position:relative;
z-index:2;
display:inline-block;
}
I am hoping the solution is with a change in the display or position property of an element but I believe it has something to do with the default css for the dash component which can be found here.
You can check out the dashboard in link
If I change the position of the .Select-menu-outer to relative as suggested by CBroe in the comments, the following will happen only when the menu is opened:
The problem was that the height of the dcc.Dropdown was set to 20px. Simply deleting 'height':'20px' from its style solved it, because the dropdown can then grow as more options are selected.

In LXML how can I keep the tags, but remove all the inline styling etc?

I know I can remove tags like <h1> etc with LXML but how can I get it to keep the tag but remove everything else like so:
<h1 style='font-size: 24px; margin: 0px 0px 12px; font-weight: 600;
line-height: 28px; color: rgb(23, 43, 77); font-family:
-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
"Noto Sans", Ubuntu, "Droid Sans",
"Helvetica Neue", sans-serif; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
letter-spacing: normal; orphans: 2; text-align: start;
text-indent: 0px; text-transform: none; white-space: normal;
widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;
background-color: rgb(244, 245, 247); text-decoration-style:
initial; text-decoration-color: initial;'>This text must stay</h1>
Once cleaned, the above should become:
<h1>This text must stay</h1>
Currently I'm doing this, but I can't quite get it to keep the tag and remove the tag class etc:
safe_attrs = list(lxml.html.clean.defs.safe_attrs) + ['bgcolor']
lxml.html.clean.defs.safe_attrs = safe_attrs
lxml.html.clean.Cleaner.safe_attrs = lxml.html.clean.defs.safe_attrs
cleaner = lxml.html.clean.Cleaner(style=False, remove_tags=['body', 'pre'], safe_attrs_only=True,
safe_attrs=safe_attrs, kill_tags=['title', 'code'])
return cleaner.clean_html(data)

API question - How to REST request with Python

I'm making a REST API request using Python, and I get the following HTML result, which says "Service - Endpoint not found. Please see the service help page for constructing valid requests to the service".
How can I fix this issue?
Note: This API can help determine whether an individual address is up to date by inputting individual address, first name, last name, etc.
Python query
import requests
import json
url = 'https://smartmover.melissadata.net/v3/WEB/SmartMover/doSmartMover/'
payload = {'t': '1353', 'id': 'sw38hs47u', 'jobid': '1', 'act': 'NCOA, CCOA', 'cols': 'TransmissionResults,TransmissionReference, Version, TotalRecords,CASSReportLink,NCOAReportLink,Records,AddressExtras,AddressKey,AddressLine1,AddressLine2,AddressTypeCode,BaseMelissaAddressKey,CarrierRoute,City,CityAbbreviation,CompanyName,CountryCode,CountryName,DeliveryIndicator,DeliveryPointCheckDigit,DeliveryPointCode,MelissaAddressKey,MoveEffectiveDate,MoveTypeCode,PostalCode,RecordID,Results,State,StateName,Urbanization', 'opt': 'ProcessingType: Standard', 'List': 'test', 'full': 'PATEL MANISH', 'first':'MANISH','last':'PATEL', 'a1':'1600 S 5TH ST', 'a2':'1600 S 5TH ST', 'city':'Austin', 'state': 'TX', 'postal': '78704', 'ctry': 'USA'}
response = requests.get(url, params=payload)
print (response.text)
Result
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Service</title>
<style>BODY { color: #000000; background-color: white; font-family: Verdana; margin-left: 0px; margin-top: 0px; } #content { margin-left: 30px; font-size: .70em; padding-bottom: 2em; } A:link { color: #336699; font-weight: bold; text-decoration: underline; } A:visited { color: #6699cc; font-weight: bold; text-decoration: underline; } A:active { color: #336699; font-weight: bold; text-decoration: underline; } .heading1 { background-color: #003366; border-bottom: #336699 6px solid; color: #ffffff; font-family: Tahoma; font-size: 26px; font-weight: normal;margin: 0em 0em 10px -20px; padding-bottom: 8px; padding-left: 30px;padding-top: 16px;} pre { font-size:small; background-color: #e5e5cc; padding: 5px; font-family: Courier New; margin-top: 0px; border: 1px #f0f0e0 solid; white-space: pre-wrap; white-space: -pre-wrap; word-wrap: break-word; } table { border-collapse: collapse; border-spacing: 0px; font-family: Verdana;} table th { border-right: 2px white solid; border-bottom: 2px white solid; font-weight: bold; background-color: #cecf9c;} table td { border-right: 2px white solid; border-bottom: 2px white solid; background-color: #e5e5cc;}</style>
</head>
<body>
<div id="content">
<p class="heading1">Service</p>
<p xmlns="">Endpoint not found. Please see the <a rel="help-page" href="https://smartmover.melissadata.net/v3/WEB/SmartMover/help">service help page</a> for constructing valid requests to the service.</p>
</div>
</body>
</html>
Remove the trailing / at the end of the URL:
import requests
import json
url = 'https://smartmover.melissadata.net/v3/WEB/SmartMover/doSmartMover' # <<<
payload = {'t': '1353', 'id': 'sw38hs47u', 'jobid': '1', ...}
response = requests.get(
    url, params=payload,
    headers={'Content-Type': 'application/json'}  # Using JSON here for readability in the response
)
print(response.text)
Output:
{
"CASSReportLink": "",
"NCOAReportLink": "",
"Records": [],
"TotalRecords": "",
"TransmissionReference": "1353",
"TransmissionResults": "SE20",
"Version": "4.0.4.48"
}
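To guard against the trailing-slash problem described above, the endpoint can be normalized before the request is built. The following is a minimal sketch; the helper name `build_request_url` is hypothetical, and the parameters are a shortened subset of the original payload. It only prepares the request to show the final URL and does not contact the service.

```python
import requests


def build_request_url(endpoint: str, params: dict) -> str:
    """Return the URL requests would call, without sending anything."""
    # Drop any trailing slash so the service sees .../doSmartMover?... and not .../doSmartMover/?...
    clean_endpoint = endpoint.rstrip('/')
    prepared = requests.Request('GET', clean_endpoint, params=params).prepare()
    return prepared.url


url = build_request_url(
    'https://smartmover.melissadata.net/v3/WEB/SmartMover/doSmartMover/',
    {'t': '1353', 'jobid': '1'},  # shortened subset of the original payload
)
print(url)
```

Printing the prepared URL before sending is a quick way to confirm that the path and query string match what the service help page expects.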
