How to extract specific html lines (with a flex container) using ironpython? - python

I am using IronPython 2.7.9.0 on Grasshopper and Rhino to web scrape data from a specific widget on this link: https://vemcount.app/embed/widget/uOCRuLPangWo5fT?locale=en
The code I am using is as follows
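import urllib
# URL of the widget page mentioned above
url = "https://vemcount.app/embed/widget/uOCRuLPangWo5fT?locale=en"
web = urllib.urlopen(url)  # Python 2 / IronPython 2.7 API
html = web.read()
web.close()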
The HTML output contains everything from this link except the parts I need. When I inspect the page in Chrome, the element I am after has a "flex" badge next to it.
Anything nested under the line with the "flex" badge does not appear in the scraping result; it comes back as a blank line.
This is the output html I get:
<!DOCTYPE html>
<html lang="en">
<head>
<title>Central Library - Duhig North & Link</title>
<meta charset="utf-8">
<meta name="google" content="notranslate">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="csrf-token" content="">
<link rel="stylesheet" href="/build/app.css?id=2fefc4f9faa59eebcb4b">
<link rel="stylesheet" href="https://vemcount.app/fonts/hamburg_serial/stylesheet.css">
<style>
#embed, #main {
height: 100vh;
}
.vue-grid-item {
margin-bottom: 0px !important;
}
.powered_by {
position: absolute;
bottom: 0px;
right: 0px;
background-color: rgba(0, 0, 0, 0.18);
color: #fff;
padding: 2px 5px;
font-size: 9px;
}
.powered_by:hover, .powered_by:link, .powered_by:visited {
text-decoration: none;
display: none;
}
.dashboard-widget .relative {
overflow: hidden !important;
}
</style>
<script>
window.App = {"socketAppKey":"eJSkWUHWpwolvjVcT2ZxUJZXnDpxtRljdZl74fKr","socketCluster":null,"socketHost":"websocket.vemcount.com","socketPort":443,"socketSecurePort":443,"socketDisableStats":true,"socketEncrypted":true,"locale":"en","settings":[{"name":"type","value":"{\"count_in\":\"column\"}"},{"name":"period","value":"[\"yesterday\"]"},{"name":"period_step","value":"hour"},{"name":"hide_datalabel","value":"0"},{"name":"currency","value":"AUD"},{"name":"show_days","value":"[0,1,2,3,4,5,6]"},{"name":"show_months","value":"[1,2,3,4,5,6,7,8,9,10,11,12]"},{"name":"show_hours_from","value":"00:00"},{"name":"show_hours_to","value":"23:45"},{"name":"data_heatmap","value":"blue"},{"name":"weather_metrics","value":"0"},{"name":"first_day_of_week","value":"1"},{"name":"time_format24","value":"time_format24"},{"name":"date_time_format","value":"2"},{"name":"number_grouping","value":","},{"name":"number_decimal","value":"."},{"name":"opening_hours_overlap","value":"0"},{"name":"data_output","value":"count_in"}],"sound":null};
</script>
<script src="/build/lang/en.js?v=2022.04.4"></script>
</head>
<body class="bg-transparent">
<main id="main">
<div id="embed" >
<div class="w-full h-full vue-grid-item cssTransforms" style="position: absolute;">
<live-inside :embedded="true" :widget="{"id":81438,"pane_id":4005,"title":"Central Library - Duhig North & Link","description":"Live occupancy \/ Seating capacity","x":0,"y":0,"w":2,"h":1,"bg_color":"red","text_color":"black","type":"live-inside","secret":"uOCRuLPangWo5fT","internal":"VRg4JTIRrtJ7Pwg","embeddable":1,"content":{"target":1100,"bidirectional":true,"target_enable":true,"prettify":false,"target_type":"donut","target_donut_hide_metric":false,"target_donut_target_hide_label":false,"target_visual_inside_text":null,"target_visual_available_text":null,"target_screen_ok_title":null,"target_screen_ok_text":null,"target_screen_ok_color":"#38A169","target_screen_ok_image":-1,"target_screen_warning_title":null,"target_screen_warning_pe</live-inside>
</div>
</div>
</main>
<a title=" Vemco Group A/S " class="powered_by" target="_blank"
href="http://vemcount.com">Powered by
<b>vemcount.com</b>
</a>
<script src="/build/manifest.js?id=7f2e9aa3431c681a4683"></script>
<script src="/build/vendor.js?id=19867aae3b960cda7d79"></script>
<script src="/build/embed.js?id=2ff0173dd78c5c1f99c6"></script>
</body>
</html>
As you can see, some lines are missing: the ones that have a flex badge next to them. (By the way, I have shortened the code so that I don't reach the 30,000-character limit.)
I am interested in the number 311, which changes every 2 seconds on the live page and appears in the HTML as
<span>311</span>
Is there a way I can get this value, as well as any other value, using IronPython?
P.S. I am a noob in actual coding, that's why I might have issues with terminologies, but have a fair background in visual scripting. Your help is much appreciated. Thanks.

In case you have the same question or are struggling with dynamic web scraping: the number is filled in by JavaScript after the page loads, so it never appears in the raw HTML that urllib downloads. You have to use CPython and install a browser-automation tool such as Playwright, or BeautifulSoup + Selenium.
I used Playwright, which is far more straightforward and has a very welcome inner_html() function that reads the dynamically rendered HTML directly. Here is the code for reference.
# Part of the help for this script came from https://stackoverflow.com/questions/64303326/using-playwright-for-python-how-do-i-select-or-find-an-element
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(slow_mo=1000)
    page = browser.new_page()
    page.goto('https://vemcount.app/embed/widget/uOCRuLPangWo5fT')
    # First <span> inside a <p class="w-full">, which holds the live count
    central = page.query_selector("p.w-full span")
    print({'central': central.inner_html()})
    browser.close()
The next step I am working on is running the .py script from Grasshopper through a batch file and reading its output from a TXT or CSV file inside Grasshopper.
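One way the file hand-off described above might look is sketched below. It is only an illustration under assumptions: the file name occupancy.csv, the column names, and the helper names write_reading and read_latest are all hypothetical, and the value "311" stands in for whatever the scraper returns. The sketch is written for CPython 3; a reader running under IronPython 2.7 would probably need to open the file in binary mode instead of passing newline.

```python
import csv
import os
import tempfile
from datetime import datetime


def write_reading(path, name, value):
    """Append one timestamped reading to a CSV file (scraper side)."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header once, when the file is first created
            writer.writerow(["timestamp", "name", "value"])
        writer.writerow([datetime.now().isoformat(), name, value])


def read_latest(path):
    """Return the most recent row as a dict, or None if there are no rows."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None


# Demo in a temporary folder; "occupancy.csv" is a hypothetical file name
demo_path = os.path.join(tempfile.mkdtemp(), "occupancy.csv")
write_reading(demo_path, "central", "311")
write_reading(demo_path, "central", "312")
latest = read_latest(demo_path)
print(latest["name"], latest["value"])
```

Appending rows rather than overwriting keeps a history of readings, and the reader only ever needs the last row.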
If there is a better way I am more than happy to hear your suggestions.
Yours,
A Beginner in Python. :)

Related

Passing variable from flask to html adds &#34;

I'm trying to pass a variable from Flask to my HTML code. I'm adding it as a URL for a button, so a user can follow it. My problem is that the buttons don't work, and when inspecting the website I see that the variables have had &#34; added to them. Removing this makes the buttons work.
HTML code:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Testing buttons">
<meta name="keywords" content="Test">
<style>
h1 {
font-family: Arial, sans-serif;
color: #2f2d2d;
text-align: Center;
}
p {
font-family: Arial, sans-serif;
font-size: 14px;
text-align: Center;
color: #2f2d2d;
}
</style>
</head>
<body>
<h1>Results</h1>
<p>Click the buttons below to go to your results: </p>
<button onclick={{ value1 }}>
Yandex.com
</body>
</html>
Value1 in my python code:
input1 = (str(""""window.location.href='""")
+ str(img_search_url) + str('''';"'''))
return render_template('results.html', value1=input1)
For testing purposes let img_search_url = https://yandex.com/images/search?cbir_id=1865182%2F7z8tGw017Oxvkl-ZRGX7jA6207&rpt=imageview&lr=123432
Thanks
You need to use the |safe filter as mentioned on other SO answers.
<button onclick={{ value1|safe }}>
This turns off auto-escaping for that value. If you apply it to untrusted data, though, it can easily lead to XSS vulnerabilities.
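To see why the quotes get mangled, here is a small sketch of what HTML escaping does to the string built in the question. It uses the standard library's html.escape purely as a stand-in for Jinja2's auto-escaping, which is an assumption for illustration: Jinja2's own escaper emits numeric entities such as &#34;, while html.escape emits &quot;, but the effect on the onclick attribute is the same.

```python
import html

# URL taken from the question
img_search_url = ("https://yandex.com/images/search"
                  "?cbir_id=1865182%2F7z8tGw017Oxvkl-ZRGX7jA6207&rpt=imageview&lr=123432")

# Same value the question builds: "window.location.href='<url>';"
value1 = "\"window.location.href='" + img_search_url + "';\""

# Stand-in for auto-escaping: quotes and ampersands become entities
escaped = html.escape(value1, quote=True)

print(value1)
print(escaped)
```

The escaped form no longer contains literal quote characters, so the browser does not treat them as attribute delimiters, which matches the broken button described in the question.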

How to change the font size of text in one cell jupyter lab?

I want to change the font size of a specific markdown cell in jupyter lab, and not the whole output. I will convert my project at the end into an html file.
I already tried this:
<html>
<head>
<style>
div.a {
font-size: 300%;
}
</style>
<body>
<div class="a">My text in here</div>
</body>
</head>
</html>
But this is not changing my text size after I run my cell under Markdown.
I also don't want to use in order to not give a automatic number to that particular text.
Thanks in advance
I see you want to change the font-size:
<html>
<head>
<style type="text/css">
.a { /* for a class, use .(class-name) */
font-size: 300%;
}
</style>
</head> <!-- all the styling is to be done inside the head tag -->
<body>
<div class="a">My text in here</div>
</body>
</html>

Generating (inline CSS) HTML templates from Flask not working

For a personal website I would like to randomly select a background picture (out of 4) for my starting page using flask. When try to create a HTML template (with inline CSS for formatting), the resulting HTML does not display the picture chosen at random.
So far I have tried to use url_for(), as I thought the problem might be that jinja cannot find the files, but this does not resolve my problem.
I also looked at the whitespace and delimiters, which seem to be correct in my mind.
The code from my app.py:
from flask import Flask, render_template
import random
app = Flask(__name__)
@app.route('/')
def index():
    # Pick one of the four background images at random
    intt = random.randint(1, 4)
    random_number = "../Images/artwork/{}.jpeg".format(intt)
    return render_template('index.html', random_number=random_number)
The code in my HTML file:
<!DOCTYPE html>
<html>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
@font-face {
font-family: "Pretoria Gross";
src: url("../Fonts/Pretoria.ttf");
}
ge {
color: Yellow;
font-family: Pretoria Gross;
font-size:70px;
text-align: center;
display:inline-block;
position:relative;
width:100%;
background: url('../Images/artwork/{{random_number}}.jpeg') no-repeat top center;
background-position: center top;
background-size: 25% auto;
}
</style>
<a href="about.html">
<ge>Website<br/>Title<br/>here</ge>
</a>
The resulting HTML does not render the CSS. Where do I go wrong?
Many Thanks
random_number already holds the full path, so the template ends up repeating the directory and the extension. Use the variable on its own in the CSS:
background: url('{{random_number}}') no-repeat top center;
Or, you can simply pass intt to the template, and keep the original css templating:
return render_template('index.html', random_number=intt)
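The difference between the original template and the two fixes can be checked without running Flask. The sketch below uses plain str.format as a stand-in for Jinja2's {{ random_number }} substitution (an assumption made only to keep the example self-contained), and fixes intt at 3 so the output is reproducible.

```python
# str.format stands in for Jinja2's {{ random_number }} substitution
css_original = "background: url('../Images/artwork/{random_number}.jpeg')"
css_fixed = "background: url('{random_number}')"

intt = 3  # fixed for the demo; the app uses random.randint(1, 4)
full_path = "../Images/artwork/{}.jpeg".format(intt)

# Original code: full path passed into a template that adds the path again
broken = css_original.format(random_number=full_path)

# Fix 1: keep passing the full path, simplify the CSS
fix_a = css_fixed.format(random_number=full_path)

# Fix 2: keep the original CSS, pass only the number
fix_b = css_original.format(random_number=intt)

print(broken)
print(fix_a)
print(fix_b)
```

Both fixes produce the same URL, so the choice is mostly about whether the path should live in the Python code or in the template.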

display: inline-block not working Outlook Email HTML

I want to place one table and one image on the same line of an email.
display: inline; works fine in any browser, but in Outlook everything ends up in one column. I need them to be on the same line. Please help me with a solution.
<html>
<head>
<title></title>
<style>
img {
display: inline;
width: 100px;
height: 100px;
}
h1 {
display: inline;
}
</style>
</head>
<body> <img src="Image1.jpg">
<h1>afdsgdf dfgsdf dsfgf</h1> <img src="Image2.jpg">
</body>
</html>
Coding HTML for email clients is completely different from coding for web browsers, and on top of that every email client renders your code differently. It is far too extensive a topic to cover here, so I would advise doing some research online. To get you started, though, the most reliable way to write HTML for email is to use a table layout built from <table>, <tr>, and <td> tags.
Also, many email clients ignore style tags, so you may want to write your CSS inline like this:
<img style="display: inline; width: 100px; height: 100px;" src="img1.jpg" />
<h1 style="display:inline"> I'm a header </h1>
Hope this helps
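As a rough illustration of the table-layout advice, the sketch below builds a single-row table with inline styles and wraps it in an email message using Python's standard library. The helper name build_row_email, the subject line, and the exact styles are assumptions made for the example, and the image file names are simply the ones from the question; a real email would need absolute image URLs or embedded images, and the markup has not been tested in Outlook.

```python
from email.message import EmailMessage


def build_row_email(image1, heading, image2):
    """Return (message, html) with two images and a heading in one table row."""
    cell = '<td style="vertical-align: middle; padding: 0 8px;">{}</td>'
    img = '<img src="{}" width="100" height="100" style="display: block;" alt="">'
    cells = [
        cell.format(img.format(image1)),
        cell.format('<h1 style="margin: 0;">{}</h1>'.format(heading)),
        cell.format(img.format(image2)),
    ]
    # One row, three cells: the table keeps the items side by side
    html_body = (
        '<table role="presentation" cellpadding="0" cellspacing="0" border="0">'
        "<tr>{}</tr></table>"
    ).format("".join(cells))

    msg = EmailMessage()
    msg["Subject"] = "Layout test"  # hypothetical subject
    msg.set_content("Plain-text fallback: " + heading)
    msg.add_alternative(html_body, subtype="html")
    return msg, html_body


msg, html_body = build_row_email("Image1.jpg", "afdsgdf dfgsdf dsfgf", "Image2.jpg")
print(html_body)
```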

Python selenium

I can't locate this element.. I'm trying to un-check the history box and dl box (they're checked by default)
from selenium import webdriver
import time
chrome_path = r"C:\Users\Skid\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("chrome://settings/clearBrowserData")
driver.find_element_by_xpath("""//*[@id=delete-browsing-history-checkbox"]""") #unchecks history
driver.find_element_by_xpath("""//*[@id="delete-download-history-checkbox"]""") #unchecks dl history
This is the page source that someone wanted me to update.
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" id="uber" class="" i18n-values="dir:textdirection;lang:language" dir="ltr" lang="en" i18n-processed=""><head>
<meta charset="utf-8" />
<title i18n-content="pageTitle">Settings - Clear browsing data</title>
<link id="favicon" rel="icon" type="image/png" sizes="16x16" href="chrome://theme/IDR_SETTINGS_FAVICON" />
<link id="favicon2x" rel="icon" type="image/png" sizes="32x32" href="chrome://theme/IDR_SETTINGS_FAVICON@2x" />
<link rel="stylesheet" href="chrome://resources/css/chrome_shared.css" />
<style>/* Copyright (c) 2012 The Chromium Authors. All rights reserved.
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file. */
body {
/* http://crbug.com/129406 --- horizontal scrollbars flicker when changing
* sections. */
overflow-x: hidden;
}
#navigation {
height: 100%;
left: 0;
/* This is a hack to prevent the navigation bar from occluding pointer events
* from the bottom scroll bar (which shows when one needs to horizontally
* scroll). Corresponding padding-top to offset this is in uber_frame.css */
margin-top: -20px;
position: absolute;
/* This value is different from the left value to compensate for the scroll
* bar (which is always on and to the right) in RTL. */
right: 15px;
width: 155px;
z-index: 3;
}
#navigation.background {
z-index: 1;
}
#navigation.changing-content {
-webkit-transition: -webkit-transform 100ms, width 100ms;
}
.iframe-container {
-webkit-margin-start: -20px;
-webkit-transition: margin 100ms, opacity 100ms;
bottom: 0;
left: 0;
opacity: 0;
position: absolute;
right: 0;
top: 0;
z-index: 1;
}
.iframe-container.selected {
-webkit-margin-start: 0;
-webkit-transition: margin 200ms, opacity 200ms;
-webkit-transition-delay: 100ms;
opacity: 1;
z-index: 2;
}
.iframe-container.expanded {
left: 0;
}
iframe {
border: none;
display: block;
height: 100%;
width: 100%;
}
</style>
<script src="chrome://resources/js/cr.js"></script>
<script src="chrome://resources/js/cr/ui/focus_manager.js"></script>
<script src="chrome://resources/js/load_time_data.js"></script>
<script src="chrome://resources/js/util.js"></script>
<script src="chrome://chrome/uber.js"></script>
<script src="chrome://chrome/uber_utils.js"></script>
</head>
<body>
<div id="navigation" data-width="155" class="changing-content background" style="transform: translateX(0px);"><iframe src="chrome://uber-frame/" name="chrome" role="presentation" tabindex="-1" aria-hidden="true"></iframe></div>
<div class="iframe-container" i18n-values="id:historyHost; data-url:historyFrameURL;" data-favicon="IDR_HISTORY_FAVICON" id="history" data-url="chrome://history-frame/" hidden="" aria-hidden="true"></div>
<div class="iframe-container" i18n-values="id:extensionsHost; data-url:extensionsFrameURL;" data-favicon="IDR_EXTENSIONS_FAVICON" id="extensions" data-url="chrome://extensions-frame/" hidden="" aria-hidden="true"></div>
<div class="iframe-container selected" i18n-values="id:settingsHost; data-url:settingsFrameURL;" data-favicon="IDR_SETTINGS_FAVICON" id="settings" data-url="chrome://settings-frame/" aria-hidden="false" data-title="Settings - Clear browsing data"><iframe name="settings" role="presentation" src="chrome://settings-frame/clearBrowserData" data-ready="true"></iframe></div>
<div class="iframe-container" i18n-values="id:helpHost; data-url:helpFrameURL;" data-favicon="IDR_PRODUCT_LOGO_16" id="help" data-url="chrome://help-frame/" hidden="" aria-hidden="true"></div>
<script src="chrome://chrome/strings.js"></script>
<script src="chrome://resources/js/i18n_template.js"></script>
</body></html>
driver.find_element_by_xpath only looks up the checkbox and returns it as a WebElement. You need to click on it to uncheck it:
driver.find_element_by_xpath("""//*[@id="delete-browsing-history-checkbox"]""").click()
Also, the first XPath in your question is missing the opening quotation mark after @id=. It should look like the example above.
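The attribute predicate syntax can be tried out without a browser. The sketch below uses the standard library's ElementTree, which supports a limited XPath subset, as a stand-in for Selenium's XPath lookup; the markup is a hypothetical fragment, not the real Chrome settings page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with the two checkbox ids from the question
markup = """
<form>
  <input type="checkbox" id="delete-browsing-history-checkbox" checked="checked"/>
  <input type="checkbox" id="delete-download-history-checkbox" checked="checked"/>
</form>
"""

root = ET.fromstring(markup)

# Well-formed predicate: @id, then the value in matching quotes
xpath = './/*[@id="delete-browsing-history-checkbox"]'
match = root.find(xpath)

print(match.tag, match.get("id"))
```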
Edit
You can try locating the checkbox by id
driver.find_element_by_id("delete-browsing-history-checkbox").click()
Edit 2
The checkboxes are inside an iframe. You need to switch to it first:
driver.switch_to.frame("settings") # switch to the iframe by name attribute
# driver.switch_to.frame(driver.find_element_by_name("settings")) # should also work
driver.find_element_by_id("delete-browsing-history-checkbox").click()
driver.switch_to.default_content() # switch back to main window
Can you add to your question what you get as body from selenium?
driver.get("chrome://settings/clearBrowserData")
driver.page_source
If I check the source code in Google Chrome of this page I get:
view-source:chrome://chrome/settings/clearBrowserData
<body>
<div id="navigation"><iframe src="chrome://uber-frame/" name="chrome" role="presentation"></iframe></div>
<div class="iframe-container"
i18n-values="id:historyHost; data-url:historyFrameURL;"
data-favicon="IDR_HISTORY_FAVICON"></div>
<div class="iframe-container"
i18n-values="id:extensionsHost; data-url:extensionsFrameURL;"
data-favicon="IDR_EXTENSIONS_FAVICON"></div>
<div class="iframe-container"
i18n-values="id:settingsHost; data-url:settingsFrameURL;"
data-favicon="IDR_SETTINGS_FAVICON"></div>
<div class="iframe-container"
i18n-values="id:helpHost; data-url:helpFrameURL;"
data-favicon="IDR_PRODUCT_LOGO_16"></div>
<script src="chrome://chrome/strings.js"></script>
<script src="chrome://resources/js/i18n_template.js"></script>
</body>
It might be necessary to find another way to do it, if your driver cannot see this node.
Edit
In the source code you posted as page_source returned from selenium, there isn't the node you are trying to find.
A find_element_by_... call only returns the element. You also need to call .click() on that element.
Either:
elem = driver.find_element_by_xpath("""//*[@id="delete-browsing-history-checkbox"]""")
elem.click()
or:
driver.find_element_by_xpath("""//*[@id="delete-browsing-history-checkbox"]""").click()
By the way, you could just use find_element_by_id("delete-browsing-history-checkbox") in your case.
Also, I don't think selenium works on non-web pages. So chrome settings and Firefox's about:config pages (for example) don't work with selenium.
