Grabbing the value of a HTML form input field? - python

There's a web page similar to: www.example.com/form.php
I want to use Python to grab one of the values from the HTML form on the page. For example, if the form had I could get the value "test" returned
I have googled this extensively but most relate to posting form data, or have advice to use Django or cgi-bin. I don't have access to the server directly so I can't do that.
I thought the library REQUESTS could do it but I can't see it in the documentation.
HTML:
<html>
<body>
<form method="" action="formpost.php" name="form1" id="form1">
<input type="text" name="field1" value="this is field1">
<input type="hidden" name="key" value="secret key field">
</form>
</body>
</html>
As an example, I'd like something like this in Python:
import special_library
html = special_library.get("http://www.example.com/form.php")
print html.get_field("wanted")
Has anyone got any suggestions to achieve this? Or any libraries I may not have thought of or been aware of?

You can use the requests library together with lxml.
Try this:
import requests
from lxml import html
s = requests.Session()
resp = s.get("http://www.example.com/form.php")
doc = html.fromstring(resp.text)
wanted_value = doc.xpath("//input[@class='wanted_class_name']/@value")
print(wanted_value)
You can check the following resources:
requests
xpath
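If installing lxml is not an option, the same idea can be sketched with only the standard library. This is a minimal sketch rather than a definitive implementation: `InputCollector` and `get_form_fields` are names made up for illustration, and the markup is the form from the question parsed from a string instead of fetched over the network (in practice you would pass in `resp.text` from requests).

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            self.fields[name] = attr_map.get("value", "")


def get_form_fields(page_source):
    """Return a dict of input names to values found in the HTML text."""
    parser = InputCollector()
    parser.feed(page_source)
    parser.close()
    return parser.fields


# Sample markup taken from the question
SAMPLE = """
<form method="" action="formpost.php" name="form1" id="form1">
<input type="text" name="field1" value="this is field1">
<input type="hidden" name="key" value="secret key field">
</form>
"""

fields = get_form_fields(SAMPLE)
print(fields["key"])
```

The result is a plain dict, so `fields["field1"]` or `fields.get("missing")` work as usual.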

Related

Redirect user in Telegram Bot to an external link with POST request

Since I'm new to this POST/GET HTTP stuff, I might be getting things wrong, that's why I'll put my question in 2 ways. Maybe one way will be better than the other :)
I'm developing a Telegram Bot using PyTelegramBotAPI, and it needs to include an online payment.
For the online payment I need the user to follow a link with POST method (it's an external link + I need to pass form data), but that's what causes difficulties for me.
I.
In my code I perform the following:
req = requests.post(url=url, data=data)
Where url is the URL of the website to which the client must be redirected, and data is the data that it needs to pass with the POST request when redirecting.
It works fine as a request in Python, but obviously it can't redirect the client to the website needed.
I tried to generate a URL and pass it to the client using
url = url + "?" + urlencode(data)
Where url is again the URL of the website. But in this case the website tells me that the method used is incorrect. I guess the link becomes a GET request, instead of a POST request.
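To illustrate why this attempt ends up as a GET request: `urlencode` only builds a query string, and a link that the user follows is fetched with GET. The snippet below is a small illustration with made-up sample data; the `?` separator is an assumption about how the URL and the data were joined.

```python
from urllib.parse import urlencode

url = "https://securesandbox.webpay.by/"
data = {"wsb_storeid": "11111111", "wsb_total": "10"}

# urlencode turns the dict into a query string; it does not create a POST body
link = url + "?" + urlencode(data)
print(link)
```

The form data ends up in the address itself, which is what a server that expects POST rejects.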
How can I redirect the client to that link with POST method?
II.
Another way of putting this question is this:
The company which processes the online payments requires them to be performed using the following HTML form:
<form action="https://securesandbox.webpay.by/" method="post">
<input type="hidden" name="*scart" >
<input type="hidden" name="wsb_storeid" value="11111111">
<input type="hidden" name="wsb_order_num" value="ORDER-12345678">
<input type="hidden" name="wsb_currency_id" value="BYN">
<input type="hidden" name="wsb_version" value="2">
<input type="hidden" name="wsb_seed" value="1242649174">
<input type="hidden" name="wsb_signature" value="124264917411111111ORDER-123456781BYN10123456">
<input type="hidden" name="wsb_test" value="1">
<input type="hidden" name="wsb_invoice_item_name[0]" value="Товар 1">
<input type="hidden" name="wsb_invoice_item_quantity[0]" value="2">
<input type="hidden" name="wsb_invoice_item_price[0]" value="10">
<input type="hidden" name="wsb_total" value="10">
<input type="submit" value="Купить">
</form>
This would work well if I used HTML pages, but since my app is a Telegram bot, it won't work. Therefore I need to generate this HTML form automatically in Python (namely, I need to change the "value" fields for every payment).
How can I imitate this HTML form in my Telegram Bot and redirect the client after some trigger?
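One possible starting point for generating the form in Python is sketched below. It only covers building the markup with per-payment values; `render_payment_form` is a hypothetical helper, and the field values are the sample ones from the form above. How the page reaches the user (for example, hosting it at a URL that the bot sends as a link) is an assumption left outside this sketch.

```python
from html import escape


def render_payment_form(action_url, fields, submit_label="Купить"):
    """Return an HTML form whose hidden inputs carry the given fields."""
    lines = ['<form action="{}" method="post">'.format(escape(action_url, quote=True))]
    for name, value in fields.items():
        # Escape names and values so quotes or angle brackets cannot break the markup
        lines.append(
            '<input type="hidden" name="{}" value="{}">'.format(
                escape(name, quote=True), escape(str(value), quote=True)
            )
        )
    lines.append('<input type="submit" value="{}">'.format(escape(submit_label, quote=True)))
    lines.append("</form>")
    return "\n".join(lines)


form_html = render_payment_form(
    "https://securesandbox.webpay.by/",
    {"wsb_storeid": "11111111", "wsb_order_num": "ORDER-12345678", "wsb_total": 10},
)
print(form_html)
```

Because the values come from a dict, each payment can pass its own order number, seed, and signature without touching the template.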

Pass string from html search box to python function

I am trying to build a front end for a simple TF-IDF based document retrieval model (all written in Python). The front end will be a simple search bar where the user can enter a query. Using that query, I want to return the documents ranked by relevance. I have the backend ready: a small function, let's call it query_scorer, takes in the query, does the requisite pre-processing (tokenization, spellcheck, lowercasing, etc.) and selects and ranks documents by relevance. What I don't know is how to pass the query from my HTML page to query_scorer and pass the results back to the HTML page (or maybe a different HTML page). Let's say I have the following page.
<section >
<form action="" method="">
<input type="search" placeholder="What are you looking for?">
<button>Search</button>
</form>
</section>
How do I transfer the text from the search box to my python script?
Try this:
In the form tag's action attribute, provide the location of your CGI script; the value of the text box will then be passed to that script.
For example:
<form name="search" action="~/query_scorer.py" method="get">
<input type="text" name="searchbox">
<input type="submit" value="Submit">
</form>
query_scorer.py
import cgi
form = cgi.FieldStorage()
searchterm = form.getvalue('searchbox')
Hope this helps you get your result.
You will need to host the Python script and expose it as either a web service or a web page. I would suggest a web page as the easiest way to get started.
You will then need to post to this web page from your form above by setting the action and method attributes of the form.
Your web page will need to call your function and return HTML.
See a basic overview here
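The steps described in the answers above can be sketched as a small WSGI application using only the standard library. This is a sketch under assumptions, not the only way to do it: `query_scorer` here is a placeholder standing in for the asker's real function, the field name `searchbox` is the one used in the form from the first answer, and `application` is simply the conventional WSGI callable name.

```python
from html import escape
from urllib.parse import parse_qs


def query_scorer(query):
    # Placeholder for the real ranking function described in the question
    return ["Results for: " + query]


def application(environ, start_response):
    """Read the 'searchbox' field from a GET request and render the results."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    searchterm = params.get("searchbox", [""])[0]
    results = query_scorer(searchterm) if searchterm else []
    # Escape each result before putting it into the page
    items = "".join("<li>{}</li>".format(escape(r)) for r in results)
    body = "<html><body><ul>{}</ul></body></html>".format(items).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, `application` can be passed to `wsgiref.simple_server.make_server`, with the form's action pointing at that server's address and method set to get.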

Scrape a form on incorrect web page

I'm trying to scrape an HTML form using robobrowser with Python 3.4. I use the default HTML parser:
self._browser = RoboBrowser(history=True, parser="html.parser")
It works fine for well-formed web pages, but now I have to parse an incorrectly written page. Here is the HTML fragment:
<form method="post" action="decide.php?act=submit_advance">
<table class="td_advanced">
<tr class="td_advance">
<td colspan="4" class="td_advance"></strong><br></td>
<td colspan="3" class="td_left">Case sensitive:<br><br></td>
<td><input type="checkbox" name="case_sensitive" /><br><br></td>
[...]
</form>
The closing strong tag is incorrect. This error prevents the parser from reading all the inputs that follow the incorrect tag:
form = self._browser.get_form()
print(form)
>>> <RoboForm>
Any suggestions?
I have found the solution myself. The comment about BeautifulSoup was helpful and pointed my search in the right direction.
The solution is to use another HTML parser. I tried lxml and it works for me.
self._browser = RoboBrowser(history=True, parser="lxml")
As PyPI doesn't currently have an lxml installer that works with my Python version, I downloaded it from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Parse HTML string from a file and remove element using xpath and write it to same file in python

For my project, I have to remove selected content from an HTML file using XPath in Python.
The selected elements can be removed with the .remove() method, but the content of the file stays the same.
How do I write the modified content back to that file?
If I write the same tree object to the file using open().write(etree.tostring(tree_obj)), will the content differ for Unicode pages? Is there any other way to save the modified file?
Why does the head tag in the output below have a different value after printing the tree object?
Please suggest.
Below is my code sample.
Example: I need to remove all div tags inside the html page.
HTML file:
<html>
<head>test</head>
<body>
<p>welcome to the world</p>
<div id="one">
<p>one</p>
<div id="one1">one1</div>
<ul>
<li>ones</li>
<li>twos</li>
<li>threes</li>
</ul>
</div>
<div id="hell">
<p>heaven</p>
<div id="one1">one1</div>
<ul>
<li>ones</li>
<li>twos</li>
<li>threes</li>
</ul>
</div>
<input type="text" placeholder="enter something.." />
<input type="button" value="click" />
</body>
</html>
Python file:
# _*_ coding:utf-8 _*_
import os
import sys
import traceback
import datetime
from lxml import etree, html
import shutil
def process():
    fd=open("D:\\hello.html")
    tree = html.fromstring(fd.read())
    remove_tag = '//div'
    for element in tree.xpath(remove_tag):
        element.getparent().remove(element)
    print etree.tostring(tree)
process()
OUTPUT:
<html>
<head/><body><p>test
</p>
<p>welcome to the world</p>
<input type="text" placeholder="enter something.."/>
<input type="button" value="click"/>
</body></html>
I haven't worked with Python, but I have parsed HTML-based websites in Java with the help of the jsoup library.
Python has a similar library, Beautiful Soup. You can use it to get the desired output.
Hope it helps.
Have you tried using python's standard library re?
import re
re.sub('<.*?>','', '<nb>foobar<aon><mn>')
re.sub('</.*?>','', '</nb>foobar<aon><mn>')
The above two operations could be used in combination to remove all the html tags. It can be easily modified to remove the div tags too.
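As for writing the modified tree back to the same file, a minimal sketch with lxml (assuming it is installed) might look like the following. `remove_elements` is a hypothetical helper, not part of lxml. It swaps `.remove()` for `drop_tree()`, which lxml.html elements provide and which keeps any text that follows the removed element, and it reads and writes bytes with an explicit encoding, which is one way to approach the Unicode concern raised in the question.

```python
from lxml import html


def remove_elements(path, xpath_expr, encoding="utf-8"):
    """Remove every element matching xpath_expr and save the file in place."""
    # Read bytes so that lxml handles decoding of the document itself
    with open(path, "rb") as fh:
        tree = html.fromstring(fh.read())
    for element in tree.xpath(xpath_expr):
        # Skip the root element, which has no parent to detach from
        if element.getparent() is not None:
            element.drop_tree()
    # Serialize to bytes in the chosen encoding and overwrite the original file
    with open(path, "wb") as fh:
        fh.write(html.tostring(tree, encoding=encoding))


# Example call (the path is the one from the question):
# remove_elements("D:\\hello.html", "//div")
```

Since the file is overwritten, it may be worth keeping a copy of the original while testing.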

Displaying results from search API

I'm trying to get to grips with web2py/Python. I want the user to fill in a search form; the term they search for is sent to my Python script, which should send the query to the blekko API and output the results in a new HTML page. I've implemented the following code, but instead of my normal index page appearing, I get the HTML response directly from blekko with '%(query)' /html appearing in its search bar. I really need some help with this!
HTML form on the default/index.html page
<body>
<div id="MainArea">
<p align="center">MY SEARCH ENGINE</p>
<form name="form1" method="get" action="">
<label for="SearchBar"></label>
<div align="center">
<input name="SearchBar" type="text" id="SearchBar" value="" size = "100px"><br />
<input name="submit" type="submit" value="Search">
</div>
</form>
<p align="center"> </p>
Python code on the default.py controller
import urllib2
def index():
    import urllib2
    address = "http://www.blekko.com/?q='%(query)'+/html&auth=<mykey>"
    query = request.vars.query
    response = urllib2.urlopen(address)
    html=response.read()
    return html
I think you are misunderstanding how string formatting works. You need to put the address and query together still:
address = "http://www.blekko.com/?q='%(query)s'+/html&auth=<mykey>" % dict(query=request.vars.query)
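To make the difference concrete, here is a small standalone illustration of the `%(name)s` placeholder. It is written for Python 3 (the question's code uses Python 2's urllib2), `build_address` is a name made up for this sketch, and URL-escaping the term with `quote_plus` is an extra step that neither the question nor the answer includes.

```python
from urllib.parse import quote_plus

ADDRESS = "http://www.blekko.com/?q='%(query)s'+/html&auth=<mykey>"


def build_address(query):
    # The % operator fills %(query)s from the dict; without it the text stays literal
    return ADDRESS % {"query": quote_plus(query)}


print(build_address("python forms"))
```

Note the trailing `s` in `%(query)s`: it is the conversion type, and leaving it out (as in the question's `'%(query)'`) does not produce a valid placeholder.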
Add a hidden field to your form, call it "submitted". Then reformat your controller function as such:
import urllib2
def index():
    if request.vars.submitted:
        # Substitute the search term into the URL before requesting it
        query = request.vars.query
        address = "http://www.blekko.com/?q='%(query)s'+/html&auth=<mykey>" % dict(query=query)
        response = urllib2.urlopen(address)
        html=response.read()
        return html
    else:
        return dict()
This will show your index page unless the form was submitted and the page received the "submitted" form variable.
The /html doesn't do anything. Glad your question got answered. There is python client code for the blekko search api here: https://github.com/sampsyo/python-blekko
