Rails Mechanize how to select a form based on its name - python

I can't select a form by its name with Mechanize in Rails. The source code of the page I am trying to take data from looks like this:
var strAccesBamPoppin = "";
if(!emailing){
strAccesBamPoppin = '<form name="bamaccess_' + idTCM + '" id="bamaccess_' + idTCM + '" class="bamaccessDecloi" autocomplete="off" method="post" action="'+chemin+'"';
if (typeConnexion == "True") {strAccesBamPoppinPoppin = strAccesBamPoppin +'>';
With python, I would use something like
XX.select_form('bamaccess')
What would be the equivalent with Ruby? Thanks.

XX.forms.select { |f| f.name =~ /bamaccess/ }
The above keeps every form whose name attribute matches the pattern.
XX.forms_with(name: /bamaccess/)
This returns an array of all forms whose name contains bamaccess.
Note that /bamaccess/ is a regexp, which is needed because the actual name has the form bamaccess_*****

Related

How to write Xpath for dynamic value passing?

How to write Xpath for the following:
<input class="t2" style="background-color:#008000;" title="Jump to Detailed Analysis" type="button" value="Analyze" onclick="javascript:popAnalyze
("1622662"," SP0001622662","CS3_pro2_axeda6","5336293761");">
The highlighted values are stored in a variable (st_name). The highlighted and red-circled values change dynamically.
I can't work out how to write an XPath for this.
import xlrd
path = r'C:\Users\tmou\PycharmProjects\Python\WebScraping\Book2.xlsx'
workbook = xlrd.open_workbook(path)
sheet = workbook.sheet_by_index(0)
for c in range(sheet.ncols):
    for r in range(sheet.nrows):
        st = sheet.cell_value(r, c)
        try:
            if st == float(st):
                st_string = int(st)
                #variable = 1622662
                #new_string = "javascript:popAnalyze("" + str(st_string) + "","SP0001622662","CS3_pro2_axeda6","5336293761");"
                #driver.find_element_by_xpath("//input[@class='t2']/@onclick='" + st_string + "'").click()
                #driver.find_element_by_xpath("//input[@value='Analyze' and contains(@onclick='" + st_string + "']")
                #driver.find_element_by_xpath("//a[@title='" + st_string + "']")
        except ValueError:
            pass  # non-numeric cell, skip it
HTML:
<input class="t2" style="background-color:#008000;" title="Jump to Detailed Analysis" type="button" value="Analyze" onclick="javascript:popAnalyze("1622662","SP0001622662","CS3_pro2_axeda6","5336293761");">
If the value that you are looking for is the one under onclick attribute then the following Xpath expression should work:
string(//input[@class='t2']/@onclick)
Edit 1
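Can you try this, which needs XPath 2.0 or later because of matches():
//input[(@class='t2' and matches(@onclick,'1622662'))]
The same test written as two chained predicates:
//input[@class='t2'][matches(@onclick, '1622662')]
Browsers, and therefore Selenium, only implement XPath 1.0, where contains(@onclick, '1622662') is the equivalent test.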
There are many ways to locate the button without using the value of onclick, so it does not matter that it is dynamic. For example:
//input[@title='Jump to Detailed Analysis']
or
//input[@value='Analyze']
or
//input[@value='Analyze' and @title='Jump to Detailed Analysis']
EDIT 1:
You can use a variable as shown below:
variable = "Analyze"
xpath = "//input[@value='" + variable + "']"
EDIT 2:
variable = 1622662
new_string = 'javascript:popAnalyze("' + str(variable) + '","SP0001622662","CS3_pro2_axeda6","5336293761");'
EDIT 3:
variable = 1622662
xpath = "//input[@value='Analyze' and contains(@onclick,'" + str(variable) + "')]"
if driver.find_elements_by_xpath(xpath):
    driver.find_element_by_xpath(xpath).click()
In the above code, variable holds your dynamic value.
xpath holds a dynamic XPath built from the value of variable.
if driver.find_elements_by_xpath(xpath): checks whether at least one element matching the XPath exists.
If one exists, the code clicks it.
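As a rough sketch of the string-building step only, assuming the value comes out of an Excel cell as a float such as 1622662.0, the XPath can be assembled in a small helper. build_analyze_xpath is a hypothetical name, and no browser or driver is involved here.

```python
def build_analyze_xpath(record_id):
    """Return an XPath 1.0 expression for the Analyze button whose
    onclick attribute mentions record_id (hypothetical helper)."""
    # Excel cells arrive as floats (1622662.0), so normalise to "1622662".
    token = str(int(float(record_id)))
    return "//input[@value='Analyze' and contains(@onclick,'{}')]".format(token)


xpath = build_analyze_xpath(1622662.0)
print(xpath)
```

The resulting string can then be passed to the driver call used in the snippet above.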
Use one of the following XPath :
//input[@value='Analyze' and contains(@onclick,'"+st_name+"')]
OR
//input[@title='Jump to Detailed Analysis' and contains(@onclick,'"+st_name+"')]
Final Code :
driver.find_element_by_xpath("//input[@title='Jump to Detailed Analysis' and contains(@onclick,'"+st_name+"')]")

How to store the HTML within an opening and closing tag with Python

I am reading in an HTML document and want to store the HTML nested within a div tag of a certain name, while maintaining its structure (the spacing). The goal is to convert an HTML doc into components for React. I am struggling with how to store the structure of the nested HTML and how to locate the correct closing tag for the div that denotes that everything nested within it will become a React component (div class='rc-componentname' is the opening tag). Any help would be very appreciated. Thanks!
Edit: I assume regex is the best way to go about this. I haven't used regex before, so if that is correct and someone could point me in the right direction for the expression to use in this context, that would be great.
import os
components = []
class react_template():
    def __init__(self, component_name):  # add nested html as second element
        self.Import = "import React, { Component } from 'react';"
        self.Class = "class " + component_name + ' extends Component {'
        self.Render = "render() {"
        self.Return = "return "
        self.Export = "export default " + component_name + ";"
def react(component):
    r = react_template(component)
    if not os.path.exists('components'):  # create components folder
        os.mkdir('components')
    os.chdir('components')
    if not os.path.exists(component):  # create folder for component
        os.mkdir(component)
    os.chdir(component)
    with open(component + '.js', 'wb') as f:  # create js component file
        for j_key, j_code in r.__dict__.items():
            f.write(j_code.encode('utf-8') + '\n'.encode('utf-8'))
def process_html():
    with open('file.html', 'r') as f:
        for line in f:
            if 'rc-' in line:
                char_soup = list(line)
                for index, char in enumerate(char_soup):
                    if char == 'r' and char_soup[index+1] == 'c' and char_soup[index+2] == '-':
                        sliced_soup = char_soup[int(index+3):]
                        c_slice_index = sliced_soup.index("\'")
                        component = "".join(sliced_soup[:c_slice_index])
                        components.append(component)
                        innerHTML(sliced_soup)
                        # react(component)
def innerHTML(sliced_soup):  # work in progress
    first_closing = sliced_soup.index(">")
    sliced_soup = "".join(sliced_soup[first_closing:]).split(" ")
def generate_components(components):
    for c in components:
        react(c)
if __name__ == "__main__":
    process_html()
I see you've used the word soup in your code... maybe you've already tried and disliked BeautifulSoup? If you haven't tried it, I'd recommend you look at BeautifulSoup instead of attempting to parse HTML with regex. Although regex would be sufficient for a single tag or even a handful of tags, markup languages are deceptively simple. BeautifulSoup is a fine library and can make things easier for dealing with markup.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
This will allow you to treat the entirety of your html as a single object and enable you to:
# create a list of specific elements as objects
soup.find_all('div')
# find a specific element by id
soup.find(id="custom-header")
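As a rough sketch of the nesting problem itself (finding the closing tag that matches an rc- div), the following uses the standard library's html.parser instead of BeautifulSoup, purely so that it runs without third-party packages. ComponentExtractor and its attributes are hypothetical names. It assumes well-formed markup, tracks only div nesting, and drops comments and doctype declarations.

```python
from html.parser import HTMLParser


class ComponentExtractor(HTMLParser):
    """Collect the raw inner HTML of every <div class="rc-..."> element."""

    def __init__(self):
        # Keep entities such as &amp; untouched so the markup round-trips.
        super().__init__(convert_charrefs=False)
        self.components = {}  # component name -> inner HTML
        self._name = None     # component currently being captured
        self._depth = 0       # open <div> count inside that component
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self._name is None:
            classes = (dict(attrs).get("class") or "").split()
            names = [c[3:] for c in classes if c.startswith("rc-")]
            if tag == "div" and names:
                self._name, self._depth, self._parts = names[0], 1, []
            return
        if tag == "div":
            self._depth += 1
        self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        if self._name is not None:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._name is None:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:  # this is the matching closing tag
                self.components[self._name] = "".join(self._parts)
                self._name = None
                return
        self._parts.append("</{}>".format(tag))

    def handle_data(self, data):
        if self._name is not None:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._name is not None:
            self._parts.append("&{};".format(name))

    def handle_charref(self, name):
        if self._name is not None:
            self._parts.append("&#{};".format(name))


sample = """<body>
<div class='rc-header'>
  <div class="inner">
    <p>Hello</p>
  </div>
</div>
<div class='other'>skip</div>
</body>"""

parser = ComponentExtractor()
parser.feed(sample)
parser.close()
print(parser.components["header"])
```

The depth counter is what identifies the right closing tag: it only returns to zero at the div that closes the component, so nested divs and their whitespace are kept as part of the stored string.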

Parse Code from Text - Python

I am analyzing StackOverflow's dump file "Posts.Small.xml" using pySpark. I want to separate 'code block' from 'text' in a Row. A typical parsed row looks like:
['[u"<p>I want to use a track-bar to change a form\'s opacity.</p>
<p>This is my code:</p>
<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I try to build it, I get this error:</p>
<blockquote>
<p>Cannot implicitly convert type \'decimal\' to \'double\'.
</p>
</blockquote>
<p>I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.',
'", u\'This code has worked fine for me in VB.NET in the past.',
'\', u"</p>
When setting a form\'s opacity should I use a decimal or double?"]']
I've tried "itertools" and some python functions but couldn't get the result.
My initial code to extract the above row is:
postsXml = textFile.filter(lambda line: not line.startswith("<?xml version="))
postsRDD = postsXml.map(............)
tokensentRDD = postsRDD.map(lambda x:(x[0], nltk.sent_tokenize(x[3])))
new = tokensentRDD.map(lambda x: x[1]).take(1)
a = ''.join(map(str,new))
b = a.replace("&lt;", "<")
final = b.replace("&gt;", ">")
nltk.sent_tokenize(final)
Any ideas are appreciated!
You can extract the code contents by using XPath (the lxml library will help) and then extract the text content selecting everything else, for example:
import lxml.etree
data = '''<p>I want to use a track-bar to change a form's opacity.</p>
<p>This is my code:</p> <pre><code>decimal trans = trackBar1.Value / 5000; this.Opacity = trans;</code></pre>
<p>When I try to build it, I get this error:</p>
<p>Cannot implicitly convert type 'decimal' to 'double'.</p>
<p>I tried making <code>trans</code> a <code>double</code>.</p>'''
html = lxml.etree.HTML(data)
code_blocks = html.xpath('//code/text()')
text_blocks = html.xpath('//*[not(descendant-or-self::code)]/text()')
The easiest way will probably be to apply a regex to the text, matching the tags <code> and </code>. That would enable you to find the code blocks. You don't say what you would do with them afterwards, though. So ...
from itertools import zip_longest
sample_paras = [
"""<p>I want to use a track-bar to change a form\'s opacity.</p>
<p>This is my code:</p>
<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>
<p>When I try to build it, I get this error:</p>
<blockquote>
<p>Cannot implicitly convert type \'decimal\' to \'double\'. </p>
</blockquote>
<p>I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.""",
"""This code has worked fine for me in VB.NET in the past.""",
"""</p>
When setting a form\'s opacity should I use a decimal or double?""",
]
single_block = " ".join(sample_paras)
import re
separate_code = re.split(r"</?code>", single_block)
text_blocks, code_blocks = zip(*zip_longest(*[iter(separate_code)] * 2))
print("Text:\n")
for t in text_blocks:
    print("--")
    print(t)
print("\n\nCode:\n")
for t in code_blocks:
    print("--")
    print(t)
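The zip_longest line above de-interleaves the list returned by re.split: pieces at even positions are text and pieces at odd positions are code. A minimal sketch of the same idea with slices, assuming the code tags are balanced and never nested (split_code_and_text is a hypothetical helper name):

```python
import re


def split_code_and_text(html):
    """Split html on <code>/</code> tags and return (text_parts, code_parts)."""
    parts = re.split(r"</?code>", html)
    # The split alternates text, code, text, code, ... starting with text.
    return parts[0::2], parts[1::2]


sample = "<p>a</p><pre><code>x = 1</code></pre><p>use <code>x</code> now</p>"
text_parts, code_parts = split_code_and_text(sample)
print(text_parts)
print(code_parts)
```

Unlike the zip_longest version, slicing does not pad the shorter list with None when the number of pieces is odd.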

Getting this output, with generators/list comprehensions?

I'm having a bit of trouble with something, and I don't know how I could do it.
Well, I'm creating a dynamic form with buttons that adapts to how many files (in this case, movies) there are in a directory.
I have got this so far:
path="C:\\Users\\User\\Desktop\\test\\" # insert the path to the directory of interest
movies = []
dirList=os.listdir(path)
for fname in dirList: # loops through directory specified
    print fname # prints file name
    movies.append(fname) # adds the file name to list
my_form = form.Form([form.Button("btn", id="btn" + movies[i], value = i, html=movies[i], class_="btn" +movies[i]) for i in range(len(movies))])
However, I want the list comprehension/generator to make my_form look something like this:
my_form = form.Form(
form.Button("btn", id="btnA", value="A", html="Movie1", class_="btnA")
form.Button("btn", id="btnB", value="B", html="Movie2", class_="btnB")
)
As you can see instead of the movie name being the id, it is btnA or btnB.
So how could I generate that output?
I think you want to do something like:
from string import ascii_uppercase
buttons = [form.Button("btn", id="btn{0}".format(char), value=char,
                       html=movie, class_="btn{0}".format(char))
           for char, movie in zip(ascii_uppercase, movies)]
my_form = form.Form(buttons)
This uses the letters in string.ascii_uppercase to label each item in movies.
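To see the pairing on its own, here is a small sketch that leaves out the form library and just produces (id, movie) tuples. label_movies is a hypothetical helper, and the movie names are placeholders.

```python
from string import ascii_uppercase


def label_movies(movies):
    """Pair each movie with a button id such as 'btnA', 'btnB', ..."""
    # zip stops at the shorter input, so only the first 26 movies get a label.
    return [("btn" + letter, movie) for letter, movie in zip(ascii_uppercase, movies)]


labelled = label_movies(["Movie1", "Movie2"])
print(labelled)
```

One consequence of using zip here is that a directory with more than 26 files silently loses the extra entries, so a different labelling scheme would be needed in that case.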
If I understand correctly, you want the id to be btn + a letter according to the index of the movie?
You can use this code:
def letterForIndex(idx):
    return chr(ord('A')+idx)
So you would do:
my_form = form.Form([form.Button("btn", id="btn" + letterForIndex(i),
    value=letterForIndex(i), html=movies[i], class_="btn" + letterForIndex(i)) for i in range(len(movies))])

Python While Not recordset.EOF that wont .MoveNext

I don't really understand why this is happening. Maybe a fresh set of eyes could help.
In an Access database, say C:\dbase.mdb, I have a table called tProcedureGroups with two fields, ID and Description.
ID Description
1 DIAGNOSTIC
2 PREVENTATIVE
3 RESTORATIVE
So my output should be three lines of ID + "\t" + Description + "\n", not an infinite loop.
Here's my code... this must have happened to a few of you Python gurus out there!
Thanks so much for your help, everyone on this site seems super helpful.
import win32com.client
def Procedures(listed):
    DB = r"C:\dbase.mdb"
    engine = win32com.client.Dispatch("DAO.DBEngine.36")
    db = engine.OpenDatabase(DB)
    sql = "select * from [tProcedureGroups]"
    access = db.OpenRecordset(sql)
    while not access.EOF:
        for i in listed:
            print i + '\t' + str(access.Fields(i).value) + '\n'
        access.MoveNext
fields = ["ID", "Description"]
get_procs = Procedures(fields)
In Python you need to call methods explicitly with ().
So change:
access.MoveNext
to
access.MoveNext()
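The difference can be reproduced without Access. The sketch below uses FakeRecordset, a hypothetical stand-in written only for illustration (it is not part of win32com or DAO), to show that naming a method without parentheses leaves the cursor where it was.

```python
class FakeRecordset:
    """Hypothetical stand-in for a DAO recordset, for illustration only."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    @property
    def EOF(self):
        return self._pos >= len(self._rows)

    def MoveNext(self):
        self._pos += 1

    def current(self):
        return self._rows[self._pos]


rs = FakeRecordset(["DIAGNOSTIC", "PREVENTATIVE", "RESTORATIVE"])

rs.MoveNext  # only looks the method up and discards it; nothing is called
assert rs.current() == "DIAGNOSTIC"

seen = []
while not rs.EOF:
    seen.append(rs.current())
    rs.MoveNext()  # the parentheses perform the call, so the loop terminates
print(seen)
```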
