How can I extract an HTML form value? - python

Hi, I have the following in a page:
<input id="cu_first_name" class="input_text" type="text" value="test_name" name="cu_name">
I am trying to extract the value and print it in Python.
I use:
username = driver.find_element_by_id("cu_first_name")
print username.text
But this does not work because the element has no text content. I need "test_name" to be printed. Please help!

You've gotten the element by id. From there, you need to get the element's attribute. Give the following a try:
username = driver.find_element_by_id("cu_first_name")
value = username.get_attribute('value')
print value
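To see why .text comes back empty while the value attribute holds the data, here is a minimal browser-free sketch. It uses Python 3's standard html.parser instead of Selenium, so it only illustrates the attribute-versus-text distinction; ValueFinder is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

HTML = ('<input id="cu_first_name" class="input_text" type="text" '
        'value="test_name" name="cu_name">')


class ValueFinder(HTMLParser):
    """Hypothetical helper: records the value attribute of one element id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.value = None
        self.text = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") == self.element_id:
            self.value = attributes.get("value")

    def handle_data(self, data):
        # An <input> has no text node, so nothing is collected here for it.
        self.text += data


finder = ValueFinder("cu_first_name")
finder.feed(HTML)
finder.close()

print(repr(finder.text))  # the element's text content is empty
print(finder.value)       # the data lives in the value attribute
```

In Selenium the same distinction is what separates username.text from username.get_attribute('value').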

Related

Finding if some tag exists in HTML response and printing if/else accordingly

I am trying to collect data from a website (using Python). A webpage contains multiple software listings, and in each listing my data is within a tag (h5) with a certain class ('price_software_details').
However, in some cases the tag is missing along with the data. I want to print 'NA' if the data and tag are missing; otherwise it should print the data.
I tried the code below, but it's not working.
Help please!
interest = soup.find(id = 'allsoftware')
for link in interest.findAll('h5'):
    if link.find(class_ = 'price_software_details') == True:
        print(link.getText())
    else:
        print('NA')
Have you tried error handling (try, except)?
interest = soup.find(id='allsoftware')
for link in interest.findAll('h5'):
    try:
        # find() returns None when nothing matches, so get_text() raises AttributeError
        item = link.find(class_='price_software_details')
        print(item.get_text())
    except AttributeError:
        print('NA')
Note that soup.find() never returns True. It returns either the matching element or None, so test its truthiness instead of comparing it with True.
interest = soup.find(id = 'allsoftware')
for link in interest.findAll('h5'):
    if link.find(class_ = 'price_software_details'):
        print(link.getText())
    else:
        print('NA')
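The "result or None" point can be checked without BeautifulSoup. The sketch below (Python 3) swaps in re.search from the standard library, which follows the same contract of returning an object on success and None otherwise; lookup is a hypothetical stand-in for soup.find(), not the real call.

```python
import re


def lookup(text):
    # Stand-in for soup.find(): returns a match object, or None if nothing matches.
    return re.search(r"price", text)


hit = lookup("price: 10")
miss = lookup("no data here")

print(hit == True)   # False: a result object is never equal to True
print(bool(hit))     # True: but it is truthy
print(miss is None)  # True: a failed lookup gives None, which is falsy

# Truthiness is therefore the right test for "found or not".
labels = ["found" if lookup(t) else "NA" for t in ("price: 10", "no data here")]
print(labels)
```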

modify url address in python

I'm trying to add strings to a URL in order to get data from a server.
The strings depend on user input, which I saved in a variable called id.
id = str(raw_input("Enter a valid ID: "))
My URL looks like this:
url = "http://www.test.com/?%s&%s" % (id, api_key)
When I print the URL to check that everything is in order, I get this result:
http://www.test.com/?<built-in function id>&ef50250
I followed some other questions and tutorials, but none seemed to clear it up for me.
It is my first project, so excuse me if I ask any obvious questions.
id is a built-in function, so wherever your variable is not in scope, the name id refers to that function. Give your variable a different name. By the way, raw_input already returns a str, so the str(...) wrapper is unnecessary.
>>> my_id = raw_input("Enter a valid ID: ")
Enter a valid ID: 12
>>> api_key='abc'
>>> url = "http://www.test.com/?%s&%s" % (my_id, api_key)
>>> print url
http://www.test.com/?12&abc
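For reference, here is a rough Python 3 sketch of the same idea that also escapes the user input. The parameter names "id" and "key" and the build_url helper are assumptions for illustration only; the real server may expect different names or the bare values used above.

```python
from urllib.parse import urlencode


def build_url(user_id, api_key, base="http://www.test.com/"):
    # "id" and "key" are assumed parameter names, not taken from the real API.
    query = urlencode({"id": user_id, "key": api_key})
    return f"{base}?{query}"


url = build_url("12", "abc")
print(url)

# Characters such as spaces and "&" in the input are escaped automatically.
escaped = build_url("a b&c", "abc")
print(escaped)

# What the question ran into: formatting the built-in function itself.
shadow_demo = "%s" % id
print(shadow_demo)
```

urlencode keeps user input from breaking the query string, which plain % formatting does not.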

Extracting CSV from Export Button

I apologize for not being able to give out the specific URL I'm dealing with. I'm trying to extract some data from a certain site, but it's not organized well enough. However, the site does have an "Export To CSV file" button, and the code for that block is ...
<input type="submit" name="ctl00$ContentPlaceHolder1$ExportValueCSVButton" value="Export to Value CSV" id="ContentPlaceHolder1_ExportValueCSVButton" class="smallbutton">
In this type of situation, what's the best way to grab that data when there is no specific URL for the CSV? I'm using Mechanize and BS4.
If you're able to click a button that downloads the data as a CSV, it sounds like you might be able to wget that data, save it on your machine, and work with it there. I'm not sure if that's what you're getting at, though. Can you offer any more details?
You should try Selenium, a suite of tools for automating web browsers across many platforms. It can do a lot of things, including clicking buttons.
Well, you need SOME starting URL to feed br.open() to even start the process.
It appears that you have an aspnetForm type control there and the below code MAY serve as a bit of a starting point, even though it does not work as-is (it's a work in progress...:-).
You'll need to look at the headers and parameters via the network tab of your browser dev tools to see them.
# Fragment: this runs inside a loop over `letter` (not shown), hence the bare `continue`.
br.open("http://media.ethics.ga.gov/search/Lobbyist/Lobbyist_results.aspx?&Year=2016&LastName="+letter+"&FirstName=&City=&FilerID=")
soup = BS(br.response().read())
table = soup.find("table", { "id" : "ctl00_ContentPlaceHolder1_Results" }) # Need to add error check here...
if table is None: # No lobbyist with last name starting with 'X' :-)
    continue
records = table.find_all('tr') # List of all results for this letter
for form in br.forms():
    print "Form name:", form.name
    print form
for row in records:
    rec_print = ""
    span = row.find_all('span', 'lblentry', 'value')
    for sname in span:
        if ',' in sname.get_text(): # They actually have a field named 'comma'!!
            continue
        rec_print = rec_print + sname.get_text() + "," # Create comma-delimited output
    print(rec_print[:-1]) # Strip final comma
    lnk = row.find('a', 'lblentrylink')
    if lnk is None: # For some reason, first record is blank.
        continue
    print("Lnk: ", lnk)
    newlnk = lnk['id']
    print("NEWLNK: ", newlnk)
    newstr = lnk['href']
    newctl = newstr[+25:-5] # Matching placeholder (strip javascript....)
    br.select_form('aspnetForm') # Tried (nr=0) also...
    print("NEWCTL: ", newctl)
    br['__EVENTTARGET'] = newctl # The control name must be a quoted string
    response = br.submit(name=newlnk).read()
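On ASP.NET WebForms pages, clicking a submit button normally posts the form's hidden fields plus the button's own name=value pair back to the page. Assuming this site behaves that way (which can't be verified here), the sketch below shows in Python 3 how that POST body could be assembled with only the standard library. The form markup, the action, and the hidden field values are placeholders; only the button line comes from the question. FormFields is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical page: the hidden field values and action are placeholders.
PAGE = """
<form id="aspnetForm" method="post" action="results.aspx">
<input type="hidden" name="__VIEWSTATE" value="abc123">
<input type="hidden" name="__EVENTVALIDATION" value="xyz789">
<input type="submit" name="ctl00$ContentPlaceHolder1$ExportValueCSVButton" value="Export to Value CSV" id="ContentPlaceHolder1_ExportValueCSVButton" class="smallbutton">
</form>
"""

BUTTON = "ctl00$ContentPlaceHolder1$ExportValueCSVButton"


class FormFields(HTMLParser):
    """Hypothetical helper: collects hidden inputs plus one named button."""

    def __init__(self, button_name):
        super().__init__()
        self.button_name = button_name
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is None:
            return
        if attributes.get("type") == "hidden" or name == self.button_name:
            self.fields[name] = attributes.get("value") or ""


parser = FormFields(BUTTON)
parser.feed(PAGE)
parser.close()

# URL-encoded body that a POST to the form's action would carry.
body = urlencode(parser.fields)
print(body)
```

The network tab of the browser dev tools shows what the real button click sends, so the collected fields can be compared against that before anything is posted.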

python check if list items is in string

link = 'http://dedegood.com'
wrongdomain = ['google','facebook','twitter']
if any(link.find(i) for i in wrongdomain):
    print 'pass this url'
else:
    print 'good'
I want to check whether link contains any of the words in wrongdomain.
Why does this always print 'pass this url'?
link has no google, facebook, or twitter in it.
I tried them separately, e.g. link.find('google'),
and it returns -1, so what's the problem?
Please help me check my logic. Thank you.
bool(-1) is True in Python. Instead of find, you can just do:
if any(domain in link for domain in wrongdomain):
Just remember that will also match the rest of the url, not just the domain.
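A quick Python 3 check of both points, using the values from the question: str.find returns -1 when the substring is missing, and -1 is truthy, so any() over those results is always True here.

```python
link = "http://dedegood.com"
wrongdomain = ["google", "facebook", "twitter"]

# find() gives -1 for "not found", and -1 counts as True.
positions = [link.find(word) for word in wrongdomain]
print(positions)
print(any(positions))

# The `in` operator gives a real boolean instead.
contains_bad = any(word in link for word in wrongdomain)
print(contains_bad)

# Caveat from the answer: a substring test also matches the path.
path_hit = any(word in "http://dedegood.com/google" for word in wrongdomain)
print(path_hit)
```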
Your method will not work correctly for a URL like http://dedegood.com/google. So you can use something like this:
link = 'http://dedegood.com'
wrongdomain = ['google','facebook','twitter']
a=link.split("//")
b=a[1].split(".")
if any(domain in b[0] for domain in wrongdomain):
    print ('pass this url')
else:
    print ('good')
Since you only want to check the site name, you can use this. Instead of checking the whole link, it checks only the name of the website, so a URL like http://dedegood.com/google will not be a problem.
Do you want to know whether the URL's domain is in wrongdomain or not? I would suggest doing it this way for better performance:
import tldextract
link = 'http://dedegood.com'
wrongdomain = ['google','facebook','twitter']
parsed = tldextract.extract(link)
if parsed.domain in wrongdomain:
    print 'pass this url'
else:
    print 'good'
You could check out tldextract, a library designed to extract the domain from a URL.
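If installing tldextract is not an option, a rougher version of the same check can be sketched with Python 3's standard urllib.parse. This is not equivalent: it knows nothing about public suffixes and simply compares each dot-separated label of the hostname against the list. is_wrong_domain is a hypothetical helper name.

```python
from urllib.parse import urlsplit

WRONG = {"google", "facebook", "twitter"}


def is_wrong_domain(link, wrong=WRONG):
    # Only the hostname is inspected, so the path and query are ignored.
    host = urlsplit(link).hostname or ""
    labels = host.lower().split(".")
    return any(label in wrong for label in labels)


print(is_wrong_domain("http://dedegood.com"))
print(is_wrong_domain("http://dedegood.com/google"))
print(is_wrong_domain("https://www.google.com/search"))
```

Because every label is checked, a host such as google.example.com is also flagged; tldextract's parsed.domain would return only the registered name.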

How to select an item for dropdown menu with mechanize in python?

I am REALLY confused. I'm basically trying to fill out a form on a website with mechanize for python. I got everything to work except the dropdown menu. What do I use to select it and what do I put for the value? I don't know if I'm supposed to put the name of the selection or the numerical value of it. Help would be greatly appreciated, thanks.
Code snippet:
try:
    br.open("http://www.website.com/")
    try:
        br.select_form(nr=0)
        br['number'] = "mynumber"
        br['from'] = "herpderp#gmail.com"
        br['subject'] = "Yellow"
        br['carrier'] = "203"
        br['message'] = "Hello, World!"
        response = br.submit()
    except:
        pass
except:
    print "Couldn't connect!"
    quit
I'm having trouble with the carrier, which is a dropdown menu.
According to the mechanize documentation examples, you need to access attributes of the form object, not the browser object. Also, for the select control, you need to set the value to a list:
br.open("http://www.website.com/")
br.select_form(nr=0)
form = br.form
form['number'] = "mynumber"
form['from'] = "herpderp#gmail.com"
form['subject'] = "Yellow"
form['carrier'] = ["203"]
form['message'] = "Hello, World!"
response = br.submit()
Sorry for reviving a long-dead post, but this was still the best answer I could find on Google, and it doesn't work. After more time than I care to admit, I figured it out. infrared is right about the form object, but not about the rest, and his code doesn't work. Here's some code that works for me (though I'm sure a more elegant solution exists):
# Select the form
br.open("http://www.website.com/")
br.select_form(nr=0) # you might need to change the 0 depending on the website
# find the carrier drop down menu
control = br.form.find_control("carrier")
# loop through items to find the match
for item in control.items:
    if item.name == "203":
        # it matches, so select it
        item.selected = True
        # now fill out the rest of the form and submit
        br.form['number'] = "mynumber"
        br.form['from'] = "herpderp#gmail.com"
        br.form['subject'] = "Yellow"
        br.form['message'] = "Hello, World!"
        response = br.submit()
        # exit the loop
        break
