I've made and deleted a couple of threads on here in reference to a single issue I'm having, only because I haven't been able to accurately describe what I'm trying to do.
Essentially: I want to automate a search process. This would involve automatically selecting a radio button on this webpage, then feeding each cell of column B:B of this Excel spreadsheet into one of the forms enabled by the radio button. Ideally, the process would also populate the spreadsheet (or a new .csv or .txt file) with the output of each search.
I am utterly at a loss as to how to integrate excel with Python. I have the following script so far:
import mechanize
URL = 'http://www.adviserinfo.sec.gov/IAPD/Content/Search/iapd_Search.aspx'
br = mechanize.Browser()
br.open(URL)
for form in br.forms():
    print "Form name:", form.name
    print form
    form.set_all_readonly(False)
But from there I don't know what to do. I'm not a programmer; teaching myself Python for this project. Any and all help will keep me from manually copying and pasting 1,175 names into a search bar, which is neither a good use of my time nor an opportunity to learn anything new. Thank you!
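To give a sense of how the pieces fit together, here is a minimal Python 3 sketch of the overall flow. It assumes you first export the spreadsheet's column B to a CSV file; the form index and the txtSearch control name are placeholders, so replace them with whatever your loop over br.forms() actually prints for that page:

```python
import csv

def read_names(csv_path):
    """Read the search terms from column B of a CSV export of the sheet."""
    with open(csv_path, newline='') as f:
        return [row[1].strip() for row in csv.reader(f)
                if len(row) > 1 and row[1].strip()]

def run_searches(names, url):
    """Submit each name through the site's search form with mechanize."""
    import mechanize  # local import so read_names works without mechanize
    br = mechanize.Browser()
    results = []
    for name in names:
        br.open(url)
        br.select_form(nr=0)          # assumption: the search form is the first form
        br.form['txtSearch'] = name   # hypothetical control name: check br.forms()
        response = br.submit()
        results.append((name, response.read()))
    return results

def save_results(results, out_path):
    """Write one row per search to a new CSV file."""
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'response_bytes'])
        for name, html in results:
            writer.writerow([name, len(html)])
```

Instead of storing the raw response length you would parse out whatever fields you need from each response before writing the row.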
To see how an Excel file is actually stored, open it in a plain-text editor (Notepad or anything you like). That is the "real" file, with all its divisions, pretty much like HTML.
For each row you want written to Excel, instead of print "Form name:", form.name you'll have to write something like form.name to a file.
Don't forget to create the file before writing to it. Note that you'll be creating an .xls file and writing it in the same structure you saw in the text editor.
For a Python web project (with Django) I developed a tool that generates an XLSX file. For reasons of ergonomics and user convenience, I would like to embed this Excel file in my HTML page.
So I first thought of converting the XLSX to an HTML table with the xlsx2html Python library. It works, but since I can't set the desired cell size or trim the content during conversion, I end up with huge cells and tiny text.
I found an interesting approach using the HTML tag OneDrive provides to embed an Excel window in a web page, but since my file lives in my project and not on Excel Online, I cannot import it that way. Yet the display is perfect, and I don't need the user to interact with the table.
I have searched a lot for other methods, but apart from writing a function that walks through my file and generates the HTML table markup line by line, I have the feeling there is no ready-made way to convert or display it on my web page.
I am not used to this kind of requirement and wonder whether there is a cleaner method to display an Excel file in HTML.
Does it make sense to build my HTML table markup as a string myself? Or should I find a library that does it? Maybe there is a specific Django library?
Thank you for your experience
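For what it's worth, rolling the table yourself is only a few lines. Here is a minimal sketch, assuming openpyxl is available to read the generated workbook; the trimming length and the CSS class name are arbitrary placeholders:

```python
from html import escape

def rows_to_html(rows, max_chars=40):
    """Render an iterable of row tuples as an HTML table,
    trimming each cell to max_chars to avoid oversized cells."""
    parts = ['<table class="xlsx-table">']
    for row in rows:
        cells = ''.join(
            '<td>{}</td>'.format(escape(str(c)[:max_chars]) if c is not None else '')
            for c in row)
        parts.append('<tr>{}</tr>'.format(cells))
    parts.append('</table>')
    return '\n'.join(parts)

# Feeding it from the generated workbook (assumption: openpyxl is installed):
# from openpyxl import load_workbook
# ws = load_workbook('report.xlsx').active
# table_html = rows_to_html(ws.iter_rows(values_only=True))
```

This keeps sizing and trimming under your control in Python, while cell styling stays in CSS. In a Django template you would pass the pre-rendered string through the safe filter (or mark_safe), since it is already HTML.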
If I go to this website:
https://covid.cdc.gov/covid-data-tracker/#ed-visits
and click the "download" button (on the right), a .csv file is downloaded.
I can't find the address of that csv file, so that I could fetch it automatically with pd.read_csv(). I had a snoop around the web inspector, but I don't really know what I'm doing, and nothing jumped out at me as the obvious answer. I've also looked around various other sites to try to find an API that gives access to this data, but there doesn't appear to be such a thing.
Can anyone help me with that?
Thanks so much!
Open your web inspector, go to the "Network" tab, and reload the page. You'll see that no CSV file is ever actually loaded.
The export button doesn't link to any file either. Instead, it has a JavaScript binding that exports the data already present in your client (the browser) to your filesystem as a CSV. In other words: there is no address for that file; it is created in your browser.
Even better, you can read the JSON directly. Just find the correct request in the Network tab; I think it might be this one: https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data
So instead you could read the JSON directly:
pd.read_json('https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data') and then filter for the data you need.
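If pd.read_json chokes on the wrapper object, a hedged sketch of fetching the payload yourself and normalising it; the 'ed_trend_data' key is a guess based on the URL, so confirm the actual shape of the response in the Network tab:

```python
import pandas as pd

URL = ('https://covid.cdc.gov/covid-data-tracker/COVIDData/'
       'getAjaxData?id=ed_trend_data')

def ed_visit_frame(payload):
    """Turn the AJAX payload into a DataFrame.
    Assumption: the records sit either at the top level of the JSON
    or under a key like 'ed_trend_data'."""
    if isinstance(payload, dict):
        payload = payload.get('ed_trend_data', payload)
    return pd.DataFrame(payload)

# Live usage (network required):
# import urllib.request, json
# with urllib.request.urlopen(URL) as r:
#     df = ed_visit_frame(json.load(r))
# df = df[df['date'] > '2021-01-01']   # hypothetical column name
```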
I want to automate some report creation. Some of the elements I need in the report are saved as rich text, i.e. as HTML files. There are a couple of libraries for turning HTML into a PDF, such as html2pdf or pdfforge. However, I would also like to add extra information to the report that is not in the HTML file, for example a title or some information queried from the database.
Does anyone have a suggestion to do this?
Thanks in advance.
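One common approach is to merge the extra fields into the HTML before handing it to the converter. A minimal sketch using only the standard library; the page skeleton and field names here are made up for illustration:

```python
from string import Template

PAGE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<p>Generated for: $customer</p>
$body
</body></html>""")

def build_report_html(title, customer, rich_text_html):
    """Merge DB-sourced fields with the stored rich-text fragment."""
    return PAGE.substitute(title=title, customer=customer,
                           body=rich_text_html)
```

The merged document can then be passed to whichever HTML-to-PDF tool you settle on, so the converter only ever sees one complete page.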
As an example, let's say I wanted to record all the bios of users on SO.
Let's say I loaded up: How to click an element in Selenium WebDriver using JavaScript
I clicked all users: .user-details a (11 of them)
I wrote the extracted text to a csv.
driver.get('Version compatibility of Firefox and the latest Selenium IDE (2.9.1.1-signed)')
I read the users back from the csv.
user: Ripon Al Wasim [is present again, do not click him]. How can this be achieved? It's just text.
Is something like this accomplishable, or is it a limitation of Selenium with Python?
You could click all of them, but let's say you had to scrape 200 pages and the common name Bob popped up 430 times. I feel it is unnecessary to click his name every time. Is something like this possible with Selenium?
I feel like I'm missing something and this is achievable, but I don't know how.
You could write each element's href (print(elem.get_attribute("href"))) to a file and compare against it, deleting elements that are already present; but that is all just text. You could (maybe) put the text in an Excel file, write the CSS selectors individually beside the names, delete the rows with matched strings, and then have Selenium load that back into WebDriver.
I'm not entirely convinced even that would work.
Is there a sane way of clicking elements while skipping names you've already clicked and stored in a text file?
There's nothing special here with Selenium. That is your tool for interacting with the browser. It is your program that needs to decide how to do that interaction, and what you do with the information from it.
It sounds like you want to build a database of users, so why not use an actual database? Something like SQLite or PostgreSQL might work nicely for you.
Among the user details, store the name as it appears in the link (assuming it is unique per user), and index that column. When scraping a page, pull the link text, then use a SQL query to check whether a record with that name already exists; if not, click the link and add a new record.
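A minimal sketch of that check-then-click logic with the standard library's sqlite3; the table layout is just an example, and scrape_bio is a hypothetical helper that would click the link and extract the text:

```python
import sqlite3

def open_db(path=':memory:'):
    """Open (or create) the users database."""
    con = sqlite3.connect(path)
    con.execute('CREATE TABLE IF NOT EXISTS users ('
                'name TEXT PRIMARY KEY, bio TEXT)')
    return con

def is_known(con, name):
    """True if we have already stored this user."""
    return con.execute('SELECT 1 FROM users WHERE name = ?',
                       (name,)).fetchone() is not None

def add_user(con, name, bio):
    """Insert a user; duplicates are silently ignored."""
    con.execute('INSERT OR IGNORE INTO users (name, bio) VALUES (?, ?)',
                (name, bio))
    con.commit()

# Inside your Selenium loop (sketch; selector and helper are assumptions):
# from selenium.webdriver.common.by import By
# for link in driver.find_elements(By.CSS_SELECTOR, '.user-details a'):
#     name = link.text
#     if not is_known(con, name):
#         bio = scrape_bio(link)   # hypothetical: click through and extract
#         add_user(con, name, bio)
```

With the PRIMARY KEY on name, the lookup stays fast even across hundreds of pages, and the duplicate "Bob" is skipped without ever being clicked.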
I want to email out a document that will be filled in by many people and emailed back to me. I will then parse the responses using Python and load them into my database.
What is the best format to send out the initial document in?
I was thinking of an interactive .pdf, but I do not want to have to pay for Adobe XI. Alternatively, maybe a .html file, but I'm not sure how easy it is to save its state once it's been filled in so it can be emailed back to me. A .xls file could also work, but I'm leaning away from it simply because it would not be a particularly professional-looking format.
The key points are:
Answers can be easily parsed using Python
The format should be common enough to open on most computers
The document should look relatively pleasing to the eye
Send them a web page with a FORM section, complete with some JavaScript to grab the contents of the controls and send them to you (e.g. in JSON format) when they press "submit".
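On the Python side, parsing those JSON submissions is then trivial. A sketch with entirely hypothetical field names; yours would match whatever the form's JavaScript sends:

```python
import json

def parse_response(payload):
    """Parse one submitted response.
    Assumption: the form's JavaScript sends a JSON object
    keyed by field name."""
    data = json.loads(payload)
    required = ('name', 'email', 'answer')   # hypothetical field names
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError('missing fields: {}'.format(', '.join(missing)))
    return {k: data[k] for k in required}
```

The validated dictionaries can then be loaded straight into your database.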
Another option is to set it up as a web application. There are several Python web frameworks that could be used for that. You could then e-mail people a link to the web-app.
Why don't you use Google Docs for the form? Create the form in Google Docs and save the answers to an Excel sheet. Then use any Python Excel reader (Google for them) to read the file. This way you don't need to parse through mails, and it will be performance-friendly too. Or you could just make a simple form using App Engine and save the data directly to the database.