I am working with Python and Selenium trying to create an automation for a website.
Until now I have had no issues, because I read plain text from an Excel cell using openpyxl. In this case, however, I need to paste text that contains an embedded HTML link.
When I copy the text from Word and paste it into a sheet, it loses its formatting.
When agents do this manually, they copy the text from Word and paste it directly into the web field.
Is there any way to achieve the same thing, maybe with a Word library for Python or something similar?
Thanks in advance for your support.
I used openpyxl to get the text from the cell.
I used Selenium's send_keys function to send the text to the input field, and I expected the text to retain its formatting.
Related
For a Python web project (with Django) I developed a tool that generates an XLSX file. For ergonomics and ease of use, I would like to display this Excel file on my HTML page.
So I first thought of converting the XLSX to an HTML table with the xlsx2html Python library. It works, but since I can't set the desired size for my cells or trim the content during conversion, I end up with huge cells and tiny text.
I found an interesting approach that uses the HTML tag associated with OneDrive to embed an Excel window in a web page, but since my file lives in my code and not on Excel Online, I cannot import it that way. Still, the display is perfect, and I don't need the user to interact with this table.
I have searched a lot for other methods, but apart from writing a function that walks through my file and generates the HTML table markup line by line, I have the feeling that there is no simple method to convert or display it on my web page.
I am not used to this kind of requirement and wonder whether there is a cleaner way to display an Excel file in HTML.
Does it make sense to write a function that builds my HTML table markup as a string? Or should I find a library that does it? Maybe there is a specific Django library?
Thank you for your experience
I have a document (a Word file), and I extract some sentences from it and write them into an Excel file with Python.
Now I want to turn each sentence into a hyperlink that points to the page the sentence comes from.
For example, if the sentence "I love python" is on page 5 of a Word file, and I extract this sentence into a cell of an Excel file with Python, is it possible to create a hyperlink back to page 5 of that Word file with xlsxwriter?
I'm afraid there is no way to hyperlink to a specific page in a DOCX file. You can only link at the file level.
Attaching the code here.
import xlwt
formula = 'HYPERLINK("{path}", "Click Here")'.format(path="<PATH TO FILE>")
value = xlwt.Formula(formula)
This is done with the xlwt library.
Also, hyperlinks behave differently across spreadsheet formats, i.e. a hyperlink that works in Microsoft Excel may not work in OpenOffice.
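As a minimal sketch of the file-level link described above, the formula string can be built by a small helper. The function name build_hyperlink_formula and the example path are hypothetical, chosen only for illustration. The helper doubles any embedded double quotes, which is how Excel escapes them inside a string literal, and returns the text without a leading "=", matching the form used in the answer.

```python
def build_hyperlink_formula(path, label="Click Here"):
    """Return an Excel HYPERLINK formula string for a file-level link."""
    # Excel escapes a double quote inside a string literal by doubling it.
    safe_path = path.replace('"', '""')
    safe_label = label.replace('"', '""')
    return 'HYPERLINK("{}", "{}")'.format(safe_path, safe_label)


# Hypothetical path and label, for illustration only.
formula = build_hyperlink_formula("C:/docs/report.docx", "I love python")
print(formula)  # HYPERLINK("C:/docs/report.docx", "I love python")
```

The resulting string can then be passed to xlwt.Formula as in the answer.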
I am trying to remove a page number text object from each page of a PDF that I generate automatically each month. I have been able to run a Python script to create the final PDF I need, but I am unable to automatically remove the incorrect page numbers and replace them with new page numbers in the correct order (I have to piece this PDF together from multiple source PDFs).
Any thoughts on the best way to do this?
As an example, let's say I wanted to record the bios of all users on SO.
Let's say I loaded up: How to click an element in Selenium WebDriver using JavaScript
I clicked all users: .user-details a (11 of them)
I wrote the extracted text to a CSV.
driver.get('Version compatibility of Firefox and the latest Selenium IDE (2.9.1.1-signed)')
I read the users from the CSV.
user: Ripon Al Wasim [is present again, do not click him] ??? How can this be achieved, given that all I have stored is his name as text?
Is something like this achievable, or is this a limitation of Selenium with Python?
You could click all of them, but let's say you had to scrape 200 pages and the common name Bob popped up 430 times. Clicking his name every time seems unnecessary. Is something like this possible with Selenium?
I feel like I'm missing something and that this is achievable, but I don't know how.
One idea is to write each link's href, via print(elem.get_attribute("href")), to a file and compare it against the text file of names already scraped, deleting the entries that are already present. But what I have stored is text, not elements. I could (maybe) put the text in an Excel file, write the CSS selector for each element beside its text, delete the rows whose strings match, and then have Selenium load what remains into WebDriver.
I'm not entirely convinced even this would work.
Is there a sane way of clicking elements by CSS selector while ignoring the names in a text file that I have already clicked?
There's nothing special here about Selenium. It is your tool for interacting with the browser. It is your program that needs to decide how to do that interaction and what to do with the information it gets back.
It sounds like you want to build a database of users, so why not use a database? Something like SQLite or PostgreSQL might work nicely for you.
Among the user details, store the name as it appears in the link (assuming it is unique for each user), and index that name. When scraping a page, pull the link text, then use an SQL query to check whether a record with that name already exists. If not, click the link and add a new record.
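The lookup described above could be sketched with Python's built-in sqlite3 module roughly as follows. The table layout and the helper names (open_user_db, is_known, add_user, names_to_visit) are assumptions made for illustration, and the sketch relies on the same assumption as the answer: that the link text is unique per user. Declaring name as the primary key gives SQLite an index on it.

```python
import sqlite3


def open_user_db(path=":memory:"):
    """Open (or create) the user database; the primary key indexes the name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, bio TEXT)"
    )
    return conn


def is_known(conn, name):
    """Return True if a record with this link text already exists."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row is not None


def add_user(conn, name, bio):
    """Store a newly scraped user; an existing name is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO users (name, bio) VALUES (?, ?)", (name, bio)
    )
    conn.commit()


def names_to_visit(conn, link_texts):
    """Return link texts not yet stored, skipping repeats on the same page."""
    seen = set()
    todo = []
    for name in link_texts:
        if name in seen or is_known(conn, name):
            continue
        seen.add(name)
        todo.append(name)
    return todo


conn = open_user_db()
add_user(conn, "Ripon Al Wasim", "bio text scraped earlier")
page_links = ["Ripon Al Wasim", "Bob", "Bob", "Alice"]
print(names_to_visit(conn, page_links))  # ['Bob', 'Alice']
```

In a real scraper, link_texts would come from the .text of the elements matched by .user-details a, and only the names returned by names_to_visit would be clicked and then saved with add_user.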
This is not a duplicate, although the issue was raised in this forum in 2011 (Getting a hyperlink URL from an Excel document), 2013 (Extracting Hyperlinks From Excel (.xlsx) with Python) and 2014 (Getting the URL from Excel Sheet Hyper links in Python with xlrd); there is still no answer.
After a deep dive into the xlrd module, it seems the Data_sheet.hyperlink_map.get((row, col)) lookup fails because "xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx", per @alecxe at Extracting Hyperlinks From Excel (.xlsx) with Python.
Question: has anyone made progress with extracting URLs from hyperlinks stored in an Excel file? Say that, among all the customer data, there is a column of hyperlinks. I was toying with the idea of dumping the Excel sheet as an HTML page and then scraping it as usual (file on a local drive), but that's not a production solution. Supplementary: is there any other module that can extract the URL from a .cell(row,col).value() call on the hyperlink cell? Is there a solution in mechanize? Many thanks.
I had the same problem trying to get the hyperlinks from the cells of an xlsx file. The workaround I came up with is simply to convert the Excel sheet to xls format, from which I could get the hyperlinks without any trouble. Once I had finished editing, I converted it back to the original xlsx format.
I don't know whether this will work for your specific needs, or whether the change of format has consequences I am not aware of, but I think it's worth a try.
I was able to read and use hyperlinks to copy files with openpyxl. Each cell has a cell_obj.hyperlink attribute, and cell_obj.hyperlink.target holds the link value. I collected the row and column of every cell that had a hyperlink into a list, then looped through that list to move the linked files.
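A minimal sketch of that approach, assuming openpyxl is installed. The helper name collect_hyperlinks is hypothetical, and the demo workbook is built in memory with an example URL only so that the snippet is self-contained.

```python
from openpyxl import Workbook


def collect_hyperlinks(ws):
    """Return (coordinate, target) pairs for every cell that has a hyperlink."""
    links = []
    for row in ws.iter_rows():
        for cell in row:
            # cell.hyperlink is None for cells without a link.
            if cell.hyperlink is not None:
                links.append((cell.coordinate, cell.hyperlink.target))
    return links


# Demo workbook built in memory; the URL is a placeholder.
wb = Workbook()
ws = wb.active
ws["A1"] = "Customer"
ws["B1"] = "Profile"
ws["B1"].hyperlink = "https://example.com/profile/1"

print(collect_hyperlinks(ws))
```

For an existing file, the worksheet would instead come from openpyxl.load_workbook, opened in the default (not read-only) mode.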