Inconsistent Download 'Failed - Path too long' Error using Python Selenium Chromedriver - python

Stack: Windows 8.1, Python 3.4.3.4, Selenium 2.46, Chromedriver 2.16
I'm using Selenium Python bindings with Chromedriver to automate some downloads from the following URL: r'http://financials.morningstar.com/'
I have set up the following chromedriver preferences:
chromeOptions = webdriver.ChromeOptions()
prefs = {'download.default_directory':symbol_dir}
chromeOptions.add_experimental_option('prefs', prefs)
chromeOptions.add_extension(self.chromeUBlock_path)
chromedriver_path = self.chromedriver_path
Furthermore I have set up the following selenium code to run the download which works without a problem and downloads to the correct file location etc.:
symbol = 'AAPL' # test symbol
IS_anl_statement = 'income-statement'
IS_anl_abbrev = 'is'
url_anl = self.mstar_base_url + r'{}/{}.html?t={}&region=usa&culture=en-US'
IS_anl = webdriver.Chrome(executable_path=chromedriver_path, chrome_options=chromeOptions)
IS_anl.set_page_load_timeout(90)
try:
    IS_anl.get(url_anl.format(IS_anl_statement, IS_anl_abbrev, symbol))
    anl_element = WebDriverWait(IS_anl, 90).until(EC.element_to_be_clickable(
        (By.LINK_TEXT, 'Annual')))
    anl_csv_element = WebDriverWait(IS_anl, 90).until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, self.css_path_export)))
    anl_csv_element.click()
    for i in range(1, 10, 1):
        time.sleep(i/50)
        if os.path.isfile(anl_file_string)==True:
            break
except Exception as e:
    print(e)
IS_anl.quit()
However, when I run the download with the following abbreviation (simply substituting balance-sheet for income-statement), like so:
BS_anl_statement = 'balance-sheet'
BS_anl_abbrev = 'bs'
and the remaining Selenium code exactly the same I get the dreaded download error:
Failed-Path Too Long
This is strange because the actual file path is not too long. In fact, I tested the download in 3 different directories, each with a shorter file path than the last. The final example path is: r"C:\mstar_data\\"
I'm stuck. The only difference I can see between the two download attempts is the actual CSV link. But even in this case the Income-Statement download url is in fact longer than the Balance-Sheet so again I'm stuck. Here they are:
I/S CSV url:
http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:ZUMZ&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=573205&denominatorView=raw&number=3
B/S CSV url:
http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNYS:A&region=usa&culture=en-US&cur=&reportType=bs&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=558981&denominatorView=raw&number=3
Any help on this inconsistency would be a great help and definitely appreciated. Thanks.

I was able to find a workaround for this issue. @Skandigraun was right: there was a piece of code that was trying to use the downloaded file too soon.
I used Advanced Uninstaller to clear all temporary files, internet history and so on.
I restarted my computer.
Following Skandigraun's comments, I changed this code block from:
for i in range(1, 10, 1):
    time.sleep(i/50)
    if os.path.isfile(anl_file_string)==True:
        break
to:
for i in range(1, 10, 1):
    time.sleep(i/20) # changed from 50 to 20
    if os.path.isfile(anl_file_string)==True:
        break
and I added:
finally:
    IS_anl.quit()
Now the code runs with much more stability. I believe the os call that tests whether the file exists was happening too quickly/frequently and caused the download to fail. I added the finally clause to the try/except block to close the browser no matter what happens. This prevents the script from hanging if there is a problem accessing the site, a timeout error, etc.
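For reference, the first loop sleeps for at most 0.9 seconds in total and the revised one for 2.25 seconds, so a slow download can still be missed. Below is a minimal sketch of a more forgiving wait, assuming Chrome keeps a temporary *.crdownload file in the download directory until the transfer finishes. The helper names, the timeout and the poll interval are my own choices for illustration and are not part of the original script.

```python
import os
import time


def has_partial_download(download_dir):
    """Return True if Chrome still has an unfinished *.crdownload file."""
    return any(name.endswith(".crdownload") for name in os.listdir(download_dir))


def wait_for_download(file_path, timeout=30.0, poll_interval=0.25):
    """Return True once file_path exists and no partial download remains.

    Returns False if that has not happened before the timeout expires.
    """
    download_dir = os.path.dirname(os.path.abspath(file_path))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Only look for partial files once the target itself has appeared.
        if os.path.isfile(file_path) and not has_partial_download(download_dir):
            return True
        time.sleep(poll_interval)
    return False
```

In the script above this would be called right after anl_csv_element.click(), e.g. wait_for_download(anl_file_string), with the browser only being closed once it returns.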

Related

How to download automatically a CSV file via Selenium (Python) without opening the dialog box

My environment:
Firefox version : 78.9.0esr (64 bits)
Web driver : geckodriver-v0.26.0-win64
OS: Windows 10 (64 bits)
Python version: 3.8 (64 bits)
Python Selenium package version : 3.141.0
My requirement is to download a CSV file from a web page via Selenium (Python code) without opening the dialog box. That is, if you click the download button manually, a dialog box appears asking for the name (if you want to change the default name) and the location to save the file. What I wish to do is bypass the dialog box so that when Selenium clicks the button, the file is downloaded under its default name to a default directory that I have specified. After a lot of googling (including threads on stackoverflow.com), I understood that this can be achieved by creating a new profile for the Firefox instance that Selenium uses. So here is my code:
self.profile = webdriver.FirefoxProfile()
self.profile.set_preference("browser.preferences.instantApply", True)
self.profile.set_preference("browser.download.folderList", 2)
self.profile.set_preference("browser.download.dir", os.getcwd())
self.profile.set_preference(
"browser.helperApps.neverAsk.saveToDisk",
"text/csv;charset=ISO-8859-1"
)
self.profile.set_preference(
"browser.helperApps.neverAsk.openFile",
"text/csv;charset=ISO-8859-1"
)
self.profile.set_preference(
"browser.download.manager.showWhenStarting",
False
)
self.profile.set_preference(
"browser.download.manager.showAlertOnComplete",
False
)
self.profile.set_preference("browser.download.panel.shown", False)
self.profile.set_preference(
"browser.download.manager.focusWhenStarting",
False
)
self.profile.set_preference(
"browser.download.manager.useWindow",
False
)
self.profile.set_preference(
"browser.download.manager.closeWhenDone",
False
)
self.web_driver = webdriver.Firefox(firefox_profile=self.profile)
Now the problem with the above mentioned code is that it still opens the dialog box and the only part that works is the following:
self.profile.set_preference("browser.download.folderList", 2)
self.profile.set_preference("browser.download.dir", os.getcwd())
But apart from that, the main problem persists: the dialog box appears every time. I've been looking at many pages on the web (forums, tutorials, etc.), and when I compare their suggested code with mine, it seems to me that I have proceeded in the same way. I don't see what other command or option should be added to prevent the dialog box from opening and download the CSV file automatically instead. I would appreciate it if you could clarify what the problem with my code is.
OK, so after a lot of searching on the internet, I finally found the answer to my question on Firefox's own settings page (about:config): browser.download.useDownloadDir
Here is the new version of my code that works pretty well and does the job:
self.profile = webdriver.FirefoxProfile()
self.profile.set_preference("browser.download.folderList", 2)
self.profile.set_preference("browser.download.dir", os.getcwd())
self.profile.set_preference(
"browser.helperApps.neverAsk.saveToDisk",
"text/csv"
)
self.profile.set_preference("browser.download.useDownloadDir", True)
self.web_driver = webdriver.Firefox(firefox_profile=self.profile)
So what has changed in this new version compared to the previous version is:
I changed the MIME type to be simply text/csv instead of text/csv;charset=ISO-8859-1
I removed several profile parameters that seem to have no impact on the result.
And the most important part is that I added the browser.download.useDownloadDir parameter to my profile. Based on the tests that I did, it seems that specifying browser.helperApps.neverAsk.saveToDisk alone is not enough. Once you define the desired download directory with browser.download.dir, you also need to tell Firefox explicitly, via browser.download.useDownloadDir, to use that directory automatically each time it downloads a file.
I hope this might help those who may have encountered the same problem.
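To keep those four settings together, here is a small sketch that builds them as a plain dictionary. The function name build_download_prefs is made up for this example, and joining several MIME types with commas is an assumption based on how the preference is commonly written.

```python
import os


def build_download_prefs(download_dir, mime_types):
    """Return the Firefox download preferences discussed above as a dict."""
    return {
        # 2 means "use the custom directory given in browser.download.dir".
        "browser.download.folderList": 2,
        "browser.download.dir": os.path.abspath(download_dir),
        # Without this flag the save dialog kept appearing in my tests.
        "browser.download.useDownloadDir": True,
        # MIME types that should be saved without asking (comma-separated).
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


prefs = build_download_prefs(os.getcwd(), ["text/csv"])
for name, value in prefs.items():
    print(name, "=", value)
```

Each entry can then be applied to the profile with self.profile.set_preference(name, value) before the driver is created.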

How can I get chrome version using python code without having to actually open up a browser?

I'm writing a bunch of programs using python selenium. In order to scrape content from different websites, I need to download chromedriver.exe that is compatible with my current version of chrome. However, chrome is constantly being updated, so I want to write a program that will first check if chrome and chromedriver versions are compatible before running my programs. So, I need a way to get my current chrome version without using chromewebdriver or actually opening up a browser. Any suggestions?
If you are working on Linux:
Try this in the terminal (if you want to get the result in a Python script, you need to use subprocess.Popen()):
google-chrome --version
You may need to run which google-chrome first to check whether you have it installed. Hope this helps.
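A minimal sketch of doing the same from Python, assuming the binary is called google-chrome and prints something like "Google Chrome <version>" to stdout. The function names are illustrative, and subprocess.run with capture_output needs Python 3.7 or later.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number from text, or return None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return match.group(0) if match else None


def get_chrome_version(binary="google-chrome"):
    """Return the installed Chrome version, or None if the binary is missing."""
    path = shutil.which(binary)  # same check as `which google-chrome`
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_version(result.stdout)


if __name__ == "__main__":
    print(get_chrome_version())
```

This only starts the Chrome binary long enough to print its version, so no browser window or webdriver session is involved.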
On Windows, you could run reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome" from CMD, as follows:
import os
stream = os.popen('reg query "HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Google Chrome"')
output = stream.read()
print(output)
To extract the google_version from the output you could try the following:
import os
def extract_version(output):
    try:
        google_version = ''
        # 'DisplayVersion    REG_SZ' is 24 characters long, hence the + 24
        for letter in output[output.rindex('DisplayVersion    REG_SZ') + 24:]:
            if letter != '\n':
                google_version += letter
            else:
                break
        return(google_version.strip())
    except (TypeError, ValueError):
        # ValueError: rindex() did not find the DisplayVersion entry
        return
stream = os.popen('reg query "HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Google Chrome"')
output = stream.read()
google_version = extract_version(output)
print(google_version)
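The fixed offset above depends on the exact spacing of the reg query output. As a possibly more tolerant alternative, here is a sketch that pulls the value out with a regular expression. The function name and the sample text are made up for illustration; the version number in the sample is not taken from a real machine.

```python
import re


def parse_display_version(reg_output):
    """Return the DisplayVersion value from `reg query` output, or None."""
    # \s+ tolerates any amount of whitespace between the columns.
    match = re.search(r"DisplayVersion\s+REG_SZ\s+(\S+)", reg_output or "")
    return match.group(1) if match else None


# Hypothetical fragment of the command output, used only to show the call.
sample = "    DisplayVersion    REG_SZ    90.0.4430.212\r\n"
print(parse_display_version(sample))
```

It accepts the same output string that stream.read() returns in the snippet above.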

Python: Getting all Urls from every open Google Chrome Tab

I need to get all the URLs from all open Google Chrome tabs in Python 3 without interfering with the user. I'm on Windows 10 using Microsoft Visual Studio with Python 3.
I've tried:
Opening it directly with open(path to current tabs) -- doesn't work because I have no permission. I think it's locked because Chrome actively writes to it.
Current_Tabs_Source = open(r"C:\Users\Beni\AppData\Local\Google\Chrome\User Data\Default\Current Tabs", "r")
Current_Tabs_Raw = Current_Tabs_Source.read()
print(Current_Tabs_Raw) #just for checking
PermissionError: [Errno 13] Permission denied
Opening it through sqlite3 -- doesn't work because it's locked, and I can't find a password anywhere. I've tried to open the History for the URLs, but that doesn't work either.
import sqlite3
from os import listdir, path
data_path = path.expanduser('~') + r"\AppData\Local\Google\Chrome\User Data\Default"
files = listdir(data_path)
history_db = path.join(data_path, 'history')
c = sqlite3.connect(history_db)
cursor = c.cursor()
select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
cursor.execute(select_statement)
results = cursor.fetchall()
print(results) #just for checking
sqlite3.OperationalError: database is locked
Using Selenium and a 3rd-party Chrome extension to copy all URLs to the clipboard -- doesn't work because these extensions only work in the active Selenium window, so the windows with the tabs I actually want don't get copied.
I've considered hacking together a Chrome extension that copies the URLs to a temp file every 30 seconds, but I only know minimal JavaScript, so this thing is driving me mad.
So does anyone know a way to do this in Python? Any other solution is greatly appreciated.
If you want to access the database, you should close all browsers.
(Source)
When we're accessing the SQLite database of any browser, we have to close that particular browser first.
Moreover, the SQL command being used here will fetch duplicate rows.
The select_statement needs to change to:
select_statement = "SELECT distinct urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
Further, we need to add a loop to print all the 'urls' from history database of chrome.
for url, count in results:
    print(url) # print urls line by line
However, this will give us the whole history of the Chrome browser but not the required URLs of all the presently opened tabs.
You can use the Windows Volume Shadow Copy Service to make a copy of the locked SQLite database and then read it normally.
There is a Python module for that: https://github.com/sblosser/pyshadowcopy
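A simpler thing to try first is to copy the database file with plain Python and query the copy. This is only a sketch: whether the copy succeeds while Chrome is running is not guaranteed, and like the answers above it returns the browsing history, not the currently open tabs. The function name is made up, and the query assumes the urls table with url and visit_count columns used in the question.

```python
import os
import shutil
import sqlite3
import tempfile


def read_urls_from_copy(db_path):
    """Copy a SQLite file to a temp directory and query the copy instead."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        copy_path = os.path.join(tmp_dir, "history_copy.sqlite")
        shutil.copy2(db_path, copy_path)
        connection = sqlite3.connect(copy_path)
        try:
            cursor = connection.execute(
                "SELECT DISTINCT url, visit_count FROM urls ORDER BY url"
            )
            return cursor.fetchall()
        finally:
            # Close before the temp directory is removed (required on Windows).
            connection.close()
```

It would be called with the history_db path built in the question, e.g. read_urls_from_copy(history_db).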

Running python cgi script Interpreter results differ to browser

I was having difficulty converting a program I made to a cgi script. I suspected it was to do with os.walk so I made a smaller test script to test this.
(I noticed the single \ before the D in the variable loc and tried changing it to a double \, but there was still no change.)
It produces no errors, and I can't tell why it doesn't run the for loop with os.walk in the browser.
I tried adding some data to s and running the for loop that prints its contents, and that worked fine, but I can't seem to get it to work with os.walk. I can't find anything relating to the issue on Google or Stack Overflow.
Below is the code:
import cgi,cgitb,os
loc = "C:\\Users\\wen\Desktop\\sample data\\old py stuff\\"
cgitb.enable(display=1,logdir=loc)
s = []
print("Content-type:text/html\r\n\r\n")
print("<html>")
print("<body>")
print("<p>"+loc+"</p>")
for r,ds,fs in os.walk(loc):
    print("<p>omgwtf</p>")
    for f in fs:
        s.append(f)
for i in s:
    print("<p>"+i+"</p>")
print("</body>")
print("</html>")
Took a screenshot, the output in interpreter on the left and browser on right
i.imgur.com/136y1Yq.jpg
The web server is running IIS 7.
I'm pretty sure I've solved the problem: I needed to give 'Authenticated Users' permissions on the folders.
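This also explains why no error showed up: by default os.walk ignores errors raised while listing a directory, so a folder the web server account cannot read simply yields nothing. A small sketch that collects those errors through the onerror callback instead (the function name list_files is made up for this example):

```python
import os


def list_files(root):
    """Walk root and return (file_names, errors) instead of failing silently."""
    errors = []
    names = []
    # os.walk passes each OSError from listing a directory to onerror.
    for _dirpath, _dirnames, filenames in os.walk(root, onerror=errors.append):
        names.extend(filenames)
    return names, errors
```

Printing the errors list in the CGI output would have shown the permission problem directly in the browser.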

Cannot suppress the OS download file window even after setting browser preference through webdriver+python

I know this question has been asked before but after trying suggestions, I am constantly getting the OS download file window. What I am trying to do is download a pdf file. I have set the browser preferences but despite that fact, it does not suppress the OS window.
Here is the code snippet that I have written:
firefoxProfile = webdriver.FirefoxProfile()
firefoxProfile.set_preference('browser.download.folderList', 2)
firefoxProfile.set_preference('browser.download.manager.showWhenStarting', False)
firefoxProfile.set_preference('browser.download.dir', '/media/pinku/Pinku')
firefoxProfile.set_preference('browser.helperApps.alwaysAsk.force', False)
firefoxProfile.set_preference('browser.helperApps.neverAsk.saveToDisk',
'application/octet-stream')
self.driver = webdriver.Firefox(firefoxProfile)
I am using Ubuntu 12.10, Firefox, webdriver, python
I think you might have gotten the MIME type wrong. Try this:
firefoxProfile.set_preference('browser.helperApps.neverAsk.saveToDisk',
'application/pdf,application/x-pdf')
A discussion about PDF MIME types can be found here. You should check the MIME type that your Firefox sees when you try to download the PDF. It might be set wrongly by the server!
Side note: Whenever this topic comes up (downloading files via selenium webdriver) I strongly advise against doing it at all! Have a read through the article "How To Download Files With Selenium And Why You Shouldn’t" for a reasoning. Basically it suggests to use other means to test direct downloads.
Update: I did not put both mime types in one string before which was wrong. Also I added the suggestion about checking what the server actually delivers.
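One way to check what the server actually delivers is to ask it directly, outside the browser. The following is a rough sketch using only the standard library. The function names are mine, and it assumes the server answers HEAD requests with the same Content-Type it would send for the real download, which is not always the case.

```python
import urllib.request


def media_type(content_type_header):
    """Strip parameters such as charset: 'text/csv; charset=x' -> 'text/csv'."""
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_media_type(url, timeout=10):
    """Send a HEAD request and return the media type the server reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type", ""))
```

The value returned by fetch_media_type for the download link is the one to put into browser.helperApps.neverAsk.saveToDisk.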
I have been working with Firefox 24.03 (this is the ESR version).
This version of Firefox introduced pdfjs, which opens the PDF in the browser.
So you need to suppress that.
Here is the code/firefox profile that worked for me.
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir","C:\\temp")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","application/pdf")
fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
fp.set_preference("pdfjs.disabled", True)
driver = webdriver.Firefox(firefox_profile=fp)
With this profile all my pdf downloads go to "C:\temp"
I had a similar problem because the mime type returned by the server was "text/plain" instead of "text/csv".
This is what worked for me (using watir-webdriver):
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.folderList'] = 2 # custom location
profile['browser.download.dir'] = download_directory
profile['browser.helperApps.neverAsk.saveToDisk'] = "text/plain"
browser = Watir::Browser.new :firefox, :profile => profile
More info on downloading with watir-webdriver here: http://watirwebdriver.com/browser-downloads/
