I'm using selenium to scrape a local web - python

I need to upload a file using the 'upload' button. After I click it a window appears, but I can't find the exact ID in the HTML. Here are the screenshots and my code:
`time.sleep(1)
element=driver.find_element_by_id("Upload-Action-Ico").click()
driver.find_element_by_xpath("//*[contains(text(), 'File')]").send_keys("file path")`

I think the ID is 'file', so this should work:
time.sleep(1)
element=driver.find_element_by_id("file").click()

Try clicking on it and show the HTML code. There should be the word "Button" or something similar. Can you share the URL of the site?
I hope I can help, and please excuse my English (it isn't my native language).

The input field does not contain any text. And its id is explicitly mentioned in the html, so you can try find_element_by_id:
driver.find_element_by_id("file").send_keys("file path")
If this doesn't work for you, then you can try using the xpath:
driver.find_element_by_xpath("//*[@id='file']").send_keys("file path")

You can use this code to select the file; after that, click the upload button.
import os
filePath = os.path.join(os.getcwd(), 'img.jpg')  # avoids the invalid '\i' escape in '\img.jpg'
driver.find_element_by_id('Upload-Action-Ico').send_keys(filePath)
os.getcwd() : returns the current working directory.
img.jpg is located right next to the running script in the same directory.
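A minimal sketch of the idea in the answers above, assuming the page really has an <input type="file"> element whose id is "file" (suggested in this thread, not confirmed from the screenshots). The helper names absolute_upload_path and upload_via_input are hypothetical. The path is typed into the input directly, so the native file window never has to be driven, and the locator is passed as the plain string "id" because newer Selenium releases dropped the find_element_by_* helpers.

```python
import os


def absolute_upload_path(file_name, base_dir=None):
    """Build an absolute path for file_name and fail early if it is missing."""
    base_dir = base_dir or os.getcwd()
    path = os.path.abspath(os.path.join(base_dir, file_name))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


def upload_via_input(driver, input_id, file_name, base_dir=None):
    """Type the file path into the file input instead of clicking it."""
    # "id" is the locator string behind selenium's By.ID constant.
    file_input = driver.find_element("id", input_id)
    file_input.send_keys(absolute_upload_path(file_name, base_dir))
```

With a real WebDriver this would be called as upload_via_input(driver, "file", "img.jpg"), followed by a click on whatever button submits the form.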

Related

How to open a pdf in a new tab in flask

Hey guys, I'm doing a Flask assignment for college to make a simple web application, and I'm trying to open a PDF file in a new tab from my index page. I can open the file in the current tab without any problem, but I can't work out how to open it in a new tab. My code is below. Thanks for any time put into answering this.
I have tried using redirect and webbrowser.open_new_tab, but I might not have been using them correctly.
Dan
@app.get("/my_cv")
def my_cv():
    workingdir = os.path.abspath(os.getcwd())
    filepath = workingdir + '/static/files/'
    return send_from_directory(filepath, 'DanielTurnerCV.pdf')

Getting filename from link and downloading it. Python

I'm trying to make an automated program that downloads a certain file from a link.
The problem is that I don't know what the file will be called. It's always a .zip, for example filename_4213432.zip. The link does not include the filename; it looks something like https://link.com/api/download/433265902, so it is impossible to get the filename from the link itself. Is there a way to fetch the name and download the file?
print("link:")
url = input("> ")
request = requests.get(url, allow_redirects=True)
I'm stuck at this point because I don't know what to put in my open() now.
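One possible way to get a name for open(), assuming the server announces the filename in a Content-Disposition response header (common for download endpoints, but not guaranteed for this one). The helper filename_from_response and the fallback name download.zip are hypothetical; if the header is missing, the sketch falls back to the last segment of the URL.

```python
import os
from email.message import Message
from urllib.parse import urlsplit


def filename_from_response(content_disposition, url, default="download.zip"):
    """Pick a local file name from a Content-Disposition value, else from the URL."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # parses the filename parameter of the header
        if name:
            return os.path.basename(name)  # drop any directory part
    tail = os.path.basename(urlsplit(url).path)
    return tail or default


# With requests, as in the question, this could be used roughly like so:
# response = requests.get(url, allow_redirects=True)
# name = filename_from_response(response.headers.get("Content-Disposition"), response.url)
# with open(name, "wb") as fh:
#     fh.write(response.content)
```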

Setting QFileDialog

I am using static method:
path = QtGui.QFileDialog.getSaveFileName(self, SAVE_TO_STR, NAME_STR, 'CSV(*.csv)')
where I get the path as full_path\some_name.csv.
However, I need to set the buttons and labels of the dialog to a different language. Looking at the docs, I found that I can't do that with the static method, so I came up with this code:
ddd = QtGui.QFileDialog(self, SAVE_TO_IN_OTHER_LANGUAGE_STR, NAME_STR, 'CSV(*.csv)')
ddd.setAcceptMode(QtGui.QFileDialog.AcceptSave)
ddd.setLabelText(QtGui.QFileDialog.Accept, "Save - in other language")
ddd.setLabelText(QtGui.QFileDialog.Reject, "Cancel - in other language")
ddd.setLabelText(QtGui.QFileDialog.LookIn, "Look in - in other language")
if ddd.exec_():
    path = QtCore.QString(ddd.selectedFiles()[0])
I am trying to make it look like the first one, so my questions are:
1) The path I get is OK but is missing .csv at the end, so the file is saved with no extension. Should I manually add .csv at the end of the path?
2) When I am choosing where to save and click on a folder, the "Save" button turns to "Open". How can I change that button text to "Open" in another language?
3) The folder list on the left side of the dialog is not as rich as when I use QtGui.QFileDialog.getSaveFileName(): it shows only My Computer and User, instead of the modern tree with favorites and partitions under My Computer.

1) path I get is ok, but missing .csv at the end, so it saves file with
no extension. should I manually add .csv at the end of the path?
Answer: I don't think you should add .csv to the path manually. The PyQt API already provides a solution for this, QFileDialog.setDefaultSuffix(self, QString suffix):
pathQFileDialog = QtGui.QFileDialog(self)
pathQFileDialog.setAcceptMode(QtGui.QFileDialog.AcceptSave)
pathQFileDialog.setNameFilter('CSV(*.csv)')
pathQFileDialog.setDefaultSuffix('csv')
Reference: http://pyqt.sourceforge.net/Docs/PyQt4/qfiledialog.html#setDefaultSuffix
2) when I choosing where to save and click on folder, "save" button
turns to "open". How to change that button text to "Open" in other
language?
Answer: As far as I can tell, this can't be done from PyQt. In the Qt (C++) source file qfiledialog.cpp, the method void QFileDialogPrivate::_q_updateOkButton() forces the "&Open" label at lines 2886 to 2888:
button->setEnabled(enableButton);
if (acceptMode == QFileDialog::AcceptSave)
    button->setText(isOpenDirectory ? QFileDialog::tr("&Open") : acceptLabel);
Reference: https://qt.gitorious.org/qt/qt/source/57756e72adf2081137b97f0e689dd16c770d10b1:src/gui/dialogs/qfiledialog.cpp#L2796-2888
3) folders list at the left side of dialog is not complex as when I use
QtGui.QFileDialog.getSaveFileName() , it shows only My Computer and
User, instead of modern tree with favorites and partitions under My
Computer.
Answer: On Windows, Mac OS X and Symbian^3, the static function QtGui.QFileDialog.getSaveFileName() uses the native file dialog and not a QFileDialog, which is why the two dialogs look different.
Reference: pyqt.sourceforge.net/Docs/PyQt4/qfiledialog.html#getSaveFileName
Regards,

Download Lone Image From a Set of URLs

I have a set of URLs and names in a file as follows:
www.test.yom/something/somethingelse/Profile.aspx?id=1
John Doe
www.test.yom/something/somethingelse/Profile.aspx?id=24
John Benjamin
www.test.yom/something/somethingelse/Profile.aspx?id=307
Benjamin Franklin
....
Each URL page contains normal HTML and any amount of text, tables, etc., but will always have exactly one image in an <img> tag.
My goal is to download this image somehow to my drive, renaming it with the second line name (i.e. "John Doe.jpg" and "John Benjamin.jpg").
Is there an easy way to accomplish this? I parsed out the URL-Name file from raw HTML on a different page using UNIX commands (grep, tr, sed), but I'm guessing this will require something a bit more intricate. Right now I'm thinking Python script, but I'm not exactly sure which libraries to look at or where to start in general (although I am familiar with Python language itself). I would also be down to use Java or any other language if it would make the process easier. Any advice?
Edit: So... ran into a problem where the urls require authentication to access. This is fine but the problem is that it is two-step authentication, and the second step is a passcode sent to mobile. :-( But thanks for the help!

You can put the links in a list or a file and use requests to get the html, then use BeautifulSoup to find the image you want, extract the src attribute and use requests again to download the file. Both libraries are quite simple to use, you won't have a problem doing that simple script :).
Pseudo-code to help you start:
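import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url_list = ['url1', 'url2']
for i, url in enumerate(url_list):
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    img_element = soup.find('img')
    image_url = urljoin(url, img_element['src'])  # resolve a relative src against the page URL
    with open('image_%d.jpg' % i, 'wb') as f:  # placeholder name; use the person's name instead
        f.write(requests.get(image_url).content)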
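The two bookkeeping steps around that loop can be sketched with the standard library alone: pairing each URL line with the name line below it, and pulling the src of the first <img> tag out of a page. This assumes the file strictly alternates URL and name lines; the names FirstImageFinder, first_image_url and read_url_name_pairs are hypothetical. Note that the URLs in the question have no scheme, so something like "http://" would have to be prepended before fetching them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImageFinder(HTMLParser):
    """Remember the src of the first <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_image_url(html, page_url):
    """Return the absolute URL of the first image on the page, or None."""
    finder = FirstImageFinder()
    finder.feed(html)
    return urljoin(page_url, finder.src) if finder.src else None


def read_url_name_pairs(lines):
    """Pair each URL line with the name line that follows it."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return list(zip(cleaned[0::2], cleaned[1::2]))
```

Each (url, name) pair can then be fetched and the image bytes written to a file called name + ".jpg".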

You can use the extraction module together with the requests module:
pip install requests
pip install extraction
Then:
import extraction
import requests
url = "http://google.com/"
html = requests.get(url).text
extracted = extraction.Extractor().extract(html, source_url=url)
print(extracted.image) # If you know that there is only one image in your page
print(extracted.images) # List of images on page
http://google.com/images/srpr/logo9w.png
['http://google.com/images/srpr/logo9w.png']
Note that source_url is optional in extract, but is recommended, as it makes it possible to rewrite relative URLs and image URLs into absolute paths.
extracted.image is the first item of extracted.images if it exists, or None otherwise.

This is what I ended up doing to bypass the two-step authentication. Note that, for the URLs I had, logging into one of them and ticking the "Remember Me" option at login avoids the login page for the steps below.
Download the "Save images" extension on Firefox. Restart Firefox.
In Tools -> "Save images" -> Options. Go to "Save" tab. In "Folder Options", pick folder to save files. In "File Names", pick "Use file name:". Enter appropriate file name.
Go to "http://tejji.com/ip/open-multiple-urls.aspx" in Firefox (not Chrome).
Copy and paste only the URLs into the textbox. Click "Submit". After all tabs load, close the tejji.com tab.
On the first profile page, right click -> "Save images" -> "Save images from ALL tabs".
Close the Save prompt if everything looks right.
All the images should now be in your designated folder.
All that's left is to rename the files based on the names (the files are numbered in order which coincide with order of names if you kept URLs in same order), but that should be rudimentary.
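The renaming step could look roughly like this. It assumes the saved files carry a number in their names that reflects the tab order and that they all end in .jpg; the exact naming pattern the extension produces is not known here, and rename_in_order is a hypothetical helper.

```python
import os
import re


def rename_in_order(folder, names, extension=".jpg"):
    """Rename the files in folder, taken in numeric order, to '<name><extension>'."""

    def numeric_key(file_name):
        # Sort by the last number in the file name so that 10 comes after 2.
        digits = re.findall(r"\d+", file_name)
        return (int(digits[-1]) if digits else 0, file_name)

    files = sorted(
        (f for f in os.listdir(folder) if f.lower().endswith(extension)),
        key=numeric_key,
    )
    if len(files) != len(names):
        raise ValueError("expected %d files, found %d" % (len(names), len(files)))
    for old, name in zip(files, names):
        os.rename(os.path.join(folder, old), os.path.join(folder, name + extension))
```

The names list would be the name lines from the original file, kept in the same order as the URLs that were pasted into the browser.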

Download all the links(related documents) on a webpage using Python

I have to download a lot of documents from a webpage. They are WMV files, PDFs, BMPs, etc., and all of them have links to them. Each time, I have to right-click a file, select 'Save Link As', and then save it with the type All Files. Is it possible to do this in Python? I searched SO, and people have answered the question of how to get the links from a webpage, but I want to download the actual files. Thanks in advance. (This is not a homework question :)).

Here is an example of how you could download some chosen files from http://pypi.python.org/pypi/xlwt
you will need to install mechanize first: http://wwwsearch.sourceforge.net/mechanize/download.html
import mechanize
from time import sleep

# Make a Browser (think of this as Chrome or Firefox, etc.)
br = mechanize.Browser()
# Visit http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
# for more ways to set up your br browser object, e.g. so it looks like Mozilla,
# and if you need to fill out forms with passwords.

# Open your site
br.open('http://pypi.python.org/pypi/xlwt')
with open("source.html", "wb") as f:
    f.write(br.response().read())  # can be helpful for debugging

filetypes = [".zip", ".exe", ".tar.gz"]  # you will need to do some kind of pattern matching on your files
myfiles = []
for l in br.links():  # you can also iterate through br.forms() to print forms on the page
    for t in filetypes:
        if t in str(l):  # check if this link has the file extension we want (you may choose to use regular expressions)
            myfiles.append(l)

def downloadlink(l):
    # Perhaps open in a better way and ensure that the file doesn't already exist.
    with open(l.text, "wb") as f:
        br.click_link(l)  # see the note below about follow_link
        f.write(br.response().read())
    print(l.text, "has been downloaded")
    # br.back()

for l in myfiles:
    sleep(1)  # throttle so you don't hammer the site
    downloadlink(l)
Note: In some cases you may wish to replace br.click_link(l) with br.follow_link(l). The difference is that click_link returns a Request object whereas follow_link will directly open the link. See Mechanize difference between br.click_link() and br.follow_link()
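For comparison, the link-filtering step can also be sketched with only the standard library, assuming the page HTML has already been fetched into a string. LinkCollector, find_file_links and local_name are hypothetical names, and the extension check looks only at the path part of each URL, so query strings do not get in the way.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_file_links(html, page_url, extensions):
    """Return absolute URLs of links whose path ends with one of the extensions."""
    collector = LinkCollector()
    collector.feed(html)
    wanted = tuple(ext.lower() for ext in extensions)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlsplit(absolute).path.lower().endswith(wanted):
            links.append(absolute)
    return links


def local_name(url):
    """File name to save under: the last segment of the URL path."""
    return os.path.basename(urlsplit(url).path)
```

Each returned URL could then be saved with urllib.request.urlretrieve(url, local_name(url)), with a short sleep between downloads as in the mechanize example.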

See the Python code in the linked question "wget vs urlretrieve of python".
You can also do this very easily with Wget. Try the --level, --recursive and --accept command-line options. For example:
wget --accept wmv,doc --level 2 --recursive http://www.example.com/files/
