Script to download website source to a folder - python

I am trying to learn simple automation. I have set up an Ubuntu Server and I want to configure it to download html source from a specific URL and append to a file in a specified folder on the server every 1 minute.
The URL is just basic html with no CSS whatsoever.
I want to use python but admittedly can use any language. What is a good, simple way to do this?

Jeff's answer works for a one time use.
You could do this to run it repeatedly:
import time
import requests

while True:
    # Fetch the page and append its HTML to the file
    with open('filename.extension', 'a') as fp:
        newHtml = requests.get('url').text
        fp.write(newHtml)
    # Wait a minute before the next request
    time.sleep(60)
You could run this as a background process for as long as you want.
$ python3 script_name.py &
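If you want it to keep running after you log out of the server (an assumption about your setup, not part of the original answer), one option is nohup:
$ nohup python3 script_name.py &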

Just pip install the requests library.
$ pip install requests
Then, it's super easy to get the HTML (put this in a file called get_html.py, or whatever name you like):
import requests
req = requests.get('http://docs.python-requests.org/en/latest/user/quickstart/')
print(req.text)
There are a variety of options for saving the HTML to a directory. For example, you could redirect the output from the above script to a file by calling it like this:
python get_html.py > file.html
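Alternatively, a minimal sketch that writes the file from within the script itself (the output path file.html is a placeholder):
import requests

req = requests.get('http://docs.python-requests.org/en/latest/user/quickstart/')
with open('file.html', 'w') as fp:
    fp.write(req.text)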
Hope this helps

Related

How can I automate downloads using IDM?

I want to automate downloading using selenium python which in turn carries the link to IDM. However, the thing is I can't get to download using IDM.
This is not good practice in Selenium automation.
Whilst it is possible to start a download by clicking a link with a browser under Selenium's control, the API does not expose download progress, making it less than ideal for testing downloaded files. This is because downloading files is not considered an important aspect of emulating user interaction with the web platform. Instead, find the link using Selenium (and any required cookies) and pass it to an HTTP request library like libcurl.
Please refer to the SeleniumHQ site.
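A minimal sketch of that approach in Python (the page URL and the link locator are placeholders, not from the original question):
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com/downloads')

# Let Selenium find the download link on the page
link = driver.find_element(By.CSS_SELECTOR, 'a.download').get_attribute('href')

# Copy the browser's cookies into a requests session
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# Download with the HTTP library instead of the browser
with open('downloaded_file', 'wb') as fp:
    fp.write(session.get(link).content)

driver.quit()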
This is the syntax to run IDM in Python. It will download to the default local path. Use an additional parameter '/p' to change the local path if needed.
from subprocess import run

# Use a raw string so the backslashes in the Windows path are not
# treated as escape sequences
idm_path = r"C:\Program Files (x86)\Internet Download Manager\idman.exe"
url = "example url"
filename = "song.mp3"

# /d = URL to download, /f = local file name, /n = silent mode
run([idm_path, '/d', url, '/f', filename, '/n'])
source: Start IDM download from command line.

Download data from online database in python script

I am trying to download data from UniProt using Python from within a script. If you follow the previous link, you will see a Download button, and then the option of choosing the format of the data. I would like to download the Excel format, compressed. Is there a way to do this within a script?
You can easily see the URL for that if you monitor it in the Firefox "Network" tab or equivalent. For this page it seems to be https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes. You should be able to download it using requests or any similar lib.
Example:
import requests

url = "https://www.uniprot.org/uniprot/?query=*&format=xlsx&force=true&columns=id,entry%20name,reviewed,protein%20names,genes,organism,length&fil=organism:%22Homo%20sapiens%20(Human)%20[9606]%22%20AND%20reviewed:yes&compress=yes"
with open("downloaded.xlsx.gz", "wb") as target:
    target.write(requests.get(url).content)
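For a large export you might stream the response to disk instead of holding it all in memory (a sketch reusing the url variable from above):
import requests

with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("downloaded.xlsx.gz", "wb") as target:
        # Write the body in chunks as it arrives
        for chunk in response.iter_content(chunk_size=8192):
            target.write(chunk)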

Accessing Indeed through Python

My goal for this python code is to create a way to obtain job information into a folder. The first step is being unsuccessful. When running the code I want the URL to print https://www.indeed.com/. However, the code instead returns https://secure.indeed.com/account/login. I am open to using urllib or cookielib to resolve this ongoing issue.
import requests
import urllib

data = {
    'action': 'Login',
    '__email': 'email@gmail.com',
    '__password': 'password',
    'remember': '1',
    'hl': 'en',
    'continue': '/account/view?hl=en',
}
response = requests.get('https://secure.indeed.com/account/login', data=data)
print(response.url)
If you're trying to scrape information from indeed, you should use the selenium library for python.
https://pypi.python.org/pypi/selenium
You can then write your program within the context of a real user browsing the site normally.
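A minimal sketch of that approach (the element IDs here are hypothetical placeholders, not taken from Indeed's actual markup):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://secure.indeed.com/account/login')

# Fill in the login form and submit it, as a real user would
driver.find_element(By.ID, 'login-email-input').send_keys('email@gmail.com')
driver.find_element(By.ID, 'login-password-input').send_keys('password')
driver.find_element(By.ID, 'login-submit-button').click()

print(driver.current_url)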

Downloading a file from a html? url with python 3

I've been searching for hours on how to download a file. The documentation shows how to do this, but Cygwin is an annoyance to use and I'm trying to implement this in Python 3 for a program. I've tried urllib, requests, wget (the Python package), httplib and some others, but they only fetched the redirected page (as you would get if you paste the link in the URL bar with the properly formatted URL).
Though when I inspect the page and trigger the download link that has the same address I tried, it works properly and gives me a download pop-up. Here is an example page; the download is triggered by clicking "Download data".
I don't get how every Python package is unable to send the proper GET request, and why I would need to implement this program on Linux only to be able to use wget.
Does anyone have a clue how to properly call the URL?
You need to add &submit=Download+Data to the end of your URL to download the data. You can see this in the Network tab of Google Chrome's inspect element. Hope I helped!
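For example, a sketch with requests (the base URL is a placeholder for the page's actual download URL):
import requests

# Append the parameter that the page's "Download data" button sends
base_url = 'http://example.com/data?table=results'  # placeholder
response = requests.get(base_url + '&submit=Download+Data')

with open('data.txt', 'wb') as fp:
    fp.write(response.content)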
I think
from subprocess import call

def download(URL):
    CMD = ['curl', URL]
    call(CMD)
to run this:
download('www.download.com/blah/bah/blah')
if you want to use this from the interpreter:
save as module.py
python -i /path/to/module.py
>>> download('www.download.com/blah/bah/blah')
P.S. if this works I'll probably use this in my shell program
EDIT, replying to this comment:
"I tried this and got a 'malformed url' error"
from subprocess import call

def download(FILE, URL):
    # FILE = file to save to
    # URL = download from here
    CMD = ['curl', '-o', FILE, URL]
    call(CMD)
This is what I do for all system commands from Python, so it's something to do with curl specifically.
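A standard-library alternative that avoids shelling out to curl entirely (note that urllib requires an explicit scheme such as http://, unlike a bare www.… address):
from urllib.request import urlretrieve

# Download the URL straight to a local file
urlretrieve('http://www.download.com/blah/bah/blah', 'saved_file')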

Deploy to artifactory via python script

I am trying to create a python script that can deploy an artifact to Artifactory.
I am using Python 3.4 and I want the resulted script to put it through py2exe, so external libraries might create issues.
Through all my research, I found that one way is this, but I don't know how to "translate" it to Python:
curl -X PUT -u user:password --data-binary @/absolute/path/my-utils-2.3.jar "http://localhost/artifactory/my-repo/my/utils/2.3/"
How can I achieve that in Python? Or is there another way to deploy?
Been trying the whole day and I've had some successful testing using the requests library.
import requests

url = "repo/path/test.txt"
file_name = "test.txt"
auth = (USERNAME, PASSWORD)

with open(file_name, 'rb') as fobj:
    res = requests.put(url, auth=auth, data=fobj)

print(res.text)
print(res.status_code)
And py2exe had no issues with it.
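If you also want Artifactory to verify the upload, its deploy API accepts checksum headers on the PUT. A sketch reusing the paths from the question (credentials are placeholders):
import hashlib
import requests

url = "http://localhost/artifactory/my-repo/my/utils/2.3/my-utils-2.3.jar"
auth = ("user", "password")

with open("my-utils-2.3.jar", "rb") as fobj:
    data = fobj.read()

# Artifactory checks the deployed artifact against this checksum
headers = {"X-Checksum-Sha1": hashlib.sha1(data).hexdigest()}
res = requests.put(url, auth=auth, data=data, headers=headers)
print(res.status_code)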
You might want to take a look at Party; either look at how they do it, or just use it directly.
