Get File Upload Time From Server - python

Is there a way, using urllib2 or something else, to check the time a file was uploaded to a URL? Or even the time the file on the server side was last modified?
At the moment I'm manually using urllib2.urlopen() to read data from a url address. The arguments for the address change each day. What I'd like to do is figure out when each file was first available, so that I can pick the best time for the job to automatically run overnight.

The last-modified time is stored on the server and is usually sent to your browser in the HTTP headers. You can access it in JavaScript via the document.lastModified property. Here's a solution in Python that reads the headers, parses out the value with a regular expression, and returns the result.
import re
import urllib2

def get_upload_datetime(myurl):
    # info() returns the HTTP response headers
    info = urllib2.urlopen(myurl).info()
    # Pull the value out of the Last-Modified header line
    match = re.search("Last-Modified: (.+)", str(info))
    if match:
        return match.groups()[0]
If you also need the contents of the page, call .info() and .read() on the same response object (and actually read it only once) to avoid multiple fetches.
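As a side note, the headers object also exposes getheader(), so you can skip the regular expression entirely. A minimal sketch, assuming the server actually sends a Last-Modified header (the URL is a placeholder):

import urllib2

response = urllib2.urlopen("http://example.com/daily-file")  # placeholder URL
# info() returns the response headers; getheader() looks one up by name
last_modified = response.info().getheader("Last-Modified")
print last_modified  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"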
And if you want to do it manually, open the webpage in the browser, open the console (Ctrl+Shift+J) and type javascript:alert(document.lastModified). It should present an alert box with the last-modified time.

Related

Script for automating online tool query

So I had a number of amino acid sequence strings that I wanted to use as input into a tool that studies their interactions with certain components of the human immune system (http://www.cbs.dtu.dk/services/NetMHCcons/).
I wanted to ask what, if any, way there is of accessing the tool, inputting data, and getting the output via a script (R or Python preferably). My main issue is that I have a lot of sequences that need to be queried separately, so I want to automate the whole thing. The website has one field that reads "Submission" which takes the string input. There is another field, "select species/loci", which gives a drop-down menu from which an option needs to be selected. Lastly there's a "submit" button. The output simply loads on the page after hitting submit.
I've tentatively poked around with RSelenium and Rcurl but wanted to ask if there was a more efficient method.
I took a look at what it'd take to send a POST request to this service from Python, and it looks possible:
this form takes in "multipart/form-data" (see: How to send a "multipart/form-data" with requests in python?), you'll need to send your data in this format. You could inspect a request from the browser (using the dev tools) and copy the fields from there as a starting point.
once the form is submitted, it doesn't give you the result right away. You'd need to get your job ID from the response, and then poll the URL: http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi?jobid={your_job_id}&wait=20 until it gives you the result
the result will then need to be downloaded and parsed (see the sketch after these steps)
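A rough sketch of those steps using the requests library. Everything specific here is an assumption: the form field names (SEQPASTE, allele) and the job-ID pattern must be copied from a real request captured in the browser's dev tools:

import re
import time
import requests

SUBMIT_URL = "http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi"
POLL_URL = "http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi?jobid={}&wait=20"

# Field names below are placeholders -- copy the real ones from a request
# captured in the browser's dev tools (Network tab)
fields = {
    "SEQPASTE": (None, ">seq1\nMKTAYIAKQR"),  # the "Submission" textarea
    "allele": (None, "HLA-A02:01"),           # the species/loci drop-down
}
resp = requests.post(SUBMIT_URL, files=fields)  # files= forces multipart/form-data

# The response page embeds a job ID; this pattern is a guess
job_id = re.search(r"jobid=(\w+)", resp.text).group(1)

# Poll until the result page stops reporting the job as queued/running
while True:
    result = requests.get(POLL_URL.format(job_id))
    if "queued" not in result.text and "running" not in result.text:
        break
    time.sleep(20)

print(result.text)  # the raw result page, which still needs parsing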
However, this tool is also available as a portable version for Linux/Mac: https://services.healthtech.dtu.dk/software.php
Perhaps downloading that version would make it easier?
Try this:
Submitting to a web form using python
That link answers how to submit web forms in Python using urllib. Extract the necessary field data from the source code of the page you linked using the re module, then send the request.
Save the HTML source code of http://www.cbs.dtu.dk/services/NetMHCcons/ in the Python file as
source_code = '''...'''
The HTML source can be viewed with Ctrl+U in Firefox.
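As a toy sketch of that idea, you could pull the form's field names out of the saved source with re (the pattern is a guess at typical input/select tags):

import re

source_code = '''...'''  # paste the saved page source here

# Collect the name attribute of every <input> and <select> tag
field_names = re.findall(r'<(?:input|select)[^>]*name="([^"]+)"', source_code)
print(field_names)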

Get all opened websites from Chrome in Python

I am on Windows 8.1, Python 3.6.
Is it possible to get all currently open websites in the latest version of Chrome and save them to a text file in D:/?
I tried opening the file:
C:\Users\username\AppData\Local\Google\Chrome\User Data\Default\Current Tabs
But I receive an error saying that the file is open in another program.
There is another file named History that contains the URLs that were opened, but it also contains characters like NULL.
I tried reading the file in Python but received a UnicodeDecodeError.
I then tried opening the file with the following code:
with open('C:/Users/username/AppData/Local/Google/Chrome/User Data/Default/History', "r+", encoding='latin') as file:
    data = file.read()
    print(data)
And it worked, but I only got 1 or 2 URLs, while in the text file there were no URLs.
Maybe there's another way, something like importing a module.
Something like:
import chrome
url = chrome.get_url()
print(url)
Maybe selenium can also do this. But I don't know how.
Maybe there's another way to read the file with all links in python.
What I want with it is to detect the websites that are open; if mywebsite.com has been open for more than 10 minutes, it will automatically be blocked. The system has its own hosts file:
C:\Windows\System32\drivers\etc\hosts
It will add the following at the end:
127.0.0.1 www.mywebsite.com
And the website will no longer be available to use.
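For reference, that hosts-file edit is only a few lines in Python. A minimal sketch, assuming the script runs with administrator rights and the domain is the placeholder from above:

HOSTS_PATH = r"C:\Windows\System32\drivers\etc\hosts"

def block_site(domain):
    # Appending a loopback entry makes Windows resolve the domain to localhost;
    # this requires running the script as administrator
    with open(HOSTS_PATH, "a") as hosts:
        hosts.write("\n127.0.0.1 {}\n".format(domain))

block_site("www.mywebsite.com")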
You can use this methodology to store the tab data and manipulate it:
windows = driver.window_handles
You can store the windows using the above method.
current_window = driver.current_window_handle
This method will give you the current window that is being handled. You can go through the list 'windows' and check if it is current_window to navigate between the tabs.
driver.switch_to.window(windows[5])
This method will switch to a desired tab, but I assume you already have that part covered.
Now how do you store the time spent after the tabs are opened?
There are two ways to do it:
Internally, by referring to a pandas dataframe or list
Reading and writing to a file.
First, you need to import the 'time' library in the script:
import time
current_time = time.time()
current_time is a float representing the current time as a Unix timestamp.
In either one of these scenarios, you will need a structure such as this:
data = []
for i in range(0, len(windows)):
    # Pair each window handle with the time it was first recorded
    data.append([windows[i], time.time()])
This will give a structure like the one below:
[[windows[0], 1234564879],
 [windows[1], 1234567896], ...]
Here's the part you're missing:
for i in range(0, len(data)):
    # If the current timestamp minus the stored one exceeds 600 seconds (10 minutes)
    if time.time() - data[i][1] > 600:
        driver.switch_to.window(data[i][0])
        driver.close()
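Putting those pieces together, a rough sketch of the whole watcher loop. The driver setup, the 30-second polling interval, and the 10-minute threshold are all assumptions, and chromedriver must be on PATH:

import time
from selenium import webdriver

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
seen = {}  # window handle -> timestamp when the tab was first noticed

while True:
    for handle in driver.window_handles:
        seen.setdefault(handle, time.time())
        # Close any tab that has been open for more than 10 minutes
        if time.time() - seen[handle] > 600:
            driver.switch_to.window(handle)
            driver.close()
            del seen[handle]
    time.sleep(30)  # poll every 30 seconds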
My personal advice is to start with stable API services to get whatever data you want instead of Selenium. I would recommend SerpApi, since I work there. It has a variety of scrapers, including a Google results scraper, and it gives 5000 free calls to new accounts.

Refresh Webpage using python

I have a webpage showing some data, and a Python script that continuously updates the data (fetches it from a database and writes it to the HTML page). It takes about 5 minutes for the script to fetch the data. I have the HTML page set to refresh every 60 seconds using the meta tag. However, I want to change this and have the page refresh as soon as the Python script updates it. So basically, I need to add some code to my Python script that refreshes the HTML page as soon as it's done writing to it.
Is this possible ?
Without diving into complex modern things like WebSockets, there's no way for the server to 'push' a notice to a web browser. What you can do, however, is make the client check for updates in a way that is not visible to the user.
It will involve writing some JavaScript and an extra file. When generating your main webpage, embed a timestamp in the JavaScript (a Unix timestamp will be easiest here). You also write that same timestamp to a file on the web server (let's call it updatetime.txt). Using an AJAX request on the page, you pull in updatetime.txt and see if the number in the file is bigger than the number stored when the document was generated; refresh the page if you see an updated time. You can alter how 'instantly' the changes get noticed by controlling how quickly you poll.
I won't go into too much detail on writing the code, but I'd probably just use $.ajax() from jQuery (even though it's sort of overkill for one function) to make the calls. The trick to putting something on a timer in JS is setInterval. You should be able to find plenty of documentation on using both of them already written.
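A minimal sketch of the Python side of that idea. The file names, the 5-second polling interval, and the jQuery snippet embedded in the generated page are all illustrative assumptions:

import time

def write_page(body_html):
    stamp = int(time.time())
    # The page stores the timestamp it was generated with, polls
    # updatetime.txt every 5 seconds, and reloads when the file is newer
    page = """<html><body>%s
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script>
var generated = %d;
setInterval(function() {
    $.ajax({url: "updatetime.txt", cache: false}).done(function(txt) {
        if (parseInt(txt, 10) > generated) { location.reload(); }
    });
}, 5000);
</script></body></html>""" % (body_html, stamp)
    with open("index.html", "w") as f:
        f.write(page)
    with open("updatetime.txt", "w") as f:
        f.write(str(stamp))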

telnetlib | read all output when I have to press space if doing it manually

As you know, sometimes you have to press space to get the next page over a telnet connection to a Unix host. For instance, when you 'more' a text file, you can't get all the content at one time; pressing 'space' takes you to the next page.
Here is the problem: what should I do when using telnetlib in Python? I have to get all the output. Posting code here would be better. Thanks!
Instead of using more(1) or less(1) to view a file, use cat(1). It will not perform any pagination tasks and will write all the content of the file to the terminal, raw.
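A minimal telnetlib sketch of that approach. The host, credentials, file path, and prompt string are all placeholders, and matching the shell prompt is the fragile part:

import telnetlib

HOST = "example.com"   # placeholder host
PROMPT = b"$ "         # shell prompt to wait for -- adjust for your server

tn = telnetlib.Telnet(HOST)
tn.read_until(b"login: ")
tn.write(b"myuser\n")
tn.read_until(b"Password: ")
tn.write(b"mypassword\n")
tn.read_until(PROMPT)

# cat(1) writes the whole file with no pagination, so no space bar is needed
tn.write(b"cat /path/to/file.txt\n")
output = tn.read_until(PROMPT)  # everything up to the next prompt
print(output.decode("ascii", "replace"))

tn.write(b"exit\n")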

Parsing lines from a live streaming website in Python

I'm trying to read in info that is constantly changing from a website.
For example, say I wanted to read in the artist name that is playing on an online radio site.
I can grab the current artist's name but when the song changes, the HTML updates itself and I've already opened the file via:
f = urllib.urlopen("SITE")
So I can't see the updated artist name for the new song.
Can I keep closing and opening the URL in a while(1) loop to get the updated HTML code or is there a better way to do this? Thanks!
You'll have to periodically re-download the website. Don't do it constantly because that will be too hard on the server.
This is because HTTP, by nature, is not a streaming protocol. Once you connect to the server, it expects you to throw an HTTP request at it, then it will throw an HTTP response back at you containing the page. If your initial request is keep-alive (the default as of HTTP/1.1), you can throw the same request again and get the page up to date.
What would I recommend? Depending on your needs, fetch the page every n seconds and extract the data you need. If the site provides an API, you can possibly capitalize on that. Also, if it's your own site, you might be able to implement comet-style Ajax over HTTP and get a true stream.
Also note that if it's someone else's page, the site may use Ajax via JavaScript to keep itself up to date; this means there are other requests causing the update, and you may need to dissect the website to figure out what requests you need to make to get the data.
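A minimal sketch of that polling loop, using the same urllib call as the question. The URL, the 30-second interval, and the artist-extraction regex are placeholder assumptions:

import re
import time
import urllib

last_artist = None
while True:
    html = urllib.urlopen("http://example-radio.com/now-playing").read()
    # This pattern is made up -- inspect the real page to write yours
    match = re.search(r'<span class="artist">(.*?)</span>', html)
    if match and match.group(1) != last_artist:
        last_artist = match.group(1)
        print "Now playing:", last_artist
    time.sleep(30)  # be gentle on the server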
If you use urllib2, you can read the headers when you make the request and send an If-Modified-Since header on subsequent requests. If the server responds with "304 Not Modified", the content hasn't changed since your last fetch.
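A sketch of that conditional GET with urllib2; the URL is a placeholder, and it assumes the server honors If-Modified-Since (urllib2 surfaces the 304 as an HTTPError):

import urllib2

url = "http://example-radio.com/now-playing"  # placeholder
last_modified = None

def fetch_if_changed():
    global last_modified
    req = urllib2.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        resp = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        if e.code == 304:
            return None  # nothing new since the last fetch
        raise
    last_modified = resp.info().getheader("Last-Modified")
    return resp.read()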
Yes, this is the correct approach. To see changes on the web, you have to send a new query each time; live AJAX sites do exactly the same thing internally.
Some sites provide an additional API, including long polling. Look for documentation on the site, or ask their developers whether there is one.
