download file link produces 404 page - python

I am trying to create a link that allows a user to download a zip file that's been generated earlier in the python script. The script then writes an HTML link to a web page. The user should be able to click the link and download their zip file.
import os,sys
downloadZip = ("http://<server>/folder/structure/here/" + zipFileName + ".zip")
print """<h3><a href="{}" download>Download zip file</a></h3>""".format(downloadZip)
The result is a link that when clicked opens a 404 page. I've noticed that on that page, it displays
Physical Path C:\inetpub\wwwroot\inputted\path\here\file.zip
I am testing this on the same server the processing is occurring on. I wouldn't think that should make a difference, but here I am. The end result should be a zip file downloaded to the user's pc.

Not sure if this would be helpful or not but I have noticed that some 'server package apps' disallow the execution/download of certain filetypes. I had a similar thing happen years ago.
To test if this is the case, create a new folder in your web directory and add an index.html page with some random writing (to identify that you have the correct page). Quickly try to access this page.
Next create a .zip file, put it in the same folder as the index.html file you just created, and add a Download link on the index.html page.
Now revisit the page and try to download the file you created. If it works, the problem is elsewhere; if it doesn't, whatever server package application you are using has probably set Apache to block .zip files by default. Hope this helps buddy :)
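The check above can also be run from a script. This is only a minimal sketch: the helper name check_url and the URLs in the usage note are hypothetical, and it uses nothing beyond the standard library's urllib. It returns the status code the server sends back for a URL.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return the HTTP status code the server sends for `url`."""
    # HEAD asks for the headers only, so the file itself is not transferred.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want here.
        return err.code
```

For example, calling check_url("http://<server>/test/index.html") and then check_url("http://<server>/test/test.zip") and getting 200 for the first but 404 for the second would suggest the server is refusing that file type rather than the path being wrong.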

Related

Using selenium in python to download files based on file names from Ambari (HDFS)

I want to be able to download all csv files in a specified path within Ambari file viewer that have a certain file name.
That is: open the Ambari file viewer in Google Chrome, log in with a username and password, navigate to a specified folder in Ambari, download all relevant csv files based on file name using wildcards (e.g. file__20191231.csv), and place the files in a specified Windows folder.
Seems very doable, I'm not exactly sure what your question is though and I'm not familiar with Ambari. To tackle a project like this I suggest the following steps:
Step 1: Research Selenium and practice things like logging into a social media or another web account.
Step 2: Specifically look at the section on identifying items by id, class, and xpath. Check the HTML of Ambari and see if the ids or classes seem reliable for the elements you need to interact with (i.e. the username and password fields). Use xpath if you must.
Step 3: Find the column/container that the files are displayed in and create a loop to pull the text out of each container. Add page pagination if needed.
Step 4: Use Python to read each text as it is visited. If it contains the substring you want, ask Selenium to right-click the element it just visited and download it (or click the corresponding download button if one is available).
Step 5: Move the file from your downloads folder to your desired folder on your machine with os and shutil; there's another thread about doing this.
P.S. You'll need a compatible chromedriver.exe to run Selenium with Chrome. Again, view the Selenium docs to learn more about python-selenium and setting it up.
Hope this helps
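The filtering and moving parts of steps 4 and 5 don't need Selenium at all, so they can be sketched on their own. This is a rough sketch, not a full solution: the function names and the wildcard pattern are made up for illustration, and it assumes the files have already landed in the downloads folder.

```python
import fnmatch
import shutil
from pathlib import Path


def matching_names(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch.fnmatch(name, pattern)]


def move_matching(download_dir, target_dir, pattern):
    """Move files matching `pattern` from download_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(download_dir).iterdir()):
        if path.is_file() and fnmatch.fnmatch(path.name, pattern):
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved
```

matching_names can be fed the text pulled out of each container in step 3 to decide which entries to click, and move_matching can run once the downloads have finished. Note that fnmatch follows the operating system's case rules, so matching is case-insensitive on Windows.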

Quick and dirty way to access files external to Django?

I'm working on a demo for a program that creates some files on its own directory. This demo will be shown to someone physically far, via VPN, so I made a simple django project just to receive an input, call some scripts and display the output - the generated file. However, I don't have permission to open the file to display it since it's on a directory outside of the django project (the result is a permission denied error).
I'm aware it's not good practice or even safe for a web server to have access to files outside of its directories, but since this will run in a closed environment for a short amount of time only, is there a workaround?
Think of it this way: if the web server can generate the files, it can display them too.
As for your answer: if you know the path of the file, use Python's built-in open function to read the file, then pass the contents to a template.
with open('file_path', encoding='utf-8') as f:
    data = f.read()
return render(request, template, context={'data': data})

Is there a way to change the filename of a file, when starting to download a file using Selenium?

I'd like to download multiple files from a single website, but the biggest quirk I have is that the server automatically generates a random filename upon requesting the file to download. The issue here is then I won't know which file is which, without having to manually go through each file. However, on the site that has the links to download the files, they all have a name. For example...
File name -> Resultant file name(fake file names)
Week1.pdf 2asd123e.pdf
Week1_1.jpg dsfgp142.jpg
.
.
Week10.pdf 19fgmo2o.pdf
Week11.pdf 0we5984w.pdf
If I were to download them manually, I would click "download" and a "Save as" popup would come up, which gives me the option to change the file name before clicking OK to confirm and start the download.
Currently, my code opens the website, logs into my account, goes to the files page, and then finds a file along with its corresponding server request link. I am able to store the name of the file, "Week1.pdf", in a variable and click on the request link, but the only problem is that the Save as menu doesn't let me change the filename; it only gives me the option to view the file or save it immediately. I've looked around a little and tried to play around with the Firefox profile settings, but nothing has worked. How would I go about solving this problem?
Thanks
I can think of a few things that you might try...
After the file is saved, look in the downloads folder for the most recently saved file (with the correct extension) using time stamps. This will probably be OK as long as you aren't running this threaded.
Get the list of files in the download directory, download the file, find the file that doesn't exist in the list of files. Again this should be safe unless you are running this threaded.
Create a new folder, set the download directory to the newly created folder, download the file. It should be the only file in that directory. As far as I know, you can only set the download directory before creating the driver instance.
In each of these cases, if you plan to download multiple files I would rename each file as you download them or move them into some known directory to make it easier on yourself.
There's another method I ran across in another answer.
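The second option (compare the directory listing before and after) could look roughly like this. It's a sketch under a few assumptions: the function names are mine, the download directory is already known, and in-progress downloads are recognised only by the .part (Firefox) and .crdownload (Chrome) suffixes.

```python
import os
import time

# Suffixes browsers use for files that are still downloading.
TEMP_SUFFIXES = (".part", ".crdownload")


def wait_for_new_file(download_dir, before, timeout=30.0, poll=0.5):
    """Return the name of a file that appeared in download_dir since `before`."""
    deadline = time.monotonic() + timeout
    while True:
        new = set(os.listdir(download_dir)) - set(before)
        in_progress = [name for name in new if name.endswith(TEMP_SUFFIXES)]
        # Only report a file once no temporary download file is left.
        if new and not in_progress:
            return sorted(new)[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in %s" % download_dir)
        time.sleep(poll)


def rename_download(download_dir, before, wanted_name, timeout=30.0, poll=0.5):
    """Rename the newly downloaded file to `wanted_name` and return that name."""
    new_name = wait_for_new_file(download_dir, before, timeout, poll)
    os.replace(os.path.join(download_dir, new_name),
               os.path.join(download_dir, wanted_name))
    return wanted_name
```

Take before = os.listdir(download_dir) right before clicking the link, click, then call rename_download(download_dir, before, "Week1.pdf"). As with the options above, this is only safe if one download runs at a time.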

Python write web scraper to get pdf files without several logins

How can I download all the pdf files (or files with a specific extension, like .tif or .pdf) from a webpage that requires login? I don't want to log in every time for every pdf, so I can't use a scheme that generates links and pushes them to the browser.
The solution was simple; I'm just posting it for others who may have the same question:
mydriver.get("https://username:password@www.somewebsite.com/somelink")

Python getting files from sourceforge

Hello I am making a program, and I need it to download an exe from sourceforge. I have the download link which leads it to the "wait 5 seconds thing". How can I download the file from it, and save it to the cwd?
The HTML file you get contains a refresh link:
<meta http-equiv="refresh" content="5; url=https://downloads.sourceforge.net/project/...">
You can search the HTML document for that element, extract the url, then download that.
However, remember to respect the robots.txt file. I.e. have a delay of at least one second between requests and do not try to download disallowed paths.
Edit: Actually, the downloads subdomain has its own robots.txt that prohibits all automated downloads, so you should not do this. You could e.g. open a link in the user's web browser instead.
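For the "search the HTML document for that element" part, here is a small sketch using only the standard library's html.parser. The class and function names are hypothetical, and it assumes the content attribute has the usual "seconds; url=..." shape. It only extracts the URL; given the edit above, the result is better handed to the user's browser than downloaded by the script.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Collect the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh_url is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # content is expected to look like "5; url=https://..."
        content = attributes.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.refresh_url = value.strip().strip("'\"")


def find_refresh_url(html_text):
    """Return the refresh URL found in html_text, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html_text)
    return parser.refresh_url
```

With the URL in hand, webbrowser.open(url) from the standard library opens it in the user's default browser, which is the route suggested in the edit.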
