Swagger - Does not support downloading of multiple files? - python

I am trying to create a response that returns 3 files in a single request, but I am not sure whether this can be achieved through the response body. I am generating the response body with Python's MultipartEncoder, as follows:
[ response body ]
(Note: the boundary is generated as well)
--dd7457a7dc684f32b2fd26ec468ed4b8
Content-Disposition: form-data; name=file1; filename="test1"
Content-Type: application/octet-stream
test1 sample
--dd7457a7dc684f32b2fd26ec468ed4b8
Content-Disposition: form-data; name=file2; filename="test2"
Content-Type: application/octet-stream
test2 sample
--dd7457a7dc684f32b2fd26ec468ed4b8
Content-Disposition: form-data; name=file3; filename="test3"
Content-Type: application/octet-stream
test3 sample
--dd7457a7dc684f32b2fd26ec468ed4b8--
With the body as above, I set the following header:
response.headers["Content-Type"] = 'multipart/form-data'
I know that swagger-ui.js creates a download link using the FileAPI's Blob library. Is it possible, using the Blob library, to download the three files via three separate download links, or via a single download link?
I already know that I could consolidate the files into a tar or zip archive and download that, or return them in JSON format; I would like to know whether there is another way.
[version]
swagger-ui 2.2.10
Python 3.4.4
flask 0.10.1
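For reference, a stdlib-only sketch of how a body with the layout shown above could be generated (build_multipart is a hypothetical helper written for illustration, not part of any library):

```python
import uuid

def build_multipart(files):
    """Build a multipart/form-data body from {name: (filename, bytes)}.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, (filename, data) in files.items():
        chunks.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name={name}; filename="{filename}"\r\n'
             f'Content-Type: application/octet-stream\r\n\r\n').encode()
            + data + b'\r\n'
        )
    # closing boundary marker
    chunks.append(f'--{boundary}--\r\n'.encode())
    return b''.join(chunks), f'multipart/form-data; boundary={boundary}'

body, ctype = build_multipart({
    'file1': ('test1', b'test1 sample'),
    'file2': ('test2', b'test2 sample'),
    'file3': ('test3', b'test3 sample'),
})
```

Note that the Content-Type header sent to the client should include the boundary parameter (as ctype does here); a bare 'multipart/form-data' leaves the consumer unable to split the parts.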

Related

Extracting file name from url when its name is not in url

So I wanted to create a download manager that can download multiple files automatically. However, I had a problem with extracting the name of the downloaded file from the URL. I tried an answer to How to extract a filename from a URL and append a word to it?, more specifically
from urllib.parse import urlparse
import os

a = urlparse(URL)
file = os.path.basename(a.path)
but all of them, including the one shown, break when you have a URL such as
URL = 'https://calibre-ebook.com/dist/win64'
Downloading it in Microsoft Edge gives you a file named calibre-64bit-6.5.0.msi, but downloading it with Python and using the method from the other question to extract the file name gives you win64 instead, whereas calibre-64bit-6.5.0.msi is the intended file name.
The URL https://calibre-ebook.com/dist/win64 is an HTTP 302 redirect to another URL, https://download.calibre-ebook.com/6.5.0/calibre-64bit-6.5.0.msi. You can see this by making a HEAD request, for example in a macOS/Linux terminal (note the 302 status and the location header):
$ curl --head https://calibre-ebook.com/dist/win64
HTTP/2 302
server: nginx
date: Wed, 21 Sep 2022 16:54:49 GMT
content-type: text/html
content-length: 138
location: https://download.calibre-ebook.com/6.5.0/calibre-64bit-6.5.0.msi
The browser follows the HTTP redirect and downloads the file, naming it based on the final URL. If you'd like to do the same in Python, you also need to resolve the final URL and use that as the file name. The requests library may or may not follow these redirects depending on the version, so it's better to explicitly pass allow_redirects=True.
With requests==2.28.1 this code returns the last URL:
import requests
requests.head('https://calibre-ebook.com/dist/win64', allow_redirects=True).url
# 'https://download.calibre-ebook.com/6.5.0/calibre-64bit-6.5.0.msi'
If you'd like to solve it with built-in modules, so you won't need to install external libs like requests, you can achieve the same with urllib:
import urllib.request
opener = urllib.request.build_opener()
opener.open('https://calibre-ebook.com/dist/win64').geturl()
# 'https://download.calibre-ebook.com/6.5.0/calibre-64bit-6.5.0.msi'
Then you can split the last URL by / and take the last section as the file name, for example:
import urllib.request
opener = urllib.request.build_opener()
url = opener.open('https://calibre-ebook.com/dist/win64').geturl()
url.split('/')[-1]
# 'calibre-64bit-6.5.0.msi'
I was using urllib3==1.26.12, requests==2.28.1, and Python 3.8.9 in these examples; if you are using much older versions, they might behave differently and might need extra flags to follow redirects.
The URL results in a 302 redirect, so the URL alone doesn't contain enough information to get that basename. You have to get the final URL from the 302 response.
import requests
resp = requests.head("https://calibre-ebook.com/dist/win64")
print(resp.status_code, resp.headers['location'])
# 302 https://download.calibre-ebook.com/6.5.0/calibre-64bit-6.5.0.msi
You'd obviously want more intelligent handling in case it's not a 302, and you'd want to loop in case the new URL results in another redirect.
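A minimal sketch of such a loop, assuming requests is installed (resolve_url and the hop limit are made up for illustration):

```python
import requests

def resolve_url(url, max_hops=10):
    """Follow redirects manually, one HEAD request per hop, returning the final URL."""
    for _ in range(max_hops):
        resp = requests.head(url, allow_redirects=False)
        location = resp.headers.get('location')
        if resp.status_code in (301, 302, 303, 307, 308) and location:
            url = location  # hop to the redirect target
        else:
            return url      # not a redirect: this is the final URL
    raise RuntimeError('too many redirects')
```

This is essentially what allow_redirects=True does internally, but doing it by hand lets you inspect or cap each hop.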

How do I limit request data being sent from server?

I'm trying to make a lot of requests, but I only need part of the data near the start of the page's HTML. Since I'm currently requesting the whole webpage each time, looping this takes a lot of network usage. Can I request only a section of a website's HTML with any module?
If you know the specific number of bytes that is enough, then you can request a partial "range" of the resource:
curl -q http://www.example.com -i -H "Range: bytes=0-50"
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Age: 506953
Cache-Control: max-age=604800
Content-Range: bytes 0-50/1256
...
Content-Length: 51
<!doctype html>
<html>
<head>
<title>Example Do%
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests
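The same Range header can be sent from Python with the stdlib urllib (fetch_prefix is a hypothetical helper; a server that ignores Range will answer 200 with the full body, so the status is worth checking):

```python
import urllib.request

def fetch_prefix(url, nbytes):
    """Fetch only the first nbytes of a resource via an HTTP Range request."""
    req = urllib.request.Request(url, headers={'Range': f'bytes=0-{nbytes - 1}'})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the range;
        # 200 would mean it ignored the header and sent everything.
        data = resp.read()
    return resp.status, data
```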

keep filename while uploading an url with python response library

I am using python to upload a file to an api with
url = 'http://domain.tld/api/upload'
files = {'file': open('image.jpg', 'rb')}
r = requests.post(url, files=files)
this works well and my file is uploaded to the server as image.jpg. Now I don't have a local file but a URI instead, so I changed my code to:
url = 'http://domain.tld/api/upload'
files = {'file': urlopen('http://domain.tld/path/to/image.jpg')}
r = requests.post(url, files=files)
the image is also uploaded successfully, but it does not preserve its name and is stored as 'file' (without an extension). My question is: how can I upload from a URL while keeping its filename (without downloading it first, of course)?
You can pass the name:
files = {'name': ('image.jpg', urlopen('http://domain.tld/path/to/image.jpg'))}
If you look at the POST body, you will see this for your variation:
Content-Disposition: form-data; name="file"; filename="file"
And using the code above:
Content-Disposition: form-data; name="name"; filename="image.jpg"
You can see the name is retained in the latter.
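To avoid hard-coding the name, it can also be derived from the URL path (filename_from_url is a made-up helper; this only works when the URL path actually ends in the filename, not when the server redirects to a differently named file):

```python
import os
from urllib.parse import urlparse

def filename_from_url(url):
    """Derive an upload filename from the last segment of the URL path."""
    return os.path.basename(urlparse(url).path)

# Usage with requests (src being a hypothetical image URL):
# files = {'file': (filename_from_url(src), urlopen(src))}
```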

HTML: Get direct link to file from embed src

I want to know how to get the direct link to an embedded video (the link to the .flv/.mp4 or whatever file) from just the embed link.
For example, http://www.kumby.com/ano-hana-episode-1/ has
<embed src="http://www.4shared.com/embed/571660264/396a46be"></embed>
, though the link to the video seems to be
"http://dc436.4shared.com/img/571660264/396a46be/dlink__2Fdownload_2FM2b0O5Rr_3Ftsid_3D20120514-093834-29c48ef9/preview.flv"
How does the browser know where to load the video from? How can I write code that converts the embed link to a direct link?
UPDATE:
Thanks for the quick answer, Quentin.
However, I don't seem to receive a 'Location' header when connecting to "http://www.4shared.com/embed/571660264/396a46be".
import urllib2
r = urllib2.urlopen('http://www.4shared.com/embed/571660264/396a46be')
gives me the following headers:
'content-length', 'via', 'x-cache', 'accept-ranges', 'server', 'x-cache-lookup', 'last-modified', 'connection', 'etag', 'date', 'content-type', 'x-jsl'
from urllib2 import Request
r = Request('http://www.4shared.com/embed/571660264/396a46be')
gives me no headers at all.
The server issues a 302 HTTP status code and a Location header.
$ curl -I http://www.4shared.com/embed/571660264/396a46be
HTTP/1.1 302 Moved Temporarily
Server: Apache-Coyote/1.1
(snip cookies)
Location: http://static.4shared.com/flash/player/5.6/player.swf?file=http://dc436.4shared.com/img/M2b0O5Rr/gg_Ano_Hi_Mita_Hana_no_Namae_o.flv&provider=image&image=http://dc436.4shared.com/img/M2b0O5Rr/gg_Ano_Hi_Mita_Hana_no_Namae_o.flv&displayclick=link&link=http://www.4shared.com/video/M2b0O5Rr/gg_Ano_Hi_Mita_Hana_no_Namae_o.html&controlbar=none
Content-Length: 0
Date: Mon, 14 May 2012 10:01:59 GMT
See How do I prevent Python's urllib(2) from following a redirect if you want to get information about the redirect response instead of following the redirect automatically.
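With Python 3's urllib, one way to see the 302 itself is an opener whose redirect handler returns the response instead of following it (NoRedirect is a hypothetical name; the recipe subclasses the standard HTTPRedirectHandler):

```python
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Return redirect responses as-is instead of following them."""
    def http_error_302(self, req, fp, code, msg, headers):
        return fp
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib.request.build_opener(NoRedirect)
# resp = opener.open('http://www.4shared.com/embed/571660264/396a46be')
# resp.status              -> 302 (if the server still redirects)
# resp.headers['Location'] -> the player/video URL
```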

Download a URL only if it is a HTML Webpage

I want to write a Python script which downloads a web page only if the page contains HTML. I know that the Content-Type header can be used for this. Please suggest a way to do it, as I am unable to find a way to get the headers before downloading the file.
Use http.client to send a HEAD request to the URL. This returns only the headers for the resource; you can then look at the Content-Type header and see if it is text/html. If it is, send a GET request to the URL to get the body.
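A sketch of that flow with http.client (is_html is a hypothetical helper; it assumes a well-formed http/https URL):

```python
import http.client
from urllib.parse import urlparse

def is_html(url):
    """HEAD the URL and report whether its Content-Type is text/html."""
    parts = urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == 'https'
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc)
    try:
        conn.request('HEAD', parts.path or '/')
        ctype = conn.getresponse().getheader('Content-Type', '')
    finally:
        conn.close()
    # startswith ignores any charset parameter, e.g. 'text/html; charset=utf-8'
    return ctype.startswith('text/html')
```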
