I'm trying to download a few files using RoboBrowser, urllib, or any other Python library, but I couldn't find a way to make it work.
Basically, I have a form that returns a .csv file when it is submitted, but I couldn't find any way to start this download.
I have submitted the form using RoboBrowser and a plain POST request, but I couldn't reach the file:
form = browser.get_form(action=re.compile(r'downloadForm'))
form["d_screen_file"].value = "1"
browser.submit_form(form, submit=form['download'])
or
action = browser.find('form', id='fx_form').get('action')
requests.post(action)
Is there another way to submit this form or make this request so that the download starts?
I figured out how to make it work:
Using requests, I make the POST with stream=True:
f = session.post(FormRequest, data=search_data, stream=True)
After that, I create a CSV file to receive the data and use a for loop with iter_content to write the response to the file in chunks:
with open("file.csv", 'wb') as s:
for chunk in f.iter_content(chunk_size=1024):
s.write(chunk)
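For reference, here is a minimal, self-contained sketch of the same approach; the form URL and field names below are hypothetical placeholders, not the real values from the site:
import requests

# Hypothetical form action URL and form fields -- replace with the real ones.
FORM_URL = "https://example.com/downloadForm"
search_data = {"d_screen_file": "1"}

session = requests.Session()
response = session.post(FORM_URL, data=search_data, stream=True)
response.raise_for_status()

# Stream the CSV to disk in 1 KiB chunks instead of loading it all into memory.
with open("file.csv", "wb") as out_file:
    for chunk in response.iter_content(chunk_size=1024):
        out_file.write(chunk)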
I need to open the page automatically and download the file returned by the server.
I have a simple piece of code that opens the page and downloads the content. I am also pulling the headers so I know the name of the returned file. Below is the code:
downloadPageRequest = self.reqSession.get(self.url_file, stream=True)
headers = downloadPageRequest.headers
if 'content-disposition' in headers:
    file_name = re.findall("filename=(.+)", headers['content-disposition'])
That's what I have; it returns a list with the filename, but now I am stuck and have no idea how to open and work through the returned Excel file.
This has to be done using requests, which is why I cannot use any other method (e.g. Selenium).
I will be thankful for your support.
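A minimal sketch of one way to continue from here, assuming the response body is an .xlsx workbook and that openpyxl is installed; the URL below is a hypothetical stand-in for self.url_file in the snippet above:
import re
import requests
import openpyxl

# Hypothetical URL standing in for self.url_file above.
url_file = "https://example.com/export"

session = requests.Session()
response = session.get(url_file, stream=True)
response.raise_for_status()

# Fall back to a default name if no content-disposition header is present.
file_name = "download.xlsx"
disposition = response.headers.get('content-disposition', '')
match = re.findall('filename=(.+)', disposition)
if match:
    file_name = match[0].strip('"')

# Stream the body to disk, then open it as a workbook with openpyxl.
with open(file_name, 'wb') as out_file:
    for chunk in response.iter_content(chunk_size=8192):
        out_file.write(chunk)

workbook = openpyxl.load_workbook(file_name)
for row in workbook.active.iter_rows(values_only=True):
    print(row)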
I am trying to upload a file using the Python requests module and I am not sure whether we can use both data and files in the POST call.
fileobj = open(filename, 'rb')
upload_data = {
    'data': payload,
    'file': fileobj
}
resp = s.post(upload_url, data=upload_data, headers=upload_headers)
This is not working. Can anyone help me with this?
I think you should be using the data and files keyword parameters in the post request to send the data and file respectively.
with open(filename, 'rb') as fileobj:
    files = {'file': fileobj}
    resp = s.post(upload_url, data=payload, files=files, headers=upload_headers)
I've also used a context manager, because it closes the file for me and takes care of exceptions that happen either while opening the file or during the requests post itself.
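If you also need to control the filename and content type that end up in the multipart body, requests accepts a (filename, file object, content type) tuple for each entry in files; a small sketch with hypothetical names:
import requests

# upload_url, payload and upload_headers are hypothetical stand-ins for the question's variables.
upload_url = "https://example.com/upload"
payload = {"description": "monthly report"}
upload_headers = {}

with open("report.csv", "rb") as fileobj:
    # requests builds the multipart file part from this tuple.
    files = {"file": ("report.csv", fileobj, "text/csv")}
    resp = requests.post(upload_url, data=payload, files=files, headers=upload_headers)

print(resp.status_code)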
I have links of the form:
http://youtubeinmp3.com/fetch/?video=LINK_TO_YOUTUBE_VIDEO_HERE
If you put links of this type in an <a> tag on a webpage, clicking them will download an MP3 of the YouTube video at the end of the link. Source is here.
I'd like to mimic this process from the command line by making POST requests (or something of that sort), but I'm not sure how to do it in Python! Can I get any advice, please, or is this more difficult than I'm making it out to be?
As Mark Ma mentioned, you can get it done without leaving the standard library by utilizing urllib2. I like to use Requests, so I cooked this up:
import os
import requests

dump_directory = os.path.join(os.getcwd(), 'mp3')
os.makedirs(dump_directory, exist_ok=True)

def dump_mp3_for(resource):
    payload = {
        'api': 'advanced',
        'format': 'JSON',
        'video': resource
    }
    initial_request = requests.get('http://youtubeinmp3.com/fetch/', params=payload)
    if initial_request.status_code == 200:  # good to go
        download_mp3_at(initial_request)

def download_mp3_at(initial_request):
    j = initial_request.json()
    filename = '{0}.mp3'.format(j['title'])
    r = requests.get(j['link'], stream=True)
    with open(os.path.join(dump_directory, filename), 'wb') as f:
        print('Dumping "{0}"...'.format(filename))
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
It's then trivial to iterate over a list of YouTube video links and pass them into dump_mp3_for() one-by-one.
for video in ['http://www.youtube.com/watch?v=i62Zjga8JOM']:
    dump_mp3_for(video)
In its API docs, the site provides a version of the URL that returns the download link as JSON: http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM
OK, then we can use urllib2 to call the API and fetch the result, deserialize it with json.loads(), and download the mp3 file using urllib2 again.
import urllib2
import json
r = urllib2.urlopen('http://youtubeinmp3.com/fetch/?api=advanced&format=JSON&video=http://www.youtube.com/watch?v=i62Zjga8JOM')
content = r.read()
# extract download link
download_url = json.loads(content)['link']
download_content = urllib2.urlopen(download_url).read()
# save downloaded content to file
f = open('test.mp3', 'wb')
f.write(download_content)
f.close()
Note that the file should be opened in mode 'wb'; otherwise the mp3 file cannot be played correctly.
If the file is big, downloading will be a time-consuming process. Here is a post that describes how to display download progress in a GUI (PySide).
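For a plain console script, here is a minimal sketch (same urllib2 approach and API URL as above) that reads the file in chunks and prints a rough progress percentage, assuming the server sends a Content-Length header:
import json
import urllib2

# Fetch the download link from the JSON API, as in the snippet above.
api_url = ('http://youtubeinmp3.com/fetch/?api=advanced&format=JSON'
           '&video=http://www.youtube.com/watch?v=i62Zjga8JOM')
download_url = json.loads(urllib2.urlopen(api_url).read())['link']

response = urllib2.urlopen(download_url)
total = int(response.info().get('Content-Length', '0'))
downloaded = 0

# Read in 8 KiB chunks and report progress as we go.
with open('test.mp3', 'wb') as f:
    while True:
        chunk = response.read(8192)
        if not chunk:
            break
        f.write(chunk)
        downloaded += len(chunk)
        if total:
            print('Downloaded %d of %d bytes (%.1f%%)' % (downloaded, total, 100.0 * downloaded / total))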
If you want to download a video, or just its audio, from YouTube, you can use the pytube module; it does all the hard work.
You can also list the audio-only streams:
from pytube import YouTube
# initialize a YouTube object by the url
yt = YouTube("YOUTUBE_URL")
# that will get all audio files available
audio_list = yt.streams.filter(only_audio=True).all()
print(audio_list)
And then download it:
# that will download the file to current working directory
yt.streams.filter(only_audio=True)[0].download()
Complete Code:
from pytube import YouTube

yt = YouTube("YOUTUBE_URL")
audio = yt.streams.filter(only_audio=True).first()
audio.download()
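If you want the file somewhere other than the current working directory, recent pytube versions let you pass an output path to download(); a small example, with "downloads" as a hypothetical folder:
from pytube import YouTube

yt = YouTube("YOUTUBE_URL")
audio = yt.streams.filter(only_audio=True).first()

# "downloads" is just an example folder; download() returns the saved file path.
saved_path = audio.download(output_path="downloads")
print(saved_path)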
I'm writing a script in Python and I'm trying to wrap my head around a problem. I have a URL that, when opened, downloads a document. I'm trying to write a Python script that opens that https URL to download the document and automatically sends the document to a server I have opened using Python's pysftp module.
I can't wrap my head around how to do this... Do you think I'd be able to just do:
server.put(urllib.open('https://......./document'))
EDIT:
This is the code I've tried before; the above doesn't work...
download_file = urllib2.urlopen('https://somewebsite.com/file.csv')
file_contents = download_file.read().replace('"', '')
columns = [x.strip() for x in file_contents.split(',')]

# Write Downloaded File Contents To New CSV File
with open('file.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(columns)

# Upload New File To Server
srv.put('./file.csv', './SERVERFOLDER/file.csv')
ALSO:
How would I go about getting a file that is one day old from the server (i.e. examining the age of each file), using paramiko?
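For that second part, a minimal sketch of how file age can be checked over SFTP with paramiko; the host, credentials, and remote folder below are hypothetical, and listdir_attr() exposes each file's modification time as st_mtime:
import time
import paramiko

# Hypothetical connection details.
transport = paramiko.Transport(('sftp.example.com', 22))
transport.connect(username='user', password='secret')
sftp = paramiko.SFTPClient.from_transport(transport)

one_day_ago = time.time() - 24 * 60 * 60

# listdir_attr() returns SFTPAttributes objects with .filename and .st_mtime.
for entry in sftp.listdir_attr('/SERVERFOLDER'):
    if entry.st_mtime <= one_day_ago:
        print('{0} is at least one day old'.format(entry.filename))

sftp.close()
transport.close()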
Is there a way I can download all/some the image files (e.g. JPG/PNG) from a Google Images search result?
I can use the following code to download one image that I already know its url:
import urllib.request
file = "Facts.jpg" # file to be written to
url = "http://www.compassion.com/Images/Hunger-Facts.jpg"
response = urllib.request.urlopen (url)
fh = open(file, "wb") #open the file for writing
fh.write(response.read()) # read from request while writing to file
To download multiple images, it has been suggested that I define a function and use that function to repeat the task for each image url that I would like to write to disk:
def image_request(url, file):
    response = urllib.request.urlopen(url)
    fh = open(file, "wb")  # open the file for writing
    fh.write(response.read())
And then loop over a list of urls with:
for i, url in enumerate(urllist):
    image_request(url, str(i) + ".jpg")
However, what I really want to do is download all/some image files (e.g. JPG/PNG) from my own search result from Google Images without necessarily having a list of the image urls beforehand.
P.S.
Please note that I am a complete beginner and would favour an answer that breaks down the broad steps to achieve this over one that gets bogged down in specific code. Thanks.
You can use the Google API like this, where BLUE and DOG are your search parameters:
https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=BLUE%20DOG
There is a developer guide about this here:
https://developers.google.com/image-search/v1/jsondevguide
You need to parse this JSON format before you can use the links directly.
Here's a start to your JSON parsing:
import json
j = json.loads('{"one" : "1", "two" : "2", "three" : "3"}')
print(j['two'])
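Putting the pieces together, a rough sketch under the assumption that the API still responds with a responseData.results list whose entries each carry a url field, as described in the developer guide; the search terms and output filenames are just examples:
import json
import urllib.parse
import urllib.request

search_terms = "BLUE DOG"
api_url = ("https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q="
           + urllib.parse.quote(search_terms))

# Fetch and parse the search result JSON.
with urllib.request.urlopen(api_url) as response:
    data = json.loads(response.read().decode("utf-8"))

# Assumed response layout: responseData -> results -> [{"url": ...}, ...]
results = data["responseData"]["results"]

# Save each image, reusing the image_request() idea from the question.
for i, result in enumerate(results):
    with urllib.request.urlopen(result["url"]) as image_response:
        with open(str(i) + ".jpg", "wb") as fh:
            fh.write(image_response.read())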