Using Python to open various links

This is my first post here and I hope to get my answers.
I want to open various links from my FTP server and do some stuff in each of them. My links are http://mypage/photos0001/, /photos002/, /photos003/, etc.
How can I write a script that opens all of them and does the same job in each one?
I tried:
Link = 'http://mypage/photos0001/' + 1
to build something like a loop, but of course this doesn't work.
Any help?

Without being able to see your actual FTP directory tree, this may be a little difficult, but hopefully the following can get you started.
Consider reading up on ftplib for more information (see Docs)
import ftplib

ftp = ftplib.FTP('mypage')
ftp.login()

for dir in ftp.nlst():
    if 'photos' in dir:
        ftp.cwd('/mypage/{}'.format(dir))
        for file in ftp.nlst():
            if file.endswith('.jpg'):
                try:
                    print('Attempting to download {}...'.format(file), end=' ')
                    with open(file, 'wb') as f:
                        ftp.retrbinary('RETR ' + file, f.write, 8*1024)
                    print('[SUCCESS]')
                except Exception as e:
                    print('[FAILED]')
                    print(e)

ftp.close()
So let's try and run through what is going on here:
Log in to your FTP server mypage.
List all the directories found in the root directory of your server.
If the folder name contains 'photos' then change working directory into that folder.
List all the files in this photos sub-folder.
If the file ends in .jpg, it's probably a picture we want.
Create a file on your system with the same name, and download the picture into it.
Repeat.
Now, expect to run into problems when your directory tree turns out to be slightly different from what you've described to us here; however, you should be able to modify the example to fit your server. I do know this code works, as I have been able to use it to recursively download .html files from ftp.debian.org.

Related

Intermittent "No such file or directory" and permission errors when opening files (in a loop) on mounted FTP drive (linux)? Sync issue?

Getting errors like
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/files/file.pdf'
when trying to loop through and open files in a mounted FTP drive (mounted via curlftpfs 'myuser:mypassword'@MY.SERVER.IP /path/to/files). I suspect sync issues, as the mounted drive is from another server on our network.
I can see that the file is there, can open it manually, can ls '/path/to/files/file.pdf' to see the file, but when executing...
FILES = os.listdir('/path/to/files')
FILES.sort()
...
for file in FILES:
    with open(os.path.join('/path/to/files', 'file.pdf'), 'rb') as fd:
        # do stuff
... I sometimes get the FileNotFoundError.
More confusing, I can actually open this file (using the same path string that the error message tells me is not a file or directory) separately by just starting a Python interactive shell and running something like...
fd = open('/path/to/files/file.pdf', 'rb')
fd.read()
...so IDK what the issue could be when reading it in a list of files.
Any debugging ideas or ideas of what could be causing this? Could there be some kind of timing/sync issues between reading the files on the mounted FTP drive vs the script that is running locally (and how to fix)?
UPDATE:
Oddly, printing the target path before trying to open the file like...
print(os.path.join('/path/to/files', 'file.pdf'))
time.sleep(2)  # giving even more time after initial access
with open(os.path.join('/path/to/files', 'file.pdf'), 'rb') as fd:
    # do stuff
...seems to help (kinda). Now it also randomly throws PermissionError for random files that I had no problem reading before (and still occasionally throws FileNotFoundError), files that I can actually open when accessing them individually in a Python interactive shell. This makes me think even more that it is some kind of sync issue. Will need to investigate more.
os.path is a module, so calling it like os.path('/path/to/files/file.pdf') will raise an error.
But I don't think that is the cause of the FileNotFoundError.
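For illustration, a minimal sketch (not the asker's actual code) of that point: calling the os.path module directly raises a TypeError, while os.path.join simply builds the path string.

import os

# os.path is a module, not a callable; uncommenting the next line raises TypeError
# os.path('/path/to/files/file.pdf')

# the usual way: build the path with os.path.join and open it
path = os.path.join('/path/to/files', 'file.pdf')
with open(path, 'rb') as fd:
    data = fd.read()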

zipfile write dont find files in gcloud

I'm trying to zip a few files from Google Cloud Storage.
Python's zipfile doesn't find the files in gcloud, only in the project.
How can I make my code find the files in gcloud?
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    for revenue in revenues:
        # queryset with a lot of files, so add each file to the zip
        t = tempfile.NamedTemporaryFile()
        t.write(revenue.revenue.name)
        if revenue.revenue.name:
            t.seek(0)
            with default_storage.open(revenue.revenue.name, "r") as file_data:
                zip_file.write(file_data.name, compress_type=zipfile.ZIP_DEFLATED)
                # the code doesn't get past this part
        t.close()

response = HttpResponse(content_type='application/x-zip-compressed')
response['Content-Disposition'] = 'attachment; filename=my_zip.zip'
response.write(zip_buffer.getvalue())
return response
In this part, I write the file that I opened from gcloud, but it stops inside this function:
def write(self, filename, arcname=None, compress_type=None):
    """Put the bytes from filename into the archive under the name
    arcname."""
    if not self.fp:
        raise RuntimeError(
            "Attempt to write to ZIP archive that was already closed")
    st = os.stat(filename)
    # when I try to find the file, os.stat searches in the project, not in gcloud
the "os.stat(filename)" search for a file in project, how can I do for find in the gcloud?
I will post my findings as an answer, since I would like to comment on a few things.
I have understood:
You have a Python library zipfile that is used to work with ZIP files.
You are looking for files locally and adding them one by one into the ZIP file.
You would like to do this as well for files located in a Google Cloud Storage bucket, but it is failing to find the files.
If I have misunderstood the use-case scenario, please elaborate further in a comment.
However, if this is exactly what you are trying to do, then this is not supported. In the StackOverflow Question - Compress files saved in Google cloud storage, it is stated that compressing files that are already in the Google Cloud Storage is not possible. The solution in that question is to subscribe to newly created files and then download them locally, compress them and overwrite them in GCS. As you can see, you can list the files, or iterate through the files stored in GCS, but you first need to download them to be able to process them.
Workaround
Therefore, in your use-case scenario, I would recommend the following workaround, by using the Python client API:
You can use Listing objects Python API, to get all the objects from GCS.
Then you can use Downloading objects Python API, to download the objects locally.
As soon as the objects are located in local directory, you can use the zipfile Python library to ZIP them together, as you are already doing it.
Then the objects are ZIPed and if you no longer need the downloaded objects, you can delete them with os.remove("downloaded_file.txt").
In case you need to have the compressed ZIP file in the Google Cloud Storage bucket, then you can use the Uploading objects Python API to upload the ZIP file in the GCS bucket.
As I have mentioned above, processing files (e.g. Adding them to a ZIP files etc.) directly in Google Cloud Storage bucket, is not supported. You first need to download them locally in order to do so. I hope that my workaround is going to be helpful to you.
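For reference, here is a minimal sketch of that workaround using the google-cloud-storage Python client. The bucket name and the output file name are placeholders, and error handling is omitted:

import os
import tempfile
import zipfile
from google.cloud import storage

BUCKET_NAME = 'my-bucket'  # placeholder: replace with your bucket name

client = storage.Client()
tmp_dir = tempfile.mkdtemp()
local_paths = []

# 1. list the objects in the bucket and download each one locally
for blob in client.list_blobs(BUCKET_NAME):
    if blob.name.endswith('/'):
        continue  # skip directory placeholder objects
    local_path = os.path.join(tmp_dir, os.path.basename(blob.name))
    blob.download_to_filename(local_path)
    local_paths.append(local_path)

# 2. zip the downloaded copies locally
with zipfile.ZipFile('zipedFile.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for path in local_paths:
        zf.write(path, arcname=os.path.basename(path))

# 3. optionally upload the archive back to the bucket and clean up
client.bucket(BUCKET_NAME).blob('zipedFile.zip').upload_from_filename('zipedFile.zip')
for path in local_paths:
    os.remove(path)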
UPDATE
As I have mentioned above, zipping files while they are in GCS bucket is not supported. Therefore I have prepared for you a working example in Python on how to use the workaround.
NOTE: As I am not a professional at operating OS commands with Python and I am not familiar with the zipfile library, there is probably a better and more efficient way of achieving this. However, the code that can be found in this GitHub link does the following procedures:
Under the #Public variables: section, change BUCKET_NAME to your corresponding bucket name and execute the Python script in Google Cloud Shell.
Now my bucket structure is as follows:
gs://my-bucket/test.txt
gs://my-bucket/test1.txt
gs://my-bucket/test2.txt
gs://my-bucket/directory/test4.txt
When executing the command, what the app does is the following:
It will get the path where the script is executed, e.g. /home/username/myapp.
It will create a temporary directory within this directory e.g. /home/username/myapp/temp
It will iterate through all the files located in the bucket that you have specified and will download them locally inside that temp directory.
NOTE: If the file in the bucket is under a directory, it will simply download the file instead of creating that sub-directory again. You can modify the code later to make it work as you desire.
So the new downloaded files will look like this:
/home/username/myapp/temp/test.txt
/home/username/myapp/temp/test1.txt
/home/username/myapp/temp/test2.txt
/home/username/myapp/temp/test4.txt
After that, the code will zip all those files into a new zipedFile.zip located in the same directory as the main.py script that you have executed.
When this step is done as well, the script will delete the directory /home/username/myapp/temp/ with all of its contents.
As I have mentioned above, after executing the script locally, you should be able to see the main.py and a zipedFile.zip file with all the zipped files from the GCS bucket. Now you can take the idea of the implementation and modify it according to your project's needs.
The final code:
zip_buffer = io.BytesIO()
base_path = '/home/everton/compressedfiles/'
fiscal_compentecy_month = datetime.date(int(year), int(month), 1)
revenues = CompanyRevenue.objects.filter(company__pk=company_id, fiscal_compentecy_month=fiscal_compentecy_month)
if revenues.count() > 0:
    path = base_path + str(revenues.first().company.user.pk) + "/"
    zip_name = "{}-{}-{}-{}".format(revenues.first().company.external_id, revenues.first().company.external_name, month, year)
    for revenue in revenues:
        filename = revenue.revenue.name.split('revenues/')[1]
        if not os.path.exists(path):
            os.makedirs(path)
        with open(path + filename, 'wb+') as file:
            file.write(revenue.revenue.read())
            file.close()
    with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
        for file in os.listdir(path):
            zip_file.write(path + file, compress_type=zipfile.ZIP_DEFLATED)
        zip_file.close()
    response = HttpResponse(content_type='application/x-zip-compressed')
    response['Content-Disposition'] = 'attachment; filename={}.zip'.format(zip_name)
    response.write(zip_buffer.getvalue())
    shutil.rmtree(path)
    return response

Kodi saving via special:// protocol gives ERRNO 2 (No such file or directory)

I have a problem with writing a Kodi plugin.
I am listing an entry to view a stream that provides a preview image. But since Kodi caches the images, I thought of a way of requesting the image manually every time. To achieve that, I want to save the image to the resources/cache directory of my plugin.
But I get the following error:
Error Contents: [Errno 2] No such file or directory: 'special://home/addon_data/[plugin]/resources/caches/preview_de.png'
My code is
f = urlopen(Request(url))
local_file = open(local, 'w'+mode)
local_file.write(f.read())
local_file.close()
I guess the special:// protocol is the problem, but what can I do so that this works on more than just one machine?
You need to call translatePath() on the special:// path and use the returned string as the local path before you can open it.
Example:
local = xbmc.translatePath('special://home/addon_data/[plugin]/resources/caches/preview_de.png')
f = urlopen(Request(url))
local_file = open(local, 'w'+mode)
local_file.write(f.read())
local_file.close()
PS: To avoid caching of images, you might be able to achieve this by adding random GET data to your request.
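As a rough sketch of that cache-busting idea (the URL and the parameter name here are placeholders, not part of the original answer):

import random

# placeholder URL for the preview image
url = 'http://example.com/preview_de.png'
# appending throwaway GET data makes the URL unique on every request,
# so any cache keyed on the URL is bypassed
busted_url = '{}?nocache={}'.format(url, random.randint(0, 10**8))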

Extracting .app from zip file in Python

(Python 2.7)
I have a program that will download a .zip file from a server, containing a .app file which I'd like to run. The .zip downloads fine from the server, and extracting it outside of Python works fine. However, when I extract the zip from Python, the .app doesn't run: it does not say the file is corrupted or damaged, it simply won't launch. I've tried this with other .app files and get the same problem, and was wondering if anyone else has had this problem before and knows a way to fix it?
The code I'm using:
for a in gArchives:
    if a['fname'].endswith(".build.zip") or a['fname'].endswith(".patch.zip"):
        # try to extract; if not, delete corrupted zip
        try:
            zip_file = zipfile.ZipFile(a['fname'], 'r')
        except:
            os.remove(a['fname'])
        for files in zip_file.namelist():
            # deletes local files in the zip that already exist
            if os.path.exists(files):
                try:
                    os.remove(files)
                except:
                    print("Cannot remove file")
                try:
                    shutil.rmtree(files)
                except:
                    print("Cannot remove directory")
            try:
                zip_file.extract(files)
            except:
                print("Extract failed")
        zip_file.close()
I've also tried using zip_file.extractall(), and I get the same problem.
Testing on my MacBook Pro, the problem appears to be with the way Python extracts the files.
If you run
diff -r python_extracted_zip normal_extracted_zip
you will see messages like this:
File Seashore.app/Contents/Frameworks/TIFF.framework/Resources is a directory while file here/Seashore.app/Contents/Frameworks/TIFF.framework/Resources is a regular file
So obviously the issue is with the filenames it's coming across as it's extracting them. You will need to implement some checking of the filenames as you extract them.
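A rough sketch of such a check (the archive name is a placeholder; it creates directory entries itself instead of letting extract() write them as regular files):

import os
import zipfile

zip_file = zipfile.ZipFile('example.build.zip', 'r')  # placeholder archive name
for name in zip_file.namelist():
    if name.endswith('/'):
        # directory entry: make sure the directory exists rather than writing a file
        if not os.path.isdir(name):
            os.makedirs(name)
    else:
        zip_file.extract(name)
zip_file.close()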
EDIT: It appears to be a bug within Python 2.7.*, as found here; sourced from another question posted here.
Managed to resolve this myself: the problem was not to do with directories being extracted incorrectly, but in fact with permissions, as eri mentioned above.
When the files were being extracted with Python, the permissions were not being kept as they were inside the .zip, so all executable files were set to be not executable. This problem was resolved with a call to the following on all files I extracted, where 'path' is the path of the file:
os.chmod(path, 0755)
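For reference, a sketch (not from the original answer) that restores each member's mode bits from the zip metadata right after extraction; the archive name is a placeholder:

import os
import zipfile

zf = zipfile.ZipFile('example.build.zip', 'r')  # placeholder archive name
for info in zf.infolist():
    zf.extract(info)
    # the high 16 bits of external_attr hold the Unix mode stored in the zip
    mode = (info.external_attr >> 16) & 0o7777
    if mode:
        os.chmod(info.filename, mode)
zf.close()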

Determine if a listing is a directory or file in Python over FTP

Python has a standard library module ftplib to run FTP communications. It has two means of getting a listing of directory contents. One, FTP.nlst(), will return a list of the contents of a directory given a directory name as an argument. (It will return the name of a file if given a file name instead.) This is a robust way to list the contents of a directory but does not give any indication whether each item in the list is a file or directory. The other method is FTP.dir(), which gives a string formatted listing of the directory contents of the directory given as an argument (or of the file attributes, given a file name).
According to a previous question on Stack Overflow, parsing the results of dir() can be fragile (different servers may return different strings). I'm looking for some way to list just the directories contained within another directory over FTP, though. To the best of my knowledge, scraping for a d in the permissions part of the string is the only solution I've come up with, but I guess I can't guarantee that the permissions will appear in the same place between different servers. Is there a more robust solution to identifying directories over FTP?
Unfortunately FTP doesn't have a command to list just folders so parsing the results you get from ftp.dir() would be 'best'.
A simple app assuming a standard result from ls (not a Windows FTP server):
from ftplib import FTP

ftp = FTP(host, user, passwd)
lines = []
ftp.dir(lines.append)   # dir() prints to stdout unless you pass a callback
for r in lines:
    if r.upper().startswith('D'):
        print r[58:]     # Starting point
Standard FTP Commands
Custom FTP Commands
If the FTP server supports the MLSD command, then please check that answer for a couple of useful classes (FTPDirectory and FTPTree).
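If MLSD is available, a minimal sketch with Python 3's ftplib looks like this (host and login are placeholders):

from ftplib import FTP

ftp = FTP('ftp.example.com')   # placeholder host
ftp.login()                    # anonymous login for the example
for name, facts in ftp.mlsd():
    if facts.get('type') == 'dir':
        print(name)            # only directories are printed
ftp.close()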
Another way is to assume everything is a directory and try to change into it. If this succeeds, it is a directory; if it throws an ftplib.error_perm, it is probably a file. You can then catch the exception. Sure, this isn't really the safest, but neither is parsing the crazy string for leading 'd's.
Example
def processRecursive(ftp, directory):
    ftp.cwd(directory)
    # put whatever you want to do in each directory here
    # when you have called processRecursive with a file,
    # the command above will fail and you will return

    # get the files and directories contained in the current directory
    filenames = []
    ftp.retrlines('NLST', filenames.append)

    for name in filenames:
        try:
            processRecursive(ftp, name)
        except ftplib.error_perm:
            # put whatever you want to do with files here
            pass

    # put whatever you want to do after processing the files
    # and sub-directories of a directory here
