Get URLS of files in Dropbox folder in python


I have a bunch of folders in Dropbox with pictures in them, and I'm trying to get a list of URLs for all of the pictures in a specific folder.
import requests
import json
import dropbox
TOKEN = 'my_access_token'
dbx = dropbox.Dropbox(TOKEN)
for entry in dbx.files_list_folder('/Main/Test').entries:
    # print(entry.name)
    print(entry.file_requests.FileRequest.url)
    # print(entry.files.Metadata.path_lower)
    # print(entry.file_properties.PropertyField)
Printing the entry name correctly lists all of the file names in the folder, but every other attempt fails with an error like 'FileMetadata' object has no attribute 'get_url'.

The files_list_folder method returns a ListFolderResult, where ListFolderResult.entries is a list of Metadata. Files in particular are FileMetadata.
Also, note that a single files_list_folder call isn't guaranteed to return everything, so make sure you implement files_list_folder_continue as well: keep calling it with the returned cursor while has_more is true. Refer to the documentation for more information.
The kind of link you mentioned is a shared link. FileMetadata objects don't themselves contain a link like that. You can get the path from path_lower, though. For example, in the for loop in your code, that would look like print(entry.path_lower).
You should use sharing_list_shared_links to list existing links, and/or sharing_create_shared_link_with_settings to create shared links for any particular file as needed.
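The two points above (paging through the folder, then listing or creating a shared link per file) could be combined roughly as follows. This is a minimal sketch, not an official recipe: the helper names iter_folder_entries and get_or_create_shared_url are made up for illustration, and dbx is assumed to be an authenticated dropbox.Dropbox client like the one in the question.

```python
def iter_folder_entries(dbx, folder_path):
    """Yield every entry in folder_path, following pagination cursors."""
    result = dbx.files_list_folder(folder_path)
    while True:
        yield from result.entries
        if not result.has_more:
            break
        # Fetch the next page using the cursor from the previous result.
        result = dbx.files_list_folder_continue(result.cursor)


def get_or_create_shared_url(dbx, path):
    """Return an existing shared link URL for path, or create a new one."""
    existing = dbx.sharing_list_shared_links(path=path, direct_only=True).links
    if existing:
        return existing[0].url
    return dbx.sharing_create_shared_link_with_settings(path).url


# Hypothetical usage (requires the dropbox package and a real access token):
#
# import dropbox
# dbx = dropbox.Dropbox('my_access_token')
# for entry in iter_folder_entries(dbx, '/Main/Test'):
#     if isinstance(entry, dropbox.files.FileMetadata):
#         print(get_or_create_shared_url(dbx, entry.path_lower))
```

Checking for an existing link first avoids the error that creating a second link for the same file would otherwise raise.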

Related

get list of gsutil URI inside a specific bucket folder to iterate through

I'm very new to GCS and to using it with Python.
I have a GCS bucket called "my_data" with many folders inside it. I'm interested in the folder called "ABC" and its subfolder called "WW3".
I want to get a list of the gsutil URIs (not the blobs) inside that folder, so I can open them as pandas data frames and concatenate them.
So far I have been able to get a list of blobs like this (I used this post and this video to do that):
# Assumed client setup (not shown in the original snippet)
from google.cloud import storage
storage_client = storage.Client()
my_bucket = storage_client.get_bucket("my_data")
# Get blobs in a specific subdirectory
blobs_specific = list(my_bucket.list_blobs(prefix='ABC/WW3/'))
>>>
# Printing blobs_specific gives me the blobs as a list like this:
[<Blob: my_data, ABC/S3/, 12231543135681432>,...,.......]
I would like to get a list of URLs that looks like this:
["gs://my_data/ABC/WW3/tab1.csv","gs://my_data/ABC/WW3/tab2.csv","gs://my_data/ABC/WW3/tab3.csv"...]
So I can later open them with pandas and concatenate them.
Is there a way I can get the list of URLs and not the blobs?
Or can I somehow use the blobs to read the CSVs into pandas and concatenate them ...
Edit:
I have tried to solve it by splitting the string form of each blob and building the paths myself. It seems to create a list of URLs, but the result isn't what it looks like, and it is not a very smart approach:
urls = []
for x, y in enumerate(blobs_specific):
    first_part = "gs://my_data/WW3/"
    scnd_part = str(blobs_specific[x]).split(',')[1]
    url = first_part + scnd_part
    urls.append(url)
However, when I try to iterate over this list, it fails, and it seems to print a different URL than the one it saves:
urls[1]
>>>'gs://my_data/WW3/ ABC/tab1.csv'
# It seems to have a space between the / and the "ABC", and when I try to read it with pandas I get a path-not-found error:
file_path = urls[1]
df = pd.read_csv(file_path,
                 sep=",",
                 storage_options={"token": "my_secret_token-20g8g632vsk1.json"})
>>>
# This is a bit different from the original because I couldn't include the real name, but the path gets the b and o and odd characters that don't appear when I print it....
FileNotFoundError: b/my_data/o/%20WW3%2ABC%2FS1%2FABCtab1.csv

I have found a solution for this by using .lstrip(), which removes the leading space left over from splitting the blob's string form. However, if someone has a smarter solution, I would like to learn :)
urls = []
for blob in blobs_specific:
    first_part = "gs://my_data/WW3/"
    # The second comma-separated field is the blob name with a leading space; strip it
    scnd_part = str(blob).split(',')[1].lstrip()
    url = first_part + scnd_part
    urls.append(url)
The gs://my_data prefix may be a bit different in your case, so make sure you use the right path.
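An alternative that avoids parsing the printed form of each blob is to read the name directly from the blob object. The sketch below assumes the blobs come from google-cloud-storage, where each Blob exposes name and bucket.name; the helper name blobs_to_gs_uris is made up for illustration.

```python
def blobs_to_gs_uris(blobs):
    """Build gs:// URIs from blob objects, skipping folder placeholder entries."""
    return [
        f"gs://{blob.bucket.name}/{blob.name}"
        for blob in blobs
        # Names ending in "/" are folder placeholders, not readable files.
        if not blob.name.endswith("/")
    ]


# Hypothetical usage with the variables from the question:
#
# urls = blobs_to_gs_uris(blobs_specific)
# frames = [pd.read_csv(u, storage_options={"token": "..."}) for u in urls]
# df = pd.concat(frames, ignore_index=True)
```

Because the URI is assembled from the bucket name and the full blob name, there is no hard-coded prefix to keep in sync and no stray whitespace to strip.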

How do I access a jpg file's tags (from "Properties") in Python? [duplicate]

How do I access the tags attribute here in the Windows File Properties panel?
Are there any modules I can use? Most google searches yield properties related to media files, file access times, but not much related to metadata properties like Tags, Description etc.
The exif module was able to access many more properties than anything else I've found, but it still couldn't read the 'Tags' property.
The Description -> Tags property is what I want to read and write to a file.

There's an entire module dedicated to exactly what I wanted: IPTCInfo3.
import iptcinfo3, os, sys, random, string
# Random string generator
rnd = lambda length=3 : ''.join(random.choices(list(string.ascii_letters), k=length))
# Path to the file, open an IPTCInfo object
path = os.path.join(sys.path[0], 'DSC_7960.jpg')
info = iptcinfo3.IPTCInfo(path)
# Show the keywords
print(info['keywords'])
# Add a keyword and save
info['keywords'] = [rnd()]
info.save()
# Remove the weird ghost file created after saving
os.remove(path + '~')
I'm not sure what the ghost file is or does. It looks like an exact copy of the original file, since the file size is the same. Either way, I remove it because I don't need it for reading and writing the metadata.
I've noticed some odd behaviour while setting the keywords: some get swallowed up into the file (the file size changes, so I know they're there, but Windows doesn't show them), and they only reappear after I manually delete the keywords. Very strange.

How to have multiple programs access the same file without manually giving them all the file path?

I'm writing several related Python programs that need to access the same file. However, this file will be updated or replaced intermittently, and I need all of them to access the new file. My current idea is to have a specific folder where the latest file is placed whenever it needs to be replaced, and I was curious how I could have Python select whatever text file is in the folder.
Or would I be better off creating a program that has a class entirely dedicated to holding the file's information, and have each program reference the file through that class? I could have the class use tkinter.filedialog to select a new file whenever necessary, and perhaps keep a text file that holds the path or name of the file I need, which the other programs would reference.
Edit: I don't need to write to the file at all just read from it. However, I would like to have it so that I do not need to manually update the path to the file every time I run the program or update the file path.
Edit2: Changed title to suit the question more

If the requirement is to get the most recently modified file in a specific directory:
import os
mypath = r'C:\path\to\wherever'
myfiles = [(f,os.stat(os.path.join(mypath,f)).st_mtime) for f in os.listdir(mypath)]
mysortedfiles = sorted(myfiles,key=lambda x: x[1],reverse=True)
print('Most recently updated: %s'%mysortedfiles[0][0])
Basically, get a list of files in the directory, together with their modified time as a list of tuples, sort on modified date, then get the one you want.

It sounds like you're looking for a singleton pattern, which is a neat way of hiding a lot of logic into an 'only one instance' object.
This means the logic for identifying, retrieving, and delivering the file is all in one place, and your programs interact with it by saying 'give me the one instance of that thing'. If you need to alter how it identifies, retrieves, or delivers what that one thing is, you can keep that hidden.
It's worth noting that the singleton pattern can be considered an antipattern because it's a form of global state. Whether that is a deal breaker depends on the context of the program.
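As a rough illustration of the idea, here is one way a singleton could hide the lookup logic. This is only a sketch under assumptions: the class name LatestFile is made up, "the file" is taken to be the most recently modified .txt file in one folder, and the folder passed on the first call is the one that sticks.

```python
from pathlib import Path


class LatestFile:
    """Single shared access point for the current data file in a folder."""

    _instance = None

    def __new__(cls, folder="."):
        # Create the instance once; later calls return the same object
        # and ignore their folder argument.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.folder = Path(folder)
        return cls._instance

    def path(self):
        """Return the most recently modified .txt file in the folder."""
        files = [p for p in self.folder.glob("*.txt") if p.is_file()]
        if not files:
            raise FileNotFoundError(f"no .txt files in {self.folder}")
        return max(files, key=lambda p: p.stat().st_mtime)

    def read_text(self):
        # Resolve the path on every call so a replaced file is picked up.
        return self.path().read_text()
```

Each program would then call LatestFile().read_text() without knowing where the file lives; changing the lookup rule later only touches this class.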

To "have python select whatever text file is in the folder", you could use the glob library to get a list of file(s) in the directory, see: https://docs.python.org/2/library/glob.html
You can also use os.listdir() to list all of the files in a directory, without matching pattern names.
Then, open() and read() whatever file or files you find in that directory.
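Put together, the glob approach might look like the sketch below. It assumes the folder is meant to hold exactly one .txt file at a time and treats anything else as an error; the function name read_only_text_file is made up for illustration.

```python
import glob
import os


def read_only_text_file(folder):
    """Return (path, contents) of the single .txt file in folder."""
    matches = glob.glob(os.path.join(folder, "*.txt"))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one .txt file in {folder!r}, found {len(matches)}"
        )
    with open(matches[0], encoding="utf-8") as handle:
        return matches[0], handle.read()
```

Failing loudly when the folder holds zero or several text files makes a half-finished replacement visible instead of silently reading the wrong file.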

How does one use ZipInfo in Python?

Could someone please explain how exactly ZipInfo is supposed to be used? It says that ZipInfo.comment can access "comment for the individual archive member"
I didn't even know archive members can have comments %\ ...
I tried getting it with:
data = zipfile.ZipFile('filename')
info = data.infolist()
but what I'm getting looks like:
[<zipfile.ZipInfo object at 0x0257DBF8>, <zipfile.ZipInfo object at 0x026A7030>, <zipfile.ZipInfo object at 0x026A7098>, ... ]
I don't know what that means :(
Also, I can't seem to call zipinfo.comment at all, but from the above it looks like infolist() is the same thing?
So confused...

Calling data.infolist() gives you a list of ZipInfo objects. These are descriptions of all the individual files and directories stored inside your zip archive (and not the files/directories themselves). To manipulate these individual files/directories, you have to call a method of your ZipFile object data, passing it an entry (or its name) from info. For example, if you want to print the first 10 bytes of each file, you could run
for f in info:
    print(data.read(f)[:10])
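To see the per-member attributes, including comment, it may help to build a small archive in memory and read it back. This is an illustrative sketch using only the standard library; the member name and comment text are made up.

```python
import io
import zipfile

buffer = io.BytesIO()

# Write one member whose ZipInfo carries a comment (comments are bytes).
with zipfile.ZipFile(buffer, "w") as archive:
    member = zipfile.ZipInfo("notes.txt")
    member.comment = b"per-member comment"
    archive.writestr(member, "hello from inside the archive")

# Read the archive back: each ZipInfo describes one member,
# and ZipFile.read() returns that member's contents as bytes.
with zipfile.ZipFile(buffer) as archive:
    summary = [
        (info.filename, info.file_size, info.comment, archive.read(info)[:10])
        for info in archive.infolist()
    ]

for row in summary:
    print(row)
```

Attributes such as filename, file_size, and comment live on each ZipInfo in the list, which is why calling comment on the list returned by infolist() does not work.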

Google Docs python gdata 2.0.16 upload file to existing collection

I have managed to create a simple app which deletes (bypassing the recycle bin) any files I want to. It can also upload files. The problem I am having is that I cannot specify which collection the new file should be uploaded to.
def UploadFile(folder, filename, local_file, client):
    print "Upload Resource"
    doc = gdata.docs.data.Resource(type='document', title=filename)
    path = _GetDataFilePath(local_file)
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/octet-stream')
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?convert=false'
    collection_resource = folder
    upload_doc = client.CreateResource(doc, create_uri=create_uri, collection=collection_resource, media=media)
    print 'Created, and uploaded:', upload_doc.title, doc.resource_id
From what I understand, the function CreateResource requires a resource object representing the collection. How do I get this object? The variable folder is currently just a string, 'daily', which is the name of the collection; it is this variable that I need to replace with the collection resource.

I managed to work this out by piecing together snippets from various sources. You need to pass a URI to the FindAllResources function (which I found no mention of in the gdata sample code).
I have written up in more detail how I managed to upload, delete (bypassing the bin), search for, and move files into collections here.
