How does one use ZipInfo in Python?

Could someone please explain how exactly ZipInfo is supposed to be used? It says that ZipInfo.comment can access "comment for the individual archive member"
I didn't even know archive members can have comments %\ ...
I tried getting it with:
data = zipfile.ZipFile('filename')
info = data.infolist()
but what I'm getting looks like:
[<zipfile.ZipInfo object at 0x0257DBF8>, <zipfile.ZipInfo object at 0x026A7030>, <zipfile.ZipInfo object at 0x026A7098>, ... ]
I don't know what that means :(
Also, I can't seem to call zipinfo.comment at all, but from the above it looks like infolist() is the same thing?
So confused...

Calling data.infolist() gives you a list of ZipInfo objects. These are descriptions of the individual files and directories stored inside your zip archive (not the files/directories themselves). Attributes such as comment live on each ZipInfo object rather than on the list, so you would read info[0].comment, for example. To manipulate the individual files/directories, you have to call a method of your ZipFile object data with the name (or the ZipInfo object) from info. For example, if you want to print the first 10 bytes of each file, you could run
for f in info:
    print(data.read(f)[:10])
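To make the ZipInfo attributes concrete, here is a minimal sketch. It builds a small archive in memory (the member name hello.txt and its comment are made up for illustration) and then reads the metadata back. Note that ZipInfo.comment holds bytes, not str.

```python
import io
import zipfile


def describe_members(zf):
    """Return (filename, file_size, comment) for every member of an open ZipFile."""
    return [(zi.filename, zi.file_size, zi.comment) for zi in zf.infolist()]


# Build a throwaway archive in memory; the name and comment are illustrative.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    member = zipfile.ZipInfo("hello.txt")
    member.comment = b"a per-member comment"
    zf.writestr(member, "Hello, world!")

# Reopen the archive for reading and inspect each member's metadata.
with zipfile.ZipFile(buf) as zf:
    members = describe_members(zf)
    for name, size, comment in members:
        print(name, size, comment)
    first_bytes = zf.read("hello.txt")[:10]
```

Most archives carry empty member comments, so an empty bytes value from zi.comment is the usual result on real files.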

Related

Get URLS of files in Dropbox folder in python

I have a bunch of folders in Dropbox with pictures in them, and I'm trying to get a list of URLs for all of the pictures in a specific folder.
import requests
import json
import dropbox
TOKEN = 'my_access_token'
dbx = dropbox.Dropbox(TOKEN)
for entry in dbx.files_list_folder('/Main/Test').entries:
    # print(entry.name)
    print(entry.file_requests.FileRequest.url)
    # print(entry.files.Metadata.path_lower)
    # print(entry.file_properties.PropertyField)
Printing the entry name correctly lists all of the file names in the folder, but everything else says 'FileMetadata' object has no attribute 'get_url'.
The files_list_folder method returns a ListFolderResult, where ListFolderResult.entries is a list of Metadata. Files in particular are FileMetadata.
Also, note that you aren't guaranteed to get everything back from a single files_list_folder call, so make sure you implement files_list_folder_continue as well. Refer to the documentation for more information.
The kind of link you mentioned is a shared link. FileMetadata objects don't themselves contain a link like that. You can get the path from path_lower, though. For example, in the for loop in your code, that would look like print(entry.path_lower).
You should use sharing_list_shared_links to list existing links, and/or sharing_create_shared_link_with_settings to create shared links for any particular file as needed.
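The pagination and link lookup described above could be sketched roughly as follows. The helper names iter_folder_entries and existing_shared_urls are made up for illustration, and dbx is assumed to be a dropbox.Dropbox client. The sketch only looks at the first page of shared links per file, which is an assumption that may not hold for heavily shared files.

```python
def iter_folder_entries(dbx, path):
    """Yield every entry in a Dropbox folder, following pagination cursors.

    `dbx` is assumed to be a dropbox.Dropbox client; any object offering
    files_list_folder and files_list_folder_continue behaves the same here.
    """
    result = dbx.files_list_folder(path)
    while True:
        for entry in result.entries:
            yield entry
        if not result.has_more:
            break
        # Ask for the next page using the cursor from the previous one.
        result = dbx.files_list_folder_continue(result.cursor)


def existing_shared_urls(dbx, entry):
    """Return the URLs of shared links that already exist for one entry."""
    listing = dbx.sharing_list_shared_links(path=entry.path_lower, direct_only=True)
    return [link.url for link in listing.links]


# Hypothetical usage (needs the dropbox package and a valid token):
#   import dropbox
#   dbx = dropbox.Dropbox(TOKEN)
#   for entry in iter_folder_entries(dbx, '/Main/Test'):
#       print(entry.path_lower, existing_shared_urls(dbx, entry))
```

If a file has no shared link yet, the returned list is empty, and sharing_create_shared_link_with_settings would be the call to create one.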

Getting the file name of latest unpacked file using shutil.unpack_archive

I am using shutil to unpack my archive like this:
shutil.unpack_archive(file_path, extract_dir=extract_path)
Unpacking works fine, but I want to be able to store the name of the latest unpacked file in a variable and use it later. In the documentation (shutil), I don't see how I can access the filename inside the archive. (Note that the unpacked filename might differ from the name of the archive.)
Any ideas on how I can achieve this or even print the status of the extraction?
You can use os.listdir(..) before and after extraction.
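A minimal sketch of that before/after comparison, with a made-up helper name (unpack_and_report) and a throwaway archive built only for the demo:

```python
import os
import shutil
import tempfile


def unpack_and_report(archive_path, extract_dir):
    """Unpack an archive and return the top-level names that are new in extract_dir."""
    os.makedirs(extract_dir, exist_ok=True)
    before = set(os.listdir(extract_dir))
    shutil.unpack_archive(archive_path, extract_dir=extract_dir)
    after = set(os.listdir(extract_dir))
    return sorted(after - before)


# Self-contained demo; every path below is illustrative.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(src)
    with open(os.path.join(src, "report.txt"), "w") as fh:
        fh.write("demo")
    archive = shutil.make_archive(os.path.join(tmp, "bundle"), "zip", root_dir=src)
    new_names = unpack_and_report(archive, os.path.join(tmp, "out"))
    print(new_names)
```

This only sees top-level names, and a file that already existed in the target directory and was overwritten will not show up in the difference.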

How to work with CSV files inside a zipped folder?

I'm working with zipped files in python for the first time, and I'm stumped.
I read the documentation for zipfile, but I'm not sure what would be the best way to do what I'm trying to do. I have a zipped folder with CSV files inside, and I'd like to be able to open the zip file, and retrieve certain values from the csv files inside.
Do I use zipfile.extract(file name here) to bring it to the current working directory? And if I do that, do I just use the file name to work with the file, or does this index or list them differently?
Currently, I manually extract all files in the zipped folder to the current working directory for my project, and then use the csv module to read them. All I'm really trying to do is remove that step.
Any and all help would be greatly appreciated!
Since you are looking to avoid extracting to disk: the zipfile docs describe ZipFile.open(), which gives you a file-like object. That is an object that mostly behaves like a regular file on disk, but its data is read directly from the archive instead of from an extracted copy. Reading it returns bytes, at least in Python 3.
Something like this...
from zipfile import ZipFile
import csv
import io
with ZipFile('abc.zip') as myzip:
    print(myzip.filelist)
    for mf in myzip.filelist:
        with myzip.open(mf.filename) as myfile:
            mc = myfile.read()            # raw bytes of the member
            c = io.StringIO(mc.decode())  # csv needs text, so decode first
            for row in csv.reader(c):     # parse each line into a list of fields
                print(row)
The Python documentation is actually quite good once you have learned how to find things and picked up some of the basic programming terms it uses.
The csv module works on text rather than bytes, hence the extra step of decoding the data and wrapping it in io.StringIO.
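An alternative to reading each member fully into memory is to wrap the binary stream from ZipFile.open() in io.TextIOWrapper and hand that to csv.reader. The sketch below assumes the CSV files are UTF-8 encoded. The helper name read_csv_rows and the demo archive are made up for illustration.

```python
import csv
import io
import zipfile


def read_csv_rows(zf, member_name, encoding="utf-8"):
    """Yield parsed CSV rows from one member of an open ZipFile."""
    with zf.open(member_name) as raw:
        # TextIOWrapper decodes the byte stream lazily; newline="" lets
        # the csv module handle line endings itself.
        text = io.TextIOWrapper(raw, encoding=encoding, newline="")
        yield from csv.reader(text)


# Demo with an in-memory archive; the file name and contents are illustrative.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "name,value\nalpha,1\nbeta,2\n")

with zipfile.ZipFile(buf) as zf:
    rows = list(read_csv_rows(zf, "data.csv"))
print(rows)
```

Because read_csv_rows is a generator, the ZipFile has to stay open while the rows are being consumed.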

vim complete() with result from lvim

I am trying to create a function that will pop up a list of the files that include the word "Module" (case insensitive).
I tried :lvim /Module/gj *.f90, which works when all the *.f90 files are in the current directory, but I failed to build a globpath()-like expansion so that subdirectories are included as well.
So I turned to Python, where I get the list perfectly. Here is the Python code, which should make my goal clear:
#!/usr/bin/python
import os
import re
flsts = []
path = "/home/rudra/Devel/dream/"
print("All files==>")
for dirs, subdirs, files in os.walk(path):
    for tfile in files:
        if tfile.endswith('f90'):
            print(os.path.splitext(tfile)[0])
            text = open(dirs+'/'+tfile).read()
            match = re.search("Module", text)
            if match:
                flsts.append(os.path.splitext(tfile)[0])
print("The list to be used ==>")
print(flsts)
Once I have the list, I want to call
complete(col('.'), flsts)
The problem is that I am unable to include it inside a Vim function.
May I kindly have some help, so that I can get a list from vim and use it in the complete function?
I have checked this as a possible solution, but unfortunately it is not.
Kindly help.
edit: More explanation
So, say, in my work-dir, i have:
$tree */*.f90
OLD/dirac.f90
OLD/environment.f90
src/constants.f90
src/gencrystal.f90
src/geninp.f90
src/init.f90
Among them, only two contain the word module:
$ grep Module */*.f90
OLD/dirac.f90: 10 :module mdirac
src/constants.f90: 2 :module constants
So, with an inoremap, I want complete() to pop up only constants and dirac.
That is, Module is the keyword I am searching for in the subdirectories of the present working directory, and only the files that match (dirac and constants in this example) should pop up in complete().
I'm not sure what your exact problem is.
With split(globpath('./**/Module/**', '*.f90'), '\n') you will obtain the list of all files that match *.f90, and which are somewhere within a directory named Module.
Then, using complete() has a few restrictions. It has to be from a function that will be called from insert mode, and that returns an empty string.
By itself, complete() will insert the selected text. If we play with the {startcol} parameter, we can even replace what's before the cursor. This way, you can type Module, hit the key you want, and use Module as the filter.
function! s:Complete()
  " From lh-vim-lib: word_tools.vim
  let key = GetCurrentKeyword()
  let files = split(globpath('./**/*'.key.'*/**', '*.vim'), '\n')
  call complete(col('.')-len(key), files)
  return ''
endfunction
inoremap µ <c-R>=<sid>Complete()<cr>
However, if you want to trigger an action (instead of inserting text), it becomes much more complex. I did that in muTemplate. I've published the framework used to associate hooks to completion items in lh-vim-lib (See lh#icomplete#*() functions).
EDIT: OK, then, I'll work with let files=split(system("grep --include=*.f90 -Ril module *"), '\n') to obtain the list of files, then call complete(col('.'), files) with that list. That should be the more efficient solution. This is somehow quite similar to Ingo's solution. The difference is that we don't need Python if grep is available.
Regarding Python integration, well it's possible with :py vim.command(). See for instance jira-complete that integrates complete() with a Python script that builds the completion-list: https://github.com/mnpk/vim-jira-complete/blob/master/autoload/jira.vim#L116
Notes:
If "module:" can be pre-searched with ctags, it will be possible to extract your files from the tags database with taglist().
It's also possible to fill dynamically the list of files with complete_add(), which is something that would make sense from a python script that tests each file one after the other.
There's an example at :help complete() that you can adapt. If you modify your Python script to output just the (newline-separated) files, you can invoke it via system():
inoremap <F5> <C-R>=FindFiles()<CR>
function! FindFiles()
  call complete(col('.'), split(system('python path/to/script.py'), '\n'))
  return ''
endfunction
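On the Python side, a script that prints only the matching names could look roughly like the sketch below. It reworks the script from the question. The function name find_module_files, the case-insensitive match, and the demo directory are assumptions made for illustration.

```python
import os
import re
import tempfile


def find_module_files(root, keyword="module", suffix=".f90"):
    """Return sorted base names of files under root whose text contains keyword."""
    pattern = re.compile(keyword, re.IGNORECASE)
    names = []
    for dirpath, _subdirs, files in os.walk(root):
        for fname in files:
            if not fname.endswith(suffix):
                continue
            with open(os.path.join(dirpath, fname), errors="replace") as fh:
                if pattern.search(fh.read()):
                    names.append(os.path.splitext(fname)[0])
    return sorted(names)


# Demo on a throwaway tree; the file names and contents are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src"))
    with open(os.path.join(tmp, "src", "constants.f90"), "w") as fh:
        fh.write("module constants\nend module constants\n")
    with open(os.path.join(tmp, "src", "init.f90"), "w") as fh:
        fh.write("program init\nend program init\n")
    found = find_module_files(tmp)

# One name per line, which is the shape split(system(...), '\n') expects.
print("\n".join(found))
```

In the real script, the demo would be replaced by a call such as find_module_files(os.getcwd()), so that Vim receives the names for the current working directory.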

Over-riding os.walk to return a generator object as the third item

While checking the efficiency of os.walk, I created 6,00,000 files with the string Hello <number> (where number is just a number indicating the number of the file in the directory), e.g. the contents of the files in the directory would look like:-
File Name | Contents
1.txt | Hello 1
2.txt | Hello 2
.
.
600000.txt|Hello 600000
Now, I ran the following code:-
a= os.walk(os.path.join(os.getcwd(),'too_many_same_type_files')) ## Here, I am just passing the actual path where those 6,00,000 txt files are present
print a.next()
The problem, as I see it, is that a.next() takes too much time and memory, because the third item it returns is the list of files in the directory (which has 600000 items). So I am trying to find a way to reduce the space complexity (at least) by making a.next() return a generator object as the third item of the tuple instead of a list of file names.
Would that be a good idea to reduce the space complexity?
As folks have mentioned already, 600,000 files in a directory is a bad idea. Initially I thought that there's really no way to do this because of how you get access to the file list, but it turns out that I'm wrong. You could use the following steps to achieve what you want:
Use subprocess or os.system to call ls or dir (whichever your OS provides). Direct the output of that command to a temporary file (say /tmp/myfiles; Python's tempfile module can create one for you).
Open that file for reading in Python.
File objects are iterable and will return each line, so as long as you have just the filenames, you'll be fine.
It's such a good idea, that's the way the underlying C API works!
If you can get access to readdir, you can do it. Unfortunately, older Python versions don't expose it directly; os.scandir(), added in Python 3.5, does.
This question shows two approaches (both with drawbacks).
A cleaner approach would be to write a module in C to expose the functionality you want.
os.walk calls listdir() under the hood to retrieve the contents of the root directory then proceeds to split the returned list of items to dirs and non-dirs.
To achieve what you want, you'll need to dig much lower down and implement not only your own version of walk() but also an alternative listdir() that returns a generator. Note that even then you will not be able to provide independent generators for both dirs and files unless you make two separate calls to the modified listdir() and filter the results on the fly.
As suggested by Sven in the comments above, it might be better to address the actual problem (too many files in a dir) rather than over-engineer a solution.
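For readers on Python 3.5 or later, os.scandir() yields directory entries lazily, so the file names never need to be held in one list. The sketch below assumes a single flat directory, as in the question. The helper name iter_file_names and the demo files are made up for illustration.

```python
import os
import tempfile


def iter_file_names(directory):
    """Lazily yield the names of regular files in directory, one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name


# Demo on a throwaway directory with three files and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(1, 4):
        with open(os.path.join(tmp, f"{i}.txt"), "w") as fh:
            fh.write(f"Hello {i}")
    os.mkdir(os.path.join(tmp, "subdir"))
    # sorted() is used only to make the demo output stable; iterating the
    # generator directly keeps memory use flat.
    names = sorted(iter_file_names(tmp))
print(names)
```

This addresses memory use only; it does nothing about the cost of having that many files in one directory.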
