Is it possible to open and read cythonized .so files with python?
The use-case is a test that scans all python files in a directory and evaluates if certain object attributes are used (to be ultimately able to identify and remove unused attributes).
This test runs perfectly in the local environment, but it breaks in our CI, which cythonizes all files, because .so files can't be parsed.
Currently I am scanning the files for the object attribute occurrences like this:
import os
import re

path = '/path/to/dir'
attribute_regex = r'object\.(\w+)'
used_attributes = set()
for root, _, files in os.walk(path):
    for file in files:
        with open(os.path.join(root, file), 'r') as f:
            # re.findall needs the file's text, not the file object
            used_attributes.update(re.findall(attribute_regex, f.read()))
Maybe I am looking at this issue from the wrong angle. Are there other, more sophisticated ways to check whether attributes of an object are used across multiple Python files?
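One possible alternative to the regex, sketched here under the assumption that the scan runs on the .py sources before they are cythonized: walk the syntax tree with the standard ast module, so that matches inside comments and strings are ignored. The function name collect_attribute_names and its target parameter are made up for this illustration.

```python
import ast
import os


def collect_attribute_names(top, target="object"):
    """Return the attribute names accessed on the bare name `target`
    in every .py file under `top` (hypothetical helper)."""
    used = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(".py"):
                continue  # compiled .so files contain no parseable source
            with open(os.path.join(root, name), encoding="utf-8") as f:
                tree = ast.parse(f.read(), filename=name)
            for node in ast.walk(tree):
                # Matches expressions of the form `target.attr`
                if (isinstance(node, ast.Attribute)
                        and isinstance(node.value, ast.Name)
                        and node.value.id == target):
                    used.add(node.attr)
    return used
```

This only sees attributes accessed directly on a variable with that name; aliases or getattr() calls would need extra handling.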
Related
I have a Python application in a directory dir. This directory has a __main__.py file and several data files that are read by the application using open(...,'r'). Without editing the code, is it possible to bundle the code and data files into a single zip file and execute it using something like python app.pyz?
My goal is to share the file and data easily.
Running the application using python dir works fine.
If I make a zip file using python -m zipfile -c app.pyz dir/*, the resulting application will run but cannot read the files. This makes sense.
I can ask the customers to unzip the compressed folder before running, or I could embed the files as strings within the code. That said, I'm curious if this can be avoided.
Can I bundle code and data into one file?
As of Python 3.9 you can use importlib.resources from the standard library. This module uses Python's import machinery to resolve the paths of data files as though they were modules inside a package.
Create a new package inside dir. Let's call it data. Make sure it has an __init__.py.
Add your data files to data. Let's say you added a text file text.txt and a binary file binary.dat.
Now from your __main__.py script or any part of your code with access to the module data, you can access files inside that package like so:
To read text.txt to memory as a string:
txt_file = importlib.resources.files("data").joinpath("text.txt").read_text(encoding="utf-8")
To read binary.dat to memory as bytes:
bin_file = importlib.resources.files("data").joinpath("binary.dat").read_bytes()
To open any file:
path = importlib.resources.files("data").joinpath("text.txt")
with path.open("rt", encoding="utf-8") as file:
    lines = file.readlines()
# As streams:
textio_stream = importlib.resources.files("data").joinpath("text.txt").open("rt", encoding="utf-8")
bytesio_stream = importlib.resources.files("data").joinpath("binary.dat").open("rb")
If something requires an actual real file on the filesystem, or you simply want to wrap zipapp compatibility over existing code (e.g. with open()) without having to modify it:
# Old, incompatible with zipfiles.
file_path = "data/text.txt"
with open(file_path, "rt", encoding="utf-8") as file:
    lines = file.readlines()

# New, compatible with zipfiles.
file_path = importlib.resources.files("data").joinpath("text.txt")
# If file is inside a zipfile, unzips it in a temporary file, then
# destroys it once the context manager closes. Otherwise, reads the file normally.
with importlib.resources.as_file(file_path) as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()

# Since it is a context manager, you can even store it like this:
file_path = importlib.resources.files("data").joinpath("text.txt")
real_path = importlib.resources.as_file(file_path)
with real_path as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
The Traversable objects returned from importlib.resources functions can be mixed with Path objects using as_posix, since joinpath requires posix separators:
file_path = pathlib.Path("subdirectory", "text.txt")
txt_file = importlib.resources.files("data").joinpath(file_path.as_posix()).read_text(encoding="utf-8")
You can use slashes to grow a Traversable, just like pathlib.Path objects:
resources_root = importlib.resources.files("data")
text_path = resources_root / "text.txt"
bin_file = (resources_root / "subdirectory" / "bin.dat").read_bytes()
You can also import the data package like any other package, and use the module object directly. Subpackages are also supported. The only Python files inside the data tree are the __init__.py files of each subpackage:
# __main__.py
import importlib.resources
import data.config
import data.models.b
# Load binary file `file.dat` from `data.models.b`.
# Subpackages are being used as subdirectories.
bin_file = importlib.resources.files(data.models.b).joinpath("file.dat").read_bytes()
...
You technically only need to make your resource root directory be a package. For max brevity:
# __main__.py
from importlib.resources import files
data = files("data") # Resources root.
# In this example, `models` and `b` are regular directories:
bin_file = (data / "models" / "b" / "file.dat").read_bytes()
...
Note that importlib.resources and zipfiles in general support reading only and you will get an exception if you try to write to any file-like object returned from the above functions. It might technically be possible to support modifying data files inside zips but this is way out of scope. If you want to write files, just open a file in the filesystem as normal.
Now your data files have become file-system agnostic and your program should work via zipapp and normal invocation just the same.
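To check this end to end, here is a minimal sketch that builds a throwaway application with a data package, bundles it with the standard zipapp module, and runs the resulting archive. The directory layout, the file contents, and the function name build_and_run_demo are assumptions made up for this illustration; it needs Python 3.9 or later.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_demo():
    """Build a tiny app with a `data` package, zip it, run it, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        app = pathlib.Path(tmp, "app")
        (app / "data").mkdir(parents=True)
        # The data directory must be a package, hence the empty __init__.py.
        (app / "data" / "__init__.py").write_text("", encoding="utf-8")
        (app / "data" / "text.txt").write_text("hello from the archive", encoding="utf-8")
        (app / "__main__.py").write_text(
            "import importlib.resources\n"
            "res = importlib.resources.files('data').joinpath('text.txt')\n"
            "print(res.read_text(encoding='utf-8'))\n",
            encoding="utf-8",
        )
        target = pathlib.Path(tmp, "app.pyz")
        zipapp.create_archive(app, target)  # bundles code and data into one file
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Calling build_and_run_demo() should return the contents of text.txt, read from inside the .pyz archive.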
I'm trying to create a Python script which places a number of files in a "staging" directory tree, and then uses ZipFile to create a .zip archive of them. This will later be copied to a Linux machine, which will extract the files and use them. The staging directory contains a mix of text and binary data files. The section doing the writing is in this "try" block:
try:
    import zipfile
    zipf = zipfile.ZipFile(out_file, 'w', zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk(staging_dir):
        for d in dirs:
            # Write directories so even empty directories are copied:
            arcname = os.path.relpath(os.path.join(root, d), staging_dir)
            zipf.write(os.path.join(root, d), arcname)
        for f in files:
            arcname = os.path.relpath(os.path.join(root, f), staging_dir)
            zipf.write(os.path.join(root, f), arcname)
This works on a linux machine running python 2.7 (my main goal) or 3.x (secondary goal). It can also run on a Windows machine (sort of an afterthought, it might be useful), but there's a problem with permissions in that case. Normally the script sets permissions in the files in the staging_dir with "os.chmod", and then zip creates the archive with the right permissions. But running this on windows, the "os.chmod" command doesn't really set all linux file modes (not possible), so the zipfile contents aren't at the right permissions. I'm trying to figure out if there's a way to fix the permissions when making the zipfile in the code above. In particular, files in staging_dir/bin need to have "0o750" permissions.
I've seen the answer to How do I set permissions (attributes) on a file in a ZIP file using Python's zipfile module, so I see how you could set permissions with "external_attr", and then write a file with "ZipFile.writestr". But the "external_attr" doesn't seem to apply to "ZipFile.write", only "ZipFile.writestr". And I'd like to do this on a zip archive that contains some binary files. Is there any other option than "writestr"? Is it possible to use "writestr" on large binary files?
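For reference, a minimal sketch of the "writestr" route described above, applied to a binary file: build a ZipInfo by hand, put the Unix mode into the upper 16 bits of external_attr, and pass the file's bytes to writestr. The helper name write_with_mode is made up for this illustration, and the sketch reads each file fully into memory, so it assumes the files fit in RAM.

```python
import os
import stat
import zipfile


def write_with_mode(zipf, src_path, arcname, mode):
    """Add src_path to the open ZipFile `zipf` as `arcname`,
    forcing the given Unix permission bits (hypothetical helper)."""
    info = zipfile.ZipInfo(arcname.replace(os.sep, "/"))
    info.create_system = 3  # 3 = Unix, so extractors interpret the mode bits
    info.compress_type = zipfile.ZIP_DEFLATED
    # The upper 16 bits of external_attr carry the Unix st_mode value.
    info.external_attr = (stat.S_IFREG | mode) << 16
    with open(src_path, "rb") as src:  # binary mode works for text and binary alike
        zipf.writestr(info, src.read())
```

In the loop from the question, this could be called as write_with_mode(zipf, os.path.join(root, f), arcname, 0o750) for the files under staging_dir/bin. Note that the entry's timestamp is left at the ZipInfo default rather than copied from the source file.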
I am trying to write my first python script below. I want to search through a read only archive on an HPC to look in zipfiles contained within folders with a variety of other folder/file types. If the zip contains a .kml file I want to print the line in there starting with the string <coordinates>.
import zipfile as z
kfile = file('*.kml') #####breaks here#####
folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21' # folder with multiple folders and .zips
for zipfile in folderpath: # am only interested in the .kml files within the .zips
    if kfile in zipfile:
        with read(kfile) as k:
            for line in k:
                if '<coordinates>' in line: # only want the coordinate line
                    print line # print the coordinates
        k.close()
Eventually I want to loop this through multiple folders rather than pointing to the exact folder location, i.e. loop through every subfolder in /neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/, but this is a starting point for me to try and understand how Python works.
I am sure there are many problems with this script before it will run but the current one I have is
kfile = file('*.kml')
IOError: [Errno 22] invalid mode ('r') or filename: '*.kml'
Process finished with exit code 1
Any help appreciated to get this simple process script working.
When you run:
kfile = file('*.kml')
You are trying to open a single file named exactly *.kml, which is not what you want. If you want to process all *.kml files, you will need to (a) get a list of matching files and then (b) process each file in that list.
There are a number of ways to accomplish the above; the easiest is probably the glob module, which can be used something like this:
import glob
for kfilename in glob.glob('*.kml'):
    print(kfilename)
However, if you are trying to process a directory tree, rather than a single directory, you may instead want to investigate the os.walk function. From the docs:
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
A simple example might look something like this:
import os
for root, dirs, files in os.walk('topdir/'):
    kfilenames = [fn for fn in files if fn.endswith('.kml')]
    for kfilename in kfilenames:
        print(kfilename)
Additional commentary
Iterating over strings
Your script has:
for zipfile in folderpath:
That will simply iterate over the characters in the string folderpath. E.g., the output of:
folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21'
for zipfile in folderpath:
    print(zipfile)
Would be:
/
n
e
o
d
c
/
s
e
n
t
i
n
e
l
1
a
/
...and so forth.
read is not a context manager
Your code has:
with read(kfile) as k:
There is no read built-in, and the .read method on files cannot be used as a context manager.
KML is XML
You're looking for "lines beginning with <coordinates>", but KML files are not line based. An entire KML file could be a single line and it would still be valid.
You are much better off using an XML parser to parse XML.
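Putting the points above together, here is a rough sketch (written for Python 3) that walks a directory tree, opens each .zip, and parses every .kml member with the standard xml.etree.ElementTree parser instead of scanning lines. The function name iter_kml_coordinates is made up for this illustration, and matching on the local tag name coordinates is an assumption about how the files are structured.

```python
import os
import zipfile
import xml.etree.ElementTree as ET


def iter_kml_coordinates(top):
    """Yield (zip_path, member_name, coordinates_text) for each
    <coordinates> element in every .kml inside every .zip under `top`."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            if not name.lower().endswith(".zip"):
                continue
            zip_path = os.path.join(root, name)
            with zipfile.ZipFile(zip_path) as archive:
                for member in archive.namelist():
                    if not member.lower().endswith(".kml"):
                        continue
                    with archive.open(member) as kml:
                        tree = ET.parse(kml)
                    for elem in tree.iter():
                        # Tags look like "{namespace}coordinates"; compare
                        # only the local part so any KML namespace matches.
                        if elem.tag.rsplit("}", 1)[-1] == "coordinates" and elem.text:
                            yield zip_path, member, elem.text.strip()
```

A caller could then loop with for zip_path, member, coords in iter_kml_coordinates('/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/'): and print coords. The archive is only read, so this should also work on a read-only file system.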
There are many ways to search a directory for files containing a string, but that's not really my question. Is there something built into Kivy that automatically searches for files (*.mp3) in a directory and its subdirectories, or do I have to write that myself?
If I have to do so, how do I get all the files and subdirs in a directory?
Thank you :)
Finally I decided coding the needed function on my own:
import os

def listfiles(path):
    files = []
    for base, _dirs, filenames in os.walk(path):
        for filename in filenames:
            # os.path.join picks the right separator for the platform
            files.append(os.path.join(base, filename))
    return files

print(listfiles("/path/path/"))
Checking files for extensions should be easy enough :)
Unfortunately the process might take very long for bigger directories, so I'm still looking for a different solution.
Check out FileChooserController in kivy.uix.filechooser and its files property:
The list of files in the directory specified by path after applying the filters.
files is a read-only ListProperty.
#edit
Here's what I also found in Kivy docs and this one seems even nicer:
from kivy.uix.filechooser import FileSystemLocal
file_system = FileSystemLocal()
file_system.listdir('/path/to/dir') # this returns a list of files in dir
If you need to browse something other than the local file system, you can subclass FileSystemAbstract instead of using FileSystemLocal.
files = []  # list with files in directory
suff = ('.mp3', '.wav')
for i in files:
    if i.endswith(suff):
        print(i)
To check several suffixes at once, they need to be passed to endswith as a tuple.
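Combining the os.walk approach with the tuple of suffixes gives something like the following sketch, which does not depend on Kivy at all. The name find_files and the case-insensitive matching are assumptions made for this illustration.

```python
import os


def find_files(top, suffixes=(".mp3",)):
    """Return sorted paths under `top` whose lowercased name ends
    with one of `suffixes` (hypothetical helper)."""
    matches = []
    for base, _dirs, filenames in os.walk(top):
        for filename in filenames:
            # str.endswith accepts a tuple and returns True if any suffix matches
            if filename.lower().endswith(tuple(suffixes)):
                matches.append(os.path.join(base, filename))
    return sorted(matches)
```

For example, find_files("/path/path/", (".mp3", ".wav")) would return every matching file in that tree, including subdirectories.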
zip = zipfile.ZipFile(destination+ff_name,"w")
zip.write(source)
zip.close()
Above is the code that I am using, and here "source" is the path of the directory. But when I run this code it just zips the source folder itself and not the files and folders contained in it. I want it to compress the source folder recursively. Using the tarfile module I can do this without passing any additional information.
The standard os.walk() function will likely be of great use for this. (The older os.path.walk() exists only in Python 2 and was removed in Python 3.)
Alternatively, reading the tarfile module to see how it does its work will certainly be of benefit. Indeed, looking at how pieces of the standard library were written was an invaluable part of my learning Python.
I haven't tested this exactly, but it's something similar to what I use.
zip = zipfile.ZipFile(destination+ff_name, 'w', zipfile.ZIP_DEFLATED)
rootlen = len(source) + 1  # length of the source prefix plus its separator
for base, dirs, files in os.walk(source):
    for file in files:
        fn = os.path.join(base, file)
        zip.write(fn, fn[rootlen:])  # store the path relative to source
zip.close()
This example is from here:
http://bitbucket.org/jgrigonis/mathfacts/src/ff57afdf07a1/setupmac.py
I'd like to add a "new" python 2.7 feature to this topic: ZipFile can be used as a context manager and therefore you can do things like this:
with zipfile.ZipFile(my_file, 'w') as myzip:
    rootlen = len(xxx) #use the sub-part of path which you want to keep in your zip file
    for base, dirs, files in os.walk(pfad):
        for ifile in files:
            fn = os.path.join(base, ifile)
            myzip.write(fn, fn[rootlen:])
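As a self-contained variant of the snippets above, here is a sketch that wraps the walk in a function and uses os.path.relpath for the archive names instead of slicing by rootlen, so a trailing slash on the source path does not matter. The function name zip_tree is made up for this illustration, and it assumes the destination archive lies outside the source directory.

```python
import os
import zipfile


def zip_tree(source, destination):
    """Recursively write every file under `source` into the zip archive
    at `destination`, using paths relative to `source` (hypothetical helper)."""
    with zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as archive:
        for base, _dirs, files in os.walk(source):
            for name in files:
                full_path = os.path.join(base, name)
                # The second argument is the name stored inside the archive.
                archive.write(full_path, os.path.relpath(full_path, source))
```

Like the versions above, this stores files only, so empty directories are not preserved.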