In the mypy.ini file I have:
[mypy]
exclude = ['tests', 'build']
Which works alright as long as both of these directories exist; if one of them doesn't, I get the following error message:
There are no .py[i] files in directory '.'
I'm wondering if there's a way to exclude these directories without this error - or if not - if there's a way to explicitly tell mypy which directories it should check.
The issue is that exclude expects a regular expression in the INI file (see the docs). I'm not great with regex, but you can start off with
[mypy]
exclude = (build|tests)
The documentation on the --exclude parameter is also useful (relevant parts copied below).
A regular expression that matches file names, directory names and paths which mypy should ignore while recursively discovering files to check. Use forward slashes on all platforms.
For instance, to avoid discovering any files named setup.py you could pass --exclude '/setup.py$'. Similarly, you can ignore discovering directories with a given name by e.g. --exclude /build/ or those matching a subpath with --exclude /project/vendor/. To ignore multiple files / directories / paths, you can provide the --exclude flag more than once, e.g. --exclude '/setup.py$' --exclude '/build/'.
Also see https://github.com/python/mypy/issues/10310 for relevant discussion.
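For completeness, here's a sketch of what the config might look like using mypy's verbose-regex form, plus the files option, which addresses the second part of the question by listing explicitly what mypy should check (the src value is a placeholder for your actual source directory):

```ini
[mypy]
# (?x) turns on verbose regex mode so the pattern can span lines and
# carry comments; ^ anchors each alternative to the start of the path,
# so only top-level build/ and tests/ directories are excluded
exclude = (?x)(
    ^build/
    | ^tests/
  )
# Alternatively, tell mypy directly which paths to check:
files = src
```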
I'm using a basic python script to create an archive with the contents of a directory "directoryX":
shutil.make_archive('NameOfArchive', format='gztar', root_dir=getcwd()+'/directoryX/')
The generated archive rather than just storing the contents of directoryX, creates a . folder in the archive (and the contents of folder directoryX are stored in this . folder).
Interestingly this only happens with .tar and tar.gz but not with .zip
Python version used: 3.8.10
It seems that when using .tar or .tar.gz formats, the default base_dir of "./" gets accepted literally and it creates a folder titled "."
I tried using base_dir=os.curdir but got the same results...
I also tried Python 2, with the same results.
Is this a bug in shutil.make_archive, or am I doing something incorrectly?
It's documented behavior, sort of; it's just a little odd. The base_dir argument to make_archive is documented to:
Be the directory we start archiving from (after chdiring to root_dir)
Default to the current directory (specifically, os.curdir)
os.curdir is actually a constant string, '.', and, matching the tar command-line utility, shutil.make_archive (and TarFile.add, which it's implemented in terms of) stores the complete path "given" (in this case, './' plus the rest of the relative path to the file). If you run tar -c -z -C directoryX -f NameOfArchive.tar.gz ., you'll end up with a tarball full of ./-prefixed files too (-C directoryX does the same thing as root_dir, and the . argument is the same as the default base_dir='.').
I don't see an easy workaround that retains the simplicity of shutil.make_archive; if you try to pass base_dir='' it dies when it tries to stat '', so that's out.
To be clear, this behavior should be fine; a tar entry named ./foo and one named foo are equivalent for most purposes. If it really bothers you, you can switch to using the tarfile module directly, e.g.:
# Imports at top of file
import os
import tarfile

# Actual code
with tarfile.open('NameOfArchive.tar.gz', 'w:gz') as tar:
    for entry in os.scandir('directoryX'):
        # Operates recursively on any directories, using the arcname as the base,
        # so you add the whole tree just by adding all the entries in the top
        # level directory. Using arcname of entry.name means it's equivalent to
        # adding os.path.basename(entry.path), omitting all directory components
        tar.add(entry.path, arcname=entry.name)

# The whole loop *could* be replaced with just:
#   tar.add('directoryX', arcname='')
# which would add all contents recursively, but it would also put an entry
# for '/' in, which is undesirable
For a directory structure like:
directoryX/
|
\- foo
\- bar
\- subdir/
   |
   \- spam
   \- eggs
the resulting tar's contents would be:
foo
bar
subdir/
subdir/eggs
subdir/spam
vs. the:
./foo
./bar
./subdir/
./subdir/eggs
./subdir/spam
your current code produces.
Slightly more work to code, but not that much worse: two imports and three lines of code, with greater control over what gets added. For example, you could trivially exclude symlinks by wrapping the tar.add call in an if not entry.is_symlink(): block, or skip recursing into specific directories by passing recursive=False to tar.add for the directories whose contents you don't want. You can even provide a filter function to tar.add to conditionally exclude specific entries even when deep recursion gets involved.
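As a sketch of that last idea (the helper name and the .pyc criterion are made up for illustration), a version that skips top-level symlinks and filters out compiled Python files at any depth might look like:

```python
import os
import tarfile

def exclude_pyc(tarinfo):
    # Returning None excludes the entry; returning the TarInfo keeps it
    return None if tarinfo.name.endswith('.pyc') else tarinfo

def make_filtered_tar(src_dir, out_path):
    """Archive src_dir's contents without a leading './', skipping
    top-level symlinks and all .pyc files."""
    with tarfile.open(out_path, 'w:gz') as tar:
        for entry in os.scandir(src_dir):
            if entry.is_symlink():
                continue  # don't archive symlinks at the top level
            tar.add(entry.path, arcname=entry.name, filter=exclude_pyc)
```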
I have recently tried to make a python code which takes a path of a file without an extension and determine what extension it has.
I was looking for something like the example below. In the example the extension is exe (but the code doesn't know that yet).
path = 'C:\\MyPath\\Example'
#takes the path above and guesses the programs extension:
extension = guess_extension(path)
#adds the extension to the path:
fullPath = path+extension
print(fullPath)
Output:
C:\MyPath\Example.exe
If you know a python module that would do that (or something similar), please list it below.
I have tried the filetype (filetype.guess()) and mimetypes (mimetypes.guess_extension()) modules, but they both returned None.
I have also tried to use answers from many questions like this one, but that still didn't work.
It sounds like the built-in glob module (glob docs) might be what you're looking for. This module provides Unix-style pattern-expansion functionality within Python.
In the following example, the incomplete path variable has the string .* appended to it when passed to glob.glob. This essentially tells glob.glob to return a list of valid paths found on the host system that start the same as path, followed by a period (designating a file extension), with the asterisk matching any and all characters after path + '.'.
import glob
path = r'C:\Program Files\Firefox Developer Edition\minidump-analyzer'
full = glob.glob(path+'.*')
print(full[0])
Output: C:\Program Files\Firefox Developer Edition\minidump-analyzer.exe
It is worth noting that the above is just an illustration of how glob could be leveraged as part of a solution to your question. Proper handling of unexpected inputs, edge cases, exceptions etc. should be implemented as required by the needs of your program.
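To make that concrete, here is a small sketch of such a wrapper with basic edge-case handling (the function name mirrors the one from the question; glob.escape guards against wildcard characters in the path itself):

```python
import glob
import os

def guess_extension(path):
    """Return the extension of the first file matching path + '.<something>',
    or None if nothing matches."""
    matches = glob.glob(glob.escape(path) + '.*')
    if not matches:
        return None
    # os.path.splitext splits 'Example.exe' into ('Example', '.exe')
    return os.path.splitext(matches[0])[1]
```

Note that if several files share the same base name with different extensions, this simply returns the first match glob found.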
I'm trying to get the names of subdirectories with a Python 3 script on Windows 10.
Thus, I wrote code as follows:
from pathlib2 import Path
p = "./path/to/target/dir"
[str(item) for item in Path(p).rglob(".")]
# obtains only subdirectory path names, including the target directory itself
It is good for me to get this result, but I don't know why this rglob pattern returns it.
Can someone explain this?
Thanks.
Every directory in a POSIX-style filesystem features two special entries from the get-go: .., which refers to the parent directory, and ., which refers to the current directory:
$ mkdir tmp; cd tmp
tmp$ ls -a
. ..
tmp$ cd .
tmp$ # <-- still in the same directory
- with the notable exception of /.., which refers to the root itself, since the root has no parent.
A Path object from Python's pathlib is, when it is created, just a wrapper around a string that is assumed to point somewhere into the filesystem. It only refers to something tangible when it is resolved:
>>> Path('.')
PosixPath('.') # just a fancy string
>>> Path('.').resolve()
PosixPath('/current/working/dir') # an actual point in your filesystem
The bottom line is that
the paths /current/working/dir and /current/working/dir/. are, from the filesystem's point of view, completely equivalent, and
a pathlib.Path will also reflect that as soon as it is resolved.
By globbing for the pattern ., you found all the links pointing to the current directories at or below the initial directory. The results from glob are resolved on return, so the . no longer appears in them.
As a source for this behavior, see this section of PEP 428 (which serves as the specification for pathlib), where it briefly mentions path equivalence.
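If the goal is simply to enumerate subdirectories, a more explicit sketch that doesn't rely on the '.' quirk could look like this (using the standard-library pathlib rather than pathlib2; the function name is made up):

```python
from pathlib import Path

def list_subdirectories(root):
    """Return every directory below root (root itself not included)."""
    # rglob('*') yields every entry recursively; is_dir() keeps only dirs
    return sorted(p for p in Path(root).rglob('*') if p.is_dir())
```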
I am a very novice coder, and Python is my first (and, practically speaking, only) language. As part of a research job I am charged with manipulating a collection of data-analysis scripts, first by getting them to run on my computer. I was able to do this, essentially by removing all lines of code identifying paths and running the scripts through a Jupyter terminal opened in the directory where the relevant modules and CSV files live, so the scripts know where to look (I know that Python defaults to the location of the terminal).
Here are the particular blocks of code whose function I don't understand
import sys
sys.path.append('C:\Users\Ben\Documents\TRACMIP_Project\mymodules/')
import altdata as altdata
I have replaced the path name in the original code with the path leading to the directory where the module is; the folder containing all the CSV files that end up being referenced here is also in mymodules.
This works depending on where I open the terminal, but the only way I can get it to work consistently is by opening the terminal in mymodules, which is fine for now but won't work when I need to access the server remotely. I need to understand better precisely what is being done here and how it relates to the location of the terminal (all the documentation I've found is overly technical for my knowledge level).
Here is another segment I don't understand
import os.path
csvfile = 'csv/' + model +'_' + exp + '.csv'
if os.path.isfile(csvfile): # csv file exists
hcsvfile = open(csvfile )
I get here that it's looking for the CSV file, but I'm not sure how. I'm also not sure why then on some occasions depending on where I open the terminal it's able to find the module but not the CSV files.
I would love an explanation of what I've presented, but more generally I would like information (or a link to information) explaining paths and how they work in scripts and modules, as well as ways of manipulating them. Thanks.
sys.path
This is a simple list of directories where Python will look for modules and packages (.py files and directories with an __init__.py file; see the modules tutorial). Extending this list allows you to load modules (custom libs, etc.) from non-default locations (usually you change it at runtime; for static directories you can modify your startup script to set the needed environment variables).
os.path
This module implements some useful functions on pathnames.
... and allows you to find out if file exists, is it link, dir, etc.
Why did loading *.csv fail?
Because sys.path is responsible for module loading, and only for that. When you use a relative path:
csvfile = 'csv/' + model +'_' + exp + '.csv'
open() will look in the current working directory:
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory)...
You need to use absolute paths, constructed with the os.path module.
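A minimal sketch of that construction (model and exp are placeholders from the question; in a real script, base_dir would typically be os.path.dirname(os.path.abspath(__file__)) so the path is anchored to the script rather than the terminal):

```python
import os

def csv_path(base_dir, model, exp):
    """Build an absolute path to <base_dir>/csv/<model>_<exp>.csv,
    independent of the current working directory."""
    return os.path.join(os.path.abspath(base_dir), 'csv',
                        model + '_' + exp + '.csv')
```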
I agree with cdarke's comment that you are probably running into an issue with backslashes. Replacing the line with:
sys.path.append(r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules')
will likely solve your problem. Details below.
In general, Python treats paths as if they're relative to the current directory (where your terminal is running). When you feed it an absolute path-- which is a path that includes the root directory, like the C:\ in C:\Users\Ben\Documents\TRACMIP_Project\mymodules-- then Python doesn't care about the working directory anymore, it just looks where you tell it to look.
Backslashes are used to make special characters within strings, such as line breaks (\n) and tabs (\t). The snag you've hit is that Python paths are strings first, paths second. So sequences like the \U in your path get misinterpreted as escape sequences and mess up Python's path interpretation. If you prefix the string with r, Python ignores the special meaning of the backslash and interprets it literally (what you want).
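A quick illustration of the equivalent spellings (path shortened for the example):

```python
# These spellings all name the same Windows path:
p1 = 'C:\\Users\\Ben\\Documents'  # escaped backslashes
p2 = r'C:\Users\Ben\Documents'    # raw string: backslashes taken literally
p3 = 'C:/Users/Ben/Documents'     # forward slashes, which Windows APIs accept
```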
The reason it still works if you run the script from the mymodules directory is that Python automatically looks in the working directory for files. sys.path.append(path) tells the interpreter to also search that directory for modules, so you can import files from it no matter where you're running the script. The faulty path still gets added, but it's meaningless: no directory exists where it points, so there's nothing to find there.
As for path manipulation in general, the "safest" way is to use the functions in os.path, which are platform-independent and will give the correct output whether you're working in a Windows or a Unix environment (usually).
EDIT: Forgot to cover the second part. Since Python paths are strings, you can build them using string operations. That's what is happening with the line
csvfile = 'csv/' + model +'_' + exp + '.csv'
Presumably model and exp are strings that appear in the filenames in the csv/ folder. With model = "foo" and exp = "bar", you'd get csv/foo_bar.csv, which is a relative path to a file (that is, relative to your working directory). The code makes sure a file actually exists at that path and then opens it. Assuming the csv/ folder is in the same path as you added in sys.path.append, this path should work regardless of where you run the file, but I'm not 100% certain of that. EDIT: outoftime pointed out that sys.path.append only works for modules, not for opening files, so you'll need to either expand csv/ into an absolute path or always run in its parent directory.
Also, I think Python is smart enough not to care about the direction of slashes in paths, but you should not mix them: all backslashes or all forward slashes. os.path.join will use the correct separator for you. I'd probably change the line to
csvfile = os.path.join('csv', model + '_' + exp + '.csv')
for consistency's sake.
I want to copy an installer file from a location where one of the folder names changes as per the build number
This works for defining the path where the last folder name changes:
import glob
import os
dirname = "z:\\zzinstall\\*.install"
filespec = "setup.exe"
print glob.glob (os.path.join (dirname, filespec))
# the print is how I'm verifying the path is correct
['z:\\zzinstall\\35115.install\\setup.exe']
The problem I have is that I can't get the setup.exe to launch due to the arguments needed
I need to launch setup.exe with, for example
setup.exe /S /z"
There are numerous other arguments that need to be passed, with double quotes, slashes, and whitespace. Because the documentation provided is inconsistent, I have to test via trial and error. There are even instances that state I need to use a "" after a switch!
So how can I do this?
Ideally I'd like to pass the entire path, including the file, to glob, or
I'd like to store the path found by glob in a variable, then concatenate it with setup.exe and the arguments. That did not work; I get an error that a list can't be concatenated with a string.
Basically, anything that works. So far I've failed because of my inability to handle the varying filename and the obscene number of whitespaces and special characters in the arguments.
The following link is related; however, it does not include a clear answer to my specific question:
link text
The response provided below does not answer the question, nor does the link I provided; that's why I'm asking this question. I will rephrase in case I'm not understood.
I have a file that I need to copy at random times. The file's path contains a unique, unpredictable number, e.g. a build number. Note this is a Windows system.
For this example I will cite the same folder/file structure.
The build server creates a build at any time in a 4-hour range. The path to the build server folder is Z:\data\builds\*.install\setup.exe
Note the wildcard in the path. The folder name is prepended with a random (yes, random) string of 8 digits, then a dot, then "install". So the path at one time may be Z:\data\builds\12345678.install\setup.exe, or it could be Z:\data\builds\66666666.install\setup.exe. This is one major portion of the problem. Note, I did not design this build numbering system; I've never seen anything like it in my years as a QA engineer.
So to deal with the first issue, I plan on using a glob:
import glob
import os
dirname = "Z:\\data\\builds\\*.install"
filespec = "setup.exe"
instlpath = glob.glob (os.path.join (dirname, filespec))
print instlpath # this is the test; it prints the accurate path to launch an install.
                # The problem is I have to add arguments
OK, so I thought I could take the path I defined as instlpath, concatenate it, and execute it.
When I try to use print to test:
print instlpath + [" /S /z" ]
I get
['Z:\builds\install\12343333.install\setup.exe', ' /S /z']
I need
Z:\builds\install\12343333.install\setup.exe /S /z" # yes, I need the whitespace as well, and may also need a z""
Why are all of the installs called setup.exe and not uniquely named? No freaking idea!
Thank You,
Surfdork
The related question you linked to does contain a relatively clear answer to your problem:
import subprocess
subprocess.call(['z:/zzinstall/35115.install/setup.exe', '/S', '/z', ''])
So you don't need to concatenate the path of setup.exe and its arguments. The arguments you specify in the list are passed directly to the program and not processed by the shell. For an empty string, which would be "" in a shell command, use an empty python string.
See also http://docs.python.org/library/subprocess.html#subprocess.call
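Putting the glob and the subprocess call together, a sketch might look like this (the function name is made up, build_root stands in for your Z:\data\builds directory, and the switch list is whatever your installer actually needs):

```python
import glob
import os
import subprocess

def run_installer(build_root, args):
    """Find setup.exe under any *.install folder in build_root and run it
    with the given argument list. Returns the installer's exit code, or
    None if no installer was found."""
    matches = glob.glob(os.path.join(build_root, '*.install', 'setup.exe'))
    if not matches:
        return None
    # Each switch is its own list element; subprocess handles quoting,
    # so slashes, whitespace, and "" need no manual escaping
    return subprocess.call([matches[0]] + list(args))
```

For example, run_installer(r'Z:\data\builds', ['/S', '/z', '']) would launch whichever build is currently present.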