How to allow opening only files in the current directory in Python 3?

I am writing a simple file server in Python. The filename is provided by the client and should be considered untrusted. How can I verify that it corresponds to a file inside the current directory (directly in it or in any of its subdirectories)? Will something like:
import os
pwd = os.getcwd()
if os.path.commonpath((pwd, os.path.abspath(filename))) == pwd:
    open(filename, 'rb')
suffice?

Convert the filename to a canonical path using os.path.realpath, get the directory portion, and see if the current directory (in canonical form) is a prefix of that:
import os, os.path
def in_cwd(fname):
    path = os.path.dirname(os.path.realpath(fname))
    return path.startswith(os.getcwd())
By converting fname to a canonical path we handle symbolic links and paths containing ../.
Update
Unfortunately, the above code has a little problem. For example,
'/a/b/cd'.startswith('/a/b/c')
returns True, but we definitely don't want that behaviour here! Fortunately, there's an easy fix: we just need to append os.sep to the paths before performing the prefix test. The new version also handles any OS pathname case-insensitivity issues via os.path.normcase.
import os, os.path
def clean_dirname(dname):
    dname = os.path.normcase(dname)
    return os.path.join(dname, '')
def in_cwd(fname):
    cwd = clean_dirname(os.getcwd())
    path = os.path.dirname(os.path.realpath(fname))
    path = clean_dirname(path)
    return path.startswith(cwd)
Thanks to DSM for pointing out the flaw in the previous code.
Here's a version that's a little more efficient. It uses os.path.commonpath, which is more robust than appending os.sep and doing a string prefix test.
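import os
def in_cwd(fname):
    cwd = os.path.normcase(os.getcwd())
    path = os.path.normcase(os.path.dirname(os.path.realpath(fname)))
    try:
        return os.path.commonpath((path, cwd)) == cwd
    except ValueError:  # raised on Windows when the two paths are on different drives
        return False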
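On Python 3.9 and later, pathlib offers a component-wise containment test that sidesteps the string-prefix pitfall discussed above. The sketch below assumes Python 3.9+ (for Path.is_relative_to), and is_inside is a hypothetical helper name. Unlike in_cwd, it checks the resolved file path itself rather than only its directory.

```python
from pathlib import Path


def is_inside(base, fname):
    """Return True if fname, resolved against base, stays inside base.

    Hypothetical helper; assumes Python 3.9+ for Path.is_relative_to.
    """
    base = Path(base).resolve()
    # resolve() makes the path absolute, follows symlinks and collapses '..';
    # an absolute fname replaces base entirely when joined with '/'
    target = (base / fname).resolve()
    # is_relative_to compares whole path components, so /a/b/cd is not
    # considered to be inside /a/b/c
    return target.is_relative_to(base)
```

In the file server this would be called as is_inside(os.getcwd(), filename) before opening the file.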

Related

Why does root returned from os.walk() contain / as directory separator but os.sep (or os.path.sep) return \ on Win10?

Why does the root element returned from os.walk() show / as the directory separator but os.sep (or os.path.sep) shows \ on Win10?
I'm just trying to create the complete path for a set of files in a folder as follows:
import os
base_folder = "c:/data/MA Maps"
for root, dirs, files in os.walk(base_folder):
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            print(os.path.join(root, f))
print(os.path.sep)
Here's what I get as an output:
c:/data/MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:/data/MA Maps\Map_of_Massachusetts_Norfolk_County.png
\
I understand that some of Python's library functions (like open()) will work with mixed path separators (at least on Windows), but relying on that hack really can't be trusted across all libraries. It just seems like the items returned from os.walk() and os.path (.sep or .join()) should yield consistent results based on the operating system being used. Can anyone explain why this inconsistency is happening?
P.S. - I know there is a more consistent library for working with file paths (and lots of other file manipulation) called pathlib that was introduced in Python 3.4, and it does seem to fix all this. If your code is being used in 3.4 or beyond, is it best to use pathlib methods to resolve this issue? But if your code is targeted at systems using Python before 3.4, what is the best way to address this issue?
Here's a good basic explanation of pathlib: Python 3 Quick Tip: The easy way to deal with file paths on Windows, Mac and Linux
Here's my code & result using pathlib:
import os
from pathlib import Path
# All of this should work properly for any OS. I'm running Win10.
# You can even mix up the separators used (e.g. "c:\data/MA Maps") and pathlib still
# returns the consistent result given below.
base_folder = "c:/data/MA Maps"
for root, dirs, files in os.walk(base_folder):
    # This changes the root path provided to one using the current operating system's
    # path separator (\ on Win10).
    root_folder = Path(root)
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            # The / operator, when used with a pathlib object, just concatenates
            # the path segments using the current operating system's path separator.
            print(root_folder / f)
c:\data\MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:\data\MA Maps\Map_of_Massachusetts_Norfolk_County.png
This can be done even more succinctly using only pathlib and a list comprehension (with all path separators handled correctly for the OS in use):
from pathlib import Path
base_folder = "c:/data/MA Maps"
path = Path(base_folder)
# test only the file name, so folder names containing "_N" don't cause false matches
files = [item for item in path.iterdir() if item.is_file() and
         item.name.endswith(".png") and
         "_N" in item.name]
for file in files:
    print(file)
c:\data\MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:\data\MA Maps\Map_of_Massachusetts_Norfolk_County.png
This is very Pythonic and at least I feel it is quite easy to read and understand. .iterdir() is really powerful and makes dealing with files and dirs reasonably easy and in a cross-platform way. What do you think?
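One difference worth noting: iterdir() lists only the immediate contents of base_folder, whereas os.walk descends into every subfolder. If the recursive behaviour is wanted, Path.rglob is one pathlib option. A minimal sketch; find_maps is a hypothetical helper name:

```python
from pathlib import Path


def find_maps(base_folder):
    """Recursively collect .png files whose name contains '_N' (hypothetical helper)."""
    base = Path(base_folder)
    # rglob("*.png") matches in base and in every subdirectory below it
    return sorted(p for p in base.rglob("*.png") if p.is_file() and "_N" in p.name)
```

Usage would be, for example, `for file in find_maps("c:/data/MA Maps"): print(file)`.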
The os.walk function always yields the initial part of the dirpath unchanged from what you pass in to it. It doesn't try to normalize the separators itself; it just keeps what you've given it. It does use the system-standard separators for the rest of the path, as it joins each subdirectory's name onto the root directory with os.path.join. You can see the current version of the implementation of the os.walk function in the CPython source repository.
One option for normalizing the separators in your output is to normalize the base path you pass in to os.walk, perhaps using pathlib. If you normalize the initial path, all the output should use the system path separators automatically, since it will be the normalized path that will be preserved through the recursive walk, rather than the non-standard one. Here's a very basic transformation of your first code block to normalize the base_folder using pathlib, while preserving all the rest of the code, in its simplicity. Whether it's better than your version using more of pathlib's features is a judgement call that I'll leave up to you.
import os
from pathlib import Path
base_folder = Path("c:/data/MA Maps")  # this will be normalized when converted to a string
for root, dirs, files in os.walk(base_folder):
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            print(os.path.join(root, f))
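For code that has to run on Python versions before 3.4, one option is os.path.normpath, which on Windows also converts forward slashes to backslashes. The sketch below calls ntpath.normpath directly (ntpath is the module that os.path refers to on Windows) only so that the conversion can be seen on any platform; in real code you would call os.path.normpath on base_folder before passing it to os.walk.

```python
import ntpath

base_folder = "c:/data/MA Maps"
# ntpath is the Windows implementation behind os.path; its normpath collapses
# redundant separators and turns '/' into '\\'
normalized = ntpath.normpath(base_folder)
print(normalized)
```

Since os.walk keeps the top path as given, walking the normalized path on Windows should yield roots that use backslashes throughout.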

How to get Path in the form "file://///SERVER//folder1/folder2/"

I am rather new to Python and I have the following problem (just an example):
import os
mypath = r'I:\Folder1'
for dirpath, _, filenames in os.walk(mypath):
    for f in filenames:
        getpath = os.path.abspath(os.path.join(dirpath, f))
returns the path in the form:
I:\Folder1\Folder2
which is normally OK for me.
However, "I:\" is one of our servers at work, and for further processing (HTML stuff) I need the exact address in this form:
file://///Servername/Subfolder/Folder1/Folder2
Edit: In other words:
My program may be used locally or on different servers; it just depends on the user. Put simply, I need a function that returns what Windows 10 gives you via "right-click on a folder --> Path Copy --> file:////....". All I know is that this path is called "I:\Folder1" on my computer, but "I:\Folder1" is the server name.
Edit 2: Solved (see comments)
If you are on a Windows platform and need forward slashes, you can import another OS's path module directly. For example, you could use posixpath.
To solve your problem, first strip mypath off the start of each returned dirpath. Next, split the remainder into folder components using your operating system's separator (i.e. \). These can then all be rejoined with a server prefix using posixpath.join(). For example:
import posixpath
import os
mypath = r'I:\Folder1'
server = 'file://///Servername/Subfolder'
for dirpath, _, filenames in os.walk(mypath):
    for f in filenames:
        subfolder = dirpath[len(mypath):]
        server_path = posixpath.join(server, *subfolder.split(os.sep), f)
        print(server_path)
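The same idea can be sketched with pathlib. PureWindowsPath does no filesystem access, so it handles Windows-style paths on any OS. to_server_url is a hypothetical helper, and the mapping of I:\Folder1 to file://///Servername/Subfolder/Folder1 is an assumption made for illustration; Python cannot discover that prefix by itself here.

```python
from pathlib import PureWindowsPath


def to_server_url(local_path, local_root, server_prefix):
    """Rewrite a path under local_root as a URL under server_prefix (hypothetical helper)."""
    # relative_to raises ValueError if local_path is not below local_root
    rel = PureWindowsPath(local_path).relative_to(PureWindowsPath(local_root))
    # rel.parts holds the remaining components, e.g. ('Folder2', 'page.html')
    return "/".join([server_prefix.rstrip("/"), *rel.parts])


print(to_server_url(r"I:\Folder1\Folder2\page.html", r"I:\Folder1",
                    "file://///Servername/Subfolder/Folder1"))
```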

Calling for relative paths in Python

I have the Python script below, which fetches a file from one location and copies it to another target location. The code works just fine if I define the paths as absolute locations.
I am trying to define these using variables instead, but when I do, the script does not seem to run: no error is thrown, but the code does not seem to be executed.
Code:
import os
import shutil
Path_from = r'/Users/user/Desktop/report'
Path_to = r'/Users/user/Desktop/report'
# `date` is not defined in this excerpt; presumably it is set earlier in the script
for root, dirs, files in os.walk(os.path.normpath(Path_from), topdown=False):
    for name in files:
        if name.endswith('{}.txt'.format(date)):
            print("Found")
            SourceFolder = os.path.join(root, name)
            shutil.copy2(SourceFolder, Path_to)
I want to change the code from
Path_from = r'/Users/user/Desktop/report'
to
base_path = /Users/user/Desktop/
Path_from = r'base_path/{}'.format(type)
I would recommend you leave all the current-working-directory concerns to the user: if they want to specify a relative path, they can change into the directory it relates to before invoking Python and then provide relative paths.
This is what just about every Linux tool and program does: rarely do they take a 'base path'; rather, they leave the job of providing valid paths (relative to the current directory, or absolute) to the user.
If you're dedicated to the idea of taking another parameter as the relative path, it should be pretty straightforward to do. Your example doesn't have valid python syntax, but it's close:
$ cat t.py
from os.path import join
basepath="/tmp"
pathA = "fileA"
pathB = "fileB"
print(join(basepath,pathA))
print(join(basepath,pathB))
Note, however, that this prevents an absolute path being provided at script execution time.
You could use a format instead,
basepath="/tmp"
pathA = "fileA"
pathB = "fileB"
print( "{}/{}".format(basepath, pathA) )
print( "{}/{}".format(basepath, pathB) )
But then you're assuming that you know how to join paths on the operating system in question, which is why os.path.join exists.
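Applied to the question's own variables, a sketch could look like the following. report_type is a hypothetical stand-in for the question's `type` variable (naming a variable `type` shadows the built-in type()). The original r'base_path/{}' does not work because base_path inside a string literal is just text, not the variable.

```python
import os

base_path = "/Users/user/Desktop"
report_type = "report"  # hypothetical value; the question calls this variable `type`

# os.path.join inserts the separator appropriate for the current OS
path_from = os.path.join(base_path, report_type)
print(path_from)
```

One detail of os.path.join to keep in mind: if a later argument is an absolute path, the earlier arguments are discarded, so os.path.join(base_path, "/tmp/other") returns "/tmp/other" on POSIX systems.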
If I'm reading this right, you could use pathlib, specifically pathlib.Path. The code would look like:
from pathlib import Path
import re
import shutil
path_from = Path("/") / "Users" / "user" / "Desktop"  # Better IMO
# path_from = Path("/Users/user/Desktop")
path_to = Path("/") / "Users" / "user" / "OtherDesktop"
datename = "whatever"
for x in path_from.glob("*.txt"):
    # stem is whatever is before the extension, e.g. something.txt -> something
    # re.escape treats any regex metacharacters in datename literally
    if re.search(r"{}$".format(re.escape(datename)), x.stem):
        shutil.copy(str(path_from / x.name), str(path_to / x.name))

Safely extract zip or tar using Python

I'm trying to extract user-submitted zip and tar files to a directory. The documentation for zipfile's extractall method (similarly with tarfile's extractall) states that it's possible for paths to be absolute or contain .. paths that go outside the destination path. Instead, I could use extract myself, like this:
import zipfile
some_path = '/destination/path'
some_zip = '/some/file.zip'
zipf = zipfile.ZipFile(some_zip, mode='r')
for subfile in zipf.namelist():
    zipf.extract(subfile, some_path)
Is this safe? Is it possible for a file in the archive to wind up outside of some_path in this case? If so, what way can I ensure that files will never wind up outside the destination directory?
Note: Starting with Python 2.7.4, this is a non-issue for ZIP archives. Details at the bottom of the answer. This answer focuses on tar archives.
To figure out where a path really points to, use os.path.abspath() (but note the caveat about symlinks as path components). If you normalize a path from your zipfile with abspath and it does not contain the current directory as a prefix, it's pointing outside it.
But you also need to check the value of any symlink extracted from your archive (both tarfiles and unix zipfiles can store symlinks). This is important if you are worried about a proverbial "malicious user" that would intentionally bypass your security, rather than an application that simply installs itself in system libraries.
That's the aforementioned caveat: abspath will be misled if your sandbox already contains a symlink that points to a directory. Even a symlink that points within the sandbox can be dangerous: The symlink sandbox/subdir/foo -> .. points to sandbox, so the path sandbox/subdir/foo/../.bashrc should be disallowed. The easiest way to do so is to wait until the previous files have been extracted and use os.path.realpath(). Fortunately extractall() accepts a generator, so this is easy to do.
Since you ask for code, here's a bit that explicates the algorithm. It prohibits not only the extraction of files to locations outside the sandbox (which is what was requested), but also the creation of links inside the sandbox that point to locations outside the sandbox. I'm curious to hear if anyone can sneak any stray files or links past it.
import tarfile
from os.path import abspath, realpath, dirname, join as joinpath
from sys import stderr

def resolved(x):
    return realpath(abspath(x))

def inside(path, base):
    # Component-safe prefix test: "/sandbox2" must not count as inside "/sandbox"
    return path == base or path.startswith(joinpath(base, ""))

def badpath(path, base):
    # joinpath will ignore base if path is absolute
    return not inside(resolved(joinpath(base, path)), base)

def badlink(info, base):
    # Symlink targets are interpreted relative to the directory containing the link;
    # hard-link targets are member names, i.e. relative to the extraction root
    tip = resolved(joinpath(base, dirname(info.name))) if info.issym() else base
    return not inside(resolved(joinpath(tip, info.linkname)), base)

def safemembers(members, base):
    for finfo in members:
        if badpath(finfo.name, base):
            print(finfo.name, "is blocked (illegal path)", file=stderr)
        elif finfo.issym() and badlink(finfo, base):
            print(finfo.name, "is blocked: Symlink to", finfo.linkname, file=stderr)
        elif finfo.islnk() and badlink(finfo, base):
            print(finfo.name, "is blocked: Hard link to", finfo.linkname, file=stderr)
        else:
            yield finfo

# Check members against the directory they are actually extracted into
sandbox = resolved("./sandbox")
with tarfile.open("testtar.tar") as ar:
    ar.extractall(path=sandbox, members=safemembers(ar, sandbox))
Edit: Starting with python 2.7.4, this is a non-issue for ZIP archives: The method zipfile.extract() prohibits the creation of files outside the sandbox:
Note: If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows, illegal characters (:, <, >, |, ", ?, and *) [are] replaced by underscore (_).
The tarfile class has not been similarly sanitized, so the above answer still applies.
Contrary to the popular answer, unzipping files safely is not completely solved as of Python 2.7.4. The extractall method is still dangerous and can lead to path traversal, either directly or through the unzipping of symbolic links. Here was my final solution which should prevent both attacks in all versions of Python, even versions prior to Python 2.7.4 where the extract method was vulnerable:
import zipfile, os
def safe_unzip(zip_file, extract_path='.'):
    # trailing separator, so that e.g. /dest2/x is not accepted as lying inside /dest
    base = os.path.join(os.path.realpath(extract_path), '')
    with zipfile.ZipFile(zip_file, 'r') as zf:
        for member in zf.infolist():
            file_path = os.path.realpath(os.path.join(extract_path, member.filename))
            if file_path.startswith(base):
                zf.extract(member, extract_path)
Edit 1: Fixed variable name clash. Thanks Juuso Ohtonen.
Edit 2: s/abspath/realpath/g. Thanks TheLizzard
Use ZipFile.infolist()/TarFile.next()/TarFile.getmembers() to get the information about each entry in the archive, normalize the path, open the file yourself, use ZipFile.open()/TarFile.extractfile() to get a file-like for the entry, and copy the entry data yourself.
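A sketch of that approach for ZIP archives follows. It is one possible reading of the paragraph above, not a reference implementation: copy_members is a hypothetical name, entries whose normalized path would leave dest are silently skipped, and only regular files and directories are handled (no permissions, timestamps or symlinks).

```python
import os
import shutil
import zipfile


def copy_members(zip_path, dest):
    """Copy each ZIP entry into dest by hand, skipping unsafe names (hypothetical helper)."""
    base = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # normalize the member name against the destination directory
            target = os.path.realpath(os.path.join(base, info.filename))
            try:
                inside = os.path.commonpath([base, target]) == base
            except ValueError:  # e.g. different drives on Windows
                inside = False
            if not inside:
                continue  # would land outside dest: skip it
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # open the target ourselves and stream the entry's data into it
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```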
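Copy the zipfile to an empty directory. Then use os.chroot to make that directory the root directory (this needs root privileges and is only available on Unix). Then unzip there.
Alternatively, you can call unzip itself with the -j ("junk paths") flag, which does not recreate the archive's directory structure and puts every file directly into the extraction directory: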
import subprocess
filename = '/some/file.zip'
rv = subprocess.call(['unzip', '-j', filename])

Find the current directory and file's directory [duplicate]

This question already has answers here:
How do you properly determine the current script directory?
How to know/change current directory in Python shell?
How do I determine:
the current directory (where I was in the shell when I ran the Python script), and
where the Python file I am executing is?
To get the full path to the directory a Python file is contained in, write this in that file:
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
(Note that the incantation above may not work if you've already used os.chdir() to change your current working directory, since the value of __file__ can be relative to the current working directory and is not changed by an os.chdir() call. Since Python 3.9, __file__ of the main script is an absolute path, so this no longer affects the script itself.)
To get the current working directory use
import os
cwd = os.getcwd()
Documentation references for the modules, constants and functions used above:
The os and os.path modules.
The __file__ module attribute
os.path.realpath(path) (returns "the canonical path of the specified filename, eliminating any symbolic links encountered in the path")
os.path.dirname(path) (returns "the directory name of pathname path")
os.getcwd() (returns "a string representing the current working directory")
os.chdir(path) ("change the current working directory to path")
Current working directory: os.getcwd()
And the __file__ attribute can help you find out where the file you are executing is located. This Stack Overflow post explains everything: How do I get the path of the current executed file in Python?
You may find this useful as a reference:
import os
print("Path at terminal when executing this file")
print(os.getcwd() + "\n")
print("This file path, relative to os.getcwd()")
print(__file__ + "\n")
print("This file full path (following symlinks)")
full_path = os.path.realpath(__file__)
print(full_path + "\n")
print("This file directory and name")
path, filename = os.path.split(full_path)
print(path + ' --> ' + filename + "\n")
print("This file directory only")
print(os.path.dirname(full_path))
The pathlib module, introduced in Python 3.4 (PEP 428: The pathlib module, object-oriented filesystem paths), makes working with paths much better.
pwd
/home/skovorodkin/stack
tree
.
└── scripts
├── 1.py
└── 2.py
In order to get the current working directory, use Path.cwd():
from pathlib import Path
print(Path.cwd()) # /home/skovorodkin/stack
To get an absolute path to your script file, use the Path.resolve() method:
print(Path(__file__).resolve()) # /home/skovorodkin/stack/scripts/1.py
And to get the path of a directory where your script is located, access .parent (it is recommended to call .resolve() before .parent):
print(Path(__file__).resolve().parent) # /home/skovorodkin/stack/scripts
Remember that __file__ is not reliable in some situations: How do I get the path of the current executed file in Python?.
Please note that Path.cwd(), Path.resolve() and other Path methods return path objects (PosixPath in my case), not strings. In Python 3.4 and 3.5 that caused some pain, because the built-in open function could only work with string or bytes objects and did not support Path objects, so you had to convert Path objects to strings or use the Path.open() method, but the latter option required you to change old code:
File scripts/2.py
from pathlib import Path
p = Path(__file__).resolve()
with p.open() as f: pass
with open(str(p)) as f: pass
with open(p) as f: pass
print('OK')
Output
python3.5 scripts/2.py
Traceback (most recent call last):
File "scripts/2.py", line 11, in <module>
with open(p) as f:
TypeError: invalid file: PosixPath('/home/skovorodkin/stack/scripts/2.py')
As you can see, open(p) does not work with Python 3.5.
PEP 519 (Adding a file system path protocol), implemented in Python 3.6, adds support for PathLike objects to the open function, so now you can pass Path objects to the open function directly:
python3.6 scripts/2.py
OK
To get the current directory full path:
>>> import os
>>> print(os.getcwd())
Output: "C:\Users\admin\myfolder"
To get the current directory folder name alone:
>>> import os
>>> str1 = os.getcwd()
>>> str2 = str1.split('\\')
>>> n = len(str2)
>>> print(str2[n-1])
Output: "myfolder"
Pathlib can be used this way to get the directory containing the current script:
import pathlib
filepath = pathlib.Path(__file__).resolve().parent
If you are trying to find the directory of the file you are currently in:
OS-agnostic way:
import os
dirname, filename = os.path.split(os.path.abspath(__file__))
If you're using Python 3.4 or later, there is the higher-level pathlib module, which allows you to conveniently call pathlib.Path.cwd() to get a Path object representing your current working directory, along with many other new features.
More info on this new API can be found here.
To get the current directory full path:
os.path.realpath('.')
Answer to #1:
If you want the current directory, do this:
import os
os.getcwd()
If you want just any folder name and you have the path to that folder, do this:
def get_folder_name(folder):
    '''
    Returns the folder name, given a full folder path
    '''
    return folder.split(os.sep)[-1]
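A variant of get_folder_name that also copes with a trailing separator (where split(os.sep)[-1] returns an empty string) combines os.path.normpath with os.path.basename. A minimal sketch; folder_name is a hypothetical name:

```python
import os


def folder_name(folder):
    """Return the last component of a folder path, ignoring a trailing separator."""
    # normpath drops the trailing separator, so basename sees the folder itself
    return os.path.basename(os.path.normpath(folder))


print(folder_name(os.getcwd()))
```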
Answer to #2:
import os
print(os.path.abspath(__file__))
I think the most succinct way to find just the name of your current execution context would be:
current_folder_path, current_folder_name = os.path.split(os.getcwd())
If you're searching for the location of the currently executed script, you can use sys.argv[0]. It holds the script path as it was passed to the interpreter, so wrap it in os.path.abspath() to get the full path.
For question 1, use os.getcwd() to get the working directory and os.chdir(r'D:\Steam\steamapps\common') to set it.
I recommend using sys.argv[0] for question 2, because sys.argv[0] is set once when the interpreter starts: it always refers to the main script's path and is not changed by os.chdir() (sys.argv itself is an ordinary, mutable list). You can also do this:
import os
this_py_file = os.path.realpath(__file__)
# vvv Below comes your code vvv #
But that snippet and sys.argv[0] will not work, or will behave strangely, when compiled with PyInstaller, because these magic attributes are not set at the __main__ level, and sys.argv[0] is the way your executable was called (which means it becomes affected by the working directory).
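For the PyInstaller case, a commonly used pattern is to check sys.frozen, which PyInstaller sets in bundled applications, and fall back to __file__ otherwise. A sketch: app_dir is a hypothetical name, and using the folder of sys.executable assumes you want the folder that contains the executable rather than the bundle's temporary extraction directory.

```python
import os
import sys


def app_dir():
    """Folder containing the running program (hypothetical helper)."""
    if getattr(sys, "frozen", False):
        # set by PyInstaller in bundled apps: use the executable's folder
        return os.path.dirname(os.path.abspath(sys.executable))
    # normal interpreter: the folder of this source file
    return os.path.dirname(os.path.realpath(__file__))
```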
