Getting dir path for file in argument in Python

I'm using argparse to pass .pcap files to a script that scrapes through them. To organize things better, I put the data in one folder and my scripts in another. That broke the way I save IP lists/hostnames: the filename is now built by just adding 'IP-list-' in front of the whole path to the .pcap file I passed as an argument.
new_ips_filename = '/IP-list-' + self.pcap
new_ips_file = open(new_ips_filename[:-5], 'w')
for i in range(len(self.new_ips)):
    new_ips_file.write(self.new_ips[i] + ':' + self.new_hostnames[i] + '\n')
new_ips_file.close()
self.pcap is the path to the .pcap (which would just be ./file.pcap if it were in the same dir). Is there an easy way to pull the dir from the filename here?

Are you looking for the absolute path of the file?
You might try os.path.abspath("file.pcap").
Or, if you want only the directory name, use os.path.dirname(self.pcap); os.path.basename(self.pcap) gives the filename part.
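Applied to the question, a minimal sketch (Python 3) might look like the following. The helper name ip_list_path is hypothetical, and it assumes the IP list should be written next to the .pcap with the extension dropped, as the original [:-5] slice did.

```python
import os

def ip_list_path(pcap_path):
    """Return the path of an 'IP-list-<name>' file next to the given .pcap."""
    # Folder that holds the capture, made absolute so it no longer depends on the cwd
    directory = os.path.dirname(os.path.abspath(pcap_path))
    # 'file.pcap' -> 'file' (replaces the hard-coded [:-5] slice)
    stem, _ext = os.path.splitext(os.path.basename(pcap_path))
    return os.path.join(directory, "IP-list-" + stem)

print(ip_list_path("./data/file.pcap"))
```

Splitting the path first and prefixing only the filename part avoids the '/IP-list-./file.pcap' style of result described in the question.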

Related

How to open a specific path with open()?

I'm trying to build a file transfer system with python3 sockets. I have the connection and sending down but my issue right now is that the file being sent has to be in the same directory as the program, and when you receive the file, it just puts the file into the same directory as the program. How can I get a user to input the location of the file to be sent and select the location of the file to be sent to?

I assume you're opening files with:
open("filename","r")
If you do not provide an absolute path, open() interprets the path relative to the current working directory. So, if I wanted to open a file such as /mnt/storage/dnd/4p5_edition, I would have to use:
open("/mnt/storage/dnd/4p5_edition","r")
And if I wanted to copy this file to /mnt/storage/trash/ I would have to use the absolute path as well:
open("/mnt/storage/trash/4p5_edition","w")
If instead, I decided to use this:
open("mnt/storage/trash/4p5_edition","w")
Then I would get an IOError if there wasn't a directory named mnt with the directories storage/trash in my present folder. If those folders did exist in my present folder, then it would end up in /whatever/the/path/is/to/my/current/directory/mnt/storage/trash/4p5_edition, rather than /mnt/storage/trash/4p5_edition.
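To see where a given path would actually resolve, a small sketch like this can help (Python 3; the helper name resolve is only for illustration):

```python
import os

def resolve(path):
    """Return the absolute location open() would use for `path` right now."""
    return os.path.abspath(path)

print(os.getcwd())
# A relative path lands underneath the current working directory
print(resolve("mnt/storage/trash/4p5_edition"))
# An absolute path ignores the current working directory (unchanged on POSIX)
print(resolve("/mnt/storage/trash/4p5_edition"))
```

Printing the resolved path before opening a file is a cheap way to check which directory a user-supplied path really points to.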

Since you said that the file will be placed in the same directory as the program, the following code might work:
import os
filename = "name.txt"
f = open(os.path.join(os.path.dirname(__file__),filename))

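It's pretty simple: just get the path from the user.
import os
subpath = input("File path = ")  # directory chosen by the user
print(subpath)
# file_name and content are assumed to be defined elsewhere in your program
with open(os.path.join(subpath, file_name), 'w') as f:
    f.write(content)
I think that's all you need; let me know if you need something else.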

As you say, the file currently has to be in the same folder as the program, so you either have to move it there or build the right file path before passing it to open().
It should be something like:
import os
filename = "name.txt"  # a name (or relative path) under the script's folder
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)
f = open(path)
Here path is the full path to the file and f is the open file object; os.path.dirname(path) gives the directory the file is in.

Python tar.add files but omit parent directories

I am trying to create a tar file from a list of files stored in a text file. I have working code to create the tar, but I want the archive to start from a certain directory (app and all its subdirectories) and to leave out the parent directories, because the software only opens the file from a certain directory.
package.list files are as below:
app\myFile
app\myDir\myFile
app\myDir\myFile2
If I omit the path in restore.add, it cannot find the files because my program runs from elsewhere. How do I tell the tar to start at a particular directory, or to add the files while keeping the directory structure from the text file, e.g. starting with app rather than all the parent dirs?
My objective is to do this tar cf restore.tar -T package.list but with Python on Windows.
I have tried basename from here: How to compress a tar file in a tar.gz without directory?, this strips out ALL the directories.
I have also tried using arcname='app' in the .add method, however this gives some weird results by breaking the directory structure and renames loads of files to app
path = foo + '\\' + bar
file = open(path + '\\package.list', 'r')
restore = tarfile.open(path + '\\restore.tar', 'w')
for line in file:
    restore.add(path + '\\' + line.strip())
restore.close()
file.close()
Using Python 2.7

You can use the second argument of TarFile.add (arcname); it specifies the name inside the archive.
So, assuming every path is sane, something like this would work:
import tarfile
prefix = "some_dir/"
archive_path = "inside_dir/file.txt"
with tarfile.open("test.tar", "w") as tar:
    tar.add(prefix+archive_path, archive_path)
Usage:
> cat some_dir/inside_dir/file.txt
test
> python2 test_tar.py
> tar --list -f ./test.tar
inside_dir/file.txt
In production, I'd advise using an appropriate module for path handling (such as os.path) to make sure every slash and backslash is in the right place.
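Putting the pieces together for the question's package.list, a hedged sketch (Python 3) could look like this. The function name build_archive and its defaults are assumptions for illustration; it assumes the list holds one relative path per line, with either backslashes or forward slashes.

```python
import os
import tarfile

def build_archive(base_dir, list_name="package.list", tar_name="restore.tar"):
    """Add every file listed in base_dir/list_name, stored under its listed relative path."""
    list_path = os.path.join(base_dir, list_name)
    tar_path = os.path.join(base_dir, tar_name)
    with open(list_path) as listing, tarfile.open(tar_path, "w") as tar:
        for line in listing:
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            # Accept Windows or POSIX separators in the list file
            parts = entry.replace("\\", "/").split("/")
            source = os.path.join(base_dir, *parts)   # where the file lives on disk
            tar.add(source, arcname="/".join(parts))  # name stored inside the archive
    return tar_path
```

The file is read from its full location on disk, but only the path from the list (starting at app) is recorded in the archive, so the parent directories never appear.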

Absolute and relative importing for scripts

I know this question has been asked often but I have a very specific problem concerning importing. I have a file structure as follows:
main/main.py
main/test_device.py
main/lib/instructions.py
main/device/android.py
main/temp/example.py
Basically, what's happening here is that my program (main.py) creates several smaller scripts (in temp/) and then attempts to run them. However, each of these scripts references lib/instructions.py and device/android.py. This code runs these files:
name = "temp/test_" + str(program_name) + ".py"
input_file = open("test_device.py", "r")
contents = input_file.readlines()
input_file.close()
contents.insert(7, "program = [" + ", ".join(str(i) for i in instructions) + "]\r\n")
contents.insert(8, "count = " + str(program_name) + "\r\n")
contents = "".join(contents)
input_file = open(name, "w+")
input_file.write(contents)
Popen("python " + name)
I have __init__.py files in every directory but because these files are scripts, I can't use relative imports. How would I go about importing these libraries?

If I'm understanding you, the script you're building in contents needs to import the other modules from your package, but it can't find the right directory because that would be an awkward relative import above itself. Try adding this line before you join the list together:
contents.insert(0, "import sys; sys.path.append('lib'); sys.path.append('device')\n")
It's late over here and I'm on my phone, so there may be a typo, but I'm hoping that'll work for you.
Edit: depending on the present working directory, you might need to append '../lib' or use an absolute path.
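For the absolute-path variant, one possible sketch (Python 3) is below. The helper name bootstrap_line is hypothetical, and it assumes the generated scripts import through the package root (for example `from lib import instructions`), which the question does not confirm.

```python
import os

def bootstrap_line(package_root):
    """Build a source line that puts package_root on sys.path of a generated script."""
    root = os.path.abspath(package_root)
    # %r quotes and escapes the path so it is a valid string literal (backslashes included)
    return "import sys; sys.path.insert(0, %r)\n" % root

# In the question's main.py this would be used roughly as:
#   contents.insert(0, bootstrap_line(os.path.dirname(os.path.abspath(__file__))))
print(bootstrap_line("."), end="")
```

Because the path is absolute, the generated script no longer depends on the directory it is launched from.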

Copying your modules to python/lib/site-packages would solve the issue.

Have Python Script Locate Files in Separate Directories

I'm pretty new to python programming, and I wrote a script to automate uploading a file via SFTP to a remote machine. The script works wonderfully, but there is an issue that I can't seem to figure out. If I'm in the directory in which the file I'm trying to upload is residing, everything goes fine. But, when I type the filename that is not residing in said directory, it doesn't like that. It's a hassle having to browse to different folders each time. I know I can consolidate the files into one folder... But I would love to try and automate this.
The /Downloads directory is hard-coded since that's where most tools reside. Does anyone know how I can tweak this line of code to grab the matching file name regardless of the directory the file resides in?
This is what I've written:
#! /usr/bin/python2
# includes
import thirdpartylib
import sys
if len(sys.argv) != 6:
    print "Usage: %s file url port username password" % sys.argv[0]
    exit(0)
file = sys.argv[1]
host = sys.argv[2]
port = int(sys.argv[3])
username = sys.argv[4]
password = sys.argv[5]
filelocation = "Downloads/%s" % file
transport = thirdpartylib.Transport((host, port))
transport.connect(username=username, password=password)
sftp = thirdpartylib.SFTPClient.from_transport(transport)
sftp.put(file, filelocation)
sftp.close()
transport.close()

First of all, if you're doing any work with filepaths it is recommended that you use some built-in functionality to construct them to ensure that you have proper file separators etc. os.path.join is great for this.
That being said, I would recommend having the user pass in the file path as either an absolute path (in which case it can live anywhere on the machine) or a relative path (in which case it is relative to the current directory). I would not append Downloads/ to all the file paths as that obviously breaks any absolute paths and it requires the individual calling your program to know the internals of it. I think of this as the file path equivalent of a magic number.
So what that boils down to is changing filelocation to simply be the file input argument itself.
import os
filelocation = sys.argv[1]
# You can even do some validation if you want
if not os.path.isfile(filelocation):
    print "File '%s' does not exist!" % filelocation
If you really want that Downloads/ folder to be the default (if the file path isn't an absolute path), you can check to see if the input is an absolute path (using os.path.isabs) and if it's not, then specify that it's in the Downloads/ directory.
if not os.path.isabs(filelocation):
    filelocation = os.path.join('Downloads', filelocation)
Then users could call your script in two ways:
# Loads file in /absolute/path/to/file
./script.py /absolute/path/to/file ...
# Loads filename.txt from Downloads/filename.txt
./script.py filename.txt ...
Also, it looks like you may have your sftp.put input arguments reversed. The local filename should come first.
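The default-directory logic and the validation can be combined into two small helpers. This is only a sketch in Python 3 (the question's script is Python 2); the names resolve_local_path and require_file are made up for illustration.

```python
import os

def resolve_local_path(arg, default_dir="Downloads"):
    """Return arg unchanged if it is absolute, otherwise place it under default_dir."""
    if os.path.isabs(arg):
        return arg
    return os.path.join(default_dir, arg)

def require_file(path):
    """Fail early with a clear message instead of failing inside the upload call."""
    if not os.path.isfile(path):
        raise IOError("File '%s' does not exist!" % path)
    return path

print(resolve_local_path("filename.txt"))
```

The resolved path would then be passed as the local (first) argument to sftp.put, after require_file has confirmed it exists.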

I think you want filelocation as the first argument to sftp.put, as that is supposed to be the filename on the local machine. Also, you probably want to put a slash in front of Downloads.

Python File System Reader Performance

I need to scan a file system for a list of files and log those that don't exist. Currently I have an input file listing the 13 million files that need to be checked. The script has to run from a remote location, as I cannot run scripts directly on the storage server.
My current approach works, but is relatively slow. I'm still fairly new to Python, so I'm looking for tips on speeding things up.
import sys,os
from pz import padZero #prepends 0's to string until desired length
output = open('./out.txt', 'w')
input = open('./in.txt', 'r')
rootPath = '\\\\server\\share' #UNC path to storage
for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8) #extracts/formats fileName
    dir = padZero(str(ifid)[:-3], 5) #extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath) #don't actually need size, better approach?
    except:
        output.write(ifid+'\n')
Thanks.

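import collections, glob, os
root = r'\\server\share'  # UNC path from the question
dirs = collections.defaultdict(set)
for file_id in input:
    file_id = file_id.strip().rjust(8, "0")
    dir, name = file_id[:-3], file_id + ".tif"
    dirs[dir].add(name)
for dir, files in dirs.iteritems():
    # one directory listing per directory instead of one request per file
    present = set(os.path.basename(p) for p in glob.glob(os.path.join(root, dir, "*.tif")))
    for missing_file in files - present:
        print missing_file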
Explanation
First read the input file into a dictionary that maps each directory to the set of filenames expected in it. Then, for each directory, list all the TIFF files in that directory on the server and (set) subtract this from the collection of filenames you should have. Print anything that's left.
EDIT: Fixed silly things. Too late at night when I wrote this!

That padZero and string concatenation stuff looks to me like it would take a good percentage of the time.
What you want it to do is spend all its time reading the directory, very little else.
Do you have to do it in python? I've done similar stuff in C and C++. Java should be pretty good too.

You're going to be I/O bound, especially on a network, so any changes you can make to your script will result in very minimal speedups, but off the top of my head:
import os
input, output = open("in.txt"), open("out.txt", "w")
root = r'\\server\share'
for fid in input:
    fid = fid.strip().rjust(8, "0")
    dir = fid[:-3] # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")
I don't really expect that to be any faster, but it is arguably easier to read.
Other approaches may be faster. For example, if you expect to touch most of the files, you could just pull a complete recursive directory listing from the server, convert it to a Python set(), and check for membership in that rather than hitting the server for many small requests. I will leave the code as an exercise...
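One way the full-listing approach could be sketched (Python 3) is shown below. The function names are hypothetical, and it assumes file names are unique across directories, which holds here because the directory is derived from the file id.

```python
import os

def list_existing(root):
    """Walk root once and return the set of file names found beneath it."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        found.update(filenames)
    return found

def missing_ids(ids, existing, suffix=".tif"):
    """Return the zero-padded ids whose file is absent from the listing."""
    padded = (i.strip().rjust(8, "0") for i in ids)
    return [fid for fid in padded if fid + suffix not in existing]
```

With the question's files this would be called roughly as missing_ids(open("in.txt"), list_existing(r'\\server\share')). Whether it beats per-file checks depends on how much of the share the input list covers, since the walk touches every directory.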

I would probably use a shell command to get the full listing of files in all directories and subdirectories in one hit. Hopefully this will minimise the number of requests you need to make to the server.
You can get a listing of the remote server's files by doing something like:
Linux: mount the shared drive as /shared/directory/ and then do ls -R /shared/directory > ~/remote_file_list.txt
Windows: Use Map Network Drive to mount the shared drive as drive letter X:, then do dir /S X:\shared_directory > C:\remote_file_list.txt
Use the same methods to create a listing of your local folder's contents as local_file_list.txt. Your Python script will then reduce to an exercise in text processing.
Note: I did actually have to do this at work.
