How to guess a file extension with python?


I recently tried to write Python code that takes the path of a file without an extension and determines what extension it has.
I was looking for something like the example below. In the example the extension is exe (but the code doesn't know that yet).
path = 'C:\\MyPath\\Example'
# takes the path above and guesses the program's extension:
extension = guess_extension(path)
# adds the extension to the path:
fullPath = path+extension
print(fullPath)
Output:
C:\MyPath\Example.exe
If you know a Python module that would do that (or something similar), please list it below.
I have tried the filetype (filetype.guess()) and mimetypes (mimetypes.guess_extension()) modules, but both returned None.
I have also tried the answers from many questions like this one, but that still didn't work.

It sounds like the built-in glob module (glob docs) might be what you're looking for. This module provides Unix-style pathname pattern expansion within Python.
In the following example, the incomplete path variable has the string '.*' appended to it when passed to glob.glob. This tells glob.glob to return a list of existing paths that start with path, followed by a period (which introduces a file extension), with the asterisk matching any characters after path + '.'.
import glob
path = r'C:\Program Files\Firefox Developer Edition\minidump-analyzer'
full = glob.glob(path+'.*')
print(full[0])
Output: C:\Program Files\Firefox Developer Edition\minidump-analyzer.exe
It is worth noting that the above is just an illustration of how glob could be leveraged as part of a solution to your question. Proper handling of unexpected inputs, edge cases, exceptions etc. should be implemented as required by the needs of your program.
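Building on that answer, here is a minimal sketch of the guess_extension() function from the question. The name comes from the question; the behaviour (return None when nothing matches, raise ValueError when several files match) is an assumption about what a caller would want. The path is passed through glob.escape() so characters such as [ in folder names are not treated as wildcards.

```python
import glob
import os


def guess_extension(path):
    """Return the extension (e.g. '.exe') of the single file named path + '.<ext>'.

    Returns None if no such file exists; raises ValueError if several match,
    since the intended file would then be ambiguous.
    """
    # glob.escape() stops characters like '[' in the path being read as pattern syntax.
    matches = glob.glob(glob.escape(path) + '.*')
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError('ambiguous, several candidates: %r' % matches)
    # Everything after the original base name is the extension,
    # so multi-part extensions such as '.tar.gz' are kept whole.
    name = os.path.basename(matches[0])
    return name[len(os.path.basename(path)):]
```

With the question's example, path + guess_extension(path) would give C:\MyPath\Example.exe when that file exists; check for None first, because concatenating None to a string raises a TypeError.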

Related

Python Convert Windows File path in a variable

I'm given a variable that contains a Windows file path, and I then have to read this file. The problem is that the path contains escape characters, and I can't seem to get rid of them. I checked os.path and pathlib, but both expect correctly formatted text already, which I can't seem to construct.
For example, the following. Please note that fPath is given, so I can't prefix it with r to make it a raw string.
# this is given; I can't make it a raw string with r
fPath = "P:\python\t\temp.txt"
file = open(fPath, "r")
for line in file:
    print(line)
How can I turn fPath via some function or method from:
"P:\python\t\temp.txt"
to
"P:/python/t/temp.txt"
I've also tried .replace("\","/"), which doesn't work.
I'm using Python 3.7 for this.

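You can use os.path.abspath() to normalize it:
print(os.path.abspath("P:\python\t\temp.txt"))
Be aware that abspath() only normalizes the path (on Windows the result uses backslash separators); it cannot turn the tab characters that Python has already produced from \t back into backslashes.
See the documentation of os.path here.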

I've solved it.
The issue lies with the Python interpreter: \t and the other escapes don't exist as such in the data; they are interpreted as non-printing characters when the string literal is parsed.
I got a bit lucky: someone else had already faced the same problem and solved it with a brute-force method:
http://code.activestate.com/recipes/65211/
I just had to find it.
After that I have a raw string without escaped characters, and just need to run a simple replace() on it to get a workable path.
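A minimal sketch of the same brute-force idea (written for this page, not the recipe's actual code; to_raw is a hypothetical name): map each control character that a single-letter escape produces back to the backslash sequence that was originally typed, then replace the backslashes.

```python
# Map each control character produced by a single-letter escape back to
# the two-character sequence that was originally typed.
_ESCAPES = {
    '\a': r'\a',
    '\b': r'\b',
    '\f': r'\f',
    '\n': r'\n',
    '\r': r'\r',
    '\t': r'\t',
    '\v': r'\v',
}
_UNESCAPE_TABLE = str.maketrans(_ESCAPES)


def to_raw(path):
    """Undo single-letter escapes such as '\\t' in an already-parsed string."""
    return path.translate(_UNESCAPE_TABLE)


fPath = "P:\\python\t\temp.txt"  # what "P:\python\t\temp.txt" becomes once parsed
posix_path = to_raw(fPath).replace("\\", "/")
```

This cannot recover \x.., octal (\0) or \N{...} escapes, and it will also "undo" a tab that was genuinely meant to be in the string.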

You can use the Path class from the pathlib module:
from pathlib import Path
docs_folder = Path("some_folder/some_folder/")
text_file = docs_folder / "some_file.txt"
f = open(text_file)

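If you would like to use replace, escape the backslash:
fPath.replace("\\", "/")
This only helps while the string still contains literal backslashes; escapes such as \t that Python has already interpreted are no longer backslashes.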

When using Python 3.4 or later, the Path class from the pathlib module offers a method called as_posix, which converts a path to a *nix-style path, up to a point. For example, if you build a Path object on Windows via p = pathlib.Path('C:\\Windows\\SysWOW64\\regedit.exe'), p.as_posix() yields C:/Windows/SysWOW64/regedit.exe. To obtain a complete *nix-style path, you'd still need to convert the drive letter manually.
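On other operating systems pathlib.Path is a PosixPath, which treats backslashes as ordinary characters, so as_posix() leaves them alone; PureWindowsPath parses Windows syntax on any OS. The sketch below also converts the drive letter. The target layout (/c/... as in Git Bash/MSYS, /mnt/c/... as in WSL) is an assumption you should match to your environment, and to_posix_style is a hypothetical name.

```python
from pathlib import PureWindowsPath


def to_posix_style(windows_path, mount_prefix="/"):
    """Convert 'C:\\dir\\file' to '<mount_prefix>c/dir/file'.

    mount_prefix is an assumption about the target environment:
    "/" gives Git Bash/MSYS-style paths, "/mnt/" gives WSL-style paths.
    """
    p = PureWindowsPath(windows_path)
    if len(p.drive) != 2 or p.drive[1] != ":":
        # No drive letter (relative or UNC path): only convert the separators.
        return p.as_posix()
    drive_letter = p.drive[0].lower()
    # p.parts[0] is the anchor (e.g. 'C:\\'); the remaining parts are the components.
    return mount_prefix + "/".join([drive_letter, *p.parts[1:]])
```

For example, to_posix_style(r'C:\Windows\SysWOW64\regedit.exe') returns /c/Windows/SysWOW64/regedit.exe.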

I came across a similar problem with Windows file paths. This is what works for me:
file = input().split('\\')  # input() returns the text as typed, so backslashes stay literal
file = '/'.join(file)
This turns the input from this:
"D:\test.txt"
to this:
"D:/test.txt"
Basically, when you work with a Windows path, Python displays each backslash '\' as '\\' (escaped). A real file path won't contain double backslashes, since single backslashes separate the folder names.
This way you can list all the folders in order by splitting on '\', and then rejoin them with forward slashes using the join method.
Hopefully this helps!

File not found from Python although file exists

I'm trying to load a simple text file containing an array of numbers into Python. A MWE is:
import numpy as np
BASE_FOLDER = 'C:\\path\\'
BASE_NAME = 'DATA.txt'
fname = BASE_FOLDER + BASE_NAME
data = np.loadtxt(fname)
However, running this gives an error:
OSError: C:\path\DATA.txt not found.
I'm using VSCode, so in the debug window the link to the path is clickable. And, of course, if I click it the file opens normally, so this tells me that the path is correct.
Also, if I do print(fname), VSCode also gives me a valid path.
Is there anything I'm missing?
EDIT
As per your (very helpful for future reference) comments, I've changed my code using the os module and raw strings:
BASE_FOLDER = r'C:\path_to_folder'
BASE_NAME = r'filename_DATA.txt'
fname = os.path.join(BASE_FOLDER, BASE_NAME)
Still results in error.
Second EDIT
I've tried again with another file, with a very basic path and filename:
BASE_FOLDER = r'Z:\Data\Enzo\Waste_Code'
BASE_NAME = r'run3b.txt'
And again, I get the same error.
If I try an alternative approach,
os.chdir(BASE_FOLDER)
a = os.listdir()
then select the right file,
fname = a[1]
I still get the error when trying to import it. Even though I'm retrieving it directly from listdir.
>>> os.path.isfile(a[1])
False

Using the os module, you can check from within Python whether the file exists by running:
import os
os.path.isfile(fname)
If it returns False, that means that your file doesn't exist in the specified fname. If it returns True, it should be read by np.loadtxt().
Extra: good practice working with files and paths
When working with files, it is advisable to use the functionality built into the standard library, specifically the os module: os.path.join() takes care of the joins no matter which operating system you are using.
fname = os.path.join(BASE_FOLDER, BASE_NAME)
In addition, it is advisable to use raw strings by adding an r to the beginning of the string. This makes writing paths less tedious, as it allows you to copy and paste from the navigation bar. It will look something like BASE_FOLDER = r'C:\path'. Note that you don't need to add a trailing '\', as os.path.join takes care of it (a raw string cannot end with a single backslash anyway).
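If isfile() returns False even though the path looks right, a small diagnostic like the sketch below can narrow it down. explain_missing is a hypothetical helper; the similar-name hint is there because a near-identical name (for example DATA.txt.txt when Windows Explorer hides known extensions) is one possible cause, not a confirmed one.

```python
import difflib
import os


def explain_missing(path):
    """Describe why `path` may not be usable as a file (a debugging aid)."""
    if os.path.isfile(path):
        return "file exists"
    if os.path.isdir(path):
        return "path is a directory, not a file"
    parent, name = os.path.split(os.path.abspath(path))
    if not os.path.isdir(parent):
        return "parent directory does not exist: %r" % parent
    # Suggest similarly named entries (different case, doubled extension, etc.).
    close = difflib.get_close_matches(name, os.listdir(parent), n=3, cutoff=0.6)
    return "not found; similar names in parent: %r" % close
```

Printing repr(fname) as well exposes hidden characters such as tabs from an unintended \t or trailing spaces.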

You may not have permission to read the downloaded file. If you are using Ubuntu, run
sudo chmod -R a+rwx file_name.txt
in a terminal to give yourself read permission (a+r alone is enough for reading).
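Before changing permissions, you can check from Python whether they are actually the problem. A minimal sketch; check_readable is a hypothetical helper built on os.access(), which tests the permissions of the real user without opening the file.

```python
import os


def check_readable(path):
    """Return True if `path` is an existing file the current user may read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)
```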

For me, the problem was that I was using the Linux home symbol in the path (~/path/file). Replacing it with the absolute path /home/user/etc_path/file worked like a charm.
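The shell expands ~ on the command line, but Python's open() passes it through literally. os.path.expanduser() performs the same expansion inside a script, so you don't have to hard-code /home/user. resolve_user_path is a hypothetical name for this sketch.

```python
import os


def resolve_user_path(path):
    """Expand a leading '~' to the user's home directory and make the path absolute."""
    return os.path.abspath(os.path.expanduser(path))
```

For example, np.loadtxt(resolve_user_path('~/path/file')) reads the same file as the absolute path.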

Primer needed in python pathnames

I am a novice coder, and Python is my first (and, practically speaking, only) language. As part of a research job, I am charged with manipulating a collection of data analysis scripts, starting by getting them to run on my computer. I was able to do this essentially by removing all the lines of code identifying paths, and running the scripts through a Jupyter terminal opened in the directory where the relevant modules and CSV files live, so the script knows where to look (I know that Python defaults to the location of the terminal).
Here are the particular blocks of code whose function I don't understand
import sys
sys.path.append('C:\Users\Ben\Documents\TRACMIP_Project\mymodules/')
import altdata as altdata
I have replaced the pathname in the original code with the path leading to the directory where the module is; the folder containing all the CSV files referenced here is also in mymodules.
This works depending on where I open the terminal, but the only way I can get it to work consistently is by opening the terminal in mymodules, which is fine for now but won't work when I need to work by accessing the server remotely. I need to understand better precisely what is being done here, and how it relates to the location of the terminal (all the documentation I've found is overly technical for my knowledge level).
Here is another segment I don't understand
import os.path
csvfile = 'csv/' + model +'_' + exp + '.csv'
if os.path.isfile(csvfile): # csv file exists
    hcsvfile = open(csvfile)
I get here that it's looking for the CSV file, but I'm not sure how. I'm also not sure why then on some occasions depending on where I open the terminal it's able to find the module but not the CSV files.
I would love an explanation of what I've presented, but more generally I would like information (or a link to information) explaining paths and how they work in scripts in modules, as well as what are ways of manipulating them. Thanks.

sys.path
This is a simple list of directories where Python looks for modules and packages (.py files and directories with an __init__.py file; see the modules tutorial). Extending this list lets you load modules (custom libs, etc.) from non-default locations. Usually you change it at runtime; for static directories you can modify a startup script to set the needed environment variables.
os.path
This module implements some useful functions on pathnames.
... and lets you find out whether a file exists, whether it is a link, a directory, etc.
Why did loading the *.csv fail?
Because sys.path is responsible for module loading, and only for that. When you use a relative path:
csvfile = 'csv/' + model +'_' + exp + '.csv'
open() will look in the current working directory:
file is either a string or bytes object giving the pathname (absolute or relative to the current working directory)...
You need to use absolute paths, constructing them with the os.path module.
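A minimal sketch of that advice, assuming the csv/ folder sits next to the script (an assumption; pass a different base_dir otherwise). script_dir and csv_path are hypothetical helper names.

```python
import os


def script_dir():
    """Directory containing this script, whatever the terminal's working directory."""
    # __file__ is defined when the code runs from a .py file (not in a notebook cell).
    return os.path.dirname(os.path.abspath(__file__))


def csv_path(base_dir, model, exp):
    """Absolute path of csv/<model>_<exp>.csv under base_dir."""
    return os.path.abspath(os.path.join(base_dir, "csv", model + "_" + exp + ".csv"))
```

In the script, csvfile = csv_path(script_dir(), model, exp) then works no matter where the terminal was opened.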

I agree with cdarke's comment that you are probably running into an issue with backslashes. Replacing the line with:
sys.path.append(r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules')
will likely solve your problem. Details below.
In general, Python treats relative paths as relative to the current working directory (where your terminal is running). When you give it an absolute path, one that includes the root directory, like the C:\ in C:\Users\Ben\Documents\TRACMIP_Project\mymodules, Python no longer cares about the working directory; it just looks where you tell it to look.
Backslashes are used to write special characters within strings, such as line breaks (\n) and tabs (\t). The snag you've hit is that Python paths are strings first and paths second. Of the backslash sequences in your path, \U is an actual escape: it starts a \UXXXXXXXX Unicode escape, so in Python 3 that string literal is a SyntaxError. \B, \D, \T and \m are not escape sequences and are kept as-is (recent Python versions warn about them). If you prefix the string with r, Python ignores the special meaning of the backslash and treats it as a literal backslash (which is what you want).
The reason it still works if you run the script from the mymodules directory is that Python automatically looks in the current directory when importing. sys.path.append(path) tells Python to include that directory when it looks for modules to import, so you can use modules in that directory no matter where you run the script from. A faulty path still gets added, but it's meaningless: there is no directory where it points, so there's nothing to find there.
As for path manipulation in general, the "safest" way is to use the functions in os.path, which are platform-independent and will (usually) give the correct output whether you're working in a Windows or a Unix environment.
EDIT: Forgot to cover the second part. Since Python paths are strings, you can build them using string operations. That's what is happening with the line
csvfile = 'csv/' + model +'_' + exp + '.csv'
Presumably model and exp are strings that appear in the filenames in the csv/ folder. With model = "foo" and exp = "bar", you'd get csv/foo_bar.csv, which is a relative path to a file (that is, relative to your working directory). The code makes sure a file actually exists at that path and then opens it. Assuming the csv/ folder is in the directory you added with sys.path.append, this path should work regardless of where you run the file, but I'm not 100% certain on that. EDIT: outoftime pointed out that sys.path.append only works for modules, not opening files, so you'll need to either expand csv/ into an absolute path or always run in its parent directory.
Also, I think Windows accepts either direction of slash in paths, but you should probably not mix them: use all backslashes or all forward slashes. os.path.normpath will normalize them for you (os.path.join does not). I'd probably change the line to
csvfile = os.path.join('csv', model + '_' + exp + '.csv')
for consistency's sake.
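To see the difference between join() and normpath(), the sketch below uses ntpath, the Windows implementation of os.path, which can be imported on any OS so the results are the same everywhere. foo and bar are the example values from this answer.

```python
import ntpath  # the Windows flavour of os.path; importable on any OS

# A raw string keeps every backslash literally, so '\U' is not an escape.
module_dir = r'C:\Users\Ben\Documents\TRACMIP_Project\mymodules'

# join() only inserts a separator where one is missing; it leaves existing slashes alone.
joined = ntpath.join('csv/', 'foo' + '_' + 'bar' + '.csv')

# normpath() is what actually unifies the separators.
normalized = ntpath.normpath(joined)
```

Here joined is csv/foo_bar.csv, while normalized is csv\foo_bar.csv.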

What is the fastest method of finding a file in Linux and Windows using Python?

I am writing a plug-in for RawTherapee in Python. I need to extract the version number from a file called 'AboutThisBuild.txt' that may exist anywhere in the directory tree. Although RawTherapee knows where it is installed, this data is baked into the binary file.
My plug-in is being designed to collect basic system data when run without any command line parameters, for the purpose of short-circuiting troubleshooting. By having the version number, revision number and changeset (AKA Mercurial), I can sort out why the script may not be working as expected. OK, that is the context.
I have tried a variety of methods, some suggested elsewhere on this site. The main one is using os.walk and fnmatch.
The problem is speed. Searching the entire directory tree is like watching paint dry!
To reduce load I have tried to predict likely hiding places and only traverse these. This is quicker but has the obvious disadvantage of missing some files.
This is what I have at the moment. It's tested on Linux but not yet on Windows, as I am still researching where the file might be placed.
import fnmatch
import os
rootPath = ('/usr/share/doc/rawtherapee',
            '~',
            '/media/CoreData/opt/',
            '/opt')
pattern = 'AboutThisBuild.txt'
def first_match(paths, pattern):
    # Return the first instance of RT found in the paths searched
    for CheckPath in paths:
        CheckPath = os.path.expanduser(CheckPath)  # os.walk does not expand '~' itself
        print("\n>>>>>>>>>>>>> " + CheckPath + "\n")
        for root, dirs, files in os.walk(CheckPath, topdown=True, onerror=None, followlinks=False):
            for filename in fnmatch.filter(files, pattern):
                return os.path.join(root, filename)
    return None
print(first_match(rootPath, pattern))
Usually 'AboutThisBuild.txt' is stored in a directory/subdirectory called 'rawtherapee', or has that string somewhere in its directory path. I had naively thought I could get the 5000-odd directory names, search these for 'rawtherapee', then use os.walk to traverse those directories, but all the modules and functions I have looked at collate all the files in the directory (again).
Anyone have a quicker method of searching the entire directory tree or am I stuck with this hybrid option?

I am a beginner in Python, but I think I know the simplest way of finding a file in Windows.
import os
for dirpath, subdirs, filenames in os.walk('The directory you wanna search the file in'):
    if 'name of your file with extension' in filenames:
        print(dirpath)
This code prints the directory containing the file you are searching for. All you have to do is go to that directory.
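A variant of the os.walk approach above that stops at the first hit and can skip whole subtrees. Pruning works because, with the default topdown=True, os.walk only descends into the names left in the subdirs list. find_first and skip_dirs are hypothetical names, and which directories are safe to skip depends on your system.

```python
import os


def find_first(root, filename, skip_dirs=()):
    """Return the first path under `root` whose name is `filename`, or None.

    `skip_dirs` holds directory names not worth descending into
    (a hypothetical tuning knob; pick names that suit your system).
    """
    for dirpath, subdirs, filenames in os.walk(os.path.expanduser(root)):
        if filename in filenames:
            return os.path.join(dirpath, filename)  # stop at the first hit
        # Editing subdirs in place prunes the walk: removed names are never visited.
        subdirs[:] = [d for d in subdirs if d not in skip_dirs]
    return None
```

For example, find_first('/', 'AboutThisBuild.txt', skip_dirs=('proc', 'sys')) avoids the Linux virtual filesystems.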

The thing about searching is that it doesn't matter too much how you get there (e.g. by cheating). Once you have a result, you can verify it relatively quickly.
You may be able to identify candidate locations fairly efficiently by guessing. For example, on Linux you could first try looking in these locations (obviously not all of them are directories, but it does no harm to call os.path.isfile('/;l$/AboutThisBuild.txt')):
$ strings /usr/bin/rawtherapee | grep '^/'
/lib/ld-linux.so.2
/H=!
/;l$
/9T$,
/.ba
/usr/share/rawtherapee
/usr/share/doc/rawtherapee
/themes/
/themes/slim
/options
/usr/share/color/icc
/cache
/languages/default
/languages/
/languages
/themes
/batch/queue
/batch/
/dcpprofiles
/#q=
/N6rtexif16NAISOInterpreterE
If you have it installed, you can try the locate command.
If you still don't find it, move on to the brute-force method.
Here is a rough equivalent of strings using Python:
>>> from string import printable, whitespace
>>> from itertools import groupby
>>> pathchars = set(printable) - set(whitespace)
>>> with open("/usr/bin/rawtherapee", "rb") as fp:
...     data = fp.read().decode("latin-1")  # latin-1 maps every byte to one character
...
>>> for k, g in groupby(data, pathchars.__contains__):
...     if not k: continue
...     g = ''.join(g)
...     if len(g) > 3 and g.startswith("/"):
...         print(g)
...
/lib64/ld-linux-x86-64.so.2
/^W0Kq[
/pW$<
/3R8
/)wyX
/WUO
/w=H
/t_1
/.badpixH
/d$(
/\$P
/D$Pv
/D$#
/D$(
/l$#
/d$#v?H
/usr/share/rawtherapee
/usr/share/doc/rawtherapee
/themes/
/themes/slim
/options
/usr/share/color/icc
/cache
/languages/default
/languages/
/languages
/themes
/batch/queue.csv
/batch/
/dcpprofiles
/#q=
/N6rtexif16NAISOInterpreterE
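Once strings (or the Python equivalent above) has produced candidate directories, checking them is cheap. A minimal sketch; first_existing is a hypothetical helper, and the candidate list would come from output like the above.

```python
import os


def first_existing(candidate_dirs, filename="AboutThisBuild.txt"):
    """Return the first candidate_dir/filename that is an existing file, else None."""
    for directory in candidate_dirs:
        path = os.path.join(directory, filename)
        # Candidates that are not directories (e.g. '/;l$') simply fail this check.
        if os.path.isfile(path):
            return path
    return None
```

For example, first_existing(['/usr/share/rawtherapee', '/usr/share/doc/rawtherapee']) tries the two most plausible locations from the listing before any full walk.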

It sounds like you need a pure Python solution here. If not, the other answers will suffice.
In this case, you should traverse the folders using a queue and threads. While some may say threads are never the solution, they are a great way of speeding things up when you are I/O bound, which you are in this case. Essentially, you call os.listdir on the current directory. If it contains your file, party like it's 1999. If it doesn't, add each subfolder to the work queue.
If you're clever, you can play with depth-first vs breadth-first traversal to get the best results.
There is a great example, which I have used quite successfully at work, at http://www.tutorialspoint.com/python/python_multithreading.htm. See the section titled Multithreaded Priority Queue. The example could probably be updated to use thread pools, but it's not necessary.
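A minimal sketch of the queue-and-threads traversal using only the standard library (queue, threading, os.scandir) rather than the tutorialspoint example. threaded_find and num_workers=8 are assumptions. A FIFO queue gives a roughly breadth-first order; swapping in queue.LifoQueue would make it roughly depth-first. Which path wins when several copies exist is not deterministic.

```python
import os
import queue
import threading


def threaded_find(root, filename, num_workers=8):
    """Search the tree under `root` for `filename` with a pool of threads.

    Returns a matching path (whichever a worker finds first), or None.
    """
    work = queue.Queue()      # directories waiting to be listed
    found = []                # list.append is thread-safe in CPython
    stop = threading.Event()  # set once any worker finds the file

    def worker():
        while True:
            directory = work.get()
            if directory is None:          # sentinel: time to exit
                work.task_done()
                return
            try:
                if not stop.is_set():
                    with os.scandir(directory) as entries:
                        for entry in entries:
                            if entry.name == filename and entry.is_file():
                                found.append(entry.path)
                                stop.set()
                            elif entry.is_dir(follow_symlinks=False):
                                work.put(entry.path)  # queue the subfolder
            except OSError:
                pass                       # unreadable directory: skip it
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.put(root)
    work.join()                            # wait until every queued directory is handled
    for _ in threads:
        work.put(None)                     # one sentinel per worker
    for t in threads:
        t.join()
    return found[0] if found else None
```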

How to launch an exe with a variable path, special characters and arguments

I want to copy an installer file from a location where one of the folder names changes with each build number.
This works for defining the path where the last folder name changes:
import glob
import os
dirname = "z:\\zzinstall\\*.install"
filespec = "setup.exe"
print(glob.glob(os.path.join(dirname, filespec)))
# the print is how I'm verifying the path is correct
['z:\\zzinstall\\35115.install\\setup.exe']
The problem I have is that I can't get setup.exe to launch because of the arguments it needs.
I need to launch setup.exe with, for example:
setup.exe /S /z"
There are numerous other arguments that need to be passed with double quotes, slashes and whitespace. Because the documentation provided is inconsistent, I have to test by trial and error. There are even instances that state I need to use a "" after a switch!
So how can I do this?
Ideally I'd like to pass the entire path, including the file I need to glob, or
I'd like to store the result of the glob in a variable, then concatenate it with setup.exe and the arguments. That did not work: I get an error saying a list can't be concatenated with a string.
Basically anything that works; so far I've failed because of my inability to handle the filename that varies and the obscene amount of whitespace and special characters in the arguments.
The following link is related, however it does not include a clear answer to my specific question:
link text
The response provided below does not answer the question, nor does the link I provided; that's why I'm asking this question. I will rephrase in case I'm not understood.
I have a file that I need to copy at random times. Its path is prefixed with a unique, unpredictable number, e.g. a build number. Note this is a Windows system.
For this example I will cite the same folder/file structure.
The build server creates a build at any time within a 4-hour range. The path to the build server folder is Z:\data\builds\*.install\setup.exe
Note the wildcard in the path. This means the folder name starts with a random (yes, random) string of 8 digits, then a dot, then "install". So the path at one time may be Z:\data\builds\12345678.install\setup.exe and at another Z:\data\builds\66666666.install\setup.exe. This is one major part of this problem. Note, I did not design this build numbering system. I've never seen anything like it in my years as a QA engineer.
So to deal with the first issue, I plan on using glob.
import glob
import os
dirname = "Z:\\data\\builds\\*.install"
filespec = "setup.exe"
instlpath = glob.glob(os.path.join(dirname, filespec))
print(instlpath)  # this is the test: it prints the accurate path to launch the install; the problem is that I have to add arguments
OK, so I thought I could take the path I defined as instlpath, concatenate the arguments to it and execute it.
When I try to use print to test
print(instlpath + [" /S /z"])
I get
['Z:\builds\install\12343333.install\setup.exe', ' /S /z']
I need
Z:\builds\install\12343333.install\setup.exe /S /z"  # yes, I need the whitespace as well, and may also need a z""
Why are all of the installs called setup.exe and not uniquely named? No freaking idea!
Thank You,
Surfdork

The related question you linked to does contain a relatively clear answer to your problem:
import subprocess
subprocess.call(['z:/zzinstall/35115.install/setup.exe', '/S', '/z', ''])
So you don't need to concatenate the path of setup.exe and its arguments. The arguments you specify in the list are passed directly to the program and not processed by the shell. For an empty string, which would be "" in a shell command, use an empty python string.
See also http://docs.python.org/library/subprocess.html#subprocess.call
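Putting the two parts of the question together: glob finds the build folder, and the argument list goes straight to subprocess. A sketch; build_install_command and launch are hypothetical names, and picking the most recently modified build when several match is an assumption.

```python
import glob
import os
import subprocess


def build_install_command(builds_dir, args):
    """Return [path_to_setup.exe, *args] for the newest '<digits>.install' build.

    "Newest" means the most recently modified setup.exe, which is an
    assumption; change the key function if another build should win.
    """
    pattern = os.path.join(builds_dir, "*.install", "setup.exe")
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError("no build found matching " + pattern)
    newest = max(matches, key=os.path.getmtime)
    # One list element per argument: no manual quoting or string concatenation.
    return [newest] + list(args)


def launch(builds_dir, args):
    """Run the installer and return its exit code."""
    return subprocess.call(build_install_command(builds_dir, args))
```

For example, launch(r'Z:\data\builds', ['/S', '/z', '']). On Windows, subprocess turns the list into a command line with subprocess.list2cmdline, which writes the empty string as "". If an installer parses its command line in a non-standard way, you can instead pass a single, fully quoted command string to subprocess.call.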
