Iterate through directory of files as input to modules - python

I have a module that I want to run on every file in a directory. However, when I iterate over that directory, using each file as an input, the module cannot find the file, as though the variable defined in the loop doesn't actually point to the file. Here is the code I am trying to execute:
import os as os
for file in os.listdir():
    if file.endswith('.fasta'):
        !python ../iupred2a.py file long
Any help is greatly appreciated. Thanks!

https://ipython.readthedocs.io/en/stable/interactive/reference.html#system-shell-access says that shell commands (for example, your line prefixed with a "!") are interpreted literally. When you type "file", it sees "file", not the value of your file variable.
Any input line beginning with a ! character is passed verbatim (minus the !, of course) to the underlying operating system.
But it also says you can use braces or a dollar sign to "expand" a value.
IPython also allows you to expand the value of python variables when making system calls. Wrap variables or expressions in {braces}:
In [1]: pyvar = 'Hello world'
In [2]: !echo "A python variable: {pyvar}"
A python variable: Hello world
In [3]: import math
In [4]: x = 8
In [5]: !echo {math.factorial(x)}
40320
For simple cases, you can alternatively prepend $ to a variable name:
In [6]: !echo $sys.argv
[/home/fperez/usr/bin/ipython]
In [7]: !echo "A system variable: $$HOME" # Use $$ for literal $
A system variable: /home/fperez
In your case, try !python ../iupred2a.py $file long or !python ../iupred2a.py {file} long.
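Outside IPython, or if you would rather avoid the ! syntax, the same loop can be written with the standard subprocess module. Each argument goes in as its own list element, so no shell expansion or quoting is involved. This is a minimal sketch that assumes ../iupred2a.py takes the file path and the mode as positional arguments, as in the question; `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(directory=".", script="../iupred2a.py", mode="long"):
    """Return one argument list per .fasta file in `directory` (hypothetical helper)."""
    commands = []
    for path in sorted(Path(directory).glob("*.fasta")):
        # The file name is a separate list element, so it reaches the
        # script exactly as-is (spaces included), with no shell involved.
        commands.append([sys.executable, script, str(path), mode])
    return commands


def run_all(directory=".", script="../iupred2a.py", mode="long"):
    """Run the script once per .fasta file (hypothetical helper)."""
    for cmd in build_commands(directory, script, mode):
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the directory holding the .fasta files mirrors the original loop; `build_commands()` on its own is handy for checking what would be run.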
... All that said, I think it would be better to just import your other Python file and call its functions directly. This may require a little redesigning, because importing from a file from one directory up is somewhat tricky, and the command-line interface for a module is usually different from its programming interface.
If you can get your current file and iupred2a.py into the same directory, and figure out the name of the function that you actually want to call, then your code would end up looking something like:
import os
import iupred2a as iup
for file in os.listdir():
    if file.endswith('.fasta'):
        iup.do_the_thing(file, mode="long")
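If moving iupred2a.py next to your notebook is not an option, importlib can load a module from an explicit file path, which sidesteps the "one directory up" problem. A sketch; `load_module_from_path` is a hypothetical helper, and `do_the_thing` is still the placeholder name used above.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(name, path):
    """Import the Python file at `path` and return it as a module object."""
    spec = importlib.util.spec_from_file_location(name, str(Path(path)))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code once
    return module


# Usage (the function name below is a placeholder, not iupred2a's real API):
# iup = load_module_from_path("iupred2a", "../iupred2a.py")
# iup.do_the_thing("protein.fasta", mode="long")
```

Because the module is loaded under the name "iupred2a" rather than "__main__", any `if __name__ == "__main__":` command-line block in that file is not triggered by the import.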

Related

Passing ipython variable to bash command one line for loop

I'm sorry for asking a duplicate as this and this are very similar but with those answers I don't seem to be able to make my code work.
I have a Jupyter notebook cell with:
some_folder = "/some/path/to/files/*.gif"
!for name in {some_folder}; do echo $name; done
The output I get is just {some_folder}
If I do
some_folder = "/some/path/to/files/*.gif"
!for name in "/some/path/to/files/*.gif"; do echo $name; done # <-- gives no newlines between file names
# or
!for name in /some/path/to/files/*.gif; do echo $name; done # <-- places every filename on its own line
My gif files are printed to screen.
So my question is: why does it not use my Python variable in the for loop?
Because the below, without a for loop, does work:
some_folder = "/some/path/to/files/"
!echo {some_folder}
Follow up question: I actually want my variable to just be the folder and add the wildcard only in the for loop. So something like this:
some_folder = "/some/path/to/files/"
!for name in {some_folder}*.gif; do echo $name; done
For context, later I actually want to rename the files in the for loop and not just print them. The files have an extra dot (not the one from the .gif extension) which I would like to remove.
There's an alternative way to run bash in a Jupyter cell: the %%bash cell magic, see here. It seems to allow what you are trying to do.
If you have already run some_folder = r"/some/path/to/files/*.gif" or some_folder = "/some/path/to/files/*.gif" in a normal cell, then you can try this in a separate cell:
%%bash -s {some_folder}
for i in {1..5}; do echo $1; done
That said, what you seem to be trying to do with some_folder = "/some/path/to/files/*.gif" isn't going to work as such. Passing "/some/path/to/files/*.gif" from Python to bash doesn't behave like typing /some/path/to/files/*.gif directly in bash. Bash doesn't hand /some/path/to/files/*.gif to a command as-is; it expands the wildcard first and then passes the matching file names. That expansion isn't going to happen when the value comes from Python. And there are other peculiarities you'll come across. Tar you can pass a Python list of files directly using the bracket notation and it will handle that.
The solutions are to do more either on the Python side or on the shell side. Python has its own glob module, see here, which you can combine with os.system(). Python also has fnmatch, which is nice because you can use Unix-like file name matching. Plus there's shutil, which allows moving/renaming; see shutil.move(). In Python, os.remove(temp_file_name) can delete files. If you aren't working on a Windows machine, there's the sh module, which makes things nice. See here and here.
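A minimal Python-side sketch of the follow-up task (rename the .gif files to drop the extra dot), assuming "extra dot" means any dot in the name before .gif and that it should simply be removed. glob does the wildcard expansion that bash would otherwise do, and shutil.move does the rename; `strip_extra_dots` is a hypothetical name.

```python
import glob
import os
import shutil


def strip_extra_dots(folder, dry_run=False):
    """Rename e.g. 'cat.1.gif' to 'cat1.gif' inside `folder`.

    Returns the (old_path, new_path) pairs that were (or would be) renamed.
    """
    renamed = []
    # The wildcard is expanded by Python here, not by the shell.
    for old_path in sorted(glob.glob(os.path.join(folder, "*.gif"))):
        directory, name = os.path.split(old_path)
        stem = name[: -len(".gif")]
        if "." not in stem:
            continue  # no extra dot, nothing to fix
        new_path = os.path.join(directory, stem.replace(".", "") + ".gif")
        if not dry_run:
            shutil.move(old_path, new_path)
        renamed.append((old_path, new_path))
    return renamed
```

Running it with `dry_run=True` first lists the planned renames without touching anything; note that an existing file with the new name may be overwritten.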

Unexpected double quotes while appending file items to subprocess.run

I am trying to read from a file which has contents like this:
#\5\5\5
...
#\5\5\10
This file content is then fed into subprocess module of python like this:
for lines in file.readlines():
    print(lines)
    cmd = "ls"
    p = subprocess.run([cmd, lines])
The output turns into something like this:
CompletedProcess(args=['ls', "'#5\\5\\5'\n"], returncode=1)
I don't understand why the contents of the file are wrapped in double quotes and why an extra backslash is added.
The real problem here isn't Python or the subprocess module. The problem is the use of subprocess to invoke shell commands and then trying to parse the results. In this case, it looks like the command is ls, and the plan appears to be to read some filesystem paths from a text file (each path on a separate line) and list the files at each location on the filesystem.
Using subprocess to invoke ls is really, really, really not the way to accomplish that in Python. This is basically an attempt to use Python like a shell script (this use of ls would still be problematic, but that's a different discussion).
If a shell script is the right tool for the job, then write a shell script. If you want to use Python, then use one of the APIs that it provides for interacting with the OS and the filesystem. There is no need to bring in external programs to achieve this.
import os
with open("list_of_paths.txt", "r") as fin:
    for line in fin.readlines():
        w = os.listdir(line.strip())
        print(w)
Note the use of .strip(): this string method removes leading and trailing whitespace, such as spaces and the trailing newline, from each input line.
The listdir method provided by the os module will return a list of the files in a directory. Other options are os.scandir, os.walk, and the pathlib module.
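For comparison, a pathlib-based sketch of the same loop. As an assumed addition, it skips blank lines and entries that are not directories, which the snippet above would turn into exceptions; `list_paths_from_file` is a hypothetical name and list_of_paths.txt is the placeholder file from the answer.

```python
from pathlib import Path


def list_paths_from_file(list_file="list_of_paths.txt"):
    """Map each directory named in `list_file` to the sorted names it contains."""
    listing = {}
    with open(list_file) as fin:
        for line in fin:
            entry = line.strip()  # drop the trailing newline and spaces
            if not entry or not Path(entry).is_dir():
                continue  # skip blank lines and anything that isn't a directory
            directory = Path(entry)
            listing[str(directory)] = sorted(p.name for p in directory.iterdir())
    return listing


# Usage:
# for directory, names in list_paths_from_file().items():
#     print(directory, names)
```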
But please do not use subprocess. 95% of the time, when someone thinks "should I use Python's subprocess module for this?", the answer is "NO".
It is because \ followed by certain characters or digits becomes something other than those literal characters. For example, \n is not just \ and n; it means a new line. If you really want a literal \n, then you add another backslash to it (\\n). Likewise, \5 means something else, hence the \\ being added, if I am not wrong.
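One way to check what the strings actually contain is to compare repr() with len(). Lists, and therefore the CompletedProcess args, display their string items via repr(), which writes each backslash as \\; each line read from a file also keeps its trailing newline until it is stripped. The sample text below is modeled on the question's file.

```python
import io

# Stand-in for the question's text file (io.StringIO behaves like an open file).
# Raw strings keep the backslashes literal, as they would be on disk.
fake_file = io.StringIO(r"#\5\5\5" + "\n" + r"#\5\5\10" + "\n")

for line in fake_file:
    print(repr(line))          # '#\\5\\5\\5\n': repr shows each backslash as \\
    print(len(line.strip()))   # 7, then 8: one character per backslash
    print([line.strip()])      # a list also shows its items via repr
```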

Can a variable expansion in python work like shell expansion of variable

I have legacy Python code that reads variable values from a shell script, as if it were a properties file. For example, say I have a shell program x.sh with variable declarations:
Y_HOME=/utils
Y_SPCL=$Y_HOME/spcl
UTIL1=$Y_SPCL/u1
Y_LIB=$Y_HOME/lib
Now, from within python program abc.py, I read x.sh file line by line and use line.split("=")[1] to get the value of the variable, say, UTIL1 as $Y_SPCL/u1 in non-expanded form and not in expanded form as /utils/spcl/u1.
Can I have some mechanism in Python for variable expansion like in a shell program? Since I am using x.sh not as a shell program but as a configuration file like a properties file, I think all variables should be in expanded form so that the Python program runs properly, such as:
Y_HOME=/utils
Y_SPCL=/utils/spcl
UTIL1=/utils/spcl/u1
Y_LIB=/utils/lib
This would require no change to the legacy Python code; only the configuration file, treated as external properties data, would change.
Please pass your opinions.
There is a package dotenv that can do that.
pip install --user python-dotenv
Then, in Python:
import os
from dotenv import load_dotenv
load_dotenv("x.sh")
print(os.environ["Y_LIB"])
Important: Make sure your variable substitutions read like ${VAR}. So your x.sh would look like this:
Y_HOME=/utils
Y_SPCL=${Y_HOME}/spcl
UTIL1=${Y_SPCL}/u1
Y_LIB=${Y_HOME}/lib
Assuming variables must be declared before use and they form a correct path when expanded, you could do something like this.
Suppose the file fileName contains the variables:
Y_HOME=/utils
Y_SPCL=$Y_HOME/spcl
UTIL1=$Y_SPCL/u1
Y_LIB=$Y_HOME/lib
For each "variable", search for it in the other variables' "values" and replace it with its own "value". You could have a .py like this:
variables = []
with open("fileName", "r") as f:
    for line in f:
        line = line.rstrip("\n")  # drop the trailing newline
        if not line:
            continue  # skip blank lines
        variables.append(line.split("=", 1))
# Variables are declared before use, so expanding them in declaration
# order means each earlier value is already fully expanded when used.
for i in range(len(variables)):
    current = "$" + variables[i][0]
    for j in range(len(variables)):
        variables[j][1] = variables[j][1].replace(current, variables[i][1])
for var in variables:
    print(var[0] + "=" + var[1])
Output:
Y_HOME=/utils
Y_SPCL=/utils/spcl
UTIL1=/utils/spcl/u1
Y_LIB=/utils/lib
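The standard library can also do the substitution itself: string.Template understands both $NAME and ${NAME}, so the same file works with or without braces. A minimal sketch under the same assumption that each variable is defined before it is used; `expand_shell_vars` is a hypothetical name.

```python
from string import Template


def expand_shell_vars(lines):
    """Expand $VAR / ${VAR} references in simple NAME=value lines, in order."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        name, raw_value = line.split("=", 1)
        # safe_substitute leaves unknown names (e.g. $UNDEFINED) untouched
        values[name] = Template(raw_value).safe_substitute(values)
    return values


# Usage:
# with open("x.sh") as f:
#     for name, value in expand_shell_vars(f).items():
#         print(name + "=" + value)
```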

running python code in an external shell using sublimetext?

First of all, I am new to programming.
To run python code in an external shell window, I followed the instructions given on this page
link
My problem is that if I save the python file in any path that contains a folder name with a space, it gives me this error:
C:\Python34\python.exe: can't open file 'C:\Program': [Errno 2] No such file or directory
Does not work:
C:\Program Files\Python Code
Works:
C:\ProgramFiles\PythonCode
could someone help me fix the problem???
Here is the code:
import sublime
import sublime_plugin
import subprocess

class PythonRunCommand(sublime_plugin.WindowCommand):
    def run(self):
        command = 'cmd /k "C:\Python34\python.exe" %s' % sublime.active_window().active_view().file_name()
        subprocess.Popen(command)
subprocess methods accept a string or a list. Passing as a string is the lazy way: just copy/paste your command line and it works. That is for hardcoded commands, but things get complicated when you introduce parameters known at run-time only, which may contain spaces, etc...
Passing a list is better because you don't need to compose your command and escape spaces yourself. Pass the parameters as a list, so the quoting is done automatically, and better than you could do it by hand:
command = ['cmd','/k',r"C:\Python34\python.exe",sublime.active_window().active_view().file_name()]
And always use raw strings (r prefix) when passing literal Windows paths, or you may get surprises from escape sequences that mean something (linefeed, tab, unicode...).
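A small runnable check of the list form, using sys.executable in place of the hard-coded C:\Python34\python.exe and a temporary folder named "Python Code" as a stand-in for the question's path. `run_script` is a hypothetical helper, and the cmd /k wrapper is left out so it also runs outside Windows.

```python
import os
import subprocess
import sys
import tempfile


def run_script(script_path):
    """Run a Python script whose path may contain spaces; return its stdout."""
    # One list element per argument: subprocess takes care of the quoting,
    # so "Python Code" is not split into two arguments.
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Demo: a script inside a folder whose name contains a space.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "Python Code")
    os.mkdir(folder)
    script = os.path.join(folder, "hello.py")
    with open(script, "w") as f:
        f.write('print("hello")\n')
    print(run_script(script))
```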
In this particular case, if file associations are properly set, you only need to pass the python script without any other command prefix:
command = [sublime.active_window().active_view().file_name()]
(You'll need to add shell=True to the subprocess call, but it's worth it because it avoids hardcoding the Python path and makes your plugin portable.)

how to "source" file into python script

I have a text file /etc/default/foo which contains one line:
FOO="/path/to/foo"
In my python script, I need to reference the variable FOO.
What is the simplest way to "source" the file /etc/default/foo into my python script, same as I would do in bash?
. /etc/default/foo
Same answer as @jil; however, that answer is specific to a historical version of Python (2.x).
In modern Python (3.x):
exec(open('filename').read())
replaces execfile('filename') from 2.x
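A slightly tidier variant passes a dictionary to exec(), so the file's names land there instead of in your module's globals. This still executes the file as Python code, so the same trust caveats apply; `source_python_file` is a hypothetical name.

```python
def source_python_file(path):
    """Execute `path` as Python and return the names it defines."""
    namespace = {}
    with open(path) as f:
        exec(f.read(), namespace)
    namespace.pop("__builtins__", None)  # exec() inserts this key; drop it
    return namespace


# Usage:
# settings = source_python_file("/etc/default/foo")
# print(settings["FOO"])
```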
You could use execfile:
execfile("/etc/default/foo")
But please be aware that this will evaluate the contents of the file as-is, as part of your program. It is a potential security hazard unless you can fully trust the source.
It also means that the file needs to be valid python syntax (your given example file is).
Keep in mind that if you have a "text" file with this content and a .py file extension, you can always do:
import mytextfile
print(mytextfile.FOO)
Of course, this assumes that the text file is syntactically correct as far as Python is concerned. On a project I worked on we did something similar to this. Turned some text files into Python files. Wacky but maybe worth consideration.
Just to give a different approach, note that if your original file is set up as
export FOO=/path/to/foo
You can do source /etc/default/foo; python myprogram.py (or . /etc/default/foo; python myprogram.py), and within myprogram.py all the values that were exported in the sourced file are visible in os.environ, e.g.
import os
os.environ["FOO"]
If you know for certain that it only contains VAR="QUOTED STRING" style variables, like this:
FOO="some value"
Then you can just do this:
>>> with open('foo.sysconfig') as fd:
...     exec(fd.read())
Which gets you:
>>> FOO
'some value'
(This is effectively the same thing as the execfile() solution suggested in the other answer.)
This method has substantial security implications; if instead of FOO="some value" your file contained:
os.system("rm -rf /")
Then you would be In Trouble.
Alternatively, you can do this:
>>> import shlex
>>> with open('foo.sysconfig') as fd:
...     settings = {var: shlex.split(value) for var, value in [line.split('=', 1) for line in fd]}
Which gets you a dictionary settings that has:
>>> settings
{'FOO': ['some value']}
That settings = {...} line is using a dictionary comprehension. You could accomplish the same thing in a few more lines with a for loop and so forth.
And of course if the file contains shell-style variable expansion like ${somevar:-value_if_not_set} then this isn't going to work (unless you write your very own shell style variable parser).
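Here is the dictionary comprehension above written out as a for loop. As an assumed addition, blank lines and # comments are skipped instead of making the unpacking fail; `read_sysconfig` is a hypothetical name. The values are still shlex token lists, exactly as in the comprehension.

```python
import shlex


def read_sysconfig(path):
    """Parse VAR="value" lines into {VAR: [tokens]} without executing anything."""
    settings = {}
    with open(path) as fd:
        for line in fd:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments and non-assignments
            var, value = line.split("=", 1)
            settings[var] = shlex.split(value)  # removes quotes, splits on spaces
    return settings
```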
There are a couple ways to do this sort of thing.
You can indeed import the file as a module, as long as its content is valid Python syntax. But either the file in question has to be a .py in the same directory as your script, or you have to use imp (or importlib, depending on your version) like here.
Another solution (that has my preference) can be to use a data format that any python library can parse (JSON comes to my mind as an example).
/etc/default/foo :
{"FOO":"path/to/foo"}
And in your python code :
import json
with open('/etc/default/foo') as file:
    data = json.load(file)  # the with block closes the file afterwards
FOO = data["FOO"]
## ...
This way, you don't risk executing untrusted code...
You have the choice, depending on what you prefer. If your data file is auto-generated by some script, it might be easier to keep a simple syntax like FOO="path/to/foo" and use imp (on Python 3.12 and later, where imp has been removed, use importlib instead).
Hope that it helps !
The Solution
Here is my approach: parse the bash file myself and process only variable assignment lines such as:
FOO="/path/to/foo"
Here is the code:
import shlex

def parse_shell_var(line):
    """
    Parse such lines as:
        FOO="My variable foo"
    :return: a [name, value] pair, such as
        ['FOO', 'My variable foo']
    """
    return shlex.split(line, posix=True)[0].split('=', 1)

if __name__ == '__main__':
    with open('shell_vars.sh') as f:
        shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
    print(shell_vars)
How It Works
Take a look at this snippet:
shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
This line iterates through the lines in the shell script, processing only those lines that contain an equal sign (not a foolproof way to detect variable assignments, but the simplest). Next, it runs those lines through the function parse_shell_var, which uses shlex.split to correctly handle the quotes (or the lack thereof). Finally, the pieces are assembled into a dictionary. The output of this script is:
{'MOO': '/dont/have/a/cow', 'FOO': 'my variable foo', 'BAR': 'My variable bar'}
Here is the contents of shell_vars.sh:
FOO='my variable foo'
BAR="My variable bar"
MOO=/dont/have/a/cow
echo $FOO
Discussion
This approach has a couple of advantages:
It does not execute the shell (either in bash or in Python), which avoids any side effects
Consequently, it is safe to use, even if the origin of the shell script is unknown
It correctly handles values with or without quotes
This approach is not perfect, it has a few limitations:
The method of detecting variable assignment (by looking for the presence of the equal sign) is primitive and not accurate. There are ways to better detect these lines but that is the topic for another day
It does not correctly parse values which are built upon other variables or commands. That means, it will fail for lines such as:
FOO=$BAR
FOO=$(pwd)
Building on the exec(open(...).read()) answer, value = eval(open(...).read()) returns only the value of the expression in the file. E.g. (file contents: result):
1 + 1: 2
"Hello World": "Hello World"
float(2) + 1: 3.0
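When the file holds a single literal rather than an arbitrary expression, ast.literal_eval is a safer stand-in for eval(): it accepts strings, numbers, tuples, lists, dicts, sets, booleans and None, and raises ValueError for other expressions such as function calls, so float(2) + 1 is rejected. A sketch; `read_literal` is a hypothetical name.

```python
import ast


def read_literal(text):
    """Evaluate a Python literal without running arbitrary code."""
    return ast.literal_eval(text.strip())


print(read_literal('"Hello World"'))   # Hello World
print(read_literal("[1, 2.5, None]"))  # [1, 2.5, None]

try:
    read_literal("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```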
