How to call a sed command in a python script? - python

From a Python script, I am trying to run a sed command through subprocess.call(), as shown below.
file = "a.xml"
updateData= "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1234)))(CONNECT_DATA=(SERVICE_NAME=centraldb)))"
subprocess.call(["sed", "-i", 's#<DB_CONNECT_STRING>.*</DB_CONNECT_STRING>#<DB_CONNECT_STRING>updateData</DB_CONNECT_STRING>#', file])
When I run the command in a shell script or on the command line, it runs fine, but in Python I get an error saying "No input file". Any idea how to fix that error?
a.xml looks something like this.
<?xml version = '1.0' encoding = 'UTF-8'?>
<!DOCTYPE properties SYSTEM "java.sun.com/dtd/properties.dtd">
<properties> <!-- Database server details -->
<DB_CONNECT_STRING>(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=cdb)))</DB_CONNECT_STRING>
</properties>

You really don't need or want to use an external subprocess for this.
import fileinput
updateData = "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1234)))(CONNECT_DATA=(SERVICE_NAME=centraldb)))"
for line in fileinput.input('a.xml', inplace=True):
    try:
        prefix, tail = line.split('<DB_CONNECT_STRING>', 1)
        _, suffix = tail.split('</DB_CONNECT_STRING>', 1)
        line = prefix + '<DB_CONNECT_STRING>' + updateData + '</DB_CONNECT_STRING>' + suffix
    except ValueError:
        # <DB_CONNECT_STRING> or close tag not found -- don't replace
        pass
    # With inplace=True, print() writes into the file; line already ends with a newline
    print(line, end='')
For the record, updateData inside quotes does not magically change into the value of the variable updateData so that was another problem with your attempt.
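If you do want to stay with sed, the point about the variable can be sketched like this. The helper name build_sed_command is made up for illustration, and the sketch assumes the new value contains no '#', '&' or backslash, since sed treats those specially in the replacement part.

```python
import subprocess

def build_sed_command(path, new_value):
    """Return the argument list for an in-place sed substitution.

    Assumes new_value contains no '#', '&' or backslash, which sed would
    treat specially in the replacement part of the s#...#...# expression.
    """
    expression = (
        "s#<DB_CONNECT_STRING>.*</DB_CONNECT_STRING>#"
        "<DB_CONNECT_STRING>{}</DB_CONNECT_STRING>#".format(new_value)
    )
    return ["sed", "-i", expression, path]

update_data = "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=abc)(PORT=1234)))(CONNECT_DATA=(SERVICE_NAME=centraldb)))"
command = build_sed_command("a.xml", update_data)
print(command)
# subprocess.call(command)  # uncomment to actually run sed (GNU sed's -i is assumed)
```

The value of the variable is formatted into the expression before sed ever sees it, so the literal text updateData no longer ends up in the file.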
The ad-hoc XML processing is still a major wart; a proper solution would use an XML parser and perhaps XSLT to update the file. (On the other hand, if you know for a fact that the line will never contain anything outside the start and end tags, you can simplify the above script somewhat. On the third hand, the ad-hoc s-expressions inside the XML tags look like you really want to rethink the configuration file format more thoroughly if you have any control at all over this.)
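A rough sketch of the parser-based route, using xml.etree.ElementTree from the standard library. The sample document is a simplified stand-in for a.xml and the function name is made up. Note that ElementTree drops comments and the DOCTYPE when it serializes the tree, so this only fits if losing them is acceptable.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a.xml (no XML declaration, DOCTYPE or comment).
SAMPLE = (
    "<properties>"
    "<DB_CONNECT_STRING>(DESCRIPTION=(old))</DB_CONNECT_STRING>"
    "</properties>"
)

def replace_connect_string(xml_text, new_value):
    """Return xml_text with the text of every DB_CONNECT_STRING element replaced."""
    root = ET.fromstring(xml_text)
    for node in root.iter("DB_CONNECT_STRING"):
        node.text = new_value
    return ET.tostring(root, encoding="unicode")

updated = replace_connect_string(SAMPLE, "(DESCRIPTION=(new))")
print(updated)
```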

Related

Unexpected double quotes while appending file items to subprocess.run

I am trying to read from a file which has contents like this:
#\5\5\5
...
#\5\5\10
This file content is then fed into subprocess module of python like this:
for lines in file.readlines():
    print(lines)
    cmd = "ls"
    p = subprocess.run([cmd, lines])
The output turns into something like this:
CompletedProcess(args=['ls', "'#5\\5\\5'\n"], returncode=1)
I don't understand why the contents of the file is appended with a double quote and another backward slash is getting appended.
The real problem here isn't Python or the subprocess module. The problem is the use of subprocess to invoke shell commands and then trying to parse the results. In this case, it looks like the command is ls, and the plan appears to be to read some filesystem paths from a text file (each path on a separate line), and list the files at that location on the filesystem.
Using subprocess to invoke ls is really, really, really not the way to accomplish that in Python. This is basically an attempt to use Python like a shell script (this use of ls would still be problematic, but that's a different discussion).
If a shell script is the right tool for the job, then write a shell script. If you want to use Python, then use one of the APIs that it provides for interacting with the OS and the filesystem. There is no need to bring in external programs to achieve this.
import os
with open("list_of_paths.txt", "r") as fin:
    for line in fin.readlines():
        w = os.listdir(line.strip())
        print(w)
Note the use of .strip(): this string method removes whitespace such as spaces and newlines from both ends of the string.
The listdir method provided by the os module will return a list of the files in a directory. Other options are os.scandir, os.walk, and the pathlib module.
But please do not use subprocess. 95% of the time, when someone thinks "should I use Python's subprocess module for this?" the answer is "NO".
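For comparison, here is roughly what the pathlib variant could look like. The function name list_entries and the demo directory are made up for illustration; only the idea of reading one path per line comes from the question.

```python
import tempfile
from pathlib import Path

def list_entries(paths_file):
    """Return {directory: sorted entry names} for each directory listed in paths_file."""
    listing = {}
    with open(paths_file, "r") as fin:
        for line in fin:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            directory = Path(name)
            if not directory.is_dir():
                continue  # skip paths that are not existing directories
            listing[str(directory)] = sorted(p.name for p in directory.iterdir())
    return listing

# Small demo in a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.txt").write_text("x")
(demo_dir / "b.txt").write_text("y")
paths_file = demo_dir / "list_of_paths.txt"
paths_file.write_text(str(demo_dir) + "\n\n")
result = list_entries(paths_file)
print(result)
```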
It is because a backslash followed by certain characters or digits is an escape sequence rather than two literal characters. For example, \n is not just \ and n; it means a newline. If you really want a literal \n, you add another backslash (\\n). Likewise, \5 in a string literal means something else (the character with code 5).
That is why \\ appears in the output, if I am not wrong: it is how Python displays a single literal backslash.
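A small sketch to check this, assuming the line read from the file is the text #\5\5\5 followed by a newline (built here with a raw string so the backslashes stay literal):

```python
# One line as it would be read from the file: '#', three backslash-digit pairs, a newline.
line = r"#\5\5\5" + "\n"

print(len(line))    # each backslash counts as ONE character
print(line)         # printed text shows single backslashes
print(repr(line))   # repr() writes each backslash as \\ and the newline as \n

# In a normal (non-raw) string literal, \5 is an octal escape for one character.
escaped = "\5"
print(len(escaped), ord(escaped))
```

So the doubled backslashes and the quotes only exist in the way the list of arguments is displayed; the string passed to the command still contains single backslashes (plus the trailing newline, which is a separate problem).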

Passing to SOAP arguments from the command line

I have a python script that successfully sends SOAP to insert a record into a system. The values are static in the test. I need to make the value dynamic/argument that is passed through the command line or other stored value.
execute: python myscript.py
<d4p1:Address>MainStreet</d4p1:Address> ....this works to add hard coded "MainStreet"
execute: python myscript.py MainStreet
...this is now trying to pass the argument MainStreet
<d4p1:Address>sys.argv[1]</d4p1:Address> ....this does not work
It saves the literal text address as "sys.argv[1]" ... I have imported sys ..I have tried %, {}, etc from web searches, what syntax am I missing??
You need to read a little about how to build strings in Python; below is how it could look in your code. Sorry, it's hard to say more without seeing your actual code. Also, you shouldn't really create XML like that; use, for instance, the xml module from the standard library.
test = "<d4p1:Address>" + sys.argv[1] + "</d4p1:Address>"
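As a rough idea of the xml module route, here is a sketch with xml.etree.ElementTree. The namespace URI is a placeholder (the real one behind the d4p1 prefix is not shown in the question), and address_element is a made-up helper name.

```python
import sys
import xml.etree.ElementTree as ET

NS = "http://example.com/d4p1"  # placeholder URI, an assumption
ET.register_namespace("d4p1", NS)

def address_element(value):
    """Serialize a d4p1:Address element whose text is value (escaped as needed)."""
    elem = ET.Element("{%s}Address" % NS)
    elem.text = value
    return ET.tostring(elem, encoding="unicode")

# Take the value from the command line, falling back to the old hard-coded one.
address = sys.argv[1] if len(sys.argv) > 1 else "MainStreet"
print(address_element(address))
```

Compared with string concatenation, this also escapes characters such as & or < if they ever show up in the argument.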

Python - Open file in notepad that is contained in a variable

I couldn't find this anywhere, so sorry if I missed it. It seems like it should be simple but somehow isn't. I have a simple program that opens a log (log1.lg let's say) and strips any lines that don't contain keywords. It then tosses them into a 2nd file that is renamed to Log1.lg.clean.
The way I've implemented this is by using os.rename so the code looks like this:
# define source and key words
source_log = 'Log1.lg'
bad_words = ['word', 'bad']
# clean up the log
with open(source_log) as orig_log, open('cleanlog.lg', 'w') as cleanlog:
    for line in orig_log:
        if not any(bad_word in line for bad_word in bad_words):
            cleanlog.write(line)
# rename file and open in Notepad
rename = source_log + '.clean'
new_log = os.rename("cleanlog.lg", rename)
prog = "notepad.exe"
subprocess.Popen(prog, new_log)
Error I'm getting is this:
File "C:\Users\me\Downloads\PythonStuff\stripMmax.py", line 23, in cleanLog
subprocess.Popen(prog, new_log)
File "C:\Python27\lib\subprocess.py", line 339, in __init__
raise TypeError("bufsize must be an integer")
TypeError: bufsize must be an integer
I'm using Python 2.7 if that's relevant. I don't get why this isn't working or why it's requiring a bufsize. I've seen other examples where this works this way so I'm thinking maybe this command doesn't work in 2.7 the way I'm typing it?
The documentation shows how to use this properly using the actual file name in quotes, but as you can see, mine here is contained in a variable which seems to cause issues. Thanks in advance!
See the Popen constructor here: subprocess.Popen. The second argument to Popen is bufsize. That explains your error. Also note that os.rename does not return anything so new_log will be None. Use your rename variable instead. Your call should look like this:
subprocess.Popen([prog, rename])
You likely also want to wait on the created Popen object:
proc = subprocess.Popen([prog, rename])
proc.wait()
Or something like that.
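Both points can be tried out with a small sketch. It uses a temporary directory, and the Python interpreter stands in for notepad.exe so that it runs anywhere; both are assumptions made only for the demo.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
temp_name = os.path.join(workdir, "cleanlog.lg")
final_name = os.path.join(workdir, "Log1.lg.clean")
with open(temp_name, "w") as cleanlog:
    cleanlog.write("a line worth keeping\n")

result = os.rename(temp_name, final_name)  # os.rename returns None, not the new name
print(result)

# Stand-in viewer (assumption): the Python interpreter printing the file.
# On Windows the list would be ["notepad.exe", final_name] instead.
viewer = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read(), end='')"]
proc = subprocess.Popen(viewer + [final_name])  # program and arguments in ONE list
exit_code = proc.wait()
print(exit_code)
```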

Local Blast empty xml file python

I am trying to write a little script to automate a local BLAST alignment.
I ran the commands in the terminal and they work perfectly. However, when I try to automate this, I get a message like: Empty XML file.
Do we have to add a "system" waiting time to let the file be written, or did I do something wrong?
The code :
# sequence identifier as key, sequence as value.
for element in dictionnaryOfSequence:
    # I make a little temporary fasta file because the blast command needs a fasta file as input.
    out_fasta = open("tmp.fasta", 'w')
    query = ">" + element + "\n" + str(dictionnaryOfSequence[element])
    out_fasta.write(query)  # And I have this file with my sequence correctly filled
    out_fasta.close()  # EDIT : It was out of my loop....
    # Now the blast command, which works well in the terminal, I have my tmp.xml file well filled.
    os.system("blastn -db reads.fasta -query tmp.fasta -out tmp.xml -outfmt 5 -max_target_seqs 5000")
    # Parsing of the xml file.
    handle = open("tmp.xml", 'r')
    blast_records = NCBIXML.read(handle)
    print(blast_records)
I get an error: Your XML file was empty, and the blast_records object doesn't exist.
Did I do something wrong with the handles?
I welcome any advice. Thank you very much for your ideas and help.
EDIT : Problem solved, sorry for the useless question. I handled the file wrongly: I did not open it in the right place, and the same goes for closing it.
Sorry.
Try opening the file "tmp.xml" in Internet Explorer. Are all the tags closed?
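On the closing issue mentioned in the EDIT, a minimal sketch of why it matters: data written to a file may stay in Python's buffer until the file is flushed or closed, so an external tool started too early can see an empty or partial file. The sequence name seq1 and the temporary path are made up for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tmp.fasta")

out_fasta = open(path, "w")
out_fasta.write(">seq1\nACGT\n")
size_before_close = os.path.getsize(path)  # may well be 0: data can still be buffered
out_fasta.close()
size_after_close = os.path.getsize(path)   # everything has been written out now
print(size_before_close, size_after_close)

# A with block closes the file when the block ends, before any external tool is started.
with open(path, "w") as out_fasta:
    out_fasta.write(">seq1\nACGT\n")
size_after_with = os.path.getsize(path)
print(size_after_with)
```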

Specifying filename in os.system call from python

I am creating a simple file in python to reorganize some text data I grabbed from a website. I put the data in a .txt file and then want to use the "tail" command to get rid of the first 5 lines. I'm able to make this work for a simple filename shown below, but when I try to change the filename (to what I'd actually like it to be) I get an error. My code:
start = 2010
end = 2010
for i in range(start, end+1):
    year = str(i)
    # ...write data to a file called file...
    teamname = open(file).readline()  # want to use this in the new filename
    teamfname = teamname.replace(" ", "")  # getting rid of spaces
    file2 = "gotdata2_"+year+".txt"
    os.system("tail -n +5 gotdata_"+year+".txt > "+file2)
The above code works as intended, creating file, then creating file2 that excludes the first 5 lines of file. However, when I change the name of file2 to be:
file2 = teamfname+"_"+year+".txt"
I get the error:
sh: line 1: _2010.txt: command not found
It's as if the end of my file2 statement is getting chopped off and the .txt part isn't being recognized. In this case, my code outputs a file but is missing the _2010.txt at the end. I've double checked that both year and teamfname are strings. I've also tried it with and without spaces in the teamfname string. I get the same error when I try to include a os.system mv statement that would rename the file to what I want it to be, so there must be something wrong with my understanding of how to specify the string here.
Does anyone have any ideas about what causes this? I haven't been able to find a solution, but I've found this problem difficult to search for.
Without knowing what your actual strings are, it's impossible to be sure what the problem is. However, it's almost certainly something to do with failing to properly quote and/or escape arguments for the command line.
My first guess would be that you have a newline in the middle of your filename, and the shell is truncating the command at the newline. But I wouldn't bet too heavily on that. If you actually printed out the repr of the pathname, I could tell you for sure. But why go through all this headache?
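To illustrate that guess: readline() keeps the trailing newline, and replace(" ", "") does not remove it. The team name below is invented, since the real file contents are not shown.

```python
import io

# Hypothetical file contents; the real team name is not shown in the question.
fake_file = io.StringIO("Red Sox\nmore data\n")
year = "2010"

teamname = fake_file.readline()          # readline() keeps the trailing newline
teamfname = teamname.replace(" ", "")    # removes spaces, but not the newline
command = "tail -n +5 gotdata_" + year + ".txt > " + teamfname + "_" + year + ".txt"

print(repr(teamfname))
print(command.splitlines())   # the shell would see two separate command lines

# Stripping the name first keeps everything on one line.
fixed_name = teamname.strip().replace(" ", "") + "_" + year + ".txt"
print(repr(fixed_name))
```

If that is what is happening, the second line of the command is just _2010.txt, which would match the "command not found" message and the output file that is missing its ending.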
The solution to almost any problem with os.system is to not use os.system.
If you look at the docs, they even tell you this:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.
If you use subprocess instead of os.system, you can avoid the shell entirely. You can also pass arguments as a list instead of trying to figure out how to quote them and escape them properly. Which would completely avoid the exact problem you're having.
For example, if you do this:
file2 = "gotdata2_"+year+".txt"
with open(file2, 'wb') as f:
    subprocess.check_call(['tail', '-n', '+5', "gotdata_"+year+".txt"], stdout=f)
Then, if you change that first line to this:
file2 = teamfname+"_"+year+".txt"
It will still work even if teamfname has a space or a quote or another special character in it.
That being said, I'm not sure why you want to use tail in the first place. You can skip the first 5 lines just as easily directly in Python.
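A possible pure-Python version, as a sketch using itertools.islice. The helper name drop_first_lines and the demo files in a temporary directory are made up for illustration.

```python
import itertools
import os
import tempfile

def drop_first_lines(src_path, dst_path, count):
    """Copy src_path to dst_path, leaving out the first `count` lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(itertools.islice(src, count, None))

# Demo with an eight-line file in a throwaway directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "gotdata_2010.txt")
dst_path = os.path.join(workdir, "gotdata2_2010.txt")
with open(src_path, "w") as f:
    f.writelines("line %d\n" % n for n in range(1, 9))

drop_first_lines(src_path, dst_path, 5)
with open(dst_path) as f:
    remaining = f.read().splitlines()
print(remaining)
```

One thing to check against your data: tail -n +5 starts its output at line 5, so it drops four lines, not five. Pass count=4 if you want exactly the same result as the tail command.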
