Encoding of characters when running a PowerShell script as a Python subprocess - python

Currently, I'm writing a small Python application that executes a PowerShell script. My Python script should process the returned string, but unfortunately I'm having trouble with the encoding of special characters like 'ä', 'ö', 'Ü', and so on. How can I get back a Unicode/UTF-8 string?
You can see a simple example below. The console output is b'\xc7\xcf\r\n'. I don't understand why it isn't b'\xc3\xa4\r\n', since \xc3\xa4 is the correct UTF-8 encoding of the character 'ä'.
try.py:
import subprocess
p = subprocess.check_output(["powershell.exe", r".\script.ps1"])
print(p)
script.ps1:
return 'ä'
I adapted my PowerShell script in a few ways but did not get the desired result.
Added [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8. Result: b'\xc3\x83\xc2\xa4\r\n'
Returned [System.Text.Encoding]::UTF8.GetBytes("ä") instead. Result: b'195\r\n131\r\n194\r\n164\r\n'
Can anyone help me get the console output 'ä' for the script above?

I used "pwsh" because I ran it on a Mac; you can use "powershell.exe" in your code.
Try this:
import subprocess
p = subprocess.check_output(["pwsh", r".\sc.ps1"])
print(p.decode('utf-8'))
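For what it's worth, the doubly encoded result b'\xc3\x83\xc2\xa4' from the [Console]::OutputEncoding attempt suggests that Windows PowerShell parsed the script file itself with the ANSI codepage: if script.ps1 is saved as UTF-8 without a BOM, the two bytes of 'ä' are read as two separate characters and then re-encoded. A minimal sketch, assuming script.ps1 is re-saved as UTF-8 with a BOM:
import subprocess

# Set the console encoding inside the PowerShell session before the script
# runs, then decode the captured bytes with the same encoding in Python.
# Assumes script.ps1 is saved as UTF-8 with a BOM so 'ä' parses correctly.
cmd = [
    "powershell.exe",
    "-Command",
    "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; .\\script.ps1",
]
raw = subprocess.check_output(cmd)
print(raw.decode("utf-8"))  # ä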

Related

same string gives different result in Python

So I'm using the approach in this post to extract a double-quoted string from a string. If the input string comes from a terminal argument, it works fine. But if the input string comes from a txt file like the following, it gives a NoneType error. I tried to get the hash code for two strings (one from the file and one from the terminal) with identical text content, and it turns out they are different. I'm curious whether anyone knows how to solve this (in Python 3.x)?
That said, I have set the default encoding to "utf-8" in my code.
python filename.py < input.txt
If you are using the command python, it may be resolved as Python 2.x.
If you want Python 3.x, just change the command to python3,
like this
python3 filename.py < input.txt
Two things. First, if you want to ingest a txt file into a Python script, you need to specify it. Add these two lines:
import sys
text = str(sys.argv[1])
This means text would be your 'input.txt', with the script invoked as python3 filename.py input.txt rather than via < redirection.
Second, if your script only defines a function, nothing ever calls it; you have to tell the script explicitly to execute the function through the __main__ entry point:
import re
import sys

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

if __name__ == '__main__':
    text_file = str(sys.argv[1])
    text = open(text_file).read()
    print(doit(text))
Alternatively, you can just execute the code line by line without wrapping the re call in a function, since it is only one line.
I just figured it out; the bug doesn't come from my code. I had "smart quotes" enabled on my Mac, so whenever it writes a quote, it comes out as a different special character. Disabling this under the keyboard settings does the trick.
LOL what a "bug".
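For reference, a small sketch of how one might make the extraction tolerant of smart quotes by normalizing them to straight quotes before matching (the two replaced code points are an assumption about which curly characters macOS inserts):
import re

def extract_quoted(text):
    # normalize macOS "smart" quotes to straight ASCII quotes first
    text = text.replace('\u201c', '"').replace('\u201d', '"')
    return re.findall(r'"(.+?)"', text)

print(extract_quoted('a \u201cquoted\u201d string'))  # ['quoted']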

How do I let Python see what Terminal outputs when I enter a command?

I want to run a Python program on macOS Sierra that checks Terminal for its output after I automatically enter a command. For example, I would write in Terminal:
$ pwd
and then Terminal would output something like:
/Users/username
How would I have Python capture what Terminal outputs and assign it to a variable as a string?
>>>output = (whatever Terminal outputs)
>>>print (output)
"/Users/username"
By the way, the other questions I found do not explain in much detail how one would do this on macOS; that is why this is not a duplicate.
You could pipe the output to a file and read the file.
$ pwd > output.txt
Then read the file and take further actions based on its contents.
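For example, a minimal sketch of reading the redirected output back in (output.txt as above):
# read back the file produced by: pwd > output.txt
with open("output.txt") as f:
    output = f.read().strip()
print(output)  # e.g. /Users/username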
Use the subprocess module; it has some shortcut methods to make things easier and less complicated than using Popen directly.
>>> import subprocess
>>> output = subprocess.check_output("pwd")
>>> print(output)
b'L:\\\r\n'
You can decode this using output.decode("UTF-8") if you like, or you can use the universal_newlines keyword argument to have that done automatically, as well as sorting out newlines.
>>> subprocess.check_output("pwd", universal_newlines=True)
'L:\\\n'
Edit: With @Silvio's sensible suggestion of passing all arguments in a list, you can do the following:
subprocess.check_output(["ls", "-l"])
Or if you have a string sourced from elsewhere, you can call .split(), which will generate a list of substrings split on whitespace.
subprocess.check_output("ls -l /".split())
Note: I'm using Python 3 on Windows with Gnu on Windows installed, so I have \r\n line endings and a pwd command.

How to call command line command (AFNI command)?

I am trying to read a DICOM header tag in a DICOM file.
Now, there are two ways to read this header tag.
1) Using the pydicom package in Python, which apparently does not work well on my installed Python version (Python 3).
2) Or, when I call the AFNI function 'dicom_hinfo' on the command line, I can get the DICOM tag value. The syntax to call the AFNI function in a terminal is as follows:
dicom_hinfo -tag aaaa,bbbb filename.dcm
output:fgre
Now how should I call dicom_hinfo -tag aaaa,bbbb filename.dcm in a Python script?
I guess subprocess might work, but I'm not sure how to use it in this case.
To get output from a subprocess, you could use check_output() function:
#!/usr/bin/env python
from subprocess import check_output
tag = check_output('dicom_hinfo -tag aaaa,bbbb filename.dcm'.split(),
                   universal_newlines=True).strip()
universal_newlines=True is used to get Unicode text on Python 3 (the bytes are decoded using the user locale's character encoding).
check_output() assumes that dicom_hinfo prints to its standard output stream (stdout). Some utilities may print to stderr or the terminal directly instead. The code could be modified to adapt to that.
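For instance, if dicom_hinfo turned out to write to stderr, one possible adaptation (a sketch, not tested against AFNI) is to merge stderr into stdout:
from subprocess import check_output, STDOUT

# merge stderr into stdout so check_output() captures both streams
tag = check_output('dicom_hinfo -tag aaaa,bbbb filename.dcm'.split(),
                   stderr=STDOUT, universal_newlines=True).strip()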
Oh, this was due to a syntax error using pydicom.
I wanted to access the (0019,109c) tag.
The syntax should be:
ds[0x0019, 0x109c].value
not ds[aaaa,bbbb].value
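For completeness, a minimal pydicom sketch of reading that tag (dcmread is the current pydicom entry point; the filename is a placeholder):
import pydicom

# private tags are addressed with hex group/element numbers
ds = pydicom.dcmread("filename.dcm")
print(ds[0x0019, 0x109c].value)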

Piping a cp1252 string from PowerShell to a Python (2.7) script

After a few days of dwelling on Stack Overflow and the Python 2.7 docs, I have come to no conclusion about this.
Basically I'm running a python script on a windows server that must have as input a block of text. This block of text (unfortunately) has to be passed by a pipe. Something like:
PS > [something_that_outputs_text] | python .\my_script.py
So the problem is:
The server uses cp1252 encoding and I really cannot change it due to administrative regulations and whatnot. And when I pipe the text to my Python script and read it, it arrives with ? where characters like \xe1 should be.
What I have done so far:
Tested with UTF-8. Yep, chcp 65001 and $OutputEncoding = [Console]::OutputEncoding "solve it", in that Python gets the text perfectly and I can then decode it to Unicode, etc. But apparently they don't let me do that on the server /sadface.
A little script to test what the hell is happening:
import codecs
import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) > 1:
        for arg in argv[1:]:
            print arg.decode('cp1252')
    sys.stdin = codecs.getreader('cp1252')(sys.stdin)
    text = sys.stdin.read().strip()
    print text
    return 0

if __name__ == "__main__":
    sys.exit(main())
I tried it both with and without the codecs wrapping.
My input & output:
PS > echo "Blá" | python .\testinput.py blé
blé
Bl?
--> So there's no problem with the argument (blé) but the piped text (Blá) is no good :(
I even converted the text string to hex and, yes, it gets flooded with 3f (AKA mr ?), so it's not a problem with the print.
[Also: it's my first question here... feel free to ask any more info about what I did]
EDIT
I don't know if this is relevant or not, but when I do sys.stdin.encoding it yields None
Update: So... I have no problems with cmd. Checked sys.stdin.encoding while running the program on cmd and everything went fine. I think my head just exploded.
How about saving the data into a file and piping it to Python in a CMD session? Invoke PowerShell and Python from CMD, like so:
c:\>powershell -command "c:\generateDataForPython.ps1 -output c:\data.txt"
c:\>type c:\data.txt | python .\myscript.py
Edit
Another idea: convert the data to base64 format in PowerShell and decode it in Python. Base64 is simple in PowerShell, and I guess it isn't hard in Python either. Like so,
# Convert some accent chars to base64
$s = [Text.Encoding]::UTF8.GetBytes("éêèë")
[System.Convert]::ToBase64String($s)
# Output:
w6nDqsOow6s=
# Decode:
$d = [System.Convert]::FromBase64String("w6nDqsOow6s=")
[Text.Encoding]::UTF8.GetString($d)
# Output
éêèë
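On the Python 2.7 side, the matching decode step could look like this sketch (script name hypothetical; base64 is plain ASCII, so the cp1252 pipe cannot mangle it):
import base64
import sys

# read the base64 text from the pipe and decode it back to Unicode
b64 = sys.stdin.read().strip()
text = base64.b64decode(b64).decode('utf-8')
print text.encode('cp1252')  # re-encode for the cp1252 console
Usage:
PS > [System.Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes("éêèë")) | python .\decode_b64.py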

Python subprocess locale settings

When executing the OpenNLP POSTagger with subprocess.call in Python, the result comes out wrong, but when I type the same command into my terminal, the result is correct.
After some testing, I think this is because OpenNLP fails to load the model file correctly, but what is the underlying problem? The model is trained on Chinese and I use Python 2.7.
OpenNLP runs without any warnings or errors, yet it tags the input sentence completely wrong, while it gives the correct tags in the terminal. I guess it's an encoding problem, but I'm not sure.
Here is the code. It's nothing special and contains only ASCII chars.
If I print this command and copy it into the terminal, the result is correct.
Now I know it's a locale/encoding problem (I debugged the script with strace), but setting the Python locale to en_US.UTF-8 or zh_CN.UTF-8 is of no use. My shell locale setting is zh_CN.UTF-8.
from subprocess import call

opennlp_path = './opennlp/bin/opennlp'
pos_model = 'train.pos.model'
pos_predict_cmd = [opennlp_path, 'POSTagger', pos_model]
subproc = call(pos_predict_cmd)
First, have a look at http://docs.python.org/library/subprocess.html#using-the-subprocess-module, read it once or twice, then try call(' '.join(pos_predict_cmd), shell=True) and see if that works (with shell=True, pass the command as a single string rather than a list).
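Since the symptom looks locale-dependent, another option worth trying (a sketch, assuming your terminal's locale is the one that works) is to pass an explicit environment to the child process:
import os
from subprocess import call

# copy the parent environment and pin the locale the working shell uses
env = os.environ.copy()
env['LANG'] = 'zh_CN.UTF-8'  # assumption: matches your shell locale
call(pos_predict_cmd, env=env)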
