same string gives different result in Python - python

So I'm using approach in this post
to extract a double quoted string from a string. If the input string comes from terminal argument, it works fine. But if the input string comes from a txt file like the following, it gives nontype error. I tried to get the hash code for two strings(one from file and one from terminal) with identical txt content, and turns out they are different. I'm curious if anyone knows how to solve this?(in Python 3.x)
That said, I have set the default encoding to "utf-8" in my code.
python filename.py < input.txt

If you are using command python, the command recognize it to python 2.x.
If you want python 3.x, just change the command to python3
like this
python3 filename.py < input.txt

Two things, if you want to ingest a txt file into a python script, you need to specify it. Add these two lines
import sys
text = str(sys.argv[1])
this mean text would be your 'input.txt'.
Second, if your script has only a function, it would not know what you want to do with the function, you have to either, tell the script explicity to execute the function through the entry main
import re
import sys
def doit(text):
matches=re.findall(r'\"(.+?)\"',text)
# matches is now ['String 1', 'String 2', 'String3']
return ",".join(matches)
if __name__ == '__main__':
text_file = str(sys.argv[1])
text = open(text_file).read()
print(doit(text))
Alternately, you can just execute line by line without wrapping the re in a function, since it is only one line.

I just figure it out, the bug doesn't come from my code. I had the "smart quotes" enabled on my Mac so whenever it reads a quote, it's identified as a special character. Disable this under keyboard setting would do the trick.
LOL what a "bug".

Related

Unexpected double quotes while appending file items to subprocess.run

I am trying to read from a file which has contents like this:
#\5\5\5
...
#\5\5\10
This file content is then fed into subprocess module of python like this:
for lines in file.readlines():
print(lines)
cmd = ls
p = subprocess.run([cmd, lines])
The output turns into something like this:
CompletedProcess(args=['ls', "'#5\\5\\5'\n"], returncode=1)
I don't understand why the contents of the file is appended with a double quote and another backward slash is getting appended.
The real problem here isn't Python or the subprocess module. The problem the use of subprocess to invoke shell commands, and then trying to parse the results. In this case, it looks like the command is ls, and the plan appears to be to read some filesystem paths from a text file (each path on a separate line), and list the files at that location on the filesystem.
Using subprocess to invoke ls is really, really, really not the way to accomplish that in Python. This is basically an attempt to use Python like a shell script (this use of ls would still be problematic, but that's a different discussion).
If a shell script is the right tool for the job, then write a shell script. If you want to use Python, then use one of the API's that it provides for interacting with the OS and the filesystem. There is no need to bring in external programs to achieve this.
import os
with open("list_of_paths.txt", "r") as fin:
for line in fin.readlines():
w = os.listdir(line.strip())
print(w)
Note the use of .strip(), this is a string method that will remove invisible characters like spaces and newlines from the ends of the input.
The listdir method provided by the os module will return a list of the files in a directory. Other options are os.scandir, os.walk, and the pathlib module.
But please do not use subprocess. 95% of the time, when someone thinks "should I use Python's subprocess module for this?" the ansewr is "NO".
It is because \ with a relevant character or digit becomes something else other than the string. For example, \n is not just \ and n but it means next line. If you really want a \n, then you would add another backslash to it (\\n). Likewise \5 means something else. here is what I found when i ran \5:
and hence the \\ being added, if I am not wrong

Passing to SOAP arguments from the command line

I have a python script that successfully sends SOAP to insert a record into a system. The values are static in the test. I need to make the value dynamic/argument that is passed through the command line or other stored value.
execute: python myscript.py
<d4p1:Address>MainStreet</d4p1:Address> ....this works to add hard coded "MainStreet"
execute: python myscript.py MainStreet
...this is now trying to pass the argument MainStreet
<d4p1:Address>sys.argv[1]</d4p1:Address> ....this does not work
It saves the literal text address as "sys.argv[1]" ... I have imported sys ..I have tried %, {}, etc from web searches, what syntax am I missing??
You need to read a little about how to create strings in Python, below is how it could look like in your code. Sorry it's hard to say more without seeing your actual code. And you actually shouldn't create XMLs like that, you should use for instance xml module from standard library.
test = "<d4p1:Address>" + sys.argv[1] + "</d4p1:Address>"

how to "source" file into python script

I have a text file /etc/default/foo which contains one line:
FOO="/path/to/foo"
In my python script, I need to reference the variable FOO.
What is the simplest way to "source" the file /etc/default/foo into my python script, same as I would do in bash?
. /etc/default/foo
Same answer as #jil however, that answer is specific to some historical version of Python.
In modern Python (3.x):
exec(open('filename').read())
replaces execfile('filename') from 2.x
You could use execfile:
execfile("/etc/default/foo")
But please be aware that this will evaluate the contents of the file as is into your program source. It is potential security hazard unless you can fully trust the source.
It also means that the file needs to be valid python syntax (your given example file is).
Keep in mind that if you have a "text" file with this content that has a .py as the file extension, you can always do:
import mytextfile
print(mytestfile.FOO)
Of course, this assumes that the text file is syntactically correct as far as Python is concerned. On a project I worked on we did something similar to this. Turned some text files into Python files. Wacky but maybe worth consideration.
Just to give a different approach, note that if your original file is setup as
export FOO=/path/to/foo
You can do source /etc/default/foo; python myprogram.py (or . /etc/default/foo; python myprogram.py) and within myprogram.py all the values that were exported in the sourced' file are visible in os.environ, e.g
import os
os.environ["FOO"]
If you know for certain that it only contains VAR="QUOTED STRING" style variables, like this:
FOO="some value"
Then you can just do this:
>>> with open('foo.sysconfig') as fd:
... exec(fd.read())
Which gets you:
>>> FOO
'some value'
(This is effectively the same thing as the execfile() solution
suggested in the other answer.)
This method has substantial security implications; if instead of FOO="some value" your file contained:
os.system("rm -rf /")
Then you would be In Trouble.
Alternatively, you can do this:
>>> with open('foo.sysconfig') as fd:
... settings = {var: shlex.split(value) for var, value in [line.split('=', 1) for line in fd]}
Which gets you a dictionary settings that has:
>>> settings
{'FOO': ['some value']}
That settings = {...} line is using a dictionary comprehension. You could accomplish the same thing in a few more lines with a for loop and so forth.
And of course if the file contains shell-style variable expansion like ${somevar:-value_if_not_set} then this isn't going to work (unless you write your very own shell style variable parser).
There are a couple ways to do this sort of thing.
You can indeed import the file as a module, as long as the data it contains corresponds to python's syntax. But either the file in question is a .py in the same directory as your script, either you're to use imp (or importlib, depending on your version) like here.
Another solution (that has my preference) can be to use a data format that any python library can parse (JSON comes to my mind as an example).
/etc/default/foo :
{"FOO":"path/to/foo"}
And in your python code :
import json
with open('/etc/default/foo') as file:
data = json.load(file)
FOO = data["FOO"]
## ...
file.close()
This way, you don't risk to execute some uncertain code...
You have the choice, depending on what you prefer. If your data file is auto-generated by some script, it might be easier to keep a simple syntax like FOO="path/to/foo" and use imp.
Hope that it helps !
The Solution
Here is my approach: parse the bash file myself and process only variable assignment lines such as:
FOO="/path/to/foo"
Here is the code:
import shlex
def parse_shell_var(line):
"""
Parse such lines as:
FOO="My variable foo"
:return: a tuple of var name and var value, such as
('FOO', 'My variable foo')
"""
return shlex.split(line, posix=True)[0].split('=', 1)
if __name__ == '__main__':
with open('shell_vars.sh') as f:
shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
print(shell_vars)
How It Works
Take a look at this snippet:
shell_vars = dict(parse_shell_var(line) for line in f if '=' in line)
This line iterates through the lines in the shell script, only process those lines that has the equal sign (not a fool-proof way to detect variable assignment, but the simplest). Next, run those lines into the function parse_shell_var which uses shlex.split to correctly handle the quotes (or the lack thereof). Finally, the pieces are assembled into a dictionary. The output of this script is:
{'MOO': '/dont/have/a/cow', 'FOO': 'my variable foo', 'BAR': 'My variable bar'}
Here is the contents of shell_vars.sh:
FOO='my variable foo'
BAR="My variable bar"
MOO=/dont/have/a/cow
echo $FOO
Discussion
This approach has a couple of advantages:
It does not execute the shell (either in bash or in Python), which avoids any side-effect
Consequently, it is safe to use, even if the origin of the shell script is unknown
It correctly handles values with or without quotes
This approach is not perfect, it has a few limitations:
The method of detecting variable assignment (by looking for the presence of the equal sign) is primitive and not accurate. There are ways to better detect these lines but that is the topic for another day
It does not correctly parse values which are built upon other variables or commands. That means, it will fail for lines such as:
FOO=$BAR
FOO=$(pwd)
Based off the answer with exec(.read()), value = eval(.read()), it will only return the value. E.g.
1 + 1: 2
"Hello Word": "Hello World"
float(2) + 1: 3.0

Why is python converting Kurdich characters into UTF-8 literals?

I was trying to take the content of a text file and map it into a json file, but I noticed that python automatically turned the kurdish(sorani) text into UTF-8 literals. Can someone explain why python does this and how can I prevent the conversion?
You can test it with the code below:
def readText():
# test.txt contains kurdish sorani characters (an article)
# Sorani example: ڕۆژتان باش بەڕێزان. من ناوم ڕەنجە.
with open('test.txt', 'r') as context:
data = context.readlines()
return data
print(readText())
I'm running python 2.x on Ubuntu 14.x. Python2.x does this! Python 3.x does not convert it and works just fine.
You are seeing the repr output as you call readlines which returns a list and lists show the repr representation of your data, once you actually print the strings themselves you will see the actual str output, you are also using python2:
In [11]: out = readText()
In [12]: print out
['\xda\x95\xdb\x86\xda\x98\xd8\xaa\xd8\xa7\xd9\x86 \xd8\xa8\xd8\xa7\xd8\xb4 \xd8\xa8\xdb\x95\xda\x95\xdb\x8e\xd8\xb2\xd8\xa7\xd9\x86. \xd9\x85\xd9\x86 \xd9\x86\xd8\xa7\xd9\x88\xd9\x85 \xda\x95\xdb\x95\xd9\x86\xd8\xac\xdb\x95. ']
In [13]: print out[0]
ڕۆژتان باش بەڕێزان. من ناوم ڕەنجە.
I'm going to take a stab here and guess that you are reading the output in a terminal of some sort, and when Python writes to the terminal it's trying to display in ASCII.
If you set your PYTHONIOENCODING environment variable to UTF-8 this can sometimes solve the issue - it depends on other variables as well.
So, if you're on a UNIX-like system, try this in your terminal: export PYTHONIOENCODING=UTF-8
Or, for Windows, set PYTHONIOENCODING=UTF-8.
Then, try running your script again and see if you get the correct characters printed.
More information can be found here: How to print UTF-8 Encoded Text to the console in Python3

Syntax for input redirection in IDLE

I need to enter the contents of a text (.txt) file as input for a Python (.py) file. Assuming the name of the text file is TextFile and the name of the Python file PythonFile, then the code should be as follows:
python PythonFile.py < TextFile.txt
Yet, when I try to do this in IDLE and type in
import PythonFile < TextFile,
IDLE gives me an invalid syntax message, pointing to the < sign. I tried all sorts of variations on this theme (i.e.,using or not using the file name extensions), but still got the same invalid-syntax message. How is the syntax different for input redirection in IDLE?
If it works in the command line, then why do you want to do this in IDLE? There are ways to achieve a similar result using, for example, subprocess, but a better way would be to refactor PythonFile.py so that you can call a function from it, e.g.:
>>> import PythonFile
>>> PythonFile.run_with_input('TextFile.txt')
If you post the contents of PythonFile.py, we might be able to help you do this.

Categories