How to parse XML from a subprocess with untangle in Python

I'm trying to parse some xml from a subprocess with untangle in python.
out = subprocess.run(["./my_executable",options], stdout=PIPE, stderr=PIPE)
root = untangle.parse(out.stdout)
which gives me a TypeError:
Traceback (most recent call last):
File "./script.py", line 64, in <module>
root = untangle.parse(out.stdout)
File "/home/user/.local/lib/python3.6/site-packages/untangle.py", line 182, in parse
parser.parse(StringIO(filename))
TypeError: initial_value must be str or None, not bytes
When I print out.stdout, it does contain the expected XML tags, but in the following format:
b'<root>\n <c1>value1</c1>\n <c2>value2</c2>\n</root>\n'
I tried removing the \n with re.sub() but then I get another error: TypeError: cannot use a string pattern on a bytes-like object.
I thought this might be an encoding problem and that the documentation would help me out, but it seems quite limited. How can I make untangle parse a bytes-like object?

Decode the bytes-like object to a string first.
I'm using check_output here to raise an exception if my_executable ends with a nonzero return code.
out = subprocess.check_output(["./my_executable",options])
root = untangle.parse(out.decode("utf-8"))
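The bytes-to-str step can be tried without untangle or the real executable. This is only a minimal sketch: the child Python process is a hypothetical stand-in for ./my_executable, and StringIO is used because the traceback shows untangle wrapping its input in StringIO.

```python
import subprocess
import sys
from io import StringIO

# Hypothetical stand-in for ./my_executable: a child process that prints XML.
cmd = [sys.executable, "-c", "print('<root><c1>value1</c1></root>')"]

raw = subprocess.check_output(cmd)   # bytes, as captured from the pipe
xml_text = raw.decode("utf-8")       # str, safe to hand to a text-based parser

# StringIO rejects bytes, which is the TypeError seen in the traceback.
try:
    StringIO(raw)
except TypeError as exc:
    error = str(exc)

# The decoded text is accepted.
buffer = StringIO(xml_text)
```

The same decode is needed before calling re.sub() with a str pattern, since a str pattern cannot be applied to bytes.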

Related

How to get data from dataLayer.push using Python web scraping

My code is:
# init scrapy selector
response = Selector(text=content)
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
print(json_data)
# debug data extraction logic
HummartScraper.parse_product(HummartScraper, '')
The error output is:
Traceback (most recent call last):
File "hummart2.py", line 86, in parse_product
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
TypeError: findall() missing 1 required positional argument: 'string'
Why am I getting this error?
For a single dataLayer:
data_layer = response.css('script::text').re_first(r'dataLayer\.push\(([^)]+)')
data = json.loads(data_layer)
You can use response.css(...).re() to get a list of matches.
But this gives me the following error:
File "hummart2.py", line 88, in parse_product
data = json.loads(data_layer_raw)[1]
File "/home/danish-khan/miniconda3/lib/python3.7/json/__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
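The NoneType error suggests that the pattern matched nothing, so there was no string to hand to json.loads. Below is a minimal sketch of guarding against that case. It uses the standard re module rather than Scrapy's selector API, and the function name and sample strings are made up for illustration.

```python
import json
import re

# Same pattern as in the answer; note that it stops at the first ")",
# so a payload that itself contains ")" would be cut short.
PATTERN = re.compile(r'dataLayer\.push\(([^)]+)')

def extract_data_layer(script_text):
    """Return the parsed dataLayer payload, or None if nothing matched."""
    match = PATTERN.search(script_text)
    if match is None:
        return None  # avoid calling json.loads(None)
    return json.loads(match.group(1))

# Hypothetical script contents for illustration only.
sample = 'dataLayer.push({"event": "view", "price": 10});'
found = extract_data_layer(sample)
missing = extract_data_layer('console.log("nothing to see");')
```

With Scrapy, the equivalent check would be to test the result of re_first() for None before passing it to json.loads.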

Getting TypeError when running script with os.path.join()

I have the following lines in a script I am running:
api_xml = os.path.join(opts.out, os.path.basename(
opts.api_raw).replace('.raw', '.xml'))
Running with Python 3.7, I get the error:
Traceback (most recent call last):
File "generate_code.py", line 32, in <module>
opts.api_raw).replace('.raw', '.xml'))
File "/usr/lib/python3.7/posixpath.py", line 146, in basename
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
It looks like a simple join followed by a replace, so I'm not sure why it is failing.
TypeError: expected str, bytes or os.PathLike object, not NoneType
means that you passed None to a function that expects a path. The traceback points at os.path.basename, so at least opts.api_raw is None.
To confirm, try adding these lines before you build api_xml:
assert opts.out is not None
assert opts.api_raw is not None
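If opts comes from argparse (an assumption, since the question does not show how it is built), an option that is not supplied on the command line defaults to None. A minimal sketch with hypothetical flag names --out and --api-raw, marked as required so a missing value fails at parse time instead of inside os.path.basename:

```python
import argparse
import os

def build_api_xml(argv):
    # Hypothetical option names; required=True makes argparse reject a
    # missing option up front rather than leaving the attribute as None.
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", required=True)
    parser.add_argument("--api-raw", required=True)
    opts = parser.parse_args(argv)
    xml_name = os.path.basename(opts.api_raw).replace(".raw", ".xml")
    return os.path.join(opts.out, xml_name)

api_xml = build_api_xml(["--out", "build", "--api-raw", "specs/api.raw"])

# Passing None reproduces the TypeError from the traceback.
try:
    os.path.basename(None)
except TypeError as exc:
    none_error = str(exc)
```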

How to search for text string in executable output with python?

I'm trying to create a Python script to auto-update a program for me. When I run program.exe --help, it prints a long output that includes a string of the form "Version: X.X.X". How can I write a script that runs the command and extracts the version number from the executable's output?
I should have mentioned that I tried the following:
import re
import subprocess
regex = r'Version: ([\d\.]+)'
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
print((match.group(0)))
and got the error:
Traceback (most recent call last):
File "run.py", line 6, in <module>
match = re.search(regex, subprocess.run(["program.exe", "--help"]))
File "C:\Python37\lib\re.py", line 183, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
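subprocess.run returns a CompletedProcess object rather than the program's output, which is why re.search rejects it. Something like this should work: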
re.search(r'Version: ([\d\.]+)', subprocess.check_output(['program.exe', '--help']).decode()).group(1)
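The one-liner can be split into steps so that a missing version string does not raise AttributeError on .group(). This is a minimal sketch; the child Python process is a hypothetical stand-in for program.exe --help.

```python
import re
import subprocess
import sys

# Hypothetical stand-in for ["program.exe", "--help"].
cmd = [sys.executable, "-c", "print('usage: program\\nVersion: 1.2.3\\n')"]

# check_output returns the captured stdout as bytes; decode it to str.
output = subprocess.check_output(cmd).decode()

# re.search returns None when nothing matches, so check before using it.
match = re.search(r"Version: ([\d.]+)", output)
version = match.group(1) if match else None
```

group(1) returns only the captured number, whereas group(0) as used in the question would return the whole match including the "Version: " prefix.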

Python 3 [TypeError: 'str' object cannot be interpreted as an integer] when working with sockets

I run the script with python3 in the terminal but when I reach a certain point in it, I get the following error:
Traceback (most recent call last):
File "client.py", line 50, in <module>
client()
File "client.py", line 45, in client
s.send(bytes((msg, 'utf-8')))
TypeError: 'str' object cannot be interpreted as an integer
This is the code it refers to.
else :
# user entered a message
msg = sys.stdin.readline()
s.send(bytes((msg, 'utf-8')))
sys.stdout.write(bytes('[Me] '))
sys.stdout.flush()
I read the official documentation for bytes() and another source
https://docs.python.org/3.1/library/functions.html#bytes
http://www.pythoncentral.io/encoding-and-decoding-strings-in-python-3-x/
but I am no closer to understanding how to fix this. I realise that my msg is a string and I need an integer, but I am confused about how to convert it. Can you please help me, or direct me to a source that will help me?
Edit 1: I changed the line
s.send(bytes((msg, 'utf-8')))
to
s.send(bytes(msg, 'utf-8'))
but now I get the following error:
Traceback (most recent call last):
File "client.py", line 50, in <module>
client()
File "client.py", line 46, in client
sys.stdout.write(bytes('[Me] '))
TypeError: string argument without an encoding
Edit 2: Following @falsetru's updated answer, using a bytes literal gives me:
TypeError: must be str, not bytes
Change the following line:
s.send(bytes((msg, 'utf-8')))
to:
s.send(bytes(msg, 'utf-8'))
In other words, pass a string and an encoding name to bytes instead of passing a single tuple.
UPDATE according to the question change:
You need to pass a string to sys.stdout.write. Simply pass a string literal:
sys.stdout.write('[Me] ')
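The rule of thumb is bytes for the socket and str for sys.stdout. Below is a minimal sketch of both directions, using socket.socketpair() as a stand-in for the real client connection (an assumption made purely for illustration):

```python
import socket

msg = "hello\n"

# str -> bytes: msg.encode("utf-8") is equivalent to bytes(msg, "utf-8").
payload = msg.encode("utf-8")

# bytes() given a single tuple treats it as an iterable of integers,
# which is why the original call failed.
try:
    bytes((msg, "utf-8"))
except TypeError as exc:
    tuple_error = str(exc)

# Stand-in for the client socket: a connected pair within this process.
left, right = socket.socketpair()
left.sendall(payload)        # sockets carry bytes
received = right.recv(1024)
left.close()
right.close()

# bytes -> str before writing to sys.stdout, which expects str.
text = received.decode("utf-8")
```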

Passing a string but getting byte attribute error to urllib.request.read

I am trying to read an XML file from the Yahoo finance API. So far, I've tried the following:
from xml.dom.minidom import parse
#Start Get Employees
xml = urllib.request.urlopen('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.stocks%20where%20symbol%3D%22wfc%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys')
dom = parse(xml.read())
numemployees = dom.getElementsByTagName('FullTimeEmployees')
numemployees = name[0].firstChild.nodeValue
#End Get Employees
However, this raises an exception:
AttributeError: 'bytes' object has no attribute 'read'
I think this is because if it doesn't recognize the string, it assumes I'm passing a byte pattern. However, I am passing a string so I don't know what the problem is here.
Full Stack Trace:
Traceback (most recent call last):
File "C:\Python34\lib\tkinter\__init__.py", line 1487, in __call__
return self.func(*args)
File "C:\Users\kylec\Desktop\dm\Mail Server Finder\mailserverfinder.py", line 25, in getServers
dom = parse(xml.read())
File "C:\Python34\lib\xml\dom\minidom.py", line 1960, in parse
return expatbuilder.parse(file)
File "C:\Python34\lib\xml\dom\expatbuilder.py", line 913, in parse
result = builder.parseFile(file)
File "C:\Python34\lib\xml\dom\expatbuilder.py", line 204, in parseFile
buffer = file.read(16*1024)
AttributeError: 'bytes' object has no attribute 'read'
xml.dom.minidom.parse expects a file name or a file-like object, not the bytes or str contents themselves, as stated in its documentation:
xml.dom.minidom.parse(filename_or_file[, parser[, bufsize]])
Return a Document from the given input. filename_or_file may be either
a file name, or a file-like object.
So you just need to do this:
dom = parse(xml)
This works because the http.client.HTTPResponse object returned by urlopen is file-like.
Kyle, sorry, but your example isn't clear enough. I think this is what you meant to do:
import urllib.request
from xml.dom.minidom import parseString
# Read the whole response body (bytes) and parse it directly
employees = urllib.request.urlopen('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.stocks%20where%20symbol%3D%22wfc%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys').read()
dom = parseString(employees)
numemployees = dom.getElementsByTagName('FullTimeEmployees')
numemployees = numemployees[0].firstChild.nodeValue
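The difference between the two answers can be checked offline. In this minimal sketch, BytesIO stands in for the file-like HTTP response, and the XML body is a made-up sample rather than real API output.

```python
from io import BytesIO
from xml.dom.minidom import parse, parseString

# Hypothetical response body for illustration only.
body = b"<stock><FullTimeEmployees>1000</FullTimeEmployees></stock>"

# parse() takes a file name or a file-like object with a read() method.
dom_from_file = parse(BytesIO(body))

# parseString() takes the document contents themselves.
dom_from_bytes = parseString(body)

def full_time_employees(dom):
    nodes = dom.getElementsByTagName("FullTimeEmployees")
    return nodes[0].firstChild.nodeValue

from_file = full_time_employees(dom_from_file)
from_bytes = full_time_employees(dom_from_bytes)
```

Passing body straight to parse() is the mistake in the question: parse() treats a non-str argument as a file object and calls read() on it.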
