How to get data from dataLayer.push using Python web scraping - python

My code is:
# init scrapy selector
response = Selector(text=content)
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
print(json_data)
# debug data extraction logic
HummartScraper.parse_product(HummartScraper, '')
The output error is:
Traceback (most recent call last):
File "hummart2.py", line 86, in parse_product
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
TypeError: findall() missing 1 required positional argument: 'string'
Why am I getting this error?

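The error occurs because re.findall() needs two arguments, the pattern and the string to search, but your call passes only the pattern. Scrapy selectors have a built-in regex helper, so you don't need re here at all. For a single dataLayer: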
data_layer = response.css('script::text').re_first(r'dataLayer\.push\(([^)]+)')
data = json.loads(data_layer)
You can use response.css(...).re() to get a list of matches.

But this gives me this error:
File "hummart2.py", line 88, in parse_product
data = json.loads(data_layer_raw)[1]
File "/home/danish-khan/miniconda3/lib/python3.7/json/__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
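re_first() returns None when the pattern matches nothing, and that None is what json.loads() rejects. Below is a minimal standard-library sketch of the whole flow. It assumes the dataLayer payload is a strict-JSON object literal inside the script text; the HTML string, DATALAYER_RE and extract_datalayer are hypothetical. It checks for an empty result before indexing. It also avoids [^)]+, which stops at the first ")" and truncates any payload containing parentheses.

```python
import json
import re

# Hypothetical page snippet; on a real page this is the text of the <script> tags.
html = '''
<script>
window.dataLayer = window.dataLayer || [];
dataLayer.push({"event": "productView", "product": {"id": "123", "price": 9.99}});
</script>
'''

# Allow optional whitespace and capture the {...} object non-greedily
# up to the closing ");" instead of stopping at the first ")".
DATALAYER_RE = re.compile(r'dataLayer\.push\(\s*(\{.*?\})\s*\);', re.DOTALL)


def extract_datalayer(text):
    """Return a list of decoded dataLayer.push() payloads found in text."""
    payloads = []
    # re.findall takes both the pattern (compiled here) and the string to search.
    for raw in DATALAYER_RE.findall(text):
        try:
            payloads.append(json.loads(raw))
        except json.JSONDecodeError:
            # dataLayer objects are JavaScript, not always strict JSON
            # (e.g. single quotes); skip anything json cannot parse.
            continue
    return payloads


first = extract_datalayer(html)
if first:  # guard against "no match", which produced the NoneType error
    print(first[0]["product"]["id"])  # 123
```

With Scrapy you would feed the same function the joined text of response.css('script::text').getall(). If it still returns an empty list, the data is probably not in the raw HTML (for example, it is injected by another script), or the page uses a different spacing or quoting than the pattern expects.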

Related

How to parse xml from a subprocess with untangle in python

I'm trying to parse some xml from a subprocess with untangle in python.
out = subprocess.run(["./my_executable",options], stdout=PIPE, stderr=PIPE)
root = untangle.parse(out.stdout)
which gives me a TypeError:
Traceback (most recent call last):
File "./script.py", line 64, in <module>
root = untangle.parse(out.stdout)
File "/home/user/.local/lib/python3.6/site-packages/untangle.py", line 182, in parse
parser.parse(StringIO(filename))
TypeError: initial_value must be str or None, not bytes
When I print out.stdout, it does give the XML tags as expected, but in the following format:
b'<root>\n <c1>value1</c1>\n <c2>value2</c2>\n</root>\n'
I tried removing the \n with re.sub() but then I get another error: TypeError: cannot use a string pattern on a bytes-like object.
I thought this might be an encoding problem and that the documentation would help me out, but it seems quite limited. How can I make untangle parse a bytes-like object?
Decode the bytes-like object to a string first.
I'm using check_output here to raise an exception if my_executable ends with a nonzero return code.
out = subprocess.check_output(["./my_executable",options])
root = untangle.parse(out.decode("utf-8"))
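The same decoding step works with subprocess.run as in the question. Below is a minimal sketch that runs with the standard library only. The child command is a hypothetical stand-in for ./my_executable, and xml.etree.ElementTree is swapped in for untangle so no third-party package is needed. Passing text=True (Python 3.7+; universal_newlines=True on 3.6) would decode for you using the locale encoding instead of an explicit "utf-8".

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ["./my_executable", options]: a child Python
# process that prints the same XML the question shows.
cmd = [sys.executable, "-c",
       "print('<root>\\n <c1>value1</c1>\\n <c2>value2</c2>\\n</root>')"]

# check=True raises CalledProcessError on a nonzero exit code, like check_output.
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        check=True)

# result.stdout is bytes; decode it to str before handing it to a parser
# that expects text (untangle.parse(xml_text) in the question).
xml_text = result.stdout.decode("utf-8")

# ElementTree stands in for untangle so the sketch needs only the stdlib.
root = ET.fromstring(xml_text)
print(root.find("c1").text)  # value1
```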

PyCryptodome when verifying - AttributeError: '_io.BufferedReader' object has no attribute 'n'

I'm trying to verify a signed message, but I keep getting the error:
AttributeError: '_io.BufferedReader' object has no attribute 'n'
I can't really figure out what causes this error
First, I sign the AES-encrypted data and base64-encode it.
Then I json.dumps the result and print it; when running the script, I pipe the output to a file:
def get_signature(message):
h = SHA256.new(message)
signature = pkcs1_15.new(priv_keyObj).sign(h)
return signature
ENCODING = 'utf-8'
print(json.dumps({
'EncryptedString': base64.standard_b64encode(encrypted_data).decode(ENCODING),
'SignedDataString': base64.standard_b64encode(get_signature(encrypted_data)).decode(ENCODING),
}))
I start by reading the file as JSON; then, to verify, I read the base64-encoded message and b64-decode it:
def verify_signature(message, signature):
h = SHA256.new(message)
try:
pkcs1_15.new(pub_key_new).verify(h, signature)
print("The signature is valid.")
except (ValueError, TypeError):
print("The signature is not valid.")
verify_signature(base64.standard_b64decode(data['EncryptedString']), base64.standard_b64decode(data['SignedDataString']))
I have tried to make this question minimal and understandable - so please tell me if I need to provide more information.
The full traceback is:
Traceback (most recent call last):
File "C:/PATH/Scipts/crypto/decrypt.py", line 9, in <module>
print(default_decrypt(read_json_file(filename)).decode("utf-8"))
File "C:\PATH\Scipts\crypto\crypt_helper_new.py", line 127, in default_decrypt
verify_signature(base64.standard_b64decode(data['EncryptedString']),
base64.standard_b64decode(data['SignedDataString']))
encoded msg: <class 'str'>
File "C:\PATH\Scipts\crypto\crypt_helper_new.py", line 65, in verify_signature
pkcs1_15.new(pub_key_new).verify(h, signature)
message: b'S\xacU\x14\xb2E\xec\x08\xc3\x83\x18\x8ey\x98\x069'
File "C:\PATH\AppData\Local\Programs\Python\Python36\lib\site-packages\Crypto\Signature\pkcs1_15.py", line 106, in verify
modBits = Crypto.Util.number.size(self._key.n)
AttributeError: '_io.BufferedReader' object has no attribute 'n'
You can't pass a buffer directly to that function. You should read the bytes from the file to create a key object:
from Crypto.PublicKey import RSA
pub_key_new = RSA.import_key(open('foo.pub', 'rb').read())
The type of self._key (i.e. pub_key_new) should be:
<class 'Crypto.PublicKey.RSA.RsaKey'>

Python Error when converting String to Binary

I have a Python script obtained from a project, which I am trying to debug; however, I am unable to resolve one error. Per the author's description of the project, everything works fine.
The script takes a parameter called "ascii" which is of type str as shown below:
parser.add_argument('--ascii', type=str,
help='ASCII Data type: ASCII characters')
As I understand it, the following code processes the input string one character at a time. Each character is sent to a function, iter_bin(), which takes the ASCII value of the character and converts it to binary, and the output is appended to a list.
ASCIIDATA = args.ascii
dataArray = []
for line in ASCIIDATA:
for entry in line:
# Make sure everything is a number, convert if not
dataArray.append(''.join(s for s in iter_bin(entry)))
def iter_bin(s):
sb = s.encode('ascii')
return (format(b, '07b') for b in sb)
When I run this code, I get the following error:
Traceback (most recent call last):
File "check.py", line 107, in <module>
main()
File "check.py", line 70, in main
dataArray.append(''.join(s for s in iter_bin(entry)))
File "check.py", line 70, in <genexpr>
dataArray.append(''.join(s for s in iter_bin(entry)))
File "check.py", line 82, in <genexpr>
return (format(b, '07b') for b in sb)
ValueError: Unknown format code 'b' for object of type 'str'
How can I resolve this error?
Thanks.
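The traceback suggests the script is running under Python 2. In Python 3, iterating over the bytes returned by s.encode('ascii') yields integers, which format(b, '07b') accepts. In Python 2, encode() returns a str whose items are one-character strings, and that produces exactly this ValueError. Running the script with python3 should avoid the error. Below is a minimal version-independent sketch, assuming the rest of the script stays the same. to_binary_list is a hypothetical wrapper around the question's loop.

```python
def iter_bin(s):
    """Yield the 7-bit binary string of each byte of the ASCII-encoded s."""
    # encode('ascii') still rejects non-ASCII input, as in the original.
    # Wrapping the result in bytearray() makes iteration yield ints on both
    # Python 2 (where encode returns str) and Python 3 (where it returns bytes).
    return (format(b, '07b') for b in bytearray(s.encode('ascii')))


def to_binary_list(ascii_data):
    """Hypothetical helper mirroring the loop in the question."""
    data_array = []
    for entry in ascii_data:
        data_array.append(''.join(iter_bin(entry)))
    return data_array


print(to_binary_list('Hi'))  # ['1001000', '1101001']
```

An equivalent alternative is format(ord(ch), '07b') for each character, since ord() returns an int on both versions.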

Python 3 [TypeError: 'str' object cannot be interpreted as an integer] when working with sockets

I run the script with python3 in the terminal, but when I reach a certain point in it, I get the following error:
Traceback (most recent call last):
File "client.py", line 50, in <module>
client()
File "client.py", line 45, in client
s.send(bytes((msg, 'utf-8')))
TypeError: 'str' object cannot be interpreted as an integer
This is the code it refers to.
else :
# user entered a message
msg = sys.stdin.readline()
s.send(bytes((msg, 'utf-8')))
sys.stdout.write(bytes('[Me] '))
sys.stdout.flush()
I read the official documentation for bytes() and another source:
https://docs.python.org/3.1/library/functions.html#bytes
http://www.pythoncentral.io/encoding-and-decoding-strings-in-python-3-x/
but I am no closer to understanding how to fix this. I realise that my msg is a string and I need an integer, but I am confused about how to convert it. Can you please help me, or direct me to a source that will help me?
Edit 1: I changed the line
s.send(bytes((msg, 'utf-8')))
to
s.send(bytes(msg, 'utf-8'))
but now I get the following error:
Traceback (most recent call last):
File "client.py", line 50, in <module>
client()
File "client.py", line 46, in client
sys.stdout.write(bytes('[Me] '))
TypeError: string argument without an encoding
Edit 2: Following @falsetru's updated answer,
using a bytes literal gives me:
TypeError: must be str, not bytes
Change the following line:
s.send(bytes((msg, 'utf-8')))
as:
s.send(bytes(msg, 'utf-8'))
In other words, pass a string and an encoding name to bytes instead of passing a tuple.
UPDATE according to the question change:
You need to pass a string to sys.stdout.write. Simply pass a string literal:
sys.stdout.write('[Me] ')
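Putting both fixes together: sockets send bytes, and sys.stdout in Python 3 is a text stream that takes str. Below is a minimal sketch. A connected socket.socketpair() stands in for the real server connection, and send_message is a hypothetical helper, not part of the original client.

```python
import socket
import sys


def send_message(sock, msg):
    """Send msg over sock and echo a prompt, as in the question's client loop."""
    # Sockets need bytes: encode the str first.
    # msg.encode('utf-8') is equivalent to bytes(msg, 'utf-8').
    sock.sendall(msg.encode('utf-8'))
    # sys.stdout is a text stream in Python 3, so write a str, not bytes.
    sys.stdout.write('[Me] ')
    sys.stdout.flush()


# Demo with a connected socket pair standing in for the server connection.
client_sock, server_sock = socket.socketpair()
send_message(client_sock, 'hello\n')
received = server_sock.recv(1024).decode('utf-8')
client_sock.close()
server_sock.close()
print(repr(received))  # 'hello\n'
```

If you ever do need to write raw bytes to standard output, sys.stdout.buffer.write() accepts bytes. For a prompt like this, a plain string is simpler.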

Passing a string but getting byte attribute error to urllib.request.read

I am trying to read an XML file from the Yahoo finance API. So far, I've tried the following:
from xml.dom.minidom import parse
#Start Get Employees
xml = urllib.request.urlopen('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.stocks%20where%20symbol%3D%22wfc%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys')
dom = parse(xml.read())
numemployees = dom.getElementsByTagName('FullTimeEmployees')
numemployees = name[0].firstChild.nodeValue
#End Get Employees
However, this raises an exception:
AttributeError: 'bytes' object has no attribute 'read'
I think this is because if it doesn't recognize the string, it assumes I'm passing a byte pattern. However, I am passing a string so I don't know what the problem is here.
Full Stack Trace:
Traceback (most recent call last):
File "C:\Python34\lib\tkinter\__init__.py", line 1487, in __call__
return self.func(*args)
File "C:\Users\kylec\Desktop\dm\Mail Server Finder\mailserverfinder.py", line 25, in getServers
dom = parse(xml.read())
File "C:\Python34\lib\xml\dom\minidom.py", line 1960, in parse
return expatbuilder.parse(file)
File "C:\Python34\lib\xml\dom\expatbuilder.py", line 913, in parse
result = builder.parseFile(file)
File "C:\Python34\lib\xml\dom\expatbuilder.py", line 204, in parseFile
buffer = file.read(16*1024)
AttributeError: 'bytes' object has no attribute 'read'
xml.dom.minidom.parse expects a file-like object, not bytes or str, as stated in its documentation:
xml.dom.minidom.parse(filename_or_file[, parser[, bufsize]])
Return a Document from the given input. filename_or_file may be either
a file name, or a file-like object.
So you just need to do this:
dom = parse(xml)
This works because the http.client.HTTPResponse object returned by urlopen is file-like.
Kyle, sorry, but your example isn't clear enough. I think this is what you meant to do:
import urllib.request
from xml.dom.minidom import parseString
employees = urllib.request.urlopen('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.stocks%20where%20symbol%3D%22wfc%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys').read()
dom = parseString(employees)
numemployees = dom.getElementsByTagName('FullTimeEmployees')
numemployees = numemployees[0].firstChild.nodeValue
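The two answers differ only in what they hand to minidom: parse() takes a file-like object, and parseString() takes the bytes you already read. Below is a sketch that shows both side by side without depending on the Yahoo endpoint. The XML payload and the employees() helper are hypothetical, and io.BytesIO stands in for the file-like HTTPResponse that urlopen() returns.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical sample payload standing in for the HTTP response body.
payload = b'<stock><FullTimeEmployees>1000</FullTimeEmployees></stock>'

# parse() wants a file name or a file-like object (anything with .read()).
# io.BytesIO plays the role of the HTTPResponse returned by urlopen().
dom_from_file = parse(io.BytesIO(payload))

# parseString() accepts the raw bytes (or str) you already read.
dom_from_bytes = parseString(payload)


def employees(dom):
    """Return the text of the first FullTimeEmployees element."""
    nodes = dom.getElementsByTagName('FullTimeEmployees')
    return nodes[0].firstChild.nodeValue


print(employees(dom_from_file), employees(dom_from_bytes))  # 1000 1000
```

Calling parse(payload) directly would reproduce the question's AttributeError, because bytes has no read() method.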
