Scrapy FormRequest can't handle complex dicts as formdata - python

I am trying to provide formdata to a scrapy.FormRequest object. The formdata is a dict of the following structure:
{
"param1": [
{
"paramA": "valueA",
"paramB": "valueB"
}
]
}
via code equivalent to the following, run in scrapy shell:
from scrapy import FormRequest
url = 'http://www.example.com'
method_post = 'POST'
formdata = <the above dict>
fr = FormRequest(url=url, method=method_post, formdata=formdata)
fetch(fr)
and in response I get the following error:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/chhk/.local/share/virtualenvs/project/lib/python3.6/site-packages/scrapy/http/request/form.py", line 31, in __init__
querystr = _urlencode(items, self.encoding)
File "/Users/chhk/.local/share/virtualenvs/project/lib/python3.6/site-packages/scrapy/http/request/form.py", line 66, in _urlencode
for k, vs in seq
File "/Users/chhk/.local/share/virtualenvs/project/lib/python3.6/site-packages/scrapy/http/request/form.py", line 67, in <listcomp>
for v in (vs if is_listlike(vs) else [vs])]
File "/Users/chhk/.local/share/virtualenvs/project/lib/python3.6/site-packages/scrapy/utils/python.py", line 119, in to_bytes
'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got dict
I have tried a variety of solutions, including the whole thing as a string, with various escape characters, and variations on the dict to make it more agreeable, but none of the solutions that remove this error work for the request (I get a 400 response).
I know that the formdata and everything else I am doing are correct, because I have replicated the request successfully in curl (formdata was provided via -d formdata.txt).
Is there a way around FormRequest's inability to deal with complex dict structures? Or am I missing something?

Instead of formdata, you can try using the body parameter and serializing the dict yourself. Example:
import json
FormRequest(url=url, method=method_post, body=json.dumps(formdata))
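To illustrate why the nested structure does not fit formdata, here is a minimal stdlib-only sketch (it does not use Scrapy). Form encoding represents flat key=value pairs, while a list of dicts has to be serialized some other way, for example as JSON text passed as the request body. That the server accepts JSON is an assumption based on the answer above, not something the question confirms.

```python
import json
from urllib.parse import urlencode

# The nested structure from the question.
formdata = {"param1": [{"paramA": "valueA", "paramB": "valueB"}]}

# Form encoding works on flat pairs of strings.
flat = urlencode({"paramA": "valueA", "paramB": "valueB"})
print(flat)  # paramA=valueA&paramB=valueB

# A nested structure can instead be serialized to a single JSON string,
# which is what the answer passes as the request body.
# Assumption: the server expects a JSON body.
body = json.dumps(formdata)
print(body)
```

If the server still answers 400, comparing this body with the contents of the file that worked in curl is a reasonable next step.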

Related

literal_eval return invalid syntax when reading JSON

While reading a JSON file and trying to evaluate its contents, I get a syntax error.
The JSON file contains the following data:
{
"communication":{
"xml":{
"xmlData": "<test vers=\"1.0\" >random</test>",
"user_id":"123456789"
},
},
}
Code snippet :
import ast
import json
.
.
# json_file is the Python object holding the data read from the JSON file
.
val = ast.literal_eval(json.dumps(json_file))
print(val)
Error thrown :
Traceback (most recent call last):
File "./prog.py", line 12, in <module>
File "/usr/lib/python3.8/ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 4
"xmlData": "<test vers="1.0" >random</test>",
^
SyntaxError: invalid syntax
Please suggest a way to resolve the syntax error. Note that changing vers="1.0" to vers='1.0' would have fixed the issue but I do not have write access to this JSON file. My application is just reading the data.
Your JSON is invalid: the trailing commas after the closing braces are not allowed. If you can't modify the file, then modify the data in Python.
Corrected JSON:
{
"communication":{
"xml":{
"xmlData":"<test vers=\"1.0\" >random</test>",
"user_id":"123456789"
}
}
}
My code:
import ast
import json
with open("text.json") as fd:
    json_file = json.load(fd)
val = ast.literal_eval(json.dumps(json_file))
print(val)
Output:
{'communication': {'xml': {'xmlData': '<test vers="1.0" >random</test>', 'user_id': '123456789'}}}
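As a minimal sketch of the points above: json.loads rejects the trailing commas, while the corrected text parses directly into a dict, so the literal_eval round trip is not needed. The strings below are inline stand-ins for the file contents, and the "active" key in the last step is hypothetical, added only to show that the round trip breaks once the data contains true, false or null.

```python
import ast
import json

# Inline stand-in for the original file: trailing commas make it invalid JSON.
raw_bad = '{"communication": {"xml": {"user_id": "123456789"},},}'
try:
    json.loads(raw_bad)
    bad_parsed = True
except json.JSONDecodeError:
    bad_parsed = False

# Corrected text: json.loads already returns a plain Python dict.
raw_good = r'{"communication": {"xml": {"xmlData": "<test vers=\"1.0\" >random</test>", "user_id": "123456789"}}}'
val = json.loads(raw_good)
print(val["communication"]["xml"]["xmlData"])

# literal_eval(json.dumps(...)) fails for JSON literals such as true,
# because "true" is not a Python literal. ("active" is a hypothetical key.)
try:
    ast.literal_eval(json.dumps({"active": True}))
    round_trip_ok = True
except ValueError:
    round_trip_ok = False
```'
assert val["communication"]["xml"]["user_id"] == "123456789"
assert round_trip_ok is False
</test>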

How to get data from dataLayer.push using Python web scraping

My code is:
# init scrapy selector
response = Selector(text=content)
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
print(json_data)
# debug data extraction logic
HummartScraper.parse_product(HummartScraper, '')
The error output is:
Traceback (most recent call last):
File "hummart2.py", line 86, in parse_product
json_data = json.loads(script.get() for script in (re.findall(r'dataLayer\.push\(([^)]+)'),response.css('script::text'))).group(1)
TypeError: findall() missing 1 required positional argument: 'string'
Why am I getting this error?
For a single dataLayer:
data_layer = response.css('script::text').re_first(r'dataLayer\.push\(([^)]+)')
data = json.loads(data_layer)
You can use response.css(...).re() to get a list of matches.
But this gives me the following error:
File "hummart2.py", line 88, in parse_product
data = json.loads(data_layer_raw)[1]
File "/home/danish-khan/miniconda3/lib/python3.7/json/__init__.py", line 341, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
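The NoneType error suggests the pattern matched nothing, so there was no string to hand to json.loads. Below is a minimal stdlib-only sketch of the same extraction with a guard for that case. The sample HTML and the helper name extract_data_layer are made up for illustration, and the sketch assumes the argument of dataLayer.push is valid JSON, which real pages do not guarantee (they may use JavaScript object syntax that json.loads rejects).

```python
import json
import re

# Same pattern as in the answer: capture everything up to the first ")".
DATA_LAYER_RE = re.compile(r'dataLayer\.push\(([^)]+)')

def extract_data_layer(text):
    """Return the parsed dataLayer.push argument, or None if it is absent."""
    match = DATA_LAYER_RE.search(text)
    if match is None:
        # Nothing matched: return None instead of passing None to json.loads.
        return None
    return json.loads(match.group(1))

# Hypothetical page content, for illustration only.
html = '<script>dataLayer.push({"event": "productView", "price": 120});</script>'
data = extract_data_layer(html)
missing = extract_data_layer('<script>var x = 1;</script>')
print(data, missing)
```

Printing the raw script text when the helper returns None is a quick way to see whether the page really contains a dataLayer.push call.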

Query about usage of setattr() in Python-Redmine

Calling setattr() on an item from the Redmine issues list fails with the following error.
Traceback (most recent call last):
File "E:\test\get_redmine_data.py", line 47, in <module>
print (item.assigned_to)
File "C:\Python27\lib\site-packages\redminelib\resources\standard.py", line 150, in __getattr__
return super(Issue, self).__getattr__(attr)
File "C:\Python27\lib\site-packages\redminelib\resources\base.py", line 164, in __getattr__
attr, encoded = self.encode(attr, decoded, self.manager)
File "C:\Python27\lib\site-packages\redminelib\resources\base.py", line 266, in encode
return attr, manager.new_manager(cls._resource_map[attr]).to_resource(value)
File "C:\Python27\lib\site-packages\redminelib\managers\base.py", line 29, in to_resource
return self.resource_class(self, resource)
File "C:\Python27\lib\site-packages\redminelib\resources\base.py", line 130, in __init__
self._decoded_attrs = dict(dict.fromkeys(relations_includes), **attributes)
TypeError: type object argument after ** must be a mapping, not str
I am trying to set a default assignee for issues where no assignee is set. The code fails at the line where I print the attribute I just set. My code is given below:
redmine = Redmine('http://redmine_url', username='uname', password='pwd')
project = redmine.project.get('proj_name')
work_items = project.issues
for item in work_items:
    assignee_not_set = getattr(item,'assigned_to',True)
    if assignee_not_set == True:
        print item.id
        setattr(item,'assigned_to','Deepak')
        print (item.assigned_to)
I also tried using the update() method:
redmine.project.update(item.id, assigned_to='Deepak')
That also fails, with a different error: redminelib.exceptions.ResourceNotFoundError: Requested resource doesn't exist.
I verified that the issue ID exists in Redmine.
You have several problems here:
1. The attribute name is assigned_to_id, not assigned_to.
2. It accepts a user ID, which is an int, not a username, which is a str.
3. There is no need to use setattr() here; just use item.assigned_to_id = 123.
4. You need to call item.save() after setting assigned_to_id, otherwise the change won't be saved to Redmine.
5. When you try the update() method, you are calling it on a Project resource and not on an Issue resource, which is why you get ResourceNotFoundError.
All this information is available in the docs: https://python-redmine.com/resources/issue.html

zabbix API json request with python urllib.request

I'm working on a Python project that I migrated from Python 2.6 to Python 3.6, so I had to replace urllib2 with urllib.request (plus urllib.error and urllib.parse).
But I'm facing an issue I can't solve. Here it is.
I want to send a JSON request like the one below:
import json
import urllib2
data= json.dumps({
"jsonrpc":"2.0",
"method":"user.login",
"params":{
"user":"guest",
"password":"password"
},
"id":1,
"auth":None
})
With urllib2 I had no issues. I just had to create the request with:
req = urllib2.Request("http://myurl/zabbix/api_jsonrpc.php", data, {'Content-type': 'application/json'})
and send it with:
response=urllib2.urlopen(req)
That worked, but now with urllib.request I get many errors raised by the library. Here is what I did (the request in 'data' is the same):
import json
import urllib.request
data= json.dumps({
"jsonrpc":"2.0",
"method":"user.login",
"params":{
"user":"guest",
"password":"password"
},
"id":1,
"auth":None
})
req = urllib.request.Request("http://myurl/zabbix/api_jsonrpc.php", data, {'Content-type': 'application/json'})
response = urllib.request.urlopen(req)
and I get this error :
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/tmp/Python-3.6.1/Lib/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/tmp/Python-3.6.1/Lib/urllib/request.py", line 524, in open
req = meth(req)
File "/tmp/Python-3.6.1/Lib/urllib/request.py", line 1248, in do_request_
raise TypeError(msg)
TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
So I looked into this and learned that I must use urllib.parse.urlencode() to convert my request into bytes, so I tried it on my request:
import urllib.parse
dataEnc=urllib.parse.urlencode(data)
Another error occurred:
Traceback (most recent call last):
File "/tmp/Python-3.6.1/Lib/urllib/parse.py", line 842, in urlencode
raise TypeError
TypeError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/tmp/Python-3.6.1/Lib/urllib/parse.py", line 850, in urlencode
"or mapping object").with_traceback(tb)
File "/tmp/Python-3.6.1/Lib/urllib/parse.py", line 842, in urlencode
raise TypeError
TypeError: not a valid non-string sequence or mapping object
and I realized that json.dumps(data) just converts my array/dictionary into a string, which is not valid input for urllib.parse.urlencode. So I removed json.dumps from data and did this:
import json
import urllib.request
import urllib.parse
data= {
"jsonrpc":"2.0",
"method":"user.login",
"params":{
"user":"guest",
"password":"password"
},
"id":1,
"auth":None
}
dataEnc=urllib.parse.urlencode(data) #this one worked then
req=urllib.request.Request("http://myurl/zabbix/api_jsonrpc.php",data,{'Content-type':'application/json'})
response = urllib.request.urlopen(req) #and this one too, but it was too beautiful
then I took a look in the response and got this :
b'{"jsonrpc":"2.0",
"error":{
"code":-32700,
"message":"Parse error",
"data":"Invalid JSON. An error occurred on the server while parsing the JSON text."}
,"id":1}
And I guess it's because the JSON message was not passed through json.dumps!
There is always one element blocking me from doing the request correctly,
so I'm totally stuck. If anyone has an idea or an alternative, I would be very happy.
best Regards
Gozu09
In fact, you just need to pass your JSON data as a bytes sequence, like this:
import json
import urllib.request
data = {
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "guest",
        "password": "password"
    },
    "id": 1,
    "auth": None
}
req = urllib.request.Request(
    "http://myurl/zabbix/api_jsonrpc.php",
    data=json.dumps(data).encode(),  # Encode the JSON string to bytes
    headers={'Content-type': 'application/json'}
)
POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str
This error means that the data argument must be a bytes object (or an iterable of bytes, or a file object), not a str.
st = "This is a string"
by = b"This is a bytes object"
by2 = st.encode()  # Convert the string to bytes (UTF-8 by default)
st2 = by.decode()  # Decode the bytes (UTF-8 by default) back into a string
json.dumps() returns a string, so you have to call json.dumps(...).encode() to convert it into bytes.
By the way, urlencode is for converting a mapping or a sequence of key/value pairs into a query string, percent-encoding special characters along the way. Its output is a string, not bytes.
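A minimal sketch of the approach described above, which builds the request without sending it so the body type can be checked offline. The helper name build_json_request is made up for this example; the URL and payload are the ones from the question.

```python
import json
import urllib.request

payload = {
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {"user": "guest", "password": "password"},
    "id": 1,
    "auth": None,
}

def build_json_request(url, payload):
    """Build a POST request whose body is the JSON payload encoded as bytes."""
    body = json.dumps(payload).encode("utf-8")  # str -> bytes
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

req = build_json_request("http://myurl/zabbix/api_jsonrpc.php", payload)
print(type(req.data).__name__, req.get_method())
```

Passing req to urllib.request.urlopen would then send it. Because data is set, urllib treats the request as a POST.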

AttributeError: 'HTTPResponse' object has no attribute 'type'

So, I am trying to build a program that will retrieve the scores of the NHL's season through the use of yahoo's RSS feed.
I am not an experienced programmer, so some things haven't quite gotten into my head just yet. However, here is my code so far:
from urllib.request import urlopen
import xml.etree.cElementTree as ET
YAHOO_NHL_URL = 'http://sports.yahoo.com/nhl/rss'
def retrievalyahoo():
    nhl_site = urlopen('http://sports.yahoo.com/nhl/rss')
    tree = ET.parse(urlopen(nhl_site))
retrievalyahoo()
The title above states the error I get after I test the aforementioned code.
EDIT: Okay, after the fix, I get the following traceback, which puzzles me:
Traceback (most recent call last):
File "C:/Nathaniel's Folder/Website Scores.py", line 12, in <module>
retrievalyahoo()
File "C:/Nathaniel's Folder/Website Scores.py", line 10, in retrievalyahoo
tree = ET.parse(nhl_site)
File "C:\Python33\lib\xml\etree\ElementTree.py", line 1242, in parse
tree.parse(source, parser)
File "C:\Python33\lib\xml\etree\ElementTree.py", line 1730, in parse
self._root = parser._parse(source)
File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 291
The problem is that you're trying to call urlopen on the result of urlopen.
Just call it once, like this:
nhl_site = urlopen('http://sports.yahoo.com/nhl/rss')
tree = ET.parse(nhl_site)
The error message probably could be nicer. If you look at the docs for urlopen:
Open the URL url, which can be either a string or a Request object.
Clearly the http.client.HTTPResponse object that it returns is neither a string nor a Request object. What's happened here is that urlopen sees that it's not a string, and therefore assumes it's a Request, and starts trying to access methods and attributes that Request objects have. This kind of design is generally a good thing, because it lets you pass things that act just like a Request and they'll just work… but it does mean that if you pass something that doesn't act like a Request, the error message can be mystifying.
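To show that ET.parse only needs a file-like object, which is why the response from a single urlopen call can be passed to it directly, here is a minimal offline sketch. An io.BytesIO object stands in for the HTTP response, and the feed content is invented for illustration, not taken from the real Yahoo feed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical feed content standing in for the HTTP response body.
sample_feed = (
    b"<rss><channel>"
    b"<item><title>Team A 3, Team B 2</title></item>"
    b"</channel></rss>"
)

# BytesIO is file-like (it has .read()), as is the object urlopen returns.
fake_response = io.BytesIO(sample_feed)

# Pass the file-like object straight to ET.parse, with no second urlopen call.
tree = ET.parse(fake_response)
titles = [item.findtext("title") for item in tree.getroot().iter("item")]
print(titles)
```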
