Django how to serialize and validate external xml data

Django how to serialize and validate external xml data - python

i have "problems" validating external xml. What i want is to fetch data from an external rss site, validate if all fields are present and save it to the database then.
The problem i am having is that i am not sure how can i send the xml data to my validator. I tried in different ways but it was not working.
https://api.foxsports.com/v1/rss?partnerKey=zBaFxRyGKCfxBagJG9b8pqLyndmvo7UU&tag=nba
This is an example of the rss i am trying to parse.
Here is my code:
import json
import requests
import xml.etree.ElementTree as ET
from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import status, generics
class Test(APIView):
def get(self, request, format=None):
response = requests.get(
channel.url
)
print(response.status_code)
if response.status_code == 200:
xml = response.text.encode("utf-8")
tree = ET.fromstring(xml)
for child in tree.iter("item"):
serializer = RssSerializer(data=child)
if serializer.is_valid():
serializer.save(parsed_xml)
The problem here is that my serializer is always not valid, no matter what i do. I kind of got around this problem when i wrote a small helper function that is manually extracting fields from the request.
It looks like this:
def parse_xml(self, node):
parsed_json = {
"title": node.find("title").text,
"description": node.find("description").text,
"link": node.find("link").text,
"pub_date": node.find("pubDate").text,
"guid": node.find("guid").text,
}
return parsed_json
Basically i just added this line parsed_xml = self.parse_xml(child)
and i am sending the parsed_xml to my serializer. That works fine but it seems like a hackhish way to me, but i am not able to process the data in any other way.
class RssSerializer(serializers.Serializer):
title = serializers.CharField(min_length=2)
link = serializers.CharField(min_length=2)
description = serializers.CharField(min_length=2)
pub_date = serializers.CharField(min_length=2)
guid = serializers.CharField(min_length=2)
def save(self, data):
new_feed = RssFeed()
new_feed.title = data["title"]
new_feed.description = data["description"]
new_feed.pub_date = data["pub_date"]
new_feed.link = data["link"]
new_feed.guid = data["guid"]
new_feed.save()
What i want to know is there any way i can fetch the xml from an external source and directly pass it to my validator? Thanks in advance for your help

Serializers do not expect xml nodes as data, so the short answer is no, unfortunately there is no direct way to pass it to validator. However, you could reuse one of the existing methods to either change whole xml to dictionary, and then feed the items to the serializer, or to change single nodes to dictionaries.
For inspiration, I would look into available solution (no need to reinvent the wheel):
This problem was already touched How to convert an xml string to a dictionary in Python?
I used once this DRF XML parser to parse XML requests, and it did the job: https://github.com/jpadilla/django-rest-framework-xml/blob/master/rest_framework_xml/parsers.py#L40
and if you don't mind additional library, xmltodict did the job for me with pretty similiar problem as well https://docs.python-guide.org/scenarios/xml/
E.g. your code could look like this:
xml = response.text.encode("utf-8")
xml_dict = xmltodict.parse(xml)
for item in xml_dict["rss"]["channel"]["item"]:
serializer = RssSerializer(data=item)

Related

django-rest-framework: Rendering both HTML and JSON with the same ModelViewSet

I'm using Django==1.10.5 and djangorestframework==3.5.3.
I have a few ModelViewSets that handle JSON API requests properly. Now, I'd like to use TemplateRenderer to add HTML rendering to those same ModelViewSets. I started with the list endpoint, and created a simple template that lists the available objects. I implemented the get_template_names to return the template I created.
Accessing that endpoint through the browser works fine when there are no objects to list, so everything related to setting up HTML renderers alongside APIs seems to work.However, when tere are objects to return the same endpoint fails with the following error:
ValueError: dictionary update sequence element #0 has length XX; 2 is required
Where XX is the number of attributes the object has.
This documentation section suggests the view function should act slightly differently when returning an HTML Response object, but I assume this is done by DRF's builtin views when necessary, so I don't think that's the issue.
This stackoverflow Q/A also looks relevant but I'm not quite sure it's the right solution to my problem.
How can I make my ModelViewSets work with both HTML and JSON renderers?
Thanks!

DRF has a brief explanation of how to do this in their documentation.
I think you'd do something like this...
On the client side, tell your endpoint what type of response you want:
fetch(yourAPIUrl, {
headers: {
'Accept': 'application/json'
// or 'Accept': 'text/html'
}
})
In your view, just check that and act accordingly:
class FlexibleAPIView(APIView):
"""
API view that can render either JSON or HTML.
"""
renderer_classes = [TemplateHTMLRenderer, JSONRenderer]
def get(self, request, *args, **kwargs):
queryset = Things.objects.all()
# If client wants HTML, give them HTML.
if request.accepted_renderer.format == 'html':
return Response({'things': queryset}, template_name='example.html')
# Otherwise, the client likely wants JSON so serialize the data.
serializer = ThingSerializer(instance=queryset)
data = serializer.data
return Response(data)

Django Rest Framework: Custom JSON Serializer

I am struggling to figure out how to write a custom serializer in DRF to parse a complex JSON data structure that is being passed to an endpoint. The JSON looks like this:
{
"name": "new",
"site": "US",
"data": {
"settings": [],
"meta": {
"meta1":{}
}
}
}
Here is my backend code:
# views.py
class SaveData(views.APIView):
def post(self, request, *args, **kwargs):
name = request.POST.get('name')
site = request.POST.get('site')
data = request.POST.get('data')
But data always returns None. On closer inspection into the request object, the incoming JSON looks like this:
# POST attribute of request object
'name' = 'new'
'site' = 'US'
'data[settings][0] = ''
'data[meta][meta1][] = ''
Basically it looks like the nested JSON objects associated with the data key are not getting properly serialized into Python dict and list objects. I've been looking for examples of custom DRF serializers but most of the ones I've found have been for serializing Django models, but I don't need to do that. The incoming data does not directly map to my models; instead I need to do some processing before I save any of the data.
Does anyone have any advice on a custom serializer that would properly convert the data JSON into proper Python objects? I started with this but it throws an exception (When a serializer is passed a 'data' keyword argument you must call '.is_valid()' before attempting to access the serialized '.data' representation.
You should either call '.is_valid()' first, or access '.initial_data' instead.). Here is my code with the custom serializer I created:
# serializers.py
class SaveDataSerializer(serializers.Serializer):
name = serializers.CharField()
site = serializers.CharField()
data = serializers.DictField()
def create(self, validated_data):
return dict(**validated_data)
Thanks.

I was able to solve my problem by converting the JS object to JSON when passing it to $.ajax, which then DRF was able to correctly parse into Python objects. Here's a snippet of the jQuery I'm using:
$.ajax({
url: '/api/endpoint',
type: 'POST',
contentType: 'application/json',
data: JSON.stringify(ajaxData),
done: function(data) {}
...
})
Note though that jQuery will default to x-www-urlencoded as the Content-Type if you don't specify anything. Since I'm using the JSON.stringify() method to explicitly convert the ajaxData object to a JSON string the Content-Type needed to be explicitly set as well.
I got the idea from this SO answer: Convert Object to JSON string
Of course, this doesn't really answer my original question, but this is another solution that is simpler and works like I wanted. By doing this I didn't have to create any custom serializers, but I did have to modify my API view like so:
class SaveData(views.APIView):
def post(self, request, *args, **kwargs):
name = request.data.get('name')
site = request.data.get('site')
data = request.data.get('data')
After doing this the expected object types are returned
>>> type(name)
str
>>> type(site)
str
>>> type(data)
dict

Serializing optionally nested structures: Difference between QueryDict and normal dict?

I'm running into weird behavior when writing nested structures with django-rest and then trying to test them using django-rest's test client. The nested child object should be optional.
Here's a sample serializer:
from rest_framework import serializers
class OptionalChildSerializer(serializers.Serializer):
field_b = serializers.IntegerField()
field_c = serializers.IntegerField()
class Meta:
fields = ('field_b', 'field_c', )
class ParentSerializer(serializers.Serializer):
field_a = serializers.IntegerField()
child = OptionalChildSerializer(required=False, many=False)
class Meta:
fields = ('a', 'child',)
def perform_create(self, serializer):
# TODO: create nested object.
pass
(I've omitted the code in perform_create, as it's not relevant to the question).
Now, passing a normal dict as data argument works just fine:
ser = ParentSerializer(data=dict(field_a=3))
ser.is_valid(raise_exception=True)
But passing a QueryDict instead will fail:
from django.http import QueryDict
ser = ParentSerializer(data=QueryDict("field_a=3"))
ser.is_valid(raise_exception=True)
ValidationError: {'child': {'field_b': [u'This field is required.'], 'field_c': [u'This field is required.']}}
On the actual web site, the API gets a normal dict and thus works fine. During testing however, using something like client.post('url', data=dict(field_a=3)) will result in a QueryDict being passed to the view, and hence not work.
So my question is: what's the difference between the QueryDict and normal dict? Or am I approaching this the wrong way?

DRF defines multiple parser classes for parsing the request content having different media types.
request.data will normally be a QueryDict or a normal dictionary depending on the parser used to parse the request content.
JSONParser:
It parses the JSON request content i.e. content with .media_type as application/json.
FormParser
It parses the HTML form content. Here, request.data is populated with a QueryDict of data. These have .media_type as application/x-www-form-urlencoded.
MultiPartParser
It parses multipart HTML form content, which supports file uploads. Here also, both request.data is populated with a QueryDict. These have
.media_type as multipart/form-data.
FileUploadParser
It parses raw file upload content. The request.data property is a dictionary with a single key file containing the uploaded file.
How does DRF determines the parser?
When DRF accesses the request.data, it examines the Content-Type header on the incoming request and then determines which parser to use to parse the request content.
You will need to set the Content-Type header when sending the data otherwise it will use either a multipart or a form parser to parse the request content and give you a QueryDict in request.data instead of a dictionary.
As per DRF docs,
If you don't set the content type, most clients will default to using
'application/x-www-form-urlencoded', which may not be what you wanted.
So when sending json encoded data, also set the Content-Type header to application/json and then it will work as expected.
Why request.data is sometimes QueryDict and sometimes dict?
This is done because different encodings have different datastructures and properties.
For example, form data is an encoding that supports multiple keys of the same value, whereas json does not support that.
Also in case of JSON data, request.DATA might not be a dict at all, it could be a list or any of the other json primitives.
Check out this Google Groups thread about the same.
What you need to do?
You can add format='json' in the tests when POSTing the data which will set the content-type as well as serialize the data correctly.
client.post('url', format='json', data=dict(field_a=3))
You can also send JSON-encoded content with content-type argument.
client.post('url', json.dumps(dict(field_a=3)), content_type='application/json')

The django-rest test client doesn't automatically serialize data as json, but uses multipart/form, which results in a QueryDict.
There is, however, a format option, described in the docs. The following test code works fine:
client.post('url', format='json', data=dict(field_a=3))
I'm still puzzled on the different serializer behavior between a normal dict and a QueryDict, though...
Thanks Rajesh for pointing me in the right direction!

How to get api format response from Django

I followed "Writing regular Django views..." from the official documentation of Django Rest framework and got this kind of code:
#views.py file
#imports go here
class JSONResponse(HttpResponse):
"""
An HttpResponse that renders its content into JSON.
"""
def __init__(self, data, **kwargs):
content = JSONRenderer().render(data)
kwargs['content_type'] = 'application/json'
super(JSONResponse, self).__init__(content, **kwargs)
def items(request):
output = [{"a":"1","b":"2"},{"c":"3","d":"4"}]
return JSONResponse(output)
And it works well. When a user goes to /items/ page, he or she sees a nicely looking json-formated data [{"a":"1","b":"2"},{"c":"3","d":"4"}]. But, how can I get (code?) api-formated data, or check if a user requested ?format=api then render in api format manner and if not, then in json format. By api-formated data I mean this kind of view

Try using the #api_view() decorator as described here. And make sure you use the built in Response instead of JSONResponse.
Your view should then look something like this:
from rest_framework.decorators import api_view
...
#api_view()
def items(request):
output = [{"a":"1","b":"2"},{"c":"3","d":"4"}]
return Response(output)
In the case of getting the error
Cannot apply DjangoModelPermissions on a view that does not have model or queryset property
Remove DjangoModelPermissions from your rest framework permissions settings in you settings.py

Json response list with django

I want to use typeahead.js in my forms in Django 1.7. Furthermore I want to implement that using class based views.
As far as I understand the problem, I need to create a view that generates a JSON response for the ajax request coming from typeahead.js.
Is it a good idea to use django-braces for that?
What I have so far is this:
from braces.views import JSONResponseMixin
[...]
class TagList(JSONResponseMixin, ListView):
"""
List Tags
"""
model = Tag
context_object_name = 'tags'
def get(self, request, *args, **kwargs):
objs = self.object_list()
context_dict = {
"name": <do something with "obs" to get just the name fields>
"color": <do something with "obs" to get just the color fields>
}
return self.render_json_response(context_dict)
That's where I'm stuck at the moment. Am I on the right path? Or would it even be possible (and easy) to go without a third party app?

Serializing non-dictionary objects¶
In order to serialize objects other than dict you must set the safe parameter to False:
response = JsonResponse([1, 2, 3], safe=False)
https://docs.djangoproject.com/en/1.10/ref/request-response/#jsonresponse-objects
Edit:
But please be aware that this introduces a potentially serious CSRF vulnerability into your code [1] and IS NOT RECOMMENDED by the Django spec, hence it being called unsafe. If what you are returning requires authentication and you don't want a third party to be able to capture it then avoid at all costs.
In order to mitigate this vulnerability, you should wrap your list in a dictionary like so:
{'context': ['some', 'list', 'elements']}
[1] https://haacked.com/archive/2008/11/20/anatomy-of-a-subtle-json-vulnerability.aspx/

I usually use the python json library like this:
import json
from django.http import HttpResponse
class TagList(ListView):
...
context_dict = {
"name": <do something with "obs" to get just the name fields>
"color": <do something with "obs" to get just the color fields>
}
return HttpResponse(json.dumps({'context_dict': context_dict}),
content_type='application/json; charset=utf8')
But in the new Django 1.7 you have JsonResponse objects
Hope you find it useful.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.