How to create model instances from csv file - python

The task is to parse a CSV file and create instances in the database from the parsed data. The backend is DRF and the frontend is React.
The twist is that the file is not processed in a single hidden step. The logic is as follows:
There is a button to upload the file. The file is uploaded and validated, but nothing is created in the database yet. A window appears with a preview of the parsed data (as a table), and in this window there is a Confirm button; clicking it sends the request that actually writes to the database.
What I have done so far:
1. Created a class to upload the file (Upload button)
class FileUploadView(APIView):
    parser_classes = (MultiPartParser, FormParser)
    renderer_classes = [JSONRenderer]
    def put(self, request, format=None):
        if 'file' not in request.data:
            raise ParseError("Empty content")
        f = request.data['file']
        filename = f.name
        if filename.endswith('.csv'):
            file = default_storage.save(filename, f)
            r = csv_file_parser(file)
            # 200 rather than 204: a 204 (No Content) response must not carry a body
            status = 200
        else:
            status = 406
            r = "File format error"
        return Response(r, status=status)
The class calls the csv_file_parser function, which returns JSON containing all the parsed data, like this:
{
    "1": {
        "Vendor": "Firstvendortestname",
        "Country": "USA",
        ...
        ...
        "Modules": "Module1",
        " NDA date": "2019-12-24"
    },
    "2": {
        "Vendor": "Secondvendortestname",
        "Country": "Canada",
        ...
        ...
        "Modules": "Module1",
        " NDA date": "2019-12-24"
    }
}
This data is used to preview the fields from which the model instances will be created in the database when the Confirm button is clicked.
csv_file_parser function
def csv_file_parser(file):
    result_dict = {}
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        line_count = 1
        for rows in reader:
            for key, value in rows.items():
                if not value:
                    raise ParseError('Missing value in file. Check the {} line'.format(line_count))
            result_dict[line_count] = rows
            line_count += 1
    return result_dict
When the Confirm button is pressed, React sends this data in a POST request to a view class that works with the database. I am having difficulty implementing this class. How do I correctly process the received data and write it to the database?
class CsvToDatabase(APIView):
    def post(self, request, format=None):
        data = request.data
        for vendor in data:
            Vendors(
                vendor_name=vendor['Vendor'],
                country=vendor['Country']
            ).save()
        return Response({'received data': request.data})
This code raises an error:
TypeError at /api/v1/vendors/from_csv_create/
string indices must be integers
Output of printing request.data:
<QueryDict: {'{\n "1": {\n "Vendor": "Firstvendortestname",\n "Country": "USA",\n "Primary Contact Name": "Jack Jhonson",\n "Primary Contact Email": "jack#gmail.com",\n "Secondary Contact Name": "Jack2 Jhonson",\n "Secondary Contact Email": "jack2#gmail.com",\n "Modules": "Module1, Module2",\n " NDA date": "2019-12-24"\n },\n "2": {\n "Vendor": "Secondvendortestname",\n "Country": "Canada",\n "Primary Contact Name": "Sandra Bullock",\n "Primary Contact Email": "sandra#gmail.com",\n "Secondary Contact Name": "Sandra Bullock",\n "Secondary Contact Email": "sandra#gmail.com",\n "Modules": "Module1, Module2",\n " NDA date": "2019-12-24"\n }\n}': ['']}>
Maybe I'm using the wrong data format?
And overall, I have a feeling that I'm doing the job the wrong way. I don't use serializers, do I need them here?

You are iterating over the dict's keys, but you should iterate over its items:
for key, vendor in data.items():
    Vendors(
        vendor_name=vendor['Vendor'],
        country=vendor['Country']
    ).save()

First of all, my suggestion is to use a bulk create operation instead of creating the instances one by one. See the documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create.
Your problem is caused by iterating over the data incorrectly in your loop. My advice is to start from the error message: it clearly says that the bug is in the data structure.
Now look at request.data: it is not a list of dicts that you can loop over the way you do. In your output it is a QueryDict whose single key is the whole JSON string, which means the body was not parsed as JSON. Please see this StackOverflow page for more details: Extracting items out of a QueryDict
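A minimal sketch of the two suggestions above, assuming the client sends the preview back as a JSON string shaped like the parser output (row number mapped to a row dict). `vendor_kwargs_from_payload` is a hypothetical helper, not part of DRF, and the field names `vendor_name` and `country` are taken from the question's model.

```python
import json


def vendor_kwargs_from_payload(raw_body):
    """Turn the posted JSON string into a list of model keyword arguments.

    Hypothetical helper: the payload is assumed to map row numbers to row
    dicts, as produced by csv_file_parser in the question.
    """
    rows = json.loads(raw_body)
    # Iterate over the row dicts (the values), not the row-number keys.
    return [
        {"vendor_name": row["Vendor"], "country": row["Country"]}
        for row in rows.values()
    ]


payload = (
    '{"1": {"Vendor": "Firstvendortestname", "Country": "USA"},'
    ' "2": {"Vendor": "Secondvendortestname", "Country": "Canada"}}'
)
kwargs_list = vendor_kwargs_from_payload(payload)
print(kwargs_list)
# In a Django view (assumption, not executed here) the list could feed one query:
# Vendors.objects.bulk_create([Vendors(**kw) for kw in kwargs_list])
```

If React posts the body with a JSON content type, DRF should already hand the view a parsed dict in request.data, and the json.loads step would not be needed.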

Related

Django serializer test post of file with user information

I try to test a file upload like this:
@deconstructible
class FileGenerator:
    @staticmethod
    def generate_text_file(file_ending='txt'):
        file_content = b'some test string'
        file = io.BytesIO(file_content)
        file.name = f'test.{file_ending}'
        file.seek(0)
        return file
def test_this(self, api_client, login_as):
    user = login_as('quality-controller')
    url = reverse('test-list')
    organization = Organization(name="test")
    organization.save()
    data = {
        "organization": organization.id,
        "import_file": FileGenerator.generate_text_file('txt'),
        "user": {
            "id": user.id,
            "username": user.username,
        }
    }
    response = api_client.post(url, data, format='json')
But I receive the following error message:
b'{"import_file": ["The submitted data was not a file. Check the
encoding type on the form."]}'
I also tried to use: format='multipart' but then I receive the following error:
AssertionError: Test data contained a dictionary value for key 'user',
but multipart uploads do not support nested data. You may want to
consider using format='json' in this test case.
How can I solve this?
This is how I deal with this issue:
Simplest: flatten the form
Sidestep the issue by making your serializer accept flat user_id and user_username fields, then rebuild the nested structure on the server side in the serializer's validate(self, attrs) method. A bit ugly/hacky, but it works just fine and can be documented.
def validate(self, attrs):
    attrs["user"] = {
        "id": attrs.pop("user_id"),
        "name": attrs.pop("user_username")
    }
    return attrs
Nicest if you don't mind the size: B64 Fields
You can base64 encode the file field and pass it in the JSON. Then, to decode it on the server side, you would write (or search for) a simple Base64FileField() for DRF.
class UploadedBase64ImageSerializer(serializers.Serializer):
    file = Base64ImageField(required=False)
    created = serializers.DateTimeField()
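To make the base64 route concrete, here is a small sketch of the round trip using only the standard library. The field names follow the question's test data, the user values are made up, and the decoding step stands in for what a hand-written or third-party Base64 file field would do on the server; it is not DRF's own API.

```python
import base64
import json

file_content = b"some test string"

# Client side: encode the raw bytes as ASCII text so they fit in a JSON body,
# which also lets the nested "user" dict travel in the same request.
encoded = base64.b64encode(file_content).decode("ascii")
body = json.dumps({
    "import_file": encoded,
    "user": {"id": 1, "username": "tester"},  # made-up sample values
})

# Server side: parse the JSON and decode the string back into bytes.
received = json.loads(body)
decoded = base64.b64decode(received["import_file"])
print(decoded == file_content)
```

The trade-off mentioned above is size: base64 output is roughly a third larger than the original bytes.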
Alternative - Flatten the form data
You can't pass nested data, but you can flatten the nested dicts and pass that to a DRF service. Serializers actually can understand nested data if the field names are correct.
I don't know if this field name format is standardized, but this is what worked for me after experimentation. I only use it for service->service communication TO drf, so you would have to clone it into JS, but you can use the python in unit tests. Let me know if it works for you.
from collections import OrderedDict
def flatten_dict_for_formdata(input_dict, array_separator="[{i}]"):
    """
    Recursively flattens nested dict()s into a single level suitable
    for passing to a library that makes multipart/form-data posts.
    """
    def __flatten(value, prefix, result_dict, previous=None):
        if isinstance(value, dict):
            # If we just processed a dict, then separate with a "."
            # Don't do this if it is an object inside an array.
            # In that case the [:id] _is_ the separator, adding
            # a "." like list[1].name will break but list[x]name
            # is correct (at least for DRF/django decoding)
            if previous == "dict":
                prefix += "."
            for key, v in value.items():
                __flatten(
                    value=v,
                    prefix=prefix + key,
                    result_dict=result_dict,
                    previous="dict"
                )
        elif isinstance(value, (list, tuple)):
            for i, v in enumerate(value):
                __flatten(
                    value=v,
                    prefix=prefix + array_separator.format(i=i),  # e.g. name[1]
                    result_dict=result_dict,
                    previous="array"
                )
        else:
            result_dict[prefix] = value
        # return here to simplify the caller's life. ignored during recursion
        return result_dict
    return __flatten(input_dict, '', OrderedDict(), None)
# flatten_dict_for_formdata({...}):
{                                       # output field name
    "file": SimpleUploadedFile(...),    # file
    "user": {
        "id": 1,                        # user.id
        "name": "foghorn",              # user.name
        "jobs": [
            "driver",                   # user.jobs[0]
            "captain",                  # user.jobs[1]
            "pilot"                     # user.jobs[2]
        ]
    },
    "objects": [
        {
            "type": "shoe",             # objects[0]type
            "size": "44"                # objects[0]size
        },
    ]
}

Django save request.POST to JSONField picks last item from list instead of saving the list

I have a view that receives a post request from client.post()
data = {
    "token": create_hash(customer_name),
    "images": [image_1, image_2],
    "name": customer_name,
    "email": "test@email.com",
    "phone": "0612345678",
    "product": "product-sku0",
    "font_family": "Helvetica",
    "font_size": 12,
    "colors_used": (
        "#AAAAAA|White D",
        "#FFFFFF|Black C"
    )
}
I am trying to save the post request as a whole to a models.JSONField().
The post request key-value pair looks like this:
'colors_used': ['#AAAAAA|White D', '#FFFFFF|Black C']
When I save and later retrieve the value it looks like this:
'colors_used': '#FFFFFF|Black C'
Instead of saving the nested list in the JSONfield it only saved the last value.
The view:
@csrf_exempt
def order(request):
    """
    Receives and saves request
    """
    post = request.POST
    files = request.FILES
    print(f"{post=}")
    assert post["token"] == create_hash(post["name"])
    design_obj = RequestDetails.objects.create(
        customer_name=post["name"],
        customer_email=post["email"],
        customer_phone=post["phone"],
        request_json=post
    )
I am using SQLite.
It turns out this is just the default behaviour of a QueryDict: looking up a key that has several values returns only the last one, and that is what ends up in the JSON. For an individual key you can use getlist() to get all the values of a multi-value key.
I ended up placing the whole nested data structure in a single JSON string using json.dumps(data) and sending just that along with the request.
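The difference between the two encodings can be sketched with the standard library alone. This is an illustration, not Django code: `urlencode` and `parse_qs` stand in for the test client and the request parsing, the sample values are trimmed from the question, and the field name `payload` is a hypothetical choice.

```python
import json
from urllib.parse import parse_qs, urlencode

data = {
    "name": "customer",
    "font_size": 12,
    "colors_used": ["#AAAAAA|White D", "#FFFFFF|Black C"],
}

# Form encoding repeats the key once per list item, and every value
# comes back as a string, so nesting and types are lost.
form_body = urlencode(data, doseq=True)
parsed = parse_qs(form_body)
print(parsed["colors_used"])
print(parsed["font_size"])

# Sending the whole structure as one JSON string keeps lists and numbers.
json_form_body = urlencode({"payload": json.dumps(data)})
restored = json.loads(parse_qs(json_form_body)["payload"][0])
print(restored == data)
```

On the Django side the equivalent of the last step would be json.loads(request.POST["payload"]), after which the resulting dict can be stored in the JSONField.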

Python - Search and export information from JSON

This is the structure of my json file
},
"client1": {
    "description": "blabla",
    "contact name": "",
    "contact email": "",
    "third party organisation": "",
    "third party contact name": "",
    "third party contact email": "",
    "ranges": [
        "1.1.1.1",
        "2.2.2.2",
        "3.3.3.3"
    ]
},
"client2": {
    "description": "blabla",
    "contact name": "",
    "contact email": "",
    "third party organisation": "",
    "third party contact name": "",
    "third party contact email": "",
    "ranges": [
        "4.4.4.4",
        "2.2.2.2"
    ]
},
I've seen ways to export specific parts of this json file but not everything. Basically all I want to do is search through the file using user input.
All I'm struggling with is how I actually use the user input to search and print everything under either client1 or client2 based on the input? I am sure this is only 1 or 2 lines of code but cannot figure it out. New to python. This is my code
data = json.load(open('clients.json'))
def client():
    searchq = input('Client to export: '.capitalize())
    search = ('""'+searchq+'"')
    a = open('Log.json', 'a+')
    a.write('Client: \n')
client()
This should get you going:
import json
# Safely open the file and load the data into a dictionary
with open('clients.json', 'rt') as dfile:
    data = json.load(dfile)
# Ask for the name of the client
query = input('Client to export: ')
# Show the corresponding entry if it exists,
# otherwise show a message
print(data.get(query, 'Not found'))
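Building on the lookup above, the export part of the question could look roughly like this. It is a sketch under assumptions: `export_client` is a hypothetical helper, the data is an in-memory, trimmed copy of the question's structure instead of clients.json, and the client name is hard-coded where the real script would use input().

```python
import json

data = {
    "client1": {"description": "blabla", "ranges": ["1.1.1.1", "2.2.2.2", "3.3.3.3"]},
    "client2": {"description": "blabla", "ranges": ["4.4.4.4", "2.2.2.2"]},
}


def export_client(clients, name):
    """Return the entry for `name` as pretty-printed JSON, or None if absent."""
    entry = clients.get(name)
    if entry is None:
        return None
    return json.dumps({name: entry}, indent=2)


report = export_client(data, "client2")
print(report if report is not None else "Not found")
# The string could then be appended to the log file, for example:
# with open('Log.json', 'a') as log:
#     log.write(report + '\n')
```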
I'm going to preface this by saying this is 100% a drive-by answering, but one thing you could do is have your user use a . (dot) delimited format for specifying the 'path' to the key in the dictionary/json structure, then implementing a recursive function to seek out the value under that path like so:
def get(query='', default=None, fragment=None):
    """
    Recursive function which returns the value of the terminal
    key of the query string supplied, or if no query
    is supplied returns the whole fragment (dict).
    Query string should take the form: 'each.item.is.a.key', allowing
    the user to retrieve the value of a key nested within the fragment to
    an arbitrary depth.
    :param query: String representation of the path to the key for which
                  the value should be retrieved
    :param default: If default is specified, returns instead of None if query is invalid
    :param fragment: The dictionary to inspect
    :return: value of the specified key or fragment if no query is supplied
    """
    if not query:
        return fragment
    query = query.split('.')
    try:
        key = query.pop(0)
        try:
            if isinstance(fragment, dict) and fragment:
                # dict.keys() is not subscriptable in Python 3, so peek at the first key
                key = int(key) if isinstance(next(iter(fragment)), int) else key
            else:
                key = int(key)
        except ValueError:
            pass
        fragment = fragment[key]
        query = '.'.join(query)
    except (IndexError, KeyError, TypeError):
        return default
    if not fragment:
        return fragment
    return get(query=query, default=default, fragment=fragment)
There are going to be a million people who come by here with better suggestions than this and there are doubtless many improvements to be made to this function as well, but since I had it lying around I thought I'd put it here, at least as a starting point for you.
Note:
Fragment should probably be made a positional argument or something. IDK. It isn't one because I had to rip some application-specific context out (it used to have a sensible default state) and I didn't want to start rewriting stuff, so I leave that up to you.
You can do some cool stuff with this function, given some data:
d = {
    'woofage': 1,
    'woofalot': 2,
    'wooftastic': ('woof1', 'woof2', 'woof3'),
    'woofgeddon': {
        'woofvengers': 'infinity woof'
    }
}
Try these:
get(fragment=d, query='woofage')
get(fragment=d, query='wooftastic')
get(fragment=d, query='wooftastic.0')
get(fragment=d, query='woofgeddon.woofvengers')
get(fragment=d, query='woofalistic', default='ultraWOOF')
Bon voyage!
Load the JSON into a dict, then look up the key you want and read or overwrite it:
import json
r = {'is_claimed': True, 'rating': 3.5}
r = json.dumps(r)  # JSON string: {"is_claimed": true, "rating": 3.5}
JSON to dict:
loaded_r = json.loads(r)  # {'is_claimed': True, 'rating': 3.5}
print(r)  # prints the JSON string
print(loaded_r)  # prints the dict
Read a key:
data = loaded_r['is_claimed']
print(data)  # True
Overwrite a key:
loaded_r['is_claimed'] = False
The same approach works for nested keys once your clients file has been loaded into loaded_r:
print(loaded_r['client1']['description'])

List Indices in json in Python

I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
    "Id": blah
    "Status": {
        "state": "complete",
        "datetime": "2016-01-01 01:00"
    },
    "Results": {
        "resultNumber": "500",
        "Summary": [
            {
                "Type": "blah",
                "Size": "10000000000",
                "OtherStuff": {
                    "valueOne": "first",
                    "valueTwo": "second"
                },
                "fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["Summary"]:
    print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
The example you posted is not valid JSON (no commas after object fields), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you fix it with proper commas, note that the "Summary" key is inside the "Results" object, which in turn is inside "RecordResponse", so you'd need to change your loop accordingly. Also open the file for reading: mode 'w' truncates it, and json.load cannot read from a file opened that way.
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["RecordResponse"]["Results"]["Summary"]:
    print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass the names along so non-default arguments survive the recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
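The function above stops at the first matching key and does not descend into lists. If the field might also sit inside arrays at unknown depth, a generator along these lines could be used instead. `find_all` is a hypothetical name, and the sample is a repaired, shortened version of the question's JSON with a second, made-up summary entry added for illustration.

```python
def find_all(obj, fieldname):
    """Yield every value stored under `fieldname`, at any nesting depth."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == fieldname:
                yield val
            else:
                yield from find_all(val, fieldname)
    elif isinstance(obj, list):
        # Lists have no keys of their own, so just search each element.
        for item in obj:
            yield from find_all(item, fieldname)


sample = {
    "RecordResponse": {
        "Id": "blah",
        "Results": {
            "resultNumber": "500",
            "Summary": [
                {"Type": "blah", "fieldIWant": "value i want is here"},
                {"Type": "other", "fieldIWant": "second value"},  # made-up entry
            ],
        },
    }
}
values = list(find_all(sample, "fieldIWant"))
print(values)
```

This also answers the index question: iterating over the "Summary" list visits each element in turn, so no manual index is needed.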

how to convert json in django models instances?

I am trying to save JSON data as Django model instances. I am new to django-rest-framework.
Here is my model:
class objective(models.Model):
    description = models.CharField(max_length=200)
    profile_name = models.CharField(max_length=100)
    pid = models.ForeignKey('personal_info')
serializer.py
class objective_Serilaizer(serializers.Serializer):
    description = serializers.CharField(max_length=200)
    profile_name = serializers.CharField(max_length=100)
    pid = serializers.IntegerField()
    def restore_object(self, attrs, instance=None):
        if instance:
            instance.description = attrs.get('description', instance.description)
            instance.profile_name = attrs.get('profile_name', instance.profile_name)
            instance.pid = attrs.get('pid', instance.pid)
            return instance
        return objective(**attrs)
json
{
    "objective": {
        "description": "To obtain job focusing in information technology.",
        "profile_name": "Default",
        "id": 1
    }
}
I tried
>>> stream = StringIO(json)
>>> data = JSONParser().parse(stream)
I am getting following error
raise ParseError('JSON parse error - %s' % six.text_type(exc))
ParseError: JSON parse error - No JSON object could be decoded
Use:
objective_Serilaizer(data=json)
or, more likely, since your JSON is the data on the request object (request.DATA was renamed to request.data in DRF 3.0):
objective_Serilaizer(data=request.DATA)
Here's a good walkthrough from the Django REST framework docs.
If you are sure your JSON as a string is correct then it should be easy to parse without going to the lengths you currently are:
>>> import json
>>> stream = """\
... {
... "objective": {
... "description": "To obtain job focusing in information technology.",
... "profile_name": "Default",
... "id": 1
... }
... }"""
>>> json.loads(stream)
{u'objective': {u'profile_name': u'Default', u'description': u'To obtain job focusing in information technology.', u'id': 1}}
So surely the question is how come you aren't able to parse it. Where is that JSON string you quote actually coming from? Once your JSON object is parsed, you need to address the top-level "objective" key to access the individual data elements in the record you want to create.
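The last point, addressing the top-level "objective" key, can be sketched as follows. This assumes the JSON arrives as a string exactly as quoted in the question; the final serializer call is shown only as a comment because it needs a configured Django project.

```python
import json

raw = """
{
    "objective": {
        "description": "To obtain job focusing in information technology.",
        "profile_name": "Default",
        "id": 1
    }
}
"""

payload = json.loads(raw)
# The fields live one level down, under the "objective" key.
record = payload["objective"]
print(record["profile_name"])
# In the view (assumption, not executed here):
# serializer = objective_Serilaizer(data=record)
```

Note that the record carries "id" while the serializer in the question declares "pid", so the two would still need to be reconciled before validation can succeed.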
