Django ORM uses comma as a delimiter? - python

I have a problem with Django: somehow the Django ORM seems to treat a comma as a delimiter.
Example code is below.
print sub_categorys.description # is printed as "drum class and drums feature"
print sub_categorys.image_url # is printed as ", bongo class no.jpg"
But the real database row has description = "drum class and drums feature, bongo class " and image_url = "no.jpg".
Please help me out here!
Thanks!
Additional explanation, in code, is below.
** model.py **
class SubCategory(models.Model):
    name = models.TextField(unique=True)
    description = models.TextField(null=True)
    image_url = models.URLField(null=True)
** views.py > code used to insert data into the model **
with open('./classes/resource/model/csv/sub_category_model.csv', 'rb') as f:
    reader = csv.reader(f)
    is_first = True
    for row in reader:
        if is_first:
            is_first = False
            continue
        sub_category = SubCategory(name=unicode(row[0], 'euc-kr'),
                                   description=unicode(row[3], 'euc-kr'),
                                   image_url=unicode(row[4], 'euc-kr'))
        try:
            sub_category.save()
        except Exception, e:
            logger.error(e)

It's not the ORM that's using the comma as a delimiter; it's csv.reader. If you want to import strings that contain commas, you'll have to wrap them in quotation marks, so make sure the CSV file contains the proper quoting. Given your code above, your CSV rows should read something like:
foo,bar,baz,"drum class and drums feature, bongo class",no.jpg
If that's a problem for some reason, you can choose another delimiter, e.g.:
reader = csv.reader(csvfile, delimiter='|')
would take as input:
foo|bar|baz|drum class and drums feature, bongo class|no.jpg
More examples are available in the csv module documentation.
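For reference, here is a minimal sketch (with made-up field values) of producing a correctly quoted file with csv.writer; QUOTE_MINIMAL wraps any field that contains the delimiter in quotes:

import csv

# Made-up rows for illustration; the fourth column contains a comma.
rows = [
    ['drums', '1', '2', 'drum class and drums feature, bongo class', 'no.jpg'],
]

with open('sub_category_model.csv', 'wb') as f:  # use 'w', newline='' on Python 3
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
    for row in rows:
        writer.writerow(row)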

Related

Django : invalid date format with Pandas - \xa0

I would like to create objects from a CSV file with Django and Pandas.
Everything is fine for FloatFields and CharFields but when I want to add DateFields, Django returns this error: ['The date format of the value "\xa02015/08/03\xa0" is not valid. The correct format is YYYY-MM-DD.']
However, the CSV file contains values like '2015/08/03' in the columns concerned. There is no space in the data, contrary to what Django seems to suggest...
Here is what I tried in the view:
class HomeView(LoginRequiredMixin, View):
    def get(self, request, *args, **kwargs):
        user = User.objects.get(id=request.user.id)
        Dossier.objects.filter(user=user).delete()
        csv_file = user.profile.user_data
        df = pd.read_csv(csv_file, encoding="UTF-8", delimiter=';', decimal=',')
        df = df.round(2)
        row_iter = df.iterrows()
        objs = [
            Dossier(
                user=user,
                numero_op=row['N° Dossier'],
                porteur=row['Bénéficiaire'],
                libélé=row['Libellé du dossier'],
                descriptif=row["Résumé de l'opération"],
                AAP=row["Référence de l'appel à projet"],
                date_dépôt=row["Date Dépôt"],
                date_réception=row["Accusé de réception"],
                montant_CT=row['Coût total en cours'],
            )
            for index, row in row_iter
        ]
        Dossier.objects.bulk_create(objs)
If I change my Model to CharField, I no longer get an error.
I tried to use the str.strip() function:
df["Date Dépôt"]=df["Date Dépôt"].str.strip()
But without success.
Could someone help me? I could keep the CharField format, but it limits the processing of the data I want to do next.
It seems that you have some garbage in that file; in particular, your date is surrounded by a "\xa0" byte on either side.
In some encodings this byte denotes a non-breaking space, which may be why you're not seeing it.
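A minimal sketch of cleaning that column before building the objects, assuming "Date Dépôt" holds strings like "\xa02015/08/03\xa0" (the column name comes from the question; the date format string is inferred from the sample value):

# Remove the non-breaking spaces explicitly, then parse the strings into real
# dates so the DateField receives date objects instead of raw text.
df["Date Dépôt"] = (
    df["Date Dépôt"]
    .str.replace("\xa0", "", regex=False)  # strip the non-breaking spaces
    .str.strip()                           # strip any ordinary whitespace left over
)
df["Date Dépôt"] = pd.to_datetime(df["Date Dépôt"], format="%Y/%m/%d").dt.date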

Is there a way to take input from a text file and put it in the set/get methods of a class?

I am doing a course project in Python and I am curious whether there is a way to write something similar to this (written in C++) in Python. I am struggling to transfer information from a text file into the setters/getters of a class I have already created.
while (file >> Code >> Name >> Description >> Price >> Quantity >> color >>
       Size >> BasketballRate) {
    Basketball* object3 = new Basketball();
    object3->SetName(Name);
    object3->SetCode(Code);
    object3->SetDescript(Description);
    object3->SetPrice(Price);
    object3->SetQuantity(Quantity);
    object3->setColor(color);
    object3->setSize(Size);
    object3->setBasketballRate(BasketballRate);
    basketball.push_back(object3);
}
file.close();
Getters and setters typically aren't used in Python since they're seldom needed and can effectively be added later (without breaking existing code) if there turns out to be some unusual reason to have one or more of them.
Here's an example of reading the data from a text file and using it to create instances of the Basketball class.
class Basketball:
    fields = ('name', 'code', 'description', 'price', 'quantity', 'color', 'size',
              'basketball_rate')

    def __init__(self):
        for field in type(self).fields:
            setattr(self, field, None)

    def __str__(self):
        args = []
        for field in type(self).fields:
            args.append(f'{field}={getattr(self, field)}')
        return f'{type(self).__name__}(' + ', '.join(args) + ')'


basketballs = []

with open('bb_info.txt') as file:
    while True:
        basketball = Basketball()
        try:
            for field in Basketball.fields:
                setattr(basketball, field, next(file).rstrip())
        except StopIteration:
            break  # End of file.
        basketballs.append(basketball)

for basketball in basketballs:
    print(basketball)
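For reference, the reading loop above expects one field per line, in the same order as Basketball.fields; a small sketch (with made-up values) of writing such a bb_info.txt:

# One record: eight lines, one per field, in the order of Basketball.fields.
sample = ['Street Ball', 'BB-001', 'Outdoor rubber basketball',
          '19.99', '12', 'orange', '7', '4.5']

with open('bb_info.txt', 'w') as f:
    f.write('\n'.join(sample) + '\n')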

How to speed up writing in a database?

I have a function which searches for JSON files in a directory, parses each file and writes the data to the database. My problem is writing to the database, because it takes around 30 minutes. Any idea how I can speed up the writes? I have a few quite big files to parse, but parsing is not the problem; it takes around 3 minutes. Currently I am using SQLite, but in the future I will change to PostgreSQL.
Here is my function:
def create_database():
    with transaction.atomic():
        directory = os.fsencode('data/web_files/unzip')
        for file in os.listdir(directory):
            filename = os.fsdecode(file)
            with open('data/web_files/unzip/{}'.format(filename.strip()), encoding="utf8") as f:
                data = json.load(f)
                cve_items = data['CVE_Items']
                for i in range(len(cve_items)):
                    database_object = DataNist()
                    try:
                        impact = cve_items[i]['impact']['baseMetricV2']
                        database_object.severity = impact['severity']
                        database_object.exp_score = impact['exploitabilityScore']
                        database_object.impact_score = impact['impactScore']
                        database_object.cvss_score = impact['cvssV2']['baseScore']
                    except KeyError:
                        database_object.severity = ''
                        database_object.exp_score = ''
                        database_object.impact_score = ''
                        database_object.cvss_score = ''
                    for vendor_data in cve_items[i]['cve']['affects']['vendor']['vendor_data']:
                        database_object.vendor_name = vendor_data['vendor_name']
                        for description_data in cve_items[i]['cve']['description']['description_data']:
                            database_object.description = description_data['value']
                        for product_data in vendor_data['product']['product_data']:
                            database_object.product_name = product_data['product_name']
                            database_object.save()
                            for version_data in product_data['version']['version_data']:
                                if version_data['version_value'] != '-':
                                    database_object.versions_set.create(version=version_data['version_value'])
My models.py:
class DataNist(models.Model):
    vendor_name = models.CharField(max_length=100)
    product_name = models.CharField(max_length=100)
    description = models.TextField()
    date = models.DateTimeField(default=timezone.now)
    severity = models.CharField(max_length=10)
    exp_score = models.IntegerField()
    impact_score = models.IntegerField()
    cvss_score = models.IntegerField()

    def __str__(self):
        return self.vendor_name + "-" + self.product_name


class Versions(models.Model):
    data = models.ForeignKey(DataNist, on_delete=models.CASCADE)
    version = models.CharField(max_length=50)

    def __str__(self):
        return self.version
I would appreciate any advice on how I can improve my code.
Okay, given the structure of the data, something like this might work for you.
This is standalone code aside from that .objects.bulk_create() call; as commented in the code, the two classes defined would actually be models within your Django app.
(By the way, you probably want to save the CVE ID as a unique field too.)
Your original code assumed that every "leaf entry" in the affected version data would have the same vendor, which may not be true. That's why the model structure here has a separate product-version model with vendor, product and version fields. (If you wanted to optimize things a little, you could deduplicate the AffectedProductVersions even across DataNists (which, as an aside, is not a perfect name for a model).)
And of course, as you had already done in your original code, the importing should be run within a transaction (transaction.atomic()).
Hope this helps.
import json
import os
import types


class DataNist(types.SimpleNamespace):  # this would actually be a model
    severity = ""
    exp_score = ""
    impact_score = ""
    cvss_score = ""

    def save(self):
        pass


class AffectedProductVersion(types.SimpleNamespace):  # this too
    # (foreign key to DataNist here)
    vendor_name = ""
    product_name = ""
    version_value = ""


def import_item(item):
    database_object = DataNist()
    try:
        impact = item["impact"]["baseMetricV2"]
    except KeyError:  # no impact object available
        pass
    else:
        database_object.severity = impact.get("severity", "")
        database_object.exp_score = impact.get("exploitabilityScore", "")
        database_object.impact_score = impact.get("impactScore", "")
        if "cvssV2" in impact:
            database_object.cvss_score = impact["cvssV2"]["baseScore"]
    for description_data in item["cve"]["description"]["description_data"]:
        database_object.description = description_data["value"]
        break  # only grab the first description
    database_object.save()  # save the base object
    affected_versions = []
    for vendor_data in item["cve"]["affects"]["vendor"]["vendor_data"]:
        for product_data in vendor_data["product"]["product_data"]:
            for version_data in product_data["version"]["version_data"]:
                affected_versions.append(
                    AffectedProductVersion(
                        data_nist=database_object,
                        vendor_name=vendor_data["vendor_name"],
                        product_name=product_data["product_name"],
                        version_value=version_data["version_value"],
                    )
                )
    AffectedProductVersion.objects.bulk_create(
        affected_versions
    )  # save all the version information
    return database_object  # in case the caller needs it


with open("nvdcve-1.0-2019.json") as infp:
    data = json.load(infp)

for item in data["CVE_Items"]:
    import_item(item)
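As mentioned above, inside the actual Django app you would run the import in a single transaction; a minimal sketch reusing the import_item() function and the file name from the example:

import json

from django.db import transaction

with open("nvdcve-1.0-2019.json") as infp:
    data = json.load(infp)

with transaction.atomic():  # commit everything at once instead of per save()
    for item in data["CVE_Items"]:
        import_item(item)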

Separate file for keeping inventory (of books)

I have a program that is basically a library simulation: you can look up books, edit them, delete them, etc.
In my program I've initialized some default books into a class such as this:
class BookData:
    def __init__(self):
        self.bookTitle = ''
        self.isbn = ''
        self.author = ''
        self.publisher = ''
        self.dateAdded = ''
        self.quantity = 0.0
        self.wholesale = 0.0
        self.retail = 0.0

    def __str__(self):
        return 'Title: {} ISBN: {} Author: {} ' \
               'Publisher: {} Date Added: {} ' \
               'Quantity: {} Wholesale Value: {} ' \
               'Retail Value: {}'.format(self.bookTitle, self.isbn, self.author, self.publisher, self.dateAdded,
                                         self.quantity, self.wholesale, self.retail)
An example of a book I have stored in the program:
book0.bookTitle, book0.isbn, book0.author, book0.publisher, book0.dateAdded, book0.quantity, book0.wholesale, book0.retail = "INTRODUCING PYTHON", "978-1-4493-5936-2", "Bill Lubanovic", "O'Reilly Media, Inc.", "11/24/2014", 25, 39.95, 50.00
Each book then gets appended into a list.
What I want to do is store all the books in a separate file so that they can be updated and edited within that file, but I don't quite get how to properly open the file, read each part (such as title, isbn, author), and then, in the main program, turn those into BookData objects and put them into a list.
I've considered a plain .txt document with commas as the format. I don't know if something like JSON or XML would make this easier.
Pseudocode example:
open(file):
    for word in file:
        create book with title, author, isbn, etc in file
        append to list of books
Python natively supports CSV (comma-separated values) files: Python documentation
An example would be:
import csv

books = []

with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        book = BookData()
        book.bookTitle = row[0]
        book.isbn = row[1]
However, that being said, it may be more constructive to change your constructor (ha ha) to take in a row, and then assign it directly:
def __init__(self, row):
    self.bookTitle = row[0]
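A fuller sketch of that idea, assuming the CSV columns appear in the same order as the attributes in BookData (the file name file.csv is just an example):

import csv

class BookData:
    def __init__(self, row):
        # Columns assumed: title, isbn, author, publisher, date added,
        # quantity, wholesale, retail.
        self.bookTitle = row[0]
        self.isbn = row[1]
        self.author = row[2]
        self.publisher = row[3]
        self.dateAdded = row[4]
        self.quantity = int(row[5])
        self.wholesale = float(row[6])
        self.retail = float(row[7])

books = []
with open('file.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        books.append(BookData(row))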

Django w/ sqlite3 CharField crashing on special character: "Rüppell's Vulture"

Django version 1.9, DB backend: sqlite3.
I am having a hard time figuring out how to handle this error. I am importing the master bird species list (available here) into a set of Django models. I had the import going well, but it crashes when I try to save the value Rüppell's Vulture into the model. The target field is defined like this:
species_english = models.CharField(max_length=100, default=None, blank=True, null=True)
Here is the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory =
str). It is highly recommended that you instead just switch your
application to Unicode strings.
I was reading through Django's documentation about unicode strings, which starts off beautifully like this:
Django natively supports Unicode data everywhere. Providing your
database can somehow store the data, you can safely pass around
Unicode strings to templates, models and the database.
Also, looking up information about this character, ü, it has a representation in both Unicode and UTF-8.
The method for saving this string to the DB is very straight-forward, I am simply parsing the CSV file using csv.reader:
new_species = Species(genus=new_genus, species=row[4], species_english=row[7])
Where the error-throwing string is contained in row[7]. What am I missing about why the database will not allow this character?
UPDATE
Here is the content of the whole script importing the data:
import csv
from birds.models import SpeciesFile, Order, Family, Genus, Species, Subspecies

csv_file = str(SpeciesFile.objects.all()[0].species_list)

#COLUMNS
#0 - Order
#1 - Family Scientific
#2 - Family (English)
#3 - Genus
#4 - Species
#5 - SubSpecies

with open("birds/media/"+csv_file.split('/')[1], 'rU') as c:
    Order.objects.all().delete()
    Family.objects.all().delete()
    Genus.objects.all().delete()
    Species.objects.all().delete()
    Subspecies.objects.all().delete()
    reader = csv.reader(c, delimiter=';', quotechar='"')
    ini_rows = 4
    for row in reader:
        if ini_rows > 0:
            ini_rows -= 1
            continue
        if row[0]:
            new_order = Order(order=row[0])
            new_order.save()
        elif row[1]:
            new_fam = Family(order=new_order, family_scientific=row[1], family_english=row[2])
            new_fam.save()
        elif row[3]:
            new_genus = Genus(family=new_fam, genus=row[3])
            new_genus.save()
        elif row[4]:
            print row[4]
            new_species = Species(genus=new_genus, species=row[4], species_english=row[7])
            new_species.save()
        elif row[5]:
            print row[5]
            new_subspecies = Subspecies(species=new_species, subspecies=row[5])
            new_subspecies.save()
And here are the models.py file definitions:
from __future__ import unicode_literals

from django.db import models


class SpeciesFile(models.Model):
    species_list = models.FileField()


class Order(models.Model):
    order = models.CharField(max_length=100)

    def __str__(self):
        return self.order


class Family(models.Model):
    order = models.ForeignKey(Order)
    family_scientific = models.CharField(max_length=100)
    family_english = models.CharField(max_length=100)

    def __str__(self):
        return self.family_english+" "+self.family_scientific


class Genus(models.Model):
    family = models.ForeignKey(Family)
    genus = models.CharField(max_length=100)

    def __str__(self):
        return self.genus


class Species(models.Model):
    genus = models.ForeignKey(Genus, default=None)
    species = models.CharField(max_length=100, default=None)
    species_english = models.CharField(max_length=100, default=None, blank=True, null=True)

    def __str__(self):
        return self.species+" "+self.species_english


class Subspecies(models.Model):
    species = models.ForeignKey(Species)
    subspecies = models.CharField(max_length=100)

    def __str__(self):
        return self.subspecies
Django CharField is a character-oriented format. You need to pass it Unicode strings.
CSV is a byte-oriented format. When you read data out of a CSV file you get byte strings.
To get from bytes to characters you have to know what encoding was used when the original characters were turned into bytes as the CSV file was exported. Ideally that would be UTF-8, but if the file has come out of Excel it probably won't be. Maybe it's Windows-1252 (‘ANSI’ code page for Western European installations). Maybe it's something else.
(Django/Python 2 lets you get away with writing byte strings to Unicode properties when they contain only ASCII bytes (0–127), because those have the same mapping in a lot of encodings. ASCII is a ‘best guess’ at Do What I Mean, but it's not reliable, and Python 3 prefers to raise errors if you try.)
So:
new_order = Order(order=row[0].decode('windows-1252'))
or, to decode the whole row at once:
row = [s.decode('windows-1252') for s in row]
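For example, the row-level decode could be dropped into the existing loop right after the reader yields each row; 'windows-1252' here is only a guess at the export encoding, so swap in 'utf-8' (or whatever the file actually uses) if that guess turns out to be wrong:

reader = csv.reader(c, delimiter=';', quotechar='"')
ini_rows = 4
for row in reader:
    # Decode every cell from bytes to unicode before using it.
    row = [s.decode('windows-1252') for s in row]
    if ini_rows > 0:
        ini_rows -= 1
        continue
    # ... the rest of the if/elif processing stays unchanged ...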
