I would like to create objects from a CSV file with Django and Pandas.
Everything is fine for FloatFields and CharFields but when I want to add DateFields, Django returns this error: ['The date format of the value "\xa02015/08/03\xa0" is not valid. The correct format is YYYY-MM-DD.']
However, the CSV file contains data of this form in the columns concerned: '2015/08/03'. There is no space in the data, contrary to what Django seems to suggest...
Here is what I tried in the view:
class HomeView(LoginRequiredMixin, View):
    def get(self, request, *args, **kwargs):
        user = User.objects.get(id=request.user.id)
        Dossier.objects.filter(user=user).delete()
        csv_file = user.profile.user_data
        df = pd.read_csv(csv_file, encoding="UTF-8", delimiter=';', decimal=',')
        df = df.round(2)
        row_iter = df.iterrows()
        objs = [
            Dossier(
                user=user,
                numero_op=row['N° Dossier'],
                porteur=row['Bénéficiaire'],
                libélé=row['Libellé du dossier'],
                descriptif=row["Résumé de l'opération"],
                AAP=row["Référence de l'appel à projet"],
                date_dépôt=row["Date Dépôt"],
                date_réception=row["Accusé de réception"],
                montant_CT=row['Coût total en cours'],
            )
            for index, row in row_iter
        ]
        Dossier.objects.bulk_create(objs)
If I change the field in my model to a CharField, I no longer get an error.
I tried to use the str.strip() function:
df["Date Dépôt"] = df["Date Dépôt"].str.strip()
But without success.
Could someone help me? I could keep the CharField format, but it limits the processing of the data I want to do next.
It seems that you have some garbage in that file; in particular, your date is surrounded by a "\xa0" byte on either side.
In some encodings this byte denotes a non-breaking space, which may be why you're not seeing it.
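For example, one way to clean the column before building the objects (a sketch, not part of the original answer; it assumes the column holds strings like '\xa02015/08/03\xa0'). Note that even with the spaces removed, '2015/08/03' still isn't the YYYY-MM-DD string Django expects, so converting to real date objects sidesteps both problems:

# Hedged sketch: clean the "Date Dépôt" column before the list comprehension.
df["Date Dépôt"] = (
    df["Date Dépôt"]
    .str.replace("\xa0", "", regex=False)  # drop the non-breaking spaces
    .str.strip()
)
# Parse the YYYY/MM/DD strings into real date objects; DateField accepts these directly
df["Date Dépôt"] = pd.to_datetime(df["Date Dépôt"], format="%Y/%m/%d").dt.date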
I am new to Python and currently learning the language. I am trying to build a web scraper that exports its data to a CSV. I have the data I want and have dumped it to a CSV, but I have only managed to dump the data from one index, and I want to dump the data from all the indexes into the same CSV to form a database.
The problem is that I can only request one company by indicating its index. For example, n_company[0] gives me the data for the first index of the list. What I want is to get the data for all the indexes in the same function and then dump it with pandas into a CSV, so I can build a DB from it.
I'm stuck at this point and don't know how to proceed. Can you help me, please?
This is the function:
def datos_directorio(n_empresa):
    r = session.get(n_empresa[0])
    home = r.content.decode('UTF-8')
    tree = html.fromstring(home)

    descripcion_direccion_empresas = '//p[@class = "paragraph"][2]//text()[normalize-space()]'
    nombre_e = '//h1[@class ="mb3 h0 bold"][normalize-space()]/text()'
    email = '//div[@class = "inline-block mb1 mr1"][3]/a[@class = "mail button button-inverted h4"]/text()[normalize-space()]'
    teléfono = '//div[@class = "inline-block mb1 mr1"][2]/a[@class = "tel button button-inverted h4"]/text()[normalize-space()]'

    d_empresas = tree.xpath(descripcion_direccion_empresas)
    d_empresas = " ".join(d_empresas)

    empresas_n = tree.xpath(nombre_e)
    empresas_n = " ".join(empresas_n[0].split())

    email_e = tree.xpath(email)
    email_e = " ".join(email_e[0].split())

    teléfono_e = tree.xpath(teléfono)
    teléfono_e = " ".join(teléfono_e[0].split())

    contenido = {
        'EMPRESA': empresas_n,
        'EMAIL': email_e,
        'TELÉFONO': teléfono_e,
        'CONTACTO Y DIRECCIÓN': d_empresas
    }
    return contenido
Best regards.
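One possible way to collect the data for every company in one pass and dump it with pandas (a hedged sketch, not part of the original post: the list name n_empresas, the helper name exportar_directorio, and the output filename are assumptions; it reuses the datos_directorio function above):

import pandas as pd

def exportar_directorio(n_empresas, ruta_csv="empresas.csv"):
    registros = []
    for url in n_empresas:
        # datos_directorio reads index 0 of its argument, so wrap each URL in a list
        registros.append(datos_directorio([url]))
    # One dict per company -> one row per company in the CSV
    df = pd.DataFrame(registros)
    df.to_csv(ruta_csv, index=False, encoding="utf-8")
    return df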
Suppose you have a .bib file containing bibtex-formatted entries. I want to extract the "title" field from an entry and then format it into a readable Unicode string.
For example, if the entry was:
@article{mypaper,
    author = {myself},
    title = {A very nice {title} with annoying {symbols} like {\^{a}}}
}
what I want to extract is the string:
A very nice title with annoying symbols like â
I am currently trying to use the pybtex package, but I cannot figure out how to do it. The command-line utility pybtex-format does a good job in converting full .bib files, but I need to do this inside a script and for single title entries.
Figured it out:
def load_bib(filename):
    from pybtex.database.input.bibtex import Parser
    parser = Parser()
    DB = parser.parse_file(filename)
    return DB

def get_title(entry):
    from pybtex.plugin import find_plugin
    style = find_plugin('pybtex.style.formatting', 'plain')()
    backend = find_plugin('pybtex.backends', 'plaintext')()
    sentence = style.format_title(entry, 'title')
    data = {'entry': entry,
            'style': style,
            'bib_data': None}
    T = sentence.f(sentence.children, data)
    title = T.render(backend)
    return title

DB = load_bib("bibliography.bib")
print(get_title(DB.entries["entry_label"]))
where entry_label must match the label you use in LaTeX to cite the bibliography entry.
Building upon the answer by Daniele, I wrote this function that lets one render fields without having to use a file.
from io import StringIO
from pybtex.database.input.bibtex import Parser
from pybtex.plugin import find_plugin

def render_fields(author="", title=""):
    """The arguments are in bibtex format. For example, they may contain
    things like \'{i}. The output is a dictionary with these fields
    rendered in plain text.

    If you run tests by defining a string in Python, use r'''string''' to
    avoid issues with escape characters.
    """
    parser = Parser()
    istr = r'''
    @article{foo,
        Author = {''' + author + r'''},
        Title = {''' + title + '''},
    }
    '''
    bib_data = parser.parse_stream(StringIO(istr))
    style = find_plugin('pybtex.style.formatting', 'plain')()
    backend = find_plugin('pybtex.backends', 'plaintext')()
    entry = bib_data.entries["foo"]
    data = {'entry': entry, 'style': style, 'bib_data': None}

    sentence = style.format_author_or_editor(entry)
    T = sentence.f(sentence.children, data)
    rendered_author = T.render(backend)[0:-1]  # exclude period

    sentence = style.format_title(entry, 'title')
    T = sentence.f(sentence.children, data)
    rendered_title = T.render(backend)[0:-1]  # exclude period

    return {'title': rendered_title, 'author': rendered_author}
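For example, a small usage sketch (the field values are made up; the exact rendered author string depends on how the 'plain' style formats names):

fields = render_fields(
    author=r"D\'{i}az, Juan",
    title=r"A very nice {title} with annoying {symbols} like {\^{a}}",
)
print(fields['title'])   # -> A very nice title with annoying symbols like â
print(fields['author'])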
I am using Django's get_or_create to save data into Postgres. The code works fine, but the itemgrp1hd field is saved as ('Mobile 5010',) even though I only fed it Mobile 5010. Can anyone explain why the parentheses and single quotes appear when the value is saved in Postgres?
The code is as below:
@api_view(['GET', 'POST', 'PUT', 'DELETE'])
def Post_Items_Axios(request):
    data_itemfullhd = request.data['Item Name']
    data_itemgrp1hd = request.data['Item Group1']

    td_items, created = Md_Items.objects.get_or_create(
        cunqid = entity_unqid,
        itemfullhd = data_itemfullhd,
        # defaults = dict(
        #     itemgrp1hd = data_itemgrp1hd,
        # )
    )

    # type(request.data['Item Group1'])
    # <class 'str'>
    td_items.itemgrp1hd = data_itemgrp1hd,
    td_items.save()

    data = {'data_itemfullhd': data_itemfullhd}
    return Response(data)
You must remove the trailing comma at the end of the td_items.itemgrp1hd assignment line.
Change
td_items.itemgrp1hd = data_itemgrp1hd,
td_items.save()
To
td_items.itemgrp1hd = data_itemgrp1hd
td_items.save()
Having a comma at the end tells Python that you want the value wrapped in a tuple.
See this question here for more about trailing commas and tuples.
What is the syntax rule for having trailing commas in tuple definitions?
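A quick illustration of the effect:

value = "Mobile 5010",   # trailing comma -> one-element tuple
print(value)             # ('Mobile 5010',)

value = "Mobile 5010"    # no trailing comma -> plain string
print(value)             # Mobile 5010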
I have a problem with Django: somehow the Django ORM appears to treat a comma as a delimiter.
Example code is below.
print sub_categorys.description # is printed as "drum class and drums feature"
print sub_categorys.image_url # is printed as ", bongo class no.jpg"
But the real data is description = "drum class and drums feature, bongo class" and image_url = "no.jpg".
Please help me out here!
Thanks!
Additional explanation below, with code.
** model.py **
class SubCategory(models.Model):
    name = models.TextField(unique=True)
    description = models.TextField(null=True)
    image_url = models.URLField(null=True)
** views.py > code used to insert data into the model **
with open('./classes/resource/model/csv/sub_category_model.csv', 'rb') as f:
    reader = csv.reader(f)
    is_first = True
    for row in reader:
        if is_first:
            is_first = False
            continue
        sub_category = SubCategory(name=unicode(row[0], 'euc-kr'),
                                   description=unicode(row[3], 'euc-kr'),
                                   image_url=unicode(row[4], 'euc-kr'))
        try:
            sub_category.save()
        except Exception, e:
            logger.error(e)
It's not the ORM that's using the comma as a delimiter, it's csv.reader. If you want to import strings that contain commas, you'll have to wrap them in quotation marks. Make sure the CSV file contains the proper quoting. Given your code above, your CSV rows should read something like:
foo,bar,baz,"drum class and drums feature, bongo class",no.jpg
If that's a problem for some reason, you can choose other delimiters, e.g.:
reader = csv.reader(csvfile, delimiter='|')
would take as input:
foo|bar|baz|drum class and drums feature, bongo class|no.jpg
More examples are available in the csv module documentation.
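A quick standalone check of the quoting behaviour (a small sketch; csv.reader accepts any iterable of lines):

import csv

sample = ['foo,bar,baz,"drum class and drums feature, bongo class",no.jpg']
row = next(csv.reader(sample))
print(row[3])  # drum class and drums feature, bongo class
print(row[4])  # no.jpg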
Want to prompt browser to save csv
Working off the above question, the file is exporting correctly but the data is not displaying correctly.
@view_config(route_name='csvfile', renderer='csv')
def csv(self):
    name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
    header = ['name']
    rows = []
    for item in name:
        rows = [item.id]
    return {
        'header': header,
        'rows': rows
    }
Getting _csv.Error: sequence expected. But if I change writer.writerows(value['rows']) to writer.writerow(value['rows']) in my renderer, the file downloads via the browser just fine. The problem is that the data isn't displayed in separate rows: the entire result set ends up in one row, so each entry is in its own column rather than its own row.
First, I wonder if having a return statement inside your for loop isn't also causing problems; from the linked example it looks like their loop was in the prior statement.
I think what it's doing is building a collection of rows based on "table" having columns with the same names as the headers. What are the fields in your table?
name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
This is going to give you back essentially a collection of rows from table, as if you did a SELECT query on it.
Something like
name = DBSession.query(table).join(othertable).filter(othertable.id == 9701).all()
header = ['name']
rows = []
for item in name:
    rows.append(item.name)
return {
    'header': header,
    'rows': rows
}
Figured it out. I kept getting Error: sequence expected, so I looked at the output and decided to try putting each result inside another list.
@view_config(route_name='csv', renderer='csv')
def csv(self):
    d = datetime.now()
    query = DBSession.query(table, othertable).join(othertable).join(thirdtable).filter(
        thirdtable.sid == 9701)
    header = ['First Name', 'Last Name']
    rows = []
    filename = "csvreport" + d.strftime(" %m/%d").replace(' 0', '')
    for i in query:
        items = [i.table.first_name, i.table.last_name,
                 i.othertable.login_time.strftime("%m/%d/%Y"),
                 ]
        rows.append(items)
    return {
        'header': header,
        'rows': rows,
        'filename': filename
    }
This accomplishes three things: it fills out the header, fills the rows, and passes a filename through.
The renderer should look like this:
import csv
import StringIO

class CSVRenderer(object):
    def __init__(self, info):
        pass

    def __call__(self, value, system):
        fout = StringIO.StringIO()
        writer = csv.writer(fout, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(value['header'])
        writer.writerows(value['rows'])
        resp = system['request'].response
        resp.content_type = 'text/csv'
        resp.content_disposition = 'attachment;filename=' + value['filename'] + '.csv'
        return fout.getvalue()
This way, you can use the same csv renderer anywhere else and be able to pass through your own filename. It's also the only way I could figure out how to get the data from one column in the database to iterate through one column in the renderer. It feels a bit hacky but it works and works well.
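For completeness, the custom renderer still has to be registered with Pyramid so that renderer='csv' resolves to it (a short sketch assuming the usual Configurator setup; where the CSVRenderer class lives is up to you):

from pyramid.config import Configurator

def main(global_config, **settings):
    config = Configurator(settings=settings)
    # Register the CSVRenderer class defined above under the name 'csv'
    config.add_renderer('csv', CSVRenderer)
    config.scan()
    return config.make_wsgi_app()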