Django MySQL string conversion utf8 and unicode? - python

Well, I have a django model that stores the name of a music band.
Trouble arises when a band called '✝✝✝ (Crosses)' is stored in a TextField.
This is the error:
django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xE2\\x9C\\x9D\\xE2\\x9C\\x9D...' for column 'album_name' at row 1")
But this gets weird, because I have another table that stores the band info in a JsonField, and the same name '✝✝✝ (Crosses)' is stored there correctly. The JsonField used to be a TextField that stored json.dumps(dict_with_band_info), so the database holds something like
{ "name": "✝✝✝ (Crosses)" ...}. And I repeat: this was a TextField before and it worked as expected.
So why does attempting to add "name": "✝✝✝ (Crosses)" to the db TextField raise that error in one table but not in the other? I'm using pdb.set_trace() to inspect the values before the save().
I would like to stress again that the error never appeared, even when the JsonField in my band info table was still a TextField, yet it does appear for the band_name TextField, exactly at instance.save(). From this I deduce that my text fields are ready to receive Unicode, because the JsonField in the band info table shows "✝✝✝ (Crosses)" just fine. So why is Python applying UTF-8 in the step of saving the band name TextField?
The only difference I can see is in how I call the model.
When I save the band info:
from bands.models import BandInfo
from apis import music_api as api

# Expected to be a dict
band_info = api.get_band_info(song="This is a trick", singer="chino moreno")[0]
band = BandInfo()
band.band_info = band_info  # {'name': '✝✝✝ (Crosses)'}
band.save()
and when I save the band_name:
def save_info(Table, data: dict):
    instance_ = Table(
        name=data['name']  # '✝✝✝ (Crosses)'
    )
    instance_.save()
then in another file:
from apis import music_api as api
from bands import snippets
from bands.models import Tracks

track_info = api.get_track_info(song="This is a trick", singer="chino moreno")[0]
snippets.save_info(Tracks, track_info)
Using: Python 3.9.1
Django 3.1.7
MySQL Workbench 8 with the community installation
Well, I hope I'm just making an obvious mistake.

MySQL's utf8 encoding permits only the Unicode characters that can be represented with 3 bytes in UTF-8. If you have MySQL 5.5 or later, you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.
As for why the old TextField full of json.dumps() output never triggered this: json.dumps escapes non-ASCII characters to \uXXXX sequences by default (ensure_ascii=True), so the string that actually reaches MySQL is plain ASCII and fits in any column encoding. The band name field receives the raw '✝' characters instead, which is why the column encoding suddenly matters there.
To make the change, set the charset option to utf8mb4 in the OPTIONS dict of the DATABASES setting in the Django settings file:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'my_db',
        'USER': 'my_user',
        'PASSWORD': 'my_pass',
        'HOST': 'my.host',
        'OPTIONS': {
            'charset': 'utf8mb4'  # This is the important line
        }
    }
}
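Note that the charset option only affects new connections; a table or column that was already created with a narrower encoding still has to be converted. A minimal sketch of a one-off conversion, assuming the table is named bands_tracks (the name is a guess, check yours with SHOW CREATE TABLE; run it once, e.g. from a Django shell):

from django.db import connection

# Hypothetical one-off conversion; the table name 'bands_tracks' is an assumption.
with connection.cursor() as cursor:
    cursor.execute(
        "ALTER TABLE bands_tracks "
        "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
    )

And a quick check of the json.dumps behaviour mentioned above:

import json

print(json.dumps({"name": "✝✝✝ (Crosses)"}))
# {"name": "\u271d\u271d\u271d (Crosses)"}  <- pure ASCII, fits any column encoding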

Related

Using additional domain fields in WHMCS with python

I'm trying to register a domain for customers using WHMCS apis with python.
The issue I'm having is with the additional domain fields, in this case for the .eu domain:
import base64
import phpserialize
from urllib.parse import urlencode

additional_domainfields = base64.b64encode(phpserialize.dumps({
    'Entity Type': 'INDIVIDUAL',
    'EU Country of Citizenship': 'IT',
}))
post_data = {
    'identifier': 'xxx',
    'secret': 'xxx',
    'action': 'AddOrder',
    'clientid': client_id,
    'domain': domain,
    'domaintype': "register",
    'billingcycle': "annually",
    'regperiod': 1,
    'noinvoice': True,
    'noinvoiceemail': True,
    'noemail': True,
    'paymentmethod': "stripe",
    'domainpriceoverride': 0,
    'domainrenewoverride': 0,
    'domainfields': additional_domainfields,
    'responsetype': 'json',
}
# c is a pycurl.Curl() handle created earlier (not shown here)
postfields = urlencode(post_data)
c.setopt(c.POSTFIELDS, postfields)  # type: ignore
c.perform()
c.reset()
I can't find what to use for the array of serialized data; so far all of my tests have come to nothing.
I tried every combination of fields on the line:
additional_domainfields = base64.b64encode(phpserialize.dumps({'Entity Type':'INDIVIDUAL', 'EU Country of Citizenship':'IT'}))
Also, I tried checking for the PHP overrides for the field names, so far without success.
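As a sanity check that the serialization itself round-trips (same phpserialize and base64 as in the snippet above; this only verifies the encoding, not the field names WHMCS expects):

import base64
import phpserialize

fields = {'Entity Type': 'INDIVIDUAL', 'EU Country of Citizenship': 'IT'}
encoded = base64.b64encode(phpserialize.dumps(fields))
print(phpserialize.loads(base64.b64decode(encoded)))
# {b'Entity Type': b'INDIVIDUAL', b'EU Country of Citizenship': b'IT'}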
Any help would be immensely appreciated

Push turtle file in bytes form to stardog database using pystardog

def add_graph(file, file_name):
    file.seek(0)
    file_content = file.read()
    if 'snomed' in file_name:
        conn.add(stardog.content.Raw(file_content, content_type='bytes',
                                     content_encoding='utf-8'),
                 graph_uri='sct:900000000000207008')
Here I'm facing issues pushing the file, which I downloaded from an S3 bucket and which is in bytes form. Pushing this data to the Stardog database throws a stardog.Exception with a 500 error.
I tried pushing the bytes directly as shown below, but that also didn't help.
conn.add(content.File(file),
         graph_uri='<http://www.example.com/ontology#concept>')
Can someone help me push a turtle file, which is in bytes form, to a Stardog database using the pystardog Python library?
I believe this is what you are looking for:
import stardog

conn_details = {
    'endpoint': 'http://localhost:5820',
    'username': 'admin',
    'password': 'admin'
}
conn = stardog.Connection('myDb', **conn_details)  # assuming you have this since you already have 'conn', just sending it to a DB named 'myDb'

file = open('snomed.ttl', 'rb')  # just opening a file as a binary object to mimic your input
file_name = 'snomed.ttl'  # adding this to keep your function as it is

def add_graph(file, file_name):
    file.seek(0)
    file_content = file.read()  # this will be of type bytes
    if 'snomed' in file_name:
        conn.begin()  # added this to begin a transaction, but I do not think it is required
        conn.add(stardog.content.Raw(file_content, content_type='text/turtle'),
                 graph_uri='sct:900000000000207008')
        conn.commit()  # added this to commit the added data

add_graph(file, file_name)  # I just ran this directly in the Python file for the example.
Take note of the conn.add line where I used text/turtle as the content-type. I added some more context so it can be a running example.
Here is the sample file as well, snomed.ttl:

<http://api.stardog.com/id=1> a :person ;
    <http://api.stardog.com#first_name> "John" ;
    <http://api.stardog.com#id> "1" ;
    <http://api.stardog.com#dob> "1995-01-05" ;
    <http://api.stardog.com#email> "john.doe@example.com" ;
    <http://api.stardog.com#last_name> "Doe" .
EDIT - Query Test
If it runs successfully and there are no errors in stardog.log, you should be able to see results using the query below. Note that you have to specify the named graph, since the data was added there; if you query without specifying it, you will see no results.
SELECT * {
    GRAPH <sct:900000000000207008> {
        ?s ?p ?o
    }
}
You can run that query in Stardog Studio, but if you want it in Python, this will print the JSON result:
print(conn.select('SELECT * { GRAPH <sct:900000000000207008> { ?s ?p ?o } }'))

Unable to read environment variables in Django using django_configurations package

I was using django-environ to manage env variables and everything was working fine; recently I moved to django-configurations.
My settings inherit from configurations.Configuration, but I am having trouble getting values from the .env file. For example, retrieving DATABASE_NAME gives the following error:
TypeError: object of type 'Value' has no len()
I know the code below returns a values.Value instance instead of a string, but I am not sure why it does so. The same is the case with every other env variable.
My .env file is as follows:
DEBUG=True
DATABASE_NAME='portfolio_v1'
SECRET_KEY='your-secret-key'
My settings.py file is as follows:
...
from configurations import Configuration, values
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': values.Value("DATABASE_NAME", environ=True),
        ...
I have verified that my .env file exists and is on a valid path.
I spent more time resolving the above issue and found what was missing.
Prefixing .env variables is mandatory in django-configurations as the default behavior.
When dealing with dict keys, we have to provide the environ_name kwarg to the Value instance.
NOTE: .env variables should be prefixed with DJANGO_ even if you provide environ_name. If you want to override the prefix, you have to provide environ_prefix. I.e.:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': values.Value(environ_name="DATABASE_NAME"),  # provide DJANGO_DATABASE_NAME='portfolio_v1' in the .env file
Other use cases are:
VAR = values.Value()  # works, provided DJANGO_VAR='var_value'
VAR = values.Value(environ_prefix='MYSITE')  # works, provided MYSITE_VAR='var_value'
CUSTOM_DICT = {
    'key_1': values.Value(environ_required=True),  # doesn't work
    'key_2': values.Value(environ_name='other_key'),  # works if DJANGO_other_key='value_2' is provided in .env
}
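If you want no prefix at all, passing an empty environ_prefix should work too (hedged: this relies on Value.full_environ_name only prepending the prefix when it is truthy):

VAR = values.Value(environ_prefix='')  # reads plain VAR='var_value' from the environment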
You are using django-configurations in the wrong way.
See the source code of the Value class:
class Value:
    @property
    def value(self):
        ...

    def __init__(self, default=None, environ=True, environ_name=None,
                 environ_prefix='DJANGO', environ_required=False,
                 *args, **kwargs):
        ...
So the first positional argument is the default value, which means you do not want "DATABASE_NAME" there, and the environment variable in your .env file should start with DJANGO_.
Then, to use the value, you can use the value property, so your settings file should look like:
...
from configurations import Configuration, values
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': values.Value("DEFAULT_VAL").value,
        # No need for environ=True since it is the default
        ...
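A hedged variant that ties this back to the original setting (the default value and the environ_name here are illustrative):

'NAME': values.Value('portfolio_v1', environ_name='DATABASE_NAME').value,
# reads DJANGO_DATABASE_NAME from the environment, falls back to 'portfolio_v1'

with the matching .env entry:

DJANGO_DATABASE_NAME='portfolio_v1'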

Elasticsearch - Reindex single field with different analyzer using Python

I use dynamic mapping in Elasticsearch to load my JSON file into Elasticsearch, like this:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def extract():
    f = open('tmdb.json')
    if f:
        return json.loads(f.read())

movieDict = extract()

def index(movieDict={}):
    for id, body in movieDict.items():
        es.index(index='tmdb', id=id, doc_type='movie', body=body)

index(movieDict)
How can I update the mapping for a single field? I have a field title to which I want to add a different analyzer.
title_settings = {"properties": {"title": {"type": "text", "analyzer": "english"}}}
es.indices.put_mapping(index='tmdb', body=title_settings)
This fails.
I know that I cannot update an already existing index, but what is the proper way to reindex a mapping generated from a JSON file? My file has a lot of fields; creating the mapping/settings manually would be very troublesome.
I am able to specify an analyzer for a query, like this:
query = {"query": {
    "multi_match": {
        "query": userSearch, "analyzer": "english", "fields": ['title^10', 'overview']}}}
How do I specify it for an index or a field?
I am also able to put an analyzer into the settings after closing and reopening the index:
analysis = {'settings': {'analysis': {'analyzer': 'english'}}}
es.indices.close(index='tmdb')
es.indices.put_settings(index='tmdb', body=analysis)
es.indices.open(index='tmdb')
Copying the exact settings for the English analyzer doesn't 'activate' it for my data:
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-lang-analyzer.html#english-analyzer
By 'activate' I mean that search results are not returned in a form processed by the english analyzer, i.e. there are still stopwords.
Solved it with a massive amount of googling...
You cannot change the analyzer on already indexed data. This includes opening/closing of the index. You can specify a new index, create the new mapping, and load your data into it (quickest way).
Specifying an analyzer for the whole index isn't a good solution, as the 'english' analyzer is specific to 'text' fields. It's better to specify the analyzer per field.
If analyzers are specified by field, you also need to specify the type.
You also need to remember that analyzers can be used at index time, at search time, or both. Reference: Specifying analyzers
Code:
import time

def create_index(movieDict={}, mapping={}):
    es.indices.create(index='test_index', body=mapping)
    start = time.time()
    for id, body in movieDict.items():
        es.index(index='test_index', id=id, doc_type='movie', body=body)
    print("--- %s seconds ---" % (time.time() - start))
Now, I've got the mapping from the dynamic mapping of my JSON file. I just saved it back to a JSON file for ease of processing (editing), because I have over 40 fields to map and doing it by hand would be tiresome:
mapping = es.indices.get_mapping(index='tmdb')
This is an example of how the title key should be specified to use the english analyzer:
'title': {'type': 'text', 'analyzer': 'english', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}
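If you would rather copy the already-indexed documents into the newly mapped index instead of re-running the original load, the server-side _reindex API can do the copy. A sketch, assuming the same index names as above:

# Sketch: copy documents from the old index into the newly mapped one.
es.reindex(body={
    "source": {"index": "tmdb"},
    "dest": {"index": "test_index"}
}, wait_for_completion=True)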

Get column names (title) from a Vertica data base?

I'm trying to extract the column names when pulling data from a Vertica database in Python via a SQL query. I am using vertica-python 0.6.8. So far I am creating a dictionary from the first row, but I was wondering if there is an easier way of doing it. This is how I am doing it right now:
import vertica_python
import csv
import sys
import ssl
import psycopg2

conn_info = {'host': '****',
             'port': 5433,
             'user': '****',
             'password': '****',
             'database': '****',
             # 10 minutes timeout on queries
             'read_timeout': 600,
             # default throw error on invalid UTF-8 results
             'unicode_error': 'strict',
             # SSL is disabled by default
             'ssl': False}
connection = vertica_python.connect(**conn_info)
cur = connection.cursor('dict')

query = "SELECT * FROM something WHERE something_happens LIMIT 1"
cur.execute(query)
temp = cur.fetchall()

ColumnList = []
for column in temp[0]:
    ColumnList.append(column)
cheers
Two ways:
First, you can just access the dict's keys if you want the column list, this is basically like what you have, but shorter:
ColumnList = temp[0].keys()
Second, you can access the cursor's field list, which I think is what you are really looking for:
ColumnList = [d.name for d in cur.description]
The second one is better because it'll let you see the columns even if the result is empty.
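Combined into a tiny sketch (same connection and cursor as above; 'something' is the placeholder table from the question):

cur.execute("SELECT * FROM something WHERE something_happens LIMIT 1")
columns = [d.name for d in cur.description]  # available even if no rows come back
rows = cur.fetchall()
print(columns)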
If I am not wrong, you are asking about the title of each column.
You can do that by using the data descriptors of the class hp_vertica_client.cursor, which can be found here:
https://my.vertica.com/docs/7.2.x/HTML/Content/python_client/cursor.html
