I'm new to Python and even newer to SQL, and have just run into the following problem:
I want to insert a list (or actually, a list containing one or more dictionaries) into a single cell in my SQL database. This is one row of my data:
[a,b,c,[{key1: int, key2: int},{key1: int, key2: int}]]
As the number of dictionaries inside the lists varies and I want to iterate through the elements of the list later on, I thought it would make sense to keep it in one place (thus not splitting the list into its single elements). However, when trying to insert the list as it is, I get the following error:
sqlite3.InterfaceError: Error binding parameter 2 - probably unsupported type.
How can this kind of list be inserted into a single cell of my SQL database?
SQLite has no facility for a 'nested' column; you'd have to store your list as text or as a binary data blob: serialise it on the way in and deserialise it again on the way out.
How you serialise to text or binary data depends on your use case. JSON (via the json module) could be suitable if your lists and dictionaries consist only of text, numbers, booleans and None (with the dictionaries only using strings as keys). JSON is supported by a wide range of other languages, so you keep your data reasonably compatible. Or you could use pickle, which serialises to a binary format and can handle just about anything Python can throw at it, but it's specific to Python.
You can then register an adapter to handle converting between the serialisation format and Python lists:
import json
import sqlite3

def adapt_list_to_JSON(lst):
    return json.dumps(lst).encode('utf8')

def convert_JSON_to_list(data):
    return json.loads(data.decode('utf8'))

sqlite3.register_adapter(list, adapt_list_to_JSON)
sqlite3.register_converter("json", convert_JSON_to_list)
Then connect with detect_types=sqlite3.PARSE_DECLTYPES and declare your column type as json, or use detect_types=sqlite3.PARSE_COLNAMES and use [json] in a column alias (SELECT datacol AS "datacol [json]" FROM ...) to trigger the conversion on loading.
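A minimal end-to-end sketch of the adapter/converter approach, using an in-memory database. The table name `items`, the column names and the integer values standing in for `int` are assumptions for illustration only.

```python
import json
import sqlite3


def adapt_list_to_json(lst):
    # Serialise the list to UTF-8 encoded JSON bytes (stored as a BLOB).
    return json.dumps(lst).encode("utf8")


def convert_json_to_list(data):
    # sqlite3 passes converters the raw bytes; decode and parse them back.
    return json.loads(data.decode("utf8"))


sqlite3.register_adapter(list, adapt_list_to_json)
sqlite3.register_converter("json", convert_json_to_list)

# PARSE_DECLTYPES makes sqlite3 look up a converter by the declared column type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE items (a TEXT, b TEXT, c TEXT, extra json)")

# Hypothetical row shaped like the one in the question.
row = ["a", "b", "c", [{"key1": 1, "key2": 2}, {"key1": 3, "key2": 4}]]
conn.execute("INSERT INTO items VALUES (?, ?, ?, ?)", row)

a, b, c, extra = conn.execute("SELECT a, b, c, extra FROM items").fetchone()
for d in extra:  # extra is a real list of dicts again
    print(d["key1"], d["key2"])
```

Only the fourth parameter is a list, so only that value goes through the adapter; the parameter sequence itself is not adapted.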
I'm currently using SQLAlchemy, but I can't store multiple values in one column: the only values I can put in the database are strings, ints, etc., not lists. What if, for a list of integers, I just turned it into a string in the format "1|10|91" and split it afterwards? Would that work, or would I run out of memory or something?
It should work, as long as you take care of the length of the string column in the database. For this case it's better to use a Text column, which has an extended size in most database servers.
class model_name(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    integer_values = db.Column(db.Text)
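The join/split round trip the question describes can be kept in two small helpers, independent of SQLAlchemy. The function names are hypothetical; the strings they produce are what you would assign to a Text column such as `integer_values`.

```python
def encode_ints(values):
    # Join integers into the "1|10|91" format for a Text column.
    return "|".join(str(v) for v in values)


def decode_ints(text):
    # Split the stored string back into a list of ints; "" or None -> [].
    if not text:
        return []
    return [int(part) for part in text.split("|")]


stored = encode_ints([1, 10, 91])
values = decode_ints(stored)
```

Memory is not a concern for lists of this size; the string is only a few characters per integer.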
In Python, you can easily convert a dictionary (a collection of key/value pairs) to JSON with the json module. You can then store the resulting JSON string in a text column of your SQL database.
JSON is often used to exchange data between different programming languages.
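A minimal sketch of that approach without any adapters: serialise with `json.dumps` before binding the parameter and parse with `json.loads` after reading. It uses an in-memory SQLite database; the table and column names are assumptions.

```python
import json
import sqlite3

record = {"key1": 1, "key2": 2}  # made-up sample dictionary

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

# Serialise the dictionary to a JSON string before binding it as a parameter.
conn.execute("INSERT INTO records (payload) VALUES (?)", (json.dumps(record),))

(payload,) = conn.execute("SELECT payload FROM records").fetchone()
restored = json.loads(payload)  # back to a Python dict
```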
I am attempting to use python to pull a JSON array from a file and input it into ElasticSearch. The array looks as follows:
{"name": [["string1", 1, "string2"],["string3", 2, "string4"], ... (variable length) ... ["string n-1", 3, "string n"]]}
Elasticsearch throws a TransportError(400, mapper_parsing_exception, failed to parse) when I try to index the array. I discovered that Elasticsearch sometimes throws the same error whenever I feed it an array containing both strings and integers. So, for example, the following will sometimes crash and sometimes succeed:
import json
from elasticsearch import Elasticsearch
es = Elasticsearch()
test = json.loads('{"test": ["a", 1, "b"]}')
print(test)
es.index(index=index, body=test)  # index holds the name of the target index
This code is everything I could safely comment out without breaking the program. I put the JSON in the program instead of having it read from a file. The actual strings I'm inputting are quite long (or else I would just post them) and always crash the program. Changing the JSON to "test": ["a"] makes it work. The current setup crashes if it last crashed, or works if it last worked. What is going on? Will some sort of mapping setup fix this? I haven't figured out how to set up a mapping for an array of variable length. I'd prefer to take advantage of the schema-less input, but I'll take whatever works.
It is possible you are running into type conflicts with your mapping. Since you have expressed a desire to stay "schema-less", I am assuming you have not explicitly provided a mapping for your index. That works fine; just recognise that the first document you index will determine the schema for your index. In every document you index afterwards, fields with the same names must conform to the types set by that first document.
Elasticsearch has no issues with arrays of values. In fact, under the hood it treats all values as arrays (with one or more entries). What is slightly concerning is the example array you chose, which mixes string and numeric types. Since each value in your array gets mapped to the field named "test", and that field may only have one type, if the first value of the first document ES processes is numeric, it will likely map that field as a long type. Future documents that contain a string that does not parse cleanly into a number will then cause an exception in Elasticsearch.
Have a look at the documentation on Dynamic Mapping.
It can be nice to go schema-less, but in your scenario you may have more success by explicitly declaring a mapping on your index for at least some of the fields in your documents. If you plan to index arrays full of mixed datatypes, you are better off declaring that field as a string type (text or keyword in Elasticsearch 5.0 and later).
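A sketch of what an explicit mapping could look like, assuming the typeless mapping format of Elasticsearch 7.x and later and a hypothetical index name `my_index`. The client calls are left as comments because they need a running cluster and their exact signature differs between client versions. The optional `stringify` helper (hypothetical) normalises every value to a string on the client side so that all documents agree on one type.

```python
import json

# Hypothetical index mapping (Elasticsearch 7.x+ typeless format): declare the
# mixed-type array fields as keyword so numbers and strings share one type.
mapping = {
    "mappings": {
        "properties": {
            "test": {"type": "keyword"},
            "name": {"type": "keyword"},
        }
    }
}


def stringify(value):
    # Optional client-side normalisation: turn every leaf of a (nested) list into a string.
    if isinstance(value, list):
        return [stringify(v) for v in value]
    return str(value)


doc = json.loads('{"test": ["a", 1, "b"]}')
doc = {field: stringify(v) for field, v in doc.items()}

# With a running cluster and a 7.x client, something like:
#   es.indices.create(index="my_index", body=mapping)
#   es.index(index="my_index", body=doc)
```

Creating the index with the mapping before the first document arrives means the field type no longer depends on which document happens to be indexed first.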
How can I store Python list values in MySQL and access them later from the same database like a normal list?
I tried storing the list as a varchar type and it did store it. However, when reading the data back from MySQL I couldn't access the stored value as a list; instead it acts as a string, so indexing into the list was no longer possible. Is it perhaps easier to store some data in the form of a set datatype? I see the MySQL datatype 'set', but I'm unable to use it from Python. When I try to store a set from Python into MySQL, it throws the following error: 'MySQLConverter' object has no attribute '_set_to_mysql'. Any help is appreciated.
P.S. I have to store the coordinates of an image within the list along with the image number, so it is going to be in the form [1,157,421].
Use a serialization library like json:
import json
l1 = [1,157,421]
s = json.dumps(l1)
l2 = json.loads(s)
Are you using an ORM like SQLAlchemy?
Anyway, to answer your question directly, you can use json or pickle to convert your list to a string and store that. Then to get it back, you can parse it (as JSON or a pickle) and get the list back.
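A short sketch contrasting the two options on the question's data. pickle can serialise a Python set directly (to bytes, suitable for a BLOB/VARBINARY column), whereas JSON has no set type, so the set has to be turned into a list first. Only unpickle data you trust, since loading a pickle can execute code.

```python
import json
import pickle

coords = {1, 157, 421}  # a set, as the question considered

# pickle: bytes that round-trip to the same Python set.
blob = pickle.dumps(coords)
restored = pickle.loads(blob)

# JSON has no set type, so convert to a (sorted) list first.
text = json.dumps(sorted(coords))
restored_list = json.loads(text)
```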
However, if your list is always a 3 point coordinate, I'd recommend making separate x, y, and z columns in your table. You could easily write functions to store a list in the correct columns and convert the columns to a list, if you need that.
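A minimal sketch of such helper functions. It uses sqlite3 as a stand-in for MySQL so it runs anywhere (with a MySQL driver such as mysql-connector the placeholders would be %s instead of ?). The table name and the column names `image_no`, `x` and `y`, chosen to match the question's [image number, x, y] layout, are hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE points (id INTEGER PRIMARY KEY, image_no INTEGER, x INTEGER, y INTEGER)"
)


def save_point(conn, point):
    # Unpack a [image_no, x, y] list into separate columns.
    image_no, x, y = point
    cur = conn.execute(
        "INSERT INTO points (image_no, x, y) VALUES (?, ?, ?)", (image_no, x, y)
    )
    return cur.lastrowid


def load_point(conn, row_id):
    # Rebuild the list from the three columns.
    row = conn.execute(
        "SELECT image_no, x, y FROM points WHERE id = ?", (row_id,)
    ).fetchone()
    return list(row)


pid = save_point(conn, [1, 157, 421])
point = load_point(conn, pid)
```

Separate columns also let the database filter on individual values (for example WHERE image_no = 1), which a serialised string cannot do.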
I have a scientific model which I am running in Python which produces a lookup table as output. That is, it produces a many-dimensional 'table' where each dimension is a parameter in the model and the value in each cell is the output of the model.
My question is how best to store this lookup table in Python. I am running the model in a loop over every possible parameter combination (using the fantastic itertools.product function), but I can't work out how best to store the outputs.
It would seem sensible to simply store the output as an ndarray, but I'd really like to be able to access the outputs based on the parameter values, not just indices. For example, rather than accessing the values as table[16][5][17][14] I'd prefer to access them somehow using variable names/values, for example:
table[solar_z=45, solar_a=170, type=17, reflectance=0.37]
or something similar to that. It'd be brilliant if I were able to iterate over the values and get their parameter values back - that is, being able to find out that table[16]... corresponds to the outputs for solar_z = 45.
Is there a sensible way to do this in Python?
Why don't you use a database? I have found MongoDB (and the official Python driver, PyMongo) to be a wonderful tool for scientific computing. Here are some advantages:
Easy to install - simply download the executables for your platform (2 minutes tops, seriously).
Schema-less data model
Blazing fast
Provides map/reduce functionality
Very good querying functionalities
So, you could store each entry as a MongoDB document, for example:
{"_id":"run_unique_identifier",
"param1":"val1",
"param2":"val2" # etcetera
}
Then you could query the entries as you will:
import pymongo
# MongoClient replaces pymongo.Connection, which was removed in PyMongo 3.0
data = pymongo.MongoClient("localhost", 27017)["mydb"]["mycollection"]
for entry in data.find():  # iterates over all documents in the collection
    print(entry["param1"])  # do something with param1
Whether or not MongoDB/pymongo are the answer to your specific question, I don't know. However, you could really benefit from checking them out if you are into data-intensive scientific computing.
If you want to access the results by name, you could use a nested Python dictionary instead of an ndarray, and serialise it to a .json text file using the json module.
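A minimal sketch of the nested-dictionary idea with two parameters, filled with `itertools.product` as in the question. The parameter grids and the `model` function are hypothetical stand-ins for the real model.

```python
import itertools
import json

# Hypothetical parameter grids; the real model would have more dimensions.
solar_z_values = [30, 45]
solar_a_values = [170, 180]


def model(solar_z, solar_a):
    # Hypothetical stand-in for the real scientific model.
    return solar_z * 1000 + solar_a


# Nested dictionary: table[solar_z][solar_a] -> model output.
table = {}
for z, a in itertools.product(solar_z_values, solar_a_values):
    table.setdefault(z, {})[a] = model(z, a)

# Iterating gives the parameter values back alongside the outputs.
for z, row in table.items():
    for a, output in row.items():
        print(z, a, output)

text = json.dumps(table)  # json.dump(table, f) would write it to a .json file
restored = json.loads(text)
# JSON object keys are always strings, so numeric keys come back as "45", "170".
value = restored["45"]["170"]
```

The string-key conversion is the main caveat of JSON here: after loading, either index with strings or convert the keys back to numbers.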
One option is to use a numpy ndarray for the data (as you do now), and write a parser function to convert the query values into row/column indices.
For example:
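solar_z_dict = {...}  # maps each solar_z value to its index along axis 0
solar_a_dict = {...}  # maps each solar_a value to its index along axis 1
type_dict = {...}
reflectance_dict = {...}
def lookup(dataArray, solar_z, solar_a, type, reflectance):
    return dataArray[solar_z_dict[solar_z], solar_a_dict[solar_a],
                     type_dict[type], reflectance_dict[reflectance]]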
You could also build the index as a string and eval it, if you want some of the fields to be given as None and translated to ":" (to give the full table for that variable).
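A runnable two-axis sketch of the lookup function. Instead of building a string and calling eval, it swaps in `slice(None)`, which is exactly what ":" means inside an index, so passing None returns the full table along that axis. The parameter values and the placeholder data array are hypothetical.

```python
import numpy as np

# Hypothetical parameter values for a 2-D table; extend with more axes as needed.
solar_z_values = [30, 45, 60]
solar_a_values = [170, 180]

# Map each parameter value to its position along the corresponding axis.
solar_z_dict = {v: i for i, v in enumerate(solar_z_values)}
solar_a_dict = {v: i for i, v in enumerate(solar_a_values)}

# Placeholder data: shape (3, 2), values 0..5.
data = np.arange(6).reshape(len(solar_z_values), len(solar_a_values))


def lookup(data_array, solar_z=None, solar_a=None):
    # None selects the whole axis: slice(None) is what ":" means in an index.
    index = (
        slice(None) if solar_z is None else solar_z_dict[solar_z],
        slice(None) if solar_a is None else solar_a_dict[solar_a],
    )
    return data_array[index]


single = lookup(data, solar_z=45, solar_a=170)  # one cell
row = lookup(data, solar_z=45)                  # every solar_a for solar_z=45
column = lookup(data, solar_a=180)              # every solar_z for solar_a=180
```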
"For example, rather than accessing the values as table[16][5][17][14] I'd prefer to access them somehow using variable names/values"
That's what numpy's dtypes are for:
import numpy as np
from sys import argv
dt = [('L','float64'),('T','float64'),('NMSF','float64'),('err','float64')]
data = np.loadtxt(argv[1], dtype=dt)  # argv[1]: path to a whitespace-separated text file
Now you can access each column of the data by name, e.g. data['T'], data['L'] or data['NMSF'].
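A self-contained sketch of the structured-array approach, reading from an in-memory string instead of a file; the two rows of input are made up for illustration.

```python
import io

import numpy as np

# Same field layout as above; the two input rows are made up for illustration.
dt = [("L", "float64"), ("T", "float64"), ("NMSF", "float64"), ("err", "float64")]
text = io.StringIO("1.0 300.0 0.5 0.01\n2.0 310.0 0.6 0.02\n")
data = np.loadtxt(text, dtype=dt)

temps = data["T"]                             # whole T column as a float array
first = data[0]                               # one record; fields by name: first["NMSF"]
nmsf_at_l2 = data[data["L"] == 2.0]["NMSF"]   # filter rows on one field, read another
```

Field names replace column positions, and a boolean mask on one field gives a way to select rows by parameter value.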
More info on dtypes:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
PyTables doesn't natively support Python dictionaries. The way I've approached it is to make a data structure of the form:
tables_dict = {
'key' : tables.StringCol(itemsize=40),
'value' : tables.Int32Col(),
}
(note that I ensure that the keys are <40 characters long) and then create a table using this structure:
file_handle.createTable('/', 'dictionary', tables_dict)
and then populate it with:
file_handle.dictionary.append(dictionary.items())
and retrieve data with:
dict(file_handle.dictionary.read())
This works ok, but reading the dictionary back in is extremely slow. I think the problem is that the read() function is causing the entire dictionary to be loaded into memory, which shouldn't really be necessary. Is there a better way to do this?
You can ask PyTables to search inside the table, and also create an index on the key column to speed that up.
To create an index:
table.cols.key.create_index()  # createIndex() in PyTables 2.x
To query the values where key equals the variable search_key:
[row['value'] for row in table.where('key == search_key')]
http://pytables.github.com/usersguide/optimization.html#searchoptim
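A self-contained sketch of the indexed-query approach, assuming PyTables 3.x (snake_case method names), a throwaway file in a temporary directory, and made-up dictionary contents. Under Python 3 a StringCol holds bytes, so both the stored keys and the search key are encoded.

```python
import os
import tempfile

import tables

tables_dict = {
    "key": tables.StringCol(itemsize=40),
    "value": tables.Int32Col(),
}
dictionary = {"alpha": 1, "beta": 2, "gamma": 3}  # made-up sample data

path = os.path.join(tempfile.mkdtemp(), "lookup.h5")
with tables.open_file(path, mode="w") as f:
    table = f.create_table("/", "dictionary", tables_dict)
    # StringCol stores bytes, so encode the keys before appending.
    table.append([(k.encode("ascii"), v) for k, v in dictionary.items()])
    table.flush()
    table.cols.key.create_index()  # speeds up where() queries on key

    search_key = b"beta"
    # where() yields only the matching rows instead of reading the whole table.
    values = [
        int(row["value"])
        for row in table.where("key == search_key", condvars={"search_key": search_key})
    ]
```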