How should python dictionaries be stored in pytables? - python

pytables doesn't natively support python dictionaries. The way I've approached it is to make a data structure of the form:
tables_dict = {
    'key': tables.StringCol(itemsize=40),
    'value': tables.Int32Col(),
}
(note that I ensure that the keys are <40 characters long) and then create a table using this structure:
file_handle.createTable('/', 'dictionary', tables_dict)
and then populate it with:
file_handle.root.dictionary.append(dictionary.items())
and retrieve data with:
dict(file_handle.root.dictionary.read())
This works ok, but reading the dictionary back in is extremely slow. I think the problem is that the read() function is causing the entire dictionary to be loaded into memory, which shouldn't really be necessary. Is there a better way to do this?

You can ask PyTables to search inside the table, and also create an index on the key column to speed that up.
To create an index:
table.cols.key.createIndex()
To query the values where key equals the variable search_key:
[row['value'] for row in table.where('key == search_key')]
http://pytables.github.com/usersguide/optimization.html#searchoptim
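Putting that together with the layout from the question, a minimal sketch (using the same camelCase API as the rest of this page; newer PyTables releases rename these to open_file, create_index and so on):

import tables

# open the file and grab the table, assuming the layout from the
# question: a table named 'dictionary' hanging from the root group
file_handle = tables.openFile('data.h5', mode='a')
table = file_handle.root.dictionary

# index the key column once; where() queries can then use the index
table.cols.key.createIndex()

# fetch a single value without reading the whole table into memory
search_key = 'some_key'  # hypothetical key to look up
matches = [row['value'] for row in table.where('key == search_key')]
value = matches[0] if matches else None

file_handle.close()

This reads only the matching rows, so the cost of a lookup no longer scales with the size of the dictionary.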

Related

Insert list into SQLite3 cell

I'm new to python and even newer to SQL and have just run into the following problem:
I want to insert a list (or actually, a list containing one or more dictionaries) into a single cell in my SQL database. This is one row of my data:
[a,b,c,[{key1: int, key2: int},{key1: int, key2: int}]]
As the number of dictionaries inside the lists varies and I want to iterate through the elements of the list later on, I thought it would make sense to keep it in one place (thus not splitting the list into its single elements). However, when trying to insert the list as it is, I get the following error:
sqlite3.InterfaceError: Error binding parameter 2 - probably unsupported type.
How can this kind of list be inserted into a single cell of my SQL database?
SQLite has no facility for a 'nested' column; you'd have to store your list as text or a binary data blob, serialising it on the way in and deserialising it again on the way out.
How you serialise to text or binary data depends on your use cases. JSON (via the json module) could be suitable if your lists and dictionaries consist only of text, numbers, booleans and None (with the dictionaries only using strings as keys). JSON is supported by a wide range of other languages, so you keep your data reasonably compatible. Or you could use pickle, which lets you serialise to a binary format and can handle just about anything Python can throw at it, but it's specific to Python.
You can then register an adapter to handle converting between the serialisation format and Python lists:
import json
import sqlite3

def adapt_list_to_JSON(lst):
    return json.dumps(lst).encode('utf8')

def convert_JSON_to_list(data):
    return json.loads(data.decode('utf8'))

sqlite3.register_adapter(list, adapt_list_to_JSON)
sqlite3.register_converter("json", convert_JSON_to_list)
then connect with detect_types=sqlite3.PARSE_DECLTYPES and declare your column type as json, or use detect_types=sqlite3.PARSE_COLNAMES and use [json] in a column alias (SELECT datacol AS "datacol [json]" FROM ...) to trigger the conversion on loading.
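A quick round trip showing the adapter and converter in action (the table and column names are made up for the example, and the registration code above is assumed to have run):

import sqlite3

conn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute('CREATE TABLE frames (a TEXT, b TEXT, c TEXT, data json)')

row = ['a', 'b', 'c', [{'key1': 1, 'key2': 2}, {'key1': 3, 'key2': 4}]]
conn.execute('INSERT INTO frames VALUES (?, ?, ?, ?)', row)

fetched = conn.execute('SELECT data FROM frames').fetchone()[0]
print(fetched)        # [{'key1': 1, 'key2': 2}, {'key1': 3, 'key2': 4}]
print(type(fetched))  # <class 'list'>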

storing python list into mysql and accessing it

How can I store python 'list' values into MySQL and access it later from the same database like a normal list?
I tried storing the list as a varchar type and it did store it. However, when accessing the data from MySQL I couldn't treat the stored value as a list; it comes back as a string, so accessing the list by index was no longer possible. Would it be easier to store the data using MySQL's SET datatype? I see that it exists, but I'm unable to use it from Python: when I try to store a Python set in MySQL, it throws the error 'MySQLConverter' object has no attribute '_set_to_mysql'. Any help is appreciated.
P.S. I have to store the coordinates of an image within the list, along with the image number, so it will be in the form [1,157,421].
Use a serialization library like json:
import json
l1 = [1,157,421]
s = json.dumps(l1)
l2 = json.loads(s)
Are you using an ORM like SQLAlchemy?
Anyway, to answer your question directly, you can use json or pickle to convert your list to a string and store that. To get it back, parse the string (as JSON or a pickle) into a list again.
However, if your list is always a 3 point coordinate, I'd recommend making separate x, y, and z columns in your table. You could easily write functions to store a list in the correct columns and convert the columns to a list, if you need that.
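For the coordinate case that might look like the following sketch (the points table, its columns and the connection details are made up; any DB-API driver works the same way):

import mysql.connector

conn = mysql.connector.connect(user='me', password='secret', database='mydb')
cur = conn.cursor()

def store_point(point):
    # point is [image_number, x, y], e.g. [1, 157, 421]
    cur.execute('INSERT INTO points (image_no, x, y) VALUES (%s, %s, %s)', point)

def load_point(image_no):
    cur.execute('SELECT image_no, x, y FROM points WHERE image_no = %s', (image_no,))
    return list(cur.fetchone())  # back to a plain [image_no, x, y] list

store_point([1, 157, 421])
conn.commit()
print(load_point(1))  # [1, 157, 421]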

pytables - how to copy a row to memory

I have a pytable. I often need to copy the rows to an in-memory object and then insert into another pytable. I am wondering what is the easiest way to do this. The following code does not work as one cannot convert a Row object to a dict.
for row in hf.root.my_table.iterrows():
    rec = dict(row)
Also, I want to conditionally copy the data to another file, possibly adding 1-2 new columns. To do this, I will need to extract the table description from one table, modify it, and use the modified description to create a new table. How can I do that?
This won't work either. In general, I find my way of using pytables a little bit awkward, and would like to know better way of doing things.
As mentioned in the documentation, you can use the Row.fetch_all_fields method to obtain an independent copy that retains its information after the table file is closed, for example.
Your example code would look like
for row in hf.root.my_table.iterrows():
    rec = row.fetch_all_fields()
rec is a numpy void scalar with the same keys as the row; rec['field'] yields the same data as row['field'].
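For the second part of the question (a modified table description), one approach is to copy the source table's column definitions, which PyTables exposes as a dictionary of Col objects, extend that dictionary, and create the new table from it. A hedged sketch reusing the question's hf handle ('extra' and 'some_field' are made-up names, and _v_colobjects is the attribute name in recent PyTables versions):

import tables

# copy the source table's column definitions and add a new column
desc = dict(hf.root.my_table.description._v_colobjects)
desc['extra'] = tables.Int32Col()  # hypothetical new column

new_table = hf.createTable('/', 'my_new_table', desc)
new_row = new_table.row
for row in hf.root.my_table.iterrows():
    if row['some_field'] > 0:  # stand-in for your copy condition
        for name in hf.root.my_table.colnames:
            new_row[name] = row[name]
        new_row['extra'] = 0  # fill in the added column
        new_row.append()
new_table.flush()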

Store repetitive data in python?

I'm working on a small task with an Excel sheet and Python. The problem I'm facing is that I have a few lines of code that perform string manipulation on the data I fetch from the sheet. Since I have plenty of sheets (sometimes only a limited number of sheets are needed, sometimes the whole workbook), I can't write the same code everywhere, so I thought of performing the operation once and storing the result as oldvalue : newvalue pairs, so that whenever I read an oldvalue again I don't have to do the manipulation, just fetch the newvalue. A dictionary would be the natural way to do this, but the problem is that both my keys and my values can be repetitive, and I don't want to update a previous entry. As far as I know, that can't be achieved with a dictionary. So what I'm asking is whether there is some different data type to store this, or whether we actually need one. Can you help me figure out a way to solve it without using any data type?
EDIT :
The point is that I'm getting the data from an Excel sheet and performing string manipulation on it. Sometimes the key and the value are repetitive, and since I'm using a dictionary, it updates the previous value, which I don't want.
This will check if your dictionary contains a value for a specified key. If not, you can manipulate your string and save it for that key. If it does, it will grab that value and use it as your manipulated string.
""" Stuff is done. New string to manipulated is found """
if key not in dict:
value = ... #manipulated string
dict[key] = value
else:
manipulated_string = dict[key] #did this before, have the value already
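Wrapped up as a reusable function, a minimal sketch (manipulate() is a stand-in for whatever string manipulation you actually do):

cache = {}  # oldvalue -> newvalue

def manipulate(old_value):
    return old_value.upper()  # stand-in for the real string manipulation

def get_new_value(old_value):
    if old_value not in cache:
        cache[old_value] = manipulate(old_value)
    return cache[old_value]

print(get_new_value('abc'))  # computed: 'ABC'
print(get_new_value('abc'))  # 'ABC' again, fetched from the cache this time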

How to store numerical lookup table in Python (with labels)

I have a scientific model which I am running in Python which produces a lookup table as output. That is, it produces a many-dimensional 'table' where each dimension is a parameter in the model and the value in each cell is the output of the model.
My question is how best to store this lookup table in Python. I am running the model in a loop over every possible parameter combination (using the fantastic itertools.product function), but I can't work out how best to store the outputs.
It would seem sensible to simply store the output as an ndarray, but I'd really like to be able to access the outputs based on the parameter values, not just indices. For example, rather than accessing the values as table[16][5][17][14], I'd prefer to access them somehow using variable names/values, for example:
table[solar_z=45, solar_a=170, type=17, reflectance=0.37]
or something similar to that. It'd be brilliant if I were able to iterate over the values and get their parameter values back - that is, being able to find out that table[16]... corresponds to the outputs for solar_z = 45.
Is there a sensible way to do this in Python?
Why don't you use a database? I have found MongoDB (and the official Python driver, Pymongo) to be a wonderful tool for scientific computing. Here are some advantages:
Easy to install - simply download the executables for your platform (2 minutes tops, seriously).
Schema-less data model
Blazing fast
Provides map/reduce functionality
Very good querying functionalities
So, you could store each entry as a MongoDB entry, for example:
{"_id":"run_unique_identifier",
"param1":"val1",
"param2":"val2" # etcetera
}
Then you could query the entries as you will:
import pymongo

# recent pymongo versions use MongoClient; the old Connection class is gone
data = pymongo.MongoClient("localhost", 27017)["mydb"]["mycollection"]
for entry in data.find():  # this will yield all results
    print(entry["param1"])  # do something with param1
Whether or not MongoDB/pymongo are the answer to your specific question, I don't know. However, you could really benefit from checking them out if you are into data-intensive scientific computing.
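To retrieve only the runs for one parameter combination, pass a filter document to find() (the field names follow the example entry above):

# pull out only the runs matching one parameter combination
for entry in data.find({"param1": "val1", "param2": "val2"}):
    print(entry["_id"])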
If you want to access the results by name, you could use a nested Python dictionary instead of an ndarray and serialise it to a JSON text file using the json module.
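A minimal sketch of that idea, with only two parameters for brevity (run_model is a stand-in for the real model; JSON object keys must be strings, hence the str() calls):

import itertools
import json

def run_model(solar_z, solar_a):
    return solar_z + solar_a  # stand-in for the real model

table = {}
for solar_z, solar_a in itertools.product([45, 50], [170, 180]):
    table.setdefault(str(solar_z), {})[str(solar_a)] = run_model(solar_z, solar_a)

with open('lookup.json', 'w') as f:
    json.dump(table, f)

# later: reload and index by parameter value
with open('lookup.json') as f:
    table = json.load(f)
print(table['45']['170'])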
One option is to use a numpy ndarray for the data (as you do now), and write a parser function to convert the query values into row/column indices.
For example:
# map each parameter value to its index along the corresponding array axis
solar_z_dict = {...}
solar_a_dict = {...}
...

def lookup(dataArray, solar_z, solar_a, type, reflectance):
    return dataArray[solar_z_dict[solar_z], solar_a_dict[solar_a], ...]  # and so on for type and reflectance
You could also build the index expression as a string and eval it, if you want some of the fields to be given as "None" and translated to ":" (to return the full table along that axis).
For example, rather than accessing the values as table[16][5][17][14]
I'd prefer to access them somehow using variable names/values
That's what numpy's dtypes are for:
import numpy as np  # pylab's loadtxt is just numpy's
from sys import argv

dt = [('L', 'float64'), ('T', 'float64'), ('NMSF', 'float64'), ('err', 'float64')]
data = np.loadtxt(argv[1], dtype=dt)
Now you can access the columns by name, e.g. data['T'], data['L'] or data['NMSF'].
More info on dtypes:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
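Applied to the question's parameters, a record-per-combination structured array can be queried by value rather than by position (the field names follow the question; the data values are made up):

import numpy as np

dt = [('solar_z', 'float64'), ('solar_a', 'float64'),
      ('type', 'int32'), ('reflectance', 'float64'), ('output', 'float64')]
table = np.array([(45.0, 170.0, 17, 0.37, 1.23),
                  (50.0, 170.0, 17, 0.37, 1.31)], dtype=dt)

# select rows by parameter value rather than positional index
match = table[(table['solar_z'] == 45) & (table['type'] == 17)]
print(match['output'])  # outputs for solar_z=45, type=17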
