How do I use a python dictionary object in MATLAB? - python

I'm playing with the new capability in MATLAB 2014b where you can call Python directly from MATLAB and get Python objects in the workspace (like you've been able to with Java for a long time). I have successfully called a function and gotten a dictionary into the workspace, but I'm stuck on how to harvest the values from it. In this case I have a dictionary full of dictionaries, so I can't just convert it to MATLAB cells like they do in their example.
So a specific question: if I have a dictionary in MATLAB called "A", how do I get the sub-dictionary A['2'] out?

According to the MATLAB External Interfaces documentation, the Python dict type exposes essentially the same interface as a containers.Map. You use the values function to extract the value for a given key. With your Python dictionary stored in A and '2' as the key, the call looks like this:
val = values(A, '2');
val would then contain the value associated with the key '2'. MATLAB can also take multiple keys and return multiple values, one per key. Therefore, you could also do something like this:
val = cell(values(A, {'1', '2', '3'}));
val would be a three-element cell array, where each element is the value for the corresponding key. It is imperative that you convert the output of values into a cell array, because it would otherwise be a Python list. To use the results in MATLAB, we need to convert them to a cell.
Therefore, val{1} would be the dictionary returned for the key '1'. Similarly, val{2} would be the dictionary returned for the key '2', and so on.
Here are some more operations on a containers.Map object for you. If you want all of the keys in a dictionary, use the keys function:
k = keys(A);
If you were to just use values with the dictionary by itself, like so:
val = cell(values(A));
It would produce all values for every key that exists in your dictionary stored into a cell array.
If you wanted to update a particular key in your Python dictionary, use the update function:
update(A,py.dict(pyargs('4',5.677)));
Here, we use the dictionary A, and update the key-value pair, where the key is '4' and the new value is 5.677.

Related

How to facilitate dict record filtering by dict key value?

I want to interface with RocksDB in my Python application and store arbitrary dicts in it. I gather that I can use something like pickle for serialisation. But I need to be able to filter the records based on the values of their keys. What's the proper approach here?
Let's say you have a list of keys named dict_keys and a dict named big_dict, and you want to keep only the entries for the keys in dict_keys. You can write a dict comprehension that iterates through the list and grabs the items from the dict (big_dict.get(key) returns None for a key that is not in big_dict), like this:
new_dict = {key: big_dict.get(key) for key in dict_keys}
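Because `big_dict.get(key)` returns `None` for absent keys, the comprehension keeps every key in `dict_keys`, present or not. A minimal sketch of that behaviour next to a variant that skips absent keys; the sample contents of `big_dict` and `dict_keys` are made up for illustration:

```python
# Hypothetical sample data, for illustration only.
big_dict = {"a": 1, "b": 2, "c": 3}
dict_keys = ["a", "c", "z"]  # "z" is not present in big_dict

# Variant from the answer: missing keys are kept with a value of None.
with_missing = {key: big_dict.get(key) for key in dict_keys}

# Variant that skips keys absent from big_dict.
only_present = {key: big_dict[key] for key in dict_keys if key in big_dict}

print(with_missing)  # {'a': 1, 'c': 3, 'z': None}
print(only_present)  # {'a': 1, 'c': 3}
```

Which variant fits depends on whether a missing key should show up in the result at all.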
RocksDB is a key-value store, and both key and value are binary strings.
Case 1: If you want to filter by given keys, just use the Get interface to search the DB.
Case 2: If you want to filter by given key patterns, you have to use the Iterator interface to iterate over the whole DB and keep the records whose keys match the pattern.
Case 3: If you want to filter by values or value patterns, you still need to iterate over the whole DB. For each key-value pair, deserialize the value and check whether it equals the given value or matches the given pattern.
For case 1 and case 2, you don't need to deserialize all values, only the values whose keys equal the given key or match the pattern. However, for case 3, you have to deserialize all values.
Both case 2 and case 3 are inefficient, since they need to iterate the whole key space.
You can configure RocksDB's keys to be ordered, and RocksDB has good support for prefix indexing, so you can do range queries and prefix queries by key efficiently. Check the documentation for details.
To filter or search by value efficiently, you have to create a value index with RocksDB.
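The three cases and the value index can be sketched roughly as follows. This is not RocksDB code: a plain dict of bytes to bytes stands in for the store, and every function name here (`get`, `scan_prefix`, `scan_values`, `build_value_index`) is hypothetical. With a real binding, the lookups and scans would go through its Get and Iterator interfaces instead.

```python
import pickle

# Stand-in for the store: RocksDB itself is not used here. A plain dict of
# bytes -> bytes mimics "both key and value are binary strings".
db = {
    b"user:1": pickle.dumps({"name": "ann", "city": "Oslo"}),
    b"user:2": pickle.dumps({"name": "bob", "city": "Rome"}),
    b"item:1": pickle.dumps({"name": "pen", "city": "Oslo"}),
}

def get(key):
    """Case 1: point lookup by exact key; only one value is deserialized."""
    raw = db.get(key)
    return None if raw is None else pickle.loads(raw)

def scan_prefix(prefix):
    """Case 2: walk keys in sorted order, deserialize only the matches."""
    return {k: pickle.loads(v) for k, v in sorted(db.items())
            if k.startswith(prefix)}

def scan_values(field, wanted):
    """Case 3: full scan; every value is deserialized before it is checked."""
    hits = {}
    for k, v in sorted(db.items()):
        record = pickle.loads(v)
        if record.get(field) == wanted:
            hits[k] = record
    return hits

def build_value_index(field):
    """Hypothetical secondary index: field value -> list of primary keys."""
    index = {}
    for k, v in sorted(db.items()):
        index.setdefault(pickle.loads(v).get(field), []).append(k)
    return index

city_index = build_value_index("city")
print(city_index["Oslo"])  # [b'item:1', b'user:1']
```

Once such an index exists (and is kept up to date on every write), a value query becomes a point lookup in the index followed by case 1 lookups, instead of a full scan.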

How does the collectAsMap() function work for Spark API

I am trying to understand what happens when we run the collectAsMap() function in Spark. The PySpark docs say:
collectAsMap(self)
Return the key-value pairs in this RDD to the master as a dictionary.
and for core spark it says,
def collectAsMap(): Map[K, V] Return the key-value pairs in this RDD
to the master as a Map.
When I try to run a sample code in pyspark for a List, I get this result:
and for scala I get this result:
I am a little confused as to why it is not returning all the elements in the List. Can somebody help me understand what is happening here and why I am getting only some of the results?
Thanks.
The semantics of collectAsMap are identical between the Scala and Python APIs so I'll look at the first WLOG. The documentation for PairRDDFunctions.collectAsMap explicitly states:
Warning: this doesn't return a multimap (so if you have multiple values to the same key, only one value per key is preserved in the map returned)
In particular, the current implementation inserts the key-value pairs into the resultant map in order and thus only the last two pairs survive in each of your two examples.
If you use collect instead, it will return Array[(Int,Int)] without losing any of your pairs.
collectAsMap returns the results of a pair RDD as a Map collection. Since a Map holds only one entry per key, you get only pairs with unique keys: when several pairs share a key, just one of them is kept.
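The same effect can be reproduced without Spark. The sketch below is plain Python, not the Spark API, and the pairs are made-up data rather than the data from the question. It shows that turning key-value pairs into a dictionary keeps one value per key, and how grouping preserves all of them:

```python
from collections import defaultdict

# Hypothetical pairs with repeated keys; not the data from the question.
pairs = [(1, 2), (1, 3), (2, 4), (2, 5)]

# Building a dict from pairs keeps one value per key (here, the last one seen).
as_map = dict(pairs)

# To keep every value, group them per key instead.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(as_map)         # {1: 3, 2: 5}
print(dict(grouped))  # {1: [2, 3], 2: [4, 5]}
```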

Store repetitive data in python?

I'm working on a small task with an Excel sheet and Python. I have a few lines of code that perform string manipulation on the data I fetch from the sheet. I have plenty of sheets; sometimes only a limited number of them are required and sometimes the whole Excel file, so I can't write the same code everywhere. I thought of performing the operation once and storing the result as oldvalue : newvalue, so that whenever I read oldvalue I don't have to do the manipulation again and can just fetch newvalue. I tried using a dictionary, which seems the best way to do it, but the problem is that my keys and values can both be repeated, and I don't want to update my previous entry. As far as I know, we can't achieve this with a dictionary. So what I'm asking is whether there is a different data type to store it in. Or do we actually need one? Can you help me figure out a way to solve it without using any data type?
EDIT :
The point is that I'm getting the data from an Excel sheet and performing string manipulation on it. Sometimes the key and the value are repeated, and since I'm using a dictionary, it updates the previous value, which I don't want.
This checks whether your dictionary already contains a value for the given key. If not, you manipulate your string and save the result under that key. If it does, you grab the stored value and use it as your manipulated string.
# Stuff is done. A new string to manipulate is found.
# The dictionary is named cache here to avoid shadowing the built-in dict.
if key not in cache:
    value = ...  # manipulated string
    cache[key] = value
else:
    manipulated_string = cache[key]  # did this before, have the value already
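As a fuller sketch of the same idea, the check can be wrapped in a function. The `manipulate` function below is a hypothetical stand-in for the real string manipulation, and the sample cells are made up. The sketch assumes the manipulation is deterministic, so a repeated key always maps to the same result and no earlier entry is overwritten with anything different:

```python
def manipulate(text):
    """Hypothetical stand-in for the real string manipulation."""
    return text.strip().upper()

cache = {}  # maps each original string to its manipulated form

def manipulated(text):
    # Compute only the first time this exact string is seen; reuse afterwards.
    if text not in cache:
        cache[text] = manipulate(text)
    return cache[text]

cells = [" abc ", "xyz", " abc ", "ABC"]
results = [manipulated(cell) for cell in cells]

print(results)     # ['ABC', 'XYZ', 'ABC', 'ABC']
print(len(cache))  # 3
```

Repeated values are not a problem for a dictionary: here two different keys, " abc " and "ABC", both map to "ABC".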

Why doesn't Python hash lists using ID?

When using a dictionary in Python, the following is impossible:
d = {}
d[[1,2,3]] = 4
since 'list' is an unhashable type. However, the id function in Python returns an integer for an object that is guaranteed to be unique for the object's lifetime.
Why doesn't Python use id to hash a list? Are there drawbacks?
The reason is right here (Why must dictionary keys be immutable)
Some unacceptable solutions that have been proposed:
Hash lists by their address (object ID). This doesn’t work because if you construct a new list with the same value it won’t be found; e.g.:
mydict = {[1, 2]: '12'}
print mydict[[1, 2]]
would raise a KeyError exception because the id of the [1, 2] used in the second line differs from that in the first line. In other words, dictionary keys should be compared using ==, not using is.
It is a requirement that if a == b, then hash(a) == hash(b). Using the id can break this, because the ID will not change if you mutate the list. Then you might have two lists that have equal contents, but have different hashes.
Another way to look at it is, yes, you could do it, but it would mean that you could not retrieve the dict value with another list with the same contents. You could only retrieve it by using the exact same list object as the key.
In Python dictionaries keys are compared using ==, and the equality operator with lists does an item-by-item equality check so two different lists with the same elements compare equal and they must behave as the same key in a dictionary.
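A small sketch of the difference between lookup by equality and lookup by identity, using a tuple as the hashable stand-in for a list (the variable names are chosen for illustration):

```python
a = [1, 2]
b = [1, 2]

print(a == b)  # True: same contents
print(a is b)  # False: two distinct objects

# Tuples are hashable and hash by content, so equal tuples find the same entry.
by_value = {tuple(a): "12"}
print(by_value[tuple(b)])  # 12

# Keying by id() only finds the entry through the very same object.
by_identity = {id(a): "12"}
print(id(b) in by_identity)  # False
print(id(a) in by_identity)  # True
```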
If you need to keep a dictionary or set of lists by identity instead of equality, you can wrap the list in a user-defined object or, depending on the context, maybe use a dictionary where elements are stored and retrieved by id explicitly.
Note, however, that storing the id of an object doesn't keep the object alive, that there is no way to go from an id back to the object, and that an id may be reused over time once the original object has been garbage collected. A solution is to use
my_dict[id(x)] = [x, value]
instead of
my_dict[id(x)] = value
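One way to package this pattern is a small wrapper class. `IdentityDict` below is a hypothetical helper written for illustration, not a standard library type:

```python
class IdentityDict:
    """Hypothetical helper: maps objects to values by identity."""

    def __init__(self):
        self._entries = {}  # id(obj) -> [obj, value]

    def __setitem__(self, obj, value):
        # Storing obj next to the value keeps it alive, so its id cannot
        # be reused by another object while the entry exists.
        self._entries[id(obj)] = [obj, value]

    def __getitem__(self, obj):
        return self._entries[id(obj)][1]

    def __contains__(self, obj):
        return id(obj) in self._entries

x = [1, 2, 3]
same_contents = [1, 2, 3]

store = IdentityDict()
store[x] = 4
x.append(99)  # mutating the list does not affect lookup by identity

print(store[x])                # 4
print(same_contents in store)  # False
```

The trade-off is the one described above: only the exact same list object retrieves the value, never another list with equal contents.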

How should python dictionaries be stored in pytables?

pytables doesn't natively support python dictionaries. The way I've approached it is to make a data structure of the form:
tables_dict = {
'key' : tables.StringCol(itemsize=40),
'value' : tables.Int32Col(),
}
(note that I ensure that the keys are <40 characters long) and then create a table using this structure:
file_handle.createTable('/', 'dictionary', tables_dict)
and then populate it with:
file_handle.dictionary.append(dictionary.items())
and retrieve data with:
dict(file_handle.dictionary.read())
This works ok, but reading the dictionary back in is extremely slow. I think the problem is that the read() function is causing the entire dictionary to be loaded into memory, which shouldn't really be necessary. Is there a better way to do this?
You can ask PyTables to search inside the table, and also create an index on the key column to speed that up.
To create an index:
table.cols.key.createIndex()
To query the values where key equals the variable search_key:
[row['value'] for row in table.where('key == search_key')]
http://pytables.github.com/usersguide/optimization.html#searchoptim
