I am using Python 2.7 on Ubuntu. I have a script that writes an SPSS .sav file.
If I use ValueLabels with numbers as keys like this:
{1: 'yes', 2: 'no'}
the following line causes a Segmentation fault:
with savReaderWriter.SavWriter(sav_file_name, varNames, varTypes, valueLabels=value_labels, ioUtf8=True) as writer:
However, if my keys are strings like this:
{'1': 'yes', '2': 'no'}
I do not get the segmentation fault, and my script runs fine. The problem, of course, is that I need the keys to be numbers. How can I fix or work around this?
Thank you in advance.
-RLS
Depending on whether you specify a numerical variable (varType == 0) or a string variable (varType > 0, where varType is the length in bytes of the string value), one of the following two C functions of the SPSS I/O library is called:
int spssSetVarNValueLabel(int handle, const char * varName, double value, const char * label)
int spssSetVarCValueLabel(int handle, const char * varName, const char * value, const char * label)
Note that ctypes.c_double accepts both floats and ints, so the values of numerical variables do not necessarily have to be specified as floats (doubles), they can also be ints.
It appears that you specified a varType > 0 (indicating a string variable), but a 'value label' value which is an int (suggesting a numerical variable). The fix is to make the two consistent. One way is already stated above (use string keys); the other is to set the varType for the variable in question to zero.
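As a rough sketch of "making the two consistent", the keys can be coerced from the declared varTypes before the writer is created. The helper name normalize_value_labels and the sample variables are hypothetical and not part of savReaderWriter; under Python 3 the string keys may additionally need to be bytes, as in the example later in this thread.

```python
def normalize_value_labels(value_labels, var_types):
    """Return a copy of value_labels whose keys match each variable's type.

    Hypothetical helper, not part of savReaderWriter.
    """
    normalized = {}
    for var_name, labels in value_labels.items():
        if var_types[var_name] == 0:
            # Numeric variable: keys must be numbers.
            normalized[var_name] = {float(k): v for k, v in labels.items()}
        else:
            # String variable: keep str/bytes keys, convert anything else.
            normalized[var_name] = {
                k if isinstance(k, (str, bytes)) else str(k): v
                for k, v in labels.items()
            }
    return normalized


# Made-up sample data with deliberately inconsistent keys.
var_types = {"gender": 0, "answer": 1}
value_labels = {"gender": {"1": "male", 2: "female"},
                "answer": {1: "yes", "n": "no"}}
fixed = normalize_value_labels(value_labels, var_types)
print(fixed)
```

Whether to coerce silently or to raise on a mismatch is a design choice; raising is probably safer when the labels come from user input.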
That said, it is ugly to get this segfault. I put it on my to-do list to specify the argtypes attribute for all the setter functions (see section 15.17.1.6 of https://docs.python.org/2/library/ctypes.html), so you would get a nice, understandable ArgumentError instead of this nasty segfault.
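To illustrate what declaring argtypes buys you, here is a small sketch that does not touch the SPSS I/O library at all. Two CPython C-API functions, reached through ctypes.pythonapi, stand in for a "takes a double" setter and a "takes a char pointer" setter; that substitution is an assumption made purely for demonstration and requires CPython.

```python
import ctypes

# Stand-in for a function that takes a C double.
from_double = ctypes.pythonapi.PyFloat_FromDouble
from_double.argtypes = [ctypes.c_double]
from_double.restype = ctypes.py_object

# Stand-in for a function that takes a const char *.
from_string = ctypes.pythonapi.PyBytes_FromString
from_string.argtypes = [ctypes.c_char_p]
from_string.restype = ctypes.py_object

numeric_ok = from_double(1)       # an int is accepted and converted to a double
string_ok = from_string(b"yes")   # bytes are accepted for a char pointer

try:
    from_string(1)                # an int where a char pointer is declared
    rejected = False
except ctypes.ArgumentError as exc:
    rejected = True
    print("rejected before reaching C:", exc)
```

With argtypes declared, the mismatched call is stopped on the Python side instead of handing a bogus pointer to C.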
If the problem persists, could you please open an issue, with a minimal example, at https://bitbucket.org/fomcl/savreaderwriter/issues?status=new&status=open.
@ekhumoro: savReaderWriter has not been tested with Python 2.6 or earlier (I would be surprised if it works), so a dict comprehension should be fine.
UPDATE:
@RLS: You are welcome. Thank you too, it inspired me to correct this. As of commit 5c11704 this now throws a ctypes.ArgumentError (see https://bitbucket.org/fomcl/savreaderwriter). Here is an example that I might also use to write a unittest for this (the b"..." prefixes are needed for Python 3):
import savReaderWriter as rw, tempfile, os, pprint
savFileName = os.path.join(tempfile.gettempdir(), "some_file.sav")
varNames = [b"a_string", b"a_numeric"]
varTypes = {b"a_string": 1, b"a_numeric": 0}
records = [[b"x", 1], [b"y", 777], [b"z", 10 ** 6]]
# Incorrect, but now raises ctypes.ArgumentError:
valueLabels = {b"a_numeric": {b"1": b"male", b"2": b"female"},
               b"a_string": {1: b"male", 2: b"female"}}
# Correct
#valueLabels = {b"a_numeric": {1: b"male", 2: b"female"},
#               b"a_string": {b"1": b"male", b"2": b"female"}}
kwargs = dict(savFileName=savFileName, varNames=varNames,
              varTypes=varTypes, valueLabels=valueLabels)
with rw.SavWriter(**kwargs) as writer:
    writer.writerows(records)
# Check if the valueLabels look all right
with rw.SavHeaderReader(savFileName) as header:
    metadata = header.dataDictionary(True)
    pprint.pprint(metadata.valueLabels)
Just convert the dict before passing it to SavWriter:
labels = {str(key): value for key, value in value_labels.items()}
or for earlier versions of python:
labels = dict((str(key), value) for key, value in value_labels.items())
The best long-term solution, though, is to re-factor your code so that the keys don't have to be numbers.
UPDATE:
If the dicts are nested, then try this:
labels = {str(name): {str(key): label for key, label in mapping.items()}
          for name, mapping in value_labels.items()}
Related
I’m trying to better understand Python dictionaries and want to use a dictionary as a container for several variables in my code. Most examples I have found use strings as dictionary keys, which means the keys have to be written in quotation marks. However, I found out that the quotation marks are not needed if the key is first assigned a value as a variable and then placed in the dictionary. The variable is then actually an immutable value. In that case, even if one changes the value stored under the key, the original value remains in the key and can be retrieved with the dictionary method .keys() (and thus be used to restore the first given value). However, I’m wondering whether this is a proper way of coding, or whether it is better to use a class as a variable container, which looks simpler but is perhaps slower when executed. Both approaches lead to the same result. See my example below.
class Container():
    def __init__(self):
        self.a = 15
        self.b = 17
# first given values
a = 5
b = 7
# dictionary approach
container = {a:15, b:17}
print('values in container: ', container[a], container[b])
container[a], container[b] = 25, 27
print('keys and values in container: ', container[a], container[b])
for key in container.keys():
    print('firstly given values: ', key)
print('\n')
# class approach
cont = Container()
print('values in cont: ', cont.a, cont.b)
cont.a, cont.b = 25, 27
print('keys and values in cont: ', cont.a, cont.b)
However, I found out that one does not need to use quotation marks if the key is firstly given a value and after that placed in a dictionary.
This isn’t really what’s happening. Your code isn’t using 'a' and 'b' as dictionary keys. It’s using the values of the variables a and b — which happen to be the integers 5 and 7, respectively.
Subsequent access to the dictionary also happens by value: whether you write container[a] or container[5] doesn’t matter (as long as a is in scope and unchanged). But it is not the same as container['a'], and the latter would fail here.
You can also inspect the dictionary itself to see that it doesn’t have a key called 'a' (or unquoted, a):
>>> print(container)
{5: 15, 7: 17}
Ultimately, if you want to use names (rather than values) to access data, use a class, not a dictionary. Use a dictionary when the keys are given as values.
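A minimal sketch of the point about keys being values rather than names, using a reduced version of the container from the question:

```python
a = 5
container = {a: 15}

print(container[5])        # same entry as container[a]
print('a' in container)    # the name "a" was never a key

a = 6                      # rebinding the variable does not touch the dict
print(a in container)      # the dict still only knows the key 5
print(list(container))
```

After the rebinding, container[a] would raise a KeyError, which is the risk of relying on variables as keys.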
Later you may assign other values to a and b, and the code using dictionary will crash. Using a variable as a key is not a good practice. Do it with the class. You may also add the attributes to the constructor of your class.
class Container():
    def __init__(self, a, b):
        self.a = a
        self.b = b
# creating
cont = Container(15, 17)
# changing
cont.a, cont.b = 25, 27
I would recommend the class approach, because the dict approach in this case does not seem a proper way to code.
When you do :
a = 5
b = 7
container = {a:15, b:17}
You actually do :
container = {5:15, 7:17}
But this is "hidden", so there is a risk that later you reassign your variables, or that you just get confused with this kind of dictionary :
container = {
a:15,
b:17,
"a": "something"
}
This is worded terribly but I'm very new to coding, here is an example of what I want to do:
import random
dict={
'option1': 1,
'option2': 2,
}
randomDictionaryChoice=random.choice(list(dict.keys()))
valueOfThatKey=dict.get('randomDictionaryChoice')
print(valueOfThatKey)
At the moment it prints "None" instead of 1 or 2.
This is likely structured badly or bad etiquette so feel free to comment on that too. Thanks in advance.
You have quotes around 'randomDictionaryChoice', so it's looking for an entry in the dict with that literal word as the key (which doesn't exist). Remove the quotes:
>>> valueOfThatKey=dict.get(randomDictionaryChoice)
>>> print(valueOfThatKey)
2
(also -- you should avoid using the names of builtins like dict, list, or string for variable names)
You have no 'randomDictionaryChoice' key, indeed. Pass in the variable, not a string:
valueOfThatKey = somedict.get(randomDictionaryChoice)
Note that you shouldn't use dict to name your dictionary, as that masks the built-in type.
If you don't need the key, you could just pick directly from dict.values():
random_value = random.choice(list(somedict.values()))
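Putting the suggestions together, a short sketch (the name options is just an illustrative replacement for the shadowed dict):

```python
import random

options = {'option1': 1, 'option2': 2}

# Pick a key, then look it up using the variable itself (no quotes).
chosen_key = random.choice(list(options))
chosen_value = options[chosen_key]

# Or pick a (key, value) pair in a single step.
key, value = random.choice(list(options.items()))
print(chosen_key, chosen_value, key, value)
```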
I've found how to split a delimited string into key:value pairs in a dictionary elsewhere, but I have an incoming string that also includes two parameters that amount to dictionaries themselves: parameters with one or three key:value pairs inside:
clientid=b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0&keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234,optdata:75BCD15&md=AMT-Cam:avatar&playbackmode=st&ver=6&sessionid=&mk=PC&junketid=1342177342&version=6.7.8.9012
Obviously these are dummy parameters to obfuscate proprietary code, here. I'd like to dump all this into a dictionary with the userdata and md keys' values being dictionaries themselves:
requestdict {'clientid' : 'b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0', 'keyid' : '987654321', 'userdata' : {'ip' : '192.168.10.10', 'deviceid' : '1234', 'optdata' : '75BCD15'}, 'md' : {'Cam' : 'avatar'}, 'playbackmode' : 'st', 'ver' : '6', 'sessionid' : '', 'mk' : 'PC', 'junketid' : '1342177342', 'version' : '6.7.8.9012'}
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries? What would the syntax be? If not, I suppose I'll have to split by & and then check & handle splits that contain : but even then I can't figure out the syntax. Can someone help? Thanks!
I basically took Kyle's answer and made it more future-friendly:
def dictelem(input):
    parts = input.split('&')
    listing = [part.split('=') for part in parts]
    result = {}
    for entry in listing:
        head, tail = entry[0], ''.join(entry[1:])
        if ':' in tail:
            entries = tail.split(',')
            result.update({ head : dict(e.split(':') for e in entries) })
        else:
            result.update({head: tail})
    return result
Here's a two-liner that does what I think you want:
dictelem = lambda x: x if ':' not in x[1] else [x[0],dict(y.split(':') for y in x[1].split(','))]
a = dict(dictelem(x.split('=')) for x in input.split('&'))
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries?
Of course you can, but (a) you probably don't want to, because nested comprehensions beyond two levels tend to get unreadable, and (b) this super-simple syntax won't work for cases like yours, where only some of the data can be turned into a dict.
For example, what should happen with 'PC'? Do you want to make that into {'PC': None}? Or maybe the set {'PC'}? Or the list ['PC']? Or just leave it alone? You have to decide, and write the logic for that, and trying to write it as an expression will make your decision very hard to read.
So, let's put that logic in a separate function:
def parseCommasAndColons(s):
    bits = [bit.split(':') for bit in s.split(',')]
    try:
        return dict(bits)
    except ValueError:
        return bits
This will return a dict like {'ip': '192.168.10.10', 'deviceid': '1234', 'optdata': '75BCD15'} or {'AMT-Cam': 'avatar'} for cases where each comma-separated component has a colon inside it, but a list of lists like [['1342177342']] for cases where any of them don't.
Even this may be a little too clever; I might make the "is this in dictionary format" check more explicit instead of just trying to convert the list of lists and see what happens.
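One way the more explicit check could look, as a sketch only: the name parse_value and the choice to leave non-dictionary values as plain strings are assumptions, not something the question settled.

```python
def parse_value(text):
    """Return a dict if every comma-separated part contains a colon,
    otherwise return the text unchanged (illustrative helper)."""
    parts = text.split(',')
    if all(':' in part for part in parts):
        # split(':', 1) keeps any further colons inside the value.
        return dict(part.split(':', 1) for part in parts)
    return text


userdata = parse_value('ip:192.168.10.10,deviceid:1234,optdata:75BCD15')
md = parse_value('AMT-Cam:avatar')
mk = parse_value('PC')
sessionid = parse_value('')
print(userdata, md, repr(mk), repr(sessionid))
```

Because the test is explicit, a value that mixes colon and non-colon parts is returned as-is rather than failing halfway through a dict() call.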
Either way, how would you put that back into your original comprehension?
Well, you want to call it on the value in the line.split('='). So let's add a function for that:
def parseCommasAndColonsForValue(keyvalue):
    if len(keyvalue) == 2:
        return keyvalue[0], parseCommasAndColons(keyvalue[1])
    else:
        return keyvalue
requestDict = dict(parseCommasAndColonsForValue(line.split('='))
                   for line in clientRequest.split('&'))
One last thing: Unless you need to run on older versions of Python, you shouldn't often be calling dict on a generator expression. If it can be rewritten as a dictionary comprehension, it will almost certainly be clearer that way, and if it can't be rewritten as a dictionary comprehension, it probably shouldn't be a 1-liner expression in the first place.
Of course breaking expressions up into separate expressions, turning some of them into statements or even functions, and naming them does make your code longer—but that doesn't necessarily mean worse. About half of the Zen of Python (import this) is devoted to explaining why. Or one quote from Guido: "Python is a bad language for code golf, on purpose."
If you really want to know what it would look like, let's break it into two steps:
>>> {k: [bit2.split(':') for bit2 in v.split(',')] for k, v in (bit.split('=') for bit in s.split('&'))}
{'clientid': [['b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0']],
'junketid': [['1342177342']],
'keyid': [['987654321']],
'md': [['AMT-Cam', 'avatar']],
'mk': [['PC']],
'playbackmode': [['st']],
'sessionid': [['']],
'userdata': [['ip', '192.168.10.10'],
['deviceid', '1234'],
['optdata', '75BCD15']],
'ver': [['6']],
'version': [['6.7.8.9012']]}
That illustrates why you can't just add a dict call for the inner level—because most of those things aren't actually dictionaries, because they had no colons. If you changed that, then it would just be this:
{k: dict(bit2.split(':') for bit2 in v.split(',')) for k, v in (bit.split('=') for bit in s.split('&'))}
I don't think that's very readable, and I doubt most Python programmers would. Reading it 6 months from now and trying to figure out what I meant would take a lot more effort than writing it did.
And trying to debug it will not be fun. What happens if you run that on your input, with missing colons? ValueError: dictionary update sequence element #0 has length 1; 2 is required. Which sequence? No idea. You have to break it down step by step to see what doesn't work. That's no fun.
So, hopefully that illustrates why you don't want to do this.
I need to access all the non-integer keys for a dict that looks like:
result = {
0 : "value 1",
1 : "value 2",
"key 1" : "value 1",
"key 2" : "value 2",
}
I am currently doing this by:
headers = [header for header in tmp_dict.keys() if not isinstance(header, int)]
My question:
Is there a way to do this without type checking?
This tmp_dict is coming out of a query using pymssql with the as_dict=True attribute, and for some reason it returns all the column names with data as expected, but also includes the same data indexed by integers. How can I get my query result as a dictionary with only the column values and data?
Thanks for your help!
PS - Despite my issues being resolved by potentially answering 2, I'm curious how this can be done without type checking. Mainly for the people who say "never do type checking, ever."
With regard to your question about type checking, the duck-type approach would be to see whether it can be converted to or used as an int.
def can_be_int(obj):
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True
headers = [header for header in tmp_dict.keys() if not can_be_int(header)]
Note that floats can be converted to ints by truncating them, so this isn't necessarily exactly equivalent.
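To make the difference concrete, here is a small comparison on a made-up row (the keys "2019" and 1.5 are invented for illustration and would not come from pymssql):

```python
def can_be_int(obj):
    try:
        int(obj)
    except (TypeError, ValueError):
        return False
    return True


row = {0: "value 1", "key 1": "value 1", "2019": "value 2", 1.5: "value 3"}

duck_typed = [key for key in row if not can_be_int(key)]
type_checked = [key for key in row if not isinstance(key, int)]
print(duck_typed)
print(type_checked)
```

The duck-typed filter also drops the digit-only string and the float, while isinstance only drops the genuine integer key.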
A slight variation on the above would be to use coerce(0, obj) in place of int(obj) (coerce is a Python 2 built-in and was removed in Python 3). This will allow any kind of object that can be converted to a common type with an integer. You could also do something like 0 + obj and 1 * obj which will check for something that can be used in a mathematical expression with integers.
You could also check to see whether its string representation is all digits:
headers = [header for header in tmp_dict.keys() if not str(header).isdigit()]
This is probably closer to a solution that doesn't use type-checking, although it will be slower, and it's of course entirely possible that a column name would be a string that is only digits! (Which would fail with many of these approaches, to be honest.)
Sometimes explicit type-checking really is the best choice, which is why the language has tools for letting you check types. In this situation I think you're fine, especially since the result dictionary is documented to have only integers and strings as keys. And you're doing it the right way by using isinstance() rather than explicitly checking type() == int.
Looking at the source code of pymssql (1.0.2), it is clear that there is no option for the module to not generate data indexed by integers. But note that data indexed by column name can be omitted if the column name is empty.
/* mssqldbmodule.c */
PyObject *fetch_next_row_dict(_mssql_connection *conn, int raise) {
    [...]
    for (col = 1; col <= conn->num_columns; col++) {
        [...]
        // add key by column name, do not add if name == ''
        if (strlen(PyString_AS_STRING(name)) != 0)
            if ((PyDict_SetItem(dict, name, val)) == -1)
                return NULL;
        // add key by column number
        if ((PyDict_SetItem(dict, PyInt_FromLong(col-1), val)) == -1)
            return NULL;
    }
    [...]
}
Regarding your first question, filtering result set by type checking is surely the best way to do that. And this is exactly how pymssql is returning data when as_dict is False:
if self.as_dict:
    row = iter(self._source).next()
    self._rownumber += 1
    return row
else:
    row = iter(self._source).next()
    self._rownumber += 1
    return tuple([row[r] for r in sorted(row.keys()) if type(r) == int])
The rationale behind as_dict=True is that you can access by index and by name. Normally you'd get a tuple you index into, but for compatibility reasons being able to index a dict as though it was a tuple means that code depending on column numbers can still work, without being aware that column names are available.
If you're just using result to retrieve columns (either by name or index), I don't see why you're concerned about removing them? Just carry on regardless. (Unless for some reason you plan to pickle or otherwise persist the data elsewhere...)
The best way to filter them out though, is using isinstance - duck typing in this case is actually unpythonic and inefficient. Eg:
names_only = dict( (k, v) for k,v in result.iteritems() if not isinstance(k, int) )
Instead of a try and except dance.
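The line above uses iteritems(), which exists only in Python 2. A sketch of the same filter for Python 3, using the sample dict from the question:

```python
result = {
    0: "value 1",
    1: "value 2",
    "key 1": "value 1",
    "key 2": "value 2",
}

# Keep only the entries whose key is not an integer column index.
names_only = {key: value for key, value in result.items()
              if not isinstance(key, int)}
print(names_only)
```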
>>> sorted(result)[len(result)/2:]
['key 1', 'key 2']
This will remove the duplicated integer-keyed entries (it relies on Python 2 sorting integers before strings). I think what you're doing is fine though.
I have a question regarding how I would perform the following task in python.
(I use python 3k)
what I have are several variables which can yield further variables on top of those
and each of those have even more variables
for example:
a generic name would be
item_version_type =
where each part (item, version, and type) refer to different variables(here there are 3 for each)
item = item_a, item_b, item_c
version = range(1,3)
itemtype = itemtype_a, itemtype_b, itemtype_c
simply listing each name and defining it is annoying:
itema_ver1_typea =
itemb_ver1_typea =
itemc_ver1_typea =
itema_ver2_typea =
etc.
etc.
etc.
especially when I have something where one variable is dependent on something else
for example:
if value == True:
version = ver + 1
and to top it off, this whole example is rather simple compared to what I'm actually
working with.
one thing I am curious about is using multiple "." type of classes such as:
item.version.type
I know that this can be done
I just can't figure out how to get a class with more than one dot
either that or if anyone can point me to a better method
Thanks for help.
Grouping of data like this can be done in three ways in Python.
First way is tuples:
myvariable = ('Sammalamma', 1, 'Text')
The second way is a dictionary:
myvariable = {'value': 'Sammalamma', 'version': 1, 'type': 'Text'}
And the third way is a class:
class MyClass(object):
    def __init__(self, value, version, type):
        self.value = value
        self.version = version
        self.type = type
>>> myvariable = MyClass('Sammalamma', 1, 'Text')
>>> myvariable.value
'Sammalamma'
>>> myvariable.version
1
>>> myvariable.type
'Text'
Which one to use in each case is up to you, although in this case I would claim that the tuple doesn't seem to be the best choice, I would go for a dictionary or a class.
None of this is unique to Python 3, it works in any version of Python.
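For the "more than one dot" part of the question, attribute access chains naturally when an attribute's value is itself an object. A minimal sketch, where the class name Node and the spoon/v1/v2 names are purely illustrative (attribute names cannot start with a digit, so something like spoon.3 is not valid syntax):

```python
class Node(object):
    """Tiny attribute container; each keyword becomes an attribute."""
    def __init__(self, **children):
        for name, child in children.items():
            setattr(self, name, child)


spoon = Node(
    v1=Node(type='straight'),
    v2=Node(type='bent'),
)

print(spoon.v2.type)
spoon.v1.type = 'polished'   # attributes can be reassigned at any level
print(spoon.v1.type)
```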
In addition to @Lennart Regebro's answer, if the items are immutable:
import collections
Item = collections.namedtuple('Item', 'value version type')
items = [Item(val, 'ver'+ver, t)
for val in 'abc' for ver in '12' for t in ['typea']]
print(items[0])
# -> Item(value='a', version='ver1', type='typea')
item = items[1]
print(item.value, item.type)
# -> b typea
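Building on the namedtuple idea, every combination can be generated with itertools.product and then filtered on any one field. This is only a sketch: the Part name and the limb/side/level values are borrowed from the asker's clarification for illustration.

```python
import collections
import itertools

Part = collections.namedtuple('Part', 'limb side level')

# One Part per combination of limb, side and level.
parts = [Part(*combo) for combo in itertools.product(
    ['arm', 'leg'], ['left', 'right', 'both'], ['upper', 'lower'])]

# "Truth tests" become simple filters on a single field.
legs = [p for p in parts if p.limb == 'leg']
uppers = [p for p in parts if p.level == 'upper']

print(len(parts), len(legs), len(uppers))
print(legs[0])
```

Values can then be attached by using the Part tuples as dictionary keys, since namedtuples are hashable.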
sorry for posting this here instead of the comments but I have no clue how to work the site here.
for clarification
What I need is basically to be able to get an output where
I could take a broad area (item), narrow it further (version), and then even further (type, as in type of item; let's say the types are spoon, knife, fork)
or a better description is like arm.left.lower = lower left arm
where I could also have like leg.left.lower
so I could have arm.both.upper to get both left and right upper arms
where a value would be assigned to both.
what I need is to be able to do truth tests etc. and have it return the allowable values
such as
if leg == True
output is --> leg.both.lower, leg.both.upper, leg.left.upper leg.right.upper, etc., etc., etc.
if upper == True
output is --> leg.both.upper, leg.left.upper, etc., etc., etc.
hopefully that helps
Basically I get how to get something like item.version but how do I get something
like item.version.type
I need to have it to be more specific than just item.version
I need to be able to tell if item is this and version is that then type will be x
like
item.version.type
if version == 3:
item.version = spoon.3.bent
#which is different from
if version == 2:
item.version.type = spoon.2.bent