Python: Lazy String Decoding - python

I'm writing a parser, and there is LOTS of text to decode but most of my users will only care about a few fields from all the data. So I only want to do the decoding when a user actually uses some of the data. Is this a good way to do it?
class LazyString(str):
    def __init__(self, v):
        self.value = v
    def __str__(self):
        r = ""
        s = self.value
        for i in xrange(0, len(s), 2):
            r += chr(int(s[i:i+2], 16))
        return r
def p_buffer(p):
    """buffer : HASH chars"""
    p[0] = LazyString(p[2])
Is that the only method I need to override?

I'm not sure how implementing a string subclass is of much benefit here. It seems to me that if you're processing a stream containing petabytes of data, whenever you've created an object that you don't need to you've already lost the game. Your first priority should be to ignore as much input as you possibly can.
You could certainly build a string-like class that did this:
class mystr(str):
    def __init__(self, value):
        self.value = value
        self._decoded = None
    @property
    def decoded(self):
        if self._decoded is None:
            self._decoded = self.value.decode("hex")
        return self._decoded
    def __repr__(self):
        return self.decoded
    def __len__(self):
        return len(self.decoded)
    def __getitem__(self, i):
        return self.decoded.__getitem__(i)
    def __getslice__(self, i, j):
        return self.decoded.__getslice__(i, j)
and so on. A weird thing about doing this is that if you subclass str, every method that you don't explicitly implement will be called on the value that's passed to the constructor:
>>> s = mystr('a0a1a2')
>>> s
 ¡¢
>>> len(s)
3
>>> s.capitalize()
'A0a1a2'

I don't see any kind of lazy evaluation in your code. The fact that you use xrange only means that the integers from 0 to len(s) will be generated on demand. The whole string r will be decoded during string conversion anyway.
The best way to implement a lazy sequence in Python is to use generators. You could try something like this:
def lazy(v):
    for i in xrange(0, len(v), 2):
        yield int(v[i:i+2], 16)
list(lazy("0a0a0f"))
Out: [10, 10, 15]

What you're doing is built in already:
s = "i am a string!".encode('hex')
# what you do
r = ""
for i in xrange(0, len(s), 2):
    r += chr(int(s[i:i+2], 16))
# but decoding is builtin
print r==s.decode('hex') # => True
As you can see your whole decoding is s.decode('hex').
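Note that the 'hex' string codec used above is Python 2 only. As a rough sketch of the same comparison on Python 3 (assuming the hex text encodes UTF-8 bytes; the variable names here are just for illustration), the built-in equivalents are bytes.fromhex and binascii.unhexlify:

```python
import binascii

text = "i am a string!"
hex_text = text.encode("utf-8").hex()

# Manual loop, equivalent to the code in the question.
manual = "".join(
    chr(int(hex_text[i:i + 2], 16)) for i in range(0, len(hex_text), 2)
)

# Built-in decoders: both return bytes, so decode to str afterwards.
via_fromhex = bytes.fromhex(hex_text).decode("utf-8")
via_binascii = binascii.unhexlify(hex_text).decode("utf-8")

print(manual == via_fromhex == via_binascii == text)
```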
But "lazy" decoding sounds like premature optimization to me. You'd need gigabytes of data to even notice it. Try profiling; the .decode is already 50 times faster than your old code.
Maybe you want something like this:
class DB(object): # dunno what data it is ;)
    def __init__(self, data):
        self.data = data
        self.decoded = {} # maybe cache if the field data is long
    def __getitem__(self, name):
        try:
            return self.decoded[name]
        except KeyError:
            # this copies the field's data
            self.decoded[name] = ret = self.data[self._get_field_slice(name)].decode('hex')
            return ret
    def _get_field_slice(self, name):
        # find out what part to decode, return the index in the data
        return slice( ... )
db = DB(encoded_data)
print db["some_field"] # find out where the field is, get its data and decode it

The methods you need to override really depend on how you are planning to use your new string type.
However, your str-based type looks a little suspicious to me. Have you looked into the implementation of str to check that it has the value attribute that you are setting in your __init__()? Performing a dir(str) does not indicate that there is any such attribute on str. This being the case, the normal str methods will not be operating on your data at all. I doubt that is the effect you want; otherwise, what would be the advantage of sub-classing?
Sub-classing base data types is a little strange anyway unless you have very specific requirements. For the lazy evaluation you want, you are probably better off creating your own class that contains a string rather than sub-classing str, and writing your client code to work with that class. You will then be free to add the just-in-time evaluation you want in a number of ways. An example using the descriptor protocol can be found in this presentation: Python's Object Model (search for "class Jit(object)" to get to the relevant section).
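To make the "class that contains a string" idea concrete, here is a minimal sketch in Python 3. LazyHex and its attribute names are hypothetical, not taken from the presentation mentioned above, and latin-1 is assumed so that each byte maps to one character, as chr(int(..., 16)) does in the question:

```python
class LazyHex:
    """Holds hex text and decodes it only on first access (hypothetical helper)."""

    def __init__(self, raw):
        self.raw = raw          # undecoded hex text, e.g. "68656c6c6f"
        self._decoded = None    # cache, filled on first use

    @property
    def value(self):
        # Decode once, then reuse the cached result.
        if self._decoded is None:
            self._decoded = bytes.fromhex(self.raw).decode("latin-1")
        return self._decoded

    def __str__(self):
        return self.value

    def __len__(self):
        # Two hex digits per character, so no decoding is needed here.
        return len(self.raw) // 2


field = LazyHex("68656c6c6f")
print(len(field))   # does not trigger decoding
print(str(field))   # decoding happens here
```

Because the wrapper is not a str, there is no risk of inherited string methods silently operating on the undecoded text.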

The question is incomplete, in that the answer will depend on details of the encoding you use.
Say, if you encode a list of strings as pascal strings (i.e. prefixed with string length encoded as a fixed-size integer), and say you want to read the 100th string from the list, you may seek() forward for each of the first 99 strings and not read their contents at all. This will give some performance gain if the strings are large.
If, OTOH, you encode a list of strings as concatenated 0-terminated strings, you would have to read all bytes until the 100th 0.
Also, you're speaking about some "fields" but your example looks completely different.
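As an illustration of the length-prefixed case described above, here is a small sketch in Python 3. It assumes a 4-byte big-endian length prefix, which is my choice for the example and not something stated in the question; write_records and read_nth are hypothetical helper names:

```python
import io
import struct


def write_records(stream, records):
    """Write each record as a 4-byte big-endian length followed by the data."""
    for data in records:
        stream.write(struct.pack(">I", len(data)))
        stream.write(data)


def read_nth(stream, n):
    """Return record n (0-based), seeking past earlier records without reading them."""
    for _ in range(n):
        (length,) = struct.unpack(">I", stream.read(4))
        stream.seek(length, io.SEEK_CUR)   # skip the payload entirely
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)


buf = io.BytesIO()
write_records(buf, [b"alpha", b"bravo!", b"charlie"])
buf.seek(0)
print(read_nth(buf, 2))
```

Only the 4-byte prefixes of the skipped records are read, which is where the gain for large strings comes from.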

Related

How to access a dictionary value from within the same dictionary in Python? [duplicate]

I'm new to Python, and am sort of surprised I cannot do this.
dictionary = {
    'a' : '123',
    'b' : dictionary['a'] + '456'
}
I'm wondering what the Pythonic way to do this correctly in my script is, because I feel like I'm not the only one who has tried to do this.
EDIT: Enough people were wondering what I'm doing with this, so here are more details about my use case. Let's say I want to keep dictionary objects to hold file system paths. The paths are relative to other values in the dictionary. For example, this is what one of my dictionaries may look like.
dictionary = {
    'user': 'sholsapp',
    'home': '/home/' + dictionary['user']
}
It is important that at any point in time I may change dictionary['user'] and have all of the dictionary's values reflect the change. Again, this is an example of what I'm using it for, so I hope that it conveys my goal.
From my own research I think I will need to implement a class to do this.
There's no need to fear creating new classes. You can take advantage of Python's string formatting capabilities and simply do:
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item) % self
dictionary = MyDict({
    'user' : 'gnucom',
    'home' : '/home/%(user)s',
    'bin' : '%(home)s/bin'
})
print dictionary["home"]
print dictionary["bin"]
The nearest I came up with, without creating an object:
dictionary = {
    'user' : 'gnucom',
    'home' : lambda:'/home/'+dictionary['user']
}
print dictionary['home']()
dictionary['user']='tony'
print dictionary['home']()
>>> dictionary = {
... 'a':'123'
... }
>>> dictionary['b'] = dictionary['a'] + '456'
>>> dictionary
{'a': '123', 'b': '123456'}
It works fine this way. In your version, dictionary hasn't been defined yet at the point where you try to use it (because the dictionary literal has to be evaluated first).
But be careful because this assigns to the key of 'b' the value referenced by the key of 'a' at the time of assignment and is not going to do the lookup every time. If that is what you are looking for, it's possible but with more work.
What you're describing in your edit is how an INI config file works. Python does have a built in library called ConfigParser which should work for what you're describing.
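As a minimal sketch of that idea, written for Python 3 (where the module is named configparser), the default interpolation resolves %(name)s references at lookup time, so a change to user is reflected in the other values. The section name paths is made up for this example:

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("""
[paths]
user = sholsapp
home = /home/%(user)s
bin = %(home)s/bin
""")

paths = parser["paths"]
print(paths["bin"])

# Values are interpolated on every lookup, so changes propagate.
paths["user"] = "tony"
print(paths["bin"])
```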
This is an interesting problem. It seems like Greg has a good solution. But that's no fun ;)
jsbueno has a very elegant solution, but it only applies to strings (as you requested).
The trick to a 'general' self referential dictionary is to use a surrogate object. It takes a few (understatement) lines of code to pull off, but the usage is about what you want:
S = SurrogateDict(AdditionSurrogateDictEntry)
d = S.resolve({'user': 'gnucom',
               'home': '/home/' + S['user'],
               'config': [S['home'] + '/.emacs', S['home'] + '/.bashrc']})
The code to make that happen is not nearly so short. It lives in three classes:
import abc
class SurrogateDictEntry(object):
    __metaclass__ = abc.ABCMeta
    def __init__(self, key):
        """record the key on the real dictionary that this will resolve to a
        value for
        """
        self.key = key
    def resolve(self, d):
        """ return the actual value"""
        if hasattr(self, 'op'):
            # any operation done on self will store its name in self.op.
            # if this is set, resolve it by calling the appropriate method
            # now that we can get self.value out of d
            self.value = d[self.key]
            return getattr(self, self.op + 'resolve__')()
        else:
            return d[self.key]
    @staticmethod
    def make_op(opname):
        """A convenience helper. This will be the form of all op hooks for subclasses
        The actual logic for the op is in __op__resolve__ (e.g. __add__resolve__)
        """
        def op(self, other):
            self.stored_value = other
            self.op = opname
            return self
        op.__name__ = opname
        return op
Next comes the concrete class. Simple enough.
class AdditionSurrogateDictEntry(SurrogateDictEntry):
    __add__ = SurrogateDictEntry.make_op('__add__')
    __radd__ = SurrogateDictEntry.make_op('__radd__')
    def __add__resolve__(self):
        return self.value + self.stored_value
    def __radd__resolve__(self):
        return self.stored_value + self.value
Here's the final class
class SurrogateDict(object):
    def __init__(self, EntryClass):
        self.EntryClass = EntryClass
    def __getitem__(self, key):
        """record the key and return"""
        return self.EntryClass(key)
    @staticmethod
    def resolve(d):
        """I eat generators resolve self references"""
        stack = [d]
        while stack:
            cur = stack.pop()
            # This just tries to set it to an appropriate iterable
            it = xrange(len(cur)) if not hasattr(cur, 'keys') else cur.keys()
            for key in it:
                # sorry for being a douche. Just register your class with
                # SurrogateDictEntry and you can pass whatever.
                while isinstance(cur[key], SurrogateDictEntry):
                    cur[key] = cur[key].resolve(d)
                # I'm just going to check for iter but you can add other
                # checks here for items that we should loop over.
                if hasattr(cur[key], '__iter__'):
                    stack.append(cur[key])
        return d
In response to gnucom's question about why I named the classes the way that I did:
The word surrogate is generally associated with standing in for something else, so it seemed appropriate because that's what the SurrogateDict class does: an instance replaces the 'self' references in a dictionary literal. That being said, (other than just being straight up stupid sometimes) naming is probably one of the hardest things for me about coding. If you (or anyone else) can suggest a better name, I'm all ears.
I'll provide a brief explanation. Throughout, S refers to an instance of SurrogateDict and d is the real dictionary.
1. A reference S[key] triggers S.__getitem__, which causes SurrogateDictEntry(key) to be placed in d.
2. When S[key] = SurrogateDictEntry(key) is constructed, it stores key. This will be the key into d for the value that this instance of SurrogateDictEntry is acting as a surrogate for.
3. After S[key] is returned, it is either entered into d or has some operation(s) performed on it. If an operation is performed on it, it triggers the relevant __op__ method, which simply stores the value that the operation is performed with and the name of the operation, and then returns itself. We can't actually resolve the operation yet because d hasn't been constructed.
4. After d is constructed, it is passed to S.resolve. This method loops through d, finding any instances of SurrogateDictEntry and replacing them with the result of calling the resolve method on the instance.
5. The SurrogateDictEntry.resolve method receives the now constructed d as an argument and can use the value of key that it stored at construction time to get the value that it is acting as a surrogate for. If an operation was performed on it after creation, the op attribute will have been set to the name of the operation that was performed. If the class has an __op__ method, then it has an __op__resolve__ method with the actual logic that would normally be in the __op__ method. So now we have the logic (__op__resolve__) and all necessary values (self.value, self.stored_value) to finally get the real value of d[key]. We return that, and step 4 places it in the dictionary.
6. Finally, the SurrogateDict.resolve method returns d with all references resolved.
That's a rough sketch. If you have any more questions, feel free to ask.
If you, just like me, are wondering how to make @jsbueno's snippet work with {} style substitutions, below is the example code (which is probably not very efficient, though):
import string
class MyDict(dict):
    def __init__(self, *args, **kw):
        super(MyDict,self).__init__(*args, **kw)
        self.itemlist = super(MyDict,self).keys()
        self.fmt = string.Formatter()
    def __getitem__(self, item):
        return self.fmt.vformat(dict.__getitem__(self, item), {}, self)
xs = MyDict({
    'user' : 'gnucom',
    'home' : '/home/{user}',
    'bin' : '{home}/bin'
})
>>> xs["home"]
'/home/gnucom'
>>> xs["bin"]
'/home/gnucom/bin'
I tried to make it work by simply replacing % self with .format(**self), but it turns out that doesn't work for nested expressions (like 'bin' in the listing above, which references 'home', which has its own reference to 'user') because of the evaluation order (the ** expansion is done before the actual format call, and it's not delayed as in the original % version).
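On Python 3 there is another option that avoids the ** expansion: str.format_map passes the mapping itself to the formatter, so each {name} lookup goes through the overridden __getitem__ and nested references are resolved on demand. A minimal sketch, where FormatDict is a hypothetical name and non-string values are returned unchanged by assumption:

```python
class FormatDict(dict):
    """dict whose string values are formatted against the dict on lookup."""

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, str):
            # format_map looks keys up through this same __getitem__,
            # so '{home}' is itself expanded before being substituted.
            return value.format_map(self)
        return value


paths = FormatDict(user="gnucom", home="/home/{user}", bin="{home}/bin")
print(paths["bin"])

paths["user"] = "tony"
print(paths["bin"])
```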
Write a class, maybe something with properties:
class PathInfo(object):
    def __init__(self, user):
        self.user = user
    @property
    def home(self):
        return '/home/' + self.user
p = PathInfo('thc')
print p.home # /home/thc
As sort of an extended version of @Tony's answer, you could build a dictionary subclass that calls its values if they are callables:
class CallingDict(dict):
    """Returns the result rather than the value of referenced callables.
    >>> cd = CallingDict({1: "One", 2: "Two", 'fsh': "Fish",
    ...                   "rhyme": lambda d: ' '.join((d[1], d['fsh'],
    ...                                                d[2], d['fsh']))})
    >>> cd["rhyme"]
    'One Fish Two Fish'
    >>> cd[1] = 'Red'
    >>> cd[2] = 'Blue'
    >>> cd["rhyme"]
    'Red Fish Blue Fish'
    """
    def __getitem__(self, item):
        it = super(CallingDict, self).__getitem__(item)
        if callable(it):
            return it(self)
        else:
            return it
Of course this would only be usable if you're not actually going to store callables as values. If you need to be able to do that, you could wrap the lambda declaration in a function that adds some attribute to the resulting lambda, and check for it in CallingDict.__getitem__, but at that point it's getting complex, and long-winded, enough that it might just be easier to use a class for your data in the first place.
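For what it's worth, the marker-attribute variant described above might look roughly like this in Python 3. The names lazy, is_lazy and MarkedCallingDict are made up for this sketch:

```python
def lazy(func):
    """Mark func so that MarkedCallingDict calls it on lookup (hypothetical helper)."""
    func.is_lazy = True
    return func


class MarkedCallingDict(dict):
    def __getitem__(self, item):
        value = super().__getitem__(item)
        # Only call values that were explicitly marked; other callables
        # are returned untouched, so they can still be stored as data.
        if getattr(value, "is_lazy", False):
            return value(self)
        return value


paths = MarkedCallingDict({
    "user": "gnucom",
    "home": lazy(lambda d: "/home/" + d["user"]),
    "shout": str.upper,   # an ordinary callable stored as a plain value
})

print(paths["home"])
paths["user"] = "tony"
print(paths["home"])
print(paths["shout"]("quiet"))
```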
This is very easy in a lazily evaluated language (Haskell).
Since Python is strictly evaluated, we can do a little trick to turn things lazy:
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
d1 = lambda self: lambda: {
    'a': lambda: 3,
    'b': lambda: self()['a']()
}
# fix the d1, and evaluate it
d2 = Y(d1)()
# to get a
d2['a']() # 3
# to get b
d2['b']() # 3
Syntax-wise this is not very nice. That's because we need to explicitly construct lazy expressions with lambda: ... and explicitly evaluate them with ...(). It's the opposite of the problem in lazy languages, which need strictness annotations; here in Python we end up needing laziness annotations.
I think with some more meta-programming and some more tricks, the above could be made easier to use.
Note that this is basically how let-rec works in some functional languages.
The jsbueno answer in Python 3 :
class MyDict(dict):
    def __getitem__(self, item):
        return dict.__getitem__(self, item).format(self)
dictionary = MyDict({
    'user' : 'gnucom',
    'home' : '/home/{0[user]}',
    'bin' : '{0[home]}/bin'
})
print(dictionary["home"])
print(dictionary["bin"])
Here we use the Python 3 string formatting with curly braces {} and the .format() method.
Documentation : https://docs.python.org/3/library/string.html

How to think about OOP design choices for code re-usability and extensibility in Python?

Context:
I've got this composition (new vocabulary word for me) of an OneHotEncoder object:
class CharEncoder:
    characters = cn.ALL_LETTERS_ARRAY
    def __init__(self):
        self.encoder = OneHotEncoder(sparse=False).fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()
    def transform(self, word):
        word = np.array(list(word)).reshape(-1, 1)
        word_vect = self.encoder.transform(word)
        return word_vect
    def inverse_transform(self, X):
        word_arr = self.encoder.inverse_transform(X).reshape(-1,)
        return ''.join(word_arr)
As you can see it has a class attribute characters which is essentially an array of all the ASCII characters plus some punctuation.
I want to make this CharEncoder class useful for more than just ASCII. Maybe someone else would really like to use a different character set, and I want to allow them to do so. Or maybe they want to encode entire words instead of individual letters... who knows!?
My problem:
I feel like there are so many design choices here that could make this code re-usable for a slightly different task. I feel overwhelmed.
Do I make the character set a class attribute or an instance attribute?
Do I write getters and setters for the character set?
Do I instead write some parent class, and sub-classes for different character sets?
Or do I make users pass their own OneHotEncoder object to my class, and not worry about it myself?
My question:
What are some considerations that might help guide my design choice here?
I'd just make characters an instance attribute with a default value.
class CharEncoder:
    def __init__(self, characters=cn.ALL_LETTERS_ARRAY):
        self.characters = characters
        self.encoder = OneHotEncoder(sparse=False).fit(self.characters.reshape(-1, 1))
        self.categories = self.encoder.categories_[0].tolist()
Caution: If cn.ALL_LETTERS_ARRAY is mutable (i.e. a Python list or a NumPy array), use None as a sentinel value:
def __init__(self, characters=None):
    # Test against None explicitly. The shorter
    #     self.characters = characters or cn.ALL_LETTERS_ARRAY
    # treats an empty string/list/dict as "not given" because these
    # evaluate to False, and it raises ValueError for a NumPy array with
    # more than one element because its truth value is ambiguous.
    if characters is None:
        self.characters = cn.ALL_LETTERS_ARRAY
    else:
        self.characters = characters
Usage:
default_chars_encoder = CharEncoder() # using the default cn.ALL_LETTERS_ARRAY
custom_chars_encoder = CharEncoder(CUSTOM_CHARCTERS_SET) # using CUSTOM_CHARCTERS_SET
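To show the pattern end to end without depending on scikit-learn, here is a rough, dependency-free sketch. SimpleCharEncoder and DEFAULT_CHARACTERS are stand-ins I made up for CharEncoder and cn.ALL_LETTERS_ARRAY, and the one-hot rows are plain Python lists rather than what OneHotEncoder returns:

```python
import string

# Stand-in for cn.ALL_LETTERS_ARRAY (an assumption for this sketch).
DEFAULT_CHARACTERS = list(string.ascii_lowercase)


class SimpleCharEncoder:
    def __init__(self, characters=None):
        # None is the sentinel for "use the default character set".
        if characters is None:
            characters = DEFAULT_CHARACTERS
        self.categories = sorted(set(characters))
        self._index = {ch: i for i, ch in enumerate(self.categories)}

    def transform(self, word):
        """Return one one-hot row (a list of 0/1) per character in word."""
        rows = []
        for ch in word:
            row = [0] * len(self.categories)
            row[self._index[ch]] = 1
            rows.append(row)
        return rows

    def inverse_transform(self, rows):
        """Map one-hot rows back to the original string."""
        return "".join(self.categories[row.index(1)] for row in rows)


default_encoder = SimpleCharEncoder()          # default character set
vowel_encoder = SimpleCharEncoder("aeiou")     # caller-supplied character set

print(vowel_encoder.transform("ea"))
print(default_encoder.inverse_transform(default_encoder.transform("lazy")))
```

Passing the character set to the constructor keeps each encoder instance independent, which is the main argument for an instance attribute over a class attribute here.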

How to compose two functions whose outer function supplies arguments to the inner function

I have two similar codes that need to be parsed and I'm not sure of the most pythonic way to accomplish this.
Suppose I have two similar "codes"
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
both codes end with $$otherthing and contain a number of values separated by -
At first I thought of using functools.wraps to separate some of the common logic from the logic specific to each type of code, something like this:
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code, *args):
        _code = code.split('$$')[0]
        return f(code, *_code.split('-'))
    return wrapper
@parse_secret
def parse_code_1b(code, a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2b(code, a, b):
    b = b.split('|')[1]
    return (a,b)
However doing it this way makes it kind of confusing what parameters you should actually pass to the parse_code_* functions i.e.
parse_code_1b(secret_code_1)
parse_code_2b(secret_code_2)
So to keep the formal parameters of the function easier to reason about I changed the logic to something like this:
def _parse_secret(parse_func, code):
    _code = code.split('$$')[0]
    return parse_func(code, *_code.split('-'))
def _parse_code_1(code, a, b, c):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    a = a.split('|')[0]
    return (a,b,c)
def _parse_code_2(code, a, b):
    """
    a and b are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    b = b.split('|')[1]
    return (a,b)
def parse_code_1(code):
    return _parse_secret(_parse_code_1, code)
def parse_code_2(code):
    return _parse_secret(_parse_code_2, code)
Now it's easier to reason about what you pass to the functions:
parse_code_1(secret_code_1)
parse_code_2(secret_code_2)
However this code is significantly more verbose.
Is there a better way to do this? Would an object-oriented approach with classes make more sense here?
repl.it example
Functional approaches are more concise and make more sense.
We can start from expressing concepts in pure functions, the form that is easiest to compose.
Strip $$otherthing and split values:
parse_secret = lambda code: code.split('$$')[0].split('-')
Take one of inner values:
take = lambda value, index: value.split('|')[index]
Replace one of the values with its inner value:
parse_code = lambda values, p, q: \
    [take(v, q) if p == i else v for (i, v) in enumerate(values)]
These 2 types of codes have 3 differences:
Number of values
Position to parse "inner" values
Position of "inner" values to take
And we can compose parse functions by describing these differences. The split values are kept packed in a list so that things are easier to compose.
compose = lambda length, p, q: \
    lambda code: parse_code(parse_secret(code)[:length], p, q)
parse_code_1 = compose(3, 0, 0)
parse_code_2 = compose(2, 1, 1)
And use composed functions:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
results = [parse_code_1(secret_code_1), parse_code_2(secret_code_2)]
print(results)
I believe something like this could work:
secret_codes = ['asdf|qwer-sdfg-wert$$otherthing', 'qwersdfg-qw|er$$otherthing']
def parse_code(code):
    _code = code.split('$$')
    if '-' in _code[0]:
        return _parse_secrets(_code[1], *_code[0].split('-'))
    return _parse_secrets(_code[0], *_code[1].split('-'))
def _parse_secrets(code, a, b, c=None):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    if c is not None:
        return a.split('|')[0], b, c
    return a, b.split('|')[1]
for secret_code in secret_codes:
    print(parse_code(secret_code))
Output:
('asdf', 'sdfg', 'wert')
('qwersdfg', 'er')
I'm not sure about your secret data structure, but if the position of each element that contains | is also the index of the part you want to keep, and you have a matching number of secret values, you could also do something like this and handle an (almost) unlimited number of secrets:
def _parse_secrets(code, *data):
    """
    data is descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    i = 0
    decoded_secrets = []
    for secret in data:
        if '|' in secret:
            decoded_secrets.append(secret.split('|')[i])
        else:
            decoded_secrets.append(secret)
        i += 1
    return tuple(decoded_secrets)
I'm not really sure what exactly you mean, but I came up with an idea which might be what you are looking for.
What about using a simple function like this:
def split_secret_code(code):
    return [code] + code[:code.find("$$")].split("-")
And then just use:
parse_code_1(*split_secret_code(secret_code_1))
I'm not sure exactly what constraints you're working with, but it looks like:
There are different types of codes with different rules
The number of dash separated args can vary
Which arg has a pipe can vary
Straightforward Example
This is not too hard to solve, and you don't need fancy wrappers, so I would just drop them because it adds reading complexity.
def pre_parse(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def parse_type_1(code):
    dash_args = pre_parse(code)
    dash_args[0], toss = dash_args[0].split('|')
    return dash_args
def parse_type_2(code):
    dash_args = pre_parse(code)
    toss, dash_args[1] = dash_args[1].split('|')
    return dash_args
# Example call
parse_type_1(secret_code_1)
Trying to answer question as stated
You can supply arguments in this way by using python's native decorator pattern combined with *, which rolls/unrolls positional arguments into a tuple, so you don't need to know exactly how many there are.
def dash_args(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def pre_parse(f):
    def wrapper(code):
        # HERE is where the outer function, the wrapper,
        # supplies arguments to the inner function.
        return f(code, *dash_args(code))
    return wrapper
@pre_parse
def parse_type_1(code, *args):
    new_args = list(args)
    new_args[0], toss = args[0].split('|')
    return new_args
@pre_parse
def parse_type_2(code, *args):
    new_args = list(args)
    toss, new_args[1] = args[1].split('|')
    return new_args
# Example call:
parse_type_1(secret_code_1)
More Extendable Example
If for some reason you needed to support many variations on this kind of parsing, you could use a simple OOP setup, like
class BaseParser(object):
    def get_dash_args(self, code):
        dash_code, otherthing = code.split('$$')
        return dash_code.split('-')
class PipeParser(BaseParser):
    def __init__(self, arg_index, split_index):
        self.arg_index = arg_index
        self.split_index = split_index
    def parse(self, code):
        args = self.get_dash_args(code)
        pipe_arg = args[self.arg_index]
        args[self.arg_index] = pipe_arg.split('|')[self.split_index]
        return args
# Example call
pipe_parser_1 = PipeParser(0, 0)
pipe_parser_1.parse(secret_code_1)
pipe_parser_2 = PipeParser(1, 1)
pipe_parser_2.parse(secret_code_2)
My suggestion attempts the following:
to be non-verbose enough
to separate common and specific logic in a clear way
to be sufficiently extensible
Basically, it separates common and specific logic into different functions (you could do the same using OOP). The thing is that it uses a mapper variable that contains the logic to select a specific parser, according to each code's content. Here it goes:
def parse_common(code):
    """
    Provides common parsing logic.
    """
    encoded_components = code.split('$$')[0].split('-')
    return encoded_components
def parse_code_1(code, components):
    """
    Specific parsing for type-1 codes.
    """
    components[0] = components[0].split('|')[0] # decoding some type-1 component
    return tuple(components)
def parse_code_2(code, components):
    """
    Specific parsing for type-2 codes.
    """
    components[1] = components[1].split('|')[1] # decoding some type-2 component
    return tuple(components)
def parse_code_3(code, components):
    """
    Specific parsing for type-3 codes.
    """
    components[2] = components[2].split('||')[0] # decoding some type-3 component
    return tuple(components)
# ... and so on, if more codes need to be added ...
# Maps specific parser, according to the number of components
CODE_PARSER_SELECTOR = [
    (3, parse_code_1),
    (2, parse_code_2),
    (4, parse_code_3)
]
def parse_code(code):
    # executes common parsing
    components = parse_common(code)
    # selects specific parser
    parser_info = [s for s in CODE_PARSER_SELECTOR if len(components) == s[0]]
    if parser_info is not None and len(parser_info) > 0:
        parse_func = parser_info[0][1]
        return parse_func(code, components)
    else:
        raise RuntimeError('No parser found for code: %s' % code)
secret_codes = [
    'asdf|qwer-sdfg-wert$$otherthing', # type 1
    'qwersdfg-qw|er$$otherthing', # type 2
    'qwersdfg-hjkl-yui||poiuy-rtyu$$otherthing' # type 3
]
print([parse_code(c) for c in secret_codes])
Are you married to the string parsing? If you are passing variables with values and have no need for the variable names, you can "pack" them into an integer.
If you are working with cryptography, you can build one long hexadecimal number out of the characters and pass it as an int with "stop" bytes as separators (0x00, for example, since the character "0" is actually 48; try chr(48)). If you are married to a string, I would suggest a low byte value as the separator (for example 1; try chr(1)), so you can scan the integer and shift it right by 8 bits at a time, extracting each byte with an 8-bit mask. That would look like (secret_code >> 8) & 0xff.
Hashing works in a similar manner: a variable with some name and some value can have both parts converted to integers, joined with a stop byte, and then retrieved when needed.
Let me give you an example for hashing
# lets say
a = 1
# of sort hashing would be
hash = ord('a')+(0b00<<8)+(1<<16)
#where a hashed would be 65633 in integer value on 64 bit computer
# and then you just need to find a 0b00 aka separator
If you only want to use the variables (names don't matter), then you need to hash only the variable's value, so the size of the packed value is a lot smaller (no name part and no need for a separator (0b00)). You can also use separators cleverly to divide the data into levels: onefold (0b00), twofold (0b00, 0b00<<8), etc.
a = 1
hash = a<<8 #if you want to shift it 1 byte
But if you want to hide it and need a cryptography example, you can apply the methods above and then scramble, shift (a -> b), or just convert it to another type later. You just need to keep track of the order of the operations you are doing, since a-STOP-b-PASS-c is not equal to a-PASS-b-STOP-c.
You can find the bitwise operators here: binary operators.
But keep in mind that 65 is a number and 65 is a character as well; it only matters where those bytes are sent. If they are sent to a graphics card they are pixels, if they are sent to an audio card they are sounds, and if they are sent to mathematical processing they are numbers. As programmers, that is our playground.
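The shift-and-mask idea above can be sketched as follows. This assumes each value fits in one byte and that position 0 is the least significant byte; pack_bytes and unpack_byte are hypothetical helper names:

```python
def pack_bytes(values):
    """Pack a sequence of byte-sized ints into one integer, lowest byte first."""
    packed = 0
    for position, value in enumerate(values):
        packed |= (value & 0xFF) << (8 * position)
    return packed


def unpack_byte(packed, position):
    """Shift the wanted byte down to the bottom and mask off everything else."""
    return (packed >> (8 * position)) & 0xFF


# Same layout as the example above: name byte, separator byte, value byte.
packed = pack_bytes([ord("a"), 0, 1])
print(packed)
print(unpack_byte(packed, 0), unpack_byte(packed, 1), unpack_byte(packed, 2))
```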
But if this is not answering your problem, you can always use map.
def mapProcces(proccesList,listToMap):
    currentProcces = proccesList.pop(0)
    listToMap = map( currentProcces, listToMap )
    if proccesList != []:
        return mapProcces( proccesList, listToMap )
    else:
        return list( listToMap )
then you could map it:
mapProcces([str.lower,str.upper,str.title],"stackowerflow")
or you can simply replace every definitive separator with space and then split space.
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
separ = "|,-,$".split(",")
secret_code_1 = [x if x not in separ else " " for x in secret_code_1] # replaces separators with spaces
secret_code_1 = "".join(secret_code_1) # converts the list to a string
secret_code_1 = secret_code_1.split(" ") # splits it into a list
secret_code_1 = filter(None,secret_code_1) # filters out empty strings ''
first,second,third,fourth,other = secret_code_1
And there you have it, your secret_code_1 is split and assigned to definitive amount of variables. Of course " " is used as declaration, you can use whatever you want, you can replace every separator with "someseparator" if you want and then split with "someseparator". You can also use str.replace function to make it clearer.
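A shorter way to get the same split, offered only as a sketch, is re.split with a character class containing all three separators. The list comprehension drops the empty string that appears between the two $ characters:

```python
import re

secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'

# Split on any of '|', '$' or '-' and discard empty pieces.
parts = [piece for piece in re.split(r'[|$-]', secret_code_1) if piece]
first, second, third, fourth, other = parts

print(parts)
```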
I hope this helps
I think you need to provide more information about exactly what you're trying to achieve, and what the clear constraints are. For instance, how many times can $$ occur? Will there always be a | divider? That kind of thing.
To answer your question broadly, an elegant, Pythonic way to do this is to use Python's unpacking feature combined with split. For example:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
first_part, last_part = secret_code_1.split('$$')
By using this technique, in addition to simple if blocks, you should be able to write an elegant parser.
If I understand it correctly, you want to be able to define your functions as if the parsed arguments are passed, but want to pass the unparsed code to the functions instead.
You can do that very similarly to the first solution you presented.
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code):
        args = code.split('$$')[0].split('-')
        return f(*args)
    return wrapper
@parse_secret
def parse_code_1(a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2(a, b):
    b = b.split('|')[1]
    return (a,b)
For the secret codes mentioned in the examples,
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
print (parse_code_1(secret_code_1))
>> ('asdf', 'sdfg', 'wert')
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
print (parse_code_2(secret_code_2))
>> ('qwersdfg', 'er')
I haven't understood your question or your code, but maybe a simple way to do it is with a regular expression?
import re
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
def parse_code(code):
    regex = re.search(r'([\w-]+)\|([\w-]+)\$\$([\w]+)', code) # regular expression
    return regex.group(3), regex.group(1).split("-"), regex.group(2).split("-")
otherthing, first_group, second_group = parse_code(secret_code_2)
print(otherthing) # otherthing, string
print(first_group) # first group, list
print(second_group) # second group, list
The output:
otherthing
['qwersdfg', 'qw']
['er']

Reading in a CSV file AND sorting it in Python

I am trying to read in a CSV file that looks like this:
ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60
Here is some code I have been experimenting with.
class jewel:
    def __init__(gem, name, carat, value):
        gem.name = name
        gem.carot = carat
        gem.value = value
    def __repr__(gem):
        return repr((gem.name, gem.carat, gem.value))
jewel_objects = [jewel('diamond', '1', 400),
                 jewel('ruby', '2', 200),
                 jewel('opal', '1', 600),
                 ]
aList = [sorted(jewel_objects, key=lambda jewel: (jewel.value))]
print aList
I would like to read in the values and assign them to name, carat, and value, but I'm not sure how to do so. Then, once I have them read in, I would like to sort them by value per carat, i.e. value/carat. I have done quite a bit of searching and have come up blank. Thank you very much for your help in advance.
You need to do two things here. The first is actually loading the data into the objects. I recommend you look at the 'csv' module in the standard Python library for this. It's very complete and will read each row and make it easily accessible.
CSV docs: http://docs.python.org/library/csv.html
I would create a list of the objects, and then either implement a __cmp__ method in your class or (if you're using an older version of Python) pass a comparison function to sorted() that defines the ordering. You can get more info about sorting in the Python wiki.
Wiki docs: http://wiki.python.org/moin/HowTo/Sorting
You would implement the __cmp__ method like this in your class (this can be made a bit more efficient, but I'm being descriptive here):
    def __cmp__(gem, other):
        if (gem.value / gem.carot) < (other.value / other.carot):
            return -1
        elif (gem.value / gem.carot) > (other.value / other.carot):
            return 1
        else:
            return 0
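Note that __cmp__ is only consulted on Python 2. A comparable sketch for Python 3 defines __eq__ and __lt__ and lets functools.total_ordering fill in the remaining comparisons. The class below is a simplified stand-in for the one in the question (it converts carat and value to float, which is an assumption), not the asker's exact code.

```python
from functools import total_ordering

@total_ordering
class Jewel(object):
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)   # assumption: carat arrives as text or a number
        self.value = float(value)

    def value_per_carat(self):
        return self.value / self.carat

    # total_ordering derives <=, > and >= from __eq__ and __lt__.
    def __eq__(self, other):
        return self.value_per_carat() == other.value_per_carat()

    def __lt__(self, other):
        return self.value_per_carat() < other.value_per_carat()

    def __repr__(self):
        return repr((self.name, self.carat, self.value))

gems = [Jewel('diamond', 1, 400), Jewel('ruby', 2, 100), Jewel('opal', 1, 300)]
print(sorted(gems))
```

With this in place, sorted(gems) orders the jewels by value per carat without any extra arguments.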
Python has a csv module that should be really helpful to you.
http://docs.python.org/library/csv.html
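As a minimal sketch of how that might look (written for Python 3, with an in-memory io.StringIO standing in for the real file, which is an assumption so the example runs on its own):

```python
import csv
import io

# Stand-in for open('jewels.csv', newline=''); the file name is hypothetical.
sample = io.StringIO("ruby,2,100\ndiamond,1,400\nemerald,3,250\n")

rows = []
for name, carat, value in csv.reader(sample):
    # csv.reader yields every field as a string, so convert the numbers.
    rows.append((name, float(carat), float(value)))

# Sort by value per carat (value / carat), lowest first.
rows.sort(key=lambda row: row[2] / row[1])
print(rows)
```

Each row comes back as a list of strings, so the numeric conversion has to happen before any arithmetic on the fields.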
You can use numpy structured arrays along with the csv module and use numpy.sort() to sort the data. The following code should work. Suppose your csv file is named geminfo.csv
import numpy as np
import csv
fileobj = open('geminfo.csv','rb')
csvreader = csv.reader(fileobj)
# Convert data to a list of lists
importeddata = list(csvreader)
# Calculate Value/Carat and add it to the imported data
# and convert each entry to a tuple
importeddata = [tuple(entry + [float(entry[2])/float(entry[1])]) for entry in importeddata]
One way to sort this data is to use numpy as shown below.
# create an empty array
data = np.zeros(len(importeddata), dtype = [('Stone Name','a20'),
('Carats', 'f4'),
('Value', 'f4'),
('valuepercarat', 'f4')]
)
data[:] = importeddata[:]
datasortedbyvaluepercarat = np.sort(data, order='valuepercarat')
import csv
import operator
class Jewel(object):
    @classmethod
    def fromSeq(cls, seq):
        return cls(*seq)
    def __init__(self, name, carat, value):
        self.name = str(name)
        self.carat = float(carat)
        self.value = float(value)
    def __repr__(self):
        return "{0}{1}".format(self.__class__.__name__, (self.name, self.carat, self.value))
    @property
    def valuePerCarat(self):
        return self.value / self.carat
def loadJewels(fname):
    with open(fname, 'rb') as inf:
        incsv = csv.reader(inf)
        jewels = [Jewel.fromSeq(row) for row in incsv if row]
    jewels.sort(key=operator.attrgetter('valuePerCarat'))
    return jewels
def main():
    jewels = loadJewels('jewels.csv')
    for jewel in jewels:
        print("{0:35} ({1:>7.2f})".format(jewel, jewel.valuePerCarat))
if __name__=="__main__":
    main()
produces
Jewel('amethyst', 2.0, 50.0) ( 25.00)
Jewel('ruby', 2.0, 100.0) ( 50.00)
Jewel('malachite', 1.0, 60.0) ( 60.00)
Jewel('emerald', 3.0, 250.0) ( 83.33)
Jewel('sapphire', 2.0, 500.0) ( 250.00)
Jewel('opal', 1.0, 300.0) ( 300.00)
Jewel('diamond', 1.0, 400.0) ( 400.00)
For parsing real-world CSV (comma-separated values) data you'll want to use the CSV module that's included with recent versions of Python.
CSV is a set of conventions rather than a standard. The sample data you show is simple and regular, but CSV generally has some ugly corner cases for quoting, where the contents of any field might have embedded commas, for example.
Here is a very crude program, based on your code, which does naïve parsing of the data (splitting by lines, then splitting each line on commas). It will not handle any data which doesn't split to precisely the correct number of fields, nor any where the numeric fields aren't correctly parsed by the Python int() and float() functions (object constructors). In other words this contains no error checking nor exception handling.
However, I've kept it deliberately simple so it can be easily compared to your rough notes. Also note that I've used the normal Python conventions regarding "self" references in the class definition. (About the only time one would use names other than "self" for these is when doing "meta-class" programming ... writing classes which dynamically instantiate other classes. Any other case will almost certainly cause serious concerns in the minds of any experienced Python programmers looking at your code).
#!/usr/bin/env python
class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = int(carat)
        self.value = float(value)
        assert self.carat != 0 # Division by zero would result from this
    def __repr__(self):
        return repr((self.name, self.carat, self.value))
if __name__ == '__main__':
    sample='''ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60'''
    these_jewels = list()
    for each_line in sample.split('\n'):
        gem_type, carat, value = each_line.split(',')
        these_jewels.append(Jewel(gem_type, carat, value))
        # Equivalently:
        # these_jewels.append(Jewel(*each_line.split(',')))
    decorated = [(x.value/x.carat, x) for x in these_jewels]
    results = [x[1] for x in sorted(decorated)]
    print '\n'.join([str(x) for x in results])
The parsing here is done simply using the string .split() method, and the data is extracted into names using Python's "tuple unpacking" syntax (this would fail if any line of input were to have the wrong number of fields).
The alternative syntax to those two lines uses Python's "apply" syntax. The * prefix on the argument causes it to be unpacked into separate arguments which are passed to the Jewel() class instantiation.
This code also uses the widespread (and widely recommended) DSU (decorate, sort, undecorate) pattern for sorting on some field of your data. I "decorate" the data by creating a series of tuples: (computed value, object reference), then "undecorate" the sorted data in a way which I hope is clear to you. (It would be immediately clear to any experienced Python programmer).
Yes the whole DSU could be reduced to a single line; I've separated it here for legibility and pedagogical purposes.
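For comparison, here is a sketch of the single-line forms. It uses a small namedtuple as a stand-in for the Jewel class above (an assumption, so the example is self-contained) and shows both the collapsed DSU expression and the key= argument to sorted(), which performs the same decoration internally.

```python
from collections import namedtuple

# Stand-in for the Jewel class; only the fields needed for sorting.
Jewel = namedtuple('Jewel', ['name', 'carat', 'value'])
these_jewels = [
    Jewel('ruby', 2, 100.0),
    Jewel('diamond', 1, 400.0),
    Jewel('amethyst', 2, 50.0),
]

# Decorate, sort, undecorate collapsed into one expression.
dsu = [j for _, j in sorted((j.value / j.carat, j) for j in these_jewels)]

# The same ordering using a key function.
keyed = sorted(these_jewels, key=lambda j: j.value / j.carat)

print([j.name for j in dsu])    # ['amethyst', 'ruby', 'diamond']
print([j.name for j in keyed])  # ['amethyst', 'ruby', 'diamond']
```

One difference worth knowing: when two computed values tie, the explicit DSU form falls back to comparing the objects themselves, while key= never compares the objects and keeps their original relative order.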
Again, this sample code is purely for your edification. You should use the CSV module on any real-world data, and you should introduce exception handling either in the parsing or in the Jewel.__init__ handling (for converting the numeric data into the correct Python types).
(Also note that you should consider using Python's Decimal module rather than float()s for representing monetary values ... or at least storing the values in cents or mils and using your own functions to represent those as dollars and cents).
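A short sketch of why Decimal can be the safer choice for money. The amounts and the two-decimal rounding are illustrative assumptions, not values from the question.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
print(float_sum_is_exact, decimal_sum_is_exact)  # False True

# Value per carat, rounded to whole cents.
value = Decimal('100.10')   # construct from a string, not from a float
carat = Decimal('3')
per_carat = (value / carat).quantize(Decimal('0.01'))
print(per_carat)  # 33.37
```

Building the Decimal from a string matters: Decimal(100.10) would inherit the float's representation error.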

sharing a string between two objects

I want two objects to share a single string object. How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other? I am guessing that I would have to wrap the string in a sort of buffer object and do all sorts of complexity to get it to work.
However, I have a tendency to overthink problems, so undoubtedly there is an easier way. Or maybe sharing the string is the wrong way to go? Keep in mind that I want both objects to be able to edit the string. Any ideas?
Here is an example of a solution I could use:
class Buffer(object):
    def __init__(self):
        self.data = ""
    def assign(self, value):
        self.data = str(value)
    def __getattr__(self, name):
        return getattr(self.data, name)
class Descriptor(object):
    def __get__(self, instance, owner):
        return instance._buffer.data
    def __set__(self, instance, value):
        if not hasattr(instance, "_buffer"):
            if isinstance(value, Buffer):
                instance._buffer = value
                return
            instance._buffer = Buffer()
        instance._buffer.assign(value)
class First(object):
    data = Descriptor()
    def __init__(self, data):
        self.data = data
    def read(self, size=-1):
        if size < 0:
            size = len(self.data)
        data = self.data[:size]
        self.data = self.data[size:]
        return data
class Second(object):
    data = Descriptor()
    def __init__(self, data):
        self.data = data
    def add(self, newdata):
        self.data += newdata
    def reset(self):
        self.data = ""
    def spawn(self):
        return First(self._buffer)
s = Second("stuff")
f = s.spawn()
f.data == s.data
#True
f.read(2)
#"st"
f.data
# "uff"
f.data == s.data
#True
s.data
#"uff"
s._buffer == f._buffer
#True
Again, this seems like absolute overkill for what seems like a simple problem. As well, it requires the use of the Buffer class, a descriptor, and the descriptor's impositional _buffer variable.
An alternative is to put one of the objects in charge of the string and then have it expose an interface for making changes to the string. Simpler, but not quite the same effect.
"I want two objects to share a single string object."
They will, if you simply pass the string -- Python doesn't copy unless you tell it to copy.
"How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other?"
There can never be any change made to a string object (it's immutable!), so your requirement is trivially met (since a false precondition implies anything).
"I am guessing that I would have to wrap the string in a sort of buffer object and do all sorts of complexity to get it to work."
You could use (assuming this is Python 2 and you want a string of bytes) an array.array with a typecode of c. Arrays are mutable, so you can indeed alter them (with mutating methods -- and some operators, which are a special case of methods since they invoke special methods on the object). They don't have the myriad non-mutating methods of strings, so, if you need those, you'll indeed need a simple wrapper (delegating said methods to the str(...) of the array that the wrapper also holds).
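A minimal sketch of the mutable-buffer idea follows. It swaps in bytearray rather than array.array with typecode 'c', because that typecode is not available on Python 3; the Holder class and its text() method are hypothetical names used only for illustration.

```python
class Holder(object):
    """Hypothetical object that keeps a reference to a shared mutable buffer."""
    def __init__(self, buf):
        self.buf = buf

    def text(self):
        # Non-mutating string methods are reached by decoding to str first.
        return self.buf.decode('ascii')

shared = bytearray(b'stuff')
first = Holder(shared)
second = Holder(shared)

# In-place mutations through either object are visible through the other.
first.buf.extend(b' and more')
del second.buf[:2]          # drop the leading 'st'

print(first.text())   # uff and more
print(second.text())  # uff and more
```

The sharing only holds as long as both objects mutate the buffer in place; assigning a new object to first.buf would rebind that attribute and break the link.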
It doesn't seem there should be any special complexity, unless of course you want to do something truly weird as you seem to given your example code (have an assignment, i.e., a rebinding of a name, magically affect a different name -- that has absolutely nothing to do with whatever object was previously bound to the name you're rebinding, nor does it change that object in any way -- the only object it "changes" is the one holding the attribute, so it's obvious that you need descriptors or other magic on said object).
You appear to come from some language where variables (and particularly strings) are "containers of data" (like C, Fortran, or C++). In Python (like, say, in Java), names (the preferred way to call what others call "variables") always just refer to objects, they don't contain anything except exactly such a reference. Some objects can be changed, some can't, but that has absolutely nothing to do with the assignment statement (see note 1) (which doesn't change objects: it rebinds names).
(note 1): except of course that rebinding an attribute or item does alter the object that "contains" that item or attribute -- objects can and do contain, it's names that don't.
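The distinction between mutating an object and rebinding a name can be seen in a few lines. This is only an illustrative sketch with made-up variable names:

```python
a = ['hello']
b = a                 # both names now refer to the same list object
b.append('world')     # mutation: the one shared object changes
print(a)              # ['hello', 'world']

b = ['other']         # rebinding: b now refers to a new list, a is untouched
print(a)              # ['hello', 'world']

s = 'hello'
t = s                 # both names refer to the same (immutable) string
t += ' world'         # builds a new string and rebinds t; s is unchanged
print(s)              # hello
print(t)              # hello world
```

Because strings cannot be mutated, every apparent "change" to one is really a rebinding, which is why another holder of the original string never sees it.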
Just put your value to be shared in a list, and assign the list to both objects.
class A(object):
    def __init__(self, strcontainer):
        self.strcontainer = strcontainer
    def upcase(self):
        self.strcontainer[0] = self.strcontainer[0].upper()
    def __str__(self):
        return self.strcontainer[0]
# create a string, inside a shareable list
shared = ['Hello, World!']
x = A(shared)
y = A(shared)
# both objects have the same list
print id(x.strcontainer)
print id(y.strcontainer)
# change value in x
x.upcase()
# show how value is changed in both x and y
print str(x)
print str(y)
Prints:
10534024
10534024
HELLO, WORLD!
HELLO, WORLD!
I am not a great expert in Python, but I think that if you declare a variable in a module and add a getter/setter to the module for this variable, you will be able to share it that way.
