Reducing the number of arguments in a function in Python?

My question is about a piece of code in which I implement the Caesar cipher.
The functions decrypt and encrypt have to handle the limits of the alphabet (A-Z and a-z). I tried to write the wrap-around logic for both alphabets in a single function named cycleencrypt.
But the function takes six arguments, and I have read somewhere that a function with more than three arguments is less readable and understandable, so my question is:
Should I reduce the number of arguments by splitting it into two functions, making the code longer (but maybe more understandable)?
Thanks for any answer, I appreciate it.
EDIT: Docstrings around the functions were deleted to make visible the
main purpose of my question.
def offsetctrl(offset):
    while offset < 0:
        offset += 26
    return offset
def cycleencrypt(string, offset, index, listing, first, last):
    offset = offsetctrl(offset)
    if string >= ord(first) and string <= ord(last):
        string += offset
        while string > ord(last):
            string = ord(first) + (string - ord(last) - 1)
        listing[index] = chr(string)
Cycle for encrypting, with a lot of arguments and control of negative offsets
def encrypt(retezec, offset):
    listing = list(retezec)
    for index in range(0, len(retezec)):
        string = ord(retezec[index])
        cycleencrypt(string, offset, index, listing, 'A', 'Z')
        cycleencrypt(string, offset, index, listing, 'a', 'z')
    print(''.join(listing))
Main encryption part, taking many arguments in two lines, with printing
def decrypt(retezec, offset):
    return encrypt(retezec, -offset)
if __name__ == "__main__":
    encrypt("hey fellow how is it going", 5)
    decrypt("mjd kjqqtb mtb nx ny ltnsl", 5)

In this kind of situation, it's often better to write your code as a class. Your class's constructor could take just the minimum number of arguments that are required (which may be none at all!), and then optional arguments could be set as properties of the class or by using other methods.
When designing a class like this, I find it's most useful to start by writing the client code first -- that is, write the code that will use the class first, and then work backwards from there to design the class.
For example, I might want the code to look something like this:
cypher = Cypher()
cypher.offset = 17
cypher.set_alphabet('A', 'Z')
result = cypher.encrypt('hey fellow how is it going')
Hopefully it should be clear how to work from here to the design of the Cypher class, but if not, please ask a question on Stack Overflow about that!
If you want to provide encrypt and decrypt convenience methods, it's still easy to do. For example, you can write a function like:
def encrypt(text, offset):
    cypher = Cypher()
    cypher.offset = offset
    return cypher.encrypt(text)
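One possible shape for the Cypher class implied by the client code above is sketched below. Everything beyond the names Cypher, offset, set_alphabet and encrypt is an assumption made for illustration: the default of shifting both A-Z and a-z, the decrypt method, and the modulo-based wrap-around are not taken from the answer.

```python
class Cypher:
    """Hypothetical Caesar cipher class matching the client code above."""

    def __init__(self):
        self.offset = 0
        # Assumption: by default both ASCII alphabets are shifted.
        self._ranges = [('A', 'Z'), ('a', 'z')]

    def set_alphabet(self, first, last):
        """Restrict shifting to the single range first..last."""
        self._ranges = [(first, last)]

    def _shift(self, char, offset):
        # Rotate char inside the first range that contains it.
        for first, last in self._ranges:
            if first <= char <= last:
                size = ord(last) - ord(first) + 1
                position = (ord(char) - ord(first) + offset) % size
                return chr(ord(first) + position)
        return char  # characters outside every range are left alone

    def encrypt(self, text):
        return ''.join(self._shift(c, self.offset) for c in text)

    def decrypt(self, text):
        return ''.join(self._shift(c, -self.offset) for c in text)


cypher = Cypher()
cypher.offset = 5
secret = cypher.encrypt('hey fellow how is it going')
```

Because Python's % operator never returns a negative result for a positive modulus, negative offsets need no separate correction step in this sketch.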

Here is the docstring of datetime.datetime:
class datetime(date):
    """datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
    ...
    """
And the signature of its constructor:
def __new__(cls, year, month=None, day=None, hour=0, minute=0, second=0, microsecond=0, tzinfo=None):
What we could learn from it:
Add exactly as many arguments as it makes sense to add
Use optional parameters to give arguments sensible default values
Side thought: do you think users of your library should use cycleencrypt()? You could mark it private (with a leading underscore), so everybody will see it's not part of the public API and that they should use encrypt() and decrypt() instead.
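Applied to the cipher from the question, the two lessons and the side thought might look like the sketch below. The helper name _shift, the default offset of 3, and the choice to return a new character instead of mutating a list are assumptions for illustration only.

```python
def _shift(char, offset, first, last):
    """Private helper: rotate char within first..last, leave others alone."""
    if not first <= char <= last:
        return char
    size = ord(last) - ord(first) + 1
    return chr(ord(first) + (ord(char) - ord(first) + offset) % size)


def encrypt(text, offset=3):
    """Public API: shift ASCII letters by offset (3 is an arbitrary default)."""
    return ''.join(
        _shift(_shift(char, offset, 'A', 'Z'), offset, 'a', 'z')
        for char in text
    )


def decrypt(text, offset=3):
    """Public API: undo encrypt() called with the same offset."""
    return encrypt(text, -offset)
```

The public functions take two arguments, one of them optional, while the four-argument helper stays an implementation detail.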

The number of arguments doesn't really matter as long as there aren't a dozen of them (maybe someone can link to what you mention about having more than 3 arguments; I may be wrong).
To make a function's definition more readable, document it with a docstring that follows the docstring conventions.
To make calls to a function more readable, give default values in the definition wherever possible, using the most common values (for example, offset could default to 1 and index to 0).
Either way, for a long line, follow the PEP 8 guidelines, which describe how to break lines correctly (PEP 8 limits lines to 79 characters).
# Parameters with default values must come after those without,
# so offset and index are moved to the end of the signature.
def cycleencrypt(string, listing, first, last,
                 offset=1, index=0):
    """Description

    :param string: description
    :param listing: description
    :param first: description
    :param last: description
    :param offset: description
    :param index: description
    :return: description
    """
    offset = offsetctrl(offset)
    if string >= ord(first) and string <= ord(last):
        string += offset
        while string > ord(last):
            string = ord(first) + (string - ord(last) - 1)
        listing[index] = chr(string)

Related

Implementing new data type in Python - without classes

I'm trying to implement a new data type, "Fractions", in Python to represent fractions, where the numerator and denominator are both integers. Moreover, I have to implement the four basic arithmetic operations. The trick is, I can't use classes in this task.
I thought maybe tuples could be a good idea, but I really don't know how to approach this.
Is there an easy way to solve such a problem? Any hint would really help me.
You have two problems. 1) How to encapsulate the data, and 2) How to operate on the data.
First, let's solve encapsulation. Just put everything you need in a tuple:
half = (1,2)
whole = (1,1)
answer = (42,1)
See? The first item is the numerator, the second is the denominator.
Now you need a way to operate on the data. Since we can't use methods, we'll just use regular functions:
def mul(a,b):
    'Multiply two fractions'
    return (a[0]*b[0], a[1]*b[1])
Similarly, implement add(a,b), negate(a), sub(a,b), etc. You might need a simplify(), so you don't end up with 10240000/20480000 after a while.
To make our object-oriented-without-classes suite complete, we need a constructor:
def make_frac(num, denom):
    'Create a fraction with the indicated numerator and denominator'
    return (num, denom)
Finally, place all of these functions in a module, and your task is complete. The user of your library will write something like this:
import your_fraction_lib
half = your_fraction_lib.make_frac(1,2)
quarter = your_fraction_lib.mul(half, half)
three_quarters = your_fraction_lib.add(half, quarter)
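The answer leaves add, negate, sub and simplify as an exercise. A minimal sketch of how they could look with plain tuples follows; the use of math.gcd, the div function, and the rule that the sign always lives on the numerator are assumptions, not part of the answer.

```python
from math import gcd


def simplify(frac):
    """Reduce a (numerator, denominator) tuple to lowest terms."""
    num, denom = frac
    if denom == 0:
        raise ZeroDivisionError("denominator must not be zero")
    if denom < 0:  # keep the sign on the numerator
        num, denom = -num, -denom
    divisor = gcd(num, denom)
    return (num // divisor, denom // divisor)


def make_frac(num, denom):
    return simplify((num, denom))


def add(a, b):
    # a/b + c/d = (a*d + c*b) / (b*d)
    return simplify((a[0] * b[1] + b[0] * a[1], a[1] * b[1]))


def negate(a):
    return (-a[0], a[1])


def sub(a, b):
    return add(a, negate(b))


def mul(a, b):
    return simplify((a[0] * b[0], a[1] * b[1]))


def div(a, b):
    # Dividing by a zero fraction raises ZeroDivisionError in simplify().
    return simplify((a[0] * b[1], a[1] * b[0]))
```

Calling simplify in every operation keeps results such as 10240000/20480000 from accumulating.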
If you want to troll your teacher, you could do something along the lines of:
def construct(values):
    def mul(other_fraction):
        new_numerator = values['numerator']*other_fraction['values']['numerator']
        new_denominator = values['denominator']*other_fraction['values']['denominator']
        new_values = {'numerator':new_numerator,'denominator':new_denominator}
        return(construct(new_values))
    return({'values':{'numerator':values['numerator'],'denominator':values['denominator']},'mul':mul})
This allows you to construct objects that contain a mul function that acts much like a class method:
x = construct({'numerator':1,'denominator':2})
y = construct({'numerator':3,'denominator':5})
product = x['mul'](y)
print(product['values']['numerator'],product['values']['denominator'])
>>3 10

How to compose two functions whose outer function supplies arguments to the inner function

I have two similar codes that need to be parsed and I'm not sure of the most pythonic way to accomplish this.
Suppose I have two similar "codes"
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
both codes end with $$otherthing and contain a number of values separated by -
At first I thought of using functools.wraps to separate some of the common logic from the logic specific to each type of code, something like this:
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code, *args):
        _code = code.split('$$')[0]
        return f(code, *_code.split('-'))
    return wrapper
@parse_secret
def parse_code_1b(code, a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2b(code, a, b):
    b = b.split('|')[1]
    return (a,b)
However doing it this way makes it kind of confusing what parameters you should actually pass to the parse_code_* functions i.e.
parse_code_1b(secret_code_1)
parse_code_2b(secret_code_2)
So to keep the formal parameters of the function easier to reason about I changed the logic to something like this:
def _parse_secret(parse_func, code):
    _code = code.split('$$')[0]
    return parse_func(code, *_code.split('-'))
def _parse_code_1(code, a, b, c):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    a = a.split('|')[0]
    return (a,b,c)
def _parse_code_2(code, a, b):
    """
    a and b are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    b = b.split('|')[1]
    return (a,b)
def parse_code_1(code):
    return _parse_secret(_parse_code_1, code)
def parse_code_2(code):
    return _parse_secret(_parse_code_2, code)
Now it's easier to reason about what you pass to the functions:
parse_code_1(secret_code_1)
parse_code_2(secret_code_2)
However this code is significantly more verbose.
Is there a better way to do this? Would an object-oriented approach with classes make more sense here?
repl.it example
Functional approaches are more concise and make more sense.
We can start from expressing concepts in pure functions, the form that is easiest to compose.
Strip $$otherthing and split values:
parse_secret = lambda code: code.split('$$')[0].split('-')
Take one of inner values:
take = lambda value, index: value.split('|')[index]
Replace one of the values with its inner value:
parse_code = lambda values, p, q: \
    [take(v, q) if p == i else v for (i, v) in enumerate(values)]
These 2 types of codes have 3 differences:
Number of values
Position to parse "inner" values
Position of "inner" values to take
And we can compose parse functions by describing these differences. The split values are kept packed so that things are easier to compose.
compose = lambda length, p, q: \
    lambda code: parse_code(parse_secret(code)[:length], p, q)
parse_code_1 = compose(3, 0, 0)
parse_code_2 = compose(2, 1, 1)
And use composed functions:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
results = [parse_code_1(secret_code_1), parse_code_2(secret_code_2)]
print(results)
I believe something like this could work:
secret_codes = ['asdf|qwer-sdfg-wert$$otherthing', 'qwersdfg-qw|er$$otherthing']
def parse_code(code):
    _code = code.split('$$')
    if '-' in _code[0]:
        return _parse_secrets(_code[1], *_code[0].split('-'))
    return _parse_secrets(_code[0], *_code[1].split('-'))
def _parse_secrets(code, a, b, c=None):
    """
    a, b, and c are descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    if c is not None:
        return a.split('|')[0], b, c
    return a, b.split('|')[1]
for secret_code in secret_codes:
    print(parse_code(secret_code))
Output:
('asdf', 'sdfg', 'wert')
('qwersdfg', 'er')
I'm not sure about your secret data structure but if you used the index of the position of elements with data that has | in it and had an appropriate number of secret data you could also do something like this and have an infinite(well almost) amount of secrets potentially:
def _parse_secrets(code, *data):
    """
    data is descriptive parameters that explain
    the different components in the secret code
    returns a tuple of the decoded parts
    """
    i = 0
    decoded_secrets = []
    for secret in data:
        if '|' in secret:
            decoded_secrets.append(secret.split('|')[i])
        else:
            decoded_secrets.append(secret)
        i += 1
    return tuple(decoded_secrets)
I'm really not sure what exactly you mean, but I came up with an idea which might be what you are looking for.
What about using a simple function like this:
def split_secret_code(code):
    return [code] + code[:code.find("$$")].split("-")
And then just use:
parse_code_1(*split_secret_code(secret_code_1))
I'm not sure exactly what constraints you're working with, but it looks like:
There are different types of codes with different rules
The number of dash separated args can vary
Which arg has a pipe can vary
Straightforward Example
This is not too hard to solve, and you don't need fancy wrappers, so I would just drop them because it adds reading complexity.
def pre_parse(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def parse_type_1(code):
    dash_args = pre_parse(code)
    dash_args[0], toss = dash_args[0].split('|')
    return dash_args
def parse_type_2(code):
    dash_args = pre_parse(code)
    toss, dash_args[1] = dash_args[1].split('|')
    return dash_args
# Example call
parse_type_1(secret_code_1)
Trying to answer question as stated
You can supply arguments in this way by using python's native decorator pattern combined with *, which rolls/unrolls positional arguments into a tuple, so you don't need to know exactly how many there are.
def dash_args(code):
    dash_code, otherthing = code.split('$$')
    return dash_code.split('-')
def pre_parse(f):
    def wrapper(code):
        # HERE is where the outer function, the wrapper,
        # supplies arguments to the inner function.
        return f(code, *dash_args(code))
    return wrapper
@pre_parse
def parse_type_1(code, *args):
    new_args = list(args)
    new_args[0], toss = args[0].split('|')
    return new_args
@pre_parse
def parse_type_2(code, *args):
    new_args = list(args)
    toss, new_args[1] = args[1].split('|')
    return new_args
# Example call:
parse_type_1(secret_code_1)
More Extendable Example
If for some reason you needed to support many variations on this kind of parsing, you could use a simple OOP setup, like
class BaseParser(object):
    def get_dash_args(self, code):
        dash_code, otherthing = code.split('$$')
        return dash_code.split('-')
class PipeParser(BaseParser):
    def __init__(self, arg_index, split_index):
        self.arg_index = arg_index
        self.split_index = split_index
    def parse(self, code):
        args = self.get_dash_args(code)
        pipe_arg = args[self.arg_index]
        args[self.arg_index] = pipe_arg.split('|')[self.split_index]
        return args
# Example call
pipe_parser_1 = PipeParser(0, 0)
pipe_parser_1.parse(secret_code_1)
pipe_parser_2 = PipeParser(1, 1)
pipe_parser_2.parse(secret_code_2)
My suggestion attempts the following:
to be non-verbose enough
to separate common and specific logic in a clear way
to be sufficiently extensible
Basically, it separates common and specific logic into different functions (you could do the same using OOP). The thing is that it uses a mapper variable that contains the logic to select a specific parser, according to each code's content. Here it goes:
def parse_common(code):
    """
    Provides common parsing logic.
    """
    encoded_components = code.split('$$')[0].split('-')
    return encoded_components
def parse_code_1(code, components):
    """
    Specific parsing for type-1 codes.
    """
    components[0] = components[0].split('|')[0] # decoding some type-1 component
    return tuple([c for c in components])
def parse_code_2(code, components):
    """
    Specific parsing for type-2 codes.
    """
    components[1] = components[1].split('|')[1] # decoding some type-2 component
    return tuple([c for c in components])
def parse_code_3(code, components):
    """
    Specific parsing for type-3 codes.
    """
    components[2] = components[2].split('||')[0] # decoding some type-3 component
    return tuple([c for c in components])
# ... and so on, if more codes need to be added ...
# Maps specific parser, according to the number of components
CODE_PARSER_SELECTOR = [
    (3, parse_code_1),
    (2, parse_code_2),
    (4, parse_code_3)
]
def parse_code(code):
    # executes common parsing
    components = parse_common(code)
    # selects specific parser
    parser_info = [s for s in CODE_PARSER_SELECTOR if len(components) == s[0]]
    if parser_info is not None and len(parser_info) > 0:
        parse_func = parser_info[0][1]
        return parse_func(code, components)
    else:
        raise RuntimeError('No parser found for code: %s' % code)
secret_codes = [
    'asdf|qwer-sdfg-wert$$otherthing', # type 1
    'qwersdfg-qw|er$$otherthing', # type 2
    'qwersdfg-hjkl-yui||poiuy-rtyu$$otherthing' # type 3
]
print([parse_code(c) for c in secret_codes])
Are you married to the string parsing? If you are only passing values and have no need for variable names, you can "pack" them into an integer.
If you are working with cryptography, you can build a long hexadecimal number out of the characters and pass it as an int with "stop" bytes as separators (0000, for example, since the character "0" is actually 48; try chr(48)). If you are married to a string, I would suggest a low character byte as the identifier (1, for example; try chr(1)), so you can scan the integer, bit-shifting it by 8 and applying an 8-bit mask to get each byte (this would look like (secret_code >> 8) & 0xff).
Hashing works in a similar manner: a variable name and its value can each be parsed as an integer, joined with a stop byte, and then retrieved when needed.
Let me give you an example for hashing
# let's say
a = 1
# a sort of hashing would be
hash = ord('a')+(0b00<<8)+(1<<16)
# where a hashed would be 65633 as an integer value
# and then you just need to find a 0b00, aka the separator
If you want to use only variable values (names don't matter), then you need to hash only the value, so the packed result is a lot smaller (no name part and no need for the 0b00 separator), and you can use separators cleverly to divide the necessary data into one fold (0b00), two folds (0b00, 0b00<<8), etc.
a = 1
hash = a<<8 #if you want to shift it 1 byte
But if you want to hide it and need a cryptography example, you can apply the methods above and then scramble, shift (a->b), or just convert the result to another type later. You just need to figure out the order of the operations you are doing, since a-STOP-b-PASS-c is not equal to a-PASS-b-STOP-c.
You can read more under Python's bitwise ("binary") operators.
But keep in mind that 65 is a number and 65 is a character as well; it only matters where those bytes are sent. If they are sent to the graphics card they are pixels, if they are sent to the audio card they are sounds, and if they are sent to mathematical processing they are numbers, and as programmers that is our playground.
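The byte-packing idea above can be sketched as follows. The function names pack_bytes and unpack_bytes are hypothetical, and the layout (first value in the lowest byte) is an assumption chosen to match the ord('a')+(0b00<<8)+(1<<16) example.

```python
def pack_bytes(values):
    """Pack small integers (0..255) into one int, first value in the low byte."""
    packed = 0
    for position, value in enumerate(values):
        packed |= (value & 0xFF) << (8 * position)
    return packed


def unpack_bytes(packed, count):
    """Recover count bytes by shifting right and masking with 0xFF."""
    return [(packed >> (8 * position)) & 0xFF for position in range(count)]


# name byte, separator byte, value byte, as in the example above
packed = pack_bytes([ord('a'), 0b00, 1])
fields = unpack_bytes(packed, 3)
```

The caller has to know how many bytes were packed, because a trailing zero byte cannot be told apart from the absence of data.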
But if this is not answering your problem, you can always use map.
def mapProcces(proccesList,listToMap):
    currentProcces = proccesList.pop(0)
    listToMap = map( currentProcces, listToMap )
    if proccesList != []:
        return mapProcces( proccesList, listToMap )
    else:
        return list( listToMap )
then you could map it:
mapProcces([str.lower,str.upper,str.title],"stackowerflow")
Or you can simply replace every separator with a space and then split on spaces.
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
separ = "|,-,$".split(",")
secret_code_1 = [x if x not in separ else " " for x in secret_code_1] # replaces separators with spaces
secret_code_1 = "".join(secret_code_1) # converts the list to a string
secret_code_1 = secret_code_1.split(" ") # splits it into a list
secret_code_1 = filter(None,secret_code_1) # filters out empty strings ''
first,second,third,fourth,other = secret_code_1
And there you have it: your secret_code_1 is split and assigned to a fixed number of variables. Of course " " is just the placeholder used here; you can use whatever you want. You can replace every separator with "someseparator" if you want and then split on "someseparator". You can also use the str.replace function to make it clearer.
I hope this helps
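The str.replace variant mentioned above might look like the sketch below. The function name split_on_separators is hypothetical, and the code assumes the fields themselves never contain whitespace.

```python
def split_on_separators(code, separators="|-$"):
    """Replace each separator character with a space, then split on whitespace."""
    for separator in separators:
        code = code.replace(separator, " ")
    # str.split() with no argument collapses runs of whitespace,
    # so the doubled "$$" does not produce an empty field.
    return code.split()


parts = split_on_separators('asdf|qwer-sdfg-wert$$otherthing')
first, second, third, fourth, other = parts
```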
I think you need to provide more information about exactly what you're trying to achieve and what the constraints are. For instance, how many times can $$ occur? Will there always be a | divider? That kind of thing.
To answer your question broadly, an elegant Pythonic way to do this is to use Python's unpacking feature combined with split. For example:
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
first_part, last_part = secret_code_1.split('$$')
By using this technique, in addition to simple if blocks, you should be able to write an elegant parser.
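A small parser built from unpacking, split and a single if block could look like this. The function name parse and the decision to return pipe-separated fields as tuples are assumptions; the question does not say which half of such a field should be kept.

```python
def parse(code):
    """Split 'a|b-c-d$$tail' into its dash-separated fields and the tail."""
    head, tail = code.split('$$')  # assumes exactly one '$$' in the code
    fields = []
    for field in head.split('-'):
        if '|' in field:
            left, right = field.split('|')  # assumes at most one '|' per field
            fields.append((left, right))
        else:
            fields.append(field)
    return fields, tail


fields, tail = parse('asdf|qwer-sdfg-wert$$otherthing')
```

If a code contains more than one $$, or a field contains more than one |, the unpacking raises ValueError, which makes malformed input visible immediately.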
If I understand it correctly, you want to be able to define your functions as if the parsed arguments are passed, but want to pass the unparsed code to the functions instead.
You can do that very similarly to the first solution you presented.
from functools import wraps
def parse_secret(f):
    @wraps(f)
    def wrapper(code):
        args = code.split('$$')[0].split('-')
        return f(*args)
    return wrapper
@parse_secret
def parse_code_1(a, b, c):
    a = a.split('|')[0]
    return (a,b,c)
@parse_secret
def parse_code_2(a, b):
    b = b.split('|')[1]
    return (a,b)
For the secret codes mentioned in the examples,
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
print (parse_code_1(secret_code_1))
>> ('asdf', 'sdfg', 'wert')
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
print (parse_code_2(secret_code_2))
>> ('qwersdfg', 'er')
I haven't understood your question or your code, but maybe a simple way to do it is with a regular expression?
import re
secret_code_1 = 'asdf|qwer-sdfg-wert$$otherthing'
secret_code_2 = 'qwersdfg-qw|er$$otherthing'
def parse_code(code):
    # raw string, so the backslashes reach the regex engine unchanged
    regex = re.search(r'([\w-]+)\|([\w-]+)\$\$([\w]+)', code) # regular expression
    return regex.group(3), regex.group(1).split("-"), regex.group(2).split("-")
otherthing, first_group, second_group = parse_code(secret_code_2)
print(otherthing) # otherthing, string
print(first_group) # first group, list
print(second_group) # second group, list
The output:
otherthing
['qwersdfg', 'qw']
['er']

Understanding a Python function

I need some help understanding a function that I want to use, but I'm not entirely sure what some parts of it do. I understand that the function is creating dictionaries from reads out of a FASTA file. From what I understand, this is supposed to generate prefix and suffix dictionaries for ultimately extending contigs (overlapping DNA sequences).
The code:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    dict = {}
    multipleKeys = []
    i = 1
    for read in reads:
        if read[0:lenKeys] in dict:
            multipleKeys.append(read[0:lenKeys])
        else:
            dict[read[0:lenKeys]] = read[lenKeys:]
        if verbose:
            print("\rChecking suffix", i, "of", len(reads), end = "", flush = True)
        i += 1
    for key in set(multipleKeys):
        del(dict[key])
    if verbose:
        print("\nCreated", len(dict), "suffixes with length", lenSuffix, \
            "from", len(reads), "Reads. (", len(reads) - len(dict), \
            "unambigous)")
    return(dict)
Additional Information: reads = readFasta("smallReads.fna", verbose = True)
This is how the function is called:
if __name__ == "__main__":
    reads = readFasta("smallReads.fna", verbose = True)
    suffixDicts = makeSuffixDicts(reads, 10)
The smallReads.fna file contains strings of bases (Dna):
"> read 1
TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA
"> read 2
AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG
"> read 3
TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT
Here are the parts I don't understand:
lenKeys = len(reads[0]) - lenSuffix
What does the value [0] mean? From what I understand "len" returns the number of elements in a list.
Why is "reads" automatically a list? edit: It seems a Fasta-file can be declared as a List. Can anybody confirm that?
if read[0:lenKeys] in dict:
Does this mean "from 0 to 'lenKeys'"? Still confused about the value.
In another function there is a similar line: if read[-lenKeys:] in dict:
What does the "-" do?
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
Here I don't understand the parameters: How can reads be a parameter? What is lenSuffix = 20 in the context of this function other than a value subtracted from len(reads[0])?
What is verbose? I have read about a "verbose mode" that ignores whitespace, but I have never seen it used as a parameter and later as a variable.
The tone of your question makes me feel like you're confusing things like program features (len, functions, etc) with things that were defined by the original programmer (the type of reads, verbose, etc).
def some_function(these, are, arbitrary, parameters):
    pass
This function defines a bunch of parameters. They don't mean anything at all, other than the value I give to them implicitly. For example if I do:
def reverse_string(s):
    pass
s is probably a string, right? In your example we have:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    ...
From these two lines we can infer a few things:
the function will probably return a dictionary (from its name)
lenSuffix is an int, and verbose is a bool (from their default parameters)
reads can be indexed (string? list? tuple?)
the items inside reads have length (string? list? tuple?)
Since Python is dynamically typed, this is ALL WE CAN KNOW about the function so far. The rest would be explained by its documentation or the way it's called.
That said: let me cover all your questions in order:
What does the value [0] mean?
some_object[0] is grabbing the first item in a container. [1,2,3][0] == 1, "Hello, World!"[0] == "H". This is called indexing, and is governed by the __getitem__ magic method
From what I understand "len" returns the number of elements in a list.
len is a built-in function that returns the length of an object. It is governed by the __len__ magic method. len('abc') == 3, also len([1, 2, 3]) == 3. Note that len(['abc']) == 1, since it is measuring the length of the list, not the string inside it.
Why is "reads" automatically a list?
reads is a parameter. It is whatever the calling scope passes to it. It does appear that it expects a list, but that's not a hard and fast rule!
(various questions about slicing)
Slicing is doing some_container[start_idx : end_idx [ : step_size]]. It does pretty much what you'd expect: "0123456"[0:3] == "012". Slice indexes are zero-based and lie between the elements, so [0:1] selects the same element as [0], except that a slice returns a sequence of the same type rather than an individual item (so [1, 2, 3][0] == 1 but [1, 2, 3][0:1] == [1]; for strings, 'abc'[0:1] == 'a', because a one-character string is still a string). If you omit either the start or end index, it is treated as the beginning or end of the sequence respectively. I won't go into step size here.
Negative indexes count from the back, so '0123456'[-3:] == '456'. Note that [-0] is not the last value, [-1] is. This contrasts with [0] being the first value.
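The slicing rules above, applied to a read the way makeSuffixDict applies them, look like this. The ten-base read and the key length of 6 are made-up values for illustration.

```python
read = "TTATGAATAT"        # a made-up 10-base read
len_keys = 6               # hypothetical key length

prefix = read[0:len_keys]  # first 6 characters
suffix = read[len_keys:]   # everything after the first 6
tail = read[-len_keys:]    # last 6 characters, counted from the back

first_item = read[0]       # indexing returns a single item
first_slice = read[0:1]    # slicing returns a sequence of the same type
list_slice = [1, 2, 3][0:1]
```

For strings the single item and the one-element slice compare equal, because Python has no separate character type; for lists they differ.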
How can reads be a parameter?
Because the function is defined as makeSuffixDict(reads, ...). That's what a parameter is.
What is lenSuffix = 20 in the context of this function
Looks like it's the length of the expected suffix!
What is verbose?
verbose has no meaning on its own. It's just another parameter. Looks like the author included the verbose flag so you could get output while the function ran. Notice all the if verbose blocks seem to do nothing, just provide feedback to the user.
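To see what the function from the question computes, here is a condensed restatement with the verbose output removed, run on toy data. The name make_suffix_dict and the three five-base reads are invented for illustration; the logic follows the original line by line.

```python
def make_suffix_dict(reads, len_suffix):
    """Map each unique read prefix to the suffix that follows it."""
    len_keys = len(reads[0]) - len_suffix  # assumes all reads are equally long
    suffixes = {}
    duplicates = set()
    for read in reads:
        key = read[:len_keys]
        if key in suffixes:
            duplicates.add(key)            # prefix seen before: ambiguous
        else:
            suffixes[key] = read[len_keys:]
    for key in duplicates:
        del suffixes[key]                  # drop every ambiguous prefix
    return suffixes


toy_reads = ["AACGT", "AACTT", "GGCAT"]
result = make_suffix_dict(toy_reads, 2)
```

The prefix AAC occurs twice, so it is removed entirely; only GGC, which identifies its suffix unambiguously, remains.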

Python: Lazy String Decoding

I'm writing a parser, and there is LOTS of text to decode but most of my users will only care about a few fields from all the data. So I only want to do the decoding when a user actually uses some of the data. Is this a good way to do it?
class LazyString(str):
    def __init__(self, v) :
        self.value = v
    def __str__(self) :
        r = ""
        s = self.value
        for i in xrange(0, len(s), 2) :
            r += chr(int(s[i:i+2], 16))
        return r
def p_buffer(p):
    """buffer : HASH chars"""
    p[0] = LazyString(p[2])
Is that the only method I need to override?
I'm not sure how implementing a string subclass is of much benefit here. It seems to me that if you're processing a stream containing petabytes of data, whenever you've created an object that you don't need to you've already lost the game. Your first priority should be to ignore as much input as you possibly can.
You could certainly build a string-like class that did this:
class mystr(str):
    def __init__(self, value):
        self.value = value
        self._decoded = None
    @property
    def decoded(self):
        if self._decoded is None:
            self._decoded = self.value.decode("hex")
        return self._decoded
    def __repr__(self):
        return self.decoded
    def __len__(self):
        return len(self.decoded)
    def __getitem__(self, i):
        return self.decoded.__getitem__(i)
    def __getslice__(self, i, j):
        return self.decoded.__getslice__(i, j)
and so on. A weird thing about doing this is that if you subclass str, every method that you don't explicitly implement will be called on the value that's passed to the constructor:
>>> s = mystr('a0a1a2')
>>> s
 ¡¢
>>> len(s)
3
>>> s.capitalize()
'A0a1a2'
I don't see any kind of lazy evaluation in your code. The fact that you use xrange only means that the list of integers from 0 to len(s) will be generated on demand. The whole string r will be decoded during string conversion anyway.
The best way to implement a lazy sequence in Python is using generators. You could try something like this:
def lazy(v):
    for i in xrange(0, len(v), 2):
        yield int(v[i:i+2], 16)
list(lazy("0a0a0f"))
Out: [10, 10, 15]
What you're doing is built in already:
s = "i am a string!".encode('hex')
# what you do
r = ""
for i in xrange(0, len(s), 2) :
    r += chr(int(s[i:i+2], 16))
# but decoding is builtin
print r==s.decode('hex') # => True
As you can see your whole decoding is s.decode('hex').
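The snippet above is Python 2, where str has a 'hex' codec. For readers on Python 3, the same round trip can be sketched with bytes.hex() and bytes.fromhex(); the ASCII encoding is an assumption made so that the example text survives the conversion.

```python
text = "i am a string!"

# Python 3 equivalent of text.encode('hex')
encoded = text.encode("ascii").hex()

# the manual loop from the question, rewritten for Python 3
manual = ""
for i in range(0, len(encoded), 2):
    manual += chr(int(encoded[i:i + 2], 16))

# the built-in equivalent of encoded.decode('hex')
decoded = bytes.fromhex(encoded).decode("ascii")
```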
But "lazy" decoding sounds like premature optimization to me. You'd need gigabytes of data to even notice it. Try profiling; the .decode is 50 times faster than your old code already.
Maybe you want something like this:
class DB(object): # dunno what data it is ;)
    def __init__(self, data):
        self.data = data
        self.decoded = {} # maybe cache if the field data is long
    def __getitem__(self, name):
        try:
            return self.decoded[name]
        except KeyError:
            # this copies the fields data
            self.decoded[name] = ret = self.data[ self._get_field_slice( name ) ].decode('hex')
            return ret
    def _get_field_slice(self, name):
        # find out what part to decode, return the index in the data
        return slice( ... )
db = DB(encoded_data)
print db["some_field"] # find out where the field is, get its data and decode it
The methods you need to override really depend on how you are planning to use your new string type.
However, your str-based type looks a little suspicious to me. Have you looked into the implementation of str to check that it has the value attribute that you are setting in your __init__()? Performing a dir(str) does not indicate that there is any such attribute on str. This being the case, the normal str methods will not be operating on your data at all. I doubt that is the effect you want; otherwise, what would be the advantage of sub-classing?
Sub-classing base data types is a little strange anyway unless you have very specific requirements. For the lazy evaluation you want, you are probably better off creating your own class that contains a string rather than sub-classing str, and writing your client code to work with that class. You will then be free to add the just-in-time evaluation you want in a number of ways. An example using the descriptor protocol can be found in this presentation: Python's Object Model (search for "class Jit(object)" to get to the relevant section).
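A class that contains the encoded string instead of sub-classing str could be sketched as below. This is a Python 3 sketch using a plain property with a cache rather than the descriptor from the presentation; the class name LazyHexString and the latin-1 decoding (chosen to mirror chr(int(pair, 16)) from the question) are assumptions.

```python
class LazyHexString:
    """Holds hex-encoded text and decodes it only on first access."""

    def __init__(self, raw_hex):
        self._raw = raw_hex
        self._decoded = None  # nothing is decoded at construction time

    @property
    def value(self):
        if self._decoded is None:
            # latin-1 maps each byte 0..255 to the code point of the same number
            self._decoded = bytes.fromhex(self._raw).decode("latin-1")
        return self._decoded

    def __str__(self):
        return self.value

    def __len__(self):
        return len(self._raw) // 2  # length is known without decoding


field = LazyHexString("686579")
before = field._decoded
text = str(field)
```

Client code that never touches value or str() pays only for storing the raw hex string.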
The question is incomplete, in that the answer will depend on details of the encoding you use.
Say, if you encode a list of strings as pascal strings (i.e. prefixed with string length encoded as a fixed-size integer), and say you want to read the 100th string from the list, you may seek() forward for each of the first 99 strings and not read their contents at all. This will give some performance gain if the strings are large.
If, OTOH, you encode a list of strings as concatenated 0-terminated strings, you would have to read all bytes until the 100th 0.
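The length-prefixed case can be sketched as follows. The 4-byte big-endian length prefix, the UTF-8 encoding, and the helper names write_strings and read_nth are assumptions for illustration; io.BytesIO stands in for a real file.

```python
import io
import struct


def write_strings(strings):
    """Serialize strings as [4-byte length][payload] records."""
    buffer = io.BytesIO()
    for text in strings:
        payload = text.encode("utf-8")
        buffer.write(struct.pack(">I", len(payload)))
        buffer.write(payload)
    buffer.seek(0)
    return buffer


def read_nth(stream, n):
    """Return record n (0-based), seeking past earlier payloads unread."""
    for _ in range(n):
        (length,) = struct.unpack(">I", stream.read(4))
        stream.seek(length, io.SEEK_CUR)  # skip the payload entirely
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length).decode("utf-8")


stream = write_strings(["alpha", "beta", "gamma"])
third = read_nth(stream, 2)
```

Only the length prefixes of the skipped records are read, which is where the performance gain for large strings comes from.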
Also, you're speaking about some "fields" but your example looks completely different.

Generating a dynamic time delta: python

Here's my situation:
import foo, bar, etc
frequency = ["hours","days","weeks"]
class geoProcessClass():
    def __init__(self,geoTaskHandler,startDate,frequency,frequencyMultiple=1,*args):
        self.interval = self.__determineTimeDelta(frequency,frequencyMultiple)
    def __determineTimeDelta(self,frequency,frequencyMultiple):
        if frequency in frequency:
            interval = datetime.timedelta(print eval(frequency + "=" + str(frequencyMultiple)))
            return interval
        else:
            interval = datetime.timedelta("days=1")
            return interval
I want to dynamically define a time interval with timedelta, but this does not seem to work.
Is there any specific way to make this work? I'm getting invalid syntax here.
Are there any better ways to do it?
You can call a function with dynamic arguments using syntax like func(**kwargs) where kwargs is dictionary of name/value mappings for the named arguments.
I also renamed the global frequency list to frequencies since the line if frequency in frequency didn't make a whole lot of sense.
import datetime
class geoProcessClass():
    def __init__(self, geoTaskHandler, startDate, frequency, frequencyMultiple=1, *args):
        self.interval = self.determineTimeDelta(frequency, frequencyMultiple)
    def determineTimeDelta(self, frequency, frequencyMultiple):
        frequencies = ["hours", "days", "weeks"]
        if frequency in frequencies:
            kwargs = {frequency: frequencyMultiple}
        else:
            kwargs = {"days": 1}
        return datetime.timedelta(**kwargs)
For what it's worth, stylistically it's usually frowned upon to silently correct errors a caller makes. If the caller calls you with invalid arguments you should probably fail immediately and loudly rather than try to keep chugging. I'd recommend against that if statement.
For more information on variable-length and keyword argument lists, see:
The Official Python Tutorial
PEP 3102: Keyword-Only Arguments
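The fail-loudly variant recommended above could be sketched like this. The function name determine_time_delta, the VALID_FREQUENCIES constant, and the choice of ValueError are assumptions; only the **kwargs call into datetime.timedelta comes from the answer.

```python
import datetime

VALID_FREQUENCIES = ("hours", "days", "weeks")


def determine_time_delta(frequency, multiple=1):
    """Build a timedelta such as timedelta(days=3) from a unit name and a count."""
    if frequency not in VALID_FREQUENCIES:
        # Reject bad input instead of silently falling back to one day.
        raise ValueError(
            "frequency must be one of %s, got %r" % (VALID_FREQUENCIES, frequency)
        )
    return datetime.timedelta(**{frequency: multiple})


interval = determine_time_delta("weeks", 2)
```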
Your use of print eval(...) looks a bit over-complicated (and wrong, as you mention).
If you want to pass a keyword argument to a function, just do it:
interval = datetime.timedelta(frequency = str(frequencyMultiple))
I don't see a keyword argument called frequency though, so that might be a separate problem.
