def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair
def car(f):
    def left(a, b):
        return a
    return f(left)
def cdr(f):
    def right(a, b):
        return b
    return f(right)
Found this Python code on git.
I just want to know what f(a, b) in the cons definition is, and how it works.
(Not a function, I guess?)
cons is a function that takes two arguments and returns a function that takes another function, which will consume these two arguments.
For example, consider the following function:
def add(a, b):
    return a + b
This is just a function that adds the two inputs, so, for instance, add(2, 5) == 7
As this function takes two arguments, we can use cons to call this function:
func_caller = cons(2, 5) # cons receives two arguments and returns a function, which we call func_caller
result = func_caller(add) # func_caller receives a function, that will process these two arguments
print(result) # result is the actual result of doing add(2, 5), i.e. 7
This technique is useful for wrapping functions and running code before and after calling them.
For example, we can modify our cons function to actually print the values before and after calling add:
def add(a, b):
    print('Adding {} and {}'.format(a, b))
    return a + b
def cons(a, b):
    print('Received arguments {} and {}'.format(a, b))
    def pair(f):
        # f.__name__ gives "add"; formatting f itself would print "<function add at 0x...>"
        print('Calling {} with {} and {}'.format(f.__name__, a, b))
        result = f(a, b)
        print('Got {}'.format(result))
        return result
    return pair
With this update, we get the following outputs:
func_caller = cons(2, 5)
# prints "Received arguments 2 and 5" from inside cons
result = func_caller(add)
# prints "Calling add with 2 and 5" from inside pair
# prints "Adding 2 and 5" from inside add
# prints "Got 7" from inside pair
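The same "do something before and after the call" idea is what Python decorators are most often used for. A minimal sketch, assuming a hypothetical `logged` decorator (not part of the original code) built with the standard `functools.wraps`:

```python
import functools

def logged(func):
    """Hypothetical decorator: print a message before and after each call to func."""
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print('Calling {} with {}'.format(func.__name__, args))
        result = func(*args, **kwargs)
        print('Got {}'.format(result))
        return result
    return wrapper

@logged
def add(a, b):
    return a + b

result = add(2, 5)
# prints "Calling add with (2, 5)" then "Got 7"
```

Unlike the cons version, the wrapped function is called directly with its arguments, so callers don't need to know about the wrapping.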
This isn't going to make any sense to you until you know what cons, car, and cdr mean.
In Lisp, lists are stored as a very simple form of linked list. A list is either nil (like None) for an empty list, or it's a pair of a value and another list. The cons function takes a value and a list and returns you another list just by making a pair:
def cons(head, rest):
    return (head, rest)
And the car and cdr functions (they stand for "Contents of the Address part of Register" and "Contents of the Decrement part of Register", because those were the assembly-language operations used to implement them on the IBM 704, a 1950s computer, but that isn't very helpful) return the first or second value from a pair:
def car(lst):
    return lst[0]
def cdr(lst):
    return lst[1]
So, you can make a list:
lst = cons(1, cons(2, cons(3, None)))
… and you can get the second value from it:
print(car(cdr(lst)))
… and you can even write functions to get the nth value:
def nth(lst, n):
    if n == 0:
        return car(lst)
    return nth(cdr(lst), n-1)
… or print out the whole list:
def printlist(lst):
    if lst:
        print(car(lst), end=' ')
        printlist(cdr(lst))
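Collected into one runnable file, the tuple-based version looks like this. The `to_pylist` helper is a hypothetical addition (not in the answer) that gathers the values into an ordinary Python list so the result is easy to check:

```python
def cons(head, rest):
    # A list cell is just a 2-tuple: (first value, rest of the list)
    return (head, rest)

def car(lst):
    return lst[0]

def cdr(lst):
    return lst[1]

def nth(lst, n):
    # Walk n cells down the chain, then take the head
    if n == 0:
        return car(lst)
    return nth(cdr(lst), n - 1)

def to_pylist(lst):
    # Hypothetical helper: collect the values; None marks the end of the list
    out = []
    while lst is not None:
        out.append(car(lst))
        lst = cdr(lst)
    return out

lst = cons(1, cons(2, cons(3, None)))
print(lst)              # (1, (2, (3, None)))
print(car(cdr(lst)))    # 2
print(nth(lst, 2))      # 3
print(to_pylist(lst))   # [1, 2, 3]
```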
If you understand how these work, the next step is to try them on those weird definitions you found.
They still do the same thing. So, the question is: How? And the bigger question is: What's the point?
Well, there's no practical point to using these weird functions; the real point is to show you that everything in computer science can be written with just functions, no built-in data structures like tuples (or even integers; that just takes a different trick).
The key is higher-order functions: functions that take functions as values and/or return other functions. You actually use these all the time: map, sort with a key, decorators, partial… they’re only confusing when they’re really simple:
def car(f):
    def left(a, b):
        return a
    return f(left)
This takes a function, and calls it on a function that returns the first of its two arguments.
And cdr is similar.
It's hard to see how you'd use either of these, until you see cons:
def cons(a, b):
    def pair(f):
        return f(a, b)
    return pair
This takes two things and returns a function that takes another function and applies it to those two things.
So, what do we get from cons(3, None)? We get a function that takes a function, and applies it to the arguments 3 and None:
def pair3(f):
    return f(3, None)
And if we call cons(2, cons(3, None))?
def pair23(f):
    return f(2, pair3)
And what happens if you call car on that function? Trace through it:
def left(a, b):
    return a
return pair23(left)
That pair23(left) does this:
return left(2, pair3)
And left is dead simple:
return 2
So, we got the first element of (2, cons(3, None)).
What if you call cdr?
def right(a, b):
    return b
return pair23(right)
That pair23(right) does this:
return right(2, pair3)
… and right is dead simple, so it just returns pair3.
You can work out that if we call car(cdr(pair23)), we're going to get the 3 out of it.
And now you can write lst = cons(1, cons(2, cons(3, None))), write the recursive nth and printlist functions above, and trace through how they work on lst.
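Here is a runnable sketch of that exercise. The recursive nth and printlist are unchanged from the tuple version; only cons, car, and cdr are the function-based ones (car and cdr use lambdas in place of the named left/right helpers, which behaves the same):

```python
def cons(a, b):
    # The "pair" is a closure that remembers a and b and hands them to whatever f it gets
    def pair(f):
        return f(a, b)
    return pair

def car(p):
    return p(lambda a, b: a)  # select the first remembered value

def cdr(p):
    return p(lambda a, b: b)  # select the second remembered value

def nth(lst, n):
    if n == 0:
        return car(lst)
    return nth(cdr(lst), n - 1)

def printlist(lst):
    # None still marks the end; a pair (a function object) is always truthy
    if lst:
        print(car(lst), end=' ')
        printlist(cdr(lst))

lst = cons(1, cons(2, cons(3, None)))
print(car(cdr(cons(2, cons(3, None)))))  # 3
print(nth(lst, 1))                        # 2
printlist(lst)                            # prints: 1 2 3
```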
I mentioned above that you can even get rid of integers. How do you do that? Read about Church numerals. You define zero and successor functions. Then you can define one as successor(zero) and two as successor(one). You can even recursively define add so that add(x, zero) is x but add(x, successor(y)) is successor(add(x, y)), and go on to define mul, etc.
You also need a special function you can use as a value for nil.
Anyway, once you've done that, using all of the other definitions above, you can do lst = cons(zero, cons(one, cons(two, cons(three, nil)))), and nth(lst, two) will give you back two. (Of course writing printlist will be a bit trickier…)
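A minimal sketch of Church numerals in Python: a numeral n is a function that applies f to x exactly n times. Note that this uses the standard direct definitions of add and mul rather than the recursive definition described above, and `to_int` is a hypothetical helper used only to inspect results (it cheats by using a real integer); nil is not included.

```python
# zero applies f no times: it just returns x
zero = lambda f: lambda x: x

def successor(n):
    # One more application of f than n does
    return lambda f: lambda x: f(n(f)(x))

def add(m, n):
    # Apply f n times, then m more times
    return lambda f: lambda x: m(f)(n(f)(x))

def mul(m, n):
    # Apply "n applications of f" m times
    return lambda f: m(n(f))

def to_int(n):
    # Inspection only: count how many times +1 gets applied to 0
    return n(lambda k: k + 1)(0)

one = successor(zero)
two = successor(one)
three = add(one, two)
print(to_int(three))            # 3
print(to_int(mul(two, three)))  # 6
```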
Obviously, this is all going to be a lot slower than just using tuples and integers and so on. But theoretically, it’s interesting.
Consider this: we could write a tiny dialect of Python that has only three kinds of statements (def, return, and expression statements) and only three kinds of expressions (literals, identifiers, and function calls), and it could do everything normal Python does. (In fact, you could get rid of statements altogether just by having a function-defining expression, which Python already has.) That tiny language would be a pain to use, but it would be a lot easier to write a program to reason about programs in that tiny language. And we even know how to translate code using tuples, loops, etc. into code in this tiny subset language, which means we can write a program that reasons about that real Python code.
In fact, with a couple more tricks (curried functions and/or static function types, and lazy evaluation), the compiler/interpreter could do that kind of reasoning on the fly and optimize our code for us. It’s easy to tell programmatically that car(cdr(cons(2, cons(3, None)))) is going to return 3 without having to actually evaluate most of those function calls, so we can just skip evaluating them and substitute 3 for the whole expression.
Of course this breaks down if any function can have side effects. You obviously can’t just substitute None for print(3) and get the same results. So instead, you need some clever trick where IO is handled by some magic object that evaluates functions to figure out what it should read and write, and then the whole rest of the program, the part that users write, becomes pure and can be optimized however you want. With a couple more abstractions, we can even make IO something that doesn’t have to be magical to do that.
And then you can build a standard library that gives you back all those things we gave up, written in terms of defining and calling functions, so it’s actually usable—but under the covers it’s all just reducing pure function calls, which is simple enough for a computer to optimize. And then you’ve basically written Haskell.
I need some help understanding a function that I want to use, but I'm not entirely sure what some parts of it do. I understand that the function is creating dictionaries from reads out of a FASTA file. From what I understand, this is supposed to generate prefix and suffix dictionaries for ultimately extending contigs (overlapping DNA sequences).
The code:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    # Each read is split into a key (all but the last lenSuffix bases) and its suffix
    lenKeys = len(reads[0]) - lenSuffix
    dict = {}  # note: this name shadows the built-in dict type
    multipleKeys = []
    i = 1
    for read in reads:
        if read[0:lenKeys] in dict:
            # key seen before: remember it as ambiguous
            multipleKeys.append(read[0:lenKeys])
        else:
            dict[read[0:lenKeys]] = read[lenKeys:]
        if verbose:
            print("\rChecking suffix", i, "of", len(reads), end = "", flush = True)
        i += 1
    # drop every key that occurred more than once
    for key in set(multipleKeys):
        del(dict[key])
    if verbose:
        print("\nCreated", len(dict), "suffixes with length", lenSuffix,
              "from", len(reads), "Reads. (", len(reads) - len(dict),
              "ambiguous)")
    return(dict)
Additional Information: reads = readFasta("smallReads.fna", verbose = True)
This is how the function is called:
if __name__ == "__main__":
    reads = readFasta("smallReads.fna", verbose = True)
    suffixDicts = makeSuffixDicts(reads, 10)
The smallReads.fna file contains strings of bases (DNA):
> read 1
TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA
> read 2
AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG
> read 3
TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT
Here are the parts I don't understand:
lenKeys = len(reads[0]) - lenSuffix
What does the value [0] mean? From what I understand "len" returns the number of elements in a list.
Why is "reads" automatically a list? edit: It seems a Fasta-file can be declared as a List. Can anybody confirm that?
if read[0:lenKeys] in dict:
Does this mean "from 0 to 'lenKeys'"? Still confused about the value.
In another function there is a similar line: if read[-lenKeys:] in dict:
What does the "-" do?
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
Here I don't understand the parameters: How can reads be a parameter? What is lenSuffix = 20 in the context of this function other than a value subtracted from len(reads[0])?
What is verbose? I have read about a "verbose mode" that ignores whitespace, but I have never seen it used as a parameter and later as a variable.
The tone of your question makes me feel like you're confusing things like program features (len, functions, etc) with things that were defined by the original programmer (the type of reads, verbose, etc).
def some_function(these, are, arbitrary, parameters):
    pass
This function defines a bunch of parameters. They don't mean anything at all, other than the value I give to them implicitly. For example if I do:
def reverse_string(s):
    pass
s is probably a string, right? In your example we have:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    ...
From these two lines we can infer a few things:
the function will probably return a dictionary (from its name)
lenSuffix is an int, and verbose is a bool (from their default parameters)
reads can be indexed (string? list? tuple?)
the items inside reads have length (string? list? tuple?)
Since Python is dynamically typed, this is ALL WE CAN KNOW about the function so far. The rest would be explained by its documentation or the way it's called.
That said: let me cover all your questions in order:
What does the value [0] mean?
some_object[0] grabs the first item in a container: [1,2,3][0] == 1, "Hello, World!"[0] == "H". This is called indexing, and it is governed by the __getitem__ magic method.
From what I understand "len" returns the number of elements in a list.
len is a built-in function that returns the length of an object. It is governed by the __len__ magic method. len('abc') == 3, also len([1, 2, 3]) == 3. Note that len(['abc']) == 1, since it is measuring the length of the list, not the string inside it.
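To see that indexing and len() really are just calls to those magic methods, here is a toy container; the `Reads` class is hypothetical and only for illustration:

```python
class Reads:
    """Hypothetical toy container: indexing and len() work via the magic methods."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # Called by len(obj)
        return len(self._items)

    def __getitem__(self, index):
        # Called by obj[index]; obj[start:stop] passes a slice object here
        return self._items[index]

r = Reads(['ACGT', 'TTGA', 'CCAA'])
print(len(r))     # 3: the number of reads
print(r[0])       # ACGT: the first read
print(len(r[0]))  # 4: the length of the first read's string
```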
Why is "reads" automatically a list?
reads is a parameter. It is whatever the calling scope passes to it. It does appear that it expects a list, but that's not a hard and fast rule!
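The question doesn't show readFasta, so we can only guess that it returns a list of sequence strings. A hypothetical stand-in that would produce such a list (the name `read_fasta_lines` is made up, and it takes lines rather than a filename so it is easy to try out; the sequences are shortened from the question's example):

```python
def read_fasta_lines(lines):
    """Hypothetical stand-in for readFasta: return the sequences as a list of strings.

    Header lines start with '>'; the sequence lines after a header are joined together.
    """
    reads = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if current:                  # finish the previous record
                reads.append(''.join(current))
            current = []
        else:
            current.append(line)
    if current:                          # finish the last record
        reads.append(''.join(current))
    return reads

example = """> read 1
TTATGAATATTACGCAATGG
> read 2
AACTGCTATCTTTCTTGTCC
"""
reads = read_fasta_lines(example.splitlines())
print(type(reads).__name__, len(reads))  # list 2
print(reads[0])                          # TTATGAATATTACGCAATGG
```

So `reads` is a list only because whatever function produced it returned one.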
(various questions about slicing)
Slicing is doing some_container[start_idx : end_idx [ : step_size]]. It does pretty much what you'd expect: "0123456"[0:3] == "012". Slice indexes are zero-based and lie between the elements, so [0:1] covers the same element as [0], except that a slice returns a container of the same type rather than the individual object (so [1, 2, 3][0] == 1 but [1, 2, 3][0:1] == [1]; for a string, 'abc'[0:1] == 'a' is itself a one-character string). If you omit either the start or the end index, it is treated as the beginning or end of the sequence respectively. I won't go into step size here.
Negative indexes count from the back, so '0123456'[-3:] == '456'. Note that [-0] is not the last value; [-1] is. Since -0 == 0, [-0] is simply the first value, the same as [0].
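A few of these slices checked on a short read (the first 14 bases of read 1). `lenKeys` is computed the same way as in makeSuffixDict, with an illustrative suffix length of 4; the last slice is presumably how the other function's `read[-lenKeys:]` picks its key:

```python
s = '0123456'
print(s[0:3])   # 012
print(s[0:1])   # 0   (a slice of a string is a string)
print(s[-3:])   # 456
print(s[-1])    # 6   (the last element)
print(s[-0])    # 0   (-0 == 0, so this is the first element)

read = 'TTATGAATATTACG'        # 14 bases
lenKeys = len(read) - 4        # 10
print(read[0:lenKeys])   # TTATGAATAT  (everything except the last 4 bases)
print(read[lenKeys:])    # TACG        (the last 4 bases)
print(read[-lenKeys:])   # GAATATTACG  (the last lenKeys bases)
```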
How can reads be a parameter?
Because the function is defined as makeSuffixDict(reads, ...). That's what a parameter is.
What is lenSuffix = 20 in the context of this function
Looks like it's the length of the expected suffix!
What is verbose?
verbose has no meaning on its own. It's just another parameter. Looks like the author included the verbose flag so you could get output while the function ran. Notice that the if verbose blocks don't change the result; they just provide feedback to the user.
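To see lenSuffix and verbose in action, here is a condensed sketch that should behave like makeSuffixDict on equal-length reads. It swaps the multipleKeys bookkeeping for `collections.Counter`, and the name `make_suffix_dict` and the toy reads are hypothetical:

```python
from collections import Counter

def make_suffix_dict(reads, len_suffix=20, verbose=True):
    """Sketch: map each read's prefix to its last len_suffix characters.

    Prefixes shared by more than one read are ambiguous and are dropped.
    """
    len_key = len(reads[0]) - len_suffix  # assumes all reads have the same length
    counts = Counter(read[:len_key] for read in reads)
    suffixes = {read[:len_key]: read[len_key:]
                for read in reads if counts[read[:len_key]] == 1}
    if verbose:  # purely informational output; the result is the same either way
        print('Created', len(suffixes), 'suffixes of length', len_suffix)
    return suffixes

reads = ['AAAACG', 'AAAATT', 'CCCCGG']
result = make_suffix_dict(reads, 2, verbose=False)
print(result)  # {'CCCC': 'GG'}  ('AAAA' occurs twice, so it is dropped)
```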
I want to remove a substring from a string, for example remove "a" from "a,b,c" and get "b,c" back. The position of a in the string can vary (e.g. "a,b,c", "b,a,c", and so on).
DELIMITER = ","
def remove(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER)
    members.remove(member)
    return DELIMITER.join(members)
print(remove("a","b,a,c"))
output: b,c
The above function is working as it is expected.
My question: I accidentally modified my code, and now it looks like this:
def remove_2(member, members_string):
    """removes target from string"""
    members = members_string.split(DELIMITER).remove(member)
    return DELIMITER.join(members)
You can see that I modified
members = members_string.split(DELIMITER)
members.remove(member)
to
members = members_string.split(DELIMITER).remove(member)
after that the method is broken, it throws
Traceback (most recent call last):
File "test.py", line 15, in <module>
remove_2("a","b,a,c")
File "test.py", line 11, in remove_2
return DELIMITER.join(members)
TypeError
Based on my understanding, members_string.split(DELIMITER) is a list, so calling remove() on it is allowed, and it should return the new list and store it in members. But when I print members_string.split(DELIMITER).remove(member), it prints None, which explains the TypeError. My question is: why does it return None rather than a list with the elements "b" and "c"?
remove() does not return anything (that is, it returns None). It modifies the list it's called on in place: lists are mutable, so creating a new list would be a major waste of CPU time and memory, and returning the same list would be somewhat pointless.
This was already answered here.
Quote from the Python docs:
You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
Mutable objects like lists (and sets) are modified in place by their data-manipulation methods, like remove() and insert() (or add() for sets).
Immutable objects like strings always return a copy of themselves from their data-manipulation methods, like with replace() or upper().
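The difference is easy to see side by side: the list method changes the list and returns None, while the string method leaves the original alone and returns a new string.

```python
members = 'b,a,c'.split(',')
result = members.remove('a')
print(result)    # None: remove() changed the list in place
print(members)   # ['b', 'c']

s = 'b,a,c'
t = s.replace('a,', '')
print(s)  # b,a,c  (strings are immutable, so the original is unchanged)
print(t)  # b,c    (replace() returned a new string)
```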
Method chaining
The next sample shows that your intended method-chaining works with strings:
# Every replace() call is catching a different case from
# member_string like
# a,b,member
# member,b,c
# a,member,c
# Note: this is plain substring replacement, so it would also mangle
# a member that merely contains `member` (e.g. "ab" when removing "a")
DELIMITER = ","
def remove(member, member_string):
    members = member_string.replace(DELIMITER + member, '').replace(member + DELIMITER, '').replace(DELIMITER + member + DELIMITER, '').upper()
    return members
# puts out B,C
print(remove("a","b,a,c"))
List comprehension
Now, for cleverer list manipulation (often faster than an explicit for loop), Python has a feature called list comprehensions. You can read about them in the Python documentation.
DELIMITER = ","
def remove(member, members_string):
    members = [m.upper() for m in members_string.split(DELIMITER) if m != member]
    return DELIMITER.join(members)
# puts out B,C
print(remove("a","b,a,c"))
In addition, you could look up generators in the Python docs, but I don't know much about them.
BTW, flame me as a noob, but I hate it when people call Python a beginner language: the list comprehension above looks easy, but it could be intimidating for a beginner, couldn't it?
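For reference, the generator-expression form of the list-comprehension answer simply drops the square brackets and passes the expression straight to join(). A minimal sketch (without the upper() call, so it returns the members unchanged):

```python
DELIMITER = ","

def remove(member, members_string):
    # Generator expression: same filter as the list comprehension, no brackets needed
    return DELIMITER.join(m for m in members_string.split(DELIMITER) if m != member)

print(remove("a", "b,a,c"))   # b,c
print(remove("a", "b,ab,c"))  # b,ab,c  ('ab' is a different member, so it stays)
```

Because it compares whole members after split(), it avoids the substring problem of the replace() chain.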
So I've been given the following code in a sort-of Python class. It's really a discrete math class, but he uses Python to demonstrate everything. This code is supposed to demonstrate a multiplexer and building an XOR gate with it.
def mux41(i0,i1,i2,i3):
    return lambda s1,s0:{(0,0):i0,(0,1):i1,(1,0):i2,(1,1):i3}[(s1,s0)]
def xor2(a,b):
    return mux41(0,1,1,0)(a,b)
In the xor2 function I don't understand the syntax behind return mux41(0,1,1,0)(a,b). The 1s and 0s are the input to the mux function, but what is the (a,b) doing?
The (a, b) is actually the input to the lambda function that you return in the mux41 function.
Your mux41 function returns a lambda function which looks like it returns a value in a dictionary based on the input to the mux41 function. You need the second input to say which value you want to return.
It is directly equivalent to:
def xor2(a,b):
    f = mux41(0,1,1,0)
    return f(a,b)
That is fairly advanced code to throw at Python beginners, so don't feel bad it wasn't obvious to you. I also think it is rather trickier than it needs to be.
def mux41(i0,i1,i2,i3):
    return lambda s1,s0:{(0,0):i0,(0,1):i1,(1,0):i2,(1,1):i3}[(s1,s0)]
This defines a function object that returns a value based on two inputs. The two inputs are s1 and s0. The function object builds a dictionary that is pre-populated with the four values passed in to mux41(), and it uses s0 and s1 to select one of those four values.
Dictionaries use keys to look up values. In this case, the keys are Python tuples: (0, 0), (0, 1), (1, 0), and (1,1). The expression (s1,s0) is building a tuple from the arguments s0 and s1. This tuple is used as the key to lookup a value from the dictionary.
def xor2(a,b):
    return mux41(0,1,1,0)(a,b)
So, mux41() returns a function object that does the stuff I just discussed. xor2() calls mux41() and gets a function object; then it immediately calls that returned function object, passing in a and b as arguments. Finally it returns the answer.
The function object created by mux41() is not saved anywhere. So, every single time you call xor2(), you are creating a function object, which is then garbage collected. When the function object runs, it builds a dictionary object, and this too is garbage collected after each single use. This is possibly the most complicated XOR function I have ever seen.
Here is a rewrite that might make this a bit clearer. Instead of using lambda to create an un-named function object, I'll just use def to create a named function.
def mux41(i0,i1,i2,i3):
    def mux_fn(s1, s0):
        d = {
            (0,0):i0,
            (0,1):i1,
            (1,0):i2,
            (1,1):i3
        }
        tup = (s1, s0)
        return d[tup]
    return mux_fn
def xor2(a,b):
    mux_fn = mux41(0,1,1,0)
    return mux_fn(a,b)
EDIT: Here is what I would have written if I wanted to make a table-lookup XOR in Python.
_d_xor2 = {
    (0,0) : 0,
    (0,1) : 1,
    (1,0) : 1,
    (1,1) : 0
}
def xor2(a,b):
    tup = (a, b)
    return _d_xor2[tup]
We build the lookup dictionary once, then use it directly from xor2(). It's not really necessary to make an explicit temp variable in xor2() but it might be a bit clearer. You could just do this:
def xor2(a,b):
    return _d_xor2[(a, b)]
Which do you prefer?
And of course, since Python has an XOR operator built-in, you could write it like this:
def xor2(a,b):
    return a ^ b
If I were writing this for real I would probably add error handling and/or make it operate on bool values.
def xor2(a,b):
    return bool(a) ^ bool(b)
EDIT: One more thing just occurred to me. In Python, the rule is "the comma makes the tuple". The parentheses around a tuple are sometimes optional. I just checked, and it works just fine to leave off the parentheses in a dictionary lookup. So you can do this:
def xor2(a,b):
    return _d_xor2[a, b]
And it works fine. This is perhaps a bit too tricky? If I saw this in someone else's code, it would surprise me.
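A quick way to convince yourself that the multiplexer version, the table lookup, and the ^ operator all compute the same thing is to compare them on all four input pairs. The names `xor_mux`, `xor_table`, and `xor_op` are hypothetical, chosen only so the three versions can coexist:

```python
from itertools import product

def mux41(i0, i1, i2, i3):
    return lambda s1, s0: {(0, 0): i0, (0, 1): i1, (1, 0): i2, (1, 1): i3}[(s1, s0)]

_d_xor2 = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def xor_mux(a, b):
    return mux41(0, 1, 1, 0)(a, b)

def xor_table(a, b):
    return _d_xor2[a, b]

def xor_op(a, b):
    return a ^ b

# product((0, 1), repeat=2) yields (0, 0), (0, 1), (1, 0), (1, 1)
for a, b in product((0, 1), repeat=2):
    assert xor_mux(a, b) == xor_table(a, b) == xor_op(a, b)
    print(a, b, xor_op(a, b))
```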
I have a small Python script which I use every day. It basically reads a file, and for each line I apply different string functions like strip(), replace(), etc. I'm constantly editing the file and commenting things out to change the functions. Depending on the file I'm dealing with, I use different functions. For example, I got a file where for each line I need to use line.replace(' ','') and line.strip().
What's the best way to make all of these part of my script, so that I can assign a number to each function and just say "apply functions 1 and 4 to each line"?
First of all, many functions in the string module, including strip and replace, are deprecated. The following answer uses string methods instead. (Instead of string.strip(" Hello "), I use the equivalent of " Hello ".strip().)
Here's some code that will simplify the job for you. The following code assumes that whatever methods you call on your string, that method will return another string.
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip
def process_line(line, *ops):
    i = iter(ops)
    while True:
        try:
            # next(i) works in Python 2.6+ and 3; i.next() is Python 2 only
            op = next(i)
            args = next(i)
        except StopIteration:
            break
        line = op(line, *args)
    return line
The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.
The process_line function is where all the interesting things happen. First, here is a description of the argument format:
The first argument is the string to be processed.
The remaining arguments must be given in pairs.
The first argument of the pair is a string method. Use the shortened method names here.
The second argument of the pair is a list representing the arguments to that particular string method.
The process_line function returns the string that emerges after all these operations have been performed.
Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.
with open("parrot_sketch.txt") as f:  # the with block closes the file afterwards
    for line in f:
        p = process_line(
            line,
            O.r, ["He's resting...", "This is an ex-parrot!"],
            O.c, [],
            O.s, []
        )
        print(p)
Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
If you insist on numbers, you can't do much better than a dict (as gimel suggests) or list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:
def all_lines(somefile, methods):
    """Apply a sequence of methods to all lines of some file and yield the results.
    Args:
      somefile: an open file or other iterable yielding lines
      methods: a string that's a whitespace-separated sequence of method names.
        (note that the methods must be callable without arguments beyond the
        str to which they're being applied)
    """
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled: line = tocall(line)
        yield line
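A usage sketch for all_lines, using `io.StringIO` as a stand-in for a real open file (the sample text is made up):

```python
import io

def all_lines(somefile, methods):
    """Apply whitespace-separated str method names to every line (as defined above)."""
    tobecalled = [getattr(str, name) for name in methods.split()]
    for line in somefile:
        for tocall in tobecalled:
            line = tocall(line)
        yield line

fake_file = io.StringIO("  hello world  \n  GOODBYE  \n")
for line in all_lines(fake_file, "strip title"):
    print(line)
# Hello World
# Goodbye
```

Because the methods are looked up by name with getattr, any zero-argument str method (lower, upper, title, strip, ...) can be used without registering it anywhere.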
It is possible to map string operations to numbers:
>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>
But maybe string descriptions of the operations will be more readable.
>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>
Note:
Instead of using the string module (whose split and replace functions exist only in Python 2), you can use the str type for the same effect.
>>> ops={1:str.split, 2:str.replace}
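Going back to the question's "apply function 1 and 4 to each line" idea, here is a minimal sketch with str-based operations. The numbering in `OPS` and the `apply_ops` helper are hypothetical; pick whatever numbers you like:

```python
# Hypothetical numbering; each entry takes a line and returns a new string
OPS = {
    1: str.strip,
    2: str.lower,
    3: str.upper,
    4: lambda s: s.replace(' ', ''),
}

def apply_ops(line, numbers):
    # Apply the chosen operations in the order given
    for n in numbers:
        line = OPS[n](line)
    return line

print(apply_ops("  a b c  \n", [1, 4]))  # abc
```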
To map names (or numbers) to different string operations, I'd do something like
OPERATIONS = dict(
    strip = str.strip,
    lower = str.lower,
    removespaces = lambda s: s.replace(' ', ''),
    maketitle = lambda s: s.title().center(80, '-'),
    # etc
)
def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line
which you use like this
for line in process(afile, ['strip', 'removespaces']):
    ...