OCaml equivalent of Python generators

French Sécurité Sociale identification numbers end with a two-digit check code. I have verified that every possible common transcription error can be detected, and found some other kinds of errors (e.g., rolling three consecutive digits) that may stay undetected.
def check_code(number):
    return 97 - int(number) % 97
def single_digit_generator(number):
    for i in range(len(number)):
        for wrong_digit in "0123456789":
            yield number[:i] + wrong_digit + number[i+1:]
def roll_generator(number):
    for i in range(len(number) - 2):
        yield number[:i] + number[i+2] + number[i] + number[i+1] + number[i+3:]
        yield number[:i] + number[i+1] + number[i+2] + number[i] + number[i+3:]
def find_error(generator, number):
    control = check_code(number)
    for wrong_number in generator(number):
        if number != wrong_number and check_code(wrong_number) == control:
            return (number, wrong_number)
assert find_error(single_digit_generator, "0149517490979") is None
assert find_error(roll_generator, "0149517490979") == ('0149517490979', '0149517499709')
My Python 2.7 code (working fragment above) makes heavy use of generators. I was wondering how I could adapt them in OCaml. I can surely write a function maintaining some internal state, but I'm looking for a purely functional solution. Should I study the Lazy library, which I'm not too familiar with? I'm not asking for code, just directions.

You may simply define a generator as a stream, using the Stream module:
let range n = Stream.from (fun i -> if i < n then Some i else None);;
The for syntactic construct cannot be used with that, but there's a set of functions provided by the Stream module to check the state of the stream and iterate on its elements.
try
  let r = range 10 in
  while true do
    Printf.printf "next element: %d\n" @@ Stream.next r
  done
with Stream.Failure -> ();;
Or more simply:
Stream.iter (Printf.printf "next element: %d\n") @@ range 10;;
You may also use a special syntax provided by the camlp4 preprocessor:
Stream.iter (Printf.printf "next element: %d\n") [< '11; '3; '19; '52; '42 >];;
Other features include creating streams out of lists, strings, bytes or even channels. The API documentation succinctly describes the different possibilities.
The special syntax lets you compose streams, so as to put elements before or after one another, but it can be somewhat unintuitive at first:
let dump = Stream.iter (Printf.printf "next element: %d\n");;
dump [< range 10; range 20 >];;
will produce the elements of the first stream, and then pick up the second stream at the element ranked right after the last yielded element, so in this case it will appear as if only the second stream had been iterated.
To get all the elements, you could construct a stream of 'a Stream.t and then recursively iterate on each:
Stream.iter dump [< '(range 10); '(range 20) >];;
These would produce the expected output.
I recommend reading the old book on OCaml (available online) for a better introduction on the topic.

The Core library provides Python-style generators; see the Sequence module.
Here is an example, taken from one of my projects:
open Core_kernel.Std
let intersections tab (x : mem) : _ seq =
  let open Sequence.Generator in
  let init = return () in
  let m = fold_intersections tab x ~init ~f:(fun addr x gen ->
      gen >>= fun () -> yield (addr,x)) in
  run m

Related

How to solve a Linear System of Equations in Python When the Coefficients are Unknown (but still real numbers)

I'm not a programmer, so go easy on me please! I have a system of 4 linear equations and 4 unknowns, which I think I could use Python to solve relatively easily. However, my equations are not of the form "5x+2y+z-w=0"; instead I have algebraic constants c_i whose explicit numerical values I don't know. For example, "c_1 x + c_2 y + c_3 z + c_4 w = c_5" would be one of my four equations. So does a solver exist which gives answers for x, y, z, w in terms of the c_i?

Numpy has a function for this exact problem: numpy.linalg.solve
To construct the matrix we first need to digest the string turning it into an array of coefficients and solutions.
Finding Numbers
First we need to write a function that takes a string like "c_1 3" and returns the number 3.0. Depending on the format you want in your input string you can either iterate over all chars in this array and stop when you find a non-digit character, or you can simply split on the space and parse the second string. Here are both solutions:
def find_number(sub_expr):
    """
    Finds the number from the format
    number*string or numberstring.
    Example:
    3x -> 3
    4*x -> 4
    """
    num_str = str()
    for char in sub_expr:
        if char.isdigit():
            num_str += char
        else:
            break
    return float(num_str)
or the simpler solution
def find_number(sub_expr):
    """
    Returns the number from the format "string number"
    """
    return float(sub_expr.split()[1])
Note: See edits
Get matrices
Now we can use that to split each expression at the "=" into two parts: the equation and the solution. The equation is then split into sub-expressions at each "+". This way we would turn the string "3x+4y = 3" into
sub_expressions = ["3x", "4y"]
solution_string = "3"
Each sub-expression then needs to be fed into our find_number function. The end result can be appended to the coefficient and solution matrices:
def get_matrices(expressions):
    """
    Returns coefficient_matrix and solutions from array of string-expressions.
    """
    coefficient_matrix = list()
    solutions = list()
    last_len = -1
    for expression in expressions:
        # Note: In this solution all coefficients must be explicitly noted and must always be in the same order.
        # Could be solved with dicts but is probably overengineered.
        if "=" not in expression:
            print(f"Invalid expression {expression}. Missing \"=\"")
            return False
        try:
            c_string, s_string = expression.split("=")
            c_strings = c_string.split("+")
            solutions.append(float(s_string))
            current_len = len(c_strings)
            if last_len != -1 and current_len != last_len:
                print(f"The expression {expression} has a mismatching number of coefficients")
                return False
            last_len = current_len
            coefficients = list()
            for c_string in c_strings:
                coefficients.append(find_number(c_string))
            coefficient_matrix.append(coefficients)
        except Exception as e:
            print(f"An unexpected runtime error occurred at {expression}")
            print(e)
            exit()
    return coefficient_matrix, solutions
Now let's write a simple main function to test this code:
# This is not the code you want to copy-paste
# Look further down.
from sys import argv as args
def main():
    expressions = args[1:]
    matrix, solutions = get_matrices(expressions)
    for row in matrix:
        print(row)
    print("")
    print(solutions)
if __name__ == "__main__":
    main()
Let's run the program in the console!
user:$ python3 solve.py 2x+3y=4 3x+3y=2
[2.0, 3.0]
[3.0, 3.0]
[4.0, 2.0]
You can see that the program identified all our numbers correctly
AGAIN: use the find_number function appropriate for your format
Put The Pieces Together
These Matrices now just need to be pumped directly into the numpy function:
# This is the main you want
from sys import argv as args
from numpy.linalg import solve as solve_linalg
def main():
    expressions = args[1:]
    matrix, solutions = get_matrices(expressions)
    coefficients = solve_linalg(matrix, solutions)
    print(coefficients)
# This bit needs to be at the very bottom of your code to load all functions first.
# You could just paste the main-code here, but this is considered best-practice
if __name__ == '__main__':
    main()
Now let's test that:
$ python3 solve.py x*2+y*4+z*0=20 x*1+y*1+z*-1=3 x*2+y*2+z*-3=3
[2. 4. 3.]
As you can see, the program now solves the equations for us.
Out of curiosity: Math homework? This feels like math homework.
Edit: Had a typo ("c_string" instead of "c_strings") that worked out in all tests out of pure and utter luck.
Edit 2: Upon further inspection I would recommend splitting the sub-expressions on a "*":
def find_number(sub_expr):
    """
    Returns the number from the format "string*number"
    """
    return float(sub_expr.split("*")[1])
This results in fairly readable input strings
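To make the solving step less of a black box, here is a minimal pure-Python sketch of Gauss-Jordan elimination applied to the same three test equations. It is only an illustration of what a linear solver computes: NumPy delegates to LAPACK and does not work this way internally, and the name solve_linear is hypothetical, not part of any library.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Build the augmented matrix [A | b].
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

# Same system as the console test: 2x+4y=20, x+y-z=3, 2x+2y-3z=3
solution = solve_linear([[2, 4, 0], [1, 1, -1], [2, 2, -3]], [20, 3, 3])
print([int(v) for v in solution])  # [2, 4, 3]
```

Exact fractions are used here only to keep the example free of rounding noise; for real work the NumPy call shown earlier is the better choice.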

Write a function to modify a certain string in a certain way by adding characters

I have to write a function that takes a string and returns the string with asterisks ("*") added to signal multiplication.
As we know, 4(3) is another way to show multiplication, as are 4*3, (4)(3), 4*(3), etc. My code needs to add an asterisk between the 4 and the 3 whenever multiplication is shown WITH PARENTHESES but without the multiplication operator "*".
Some examples:
"4(3)" -> "4*(3)"
"(4)(3)" -> "(4)*(3)"
"4*2 + 9 -4(-3)" -> "4*2 + 9 -4*(-3)"
"(-9)(-2) (4)" -> "(-9)*(-2) *(4)"
"4^(3)" -> "4^(3)"
"(4-3)(4+2)" -> "(4-3)*(4+2)"
"(Aflkdsjalkb)(g)" -> "(Aflkdsjalkb)*(g)"
"g(d)(f)" -> "g*(d)*(f)"
"(4) (3)" -> "(4)*(3)"
I'm not exactly sure how to do this. I am thinking about finding each left parenthesis and simply adding a "*" at that location, but that wouldn't work: the start of my fourth example would output "*(-9)", which I don't want, and my fifth example would output "4^*(3)". Any ideas on how to solve this problem? Thank you.
Here's something I've tried, and obviously it doesn't work:
while index < len(stringtobeconverted):
    parenthesis = stringtobeconverted[index]
    if parenthesis == "(":
        stringtobeconverted[index-1] = "*"

In [15]: def add_multiplies(input_string):
    ...:     return re.sub(r'([^-+*/])\(', r'\1*(', input_string)
    ...:
In [16]: for example in examples:
    ...:     print(f"{example} -> {add_multiplies(example)}")
    ...:
4(3) -> 4*(3)
(4)(3) -> (4)*(3)
4*2 + 9 -4(-3) -> 4*2 + 9 -4*(-3)
(-9)(-2) (4) -> (-9)*(-2) *(4)
4^(3) -> 4^*(3)
(4-3)(4+2) -> (4-3)*(4+2)
(Aflkdsjalkb)(g) -> (Aflkdsjalkb)*(g)
g(d)(f) -> g*(d)*(f)
(g)-(d) -> (g)-(d)
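Note that the output above still turns 4^(3) into 4^*(3). One possible adjustment, sketched here under the assumption that an asterisk is wanted only when the opening parenthesis follows a closing parenthesis, a digit or a letter (optionally separated by spaces), is to use a lookbehind instead of a negated character class. The name add_multiplies_fixed is hypothetical.

```python
import re

def add_multiplies_fixed(input_string):
    # (?<=[\w)]) : the previous character must be a letter, digit, underscore or ")"
    # (\s*)      : keep any spaces between that character and the "("
    return re.sub(r'(?<=[\w)])(\s*)\(', r'\1*(', input_string)

examples = ["4(3)", "(4)(3)", "4*2 + 9 -4(-3)", "(-9)(-2) (4)",
            "4^(3)", "(4-3)(4+2)", "g(d)(f)", "(g)-(d)"]
for example in examples:
    print(f"{example} -> {add_multiplies_fixed(example)}")
```

Because letters count as multiplicands here (as in the g(d)(f) example), a real function call such as sin(x) would also be rewritten; that trade-off follows the question's examples rather than general mathematical notation.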

tl;dr: Rather than thinking of this as string transformation, you might:
Parse an input string into an abstract representation.
Generate a new output string from the abstract representation.
Parse input to create an abstract syntax tree, then emit the new string.
Generally you should:
Create a logical representation for the mathematical expressions. You'll want to build an abstract syntax tree (AST) to represent each expression. For example,
    2(3(4)+5)
could form a tree like:
      *
     / \
    2   +
       / \
      *   5
     / \
    3   4
where each node in that tree (2, 3, 4, 5, both *'s, and the +) is an object that has references to its child objects.
Write the logic for parsing the input. Write logic that can parse "2(3(4)+5)" into an abstract syntax tree that represents what it means.
Write the logic to serialize the data. Now that you've got the data in conceptual form, you can write methods that convert it into a new, desired format.
Note: String transformations might be easier for quick scripting.
As other answers have shown, direct string transformations can be easier if all you need is a quick script, e.g. you have some text you just want to reformat real quick. For example, as @PaulWhipp's answer demonstrates, regular expressions can make such scripting really quick-and-easy.
That said, for professional projects, you'll generally want to parse data into an abstract representation before emitting a new representation. String-transform tricks don't generally scale well with complexity, and they can be both functionally limited and pretty error-prone outside of simple cases.
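The parse-then-emit approach can be sketched as a small recursive-descent parser. Everything below is an illustrative assumption rather than a reference implementation: the grammar (numbers, names, + - * / ^, unary minus, parentheses), the tuple-based node shapes and the names tokenize, Parser, emit and add_multiplies_ast are all made up for this example. Implicit multiplication is recognised in the term rule whenever an operand is directly followed by another operand.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*/^()])")

def tokenize(text):
    """Split an expression into number, name, operator and parenthesis tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):    # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            node = ("bin", self.take(), node, self.term())
        return node

    def term(self):    # term := power (('*' | '/')? power)*
        node = self.power()
        while True:
            tok = self.peek()
            if tok in ("*", "/"):
                node = ("bin", self.take(), node, self.power())
            elif tok is not None and (tok == "(" or tok.isalnum()):
                node = ("bin", "*", node, self.power())  # implicit multiplication
            else:
                return node

    def power(self):   # power := unary ('^' power)?
        node = self.unary()
        if self.peek() == "^":
            self.take()
            node = ("bin", "^", node, self.power())
        return node

    def unary(self):   # unary := '-' unary | atom
        if self.peek() == "-":
            self.take()
            return ("neg", self.unary())
        return self.atom()

    def atom(self):    # atom := NUMBER | NAME | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = ("group", self.expr())
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok is None or not tok.isalnum():
            raise ValueError(f"unexpected token {tok!r}")
        return ("leaf", tok)

def emit(node):
    """Serialize the tree back to text, now with every '*' written out."""
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "neg":
        return "-" + emit(node[1])
    if kind == "group":
        return "(" + emit(node[1]) + ")"
    _, op, left, right = node
    return emit(left) + op + emit(right)

def add_multiplies_ast(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return emit(tree)

print(add_multiplies_ast("2(3(4)+5)"))  # 2*(3*(4)+5)
```

Unlike the string-based answers, this sketch discards the original whitespace, because the tree does not record it; keeping it would mean storing spacing on the nodes.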

I'll share mine.
def insertAsteriks(string):
    lstring = list(string)
    c = False
    for i in range(1, len(lstring)):
        if c:
            c = False
            pass
        elif lstring[i] == '(' and (lstring[i - 1] == ')' or lstring[i - 1].isdigit() or lstring[i - 1].isalpha() or (lstring[i - 1] == ' ' and not lstring[i - 2] in "*^-+/")):
            lstring.insert(i, '*')
            c = True
    return ''.join(lstring)
Let's check against your inputs.
print(insertAsteriks("4(3)"))
print(insertAsteriks("(4)(3)"))
print(insertAsteriks("4*2 + 9 -4(-3)"))
print(insertAsteriks("(-9)(-2) (4)"))
print(insertAsteriks("(4)^(-3)"))
print(insertAsteriks("ABC(DEF)"))
print(insertAsteriks("g(d)(f)"))
print(insertAsteriks("(g)-(d)"))
The output is:
4*(3)
(4)*(3)
4*2 + 9 -4*(-3)
(-9)*(-2) *(4)
(4)^(-3)
ABC*(DEF)
g*(d)*(f)
(g)-(d)

One way would be to use a simple replacement. The cases to be replaced are:
)( -> )*(
N( -> N*(
)N -> )*N
Assuming you want to preserve whitespace as well, you need to find all patterns on the left side with an arbitrary number of spaces in between and replace that with the same number of spaces less one plus the asterisk at the end. You can use a regex for that.
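The three replacement cases can be sketched with two regular-expression substitutions. This is only one possible reading: N is assumed to be a digit, whitespace is kept as-is rather than shortened by one space, and the name insert_asterisks is hypothetical.

```python
import re

def insert_asterisks(expr):
    # Cases ")(" and "N(": a ")" or a digit, optional spaces, then "(".
    expr = re.sub(r"([)\d])(\s*)\(", r"\1\2*(", expr)
    # Case ")N": a ")", optional spaces, then a digit.
    expr = re.sub(r"\)(\s*)(\d)", r")\1*\2", expr)
    return expr

for text in ["(4)(3)", "4(3)", "(4)3", "(4) (3)", "4^(3)"]:
    print(text, "->", insert_asterisks(text))
```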
A more fun way would be to use a kind of recursion with fake linked lists :) You have entities and operators. An entity can be a number by itself or anything enclosed in parentheses. Anything else is an operator. How about something like this:
For each string, find all entities and operators (keep them in a list, for example).
Then for each entity, see if there are more entities inside.
Keep doing that until there are no more entities left inside any entity.
Then, starting from the very bottom (that is, the smallest entities), see if there is an operator between each pair of adjacent entities; if there is not, insert an asterisk there. Do that all the way up to the top level. Then start from the bottom again and reassemble all the pieces.

Here is some code tested on your examples:
i = 0
input_string = "(4-3)(4+2)"
output_string = ""
while i < len(input_string):
    if input_string[i] == "(" and i != 0:
        if input_string[i-1] in list(")1234567890"):
            output_string += "*("
        else:
            output_string += input_string[i]
    else:
        output_string += input_string[i]
    i += 1
print(output_string)
The key here is to understand the logic you want to achieve, which is in fact quite simple : you just want to add some "*" before opening parenthesis based on a few conditions.
Hope that helps !

Python: itertools.product consuming too much resources

I've created a Python script that generates a list of words by permuting characters. I'm using itertools.product to generate my permutations. My character list is composed of letters and digits: 01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ. Here is my code:
#!/usr/bin/python
import itertools, hashlib, math
class Words:
    chars = '01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ'
    def __init__(self, size):
        self.make(size)
    def getLenght(self, size):
        res = []
        for i in range(1, size+1):
            res.append(math.pow(len(self.chars), i))
        return sum(res)
    def getMD5(self, text):
        m = hashlib.md5()
        m.update(text.encode('utf-8'))
        return m.hexdigest()
    def make(self, size):
        file = open('res.txt', 'w+')
        res = []
        i = 1
        for i in range(1, size+1):
            prod = list(itertools.product(self.chars, repeat=i))
            res = res + prod
        j = 1
        for r in res:
            text = ''.join(r)
            md5 = self.getMD5(text)
            res = text+'\t'+md5
            print(res + ' %.3f%%' % (j/float(self.getLenght(size))*100))
            file.write(res+'\n')
            j = j + 1
        file.close()
Words(3)
This script works fine for list of words with max 4 characters. If I try 5 or 6 characters, my computer consumes 100% of CPU, 100% of RAM and freezes.
Is there a way to restrict the use of those resources or optimize this heavy processing?

Does this do what you want?
I've made all the changes in the make method:
def make(self, size):
    with open('res.txt', 'w+') as file_: # file is a builtin function in python 2
        # also, use with statements for files used on only a small block, it handles file closure even if an error is raised.
        for i in range(1, size+1):
            prod = itertools.product(self.chars, repeat=i)
            for j, r in enumerate(prod):
                text = ''.join(r)
                md5 = self.getMD5(text)
                res = text+'\t'+md5
                print(res + ' %.3f%%' % ((j+1)/float(self.getLenght(size))*100))
                file_.write(res+'\n')
Be warned this will still chew up gigabytes of memory, but not virtual memory.
EDIT: As noted by Padraic, there is no file keyword in Python 3, and as it is a "bad builtin", it's not too worrying to override it. Still, I'll name it file_ here.
EDIT2:
To explain why this works so much faster and better than the previous, original version, you need to know how lazy evaluation works.
Say we have a simple expression as follows (for Python 3; use xrange for Python 2):
a = [i for i in range(10**12)]
This immediately tries to evaluate 1 trillion elements into memory, exhausting it.
So we can use a generator to solve this:
a = (i for i in range(10**12))
Here, none of the values have been evaluated, just given the interpreter instructions on how to evaluate it. We can then iterate through each item one by one and do work on each separately, so almost nothing is in memory at a given time (only 1 integer at a time). This makes the seemingly impossible task very manageable.
The same is true with itertools: it allows you to do memory-efficient, fast operations by using iterators rather than lists or arrays to do operations.
In your example, you have 62 characters and want to do the cartesian product with 5 repeats, or 62**5 (nearly a billion elements, or over 30 gigabytes of RAM). This is prohibitively large.
In order to solve this, we can use iterators.
chars = '01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVXYZ'
for i in itertools.product(chars, repeat=5):
print(i)
Here, only a single item from the cartesian product is in memory at a given time, meaning it is very memory efficient.
However, if you evaluate the full iterator using list(), it then exhausts the iterator and adds it to a list, meaning the nearly one billion combinations are suddenly in memory again. We don't need all the elements in memory at once: just 1. Which is the power of iterators.
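As a small illustration of that difference, the sketch below streams words from itertools.product and stops after four of them, so the rest of the product is never built. The helper names iter_words, total_words and md5_hex are hypothetical, and the tiny alphabet "ab" is used only to keep the output readable.

```python
import hashlib
import itertools

def iter_words(chars, max_size):
    """Lazily yield every word of length 1..max_size over chars."""
    for size in range(1, max_size + 1):
        for combo in itertools.product(chars, repeat=size):
            yield "".join(combo)

def total_words(chars, max_size):
    """Exact number of words iter_words will produce (integer arithmetic)."""
    return sum(len(chars) ** size for size in range(1, max_size + 1))

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# islice pulls only the first four items; nothing else is generated.
first_four = list(itertools.islice(iter_words("ab", 3), 4))
print(first_four)            # ['a', 'b', 'aa', 'ab']
print(total_words("ab", 3))  # 14
```

Computing the total once, up front, also avoids recomputing it for every printed line when reporting progress.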
Here are links to the itertools module and another explanation on iterators in Python 2 (mostly true for 3).

Can homogeneous functions be combined into a calculation network?

Inside a network, information (a packet) can be passed to different nodes (hosts); by modifying its content, it can carry a different meaning. The final packet depends on what the hosts contribute along its given route through the network.
Now I want to implement a calculation network model that can do small jobs by following different calculation paths.
Prototype:
def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def link(p, r):
    p1 = p
    for x in r:
        p1 = x(p1)
    return p1
p = 100
route = [a,c,d]
result = link(p, route)
#========
target_result = 108
if result == target_result:
    pass  # route is OK
I think finally I need something like this:
p with [init_payload, expected_target, passed_path, actual_calculated_result]
|
\/
[CHAOS of possible of functions networks]
|
\/
px [a,a,b,c,e] # ok this path is ok and match the target
Here are my questions; I hope you can help:
1. Can p determine the route(s) by inspecting the functions and the estimated result?
(1.1) For example, if there is a node x() on the route:
def x(p): return p / 0 # I suppose it passes compilation
can p somehow know that this path is bad and avoid selecting it?
(1.2) Another point of confusion: if p is a self-defined class type whose payload is essentially a string, and it is carried along the path [a,c,d], can p know that a() requires an int and avoid selecting this node?
2. Same as 1.2: when generating the path, can I avoid mistakes such as this?
def a(p): return p + 1
def b(p): return p + 2
def x(p): return p.append(1)
def y(p): return p.append(2)
full_node_list = [a,b,x,y]
path = random.sample(full_node_list, 2) # oops: x,y will be trouble for an int-type p, and a,b will be trouble for a list type.
Please also consider the case where the path is a list of lambda functions.
PS: As the whole model is not very clear in my mind, any guidance or direction will be appreciated.
THANKS!

You could test each function first with a set of sample data; any function which returns consistently unusable values might then be discarded.
def isGoodFn(f):
    testData = [1,2,3,8,38,73,159] # random test input
    goodEnough = 0.8 * len(testData) # need 80% pass rate
    try:
        good = 0
        for i in testData:
            if type(f(i)) is int:
                good += 1
        return good >= goodEnough
    except:
        return False
If you know nothing about what the functions do, you will have to essentially do a full breadth-first tree search with error-checking at each node to discard bad results. If you have more than a few functions this will get very large very quickly. If you can guarantee some of the functions' behavior, you might be able to greatly reduce the search space - but this would be domain-specific, requiring more exact knowledge of the problem.
If you had a heuristic measure for how far each result is from your desired result, you could do a directed search to find good answers much more quickly - but such a heuristic would depend on knowing the overall form of the functions (a distance heuristic for multiplicative functions would be very different than one for additive functions, etc).
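The breadth-first search with error-checking at each node could look roughly like the sketch below. It assumes nothing about the node functions beyond "call it and see": any node that raises is dropped from that branch. The name find_routes and the max_len cut-off are hypothetical choices for this example; x is the deliberately broken node from the question.

```python
from collections import deque

def a(p): return p + 1
def b(p): return p + 2
def c(p): return p + 3
def d(p): return p + 4
def e(p): return p + 5
def x(p): return p / 0   # always raises ZeroDivisionError

def find_routes(start, target, nodes, max_len):
    """Return every route of at most max_len nodes that maps start to target."""
    found = []
    queue = deque([(start, [])])          # (current value, route taken so far)
    while queue:
        value, route = queue.popleft()
        if route and value == target:
            found.append(route)
        if len(route) == max_len:
            continue                      # do not extend routes past the limit
        for fn in nodes:
            try:
                next_value = fn(value)
            except Exception:
                continue                  # this node cannot handle the value
            queue.append((next_value, route + [fn]))
    return found

routes = find_routes(100, 108, [a, b, c, d, e, x], max_len=3)
names = [[fn.__name__ for fn in route] for route in routes]
print(len(names))   # 21
print(names[0])     # ['c', 'e']
```

With n nodes the queue grows like n**max_len, which is the "very large very quickly" problem described above; a heuristic or known properties of the functions would be needed to prune it.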

Your functions can raise TypeError if they are not satisfied with the data types they receive. You can then catch this exception and see whether you are passing an appropriate type. You can also catch any other exception type. But trying to call the functions and catching the exceptions can be quite slow.
You could also organize your functions into different sets depending on the argument type.
functions = { list : [some functions taking a list], int : [some functions taking an int]}
...
x = choose_function(functions[type(p)])
p = x(p)
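A runnable version of that idea might look like this sketch, in which the registry FUNCTIONS, the node functions and random_route are all hypothetical names. Note one deliberate change from the question's x and y: list.append returns None, so the list nodes here return a new list instead, otherwise the payload would be lost after one step.

```python
import random

def inc(p): return p + 1
def add_two(p): return p + 2
def push_one(p): return p + [1]   # returns a new list; p.append(1) would return None
def push_two(p): return p + [2]

# Node functions grouped by the payload type they accept.
FUNCTIONS = {
    int: [inc, add_two],
    list: [push_one, push_two],
}

def random_route(p, length, rng=random):
    """Pick `length` nodes at random, always from the set matching type(p)."""
    route = []
    for _ in range(length):
        fn = rng.choice(FUNCTIONS[type(p)])
        p = fn(p)
        route.append(fn)
    return p, route

result, route = random_route(100, 3, random.Random(0))
print(result, [fn.__name__ for fn in route])
```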

I'm somewhat confused as to what you're trying to do, but: p cannot "know about" the functions until it is run through them. By design, Python functions don't specify what type of data they operate on: e.g. a*5 is valid whether a is a string, a list, an integer or a float.
If there are some functions that might not be able to operate on p, then you could catch exceptions, for example in your link function:
def link(p, r):
    try:
        for x in r:
            p = x(p)
    except (ZeroDivisionError, AttributeError): # List whatever errors you want to catch
        return None
    return p

Where do you use generators feature in your python code?

I have studied the generators feature and I think I understand it, but I would like to know where I could apply it in my code.
I have in mind the following example, which I read in the "Python Essential Reference" book:
# tail -f
import time
def tail(f):
    f.seek(0,2)
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line
Do you have any other effective example where generators are the best tool for the job, like tail -f?
How often do you use generators, and in which kind of functionality or part of a program do you usually apply them?

I use them a lot when I implement scanners (tokenizers) or when I iterate over data containers.
Edit: here is a demo tokenizer I used for a C++ syntax highlight program:
whitespace = ' \t\r\n'
operators = '~!%^&*()-+=[]{};:\'"/?.,<>\\|'
def scan(s):
    "returns a token and a state/token id"
    words = {0:'', 1:'', 2:''} # normal, operator, whitespace
    state = 2 # I pick ws as first state
    for c in s:
        if c in operators:
            if state != 1:
                yield (words[state], state)
                words[state] = ''
            state = 1
            words[state] += c
        elif c in whitespace:
            if state != 2:
                yield (words[state], state)
                words[state] = ''
            state = 2
            words[state] += c
        else:
            if state != 0:
                yield (words[state], state)
                words[state] = ''
            state = 0
            words[state] += c
    yield (words[state], state)
Usage example:
>>> it = scan('foo(); i++')
>>> it.next()
('', 2)
>>> it.next()
('foo', 0)
>>> it.next()
('();', 1)
>>> it.next()
(' ', 2)
>>> it.next()
('i', 0)
>>> it.next()
('++', 1)
>>>

Whenever your code would either generate an unlimited number of values or more generally if too much memory would be consumed by generating the whole list at first.
Or if it is likely that you don't iterate over the whole generated list (and the list is very large). I mean there is no point in generating every value first (and waiting for the generation) if it is not used.
My latest encounter with generators was when I implemented a linear recurrent sequence (LRS) like e.g. the Fibonacci sequence.
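For instance, the Fibonacci case can be written as an endless generator, with the caller deciding how many values to take. This is a minimal sketch; the name fibonacci is just illustrative.

```python
import itertools

def fibonacci():
    """Yield the Fibonacci numbers forever: 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Only the ten requested values are ever computed.
first_ten = list(itertools.islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```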

In all cases where I have algorithms that read anything, I use generators exclusively.
Why?
Layering in filtering, mapping and reduction rules is so much easier in a context of multiple generators.
Example:
def discard_blank( source ):
    for line in source:
        if len(line) == 0:
            continue
        yield line
def clean_end( source ):
    for line in source:
        yield line.rstrip()
def split_fields( source ):
    for line in source:
        yield line.split()
def convert_pos( tuple_source, position ):
    for line in tuple_source:
        yield line[:position]+[int(line[position])]+line[position+1:]
with open('somefile','r') as source:
    data= convert_pos( split_fields( discard_blank( clean_end( source ) ) ), 0 )
    total= 0
    for l in data:
        print(l)
        total += l[0]
    print(total)
My preference is to use many small generators so that a small change is not disruptive to the entire process chain.

In general, to separate data acquisition (which might be complicated) from consumption. In particular:
to concatenate the results of several b-tree queries: the db part generates and executes the queries, yielding records from each one, while the consumer only sees single data items arriving.
buffering (read-ahead): the generator fetches data in blocks and yields single elements from each block. Again, the consumer is separated from the gory details.
Generators can also work as coroutines. You can pass data into them using nextval = g.send(data) on the 'consumer' side and data = yield nextval on the generator side. In this case the generator and its consumer 'swap' values. You can even make yield throw an exception within the generator context: g.throw(exc) does that.
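A minimal sketch of that value swap, using a hypothetical running_average coroutine: send() passes a value in and receives the new average back, and throw() raises an exception at the paused yield, which is used here (as an arbitrary convention for this example) to reset the state.

```python
def running_average():
    """Coroutine: receives numbers via send(), yields the average so far."""
    total, count, average = 0.0, 0, None
    while True:
        try:
            value = yield average
        except ValueError:
            # A ValueError thrown in from outside resets the statistics.
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                          # prime the coroutine up to its first yield
first = avg.send(10)               # 10.0
second = avg.send(20)              # 15.0
after_reset = avg.throw(ValueError)  # None, state has been cleared
third = avg.send(4)                # 4.0
print(first, second, after_reset, third)
```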
