Running Time: Does Print Count as a Step? - python

Suppose I have a function and I want to analyze the run-time of the exact number of steps:
def function(L):
print ("Hello")
i = 0 # 1 step
while i < len(L): # 3 steps
print (L[i] + 1)
i += 2 # 2 steps
print ("Goodbye")
I was wondering if print statements count as a step?

"Step" isn't a well-defined term in this context -- as pswaminathan points out, usually what we care about isn't a precise numerical value but the mathematical behavior in a broader sense: as the problem size gets bigger (in this case, if you increase the size of the input L) does the execution time stay the same? Grow linearly? Quadratically? Exponentially?
Explicitly counting steps can be a part of that analysis -- it can be a good way to get an intuitive handle on an algorithm. But we don't have a consistent way of determining what counts as a "step" and what doesn't. You could be looking at lines of code -- but in Python, for example, a list comprehension can express a long, complicated loop in a single line. Or you could be counting CPU instructions, but in a higher-level language full of abstractions that's hopelessly difficult. So you pick a heuristic -- in straightforward code like your example, "every execution of a single line of code is a step" is a decent rule. And in that case, you'd certainly count the print statements. If you wanted to get in a little deeper, you could look at the bytecode as tobias_k suggests, to get a sense for what the instructions look like behind Python's syntax.
But there's no single agreed-upon rule. You mention it's for homework; in that case, only your instructor knows what definition they want you to use. That said, the simple answer to your question is most likely "yes, the print statements count."

If your task is to count the exact number of steps, then yes, print would count as a step. But note also that your second print is at least three steps long: list access, addition, and print.
In fact, print (and other 'atomic' statements) might actually be worth many "steps", depending on how you interpret step, e.g., CPU cycles, etc. This may be overkill for your assignment, but to be accurate, it might be worth having a look at the generated byte code. Try this:
import dis
print dis.dis(function)
This will give you the full list of more-or-less atomic steps in your function, e.g., loading a function, passing arguments to that function, popping elements from the stack, etc. According to this, even your first print is worth three steps (in Python 2.6):
2 0 LOAD_CONST 1 ('Hello')
3 PRINT_ITEM
4 PRINT_NEWLINE
How to interpret this: The first number (2) is the line number (i.e., all the following instructions are for that line alone); the central numbers (0) are jump labels (used by, e.g., if-statements and loops), followed by the actual instructions (LOAD_CONST); the right column holds the arguments for those instructions ('Hello'). For more details, see this answer.

Related

How a for in loop in python ends when there is no update statement in it? [duplicate]

This question already has an answer here:
How does the Python for loop actually work?
(1 answer)
Closed 3 months ago.
For example:
#1
val = 5
for i in range(val) :
print(i)
When the range is exhausted i.e. last value reached how python knows for in loop ends . As in other languages
#2
for(i=0;i<=5;i++){
print(i)
}
As in this exp. when i's values becomes larger than 5 false condition leads to termination of loop .
I tried reading docs of python and browsed over google but no satisfying answer. So unable to get a picture of this .
So this is actually a complicated question, but the very rough version of the answer is "the compiler/interpreter can do what it wants".
It isn't actually running the human-readable text you write at all - instead it goes through a whole pipeline of transformations. At minimum, a lexer converts the text to a sequence of symbols, and then a parser turns that into a tree of language constructs; that may then be compiled into machine code or interpreted by a virtual machine.
So, the python interpreter creates a structure that handles the underlying logic. Depending on the optimizations performed (those are really a black box, it's hard to say what they do), this may be producing structures logically equivalent to what a Java-like for loop would make, or it could actually create a data structure of numbers (that's what the range() function does on its own) and then iterate over them.
Editing to give some more foundation for what this construct even means:
Python iteration-style loops are different in how they're defined from C-style i++ sorts of loops. Python loops are intended to iterate on each element of a list or other sequence data structure - you can say, for instance, for name in listOfNames, and then use name in the following block.
When you say for i in range(x), this is the pythonic way of doing something like the C-style loop. Think of it as the reverse of
for(int i = 0; i < arr.length(); i++){
foo(arr[i[)
}
In that code block you're accessing each element of an indexible sequence arr by going through each valid index. You don't actually care about i - it's just a means to an end, a way to make sure you visit each element.
Python assumes that's what you're trying to do: the python variant is
for elem in arr:
foo(elem)
Which most people would agree is simpler, clearer and more elegant.
However, there are times when you actually do want to explicitly go number by number. To do that with a python style, you create a list of all the numbers you'll want to visit - that's what the range function does. You'll mostly see it as part of a loop statement, but it can exist independently - you can say x = range(10), and x will hold a list that consists of the numbers 0-9 inclusive.
So, where before you were incrementing a number to visit each item of a list, now you're taking a list of numbers to get incrementing values.
"How it does this" is still explanation I gave above - the parser and interpreter know how to create the nitty-gritty logic that actually creates this sequence and step through it, or possibly transform it into some logically equivalent steps.

I know the length and restricted character set of some CRC32 hashes. Does this make it easier to reverse them?

We have a bunch of CRC32 hashes which would be really nice to know the input of. Some of them are short enough that brute-forcing them is feasible; others are not. The CRC32 algorithm being used is equal to the one in Python's binascii (its implementation is spelled out at https://rosettacode.org/wiki/CRC-32#Python).
I've read these pages:
http://www.danielvik.com/2010/10/calculating-reverse-crc.html
http://www.danielvik.com/2013/07/rewinding-crc-calculating-crc-backwards.html
...and it seems to me that there's something hidden somewhere that can reduce the amount of effect to reverse these things that I just can't puzzle out.
The main reason I think we can do better than full brute force is that we know two things about the hash inputs:
We know the length of every input. I believe this means that as soon as we find a guess that equals length and has a reversed CRC value of 0, that must be correct (might not help much though). Maybe there's some other property of length caused by the algorithm that could cut down on effort that I don't see.
We also know that every input has a restricted character set of [A-Za-z_0-9] (only letters, numbers, and underscores). In addition, we know that numbers are rare, and A-Z do not appear to ever be mixed with a-z, so we can often get away with just [a-z_] in 90% of cases.
Also, the majority of these are snake_case English words/phrases (e.g. "weight" or "turn_around"). So we can also filter out any guesses that contain "qxp" or such.
Since the above links discuss how you can add any 4 chars to an input and leave the hash unchanged, one idea I thought of is to brute force the last 4 chars (most of which are invalid because they're not in the restricted charset), filter out the ones that are clearly not right (because of illegal 3-letter English combos and such), and come up with a "short"list of potentially valid last 4 chars. Then we repeat until the whole thing has been whittled down, which should be faster than pure brute force. But I can't find the leap in logic to figure out how.
If I'm going off in the wrong direction to solve this, that would be good to know as well.
the majority of these are snake_case English words/phrases (e.g. "weight" or "turn_around") - These ones could be brute-forced by using dictionary (e.g from this question) and utilities. Assuming total amount of English words up to 1M, trying (1M)^2 CRC32 looks feasible and quite fast.
Given a text file with all dictionary words, enumerating all word_word and comparing with CRC hashes could be done with e.g. Hashcat tool with instruction from here as:
hashcat64.bin -m 11500 -a 1 -j '$_' hashes.txt dictionary.txt dictionary.txt
and just testing against each word in dictionary as:
hashcat64.bin -m 11500 -a 0 hashes.txt dictionary.txt
For phrases longer than 2 words, each phrase length would be an individual case as e.g. Hashcat has no option to permutate 3 or more dictionaries (ref) . For 3-words phrases you need to generate a file with 2-words combination first (e.g. as here but in form of {}_{} ). Then combine it with 1-word dictionary: hashcat64.bin -m 11500 -a 1 -j '$_' hashes.txt two_words_dictionary.txt dictionary.txt . Next, 4-words phrases could be brute forced as: hashcat64.bin -m 11500 -a 1 -j '$_' hashes.txt two_words_dictionary.txt two_words_dictionary.txt , etc. (Another option would be to pipe combinations to Hashcat as combine_scrip ... | hashcat64.bin -m 11500 -a 0 hashes.txt but as CRC32 is very fast to check, pipe would be a bottleneck here, using dictionary files would be much faster than piping)
Of course n-words permutation increases complexity exponentially over n (with huge base). But as dictionary is limited to some subset instead of all English words, it depends on the size of dictionary how deep is practical to go with brute force.
Let me take another direction on the question and talk about the nature of CRC.
As you know Cyclic Redundancy Check is something calculated by dividing the message (considered as a polynomial in GF(2)) by some primitive value, and it is by nature linear (a concept borrowed from coding theory).
Let me explain what I mean by linear.
Let's assume we have three messages A, B, C.
then if we have
CRC(A) = CRC(B)
then we can say
CRC(A^C)=CRC(B^C)
(meaning that CRC will be changed based on XOR).
Note that CRC is not a hash and its behaviour can be predicted.
So you don't need a complicated tool like hashcat if your space is too big.
So theoretically you can find the linear space that CRC(x) = b, by setting
x = b0 + nullspace.
b0 is some string that satisfies
CRC(b0) = expectedCRC.
(Another note, because these systems usually have the initial condition and final xor implemented in them and it means that CRC(0)!=0).
Then you can reorder the nullspace to be localized.
And then knowing your space to contain only ASCII characters and conditional sequence of characters you can search your space easily.
knowing that your space is pow(2,32) and your possible input is about pow(10,12) I would say there are too many texts that map into the same CRC.

Does Python automatically optimize/cache function calls?

I'm relatively new to Python, and I keep seeing examples like:
def max_wordnum(texts):
count = 0
for text in texts:
if len(text.split()) > count:
count = len(text.split())
return count
Is the repeated len(text.split()) somehow optimized away by the interpreter/compiler in Python, or will this just take twice the CPU cycles of storing len(text.split()) in a variable?
Duplicate expressions are not "somehow optimized away". Use a local variable to capture and re-use a result that is 'known not to change' and 'takes some not-insignificant time' to create; or where using a variable increases clarity.
In this case, it's impossible for Python to know that 'text.split()' is pure - a pure function is one with no side-effects and always returns the same value for the given input.
Trivially: Python, being a dynamically-typed language, doesn't even know the type of 'text' before it actually gets a value, so generalized optimization of this kind is not possible. (Some classes may provide their own internal 'cache optimizations', but digressing..)
As: even a language like C#, with static typing, won't/can't optimize away general method calls - as, again, there is no basic enforceable guarantee of purity in C#. (ie. What if the method returned a different value on the second call or wrote to the console?)
But: a Haskell, a Purely Functional language, has the option to not 'evaluate' the call twice, being a different language with different rules...
Even if python did optimize this (which isn't the case), the code is copy/paste all over and more difficult to maintain, so creating a variable to hold the result of a complex computation is always a good idea.
A better idea yet is to use max with a key function in this case:
return max(len(text.split()) for text in texts)
this is also faster.
Also note that len(text.split()) creates a list and you just count the items. A better way would be to count the spaces (if words are separated by only one space) by doing
return max(text.count(" ") for text in texts) + 1
if there can be more than 1 space, use regex and finditer to avoid creating lists:
return max(sum(1 for _ in re.finditer("\s+",text)) for text in texts) + 1
note the 1 value added in the end to correct the value (number of separators is one less than the number of words)
As an aside, even if the value isn't cached, you still can use complex expressions in loops with range:
for i in range(len(text.split())):
the range object is created at the start, and the expression is only evaluated once (as opposed as C loops for instance)

Python- stuck trying to create a "free hand" calculator

I'm trying to create a calculator program in which the user can type an equation and get an answer. I don't want the full code for this, I just need help with a specific part.
The approach I am trying to take is to have the user input the equation as a string (raw_input) and then I am trying to convert the numbers from their input to integers. After that I need to know how I can get the operands to do what I want them to do depending on which operand the user uses and where it is in the equation.
What are some methods I might use to accomplish this task?
Here is basically what I have right now:
equation_number = raw_input("\nEnter your equation now: ")
[int(d) for d in equation_number if d.isdigit()]
Those lines are just for collecting input and attempting to convert the numbers into integers. Unfortunately, it does not seem to be working very well and .isdigit will only work for positive numbers anyway.
Edit- aong152 mentioned recursive parsing, which I looked into, and it appears to have desirable results:
http://blog.erezsh.com/how-to-write-a-calculator-in-70-python-lines-by-writing-a-recursive-descent-parser/
However, I do not understand the code that the author of this post is using, could anyone familiarize me with the basics of recursive parsing?
The type of program you are trying to make is probably more complicated than you think
The first step would be separating the string into each argument.
Let's say that the user inputs:
1+2.0+3+4
Before you can even convert to ints, you are going to need to split the string up into its components:
1
+
2.0
+
3
+
4
This will require a recursive parser, which (seeing as you are new to python) maybe be a bit of a hurdle.
Assuming that you now have each part seperately as strings,
float("2.0") = 2.0
int(2.0) = 2
Here is a helper function
def num (s):
try:
return int(s)
except exceptions.ValueError:
return int(float(s))
instead of raw_input just use input because raw_input returns a string and input returns ints
This is a very simple calculator:
def calculate():
x = input("Equation: ")
print x
while True:
calculate()
the function takes the input and prints it then the while loop executes it again
im not sure if this is what you want but here you go and also you should make a way to end the loop
After using raw_input() you can use eval() on the result to compute the value of this string. eval() evaluates any valid Python expression and returns the outcome.
But I think this is not to your liking. You probably want to do more by yourself.
So I think you should have a look at the re module to split the input using regular expressions into tokens (sth like numbers and operators). After this you should write a parser which gets the token stream as input. You should decide whether this parser shall just return the computed value (e. g. a number) or maybe an abstract syntax tree, i. e. a data structure which represents the expression in an object-oriented (instead of character-oriented) way. Such an Absy could then be evaluated to get the final result.
Are you familiar with regular expressions? If not, it's probably a good idea to first learn about them. They are the weak, non-recursive cousin of parsing. Don't go deep, just understand the building blocks — A then B, A many times, A or B.
The blog post you found is hard because it implements the parsing by hand. It's using recursive descent, which is the only way to write a parser by hand and keep your sanity, but it's still tricky.
What people do most of the time is only write a high level grammar and use a library (or code generator) to do the hard work of parsing.
Indeed he had an earlier post where he uses a library:
http://blog.erezsh.com/how-to-write-a-calculator-in-50-python-lines-without-eval/
At least the beginning should be very easy. Things to pay attention to:
How precedence arises from the structure of the grammar — add consists of muls, not vice versa.
The moment he adds a rule for parentheses:
atom: neg | number | '(' add ')';
This is where it really becomes recursive!
6-2-1 should parse as (6-2)-1, not 6-(2-1). He doesn't discuss it, but if you look
carefully, it also arises from the structure of the grammar. Don't waste tome on this; just know for future reference that this is called associativity.
The result of parsing is a tree. You can then compute its value in a bottom-up manner.
In the "Calculating!" chapter he does that, but the in a sort of magic way.
Don't worry about that.
To build a calculator yourself, I suggest you strip the problem as much as possible.
Recognizing where numbers end etc. is a bit messy. It could be part of the grammar, or done by a separate pass called lexer or tokenizer.
I suggest you skip it — require the user to type spaces around all operators and parens. Or just assume you're already given a list of the form [2.0, "*", "(", 3.0, "+", -1.0, ")"].
Start with a trivial parser(tokens) function that only handles 3-element expressions — [number, op, number].
Return a single number, the result of the computation. (I previously said parsers output a tree which is processed later. Don't worry about that, returning a number is simpler.)
Write a function that expects either a number or parentheses — in the later case it calls parser().
>>> number_or_expr([1.0, "rest..."])
(1.0, ["rest..."])
>>> number_or_expr(["(", 2.0, "+", 2.0, ")", "rest..."])
(4.0, ["rest..."])
Note that I'm now returning a second value - the remaining part of the input. Change parser() to also use this convention.
Now Rewrite parser() to call number_or_expr() instead of directly assuming tokens[0] and tokens[2] are numbers.
Viola! You now have a (mutually) recursive calculator that can compute anything — it just has to be written in verbose style with parens around everything.
Now stop and admire your code, for at least a day :-) It's still simple but has the essential recursive nature of parsing. And the code structure reflects the grammar 1:1 (which is the nice property of recursive descent. You don't want to know how the other algorithms look).
From here there many improvements possible — support 2+2+2, allow (1), precedence... — but there are 2 ways to go about it:
Improve your code step by step. You'll have to refactor a lot.
Stop working hard and use a parsing library, e.g. pyparsing.
This will allow you to experiment with grammar changes faster.

How to work with very long strings in Python?

I'm tackling project euler's problem 220 (looked easy, in comparison to some of the
others - thought I'd try a higher numbered one for a change!)
So far I have:
D = "Fa"
def iterate(D,num):
for i in range (0,num):
D = D.replace("a","A")
D = D.replace("b","B")
D = D.replace("A","aRbFR")
D = D.replace("B","LFaLb")
return D
instructions = iterate("Fa",50)
print instructions
Now, this works fine for low values, but when you put it to repeat higher then you just get a "Memory error". Can anyone suggest a way to overcome this? I really want a string/file that contains instructions for the next step.
The trick is in noticing which patterns emerge as you run the string through each iteration. Try evaluating iterate(D,n) for n between 1 and 10 and see if you can spot them. Also feed the string through a function that calculates the end position and the number of steps, and look for patterns there too.
You can then use this knowledge to simplify the algorithm to something that doesn't use these strings at all.
Python strings are not going to be the answer to this one. Strings are stored as immutable arrays, so each one of those replacements creates an entirely new string in memory. Not to mention, the set of instructions after 10^12 steps will be at least 1TB in size if you store them as characters (and that's with some minor compressions).
Ideally, there should be a way to mathematically (hint, there is) generate the answer on the fly, so that you never need to store the sequence.
Just use the string as a guide to determine a method which creates your path.
If you think about how many "a" and "b" characters there are in D(0), D(1), etc, you'll see that the string gets very long very quickly. Calculate how many characters there are in D(50), and then maybe think again about where you would store that much data. I make it 4.5*10^15 characters, which is 4500 TB at one byte per char.
Come to think of it, you don't have to calculate - the problem tells you there are 10^12 steps at least, which is a terabyte of data at one byte per character, or quarter of that if you use tricks to get down to 2 bits per character. I think this would cause problems with the one-minute time limit on any kind of storage medium I have access to :-)
Since you can't materialize the string, you must generate it. If you yield the individual characters instead of returning the whole string, you might get it to work.
def repl220( string ):
for c in string:
if c == 'a': yield "aRbFR"
elif c == 'b': yield "LFaLb"
else yield c
Something like that will do replacement without creating a new string.
Now, of course, you need to call it recursively, and to the appropriate depth. So, each yield isn't just a yield, it's something a bit more complex.
Trying not to solve this for you, so I'll leave it at that.
Just as a word of warning be careful when using the replace() function. If your strings are very large (in my case ~ 5e6 chars) the replace function would return a subset of the string (around ~ 4e6 chars) without throwing any errors.
You could treat D as a byte stream file.
Something like:-
seedfile = open('D1.txt', 'w');
seedfile.write("Fa");
seedfile.close();
n = 0
while (n
warning totally untested

Categories