Python sys.stdout.write loop problems - python

I'm not sure of what's going on here, but I have some python code:
import sys
max_cols = 350
max_rows = 1
r1 = range(max_rows)
r2 = range(max_cols)
for y in r1:
    for x in r2:
        sys.stdout.write('something')
        if x is not max_cols-1:
            sys.stdout.write(',')
Now, this works fine for values of max_cols <= 257.
However, if you use >= 258, you end up with an extra ',' at the end.
(The idea here is obviously to generate a CSV file.)
Now, 256 is a CS number, so there's clearly something going on here that I'm unaware of, since everything works perfectly up until that point. This also happens when I try to write to a file using the same pattern.
Why does this happen?
Using Python 3.2.

The is operator is not for checking equality but for checking identity. x is y is only true if both variables refer to the same object. As it happens, CPython reuses objects for small integers (-5 through 256), which is why the comparison happens to work for small values, but in general identity is a very different concept from equality. Use the correct operators, == and != for equality and inequality respectively, and it works.
Also note that the code can be made much simpler and more robust by just using the csv module. No need to reinvent the wheel.
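As a rough illustration of the csv suggestion, here is a minimal sketch. The io.StringIO buffer is only a stand-in chosen for this example (an open file or sys.stdout would work the same way), and max_cols is reduced to 5 so the output is easy to read.

```python
import csv
import io

max_cols = 5   # small value so the result is easy to inspect
max_rows = 1

buffer = io.StringIO()          # stand-in for sys.stdout or an open file
writer = csv.writer(buffer)
for _ in range(max_rows):
    # writerow() places separators between fields only, never after the last one
    writer.writerow(['something'] * max_cols)

csv_text = buffer.getvalue()
print(csv_text, end='')
```

The writer also takes care of quoting fields that contain commas or quotes, which the hand-written loop does not.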

The CPython implementation caches small numbers, so all instances of the number 12 are the same object. The is operator compares the identities of objects, not their values. What you wanted to do was use the != operator to compare the values.
It's likely that your instance of the CPython implementation caches numbers up to 256.
Incidentally, whenever you bump into a pattern like this, where you have to drop the last separator from a list of delimited things, str.join is probably what you wanted.
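A minimal sketch of the str.join approach for the row in the question; make_row is a hypothetical helper name used only for this example.

```python
def make_row(value, max_cols):
    # join() puts the separator between items only, so there is no trailing comma
    return ','.join(value for _ in range(max_cols))

row = make_row('something', 350)
print(row.count(','))  # 349 separators for 350 fields
```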

Related

How is the string.join(str_list, '') implemented under the hood in Python?

I know that concatenating two strings using the += operator makes a new copy of the old string and then concatenates the new string to that, resulting in quadratic time complexity.
This answer gives a nice time comparison between the += operation and string.join(str_list, ''). It looks like the join() method runs in linear time (correct me if I am wrong). Out of curiosity, I wanted to know how the string.join(str_list, '') method is implemented in Python since strings are immutable objects?
It's implemented in C, so Python-level immutability is less important. You can find the relevant source in unicodeobject.c.
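To give an intuition for why a join can run in linear time, here is a conceptual sketch in Python. It is not CPython's actual code: it assumes the general two-pass idea (measure the total size first, allocate once, then copy each piece into place), and it works on bytes because a bytearray can be filled in place. The name join_sketch is made up for this example.

```python
def join_sketch(sep, parts):
    """Conceptual two-pass join over bytes; not CPython's actual implementation."""
    parts = list(parts)
    if not parts:
        return b''
    # Pass 1: compute the final size so the buffer is allocated exactly once.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)
    out = bytearray(total)
    # Pass 2: copy every piece into its final position.
    pos = 0
    for i, p in enumerate(parts):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(out)

print(join_sketch(b',', [b'a', b'bc', b'def']))
```

Each byte is copied once, whereas repeated += copies the accumulated prefix again on every step.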

Python "zfill()" equivalent in OCaml

I'm a beginner and I have to learn Ocaml for scientific programming. I just have one question:
Is there an equivalent of Python's .zfill() method in Ocaml to make leading zeros appear in a string?
Strings in OCaml are immutable. That means you're not supposed to modify a string but to create a new one.
There is no zfill function in the standard library, but you can easily write one like this:
let zfill s width =
  let to_fill = width - (String.length s) in
  if to_fill <= 0 then s
  else (String.make to_fill '0') ^ s
I don't think there's one.
You can do it easily with built-in functions when you're working with numbers. For instance, to print the number 142857 with leading zeros over 30 characters, use Printf.printf "%030d" 142857.
You can also make it work with strings if you're fine with using leading spaces instead of leading zeros. For instance, Printf.printf "%30s" "abcdefg".
Finally, if you have to, you can define your own function.
The first two options work by using Printf, which is an extremely useful tool you really should learn at some point. Here is its documentation for OCaml, but a lot of programming languages have a similar tool.
In %030d, we started from %d which is a placeholder that will be replaced by an integer (in our case, 142857). We fixed its minimum width to 30 (right-aligned by default) by adding 30 between the two characters: %30d. Finally, we added the option to make the leading characters zeros instead of spaces by adding a 0 after the percent sign.
%30s is just a placeholder for a right-aligned string of at least 30 characters (with leading spaces, because the option for leading zeros only works with numbers).
Now here's a zfill function if for some reason you can't use a well-chosen Printf format in your scenario:
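(* Operates on Bytes.t values; use Bytes.of_string / Bytes.to_string to convert if you start from a string. *)
let zfill n s =
  let length = Bytes.length s in
  if n <= length then
    s
  else
    let result = Bytes.make n '0' in
    Bytes.blit s 0 result (n-length) length;
    result
;;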
Note that if performance is an issue (though it probably isn't), this should be faster than building a string of zeros and concatenating it with s: blit copies into the already-allocated result, whereas concatenation first needs a temporary string of zeros. In most scenarios it won't matter much, and you can use either option.

Comparison of strings in Python [duplicate]

I noticed a Python script I was writing was acting squirrelly, and traced it to an infinite loop, where the loop condition was while line is not ''. Running through it in the debugger, it turned out that line was in fact ''. When I changed it to !='' rather than is not '', it worked fine.
Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values? I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.
"For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True."
Not always. NaN is a counterexample. But usually, identity (is) implies equality (==). The converse is not true: Two distinct objects can have the same value.
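A short sketch of both directions, using only built-in types. The variable names are chosen for this example.

```python
nan = float('nan')
same_object = nan is nan     # True: one object compared with itself
equal_value = nan == nan     # False: NaN never compares equal, even to itself

a = [1, 2, 3]
b = [1, 2, 3]
lists_equal = a == b         # True: same contents
lists_identical = a is b     # False: two distinct list objects

print(same_object, equal_value, lists_equal, lists_identical)
```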
"Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values?"
You use == when comparing values and is when comparing identities.
When comparing ints (or immutable types in general), you pretty much always want the former. There's an optimization that allows small integers to be compared with is, but don't rely on it.
For boolean values, you shouldn't be doing comparisons at all. Instead of:
if x == True:
    # do something
write:
if x:
    # do something
For comparing against None, is None is preferred over == None.
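One reason for that preference can be shown with a small sketch: == can be overridden by a class, while is cannot. AlwaysEqual is a hypothetical class invented for this example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
eq_result = (obj == None)   # True: decided by the overridden __eq__
is_result = (obj is None)   # False: identity cannot be overridden

print(eq_result, is_result)
```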
"I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id."
Yes, that's exactly what it's for.
I would like to show a small example of how is and == behave with immutable types. Try this:
>>> a = 19998989890
>>> b = 19998989889 + 1
>>> a is b
False
>>> a == b
True
is checks whether two names refer to the same object in memory; == compares their values. For example, you can see that small integers are cached by Python:
>>> c = 1
>>> b = 1
>>> b is c
True
You should use == when comparing values and is when comparing identities. (Also, from an English point of view, "equals" is different from "is".)
The logic is not flawed. The statement
if x is y then x==y is also True
should never be read to mean
if x==y then x is y
It is a logical error on the part of the reader to assume that the converse of a logic statement is true. See http://en.wikipedia.org/wiki/Converse_(logic)
See This question
Your logic in reading "For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True." is slightly flawed.
If is applies then == will be True, but it does NOT apply in reverse. == may yield True while is yields False.

Python - use lists instead of strings?

From an S.O answer:
"Don't modify strings.
Work with them as lists; turn them into strings only when needed.
... code sample ...
Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
Is this considered best practice?
I find it a bit odd that Python has methods that return new modified strings (such as upper(), title(), replace() etc.) but doesn't have an insert method that returns a new string. Have I missed such a method?
Edit: I'm trying to rename files by inserting a character:
import os
for i in os.listdir('.'):
    i.insert(3, '_')
Which doesn't work due to immutability. Adding to the beginning of a string works fine though:
for i in os.listdir('.'):
    os.rename(i, 'some_random_string' + i)
Edit2: the solution:
>>> for i in os.listdir('.'):
...     os.rename(i, i[:4] + '_' + i[4:])
Slicing certainly is nice and solves my problem, but is there a logical explanation why there is no insert() method that returns a new string?
Thanks for the help.
If you want to insert at a particular spot, you can use slices and +. For example:
a = "hello"
b = a[:2] + '_S1M0N_' + a[2:]
then b will be equal to he_S1M0N_llo.
It's at least arguably a best practice if you are doing a very large number of modifications to a string. It is not a general purpose best practice. It's simply a useful technique for solving performance problems when doing heavy string manipulation.
My advice is, don't do it until performance becomes an issue.
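For the heavy-manipulation case, a minimal sketch of the "work with a list, join once" technique might look like this. insert_many is a hypothetical helper, and the positions are assumed to refer to indices in the original string.

```python
def insert_many(text, insertions):
    """Insert several (position, piece) pairs via a list, joining once at the end."""
    chars = list(text)
    # Apply from right to left so earlier positions are not shifted by later inserts.
    for pos, piece in sorted(insertions, reverse=True):
        chars[pos:pos] = piece   # slice assignment inserts the characters of piece
    return ''.join(chars)

print(insert_many('hello', [(2, '_'), (4, '-')]))  # he_ll-o
```

For a single insertion, plain slicing and + is simpler and just as good.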
You can define a generic function that works on any sequence (strings, lists, tuples, etc.) using the slice syntax:
def insert(s, c, p):
    return s[:p] + c + s[p:]
insert('FILE1', '_', 4)
> 'FILE_1'
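The same function can be tried on other sequence types, with one caveat worth stating: the inserted value has to be of the same type as the sequence (a list into a list, a tuple into a tuple), since + does not mix types. A short sketch:

```python
def insert(s, c, p):
    # Build a new sequence: everything before p, the new piece, everything from p on
    return s[:p] + c + s[p:]

as_string = insert('FILE1', '_', 4)        # 'FILE_1'
as_list = insert([1, 2, 4], [3], 2)        # [1, 2, 3, 4]
as_tuple = insert((1, 2, 4), (3,), 2)      # (1, 2, 3, 4)

print(as_string, as_list, as_tuple)
```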

How to work with very long strings in Python?

I'm tackling Project Euler's problem 220 (it looked easy in comparison to some of the others, and I thought I'd try a higher-numbered one for a change!)
So far I have:
D = "Fa"
def iterate(D, num):
    for i in range(0, num):
        D = D.replace("a", "A")
        D = D.replace("b", "B")
        D = D.replace("A", "aRbFR")
        D = D.replace("B", "LFaLb")
    return D
instructions = iterate("Fa", 50)
print(instructions)
Now, this works fine for low values, but when you ask for more iterations you just get a "Memory error". Can anyone suggest a way to overcome this? I really want a string/file that contains instructions for the next step.
The trick is in noticing which patterns emerge as you run the string through each iteration. Try evaluating iterate(D,n) for n between 1 and 10 and see if you can spot them. Also feed the string through a function that calculates the end position and the number of steps, and look for patterns there too.
You can then use this knowledge to simplify the algorithm to something that doesn't use these strings at all.
Python strings are not going to be the answer to this one. Strings are stored as immutable arrays, so each one of those replacements creates an entirely new string in memory. Not to mention, the set of instructions after 10^12 steps will be at least 1TB in size if you store them as characters (and that's with some minor compressions).
Ideally, there should be a way to mathematically (hint, there is) generate the answer on the fly, so that you never need to store the sequence.
Just use the string as a guide to determine a method which creates your path.
If you think about how many "a" and "b" characters there are in D(0), D(1), etc, you'll see that the string gets very long very quickly. Calculate how many characters there are in D(50), and then maybe think again about where you would store that much data. I make it 4.5*10^15 characters, which is 4500 TB at one byte per char.
Come to think of it, you don't have to calculate - the problem tells you there are 10^12 steps at least, which is a terabyte of data at one byte per character, or quarter of that if you use tricks to get down to 2 bits per character. I think this would cause problems with the one-minute time limit on any kind of storage medium I have access to :-)
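The size estimate can be checked without building the string, by tracking counts only. This sketch assumes the replacement rules from the question (a becomes aRbFR, b becomes LFaLb); length_of_D is a name made up for this example.

```python
def length_of_D(n):
    """Length of D(n), computed from counts instead of by building the string."""
    letters = 1   # number of 'a'/'b' characters; D(0) = "Fa" has one
    total = 2     # total characters in D(0)
    for _ in range(n):
        total += 4 * letters   # each 'a' or 'b' becomes 5 characters: a net gain of 4
        letters *= 2           # each replacement contains exactly one 'a' and one 'b'
    return total

print(length_of_D(50))  # 4503599627370494, about 4.5 * 10**15
```

In closed form this is 2**(n + 2) - 2, which agrees with the 4.5*10^15 figure quoted above for n = 50.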
Since you can't materialize the string, you must generate it. If you yield the individual characters instead of returning the whole string, you might get it to work.
def repl220( string ):
    for c in string:
        if c == 'a': yield "aRbFR"
        elif c == 'b': yield "LFaLb"
        else: yield c
Something like that will do replacement without creating a new string.
Now, of course, you need to call it recursively, and to the appropriate depth. So, each yield isn't just a yield, it's something a bit more complex.
Trying not to solve this for you, so I'll leave it at that.
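Purely to illustrate the recursive-generator idea (it is not a solution, since walking 10^12 steps character by character is still far too slow), a sketch could look like this. expand is a hypothetical name, and yield from requires Python 3.3 or later.

```python
from itertools import islice

def expand(s, depth):
    """Lazily yield the characters of s after `depth` rounds of replacement."""
    for c in s:
        if depth > 0 and c == 'a':
            yield from expand("aRbFR", depth - 1)
        elif depth > 0 and c == 'b':
            yield from expand("LFaLb", depth - 1)
        else:
            yield c

# Small depths can be materialized and compared with the replace() version.
print(''.join(expand("Fa", 2)))              # FaRbFRRLFaLbFR
# At depth 50 only the characters actually consumed are ever produced.
print(''.join(islice(expand("Fa", 50), 6)))  # FaRbFR
```

Memory use stays proportional to the recursion depth rather than to the length of the expanded string.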
Just as a word of warning, be careful when using the replace() function. With very large strings (in my case ~5e6 chars), replace returned only a subset of the string (~4e6 chars) without raising any error.
You could treat D as a byte stream file.
Something like:-
seedfile = open('D1.txt', 'w');
seedfile.write("Fa");
seedfile.close();
n = 0
while (n
warning totally untested
