Someone posted this interesting formulation, and I tried it out in a Python 3 console:
>>> (a, b) = a[b] = {}, 5
>>> a
{5: ({...}, 5)}
While there is a lot to unpack here, what I don't understand (and the semantics of interesting character formulations seems particularly hard to search for) is what the {...} means in this context? Changing the above a bit:
>>> (a, b) = a[b] = {'x':1}, 5
>>> a
{5: ({...}, 5), 'x': 1}
It is this second output that really baffles me: I would have expected the {...} to have been altered, but my nearest guess is that the , 5 implies a tuple where the first element is somehow undefined? And that is what the {...} means? If so, this is a new category of type for me in Python, and I'd like to have a name for it so I can learn more.
It's an indication that the dict recurses, i.e. contains itself. A much simpler example:
>>> a = []
>>> a.append(a)
>>> a
[[...]]
This is a list whose only element is itself. Obviously the repr can't be printed literally, or it would be infinitely long; instead, the builtin types notice when this has happened and use ... to indicate self-containment.
So it's not a special type of value, just the normal English use of "..." to mean "something was omitted here", plus braces to indicate the omitted part is a dict. You may also see it with brackets for a list, as shown above, or occasionally with parentheses for a tuple:
>>> b = [],
>>> b[0].append(b)
>>> b
([(...)],)
Python 3 provides some tools so you can do this with your own objects, in the form of reprlib.
Related
I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific python interpreter. comparing strings with is==bad idea.
Well, at least for cpython3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler is doing very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 charcters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then it is usually not the case that the object identity yields true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects reference to the same string content. So the is operator returns true for those.
This however is an implementation detail of CPython and is generally neither guaranteed for CPython nor any other implementation. So using this fact is a bad idea as it can break any other day.
To compare strings, you use the == operator which compares the equality of objects. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should be generally avoided if you do not explicitely want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
The is operator relies on the id function, which is guaranteed to be unique among simultaneously existing objects. Specifically, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that cPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running python 2.7.3 on a macbook. Others might get entirely different results.
In fact your code amounts to comparing objects id (i.e. their physical address). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But, note that if a and b were directly in the comparison it would work.
>>> id('is it the space?') == id('is it the space?')
True
In fact, in an expression there's sharing between the same static strings. But, at the program scale there's only sharing for word-like strings (so neither spaces nor punctuations).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
'is' operator compare the actual object.
c is d should also be false. My guess is that python make some optimization and in that case, it is the same object.
I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific python interpreter. comparing strings with is==bad idea.
Well, at least for cpython3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler is doing very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 charcters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then it is usually not the case that the object identity yields true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects reference to the same string content. So the is operator returns true for those.
This however is an implementation detail of CPython and is generally neither guaranteed for CPython nor any other implementation. So using this fact is a bad idea as it can break any other day.
To compare strings, you use the == operator which compares the equality of objects. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should be generally avoided if you do not explicitely want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
The is operator relies on the id function, which is guaranteed to be unique among simultaneously existing objects. Specifically, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that cPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running python 2.7.3 on a macbook. Others might get entirely different results.
In fact your code amounts to comparing objects id (i.e. their physical address). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But, note that if a and b were directly in the comparison it would work.
>>> id('is it the space?') == id('is it the space?')
True
In fact, in an expression there's sharing between the same static strings. But, at the program scale there's only sharing for word-like strings (so neither spaces nor punctuations).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
'is' operator compare the actual object.
c is d should also be false. My guess is that python make some optimization and in that case, it is the same object.
>>> sum(range(49999951,50000000))
2449998775L
Is there any possible way to avoid the L at the end of number?
You are looking at a Python literal representation of the number, which just indicates that it is a python long integer. This is normal. You do not need to worry about that L.
If you need to print such a number, the L will not normally be there.
What happens is that the Python interpreter prints the result of repr() on all return values of expressions, unless they return None, to show you what the expression did. Use print if you want to see the string result instead:
>>> sum(range(49999951,50000000))
2449998775L
>>> print sum(range(49999951,50000000))
2449998775
The L is just for you. (So you know its a long) And it is nothing to worry about.
>>> a = sum(range(49999951,50000000))
>>> a
2449998775L
>>> print a
2449998775
As you can see, the printed value (actual value) does not have the L it is only the repr (representation) that displays the L
Consult this post
How would I print a list of strings as their individual variable values?
For example, take this code:
a=1
b=2
c=3
text="abc"
splittext = text.split(text)
print(splittext)
How would I get this to output 123?
You could do this using eval, but it is very dangerous:
>>> ''.join(map(lambda x : str(eval(x)),Text))
'123'
eval (perhaps they better rename it to evil, no hard feelings, simply use it as a warning) evaluates a string as if you would have coded it there yourself. So eval('a') will fetch the value of a. The problem is that a hacker could perhaps find some trick to inject arbitrary code using this, and thus hack your server, program, etc. Furthermore by accident it can perhaps change the state of your program. So a piece of advice is "Do not use it, unless you have absolutely no other choice" (which is not the case here).
Or a less dangerous variant:
>>> ''.join(map(lambda x : str(globals()[x]),Text))
'123'
in case these are global variables (you can use locals() for local variables).
This is ugly and dangerous, because you do not know in advance what a, b and c are, neither do you have much control on what part of the program can set these variables. So it can perhaps allow code injection. As is advertised in the comments on your question, you better use a dictionary for that.
Dictionary approach
A better way to do this is using a dictionary (as #Ignacio Vazquez-Abrams was saying):
>>> dic = {'a':1,'b': 2,'c':3}
>>> ''.join(map(lambda x : str(dic[x]),Text))
'123'
List instead of string
In the above we converted the content to a string using str in the lambda-expression and used ''.join to concatenate these strings. If you are however interested in an array of "results", you can drop these constructs. For instance:
>>> map(lambda x : dic[x],Text)
[1, 2, 3]
The same works for all the above examples.
EDIT
For some reason, I later catched the fact that you want to print the valuesm, this can easily be achieved using list comprehension:
for x in Text :
print dic[x]
again you can use the same technique for the above cases.
In case you want to print out the value of the variables named in the string you can use locals (or globals, depending on what/where you want them)
>>> a=1
>>> b=2
>>> c=3
>>> s='abc'
>>> for v in s:
... print(locals()[v])
...
1
2
3
or, if you use separators in the string
>>> s='a,b,c'
>>> for v in s.split(','):
... print(locals()[v])
...
1
2
3
I am a Perl user for many years and started Python recently.
I learned that there is always "one obvious way" to do certain things. I wish to check the "one" way to translate my coding style in Perl below into Python. Thanks!
The objective is to:
Detect the existance of the pattern
If found, extract certain portion of the pattern
Perl:
if ($str =~ /my(pat1)and(pat2)/) {
my ($var1, $var2) = ($1, $2);
}
As far as I had learn for Python, below is how I am coding now. It seems to be taking more steps than Perl. That's why I have doubt about my Python code.
mySearch = re.search ( r'my(pat1)and(pat2)', str )
if mySearch:
var1 = mySearch.group(1)
var2 = mySearch.group(2)
Python doesn't prioritize pattern matching and string manipulation like perl does. These are analogous patterns, and yeah, Python's is longer (it's also got a lot of great things going for it, like the fact it's OOP and doesn't use weird magical global variables).
For the record though, you can be using tuple unpacking to make this more succinct:
var1, var2 = mySearch.groups()
Update:
Tuple unpacking
Tuple unpacking is a useful feature in Python. To understand it, let's first ask, what is a tuple. A tuple is, at its heart, an immutable sequence -- unlike with a list, you cannot append or pop or any of that stuff. Syntactically, it's very simple to declare a tuple -- it's just a few values separated by commas.
my_tuple = "I", "am", "awesome"
my_tuple[0] # "I"
my_tuple[1] # "am"
my_tuple[2] # "awesome"
People often think a tuple is in fact defined by surrounding parentheses -- my_tuple = ("I", "am", "awesome") -- but this is wrong; the parentheses are only useful insofar as they clarify or enforce a certain order of operations.
Tuple unpacking is one of the sweetest features in Python. You define a tuple data structure containing undefined names on the left, and you unpack the iterable on the right into it. The right side can contain any kind of iterable, but the shape of its contained data must exactly match the tuple structure of names on the left.
# some_var and other_var are both undefined
print some_var # NameError: some_var is undefined
print other_var # NameError: other_var is undefined
my_iterable = ["so", "cool"]
# note that 'some_var, other_var' looks a whole lot like a tuple
some_var, other_var = my_iterable
print some_var # "so"
print other_var # "cool"
Again, we don't need a list on the right but any kind of iterable -- for example, a generator:
def some_generator():
yield 1
yield 2
yield 3
a, b, c = some_generator()
print a # 1
print b # 2
print c # 3
You can even do tuple unpacking with nested data structures.
nested_list = [1, [2, 3], 4]
# note that parentheses are necessary here to delimit tuples
a, (b, c), d = nested_list
If the iterable on the right doesn't match the pattern on the left, things blow up:
# THESE EXAMPLES DON'T WORK
a, b = [1, 2, 3] # ValueError: too many values to unpack
a, b = [] # ValueError: need more than 0 values to unpack
Actually, this noisy failure makes tuple unpacking my favorite way to get an item from an iterable when I think that iterable should only have one item in it and I want my code to fail if it has more than one.
# note that the left side below is how you define a tuple of one
bank_statement, = bank_statements # we def want to blow up if too many statements
Multiple Assignment
What people think of as multiple assignment is actually just plain tuple unpacking.
a, b = 1, 2
print a # 1
print b # 2
This is nothing special. The interpreter evaluates the right hand side of the equation as a tuple -- remember, a tuple is just values (literals, variables, evaluated function calls, whatever) separated by a commas -- and then the interpreter matches it against the left side, just like it did with all the examples above.
Bringing it on home
I wrote this to explain the two different answers you were getting for this problem:
var1, var2 = mySearch.group(1), mySearch.group(2)
and
var1, var2 = mySearch.groups()
First, recognize that these two statements, for your situation -- where mySearch is a MatchObject resulting from a regex with two matching groups -- are entirely functionally equivalent.
They differ only very slightly in terms of the nature of the tuple unpacking. The first one declares a tuple on the right while the second uses the tuple returned by MatchObject.groups.
This does not really apply to your situation, but it might be useful to understand that MatchObject.group and MatchObject.groups have slightly different behavior (see here and here). MatchObject.groups returns all the 'subgroups' -- i.e. capturing groups -- that the regex encounters while MatchObject.group returns an individual group and counts the entire pattern as a group accessible at 0.
In reality, for this situation, you should use whichever of these two you think is most expressive or clearest. I personally think mentioning groups 1 and 2 on the right side is redundant and I am constantly annoyed by the fact that MatchObject.groups(0) returns the string matched by the entire pattern, thus offsetting all the 'subgroups' to one-indexing.
You can extract all groups at once and assign them to variables:
var1, var2 = mySeach.groups()
In Python you could do more than one variable assignments in a single line with comma as a delimiter.
var1, var2 = mySearch.group(1), mySearch.group(2)
Other answers says about tuple unpacking. So that would be better if you want to extract all the captured group contents to variables. If you want to grab particular groups contents, you must need to go for the method I mentioned.
va1, var2, var3 = mySearch.group(2), mySearch.group(3), mySearch.group(1)
Example:
>>> import re
>>> x = "foobarbuzzlorium"
>>> m = re.search(r'(foo)(bar.*)(lorium)', x)
>>> if m:
x, y = m.group(1), m.group(3)
print(x,y)
foo lorium