Indexing vs Slicing of empty strings [duplicate] - python

Why doesn't 'example'[999:9999] result in error? Since 'example'[9] does, what is the motivation behind it?
From this behavior I can assume that 'example'[3] is, essentially/internally, not the same as 'example'[3:4], even though both result in the same 'm' string.

You're correct! 'example'[3:4] and 'example'[3] are fundamentally different, and slicing outside the bounds of a sequence (at least for built-ins) doesn't cause an error.
It might be surprising at first, but it makes sense when you think about it. Indexing returns a single item, but slicing returns a subsequence of items. So when you try to index a nonexistent value, there's nothing to return. But when you slice a sequence outside of bounds, you can still return an empty sequence.
Part of what's confusing here is that strings behave a little differently from lists. Look what happens when you do the same thing to a list:
>>> [0, 1, 2, 3, 4, 5][3]
3
>>> [0, 1, 2, 3, 4, 5][3:4]
[3]
Here the difference is obvious. In the case of strings, the results appear to be identical because in Python, there's no such thing as an individual character outside of a string. A single character is just a 1-character string.
(For the exact semantics of slicing outside the range of a sequence, see mgilson's answer.)
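As a minimal sketch of the difference described above (the variable names are only illustrative), the following shows an out-of-range slice returning an empty string, an out-of-range index raising IndexError, and the two operations returning different types on a list:

```python
word = "example"

# Slicing past the end is clamped to the sequence length and yields an empty result.
empty = word[999:9999]

# Indexing past the end has no item to return, so it raises IndexError.
try:
    word[9]
    index_failed = False
except IndexError:
    index_failed = True

# On a list, the two operations visibly return different kinds of object.
numbers = [0, 1, 2, 3, 4, 5]
item = numbers[3]    # a single element
sub = numbers[3:4]   # a one-element list

print(repr(empty), index_failed, item, sub)
```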

For the sake of adding an answer that points to a robust section in the documentation:
Given a slice expression like s[i:j:k],
The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). When k is positive, i and j are reduced to len(s) if they are greater.
If you write s[999:9999], Python returns s[len(s):len(s)], since len(s) < 999 and your step is positive (1, the default).
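One way to see this clamping for yourself is the built-in slice.indices() method, which reports the start, stop and step that would actually be used for a sequence of a given length. This is only a small illustration, with 'example' standing in for s:

```python
s = "example"

# slice.indices(length) returns the (start, stop, step) actually used
# for a sequence of that length, after out-of-range values are clamped.
clamped = slice(999, 9999).indices(len(s))
print(clamped)

# The out-of-range slice is therefore the same as s[len(s):len(s)].
same_as_clamped = s[999:9999] == s[len(s):len(s)]
print(same_as_clamped)
```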

Slicing is not bounds-checked by the built-in types. And although both of your examples appear to have the same result, they work differently; try them with a list instead.

Related

Why the final index using range in python is not equal to end parameter? [duplicate]

>>> range(1,11)
gives you
[1,2,3,4,5,6,7,8,9,10]
Why not 1-11?
Did they just decide to do it like that at random or does it have some value I am not seeing?
Because it's more common to call range(0, 10), which returns [0,1,2,3,4,5,6,7,8,9]. That list contains 10 elements, which equals len(range(0, 10)). Remember that programmers prefer 0-based indexing.
Also, consider the following common code snippet:
for i in range(len(li)):
    pass
Could you see that if range() went up to exactly len(li) that this would be problematic? The programmer would need to explicitly subtract 1. This also follows the common trend of programmers preferring for(int i = 0; i < 10; i++) over for(int i = 0; i <= 9; i++).
If you are calling range with a start of 1 frequently, you might want to define your own function:
>>> def range1(start, end):
...     return range(start, end+1)
...
>>> range1(1, 10)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Although there are some useful algorithmic explanations here, I think it may help to add some simple 'real life' reasoning as to why it works this way, which I have found useful when introducing the subject to young newcomers:
With something like 'range(1,10)' confusion can arise from thinking that pair of parameters represents the "start and end".
It is actually start and "stop".
Now, if it were the "end" value then, yes, you might expect that number would be included as the final entry in the sequence. But it is not the "end".
Others mistakenly call that parameter "count" because if you only ever use 'range(n)' then it does, of course, iterate 'n' times. This logic breaks down when you add the start parameter.
So the key point is to remember its name: "stop".
That means it is the point at which, when reached, iteration will stop immediately. Not after that point.
So, while "start" does indeed represent the first value to be included, on reaching the "stop" value it 'breaks' rather than continuing to process 'that one as well' before stopping.
One analogy that I have used in explaining this to kids is that, ironically, it is better behaved than kids! It doesn't stop after it is supposed to: it stops immediately without finishing what it was doing. (They get this ;) )
Another analogy - when you drive a car you don't pass a stop/yield/'give way' sign and end up with it sitting somewhere next to, or behind, your car. Technically you still haven't reached it when you do stop. It is not included in the 'things you passed on your journey'.
I hope some of that helps in explaining to Pythonitos/Pythonitas!
Exclusive ranges do have some benefits:
For one thing each item in range(0,n) is a valid index for lists of length n.
Also range(0,n) has a length of n, not n+1 as an inclusive range would.
It works well in combination with zero-based indexing and len(). For example, if you have 10 items in a list x, they are numbered 0-9. range(len(x)) gives you 0-9.
Of course, people will tell you it's more Pythonic to do for item in x or for index, item in enumerate(x) rather than for i in range(len(x)).
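A short sketch comparing the two loop styles mentioned above; the list contents are made up for illustration:

```python
x = ["a", "b", "c"]

# Index-based loop: every value produced by range(len(x)) is a valid index.
by_index = [(i, x[i]) for i in range(len(x))]

# enumerate yields the same (index, item) pairs without manual indexing.
by_enumerate = list(enumerate(x))

print(by_index == by_enumerate)
```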
Slicing works that way too: foo[1:4] is items 1-3 of foo (keeping in mind that item 1 is actually the second item due to the zero-based indexing). For consistency, they should both work the same way.
I think of it as: "the first number you want, followed by the first number you don't want." If you want 1-10, the first number you don't want is 11, so it's range(1, 11).
If it becomes cumbersome in a particular application, it's easy enough to write a little helper function that adds 1 to the ending index and calls range().
It's also useful for splitting ranges; range(a,b) can be split into range(a, x) and range(x, b), whereas with inclusive range you would write either x-1 or x+1. While you rarely need to split ranges, you do tend to split lists quite often, which is one of the reasons slicing a list l[a:b] includes the a-th element but not the b-th. Then range having the same property makes it nicely consistent.
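The splitting property can be checked with a few lines. This is just a sketch with arbitrary example values for a, x and b:

```python
a, x, b = 2, 5, 9

# Splitting a half-open range at x needs no +1 or -1 adjustment.
whole = list(range(a, b))
left = list(range(a, x))
right = list(range(x, b))
print(left + right == whole)

# Slices split the same way: the two halves share the split point x.
items = ["p", "q", "r", "s", "t"]
split_at = 2
head, tail = items[:split_at], items[split_at:]
print(head + tail == items)
```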
The length of the range is the top value minus the bottom value.
It's very similar to something like:
for (var i = 1; i < 11; i++) {
    // i goes from 1 to 10 in here
}
in a C-style language.
Also like Ruby's range:
1...11 #this is a range from 1 to 10
However, Ruby recognises that many times you'll want to include the terminal value and offers the alternative syntax:
1..10 #this is also a range from 1 to 10
Consider the code
for i in range(10):
    print("You'll see this 10 times", i)
The idea is that range(x, y) gives you y-x values, which you can (as you see above) iterate over.
Read up on the Python docs for range: they consider for-loop iteration the primary use case.
In Python, range(n) iterates n times because the stop value is exclusive, which is why the last value is not printed. We can write a function that produces an inclusive range, so that the stop value is yielded as well.
def main():
    for i in inclusive_range(25):
        print(i, end=" ")
def inclusive_range(*args):
    numargs = len(args)
    if numargs == 0:
        raise TypeError("inclusive_range expected at least 1 argument, got 0")
    elif numargs == 1:
        stop = args[0]
        start = 0
        step = 1
    elif numargs == 2:
        (start, stop) = args
        step = 1
    elif numargs == 3:
        (start, stop, step) = args
    else:
        raise TypeError("inclusive_range expected at most 3 arguments, got {}".format(numargs))
    # Only positive steps are supported: count upward until i passes stop.
    i = start
    while i <= stop:
        yield i
        i += step
if __name__ == "__main__":
    main()
In Python, range(n) runs from 0 to n-1. Likewise, range(1, n) runs from 1 to n-1.
So, if you want to skip the first value (0) and also get the last value (n), you can do it very simply using the following code.
for i in range(1, n + 1):
    print(i) #prints from 1 to n
It's just more convenient to reason about in many cases.
Basically, we could think of a range as an interval between start and end. If start <= end, the length of the interval between them is end - start. If len were defined as that length, you'd have:
len(range(start, end)) == end - start
However, we count the integers included in the range instead of measuring the length of the interval. To keep the above property true, we should include one of the endpoints and exclude the other.
Adding the step parameter is like introducing a unit of length. In that case, you'd expect
len(range(start, end, step)) == (end - start) / step
for length. To get the count, you divide as integers and round up.
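The counting argument can be checked directly. The following is a minimal sketch for positive steps only; range_length is a hypothetical helper name, not part of the standard library, and it rounds the division up so that a partial final step still counts as one value.

```python
def range_length(start, end, step=1):
    """Count the integers in range(start, end, step), assuming step > 0."""
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    # -(-n // step) is ceiling division on integers; empty ranges clamp to 0.
    return max(0, -(-(end - start) // step))

# Compare against the built-in for a few hand-picked cases.
cases = [(0, 10, 1), (1, 11, 1), (0, 5, 3), (3, 3, 1), (7, 2, 1), (0, 9, 3)]
matches = all(range_length(*c) == len(range(*c)) for c in cases)
print(matches)
```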
There are two major uses of ranges in Python, and most cases fall into one or the other:
Integer. Use the built-in range(start, stop, step). Including stop would make the final step asymmetric in the general case. Consider range(0,5,3): if the default behaviour output 5 at the end, it would be broken.
Floating point. This is for numerical uses (where the values sometimes happen to be integers too). Then use numpy.linspace.

Why doesn't Python throw an error for slicing out of bounds? [duplicate]

MATLAB throws an error for this:
>> a = [2,3,4]
>> a(3:4)
index out of bounds
If something similar is tried with Python, why isn't it illegal?
>>> a = [2,3,4]
>>> a[2:3]
[4]
Isn't the index 3 in Python out of bounds, considering that numbering starts from zero in Python?
Slicing never raises an error in Python for out-of-bounds indexes.
>>> s =[1,2,3]
>>> s[-1000:1000]
[1, 2, 3]
From the docs on strings (this applies to lists and tuples as well):
Degenerate slice indices are handled gracefully: an index that is too
large is replaced by the string size, an upper bound smaller than the
lower bound returns an empty string.
Docs (lists):
The slice of s from i to j is defined as the sequence of items with
index k such that i <= k < j. If i or j is greater than len(s), use
len(s). If i is omitted or None, use 0. If j is omitted or None, use
len(s). If i is greater than or equal to j, the slice is empty.
Out-of-range negative slice indices are truncated, but don’t try this for single-element (non-slice) indices:
>>> word = 'HelpA'
>>> word[-100:]
'HelpA'
As others answered, Python generally doesn't raise an exception for out-of-range slices. However, and this is important, your slice is not out-of-range. Slicing is specified as a closed-open interval, where the beginning of the interval is inclusive, and the end point is exclusive.
In other words, [2:3] is a perfectly valid slice of a three-element list, that specifies a one-element interval, beginning with index 2 and ending just before index 3. If one-after-the-last endpoint such as 3 in your example were illegal, it would be impossible to include the last element of the list in the slice.
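A brief sketch of the closed-open interval, using the list from the question:

```python
a = [2, 3, 4]

# Index 3 does not exist as an element, but as a slice endpoint it simply
# means "stop just before index 3".
last_only = a[2:3]
to_the_end = a[2:len(a)]
everything = a[0:3]

print(last_only, to_the_end, everything)
```

Here len(a) is 3, so an endpoint of 3 is exactly what is needed to include the last element.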
You have a range there. As soon as one index from the range goes outside the bounds the process of extracting elements stops.
Out-of-range indices never cause an error when slicing in Python.
Because a[2:3] starts at index 2 (the value 4) and stops just before index 3, it returns [4].
Slicing with out-of-range indices never raises an error. The least it can do is return an empty list/tuple/string (depending on the type of course):
>>> a[12312312:]
[]
[start:end:step]
So index 2 holds the value 4, and the last index included is end - 1 = 2, which is that same element.

List Range Refinement

I have been using the a[0:2] format for ranges, but it has been bothering me that if I have a = range(0, 5) then a is [0, 1, 2, 3, 4], but if I use a[0:-1] I get [0, 1, 2, 3].
I know if I use a[0:] I get the full range, but if I want to have the end of the range defined by a variable (example: c = -1 then a[0:c]) there is no way for me to get the full range without using a conditional statement (for instance: if c == -1: c = None).
Is there some nice format that I could use to be able to access the whole range while using variables as the limits? Or am I stuck needing a conditional statement?
Thanks.
Edit: It appears I have two options available, I can either set the variable to None conditionally or I can set the variable so that the last term is set at len(a). I am not 100% sure which way I am going to go with yet, but thank you all for your responses.
Just assign None to c:
c = None
a[2:c]
It works as you want. Actually that's how slices (not ranges) are created.
They are actually ordinary Python objects. You can even use them inside [].
a = [0, 1, 2, 3]
s = slice(2, None)
a[s] # equal to a[2:]
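A small sketch of how a variable end limit could be handled with a slice object. take_until is a hypothetical helper name, and it assumes callers pass None (rather than -1) to mean "through the last element":

```python
def take_until(seq, end=None):
    """Return seq[0:end]; end=None means 'through the last element'."""
    return seq[slice(0, end)]

a = [0, 1, 2, 3, 4]

full = take_until(a)            # same as a[0:]
without_last = take_until(a, -1)  # same as a[0:-1]
first_two = take_until(a, 2)    # same as a[0:2]

print(full, without_last, first_two)
```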
a[0:] is just syntactic sugar for a[0:len(a)]
Thus
c = len(a)
a[0:c] #a[:c], a[:], a[0:] all work as well
Gives you the full range.
You can use slice:
endLimit = int(input("what is the limit?"))
if endLimit == 0:
    endLimit = None
range(5)[slice(0,endLimit)]
slice is in fact the object that represents the [start:end:jumps] part of range(5)[start:end:jumps].
You can use the ternary operator:
a[0:c if c>0 else None]
But that's pretty ugly.
I think the better question is "Why do you want to use -1 to get the full slice?". Python already uses -1 to get up to (but not including) the final element. Changing that behavior just leads to unexpected results for people familiar with the language. Just document the fact that if the user of this function wants to get the full slice, they should pass None.
EDIT:
Now that I understand exactly what OP is looking for, this'd be the way to do it:
a[0:c if c != 0 else len(a)]
Which uses a ternary, or if you really really really don't want to use a conditional, I am pretty certain this does what you want and gives you a full slice on c = 0, at least as long as -len(a) < c <= len(a):
a[0:((c-1) % len(a)) + 1]
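The two expressions above can be compared side by side. In this sketch, end_ternary and end_modulo are hypothetical wrapper names; the check suggests that the modulo form matches the ternary only while -len(a) < c <= len(a), and it assumes a is non-empty (the modulo would otherwise divide by zero).

```python
def end_ternary(a, c):
    """Slice with a conditional: c == 0 means 'take everything'."""
    return a[0:c if c != 0 else len(a)]

def end_modulo(a, c):
    """Same idea without a conditional, using modular arithmetic."""
    return a[0:((c - 1) % len(a)) + 1]

a = list(range(5))

# The two agree for every c with -len(a) < c <= len(a).
agree = all(end_ternary(a, c) == end_modulo(a, c)
            for c in range(-len(a) + 1, len(a) + 1))

# Outside that window the modulo wraps around, so the results differ.
differs_above = end_ternary(a, len(a) + 1) != end_modulo(a, len(a) + 1)

print(agree, differs_above)
```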
---------------------- Old Post ---------------------------
Well, I may be missing something, but you can go arbitrarily high. If you do:
a = range(5)
a[0:200]
You get:
[0, 1, 2, 3, 4]
So if c is just growing or doing its own thing, the full range will be given every time c ≥ len(a).

Please explain Python sequence reversal

Not sure where I picked this up, but it stuck and I use it all the time.
Can someone explain how this string reversal works? I use it to test for palindromic strings without converting it to a mutable type first.
>>> word = "magic"
>>> magic = word[::-1]
>>> magic
'cigam'
I would put my best guess, but I don't want to walk in with any preconceptions about the internals behind this useful trick.
The slice notation goes like this:
my_list[start:end:step]
So, when you do [::-1], it means:
start: nothing (default)
end: nothing (default)
step: -1 (descending order)
So, you're going from the end of the list (default) to the first element (default), decreasing the index by one (-1).
So, as many answers said, there is no sorting nor in-place swapping, just slice notation.
You can have a look here - it is an extended slice.
"What's New in Python 2.3", section 15, "Extended Slices".
This "trick" is just a particular instance of applying a slice operation to a sequence. You can use it to produce a reversed copy of a list or a tuple as well. Another "trick" from the same family: [:] is often used to produce a (shallow) copy of a list.
"What's new in Python 2.3" is an unexpected entry point into the maze. Let's start at a more obvious(?) place, the current 2.X documentation for sequence objects.
In the table of sequence operations, you'll see a row with Operation = s[i:j:k], Result = "slice of s from i to j with step k", and Notes = "(3)(5)".
Note 3 says "If i or j is negative, the index is relative to the end of the string: len(s) + i or len(s) + j is substituted. But note that -0 is still 0."
Note 5 says "The slice of s from i to j with step k is defined as the sequence of items with index x = i + n*k such that 0 <= n < (j-i)/k. In other words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when j is reached (but never including j). If i or j is greater than len(s), use len(s). If i or j are omitted or None, they become “end” values (which end depends on the sign of k). Note, k cannot be zero. If k is None, it is treated like 1."
We have k == -1, so the indices used are i, i-1, i-2, i-3 and so on, stopping when j is reached (but never including j). To obtain the observed effect, the "end" value used for i must be len(s)-1, and the "end" value used for j must be -1 (meaning the position just before index 0, not the negative index for the last item). Thus the indices used are last, last-1, ..., 1, 0.
Another entry point is to consider how we might produce such a result for any sequence if [::-1] didn't exist in the language:
def reverse_traversal_of_sequence(s):
    for x in range(len(s) - 1, -1, -1):
        do_something_with(s[x])
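The default "end" values for a negative step can be inspected with slice.indices(). The sketch below uses the word from the question; is_palindrome is a hypothetical helper added only to show the use case the question mentions.

```python
word = "magic"

# For a negative step, the omitted start and stop default to the last index
# and to "one before index 0" respectively.
start, stop, step = slice(None, None, -1).indices(len(word))

# Walking those indices by hand gives the same result as word[::-1].
by_hand = "".join(word[i] for i in range(start, stop, step))

def is_palindrome(text):
    """True if text reads the same forwards and backwards."""
    return text == text[::-1]

print((start, stop, step), by_hand, is_palindrome("level"))
```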
