How to get the first 2 letters of a string in Python? - python

Let's say I have a string
str1 = "TN 81 NZ 0025"
two = first2(str1)
print(two) # -> TN
How do I get the first two letters of this string? I need the first2 function for this.

It is as simple as string[:2]. If you need it, a function is easy to write:
def first2(s):
    return s[:2]

In general, you can get the characters of a string from i until j with string[i:j].
string[:2] is shorthand for string[0:2]. This works for lists as well.
Learn about Python's slice notation at the official tutorial
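To make the string[i:j] notation concrete, here is a small sketch using the string from the question. The variable names (text, items, and so on) are only illustrative.

```python
# Slicing works the same way on any sequence type.
text = "TN 81 NZ 0025"
items = [10, 20, 30, 40]

first_two = text[:2]      # same as text[0:2]
middle = text[3:5]        # characters at index 3 and 4
last_four = text[-4:]     # negative indices count from the end
list_head = items[:2]     # lists slice the same way

print(first_two, middle, last_four, list_head)
```

The end index is exclusive, so text[3:5] covers exactly two characters.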

You can take the first N characters of a string with
def firstN(s, n=2):
    return s[:n]
which, with the default n, is equivalent to
t = "your string"
t[:2]

Here's what the simple function would look like:
def firstTwo(string):
    return string[:2]

In Python, strings behave like lists of characters, but they are not of type list; they are merely list-like (i.e. they can be treated much like a list). More formally, they are sequences (see http://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange):
>>> a = 'foo bar'
>>> isinstance(a, list)
False
>>> isinstance(a, str)
True
Since strings are sequences, you can use slicing to access parts of them, written sequence[start_index:end_index]; see Explain Python's slice notation. For example:
>>> a = [1,2,3,4]
>>> a[0]
1 # first element, NOT a sequence.
>>> a[0:1]
[1] # a slice from first to second, a list, i.e. a sequence.
>>> a[0:2]
[1, 2]
>>> a[:2]
[1, 2]
>>> x = "foo bar"
>>> x[0:2]
'fo'
>>> x[:2]
'fo'
When omitted, the start position defaults to 0 and the end position to len(sequence).
In the old C days a string was an array of characters; the whole issue of dynamic vs. static lists sounds like legend now. See Python List vs. Array - when to use?

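Contrary to what one might expect, none of the previous examples raises an exception when the string is too short: slicing clamps out-of-range bounds, so 'T'[:2] simply returns 'T'. Only indexing, such as 'T'[1], raises IndexError.
If you also want padding and trimming, another approach is
'yourstring'.ljust(100)[:100].strip()
This gives you at most the first 100 characters.
The result can be shorter, because strip() removes leading and trailing whitespace.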
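A quick way to check how slicing and indexing behave on a string shorter than the requested length is the sketch below. The names short, clamped, empty and index_failed are illustrative only.

```python
short = "T"

# Slicing clamps out-of-range bounds instead of raising.
clamped = short[:2]
empty = ""[:2]

# Indexing, by contrast, raises IndexError past the end.
try:
    short[1]
    index_failed = False
except IndexError:
    index_failed = True

print(repr(clamped), repr(empty), index_failed)
```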

For completeness: Instead of using def you could give a name to a lambda function:
first2 = lambda s: s[:2]

Related

"in" statement behavior in lists vs. strings

In Python, asking if a substring exists in a string is pretty straightforward:
>>> their_string = 'abracadabra'
>>> our_string = 'cad'
>>> our_string in their_string
True
However, checking if these same characters are "in" a list fails:
>>> ours, theirs = map(list, [our_string, their_string])
>>> ours in theirs
False
>>> ours, theirs = map(tuple, [our_string, their_string])
>>> ours in theirs
False
I wasn't able to find any obvious reason why checking for elements "in" an ordered (even immutable) iterable would behave differently than a different type of ordered, immutable iterable.
For container types such as lists and tuples, x in container checks if x is an item in the container. Thus with ours in theirs, Python checks if ours is an item in theirs and finds that it is False.
Remember that a list could contain a list. (e.g [['a','b','c'], ...])
>>> ours = ['a','b','c']
>>> theirs = [['a','b','c'], 1, 2]
>>> ours in theirs
True
Are you looking to see if 'cad' is in any of the strings in a list of strings? That would look something like:
stringsToSearch = ['blah', 'foo', 'bar', 'abracadabra']
if any('cad' in s for s in stringsToSearch):
    pass  # 'cad' was in at least one string in the list
else:
    pass  # none of the strings in the list contain 'cad'
From the Python documentation, https://docs.python.org/2/library/stdtypes.html for sequences:
x in s True if an item of s is equal to x, else False (1)
x not in s False if an item of s is equal to x, else True (1)
(1) When s is a string or Unicode string object the in and not in operations act like a substring test.
For user defined classes, the __contains__ method implements this in test. list and tuple implement the basic notion. string has the added notion of 'substring'. string is a special case among the basic sequences.
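If what you actually want is the substring-style test for lists or tuples, it has to be written by hand. A minimal sketch, assuming a hypothetical helper named contains_sublist (not part of the standard library):

```python
def contains_sublist(haystack, needle):
    """Return True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    # Compare every window of length n against needle.
    return any(list(haystack[i:i + n]) == list(needle)
               for i in range(len(haystack) - n + 1))

theirs = list("abracadabra")
ours = list("cad")
print(ours in theirs)                   # item test
print(contains_sublist(theirs, ours))   # contiguous-run test
```

This is a simple quadratic scan; it is fine for short sequences but not tuned for large ones.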

What does [u'abcd', u'bcde'] mean in Python?

I used a loop to add a bunch of elements to a list with
mylist = []
for x in otherlist:
    mylist.append(x[0:5])
But instead of the expected result ['x1','x2',...], I got: [u'x1', u'x2',...]. Where did the u's come from and why? Also, is there a better way to loop through the other list, inserting the first five characters of each element into a new list?
The u means Unicode; you probably will not need to worry about it. To build the list more compactly:
mylist.extend(x[:5] for x in otherlist)
The u means unicode. It's Python's internal string representation (from version ... ?).
Most times you don't need to worry about it. (Until you do.)
The answers above already cover the "u" part: the string is a Unicode string. As for a better way to extract the first five characters from the items in a list:
>>> a = ["abcdefgh", "012345678"]
>>> b = map(lambda n: n[0:5], a)
>>> for x in b:
...     print(x)
...
abcde
01234
So, map applies a function (lambda n: n[0:5]) to each element of a and returns a new list with the results of the function for every element. More precisely, in Python 3, it returns an iterator, so the function gets called only as many times as needed (i.e. if your list has 5000 items, but you only pull 10 from the result b, lambda n: n[0:5] gets called only 10 times). In Python2, you need to use itertools.imap instead.
>>> a = [1, 2, 3]
>>> def plusone(x):
...     print("called with {}".format(x))
...     return x + 1
...
>>> b = map(plusone, a)
>>> print("first item: {}".format(next(b)))
called with 1
first item: 2
Of course, you can apply the function "eagerly" to every element by calling list(b), which will give you a normal list with the function applied to each element on creation.
>>> b = map(plusone, a)
>>> list(b)
called with 1
called with 2
called with 3
[2, 3, 4]
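A list comprehension is another common way to do the same thing; unlike map in Python 3, it builds the whole list right away. A small sketch, with sample data made up for illustration:

```python
# Sample input; the real otherlist comes from the asker's code.
otherlist = ["abcdefgh", "012345678", "xy"]

# Take at most the first five characters of every element.
mylist = [x[:5] for x in otherlist]

print(mylist)
```

Elements shorter than five characters are kept whole, because slicing never reads past the end.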

Slicing a list using a variable, in Python

Given a list
a = range(10)
You can slice it using statements such as
a[1]
a[2:4]
However, I want to do this based on a variable set elsewhere in the code. I can easily do this for the first one
i = 1
a[i]
But how do I do this for the other one? I've tried indexing with a list:
i = [2, 3, 4]
a[i]
But that doesn't work. I've also tried using a string:
i = "2:4"
a[i]
But that doesn't work either.
Is this possible?
That's what slice() is for:
a = list(range(10))
s = slice(2, 4)
print(a[s])
That's the same as using a[2:4].
Why does it have to be a single variable? Just use two variables:
i, j = 2, 4
a[i:j]
If it really needs to be a single variable you could use a tuple.
With the assignments below you are still using the same type of slicing operations you show, but now with variables for the values.
a = list(range(10))
i = 2
j = 4
then
print(a[i:j])
[2, 3]
>>> a=range(10)
>>> i=[2,3,4]
>>> a[i[0]:i[-1]]
range(2, 4)
>>> list(a[i[0]:i[-1]])
[2, 3]
I ran across this recently, while looking up how to have the user mimic the usual slice syntax of a:b:c, ::c, etc. via arguments passed on the command line.
The argument is read as a string, and I'd rather not split on ':', pass that to slice(), etc. Besides, if the user passes a single integer i, the intended meaning is clearly a[i]. Nevertheless, slice(i) will default to slice(None,i,None), which isn't the desired result.
In any case, the most straightforward solution I could come up with was to read in the string as a variable st say, and then recover the desired list slice as eval(f"a[{st}]").
This uses the eval() builtin and an f-string where st is interpolated inside the braces. It handles precisely the usual colon-separated slicing syntax, since it just plugs in that colon-containing string as-is.
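Because eval() will execute any expression it is given, input that comes from the command line may call for a stricter parser. A minimal sketch, assuming a hypothetical helper named parse_index that accepts only integers and colons:

```python
def parse_index(spec):
    """Turn text such as '2:4', '::2' or '3' into a slice or an int."""
    parts = spec.strip().split(":")
    if len(parts) == 1:
        return int(parts[0])          # a lone integer means plain indexing
    if len(parts) > 3:
        raise ValueError("too many ':' in slice spec: %r" % spec)
    # Empty fields become None, exactly as in a[:4] or a[::2].
    return slice(*(int(p) if p.strip() else None for p in parts))

a = list(range(10))
print(a[parse_index("2:4")])
print(a[parse_index("::3")])
print(a[parse_index("-1")])
```

Anything that is not an integer or an empty field makes int() raise ValueError, so malformed input is rejected rather than evaluated.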

How do I modify a single character in a string, in Python?

How do I modify a single character in a string, in Python? Something like:
a = "hello"
a[2] = "m"
'str' object does not support item assignment.
Strings are immutable in Python. You can use a list of characters instead:
a = list("hello")
When you want to display the result use ''.join(a):
a[2] = 'm'
print(''.join(a))
In Python, strings are immutable. If you want to change a single character, you'll have to use slicing:
a = "hello"
a = a[:2] + "m" + a[3:]
Try constructing a list from it. When you pass an iterable into a list constructor, it will turn it into a list (this is a bit of an oversimplification, but usually works).
a = list("hello")
a[2] = 'm'
You can then join it back up with ''.join(a).
It's because strings in python are immutable.
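The slicing approach can be wrapped in a small function that returns a new string. A minimal sketch, assuming a hypothetical helper named replace_at:

```python
def replace_at(s, index, ch):
    """Return a copy of s with the character at index replaced by ch."""
    if not -len(s) <= index < len(s):
        raise IndexError("string index out of range")
    index %= len(s)                   # normalise negative indices
    return s[:index] + ch + s[index + 1:]

a = "hello"
b = replace_at(a, 2, "m")
print(a, b)
```

The original string is left untouched; only the name you assign the result to sees the change.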

How do I do what strtok() does in C, in Python?

I am learning Python and trying to figure out an efficient way to tokenize a string of numbers separated by commas into a list. Well formed cases work as I expect, but less well formed cases not so much.
If I have this:
A = '1,2,3,4'
B = [int(x) for x in A.split(',')]
B results in [1, 2, 3, 4]
which is what I expect, but if the string is something more like
A = '1,,2,3,4,'
and I use the same list comprehension for B as above, I get an exception. I think I understand why (some of the "x" string values are not integers), but I suspect there is a way to parse this elegantly, so that tokenizing the string A works more like calling strtok(A, ",\n\t") iteratively in C.
To be clear what I am asking; I am looking for an elegant/efficient/typical way in Python to have all of the following example cases of strings:
A='1,,2,3,\n,4,\n'
A='1,2,3,4'
A=',1,2,3,4,\t\n'
A='\n\t,1,2,3,,4\n'
return with the same list of:
B=[1,2,3,4]
via some sort of compact expression.
How about this:
A = '1, 2,,3,4 '
B = [int(x) for x in A.split(',') if x.strip()]
x.strip() trims whitespace from the string, which will make it empty if the string is all whitespace. An empty string is "false" in a boolean context, so it's filtered by the if part of the list comprehension.
Generally, I try to avoid regular expressions, but if you want to split on a bunch of different things, they work. Try this:
import re
result = [int(x) for x in filter(None, re.split('[,\n\t]', A))]
Mmm, functional goodness (with a bit of generator expression thrown in):
a = "1,2,,3,4,"
print(list(map(int, filter(None, (i.strip() for i in a.split(','))))))
For full functional joy:
a = "1,2,,3,4,"
print(list(map(int, filter(None, map(str.strip, a.split(','))))))
For the sake of completeness, I will answer this seven year old question:
The C program that uses strtok:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char myLine[] = "This is;a-line,with pieces";
    char *p;
    for (p = strtok(myLine, " ;-,"); p != NULL; p = strtok(NULL, " ;-,"))
    {
        printf("piece=%s\n", p);
    }
    return 0;
}
can be accomplished in python with re.split as:
import re
myLine="This is;a-line,with pieces"
for p in re.split(r"[ ;\-,]", myLine):
    print("piece=" + p)
This will work, and never raise an exception, if all the numbers are ints. The isdigit() call is false if there's a decimal point in the string.
>>> nums = ['1,,2,3,\n,4\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
>>> for n in nums:
...     [int(i.strip()) for i in n.split(',') if i.strip() and i.strip().isdigit()]
...
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3, 4]
How about this?
>>> a = "1,2,,3,4,"
>>> list(map(int, filter(None, a.split(","))))
[1, 2, 3, 4]
filter will remove all false values (i.e. empty strings), which are then mapped to int.
EDIT: Just tested this against the above posted versions, and it seems to be significantly faster, 15% or so compared to the strip() one and more than twice as fast as the isdigit() one
Why accept inferior substitutes that cannot segfault your interpreter? With ctypes you can just call the real thing! :-)
# strtok in Python
from ctypes import c_char_p, cdll, create_string_buffer
try:
    libc = cdll.LoadLibrary('libc.so.6')
except OSError:
    libc = cdll.LoadLibrary('msvcrt.dll')
libc.strtok.restype = c_char_p
libc.strtok.argtypes = [c_char_p, c_char_p]
dat = create_string_buffer(b"1,,2,3,4")  # mutable buffer: strtok writes into it
sep = b",\n\t"
result = [libc.strtok(dat, sep)] + list(iter(lambda: libc.strtok(None, sep), None))
print(result)
Why not just wrap in a try except block which catches anything not an integer?
I was desperately in need of a strtok equivalent in Python, so I wrote a simple one of my own:
def strtok(val, delim):
    token_list = [val]
    # Split on each delimiter character in turn, dropping empty tokens.
    for key in delim:
        nList = []
        for token in token_list:
            subTokens = [x for x in token.split(key) if x.strip()]
            nList = nList + subTokens
        token_list = nList
    return token_list
I'd guess regular expressions are the way to go: http://docs.python.org/library/re.html
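Along those lines, here is a minimal sketch of a regular-expression tokenizer that treats any run of delimiter characters as one separator, which is roughly how strtok behaves. The function name tokenize_ints and its delims parameter are hypothetical, chosen for illustration.

```python
import re

def tokenize_ints(text, delims=",\n\t"):
    """Split text on any run of delimiter characters and convert to ints."""
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    pattern = "[" + re.escape(delims) + "]+"
    return [int(tok) for tok in re.split(pattern, text) if tok.strip()]

cases = ['1,,2,3,\n,4,\n', '1,2,3,4', ',1,2,3,4,\t\n', '\n\t,1,2,3,,4\n']
for case in cases:
    print(tokenize_ints(case))
```

int() tolerates surrounding spaces, so input such as '1, 2,,3,4 ' is handled too; anything that is not an integer still raises ValueError.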
