Replace multiple elements in string with str methods - python

I am trying to write a function that takes a string of DNA and returns its complement. I have been trying to solve this for a while now and looked through the Python documentation, but couldn't work it out. I have written the docstring for the function so you can see what the answer should look like. I have seen a similar question asked on this forum, but I could not understand the answers. I would be grateful if someone could explain this using only str formatting and loops / if statements, as I have not yet studied dictionaries/lists in detail.
I tried str.replace but could not get it to work for multiple elements, tried nested if statements and this didn't work either. I then tried writing 4 separate for loops, but to no avail.
def get_complementary_sequence(dna):
    """ (str) -> str
    Return the DNA sequence that is complementary
    to the given DNA sequence.
    >>> get_complementary_sequence('AT')
    TA
    >>> get_complementary_sequence('GCTTAA')
    CGAATT
    """
    for char in dna:
        if char == A:
            dna = dna.replace('A', 'T')
        elif char == T:
            dna = dna.replace('T', 'A')
        # ...and so on

For a problem like this, you can use string.maketrans (str.maketrans in Python 3) combined with str.translate:
import string
table = string.maketrans('CGAT', 'GCTA')
print 'GCTTAA'.translate(table)
# outputs CGAATT
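For Python 3, the same idea can be sketched with str.maketrans, as the answer mentions. This is a minimal version that assumes the input contains only the letters A, C, G and T; any other character is passed through unchanged.

```python
def get_complementary_sequence(dna):
    # Build a table mapping each base to its complement, then apply it.
    table = str.maketrans('CGAT', 'GCTA')
    return dna.translate(table)


print(get_complementary_sequence('GCTTAA'))  # CGAATT
```

The whole string is translated in one pass, so a letter that has already been swapped is never swapped back.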

You can map each letter to its complement with a dictionary.
You probably do not need to create a translation table covering every possible character.
>>> M = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
>>> STR = 'CGAATT'
>>> S = "".join([M.get(c,c) for c in STR])
>>> S
'GCTTAA'
How this works:
# this returns a list of char according to your dict M
>>> L = [M.get(c,c) for c in STR]
>>> L
['G', 'C', 'T', 'T', 'A', 'A']
The join() method returns a string in which the string elements of the sequence are joined by the separator string it is called on.
>>> sep = "-"
>>> L = ['a','b','c']
>>> sep.join(L)
'a-b-c'
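Since the question asks for a solution using only loops and if statements, here is one possible sketch. It builds a new string one character at a time instead of calling replace on the original. Calling replace inside the loop fails because replacing every 'A' with 'T' and later every 'T' with 'A' turns the letters that were just swapped back again. The variable name complement is only illustrative.

```python
def get_complementary_sequence(dna):
    complement = ''
    for char in dna:
        # Append the partner of each base to a new string,
        # leaving the original string untouched.
        if char == 'A':
            complement = complement + 'T'
        elif char == 'T':
            complement = complement + 'A'
        elif char == 'C':
            complement = complement + 'G'
        elif char == 'G':
            complement = complement + 'C'
    return complement


print(get_complementary_sequence('GCTTAA'))  # CGAATT
```

Characters other than A, T, C and G are silently dropped in this sketch; add an else branch if they should be kept or reported.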

Related

How to do math operations with string?

If I have the string calculation = '1+1x8', how can I convert this into calculation = 1+1*8? I tried doing something like
for char in calculation:
    if char == 'x':
        calculation = calculation.replace('x', *)
    # and
    if char == '1':
        calculation = calculation.replace('1', 1)
This clearly doesn't work, since you can't replace just one character with an integer. The entire string needs to be an integer, and if I do that it doesn't work either, since I can't convert 'x' and '+' to integers.
Let's use a more complicated string as an example: 1+12x8. What follows is a rough outline; you need to supply the implementation for each step.
First, you tokenize it, turning 1+12x8 into ['1', '+', '12', 'x', '8']. For this step you need to write a tokenizer or a lexical analyzer. This is the step where you define your operators and literals.
Next, you convert the token stream into a parse tree. Perhaps you represent the tree as an S-expression ['+', '1', ['x', '12', '8']] or [operator.add, 1, [operator.mul, 12, 8]]. This step requires writing a parser, which requires you to define things like the precedence of your operators.
Finally, you write an evaluator that can reduce your parse tree to a single value. Doing this in two steps might yield
[operator.add, 1, [operator.mul, 12, 8]] to [operator.add, 1, 96]
[operator.add, 1, 96] to 97
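The three steps above can be sketched as follows. This is only one possible implementation, not the answerer's own: the operator table, the function names (tokenize, parse, evaluate) and the choice of precedence climbing for the parser are all assumptions, and only non-negative integers with the operators +, -, x and / are handled.

```python
import operator
import re

# Assumed operator table: symbol -> (precedence, function).
# A higher precedence binds more tightly.
OPERATORS = {
    '+': (1, operator.add),
    '-': (1, operator.sub),
    'x': (2, operator.mul),
    '/': (2, operator.truediv),
}


def tokenize(text):
    """Split '1+12x8' into ['1', '+', '12', 'x', '8']."""
    return re.findall(r'\d+|[-+x/]', text)


def parse(tokens, min_precedence=1):
    """Build a nested [function, left, right] tree by precedence climbing."""
    tree = int(tokens.pop(0))
    while tokens and OPERATORS[tokens[0]][0] >= min_precedence:
        precedence, function = OPERATORS[tokens.pop(0)]
        # Operators that bind more tightly are grouped into the right operand.
        right = parse(tokens, precedence + 1)
        tree = [function, tree, right]
    return tree


def evaluate(tree):
    """Reduce the parse tree to a single value."""
    if not isinstance(tree, list):
        return tree
    function, left, right = tree
    return function(evaluate(left), evaluate(right))


print(evaluate(parse(tokenize('1+12x8'))))  # 97
```

Here parse(tokenize('1+12x8')) produces [operator.add, 1, [operator.mul, 12, 8]], the same tree shape described in the outline.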
You could write something like:
def parse_exp(s):
    return eval(s.replace('x', '*'))
and expand it for whatever other exotic symbols you want to use.
To limit the risks of eval, you can also eliminate bad characters:
import string
good = string.digits + '()/*+-x'
def parse_exp(s):
    s2 = ''.join([i for i in s if i in good])
    return eval(s2.replace('x', '*'))
Edit: an additional bonus is that the built-in eval function will take care of things like parentheses and the general rules of calculation :)
Edit 2: As another user pointed out, eval can be dangerous. As such, only use it if your code will only ever run locally.
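If eval is a concern, one alternative is to let Python parse the expression with the standard ast module and then evaluate only the node types you allow. The sketch below is an assumption-laden illustration, not part of the answer above: safe_eval and _eval_node are hypothetical names, and it assumes Python 3.8 or later, where numeric literals are parsed as ast.Constant nodes.

```python
import ast
import operator

# Only these binary operators are accepted.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node):
    # Plain numbers evaluate to themselves.
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    # Binary operations are evaluated recursively.
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left),
                                       _eval_node(node.right))
    # Unary minus, e.g. -3.
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    # Names, calls, attributes and everything else are rejected.
    raise ValueError('unsupported expression')


def safe_eval(expression):
    tree = ast.parse(expression.replace('x', '*'), mode='eval')
    return _eval_node(tree.body)


print(safe_eval('1+1x8'))  # 9
```

Parentheses and operator precedence are still handled by Python's own parser, but function calls and names raise ValueError instead of being executed.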
Adding code to what chepner suggested:
Tokenize '1+12x8' -> ['1', '+', '12', 'x', '8'].
Use the order of operations ('/' and 'x' before '+' and '-') -> reduce the calculation to 1 + (12x8)
Return the answer
import re
import operator

# Operators grouped by precedence; each group is applied left to right.
precedence = [
    {'/': operator.truediv, 'x': operator.mul},
    {'+': operator.add, '-': operator.sub},
]

def op(precedence, data):
    # Reduce the token list one precedence level at a time
    for level in precedence:
        i = 1
        while i < len(data):
            if data[i] in level:
                # Replace "left, operator, right" with the result
                data[i-1:i+2] = [level[data[i]](data[i-1], data[i+1])]
            else:
                i += 2
    return data[0]

def func(data):
    # Tokenize, converting runs of digits to integers
    d = [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', data) if t]
    # Use order of operations
    return op(precedence, d)

s1 = "1+1x8"
s2 = '2-4/2+5'
s = func(s1)  # 9
print(s)
t = func(s2)  # 5.0
print(t)

What is the difference between list(a) and [a]?

I noticed a strange difference between two list constructors that I believed to be equivalent.
Here is a small example:
hello = 'Hello World'
first = list(hello)
second = [hello]
print(first)
print(second)
This code will produce the following output:
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
['Hello World']
So the difference between the two constructors is quite clear, and I guess that this could be generalized to other constructors as well, but I fail to understand the logic behind it.
Can somebody shed some light on this?
The list() constructor function takes exactly one argument, which must be an iterable. It returns a new list with each element being an element from the given iterable. Since strings are iterable (by character), a list with individual characters is returned.
[] takes as many "arguments" as you like, each being a single element in the list; the items are not "evaluated" or iterated, they are taken as is.
Everything as documented.
The first just transforms the string "Hello World" (a sequence of characters) into a list:
first = list(hello)
The second creates a list with the elements inside the brackets:
second = [hello]
In the second case you could also do, for example:
second = [hello, 'hi', 'world']
and as output of the print you will get
['Hello World', 'hi', 'world']
Your "first" uses the list constructor, which takes hello and treats it as an iterable, converting it to a list. That is why each character is separate.
Your "second" creates a new list, using the string as its only element.
You are assuming that list(hello) should create a list containing one element, the object referred to by hello. That's not true; by that logic you would expect list(5) to return [5]. list takes a single iterable argument (a list, a tuple, a string, a dict, etc) and returns a list whose elements are taken from the given iterable.
The bracket notation, however, is not limited to containing a single item. Each comma-separated object is treated as a distinct element for the new list.
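A short sketch of the difference described above. The variable names are only illustrative; the behaviour shown is that list() iterates over its single argument while brackets store each item as given.

```python
hello = 'Hello World'

from_iterable = list(hello)       # one element per character
literal = [hello]                 # one element: the whole string
several = [hello, 'hi', 'world']  # brackets accept any number of items
keys = list({'a': 1, 'b': 2})     # iterating a dict yields its keys

try:
    list(5)                       # an int is not iterable
except TypeError as error:
    message = str(error)

print(len(from_iterable), len(literal), len(several))  # 11 1 3
print(keys)     # ['a', 'b']
print(message)  # 'int' object is not iterable
```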
The most important distinction between these two behaviours shows up when you work with lazy iterators, given that Python 3 changed things like map and zip to return iterators instead of lists ...
Assuming map returns an iterator:
a = list(map(lambda x: str(x), [1, 2, 3]))
print(a)
The result is:
['1', '2', '3']
But if we do:
a = [map(lambda x: str(x), [1, 2, 3])]
print(a)
The result is:
[<map object at 0x00000209231CB2E8>]
Clearly, the second case is undesirable and unexpected in most situations.
P.S.
If you are on Python 2, put this at the beginning: from itertools import imap as map
first = list(hello)
This converts a string into a list.
second = [hello]
This places an item into a new list. It is not a constructor.

Value Incrementation Confusion

Hey, I am new to Python development and have a lot of doubts since I am a newbie. Suppose:
s = 'something'
for something in s:
    something = something + 1
    print something
I know that here something acts as an index and that it would print out all the elements in s.
And in
s = 'something'
for something in s:
    s[something] = s[something] + 1
    print something
I don't understand the correct meaning of the second piece of code. Is it possible in Python?
Sorry for the low-grade question; any help would be appreciated.
When you loop through a string like this:
for c in 'something':
    print(c)
c does not act as an index; it is a character of the string, so the output would be:
s
o
m
e
t
h
i
n
g
If you want to loop through the indices you can do:
s = 'something'
for i in range(len(s)):
    print(i)
And the output would be:
0
1
2
3
4
5
6
7
8
You can access a character from the string by indexing like this:
s = 'something'
for i in range(len(s)):
    print(s[i])
And the output of that would be:
s
o
m
e
t
h
i
n
g
If you want to loop through a string so that you get the characters as well as the indices, you can use the enumerate() function:
s = 'something'
for i, c in enumerate(s):
    print(i, c)
The output:
0 s
1 o
2 m
3 e
4 t
5 h
6 i
7 n
8 g
Note that strings are immutable, so you can't change them:
>>> s = 'something'
>>> s[0] = 'a'
TypeError: 'str' object does not support item assignment
When you do string concatenation, you are not actually changing the string, you are creating a new one.
EDIT 1
Strings have methods that can be called on them to do certain tasks, such as the .split() method:
>>> s = 'something'
>>> s.split('e')
['som', 'thing']
They also have some special methods like __getitem__. The following two are equivalent:
>>> s = 'something'
>>> s[0]
's'
>>> s.__getitem__(0)
's'
Other sequences like lists are mutable, so they also have a __setitem__ method:
>>> s = ['s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g']
>>> s[0] = 't'
>>> s
['t', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g']
>>> s.__setitem__(0, 's')
>>> s
['s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g']
EDIT 2
This is what happens when you try to do this s[something] = s[something] + 1:
>>> s = 'something'
>>> s[0] = s[0] + 1
TypeError: Can't convert 'int' object to str implicitly
The reason this happens is because s[0] is 's' so you are trying to add a number to a string, which doesn't make any sense. Then if you try and do s[something] = s[something] + 'a' you will get a TypeError because strings are immutable:
>>> s = 'something'
>>> s[0] = s[0] + 'a'
TypeError: 'str' object does not support item assignment
And this will definitely not work:
>>> s = 'something'
>>> s['a']
TypeError: string indices must be integers
s[something] = s[something] + 1 shouldn't work; string values are immutable.
Syntax like s += "foo" actually creates a new string value from s + "foo", then assigns it to s, releasing the original value of s to be garbage collected.
A key thing to remember about all variables in Python is that they're just references to values. There's no guarantee the values aren't pooled somewhere and have copy-on-write semantics. Another example is that a line like x = 5 doesn't set x to 5; it creates (or otherwise obtains) the value 5 and sets x to refer to it.
For the most part this distinction really doesn't matter. In general, the Right Thing(TM) happens.
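A small sketch of the point about += and references. The name original is only illustrative; it keeps a second reference to the first string so that the rebinding is visible.

```python
s = 'something'
original = s          # a second reference to the same string object

s += 'foo'            # builds a new string and rebinds the name s

print(original)       # something
print(s)              # somethingfoo
print(s is original)  # False
```

The original string is unchanged; only the name s now refers to a different object.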
The code:
s = 'something'
for something in s:
    # ...
treats s like a list of characters and sets something to each one in sequence through the loop. (This is unlike JavaScript.) If you want the indices and not just the characters, use:
s = 'something'
for i, something in enumerate(s):
    # ...
So s[something] = s[something] + 1 is not possible in any situation, right?
It works fine for lists (e.g. [1, 2, 3]) and dictionaries (e.g. {"a": 1, "b": 2}). Just not for strings.
If you simply want to get a string where every character is replaced with the next one, first split the string with a list comprehension:
l = [c for c in s]
Replace each character with the next one:
l2 = [chr(ord(c) + 1) for c in l]
and glue them back together into a new string:
s2 = ''.join(l2)
Putting it all together:
s = 'something'
s2 = ''.join([chr(ord(c) + 1) for c in s])
The square brackets after a variable name invoke __getitem__ or __setitem__ on the variable, depending on the context. So for example, x[i] = x[i] + 1 is equivalent to x.__setitem__(i, x.__getitem__(i) + 1). You can read up about this in the docs here:
https://docs.python.org/2/reference/datamodel.html
There are several built-in types that implement one or both of these, for example strings, tuples, lists, and dictionaries. For the sequence types (strings, tuples, lists) the "item" being accessed or set is identified by an index, so for example print 'hello'[0] would print h because you are getting the character at the first index in the string.
In this case, it looks like the second piece of code would actually cause an error because strings are not mutable. This means that string objects can't be modified, so they won't have __setitem__ implemented and s[something] = s[something] + 1 would fail. This could work with a mutable type like list or dict though, for example:
s = [1, 1, 1]
s[0] = s[0] + 1
# s is now [2, 1, 1]

How to get the first 2 letters of a string in Python?

Let's say I have a string
str1 = "TN 81 NZ 0025"
two = first2(str1)
print(two) # -> TN
How do I get the first two letters of this string? I need the first2 function for this.
It is as simple as string[:2]. A function can easily be written to do it, if you need one.
Even that is as simple as
def first2(s):
    return s[:2]
In general, you can get the characters of a string from i until j with string[i:j].
string[:2] is shorthand for string[0:2]. This works for lists as well.
Learn about Python's slice notation at the official tutorial
t = "your string"
Play with the first N characters of a string with
def firstN(s, n=2):
    return s[:n]
which is by default equivalent to
t[:2]
Here's what the simple function would look like:
def firstTwo(string):
    return string[:2]
In Python, strings behave like lists of characters, but they are not of the list type; they are just list-like (i.e. they can be treated like a list in many ways). More formally, they are known as sequences (see http://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange):
>>> a = 'foo bar'
>>> isinstance(a, list)
False
>>> isinstance(a, str)
True
Since strings are sequences, you can use slicing to access parts of them, written as sequence[start_index:end_index]; see "Explain Python's slice notation". For example:
>>> a = [1,2,3,4]
>>> a[0]
1 # first element, NOT a sequence.
>>> a[0:1]
[1] # a slice from first to second, a list, i.e. a sequence.
>>> a[0:2]
[1, 2]
>>> a[:2]
[1, 2]
>>> x = "foo bar"
>>> x[0:2]
'fo'
>>> x[:2]
'fo'
When omitted, the start position defaults to 0 and the end position defaults to len(sequence).
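A short sketch of how these slice bounds behave, including the case where the string is shorter than the requested slice. The helper name first_n is only illustrative.

```python
def first_n(s, n=2):
    # A slice stops at the end of the string if it is too short.
    return s[:n]


print(first_n('TN 81 NZ 0025'))  # TN
print(first_n('T'))              # T
print(repr(first_n('')))         # ''

# Indexing a single position, unlike slicing, does raise an error.
try:
    ''[0]
except IndexError:
    print('IndexError')
```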
Back in the olden C days, a string was an array of characters; the whole issue of dynamic vs. static lists sounds like legend now. See "Python List vs. Array - when to use?"
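Note that slicing does not raise an exception when the string is shorter than the requested length; it simply returns the characters that exist, so 'a'[:2] gives 'a'. Only indexing a single position past the end, such as 'a'[1], raises an IndexError.
Another approach is to pad the string first:
'yourstring'.ljust(100)[:100].strip()
This will give you at most the first 100 characters.
Because of strip(), you might get a shorter string if those characters begin or end with spaces.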
For completeness: Instead of using def you could give a name to a lambda function:
first2 = lambda s: s[:2]

Alternative to python string item assignment

What is the best / correct way to do item assignment on a Python string?
I.e. s = "ABCDEFGH"; s[1] = 'a'; s[-1] = 'b'?
The normal way throws: 'str' object does not support item assignment
Strings are immutable. That means you can't assign to them at all. You could use formatting:
>>> s = 'abc{0}efg'.format('d')
>>> s
'abcdefg'
Or concatenation:
>>> s = 'abc' + 'd' + 'efg'
>>> s
'abcdefg'
Or replacement (thanks Odomontois for reminding me):
>>> s = 'abc0efg'
>>> s.replace('0', 'd')
'abcdefg'
But keep in mind that all of these methods create copies of the string, rather than modifying it in-place. If you want in-place modification, you could use a bytearray -- though that will only work for plain ascii strings, as alexis points out.
>>> b = bytearray('abc0efg')
>>> b[3] = 'd'
>>> b
bytearray(b'abcdefg')
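The bytearray session above is written for Python 2. A rough Python 3 equivalent, assuming the text is plain ASCII, might look like this: the constructor needs bytes (or a string plus an encoding), and each item is an integer rather than a one-character string.

```python
b = bytearray(b'abc0efg')
b[3] = ord('d')           # items are integers in Python 3

print(b)                  # bytearray(b'abcdefg')

text = b.decode('ascii')  # convert back to str when done
print(text)               # abcdefg
```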
Or you could create a list of characters and manipulate that. This is probably the most efficient and correct way to do frequent, large-scale string manipulation:
>>> l = list('abc0efg')
>>> l[3] = 'd'
>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> ''.join(l)
'abcdefg'
And consider the re module for more complex operations.
String formatting and list manipulation are the two methods that are most likely to be correct and efficient IMO -- string formatting when only a few insertions are required, and list manipulation when you need to frequently update your string.
Since strings are "immutable", you get the effect of editing by constructing a modified version of the string and assigning it over the old value. If you want to replace or insert to a specific position in the string, the most array-like syntax is to use slices:
s = "ABCDEFGH"
s = s[:3] + 'd' + s[4:] # Change D to d at position 3
It's more likely that you want to replace a particular character or string with another. Do that with re, again collecting the result rather than modifying in place:
import re
s = "ABCDEFGH"
s = re.sub("DE", "--", s)
I guess this Object could help:
class Charray(list):
    def __init__(self, mapping=[]):
        "A character array."
        if type(mapping) in [int, float, long]:
            mapping = str(mapping)
        list.__init__(self, mapping)
    def __getslice__(self,i,j):
        return Charray(list.__getslice__(self,i,j))
    def __setitem__(self,i,x):
        if type(x) <> str or len(x) > 1:
            raise TypeError
        else:
            list.__setitem__(self,i,x)
    def __repr__(self):
        return "charray['%s']" % self
    def __str__(self):
        return "".join(self)
For example:
>>> carray = Charray("Stack Overflow")
>>> carray
charray['Stack Overflow']
>>> carray[:5]
charray['Stack']
>>> carray[-8:]
charray['Overflow']
>>> str(carray)
'Stack Overflow'
>>> carray[6] = 'z'
>>> carray
charray['Stack zverflow']
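The class above relies on Python 2 features (long, the <> operator and __getslice__). A possible Python 3 port is sketched below; it is an adaptation rather than the answerer's code, and it makes two assumptions of its own: slices are handled in __getitem__, and assignment is restricted to a single character at an integer index.

```python
class Charray(list):
    """A mutable array of single characters (Python 3 sketch)."""

    def __init__(self, mapping=""):
        if isinstance(mapping, (int, float)):
            mapping = str(mapping)
        super().__init__(mapping)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Python 3 passes slices to __getitem__; __getslice__ is gone.
        return Charray(result) if isinstance(index, slice) else result

    def __setitem__(self, index, value):
        # Only single-character assignment at one position is allowed here.
        if (isinstance(index, slice) or not isinstance(value, str)
                or len(value) != 1):
            raise TypeError("expected a single character at an integer index")
        super().__setitem__(index, value)

    def __repr__(self):
        return "charray['%s']" % str(self)

    def __str__(self):
        return "".join(self)


carray = Charray("Stack Overflow")
carray[6] = 'z'
print(carray)  # Stack zverflow
```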
For s = "ABCDEFGH" with s[1] = 'a' and s[-1] = 'b',
you can use slicing like this:
s=s[0:1]+'a'+s[2:]
This is much simpler than the other, more complex ways.
