Multiple-replace in Python

I do the following for replacing.
import fileinput
for line in fileinput.FileInput("input.txt", inplace=1):
    line = line.replace("A", "A'")
    print line,
But I want to do many replacements:
replace A with A', B with BB, C with CX, D with KK, etc.
I can of course do this by repeating the above code many times.
But I guess that will take a lot of time, especially when input.txt is large.
How can I do this elegantly?
Edit:
My input is not just a str ABCD.
I need to use input.txt as input, and I want to replace every occurrence of A in input.txt with A', every occurrence of B with BB, every occurrence of C with CX, and every occurrence of D with KK.

Use a mapping dictionary:
>>> map_dict = {'A':"A'", 'B':'BB', 'C':'CX', 'D':'KK'}
>>> strs = 'ABCDEF'
>>> ''.join(map_dict.get(c,c) for c in strs)
"A'BBCXKKEF"
In Python 3 you can use str.translate instead of str.join:
>>> map_dict = {ord('A'):"A'", ord('B'):'BB', ord('C'):'CX', ord('D'):'KK'}
>>> strs = 'ABCDEF'
>>> strs.translate(map_dict)
"A'BBCXKKEF"

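Since the question is about rewriting input.txt itself, the translate table can be dropped into the fileinput loop from the question. A minimal Python 3 sketch, assuming a text file in the default encoding; `replace_in_file` and `TABLE` are hypothetical names, and `str.maketrans` is used here to build the same ordinal-keyed table from a dict with plain character keys:

```python
import fileinput

# str.maketrans accepts a dict with single-character keys and converts
# them to the ordinal keys that str.translate expects.
TABLE = str.maketrans({'A': "A'", 'B': 'BB', 'C': 'CX', 'D': 'KK'})

def replace_in_file(path, table=TABLE):
    # inplace=True redirects print() into the file being read,
    # so each translated line replaces the original line.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            # end='' because each line already carries its newline
            print(line.translate(table), end='')
```

Calling replace_in_file("input.txt") rewrites the file in place. Because translate maps every character in a single pass, each line is scanned once no matter how many entries the table holds, which addresses the concern about repeating the replace loop.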
Using regular expression:
>>> import re
>>>
>>> replace_map = {
... 'A': "A'",
... 'B': 'BB',
... 'C': 'CX',
... 'D': 'KK',
... 'EFG': '.',
... }
>>> pattern = '|'.join(map(re.escape, replace_map))
>>> re.sub(pattern, lambda m: replace_map[m.group()], 'ABCDEFG')
"A'BBCXKK."

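One thing to watch with the alternation: the regex engine tries the alternatives left to right, so when one key is a prefix of another (say 'A' and 'AB'), whichever comes first in the dict wins. A small sketch that sorts the keys longest-first so the longest key always takes precedence; `multi_replace` is a hypothetical name:

```python
import re

def multi_replace(text, mapping):
    # Longest keys first, so 'AB' is tried before its prefix 'A'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)))
    # Each match is looked up in the mapping; re.sub never re-scans
    # text it has already substituted.
    return pattern.sub(lambda m: mapping[m.group()], text)

print(multi_replace('ABC', {'A': "A'", 'AB': 'X'}))  # XC
```

With the unsorted pattern 'A|AB', the same call would return "A'BC", because 'A' matches first.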
Related

Removing punctuations and spaces in a string without using regex

I used import string and string.punctuation, but I realized I still have '…' after calling string.split(). I also get '', and I don't know why I would get it after doing strip(). As far as I understand, strip() removes the leading and trailing spaces, so spaces inside a string should not matter:
>>> s = 'a dog barks meow! # … '
>>> s.strip()
'a dog barks meow! # …'
>>> import string
>>> k = []
>>> for item in s.split():
... k.append(item.strip(string.punctuation))
...
>>> k
['a', 'dog', 'barks', 'meow', '', '…']
I would like to get rid of '' and '…'; the final output I'd like is ['a', 'dog', 'barks', 'meow'].
I would like to avoid regex, but if that's the only solution I will consider it. For now I'm more interested in solving this without resorting to regex.
You can remove punctuation by retaining only alphanumeric characters and spaces:
s = 'a dog barks meow! # …'
print(''.join(c for c in s if c.isalnum() or c.isspace()).split())
This outputs:
['a', 'dog', 'barks', 'meow']
I used the following:
s = 'a dog barks Meow! # … '
import string
p = string.punctuation+'…'
k = []
for item in s.split():
    k.append(item.strip(p).lower())
k = [x for x in k if x]
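In Python 3, a variation on this answer deletes the punctuation with str.translate instead of strip(). Note the difference in behavior: strip() only removes characters at the ends of each word, while translate removes them everywhere, so "don't" becomes "dont". A sketch, with `words_without_punctuation` and `DELETE_PUNCT` as hypothetical names:

```python
import string

# The third argument of str.maketrans lists characters to delete.
DELETE_PUNCT = str.maketrans('', '', string.punctuation + '…')

def words_without_punctuation(s):
    # Delete punctuation anywhere in the string, then split on whitespace;
    # split() with no argument never returns empty strings.
    return s.translate(DELETE_PUNCT).split()

print(words_without_punctuation('a dog barks meow! # … '))
# ['a', 'dog', 'barks', 'meow']
```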
Building on the accepted answer to this question:
import itertools
k = []
for ok, grp in itertools.groupby(s, lambda c: c.isalnum()):
    if ok:
        k.append(''.join(list(grp)))
or the same as a one-liner (except for the import):
k = [''.join(list(grp)) for ok, grp in itertools.groupby(s, lambda c: c.isalnum()) if ok]
itertools.groupby() scans the string s character by character, grouping consecutive characters (grp) by the value (ok) of the lambda expression. The if ok filters out the groups for which the lambda returns False. Each group is an iterator over characters, and joining it gives back the word; the list() call is not strictly needed, since str.join() accepts any iterable.
The meaning of isalnum() is essentially “is alphanumeric”. Depending on your use case, you might prefer isalpha(). In both cases, for this input:
s = 'a 狗 barks meow! # …'
the output is
['a', '狗', 'barks', 'meow']
(For experts: this is a reminder that not every language separates words with non-word characters, e.g.)

Find "one letter that appears twice" in a string

I'm trying to catch a letter that appears twice in a string using RegEx (or maybe there are better ways?). For example, my string is:
ugknbfddgicrmopn
The output would be:
dd
However, I've tried something like:
re.findall('[a-z]{2}', 'ugknbfddgicrmopn')
but in this case, it returns:
['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn'] # the expected output is `['dd']`
I also have a way to get the expected output:
>>> l = []
>>> tmp = None
>>> for i in 'ugknbfddgicrmopn':
...     if tmp != i:
...         tmp = i
...         continue
...     l.append(i*2)
...
>>> l
['dd']
But that's too complex...
If it's 'abbbcppq', then only catch:
abbbcppq
 ^^  ^^
So the output is:
['bb', 'pp']
Then, if it's 'abbbbcppq', catch bb twice:
abbbbcppq
 ^^^^ ^^
So the output is:
['bb', 'bb', 'pp']
You need to use a regex with a capturing group and a backreference, and define it as a raw string.
>>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
'dd'
>>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
['bb', 'bb', 'pp']
or
>>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
['bb', 'bb', 'pp']
Note that re.findall here returns a list of tuples, with the text matched by the first group as the first element and the text matched by the second group as the second element. The first group is all we need, hence i[0].
A more Pythonic way is to use the zip function within a list comprehension:
>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']
If you are dealing with a large string, you can use the iter() function to turn the string into an iterator and itertools.tee() to create two independent iterators, then call next() on the second iterator to consume its first item, and pass both iterators to zip (in Python 2.x use itertools.izip(), which returns an iterator). This avoids building the s[1:] copy.
>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
Benchmark with RegEx recipe:
# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop
# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop
After your last edit, as mentioned in the comments, if you only want to match one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract each result with the group() method:
>>> import re
>>>
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']
Note that re.I is the IGNORECASE flag which makes the RegEx match the uppercase letters too.
Using a backreference, it is very easy:
import re
p = re.compile(r'([a-z])\1{1,}')
re.findall(p, "ugknbfddgicrmopn")
#output: ['d']
re.findall(p, "abbbcppq")
#output: ['b', 'p']
For more details, you can refer to a similar question about Perl: Regular expression to match any character being repeated more than 10 times
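If "appears twice" means anywhere in the string rather than side by side, it is pretty easy without regular expressions:
In [3]: import collections
In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
Out[4]: ['b', 'r']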
Maybe you can use a generator to achieve this:
def adj(s):
    last_c = None
    for c in s:
        if c == last_c:
            yield c * 2
        last_c = c
s = 'ugknbfddgicrmopn'
v = [x for x in adj(s)]
print(v)
# output: ['dd']
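The generator above reports every adjacent pair, so 'abbbcppq' gives ['bb', 'bb', 'pp'] rather than the ['bb', 'pp'] the question asks for after its edit. A sketch that follows the edited rules without regex, counting runs of identical characters with itertools.groupby: a run of n equal characters contains n // 2 non-overlapping pairs. `doubled_letters` is a hypothetical name, and unlike `[a-z]` it treats any repeated character (digits, spaces, uppercase letters) as a pair:

```python
from itertools import groupby

def doubled_letters(s):
    pairs = []
    # groupby with no key yields one (char, iterator) per run of equal characters
    for char, run in groupby(s):
        run_length = sum(1 for _ in run)
        # a run of n equal characters holds n // 2 non-overlapping pairs
        pairs.extend([char * 2] * (run_length // 2))
    return pairs

print(doubled_letters('abbbcppq'))   # ['bb', 'pp']
print(doubled_letters('abbbbcppq'))  # ['bb', 'bb', 'pp']
```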
"or maybe there's some better ways"
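Since regex is often misunderstood by the next developer to encounter your code (which may even be you), and since simpler != shorter, how about the following approach, written out here as runnable Python instead of pseudo-code:
from collections import defaultdict

def find_multiple_letters(input_string):
    # count every letter; record it the moment its count reaches 2
    letter_counts = defaultdict(int)
    multiple_letters = []
    for letter in input_string:
        letter_counts[letter] += 1
        if letter_counts[letter] == 2:
            multiple_letters.append(letter)
    return multiple_letters

multiple_letters = find_multiple_letters("ugknbfddgicrmopn")  # ['d', 'g', 'n']: counts letters anywhere, not only side by side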
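A1 = "abcdededdssffffccfxx"
for i in range(len(A1) - 1):
    # a pair starts at i if the next char matches and the previous one does not
    # (i == 0 is checked explicitly so A1[-1] does not wrap around to the end)
    if A1[i + 1] == A1[i]:
        if i == 0 or A1[i - 1] != A1[i]:
            print(A1[i] * 2)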
>>> l = ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']
>>> import re
>>> newList = [item for item in l if re.search(r"([a-z]{1})\1", item)]
>>> newList
['dd']

Splitting strings in python, then joining them into so that each string is one substring longer than the next

Basically what I want to do is take a string like this:
-o-pp-gg-s-h
then turn it into the series of strings:
-o
-o-pp
-o-pp-gg
-o-pp-gg-s
-o-pp-gg-s-h
I know that I could do this by splitting the string (str.split('-')), then having a loop that joins the substrings to produce that output ('-'.join(lst)). However, is there a more elegant way to do this in Python?
List comprehension!
s = "-o-pp-gg-s-h"
ss = s.split("-")
series = ["-".join(ss[:x]) for x in range(2,len(ss)+1)]
Is this elegant enough?
>>> s = '-o-pp-gg-s-h'
>>> nlist = s.split('-')
>>> for i in range(2, len(nlist) + 1):
...     print('-'.join(nlist[:i]))
...
-o
-o-pp
-o-pp-gg
-o-pp-gg-s
-o-pp-gg-s-h
>>> a = '-o-pp-gg-s-h'
>>> b = a.split('-')
>>> b
['', 'o', 'pp', 'gg', 's', 'h']
>>> for i in range(2, len(b) + 1):
...     print('-'.join(b[0:i]))
...
-o
-o-pp
-o-pp-gg
-o-pp-gg-s
-o-pp-gg-s-h
Only for variety, using accumulate (Python 3.3+, where accumulate accepts a function argument):
In [23]: s = '-o-pp-gg-s-h'
In [24]: from itertools import accumulate
In [25]: list(accumulate(s.split('-'), lambda x,y: x+'-'+y))[1:]
Out[25]: ['-o', '-o-pp', '-o-pp-gg', '-o-pp-gg-s', '-o-pp-gg-s-h']
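Another option skips the split/join round trip: every prefix the question wants ends just before a '-' or at the end of the string, so it can be sliced straight out of the original. A sketch, where `prefixes` is a hypothetical helper and `sep` must be a single character; starting the range at 1 avoids the empty prefix before a leading '-':

```python
def prefixes(s, sep='-'):
    # Cut s at every index where a separator starts, plus at the very end.
    # The i == len(s) test short-circuits, so s[i] is never out of range.
    return [s[:i] for i in range(1, len(s) + 1) if i == len(s) or s[i] == sep]

print(prefixes('-o-pp-gg-s-h'))
# ['-o', '-o-pp', '-o-pp-gg', '-o-pp-gg-s', '-o-pp-gg-s-h']
```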

Split line in Python 2.7

I want to split the line W03*17*65.68*KG*0.2891*CR*1*1N in Python and then capture
Value qty as 17
Value kg as 65.68
I tried with Split:
myarray = Split(strSearchString, "*")
a = myarray(0)
b = myarray(1)
Thanks for your help
split is a method of the string itself, and you access elements of a list with [42], not with a call like (42) (see the docs). Try:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
lst = s.split('*')
qty = lst[1]
weight = lst[2]
weight_unit = lst[3]
You may also be interested in tuple unpacking:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
_,qty,weight,weight_unit,_,_,_,_ = s.split('*')
You can even use a slice:
s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
qty,weight,weight_unit = s.split('*')[1:4]
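Note that split always returns strings. If you need the quantity and weight as numbers, convert them after splitting. A sketch assuming the field positions from the question (quantity in field 1, weight in field 2, unit in field 3); `parse_record` is a hypothetical name, and the code runs on both Python 2.7 and 3:

```python
def parse_record(line):
    # Field layout assumed from the question's example record.
    fields = line.split('*')
    qty = int(fields[1])        # '17'    -> 17
    weight = float(fields[2])   # '65.68' -> 65.68
    weight_unit = fields[3]     # 'KG' stays a string
    return qty, weight, weight_unit

print(parse_record('W03*17*65.68*KG*0.2891*CR*1*1N'))  # (17, 65.68, 'KG')
```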
>>> s = "W03*17*65.68*KG*0.2891*CR*1*1N"
>>> lst = s.split("*")
>>> lst[1]
'17'
>>> lst[2]
'65.68'
You need to invoke the split method on the string you want to split. Just using Split(my_str, "x") won't work:
>>> my_str = "Python W03*17*65.68*KG*0.2891*CR*1*1N"
>>> tokens = my_str.split('*')
>>> tokens
['Python W03', '17', '65.68', 'KG', '0.2891', 'CR', '1', '1N']
>>> tokens[1]
'17'
>>> tokens[2]
'65.68'
# string.split() exists only in Python 2; in Python 3 use strSearchString.split("*")
import string
myarray = string.split(strSearchString, "*")
qty = myarray[1]
kg = myarray[2]
If you'd like to capture qty as 17 and kg as 65.68,
one way is to build a dictionary after splitting the string.
>>> s = 'W03*17*65.68*KG*0.2891*CR*1*1N'
>>> s.split('*')
['W03', '17', '65.68', 'KG', '0.2891', 'CR', '1', '1N']
>>> t = s.split('*')
>>> dict(qty=t[1],kg=t[2])
{'kg': '65.68', 'qty': '17'}
Hope it helps.
>>> s = "W03*17*65.68*KG*0.2891*CR*1*1N"
>>> my_string = s.split("*")[1]
>>> my_string
'17'
>>> my_string = s.split("*")[2]
>>> my_string
'65.68'

How to convert a string with comma-delimited items to a list in Python?

How do you convert a string into a list?
Say the string is like text = "a,b,c". After the conversion, text == ['a', 'b', 'c'] and hopefully text[0] == 'a', text[1] == 'b'?
Like this:
>>> text = 'a,b,c'
>>> text = text.split(',')
>>> text
['a', 'b', 'c']
Just to add on to the existing answers: hopefully, you'll encounter something more like this in the future:
>>> word = 'abc'
>>> L = list(word)
>>> L
['a', 'b', 'c']
>>> ''.join(L)
'abc'
But for what you're dealing with right now, go with @Cameron's answer.
>>> word = 'a,b,c'
>>> L = word.split(',')
>>> L
['a', 'b', 'c']
>>> ','.join(L)
'a,b,c'
The following Python code will turn your string into a list of strings:
import ast
teststr = "['aaa','bbb','ccc']"
testarray = ast.literal_eval(teststr)
I don't think you need to.
In Python you seldom need to convert a string to a list, because strings and lists are very similar.
Changing the type
If you really have a string which should be a character array, do this:
In [1]: x = "foobar"
In [2]: list(x)
Out[2]: ['f', 'o', 'o', 'b', 'a', 'r']
Not changing the type
Note that strings are very much like lists in Python
Strings have accessors, like lists
In [3]: x[0]
Out[3]: 'f'
Strings are iterable, like lists
In [4]: for c in x:
   ...:     print(c)
   ...:
f
o
o
b
a
r
TLDR
Strings are lists. Almost.
In case you want to split by spaces, you can just use .split():
a = 'mary had a little lamb'
z = a.split()
print z
Output:
['mary', 'had', 'a', 'little', 'lamb']
If you actually want arrays (this is Python 2 code; the array module in Python 3 has no 'c' typecode):
>>> from array import array
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> myarray = array('c', text)
>>> myarray
array('c', 'abc')
>>> myarray[0]
'a'
>>> myarray[1]
'b'
If you do not need arrays and only want to look up your characters by index, remember that a string is an iterable, just like a list, except that it is immutable:
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> text[0]
'a'
m = '[[1,2,3],[4,5,6],[7,8,9]]'
m = eval(m.split()[0])  # eval runs arbitrary code; ast.literal_eval is safer for untrusted input
print(m)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
All the answers are good; there is another way of doing it, a list comprehension, as shown below.
u = "UUUDDD"
lst = [x for x in u]
For a comma-separated list, do the following:
u = "U,U,U,D,D,D"
lst = [x for x in u.split(',')]
I usually use:
l = [ word.strip() for word in text.split(',') ]
The strip() call removes the spaces around each word.
To convert a string of the form a="[[1, 3], [2, -6]]", I wrote this (not yet optimized) code:
matrixAr = []
mystring = "[[1, 3], [2, -4], [19, -15]]"
b = mystring.replace("[[", "").replace("]]", "")  # remove the leading [[ and trailing ]]
for line in b.split('], ['):
    row = list(map(int, line.split(',')))  # map converts each number (some with surrounding spaces) from string to int
    matrixAr.append(row)
print(matrixAr)
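If the string is already a valid Python literal, as in this example, a parser can do the bracket handling for you. A sketch using ast.literal_eval, which, unlike eval, only evaluates literals; json.loads also works for this particular input because the text happens to be valid JSON:

```python
import ast
import json

mystring = "[[1, 3], [2, -4], [19, -15]]"

# literal_eval parses Python literals only (including negative numbers),
# so it will not execute arbitrary code.
matrix = ast.literal_eval(mystring)
print(matrix)  # [[1, 3], [2, -4], [19, -15]]

# The same text is also valid JSON.
assert json.loads(mystring) == matrix
```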
split() is your friend here. I will cover a few aspects of split() that are not covered by other answers.
If no arguments are passed to split(), it splits the string on whitespace characters (space, tab, and newline). Leading and trailing whitespace is ignored. Also, consecutive whitespace characters are treated as a single delimiter.
Example:
>>> " \t\t\none two three\t\t\tfour\nfive\n\n".split()
['one', 'two', 'three', 'four', 'five']
When a single character delimiter is passed, split() behaves quite differently from its default behavior. In this case, leading/trailing delimiters are not ignored, repeating delimiters are not "coalesced" into one either.
Example:
>>> ",,one,two,three,,\n four\tfive".split(',')
['', '', 'one', 'two', 'three', '', '\n four\tfive']
So, if you want to strip whitespace while splitting a string on a non-whitespace delimiter, use this construct:
words = [item.strip() for item in string.split(',')]
When a multi-character string is passed as the delimiter, it is taken as a single delimiter and not as a character class or a set of delimiters.
Example:
>>> "one,two,three,,four".split(',,')
['one,two,three', 'four']
To coalesce multiple delimiters into one, you would need to use re.split(regex, string) approach. See the related posts below.
Related
string.split() - Python documentation
re.split() - Python documentation
Split string based on regex
Split string based on a regular expression
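As an illustration of the re.split approach mentioned above: to treat any run of commas and semicolons as a single delimiter, a character class with `+` does the coalescing, and a filter drops the empty strings left by leading or trailing delimiters. A sketch; `split_coalesced` and its default delimiter set are illustrative choices, not part of any API:

```python
import re

def split_coalesced(text, delimiters=',;'):
    # [,;]+ matches one or more delimiter characters in a row,
    # so ',,' or ',;,' counts as a single split point.
    pattern = '[' + re.escape(delimiters) + ']+'
    # re.split still yields '' for a delimiter at either end; drop those.
    return [part for part in re.split(pattern, text) if part]

print(split_coalesced(',,one,two;;three,,four,'))
# ['one', 'two', 'three', 'four']
```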
# Python 2 only: strip `,` and `.` from a string
>>> 'a,b,c.'.translate(None, ',.')
'abc'
You should use the built-in translate method for strings.
Type help('abc'.translate) in the Python shell for more info.
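The two-argument form translate(None, ',.') works only in Python 2. In Python 3, str.translate takes a single table, and the characters to delete go in the third argument of str.maketrans:

```python
# Python 3: build a table that deletes ',' and '.', then apply it.
table = str.maketrans('', '', ',.')
print('a,b,c.'.translate(table))  # abc

# For the question's input, deleting the commas and calling list() gives
# the list of items (this only works when every item is one character).
print(list('a,b,c'.translate(str.maketrans('', '', ','))))  # ['a', 'b', 'c']
```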
Using functional Python (this works when every item is a single character):
text = list(filter(lambda x: x != ',', text))
Example 1
>>> email = "myemailid@gmail.com"
>>> email.split()
#OUTPUT
['myemailid@gmail.com']
Example 2
>>> email = "myemailid@gmail.com, someonsemailid@gmail.com"
>>> email.split(',')
#OUTPUT
['myemailid@gmail.com', ' someonsemailid@gmail.com']
Note the leading space on the second address; split on ', ' or strip each item if you don't want it.
