I have a string and 2 arrays like below:
st="a1b2c3d"
arr1 = ['1','2','3']
arr2 = ['X','Y','Z']
I want to replace every occurrence of '1', '2', '3' with 'X', 'Y', 'Z' respectively. The final string should look like:
'aXbYcZd'
So I wrote this for loop:
for i in range(0, len(arr1)):
    st.replace(str(arr1[i]),str(arr2[i]))
The result is:
'aXb2c3d'
'a1bYc3d'
'a1b2cZd'
How can I do this correctly?
Thanks!
Use zip() to iterate through two lists simultaneously to replace values:
st = "a1b2c3d"
arr1 = ['1','2','3']
arr2 = ['X','Y','Z']
for x, y in zip(arr1, arr2):
    st = st.replace(x, y)
print(st)
# aXbYcZd
str.replace() does not modify the string in place (strings are immutable); it returns a new string, so you need to assign the returned value back to a variable.
If you're replacing characters, instead of the inefficient replace loop use str.translate with str.maketrans:
>>> table = str.maketrans('123', 'XYZ')
>>> result = 'a1b2c3d'.translate(table)
>>> result
'aXbYcZd'
When called with two arguments, maketrans requires two strings of equal length. If you really have lists, you can use ''.join(l) to turn each one into a suitable string. You only need to build the table once.
The efficiency is but one point. str.translate is the way to do this correctly in cases where you will map a => b and b => something else. If you want to replace strings then you might need to use re.sub instead.
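To illustrate the point about overlapping mappings, here is a small sketch. The cyclic mapping (1 to 2, 2 to 3, 3 to 1) and the word table are made up for the illustration and are not from the question; the re.sub part is just one possible way to do a single-pass replacement of longer substrings.

```python
import re

st = "a1b2c3d"
# Hypothetical cyclic mapping: '1' -> '2', '2' -> '3', '3' -> '1'
src = ['1', '2', '3']
dst = ['2', '3', '1']

# Chained replace: later passes rewrite the output of earlier ones
chained = st
for old, new in zip(src, dst):
    chained = chained.replace(old, new)

# translate: each original character is mapped exactly once
translated = st.translate(str.maketrans(''.join(src), ''.join(dst)))

# For multi-character substrings, a single re.sub pass with a lookup
# behaves similarly (hypothetical word mapping)
words = {'cat': 'dog', 'dog': 'cat'}
pattern = re.compile('|'.join(re.escape(w) for w in words))
swapped = pattern.sub(lambda m: words[m.group(0)], 'cat chases dog')

print(chained)     # a1b1c1d
print(translated)  # a2b3c1d
print(swapped)     # dog chases cat
```

The chained loop ends up mapping everything to '1' because each pass sees the previous pass's output, while translate and the single re.sub pass only ever look at the original text.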
Calling replace over and over means you have to iterate through the entire string for each replacement, which is O(m * n). Instead:
rep = dict(zip(arr1, arr2)) # make mapping, O(m)
result = ''.join(rep.get(ch, ch) for ch in st)
The first line is O(m), where m is the length of arr1 and arr2.
The second line is O(n), where n is the length of st.
In total this is O(m + n) instead of O(m * n), which is a significant win if either m or n is large.
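As a rough sketch, the two lines above can be wrapped in a helper; the name replace_chars is hypothetical. Note that this approach looks at one character at a time, so it only applies when every key in the mapping is a single character.

```python
def replace_chars(text, old_chars, new_chars):
    """Replace single characters in one pass over text (hypothetical helper)."""
    mapping = dict(zip(old_chars, new_chars))             # O(m)
    return ''.join(mapping.get(ch, ch) for ch in text)    # O(n)

result = replace_chars("a1b2c3d", ['1', '2', '3'], ['X', 'Y', 'Z'])
print(result)  # aXbYcZd

# A multi-character key is never matched, because lookups are per character
unchanged = replace_chars("a12b", ['12'], ['X'])
print(unchanged)  # a12b
```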
I want to multiply a list of lists by a list using Python 3. Suppose the list of lists is named L, as follows:
L = [[45.909221207388235, 84.41069326628269], [80.6591435966521, 47.93257841035172]]
and the second list is:
S = [0.002, 0.001]
the multiplication should be: L[0][0]* S[0] and L[0][1]* S[0] then L[1][0]* S[1] and L[1][1]* S[1].
I tried the zip method :
[a*b for x,y in zip(S,L) for a,b in zip(x,y)]
But an error appears: zip argument 1 must support iteration.
My second attempt used map with a lambda:
map(lambda x,y:x*y,L,S)
but the obtained results were wrong:
[9.181844241477647e-05, 0.00016882138653256538, 0.0001613182871933042, 9.586515682070343e-05]
the correct values are:
[0.09181844241477648, 0.1688213865325654, 0.0806591435966521, 0.047932578410351714]
You want to use zip, but not twice:
>>> L = [[45.909221207388235, 84.41069326628269], [80.6591435966521, 47.93257841035172]]
>>> S = [0.002, 0.001]
>>> [n*x for n, sub in zip(S, L) for x in sub]
[0.09181844241477648, 0.1688213865325654, 0.0806591435966521, 0.047932578410351714]
>>>
So, you want to pair each number in S with the corresponding sublist of L, then multiply every number in that sublist by the paired number.
Note, just in case you are using numpy (I don't think you are, and I don't think it would be reasonable to use numpy just for this), and S and L are numpy.ndarray objects, i.e.:
>>> S = np.array(S)
>>> L = np.array(L)
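Then you need to reshape S into a column so that it broadcasts along the rows (a plain S*L would scale the columns instead):
>>> (S[:, None] * L).ravel()
array([0.09181844, 0.16882139, 0.08065914, 0.04793258])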
If I understand correctly, you want to multiply each row in L by the corresponding value of S:
L = [[45.909221207388235, 84.41069326628269],
[80.6591435966521, 47.93257841035172]]
S = [0.002, 0.001]
R = [ [s*n for n in row] for s,row in zip(S,L) ]
output:
print(R)
[ [0.09181844241477648, 0.1688213865325654],
[0.0806591435966521, 0.047932578410351714]]
You should have given an example with a different number of rows than columns to make this clearer.
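For instance, a sketch with made-up data that has two rows and three columns makes the row-wise pairing unambiguous (the values below are not from the question):

```python
# Hypothetical data: 2 rows, 3 columns, one scale factor per row
L = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
S = [10, 100]

# Keep the nested shape: each row is scaled by its own factor
nested = [[s * n for n in row] for s, row in zip(S, L)]

# Or flatten the result in the same pass
flat = [s * n for s, row in zip(S, L) for n in row]

print(nested)  # [[10.0, 20.0, 30.0], [400.0, 500.0, 600.0]]
print(flat)    # [10.0, 20.0, 30.0, 400.0, 500.0, 600.0]
```

Because zip pairs S with the rows of L, S must have one entry per row; a three-element S would be needed to scale the columns instead.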
I have an array I want to iterate through. The array consists of strings consisting of numbers and signs.
like this: €110.5M
I want to loop over it, remove every euro sign and every M, and return the array with the strings converted to ints.
How would I do this knowing that the array is a column in a table?
You could just strip the characters (strip only removes them from the start and end of the string, which is enough here):
>>> x = '€110.5M'
>>> x.strip('€M')
'110.5'
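def sanitize_string(ss):
    # Drop currency symbols and normalise the suffix to lower case
    ss = ss.replace('$', '').replace('€', '').lower()
    if 'm' in ss:
        res = float(ss.replace('m', '')) * 1000000
    elif 'k' in ss:
        res = float(ss.replace('k', '')) * 1000
    else:
        res = float(ss)  # no suffix: plain number
    return int(res)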
This can be applied to a list as follows:
>>> ls = [sanitize_string(x) for x in ["€3.5M", "€15.7M" , "€167M"]]
>>> ls
[3500000, 15700000, 167000000]
If you want to apply it to the column of a table instead:
dataFrame['price'] = dataFrame['price'].apply(sanitize_string) # Assuming you're using a pandas DataFrame and the column is called 'price'
You can use a list comprehension:
numbers = [float(p.replace('€','').replace('M','')) for p in a]
which gives:
[110.5, 210.5, 310.5]
You can use a list comprehension to construct one list from another:
foo = ["€13.5M", "€15M" , "€167M"]
foo_cleaned = [value.translate(str.maketrans('', '', '€M')) for value in foo]
In Python 3, str.maketrans('', '', '€M') builds a table that deletes every character listed in its third argument, and str.translate applies that table to the string.
Try this
arr = ["€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M"]
f = [x.replace("€","").replace("M","") for x in arr]
You can call .replace() on a string as often as you like. An initial solution could be something like this:
my_array = ['€110.5M', '€111.5M', '€112.5M']
my_cleaned_array = []
for elem in my_array:
    my_cleaned_array.append(elem.replace('€', '').replace('M', ''))
At this point, you still have strings in your array. If you want to return them as ints, note that int('110.5') raises a ValueError, so convert via float first: int(float(elem.replace('€', '').replace('M', ''))). But be aware that you will then lose everything after the decimal point, i.e. you will end up with [110, 111, 112].
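A minimal sketch of the conversion step follows. It assumes, which the question does not confirm, that the trailing M stands for millions; if so, scaling before converting to int avoids losing the fractional part.

```python
my_array = ['€110.5M', '€111.5M', '€112.5M']

# Strip the symbols from both ends, then parse as float
as_floats = [float(elem.strip('€M')) for elem in my_array]

# Plain truncation drops everything after the decimal point
truncated = [int(value) for value in as_floats]

# Assumption: 'M' means millions, so scale first and then convert
in_euros = [int(round(value * 1_000_000)) for value in as_floats]

print(as_floats)  # [110.5, 111.5, 112.5]
print(truncated)  # [110, 111, 112]
print(in_euros)   # [110500000, 111500000, 112500000]
```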
You can use Regex to do that.
import re
s = "€110.5M"  # named s to avoid shadowing the built-in str
x = re.findall(r"-?\d+\.\d+", s)  # raw string keeps the backslashes intact
print(x)
I didn't quite understand the second part of the question.
Aim
I would like to generate a sequence as list in python, such as:
['s1a', 's1b', 's2a', 's2b', ..., 's10a', 's10b']
Properties:
items contain a single prefix
numbers are sorted numerically
suffix is alternating per number
Approach
To get this, I used the following code, based on xrange and a list comprehension:
# prefix
p = 's'
# suffix
s = ['a', 'b']
# numbers
n = [ i + 1 for i in list(xrange(10))]
# result
[ p + str(i) + j for i, j in zip(sorted(n * len(s)), s * len(n)) ]
Question
Is there a simpler syntax to obtain this result, e.g. using itertools?
Similar to this question?
A list comprehension with two for clauses can accomplish this:
['s'+str(x)+y for x in range(1,11) for y in 'ab']
itertools.product might be your friend:
all_combos = ["".join(map(str, x)) for x in itertools.product(p, n, s)]
returns:
['s1a', 's1b', 's2a', 's2b', 's3a', 's3b', 's4a', 's4b', 's5a', 's5b', 's6a', 's6b', 's7a', 's7b', 's8a', 's8b', 's9a', 's9b', 's10a', 's10b']
EDIT: as a one-liner:
all_combos = ["".join(map(str,x)) for x in itertools.product(['s'], range(1, 11), ['a', 'b'])]
EDIT 2: as pointed out in James' answer, the single-element lists in the product call can be replaced by plain strings, because itertools iterates over a string character by character:
all_combos = ["".join(map(str,x)) for x in itertools.product('s', range(1, 11), 'ab')]
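One caveat with passing plain strings, sketched below with a made-up two-letter prefix 'id': because product iterates over each string character by character, a multi-character prefix or suffix has to stay wrapped in a list or tuple.

```python
import itertools

# What product actually yields before joining
triples = list(itertools.product('s', range(1, 3), 'ab'))
print(triples)  # [('s', 1, 'a'), ('s', 1, 'b'), ('s', 2, 'a'), ('s', 2, 'b')]

# Hypothetical two-letter prefix passed as a bare string: split into 'i' and 'd'
split_prefix = ["".join(map(str, x)) for x in itertools.product('id', [1], 'a')]
print(split_prefix)  # ['i1a', 'd1a']

# Wrapped in a list, the prefix is treated as one item
kept_prefix = ["".join(map(str, x)) for x in itertools.product(['id'], [1], 'a')]
print(kept_prefix)  # ['id1a']
```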
How about:
def func(prefix,suffixes,size):
    k = len(suffixes)
    # n//k is integer division, so this works on both Python 2 and Python 3
    return [prefix+str(n//k+1)+suffixes[n%k] for n in range(size*k)]
# usage example:
print(func('s',['a','b'],10))
This way you can alternate as many suffixes as you want.
And of course, each one of the suffixes can be as long as you want.
You can use a double-list comprehension, where you iterate on number and suffix. You don't need to load any
Below is a lambda function that takes 3 parameters, a prefix, a number of iterations, and a list of suffixes
foo = lambda prefix,n,suffix: [prefix+str(i)+s for i in range(1, n+1) for s in suffix]
You can use it like this
foo('p',10,'abc')
Or like that, if your suffixes have more than one letter
foo('p',10,('a','bc','de'))
For maximum versatility I would do this as a generator. That way you can either create a list, or just produce the sequence items as they are needed.
Here's code that runs on Python 2 or Python 3.
def psrange(prefix, suffix, high):
    return ('%s%d%s' % (prefix, i, s) for i in range(1, 1 + high) for s in suffix)
res = list(psrange('s', ('a', 'b'), 10))
print(res)
for s in psrange('x', 'abc', 3):
    print(s)
output
['s1a', 's1b', 's2a', 's2b', 's3a', 's3b', 's4a', 's4b', 's5a', 's5b', 's6a', 's6b', 's7a', 's7b', 's8a', 's8b', 's9a', 's9b', 's10a', 's10b']
x1a
x1b
x1c
x2a
x2b
x2c
x3a
x3b
x3c
I'm trying to combine a string with a series of numbers as tuples to a list.
For example, starting with:
a = [12,23,45,67,89]
string = "John"
I want to turn that into:
tuples = [(12,'John'),(23,'John'),(45,'John'),(67,'John'),(89,'John')]
I tried:
string2 = string * len(a)
tuples = zip(a, string2)
but this returned:
tuples = [(12,'J'), (23,'o'), ...]
If you want to use zip(), then create a list for your string variable before multiplying:
string2 = [string] * len(a)
tuples = zip(a,string2)
string * len(a) creates one long string, and zip() then iterates over that to pull out individual characters. By multiplying a list instead, you get a list with len(a) separate references to the string value; iteration then gives you string each time.
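The difference can be seen directly in a short sketch (shortened input list; list() is used so the pairs print on Python 3, where zip returns an iterator):

```python
a = [12, 23, 45]
string = "John"

repeated_str = string * len(a)     # one long string
repeated_list = [string] * len(a)  # a list of three references to "John"

print(repeated_str)   # JohnJohnJohn
print(repeated_list)  # ['John', 'John', 'John']

by_char = list(zip(a, repeated_str))
by_item = list(zip(a, repeated_list))
print(by_char)  # [(12, 'J'), (23, 'o'), (45, 'h')]
print(by_item)  # [(12, 'John'), (23, 'John'), (45, 'John')]
```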
You could also use itertools.repeat() to give you string repeatedly:
from itertools import repeat
tuples = zip(a, repeat(string))
This avoids creating a new list object, potentially quite large.
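As a small runnable sketch of the repeat() variant: repeat(string) never ends on its own, but zip stops as soon as its shortest input, here a, is exhausted. On Python 3 the result is wrapped in list() to materialise it.

```python
from itertools import repeat

a = [12, 23, 45, 67, 89]
string = "John"

# repeat(string) yields "John" forever; zip ends when `a` runs out
tuples = list(zip(a, repeat(string)))
print(tuples)
# [(12, 'John'), (23, 'John'), (45, 'John'), (67, 'John'), (89, 'John')]
```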
>>> a = [12,23,45,67,89]
>>> string = "John"
>>> my_tuple = [(i,string) for i in a]
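>>> print(my_tuple)
[(12, 'John'), (23, 'John'), (45, 'John'), (67, 'John'), (89, 'John')]
A string is itself iterable, one character at a time, so zip(a, string * len(a)) paired each number with a single character, which is the behavior you were seeing previously.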
I have a string and I need to generate a list of the lengths of all the sub-strings terminating in a given separator.
For example: string = 'a0ddb0gf0', separator = '0', so I need to generate: lengths = [2,4,3], since len('a0')==2, len('ddb0')==4, and len('gf0')==3.
I am aware that it can be done by the following (for example):
separators = [index for index in range(len(string)) if string[index]==separator]
lengths = [separators[index+1] - separators[index] for index in range(len(separators)-1)]
But I need it to be done extremely fast (on large amounts of data). Generating an intermediate list for large amounts of data is time consuming.
Is there a solution that does this neatly and fast (py2.7)?
Fastest? Don't know. You might like to profile it.
>>> print [len(s) for s in 'a0ddb0gf0'.split('0')]
[1, 3, 2, 0]
And, if you really don't want to include zero-length strings (note that these lengths do not count the separator itself):
>>> print [len(s) for s in 'a0ddb0gf0'.split('0') if s]
[1, 3, 2]
Personally, I love itertools.groupby()
>>> from itertools import groupby
>>> sep = '0'
>>> data = 'a0ddb0gf0'
>>> [sum(1 for i in g) for (k, g) in groupby(data, sep.__ne__) if k]
[1, 3, 2]
This groups the data according to whether each element is equal to the separator, then gets the length of each group for which the element was not equal (by summing 1's for each item in the group).
itertools functions are generally quite fast, though I don't know for sure how much better than split() this is. The one point that I think is strongly in its favor is that this can seamlessly handle multiple consecutive occurrences of the separator character. It will also handle any iterable for data, not just strings.
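The two claims above, about consecutive separators and arbitrary iterables, can be checked with a short sketch. The helper name run_lengths and the sample inputs are made up for illustration; a lambda is used instead of sep.__ne__ so that the same helper also works for non-string items.

```python
from itertools import groupby

def run_lengths(data, sep):
    """Lengths of the runs of items that are not equal to sep (hypothetical helper)."""
    return [sum(1 for _ in group)
            for is_data, group in groupby(data, lambda item: item != sep)
            if is_data]

# Consecutive separators collapse into a single boundary
doubled = run_lengths('a00ddb0gf0', '0')
print(doubled)  # [1, 3, 2]

# split() reports the empty pieces between and after separators
via_split = [len(s) for s in 'a00ddb0gf0'.split('0')]
print(via_split)  # [1, 0, 3, 2, 0]

# Any iterable works, not just strings
from_list = run_lengths([1, 0, 0, 2, 3, 0], 0)
print(from_list)  # [1, 2]
```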
I don't know how fast this will go, but here's another way:
def len_pieces(s, sep):
    i = 0
    while True:
        f = s.find(sep, i)
        if f == -1:
            if i < len(s):  # trailing piece without a separator
                yield len(s) - i
            return
        yield f - i + 1  # length includes the separator
        i = f + 1
>>> import re
>>> [len(i) for i in re.findall('.+?0', 'a0ddb0gf0')]
[2, 4, 3]
You may use re.finditer to avoid an intermediary list, but it may not be much different in performance:
[len(i.group(0)) for i in re.finditer('.+?0', 'a0ddb0gf0')]
Maybe using a regular expression:
[len(m.group()) for m in re.finditer('(.*?)0', s)]