I need a way to copy all of the positions of the spaces of one string to another string that has no spaces.
For example:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
output = "ESTD TD L ATPNP ZQ EPIE"
Build the output character by character: take the next character from string2 wherever string1 has a non-space, emit a space otherwise, and join the pieces with str.join.
it = iter(string2)
output = ''.join(
    [next(it) if not c.isspace() else ' ' for c in string1]
)
print(output)
ESTD TD L ATPNP ZQ EPIE
This is efficient as it avoids repeated string concatenation.
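As a minimal sketch, the same iterator idea can be wrapped in a reusable function. The name copy_spaces and the choice to stop silently when the second string runs out of characters are assumptions for illustration, not part of the original answer.

```python
def copy_spaces(template, letters):
    """Lay out `letters` using the whitespace positions of `template`."""
    it = iter(letters)
    # next(it, '') returns '' once `letters` is exhausted instead of raising
    # StopIteration (an assumed policy; raising may suit other use cases).
    return ''.join(' ' if c.isspace() else next(it, '') for c in template)


string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
output = copy_spaces(string1, string2)
print(output)
```

The default passed to next() matters here: inside a generator expression, a bare next(it) that runs dry would surface as a RuntimeError.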
You need to iterate over the indexes and characters in string1 using enumerate().
On each iteration, if the character is a space, add a space to the output string and count it; otherwise add the character from string2 at that index minus the number of spaces seen so far. Note that building the string with += is inefficient: strings are immutable, so each concatenation creates a new object.
So that code would look like:
output = ''
si = 0
for i, c in enumerate(string1):
    if c == ' ':
        si += 1
        output += ' '
    else:
        output += string2[i - si]
However, it would be more efficient to use a very similar method, but with a generator and then str.join. This removes the slow concatenations to the output string:
def chars(s1, s2):
    si = 0
    for i, c in enumerate(s1):
        if c == ' ':
            si += 1
            yield ' '
        else:
            yield s2[i - si]
output = ''.join(chars(string1, string2))
You can try the list insert method:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
string3 = list(string2)
for j, i in enumerate(string1):
    if i == ' ':
        string3.insert(j, ' ')
print("".join(string3))
Output:
ESTD TD L ATPNP ZQ EPIE
Say I have an incoming string that varies a little:
" 1 |r|=1.2e10 |v|=2.4e10"
" 12 |r|=-2.3e10 |v|=3.5e-04"
"134 |r|= 3.2e10 |v|=4.3e05"
I need to extract the numbers (i.e. 1.2e10, 3.5e-04, etc.), so I would like to start at the end of '|r|' and grab all characters up to the ' ' (space) after it. Same for '|v|'.
I've been looking for something that would:
Extract a substring from a string starting at an index and ending on a specific character...
But have not found anything remotely close.
Ideas?
NOTE: Added new scenario, which is the one that is causing lots of head-scratching...
To keep it elegant and generic, let's use split:
First, we split on ' |' to get tokens.
Then, for each token that has an equals sign, we parse the key and the value.
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
parts = sabich.split(' |')
values = {}
for p in parts:
    if '=' in p:
        k, v = p.split('=')
        values[k.replace('|', '').strip()] = v.strip(' ')
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
This can be condensed into a single dict comprehension:
sabich = "134 |r| = 3.2e10 |v|=4.3e05"
values = {t[0].replace('|', '').strip(): t[1].strip(' ') for t in [tuple(p.split('=')) for p in sabich.split(' |') if '=' in p]}
# {'r': '3.2e10', 'v': '4.3e05'}
print(values)
You can solve it with a regular expression.
import re
strings = [
    " 1 |r|=1.2e10 |v|=2.4e10",
    " 12 |r|=-2.3e10 |v|=3.5e-04"
]
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    out.append(dict(re.findall(pattern, s)))
print(out)
Output
[{'|r|': '1.2e10', '|v|': '2.4e10'}, {'|r|': '-2.3e10', '|v|': '3.5e-04'}]
And if you want to convert the strings to numbers:
out = []
pattern = r'(?P<name>\|[\w]+\|)=(?P<value>-?\d+(?:\.\d*)(?:e-?\d*)?)'
for s in strings:
    # out.append(dict(re.findall(pattern, s)))
    out.append({
        name: float(value)
        for name, value in re.findall(pattern, s)
    })
print(out)
Output
[{'|r|': 12000000000.0, '|v|': 24000000000.0}, {'|r|': -23000000000.0, '|v|': 0.00035}]
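The pattern above expects the number to follow the '=' immediately, so the scenario with a space after the equals sign ("|r|= 3.2e10") would not match. A possible variation, sketched here under the assumption that optional whitespace around '=' and an optional fractional part are acceptable, is shown below. The names PATTERN and extract_values are illustrative only.

```python
import re

# Assumed pattern: |name|, optional spaces around '=', then a float with an
# optional sign, optional fraction and optional exponent.
PATTERN = re.compile(r'\|(\w+)\|\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)')


def extract_values(line):
    """Return {name: float} for every |name|=value pair found in `line`."""
    return {name: float(value) for name, value in PATTERN.findall(line)}


lines = [
    " 1 |r|=1.2e10 |v|=2.4e10",
    " 12 |r|=-2.3e10 |v|=3.5e-04",
    "134 |r|= 3.2e10 |v|=4.3e05",
]
parsed = [extract_values(line) for line in lines]
print(parsed)
```

Unlike the answer above, this sketch stores the keys without the surrounding pipes ('r' rather than '|r|'); adjust the first capture group if the pipes should be kept.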
I have the following string:
string1 = "1/0/1/A1,A2"
string2 = "1/1/A1,A2"
string3 = "0/A1,A2"
In the above strings I have to insert a zero wherever a number is missing. The default structure is "number/number/number/any_character"; if any of the numbers is missing, it has to be replaced with zero. The expected results are as follows.
print(string1)  # "1/0/1/A1,A2"
print(string2)  # "1/1/0/A1,A2"
print(string3)  # "0/0/0/A1,A2"
You can use str.split:
def pad_string(_input, _add='0'):
    *_vals, _str = _input.split('/')
    return '/'.join([*_vals, *([_add]*(3-len(_vals))), _str])
results = list(map(pad_string, ['1/0/1/A1,A2', '1/1/A1,A2', '0/A1,A2']))
Output:
['1/0/1/A1,A2', '1/1/0/A1,A2', '0/0/0/A1,A2']
You can easily fill missing elements from the left:
def fillZeros(item):
    chunks = item.split('/')
    for inserts in range(0, 4 - len(chunks)):
        chunks.insert(0, '0')
    return '/'.join(chunks)
string1 = "1/0/1/A1,A2"
string2 = "1/1/A1,A2"
string3 = "0/A1,A2"
for myString in (string1, string2, string3):
    print(fillZeros(myString))
Prints:
1/0/1/A1,A2
0/1/1/A1,A2
0/0/0/A1,A2
But for your string2 example, 1/1/A1,A2, you need to identify which element is missing: is it the first or the third?
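Since the input alone cannot say which number is missing, one option is to make the padding side an explicit choice. The following is only a sketch: pad_fields and its count, fill and side parameters are hypothetical names, and it assumes the trailing field never contains a '/'.

```python
def pad_fields(value, count=3, fill='0', side='right'):
    """Pad the numeric fields of 'n/n/n/tail' up to `count` entries."""
    *numbers, tail = value.split('/')
    # A negative difference yields an empty list, so complete inputs pass through.
    padding = [fill] * (count - len(numbers))
    if side == 'right':
        numbers = numbers + padding   # missing numbers assumed to be the last ones
    else:
        numbers = padding + numbers   # missing numbers assumed to be the first ones
    return '/'.join(numbers + [tail])


for item in ("1/0/1/A1,A2", "1/1/A1,A2", "0/A1,A2"):
    print(pad_fields(item), pad_fields(item, side='left'))
```

With side='right' this reproduces the results the question asks for; side='left' gives the left-filled variant.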
If you want to use just string manipulation and loops, try this. Note that it only recognizes the single digits 0 and 1.
strings_list = [string1, string2, string3]  # make list containing all strings
new_strings = []  # make list containing the new strings
for string in strings_list:
    numbers = string.count("0/") + string.count("1/")
    if numbers == 3:
        # identify the strings not missing a number
        new_strings.append(string)
    elif numbers == 2:
        # identify the strings missing 1 number
        new_strings.append(string[:4] + "0/" + string[4:])
    elif numbers == 1:
        # identify the strings missing 2 numbers, so insert two zeros
        new_strings.append(string[:2] + "0/0/" + string[2:])
print(new_strings)
This results in ['1/0/1/A1,A2', '1/1/0/A1,A2', '0/0/0/A1,A2'].
import re
string = "is2 Thi1s T4est 3a"
def order(sentence):
    res = ''
    count = 1
    list = sentence.split()
    for i in list:
        for i in list:
            a = re.findall('\d+', i)
            if a == [str(count)]:
                res += " ".join(i)
                count += 1
    print(res)
order(string)
The code above is what I have a problem with. The output I should get is:
"Thi1s is2 3a T4est"
Instead I'm getting the correct order but with spaces in the wrong places:
"T h i 1 si s 23 aT 4 e s t"
Any idea how to make it work with this code concept?
You are joining the characters of each word:
>>> " ".join('Thi1s')
'T h i 1 s'
You want to collect your words into a list and join that instead:
def order(sentence):
    number_words = []
    count = 1
    words = sentence.split()
    for word in words:
        for word in words:
            matches = re.findall(r'\d+', word)
            if matches == [str(count)]:
                number_words.append(word)
                count += 1
    result = ' '.join(number_words)
    print(result)
I used more verbose and clear variable names. I also removed the list variable; don't use list as a variable name if you can avoid it, as that masks the built-in list name.
What you implemented comes down to an O(N^2) (quadratic time) sort. You could instead use the built-in sorted() function to bring this to O(N log N); you'd extract the digits and sort on their integer value:
def order(sentence):
    digit = re.compile(r'\d+')
    return ' '.join(
        sorted(sentence.split(),
               key=lambda w: int(digit.search(w).group())))
This differs a little from your version in that it'll only look at the first (consecutive) digits, it doesn't care about the numbers being sequential, and will break for words without digits. It also uses a return to give the result to the caller rather than print. Just use print(order(string)) to print the return value.
If you assume the words are numbered consecutively starting at 1, then you can sort them in O(N) time even:
def order(sentence):
    digit = re.compile(r'\d+')
    words = sentence.split()
    result = [None] * len(words)
    for word in words:
        index = int(digit.search(word).group())
        result[index - 1] = word
    return ' '.join(result)
This works by creating a list of the same length, then using the digits from each word to put the word into the correct index (minus 1, as Python lists start at 0, not 1).
I think the bug is simply in the misuse of join(). You want to concatenate the current sorted string. i is simply a token, hence simply add it to the end of the string. Code untested.
import re
string = "is2 Thi1s T4est 3a"
def order(sentence):
    res = ''
    count = 1
    list = sentence.split()
    for i in list:
        for i in list:
            a = re.findall(r'\d+', i)
            if a == [str(count)]:
                res = res + " " + i  # your bug was here: append the word, not its characters
                count += 1
    print(res.strip())  # strip the leading space added before the first word
order(string)
Let
s = 'hello you blablablbalba qyosud'
i = 17
How to get the word around position i? i.e. blablablbalba in my example.
I was thinking about this, but it seems unpythonic:
for j, c in enumerate(s):
    if c == ' ':
        if j < i:
            start = j
        else:
            end = j
            break
print(start, end)
print(s[start+1:end])
Here is another simple approach with regex,
import re
s = 'hello you blablablbalba qyosud'
i = 17
string_at_i = re.findall(r"(\w+)", s[i:])[0]
print(re.findall(r"\w*%s\w*" % string_at_i, s))
Updated: the previous pattern failed when there was a space. The current pattern takes care of it.
To answer your first question,
p = s[0 : i].rfind(' ')
Output: 9
For your second question,
s[ p + 1 : (s[p + 1 : ].find(' ') + p + 1) ]
Output: 'blablablbalba'
Description:
Extract the string from the starting to the ith position.
Find the index of the last occurrence of space. This will be your starting point for your required word (the second question).
Go from here to the next occurrence of space and extract the word in between.
The following consolidated code also handles the case where the word is the last one in the string, because a trailing space is appended first:
s = s + ' '
p = s[0 : i].rfind(' ')
s[ p + 1 : (s[p + 1 : ].find(' ') + p + 1) ]
You can split the string on spaces, then count the spaces before position i; that count is the index of the word in the split list.
Solution:
print (s.split()[s[:i].count(" ")])
EDIT:
If we have more than one space between words and we want to consider two spaces (or more) as one space we can do:
print (s.split()[" ".join(s[:i].split()).count(" ")])
Output:
blablablbalba
Explanation:
This returns 2, as there are two spaces before index 17.
s[:i].count(" ")  # returns 2
This returns a list split on whitespace.
s.split()
['hello', 'you', 'blablablbalba', 'qyosud']
What you need is the index of the relevant item, which you got from s[:i].count(" ").
def func(s, i):
    s1 = s[0:i]
    k = s1.rfind(' ')
    pos1 = k
    s1 = s[k+1:len(s)]
    k = s1.find(' ')
    k = s[pos1+1:pos1+k+1]
    return k
s = 'hello you blablablbalba qyosud'
i = 17
k = func(s, i)
print(k)
output:
blablablbalba
You can use find to get the index of the next space, starting from a given position (index works too, but it raises ValueError when nothing is found, whereas find returns -1). Here it looks for a space starting from start_index + 1; if it finds one, it returns the word between start_index and that space.
s = 'hello you blablablbalba qyosud'
def get_word(my_string, start_index):
    # find returns -1 when there is no later space
    end = my_string.find(' ', start_index + 1)
    return my_string[start_index:end] if end != -1 else None
# 10 is assumed here: the index of the first character of the word
print(get_word(s, 10))
Output: blablablbalba
You can use rfind to search for the previous whitespace including s[i]:
>>> s = 'hello you blablablbalba qyosud'
>>> i = 17
>>> start = s.rfind(' ', 0, i + 1)
>>> start
9
Then you can use find to search the following whitespace again including s[i]:
>>> end = s.find(' ', i)
>>> end
23
And finally use slice to generate the word:
>>> s[start+1:(end if end != -1 else None)]
'blablablbalba'
The above yields the word when s[i] is not whitespace. When s[i] is whitespace, the result is an empty string.
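The rfind/find steps above can be collected into one helper. This is a sketch rather than a definitive implementation; the name word_at is an assumption, and only the plain space character is treated as a separator, as in the session above.

```python
def word_at(text, index):
    """Return the space-delimited word covering text[index] ('' on a space)."""
    start = text.rfind(' ', 0, index + 1)  # -1 when no space precedes the word
    end = text.find(' ', index)            # -1 when no space follows the word
    if end == -1:
        end = len(text)
    return text[start + 1:end]


s = 'hello you blablablbalba qyosud'
print(word_at(s, 17))
```

Because rfind returns -1 when nothing is found, start + 1 conveniently becomes 0 for the first word, so no special case is needed there.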
I have some strings from which I want to delete some unwanted characters.
For example: Adam'sApple ----> AdamsApple (case insensitive).
Can someone help me? I need the fastest way to do it, because I have a couple of million records that have to be polished.
Thanks
One simple way:
>>> s = "Adam'sApple"
>>> x = s.replace("'", "")
>>> print(x)
AdamsApple
... or take a look at regex substitutions.
Here is a function that removes all the irritating ascii characters, the only exception is "&" which is replaced with "and". I use it to police a filesystem and ensure that all of the files adhere to the file naming scheme I insist everyone uses.
def cleanString(incomingString):
    newstring = incomingString
    newstring = newstring.replace("!","")
    newstring = newstring.replace("#","")
    newstring = newstring.replace("$","")
    newstring = newstring.replace("%","")
    newstring = newstring.replace("^","")
    newstring = newstring.replace("&","and")
    newstring = newstring.replace("*","")
    newstring = newstring.replace("(","")
    newstring = newstring.replace(")","")
    newstring = newstring.replace("+","")
    newstring = newstring.replace("=","")
    newstring = newstring.replace("?","")
    newstring = newstring.replace("\'","")
    newstring = newstring.replace("\"","")
    newstring = newstring.replace("{","")
    newstring = newstring.replace("}","")
    newstring = newstring.replace("[","")
    newstring = newstring.replace("]","")
    newstring = newstring.replace("<","")
    newstring = newstring.replace(">","")
    newstring = newstring.replace("~","")
    newstring = newstring.replace("`","")
    newstring = newstring.replace(":","")
    newstring = newstring.replace(";","")
    newstring = newstring.replace("|","")
    newstring = newstring.replace("\\","")
    newstring = newstring.replace("/","")
    return newstring
Any characters in the 2nd argument of the translate method are deleted:
>>> "Adam's Apple!".translate(None,"'!")
'Adams Apple'
NOTE: this is the Python 2 str.translate signature. It requires Python 2.6 or later to use None for the first argument, which otherwise must be a translation string of length 256. string.maketrans('','') can be used in place of None for pre-2.6 versions.
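In Python 3, str.translate takes a single table argument, and str.maketrans builds that table; the third argument of str.maketrans lists characters to delete. A rough equivalent of the call above might look like this (the names DELETE_TABLE, REPLACE_TABLE and strip_chars are illustrative assumptions):

```python
# Python 3: characters in the third argument of str.maketrans map to None,
# which makes translate() drop them.
DELETE_TABLE = str.maketrans('', '', "'!")

# A dict form can mix deletions (None) with replacements (strings).
REPLACE_TABLE = str.maketrans({'&': 'and', '!': None})


def strip_chars(text, table=DELETE_TABLE):
    """Apply a translation table built once and reused for every record."""
    return text.translate(table)


cleaned = strip_chars("Adam's Apple!")
print(cleaned)
print(strip_chars("Tom & Jerry!", REPLACE_TABLE))
```

Building the table once and reusing it keeps the per-string work to a single translate call, which is relevant when there are millions of records to process.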
Try:
"Adam'sApple".replace("'", '')
One step further, to replace multiple characters with nothing:
import re
print(re.sub(r'''['"x]''', '', '''a'"xb'''))
Yields:
ab
s.replace("'", "")
As has been pointed out several times now, you have to use either replace or regular expressions (most likely you don't need regexes, though). But if you also have to make sure that the resulting string is plain ASCII (doesn't contain funky characters like é, ò, µ, æ or φ), you could finally do
>>> u'(like é, ò, µ, æ or φ)'.encode('ascii', 'ignore')
b'(like , , ,  or )'
In Python 3 the result is a bytes object; call .decode('ascii') on it to get a str back.
An alternative that will take in a string and an array of unwanted chars
# function that removes unwanted signs from a string
# Pass the string to the function and an array of unwanted chars
def removeSigns(text, arrayOfChars):
    charFound = False
    newstr = ""
    for letter in text:
        for char in arrayOfChars:
            if letter == char:
                charFound = True
                break
        if charFound == False:
            newstr += letter
        charFound = False
    return newstr
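The same filtering can be written more compactly with a set lookup and str.join, which avoids both the inner loop and the repeated string concatenation. This is a sketch of that idea; remove_chars is a hypothetical name.

```python
def remove_chars(text, unwanted):
    """Return `text` without any character found in `unwanted`."""
    unwanted = set(unwanted)  # set membership tests are O(1) on average
    return ''.join(ch for ch in text if ch not in unwanted)


print(remove_chars("Adam's Apple!", ["'", "!"]))
```

The unwanted argument may be a list, a tuple or a plain string of characters, since set() accepts any iterable.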
Let's say we have the following list:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'south carolina##', 'West virginia?']
Now we will define a function clean_strings():
import re
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result
When we call the function clean_strings(states), the result will look like:
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'South Carolina',
 'West Virginia']
I am probably late with this answer, but I think the code below would also do (as an extreme measure).
It will remove all the unnecessary characters:
import re
a = '; niraj kale 984wywn on 2/2/2017'
a = re.sub('[^a-zA-Z0-9.?]', ' ', a)
a = a.replace('  ', ' ').strip()  # collapse double spaces, then trim both ends
which will give
'niraj kale 984wywn on 2 2 2017'
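A single str.replace pass only halves each run of spaces, so longer runs can survive it. A variation that collapses runs of any length with a second regular expression is sketched below; the function name squeeze and the sample input with extra spaces are assumptions for illustration.

```python
import re


def squeeze(text):
    """Blank out unwanted characters, then collapse whitespace runs."""
    # Keep letters, digits, '.' and '?'; everything else becomes a space.
    text = re.sub(r'[^a-zA-Z0-9.?]', ' ', text)
    # Collapse each run of whitespace into one space and trim the ends.
    return re.sub(r'\s+', ' ', text).strip()


result = squeeze(';  niraj kale   984wywn on 2/2/2017')
print(result)
```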