Replacing all multispaces with single spaces - python

For example
s = "a b c d e f "
Needs to be reduced to
s = "a b c d e f "
Right now I do something like this
for i in xrange(arbitrarilyHighNumber, 1, -1):
    s = s.replace(" " * i, " ")
But I want to make it more dynamic and Pythonic (and handle any number of spaces, too). How can I replace every run of contiguous spaces with a single space?

You can use re.sub:
>>> import re
>>> s = "a b c d e f "
>>> re.sub(r'\s{2,}', ' ', s)
'a b c d e f '
>>>
\s{2,} matches two or more whitespace characters.

Since the regular expression answer has already been given, you could also do it with iterative replacements:
while "  " in s:
    s = s.replace("  ", " ")
Note that my original answer of splitting and rejoining also gets rid of leading and trailing whitespace:
' '.join(s.split())
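To compare the approaches above side by side, here is a minimal sketch. The function names collapse_spaces and collapse_whitespace and the sample string are illustrative assumptions, not part of the original answers.

```python
import re

def collapse_spaces(text):
    # Collapse each run of two or more space characters into one space.
    # Tabs and newlines are left untouched.
    return re.sub(r' {2,}', ' ', text)

def collapse_whitespace(text):
    # Collapse each run of any whitespace (spaces, tabs, newlines)
    # into a single space.
    return re.sub(r'\s+', ' ', text)

sample = "a  b   c\t\td    e "  # hypothetical input
print(repr(collapse_spaces(sample)))
print(repr(collapse_whitespace(sample)))
# split/join also collapses whitespace, but drops leading/trailing runs.
print(repr(' '.join(sample.split())))
```

Which one fits depends on whether tabs and newlines should count as "spaces" and whether leading and trailing whitespace must survive.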

Related

How to add spaces in a string [duplicate]

I am using the python module, markovify. I want to make new words instead of making new sentences.
How can I make a function return an output like this?
spacer('Hello, world!') # Should return 'H e l l o ,   w o r l d !'
I tried the following:
def spacer(text):
    for i in text:
        text = text.replace(i, i + ' ')
    return text
but it returned 'H e l l o , w o r l d ! ' when I gave it 'Hello, world!'.
You can use this one.
def spacer(string):
    return ' '.join(string)
print(spacer('Hello,World'))
Or you can write it as an explicit loop:
def spacer(text):
    out = ''
    for i in text:
        out += i + ' '
    return out[:-1]
print(spacer("Hello, World"))
If you want, you can turn the same function into a custom spacer that also takes the number of spaces to put in between (default 1):
def spacer(string, space=1):
    return (space * ' ').join(string)
print(spacer('Hello,World', space=1))
Or, with the loop version and custom spacing:
def spacer(text, space=1):
    out = ''
    for i in text:
        out += i + ' ' * space
    # Drop the trailing separator (nothing to drop when space is 0)
    return out[:-space] if space > 0 else out
print(spacer("Hello, World", space=1))
Output:
H e l l o ,   W o r l d
The simplest method is probably
' '.join(string)
Since replace works on every instance of a character, you can do
s = set(string)
if ' ' in s:
    string = string.replace(' ', '  ')
    s.remove(' ')
for c in s:
    string = string.replace(c, c + ' ')
if string:
    string = string[:-1]
The issue with your original attempt is that your string contains 'o' twice and 'l' three times. Each time the loop reaches an 'l', replace adds another space after every 'l' in the string, so the spaces pile up. The same happens for 'o'.
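The pile-up described here can be reproduced on a shorter string. This is a small sketch; naive_spacer is a hypothetical name for the asker's loop, and 'abca' is an illustrative input chosen because 'a' occurs twice.

```python
def naive_spacer(text):
    # The asker's approach: the loop walks the ORIGINAL string, but each
    # replace() call rewrites every occurrence in the current string.
    for ch in text:
        text = text.replace(ch, ch + ' ')
    return text

def spacer(text):
    # join() inserts exactly one separator between neighbouring characters.
    return ' '.join(text)

print(repr(naive_spacer('abca')))  # 'a' is visited twice, so it gets two spaces
print(repr(spacer('abca')))
```

Because 'a' is visited twice, each 'a' ends up followed by two spaces, while join never touches a character more than once.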
The simplest answer to this question would be to use this:
"Hello world".replace("", " ")[1:-1]
This code reads as follows:
Replace every empty substring with a space, and then trim off the leading and trailing space.
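To see why the slice is needed, it may help to look at the intermediate result. A brief sketch with an illustrative input ('abc' is an assumption chosen for brevity):

```python
word = "abc"  # hypothetical input

# The empty string "matches" before every character and at the very end,
# so a space is inserted at each of those positions.
padded = word.replace("", " ")
print(repr(padded))

# Slicing off the first and last character removes the two extra spaces.
print(repr(padded[1:-1]))
```

One caveat of this trick is the empty string: "".replace("", " ") gives a single space, and slicing that with [1:-1] yields an empty string again, so that edge case still works out.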
print(" ".join('Hello, world!'))
Output
H e l l o ,   w o r l d !

Python, Slicing a string at intervals while keeping the spaces?

I need to select every third letter out of a sentence (starting from the first letter), and print out those letters with spaces in between them.
So it should look like this
Message? cxohawalkldflghemwnsegfaeap
c h a l l e n g e
or
Message? pbaynatnahproarnsm
p y t h o n
I've tried this:
nim = input("Line: ")[::+3]
and it works fine, but I have to keep the spaces between the letters.
Use str.join:
nim = ' '.join(input("Line: ")[::3])
# Line: pbaynatnahproarnsm
print(nim)
Output:
p y t h o n
If you just want to print the letters of the sentence with spaces between them, you can use the sep= parameter of print() together with the unpacking asterisk *:
print(*input("Line: ")[::3], sep=' ')
Prints:
Line: cxohawalkldflghemwnsegfaeap
c h a l l e n g e
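The two answers above can be wrapped into a small reusable helper. This is only a sketch; the name every_nth and its step parameter are assumptions introduced for illustration.

```python
def every_nth(message, step=3):
    # message[::step] keeps the characters at indexes 0, step, 2*step, ...
    # and join() puts a single space between the kept characters.
    return ' '.join(message[::step])

print(every_nth("cxohawalkldflghemwnsegfaeap"))
print(every_nth("pbaynatnahproarnsm"))
```

Keeping the slicing and the joining in one function makes it easy to test without calling input().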

Python - print split line starting at the n-th element

I have a string as the result of a line.split from a file.
How can I write this string to another file starting from the 5th element?
EDIT:
I got this and it works:
for line in data.readlines():
    if not line.startswith("H"):
        s = line.split()
        finaldata.write(" ".join(s[5:]))
        finaldata.write("\n")
The only problem is that I have some empty "cells" in my input, and that is messing up the output (shifting the data to the left wherever the original input has a blank).
How can I handle this?
Thanks!
To answer the original question: if you know the position of the element, you can slice the string. string[5:] gives everything from index 5 (the sixth character, since indexing starts at 0) to the end of the line. Slicing has a pretty basic syntax; let's say you have a string
a = "a b c d e f g h"
You can slice a from index 5 like this
>>> a[5:]
' d e f g h'
The slicing syntax is [start:end:step], so [5:] says start at index 5 and include the rest. There are many more examples in "Understanding Python's slice notation".
For the second question, it isn't exactly clear what you're trying to achieve. Here are some examples of common string manipulations with inline comments:
>>> a = "a b c d e f g h"
>>> a[5] # Access the character at index 5
' '
>>> a[5:] # Slice the string from index 5 to the end
' d e f g h'
>>> a[5::2] # Get every other character from index 5 to the end
'     '
>>> a[6::2] # Get every other character from index 6 to the end
'defgh'
# Use a list comprehension to remove all spaces from a string
>>> "".join([char for char in a if char != " "])
'abcdefgh'
# remove all spaces and print from the fifth character
>>> "".join([char for char in a if char != " "])[5:]
'fgh'
>>> a.strip(" ") # Strip spaces from the beginning and end
'a b c d e f g h'
>>> a[5:].strip(" ") # slice and strip spaces from both sides
'd e f g h'
>>> a[5:].lstrip(" ") # slice and use a left strip
'd e f g h'
Edit: To add in a comment from another user: if you know the character rather than the position, you can slice from that. If the character occurs more than once, though, be careful, because index() returns the first occurrence.
>>> a[a.index("e"):] # Slice from the index of character
'e f g h'
>>> b = "a e b c d e f g h e"
>>> b[b.index("e"):]
'e b c d e f g h e'
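On the empty "cells" problem from the question: split() with no argument treats any run of whitespace as one separator, so blank fields vanish and later fields shift left. A possible sketch follows, assuming the input is tab-delimited (the question does not say which delimiter is used, so the sample line is hypothetical).

```python
# Hypothetical tab-delimited line whose third field is empty.
line = "a\tb\t\td\te\tf\tg\n"

# No argument: runs of whitespace collapse, the empty field is lost.
loose = line.split()
print(loose[5:])

# Explicit delimiter: empty fields are kept as '' and positions stay stable.
strict = line.rstrip("\n").split("\t")
print(strict[5:])
```

If the file uses fixed-width columns instead of a delimiter, slicing each line by character position would be the analogous approach.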

How to copy spaces from one string to another in Python?

I need a way to copy all of the positions of the spaces of one string to another string that has no spaces.
For example:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
output = "ESTD TD L ATPNP ZQ EPIE"
Build a list that takes the next character from string2 for every non-space in string1 and a space otherwise, then concatenate it with str.join.
it = iter(string2)
output = ''.join(
    [next(it) if not c.isspace() else ' ' for c in string1]
)
print(output)
ESTD TD L ATPNP ZQ EPIE
This is efficient as it avoids repeated string concatenation.
You need to iterate over the indexes and characters in string1 using enumerate().
On each iteration, if the character is a space, add a space to the output string; otherwise add the character of string2 at that index, minus the number of spaces seen so far. Note that this is inefficient: strings are immutable, so every concatenation creates a new object.
So that code would look like:
output = ''
si = 0
for i, c in enumerate(string1):
    if c == ' ':
        si += 1
        output += ' '
    else:
        output += string2[i - si]
However, it would be more efficient to use a very similar method, but with a generator and then str.join. This removes the slow concatenations to the output string:
def chars(s1, s2):
    si = 0
    for i, c in enumerate(s1):
        if c == ' ':
            si += 1
            yield ' '
        else:
            yield s2[i - si]
output = ''.join(chars(string1, string2))
You can try the list insert method:
string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
string3 = list(string2)
for j, i in enumerate(string1):
    if i == ' ':
        string3.insert(j, ' ')
print("".join(string3))
Output:
ESTD TD L ATPNP ZQ EPIE
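The iterator-based idea from the answers above can be packaged as a function. This is a sketch only: copy_spaces is a hypothetical name, and using next(it, '') so that a too-short second string does not raise StopIteration is an added assumption about the desired behaviour.

```python
def copy_spaces(template, letters):
    # Walk the template; emit a space where it has one, otherwise take the
    # next unused character from letters ('' once letters is exhausted).
    it = iter(letters)
    return ''.join(' ' if c == ' ' else next(it, '') for c in template)

string1 = "This is a piece of text"
string2 = "ESTDTDLATPNPZQEPIE"
print(copy_spaces(string1, string2))
```

Characters left over in the second string after the template ends are simply dropped in this version.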

re to find longest matching postfix of two strings

I have two strings like:
a = '54515923333558964'
b = '48596478923333558964'
Now the longest postfix match is
c = '923333558964'
What would be a solution using re?
Here is a solution I found for prefix match:
import re
pattern = re.compile(r"(?P<mt>\S*)\S*\s+(?P=mt)")
a = '923333221486456'
b = '923333221486234567'
c = pattern.match(a + ' ' + b).group('mt')
Try the difflib.SequenceMatcher:
import difflib
a = '54515923333558964'
b = '48596478923333558964'
s = difflib.SequenceMatcher(None, a, b)
m = s.find_longest_match(0, len(a), 0, len(b))
print(a[m.a:m.a + m.size])
You can use this variation of the regex pattern:
\S*?(?P<mt>\S*)\s+\S*(?P=mt)$
Edit:
Note, however, that this may require O(n^3) time with some inputs. Try e.g.
a = 1000 * 'a'
b = 1000 * 'a' + 'b'
This takes one second to process on my system.
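Given that worst-case cost, a plain loop may be worth considering when a regular expression is not strictly required. The sketch below is a non-regex alternative, not the approach asked for in the question; common_suffix is a hypothetical name.

```python
def common_suffix(a, b):
    # Compare the two strings from the end and count matching characters.
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    # The last n characters of a (empty string when n is 0).
    return a[len(a) - n:]

a = '54515923333558964'
b = '48596478923333558964'
print(common_suffix(a, b))
```

This runs in time linear in the length of the shorter string, including on the repeated-character input shown above.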
