Python match whitespaces - python

I'm trying to remove multiple whitespaces in a string. I've read about regular expressions in Python and tried to make one match all the whitespace in the string, but with no success. The returned msg is empty:
import re
def correct(string):
    msg = ""
    fmatch = re.match(r'\s', string, re.I|re.L)
    if fmatch:
        msg = fmatch.group()
    return msg
print correct("This is very funny and cool.Indeed!")

To accomplish this task, you can instead replace consecutive whitespaces with a single space character, for example, using re.sub.
Example:
import re
def correct(string):
    fmatch = re.sub(r'\s+', ' ', string)
    return fmatch
print correct("This is very funny and cool.Indeed!")
The output will be:
This is very funny and cool.Indeed!

re.match matches only at the beginning of the string. You need to use re.search instead.
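A minimal sketch contrasting the three calls on a hypothetical sample string. re.match only tries position 0, re.search returns the first match found anywhere, and re.findall returns every non-overlapping match.

```python
import re

sample = "This  is   a test"  # hypothetical input with runs of spaces

# re.match is anchored at position 0; 'T' is not whitespace, so there is no match.
print(re.match(r"\s+", sample))   # None

# re.search scans the whole string but returns only the FIRST match.
first = re.search(r"\s+", sample)
print(repr(first.group()))        # '  '

# re.findall returns every non-overlapping match as a list of strings.
runs = re.findall(r"\s+", sample)
print(runs)                       # ['  ', '   ', ' ']
```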

Maybe this code helps you?
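import re
def correct(string):
    # split on runs of one or more spaces (' *' would also match the empty
    # string, which Python 3.7+ splits on), then rejoin with single spaces
    return " ".join(re.split(' +', string))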

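One line, no import needed. Note that a single pass only shortens each run of spaces (three spaces become two), so longer runs need repeated calls:
ss = "This is very funny and cool.Indeed!"
ss.replace("  ", " ")
# equivalently: ss.replace(" "*2, " ")
#'This is very funny and cool.Indeed!'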
Or, as the question states:
ss= "This is very funny and cool.Indeed!"
ss.replace(" ", "")
#'Thisisveryfunnyandcool.Indeed!'
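For comparison, here is a small sketch of the collapsing approaches from these answers, run on a hypothetical input with leading, trailing and repeated spaces. One replace("  ", " ") pass only shortens runs, so it is repeated in a loop. The " ".join(s.split()) version also trims both ends, which the re.sub version does not.

```python
import re

s = "  This is   very    funny  "  # hypothetical input

# One pass of replace() only shortens runs: 4 spaces -> 2, 3 spaces -> 2.
once = s.replace("  ", " ")

# Repeating until no double space is left fully collapses every run.
looped = s
while "  " in looped:
    looped = looped.replace("  ", " ")

# re.sub collapses every whitespace run (tabs and newlines too) in one call.
with_re = re.sub(r"\s+", " ", s)

# str.split() with no argument splits on any whitespace run and drops the ends.
with_split = " ".join(s.split())

print(repr(once))        # ' This is  very  funny '
print(repr(looped))      # ' This is very funny '
print(repr(with_re))     # ' This is very funny '
print(repr(with_split))  # 'This is very funny'
```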

Related

How to insert quotes around a string in the middle of another string

I need to change this string:
input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'
To json format:
output_str = '{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}'
Changing the equal sign "=" to a colon ":" is quite easy with the replace method:
input_str.replace("=", ":")
But I can't find a way to add quotes before and after each value / word.
I suggest surrounding with double quotes any sequence of characters that is not reserved in your markup. I also made a provision for escaped double quotes, and you can add more escaped symbols to it:
import re
input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'
output_str = re.sub(r'(([^=([\]{},\s]|\")+)', r'"\1"', input_str).replace('=', ':')
print(output_str)
Output:
{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}
You can use this function for the conversion.
def to_json(in_str):
    return (in_str.replace('{', '{"')
            .replace('=', '":"')
            .replace(',', '", "')
            .replace('}', '"}')
            .replace('" ', '"')
            .replace(':"[', ':[')
            .replace(']"', ']'))
This works correctly for the input you have mentioned.
print(to_json(input_str))
#output = {"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}
Regex is certainly more concise and efficient but, just for fun, it's also possible using replace:
input_str = input_str.replace("=", "\":\"")
input_str = input_str.replace("=[", "\":[")
input_str = input_str.replace(", ", "\", \"")
input_str = input_str.replace("{", "{\"")
input_str = input_str.replace("}", "\"}")
input_str = input_str.replace("]\"}", "]}")
input_str = input_str.replace("\"[", "[")
print(input_str) #=> '{"resourceType":"Type", "category":[{"coding":[{"system":"http://google.com", "code":"item", "display":"Item"}]}]}'
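Whichever variant you use, a cheap sanity check is to pass the result to json.loads, which raises an error if the quoting came out wrong. The sketch below reuses the regex from the first answer on the question's input and then reads one value back out of the parsed structure.

```python
import json
import re

input_str = '{resourceType=Type, category=[{coding=[{system=http://google.com, code=item, display=Item}]}]}'

# Quote every run of non-markup characters, then turn '=' into ':'.
output_str = re.sub(r'(([^=([\]{},\s]|\")+)', r'"\1"', input_str).replace('=', ':')

# json.loads raises json.JSONDecodeError (a ValueError) if this is not valid JSON.
data = json.loads(output_str)
print(data["category"][0]["coding"][0]["system"])  # http://google.com
```

Note that this regex treats whitespace as a separator, so a value containing a space (for example display=Item Name) would need a different pattern.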

Python Regex Replaces All Matches

I have a string such as "Hey people #Greetings how are we? #Awesome" and every time there is a hashtag I need to replace the word with another string.
I have the following code, which works when there is only one hashtag. The problem is that because it uses sub, which replaces all instances, it overwrites every hashtag with the last string.
match = re.findall(tagRE, content)
print(match)
for matches in match:
    print(matches)
    newCode = "The result is: " + matches + " is it correct?"
    match = re.sub(tagRE, newCode, content)
What should I be doing instead to replace just the current match? Is there a way of using re.finditer to replace the current match or another way?
Peter's method would work. You could also pass the matched text itself as the pattern, so that re.sub only replaces that text (every occurrence of it, if the same hashtag appears more than once). Like so:
newCode = "whatever" + matches + "whatever"
content = re.sub(re.escape(matches), newCode, content)  # re.escape makes the hashtag match literally
I ran some sample code and this was the output.
import re
content = "This is a #wonderful experiment. It's #awesome!"
matches = re.findall(r'#\w+', content)
print(matches)
for match in matches:
    newCode = match[1:]
    print(content)
    content = re.sub(re.escape(match), newCode, content)
    print(content)
#['#wonderful', '#awesome']
#This is a #wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's #awesome!
#This is a wonderful experiment. It's awesome!
You can try it like this:
In [1]: import re
In [2]: s = "Hey people #Greetings how are we? #Awesome"
In [3]: re.sub(r'(?:^|\s)(\#\w+)', ' replace_with_new_string', s)
Out[3]: 'Hey people replace_with_new_string how are we? replace_with_new_string'
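re.sub also accepts a function as the replacement. It is called once per match with the match object, and its return value replaces that match only. This avoids the overwrite problem from the question without looping over findall results. In the sketch below, the #(\w+) pattern and the message text are assumptions standing in for the question's tagRE and newCode.

```python
import re

tag_re = re.compile(r"#(\w+)")  # assumed stand-in for the question's tagRE
content = "Hey people #Greetings how are we? #Awesome"

def describe(match):
    # match.group(1) is the hashtag word without the leading '#'
    return "The result is: " + match.group(1) + " is it correct?"

# Each hashtag is replaced by text built from that hashtag alone.
result = tag_re.sub(describe, content)
print(result)
```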

Remove all newlines from inside a string

I'm trying to remove all newline characters from a string. I've read up on how to do it, but for some reason I am unable to. Here is step by step what I am doing:
string1 = "Hello \n World"
string2 = string1.strip('\n')
print string2
And I'm still seeing the newline character in the output. I've tried with rstrip as well, but I'm still seeing the newline. Could anyone shed some light on what I'm doing wrong? Thanks.
strip only removes characters from the beginning and end of a string. You want to use replace:
string2 = string1.replace("\n", "")
re.sub(r'\s{2,}', ' ', string2)  # collapse any run of two or more whitespace characters into one space
As mentioned by @john, the most robust answer is the following, since splitlines also handles \r and \r\n line endings:
string = "a\nb\rv"
new_string = " ".join(string.splitlines())
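The sketch below runs the approaches from these answers on a hypothetical string that mixes \n, \r\n and \r line endings. strip leaves interior newlines alone, and replace("\n", "") leaves any \r characters behind. splitlines recognises all three line endings.

```python
text = "Hello \n World\r\nfoo\rbar"  # hypothetical mixed line endings

# strip() only trims the two ends, so interior newlines survive.
stripped = text.strip("\n")

# replace() removes only the exact substring given; '\r' characters remain.
replaced = text.replace("\n", "")

# splitlines() breaks on \n, \r\n and \r (among other separators).
joined = " ".join(text.splitlines())

print(repr(stripped))
print(repr(replaced))
print(repr(joined))
```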
Answering late, since I recently had the same question when reading text from a file. I tried several options, such as the two below (each opens the file separately, because a file object can only be read once without seeking back).
The first option produces a list called alist with the '\n' characters stripped, then joins it back into the full text (the join is optional if you don't need a single string):
with open('verdict.txt') as f:
    alist = f.read().splitlines()
jalist = " ".join(alist)
The second option is simpler: it produces a string called atext, replacing each '\n' with a space:
with open('verdict.txt') as f:
    atext = f.read().replace('\n', ' ')
It works; I have done it. This is clean, easy, and efficient.
strip() returns the string after removing leading and trailing whitespace (see the documentation).
In your case, you may want to try replace():
string2 = string1.replace('\n', '')
or you can try this:
string1 = 'Hello \n World'
tmp = string1.split()
string2 = ' '.join(tmp)
This should work in many cases:
import re
text = ' '.join([line.strip() for line in text.strip().splitlines() if line.strip()])
text = re.sub('[\r\n]+', ' ', text)
strip() returns the string with leading and trailing whitespace (by default) removed.
So it would turn " Hello World " to "Hello World", but it won't remove the \n character as it is present in between the string.
Try replace().
str = "Hello \n World"
str2 = str.replace('\n', '')
print str2
If the text includes a line break in the middle, neither strip() nor rstrip() will solve the problem: the strip family only trims characters from the beginning and end of the string.
replace() is the way to solve your problem:
>>> my_name = "Landon\nWO"
>>> print my_name
Landon
WO
>>> my_name = my_name.replace('\n','')
>>> print my_name
LandonWO

Python Regular expression must strip whitespace except between quotes

I need a way to remove all whitespace from a string, except when that whitespace is between quotes.
result = re.sub('".*?"', "", content)
This will match anything between quotes, but now it needs to leave that match alone and match the whitespace everywhere else.
I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.
import re
def stripwhite(text):
    lst = text.split('"')
    for i, item in enumerate(lst):
        if not i % 2:
            # even-indexed pieces lie outside the quotes: drop their whitespace
            lst[i] = re.sub(r"\s+", "", item)
    return '"'.join(lst)
print stripwhite('This is a string with some "text in quotes."')
Here is a one-liner version, based on @kindall's idea, yet it does not use regex at all! First split on ", then split() every other item and re-join them; that takes care of the whitespace:
stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
for i,it in enumerate(txt.split('"')) )
Usage example:
>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'
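You can use shlex.split for a quotation-aware split, and join the result using " ".join. Note that shlex.split removes the quote characters themselves, and the joined result keeps one space between tokens. E.g.
import shlex
print " ".join(shlex.split('Hello "world this is" a test'))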
Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
Here's the small regex:
"[^"]*"|(\s+)
The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.
Here is working code:
import re
subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')
def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
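The same alternation trick extends to more than one quote style. The sketch below assumes that both single and double quotes should protect their contents, which goes beyond the question (it only mentions double quotes). It also uses a lambda in place of the named callback.

```python
import re

# Quoted strings are matched first and kept; any other whitespace run is captured.
pattern = re.compile(r'"[^"]*"|\'[^\']*\'|(\s+)')

def strip_outside_quotes(text):
    # m.group(1) is set only when the whitespace branch matched
    return pattern.sub(lambda m: "" if m.group(1) else m.group(0), text)

print(strip_outside_quotes('Keep "a b" and \'c d\' here'))
```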
Here is a slightly longer version that also checks for a quote without a matching pair. It only deals with one style of start and end delimiter (adaptable, for example, to start, end = '()').
start, end = '"', '"'
for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''
    while start in test:
        clean, _, test = test.partition(start)
        clean = clean.replace(' ', '') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # whitespace inside the quotes is kept
        result += clean
    result += test.replace(' ', '')
    print(result)

How to replace whitespaces with underscore?

I want to replace whitespace with underscore in a string to create nice URLs. So that for example:
"This should be connected"
Should become
"This_should_be_connected"
I am using Python with Django. Can this be solved using regular expressions?
You don't need regular expressions. Python has a built-in string method that does what you need:
mystring.replace(" ", "_")
Replacing spaces is fine, but I might suggest going a little further to handle other URL-hostile characters like question marks, apostrophes, exclamation points, etc.
Also note that the general consensus among SEO experts is that dashes are preferred to underscores in URLs.
import re
def urlify(s):
    # Remove everything that is neither a word character (letter, digit, underscore) nor whitespace
    s = re.sub(r"[^\w\s]", '', s)
    # Replace all runs of whitespace with a single dash
    s = re.sub(r"\s+", '-', s)
    return s
# Prints: I-cant-get-no-satisfaction
print(urlify("I can't get no satisfaction!"))
This takes into account whitespace characters other than space (and collapses runs of them), and I think it's faster than using the re module:
url = "_".join( title.split() )
Django has a 'slugify' function which does this, as well as other URL-friendly optimisations. It's hidden away in the defaultfilters module (it can also be imported from django.utils.text).
>>> from django.template.defaultfilters import slugify
>>> slugify("This should be connected")
this-should-be-connected
This isn't exactly the output you asked for, but IMO it's better for use in URLs.
Using the re module:
import re
re.sub(r'\s+', '_', "This should be connected") # This_should_be_connected
re.sub(r'\s+', '_', 'And so\tshould this') # And_so_should_this
Unless you have multiple spaces or other whitespace characters as above, you may just wish to use str.replace as others have suggested.
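To see where the suggested approaches differ, here is a sketch on a hypothetical title containing a double space and a tab.

```python
import re

title = "This  should\tbe connected"  # hypothetical input

# replace() touches only the literal ' ' character, one for one.
a = title.replace(" ", "_")
# split()/join collapses any whitespace run and drops leading/trailing whitespace.
b = "_".join(title.split())
# re.sub with \s+ collapses runs of any whitespace, including tabs.
c = re.sub(r"\s+", "_", title)

print(a)
print(b)
print(c)
```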
Use the string's replace method:
"this should be connected".replace(" ", "_")
"this_should_be_disconnected".replace("_", " ")
Python has a built-in method on strings called replace, which is used like so:
string.replace(old, new)
So you would use:
string.replace(" ", "_")
I had this problem a while ago and wrote code to replace characters in a string. I have to start remembering to check the Python documentation, because it has built-in functions for everything.
Surprisingly, this library hasn't been mentioned yet: the Python package named python-slugify, which does a pretty good job of slugifying:
pip install python-slugify
Works like this:
from slugify import slugify
# these lines run inside a unittest.TestCase test method, hence self.assertEqual
txt = "This is a test ---"
r = slugify(txt)
self.assertEqual(r, "this-is-a-test")
txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEqual(r, "this-is-a-test")
txt = 'C\'est déjà l\'été.'
r = slugify(txt)
self.assertEqual(r, "cest-deja-lete")
txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEqual(r, "nin-hao-wo-shi-zhong-guo-ren")
txt = 'Компьютер'
r = slugify(txt)
self.assertEqual(r, "kompiuter")
txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEqual(r, "jaja-lol-mememeoo-a")
You can try this instead (note that it produces dashes rather than underscores):
mystring.replace(' ', '-')
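I'm using the following piece of code for my friendly URLs:
from unicodedata import normalize
from re import sub
def slugify(title):
    # decompose accented characters, drop non-ASCII code points, decode back to text
    name = normalize('NFKD', title).encode('ascii', 'ignore').decode('ascii').replace(' ', '-').lower()
    # remove `other` characters
    name = sub('[^a-zA-Z0-9_-]', '', name)
    # normalize dashes: collapse runs into a single dash
    name = sub('-+', '-', name)
    return name
It works fine with accented Latin characters as well; characters with no ASCII decomposition (for example Cyrillic or Chinese) are dropped rather than transliterated.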
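Here is a quick sketch of the normalization step on its own. NFKD splits an accented letter into a base letter plus a combining mark, and encode('ascii', 'ignore') then drops the mark. The helper name to_ascii is hypothetical and is used only for illustration.

```python
from unicodedata import normalize

def to_ascii(text):
    # hypothetical helper: decompose, then drop every non-ASCII code point
    return normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

print(to_ascii("déjà l'été"))  # deja l'ete
print(repr(to_ascii("Компьютер")))  # '' (no ASCII decomposition)
```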
mystring.replace(" ", "_")
This works if you assign the result to a variable:
s = mystring.replace(" ", "_")
On its own the call does not change mystring: strings are immutable, so replace returns a new string.
The OP is using Python, but here is the JavaScript behaviour (something to be careful of, since the syntaxes are similar):
// only replaces the first instance of ' ' with '_'
"one two three".replace(' ', '_');
=> "one_two three"
// replaces every whitespace character with '_'
"one two three".replace(/\s/g, '_');
=> "one_two_three"
x = re.sub(r"\s", "_", txt)
perl -e 'map { $on=$_; s/ /_/g; rename($on, $_) or warn $!; } <*>;'
Matches and replaces spaces with underscores in the names of all files in the current directory.
