This might be a noob question, but how can I write a for loop that runs once for each space in a string?
text = "Hello World, this is a string!"
for spaces in text
# do blah blah
Loop over each character in the string and use an if condition to check whether it is a space:
text = "Hello World, this is a string!"
for character in text:
    if character == " ":
        pass  # do blah blah
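A small runnable sketch of the same idea, assuming you also want to know where each space is. The name `space_positions` is illustrative and not part of the original question.

```python
text = "Hello World, this is a string!"

# enumerate() yields (index, character) pairs, so each space's position is known.
space_positions = [i for i, ch in enumerate(text) if ch == " "]

print(space_positions)   # [5, 12, 17, 20, 22]
print(text.count(" "))   # 5
```

If you only need the number of spaces, `text.count(" ")` avoids the loop entirely.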
I have a string
Some sentance startx here blah blah [Example](https://someSite.com/another/blah/blah)
and I want this string to become this one:
Some sentance startx here blah blah Example
I have tried this regex:
"[\[\]]\(\S*(https|http)*\.(ru|com)\S*"
but I get this:
Some sentance startx here blah blah [Example
The code:
pattern = r"[\[\]]\(\S*(https|http)*\.(ru)\S*"
text = re.sub(pattern, '', text)
maybe like this:
string = 'Some sentance startx here blah blah [Example](https://someSite.com/another/blah/blah)'
string = string.split("]")[0].replace("[","")
print(string)
Use
\[([^][]*)]\(http[^\s()]*\)
Replace with \1.
See regex proof.
Python code snippet:
text = re.sub(r'\[([^][]*)]\(http[^\s()]*\)', r'\1', text)
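For completeness, here is a self-contained sketch of that substitution applied to the string from the question. The names `MD_LINK` and `strip_markdown_links` are illustrative, and the pattern assumes every link target starts with http.

```python
import re

# Matches [label](http...) and captures the label in group 1.
MD_LINK = re.compile(r'\[([^][]*)]\(http[^\s()]*\)')

def strip_markdown_links(text):
    """Replace each [label](url) with just the label."""
    return MD_LINK.sub(r'\1', text)

sample = 'Some sentance startx here blah blah [Example](https://someSite.com/another/blah/blah)'
result = strip_markdown_links(sample)
print(result)  # Some sentance startx here blah blah Example
```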
I need data to train a bot, so I have scraped SO questions. How can I replace new lines without removing \n from strings?
If I have the following string:
"""You can use \n to print a new line.
Text text text."""
How can I get: You can use \n to print a new line. Text text text.
I've tried this: string.replace("\n","")
But I end up with: 'You can use to print a new line.Text text text.'
Since I'm dealing with programming questions, I'm destined to run into \n in a string and wouldn't want to replace that.
You could write it as a raw string.
This is done by prefixing the string literal with the letter r.
example 1:
print(r"You can use \n to print a new line.")
# You can use \n to print a new line.
This does not remove the \n; the raw string keeps it as a backslash followed by n, so it stays visible in the output, as you want.
example 2:
text = r"You can use \n to print a new line."
print(text)
# You can use \n to print a new line.
If you are printing the string and the output is:
You can use \n to print a new line.
Text text text.
then the \n visible in the output is actually the backslash character followed by the letter n, and not a newline character.
Doing replace("\n", "") should not remove the sequence of characters \n, because the replace pattern "\n" itself is not the sequence of characters \n, but rather the actual single newline character. So it does not match the \n sequence of characters visible in your string, but it does match (and replace) the newline characters.
This REPL snippet illustrates that:
>>> x = """You can use \\n to print a new line.\n\nText text text.""" # this string literal is how you would create the string you have shown in you question.
>>> x == r"""You can use \n to print a new line.
...
... Text text text.""" # or you can use a raw string literal to initialize your variable, it is exactly the same thing
True
>>> print(x)
You can use \n to print a new line.
Text text text.
>>> print(x.replace("\n", ""))
You can use \n to print a new line.Text text text.
If you mean that you are creating a string with the literal:
"""You can use \n to print a new line.
Text text text."""
Then it is impossible to distinguish between the typed \n and the result of pressing the Enter key in your string literal (unless you use a raw string literal, as other answers have explained). Once the code is interpreted by Python they are identical. Consider escaping the backslash in your literal so that the \n sequence is included in your string as is:
myString = """You can use \\n to print a new line.
Text text text."""
If you want to convert newlines to the literal two-character sequence \n, escape the backslash in the replacement string:
string.replace("\n","\\n")
The \n in your string is an escape sequence that gets evaluated to the newline character.
In [1]: s = """You can use \n to print a new line.
...:
...: Text text text."""
In [2]: print(s)
You can use
to print a new line.
Text text text.
If you want to actually include the characters \ and n in your string, you need to escape the backslash with another backslash.
In [3]: s = """You can use \\n to print a new line.
...:
...: Text text text."""
In [4]: print(s)
You can use \n to print a new line.
Text text text.
In [5]: print(s.replace("\n", ""))
You can use \n to print a new line.Text text text.
Alternatively, you could use a "raw string", i.e. a string prefixed with r, e.g. r"..." or r"""...""" but then you would no longer be able to use escape sequences such as \n to insert a newline character, \t to insert a tab, etc.
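Putting this together for the original goal, here is a minimal sketch. It assumes that in the scraped text the \n the author typed is stored as two characters (a backslash and an n), while line breaks are real newline characters. The names `scraped` and `flattened` are illustrative.

```python
# Assumption: the typed "\n" is a backslash followed by "n" (two characters),
# while the line break between the sentences is a real newline character.
scraped = r"""You can use \n to print a new line.
Text text text."""

# "\n" in a normal (non-raw) literal is the single newline character,
# so only the real line break is replaced; the backslash + n pair is untouched.
flattened = scraped.replace("\n", " ")

print(flattened)  # You can use \n to print a new line. Text text text.
```

Replacing with a space rather than an empty string keeps the two sentences from running together.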
I have a string that contains words or phrases enclosed in double quotes, and I need to remove them from between the quotes in Python. Example:
The text has "single quotes" and "commas".
The text has "double quotes".
removing the words from the quotes results in this:
The text has " " and " ".
The text has " ".
I used re.finditer, which lists all the quoted phrases found, but I don't know how to remove the words between the quotes from the string. Does anybody know?
>> from re import sub
>> s
'The text has "single quotes" and "commas".'
>> sub('".*?"', '" "',s)
'The text has " " and " ".'
A bit complicated, but maybe,
(?<=")[^\s".][^"\r\n]*|[^"\r\n]*[^\s".](?=")
might be OK to look into.
RegEx Demo
This pattern would probably fail on some edge cases, which you'd likely want to look into:
[^\s".]
Test
import re
string = '''
The text has "single quotes" and "commas".
The text has "double quotes"
"single quotes" and "commas"
"double quotes"
"d"
"d""d""d""d"
'''
expression = r'(?<=")[^\s".][^"\r\n]*|[^"\r\n]*[^\s".](?=")'
print(re.sub(expression, '', string))
Output
The text has "" and "".
The text has ""
"" and ""
""
""
""""""""
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Take a look at this simple regex:
"[\w\s]+"
Regex Demo
We capture any word characters and possible spaces between " ", and then replace with "":
expression = r'"[\w\s]+"'
print(re.sub(expression, '""', string))
You can use this code. Hope it helps.
import re
text = 'The text has "single quotes" and "commas".'
text = re.sub('"[^"]*"', '""', text)  # replace each quoted phrase with empty quotes
print(text) # The text has "" and "".
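The approaches above differ mainly in what they leave between the quotes. As a hedged sketch, the helper below makes that choice a parameter. It assumes quotes are balanced and never nested, and `QUOTED`, `blank_quotes`, and `filler` are illustrative names rather than anything from the answers.

```python
import re

# A quoted phrase: an opening quote, any run of non-quote characters, a closing quote.
QUOTED = re.compile(r'"[^"]*"')

def blank_quotes(text, filler=" "):
    """Replace the contents of every double-quoted phrase with `filler`."""
    # A function replacement avoids backslash processing of `filler`.
    return QUOTED.sub(lambda m: '"' + filler + '"', text)

lines = [
    'The text has "single quotes" and "commas".',
    'The text has "double quotes".',
]
blanked = [blank_quotes(line) for line in lines]
for line in blanked:
    print(line)
# The text has " " and " ".
# The text has " ".
```

Passing `filler=""` gives the `""` form shown in some of the answers instead.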
I have a large file (f) with a lot of dialogue. I need a regex that will concatenate the split quotes (i.e. "Hello," Josh said enthusiastically, "I have a question!"), but not delete the middle portion. So, for this example, the output would be, "Hello, I have a question!" and then "Josh said enthusiastically" would be retained somewhere. I think I am on the right track, but haven't found something that works for these specifications. Here is the code I have already tried out:
for line in f:
    re.findall(r'"(.*?)"', line)
    output_file.write(line)
and
split = re.compile(r'''
(,\")
(.*?)
(,)
( )
(")''', re.VERBOSE)
for line in f:
    m = split_quote.match(split)
    if m:
        output_file.write(m.group(1) + m.group(5))
Thank you for any help!
How about something like this?
/(".+?)"(.+?),\s+?"(.+?[.?!]+")/g
Then replace the capture groups in this order:
$1 $3$2.
like so:
m.group(1) + " " + m.group(3) + m.group(2) + "."
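In Python, this replacement could be sketched with re.sub, assuming the pattern is used without the JavaScript-style /.../g delimiters (re.sub replaces all matches by default). `SPLIT_QUOTE` and `join_split_quote` are illustrative names, not part of the original answer.

```python
import re

# Group 1: opening quote and first spoken part (without its closing quote).
# Group 2: the narration between the two quoted parts, up to the comma.
# Group 3: second spoken part including its closing quote.
SPLIT_QUOTE = re.compile(r'(".+?)"(.+?),\s+?"(.+?[.?!]+")')

def join_split_quote(line):
    """Join the two quoted halves and move the narration after them."""
    return SPLIT_QUOTE.sub(r'\1 \3\2.', line)

line = '"Hello," Josh said enthusiastically, "I have a question!"'
joined = join_split_quote(line)
print(joined)  # "Hello, I have a question!" Josh said enthusiastically.
```

Lines that do not contain a split quote pass through unchanged, so the function can be applied to every line of the file.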
Example:
"Hello," Josh said enthusiastically, "I have a question!"
to
"Hello, I have a question!" Josh said enthusiastically.
Explanation:
http://bsite.cc/inoD/Screen%20Shot%202017-01-18%20at%206.01.22%20PM.png
First part matches a ", and then any characters until it sees another quote.
"Hello,"| Josh said enthusiastically, "I have a question!"
Second part matches text in the middle of the quotes, until it reaches a comma (also matches whitespace after comma and the first quote)
"Hello," Josh said enthusiastically, | "I have a question!"
Third group matches until the next quote
"Hello," Josh said enthusiastically, "I have a question!"
Try this regex:
(?<=\")([^\s].*?[^\s])(?=\")|(?<=\")\s(.*?)\s(?=\")
The regex above matches the two strings Hello, and I have a question! in group 1, so you can print them together. It also matches the middle portion, Josh said enthusiastically, in group 2, which is handy if you want to use it later.
Check out demo: https://regex101.com/r/m7nqnu/1
Working Python code:
import re
text = '''"Hello," Josh said enthusiastically, "I have a question!"'''
print('Group 1: ')
for m in re.finditer(r"(?<=\")([^\s].*?[^\s])(?=\")|(?<=\")\s(.*?)\s(?=\")", text):
    if m.group(1) is not None:
        print('%s ' % (m.group(1)))
print('Group 2: ')
for m in re.finditer(r"(?<=\")([^\s].*?[^\s])(?=\")|(?<=\")\s(.*?)\s(?=\")", text):
    if m.group(2) is not None:
        print('%s ' % (m.group(2)))
Output:
Group 1: Hello, I have a question!
Group 2: Josh said enthusiastically,
As long as there are no quotes within quotes, and all quotes properly match, and the phrase always consists of two quoted parts with an unquoted part in the middle:
parts = [x.strip() for x in re.findall(r'"([^"]+)', text)]
print(parts[0] + " " + parts[2])
# Hello, I have a question!
print(parts[1])
# Josh said enthusiastically,
I am very new to Python.
I want to change a sentence if it contains repeated words.
Correct
Ex. "this just so so so nice" --> "this just so nice"
Ex. "this is just is is" --> "this is just is"
Right now I am using this regex, but it also changes repeated letters.
Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space) ERROR
text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row
How can I make the same change so that it checks whole words instead of letters?
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row
The \b matches the empty string, but only at the beginning or end of a word.
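A quick check of that pattern against the examples from the question. The wrapper name `collapse_repeated_words` is illustrative, and the pattern assumes repeated words are separated by exactly one space.

```python
import re

def collapse_repeated_words(text):
    """Collapse runs of the same word (separated by single spaces) into one."""
    # \b keeps the match on whole words, so "i is" is not treated as a repeat.
    return re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

print(collapse_repeated_words("this just so so so nice"))   # this just so nice
print(collapse_repeated_words("this is just is is"))        # this is just is
print(collapse_repeated_words("My friend and i is happy"))  # My friend and i is happy
```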
Non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
\b: matches a word boundary
\w: matches any word character
\1: backreference to the text captured by the first group
import re
def Remove_Duplicates(Test_string):
    Pattern = r"\b(\w+)(?:\W\1\b)+"
    return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)
Test_string1 = "Good bye bye world world"
Test_string2 = "Ram went went to to his home"
Test_string3 = "Hello hello world world"
print(Remove_Duplicates(Test_string1))
print(Remove_Duplicates(Test_string2))
print(Remove_Duplicates(Test_string3))
Result:
Good bye world
Ram went to his home
Hello world