Python: split string by closing bracket and write in new line [closed] - python

I have a text file that looks like this:
This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()
I need to split the parts that the text looks like this:
This is one sentence()
This is another sentence()
This is a full sentence at all)
Maybe this too)
This is the last sentence()
I tried it with regex and the help of https://regex101.com/r/sH8aR8/5#python but I can't find any solution. Any ideas?

You don't need a regex. Just use str.replace to replace every closing paren with a closing paren followed by a newline:
s="This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()"
print(s.replace(")",")\n"))
Output:
This is one sentence()
This is another sentence()
This is a full sentence at all)
Maybe this too)
This is the last sentence()
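If the pieces are needed as a list (for example, to write them to a file one per line), a possible variation is to collect them with re.findall. This is only a sketch: the function name split_after_closing_paren is made up for illustration, and it assumes that any trailing text without a closing paren can be dropped.

```python
import re

def split_after_closing_paren(text):
    """Return the chunks of `text`, each one ending with a closing paren."""
    # [^)]* consumes everything up to the next ')', and \) consumes the ')' itself.
    # Trailing text that has no closing ')' is not returned by this pattern.
    return re.findall(r"[^)]*\)", text)

s = "This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()"
lines = split_after_closing_paren(s)
print("\n".join(lines))
```

Unlike the str.replace version, joining the list does not leave a trailing newline after the last sentence.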

You can search using this lookbehind regex:
r'(?<=\))'
and replace each match with "\n".
Code:
import re
text = "This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()"
# The lookbehind matches the empty position right after each ")", so a newline is inserted there
result = re.sub(r'(?<=\))', "\n", text)


is it correct? i hope someone can help me hehe [closed]

Write a Python program that will search for lines that start with 'F', followed by 2 characters, followed by 'm:' using the mbox-short.txt text file.
Write a Python program that will search for lines that start with From and have an # sign
My code:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)
Your code seems to lack the actual regular expression that will find the result you are looking for. If I understand correctly, your aim is to find lines starting with F, followed by ANY two characters, and to print each such line to the terminal. Let me guide you:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:  # NB: after a new scope is entered, use indentation
    result = re.search("^f..", line)  # arguments: pattern, string to search
    # ^ anchors the match at the start of the line
    # . matches 1 occurrence of any character
    if result is not None:  # re.search returns None when nothing matches
        print(line)
I trust you can follow this. If you need capital F, I will leave it as a homework exercise for you to find out how to do the capital F.
You can practice with regexp here:
https://regexr.com/
Or read more about it here:
https://www.youtube.com/watch?v=rhzKDrUiJVk
I think you didn't ask your question clearly enough for everybody to understand. Also, format your code as a code block for better readability ('Code Sample'). I already did that with your code, so you can have a look at that.
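For the first task as stated in the question (lines that start with 'F', followed by 2 characters, followed by 'm:'), one possible pattern is ^F..m:. The sketch below runs it over a small list of made-up sample lines instead of the real mbox-short.txt, so the sample data and the helper name matching_lines are assumptions for illustration only.

```python
import re

# Hypothetical sample lines standing in for the contents of mbox-short.txt.
sample_lines = [
    "From: someone@example.com\n",
    "From someone@example.com Sat Jan  5\n",
    "Fxxm: not a real header\n",
    "X-From: ignored\n",
]

# ^ anchors at the start of the line, F is literal, each . matches any
# single character, and m: is literal.
pattern = re.compile(r"^F..m:")

def matching_lines(lines):
    """Return the lines (without trailing whitespace) that match the pattern."""
    return [line.rstrip() for line in lines if pattern.search(line)]

print(matching_lines(sample_lines))
```

With a real file, the same function could be called on the open file object, since iterating over a file yields its lines.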

Add missing full-stops at the end of a text block [closed]

Currently, I'm trying to prepare some texts for my machine learning task in python3.
The input data is a single long string and has the following format:
<SPEAKER gender="female" id="1" name="unknown"> sentence_1. sentence_2? ... sentence_n, </SPEAKER><SPEAKER gender="male" id="2" name="unknown"> sentence_1. sentence_2? ... sentence_n </SPEAKER><SPEAKER gender="female" id="1" name="unknown"> sentence_1. sentence_2? ... sentence_n; </SPEAKER> ...
It consists of multiple "text blocks", starting <SPEAKER ...> and ending </SPEAKER> with tags.
As you can see, sometimes the last sentence within a block (sentence_n) is missing a full-stop . or the sentence ends with a comma , or semicolon ;.
The current problem is that when I cleanse the provided string and delete the tags, the last sentence (sentence_n) of a block and the first sentence (sentence_1) of the following block merge. I just want to avoid this. I want the sentences to end with punctuation so that I can split the whole string sentence-wise in my later text preprocessing steps.
Therefore, I would like to check the LAST character of the LAST sentence (sentence_n) of every block and
add a full-stop if it's missing
replace a comma or semicolon with full-stop
if a full-stop already exists, just keep it
Thank you very much in advance!
Edit1: It does not have to be a regex solution. Since I handle thousands of such strings, performance is still important.
Edit2: Specified the question.
You can indeed use a regular expression:
import re
s = re.sub(r"([;,.])?(\s*</SPEAKER>)", r".\2", s)
This captures the ;, , or . when it is the last non-white-space character in the tag, or -- if not possible -- captures the empty string at the spot where the point should occur. In either case it replaces that capture with a point.
Then apply your solution for removing the tags.
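As a quick check of how that substitution behaves, here is a small sketch applied to a shortened sample in the question's format. The sample text and the function name close_blocks_with_full_stop are invented for illustration.

```python
import re

def close_blocks_with_full_stop(text):
    """Make the text before every </SPEAKER> tag end with a full stop."""
    # Group 1 optionally captures a trailing ';', ',' or '.'.
    # Group 2 captures any whitespace plus the closing tag and is kept as-is.
    return re.sub(r"([;,.])?(\s*</SPEAKER>)", r".\2", text)

sample = (
    '<SPEAKER gender="female" id="1" name="unknown"> one. two, </SPEAKER>'
    '<SPEAKER gender="male" id="2" name="unknown"> three </SPEAKER>'
    '<SPEAKER gender="female" id="1" name="unknown"> four. </SPEAKER>'
)
fixed = close_blocks_with_full_stop(sample)
print(fixed)
```

Note that a block ending in another punctuation mark such as ? is not covered by the character class, so it would end up as "?." after the substitution. If that matters for your data, the class [;,.] would need to be adjusted.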

Need to get substring inside a string Forexample : the world is very beauty by describing color:blue and water,air,fire etc, [closed]

For example: the world is very beauty by describing color:blue and water,air,fire etc,.
I need to get the word that follows "color:".
To do that, I wrote a small Python script that finds the index of the word "describing" and the index of the word "and", so that I can print the text within that index range, which is color:blue.
But in some cases the word before "color" changes dynamically, and so does the word after "blue". In this scenario I am struggling to write a regular expression that gets the word "blue".
The word after color: can also differ; sometimes it is color:green.
Updating the question:
I have a set of strings such as "HELLO:rosa I am fine and you Good_Morning:U look very beauty temple:Will be in town".
Here I need to extract the string that follows "Good_morning:".
So, given:
Input: "HELLO:rosa I am fine and you GOOD_MORNING:U look very beauty TEMPLE:Will be in town"
Output: U look very beauty
The script needs to find the text after GOOD_MORNING: (which is mostly lowercase), stop before the next capitalized WORD (TEMPLE), and print only that text.
I wrote a Python script for this, but it returns everything after GOOD_MORNING, i.e. ":U look very beauty TEMPLE:Will be in town", rather than "U look very beauty".
You might try color:(\w+\b). It will always find the word next to color: (no spaces allowed).
https://regex101.com/r/IOkH1j/1
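In Python, both cases from the question could be sketched roughly as follows. The second function is not part of the answer above: it assumes that every label is written in capital letters or underscores followed by a colon, and both function names are made up for illustration.

```python
import re

def word_after_color(text):
    """Return the word that directly follows 'color:', or None if absent."""
    match = re.search(r"color:(\w+)", text)
    return match.group(1) if match else None

def text_after_label(text, label):
    """Return the text after 'label:' up to the next ALL-CAPS label or the end."""
    # (.*?) matches as little as possible; the lookahead stops it right before
    # whitespace followed by an uppercase label and a colon, or at the end.
    pattern = re.escape(label) + r":(.*?)(?=\s+[A-Z_]+:|$)"
    match = re.search(pattern, text)
    return match.group(1) if match else None

print(word_after_color("the world is very beauty by describing color:blue and water,air,fire etc,"))
print(text_after_label(
    "HELLO:rosa I am fine and you GOOD_MORNING:U look very beauty TEMPLE:Will be in town",
    "GOOD_MORNING",
))
```

The lookahead requires whitespace before the next label, so a single capital letter at the start of the extracted text (like the U here) does not end the match early.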

Replace Substring with another but only if a certain substring follows it [closed]

I'm trying to replace the substring 'gta' with substring 'cat'. But the condition is that 'gta' immediately has to be followed by substring 'dog'.
Example: 'gtagtadogcat' would become 'gtacatdogcat'
The part I'm struggling with is writing the program so that it finds 'gta', checks that 'dog' comes directly after it and, if so, changes 'gta' to 'cat'.
>>> 'gtagtadogcat'.replace('gta'+'dog', 'cat'+'dog')
'gtacatdogcat'
old_string = 'gtagtadogcat'
print(old_string.replace('gtacat','dogcat'))
output: gtagtadogcat
You could use regex:
re.sub('gta(dog)', r'cat\1', 'gtagtadogcat')
Output:
'gtacatdogcat'
*Edit: You would not need a for loop if you pass in the whole string. Here is an example:
re.sub('gta(dog)', r'cat\1', 'gtagtadogcat_moretextgta_lastgtadog')
Output:
'gtacatdogcat_moretextgta_lastcatdog'
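An alternative that avoids the capture group is a lookahead, which checks for 'dog' without consuming it. This is only a sketch of the same idea, and the function name replace_gta_before_dog is made up for illustration.

```python
import re

def replace_gta_before_dog(text):
    """Replace 'gta' with 'cat' only where 'dog' follows immediately."""
    # (?=dog) is a lookahead: it requires 'dog' to come next but leaves it
    # out of the match, so only 'gta' itself is replaced.
    return re.sub(r"gta(?=dog)", "cat", text)

print(replace_gta_before_dog("gtagtadogcat"))
print(replace_gta_before_dog("gtagtadogcat_moretextgta_lastgtadog"))
```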

How to write a regex to capture letters separated by punctuation in Python 3? [closed]

I am new to regex and encountered a problem. I need to parse a list of last names and first names to use in a url and fetch an html page. In my last names or first names, if it's something like "John, Jr" then it should only return John but if it's something like "J.T.R", it should return "JTR" to make the url work. Here is the code I wrote but it doesn't capture "JTR".
import re
last_names_parsed=[]
for ln in last_names:
    L_name=re.match(r'\w+', ln)
    last_names_parsed.append(L_name[0])
However, this will not capture J.T.R properly. How should I modify the code to properly handle both?
You can add \. to the regular expression:
import re
final_data = [re.sub(r'\.', '', re.findall(r'(?<=^)[a-zA-Z\.]+', i)[0]) for i in last_names]
Regex explanation:
(?<=^): positive lookbehind, ensures that the ensuing regex only registers a match if the match is found at the beginning of the string (here it has the same effect as a plain ^ anchor)
[a-zA-Z\.]: matches any single alphabetical character ([a-zA-Z]) or a period .
+: repeats the preceding character class ([a-zA-Z\.]) one or more times, so the match continues as long as a period or alphabetic character is found. For instance, in "John, Jr", only John will be matched, because the comma , is not included in the character class [a-zA-Z\.], thus halting the match.
The outer re.sub(r'\.', '', ...) then removes the periods from the matched text, so "J.T.R" becomes "JTR".
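The same idea can be written a little more plainly with re.match, which already anchors at the start of the string. This is a sketch under the assumption that names consist of ASCII letters and periods only; the helper name clean_name and the sample list are made up for illustration.

```python
import re

def clean_name(name):
    """Return the leading run of letters and periods, with periods removed."""
    # re.match only matches at the start of the string, so no explicit
    # anchor or lookbehind is needed.
    match = re.match(r"[A-Za-z.]+", name)
    if match is None:
        return ""  # the name does not start with a letter or period
    return match.group().replace(".", "")

last_names = ["John, Jr", "J.T.R"]  # hypothetical sample input
print([clean_name(name) for name in last_names])
```

Returning an empty string for non-matching names is a design choice in this sketch; the one-liner above would raise an IndexError in that case.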
