Remove String between two characters for all occurrences - python

I am looking for help on string manipulation in Python 3.
Input String
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
Desired Output
s = "ID,FIRST_NM,LAST_NM,FILLER1"
Basically, the objective is to remove everything from each space up to the next comma (or the end of the string), at every occurrence in the input string.
Any help is much appreciated

Using a simple regex (a raw string avoids the invalid-escape warning for \s and \w):
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
res = re.sub(r'\s\w+', '', s)
print(res)
# output ID,FIRST_NM,LAST_NM,FILLER1

You can use a regex with a lookahead, which keeps only the words that are followed by a space and another word:
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
s = ','.join(re.findall(r'\w+(?= \w+)', s))
print(s)
Output:
ID,FIRST_NM,LAST_NM,FILLER1
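For comparison, the same result can be obtained without a regex. This is only a minimal sketch, assuming every comma-separated field has the form "NAME TYPE" with the name first; the helper name strip_types is made up for illustration.

```python
def strip_types(schema):
    """Keep only the first whitespace-separated token of each comma-separated field."""
    # Assumption: no field is empty; an empty field would make split()[0] raise IndexError.
    return ",".join(field.split()[0] for field in schema.split(","))


s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
print(strip_types(s))
```

Unlike the re.sub approach, this drops everything after the first token of a field, so a multi-word type would also be removed.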

Related

Python Regex find all matches after specific word

I have a string as below
"Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
I'm currently struggling to write a function in python that will find all aliases and put them in a list. So basically, I need a list that will be ['alias1.myserver.mysite.com', 'myserver.mysite.com']
I tried the following code
pattern = '(?<=Aliases: )([\S*]+)'
name = re.findall(pattern, mystring)
but it only matches the first alias and not both of them.
Any ideas on this?
Greatly appreciated!
Try the following:
import re
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = re.findall(r'\S+', s.split('Aliases: ')[1])
print(l)
Prints:
['alias1.myserver.mysite.com', 'myserver.mysite.com']
Explanation
First we split the string into two pieces and keep the second piece with s.split('Aliases: ')[1]. This evaluates to the part of the string that follows 'Aliases: '.
Next we use findall with the regular expression:
\S+
This matches every run of one or more consecutive non-whitespace characters.
In this case, though, it can be done more simply without a regex:
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = s.split('Aliases: ')[1].split()
print(l)
Try this :
import re
regex = re.compile(r'[\n\r\t]')
t="Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
t = regex.sub(" ", t)
t = t.split("Aliases:")[1].strip().split()
print(t)
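Since the question asks for a function, the split-based idea can be wrapped up as below. This is a sketch, not a general nslookup parser: it assumes the "Aliases:" block is the last block in the text (as in the sample), and the name find_aliases is hypothetical.

```python
def find_aliases(text):
    """Return the whitespace-separated tokens after 'Aliases:', or [] if the marker is absent."""
    # partition() returns (before, separator, after); the separator is '' when not found.
    _, marker, tail = text.partition("Aliases:")
    return tail.split() if marker else []


s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
print(find_aliases(s))
```

Using partition rather than split(...)[1] avoids an IndexError when the text contains no aliases at all.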

How to start at a specific letter and end when it hits a digit?

I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I can figure out whether the string ends in 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', it won't work.
Is there a regex that works for both cases, 's#' and 's#+'?
Thanks!
You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
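To see what that pattern picks up from both sample strings, here is a small check. The variable names TAG and samples are just for illustration.

```python
import re

# Same pattern as above: "s", one digit, then any run of digits, "-" or "/" up to the end.
TAG = re.compile(r"s\d[-/\d]*$")

samples = ["neg(able-23, never-21) s2-1/3", "amod(Market-8, magical-5) s1"]
for text in samples:
    match = TAG.search(text)
    # search() returns None when nothing matches, so guard before calling group().
    print(match.group() if match else None)
```

This prints s2-1/3 for the first string and s1 for the second.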
I'm guessing that maybe you would want to extract that with some expression, such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]

How can i find a pattern in regex?

I want to find a pattern and replace it with another
Suppose i have:
"Name":"hello"
And want to do this
Name= "hello"
Using python regex
The string could be anything inside double quotes so i need to find pattern "": "" and replace it with =""
This expression,
^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$
has two capturing groups, ([^"]+?) for the key and ([^"]+) for the value, which collect the data we want. We then simply call re.sub.
Test
import re
result = re.sub(r'^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$', r'\1= "\2"', '" Name ":" hello "')
print(result)
Why not use this regex:
import re
s = '"Name":"hello"'
print(re.sub(r'"(.*)":"(.*)"', r'\1= "\2"', s))
Output:
Name= "hello"
For strings containing more than one such pair, make the quantifiers non-greedy (.*? instead of .*) so that each match stops at the nearest closing quote:
import re
s = '"Name":"hello", "Name2":"hello2"'
print(re.sub(r'"(.*?)":"(.*?)"', r'\1= "\2"', s))
Output:
Name= "hello", Name2= "hello2"
Using pure Python, this is as simple as:
s = '"Name":"hello"'
print(s.replace(':', '= ').replace('"', '', 2))
# Name= "hello"

Converting regex whitespace characters from list into string

I want to convert regex whitespace tokens in a list into a string. For example, given
list1 = ["Hello","\s","my","\s","name","\s","is"]
I want to convert it to a string like
"Hello my name is"
Can anyone please help?
Also, if there were characters such as
"\t"
how would I do this?
list1 = ["Hello", r"\s", "my", r"\s", "name", r"\s", "is"]
str1 = ''.join(list1).replace(r"\s", " ")
Output :
>>> str1
'Hello my name is'
Update :
If you have something like list1 = ["Hello", r"\s", "my", r"\s", "name", "\t", "is"], then you can chain multiple replace calls:
>>> str1 = ''.join(list1).replace(r"\s", " ").replace("\t", " ")
>>> str1
'Hello my name is'
or, if it's only \t:
str1 = ''.join(list1).replace("\t", "anystring")
I would highly recommend using the join string function mentioned in one of the earlier answers, as it is less verbose. However, if you absolutely needed to use regex in order to complete the task, here's the answer:
import re
list1 = ["Hello","\s","my","\s","name","\s","is"]
list_str = ''.join(list1)
updated_str = re.split('\\\s', list_str)
updated_str = ' '.join(updated_str)
print(updated_str)
Output is:
Hello my name is
In order to use raw string notation, replace the 5th line of code with the one below:
updated_str = re.split(r'\\s', list_str)
Both will have the same output result.
You don't even need regular expressions for that:
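s = ' '.join([item for item in list1 if item != r'\s'])
Note the use of != rather than "is not": "is not" compares object identity, not string contents, so it is unreliable for strings. Also note that list is a poor name for a variable in Python, because it shadows the built-in list type.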
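If several different tokens need handling, a lookup table may be easier to extend than chained replace calls. The following is only a sketch: the mapping TOKEN_TO_TEXT and the function join_tokens are hypothetical names, and it assumes the tokens are stored as the literal two-character strings \s and \t.

```python
# Hypothetical mapping from regex-style tokens to the characters they stand for.
TOKEN_TO_TEXT = {r"\s": " ", r"\t": "\t"}


def join_tokens(items):
    """Join the items, replacing known tokens and passing everything else through unchanged."""
    return "".join(TOKEN_TO_TEXT.get(item, item) for item in items)


list1 = ["Hello", r"\s", "my", r"\s", "name", r"\t", "is"]
print(repr(join_tokens(list1)))
```

An item that is already a real tab character (written "\t" without the r prefix) is not in the table and is simply passed through as-is.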

Breaking up substrings in Python based on characters

I am trying to write code that will take a string and remove specific data from it. I know that the data will look like the line below, and I only need the data within the " " marks, not the marks themselves.
inputString = 'type="NN" span="123..145" confidence="1.0" '
Is there a way to take a Substring of a string within two characters to know the start and stop points?
You can extract all the text between pairs of " characters using regular expressions:
import re
inputString = 'type="NN" span="123..145" confidence="1.0" '
pat = re.compile('"([^"]*)"')
strings = []  # collects the text found between each pair of quotes
while True:
    mat = pat.search(inputString)
    if mat is None:
        break
    strings.append(mat.group(1))
    inputString = inputString[mat.end():]  # continue searching after this match
print(strings)
or, easier:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print(strings)
Output for both versions:
['NN', '123..145', '1.0']
Alternatively, split on the quote character; the quoted values then sit at the odd indices:
fields = inputString.split('"')
print(fields[1], fields[3], fields[5])
You could split the string at each space to get a list of 'key="value"' substrings and then use regular expressions to parse the substrings.
Using your input string:
>>> input_string = 'type="NN" span="123..145" confidence="1.0" '
>>> input_string_split = input_string.split()
>>> print(input_string_split)
['type="NN"', 'span="123..145"', 'confidence="1.0"']
Then use regular expressions:
>>> import re
>>> pattern = r'"([^"]+)"'
>>> for substring in input_string_split:
...     match_obj = re.search(pattern, substring)
...     print(match_obj.group(1))
...
NN
123..145
1.0
The regular expression '"([^"]+)"' matches anything within quotation marks (provided there is at least one character). The round brackets indicate the bit of the regular expression that you are interested in.
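If the keys are needed as well as the values, two capturing groups can collect both at once. This is a sketch under the assumption that keys consist of word characters and that values contain no embedded double quotes; the variable name pairs is illustrative.

```python
import re

input_string = 'type="NN" span="123..145" confidence="1.0" '

# With two groups, findall returns a list of (key, value) tuples, which dict() accepts directly.
pairs = dict(re.findall(r'(\w+)="([^"]*)"', input_string))
print(pairs)
```

The values can then be looked up by name, for example pairs["span"], instead of by position.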
