How can I include the separator when calling split() - python

I have a list of phone numbers like so:
numbers = [
    '(080)3453421256',
    '(04256)6679345390',
    '(022)1135643320']
and I have to get the prefixes of those numbers, which have different lengths.
numbers.split(')', 0) gives the output without the bracket.
How can I include the bracket and get the prefixes?

Try this code:
for i in numbers:
    a = i.split(")")
    s = a[0]
    print(s + ")", end=",")
Or try this:
for i in numbers:
    a = i.index(")")
    s = i[0:a+1]
    print(s, end=", ")
Output of the first snippet:
(080),(04256),(022),

Using Python's regular expression package might be more straightforward for you:
import re
numbers=[
"(080)3453421256",
"(04256)6679345390",
"(022)1135643320"]
pattern = r"\(\d+\)"  # L_PAREN, at least one digit, R_PAREN
for num in numbers:
    print(re.match(pattern, num).group())  # Your data has only one match in each line.
Output:
(080)
(04256)
(022)
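A further option, offered here as a sketch rather than as part of the answers above, is str.partition, which returns the separator as its own element so it can be glued back on. The helper name prefix_with_bracket is hypothetical.

```python
numbers = [
    "(080)3453421256",
    "(04256)6679345390",
    "(022)1135643320",
]

def prefix_with_bracket(number):
    # partition splits at the first ")" and returns (before, separator, after)
    head, sep, _rest = number.partition(")")
    return head + sep

prefixes = [prefix_with_bracket(n) for n in numbers]
print(prefixes)  # ['(080)', '(04256)', '(022)']
```

If a number contains no ")", partition returns the whole string with an empty separator, so the helper hands the input back unchanged instead of raising.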

Related

Split a string into Name and Time

I want to split a string as it contains a combined string of name and time.
I want to split as shown in example below:
Complete string
cDOT_storage01_esx_infra02_07-19-2021_04.45.00.0478
Desired output
cDOT_storage01_esx_infra02 07-19-2021
What I tried, which does not give the desired output
j['name'].split("-")[0], j['name'].split("-")[1][0:10]
Use rsplit. The only two _ you care about are the last two, so you can limit the number of splits rsplit will attempt using _ as the delimiter.
>>> "cDOT_storage01_esx_infra02_07-19-2021_04.45.00.0478".rsplit("_", 2)
['cDOT_storage01_esx_infra02', '07-19-2021', '04.45.00.0478']
You can index the resulting list as necessary to get your final result.
If all the strings follow the same pattern (fields separated by an underscore), you can try this.
(Untested)
string = "cDOT_storage01_esx_infra02_07-19-2021_04.45.00.0478"
splitted = string.split('_')  # split() already returns a list of str
# splitted[-1] will be "04.45.00.0478"
# splitted[-2] will be "07-19-2021"
# The rest of the list contains the front part
other = splitted.pop()
date = splitted.pop()
name = '_'.join(splitted)
print(name, date)
You can use a regex to search for the date and print the result.
import re
txt = "cDOT_storage01_esx_infra02_07-19-2021_04.45.00.0478"
# search for the date in the string
x = re.search(r"\d{2}-\d{2}-\d{4}", txt)
if x:
    print("Matched")
    a = re.split(r"[0-9]{2}-[0-9]{2}-[0-9]{4}", txt)
    y = re.compile(r"\d{2}-\d{2}-\d{4}")
    # a[0] ends with the underscore before the date, so drop it
    print(a[0][:-1], y.findall(txt)[0])
else:
    print("No match")
Output:
Matched
cDOT_storage01_esx_infra02 07-19-2021
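The same idea can be written with a single pattern and named groups. This is only a sketch: it assumes the date is always two digits, two digits, four digits, and is followed by another underscore; the group names name and date are made up for the example.

```python
import re

txt = "cDOT_storage01_esx_infra02_07-19-2021_04.45.00.0478"
# Greedy .+ backtracks until "_<date>_" can match after it
pattern = re.compile(r"(?P<name>.+)_(?P<date>\d{2}-\d{2}-\d{4})_")

m = pattern.match(txt)
if m:
    name, date = m.group("name"), m.group("date")
    print(name, date)  # cDOT_storage01_esx_infra02 07-19-2021
else:
    name = date = None
    print("No match")
```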

How to split or cut the string in python

I am trying to split a string in Python. This is my code so far:
import os
f = "Retirement-User-Portfolio-DEV-2020-7-29.xml"
to_output = os.path.splitext(f)[0]
print(to_output)
This prints:
Retirement-User-Portfolio-DEV-2020-7-29
However, I want to remove "-DEV-2020-7-29" from the string so that the output is:
Retirement-User-Portfolio
You can use split() and join() to cut the string at the kth occurrence of a character (here, the third hyphen).
f = "Retirement-User-Portfolio-DEV-2020-7-29.xml"
to_output = '-'.join(f.split('-')[0:3])
You should explain your question in more detail, in particular the pattern you are trying to match: is the cut always at the third hyphen? Other solutions (e.g., regex) may be more appropriate.
Try this code:
f = "Retirement-User-Portfolio-DEV-2020-7-29.xml"
a = f.split('-')
print('-'.join(a[:3]))
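If the number of hyphens in the front part can vary, a regex anchored at the end of the name may be more robust. The sketch below assumes (this is not stated in the question) that the part to drop is always an uppercase tag followed by a year-month-day date.

```python
import os
import re

f = "Retirement-User-Portfolio-DEV-2020-7-29.xml"
stem = os.path.splitext(f)[0]

# Assumed suffix shape: -<UPPERCASE TAG>-<year>-<month>-<day> at the end of the name
trimmed = re.sub(r"-[A-Z]+-\d{4}-\d{1,2}-\d{1,2}$", "", stem)
print(trimmed)  # Retirement-User-Portfolio
```

Names that do not end in that shape are returned unchanged, since re.sub leaves the string alone when nothing matches.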

Change part of a word (string) into in a different string if a sign occurs. Python

What is the most effective way to extract the part of a string that begins after the marker '=#=' and ends at the next '=#='? For example:
From a large string
'321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319'
the Python code should return:
'I-LOVE-STACK-OVER-FLOW'
Any help will be appreciated.
Using split():
s = '321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319'
st = '=#='
ed = '=#='
print((s.split(st))[1].split(ed)[0])
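Using regex:
import re
s = '321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319'
st = '=#='
ed = '=#='
# Non-greedy (.*?) stops at the first closing marker; a greedy (.*) would run to the last one.
print(re.search('%s(.*?)%s' % (re.escape(st), re.escape(ed)), s).group(1))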
OUTPUT:
I-LOVE-STACK-OVER-FLOW
In addition to @DirtyBit's answer, if you also want to handle more than two '=#=' markers, you can split the string and then join every other element:
s = '321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319=#=|I-ALSO-LOVE-SO=#=3123123'
parts = s.split('=#=')
print(''.join([parts[i] for i in range(1,len(parts),2)]))
Output (xx$q also sits between a pair of '=#=' markers, so it is included)
I-LOVE-STACK-OVER-FLOWxx$q|I-ALSO-LOVE-SO
The explanation is in the code.
import re
ori_str = '321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319'
ori_list = re.split("=#=", ori_str)
# The goal is to find the strings wrapped between pairs of "=#=" markers.
# After the split, the even positions hold the parts outside the markers
# and the odd positions hold the parts you want.
for i in range(len(ori_list)):
    if i % 2 == 1:  # odd position
        print(ori_list[i])
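The odd-position idea can also be sketched with re.findall and a non-greedy group, which collects every wrapped part in one call. The variable names marker and wrapped are illustrative.

```python
import re

s = '321#5=85#45#41=#=I-LOVE-STACK-OVER-FLOW=#=3234#41#=q#$^1=#=xx$q=#=xpa$=4319'
marker = re.escape('=#=')

# Each match consumes an opening and a closing marker, so matches do not overlap
wrapped = re.findall(marker + r'(.*?)' + marker, s)
print(wrapped)  # ['I-LOVE-STACK-OVER-FLOW', 'xx$q']
```

Take wrapped[0] if only the first enclosed part is wanted.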

Excluding a specific string of characters in a str()-function

A small issue I've encountered during coding.
I'm looking to print out the name of a .txt file.
For example, the file is named: verdata_florida.txt, or verdata_newyork.txt
How can I exclude .txt and verdata_, but keep the string between? It must work for any number of characters, but .txt and verdata_ must be excluded.
This is where I am so far; I've already defined filename to be input():
print("Average TAM at", str(filename[8:**????**]), "is higher than ")
3 ways of doing it:
using str.split twice:
>>> "verdata_florida.txt".split("_")[1].split(".")[0]
'florida'
using str.partition twice (you won't get an exception if the format doesn't match, and probably faster too):
>>> "verdata_florida.txt".partition("_")[2].partition(".")[0]
'florida'
using re, keeping only the center part:
>>> import re
>>> re.sub(r".*_(.*)\..*", r"\1", "verdata_florida.txt")
'florida'
All of the above must be tuned if _ and . can appear multiple times (should we keep the longest or the shortest string?).
EDIT: In your case, though, prefixes & suffixes seem fixed. In that case, just use str.replace twice:
>>> "verdata_florida.txt".replace("verdata_","").replace(".txt","")
'florida'
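On Python 3.9 or newer, str.removeprefix and str.removesuffix express the fixed-prefix, fixed-suffix case more directly. A minimal sketch, assuming that Python version is available:

```python
filename = "verdata_florida.txt"

# Only strips the text when it is really at the start / end of the string
name = filename.removeprefix("verdata_").removesuffix(".txt")
print(name)  # florida
```

Unlike str.replace, these methods never touch an occurrence in the middle of the name, and they return the string unchanged when the prefix or suffix is absent.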
Assuming you want to split on the first _ and the last ., you can use slicing with the index and rindex methods. index returns the position of the first occurrence of the substring in the parentheses, and rindex the position of the last one. If the substring is not found, they raise a ValueError. If you want the search but not the ValueError, you can use find and rfind instead, which do the same thing but return -1 when there is no match.
s = 'verdata_new_hampshire.txt'
s_trunc = s[s.index('_') + 1: s.rindex('.')] # or s[s.find('_') + 1: s.rfind('.')]
print(s_trunc) # new_hampshire
Of course, if you are always going to exclude verdata_ and .txt you could always hardcode the slice as well.
print(s[8:-4]) # new_hampshire
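When find and rfind are used, the -1 result has to be checked explicitly, otherwise the slice silently produces a wrong substring. A small sketch with a hypothetical helper named between:

```python
def between(s, start="_", end="."):
    # find/rfind return -1 instead of raising when the delimiter is missing
    i = s.find(start)
    j = s.rfind(end)
    if i == -1 or j == -1 or j < i + len(start):
        return None  # delimiters missing or in the wrong order
    return s[i + len(start):j]

print(between("verdata_new_hampshire.txt"))  # new_hampshire
print(between("no-delimiters-here"))  # None
```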
You can leverage str.split() on strings. For example:
s = 'verdata_florida.txt'
s.split('verdata_')
# ['', 'florida.txt']
s.split('verdata_')[1]
# 'florida.txt'
s.split('verdata_')[1].split('.txt')
# ['florida', '']
s.split('verdata_')[1].split('.txt')[0]
# 'florida'
You can just split the string by dot and underscore like this:
filename = "verdata_prague.txt"
name = filename.split(".")[0]  # verdata_prague
name = name.split("_")[1]  # prague
or use the replace method:
filename = "verdata_prague.txt"
name = filename.replace(".txt", "")  # verdata_prague
name = name.replace("verdata_", "")  # prague

Extract substrings from logical expressions

Let's say I have a string that looks like this:
myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
What I would like to obtain in the end would be:
myStr_l1 = '(Txt_l1) or (Txt2_l1)'
and
myStr_l2 = '(Txt_l2) or (Txt2_l2)'
Some properties:
all "Txt_"-elements of the string start with an uppercase letter
the string can contain much more elements (so there could also be Txt3, Txt4,...)
the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)
I found a way to get the first part done by using:
myStr_l1 = re.sub('\(\w+\)','',myStr)
which gives me
'(Txt_l1 ) or (Txt2_l1 )'
However, I don't know how to obtain myStr_l2. My idea was to remove everything between two open parentheses. But when I do something like this:
re.sub('\(w+\(', '', myStr)
the entire string is returned.
re.sub('\(.*\(', '', myStr)
removes - of course - far too much and gives me
'Txt2_l2))'
Does anyone have an idea how to get myStr_l2?
When there is an "and" instead of an "or", the strings look slightly different:
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
Then I can still use the command from above:
re.sub('\(\w+\)','',myStr2)
which gives:
'(Txt_l1 and Txt2_l1 )'
but I again fail to get myStr2_l2. How would I do this for these kind of strings?
And how would one then do this for mixed expressions with "and" and "or" e.g. like this:
myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))'
re.sub('\(\w+\)','',myStr3)
gives me
'(Txt_l1 and Txt2_l1 ) or (Txt3_l1 and Txt4_l1 )'
but again: How would I obtain myStr3_l2?
Regexp is not powerful enough for nested expressions (in your case: nested elements in parentheses). You will have to write a parser. Look at https://pyparsing.wikispaces.com/
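As a rough illustration of what such a parser could look like without any library, the sketch below walks the string once and groups whitespace-separated tokens by parenthesis depth. The function name tokens_by_depth is hypothetical, and it returns lists of names per level rather than rebuilding the expression; operators such as "or" and "and" are collected at whatever depth they appear.

```python
def tokens_by_depth(expr):
    """Group whitespace-separated tokens by parenthesis nesting depth."""
    levels = {}
    depth = 0
    token = []

    def flush():
        # Store the token collected so far under the current depth
        if token:
            levels.setdefault(depth, []).append("".join(token))
            token.clear()

    for ch in expr:
        if ch == "(":
            flush()
            depth += 1
        elif ch == ")":
            flush()
            depth -= 1
        elif ch.isspace():
            flush()
        else:
            token.append(ch)
    flush()
    return levels

levels = tokens_by_depth('(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))')
print(levels[1])  # ['Txt_l1', 'Txt2_l1']
print(levels[2])  # ['Txt_l2', 'Txt2_l2']
```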
I'm not entirely sure what you want, but I wrote this to strip everything between the parentheses.
import re
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')
noParens = []
for line in sets:
    mat = re.match(r'\((.* )\((.*\)\))', line, re.M)
    if mat:
        noParens.append(mat.group(1))
        noParens.append(mat.group(2).replace(')',''))
print(noParens)
This takes all the parentheses away and puts your elements in a list. Here's an alternative way of doing it without using regular expressions.
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
noParens = []
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')','')
mystr = mystr.replace('(','')
noParens = mystr.split()
print(noParens)
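If the nesting never goes deeper than the two levels shown in the question, the strings asked for can also be rebuilt with one pattern and a backreference. This is a sketch under assumptions: every level-1 name is followed by exactly one space and one parenthesised level-2 name, and both names consist only of \w characters.

```python
import re

myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
# group 1: level-1 name, group 2: the level-2 name in the parentheses after it
pair = re.compile(r'(\w+) \((\w+)\)')

myStr_l1 = pair.sub(r'\1', myStr)
myStr_l2 = pair.sub(r'\2', myStr)
print(myStr_l1)  # (Txt_l1) or (Txt2_l1)
print(myStr_l2)  # (Txt_l2) or (Txt2_l2)
```

The "and" form works the same way, because the operator is never directly followed by a complete parenthesised name and so is left in place.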
