How to delimit a string with multiple different delimiters - Python

I have a string (in Python) that contains a date, a model, a make, and a year, as shown below:
string = "Mar 17 1997 H569, CAT: 2022"
I want to write a program that asks the user to enter the string and then automatically prints something like:
date: data
model: data
make: data
year: data
The question is: how can I delimit the string, given that it contains spaces, commas, colons, etc.? If I split by character position, the problem is that not all makes and models have the same number of characters. In other words, how do I split a string on several different delimiters that are mixed together, in Python?

One option is to use a regex:
import re
regex = re.compile(r'(\w+ \d+ \d+) (\w+), (\w+): (\d+)')  # raw string so \w and \d reach the regex engine unchanged
string = "Mar 17 1997 H569, CAT: 2022"
regex.findall(string)
output: [('Mar 17 1997', 'H569', 'CAT', '2022')]
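As a minimal sketch of the same idea, named groups let the match be read back as a dictionary keyed by field. The function name parse_record and the group names are illustrative assumptions, and the pattern assumes the exact layout of the sample string.

```python
import re

# Same pattern as above, with each capture given a name (names are illustrative).
RECORD = re.compile(
    r"(?P<date>\w+ \d+ \d+) (?P<model>\w+), (?P<make>\w+): (?P<year>\d+)"
)

def parse_record(text):
    """Return the four fields as a dict, or raise ValueError on an unexpected layout."""
    match = RECORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected record layout: {text!r}")
    return match.groupdict()

for key, value in parse_record("Mar 17 1997 H569, CAT: 2022").items():
    print(f"{key}: {value}")
```

Using fullmatch rather than findall means a string that does not fit the layout is reported instead of silently producing an empty list.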

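If the fields are always separated by whitespace, str.split is enough, and the leftover punctuation can be stripped afterwards:
string = input('date :')  # e.g. "Mar 17 1997 H569, CAT: 2022"
month, day, dyear, model, make, year = string.split()
print(f'date:{day}/{month}/{dyear}')  # dyear is the year that belongs to the date
print(f'model:{model.strip(",")}')
print(f'make:{make.strip(":")}')
print(f'year:{year}')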
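Another way to handle several delimiters at once is re.split with a character class. The sketch below assumes the delimiters are whitespace, commas, and colons only; split_fields is a hypothetical helper name.

```python
import re

def split_fields(text):
    # Split on any run of whitespace, commas, or colons; drop empty pieces
    # that appear when the text starts or ends with a delimiter.
    return [part for part in re.split(r"[\s,:]+", text) if part]

month, day, dyear, model, make, year = split_fields("Mar 17 1997 H569, CAT: 2022")
print(f"date: {month} {day} {dyear}")
print(f"model: {model}")
print(f"make: {make}")
print(f"year: {year}")
```

Because a run of delimiters such as ", " counts as one separator, no stripping of leftover punctuation is needed afterwards.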

Related

How to find words that correspond to months and replace them with numbers?

How can I find the words that correspond to months ("January, February, March, ...") and replace them with numbers ("01, 02, 03, ...")?
I tried the code below:
import re
def transformMonths(string):
    rep = [("May", "05"), ("June", "06")]
    for pat, repl in rep:
        s = re.sub(pat, repl, string)
    return s
print(transformMonths('I was born on June 24 and my sister was born on May 17'))
My code gives this result: 'I was born on 06 24 and my sister was born on May 17'
However, I want the output to be: 'I was born on 06 24 and my sister was born on 05 17'
You are performing the replacement on the initial (unmodified) string at each iteration, so only the last month name in the list ends up replaced. You can fix that by assigning to string instead of s in the loop (and returning string at the end).
Note that your approach does not require a regular expression and could use a simple string replace: string = string.replace(pat,repl).
In both cases, because the replacement does not take into account word boundaries, the function would replace partial words such as:
"Mayor Smith was elected on May 25" --> "05or Smith was elected on 05 25".
You can fix that in your regular expression by adding \b before and after each month name. This will ensure that the month names are only found if they are between word boundaries.
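A minimal sketch of the loop with both fixes applied (reassigning the working string and adding \b around each name) might look like this. The name transform_months and the two-entry list simply mirror the question.

```python
import re

def transform_months(text):
    # Only two months are listed, mirroring the question; extend as needed.
    replacements = [("May", "05"), ("June", "06")]
    for name, number in replacements:
        # \b on both sides keeps "May" from matching inside "Mayor".
        # Reassigning text carries each replacement into the next iteration.
        text = re.sub(r"\b" + re.escape(name) + r"\b", number, text)
    return text

print(transform_months("Mayor Smith was elected on May 25"))
# prints: Mayor Smith was elected on 05 25
```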
re.sub can perform multiple replacements with varying values if you give it a function instead of a fixed string. So you can build a combined regular expression that finds all the months and replaces the words that are found using a dictionary:
import re
def numericMonths(string):
    months = {"January":"01", "February":"02", "March":"03", "April":"04",
              "May":"05", "June":"06", "July":"07", "August":"08",
              "September":"09", "October":"10", "November":"11", "December":"12"}
    pattern = r"\b(" + "|".join(months) + r")\b"  # all months as distinct words
    return re.sub(pattern, lambda m: months[m.group()], string)
Output:
numericMonths('I was born on June 24 and my sister was born on May 17')
'I was born on 06 24 and my sister was born on 05 17'

Compare two strings and Extract value of variable data in Python

In my Python script, I have a list of strings like:
birth_year = ["my birth year is *","i born in *","i was born in *"]
I want to compare an input sentence against the list above and get the birth year as output.
The input sentences look like:
Example1: My birth year is 1994.
Example2: I born in 1995
The output should be:
Example1: 1994
Example2: 1995
I tried many approaches using regex, but I couldn't find a solution that works.
If you change birth_year to a list of regexes you could match more easily with your input string. Use a capturing group for the year.
Here's a function that does what you want:
import re
def match_year(birth_year, text):
    for s in birth_year:
        m = re.search(s, text, re.IGNORECASE)
        if m:
            # keep whatever precedes the match (e.g. "Example1: ") and append the captured year
            output = f'{text[:m.start(0)]}{m[1]}'
            print(output)
            break
Example:
birth_year = [r"my birth year is (\d{4})", r"i born in (\d{4})", r"i was born in (\d{4})"]
match_year(birth_year, "Example1: My birth year is 1994.")
match_year(birth_year, "Example2: I born in 1995")
Output:
Example1: 1994
Example2: 1995
You need at least Python 3.6 for f-strings.
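If you would rather keep the original "*" templates, one possible sketch converts them to regexes automatically. The helper names templates_to_patterns and extract_year are hypothetical, and the code assumes "*" always stands for a four-digit year.

```python
import re

def templates_to_patterns(templates):
    """Turn templates such as "i born in *" into compiled, case-insensitive regexes."""
    patterns = []
    for template in templates:
        # Escape the literal text, then swap the escaped "*" for a year capture group.
        source = re.escape(template).replace(r"\*", r"(\d{4})")
        patterns.append(re.compile(source, re.IGNORECASE))
    return patterns

def extract_year(patterns, sentence):
    """Return the first captured year as a string, or None if no template matches."""
    for pattern in patterns:
        match = pattern.search(sentence)
        if match:
            return match.group(1)
    return None

birth_year = ["my birth year is *", "i born in *", "i was born in *"]
patterns = templates_to_patterns(birth_year)
print(extract_year(patterns, "My birth year is 1994."))  # prints: 1994
print(extract_year(patterns, "I born in 1995"))          # prints: 1995
```

Returning the year instead of printing it leaves the caller free to format the output or handle the no-match case.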
str1 = "My birth year is 1994."
str2 = str1.replace('My birth year is ', '')
You can try something like this and replace the unnecessary text with an empty string.
For the code you shared, you could do something like:
for x in examples:
    for y in birth_year:
        if x.find(y) != -1:  # find() returns -1 when the substring is absent
            x = x.replace(y, '')  # str.replace returns a new string, so reassign it
I think the above code might work, provided the entries in birth_year are plain prefixes (without the trailing "*") and match the case of the input.
If you can guarantee that those strings always contain exactly one four-digit number, which is the year of birth, I'd say just use a regex to grab the four digits surrounded by non-digits. Rather dumb, but hey, it works with your data.
import re
examples = ["My birth year is 1993.", "I born in 1995", "я родился в 1976м году"]
for s in examples:  # s rather than str, to avoid shadowing the built-in
    y = int(re.findall(r"^[^\d]*([\d]{4})[^\d]*$", s)[0])
    print(y)
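A slightly more defensive variant of the same "any four digits" idea is sketched below. It uses lookarounds so that other digits elsewhere in the sentence do not break the match, and it returns None instead of raising IndexError when nothing is found. The name find_year is an assumption for illustration.

```python
import re

# Exactly four digits that are not part of a longer run of digits.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")

def find_year(text):
    """Return the first standalone four-digit number as an int, or None."""
    match = FOUR_DIGITS.search(text)
    return int(match.group()) if match else None

examples = ["My birth year is 1993.", "I born in 1995", "я родился в 1976м году"]
for sentence in examples:
    print(find_year(sentence))
```

Lookarounds are used instead of \b because a letter directly after the digits, as in "1976м", is a word character and would not form a word boundary.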

I am confused about how to replace a sentence with 're.sub' for this particular problem

I have trouble with changing this particular string with re.sub:
string = "
Name: Carolyn\r\n
Age : 20\r\n
Hobby: skiing, diving\r\n"
Is there a way to easily replace, for example, Hobby: skiing, diving\r\n with Hobby: swimming, reading\r\n?
This assumes you want to replace whatever follows Hobby:, not just skiing and diving specifically. One option is to match the whole line, capture Hobby: in a capture group, and replace the line with the capture plus the replacement text. You can use re.M to switch to multiline mode, which lets $ match at the end of each line rather than only at the end of the string.
import re
string = '''
Name: Carolyn
Age : 20
Hobby: skiing, diving
'''
print(re.sub(r'(Hobby: ).*$', r'\1swimming, reading', string, flags=re.M))
Result:
Name: Carolyn
Age : 20
Hobby: swimming, reading
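One caveat if the real data uses \r\n line endings, as in the question: . matches \r, so .*$ would swallow the carriage return along with the old value. A minimal sketch that stops before either line-ending character is shown below; replace_field is a hypothetical helper, and it assumes each field sits on its own line in "Name: value" form.

```python
import re

def replace_field(text, field, new_value):
    """Replace the value after "field:" on its line, leaving the line ending untouched."""
    pattern = re.compile(
        r"^(" + re.escape(field) + r"[ \t]*:[ \t]*)[^\r\n]*",
        re.MULTILINE,  # let ^ match at the start of every line
    )
    # A function replacement avoids backslash interpretation in new_value.
    return pattern.sub(lambda m: m.group(1) + new_value, text)

record = "Name: Carolyn\r\nAge : 20\r\nHobby: skiing, diving\r\n"
print(repr(replace_field(record, "Hobby", "swimming, reading")))
# prints: 'Name: Carolyn\r\nAge : 20\r\nHobby: swimming, reading\r\n'
```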

How to find a regex pattern for a word within another string in Python?

I have a string like this:
string = '4 167213860 Mar 7 2017 10:37:42 +00:00 c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5'
I want to extract only one part of it (c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5).
I am looking for a regex pattern, but I can't find one. I tried something like this in Python:
import re
string = '4 167213860 Mar 7 2017 10:37:42 +00:00 c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5'
output = re.findall(r'[a-z0-9]rsp[a-zA-Z0-9_-]+$', string)
This returns [].
If someone can help me, I would be very grateful.
Use a regex that matches all adjacent non-whitespace characters at the end of the string: \S+$
string = '4 167213860 Mar 7 2017 10:37:42 +00:00 c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5'
output = re.findall(r'\S+$',string)
Working example: https://regex101.com/r/lXFRNT/1
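If the wanted part is always the last whitespace-separated field, a regex is not strictly required. The sketch below uses str.rsplit, which also tolerates trailing whitespace that would stop \S+$ from matching; last_field is a hypothetical helper name.

```python
def last_field(line):
    """Return the last whitespace-separated field of line, or "" if there is none."""
    # With sep=None, rsplit ignores leading and trailing whitespace.
    parts = line.rsplit(None, 1)
    return parts[-1] if parts else ""

string = '4 167213860 Mar 7 2017 10:37:42 +00:00 c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5'
print(last_field(string))
# prints: c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5
```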
@Ruzhim's answer is good, but if you want to keep doing it the way you had in mind, you could just replace the "rsp" part with \w+:
output = re.findall(r'[a-z0-9]\w+[a-zA-Z0-9_-]+$', string)
['c7600rsp72043-advipservicesk9-mz-obs_v151_3_s1_RLS10_ES5']

Python string split without common delimiter

I am fairly new to Python. An external simulation software I use gives me reports which include data in the following format:
1    29 Jan 2013 07:33:19.273    29 Jan 2013 09:58:10.460          8691.186
I am looking to split the above data into four strings, namely:
'1', '29 Jan 2013 07:33:19.273', '29 Jan 2013 09:58:10.460', '8691.186'
I cannot use str.split, since it splits the dates into multiple strings. There appear to be four whitespace characters between 1 and the first date, and between the first and second dates. I don't know whether these are four spaces or tabs.
Using '\t' as a delimiter on split doesn't do much. If I specify '    ' (4 spaces) as a delimiter, I get the first three strings. I then also get an empty string, and leading spaces in the final string. There are 10 spaces between the second date and the number.
Any suggestions on how to deal with this would be very helpful!
Thanks!
You can split on more than one space with a simple regular expression:
import re
multispace = re.compile(r'\s{2,}') # 2 or more whitespace characters
fields = multispace.split(inputline)
Demonstration:
>>> import re
>>> multispace = re.compile(r'\s{2,}') # 2 or more whitespace characters
>>> multispace.split('1    29 Jan 2013 07:33:19.273    29 Jan 2013 09:58:10.460          8691.186')
['1', '29 Jan 2013 07:33:19.273', '29 Jan 2013 09:58:10.460', '8691.186']
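Once the line is split, the fields can be converted to typed values. The following is a sketch under assumptions: the first field is an integer index, the last is a plain number, and the month abbreviations are English (the %b directive depends on the current locale). parse_report_line is a hypothetical name.

```python
import re
from datetime import datetime

MULTISPACE = re.compile(r"\s{2,}")        # 2 or more whitespace characters
TIMESTAMP_FORMAT = "%d %b %Y %H:%M:%S.%f"  # e.g. "29 Jan 2013 07:33:19.273"

def parse_report_line(line):
    """Split one report line and convert each field to a typed value."""
    index, start, stop, value = MULTISPACE.split(line.strip())
    return (
        int(index),
        datetime.strptime(start, TIMESTAMP_FORMAT),
        datetime.strptime(stop, TIMESTAMP_FORMAT),
        float(value),
    )

line = "1    29 Jan 2013 07:33:19.273    29 Jan 2013 09:58:10.460          8691.186"
index, start, stop, value = parse_report_line(line)
print(index, start.isoformat(), stop.isoformat(), value)
```

Unpacking into exactly four names raises ValueError if a line does not split into four fields, which makes malformed lines easy to spot.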
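If the data is fixed-width, you can slice the string by character position:
# these offsets assume single spaces between the fields; adjust them to the real column widths
n = line[0]
d1 = line[2:26]
d2 = line[27:51]
value = line[52:]
However, if Jan 02 is shown as Jan 2, this may not work, because the width of the fields would then vary.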
