Best way to extract the Date from a string [duplicate] - python

This question already has answers here:
Python/Regex - How to extract date from filename using regular expression?
(5 answers)
Closed 2 years ago.
I am trying to extract the date from a string. I used to be able to just pull the entire line, but the company sending the data keeps adding characters to the front and back of the date, which breaks my code until I fix it. My searches turn up mixed advice on whether I should use regex or the datetime module. Here is what I am currently using, which, as you can see, is cumbersome and inefficient.
line = ' .10/10/2020<=x'
date = line.strip().replace('.', '').replace('<', '').replace('=', '').replace('x', '')
edit:
I ended up taking Yash's regex and it worked perfectly.

Why not extract it with a regex? Note that this only works for the xx/xx/xxxx format; the pattern needs to change if other date formats appear.
import re
line=' .10/10/2020<=x'
a=re.search("([0-9]{2}/[0-9]{2}/[0-9]{4})", line)
print(a.group(1))
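Since the question also mentions the datetime module, here is a minimal sketch of combining the two: the regex finds the candidate and datetime.strptime checks that it is a real date. The function name extract_date and the "%m/%d/%Y" format are assumptions for illustration; the original post does not say whether the data is month-first or day-first.

```python
import re
from datetime import date, datetime
from typing import Optional

# Two digits, slash, two digits, slash, four digits (the xx/xx/xxxx shape).
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def extract_date(line: str, fmt: str = "%m/%d/%Y") -> Optional[date]:
    """Return the first xx/xx/xxxx date in `line`, or None if there is none.

    Assumption: month-first format; pass fmt="%d/%m/%Y" for day-first data.
    """
    match = DATE_PATTERN.search(line)
    if match is None:
        return None
    try:
        # strptime rejects impossible dates such as 99/99/2020.
        return datetime.strptime(match.group(0), fmt).date()
    except ValueError:
        return None


print(extract_date(" .10/10/2020<=x"))  # 2020-10-10
```

Returning None instead of raising keeps the caller in control when the sender changes the line again.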

Related

Split python string in a specific way [duplicate]

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?
There are several ways to do what you are looking for, but regular expressions are probably the most direct.
For example, in the case you gave:
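import re

def subtract_code_from(sentence: str) -> str:
    # Capture the digits between the two exclamation marks.
    m = re.search(r'!(\d+)!', sentence)
    # group(1) is the captured code; group(0) would include the '!' delimiters.
    return m.group(1)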
Keep in mind that this is a quick, loose solution I put together in five minutes. I don't know what other cases your sentences may contain, so it is up to you to adapt the regex to cover them.
I encourage you to follow this tutorial. And you can use this website to build your regexes.
Good luck.
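As a rough sketch of one such case, here is a variant that returns None when the sentence has no !digits! marker instead of failing on the missing match. The name extract_code is hypothetical and not from the original answer.

```python
import re
from typing import Optional


def extract_code(sentence: str) -> Optional[str]:
    """Return the digits between two '!' characters, or None if absent."""
    m = re.search(r"!(\d+)!", sentence)
    # group(1) holds only the captured digits, without the delimiters.
    return m.group(1) if m else None


a = "This is an example string that has a code !3377! this is the code I want to extract"
print(extract_code(a))               # 3377
print(extract_code("no code here"))  # None
```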

Regex working in text editor(sublime) but not in python [duplicate]

This question already has answers here:
Case insensitive regular expression without re.compile?
(10 answers)
Closed 2 years ago.
I want to extract the line using regex.
The line that I want to extract from document is:
":method":"POST",":path":"/api/browser/projects/8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy
The regex I am using is:
":method"[:"a-z,/\d-]{20,1000}/copy
The code for the same in python is:
re.findall('":method"[:"a-z,/\d-]{20,1000}/copy', str(s), re.MULTILINE)
It works perfectly in Sublime Text, but in Python it returns an empty list. How can I resolve this?
You need the i (case-insensitive) modifier, which makes [a-z] match uppercase letters as well. In Python, pass re.IGNORECASE (or re.I) as the flag.
Without it, how would POST match?
Alternatively, use ":method"[:"a-zA-Z,/\d-]{20,1000}/copy
See demo
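A small sketch of the difference, using the line from the question as the input. The variable names are only for illustration.

```python
import re

# The line from the question, used as sample input.
s = ('":method":"POST",":path":"/api/browser/projects/'
     '8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy')

# Raw string so that \d reaches the regex engine unchanged.
pattern = r'":method"[:"a-z,/\d-]{20,1000}/copy'

# Case-sensitive: the uppercase POST is not covered by [a-z], so nothing matches.
case_sensitive = re.findall(pattern, s)

# Case-insensitive: [a-z] now also accepts uppercase letters.
case_insensitive = re.findall(pattern, s, re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # a list holding the whole line
```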

Removing two non printable characters from a string in python [duplicate]

This question already has answers here:
Removing control characters from a string in python
(9 answers)
Closed 2 years ago.
I am getting a text like below by reading a word file
Exe Command\r\x07
My desired text is
Exe Command
I tried this solution but it gives me
Exe Command\r
How can I remove these two escape characters? I would like a fast solution because I have thousands of inputs like this.
You can use replace() method twice.
In [1]: myStr.replace("\r", "").replace("\x07", "")
Out[1]: 'Exe Command'
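If this isn't working, the text may contain the literal backslash sequences rather than the control characters themselves; in that case, try raw strings: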
In [1]: myStr.replace(r"\r", "").replace(r"\x07", "")
Out[1]: 'Exe Command'
EDIT: As per comment, for removing any of those control characters, use this post's solution.
import unicodedata

def remove_control_characters(s):
    # Unicode categories starting with "C" are control/format/other characters.
    return "".join(ch for ch in s if unicodedata.category(ch)[0] != "C")
All credit for this solution goes to Alex Quinn.
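Since the question asks for something quick over thousands of inputs, another option worth trying is str.translate with a deletion table, which removes both characters in a single pass. This is only a sketch: the names DROP_TABLE and strip_cr_bel are made up here, and it assumes \r and \x07 are the only characters to drop.

```python
# Translation table that deletes carriage return (\r) and BEL (\x07).
# Assumption: these two are the only characters that need removing.
DROP_TABLE = str.maketrans("", "", "\r\x07")


def strip_cr_bel(text: str) -> str:
    """Return `text` with every \\r and \\x07 character removed."""
    return text.translate(DROP_TABLE)


print(repr(strip_cr_bel("Exe Command\r\x07")))  # 'Exe Command'
```

The table is built once and reused, so each input costs one translate call.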

findall string that starts with letter "CU" and return full string [duplicate]

This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 3 years ago.
This seems like a straightforward thing, but I could not find an appropriate SO answer.
I have a column called title which contains strings. I want to find the rows that start with the letters "CU".
I've tried using df.loc, but it gives me an IndexError.
Using regex, re.findall(r'^CU', string)
returns 'CU' instead of the full name, e.g. 'CU abcd'. How can I get the full name that starts with 'CU'?
EDIT: Sorry, I did not notice this was a duplicate question; the problem was solved by reading the duplicate.
You can try:
string.startswith("CU")
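A short sketch of both routes on a plain list of strings; the sample titles are made up. It shows startswith as a filter, and why the regex from the question returned only 'CU': the pattern has to consume the rest of the string to return it. On a DataFrame column, the counterpart of the first approach would be df[df["title"].str.startswith("CU")].

```python
import re

# Hypothetical sample data standing in for the title column.
titles = ["CU abcd", "XY efgh", "CU ijkl", "ACU mnop"]

# Keep only the titles that begin with "CU".
cu_titles = [t for t in titles if t.startswith("CU")]
print(cu_titles)  # ['CU abcd', 'CU ijkl']

# With regex, add .* so the match extends to the end of the string.
regex_hits = re.findall(r"^CU.*", "CU abcd")
print(regex_hits)  # ['CU abcd']
```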

get strings between 2 delimiter in python [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I would like to get, from the following string "/path/to/%directory_1%/%directory_2%.csv"
the following list: [directory_1, directory_2]. I would like to avoid splitting my string on "%". I was hoping to find a regex that could help me, but I cannot find the correct one.
For now, I have the following:
re.findall('%(.*)%', dirty_arg)
which outputs ["directory_1%/%directory_2"]
Do you have any recommendation about that?
Thank you very much for your help.
Try this:
import re
regex = r"%(.*?)%"
dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"
print(re.findall(regex, dirty_arg))
I've added ? to your regex, which makes the * quantifier lazy (non-greedy), so it matches as few characters as possible. The output of this code is ['directory_1', 'directory_2']
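To make the difference concrete, here is a small side-by-side sketch. The negated character class in the third pattern is an alternative I am adding for comparison, not part of the answer above; it assumes the names themselves never contain "%".

```python
import re

dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"

# Greedy: .* runs to the last '%' it can reach.
greedy = re.findall(r"%(.*)%", dirty_arg)

# Lazy: .*? stops at the first closing '%'.
lazy = re.findall(r"%(.*?)%", dirty_arg)

# Alternative: match anything except '%' between the delimiters.
negated = re.findall(r"%([^%]*)%", dirty_arg)

print(greedy)   # ['directory_1%/%directory_2']
print(lazy)     # ['directory_1', 'directory_2']
print(negated)  # ['directory_1', 'directory_2']
```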
