Splitting text from number in python using pandas [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a string
x='125mg'
First, I want to detect whether the number and text are joined together, and if they are, I want to separate them into 125 and mg.

Try this:
import re
a = '125mg switch'
' '.join(re.findall(r'[A-Za-z]+|\d+', a))
Output:
'125 mg switch'
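The question also asks how to *detect* whether the number and text are joined. A minimal sketch, assuming a "joined" token is exactly one run of digits immediately followed by one run of letters, uses re.fullmatch so that only the whole token counts. split_number_unit is a hypothetical helper name. If the values sit in a pandas column, the same two-group pattern can be passed to Series.str.extract, which returns one column per capture group.

```python
import re

# Assumption: a joined token is one run of digits directly followed by
# one run of letters, like '125mg'. Anything else is treated as not joined.
JOINED = re.compile(r"(\d+)([A-Za-z]+)")

def split_number_unit(token):
    """Hypothetical helper: return (number, unit) if token is digits+letters, else None."""
    match = JOINED.fullmatch(token)
    if match is None:
        return None  # number and text are not joined in this token
    return match.group(1), match.group(2)

print(split_number_unit("125mg"))   # ('125', 'mg')
print(split_number_unit("mg"))      # None
print(split_number_unit("125 mg"))  # None: already separated by a space
```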

This is a very simple task in Python using the regular expression module, re. Here is code that extracts the number from the string:
Python code:
import re
a = '125mg'
result = re.findall(r'\d+', a)  # list of every run of digits: ['125']
for i in result:
    print(i)
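The loop above only returns the digits. With \d+ alone, a decimal dose such as 2.5mg would come back as '2' and '5'. If such inputs are possible (an assumption: the question only shows 125mg), an optional fractional part keeps the number whole, and float() converts it:

```python
import re

# Assumption: doses may carry a decimal part, e.g. '2.5mg'. The optional
# group (?:\.\d+)? keeps '2.5' together instead of returning '2' and '5'.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+")

for text in ["125mg", "2.5mg", "0.25ml"]:
    number, unit = TOKEN.findall(text)  # assumes exactly one number and one unit
    print(float(number), unit)
```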

You can do it with regular expressions in Python (the re module). I don't know whether pandas can do that.
Read more about it at this link.
import re
test_str = "125mg"
res = re.findall(r'[A-Za-z]+|\d+', test_str)
print(res)  # ['125', 'mg']
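Instead of listing the pieces to keep, you can also split at the boundary itself. In this sketch the pattern uses lookarounds to match the empty position after a digit and before a letter, so re.split cuts there and leaves spaces and other text untouched. A result with more than one element also tells you that a number and text were joined.

```python
import re

# Zero-width match between a digit and a letter.
# Splitting on empty matches requires Python 3.7 or newer.
BOUNDARY = re.compile(r"(?<=\d)(?=[A-Za-z])")

print(BOUNDARY.split("125mg"))         # ['125', 'mg']
print(BOUNDARY.split("125mg switch"))  # ['125', 'mg switch']
print(BOUNDARY.split("mg"))            # ['mg'] (nothing to split)
```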

Related

Splitting a line of complex string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need to split a line that is quite complex for me. The line I want to split is as follows:
2019.10.20-22.01.33: '10.11.111.25 9999995555884411:TechnoBeceT(69)' logged in
How can I split it like this?
['2019.10.20-22.01.33', '10.11.111.25', '9999995555884411', 'logged in']
I don't need the TechnoBeceT(69) part.
Using a regular expression:
import re
p = re.compile(r'(([\d\.-]+)(?::|\s)|(logged in))')
s = "2019.10.20-22.01.33: '10.11.111.25 9999995555884411:TechnoBeceT(69)' logged in"
q = [x[1] or x[2] for x in p.findall(s)]
print(q)
Output
['2019.10.20-22.01.33', '10.11.111.25', '9999995555884411', 'logged in']
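A variant that describes the whole line instead of scanning for fragments, assuming every line has the same shape as the single example shown. Named groups make it clear which piece is which, and the name part is matched but not captured:

```python
import re

# Assumed line shape (taken from the one example in the question):
#   <timestamp>: '<ip> <user id>:<name>' <event>
LINE_RE = re.compile(
    r"(?P<timestamp>[\d.-]+):\s*'"   # 2019.10.20-22.01.33: '
    r"(?P<ip>[\d.]+)\s+"             # 10.11.111.25
    r"(?P<user_id>\d+):[^']*'\s*"    # 9999995555884411:TechnoBeceT(69)' (name discarded)
    r"(?P<event>.+)"                 # logged in
)

s = "2019.10.20-22.01.33: '10.11.111.25 9999995555884411:TechnoBeceT(69)' logged in"
m = LINE_RE.fullmatch(s)
if m:
    print(list(m.group("timestamp", "ip", "user_id", "event")))
```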
It looks like you just need to split on ' ', ':' and 'TechnoBeceT(69)' with an appropriate regex. This existing question is probably what you need: Split string with multiple delimiters in Python
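Following that suggestion, here is a sketch with re.split and several delimiters. Instead of hard-coding TechnoBeceT(69), it assumes the name always starts with a letter, contains no quote, and is followed by the closing quote:

```python
import re

s = "2019.10.20-22.01.33: '10.11.111.25 9999995555884411:TechnoBeceT(69)' logged in"

# Two kinds of delimiter:
#   1. ':' + name + closing quote (+ spaces), e.g. ":TechnoBeceT(69)' "
#   2. any run of ':', quote and space characters
# maxsplit=3 stops before the space inside "logged in".
parts = re.split(r":[A-Za-z][^']*'\s*|[:' ]+", s, maxsplit=3)
print(parts)
```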

Easy way to extract multiple dates from string without spaces? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am looking for a way to automatically extract dates from a string in which they follow each other without a delimiter.
For example, my string is: \n-\n24-04-201923-04-201922-04-201921-04-201920-04-201919-04-201918-04-2019
How can I get this output:
24-04-2019
23-04-2019
22-04-2019
21-04-2019
20-04-2019
19-04-2019
18-04-2019
Any help would be appreciated!
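Given that the dates are all the same length (10 characters), you can remove the prefix and then use textwrap:
import textwrap
my_string = my_string.replace('\n', '').lstrip('-')  # drop the newlines and the leading '-'
print(textwrap.wrap(my_string, 10, break_on_hyphens=False))
Note that strip() alone is not enough here: it only removes leading and trailing whitespace, so the '-' and the \n after it would remain and shift every line. break_on_hyphens=False stops textwrap from breaking a line at one of the hyphens inside a date.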
You can also use this code:
string = '\n-\n24-04-201923-04-201922-04-201921-04-201920-04-201919-04-201918-04-2019'
newStr = string[3:]  # skip the 3-character prefix '\n-\n'
for char in range(0, len(newStr), 10):  # each date is 10 characters long
    print(newStr[char:char+10])
Here's the output:
24-04-2019
23-04-2019
22-04-2019
21-04-2019
20-04-2019
19-04-2019
18-04-2019
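Both answers depend on the exact layout (the fixed width of each date, and a known prefix). A regex that describes the date itself, assuming the dd-mm-yyyy format shown, ignores whatever comes before the first date; datetime.strptime can then optionally validate and convert each match:

```python
import re
from datetime import datetime

string = '\n-\n24-04-201923-04-201922-04-201921-04-201920-04-201919-04-201918-04-2019'

# Assumes every date is written dd-mm-yyyy. The leading '\n-\n' never matches
# the pattern, so no slicing or stripping is needed.
dates = re.findall(r"\d{2}-\d{2}-\d{4}", string)
for d in dates:
    print(d)

# Optional: strptime raises ValueError for impossible dates such as 31-02-2019.
parsed = [datetime.strptime(d, "%d-%m-%Y").date() for d in dates]
```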

Regular expression operations in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
How do I get what I want? For example, I have a string like this:
'RC00001 C00003_C00004RC00087 C00756_C01545RC01045 C06756_C03485'
I want to get:
'RC00001 C00003_C00004','RC00087 C00756_C01545','RC01045 C06756_C03485'
What should I do? I have tried many times, but failed. Please help me! Thank you!
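answer = []
a = "RC00001 C00003_C00004RC00087 C00756_C01545RC01045 C06756_C03485"
b = a.split("RC")  # the first element is the empty string before the first "RC"
for i in b[1:]:
    answer.append("RC%s" % i)  # put back the "RC" that split() removed
print(answer)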
This will output:
['RC00001 C00003_C00004', 'RC00087 C00756_C01545', 'RC01045 C06756_C03485']
If you want to achieve this using a regex, you could try the following:
import re
input_str = 'RC00001 C00003_C00004RC00087 C00756_C01545RC01045 C06756_C03485'
pattern = r'RC\d+\s+C\d+_C\d+'
print(re.findall(pattern, input_str))
# output
# ['RC00001 C00003_C00004', 'RC00087 C00756_C01545', 'RC01045 C06756_C03485']
This assumes the format is always RC{numbers} C{numbers}_C{numbers}.
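A third option combines the two ideas: split, but with a zero-width lookahead so that the RC prefix is kept and does not have to be added back. This sketch assumes RC only ever appears at the start of a record:

```python
import re

a = "RC00001 C00003_C00004RC00087 C00756_C01545RC01045 C06756_C03485"

# Split at every position immediately followed by "RC". The lookahead consumes
# nothing, so "RC" stays at the start of each piece (requires Python 3.7+).
# The split at position 0 yields an empty first element, which is filtered out.
answer = [part for part in re.split(r"(?=RC)", a) if part]
print(answer)
```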

How to match the pattern using re using the commas? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
How do I match the following pattern using re?
2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490
I used Python's split() function to separate the attributes, but because the data is huge, the process gets killed due to memory errors.
It would help if you posted a longer sample of the string.
So how can you do it? Here is one answer:
import re
line = "2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490"
pattern = re.compile(r"(.*?),", re.DOTALL)  # re.DOTALL lets '.' match newlines too
result = pattern.findall(line)  # every field that is followed by a comma
# The last field (2584490) has no trailing comma, so the pattern misses it;
# take whatever follows the final comma instead.
result.append(line.rsplit(",", 1)[-1])
print(result)
It works...
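On the memory errors mentioned in the question: switching from split() to a regex is unlikely to fix that on its own. Assuming the records come from a file and the whole file was being read into memory (the question does not show this), iterating over the file lets the csv module parse one row at a time. In this sketch io.StringIO stands in for the file, and data.csv is a hypothetical file name:

```python
import csv
import io

# io.StringIO stands in for a real file. With a file on disk you would write
#   with open("data.csv", newline="") as f:   # hypothetical file name
# and pass f to csv.reader instead.
sample = io.StringIO(
    "2016-02-13 02:00:00.0,3525,"
    "http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,"
    "158,0,2584490\n"
)

rows_seen = 0
for row in csv.reader(sample):  # yields one list of fields per line, lazily
    rows_seen += 1
    print(row)                  # process the row here instead of storing it
```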

Newbie need Help python regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Using a regex, I want to get all the number sequences:
1168577519
1168594403
I have never dealt with regex before, but this time I need it for some parsing.
So far I can only get the sequence after "aid" and the one after "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print(pattern.findall(s))
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print(pattern.findall(s))
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following solves your exact example but could fail on differently styled input. You would need to provide more details, but this is a start:
import re
s = 'aid: "1168577519", cmt_id = 1168594403;'
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
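To merge the two patterns from the question into one, as asked, the lookbehinds can be joined with |. Each lookbehind has a fixed width, which Python's re module requires. This sketch assumes the values are plain digit runs:

```python
import re

s = 'aid: "1168577519", cmt_id = 1168594403;'

# Either alternative matches a run of digits, as long as it is preceded by
# 'aid: "' or by 'cmt_id = ' (both lookbehinds are fixed width).
pattern = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')
print(pattern.findall(s))  # ['1168577519', '1168594403']
```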
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
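If you also need to know which number belongs to which key, findall with two groups returns (key, value) pairs that dict() can consume. A sketch assuming keys are word characters followed by ':' or '=', and values are digits that may be wrapped in double quotes:

```python
import re

string = 'aid: "1168577519", cmt_id = 1168594403;'

# (\w+) = key, [:=] = separator, "? = optional opening quote, (\d+) = value
pairs = re.findall(r'(\w+)\s*[:=]\s*"?(\d+)', string)
values = dict(pairs)
print(values)  # {'aid': '1168577519', 'cmt_id': '1168594403'}
```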
