Different outcome with the same regular expression in Python - python

I am just learning Python and regular expressions, and I have run into a problem I can't explain. Any help or recommendations would be appreciated.
import re
m = " text1 \abs{foo} text2 text3 \L{E.F} text4 "
def separate_mycmd(cmd,m):
    math_cmd = re.compile(cmd)
    math = math_cmd.findall(m)
    text = math_cmd.split(m)
    return(math,text)
(math,text) = separate_mycmd(r'\abs',m)
print math # ['\x07bs']
print text # [' text1 ', '{foo} text2 text3 \\L{E.F} text4 ']
(math,text) = separate_mycmd(r'\L',m)
print math # **Question:** Just ['L'] and not ['\\L'] or ['\L']
print text # [' text1 \x07bs{foo} text2 text3 \\', '{E.F} text4 ']
# **Question:** why the \\ after text3 ?
I don't understand the output from the last call. My related questions are in the comments.
Thanks in advance,
Ulrich

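You want to match on "\\L", not \L. In a regular expression a backslash escapes the next character, and Python 2's re module treats an unknown escape such as \L as a plain L, so the pattern r'\L' matches only the letter. The backslash in front of it is left at the end of the first piece of the split, and print shows it as \\ because a list displays its items with repr(). Escaping the backslash in the pattern gives you: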
> (math,text) = separate_mycmd(r'\\L',m)
> print math
['\\L']
> print text
[' text1 \x07bs{foo} text2 text3 ', '{E.F} text4 ']
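You probably want r'\\abs' for the first command as well. The '\x07bs' result comes from \a being an escape in both places: in the normal string literal m, "\a" becomes the bell character (\x07), and in the pattern r'\abs' the regex escape \a also matches the bell character. You probably also want a literal backslash in the string you're searching, giving: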
m = " text1 \\abs{foo} text2 text3 \\L{E.F} text4 "
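For anyone hitting this on Python 3: since Python 3.6, re.compile(r'\L') raises re.error ("bad escape \L") instead of silently matching "L", so the output in the question is specific to Python 2. A minimal Python 3 sketch of the same idea, assuming the goal is to match the command text literally: a raw string keeps the backslashes in the searched text, and re.escape builds the pattern instead of escaping by hand.

```python
import re

# Raw string literal: "\abs" and "\L" stay as a backslash plus letters,
# instead of "\a" turning into the bell character "\x07".
m = r" text1 \abs{foo} text2 text3 \L{E.F} text4 "

def separate_mycmd(cmd, m):
    # re.escape turns the literal text "\L" into the pattern "\\L", so the
    # backslash is matched as a character rather than starting an escape.
    math_cmd = re.compile(re.escape(cmd))
    return math_cmd.findall(m), math_cmd.split(m)

math, text = separate_mycmd(r"\abs", m)
print(math)  # ['\\abs']
print(text)  # [' text1 ', '{foo} text2 text3 \\L{E.F} text4 ']

math, text = separate_mycmd(r"\L", m)
print(math)  # ['\\L']
print(text)  # [' text1 \\abs{foo} text2 text3 ', '{E.F} text4 ']
```

The doubled backslashes in the printed lists are only repr() output; each item contains a single backslash.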

Related

Pandas: create a function to delete links

I need a function to delete links from the oldText column (more than 1000 rows) of a pandas DataFrame.
I've created it using regex, but it doesn't work. This is my code:
import re
def remove_links(text):
    text = re.sub(r'http\S+', '', text)
    text = text.strip('[link]')
    return text
df['newText'] = df['oldText'].apply(remove_links)
I get no error, but the code does nothing.
Your code works for me:
CSV:
oldText
https://abc.xy/oldText asd
https://abc.xy/oldTe asd
https://abc.xy/oldT
https://abc.xy/old
https://abc.xy/ol
Code:
import pandas as pd
import re
def remove_links(text):
    text = re.sub(r'http\S+', '', text)
    text = text.strip('[link]')
    return text
df = pd.read_csv('test2.csv')
df['newText'] = df['oldText'].apply(remove_links)
print(df)
Result:
oldText newText
0 https://abc.xy/oldText asd asd
1 https://abc.xy/oldTe asd asd
2 https://abc.xy/oldT
3 https://abc.xy/old
4 https://abc.xy/ol
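A note on the strip('[link]') line: str.strip treats its argument as a set of characters, not as a substring, so it trims any of [ l i n k ] from both ends of the text. A minimal sketch, assuming the intent is to delete the literal marker "[link]" as well as the URLs, could use str.replace instead and then collapse the leftover whitespace:

```python
import re

def remove_links(text):
    # Drop every token that starts with "http" (up to the next whitespace).
    text = re.sub(r'http\S+', '', text)
    # str.replace removes the literal substring "[link]"; str.strip('[link]')
    # would instead trim any of the characters [ l i n k ] from both ends.
    text = text.replace('[link]', '')
    # Collapse the runs of whitespace left behind into single spaces.
    return ' '.join(text.split())

print(remove_links('https://abc.xy/oldText asd'))           # asd
print(remove_links('see [link] https://abc.xy/ol here'))    # see here
print(repr('kill [link]'.strip('[link]')))                  # ' '
```

It is applied to the DataFrame exactly as before, with df['oldText'].apply(remove_links).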

if/else: get condition met - Python

animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
if(text1 in animals or text2 in animals or text3 in animals):
    print(text2) # because text2 is the condition that was met in the if statement
This is simplified; in practice the animals string changes every time.
What is the best and simplest way to find which string matched, without writing so many if/else statements?
You can use a regex that matches any of the strings:
import re
pattern = '|'.join([text1, text2, text3])
# pattern -> 'brown dog|white cat|fat cow'
res = re.findall(pattern, animals)
print(res)
# ['white cat']
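If the strings can contain characters that have a special meaning in a regex (such as ., + or parentheses), it is safer to escape each one with re.escape before joining them. A small sketch, with the candidates kept in a list (the name candidates is just illustrative):

```python
import re

animals = 'silly monkey small bee white cat'
candidates = ['brown dog', 'white cat', 'fat cow']

# re.escape makes each candidate match literally, even if it contains
# regex metacharacters; '|' then matches any one of them.
pattern = '|'.join(map(re.escape, candidates))
print(re.findall(pattern, animals))  # ['white cat']
```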
Any time you have a set of variables named like xxx1, xxx2 and xxx3, you should convert them to a list:
animals = 'silly monkey small bee white cat'
text = [
    'brown dog',
    'white cat',
    'fat cow'
]
for t in text:
    if t in animals:
        print("Found",t)
Use a loop to check each case:
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
for text in [text1, text2, text3]:
    if text in animals:
        print(text)
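Building on the loop answers, a list comprehension collects every match, and next() returns only the first one (or None when nothing matches). A short sketch; the names candidates, found and first are only illustrative:

```python
animals = 'silly monkey small bee white cat'
candidates = ['brown dog', 'white cat', 'fat cow']

# Every candidate that occurs as a substring of animals.
found = [t for t in candidates if t in animals]
print(found)  # ['white cat']

# Only the first match; the default None is returned when nothing matches.
first = next((t for t in candidates if t in animals), None)
print(first)  # white cat
```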

Getting rid of white space between name, number and height

I have a text file like this:
name lastname 17 189cm
How do I get it to be like this?
name lastname, 17, 189cm
Using str.strip and str.split:
>>> my_string = 'name lastname 17 189cm'
>>> s = list(map(str.strip, my_string.split()))
>>> ', '.join([' '.join(s[:2]), *s[2:] ])
'name lastname, 17, 189cm'
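You can use a regex to replace runs of two or more whitespace characters (or a single tab) with a comma and a space:
import re
text = 'name lastname 17 189cm'
re.sub(r'\s\s+|\t', ', ', text)  # assumes the fields are separated by 2+ spaces or by tabs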
Using str.rsplit, which splits from the right:
text = 'name lastname 17 189cm'
out = ', '.join(text.rsplit(maxsplit=2)) # if sep is not provided then any consecutive whitespace is a separator
print(out) # name lastname, 17, 189cm
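Since the data comes from a text file, the same rsplit idea can be applied line by line. A minimal sketch, assuming every line ends with exactly two fields (age and height) after the name; reformat_line, reformat_lines and people.txt are hypothetical names:

```python
def reformat_line(line):
    # With no separator, rsplit splits on runs of spaces or tabs and ignores
    # the trailing newline; maxsplit=2 keeps the whole name as the first field.
    return ', '.join(line.rsplit(maxsplit=2))

def reformat_lines(lines):
    # Skip blank lines instead of emitting empty output rows.
    return [reformat_line(line) for line in lines if line.strip()]

# Hypothetical usage with a file:
# with open('people.txt') as f:
#     for row in reformat_lines(f):
#         print(row)

print(reformat_lines(['name lastname 17 189cm\n', 'first second\t42   175cm\n', '\n']))
# ['name lastname, 17, 189cm', 'first second, 42, 175cm']
```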
You could use re.sub:
import re
s = "name lastname 17 189cm"
re.sub("[ ]{2,}",", ", s)
PS: for the first problem you proposed, I had the following solution:
s = "name lastname 17 189cm"
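s[::-1].replace(" ", " ,", 2)[::-1]  # reverse, turn the first two spaces into " ," (", " once reversed back), reverse again -> 'name lastname, 17, 189cm'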

How do I retain the decimal numbers during text preprocessing in python?(Edited)

import string
def text_process(text):
    text = text.translate(str.maketrans('', '', string.punctuation))
    return " ".join(text.split())  # " ".join(text) would put a space between every character
Input text: 'Transaction value was - RS.3456.63 '
Output : 'Transaction value was RS 345663 '
Could someone suggest how to remove special characters (including '.') during text preprocessing while retaining decimal numbers?
Required Output : 'Transaction value was RS 3456.63 '
You can use a more general regex that replaces every run of characters other than word characters and '.' with a space:
import re
def text_process(text):
    text = re.sub(r'[^\w.]+', ' ', text)
    return text
s = 'Transaction: value* #was - 3456.63 Rupees'
text_process(s)
You get
'Transaction value was 3456.63 Rupees'
EDIT: The following function returns only the number with decimals.
def text_process(text):
    text = re.sub(r'[^\d.]+', '', text)
    return text
s = 'Transaction: value* #was - 3456.63 Rupees'
text_process(s)
'3456.63'
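One caveat with the EDIT version: [^\d.]+ keeps every '.', so for the question's input 'RS.3456.63' it returns '.3456.63'. If the goal is only to pull the numbers out, re.findall with an explicit number pattern avoids that. A sketch; extract_numbers is a hypothetical helper name:

```python
import re

def extract_numbers(text):
    # An integer part, optionally followed by '.' and a fractional part.
    # A '.' that is not followed by a digit (as in "RS." or a final full stop)
    # is never included in a match.
    return re.findall(r'\d+(?:\.\d+)?', text)

print(extract_numbers('Transaction value was - RS.3456.63 '))  # ['3456.63']
```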
If I understand your question correctly, this should work:
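import re
import string
text = 'Transaction value was, - 3456.63 Rupees'
# remove punctuation that has no digit directly before or after it
regex = r"(?<!\d)[" + re.escape(string.punctuation) + r"](?!\d)"
result = re.sub(regex, "", text)
# output: 'Transaction value was  3456.63 Rupees' (two spaces where ", -" was removed)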
To solve your second question, try using this trick:
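text = 'Transaction value was, - Rs.3456.63'
regex_space = r"([0-9]+(\.[0-9]+)?)"  # a number, optionally with decimals
regex_punct = r'[^\w.]+'               # runs of anything except word characters and '.'
# first pad every number with spaces, then replace punctuation runs with one space
re.sub(regex_punct, ' ', re.sub(regex_space, r" \1 ", text).strip())
# output: 'Transaction value was Rs. 3456.63'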
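For the exact output requested in the question ('RS 3456.63'), one option is to treat '.' as punctuation unless it sits between two digits. A minimal sketch, assuming "special characters" means string.punctuation and that runs of whitespace may be collapsed (which also drops the trailing space); OTHER_PUNCT and PATTERN are illustrative names:

```python
import re
import string

# Match a '.' with no digit before it, a '.' with no digit after it,
# or any other punctuation character. A '.' between two digits never matches.
OTHER_PUNCT = re.escape(string.punctuation.replace('.', ''))
PATTERN = re.compile(r'(?<!\d)\.|\.(?!\d)|[' + OTHER_PUNCT + ']')

def text_process(text):
    text = PATTERN.sub(' ', text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return ' '.join(text.split())

print(text_process('Transaction value was - RS.3456.63 '))
# Transaction value was RS 3456.63
```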

How to add text to python calculation?

I'm trying to create this with the help of python:
in a for loop (from 1,10)
text i+(i*(i+2.5)) text [i+(i*(i+2.5))] text
results:
text 1+(1*(1+2.5)) text 4.5 text
text 2+(2*(2+2.5)) text 11 text
text 3+(3*(3+2.5)) text 19.5 text
etc
Every i must have a different formatting
All the code must fit on a single command line.
This is what I have created
python -c "import locale; locale.setlocale(locale.LC_ALL, ''); print('\n'.join(locale.format('%.1f',i+(i*(i+2.5))) for i in range(1,10,1)))"
How can I add the rest of the text?
UPDATE:
With the help of Tanmaya Meher's answer I created this command:
python -c "import sys,locale; [sys.stdout.write('text1 ' + '%.2f+(%.2f*(%.2f+2.5))'%(i,i,i) +' text2 ' + '%.2f'%(i+(i*(i+2.5))) + ' text3' + '\n') for i in range(1,11,1)]"
but I still don't know where to apply the locale formatting.
I think this will help. Let me know if anything needs to be added.
python -c "import locale; locale.setlocale(locale.LC_ALL, ''); print ('\n'.join('text1 ' + locale.format_string('%.1f+(%.1f*(%.1f+2.5))',(i,i,i), grouping = True) + ' text2 ' + locale.format_string('%.1f',i+(i*(i+2.5)), grouping = True) + ' text3'for i in range(1,1111)))"
Output:
text1 1.0+(1.0*(1.0+2.5)) text2 4.5 text3
text1 2.0+(2.0*(2.0+2.5)) text2 11.0 text3
text1 3.0+(3.0*(3.0+2.5)) text2 19.5 text3
text1 4.0+(4.0*(4.0+2.5)) text2 30.0 text3
text1 5.0+(5.0*(5.0+2.5)) text2 42.5 text3
text1 6.0+(6.0*(6.0+2.5)) text2 57.0 text3
text1 7.0+(7.0*(7.0+2.5)) text2 73.5 text3
text1 8.0+(8.0*(8.0+2.5)) text2 92.0 text3
text1 9.0+(9.0*(9.0+2.5)) text2 112.5 text3
text1 10.0+(10.0*(10.0+2.5)) text2 135.0 text3
...
...
text1 1,109.0+(1,109.0*(1,109.0+2.5)) text2 12,33,762.5 text3
text1 1,110.0+(1,110.0*(1,110.0+2.5)) text2 12,35,985.0 text3
If you don't want grouping (the commas in 12,35,985.0), just remove grouping = True from the code.
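For reference, the same logic written as a normal script may be easier to test before squeezing it into python -c. This is a sketch, not a replacement for the one-liner: it uses locale.format_string (locale.format was deprecated in Python 3.7 and removed in 3.12), format_line is a hypothetical helper, and the separators you get depend on the locale configured on your system.

```python
import locale

try:
    # Use the locale configured in the environment (LANG, LC_ALL, ...).
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # Fall back to the portable "C" locale if that locale is not installed.
    locale.setlocale(locale.LC_ALL, 'C')

def format_line(i):
    # grouping=True inserts the locale's thousands separator;
    # the decimal point also comes from the locale.
    expr = locale.format_string('%.1f+(%.1f*(%.1f+2.5))', (i, i, i), grouping=True)
    value = locale.format_string('%.1f', i + i * (i + 2.5), grouping=True)
    return 'text1 ' + expr + ' text2 ' + value + ' text3'

for i in range(1, 11):
    print(format_line(i))
```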
