Replace inner space in Python - python

I am new to Python, and I need to remove the space between a string and a digit only, not between two strings.
e.g.:
Input : Paragraph 25 is in documents and paragraph number in another file.
Output : Paragraph25 is in documents and paragraph number in another file.
How can this be done in Python? I tried a regex:
re.sub("paragraph\s[a-z]", "paragraph[a-z]", Input)
But it's not working.

>>> re.sub(r'\s+(\d+)', r'\1', 'Program 25 is fun')
'Program25 is fun'
That might work in a pinch. I'm not the most familiar with regexes, so hopefully someone who is can chime in with something more robust.
Basically, we match whitespace followed by digits and remove the whitespace, keeping the digits through the \1 backreference.
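One limitation of the pattern above is that it also removes whitespace between two numbers ('10 20' becomes '1020'). A minimal sketch of a stricter variant follows, assuming "string" means ASCII letters; the function name join_word_and_number is made up for illustration. It uses a lookbehind and a lookahead so that only the whitespace itself is matched.

```python
import re

def join_word_and_number(text):
    # Remove whitespace only when a letter comes right before it
    # and a digit comes right after it.
    return re.sub(r'(?<=[A-Za-z])\s+(?=\d)', '', text)

sentence = 'Paragraph 25 is in documents and paragraph number in another file.'
print(join_word_and_number(sentence))
# Paragraph25 is in documents and paragraph number in another file.
```

The space in "paragraph number" is left alone because no digit follows it.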

Related

Extract values in name=value lines with regex

I'm really sorry for asking, because there are some questions like this around, but I can't adapt their answers to my problem.
These are the input lines (e.g. from a config file):
profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9
I just want to extract the values after the = sign with a Python 3.9 regex.
I tried this on regex101.
^profile[0-9]\.name=(.*?)
But this gives me the variable name including the = sign as the result, e.g. profile2.name=. I want exactly the opposite.
The expected results (what Python's re.findall() returns) are
['share2', 'share8', 'shareSSH', 'share9']
Try the pattern profile\d+\.name=(.*); see the Regex 101 example.
import re
re.findall(r'profile\d+\.name=(.*)', txt)
# output
['share2', 'share8', 'shareSSH', 'share9']
But this problem doesn't necessarily need regex, split should work absolutely fine:
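A minimal sketch of the split-based approach, assuming the lines are held in a single string named txt (the sample data is copied from the question):

```python
txt = """profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9"""

# Split each line once at the first '=' and keep the part after it.
values = [line.split('=', 1)[1] for line in txt.splitlines() if '=' in line]
print(values)
# ['share2', 'share8', 'shareSSH', 'share9']
```

Passing 1 as the second argument to split keeps any later = characters inside the value.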
Try removing the ? quantifier. It makes the capture group lazy, so it matches an empty string.
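The difference can be checked with a short sketch. It assumes the input is one multi-line string, so re.MULTILINE is added to let ^ match at the start of every line:

```python
import re

txt = """profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9"""

# Lazy group: nothing after it forces it to grow, so it stays empty.
lazy = re.findall(r'^profile[0-9]\.name=(.*?)', txt, re.MULTILINE)
# Greedy group: consumes the rest of the line.
greedy = re.findall(r'^profile[0-9]\.name=(.*)', txt, re.MULTILINE)

print(lazy)    # ['', '', '', '']
print(greedy)  # ['share2', 'share8', 'shareSSH', 'share9']
```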

How to separate user's input with two separators? And controlling the user's input

I want to separate the user's input using two different separators, ":" and ";".
The user should input 4 subjects and their amounts. The format should be:
(Subject:amount;Subject:amount;Subject:amount;Subject:amount)
If the input is wrong, it should print "Invalid Input".
Here's my code, but I can only use one separator. Also, how can I validate the user's input?
B = input("Enter 4 subjects and amount separated by (;) like Math:90;Science:80:").split(";")
Please help. I can't figure it out.
If you are fine with using regular expressions in Python, you could use the following code:
import re
output_list = re.split("[;:]", input_string)
Inside the square brackets you list all the characters (also known as delimiters) that you want to split on. Keep the quotes around the square brackets, because the pattern is passed to re.split as a string that tells it what to split on.
Further reading on regex can be found here if you feel like it: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
However, if you want to do it without importing anything, here is another possible solution (I would recommend against it, but it gets the job done):
input_string = input_string.replace(";", ":")
output_list = input_string.split(":")
This works by first replacing all of the semicolons in the input string with colons (it would also work the other way around) and then splitting on the remaining character (in this case the colon).
Hope this helped, as it is my first answer on Stack Overflow.
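Neither snippet above checks the input. A minimal validation sketch follows, assuming that amounts are non-negative whole numbers and that exactly 4 pairs are required; the function name parse_subjects and its expected parameter are made up for illustration.

```python
def parse_subjects(raw, expected=4):
    """Return {subject: amount}, or None when the input is malformed."""
    pairs = raw.split(';')
    if len(pairs) != expected:
        return None
    result = {}
    for pair in pairs:
        # partition splits at the first ':' only; sep is '' if none is found.
        name, sep, amount = pair.partition(':')
        name, amount = name.strip(), amount.strip()
        if not sep or not name or not amount.isdecimal():
            return None
        result[name] = int(amount)
    return result

subjects = parse_subjects('Math:90;Science:80;English:75;Art:60')
print(subjects if subjects is not None else 'Invalid Input')
# {'Math': 90, 'Science': 80, 'English': 75, 'Art': 60}
```

In the asker's program the string would come from input(...), and "Invalid Input" would be printed whenever the function returns None.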

Python using AND with regex

So I have looked around the internet for like 20 minutes and I haven't been able to figure it out. Is it possible to use AND in regex, or something similar (I've just started learning about regex)?
For example, I have the string "finksdssfsk32residogs" and I want to get the output: "32 dogs". I've tried using re.search, re.match, and re.findall but I haven't had any luck. I've tried things like:
re.findall(r"(\d{2})(dogs)", str)
re.search(r"(\d{2})(dogs)", str)
And I've tried a few combinations of each. And I know I can do this with multiple lines but the goal is to get "32 dogs" from "finksdssfsk32residogs" with only one line. Any help is appreciated, thanks.
You've almost got it. You just need some space between the numbers and the dogs.
Can you just match anything? How about (\d{2}).*(dogs)? Then you can replace the middle part with a space using join:
>>> print(' '.join(re.search(r'(\d{2}).*(dogs)', 'finksdssfsk32residogs').groups()))
32 dogs
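Note that re.search returns None when nothing matches, so the one-liner raises AttributeError on other inputs. A slightly more defensive sketch follows; the helper name extract_pair is made up for illustration.

```python
import re

def extract_pair(text):
    """Return '<two digits> dogs' if both parts occur in that order, else None."""
    match = re.search(r'(\d{2}).*(dogs)', text)
    if match is None:
        return None
    # groups() holds only the two captured parts, not the text between them.
    return ' '.join(match.groups())

print(extract_pair('finksdssfsk32residogs'))  # 32 dogs
print(extract_pair('no digits here'))         # None
```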

Hidden characters in integer-like string

I scraped data about fundraising from the web and put it into a table.
As I start to clean the data, I see that some elements, for instance "2 000000", are read as "2\xa0000000" by the machine.
1/ What does that mean?
2/ How can I remove it? (I want to convert the whole column to integers.)
Best,
To fix a DataFrame column, use:
df['col'] = df['col'].str.replace(r'\D', '', regex=True).astype(int)
The issue is that the string contains the Unicode character U+00A0, which Python displays as the escape sequence \xa0. The easiest way to handle such characters without calling replace for each specific one is the unicodedata module.
Specifically:
from unicodedata import normalize
string1 = "2\xa0000000"
new_string = normalize('NFKD', string1)
print(new_string)
Output:
2 000000
unicodedata is part of the Python standard library, so nothing extra needs to be installed. I find this better because the normalization handles many kinds of formatting characters, so you do not need a separate replace each time you see something else that is not formatted correctly. Note that the result still contains a regular space, which has to be removed before converting to an integer.
It's an escape sequence.
The character with hex code A0 is a non-breaking space. You can treat it as a space in most cases. In my experience, it mostly comes up when processing data generated by Microsoft Office products, or data from the web where people have put HTML code in it.
Whether split() treats it as a space depends on how you call it (I don't know how you process your data): in Python 3, str.split() with no arguments does split on it, but split(' ') does not. As it is just a distinct character, you can handle it explicitly with:
longstring.replace('\xA0', ' ').split()
PS: Reading your question again, it seems the character should be ignored so that the value becomes the number two million. So you might want to replace '\xA0' with an empty string instead.
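A minimal sketch of that last step for a single value, using only the standard library. The helper name to_int is made up for illustration, and it assumes the values are non-negative whole numbers, since stripping every non-digit would also drop minus signs and decimal points.

```python
import re

def to_int(raw):
    """Drop every non-digit character (including U+00A0) and convert to int."""
    digits = re.sub(r'\D', '', raw)
    return int(digits)

print(to_int('2\xa0000000'))  # 2000000
```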

Replacing strings in a text and ignoring certain parts

I found many programs online to replace text in a string or file with words prescribed in a dictionary. For example, https://www.daniweb.com/programming/software-development/code/216636/multiple-word-replace-in-text-python
But I was wondering how to get the program to ignore certain parts of the text. For instance, I would like it to ignore parts that are enclosed within, say, % signs (%Please ignore this%). Better still, how do I get it to ignore the text within but remove the % signs at the end of the run?
Thank you.
This could very easily be done with regular expressions, although they may not be supported by the programs you find online. You will probably need to write something yourself and use regexes as your dict's search keys.
Good place to start playing around with regex is: http://regexr.com
In the replacement dictionary, just add an entry for any word you want to be ignored. For example, teh is replaced with the, but %teh% is replaced with teh. For the program in the link you could have
wordDic = {
'booster': 'rooster',
'%booster%': 'booster'
}
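The dictionary trick only protects single words. A rough sketch of a regex-based alternative that skips whole %...% spans and drops their % signs is shown below. It is not the linked program: the function name replace_words is made up for illustration, and it assumes a non-empty dictionary and no nested % signs.

```python
import re

def replace_words(text, word_map):
    # Longer words first, so 'boosters' would win over 'booster'.
    words = '|'.join(re.escape(w) for w in sorted(word_map, key=len, reverse=True))
    # Either a protected %...% span (group 1) or a whole word to replace (group 2).
    pattern = re.compile(r'%([^%]*)%|\b(' + words + r')\b')

    def substitute(match):
        if match.group(1) is not None:
            return match.group(1)        # keep protected text, drop the % signs
        return word_map[match.group(2)]  # ordinary replacement

    return pattern.sub(substitute, text)

print(replace_words('the booster and %the booster%', {'booster': 'rooster'}))
# the rooster and the booster
```

Because the %...% alternative consumes the whole span in one match, the words inside it are never seen by the second alternative.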
