Delete first 3 characters of string in Python - python

I'm trying to delete some leading characters from a string in Python 2.7. To be more specific, the string is an MX record that looks like 10 aspmx2.googlemail.com. I need to delete the leading number (which can be one or two digits) and the space character.
Here is the code I've come up with thus far, but I'm stuck:
mx_name = "10 aspmx2.googlemail.com"
for i in range(0,3):
    char = mx_name[i]
    if char == "0123456789 ":
        short_mx_name.replace(char, "")
For some reason, the if statement is not working correctly and I fail to see why. Any help would be much appreciated.
Thank you.

You can use re.sub:
import re
mx_name = "10 aspmx2.googlemail.com"
new_name = re.sub(r"^\d+\s", '', mx_name)
Output:
'aspmx2.googlemail.com'
Regex explanation:
^: anchors the expression so that it matches only at the beginning of the string.
\d+: matches one or more digits, stopping at the first non-digit character (in this case the space).
\s: matches a single whitespace character; it must be included here so that the substitution also removes the space between the number and the hostname.
In short, ^\d+\s starts at the beginning of the string, matches the digits that follow, and then matches the space, so the regex never touches the hostname itself.
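To try the pattern on both single- and double-digit priorities, here is a minimal sketch. The function name strip_mx_priority is only an illustration and is not part of the original answer, and it uses \s+ instead of \s so that several spaces are tolerated as well:

```python
import re

def strip_mx_priority(record):
    """Remove a leading run of digits plus the whitespace that follows it."""
    # ^ anchors at the start, \d+ takes the number, \s+ takes the separator.
    return re.sub(r"^\d+\s+", "", record)

print(strip_mx_priority("10 aspmx2.googlemail.com"))  # aspmx2.googlemail.com
print(strip_mx_priority("5 aspmx2.googlemail.com"))   # aspmx2.googlemail.com
print(strip_mx_priority("aspmx2.googlemail.com"))     # unchanged, no leading number
```

Because of the anchor, digits inside the hostname (the 2 in aspmx2) are never touched.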

mx_name.split()[1]
Output:
'aspmx2.googlemail.com'

Using split function
mx_name = "10 aspmx2.googlemail.com"
mx_name_url = mx_name.strip().split(' ')[1]
# aspmx2.googlemail.com
Using slicing (this assumes the prefix is always exactly three characters, so it breaks on a single-digit number)
mx_name = "10 aspmx2.googlemail.com"
mx_name[3:]
# aspmx2.googlemail.com
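To see how the two approaches differ when the number has only one digit, here is a small sketch. The helper name host_from_mx is hypothetical, and it uses str.partition rather than split so that a record without any space is returned as-is instead of raising IndexError:

```python
def host_from_mx(record):
    """Return the part after the first space, or the stripped input if there is no space."""
    cleaned = record.strip()
    _priority, sep, host = cleaned.partition(" ")
    return host if sep else cleaned

for record in ("10 aspmx2.googlemail.com", "1 aspmx2.googlemail.com"):
    # Left: partition-based result. Right: fixed-width slice.
    print(host_from_mx(record), "|", record[3:])
# aspmx2.googlemail.com | aspmx2.googlemail.com
# aspmx2.googlemail.com | spmx2.googlemail.com
```

The fixed slice eats the first letter of the hostname in the single-digit case, which is why splitting on the space is the safer choice here.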

You can use regex :
import re
pattern=r'\b[\d\s]{1,3}\b'
string='10 aspmx2.googlemail.com'
new_string=re.sub(pattern,"",string)
print(new_string)
output:
aspmx2.googlemail.com
with single digit:
string='1 aspmx2.googlemail.com' then output:
aspmx2.googlemail.com

You should use regex for that. There are plenty of regex answers to this question, but if you want a more general solution you can use:
import re
m = "10 aspmx2.googlemail.com"
match = re.search(r'\s(\S+)', m)
match.group(1)
'aspmx2.googlemail.com'
This pattern captures whatever follows the first whitespace character, whether it is a hostname or an email address.
\s - a whitespace character, outside the capturing group
(\S+) - one or more non-whitespace characters, captured as group 1
This will match 4123 name@email.com or some_text name@email.com etc.

The minimum modification to your code would be this:
mx_name = "10 aspmx2.googlemail.com"
short_name = mx_name[:]
for i in range(0,3):
    char = mx_name[i]
    if char in "0123456789 ":
        short_name = short_name.replace(char, "", 1)
Your if was checking whether the char WAS "0123456789 ", not whether it was included in that set. The 1 passed to replace is also needed to avoid deleting digits and spaces further along in the string.
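The difference between == and in, and the effect of the count argument, can be checked in isolation. This is just an illustrative sketch; the variable names and the sample string "10 a1.example" are made up for the demonstration:

```python
digits_and_space = "0123456789 "
char = "1"

print(char == digits_and_space)  # False: a 1-character string never equals the 11-character one
print(char in digits_and_space)  # True: substring membership test

original = "10 a1.example"
# str.replace returns a new string; count=1 limits it to the first occurrence.
trimmed = original.replace("1", "", 1)
print(original)  # 10 a1.example
print(trimmed)   # 0 a1.example
```

Note that replace never modifies the string in place, which is why the result has to be assigned back to a variable.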

Related

find and replace the word on a string based on the condition

hello dear helpful ppl at stackoverflow ,
I have couple questions about manipulating a string in python ,
first question:-
if I have a string like :
'What's the use?'
and I want to locate the first letter after 'the'
like (What's the use?) the letter is u
how I could do it in the best way possible ?
second question:-
if I want to change something on this string based on the first letter i found in the (First question)
how I could do it ?
and thanks for helping !
You could use a regex replacement to remove all content up to and including the first the (along with any following whitespace). Then, just access the first character from that output.
import re
inp = "What's the use?"
inp = re.sub(r'^.*?\bthe\b\s*', '', inp)
print("First character after first 'the' is: " + inp[0])
This prints:
First character after first 'the' is: u
Another re take:
import re
sample = "What is the use?"
pattern = r"""
(?<=\bthe\b) # look-behind to ensure 'the' is there. This is non-capturing.
\s+ # one or more whitespace characters
(\w) # Only one alphanumeric or underscore character
"""
# re.X is for verbose, which handles multi-line patterns
m = re.search(pattern, sample, flags=re.X)
if m is not None:
    print(f"First character after first 'the' is: {m.group(1)}")
You can find the index of 'u' by using the str.index() method. Then you can extract string before and after using slice operation.
s = "What's the use?"
character_index = s.lower().index('the ') + 4
print(character_index)
# 11
print(s[:character_index] + '*' + s[character_index+1:])
# What's the *se?
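For the second question (changing the string based on that first letter), one possible sketch is to pass a function as the replacement to re.sub. The name change_after_the and its transform parameter are hypothetical and only serve the example:

```python
import re

def change_after_the(text, transform=str.upper):
    """Apply transform to the first word character after the first standalone 'the'."""
    # Group 1 keeps 'the' plus the whitespace, group 2 is the character to change.
    return re.sub(
        r"(\bthe\b\s+)(\w)",
        lambda m: m.group(1) + transform(m.group(2)),
        text,
        count=1,
    )

print(change_after_the("What's the use?"))                  # What's the Use?
print(change_after_the("What's the use?", lambda c: "*"))   # What's the *se?
```

Any condition on the letter (for example, only changing vowels) can go inside the transform function, and a string without 'the' is returned unchanged.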

Match everything except a pattern and replace matched with string

I want to use python in order to manipulate a string I have.
Basically, I want to prepend "\x" to every hex byte except the bytes that already have "\x" prepended to them.
My original string looks like this:
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
And I want to create the following string from it:
mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"
I thought of using regular expressions to match everything except /\x../g and replace every match with "\x". Sadly, I struggled with it a lot without any success. Moreover, I'm not sure that using regex is the best approach to solve such case.
Regex: (?:\\x)?([0-9A-Z]{2}) Substitution: \\x$1
Details:
(?:) Non-capturing group
? Matches between zero and one time, match string \x if it exists.
() Capturing group
[] Match a single character present in the list 0-9 and A-Z
{n} Matches exactly n times
\\x String \x
$1 Group 1.
Python code:
import re
text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
print(text)
Output:
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
You don't need regex for this. You can use simple string manipulation. First remove all of the "\x" from your string. Then add it back before every 2 characters.
replaced = mystr.replace(r"\x", "")
newstr = "".join([r"\x" + replaced[i*2:(i+1)*2] for i in range(len(replaced)//2)])
Output:
>>> print(newstr)
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
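The same idea can be wrapped in a function that also checks that the hex digits pair up. This is a sketch under the assumption that the input contains nothing but hex digits and \x markers; the name add_hex_prefix and the short sample string are made up:

```python
def add_hex_prefix(s):
    """Strip existing \\x markers, then put one in front of every pair of hex digits."""
    digits = s.replace("\\x", "")
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    # Step through the string two characters at a time.
    return "".join("\\x" + digits[i:i + 2] for i in range(0, len(digits), 2))

sample = r"3033\x90\x016F"
print(add_hex_prefix(sample))  # \x30\x33\x90\x01\x6F
```

Stepping with range(0, len(digits), 2) avoids the division entirely, so the code behaves the same on Python 2 and Python 3.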
You can get a list with your values to manipulate as you wish, with an even simpler re pattern
mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
import re
pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)
if match:
    print('\n\nNew string:')
    print('\\x' + '\\x'.join(match))
    #for elem in match: # match gives you a list of strings with the hex values
    #    print('\\x{}'.format(elem), end='')
    print('\n\nOriginal string:')
    print(mystr)
This can be done without replacing existing \x by using a combination of positive lookbehinds and negative lookaheads.
(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})
Usage
import re
regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
subst = r"\\x\1"
result = re.sub(regex, subst, test_str, flags=re.IGNORECASE)
if result:
    print(result)
Explanation
(?!(?<=\\x)|(?<=\\x[a-f\d])) Negative lookahead ensuring that neither of the following matches at the current position.
(?<=\\x) Positive lookbehind checking whether what precedes is \x.
(?<=\\x[a-f\d]) Positive lookbehind checking whether what precedes is \x followed by a hexadecimal digit.
([a-f\d]{2}) Capture any two hexadecimal digits into capture group 1.

Get the last 4 characters of a string as long as they are special characters

I have web URLs that look like this:
http://example.com/php?id=2/*
http://example.com/php?id=2'
http://example.com/php?id=2*/"
What I need to do is grab the last characters of the string. I've tried:
for url in html_page:
    syntax = list(url)[-1]
    # <= *
    # <= '
    # etc...
However, this will only grab the last character of the string. Is there a way I could grab the last characters as long as they are special characters?
Use a regex. Assuming that by "special character" you mean "anything besides A-Za-z0-9_":
>>> import re
>>> re.search(r"\W+$", "http://example.com/php?id=2*/'").group()
"*/'"
\W+ matches one or more "non-word" characters, and $ anchors the search to the end of the string.
Use a regular expression?
import re
addr = "http://example.com/php?id=2*/"
chars = re.search(r"[*./_]{0,4}$", addr).group()
Characters you want to match are the ones between the [] brackets. You may want to add or remove characters depending on what you expect to encounter.
For example, you would (probably) not want to match the '=' character in your example URLs, which the other answer would match.
{0,4} means to match 0-4 characters (defaults to being greedy)
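Putting the explicit character set and the length limit together, a minimal sketch could look like this. The function name trailing_specials, the default set of allowed characters and the limit of 4 are all assumptions chosen to fit the example URLs:

```python
import re

def trailing_specials(url, allowed="*/'\"", limit=4):
    """Return up to `limit` trailing characters of url that are all in `allowed`."""
    # re.escape makes characters such as * or ] safe inside the character class.
    pattern = "[" + re.escape(allowed) + "]{1,%d}$" % limit
    m = re.search(pattern, url)
    return m.group() if m else ""

print(trailing_specials("http://example.com/php?id=2/*"))    # /*
print(trailing_specials("http://example.com/php?id=2*/'"))   # */'
print(trailing_specials("http://example.com/php?id=2"))      # prints an empty line
```

Using {1,n} instead of {0,n} together with the None check means that a URL without any trailing special characters gives an empty string rather than an exception.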

match any decimals appearing immediately before a character in python

I can't seem to find an example of this, but I doubt the regex is that sophisticated. Is there a simple way of getting the immediately preceding digits of a certain character in Python?
For the character "A" and the string:
"&#238A"
It should return 238A
As long as you intend to include the trailing character in the resulting match, the regex pattern to do that is very simple. For instance, if you want to capture any series of digits followed by a letter A, the pattern would be \d+A
If you are on python 3, try this.
import re
char = "A" # the character you're searching for.
string = "BA &#238A 123A" # test string.
regex = "[0-9]+%s" % char # one or more digits ([0-9]+) followed by the desired character
compiled_regex = re.compile(regex) # compile the regex
result = compiled_regex.findall(string)
print (result)
>>['238A', '123A']
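If the character you are searching for could be a regex metacharacter such as + or ., it should probably be escaped before it is put into the pattern. A small sketch, with digits_before as a made-up helper name:

```python
import re

def digits_before(char, text):
    """Find every run of digits immediately followed by char (char itself included)."""
    # re.escape keeps characters like + or . from being read as regex syntax.
    pattern = r"\d+" + re.escape(char)
    return re.findall(pattern, text)

print(digits_before("A", "BA &#238A 123A"))  # ['238A', '123A']
print(digits_before("+", "12+ 7+3"))         # ['12+', '7+']
```

Without re.escape, searching for "+" would build the pattern \d++, which does not mean "digits followed by a plus sign".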

Python: Ignore a # / and random numbers in a string

I use some code to read a website, scrape some information, feed it to Google and print some directions.
I'm having an issue with some of the information: the site I use sometimes adds a # followed by 3 random numbers, then a / and another 3 numbers, e.g. #037/100.
How can I use Python to ignore this "#037/100" string?
I currently use
for i, part in enumerate(list(addr_p)):
    if '#' in part:
        del addr_p[i]
        break
to remove the # if found, but I'm not sure how to do it for the random numbers.
Any ideas?
If you find yourself wanting to remove "a # followed by three digits, a forward slash and three more digits" from a string s, you could do
import re
s = "this is a string #123/234 with other stuff"
t = re.sub(r'#\d{3}\/\d{3}', '', s)
print(t)
Result (note the two spaces left behind where the token was removed):
this is a string  with other stuff
Explanation:
# - literal character '#'
\d{3} - exactly three digits
\/ - forward slash (the escape is optional in Python, where / has no special meaning in a regex)
\d{3} - exactly three digits
And the whole thing that matches the above (if it's present) is replaced with '' - i.e. "removed".
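Since removing the token leaves the spaces on both sides of it in place, you may want to tidy the whitespace afterwards. A minimal sketch, assuming the token always has the #ddd/ddd shape; the function name drop_reference is hypothetical:

```python
import re

def drop_reference(text):
    """Remove a '#ddd/ddd' token and collapse the whitespace left behind."""
    without_ref = re.sub(r"#\d{3}/\d{3}", "", text)
    # split() with no arguments splits on any run of whitespace and drops empty pieces.
    return " ".join(without_ref.split())

print(drop_reference("this is a string #123/234 with other stuff"))
# this is a string with other stuff
```

Keep in mind that this also collapses any other repeated whitespace in the string, which may or may not be what you want for address data.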
import re
re.sub(r'#[0-9]+\/[0-9]+$', '', addr_p[i])
I'm no wizard with regular expressions, but I'd imagine you could do something like this.
You could even handle '#' in the regexp as well.
If the format is always the same, then you could check if the line starts with a #, then set the string to itself without the first 8 characters.
if part[0:1] == '#':
    part = part[8:]
If the first character is a #, this sets the string to itself from index 8 (the ninth character) to the end.
I'd double your problems and match against a regular expression for this.
import re
regex = re.compile(r'([\w\s]+)#\d+\/\d+([\w\s]+)')
m = regex.match('This is a string with a #123/987 in it')
if m:
    s = m.group(1) + m.group(2)
    print(s)
A more concise way:
import re
s = "this is a string #123/234 with other stuff"
t = re.sub(r'#\S+', '', s)
print(t)
