Split a string on every special character with regular expressions [duplicate] - python

This question already has answers here:
What are non-word boundary in regex (\B), compared to word-boundary?
(2 answers)
Closed 3 years ago.
I want to split my string into pieces at every occurrence of some text followed by a special character. I have a string:
str = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
I want my string to be split at every occurrence of EmRe#b'. As you can see, it contains the ' character, and that's the problem.
I tried re.split(r"EmRe#b'\B", str) and re.split(r"EmRe#b?='\B", str), and I also tried both of them without the r prefix on the pattern. How do I do it? I'm really new to regular expressions; I would even say I've never used them.

Firstly, change the name of your variable: str is a built-in Python type, and assigning to it shadows str() for the rest of your code.
If you named your variable word, you could get a list of elements split by your specified string by doing this:
>>> word = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
>>> word
"ImEmRe#b'aEmRe#b'testEmRe#b'string"
>>> word.split("EmRe#b'")
['Im', 'a', 'test', 'string']
This gives you a list, which you can use in many more ways than a single string. It can, of course, be saved to a variable:
>>> foo = word.split("EmRe#b'")
>>> foo
['Im', 'a', 'test', 'string']
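If you do want a regular expression, the delimiter can be matched literally; the culprit in the attempts from the question is the trailing \B. A minimal sketch, reusing the word variable from above: \B matches only where there is *no* word boundary, but here the apostrophe (a non-word character) is always followed by a letter (a word character), so that position is a boundary and the pattern never matches.

```python
import re

word = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
delimiter = "EmRe#b'"

# \B requires a NON-boundary right after the apostrophe, but "'" is followed
# by a letter each time, which IS a word boundary: nothing is split.
print(re.split(r"EmRe#b'\B", word))  # ["ImEmRe#b'aEmRe#b'testEmRe#b'string"]

# re.escape() backslash-escapes any regex metacharacters in the delimiter,
# so the whole text is matched literally (the apostrophe needs nothing special).
parts = re.split(re.escape(delimiter), word)
print(parts)  # ['Im', 'a', 'test', 'string']
```

For a fixed delimiter like this, plain str.split() is simpler; re.split() only pays off when the delimiter varies.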


Python Numpy: Why is an underscore necessary here [duplicate]

This question already has answers here:
Underscore _ as variable name in Python [duplicate]
(3 answers)
Closed 1 year ago.
import numpy
n,m=map(int, input().split())
arr=numpy.array([input().strip().split() for _ in range(n)],int)
print (numpy.transpose(arr))
print(arr.flatten())
Why should there be an underscore before "in range" in the third line? It would also be useful if someone explained why .strip and .split need to be applied here.
Thanks a lot!
_ is just a variable; it could be named differently, for example i. By convention, _ is the name given to variables whose value is never used. In this case, you execute input().strip().split() n times in exactly the same way, without caring which iteration (i) it is.
.split() with no arguments splits the input string on runs of whitespace, for example:
>>> '1 2 3'.split()
['1', '2', '3']
.strip() trims whitespace at the edges:
>>> ' 1 2 3 '.strip()
'1 2 3'
You can read more about these methods in the official docs or, even simpler, by running help(str.split) in an interpreter.
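One detail worth knowing: with no arguments, split() already ignores leading and trailing whitespace, so the strip() in the question is harmless but redundant there. It only matters when you pass an explicit separator. A small sketch (the sample line is made up for illustration):

```python
line = '  1 2 3 \n'

# Bare split(): splits on runs of any whitespace and drops the edges,
# so calling strip() first gives the same result.
print(line.split())          # ['1', '2', '3']
print(line.strip().split())  # ['1', '2', '3']

# Explicit separator: every single space is a split point, so the edges
# produce empty strings and the newline is kept.
print(line.split(' '))          # ['', '', '1', '2', '3', '\n']
print(line.strip().split(' '))  # ['1', '2', '3']
```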
In the interactive interpreter, the underscore holds the result of the last evaluated expression.
It is also used by convention as the name of a variable that will not be used.
In your example, you just need to loop n times without needing the value of each iteration, so you can use for _ in range(n) instead of for i in range(n).
You can find more information about the underscore in Python here: What is the purpose of the single underscore "_" variable in Python?
As for the strip and split methods, here is a quick explanation based on the Python documentation.
str.strip: Return a copy of the string with the leading and trailing characters removed.
str.split: Return a list of the words in the string, using sep as the delimiter string.
So in your example, your code takes the user's input, removes any leading and trailing whitespace with strip, and splits the input into a list of words.
For example, if the user enters Hello World!, the result will be ['Hello', 'World!'].
Hope that helps!
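To see the whole pattern run without typing anything, here is a sketch that replaces input() with a hypothetical read_line() reading from an in-memory buffer (the input text is invented for illustration). It also shows, in plain Python, the string-to-int conversion that numpy.array(..., int) performs:

```python
import io

# Hypothetical input: first line "n m", then n rows of m numbers each.
fake_stdin = io.StringIO("2 3\n1 2 3\n 4 5 6 \n")

def read_line() -> str:
    # Stands in for input(); readline() keeps the trailing '\n'.
    return fake_stdin.readline()

n, m = map(int, read_line().split())

# The loop variable is never used, so it is named _ by convention.
rows = [read_line().strip().split() for _ in range(n)]
print(rows)  # [['1', '2', '3'], ['4', '5', '6']]

# Plain-Python equivalent of passing int as the dtype to numpy.array().
matrix = [[int(x) for x in row] for row in rows]
print(matrix)  # [[1, 2, 3], [4, 5, 6]]
```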

python string strip weird behavior [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 4 years ago.
Is there a reason why I am getting this kind of string strip behavior? Is this a bug, or some string magic I am missing?
# THIS IS CORRECT
>>> 'name.py'.rstrip('.py')
'name'
# THIS IS WRONG
>>> 'namey.py'.rstrip('.py')
'name'
# TO FIX THE ABOVE I DID THE FOLLOWING
>>> 'namey.py'.rstrip('py').rstrip('.')
'namey'
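That's because str.rstrip() treats its argument as a set of characters: it keeps removing trailing characters as long as they are in that set, rather than removing the argument as a whole suffix.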
https://docs.python.org/2/library/string.html
string.rstrip(s[, chars])
Return a copy of the string with trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the end of the string this method is called on.
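The "set of characters" behavior is easy to check: the order and repetition of the characters in the argument make no difference. The rstrip_chars() function below is a hypothetical, hand-written sketch of what rstrip() does for a non-None chars argument, for illustration only:

```python
# Any argument containing the same characters gives the same result.
for chars in ('.py', 'yp.', 'p.y', '..ppyy'):
    print(repr(chars), '->', 'namey.py'.rstrip(chars))  # 'name' every time

def rstrip_chars(s: str, chars: str) -> str:
    """Hypothetical sketch: drop trailing characters while they are in chars."""
    strip_set = set(chars)
    end = len(s)
    while end > 0 and s[end - 1] in strip_set:
        end -= 1
    return s[:end]

print(rstrip_chars('namey.py', '.py'))  # name ('y', 'p', '.', 'y' all removed)
```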
This also gives the same result:
>>> 'nameyp.py'.rstrip('.py')
'name'
You could use str.endswith() instead:
>>> name = 'namey.py'
>>> if name.endswith('.py'):
...     name = name[:-3]
...
>>> name
'namey'
Or just str.split():
>>> 'namey.py'.split('.py')[0]
'namey'
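Two further options, as a sketch: on Python 3.9+ str.removesuffix() removes an exact suffix once, and os.path.splitext() splits off the last extension on any version. Both avoid a pitfall of split('.py')[0], which cuts at the first '.py' anywhere in the name (the file name 'my.pyutils.py' is made up to show this):

```python
import os.path

name = 'namey.py'

# Python 3.9+: removes the exact suffix once, only if present.
print(name.removesuffix('.py'))       # namey
print('README'.removesuffix('.py'))   # README (unchanged)

# Any Python version: split off the last extension.
print(os.path.splitext(name)[0])      # namey

# Pitfall: split() cuts at the FIRST '.py', even mid-name.
print('my.pyutils.py'.split('.py')[0])      # my
print('my.pyutils.py'.removesuffix('.py'))  # my.pyutils
```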

Splitting with Regular Expression in Python [duplicate]

This question already has an answer here:
Does '[ab]+' equal '(a|b)+' in python re module?
(1 answer)
Closed 5 years ago.
I am relatively new to Python, and I am trying to split a string using re.
I have researched a bit, and I have come across a few examples and I tried them out. They seem to work, but with limitations.
I am using a dictionary with a string key that is associated with an integer value. I'm trying to apply a weight to each word that depends on the integer value associated with the key string. My issue is that the string isn't formatted perfectly and I need to split it on underscores ( _ ) as well as whitespace and other various delimiters. From what I understand, this needs to be done with regular expressions. My bit of code is as follows:
for key, value in sorted_articles.items():
    wordList = print(re.split(r'(_|\s|:|)',key))
When I print this out, it splits everything fine, but it also prints out the delimiters rather than ignoring them in the list. For example, the string "Hello_how are you_" gets stored in the list as ['Hello', '_', 'how', ' ', 'are', ' ', 'you','_'].
I'm not sure why the delimiters would be added to the list and I can't figure out how to fix it. Thanks in advance for the help!
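You can split on \W+, which matches any run of non-word characters (anything other than letters, digits, and the underscore). Because the underscore counts as a word character, add |_ so that underscores are split on as well. Also note that print() returns None, so assign the result of re.split() first and print that:
for key, value in sorted_articles.items():
    wordList = re.split(r'\W+|_', key)
    print(wordList)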
For instance:
import re
s = "Hello_how are you_"
print(re.split(r"\W+|_", s))
Output:
['Hello', 'how', 'are', 'you', '']
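As for why the delimiters showed up in the first place: re.split() also returns the text of any capturing group (...) in the pattern. A sketch using a simplified version of the question's pattern (':' and the trailing empty alternative dropped) shows the difference a non-capturing group (?:...) makes, plus a character class that treats a run of mixed delimiters as one split:

```python
import re

s = "Hello_how are you_"

# Capturing group: the matched delimiters are included in the result.
print(re.split(r'(_|\s)', s))
# ['Hello', '_', 'how', ' ', 'are', ' ', 'you', '_', '']

# Non-capturing group: same split points, delimiters not returned.
print(re.split(r'(?:_|\s)', s))
# ['Hello', 'how', 'are', 'you', '']

# [\W_]+ treats a run like "_ " as a single split; the filter drops the
# empty string left by the trailing underscore.
words = [w for w in re.split(r'[\W_]+', "Hello_ how are you_") if w]
print(words)  # ['Hello', 'how', 'are', 'you']
```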

Turning a list into a string or a word [duplicate]

This question already has answers here:
How to concatenate (join) items in a list to a single string
(11 answers)
Closed 7 years ago.
I know in Python there is a way to turn a word or string into a list using list(), but is there a way of turning it back? I have:
phrase_list = list(phrase)
I have tried to change it back into a string using repr() but it keeps the syntax and only changes the data type.
I am wondering if there is a way to turn a list, e.g. ['H','e','l','l','o'] into: 'Hello'.
Use the str.join() method; call it on a joining string and pass in your list:
''.join(phrase)
Here I used the empty string to join the elements of phrase, effectively concatenating all the characters back together into one string.
Demo:
>>> phrase = ['H','e','l','l','o']
>>> ''.join(phrase)
'Hello'
Using ''.join() is the best approach, but you could also use a for loop. (Martijn beat me to it!)
hello = ['H','e','l','l','o']
hello2 = ''
for character in hello:
    hello2 += character
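A few more details about str.join(), as a short sketch: it is called on the separator string, it accepts any separator, and it only accepts strings, so non-string items must be converted first:

```python
letters = ['H', 'e', 'l', 'l', 'o']

# join() is called on the separator; the list is the argument.
print(''.join(letters))   # Hello
print('-'.join(letters))  # H-e-l-l-o

# join() raises TypeError on non-strings, so convert items first.
digits = [1, 2, 3]
print(''.join(map(str, digits)))  # 123

# Round trip: list() splits a string into characters, join() undoes it.
print(''.join(list('Hello')) == 'Hello')  # True
```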

how to change a character by its position in a string python [duplicate]

This question already has answers here:
Changing one character in a string
(15 answers)
Closed 8 years ago.
I'm trying to make a Hangman game and I need to change certain characters in a string.
E.g. given '-----', I want to replace the third dash in this string with a letter. This needs to work with a word of any length. Any help would be greatly appreciated.
Strings are immutable, so convert the string to a list, replace the character, then turn it back into a string like so:
s = '-----'
s = list(s)
s[2] = 'a'
s = ''.join(s)
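String = list(String)
String[0] = "x"
String = str(String)
Note that this version does not give the expected result: str() on a list returns the list's representation (for '-----' it gives "['x', '-', '-', '-', '-']"), not the joined characters, so the .join() version above is the one to use.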
You can also do it using slicing:
>>> a
'this is really string'
>>> a[:2]+'X'+a[3:]
'thXs is really string'
>>>
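The slicing approach generalizes to any position and any word length. A minimal sketch; replace_at() and the Hangman-style reveal() are hypothetical helper names for illustration, not part of any library:

```python
def replace_at(s: str, index: int, char: str) -> str:
    """Return a copy of s with the character at index replaced by char."""
    if not 0 <= index < len(s):
        raise IndexError("index out of range")
    return s[:index] + char + s[index + 1:]

def reveal(secret: str, masked: str, guess: str) -> str:
    """Hypothetical Hangman helper: uncover every position where guess occurs."""
    chars = list(masked)
    for i, c in enumerate(secret):
        if c == guess:
            chars[i] = c
    return ''.join(chars)

print(replace_at('-----', 2, 'a'))    # --a--
print(reveal('apple', '-----', 'p'))  # -pp--
```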
