Removing two non-printable characters from a string in Python [duplicate] - python

This question already has answers here:
Removing control characters from a string in python
(9 answers)
Closed 2 years ago.
I get text like the following when reading a Word file:
Exe Command\r\x07
My desired text is:
Exe Command
I tried this solution, but it gives me:
Exe Command\r
How can I remove these two characters, or any such backslash-escaped characters? I would like a fast solution because I have thousands of inputs like this.

You can call the replace() method twice.
In [1]: myStr.replace("\r", "").replace("\x07", "")
Out[1]: 'Exe Command'
If that does not work, the text may contain literal backslash sequences rather than actual control characters; in that case, try raw strings:
In [2]: myStr.replace(r"\r", "").replace(r"\x07", "")
Out[2]: 'Exe Command'
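To tell which of the two replace() variants applies, it can help to look at the length and repr() of the string. The following is a small sketch; the variable names are made up for illustration.

```python
# Real control characters: carriage return (1 char) and BEL (1 char).
actual = "Exe Command\r\x07"
# Literal text: a backslash followed by "r", then a backslash followed by "x07".
literal = r"Exe Command\r\x07"

print(len(actual), repr(actual))
print(len(literal), repr(literal))

# Plain escapes remove real control characters.
cleaned_actual = actual.replace("\r", "").replace("\x07", "")
# Raw strings remove the literal backslash sequences.
cleaned_literal = literal.replace(r"\r", "").replace(r"\x07", "")

# Using the wrong variant leaves the string unchanged.
unchanged = actual.replace(r"\r", "").replace(r"\x07", "")
```

If repr() shows doubled backslashes, the text contains literal backslashes and the raw-string variant is the one to use.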
EDIT: As per the comment, to remove any control characters, use this post's solution.
import unicodedata
def remove_control_characters(s):
    # Keep only characters whose Unicode category does not start with "C" (control/other)
    return "".join(ch for ch in s if unicodedata.category(ch)[0] != "C")
All credit for this solution goes to Alex Quinn.
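Since the question asks for speed over thousands of inputs, a precomputed translation table is one option worth timing against the generator version. This is only a sketch: it assumes that only the ASCII control characters (0x00 to 0x1F, plus DEL 0x7F) need removing, and the names `_ASCII_CONTROLS` and `strip_ascii_controls` are made up here. Note that it also removes tabs and newlines.

```python
# Assumption: only ASCII control characters (0x00-0x1F and 0x7F) must go.
# Mapping a code point to None tells str.translate to delete it.
_ASCII_CONTROLS = dict.fromkeys(list(range(32)) + [127])

def strip_ascii_controls(text):
    """Return text with ASCII control characters removed."""
    return text.translate(_ASCII_CONTROLS)

print(repr(strip_ascii_controls("Exe Command\r\x07")))  # 'Exe Command'
```

The table is built once and reused for every input, so the per-string work is a single translate() call. Whether that is actually faster on your data is something to measure, for example with timeit.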

Related

Is there a way to search for non-english letters in file names? [duplicate]

This question already has answers here:
What is the best way to remove accents (normalize) in a Python unicode string?
(13 answers)
Closed 2 years ago.
I am trying to write a script that replaces all non-English letters in file names with their English counterparts. Is this possible?
If you mean to "deburr" strings, there's a nice, simple-ish recipe for it (for many accented characters, anyway) that uses the Unicode NFKD normalization form and then strips everything non-ASCII out of the result:
>>> import unicodedata
>>> unicodedata.normalize("NFKD", "törkylempijävongahdus").encode("ascii", "ignore").decode()
'torkylempijavongahdus'
For more complex use cases, maybe https://pypi.org/project/transliterate/ is your thing.
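The "many accented characters" caveat is worth seeing in practice. The sketch below wraps the recipe in a helper (the name `deburr` is made up here) and shows a letter it cannot handle: 'Ł' has no Unicode decomposition, so it is dropped rather than turned into 'L'.

```python
import unicodedata

def deburr(text):
    """Strip accents by decomposing (NFKD) and dropping non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(deburr("törkylempijävongahdus.txt"))  # torkylempijavongahdus.txt
print(deburr("Łódź"))                       # odz (the 'Ł' is lost)
```

For file names, it may be safer to check that the result is non-empty and does not collide with an existing name before renaming anything.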

Python: Using Regex to remove multiple occurrences of punctuation? [duplicate]

This question already has answers here:
strip punctuation with regex - python
(4 answers)
Closed 2 years ago.
I'm looking to remove punctuation that is repeated in a row.
E.g., turn 'Hello...' into 'Hello.'
I've been reading some of the documentation on the matter, but am struggling to find a definitive method. (I personally find the docs on regex to be a little overwhelming, and unclear at times.)
I thought it may be something along the lines of:
re.sub('[!()-{};:,<>./?##$%^&*_~]+', '', input)
But this doesn't work. Any help? Thanks.
You can use this:
import re
input='Hello...'
re.sub(r'(\W)(?=\1)', '', input)
Output:
'Hello.'
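One thing to keep in mind is that \W matches any non-word character, including whitespace, so the pattern above also collapses repeated spaces. If only punctuation should be collapsed, a variant along these lines might fit better. It is a sketch that assumes "punctuation" means the ASCII characters in string.punctuation; the helper name `collapse_punctuation` is made up.

```python
import re
import string

# A character class of ASCII punctuation, followed by one or more repeats
# of the same character (the backreference \1).
_REPEATED_PUNCT = re.compile("([" + re.escape(string.punctuation) + r"])\1+")

def collapse_punctuation(text):
    """Replace each run of one repeated punctuation character with a single one."""
    return _REPEATED_PUNCT.sub(r"\1", text)

print(collapse_punctuation("Hello..."))        # Hello.
print(collapse_punctuation("Wait!!  What??"))  # Wait!  What?
```

Runs of different characters, such as '?!', are left alone because the backreference only matches a repeat of the same character.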

Changing string to ascii in python [duplicate]

This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters:
output: 'Lodz'
I can't import any library like unicodedata.
I need to do it in pure Python.
I've tried to encode then decode, and nothing worked.
Well, a simple method would be to map and replace. This also does not require any special imports.
name = 'Łódź'
name=name.replace('Ł','L')
name=name.replace('ó','o')
name=name.replace('ź','z')
print(name)

get strings between 2 delimiter in python [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
From the string "/path/to/%directory_1%/%directory_2%.csv" I would like to get
the following list: [directory_1, directory_2]. I would like to avoid splitting my string on "%". I was hoping to find a regex that could help, but I cannot find the correct one.
For now, I have the following:
re.findall('%(.*)%', dirty_arg)
which outputs ["directory_1%/%directory_2"]
Do you have any recommendations?
Thank you very much for your help.
Try this:
import re
regex = r"%(.*?)%"
dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"
print(re.findall(regex, dirty_arg))
I've added ? to your regex, which makes the * quantifier non-greedy, so each group matches as few characters as possible. The output of this code is ['directory_1', 'directory_2']
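For comparison, here are the greedy and non-greedy patterns side by side, plus an alternative that avoids the question entirely by excluding the delimiter from the group with a negated character class. This is a sketch using the string from the question; the variable names are made up.

```python
import re

dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"

# Greedy: .* runs to the last "%" in the string.
greedy = re.findall(r"%(.*)%", dirty_arg)
# Non-greedy: .*? stops at the first "%" that lets the match succeed.
lazy = re.findall(r"%(.*?)%", dirty_arg)
# Negated class: the group can never contain "%" in the first place.
negated = re.findall(r"%([^%]+)%", dirty_arg)

print(greedy)   # ['directory_1%/%directory_2']
print(lazy)     # ['directory_1', 'directory_2']
print(negated)  # ['directory_1', 'directory_2']
```

The negated class makes the intent explicit and, unlike .*?, never matches an empty name such as "%%".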

Are there literal strings in Python [duplicate]

This question already has answers here:
How to write string literals in Python without having to escape them?
(6 answers)
Closed 5 years ago.
In F# there is something called a literal string (not string literal): basically, if a string literal is preceded by @, then it is interpreted as-is, without any escapes.
For example, if you want to write the path of a file in Windows (for an os.walk, for example), you would do it like this:
"d:\\projects\\re\\p1\\v1\\pjName\\log\\"
Or you could do this (the F# way):
@"d:\projects\re\p1\v1\pjName\log\"
The second variant looks much clearer and more pleasing to the eye. Is there something of the sort in Python? The documentation doesn't seem to have anything regarding that.
I am working in Python 3.6.3.
There are: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
You can use the r prefix.
https://docs.python.org/2.0/ref/strings.html
TL;DR: use a lowercase r prefix:
myString = r'\n'
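One caveat applies to the exact path in the question: a Python raw string cannot end with a single backslash, because the backslash still prevents the closing quote from ending the literal. The sketch below shows one common workaround, which is to append the trailing backslash as a separate, normally escaped literal. The variable names are made up.

```python
# The escaped form from the question.
escaped_path = "d:\\projects\\re\\p1\\v1\\pjName\\log\\"

# r"...\log\" would be a syntax error, so the final backslash is added
# through adjacent-literal concatenation with an ordinary string.
raw_path = r"d:\projects\re\p1\v1\pjName\log" "\\"

print(raw_path == escaped_path)  # True
print(len(r"\n"))                # 2: a backslash and the letter n
```

For file paths specifically, forward slashes or os.path.join may sidestep the issue altogether.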
