How to put an argument of a function inside a raw string - python

I want to create a function that will delete a character in a string of text.
I'll pass the string of text and the character as arguments of the function.
The function works fine, but I don't know how to do this correctly if I want to treat it as a raw string.
For example:
import re
def my_function(text, ch):
    Regex=re.compile(r'(ch)') # <-- Wrong, obviously this will just search for the 'ch' characters
    print(Regex.sub('',r'text')) # <-- Wrong too, same problem as before.
text= 'Hello there'
ch= 'h'
my_function(text, ch)
Any help would be appreciated.

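How about changing:
    Regex=re.compile(r'(ch)')
    print(Regex.sub('',r'text'))
to:
    Regex=re.compile(r'({})'.format(re.escape(ch)))
    print(Regex.sub('', text))
The r prefix only affects how a string literal is written in source code, so it cannot be applied to a variable; text can be passed as it is. re.escape() keeps characters such as '.' or '*' in ch from being read as regex syntax.
However, a simpler way to achieve this is to use str.replace():
text= 'Hello there'
ch= 'h'
text = text.replace(ch, '')
# value of text: 'Hello tere'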
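For completeness, here is a minimal sketch of the regex route as a whole function. The name remove_char is made up for illustration; the only library calls used are re.escape, re.compile and the pattern's sub method.

```python
import re

def remove_char(text, ch):
    # Hypothetical helper: delete every occurrence of ch from text.
    # re.escape() makes characters such as '.' or '*' match literally.
    pattern = re.compile(re.escape(ch))
    return pattern.sub('', text)

print(remove_char('Hello there', 'h'))  # Hello tere
print(remove_char('a.b.c', '.'))        # abc
```

Without re.escape(), passing '.' as ch would delete every character, because an unescaped dot matches anything.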

def my_function(text, ch):
    return text.replace(ch, "")
This returns a copy of text with all occurrences of ch removed. There is no need to invoke the overhead of regular expressions for this.

Related

How to continually add to a string?

I'm trying to add to a string over a few function calls that you could basically say will "update" the string. So for example, if you had:
'This is a string'
You could change it to:
'This is my string'
Or then:
'This is my string here'
etc..
My data for the string is coming from a nested dictionary, and I made a function that will change it to a string. This function is called 'create_string()'. I won't post it because it is working fine (although if necessary, I'll make an edit. But take my word for it that it's working fine).
Here's the function 'updater()' which takes three arguments: the string, the string you want to insert and the position you want to change.
def updater(c_string, val, position):
    data = c_string.split(' ')
    data[position] = str(val)
    string = ' '.join(data)
    return string
x = create_string(....)
new_string = updater(x,'hey', 0)
Which up until this point works fine:
'hey is a string'
But when you add another function call, it doesn't keep track of the old string:
new_string = updater(x,'hey',0)
new_string = updater(x,'hi',2)
> 'This is hi string'
I know that the reason is likely the variable assignment, but I tried simply calling the functions, and I still had no luck.
How can I get this working?
Thanks for the help!
Note: Please don't waste your time on the create_string() function, it's working fine. It's only the updater() function and maybe even just the function calls that I think are the problem.
**Edit:** Here's what the expected output would look like:
new_string = updater(x,'hey',0)
new_string = updater(x,'hi',2)
> 'hey is hi string'
You need to do this to keep modifying the string:
new_string = updater(x, 'hey', 0)
new_string = updater(new_string, 'hi', 2)
x is unchanged after the first call, because strings are immutable and updater() returns a new string. The modified text lives in new_string from that point on.
You store the result of updater in new_string, but don't pass that new_string to the next updater call.
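The chaining can be checked with a small self-contained sketch. The starting string is an assumed stand-in for the output of create_string(), which is not shown in the question.

```python
def updater(c_string, val, position):
    # Replace the word at the given position and return a new string.
    data = c_string.split(' ')
    data[position] = str(val)
    return ' '.join(data)

x = 'This is a string'  # assumed stand-in for create_string(...)
new_string = updater(x, 'hey', 0)
new_string = updater(new_string, 'hi', 2)  # feed the previous result back in

print(new_string)  # hey is hi string
print(x)           # This is a string
```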

Clean long string from spaces and tab in python

Suppose you need to build a long string inside a method of a class. What is the best way to write the code?
def printString():
    mystring = '''title\n
    {{\\usepackage}}\n
    text continues {param}
    '''.format(param='myParameter')
    return mystring
This method is well formatted, but the final string has unwanted spaces:
a = printString()
print(a)
title
    {\usepackage}
    text continues myParameter
This method gives the correct result, but the code can become messy if the string is long:
def printString():
    mystring = '''title\n
{{\\usepackage}}\n
text continues {param}
'''.format(param='myParameter')
    return mystring
a = printString()
print(a)
title
{\usepackage}
text continues myParameter
Any hints on getting both good code quality and the correct result?
Try enclosing the string in parentheses, like so:
def printString():
    mystring = ('title\n'
                '{{\\usepackage}}\n'
                'text continues {param}').format(param='myParameter')
    return mystring
Adjacent string literals are concatenated automatically, so this lets you break the string across several lines while keeping control over the whitespace.
You can use parentheses to keep long strings tidy inside functions.
def printString():
    mystring = ("title\n"
                "{{\\usepackage}}\n"
                "text continues {param}"
                ).format(param='myParameter')
    return mystring
print(printString())
Results in:
title
{\usepackage}
text continues myParameter
You may also wish to use the + operator to make the concatenation explicit. Formally this turns a compile-time join of adjacent literals into a runtime operation, although CPython may fold constant operands at compile time anyway.
def printString():
    mystring = ("title\n" +
                "{{\\usepackage}}\n" +
                "text continues {param}"
                ).format(param='myParameter')
    return mystring
You can use re.sub to clean up any spaces and tabs at the beginning of each line:
>>> import re
>>> def printString():
...     mystring = '''title\n
...     {{\\usepackage}}\n
...     text continues {param}
...     '''.format(param='myParameter')
...
...     return re.sub(r'\n[ \t]+', '\n', mystring)
...
This gives the following output:
>>> a = printString()
>>> print (a)
title
{\usepackage}
text continues myParameter
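Another option, not mentioned in the answers above, is textwrap.dedent from the standard library, which removes the indentation common to all lines. The sketch below assumes the template is written with one real line break per line (no extra \n escapes); the function name print_string is illustrative.

```python
import textwrap

def print_string():
    # The backslash after the opening quotes suppresses the first newline.
    # dedent() then strips the indentation shared by every line.
    template = textwrap.dedent("""\
        title
        {{\\usepackage}}
        text continues {param}""")
    return template.format(param='myParameter')

print(print_string())
```

This keeps the source indented with the surrounding code while the resulting string starts each line at column zero.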

Split string but replace with another string and get list

I am trying to split a string, but each separator should be replaced with another string, and the result returned as a list. It's hard to explain, so here is an example:
I have string in variable a:
a = "Hello World!"
I want a list such that:
a.split("Hello").replace("Hey") == ["Hey"," World!"]
It means I want to split a string and put another string in place of the element it was split on. So if a is
a = "Hello World! Hello Everybody"
and I use something like a.split("Hello").replace("Hey"), then the output should be:
a = ["Hey"," World! ","Hey"," Everybody"]
How can I achieve this?
From your examples it sounds a lot like you want to replace all occurrences of Hello with Hey and then split on spaces.
What you are currently doing can't work, because replace needs two arguments and it's a method of strings, not lists. When you split your string, you get a list.
>>> a = "Hello World!"
>>> a = a.replace("Hello", "Hey")
>>> a
'Hey World!'
>>> a.split(" ")
['Hey', 'World!']
x = "HelloWorldHelloYou!"
y = x.replace("Hello", "\nHey\n").lstrip("\n").split("\n")
print(y) # ['Hey', 'World', 'Hey', 'You!']
This is a rather brute-force approach. You can replace \n with any character you're not expecting to find in your string (or even something like XXXXX). The lstrip removes the leading \n if your string starts with Hello.
Alternatively, there's regex :)
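A minimal sketch of the regex route: re.split keeps the separators in the result when the pattern is wrapped in a capturing group, so each separator can then be swapped for the replacement. The helper name split_and_replace is made up for illustration.

```python
import re

def split_and_replace(text, old, new):
    # The capturing group makes re.split() keep each separator in the list.
    parts = re.split('({})'.format(re.escape(old)), text)
    # Swap separators for the replacement and drop empty pieces.
    return [new if part == old else part for part in parts if part]

print(split_and_replace("Hello World! Hello Everybody", "Hello", "Hey"))
# ['Hey', ' World! ', 'Hey', ' Everybody']
```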
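This function can do it:
def replace_split(s, old, new):
    # Interleave pieces with new, drop the trailing new, then skip empty pieces
    parts = sum([[blk, new] for blk in s.split(old)], [])[:-1]
    return [part for part in parts if part]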
It wasn't clear whether you wanted to split on spaces or on uppercase letters.
import re
# Replace all 'Hello' with 'Hey'
a = 'HelloWorldHelloEverybody'
a = a.replace('Hello', 'Hey')
# This will split the string before each uppercase character
re.findall('[A-Z][^A-Z]*', a)  # ['Hey', 'World', 'Hey', 'Everybody']
You can do this with iteration:
a = a.split(' ')
for i, word in enumerate(a):
    if word == 'Hello':
        a[i] = 'Hey'

How to create a Regex based on user input (Python)

I am fairly new to Python, and I am learning about Regexes right now, which has been a bit of a challenge for me. My issue right now is I am working on a problem that is to create a function that is a Regex version of the strip() string method.
My problem is that I can't figure out how to convert a character that user inputs into a regex without listing out every possibility in the program with if statements. For instance:
def regexStrip(string, char):
    if char = 'a' or 'b' or 'c' etc...
        charRegex = re.compile(r'^[a-z]+')
This isn't my full program just a few lines to demonstrate what I'm talking about. I was wondering if anyone could help me in finding a more efficient way to convert user input into a Regex.
You can use braces inside strings and the format function to build the regular expression. Pass the character through re.escape() so that regex metacharacters such as '.' are matched literally.
import re
def regexStrip(string, char=' '):
    char = re.escape(char)
    # Removes the characters at the beginning of the string
    stripped_left = re.sub('^{}*'.format(char), '', string)
    # Removes the characters at the end of the string
    stripped = re.sub('{}*$'.format(char), '', stripped_left)
    return stripped
The strip method in Python also accepts multiple characters. For example, 'hello world'.strip('held') returns 'o wor'.
To support this, put the characters in a character class:
def regexStrip(string, chars=' '):
    # Inside [...] the characters are simply listed; '|' would be taken literally
    rgx_chars = re.escape(chars)
    # Removes the characters at the beginning of the string
    stripped_left = re.sub('^[{}]*'.format(rgx_chars), '', string)
    # Removes the characters at the end of the string
    stripped = re.sub('[{}]*$'.format(rgx_chars), '', stripped_left)
    return stripped
If you want to use search matching instead of substitutions, you can do:
def regexStrip(string, chars=' '):
    rgx_chars = re.escape(chars)
    # The optional group lets a single remaining character match as well
    stripped_search = re.search('[^{0}](?:.*[^{0}])?'.format(rgx_chars), string, re.DOTALL)
    if stripped_search:
        return stripped_search.group()
    else:
        return ''
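Both ends can also be handled in a single substitution. The sketch below is one possible way to do it, not the answerer's code: the name regex_strip is hypothetical, chars=None is assumed to mean "strip whitespace" as with str.strip(), and \A and \Z are used so that only the very start and end of the string are touched.

```python
import re

def regex_strip(text, chars=None):
    # Hypothetical helper mimicking str.strip() with one substitution.
    if chars is None:
        # No characters given: strip whitespace, like str.strip()
        return re.sub(r'\A\s+|\s+\Z', '', text)
    if not chars:
        # An empty character set strips nothing
        return text
    # re.escape() keeps characters like '-', '^' or ']' literal inside [...]
    char_class = '[{}]'.format(re.escape(chars))
    pattern = r'\A{0}+|{0}+\Z'.format(char_class)
    return re.sub(pattern, '', text)

print(regex_strip('hello world', 'held'))  # o wor
print(regex_strip('..a.b..', '.'))         # a.b
```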

Remove all line breaks from a long string of text

Basically, I'm asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user's string and delete all line breaks to make it a single line of text? My method for acquiring the string is very simple.
string = raw_input("Please enter string: ")
Is there a different way I should be grabbing the string from the user? I'm running Python 2.7.4 on a Mac.
P.S. Clearly I'm a noob, so even if a solution isn't the most efficient, the one that uses the most simple syntax would be appreciated.
How do you enter line breaks with raw_input? But, once you have a string with some characters in it you want to get rid of, just replace them.
>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>
In the example above, I replaced all spaces. The string '\n' represents a newline, and '\r' represents a carriage return (if you're on Windows, you might be getting these, and a second replace will handle them for you!).
Basically:
# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')
Note also that it is a bad idea to call your variable string, as this shadows the string module. Another name I'd avoid but would love to use sometimes is file, for the same reason: in Python 2 it shadows the built-in file type.
You can try using string replace:
string = string.replace('\r', '').replace('\n', '')
You can split the string with no separator arg, which will treat consecutive whitespace as a single separator (including newlines and tabs). Then join using a space:
In : " ".join("\n\nsome text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'
https://docs.python.org/2/library/stdtypes.html#str.split
The canonical answer, in Python, would be:
s = ''.join(s.splitlines())
It splits the string into lines (str.splitlines() recognizes \n, \r and \r\n, among other line boundaries) and then joins them back together. Two possibilities here:
replace each newline with a space (' '.join())
or remove it without adding a space (''.join())
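The two join variants can be compared side by side in a short sketch; the sample text is made up and mixes \n and \r\n line endings.

```python
text = "first line\nsecond line\r\nthird line"

# splitlines() drops the line endings, whichever kind they are
with_spaces = ' '.join(text.splitlines())
without_spaces = ''.join(text.splitlines())

print(with_spaces)     # first line second line third line
print(without_spaces)  # first linesecond linethird line
```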
Updated based on Xbello's comment:
string = my_string.rstrip('\r\n')
Note that rstrip only removes line breaks at the end of the string, not those in the middle.
Another option is regex:
>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'
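If anybody decides to use replace, note that '\n' matches an actual newline character, while r'\n' matches a literal backslash followed by the letter n. Use the raw form only if your text contains the escape sequences as literal characters:
mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')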
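The difference between the two forms can be checked directly. This sketch uses made-up sample strings: one with a real newline and one with a literal backslash followed by n.

```python
real_newline = "one\ntwo"      # contains an actual newline character
literal_escape = r"one\ntwo"   # contains a backslash followed by 'n'

cleaned_real = real_newline.replace("\n", " ")
untouched = real_newline.replace(r"\n", " ")        # nothing to replace here
cleaned_literal = literal_escape.replace(r"\n", " ")

print(cleaned_real)               # one two
print(untouched == real_newline)  # True
print(cleaned_literal)            # one two
```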
A method that takes into consideration:
additional whitespace characters at the beginning/end of the string
additional whitespace characters at the beginning/end of every line
various end-of-line characters
It takes a multi-line string that may be messy, e.g.
test_str = '\nhej ho \n aaa\r\n a\n '
and produces a clean one-line string:
>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'
UPDATE:
To avoid redundant spaces caused by consecutive newline characters:
' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])
This also works for the following input:
test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '
Regular expressions are another way to do this:
s='''some kind of
string with a bunch\r of
extra spaces in it'''
re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))
result:
'some kind of string with a bunch of extra spaces in it'
The problem with rstrip() is that it only removes trailing characters, so it does not work in all cases (as I have seen myself). Instead you can use
text = text.replace("\n"," ")
This will replace every newline '\n' with a space.
You really don't need to remove ALL the line-break characters (LF, CR, CRLF; in Python: '\n', '\r', '\r\n').
Some texts must keep their breaks, but you probably need to join broken lines to keep particular sentences together.
It is therefore natural for a line break to occur after a period, semicolon or colon, but not after a comma.
My code takes these conditions into account. It works well with texts copied from PDFs.
Enjoy!
import re
def unbreak_pdf_text(raw_text):
    """Join lines that were broken in the middle of a sentence.
    Args:
        raw_text (str): string containing unwanted newline signs: \\n or \\r or \\r\\n
            e.g. imported from OCR or copied from a pdf document.
    Returns:
        str: the text with each line break that follows a word character,
            comma or space replaced by a single space.
    """
    # Try \r\n first so that a Windows line ending is replaced as one unit
    pat = re.compile(r"([, \w])(?:\r\n|\n|\r)")
    # Keep the character before the break and put a space in place of the break
    return pat.sub(r"\1 ", raw_text)
