I have a long list of suburbs that I want to clean up.
A lot of them contain RDx (for Rural Delivery), where x is a number from 1 to 30.
I want to get rid of the RDx, like below:
for row in WorkingData['PatientSuburb']:
    if 'RD10' in str(row):
        WorkingData['PatientSuburb'].replace(regex=True,inplace=True,to_replace=r'RD10',value=r'')
I was thinking that if I could run a loop and increment the number somehow, that'd be great. This wouldn't work, but it's along the lines of what I'd like to do:
for rd in range(1,31,1):
    if 'RD',rd in str(row):
        WorkingData['PatientSuburb'].replace(regex=True,inplace=True,to_replace=r'RD'rd ,value=r'')
If I do this I get output with a space in between:
for rd in range(1,31,1):
    print 'RD',rd
like so:
RD 1
RD 2
RD 3
RD 4
RD 5
RD 6
RD 7
RD 8
RD 9
RD 10
RD 11
RD 12
and also I would need to figure out how this piece would work...
to_replace=r'RD'rd
I have seen someone use a % sign when labelling a plot, and it brings in a value from outside the quotes, but I don't know if that's part of the label function (I did try it and it didn't work at all).
That would look like this
to_replace=r'RD%' % rd
Any help on this would be great thanks!
If you want to use a for loop and substitute a substring by the index then I would say you are almost there.
to_replace = 'RD%d' % rd
'%' marks the start of a conversion specifier. In the example above, "d" follows "%", which means a signed decimal integer is placed there, the same as in the "printf" library function in C. If "%" is not followed by a valid conversion character, as in your 'RD%' attempt, Python raises a ValueError instead of substituting the value.
More details and examples here: https://docs.python.org/3.6/library/stdtypes.html#printf-style-bytes-formatting
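Putting the specifier to work in a loop like the one from the question could look like the sketch below. It works on a single plain string with str.replace rather than on the DataFrame, and the function name strip_rd_loop and the sample input are made up for illustration. One detail to watch: counting down from 30 means 'RD10' is removed before 'RD1' gets a chance to match its prefix and leave a stray '0' behind.

```python
def strip_rd_loop(suburb):
    """Remove every 'RD1' .. 'RD30' marker from a single string."""
    # Count down so that longer markers such as 'RD10' are removed
    # before 'RD1' can match the start of them.
    for rd in range(30, 0, -1):
        suburb = suburb.replace('RD%d' % rd, '')
    return suburb


# Hypothetical sample value; the trailing space is left in place.
print(repr(strip_rd_loop('Kumeu RD10')))
```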
Even though your question is about looping over several integers to generate strings, it seems your problem would actually be more suited for a regular expression.
This would allow you to capture multiple cases in one, without looping over possible values.
>>> import re
>>> RD_PATTERN = re.compile(r'RD[1-3]?[0-9]')
>>>
>>> def strip_rd(string):
...     return re.sub(RD_PATTERN, '', string)
...
>>>
>>> strip_rd('BlablahRD5')
'Blablah'
>>> strip_rd('BlablahRD5sometext')
'Blablahsometext'
>>> strip_rd('BlablahRD10sometext')
'Blablahsometext'
>>> strip_rd('BlablahRD25sometext')
'Blablahsometext'
The regex I provided is not rock-solid by any means (e.g. it matches RD0 even though you specified [1..30]), but you can create one that fits your specific use case. For instance, it might make sense to check that the pattern is at the end of the string, if that's expected to be the case.
Also, note that re.compile-ing the pattern is not necessary (you can give the pattern string directly), but since you mentioned you have several rows, it'll be more performant.
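As a rough illustration of tightening the pattern, the sketch below only accepts the numbers 1 to 30 and only matches at the end of the string, also swallowing any whitespace just before the marker. The names STRICT_RD and strip_trailing_rd are hypothetical, and whether the marker really always sits at the end of the suburb is an assumption about the data.

```python
import re

# 30, or 10-29, or 1-9, anchored to the end of the string.
# Leading whitespace before 'RD' is removed along with the marker.
STRICT_RD = re.compile(r'\s*RD(?:30|[12][0-9]|[1-9])$')


def strip_trailing_rd(suburb):
    """Drop a trailing RD1..RD30 marker; leave anything else untouched."""
    return STRICT_RD.sub('', suburb)


print(repr(strip_trailing_rd('Kumeu RD10')))
print(repr(strip_trailing_rd('Kumeu RD31')))
```

If the column is a pandas Series, the same pattern string can probably be passed to Series.str.replace(pattern, '', regex=True), although that is not exercised here.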
I'm trying to efficiently add one to the end of a string like this:
tt0000001 --> tt0000002 but I'm not sure how to accomplish this.
A complicated way of doing this is to remove the 2 t's at the beginning, count the number of non-zero digits (let's call that number z), make the string an int, add 1, and then create a string with 2 t's, 6 - z 0's, and then the int, but since I need to use many strings (ex: tt0000001, then tt0000002 then tt0000003, etc) many times, it would be great to have a more efficient way of doing this.
Would anyone know how to do this? A one-liner would be ideal if possible.
Thank you!
What you describe is essentially correct. It's not as difficult as you suggest, though, as creating a 0-padded string from an integer is supported.
As long as you know that the number is 7 digits, you can do something like
>>> x = 'tt0000001'
>>> x = f'tt{int(x.lstrip("t"))+1:07}'
>>> x
'tt0000002'
Even simpler, though, is to keep just an integer variable, and only (re)construct the label as necessary each time you increment the integer.
>>> x = 1
>>> x += 1
>>> f'tt{x:07}'
'tt0000002'
>>> x += 1
>>> f'tt{x:07}'
'tt0000003'
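If the string form has to be the starting point, a small helper can keep whatever digit width the label already has instead of hard-coding 7. This is only a sketch: the name next_label and the default 'tt' prefix are assumptions, and it assumes everything after the prefix is digits.

```python
def next_label(label, prefix='tt'):
    """Return the label that follows `label`, keeping its zero padding."""
    digits = label[len(prefix):]
    # Re-pad to the original number of digits after adding one.
    return f'{prefix}{int(digits) + 1:0{len(digits)}d}'


print(next_label('tt0000001'))
print(next_label('tt0000099'))
```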
Apologies if this has been answered already - I tried looking it up, but maybe my search terms (same as my title) were bad.
Let's assume I have a string like this, which I don't have control over:
astr = "A 5.02654824574 (14.710000000000008, -19.989999999999995, -0.8) <-> s[10]: A 5.02654824574 (-29.11999999999999, 52.78, -0.8)"
I would like to process this string, so that floats in it are displayed with arbitrary amount of float precision - say 3 decimals. Since this would work on a level of a string, I wouldn't expect the process to account for correct rounding - simply for removal of decimal point string characters.
I'm aware I could do, as per Python: Extract multiple float numbers from string :
import re
p = re.compile(r'\d+\.\d+')
for i in p.findall(astr): print i
... which prints:
5.02654824574
14.710000000000008
19.989999999999995
0.8
5.02654824574
29.11999999999999
52.78
0.8
.... however, I'm getting lost at which regex captures I need to do in a search and replace, so - say, for n_float_precision_decimals = 4, - I'd get this string as output (with the above string as input):
"A 5.0265 (14.7100, -19.9899, -0.8) <-> s[10]: A 5.0265 (-29.1199, 52.78, -0.8)"
So basically, the regex would be taking into account that if there is a smaller number of decimals present already, it would not truncate decimals at all.
Is this possible to do, in a single re.sub operation, without having to write explicit for loops as above, and manually constructing the output string?
Got it - thanks to Reduce float precision using RegExp in swift ( which popped up in SO suggestions only after I finished typing the entire question (during which time, all I got were irrelevant results for this particular question) :) ):
>>> pat=re.compile(r'(\d+\.\d{2})\d+')
>>> pat.sub(r'\1', astr)
'A 5.02 (14.71, -19.98, -0.8) <-> s[10]: A 5.02 (-29.11, 52.78, -0.8)'
... or:
>>> nfloatprec=4
>>> pat=re.compile(r'(\d+\.\d{'+str(nfloatprec)+'})\d+')
>>> pat.sub(r'\1', astr)
'A 5.0265 (14.7100, -19.9899, -0.8) <-> s[10]: A 5.0265 (-29.1199, 52.78, -0.8)'
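The same idea can be wrapped in a function that takes the number of decimals as a parameter. This is a minimal sketch; the name truncate_floats is made up, and it assumes ndigits is at least 1. As in the answer above, it truncates the text and does not round.

```python
import re


def truncate_floats(text, ndigits):
    """Cut every float in `text` down to at most `ndigits` decimals."""
    # Capture the integer part, the dot and `ndigits` decimals, then
    # match (and discard) whatever decimals follow.
    pattern = re.compile(r'(\d+\.\d{%d})\d+' % ndigits)
    return pattern.sub(r'\1', text)


astr = ("A 5.02654824574 (14.710000000000008, -19.989999999999995, -0.8) "
        "<-> s[10]: A 5.02654824574 (-29.11999999999999, 52.78, -0.8)")
print(truncate_floats(astr, 4))
```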
I am working on a project in Python 3 where I need to create a sequence without adding digits. The numbers should be a string saved in a set, since it's faster than a list and they're all unique values.
I.e., I need something like:
Output
000001
000002
000003
...
000010
000011
...
000100
//and so on
Code
def build_sequence():
    seq = set()
    # logic here
    return seq
I have no idea how to solve this issue. It would be great if someone could put me in the right direction.
There are several ways to do this.
First, decide on the lower and upper limits of the desired sequence; from the question, let's assume 1 to 100. A simple for loop does the trick:
for i in range(1, 101):
    print(i)
This prints the numbers 1 to 100:
1 2 3 .... 100
But we want zero-padded strings, and there are several Pythonic approaches to this.
A simple one is format:
format(1, "06")
'000001'
Put this in a loop.
In Python 3.6 and later, we have f-strings:
temp_var = 19
f'{temp_var:06}'
There are lots of other methods as well; Google and the Python docs are a great start.
Your question is kind of vague, but from what I can guess you want a function to generate from 000000 to 999999.
def build_sequence():
    for i in range(1000000):
        yield "%.6d" % i
You can then iterate through each string using
for i in build_sequence():
    print(i)
where i is the string
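Since the question asks for the strings to end up in a set, the padding can also go straight into a set comprehension. The sketch below is one possible shape; the start, stop and width parameters and their defaults are assumptions, as the question does not state the limits.

```python
def build_sequence(start=1, stop=100, width=6):
    """Return a set of zero-padded number strings from start to stop inclusive."""
    return {f'{i:0{width}d}' for i in range(start, stop + 1)}


seq = build_sequence()
# Sets are unordered, so sort before printing a sample.
print(sorted(seq)[:3])
```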
I need code to replace this:
import _mysql
a = "111"
a = _mysql.escape_string(a)
"a" is always going to be a number between 1 and 1000+,
so maybe there is a more secure way of cleaning up the "a" string in this example for MySQL,
rather than relying on the
_mysql.escape_string()
function,
which we have no idea what it even does or how it works. It might also be slower than something we could write ourselves, given that all we are working with is a number between 1 and 1000+.
RE-PHRASING THE QUESTION:
How to ask Python if the string is a number of at most 4 digits?
Check if it's a number:
>>> "1234".isdigit()
True
>>> "ABCD".isdigit()
False
Check its length:
>>> 1 <= len("1234") <= 4
True
>>> 1 <= len("12345") <= 4
False
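The two checks can be folded into one helper that also converts the value to an int. This is a sketch rather than a drop-in replacement: the names FOUR_DIGITS and parse_small_number are hypothetical, and it uses a regular expression restricted to the ASCII digits 0-9 because str.isdigit also accepts some other Unicode digit characters.

```python
import re

# One to four ASCII digits and nothing else.
FOUR_DIGITS = re.compile(r'[0-9]{1,4}')


def parse_small_number(text):
    """Return `text` as an int if it is 1 to 4 digits, else raise ValueError."""
    if FOUR_DIGITS.fullmatch(text) is None:
        raise ValueError('expected 1 to 4 digits, got %r' % text)
    return int(text)


print(parse_small_number('111'))
```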
escape_string won't clean your string. From the docs:
"escape_string(s) -- quote any SQL-interpreted characters in string s.
Use connection.escape_string(s), if you use it at all. _mysql.escape_string(s) cannot handle character sets. You are probably better off using connection.escape(o) instead, since
it will escape entire sequences as well as strings."
I have code where the output should look like this:
hello    3454
nice      222
bye     45433
well     3424
The alignment and right justification are giving me problems.
I tried {0:>7} in my format string, but then only numbers with that specific number of digits line up; numbers with more or fewer digits get messed up. It is obvious why they get messed up, but I am having trouble finding a solution. I would hate to use constants and if statements all over the place for such a minor issue. Any ideas?
You could try:
"{:>10d}".format(n) where n is an int to pad-left numbers and
"{:>10s}".format(s), where s is a string to pad-left strings
Edit: choosing 10 is arbitrary.. I would suggest first determining the max length.
But I'm not sure this is what you want..
Anyways, this link contains some info on string formatting:
String formatting
You can try this:
def align(word, number):
    return "{:<10s}{:>10d}".format(word, number)
This left-aligns your string in a field 10 characters wide and right-aligns your number in a field 10 characters wide, giving the desired result.
Example:
align('Hello', 3454)
align('nice', 222)
align('bye', 45433)
align('well', 3424)
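Following the earlier suggestion to determine the maximum length first, the widths can be computed from the data instead of hard-coding 10. The sketch below is one way to do that; the function name format_rows and the single space between the columns are arbitrary choices.

```python
def format_rows(rows):
    """Left-align the words and right-align the numbers, sized to the data."""
    word_width = max(len(word) for word, _ in rows)
    num_width = max(len(str(number)) for _, number in rows)
    # The widths are passed as nested fields inside the format spec.
    return [f'{word:<{word_width}} {number:>{num_width}d}'
            for word, number in rows]


rows = [('hello', 3454), ('nice', 222), ('bye', 45433), ('well', 3424)]
for line in format_rows(rows):
    print(line)
```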