Problem:
I need to build a string from several parts (nbsp = u'\xa0', data['text'], delimiter).
I know of three common solutions:
res = '*{nbsp}{nbsp}{nbsp}{nbsp}{0}*{1}'.format(data['text'], delimiter, nbsp=nbsp) # seems prone to Unicode errors
res = '*' + 4 * nbsp + data['text'] + '*' + delimiter
res = ''.join(['*', 4 * nbsp, data['text'], '*', delimiter])
There is also the old % string formatting, but it looks like it is becoming legacy.
So which one is the most Pythonic, or preferable for this particular case?
Your first approach can be improved by uniformly using keyword arguments.
u'*{nbsps}{text}*{delimiter}'.format(nbsps=4*nbsp,
text=data['text'],
delimiter=delimiter)
The format string makes it clear that it contains three more complex blocks, each of which is defined in the same way in the arguments to unicode.format.
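A quick way to convince yourself that the keyword-argument version builds exactly the same string as the concatenation and join versions from the question. The sample values for `data` and `delimiter` are made up for illustration, and this assumes Python 3, where every `str` is Unicode:

```python
nbsp = u'\xa0'
data = {'text': 'Hello'}  # hypothetical sample data
delimiter = '|'           # hypothetical delimiter

# Keyword arguments: each placeholder names the block it stands for.
res_format = u'*{nbsps}{text}*{delimiter}'.format(
    nbsps=4 * nbsp,
    text=data['text'],
    delimiter=delimiter,
)

# The concatenation and join versions from the question, for comparison.
res_concat = '*' + 4 * nbsp + data['text'] + '*' + delimiter
res_join = ''.join(['*', 4 * nbsp, data['text'], '*', delimiter])

print(res_format == res_concat == res_join)  # True
```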
"Pythonic", as I understand it, means "can be deciphered in no time after a year of not seeing the code". I would throw the following hat in the ring:
res = "*%s%s*%s" % (4*nbsp, str(data["text"]), delimiter)
even if you consider it legacy, because it is understandable. Read it and compare it with decompiling the above suggestions.
The third one is not a good solution, but the first two are good enough. I prefer mixing them, though, and would try this:
("*" + 4*"{0}" + "{1}" + "*" + "{2}").format(nbsp, data['text'], delimiter)
When I use the replace method, I can pass an optional third argument that says how many occurrences of the character to change.
For example:
input_string = input()
first_char = input_string[0]
modified_string = input_string.replace(first_char, "$", input_string.count(first_char)-1)
print(modified_string)
The above code gives the following output:
Input: heyhhdh
Output: $ey$$dh
It replaced the h's starting from the first occurrence, but is there a way to specify where to start?
For instance, in the problem I'm working on I need to leave the first character alone. Is there a way to specify that in Python?
Edit:
The following line of code, from Tarique's comment, does what I need:
modified_string = first_char + input_string[1:].replace(first_char, "$", input_string.count(first_char)-1)
However, is there a way to do this using only string methods, for example by changing the arguments to replace?
You could do what you already got, except without the pointless counting:
>>> first_char + input_string[1:].replace(first_char, '$')
'hey$$d$'
A single replace without anything else can't do it, but two can:
>>> input_string.replace(first_char, '$').replace('$', first_char, 1)
'hey$$d$'
That's only two linear-time operations instead of three, and for longer strings it's faster. For input_string = 'hey$$d$' * 10**6 the first way takes me 12.1 ms and the second way takes me 9.4 ms.
A third, but silly and slow (30.9 ms), way simulates replacing from the end by reversing the string before and after:
>>> input_string[::-1].replace(first_char, '$', input_string.count(first_char) - 1)[::-1]
'hey$$d$'
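The first two approaches wrapped as functions (the names `keep_first_slice` and `keep_first_double_replace` are made up here), plus an exhaustive check that they agree. Because `first_char` is by definition `input_string[0]`, the first `'$'` left after the first `replace` is always at index 0, so the second `replace` restores exactly that character, even when the string already contained `'$'`. Both assume a non-empty string.

```python
from itertools import product

def keep_first_slice(s, new='$'):
    """Keep s[0]; replace every later occurrence of that character with `new`."""
    first = s[0]
    return first + s[1:].replace(first, new)

def keep_first_double_replace(s, new='$'):
    """Replace every occurrence, then turn the first `new` back into s[0]."""
    first = s[0]
    return s.replace(first, new).replace(new, first, 1)

print(keep_first_slice('heyhhdh'))           # hey$$d$
print(keep_first_double_replace('heyhhdh'))  # hey$$d$

# Exhaustive check over every non-empty string of length <= 5 on a tiny alphabet,
# including strings that already contain '$'.
for n in range(1, 6):
    for chars in product('h$x', repeat=n):
        s = ''.join(chars)
        assert keep_first_slice(s) == keep_first_double_replace(s), s
```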
Tarique's method is going to be the only way involving the replace method. You can specify the maximum number of occurrences to replace (see the str.replace documentation), but that is the opposite of what you want. The same is true in Python 3.
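If what you really want is "replace only from position `start` onward", slicing generalizes Tarique's trick. `replace_from` is a hypothetical helper, not a `str` method; note that an occurrence of a multi-character `old` that straddles `start` is not replaced.

```python
def replace_from(s, old, new, start, count=-1):
    """Replace occurrences of `old` in s[start:] only; s[:start] is left untouched.

    `count` is passed straight to str.replace, so -1 means "all occurrences".
    """
    return s[:start] + s[start:].replace(old, new, count)

print(replace_from('heyhhdh', 'h', '$', 1))     # hey$$d$
print(replace_from('heyhhdh', 'h', '$', 4))     # heyh$d$
print(replace_from('heyhhdh', 'h', '$', 1, 2))  # hey$$dh
```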
I was trying to solve a problem I ran into because my IDE could not keep a sequence of numbers, because of the way the range function works.
I asked a previous question about it, so this is a follow-up. Here's my comment on the previous question:
I actually made some adjustments by adding the line 'My_list = list(range(100))' before applying your code, so it actually worked. But it combines the numbers without commas; for example, for 10 it gives '0123456789' instead of '0,1,2,3,4,5,...,8,9'. Any suggestions?
I decided to post this as a separate question so the other one wouldn't go off topic (as I was advised).
Any suggestions?
You need to understand how strings work in Python.
Strings are constants: once created, they cannot be changed. The official docs put it as "Strings are immutable sequences of Unicode code points".
But programmers need to change or manipulate text in a programmable way. In your case you want:
"[x1][space][comma][x2][space][comma]...[xn][space][comma]"
where "xn" is a number, and " ," is constant.
To achieve this in a programmable way, programmers can use "masks" (templates) to tell the software where to place their values. One way is the string formatting operator %:
"%d , %f" %(my_first_integer, my_float)
[0][1][2][3][4][\0]
# Hey Python, return a string, using the above template,
# but place useful stuff where you find magic keywords.
Which means:
Create a sequence of 6 positions;
In [0], place my_first_integer (an int) converted to text;
In [1], copy " ";
In [2], copy ",";
In [3], copy " ";
In [4], place my_float (a float) converted to text;
In [5], the string ends. (Unlike in C, you never add a "\0" terminator yourself; Python keeps track of the string's length.)
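To see what the template actually produces, here is the same % expression with sample values (`my_first_integer = 2` and `my_float = 3.14` are arbitrary). Note that %f prints six decimal places by default, so one "position" in the mask can expand to several characters:

```python
my_first_integer = 2
my_float = 3.14

# %d and %f are the "magic keywords"; the text between them is copied as-is.
result = "%d , %f" % (my_first_integer, my_float)
print(result)  # 2 , 3.140000
```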
There are other ways to do this; for example, the string object has a handy method called format to handle this construction:
my_integer = 2
my_string = "{0}*pi = {1}".format(my_integer, my_integer*3.14)
print(my_string)
# 2*pi = 6.28
The programmer can achieve the same final result with either strategy.
In Python, as well as in other languages, one can combine strings, concatenate, get sub-strings and so on, using specific methods and/or operators.
To keep the output readable, you may want to put each value on its own line. In strings you can use special characters like \n for a new line.
my_list = list(range(100))
# ... useful code here and there ...
with open("output.txt", "w") as o:
    o.write("My list:\n")
    o.write("\tSize: {0}\n\n".format(len(my_list)))
    o.write("\t----start----\n")
    for value in my_list:
        o.write("%d\n" % value)
    o.write("\n\t----end----\n")
# That writes:
# My list:
#     Size: 100
#
#     ----start----
# 0
# 1
# 2
# 3
# ...
# 99
#
#     ----end----
Remember, this is not a comprehensive guide, but a layman one. I'm skipping a lot of boring words and technical details that you'll better find in Python books and courses.
You just need to write a comma after each number:
my_list = list(range(100))
with open("output.txt", "w") as o:
    for value in my_list:
        o.write("%d," % value)  # after '%d' you can put a comma, or any text you want
I am trying to use the Abaqus (a commercial FEA code) scripting interface to generate FE models. My question is about Python specifically, but here is a bit of background on why I am trying to do this.
Abaqus has a built-in boolean merge operation that requires the following syntax:
a.InstanceFromBooleanMerge(name='name_string', instances=(
    a.instances['string1'], a.instances['string2'],
    a.instances['string3'], ), originalInstances=SUPPRESS,
    domain=GEOMETRY)
The 'instances' parameter is specified as a tuple where each element is of the format
a.instances['string1']
I am trying to make the number of elements in this tuple, and obviously the names within it, scriptable. Currently I have code that looks like this:
my_list = []
for i in range(4):
    name = str('a.instances[\'')+str('name_')+str(i)+str('\']')
    my_list.append(name)
my_list = tuple(my_list)
print my_list
However, this gives:
("a.instances['name_0']", "a.instances['name_1']", "a.instances['name_2']",
a.instances['name_3']")
I have tried using lstrip and rstrip to remove the " characters but to no avail. Is there a way of generating a tuple of arbitrary length where the elements are not enclosed in inverted commas? The format is specified by the Abaqus interface, so there is no alternative format that can be used.
Many Thanks
You're close, try:
for i in range(4):
    val = a.instances["name_"+str(i)]
    my_list.append(val)
You can make this even shorter using a generator expression:
my_list = tuple(a.instances["name_"+str(i)] for i in range(4))
Those characters are printed simply because you're printing a tuple: its elements are shown with their repr, so strings are quoted and you can tell (123,) apart from ("123",). If you want the output without quotes, build it yourself:
def make_tuple_of(n):
    return '(' + ', '.join("a.instances['name_" + str(i) + "']" for i in range(n)) + ')'
Edit: I thought you wanted to generate the code itself, not create a tuple in the current code. If creating a tuple in the current code is what you actually want, just use tuple(a.instances['name_' + str(i)] for i in range(n))
Edit 2: Actually, check the library you're working with. Unless it specifically tests for tuples, it will probably accept a list just fine, since the two share most of the same interface. In that case you could just pass [a.instances['name_' + str(i)] for i in range(n)] as the parameter and be done.
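A small demonstration of the difference between a tuple of strings and a tuple of the objects themselves. Since Abaqus isn't available outside its own environment, `FakeInstance` and the `instances` dict below are hypothetical stand-ins for `a.instances`, not the Abaqus API:

```python
class FakeInstance(object):
    """Hypothetical stand-in for an Abaqus part instance (not the Abaqus API)."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return '<FakeInstance %s>' % self.name

# Hypothetical stand-in for the a.instances repository, keyed by name.
instances = dict(('name_%d' % i, FakeInstance('name_%d' % i)) for i in range(4))

# A tuple of *strings*: printing it shows each element's repr, hence the quotes.
names = tuple('name_%d' % i for i in range(4))
print(names)  # ('name_0', 'name_1', 'name_2', 'name_3')

# A tuple of the *objects* themselves, which is what the instances= argument expects.
objs = tuple(instances['name_%d' % i] for i in range(4))
print(objs)
```

The quotes in the question's output are therefore a symptom of building strings that merely look like code; looking the objects up by name avoids them entirely.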
I'm looking for the most efficient way to add an element to a comma-separated string while maintaining alphabetical order for the words:
For example:
string = 'Apples, Bananas, Grapes, Oranges'
addition = 'Cherries'
result = 'Apples, Bananas, Cherries, Grapes, Oranges'
Also, a way to do this but while maintaining IDs:
string = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
addition = '62:Cherries'
result = '1:Apples, 4:Bananas, 62:Cherries, 6:Grapes, 23:Oranges'
Sample code is greatly appreciated. Thank you so much.
For the first case:
alist = string.split(', ')
result = ', '.join(sorted(alist + [addition]))
For the second case:
alist = string.split(', ')
result = ', '.join(sorted(alist + [addition],
key=lambda s: s.split(':', 1)[1]))
If you have many thousands of items in the list, the first case might show a measurable performance improvement if you're willing to accept the much greater complication of bisect.insort; but insort only gained a key= parameter in Python 3.10, so on older versions the extra complication in the second case would be staggering and probably wouldn't even buy you any performance.
The kind of optimization mentioned above is worth considering only if a profile of your whole application shows that this operation is a crucial bottleneck. And if it is, you'd gain much more speed by keeping the data as a list of words and ', '-joining it only when needed (presumably for output), rather than splitting and rejoining it thousands and thousands of times for the kind of extremely long lists where such optimizations might possibly be warranted.
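For reference, a sketch of what the bisect route could look like, assuming the existing string is already sorted (by word in the first case, by name in the second). The second function keeps a parallel list of names so it works without insort's key= parameter on any Python 3 version:

```python
import bisect

def insert_sorted(csv, item, sep=', '):
    # Assumes the words in `csv` are already in sorted order.
    words = csv.split(sep)
    bisect.insort(words, item)  # binary search for the slot, then list.insert
    return sep.join(words)

def insert_sorted_by_name(csv, item, sep=', '):
    # Assumes entries look like "id:name" and are already sorted by name.
    entries = csv.split(sep)
    names = [e.split(':', 1)[1] for e in entries]  # parallel list of sort keys
    pos = bisect.bisect(names, item.split(':', 1)[1])
    entries.insert(pos, item)
    return sep.join(entries)

print(insert_sorted('Apples, Bananas, Grapes, Oranges', 'Cherries'))
print(insert_sorted_by_name('1:Apples, 4:Bananas, 6:Grapes, 23:Oranges', '62:Cherries'))
```

The split and join are still linear, so this only saves the sort itself. On Python 3.10+, bisect.insort(entries, item, key=lambda e: e.split(':', 1)[1]) could replace the parallel names list.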
Are you sure you should be storing the data as a string?
It probably makes more sense to maintain a set or list (or, in your second case, a dictionary) and generate the string when you need to. If the data don't change very often, cache the string.
With any solution that uses the string as your primary data storage, you'll probably end up generating a temporary list to make it easier to insert the element -- so it makes more sense just to keep the list.
Here's one way to do what you want:
>>> ", ".join(sorted('Apples, Bananas, Grapes, Oranges'.split(", ") +
... ["Cherries"]))
'Apples, Bananas, Cherries, Grapes, Oranges'
and "while maintaining IDs":
>>> ", ".join(sorted('1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'.split(", ") +
... ["62:Cherries"], key=lambda x: x.split(":")[1]))
'1:Apples, 4:Bananas, 62:Cherries, 6:Grapes, 23:Oranges'
I'm intentionally ignoring the part of the question where you asked for the "most efficient" way to do something. Proving that an algorithm is the most efficient possible approach to a particular problem is, in general, an unsolved problem in computer science; outside of special cases, there are no general techniques for it.
If you are concerned about efficiency, however, you should keep your data in intermediate data structures and not do these kinds of operations on strings. Any string-based operation is going to waste time copying memory around; you should convert to and from strings only once all of your processing is done.
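A sketch of that idea for the second case, assuming the IDs are unique integers: keep a dict mapping ID to name and build the string only when it is needed. The names `inventory` and `render` are illustrative, not from the question.

```python
inventory = {1: 'Apples', 4: 'Bananas', 6: 'Grapes', 23: 'Oranges'}

def render(items):
    # Sort entries by name, then format each one as "id:name".
    return ', '.join('%d:%s' % (item_id, name)
                     for item_id, name in sorted(items.items(), key=lambda kv: kv[1]))

inventory[62] = 'Cherries'  # adding an item is a plain dict assignment
print(render(inventory))    # 1:Apples, 4:Bananas, 62:Cherries, 6:Grapes, 23:Oranges
```

If the string is requested much more often than the data changes, the result of render can be cached and recomputed only after an update.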
I guess a simple solution would be:
result = string + ', ' + addition
I'm trying to make a glob-like expansion of a set of DNA strings that have multiple possible bases.
My DNA strings are made of the letters A, C, G, and T. However, they can also contain special characters like M, which could be either an A or a C.
For example, say I have the string:
ATMM
I would like to take this string as input and output the four possible matching strings:
ATAA
ATAC
ATCA
ATCC
Rather than brute force a solution, I feel like there must be some elegant Python/Perl/Regular Expression trick to do this.
Thank you for any advice.
Edit: thanks to cortex for itertools.product. This is my solution:
Still a Python newbie, so I bet there's a better way to handle each dictionary key than another for loop. Any suggestions would be great.
import sys
from itertools import product
baseDict = dict(M=['A','C'],R=['A','G'],W=['A','T'],S=['C','G'],
                Y=['C','T'],K=['G','T'],V=['A','C','G'],
                H=['A','C','T'],D=['A','G','T'],B=['C','G','T'])
def glob(str):
    strings = [str]
    ## this loop visits every possible base in the dictionary
    ## probably a cleaner way to do it
    for base in baseDict:
        oldstrings = strings
        strings = []
        for string in oldstrings:
            strings += map("".join,product(*[baseDict[base] if x == base
                                             else [x] for x in string]))
    return strings
for line in sys.stdin.readlines():
    line = line.rstrip('\n')
    permutations = glob(line)
    for x in permutations:
        print(x)
I agree with the other posters that it seems like a strange thing to want to do. Of course, if you really want to, there is (as always) an elegant way to do it in Python (2.6+):
from itertools import product
map("".join, product(*[['A', 'C'] if x == "M" else [x] for x in "GMTTMCA"]))
Full solution with input handling:
import sys
from itertools import product
base_globs = {"M":['A','C'], "R":['A','G'], "W":['A','T'],
              "S":['C','G'], "Y":['C','T'], "K":['G','T'],
              "V":['A','C','G'], "H":['A','C','T'],
              "D":['A','G','T'], "B":['C','G','T'],
              }
def base_glob(glob_sequence):
    production_sequence = [base_globs.get(base, [base]) for base in glob_sequence]
    return map("".join, product(*production_sequence))
for line in sys.stdin.readlines():
    productions = base_glob(line.strip())
    print("\n".join(productions))
You could probably do something like this in Python using yield:
def glob(seq):
    if seq == '':
        yield ''
        return
    if seq[0] != 'M':
        for tail in glob(seq[1:]):
            yield seq[0] + tail
    else:
        for c in ['A', 'C']:  # M stands for A or C
            for tail in glob(seq[1:]):
                yield c + tail
EDIT: As correctly pointed out I was making a few mistakes. Here is a version which I tried out and works.
This isn't really an "expansion" problem and it's almost certainly not doable with any sensible regular expression.
I believe what you're looking for is "how to generate all combinations" (strictly, the Cartesian product of the possible bases at each position).
You could, for example, do this recursively. Pseudo-code:
printSequences(sequence s)
    switch "first special character in sequence"
        case ...
        case M:
            s1 = s, but first M replaced with A
            printSequences(s1)
            s2 = s, but first M replaced with C
            printSequences(s2)
        case none:
            print s;
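The pseudo-code translates to Python fairly directly. This sketch fills in the "case ..." branches with all the codes from the question's baseDict (the IUPAC nucleotide ambiguity codes) and yields results instead of printing them. Passing the scan position along means each call only looks to the right of the last substitution.

```python
# Ambiguity codes, copied from the question's baseDict.
AMBIGUITY = {
    'M': 'AC', 'R': 'AG', 'W': 'AT', 'S': 'CG', 'Y': 'CT', 'K': 'GT',
    'V': 'ACG', 'H': 'ACT', 'D': 'AGT', 'B': 'CGT',
}

def sequences(s, start=0):
    """Yield every concrete sequence matched by s (a Python version of printSequences)."""
    # Look for the first special character at or after `start`.
    for i in range(start, len(s)):
        options = AMBIGUITY.get(s[i])
        if options:
            # case <code>: substitute each possible base and recurse.
            for base in options:
                yield from sequences(s[:i] + base + s[i + 1:], i + 1)
            return
    # case none: s contains only concrete bases.
    yield s

for seq in sequences('ATMM'):
    print(seq)  # ATAA, ATAC, ATCA, ATCC (one per line)
```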
Regexps match strings; they're not intended to be turned into every string they might match.
Also, you're looking at a lot of output strings. For instance:
MMMMMMMMMMMMMMMM (16 M's)
produces 65,536 strings of 16 characters each, and I'm guessing that DNA sequences are usually longer than that.
Arguably any solution to this is pretty much "brute force" from a computer science perspective, because the output alone grows exponentially with the number of ambiguous bases: O(2^n) for n two-way codes such as M, and more when the three-way codes (V, H, D, B) appear. There's actually quite a lot of work to be done.
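The number of outputs can be computed without generating any of them: it is the product of the number of choices at each position. A sketch using the code table from the question (`count_expansions` is a made-up name):

```python
from functools import reduce
from operator import mul

# Number of possible bases for each ambiguity code in the question's table.
CHOICES = {'M': 2, 'R': 2, 'W': 2, 'S': 2, 'Y': 2, 'K': 2,
           'V': 3, 'H': 3, 'D': 3, 'B': 3}

def count_expansions(s):
    # Concrete bases (A, C, G, T) contribute a factor of 1.
    return reduce(mul, (CHOICES.get(ch, 1) for ch in s), 1)

print(count_expansions('ATMM'))    # 4
print(count_expansions('M' * 16))  # 65536
print(count_expansions('ATMMV'))   # 12
```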
Why do you want to produce all the combinations? What are you going to do with them? (If you're thinking to produce every string possibility and then look for it in a large DNA sequence, then there are much better ways of doing that.)
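One example of such a better way, sketched under the assumption that the goal is to find where a pattern like ATMM occurs in a longer sequence: translate each ambiguity code into a regex character class and search once, instead of searching for every expanded string. The lookahead (?=...) lets overlapping matches count. This assumes the pattern contains only A, C, G, T and the codes below, so no regex metacharacters need escaping.

```python
import re

CLASSES = {'M': '[AC]', 'R': '[AG]', 'W': '[AT]', 'S': '[CG]', 'Y': '[CT]',
           'K': '[GT]', 'V': '[ACG]', 'H': '[ACT]', 'D': '[AGT]', 'B': '[CGT]'}

def pattern_for(glob_sequence):
    # Plain bases pass through unchanged; ambiguity codes become character classes.
    return ''.join(CLASSES.get(ch, ch) for ch in glob_sequence)

def find_all(glob_sequence, genome):
    # Zero-width lookahead, so overlapping occurrences are all reported.
    regex = re.compile('(?=%s)' % pattern_for(glob_sequence))
    return [m.start() for m in regex.finditer(genome)]

print(pattern_for('ATMM'))               # AT[AC][AC]
print(find_all('ATMM', 'GGATACTTATCC'))  # [2, 8]
```

This keeps regexps in the role they are designed for, matching, while never materializing the exponential set of expansions.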