Python: Substituting variables with functions in SymPy

I'm writing code in which I need to substitute the variables of an expression with several other expressions.
For example, I have B=x1**2+x2**2+x3**2 where I need to substitute x1=cos(x1+x2), x2=sin(x2+x3) and x3=x1 so as to get this value: cos(x1+x2)**2+sin(x2+x3)**2+x1**2
However, when I do this iteratively like this:
for j in range(nvar):
    B = expand(B.subs(x[j], f[j]))
Here nvar=3, x is a list of symbols and f is a list of symbolic expressions. At each iteration, the x[j] introduced by a previous substitution is itself replaced, which gives a wrong answer: x1**2 + sin(x1 + cos(x1 + sin(x1 + x2)))**2 + cos(x1 + sin(x1 + cos(x1 + sin(x1 + x2))))**2
How can I perform this substitution simultaneously?

You can use the simultaneous keyword for subs which was made for cases like this:
>>> (x1**2+x2**2+x3**2).subs(dict(x1=cos(x1+x2), x2=sin(x2+x3), x3=x1), simultaneous=True)
x1**2 + sin(x2 + x3)**2 + cos(x1 + x2)**2
Or, if x and f contain all instances of replacements you are interested in,
>>> reps = dict(zip(x, f))
>>> B = expand(B.subs(reps, simultaneous=True))
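For reference, here is a minimal, self-contained sketch comparing the two approaches. The names B_simultaneous and B_iterative are only illustrative, and x and f are assumed to be lists like the ones described in the question.

```python
from sympy import symbols, sin, cos

x1, x2, x3 = symbols("x1 x2 x3")
x = [x1, x2, x3]                      # symbols to replace
f = [cos(x1 + x2), sin(x2 + x3), x1]  # their replacements
B = x1**2 + x2**2 + x3**2

# Simultaneous: every replacement is applied to the original expression.
reps = dict(zip(x, f))
B_simultaneous = B.subs(reps, simultaneous=True)

# Iterative: later substitutions also hit symbols introduced by earlier ones.
B_iterative = B
for old, new in zip(x, f):
    B_iterative = B_iterative.subs(old, new)

print(B_simultaneous)
print(B_iterative)
```

The iterative result contains nested terms because the x2 and x1 that appear inside the already substituted cos(...) and sin(...) get replaced again.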

Is there a way to replace the first and last three characters in a list of sequences using Python?

I am attempting to use Python to replace certain characters in a list of sequences that will be sent out for synthesis. The characters in question are the first and last three of each sequence. I am also attempting to add a * between each character.
The tricky part is that the first and last character need to be different from the other two.
For example: the DNA sequence TGTACGTTGCTCCGAC would need to be changed to /52MOErT/*/i2MOErG/*/i2MOErT/*A*C*G*T*T*G*C*T*C*C*/i2MOErG/*/i2MOErA/*/32MOErC/
The first character needs to be /52MOEr_/ and the last needs to be /32MOEr_/, where the _ is the character at that index. For the example above it would be T for the first and C for the last. The other two, the GT and GA would need to be /i2MOEr_/ modifications.
So far I have converted the sequences into a list using the .split() function. The end result was ['AAGTCTGGTTAACCAT', 'AATACTAGGTAACTAC', 'TGTACGTTGCTCCGTC', 'TGTAGTTAGCTCCGTC']. I have been playing around for a bit but I feel I need some guidance.
Is this not as easy to do as I thought it would be?
You can split the sequence into three parts (the first three characters, the middle, and the last three), transform each part, and concatenate the results. Here's my solution to achieve your goal.
dna = "TGTACGTTGCTCCGAC"
dnaFirst3Chars = '/52MOEr' + dna[0] + '/*/i2MOEr' + dna[1] + '/*/i2MOEr' + dna[2] + '/*'
dnaMiddle = '*'.join(dna[3:-3])
dnaLast3Chars = '*/i2MOEr' + dna[-3] + '/*/i2MOEr' + dna[-2] + '/*/32MOEr' + dna[-1] + '/'
dnaTransformed = dnaFirst3Chars + dnaMiddle + dnaLast3Chars
print(dnaTransformed)
Output:
/52MOErT/*/i2MOErG/*/i2MOErT/*A*C*G*T*T*G*C*T*C*C*/i2MOErG/*/i2MOErA/*/32MOErC/
UPDATE:
For simplicity, you can wrap the above code in a function like this:
def dna_transformation(dna):
    """ Takes a DNA string and returns the transformed DNA """
    dnaFirst3Chars = '/52MOEr' + dna[0] + '/*/i2MOEr' + dna[1] + '/*/i2MOEr' + dna[2] + '/*'
    dnaMiddle = '*'.join(dna[3:-3])
    dnaLast3Chars = '*/i2MOEr' + dna[-3] + '/*/i2MOEr' + dna[-2] + '/*/32MOEr' + dna[-1] + '/'
    return dnaFirst3Chars + dnaMiddle + dnaLast3Chars
print(dna_transformation("TGTACGTTGCTCCGAC")) # call the function
Output: /52MOErT/*/i2MOErG/*/i2MOErT/*A*C*G*T*T*G*C*T*C*C*/i2MOErG/*/i2MOErA/*/32MOErC/
Assuming there's a typo in your expected result and it should actually be
/52MOErT/*/i2MOErG/*/i2MOErT/*A*C*G*T*T*G*C*T*C*C*/i2MOErG/*/i2MOErA/*/32MOErC/ the code below will work:
# python3
def encode_sequence(seq):
    seq_front = seq[:3]
    seq_back = seq[-3:]
    seq_middle = seq[3:-3]
    front_ix = ["/52MOEr{}/", "/i2MOEr{}/", "/i2MOEr{}/"]
    back_ix = ["/i2MOEr{}/", "/i2MOEr{}/", "/32MOEr{}/"]
    encoded = []
    for base, index in zip(seq_front, front_ix):
        encoded.append(index.format(base))
    encoded.extend(seq_middle)
    for base, index in zip(seq_back, back_ix):
        encoded.append(index.format(base))
    return "*".join(encoded)
Read through the code and make sure you understand it. Essentially we're just slicing the original string and inserting the bases into the format you need. Each element of the final output is added to a list and joined by the * character at the end.
If you need to dynamically specify the number and name of the bases you extract from the front and back of the sequence, you can use this version. Note that the {} braces tell the str.format method where to insert the base.
def encode_sequence_2(seq, front_ix, back_ix):
    seq_front = seq[:len(front_ix)]
    seq_back = seq[-len(back_ix):]
    seq_middle = seq[len(front_ix):-len(back_ix)]
    encoded = []
    for base, index in zip(seq_front, front_ix):
        encoded.append(index.format(base))
    encoded.extend(seq_middle)
    for base, index in zip(seq_back, back_ix):
        encoded.append(index.format(base))
    return "*".join(encoded)
And here's the output:
> seq = "TGTACGTTGCTCCGAC"
> encode_sequence(seq)
/52MOErT/*/i2MOErG/*/i2MOErT/*A*C*G*T*T*G*C*T*C*C*/i2MOErG/*/i2MOErA/*/32MOErC/
If you have a list of sequences to encode you can iterate over the list and encode each:
encoded_list = []
for seq in dna_list:
    encoded_list.append(encode_sequence(seq))
Or with a list comprehension:
encoded_list = [encode_sequence(seq) for seq in dna_list]
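Putting the pieces together, here is a compact sketch of the same idea applied to the list from the question. The names FRONT, BACK and encode are made up for this example, and the length check is an addition of mine: it assumes that sequences too short to hold both modified ends should be rejected rather than silently mangled.

```python
FRONT = ["/52MOEr{}/", "/i2MOEr{}/", "/i2MOEr{}/"]  # templates for the first bases
BACK = ["/i2MOEr{}/", "/i2MOEr{}/", "/32MOEr{}/"]   # templates for the last bases


def encode(seq, front_ix=FRONT, back_ix=BACK):
    """Wrap the leading and trailing bases in their templates and join with '*'."""
    if len(seq) < len(front_ix) + len(back_ix):
        raise ValueError("sequence too short: " + seq)
    # Using an explicit end index avoids the seq[-0:] pitfall when back_ix is empty.
    back_start = len(seq) - len(back_ix)
    parts = [template.format(base) for base, template in zip(seq, front_ix)]
    parts.extend(seq[len(front_ix):back_start])
    parts.extend(template.format(base)
                 for base, template in zip(seq[back_start:], back_ix))
    return "*".join(parts)


dna_list = ['AAGTCTGGTTAACCAT', 'AATACTAGGTAACTAC',
            'TGTACGTTGCTCCGTC', 'TGTAGTTAGCTCCGTC']
encoded_list = [encode(seq) for seq in dna_list]
```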

SymPy - Treating numbers as symbols

How can I treat numbers as symbols in SymPy?
For example, if I am performing a factorization with symbols I get:
from sympy import factor
factor('a*c*d + a*c*e + a*c*f + b*c*d + b*c*e + b*c*f')
c*(a + b)*(d + e + f)
I would like the same behaviour when I am using numbers in the expression.
Instead of
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
4013*c*(d + e + f)
I would like to get
from sympy import factor
factor('2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f')
c*(2006 + 2007)*(d + e + f)
1. Replace each constant with a unique symbol.
2. Factor the resulting expression.
3. Replace the unique symbols with the constants.
For your given case, something like this:
from sympy import factor
simple = factor('const2006*c*d + const2006*c*e + const2006*c*f + const2007*c*d + const2007*c*e + const2007*c*f')
# factor() returns a SymPy expression, so convert it to a string before stripping the prefix
print(str(simple).replace("const", ""))
This should give you the desired output. You can identify numeric tokens in the expression with a straightforward regex or trivial parser -- either of which is covered in many other locations.
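As a rough sketch of the regex step, the two helpers below add and remove the prefix. The function names and the "const" prefix are just placeholders, and the pattern assumes the expression contains only plain non-negative integers (no floats, no identifiers with embedded digits that you care about).

```python
import re


def protect_constants(expr_text, prefix="const"):
    """Turn every standalone integer token into a symbol-like name."""
    # The lookarounds skip digits that are part of a name (x1) or a float (2.5).
    # Note that exponents such as x**2 are standalone integers and get renamed too.
    return re.sub(r"(?<![\w.])(\d+)(?![\w.])", prefix + r"\1", expr_text)


def restore_constants(expr_text, prefix="const"):
    """Undo protect_constants on the printed result."""
    return re.sub(prefix + r"(\d+)", r"\1", expr_text)


protected = protect_constants("2006*c*d + 2007*c*e")
restored = restore_constants("c*(const2006 + const2007)*(d + e + f)")
```

You would pass the protected string to factor(), convert the result with str(), and then call restore_constants on it.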
Symbol trickery to the rescue: replace your numbers with Symbols having a name given by the number. In your case you don't have to watch for negative versions so the following is straightforward:
>>> from sympy import S, Symbol, Integer, factor
>>> s = '2006*c*d + 2006*c*e + 2006*c*f + 2007*c*d + 2007*c*e + 2007*c*f'
>>> eq = S(s, evaluate=False); eq
2006*c*d + 2007*c*d + 2006*c*e + 2007*c*e + 2006*c*f + 2007*c*f
>>> reps = dict([(i,Symbol(str(i))) for i in _.atoms(Integer)]); reps
{2006: 2006, 2007: 2007}
>>> factor(eq.subs(reps))
c*(2006 + 2007)*(d + e + f)
Note: the evaluate=False is used to keep the like-terms from combining to give 4013*c*d + 4013*c*e + 4013*c*f.

optimization for faster calculation on python defaultdict

I have a script like this:
for b in range(len(xy_alignments.keys())):
    print str(b) + " : " + str(len(xy_alignments.keys()))
    x = xy_alignments.keys()[b][0]
    y = xy_alignments.keys()[b][1]
    yx_prob = yx_alignments[(y,x)] / x_phrases[x]
    xy_prob = xy_alignments[(x,y)] / y_phrases[y]
    line_str = x + "\t" + y + "\t" + str(yx_prob) + "\t" + str(xy_prob) + "\n"
    of.write(line_str.encode("utf-8"))
of.close()
xy_alignments, yx_alignments, x_phrases, and y_phrases are
python defaultdict variables which involve millions of keys.
When I run the loop above, it runs damn slowly.
Do python lovers have a suggestion to make it fast?
Thanks,
Here's a more idiomatic version, that should also be faster.
for (x, y), xy_alignment in xy_alignments.iteritems():
    yx_prob = yx_alignments[(y, x)] / x_phrases[x]
    xy_prob = xy_alignment / y_phrases[y]
    of.write(b'%s\t%s\t%s\t%s\n' % (x, y, yx_prob, xy_prob))
This
saves the keys() calls, which create a new list every time,
saves one dict lookup by using iteritems(),
saves string allocations by using string formatting, and
saves the encode() call, assuming all output is in the ASCII range anyway.
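The code above is Python 2 (iteritems, byte-string formatting). For what it's worth, a rough Python 3 equivalent might look like the sketch below. The function name write_phrase_table and the sample data are invented for illustration, and the output is assumed to go to a text stream opened with encoding="utf-8", which also takes care of non-ASCII phrases.

```python
import io
from collections import defaultdict


def write_phrase_table(out, xy_alignments, yx_alignments, x_phrases, y_phrases):
    """Write one tab-separated line per (x, y) pair to the text stream `out`."""
    # items() yields each key together with its value, so the key list is
    # never rebuilt and xy_alignments is not looked up a second time.
    for (x, y), xy_count in xy_alignments.items():
        yx_prob = yx_alignments[(y, x)] / x_phrases[x]
        xy_prob = xy_count / y_phrases[y]
        out.write(f"{x}\t{y}\t{yx_prob}\t{xy_prob}\n")


# Tiny made-up data set, only to show the call.
xy = defaultdict(float, {("a", "b"): 2.0})
yx = defaultdict(float, {("b", "a"): 1.0})
x_counts = defaultdict(float, {"a": 4.0})
y_counts = defaultdict(float, {"b": 8.0})

buffer = io.StringIO()  # stands in for open(path, "w", encoding="utf-8")
write_phrase_table(buffer, xy, yx, x_counts, y_counts)
```

The original loop is slow mainly because in Python 2 every xy_alignments.keys() call builds a full list of millions of keys, and it does so several times per iteration.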

Leading/prefix 0s in output of for loop

I am writing a for loop in my program that writes data to a file. I want the output to be formatted as follows
frame001 + K.1
frame002 + K.2
...
frame099 + K.99
frame100 + K.100
So far I am doing
for f in range(1, 100):
    file.write('frame' + str(f) + ' + K.' + str(f) + '\n')
I have no problem having the K part come out correctly as K.1-K.100, but I don't know how to have prefix zeros/have it output also frame00F to frameFFF with the appropriate amount of preceding zeros.
Using str.format:
>>> 'frame{0:03d} + K.{0}\n'.format(1)
'frame001 + K.1\n'
>>> 'frame{0:03d} + K.{0}\n'.format(100)
'frame100 + K.100\n'
BTW, range(1, 100) will not yield 100. If you want 100 to be included, that should be range(1, 101).
If you are using an old version of Python (2.5 or earlier), use the % string formatting operator instead. Unlike with str.format, you have to pass the argument once for each placeholder:
>>> 'frame%03d + K.%d\n' % (1, 1)
'frame001 + K.1\n'
>>> 'frame%03d + K.%d\n' % (100, 100)
'frame100 + K.100\n'
If you don't want to repeat arguments, you can pass a mapping instead, with a slightly different format specifier:
>>> 'frame%(i)03d + K.%(i)d\n' % {'i': 1}
'frame001 + K.1\n'
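Applied to the loop from the question, a small sketch could look like this. The helper name frame_lines and the file name in the comment are made up for the example.

```python
def frame_lines(first=1, last=100):
    """Build the lines 'frame001 + K.1', ... with zero-padded frame numbers."""
    # range() excludes its end point, hence last + 1.
    return ["frame{0:03d} + K.{0}\n".format(i) for i in range(first, last + 1)]


lines = frame_lines()
# To write them out (hypothetical file name):
# with open("frames.txt", "w") as out:
#     out.writelines(lines)
```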

Swapping lines in a text

How to change this:
fv (x,y,z) begin print x;;; print y ;;; return x + y + z end;
x = fv(2,34,5)
g (x) begin y = x + 45 ;;; return y end;
z = g(23)
r = 53
h (x,y,z,r) begin print x;;; print y ;;; print z;;;print r;;;return x + y + z end;
To this:
def fv (x,y,z) :
    print x
    print y
    return x + y + z
x = fv(2,34,5)
def g (x) :
    y = x + 45
    return y
z = g(23)
r = 53
def h (x,y,z,r) :
    print x
    print y
    print z
    print r
    return x + y + z
I'm not asking for full code or for someone to do my homework; I only need advice and/or samples, or a direction for how to do this.
Since you're only looking for a starting hint, and this is probably homework...
1. Do a replace() on the various line-enders (e.g. "begin", ";;;", "end;") converting them to "\n", with possibly a ':' in one of them.
2. Split the resulting text into lines with .split("\n")
3. Walk the lines to adjust the line prefixes ("def ", indentation)
4. Put the lines back together using "\n".join(...)
5. Write the output text
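If it helps to see the shape of it, here is one possible sketch of those steps. It takes a slightly different route (splitting each definition line instead of replacing markers in the whole text), the name convert is made up, and it assumes every function definition sits on a single line that contains "begin" and ends with "end;".

```python
def convert(source):
    """Rewrite 'name (args) begin a;;; b end;' lines as indented Python defs."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        # Assumption: only definition lines contain "begin" and end with "end;".
        if "begin" in stripped and stripped.endswith("end;"):
            header, body = stripped[:-len("end;")].split("begin", 1)
            out.append("def " + header.strip() + " :")
            for statement in body.split(";;;"):
                out.append("    " + statement.strip())
        else:
            out.append(line)  # ordinary statements pass through unchanged
    return "\n".join(out)


converted = convert("g (x) begin y = x + 45 ;;; return y end;\nz = g(23)")
print(converted)
```

A variable or function whose name happens to contain "begin" would confuse this check, so a real solution needs a stricter test for definition lines.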
This could get you started:
for line in code:
    line = line.replace( "begin", " :\n" + " " * 4 ).replace( ";;;", "\n" + " " * 4 ).replace( "end;", "\n" + " " * 4 )
Look at the sed command line tool, for instance. It's a bit hard to know what tools you're expected/allowed to use ...
Well, for starters you open() the file, use its readlines() method to get it into a list of strings.
From there you could iterate through that list and use a combination of split(";;;") methods or something more complex from the re module on strings.
This might be overkill, but take a look at the Ply parser project. You will have to learn about regular expressions and Backus-Naur form (BNF).
