I've been struggling to figure out a way to print my sequence with each 6-mer in the sequence on a separate line, like so (note the spacing of each line):
atgctagtcatc
 tgctag
  gctagt
   ctagtc
    tagtca
etc
So far, I've been able to get my sequence as a string, as shown:
from Bio import SeqIO
record = SeqIO.read(open("testSeq.fasta"), "fasta")
sequence = str(record.seq)
However, the only way I could seem to figure out to do the printing of the 6-mers is by:
print sequence
print sequence[0:5]
print "", sequence[1:6]
print "", "", sequence[2:7]
print "", "", "", sequence [3:8]
etc
I feel like there should be an easier way to do this. I've tried this, but it doesn't seem to work:
x = 0
y = 6
for sequence in sequence[x:y]
print sequence
x = x + 1
y = y + 1
Any opinions on how I should be attempting to accomplish this task would be greatly appreciated. I've only been using python for a couple days now and I'm sorry if my question seems simple.
Thank you!!
This should work:
width = 6
# The last valid start index is len(sequence) - width, hence the + 1
for i in range(len(sequence) - width + 1):
    print " " * i + sequence[i:i+width]
You could try the following (as far as I see you're using python2)
seq = "atgctagtcatc"
spaces = " "
for i in range(0, len(seq)):
    print spaces*i+seq[i:i+6]
Output:
atgcta
 tgctag
  gctagt
   ctagtc
    tagtca
     agtcat
      gtcatc
       tcatc
        catc
         atc
          tc
           c
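For reference, here is a minimal sketch of the sliding-window idea behind both answers, written as a function that returns the lines rather than printing them. It uses Python 3 syntax (the thread itself is Python 2), and the name `staggered_kmers` is made up for this illustration. Unlike the second answer, it stops at the last full-width window, so no shorter tails are produced.

```python
def staggered_kmers(sequence, width=6):
    """Return each full-width window of `sequence`, indented by its offset."""
    lines = []
    # The last valid start index is len(sequence) - width, hence the "+ 1".
    for i in range(len(sequence) - width + 1):
        lines.append(" " * i + sequence[i:i + width])
    return lines


for line in staggered_kmers("atgctagtcatc"):
    print(line)
```

A sequence shorter than `width` simply yields an empty list, because the range is empty.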
I am new to Python and currently reading Hands-on cryptography with python. While reading the caesar5.py script in the book, a question came to mind, and I would appreciate any help with it.
The code says:
alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
str_in = raw_input("Enter ciphertext: ")
for shift in range(26):
    n = len(str_in)
    str_out = ""
    for i in range(n):
        c = str_in[i]
        loc = alpha.find(c)
        newloc = (loc + shift)%26
        str_out += alpha[newloc]
    print shift, str_out
It prints the results in 26 rows, and I was wondering how I can print the results as one list.
Instead of printing it like
0 KHOOR
1 LIPPS
.
.
.
25 JGNNQ
it would just print [KHOOR, LIPPS,...,JGNNQ], or something like this.
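So essentially what you're looking to do is build a list in Python (the built-in type that plays the role of an array).
There are tutorials on this: https://www.w3schools.com/python/python_arrays.asp.
I'll try to write the code out quickly. Before the outer for loop, create an empty list:
array = []
Inside the outer loop, once str_out is complete (in place of print shift, str_out):
array.append(str_out)
Then, after the outer for loop has finished:
print(array)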
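Put together, the suggestion above might look like the following sketch. It is written in Python 3 syntax, wraps the book's loop in a function named `all_shifts` (a name invented here), and assumes the ciphertext contains only the uppercase letters A-Z.

```python
ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def all_shifts(ciphertext):
    """Return the 26 Caesar shifts of `ciphertext` as a list of strings."""
    results = []
    for shift in range(26):
        str_out = ""
        for c in ciphertext:
            loc = ALPHA.find(c)  # assumes c is an uppercase letter A-Z
            str_out += ALPHA[(loc + shift) % 26]
        results.append(str_out)  # collect the row instead of printing it
    return results


print(all_shifts("KHOOR"))
```

Printing a list shows each string in quotes; `", ".join(all_shifts("KHOOR"))` would give the unquoted, comma-separated form instead.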
I have a question regarding formatting. I am trying to extract relevant data and insert it into a Fortran file, and I am using Python to accomplish this. The Fortran file is sensitive to the number of spaces between pieces of text, which brings me to my question. My array data looks like:
[[ -1.80251269 12.14048223 15.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
[[-14.80251269 1.14048223 1.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
[[ -0.80251269 0.14048223 0.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
These elements are floats, not strings. For example, I want the first row (and every row thereafter) of the data to look like:
     -1.80251269     12.14048223     15.47522331
How would I accomplish this? To be specific, there are 5 white spaces that separate the left margin from the 1st number, -1.80251269, and 5 white spaces that separate each of the three numbers. Notice also that I need the array brackets gone, but I suspect I can do this with a trim function. Sorry for my lack of knowledge, guys; I do not even know how to begin this problem, as my knowledge of Python syntax is limited. Any help or tips would be appreciated. Thanks!
EDIT: this is the code I am using to generate the array:
fo = np.genfromtxt("multlines.inp")
data=scipy.delete(fo, 0, 1)
txt = np.hsplit(data,3)
all_data = np.vsplit(data, 4)
i=0
num_molecules = int(raw_input("Enter the number of molecules: "))
print "List of unaltered coordinates:"
while i < (num_molecules):
    print all_data[i]
    i += 1  # advance to the next molecule so the loop terminates
If you are using NumPy, you can use np.savetxt:
np.savetxt('a.txt', a.reshape(15,3), '%16.8f')
To get
     -1.80251269      12.14048223      15.47522331
     -2.63865822      13.16562850      15.97462801
     -1.76966256      11.35311123      16.13958474
...
(You need to reshape your array into 2-dimensions to do what I think you want).
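As a rough illustration of what a format such as '%16.8f' does to each number, here is a NumPy-free sketch in Python 3. The helper name `format_row` and the sample rows are chosen for this example only. It assumes fixed-width columns are what the Fortran reader expects: every value is right-aligned in a 16-character field, so an 11-character number such as -1.80251269 ends up with 5 spaces in front of it, while a wider number such as -14.80251269 gets 4.

```python
def format_row(row, width=16, decimals=8):
    """Right-align each value in a fixed-width column, with no brackets."""
    return "".join("%*.*f" % (width, decimals, value) for value in row)


rows = [
    [-1.80251269, 12.14048223, 15.47522331],
    [-2.63865822, 13.1656285, 15.97462801],
]
for row in rows:
    print(format_row(row))
```

Because the fields are joined with no delimiter, the gap between columns comes entirely from the padding inside each field.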
If you have your data formatted as a list, then I suspect that @kamik423's answer will help you. If it is formatted as a string, you may wish to try something like the following.
def properly_format(line):
    nums = line.strip(' []\t').split()
    spaces = '     '  # five spaces, as the question asks for
    return spaces + nums[0] + spaces + nums[1] + spaces + nums[2]
lines = my_array_string.splitlines() #if your data is a multiline string
for line in lines:
    formatted_line = properly_format(line)
    # do something with formatted_line
Edit: forgot to split the string.
If you don't care about the width of each block, you can just do
for i in whateverYouArrayIsCalled:
    print str(i[0]) + " " + str(i[1]) + " " + str(i[2])
If, however, you want the columns to line up, try
for i in whateverYouArrayIsCalled:
    print (str(i[0]) + " " * 20)[:20] + (str(i[1]) + " " * 20)[:20] + str(i[2])
where 20 is the width of each block
(for 2.7)
I will assume that the data array is saved in a data.txt file and you want to save the result into fortran.txt, then:
fortran_file = open('fortran.txt', 'w')  # Open fortran.txt for writing
with open('data.txt', 'r') as data_file:  # Open data.txt for reading
    while True:
        line = data_file.readline()
        if not line:
            break  # EOF
        # Drop the brackets, the surrounding spaces and the trailing newline
        result = line.strip(' []\n').split()
        # Five spaces before each number, then put the newline back
        result = "     " + "     ".join(result) + "\n"
        fortran_file.write(result)
fortran_file.close()
try this:
import numpy
numpy.set_printoptions(sign=' ')
So I have a very simple task. The Project Euler problem Names Scores gives us a file with a set of strings (which are names). You have to sort these names in alphabetical order, compute what is known as a name score for each of them, and sum the scores. The name score calculation is pretty simple: take a name, sum the alphabetical values of its letters, and multiply this sum by the position the name has in the sorted list. This seems like a pretty simple question.
Being a Python beginner, I wanted to try this out in Python, and this is the code I wrote. I also tried a list comprehension with sum, but that gives me the same answer. Here is my code:
def name_score(s):
    # print sum((ord(c)-96) for c in s)
    s1 = 0;
    for c in s:
        s1 = s1 + (ord(c) - 96)
    print s1
    return s1
    # print ord(c) - 96
myList = []
f = open('p022_names.txt')
for line in f:
    myList.append(line.lower())
count = 0;
totalSum = 0;
for line in sorted(myList):
    count = count + 1;
    totalSum += (name_score(line) * count)
print totalSum
Now the file p022_names.txt contains only one line, "colin", so name_score("colin") should return 53. Whatever I try, I always end up getting the value -33. I am using PyDev on Eclipse. Here is a curious anomaly: if I populate the list directly in the code with myList = ["colin"], I get the correct answer. Honestly, I don't know what is happening. Can anybody shed some light on this? There is a similar loop in the program to calculate totalSum, but that doesn't seem to have an issue.
[EDIT] After the issue was pointed out, I am posting an updated revision of the code which works.
def name_score(s):
    return sum((ord(c)-96) for c in s)
with open('p022_names.txt') as f:
    myList = f.read().splitlines()
print sum((name_score(line.lower()) * (ind+1)) for ind,line in enumerate(sorted(myList)))
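96 - 53 - 33 = 10, and 10 is the character code of a newline.
That happens because you have a newline character ("\n") in your file, thus your line is not "colin" but "colin\n". The newline contributes ord("\n") - 96 = 10 - 96 = -86 to the score, and 53 - 86 = -33.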
To get rid of the newline character, multiple approaches could work. Here is an example:
Replace your line:
for line in f:
with:
for line in f.read().splitlines():
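The effect of the stray newline can be checked in isolation with a small sketch like the one below (Python 3 syntax; `name_score` mirrors the asker's one-line version). The three calls give 53, -33 and 53 respectively.

```python
def name_score(s):
    """Sum the alphabetical values of the characters in a lowercase string."""
    return sum(ord(c) - 96 for c in s)


print(name_score("colin"))            # the name on its own
print(name_score("colin\n"))          # the line as read from the file
print(name_score("colin\n".strip()))  # trailing newline removed
```

Calling `line.strip()` inside the original `for line in f:` loop is another way to drop the newline without reading the whole file at once.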
Could it be because you didn't close the file? As in f.close()?
I'm trying to create a simple encryption/decryption code in Python like this (maybe you can see what I'm going for):
def encrypt():
    import random
    input1 = input('Write Text: ')
    input1 = input1.lower()
    key = random.randint(10,73)
    output = []
    for character in input1:
        number = ord(character) - 96
        number = number + key
        output.append(number)
    output.insert(0,key)
    print (''.join(map(str, output)))
def decrypt():
    text = input ('What to decrypt?')
    key = int(text[0:2])
    text = text[2:]
    n=2
    text = text
    text = [text[i:i+n] for i in range(0, len(text), n)]
    text = map(int,text)
    text = [x - key for x in text]
    text = ''.join(map(str,text))
    text = int(text)
    print (text)
    for character in str(text):
        output = []
        character = int((character+96))
        number = str(chr(character))
        output.append(number)
    print (''.join(map(str, output)))
When I run the decryptor with the output from the encryption output, I get "TypeError: Can't convert 'int' object to str implicitly."
As you can see, I've added some redundancies to help try to fix things but nothing's working. I ran it with different code (can't remember what), but all that one kept outputting was something like "generatorobject at ."
I'm really lost and I could use some pointers guys, please and thank you.
EDIT: The problem arises on line 27.
EDIT 2: Replaced "character = int((character+96))" with "character = int(character)+96", now the problem is that it only prints (and as I can only assume) only appends the last letter of the decrypted message.
EDIT 2 SOLVED: output = [] was in the for loop, thus resetting it every time. Problem solved, thank you everyone!
Full traceback would help, but it looks like character = int(character)+96 is what you want on line 27.
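For comparison, here is one way the round trip could be written once both fixes from the edits are in place (converting to int before adding 96, and creating `output` before the loop). This is only a sketch in Python 3: it takes the text and key as parameters instead of calling input(), the optional `key` argument is added here for testing, and it assumes the message contains only the letters a-z, so that every encoded number has exactly two digits.

```python
import random


def encrypt(text, key=None):
    """Encode each letter as (alphabet position + key), prefixed by the key."""
    if key is None:
        key = random.randint(10, 73)  # same two-digit key range as the question
    numbers = [ord(ch) - 96 + key for ch in text.lower()]
    return str(key) + "".join(str(n) for n in numbers)


def decrypt(ciphertext):
    """Reverse encrypt(): read the two-digit key, then decode two digits at a time."""
    key = int(ciphertext[:2])
    body = ciphertext[2:]
    pairs = [body[i:i + 2] for i in range(0, len(body), 2)]
    output = []  # created once, before the loop
    for pair in pairs:
        output.append(chr(int(pair) - key + 96))
    return "".join(output)


print(decrypt(encrypt("hello")))
```

Decoding pair by pair avoids joining the shifted numbers back into one integer, which is where the original version lost track of where one letter ends and the next begins.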
Paul McGuire, the author of pyparsing, was kind enough to help a lot with a problem I'm trying to solve. We're on 1st down with a yard to goal, but I can't even punt it across the goal line. Confucius said if he gave a student 1/4 of the solution, and he did not return with the other 3/4s, then he would not teach that student again. So it is after almost a week of frustration and with great anxiety that I ask this...
How do I open an input file for pyparsing and print the output to another file?
Here is what I've got so far, but it's really all his work
from pyparsing import *
datafile = open( 'test.txt' )
# Backus-Naur Form
num = Word(nums)
accessionDate = Combine(num + "/" + num + "/" + num)("accDate")
accessionNumber = Combine("S" + num + "-" + num)("accNum")
patMedicalRecordNum = Combine(num + "/" + num + "-" + num + "-" + num)("patientNum")
gleason = Group("GLEASON" + Optional("SCORE:") + num("left") + "+" + num("right") + "=" + num("total"))
patientData = Group(accessionDate + accessionNumber + patMedicalRecordNum)
partMatch = patientData("patientData") | gleason("gleason")
lastPatientData = None
# PARSE ACTIONS
def patientRecord( datafile ):
    for match in partMatch.searchString(datafile):
        if match.patientData:
            lastPatientData = match
        elif match.gleason:
            if lastPatientData is None:
                print "bad!"
                continue
            print "{0.accDate}: {0.accNum} {0.patientNum} Gleason({1.left}+{1.right}={1.total})".format(
                lastPatientData.patientData, match.gleason
            )
patientData.setParseAction(lastPatientData)
# MAIN PROGRAM
if __name__=="__main__":
    patientRecord()
It looks like you need to call datafile.read() in order to read the contents of the file. Right now you are trying to call searchString on the file object itself, not the text in the file. You should really look at the Python tutorial (particularly this section) to get up to speed on how to read files, etc.
It seems like you need some help putting it together. The advice of @BrenBarn is spot-on: work with a simple problem before you put it all together. I can help by giving you a minimal example of what you are trying to do, with a much simpler grammar. You can use this as a template to learn how to read/write a file in Python. Consider the input text file data.txt:
cat 3
dog 5
foo 7
Let's parse this file and output the results. To have some fun, let's multiply the second column by 2:
from pyparsing import *
# Read the input data
filename = "data.txt"
FIN = open(filename)
TEXT = FIN.read()
# Define a simple grammar for the text, multiply the second col by 2
digits = Word(nums)
digits.setParseAction(lambda x:int(x[0]) * 2)
blocks = Group(Word(alphas) + digits)
grammar = OneOrMore(blocks)
# Parse the results
result = grammar.parseString( TEXT )
# This gives a list of lists
# [['cat', 6], ['dog', 10], ['foo', 14]]
# Open up a new file for the output
filename2 = "data2.txt"
FOUT = open(filename2,'w')
# Walk through the results and write to the file
for item in result:
    print item
    FOUT.write("%s %i\n" % (item[0],item[1]))
FOUT.close()
This gives in data2.txt:
cat 6
dog 10
foo 14
Break each piece down until you understand it. From here, you can slowly adapt this minimal example to your more complex problem above. It's OK to read the file in (as long as it is relatively small) since Paul himself notes:
parseFile is really just a simple shortcut around parseString, pretty
much the equivalent of expr.parseString(open(filename).read()).
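The read-then-write pattern in this answer does not depend on pyparsing itself. As a rough sketch of just the file handling (Python 3 syntax), the version below swaps the grammar for a plain `str.split`, which is a deliberate simplification and not what the answer does. The function name `double_second_column` is invented for this example.

```python
def double_second_column(in_path, out_path):
    """Read 'name number' lines and write them back with the number doubled."""
    # "with" closes both files automatically, even if an error occurs.
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            name, value = fields[0], int(fields[1])
            fout.write("%s %i\n" % (name, value * 2))
```

Calling `double_second_column("data.txt", "data2.txt")` on the sample input above should write the same three lines that the pyparsing version produces.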