How to extract and trim the fasta sequence using biopython - python

Hello everybody, I am new to Python and struggling with a small task using Biopython. I have two files: one containing a list of IDs and an associated number, e.g.
id.txt
tr_F6LMO6_F6LMO6_9LE 25
tr_F6ISE0_F6ISE0_9LE 17
tr_F6HSF4_F6HSF4_9LE 27
tr_F6PLK9_F6PLK9_9LE 19
tr_F6HOT8_F6HOT8_9LE 29
The second file contains a large set of FASTA sequences, e.g. below:
fasta_db.fasta
>tr|F6LMO6|F6LMO6_9LEHG Transporter
MLAPETRRKRLFSLIFLCTILTTRDLLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTN
PANLGLSSKKELEFGVSLPYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPIT
EKLTYGGGVYVPGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV
>tr|F6ISE0|F6ISE0_9LEHG peptidase domain protein OMat str.
MPILKVAFVSFVLLVFSLPSFAEEKTDFDGVRKAVVQIKVYSQAINPYSPWTTDGVRASS
GTGFLIGKKRILTNAHVVSNAKFIQVQRYNQTEWYRVKILFIAHDCDLAILEAEDGQFYK
>tr|F6HSF4|F6HSF4_9LEHG hypothetical protein,
MNLRSYIREIQVGLLCILVFLMSLYLLYFESKSRGASVKEILGNVSFRYKTAQRKFPDRM
LWEDLEQGMSVFDKDSVRTDEASEAVVHLNSGTQIELDPQSMVVLQLKENREILHLGEGS
>tr|F6PLK9|F6PLK9_9LEHG Uncharacterized protein mano str.
MRKITGSYSKISLLTLLFLIGFTVLQSETNSFSLSSFTLRDLRLQKSESGNNFIELSPRD
RKQGGELFFDFEEDEASNLQDKTGGYRVLSSSYLVDSAQAHTGKRSARFAGKRSGIKISG
I want to match the IDs from the first file against the second file and print the matched sequences to a new file after removing the leading residues (positions 1 to 25 in this example).
Example output (when the ID matches, the associated value from the first file, 25 here, gives the number of amino acids removed from the start):
fasta_pruned.fasta
>tr|F6LMO6|F6LMO6_9LEHG Transporter
LLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTNPANLGLSSKKELEFGVSL
PYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPITEKLTYGGGVYV
PGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV
The Biopython cookbook was way above my head, being new to Python programming. Thanks for any help you can give.
I tried and messed it up. Here it is:
from Bio import SeqIO
from Bio import Seq

f1 = open('fasta_pruned.fasta','w')
lengthdict = dict()
with open("seqid_len.txt") as seqlengths:
    for line in seqlengths:
        split_IDlength = line.strip().split(' ')
        lengthdict[split_IDlength[0]] = split_IDlength[1]
with open("species.fasta","rU") as spe:
    for record in SeqIO.parse(spe,"fasta"):
        if record[0] == '>':
            split_header = line.split('|')
            accession_ID = split_header[1]
            if accession_ID in lengthdict:
                f1.write(str(seq_record.id) + "\n")
                f1.write(str(seq_record_seq[split_IDlength[1]-1:]))
f1.close()

Your code has almost everything except for a couple of small things which prevent it from giving the desired output:
Your file id.txt has two spaces between the id and the location. You take the 2nd element which would be empty in this case.
When the file is read, each field is interpreted as a string, but you want the position to be an integer:
lengthdict[split_IDlength[0]] = int(split_IDlength[-1])
Your IDs are very similar but not identical; the only identical part is the six-character accession, which can be used to map the two files (double-check that before you assume it works). Having identical keys makes mapping much easier.
f1 = open('fasta_pruned.fasta', 'w')
fasta = dict()
with open("species.fasta", "rU") as spe:
    for record in SeqIO.parse(spe, "fasta"):
        fasta[record.id.split('|')[1]] = record

lengthdict = dict()
with open("seqid_len.txt") as seqlengths:
    for line in seqlengths:
        split_IDlength = line.strip().split(' ')
        lengthdict[split_IDlength[0].split('_')[1]] = int(split_IDlength[-1])

for k, v in lengthdict.items():
    if fasta.get(k) is None:
        continue
    print('>' + k)
    print(fasta[k].seq[v:])
    f1.write('>{}\n'.format(k))
    f1.write(str(fasta[k].seq[v:]) + '\n')
f1.close()
Output:
>F6LMO6
LLSVGIFQPSHNARYGGMGGTNLAIGGSPMDIGTNPANLGLSSKKELEFGVSLPYIRSVYTDKLQDPDPNLAYTNSQNYNVLAPLPYIAIRIPITEKLTYGGGVYVPGGGNGNVSELNRATPNGQTFQNWSGLNISGPIGDSRRIKESYSSTFYV
>F6ISE0
LPSFAEEKTDFDGVRKAVVQIKVYSQAINPYSPWTTDGVRASSGTGFLIGKKRILTNAHVVSNAKFIQVQRYNQTEWYRVKILFIAHDCDLAILEAEDGQFYK
>F6HSF4
YFESKSRGASVKEILGNVSFRYKTAQRKFPDRMLWEDLEQGMSVFDKDSVRTDEASEAVVHLNSGTQIELDPQSMVVLQLKENREILHLGEGS
>F6PLK9
IGFTVLQSETNSFSLSSFTLRDLRLQKSESGNNFIELSPRDRKQGGELFFDFEEDEASNLQDKTGGYRVLSSSYLVDSAQAHTGKRSARFAGKRSGIKISG

Related

Remove specific items (0 values and values multiplied by *0) from a large text file and write it to a new text file using Python

I am a basic Python user and I have searched multiple platforms for how to delete specific values from a large text file, but I haven't found anything similar to what I want to do. I have a large file (out.txt) and I want to remove all the 0 values and all values multiplied by 0 (e.g. 75*0) from it. After removing those values I want to write the result to a new text file (out2.txt). Suggestions please. Thanks!
I have tried this code:
content = open('out.txt', 'r').readlines()
content_set = set(content)
cleandata = open('clean.txt', 'w')
for line in content_set:
    cleandata.remove(0)
I keep getting this error:
cleandata.remove(0)
AttributeError: '_io.TextIOWrapper' object has no attribute 'remove'
DATA FILE out.txt
75*0 78.8502 45.9301 13358*0 10.7678 0 23.9901 43.8503 77*0 1.3757 36.9888 15.0398 76*0 8.19519 0 4.11938 21.4933 23.832 76*0 34.7566
15.5595 21.0239 0 47.1607 76*0 14.9065 52.916 51.7825 13358*0 62.4689 22.8217 15.68 77*0 12.8943 0 32.1276 14.1273 76*0 39.6095
70.8503 72.8765 45.7607 76*0 12.5657 72.7567 58.0161 30.9 76*0 19.5879 648.696 111.501 13358*0 17.36 18.0555 85.0358 77*0 4.62265
55.7498 61.2049 76*0 762.354 8.34207 23.2367 16.0517 76*0 405.637 20.1265 8.17844 16.4698 76*0 107.228 35.1968 38.4117 13358*0
Try this:
with open('out.txt') as f:
    s = f.read()
s = ' '.join([i for i in s.split(' ') if i != '0' and '*0' not in i])
with open('out2.txt', 'w') as f:
    f.write(s)
Output:
78.8502 45.9301 10.7678 23.9901 43.8503 1.3757 36.9888 15.0398 8.19519 4.11938 21.4933 23.832 34.7566
15.5595 21.0239 47.1607 14.9065 52.916 51.7825 62.4689 22.8217 15.68 12.8943 32.1276 14.1273 39.6095
70.8503 72.8765 45.7607 12.5657 72.7567 58.0161 30.9 19.5879 648.696 111.501 17.36 18.0555 85.0358 4.62265
55.7498 61.2049 762.354 8.34207 23.2367 16.0517 405.637 20.1265 8.17844 16.4698 107.228 35.1968 38.4117
content = open("out.txt").read()
segments = content.split()
# iterate in reverse so deleting an element doesn't shift the
# indexes of the elements we have yet to visit
for segment in range(len(segments) - 1, -1, -1):
    if segments[segment] == "0" or segments[segment].endswith("*0"):
        del segments[segment]
clean = open("clean.txt", "w")
clean.write(" ".join(segments))
clean.close()
What this does is take all of the content of out.txt and split() it on whitespace (no argument means any whitespace). Then it loops over the segments, checking whether each one is 0 or ends with *0, and if so deletes it from segments. At the end, it creates clean.txt, writes all of the segments separated by spaces, and closes clean.txt.
The only problem I noticed is that when it writes to clean.txt, the values are separated by single spaces instead of their original whitespace. One way to fix this is to store the whitespace after each number, and when a segment is 0 or contains *0, remove the segment and its associated whitespace.
Try it and tell me in the comments if it works!
This should work:
content = open('out.txt', 'r').readlines()
cleandata = []
for line in content:
    line = {i: None for i in line.replace("\n", "").split()}
    for value in line.copy():
        if value == "0" or value.endswith("*0"):
            line.pop(value)
    cleandata.append(" ".join(line) + "\n")
open('clean.txt', 'w').writelines(cleandata)
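All three answers share the same core idea: split on whitespace and filter out the unwanted tokens. Packaged as a reusable function that also preserves the line structure (a sketch; the token tests mirror the answers above):

```python
def drop_zero_tokens(text):
    """Remove tokens that are exactly '0' or contain '*0' (e.g. '75*0'),
    keeping the original line breaks."""
    return "\n".join(
        " ".join(tok for tok in line.split() if tok != "0" and "*0" not in tok)
        for line in text.splitlines()
    )
```

Reading out.txt, passing its contents through this function, and writing the result to out2.txt reproduces the output shown above.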

Python - Match patterns, print pattern and n lines after it

I have a file like this (with +10000 sequences, +98000 lines):
>DILT_0000000001-mRNA-1
MKVVKICSKLRKFIESRKDAVLPEQEEVLADLWAFEGISEFQMERFAKAAQCFQHQYELA
IKANLTEHASRSLENLGRARARLYDYQGALDAWTKRLDYEIKGIDKAWLHHEIGRAYLEL
NQYEEAIDHAATARDVADREADMEWDLNATVLIAQAHFYAGNLEEAKVYFEAAQNAAFRK
GFFKAESVLAEAIAEVDSEIRREEAKQERVYTKHSVLFNEFSQRAVWSEEYSEELHLFPF
AVVMLRCVLARQCTVHLQFRSCYNL
>DILT_0000000101-mRNA-1
MSCRRLSMNPGEALIKESSAPSRENLLKPYFDEDRCKFRHLTAEQFSDIWSHFDLDGVNE
LRFILRVPASQQAGTGLRFFGYISTEVYVHKTVKVSYIGFRKKNNSRALRRWNVNKKCSN
AVQMCGTSQLLAIVGPHTQPLTNKLCHTDYLPLSANFA
>DILT_0001999301-mRNA-1
LEHGIQPDGQMPSDKTIGGGDDSFQTFFSETGAGKHVPRAVMVDLEPTVIGEYLCVLLTS
FILFRLISTNLGPNSQLASRTLLFAADKTTLFRLLGLLPWSLLKIAVQ
>DILT_0001999401-mRNA-1
MAENGEDANMPEEGKEGNTQDQGEHQQDVQSDEPNEADSGYSSAASSDVNSQTIPITVIL
PNREAVNLSFDPNISVSELQERLNGPGITRLNENLFFTYSGKQLDPNKTLLDYKVQKSST
LYVHETPTALPKSAPNAKEEGVVPSNCLIHSGSRMDENRCLKEYQLTQNSVIFVHRPTAN
TAVQNREEKTSSLEVTVTIRETGNQLHLPINPHXXXXTVEMHVAPGVTVGDLNRKIAIKQ
All the lines starting with '>' are IDs; the lines that follow are the sequence for that ID.
I also have a file with the IDs of the sequences I want, like:
DILT_0000000001-mRNA-1
DILT_0000000101-mRNA-1
DILT_0000000201-mRNA-1
DILT_0000000301-mRNA-1
DILT_0000000401-mRNA-1
DILT_0000000501-mRNA-1
DILT_0000000601-mRNA-1
DILT_0000000701-mRNA-1
DILT_0000000801-mRNA-1
DILT_0000000901-mRNA-1
I want to write a script to match the IDs and copy the sequences of these IDs, but I'm just getting the IDs, without the sequences.
seqs = open('WBPS10.protein.fa').readlines()
ids = open('ids.txt').readlines()
for line in ids:
    for record in seqs:
        if line == record[1:]:
            print record
I don't know what to write to get the 'n' lines after the ID, because sometimes it's 2 lines, other sequences have more as you can see in my example.
The thing is, I'm trying to do it without using Biopython, which would be a lot easier. I just want to learn other ways.
seqs_by_ids = {}
with open('WBPS10.protein.fa', 'r') as read_file:
    for line in read_file.readlines():
        if line.startswith('>'):
            current_key = line[1:].strip()
            seqs_by_ids[current_key] = ''
        else:
            seqs_by_ids[current_key] += line.strip()

ids = set([line.strip() for line in open('ids.txt').readlines()])
for id in ids:
    if id in seqs_by_ids:
        print(id)
        print('\t{}'.format(seqs_by_ids[id]))
output:
DILT_0000000001-mRNA-1
MKVVKICSKLRKFIESRKDAVLPEQEEVLADLWAFEGISEFQMERFAKAAQCFQHQYELAIKANLTEHASRSLENLGRARARLYDYQGALDAWTKRLDYEIKGIDKAWLHHEIGRAYLELNQYEEAIDHAATARDVADREADMEWDLNATVLIAQAHFYAGNLEEAKVYFEAAQNAAFRKGFFKAESVLAEAIAEVDSEIRREEAKQERVYTKHSVLFNEFSQRAVWSEEYSEELHLFPFAVVMLRCVLARQCTVHLQFRSCYNL
DILT_0000000101-mRNA-1
MSCRRLSMNPGEALIKESSAPSRENLLKPYFDEDRCKFRHLTAEQFSDIWSHFDLDGVNELRFILRVPASQQAGTGLRFFGYISTEVYVHKTVKVSYIGFRKKNNSRALRRWNVNKKCSNAVQMCGTSQLLAIVGPHTQPLTNKLCHTDYLPLSANFA
This should work for you. The if line == record[1:]: statement will not work if there are special characters in the string, e.g. \r\n. You are interested in finding the matching IDs only, and the code below will do that.
Code sample
seqs = open('WBPS10.protein.fa').readlines()
ids = open('ids.txt').readlines()
for line in ids:
    for record in seqs:
        if line in record:
            print record
output:
>DILT_0000000001-mRNA-1
>DILT_0000000101-mRNA-1
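Since the number of sequence lines per record varies, a generator that accumulates lines until the next '>' header sidesteps the "n lines after" problem entirely. This is a sketch along the same lines as the accepted answer, just streaming instead of building one big dictionary:

```python
def fasta_records(lines):
    """Yield (id, sequence) pairs from FASTA-formatted lines,
    however many sequence lines each record spans."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith('>'):
            if seq_id is not None:
                yield seq_id, ''.join(chunks)
            seq_id, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if seq_id is not None:  # flush the final record
        yield seq_id, ''.join(chunks)
```

With this, filtering is one loop: build a set of wanted IDs, then print only the records whose ID is in the set. Because it reads line by line, it never holds the whole 98,000-line file in memory.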

python newbie - where is my if/else wrong?

Complete beginner so I'm sorry if this is obvious!
I have a file whose lines are either name | +/- or IG_name | 0, in a long list like so:
S1 +
IG_1 0
S2 -
IG_S3 0
S3 +
S4 -
dnaA +
IG_dnaA 0
Everything which starts with IG_ has a corresponding name. I want to add the + or - to the IG_name. e.g. IG_S3 is + like S3 is.
The information is gene names and strand information, IG = intergenic region. Basically I want to know which strand the intergenic region is on.
What I think I want:
open file
for every line, if the line starts with IG_*
find the line with *
print("IG_" and the line it found)
else
print line
What I have:
import re
import sys

with open(sys.argv[2]) as geneInfo:
    with open(sys.argv[1]) as origin:
        for line in origin:
            if line.startswith("IG_"):
                name = line.split("_")[1]
                nname = name[:-3]
                for newline in geneInfo:
                    if re.match(nname, newline):
                        print("IG_"+newline)
            else:
                print(line)
where origin is the mixed list and geneInfo has only the names not IG_names.
With this code I end up with a list containing only the else statements.
S1 +
S2 -
S3 +
S4 -
dnaA +
My problem is that I don't know what is wrong to search so I can (attempt) to fix it!
Below is some step-by-step annotated code that hopefully does what you want (though instead of using print I have aggregated the results into a list so you can actually make use of it). I'm not quite sure what happened with your existing code (especially how you're processing two files?)
s_dict = {}
ig_list = []
with open('genes.txt', 'r') as infile:  # Simulating reading the file you pass in sys.argv
    for line in infile:
        if line.startswith('IG_'):
            ig_list.append(line.split()[0])  # Collect all our IG values for later
        else:
            s_name, value = line.split()  # Separate out the S value and its operator
            s_dict[s_name] = value.strip()  # Add to dictionary to map S to operator

# Now you can go back through your list of IG values and append the appropriate operator
pulled_together = []
for item in ig_list:
    s_value = item.split('_')[1]
    # The following will look for the operator mapped to the S value. If it is
    # not found, it will instead give you 'Not found'
    corresponding_operator = s_dict.get(s_value, 'Not found')
    pulled_together.append([item, corresponding_operator])

print('List structure')
print(pulled_together)
print('\n')
print('Printout of each item in list')
for item in pulled_together:
    print(item[0] + '\t' + item[1])
nname = name[:-3]
Python's slicing is very powerful, but it can be tricky to understand correctly.
When you write [:-3], you take everything except the last three items. The catch is that if you have fewer than three elements, it does not raise an error but returns an empty sequence.
I think this is where things go wrong: as there are not many elements per line, it returns an empty string. If you could tell us exactly what you want it to return there, with an example or something, it would help a lot, as I don't really know what you're trying to get with your slicing.
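A quick demonstration of this slicing behaviour (plain Python semantics, nothing specific to the question's data):

```python
word = "dnaA +"
print(word[:-3])   # 'dna': everything except the last three characters

short = "S1"
print(short[:-3])  # '': fewer than three characters yields an empty string
```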
Does this do what you want?
from __future__ import print_function
import sys
# Read and store all the gene info lines, keyed by name
gene_info = dict()
with open(sys.argv[2]) as gene_info_file:
for line in gene_info_file:
tokens = line.split()
name = tokens[0].strip()
gene_info[name] = line
# Read the other file and lookup the names
with open(sys.argv[1]) as origin_file:
for line in origin_file:
if line.startswith("IG_"):
name = line.split("_")[1]
nname = name[:-3].strip()
if nname in gene_info:
lookup_line = gene_info[nname]
print("IG_" + lookup_line)
else:
pass # what do you want to do in this case?
else:
print(line)

How can I organize case-insensitive text and the material following it?

I'm very new to Python so it'd be very appreciated if this could be explained as in-depth as possible.
If I have some text like this on a text file:
matthew : 60 kg
MaTtHew : 5 feet
mAttheW : 20 years old
maTThEw : student
MaTTHEW : dog owner
How can I make a piece of code that can write something like...
Matthew : 60 kg , 5 feet , 20 years old , student , dog owner
...by only gathering information from the text file?
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin

def test_data():
    # This is obviously the source data as a multi-line string constant.
    source = \
"""
matthew : 60 kg
MaTtHew : 5 feet
mAttheW : 20 years old
maTThEw : student
MaTTHEW : dog owner
bob : 70 kg
BoB : 6 ft
"""
    # Split on newline. This will return a list of lines like ["matthew : 60 kg", "MaTtHew : 5 feet", etc]
    return source.split("\n")

def append_pair(d, p):
    k, v = p
    if k in d:
        d[k] = d[k] + [v]
    else:
        d[k] = [v]
    return d

if __name__ == "__main__":
    # Do a list comprehension. For every line in the test data, split by ":", strip off leading/trailing whitespace,
    # and convert to lowercase. This will yield lists of lists.
    # This is mostly a list of key/value size-2-lists
    pairs = [[x.strip().lower() for x in line.split(":", 2)] for line in test_data()]
    # Filter the lists in the main list that do not have a size of 2. This will yield a list of key/value pairs like:
    # [["matthew", "60 kg"], ["matthew", "5 feet"], etc]
    cleaned_pairs = [p for p in pairs if len(p) == 2]
    # This will iterate the list of key/value pairs and send each to append_pair, which will either append to
    # an existing key, or create a new key.
    d = reduce(append_pair, cleaned_pairs, {})
    # Now, just print out the resulting dictionary.
    for k, v in d.items():
        print("{}: {}".format(k, ", ".join(v)))
import sys

# There's a number of assumptions I have to make based on your description.
# I'll try to point those out.

# Should be self-explanatory. Something like: "C:\Users\yourname\yourfile"
path_to_file = "put_your_path_here"

# open a file for reading. The 'r' indicates read-only
infile = open(path_to_file, 'r')
# reads in the file line by line and strips the "invisible" endline character
readLines = [line.strip() for line in infile]
# make sure we close the file
infile.close()

# An associative array. Does not use normal numerical indexing.
# Instead, in our case, we'll use a string (the name) to index into it.
# At a given name index (AKA key) we'll save the attributes about that person.
names = dict()

# iterate through each line we read in from the file;
# each line in this loop will be stored in the variable
# item for that iteration.
for item in readLines:
    # assuming that your file has a strict format:
    # name : attribute
    index = item.find(':')
    # if there was a ':' found then continue
    if index != -1:
        # grab only the name of the person and convert the string to all lowercase
        name = item[0:index].lower()
        # see if our associative array already has that person
        if name in names:
            # if that person has already been indexed, add the new attribute.
            # this assumes there are no duplicates, so I don't check for them.
            names[name].append(item[index+1:len(item)])
        else:
            # if that person was not in the array then add them.
            # we're adding a list at that index to store their attributes.
            names[name] = list()
            # append the attribute to the list.
            # the len() function tells us how long the string 'item' is;
            # offsetting the index by 1 so we don't capture the ':'
            names[name].append(item[index+1:len(item)])
    else:
        # there was no ':' found in the line so skip it
        pass

# iterate through the keys (names) we found.
for name in names:
    # write it to stdout. I am using this because the "print" built-in to Python
    # always ends with a newline. This way I can print the name and then
    # iterate through the attributes associated with them
    sys.stdout.write(name + " : ")
    # iterate through attributes
    for attribute in names[name]:
        sys.stdout.write(attribute + ", ")
    # end each person with a new line.
    sys.stdout.write('\r\n')
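Both answers implement the same grouping by hand; `collections.defaultdict` from the standard library does that bookkeeping for you. A compact sketch (the `Name : attribute` line format is assumed from the question):

```python
from collections import defaultdict

def group_attributes(lines):
    """Group 'Name : attribute' lines by lowercased name."""
    people = defaultdict(list)
    for line in lines:
        name, sep, attr = line.partition(':')
        if sep:  # skip lines without a ':'
            people[name.strip().lower()].append(attr.strip())
    return people
```

`str.partition` splits on the first ':' only, so attributes containing further colons survive intact, and `defaultdict(list)` removes the need for the "is this key already present?" branch.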

Compare configuration data text with a default data text

I am in the process of understanding how to compare data from two text files and print the data that does not match into a new document or output.
The Program Goal:
Allow the user to compare a file containing many lines of data against a default file that holds the correct values.
Compare multiple sets of data that share the same parameters against a single default list of those parameters.
Example:
Lets say I have the following text document that has these parameters and data:
Let's call it Config.txt:
<231931844151>
Bird = 3
Cat = 4
Dog = 5
Bat = 10
Tiger = 11
Fish = 16
<92103884812>
Bird = 4
Cat = 40
Dog = 10
Bat = Null
Tiger = 19
Fish = 24
etc. etc.
Let's call this my configuration data. Now I need to make sure that the values of these parameters inside my Config data file are correct.
So I have a default data file that has the correct values for these parameters/variables. Let's call it Default.txt:
<Correct Parameters>
Bird = 3
Cat = 40
Dog = 10
Bat = 10
Tiger = 19
Fish = 234
This text file is the default configuration or the correct configuration for the data.
Now I want to compare these two files and print out the data that is incorrect.
So, in theory, if I were to compare these two text documents, I should get the following output. Let's call this Output.txt:
<231931844151>
Cat = 4
Dog = 5
Tiger = 11
Fish = 16
<92103884812>
Bird = 4
Bat = Null
Fish = 24
etc. etc.
These are the parameters that are incorrect or do not match. In this case we see that for <231931844151> the parameters Cat, Dog, Tiger, and Fish did not match the default text file, so those get printed. For <92103884812>, Bird, Bat, and Fish do not match the default parameters, so those get printed.
So that's the gist of it for now.
Code:
This is the approach I am currently trying; however, I'm not sure how to compare a data file that has multiple sets of lines with the same parameters against a default data file.
with open("Config.txt") as f:
    dataConfig = f.read().splitlines()
with open("Default.txt") as d:
    dataDefault = d.read().splitlines()

def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

defdict = make_dict(dataDefault)
outdict = make_dict(dataConfig)

# Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print allkeys

difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        # key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)

for line in difflines:
    print line
Summary:
I want to compare two text documents that contain data/parameters. One document has several sets of data with the same parameters, while the other has just one set with the correct values. I need to compare those parameters and print out the ones that do not match the default. How can I go about doing this in Python?
EDIT:
Okay, so thanks to @Maria's code I think I am almost there. Now I just need to figure out how to compare the dictionary to the list and print out the differences. Here's an example of what I am trying to do:
for i in range(len(setNames)):
    print setNames[i]
    for k in setData[i]:
        if k in dataDefault:
            print dataDefault
Obviously the print line is just there to see if it worked or not, but I'm not sure if this is the proper way to go about it.
Sample code for parsing the file into separate dictionaries. It works by finding the group separators (blank lines); setNames[i] is the name of the set of parameters in the dictionary at setData[i]. Alternatively, you could create an object with a string name member and a dictionary data member and keep a list of those. Doing the comparisons and outputting them how you want is up to you; this just regurgitates the input file to the command line in a slightly different format.
# The function you wrote
def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

# open the file and read the lines into a list of strings
with open("Config.txt", "rb") as f:
    dataConfig = f.read().splitlines()

# get rid of trailing '', as they cause problems and are unnecessary
while (len(dataConfig) > 0) and (dataConfig[len(dataConfig) - 1] == ''):
    dataConfig.pop()

# find the indexes of all the ''. They amount to one index past the end of each set of parameters
setEnds = []
index = 0
while '' in dataConfig[index:]:
    setEnds.append(dataConfig[index:].index('') + index)
    index = setEnds[len(setEnds) - 1] + 1

# separate out your input into separate dictionaries, and keep track of the name of each dictionary
setNames = []
setData = []
i = 0
j = 0
while j < len(setEnds):
    setNames.append(dataConfig[i])
    setData.append(make_dict(dataConfig[i+1:setEnds[j]]))
    i = setEnds[j] + 1
    j += 1

# handle the last index to the end of the list. Alternatively you could add len(dataConfig) to the end of setEnds and you wouldn't need this
if len(setEnds) > 0:
    setNames.append(dataConfig[i])
    setData.append(make_dict(dataConfig[i+1:]))

# regurgitate the input to prove it worked the way you wanted.
for i in range(len(setNames)):
    print setNames[i]
    for k in setData[i]:
        print "\t" + k + ": " + setData[i][k]
    print ""
Why not just use those dicts and loop through them to compare?
for keys in outdict:
    if defdict.get(keys):
        print outdict.get(keys)
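Putting the pieces together: once setNames/setData exist and the defaults have been run through the same make_dict, the comparison from the question's EDIT is a single loop. A sketch, assuming each dictionary maps a parameter name to its whole line (as make_dict produces):

```python
def diff_against_default(set_names, set_data, defaults):
    """Return {set_name: [mismatched lines]} for parameters whose line
    differs from the default's line for the same parameter."""
    mismatches = {}
    for name, params in zip(set_names, set_data):
        bad = [line for key, line in params.items()
               if key in defaults and defaults[key] != line]
        if bad:
            mismatches[name] = bad
    return mismatches
```

Printing each set name followed by its mismatched lines then reproduces the Output.txt format sketched in the question.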