Class constant dictionary in Python - python

I'm building a module that has a class variable dictionary:
class CodonUsageTable:
    CODON_DICT={'TTT': 0, 'TTC': 0, 'TTA': 0, 'TTG': 0, 'CTT': 0,
                'CTC': 0, 'CTA': 0, 'CTG': 0, 'ATT': 0, 'ATC': 0,
                'ATA': 0, 'ATG': 0, 'GTT': 0, 'GTC': 0, 'GTA': 0,
                'GTG': 0, 'TAT': 0, 'TAC': 0, 'TAA': 0, 'TAG': 0,
                'CAT': 0, 'CAC': 0, 'CAA': 0, 'CAG': 0, 'AAT': 0,
                'AAC': 0, 'AAA': 0, 'AAG': 0, 'GAT': 0, 'GAC': 0,
                'GAA': 0, 'GAG': 0, 'TCT': 0, 'TCC': 0, 'TCA': 0,
                'TCG': 0, 'CCT': 0, 'CCC': 0, 'CCA': 0, 'CCG': 0,
                'ACT': 0, 'ACC': 0, 'ACA': 0, 'ACG': 0, 'GCT': 0,
                'GCC': 0, 'GCA': 0, 'GCG': 0, 'TGT': 0, 'TGC': 0,
                'TGA': 0, 'TGG': 0, 'CGT': 0, 'CGC': 0, 'CGA': 0,
                'CGG': 0, 'AGT': 0, 'AGC': 0, 'AGA': 0, 'AGG': 0,
                'GGT': 0, 'GGC': 0, 'GGA': 0, 'GGG': 0}
    #Other code
    def __init__(self,seqobj):
        '''Creates codon table for a given Bio.Seq object.
        The only argument is a Bio.Seq object with DNA.
        Currently assumes seq to be DNA, RNA support to be added later'''
        dnaseq=str(seqobj)
        self.usage_table=CodonUsageTable.CODON_DICT.deepcopy()#instance of table
The last line is supposed to copy the class dictionary so that instance data can be stored in the copy, but it raises:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./codon_usage.py", line 47, in __init__
    self.usage_table=CodonUsageTable.CODON_DICT.deepcopy()#instance of codon usage table
NameError: global name 'CODON_DICT' is not defined
The same error occurs with self.CODON_DICT, CODON_DICT or codon_usage.CodonUsageTable.CODON_DICT when used from __init__. Yet the dictionary is defined:
>>> import codon_usage
>>> codon_usage.CodonUsageTable.CODON_DICT
{'GCT': 0, 'GGA': 0, 'TTA': 0, 'GAT': 0, 'TTC': 0, 'TTG': 0, 'AGT': 0, 'GCG': 0, 'AGG': 0, 'GCC': 0, 'CGA': 0, 'GCA': 0, 'GGC': 0, 'GAG': 0, 'GAA': 0, 'TTT': 0, 'GAC': 0, 'TAT': 0, 'CGC': 0, 'TGT': 0, 'TCA': 0, 'GGG': 0, 'TCC': 0, 'ACG': 0, 'TCG': 0, 'TAG': 0, 'TAC': 0, 'TAA': 0, 'ACA': 0, 'TGG': 0, 'TCT': 0, 'TGA': 0, 'TGC': 0, 'CTG': 0, 'CTC': 0, 'CTA': 0, 'ATG': 0, 'ATA': 0, 'ATC': 0, 'AGA': 0, 'CTT': 0, 'ATT': 0, 'GGT': 0, 'AGC': 0, 'ACT': 0, 'CGT': 0, 'GTT': 0, 'CCT': 0, 'AAG': 0, 'CGG': 0, 'AAC': 0, 'CAT': 0, 'AAA': 0, 'CCC': 0, 'GTC': 0, 'CCA': 0, 'GTA': 0, 'CCG': 0, 'GTG': 0, 'ACC': 0, 'CAA': 0, 'CAC': 0, 'AAT': 0, 'CAG': 0}

The symptoms suggest that events went like this:
you wrote the file and saved it;
you started the Python shell and imported the module;
you found that CODON_DICT can't be accessed by its bare name, and fixed that in the file;
you tried the call again within the same Python shell and got the same error.
This happens because Python is still using the old version of the module, the one loaded at import time. The traceback nevertheless shows the line from the new file, because all Python keeps in memory is the bytecode with its metadata, so it has to read the source line from disk when an error occurs. To pick up your latest changes without restarting the shell, run:
>>> reload(codon_usage)
and try again. (In Python 3, reload lives in importlib, so the call becomes importlib.reload(codon_usage).)
(A side note: dict has no deepcopy method; that function comes from the copy module. dict.copy is enough here, though.)
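As a minimal sketch of the side note, assuming a trimmed-down four-codon table in place of the full dictionary (the counting loop is hypothetical and only there to show that instances do not share state), each instance can take its own copy of the class-level dictionary like this:

```python
class CodonUsageTable:
    # Shortened table for illustration; the real one lists all 64 codons.
    CODON_DICT = {'TTT': 0, 'TTC': 0, 'TTA': 0, 'TTG': 0}

    def __init__(self, seqobj):
        dnaseq = str(seqobj)
        # A shallow copy is enough because the values are plain integers.
        # copy.deepcopy(CodonUsageTable.CODON_DICT) from the copy module
        # would only be needed if the values were mutable objects.
        self.usage_table = CodonUsageTable.CODON_DICT.copy()
        # Hypothetical counting step: walk the sequence codon by codon.
        for i in range(0, len(dnaseq) - 2, 3):
            codon = dnaseq[i:i + 3]
            if codon in self.usage_table:
                self.usage_table[codon] += 1


table = CodonUsageTable('TTTTTCTTT')
print(table.usage_table)           # per-instance counts
print(CodonUsageTable.CODON_DICT)  # class-level template is left untouched
```

Because the copy is made inside __init__, every instance starts from the zeroed template and the class attribute itself is never modified.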

Related

How to transform this 2D list to a dictionary inside a dictionary?

I have been trying to transform the 2D list full_info_results, but I can't seem to find a good way to do it.
full_info_results = [
['*', 'G02', 'G05', 'G07', 'G08', 'G10'],
['P001', '1', '0', '-1', '503', '1'],
['P067', '1', '1', '0', '-1', '503'],
['P218', '0', '1', '1', '-1', '-1'],
['P101', '0', '0', '1', '1', '503'],
['P456', '1', '1', '-1', '1', '-1']
]
GXX = game_id
PXXX = player_id
1 = win
0, -1, 503 = these three numbers don't mean anything (they all count as "not a win").
For each player_id I want to record whether they won each particular game_id or not. For example, player P001 won G02 and G10. If they won, I want to record it in the dictionary as 1; if they did not win, as 0.
Expected results:
expected_result = {'P001':{'G02':1, 'G05':0, 'G07':0, 'G08':0, 'G10':1},
'P067':{'G02':1, 'G05':1, 'G07':0, 'G08':0, 'G10':0},
'P218':{'G02':0, 'G05':1, 'G07':1, 'G08':0, 'G10':0},
'P101':{'G02':0, 'G05':0, 'G07':1, 'G08':1, 'G10':0},
'P456':{'G02':1, 'G05':1, 'G07':0, 'G08':1, 'G10':0}}
I wonder what the proper way to do this transformation is, because I can hardly think of one right now. Thanks in advance.
You could use the combination of dict and zip to build your sub dictionaries:
import pprint
expected_result = {}
# remove first item from full_info_results, to build the sub dict's keys
keys = full_info_results.pop(0)[1:]
for result in full_info_results:
    # first item is the key, so remove it from the list
    key = result.pop(0)
    # convert '1' to 1 everything else to 0
    expected_result[key] = dict(
        zip(keys, [1 if r == '1' else 0 for r in result])
    )
pprint.pprint(expected_result)
Out:
{'P001': {'G02': 1, 'G05': 0, 'G07': 0, 'G08': 0, 'G10': 1},
'P067': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 0, 'G10': 0},
'P101': {'G02': 0, 'G05': 0, 'G07': 1, 'G08': 1, 'G10': 0},
'P218': {'G02': 0, 'G05': 1, 'G07': 1, 'G08': 0, 'G10': 0},
'P456': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 1, 'G10': 0}}
keys= full_info_results[0][1:]
d = { l[0]: {k:(1 if v == '1' else 0) for k,v in zip(keys, l[1:])} for l in full_info_results[1:]}
In [18]: d
Out[18]:
{'P001': {'G02': 1, 'G05': 0, 'G07': 0, 'G08': 0, 'G10': 1},
'P067': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 0, 'G10': 0},
'P218': {'G02': 0, 'G05': 1, 'G07': 1, 'G08': 0, 'G10': 0},
'P101': {'G02': 0, 'G05': 0, 'G07': 1, 'G08': 1, 'G10': 0},
'P456': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 1, 'G10': 0}}
Try this:
full_info_results = [
['*', 'G02', 'G05', 'G07', 'G08', 'G10'],
['P001', '1', '0', '-1', '503', '1'],
['P067', '1', '1', '0', '-1', '503'],
['P218', '0', '1', '1', '-1', '-1'],
['P101', '0', '0', '1', '1', '503'],
['P456', '1', '1', '-1', '1', '-1']
]
keys = [key for key in full_info_results[0] if key.startswith("G")]
expected_result = {}
for pi in full_info_results[1:]:
    expected_result[pi[0]] = dict(zip(keys, [1 if win == '1' else 0 for win in pi[1:]]))
print(expected_result)
Output:
{'P001': {'G02': 1, 'G05': 0, 'G07': 0, 'G08': 0, 'G10': 1},
'P067': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 0, 'G10': 0},
'P218': {'G02': 0, 'G05': 1, 'G07': 1, 'G08': 0, 'G10': 0},
'P101': {'G02': 0, 'G05': 0, 'G07': 1, 'G08': 1, 'G10': 0},
'P456': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 1, 'G10': 0}}
The following is rather concise, making heavy use of starred assignment:
(_, *keys), *data = full_info_results
{head: {k: int(v=="1") for k, v in zip(keys, tail)} for head, *tail in data}
# {'P001': {'G02': 1, 'G05': 0, 'G07': 0, 'G08': 0, 'G10': 1},
# 'P067': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 0, 'G10': 0},
# 'P101': {'G02': 0, 'G05': 0, 'G07': 1, 'G08': 1, 'G10': 0},
# 'P218': {'G02': 0, 'G05': 1, 'G07': 1, 'G08': 0, 'G10': 0},
# 'P456': {'G02': 1, 'G05': 1, 'G07': 0, 'G08': 1, 'G10': 0}}
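Note that the first answer above consumes full_info_results through pop(0), so the list is changed after the conversion. A minimal sketch of a non-mutating variant, wrapped in a hypothetical helper named wins_by_player (the name is not from the original answers) and run on a shortened copy of the table, could look like this:

```python
def wins_by_player(table):
    """Map each player id to {game_id: 1 if won else 0} without changing table."""
    header, *rows = table
    game_ids = header[1:]  # skip the '*' corner cell
    return {
        player: {game: int(cell == '1') for game, cell in zip(game_ids, cells)}
        for player, *cells in rows
    }


# Shortened sample: header row plus the first two players from the question.
full_info_results = [
    ['*', 'G02', 'G05', 'G07', 'G08', 'G10'],
    ['P001', '1', '0', '-1', '503', '1'],
    ['P067', '1', '1', '0', '-1', '503'],
]
result = wins_by_player(full_info_results)
print(result['P001'])
```

Starred unpacking builds new lists for the header and rows, so the input can be reused afterwards.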

Printing possible substring in set of 3 and matching with dictionary keys in python

I want to extract every possible substring of length 3 from a string, look each one up in dictionary.keys(), and store the matching substrings together with their dictionary values in a new dictionary.
input:
dict1={'000': 0, '001': 0, '010': 0, '011': 0, '100': 1, '101': 0, '110': 1, '111': 0}
str1=['010110100']
output:
sub_string= [010,101,011...]
new dict= {'010':0, '101':0, '011':0, '110':1......}
Try this:
[str1[0][i:i+3] for i in range(len(str1[0])-2) if str1[0][i:i+3] in dict1]
{str1[0][i:i+3] : dict1.get(str1[0][i:i+3]) for i in range(len(str1[0])-2) if str1[0][i:i+3] in dict1}
# ['010', '101', '011', '110', '101', '010', '100']
# {'010': 0, '101': 0, '011': 0, '110': 1, '100': 1}
You could do it like this:
dict1={'000': 0, '001': 0, '010': 0, '011': 0, '100': 1, '101': 0, '110': 1, '111': 0}
str1= '010110100'
sub_string = []
d = {}
for i in range(len(str1)-2):
    temp = str1[i:i+3]
    sub_string.append(temp)
    d[temp] = dict1.get(temp)
print(sub_string)
print(d)
['010', '101', '011', '110', '101', '010', '100']
{'010': 0, '101': 0, '011': 0, '110': 1, '100': 1}

Stripping single quotes from Python list

I have a python list that might look like this:
['22', '0', '0', '0', '1, 0, 0, 0, 0']
I would like for it to look like this:
[22, 0, 0, 0, 1, 0, 0, 0, 0]
Since the final element isn't an integer, I can't use map as suggested here. And using ast, as suggested here, doesn't get me all the way:
[22, 0, 0, 0, (1, 0, 0, 0, 0)]
Any advice would be greatly appreciated.
Just split on commas then flatten/map int:
>>> [int(x) for item in data for x in item.split(',')]
[22, 0, 0, 0, 1, 0, 0, 0, 0]
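Alternatively, join the elements into one string and split it again; int() ignores the surrounding spaces:
>>> l = ['22', '0', '0', '0', '1, 0, 0, 0, 0']
>>> k = ','.join(l)
>>> k
'22,0,0,0,1, 0, 0, 0, 0'
>>> k.split(',')
['22', '0', '0', '0', '1', ' 0', ' 0', ' 0', ' 0']
>>> [int(s) for s in k.split(',')]
[22, 0, 0, 0, 1, 0, 0, 0, 0]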
You can try:
x=['22', '0', '0', '0', '1, 0, 0, 0, 0']
y=eval(str(x).replace("'", ""))
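Because eval executes arbitrary code, a more cautious sketch of the same idea uses ast.literal_eval, which only accepts Python literals. This assumes every element contains nothing but plain integers, commas and spaces; other input would raise an error rather than be executed:

```python
import ast

x = ['22', '0', '0', '0', '1, 0, 0, 0, 0']
# Dropping the quotes turns the list's repr into one flat list literal,
# which literal_eval then parses without running any code.
y = ast.literal_eval(str(x).replace("'", ""))
print(y)
```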

Writing a Function Using pandas and Returning Data in Required Format

I have this total_of_each_error data frame:
    month    name       errors       count
0   January  ABCD       Big          1
1   January  ABCD       Monitoring   3
2   January  WORLD      Small        1
3   January  Channel    Big          2
4   January  Channel    Small        1
5   January  Channel    Monitoring   1
6   January  AVR        Monitoring   1
7   March    WORLD      Monitoring   2
8   April    Migration  Big          1
9   April    Migration  Monitoring   2
10  May      P&G        Small        1
11  May      P&G        Monitoring   1
12  May      ABCD       Monitoring   1
13  May      WORLD      Improvement  1
14  June     P&G        Monitoring   1
15  June     ABCD       Small        1
16  June     ABCD       Monitoring   1
I have written this function:
import pandas as pd
from itertools import product
def get_chartdata(df):
    months = df['month'].unique().tolist()
    no_of_errors = df['errors'].unique().tolist()
    name = df['name'].unique().tolist()
    cross_df = pd.DataFrame(list(product(months, name, no_of_errors)), columns=['month','name','errors'])
    # merge against the df argument rather than the global total_of_each_error
    merged_df = pd.merge(df,cross_df,how='outer', left_on=['month','name','errors'],
                         right_on=['month','name','errors']).drop_duplicates().fillna(0)
    pivot_df = merged_df.pivot_table(columns='month', index=['name','errors'], values='count', fill_value=0).reset_index()
    data = {}
    for index, row in pivot_df.iterrows():
        if (row['name']) not in data.keys():
            data[row['name']] = []
        data[row['name']].append({'name':row.values[1:2].tolist() , 'data': row.values[2:].tolist()})
    x_axis = {}
    for i in pivot_df['name'].unique():
        df1 = pivot_df[pivot_df['name'] == i]
        x_axisData = pivot_df.columns[2:].unique()
        x_axis[i] = {'categories': x_axisData.tolist()}
    return data, x_axis
print(get_chartdata(total_of_each_error)) prints the following:
({'P&G': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 0, 0, 0, 1]}, {'name': ['Monitoring'], 'data': [0, 0, 1, 0, 1]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'ABCD': [{'name': ['Big'], 'data': [0, 1, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 0, 1, 0, 0]}, {'name': ['Monitoring'], 'data': [0, 3, 1, 0, 1]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'WORLD': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 1, 0, 0, 0]}, {'name': ['Monitoring'], 'data': [0, 0, 0, 2, 0]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 1]}],
'Migration': [{'name': ['Big'], 'data': [1, 0, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 0, 0, 0, 0]}, {'name': ['Monitoring'], 'data': [2, 0, 0, 0, 0]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'Channel': [{'name': ['Big'], 'data': [0, 2, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 1, 0, 0, 0]}, {'name': ['Monitoring'], 'data': [0, 1, 0, 0, 0]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'AVR': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]}, {'name': ['Small'], 'data': [0, 0, 0, 0, 0]}, {'name': ['Monitoring'], 'data': [0, 1, 0, 0, 0]}, {'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}]},
{'P&G': {'categories': ['April', 'January', 'June', 'March', 'May']}, 'ABCD': {'categories': ['April', 'January', 'June', 'March', 'May']},
'WORLD': {'categories': ['April', 'January', 'June', 'March', 'May']},
'Migration': {'categories': ['April', 'January', 'June', 'March', 'May']},
'Channel': {'categories': ['April', 'January', 'June', 'March', 'May']},
'AVR': {'categories': ['April', 'January', 'June', 'March', 'May']}})
I am getting categories as ['April', 'January', 'June', 'March', 'May'], but I want it to be ['January', 'March', 'April', 'May', 'June'], and the data should also match with this order.
I am creating charts using Highcharts and passing the above data to Django template. An example chart is at https://jsfiddle.net/nbtejvau/9/
Edit: Added Highcharts tag.
Expected output:
data = {'WORLD': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{
'name': 'Big',
'data': [0, 0, 0, 0, 0] # Number of Bigs in those months
}, {
'name': 'Small',
'data': [1, 0, 0, 0, 0] # Number of Smalls in those months
}, {
'name': 'Monitoring',
'data': [0, 2, 0, 0, 0] # Number of Monitorings in those months
}, {
'name': 'Improvement',
'data': [0, 0, 0, 1, 0] # Number of Improvements in those months
}]
},
'P&G': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{
'name': 'Big',
'data': [0, 0, 0, 0, 0]
}, {
'name': 'Small',
'data': [0, 0, 0, 1, 0]
}, {
'name': 'Monitoring',
'data': [0, 2, 0, 0, 0]
}, {
'name': 'Improvement',
'data': [0, 0, 0, 1, 0]
}]
}
}
Similar output expected for the remaining in total_of_each_error['name']
You can add swapping logic on top of your final dataset:
data = {'P&G': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 0, 0, 0, 1]},
{'name': ['Monitoring'], 'data': [0, 0, 1, 0, 1]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'ABCD': [{'name': ['Big'], 'data': [0, 1, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 0, 1, 0, 0]},
{'name': ['Monitoring'], 'data': [0, 3, 1, 0, 1]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'WORLD': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 1, 0, 0, 0]},
{'name': ['Monitoring'], 'data': [0, 0, 0, 2, 0]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 1]}],
'Migration': [{'name': ['Big'], 'data': [1, 0, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 0, 0, 0, 0]},
{'name': ['Monitoring'], 'data': [2, 0, 0, 0, 0]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'Channel': [{'name': ['Big'], 'data': [0, 2, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 1, 0, 0, 0]},
{'name': ['Monitoring'], 'data': [0, 1, 0, 0, 0]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}],
'AVR': [{'name': ['Big'], 'data': [0, 0, 0, 0, 0]},
{'name': ['Small'], 'data': [0, 0, 0, 0, 0]},
{'name': ['Monitoring'], 'data': [0, 1, 0, 0, 0]},
{'name': ['Improvement'], 'data': [0, 0, 0, 0, 0]}]}
x_axis = {'P&G': {'categories': ['April', 'January', 'June', 'March', 'May']},
'ABCD': {'categories': ['April', 'January', 'June', 'March', 'May']},
'WORLD': {'categories': ['April', 'January', 'June', 'March', 'May']},
'Migration': {'categories': ['April', 'January', 'June', 'March', 'May']},
'Channel': {'categories': ['April', 'January', 'June', 'March', 'May']},
'AVR': {'categories': ['April', 'January', 'June', 'March', 'May']}}
target_month_order = ['January', 'March', 'April', 'May', 'June']
final_data = dict()
for name in x_axis.keys():
    # Modifying data1
    final_data[name] = dict()
    final_data[name]['categories'] = target_month_order
    # Modifying data2
    final_data[name]['series'] = list()
    print('Swapping - ', name)
    actual_month_order = x_axis[name]['categories']
    swap_index = [actual_month_order.index(month) for month in target_month_order]
    _tmp = data[name]
    for _val in _tmp:
        _new_list = []
        for swap_idx in swap_index:
            _new_list.append(_val['data'][swap_idx])
        # print(list(zip(actual_month_order, _val['data'])))
        # print(list(zip(target_month_order, _new_list)))
        final_data[name]['series'].append({'name': _val['name'], 'data': _new_list})
    print('--')
    # print(swap_index)
    # print(final_data[name]['series'])
print(' *-*' * 20)
import pprint
pprint.pprint(final_data)
Output
*-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-* *-*
{'ABCD': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [1, 0, 0, 0, 0], 'name': ['Big']},
{'data': [0, 0, 0, 0, 1], 'name': ['Small']},
{'data': [3, 0, 0, 1, 1], 'name': ['Monitoring']},
{'data': [0, 0, 0, 0, 0], 'name': ['Improvement']}]},
'AVR': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [0, 0, 0, 0, 0], 'name': ['Big']},
{'data': [0, 0, 0, 0, 0], 'name': ['Small']},
{'data': [1, 0, 0, 0, 0], 'name': ['Monitoring']},
{'data': [0, 0, 0, 0, 0], 'name': ['Improvement']}]},
'Channel': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [2, 0, 0, 0, 0], 'name': ['Big']},
{'data': [1, 0, 0, 0, 0], 'name': ['Small']},
{'data': [1, 0, 0, 0, 0], 'name': ['Monitoring']},
{'data': [0, 0, 0, 0, 0], 'name': ['Improvement']}]},
'Migration': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [0, 0, 1, 0, 0], 'name': ['Big']},
{'data': [0, 0, 0, 0, 0], 'name': ['Small']},
{'data': [0, 0, 2, 0, 0], 'name': ['Monitoring']},
{'data': [0, 0, 0, 0, 0], 'name': ['Improvement']}]},
'P&G': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [0, 0, 0, 0, 0], 'name': ['Big']},
{'data': [0, 0, 0, 1, 0], 'name': ['Small']},
{'data': [0, 0, 0, 1, 1], 'name': ['Monitoring']},
{'data': [0, 0, 0, 0, 0], 'name': ['Improvement']}]},
'WORLD': {'categories': ['January', 'March', 'April', 'May', 'June'],
'series': [{'data': [0, 0, 0, 0, 0], 'name': ['Big']},
{'data': [1, 0, 0, 0, 0], 'name': ['Small']},
{'data': [0, 2, 0, 0, 0], 'name': ['Monitoring']},
{'data': [0, 0, 0, 1, 0], 'name': ['Improvement']}]}}
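The swap step can also be sketched without hard-coding the target order, by sorting the categories with the standard library's calendar.month_name. This is only a sketch under stated assumptions: reorder_by_month is a hypothetical helper, it assumes English month names under the default locale, and it also unwraps the one-element 'name' lists so the result matches the shape of the expected output in the question.

```python
import calendar

# calendar.month_name[0] is an empty string, so only real months get a rank.
MONTH_RANK = {name: i for i, name in enumerate(calendar.month_name) if name}


def reorder_by_month(data, x_axis):
    """Return {name: {'categories': [...], 'series': [...]}} in calendar order."""
    final = {}
    for name, series_list in data.items():
        months = x_axis[name]['categories']
        # Indices of the months once they are sorted chronologically.
        order = sorted(range(len(months)), key=lambda i: MONTH_RANK[months[i]])
        final[name] = {
            'categories': [months[i] for i in order],
            'series': [
                {'name': s['name'][0], 'data': [s['data'][i] for i in order]}
                for s in series_list
            ],
        }
    return final


# Two of the WORLD series from the output above, in alphabetical month order.
data = {'WORLD': [{'name': ['Small'], 'data': [0, 1, 0, 0, 0]},
                  {'name': ['Monitoring'], 'data': [0, 0, 0, 2, 0]}]}
x_axis = {'WORLD': {'categories': ['April', 'January', 'June', 'March', 'May']}}
chart = reorder_by_month(data, x_axis)
print(chart['WORLD'])
```

Sorting indices instead of the month names themselves lets the same permutation be applied to every 'data' list, which keeps the values aligned with their months.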

Python writing a .csv file with rows and columns transpose [duplicate]

I have a long script that reads several different files and, at the end, writes everything into separate .csv files.
This is all of my code:
import csv
import os.path
#open files + readlines
with open("C:/Users/Ivan Wong/Desktop/Placement/Lists of targets/Mouse/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter = ',')
    #find files with the name in 1st row
    for row in reader:
        graph_filename = os.path.join("C:/Python27/Scripts/My scripts/Selenoprotein/NMD targets",row[0]+"_nt_counts.txt.png")
        if os.path.exists(graph_filename):
            y = row[0]+'_nt_counts.txt'
            r = open('C:/Users/Ivan Wong/Desktop/Placement/fp_mesc_nochx/'+y, 'r')
            k = r.readlines()
            r.close()
            del k[:1]
            k = map(lambda s: s.strip(), k)
            interger = map(int, k)
            import itertools
            #adding the numbers for every 3 rows
            def grouper(n, iterable, fillvalue=None):
                "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
                args = [iter(iterable)] * n
                return itertools.izip_longest(*args, fillvalue=fillvalue)
            result = map(sum, grouper(3, interger, 0))
            e = row[1]
            cDNA = open('C:/Users/Ivan Wong/Desktop/Placement/Downloaded seq/Mouse/cDNA.txt', 'r')
            seq = cDNA.readlines()
            # get all lines that have a gene name
            lineNum = 0;
            lineGenes = []
            for line in seq:
                lineNum = lineNum +1
                if '>' in line:
                    lineGenes.append(str(lineNum))
                if '>'+e in line:
                    lineBegin = lineNum
            cDNA.close()
            # which gene is this
            index1 = lineGenes.index(str(lineBegin))
            lineEnd = lineGenes[index1+1]
            # linebegin and lineEnd now give you, where to look for your sequence, all that
            # you have to do is to read the lines between lineBegin and lineEnd in the file
            # and make it into a single string.
            lineEnd = lineGenes[index1+1]
            Lastline = int(lineEnd) -1
            # in your code you have already made a list with all the lines (q), first delete
            # \n and other symbols, then combine all lines into a big string of nucleotides (like this)
            qq = seq[lineBegin:Lastline]
            qq = map(lambda s: s.strip(), qq)
            string = ''
            for i in range(len(qq)):
                string = string + qq[i]
            # now you want to get a list of triplets, again you can use the for loop:
            # first get the length of the string
            lenString = len(string);
            # this is your list codons
            listCodon = []
            for i in range(0,lenString/3):
                listCodon.append(string[0+i*3:3+i*3])
            with open(e+'.csv','wb') as outfile:
                outfile.writelines(str(result)+'\n'+str(listCodon))
My problem here is the file produced looks like this:
0 0 0
'GCA' 'CTT' 'GGT'
I want to make it like this:
0 GCA
0 CTT
0 GGT
What can I do in my code to achieve this?
print result:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 2, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 3, 3, 0, 3, 1, 2, 1, 2, 1, 0, 1, 0, 1, 2, 1, 0, 5, 0, 0, 0, 0, 6, 0, 1, 0, 0, 2, 0, 1, 0, 0, 1, 1, 0, 1, 6, 34, 35, 32, 1, 1, 0, 4, 1, 0, 1, 0, 0, 0, 0, 1, 6, 0, 0, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print listCodon:
['gtt', 'gaa', 'aca', 'gag', 'aca', 'tgt', 'tct', 'gga', 'gat', 'gag', 'ctg', 'tgg', 'gca', 'gaa', 'gga', 'cag', 'gcc', 'taa', 'gca', 'cag', 'gca', 'gca', 'gag', 'ctt', 'tga', 'tct', 'ctt', 'ggt', 'gat', 'cgg', 'tgg', 'ggg', 'atc', 'cgg', 'tgg', 'cct', 'agc', 'ttg', 'tgc', 'caa', 'gga', 'agc', 'tgc', 'tca', 'gct', 'ggg', 'aaa', 'gaa', 'ggt', 'ggc', 'tgt', 'ggc', 'tga', 'cta', 'tgt', 'gga', 'acc', 'ttc', 'tcc', 'ccg', 'agg', 'cac', 'caa', 'gtg', 'ggg', 'cct', 'tgg', 'tgg', 'cac', 'ctg', 'tgt', 'caa', 'cgt', 'ggg', 'ttg', 'cat', 'acc', 'caa', 'gaa', 'gct', 'gat', 'gca', 'tca', 'ggc', 'tgc', 'act', 'gct', 'ggg', 'ggg', 'cat', 'gat', 'cag', 'aga', 'tgc', 'tca', 'cca', 'cta', 'tgg', 'ctg', 'gga', 'ggt', 'ggc', 'cca', 'gcc', 'tgt', 'cca', 'aca', 'caa', 'ctg', 'gtg', 'aga', 'gag', 'aag', 'ccc', 'ttg', 'ccc', 'tct', 'gca', 'ggt', 'ccc', 'att', 'gaa', 'agg', 'aga', 'ggt', 'ttg', 'ctc', 'tct', 'gcc', 'act', 'cat', 'ctg', 'taa', 'ccg', 'tga', 'gct', 'ttt', 'cca', 'ccc', 'ggc', 'ctc', 'ctc', 'ttt', 'gat', 'ccc', 'aga', 'ata', 'atg', 'act', 'ctg', 'aga', 'ctt', 'ctt', 'atg', 'tat', 'gaa', 'taa', 'atg', 'cct', 'ggg', 'cca', 'aaa', 'acc']
The picture on the left shows what Marek's code helped me achieve; I would like to improve it so that the data is arranged like the picture on the right.
You can use zip() to zip together two iterators. So if you have
result = [0, 0, 0, 0, 0]
listCodons = ['gtt', 'gaa', 'aca', 'gag', 'aca']
then you can do
>>> list(zip(result, listCodons))
[(0, 'gtt'), (0, 'gaa'), (0, 'aca'), (0, 'gag'), (0, 'aca')]
or, for your example:
with open(e+'.csv','w') as outfile:
    out = csv.writer(outfile)
    out.writerows(zip(result, listCodon))
Try this:
proper_result = '\n'.join([ '%s %s' % (nr, codon) for nr, codon in zip(result, listCodon) ] )
Edit (codons split into separate columns):
proper_result = '\n'.join(' '.join([str(nr),] + list(codon)) for nr, codon in zip(result, listCodon))
Edit (comma separated values):
proper_result = '\n'.join('%s, %s' % (nr, codon) for nr, codon in zip(result, listCodon))
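Putting the pieces together for Python 3, where itertools.izip_longest is named zip_longest and csv files are opened in text mode with newline='', a minimal sketch might look like the following. The sample counts and codons are made up for illustration, and an in-memory io.StringIO stands in for the output file:

```python
import csv
import io
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    """grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


counts = [0, 1, 2, 3, 0, 0, 5]      # made-up per-nucleotide counts
result = [sum(group) for group in grouper(3, counts, 0)]
listCodon = ['gtt', 'gaa', 'aca']   # made-up codons

# With a real file this would be: open(e + '.csv', 'w', newline='')
outfile = io.StringIO()
# zip pairs each sum with its codon, so every pair becomes one CSV row.
csv.writer(outfile).writerows(zip(result, listCodon))
print(outfile.getvalue())
```

zip stops at the shorter of the two lists, so any trailing sums without a matching codon are silently dropped; itertools.zip_longest could be used instead if those rows should be kept.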
