I'm trying to convert an AMPL model to Pyomo (something I have no experience with using). I'm finding the syntax hard to adapt to, especially the constraint and objective sections. I've already linked my computer together with python, anaconda, Pyomo, and GLPK, and just need to get the actual code down. I'm a beginner coder so forgive me if my code is poorly written. Still trying to get the hang of this!
Here is the data from the AMPL code:
set PROD := 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;
set PROD1:= 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30;
ProdCost 414 3 46 519 876 146 827 996 922 308 568 176 58 13 20 974 121 751 130 844 280 123 275 843 717 694 72 413 65 631
HoldingCost 606 308 757 851 148 867 336 44 364 960 69 428 778 485 285 938 980 932 199 175 625 513 536 965 366 950 632 88 698 744
Demand 105 70 135 67 102 25 147 69 23 84 32 41 81 133 180 22 174 80 24 156 28 125 23 137 180 151 39 138 196 69
And here is the model:
set PROD; # set of production amounts
set PROD1; # set of holding amounts
param ProdCost {PROD} >= 0; # parameter set of production costs
param Demand {PROD} >= 0; # parameter set of demand at each time
param HoldingCost {PROD} >= 0; # parameter set of holding costs
var Inventory {PROD1} >= 0; # variable that sets inventory amount at each time
var Make {p in PROD} >= 0; # variable of amount produced at each time
minimize Total_Cost: sum {p in PROD} ((ProdCost[p] * Make[p]) + (Inventory[p] * HoldingCost[p]));
# Objective: minimize total cost from all production and holding cost
subject to InventoryConstraint {p in PROD}: Inventory[p] = Inventory[p-1] + Make[p] - Demand[p];
# excess production transfers to inventory
subject to MeetDemandConstraint {p in PROD}: Make[p] >= Demand[p] - Inventory[p-1];
# Constraint: holding and production must exceed demand
subject to InitialInventoryConstraint: Inventory[0] = 0;
# Constraint: Inventory must start at 0
Here's what I have so far. Not sure if it's right or not:
from pyomo.environ import *
demand=[105,70,135,67,102,25,147,69,23,84,32,41,81,133,180,22,174,80,24,156,28,125,23,137,180,151,39,138,196,69]
holdingcost=[606,308,757,851,148,867,336,44,364,960,69,428,778,485,285,938,980,932,199,175,625,513,536,965,366,950,632,88,698,744]
productioncost=[414,3,46,519,876,146,827,996,922,308,568,176,58,13,20,974,121,751,130,844,280,123,275,843,717,694,72,413,65,631]
model=ConcreteModel()
model.I=RangeSet(1,30)
model.J=RangeSet(0,30)
model.x=Var(model.I, within=NonNegativeIntegers)
model.y=Var(model.J, within=NonNegativeIntegers)
model.obj = Objective(expr = sum(model.x[i]*productioncost[i]+model.y[i]*holdingcost[i] for i in model.I))
def InventoryConstraint(model, i):
    return model.y[i-1] + model.x[i] - demand[i] <= model.y[i]
InvCont = Constraint(model, rule=InventoryConstraint)
def MeetDemandConstraint(model, i):
    return model.x[i] >= demand[i] - model.y[i-1]
DemCont = Constraint(model, rule=MeetDemandConstraint)
def Initial(model):
    return model.y[0] == 0
model.Init = Constraint(rule=Initial)
opt = SolverFactory('glpk')
results = opt.solve(model,load_solutions=True)
model.solutions.store_to(results)
results.write()
Thanks!
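The main issues I see are in your constraint declarations. You need to attach the constraints to the model, and the first argument passed in should be the indexing set (which I'm assuming should be model.I). Also note that demand is a plain 0-based Python list while model.I runs from 1 to 30, so demand[i] is shifted by one period and raises an IndexError at i = 30; the lists need an i-1 offset (the same applies to productioncost and holdingcost in the objective).
def InventoryConstraint(model, i):
    # demand is a 0-based list, model.I is 1-based
    return model.y[i-1] + model.x[i] - demand[i-1] <= model.y[i]
model.InvCont = Constraint(model.I, rule=InventoryConstraint)
def MeetDemandConstraint(model, i):
    return model.x[i] >= demand[i-1] - model.y[i-1]
model.DemCont = Constraint(model.I, rule=MeetDemandConstraint)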
The syntax that you're using to solve the model is a little outdated but should work. Another option would be:
opt = SolverFactory('glpk')
opt.solve(model,tee=True) # The 'tee' option prints the solver output to the screen
model.display() # This will print a summary of the model solution
Another command that is useful for debugging is model.pprint(). This will display the entire model including the expressions for Constraints and Objectives.
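Because model.I starts at 1 while Python lists start at 0, list data either has to be offset by one or re-keyed by period. Here is a minimal sketch of the re-keying option in plain Python (no Pyomo needed to run it). The helper name by_period is made up for illustration, and only the first three demand values from the question are used:

```python
def by_period(values, start=1):
    """Map a 0-based list onto period numbers start, start+1, ..."""
    return {start + k: v for k, v in enumerate(values)}

# First three demand values from the question, as an illustration only.
demand_list = [105, 70, 135]
demand = by_period(demand_list)

# demand[1] is now the demand of period 1, matching the AMPL indexing.
print(demand[1], demand[3])
```

With data stored this way, a rule can write demand[i] for i in model.I without an offset. It is also worth double-checking the inventory balance: the AMPL model states it with "=", so "==" would be the direct translation in the Pyomo rule.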
I am trying to find out what the internal load factor is for Python sets. A dictionary uses a hash table with a load factor of 0.66 (2/3). The number of buckets starts at 8, and when the 6th key is inserted the number of buckets increases to 16.
The table below shows the shift in buckets.
bucket   shift
8        5
16       10
32       21
64       42
128      85
This can be seen with the following Python code, where the size of a dictionary and a set is shown with the getsizeof method:
import sys
d = {}
s = set()
for x in range(25):
    d[x] = 1
    s.add(x)
    print(len(d), sys.getsizeof(d), sys.getsizeof(s))
# of elements   memory used for dict   memory used for sets
1               232                    216
2               232                    216
3               232                    216
4               232                    216
5               232                    728
6               360                    728
7               360                    728
8               360                    728
9               360                    728
10              360                    728
11              640                    728
12              640                    728
13              640                    728
14              640                    728
15              640                    728
16              640                    728
17              640                    728
18              640                    728
19              640                    2264
20              640                    2264
21              640                    2264
22              1176                   2264
23              1176                   2264
24              1176                   2264
25              1176                   2264
The above table shows that the shift in the buckets is correct for the dictionary, but not for the sets. The memory used by sets grows at different points.
I am trying to find out what the load factor is for a set. Is that also 2/3? Or am I doing something wrong with the code?
Currently, it's about 3/5. See the source:
if ((size_t)so->fill*5 < mask*3)
    return 0;
return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);
fill is the number of occupied table cells (including "deleted entry" markers), and mask is 1 less than the total table capacity.
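To see how that condition lines up with the measurements in the question, here is a small Python sketch that applies the same inequality to a given table capacity. The function name resize_threshold is hypothetical, and the capacities 8 and 32 are assumptions (8 as the initial table size, 32 as the size after the first resize), not values read from a running interpreter:

```python
def resize_threshold(capacity):
    """Smallest fill for which 'fill*5 < mask*3' is no longer true."""
    mask = capacity - 1
    fill = 0
    while fill * 5 < mask * 3:
        fill += 1
    return fill

for capacity in (8, 32):
    print(capacity, resize_threshold(capacity))
```

Under those assumptions the thresholds come out as 5 and 19 entries, which matches the points where the set's reported size jumps in the question's table.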
I wrote the code below.
The very first line shown in the PyCharm console differs between short outputs and long ones.
I expected the output to start from "1" and the console to keep scrolling as more output is printed, but the first lines are cleared automatically.
code with the output that is not long (note the number "11"):
for x in range(1, 11):
    print(str(x), str(x * x), end=' ')
    print(str(x * x * x))
The first twelve lines of the console (the remaining lines are omitted as unimportant; note that it starts from the number "1"):
code with the output that is long (note the number "111111"):
for x in range(1, 111111):
    print(str(x), str(x * x), end=' ')
    print(str(x * x * x))
the very first line of its console (note it starts from the number "346" instead of "1"):
NOTE:
I know I can see the rest of the results by pressing a key. my question is about the first line of the result, not the last line
That is because the console is only a certain number of lines long (looks like yours is 12) and you are just seeing the last 12 lines. If you scroll up you will see the rest... or you can print in a different way.
for example this:
for x in range(1, 111):
    print(str(x), str(x * x), end=' ')
    print(str(x * x * x))
    # pause every 10 lines
    if x % 10 == 0:
        input('press a key to continue')
the output:
1 1 1
2 4 8
3 9 27
4 16 64
5 25 125
6 36 216
7 49 343
8 64 512
9 81 729
10 100 1000
press a key to continue
11 121 1331
12 144 1728
13 169 2197
14 196 2744
15 225 3375
16 256 4096
17 289 4913
18 324 5832
19 361 6859
20 400 8000
press a key to continue
21 441 9261
22 484 10648
23 529 12167
24 576 13824
25 625 15625
26 676 17576
27 729 19683
28 784 21952
29 841 24389
30 900 27000
press a key to continue
31 961 29791
32 1024 32768
33 1089 35937
34 1156 39304
35 1225 42875
36 1296 46656
37 1369 50653
38 1444 54872
39 1521 59319
40 1600 64000
press a key to continue
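The "only the last lines are kept" behaviour is what a bounded buffer produces: once more lines arrive than it can hold, the oldest ones are dropped. Here is a minimal sketch using collections.deque as a stand-in. The 12-line limit is just the guess from above, and this is only an analogy, not how PyCharm is actually implemented:

```python
from collections import deque

# A buffer that keeps at most 12 lines (assumed limit for illustration).
console = deque(maxlen=12)

for x in range(1, 111):
    console.append("{} {} {}".format(x, x * x, x * x * x))

# Only the most recent 12 lines survive; the earliest ones are gone.
print(console[0])
print(len(console))
```

With 110 lines pushed through a 12-line buffer, the first line still visible is the one for x = 99, not x = 1.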
I am trying to find vertex similarities using a random walk approach, which relies on a transition matrix. Each time I run my Python implementation I get the error below. I have read similar questions but found no specific answer. Can you help me solve this problem? Your help is really appreciated.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-259-2639b08a8eb7> in <module>()
45
46
---> 47 tuple_steps_prob,b=similarities(training_graph,test_edge_list)
48 print(tuple_steps_prob)
49 # pre_list_=Precision(tuple_steps_prob, test_edge_list,test_num,b)
<ipython-input-237-e0348fd15773> in similarities(graph, test_edge_list)
16 prob_vec[0][k] = 1
17 #print(prob_vec)
---> 18 extracted,prob,y=RandomWalk(graph,nodes,adj,prob_vec)
19
20 j=0
<ipython-input-236-6b0298295e01> in RandomWalk(G, nodes, adj, prob_vec)
31 beta_=0.1
32
---> 33 TM = Transition_Matrix(adj,beta_)
34
35 extracted1=[]
~\Desktop\RW\RW\Transition_Probability_Matrix.py in Transition_Matrix(adj, beta_)
18
19 Iden=np.identity(len(TM))
---> 20
21
22 Transition=beta_/(1+beta_) * Iden + 1/(1+beta_) * TM
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\matfuncs.py in inv(A)
72 """
73 I = speye(A.shape[0], A.shape[1], dtype=A.dtype, format=A.format)
---> 74 Ainv = spsolve(A, I)
75 return Ainv
76
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in spsolve(A, b, permc_spec, use_umfpack)
196 else:
197 # b is sparse
--> 198 Afactsolve = factorized(A)
199
200 if not isspmatrix_csc(b):
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in factorized(A)
438 return solve
439 else:
--> 440 return splu(A).solve
441
442
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in splu(A, permc_spec, diag_pivot_thresh, relax, panel_size, options)
307 _options.update(options)
308 return _superlu.gstrf(N, A.nnz, A.data, A.indices, A.indptr,
--> 309 ilu=False, options=_options)
310
311
RuntimeError: Factor is exactly singular
I am trying to test a correlation power analysis attack with two different ciphertext files. The first file, Ciphertexts.mat, was converted from NumPy to MATLAB format using these lines of code:
import scipy.io
import numpy as np
tab_Obs = np.load('C:\\Users\\My_Test_Traces\\Ciphertexts.npy')
scipy.io.savemat('C:\\Users\\My_Test_Traces\\Ciphertexts.mat', {
"tab_Obs":tab_Obs}
)
The result is:
load 'C:/Users/cpa/data/Ciphertexts.mat';
S =
{
ciph_dec =
163 20 11 228 7 53 249 241 134 90 166 177 179 43 86 103
35 22 125 217 16 82 174 101 197 242 118 33 214 232 86 162
77 116 29 212 76 7 155 18 255 101 126 86 235 155 46 11
...........
}
The second file is parsed_cipher_0cm.mat:
load 'C:/Users/cpa/data/parsed_cipher_0cm.mat';
S =
{
ciph_dec =
67 70 185 254 55 71 60 118 165 27 247 120 31 106 154 24
24 51 124 37 190 187 208 55 32 224 134 214 49 173 224 209
192 86 229 54 24 216 91 9 136 132 131 82 44 170 234 33
.......
}
At first I thought the two files had the same type. However, running the code with the second file gives me the expected result, while running it with the first file gives me this error:
error: binary operator `*' not implemented for `int32 matrix' by `matrix' operations
error: evaluating binary operator `*' near line 57, column 10
The problem is the type of the first file; the error occurs in the calculation of h1.
I tried this code in MATLAB:
load 'C:/Users/cpa/data/Ciphertexts.mat';
%load 'C:/Users/cpa/data/parsed_cipher_0cm.mat';
% truncate measurements
n_measures = 999
tab_Obs = tab_Obs(1:n_measures,:);
ciph_dec = ciph_dec(1:n_measures,:);
K=0:255;
disp (length(K));
Y_i = ciph_dec(:,sbox_n)
F = ones(1,length(K))
sbox_n = 2
disp (size(Y_i))
disp (size(F))
h1=(Y_i*ones(1,length(K)))
disp (size(h1))
I did this test to know the type of each file:
I find that:
Ciphertexts.mat---------> type: int32
parsed_cipher_0cm.mat---> type: float64
I need the data in Ciphertexts.mat to have type float64. How can I resolve this problem, please?
MATLAB does not generally allow for matrix multiplication involving integer data types unless one of the operands is a scalar. We can see this with a simple example:
ones(2, 'int8')*ones(2, 'int8')
Which throws an error:
Error using *
MTIMES is not fully supported for integer classes. At least one input must be scalar.
To compute elementwise TIMES, use TIMES (.*) instead.
This is likely for integer overflow safety, though there may be other reasons I'm not familiar with. Though the error messages are not identical, the issue is presumably related.
The immediate MATLAB fix is to cast ciph_dec as a double, which should resolve the multiplication issue:
load 'C:/Users/cpa/data/Ciphertexts.mat';
% load 'C:/Users/cpa/data/parsed_cipher_0cm.mat';
% truncate measurements
n_measures = 999;
tab_Obs = tab_Obs(1:n_measures, :);
ciph_dec = double(ciph_dec(1:n_measures, :)); % Force floating point
K = 0:255;
disp(length(K));
sbox_n = 2; % define the S-box index before it is used
Y_i = ciph_dec(:, sbox_n);
F = ones(1, length(K));
disp(size(Y_i))
disp(size(F))
h1 = (Y_i*ones(1, length(K)));
disp (size(h1))
Try simply converting to double: ciph_dec = double(ciph_dec). MATLAB works best with doubles, and a double can represent every int32 value exactly, so you won't lose precision.
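The precision claim can be checked quickly in Python, whose float is a 64-bit double. This is a small sketch, with survives_double being a made-up helper name: it round-trips an integer through a float and checks that nothing changed.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def survives_double(n):
    """True if integer n converts to a 64-bit float and back unchanged."""
    return int(float(n)) == n

# The extremes of the int32 range round-trip exactly.
print(survives_double(INT32_MIN), survives_double(INT32_MAX))

# Precision only starts to break down beyond 2**53.
print(survives_double(2**53 + 1))
```

Since ciphertext bytes are in the range 0 to 255, the cast to double is far inside the safe range.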
I have a table with 12 columns and want to select the items in the first column (qseqid) based on the second column (sseqid). Meaning that the second column (sseqid) is repeating with different values in the 11th and 12th columns, which are evalue and bitscore, respectively.
The ones that I would like to get are having the lowest evalue and the highest bitscore (when evalues are the same, the rest of the columns can be ignored and the data is down below).
So, I have made a short code which uses the second columns as a key for the dictionary. I can get five different items from the second column with lists of qseqid+evalue and qseqid+bitscore.
Here is the code:
#!/usr/bin/python
filename = "data.txt"
readfile = open(filename,"r")
d = dict()
for i in readfile.readlines():
    i = i.strip()
    i = i.split("\t")
    d.setdefault(i[1], []).append([i[0],i[10]])
    d.setdefault(i[1], []).append([i[0],i[11]])
for x in d:
    print(x,d[x])
readfile.close()
But, I am struggling to get the qseqid with the lowest evalue and the highest bitscore for each sseqid.
Is there any good logic to solve the problem?
The data.txt file (including the header row and with » representing tab characters)
qseqid»sseqid»pident»length»mismatch»gapopen»qstart»qend»sstart»send»evalue»bitscore
ACLA_022040»TBB»32.71»431»258»8»39»468»24»423»2.00E-76»240
ACLA_024600»TBB»80»435»87»0»1»435»1»435»0»729
ACLA_031860»TBB»39.74»453»251»3»1»447»1»437»1.00E-121»357
ACLA_046030»TBB»75.81»434»105»0»1»434»1»434»0»704
ACLA_072490»TBB»41.7»446»245»3»4»447»3»435»2.00E-120»353
ACLA_010400»EF1A»27.31»249»127»8»69»286»9»234»3.00E-13»61.6
ACLA_015630»EF1A»22»491»255»17»186»602»3»439»8.00E-19»78.2
ACLA_016510»EF1A»26.23»122»61»4»21»127»9»116»2.00E-08»46.2
ACLA_023300»EF1A»29.31»447»249»12»48»437»3»439»2.00E-45»155
ACLA_028450»EF1A»85.55»443»63»1»1»443»1»442»0»801
ACLA_074730»CALM»23.13»147»101»4»6»143»2»145»7.00E-08»41.2
ACLA_096170»CALM»29.33»150»96»4»34»179»2»145»1.00E-13»55.1
ACLA_016630»CALM»23.9»159»106»5»58»216»4»147»5.00E-12»51.2
ACLA_031930»RPB2»36.87»1226»633»24»121»1237»26»1219»0»734
ACLA_065630»RPB2»65.79»1257»386»14»1»1252»4»1221»0»1691
ACLA_082370»RPB2»27.69»1228»667»37»31»1132»35»1167»7.00E-110»365
ACLA_061960»ACT»28.57»147»95»5»146»284»69»213»3.00E-12»57.4
ACLA_068200»ACT»28.73»463»231»13»16»471»4»374»1.00E-53»176
ACLA_069960»ACT»24.11»141»97»4»581»718»242»375»9.00E-09»46.2
ACLA_095800»ACT»91.73»375»31»0»1»375»1»375»0»732
And here's a little more readable version of the table's contents:
0 1 2 3 4 5 6 7 8 9 10 11
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
ACLA_022040 TBB 32.71 431 258 8 39 468 24 423 2.00E-76 240
ACLA_024600 TBB 80 435 87 0 1 435 1 435 0 729
ACLA_031860 TBB 39.74 453 251 3 1 447 1 437 1.00E-121 357
ACLA_046030 TBB 75.81 434 105 0 1 434 1 434 0 704
ACLA_072490 TBB 41.7 446 245 3 4 447 3 435 2.00E-120 353
ACLA_010400 EF1A 27.31 249 127 8 69 286 9 234 3.00E-13 61.6
ACLA_015630 EF1A 22 491 255 17 186 602 3 439 8.00E-19 78.2
ACLA_016510 EF1A 26.23 122 61 4 21 127 9 116 2.00E-08 46.2
ACLA_023300 EF1A 29.31 447 249 12 48 437 3 439 2.00E-45 155
ACLA_028450 EF1A 85.55 443 63 1 1 443 1 442 0 801
ACLA_074730 CALM 23.13 147 101 4 6 143 2 145 7.00E-08 41.2
ACLA_096170 CALM 29.33 150 96 4 34 179 2 145 1.00E-13 55.1
ACLA_016630 CALM 23.9 159 106 5 58 216 4 147 5.00E-12 51.2
ACLA_031930 RPB2 36.87 1226 633 24 121 1237 26 1219 0 734
ACLA_065630 RPB2 65.79 1257 386 14 1 1252 4 1221 0 1691
ACLA_082370 RPB2 27.69 1228 667 37 31 1132 35 1167 7.00E-110 365
ACLA_061960 ACT 28.57 147 95 5 146 284 69 213 3.00E-12 57.4
ACLA_068200 ACT 28.73 463 231 13 16 471 4 374 1.00E-53 176
ACLA_069960 ACT 24.11 141 97 4 581 718 242 375 9.00E-09 46.2
ACLA_095800 ACT 91.73 375 31 0 1 375 1 375 0 732
Since you're a Python newbie I'm glad that there are several examples of how to do this manually, but for comparison I'll show how it can be done using the pandas library which makes working with tabular data much simpler.
Since you didn't provide example output, I'm assuming that by "with the lowest evalue and the highest bitscore for each sseqid" you mean "the highest bitscore among the lowest evalues" for a given sseqid; if you want those separately, that's trivial too.
import pandas as pd
df = pd.read_csv("acla1.dat", sep="\t")
df = df.sort_values(["evalue", "bitscore"], ascending=[True, False])
df_new = df.groupby("sseqid", as_index=False).first()
which produces
>>> df_new
sseqid qseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
0 ACT ACLA_095800 91.73 375 31 0 1 375 1 375 0.000000e+00 732.0
1 CALM ACLA_096170 29.33 150 96 4 34 179 2 145 1.000000e-13 55.1
2 EF1A ACLA_028450 85.55 443 63 1 1 443 1 442 0.000000e+00 801.0
3 RPB2 ACLA_065630 65.79 1257 386 14 1 1252 4 1221 0.000000e+00 1691.0
4 TBB ACLA_024600 80.00 435 87 0 1 435 1 435 0.000000e+00 729.0
Basically, first we read the data file into an object called a DataFrame, which is kind of like an Excel worksheet. Then we sort by evalue ascending (so that lower evalues come first) and by bitscore descending (so that higher bitscores come first). Then we can use groupby to collect the data in groups of equal sseqid, and take the first one in each group, which because of the sorting will be the one we want.
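For anyone who wants to see the same sort-then-take-first idea without pandas, here is a minimal standard-library sketch. The rows are a hand-copied subset of the question's data reduced to four fields (qseqid, sseqid, evalue, bitscore), and best_per_group is a made-up helper name:

```python
# (qseqid, sseqid, evalue, bitscore): a small subset of the question's table
rows = [
    ("ACLA_022040", "TBB", 2.00e-76, 240.0),
    ("ACLA_024600", "TBB", 0.0, 729.0),
    ("ACLA_046030", "TBB", 0.0, 704.0),
    ("ACLA_074730", "CALM", 7.00e-08, 41.2),
    ("ACLA_096170", "CALM", 1.00e-13, 55.1),
]

def best_per_group(rows):
    """Sort by evalue ascending, bitscore descending; keep first row per sseqid."""
    best = {}
    for row in sorted(rows, key=lambda r: (r[2], -r[3])):
        best.setdefault(row[1], row)  # only the first row seen per group is stored
    return best

for sseqid, row in sorted(best_per_group(rows).items()):
    print(sseqid, row[0])
```

The sort key plays the role of ascending=[True, False], and setdefault plays the role of groupby(...).first().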
#!/usr/bin/python
import csv

DATA = "data.txt"

class Sequence:
    def __init__(self, row):
        self.qseqid = row[0]
        self.sseqid = row[1]
        self.pident = float(row[2])
        self.length = int(row[3])
        self.mismatch = int(row[4])
        self.gapopen = int(row[5])
        self.qstart = int(row[6])
        self.qend = int(row[7])
        self.sstart = int(row[8])
        self.send = int(row[9])
        self.evalue = float(row[10])
        self.bitscore = float(row[11])

    def __str__(self):
        return (
            "{qseqid}\t"
            "{sseqid}\t"
            "{pident}\t"
            "{length}\t"
            "{mismatch}\t"
            "{gapopen}\t"
            "{qstart}\t"
            "{qend}\t"
            "{sstart}\t"
            "{send}\t"
            "{evalue}\t"
            "{bitscore}"
        ).format(**self.__dict__)

def entries(fname, header_rows=1, dtype=list, **kwargs):
    with open(fname) as inf:
        incsv = csv.reader(inf, **kwargs)
        # skip header rows
        for i in range(header_rows):
            next(incsv)
        for row in incsv:
            yield dtype(row)

def main():
    bestseq = {}
    for seq in entries(DATA, dtype=Sequence, delimiter="\t"):
        # see if a sequence with the same sseqid already exists
        prev = bestseq.get(seq.sseqid, None)
        if (
            prev is None
            or seq.evalue < prev.evalue
            or (seq.evalue == prev.evalue and seq.bitscore > prev.bitscore)
        ):
            bestseq[seq.sseqid] = seq
    # display selected sequences
    keys = sorted(bestseq)
    for key in keys:
        print(bestseq[key])

if __name__ == "__main__":
    main()
which results in
ACLA_095800 ACT 91.73 375 31 0 1 375 1 375 0.0 732.0
ACLA_096170 CALM 29.33 150 96 4 34 179 2 145 1e-13 55.1
ACLA_028450 EF1A 85.55 443 63 1 1 443 1 442 0.0 801.0
ACLA_065630 RPB2 65.79 1257 386 14 1 1252 4 1221 0.0 1691.0
ACLA_024600 TBB 80.0 435 87 0 1 435 1 435 0.0 729.0
While not nearly as elegant and concise as using the pandas library, it's quite possible to do what you want without resorting to third-party modules. The following uses the collections.defaultdict class to facilitate creation of dictionaries of variable-length lists of records. The use of the AttrDict class is optional, but it makes accessing the fields of each dictionary-based records easier and is less awkward-looking than the usual dict['fieldname'] syntax otherwise required.
import csv
from collections import defaultdict, namedtuple
from itertools import imap
from operator import itemgetter
data_file_name = 'data.txt'
DELIMITER = '\t'
ssqeid_dict = defaultdict(list)
# from http://stackoverflow.com/a/1144405/355230
def multikeysort(items, columns):
    comparers = [((itemgetter(col[1:].strip()), -1) if col.startswith('-') else
                  (itemgetter(col.strip()), 1)) for col in columns]
    def comparer(left, right):
        for fn, mult in comparers:
            result = cmp(fn(left), fn(right))
            if result:
                return mult * result
        else:
            return 0
    return sorted(items, cmp=comparer)

# from http://stackoverflow.com/a/15109345/355230
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

with open(data_file_name, 'rb') as data_file:
    reader = csv.DictReader(data_file, delimiter=DELIMITER)
    format_spec = '\t'.join([('{%s}' % field) for field in reader.fieldnames])
    for rec in (AttrDict(r) for r in reader):
        # Convert the two sort fields to numeric values for proper ordering.
        rec.evalue, rec.bitscore = map(float, (rec.evalue, rec.bitscore))
        ssqeid_dict[rec.sseqid].append(rec)

for ssqeid in sorted(ssqeid_dict):
    # Sort each group of recs with same ssqeid. The first record after sorting
    # will be the one sought that has the lowest evalue and highest bitscore.
    selected = multikeysort(ssqeid_dict[ssqeid], ['evalue', '-bitscore'])[0]
    print format_spec.format(**selected)
Output (» represents tabs):
ACLA_095800» ACT» 91.73» 375» 31» 0» 1» 375» 1» 375» 0.0» 732.0
ACLA_096170» CALM» 29.33» 150» 96» 4» 34» 179» 2» 145» 1e-13» 55.1
ACLA_028450» EF1A» 85.55» 443» 63» 1» 1» 443» 1» 442» 0.0» 801.0
ACLA_065630» RPB2» 65.79» 1257» 386» 14» 1» 1252» 4» 1221» 0.0» 1691.0
ACLA_024600» TBB» 80» 435» 87» 0» 1» 435» 1» 435» 0.0» 729.0
filename = 'data.txt'
readfile = open(filename,'r')
d = dict()
sseqid=[]
lines=[]
for i in readfile.readlines():
    sseqid.append(i.rsplit()[1])
    lines.append(i.rsplit())
sorted_sseqid = sorted(set(sseqid))
sdqDict={}
key =None
for sorted_ssqd in sorted_sseqid:
    key=sorted_ssqd
    evalue=[]
    bitscore=[]
    qseid=[]
    for line in lines:
        if key in line:
            evalue.append(line[10])
            bitscore.append(line[11])
            qseid.append(line[0])
    sdqDict[key]=[qseid,evalue,bitscore]
print sdqDict
# the values are still strings, so compare them numerically with key=float
print 'TBB LOWEST EVALUE' + '---->' + min(sdqDict['TBB'][1], key=float)
##I think you can do the list manipulation below to find out the qseqid
readfile.close()