This is my code
for i,val in enumerate(DS3Y_pred_trans):
    if val < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
There are values less than 1.5 in the list, but the output is all 2s.
What am I missing?
This is the whole code.
import numpy as np
DS3X_train = np.genfromtxt('train.csv', dtype=float, delimiter=',')
print DS3X_train
DS3Y_train = np.genfromtxt('train_labels.csv', dtype=int, delimiter=',' )
print DS3Y_train
DS3X_test = np.genfromtxt('test.csv', dtype=float, delimiter=',')
print DS3X_test
DS3Y_test = np.genfromtxt('test_labels.csv', dtype=int, delimiter=',' )
print DS3Y_test
DS3X_train_trans = zip(*DS3X_train)
cov_train = np.cov(DS3X_train_trans)
U, s, V = np.linalg.svd(cov_train, full_matrices=True)
u = U[:,:-1]
u_trans = zip(*u)
DS3X_train_reduced = np.dot(u_trans,DS3X_train_trans)
b = np.ones((3,2000))
b[1:,:] = DS3X_train_reduced
print "\n"
DS3X_train_reduced = b
DS3X_train_reduced_trans = zip(*DS3X_train_reduced)
temp = np.dot(DS3X_train_reduced,DS3X_train_reduced_trans)
try:
    inv_temp = np.linalg.inv(temp)
except np.linalg.LinAlgError:
    pass
else:
    psue_inv = np.dot(inv_temp,DS3X_train_reduced)
    print psue_inv.shape
weight = np.dot(psue_inv,DS3Y_train)
weight_trans = zip(weight)
print weight_trans
DS3X_test_trans = zip(*DS3X_test)
DS3X_test_reduced = np.dot(u_trans,DS3X_test_trans)
b = np.ones((3,400))
b[1:,:] = DS3X_test_reduced
print "\n"
print b
DS3X_test_reduced = b
print DS3X_test_reduced.shape
DS3X_test_reduced_trans = zip(*DS3X_test_reduced)
DS3Y_pred = np.dot(DS3X_test_reduced_trans,weight_trans)
print DS3Y_pred
print DS3Y_pred.shape
DS3Y_pred_trans = zip(DS3Y_pred)
print repr(DS3Y_pred_trans[0])
for i,val in enumerate(DS3Y_pred_trans):
    if val < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
print DS3Y_pred
now regression using indicator variable and graph plottings
Your values are not numbers. In Python 2 numbers sort before other objects, so when comparing val with 1.5, the comparison is always false.
You probably have strings:
>>> '1.0' < 1.5
False
>>> 1.0 < 1.5
True
If so, convert your values to floats first:
for i, val in enumerate(DS3Y_pred_trans):
    if float(val) < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
It could be that you are storing other objects in the list still; you'll need to take a close look at what is actually in the list and adjust your code accordingly, or fix how the list is created in the first place.
Since you are replacing all values anyway, you could use a list comprehension:
DS3Y_pred_trans = [1 if float(val) < 1.5 else 2 for val in DS3Y_pred_trans]
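One candidate for the "other objects" mentioned above: the question builds the list with zip(DS3Y_pred), and zip wraps each element in a one-element tuple. The following is a minimal Python 3 sketch of that situation, not the asker's actual data. The sample values and the helper name `threshold` are made up for illustration. Note that Python 3 raises a TypeError when a tuple is compared with a float, instead of silently returning False as Python 2 does.

```python
# Hypothetical predictions standing in for DS3Y_pred.
preds = [1.2, 1.7, 0.9, 2.3]

# zip() over a single sequence wraps every element in a 1-tuple.
wrapped = list(zip(preds))  # [(1.2,), (1.7,), (0.9,), (2.3,)]


def threshold(values, cutoff=1.5):
    """Map each value to label 1 (below cutoff) or 2 (otherwise)."""
    labels = []
    for val in values:
        # Unwrap the 1-tuple produced by zip() before comparing.
        if isinstance(val, tuple):
            val = val[0]
        labels.append(1 if float(val) < cutoff else 2)
    return labels


labels = threshold(wrapped)
print(labels)
```

Unwrapping before the comparison keeps the check numeric; skipping the zip call and thresholding the original sequence directly would avoid the problem altogether.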
I have an array like this:
A = [0,0,3,6,6,7,8,11,11,22]
And I want to remove the elements that appear an even number of times inside the array, so you get:
res = [3,7,8,22]
Is this possible using numpy?
Actually you don't need numpy for this type of array manipulation. I tried using pure python.
def removeevencountelements(listarg) :
    minelement = min(listarg)
    maxelement = max(listarg)
    uniqueelementsset = set(listarg)
    outputlist = [ ]
    for i in range(0 , len(uniqueelementsset)) :
        if ((listarg.count((list(uniqueelementsset)[i]))) % 2 == 1) :
            for i2 in range((listarg.count((list(uniqueelementsset)[i])))) :
                outputlist.append((list(uniqueelementsset)[i]))
    return outputlist
A = [1,1,1,2,2,3,5,7,8,9,10,10,10,10,12,12,12,15,1]
print(removeevencountelements(A))
Another way to implement it, using a dictionary (looping at most twice over all elements):
def f1(arr):
    result = []
    counter_dict = {}
    for num in arr:
        if num in counter_dict:
            counter_dict[num] += 1
        else:
            counter_dict[num] = 1
    for key in counter_dict:
        if counter_dict[key] % 2 == 1:
            result.append(key)
    return result
edit: If you need to keep all original appearances of the array then this works:
def f2(arr):
    result = []
    counter_dict = {}
    for num in arr:
        if num in counter_dict:
            counter_dict[num] += 1
        else:
            counter_dict[num] = 1
    for key in counter_dict:
        if counter_dict[key] % 2 == 1:
            result.extend([key]*counter_dict[key])
    return result
input: A = [0,0,3,6,6,7,8,11,11,22,22,22]
output f1: [3,7,8,22]
output f2: [3,7,8,22,22,22]
Here is a very simple way to accomplish this:
resultArray = [a for a in arr if arr.count(a) % 2 != 0]
where 'arr' is your original array.
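The list comprehension above calls arr.count once per element, which rescans the whole list each time. A minimal sketch of the same idea with collections.Counter, which counts everything in one pass, is shown here. The function name `keep_odd_counts` is made up for illustration.

```python
from collections import Counter


def keep_odd_counts(arr):
    """Keep, in original order, every element whose total count is odd."""
    counts = Counter(arr)  # one pass over the list to count each value
    return [x for x in arr if counts[x] % 2 == 1]


A = [0, 0, 3, 6, 6, 7, 8, 11, 11, 22]
print(keep_odd_counts(A))  # [3, 7, 8, 22]
```

Like f2 above, this keeps every occurrence of a value that appears an odd number of times, so an input ending in 22, 22, 22 would keep all three.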
import os
import time
not_command_error = "error: not an command"
empty_error = "error: file empty"
def read(file):
    e = open(file, "r")
    b = e.readlines()
    e.close()
    return b
code_file = "put your code here"
e = read(code_file)
if e == []:
    print("\033[0;37;41m",empty_error)
    time.sleep(90000)
count = len(e)
print(count)
g = 0
l = 0
while True:
    l+= 1
    t = open(code_file, "r")
    y = t.readlines(l)
    t.close()
    k = len(y)
    print(y[k])
    u = y[k]
    g+= 1
    if count == g:
        break
This is my code, and I get an index out of range error. Any help?
I tried changing the format and it still didn't work.
Am I using the variable incorrectly?
This part of your code will throw an index out of range error:
k = len(y)
print(y[k])
Indices for lists in Python go from 0 to len(x) - 1, so to access the last element, k should equal len(y) - 1.
Even better (thanks, @MarkK!), you can use negative indices to access the end of the list:
print(y[-1])
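The indexing rule can be checked with a small self-contained sketch. The list `lines` below is a made-up stand-in for what readlines() would return, not the asker's file.

```python
# Hypothetical file contents, as readlines() would return them.
lines = ["first\n", "second\n", "third\n"]

k = len(lines)  # 3, but the valid indices are 0, 1 and 2
try:
    lines[k]  # one past the end of the list
    raised = False
except IndexError:
    raised = True

last_by_length = lines[len(lines) - 1]  # last element via len() - 1
last_by_negative = lines[-1]            # same element via a negative index

# Iterating directly avoids manual index bookkeeping altogether.
for number, line in enumerate(lines, start=1):
    print(number, line.rstrip("\n"))
```

Looping over the list itself, as in the last two lines, also removes the need to reopen the file on every pass.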
I'm working on this python problem:
Given a sequence of the DNA bases {A, C, G, T}, stored as a string, returns a conditional probability table in a data structure such that one base (b1) can be looked up, and then a second (b2), to get the probability p(b2 | b1) of the second base occurring immediately after the first. (Assumes the length of seq is >= 3, and that the probability of any b1 and b2 which have never been seen together is 0. Ignores the probability that b1 will be followed by the end of the string.)
You may use the collections module, but no other libraries.
However I'm running into a roadblock:
word = 'ATCGATTGAGCTCTAGCG'
def dna_prob2(seq):
    tbl = dict()
    levels = set(word)
    freq = dict.fromkeys(levels, 0)
    for i in seq:
        freq[i] += 1
    for i in levels:
        tbl[i] = {x:0 for x in levels}
    lastlevel = ''
    for i in tbl:
        if lastlevel != '':
            tbl[lastlevel][i] += 1
        lastlevel = i
    for i in tbl:
        print(i,tbl[i][i] / freq[i])
    return tbl
tbl['T']['T'] / freq[i]
Basically, the end result is supposed to support a lookup like the final line above. However, when I try that with print(i, tbl[i][i] / freq[i]) and run dna_prob2(word), I get 0.0 for everything.
Wondering if anyone here can help out.
Thanks!
I am not sure what it is your code is doing, but this works:
def makeprobs(word):
    singles = {}
    probs = {}
    thedict={}
    ll = len(word)
    for i in range(ll-1):
        x1 = word[i]
        x2 = word[i+1]
        singles[x1] = singles.get(x1, 0)+1.0
        thedict[(x1, x2)] = thedict.get((x1, x2), 0)+1.0
    for i in thedict:
        probs[i] = thedict[i]/singles[i[0]]
    return probs
I finally got back to my professor. This is what it was trying to accomplish:
word = 'ATCGATTGAGCTCTAGCG'
def dna_prob2(seq):
    tbl = dict()
    levels = set(seq)
    freq = dict.fromkeys(levels, 0)
    # Count how often each base occurs.
    for i in seq:
        freq[i] += 1
    for i in levels:
        tbl[i] = {x:0 for x in levels}
    lastlevel = ''
    # tbl[b1][b2] counts how often b2 immediately follows b1.
    for i in seq:
        if lastlevel != '':
            tbl[lastlevel][i] += 1
        lastlevel = i
    return tbl, freq
condfreq, freq = dna_prob2(word)
# p(b2 | b1) divides by the count of the first base, b1.
print(condfreq['T']['T']/freq['T'])
print(condfreq['G']['A']/freq['G'])
print(condfreq['C']['G']/freq['C'])
Hope this helps.
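For comparison, here is a minimal sketch that returns the nested lookup table the assignment describes, using only the collections module. The function name `dna_prob_table` is made up for illustration. It assumes that each row should be normalised by the number of times b1 is followed by another base, so the final character of the sequence does not count towards any denominator.

```python
from collections import Counter, defaultdict


def dna_prob_table(seq):
    """Return table[b1][b2] = p(b2 | b1) for adjacent bases in seq."""
    pair_counts = defaultdict(Counter)
    # Walk over adjacent pairs (b1, b2); the last base starts no pair.
    for b1, b2 in zip(seq, seq[1:]):
        pair_counts[b1][b2] += 1

    bases = sorted(set(seq))
    table = {}
    for b1 in bases:
        total = sum(pair_counts[b1].values())
        # Pairs that were never seen get probability 0.
        table[b1] = {
            b2: (pair_counts[b1][b2] / total if total else 0.0)
            for b2 in bases
        }
    return table


word = 'ATCGATTGAGCTCTAGCG'
table = dna_prob_table(word)
print(table['T']['T'])  # 0.2
print(table['A']['T'])  # 0.5
```

In this sequence T is followed by another base five times and by T once, which gives 0.2. Each row of the table sums to 1 as long as the base is followed by something at least once.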
what does this error mean?
this is my code:
import csv
from statistics import mean
averages = list()
sorted_averages = list()
dic = dict()
with open('first.csv') as fopen:
    reader = csv.reader(fopen)
    for line in reader:
        name = line[0]
        line = line[1:]
        counter = 0
        for i in line:
            i = float(i)
            line[counter] = i
            counter += 1
        average = mean(line)
        averages.append(average)
        dic[name] = average
for i in range(0, len(averages)):
    maxi = 0
    maxi1 = 0
    for number in averages:
        if number > maxi:
            maxi = number
        elif number == maxi:
            maxi = number
            maxi1 = number
        else:
            maxi = maxi
    sorted_averages.append(maxi)
    averages.remove(maxi)
del(averages)
insorted_averages = sorted_averages.reverse()
for z in insorted_averages[:3]:
    print(z)
I have sorted my list from max to min. Now I want to print the 3 worst averages, but I get that error. I also tried changing 3 to -4, but that didn't work either.
.reverse() reverses your list in-place and returns None:
sorted_averages = list(range(3))
insorted_averages = sorted_averages.reverse()
print(insorted_averages)
insorted_averages is now None. sorted_averages itself has been reversed, though.
See note (4) in the Python documentation under mutable sequence types.
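A short sketch of the behaviour and two common alternatives is shown here. The sample averages are made up for illustration.

```python
# Hypothetical averages standing in for the values read from the CSV file.
averages = [12.5, 18.0, 9.75, 15.25]

descending = sorted(averages, reverse=True)  # new list, max to min
returned = descending.reverse()              # reverses in place, returns None

# After the in-place reverse, `descending` actually runs from min to max,
# so the three lowest averages are simply its first three items.
lowest_three = descending[:3]

# Alternatives that return a new list instead of None:
lowest_three_sorted = sorted(averages)[:3]
reversed_copy = descending[::-1]

for z in lowest_three:
    print(z)
```

Both sorted() and slicing with [::-1] build a new list, so their result can be assigned and sliced safely.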
I wrote the following piece of code:
def all_di(fl):
    dmm = {}
    for k in range(2):
        for i in fl:
            for m in range (len(i)-1):
                temp = i[m:m+k+1]
                if temp in dmm:
                    dmm[temp] += 1.0
                else:
                    dmm[temp] = 1.0
    ## return dmm
    p = raw_input("Enter a 2 AA long seq:")
    sum = 0
    for x,y in dmm.iteritems():
        if x == p:
            n1 = y
    for l,m in dmm.iteritems():
        if l[0] == p[0]:
            sum = sum + m
    print float(n1)/float(sum)
all_di(inh)
if inh = {'VE':16,'GF':19,'VF':23,'GG' :2}
The code works as follows:
Enter a 2 AA long seq: VE
result will be = 16/(16+23) = 0.41
How it works: the function searches the dictionary dmm for the key similar to the one entered in input (example taken here 'VE'). It stores its value and then searches for all the key-value pairs that have the 1st letter in common and adds all its values and returns a fraction.
VE = 16
**V**E + **V**F = 39
= 16/39 = 0.41
What I want: keeping the function intact, I want to have a secondary dictionary that iterates for every key-value pair in the dictionary and stores the fractional values of it in a different dictionary such that:
new_dict = {'VE' : 0.41, 'GF':0.90,'VF':0.59, 'GG': 0.09}
I don't want to remove the print statement as it is the output for my program. I however need the new_dict for further work.
def all_di(fl,p=0):
    dmm = {}
    interactive = p == 0
    if interactive:
        p = raw_input("Enter a 2 AA long seq:")
    if p in fl:
        numer = fl[p]
        denom = 0.0
        for t in fl:
            if t[0] == p[0]:
                denom = denom + fl[t]
        if interactive:
            print numer / denom
        return numer / denom
inh = {'VE':16,'GF':19,'VF':23,'GG' :2}
all_di(inh)
new_dict = {x:all_di(inh, x) for x in inh}
print new_dict
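The answer above rescans the whole dictionary for every key. As an alternative, here is a minimal Python 3 sketch that first totals the counts per first letter and then builds every fraction in one comprehension. The function name `fractions_by_first_letter` is made up for illustration, and the interactive prompt is left out.

```python
from collections import defaultdict


def fractions_by_first_letter(counts):
    """Divide each count by the total of all keys sharing its first letter."""
    totals = defaultdict(float)
    for key, value in counts.items():
        totals[key[0]] += value  # e.g. 'VE' and 'VF' both add to totals['V']
    return {key: value / totals[key[0]] for key, value in counts.items()}


inh = {'VE': 16, 'GF': 19, 'VF': 23, 'GG': 2}
new_dict = fractions_by_first_letter(inh)
print({key: round(value, 2) for key, value in new_dict.items()})
```

For 'VE' this gives 16 / (16 + 23), the same fraction as the worked example in the question.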