Iterate through a file and an array simultaneously in a Python for loop

I am trying to iterate through a file and replace one of its existing columns with a new column, using two concurrent for loops. But I don't know how to iterate over the array part.
I have an array aa=[1,2,3,4,5]
My file is:
I a 0
II b 0
III c 0
IV d 0
V f 0
I want it like:
I a 1
II b 2
III c 3
IV d 4
V f 5
I tried this Python code:
cmg=[1,2,3,4,5]
fh=open("plink5.map",'r')
fhnew=open("plink5.out",'w+')
for line,i in zip(fh,(0,len(cmg)-1,1)):
    line=line.strip('\n')
    aa=line.split('\t')
    aanew=str(aa[0])+"\t"+str(aa[1])+"\t"+str(cmg[i])
    print(aanew)
    fhnew.write(aanew)
fh.close()
fhnew.close()
I get an error in the array iteration part.

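What you were trying to do is the following (you left out the call to range, and since range already stops before its end value, the -1 is not needed either):
for line, i in zip(fh, range(0, len(cmg), 1)):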
But it would be easier to zip the file with the list directly, and then use x in place of cmg[i] inside the loop:
for line, x in zip(fh, cmg):
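A minimal end-to-end sketch of that approach, assuming the columns are tab-separated (as the question's split('\t') suggests, although the sample file is shown with spaces) and that the last column is the one to replace. io.StringIO stands in for the real plink5.map and plink5.out files so the sketch runs on its own, and replace_last_column is a hypothetical helper name. It also writes the trailing newline that fhnew.write(aanew) in the question leaves out.

```python
import io


def replace_last_column(lines, values, sep="\t"):
    """Yield each line with its last field replaced by the matching value."""
    # zip stops at the shorter input, so extra lines or extra values are ignored.
    for line, value in zip(lines, values):
        fields = line.rstrip("\n").split(sep)
        fields[-1] = str(value)
        yield sep.join(fields) + "\n"  # keep one record per line


cmg = [1, 2, 3, 4, 5]
# Stand-ins for open("plink5.map") and open("plink5.out", "w").
fh = io.StringIO("I\ta\t0\nII\tb\t0\nIII\tc\t0\nIV\td\t0\nV\tf\t0\n")
fhnew = io.StringIO()
fhnew.writelines(replace_last_column(fh, cmg))
print(fhnew.getvalue(), end="")
```

With real files, the same call works inside a with open("plink5.map") as fh, open("plink5.out", "w") as fhnew: block, which also closes both files for you.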

Related

Python nested loop - table index as variable

I am not a Python programmer, but I need to use a function from the SciPy library. I just want to repeat the inner loop a couple of times, but with a different column index each time. Here is my code for now:
from scipy.stats import pearsonr
fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []
for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))
And I am getting the following output:
0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)
As you can see, the index is changing, but the result of the function is always the same as when the index is 0.
When I run the script several times, changing the index value by hand each time, like this:
attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))
everything is OK and I get the correct results, but I can't do it in a single loop. What am I doing wrong?
EDIT:
Test file:
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
Expected results of pearsonr for 10 script runs (data[0] to data[9]):
data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)
Turn each line of the file into a list of floats
data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        line = [float(item) for item in line[:11]]
        data.append(line)
Transpose the data so that each list in data has the column values from the original file. data --> [[column 0 items], [column 1 items],[column 2 items],...]
data = zip(*data) # for Python 2.7x
#data = list(zip(*data)) # for python 3.x
Correlate:
for n in [0,1,2,3,4,5,6,7,8,9]:
    corr = pearsonr(data[n], data[10])
    print('data[{}], {}'.format(n, corr))
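Neither answer says why the original loop keeps printing the index-0 result. fileName.readlines() consumes the whole file on the first pass of the outer loop, so on every later pass it returns an empty list, the inner loop body never runs, and pearsonr keeps seeing the column-0 values (the two lists are also never reset between indices). A minimal sketch of the read-once pattern, using three rows of the question's test file in an io.StringIO as a stand-in for ILPDataset.txt; the pearsonr call is left as a comment so the sketch runs without SciPy.

```python
import io

# Stand-in for open('ILPDataset.txt'): three rows from the question's test file.
sample = io.StringIO(
    "62,1,6.8,3,542,116,66,6.4,3.1,0.9,1\n"
    "40,1,1.9,1,231,16,55,4.3,1.6,0.6,1\n"
    "63,1,0.9,0.2,194,52,45,6,3.9,1.85,2\n"
)

rows = sample.readlines()      # reads to the end of the file
leftover = sample.readlines()  # [], which is what the inner loop saw on passes 1 to 9

# Parse every line once, then build a fresh column list for each index.
table = [[float(item) for item in line.split(",")] for line in rows]
class_value = [row[10] for row in table]
for index in range(10):
    attribute_value = [row[index] for row in table]  # new list on every pass
    # print(index, pearsonr(attribute_value, class_value))
    print(index, attribute_value)
```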
wwii's answer is very good.
Only one suggestion: list(zip(*data)) seems a bit of overkill to me. zip is really meant for combining sequences of possibly different types, and potentially different lengths, into tuples, which in this case then have to be turned back into a list with list().
So why not just use the simple transpose operation, which is what this is?
import numpy
# ...
data = numpy.transpose(data)
which does the same job, probably faster (I have not measured it), and gives a single 2-D array rather than a list of tuples.
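A small comparison of the two transposes on a few values from the test file, assuming NumPy is installed. One difference worth knowing: zip keeps the original element types and returns tuples, while numpy.transpose first converts the nested list into one array with a single common dtype (float here, because of 6.8). Either way, data[n] then gives column n.

```python
import numpy as np

rows = [[62, 1, 6.8], [40, 1, 1.9]]  # first three columns of two test rows

via_zip = list(zip(*rows))           # [(62, 40), (1, 1), (6.8, 1.9)]
via_numpy = np.transpose(rows)       # 2-D float array with shape (3, 2)

print(via_zip[0], via_numpy[0])
print(via_numpy.shape)
```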

ValueError: math domain error While Using Logarithms

I am currently working on code to find a value C, which I will then compare against other parameters. However, whenever I try to run my code I receive this error: ValueError: math domain error. I am unsure why I am receiving this error, though I think it's how I set up my equation. Maybe there is a better way to write it. This is my code:
import os
import math
path = r"C:\Users\Documents\Research_Papers"
uH2 =5
uHe = 3
eH2 = 2
eHe = 6
R = ((uH2*eH2)/(uHe*eHe))
kH2=[]
kHe=[]
print(os.getcwd()) # see where you are
os.chdir(path) # use a raw string so the backslashes are ok
print(os.getcwd()) # convince yourself that you're in the right place
print(os.listdir(path)) # make sure the file is in here
myfile=open("hcl#hfs.dat.txt","r")
lines=myfile.readlines()
for x in lines:
    kH2.append(x.split(' ')[1])
    kHe.append(x.split(' ')[0])
myfile.close()
print kH2
print kHe
g = len(kH2)
f = len(kHe)
print g
print f
for n in range(0,7):
    C = (((math.log(float(kH2[n]),10)))-(math.log(float(kHe[n]),10)))/math.log(R,10)
    print C
It then reports a domain error on this line:
C = (((math.log(float(kH2[n]),10)))-(math.log(float(kHe[n]),10)))/math.log(R,10)
ValueError: math domain error
Also, for the text file, I am just using six rows of made-up numbers for now, as I am trying to get my code working before I put in the real data. The numbers I am using are:
5 10 4 2
6 20 1 2
7 30 4 2
8 40 3 2
9 23 1 2
4 13 6 2
Check that the value inside each log is positive: passing a non-positive value to a log function is a domain error.
Hope this helps.
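With the numbers in the question, the non-positive value is R rather than anything in the file. Under Python 2 (which the print statements indicate), (uH2*eH2)/(uHe*eHe) is 10/18 in integer division, which is 0, so math.log(R, 10) raises the domain error. In Python 2, from __future__ import division, or a float operand such as 5.0, gives 0.555... instead. The sketch below is written for Python 3: it uses // to reproduce the Python 2 result, and wraps the formula in a hypothetical log_ratio helper that checks each log argument before calling math.log10.

```python
import math

uH2, uHe, eH2, eHe = 5, 3, 2, 6

R_py2 = (uH2 * eH2) // (uHe * eHe)  # 10 // 18 == 0, what Python 2's "/" gives for ints
R = (uH2 * eH2) / (uHe * eHe)       # true division: 0.555...


def log_ratio(k_h2, k_he, r):
    """C = (log10(kH2) - log10(kHe)) / log10(R), checking the log arguments first."""
    for name, value in (("kH2", k_h2), ("kHe", k_he), ("R", r)):
        if value <= 0:
            raise ValueError(f"{name} must be positive to take its log, got {value!r}")
    return (math.log10(k_h2) - math.log10(k_he)) / math.log10(r)


print(R_py2)                    # 0
print(log_ratio(10.0, 5.0, R))  # first sample row: kH2 = 10, kHe = 5
```

Once R is fixed, note that range(0, 7) runs past the six rows of the sample file and will raise an IndexError; range(len(kH2)) avoids that.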

Converting a file of numbers into a list

To put it simply, I am trying to read a file that will eventually contain nothing but numbers, separated by spaces, commas, or newlines. I have read through a lot of these posts and fixed some things; I learned that the numbers are read in as strings first. However, I am running into an issue where each number is imported as a list, so now I have a list of lists. That would be fine, except that I can't compare its items against ints or add numbers to it. The idea is to have each user assigned a number and then saved. I'm not worrying about saving right now; I'm just worried about importing the numbers and being able to use them as individual numbers.
My code thus far:
fo1 = open('mach_uID_3.txt', 'a+')
t1 = fo1.read()
t2 = []
print t1
for x in t1.split():
    print x
    z = [int(n) for n in x.split()]
    t2.append(z)
print t2
print t2[3]
fo1.close()
And the file it's reading is:
0 1 2 25
34
23
My results are pretty ugly, but here you go:
0 1 2 25
34
23
0
1
2
25
34
23
[[0], [1], [2], [25], [34], [23]]
[25]
Process finished with exit code 0
Use extend instead of append:
t2.extend(int(n) for n in x.split())
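To see the difference, here is the question's loop run on the file contents shown above (held in a string so the sketch is self-contained), collecting into one list with append and another with extend:

```python
t1 = "0 1 2 25\n34\n23\n"  # stand-in for the contents of mach_uID_3.txt

t2_append, t2_extend = [], []
for x in t1.split():                # x is one token, e.g. "25"
    z = [int(n) for n in x.split()]  # z is a one-element list, e.g. [25]
    t2_append.append(z)             # adds the list itself, so the result is nested
    t2_extend.extend(z)             # adds the list's items, so the result is flat

print(t2_append)         # [[0], [1], [2], [25], [34], [23]]
print(t2_extend)         # [0, 1, 2, 25, 34, 23]
print(t2_extend[3] + 1)  # 26, plain ints support arithmetic
```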
To have all the numbers in a single, flattened list, do this:
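fo1 = open('mach_uID_3.txt', 'a+')
fo1.seek(0)  # 'a+' may open at the end of the file (it does in Python 3), so rewind before reading
number_list = list(map(int, fo1.read().split()))
fo1.close()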
But it's better to open the file like this:
with open('mach_uID_3.txt', 'a+') as fo1:
    fo1.seek(0)  # rewind: 'a+' may open at the end of the file
    number_list = list(map(int, fo1.read().split()))
so you don't have to explicitly close it.
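The question mentions that the file may also use commas as separators. str.split() with no argument only splits on whitespace, so a token such as "1,2" would make int() fail. One option, sketched here with a hypothetical parse_numbers helper, is to split on any run of commas and whitespace with re.split:

```python
import re


def parse_numbers(text):
    """Return every integer in text, whether separated by spaces, commas or newlines."""
    # Split on any run of commas and/or whitespace; drop empty pieces at the ends.
    return [int(tok) for tok in re.split(r"[,\s]+", text) if tok]


print(parse_numbers("0 1 2 25\n34\n23"))  # [0, 1, 2, 25, 34, 23]
print(parse_numbers("5, 6,7\n8\n"))       # [5, 6, 7, 8]
```

The text would come from fo1.read() exactly as in the answer above.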

How to print results from several lists on matching lines in Python

I have three lists:
alist=[1,2,3,4,5]
blist=['a','b','c','d','e']
clist=['#','#','$','&','*']
I want my output in this format:
1 2 3 4 5
a b c d e
# # $ & *
I can print them in the correct format, but when the lists have many elements, the output wraps like this:
1 2 3 4 5 6 ..........................................................................
................................................................................
a b c d e ............................................................................
......................................................................................
# # $ & * .............................................................................
.......................................................................................
but I want my output like this:
12345....................................................................
abcde...................................................................
##$&*...................................................................
............................................................... {this line is from alist}
................................................................ {this line is from blist}
................................................................ {this line is from clist}
Try the following:
term_width = 80
all_lists = (alist, blist, clist)
length = max(map(len, all_lists))
for offset in xrange(0, length, term_width):
    print '\n'.join(''.join(map(str, l[offset:offset+term_width])) for l in all_lists)
This assumes the terminal width is 80 characters, which is a common default. You might want to detect its actual width with the curses library or something based on it.
Either way, to adapt to a different output width you only need to change the term_width value.
It also assumes all elements are one character long. If that's not the case, please clarify.
If you need to detect terminal width, you may find some solutions here: How to get Linux console window width in Python
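For Python 3, a sketch of the same approach: xrange and the print statement become range and print(), and shutil.get_terminal_size (in the standard library since Python 3.3) stands in for the curses-based detection mentioned above. It keeps the one-character-per-element assumption, and wrap_rows is a hypothetical helper name.

```python
import shutil


def wrap_rows(lists, width=None):
    """Return the lists as rows of text, all wrapped at the same column."""
    if width is None:
        # get_terminal_size falls back to 80x24 when the size can't be determined.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    rows = ["".join(map(str, items)) for items in lists]  # assumes 1-char items
    length = max(map(len, rows))
    blocks = []
    for offset in range(0, length, width):
        # One block per width-sized slice: that slice of every row, one per line.
        blocks.append("\n".join(row[offset:offset + width] for row in rows))
    return "\n".join(blocks)


alist = [1, 2, 3, 4, 5]
blist = ['a', 'b', 'c', 'd', 'e']
clist = ['#', '#', '$', '&', '*']
print(wrap_rows((alist, blist, clist)))           # fits in one block
print(wrap_rows((alist, blist, clist), width=3))  # forces wrapping
```

Passing width explicitly is also handy when the output is redirected to a file, where there is no terminal to measure.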

How should I use Numpy's vstack method?

Firstly, here is the relevant part of the code:
stokes_list = np.zeros(shape=(numrows,1024)) # 'numrows' defined earlier
for i in range(numrows):
    epoch_name = y['filename'][i] # 'y' is an array from earlier
    os.system('pdv -t {0} > temp.txt '.format(epoch_name)) # 'pdv' is a command from another piece of software - here I copy the output into a temporary file
    stokes_line = np.genfromtxt('temp.txt', usecols=3, dtype=[('stokesI','float')], skip_header=1)
    stokes_list = np.vstack((stokes_line,stokes_line))
So, basically, every time the code loops around, stokes_line pulls one of the columns (4th one) from the file temp.txt, and I want it to add a line to stokes_list each time.
For example, if the first stokes_line is
1.1 2.2 3.3
and the second is
4.4 5.5 6.6
then stokes_list will be
1.1 2.2 3.3
4.4 5.5 6.6
and will keep growing...
It's not working at the moment, because I think that the line:
stokes_list = np.vstack((stokes_line,stokes_line))
is not correct. It's only stacking 2 lists - which makes sense as I only have 2 arguments. I basically would like to know how I keep stacking again and again.
Any help would be very gratefully received!
In case it's needed, here is an example of the format of the temp.txt file:
File: t091110_065921.SFTC Src: J1903+0925 Nsub: 1 Nch: 1 Npol: 4 Nbin: 1024 RMS: 0.00118753
0 0 0 0.00148099 -0.00143755 0.000931365 -0.00296775
0 0 1 0.000647476 -0.000896698 0.000171287 0.00218597
0 0 2 0.000704697 -0.00052846 -0.000603842 -0.000868739
0 0 3 0.000773361 -0.00234724 -0.0004112 0.00358033
0 0 4 0.00101559 -0.000691062 0.000196023 -0.000163109
0 0 5 -0.000220367 -0.000944024 0.000181002 -0.00268215
0 0 6 0.000311783 0.00191545 -0.00143816 -0.00213856
Calling vstack again and again is not a good idea, because each call copies the whole array.
Instead, create a normal Python list, .append to it, and then pass the whole list to np.vstack to create the new array once.
stokes_list = []
for i in xrange(numrows):
    ...
    stokes_line = ...
    stokes_list.append(stokes_line)
big_stokes = np.vstack(stokes_list)
You already know the final size of the stokes_list array since you know numrows. So it seems you don't need to grow an array (which is very inefficient). You can simply assign the correct row at each iteration.
Simply replace your last line with:
stokes_list[i] = stokes_line
By the way, regarding your non-working line, I think you meant:
stokes_list = np.vstack((stokes_list, stokes_line))
where you're replacing stokes_list by its new value.
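Both suggestions can be checked on synthetic data. In this sketch, assuming NumPy is installed, the hypothetical fake_stokes_line stands in for the pdv plus np.genfromtxt step and returns a plain 1-D float array, and nbins replaces the 1024 bins to keep the output small. Collecting rows in a list and stacking once gives the same array as preallocating and assigning row by row:

```python
import numpy as np

numrows, nbins = 3, 4  # hypothetical sizes; the question uses numrows and 1024 bins


def fake_stokes_line(i):
    """Hypothetical stand-in for reading column 4 of temp.txt for epoch i."""
    return np.arange(nbins, dtype=float) + 10 * i


# Option 1: collect rows in a Python list, then stack once at the end.
rows = [fake_stokes_line(i) for i in range(numrows)]
stacked = np.vstack(rows)

# Option 2: preallocate the full array and fill one row per iteration.
stokes_list = np.zeros(shape=(numrows, nbins))
for i in range(numrows):
    stokes_list[i] = fake_stokes_line(i)

print(stacked.shape)                         # (3, 4)
print(np.array_equal(stacked, stokes_list))  # True
```

Preallocation avoids keeping the intermediate list, but it needs numrows up front; the list approach works even when the number of rows is not known in advance.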
