I am not a Python programmer, but I need to use a function from the SciPy library. I just want to repeat the inner loop several times, each time with a different column index. Here is my code so far:
from scipy.stats import pearsonr
fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []
for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))
And I am getting the following output:
0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)
As you can see, the index changes, but the function's result is always the same, as if the index were 0.
When I run the script several times, changing the index by hand each time, like this:
attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))
everything is OK and I get correct results, but I can't do it in a single loop. What am I doing wrong?
EDIT:
Test file:
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
Expected results of pearsonr for the 10 script runs:
data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)
Turn each line of the file into a list of floats
data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        # keep the 10 attribute columns plus the class column
        line = [float(item) for item in line[:11]]
        data.append(line)
Transpose the data so that each list in data has the column values from the original file. data --> [[column 0 items], [column 1 items],[column 2 items],...]
data = zip(*data) # for Python 2.7x
#data = list(zip(*data)) # for python 3.x
Correlate:
for n in [0,1,2,3,4,5,6,7,8,9]:
    corr = pearsonr(data[n], data[10])
    print('data[{}], {}'.format(n, corr))
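As for why the original loop printed the same pair ten times: a file object can be read through only once unless it is rewound, so fileName.readlines() returns an empty list on every pass after the first, and the two lists are never reset between passes. Here is a minimal sketch of that behaviour, using an in-memory io.StringIO as a stand-in for the real file (the two rows are copied from the test file in the question):

```python
import io

# In-memory stand-in for ILPDataset.txt (assumption: two rows from the test file)
fake_file = io.StringIO(
    "62,1,6.8,3,542,116,66,6.4,3.1,0.9,1\n"
    "40,1,1.9,1,231,16,55,4.3,1.6,0.6,1\n"
)

first_pass = fake_file.readlines()   # consumes the whole stream
second_pass = fake_file.readlines()  # nothing left to read

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# Rewinding makes the lines available again
fake_file.seek(0)
third_pass = fake_file.readlines()
print(len(third_pass))   # 2
```

Reading the file once into a list, as in the answer above, avoids the problem without any rewinding.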
@wwii's answer is very good.
Just one suggestion: list(zip(*data)) seems like overkill to me. zip is a general-purpose tool that combines items from any iterables, possibly of different types and lengths, into tuples, which here then have to be turned back into a list with list().
So why not just use a plain transpose, which is what this operation really is?
import numpy
# ...
data = numpy.transpose(data)
This does the same job, probably faster (not measured) and more deterministically.
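One practical difference worth checking before choosing between the two: zip(*rows) yields tuples and silently truncates to the shortest row, so a malformed line can go unnoticed. A small sketch with made-up rows (the values are illustrative, not taken from the dataset):

```python
rows = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
ragged = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0],        # one value missing
]

# Regular input: each column comes back as a tuple
columns = list(zip(*rows))
print(columns)    # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]

# Ragged input: the third column is dropped without any error
truncated = list(zip(*ragged))
print(truncated)  # [(1.0, 4.0), (2.0, 5.0)]
```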
To put it simply, I am trying to read a file that will eventually contain nothing but numbers, separated by spaces, commas, or newlines. I have read through a lot of these posts and fixed some things, and I learned that the values are read in as strings first. However, I am running into an issue where the numbers are imported as lists, so now I have a list of lists. That would be fine, except that I can't compare the entries with ints or add numbers to them. The idea is to have each user assigned a number, which is then saved. I'm not worrying about saving right now; I'm just concerned with importing the numbers and being able to use them as individual numbers.
My code thus far:
fo1 = open('mach_uID_3.txt', 'a+')
t1 = fo1.read()
t2 = []
print t1
for x in t1.split():
    print x
    z = [int(n) for n in x.split()]
    t2.append(z)
print t2
print t2[3]
fo1.close()
And the file it is reading is:
0 1 2 25
34
23
My results are pretty ugly, but here you go:
0 1 2 25
34
23
0
1
2
25
34
23
[[0], [1], [2], [25], [34], [23]]
[25]
Process finished with exit code 0
Use extend instead of append:
t2.extend(int(n) for n in x.split())
To have all the numbers in a single, flattened list, do this:
fo1 = open('mach_uID_3.txt', 'a+')
number_list = list(map(int, fo1.read().split()))
fo1.close()
But it's better to open the file like this:
with open('mach_uID_3.txt', 'a+') as fo1:
    number_list = list(map(int, fo1.read().split()))
so you don't have to explicitly close it.
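To see the difference between append and extend without touching a file, here is a small sketch that parses the same content from a string (the variable names nested and flat are only for illustration):

```python
text = "0 1 2 25\n34\n23\n"  # same content as the sample file

nested = []
flat = []
for token in text.split():
    numbers = [int(n) for n in token.split()]
    nested.append(numbers)  # adds the list itself, giving a list of lists
    flat.extend(numbers)    # adds each number, giving a flat list

print(nested)        # [[0], [1], [2], [25], [34], [23]]
print(flat)          # [0, 1, 2, 25, 34, 23]
print(flat[3] + 1)   # 26, the items are plain ints

# One-step version, equivalent to the flat list above
number_list = list(map(int, text.split()))
print(number_list == flat)  # True
```

One thing to watch if you move to Python 3: a file opened with 'a+' starts positioned at the end, so read() returns an empty string unless you call seek(0) first. For reading only, mode 'r' is enough.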
I have three lists:
alist=[1,2,3,4,5]
blist=['a','b','c','d','e']
clist=['#','#','$','&','*']
I want my output in this format:
1 2 3 4 5
a b c d e
# # $ & *
I am able to print in the correct format, but when a list has many elements, the output actually looks like this:
1 2 3 4 5 6 ..........................................................................
................................................................................
a b c d e ............................................................................
......................................................................................
# # $ & * .............................................................................
.......................................................................................
but I want my output like this:
12345....................................................................
abcde...................................................................
##$&*...................................................................
............................................................... {this line is from alist}
................................................................ {this line is from blist}
................................................................ {this line is from clist}
Try the following:
term_width = 80
all_lists = (alist, blist, clist)
length = max(map(len, all_lists))
for offset in xrange(0, length, term_width):
    print '\n'.join(''.join(map(str, l[offset:offset+term_width])) for l in all_lists)
This assumes the terminal width is 80 characters, which is the default. You might want to detect its actual width with the curses library or something based on it.
Either way, to adapt to any output width you only need to change the term_width value, and the code will use it.
It also assumes every element is one character long. If that's not the case, please clarify.
If you need to detect terminal width, you may find some solutions here: How to get Linux console window width in Python
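For Python 3, the same idea can be sketched with the standard library's shutil.get_terminal_size, which falls back to 80 columns when the size cannot be detected. The helper name wrap_lists is made up for this example, and it still assumes one-character elements:

```python
import shutil

def wrap_lists(lists, width):
    """Return output lines: for each chunk of `width` items, one line per list."""
    length = max(map(len, lists))
    lines = []
    for offset in range(0, length, width):
        for lst in lists:
            lines.append(''.join(map(str, lst[offset:offset + width])))
    return lines

alist = [1, 2, 3, 4, 5]
blist = ['a', 'b', 'c', 'd', 'e']
clist = ['#', '#', '$', '&', '*']

# Guard against a reported width of 0 by falling back to 80
term_width = shutil.get_terminal_size().columns or 80
print('\n'.join(wrap_lists((alist, blist, clist), term_width)))

# A width of 3 makes the wrapping visible with these short lists
narrow = wrap_lists((alist, blist, clist), 3)
print(narrow)  # ['123', 'abc', '##$', '45', 'de', '&*']
```

Returning the lines instead of printing them directly keeps the chunking logic easy to check separately from the terminal detection.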
Firstly, here is the relevant part of the code:
stokes_list = np.zeros(shape=(numrows,1024)) # 'numrows' defined earlier
for i in range(numrows):
    epoch_name = y['filename'][i] # 'y' is an array from earlier
    os.system('pdv -t {0} > temp.txt '.format(epoch_name)) # 'pdv' is a command from another piece of software - here I copy the output into a temporary file
    stokes_line = np.genfromtxt('temp.txt', usecols=3, dtype=[('stokesI','float')], skip_header=1)
    stokes_list = np.vstack((stokes_line,stokes_line))
So, basically, every time the code loops around, stokes_line pulls one of the columns (4th one) from the file temp.txt, and I want it to add a line to stokes_list each time.
For example, if the first stokes_line is
1.1 2.2 3.3
and the second is
4.4 5.5 6.6
then stokes_list will be
1.1 2.2 3.3
4.4 5.5 6.6
and will keep growing...
It's not working at the moment, because I think that the line:
stokes_list = np.vstack((stokes_line,stokes_line))
is not correct. It's only stacking 2 lists - which makes sense as I only have 2 arguments. I basically would like to know how I keep stacking again and again.
Any help would be very gratefully received!
If it is needed, here is an example of the format of the temp.txt file:
File: t091110_065921.SFTC Src: J1903+0925 Nsub: 1 Nch: 1 Npol: 4 Nbin: 1024 RMS: 0.00118753
0 0 0 0.00148099 -0.00143755 0.000931365 -0.00296775
0 0 1 0.000647476 -0.000896698 0.000171287 0.00218597
0 0 2 0.000704697 -0.00052846 -0.000603842 -0.000868739
0 0 3 0.000773361 -0.00234724 -0.0004112 0.00358033
0 0 4 0.00101559 -0.000691062 0.000196023 -0.000163109
0 0 5 -0.000220367 -0.000944024 0.000181002 -0.00268215
0 0 6 0.000311783 0.00191545 -0.00143816 -0.00213856
Calling vstack again and again is not a good idea, because it copies the whole array each time.
Create a normal Python list, .append to it, and then pass the whole list to np.vstack to create the array once.
stokes_list = []
for i in xrange(numrows):
    ...
    stokes_line = ...
    stokes_list.append(stokes_line)
big_stokes = np.vstack(stokes_list)
You already know the final size of the stokes_list array since you know numrows. So it seems you don't need to grow an array (which is very inefficient). You can simply assign the correct row at each iteration.
Simply replace your last line with:
stokes_list[i] = stokes_line
By the way, about your non-working line, I think you meant:
stokes_list = np.vstack((stokes_list, stokes_line))
where you're replacing stokes_list with its new value.
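Both suggestions (collect the rows in a list and stack once, or preallocate and assign by row) can be compared in a small sketch. The function fake_stokes_line is a hypothetical stand-in for the np.genfromtxt call, the sizes are tiny for readability, and plain float rows are assumed rather than the structured dtype used in the question:

```python
import numpy as np

numrows, nbins = 3, 4  # small illustrative sizes (the question uses 1024 bins)

def fake_stokes_line(i):
    # Hypothetical stand-in for reading one column from temp.txt
    return np.arange(nbins, dtype=float) + 10 * i

# Option 1: collect the rows in a Python list, then stack once
rows = [fake_stokes_line(i) for i in range(numrows)]
stacked = np.vstack(rows)

# Option 2: preallocate the array and assign each row in place
filled = np.zeros((numrows, nbins))
for i in range(numrows):
    filled[i] = fake_stokes_line(i)

print(stacked.shape)                    # (3, 4)
print(np.array_equal(stacked, filled))  # True
```

Either way the full array is allocated only once, instead of being copied on every iteration.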