Python nested loop - table index as variable
I am not a Python programmer, but I need to use a method from the SciPy library. I just want to repeat the inner loop a few times, each time with a different table index. Here is my code so far:
from scipy.stats import pearsonr
fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []
for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))
And I am getting the following output:
0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)
As you can see, the index changes, but the result of the function is always as if the index were 0.
When I run the script several times, changing the index value by hand like this:
attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))
everything is OK and I get the correct results, but I can't do it in one loop statement. What am I doing wrong?
EDIT:
Test file:
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
Expected results of pearsonr for 10 script runs:
data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)
Turn each line of the file into a list of floats:

data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        line = [float(item) for item in line[:11]]
        data.append(line)
Transpose the data so that each list in data has the column values from the original file. data --> [[column 0 items], [column 1 items],[column 2 items],...]
data = zip(*data) # for Python 2.7x
#data = list(zip(*data)) # for python 3.x
Correlate:
for n in range(10):
    corr = pearsonr(data[n], data[10])
    print('data[{}] {}'.format(n, corr))
@wwii's answer is very good.
Just one suggestion: list(zip(*data)) seems like overkill to me. zip is really meant for composing lists of possibly different types and lengths into tuples, only to be turned back into a list here with list(). So why not use a plain transpose, which is what this operation is?

import numpy
# ...
data = numpy.transpose(data)

This does the same job, probably faster (not measured), and more predictably.
Related
Python input() does not read whole input data
I'm trying to read data from stdin; I'm pasting the values into cmd with Ctrl+C, Ctrl+V, but the process stops at some point, always the same one. The input file is a .in file; the first row is a single number and the next 3 rows contain sets of numbers separated by spaces. I'm using Python 3.9.9. The problem only occurs with longer files (sets with more than 10000 elements); with short input everything is fine. It seems like the memory just runs out. My approach:

def readData():
    # Read input
    for line in range(5):
        x = list(map(int, input().rsplit()))
        if(line == 0):
            nodes_num = x[0]
        if(line == 1):
            masses_list = x
        if(line == 2):
            init_seq_list = x
        if(line == 3):
            fin_seq_list = x
    return nodes_num, masses_list, init_seq_list, fin_seq_list

Data that works:

6
2400 2000 1200 2400 1600 4000
1 4 5 3 6 2
5 3 2 4 6 1

And the long input file: https://pastebin.com/atAcygkk . It stops at the sequence: ... 2421 1139 322], i.e. partway through the 4th row.
To read input from "standard input", you just need to use the stdin stream. Since your data is all on lines, you can read until the EOL delimiter instead of tracking lines yourself with an index number. This code works when run as python3.9 sowholeinput.py < atAcygkk.txt, or cat atAcygkk.txt | python3.9 sowholeinput.py.

import sys

def read_data():
    stream = sys.stdin
    num = int(stream.readline())
    masses = [int(t) for t in stream.readline().split()]
    init_seq = [int(t) for t in stream.readline().split()]
    fin_seq = [int(t) for t in stream.readline().split()]
    return num, masses, init_seq, fin_seq

Interestingly, it does not work, as you describe, when pasting the text with terminal cut-and-paste. That implies a limitation of that method, not of Python itself.
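One way to exercise a reader like this without a real terminal is to take the stream as a parameter, so the same function runs against sys.stdin in production and against an io.StringIO in a quick check. A sketch of that variation:

```python
import io
import sys

def read_data(stream=None):
    """Read the four-line problem input from the given stream (default: stdin)."""
    if stream is None:
        stream = sys.stdin
    num = int(stream.readline())
    masses = [int(t) for t in stream.readline().split()]
    init_seq = [int(t) for t in stream.readline().split()]
    fin_seq = [int(t) for t in stream.readline().split()]
    return num, masses, init_seq, fin_seq

# Exercise it with the short sample from the question:
sample = io.StringIO("6\n2400 2000 1200 2400 1600 4000\n1 4 5 3 6 2\n5 3 2 4 6 1\n")
num, masses, init_seq, fin_seq = read_data(sample)
```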
Python split on the partition list
I am currently working on a project that gets all partitions on a disk, on Ubuntu 20.

def get_partitions():
    """ This function returns a list of partition objects. """
    partitions = []
    for line in open('/proc/partitions'):
        if line.startswith('major'):
            continue
        fields = line.split()
        partitions.append(partition(
            int(fields[0]),
            int(fields[1]),
            int(fields[3]),
            fields[5]
        ))
    return partitions

But I get this error:

Traceback (most recent call last):
  File "/home/mathieu-s/Documents/opt/repo/dosm/disk/disk_scanner.py", line 69, in <module>
    print(get_partitions())
  File "/home/mathieu-s/Documents/opt/repo/dosm/disk/disk_scanner.py", line 62, in get_partitions
    int(fields[0]),
IndexError: list index out of range

Can someone help me?
Error explanations

Main error: on Ubuntu 20.04, /proc/partitions looks approximately like this:

major minor  #blocks name

   7        0       5956 loop0
   7        1          4 loop1
   7        2       9240 loop2
   7        3       9244 loop3
   7        4     151112 loop4
   7        5     135924 loop5
   7        6     283688 loop6
   7        7      63580 loop7
 259        0  500107608 nvme0n1
 259        1     834560 nvme0n1p1
 259        2    8388608 nvme0n1p2
 259        3  490883072 nvme0n1p3
   7        8     101824 loop8

Note that the second line is empty, and your code does not handle that case: splitting it yields an empty list, so fields[0] raises the IndexError.

Second error: a partition data line looks like:

   7        0       5956 loop0

When you split such a line, you convert fields[3] to int, but fields[3] is the partition's name, so the conversion will fail. And fields[5] does not exist at all in Ubuntu 20.04's partitions file, so accessing it would also raise an error.

Possible solution

To fix the first error (the empty line), simply extend your if condition:

if line.startswith('major') or line.startswith('\n'):

To fix the second problem (the field numbers), change the append:

partitions.append(partition(
    int(fields[0]),
    int(fields[1]),
    int(fields[2]),
    fields[3]
))

Complete code of a possible solution:

def get_partitions():
    """ This function returns a list of partitions on the disk. """
    partitions = []
    for line in open('/proc/partitions'):
        if line.startswith('major') or line.startswith('\n'):
            continue
        fields = line.split()
        partitions.append(partition(
            int(fields[0]),
            int(fields[1]),
            int(fields[2]),
            fields[3]
        ))
    return partitions
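The question never shows what `partition` is; assuming it is some record type, a self-contained sketch of the fixed parser with a stand-in namedtuple (the field names here are hypothetical) looks like this:

```python
from collections import namedtuple

# Hypothetical stand-in for the question's undefined `partition` type.
Partition = namedtuple('Partition', ['major', 'minor', 'blocks', 'name'])

def parse_partitions(lines):
    """Parse /proc/partitions-style lines, skipping the header and blank lines."""
    partitions = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == 'major':  # blank line or header row
            continue
        partitions.append(Partition(int(fields[0]), int(fields[1]),
                                    int(fields[2]), fields[3]))
    return partitions
```

Checking `not fields` after `split()` covers blank lines whether they contain a newline or only spaces, which is slightly more robust than `line.startswith('\n')`.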
converting a file of numbers in to a list
To put it simply, I am trying to read a file that will eventually contain nothing but numbers, separated by spaces, commas, or newlines. I have read through a lot of these posts and fixed some things. I learned that the values are imported as strings first. However, I am running into an issue where the numbers get imported as lists, so now I have a list of lists. This would be fine except I can't check them against ints or add numbers to them. The idea is to assign each user a number and then save it; I'm not worried about saving right now, just about importing the numbers and being able to use them as individual numbers. My code thus far:

fo1 = open('mach_uID_3.txt', 'a+')
t1 = fo1.read()
t2 = []
print t1
for x in t1.split():
    print x
    z = [int(n) for n in x.split()]
    t2.append(z)
print t2
print t2[3]
fo1.close()

The file it's reading is:

0 1 2 25 34 23

My results are pretty ugly, but here you go:

0 1 2 25 34 23
0
1
2
25
34
23
[[0], [1], [2], [25], [34], [23]]
[25]

Process finished with exit code 0
Use extend instead of append: t2.extend(int(n) for n in x.split())
To have all the numbers in a single, flattened list, do this:

fo1 = open('mach_uID_3.txt', 'a+')
number_list = list(map(int, fo1.read().split()))
fo1.close()

But it's better to open the file like this:

with open('mach_uID_3.txt', 'a+') as fo1:
    number_list = list(map(int, fo1.read().split()))

so you don't have to close it explicitly.
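The difference between the two answers above comes down to list.append (which nests each parsed chunk) versus list.extend (which flattens it), as a quick sketch shows:

```python
tokens = '0 1 2 25 34 23'.split()

nested, flat = [], []
for tok in tokens:
    nested.append([int(tok)])  # appends a one-element list each time
    flat.extend([int(tok)])    # splices the elements into the list instead
# nested is [[0], [1], [2], [25], [34], [23]]; flat is [0, 1, 2, 25, 34, 23]
```

With the flat list, `flat[3]` is the plain int 25 rather than the one-element list `[25]` from the question's output.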
Iterate through a file and an array simultaneously python for loop
I am trying to iterate through a file and write a new column into it, replacing an existing one, using two concurrent for loops, but I don't know how to iterate over the array part. I have an array:

aa = [1,2,3,4,5]

My file is:

I a 0
II b 0
III c 0
IV d 0
V f 0

I want it like:

I a 1
II b 2
III c 3
IV d 4
V f 5

I tried this Python code:

cmg = [1,2,3,4,5]
fh = open("plink5.map", 'r')
fhnew = open("plink5.out", 'w+')
for line, i in zip(fh, (0, len(cmg)-1, 1)):
    line = line.strip('\n')
    aa = line.split('\t')
    aanew = str(aa[0]) + "\t" + str(aa[1]) + "\t" + str(cmg[i])
    print(aanew)
    fhnew.write(aanew)
fh.close()
fhnew.close()

I get an error in the array iteration part.
What you were trying to do is:

for line, i in zip(fh, range(0, len(cmg), 1)):

But what would be easier:

for line, x in zip(fh, cmg):
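Put together, the zip(fh, cmg) form makes the whole transformation short. A sketch with a hypothetical helper working on in-memory lines, assuming tab-separated columns as in the question's code:

```python
def replace_last_column(lines, values):
    """Pair each tab-separated line with a value and swap that value in
    as the third column -- zip removes the need for manual indexing."""
    out = []
    for line, val in zip(lines, values):
        fields = line.strip('\n').split('\t')
        out.append('\t'.join([fields[0], fields[1], str(val)]))
    return out
```

In the question's setting, `lines` would be the open file handle and `values` the cmg list.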
How should I use Numpy's vstack method?
Firstly, here is the relevant part of the code:

stokes_list = np.zeros(shape=(numrows,1024))  # 'numrows' defined earlier
for i in range(numrows):
    epoch_name = y['filename'][i]  # 'y' is an array from earlier
    os.system('pdv -t {0} > temp.txt '.format(epoch_name))  # 'pdv' is a command from another piece of software - here I copy the output into a temporary file
    stokes_line = np.genfromtxt('temp.txt', usecols=3, dtype=[('stokesI','float')], skip_header=1)
    stokes_list = np.vstack((stokes_line,stokes_line))

So, basically, every time the code loops around, stokes_line pulls one of the columns (the 4th) from the file temp.txt, and I want it to add a line to stokes_list each time. For example, if the first stokes_line is

1.1 2.2 3.3

and the second is

4.4 5.5 6.6

then stokes_list will be

1.1 2.2 3.3
4.4 5.5 6.6

and will keep growing. It's not working at the moment because I think the line

stokes_list = np.vstack((stokes_line,stokes_line))

is not correct. It's only stacking 2 lists, which makes sense as I only have 2 arguments. I basically would like to know how to keep stacking again and again. Any help would be gratefully received! If it is needed, here is an example of the format of the temp.txt file:

File: t091110_065921.SFTC Src: J1903+0925 Nsub: 1 Nch: 1 Npol: 4 Nbin: 1024 RMS: 0.00118753
0 0 0 0.00148099 -0.00143755 0.000931365 -0.00296775
0 0 1 0.000647476 -0.000896698 0.000171287 0.00218597
0 0 2 0.000704697 -0.00052846 -0.000603842 -0.000868739
0 0 3 0.000773361 -0.00234724 -0.0004112 0.00358033
0 0 4 0.00101559 -0.000691062 0.000196023 -0.000163109
0 0 5 -0.000220367 -0.000944024 0.000181002 -0.00268215
0 0 6 0.000311783 0.00191545 -0.00143816 -0.00213856
vstacking again and again is not good, because it copies the whole array each time. Create a normal Python list, .append to it, and then pass the whole thing to np.vstack to create a new array once:

stokes_list = []
for i in xrange(numrows):
    ...
    stokes_line = ...
    stokes_list.append(stokes_line)

big_stokes = np.vstack(stokes_list)
You already know the final size of the stokes_list array, since you know numrows. So there is no need to grow an array (which is very inefficient); you can simply assign the correct row at each iteration. Just replace your last line with:

stokes_list[i] = stokes_line

By the way, about your non-working line, I think you meant:

stokes_list = np.vstack((stokes_list, stokes_line))

where you replace stokes_list with its new value.
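The preallocate-and-assign pattern from this answer, sketched with small dummy rows standing in for the genfromtxt output:

```python
import numpy as np

numrows, nbin = 3, 4  # stand-ins for the question's numrows and 1024
stokes_list = np.zeros((numrows, nbin))
for i in range(numrows):
    stokes_line = np.full(nbin, float(i))  # stand-in for the parsed column
    stokes_list[i] = stokes_line           # assign the row in place; no array copying
```

Each iteration writes one row; the array never needs to grow, so there is no repeated copying.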