How should I use Numpy's vstack method?

Firstly, here is the relevant part of the code:
stokes_list = np.zeros(shape=(numrows,1024)) # 'numrows' defined earlier
for i in range(numrows):
    epoch_name = y['filename'][i] # 'y' is an array from earlier
    os.system('pdv -t {0} > temp.txt '.format(epoch_name)) # 'pdv' is a command from another piece of software - here I copy the output into a temporary file
    stokes_line = np.genfromtxt('temp.txt', usecols=3, dtype=[('stokesI','float')], skip_header=1)
    stokes_list = np.vstack((stokes_line,stokes_line))
So, basically, every time the code loops around, stokes_line pulls one of the columns (4th one) from the file temp.txt, and I want it to add a line to stokes_list each time.
For example, if the first stokes_line is
1.1 2.2 3.3
and the second is
4.4 5.5 6.6
then stokes_list will be
1.1 2.2 3.3
4.4 5.5 6.6
and will keep growing...
It's not working at the moment, because I think that the line:
stokes_list = np.vstack((stokes_line,stokes_line))
is not correct. It's only stacking 2 lists - which makes sense as I only have 2 arguments. I basically would like to know how I keep stacking again and again.
Any help would be very gratefully received!
If it is needed, here is an example of the format of the temp.txt file:
File: t091110_065921.SFTC Src: J1903+0925 Nsub: 1 Nch: 1 Npol: 4 Nbin: 1024 RMS: 0.00118753
0 0 0 0.00148099 -0.00143755 0.000931365 -0.00296775
0 0 1 0.000647476 -0.000896698 0.000171287 0.00218597
0 0 2 0.000704697 -0.00052846 -0.000603842 -0.000868739
0 0 3 0.000773361 -0.00234724 -0.0004112 0.00358033
0 0 4 0.00101559 -0.000691062 0.000196023 -0.000163109
0 0 5 -0.000220367 -0.000944024 0.000181002 -0.00268215
0 0 6 0.000311783 0.00191545 -0.00143816 -0.00213856

Calling vstack again and again is inefficient, because every call copies the whole array.
Create a normal Python list, .append to it, and then pass the whole list to np.vstack to create the new array once.
stokes_list = []
for i in range(numrows):  # xrange on Python 2
    ...
    stokes_line = ...
    stokes_list.append(stokes_line)
big_stokes = np.vstack(stokes_list)
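
As a minimal sketch of this pattern, assuming each row is a plain float array (the `fake_rows` data below is made up and only stands in for the columns read from temp.txt):

```python
import numpy as np

# Hypothetical stand-ins for the per-file columns read by np.genfromtxt
fake_rows = [
    np.array([1.1, 2.2, 3.3]),
    np.array([4.4, 5.5, 6.6]),
    np.array([7.7, 8.8, 9.9]),
]

stokes_list = []                     # plain Python list, cheap to append to
for stokes_line in fake_rows:
    stokes_list.append(stokes_line)

big_stokes = np.vstack(stokes_list)  # the rows are copied once, at the end
print(big_stokes.shape)              # (3, 3)
```

Each appended row becomes one row of `big_stokes`, in the order it was appended.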

You already know the final size of the stokes_list array, since you know numrows. So you don't need to grow an array at all (which is very inefficient). You can simply assign the correct row at each iteration.
Simply replace your last line with:
stokes_list[i] = stokes_line
By the way, about your non-working line, I think you meant:
stokes_list = np.vstack((stokes_list, stokes_line))
where you're replacing stokes_list with its new value.
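
A small sketch of the preallocate-and-assign approach, assuming each row is a plain float array of known length (the sizes and the `np.full` rows are placeholders, not the asker's data):

```python
import numpy as np

numrows, ncols = 3, 4   # hypothetical sizes; the question uses 1024 columns
stokes_list = np.zeros(shape=(numrows, ncols))

for i in range(numrows):
    # Stand-in for the column read from temp.txt on iteration i
    stokes_line = np.full(ncols, float(i))
    stokes_list[i] = stokes_line   # fill row i in place, no reallocation

print(stokes_list)
```

The array keeps its original shape throughout; only its contents change.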

Related

Python nested loop - table index as variable

I am not a Python programmer, but I need to use a method from the SciPy library. I just want to repeat the inner loop a couple of times, but with a different table index each time. Here is my code for now:
from scipy.stats import pearsonr
fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []
for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))
And I am getting the following output:
0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)
As you can see, index is changing, but the result of the function is always the same, as if the index were 0.
When I run the script several times, changing the index value by hand like this:
attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))
everything is OK and I get correct results, but I can't do it in one loop statement. What am I doing wrong?
EDIT:
Test file:
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
Expected results of pearsonr for the 10 script runs:
data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)

Turn each line of the file into a list of floats:
data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        line = [float(item) for item in line[:11]]
        data.append(line)
Transpose the data so that each list in data holds the values of one column from the original file. data --> [[column 0 items], [column 1 items], [column 2 items], ...]
data = list(zip(*data))  # needed on Python 3.x; on Python 2.7 plain zip(*data) already returns a list
Correlate:
from scipy.stats import pearsonr
for n in range(10):
    corr = pearsonr(data[n], data[10])
    print('data[{}], {}'.format(n, corr))
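
For context on why the original loop printed the same result ten times: a file object can only be read through once, so readlines() returns an empty list on every pass after the first, and the two lists keep only the values gathered while index was 0. A small sketch of that behaviour, using an in-memory `io.StringIO` as a stand-in for the real file:

```python
import io

# Stand-in for open('ILPDataset.txt'); two rows from the test file in the question
f = io.StringIO(
    "62,1,6.8,3,542,116,66,6.4,3.1,0.9,1\n"
    "40,1,1.9,1,231,16,55,4.3,1.6,0.6,1\n"
)

first_pass = f.readlines()
second_pass = f.readlines()   # the file position is already at the end

print(len(first_pass), len(second_pass))   # prints: 2 0

f.seek(0)                     # rewinding makes the lines readable again
third_pass = f.readlines()
```

Reading the file once into a list, as the answer does, avoids the problem without any rewinding.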

@wwii's answer is very good.
Only one suggestion: list(zip(*data)) seems a bit like overkill to me. zip is really meant for composing lists of varying types, and potentially varying lengths, into tuples, and here the result then has to be turned back into a list with list().
So why not just use the simple transpose operation, which is what this is?
import numpy
# ...
data = numpy.transpose(data)
which does the same job, probably faster (not measured) and more deterministically.
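
A short sketch of the transpose on a cut-down version of the test data (three rows, three columns, chosen only to keep the example small):

```python
import numpy as np

# Three rows of the test file, truncated to three columns for brevity
data = [
    [62.0, 1.0, 6.8],
    [40.0, 1.0, 1.9],
    [63.0, 1.0, 0.9],
]

columns = np.transpose(data)   # shape (3, 3); row n is now column n of the file
print(columns[0])              # the first column of the file: 62, 40, 63
```

Note that the result is a NumPy array rather than a list of tuples, which pearsonr accepts just as well.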

ITK/SimpleITK DICOM Series Loaded in wrong order / slice spacing incorrect

The problem occurs with a number of the datasets, but we particularly noticed it with Soft-tissue-Sarcoma in the dicoms in
STS_004/1.3.6.1.4.1.14519.5.2.1.5168.1900.124239320067253523699285035604/1.3.6.1.4.1.14519.5.2.1.5168.1900.952127023780097934747932279670
The spacing is read as 30 instead of 2.9 and the 3D image has brain slices between two lung slices

Basically, if you read the DICOMs using SimpleITK.ReadImage or VTK, the tool loads the files in the same order as your list (usually alphabetical order). The mapping between the slices and the files is not alphabetical; it is effectively random. This causes the slice spacing (a tag that is missing in these data) to be computed incorrectly, since it is taken as the difference in position between file 0 and file 1. It also causes brain slices to turn up between two lung slices, and other strange artifacts.
The solution is to presort the files using the GetGDCMSeriesFileNames function.
import os
# noinspection PyPep8Naming
import SimpleITK as sitk
def safe_sitk_read(img_list, *args, **kwargs):
    # ask GDCM for the sorted file names of the series in the first file's directory
    dir_name = os.path.dirname(img_list[0])
    s_img_list = sitk.ImageSeriesReader().GetGDCMSeriesFileNames(dir_name)
    return sitk.ReadImage(s_img_list, *args, **kwargs)
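
To illustrate why the file order matters, here is a toy sketch in plain Python. The file names and z positions are invented; in real data the positions would come from each file's DICOM header.

```python
# Hypothetical file names and z positions (mm), invented for illustration
z_by_file = {
    "img_a.dcm": 60.0,
    "img_b.dcm": 0.0,
    "img_c.dcm": 30.0,
    "img_d.dcm": 90.0,
}

def spacing_from_first_two(file_names):
    """Spacing estimated as the position difference between file 0 and file 1."""
    return abs(z_by_file[file_names[1]] - z_by_file[file_names[0]])

alphabetical = sorted(z_by_file)                     # order of a plain listing
by_position = sorted(z_by_file, key=z_by_file.get)   # order after a position sort

print(spacing_from_first_two(alphabetical))  # 60.0, too large
print(spacing_from_first_two(by_position))   # 30.0, the actual gap between neighbours
```

With the alphabetical order, the first two files are not neighbours in space, so the estimated spacing is wrong and the slices end up interleaved.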

So here is what I tried on my side:
$ gdcm2vtk --lower-left --ipp-sort STS_004/1.3.6.1.4.1.14519.5.2.1.5168.1900.124239320067253523699285035604/1.3.6.1.4.1.14519.5.2.1.5168.1900.952127023780097934747932279670 /tmp/kmader.mha
And then I check the output file with:
$ head -13 /tmp/kmader.mha
ObjectType = Image
NDims = 3
BinaryData = True
BinaryDataByteOrderMSB = False
CompressedData = False
TransformMatrix = 1 0 0 0 1 0 0 0 1
Offset = -250 -250 -5
CenterOfRotation = 0 0 0
ElementSpacing = 0.976562 0.976562 3.3
DimSize = 512 512 311
AnatomicalOrientation = ???
ElementType = MET_SHORT
ElementDataFile = LOCAL
Indeed you are right, GDCM computes the Z-Spacing as being 3.3 while it should really be 3.27 in this case. Please report a bug upstream.
Fixed in current git repository:
https://github.com/malaterre/GDCM/commit/36b7fbce6d2bd146cfa7541a175c8f6519da7dba

how to get result from many list in same line in python

I have three lists:
alist=[1,2,3,4,5]
blist=['a','b','c','d','e']
clist=['#','#','$','&','*']
I want my output in this format:
1 2 3 4 5
a b c d e
# # $ & *
I am able to print them in the correct format, but when the lists have many elements the output actually wraps like this:
1 2 3 4 5 6 ..........................................................................
................................................................................
a b c d e ............................................................................
......................................................................................
# # $ & * .............................................................................
.......................................................................................
but I want my output like this:
12345....................................................................
abcde...................................................................
##$&*...................................................................
............................................................... {this line is from alist}
................................................................ {this line is from blist}
................................................................ {this line is from clist}

Try the following:
term_width = 80
all_lists = (alist, blist, clist)
length = max(map(len, all_lists))
for offset in range(0, length, term_width):  # xrange on Python 2
    print('\n'.join(''.join(map(str, l[offset:offset+term_width])) for l in all_lists))
This assumes the terminal width is 80 characters, which is the usual default. You might want to detect its actual width with the curses library or something based on it.
Either way, to adapt to any output width you only need to change the term_width value and the code will use it.
It also assumes all elements are one character long. If that's not the case, please clarify.
If you need to detect terminal width, you may find some solutions here: How to get Linux console window width in Python
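
A runnable sketch of the same idea, with the chunking moved into a helper (`wrap_rows` is a name made up for this example) and the width taken from the standard library's `shutil.get_terminal_size`, which is available on Python 3.3 and later:

```python
import shutil

def wrap_rows(lists, width):
    """Return the output lines for `lists`, taking `width` items per chunk."""
    length = max(map(len, lists))
    lines = []
    for offset in range(0, length, width):
        for row in lists:
            lines.append(''.join(map(str, row[offset:offset + width])))
    return lines

alist = [1, 2, 3, 4, 5]
blist = ['a', 'b', 'c', 'd', 'e']
clist = ['#', '#', '$', '&', '*']

# Falls back to 80 columns when the terminal size cannot be determined
term_width = shutil.get_terminal_size(fallback=(80, 24)).columns
print('\n'.join(wrap_rows((alist, blist, clist), term_width)))
```

For example, `wrap_rows((alist, blist, clist), 3)` gives the lines 123, abc, ##$ followed by 45, de, &*. As in the answer, this assumes every element prints as a single character.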

reading an array with missing data and spaces in the first column

I have a .txt file I want to read using Python. The file is an array containing data on comets. Here are 3 of its 3000 rows:
P/2011 U1 PANSTARRS 1.54 0.5 14.21 145.294 352.628 6098.07
P/2011 VJ5 Lemmon 4.12 0.5 2.45 139.978 315.127 5904.20 *
149P/Mueller 4 3.67 0.1 5.32 85.280 27.963 6064.72
I am reading the array using the following code:
import numpy as np
list_comet = np.genfromtxt('jfc_master.txt', dtype=None)
I am facing 2 different problems:
First, in row 1 the name of the comet is: P/2011 U1 PANSTARRS. If I type:
list_comet[0][1] the result will be P/2011. How should I tell python how to read the name of each comet? Note that the longest name is 31 characters. So what is the command to tell python that column 1 is 31 characters long?
Second, in row 2 the value of the last column is *. When I read the file I get an error which says:
Line #2941 (got 41 columns instead of 40)
(note that the above data is not the complete data, the total number of columns I have in my original data is 38). I guess I am receiving this error due to the * found in certain rows. How can I fix this problem?

You didn't mention what data structure you're looking for, i.e. what operations you intend to perform on the parsed data. In the simplest case, you could massage the file into a list of 8-tuples - the last element being either '*' or an empty string. That is as simple as
def tokenize(s):
    # split from the right so that the spaces inside the comet name are left alone
    if s[-1] == '*':
        return s.rsplit(None, 7)
    else:
        return s.rsplit(None, 6) + ['']
tokens = (tokenize(line.rstrip()) for line in open('so21712204.txt'))
To be fair, this doesn't make tokens a list of 8-tuples but rather a generator (which is more space efficient) of lists, each of which has 8 elements.
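
Here is a small self-contained check of this approach on the three sample rows from the question, written with the `str.rsplit` method so it runs on Python 3. The `sample` string replaces the real file for the purposes of the sketch.

```python
sample = """P/2011 U1 PANSTARRS 1.54 0.5 14.21 145.294 352.628 6098.07
P/2011 VJ5 Lemmon 4.12 0.5 2.45 139.978 315.127 5904.20 *
149P/Mueller 4 3.67 0.1 5.32 85.280 27.963 6064.72"""

def tokenize(s):
    """Split one row into name, six numeric fields, and a trailing '*' or ''."""
    if s.endswith('*'):
        return s.rsplit(None, 7)
    return s.rsplit(None, 6) + ['']

tokens = [tokenize(line.rstrip()) for line in sample.splitlines()]
for row in tokens:
    print(row[0], '|', row[-1])
```

Because the split starts from the right, the name field keeps its internal spaces no matter how many words it contains. This only covers the seven-column excerpt shown; the full 38-column file would need the split counts adjusted accordingly.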

Iterate through a file and an array simultaneously python for loop

I am trying to iterate through a file and replace one of its columns with a new one, stepping through the file and an array at the same time in a for loop. But I don't know how to iterate over the array part.
I have an array aa=[1,2,3,4,5]
My file is:
I a 0
II b 0
III c 0
IV d 0
V f 0
I want it like:
I a 1
II b 2
III c 3
IV d 4
V f 5
I tried python code:
cmg=[1,2,3,4,5]
fh=open("plink5.map",'r')
fhnew=open("plink5.out",'w+')
for line,i in zip(fh,(0,len(cmg)-1,1)):
    line=line.strip('\n')
    aa=line.split('\t')
    aanew=str(aa[0])+"\t"+str(aa[1])+"\t"+str(cmg[i])
    print(aanew)
    fhnew.write(aanew)
fh.close()
fhnew.close()
I get an error in the array iteration part.

What you were trying to do is (note the range call, and len(cmg) rather than len(cmg)-1):
for line, i in zip(fh, range(0, len(cmg), 1)):
But what would be easier:
for line, x in zip(fh, cmg):
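
A complete sketch of the zip-based loop, using in-memory `io.StringIO` objects as stand-ins for plink5.map and plink5.out so that it runs on its own. It also writes a newline after each row, which the original write call left out.

```python
import io

cmg = [1, 2, 3, 4, 5]

# Stand-in for open("plink5.map"); tab-separated, as in the question's code
fh = io.StringIO("I\ta\t0\nII\tb\t0\nIII\tc\t0\nIV\td\t0\nV\tf\t0\n")
fhnew = io.StringIO()  # stand-in for open("plink5.out", "w")

for line, value in zip(fh, cmg):
    fields = line.rstrip('\n').split('\t')
    fields[2] = str(value)                   # replace the third column
    fhnew.write('\t'.join(fields) + '\n')    # one output row per input row

result = fhnew.getvalue()
print(result, end='')
```

zip stops at the shorter of the two inputs, so a file with more lines than cmg has values would be silently truncated.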
