Saving a row-wise txt with numpy with a pre-defined delimiter - python

Let's say I have two numpy arrays x and y, and I want to save them in a txt file with a tab as the delimiter (\t), with their appropriate types (x is a float and y is an integer) and in a certain format. For example:
import numpy as np
x=np.random.uniform(0.1,10.,4)
y=np.random.randint(10,size=4)
If I simply use np.savetxt('name.txt',(x,y)), this is what I get:
6.111548206503401026e+00 4.208270401300298058e-01 5.914485954766230957e-01 6.652272388676337966e-01
6.027109785846696433e+00 1.024051075089774443e+01 3.358386699980072621e+01 7.652668778594046151e-01
But what I want is a row-wise txt file, so I followed this solution:
numpy array to a file, np.savetxt
and by using
np.savetxt('name.txt',np.vstack((x,y)).T,delimiter='\t') I get:
2.640596763338360020e+00 4.000000000000000000e+00
8.693117057064670306e+00 4.000000000000000000e+00
3.891035166453641558e+00 6.000000000000000000e+00
9.044178202861068883e+00 2.000000000000000000e+00
Up to here it is fine, but as I mentioned, I want the output to have the appropriate data type and some formatting, so I tried np.savetxt('name.txt',np.vstack((x,y)).T,fmt=('%7.2f,%5i'),delimiter='\t'), and what I get is:
2.64, 4
8.69, 4
3.89, 6
9.04, 2
which does have the appropriate format and data type, but which adds a , after the first column. Does anyone know how to get rid of this , printed after the column?

The comma is in your fmt string. Replace it with fmt='%7.2f %5i', like so:
np.savetxt('name.txt',np.vstack((x,y)).T,fmt='%7.2f %5i')
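Note that the tab delimiter (delimiter='\t') has no effect here: when fmt is a single string containing a format specifier for every column, np.savetxt uses that string for the whole row and ignores delimiter. If you want a tab between the values, change the format string to fmt='%7.2f\t%5i' or alternatively pass one format per column: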
np.savetxt('name.txt',np.vstack((x,y)).T,fmt=('%7.2f', '%5i'), delimiter='\t')
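As a quick check of the per-column form, here is a minimal sketch that writes four rows and reads them back. The sample values are copied from the question's output, and the temporary path is only an illustration.

```python
import os
import tempfile
import numpy as np

# Sample values taken from the question's printed output (illustrative only).
x = np.array([2.64059676, 8.69311706, 3.89103517, 9.0441782])
y = np.array([4, 4, 6, 2])

out_path = os.path.join(tempfile.mkdtemp(), "name.txt")

# One format per column; savetxt joins the formatted columns with the delimiter.
np.savetxt(out_path, np.column_stack((x, y)), fmt=("%7.2f", "%5i"), delimiter="\t")

with open(out_path) as fh:
    lines = fh.read().splitlines()

print(lines)
```

Each line should hold the float padded to width 7, a single tab, then the integer padded to width 5, with no comma.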

Related

Trying to create Numpy matrix/array but my output adds a \t and \n in python after splitting the comma separated array of numbers

I am trying to make a NumPy array from a set of numbers that has no header, splitting on spaces (my objective is to remove the rows that are ALL zeroes). This is the code I have:
with open('/Users/name/Desktop/PDB/test_d3psm_misc/d3ps.profile','r') as f:
    for line in f:
        r = line.split(" ")
        print(r)
my output:
['0.1\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.36\t0.54\t0.0\t0.0\t0.0\t\n']
['0.0\t0.06\t0.0\t0.0\t0.0\t0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.91\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.02\t0.02\t0.0\t0.0\t0.51\t0.16\t0.06\t0.07\t0.0\t0.0\t0.0\t0.03\t0.0\t0.03\t0.1\t\n']
['0.02\t0.0\t0.05\t0.74\t0.0\t0.0\t0.12\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.03\t0.03\t0.0\t0.0\t0.0\t0.0\t\n']
['0.18\t0.1\t0.05\t0.13\t0.01\t0.0\t0.02\t0.04\t0.05\t0.0\t0.02\t0.13\t0.0\t0.09\t0.1\t0.06\t0.01\t0.0\t0.0\t0.0\t\n']
['0.04\t0.01\t0.07\t0.27\t0.04\t0.0\t0.12\t0.0\t0.0\t0.0\t0.26\t0.08\t0.0\t0.0\t0.01\t0.01\t0.03\t0.0\t0.01\t0.06\t\n']
['0.0\t0.04\t0.01\t0.02\t0.0\t0.01\t0.02\t0.46\t0.0\t0.0\t0.0\t0.05\t0.0\t0.0\t0.21\t0.13\t0.02\t0.0\t0.03\t0.0\t\n']
['0.02\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.11\t0.02\t0.04\t0.03\t0.0\t0.15\t0.0\t0.0\t0.0\t0.0\t0.64\t0.0\t\n']
['0.0\t0.79\t0.0\t0.0\t0.0\t0.03\t0.0\t0.0\t0.01\t0.0\t0.02\t0.12\t0.0\t0.0\t0.0\t0.0\t0.02\t0.0\t0.0\t0.0\t\n']
['0.05\t0.02\t0.01\t0.0\t0.02\t0.07\t0.01\t0.0\t0.0\t0.05\t0.04\t0.09\t0.01\t0.0\t0.46\t0.02\t0.09\t0.0\t0.0\t0.05\t\n']
['0.03\t0.11\t0.31\t0.0\t0.24\t0.0\t0.0\t0.21\t0.02\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.03\t0.02\t0.0\t0.0\t0.04\t\n']
['0.08\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.09\t0.0\t0.0\t0.0\t0.0\t0.05\t0.02\t0.0\t0.0\t0.01\t0.74\t\n']
['0.2\t0.0\t0.02\t0.0\t0.01\t0.0\t0.0\t0.59\t0.02\t0.02\t0.0\t0.0\t0.0\t0.01\t0.0\t0.06\t0.03\t0.0\t0.0\t0.01\t\n']
['0.17\t0.0\t0.02\t0.0\t0.04\t0.0\t0.0\t0.05\t0.0\t0.45\t0.04\t0.0\t0.06\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.16\t\n']
['0.07\t0.0\t0.0\t0.0\t0.09\t0.0\t0.0\t0.01\t0.0\t0.33\t0.06\t0.0\t0.08\t0.04\t0.0\t0.0\t0.0\t0.0\t0.0\t0.31\t\n']
['0.07\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.0\t0.0\t0.2\t0.33\t0.0\t0.01\t0.03\t0.02\t0.0\t0.01\t0.0\t0.02\t0.29\t\n']
['0.07\t0.06\t0.0\t0.01\t0.1\t0.03\t0.02\t0.01\t0.0\t0.14\t0.18\t0.0\t0.07\t0.11\t0.0\t0.01\t0.04\t0.0\t0.02\t0.12\t\n']
['0.0\t0.09\t0.48\t0.13\t0.01\t0.01\t0.04\t0.0\t0.01\t0.02\t0.02\t0.02\t0.0\t0.0\t0.01\t0.07\t0.05\t0.0\t0.0\t0.04\t\n']
['0.07\t0.22\t0.03\t0.08\t0.0\t0.06\t0.12\t0.05\t0.05\t0.01\t0.01\t0.07\t0.02\t0.0\t0.04\t0.1\t0.03\t0.0\t0.02\t0.01\t\n']
['0.03\t0.1\t0.09\t0.16\t0.01\t0.12\t0.16\t0.04\t0.02\t0.0\t0.0\t0.12\t0.0\t0.01\t0.04\t0.03\t0.03\t0.0\t0.01\t0.03\t\n']
['0.01\t0.05\t0.13\t0.09\t0.01\t0.04\t0.02\t0.44\t0.0\t0.0\t0.0\t0.14\t0.01\t0.0\t0.02\t0.02\t0.0\t0.0\t0.0\t0.0\t\n']
['0.0\t0.1\t0.02\t0.03\t0.01\t0.23\t0.23\t0.01\t0.03\t0.01\t0.06\t0.19\t0.0\t0.0\t0.01\t0.01\t0.02\t0.0\t0.0\t0.02\t\n']
['0.01\t0.0\t0.0\t0.0\t0.01\t0.0\t0.05\t0.0\t0.0\t0.23\t0.15\t0.0\t0.01\t0.01\t0.0\t0.0\t0.0\t0.0\t0.0\t0.53\t\n']
['0.01\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.05\t0.6\t0.01\t0.03\t0.21\t0.0\t0.0\t0.01\t0.05\t0.01\t0.01\t\n']
['0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.13\t0.42\t0.0\t0.06\t0.0\t0.0\t0.0\t0.02\t0.15\t0.02\t0.17\t\n']
['0.28\t0.0\t0.0\t0.0\t0.01\t0.04\t0.01\t0.26\t0.0\t0.03\t0.06\t0.01\t0.01\t0.0\t0.01\t0.02\t0.11\t0.0\t0.0\t0.16\t\n']
['0.02\t0.4\t0.02\t0.0\t0.02\t0.19\t0.01\t0.02\t0.03\t0.0\t0.0\t0.21\t0.0\t0.0\t0.0\t0.07\t0.0\t0.0\t0.0\t0.0\t\n']
['0.02\t0.69\t0.02\t0.0\t0.0\t0.01\t0.05\t0.05\t0.01\t0.0\t0.01\t0.05\t0.0\t0.01\t0.0\t0.05\t0.0\t0.01\t0.01\t0.02\t\n']
['0.07\t0.02\t0.01\t0.01\t0.01\t0.05\t0.01\t0.0\t0.05\t0.11\t0.16\t0.08\t0.03\t0.09\t0.03\t0.08\t0.04\t0.01\t0.08\t0.05\t\n']
['0.07\t0.16\t0.05\t0.11\t0.02\t0.0\t0.02\t0.24\t0.06\t0.01\t0.01\t0.1\t0.01\t0.0\t0.05\t0.04\t0.0\t0.0\t0.01\t0.04\t\n']
['0.04\t0.05\t0.09\t0.08\t0.0\t0.15\t0.07\t0.04\t0.02\t0.02\t0.01\t0.12\t0.01\t0.0\t0.07\t0.1\t0.09\t0.0\t0.02\t0.01\t\n']
['0.01\t0.01\t0.11\t0.13\t0.0\t0.02\t0.04\t0.27\t0.19\t0.0\t0.01\t0.08\t0.0\t0.0\t0.04\t0.04\t0.01\t0.0\t0.04\t0.0\t\n']
['0.12\t0.06\t0.01\t0.01\t0.03\t0.04\t0.0\t0.03\t0.02\t0.04\t0.11\t0.02\t0.05\t0.03\t0.0\t0.19\t0.11\t0.01\t0.05\t0.08\t\n']
['0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.03\t0.01\t0.01\t0.03\t0.0\t0.01\t0.03\t0.01\t0.0\t0.0\t0.78\t0.06\t0.01\t\n']
['0.05\t0.0\t0.02\t0.03\t0.0\t0.33\t0.08\t0.07\t0.03\t0.03\t0.01\t0.06\t0.0\t0.03\t0.0\t0.11\t0.06\t0.0\t0.02\t0.06\t\n']
['0.03\t0.0\t0.0\t0.0\t0.02\t0.01\t0.0\t0.03\t0.0\t0.05\t0.2\t0.0\t0.13\t0.31\t0.05\t0.02\t0.02\t0.0\t0.02\t0.11\t\n']
['0.01\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.01\t0.06\t0.0\t0.0\t0.01\t0.82\t0.04\t0.0\t0.0\t0.0\t0.03\t\n']
['0.13\t0.02\t0.01\t0.0\t0.0\t0.28\t0.0\t0.34\t0.0\t0.0\t0.0\t0.15\t0.0\t0.0\t0.0\t0.02\t0.05\t0.0\t0.0\t0.0\t\n']
['0.0\t0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.97\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.05\t0.08\t0.0\t0.0\t0.0\t0.0\t0.0\t0.39\t0.14\t0.0\t0.03\t0.12\t0.0\t0.12\t0.0\t0.02\t0.01\t0.0\t0.01\t0.02\t\n']
['0.03\t0.02\t0.0\t0.0\t0.05\t0.0\t0.0\t0.0\t0.0\t0.29\t0.1\t0.01\t0.08\t0.0\t0.04\t0.01\t0.01\t0.02\t0.0\t0.35\t\n']
['0.02\t0.01\t0.07\t0.23\t0.03\t0.02\t0.41\t0.03\t0.01\t0.01\t0.02\t0.09\t0.03\t0.0\t0.0\t0.02\t0.0\t0.0\t0.0\t0.01\t\n']
['0.07\t0.02\t0.01\t0.05\t0.0\t0.02\t0.12\t0.01\t0.01\t0.08\t0.03\t0.12\t0.0\t0.03\t0.27\t0.08\t0.01\t0.0\t0.02\t0.03\t\n']
['0.01\t0.01\t0.05\t0.17\t0.01\t0.0\t0.03\t0.63\t0.02\t0.0\t0.0\t0.01\t0.01\t0.0\t0.0\t0.05\t0.01\t0.0\t0.0\t0.01\t\n']
['0.01\t0.0\t0.0\t0.02\t0.0\t0.0\t0.9\t0.0\t0.01\t0.0\t0.05\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.01\t0.02\t0.04\t0.11\t0.01\t0.01\t0.11\t0.0\t0.0\t0.0\t0.0\t0.02\t0.0\t0.0\t0.01\t0.31\t0.37\t0.0\t0.0\t0.0\t\n']
['0.1\t0.0\t0.01\t0.04\t0.0\t0.01\t0.01\t0.0\t0.0\t0.11\t0.15\t0.0\t0.01\t0.07\t0.37\t0.01\t0.02\t0.01\t0.03\t0.04\t\n']
['0.06\t0.05\t0.0\t0.01\t0.0\t0.04\t0.44\t0.02\t0.03\t0.04\t0.11\t0.03\t0.02\t0.01\t0.02\t0.02\t0.02\t0.04\t0.01\t0.03\t\n']
['0.11\t0.0\t0.01\t0.16\t0.0\t0.25\t0.3\t0.01\t0.01\t0.01\t0.01\t0.02\t0.0\t0.0\t0.01\t0.01\t0.05\t0.0\t0.01\t0.04\t\n']
['0.63\t0.0\t0.05\t0.0\t0.13\t0.0\t0.0\t0.07\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.02\t0.1\t0.0\t0.0\t0.0\t\n']
['0.44\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.03\t0.0\t0.04\t0.11\t0.0\t0.12\t0.04\t0.0\t0.01\t0.0\t0.0\t0.0\t0.19\t\n']
['0.13\t0.06\t0.01\t0.0\t0.01\t0.03\t0.05\t0.0\t0.02\t0.08\t0.2\t0.05\t0.03\t0.05\t0.0\t0.01\t0.0\t0.01\t0.12\t0.14\t\n']
['0.0\t0.95\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.05\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.0\t0.05\t0.0\t0.0\t0.0\t0.0\t0.91\t0.0\t0.0\t0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.06\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.01\t0.0\t0.1\t0.4\t0.0\t0.03\t0.02\t0.0\t0.02\t0.09\t0.0\t0.01\t0.25\t\n']
['0.02\t0.1\t0.02\t0.01\t0.01\t0.12\t0.08\t0.01\t0.04\t0.01\t0.07\t0.17\t0.03\t0.08\t0.03\t0.02\t0.0\t0.08\t0.1\t0.01\t\n']
['0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.91\t0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t1.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.09\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.01\t0.0\t0.11\t0.12\t0.0\t0.0\t0.0\t0.0\t0.09\t0.36\t0.0\t0.0\t0.22\t\n']
['0.03\t0.02\t0.03\t0.01\t0.0\t0.01\t0.0\t0.86\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.03\t0.01\t0.0\t0.0\t0.0\t\n']
['0.02\t0.0\t0.0\t0.0\t0.02\t0.0\t0.01\t0.0\t0.01\t0.25\t0.32\t0.0\t0.02\t0.04\t0.0\t0.02\t0.03\t0.0\t0.07\t0.18\t\n']
['0.03\t0.08\t0.06\t0.17\t0.01\t0.04\t0.07\t0.02\t0.03\t0.02\t0.04\t0.13\t0.01\t0.01\t0.04\t0.09\t0.09\t0.0\t0.01\t0.03\t\n']
['0.15\t0.03\t0.01\t0.01\t0.03\t0.03\t0.02\t0.07\t0.0\t0.07\t0.06\t0.08\t0.0\t0.01\t0.17\t0.03\t0.07\t0.0\t0.01\t0.16\t\n']
['0.04\t0.07\t0.04\t0.08\t0.01\t0.06\t0.27\t0.05\t0.03\t0.01\t0.03\t0.17\t0.0\t0.0\t0.02\t0.05\t0.04\t0.01\t0.0\t0.01\t\n']
['0.01\t0.03\t0.11\t0.16\t0.0\t0.05\t0.04\t0.01\t0.1\t0.02\t0.07\t0.08\t0.01\t0.05\t0.07\t0.12\t0.02\t0.0\t0.02\t0.04\t\n']
['0.04\t0.02\t0.0\t0.02\t0.01\t0.01\t0.02\t0.04\t0.0\t0.14\t0.13\t0.03\t0.01\t0.02\t0.02\t0.06\t0.03\t0.01\t0.02\t0.38\t\n']
['0.04\t0.21\t0.02\t0.02\t0.01\t0.08\t0.26\t0.01\t0.02\t0.02\t0.04\t0.07\t0.0\t0.0\t0.0\t0.07\t0.07\t0.0\t0.0\t0.04\t\n']
['0.01\t0.01\t0.03\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.27\t0.22\t0.07\t0.01\t0.07\t0.01\t0.03\t0.0\t0.01\t0.1\t0.15\t\n']
['0.02\t0.01\t0.01\t0.01\t0.0\t0.0\t0.0\t0.01\t0.01\t0.2\t0.46\t0.0\t0.03\t0.07\t0.0\t0.02\t0.01\t0.01\t0.01\t0.13\t\n']
['0.28\t0.02\t0.02\t0.01\t0.0\t0.01\t0.04\t0.31\t0.01\t0.0\t0.02\t0.03\t0.0\t0.0\t0.02\t0.09\t0.06\t0.02\t0.02\t0.01\t\n']
['0.05\t0.1\t0.01\t0.02\t0.08\t0.06\t0.17\t0.04\t0.03\t0.03\t0.01\t0.08\t0.0\t0.04\t0.01\t0.14\t0.03\t0.0\t0.0\t0.11\t\n']
['0.05\t0.04\t0.01\t0.01\t0.02\t0.06\t0.02\t0.02\t0.01\t0.11\t0.06\t0.01\t0.02\t0.06\t0.02\t0.13\t0.23\t0.0\t0.04\t0.08\t\n']
['0.03\t0.22\t0.03\t0.03\t0.0\t0.08\t0.06\t0.01\t0.02\t0.01\t0.08\t0.14\t0.0\t0.02\t0.18\t0.03\t0.03\t0.0\t0.01\t0.02\t\n']
['0.03\t0.04\t0.06\t0.17\t0.04\t0.03\t0.1\t0.12\t0.02\t0.01\t0.03\t0.05\t0.0\t0.01\t0.14\t0.07\t0.02\t0.0\t0.05\t0.01\t\n']
['0.02\t0.02\t0.01\t0.03\t0.0\t0.01\t0.05\t0.01\t0.06\t0.01\t0.04\t0.03\t0.01\t0.04\t0.02\t0.04\t0.03\t0.52\t0.02\t0.02\t\n']
['0.02\t0.03\t0.01\t0.01\t0.01\t0.02\t0.02\t0.02\t0.02\t0.1\t0.34\t0.01\t0.02\t0.14\t0.01\t0.02\t0.01\t0.02\t0.05\t0.08\t\n']
['0.05\t0.24\t0.05\t0.02\t0.01\t0.02\t0.07\t0.03\t0.02\t0.03\t0.04\t0.06\t0.0\t0.02\t0.02\t0.13\t0.13\t0.0\t0.03\t0.03\t\n']
['0.01\t0.01\t0.01\t0.01\t0.0\t0.01\t0.03\t0.01\t0.01\t0.02\t0.01\t0.02\t0.0\t0.1\t0.04\t0.03\t0.01\t0.05\t0.61\t0.01\t\n']
['0.09\t0.15\t0.02\t0.26\t0.0\t0.02\t0.11\t0.02\t0.04\t0.02\t0.02\t0.1\t0.0\t0.0\t0.06\t0.04\t0.01\t0.0\t0.0\t0.03\t\n']
['0.02\t0.03\t0.0\t0.05\t0.0\t0.0\t0.01\t0.01\t0.0\t0.11\t0.31\t0.06\t0.01\t0.18\t0.0\t0.03\t0.01\t0.0\t0.07\t0.08\t\n']
['0.02\t0.04\t0.0\t0.06\t0.0\t0.0\t0.04\t0.01\t0.0\t0.0\t0.04\t0.01\t0.01\t0.01\t0.71\t0.01\t0.01\t0.0\t0.0\t0.02\t\n']
['0.03\t0.07\t0.04\t0.07\t0.01\t0.08\t0.1\t0.04\t0.06\t0.03\t0.01\t0.17\t0.0\t0.01\t0.1\t0.08\t0.07\t0.0\t0.0\t0.02\t\n']
['0.01\t0.36\t0.02\t0.06\t0.0\t0.08\t0.06\t0.01\t0.08\t0.03\t0.01\t0.08\t0.0\t0.03\t0.01\t0.11\t0.04\t0.0\t0.01\t0.01\t\n']
['0.07\t0.02\t0.0\t0.02\t0.0\t0.04\t0.03\t0.02\t0.06\t0.04\t0.34\t0.02\t0.07\t0.04\t0.01\t0.04\t0.03\t0.03\t0.06\t0.07\t\n']
['0.04\t0.05\t0.01\t0.0\t0.0\t0.0\t0.03\t0.01\t0.01\t0.25\t0.04\t0.01\t0.1\t0.01\t0.02\t0.03\t0.06\t0.1\t0.0\t0.2\t\n']
['0.05\t0.24\t0.04\t0.08\t0.01\t0.01\t0.05\t0.09\t0.07\t0.02\t0.04\t0.18\t0.01\t0.01\t0.01\t0.02\t0.02\t0.02\t0.0\t0.03\t\n']
['0.04\t0.07\t0.03\t0.01\t0.01\t0.02\t0.03\t0.3\t0.01\t0.01\t0.01\t0.08\t0.01\t0.01\t0.21\t0.05\t0.07\t0.0\t0.0\t0.02\t\n']
['0.04\t0.13\t0.06\t0.01\t0.0\t0.01\t0.03\t0.04\t0.03\t0.04\t0.12\t0.09\t0.03\t0.01\t0.01\t0.03\t0.08\t0.0\t0.06\t0.17\t\n']
['0.08\t0.02\t0.02\t0.01\t0.16\t0.01\t0.01\t0.08\t0.02\t0.03\t0.03\t0.03\t0.02\t0.09\t0.05\t0.03\t0.03\t0.01\t0.24\t0.04\t\n']
['0.01\t0.19\t0.01\t0.11\t0.01\t0.02\t0.02\t0.03\t0.01\t0.24\t0.04\t0.1\t0.0\t0.07\t0.02\t0.02\t0.01\t0.0\t0.03\t0.06\t\n']
['0.02\t0.01\t0.02\t0.01\t0.0\t0.01\t0.03\t0.66\t0.01\t0.01\t0.02\t0.02\t0.0\t0.02\t0.02\t0.02\t0.03\t0.0\t0.04\t0.04\t\n']
['0.02\t0.06\t0.02\t0.02\t0.0\t0.42\t0.04\t0.02\t0.06\t0.01\t0.05\t0.12\t0.01\t0.02\t0.01\t0.02\t0.04\t0.0\t0.02\t0.04\t\n']
['0.05\t0.08\t0.06\t0.03\t0.0\t0.03\t0.04\t0.02\t0.03\t0.02\t0.02\t0.36\t0.03\t0.01\t0.01\t0.07\t0.1\t0.03\t0.01\t0.01\t\n']
['0.02\t0.05\t0.01\t0.09\t0.0\t0.44\t0.11\t0.02\t0.03\t0.04\t0.02\t0.03\t0.01\t0.01\t0.01\t0.02\t0.01\t0.0\t0.03\t0.05\t\n']
['0.02\t0.09\t0.01\t0.01\t0.0\t0.02\t0.06\t0.02\t0.08\t0.13\t0.07\t0.27\t0.03\t0.01\t0.01\t0.03\t0.03\t0.0\t0.02\t0.1\t\n']
['0.01\t0.01\t0.01\t0.01\t0.0\t0.01\t0.01\t0.01\t0.01\t0.08\t0.06\t0.01\t0.0\t0.08\t0.01\t0.01\t0.01\t0.48\t0.11\t0.04\t\n']
['0.01\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.02\t0.02\t0.0\t0.0\t0.6\t0.0\t0.0\t0.0\t0.04\t0.24\t0.02\t\n']
['0.13\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.03\t0.02\t0.06\t0.49\t0.03\t0.09\t0.03\t0.01\t0.02\t0.02\t0.0\t0.0\t0.06\t\n']
['0.19\t0.0\t0.0\t0.0\t0.08\t0.0\t0.0\t0.01\t0.0\t0.13\t0.28\t0.0\t0.06\t0.1\t0.0\t0.02\t0.0\t0.0\t0.03\t0.08\t\n']
['0.04\t0.35\t0.01\t0.05\t0.0\t0.12\t0.08\t0.02\t0.02\t0.01\t0.04\t0.12\t0.03\t0.01\t0.01\t0.02\t0.01\t0.0\t0.05\t0.04\t\n']
['0.04\t0.01\t0.0\t0.01\t0.0\t0.0\t0.01\t0.02\t0.0\t0.04\t0.42\t0.01\t0.02\t0.21\t0.01\t0.04\t0.01\t0.0\t0.08\t0.07\t\n']
['0.05\t0.05\t0.02\t0.03\t0.02\t0.02\t0.05\t0.01\t0.01\t0.11\t0.12\t0.12\t0.02\t0.0\t0.03\t0.04\t0.17\t0.0\t0.01\t0.12\t\n']
['0.06\t0.0\t0.02\t0.03\t0.03\t0.0\t0.01\t0.41\t0.01\t0.02\t0.02\t0.02\t0.0\t0.05\t0.1\t0.11\t0.04\t0.0\t0.01\t0.04\t\n']
['0.05\t0.08\t0.07\t0.15\t0.01\t0.07\t0.08\t0.11\t0.04\t0.01\t0.04\t0.08\t0.0\t0.0\t0.04\t0.05\t0.06\t0.0\t0.03\t0.03\t\n']
['0.01\t0.0\t0.05\t0.34\t0.04\t0.03\t0.3\t0.01\t0.0\t0.01\t0.04\t0.0\t0.0\t0.0\t0.0\t0.12\t0.03\t0.0\t0.01\t0.0\t\n']
['0.09\t0.02\t0.03\t0.05\t0.03\t0.02\t0.09\t0.05\t0.0\t0.03\t0.07\t0.05\t0.0\t0.08\t0.02\t0.21\t0.05\t0.01\t0.02\t0.09\t\n']
['0.07\t0.04\t0.04\t0.24\t0.01\t0.08\t0.25\t0.01\t0.03\t0.0\t0.01\t0.04\t0.01\t0.0\t0.0\t0.09\t0.04\t0.0\t0.01\t0.04\t\n']
['0.06\t0.0\t0.0\t0.0\t0.0\t0.04\t0.01\t0.02\t0.01\t0.52\t0.06\t0.0\t0.01\t0.03\t0.07\t0.0\t0.03\t0.0\t0.0\t0.12\t\n']
['0.03\t0.15\t0.33\t0.11\t0.04\t0.1\t0.02\t0.02\t0.0\t0.03\t0.01\t0.06\t0.0\t0.0\t0.0\t0.04\t0.01\t0.01\t0.0\t0.03\t\n']
['0.01\t0.05\t0.02\t0.02\t0.01\t0.0\t0.0\t0.0\t0.0\t0.14\t0.34\t0.02\t0.06\t0.07\t0.08\t0.02\t0.02\t0.0\t0.0\t0.12\t\n']
['0.02\t0.05\t0.14\t0.18\t0.0\t0.16\t0.11\t0.03\t0.03\t0.02\t0.07\t0.07\t0.01\t0.01\t0.02\t0.02\t0.03\t0.0\t0.0\t0.02\t\n']
['0.16\t0.03\t0.06\t0.08\t0.03\t0.04\t0.11\t0.05\t0.04\t0.02\t0.08\t0.01\t0.02\t0.01\t0.03\t0.04\t0.09\t0.0\t0.02\t0.07\t\n']
['0.07\t0.05\t0.02\t0.03\t0.09\t0.02\t0.02\t0.12\t0.04\t0.01\t0.04\t0.04\t0.02\t0.01\t0.06\t0.14\t0.2\t0.0\t0.0\t0.02\t\n']
['0.06\t0.02\t0.09\t0.12\t0.0\t0.07\t0.12\t0.06\t0.1\t0.07\t0.06\t0.06\t0.02\t0.02\t0.02\t0.04\t0.03\t0.0\t0.01\t0.03\t\n']
['0.03\t0.06\t0.06\t0.04\t0.0\t0.06\t0.05\t0.03\t0.15\t0.02\t0.02\t0.19\t0.0\t0.01\t0.08\t0.07\t0.07\t0.0\t0.02\t0.03\t\n']
['0.16\t0.02\t0.04\t0.03\t0.03\t0.02\t0.11\t0.02\t0.0\t0.02\t0.02\t0.02\t0.0\t0.01\t0.38\t0.04\t0.02\t0.0\t0.03\t0.02\t\n']
['0.0\t0.0\t0.0\t0.02\t0.0\t0.02\t0.89\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.01\t0.0\t0.0\t0.02\t0.0\t0.0\t0.0\t\n']
['0.01\t0.01\t0.0\t0.01\t0.01\t0.0\t0.01\t0.01\t0.06\t0.15\t0.07\t0.02\t0.0\t0.52\t0.01\t0.01\t0.02\t0.0\t0.0\t0.08\t\n']
['0.06\t0.03\t0.02\t0.37\t0.01\t0.07\t0.11\t0.04\t0.02\t0.03\t0.05\t0.05\t0.0\t0.0\t0.0\t0.11\t0.02\t0.0\t0.0\t0.01\t\n']
['0.22\t0.04\t0.03\t0.22\t0.02\t0.06\t0.11\t0.12\t0.02\t0.0\t0.01\t0.05\t0.0\t0.0\t0.0\t0.07\t0.01\t0.0\t0.01\t0.01\t\n']
['0.17\t0.01\t0.0\t0.0\t0.02\t0.0\t0.0\t0.0\t0.0\t0.06\t0.04\t0.0\t0.0\t0.03\t0.0\t0.0\t0.0\t0.52\t0.11\t0.0\t\n']
['0.08\t0.4\t0.01\t0.03\t0.02\t0.09\t0.07\t0.03\t0.0\t0.0\t0.01\t0.16\t0.02\t0.0\t0.0\t0.03\t0.01\t0.0\t0.0\t0.03\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.04\t0.0\t0.0\t0.0\t0.95\t0.0\t0.0\t\n']
['0.01\t0.01\t0.03\t0.01\t0.01\t0.03\t0.03\t0.01\t0.03\t0.06\t0.1\t0.01\t0.09\t0.11\t0.0\t0.02\t0.01\t0.0\t0.04\t0.41\t\n']
['0.01\t0.04\t0.05\t0.22\t0.0\t0.02\t0.06\t0.05\t0.03\t0.0\t0.0\t0.05\t0.0\t0.0\t0.15\t0.27\t0.07\t0.0\t0.0\t0.0\t\n']
['0.06\t0.01\t0.0\t0.0\t0.01\t0.0\t0.0\t0.0\t0.0\t0.05\t0.19\t0.01\t0.06\t0.08\t0.15\t0.01\t0.01\t0.05\t0.24\t0.07\t\n']
['0.06\t0.01\t0.02\t0.22\t0.0\t0.05\t0.25\t0.02\t0.02\t0.0\t0.0\t0.05\t0.01\t0.0\t0.01\t0.02\t0.01\t0.24\t0.0\t0.0\t\n']
['0.04\t0.04\t0.08\t0.15\t0.0\t0.08\t0.28\t0.0\t0.01\t0.0\t0.02\t0.03\t0.01\t0.0\t0.0\t0.03\t0.02\t0.0\t0.15\t0.05\t\n']
['0.12\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.0\t0.04\t0.46\t0.0\t0.01\t0.01\t0.25\t0.01\t0.01\t0.0\t0.03\t0.05\t\n']
['0.1\t0.0\t0.0\t0.0\t0.03\t0.0\t0.01\t0.0\t0.0\t0.08\t0.22\t0.0\t0.04\t0.02\t0.24\t0.02\t0.01\t0.0\t0.02\t0.21\t\n']
['0.09\t0.16\t0.06\t0.16\t0.0\t0.07\t0.16\t0.06\t0.01\t0.0\t0.02\t0.08\t0.0\t0.0\t0.02\t0.09\t0.01\t0.0\t0.0\t0.0\t\n']
['0.12\t0.06\t0.03\t0.01\t0.0\t0.3\t0.03\t0.01\t0.03\t0.05\t0.19\t0.05\t0.01\t0.0\t0.0\t0.06\t0.0\t0.0\t0.01\t0.03\t\n']
['0.07\t0.0\t0.01\t0.0\t0.02\t0.0\t0.0\t0.0\t0.0\t0.22\t0.16\t0.0\t0.0\t0.0\t0.15\t0.0\t0.0\t0.0\t0.0\t0.34\t\n']
['0.1\t0.03\t0.03\t0.01\t0.0\t0.0\t0.01\t0.01\t0.0\t0.16\t0.05\t0.01\t0.03\t0.02\t0.01\t0.01\t0.04\t0.03\t0.0\t0.47\t\n']
['0.05\t0.0\t0.04\t0.06\t0.0\t0.06\t0.07\t0.02\t0.02\t0.0\t0.01\t0.02\t0.03\t0.05\t0.29\t0.18\t0.05\t0.0\t0.05\t0.0\t\n']
['0.01\t0.0\t0.0\t0.0\t0.0\t0.02\t0.0\t0.02\t0.02\t0.0\t0.01\t0.02\t0.0\t0.81\t0.0\t0.0\t0.0\t0.0\t0.06\t0.0\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.02\t0.05\t0.0\t0.0\t0.0\t0.0\t0.86\t0.0\t0.0\t0.05\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.0\t0.58\t0.1\t0.03\t0.0\t0.03\t0.11\t0.0\t0.0\t0.0\t0.01\t0.13\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t\n']
['0.03\t0.05\t0.05\t0.29\t0.0\t0.1\t0.15\t0.04\t0.05\t0.0\t0.01\t0.06\t0.0\t0.0\t0.12\t0.04\t0.0\t0.0\t0.0\t0.0\t\n']
['0.03\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.09\t0.15\t0.0\t0.03\t0.0\t0.0\t0.01\t0.02\t0.03\t0.0\t0.61\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.11\t0.11\t0.0\t0.0\t0.0\t0.0\t0.75\t0.0\t\n']
['0.03\t0.4\t0.0\t0.0\t0.0\t0.05\t0.23\t0.0\t0.01\t0.01\t0.04\t0.17\t0.02\t0.0\t0.0\t0.0\t0.02\t0.01\t0.0\t0.02\t\n']
['0.11\t0.24\t0.0\t0.07\t0.01\t0.15\t0.11\t0.01\t0.01\t0.0\t0.02\t0.17\t0.03\t0.04\t0.0\t0.0\t0.02\t0.0\t0.0\t0.0\t\n']
['0.33\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.14\t0.04\t0.0\t0.01\t0.0\t0.0\t0.0\t0.03\t0.0\t0.0\t0.45\t\n']
['0.02\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.15\t0.48\t0.01\t0.14\t0.01\t0.0\t0.0\t0.0\t0.0\t0.0\t0.19\t\n']
['0.08\t0.1\t0.08\t0.03\t0.01\t0.15\t0.12\t0.0\t0.01\t0.0\t0.0\t0.28\t0.01\t0.0\t0.01\t0.05\t0.07\t0.0\t0.0\t0.01\t\n']
['0.13\t0.04\t0.0\t0.02\t0.0\t0.02\t0.6\t0.0\t0.06\t0.0\t0.0\t0.01\t0.01\t0.0\t0.0\t0.0\t0.02\t0.0\t0.09\t0.0\t\n']
['0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.07\t0.0\t0.0\t0.28\t0.0\t0.0\t0.64\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t\n']
['0.3\t0.1\t0.01\t0.0\t0.01\t0.03\t0.06\t0.03\t0.1\t0.05\t0.0\t0.04\t0.0\t0.01\t0.0\t0.17\t0.0\t0.0\t0.0\t0.09\t\n']
['0.07\t0.15\t0.06\t0.03\t0.0\t0.11\t0.07\t0.05\t0.04\t0.02\t0.0\t0.01\t0.01\t0.0\t0.17\t0.19\t0.02\t0.0\t0.0\t0.0\t\n']
['0.01\t0.02\t0.0\t0.0\t0.0\t0.02\t0.0\t0.0\t0.05\t0.07\t0.31\t0.0\t0.0\t0.27\t0.0\t0.03\t0.03\t0.01\t0.08\t0.08\t\n']
['0.19\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.01\t0.0\t0.15\t0.48\t0.05\t0.0\t0.03\t0.0\t0.01\t0.02\t0.0\t0.0\t0.07\t\n']
['0.01\t0.11\t0.01\t0.01\t0.19\t0.16\t0.0\t0.01\t0.0\t0.05\t0.1\t0.03\t0.1\t0.15\t0.0\t0.01\t0.0\t0.0\t0.0\t0.05\t\n']
['0.23\t0.04\t0.05\t0.09\t0.0\t0.02\t0.0\t0.04\t0.02\t0.0\t0.0\t0.0\t0.0\t0.0\t0.24\t0.29\t0.0\t0.0\t0.0\t0.0\t\n']
['0.13\t0.0\t0.0\t0.0\t0.0\t0.06\t0.05\t0.02\t0.02\t0.04\t0.24\t0.0\t0.05\t0.1\t0.04\t0.0\t0.11\t0.0\t0.0\t0.15\t\n']
['0.44\t0.02\t0.0\t0.0\t0.0\t0.18\t0.09\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.19\t0.02\t0.02\t0.0\t0.0\t0.03\t\n']
['0.51\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.3\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.19\t0.0\t0.0\t0.0\t0.0\t\n']
I don't understand why this is happening. If there is an easier way to parse a bunch of files with numbers in them and find the rows that are ALL zeroes, that would be better :)
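The file is tab-separated, so line.split(" ") finds no spaces and returns each line as a single string. Reading it with pandas and a tab separator is simpler:
import pandas as pd
df = pd.read_csv("/Users/name/Desktop/PDB/test_d3psm_misc/d3ps.profile", sep="\t", header=None).astype(float)
# each line ends with a tab, which pandas reads as an extra all-NaN column; drop it
df = df.dropna(axis=1, how="all")
# keep only the rows that are not all zeroes
df = df.loc[df.sum(axis=1)>0]
Then, if you want to convert it to a NumPy array:
df = df.to_numpy()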
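For comparison, a NumPy-only sketch of the same filtering. It relies on str.split() with no argument, which splits on any run of whitespace (tabs included) and ignores the trailing tab. The in-memory sample stands in for the real profile file and is an assumption for illustration.

```python
import io
import numpy as np

# Stand-in for the profile file: tab-separated values with a trailing tab per line.
sample = io.StringIO("0.1\t0.0\t0.36\t\n0.0\t0.0\t0.0\t\n0.0\t0.91\t0.0\t\n")

# split() without arguments splits on any whitespace and drops empty fields.
rows = [[float(v) for v in line.split()] for line in sample if line.strip()]
data = np.array(rows)

# Keep the rows that contain at least one non-zero value.
nonzero_rows = data[data.any(axis=1)]
print(nonzero_rows)
```

With a real file, the StringIO object would be replaced by the handle returned from open().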

How to save an excel file containing a data set into a csv file while preserving the data type of the values as float?

I am importing a CSV file into Python with the pandas package.
SP = pd.read_csv('S&P500 (5year).csv')
When I call pct_change(), it cannot process the values because they were read in with the type 'str'.
I have tried using the .astype(float) method, and it returns the error: could not convert string to float: '1805.51'
The 'Adj Close**' values are of type str and I need them as type float.
        Date      Open      High       Low    Close*  Adj Close**             Volume
0  11/1/2013  1,758.70  1,813.55  1,746.20  1,805.81     1,805.81      63,628,190,00
1  12/1/2013  1,806.55  1,849.44  1,767.99  1,848.36     1,848.36  64,958,820,000.00
2   1/1/2014  1,845.86  1,850.84  1,770.45  1,782.59     1,782.59  75,871,910,000.00
3   2/1/2014  1,782.68  1,867.92  1,737.92  1,859.45     1,859.45  69,725,590,000.00
4   3/1/2014  1,857.68  1,883.97  1,834.44  1,872.34     1,872.34  71,885,030,000.00
Try passing thousands and dtype to the read_csv function. The values contain commas as thousands separators, so pandas reads them as strings; thousands=',' tells the parser to strip those commas before converting to numbers. Replace column_name in the example with the column you need to convert to float.
Example:
SP = pd.read_csv('S&P500 (5year).csv', thousands=',', dtype={'column_name': float})
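A small self-contained sketch of the thousands parameter, using an in-memory CSV in place of the real file. The two-column layout and the quoted values are assumptions made for illustration; only the three prices come from the question's table.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: prices are quoted and use ',' as thousands separator.
csv_text = (
    'Date,Adj Close\n'
    '11/1/2013,"1,805.81"\n'
    '12/1/2013,"1,848.36"\n'
    '1/1/2014,"1,782.59"\n'
)

# thousands=',' lets the parser read "1,805.81" as the float 1805.81.
sp = pd.read_csv(io.StringIO(csv_text), thousands=",")

# With a numeric column, pct_change() works directly.
returns = sp["Adj Close"].pct_change()
print(sp.dtypes)
print(returns)
```

The first entry of returns is NaN because there is no earlier row to compare against.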

numpy savetxt: how to save an integer and a float numpy array into the same row of the file

I have a set of integers and a set of numpy arrays. I would like to use np.savetxt to store each integer and its corresponding array in the same row, with rows separated by \n.
In the txt file, each row should look like the following:
12345678 0.282101 -0.343122 -0.19537 2.001613 1.034215 0.774909 0.369273 0.219483 1.526713 -1.637871
The float numbers should be separated by spaces.
I tried to use the following code to solve this:
np.savetxt("a.txt", np.column_stack([ids, a]), newline="\n", delimiter=' ',fmt='%d %.06f')
But somehow I cannot figure out the correct formatting for integers and floats.
Any suggestions?
Please specify what a "set of integers" and "set of numpy arrays" are: from your example it looks as though ids is a list or 1d numpy array, and a is a 2d numpy array, but this is not clear from your question.
If you're trying to combine a list of integers with a 2d array, you should probably avoid np.savetxt and convert to a string first:
import numpy as np
ids = [1, 2, 3, 4, 5]
a = np.random.rand(5, 5)
with open("filename.txt", "w") as f:
    for each_id, row in zip(ids, a):
        line = "%d " % each_id + " ".join(format(x, "0.8f") for x in row) + "\n"
        f.write(line)
Gives the output in filename.txt:
1 0.38325380 0.80964789 0.83787527 0.83794886 0.93933360
2 0.44639702 0.91376799 0.34716179 0.60456704 0.27420285
3 0.59384528 0.12295988 0.28452126 0.23849965 0.08395266
4 0.05507753 0.26166780 0.83171085 0.17840250 0.66409724
5 0.11363045 0.40060894 0.90749637 0.17903019 0.15035594
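If np.savetxt is still preferred, one possible sketch is to stack the ids in front of the array and pass one format per column. This assumes ids is 1-D with one entry per row of a, and that the ids are small enough to be stored exactly as floats, since column_stack converts everything to float. The sample values and the temporary path are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Illustrative data: three integer ids and a 3x4 float array.
ids = np.array([12345678, 23456789, 34567890])
a = np.arange(12, dtype=float).reshape(3, 4) / 7.0

# column_stack yields a float array; '%d' prints the first column without decimals.
table = np.column_stack([ids, a])
fmt = ["%d"] + ["%.6f"] * a.shape[1]

out_path = os.path.join(tempfile.mkdtemp(), "a.txt")
np.savetxt(out_path, table, fmt=fmt, delimiter=" ")

with open(out_path) as fh:
    lines = fh.read().splitlines()

print(lines[0])
```

The original attempt failed because fmt='%d %.06f' supplies two specifiers for a row that has more than two columns; the list form gives every column its own specifier.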

Error when trying to save hdf5 row where one column is a string and the other is an array of floats

I have two columns: one is a string, and the other is a numpy array of floats
a = 'this is string'
b = np.array([-2.355, 1.957, 1.266, -6.913])
I would like to store them in a row as separate columns in a hdf5 file. For that I am using pandas
hdf_key = 'hdf_key'
store5 = pd.HDFStore('file.h5')
z = pd.DataFrame(
    {
        'string': [a],
        'array': [b]
    })
store5.append(hdf_key, z, index=False)
store5.close()
However, I get this error
TypeError: Cannot serialize the column [array] because
its data contents are [mixed] object dtype
Is there a way to store this to h5? If so, how? If not, what's the best way to store this sort of data?
I can't help you with pandas, but I can show you how to do this with pytables.
Basically you create a table referencing either a numpy recarray or a dtype that defines the mixed datatypes.
Below is a super simple example to show how to create a table with 1 string and 4 floats. Then it adds rows of data to the table.
It shows 2 different methods to add data:
1. A list of tuples (1 tuple for each row) - see append_list
2. A numpy recarray (with dtype matching the table definition) - see simple_recarr in the for loop
To get the rest of the arguments for create_table(), read the PyTables User's Guide. It's very helpful, and should answer additional questions.
import tables as tb
import numpy as np
with tb.open_file('SO_55943319.h5', 'w') as h5f:
    my_dtype = np.dtype([('A','S16'),('b',float),('c',float),('d',float),('e',float)])
    dset = h5f.create_table(h5f.root, 'table_data', description=my_dtype)
    # Append one row using a list:
    append_list = [('test string', -2.355, 1.957, 1.266, -6.913)]
    dset.append(append_list)
    # Append five more rows, one at a time, using a recarray:
    simple_recarr = np.recarray((1,),dtype=my_dtype)
    for i in range(5):
        simple_recarr['A']='string_' + str(i)
        simple_recarr['b']=2.0*i
        simple_recarr['c']=3.0*i
        simple_recarr['d']=4.0*i
        simple_recarr['e']=5.0*i
        dset.append(simple_recarr)
print ('done')
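The question's row holds one string and one array of floats rather than four separate float columns. As a NumPy-only sketch, that layout can be described with a structured dtype whose second field is a length-4 subarray. The field names and the S16 width are illustrative assumptions, and whether PyTables accepts this exact dtype as a table description is not verified here.

```python
import numpy as np

a = 'this is string'
b = np.array([-2.355, 1.957, 1.266, -6.913])

# One fixed-width byte-string field plus one field holding a length-4 float subarray.
row_dtype = np.dtype([('string', 'S16'), ('array', np.float64, (4,))])

# A one-row structured array; each field has a uniform type, so nothing is 'object'.
rows = np.zeros(1, dtype=row_dtype)
rows['string'][0] = a.encode('ascii')
rows['array'][0] = b

print(rows)
```

The point of the sketch is only that every field has a fixed, non-object type, which is what the TypeError in the question complains about.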

PYTHON - Error while using numpy genfromtxt to import csv data with multiple data types

I'm working on a Kaggle competition to predict restaurant revenue based on multiple predictors. I'm a beginner with Python; I would normally use RapidMiner for data analysis. I am using Python 3.4 in the Spyder 2.3 dev environment.
I am using the below code to import the training csv file.
from sklearn import linear_model
from numpy import genfromtxt, savetxt
def main():
    #create the training & test sets, skipping the header row with [1:]
    dataset = genfromtxt(open('data/train.csv','rb'), delimiter=",", dtype= None)[1:]
    train = [x[1:41] for x in dataset]
    test = genfromtxt(open('data/test.csv','rb'), delimiter=",")[1:]
This is the error I get:
dataset = genfromtxt(open('data/train.csv','rb'), delimiter=",", dtype= None)[1:]
IndexError: too many indices for array
Then I checked for various imported data types using print (dataset.dtype)
I noticed that the datatypes had been individually assigned for every value in the csv file. Moreover, the code wouldn't work with [1:] in the end. It gave me the same error of too many indices. And if I removed [1:] and defined the input with the skip_header=1 option, I got the below error:
output = np.array(data, dtype=ddtype)
TypeError: Empty data-type
It seems to me like the entire data set is being read as a single row with over 5000 columns.
The data set consists of 43 columns and 138 rows.
I'm stuck at this point, I would appreciate any help with how I can proceed.
I'm posting the raw csv data below (a sample):
Id,Open Date,City,City Group,Type,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,P11,P12,P13,P14,P15,P16,P17,P18,P19,P20,P21,P22,P23,P24,P25,P26,P27,P28,P29,P30,P31,P32,P33,P34,P35,P36,P37,revenue
0,7/17/99,Ä°stanbul,Big Cities,IL,4,5,4,4,2,2,5,4,5,5,3,5,5,1,2,2,2,4,5,4,1,3,3,1,1,1,4,2,3,5,3,4,5,5,4,3,4,5653753
1,2/14/08,Ankara,Big Cities,FC,4,5,4,4,1,2,5,5,5,5,1,5,5,0,0,0,0,0,3,2,1,3,2,0,0,0,0,3,3,0,0,0,0,0,0,0,0,6923131
2,3/9/13,DiyarbakÄr,Other,IL,2,4,2,5,2,3,5,5,5,5,2,5,5,0,0,0,0,0,1,1,1,1,1,0,0,0,0,1,3,0,0,0,0,0,0,0,0,2055379
3,2/2/12,Tokat,Other,IL,6,4.5,6,6,4,4,10,8,10,10,8,10,7.5,6,4,9,3,12,20,12,6,1,10,2,2,2.5,2.5,2.5,7.5,25,12,10,6,18,12,12,6,2675511
4,5/9/09,Gaziantep,Other,IL,3,4,3,4,2,2,5,5,5,5,2,5,5,2,1,2,1,4,2,2,1,2,1,2,3,3,5,1,3,5,1,3,2,3,4,3,3,4316715
5,2/12/10,Ankara,Big Cities,FC,6,6,4.5,7.5,8,10,10,8,8,8,10,8,6,0,0,0,0,0,5,6,3,1,5,0,0,0,0,7.5,5,0,0,0,0,0,0,0,0,5017319
6,10/11/10,Ä°stanbul,Big Cities,IL,2,3,4,4,1,5,5,5,5,5,2,5,5,3,4,4,3,4,2,4,1,2,1,5,4,4,5,1,3,4,5,2,2,3,5,4,4,5166635
7,6/21/11,Ä°stanbul,Big Cities,IL,4,5,4,5,2,3,5,4,4,4,4,3,4,0,0,0,0,0,3,5,2,4,2,0,0,0,0,3,2,0,0,0,0,0,0,0,0,4491607
8,8/28/10,Afyonkarahisar,Other,IL,1,1,4,4,1,2,1,5,5,5,1,5,5,1,1,2,1,4,1,1,1,1,1,4,4,4,2,2,3,4,5,5,3,4,5,4,5,4952497
9,11/16/11,Edirne,Other,IL,6,4.5,6,7.5,6,4,10,10,10,10,2,10,7.5,0,0,0,0,0,25,3,3,1,10,0,0,0,0,5,2.5,0,0,0,0,0,0,0,0,5444227
I think the non-ASCII characters (e.g. Ä°) are causing the problem in genfromtxt. I found that the following reads in the data you have here,
dtypes = "i8,S12,S12,S12,S12" + ",i8"*38
test = genfromtxt(open('data/test.csv','rb'), delimiter="," , names = True, dtype=dtypes)
You can then access the elements by name,
In [16]: test['P8']
Out[16]: array([ 4, 5, 5, 8, 5, 8, 5, 4, 5, 10])
The values for the city column,
test['City']
returns,
array(['\xc3\x84\xc2\xb0stanbul', 'Ankara', 'Diyarbak\xc3\x84r', 'Tokat',
'Gaziantep', 'Ankara', '\xc3\x84\xc2\xb0stanbul',
'\xc3\x84\xc2\xb0stanbul', 'Afyonkarahis', 'Edirne'],
dtype='|S12')
In principle, you could try to convert these to unicode in your Python script with something like the following (Python 2 syntax),
In [17]: unicode(test['City'][0], 'utf8')
Out[17]: u'\xc4\xb0stanbul'
where \xc4\xb0 is the UTF-8 hexadecimal encoding of İ. To avoid this, you could also try to clean up the CSV input files.
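Since the question uses Python 3, where unicode() does not exist, the same step can be sketched with bytes.decode. The second step below rests on an assumption: that the file's text was UTF-8 that got mis-read as Latin-1 and encoded again, which is what the byte sequence above suggests.

```python
# Bytes as stored in the 'S12' City field shown above.
raw = b'\xc3\x84\xc2\xb0stanbul'

# Python 3 spelling of unicode(raw, 'utf8'): decode the bytes to str.
decoded = raw.decode('utf-8')  # two stray characters followed by 'stanbul'

# Assumption: the text was double-encoded. Map each character back to a
# single byte with Latin-1, then decode those bytes as UTF-8 once more.
repaired = decoded.encode('latin-1').decode('utf-8')

print(repr(decoded), repr(repaired))
```

This round trip is not a general fix: for a value such as 'Diyarbak\xc3\x84r', where part of the original byte sequence appears to be missing, the final decode raises UnicodeDecodeError.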
[Solved].
I just dropped numpy's genfromtxt and opted to use read_csv from pandas, since it gives the option to import text in 'utf-8' encoding.
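A minimal sketch of that approach, using an in-memory stand-in for train.csv. The shortened column list and the correctly encoded city name are assumptions for illustration, not the actual competition file.

```python
import io
import pandas as pd

# Hypothetical excerpt of the training file, encoded as UTF-8 bytes.
csv_bytes = (
    "Id,Open Date,City,City Group,Type,P1,revenue\n"
    "0,7/17/99,İstanbul,Big Cities,IL,4,5653753\n"
    "1,2/14/08,Ankara,Big Cities,FC,4,6923131\n"
).encode("utf-8")

# read_csv infers a type per column and decodes text using the given encoding.
train = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8")

print(train.dtypes)
print(train["City"].tolist())
```

With the real file, io.BytesIO(csv_bytes) would be replaced by the path 'data/train.csv'.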
