Python: error in returning loop results

My output for the following code only saves the last row of results to the CSV file instead of writing every single row of values. I have limited knowledge of Python; I think my looping part is incorrect. Can anyone help me?
Code
import numpy as np
from numpy import genfromtxt

with open('binary.csv') as actg:
    actg = actg.readlines()
with open('single.csv') as single:
    single = single.readlines()
with open('division.csv') as division:
    division = division.readlines()

for line in actg:
    for line2 in single:
        for line1 in division:
            myarray = np.fromstring(line, dtype=float, sep=',')
            myarray = myarray.reshape((-1, 3, 4))
            a = np.asmatrix(myarray)
            a = np.array(a)
            single1 = np.fromstring(line2, dtype=float, sep=',')
            single1 = single1.reshape((-1, 4))
            s = np.asmatrix(single1)
            s = np.array(s)
            division1 = np.fromstring(line1, dtype=float, sep=',')
            m = np.asmatrix(division1)
            m = np.array(m)
            res2 = (s[np.newaxis,:,:] / m[:,np.newaxis,:] * a).sum(axis=-1)
np.savetxt("output.csv", res2, delimiter=",")
binary.csv
0,1,0,0,1,0,0,0,0,0,0,1
0,0,1,0,1,0,0,0,1,0,0,0
single.csv
0.28,0.22,0.23,0.27,0.12,0.29,0.34,0.21,0.44,0.56,0.51,0.65
division.csv
0.4,0.5,0.7,0.1
0.2,0.8,0.9,0.3
Expected output
0.44,0.3,6.5
0.26,0.6,2.2
Actual output
0.26,0.6,2.2

It is not your Python understanding; it is your loop logic.
If you look closely at your loops, you (and not Python) always keep only the last elements.
In the first and second loops you overwrite the variables you need on each iteration, so the information from earlier iterations is not kept.
In the last loop, you declare the row variable as a new list on each iteration.
Also, the i variable is never set anywhere except as the index variable of the for loop!

Because you only write out the results after the loops have finished, you are only seeing the last result.
Also, I would expect 4 results from that data, not 2.
You need to write the output on each iteration, or store the result of each iteration and then write out the stored values.
Try printing res2 each time you calculate it and you will see something like this:
('Final Results', array([[ 0.44, 0.3 , 6.5 ]]))
('Final Results', array([[ 0.275 , 0.6 , 2.16666667]]))
('Final Results', array([[ 0.32857143, 0.3 , 1.1 ]]))
('Final Results', array([[ 0.25555556, 0.6 , 2.2 ]]))
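As a sketch of the second option (store each iteration, then write once), assuming the same file reading and array setup as in the question:

all_results = []
for line in actg:
    for line2 in single:
        for line1 in division:
            # ... same a, s, m array setup as in the question ...
            res2 = (s[np.newaxis,:,:] / m[:,np.newaxis,:] * a).sum(axis=-1)
            all_results.append(res2)
# stack the stored rows and write them out in a single call
np.savetxt("output.csv", np.vstack(all_results), delimiter=",")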

Related

How to combine h5 data numpy arrays based on date in filename?

I have hundreds of .h5 files with dates in their filenames (e.g. ...20221017...). For each file, I have extracted some parameters into a numpy array of the format
[[param_1a, param_2a...param_5a],
...
[param_1x, param_2x,...param_5x]]
which represents the data of interest. I want to group the data by month, so instead of having (e.g.) 30 arrays for one month, I have 1 array which represents the average of the 30 arrays. How can I do this?
This is the code I have so far; filename represents a txt file of file names.
def combine_months(filename):
    fin = open(filename, 'r')
    next_name = fin.readline()
    while (next_name != ""):
        year = next_name[6:10]
        month = next_name[11:13]
        date = month+'\\'+year
        #not sure where to go from here
    fin.close()
An example of what I hope to achieve is that say array_1, array_2, array_3 are numpy arrays representing data from different h5 files with the same month in the date of their filename.
array_1 = [[ 1  4 10]
           [ 2  5 11]
           [ 3  6 12]]
array_2 = [[ 1  2  5]
           [ 2  2  3]
           [ 3  6 12]]
array_3 = [[ 2  4 10]
           [ 3  2  3]
           [ 4  6 12]]
I want the result to look like:
2022_04_data = [[1, 3, 7.5]
                [2, 2, 6.5]
                [3, 4, 7.5]
                [4, 6, 12]]
Note that the first number of each row represents an ID, so I need to group those data together based on the first number as well.
Ok, here is the beginning of an answer. (I suspect you may have more questions as you work through the details.)
There are several ways to get the filenames. You could put them in a file, but it's easier (and better IMHO) to use the glob.iglob() function. There are 2 examples below that show how to: 1) open each file, 2) read the data from the data dataset into an array, and 3) append the array to a list. The first example has the file names in a list. The second uses the glob.iglob() function to get the filenames. (You could also use glob.glob() to create a list of names.)
Method 1: read filenames from list
import h5py

arr_list = []
for h5file in ['20221001.h5', '20221002.h5', '20221003.h5']:
    with h5py.File(h5file, 'r') as h5f:
        arr = h5f['data'][()]
        #print(arr)
        arr_list.append(arr)
Method 2: use glob.iglob() to get files using wildcard names
import h5py
from glob import iglob

arr_list = []
for h5file in iglob('202210*.h5'):
    with h5py.File(h5file, 'r') as h5f:
        print(h5f.keys())  # to get the dataset names from the keys
        arr = h5f['data'][()]
        #print(arr)
        arr_list.append(arr)
After you have read the datasets into arrays, you iterate over the list, do your calculations and create a new array from the results. Code below shows how to get the shape and dtype.
for arr in arr_list:
    # do something with the data based on column 0 value
    print(arr.shape, arr.dtype)
Code below shows a way to sum rows with matching column 0 values. Without more details it's hard to show exactly how to do this. It reads all column 0 values into a sorted list, then uses the list's length to size the count and sum arrays, and uses the list as an index to find the proper row.
import numpy as np

# first create a list from column 0 values, then sort
row_value_list = []
for arr in arr_list:
    col_vals = arr[:,0]
    for val in col_vals:
        if val not in row_value_list:
            row_value_list.append(val)
# Sort list of column IDs
row_value_list.sort()

# use the length of the ID list to create cnt and sum arrays
a0 = len(row_value_list)
# get shape and dtype from 1st array, assume constant for all
a1 = arr_list[0].shape[1]
dt = arr_list[0].dtype
arr_cnt = np.zeros(shape=(a0,a1), dtype=dt)
arr_cnt[:,0] = row_value_list
arr_sum = np.zeros(shape=(a0,a1), dtype=dt)
arr_sum[:,0] = row_value_list

for arr in arr_list:
    for row in arr:
        idx = row_value_list.index(row[0])
        arr_cnt[idx,1:] += 1
        arr_sum[idx,1:] += row[1:]
print('Count Array\n', arr_cnt)
print('Sum Array\n', arr_sum)

arr_ave = arr_sum/arr_cnt
arr_ave[:,0] = row_value_list
print('Average Array\n', arr_ave)
Here is an alternate way to create row_value_list from a set. It's simpler because sets don't retain duplicate values, so you don't have to check for existing values when adding them to row_value_set.
# first create a set from column 0 values, then create a sorted list
row_value_set = set()
for arr in arr_list:
    col_vals = set(arr[:,0])
    row_value_set = row_value_set.union(col_vals)
row_value_list = sorted(row_value_set)
This is a new, updated answer that addresses the comment/request about calculating the median. (It also calculates the mean, and can be easily extended to calculate other statistics from the masked array.)
As noted in my comment on Nov 4 2022, "starting from my first answer quickly got complicated and hard to follow". This process is similar but different from the first answer. It uses glob to get a list of filenames (instead of iglob). Instead of loading the H5 datasets into a list of arrays, it loads all of the data into a single array (data is "stacked" on the 0-axis.). I don't think this increases the memory footprint. However, memory could be a problem if you load a lot of very large datasets for analysis.
Summary of the procedure:
1. Use glob.glob() to load the filenames into a list based on a wildcard.
2. Allocate an array to hold all the data (arr_all) based on the # of files and the size of 1 dataset.
3. Loop thru all H5 files, loading the data into arr_all.
4. Create a sorted list of unique group IDs (column 0 values).
5. Allocate arrays to hold the mean/median (arr_mean and arr_median) based on the # of unique row IDs and the # of columns in arr_all.
6. Loop over the values in the ID list, then:
   a. Create a masked array (mask) where the column 0 value equals the loop value.
   b. Broadcast the mask to match the arr_all shape, then apply it to create ma_arr_all.
   c. Loop over the columns of ma_arr_all, compress to get the unmasked values, then calculate the mean and median and save them.
Code below:
import h5py
from glob import glob
import numpy as np

# use glob.glob() to get list of files using wildcard names
file_list = glob('202210*.h5')

with h5py.File(file_list[0], 'r') as h5f:
    a0, a1 = h5f['data'].shape
    # allocate array to hold values from all datasets
    arr_all = np.zeros(shape=(len(file_list)*a0, a1), dtype=h5f['data'].dtype)

start, stop = 0, a0
for i, h5file in enumerate(file_list):
    with h5py.File(h5file, 'r') as h5f:
        arr_all[start:stop,:] = h5f['data'][()]
    start += a0
    stop += a0

# Create a set from column 0 values, and use to create a sorted list
row_value_list = sorted(set(arr_all[:,0]))

arr_mean = np.zeros(shape=(len(row_value_list), arr_all.shape[1]))
arr_median = np.zeros(shape=(len(row_value_list), arr_all.shape[1]))

col_0 = arr_all[:,0:1]
for i, row_val in enumerate(row_value_list):
    row_mask = np.where(col_0==row_val, False, True)  # True mask value ignores data.
    all_mask = np.broadcast_to(row_mask, arr_all.shape)
    ma_arr_all = np.ma.masked_array(arr_all, mask=all_mask)
    for j in range(ma_arr_all.shape[1]):
        masked_col = ma_arr_all[:,j:j+1].compressed()
        arr_mean[i:i+1,j:j+1] = np.mean(masked_col)
        arr_median[i:i+1,j:j+1] = np.median(masked_col)

print('Mean values:\n', arr_mean)
print('Median values:\n', arr_median)
Added Nov 22, 2022:
The method above uses np.broadcast_to(), introduced in NumPy 1.10. Here is an alternate method for prior versions. (It replaces the entire for i, row_val loop.) It should be more memory efficient; I haven't profiled to verify, but the arrays all_mask and ma_arr_all are not created.
for i, row_val in enumerate(row_value_list):
    row_mask = np.where(col_0==row_val, False, True)  # True mask value ignores data.
    for j in range(arr_all.shape[1]):
        masked_col = np.ma.masked_array(arr_all[:,j:j+1], mask=row_mask).compressed()
        arr_mean[i:i+1,j:j+1] = np.mean(masked_col)
        arr_median[i:i+1,j:j+1] = np.median(masked_col)
I ran with values provided by OP. Output is provided below and is the same for both methods:
Mean values:
[[ 1.          3.          7.5       ]
 [ 2.          3.66666667  8.        ]
 [ 3.          4.66666667  9.        ]
 [ 4.          6.         12.        ]]
Median values:
[[ 1.   3.   7.5]
 [ 2.   4.  10. ]
 [ 3.   6.  12. ]
 [ 4.   6.  12. ]]

Python for loop, iterating over values from numpy arange method

I need to write code that tests a numpy array of cutoff values for a classification problem. The values to test are stored in the cutoff_list variable. I then want to place the resulting confusion matrices in a dictionary. However, the code below gives me only the first dictionary entry (the confusion matrix for the first test value):
cutoff_list = [np.arange(0,1,0.01)] # list of test values
dictionary = {}
for i, v in enumerate(cutoff_list):
    actual = (df.observed)
    predicted = np.where(df.indicator > i, 1, 0)
    df_confusion = confusion_matrix(actual, predicted) / len(df.indicator)
    dictionary[i] = df_confusion
print(dictionary)
Libraries that I am using:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
Is this a problem with the loop or the dictionary update step? I'm new to Python, have more experience with R and still struggling here. Any help appreciated.
Your for loop runs only once because of the way you set it up.
You are enclosing the return value of np.arange(0,1,0.01) in a list, which breaks the way you want your for loop to run: the outer list has just one item, so the loop body executes just once and you get just one dictionary entry.
>>> cutoff_list = [np.arange(0,1,0.01)]
>>> cutoff_list
[array([0. , 0.01, ... 0.98, 0.99])]
>>> type(cutoff_list)
<class 'list'>
You want to get the actual numpy array:
>>> cutoff_list = np.arange(0,1,0.01)
>>> cutoff_list
array([0. , 0.01, ... 0.98, 0.99])
>>> type(cutoff_list)
<class 'numpy.ndarray'>
Change the line cutoff_list = [np.arange(0,1,0.01)] to cutoff_list = np.arange(0,1,0.01) and see whether that resolves your problem.
I would also imagine that you want to use v instead of i in this line:
predicted = np.where(df.indicator > i, 1, 0)
since i holds just the enumeration index that you use as the key for your dict, whereas v holds the actual cutoff values from cutoff_list.
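Put together, a sketch of the corrected loop (df, confusion_matrix and the imports as in the question):

cutoff_list = np.arange(0, 1, 0.01)  # a plain ndarray, not a list holding one array
dictionary = {}
for i, v in enumerate(cutoff_list):
    actual = df.observed
    predicted = np.where(df.indicator > v, 1, 0)  # compare against the cutoff value v, not the index i
    df_confusion = confusion_matrix(actual, predicted) / len(df.indicator)
    dictionary[i] = df_confusion
print(dictionary)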

Why does pd.as_matrix() change the values and the number of decimal places from the original data frame?

I have a dataframe that consists of two decimal values and an Id:
When I apply the as_matrix function to the x and y values, it yields an array that looks like this:
coords = df.as_matrix(columns=['x', 'y'])
coords
yields:
array([[ 0.0703843 , 0.170845 ],
[ 0.07022078, 0.17150128],
[ 0.07208886, 0.17159163],
...,
[ 0.07162819, 0.17044404],
[ 0.06951432, 0.17096308],
[ 0.07104143, 0.17040137]])
This immediately seemed strange since the lengths of the decimal parts were inconsistent, but I just assumed pandas was doing some shortening for display purposes.
But then when I tried to retrieve the IDs, I could only get one or zero matches when they should all match:
ids = []
for coord in coords:
    try:
        _id = df.loc[df['x'] == coord[0]]['id'][1]
        ids.append(_id)
    except:
        pass

len(ids)
1
What I am trying to understand is why the pd.as_matrix function extracts a value from the data frame that cannot be referenced again, and how to retrieve the ids from the data frame.
Any help here would be appreciated.
Thanks
Edit
Below is a subset of the data frame in CSV:
,id,x,y
0,07379a26-2447-4fce-83ac-4784abf07389,0.07038429591623253,0.17084500318384327
1,f5cc3adb-0588-4705-b1a3-fe1b669b776f,0.07022078416348305,0.17150127781674332
2,b5a57ffe-8565-4443-9685-11675ce25dc4,0.07208886125821728,0.17159163002146055
3,940efcaa-6d9d-4b10-a0fe-d8ec8c1d9c7e,0.07057468050347501,0.1700482708522834
4,616d7794-565a-4d2d-98cb-334beb5b91ef,0.07057895306948389,0.170054305037284
5,e2d1819d-1f58-407d-9950-be0a0c00374b,0.07161607658023798,0.17013089473907284
6,6a739687-f9ad-47bd-8a4b-c47bc4b2aec6,0.070163429153604,0.16889764101717875
7,dd2df646-9a66-4baa-8815-d24f1858eda7,0.07035099968831582,0.16995622800529742
8,6a224d76-efea-4313-803d-c25b619dae0a,0.07066777462044714,0.17021849979554743
9,321147fa-ee51-4bab-9634-199c92a42d2f,0.06984869509314469,0.17098101436534555
10,e52d6289-01ba-4e7d-8054-bb9a349c0505,0.07068704829137691,0.17029718331066224
11,517f256b-6171-4d93-9b4b-0f81aac828fb,0.0713283119291569,0.16983952831019206
12,e339c742-9784-49fc-a435-790db0364229,0.07131341496221469,0.1698513011377732
13,6f20ad5a-22fb-43a2-8885-838e5161df14,0.06942397329210678,0.1716572235671854
14,f6e1008f-2b22-4d88-8c84-c0dc4f2d822e,0.06942427697939664,0.17165098925109726
15,8a2d35e5-10a2-4188-b98f-54200d2db8da,0.07048162129308791,0.16896051533992895
16,adab8fd8-4348-412d-85d2-01491886967b,0.07076495746208027,0.16966622176968035
17,df79523b-848b-45a9-8dab-fe53c2a5b62d,0.06988926585338372,0.17028143287771583
18,db05d97c-3b16-4da8-9659-820fc7e3f858,0.0713167479593096,0.1685149810693375
19,d43963d1-b803-473c-85dc-2ed2e9f77f4e,0.07045583812582461,0.1706502407290604
20,9d99c9a6-2de3-4e6a-9bd7-9d7ece358a2f,0.07044174575566758,0.17066067488910522
21,3eec44be-b9e2-45a2-b919-05028f5a0ba9,0.07079585677115756,0.16920818686920963
22,9f836847-2b67-4b33-930a-1f84452628ba,0.07078522829778934,0.16919781903167638
23,fbaa8958-a5d5-4dfb-91f7-8c11afe226a8,0.07128542860765898,0.16834798505762455
24,a84b59c4-4145-472d-a26a-4c930648c16c,0.07196635776157265,0.17047633495883885
25,29cf8ad3-7068-4207-b0a2-4f0cff337c9f,0.0719701195278871,0.17051442269732875
26,d0f512c8-5c4f-427a-99e1-ebb4c5b363e5,0.0718787509597688,0.17054903897593635
27,74b1db2d-002b-4f89-8d02-ac084e9a3cd5,0.07089130417373782,0.16981103290127117
28,89210a0c-8144-491d-9e98-19e7f4c3085e,0.07076060461092577,0.1707011426749184
29,aebb377e-7c26-4bb5-8563-c3055a027844,0.07103977816965212,0.17113978347674103
30,00b527a0-d40a-44b4-90f9-750fd447d2d7,0.07097785505134419,0.16963542019904118
31,8c186559-f50d-40ca-a821-11596e1e5261,0.06992637446216321,0.17110063865050085
32,0e64cf14-6ccd-4ad0-9715-ab410f6baf6a,0.0718311255786932,0.1705675237580442
33,f5479823-1efe-47b8-9977-73dc41d1d69e,0.07016981880399553,0.1703708437681898
34,385cfa13-2476-4e3d-b755-3063a7f802b9,0.07016550435008462,0.17037054473511137
35,a40bf573-b701-46f0-9a06-5857cf3ab199,0.0701443567773146,0.17035314147536326
36,0c5a9751-2c1b-4003-834d-9584d2f907a2,0.07016050805421256,0.17038992836178396
37,65b09067-9cf0-492d-8a70-13d4f92f8a10,0.07137336818557355,0.1684713798357405
The issue is with the df.loc function on geo-dataframes.
Once I exported the data to a CSV and re-read it with normal pandas, it worked just fine.
Just letting whoever finds this know.
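If you hit the same symptom on a regular dataframe, a workaround sketch (not tested against the OP's geo-dataframe) is to avoid exact float equality and match with a tolerance instead:

import numpy as np

ids = []
for coord in coords:
    # match on closeness rather than exact float equality
    match = df.loc[np.isclose(df['x'], coord[0])]
    if not match.empty:
        ids.append(match['id'].iloc[0])
print(len(ids))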

Pandas: Replacing column values in dataframe columns

My goal for this question is to insert a comma between every character of every column value, where the values have been hashed and padded to a length of 19 digits.
The code below works partially, but the array values get messed up by applying the f_comma function to the whole column value... thanks for any help!
I've taken some of the answers from other questions and have created the following code:
using this function -
def f_comma(p_string, n=1):
    p_string = str(p_string)
    return ','.join(p_string[i:i+n] for i in range(0, len(p_string), n))
And opening a tsv file
data = pd.read_csv('a1.tsv', sep = '\t', dtype=object)
I have modified another answer to do the following -
h = 1
try:
    while data.columns[h]:
        a = data.columns[h]
        data[a] = f_comma((abs(data[a].apply(hash))).astype(str).str.zfill(19))
        h += 1
except IndexError:
    pass
which returns this array
array([[ '0, , , , ,4,1,7,5,7,0,1,4,5,4,6,1,6,5,3,1,4,6,1,\n,N,a,m,e,:, ,d,a,t,e,,, ,d,t,y,p,e,:, ,o,b,j,e,c,t',
'0, , , , ,6,2,9,1,6,7,0,8,4,2,8,2,9,1,0,9,5,9,4,\n,N,a,m,e,:, ,n,a,m,e,,, ,d,t,y,p,e,:, ,o,b,j,e,c,t']], dtype=object)
without the f_comma function the array looks like -
array([['3556968867719847281', '3691880917405293133']], dtype=object)
The goal is an array like this -
array([['3,5,5,6,9,6,8,8,6,7,7,1,9,8,4,7,2,8,1', '3,6,9,1,8,8,0,9,1,7,4,0,5,2,9,3,1,3,3']], dtype=object)
You should be able to use pandas string functions.
e.g. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.join.html
df["my_column"].str.join(',')

How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?

Basically, I have a bunch of data where the first column is a string (label) and the remaining columns are numeric values. I run the following:
data = numpy.genfromtxt('data.txt', delimiter = ',')
This reads most of the data well, but the label column just gets 'nan'. How can I deal with this?
By default, np.genfromtxt uses dtype=float: that's why your string columns are converted to NaNs because, after all, they're Not A Number...
You can ask np.genfromtxt to try to guess the actual type of your columns by using dtype=None:
>>> from StringIO import StringIO
>>> test = "a,1,2\nb,3,4"
>>> a = np.genfromtxt(StringIO(test), delimiter=",", dtype=None)
>>> print a
array([('a', 1, 2), ('b', 3, 4)],
      dtype=[('f0', '|S1'), ('f1', '<i8'), ('f2', '<i8')])
You can access the columns by using their name, like a['f0']...
Using dtype=None is a good trick if you don't know what your columns should be. If you already know what type they should have, you can give an explicit dtype. For example, in our test, we know that the first column is a string, the second an int, and we want the third to be a float. We would then use
>>> np.genfromtxt(StringIO(test), delimiter=",", dtype=("|S10", int, float))
array([('a', 1, 2.0), ('b', 3, 4.0)],
dtype=[('f0', '|S10'), ('f1', '<i8'), ('f2', '<f8')])
Using an explicit dtype is much more efficient than using dtype=None and is the recommended way.
In both cases (dtype=None or explicit, non-homogeneous dtype), you end up with a structured array.
[Note: With dtype=None, the input is parsed a second time and the type of each column is updated to match the larger type possible: first we try a bool, then an int, then a float, then a complex, then we keep a string if all else fails. The implementation is rather clunky, actually. There had been some attempts to make the type guessing more efficient (using regexp), but nothing that stuck so far]
If your data file is structured like this
col1, col2, col3
1, 2, 3
10, 20, 30
100, 200, 300
then numpy.genfromtxt can interpret the first line as column headers using the names=True option. With this you can access the data very conveniently by providing the column header:
data = np.genfromtxt('data.txt', delimiter=',', names=True)
print data['col1'] # array([ 1., 10., 100.])
print data['col2'] # array([ 2., 20., 200.])
print data['col3'] # array([ 3., 30., 300.])
Since in your case the data is formed like this
row1, 1, 10, 100
row2, 2, 20, 200
row3, 3, 30, 300
you can achieve something similar using the following code snippet:
labels = np.genfromtxt('data.txt', delimiter=',', usecols=0, dtype=str)
raw_data = np.genfromtxt('data.txt', delimiter=',')[:,1:]
data = {label: row for label, row in zip(labels, raw_data)}
The first line reads the first column (the labels) into an array of strings.
The second line reads all data from the file but discards the first column.
The third line uses a dictionary comprehension to create a dictionary that can be used very much like the structured array which numpy.genfromtxt creates using the names=True option:
print data['row1'] # array([ 1., 10., 100.])
print data['row2'] # array([ 2., 20., 200.])
print data['row3'] # array([ 3., 30., 300.])
data=np.genfromtxt(csv_file, delimiter=',', dtype='unicode')
It works fine for me.
For a dataset of this format:
CONFIG000 1080.65 1080.87 1068.76 1083.52 1084.96 1080.31 1081.75 1079.98
CONFIG001 414.6 421.76 418.93 415.53 415.23 416.12 420.54 415.42
CONFIG010 1091.43 1079.2 1086.61 1086.58 1091.14 1080.58 1076.64 1083.67
CONFIG011 391.31 392.96 391.24 392.21 391.94 392.18 391.96 391.66
CONFIG100 1067.08 1062.1 1061.02 1068.24 1066.74 1052.38 1062.31 1064.28
CONFIG101 371.63 378.36 370.36 371.74 370.67 376.24 378.15 371.56
CONFIG110 1060.88 1072.13 1076.01 1069.52 1069.04 1068.72 1064.79 1066.66
CONFIG111 350.08 350.69 352.1 350.19 352.28 353.46 351.83 350.94
This code works for my application:
import numpy

def ShowData(data, names):
    i = 0
    while i < data.shape[0]:
        print(names[i] + ": ")
        j = 0
        while j < data.shape[1]:
            print(data[i][j])
            j += 1
        print("")
        i += 1

def Main():
    print("The sample data is: ")
    fname = 'ANOVA.csv'
    csv = numpy.genfromtxt(fname, dtype=str, delimiter=",")
    num_rows = csv.shape[0]
    num_cols = csv.shape[1]
    names = csv[:,0]
    data = numpy.genfromtxt(fname, usecols=range(1, num_cols), delimiter=",")
    print(names)
    print(str(num_rows) + "x" + str(num_cols))
    print(data)
    ShowData(data, names)

Main()
Python-2 output:
The sample data is:
['CONFIG000' 'CONFIG001' 'CONFIG010' 'CONFIG011' 'CONFIG100' 'CONFIG101'
'CONFIG110' 'CONFIG111']
8x9
[[ 1080.65 1080.87 1068.76 1083.52 1084.96 1080.31 1081.75 1079.98]
[ 414.6 421.76 418.93 415.53 415.23 416.12 420.54 415.42]
[ 1091.43 1079.2 1086.61 1086.58 1091.14 1080.58 1076.64 1083.67]
[ 391.31 392.96 391.24 392.21 391.94 392.18 391.96 391.66]
[ 1067.08 1062.1 1061.02 1068.24 1066.74 1052.38 1062.31 1064.28]
[ 371.63 378.36 370.36 371.74 370.67 376.24 378.15 371.56]
[ 1060.88 1072.13 1076.01 1069.52 1069.04 1068.72 1064.79 1066.66]
[ 350.08 350.69 352.1 350.19 352.28 353.46 351.83 350.94]]
CONFIG000:
1080.65
1080.87
1068.76
1083.52
1084.96
1080.31
1081.75
1079.98
CONFIG001:
414.6
421.76
418.93
415.53
415.23
416.12
420.54
415.42
CONFIG010:
1091.43
1079.2
1086.61
1086.58
1091.14
1080.58
1076.64
1083.67
CONFIG011:
391.31
392.96
391.24
392.21
391.94
392.18
391.96
391.66
CONFIG100:
1067.08
1062.1
1061.02
1068.24
1066.74
1052.38
1062.31
1064.28
CONFIG101:
371.63
378.36
370.36
371.74
370.67
376.24
378.15
371.56
CONFIG110:
1060.88
1072.13
1076.01
1069.52
1069.04
1068.72
1064.79
1066.66
CONFIG111:
350.08
350.69
352.1
350.19
352.28
353.46
351.83
350.94
You can use numpy.recfromcsv(filename): the types of each column will be automatically determined (as if you used np.genfromtxt() with dtype=None), and the default is delimiter=",". It's basically a shortcut for the np.genfromtxt(filename, delimiter=",", dtype=None) call that Pierre GM pointed at in his answer.
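For instance, assuming a file data.txt with a header row such as label,col1,col2:

import numpy as np

data = np.recfromcsv('data.txt')
print(data['col1'])  # field names are taken from the header row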
Here is a working example, start to finish.
Say I want to import numbers from a file while skipping the first line:
I like trains    # this is the first line, a string
1 \t 2 \t 3      # \t signifies that the delimiter (separator) is a tab, not a comma
4 \t 5 \t 6
Then running the following code:
import numpy as np #contains genfromtxt
import matplotlib.pyplot as plt #enables plots
from pathlib import Path # easier using path instead of writing it again and again when you have many files in the same folder
path = r'some_path' #location of your file in your computer like r'C:my comp\folder\folder2' r is there to make the win 10 path readable in python, it means "just text"
fileNames = [r'\I like trains.txt',
r'\die potato.txt']
data=np.genfromtxt(path + fileNames[0], delimiter='\t', skip_header=1)
Produces this result:
data = [1 2 3
        4 5 6]
where each number has its own cell and can be reached separately
