Code in Python that converts .h5 files to .csv

I have a .h5 file that I need to convert to .csv, and this is what I have done:
#coding: utf-8
import numpy as np
import sys
import h5py
file = h5py.File('C:/Users/Sakib/Desktop/VNP46A1.A2020086.h01v01.001.2020087082319.h5','r')
a = list(file.keys())
np.savetxt(sys.stdout, file[a[0:]], '%g', ',')
But this generates an error saying 'list' object has no attribute 'encode'
[P.S. I have not worked with the sys module before. Where will my new CSV file be written, and under what name?]

First, there is a small error in the arrangement of the []: there is no need to create a list. Also, sys.stdout is your process's "standard output"; for an interactive process it goes to the screen. If you want to capture the output, create a file and write to it instead. Finally, your format string ('%g') needs to match the data in the HDF5 dataset.
Try this:
h5f = h5py.File('C:/Users/.....h5', 'r')
for a in h5f.keys():
    outf = open('./save_' + a + '.txt', 'w')
    np.savetxt(outf, h5f[a][:], '%g', ',')  # index into h5f, not `file`
    outf.close()  # note the (): without them the file is never actually closed
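If you are not sure that '%g' matches a given dataset, you can inspect each dataset's shape and dtype before writing. Here is a minimal sketch of that approach (the path is elided as in the question; it assumes the top-level keys are datasets rather than groups, and np.savetxt only handles 1-D and 2-D arrays):
import numpy as np
import h5py

with h5py.File('C:/Users/.....h5', 'r') as h5f:
    for name in h5f.keys():
        dset = h5f[name]
        print(name, dset.shape, dset.dtype)  # inspect before choosing a format
        # '%g' suits floating-point data; fall back to '%d' for integer datasets
        fmt = '%g' if np.issubdtype(dset.dtype, np.floating) else '%d'
        np.savetxt('save_' + name + '.csv', dset[:], fmt, ',')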

Related

How do I save each iteration as my file format without overwriting the previous iteration?

I am new to coding. I have a bunch of files in NIfTI format; I want to load them, apply a thresholding function, and save them. I wrote the few lines of code to do this to one file (it worked), but since I have many files, I created another Python file and tried to make a for loop. It seems to do everything fine, except the last step keeps overwriting the saved file, so in the end I only get one output file.
import numpy as np
import nibabel as nb
import glob
import os

path = 'subjects'
all_files = glob.glob(path + '/*.nii')
for filename in all_files:
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    nb.save(new_image, filename+1)
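One way to fix the last step, assuming you want one output file per input, is to derive a distinct output name from each input path (the '_thresh' suffix here is just an illustrative choice):
import glob
import os
import nibabel as nb

for filename in glob.glob(os.path.join('subjects', '*.nii')):
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    # e.g. subjects/sub01.nii -> subjects/sub01_thresh.nii, unique per input
    root, ext = os.path.splitext(filename)
    nb.save(new_image, root + '_thresh' + ext)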

How to create a template for "could not convert string to float"?

Is there a way to test a CSV file for errors before using it? For example, I have a CSV file downloaded from Kaggle; when I try to load it in Anaconda, it throws an error.
a) How do you test files before you run them for string to float errors?
b) Is there a way to set up a template to do this for all files moving forward?
Here is the text as shown in Notepad. I have converted all text to numbers, and it still throws an error.
My code:
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt('data.csv', delimiter=',')
data.csv file:
15,1,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
34,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
52,5,16,4,1,37,37,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
46,3,21,4,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42,3,23,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
51,3,17,6,1,34,3,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1
26,1,26,3,0,0,0,1,2,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,1,20,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0
44,3,15,0,1,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,3,26,4,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,1,17,3,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,4,14,6,0,0,0,1,10,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,2,25,2,0,0,0,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
43,2,18,5,0,0,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
40,3,18,2,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Certain CSV files from Kaggle and elsewhere begin with a UTF-8 byte-order mark (BOM), and that invisible BOM makes the first field unparsable as a float. Instead of opening the file with the plain 'utf-8' encoding, use 'utf-8-sig', which strips the BOM:
dataset = loadtxt('data.csv', delimiter=',', encoding='utf-8-sig')
Once I create some code that scans for this prior to running a deep-learning algorithm, I will post it as a follow-up.
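In the meantime, here is a minimal sketch of such a pre-scan (the file name, delimiter, and encoding match the example above; float() stands in for the conversion loadtxt performs):
def scan_csv_for_floats(path, delimiter=',', encoding='utf-8-sig'):
    """Report every cell that cannot be parsed as a float, with its location."""
    bad = []
    with open(path, encoding=encoding) as f:
        for lineno, line in enumerate(f, start=1):
            for colno, cell in enumerate(line.strip().split(delimiter), start=1):
                try:
                    float(cell)
                except ValueError:
                    bad.append((lineno, colno, repr(cell)))
    return bad

print(scan_csv_for_floats('data.csv'))  # an empty list means loadtxt should parse it cleanly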

fail to load arff file in python

I am quite sure that my ARFF files are correct; to check, I downloaded several files from the web and opened them successfully in Weka.
But I want to use my data in Python, so I typed:
import arff
data = arff.load('file_path','rb')
It always returns an error message: Invalid layout of the ARFF file, at line 1.
Why does this happen, and what should I do to fix it?
If you change your code as shown below, it will work:
import arff
data = arff.load(open('file_path'))
Alternatively, you can load ARFF data in Python using scipy:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
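One caveat with the scipy route: loadarff returns nominal attributes as byte strings. If you need ordinary strings, decode them; a small sketch, assuming the values are UTF-8 encoded:
# nominal (object) columns hold bytes after loadarff; decode them in place
str_cols = df.select_dtypes([object]).columns
df[str_cols] = df[str_cols].apply(lambda col: col.str.decode('utf-8'))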

How can I fix the CSV import issue?

I am trying to import this data from a CSV file
location   scale
 0.90109   0.63551
 0.59587   0.65525
 0.80460   0.64227
 0.65178   0.65198
 0.76307   0.64503
 0.52575   0.65915
 0.41322   0.66496
 0.30059   0.67022
 0.21620   0.67382
 0.17404   0.67552
-0.05027   0.68363
-0.0782    0.68454
Using this code:
import csv

test = []
f = open("data.csv")
for row in csv.reader(f):
    test.append(row)
But when I inspect test, I see some \xa0 characters in the values. Can you tell me how to fix this? All I want to do is perform some operations on the data after importing it into a variable.
Your input file appears to contain some non-breaking space characters (0xA0). Remove those from the file and try again.
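If editing the file by hand is not practical, you can also strip them while reading. A minimal sketch of that variant (the encoding argument is an assumption; adjust it to your file):
import csv

test = []
with open("data.csv", encoding="utf-8") as f:
    for row in csv.reader(f):
        # replace non-breaking spaces (U+00A0) and trim ordinary whitespace
        test.append([cell.replace('\xa0', ' ').strip() for cell in row])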

How to write in ARFF file using LIAC-ARFF package in Python?

I want to load an ARFF file in python, then change some values of it and then save changes to file. I'm using LIAC-ARFF package (https://pypi.python.org/pypi/liac-arff). I loaded ARFF file with following lines of code:
import arff
data = arff.load(open(FILE_NAME, 'rb'))
After manipulating some values inside data, I want to write the data out to another ARFF file. Any solution?
Use the following code:
import arff

data = arff.load(open(FILE_NAME, 'r'))  # text mode: liac-arff reads ARFF as text in Python 3
f = open(outputfilename, 'w')           # likewise, write in text mode rather than 'wb'
arff.dump(data, f)
f.close()
The LIAC-ARFF documentation describes the dump method as serializing to a file, but that wording is misleading: it simply writes the object back out as an ARFF text file. "Serialize" suggests saving the whole object in binary form, whereas the output here is plain text.
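As for the "manipulating some values" step: the object returned by load is a plain dict (keys 'description', 'relation', 'attributes', 'data'), so edits are ordinary list operations. A small sketch (the edited cell is purely illustrative):
import arff

with open(FILE_NAME) as f:
    data = arff.load(f)
data['data'][0][0] = 42.0  # illustrative: overwrite the first value of the first row
with open(outputfilename, 'w') as f:
    arff.dump(data, f)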
