Code in Python that converts .h5 files to .csv
I have a .h5 file that I need to convert to .csv, and this is what I have done:
#coding: utf-8
import numpy as np
import sys
import h5py
file = h5py.File('C:/Users/Sakib/Desktop/VNP46A1.A2020086.h01v01.001.2020087082319.h5','r')
a = list(file.keys())
np.savetxt(sys.stdout, file[a[0:]], '%g', ',')
But this generates an error saying 'list' object has no attribute 'encode'.
[P.S. I have not worked with the sys module before. Where will my new csv file be written, and under which name?]
First, you have a small error in the arrangement of the []: there is no need to create a list.
Also, sys.stdout depends on your process's standard output; for an interactive process it goes to the screen. If you want to capture the output, you should open a file and write to it. Finally, your formatting string ('%g') needs to match the data in the HDF5 dataset.
Try this:
h5f = h5py.File('C:/Users/.....h5', 'r')
for a in h5f.keys():
    outf = open('./save_' + a + '.txt', 'w')
    np.savetxt(outf, h5f[a][:], '%g', ',')   # index h5f (the open file), not file
    outf.close()   # note the (): close is a method call, not an attribute
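One caveat: in many satellite products (including VNP46A1), the top-level keys are HDF5 groups rather than datasets, in which case h5f[a][:] fails. Below is a hedged sketch, not part of the original answer, that walks the whole hierarchy with visititems and saves every 1-D or 2-D dataset it finds; the output naming scheme is my own choice:

import numpy as np
import h5py

def save_all_datasets(h5path):
    with h5py.File(h5path, 'r') as h5f:
        def visitor(name, obj):
            # Only 1-D or 2-D datasets (not groups) are suitable for np.savetxt.
            if isinstance(obj, h5py.Dataset) and obj.ndim in (1, 2):
                # Replace '/' in the HDF5 path so it forms a valid file name.
                outname = 'save_' + name.replace('/', '_') + '.csv'
                # '%g' assumes numeric data; string datasets need other handling.
                np.savetxt(outname, obj[:], fmt='%g', delimiter=',')
        h5f.visititems(visitor)

save_all_datasets('VNP46A1.A2020086.h01v01.001.2020087082319.h5')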
Related
How do I save each iteration as my file format without overwriting the previous iteration?
I am new to coding. I have a set of files in NIfTI format. I want to load them, apply a thresholding function, and save them again. I was able to write the few lines of code to do this for one file (it worked), but I have many, so I wrote a for loop in another Python file. It does everything fine except the last step: saving keeps overwriting, so in the end I only get one output file.

import numpy as np
import nibabel as nb
import glob
import os

path = 'subjects'
all_files = glob.glob(path + '/*.nii')
for filename in all_files:
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    nb.save(new_image, filename + 1)
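The overwrite described here happens because nb.save needs a distinct filename per iteration (and filename + 1 would actually raise a TypeError, since filename is a string). A minimal sketch of one fix, deriving a unique output name per input with os.path; the '_thresholded' suffix is my own choice:

import os
import glob
import nibabel as nb

path = 'subjects'
for filename in glob.glob(os.path.join(path, '*.nii')):
    image = nb.load(filename)
    data = image.get_fdata()
    data[data < 0.1] = 0
    new_image = nb.Nifti1Image(data, affine=image.affine, header=image.header)
    # Build a distinct output name from the input name instead of filename + 1.
    root, ext = os.path.splitext(filename)
    nb.save(new_image, root + '_thresholded' + ext)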
How to create a template for "could not convert string to float"?
Is there a way to test a CSV file for errors? For example, I have a CSV file downloaded from Kaggle. When I try to run it in Anaconda, it throws an error. a) How do you test files for string-to-float errors before you run them? b) Is there a way to set up a template to do this for all files moving forward? I have converted all text to numbers and it still throws an error. My code:

from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense

# load the dataset
dataset = loadtxt('data.csv', delimiter=',')

Here is the data.csv file as it appears in Notepad:

15,1,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
34,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
52,5,16,4,1,37,37,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
46,3,21,4,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42,3,23,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
51,3,17,6,1,34,3,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1
26,1,26,3,0,0,0,1,2,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,1,20,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0
44,3,15,0,1,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,3,26,4,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,1,17,3,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,4,14,6,0,0,0,1,10,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,2,25,2,0,0,0,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
43,2,18,5,0,0,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
40,3,18,2,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Certain CSV files from Kaggle and elsewhere seem to have encoding issues. Instead of opening the file with the default encoding ('utf-8'), use 'utf-8-sig':

dataset = loadtxt('data.csv', delimiter=',', encoding='utf-8-sig')

Once I create some code to scan for this prior to running it through a deep-learning algorithm, I will post it as a follow-up.
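Until that follow-up exists, here is a minimal sketch of such a pre-scan (my own addition, not the answerer's code): try float() on every cell and collect the ones that fail, so the offending token is located before the file ever reaches loadtxt.

import csv

def find_non_floats(path, encoding='utf-8-sig'):
    # Return (row, column, token) for every cell that float() rejects.
    bad = []
    with open(path, newline='', encoding=encoding) as f:
        for i, row in enumerate(csv.reader(f)):
            for j, token in enumerate(row):
                try:
                    float(token)
                except ValueError:
                    bad.append((i, j, token))
    return bad

print(find_non_floats('data.csv'))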
Failing to load an ARFF file in Python
I am quite sure that my ARFF files are correct; to check, I downloaded different files from the web and successfully opened them in Weka. But I want to use my data in Python, so I typed:

import arff
data = arff.load('file_path', 'rb')

It always returns an error message: Invalid layout of the ARFF file, at line 1. Why does this happen, and what should I do to make it right?
If you change your code as shown below, it will work:

import arff
data = arff.load(open('file_path'))
Using scipy, we can load ARFF data in Python:

from scipy.io import arff
import pandas as pd

data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
How can I fix the CSV import issue?
I am trying to import this data from a CSV file:

location scale
0.90109 0.63551
0.59587 0.65525
0.80460 0.64227
0.65178 0.65198
0.76307 0.64503
0.52575 0.65915
0.41322 0.66496
0.30059 0.67022
0.21620 0.67382
0.17404 0.67552
-0.05027 0.68363
-0.0782 0.68454

using this code:

test = []
import csv
f = open("data.csv")
for row in csv.reader(f):
    test.append(row)

But when I inspect the test variable, I am getting some \xa0 characters. Can you tell me how to fix this? All I want to do is perform some operations on the data after importing it into a variable.
Your input file appears to contain some non-breaking space characters (0xA0). Remove those from the file and try again.
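If you would rather strip the 0xA0 characters in code than edit the file by hand, here is a small sketch (it assumes the file decodes as latin-1, where byte 0xA0 is exactly the non-breaking space):

import csv

test = []
# latin-1 maps byte 0xA0 straight to U+00A0; change the encoding if your file differs.
with open('data.csv', encoding='latin-1') as f:
    for row in csv.reader(f):
        # Remove non-breaking spaces and surrounding whitespace from every cell.
        test.append([cell.replace('\xa0', '').strip() for cell in row])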
How to write in ARFF file using LIAC-ARFF package in Python?
I want to load an ARFF file in Python, change some of its values, and then save the changes back to a file. I'm using the LIAC-ARFF package (https://pypi.python.org/pypi/liac-arff). I loaded the ARFF file with the following lines of code:

import arff
data = arff.load(open(FILE_NAME, 'rb'))

After manipulating some values inside data, I want to write data to another ARFF file. Any solution?
Use the following code:

import arff

data = arff.load(open(FILE_NAME))   # text mode: liac-arff reads plain text, not bytes
f = open(outputfilename, 'w')
arff.dump(data, f)
f.close()

The LIAC-ARFF description says the dump method serializes to a file, but that wording is misleading: it simply writes the object out as text. "Serialize" suggests saving the whole object in binary form, yet the output is a plain-text ARFF file, not a binary one.
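For reference, the object liac-arff returns is a plain dict with 'description', 'relation', 'attributes', and 'data' keys, so the "manipulating some values" step can look like this (a sketch; the file names and the edited cell are hypothetical):

import arff

data = arff.load(open('input.arff'))   # 'input.arff' is a placeholder name
data['data'][0][0] = 42.0              # e.g. change the first cell of the first row
with open('output.arff', 'w') as f:
    arff.dump(data, f)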