I am quite sure that my arff files are correct, for that I have downloaded different files on the web and successfully opened them in Weka.
But I want to use my data in python, then I typed:
import arff
data = arff.load('file_path','rb')
It always returns an error message: Invalid layout of the ARFF file, at line 1.
Why this happened and how should I do to make it right?
If you change your code like in below, it'll work.
import arff
data = arff.load(open('file_path'))
Using scipy we can load arff data in python
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
Related
I created a csv file and used pandas.read_csv in order to access it. I used
euler_tens = pd.read_csv(filename)
and you can see that this table displays when I simply run the variable name. How do I simply save this image as a png file? This table is exactly what I needed, but now I can't seem to get that image, and I don't want to just screenshot it.
You can try with dataframe-image module
pip install dataframe-image
Example:
import pandas as pd
import dataframe_image as dfi
euler_tens = pd.read_csv(filename)
dfi.export(euler_tens, 'dataframe.png')
When loading a dataset into Jupyter, I know it requires lines of code to load it in:
from tensorflow.contrib.learn.python.learn.datasets import base
# Data files
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
# Load datasets.
training_set = base.load_csv_with_header(filename=IRIS_TRAINING,
features_dtype=np.float32,
target_dtype=np.int)
test_set = base.load_csv_with_header(filename=IRIS_TEST,
features_dtype=np.float32,
target_dtype=np.int)
So why is ther error NotFoundError: iris_training.csv
still thrown? I feel as though there is more to loading data sets on to jupyter, and would be grateful on any help on this topic
I'm following a course through AI adventures, and dont know how to add in the .csv file; the video mentions nothing about how to add it on.
Here is the link: https://www.youtube.com/watch?v=G7oolm0jU8I&list=PLIivdWyY5sqJxnwJhe3etaK7utrBiPBQ2&index=3
The issue is that you either need to use file's absolute path i.e C:\path_to_csv\iris_training.csv for windows and for UNIX/Linux /path_to_csv/iris_training.csv or you will need to place the file in your notebook workspace i.e directory that is being listed in your Jupyter UI which can be found at http://localhost:8888/tree Web UI. If you are having trouble finding the directory then just execute below python code and place the file in the printed location
import os
cwd = os.getcwd()
print(cwd)
Solution A
if you are working with python you can use python lib pandas to import your file .csv using:
import pandas as pd
IRIS_TRAINING = pd.read_csv("../iris_training.csv")
IRIS_TEST = pd.read_csv("../iris_test.csv")
Solution B
import numpy as np
mydata = np.genfromtxt(filename, delimiter=",")
Read More About
python-pandas
Read More About
python-Numpy
I have a .dat file and I want to generate the content in python, thus I use the following code:
import numpy as np
bananayte=np.fromfile("U04_banana-ytest.dat",dtype=float)
print(bananayte)
However, my initial data should be like "1.0000000e+00", while the output is like "1.39804066e-76". What happened? and what should I do to get the correct value? Thanks!
I want to load an ARFF file in python, then change some values of it and then save changes to file. I'm using LIAC-ARFF package (https://pypi.python.org/pypi/liac-arff). I loaded ARFF file with following lines of code:
import arff
data = arff.load(open(FILE_NAME, 'rb'))
After manipulating some values inside data, i want to write data to another ARFF file. Any solution?
Use the following code:
import arff
data = arff.load(open(FILE_NAME, 'rb'))
f = open(outputfilename, 'wb')
arff.dump(data, f)
f.close()
In the LICA-ARFF description you see dump method which serializes to a the file, but it's wrong. It just write object as text file. Serialize means save whole the object, so the output file is binary not a text file.
We can load arff data into python using scipy.
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
I met a DF file which is encoded in binary format. But when I open it using Vim, still I can see characters like "pandas.core.frame", "numpy.core.multiarray". So I guess it is related with Python. However I know little about the Python language. Though I have tried using pandas and numpy modules, I failed to read the file. Could you guys give any suggestion on this issue? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
Looks like DataFrame stored with pickle, use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')