How to load a data set into Jupyter Notebook


When loading a dataset into Jupyter, I know it requires lines of code to load it in:
import numpy as np
from tensorflow.contrib.learn.python.learn.datasets import base
# Data files
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
# Load datasets.
training_set = base.load_csv_with_header(filename=IRIS_TRAINING,
features_dtype=np.float32,
target_dtype=np.int)
test_set = base.load_csv_with_header(filename=IRIS_TEST,
features_dtype=np.float32,
target_dtype=np.int)
So why is the error NotFoundError: iris_training.csv still thrown? I feel as though there is more to loading datasets into Jupyter, and I would be grateful for any help on this topic.
I'm following a course through AI Adventures and don't know how to add the .csv file; the video mentions nothing about how to add it.
Here is the link: https://www.youtube.com/watch?v=G7oolm0jU8I&list=PLIivdWyY5sqJxnwJhe3etaK7utrBiPBQ2&index=3

The issue is that you need to either use the file's absolute path (for example C:\path_to_csv\iris_training.csv on Windows, or /path_to_csv/iris_training.csv on UNIX/Linux) or place the file in your notebook workspace, i.e. the directory listed in the Jupyter web UI at http://localhost:8888/tree. If you are having trouble finding that directory, run the Python code below and place the file in the printed location:
import os
cwd = os.getcwd()
print(cwd)
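A minimal sketch of that check, using only the standard library: it resolves a file name against the notebook's working directory and reports whether the file is actually there. The helper name locate is hypothetical, not part of any library.

```python
from pathlib import Path


def locate(filename):
    """Return (absolute_path, exists) for a name resolved against the working directory."""
    path = Path.cwd() / filename
    return path, path.is_file()


path, found = locate("iris_training.csv")
print(f"Looking for: {path}")
print("Found" if found else "Not found: copy the file here or pass its absolute path")
```

If the file is reported as not found, the loader is searching a different folder from the one the CSV was saved in.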

Solution A
If you are working with Python, you can use the pandas library to import your .csv file:
import pandas as pd
IRIS_TRAINING = pd.read_csv("../iris_training.csv")
IRIS_TEST = pd.read_csv("../iris_test.csv")
Solution B
import numpy as np
filename = "iris_training.csv"  # path to your CSV file
mydata = np.genfromtxt(filename, delimiter=",")

Related

how to open and read .nc files?

I am having trouble opening .nc files and converting them to .csv files; I still cannot get past the first part, reading them. I looked at two related links but could not work out how to open the files. I have written a piece of code and hit an error, which I post below. To elaborate on the error: the code is able to find the files but is not able to open them.
#from netCDF4 import Dataset # use scipy instead
from scipy.io import netcdf #### <--- This is the library to import.
import os
# Open file in a netCDF reader
directory = './'
#wrf_file_name = directory+'filename'
wrf_file_name = [f for f in sorted(os.listdir('.')) if f.endswith('.nc')]
nc = netcdf.netcdf_file(wrf_file_name,'r')
#Look at the variables available
nc.variables
#Look at the dimensions
nc.dimensions
And the error is:
Error: LAKE00000002-GloboLakes-L3S-LSWT-v4.0-fv01.0.nc is not a valid NetCDF 3 file

Calling a function in a different python file using Google Colab?

I'm working on a project for which I need to call functions from several python files to use in one main program. All of the programs in question are notebooks in the same directory in Google Colab. I am having trouble being able to call the functions I need and I haven't been able to find a solution that works. I've tried simply from InterpolateData import LoadandInterp where InterpolateData is the file name where the function LoadandInterp is stored. This is what I currently have:
from google.colab import files
import sys
sys.path.append( "/content/drive/My Drive/Colab Notebooks")
import InterpolateData
import numpy as np
import pandas as pd
from scipy.interpolate import griddata
#get, normalize and interpolate data
#SpectralHighData
temperatureList=np.arange(25.0,46.0,1.0)
interpList=np.arange(25.0,45.0,0.1)
pathBefore="/content/drive/My Drive/Colab Notebooks/Original Data/High Temperatures/Spectral_high/CdTe Spectra Interpolated "
pathAfter="C.csv"
interpolated=InterpolateData.LoadandInterp(temperatureList, interpList, pathBefore, pathAfter)
Everything that I've tried returns an error along the lines of ModuleNotFoundError: No module named 'InterpolateData'
Does anyone know a way I can get this to work? Surely, there is a way?
Edit: Before the previous code, I have code to mount my google drive and change the directory to where the files are stored. It looks like this:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
!ls "/content/drive/My Drive/Colab Notebooks"
%cd "/content/drive/My Drive/Colab Notebooks"

For anyone who stumbles across this in the future: I was eventually able to find a solution.
In order to import from another program, the program file must be a .py file (not a notebook), and I placed it in a folder that also contains a file called __init__.py. The __init__.py can be completely empty. (On Python 3, importing a module from the current directory generally works without it, but it does no harm.)
Once you have the files set up, change your directory to the folder with your 'module' program(s) using %cd 'filepath'. You can then import your module using import filename. The functions in your other program are now accessible through filename.function.
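The steps above can be sketched outside Colab as follows. This is only an illustration under stated assumptions: a temporary folder stands in for the Drive folder, and helper_module and double are hypothetical names invented for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the Drive folder; on Colab this would be the mounted notebooks folder.
module_dir = Path(tempfile.mkdtemp())

# A plain .py file (not a .ipynb notebook) holding the function to share.
# "helper_module" and "double" are hypothetical names used only for this demo.
(module_dir / "helper_module.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Make the folder importable, then import the module by its file name (no extension).
sys.path.append(str(module_dir))
importlib.invalidate_caches()
import helper_module

print(helper_module.double(21))  # prints 42
```

Appending the folder to sys.path has the same effect on imports as changing into it with %cd, without altering where relative data paths point.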

Is this the correct way to open a .csv file in Python?

I've tried so many different ways and it is still saying file not found.
I'm running the code in a Jupyter Notebook.
I'd rather read the file from wherever it is. Its location is shown in the path below.
Is the code below correct?
import numpy
numpy.loadtxt(fname='C:\Desktop\swc-python\data\inflammation-01.csv', delimiter=',')
Also tried this but it did not work:
import numpy
numpy.fname = ('C:\Desktop\swc-python\data\small-01.csv')
openfname = open(fname,'r')
Also, can you save a Jupyter notebook in the same directory as the data?

Here are some examples with pandas and os that may help.
They use forward slashes rather than backslashes. Either pass the full path:
import pandas as pd
pd.read_csv("C:/Users/<Insert your user>/Desktop/code/Exercise Files/us_baby_names.csv")
or change the current directory first:
# import os and pandas library
import os
import pandas as pd
# show current working directory, change it, show it again
os.getcwd()
os.chdir('C:/Users/<Insert your user>/Desktop/code/Exercise Files/')
os.getcwd()
pd.read_csv("us_baby_names.csv")

How to read a datafile from another folder without giving full path in python?

In my Google Drive I have a folder called Colab Notebooks/data/. How can I add this path to the system so that I don't have to give the full path of the data file?
My attempt:
from google.colab import drive
drive.mount('/content/drive')
dat_dir = 'drive/My Drive/Colab Notebooks/data/'
# read data
import pandas as pd
pd.read_csv(dat_dir + 'titanic_kaggle_train.csv') # this works
I want this
pd.read_csv('titanic_kaggle_train.csv') # Here, I dont have full path
I tried
import sys
sys.path.append(dat_dir) # did not work
# another attempt (did not work)
!export PYTHONPATH=$HOME/drive/My\ Drive/Colab\ Notebooks/data/:$PYTHONPATH
Question
How can this command work?
pd.read_csv('titanic_kaggle_train.csv')

Is this what you are looking for? Change the working directory using os.chdir:
import os
os.chdir(dat_dir)
pd.read_csv('titanic_kaggle_train.csv')

You can also use a one-line magic.
%cd drive/My\ Drive/Colab\ Notebooks/data/
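Both answers change the working directory for the rest of the session. If that is not wanted, one possible variation is to change it only for the duration of a block. The sketch below assumes a hypothetical helper named working_directory and uses a throwaway folder in place of the Drive data folder (Python 3.11 and later also ship contextlib.chdir for the same purpose).

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Demo with a throwaway folder standing in for the data directory.
data_dir = tempfile.mkdtemp()
Path(data_dir, "example.csv").write_text("a,b\n1,2\n")

with working_directory(data_dir):
    # Inside the block a bare file name resolves against data_dir.
    with open("example.csv") as f:
        header = f.readline().strip()

print(header)
```

Inside such a block, pd.read_csv('titanic_kaggle_train.csv') would resolve against the data folder in the same way.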

Python Jupyter Notebook - Unable to open CSV file through a path

I am a little new to Python, and I have been using the Jupyter Notebook through Anaconda. I am trying to import a csv file to make a DataFrame, but I am unable to import the file.
Here is an attempt using the local method:
df = pd.read_csv('Workbook1')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-11-a2deb4e316ab> in <module>()
----> 1 df = pd.read_csv('Workbook1')
After that I tried using the path (I put user for my username)
df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-3f2bedd6c4de> in <module>()
----> 1 df = pd.read_csv('Users/user/Desktop/Workbook1.csv')
I am using a Mac, which I am also new to, and I am not 100% sure that I am specifying the right path. Can anyone offer some insight or solutions that would allow me to open this csv file?

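Instead of passing the full path each time, you can set the working directory using the code below (replace D:/dataset with the folder that holds your file):
import os
import pandas as pd
os.chdir("D:/dataset")
data = pd.read_csv("workbook1.csv")
This should work, provided the file is in that directory.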

Are you sure that the file exists at the location you are passing to the pandas read_csv function? You can check with Python's built-in os module:
import os
os.path.isfile('/Users/user/Desktop/Workbook1.csv')
Another way to check whether the file is in the current working directory of a Jupyter notebook is to run ls -l in a cell:
ls -l
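One thing such a check would reveal for the path in the question: 'Users/user/Desktop/Workbook1.csv' has no leading slash, so it is treated as relative to the working directory. The sketch below uses posixpath so it behaves the same on any platform; the working directory shown is made up for illustration.

```python
import posixpath

# The path from the question has no leading slash, so it is relative.
relative = "Users/user/Desktop/Workbook1.csv"
absolute = "/Users/user/Desktop/Workbook1.csv"

print(posixpath.isabs(relative))   # False
print(posixpath.isabs(absolute))   # True

# A relative path is joined onto the working directory before it is opened.
# "/Users/user/notebooks" is a made-up working directory for illustration.
resolved = posixpath.join("/Users/user/notebooks", relative)
print(resolved)
```

The resolved path points inside the notebook folder rather than at the Desktop, which would explain the FileNotFoundError.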

I think the problem is probably in the location of the file:
df1 = pd.read_csv('C:/Users/owner/Desktop/contacts.csv')
Having done that, you can start exploring the file, even a large one, by previewing its first rows with:
df1.head()

Python's os module provides functions for interacting with the operating system. It is part of Python's standard library.
import os
import pandas as pd
os.chdir(r"c:\Pandas")
df=pd.read_csv("names.csv")
df
This might help. :)

The file name is case sensitive, so check your case.
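Whether case matters depends on the file system, so it can help to list what is really in the folder. A small sketch, assuming a hypothetical helper named case_insensitive_matches, that finds files whose names differ from the requested one only by case:

```python
from pathlib import Path


def case_insensitive_matches(directory, filename):
    """List files in `directory` whose names equal `filename` when case is ignored."""
    wanted = filename.lower()
    return [
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower() == wanted
    ]


# Example: look in the current working directory for any spelling of workbook1.csv.
print(case_insensitive_matches(".", "workbook1.csv"))
```

If the list contains a name such as Workbook1.csv, pass that exact spelling to read_csv.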

I had the same problem on a Mac, and for some reason it only happened to me there. I tried many tricks but nothing worked. I recommend going directly to the file in Finder, right-clicking it and then holding the Option (alt) key; an option to copy the file's path (Copy "<file name>" as Pathname) will appear. Paste that path into your Jupyter notebook. For some reason that worked for me.

I believe the issue is that you're not using fully qualified paths. Try this:
Move the data into a suitable project directory. You can do this with the %%bash cell magic.
%%bash
mkdir -p /project/data/
cp data.csv /project/data/data.csv
You can then read the file directly:
with open("/project/data/data.csv", "r") as f:
    print(f.read())
But it is probably more useful to load it with a library:
import pandas as pd
data = pd.read_csv("/project/data/data.csv")
I’ve created a runnable Jupyter notebook with more details here: Jupyter Basics: Reading Files.

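Try double quotes instead of single quotes; it worked for me. (Python treats both kinds of quotes the same way, so retyping the path may have fixed a different typo.)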

You can open csv files in a Jupyter notebook by following these steps:
Step 1 - Create a directory or folder (you can also use an existing folder).
Step 2 - Change your Jupyter working directory to that directory:
import os
os.chdir('D:/datascience/csvfiles')
Step 3 - Your working directory in Jupyter Notebook is now changed. Store your file(s) in that directory.
Step 4 - Open your file:
import pandas as pd
df = pd.read_csv("workbook1.csv")
Your file is now read and stored in the DataFrame variable df. You can display its content as follows:
df.head() - displays the first five rows of the file
df - displays all rows of the file
Happy Data Science!

I had a similar problem while reading a CSV file from my computer in a Jupyter notebook.
I solved it by replacing the "\" symbol with "/" in the path, like this.
This is what I had:
"C:\Users\RAJ\Desktop\HRPrediction\HRprediction.csv"
This is what I changed it to:
"C:/Users/RAJ/Desktop/HRPrediction/HRprediction.csv".
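The reason this helps is that backslashes start escape sequences in ordinary Python string literals. In Python 3 a literal containing \Users is even rejected with a SyntaxError, because \U begins a Unicode escape. The sketch below shows the silent variant of the problem with a made-up path where \t and \n are valid escapes.

```python
# In a normal string literal, "\t" is a tab and "\n" is a newline, so a Windows
# path containing such sequences is silently altered. The path is made up.
broken = "C:\temp\new_data.csv"
raw = r"C:\temp\new_data.csv"      # raw string: backslashes kept literally
forward = "C:/temp/new_data.csv"   # forward slashes: nothing to escape

print(len(broken), len(raw), len(forward))
print("\t" in broken, "\n" in broken)
```

The broken string is two characters shorter than the others because each escape sequence collapsed into a single control character.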

This is what worked for me. I am using macOS.
Save your CSV in a separate folder on your desktop.
When the Jupyter file browser opens, navigate to the folder where your dataset is saved, then click New in the upper right-hand corner to create a notebook there.
After opening the new notebook, code as usual and read your data with import pandas as pd and pd.read_csv, passing your dataset's file name.

No need to use anything extra: just put r in front of the path string. This makes it a raw string, so backslashes are not treated as escape characters.
df = pd.read_csv(r'C:\Users\owner\Desktop\contacts.csv')
