OBJECTIVE
Using Jupyter notebooks, import a csv file for data manipulation
APPROACH
Import necessary libraries for statistical analysis (pandas, matplotlib, sklearn, etc.)
Import data set using pandas
Manipulate data
CODE
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
import pandas as pd
from sklearn.cluster import KMeans
data = pd.read_csv("../data/walmart-stores.csv")
print(data)
ERROR
OSError: File b'../data/walmart-stores.csv' does not exist
FOLDER STRUCTURE
Anaconda
env
kmean.ipynb
data
walmart-stores.csv
(other folders [for anaconda env])
(other folders)
QUESTION(S)
The error clearly states that the csv file cannot be found. I imagine it has to do with the project running in an Anaconda environment, but I thought this was the purpose of Anaconda environments in the first place. Am I wrong?
After answering the question, are there any other suggestions on how I should structure my Jupyter Notebooks when using Anaconda?
NOTES: I am new to Python, Anaconda, and Jupyter notebooks, so please disregard any naivety/stupidity. Thank you!
Fellow newbie here!
Try removing the "../" from your data location
Change
data = pd.read_csv("../data/walmart-stores.csv")
to
data = pd.read_csv("data/walmart-stores.csv")
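The reason this can help is that a relative path is resolved against the notebook's working directory, and "../" steps up one folder from there. Here is a minimal sketch of that resolution, assuming (this is only a guess from the folder listing) that kmean.ipynb and the data folder both sit inside env. The /Anaconda/env location and the resolve_from helper are hypothetical, for illustration only.

```python
import os
import posixpath

def resolve_from(base_dir, relative_path):
    """Return the normalized path that relative_path points to when
    base_dir is the working directory (POSIX-style paths)."""
    return posixpath.normpath(posixpath.join(base_dir, relative_path))

# Hypothetical layout: the notebook runs from /Anaconda/env and the
# data folder sits next to it, inside env.
notebook_dir = "/Anaconda/env"

with_dots = resolve_from(notebook_dir, "../data/walmart-stores.csv")
without_dots = resolve_from(notebook_dir, "data/walmart-stores.csv")

print(with_dots)     # points one level above env
print(without_dots)  # points inside env

# In a real notebook, this shows the directory that relative paths start from.
print(os.getcwd())
```

If os.getcwd() prints a folder other than the one you expected, adjust the relative path or pass an absolute path to read_csv.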
Related
Does anyone know the fix for the error "pd is not defined"? It seems to be combined with a dying kernel. I'm using Jupyter Notebook.
I started with these imports and didn't get an error message, so I assumed that pandas had been imported successfully:
import pandas as pd
import numpy as np
Tried updating my Python version to 3.11
This should not be a Python version problem. Since import pandas as pd did not fail, try running print(pd) to see where pandas is installed and verify that it is properly installed. It should print the module's location, ending in something like 'lib/python3.x/site-packages/pandas/__init__.py'.
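As a rough sketch of that check, the snippet below looks up where a module would be loaded from without importing it, and prints which interpreter the kernel is running. The module_location helper is a hypothetical name; it relies only on the standard library's importlib.util.find_spec.

```python
import importlib.util
import sys

def module_location(name):
    """Return the file a top-level module would be loaded from,
    or None if the current interpreter cannot find it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# The interpreter behind the running kernel; packages must be
# installed into this environment to be importable here.
print(sys.executable)

# Prints a path ending in pandas/__init__.py if pandas is installed
# for this interpreter, or None if it is not.
print(module_location("pandas"))
```

If the location is None, or the path belongs to a different environment than the one you installed into, the kernel is probably not using the interpreter you expect.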
I have recently taken up working on Python 3.6 implemented in RStudio using Miniconda and the library reticulate.
I have installed all the packages that I need using
reticulate::py_install('name of the packages')
When I try to import some .nc data as follows:
import numpy as np
import pandas as pd
import cartopy as cp
from cartopy import crs as ccrs, feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr
import openpyxl
import netCDF4
#%%
adt = xr.open_dataset("G:/Research/ADT.nc")
I get this error:
NameError: name 'netCDF4' is not defined
I have looked at possible solutions, like:
netCDF4 import not being found by Python
but I have installed the netCDF4 and xarray packages the same way as the others, which have not given any issues (so far).
I had this error in Jupyter Notebook, but then I restarted the kernel and ran it again, and it was fine. I installed netcdf4 using conda.
I'm having a hard time trying to figure out what is behind this error, as I'm new in programming.
When I try to read a netCDF4 file, the following error pops up:
File "/home/thiago/anaconda3/lib/python3.7/os.py", line 678, in __getitem__
raise KeyError(key) from None
KeyError: 'PROJ_LIB'
Trying to use the simple method as follows:
from netCDF4 import Dataset as dt
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap as bm
data = 'sst.wkmean.1981-1989.nc'
I have already uninstalled and reinstalled the netCDF4 package. As I'm inexperienced, I suspect the error comes from outdated packages (I'm not comfortable with the terminal), a bad installation directory, or something similar. I use the Deepin distribution of Linux, but I'm looking forward to switching to Ubuntu, since its community and tutorials are widespread on the internet.
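The traceback ends inside os.py's __getitem__, which is what a lookup like os.environ['PROJ_LIB'] raises when that environment variable is not set. A commonly reported cause is that the basemap import reads this variable, so the netCDF4 package itself may not be at fault. Here is a hedged sketch of a workaround that sets the variable before the import. The ensure_proj_lib helper is hypothetical, and the share/proj folder under sys.prefix is an assumption that may not match your installation.

```python
import os
import sys

def ensure_proj_lib(default_dir=None):
    """Set PROJ_LIB if it is missing and return its value.

    default_dir is a guess at where the PROJ data files live;
    <environment prefix>/share/proj is only an assumption."""
    if "PROJ_LIB" not in os.environ:
        if default_dir is None:
            default_dir = os.path.join(sys.prefix, "share", "proj")
        os.environ["PROJ_LIB"] = default_dir
    return os.environ["PROJ_LIB"]

print(ensure_proj_lib())

# Import basemap only after the variable is set, for example:
# from mpl_toolkits.basemap import Basemap as bm
```

If the printed folder does not exist on your machine, locate the actual proj data directory of your Anaconda installation and pass it as default_dir.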
Running the following code
import pandas as pd
import matplotlib.pylab as plt
df1=pd.read_csv("/Users/ee1.xlsx")
Getting an error back as
ValueError: too many values to unpack
I can't figure out what I'm doing wrong. I'm going out of my mind trying to resolve this. I'm running Spyder with Python 2.7 in Anaconda.
You have to open an Excel file using the following code:
import pandas as pd
df1=pd.read_excel("/Users/ee1.xlsx")
Excel files and csv files are different.
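To make the distinction concrete, here is a small sketch that picks the pandas reader from the file extension. The READERS table and the reader_name_for and load_table helpers are hypothetical names for illustration; read_csv and read_excel are the actual pandas functions. Reading .xlsx files also requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

# Map file extensions to the name of the pandas function that reads them.
READERS = {".csv": "read_csv", ".xlsx": "read_excel", ".xls": "read_excel"}

def reader_name_for(path):
    """Return the name of the pandas reader suited to the file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError("No reader registered for %r files" % suffix)
    return READERS[suffix]

def load_table(path):
    """Load a CSV or Excel file into a DataFrame using the matching reader."""
    import pandas as pd  # imported here so reader_name_for works without pandas
    return getattr(pd, reader_name_for(path))(path)

print(reader_name_for("/Users/ee1.xlsx"))
```

With this in place, load_table("/Users/ee1.xlsx") would go through read_excel rather than read_csv.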
EDIT: This is not solved by the suggested duplicate; reloading the module after editing the file doesn't help.
I have a python file "/home/Misc/misc_def.py" collecting some functions that I'm using in several ipython notebooks. The first cell in each notebook is
import csv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
sns.set_style('white')
from sys import path
path.append('/home/Misc')
import misc_def
However, the strange thing is that sometimes this works (the notebook can find the functions in the file) and sometimes it doesn't. I'm using notebooks in different folders, but I think this shouldn't matter since it's all absolute paths. The errors I get are standard for not finding functions; e.g.
NameError: name 'get_overlap_data' is not defined
Is there something unstable about the way I do it above?
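One possible explanation, assuming the functions are called by their bare names: import misc_def only binds the name misc_def, so get_overlap_data is reachable as misc_def.get_overlap_data, not on its own. The bare name would only work in a kernel where a from-import had been run earlier, which could look like "sometimes works". The sketch below reproduces this with a throwaway module; misc_demo and the temporary folder are stand-ins for misc_def and /home/Misc.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module in a temporary folder (stand-in for /home/Misc).
module_dir = tempfile.mkdtemp()
Path(module_dir, "misc_demo.py").write_text(
    "def get_overlap_data():\n    return 'ok'\n"
)
sys.path.append(module_dir)
importlib.invalidate_caches()

import misc_demo

# Works: the function is reached through the module name.
qualified = misc_demo.get_overlap_data()

# Fails: a plain "import misc_demo" does not define the bare function name.
try:
    get_overlap_data()
    bare_name_works = True
except NameError:
    bare_name_works = False

# After a from-import, the bare name is defined for the rest of the session.
from misc_demo import get_overlap_data
after_from_import = get_overlap_data()

print(qualified, bare_name_works, after_from_import)
```

If this matches your situation, either call the functions as misc_def.get_overlap_data(...) or import the names explicitly in the first cell of each notebook.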