having trouble importing CSV - python

I usually import CSV files from AWS S3, which is really easy. Now I'm trying to import a CSV file straight from my local file system and I keep getting an error. Does this work differently?
import pandas as pd
month = pd.read_csv("12Month.csv")
print(month)
Error: [Errno 2] No such file or directory: '12Month.csv'

You probably will have to state the full path.
import pandas as pd
month = pd.read_csv(r"C: etc... 12Month.csv")
or use:
from pathlib import Path
wdir = Path.cwd()
filename = Path(wdir, "12Month.csv")
month = pd.read_csv(filename)
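The error means the relative name was looked up in the current working directory and nothing was found there. Below is a minimal sketch for checking this before calling pd.read_csv. The helper name locate_csv is hypothetical, and only the standard library is used:

```python
from pathlib import Path


def locate_csv(name, base=None):
    """Return the absolute path of `name` under `base` (default: the current
    working directory), or raise with a message listing the CSVs found there."""
    base = Path(base) if base is not None else Path.cwd()
    candidate = (base / name).resolve()
    if candidate.is_file():
        return candidate
    # Listing what is actually there usually reveals a typo or a wrong folder.
    available = sorted(p.name for p in base.glob("*.csv"))
    raise FileNotFoundError(
        f"{name!r} not found in {base}; CSV files there: {available}"
    )
```

It could then be used as pd.read_csv(locate_csv("12Month.csv")), so that a failure reports which directory was searched.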

Related

List only one file in directory as string

I'm trying to code a program with Python 3.7.2 that takes values out of one XML file and copies them into another XML file.
With my current code this only works with 1 file in a folder but I need it to also work with multiple files in one directory. Right now when executing my code, I get the following error in VS-Code:
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\app_python/in_spsh/ZAA.S0333055.500.CCT.000000153774.O.0001ZAA.S0444055.500.CCT.000000153774.O.0001'
This is due to me having two files in the same directory. In this case 'ZAA.S0444055.500.CCT.000000153774.O.0001' and 'ZAA.S0333055.500.CCT.000000153774.O.0001'
I'm using fnmatch.filter to set the name of a file as a string, but with my current code this won't work with multiple files.
Here is my code:
import shutil
import os
import time
import fnmatch
import copy
from xml.etree import ElementTree as ET
import datetime
import sys
dt = datetime.datetime.now().strftime("%Y/%m/%d %H-%M-%S")
sys.stdout = open('H:\\app_python/log/log.txt', 'w+')
while len(os.listdir('H:\\app_python/in_spsh/')) == 0:
    print("{} > No file found in 'in_spsh'. Sleeping...".format(dt))
    time.sleep(5)
else:
    filename_old = fnmatch.filter(os.listdir('H:\\app_python/in_spsh/'), 'ZAA.*.0001')
    filename_string = ''.join(filename_old)
    print("{} > File '{}' acquired".format(dt, filename_string))
Does anyone know what I need to change in order to make this code work with multiple files?
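The path in the error is two file names glued together, because ''.join() concatenates every match that fnmatch.filter returns. One possible approach is to handle each match separately. This is only a sketch: matching_files and process_all are hypothetical names, and process_file stands in for whatever XML handling the program does per file.

```python
import fnmatch
import os


def matching_files(folder, pattern="ZAA.*.0001"):
    """Return the full path of every file in `folder` whose name matches `pattern`."""
    names = fnmatch.filter(os.listdir(folder), pattern)
    return [os.path.join(folder, name) for name in sorted(names)]


def process_all(folder, process_file):
    """Call `process_file(path)` once per matching file; return how many were handled."""
    paths = matching_files(folder)
    for path in paths:
        process_file(path)
    return len(paths)
```

With this shape, each path passed to process_file names exactly one file, no matter how many files are waiting in the folder.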

Pandas,(Python) -> Export to xlsx with multiple sheets

I'm trying to read some .xlsx files from a directory that was created earlier using the current timestamp, where the files are stored. Now I want to read those .xlsx files and put them into a single .xlsx file with multiple sheets. I tried multiple ways and none of them worked. I tried:
Final file: Usage-SvnAnalysis.xlsx
The script I tried:
import pandas as pd
import numpy as np
from timestampdirectory import createdir
import os
dest = createdir()
dfSvnUsers = pd.read_csv(dest, "SvnUsers.xlsx")
dfSvnGroupMembership = pd.read_csv(dest, "SvnGroupMembership.xlsx")
xlwriter = pd.ExcelWriter("Usage-SvnAnalysis.xlsx")
dfSvnUsers.to_excel(xlwriter, sheet_name='SvnUsers', index = False )
dfSvnGroupMembership.to_excel(xlwriter, sheet_name='SvnGroupMembership', index = False )
xlwriter.close()
The folder that is created automatically with the current timestamp and contains the files.
This is one of the files that I want to add as a sheet in the final xlsx.
This is how I create the directory with the current time and return dest to export the files to.
I changed the script a bit. This is how it looks now, but I'm still getting an error:
File "D:\Py_location_projects\testfi\Usage-SvnAnalysis.py", line 8, in <module>
with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'SvnGroupMembership.xlsx'
The files exist, but the script can't find the root path to that directory, because I create the directory in another script using a timestamp and return the path as dest.
dest = createdir() is the path where the files are. All I need to do is access dest, read the files from there, and export them into a single xlsx as sheets, in this case sheet1 and sheet2, because I only tried to read 2 files from that directory.
import pandas as pd
import numpy as np
from timestampdirectory import createdir
import os
dest = createdir()
files = os.listdir(dest)
for file in files:
    with open(file, 'r') as f:
        dfSvnUsers = open(os.path.join(dest, 'SvnUsers.xlsx'))
        dfSvnGroupMembership = open(os.path.join(dest, 'SvnGroupMembership.xlsx'))
xlwriter = pd.ExcelWriter("Usage-SvnAnalysis.xlsx")
dfSvnUsers.to_excel(xlwriter, sheet_name='SvnUsers', index = False )
dfSvnGroupMembership.to_excel(xlwriter, sheet_name='SvnGroupMembership', index = False )
xlwriter.close()
I think you should read the Excel files with pd.read_excel instead of pd.read_csv, and join dest with each file name.
import os
import pandas as pd
dfSvnUsers = pd.read_excel(os.path.join(dest, "SvnUsers.xlsx"))
dfSvnGroupMembership = pd.read_excel(os.path.join(dest, "SvnGroupMembership.xlsx"))
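To cover every workbook in dest rather than two hard-coded names, the paths can be collected first and then written out one sheet per file. The following is a sketch under assumptions: sheet_sources and combine_workbooks are hypothetical helpers, each input workbook is assumed to have its data on the first sheet, and pandas needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path


def sheet_sources(dest):
    """Map a sheet name (file stem, cut to Excel's 31-character limit)
    to each .xlsx path found directly in `dest`."""
    return {p.stem[:31]: p for p in sorted(Path(dest).glob("*.xlsx"))}


def combine_workbooks(dest, output="Usage-SvnAnalysis.xlsx"):
    """Write the first sheet of every workbook in `dest` into one output workbook."""
    import pandas as pd  # imported here so sheet_sources() works without pandas

    sources = sheet_sources(dest)
    if not sources:
        raise FileNotFoundError(f"no .xlsx files found in {dest}")
    # The context manager saves and closes the output file on exit.
    with pd.ExcelWriter(output) as writer:
        for sheet_name, path in sources.items():
            pd.read_excel(path).to_excel(writer, sheet_name=sheet_name, index=False)
```

Calling combine_workbooks(createdir()) would then produce one sheet per input file. Keep the output file outside dest, otherwise a second run would pick it up as an input.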

VS Code: Is there a way to read a csv file without needing specification of the full path?

I am trying to read data from a csv file (in the same folder as my main.py) but it seems that
Visual Studio Code doesn't understand the project folder or something of the sort
FileNotFoundError: [Errno 2] No such file or directory: 'ratings.csv'
Here is my code
import numpy as np
import pandas as pd
# read data with panda, only the columns that are needed
r_cols = ['user_id', 'movie_id', 'rating']
ratings = pd.read_csv('ratings.csv', sep=';', names=r_cols, usecols=[1, 2, 3], encoding="ISO-8859-1", low_memory=False, header=0)
Adding the full path of the file fixes the problem, and it also works if I add
import os with os.chdir in the beginning of the code.
But PyCharm doesn't need the above tweaks in order to run it. So my question remains, is there a VSCode setting that I am missing?
I got the same issue and I solved it by doing this:
import pandas as pd
df = pd.read_csv('Pandas/sample.csv')
print(df)
As people mentioned in the comments, you can set the working directory used for debugging in VS Code. Add the following setting to "launch.json"; it makes the debugger switch to the directory of the file being run before executing it:
"cwd": "${fileDirname}",
import os
def infolder_file(filename):
    afname = os.path.abspath(__file__)
    current_folder = os.path.dirname(afname)
    uf = os.path.join(current_folder, filename)
    return uf
print(infolder_file('anyfilename.txt'))
You can define a constant for the directory at the top of your module that you then use with any files you need to access.
from pathlib import Path
DIRNAME = Path(__file__).parent
def func():
    fn = DIRNAME / 'file.suffix'

FileNotFoundError while importing a csv file using pandas in Jupyter notebook

import pandas as pd
df = pd.read_csv('/home/josepm/Documents/test_ver2.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-5cd7fd573fb7> in <module>()
1 import pandas as pd
----> 2 df = pd.read_csv('/home/josepm/Documents/test_ver2.csv')
I try to import a CSV file using pandas and every time it says that it doesn't find the file. It's like Jupyter doesn't see it. I tried to do this:
import os
os.path.isfile('/home/josepm/Documents/test_ver2.csv')
and it doesn't see the file either.
Change
pd.read_csv('\Users\user\Desktop\Workbook1.csv')
to
pd.read_csv(r'C:\Users\user\Desktop\Workbook1.csv')
Please try the following code:
import os
path = os.path.abspath(r'file path')
f = open(path)
print(f)
The working directory is the point from where all the files are accessed in Jupyter Notebook.
Find the current working directory
import os
os.getcwd()
Example output: 'C:\Users\xyz'
Now place your CSV files in this path
List the contents of your directory to check if the CSV file is present
os.listdir(r'C:\Users\xyz')
Now try reading the CSV file
Copy the file directory, then paste it here:
pd.read_csv(r'(here)\(csv file name including .csv)').
for example:
pd.read_csv(r'C:\Users\DCL\ML and related\Bengaluru_House_Data.csv')
Hope this will work.
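When os.path.isfile returns False for a path that looks right, it helps to find out how much of the path actually exists. Here is a small sketch using only the standard library; deepest_existing is a hypothetical helper name.

```python
from pathlib import Path


def deepest_existing(path):
    """Return the longest leading part of `path` that exists on disk,
    or None if not even its root or drive exists."""
    path = Path(path).expanduser()
    for candidate in [path, *path.parents]:
        if candidate.exists():
            return candidate
    return None
```

For example, if deepest_existing('/home/josepm/Documents/test_ver2.csv') returns the Documents folder, the folder is fine and the file name is the part to check; listing that folder with os.listdir may show a different spelling or extension.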

How to open my files in data_folder with pandas using relative path?

I'm working with pandas and need to read some csv files, the structure is something like this:
folder/folder2/scripts_folder/script.py
folder/folder2/data_folder/data.csv
How can I open the data.csv file from the script in scripts_folder?
I've tried this:
absolute_path = os.path.abspath(os.path.dirname('data.csv'))
pandas.read_csv(absolute_path + '/data.csv')
I get this error:
File folder/folder2/data_folder/data.csv does not exist
Try
import pandas as pd
pd.read_csv("../data_folder/data.csv")
Pandas resolves a relative path against the current working directory, which is the folder of your Python file only if you run the script from there. In that case you can move from the current directory to where your data is located with '..'
For example:
pd.read_csv('../../../data_folder/data.csv')
Will go 3 levels up and then into a data_folder (assuming it's there)
Or
pd.read_csv('data_folder/data.csv')
assuming your data_folder is in the same directory as your .py file.
You could use the __file__ attribute:
import os
import pandas as pd
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "../data_folder/data.csv"))
For non-Windows users:
import pandas as pd
import os
os.chdir("../data_folder")
df = pd.read_csv("data.csv")
For Windows users:
import pandas as pd
df = pd.read_csv(r"C:\data_folder\data.csv")
The prefix r makes the path a raw string, so the backslashes are kept literally and do not have to be escaped by hand.
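A short sketch of what the r prefix changes. The path C:\temp\new.csv is a made-up example chosen because \t and \n are escape sequences:

```python
# Three spellings of the same Windows path.
escaped = "C:\\data_folder\\data.csv"  # each backslash escaped by hand
raw = r"C:\data_folder\data.csv"       # raw string: backslashes kept literally
forward = "C:/data_folder/data.csv"    # Python on Windows also accepts forward slashes

print(escaped == raw)

# Without the r prefix, "\t" is read as a tab and "\n" as a newline,
# so the string no longer contains the intended path.
broken = "C:\temp\new.csv"
intended = r"C:\temp\new.csv"
print(len(broken), len(intended))
```

The two lengths differ because each escape sequence collapses into a single character in the non-raw string.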
# script.py
import os
current_file = os.path.abspath(os.path.dirname(__file__))  # folder/folder2/scripts_folder
# csv_filename
csv_filename = os.path.join(current_file, '../data_folder/data.csv')
Keeping things tidy with f-strings:
import os
import pandas as pd
data_files = '../data_folder/'
csv_name = 'data.csv'
pd.read_csv(f"{data_files}{csv_name}")
Whether you use Python's built-in file functions or pd.read_csv, relative paths are looked up in the current working directory, which by default is the directory where the Python process was started. So you can use the os module's chdir() to switch to the data directory and read from there.
import pandas as pd
import os
print(os.getcwd())
os.chdir("D:/01Coding/Python/data_sets/myowndata")
print(os.getcwd())
df = pd.read_csv('data.csv',nrows=10)
print(df.head())
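Note that os.chdir changes the working directory for the rest of the program, which can break relative paths used later. One way around that is to switch only temporarily. This is a sketch; working_directory is a hypothetical helper (Python 3.11 and later ship a similar contextlib.chdir).

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring the old one on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body raises, so the program never stays in `path`.
        os.chdir(previous)
```

Inside a block such as "with working_directory(some_data_folder):" a call like pd.read_csv('data.csv', nrows=10) reads from that folder, and the previous directory is back in effect afterwards.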
If you want to keep your code tidy, then I would suggest assigning the path and the file name separately and then reading the file. Note the "/" between the two parts:
import pandas as pd
path = 'C:/Users/username/Documents/folder'
file_name = 'file_name.xlsx'
file = pd.read_excel(f"{path}/{file_name}")
I was also looking for the relative path version, and this works OK. Note that when run (Spyder 3.6) you will see (unicode error) 'unicodeescape' codec can't decode bytes at the closing triple quote, because the docstring contains C:\Users, where \U starts a Unicode escape. Remove the offending comment lines 14 and 15, adjust the file names and location for your environment, and check the indentation.
# -*- coding: utf-8 -*-
"""
Created on Fri Jan 24 12:12:40 2020
Source:
Read a .csv into pandas from F: drive on Windows 7
Demonstrates:
Load a csv not in the CWD by specifying relative path - windows version
@author: Doug
From CWD C:\Users\Doug\.spyder-py3\Data Camp\pandas we will load file
C:/Users/Doug/.spyder-py3/Data Camp/Cleaning/g1803.csv
"""
import csv
trainData2 = []
with open(r'../Cleaning/g1803.csv', 'r') as train2Csv:
    trainReader2 = csv.reader(train2Csv, delimiter=',', quotechar='"')
    for row in trainReader2:
        trainData2.append(row)
print(trainData2)
You can always point to your home directory using ~ then you can refer to your data folder.
import pandas as pd
df = pd.read_csv("~/mydata/data.csv")
For your case, it should be like this
import pandas as pd
df = pd.read_csv("~/folder/folder2/data_folder/data.csv")
You can also set your data directory as a prefix
import pandas as pd
DATA_DIR = "~/folder/folder2/data_folder/"
df = pd.read_csv(DATA_DIR+"data.csv")
You can take advantage of f-strings as @nikos-tavoularis said
import pandas as pd
DATA_DIR = "~/folder/folder2/data_folder/"
FILE_NAME = "data.csv"
df = pd.read_csv(f"{DATA_DIR}{FILE_NAME}")
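Other functions, such as the built-in open(), do not expand ~ themselves. A minimal sketch that expands the home directory explicitly with pathlib, reusing the folder layout from the question (the variable names are only illustrative):

```python
from pathlib import Path

# expanduser() replaces the leading "~" with the current user's home directory.
DATA_DIR = Path("~/folder/folder2/data_folder").expanduser()
csv_path = DATA_DIR / "data.csv"

print(csv_path)
```

The resulting csv_path can be passed to pd.read_csv, open(), or any other function that takes a path.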
import pandas as pd
df = pd.read_csv('C:/data_folder/data.csv')
The question "Reading file using relative path in python project" answers it.
Basically, using Path from pathlib, you'll do the following in script.py:
import pandas as pd
from pathlib import Path
path = Path(__file__).parent / "../data_folder/data.csv"
pd.read_csv(path)
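The same idea can be wrapped in a small function that also normalizes the ".." away, which makes the final path easier to read in error messages. This is a sketch; data_file is a hypothetical helper name.

```python
from pathlib import Path


def data_file(script_path, *relative_parts):
    """Build an absolute, normalized path that starts from the folder
    containing `script_path`, independent of the current working directory."""
    script_dir = Path(script_path).resolve().parent
    return script_dir.joinpath(*relative_parts).resolve()
```

In script.py it could be called as data_file(__file__, "..", "data_folder", "data.csv"), and the result passed to pd.read_csv.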
You can try with this.
import pandas as pd
df = pd.read_csv(r"E:\working datasets\sales.csv")
print(df.head())
import os
s_path = os.path.abspath(__file__)
# s_path = ".../folder/folder2/scripts_folder/script.py"
s_path = s_path.split(os.sep)
print(s_path)
# [..., 'folder', 'folder2', 'scripts_folder', 'script.py']
d_path = s_path[:-2] + ['data_folder', 'data.csv']
print(os.sep.join(d_path))
# .../folder/folder2/data_folder/data.csv
You can use . to refer to the current working directory and .. to refer to its parent.
# Linux
df = pd.read_csv("../data_folder/data.csv")
# Windows
df = pd.read_csv("..\\data_folder\\data.csv")
Try this:
Open a new terminal window.
Drag and drop the file (that you want Pandas to read) in that terminal window.
This will return the full address of your file in a line.
Copy and paste that line into read_csv command as shown here:
import pandas as pd
pd.read_csv("the path returned by terminal")
That's it.
Just replace your "/" with "\\"
