how to show NewsAggregatorDataset using read_csv - python

I created the following code and want to see the head of the dataset, but it doesn't work.
I downloaded the dataset from here and put it in the project folder.
Any help is appreciated, thanks.
import pandas as pd
from pathlib import Path
path_csv = Path('NewsAggregatorDataset/newsCorpora.csv').absolute()
data = pd.read_csv(path_csv)
print(data.head())

You can try it this way with the os module, if the NewsAggregatorDataset directory sits next to your script:
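import pandas as pd
import os
# build the path from the script's own folder, not from the current working directory
path_csv = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'NewsAggregatorDataset', 'newsCorpora.csv')
data = pd.read_csv(path_csv)
print(data.head())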
The lines below also show how to get the current directory and the parent directory:
import os
current_directory = os.path.abspath(os.path.dirname(__file__)) # get current directory
parent_directory = os.path.abspath(current_directory + "/../") # get parent directory
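A minimal pathlib sketch of the same idea, resolving the CSV relative to the script instead of the current working directory. It assumes, as the dataset's readme describes, that newsCorpora.csv is tab-separated with no header row; the column names follow that readme, so check them against your copy. `load_news_corpus` and `script_dir` are hypothetical helper names.

```python
import csv
from pathlib import Path
from typing import Optional

import pandas as pd

# Column names as listed in the dataset's readme (assumption: verify against your copy)
NEWS_COLUMNS = ["ID", "TITLE", "URL", "PUBLISHER", "CATEGORY", "STORY", "HOSTNAME", "TIMESTAMP"]


def script_dir() -> Path:
    """Folder containing this script; falls back to the CWD in interactive sessions."""
    try:
        return Path(__file__).resolve().parent
    except NameError:  # __file__ is not defined in a REPL or notebook
        return Path.cwd()


def load_news_corpus(base_dir: Optional[Path] = None) -> pd.DataFrame:
    """Hypothetical helper: read <base_dir>/NewsAggregatorDataset/newsCorpora.csv."""
    base = base_dir if base_dir is not None else script_dir()
    csv_path = base / "NewsAggregatorDataset" / "newsCorpora.csv"
    return pd.read_csv(
        csv_path,
        sep="\t",                # assumed: tab-separated file
        header=None,             # assumed: no header row
        names=NEWS_COLUMNS,
        quoting=csv.QUOTE_NONE,  # keep stray quote characters in titles as plain text
    )
```

Usage: `print(load_news_corpus().head())`. If the file turns out to be comma-separated with a header, drop the `sep`, `header`, `names`, and `quoting` arguments.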

Related

how to print csv with the same pathname but with an extension?

The code below reads CSV files from one folder and writes them to another. I want each output file to keep the name it has in the path, plus a suffix. For example, if the file is called aaa.csv, the output should be aaa_ext.csv.
The files I currently get are file_list0.csv, file_list1.csv, file_list2.csv.
This is my code:
import pandas as pd
import numpy as np
import glob
import os
all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")
file_list = []
for i,f in enumerate(all_files):
    df = pd.read_csv(f,header=0,usecols=["t","f"])
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/file_list{i}.csv')
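You can modify the line that writes the CSV file as follows (os.path.splitext strips only the final extension, so names containing other dots stay intact):
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.splitext(os.path.basename(f))[0]}_ext.csv')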
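The same renaming can be sketched with pathlib, where `Path.stem` gives the file name without its final extension (so `b.v2.csv` becomes `b.v2`). `copy_with_suffix` and its parameters are hypothetical; the folder paths in the usage comment are the question's.

```python
from pathlib import Path

import pandas as pd


def copy_with_suffix(src_dir: Path, dst_dir: Path, suffix: str = "_ext") -> list[Path]:
    """Read every CSV in src_dir and write it to dst_dir as <stem><suffix>.csv."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(src, header=0, usecols=["t", "f"])
        dst = dst_dir / f"{src.stem}{suffix}.csv"  # aaa.csv -> aaa_ext.csv
        df.to_csv(dst, index=False)                # index=False: no extra index column
        written.append(dst)
    return written


# copy_with_suffix(Path("C:/Users/Gamer/Documents/Colbun/Saturn"),
#                  Path("C:/Users/Gamer/Documents/Colbun/Saturn2"))
```

`index=False` is a choice made here; the question's code keeps pandas' default of writing the index.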

having trouble importing CSV

I usually use AWS S3 to import CSV files and it is really easy. I'm trying to import this CSV file straight from the file explorer and keep getting an error. Is it different?
import pandas as pd
month = pd.read_csv("12Month.csv")
print(month)
Error: [Errno 2] No such file or directory: '12Month.csv'
A relative name like "12Month.csv" is looked up in the current working directory, so you will probably have to give the full path:
import pandas as pd
month = pd.read_csv(r"C: etc... 12Month.csv")  # replace with the full path to your file
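or build the path from the current working directory:
import pandas as pd
from pathlib import Path
wdir = Path.cwd()  # the directory Python was started from; only helps if 12Month.csv is there
filename = wdir / "12Month.csv"
month = pd.read_csv(filename)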
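A `FileNotFoundError` for a relative name usually means the file is not in the current working directory. A small diagnostic sketch (the helper name `explain_missing` is hypothetical) reports where Python is looking and which CSV files are actually there:

```python
from pathlib import Path


def explain_missing(filename: str) -> str:
    """Describe where a relative filename is resolved and which CSVs live there."""
    cwd = Path.cwd()
    target = cwd / filename
    if target.exists():
        return f"Found: {target}"
    nearby = sorted(p.name for p in cwd.glob("*.csv"))
    return (
        f"{filename!r} is not in the working directory {cwd}\n"
        f"CSV files there: {nearby or 'none'}"
    )


print(explain_missing("12Month.csv"))
```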

VS Code: Is there a way to read a csv file without needing specification of the full path?

I am trying to read data from a CSV file (in the same folder as my main.py), but it seems that Visual Studio Code doesn't understand the project folder or something of the sort:
FileNotFoundError: [Errno 2] No such file or directory: 'ratings.csv'
Here is my code:
import numpy as np
import pandas as pd
# read data with panda, only the columns that are needed
r_cols = ['user_id', 'movie_id', 'rating']
ratings = pd.read_csv('ratings.csv', sep=';', names=r_cols, usecols=[1, 2, 3], encoding="ISO-8859-1", low_memory=False, header=0)
Adding the full path of the file fixes the problem, and so does calling os.chdir (after import os) at the beginning of the code.
But PyCharm runs it without either tweak, so my question remains: is there a VS Code setting that I am missing?
I had the same issue and solved it by giving the path relative to the folder opened in VS Code:
import pandas as pd
df = pd.read_csv('Pandas/sample.csv')
print(df)
As people mentioned in the comments, you can set the working directory used for debugging in VS Code. Add the following setting to your configuration in "launch.json"; the debugger then changes to the directory containing the file before running it:
"cwd": "${fileDirname}",
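Another option is a helper that builds the path from the folder containing the script:
import os
def infolder_file(filename):
    afname = os.path.abspath(__file__)        # absolute path of this script
    current_folder = os.path.dirname(afname)  # folder containing the script
    uf = os.path.join(current_folder, filename)
    return uf
print(infolder_file('anyfilename.txt'))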
You can define a constant for the directory at the top of your module and then use it with any files you need to access:
from pathlib import Path
DIRNAME = Path(__file__).parent  # folder containing this module
def func():
    fn = DIRNAME / 'file.suffix'  # 'file.suffix' is a placeholder for your file name
    return fn
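If the data may sit a varying number of levels above the script, a sketch that walks up from the script's folder until it finds the requested entry avoids counting `..` levels. The helper `find_upwards` and the folder name `data_folder` are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def find_upwards(start: Path, name: str) -> Optional[Path]:
    """Return the first existing start/name, start.parent/name, ..., or None."""
    start = start.resolve()
    for folder in (start, *start.parents):
        candidate = folder / name
        if candidate.exists():
            return candidate
    return None


# In a script: find_upwards(Path(__file__).parent, "data_folder")
```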

Python - Calling for relative path where csv is stored

I have a DataFrame that I would like to save to a CSV. However, I would like the path where the CSV is created to be relative. Below is what I am trying. Could anyone tell me where I am going wrong?
The path where I am trying to store the CSV is '/users/user/desktop/sample.csv'
import pandas as pd
import numpy as np
import os
df = pd.DataFrame(np.random.randint(0,100,size=(100, 5)), columns=list('ABCDE'))
absolute_path = os.path.dirname(os.path.dirname(__file__))
file_path = absolute_path + '/desktop/sample.csv'
df.to_csv('file_path')
I get the error below when executing the above code:
NameError: name '__file__' is not defined
Could anyone guide me on this? Thanks
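Two things seem to go wrong in that snippet: `__file__` only exists when the code runs from a .py file (not in an interactive session such as a REPL or Jupyter), and `df.to_csv('file_path')` passes the literal string 'file_path' instead of the variable. A minimal sketch, assuming the goal is a path relative to the home directory (as in /users/user/desktop/sample.csv); `save_under_home` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional

import numpy as np
import pandas as pd


def save_under_home(df: pd.DataFrame, relative: str, home: Optional[Path] = None) -> Path:
    """Write df to <home>/<relative>; home defaults to the user's home directory."""
    base = home if home is not None else Path.home()  # works in scripts and notebooks alike
    target = base / relative
    target.parent.mkdir(parents=True, exist_ok=True)  # create e.g. desktop/ if it is missing
    df.to_csv(target)                                 # pass the variable, not the string 'file_path'
    return target


df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list("ABCDE"))
# save_under_home(df, "desktop/sample.csv")
```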

How to open my files in data_folder with pandas using relative path?

I'm working with pandas and need to read some CSV files; the structure is something like this:
folder/folder2/scripts_folder/script.py
folder/folder2/data_folder/data.csv
How can I open the data.csv file from the script in scripts_folder?
I've tried this:
absolute_path = os.path.abspath(os.path.dirname('data.csv'))
pandas.read_csv(absolute_path + '/data.csv')
I get this error:
File folder/folder2/data_folder/data.csv does not exist
Try
import pandas as pd
pd.read_csv("../data_folder/data.csv")
Relative paths are resolved from the current working directory (the directory Python was started from), not from the location of your .py file. If you run the script from inside scripts_folder, '..' moves you up one level to folder2, where data_folder lives.
For example:
pd.read_csv('../../../data_folder/data.csv')
will go 3 levels up and then into data_folder (assuming it's there).
Or:
pd.read_csv('data_folder/data.csv')
assuming data_folder is inside the current working directory.
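To see why the same relative string works from one directory and fails from another, this sketch resolves it against two different working directories with `posixpath` (pure string manipulation, so nothing has to exist on disk; the directory names are the question's, and `resolve_against` is a hypothetical helper):

```python
import posixpath


def resolve_against(cwd: str, relative: str) -> str:
    """Join a relative path onto a working directory and collapse '..' segments."""
    return posixpath.normpath(posixpath.join(cwd, relative))


rel = "../data_folder/data.csv"
print(resolve_against("/folder/folder2/scripts_folder", rel))  # -> /folder/folder2/data_folder/data.csv
print(resolve_against("/folder/folder2", rel))                 # -> /folder/data_folder/data.csv
```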
You could use the __file__ attribute:
import os
import pandas as pd
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "../data_folder/data.csv"))
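Alternatively, change the working directory first (this works on any OS; '..' is again relative to the current working directory):
import pandas as pd
import os
os.chdir("../data_folder")
df = pd.read_csv("data.csv")
Or use an absolute path, shown here in Windows form:
import pandas as pd
df = pd.read_csv(r"C:\data_folder\data.csv")
The r prefix makes this a raw string, so the backslashes are kept as written instead of being read as escape sequences such as \n or \t.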
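A short sketch of what goes wrong without the r prefix; the path is made up so that it happens to contain the escape sequences `\n` and `\t`:

```python
plain = "C:\new\table.csv"  # "\n" and "\t" are turned into a newline and a tab
raw = r"C:\new\table.csv"   # the r prefix keeps every backslash as written

print("\n" in plain)        # True: the "path" now contains a real newline
print(raw.split("\\"))      # ['C:', 'new', 'table.csv']
```

Writing Windows paths with forward slashes ("C:/new/table.csv") avoids the problem as well.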
# script.py
import os
current_file = os.path.abspath(os.path.dirname(__file__))  # folder/folder2/scripts_folder
# path to the CSV, relative to the script's folder
csv_filename = os.path.join(current_file, '../data_folder/data.csv')
Keeping things tidy with f-strings:
import os
import pandas as pd
data_files = '../data_folder/'
csv_name = 'data.csv'
pd.read_csv(f"{data_files}{csv_name}")
Both Python's open() and pandas' pd.read_csv resolve relative paths against the current working directory, which by default is where the Python process was started. So one option is to call os.chdir() and take it from there:
import pandas as pd
import os
print(os.getcwd())
os.chdir("D:/01Coding/Python/data_sets/myowndata")
print(os.getcwd())
df = pd.read_csv('data.csv',nrows=10)
print(df.head())
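Because `os.chdir` changes the working directory for the rest of the process, a context manager that switches directory only for the read and then restores it may be safer. On Python 3.11+ the standard library already provides `contextlib.chdir` for this; the hand-written `working_directory` below is a sketch of the same idea for older versions.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def working_directory(path: str) -> Iterator[None]:
    """Temporarily chdir into `path`, restoring the previous directory afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restored even if the body raises


# Usage (path taken from the answer above):
# with working_directory("D:/01Coding/Python/data_sets/myowndata"):
#     df = pd.read_csv("data.csv", nrows=10)
```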
If you want to keep your code tidy, I would suggest assigning the path and the file name separately and then reading (note the trailing slash on the folder):
import pandas as pd
path = 'C:/Users/username/Documents/folder/'
file_name = 'file_name.xlsx'
file = pd.read_excel(f"{path}{file_name}")
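Concatenating with an f-string only works if the folder string ends with a separator; `os.path.join` (or pathlib's `/`) inserts it for you. This sketch uses `posixpath` so the output is identical on every OS; in real code `os.path.join` picks the right flavour for your platform. The folder is the placeholder from the answer.

```python
import posixpath

folder = "C:/Users/username/Documents/folder"  # no trailing slash
name = "file_name.xlsx"

print(f"{folder}{name}")             # C:/Users/username/Documents/folderfile_name.xlsx (wrong)
print(posixpath.join(folder, name))  # C:/Users/username/Documents/folder/file_name.xlsx
```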
I was also looking for the relative-path version, and this works OK. Note that when run (Spyder 3.6), a docstring containing a Windows path such as C:\Users\... raises (unicode error) 'unicodeescape' codec can't decode bytes at the closing triple quote, because \U starts a Unicode escape. Either remove the offending comment lines 14 and 15 or make the docstring raw (r"""), then adjust the file names and location for your environment and check the indentation.
# -*- coding: utf-8 -*-
r"""
Created on Fri Jan 24 12:12:40 2020
Source:
Read a .csv into pandas from F: drive on Windows 7
Demonstrates:
Load a csv not in the CWD by specifying relative path - windows version
#author: Doug
From CWD C:\Users\Doug\.spyder-py3\Data Camp\pandas we will load file
C:/Users/Doug/.spyder-py3/Data Camp/Cleaning/g1803.csv
"""
import csv
trainData2 = []
# '..' is resolved against the CWD (Data Camp\pandas), so this opens Data Camp\Cleaning\g1803.csv
with open(r'../Cleaning/g1803.csv', 'r', newline='') as train2Csv:
    trainReader2 = csv.reader(train2Csv, delimiter=',', quotechar='"')
    for row in trainReader2:
        trainData2.append(row)
print(trainData2)
You can always point to your home directory using ~ and refer to your data folder from there; pandas expands the ~ for you:
import pandas as pd
df = pd.read_csv("~/mydata/data.csv")
In your case (assuming folder is directly inside your home directory), it would be:
import pandas as pd
df = pd.read_csv("~/folder/folder2/data_folder/data.csv")
You can also set your data directory as a prefix
import pandas as pd
DATA_DIR = "~/folder/folder2/data_folder/"
df = pd.read_csv(DATA_DIR+"data.csv")
You can take advantage of f-strings, as @nikos-tavoularis said:
import pandas as pd
DATA_DIR = "~/folder/folder2/data_folder/"
FILE_NAME = "data.csv"
df = pd.read_csv(f"{DATA_DIR}{FILE_NAME}")
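pandas expands a leading `~` itself, but functions such as `open()` and `os.path.exists()` do not. If the path is also passed to other code, expanding it explicitly is a safe habit; a minimal sketch reusing the answer's DATA_DIR and FILE_NAME:

```python
import os
from pathlib import Path

DATA_DIR = "~/folder/folder2/data_folder/"
FILE_NAME = "data.csv"

# pathlib: expanduser() replaces the leading ~ with the home directory
expanded = Path(DATA_DIR).expanduser() / FILE_NAME
print(expanded)

# os.path equivalent, returning a plain string
print(os.path.expanduser(DATA_DIR + FILE_NAME))
```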
Or simply use an absolute path:
import pandas as pd
df = pd.read_csv('C:/data_folder/data.csv')
This question answers it: Reading file using relative path in python project
Basically, using Path from pathlib, you would do the following in script.py:
from pathlib import Path
import pandas as pd
path = Path(__file__).parent / "../data_folder/data.csv"
pd.read_csv(path)
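A variant that climbs with `.parent` instead of keeping `..` in the path, and fails with a clear message if the file is missing. `data_csv_path` is a hypothetical helper that takes the script path as a parameter so it can be tried without a real project; it assumes the folder layout from the question.

```python
from pathlib import Path


def data_csv_path(script: Path) -> Path:
    """folder/folder2/scripts_folder/script.py -> folder/folder2/data_folder/data.csv"""
    project = script.resolve().parent.parent  # scripts_folder -> folder2
    path = project / "data_folder" / "data.csv"
    if not path.is_file():
        raise FileNotFoundError(f"expected the data file at {path}")
    return path


# In script.py: pd.read_csv(data_csv_path(Path(__file__)))
```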
You can also use an absolute path; add the r prefix so the backslashes are not treated as escapes:
import pandas as pd
df = pd.read_csv(r"E:\working datasets\sales.csv")
print(df.head())
import os
s_path = os.getcwd()
# run from inside scripts_folder: s_path = "...folder/folder2/scripts_folder"
s_path = s_path.split(os.sep)
print(s_path)
# ['', ..., 'folder', 'folder2', 'scripts_folder']
d_path = s_path[:-1] + ['data_folder', 'data.csv']
print(os.sep.join(d_path))  # os.sep.join keeps the leading root separator
# ...folder/folder2/data_folder/data.csv
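The same computation is shorter with pathlib, which handles separators and the leading root for you. A sketch assuming Python is started from inside scripts_folder; `sibling_data_path` is a hypothetical helper name.

```python
from pathlib import Path


def sibling_data_path(start: Path) -> Path:
    """From scripts_folder, point to ../data_folder/data.csv without string splitting."""
    return start.parent / "data_folder" / "data.csv"


print(sibling_data_path(Path.cwd()))  # assumes the working directory is scripts_folder
```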
'.' represents the current working directory and '..' its parent, so you can write:
# Linux / macOS
df = pd.read_csv("../data_folder/data.csv")
# Windows
df = pd.read_csv("..\\data_folder\\data.csv")
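Python's file functions on Windows accept forward slashes too, so the Linux form usually works everywhere. pathlib's pure path classes show how one relative path is rendered for each OS, without touching the disk:

```python
from pathlib import PurePosixPath, PureWindowsPath

rel = "../data_folder/data.csv"
print(PurePosixPath(rel))    # ../data_folder/data.csv
print(PureWindowsPath(rel))  # ..\data_folder\data.csv
```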
Try this:
1. Open a new terminal window.
2. Drag and drop the file that you want pandas to read into that terminal window.
3. The terminal prints the full path of your file on one line.
4. Copy and paste that path into the read_csv call as shown here:
import pandas as pd
pd.read_csv("the path returned by terminal")
That's it.
Just replace your "/" with "\" (inside a normal Python string, write it as "\\" or use a raw string).
