I'm very new to Python and this will be an extremely basic question.
I want a user to input the name of a csv file, which I want to open with pandas to easily access its rows and columns.
This is the code that I wrote:
import pandas as pd
DATAFIN = str(raw_input("Name of your data file"))
dataset = pd.read_csv(DATAFIN)
dataset.head()
However, I seem to be making some kind of mistake, because this is the message I get (sorry for the length):
Traceback (most recent call last):
File "c:\Users\File.py", line 34, in <module>
dataset = pd.read_csv(DATAFIN)
File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
does not exist: ' maindata.csv\r'
Do you have any idea what the problem is?
I am sorry for any mistakes in formatting.
It looks like there is a space character at the start of your string:
' maindata.csv\r'
Try typing your CSV name without the space,
so that it looks like this:
Name of your data filemaindata.csv
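The value in the error message, ' maindata.csv\r', also ends with a carriage return, so removing the leading space by hand may not be enough on its own. A minimal sketch of cleaning the input in code instead; `clean_filename` is a hypothetical helper name, not part of pandas:

```python
def clean_filename(raw_name):
    # str.strip() with no arguments removes leading and trailing whitespace,
    # which covers both the leading space and the trailing carriage return.
    return raw_name.strip()


# The exact value shown in the traceback above.
print(repr(clean_filename(" maindata.csv\r")))  # prints 'maindata.csv'
```

In the original script the cleaned value would then be passed on, for example pd.read_csv(clean_filename(raw_input("Name of your data file"))) on Python 2, or the same with input(...) on Python 3.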
Try reading your CSV file with a raw-string path: pd.read_csv(r'address of the file.csv')
Use argparse to get the filename from the command line.
Run your script with python script.py --filename file.csv
and use print to see the result:
import pandas as pd
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--filename')
args = parser.parse_args()
dataset = pd.read_csv(args.filename)
print(dataset.head())
Related
I'm getting the error below when trying to convert a CSV file to XLSX format using a pandas script.
I tried running the below script:
import os
os.chdir("/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts")
# Reading the csv file
import pandas as pd
print(pd.__file__)
df_new = pd.read_csv("sourcefile.csv", sep="|", header=None).dropna(axis=1, how="all")
# saving xlsx file
df_new.to_excel("sourcefile.xlsx", index=False)
I am getting the error mentioned below:
Traceback (most recent call last):
File "/opt/alb_test/alb/albt1/Source/alb/al/conversion/scripts/pythn.py", line 13, in <module>
df = pd.read_csv("ff_mdm_reject_report.csv", lineterminator='\n')
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
return parser.read(nrows)
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/opt/infa/MDW-env/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 30488, saw 2
Can anyone guide me how to fix it?
Thanks in advance!
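The message says the parser settled on one field per row and then met two fields at line 30488. Note that the call in the traceback has no sep="|", unlike the script quoted above, so a pipe-delimited file may be getting split on commas instead. As a rough diagnostic sketch (the helper name `find_odd_rows` and the sample data are made up for illustration), the standard csv module can report which rows have an unexpected field count under a given delimiter:

```python
import csv
import io


def find_odd_rows(lines, delimiter=",", limit=5):
    """Return (line_number, field_count) for rows whose field count
    differs from the first row's. `lines` is any iterable of text lines."""
    reader = csv.reader(lines, delimiter=delimiter)
    expected = None
    odd = []
    for row in reader:
        if expected is None:
            expected = len(row)          # first row sets the expectation
        elif len(row) != expected:
            odd.append((reader.line_num, len(row)))
            if len(odd) >= limit:        # stop after a few findings
                break
    return odd


# Made-up, pipe-delimited sample data; one value happens to contain a comma.
sample = io.StringIO("a|b\n1|2\n3|x,y\n")
print(find_odd_rows(sample, delimiter=","))  # split on commas: [(3, 2)]
sample.seek(0)
print(find_odd_rows(sample, delimiter="|"))  # split on pipes: []
```

For a real file, the same function can be given an open file object, for example one obtained with open("sourcefile.csv", newline=""). If the pipe delimiter gives consistent rows, passing sep="|" to the read_csv call that actually runs is the likely fix.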
I am using PyCharm, and when I run code that opens a CSV file with pandas, I get an error saying the file does not exist.
I saved the csv file in my project directory and called it using pandas.
import pandas as pd
df = pd.read_csv("E:\\students")
print(df)
The error when I run the code:
Traceback (most recent call last):
File "E:/untitled232/file1.py", line 2, in <module>
df = pd.read_csv("E:\\students")
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "E:\untitled232\venv\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'E:\\students' does not exist
It seems I had to put .csv after the name.
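One way to catch this kind of mistake early is to list what is really in the folder before calling read_csv, since Windows Explorer hides known file extensions by default and a file displayed as students may be students.csv on disk. A small sketch; `find_candidates` is a hypothetical helper, and the temporary folder only stands in for the real project directory:

```python
import tempfile
from pathlib import Path


def find_candidates(folder, stem):
    """Return the names of files in `folder` whose name without the
    extension equals `stem`, e.g. 'students' matches 'students.csv'."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and p.stem == stem)


# Demonstration in a throwaway directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "students.csv").write_text("name\nAda\n")
    print(find_candidates(tmp, "students"))  # prints ['students.csv']
```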
I am trying to import a CSV file saved in a local folder. It works without problems in an Anaconda Python notebook, but in Zeppelin I run into issues.
The code I am using, which works fine in Anaconda, is:
#import csv data
frequency=pd.read_csv("C:\\Users\\L18938\\Desktop\\Vehicle_to_grid\\analysis\\Frequency_March_2018.csv", nrows=86401)
However, when running it on Zeppelin, I receive:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4025)
File "pandas/parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8031)
IOError: File C:\Users\L18938\Desktop\Vehicle_to_grid\analysis\Frequency_March_2018.csv does not exist
The file definitely exists and there are no errors in the spelling of the path.
I have tried / and a doubled \, but nothing changes. I also tried
os.chdir("C:/Users/L18938/Desktop/Vehicle_to_grid/analysis")
or
os.listdir("C:/Users/L18938/Desktop/Vehicle_to_grid/analysis")
Any ideas? Thank you in advance.
Your traceback shows that the Python interpreter is running with Unix-style file paths (/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py).
When you are under Anaconda, you are running on Windows itself, and your traceback would look something like (C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py).
Anaconda resolves files using Windows-style paths, while Zeppelin resolves them using Unix-style paths.
Your issue is definitely related to how you specify the path in Zeppelin. You can't use a Windows path there, but you may try something like this:
frequency=pd.read_csv("file:///C:/Users/L18938/Desktop/Vehicle_to_grid/analysis/Frequency_March_2018.csv", nrows=86401)
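A caveat on this suggestion: if the Zeppelin interpreter runs on another machine or inside a container (the /usr/local/... paths in the traceback suggest a Linux environment), no spelling of a C:\ path will reach a file that exists only on the Windows desktop. The file would first have to be copied or mounted somewhere that interpreter can see. The sketch below is one way to check, from inside the notebook, what the interpreter actually sees; `describe_environment` is a hypothetical helper name.

```python
import os
import platform


def describe_environment(path):
    """Collect facts that help explain why `path` may not be found."""
    return {
        "system": platform.system(),          # e.g. 'Linux' or 'Windows'
        "cwd": os.getcwd(),                   # where relative paths resolve
        "path_exists": os.path.exists(path),  # can this interpreter see it?
    }


info = describe_environment(
    "C:/Users/L18938/Desktop/Vehicle_to_grid/analysis/Frequency_March_2018.csv")
for key, value in info.items():
    print(key, "=", value)
```

If "system" is not Windows and "path_exists" is False, the problem is where the file lives rather than how the path is written.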
I have a CSV with strings that contain the line terminator. I can import it with pandas using this code:
df_desc = pd.read_csv(import_desc, sep="|")
But when I try to import it into a Dask dataframe:
import dask.dataframe as ddf
import_desc = "data/info.csv"
df_desc = ddf.read_csv(import_desc, sep="|", blocksize=None, dtype='str')
I get this error:
Traceback (most recent call last):
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1578, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/data_extraction_dask.py", line 10, in <module>
df_desc = ddf.read_table(import_desc, sep="|", blocksize=None, dtype='str')
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 323, in read
**kwargs)
File "/anaconda2/lib/python2.7/site-packages/dask/dataframe/io/csv.py", line 243, in read_pandas
head = reader(BytesIO(b_sample), **kwargs)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 411, in _read
data = parser.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 982, in read
ret = self._engine.read(nrows)
File "/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1719, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 130
The documentation mentions:
It should also be noted that this function may fail if a CSV file
includes quoted strings that contain the line terminator. To get
around this you can specify blocksize=None to not split files into
multiple partitions, at the cost of reduced parallelism.
That's why I used blocksize=None, but this function uses a sampling strategy that reads the first bytes of the file to determine the column types and, I think, that is what generates this error.
I can't skip the sampling step, even when I specify the types with dtype.
Is there any workaround?
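To illustrate the suspicion above, here is a rough sketch of how a byte sample that is cut at a line terminator can still end inside a quoted field. This is not Dask's actual implementation; `sample_ends_inside_quotes` and the sample data are made up, and the quote-counting check is only a heuristic.

```python
def sample_ends_inside_quotes(data, sample_size, quotechar=b'"'):
    """Take the first `sample_size` bytes, cut back to the last newline
    (roughly what a line-based sampler might do), and report whether the
    sample contains an odd number of quote characters."""
    sample = data[:sample_size]
    cut = sample.rfind(b"\n")
    if cut != -1:
        sample = sample[:cut + 1]
    # An odd count means a quoted field was opened but never closed.
    return sample.count(quotechar) % 2 == 1


# Made-up data: the second row has a newline inside a quoted field.
data = b'id|text\n1|"first line\nsecond line"\n2|"ok"\n'
print(sample_ends_inside_quotes(data, 25))         # prints True
print(sample_ends_inside_quotes(data, len(data)))  # prints False
```

A parser given the truncated sample would hit end-of-file inside the open string, which matches the "EOF inside string" message. Dask's read_csv does take a sample argument that sets how many bytes are sampled; whether changing it avoids the error for this file is untested here.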
I wrote the following script, which runs perfectly in PyCharm, but when I run it in a terminal it gives me these errors:
File "/Users/Chris/PycharmProjects/firstfile/trial.py", line 6, in <module>
r = pf.read_csv('python.csv')
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File python.csv does not exist
Could someone point me in the right direction? I am guessing that it has to do with the CSV file not being in the right path or directory. Right now I have the CSV file saved in the same folder as my .py project. I also checked and made sure I have the right packages installed, so I do not think it is that.
import csv
import pandas as pf
r = pf.read_csv('python.csv')
r.head()
print r.describe()
tradeDates = r['Trade Date'].unique()
r.name = 'Trade Date'
for trades in tradeDates:
outfilename = trades
printName = outfilename + ".csv"
print printName
r[r['Trade Date'] == trades].to_csv(printName, index=False)
When you run python /Users/Chris/PycharmProjects/firstfile/trial.py, Python looks for the CSV file in your current working directory, not in /Users/Chris/PycharmProjects/firstfile.
You either need to change your directory before running the code, or you need to use the full path in trial.py like this:
import csv
import pandas as pf
r = pf.read_csv('/Users/Chris/PycharmProjects/firstfile/python.csv')
r.head()
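A third option, sketched here as an alternative to hard-coding the full path, is to build the path from the location of the script itself, so that it works from any working directory. `path_next_to` is a hypothetical helper name:

```python
from pathlib import Path


def path_next_to(script_file, filename):
    """Build the path of `filename` in the same folder as `script_file`."""
    return Path(script_file).parent / filename


# Inside trial.py this would be called as path_next_to(__file__, "python.csv").
csv_path = path_next_to(
    "/Users/Chris/PycharmProjects/firstfile/trial.py", "python.csv")
print(csv_path)
```

The result can be passed straight to read_csv, for example pf.read_csv(str(csv_path)). On older Python versions `__file__` may be a relative path, in which case converting it with os.path.abspath first is safer.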