How to set many columns in pandas? - python

I want to write more than a hundred columns to a CSV file, but it seems the pandas library has a limit on columns.
Here is the error message:
Traceback (most recent call last):
File "metric.py", line 91, in <module>
finalFile(sys.argv[1])
File "metric.py", line 80, in finalFile
data = pd.read_csv(f, header=None, dtype=str)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py", line 2010, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
And below is my function:
def finalFile(fname):
    output = pd.DataFrame()
    for file_name in os.listdir('test/sciprt-temp/'):
        if file_name.startswith(fname):
            with open(os.path.join('test/sciprt-temp/', file_name)) as f:
                data = pd.read_csv(f, header=None, dtype=str)
                output[file_name.rsplit('.', 4)[2]] = data[1]
    output.insert(0, 'timestamp', dt.datetime.now().timestamp())
    output.insert(0, 'hostname', fname.rsplit('-', 3)[0])
    output.set_index(output.columns[0], inplace=True)
    output.to_csv(fname.rsplit('.', 2)[2] + ".csv")

finalFile(sys.argv[1])
It works fine when inserting a few columns, but fails with more columns. The expected output looks like this:
hostname,timestamp,-diskstats_latency-sda-avgrdwait-g,-diskstats_latency-sda-avgwait-g,-diskstats_latency-sda-avgwrwait-g,-diskstats_latency-sda-svctm-g,-diskstats_latency-sda_avgwait-g
test.test.com,1617779170.62498,2.7979746835e-03,6.6681051841e-03,7.1533659185e-03,2.5977601795e-04,6.6681051841e-03
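Note that `EmptyDataError: No columns to parse from file` is not about a column limit at all: it is raised when `read_csv` is handed a file with no parseable content, so the likely culprit is an empty (or whitespace-only) file among the ones matched in the temp directory. A minimal sketch of a guard, with a hypothetical helper name:

```python
import os
import pandas as pd

def read_nonempty_csv(path):
    """Return a DataFrame, or None if the file has no content.

    pandas raises EmptyDataError for zero-byte files, so check the
    size first instead of letting the whole run crash on one file.
    """
    if os.path.getsize(path) == 0:
        return None
    try:
        return pd.read_csv(path, header=None, dtype=str)
    except pd.errors.EmptyDataError:
        # File contains only whitespace/BOM and no parseable columns.
        return None
```

Skipping (or logging) the empty file inside the `for file_name in os.listdir(...)` loop keeps the remaining columns intact instead of aborting the whole merge.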

Related

Error merging multiple CSV files - Python

I'm trying to merge several CSV files into one.
After searching through several methods, I found this one:
files = glob.glob("D:\\green_lake\\Projects\\covid_19\\tabelas_relacao\\acre\\*.csv")
files_merged = pd.concat([pd.read_csv(df) for df in files], ignore_index=True)
When running this error is returned:
>>> files_merged = pd.concat([pd.read_csv(df) for df in files], ignore_index=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 678, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1253, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\Leonardo\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 243, saw 4
I'm starting to study Python, and if it's a stupid mistake, I apologize ;)
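The error `Expected 1 fields in line 243, saw 4` usually means one of the files uses a different delimiter (so the header tokenizes to a single field) or contains malformed rows. A sketch that at least lets the merge proceed by skipping the offending rows; `read_csv_tolerant` is a hypothetical name, and `on_bad_lines` requires pandas 1.3 or newer:

```python
import pandas as pd

def read_csv_tolerant(path):
    """Read a CSV whose rows have inconsistent field counts.

    'Expected 1 fields ... saw 4' typically means the first row has
    fewer delimiters than later rows, or the delimiter differs from
    a comma. Skipping malformed lines lets the concat proceed, but
    inspect line 243 of the offending file to find the real cause.
    """
    return pd.read_csv(path, on_bad_lines='skip')  # pandas >= 1.3
```

Swapping this in for the plain `pd.read_csv(df)` inside the list comprehension will reveal which file is malformed rather than aborting the whole `pd.concat`.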

Python script works but throws error - pandas.errors tokenizing data, Expected 9 fields saw 10

I am new to Python. I am trying to read a JSON response from requests, filter it using pandas, and save it to a CSV file. The script works and gives me all the data, but it throws this error after execution.
I am not able to figure out why it throws this error. How can I get past it?
Error -
script.py line 42, in <module>
df = pd.read_csv("Data_script4.csv")
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 458, in _read
data = parser.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1196, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 2155, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 53, saw 10
This is my script -
if __name__ == '__main__':
    parser = argparse.ArgumentParser("gets data")
    parser.add_argument("-o", dest="org", help="org name")
    parser.add_argument("-p", dest="pat", help="pat value")
    args = parser.parse_args()
    org = args.org
    token = args.pat
    url = "https://dev.azure.com/{org_name}/_apis/git/repositories?api-version=6.0".format(org_name=org)
    data = getproject(url, token)
    data_file = open("Data_script4.csv", "w", newline='')
    val = data['value']
    csv_writer = csv.writer(data_file)
    for i in val:
        if count == 0:
            header = i.keys()
            csv_writer.writerow(header)
            count += 1
        csv_writer.writerow(i.values())
    pro_name = []
    time = []
    df = pd.read_csv("Data_script4.csv")
    for i in df["project"]:
        res = ast.literal_eval(i)
        pro_name.append(res['name'])
        time.append(res['lastUpdateTime'])
    del df["project"]
    df["project name"] = pro_name
    df["lastUpdateTime"] = time
    df = df[["id", "name", "url", "project name", "lastUpdateTime", "defaultBranch", "size", "remoteUrl", "sshUrl", "webUrl"]]
    df.head()
    df.to_csv("Data_Filtered.csv", index=False)
    print("\nFile Created Successfully...")
    data_file.close()
    os.remove('Data_script4.csv')
How can I resolve this issue ?
Your question was answered here
Here's the takeaway:
You need to substitute:
df = pd.read_csv("Data_script4.csv")
with this:
df = pd.read_csv('Data_script4.csv', error_bad_lines=False)
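One caveat: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0. On current versions the equivalent is the `on_bad_lines` parameter, sketched here with inline data instead of the question's CSV:

```python
import io
import pandas as pd

# pandas >= 1.3: on_bad_lines replaces the removed error_bad_lines flag.
# 'skip' silently drops rows whose field count does not match the header;
# 'warn' also skips them but emits a warning per dropped row.
csv_text = "id,name\n1,alpha\n2,beta,extra_field\n3,gamma\n"
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
# The malformed second data row (3 fields vs. 2) is dropped.
```

Skipping rows hides data, though; the cleaner fix is to stop the writer from producing ragged rows (e.g., unquoted commas inside a field) in the first place.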

Using read_csv with a home-made object as a 'file'

The read_csv doc says that its first parameter can be 'any object with a read() method (such as a file handle or StringIO)'. My question is about how to construct an object that will work in this capacity.
import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)
    def read(self):
        return self.file.readline().rstrip()

filewrap = FileWrap(file_name)
while True:
    line = filewrap.read()
    if not line:
        break
    print(line)

df = pd.read_csv(FileWrap(file_name), header=None)
print(df)
The output from this script is shown below.
The first three lines simply show that the FileWrap object's read method works as expected. The remaining lines show that there's something I don't understand about constructing an object with a read method that pandas can use to receive its input a line at a time. What does read have to do to make pandas happy?
1,2,3
4,5,6
7,8,9
Traceback (most recent call last):
File "temp.py", line 20, in <module>
df = pd.read_csv(FileWrap(file_name), header=None)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 645, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 388, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 729, in __init__
self._make_engine(self.engine)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 922, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1389, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 535, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:6077)
File "pandas\parser.pyx", line 797, in pandas.parser.TextReader._get_header (pandas\parser.c:9878)
File "pandas\parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11257)
File "pandas\parser.pyx", line 2008, in pandas.parser.raise_parser_error (pandas\parser.c:26804)
TypeError: raise: exception class must be a subclass of BaseException
When pandas validates its input, it calls is_file_like, which checks that the object has both read and __iter__ methods (see is_file_like), so you can try:
import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)
    def __iter__(self):
        # Needed only to satisfy the is_file_like check;
        # delegate to the underlying file so it is also correct.
        return iter(self.file)
    def read(self, *args, **kwargs):
        return self.file.read(*args, **kwargs)

df = pd.read_csv(FileWrap(file_name), header=None)
print(df)
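You can also verify the duck-typing requirement directly with pandas' own public helper before handing an object to `read_csv`. A small sketch, with a hypothetical `ReadOnly` class illustrating the failure mode from the question:

```python
import io
from pandas.api.types import is_file_like

class ReadOnly:
    """Has read() but no __iter__, so pandas rejects it."""
    def read(self, size=-1):
        return ""

# Missing __iter__ -> fails the same check the question's FileWrap failed.
assert not is_file_like(ReadOnly())
# Real text buffers (and open files) pass.
assert is_file_like(io.StringIO("1,2,3\n"))
```

Calling `is_file_like(obj)` up front turns the cryptic parser-level failure into an explicit, debuggable check.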

"pandas.io.common.EmptyDataError: No columns to parse from file" after moving to mac

In Windows 8, the script works fine. After I moved script and data.csv to work in my mac, I keep getting error: "pandas.io.common.EmptyDataError: No columns to parse from file."
The script and data are in the same folder as
"/Users/myname/Downloads/test/testimport.py"
"/Users/myname/Downloads/test/test2.csv"
I've tried many file locations to read the csv but nothing works.
file_loc = "../test/test2.csv"
# as well as "../test2.csv", "/test2.csv", "/Users/myname/Downloads/test/test2.csv"
import pandas as pd
df = pd.read_csv(file_loc)
exp_mat = df.as_matrix()
print exp_mat
How can I read the csv here? Is it a wrong-location problem, or is the csv filetype on Mac not compatible?
The OS is OS X El Capitan. The full error is:
h143% python testimport.py
Traceback (most recent call last):
File "test_importexcel.py", line 24, in <module>
df = pd.read_csv(file_loc)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 538, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:6171)
pandas.io.common.EmptyDataError: No columns to parse from file
The data (copied from Numbers) looks like:
x time value
445.1207 0.003626 21935450
445.1203 0.011099 36700932
445.1203 0.017235 35722172
445.1203 0.022958 33623668
445.1203 0.028689 33500360
352.3396 37.180567 307886720
352.3396 37.185836 303264100
352.3396 37.191101 292523810
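Two things stand out. First, `EmptyDataError` means the file WAS found but contained nothing parseable, so the path is probably fine and the copied file is likely empty (or was never re-exported as CSV from Numbers). Second, the sample shown is whitespace-separated, not comma-separated. A sketch of a checked reader; `read_table_checked` is a hypothetical name:

```python
import os
import pandas as pd

def read_table_checked(path):
    """Read a whitespace-separated table, failing loudly on the two
    usual causes of EmptyDataError: a wrong path and an empty file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    if os.path.getsize(path) == 0:
        raise ValueError(path + " is empty -- re-export it as CSV")
    # The sample shown uses whitespace, not commas, as the delimiter.
    return pd.read_csv(path, sep=r"\s+")
```

Also note that `df.as_matrix()` was removed in pandas 1.0; `df.to_numpy()` is the current equivalent.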

IO Error: csv file does not exist though it exists at given location specified

import pandas as pd
import os
import time
from datetime import datetime

path = "C:\WinPython-32bit-2.7.9.5\python-2.7.9\Lib\idlelib\MuditPracticals\intraQuarter\intraQuarter"

def Key_Stats(gather="Total Debt/Equity (mrq)"):
    statspath = path + '/_KeyStats'
    stock_list = [x[0] for x in os.walk(statspath)]
    df = pd.DataFrame(columns=['Date', 'Unix', 'Ticker', 'DE Ratio', 'Price', 'SP500'])
    sp500_df = pd.DataFrame.from_csv("YAHOO-INDEX_GSPC.csv")
    for each_dir in stock_list[1:25]:
        each_file = os.listdir(each_dir)
        ticker = each_dir.split("\\")[3]
        if len(each_file) > 0:
            for file in each_file:
                date_stamp = datetime.strptime(file, '%Y%m%d%H%M%S.html')
                unix_time = time.mktime(date_stamp.timetuple())
                full_file_path = each_dir + '/' + file
                source = open(full_file_path, 'r').read()
                try:
                    value = float(source.split(gather + ':</td><td class="yfnc_tabledata1">')[1].split('</td>')[0])
                    try:
                        sp500_date = datetime.fromtimestamp(unix_time).strftime('%Y-%m-%d')
                        row = sp500_df[(sp500_df.index == sp500_date)]
                        sp500_value = float(row["Adjusted Close"])
                    except:
                        sp500_date = datetime.fromtimestamp(unix_time - 259200).strftime('%Y-%m-%d')
                        row = sp500_df[(sp500_df.index == sp500_date)]
                        sp500_value = float(row["Adjusted Close"])
                    stock_price = float(source.split('</small><big><b>')[1].split('</b></big>')[0])
                    #print("stock_price:", stock_price, "ticker:", ticker)
                    df = df.append({'Date': date_stamp,
                                    'Unix': unix_time,
                                    'Ticker': ticker,
                                    'DE Ratio': value,
                                    'Price': stock_price,
                                    'SP500': sp500_value}, ignore_index=True)
                except Exception as e:
                    print "hello"
    save = gather.replace(' ', '').replace(')', '').replace('(', '').replace('/', '') + ('.csv')
    print(save)
    df.to_csv(save)

Key_Stats()
Error in Spyder:
File "<ipython-input-1-dfafbc7450e8>", line 1, in <module>
runfile('C:/WinPython-32bit-2.7.9.5/python- 2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py', wdir='C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals')
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py", line 56, in <module>
Key_Stats()
File "C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py", line 13, in Key_Stats
sp500_df = pd.DataFrame.from_csv("YAHOO-INDEX_GSPC.csv")
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\core\frame.py", line 1036, in from_csv
infer_datetime_format=infer_datetime_format)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 250, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 566, in __init__
self._make_engine(self.engine)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 705, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 1072, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3160)
File "pandas\parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5905)
IOError: File YAHOO-INDEX_GSPC.csv does not exist
It gives an IOError even though the file exists at that location. The IOError occurs at run time, when Key_Stats() is called.
Also, why is it that in IDLE the pandas module is not found, while in Spyder there is no pandas error?
The path to your .csv file is relative. If the file is not in your current working directory, Python will not find it.
"though file exists at that location"... that is the problem with relative paths: what is "that location"?
Here is a previous answer that should resolve the issue:
Python ConfigParser cannot search .ini file correctly (Ubuntu 14, Python 3.4)
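In short: resolve the CSV against a directory you control instead of the current working directory, which is what Spyder changes out from under the script. A sketch, with a hypothetical `load_sp500` helper; note that `DataFrame.from_csv` is long deprecated, and the equivalent `read_csv` call is shown:

```python
import os
import pandas as pd

def load_sp500(base_dir, filename="YAHOO-INDEX_GSPC.csv"):
    """Build an absolute path from a known base directory, so the
    read works regardless of the IDE's working directory."""
    csv_path = os.path.join(base_dir, filename)
    if not os.path.isfile(csv_path):
        # Fail with the resolved path, so "that location" is explicit.
        raise FileNotFoundError(csv_path)
    # Equivalent of the deprecated pd.DataFrame.from_csv(...)
    return pd.read_csv(csv_path, index_col=0, parse_dates=True)
```

In a script you would typically pass `os.path.dirname(os.path.abspath(__file__))` as `base_dir`, which pins the lookup to the script's own folder.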