ValueError when reading a sas file with pandas - python

pandas.read_sas() prints traceback messages that I cannot suppress. It prints one for EACH row it reads, so when I try to read the whole file the notebook freezes under the volume of output.
I tried these suggestions from other Stack Overflow answers:
import warnings
warnings.simplefilter(action='ignore')
And
warnings.filterwarnings('ignore')
And
from IPython.display import HTML
HTML('''<script>
code_show_err=false;
function code_toggle_err() {
if (code_show_err){
$('div.output_stderr').hide();
} else {
$('div.output_stderr').show();
}
code_show_err = !code_show_err
}
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click here.''')
But nothing works.
The message it prints is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()
ValueError: Unexpected non-zero end_of_first_byte
Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte

As highlighted in the traceback, the error is caused by a bug in the pandas implementation of RLE decompression, which is used when the SAS dataset is exported using CHAR (RLE) compression.
Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243
The resolution that pandas implemented for this bug in read_sas is contained in the following Pull Request, which is part of the version 1.5 milestone, yet to be released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113
To answer your question, you have two options:
1. Wait until pandas releases version 1.5, update to that version, and read_sas should then work as expected. You've already been waiting a while since you asked, so I suspect this will be fine.
2. Use the Python sas7bdat library instead (https://pypi.org/project/sas7bdat/), and then convert to a pandas DataFrame:
from sas7bdat import SAS7BDAT
df = SAS7BDAT("./path/to/file.sas7bdat").to_data_frame()
The sas7bdat approach worked for me, after facing the exact same error as you did.
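To decide between the two options, it may help to check whether the installed pandas is at least 1.5, the milestone the fix is attached to. The sketch below is only an illustration: parse_version and has_rle_fix are hypothetical helper names, and it assumes the fix really shipped in 1.5 as the pull request's milestone suggests.

```python
def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2), dropping suffixes such as 'rc0'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_rle_fix(pandas_version):
    """Assumption: the RLE decompression fix is present from pandas 1.5 on."""
    return parse_version(pandas_version) >= (1, 5)
```

With pandas imported, has_rle_fix(pandas.__version__) returning False would point towards the sas7bdat route.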

Related

Domo dataset: OverflowError: Python int too large to convert to C long

I have 64-bit Windows 10 OS and I recently updated my python using pip. I use pydomo to connect to DOMO dataset I created and while importing, it's giving the below error only sometimes. Here is part of the code and error.
import pandas as pd
from pydomo import Domo
domo = Domo(client_id,secret,api_host='api.domo.com')
#import dataset as pandas dataframe
DF = domo.ds_get('aaaaa-12ert34-3456789')
OverflowError: Python int too large to convert to C long
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~\AppData\Roaming\Python\Python39\site-packages\pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
TypeError: invalid string coercion to datetime
I tried to avoid it using the command below, but it didn't work.
np.iinfo(np.uint64).max
I saw someone who loaded data from a CSV file hit the same error and avoid it using the command below.
csv.field_size_limit(maxInt)
Is there anything like that for DOMO to avoid?
Any idea would be appreciated and thanks in advance!
This works for me when I cannot get ds_get to work:
domo.ds_query('aaaaa-12ert34-3456789', 'select * from table')

Python read pickle protocol 4 error: STACK_GLOBAL requires str

In Python 3.7.5 on Ubuntu 18.04, reading a pickle file (pickle protocol 4) gives an error.
Sample code:
import pickle as pkl
file = open("sample.pkl", "rb")
data = pkl.load(file)
Error:
UnpicklingError Traceback (most recent call last)
in
----> 1 data = pickle.load(file)
UnpicklingError: STACK_GLOBAL requires str
Reading from same file object solves problem.
Reading using pandas also gives same problem
I also had this error; it turned out I was opening a NumPy file with pickle. ;)
It turns out this is a known issue. There is an issue page on GitHub.
I had this problem and just added pckl to the end of the file name.
My problem was that I was trying to pickle and un-pickle across different python environments - watch out to make sure your pickle versions match!
Perhaps this will be the solution to this error for someone.
I needed to load a numpy array:
torch.load(file)
When I loaded the array, this error appeared. All that is needed is to turn the array into a tensor.
For example:
result = torch.from_numpy(np.load(file))
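Several of the answers above trace the error to handing a non-pickle file (for example a NumPy file) to a pickle loader. A rough way to check before loading is to look at the first bytes of the file. This is a sketch with hypothetical helper names (sniff_bytes, sniff_file); it only recognises .npy files and pickles written with protocol 2 or later. Incidentally, the first byte of a .npy file, 0x93, is the same value as pickle's STACK_GLOBAL opcode, which is probably why the message mentions it.

```python
import pickle

NPY_MAGIC = b"\x93NUMPY"  # every .npy file starts with these six bytes


def sniff_bytes(head):
    """Guess the format from the leading bytes of a file."""
    if head.startswith(NPY_MAGIC):
        return "npy"
    # Pickles written with protocol 2+ start with the PROTO opcode (0x80)
    # followed by the protocol number.
    if len(head) >= 2 and head[0] == 0x80 and head[1] <= pickle.HIGHEST_PROTOCOL:
        return "pickle"
    return "unknown"


def sniff_file(path):
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(6))
```

If sniff_file("sample.pkl") reports "npy", the file would need np.load (or torch.from_numpy on the loaded array) rather than pickle.load.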

TypeError when trying to fetch data from CSV file

I am new to Python. I have a .csv file named supermarket.csv. I am trying to fetch data from the file and store it in a DataFrame object. I am using Jupyter as my editor.
Data the file contains:
,Address,City,Country,Employees,ID,Name,State
0,3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
1,735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
2,332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3,3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
4,1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
5,551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114
The code I am trying to run:
import pandas
df1=pandas.read_csv("supermarkets.csv")
df1
and it's throwing a TypeError:
> ---------------------------------------------------------------------------
TypeError
Traceback (most recent call last)
<ipython-input-123-0000e09242f0> in <module>()
----> 1 df1=pandas.read_csv("supermarkets.csv")
----> 2 df1
TypeError: 'str' object is not callable
> ---------------------------------------------------------------------------
I was following a tutorial, where this worked fine for the instructor. But whenever I run this code I get the same error.
I have also tried .json and .xlsx files, and both work fine. I get this error only with the read_csv() method.
Improper Indentation
Python programs get structured through indentation, i.e. code blocks are defined by their indentation. Okay that's what we expect from any program code, isn't it? Yes, but in the case of Python it's a language requirement not a matter of style. This principle makes it easier to read and understand other people's Python code.
Read more about Python Indentation
Try this:
import pandas
df1 = pandas.read_csv('supermarkets.csv')
print(df1)
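For context on the message itself: "'str' object is not callable" is what Python raises when a name that holds a string is called like a function. The sketch below reproduces only the message and is not a diagnosis of the post; the variable name reader is made up for illustration. In a long-running notebook session, one way this can happen is when an earlier cell has rebound a function name to a string.

```python
def call_shadowed_name():
    # Hypothetical: a name expected to be a function now holds a string.
    reader = "supermarkets.csv"
    try:
        reader("supermarkets.csv")  # calling a str raises TypeError
    except TypeError as exc:
        return str(exc)
    return None
```

Restarting the kernel and re-running the import is a cheap way to rule this out.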

TypeError: read_excel() takes exactly 2 arguments (1 given)

I get this problem when I try to read a file:
import numpy as np
import pandas as pd
pos = pd.read_excel('pos.xls', header=None)
and the error is like this:
Traceback (most recent call last):
File "one-hot.py", line 4, in <module>
pos = pd.read_excel('pos.xls', header=None)
TypeError: read_excel() takes exactly 2 arguments (1 given)
To my surprise, when I run the code on my own PC in PyCharm, there is no error. I get the problem only on my school's Ubuntu machine (without PyCharm).
My own Python is 2.7.12, and the Python on the school's Ubuntu machine is 2.7.6.
My best guess (I can't try it on Python 2.7.6 since I don't have it) is that you are using pandas version 0.13 or below. According to the docs, you must also provide sheetname, which in later versions has a default value of 0.
pandas.io.excel.read_excel(io, sheetname, **kwds)
This sounds like an issue with a different version of the pandas library installed. Looking back at the older documentation pages for pandas library, it seems that pandas did in fact require 2 parameters back in version 0.13.0 (and potentially other old versions, but I did not check any others). For version 0.13.0, the docs define the function as:
pandas.read_excel(io, sheetname, **kwds)
You can read those details here: http://pandas.pydata.org/pandas-docs/version/0.13.0/generated/pandas.read_excel.html?highlight=read_excel#pandas.read_excel
Chances are, it is just an issue with a different library version.
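The difference between the two signatures can be illustrated without pandas at all. The functions below are stand-ins written for this example (read_excel_old, read_excel_new and try_call are hypothetical names, not pandas code); they only mimic a required versus a defaulted sheetname parameter. Note that the exact wording of the TypeError differs between Python 2 and Python 3.

```python
def read_excel_old(io, sheetname, **kwds):
    """Stand-in for the 0.13-style signature: sheetname is required."""
    return (io, sheetname, kwds)


def read_excel_new(io, sheetname=0, **kwds):
    """Stand-in for a later-style signature: sheetname defaults to 0."""
    return (io, sheetname, kwds)


def try_call(func):
    """Call func the way the question does and report what happens."""
    try:
        return func("pos.xls", header=None)
    except TypeError as exc:
        return "TypeError: %s" % exc
```

Here try_call(read_excel_old) reports a TypeError while try_call(read_excel_new) succeeds. By the same logic, passing the sheet explicitly as the second positional argument, as in pd.read_excel('pos.xls', 0, header=None), should satisfy both kinds of signature.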
I actually had a similar problem which was solved by adding '.xlsx' to the end of my proposed file name:
practicetoexcel.to_excel('Thisxldoc.xlsx', sheet_name = 'Practice')

Installing the latest version of XLRD

I am already using an xlrd package. The code I am working on always returns an error message:
Traceback (most recent call last):
File "diffoct8.py", line 17, in <module>
row = rs.get(row_number)
AttributeError: 'Sheet' object has no attribute 'get'
What could be the problem?
Is there a newer version of XLRD? If yes, how can I install it on Ubuntu?
Here you can get latest xlrd package. https://pypi.python.org/pypi/xlrd
From my understanding, you just want to get information from a row in a sheet. I assume there are 10 elements in a row.
Try this:
...
element_num = 10  # assumed number of elements in the row
row = []
for i in range(element_num):
    # collect the value of each cell in the row
    row.append(rs.cell(row_number, i).value)
...
The method get() does not exist (it was used purely to show the approach you should take and where the problem was in your previous question). I've updated my answer to that question to show how you should use the row() method, as instructed in the documentation.
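A minimal sketch of the row() approach mentioned above, assuming rs is an xlrd Sheet and row_number is a valid row index as in the question. read_row is a hypothetical helper name. In xlrd, row() returns the Cell objects of one row, each of which carries a value attribute.

```python
def read_row(sheet, row_number):
    """Return the values of every cell in one row of an xlrd-style sheet."""
    return [cell.value for cell in sheet.row(row_number)]
```

In the original script this would replace the failing call, e.g. row = read_row(rs, row_number). xlrd sheets also provide row_values(row_number), which returns the values directly.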
