Read SAS file with pandas - python

I'm trying to use the pandas read_sas() function.
First, I create a SAS dataset by running this code in SAS:
libname tmp 'c:\temp';
data tmp.test;
do i=1 to 100;
x=rannor(0);
output;
end;
run;
Now, in IPython, I do this:
import numpy as np
import pandas as pd
%cd C:\temp
pd.read_sas('test.sas7bdat')
Pretty straightforward and seems like it should work. But I just get this error:
TypeError: read() takes at most 1 argument (2 given)
What am I missing here? I'm using pandas version 0.18.0.

According to the issue report linked below, this bug will be fixed in pandas 0.18.1.
https://github.com/pydata/pandas/issues/12647

Related

Column names issue using pandas.read_csv

I am pretty new to Python.
I am trying to import the SMSSpam Collection Data using the pandas read_csv function.
The import went well.
But as the file does not have a header, I tried to add column names (variable names: "status" and "message") and ended up with an empty file.
Here is my code:
import numpy as np
import pandas as pd
file_loc=r"C:\Users\User\Documents\JP\SMSCollection.txt"  # raw string so the backslashes are not read as escape sequences
df=pd.read_csv(file_loc,sep='\t')
The above code works well: I got [5571 rows x 2 columns].
But when I add columns using the following line of code
df.columns=["status","message"]
I ended up with an empty df.
Any help on this?
Thanks
You could try to set the column names at read time:
df=pd.read_csv(file_loc,sep='\t',header=None,names=["status","message"])
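A minimal sketch of the read-time approach, using a small in-memory sample in place of the real file (the two sample rows are made up for illustration). With header=None, pandas does not consume the first line as a header, and names supplies the column labels:

```python
import io

import pandas as pd

# Made-up stand-in for SMSCollection.txt: tab-separated, no header row
sample = "ham\tSee you at lunch\nspam\tYou have won a prize\n"

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=["status", "message"])
print(df.shape)          # (2, 2): the first line is kept as data, not used as a header
print(list(df.columns))  # ['status', 'message']
```

Without header=None (and without names), read_csv treats the first line as the header, so the first message would become the column labels and one row of data would be lost.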

Extracting data from a large CSV file causes dtype warnings

I work for a company and I recently switched from using a spreadsheet package to Python. Since I am very new to Python, there are a lot of things that I have difficulty grasping. Using Python, I am trying to extract data from a large CSV file (37791 rows and 316 columns). Here is a piece of code I wrote:
Solution 1
import numpy as np
import pandas as pd
df=pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv',skiprows=1)
data=df.loc[:,['Steps','Parameter']]
This command generates an error, i.e., a DtypeWarning: Columns (0,1,2,3,...,81) have mixed types. Specify dtype option on import or set low_memory=False.
So, I found a workaround.
Solution 2
import pandas as pd
import numpy as np
df=pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv',skiprows=1,error_bad_lines=False, index_col=False, dtype='unicode')
data=df.loc[:,['Steps','Parameter']]
Two questions:
i) I was able to get around the error, but now the columns that I want (Steps and Parameter) have been converted to objects (probably due to dtype='unicode'). How can I convert the Steps column to an integer type and Parameter to a float?
ii) Some people say that the dtype warning isn't really an error. But I found that when I use Solution 1 to read the CSV file, the Steps column contains some floats. The original CSV file doesn't have any floats in the Steps column. It looks as if some floats have been inserted by Python itself! Why does this happen?
(I am not able to upload the original CSV file because my company doesn't allow it.)
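One possible way to handle both points, sketched on a made-up sample (the junk first line, the empty Steps cell and the "abc" entry are assumptions for illustration, not the real file). An integer column that contains any missing value cannot be stored as int64, so pandas upcasts it to float64, which is a common reason for unexpected floats. After reading everything as strings, pd.to_numeric with errors="coerce" converts the columns, and the nullable "Int64" dtype (pandas 0.24 or later) keeps Steps integral despite missing values:

```python
import io

import pandas as pd

# Made-up stand-in for Test.data.csv: a junk first line (hence skiprows=1),
# one empty Steps cell and one non-numeric Parameter value
csv_text = "junk header line\nSteps,Parameter\n1,0.5\n2,1.25\n,2.0\n4,abc\n"

# Default parsing: the missing Steps value forces the whole column to float64
raw = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(raw["Steps"].dtype)  # float64

# Read as strings (like dtype='unicode'), then convert explicitly
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, dtype=str)
data = df.loc[:, ["Steps", "Parameter"]].copy()

# errors="coerce" turns unparseable entries into NaN instead of raising
data["Steps"] = pd.to_numeric(data["Steps"], errors="coerce").astype("Int64")
data["Parameter"] = pd.to_numeric(data["Parameter"], errors="coerce")
print(data.dtypes)  # Steps: Int64, Parameter: float64
```

If Steps has no missing or malformed values in the real file, a plain .astype(int) after to_numeric would also work; the nullable dtype is only needed when gaps remain.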

Unable to write my dataframe using feather (strided data not supported)

When using the feather package (http://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/) to write a simple 20x20 DataFrame, I keep getting an error stating that strided data isn't supported yet. I don't believe my data is strided (or out of the ordinary), and I can reproduce the sample code given on the website, but I can't get it to work with my own data. Here is some sample code:
import feather
import numpy as np
import pandas as pd
tempArr = np.reshape(np.arange(400), (20,20))
df = pd.DataFrame(tempArr)
feather.write_dataframe(df, 'test.feather')
The last line returns the following error:
FeatherError: Invalid: no support for strided data yet
I am running this on Ubuntu 14.04. Am I perhaps misunderstanding something about how pandas dataframes are stored?
Please take this to GitHub: https://github.com/wesm/feather/issues/97
Bug reports do not belong on Stack Overflow.
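To show what "strided" means here, a small NumPy illustration. Whether this is exactly what feather trips over is an assumption; the linked GitHub issue is the place to confirm it. In a C-ordered 2D array each row is contiguous in memory, but a column view has to step over a whole row between consecutive elements:

```python
import numpy as np

arr = np.reshape(np.arange(400), (20, 20))  # C order: each row is contiguous
col = arr[:, 0]                             # column view into the same buffer

# The stride is one full row (20 elements), not one element
print(col.strides, arr.itemsize)
print(col.flags["C_CONTIGUOUS"])            # False: this view is strided

# Copying into a fresh buffer produces contiguous data with the same values
packed = np.ascontiguousarray(col)
print(packed.flags["C_CONTIGUOUS"])         # True
```

So data built from an ordinary 2D array can be strided even though nothing about it looks unusual.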

talib ADX function error

Using a Python pandas DataFrame (d) downloaded from Yahoo Finance, which has the format:
Date,Open,High,Low,Close,Volume,Adj Close
2015-01-13,1.290,1.290,1.200,1.225,421600,1.225
I can successfully use talib functions like this:
talib.abstract.EMA(d, timeperiod=8, price='Close')
talib.abstract.SMA(d, timeperiod=13, price='Close')
talib.abstract.RSI(d, timeperiod=25, price='Close')
From the documentation (http://mrjbq7.github.io/ta-lib/func_groups/momentum_indicators.html) they take the form:
real = XYZ(close, timeperiod=14)
However, when trying to use talib functions of the form:
real = XYZ(high, low, close, timeperiod=14)
such as ADX, I can't figure out the correct syntax.
I've tried this:
talib.abstract.ADX(d,high='High',low='Low',close='Close',Timeperiod=10)
Exception: input_arrays parameter missing required data keys: high, low, close
and this:
talib.abstract.ADX(high=d.High,low=d.Low,close=d.Close,Timeperiod=10)
TypeError: only length-1 arrays can be converted to Python scalars
Any ideas on the correct format for the parameters needed for this and the other talib Python wrappers that have more than one input parameter?
Any help on the correct format would be greatly appreciated !!!
Thanks in advance
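It depends on the shape of your arrays in some cases. If you really need the function urgently, just call it from the library's function API:
import talib
import numpy as np
# d is the DataFrame from the question; talib works on float64 (double) arrays
h = np.array(d['High'], dtype=float)
l = np.array(d['Low'], dtype=float)
c = np.array(d['Close'], dtype=float)
output_atr = np.array(talib.ATR(h,l,c,14))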
This works fine.
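A sketch of preparing the DataFrame for either talib interface. The helper names talib_inputs and abstract_inputs are hypothetical, and the sample rows other than the first one from the question are made up. The function API takes separate double arrays, while the abstract API's error message above suggests it looks for lowercase keys (high, low, close), so renaming the columns should give it the keys it reports as missing:

```python
import numpy as np
import pandas as pd


def talib_inputs(d):
    """Hypothetical helper: return High, Low, Close as contiguous float64 arrays."""
    return tuple(
        np.ascontiguousarray(d[col].to_numpy(dtype=np.float64))
        for col in ("High", "Low", "Close")
    )


def abstract_inputs(d):
    """Hypothetical helper: lowercase column names, e.g. 'High' -> 'high'."""
    return d.rename(columns=str.lower)


# First row is from the question; the other two are made up
d = pd.DataFrame({
    "Open": [1.290, 1.230, 1.210],
    "High": [1.290, 1.250, 1.240],
    "Low": [1.200, 1.190, 1.180],
    "Close": [1.225, 1.215, 1.230],
    "Volume": [421600, 380000, 395000],
    "Adj Close": [1.225, 1.215, 1.230],
})

h, l, c = talib_inputs(d)
lowered = abstract_inputs(d)
print(h.dtype, list(lowered.columns))
```

With these, the calls would look like talib.ADX(h, l, c, timeperiod=10) or talib.abstract.ADX(lowered, timeperiod=10). Note that the parameter is spelled timeperiod in lowercase, as in the documented signature quoted in the question, not Timeperiod.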

Pickling a DataFrame

I am trying to pickle a DataFrame with
import pandas as pd
from pandas import DataFrame
data = pd.read_table('Purchases.tsv',index_col='coreuserid')
data.to_pickle('Purchases.pkl')
I have been working with "data" for a while and have had no issues, so I know it is not a data corruption issue. I suspect a syntax problem, but I have tried a number of variants. I hesitate to post the whole error message, but it ends with:
\pickle.pyc in to_pickle(obj, path)
13 """
14 with open(path, 'wb') as f:
15 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
SystemError: error return without exception set
The Purchases.pkl file is created but if I call
data = pd.read_pickle('Purchases.pkl')
I get an EOFError. I am using Canopy 1.4, and therefore pandas 0.13.1, which should be recent enough to have this functionality.
Fast forward a few years, and now it works fine. Thanks pandas ;)
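For reference, a minimal to_pickle/read_pickle round trip, using a small made-up DataFrame indexed by coreuserid as in the question and a temporary directory instead of the real Purchases.tsv:

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the data read from Purchases.tsv
df = pd.DataFrame(
    {"coreuserid": [101, 102, 103], "amount": [9.99, 20.0, 5.5]}
).set_index("coreuserid")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Purchases.pkl")
    df.to_pickle(path)               # write the DataFrame with pickle
    restored = pd.read_pickle(path)  # read it back

print(restored.equals(df))  # True: values, index and columns match
```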
You can try creating a class from your DataFrame and pickling it afterwards.
This can help you:
Pass pandas dataframe into class
