error using pandas in python - python

I would like to use the pandas package for Python. Some functionality works, but when I try to pass the "include" argument to the describe() function I get an error:
train_df.describe(include=['O'])
The full code looks like the following:
import numpy as np
import pandas as pd
import random as rnd
import matplotlib.pyplot as plt
# acquire data
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')
train_df.describe(include=['O'])
I get the following error:
>> python survival.py
Traceback (most recent call last):
File "survival.py", line 10, in <module>
train_df.describe(include=['O'])
TypeError: describe() got an unexpected keyword argument 'include'
Using the .describe() on its own seems to work. Any ideas? Thank you.
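One thing worth checking (an assumption, since the question does not state the installed version): the include argument was added to describe() in pandas 0.15.0, so an older installation would raise exactly this TypeError. A minimal sketch with a made-up two-column frame standing in for train.csv:

```python
import pandas as pd

# Print the installed version; describe(include=...) needs pandas >= 0.15.0.
print(pd.__version__)

# Hypothetical sample data, not the real train.csv.
df = pd.DataFrame({
    "Name": pd.Series(["Ann", "Bob", "Ann"], dtype=object),
    "Age": [22, 35, 58],
})

# Summarise only the object (string) columns.
summary = df.describe(include=["O"])
print(summary)
```

If the printed version is older than 0.15.0, upgrading pandas is the likely fix.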

Related

Why does the adfuller test tell me "'NoneType' object is not callable"?

I hope someone can help me. I'm trying to run the adfuller test, but it returns the error "'NoneType' object is not callable". The Excel file should be imported correctly, and there should be no need to drop NaN values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
bond_future_data=pd.read_excel('europe_market_data.xlsx', sheet_name='RXA',header=[0],index_col=[0])
bond_future_data.columns.names=['car']
bond_future_data.index.names=['dates']
bond_future_price=bond_future_data['RX1 Comdty']
adfuller(bond_future_price)
I thought it was an error in the way I imported adfuller, but it shouldn't be, because the following code works:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
a=np.linspace(0,10)
print(adfuller(a))
This is the error:
Exception has occurred: TypeError
'NoneType' object is not callable
File "/Users/federicoruggieri/Desktop/phyton/#garch and imp vol.py", line 25, in
adfuller(bond_future_price)
I have also added a screenshot of the printed dataframe.
screen of printed dataframe

TypeError: <class 'cftime._cftime.DatetimeGregorian'> is not convertible to datetime

I have been trying to use the pvlib-python tool for forecasting. The tool comes with some model-specific classes.
# Import pvlib forecast models
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import seaborn as sns; sns.set_color_codes()
from pvlib.forecast import GFS, NAM, NDFD, HRRR, RAP
from pvlib import solarposition
# Specify Location (Phoenix, AZ)
latitude, longitude, tz = 32.2, -110.9, 'US/Arizona'  # note: these coordinates are near Tucson, AZ
# Specify time range
start = pd.Timestamp(datetime.date.today(), tz=tz)
end = start + pd.Timedelta(days=7)
irrad_vars = ['ghi','dni','dhi']
from pvlib.forecast import GFS, NAM, NDFD, HRRR, RAP
model = GFS()
# Retrieve data; returns a pandas.DataFrame object
raw_data = model.get_data(latitude, longitude, start, end)
print(raw_data.head())
When I try to get data from the model, the code produces the following output:
TypeError Traceback (most recent call last)
# Retrive data.returns panda.DataFrame object
----> 6 raw_data = model.get_data(latitude, longitude, start, end)
TypeError: <class 'cftime._cftime.DatetimeGregorian'> is not convertible to datetime
I don't know what is in your get_data function, but I suspect it uses the netCDF4 library and its netCDF4.num2date function, which is built on the cftime library (https://github.com/Unidata/cftime). See the requirements section of the netCDF4 documentation: https://unidata.github.io/netcdf4-python/netCDF4/index.html
It seems they migrated away from the Python datetime library around version 5, because cftime can handle more calendars than the strictly Gregorian one. I don't totally understand why, but passing the keyword argument only_use_cftime_datetimes=False will usually suffice; you can also force it with an additional only_use_python_datetimes=True. This should return a Python datetime and fix your problem.
It was discussed by pvlib's contributors here: https://github.com/pvlib/pvlib-python/issues/944
One of the suggestions was downgrading cftime and it worked for me.

What is causing "AttributeError: 'numpy.ndarray' object has no attribute 'diff'"?

I am new to numpy and I do not understand the documentation regarding diff. The code below throws the error. I am baffled; any help would be appreciated.
Traceback (most recent call last):
File "/home/dave/Desktop/mcmtest/testhv calc.py", line 11, in <module>
r = np.log(close_prices).diff()
AttributeError: 'numpy.ndarray' object has no attribute 'diff'
Here is the test code:
import numpy as np
from numpy import sqrt,mean,log,diff
import pandas as pd
close_prices = [178.97,175.5,171.07,171.85,172.43,172.99,167.37,164.34,162.71,\
156.41,155.15,159.54,163.03,156.49,160.5,167.78,167.43,166.97,167.96,171.51,171.11]
print (close_prices)
r = np.log(close_prices).diff()
print(r)
Given that numpy.ndarray is the Python type of "numpy arrays", the error is just saying that arrays don't have a diff method. diff is a function defined in the numpy module.
Instead of np.log(close_prices).diff(), do
np.diff(np.log(close_prices))
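A short sketch of the fix, using only the first few prices from the question. The pandas variant is an added alternative, not part of the original answer; it works because a pandas Series, unlike a numpy array, does have a diff() method:

```python
import numpy as np
import pandas as pd

close_prices = [178.97, 175.5, 171.07, 171.85]

# numpy: diff is a module-level function, so wrap the log array in it.
log_returns = np.diff(np.log(close_prices))
print(log_returns)

# pandas alternative: Series.diff() exists, but its first element is NaN.
log_returns_pd = np.log(pd.Series(close_prices)).diff().dropna()
print(log_returns_pd)
```

np.diff returns one element fewer than its input, whereas Series.diff() keeps the length and puts NaN in the first position, hence the dropna().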

Call auto.arima from Python via pyRserve

I have been trying to use pyRserve for time series forecasting and intend to use the auto.arima function in R.
I used the following code to work around the problem of functions with a dot in their names, such as auto.arima:
import pyRserve
import pandas as pd
import numpy
conn = pyRserve.connect()
df = pd.read_excel('D:/My Path/C9.xlsx', sheet_name='C9')
aList = df['Value'].tolist() # Cast the desired column into a python list
aList = numpy.array(aList)
conn.r.List = aList
auto_arima = getattr(conn.r, 'auto.arima')
conn.r.sapply(conn.ref.List, auto_arima)
But it returned this error:
Traceback (most recent call last):
File "D:/Forecast/Python/R2Python/R2P_Practice.py", line 21, in <module>
auto_arima = getattr(conn.r, 'auto.arima')
File "C:\Python27\lib\site-packages\pyRserve\rconn.py", line 308, in __getattr__
'defined in Rserve' % realname)
NameError: no such variable or function "auto.arima" defined in Rserve
It seems that auto.arima is not defined in Rserve. Why isn't it there? How can I fix this?

TCP Time-Sequence Graph Ipython Notebook

I have network traffic data in the file "C:\Users\ASHWIN\Desktop\Test3_pcap.csv". The file contains the columns frame.number, frame.time, eth.src, eth.dst, ip.src, ip.dst, ip.proto, tcp.stream, tcp.seq, tcp.ack, tcp.window_size and tcp.len.
I have already added the following imports to my IPython notebook:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
%matplotlib inline
I want to plot a TCP time-sequence graph from my CSV file, but the code produces a lot of errors. The sample code that fails is below:
fields=["tcp.stream", "ip.src", "ip.dst", "tcp.seq", "tcp.ack", "tcp.window_size", "tcp.len"]
ts=read_csv("C:\Users\ASHWIN\Desktop\Test3_pcap.csv", fields, timeseries=True, strict=True)
ts
stream=ts[ts["tcp.stream"] == 10]
print stream.to_string()
stream["type"] = stream.apply(lambda x: "client" if x["ip.src"] == stream.irow(0)["ip.src"] else "server", axis=1)
print stream.to_string()
client_stream=stream[stream.type == "client"]
client_stream["tcp.seq"].plot(style="r-o")
When I run this code in my IPython notebook it shows a lot of errors. Can anyone help me create a TCP time-sequence graph from this network traffic data in CSV format? Thank you.
My code:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
%matplotlib inline
Location = r'C:\Users\ASHWIN\Desktop\tempo\New folderTest3_pcap.csv'
fields=["tcp.stream", "ip.src", "ip.dst", "tcp.seq", "tcp.ack", "tcp.window_size", "tcp.len"]
ts=read_csv(Location, fields, timeseries=True, strict=True)
ts
And this is the error I get:
TypeError Traceback (most recent call last)
<ipython-input-6-ae8455b41c8b> in <module>()
1 Location = r'C:\Users\ASHWIN\Desktop\tempo\New folderTest3_pcap.csv'
2 fields=["tcp.stream", "ip.src", "ip.dst", "tcp.seq", "tcp.ack", "tcp.window_size", "tcp.len"]
----> 3 ts=read_csv(Location, fields, timeseries=True, strict=True)
4 ts
TypeError: parser_f() got an unexpected keyword argument 'timeseries'
Neither timeseries nor strict is a valid argument of read_csv().
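A minimal sketch of a call that read_csv() does accept, assuming the intent was to load only the listed columns. The usecols argument is a real read_csv parameter; the in-memory sample rows are made up to stand in for Test3_pcap.csv:

```python
import io

import pandas as pd

fields = ["tcp.stream", "ip.src", "ip.dst", "tcp.seq",
          "tcp.ack", "tcp.window_size", "tcp.len"]

# Hypothetical sample rows; replace `sample` with the real file path.
sample = io.StringIO(
    "frame.number,ip.src,ip.dst,tcp.stream,tcp.seq,tcp.ack,tcp.window_size,tcp.len\n"
    "1,10.0.0.1,10.0.0.2,10,1,1,8192,0\n"
    "2,10.0.0.2,10.0.0.1,10,1,2,8192,100\n"
    "3,10.0.0.3,10.0.0.4,11,1,1,4096,0\n"
)

# Load only the wanted columns instead of passing unsupported keywords.
ts = pd.read_csv(sample, usecols=fields)

# Select a single TCP stream, as in the question.
stream = ts[ts["tcp.stream"] == 10]
print(stream.to_string())
```

With a real file, passing the path as a raw string (r'C:\...') avoids problems with backslash escapes.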
