TA-Lib and pandas Series not working with Python

I'm new to coding, so please be patient with me ;)
This is my code, which doesn't work. I've been trying for several hours to solve it, but every time I get another error.
import pandas as pd
import numpy as np
import talib.abstract as ta
dfBTCUSD_1h = pd.read_csv('Bitfinex_BTCUSD_1h.csv', skiprows=1)
dfBTCUSD_1h = dfBTCUSD_1h.sort_values(by='Date') # sort_values returns a new DataFrame, so assign the result
open = dfBTCUSD_1h.iloc[:, 2]
high = dfBTCUSD_1h.iloc[:, 3]
low = dfBTCUSD_1h.iloc[:, 4]
close = dfBTCUSD_1h.iloc[:, 5]
short_ema = ta.EMA(close,timeperiod=10)
TypeError: cannot convert the series to <class 'int'>
How can I convert the pandas Series into something TA-Lib can work with?
Best regards

Instead of passing close to ta.EMA directly, do the following:
short_ema = ta.EMA(close.astype(float), timeperiod=10)
TA-Lib functions usually expect NumPy arrays of dtype float as inputs.
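For context, a minimal sketch of the whole flow (the 'Close' column name is an assumption based on a typical Bitfinex CSV export; selecting columns by name is also less fragile than positional iloc):
import pandas as pd
import talib.abstract as ta

dfBTCUSD_1h = pd.read_csv('Bitfinex_BTCUSD_1h.csv', skiprows=1)
dfBTCUSD_1h = dfBTCUSD_1h.sort_values(by='Date')
close = dfBTCUSD_1h['Close'].astype(float)  # 'Close' column name is an assumption
short_ema = ta.EMA(close, timeperiod=10)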

Related

How do I fix this type error ('value' must be an instance of str or bytes, not a float) in Python

I want to plot a graph for COVID-19 in India, and so far there's no problem when I manually input my data as the x and y axes. But since the data is quite long, when I try to read it from a .csv file instead, it gives me this error: 'value' must be an instance of str or bytes, not a float. I have also tried wrapping with int(corona_case), but that gives me another error: cannot convert the series to <class 'int'>. I would also appreciate it if someone could suggest tutorials on plotting graphs involving datetime in Python, since this is my first time learning Python.
I am using Python 3.
P.S. I can't seem to find a way to share my CSV file, so I'm leaving it in a snippet.
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import numpy as np
plt.style.use('seaborn')
data = pd.read_csv('india.csv')
corona_date = data['Date']
corona_case = data['Case']
plt.plot_date(corona_date, corona_case, linestyle='solid')
plt.gcf().autofmt_xdate()
plt.title('COVID-19 in India')
plt.xlabel('Date')
plt.ylabel('Cumulative Case')
plt.tight_layout()
plt.show()
Date,Case
2020-09-30,6225763
2020-10-01,6312584
2020-10-02,6394068
2020-10-03,6473544
2020-10-04,6549373
2020-10-05,6623815
2020-10-06,6685082
# convert all DataFrame columns to the int64 dtype
df = df.astype(int)
# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})
# convert Series to float16 type
s = s.astype(np.float16)
# convert Series to Python strings
s = s.astype(str)
# convert Series to categorical type - see docs for more details
s = s.astype('category')
You can convert the type of a column in a pandas DataFrame like this:
corona_case = data['Case'].astype(int) # or str for strings
if that is what you are trying to do.
I can't give further information, but this might work:
corona_date = (data['Date']).astype(str)
corona_case = (data['Case']).astype(str)
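Since the underlying problem is that the Date column is read as plain strings, a sketch of another approach is to parse the dates first (a minimal sketch, assuming the ISO-formatted dates shown in the snippet above):
import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('india.csv')
data['Date'] = pd.to_datetime(data['Date'])  # parse strings into datetime64
plt.plot(data['Date'], data['Case'], linestyle='solid')
plt.gcf().autofmt_xdate()
plt.show()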

Iteration, calculation via pandas

I am new to Python and I would like to ask something.
My code reads a CSV file. I want to use one column, and I want to apply an equation that calculates several values depending on the value of that column. I am using for and if statements.
My code:
import pandas as pd
import matplotlib as mpl
import numpy as np
dfArxika = pd.read_csv('AIALL.csv', usecols=[0,1,2,3,4,5,6,7,8,9,10], header=None, index_col=False)
print(dfArxika.columns)
A=dfArxika[9]
for i in A:
    if (A(i)>=4.8 and A(i)<66):
        IA=(2.2*log10(A(i)/66)+5.5)
    elif A(i)>=66:
        IA=3.66*log10(A(i)/66)+5.5
    else:
        IA=2.2*log10(A(i)/66)+5.5
but the command window shows me the error:
TypeError: 'Series' object is not callable
Could you help me?
As @rdas mentioned in the comments, you are using parentheses () instead of brackets [] for indexing the values of your column.
I am not sure what IA is in your example, but this might work:
for i in range(len(dfArxika)):
    if A.loc[i] >= 4.8 and A.loc[i] < 66:
        IA = 2.2*np.log10(A.loc[i]/66) + 5.5
    elif A.loc[i] >= 66:
        IA = 3.66*np.log10(A.loc[i]/66) + 5.5
    else:
        IA = 2.2*np.log10(A.loc[i]/66) + 5.5
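Note that the if and else branches compute the same formula, so only the A >= 66 case actually differs. A vectorized sketch with np.where that also keeps every result instead of overwriting IA on each iteration (assuming the same thresholds as above):
import numpy as np

IA = np.where(A >= 66,
              3.66*np.log10(A/66) + 5.5,
              2.2*np.log10(A/66) + 5.5)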

How to subtract a percentage from a csv file and then output it into another file? I'd preferably like a formula like x*.10=y

Sorry if I haven't explained things very well. I'm a complete novice, so please feel free to critique.
I've searched everywhere, but I haven't found anything close to subtracting a percentage. When it's done on its own (x - .10 = y) it works wonderfully. The only problem is that I'm trying to make 'x' stand for sample_.csv[0], i.e. the numerical value from the first column, from my understanding.
import csv
import numpy as np
import pandas as pd
readdata = csv.reader(open("sample_.csv"))
x = input(sample_.csv[0])
y = input(x * .10)
print(x + y)
the column looks something like this
"20,a,"
"25,b,"
"35,c,"
"45,d,"
I think you should only need pandas for this task. I'm guessing you want to apply this operation to one column:
import pandas as pd

df = pd.read_csv('sample_.csv') # assuming the CSV has a header row
df['new_col'] = df['20,a'] * 1.1 # faster than adding a percentage: x + 0.1*x = 1.1*x
df.to_csv('new_sample.csv', index=False) # the default is to also write the index, which I personally don't like
BTW: input is a built-in function in Python that asks the user for input. I'm guessing you don't want that behavior, but I could be wrong.
import pandas as pd
df = pd.read_csv("sample_.csv")
df['newcolumn'] = df['column'].apply(lambda x : x * .10)
Please try this.
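Putting it together for a headerless file like the sample above, a minimal sketch (the column names value and label are assumptions, and subtracting 10% means multiplying by 0.9):
import pandas as pd

df = pd.read_csv('sample_.csv', header=None, names=['value', 'label'])
df['discounted'] = df['value'] * 0.9  # subtract 10%: x - 0.1*x = 0.9*x
df.to_csv('new_sample.csv', index=False)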

How to save a Dask Series to HDF5

Here is what I tried first
df = dd.from_pandas(pd.DataFrame(dict(x=np.random.normal(size=100),
                                      y=np.random.normal(size=100))), chunksize=40)
cat = df.map_partitions(lambda d: np.digitize(d['x']+d['y'], [.3,.9]),
                        meta=pd.Series([], dtype=int, name='x'))
cat.to_hdf('/tmp/cat.h5', '/cat')
This fails with cannot properly create the storer...
I next tried to save cat.values instead:
da.to_hdf5('/tmp/cat.h5', '/cat', cat.values)
This fails with cannot convert float NaN to integer, which I am guessing is because cat.values has NaN shape and chunk-size values.
How do I get both of these to work? Note the actual data would not fit in memory.
This works fine:
import numpy as np
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame(dict(x=np.random.normal(size=100),
                       y=np.random.normal(size=100)))
ddf = dd.from_pandas(df, chunksize=40)
cat = ddf.map_partitions(lambda d: pd.Series(np.digitize(d['x'] + d['y'], [.3, .9])),
                         meta=('x', int))
cat.to_hdf('cat.h5', '/cat')
You were missing the pd.Series wrapper around the call to np.digitize, which meant the output of map_partitions was a NumPy array instead of a pandas Series (hence the error). In the future, when debugging, it may be useful to compute a bit of data from intermediate steps to see where the error is (for example, I found this issue by running .head() on cat).
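As for the second attempt, da.to_hdf5 fails because an array derived from a DataFrame has unknown (NaN) chunk sizes. A sketch that may work around this in newer Dask versions (compute_chunk_sizes triggers a pass over the data, so it can be expensive when the data doesn't fit in memory):
import dask.array as da

arr = cat.values                 # dask array with NaN chunk sizes
arr = arr.compute_chunk_sizes()  # resolve the unknown shapes
da.to_hdf5('cat_values.h5', '/cat', arr)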

pandas read_csv() converting imaginary to real

After reading a file using pandas with the following lines:
import pandas as pd
import numpy as np
df = pd.read_csv('PN_lateral_n_eff.txt', header=None)
df.columns = ["effective_index"]
here is my output:
effective_index
0 2.568393573877396+1.139080496494329e-006i
1 2.568398351899841+1.129979376397734e-006i
2 2.568401556986464+1.123872317134941e-006i
After that, I cannot use NumPy to convert it into a real number, because the pandas dtype is object. I tried this:
np.real(df, dtype = float)
TypeError: real() got an unexpected keyword argument 'dtype'
Any way to do that?
Looks like astype(complex) works with NumPy arrays of strings, but not with pandas Series of objects (note the method chain is wrapped in parentheses; backslash continuations with trailing comments are a syntax error):
cmplx = (df['effective_index']
         .str.replace('i', 'j')  # go engineering: Python spells the imaginary unit j
         .values                 # go NumPy
         .astype('str')          # go string
         .astype(complex))       # go complex
# array([2.56839357+1.13908050e-06j, 2.56839835+1.12997938e-06j,
#        2.56840156+1.12387232e-06j])
df['effective_index'] = cmplx  # go pandas again
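To then pull out the real part as floats, a one-line follow-up sketch (the n_real column name is just an illustration):
df['n_real'] = df['effective_index'].values.real  # .real on the complex NumPy array yields float64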
