pandas read_csv() converting imaginary to real - python

After reading a file with pandas using these lines:
import pandas as pd
import numpy as np
df = pd.read_csv('PN_lateral_n_eff.txt', header=None)
df.columns = ["effective_index"]
here is my output:
effective_index
0 2.568393573877396+1.139080496494329e-006i
1 2.568398351899841+1.129979376397734e-006i
2 2.568401556986464+1.123872317134941e-006i
After that, I cannot use NumPy to convert it into a real number, because the pandas dtype is object. I tried this:
np.real(df, dtype = float)
TypeError: real() got an unexpected keyword argument 'dtype'
Any way to do that?

Looks like astype(complex) works with NumPy arrays of strings, but not with a pandas Series of objects:
cmplx = (df['effective_index']
         .str.replace('i', 'j', regex=False)  # engineering 'i' -> Python 'j'
         .values                              # go NumPy
         .astype('str')                       # go string
         .astype(complex))                    # go complex (np.complex was removed in NumPy 1.24)
#array([ 2.56839357 +1.13908050e-06j, 2.56839835 +1.12997938e-06j,
#        2.56840156 +1.12387232e-06j])
df['effective_index'] = cmplx  # go pandas again
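Since the question asks for real numbers, here is a minimal sketch of going one step further and splitting the parsed values into real and imaginary parts. It assumes the file contains exactly the three lines shown in the question; the column names n_real and n_imag are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for 'PN_lateral_n_eff.txt', using the rows shown in the question.
raw = io.StringIO(
    "2.568393573877396+1.139080496494329e-006i\n"
    "2.568398351899841+1.129979376397734e-006i\n"
    "2.568401556986464+1.123872317134941e-006i\n"
)
df = pd.read_csv(raw, header=None, names=["effective_index"])

# Python's complex() expects 'j' as the imaginary unit, not 'i'.
cleaned = df["effective_index"].str.replace("i", "j", regex=False)
values = np.array([complex(s) for s in cleaned], dtype=complex)

# Hypothetical column names holding the two parts as plain floats.
df["n_real"] = values.real
df["n_imag"] = values.imag
```

Both new columns have dtype float64, so they can be used in ordinary numeric code.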

Related

How do I fix this type error ('value' must be an instance of str or bytes, not a float) on Python

I want to plot a graph of COVID-19 cases in India. There is no problem when I manually enter my data for the x and y axes, but the data is quite long, and when I read it from a .csv file instead I get this error: 'value' must be an instance of str or bytes, not a float. I have also tried wrapping it as int(corona_case), but that gives me a new error: cannot convert the series to <class 'int'>. I would also appreciate it if someone could suggest tutorials on plotting graphs involving datetime in Python, since this is my first time learning Python.
I am using Python 3.
P.S. I can't seem to find a way to share my csv file, so I am going to leave it in the snippet.
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
from matplotlib import dates as mpl_dates
import numpy as np
plt.style.use('seaborn')
data = pd.read_csv('india.csv')
corona_date = data['Date']
corona_case = data['Case']
plt.plot_date (corona_date, corona_case, linestyle='solid')
plt.gcf().autofmt_xdate()
plt.title('COVID-19 in India')
plt.xlabel('Date')
plt.ylabel('Cumulative Case')
plt.tight_layout()
plt.show()
Date,Case
2020-09-30,6225763
2020-10-01,6312584
2020-10-02,6394068
2020-10-03,6473544
2020-10-04,6549373
2020-10-05,6623815
2020-10-06,6685082
# convert all DataFrame columns to the int64 dtype
df = df.astype(int)
# convert column "a" to int64 dtype and "b" to complex type
df = df.astype({"a": int, "b": complex})
# convert Series to float16 type
s = s.astype(np.float16)
# convert Series to Python strings
s = s.astype(str)
# convert Series to categorical type - see docs for more details
s = s.astype('category')
You can convert the type of a column in a pandas DataFrame like this:
corona_case = data['Case'].astype(int) # or str for string
if that is what you are trying to do.
I couldn't give further information but this might work:
corona_date = (data['Date']).astype(str)
corona_case = (data['Case']).astype(str)
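Another approach that may be worth trying is to let pandas parse the dates while reading the file, so that the x-axis values are real datetimes rather than strings. This is a sketch that assumes the CSV looks like the snippet in the question; the coerce-and-drop step is a precaution in case the real file contains blank or malformed rows.

```python
import io

import pandas as pd

# Stand-in for 'india.csv', copied from the snippet in the question.
csv_text = """Date,Case
2020-09-30,6225763
2020-10-01,6312584
2020-10-02,6394068
2020-10-03,6473544
2020-10-04,6549373
2020-10-05,6623815
2020-10-06,6685082
"""

# parse_dates converts the Date column to datetime64 while reading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Turn anything non-numeric into NaN, then drop incomplete rows.
data["Case"] = pd.to_numeric(data["Case"], errors="coerce")
data = data.dropna(subset=["Date", "Case"])

corona_date = data["Date"]
corona_case = data["Case"]
```

corona_date and corona_case can then be passed to plt.plot(corona_date, corona_case) in place of the original columns.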

How to apply argrelextrema function in Python 3.7?

I am trying to apply the argrelextrema function to the dataframe df, but I am unable to apply it correctly. Below is my code:
import pandas as pd
import numpy as np
from scipy.signal import argrelextrema
np.random.seed(42)
def maxloc(data):
    loc_opt_ind = argrelextrema(df.values, np.greater)
    loc_max = np.zeros(len(data))
    loc_max[loc_opt_ind] = 1
    data['loc_max'] = loc_max
    return data
values = np.random.rand(23000)
df = pd.DataFrame({'value': values})
np.all(maxloc(df).loc_max)
It gives me an error at the line loc_max[loc_opt_ind] = 1:
IndexError: too many indices for array
A pandas DataFrame is two-dimensional. That is, df.values is two-dimensional, even when it has only one column. As a result, loc_opt_ind will contain both row and column indices (a tuple of two arrays; just print loc_opt_ind to see), which can't be used to index the one-dimensional loc_max. You probably want to use either df['value'].values (that is, <Series>.values) or np.squeeze(df.values) as input. Note that argrelextrema still returns a tuple in that case, just a one-element one, so you may need loc_opt_ind[0] (np.where has similar behaviour).
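A minimal sketch of the function with that change applied might look like the following. It assumes the column is named 'value' as in the question, and it returns a copy instead of modifying the input, which is a design choice made here rather than something the question requires.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema


def maxloc(data):
    # Pass a 1-D array so argrelextrema returns a one-element tuple.
    (loc_opt_ind,) = argrelextrema(data["value"].to_numpy(), np.greater)
    loc_max = np.zeros(len(data))
    loc_max[loc_opt_ind] = 1  # flag the local maxima
    return data.assign(loc_max=loc_max)


# Small example where the local maxima are easy to see by eye.
small = pd.DataFrame({"value": [1.0, 3.0, 2.0, 5.0, 4.0]})
small_result = maxloc(small)

# The data from the question.
np.random.seed(42)
df = pd.DataFrame({"value": np.random.rand(23000)})
result = maxloc(df)
```

In the small example the flagged rows are the ones holding 3.0 and 5.0. The first and last rows are never flagged, because np.greater is a strict comparison.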

Convert selected elements in data frame from float to integer unsuccessful

I'm trying to convert a set of elements in a dataframe called "GDP" from floats to integers. The cells that I want to convert are specified by GDP.iloc[4,-10:]. I have tried the following methods:
for x in GDP.iloc[4,-10:]:
    pd.to_numeric(x, downcast='signed')
GDP.iloc[4,-10:]=GDP.iloc[4,-10:].astype(int)
GDP.iloc[4,-10:]=int(GDP.iloc[4,-10:])
However, none of them seems to convert the floats to integers. No errors appear for methods 1 and 2, but for method 3 the following error appears:
TypeError: cannot convert the series to <class 'int'>
The data can be found here: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
GDP = pd.read_csv('world_bank.csv',header=None)
Method 1:
for x in GDP.iloc[4,-10:]:
    pd.to_numeric(x, downcast='signed')
Method 2:
GDP.iloc[4,-10:]=GDP.iloc[4,-10:].astype(int)
Method 3:
GDP.iloc[4,-10:]=int(GDP.iloc[4,-10:])
Can someone help me out? Much appreciated.
You can use astype(np.int64) to convert a column to integers, once the rows with missing values have been filtered out:
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
# df.head()
df = df.fillna('custom_none_values')
# df.head()
df = df[df['1960'] != 'custom_none_values']
df['1960'] = df['1960'].astype(np.int64)
df.head()
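As for why the row-wise attempts appear to do nothing: a dtype belongs to a whole column, so integers written back into a single row of a float column are stored as floats again. The sketch below illustrates this on a tiny made-up table (the column labels and values are hypothetical, not taken from the World Bank file) and converts whole columns instead.

```python
import pandas as pd

# Hypothetical stand-in for a few year columns of the GDP table.
gdp = pd.DataFrame(
    {"2017": [1.0e3, 2.0e3], "2018": [1.5e3, 2.5e3]},
    index=["A", "B"],
)

# Writing integers back into one row leaves every column as float64.
row_attempt = gdp.copy()
row_attempt.iloc[0, :] = row_attempt.iloc[0, :].astype(int)

# Converting whole columns is what actually changes the dtype.
year_cols = ["2017", "2018"]
gdp_int = gdp.astype({col: "int64" for col in year_cols})
```

Note that astype("int64") raises an error if a column contains NaN, so missing values need to be dropped or filled first.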

How to save a dask series to hdf5

Here is what I tried first
df = dd.from_pandas(pd.DataFrame(dict(x=np.random.normal(size=100),
y = np.random.normal(size=100))), chunksize=40)
cat = df.map_partitions( lambda d: np.digitize(d['x']+d['y'], [.3,.9]), meta=pd.Series([], dtype=int, name='x'))
cat.to_hdf('/tmp/cat.h5', '/cat')
This fails with cannot properly create the storer...
I next tried to save cat.values instead:
da.to_hdf5('/tmp/cat.h5', '/cat', cat.values)
This fails with cannot convert float NaN to integer, which I am guessing is due to cat.values not having known (non-NaN) shape and chunk size values.
How do I get both of these to work? Note the actual data would not fit in memory.
This works fine:
import numpy as np
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame(dict(x=np.random.normal(size=100),
                       y=np.random.normal(size=100)))
ddf = dd.from_pandas(df, chunksize=40)
cat = ddf.map_partitions(lambda d: pd.Series(np.digitize(d['x'] + d['y'], [.3,.9])),
                         meta=('x', int))
cat.to_hdf('cat.h5', '/cat')
You were missing the pd.Series wrapper around the call to np.digitize, which meant the output of map_partitions was a numpy array instead of a pandas series (an error). In the future when debugging it may be useful to try computing a bit of data from steps along the way to see where the error is (for example, I found this issue by running .head() on cat).

Ta-lib and pandas series not working with python

I'm new to coding, so please be patient with me ;)
This is my code that doesn't work. I have been trying for several hours to fix it, but every time I get another error.
import pandas as pd
import numpy as np
import talib.abstract as ta
dfBTCUSD_1h = pd.read_csv('Bitfinex_BTCUSD_1h.csv', skiprows=1)
dfBTCUSD_1h = dfBTCUSD_1h.sort_values(by='Date') # sort_values returns a new frame, so assign it back
open = dfBTCUSD_1h.iloc[:,2]
high = dfBTCUSD_1h.iloc[:,3]
low = dfBTCUSD_1h.iloc[:,4]
close =dfBTCUSD_1h.iloc[:,5]
short_ema = ta.EMA(close,timeperiod=10)
TypeError: cannot convert the series to <class 'int'>
How can I convert the pandas Series into something that works with TA-Lib?
best regards
Instead of supplying close directly to ta.EMA, do the following:
short_ema = ta.EMA(close.astype(float),timeperiod=10)
TA-Lib functions usually take NumPy arrays of dtype float as inputs.
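To check what would actually be handed to TA-Lib, it can help to inspect the dtype before and after the conversion. The sketch below uses only pandas and a made-up three-value price column, so the values and the name close_float are illustrative assumptions. It does not call TA-Lib itself.

```python
import pandas as pd

# Hypothetical close prices that were read in as strings (object dtype).
close = pd.Series(["9500.1", "9510.5", "9498.0"], name="close")

# Coerce to numbers, then make the dtype explicitly float64.
close_float = pd.to_numeric(close, errors="coerce").astype("float64")

# A plain float64 NumPy array, in case a Series is not accepted.
close_values = close_float.to_numpy()
```

If close_float.isna().any() is True after this step, some rows were not numeric and probably need cleaning before any indicator is computed.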
