Python Float to Int Conversion Error

I'm pretty new to Python. My goal is to convert a float to an int. The floats are in a pandas Series, and there are no NaNs. I've checked out quite a few posts, including: Pandas: change data type of Series to String.
I've tried a few different types of syntax:
```comp.month.apply(int)```
Here's the error that followed that.
```
TypeError                                 Traceback (most recent call last)
<ipython-input-190-690a8228abec> in <module>()
----> 1 comp.month.apply(int)

/Users/halliebregman/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
   2058             values = lib.map_infer(values, lib.Timestamp)
----> 2060         mapped = lib.map_infer(values, f, convert=convert_dtype)
   2061         if len(mapped) and isinstance(mapped[0], Series):
   2062             from pandas.core.frame import DataFrame

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:58435)()

TypeError: 'file' object is not callable
```
and also:
```
with open("ints.csv", "w") as ints:
    for i in range(len(comp)):
        months = int(comp['month'][i])
        days = int(comp['day'][i])
        print months, days
        ints.write('{} {} \n'.format(months, days))
```
Followed by this error:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-191-0d6fe0a99830> in <module>()
      1 with open ("ints.csv", "w") as ints:
      2     for i in range(len(comp)):
----> 3         months = int(comp['month'][i])
      4         days = int(comp['day'][i])
      5         print months, days

TypeError: 'file' object is not callable
```
What am I missing here? It seems like this should be simple :/
Thanks!
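A likely cause, though the rest of the notebook isn't shown: the name int was rebound to a file object earlier in the session (e.g. int = open(...)), so calling int(...) raises TypeError: 'file' object is not callable. A minimal sketch of the problem and the fix, assuming that shadowing happened:
```
# hypothetical repro (Python 2): rebinding the builtin int to a file object
int = open("ints.csv", "w")
int(3.5)   # TypeError: 'file' object is not callable

del int    # remove the shadowing name to restore the builtin
int(3.5)   # 3 -- works again
```
With the builtin restored, comp.month.astype(int) is the usual vectorized way to convert the whole Series.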

Related

TypeError while formatting pandas.df.pct_change() output to percentage

I'm trying to calculate the daily returns of a stock, in percentage format, from a CSV file by defining a function.
Here's my code:
```
def daily_ret(ticker):
    return f"{df[ticker].pct_change()*100:.2f}%"
```
When I call the function, I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-7122588f1289> in <module>()
----> 1 daily_ret('AAPL')
<ipython-input-39-7dd6285eb14d> in daily_ret(ticker)
1 def daily_ret(ticker):
----> 2 return f"{df[ticker].pct_change()*100:.2f}%"
TypeError: unsupported format string passed to Series.__format__
Where am I going wrong?
An f-string format spec like :.2f can't be applied to a whole Series (Series.__format__ doesn't support it). Format each element instead, with map or apply:
```
def daily_ret(ticker):
    return (df[ticker].pct_change() * 100).map("{:.2f}%".format)
```
or:
```
def daily_ret(ticker):
    return (df[ticker].pct_change() * 100).apply("{:.2f}%".format)
```
```
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(1, 6)})
print(daily_ret('A'))
```
Output:
```
0       nan%
1    100.00%
2     50.00%
3     33.33%
4     25.00%
Name: A, dtype: object
```
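The first element prints as nan% because pct_change has no previous row to compare against. If that's unwanted, dropping the NaN before formatting is a small tweak (not part of the original answer):
```
def daily_ret(ticker):
    # drop the leading NaN that pct_change produces for the first row
    return (df[ticker].pct_change().dropna() * 100).map("{:.2f}%".format)
```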

Dask Dataframe: Resample partitioned data loaded from multiple parquet files

I am loading multiple parquet files containing timeseries data together, but the loaded dask dataframe has unknown partitions, so I can't apply various timeseries operations to it.
```
df = dd.read_parquet('/path/to/*.parquet', index='Timestamps')
```
For instance, df_resampled = df.resample('1T').mean().compute() gives the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-8e6f7f4340fd> in <module>
1 df = dd.read_parquet('/path/to/*.parquet', index='Timestamps')
----> 2 df_resampled = df.resample('1T').mean().compute()
~/.conda/envs/suf/lib/python3.7/site-packages/dask/dataframe/core.py in resample(self, rule, closed, label)
2627 from .tseries.resample import Resampler
2628
-> 2629 return Resampler(self, rule, closed=closed, label=label)
2630
2631 @derived_from(pd.DataFrame)
~/.conda/envs/suf/lib/python3.7/site-packages/dask/dataframe/tseries/resample.py in __init__(self, obj, rule, **kwargs)
118 "for more information."
119 )
--> 120 raise ValueError(msg)
121 self.obj = obj
122 self._rule = pd.tseries.frequencies.to_offset(rule)
ValueError: Can only resample dataframes with known divisions
See https://docs.dask.org/en/latest/dataframe-design.html#partitions
for more information.
I went to the link: https://docs.dask.org/en/latest/dataframe-design.html#partitions and it says,
In these cases (when divisions are unknown), any operation that requires a cleanly partitioned DataFrame with known divisions will have to perform a sort. This can generally be achieved by calling df.set_index(...).
I then tried the following, but with no success.
```
df = dd.read_parquet('/path/to/*.parquet')
df = df.set_index('Timestamps')
```
This step throws the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-468e9af0c4d6> in <module>
1 df = dd.read_parquet(os.path.join(OUTPUT_DATA_DIR, '20*.gzip'))
----> 2 df.set_index('Timestamps')
3 # df_resampled = df.resample('1T').mean().compute()
~/.conda/envs/suf/lib/python3.7/site-packages/dask/dataframe/core.py in set_index(***failed resolving arguments***)
3915 npartitions=npartitions,
3916 divisions=divisions,
-> 3917 **kwargs,
3918 )
3919
~/.conda/envs/suf/lib/python3.7/site-packages/dask/dataframe/shuffle.py in set_index(df, index, npartitions, shuffle, compute, drop, upsample, divisions, partition_size, **kwargs)
483 if divisions is None:
484 sizes = df.map_partitions(sizeof) if repartition else []
--> 485 divisions = index2._repartition_quantiles(npartitions, upsample=upsample)
486 mins = index2.map_partitions(M.min)
487 maxes = index2.map_partitions(M.max)
~/.conda/envs/suf/lib/python3.7/site-packages/dask/dataframe/core.py in __getattr__(self, key)
3755 return self[key]
3756 else:
-> 3757 raise AttributeError("'DataFrame' object has no attribute %r" % key)
3758
3759 def __dir__(self):
AttributeError: 'DataFrame' object has no attribute '_repartition_quantiles'
Can anybody suggest the right way to load multiple timeseries files as a dask dataframe so that pandas' timeseries operations can be applied?
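One approach that may help, sketched under the assumption that the data is already globally sorted by timestamp across the files: tell dask the index is sorted so it can compute divisions without a full shuffle. Both set_index(..., sorted=True) and read_parquet(..., infer_divisions=True) exist in dask of this vintage; which one applies depends on your version and data layout.
```
import dask.dataframe as dd

# assumption: 'Timestamps' is monotonically increasing across all files
df = dd.read_parquet('/path/to/*.parquet')
df = df.set_index('Timestamps', sorted=True)  # known divisions, no shuffle

df_resampled = df.resample('1T').mean().compute()
```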

TypeError in read_parquet Dask

I have a parquet file called data.parquet. I'm using the dask library in Python. When I run
```
import dask.dataframe as dd
df = dd.read_parquet('data.parquet', engine='pyarrow')
```
I get the error
TypeError Traceback (most recent call last)
<ipython-input-22-807fa43763c1> in <module>
----> 1 df = dd.read_parquet('data.parquet',engine='pyarrow')
~/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/parquet.py in read_parquet(path, columns, filters, categories, index, storage_options, engine, infer_divisions)
1395 categories=categories,
1396 index=index,
-> 1397 infer_divisions=infer_divisions,
1398 )
1399
~/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/parquet.py in _read_pyarrow(fs, fs_token, paths, columns, filters, categories, index, infer_divisions)
858 _open = lambda fn: pq.ParquetFile(fs.open(fn, mode="rb"))
859 for piece in dataset.pieces:
--> 860 pf = piece.get_metadata(_open)
861 # non_empty_pieces.append(piece)
862 if pf.num_row_groups > 0:
TypeError: get_metadata() takes 1 positional argument but 2 were given
I just don't understand why this happens, since this is how it is implemented here.
Any help will be appreciated!
I faced the same problem and resolved it by upgrading dask to version 2.30.0 (e.g. pip install --upgrade dask==2.30.0).

TypeError: 'float' object has no attribute '__getitem__' in function

I am trying to pass a dataframe to a function and compute the mean and standard deviation from different columns of the dataframe. When I execute each line step by step (outside of a function) it works fine. However, when I wrap the computation in a function, I keep getting this error:
TypeError: 'float' object has no attribute '__getitem__'
This is my code:
```
def computeBias(data):
    meandata = np.array(data['mean'])
    sddata = np.array(data.sd)
    ni = np.array(data.numSamples)
    mean = np.average(meandata, weights=ni)
    pooled_sd = np.sqrt((np.sum(np.multiply((ni - 1), np.array(sddata)**2))) / (np.sum(ni) - 1))
    return mean, pooled_sd

mean, sd = df.apply(computeBias)
```
This is sample data:
```
id  type  mean   sd    numSamples
1   33    -0.43  0.40  101
2   23    -0.76  0.1   100
3   33    0.89   0.56  101
4   45    1.4    0.9   100
```
This is the full error traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-134-f4dc392140dd> in <module>()
----> 1 mean,sd = df.apply(computeBias)
C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\series.pyc in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66440)()
<ipython-input-133-2af38e3e29f0> in computeBias(data)
1 def computeBias(data):
2
----> 3 meandata = np.array(data['mean'])
4 sddata = np.array(data.sd)
5 ni = np.array(data.numSamples)
TypeError: 'float' object has no attribute '__getitem__'
Does anyone know of any workaround? TIA!
meandata = np.array(data['mean'])
TypeError: 'float' object has no attribute '__getitem__'
__getitem__ is the method that Python tries to call when you use indexing. In the marked line that means data['mean'] is producing the error. Evidently data is a number, a float object. You can't index a number.
data['mean'] looks like you are either trying to get an item from a dictionary, or from a dataframe, using a named index. I won't dig into the rest of your code to determine what you intend.
What you need to do is understand what data really is, and what produces it.
You are using this inside df.apply(...), and apparently expect that it simply means
computeBias(df) # or
computeBias(df.data)
Rather, I suspect apply is iterating over the dataframe in some dimension and passing individual values or Series to your function. It isn't passing the whole dataframe (see the sketch below).
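Since computeBias indexes columns like data['mean'], it expects the entire DataFrame. A minimal sketch of the direct call, with no apply involved:
```
# computeBias needs the full DataFrame, so call it directly;
# df.apply would hand the function one column (or one value) at a time
mean, sd = computeBias(df)
print(mean, sd)
```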

python write to a stata .dta file from notebook

I'm trying to write a pandas dataframe to a Stata .dta file. Following the advice given in Save .dta files in python, I wrote:
```
import pandas as pd
df.to_stata(workdir + ' generosity.dta')
```
and I got the error message TypeError: object of type 'float' has no len(); I'm not sure what this means.
Most columns in df are objects, but three columns are float64.
I tried following another method (as described in this post: Convert .CSV files to .DTA files in Python) via rpy2, but when I tried to install it, I received the error message "Error: tried to guess R's home but no R command in the path", so I've given up on it (I have R on my computer but have not used it once).
Thank you very much.
edit: here is the result:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-140-7a8f8bc8d446> in <module>()
1 #write the dataframe as a Stata file
----> 2 df.to_stata(workdir+group+' generosity.dta')
C:\Users\chungk\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in to_stata(self, fname, convert_dates, write_index, encoding, byteorder, time_stamp, data_label)
1262 time_stamp=time_stamp, data_label=data_label,
1263 write_index=write_index)
-> 1264 writer.write_file()
1265
1266 @Appender(fmt.docstring_to_string, indents=1)
C:\Users\chungk\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\stata.pyc in write_file(self)
1245 self._write(_pad_bytes("", 5))
1246 if self._convert_dates is None:
-> 1247 self._write_data_nodates()
1248 else:
1249 self._write_data_dates()
C:\Users\chungk\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\stata.pyc in _write_data_nodates(self)
1327 if var is None or var == np.nan:
1328 var = _pad_bytes('', typ)
-> 1329 if len(var) < typ:
1330 var = _pad_bytes(var, typ)
1331 if compat.PY3:
TypeError: object of type 'float' has no len()
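The traceback shows the cause: var == np.nan is always False (NaN never compares equal to itself), so a float NaN sitting in an object (string) column slips through to len(var) and fails. A sketch of a workaround, assuming the object columns are meant to hold strings: fill the NaNs and cast before writing.
```
# assumption: the object columns should be plain strings;
# float NaNs hiding in them break the Stata writer's len() check
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('').astype(str)

df.to_stata(workdir + ' generosity.dta')
```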
