After using the groupby function I want to convert the result to a DataFrame object, but it shows an error.
My Code
dfgrp1 = df['Service 1'].groupby(['Service Type'])
dfgrp1 = dfgrp1.to_frame()
Output
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [18], in <cell line: 2>()
1 dfgrp1 = df['Service 1'].groupby(['Service Type'])
----> 2 dfgrp1 = dfgrp1.to_frame()
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\groupby\groupby.py:904, in GroupBy.__getattr__(self, attr)
901 if attr in self.obj:
902 return self[attr]
--> 904 raise AttributeError(
905 f"'{type(self).__name__}' object has no attribute '{attr}'"
906 )
AttributeError: 'DataFrameGroupBy' object has no attribute 'to_frame'
P.S. I have multiple sheets in the Excel workbook. I don't think that would be a problem, but I'm mentioning it in case it does have an effect.
Apply an aggregation to the grouped result first.
For instance, what does dfgrp1 produce when you print it? A GroupBy object reference, which you cannot turn into a frame.
However, the result of the groupby, once an aggregation is applied, will allow you to use to_frame().
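A minimal sketch, assuming df holds the workbook's sheets as a dict (so df['Service 1'] is the sheet named 'Service 1') and that a row count per service type is wanted; the names come from the question:

dfgrp1 = df['Service 1'].groupby(['Service Type']).size()  # aggregate: rows per group
dfgrp1 = dfgrp1.to_frame(name='count')                     # the aggregated Series supports to_frame()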
Related
I am struggling to separate the data. I tried looking at the groupby function pandas has, but it doesn't seem to work. I don't understand what I am doing wrong.
data = pd.read_csv("path/file")
y=data['JIN-DIF']
y1=data['JEX-DIF']
y2=data['JEL-DIF']
y3=data['D3E']
d={'Induction':y,'Exchange':y1,'Dispersion':y3,'Electrostatic':y2}
df=pd.DataFrame(d)
grouped_df2= df.groupby('Exchange')
grouped_df2.filter(lambda x: x.Exchange > 0)
When I run this code, I get a "TypeError: filter function returned a Series, but expected a scalar bool" error. I'm not sure how to upload the data, so I have just attached a picture of it.
It will work when I change line 9 to
grouped_df2.filter(lambda x: x.Exchange.mean() > 0)
Here is a picture of sample data
The error message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-0e26eb8f080b> in <module>
7 df=pd.DataFrame(d)
8 grouped_df2= df.groupby('Exchange')
----> 9 grouped_df2.filter(lambda x: x.Exchange > -0.1)
~/anaconda3/lib/python3.6/site-packages/pandas/core/groupby/generic.py in filter(self, func, dropna, *args, **kwargs)
1584 # non scalars aren't allowed
1585 raise TypeError(
-> 1586 f"filter function returned a {type(res).__name__}, "
1587 "but expected a scalar bool"
1588 )
TypeError: filter function returned a Series, but expected a scalar bool
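For context, the function passed to filter must return one scalar bool per group, which is why the .mean() version works: it keeps or drops whole groups based on their group mean. If the intent was instead to keep individual rows with a positive Exchange value, plain boolean indexing (no groupby needed) is a possible alternative, sketched here with the df from the question:

df_positive = df[df['Exchange'] > 0]  # row-wise mask instead of group-wise filter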
Hi guys, I am getting this error:
AttributeError: 'numpy.float64' object has no attribute 'index'
The traceback looks like this:
AttributeError Traceback (most recent call last)
<ipython-input-50-dfcbcabe20ea> in <module>()
2 for name, df in all_data.items():
3 top_10 = df.mean().dropna().sort_values().iloc[-10]
----> 4 top_10_columns[name] = top_10.index
While running the following code:
top_10_columns = {}
for name, df in all_data.items():
    top_10 = df.mean().dropna().sort_values().iloc[-10]
    top_10_columns[name] = top_10.index
You are accidentally not getting the "top 10" items when you do .iloc[-10], but just the tenth-from-last item. So top_10 is a single value of type numpy.float64. Giving iloc a range should fix it: use .iloc[0:10] or .iloc[-10:], depending on whether your sort is ascending or descending and whether you want the first ten items (.iloc[0:10]) or the last ten (.iloc[-10:]).
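A minimal sketch of the corrected loop, assuming the ten largest means are wanted (sort_values() sorts ascending by default, so the last ten entries are the largest):

top_10_columns = {}
for name, df in all_data.items():
    # .iloc[-10:] keeps a ten-element Series instead of a single float,
    # so .index yields the matching column labels
    top_10 = df.mean().dropna().sort_values().iloc[-10:]
    top_10_columns[name] = top_10.index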
Note that top_10_columns is already declared as a dict above the loop, so the assignment itself is fine; the error comes from top_10 being a single numpy.float64, which has no .index attribute, as described above.
length = df.count()
df = df.withColumn("log", log(col("power"),lit(length)))
The lines above throw the following error. Can you please help me take the log of a column using another value or another column as the base?
TypeError Traceback (most recent call last)
<ipython-input-102-c0894b6127d1> in <module>()
1 #df.show()
2
----> 3 df = df.withColumn("log", log(col("power"),lit(2)))
5 frames
/content/spark-2.4.5-bin-hadoop2.7/python/pyspark/sql/column.py in __iter__(self)
342
343 def __iter__(self):
--> 344 raise TypeError("Column is not iterable")
345
346 # string methods
TypeError: Column is not iterable
If you want to use functions that are not built in on Spark DataFrames, you can use a user-defined function (UDF); in your case it would look like this:
from pyspark.sql.functions import udf
from math import log

@udf("float")
def log_udf(s):
    # log base 2 of the column value, computed in plain Python
    return log(s, 2)

df.withColumn("log", log_udf("power")).show()
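As a side note, for a constant base Spark's built-in pyspark.sql.functions.log(base, col) avoids the UDF overhead, and a column-valued base can be handled with the change-of-base identity log_b(x) = ln(x)/ln(b); a sketch, where the base column name "base" is my own assumption:

from pyspark.sql import functions as F

df.withColumn("log", F.log(2.0, F.col("power"))).show()                     # constant base
df.withColumn("log", F.log(F.col("power")) / F.log(F.col("base"))).show()  # column as base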
Here is my code
import pandas as pd
finance=pd.read_csv("C:/Users/hp/Desktop/Financial Sample.csv")
finance.Profit.describe()
And the error
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 finance.Profit.describe()

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'Profit'
According to your submitted error, here is the correct syntax to describe the Profit column:
finance['Profit'].describe()
This syntax will work if the column name has been parsed as how it was saved (case-sensitive):
finance['Profit'].describe()
However, sometimes the column name can pick up extra characters during reading, so calling it by name might still result in an error. To avoid this, you can also select the column by position with .iloc (an indexer, not a method):
finance.iloc[:, n].describe()  # n = column position, starting from 0
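One way to spot such stray characters is to print the parsed names and strip whitespace from them; a short sketch using standard pandas on the finance frame from the question:

print(finance.columns.tolist())                # may reveal names like 'Profit ' with a trailing space
finance.columns = finance.columns.str.strip()  # normalize, so finance['Profit'] works again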
I ran this statement dr=df.dropna(how='all') to remove missing values and got the error message shown below:
AttributeError Traceback (most recent call last)
<ipython-input-29-07367ab952bc> in <module>
----> 1 dr=df.dropna(how='all')
AttributeError: 'list' object has no attribute 'dropna'
According to the tabula-py PDF documentation (https://readthedocs.org/projects/tabula-py/downloads/pdf/latest/):
df = tabula.read_pdf(file, lattice=True, pages='all', area=(1, 1, 1000, 100), relative_area=True)
pages='all' => probably returns a list of DataFrames
So you have to iterate over it:
for sub_df in df:
    dr = sub_df.dropna(how='all')  # drop rows where every value is missing
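If every cleaned table should be kept (the loop above overwrites dr on each pass), here is a small sketch of one way to collect them; cleaned and combined are my own names:

import pandas as pd

cleaned = [sub_df.dropna(how='all') for sub_df in df]  # one cleaned frame per page
combined = pd.concat(cleaned, ignore_index=True)       # optional: stack them, assuming matching columns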