I am trying datatable in Python for the first time and was following the examples from this link: Grouping with by(), to explore datatable further, but I get a NameError when I run the code below.
import numpy as np
import pandas as pd
import datatable as dt
df = dt.Frame([[1, 1, 5], [2, 3, 6]], names=['A', 'B'])
df[:, update(filter_col = count()), by('A')]
Error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2040/2701559568.py in <module>
----> 1 df[:, update(filter_col = count()), by('A')]
NameError: name 'update' is not defined
This works fine in the example shown in the link above, so I am not sure why I am getting this error. I also tried calling help on it:
help(update())
But got this error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2040/1402169417.py in <module>
----> 1 help(update())
NameError: name 'update' is not defined
You're not using the right name to access update(). The very first example has:
from datatable import (dt, f, by, ifelse, update, sort,
count, min, max, mean, sum, rowsum)
Meaning that they can refer to datatable.update as just update.
However, your import is:
import datatable as dt
Meaning that to access datatable.update, you have to use dt.update. The same applies to datatable.count and datatable.by.
So the solution would look like:
df[:, dt.update(filter_col = dt.count()), dt.by('A')]
The function update is not imported directly, but through datatable (dt). You can access it with dt.update.
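The same distinction shows up with any module. Here is a minimal sketch that uses the standard-library math module as a stand-in for datatable (an assumption made only so the snippet runs without datatable installed); floor plays the role of update:

```python
import math as m          # analogous to `import datatable as dt`
from math import sqrt     # analogous to `from datatable import update`

# Names reached through the alias work.
via_alias = m.floor(2.7)

# Names imported explicitly can be used bare.
direct = sqrt(16)

# A name that was never imported on its own raises NameError,
# just like the bare `update` in the question.
try:
    floor(2.7)
except NameError as exc:
    error_message = str(exc)

print(via_alias, direct, error_message)
```

Either import style is fine; what matters is that the way the name is written in the code matches the way it was imported.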
I am working with OpenAI and am stuck. I have tried to sort this issue out on my own but couldn't find a solution. I want my code to run the sentence generation on every row of the Input_Description_OAI column and write the output to another column (OpenAI_Description). Can someone please help me complete this task? I am new to Python.
The dataset looks like:
import os
import openai
import wandb
import pandas as pd
openai.api_key = "MY-API-Key"
data=pd.read_excel("/content/OpenAI description.xlsx")
data
data["OpenAI_Description"] = data.apply(lambda _: ' ', axis=1)
data
gpt_prompt = ("Write product description for: Brand: COILCRAFT ; MPN: DO5010H-103MLD..")
response = openai.Completion.create(engine="text-curie-001", prompt=gpt_prompt,
temperature=0.7, max_tokens=1000, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0)
print(response['choices'][0]['text'])
data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
After it ran on the first row, I got this error:
---------------------------------------------------------------------------
TypeError
Traceback (most recent call last)
<ipython-input-32-c798fbf9bc16> in <module>
15 print(response['choices'][0]['text'])
16 #data.add_data(gpt_prompt,response['choices'][0]['text'])
---> 17 data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
18
TypeError: apply() got multiple values for argument 'axis'
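The TypeError comes from how apply is called: DataFrame.apply takes the function as its first argument and axis as its second, so passing a second positional argument together with axis=1 supplies axis twice. Below is a minimal sketch of the row-by-row pattern. It assumes a hypothetical generate_description helper standing in for the openai.Completion.create call (it makes no API request), and the sample rows are made up for illustration:

```python
import pandas as pd


def generate_description(prompt: str) -> str:
    """Hypothetical stand-in for the API call; returns a placeholder string."""
    return f"Description for: {prompt}"


# Made-up rows; in the question these come from the Excel file.
data = pd.DataFrame(
    {"Input_Description_OAI": ["Brand: A ; MPN: 1", "Brand: B ; MPN: 2"]}
)

# Apply the function to each value of the input column and store the result.
data["OpenAI_Description"] = data["Input_Description_OAI"].apply(generate_description)

print(data)
```

In the real code, the body of generate_description would build the prompt from its argument, call the API, and return response['choices'][0]['text'].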
Problem: Importing a Python file (EDA.py) into a Jupyter notebook. The Python file uses pandas and has an "import pandas as pd" in it. But in Jupyter I get the error that pd is not defined.
Python file:
<EDA.py>
def eda_df(df):
import pandas as pd
print('=================Unique Values============================')
unique_series = df.apply(pd.Series.nunique).sort_values()
print(unique_series)
Jupyter Notebook:
import EDA
train = pd.read_csv(r'.\kaggle\housing\house-prices-advanced-regression-techniques\train.csv')
eda_df(train)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-269-86ee9695b171> in <module>
----> 1 eda_df(train)
~\iCloudDrive\Adnan PC\Data Science\Jupyter NB\EDA.py in eda_df(df)
13 print('Features missing more than 40% data: ',len(missing_data_list))
14 print(missing_data_list)
---> 15 print('=================Unique Values============================')
16 unique_series = df.apply(pd.Series.nunique).sort_values()
17 unique_list = unique_series[unique_series<15].index.to_list()
NameError: name 'pd' is not defined
You just need to import pandas as pd at the top of EDA.py, outside the function:
import pandas as pd

def eda_df(df):
    unique_series = df.apply(pd.Series.nunique).sort_values()
    return unique_series
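Each module has its own namespace, so an import inside one file does not make the name available in another. A small sketch of that behaviour, assuming an in-memory module built with types.ModuleType as a stand-in for EDA.py and the standard-library statistics module as a stand-in for pandas (both are illustrative substitutions):

```python
import types

# Source of a hypothetical module; `st` plays the role of `pd`.
MODULE_SOURCE = """
import statistics as st

def summarize(values):
    return st.mean(values)
"""

# Create the module object and run its source inside its own namespace.
eda = types.ModuleType("EDA_demo")
exec(MODULE_SOURCE, eda.__dict__)

# The function finds `st` because it was imported in the module's namespace.
result = eda.summarize([1, 2, 3])
module_has_alias = hasattr(eda, "st")

# The caller never imported `st`, so the name is unknown here.
try:
    st.mean([1, 2, 3])
except NameError:
    caller_has_alias = False
else:
    caller_has_alias = True

print(result, module_has_alias, caller_has_alias)
```

So EDA.py needs its own import for the code inside it, and the notebook needs its own import for calls such as pd.read_csv.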
Initially, I was getting a "list object is not callable" error, but after importing list, a new error appeared, as shown below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import list
data_cols=['user id','movie id','rating','timestamp']
item_cols=['movie id','movie title','release date','video release date','IMDb URL','unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance ','Sci-Fi','Thriller','War' ,'Western']
user_cols = ['user id','age','gender','occupation','zip code']
#importing the data files onto dataframes
users=pd.read_csv('u.user',sep='|',names=user_cols,encoding='latin-1')
item=pd.read_csv('u.item',sep='|',names=item_cols,encoding='latin-1')
data=pd.read_csv('u.data',sep='\t',names=data_cols,encoding='latin-1')
dataset=pd.merge(pd.merge(item,data),users)
#print(dataset.head())
rating_total=dataset.groupby('movie title').size()
rating_mean=(dataset.groupby('movie title'))['movie title','rating']
rating_mean=rating_mean.mean()
rating_total=pd.DataFrame({'movie title':rating_total.index,'total ratings':rating_total.values})
rating_mean['movie title']=rating_mean.index
final=pd.merge(rating_mean,rating_total).sort_values(by='total ratings',ascending=False)
pop=final[:300].sort_values(by='rating',ascending=False)
pop=pop['movie title']
pop1=list(pop.head(10))
Output
TypeError Traceback (most recent call last)
<ipython-input-57-0b36af3a9876> in <module>
30 pop=pop['movie title']
31 #print(pop.head())
---> 32 pop1=list(pop.head(10))
TypeError: 'module' object is not callable
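Both messages are consistent with the name list no longer pointing at the built-in. A likely cause, assuming an earlier notebook cell assigned something to list, is that the name first referred to a list object and then, after the import, to a module. The sketch below reproduces both errors; json is used as an arbitrary stand-in for whatever module the import picked up:

```python
import builtins

# Accidental shadowing: the name `list` now refers to a list object.
list = [1, 2, 3]
try:
    list("abc")
except TypeError as exc:
    first_error = str(exc)

# Importing a module under that name rebinds it to a module object instead.
import json as list
try:
    list("abc")
except TypeError as exc:
    second_error = str(exc)

# The real built-in is still reachable through the builtins module.
restored = builtins.list("abc")

print(first_error, second_error, restored)
```

In a notebook, removing the import and running del list (or restarting the kernel) makes the built-in visible again, provided no cell reassigns the name.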
I am writing code to analyze some data and want to create a data frame. How do I set it up so that it runs successfully?
This is for data analysis, and I would like to create a data frame that can categorize the data into different grades, such as A.
Here is the code I wrote:
import analyze_lc_Feb2update
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
df.shape
df_new = df[df.grade=='A']
df_new.shape
df.columns
df.int_rate.head(5)
df.int_rate.tail(5)
df.int_rate.dtype
df.term.dtype
df_new = df[df.grade =='A']
df_new.shape
output:
TypeError Traceback (most recent call last)
<ipython-input-3-7079435f776f> in <module>()
2 from imp import reload
3 analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
4 df = analyze_lc_Feb2update.create_df()
5 df.shape
6 df_new = df[df.grade=='A']
TypeError: create_df() missing 1 required positional argument: 'grade'
Based on what was provided, I guess your problem is here:
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
This looks like a custom module you are trying to use. Its create_df() function requires a positional argument "grade", so you would need to call it with something like:
df = analyze_lc_Feb2update.create_df(grade="blah")
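To illustrate, here is a minimal sketch with a hypothetical create_df that has the same kind of signature. The real function's body is not shown in the question, so the dictionary it returns here is only a placeholder:

```python
import inspect


def create_df(grade):
    """Hypothetical stand-in for analyze_lc_Feb2update.create_df."""
    return {"grade": grade}


# The signature shows which arguments the function expects.
signature = str(inspect.signature(create_df))

# Calling without the argument reproduces the TypeError from the question.
try:
    create_df()
except TypeError as exc:
    message = str(exc)

# Supplying the argument, positionally or by keyword, resolves it.
df = create_df(grade="A")

print(signature, message, df)
```

Checking the signature of the real create_df (or its source) will show which value it expects for grade.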
I ran this statement dr=df.dropna(how='all') to remove missing values and got the error message shown below:
AttributeError Traceback (most recent call last)
<ipython-input-29-07367ab952bc> in <module>
----> 1 dr=df.dropna(how='all')
AttributeError: 'list' object has no attribute 'dropna'
According to the tabula-py documentation (PDF): https://readthedocs.org/projects/tabula-py/downloads/pdf/latest/
df = tabula.read_pdf(file, lattice=True, pages='all', area=(1, 1, 1000, 100), relative_area=True)
With pages='all', read_pdf probably returns a list of DataFrames rather than a single DataFrame.
So you have to call dropna on each one:
for sub_df in df:
    dr=sub_df.dropna(how='all')
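Note that the loop overwrites dr on every pass, so only the last table's result survives. A minimal sketch that keeps every cleaned table, assuming a hand-built list of DataFrames in place of the read_pdf result (the sample values are made up):

```python
import pandas as pd

nan = float("nan")

# Stand-in for the list returned when reading several pages.
tables = [
    pd.DataFrame({"a": [1.0, nan], "b": [2.0, nan]}),
    pd.DataFrame({"a": [nan, 3.0], "b": [nan, nan]}),
]

# Drop rows where every value is missing, one table at a time.
cleaned = [table.dropna(how="all") for table in tables]

# Optionally stack the cleaned tables into a single DataFrame.
combined = pd.concat(cleaned, ignore_index=True)

print(combined)
```

Concatenating only makes sense when the tables share the same columns; otherwise keep working with the list.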