"NameError: name 'update' is not defined" Error when using datatable

"NameError: name 'update' is not defined" Error when using datatable - python

I am trying datatable in python for first time and was following examples from this link: Grouping with by() to explore more on datatables, but am getting a NameError when attempted below code.
import numpy as np
import pandas as pd
import datatable as dt
df = dt.Frame([[1, 1, 5], [2, 3, 6]], names=['A', 'B'])
df[:, update(filter_col = count()), by('A')]
Error:
--------------------------------------------------------------------------- NameError Traceback (most recent call
last) ~\AppData\Local\Temp/ipykernel_2040/2701559568.py in <module>
----> 1 df[:, update(filter_col = count()), by('A')]
NameError: name 'update' is not defined
This is working fine in the example shown in above link but I am not sure why I am getting this error. Also tried help on this:
help(update())
But got this error:
--------------------------------------------------------------------------- NameError Traceback (most recent call
last) ~\AppData\Local\Temp/ipykernel_2040/1402169417.py in <module>
----> 1 help(update())
NameError: name 'update' is not defined

You're not using the right name to access update(). The very first example has:
from datatable import (dt, f, by, ifelse, update, sort,
count, min, max, mean, sum, rowsum)
Meaning that they can refer to datatable.update as just update.
However your import is like:
import datatable as dt
Meaning that to access datatable.update, you have to use dt.update. Same with datatable.count and datatable.by:
So the solution would look like:
df[:, dt.update(filter_col = dt.count()), dt.by('A')]

The function update is not imported directly, but through datatable (dt). You can access it withdt.update.

Related

Python string column iteration

I am working on openAI, and stuck I have tried to sort this issue on my own but didn't get any resolution. I want my code to run the sentence generation operation on every row of the Input_Description_OAI column and give me the output in another column (OpenAI_Description). Can someone please help me with the completion of this task. I am new to python.
The dataset looks like:
import os
import openai
import wandb
import pandas as pd
openai.api_key = "MY-API-Key"
data=pd.read_excel("/content/OpenAI description.xlsx")
data
data["OpenAI_Description"] = data.apply(lambda _: ' ', axis=1)
data
gpt_prompt = ("Write product description for: Brand: COILCRAFT ; MPN: DO5010H-103MLD..")
response = openai.Completion.create(engine="text-curie-001", prompt=gpt_prompt,
temperature=0.7, max_tokens=1000, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0)
print(response['choices'][0]['text'])
data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
I got the error after execution on first row as:
---------------------------------------------------------------------------
TypeError
Traceback (most recent call last)
<ipython-input-32-c798fbf9bc16> in <module>
15 print(response['choices'][0]['text'])
16 #data.add_data(gpt_prompt,response['choices'][0]['text'])
---> 17 data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
18
TypeError: apply() got multiple values for argument 'axis'

using library inside python function

Problem: Importing an python file (EDA.py) into a jupyter notebook.The python file uses pandas and has an "Import pandas as pd" in it. But in Jupyter I get the error that pd is not defined.
Python file:
<EDA.py>
def eda_df(df):
import pandas as pd
print('=================Unique Values============================')
unique_series = df.apply(pd.Series.nunique).sort_values()
print(unique_series)
Jupyter Notebook:
import EDA
train = pd.read_csv(r'.\kaggle\housing\house-prices-advanced-regression-techniques\train.csv')
eda_df(train)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-269-86ee9695b171> in <module>
----> 1 eda_df(train)
~\iCloudDrive\Adnan PC\Data Science\Jupyter NB\EDA.py in eda_df(df)
13 print('Features missing more than 40% data: ',len(missing_data_list))
14 print(missing_data_list)
---> 15 print('=================Unique Values============================')
16 unique_series = df.apply(pd.Series.nunique).sort_values()
17 unique_list = unique_series[unique_series<15].index.to_list()
NameError: name 'pd' is not defined

You just need to import pandas as pd:
import pandas as pd
def eda_df(df):
unique_series = df.apply(pd.Series.nunique).sort_values()
return (unique_series)

'module' object is not callable error in jupyter notebook

Initially, I was getting "list object is not callable" error but after "importing list " new error came in the picture as shown below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
imort list
data_cols=['user id','movie id','rating','timestamp']
item_cols=['movie id','movie title','release date','video release date','IMDb URL','unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance ','Sci-Fi','Thriller','War' ,'Western']
user_cols = ['user id','age','gender','occupation','zip code']
#importing the data files onto dataframes
users=pd.read_csv('u.user',sep='|',names=user_cols,encoding='latin-1')
item=pd.read_csv('u.item',sep='|',names=item_cols,encoding='latin-1')
data=pd.read_csv('u.data',sep='\t',names=data_cols,encoding='latin-1')
dataset=pd.merge(pd.merge(item,data),users)
#print(dataset.head())
rating_total=dataset.groupby('movie title').size()
rating_mean=(dataset.groupby('movie title'))['movie title','rating']
rating_mean=rating_mean.mean()
rating_total=pd.DataFrame({'movie title':rating_total.index,'total
ratings':rating_total.values})
rating_mean['movie title']=rating_mean.index
final=pd.merge(rating_mean,rating_total).sort_values(by='total
ratings',ascending=False)
pop=final[:300].sort_values(by='rating',ascending=False)
pop=pop['movie title']
pop1=list(pop.head(10))
Output
TypeError Traceback (most recent call last)
<ipython-input-57-0b36af3a9876> in <module>
30 pop=pop['movie title']
31 #print(pop.head())
---> 32 pop1=list(pop.head(10))
TypeError: 'module' object is not callable

Why do I get a TypeError when I try to create a data frame?

I am writing code to analyze some data and want to create a data frame. How do I set it up successfully to run?
this is for analysis of data and I would like to create a data frame that can categorize data in different grades such as A
Here is the code I wrote:
import analyze_lc_Feb2update
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
df.shape
df_new = df[df.grade=='A']
df_new.shape
df.columns
df.int_rate.head(5)
df.int_rate.tail(5)
df.int_rate.dtype
df.term.dtype
df_new = df[df.grade =='A']
df_new.shape
output:
TypeError Traceback (most recent call last)
<ipython-input-3-7079435f776f> in <module>()
2 from imp import reload
3 analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
4 df = analyze_lc_Feb2update.create_df()
5 df.shape
6 df_new = df[df.grade=='A']
TypeError: create_df() missing 1 required positional
argument: 'grade'

Based on what was provided I guess your problem is here:
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
This looks like some custom library you are trying to use, of which the .create_df() method requires a positional argument "grade" which would require you to do something like:
df = analyze_lc_Feb2update.create_df(grade="blah")

I ran dropna in dataframe but got an error message

I ran this statement dr=df.dropna(how='all') to remove missing values and got the error message shown below:
AttributeError Traceback (most recent call last)
<ipython-input-29-07367ab952bc> in <module>
----> 1 dr=df.dropna(how='all')
AttributeError: 'list' object has no attribute 'dropna'

According to pdf https://www.google.com/url?sa=t&source=web&rct=j&url=https://readthedocs.org/projects/tabula-py/downloads/pdf/latest/&ved=2ahUKEwiKr-mQ9qTnAhUKwqYKHcAtAcoQFjADegQIBRAB&usg=AOvVaw32D890VNjAq5wOkTo4icOi&cshid=1580168098808
df = tabula.read_pdf(file, lattice=True, pages='all', area=(1, 1, 1000, 100), relative_area=True)
pages='all' => probably return a list of Dataframe
So you have to check:
for sub_df in df:
dr=sub_df.dropna(how='all')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

"NameError: name 'update' is not defined" Error when using datatable - python

The function update is not imported directly, but through datatable (dt). You can access it withdt.update.

Related

Python string column iteration

using library inside python function

'module' object is not callable error in jupyter notebook

Why do I get a TypeError when I try to create a data frame?

I ran dropna in dataframe but got an error message

Categories

Resources