Problem: importing a Python file (EDA.py) into a Jupyter notebook. The Python file uses pandas and has an "import pandas as pd" in it, but in Jupyter I get an error that pd is not defined.
Python file:
<EDA.py>
def eda_df(df):
    import pandas as pd
    print('=================Unique Values============================')
    unique_series = df.apply(pd.Series.nunique).sort_values()
    print(unique_series)
Jupyter Notebook:
import EDA
train = pd.read_csv(r'.\kaggle\housing\house-prices-advanced-regression-techniques\train.csv')
eda_df(train)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-269-86ee9695b171> in <module>
----> 1 eda_df(train)
~\iCloudDrive\Adnan PC\Data Science\Jupyter NB\EDA.py in eda_df(df)
13 print('Features missing more than 40% data: ',len(missing_data_list))
14 print(missing_data_list)
---> 15 print('=================Unique Values============================')
16 unique_series = df.apply(pd.Series.nunique).sort_values()
17 unique_list = unique_series[unique_series<15].index.to_list()
NameError: name 'pd' is not defined
You need to import pandas at module level in EDA.py (outside the function), so that pd is available inside eda_df:
import pandas as pd

def eda_df(df):
    unique_series = df.apply(pd.Series.nunique).sort_values()
    return unique_series
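A minimal sketch of the whole round trip, assuming EDA.py sits in a directory on the import path. To keep the block self-contained it first writes a throwaway EDA.py into a temporary directory; in a real notebook the file already exists next to the notebook and you would skip that step. It illustrates two points: each module has its own namespace, so EDA.py needs its own module-level `import pandas as pd` (and the notebook needs its own import for `pd.read_csv`), and after `import EDA` the function is called as `EDA.eda_df`. Jupyter also keeps the first imported version of a module, so edits to EDA.py are not seen until you call `importlib.reload(EDA)` or restart the kernel. The sample DataFrame is made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

import pandas as pd

# Self-containment only: write a throwaway EDA.py with pandas imported at module level.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "EDA.py").write_text(
    "import pandas as pd\n"
    "\n"
    "def eda_df(df):\n"
    "    print('=================Unique Values============================')\n"
    "    unique_series = df.apply(pd.Series.nunique).sort_values()\n"
    "    print(unique_series)\n"
    "    return unique_series\n"
)
sys.path.insert(0, str(module_dir))

import EDA

# Pick up any edits made to EDA.py since it was first imported in this kernel.
EDA = importlib.reload(EDA)

# Made-up sample data standing in for train.csv.
train = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
result = EDA.eda_df(train)  # call through the module name
```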
I am working with OpenAI and am stuck. I have tried to sort this issue out on my own but didn't find a resolution. I want my code to run the sentence-generation operation on every row of the Input_Description_OAI column and put the output in another column (OpenAI_Description). Can someone please help me complete this task? I am new to Python.
The dataset looks like:
import os
import openai
import wandb
import pandas as pd
openai.api_key = "MY-API-Key"
data=pd.read_excel("/content/OpenAI description.xlsx")
data
data["OpenAI_Description"] = data.apply(lambda _: ' ', axis=1)
data
gpt_prompt = ("Write product description for: Brand: COILCRAFT ; MPN: DO5010H-103MLD..")
response = openai.Completion.create(engine="text-curie-001", prompt=gpt_prompt,
temperature=0.7, max_tokens=1000, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0)
print(response['choices'][0]['text'])
data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
After running it on the first row, I got this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-c798fbf9bc16> in <module>
15 print(response['choices'][0]['text'])
16 #data.add_data(gpt_prompt,response['choices'][0]['text'])
---> 17 data['OpenAI_Description'] = data.apply(gpt_prompt,response['choices'][0]['text'], axis=1)
18
TypeError: apply() got multiple values for argument 'axis'
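The traceback explains the TypeError: `DataFrame.apply` takes the function first and `axis` as its second positional parameter, so `data.apply(gpt_prompt, response['choices'][0]['text'], axis=1)` supplies `axis` twice (and `gpt_prompt` is a string, not a function). Below is a minimal sketch of the per-row approach, assuming one prompt per value of `Input_Description_OAI`. `generate_description` is a hypothetical stand-in for the OpenAI request, so the block runs without an API key; replace its body with your own call.

```python
import pandas as pd

def generate_description(prompt: str) -> str:
    """Hypothetical stand-in for the model call.

    Replace the body with your OpenAI request (e.g. the call from the
    question) and return the generated text.
    """
    return f"<generated text for: {prompt}>"

def describe(row_text: str) -> str:
    # Build one prompt per row instead of a single hard-coded prompt.
    prompt = f"Write product description for: {row_text}"
    return generate_description(prompt)

# Sample frame; in the question this comes from pd.read_excel(...).
data = pd.DataFrame(
    {"Input_Description_OAI": ["Brand: COILCRAFT ; MPN: DO5010H-103MLD"]}
)

# Series.apply calls `describe` once for each cell of the input column.
data["OpenAI_Description"] = data["Input_Description_OAI"].apply(describe)
print(data)
```

Applying to the single column avoids `axis` altogether; each API call happens once per row, so large sheets will take a while and may hit rate limits.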
I am trying datatable in Python for the first time and was following the examples from this link, Grouping with by(), to explore datatable further, but I get a NameError when I run the code below:
import numpy as np
import pandas as pd
import datatable as dt
df = dt.Frame([[1, 1, 5], [2, 3, 6]], names=['A', 'B'])
df[:, update(filter_col = count()), by('A')]
Error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2040/2701559568.py in <module>
----> 1 df[:, update(filter_col = count()), by('A')]
NameError: name 'update' is not defined
This works fine in the example at the link above, so I am not sure why I am getting this error. I also tried getting help on it:
help(update())
But I got this error:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_2040/1402169417.py in <module>
----> 1 help(update())
NameError: name 'update' is not defined
You're not using the right name to access update(). The very first example has:
from datatable import (dt, f, by, ifelse, update, sort,
count, min, max, mean, sum, rowsum)
This means they can refer to datatable.update as just update.
However, your import is:
import datatable as dt
This means that to access datatable.update you have to write dt.update. The same goes for datatable.count and datatable.by.
So the solution would look like:
df[:, dt.update(filter_col = dt.count()), dt.by('A')]
The function update is not imported directly but is reached through the datatable module (dt), so you can access it with dt.update.
Initially I was getting a "'list' object is not callable" error, but after "importing list" a new error appeared, as shown below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import list
data_cols=['user id','movie id','rating','timestamp']
item_cols=['movie id','movie title','release date','video release date','IMDb URL','unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance ','Sci-Fi','Thriller','War' ,'Western']
user_cols = ['user id','age','gender','occupation','zip code']
#importing the data files onto dataframes
users=pd.read_csv('u.user',sep='|',names=user_cols,encoding='latin-1')
item=pd.read_csv('u.item',sep='|',names=item_cols,encoding='latin-1')
data=pd.read_csv('u.data',sep='\t',names=data_cols,encoding='latin-1')
dataset=pd.merge(pd.merge(item,data),users)
#print(dataset.head())
rating_total=dataset.groupby('movie title').size()
rating_mean=(dataset.groupby('movie title'))['movie title','rating']
rating_mean=rating_mean.mean()
rating_total=pd.DataFrame({'movie title':rating_total.index,'total ratings':rating_total.values})
rating_mean['movie title']=rating_mean.index
final=pd.merge(rating_mean,rating_total).sort_values(by='total ratings',ascending=False)
pop=final[:300].sort_values(by='rating',ascending=False)
pop=pop['movie title']
pop1=list(pop.head(10))
Output
TypeError Traceback (most recent call last)
<ipython-input-57-0b36af3a9876> in <module>
30 pop=pop['movie title']
31 #print(pop.head())
---> 32 pop1=list(pop.head(10))
TypeError: 'module' object is not callable
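Both errors come from the name `list` no longer meaning the built-in. Assigning to `list` earlier in the notebook (for example `list = ...`) makes `list(...)` fail with "'list' object is not callable", and `import list` rebinds the name to a module (presumably a file called list.py on the path), which gives "'module' object is not callable". A minimal sketch of the cause and the fix; the shadowing assignment is hypothetical:

```python
import builtins

# Hypothetical earlier cell that shadows the built-in `list`.
list = [1, 2, 3]
try:
    list("abc")
except TypeError as exc:
    message = str(exc)
print(message)  # 'list' object is not callable

# Fix: remove the shadowing name (and drop the `import list` line).
del list
print(list is builtins.list)  # True
letters = list("abc")
```

Restarting the kernel has the same effect as `del list`. Alternatively, `pop.head(10).tolist()` builds the list through the Series method and does not depend on the built-in name at all.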
I am new to Python. I have a DataFrame with a column named 'Name' that contains various accented characters, and I am trying to remove those accents. For example, rubén => ruben, zuñiga => zuniga, etc. I wrote the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import unicodedata
data=pd.read_csv('transactions.csv')
data.head()
nm=data['Name']
normal = unicodedata.normalize('NFKD', nm).encode('ASCII', 'ignore')
I am getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-41-1410866bc2c5> in <module>()
1 nm=data['Name']
----> 2 normal = unicodedata.normalize('NFKD', nm).encode('ASCII', 'ignore')
TypeError: normalize() argument 2 must be unicode, not Series
You get that error because normalize requires a single string as its second argument, not a Series of strings. I found an example of this online:
unicodedata.normalize('NFKD', "Durrës Åland Islands").encode('ascii', 'ignore').decode('ascii')
'Durres Aland Islands'
Try this for one column:
nm = nm.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
Try this for multiple columns:
obj_cols = data.select_dtypes(include=['O']).columns
data[obj_cols] = data[obj_cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
Try this for one column:
df[column_name] = df[column_name].apply(lambda x: unicodedata.normalize(u'NFKD', str(x)).encode('ascii', 'ignore').decode('utf-8'))
Replace column_name with the name of your own column.
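A small check of the vectorised `.str` approach on names taken from the examples above. NFKD splits a character such as "é" into "e" plus a combining accent; encoding to ASCII with `errors="ignore"` drops the accent, and decoding turns the bytes back into text. The helper name `strip_accents` is hypothetical, just for illustration.

```python
import pandas as pd

def strip_accents(s: pd.Series) -> pd.Series:
    """Hypothetical helper: remove accents from every string in a Series."""
    return (
        s.str.normalize("NFKD")                # é -> e + combining accent
        .str.encode("ascii", errors="ignore")  # drop the non-ASCII accent marks
        .str.decode("ascii")                   # bytes back to str
    )

data = pd.DataFrame({"Name": ["rubén", "zuñiga", "Durrës"]})
data["Name"] = strip_accents(data["Name"])
print(data["Name"].tolist())  # ['ruben', 'zuniga', 'Durres']
```

Characters with no NFKD decomposition (such as "ø" or "ß") are dropped entirely rather than transliterated, so check the output if your names contain them.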
From Wes:
def side_by_side(*objs, **kwds):
    from pandas.core.common import adjoin
    space = kwds.get('space', 4)
    reprs = [repr(obj).split('\n') for obj in objs]
    print adjoin(space, *reprs)
Applying it as below:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10,3))
df2 = pd.DataFrame(np.random.rand(10,3))
side_by_side(df1, df2)
This throws the error:
ImportError Traceback (most recent call last)
<ipython-input-25-2674cd8a228c> in <module>()
3
4
----> 5 side_by_side(df1, df2)
<ipython-input-24-9f441ebc9cb3> in side_by_side(*objs, **kwds)
1 def side_by_side(*objs, **kwds):
----> 2 from pandas.core.common import adjoin
3 space = kwds.get('space', 4)
4 reprs = [repr(obj).split('\n') for obj in objs]
5 print adjoin(space, *reprs)
ImportError: cannot import name adjoin
I guess this function has been moved to pandas.formats.printing:
In [69]: from pandas.formats.printing import adjoin
UPDATE: as already mentioned by @debo, for pandas 0.20.0+ use:
from pandas.io.formats.printing import adjoin
For pandas 0.20.*, the import changed to:
from pandas.io.formats.printing import adjoin
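A Python 3 version of the helper using the pandas 0.20+ import path. Note that `pandas.io.formats.printing` is an internal module rather than public API, so (as the earlier move from `pandas.core.common` shows) its location may change again. This sketch returns the string instead of printing it and makes `space` a keyword-only argument instead of reading it from `**kwds`; both are design choices, not the original helper.

```python
import numpy as np
import pandas as pd
from pandas.io.formats.printing import adjoin  # internal pandas helper

def side_by_side(*objs, space=4):
    """Return the reprs of several objects laid out in adjacent columns."""
    reprs = [repr(obj).split("\n") for obj in objs]
    return adjoin(space, *reprs)

df1 = pd.DataFrame(np.random.rand(10, 3))
df2 = pd.DataFrame(np.random.rand(10, 3))
print(side_by_side(df1, df2))
```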