Can't convert bytes to string and eventually to a number - Python

I wish to use the Class variable to evaluate the accuracy of my k-means model, so I need it purely as an integer (1 or 2), for which I think I will need to convert it to a string first. But I'm getting an error when using the decode function:
from scipy.io import arff
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
data = arff.loadarff('bnknote.arff')
df = pd.DataFrame(data[0])
df.head()
V1 V2 V3 V4 Class
0 3.62160 8.6661 -2.8073 -0.44699 b'1'
1 4.54590 8.1674 -2.4586 -1.46210 b'1'
2 3.86600 -2.6383 1.9242 0.10645 b'1'
3 3.45660 9.5228 -4.0112 -3.59440 b'1'
4 0.32924 -4.4552 4.5718 -0.98880 b'1'
import codecs
Class=df['Class']
Class=codecs.decode(Class,'UTF-8')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/opt/conda/lib/python3.6/encodings/utf_8.py in decode(input, errors)
15 def decode(input, errors='strict'):
---> 16 return codecs.utf_8_decode(input, errors, True)
17
TypeError: a bytes-like object is required, not 'Series'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-20-face6f646db6> in <module>()
1 import codecs
2 Class=df['Class']
----> 3 Class=codecs.decode(Class,'UTF-8')
TypeError: decoding with 'UTF-8' codec failed (TypeError: a bytes-like object is required, not 'Series')

TypeError: decoding with 'UTF-8' codec failed (TypeError: a bytes-like object is required, not 'Series')
means you have tried to decode a whole column of a pandas.DataFrame at once. You do not need codecs here; it is sufficient to use .astype, since int has no problem with an ASCII-encoded representation of an integer. Consider the following simple example:
import pandas as pd
df = pd.DataFrame({'x':['A','B','C'],'y':[b'1',b'0',b'1']})
df['y'] = df['y'].astype(int)
print(df)
output
x y
0 A 1
1 B 0
2 C 1
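Applied to the question's DataFrame, a minimal sketch (assuming the Class column holds byte labels such as b'1' and b'2', as shown in df.head() above) would be:
# convert the byte-string labels straight to integers; no codecs needed
df['Class'] = df['Class'].astype(int)
print(df['Class'].unique())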

Related

byte indices must be integers or slices, not str

I'm trying to run the below code
import requests
import pandas as pd
companies = requests.get(f'https://fmpcloud.io/api/v3/stock-screener?industry=Software&sector=tech&marketCapLowerThan=10000000000&limit=100&apikey=c609af2465eb19e3c82f0c3c38cb51ea')
companies.json()
At this point it works fine, but when I get to the following part I receive an error:
technological_companies = []
for item in companies:
    technological_companies.append(item['symbol'])
print(technological_companies)
The error was:
Traceback (most recent call last)
<ipython-input-8-61eef8b7699a> in <module>
1 technological_companies = []
2 for item in companies:
----> 3 technological_companies.append(item['symbol'])
4 print(technological_companies)
TypeError: byte indices must be integers or slices, not str
You are not storing the JSON value; use:
import requests
import pandas as pd
companies = requests.get(f'https://fmpcloud.io/api/v3/stock-screener?industry=Software&sector=tech&marketCapLowerThan=10000000000&limit=100&apikey=c609af2465eb19e3c82f0c3c38cb51ea')
companies = companies.json() # this is the line
Either store the json:
companies = companies.json()
Or loop over the json:
for item in companies.json():
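Putting it together, a minimal sketch of the corrected loop (assuming the endpoint returns a JSON array of objects that each carry a 'symbol' key, as the question implies):
companies = companies.json()  # parse the HTTP response body into a list of dicts
technological_companies = [item['symbol'] for item in companies]
print(technological_companies)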

TypeError: expected string or bytes-like object in Pandas

I want to tokenize text, but I couldn't get it to work. How can I solve this?
Here is my problem:
#read_text from file
data = pd.read_csv("input data.txt",encoding = "UTF-8")
print(data)
Output: Bangla text
t = Tokenizers()
print(t.bn_word_tokenizer(data))
Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-f9f299ecf33d> in <module>
1 t = Tokenizers()
----> 2 print(t.bn_word_tokenizer(dataStr))
D:\anaconda\lib\site-packages\bnltk\tokenize\bn_word_tokenizers.py in bn_word_tokenizer(self, input_)
15 tokenize_list = []
16 r = re.compile(r'[\s\।{}]+'.format(re.escape(punctuation)))
---> 17 list_ = r.split(input_)
18 list_ = [i for i in list_ if i]
19 return list_
TypeError: expected string or bytes-like object
Try this:
for column in data:
    a = data.apply(lambda row: t.bn_word_tokenizer(row), axis=1)
    print(a)
This will print one column at a time. If you want to convert the entire dataframe rather than just print it, replace a with data[column] in the code above.
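As an alternative sketch, you can tokenize cell by cell so the tokenizer always receives a plain string (the column name 'text' below is hypothetical; adjust it to the actual column in input data.txt):
# apply the bnltk tokenizer to each cell of the (assumed) text column
data['tokens'] = data['text'].astype(str).apply(t.bn_word_tokenizer)
print(data['tokens'].head())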

'module' object is not callable error in jupyter notebook

Initially I was getting a "list object is not callable" error, but after importing list a new error came into the picture, as shown below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import list
data_cols=['user id','movie id','rating','timestamp']
item_cols=['movie id','movie title','release date','video release date','IMDb URL','unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance ','Sci-Fi','Thriller','War' ,'Western']
user_cols = ['user id','age','gender','occupation','zip code']
#importing the data files onto dataframes
users=pd.read_csv('u.user',sep='|',names=user_cols,encoding='latin-1')
item=pd.read_csv('u.item',sep='|',names=item_cols,encoding='latin-1')
data=pd.read_csv('u.data',sep='\t',names=data_cols,encoding='latin-1')
dataset=pd.merge(pd.merge(item,data),users)
#print(dataset.head())
rating_total=dataset.groupby('movie title').size()
rating_mean=(dataset.groupby('movie title'))['movie title','rating']
rating_mean=rating_mean.mean()
rating_total=pd.DataFrame({'movie title':rating_total.index,'total ratings':rating_total.values})
rating_mean['movie title']=rating_mean.index
final=pd.merge(rating_mean,rating_total).sort_values(by='total ratings',ascending=False)
pop=final[:300].sort_values(by='rating',ascending=False)
pop=pop['movie title']
pop1=list(pop.head(10))
Output
TypeError Traceback (most recent call last)
<ipython-input-57-0b36af3a9876> in <module>
30 pop=pop['movie title']
31 #print(pop.head())
---> 32 pop1=list(pop.head(10))
TypeError: 'module' object is not callable
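As a hedged sketch of a fix: the traceback suggests that import list has bound the name list to a module, shadowing the built-in, so list(pop.head(10)) ends up calling a module.
# remove the 'import list' line and restart the kernel (or del list if the name
# was rebound in the current session); the built-in list() then works again
pop1 = list(pop.head(10))
print(pop1)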

How to remove every possible accents from a column in python

I am new to Python. I have a data frame with a column named 'Name'. The column contains different types of accents, and I am trying to remove them. For example, rubén => ruben, zuñiga => zuniga, etc. I wrote the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import unicodedata
data=pd.read_csv('transactions.csv')
data.head()
nm=data['Name']
normal = unicodedata.normalize('NFKD', nm).encode('ASCII', 'ignore')
I am getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-41-1410866bc2c5> in <module>()
1 nm=data['Name']
----> 2 normal = unicodedata.normalize('NFKD', nm).encode('ASCII', 'ignore')
TypeError: normalize() argument 2 must be unicode, not Series
The reason it gives you that error is that normalize requires a string for its second parameter, not a whole Series of strings. I found an example of this online:
unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii','ignore')
'Durres Aland Islands'
Try this for one column:
nm = nm.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
Try this for multiple columns:
obj_cols = data.select_dtypes(include=['O']).columns
data[obj_cols] = data[obj_cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
Try this for one column:
df[column_name] = df[column_name].apply(lambda x: unicodedata.normalize(u'NFKD', str(x)).encode('ascii', 'ignore').decode('utf-8'))
Change the column name according to your data columns.
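As a quick usage sketch of the Series-level approach above, using the examples from the question:
names = pd.Series(['rubén', 'zuñiga'], name='Name')
# NFKD splits off the combining accents, which the ASCII encode step drops
clean = names.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
print(clean.tolist())  # ['ruben', 'zuniga']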

Getting error while trying to use a list with numpy to get some stat values

Hi, I am having problems with this code:
import numpy as np
# Summarize the data about minutes spent in the classroom
#total_minutes = total_minutes_by_account.values()
total_minutes = list(total_minutes_by_account.values())
type(total_minutes)
# Printing out the samething converting to a list
print('Printing out the samething converting to a list ')
print(type(total_minutes))
print ('Mean:', np.mean(total_minutes))
print ('Standard deviation:', np.std(total_minutes))
print ('Minimum:', np.min(total_minutes))
print ('Maximum:', np.max(total_minutes))
The error I get is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-93-945375bf6098> in <module>()
3 # Summarize the data about minutes spent in the classroom
4 #total_minutes = total_minutes_by_account.values()
----> 5 total_minutes = list(total_minutes_by_account.values())
6 type(total_minutes)
7 #print(total_minutes)
AttributeError: 'list' object has no attribute 'values'
I really would like to know how I can make this work. I can do it with pandas by converting it to a NumPy array and then getting the values for the statistics I want with NumPy.
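Reading the traceback, total_minutes_by_account already appears to be a list (only dicts have .values()), so a minimal sketch of a fix is to use it directly:
# total_minutes_by_account is already a list, so no .values() call is needed
total_minutes = total_minutes_by_account
print('Mean:', np.mean(total_minutes))
print('Standard deviation:', np.std(total_minutes))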
