I'm working with a numpy array called "C_ClfGtLabels" in which 374 artist/creator names are stored. I want to append a 375th artist class with a string "other artists". I thought I could just do that as follows:
C_ClfGtLabels.append('other artists')
However, this results in the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'append'
I found this problem a few times on Stack Overflow, and in one case the answer was to use concatenate instead of append. When I tried that I got the following error:
TypeError: don't know how to convert scalar number to int
It seems the array's datatype does not match the datatype of what I am trying to append/concatenate, which would be a string. However, I don't know what I should do to make them match. The data inside the C_ClfGtLabels array is as follows:
[u"admiral, jan l'" u'aldegrever, heinrich' u'allard, abraham'
u'allard, carel' u'almeloveen, jan van' u'altdorfer, albrecht'
u'andriessen, jurriaan' u'anthonisz., cornelis' u'asser, eduard isaac' ..]
Any advice on how I can set up the "other artists" string so that I can append it to C_ClfGtLabels?
A quick workaround is to convert your C_ClfGtLabels into a list first, append, and convert it back into an ndarray:
import numpy as np

lst = list(C_ClfGtLabels)
lst.append('other artists')
C_ClfGtLabels = np.asarray(lst)
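For reference, numpy can also append directly, returning a new array; this is a sketch, assuming C_ClfGtLabels is a one-dimensional array with a string dtype:
import numpy as np

# np.append returns a new array rather than modifying in place, so reassign:
C_ClfGtLabels = np.append(C_ClfGtLabels, 'other artists')

# Equivalently, with np.concatenate and an explicit one-element array:
# C_ClfGtLabels = np.concatenate([C_ClfGtLabels, np.array(['other artists'])])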
I'm currently running some correlation tests in Python. I get the results I need, but they come out as raw floats like 0.01234567890, and I'm trying to format the output as percentages.
The data is read from a CSV.
results = (
    {
        clean_df['NZDUSD_Close'].corr(clean_df['USDJPY_Close']),
        clean_df['NZDUSD_Close'].corr(clean_df['EURGBP_Close']),
        ....
        clean_df['AUDUSD_Close'].corr(clean_df['EURUSD_Close']),
        clean_df['GBPUSD_Close'].corr(clean_df['EURUSD_Close'])
    }
)
print(results)
The above works: there are 15 results, all returned as floats. I got string errors when I formatted this as a tuple, so I decided to make it a dictionary.
I've tried the following for formatting the output:
print(f"{results:.2%}")
This works for a single variable but not the list.
for key in results.keys():
    print(key, "{:.2f}".format(results[key]))
If the error you're encountering is:
"AttributeError: 'set' object has no attribute 'keys'"
Then it's because {...} with bare comma-separated values creates a set, not a dictionary, and a set has no "keys" attribute. You would be best off using a plain tuple:
results = (...)
and printing it like:
print(f"{results:.2%}")
This doesn't work, because Python doesn't know how to iterate over all the objects in results to format each one.
Instead, print the tuple in a loop:
for val in results:
    print("{:.2%}".format(val))
This will iterate over every value in results and print it in a two-decimal percentage format.
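If you still want the labels you were after with the dictionary idea, a sketch along these lines should work (clean_df and the column names come from the question; the pair labels are made up for illustration):
results = {
    'NZDUSD/USDJPY': clean_df['NZDUSD_Close'].corr(clean_df['USDJPY_Close']),
    'NZDUSD/EURGBP': clean_df['NZDUSD_Close'].corr(clean_df['EURGBP_Close']),
}
for pair, val in results.items():
    print(pair, f"{val:.2%}")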
I am using PulP to solve a linear programming problem. I am having the following error in the objective function:
TypeError: list indices must be integers or slices, not str
My objective function is
prob += lpSum([routes_vars][man][pa]*costs_distance[man][pa] for (man,pa) in routes)
According to the error message, I think my problem is the costs_distance dictionary, which has string values:
costs_distance = {'201': {'10267': '167724.1407', '10272': '151859.5908', '10275': '150131.7254', '10277': '153266.1819', '10279': '147949.5275', '10281': '145429.9767', '10283': '144757.2507', '10286': '166474.849', '10288': '152733.6419'}, '2595': {'10267': '186216.5193', '10272': '170351.9694', '10275': '168624.1039', '10277': '171758.5604', '10279': '166441.906', '10281': '163922.3553', '10283': '163249.6293', '10286': '186363.4807', '10288': '171226.0204'},
How can I convert only the dictionary string values ('167724.1407', '151859.5908', '150131.7254'... ) into int values?
Your issue has nothing to do with the costs_distance dictionary (otherwise the error message wouldn't mention a list). It's this part:
[routes_vars][man][pa]
I'm not sure what you expect this to return, but it first constructs a list with a single element (routes_vars) and then tries to index that list with [man]; since man is a string, Python complains that list indices must be integers or slices.
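You presumably meant to index routes_vars directly. Here's a sketch of the corrected objective, assuming routes_vars is a dict of dicts of LpVariables (as LpVariable.dicts would produce):
prob += lpSum(routes_vars[man][pa] * costs_distance[man][pa]
              for (man, pa) in routes)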
Also, I'm sure you meant converting the dictionary values to float, not int:
list1 = costs_distance['201'].values()  # the string values of one inner dict
list2 = list(map(float, list1))         # converted to floats
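To convert all the nested string values at once while keeping the dict-of-dicts shape the objective expects, a dict comprehension works (a sketch, reusing costs_distance from the question):
costs_distance = {man: {pa: float(v) for pa, v in inner.items()}
                  for man, inner in costs_distance.items()}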
df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(x))
I tried this on a dataframe's column 'message' but I get the error:
TypeError: decoding to str: need a bytes-like object, list found
Apparently, the df_clean["message"] column contains lists of words, not strings, hence the error saying it needs a bytes-like object but found a list.
To fix this issue, you need to convert it to string again using join() method like so:
df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(" ".join(x)))
Notice that the df_clean["message"] will contain string objects after applying the previous code.
This is not a bug in gensim itself: the error is raised because there is a value in your column message that is of type list instead of string. Here's a minimal pandas example:
import pandas as pd
from gensim.parsing.preprocessing import remove_stopwords
df = pd.DataFrame([['one', 'two'], ['three', ['four']]], columns=['A', 'B'])
df.A.apply(remove_stopwords) # works fine
df.B.apply(remove_stopwords)  # raises:
TypeError: decoding to str: need a bytes-like object, list found
What the error is saying is that remove_stopwords needs a string and you are passing a list. So before removing stop words, check that all the values in the column are of string type. See the docs.
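As a quick diagnostic, you can list the offending rows before applying remove_stopwords; this sketch assumes the df_clean frame from the question:
# Rows whose 'message' value is not a plain string:
bad_rows = df_clean[~df_clean['message'].apply(lambda x: isinstance(x, str))]
print(bad_rows)

# If those values are lists of tokens, join them back into strings first:
df_clean['message'] = df_clean['message'].apply(
    lambda x: " ".join(x) if isinstance(x, list) else x)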
I have a dictionary in Python that looks like the following:
result = {'To Kill a Mockingbird': ('Fiction', '11.99', '89'), 'Killing Lincoln': ('Nonfiction', '15.99', '85'), ... }
I am trying to convert the numbers into ints and floats, respectively. I am getting this error: TypeError: 'int' object is not subscriptable (referring to the float conversion).
There is probably something really small to fix here, but I am just missing it. Thanks in advance!
for keys in result:
    result[keys][1] = float(result[keys][1])
    result[keys][2] = int(result[keys][2])
    print(result[keys])
You might want to try something like this. Using your code, I also got an error about assigning to tuples:
for keys in result:
    result[keys] = (result[keys][0], float(result[keys][1]), int(result[keys][2]))
    print(result[keys])
Since tuples are immutable, you have to reconstruct each tuple in the dictionary by applying the corresponding transforming functions to each item in the tuple. There is no need to call str, but doing so makes the solution uniform and easily extendable.
transformers = str, float, int
for key in result:
    result[key] = tuple(f(val)
                        for f, val in zip(transformers, result[key]))
#{'To Kill a Mockingbird': ('Fiction', 11.99, 89),
# 'Killing Lincoln': ('Nonfiction', 15.99, 85)}
Note: This is not the fastest solution. I present it solely to illustrate some unorthodox ways of solving problems with Python.
I have a dataframe in Pandas that is giving me the error below when I try to strip it of certain characters:
AttributeError: 'NoneType' object has no attribute 'lstrip'
I began by removing any missing or null values:
df_sample1['counties'].fillna('missing')
Inspecting it, I see a lot of unclean data: a mix of actual data (County 1, County 2 ... County n) as well as gibberish ($%ZYC 2).
To clean this further, I ran the following code:
df_sample1['counties'] = df_sample1['counties'].map(lambda x: x.lstrip('+%=/-#$;!\(!\&=&:%;').rstrip('1234567890+%=/-#$;!\(!\&=&:%;'))
df_sample1[:10]
This generates the 'NoneType' error.
I dug around a little, and in the pandas documentation there are some hints about skipping missing values.
if df_sample1['counties'] is None:
    pass
else:
    df_sample1['counties'].map(lambda x: x.lstrip('+%=/-#$;!\(!\&=&:%;').rstrip('1234567890+%=/-#$;!\(!\&=&:%;'))
This still generates the NoneType error mentioned above. Could someone point out what I'm doing wrong?
You can "skip" the None by checking if x is truthy before doing the stripping...
df_sample1['counties'].map(lambda x: x and x.lstrip('+%=/-#$;!\(!\&=&:%;').rstrip('1234567890+%=/-#$;!\(!\&=&:%;'))
This will probably leave some None in the dataframe (in the same places that they were before), but the transform should still work on the strings.
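A tiny illustration of the x and x.method() pattern (with the character set shortened for readability): falsy values such as None or the empty string pass through unchanged instead of raising.
strip = lambda x: x and x.lstrip('+%=$')
print(strip('$%County 1'))  # 'County 1'
print(strip(None))          # None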
If you are working with text data, why don't you simply first fill None type data with an empty string?
df_sample1['counties'].fillna("", inplace=True)
I suspect that your issue is that when you filled your missing values, you didn't do it inplace. This could be addressed by:
df_sample1['counties'].fillna('missing', inplace=True)
Or, when applying pandas.Series.map, you could use the argument na_action to leave these entries as None.
df_sample1['counties'] = df_sample1['counties'].map(lambda x: ..., na_action='ignore')
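Spelled out with the strip characters from the question, the na_action version would look like this (a sketch; missing entries are passed through untouched):
df_sample1['counties'] = df_sample1['counties'].map(
    lambda x: x.lstrip('+%=/-#$;!\(!\&=&:%;')
               .rstrip('1234567890+%=/-#$;!\(!\&=&:%;'),
    na_action='ignore')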