How to find the max of columns with the same name - Python

I'm having some trouble with this data frame: columns that share the same name have to be collapsed into a single column, which is 1 whenever at least one of them is 1.
+---+---+---+---+---+---+---+---+---+
| a | a | a | b | c | c | c | d | d |
+---+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
+---+---+---+---+---+---+---+---+---+
Applying an "or" across every group of columns by hand would be time-consuming for a huge dataset, so I am having trouble figuring it out. I tried max(axis=1, level=0) but still couldn't make it work.
my desired output :
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
+---+---+---+---+

Check with max
df = df.max(level=0, axis=1)
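Note: newer pandas versions (2.0+) no longer accept a level= argument on max, so the line above may fail there. A minimal sketch of an equivalent approach, with the sample frame reconstructed from the question:

import pandas as pd

# Frame with duplicate column labels, as in the question
df = pd.DataFrame([[1, 0, 0, 1, 1, 0, 0, 1, 1],
                   [0, 0, 0, 1, 0, 1, 1, 1, 0],
                   [0, 0, 1, 0, 0, 0, 1, 0, 0]],
                  columns=list("aaabcccdd"))

# Transpose, group the rows by their (duplicated) labels, take the max,
# then transpose back - effectively an "or" across same-named columns
result = df.T.groupby(level=0).max().T
print(result)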


How to split a column into several columns by taking the string values as column headers?

This is my dataset:
| Name | Dept | Project area/areas interested |
| -------- | -------- |-----------------------------------|
| Joe | Biotech | Cell culture//Bioinfo//Immunology |
| Ann | Biotech | Cell culture |
| Ben | Math | Trigonometry//Algebra |
| Keren | Biotech | Microbio |
| Alice | Physics | Optics |
This is how I want my result:
| Name | Dept |Cell culture|Bioinfo|Immunology|Trigonometry|Algebra|Microbio|Optics|
| -------- | -------- |------------|-------|----------|------------|-------|--------|------|
| Joe | Biotech | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Ann | Biotech | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ben | Math | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Keren | Biotech | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Alice | Physics | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Not only do I have to split the last column into different columns based on its values - I also have to split entries that are separated by "//". And the values in the dataframe have to be replaced with 1 or 0 (int).
I've been stuck on this for a while now (-_-;)
You can use pandas.concat in combination with Series.str.get_dummies like this:
pd.concat([df[["Name", "Dept"]], df["Project area/areas interested"].str.get_dummies(sep='//')], axis=1)
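For reference, a self-contained sketch of the same approach, with the sample data reconstructed from the question:

import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe", "Ann", "Ben", "Keren", "Alice"],
    "Dept": ["Biotech", "Biotech", "Math", "Biotech", "Physics"],
    "Project area/areas interested": [
        "Cell culture//Bioinfo//Immunology",
        "Cell culture",
        "Trigonometry//Algebra",
        "Microbio",
        "Optics",
    ],
})

# str.get_dummies splits each cell on "//" and produces one 0/1 column
# per distinct project area
dummies = df["Project area/areas interested"].str.get_dummies(sep="//")
result = pd.concat([df[["Name", "Dept"]], dummies], axis=1)
print(result)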

Pandas: Wide to first, second, third, identified categories

I am wondering if anyone knows a quick way in pandas to pivot a dataframe to achieve the desired transformation below. It is sort of a wide-to-long pivot, but not quite.
Input Dataframe structure (needs to be able to support N number of categories, not just 3 as case below)
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| id | catA_present | catA_pos | catA_neg | catA_ntrl | catB_present | catB_pos | catB_neg | catB_ntrl | catC_present | catC_pos | catC_neg | catC_ntrl |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| 0001 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| 0002 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| 0003 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| 0004 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
| 0005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
+------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+--------------+----------+----------+-----------+
Output Transformed Dataframe structure: (needs to support N number of categories, not just 3 as example shows)
+------+------+-------+------+-------+------+-------+
| id | cat1 | sent1 | cat2 | sent2 | cat3 | sent3 |
+------+------+-------+------+-------+------+-------+
| 0001 | catA | pos | catC | neg | NULL | NULL |
+------+------+-------+------+-------+------+-------+
| 0002 | catB | pos | catC | pos | NULL | NULL |
+------+------+-------+------+-------+------+-------+
| 0003 | catA | ntrl | catB | ntrl | NULL | NULL |
+------+------+-------+------+-------+------+-------+
| 0004 | catA | pos | catB | pos | catC | ntrl |
+------+------+-------+------+-------+------+-------+
| 0005 | catC | neg | NULL | NULL | NULL | NULL |
+------+------+-------+------+-------+------+-------+
I don't think it's a pivot at all. However, anything is possible, so here we go:
import io
import itertools
import pandas

# Your data
data = io.StringIO(
"""
id | catA_present | catA_pos | catA_neg | catA_ntrl | catB_present | catB_pos | catB_neg | catB_ntrl | catC_present | catC_pos | catC_neg | catC_ntrl
0001 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
0002 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0
0003 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
0004 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1
0005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
"""
)
df = pandas.read_table(data, sep=r"\s*\|\s*", engine="python")

def get_sentiment(row: pandas.Series) -> str:
    if row["cat_pos"] == 1:
        return "pos"
    elif row["cat_neg"] == 1:
        return "neg"
    elif row["cat_ntrl"] == 1:
        return "ntrl"
    else:
        return None

# Initialize a dict that will hold an entry for every index in the dataframe,
# with a list of categories and sentiments
categories_per_index = {index: [] for index in df.index}

# Extract the unique names of all possible categories ("A", "B", "C", ...)
categories = set(column[3] for column in df.columns if column.startswith("cat"))

# Loop over the unique categories
for key in categories:
    # Select only the columns for this category, and only the rows where it is present
    group = df.loc[df[f"cat{key}_present"] == 1,
                   [f"cat{key}_present", f"cat{key}_pos", f"cat{key}_neg", f"cat{key}_ntrl"]]
    # Rename the columns for generic processing
    group.columns = ["cat_present", "cat_pos", "cat_neg", "cat_ntrl"]
    # Figure out the sentiment for every line
    group["sentiment"] = group.apply(get_sentiment, axis=1)
    # Loop over the rows in the group and record this category and its sentiment per index
    for index, row in group.iterrows():
        categories_per_index[index].append(f"cat{key}")
        categories_per_index[index].append(row["sentiment"])

# Reconstruct the dataframe from the dictionary
df = pandas.DataFrame.from_dict(
    categories_per_index,
    orient="index",
    columns=list(itertools.chain.from_iterable([[f"cat{i}", f"sent{i}"] for i in range(len(categories))])),
)
Output:
print(df)
cat0 sent0 cat1 sent1 cat2 sent2
0 catA pos catC neg None None
1 catB pos catC pos None None
2 catB ntrl catA ntrl None None
3 catB pos catA pos catC ntrl
4 catC neg None None None None

Logical indexing in pandas dataframes [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 3 years ago.
I have some data like this:
+-----------+---------+-------+
| Duration | Outcome | Event |
+-----------+---------+-------+
| 421 | 0 | 1 |
| 421 | 0 | 1 |
| 261 | 0 | 1 |
| 24 | 0 | 1 |
| 27 | 0 | 1 |
| 613 | 0 | 1 |
| 2454 | 0 | 1 |
| 227 | 0 | 1 |
| 2560 | 0 | 1 |
| 229 | 0 | 1 |
| 2242 | 0 | 1 |
| 6680 | 0 | 1 |
| 1172 | 0 | 1 |
| 5656 | 0 | 1 |
| 5082 | 0 | 1 |
| 7239 | 0 | 1 |
| 127 | 0 | 1 |
| 128 | 0 | 1 |
| 128 | 0 | 1 |
| 7569 | 1 | 1 |
| 324 | 0 | 2 |
| 6395 | 0 | 2 |
| 6196 | 0 | 2 |
| 31 | 0 | 2 |
| 228 | 0 | 2 |
| 274 | 0 | 2 |
| 270 | 0 | 2 |
| 275 | 0 | 2 |
| 232 | 0 | 2 |
| 7310 | 0 | 2 |
| 7644 | 1 | 2 |
| 6949 | 0 | 3 |
| 6903 | 1 | 3 |
| 6942 | 0 | 4 |
| 7031 | 1 | 4 |
+-----------+---------+-------+
Now, for each Event, with the Outcome 0/1 considered as Fail/Pass, I want to sum the total Duration of Fail/Pass events separately in 2 new columns (or 1, whatever ensures readability).
I'm new to dataframes and I feel significant logical indexing is involved here. What is the best way to approach this problem?
df.groupby(['Event', 'Outcome'])['Duration'].sum()
This groups by both Event and Outcome, selects the Duration column, and takes the sum within each group.
You can also try:
pd.pivot_table(index='Event',
               columns='Outcome',
               values='Duration',
               data=df,
               aggfunc='sum')
which gives you a table with two columns:
+---------+-------+------+
| Outcome | 0 | 1 |
+---------+-------+------+
| Event | | |
+---------+-------+------+
| 1 | 35691 | 7569 |
| 2 | 21535 | 7644 |
| 3 | 6949 | 6903 |
| 4 | 6942 | 7031 |
+---------+-------+------+
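If you prefer to stay with groupby, unstacking the summed result gives the same wide layout. A minimal runnable sketch (only a few of the question's rows are reproduced here):

import pandas as pd

# A small subset of the question's data, just to make the sketch runnable
df = pd.DataFrame({
    "Duration": [421, 261, 7569, 324, 7644, 6949, 6903],
    "Outcome":  [0,   0,   1,    0,   1,    0,    1],
    "Event":    [1,   1,   1,    2,   2,    3,    3],
})

# Group, sum, then move Outcome out to the columns; fill_value avoids NaN
# where an Event has no rows for a given Outcome
wide = df.groupby(["Event", "Outcome"])["Duration"].sum().unstack(fill_value=0)
print(wide)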

Use the other column's value if a condition is met in Pandas

Assuming I have the following table:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 9 | 0 |
| 2 | 1 | 3 |
| -8 | 8 | 2 |
| 2 | 1 | 9 |
+----+---+---+
If column A's value is negative, replace column B's value with the value of column C; otherwise do nothing.
This is the desired output:
+----+---+---+
| A | B | C |
+----+---+---+
| 1 | 1 | 3 |
| 2 | 2 | 7 |
| 6 | 3 | 2 |
| -1 | 0 | 0 |
| 2 | 1 | 3 |
| -8 | 2 | 2 |
| 2 | 1 | 9 |
+----+---+---+
I've been trying the following code but it's not working
#not working
result.loc(result["A"] < 0,result['B'] = result['C'].iloc[0])
result.B[result.A < 0] = result.C
Try this:
df.loc[df['A'] < 0, 'B'] = df['C']
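A minimal, self-contained sketch of that line in action, with the data reconstructed from the question:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 6, -1, 2, -8, 2],
                   "B": [1, 2, 3, 9, 1, 8, 1],
                   "C": [3, 7, 2, 0, 3, 2, 9]})

# Where A is negative, overwrite B with the value from C in the same row
df.loc[df["A"] < 0, "B"] = df["C"]
print(df)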

Unexpected Python TypeError when using scalars

I am new to Python, and in my opinion it is much different from Java.
I have looked at other answers, which imply that the error occurs because I am passing an array when a single value is expected. I don't know about that; I am pretty sure I am simply passing a value.
Line 97 is:
exponential = math.exp(-(math.pow(feature_value-mean, 2) / (2*math.pow(standard_deviation, 2))))
The complete text of the error is:
Traceback (most recent call last):
  File "D:/Personal/Python/NB.py", line 153, in <module>
    main()
  File "D:/Personal/Python/NB.py", line 148, in main
    predictions = getPredict(summaries, testing_set)
  File "D:/Personal/Python/NB.py", line 129, in getPredict
    classification = predict(results, testData[index])
  File "D:/Personal/Python/NB.py", line 117, in predict
    probabilities = Classify(feature_summaries, classifications)
  File "D:/Personal/Python/NB.py", line 113, in Classify
    probabilities[classes] = probabilities[classes] * GaussianProbabilityDensity(feature_value, mean, standard_deviation)
  File "D:/Personal/Python/NB.py", line 97, in GaussianProbabilityDensity
    exponential = math.exp(-(math.pow(feature_value-mean, 2) / (2*math.pow(standard_deviation, 2))))
TypeError: only size-1 arrays can be converted to Python scalars
If it is useful, the CSV is below. It should be noted that I have two other algorithms that run on this dataset just fine.
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Your issue is that class_summaries (from line 107) is a list of tuples, one of which you select and pass into the GaussianProbabilityDensity function as feature_value.
It ends up causing the error on line 97. Note that if you were to fix it (I replaced the value with a hard 1.0), you'll end up with a division by zero error, as the standard_deviation you're putting in happens to be 0 at that point.
The way I found this was to use a Python IDE that has a proper debugger (I like PyCharm), set a breakpoint on the line you indicated, and inspect the various variables before the error occurs. I recommend tackling these types of problems in a similar fashion, as it saves a lot of time and spurious print statements.
math.pow (like all math functions) only works with scalars, that is, single numbers (integer or float). The error says that one of the arguments, such as standard_deviation, is a numpy array with more than one element, so it can't be converted to a scalar and passed to math.pow.
This occurs in your own code, so there's no difficulty in tracing those variables back to their source.
Either you unintentionally passed an array to this function, or you need to replace math.pow with np.power (and math.exp with np.exp), which do work with arrays.
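A small sketch of that swap; the example inputs below are made up for illustration, not taken from the asker's dataset:

import numpy as np

# np.exp and np.power accept arrays as well as scalars, so the expression
# from line 97 can be evaluated element-wise without math.exp/math.pow
feature_value = np.array([0.0, 1.0, 1.0])        # hypothetical inputs
mean = np.array([0.2, 0.8, 0.5])
standard_deviation = np.array([0.4, 0.3, 0.5])

exponential = np.exp(-np.power(feature_value - mean, 2)
                     / (2 * np.power(standard_deviation, 2)))
print(exponential)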
You generate a numpy array when loading from the csv:
data = numpy.loadtxt(data, delimiter=',')
# Loop through the data in the array
for index in range(len(data)):
    # Utilize a try/except to try to convert to float; if it can't convert, set the row to 0
    try:
        data[index] = [float(x) for x in data[index]]
    except ValueError:
        data[index] = 0
loadtxt returns an array with float dtype (the default). All its elements will be floats - if it had read something that wasn't a valid float, it would have raised an error. Thus the loop isn't needed. The loop also looks like it was written for a list, not an array.
randomize_data shouldn't return anything, since np.random.shuffle operates in place on its argument (csv). That isn't what causes the error, though.
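A tiny sketch of both points; the inline CSV here is dummy data, not the asker's:

import io
import numpy as np

csv = io.StringIO("0,1,0\n1,0,1\n1,1,1\n")
data = np.loadtxt(csv, delimiter=",")
print(data.dtype)        # float64 - no per-row conversion loop is needed

np.random.shuffle(data)  # reorders the rows of `data` in place, returns None
print(data)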
