Numpy Arrays in Python

I have this code that uses a for loop to print out this result for me.
Could I convert this to a NumPy array instead of the for loop, and use less memory?
categorical__unique = df.select_dtypes(['object']).columns
for col in categorical__unique:
    print('{} : {} unique value(s)'.format(col, df[col].nunique()))
I am trying to turn this categorical__unique value into an array and then use NumPy array functions instead of the for loop.

To convert a DataFrame into a matrix, you can use the to_numpy method from pandas:
categorical_unique = df.select_dtypes(['object']).to_numpy()
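As a side note (not part of the original answer), if the goal is just the per-column unique counts without a Python-level loop, pandas' nunique already does this in a vectorized way. A minimal sketch, assuming df is an ordinary DataFrame with some object columns (the example frame below is hypothetical):

import pandas as pd

# Hypothetical example frame standing in for the question's df.
df = pd.DataFrame({'city': ['a', 'b', 'a'], 'name': ['x', 'x', 'x'], 'n': [1, 2, 3]})

# nunique() returns a Series of unique-value counts per object column,
# so no explicit for loop is needed.
unique_counts = df.select_dtypes(['object']).nunique()
print(unique_counts)
# city    2
# name    1
# dtype: int64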

Related

Is there a numpy function to find an array in a multi-dimensional array?

I have a numpy array with n rows and p columns.
I want to check if a given row is in my array and find its index.
For example, I have a numpy array like this:
[[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0],....]
I want to check if the array [6,0,5,8,2,1] is in my numpy array, and if so, where.
Is there a numpy function for that?
I'm sorry for asking a naive question, but I'm quite confused right now.
You can use == and .all(axis=1) to match entire rows, then use numpy.where() to get the index:
import numpy as np
a = np.array([[1,0,8,7,2,2],[1,3,7,0,3,0],[1,7,1,0,1,0],[1,9,1,0,6,0],[1,8,1,7,9,0], [6,0,5,8,2,1]])
b = np.array([6,0,5,8,2,1])
print(np.where((a==b).all(axis=1)))
Output:
(array([5], dtype=int32),)
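If only a yes/no answer is needed rather than the position, the same row mask can be reduced with .any(); a small addition to the answer above, reusing the same a and b:

# True if the row b occurs anywhere in a; no index is computed.
print((a == b).all(axis=1).any())
# True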

Multidimensional numpy array appending with Python

In Python, I can concatenate two arrays like below,
myArray = []
myArray += [["Value1", "Value2"]]
I can then access the data by indexing the 2 dimensions of this array like so
print(myArray[0][0])
which will output:
Value1
How would I go about achieving the same thing with Numpy?
I tried the numpy append function but that only ever results in single dimensional arrays for me, no matter how many square brackets I put around the value I'm trying to append.
If you know the dimensions of the final array, then you could use np.vstack:
>>> import numpy as np
>>> a = np.array([]).reshape(0,2)
>>> b = np.array([['Value1', 'Value2']])
>>> np.vstack([a,b])
array([['Value1', 'Value2']], dtype='<U32')
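As a side note (not part of the original answer), growing a NumPy array row by row copies all existing data on every step, so a common alternative is to collect the rows in a plain Python list, mirroring the question's += idiom, and convert once at the end:

>>> rows = []
>>> rows += [['Value1', 'Value2']]
>>> rows += [['Value3', 'Value4']]
>>> np.array(rows)
array([['Value1', 'Value2'],
       ['Value3', 'Value4']], dtype='<U6')
>>> print(np.array(rows)[0][0])
Value1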

List to 2d array in pandas per line NEED MORE EFFICIENT WAY

I have a pandas dataframe of lists, and each of the lists can be converted to a numpy array with np.asarray(list). The shape of each such array should be (263, 300), so I do this:
a = dataframe.to_numpy()
# a.shape is (100000,)
output_array = np.array([])
for list in a:
    output_array = np.append(output_array, np.asarray(list))
Since there are 100000 rows in my dataframe, I expect output_array.shape to be (100000, 263, 300).
It works, but it takes a long time.
I want to know which part of my code costs the most and how to fix it.
Is there a more efficient way to do this? Thanks!
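No answer is included above, so as a hedged sketch only (assuming every element of a really is a nested list of shape (263, 300)): np.stack allocates the final 3-D array once, instead of re-copying output_array on every np.append call:

import numpy as np

# Hypothetical stand-in for dataframe.to_numpy(): an object array whose elements
# are (263, 300) nested lists, using 5 rows instead of 100000 to keep it small.
a = np.empty(5, dtype=object)
for i in range(5):
    a[i] = np.zeros((263, 300)).tolist()

# Stack all rows in one call; this is linear in the total size instead of quadratic.
output_array = np.stack([np.asarray(row) for row in a])
print(output_array.shape)
# (5, 263, 300)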

Create a numpy array when indexes of (fixed) elements are given

I have a numpy array of indexes, e.g. [1,3,12]. I want to create another array from this such that at these indexes I get a non-zero element, e.g. 1. So in this case, with input [1,3,12], I should get [0,1,0,1,0,0,0,0,0,0,0,0,1]. I can do it in a for loop; is there a short numpy function to achieve this?
With numpy you can index with lists directly:
import numpy
a = [1, 3, 12]
vector = numpy.zeros(shape=max(a) + 1)
vector[a] = 1
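Printed with the example indices from the question, the result matches the expected output (numpy.zeros defaults to float; passing dtype=int would give integer entries):

print(vector)
# [0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]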

Iterate over numpy array to fill a python list

I'm iterating over a numpy array to apply a function to each element and add the new value to a list, so I can keep the original data.
The problem is: it's kinda slow.
Is there a better way to do this (without changing the original array)?
import numpy as np
original_data = np.arange(0,16000, dtype = np.float32)
new_data = [i/max(original_data) for i in original_data]
print('done')
You could simply do:
new_data = original_data/original_data.max()
Numpy already performs this operation element-wise.
In your code there is an extra source of slowness: each call to max(original_data) iterates over all elements of original_data, making the total cost proportional to O(n^2).
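If the list comprehension is kept (for instance, to have new_data remain a plain Python list), a small sketch of the same fix is to hoist the maximum out of the loop so it is computed once rather than once per element:

import numpy as np

original_data = np.arange(0, 16000, dtype=np.float32)

# Compute the maximum a single time; the comprehension is then O(n) instead of O(n^2).
peak = original_data.max()
new_data = [i / peak for i in original_data]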
