Iterating over for loop & apply np.union1d - python

I would like to do np.union1d by iterating over a for loop.
Here is the code I'm using:
arr = np.empty((1,), dtype=np.int32)
for i in range(2):
arr = np.union1d(arr, data[df.iloc[i,7]])
data is a dictionary, from which I want to pull value based on keys defined in 8th column of my data frame df. Sorry, I won't able to provide you more details for data & df because of business confidentiality. After running this loop I'm seeing arr is showing empty array while data[df.iloc[i,7]] is generating array of around 10K size for each iteration. Can you please let me know what I'm doing wrong?

Just to be sure, check that your code isn't actually something like this:
for i in range(2):
arr=np.empty((1,), dtype=np.int32)
arr=np.union1d(arr, data[df.iloc[i,7]])
If not, I'd debug in the following way
for i in range(2):
new_arr = data[df.iloc[i,7]]
print(type(new_arr)) # check that it's an array
print(new_arr.shape) # check that first dim > 1, second dim = 1
Once you get this working I'd also suggest initializing arr = [] because np.empty initializes the array by placing a very very small value as a placeholder so it will be present in your final concatenated array!

Related

How to concatenate numpy arrays to create a 2d numpy array

I'm working on using AI to give me better odds at winning Keno. (don't laugh lol)
My issue is that when I gather my data it comes in the form of 1d arrays of drawings at a time. I have different files that have gathered the data and formatted it as well as performed simple maths on the data set. Now I'm trying to get the data into a certain shape for my Neural Network layers and am having issues.
formatted_list = file.readlines()
#remove newline chars
formatted_list = list(filter(("\n").__ne__, formatted_list))
#iterate through each drawing, format the ends and split into list of ints
for i in formatted_list:
i = i[1:]
i = i[:-2]
i = [int(j) for j in i.split(",")]
#convert to numpy array
temp = np.array(i)
#t1 = np.reshape(temp, (-1, len(temp)))
#print(np.shape(t1))
#append to master list
master_list.append(temp)
print(np.shape(master_list))
This gives output of "(292,)" which is correct there are 292 rows of data however they contain 20 columns as well. If I comment in the "#t1 = np.reshape(temp, (-1, len(temp))) #print(np.shape(t1))" it gives output of "(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)(1,20)", etc. I want all of those rows to be added together and keep the columns the same (292,20). How can this be accomplished?
I've tried reshaping the final list and many other things and had no luck. It either populates each number in the row and adds it to the first dimension, IE (5840,) I was expecting to be able to append each new drawing to a master list, convert to numpy array and reshape it to the 292 rows of 20 columns. It just appears that it want's to keep the single dimension. I've tried numpy.concat also and no luck. Thank you.
You can use vstack to concatenate your master_list.
master_list = []
for array in formatted_list:
master_list.append(array)
master_array = np.vstack(master_list)
Alternatively, if you know the length of your formatted_list containing the arrays and array length you can just preallocate the master_array.
import numpy as np
formatted_list = [np.random.rand(20)]*292
master_array = np.zeros((len(formatted_list), len(formatted_list[0])))
for i, array in enumerate(formatted_list):
master_array[i,:] = array
** Edit **
As mentioned by hpaulj in the comments, np.array(), np.stack() and np.vstack() worked with this input and produced a numpy array with shape (7,20).

Appending numpy array of arrays

I am trying to append an array to another array but its appending them as if it was just one array. What I would like to have is have each array appended on its own index, (withoug having to use a list, i want to use np arrays) i.e
temp = np.array([])
for i in my_items
m = get_item_ids(i.color) #returns an array as [1,4,20,5,3] (always same number of items but diff ids
temp = np.append(temp, m, axis=0)
On the second iteration lets suppose i get [5,4,15,3,10]
then i would like to have temp as
array([1,4,20,5,3][5,4,15,3,10])
But instead i keep getting [1,4,20,5,3,5,4,15,3,10]
I am new to python but i am sure there is probably a way to concatenate in this way with numpy without using lists?
You have to reshape m in order to have two dimension with
m.reshape(-1, 1)
thus adding the second dimension. Then you could concatenate along axis=1.
np.concatenate(temp, m, axis=1)
List append is much better - faster and easier to use correctly.
temp = []
for i in my_items
m = get_item_ids(i.color) #returns an array as [1,4,20,5,3] (always same number of items but diff ids
temp = m
Look at the list to see what it created. Then make an array from that:
arr = np.array(temp)
# or `np.vstack(temp)

Can I append specific values from one array to another?

I have successfully imported a CSV file into a multi-dimensional array in python. What I want to do now is pick specific values from the array and put them into a new single array. For instance if my current arrays were:
[code1, name1, number 1]
[code2, name2, number 2]
I want to select only the code1 and code 2 values and insert them into a new array, because I need to compare just those values to a user input for validation. I have tried using the following:
newvals=[]
newvals.append oldvals([0],[0])
where newvals is the new array for just the codes, oldvals is the original array with all the data and the index [0],[0] refers to code 1, but I'm getting a syntax error. I can't use any add ons as they will be blocked by my admin.
newvals = []
for i in oldvals:
newvals.append(i[0])
Usually you can get the first Element in an array a with a[0].
You can create a new array based on another by using the "array for in" syntax
oldData = [[1,2,3],[4,5,6]]
newData = [x[0] for x in oldList]
# newData is now [1,4]

Vectorization - how to append array without loop for

I have the following code:
x = range(100)
M = len(x)
sample=np.zeros((M,41632))
for i in range(M):
lista=np.load('sample'+str(i)+'.npy')
for j in range(41632):
sample[i,j]=np.array(lista[j])
print i
to create an array made of sample_i numpy arrays.
sample0, sample1, sample3, etc. are numpy arrays and my expected output is a Mx41632 array like this:
sample = [[sample0],[sample1],[sample2],...]
How can I compact and make more quick this operation without loop for? M can reach also 1 million.
Or, how can I append my sample array if the starting point is, for example, 1000 instead of 0?
Thanks in advance
Initial load
You can make your code a lot faster by avoiding the inner loop and not initialising sample to zeros.
x = range(100)
M = len(x)
sample = np.empty((M, 41632))
for i in range(M):
sample[i, :] = np.load('sample'+str(i)+'.npy')
In my tests this took the reading code from 3 seconds to 60 miliseconds!
Adding rows
In general it is very slow to change the size of a numpy array. You can append a row once you have loaded the data in this way:
sample = np.insert(sample, len(sample), newrow, axis=0)
but this is almost never what you want to do, because it is so slow.
Better storage: HDF5
Also if M is very large you will probably start running out of memory.
I recommend that you have a look at PyTables which will allow you to store your sample results in one HDF5 file and manipulate the data without loading it into memory. This will in general be a lot faster than the .npy files you are using now.
It is quite simple with numpy. Consider this example:
import numpy as np
l = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
#create an array with 4 rows and 3 columns
arr = np.zeros([4,3])
arr[:,:] = l
You can also insert rows or columns separately:
#insert the first row
arr[0,:] = l[0]
You just have to provide that dimensions are the same.

How to create a table in Python that has headers (text), filled with values (int and float)

I am filling an numpy array in python (could change this to a list if neccesary), and i want to fill it with column headings, then enter a loop and fill the table with values, I am struggling with which type to use for the array. I have something like this so far...
info = np.zeros(shape=(no_of_label+1,19),dtype = np.str) #Creates array to store coordinates of particles
info[0,:] = ['Xpos','Ypos','Zpos','NodeNumber','BoundingBoxTopX','BoundingBoxTopY','BoundingBoxTopZ','BoundingBoxBottomX','BoundingBoxBottomY','BoundingBoxBottomZ','BoxVolume','Xdisp','Ydisp','Zdisp','Xrot','Yrot','Zrot','CC','Error']
for i in np.arange(1,no_of_label+1,1):
info[i,:] = [C[0],C[1],C[2],i,int(round(C[0]-b)),int(round(C[1]-b)),int(round(C[2]-b)),int(round(C[0]+b)),int(round(C[1]+b)),int(round(C[2]+b)),volume,0,0,0,0,0,0,0,0] # Fills an array with label.No., size of box, and co-ords
np.savetxt(save_path+Folder+'/Data_'+Folder+'.csv',information,fmt = '%10.5f' ,delimiter=",")
There is other things in the loop, but they are irrelevent, C is an array of float, b is int.
I also need to be able to save it as a csv file as shown in the last line, and open it in excel.
What I have now, returns all the values as integers, when i need C[0], C[1], C[2] to be floating point.
Thanks in advance!
It depends on what you want to do with this array but I think you want to use 'dtype=object' instead of 'np.str'. You can do that explicitly, by changing 'np.str' to 'dtype' or here is how I would write the first part of your code:
import numpy as np
labels = ['Xpos','Ypos','Zpos','NodeNumber','BoundingBoxTopX','BoundingBoxTopY',
'BoundingBoxTopZ','BoundingBoxBottomX','BoundingBoxBottomY','BoundingBoxBottomZ',
'BoxVolume','Xdisp','Ydisp','Zdisp','Xrot','Yrot','Zrot','CC','Error']
no_of_label = len(labels)
#make a list of length ((no_of_label+1)*19) and convert it to an array and reshape it
info = np.array([None]*((no_of_label+1)*19)).reshape(no_of_label+1, 19)
info[0] = labels
Again, there is probably a better way of doing this if you have a specific application in mind, but this should let you store different types of data in the same 2D array.
I have solved it as follows:
info = np.zeros(shape=(no_of_label+1,19),dtype=float)
for i in np.arange(1,no_of_label+1,1):
info[i-1] = [C[0],C[1],C[2],i,int(round(C[0]-b)),int(round(C[1]-b)),int(round(C[2]-b)),int(round(C[0]+b)),int(round(C[1]+b)),int(round(C[2]+b)),volume,0,0,0,0,0,0,0,0]
np.savetxt(save_path+Folder+'/Data_'+Folder+'.csv',information,fmt = '%10.5f' ,delimiter=",",header='Xpos,Ypos,Zpos,NodeNumber,BoundingBoxTopX,BoundingBoxTopY,BoundingBoxTopZ,BoundingBoxBottomX,BoundingBoxBottomY,BoundingBoxBottomZ,BoxVolume,Xdisp,Ydisp,Zdisp,Xrot,Yrot,Zrot,CC,Error',comments='')
Using the header function built in to the numpy save text feature. Thanks everyone!

Categories