I'd like to take a list of 1000 np.ndarrays (each element in the list is an array of shape 3x3x8) and use this list as a pandas DataFrame column, so that each cell in the column holds one of the arrays.
How can it be accomplished?
You may want to look at xarray.
I've found this really useful for abstracting "square" data where all of the arrays in your list have the same shape.
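For example, if all 1000 arrays share the same 3x3x8 shape, they can be stacked into a single labelled 4-dimensional DataArray (a sketch; the dimension names here are arbitrary):

import numpy as np
import xarray as xr

# Stand-in for the list in the question: 1000 arrays, each 3x3x8
arrays = [np.random.rand(3, 3, 8) for _ in range(1000)]

# Stack into one 4-dimensional DataArray; "item" indexes the list position
da = xr.DataArray(np.stack(arrays), dims=["item", "row", "col", "depth"])

da.isel(item=0)  # recovers the first 3x3x8 array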
I have my data set at https://github.com/mayuripandey/Data-Analysis/blob/main/similarity.csv. Is there any way I can take two specific columns and make a matrix out of them? For example, the Count and Topic columns?
Simply subset the columns of interest, and retrieve the values without the column names using the ".values" attribute.
import pandas as pd

df = pd.read_html("https://github.com/mayuripandey/Data-Analysis/blob/main/similarity.csv")[0]
df[["Count", "Topic"]].values
This returns a 2D NumPy array of only the values. If you then need a matrix object, you can wrap it like this (note that NumPy now discourages np.matrix in favour of regular 2D arrays):
np.matrix(df[["Count", "Topic"]].values)
I recently asked this question, about converting n 2-dimensional arrays to a dataframe with 2+n columns. The solution I got works perfectly well, but cannot easily be generalized to higher dimensions.
In the more general case, the code would take as input a list of n same-size, d-dimensional numpy arrays, and d lists whose lengths correspond to the sizes of the arrays in the corresponding dimensions. The output is a pandas dataframe where each row corresponds to one position in the d-dimensional arrays. The first d columns contain the coordinates from the lists, and the next n columns contain the corresponding values from the n arrays. The linked question contains an example in 2D.
How would I go about generalizing the code to the d-dimensional case? I feel like this problem is related to the Pandas melt function, but I'm not sure how to make it work.
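One way to generalize it, sketched here with made-up inputs: build the d coordinate columns with np.meshgrid, flatten everything in the same (C) order, and hand the columns to the DataFrame constructor. This may not match the linked answer, but it works for any d:

import numpy as np
import pandas as pd

# Made-up inputs: n = 2 arrays of the same shape, d = 3 dimensions
arrays = [np.random.rand(2, 3, 4), np.random.rand(2, 3, 4)]
names = ["a", "b"]  # one value column per array
coords = [["x0", "x1"], ["y0", "y1", "y2"], [0, 1, 2, 3]]  # one list per dimension
dims = ["x", "y", "z"]  # one coordinate column per dimension

# Full index grid, flattened so it lines up with each array's ravel()
grids = np.meshgrid(*coords, indexing="ij")
data = {dim: grid.ravel() for dim, grid in zip(dims, grids)}
data.update({name: arr.ravel() for name, arr in zip(names, arrays)})

df = pd.DataFrame(data)  # 2*3*4 = 24 rows, 3 + 2 columns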
In pandas I have an X-by-Y dataframe (df1) containing values.
Then I have a second X-by-Y dataframe (df2, same shape as df1) containing True/False values.
I want to return only the elements of df1 where the value at the same location in df2 is True.
What is the fastest way to do this? Is there a way to do this without converting to numpy array?
Without having a reproducible example I may be missing a couple of tweaks/details here, but I think you may be able to accomplish this by dataframe multiplication
df1.mul(df2)
This multiplies each element by the corresponding element in the other dataframe: True acts as 1 and returns the element, while False acts as 0, so hidden cells come back as 0 rather than NaN.
It is also possible to use where
df1.where(df2)
This is similar to df1[df2]: it keeps values where df2 is True and replaces the hidden values with NaN, although you can choose the replacement value using the other argument. (mask is the inverse, hiding values where the condition is True.)
A quick benchmark on a 10x10 dataframe suggests that the df.mul approach is ~5 times faster.
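To illustrate both options on a toy frame (made-up data, not from the question):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 10)))
df2 = pd.DataFrame(rng.random((10, 10)) > 0.5)

kept = df1.where(df2)  # NaN wherever df2 is False, like df1[df2]
zeroed = df1.mul(df2)  # 0.0 wherever df2 is False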
I am trying to select specific elements from an ndarray. I have the list of row and column indices.
Here all_combos is a list of lists produced by the following code, which generates all pairwise combinations of each list in Combo:
import itertools

all_combos = []
for i in range(len(Combo)):
    all_combos.append(list(itertools.combinations(Combo[i], 2)))
This gave me a list of lists containing the row and column indices of my desired matrix elements.
I tried the code below to select the values from the matrix Trans and append them to a list. This resulted in cluster_values being a list of lists.
cluster_values = [[] for _ in range(len(Combo))]  # one sub-list per cluster
c = 0
for j in range(len(Combo)):
    for i in all_combos[j]:
        cluster_values[c].append(Trans[i[0]][i[1]])
    c = c + 1
My final goal is to plot these values as a networkx graph, but it accepts only a numpy matrix as an argument. So I want to figure out the best way to capture the indices as well as the values from the Trans matrix so that I can use them to plot the graph.
Thanks in advance
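For reference, NumPy's fancy indexing can pull out many (row, column) pairs in one step, which avoids the nested loops; a minimal sketch with made-up data standing in for Trans and one entry of all_combos:

import numpy as np

Trans = np.arange(25).reshape(5, 5)  # made-up stand-in matrix
pairs = [(0, 1), (0, 2), (1, 2)]  # e.g. one entry of all_combos

rows, cols = zip(*pairs)  # split the pairs into row and column indices
values = Trans[list(rows), list(cols)]  # one selected value per pair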
There must be a simple answer to this, but for some reason I can't find it. Apologies if this is a duplicate question.
I have a dataframe with shape on the order of (1000,100). I want to concatenate ALL items in the dataframe into a single series (or list). Order doesn't matter (so it doesn't matter what axis to concatenate along). I don't want/need to keep any column names or indices. Dropping NaNs and duplicates is ok but not required.
What's the easiest way to do this?
This will yield a 1-dimensional NumPy array with the lowest common dtype of all the elements:
df.values.ravel()
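If you want the result back as a pandas Series, with NaNs and duplicates dropped, something along these lines should work:

import pandas as pd

flat = pd.Series(df.values.ravel()).dropna().drop_duplicates()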