How to index from a list nested in an array? - python

I have a variable which returns this:
array(list([0, 1, 2]), dtype=object)
How do I index from this? Everything I have tried throws an error.
For reference, some code that would produce this variable.
import xarray as xr
x = xr.DataArray(
    [[0, 1, 2],
     [3, 4]]
)
x
I guess before anyone asks: I am testing whether xarray's DataArrays are a suitable way for me to store session-based data containing multiple recordings saved as vectors/1D arrays, where each recording/array can vary in length. That is why the DataArray doesn't have even dimensions.
Thanks

I used the code you gave to create the x variable, and was able to retrieve the lists using the following code:
for arr in x:
    print(arr.item())
Basically, you have to call .item() on that array to retrieve the inner list.
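The same situation can be reproduced without xarray. A minimal sketch (the construction below is just for illustration, not how xarray builds it internally):

```python
import numpy as np

# Build a 0-d object array wrapping a list, like the value in the question
a = np.empty((), dtype=object)
a[()] = [0, 1, 2]
# repr(a) is: array(list([0, 1, 2]), dtype=object)

inner = a.item()   # .item() unwraps the nested list
print(inner[1])    # -> 1
```

Once the list is unwrapped, ordinary list indexing works again.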

Related

Can I append specific values from one array to another?

I have successfully imported a CSV file into a multi-dimensional array in python. What I want to do now is pick specific values from the array and put them into a new single array. For instance if my current arrays were:
[code1, name1, number 1]
[code2, name2, number 2]
I want to select only the code1 and code2 values and insert them into a new array, because I need to compare just those values against user input for validation. I have tried using the following:
newvals=[]
newvals.append oldvals([0],[0])
where newvals is the new array for just the codes, oldvals is the original array with all the data and the index [0],[0] refers to code 1, but I'm getting a syntax error. I can't use any add ons as they will be blocked by my admin.
newvals = []
for i in oldvals:
    newvals.append(i[0])
Usually you can get the first element of an array a with a[0].
You can create a new array based on another by using a list comprehension:
oldData = [[1, 2, 3], [4, 5, 6]]
newData = [x[0] for x in oldData]
# newData is now [1, 4]
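Applied to the rows from the question (with placeholder values standing in for the real CSV data), the comprehension looks like this:

```python
# Placeholder rows standing in for the imported CSV data
oldvals = [["code1", "name1", "number1"],
           ["code2", "name2", "number2"]]

newvals = [row[0] for row in oldvals]   # keep only the first column
# newvals is ['code1', 'code2']

user_input = "code2"
print(user_input in newvals)            # -> True
```

No add-ons needed; list comprehensions and `in` are plain built-in Python.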

Accessing specific element of an array

I'm unsure of how to access an element in an array (of arrays?). Basically, I need to be able to assign random numbers to a series of arrays but I'm not sure how indexing works.
array_20 = np.zeros((5, 10))
a = [[array_20]] * 10
# This gives me 10 arrays of 5x10. I'd like to be able to then
# assign random numbers to all of the elements.
You could use numpy.random.rand like so:
import numpy as np
a = np.random.rand(10, 5, 10)
You can then index a like a python list. (i.e. a[1][2][0])
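A short sketch of that answer, using NumPy's newer `default_rng` generator (the seed is only there for reproducibility). Note also that the question's `[[array_20]] * 10` would create ten references to the same underlying array, which is rarely what you want:

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded only for reproducibility
a = rng.random((10, 5, 10))        # 10 blocks, each 5x10, filled with randoms

print(a.shape)                     # -> (10, 5, 10)
print(a[1][2][0] == a[1, 2, 0])    # -> True: both address the same element
```

`a[1][2][0]` chains list-style indexing; `a[1, 2, 0]` is the equivalent (and faster) NumPy form.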

List to 2d array in pandas per line NEED MORE EFFICIENT WAY

I have a pandas dataframe of lists, and each one of the lists can be converted to a numpy array with np.asarray(list). The shape of each array should be (263, 300), so I do this:
a = dataframe.to_numpy()
# a.shape is (100000,)
output_array = np.array([])
for lst in a:
    output_array = np.append(output_array, np.asarray(lst))
Since there are 100000 rows in my dataframe, I expect
output_array.shape to be (100000, 263, 300)
It works, but it takes a long time.
I want to know which part of my code costs the most and how to fix it.
Is there a more efficient way to achieve this? Thanks!
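The repeated np.append is the expensive part: every call copies the whole accumulated array. One common alternative is to stack all the rows in a single allocation with np.stack. A small-scale sketch with made-up shapes (4 rows of 2x3 instead of 100000 rows of 263x300):

```python
import numpy as np
import pandas as pd

# Small stand-in: 4 rows, each holding a 2x3 nested list
s = pd.Series([[[0, 1, 2], [3, 4, 5]]] * 4)

a = s.to_numpy()       # shape (4,), dtype object
out = np.stack(a)      # one allocation instead of a growing np.append loop

print(out.shape)       # -> (4, 2, 3)
```

This turns a quadratic copy pattern into a single pass, assuming every row really has the same inner shape.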

Pandas convert columns type from list to np.array

I'm trying to apply a function to a pandas dataframe; the function requires two np.arrays as input and fits them using a well-defined model.
The problem is that I can't apply this function to the selected columns, since their "rows" contain lists read from a JSON file, not np.arrays.
Now, I've tried different solutions:
# Here is where I discovered the problem
train_df['result'] = train_df.apply(my_function(train_df['col1'], train_df['col2']))
# so I tried to cast the Series before passing them to the function, in both these ways:
X_col1_casted = trai_df['col1'].dtype(np.array)
X_col2_casted = trai_df['col2'].dtype(np.array)
doesn't work.
X_col1_casted = trai_df['col1'].astype(np.array)
X_col2_casted = trai_df['col2'].astype(np.array)
doesn't work.
What I'm thinking of doing now is a long procedure: starting from the uncast column Series, convert them into lists, iterate over them applying the function to the single elements as np.array(), and append the results to a temporary list. Once done, I will convert this list into a new column. (Clearly, I don't know if it will work.)
Does anyone know how to help me?
EDIT:
I add one example to be clear:
The function assumes its two inputs are np.arrays. Right now it gets two lists, since they are retrieved from a JSON file. The situation is this one:
col1       col2       result
[1,2,3]    [4,5,6]    [5,7,9]
[0,0,0]    [1,2,3]    [1,2,3]
Clearly the function is not the sum, but my own function. For the moment, assume that this sum can work only on arrays and not on lists; what should I do?
Thanks in advance
Use apply to convert each element to its equivalent array:
df['col1'] = df['col1'].apply(lambda x: np.array(x))
type(df['col1'].iloc[0])
# numpy.ndarray
Data:
df = pd.DataFrame({'col1': [[1,2,3],[0,0,0]]})
df
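Putting the pieces together on the example data from the question. Element-wise addition stands in here for the real model-fitting function, which isn't shown:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [[1, 2, 3], [0, 0, 0]],
                   'col2': [[4, 5, 6], [1, 2, 3]]})

# Convert both list columns to ndarray columns
df['col1'] = df['col1'].apply(np.array)
df['col2'] = df['col2'].apply(np.array)

# Element-wise sum as a stand-in for the custom function
df['result'] = df.apply(lambda row: row['col1'] + row['col2'], axis=1)
print(df['result'].iloc[0])   # -> [5 7 9]
```

`.apply(np.array)` is equivalent to `.apply(lambda x: np.array(x))`; any function taking two arrays can be substituted into the row-wise lambda.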

Why do these two arrays have the same shape?

So I am trying to create an array and then access the columns by name. So I came up with something like this:
import numpy as np
data = np.ndarray(shape=(3, 1000),
                  dtype=[('x', np.float64),
                         ('y', np.float64),
                         ('z', np.float64)])
I am confused as to why
data.shape
and
data['x'].shape
both come back as (3, 1000); this is causing me issues when I'm trying to populate my data fields:
data['x'] = xvalues
where xvalues has a shape of (1000,). Is there a better way to do this?
The reason it comes out the same is that data has more structure than shape reveals.
Example:
data[0][0] returns:
(6.9182540632428e-310, 6.9182540633353e-310, 6.9182540633851e-310)
while data['x'][0][0] returns:
6.9182540632427993e-310
So data contains 3 rows and 1000 columns, and each element is a 3-tuple.
data['x'] picks the first field of that tuple at every one of the 3x1000 positions, so its shape is (3, 1000) as well.
Just set shape=(1000,). The three-field dtype will give you the three named columns.
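A sketch of that fix (np.zeros is used instead of np.ndarray so the fields start initialized; the xvalues here are made up):

```python
import numpy as np

# One dimension of length 1000; the dtype's three fields act as the columns
data = np.zeros(1000, dtype=[('x', np.float64),
                             ('y', np.float64),
                             ('z', np.float64)])

xvalues = np.linspace(0.0, 1.0, 1000)   # made-up data of shape (1000,)
data['x'] = xvalues                     # shapes now line up

print(data.shape)        # -> (1000,)
print(data['x'].shape)   # -> (1000,)
```

Each field of a structured array has the shape of the array itself, which is why the original (3, 1000) version clashed with the (1000,) xvalues.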
