Why do these two arrays have the same shape? - python

I am trying to create an array and then access the columns by name, so I came up with something like this:
import numpy as np
data = np.ndarray(shape=(3, 1000),
                  dtype=[('x', np.float64),
                         ('y', np.float64),
                         ('z', np.float64)])
I am confused as to why
data.shape
and
data['x'].shape
both come back as (3,1000). This is causing me issues when I try to populate my data fields with
data['x'] = xvalues
where xvalues has a shape of (1000,). Is there a better way to do this?

The reason the shapes come out the same is that data has a bit more structure than shape reveals.
Example:
data[0][0] returns:
(6.9182540632428e-310, 6.9182540633353e-310, 6.9182540633851e-310)
while data['x'][0][0] returns:
6.9182540632427993e-310
So data contains 3 rows and 1000 columns, and each element is a record of three floats (effectively a 3-tuple).
data['x'] picks the 'x' field out of that record at every one of the 3x1000 positions, which is why its shape is (3,1000) as well.

Just set shape=(1000,). The three-field dtype already gives you the 'x', 'y' and 'z' columns.
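For example, here is a minimal sketch (with made-up xvalues) showing how a (1000,)-shaped structured array lines up with the assignment from the question:

import numpy as np

# One record per row; each record has the fields 'x', 'y' and 'z'.
data = np.zeros(1000, dtype=[('x', np.float64),
                             ('y', np.float64),
                             ('z', np.float64)])

xvalues = np.linspace(0.0, 1.0, 1000)  # made-up example values
data['x'] = xvalues                    # shapes match: both are (1000,)

print(data.shape)       # (1000,)
print(data['x'].shape)  # (1000,)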

Related

Cannot plot or use .tolist() on pd dataframe column

So I am reading in data from a CSV and saving it to a dataframe so I can use the columns. Here is my code:
import pandas as pd
import matplotlib.pyplot as plt

filename = open(r"C:\Users\avalcarcel\Downloads\Data INSTR 9 8_16_2022 11_02_42.csv")
columns = ["date","time","ch104","alarm104","ch114","alarm114","ch115","alarm115","ch116","alarm116","ch117","alarm117","ch118","alarm118"]
df = pd.read_csv(filename,sep='[, ]',encoding='UTF-16 LE',names=columns,header=15,on_bad_lines='skip',engine='python')
length_ = len(df.date)
scan = list(range(1,length_+1))
plt.plot(scan,df.ch104)
plt.show()
When I try to plot scan vs. df.ch104, I get the following exception thrown:
'value' must be an instance of str or bytes, not a None
So what I thought to do was make each column in my df a list:
ch104 = df.ch104.tolist()
But it is changing my data from this:
(screenshot: the column before .tolist())
to this:
(screenshot: the column after .tolist())
This also happens when I use df.ch104.values.tolist()
Can anyone help me? I haven't used python/pandas in a while and I am just trying to get the data read in first. Thanks!
So, df.ch104.values.tolist() basically turns your column into a 2D 1xN list, but what you want is a 1D list of size N.
So index with [0] after calling .tolist() to pull that single inner row out as an N-element list:
df.ch104.values.tolist()[0]
Might I also suggest you include dropna() to avoid the 'value' must be an instance of str or bytes, not a None error:
df.dropna(subset=['ch104']).ch104.values.tolist()[0]
The error clearly says there are None or NaN values in your dataframe. You need to check for them and either replace them with a suitable value or delete those rows.
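For example, a minimal, self-contained sketch of that suggestion (the DataFrame here is a made-up stand-in for the CSV data):

import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the CSV data: ch104 contains a missing value.
df = pd.DataFrame({'ch104': [1.0, 2.0, None, 4.0]})

clean = df.dropna(subset=['ch104'])     # drop rows where ch104 is missing
scan = list(range(1, len(clean) + 1))   # scan index, as in the question
plt.plot(scan, clean['ch104'].astype(float))
plt.show()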

How to concatenate numpy arrays to create a 2d numpy array

I'm working on using AI to give me better odds at winning Keno. (don't laugh lol)
My issue is that when I gather my data it comes in the form of 1d arrays of drawings at a time. I have different files that have gathered the data and formatted it as well as performed simple maths on the data set. Now I'm trying to get the data into a certain shape for my Neural Network layers and am having issues.
import numpy as np

master_list = []
formatted_list = file.readlines()
# remove newline chars
formatted_list = list(filter(("\n").__ne__, formatted_list))
# iterate through each drawing, format the ends and split into a list of ints
for i in formatted_list:
    i = i[1:]
    i = i[:-2]
    i = [int(j) for j in i.split(",")]
    # convert to numpy array
    temp = np.array(i)
    # t1 = np.reshape(temp, (-1, len(temp)))
    # print(np.shape(t1))
    # append to master list
    master_list.append(temp)
print(np.shape(master_list))
This gives an output of "(292,)", which is correct in that there are 292 rows of data, but each row also contains 20 columns. If I uncomment the t1 = np.reshape(temp, (-1, len(temp))) and print(np.shape(t1)) lines, it prints "(1, 20)" once per drawing. I want all of those rows stacked together while keeping the columns, i.e. a shape of (292, 20). How can this be accomplished?
I've tried reshaping the final list and many other things with no luck. It either puts every number into the first dimension, i.e. (5840,), or keeps the single dimension. I was expecting to be able to append each new drawing to a master list, convert it to a numpy array and reshape it to 292 rows of 20 columns. I've also tried numpy.concatenate with no luck. Thank you.
You can use vstack to concatenate your master_list.
master_list = []
for array in formatted_list:
    master_list.append(array)
master_array = np.vstack(master_list)
Alternatively, if you know how many arrays formatted_list contains and how long each array is, you can preallocate master_array.
import numpy as np
formatted_list = [np.random.rand(20)]*292
master_array = np.zeros((len(formatted_list), len(formatted_list[0])))
for i, array in enumerate(formatted_list):
    master_array[i, :] = array
Edit:
As mentioned by hpaulj in the comments, np.array(), np.stack() and np.vstack() worked with this input and produced a numpy array with shape (7,20).
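As a quick illustration of that point, with placeholder rows of the same length as the drawings in the question:

import numpy as np

# 292 placeholder rows of 20 numbers each, standing in for the parsed drawings.
rows = [np.arange(20) for _ in range(292)]

print(np.vstack(rows).shape)  # (292, 20)
print(np.array(rows).shape)   # (292, 20) -- also works when every row has the same length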

How to index from a list nested in an array?

I have a variable which returns this:
array(list([0, 1, 2]), dtype=object)
How do I index from this? Everything I have tried throws an error.
For reference, some code that would produce this variable.
import xarray as xr
x = xr.DataArray(
    [[0, 1, 2],
     [3, 4]]
)
x
I guess before anyone asks: I am trying to test whether xarray's DataArray is a suitable way for me to store session-based data containing multiple recordings saved as vectors/1D arrays, where each recording/array can vary in length. That is why the DataArray doesn't have even dimensions.
Thanks
I used the code you gave to create the x variable, and was able to retrieve the lists using the following code:
for arr in x:
    print(arr.item())
Basically, you have to call .item() on that array to retrieve the inner list.
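Here is a minimal sketch of what is going on; the ragged data is rebuilt by hand as an object array so the example does not depend on how numpy handles ragged nested lists:

import numpy as np
import xarray as xr

# Build the ragged data as a 1-D object array of Python lists.
raw = np.empty(2, dtype=object)
raw[0] = [0, 1, 2]
raw[1] = [3, 4]
x = xr.DataArray(raw)

# Each element of x is a 0-d DataArray wrapping a list; .item() unwraps it
# so normal list indexing works again.
for arr in x:
    inner = arr.item()
    print(inner, inner[-1])   # [0, 1, 2] 2  and  [3, 4] 4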

Why does pandas.to_numeric result in a list of lists?

I am trying to import csv data into a pandas dataframe. To do this I am doing the following:
from io import StringIO
import pandas as pd

# 'contents', 'columns', 'units', 'descr' and 'data' are defined earlier.
df = pd.read_csv(StringIO(contents), skiprows=4, delim_whitespace=True, index_col=False, header=None)
index = pd.MultiIndex.from_arrays((columns, units, descr))
df.columns = index
df.columns.names = ['Name','Unit','Description']
df = df.apply(pd.to_numeric)
data['isotherm'] = df
This produces e.g. the following table:
In: data['isotherm']
Out:
Name         Relative_Pressure  Volume_STP
Unit                         -       ccm/g
Description               p/p0
0                     0.042691     29.3601
1                     0.078319     30.3071
2                     0.129529     31.1643
3                     0.183355     31.8513
4                     0.233435     32.3972
5                     0.280847     32.8724
However if I only want to get the values of the column Relative_Pressure I get this output:
In: data['isotherm']['Relative_Pressure'].values
Out:
array([[0.042691],
       [0.078319],
       [0.129529],
       [0.183355],
       [0.233435],
       [0.280847]])
Of course, for every column I want to use, I could now flatten it:
x = [item for sublist in data['isotherm']['Relative_Pressure'].values for item in sublist]
However, this would be a lot of extra effort and would also hurt readability. How can I make sure the data is flat for the whole DataFrame?
array([[...]]) is not a list of lists, but a 2D numpy array. (I'm not sure why the values are returned as a single-column 2D array rather than a 1D array here, though. When I create a plain DataFrame, a single column's values come back as a 1D array.)
You can flatten it with numpy's built-in functions, e.g.
x = data['isotherm']['Relative_Pressure'].values.flatten()
Edit: This might be caused by the MultiIndex.
The direct way of indexing into one column of a MultiIndex is with the full tuple of column levels:
data['isotherm'][('Relative_Pressure', '-', 'p/p0')]
which will return a Series object whose .values attribute gives you the expected 1D array. (Indexing with only the first level, as in data['isotherm']['Relative_Pressure'], returns a one-column DataFrame, which is why you get a 2D array.) The docs discuss this here.
You should be careful using chained indexing like data['isotherm']['Relative_Pressure'] because you won't know if you are dealing with a copy of the data or a view of the data. Please do a SO search of pandas' SettingWithCopyWarning for more details or read the docs here.
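A minimal sketch with made-up numbers, mirroring the three-level column MultiIndex from the question, shows the difference between partial and full-tuple indexing:

import numpy as np
import pandas as pd

# Three-level column MultiIndex (Name, Unit, Description), as in the question.
columns = pd.MultiIndex.from_arrays(
    [['Relative_Pressure', 'Volume_STP'], ['-', 'ccm/g'], ['p/p0', '']],
    names=['Name', 'Unit', 'Description'])
df = pd.DataFrame(np.random.rand(6, 2), columns=columns)

# Indexing by the first level alone returns a one-column DataFrame,
# so .values is 2-D.
print(df['Relative_Pressure'].values.shape)                  # (6, 1)

# Indexing with the full tuple returns a Series, so .values is 1-D.
print(df[('Relative_Pressure', '-', 'p/p0')].values.shape)   # (6,)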

Set a column in numpy array to zero

I want to set a column of a numpy array to zero at different times. In other words, I have a numpy array M of size 5000x500. When I call shape, the result is (5000, 500), so I think 5000 is the number of rows and 500 the number of columns:
shape(M)
(5000,500)
But the problem is when I want to access one column, for example the first column:
Mcol=M[:][0]
Then I check the shape again with the new matrix Mcol:
shape(Mcol)
(500,)
I expected the result to be (5000,), since the matrix has 5000 rows. Even when I changed the operation, the result was the same:
shape(M)
(5000,500)
Mcol=M[0][:]
shape(Mcol)
(500,)
Can anyone explain what happens in my code, and whether the following operation is the right way to set one column to zero?
M[:][0]=0
You're doing this:
M[:][0] = 0
But you should be doing this:
M[:,0] = 0
The first one is wrong because M[:] just gives you the entire array, the same as M, and then [0] gives you its first row.
Similarly, M[0][:] gives you the first row as well, because again [:] has no effect.
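A small sketch of the difference, using a tiny matrix so the output is easy to check:

import numpy as np

M = np.arange(12).reshape(3, 4)

print(M[:][0])   # [0 1 2 3] -> first ROW, because M[:] is just M
print(M[:, 0])   # [0 4 8]   -> first COLUMN

M[:, 0] = 0      # set the whole first column to zero
print(M)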
