I have a problem loading a specific MATLAB file into a pandas DataFrame. Basically, it is a multidimensional array consisting of 2x4 arrays, each of which then has either 3 or 4 columns. I searched around and found Convert mat file to pandas dataframe and matlab data file to pandas DataFrame, but neither works for me. Here is what I am using. This creates a DataFrame for one of the nests. I would like to have a column that tells me where I am, i.e. the first column tells whether it is all_params[0] or all_params[1], the second distinguishes the next level for each of those, etc.
import numpy as np
import pandas as pd  # needed for pd.DataFrame below
import scipy.io as sio
all_params = sio.loadmat(path + 'all_params')  # path points at the .mat file
all_params = all_params['all_params']
pd.DataFrame(all_params[0][0][:], columns=['par1', 'par2', 'par3'])
It must be simple, I'm just not able to figure it out. Or is there a way to do it directly using scipy or another loading tool?
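For concreteness, here is a sketch of the kind of flattening I am after, reusing all_params from above (assuming each innermost block has the same three columns; 'outer' and 'inner' are just placeholder names):
import pandas as pd
frames = []
for i in range(len(all_params)):
    for j in range(len(all_params[i])):
        block = pd.DataFrame(all_params[i][j], columns=['par1', 'par2', 'par3'])
        block.insert(0, 'outer', i)  # which top-level nest this block came from
        block.insert(1, 'inner', j)  # which second-level nest within it
        frames.append(block)
df = pd.concat(frames, ignore_index=True)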
Thanks
Related
I am trying to access an h5ad file downloaded from the Human Cell Atlas.
I loaded some packages to assist with reading the file.
The first attempt reads the file:
import pandas as pd
import anndata as ad
from scipy.sparse import csr_matrix
adata = ad.read('local.h5ad', backed='r')
The object has shape 483152 x 58559.
Its obs are mostly cell-related content and the var are genes.
So, I am trying to get each cell x Ensembl_id (gene) value out of adata.X.
In R I can use this approach to get two columns of data:
ad$X[,c("var1", "var2")]
so I assumed Python could use a similar approach:
adata.X['ENSG00000223972.5','macrophage']
or
adata.X['macrophage', 'ENSG00000223972.5']
But both attempts return nothing.
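From what I can tell, label-based selection has to go through the AnnData object rather than .X directly; a sketch of what I mean (the obs column name 'cell_type' is a guess for where 'macrophage' would live):
import anndata as ad
adata = ad.read_h5ad('local.h5ad', backed='r')
# .X is a bare matrix with no row/column labels, so string indexing on it
# does not work; label-based selection goes through the AnnData object.
gene = adata[:, 'ENSG00000223972.5']  # select one var (gene) by name
values = gene.to_memory().X           # materialise just that column
# If 'macrophage' lives in an obs column rather than in obs_names,
# filter the rows on that column first:
mask = adata.obs['cell_type'] == 'macrophage'  # hypothetical column name
values = adata[mask, 'ENSG00000223972.5'].to_memory().X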
I exported the following dataframe in Google Colab. Whichever method I use, when I import it later, a column of my dataframe appears as pandas.core.series.Series, not as an array.
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/My Drive/output.csv'
with open(path, 'w', encoding='utf-8-sig') as f:
    df_protein_final.to_csv(f)
After importing, the column looks like below:
pandas.core.series.Series
Note: the screenshots originally posted here could show the numbers in a different order (they may look like different datasets). Please don't get hung up on that; the images were just an example.
Why does a column that was originally an array before exporting convert to a Series after exporting?
The code below gives the same result; I can't export the original structure.
from google.colab import files
df.to_csv('filename.csv')
files.download('filename.csv')
Edit: I am looking for a solution: is there any way to keep the original structure (e.g. array) while exporting?
Actually, that is how pandas works. When you insert a list or a NumPy array into a pandas DataFrame, it always converts that array to a Series. If you want to turn the Series back into a list/array, use Series.values, Series.array, or Series.to_numpy(). Refer to this.
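A minimal example of the round trip:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
col = df['a']         # type: pandas.core.series.Series
arr = col.to_numpy()  # back to a NumPy array
lst = col.tolist()    # or a plain Python list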
EDIT:
I got an idea from your comments. You are asking how to save the dataframe into a file while preserving all of its properties. You are actually (intentionally or unintentionally) asking how to SERIALIZE the data frame. You have to use pickle for this. Refer to this.
Note: pandas has built-in pickle support, so you can export a dataframe directly to a pickle file, as in this example:
df.to_pickle(file_name)
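A minimal round trip (the file name is just an example):
import numpy as np
import pandas as pd
# A column holding NumPy arrays survives a pickle round trip intact,
# which a CSV round trip cannot guarantee (CSV stores plain text).
df = pd.DataFrame({'vec': [np.array([1, 2]), np.array([3, 4])]})
df.to_pickle('frame.pkl')
restored = pd.read_pickle('frame.pkl')
print(type(restored.loc[0, 'vec']))  # <class 'numpy.ndarray'>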
I'm trying to save a large number of NumPy arrays to a ROOT file using uproot. I've read through the uproot documentation, and as far as I can tell this ability was removed in uproot3. Note I can't save this as a histogram because the order of the data is important (it's a waveform). Does anyone know of a workaround, or something I'm missing, to do this?
e.g. I want something like this:
import numpy as np
import uproot as up
array = np.random.normal(0,1,1000)
file = up.recreate('./test.root')
file['Tree1'] = array
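For what it's worth, a sketch of what seems to work on recent uproot (versions 4.1 and later reintroduced TTree writing; the branch name 'waveform' is my own choice):
import numpy as np
import uproot
array = np.random.normal(0, 1, 1000)
with uproot.recreate('./test.root') as file:
    # A TTree is written from a dict mapping branch names to arrays;
    # the entry order of the array is preserved, so the waveform stays intact.
    file['Tree1'] = {'waveform': array}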
I'm writing a Python program that will import a square matrix from an Excel sheet and do some NumPy work with it. So far it looks like OpenPyXl is the best way to transfer the data from an XLSX file to the Python environment, but it's not clear what the best way is to turn that data from a tuple of tuples* of cells into an array of the actual values in the Excel sheet.
*created by calling sheet_ranges = wb['Sheet1'] and then mat = sheet_ranges['A1:IQ251']
Of course I could check the size of the tuple, write a nested for loop, check every element of each tuple within the tuple, and fill up an array.
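For concreteness, here is roughly what that would look like (a sketch; the file name is made up):
import numpy as np
from openpyxl import load_workbook
wb = load_workbook('matrix.xlsx')   # made-up file name
cells = wb['Sheet1']['A1:IQ251']    # tuple of tuples of Cell objects
# Copy every cell's .value into a pre-sized array by hand.
arr = np.empty((len(cells), len(cells[0])))
for i, row in enumerate(cells):
    for j, cell in enumerate(row):
        arr[i, j] = cell.value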
But is there really no better way?
As commented above, the ideal solution is to use a pandas dataframe. For example:
import pandas as pd
dataframe = pd.read_excel("name_of_my_excel_file.xlsx")
print(dataframe)
Just pip install pandas and then run the code above, replacing name_of_my_excel_file with the full path to your Excel file. Then you can proceed with pandas functions to analyse your data in depth. See the docs here!
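Since the goal is a NumPy matrix, a small follow-up sketch (pandas uses openpyxl under the hood for .xlsx files; header=None assumes the sheet has no header row):
import pandas as pd
# header=None keeps the first spreadsheet row as data instead of column
# names; .to_numpy() then yields the plain array for NumPy work.
dataframe = pd.read_excel("name_of_my_excel_file.xlsx", header=None)
matrix = dataframe.to_numpy()
print(matrix.shape)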
I work for a company and recently switched from a spreadsheet package to Python. Since I am very new to Python, there are a lot of things I have difficulty grasping. Using Python, I am trying to extract data from a large CSV file (37791 rows and 316 columns). Here is a piece of code I wrote:
Solution 1
import numpy as np
import pandas as pd
df = pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv', skiprows=1)
data = df.loc[:, ['Steps', 'Parameter']]
This command generates a warning, i.e. it gives: DtypeWarning: Columns (0,1,2,3,...,81) have mixed types. Specify dtype option on import or set low_memory=False.
So, I found a workaround.
Solution 2
import pandas as pd
import numpy as np
df = pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv', skiprows=1, error_bad_lines=False, index_col=False, dtype='unicode')
data = df.loc[:, ['Steps', 'Parameter']]
Two questions:
i) I was able to get around the warning, but now the columns that I want (Steps & Parameter) have been converted to objects (probably due to the dtype='unicode' option). How can I convert the Steps column into an integer type and Parameter into a float?
ii) Some people say that the dtype warning isn't really an error. But I found that when I use Solution 1 and read the CSV file, the Steps column contains some floats. The original CSV file doesn't have any floats in the Steps column. It looks as if some floats have been placed there by Python itself! Why does this happen?
(I am not able to upload the original csv file, because my company doesn't allow it!)
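For question i), a minimal sketch of one way to convert the columns back after loading them as text (pd.to_numeric with errors='coerce' turns unparseable entries into NaN rather than raising):
import pandas as pd
df = pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv',
                 skiprows=1, dtype='unicode')
# Convert the text columns to proper numeric dtypes after the fact;
# 'Int64' (capital I) is pandas' nullable integer, which tolerates NaN.
df['Steps'] = pd.to_numeric(df['Steps'], errors='coerce').astype('Int64')
df['Parameter'] = pd.to_numeric(df['Parameter'], errors='coerce')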