I have a .dat file and I want to read its content in Python, so I use the following code:
import numpy as np
bananayte=np.fromfile("U04_banana-ytest.dat",dtype=float)
print(bananayte)
However, my original data should look like "1.0000000e+00", while the output looks like "1.39804066e-76". What happened, and what should I do to get the correct values? Thanks!
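A likely cause is a dtype or format mismatch: np.fromfile reinterprets the raw bytes, so reading 32-bit floats as 64-bit (the default dtype=float) produces nonsense values. A minimal sketch reproducing the symptom, using a made-up sample.dat written as float32:

```python
import numpy as np

# Write four 32-bit floats (16 bytes) to a sample file
np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32).tofile("sample.dat")

# Reading the same bytes with the wrong dtype garbles the values
wrong = np.fromfile("sample.dat", dtype=np.float64)  # 16 bytes -> 2 garbled doubles
right = np.fromfile("sample.dat", dtype=np.float32)  # the original 4 values
```

Also note that values printed like "1.0000000e+00" suggest the file may actually be ASCII text rather than binary, in which case np.loadtxt is the right reader instead of np.fromfile.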
I am trying to access an h5ad file which was downloaded from the Human Cell Atlas.
Some packages are loaded to assist with reading the file.
The first attempt is reading the whole file into memory.
import pandas as pd
import anndata as ad
from scipy.sparse import csr_matrix
adata = ad.read('local.h5ad', backed='r')
The matrix is 483152 x 58559.
Its obs entries are mostly cell-related content and the var entries are genes.
So I am trying to get each cell x Ensembl_id (gene) value from adata.X.
In R, I can use this approach to get two columns of data:
ad$X[,c("var1", "var2")]
I assumed Python could use a similar approach:
adata.X['ENSG00000223972.5','macrophage']
or
adata.X['macrophage', 'ENSG00000223972.5']
But both attempts return nothing.
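The underlying issue is that adata.X is a bare (often sparse) matrix with no row or column labels attached, so string indexing on it fails; in anndata the usual idiom is to slice the AnnData object itself (e.g. adata[:, "ENSG00000223972.5"].X), which carries the labels. The label-to-position lookup underneath can be sketched with plain scipy/pandas; the matrix and names below are made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Stand-ins for adata.X, adata.obs_names and adata.var_names (made-up data)
X = csr_matrix(np.arange(12, dtype=float).reshape(3, 4))
obs_names = pd.Index(["cell_a", "macrophage", "cell_b"])
var_names = pd.Index(["ENSG_A", "ENSG00000223972.5", "ENSG_C", "ENSG_D"])

# Map string labels to integer positions, then index the matrix
row = obs_names.get_loc("macrophage")
col = var_names.get_loc("ENSG00000223972.5")
value = X[row, col]
```

With a backed (on-disk) AnnData, slicing the AnnData object rather than X also avoids loading the whole matrix into memory.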
I exported the following dataframe in Google Colab. Whichever method I use, when I import it later, my dataframe column appears as pandas.core.series.Series, not as an array.
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/My Drive/output.csv'
with open(path, 'w', encoding = 'utf-8-sig') as f:
df_protein_final.to_csv(f)
After importing, the dataframe looks like this:
pandas.core.series.Series
Note: the first and second images may show the numbers in a different order (they may look like different datasets). Please don't get hung up on that; the images are just an example.
Why does the column, which is an array before exporting, convert to a Series after exporting?
The code below gives the same result; I can't export the original structure.
from google.colab import files
df.to_csv('filename.csv')
files.download('filename.csv')
Edit: I am looking for a way to keep the original structure (e.g. array) while exporting.
Actually, that is how pandas works. When you insert a list or a NumPy array into a pandas dataframe, it always converts that array to a Series. If you want to turn the Series back into a list/array, use Series.values, Series.array, or Series.to_numpy().
EDIT:
I got an idea from your comments. You are asking how to save a dataframe to a file while preserving all its properties. You are actually (intentionally or unintentionally) asking how to SERIALIZE the dataframe. You should use pickle for this.
Note: pandas has built-in pickle support, so you can export a dataframe directly to a pickle file:
df.to_pickle(file_name)
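As a sketch of the difference, round-tripping through CSV stringifies a list-valued column, while to_pickle/read_pickle preserves it (the column and file names here are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"protein": ["p1", "p2"], "values": [[1, 2], [3, 4]]})

# CSV round-trip: the lists come back as plain strings
df.to_csv("out.csv", index=False)
from_csv = pd.read_csv("out.csv")

# Pickle round-trip: the lists survive intact
df.to_pickle("out.pkl")
from_pkl = pd.read_pickle("out.pkl")

print(type(from_csv["values"].iloc[0]))  # str
print(type(from_pkl["values"].iloc[0]))  # list
```

CSV is a text format with no notion of nested types, which is why the structure is lost there and kept by pickle.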
I tried to import a Matlab file into Python and form a dataframe.
from scipy.io import loadmat
import os.path
path=os.path.abspath(os.getcwd())+"/BatteryDataSet/BatteryAgingARC_25_26_27_28_P1/B0025.mat"
mat = loadmat(path)
Then I tried to convert it into a pandas dataframe, but it doesn't work.
Could anyone help me, please? I've read the previous posts but still found no answer.
Thank you very much!
When I use Matlab to read the .mat file, it looks like this:
Thanks again!
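loadmat returns a plain dict whose non-metadata entries are NumPy arrays, so the usual route is dict -> arrays -> DataFrame. A minimal, self-contained sketch (the variable name "voltage" is made up; battery files like B0025.mat typically hold nested structs that need extra unpacking):

```python
import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

# Create a small .mat file so the example runs on its own
savemat("example.mat", {"voltage": np.array([[3.7, 3.6, 3.5]])})

mat = loadmat("example.mat")
# Keys starting with "__" are loadmat metadata; keep only real variables
data = {k: np.ravel(v) for k, v in mat.items() if not k.startswith("__")}
df = pd.DataFrame(data)
```

For .mat files containing Matlab structs, loadmat(path, squeeze_me=True, struct_as_record=False) can make the nested fields easier to walk before building the dataframe.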
I am quite sure that my ARFF files are correct; to check, I downloaded different files from the web and successfully opened them in Weka.
But I want to use my data in Python, so I typed:
import arff
data = arff.load('file_path','rb')
It always returns an error message: "Invalid layout of the ARFF file, at line 1."
Why does this happen, and what should I do to make it right?
If you change your code as below, it will work:
import arff
data = arff.load(open('file_path'))
Using SciPy, we can load ARFF data in Python:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
I came across a DF file encoded in binary format. But when I open it in Vim, I can still see strings like "pandas.core.frame" and "numpy.core.multiarray", so I guess it is related to Python. However, I know little about the Python language. Though I have tried the pandas and numpy modules, I failed to read the file. Could you give any suggestions on this issue? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
Looks like a DataFrame stored with pickle; use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')
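Strings like "pandas.core.frame" inside a binary file are a tell-tale sign of a Python pickle. One way to confirm before unpickling is to check the first byte: pickle streams (protocol 2 and later, which pandas uses by default) start with the PROTO opcode 0x80. A self-contained sketch, writing a demo file since the Dropbox file isn't at hand:

```python
import pandas as pd

# Write a demo pickle standing in for flux.df (made-up contents)
demo = pd.DataFrame({"flux": [0.1, 0.2]})
demo.to_pickle("flux_demo.df")

# A pickle stream starts with the PROTO opcode byte 0x80
with open("flux_demo.df", "rb") as f:
    first = f.read(1)
print(first == b"\x80")

df = pd.read_pickle("flux_demo.df")
```

Be aware that unpickling can execute arbitrary code, so only read_pickle files from sources you trust.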