I'm having some difficulty converting a NumPy array to an R data set. I have a 2D grayscale image of 2362 x 2163 pixels, and I need to import the gray value of each pixel to do some statistical analysis. First, these are the imports I used:
import cv2
import numpy as np
from skimage import img_as_ubyte
from skimage import data
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
from pandas import DataFrame
pandas2ri.activate()
First I read the image with cv2 as a NumPy array and, just in case, converted the array to values between 0 and 255 (256 gray levels).
xchest = cv2.imread("/home/user/xchest.tif", cv2.IMREAD_GRAYSCALE)
xchestsk = img_as_ubyte(xchest)
The output of:
type(xchestsk)
is:
<class 'numpy.ndarray'>
I visually checked the array and, as expected, it looks something like:
[[8, 8, 9, ... 200, 234, 245]...[250, 234, 134, ... 67, 8, 8]]
I need all that pixel information in a simple data set that I can use and analyze in RStudio. I tried:
xchest_R = DataFrame(xchestsk)
xchest_R = r.data('xchestsk')
r.assign("test", xchest_R)
r("save(test, file='/home/user/xchest.gzip', compress=TRUE)")
But when I load it in R:
> load("/home/user/xchest.gzip")
I just get a single value: test > "xchestsk"
It is as if I had just imported a string.
I tried with:
np.save("/home/user/xchest.npy", xchestsk)
But when I try to import it on R with:
> library(RcppCNPy)
> xchest_R <- npyLoad("/home/user/xchest.npy", "integer")
RStudio crashes and I have to restart the session.
Finally, I tried converting the NumPy array to a CSV file:
np.savetxt("/home/eera5607/xchest.csv", xchestsk, delimiter=",")
But when I import it to R:
> xchest_data = read.csv(file="/home/eera5607/xchest.csv", header=FALSE, sep=",")
I can't do simple statistical analysis like:
> mean(xchest_data)
Because I get this warning:
> argument is not numeric or logical: returning NA
I tried converting the data to a single variable with 5,000,000+ points:
xchest_list = xchestsk.tolist()
xchest_ov = []
for row in xchest_list:
    xchest_ov += row
Then I wrote the xchest_ov list to a CSV file, but I get the same warning in RStudio.
What I need is to import all those values into a regular R data set to which I can apply some statistical analysis, keeping the matrix structure if possible (this is not essential; the pixel values alone would do). I know I can do some analysis directly in Python, but I would like to have this data in RStudio. I have very little knowledge of these topics. I'm a radiologist and this is the first time I'm working with R.
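One possible route, sketched here under assumptions rather than taken from the post, is to write the pixels as plain integers with np.savetxt(..., fmt="%d"), so the CSV holds whole gray levels instead of floats in scientific notation. The small synthetic array and the temporary file paths below stand in for xchestsk and the real output location.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for the 2362 x 2163 uint8 image (assumption for illustration)
pixels = np.arange(12, dtype=np.uint8).reshape(3, 4)

# fmt="%d" writes whole numbers; the default format writes floats like 8.000000e+00
csv_path = os.path.join(tempfile.mkdtemp(), "xchest.csv")
np.savetxt(csv_path, pixels, fmt="%d", delimiter=",")

# Read the file back to confirm the matrix structure survives the round trip
restored = np.loadtxt(csv_path, delimiter=",", dtype=np.uint8)

# A one-column version (one row per pixel), if a flat variable is preferred
flat_path = os.path.join(tempfile.mkdtemp(), "xchest_flat.csv")
np.savetxt(flat_path, pixels.ravel(), fmt="%d")
flat_restored = np.loadtxt(flat_path, dtype=np.uint8)
```

On the R side, read.csv returns a data frame, and calling mean() on a whole data frame gives the "argument is not numeric or logical" warning. Converting first, for example mean(as.matrix(xchest_data)), should give the mean over all pixels.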
I am currently working with the NASA aerosol optical depth data (MCD19A2), which is a NASA satellite level-three product. I have uploaded the data. I want to save the data as a dataframe that includes the longitude, latitude, and values. I have successfully converted the 0.47 um band file into a three-dimensional array. How can I convert this array into a dataframe that includes X, Y, and the value?
Below is the code I have tried:
from osgeo import gdal
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
rds = gdal.Open("MCD19A2.A2006001.h26v04.006.2018036214627.hdf")
names=rds.GetSubDatasets()
names[0][0]
'HDF4_EOS:EOS_GRID:"MCD19A2.A2006001.h26v04.006.2018036214627.hdf":grid1km:Optical_Depth_047'
aod_047 = gdal.Open(names[0][0])
a47=aod_047.ReadAsArray()
a47[1].shape
(1200,1200)
I would like the result to look like this:
X (n=1200)    Y (n=1200)    AOD_047
8896067       5559289       0.0123
I know that in R this can be done by
require('gdalUtils')
require('raster')
require('rgdal')
file.name<-"MCD19A2.A2006001.h26v04.006.2018036214627.hdf"
sds <- get_subdatasets(file.name)
gdal_translate(sds[1], dst_dataset = paste0('tmp047', basename(file.name), '.tiff'), b = nband)
r.047 <- raster(paste0('tmp047', basename(file.name), '.tiff'))
df.047 <- raster::as.data.frame(r.047, xy = T)
names(df.047)[3] <- 'AOD_047'
However, R keeps everything in memory, and writing and then re-reading the 'tif' file uses a lot of it. So I want to do this task in Python. Thanks a lot for your help.
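You can try pandas, but note that pd.read_hdf expects HDF5 files in PyTables format, so it may not open an HDF4 product such as MCD19A2: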
import pandas as pd
df=pd.read_hdf('filename.hdf')
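As an alternative that stays with the array already read through GDAL, here is a minimal sketch that flattens one 2D band into X, Y, and value columns. It assumes a six-element geotransform in GDAL's order; raster_to_dataframe and the synthetic numbers are hypothetical, and fill values or scale factors are not handled.

```python
import numpy as np
import pandas as pd


def raster_to_dataframe(band, geotransform, value_name="AOD_047"):
    """Flatten a 2D band into X, Y, value rows using a GDAL-style geotransform."""
    rows, cols = band.shape
    # Pixel indices for every cell; +0.5 moves from the cell corner to its center
    row_idx, col_idx = np.meshgrid(
        np.arange(rows) + 0.5, np.arange(cols) + 0.5, indexing="ij"
    )
    x0, dx, rot_x, y0, rot_y, dy = geotransform
    x = x0 + col_idx * dx + row_idx * rot_x
    y = y0 + col_idx * rot_y + row_idx * dy
    return pd.DataFrame({"X": x.ravel(), "Y": y.ravel(), value_name: band.ravel()})


# Synthetic 2 x 3 band and geotransform (made up for illustration, not MCD19A2 values)
band = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
geotransform = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
df = raster_to_dataframe(band, geotransform)
```

With the data from the question, the call would presumably look like raster_to_dataframe(a47[1], aod_047.GetGeoTransform()), one slice of the 3D array at a time. The resulting X and Y are in the grid's projected coordinates, not longitude and latitude.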
import rasterio as rio
from rasterio.plot import show
from sklearn import cluster
import matplotlib.pyplot as plt
import numpy as np
import glob
for filepath in glob.iglob('./dengue3/*.tiff'):
    elhas_raster = rio.open(filepath)
    elhas_arr = elhas_raster.read() # read the opened image
    vmin, vmax = np.nanpercentile(elhas_arr, (5,95)) # 5-95% contrast stretch
    # create an empty array with same dimension and data type
    imgxyb = np.empty((elhas_raster.height, elhas_raster.width, elhas_raster.count), elhas_raster.meta['dtype'])
    # loop through the raster's bands to fill the empty array
    for band in range(imgxyb.shape[2]):
        imgxyb[:,:,band] = elhas_raster.read(band+1)
    #print(imgxyb.shape)
    # convert to 1d array
    img1d=imgxyb[:,:,:7].reshape((imgxyb.shape[0]*imgxyb.shape[1],imgxyb.shape[2]))
    #print(img1d.shape)
I am using the code above to read the TIFF images in a folder and get the arrays. However, it fails with:
ValueError: cannot reshape array of size 6452775 into shape (921825,12)
The images have 12 bands. I tried using 12 in place of 7 in the code above, but the code doesn't execute. How do I resolve this? Thank you for your time.
You sliced the array down to 7 bands, but the reshape still asks for all 12 bands through imgxyb.shape[2], so the element counts no longer match (921825 x 7 = 6452775):
img1d=imgxyb[:,:,:7].reshape((imgxyb.shape[0]*imgxyb.shape[1],imgxyb.shape[2]))
This should be:
img1d=imgxyb[:,:,:7].reshape((imgxyb.shape[0]*imgxyb.shape[1],7))
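To avoid hard-coding the band count at all, the reshape can take its size from the sliced array itself. This is a small sketch on a synthetic array whose sizes are made up; only the variable names come from the question.

```python
import numpy as np

# Synthetic stand-in for imgxyb: height x width x 12 bands (sizes are made up)
imgxyb = np.zeros((4, 5, 12), dtype=np.float32)

# Keep the first 7 bands, then flatten the two spatial axes into one
subset = imgxyb[:, :, :7]
img1d = subset.reshape(-1, subset.shape[2])  # -1 lets NumPy infer height * width

# Using all 12 bands works the same way, with no hard-coded band count
img1d_all = imgxyb.reshape(-1, imgxyb.shape[2])
```

Because the second dimension is read from the array being reshaped, changing the slice from :7 to any other band count needs no other edit.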
First, I wasn't sure whether I should use pandas or NumPy to read the list of coordinates from a CSV file.
Second, whichever I try, I get stuck at the OpenCV function cv2.circle(image,(x,y),25,(0,255,0)), because (x, y) only accepts a single pair of integers. My problem is that I have multiple coordinates for this image, and they are float numbers.
import numpy as np
from PIL import Image
import cv2
import pandas as pd
import math
dfa = pd.read_csv("filter_14.csv")
image = cv2.imread("image_1602.png")
x = dfa['project_image_X'].astype(int)
y = dfa['project_image_Y'].astype(int)
cv2.circle(image,(x,y),25,(0,255,0))
cv2.imshow('test image', image)
cv2.waitKey(0)
This is the error message:
Traceback (most recent call last):
  File "image_print.py", line 23, in <module>
    cv2.circle(image,(x,y),25,(0,255,0))
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 131, in wrapper
    raise TypeError("cannot convert the series to " "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
It's difficult to say without the content of your csv file, but it looks like you are passing a whole series to draw one circle instead of just one (x, y) value pair.
Try looping through all the values in the dataframe:
import numpy as np
from PIL import Image
import cv2
import pandas as pd
import math
dfa = pd.read_csv("filter_14.csv")
image = cv2.imread("image_1602.png")
# Draw one circle per row; cv2.circle needs plain Python ints for the center
for idx in dfa.index:
    x, y = dfa.loc[idx, ['project_image_X', 'project_image_Y']]
    cv2.circle(image, (int(round(x)), int(round(y))), 25, (0, 255, 0))
cv2.imshow('test image', image)
cv2.waitKey(0)
If the problem persists, it would be helpful to see a sample of the csv file.
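Since the coordinates are floats, the conversion step can also be isolated and checked on its own. The following is a minimal sketch that assumes the two column names from the question; to_pixel_centers and the sample values are hypothetical.

```python
import pandas as pd


def to_pixel_centers(df, x_col="project_image_X", y_col="project_image_Y"):
    """Round float coordinates and return a list of (x, y) tuples of Python ints."""
    xs = df[x_col].round().astype(int)
    ys = df[y_col].round().astype(int)
    return [(int(x), int(y)) for x, y in zip(xs, ys)]


# Made-up coordinates standing in for the contents of filter_14.csv
dfa = pd.DataFrame(
    {"project_image_X": [10.4, 250.6], "project_image_Y": [20.2, 99.9]}
)
centers = to_pixel_centers(dfa)
```

Each tuple can then be passed as the center argument, one call per point, for example cv2.circle(image, center, 25, (0, 255, 0)) inside a loop over centers.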
I want to apply Hough Transform on stock prices (array of numbers).
I read the OpenCV and scikit-image docs and examples, but found nothing on how to apply the transform to arrays of numbers instead of images.
I created a 2D array from the data. The first dimension is X (simply the index of the data) and the second dimension is the close price.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pywt as wt
from skimage.transform import (hough_line, hough_line_peaks,probabilistic_hough_line)
from matplotlib import cm
path = "22-31May-100Tick.csv"
df = pd.read_csv(path)
y = df.Close.values
x = np.arange(0,len(y),1)
data = []
for i in x:
    a = [i,y[i]]
    data.append(a)
data = np.array(data)
How can I apply the transform with OpenCV or scikit-image?
Thank you
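One way to approach this without building an image first is to let each (x, y) point vote directly in a (rho, theta) accumulator. This is a sketch under assumptions and does not use the OpenCV or scikit-image API. It applies the normal form rho = x*cos(theta) + y*sin(theta); hough_points, n_theta, and n_rho are hypothetical names, and the straight-line test data is made up. For real prices, the index and price axes have very different scales, so some normalization would probably be needed first.

```python
import numpy as np


def hough_points(x, y, n_theta=180, n_rho=201):
    """Vote every (x, y) point into a (rho, theta) accumulator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    # Largest possible |rho| for these points; assumes not all points are at the origin
    rho_max = np.hypot(np.abs(x).max(), np.abs(y).max())
    rhos = np.linspace(-rho_max, rho_max, n_rho)  # bin centers
    # rho for every (point, theta) pair, shape (n_points, n_theta)
    rho_vals = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))
    # Map each rho to the index of the nearest bin center
    rho_idx = np.rint((rho_vals + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
    rho_idx = np.clip(rho_idx, 0, n_rho - 1)
    theta_idx = np.broadcast_to(np.arange(n_theta), rho_idx.shape)
    accumulator = np.zeros((n_rho, n_theta), dtype=int)
    np.add.at(accumulator, (rho_idx, theta_idx), 1)  # unbuffered add counts repeats
    return accumulator, thetas, rhos


# Made-up data: 50 points on the line y = x
x = np.arange(50)
y = x.astype(float)
acc, thetas, rhos = hough_points(x, y)

# The strongest cell gives the dominant line in normal form
r_i, t_i = np.unravel_index(acc.argmax(), acc.shape)
best_theta, best_rho = thetas[t_i], rhos[r_i]
```

Peaks in acc correspond to straight segments shared by many points, so after a suitable rescaling the same call could take the index and Close arrays from the question.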
Is there a standard way to convert MATLAB .mat (MATLAB-formatted data) files to a pandas DataFrame?
I am aware that a workaround is possible by using scipy.io but I am wondering whether there is a straightforward way to do it.
I found two ways: SciPy or mat4py.
mat4py
Load data from MAT-file
The function loadmat loads all variables stored in the MAT-file into a
simple Python data structure, using only Python’s dict and list
objects. Numeric and cell arrays are converted to row-ordered nested
lists. Arrays are squeezed to eliminate arrays with only one element.
The resulting data structure is composed of simple types that are
compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
From:
https://pypi.python.org/pypi/mat4py/0.1.0
Scipy:
Example:
import numpy as np
from scipy.io import loadmat # this is the SciPy module that loads mat-files
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import pandas as pd
mat = loadmat('measured_data.mat') # load mat-file
mdata = mat['measuredData'] # variable in mat file
mdtype = mdata.dtype # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
# elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]  # .items() replaces Python 2's .iteritems()
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timestamps']],
                  columns=columns)
From:
http://poquitopicante.blogspot.fr/2014/05/loading-matlab-mat-file-into-pandas.html
Finally you can use PyHogs but still use scipy:
Reading complex .mat files.
This notebook shows an example of reading a Matlab .mat file,
converting the data into a usable dictionary with loops, a simple plot
of the data.
http://pyhogs.github.io/reading-mat-files.html
Ways to do this:
As you mentioned, with scipy:
import scipy.io as sio
test = sio.loadmat('test.mat')
Using the matlab engine:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat",nargout=1)
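Whichever loader is used, the result still has to be turned into a DataFrame. Here is a minimal sketch for the simple case where the dict returned by scipy.io.loadmat holds plain numeric vectors of equal length. mat_dict_to_frame and the variable names are hypothetical, and structs or cell arrays would need the field-by-field handling shown in the longer example above.

```python
import numpy as np
import pandas as pd


def mat_dict_to_frame(mat):
    """Build a DataFrame from the plain numeric vectors of a loadmat-style dict."""
    columns = {}
    for name, value in mat.items():
        if name.startswith("__"):
            continue  # skip __header__, __version__, __globals__
        arr = np.squeeze(np.asarray(value))
        # Keep only 1-D boolean, integer, or float data
        if arr.ndim == 1 and arr.dtype.kind in "biuf":
            columns[name] = arr
    # pandas raises ValueError here if the kept vectors differ in length
    return pd.DataFrame(columns)


# Hand-made dict shaped like loadmat output (variable names are hypothetical)
mat = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "time": np.array([[0.0], [1.0], [2.0]]),  # 3 x 1 column vector
    "signal": np.array([[5, 6, 7]]),          # 1 x 3 row vector
}
df = mat_dict_to_frame(mat)
```

With a real file, the dict would come from sio.loadmat('test.mat') as in the snippet above. MATLAB vectors arrive as 2-D arrays, which is why each one is squeezed before being used as a column.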