How to get actual data points from a graph using Python?

I have made a graph of stock data using the fbprophet module in Python. My graph looks like this:
The code I'm using is this:
from fbprophet import Prophet

model = Prophet()
model.fit(df)  # df holds the 'ds' (date) and 'y' (value) columns Prophet expects
future = model.make_future_dataframe(periods=365)  # forecasting for 1 year from now
forecast = model.predict(future)

# Plotting the forecast
figure = model.plot(forecast)
figure.savefig('forecasting for 1 year.svg')
From the above code I have made that graph. Then I extracted the data points from it using the mpld3 module:
import mpld3
# print(mpld3.fig_to_dict(figure))
print(mpld3.fig_to_dict(figure)['data'])
It gives me output like this:
{'data01': [[734094.0, 3.3773930153824794], [734095.0, 3.379438304627263], ........ 'data03': [[0.0, 0.0]]}
But the problem is that the y values in the above output are correct, while the x values are not. The actual x values are like this:
"x": [
"2010-11-18 00:00:00",
"2010-11-19 00:00:00",
"2010-11-22 00:00:00" ... ]
but I'm getting x values like 734094.0, 734095.0, ...
So how can I get the actual data (the x and y values of the data points) from the graph?
Is there any other way to do it? I want to extract the data points from the graph and then send them from a Flask API to the UI (Angular 4).
Thanks in advance!

734094 / 365.25 = 2009.8398. That's a very suggestive number for a date that, from your example, I assume is 2010-11-18. It looks like your date information is expressed as a floating-point number, where a difference of 1.0 corresponds to one day and the reference date for the value 0.0 is January 1, 1 AD.
You could write a function that counts days from 0001-01-01, or find one in a library. Alternatively, you could look at the converted value for a date you know and work from there.
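Since a step of 1.0 per day from 0001-01-01 matches Python's own proleptic Gregorian ordinal, the standard library can do the conversion directly. A minimal sketch (the sample value is taken from the question's output; matplotlib.dates.num2date does the same job on matplotlib versions before 3.3, but newer versions default to a 1970 epoch, so the stdlib route is the safer one here):
import datetime

# 734094.0 is the first x value from the question's mpld3 output
print(datetime.date.fromordinal(int(734094.0)))  # 2010-11-18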

Related

How to remove a series from box-plot data?

Hello, I'm trying to solve this exercise with Python and seaborn: "Use seaborn to create box plots to represent the number of pieces per decade. We will not use the decade of the 40s because it only contains one year."
The decades run from 1940 to 2010, and I would like to know how to exclude the first decade (the 1940s) from my boxplot.
Here is what I did:
piecesDecade = sns.boxplot(x="decade", y="pieces", data=lego)
but I don't know how to exclude the first decade!
Here is the output of lego:
You can just filter out the decades:
sns.boxplot(x="decade", y="pieces", data=lego[lego['year'] > 1949])
# or: data=lego[lego['decade'] != '1940s']
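If it helps, here is a minimal self-contained sketch; the toy lego frame and its values are made up to mirror the question's columns:
import pandas as pd
import seaborn as sns

# a toy stand-in for the question's lego data
lego = pd.DataFrame({
    'year':   [1949, 1955, 1968, 1975, 1982, 1999, 2003, 2010],
    'decade': ['1940s', '1950s', '1960s', '1970s', '1980s', '1990s', '2000s', '2010s'],
    'pieces': [10, 25, 40, 80, 120, 300, 450, 600],
})

# the filter drops every 1940s row before seaborn ever sees the data
sns.boxplot(x='decade', y='pieces', data=lego[lego['year'] > 1949])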

Matplotlib Live Graph - Using Time as x-axis values

I was just wondering if it is possible to use time as the x-axis values for a matplotlib live graph.
If so, how should it be done? I have been trying many different methods but end up with errors.
This is my current code:
def update_label(label):
    def getvoltage():
        f = open("VoltageReadings.txt", "a+")
        readings = []
        maxsample = 100
        counter = 0
        while counter < maxsample:
            reading = adc.read_adc(0, gain=GAIN)
            readings.append(reading)
            counter += 1
        avg = sum(readings) / maxsample
        voltage = (avg * 0.1259) / 100
        time = str(datetime.datetime.now().time())
        f.write("%.2f," % voltage + time + "\r\n")
        readings.clear()
        label.config(text=str('Voltage: {0:.2f}'.format(voltage)))
        label.after(1000, getvoltage)
    getvoltage()

def animate(i):
    pullData = open("VoltageReadings.txt", "r").read()
    dataList = pullData.split('\n')
    xList = []
    yList = []
    for eachLine in dataList:
        if len(eachLine) > 1:
            y, x = eachLine.split(',')
            xList.append(float(x))  # this float() call is what raises the ValueError
            yList.append(float(y))
    a.clear()
    a.plot(xList, yList)
This is one of the latest methods I've tried, and I'm getting an error that says:
ValueError: could not convert string to float: '17:21:55'
I've tried finding ways to convert the string into a float, but I can't seem to do it.
I'd really appreciate some help and guidance, thank you :)
I think you should use the datetime library. You can parse your timestamps with date = datetime.strptime('17:21:55', '%H:%M:%S'), but you need a reference date, for example the Unix epoch, date0 = datetime(1970, 1, 1). You can also use the starting point of your time series as date0 and parse full timestamps with date = datetime.strptime('01-01-2000 17:21:55', '%d-%m-%Y %H:%M:%S'). For each line in your file, compute the difference between the parsed date and the reference date in seconds (there are several functions to do this) and append it to a list, say Diff_List. At the end, use T_plot = [dtm.datetime.utcfromtimestamp(i) for i in Diff_List]. Finally, plt.plot(T_plot, values) will show the dates on the x-axis.
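Put together, a minimal runnable sketch of this recipe (the sample lines are made up in the question's "voltage,time" format; strptime fills in 1900-01-01 when the string carries no date part, so that is the reference used here):
import datetime as dtm
import matplotlib.pyplot as plt

# hypothetical lines in the question's "voltage,time" format
lines = ["2.31,17:21:55", "2.35,17:21:56", "2.29,17:21:57"]

date0 = dtm.datetime(1900, 1, 1)  # strptime's implicit default date
diff_list, values = [], []
for line in lines:
    y, x = line.split(',')
    t = dtm.datetime.strptime(x, '%H:%M:%S')
    diff_list.append((t - date0).total_seconds())  # seconds since the reference
    values.append(float(y))

# map the second offsets back onto datetimes so matplotlib can label the axis
t_plot = [dtm.datetime.utcfromtimestamp(i) for i in diff_list]
plt.plot(t_plot, values)
plt.gcf().autofmt_xdate()
plt.show()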
You can also use the pandas library.
First, define a date parser matching the date format in your file:
parser = lambda date: datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
Then read your file:
tmp = pd.read_csv(your_file, parse_dates={'datetime': ['date', 'time']}, date_parser=parser, comment='#', delim_whitespace=True, names=['date', 'time', 'Values'])
data = tmp.set_index(tmp['datetime']).drop('datetime', axis=1)
You can adapt these lines if you only need the time of day (HH:MM:SS) rather than the whole date.
N.B.: the index will not run from 0 to data.values.shape[0]; the dates are used as the index instead. So if you want to plot, do import matplotlib.pyplot as plt and then plt.plot(data.index, data.Values).
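As a self-contained sketch of the pandas route (the sample data is made up, and pd.to_datetime stands in for the date_parser argument, which recent pandas versions have deprecated):
import io
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical file contents in the question's "voltage,time" format
raw = io.StringIO("2.31,17:21:55\n2.35,17:21:56\n2.29,17:21:57\n")

tmp = pd.read_csv(raw, names=['Values', 'time'])
tmp['datetime'] = pd.to_datetime(tmp['time'], format='%H:%M:%S')
data = tmp.set_index('datetime').drop('time', axis=1)

plt.plot(data.index, data.Values)  # the DatetimeIndex labels the x-axis
plt.gcf().autofmt_xdate()
plt.show()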
You could use the polt Python package, which I developed for this exact purpose. polt uses matplotlib to display data from multiple sources simultaneously.
Create a script adc_read.py that reads values from your ADC and prints them out:
import random, sys, time

def read_adc():
    """
    Implement reading a voltage from your ADC here
    """
    # simulate measurement delay/sampling interval
    time.sleep(0.001)
    # simulate reading a voltage between 0 and 5V
    return random.uniform(0, 5)

while True:
    # gather 100 readings
    adc_readings = tuple(read_adc() for i in range(100))
    # calculate average
    adc_average = sum(adc_readings) / len(adc_readings)
    # output average
    print(adc_average)
    sys.stdout.flush()
which outputs
python3 adc_read.py
# output
2.3187490696344444
2.40019412977279
2.3702603804716555
2.3793495215651435
2.5596985467604703
2.5433401603774413
2.6048815735614004
2.350392397280291
2.4372325168231948
2.5618046803145647
...
This output can then be piped into polt to display the live data stream:
python3 adc_read.py | polt live
Labelling can be achieved by adding metadata:
python3 adc_read.py | \
polt \
add-source -c- -o name=ADC \
add-filter -f metadata -o set-quantity=voltage -o set-unit='V' \
live
The polt documentation contains information on possibilities for further customization.

Groupby a data set in Python

I have 30 years of daily data, and I want to calculate the average for each calendar day over those 30 years. For example, I have data like this:
1/1/2036 0
1/2/2036 73.61180115
1/3/2036 73.77733612
1/4/2036 73.61183929
1/5/2036 73.75443268
1/6/2036 73.58483887
.........
12/22/2065 73.90600586
12/23/2065 74.38092804
12/24/2065 77.76309967
I want to calculate:
1/1/yyyy ?
1/2/yyyy ?
1/3/yyyy ?
......
12/30/yyyy ?
12/31/yyyy ?
I wrote code in Python, but it only calculates the average for the first month. My dataset is 10950 x 1 and should reduce to 365 x 1. Here is my code:
import glob
import pandas as pd

files = glob.glob('*2036-2065*rcp26*.csv*')
RO_act = pd.read_csv('Reservoir storage zones_sohom.csv', index_col=0, parse_dates=True)
for i, fl in enumerate(files):
    df = pd.read_csv(fl, index_col=0, usecols=[0, 78], parse_dates=True)
    df1 = df.groupby(pd.TimeGrouper(freq='D')).mean()
Please help
You can pass a function to df.groupby that acts on the index values to form the groups. So, in your case, use:
df.groupby(lambda x: (x.day,x.month)).mean()
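For instance, a quick check of what this produces, assuming a daily DatetimeIndex like the question's (synthetic data):
import numpy as np
import pandas as pd

idx = pd.date_range('1986-01-01', '2015-12-31')
df = pd.DataFrame({'value': np.random.rand(len(idx))}, index=idx)

daily_avg = df.groupby(lambda x: (x.day, x.month)).mean()
print(daily_avg.shape)  # (366, 1): one row per calendar day, Feb 29 included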
Consider the following series s:
import numpy as np
import pandas as pd

days = pd.date_range('1986-01-01', '2015-12-31')
s = pd.Series(np.random.rand(len(days)), days)
then what you're looking for is:
s.groupby([s.index.month, s.index.day]).mean()
Timing
@juanpa.arrivillaga's answer gives the same result but is slower.
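A rough way to check that claim yourself (timings will vary by machine and pandas version):
import timeit
import numpy as np
import pandas as pd

days = pd.date_range('1986-01-01', '2015-12-31')
s = pd.Series(np.random.rand(len(days)), days)

# grouping by index arrays avoids calling a Python lambda once per timestamp
t_arrays = timeit.timeit(lambda: s.groupby([s.index.month, s.index.day]).mean(), number=20)
t_lambda = timeit.timeit(lambda: s.groupby(lambda x: (x.day, x.month)).mean(), number=20)
print(f"arrays: {t_arrays:.3f}s  lambda: {t_lambda:.3f}s")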

Loop through netcdf files and run calculations - Python or R

This is my first time using netCDF and I'm trying to wrap my head around working with it.
I have multiple version 3 netCDF files (NOAA NARR air.2m daily averages for an entire year). Each file spans a year between 1979 and 2012. They are 349 x 277 grids with approximately 32 km resolution. Data was downloaded from here.
The dimension is time (hours since 1/1/1800), and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example:
Day 1 = +4 degrees, accumulated days = 0
Day 2 = -1 degrees, accumulated days = 1
Day 3 = -2 degrees, accumulated days = 2
Day 4 = -4 degrees, accumulated days = 3
Day 5 = +2 degrees, accumulated days = 0
Day 6 = -3 degrees, accumulated days = 1
I need to store this data in a new netCDF file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous day's value, and based on that output a value to a new netCDF file with exactly the same dimensions and variable? Or perhaps just add another variable to the original netCDF file with the output I'm looking for?
Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the resulting file is 2.3 GB.
Thanks for the input.
My current progress in python:
import numpy
import netCDF4

# change my working DIR
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
    print(a)
#output =
lat
long
x
y
Lambert_Conformal
time
time_bnds
air
f.variables['air'][1, 1, 1]
#Output
298.37473
To help me understand this better: what type of data structure am I working with? Is 'air' the key in the above example, and are [1, 1, 1] also keys used to get the value 298.37473? How can I then loop through them?
You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So your code would look like this:
from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()
atemp = f.variables['air']
print(atemp)
ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny, nx), dtype=int)
for i in range(ntimes):
    cold_days += atemp[i, :, :].data - 273.15 < 0
pcolormesh(cold_days)
colorbar()
And here's one way to write the file (there might be easier ways):
# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc', 'w', clobber=True)
nco.createDimension('x', nx)
nco.createDimension('y', ny)
cold_days_v = nco.createVariable('cold_days', 'i4', ('y', 'x'))
cold_days_v.units = 'days'
cold_days_v.long_name = 'total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'
lono = nco.createVariable('lon', 'f4', ('y', 'x'))
lato = nco.createVariable('lat', 'f4', ('y', 'x'))
xo = nco.createVariable('x', 'f4', ('x'))
yo = nco.createVariable('y', 'f4', ('y'))
lco = nco.createVariable('Lambert_Conformal', 'i4')
# copy all the variable attributes from the original file
for var in ['lon', 'lat', 'x', 'y', 'Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var], att, getattr(f.variables[var], att))
# copy variable data for lon, lat, x and y
lono[:] = f.variables['lon'][:]
lato[:] = f.variables['lat'][:]
xo[:] = f.variables['x'][:]
yo[:] = f.variables['y'][:]
# write the cold_days data
cold_days_v[:, :] = cold_days
# copy global attributes from the original file
for att in f.ncattrs():
    setattr(nco, att, getattr(f, att))
nco.Conventions = 'CF-1.6'
nco.close()
If I try looking at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay:
Also note that here I just downloaded two of the datasets for testing, so I used
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
as an example. For all the data, you could use
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')
or
f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')
Here is an R solution.
infiles <- list.files("data", pattern = "nc", full.names = TRUE, include.dirs = TRUE)
outfile <- "data/air.colddays.nc"
library(raster)
r <- raster::stack(infiles)
r <- sum((r - 273.15) < 0)
plot(r)
I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't answer the exact question posed. The question asks for the length of each continuous period of temperatures below zero (note that in the question the counter resets when the temperature exceeds zero), which can be important for climate applications (e.g. farming), whereas the accepted solution only gives the total number of days in a year that the temperature is below zero. If the total is really what mkmitchell wants (it has been accepted as the answer), then it can be done from the command line in cdo without having to worry about netCDF input/output:
cdo timsum -lec,273.15 in.nc out.nc
so a looped script would be:
files=`ls *.nc`  # pick up all the netcdf files in a directory
for file in $files ; do
    # I use 273.15 since, from the question, T seems to be in Kelvin
    cdo timsum -lec,273.15 $file ${file%???}_numdays.nc
done
If you then want the total number over the whole period, you can cat the _numdays files instead, which are much smaller:
cdo cat *_numdays.nc total.nc
cdo timsum total.nc total_below_zero.nc
But again, the question seems to ask for accumulated days per event, which is different and is not provided by the accepted answer.
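For reference, here is a minimal numpy sketch of that per-event counter (the toy temperatures reproduce the question's example; applying it per grid cell along the time axis of the netCDF data is the remaining step):
import numpy as np

# one temperature per day, matching the question's example (degrees C)
temps = np.array([4.0, -1.0, -2.0, -4.0, 2.0, -3.0])

accumulated = np.zeros(len(temps), dtype=int)
run = 0
for i, t in enumerate(temps):
    run = run + 1 if t < 0 else 0  # reset whenever the day is above freezing
    accumulated[i] = run

print(accumulated)  # [0 1 2 3 0 1]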

How to show lines connecting latitude and longitude points in world map?

I have a CSV file of longitude and latitude points like the following (86 points in total):
Index lon lat
1 2.352222 48.85661
2 -72.922343 41.31632
3 108.926694 34.25005
4 -79.944163 40.44306
5 -117.328119 33.97329
6 -79.953423 40.4442
7 -84.396285 33.77562
8 -95.712891 37.09024
And now I want to plot a line from the point (32.06025, 118.7969) to each of these (lon, lat) points, like many arrowed lines radiating from one point.
I have tried to do all of this in R, and I have run into something strange. For example, if I use
map('world2Hires')
for (j in 1:length(location$lon)) {
inter <- gcIntermediate(c(lon_nj, lat_nj), c(location$lon[j], location$lat[j]), n=100, addStartEnd=TRUE)
lines(inter, col="black", lwd=0.8)
}
View(location)
The result is like this:
The map would be fine if the lines pointing to the USA crossed the Pacific Ocean, but they don't.
Do you have any idea how I can achieve this? Any tools are fine, although my experience is in Python and R.
Thank you!
First, you have to add the argument breakAtDateLine=TRUE inside the function gcIntermediate(). This ensures that if a line crosses the date line, the function produces two segments rather than connecting the points with a straight line. I stored all results of this calculation in the list gg. This list contains a data frame for each line, or a list of two data frames when a line consists of two segments.
library(mapdata)
library(geosphere)
lon_nj<-118.7969
lat_nj<-32.06025
location<-structure(list(Index = 1:8, lon = c(2.352222, -72.922343, 108.926694,
-79.944163, -117.328119, -79.953423, -84.396285, -95.712891),
lat = c(48.85661, 41.31632, 34.25005, 40.44306, 33.97329,
40.4442, 33.77562, 37.09024)), .Names = c("Index", "lon",
"lat"), class = "data.frame", row.names = c(NA, -8L))
gg<-lapply(1:length(location$lon),function(j) {
gcIntermediate(c(lon_nj, lat_nj), c(location$lon[j],
location$lat[j]), n=100,
breakAtDateLine=TRUE,
addStartEnd=TRUE)
})
This will change your list so that each segment is in a separate data frame rather than in a list of lists:
gg2<-unlist(lapply(gg, function(x)
if (class(x) == "list") x else list(x)), recursive=FALSE)
To plot those data again you can use the function lapply().
If you use map("world"), then just do:
map("world")
lapply(gg2,lines)
If you use map('world2Hires'), note that this map is based on 0-360 longitudes, so you have to add 360 to the x coordinate values that are negative:
map('world2Hires')
lapply(gg2,function(x) lines(ifelse(x[,1]>0,x[,1],x[,1]+360),x[,2]))
