EDIT 2
I fixed one part of the code that was wrong. With this line, I add the category to every data point (x axis):
y = joy(cat, EveryTest[i].GPS)
After adding that line, the graph improved, but something is still wrong: the graph starts at the 4th category (12:40:00) when it should start at the first one (12:10:00). What am I doing wrong?
EDIT 1:
I updated Bokeh to 0.12.13, and that fixed the label problem.
Now my problem is:
I expected the loop (for i, cat in enumerate(reversed(cats)):) to put each chart on its own label, but that does not happen. The charts get stuck at the 5th or 6th label (12:30:00 or 12:50:00).
- Start of question -
I am trying to reproduce the joyplot example, but I have trouble when I try to plot my own data. I don't want to plot a histogram; I want to plot one list on X and another list on Y. I do not understand what I am doing wrong.
The code (fixed):
from numpy import linspace
from scipy.stats import gaussian_kde
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FixedTicker, PrintfTickFormatter
from bokeh.plotting import figure
#from bokeh.sampledata.perceptions import probly
import colorcet as cc
# mode="inline" embeds BokehJS in the HTML file instead of loading it from a CDN
output_file("joyplot.html", mode="inline")

def joy(category, data, scale=20):
    # pair every value with its category so Bokeh reads it as a categorical offset
    return list(zip([category]*len(data), data))

#Elements = 7
cats = ListOfTime # list(reversed(probly.keys())) #list(['Pos_1','Pos_2']) #
print len(cats), ' length of times'
palette = [cc.rainbow[i*15] for i in range(16)]
palette += palette
print len(palette), 'length palette'
x = X # linspace(-20,110, 500) #Test.X #
print len(x), ' length X'
source = ColumnDataSource(data=dict(x=x))
p = figure(y_range=cats, plot_width=900, x_range=(0, 1500), toolbar_location=None)
for i, cat in enumerate(reversed(cats)):
    y = joy(cat, EveryTest[i].GPS)
    #print cat
    source.add(y, cat)
    p.patch('x', cat, color=palette[i], alpha=0.6, line_color="black", source=source)
    #break
print source
p.outline_line_color = None
p.background_fill_color = "#efefef"
p.xaxis.ticker = FixedTicker(ticks=list(range(0, 1500, 100)))
#p.xaxis.formatter = PrintfTickFormatter(format="%d%%")
p.ygrid.grid_line_color = None
p.xgrid.grid_line_color = "#dddddd"
p.xgrid.ticker = p.xaxis[0].ticker
p.axis.minor_tick_line_color = None
p.axis.major_tick_line_color = None
p.axis.axis_line_color = None
#p.y_range.range_padding = 0.12
#p
show(p)
The variables are:
print X, type(X)
[ 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75
78 81 84 87 90 93 96 99] <type 'numpy.ndarray'>
and
print EveryTest[0].GPS, type(EveryTest[i].GPS)
0 2
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 2
10 2
11 2
12 2
13 2
14 2
15 2
16 2
17 2
18 2
19 2
20 2
21 2
22 2
23 2
24 2
25 2
26 2
27 2
28 2
29 2
30 2
31 2
32 2
Name: GPS, dtype: int64 <class 'pandas.core.series.Series'>
Following the example, the data types look right. But I get the following image:
And I expected something like this:
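A possible explanation, hedged: in the Bokeh joyplot example each y value is a (category, offset) pair, and as far as I understand Bokeh's categorical offsets, the offset is measured in category steps above that category's centre. Every GPS value here is 2, so each curve would be drawn two rows above its own label, and because the values are constant the patch is a flat line with no area (the example's KDE curve falls to about 0 at both ends, so its polygon closes on its own). The sketch below, using a hypothetical helper `ridge` and an assumed `scale` of 0.8, normalises the values to the series peak and adds a zero point at each end so every ridge closes on its own baseline. It is also worth checking that `EveryTest[i]` really belongs to the i-th element of `reversed(cats)`.

```python
def ridge(category, xs, values, scale=0.8):
    """Build closed patch coordinates for one ridge of a joy plot.

    Each y value is a (category, offset) pair. Offsets are normalised to the
    series peak and multiplied by `scale`, so the tallest point stays below
    the next category. A zero point is added at both ends so the polygon
    closes on the category's baseline.
    """
    xs, values = list(xs), list(values)
    if len(xs) != len(values):
        raise ValueError("xs and values must have the same length")
    peak = max((abs(v) for v in values), default=0) or 1  # guard against all-zero data
    patch_x = [xs[0]] + xs + [xs[-1]]
    patch_y = (
        [(category, 0.0)]
        + [(category, scale * v / peak) for v in values]
        + [(category, 0.0)]
    )
    return patch_x, patch_y


px, py = ridge("12:10:00", [3, 6, 9], [2, 2, 2])
print(px)  # [3, 3, 6, 9, 9]
print(py)
```

Inside the loop, each ridge could then be drawn with something like p.patch(px, py, color=palette[i], alpha=0.6, line_color="black"), passing lists directly instead of adding columns to a shared ColumnDataSource.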
I am trying to use a Canadian gridded historical dataset of temperature anomalies, but it seems that I don't have the skills to pull that off. The .grd files contain temperature anomalies on what I believe is a highly regular grid. I have no experience with that kind of grid, and I am having trouble building the xarray dataset.
What I have (a subset of the .grd files and the text file is accessible here):
2075 .grd files ('t190001.grd' to 't202112.grd', following the "t{year}{month}.grd" structure)
1 .txt file listing the grid coordinates, called "CANGRD_points_LL.txt"
From this, I would like to build an xarray dataset in order to do some analysis.
Naively, I thought the grid files were already georeferenced, so I started by doing this:
import glob
import rioxarray as rio
import pandas as pd
import numpy as np
import xarray as xr
#not used for the moment even though I believe that will be needed
#df = pd.read_csv(r"CANGRD_points_LL.txt", sep = ' ', header=None)
list_files = sorted(set(glob.glob(r"t?????[0-2].grd" ) + glob.glob(r"t????0[0-9].grd" )))
times = pd.date_range("1900/01/01",freq='M', periods= len(list_files))
datarrays = [rio.open_rasterio(rst, masked=True,band_as_variable=True).assign_coords(time = t).expand_dims(dim='time').squeeze() for rst,t in zip(list_files, times)]
ds = xr.concat(datarrays,dim='time').rename({'band_1' : 'tas', 'y': 'lat', 'x' : 'lon'})
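Because the code above matches files to pd.date_range purely by sort order, a missing or extra file would shift every later timestamp. A hedged alternative, assuming every name follows the t{year}{month}.grd pattern, is to read the date from the file name itself. `GRD_NAME` and `time_from_name` are hypothetical helpers, and this sketch stamps each month with its first day, whereas freq='M' in the question uses month ends.

```python
import re

import pandas as pd

# Matches names such as "t190001.grd": a 4-digit year, then a 2-digit month.
GRD_NAME = re.compile(r"t(\d{4})(\d{2})\.grd$")


def time_from_name(path):
    """Return the first day of the month encoded in a t{year}{month}.grd name."""
    match = GRD_NAME.search(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    year, month = map(int, match.groups())
    return pd.Timestamp(year=year, month=month, day=1)


# Example names; in practice this list would come from glob.glob("t??????.grd").
list_files = ["t190002.grd", "t190001.grd", "t202112.grd"]
list_files = sorted(list_files, key=time_from_name)
times = [time_from_name(f) for f in list_files]
print(list_files)
print(times)
```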
But when I plotted the results, it became evident that my coordinates were only the pixel indices:
So I believe I have to use the provided txt file. However, I have no idea how to build the xarray grid from the grid's coordinates, or how to match it with the array I get by loading a grid via rioxarray. Here is a sample; the complete file is available above. What baffles me is that most of the 11874 lines of the DataFrame built from the txt file seem to be unique, so how could I fit an array of dimensions 125 lon by 95 lat into it?
0 1 2 3
0 0 0 40.0451 -129.8530
1 0 1 40.1780 -129.3650
2 0 2 40.3080 -128.8740
3 0 3 40.4348 -128.3801
4 0 4 40.5585 -127.8834
5 0 5 40.6790 -127.3840
6 0 6 40.7963 -126.8817
7 0 7 40.9104 -126.3768
8 0 8 41.0211 -125.8693
9 0 9 41.1286 -125.3591
10 0 10 41.2327 -124.8465
11 0 11 41.3335 -124.3314
12 0 12 41.4308 -123.8140
13 0 13 41.5247 -123.2942
14 0 14 41.6151 -122.7722
15 0 15 41.7020 -122.2481
16 0 16 41.7853 -121.7218
17 0 17 41.8651 -121.1936
18 0 18 41.9413 -120.6634
19 0 19 42.0139 -120.1313
20 0 20 42.0828 -119.5975
21 0 21 42.1481 -119.0620
22 0 22 42.2097 -118.5249
23 0 23 42.2675 -117.9863
24 0 24 42.3216 -117.4462
25 0 25 42.3720 -116.9049
26 0 26 42.4186 -116.3622
27 0 27 42.4614 -115.8185
28 0 28 42.5005 -115.2736
29 0 29 42.5357 -114.7279
30 0 30 42.5670 -114.1812
31 0 31 42.5946 -113.6338
32 0 32 42.6182 -113.0857
33 0 33 42.6381 -112.5371
34 0 34 42.6540 -111.9880
35 0 35 42.6661 -111.4385
36 0 36 42.6743 -110.8888
37 0 37 42.6786 -110.3389
38 0 38 42.6791 -109.7889
39 0 39 42.6757 -109.2390
40 0 40 42.6684 -108.6892
41 0 41 42.6572 -108.1397
42 0 42 42.6421 -107.5905
43 0 43 42.6232 -107.0417
44 0 44 42.6004 -106.4935
45 0 45 42.5738 -105.9459
46 0 46 42.5433 -105.3991
47 0 47 42.5090 -104.8531
48 0 48 42.4708 -104.3081
49 0 49 42.4289 -103.7640
Here is the view of one grid file loaded as an xarray object:
Any help would be greatly appreciated! Thank you so much.
I asked directly on the xarray GitHub Discussions; here is the original answer from Keewis:
https://github.com/pydata/xarray/discussions/7443#discussioncomment-4700261
The grid file contains stacked 2D coordinates, which I guess is due to the grid's original coordinate system not being aligned with the lat / lon axes.
To read the coordinates into 2D coordinates you can use:
df = pd.read_csv(r"CANGRD_points_LL.txt", sep=" ", header=None, names=["y", "x", "lat", "lon"])
grid = df.set_index(["y", "x"]).to_xarray().set_coords(["lat", "lon"])
raw = xr.concat([...], dim="time")
ds = xr.merge([raw, grid]).assign_coords(time=times).rename_vars(...)
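To see what the set_index(["y", "x"]).to_xarray() step does, here is a minimal sketch on synthetic data (the 3 x 4 grid, latitudes, longitudes and anomaly values are all invented). It assumes the first two columns of CANGRD_points_LL.txt are the row and column indices of the grid, as the names in the answer suggest. The merge only lines up if the raw array's y/x labels are the same integer indices; if rioxarray assigned different labels (for example half-pixel offsets) or the rows run in the opposite direction, reassign or flip them before merging.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for CANGRD_points_LL.txt: one row per (y, x) grid index.
ny, nx = 3, 4
rows = [(j, i, 40.0 + 0.1 * j + 0.01 * i, -130.0 + 0.5 * i)
        for j in range(ny) for i in range(nx)]
df = pd.DataFrame(rows, columns=["y", "x", "lat", "lon"])

# Unstack the (y, x) MultiIndex into 2D lat/lon arrays and mark them as coordinates.
grid = df.set_index(["y", "x"]).to_xarray().set_coords(["lat", "lon"])

# Synthetic stand-in for the concatenated .grd data, with dims (time, y, x).
times = pd.date_range("1900-01-01", periods=2, freq="MS")
raw = xr.DataArray(
    np.random.default_rng(0).normal(size=(2, ny, nx)),
    dims=("time", "y", "x"),
    coords={"time": times, "y": np.arange(ny), "x": np.arange(nx)},
    name="tas",
)
# If the real array carries other y/x labels, something like
# raw.assign_coords(y=np.arange(raw.sizes["y"]), x=np.arange(raw.sizes["x"]))
# would align it with the integer indices from the text file.

ds = xr.merge([raw, grid])
print(ds)
```

The result keeps the y/x index dimensions and carries lat and lon as 2D coordinates, which is what a grid that is not aligned with the lat/lon axes needs.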
I have a dataset with three inputs: x1, x2 and x3. I want to read only the x2 column, and go through its data one row at a time.
I wrote some code, but it only prints letters.
Here is my code:
import pandas as pd
data = pd.read_csv('data6.csv')
row_num =0
x=[]
for col in data:
    if (row_num==1):
        x.append(col[0])
    row_num =+ 1
print(x)
Result: x1,x2,x3
The output I expected:
The x2 values, read one row at a time:
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
A subset of my CSV file:
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me fix this?
If you want a list of the values in column x2, use:
x = data['x2'].tolist()
I am not sure I even get what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Here is your code with my best guess at the indentation:
import pandas as pd
data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
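Both points above can be checked on a tiny frame. The sketch below builds a hypothetical DataFrame with the same column names as data6.csv; it shows that iterating over a DataFrame yields column labels, so col[0] is just the first character of each label, and that `=+` assigns instead of incrementing.

```python
import pandas as pd

# A small frame with the same column names as data6.csv.
data = pd.DataFrame({"x1": [6, 5], "x2": [65, 32], "x3": [78, 59]})

# Iterating over a DataFrame yields the column labels, not the rows,
# so col[0] is the first character of each label.
labels = [col for col in data]
first_chars = [col[0] for col in data]
print(labels, first_chars)  # ['x1', 'x2', 'x3'] ['x', 'x', 'x']

# `row_num =+ 1` is parsed as `row_num = +1`: it assigns 1 on every pass
# instead of adding 1.
row_num = 0
for _ in range(3):
    row_num =+ 1
print(row_num)  # 1
```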
What you probably want to do:
import pandas as pd
data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
Since you are using pandas, you can select a specific column and convert its values directly with list(); no for loop is needed:
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))
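The sample in the question looks whitespace-separated, while read_csv splits on commas by default. A minimal sketch, assuming the real file really is whitespace-separated (drop sep=r"\s+" if it is a normal comma-separated file), that reads only the x2 column with usecols and walks through it row by row; the inline text is a stand-in for data6.csv built from the first sample rows.

```python
import io

import pandas as pd

# Stand-in for data6.csv, using the first rows of the sample in the question.
csv_text = "x1 x2 x3\n6 65 78\n5 32 59\n5 14 547\n"
data = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", usecols=["x2"])

# Iterating over a single column (a Series) yields its values one row at a time.
for value in data["x2"]:
    print(value)

x2_values = data["x2"].tolist()
```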
When I try to plot a bar plot (of histograms) using pd.cut, a strange (and very annoying!) 0.001 is added on the left of the axis, so the first interval starts at -1.001 instead of -1. How do I get rid of this? (Please see the figure.)
My code is:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
out_i = pd.cut(df, bins=np.arange(-1,1.2,0.2), include_lowest=True)
out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
plt.tight_layout()
where df is:
a
0 -0.402203
1 -0.019031
2 -0.979292
3 -0.701221
4 -0.267261
5 -0.563602
7 -0.454961
8 0.632456
9 -0.843081
10 -0.629253
11 -0.946188
12 -0.628178
13 -0.776933
14 -0.717091
15 -0.392144
16 -0.799408
17 -0.897951
18 0.255321
19 -0.641854
20 -0.356393
21 -0.507321
22 -0.698238
23 -0.985097
25 -0.661444
26 -0.751593
27 -0.437505
28 -0.413451
29 -0.798745
30 -0.736440
31 -0.672727
32 -0.807688
33 -0.087085
34 -0.393203
35 -0.979730
36 -0.902951
37 -0.454231
38 -0.561951
39 -0.388580
40 -0.706501
41 -0.408248
42 -0.377235
43 -0.283110
44 -0.517428
45 -0.949603
46 -0.268667
47 -0.376199
48 -0.472293
49 -0.211781
50 -0.921520
51 -0.345870
53 -0.542487
55 -0.597996
If it is acceptable to truncate the decimals of the interval edges, generate a custom list of interval labels and set it as the xticklabels of the plot:
out_i = pd.cut(df['a'], bins=np.arange(-1,1.2,0.2), include_lowest=True)
intervals = out_i.cat.categories
labels = ['(%.1f, %.1f]' % (int(interval.left*100)/100, interval.right) for interval in intervals]
ax = out_i.value_counts(sort=False).plot.bar(rot=45, figsize=(6,6))
ax.set_xticklabels(labels)
plt.tight_layout()
Which results in the following plot:
Note: this always outputs half-closed intervals (a, b]. It could be improved by choosing the brackets dynamically from the parameters passed to pd.cut.
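Another option, sketched under the assumption that right-closed bins are used as in the question, is to pass explicit labels to pd.cut through its labels parameter, so the category names never contain the adjusted -1.001 edge. `interval_labels` is a hypothetical helper that also picks the first bracket from include_lowest, along the lines of the note above. np.linspace is used here to get exactly 11 edges from -1 to 1; plotting is left out so the sketch runs without matplotlib.

```python
import numpy as np
import pandas as pd


def interval_labels(bins, include_lowest=False, fmt="{:.1f}"):
    """Return labels like "(a, b]" for right-closed bins.

    The first label gets "[" when include_lowest=True, mirroring the
    argument passed to pd.cut.
    """
    labels = []
    for k, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
        left = "[" if include_lowest and k == 0 else "("
        labels.append(f"{left}{fmt.format(lo)}, {fmt.format(hi)}]")
    return labels


bins = np.linspace(-1, 1, 11)  # 11 edges: -1.0, -0.8, ..., 1.0
labels = interval_labels(bins, include_lowest=True)

values = pd.Series([-1.0, -0.402203, 0.632456])  # a few values from the df above
binned = pd.cut(values, bins=bins, labels=labels, include_lowest=True)
counts = binned.value_counts(sort=False)  # categories stay in bin order
print(counts)
```

With the question's data, the plotting call would then be pd.cut(df['a'], bins=bins, labels=labels, include_lowest=True).value_counts(sort=False).plot.bar(rot=45, figsize=(6, 6)).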
I'm attempting to make a heat map just like this one using Bokeh.
Here is my DataFrame, Data, from which I'm trying to make the heat map:
Day Code Total
0 1 6001 44
1 1 6002 40
2 1 6006 8
3 1 6008 2
4 1 6010 38
5 1 6011 1
6 1 6014 19
7 1 6018 1
8 1 6019 1
9 1 6023 10
10 1 6028 4
11 2 6001 17
12 2 6010 2
13 2 6014 4
14 2 6020 1
15 2 6028 2
16 3 6001 48
17 3 6002 24
18 3 6003 1
19 3 6005 1
20 3 6006 2
21 3 6008 18
22 3 6010 75
23 3 6011 1
24 3 6014 72
25 3 6023 34
26 3 6028 1
27 3 6038 3
28 4 6001 19
29 4 6002 105
30 5 6001 52
...
And here is my code:
from bokeh.io import output_file
from bokeh.io import show
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LinearColorMapper
)
from bokeh.plotting import figure
output_file('SHM_Test.html', title='SHM', mode='inline')
source = ColumnDataSource(Data)
TOOLS = "hover,save"
# Creating the Figure
SHM = figure(title="HeatMap",
             x_range=[str(i) for i in range(1,32)],
             y_range=[str(i) for i in range(6043,6000,-1)],
             x_axis_location="above", plot_width=400, plot_height=970,
             tools=TOOLS, toolbar_location='right')
# Figure Styling
SHM.grid.grid_line_color = None
SHM.axis.axis_line_color = None
SHM.axis.major_tick_line_color = None
SHM.axis.major_label_text_font_size = "5pt"
SHM.axis.major_label_standoff = 0
SHM.toolbar.logo = None
SHM.title.text_alpha = 0.3
# Color Mapping
CM = LinearColorMapper(palette='RdPu9', low=Data.Total.min(), high=Data.Total.max())
SHM.rect(x='Day', y="Code", width=1, height=1, source=source,
         fill_color={'field': 'Total', 'transform': CM})
show(SHM)
When I execute my code I don't get any errors, but I just get an empty frame, as shown in the image below.
I've been struggling to find my mistake. Why am I getting this? Where is my error?
The problem with your code is that the data type you use for the x and y axis ranges differs from the data type in your ColumnDataSource. You are setting x_range and y_range to lists of strings, but judging from your data, the Day and Code columns are integers.
In your case, you want to make sure that your Day and Code columns are in string format.
This is easy to do with pandas:
Data['Day'] = Data['Day'].astype('str')
Data['Code'] = Data['Code'].astype('str')
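A quick way to catch this kind of mismatch before plotting is to check which values do not appear among the range's factors. As far as I know, Bokeh only places a glyph on a categorical axis when its value matches one of the factors, and the integer 1 is not equal to the string "1". The sketch uses a few rows from the Data frame above; `unmatched` is a hypothetical helper.

```python
import pandas as pd

# A few rows from the Data frame in the question.
Data = pd.DataFrame({"Day": [1, 1, 3], "Code": [6001, 6002, 6038], "Total": [44, 40, 3]})

x_factors = [str(i) for i in range(1, 32)]           # x_range from the question
y_factors = [str(i) for i in range(6043, 6000, -1)]  # y_range from the question


def unmatched(values, factors):
    """Return the values that do not appear among the categorical factors."""
    return sorted(set(values) - set(factors), key=str)


print(unmatched(Data["Day"], x_factors))  # integers: none of them match

Data["Day"] = Data["Day"].astype(str)
Data["Code"] = Data["Code"].astype(str)
print(unmatched(Data["Day"], x_factors), unmatched(Data["Code"], y_factors))  # [] []
```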
I have a list of 148 entries. Each entry is a four-digit number. I would like to print the result like this:
1 14 27 40
2 15 28 41
3 16 29 42
4 17 30 43
5 18 31 44
6 19 32 45
7 20 33 46
8 21 34 47
9 22 35 48
10 23 36 49
11 24 37 50
12 25 38 51
13 26 39 52
53
54
55... and so on
I have some code that works for the first 13 rows and 4 columns:
kort_identifier = [my_list_with_the_entries]
print_val = 0
print_num_1 = 0
print_num_2 = 13
print_num_3 = 26
print_num_4 = 39
while (print_val <= 36):
    print kort_identifier[print_num_1], '%10s' % kort_identifier[print_num_2], '%10s' % kort_identifier[print_num_3], '%10s' % kort_identifier[print_num_4]
    print_val += 1
    print_num_1 += 1
    print_num_2 += 1
    print_num_3 += 1
    print_num_4 += 1
I feel this is an awful solution, and there has to be a better and simpler way of doing this. I have searched here (for printing tables and matrices) and tried those solutions, but none of them seems to work with the odd table/matrix layout that I need.
Please point me in the right direction.
A bit tricky, but here you go. I opted to manipulate the list until it had the right shape, instead of messing around with indexes.
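from itertools import izip_longest
lst = range(1, 149)
lst = [lst[i:i+13] for i in xrange(0, len(lst), 13)]
# izip_longest pads the shorter last column with '' instead of truncating
# the other columns, as zip would
lst = izip_longest(*[lst[i] + lst[i+4] + lst[i+8] for i in xrange(4)], fillvalue='')
for row in lst:
    for col in row:
        print col,
    print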
It might be overkill, but you could just make a numpy array.
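import numpy as np
# pad to a multiple of 52 (13 rows x 4 columns) so the reshape works for 148 entries
n_blocks = -(-len(kort_identifier) // 52)
padded = list(kort_identifier) + [''] * (n_blocks * 52 - len(kort_identifier))
# each block holds 4 columns of 13 entries; transposing gives 13 rows of 4
x = np.array(padded, dtype=object).reshape(n_blocks, 4, 13)
for block in x:
    for row in block.T:
        print row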
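For Python 3, the same column-major layout can be produced without index bookkeeping by computing, for each row of a 13 x 4 block, which items fall into it. `column_blocks` is a hypothetical helper; it returns the formatted lines instead of printing them, and a short last column simply leaves its trailing cells empty.

```python
def column_blocks(items, rows=13, cols=4, width=10):
    """Lay items out column by column in blocks of rows x cols.

    Inside a block, column c holds the c-th run of `rows` items, which gives
    the 1/14/27/40 layout from the question. Returns formatted lines.
    """
    items = list(items)
    per_block = rows * cols
    lines = []
    for start in range(0, len(items), per_block):
        block = items[start:start + per_block]
        for r in range(rows):
            # only the last column(s) of the final block can be short
            cells = [block[c * rows + r] for c in range(cols) if c * rows + r < len(block)]
            if cells:
                lines.append("".join(str(cell).ljust(width) for cell in cells).rstrip())
    return lines


for line in column_blocks(range(1, 149)):
    print(line)
```

With the real data, the call would be column_blocks(kort_identifier).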