How to plot numbers from an array as annotation using matplotlib? - python

I am trying to produce a map on Basemap using values extracted from meteorological data. Sample code:
import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig, ax = plt.subplots()
ax.scatter(z, y)
# label each point with the matching entry of n
for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
The data I am using is a numpy array. I don't know how to loop through the array to plot a map similar to the one above. I would like to plot only the values (i.e. no contour or contourf).
Initially I was trying to plot float values using the pylab.plot function. However, it returned the error
ValueError: third arg must be a format string
Then I tried to convert this numpy array to string and then plot with this command:-
temperature = np.array2string(data, precision=2)
and the print statement looks like a modified string:-
print temperature
[[ 19.69 21.09 21.57 21.45 20.59 20.53 20.93 20.63 20.64 21.26
21.29 20.63 20.98 21.01 20.84 20.81 20.55 20.33 20.52 20.23
19.84]
[ 20.77 21.35 20.81 20.64 20.9 20.78 20.79 23.57 20.11 21.07
21.06 21.33 21.48 21.18 21.4 21.09 20.5 20.31 20.12 19.8
19.97]
[ 21.51 21.23 20.55 20.08 20.05 20.78 21.17 24.77 21.17 20.95
21.43 21.47 21.46 21.77 21.69 21.13 20.47 20.04 20.08 20.37
20.14]
[ 21.29 21.1 20.63 20.32 20.22 20.37 24.4 23.82 22.23 21.03
22.11 22.62 22.71 22.37 21.73 21.35 21.03 20.67 20.58 20.89
20.93]
[ 21.24 21.04 20.68 20.56 20.76 20.91 24.26 23.75 23.28 21.26
21.48 22. 21.94 21.78 21.36 21.14 20.96 20.92 21.1 21.19
21.31]
[ 20.83 20.88 20.6 20.87 21.01 21.91 22.33 22.21 21.74 20.66
20.76 20.73 21.04 21.09 20.83 20.7 20.72 20.71 21.23 21.04
20.73]
[ 20.32 20.41 20.19 20.05 20.68 22.17 21.82 20.67 19.85 19.02
18.91 19.6 20.15 20.64 20.64 20.09 19.81 19.76 19.9 19.94
19.46]
[ 19.68 20.37 20.56 20.68 20.93 21.28 21.24 20.33 20.7 20.
18.72 18.94 19.56 19.57 19.83 19.74 19.17 18.53 18.1 18.72
19.12]
[ 18.88 19.71 20.77 20.81 20.32 21.58 20.96 21.33 21.2 20.17
19.95 22.05 19.72 19.85 19.3 18.75 18.69 18.44 17.57 17.2
18.22]
[ 19.11 19.19 20.13 20.78 21.25 21.98 21.15 20.96 20.66 20.14
20.51 21.92 20.36 20.27 19. 18.22 17.81 17.58 17.16 16.67
17.46]
[ 18.5 19.28 19.57 20.01 21.16 21.01 21.06 20.93 20.62 19.89
20.3 20.7 19.7 19.76 18.24 17. 16.36 16.63 17.62 17.32
17.38]
[ 17.6 18.33 20.27 19.97 20.63 20.51 21.09 21.39 20.81 19.55
20. 18.3 17.32 18.24 17.57 17.15 16.42 15.76 16.14 16.45
21.95]
[ 17.04 17.55 18.16 18.32 21.23 20.5 20.41 19.82 20.7 20.55
20.41 18.47 18.05 17.63 17.11 15.6 16.02 15.46 14.29 13.88
23.04]]
Finally, I got this error when I tried to plot the above values on a map with this line:
pylab.plot(x, y, temperature)
'Unrecognized character %c in format string' % c)
ValueError: Unrecognized character [ in format string
Problem seems to be with nparray to string conversion.
Any help to solve this issue is appreciated.

Your original approach with ax.annotate works fine for the more general case too. The only change is that for 2D arrays you need to flatten them before looping over them, using np.ravel() (which is also a method of the ndarray class).
However, in your specific case you can avoid explicit indexing and ravel() altogether by iterating over an np.broadcast of the three arrays you need to plot:
import numpy as np
import matplotlib.pyplot as plt
# generate some dummy data
rng = np.random.default_rng()
z, y = np.mgrid[:3, :3]
n = rng.integers(low=50, high=500, size=z.shape)
fig, ax = plt.subplots()
ax.scatter(z, y)
for zz, yy, txt in np.broadcast(z, y, n):
    ax.annotate(txt, (zz, yy))
Note that the result of np.broadcast is the same as if we'd used zip(z.ravel(), y.ravel(), n.ravel()).
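To make that equivalence concrete for a 2D array of floats like the temperature grid in the question, here is a minimal sketch. The 2x3 grid, the coordinate values and the names x, triples and flat are made up for illustration, and the two-decimal label format is an assumption about what should be shown on the map.

```python
import numpy as np

# Hypothetical 2x3 grid of coordinates and values (not the asker's data).
x, y = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))
temperature = np.array([[19.69, 21.09, 21.57],
                        [20.77, 21.35, 20.81]])

# np.broadcast walks the three arrays in lockstep, in row-major order,
# so no explicit indexing or flattening is needed.
triples = [(float(xx), float(yy), f"{tt:.2f}")
           for xx, yy, tt in np.broadcast(x, y, temperature)]

# The same triples built by flattening each array first.
flat = [(float(xx), float(yy), f"{tt:.2f}")
        for xx, yy, tt in zip(x.ravel(), y.ravel(), temperature.ravel())]

print(triples == flat)
print(triples[0])
```

Each triple can then be passed to ax.annotate(label, (xx, yy)), which avoids converting the whole array to a single string with np.array2string.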

Related

Animate px.line line with plotly express

I'm learning Plotly line animation and have come across this problem.
My df:
         Date   1Mo   2Mo   3Mo   6Mo   1Yr   2Yr
0  2023-02-12  4.66  4.77  4.79  4.89  4.50  4.19
1  2023-02-11  4.66  4.77  4.77  4.90  4.88  4.49
2  2023-02-10  4.64  4.69  4.72  4.88  4.88  4.79
3  2023-02-09  4.62  4.68  4.71  4.82  4.88  4.89
4  2023-02-08  4.60  4.61  4.72  4.83  4.89  4.89
How do I animate this dataframe so the frame has
x = [1Mo, 2Mo, 3Mo, 6Mo, 1Yr, 2Yr], and
y = the actual value on a date, eg y=df[df['Date']=="2023-02-08"], animation_frame = df['Date']?
I tried
plot = px.line(df, x=df.columns[1:], y=df['Date'], title="Treasury Yields", animation_frame=df_treasuries_yield['Date'])
No joy :(
I think the problem is you cannot pass multiple columns to the animation_frame parameter. But we can get around this by converting your df from wide to long format using pd.melt. For your data, we will want to take all of the values from [1Mo, 2Mo, 3Mo, 6Mo, 1Yr, 2Yr] and put them in a new column called "value", with a column called "variable" to tell us which column each value came from.
df_long = pd.melt(df, id_vars=['Date'], value_vars=['1Mo', '2Mo', '3Mo', '6Mo', '1Yr', '2Yr'])
This will look like the following:
          Date variable  value
0   2023-02-12      1Mo   4.66
1   2023-02-11      1Mo   4.66
2   2023-02-10      1Mo   4.64
3   2023-02-09      1Mo   4.62
4   2023-02-08      1Mo   4.60
...
28  2023-02-09      2Yr   4.89
29  2023-02-08      2Yr   4.89
Now you can pass the argument animation_frame='Date' to px.line:
fig = px.line(df_long, x="variable", y="value", animation_frame="Date", title="Yields")
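As a self-contained check of the reshaping step, the sketch below rebuilds the sample frame from the question and melts it. The column names "Maturity" and "Yield" are illustrative replacements for the default "variable" and "value", not names from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2023-02-12", "2023-02-11", "2023-02-10", "2023-02-09", "2023-02-08"],
    "1Mo": [4.66, 4.66, 4.64, 4.62, 4.60],
    "2Mo": [4.77, 4.77, 4.69, 4.68, 4.61],
    "3Mo": [4.79, 4.77, 4.72, 4.71, 4.72],
    "6Mo": [4.89, 4.90, 4.88, 4.82, 4.83],
    "1Yr": [4.50, 4.88, 4.88, 4.88, 4.89],
    "2Yr": [4.19, 4.49, 4.79, 4.89, 4.89],
})

# Wide to long: one row per (Date, maturity) pair.
# var_name and value_name are optional; the names here are illustrative.
df_long = pd.melt(df, id_vars=["Date"], var_name="Maturity", value_name="Yield")

# The rows that make up a single animation frame.
one_day = df_long[df_long["Date"] == "2023-02-08"]
print(one_day)
```

With these names the plotting call would become px.line(df_long, x="Maturity", y="Yield", animation_frame="Date"). Each frame then draws the six rows that share one date.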

matplotlib for stock data analysis plot not correct

I'm using matplotlib to draw trend lines for stock data.
import pandas as pd
import matplotlib.pyplot as plt
A = pd.read_csv('daily/A.csv', index_col=[0])
print(A)
AAL = pd.read_csv('daily/AAL.csv', index_col=[0])
print(AAL)
A['Close'].plot()
AAL['Close'].plot()
plt.show()
The result is:
             High    Low   Open  Close       Volume  Adj Close
Date
1999-11-18  35.77  28.61  32.55  31.47   62546300.0      27.01
1999-11-19  30.76  28.48  30.71  28.88   15234100.0      24.79
1999-11-22  31.47  28.66  29.55  31.47    6577800.0      27.01
1999-11-23  31.21  28.61  30.40  28.61    5975600.0      24.56
1999-11-24  30.00  28.61  28.70  29.37    4843200.0      25.21
...           ...    ...    ...    ...          ...        ...
2020-06-24  89.08  86.32  89.08  86.56    1806600.0      86.38
2020-06-25  87.35  84.80  86.43  87.26    1350100.0      87.08
2020-06-26  87.56  85.52  87.23  85.90    2225800.0      85.72
2020-06-29  87.36  86.11  86.56  87.29    1302500.0      87.29
2020-06-30  88.88  87.24  87.33  88.37    1428931.0      88.37
[5186 rows x 6 columns]
             High    Low   Open  Close       Volume  Adj Close
Date
2005-09-27  21.40  19.10  21.05  19.30     961200.0      18.19
2005-09-28  20.53  19.20  19.30  20.50    5747900.0      19.33
2005-09-29  20.58  20.10  20.40  20.21    1078200.0      19.05
2005-09-30  21.05  20.18  20.26  21.01    3123300.0      19.81
2005-10-03  21.75  20.90  20.90  21.50    1057900.0      20.27
...           ...    ...    ...    ...          ...        ...
2020-06-24  13.90  12.83  13.59  13.04  140975500.0      13.04
2020-06-25  13.24  12.18  12.53  13.17  117383400.0      13.17
2020-06-26  13.29  12.13  13.20  12.38  108813000.0      12.38
2020-06-29  13.51  12.02  12.57  13.32  114650300.0      13.32
2020-06-30  13.48  12.88  13.10  13.07   68669742.0      13.07
[3715 rows x 6 columns]
Yes, the two stocks start on different dates, but the end date is the same.
So the plot I get looks like this:
(image: stockplot)
This does not look like a normal stock chart.
Can anyone advise how to draw normal trend lines for the two stocks?
You can try making two separate plots with the same axis limits and then overlaying one on the other for comparison.
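One way to get "the same limits" is to compute them from both series before plotting. The following is only a sketch under assumptions: the two short series are hypothetical stand-ins for A['Close'] and AAL['Close'], and the index is assumed to hold real datetimes rather than date strings.

```python
import pandas as pd

# Hypothetical closing prices standing in for A['Close'] and AAL['Close'];
# the real series would come from the CSV files in the question.
a_close = pd.Series(
    [31.47, 28.88, 87.29, 88.37],
    index=pd.to_datetime(["1999-11-18", "1999-11-19", "2020-06-29", "2020-06-30"]),
)
aal_close = pd.Series(
    [19.30, 20.50, 13.32, 13.07],
    index=pd.to_datetime(["2005-09-27", "2005-09-28", "2020-06-29", "2020-06-30"]),
)

# Shared x-limits: from the earliest first date to the latest last date.
x_limits = (min(a_close.index.min(), aal_close.index.min()),
            max(a_close.index.max(), aal_close.index.max()))

# Shared y-limits: the overall price range across both stocks.
y_limits = (min(a_close.min(), aal_close.min()),
            max(a_close.max(), aal_close.max()))

print(x_limits)
print(y_limits)
```

The tuples can be passed to ax.set_xlim(*x_limits) and ax.set_ylim(*y_limits) on each Axes. Alternatively, plt.subplots(2, 1, sharex=True, sharey=True) keeps the limits of both plots in sync. If the Date index was read as plain strings, passing parse_dates=True to pd.read_csv converts it to datetimes.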

Pandas Building a subtable based on means

I have a DataFrame containing columns of numerical and non-numerical data. Here's a slice of it:
ATG12 Norm  ATG5 Norm  ATG7 Norm  Cancer Stage
5.55        4.99       8.99       IIA
4.87        5.77       8.88       IIA
5.98        7.88       8.34       IIC
I want to group data by Cancer Stage, take the mean of every numerical data column and produce a table which lists means for each Cancer Stage; like this:
Cancer Stage  ATG12 Mean  ATG5 Mean  ATG7 Mean
IIA           5.03        6.20       8.34
IIB           7.45        4.22       7.99
IIIA          5.32        3.85       6.68
I've figured out the groupby and mean() functions and can compute the means for one column at a time with:
AVG = data.groupby("Cancer Stage")['ATG12 Norm'].mean()
But that only gives me:
Cancer Stage
IIA 5.03
IIB 7.45
IIIA 5.32
Name: ATG12 Norm, dtype: float64
How can I apply this process to all the columns I want at once and produce a dataframe of it all? Sorry if this is a repeat; the pandas questions I've found that seem to be about related topics are all over my head.
Did you try
df.groupby('Cancer Stage').mean(numeric_only=True)
or
df.groupby('Cancer Stage')[['ATG12 Norm', 'ATG5 Norm']].mean()
Example data with an extra text column:
import pandas as pd
from io import StringIO
data = '''ATG12 Norm  ATG5 Norm  ATG7 Norm  Cancer Stage  Text
5.55        4.99       8.99       IIA           ABC
4.87        5.77       8.88       IIA           ABC
5.98        7.88       8.34       IIC           ABC'''
# columns are separated by two or more spaces, because the column names contain single spaces
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')
print(df)
# select several columns with a list (double brackets)
print(df.groupby('Cancer Stage')[['ATG12 Norm', 'ATG5 Norm']].mean())
# numeric_only=True skips the non-numeric Text column
print(df.groupby('Cancer Stage').mean(numeric_only=True))
result:
   ATG12 Norm  ATG5 Norm  ATG7 Norm Cancer Stage Text
0        5.55       4.99       8.99          IIA  ABC
1        4.87       5.77       8.88          IIA  ABC
2        5.98       7.88       8.34          IIC  ABC
              ATG12 Norm  ATG5 Norm
Cancer Stage
IIA                 5.21       5.38
IIC                 5.98       7.88
              ATG12 Norm  ATG5 Norm  ATG7 Norm
Cancer Stage
IIA                 5.21       5.38      8.935
IIC                 5.98       7.88      8.340
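To get closer to the table layout asked for in the question (one row per stage, columns named "... Mean"), the grouped result can be renamed and its index turned back into a column. This is a sketch on the three sample rows only; the rename rule, which swaps "Norm" for "Mean", is an assumption about the desired headers.

```python
import pandas as pd

# The three sample rows from the answer, including the extra text column.
df = pd.DataFrame({
    "ATG12 Norm": [5.55, 4.87, 5.98],
    "ATG5 Norm": [4.99, 5.77, 7.88],
    "ATG7 Norm": [8.99, 8.88, 8.34],
    "Cancer Stage": ["IIA", "IIA", "IIC"],
    "Text": ["ABC", "ABC", "ABC"],
})

means = (
    df.groupby("Cancer Stage")
      .mean(numeric_only=True)                                     # skip the Text column
      .rename(columns=lambda name: name.replace("Norm", "Mean"))   # illustrative header rule
      .reset_index()                                               # Cancer Stage becomes a column
)
print(means)
```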

Read a table according to a certain number of characters

I want to extract the name of comets from my table held in a text file. However some comets are 1-worded, others are 2-worded, and some are 3-worded. My table looks like this:
9P/Tempel 1 1.525 0.514 10.5 5.3 2.969
27P/Crommelin 0.748 0.919 29.0 27.9 1.484
126P/IRAS 1.713 0.697 45.8 13.4 1.963
177P/Barnard 1.107 0.954 31.2 119.6 1.317
P/2008 A3 (SOHO) 0.049 0.984 22.4 5.4 1.948
P/2008 Y11 (SOHO) 0.046 0.985 24.4 5.3 1.949
C/1991 L3 Levy 0.983 0.929 19.2 51.3 1.516
However, I know that the comet name runs from character 5 to character 37. How can I write code to tell Python that the first column spans characters 5 to 37?
data = """9P/Tempel 1                      1.525  0.514  10.5    5.3  2.969
27P/Crommelin                    0.748  0.919  29.0   27.9  1.484
126P/IRAS                        1.713  0.697  45.8   13.4  1.963
177P/Barnard                     1.107  0.954  31.2  119.6  1.317
P/2008 A3 (SOHO)                 0.049  0.984  22.4    5.4  1.948
P/2008 Y11 (SOHO)                0.046  0.985  24.4    5.3  1.949
C/1991 L3 Levy                   0.983  0.929  19.2   51.3  1.516""".split('\n')
To read the whole file you can use
f = open('data.txt', 'r').readlines()
The table appears to be fixed-width, so you can slice each line by character position.
If you're only interested in the first column, measure its width including the padding:
len("9P/Tempel 1                      ")
It gives 33.
So, to extract the first column:
for line in data:
    print(line[:33].strip())
This prints:
9P/Tempel 1
27P/Crommelin
126P/IRAS
177P/Barnard
P/2008 A3 (SOHO)
P/2008 Y11 (SOHO)
C/1991 L3 Levy
If what you want is :
Tempel 1
Crommelin
IRAS
...
You can use a regular expression.
Example:
import re
# skip everything up to the first "/", then any digits and spaces, and capture the rest
reg = r'.*?/[\d\s]*(.*)'
print(re.match(reg, '27P/Crommelin').group(1))
print(re.match(reg, 'C/1991 L3 Levy').group(1))
Here's the output:
Crommelin
L3 Levy
You can also take a look at read_fwf in the pandas library.
It lets you parse the file by specifying the number of characters per column.
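A minimal read_fwf sketch follows. It assumes the name column is 33 characters wide, as measured above, and rebuilds the table in memory with that padding, since the exact spacing of the original file is not known. The column label "name" is made up for illustration.

```python
import io
import pandas as pd

# Names and numeric fields from the question; the 33-character padding
# of the name column is an assumption taken from the len() check above.
rows = [
    ("9P/Tempel 1", "1.525 0.514 10.5 5.3 2.969"),
    ("27P/Crommelin", "0.748 0.919 29.0 27.9 1.484"),
    ("126P/IRAS", "1.713 0.697 45.8 13.4 1.963"),
    ("177P/Barnard", "1.107 0.954 31.2 119.6 1.317"),
    ("P/2008 A3 (SOHO)", "0.049 0.984 22.4 5.4 1.948"),
    ("P/2008 Y11 (SOHO)", "0.046 0.985 24.4 5.3 1.949"),
    ("C/1991 L3 Levy", "0.983 0.929 19.2 51.3 1.516"),
]
text = "\n".join(f"{name:<33}{numbers}" for name, numbers in rows)

# colspecs takes half-open (start, end) character ranges; only the
# first 33 characters of each line are read here.
table = pd.read_fwf(io.StringIO(text), colspecs=[(0, 33)],
                    header=None, names=["name"])
print(table["name"].tolist())
```

For a real file, the io.StringIO object would be replaced by the file path, and colspecs adjusted to the actual start and end positions of the name column.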

Python Pandas interpolate with new x-axis

I want to do interpolation for a Pandas series of the following structure
X
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197
Name: Y, dtype: float64
I want to interpolate the data so that I have a new series covering X=[23:392:1]. I looked through the documentation but didn't find where I could supply the new x-axis. Did I miss something? How can I interpolate onto the new x-axis?
This can be done with pandas's reindex and interpolate:
In [27]: s
Out[27]:
1
0
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197
[14 rows x 1 columns]
In [28]: idx = pd.Index(np.arange(23, 392))
In [29]: s.reindex(s.index.union(idx)).interpolate(method='values')
Out[29]:
1
22.88 3.047000
23.00 3.047882
24.00 3.055227
25.00 3.062573
26.00 3.069919
27.00 3.077265
28.00 3.084611
29.00 3.091957
30.00 3.099303
31.00 3.106648
32.00 3.113994
33.00 3.121340
34.00 3.128686
35.00 3.136032
36.00 3.143378
37.00 3.150724
38.00 3.158070
39.00 3.165415
40.00 3.172761
41.00 3.180107
42.00 3.187453
43.00 3.194799
44.00 3.202145
45.00 3.209491
45.75 3.215000
46.00 3.216235
47.00 3.221174
48.00 3.226112
The idea is to create the index you want as the union of s.index and idx, which is sorted automatically (old pandas versions allowed writing this as s.index + idx; current versions need s.index.union(idx)), reindex on that (which puts NaNs at the new points), and then interpolate to fill the NaNs using the values method, which interpolates according to the index values.
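Here is a runnable sketch of those three steps on the data from the question. The names new_x, combined, filled and result are illustrative, and the range 23 to 392 inclusive follows the question rather than the session above.

```python
import numpy as np
import pandas as pd

# The series from the question: X values as index, Y values as data.
x = [22.88, 45.75, 68.63, 91.50, 114.38, 137.25, 163.40,
     196.08, 228.76, 261.44, 294.12, 326.80, 359.48, 392.16]
y = [3.047, 3.215, 3.328, 3.423, 3.516, 3.578, 3.676,
     3.756, 3.861, 3.942, 4.012, 4.084, 4.147, 4.197]
s = pd.Series(y, index=pd.Index(x, name="X"), name="Y")

# New x-axis: 23, 24, ..., 392.
new_x = pd.Index(np.arange(23.0, 393.0))

# 1. Union of old and new index (sorted), 2. reindex to add NaNs at
# the new points, 3. fill them by linear interpolation on index values.
combined = s.reindex(s.index.union(new_x))
filled = combined.interpolate(method="values")

# Keep only the new grid points.
result = filled.loc[new_x]
print(result.head())
```

The final .loc step drops the original irregular points, so result contains only the integer grid. Skip it if the original samples should stay in the series.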
You can call numpy.interp() directly:
import numpy as np
import pandas as pd
import io
data = """x y
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197"""
# io.StringIO because data is a str; selecting the 'y' column returns a Series
s = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)['y']
new_idx = np.arange(23,393)
new_val = np.interp(new_idx, s.index.values.astype(float), s.values)
s2 = pd.Series(new_val, new_idx)
