I wanted to plot a simple distribution of these data against the years, but when plotting with the following code:
import matplotlib.pyplot as plt
plt.ylim(2000,2020)
plt.plot(years, Global)
it didn't show any plot. In addition, when I checked the length of the Global list it was read as one element, and I couldn't understand why:
[0 1290.6
1 1256.0
2 1198.1
3 1128.4
4 1070.9
5 1011.7
6 975.7
7 945.5
8 954.1
9 885.9
10 832.8
11 805.4
12 736.3
13 715.2
14 677.6
15 647.1
16 642.0
17 582.1
Name: Global, dtype: float64]
I found the issue.
You have set ylim to 2000-2020, and when you call plt.plot(years, Global), the Global list is plotted on the y axis. Since none of the values in Global fall within that range (2000 to 2020), nothing appears on the graph.
The simplest fix is to use yticks instead of ylim: ylim specifies the range of values shown on the y axis, whereas yticks specifies the tick positions (the labelled data points) on the y axis.
Try this -
import matplotlib.pyplot as plt
years = list(range(2000,2018))
Global = [1290.6,1256.0, 1198.1, 1128.4, 1070.9, 1011.7, 975.7, 945.5, 954.1, 885.9, 832.8, 805.4, 736.3,715.2, 677.6, 647.1, 642.0, 582.1]
plt.plot(Global, years)
plt.yticks(years)
plt.show()
Here I have put the years on the y axis with values 2000 to 2017 (I chose 18 years because your Global series has 18 elements, and to plot two lists against each other they must have the same number of elements) and set the yticks to the elements of the years list.
Output -
I hope this helped you.
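If you would rather keep the years on the x axis, as in your original attempt, a minimal sketch (using the same two lists as above) is to drop the ylim call entirely and let the axis limits be chosen from the data:
import matplotlib.pyplot as plt

years = list(range(2000, 2018))
Global = [1290.6, 1256.0, 1198.1, 1128.4, 1070.9, 1011.7, 975.7, 945.5, 954.1,
          885.9, 832.8, 805.4, 736.3, 715.2, 677.6, 647.1, 642.0, 582.1]

plt.plot(years, Global)          # years on x, values on y
plt.xticks(years, rotation=45)   # one labelled tick per year
plt.show()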
Let me begin this query by admitting that I am very new to Python. I want to create a contour plot of the data in Python so as to automate the process, which otherwise can easily be carried out using Surfer. I have 1000s of such data files, and creating the plots manually would be very tedious.
The data I'm using looks as follows; it is a dataframe with 0, 1 and 2 as headers and 0, 1, ..., 279 as the index:
0 1 2
0 3 -1 -0.010700
1 4 -1 0.040100
2 5 -1 0.061000
3 6 -1 0.052000
4 7 -1 0.013100
.. .. .. ...
275 30 -9 -1.530100
276 31 -9 -1.362300
277 32 -9 -1.190200
278 33 -9 -1.083600
279 30 -10 -1.864600
[280 rows x 3 columns]
Here,
x=data[0]
y=data[1]
z=data[2]
The contour function of matplotlib requires z to be a 2D array; this is where the confusion begins. Following several Stack Overflow answers, I did the following:
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt

x = np.array(x)
y = np.array(y)
z = np.array(z)
X, Y = np.meshgrid(x, y)

rbf = scipy.interpolate.Rbf(x, y, z, function='cubic')
Z = rbf(X, Y)

lmin = data[2].min()
lmax = data[2].max()
progn = (lmax - lmin) / 20
limit = np.arange(lmin, lmax, progn)

fig, ax = plt.subplots(figsize=(6, 2))  # x ranges between 3 and 57, y between -1 and -10
ax.contour(X, Y, Z, limit)
ax.set_title('Contour Plot')
plt.show()
With the above code this plot is derived.
However, this is not what I want; if one can see past the superficial noise lines, there are ordered contour lines underneath, which is actually what I want, as seen in the contour plot generated by Surfer here.
I'd like to reiterate that the same data was used in generating the Surfer plot.
Any help in creating the desired plot would be highly appreciated.
Thanks to @JohanC for the answer. I'd like to put his suggestion into perspective with respect to my query.
Replacing ax.contour with ax.tricontour solves my situation, and ax.tricontourf gets the contour fill done. Since tricontour works directly on the scattered 1D x, y, z points, the meshgrid/Rbf interpolation step is not needed, and the last segment of my code becomes:
fig, ax = plt.subplots(figsize=(6, 2))  # x ranges between 3 and 57, y between -1 and -10
ax.tricontourf(x, y, z, limit)  # filled contours first
ax.tricontour(x, y, z, limit)   # contour lines drawn on top of the fill
ax.set_title('Contour Plot')
plt.show()
I had a similar issue working with irregularly spaced data and, on top of that, missing data. The suggestion someone made about a 2D scatter plot is the perfect solution.
plt.figure(figsize=(10,10))
plt.scatter(df.doy, df.UT, c=df.TEC, s=10, cmap="jet")
plt.colorbar()
Likewise, I plotted the same data as a contour plot using plt.tricontourf and got the same result:
plt.figure(figsize=(10,10))
plt.tricontourf(df.doy,df.UT,df.TEC,100,cmap="jet")
plt.colorbar()
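Since the original question mentions having thousands of similar files, here is a rough sketch of how the tricontourf approach could be batched over a folder of files (the file pattern, separator and output names are assumptions, not taken from the posts above):
import glob
import os
import pandas as pd
import matplotlib.pyplot as plt

for path in glob.glob('data/*.txt'):                       # hypothetical input folder/pattern
    data = pd.read_csv(path, sep=r'\s+', header=None)      # assumes whitespace-separated x, y, z columns
    fig, ax = plt.subplots(figsize=(6, 2))
    ax.tricontourf(data[0], data[1], data[2], 20)          # 20 filled contour levels
    ax.set_title(os.path.basename(path))
    fig.savefig(os.path.splitext(path)[0] + '.png', dpi=150)
    plt.close(fig)                                         # free memory when looping over many files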
I'm trying to create plots in Python. The first rows of the dataset named Psmc_dolphin look like the below; the original file has 57 rows and 5 columns.
0 1 2 3 4
0 0.000000e+00 11.915525 299.807861 0.000621 0.000040
1 4.801704e+03 11.915525 326.288712 0.000675 0.000311
2 1.003041e+04 11.915525 355.090418 0.000735 0.000497
3 1.572443e+04 11.915525 386.413025 0.000800 0.000548
4 2.192481e+04 0.583837 8508.130872 0.017613 0.012147
5 2.867635e+04 0.583837 9092.811889 0.018823 0.014021
6 3.602925e+04 0.466402 12111.617016 0.025073 0.019815
7 4.403533e+04 0.466402 12826.458632 0.026553 0.021989
8 5.275397e+04 0.662226 9587.887034 0.019848 0.017158
9 6.224833e+04 0.662226 10201.024439 0.021118 0.018877
10 7.258698e+04 0.991930 7262.773560 0.015035 0.013876
I'm trying to plot column 0 on the x axis and column 1 on the y axis, and I get a plot with x-axis values 1000000, 2000000, 3000000, 4000000, etc. The code I used is attached below.
I need the x-axis tick labels to read 1e+06, 2e+06, 3e+06, etc. instead of 1000000, 2000000, 3000000, 4000000, etc.
import pandas as pd
import matplotlib.pyplot as plt

# load the dataset
Psmc_dolphin = pd.read_csv('Beluga_mapped_to_dolphin.0.txt', sep="\t", header=None)
plt.plot(Psmc_dolphin[0], Psmc_dolphin[1], color='green')
plt.show()
Any help or suggestion will be appreciated.
Scaling the values might help you: convert 1000000 to 1, 2000000 to 2 and so on by dividing the values by 1000000, or use a different scale such as a logarithmic one. I am no expert, just a newbie, but I think this might help.
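If the goal is only to change how the ticks are labelled (1e+06 rather than 1000000), not the data itself, matplotlib's tick formatters can do that directly. A minimal sketch, reusing the Psmc_dolphin dataframe loaded in the question:
from matplotlib.ticker import FuncFormatter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(Psmc_dolphin[0], Psmc_dolphin[1], color='green')
# label each x tick in scientific notation, e.g. 1e+06, 2e+06, ...
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{x:.0e}'))
plt.show()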
Good time of day everyone! I'm working on a simple script for quality analysis that compares original and duplicate samples and plots them on a scatter plot.
So far I've been able to create the plots that I need:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

'''read file'''
duplicates_file = 'C:/Users/cherp2/Desktop/duplicates.csv'
duplicates = pd.read_csv(
    duplicates_file, usecols=['SAMPLE_NUMBER', 'Duplicate Sample Type',
                              'FE', 'P', 'SIO2', 'AL2O3',
                              'Orig. Sample Type', 'FE.1', 'P.1',
                              'SIO2.1', 'AL2O3.1'])

'''calculate standard deviations for grades'''
grades = ['FE', 'P', 'SIO2', 'AL2O3']
for grade in grades:
    grade_std = duplicates[grade].std()

    '''create scatter plots for all grades'''
    ax = duplicates.plot.scatter(f'{grade}', f'{grade}.1')
    ax.set_xlabel('Original sample')
    ax.set_ylabel('Duplicate sample')
But now I want to color the points by a condition: if the grade difference between the original and duplicate sample is less than one standard deviation the point should be green, if it's between two and three standard deviations it should be orange, and red if it's more than that.
I've been trying to find solutions online but so far nothing has worked. I have a feeling that I'd need to use some lambda function here, but I'm not sure about the syntax.
You can pass a color argument to the plotting call (via c=) and use pandas.cut to generate the necessary color codes for the different categories based on the standard deviation.
In [227]: df
Out[227]:
a b
0 0.991415 -0.627043
1 1.365594 -0.036651
2 -0.376318 -0.536504
3 1.041561 -2.180642
4 1.017692 -0.308826
5 -0.626566 1.613980
6 -1.302070 1.258944
7 -0.453499 0.411277
8 -0.927880 0.439102
9 -0.282031 1.249862
10 0.504829 0.536641
11 -1.528550 1.420456
12 0.774111 -1.086350
13 -1.662715 0.732753
14 -1.038514 -1.987912
15 -0.432515 3.104590
16 1.682876 0.663448
17 0.287642 -1.038507
18 -0.307923 -2.340498
19 -1.024045 -1.948608
In [228]: change = df.a - df.b
In [229]: df.plot(kind='scatter', x='a', y='b',
c=pd.cut(((change - change.mean()) / (change.std())).abs(), [0, 1, 2, 3], labels=['r', 'g', 'b']))
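A rough sketch of how the same idea could be adapted to the duplicates dataframe from the question, using the green/orange/red scheme described there (the exact bin edges for the middle band are an assumption, since the question leaves the 1-2 standard deviation range unspecified):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

grades = ['FE', 'P', 'SIO2', 'AL2O3']
for grade in grades:
    diff = (duplicates[grade] - duplicates[f'{grade}.1']).abs()
    std = diff.std()
    # assumed bins: <1 std -> green, 1-3 std -> orange, >3 std -> red
    colors = pd.cut(diff, [0, std, 3 * std, np.inf],
                    labels=['green', 'orange', 'red'], include_lowest=True)
    ax = duplicates.plot.scatter(grade, f'{grade}.1', c=list(colors))
    ax.set_xlabel('Original sample')
    ax.set_ylabel('Duplicate sample')
plt.show()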
I have the following dataframe:
Year Month Value
2005 9 1127.080000
2016 3 9399.000000
5 3325.000000
6 120.000000
7 40.450000
9 3903.470000
10 2718.670000
12 12108501.620000
2017 1 981879341.949982
2 500474730.739911
3 347482199.470025
4 1381423726.830030
5 726155254.759981
6 750914893.859959
7 299991712.719955
8 133495941.729959
9 27040614303.435833
10 26072052.099796
11 956680303.349909
12 755353561.609832
2018 1 1201358930.319930
2 727311331.659607
3 183254376.299662
4 9096130.550197
5 972474788.569924
6 779912460.479959
7 1062566320.859962
8 293262028544467.687500
9 234792487863.501495
As you can see, I have some huge values grouped by month and year. My problem is that I want to create a line plot, but when I do it, the result doesn't make any sense to me:
df.plot(kind = 'line', figsize = (20,10))
The visual representation of the data doesn't make much sense, taking into account that the values fluctuate over the months and years, but a flat line is shown for most of the period with a big peak at the end.
I guess the problem may be that the y-axis scale does not fit the data correctly. I have tried applying a log transformation to the y axis, but this doesn't change anything; I have also tried normalizing the data between 0 and 1 just as a test, but the plot stays the same. Any ideas about how to get a more accurate representation of my data over the time period? And also, how can I display the name of the month and year on the x axis?
EDIT:
This is how I applied the log transform:
df.plot(kind = 'line', figsize = (20,10), logy = True)
and this is the result:
For me this plot is still not really readable; taking into account that the plotted values represent income over time, applying a logarithmic transformation to money values doesn't make much sense to me anyway.
Here is how I normalized the data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df_scaled.set_index(df.index, inplace = True)
And then i plotted it:
df_scaled.plot(kind = 'line', figsize = (20, 10), logy = True)
As you can see, nothing seems to change with this; I'm a bit lost about how to correctly visualize these data over the given time periods.
The problem is that one value is much, much bigger than the others, causing that spike. Instead, use a semi-log plot:
df.plot(y='Value', logy=True)
outputs
To make it use the date as the x-axis do
df['Day'] = 1 # we need a day
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
df.plot(x='Date', y='Value', logy=True)
which outputs
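To also show the month name and year on the x-axis ticks, as asked in the question, one option is matplotlib's date locators and formatters. A sketch, assuming the Date and Value columns built in the snippet above:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(df['Date'], df['Value'])
ax.set_yscale('log')                                         # keep the semi-log scale
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # a tick every three months
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  # e.g. "Sep 2005"
fig.autofmt_xdate()                                          # rotate the labels so they don't overlap
plt.show()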
I want to know if it is possible to measure the correlation between a quantitative variable (in my case the average daily consumption of households) and a qualitative variable (in my case the month: 1, 2, ..., 12) in Python.
month | avg_daily_consumption
------------------------------------------
    1 | 12.11836586156116
    2 | 11.713968603585668
    3 | 11.902829015188159
    4 | 10.12066900094302
    5 | 8.879703717271864
    6 | 8.384419625257689
    7 | 8.146453593663365
    8 | 7.961394876525876
    9 | 8.748848024841289
   10 | 9.820944144869841
   11 | 11.247017177860053
   12 | 12.069888731716086
Thanks.
We can use the numpy and matplotlib libraries to check whether there is any correlation.
The following was written in a Jupyter notebook but should work in plain Python if you remove the %matplotlib inline line (marked with the # remove comment):
import numpy as np
#x values
x = [1,2,3,4,5,6,7,8,9,10,11,12]
# y values
y = [12.11836586156116, 11.713968603585668, 11.902829015188159, 10.12066900094302, 8.879703717271864, 8.384419625257689, 8.146453593663365, 7.961394876525876, 8.748848024841289, 9.820944144869841, 11.247017177860053 , 12.069888731716086]
print( np.corrcoef(x, y))
This outputs:
[[ 1. -0.22316588]
[-0.22316588 1. ]]
which shows a small negative correlation.
We can then plot the x,y values:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline # remove if not in Jupyter notebook
matplotlib.style.use('ggplot')
plt.scatter(x, y)
plt.show()
This gives us the following plot, suggesting that there is no straight-line correlation between the month and the monthly consumption.
This looks like it may be cyclical consumption. Assuming that 1-12 are months, it looks like consumption rises from the middle of the year to year end, then drops back towards the mid-year point, and rises again. Adding data from preceding and succeeding years would show whether that is the case.
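One way to quantify that cyclical, U-shaped pattern, rather than the straight-line correlation above, is to correlate consumption with each month's distance from mid-year. This is only a sketch using the same values as above, not part of the original answer:
import numpy as np

x = np.arange(1, 13)
y = np.array([12.11836586156116, 11.713968603585668, 11.902829015188159,
              10.12066900094302, 8.879703717271864, 8.384419625257689,
              8.146453593663365, 7.961394876525876, 8.748848024841289,
              9.820944144869841, 11.247017177860053, 12.069888731716086])

# distance of each month from the middle of the year
dist_from_midyear = np.abs(x - 6.5)
print(np.corrcoef(dist_from_midyear, y)[0, 1])

Because consumption is lowest around mid-year and highest around the turn of the year, this coefficient should come out much closer to 1 than the raw month-number correlation.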