I'm trying to create plots in Python. The first rows of the dataset named Psmc_dolphin look like the below. The original file has 57 rows and 5 columns.
0 1 2 3 4
0 0.000000e+00 11.915525 299.807861 0.000621 0.000040
1 4.801704e+03 11.915525 326.288712 0.000675 0.000311
2 1.003041e+04 11.915525 355.090418 0.000735 0.000497
3 1.572443e+04 11.915525 386.413025 0.000800 0.000548
4 2.192481e+04 0.583837 8508.130872 0.017613 0.012147
5 2.867635e+04 0.583837 9092.811889 0.018823 0.014021
6 3.602925e+04 0.466402 12111.617016 0.025073 0.019815
7 4.403533e+04 0.466402 12826.458632 0.026553 0.021989
8 5.275397e+04 0.662226 9587.887034 0.019848 0.017158
9 6.224833e+04 0.662226 10201.024439 0.021118 0.018877
10 7.258698e+04 0.991930 7262.773560 0.015035 0.013876
I'm trying to plot column 0 on the x axis and column 1 on the y axis. I get a plot with x-axis values 1000000, 2000000, 3000000, 4000000, etc. The code I used is attached below.
I need to adjust the x axis so that it shows values such as 1e+06, 2e+06, 3e+06, etc. instead of 1000000, 2000000, 3000000, 4000000, etc.
import pandas as pd
import matplotlib.pyplot as plt
# load the dataset (tab-separated, no header row)
Psmc_dolphin = pd.read_csv('Beluga_mapped_to_dolphin.0.txt', sep="\t", header=None)
plt.plot(Psmc_dolphin[0], Psmc_dolphin[1], color='green')
Any help or suggestion will be appreciated.
Scaling the values might help. Divide the values by 1000000, so that 1000000 becomes 1, 2000000 becomes 2, and so on. Alternatively, use a different scale, such as a logarithmic scale. I am no expert, just a newbie, but I think this might help.
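If the goal is only to change how the tick labels are written, the data does not need to be rescaled at all. Here is a minimal sketch using a tick formatter. The data is a synthetic stand-in for column 0 and column 1, and sci_label is a helper name made up for this example:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Stand-in data (assumption): x spans 0 to 4e6, roughly like column 0 of the file.
x = np.linspace(0, 4e6, 57)
y = np.exp(-x / 1e6)

def sci_label(value, pos=None):
    """Format a tick value as 1e+06, 2e+06, ... (plain '0' for zero)."""
    return "0" if value == 0 else f"{value:.0e}"

fig, ax = plt.subplots()
ax.plot(x, y, color="green")
# Only the label text changes; the plotted values stay as they are.
ax.xaxis.set_major_formatter(FuncFormatter(sci_label))
```

Call plt.show() or fig.savefig(...) afterwards to see the figure. The ".0e" format rounds to one significant digit, so if the ticks fall on values like 2500000 a format such as ".1e" is safer. Another option is ax.ticklabel_format(axis="x", style="sci", scilimits=(0, 0)), which keeps short labels (1, 2, 3) and prints the common factor 1e6 once at the end of the axis.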
I have what I believe is a relatively simple problem, but I fail to find what I am doing wrong. I have a df of more than 4 million rows, such as:
sum(bytes)
0 2.452768e+08
1 3.781524e+09
2 2.897799e+09
3 1.851381e+09
4 1.185865e+10
... ...
4159349 2.515966e+08
4159350 1.719197e+06
4159351 7.499110e+05
4159352 9.540200e+04
4159353 2.457000e+03
dtype = sum(bytes) -> float64
I want to make a histogram with 10 bins so I can see the percentile distribution of my values and check which value is, say, the cutoff for the top 10%. I did the following:
import matplotlib.pyplot as plt
plt.hist(df['sum(bytes)'], bins=10)
and the output graph ended up like this:
Can anyone let me know what am I doing wrong? Thanks a lot!
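One possible reading, since the plot itself is not shown here: with values ranging from about 1e3 to 1e10, ten equal-width bins put almost every row into the first bar. Also, a histogram counts rows per value range, while a percentile cutoff comes from the quantiles. A minimal sketch under those assumptions, using synthetic heavy-tailed data in place of the real column and assuming all values are strictly positive:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in (assumption): heavy-tailed positive values, like byte counts.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sum(bytes)": rng.lognormal(mean=15, sigma=3, size=100_000)})
values = df["sum(bytes)"]

# Value above which the top 10% of rows lie (the 90th percentile).
cutoff = values.quantile(0.90)

# Decile edges: 11 values delimiting 10 groups with equal row counts.
deciles = values.quantile(np.linspace(0, 1, 11))

# Ten bins equally spaced in log10, so small and large values are both visible.
edges = np.logspace(np.log10(values.min()), np.log10(values.max()), 11)
fig, ax = plt.subplots()
counts, _, _ = ax.hist(values, bins=edges)
ax.set_xscale("log")
ax.axvline(cutoff, color="red", linestyle="--", label="90th percentile")
ax.legend()
```

The cutoff itself does not need a plot at all: df['sum(bytes)'].quantile(0.90) on the real data gives it directly.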
Let me begin this query by admitting that I am very new to Python. I want to create contour plot of the data in Python so as to automate the process, which otherwise can be easily carried out using Surfer. I have 1000s of such data files, and creating manually could be very tedious.
The data I'm using looks as follows. It is a dataframe with 0, 1 and 2 as headers and 0, 1, ..., 279 as index:
0 1 2
0 3 -1 -0.010700
1 4 -1 0.040100
2 5 -1 0.061000
3 6 -1 0.052000
4 7 -1 0.013100
.. .. .. ...
275 30 -9 -1.530100
276 31 -9 -1.362300
277 32 -9 -1.190200
278 33 -9 -1.083600
279 30 -10 -1.864600
[280 rows x 3 columns]
Here,
x=data[0]
y=data[1]
z=data[2]
The contour function of matplotlib requires z to be a 2D array, and this is where the confusion begins. Following several solutions from Stack Overflow queries, I did the following:
import numpy as np
x=np.array(x)
y=np.array(y)
z=np.array(z)
X, Y = np.meshgrid(x, y)
import scipy.interpolate
rbf = scipy.interpolate.Rbf(x, y, z, function='cubic')
Z=rbf(X,Y)
lmin=data[2].min()
lmax=data[2].max()
progn=(lmax-lmin)/20
limit=np.arange(lmin,lmax,progn)
fig, ax = plt.subplots(figsize=(6,2)) #x ranges between 3 to 57, y -1 to -10
ax.contour(X,Y,Z,limit)
ax.set_title('Contour Plot')
plt.show()
With the above code this plot is derived.
However, this is not the desired plot. If one looks past the noisy lines on the surface, there are ordered contour lines underneath, which are what I actually want, as seen in the contour plot generated by Surfer here.
I'd like to reiterate that the same data was used in generating the surfer plot.
Any help in creating the desired plot shall be highly appreciated.
Thanks to @JohanC for the answer. I'd like to put his suggestion into perspective with my query.
Replacing ax.contour with ax.tricontour solves my situation, and ax.tricontourf fills the contours. Therefore, the last segment of my code would be:
fig, ax = plt.subplots(figsize=(6,2)) #x ranges between 3 to 57, y -1 to -10
# tricontour works on the original 1D sample points, not on the meshgrid output
ax.tricontour(x,y,z,limit)
ax.tricontourf(x,y,z,limit)
ax.set_title('Contour Plot')
plt.show()
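For anyone who wants to try this without the data file, here is a small self-contained sketch of the same idea. The points are random stand-ins for the 280 (x, y, z) rows, and the level count of 21 is just a choice for the example. tricontour and tricontourf triangulate the scattered points themselves, so no meshgrid or interpolation step is needed:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in (assumption): 280 scattered (x, y, z) samples as 1D arrays.
rng = np.random.default_rng(1)
x = rng.uniform(3, 57, 280)
y = rng.uniform(-10, -1, 280)
z = np.sin(x / 10.0) + y / 5.0

# 21 evenly spaced contour levels covering the full range of z.
levels = np.linspace(z.min(), z.max(), 21)

fig, ax = plt.subplots(figsize=(6, 2))
# Filled contours first, then thin contour lines drawn on top of them.
filled = ax.tricontourf(x, y, z, levels=levels)
ax.tricontour(x, y, z, levels=levels, colors="k", linewidths=0.5)
fig.colorbar(filled, ax=ax)
ax.set_title("Contour Plot")
```

Call plt.show() afterwards to display it. Note that the triangulation only connects the given points, so the result can look more angular than an interpolated grid.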
I had a similar issue working with irregularly spaced data that also had missing values. The suggestion made by someone about a 2D scatter plot is the perfect solution.
plt.figure(figsize=(10,10))
# color each (doy, UT) point by its TEC value
plt.scatter(df.doy,df.UT,c=df.TEC,s=10,cmap="jet")
plt.colorbar()
Likewise, I plotted the same data as a contour plot using plt.tricontourf and got the same result:
plt.figure(figsize=(10,10))
plt.tricontourf(df.doy,df.UT,df.TEC,100,cmap="jet")
plt.colorbar()
I wanted to plot a simple distribution of these data against years, but when plotting using the following code:
import matplotlib.pyplot as plt
plt.ylim(2000,2020)
plt.plot (years,Global)
it didn't show any plot. In addition, when I checked the length of the Global list, it was reported as one element, and I couldn't understand why.
[0 1290.6
1 1256.0
2 1198.1
3 1128.4
4 1070.9
5 1011.7
6 975.7
7 945.5
8 954.1
9 885.9
10 832.8
11 805.4
12 736.3
13 715.2
14 677.6
15 647.1
16 642.0
17 582.1
Name: Global, dtype: float64]
I found the issue.
You have set ylim to 2000 to 2020, and when you do plt.plot(years,Global), the Global list is plotted on the y axis. Since none of the values in the Global list are in the given range (i.e. 2000 to 2020), they aren't shown on the graph.
The simplest method is to use yticks instead of ylim. ylim sets the range of values shown on the y axis, whereas yticks sets the positions at which tick marks are drawn on the y axis.
Try this -
import matplotlib.pyplot as plt
years = list(range(2000,2018))
Global = [1290.6,1256.0, 1198.1, 1128.4, 1070.9, 1011.7, 975.7, 945.5, 954.1, 885.9, 832.8, 805.4, 736.3,715.2, 677.6, 647.1, 642.0, 582.1]
plt.plot (Global,years)
plt.yticks(years)
plt.show()
Here I have put the years on the y axis with values 2000 to 2017 (I chose 18 years because your Global list has 18 elements, and to plot two lists they must have the same number of elements) and set the yticks to the elements of the list years.
Output -
I hope this helped you ..
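On the second part of the question (the length being reported as one element): the printed output starts with "[" and ends with "dtype: float64]", which suggests that Global is a list holding a single pandas Series, so len(Global) counts the Series, not its 18 values. A minimal sketch under that assumption, also assuming the data starts in the year 2000 as in the answer above, this time with the years on the x axis:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Reconstruction (assumption): a list that holds one Series, as the printed output suggests.
series = pd.Series(
    [1290.6, 1256.0, 1198.1, 1128.4, 1070.9, 1011.7, 975.7, 945.5, 954.1,
     885.9, 832.8, 805.4, 736.3, 715.2, 677.6, 647.1, 642.0, 582.1],
    name="Global",
)
Global = [series]
print(len(Global))  # prints 1: the list has one element, the Series itself

# Take the Series out of the list before plotting.
values = Global[0]
years = list(range(2000, 2000 + len(values)))  # assumed start year

fig, ax = plt.subplots()
ax.plot(years, values)
ax.set_xlabel("Year")
ax.set_ylabel("Global")
```

No ylim is set here, so matplotlib picks a y range that fits the values. If the years should be restricted, ax.set_xlim(2000, 2020) is the call that matches this layout.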
I have the following dataframe:
Year Month Value
2005 9 1127.080000
2016 3 9399.000000
5 3325.000000
6 120.000000
7 40.450000
9 3903.470000
10 2718.670000
12 12108501.620000
2017 1 981879341.949982
2 500474730.739911
3 347482199.470025
4 1381423726.830030
5 726155254.759981
6 750914893.859959
7 299991712.719955
8 133495941.729959
9 27040614303.435833
10 26072052.099796
11 956680303.349909
12 755353561.609832
2018 1 1201358930.319930
2 727311331.659607
3 183254376.299662
4 9096130.550197
5 972474788.569924
6 779912460.479959
7 1062566320.859962
8 293262028544467.687500
9 234792487863.501495
As you can see, I have some huge values grouped by month and year. My problem is that I want to create a line plot, but when I do, the result doesn't make any sense to me:
df.plot(kind = 'line', figsize = (20,10))
The visual representation doesn't make much sense, given that the values fluctuate over the months and years: a flat line is shown for most of the period, with a big peak at the end.
I guess the problem may be that the y-axis scale does not fit the data. I have tried applying a log transformation to the y axis, but this doesn't change anything. I have also tried normalizing the data between 0 and 1 just as a test, but the plot is still the same. Any ideas about how to get a more accurate representation of my data over the time period? Also, how can I display the name of the month and the year on the x axis?
EDIT:
This is how i applied the log transform:
df.plot(kind = 'line', figsize = (20,10), logy = True)
and this is the result:
For me this plot is still not really readable. Given that the plotted values represent income over time, applying a logarithmic transformation to money values doesn't make much sense to me anyway.
Here is how i normalized the data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df_scaled.set_index(df.index, inplace = True)
And then i plotted it:
df_scaled.plot(kind = 'line', figsize = (20, 10), logy = True)
As you can see, nothing seems to change with this. I'm a bit lost about how to correctly visualize these data over the given time periods.
The problem is that one value is much much bigger than the others, causing that spike. Instead, use a semi-log plot
df.plot(y='Value', logy=True)
outputs
To make it use the date as the x-axis do
df['Day'] = 1 # we need a day
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
df.plot(x='Date', y='Value', logy=True)
which outputs
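A side note on the date step: the printed dataframe looks like Year and Month are index levels (for example after a groupby), in which case they have to be moved back into columns before pd.to_datetime can use them. Here is a minimal sketch under that assumption, using five of the rows above (values rounded) and month-year tick labels:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Small stand-in (assumption): values indexed by (Year, Month), as a groupby would produce.
index = pd.MultiIndex.from_tuples(
    [(2016, 12), (2017, 1), (2017, 9), (2018, 8), (2018, 9)],
    names=["Year", "Month"],
)
df = pd.DataFrame(
    {"Value": [12108501.62, 981879341.95, 27040614303.44,
               293262028544467.69, 234792487863.50]},
    index=index,
)

# Move Year and Month out of the index, then build a date for the first of each month.
flat = df.reset_index()
flat["Date"] = pd.to_datetime(flat[["Year", "Month"]].assign(Day=1))

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(flat["Date"], flat["Value"], marker="o")
ax.set_yscale("log")  # same idea as logy=True in DataFrame.plot
# Label ticks as abbreviated month name plus year, e.g. "Sep 2017".
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()
```

The markers make individual months visible even where the line looks flat. With values spanning more than ten orders of magnitude, a linear axis will always be dominated by the largest month, which is why the log axis is kept here.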
In Python, I have a 2D array, e.g.:
1.3 5.7 3.2
5.6 2.3 9.5
1.1 4.1 5.2
I then used 'imshow' to get what I needed - I essentially had a plot where the x axis was:
(column) 0 (column) 1 (column) 2 ....
and the y axis was:
.
.
(row) 2
(row) 1
(row) 0
and then the actual values (5.6 or 2.3 or whatever) were represented by colours, which was just what I wanted.
But then later, instead of the x axis just being column 0 column 1 and column 2 etc., I wanted the x axis to show the date which corresponds to column 0 column 1 and column 2 etc.. This information was stored in a different list, say "date_info[]".
So instead of an arbitrary indexing scheme on the bottom, I want the x values of the imshow to correspond to the values of the date_info[] list - instead of the number 2 for example, I wanted date_info[2] on the x axis.
Now with the help of this forum, I was able to do this using:
plt.xticks(mjdaxis,[int(np.floor(date_info[i])) for i in mjdaxis])
which was sufficient for a while, but I am just changing the labels of the x axis here, right, rather than what is being plotted? Now when I try to lay another plot (just a regular curve) on top of my original, the x-axis scaling gets messed up, and my columns get bunched up as (1, 2, 3...) again, instead of at their corresponding date_info values (55500, 55530, 55574...).
If anyone can make any sense of what I am saying, that would be great!!
For reference, here is the code that I am now trying:
fig = plt.figure()
ax1 = fig.add_subplot(111)
mjdaxis=np.linspace(0,date_info[0]-1,20).astype('int')
ax1.set_xticks(mjdaxis,[int(np.floor(date_info[i])) for i in mjdaxis])
ax1.imshow(residuals, aspect="auto")
ax2 = ax1.twinx()
ax2.plot(pdot[8:,0],pdot[8:,1])
plt.show()
If I understand you correctly, you should be able to just add the line ax2.set_xticks([]) before your plt.show(). You might also want to read up on the kwarg hold (deprecated in Matplotlib 2.0 and removed in 3.0).
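A different approach, not taken from the answer above, is to place the image itself in date coordinates with the extent argument of imshow, so that the overlaid curve and the image share real x values instead of relabelled column indices. Here is a minimal sketch with stand-in arrays. It assumes the dates are evenly spaced, and origin="lower" is chosen to match the row 0 at the bottom layout described in the question:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): 3 rows by 40 columns, one date (MJD) per column.
rng = np.random.default_rng(2)
residuals = rng.normal(size=(3, 40))
date_info = np.linspace(55500.0, 56500.0, residuals.shape[1])

fig, ax1 = plt.subplots()
# extent = (left, right, bottom, top) in data coordinates, so the x axis is in dates.
ax1.imshow(
    residuals,
    aspect="auto",
    origin="lower",
    extent=(date_info[0], date_info[-1], 0, residuals.shape[0]),
)

# Overlay a curve on the same date axis, with its own y axis on the right.
ax2 = ax1.twinx()
curve_x = np.linspace(55500.0, 56500.0, 200)
ax2.plot(curve_x, np.sin(curve_x / 100.0), color="white")
ax1.set_xlim(date_info[0], date_info[-1])
ax1.set_xlabel("MJD")
```

extent gives the outer edges of the image, so each column is centred about half a column width away from its date. If that offset matters, or if the dates are unevenly spaced, pcolormesh with explicit column edges is the usual alternative.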