Jupyter Notebook graph has very inaccurate scale?


This is my first post, and I hope it's okay. My mentor gave me a use case he found online to teach me machine learning in Jupyter. I ran into a problem in the graphing section, even though I'm sure the code in that part is accurate:
df21.plot(figsize=(20,10), fontsize=12,subplots=True, style=["-","o"], title = "Pump 2 - 100 Values")
plt.show()
The graphs appear as two points or a single straight line, even though the df21 dataset I'm using has 100 rows and the values are not binary:
[Image: Graphs just look like straight lines]
[Image: Screenshot of the use case]
I tried changing the style to plot only points and found that the points are actually all there; the scale of the axes is just incredibly squished:
[Image: Graph with only points]
And I have no idea what to do now, and I haven't been able to find any solutions online. Any advice is appreciated!

After going through the use case you linked and running the code myself, I did not find any problem with the plotting section. The problem is probably in the parts of your code before the plot. This is the code from the use case:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/sa-mw-dach/manuela-dev/master/ml-models/anomaly-detection/raw-data.csv')
df['time'] = pd.to_datetime(df['ts'], unit='ms')
df.set_index('time', inplace=True)
df.drop(columns=['ts'], inplace=True)
df21 = df.head(200)
df21.plot(figsize=(20,10), fontsize=12, subplots=True, style=["-","o"], title = "Pump 2 - 100 Values")
plt.show()
And this is the output:
Maybe try running your notebook in a different environment; I ran it on Google Colab.
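One way to narrow down where the earlier steps go wrong is to inspect the frame just before plotting. The sketch below is only a diagnostic under assumptions: the data is synthetic, the column names pump-1 and pump-2 and the helper summarize are hypothetical, and it does not claim to identify the actual cause in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw data: epoch milliseconds plus two sensor columns.
# The column names 'pump-1' and 'pump-2' are hypothetical.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "ts": 1_600_000_000_000 + 1000 * np.arange(100, dtype="int64"),
    "pump-1": rng.normal(10, 1, size=100),
    "pump-2": rng.normal(12, 1, size=100),
})

# Same preparation steps as the use case: parse 'ts' as milliseconds and index by time.
df = raw.copy()
df["time"] = pd.to_datetime(df["ts"], unit="ms")
df = df.set_index("time").drop(columns=["ts"])


def summarize(frame):
    """Collect a few facts that affect how pandas scales a plot."""
    return {
        "rows": len(frame),
        "datetime_index": isinstance(frame.index, pd.DatetimeIndex),
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in frame.dtypes),
        "index_span": frame.index.max() - frame.index.min(),
    }


report = summarize(df.head(100))
print(report)
```

If the row count, index type, or index span differs from what you expect, the step that produced the difference is the place to look before touching the plot call.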

Related

Why is my for loop for plotting printing and generating figures out of order?

My for loop, presented here, is supposed to generate a plot for each year. It is supposed to print the year, then show a figure. But what happens is, all of the years get printed first, then all of the figures get printed.
On top of that, there is an empty figure that gets printed at the very end.
I am using Spyder for the record.
Pics attached to show the outputs.
Can you please help me understand what is happening here so that I can control my outputs in the future?
Thank you so much.
import pandas as pd
import numpy
import matplotlib.pyplot as plt
plt.figure(0)
for i in range(5):
    k=2014+i
    crimesyear=crimes.loc[crimes['Year'].isin([k])]
    crimesyear.groupby('Month')['INCIDENT_ID'].count().plot(marker='o')
    plt.figure(i+1)
    plt.xticks(numpy.arange(12),months)
    plt.ylabel('Number of Crimes')
    plt.show
    print(k)
[Image: First part of output]
[Image: Last part of output]

Matplotlib has a way of generating plots without using the stateful interface. This should help you generate plots in a loop.
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(5, 10))
for i, a in enumerate(ax):
    a.plot(np.arange(10))
    a.set_title(f'test {i}')
Check out the object-oriented API in the Matplotlib documentation.
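Applied to the per-year loop from the question, the object-oriented style might look like the sketch below. It is an illustration under assumptions: the monthly counts are random stand-ins for the crimes data, the figures dictionary is a hypothetical convenience, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
rng = np.random.default_rng(0)

figures = {}
for year in range(2014, 2019):
    counts = rng.integers(50, 200, size=12)  # synthetic monthly counts

    # Each iteration creates its own figure and axes, so nothing depends on
    # which figure happens to be "current".
    fig, ax = plt.subplots()
    ax.plot(np.arange(12), counts, marker="o")
    ax.set_xticks(np.arange(12))
    ax.set_xticklabels(months)
    ax.set_ylabel("Number of Crimes")
    ax.set_title(str(year))
    figures[year] = fig
```

Because every call goes through an explicit ax, no leftover empty figure is created, and the year appears in the figure title instead of in separate printed output.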

Here is a simplified version of your graphs without all the data.
import pandas as pd
import numpy
import matplotlib.pyplot as plt
for i in range(5):
    k=2014+i
    plt.figure(i)
    plt.title(str(k))
    plt.ylabel('Number of Crimes')
plt.show()
Your extra blank graph came from the plt.figure(0) at the start. You only need to call plt.show() once at the end. You can set the titles using plt.title(<string goes here>). In general, displaying something is an asynchronous call to another program, so it is often difficult to predict the order in which you'll see things. I'm not familiar enough with Spyder to give specifics here.

Matplotlib gives the list of array outputs before showing the scatter plot in Python 3

I made a couple of plots before using Python 2.7 and everything is fine. Now I am trying to pick it up in Python 3 as I am trying to visualize some of the data output of the project I'm working on. So I tried to see if this works:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()
The result is quite confusing to me: it shows the plot, but before the plot it also shows the list of values of a and b, as shown in the picture below:
All I need is a clean graph of the plot. So what have I done wrong here? Haven't used matplotlib for a long time so I guess I have made some big mistakes here.
Thanks for your time in advance.

I am not an expert, but what happens is that you are getting as output the variables that make up your plot. I ran it in Spyder, and on the right (in the variables section) I also get your results. What you need to do, however, is explicitly tell matplotlib to show the plot:
....
plt.colorbar()
plt.show()
This displays the plot without printing all the arrays. A previous post has some explanation.
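For context, a notebook echoes the value of the last expression in a cell, and hist2d returns a tuple of arrays plus the image object. The sketch below shows that return value and one way to keep it out of the output by binding it to names. It assumes the echoed arrays came from a plotting call ending the cell, which the question does not confirm; the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Fake data, as in the question.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a * 3 + rng.normal(size=1000)

fig, ax = plt.subplots()

# hist2d returns (counts, xedges, yedges, image). Assigning the result means
# the call is no longer a bare expression that a notebook would echo.
counts, xedges, yedges, image = ax.hist2d(a, b, bins=(50, 50), cmap="jet")

# The colorbar is attached to the image returned by hist2d.
cbar = fig.colorbar(image, ax=ax)
```

Ending the cell with plt.show(), as suggested above, has a similar effect because the last statement then returns nothing worth echoing.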

Python x-labels evenly spread

I am trying to get my labels turned up correctly.
I want it to show 10 values on the list, but I have no idea how.
The usual way to show it doesn't work and when there are a lot of values, it is hard to read, see the picture.
Here you see that it is impossible to read the data.
Do you have an idea to make it work properly?
I have tried autoDateLocator, but that didn't work. The axis values were wrong.
I also tried to do that manually, but the same result happened.
Thanks in advance!

I have kinda 'solved this issue'. I asked around and didn't really find an answer that suited my needs, as every tick always gets shown.
However, when I keep the xtick_labels in datetime format, matplotlib sorts them out itself if there are too many values to show. This is based on the dataset Armamatita provided:
import matplotlib.pyplot as plt
import datetime
import numpy as np
x = np.array([datetime.datetime(i,1,1) for i in range(1700,2017)])
y = np.random.randint(0,100,len(x))
fig, ax = plt.subplots()
ax.plot(x,y)
plt.show()
When the number of days I want to see is more than 14, I just let matplotlib pick the xtick_labels. When it is 14 or fewer, I set them myself using:
from matplotlib.dates import DayLocator, DateFormatter
alldays = DayLocator()
weekFormatter = DateFormatter('%a %b %d %Y')
and this right before returning the fig:
ax.xaxis.set_major_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
It isn't the neatest, most Pythonic code, but it does the trick.
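Put together, the approach described above could be sketched as follows. The function name plot_days, the max_daily_ticks parameter, and the synthetic data are hypothetical; only DayLocator, DateFormatter, and the threshold of 14 come from the answer. The Agg backend is used only so the code runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter, DayLocator


def plot_days(n_days, max_daily_ticks=14):
    """Plot n_days of synthetic data; label every day only for short ranges."""
    start = datetime.datetime(2017, 1, 1)
    x = [start + datetime.timedelta(days=i) for i in range(n_days)]
    y = np.arange(n_days)

    fig, ax = plt.subplots()
    ax.plot(x, y)

    if n_days <= max_daily_ticks:
        # Few enough days: one tick per day with a verbose label.
        ax.xaxis.set_major_locator(DayLocator())
        ax.xaxis.set_major_formatter(DateFormatter("%a %b %d %Y"))
    # Otherwise keep the locator matplotlib chose for datetime data.

    fig.autofmt_xdate()  # rotate the labels so they do not overlap
    return fig, ax


short_fig, short_ax = plot_days(10)
long_fig, long_ax = plot_days(60)
```

For the 60-day case the axis keeps whatever date locator matplotlib selected automatically, which thins the ticks as the range grows.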

Python Bokeh: Plotting same chart multiple times in gridplot

I'm currently trying to get an overview of plots of data from different dates. To get a good feeling for the data, I would like to show relevant plots next to each other. This means I want to use the same plot multiple times in the gridplot command. However, I noticed that when I use the same chart multiple times, it only shows up once in the final .html file. My first attempt at solving this was to use copy.deepcopy on the charts, but this gave the following error:
RuntimeError: Cannot get a property value 'label' from a LineGlyph instance before HasProps.__init__
My approach has been as follows:
from bokeh.charts import Line, output_file, show, gridplot
import pandas as pd
output_file('test.html')
plots = []
df = pd.DataFrame([[1,2], [3,1], [2,2]])
print(df)
df.columns = ['x', 'y']
for i in range(10):
    plots.append(Line(df, x='x', y='y', title='Forecast: ' + str(i),
                      plot_width=250, plot_height=250))
plot_matrix = []
for i in range(len(plots)-1, 2, -1):
    plot_matrix.append([plots[i-3], plots[i-2], plots[i]])
p = gridplot(plot_matrix)
show(p)
The result is an HTML page with a grid plot that has a lot of missing graphs. Each graph is shown exactly once (instead of the 3 times required), which leads me to think that gridplot does not like me using the same object multiple times. An obvious fix is to simply create every graph 3 times as a different object, which I will do for now, but not only is this inefficient, it also hurts my eyes when looking at my code. I'm hoping somebody has a more elegant solution to my problem.
EDIT: made code runnable

This is not possible. Bokeh plots (or Bokeh objects in general) may not be re-used in layouts.

How to output a large number of histograms in a pandas groupby

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist in my code below with describe, it perfectly gives me 100 describe series. Also, the type of the grouper.get_group(days) object is pandas.series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print grouper.get_group(days).hist()

One option would be to use inline plotting in either the IPython qtconsole or the IPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()

Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
Any function added to the end of the groupby is applied to every group, which is the strength of groupby.
There is no need to iterate.
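For the second part of the question, saving each histogram to its own file, a minimal sketch could look like the one below. It is not taken from either answer: the data is synthetic, and the temporary output directory and the hist_day_<n>.png naming scheme are assumptions chosen for illustration. The Agg backend is used only so the code runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: 3 days with 50 observations each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "days": np.repeat(np.arange(1, 4), 50),
    "data": rng.normal(size=150),
})

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
saved = []

# Iterating over a groupby yields (key, group) pairs.
for day, values in df.groupby("days")["data"]:
    fig, ax = plt.subplots()   # a fresh figure per day avoids stacked histograms
    ax.hist(values, bins=20)
    ax.set_title(f"Day {day}")
    path = out_dir / f"hist_day_{day}.png"
    fig.savefig(path)
    plt.close(fig)             # release the figure so 100 days do not pile up in memory
    saved.append(path)
```

Closing each figure after saving matters once the loop covers all 100 days, since matplotlib otherwise keeps every figure open.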
