Python Visualisation Not Plotting Full Range of Data Points - python

I'm just starting out on using Python and I'm using it to plot some points through Power BI. I use Power BI as part of my work anyway and this is for an article I'm writing alongside learning. I'm aware Power BI isn't the ideal place to be using Python :)
I have a dataset of average banana prices since 1995 (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1132096/bananas-30jan23.csv)
I've managed to turn that into a nice line chart which plots the average for each month but only shows yearly labels. I'm happy with the chart, except that it isn't plotting anything before 1997 or after 2020 even though the data extends beyond that range. Earlier visualisations without the grouped x-axis labels plotted all the points, but this version does not.
ChatGPT sent me round in circles without resolving the issue, so I suspect the problem lies in my understanding of Python. If anyone could help me understand the issue that would be brilliant. I can provide more information if that helps:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Convert the 'Date' column to a datetime format
dataset['Date'] = pd.to_datetime(dataset['Date'])
# Group the dataframe by month and calculate the average price for each month
monthly_average = dataset.groupby(dataset['Date'].dt.strftime('%B-%Y'))['Price'].mean()
# Plot the monthly average price against the month using seaborn
ax = sns.lineplot(x=monthly_average.index, y=monthly_average.values)
# Find the unique years in the dataset
unique_years = np.unique(dataset['Date'].dt.year)
# Set the x-axis tick labels to only be the unique years
ax.xaxis.set_ticklabels(unique_years)
ax.xaxis.set_major_locator(plt.MaxNLocator(len(unique_years)))
# Show the plot
plt.show()
Resulting Chart
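One possible cause, offered as a sketch rather than a confirmed diagnosis: grouping by strftime('%B-%Y') produces string keys, which sort alphabetically ("April-1995", "April-1996", ...) rather than chronologically, so the x-axis is no longer a real time axis and the year labels attached afterwards need not match the plotted positions. Keeping the index as datetimes and letting matplotlib place one tick per year avoids this. The data below is a hypothetical stand-in for the Power BI dataset, and resample('MS') is swapped in for the original groupby.

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Power BI `dataset`: weekly prices, 1995 to 2022.
dates = pd.date_range("1995-01-02", "2022-12-26", freq="W-MON")
dataset = pd.DataFrame({"Date": dates, "Price": np.linspace(0.5, 1.2, len(dates))})

dataset["Date"] = pd.to_datetime(dataset["Date"])

# Monthly mean with a real DatetimeIndex (month starts), so order stays chronological.
monthly_average = dataset.set_index("Date")["Price"].resample("MS").mean()

fig, ax = plt.subplots()
ax.plot(monthly_average.index, monthly_average.values)

# One tick per year, labelled with the four-digit year.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
fig.autofmt_xdate()
# plt.show()  # uncomment when running in Power BI or interactively
```

Because the ticks come from the dates themselves, there is no need to compute the unique years or set tick labels by hand.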

Related

How to Add Candlestick Pattern Marker on Stock Close Price Data Chart using Python's Matplotlib?

#Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
import talib
plt.style.use('fivethirtyeight')
#Collecting Data from Yahoo Finance
stock= 'COALINDIA.NS'
df= yf.download(stock,period='1mo', interval='1d')
#Using talib library to find instances where DOJI Candlestick Pattern showed up
doji = talib.CDLDOJI(df['Open'], df['High'], df['Low'], df['Close'])
df['Doji']= doji
#Using a variable to store all instances where the DOJI Candlestick Pattern was true (i.e. value > 0)
DojiSignal=[]
for i in range(0, len(df['Doji'])):
    if df['Doji'][i] > 0:
        DojiSignal.append(df['Doji'][i])
#Plotting Close price of the stock along with the days where DOJI Candlestick Pattern showed up (i.e. was TRUE)
figure, ax = plt.subplots(figsize=(13, 6))
plt.plot(df['Close'], markevery= DojiSignal)
plt.show()
Hello,
I'm trying to do the following:
1. Pull stock data from Yahoo Finance.
2. Store the data in a dataframe (df).
3. Use the DOJI function from the talib library to find whether a particular trading day had a DOJI Candle Pattern show up.
4. Add all the days where the condition held, i.e. the DOJI Pattern showed up (value != 0), to a list.
5. Finally, plot a chart of the 'Close Price' of the stock and mark the days where the DOJI Candle Pattern was true using the '*' marker.
Points 4 and 5 are where I believe I'm struggling.
I'd really appreciate it if you could please help me out with some explanation as to what needs to be improved?
Thank you
Full disclosure: I am the maintainer of the python library mplfinance (MatPlotLib finance), and I would recommend using that library to plot signal markers on stock market data.
Here is an example plot of stock market data with signal markers:
To make a similar plot with mplfinance for your code example above:
1. Set the value of the Doji signal to a price slightly less than the Low price on the day of the signal. This causes the signal marker to appear just under the OHLC bar or Candlestick (as shown in the above example plot).
2. Set all non-signal values in the df['Doji'] column to float('nan') so that no signal marker appears on the plot for those dates.
3. Pass df['Doji'] into the mpf.make_addplot() API, then pass its output to the addplot kwarg of mpf.plot(), as follows:
import mplfinance as mpf
ap = mpf.make_addplot(df['Doji'], type='scatter')
mpf.plot(df, addplot=ap)
You can find basic usage here, and a tutorial on how to add signal markers and other studies here. Hope that helps.
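The first two steps above (a price just below Low on signal days, NaN everywhere else) can be sketched with pandas alone. The helper name doji_marker_prices, the 1% offset, and the sample OHLC and signal values are all hypothetical, standing in for the yfinance download and the talib output.

```python
import pandas as pd

def doji_marker_prices(df, doji, offset=0.01):
    """Return a price just below Low on signal days and NaN on all other days."""
    below_low = df["Low"] * (1 - offset)
    # Series.where keeps values where the condition holds and puts NaN elsewhere.
    return below_low.where(doji != 0)

# Hypothetical OHLC data and signal values (talib's CDLDOJI returns 0 or 100).
idx = pd.date_range("2023-01-02", periods=5, freq="B")
df = pd.DataFrame(
    {
        "Open": [10, 11, 12, 11, 10],
        "High": [12, 12, 13, 12, 11],
        "Low": [9, 10, 11, 10, 9],
        "Close": [11, 11, 12, 11, 10],
    },
    index=idx,
    dtype=float,
)
doji = pd.Series([0, 100, 0, 0, 100], index=idx)

df["Doji"] = doji_marker_prices(df, doji)

# With mplfinance installed, the markers could then be drawn like this:
# import mplfinance as mpf
# ap = mpf.make_addplot(df["Doji"], type="scatter", marker="*", markersize=100)
# mpf.plot(df, type="candle", addplot=ap)
```

Keeping the signal as a full-length column aligned with the price index, rather than a shorter list of hits, is what lets the plotting library place each marker on the right date.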

How do I plot weather data from two data sets on one bar graph using python?

Python newbie here. I'm looking at some daily weather data for a couple of cities over the course of a year. Each city has its own csv file. I'm interested in comparing the count of daily average temperatures between two cities in a bar graph, so I can see (for example) how often the average temperature in Seattle was 75 degrees (or 30 or 100) compared to Phoenix.
I'd like a bar graph with side-by-side bars, with temperature on the x-axis and count on the y-axis. I've been able to get a bar graph of each city separately with this data, but don't know how to get both cities on the same bar chart with a different color for each city. It seems like it should be pretty simple, but hours of searching haven't gotten me a good answer yet.
Suggestions please, oh wise stackoverflow mentors?
Here's what I've got so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("KSEA.csv")
df2 = pd.read_csv("KPHX.csv")
df["actual_mean_temp"].value_counts(sort=False).plot(kind ="bar")
df2["actual_mean_temp"].value_counts(sort = False).plot(kind = 'bar')
You can concat DataFrames, assigning city as a column, and then use histplot in seaborn:
import seaborn as sns
z = pd.concat([
df[['actual_mean_temp']].assign(city='KSEA'),
df2[['actual_mean_temp']].assign(city='KPHX'),
])
ax = sns.histplot(data=z, x='actual_mean_temp', hue='city',
multiple='dodge', binwidth=1)
Output:
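If seaborn is not available, a pandas-only variant is also possible: count each city's temperatures separately, align the two counts on a shared temperature index, and let DataFrame.plot draw grouped bars. This is a sketch with hypothetical readings in place of KSEA.csv and KPHX.csv.

```python
import pandas as pd

# Hypothetical temperature readings standing in for KSEA.csv and KPHX.csv.
df = pd.DataFrame({"actual_mean_temp": [50, 52, 52, 60, 75]})
df2 = pd.DataFrame({"actual_mean_temp": [75, 75, 90, 90, 100]})

# One column of counts per city, aligned on temperature; missing temperatures count as 0.
counts = (
    pd.concat(
        {
            "KSEA": df["actual_mean_temp"].value_counts(),
            "KPHX": df2["actual_mean_temp"].value_counts(),
        },
        axis=1,
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)

# Grouped bars: one group per temperature, one colour per city.
ax = counts.plot(kind="bar")
ax.set_xlabel("actual_mean_temp")
ax.set_ylabel("count")
```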

Smoothing the curve in a line plot - Values interval x axis

I'm trying to recreate the following plot:
With an online tool I could create the dataset (135 data points) which I saved in a CSV file with the following structure:
Year,Number of titles available
1959,1.57480315
1959,1.57480315
1959,1.57480315
...
1971,221.4273356
1971,215.2494175
1971,211.5426666
I created a Python file with the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('file.csv')
df.plot.line(x='Year', y='Number of titles available')
plt.show()
and I'm getting the following plot:
What can I do to get a smooth line like in the original plot?
How can I have the same values in the x axis like in the original plot?
EDIT: I reworked the data set and formatted the dates properly; the plot is now better.
This is how the data set looks now:
Date,Number of available titles
1958/07/31,2.908816952
1958/09/16,3.085527674
1958/11/02,4.322502727
1958/12/19,5.382767059
...
1971/04/13,221.6766907
1971/05/30,215.4918154
1971/06/26,211.7808903
This is the plot I can get with the same code posted above:
The question now is: how can I have the same date range as in the original plot (1958 - mid 1971)?
Try taking the mean of your values grouped by year. This replaces the repeated readings within each year with a single average value and so smooths out the yearly discontinuities. If that does not help, apply a smoothing filter such as a rolling mean.
df.groupby('Year').mean().plot(kind='line')
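As one example of such a filter, here is a minimal sketch using a centred rolling mean on the date-formatted data set, followed by explicit x-axis limits for the 1958 to mid-1971 range. The five rows, the window of 3, and the exact limit dates are illustrative assumptions, not values taken from the original plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows in the shape of the edited data set.
df = pd.DataFrame(
    {
        "Date": ["1958/07/31", "1958/09/16", "1958/11/02", "1958/12/19", "1959/02/04"],
        "Number of available titles": [2.9, 3.1, 4.3, 5.4, 5.0],
    }
)
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d")

series = df.set_index("Date")["Number of available titles"]

# Centred rolling mean; min_periods=1 keeps the end points instead of dropping them.
smooth = series.rolling(window=3, center=True, min_periods=1).mean()

fig, ax = plt.subplots()
ax.plot(smooth.index, smooth.values)

# Force the visible date range (assumed bounds for "1958 - mid 1971").
ax.set_xlim(pd.Timestamp("1958-01-01"), pd.Timestamp("1971-07-01"))
```

A larger window gives a smoother curve at the cost of flattening genuine peaks, so the window size is worth tuning against the original plot.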

Plotting line plot with groupby in matplotlib/seaborn?

I have the following dataset (abbreviated, but still conveys the same idea). I want to show how user score changes over time (the postDate conveys time). The data is also presorted by postDate. The hope is to see a nice plot (perhaps using seaborn if possible) that has the score as the y-axis, time as the x-axis, and shows the users' scores over time (with a separate line for each user). Do I need to convert the postDate (currently a string) to another format in order to plot nicely? Thank you so much!
userID postDate userScore (1-10 scale)
Mia1 2017-01-11 09:07:10.616328+00:00 8
John2 2017-01-17 08:05:45.917629+00:00 6
Leila1 2017-01-22 07:47:67.615628+00:00 9
Mia1 2017-01-30 03:45:50.817325+00:00 7
Leila 2017-02-02 06:38:01.517223+00:00 10
Based on the sample data you show, your postDate series appears to already hold pandas datetime values (if it is still a string, convert it with pd.to_datetime first). To plot with dates on the x-axis in matplotlib, the key is to use plot_date rather than plot. Something like this:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
for key, g in df.groupby('userID'):
    ax.plot_date(g['postDate'], g['userScore'], label=key)
ax.legend()
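A self-contained variant of the same idea, as a sketch: it converts the string column explicitly with pd.to_datetime and uses plain ax.plot, which also handles datetime values (plot_date is discouraged in recent Matplotlib releases). The rows are hypothetical, modelled on the sample in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows modelled on the question's sample data.
df = pd.DataFrame(
    {
        "userID": ["Mia1", "John2", "Leila1", "Mia1", "Leila1"],
        "postDate": [
            "2017-01-11 09:07:10.616328+00:00",
            "2017-01-17 08:05:45.917629+00:00",
            "2017-01-22 07:47:07.615628+00:00",
            "2017-01-30 03:45:50.817325+00:00",
            "2017-02-02 06:38:01.517223+00:00",
        ],
        "userScore": [8, 6, 9, 7, 10],
    }
)

# Convert the strings to timezone-aware datetimes.
df["postDate"] = pd.to_datetime(df["postDate"], utc=True)

fig, ax = plt.subplots()
# One line per user, labelled with the user ID.
for key, g in df.groupby("userID"):
    ax.plot(g["postDate"], g["userScore"], marker="o", label=key)
ax.set_xlabel("postDate")
ax.set_ylabel("userScore")
ax.legend()
fig.autofmt_xdate()
```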
I've used plotly before; it's a really nice option for interactive visualizations if you are using Jupyter Notebook. You can generate HTML files or plot inline in Jupyter with cufflinks. You only pay if you want your graphs hosted somewhere, and I use it for free for my own data analysis.
Install plotly and also cufflinks; cufflinks lets you plot pandas dataframes almost instantly.
For example you could do:
your_df.iplot(x='postDate', y='userScore')
This will automatically give you the time series you describe.

Plotting multiple timeseries power data using matplotlib and pandas

I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Power_Log.csv',names=['DT','Station','Power'])
df2=df.groupby('Station') # group the rows by 'Station'
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT',y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, something like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('Station_Power.png')
( http://matplotlib.org/users/pyplot_tutorial.html )
Re more subplots: simply call plt.plot() multiple times, once for each data series.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')
This does more or less what I want, except that the DateFormatter line does not work. I would like to shorten my datetime data to show just the date. If it places ticks at midnight that would be brilliant, but it is not strictly necessary.
The key to getting a continuous plot is to use the interpolate() method in the plot.
Because this data has different x scales from station to station, a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale, even though its datetime values cover more or less the same range as the other HUTs.
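A possible explanation for both symptoms, offered as a sketch rather than a confirmed diagnosis: read_csv leaves the DT column as strings unless asked to parse it, so each station is plotted against row positions rather than real times, and a date formatter has nothing sensible to format. Separately, in strftime codes %M means minutes and %d/%m/%Y is the day/month/year pattern. The example below parses the dates on load and draws one subplot per station, using the excerpt from the question inlined in place of Power_Log.csv.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Excerpt from the question, inlined so the sketch runs without Power_Log.csv.
csv_text = """2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
"""

# parse_dates turns the DT column into real datetimes instead of strings.
df = pd.read_csv(
    io.StringIO(csv_text), names=["DT", "Station", "Power"], parse_dates=["DT"]
)

stations = sorted(df["Station"].unique())
fig, axes = plt.subplots(
    len(stations), 1, figsize=(8, 3 * len(stations)), squeeze=False
)

# One independent subplot per station, each with its own time axis.
for ax, stn in zip(axes[:, 0], stations):
    g = df[df["Station"] == stn]
    ax.plot(g["DT"], g["Power"], marker="o")
    ax.set_title(stn)
    ax.xaxis.set_major_locator(mdates.DayLocator())  # ticks at midnight
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
fig.tight_layout()
```

With this short excerpt, which spans only a few hours, no midnight falls inside the plotted range, so the day ticks only become visible on a log covering several days. Passing sharex=True to plt.subplots would instead put all stations on a common timeline.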
