How do I plot weather data from two data sets on one bar graph using python?

Python newbie here. I'm looking at some daily weather data for a couple of cities over the course of a year. Each city has its own csv file. I'm interested in comparing the count of daily average temperatures between two cities in a bar graph, so I can see (for example) how often the average temperature in Seattle was 75 degrees (or 30 or 100) compared to Phoenix.
I'd like a bar graph with side-by-side bars, with temperature on the x-axis and count on the y-axis. I've been able to get a bar graph of each city separately with this data, but I don't know how to get both cities on the same bar chart with a different color for each city. It seems like it should be pretty simple, but my hours of searching haven't gotten me a good answer yet.
Suggestions please, oh wise stackoverflow mentors?
Here's what I've got so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("KSEA.csv")
df2 = pd.read_csv("KPHX.csv")
df["actual_mean_temp"].value_counts(sort=False).plot(kind ="bar")
df2["actual_mean_temp"].value_counts(sort = False).plot(kind = 'bar')

You can concat DataFrames, assigning city as a column, and then use histplot in seaborn:
import seaborn as sns
z = pd.concat([
    df[['actual_mean_temp']].assign(city='KSEA'),
    df2[['actual_mean_temp']].assign(city='KPHX'),
])
ax = sns.histplot(data=z, x='actual_mean_temp', hue='city',
                  multiple='dodge', binwidth=1)
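If you would rather stay with pandas and matplotlib, here is a minimal sketch of the same idea. It assumes every temperature value should get its own pair of bars. The two small DataFrames are made-up stand-ins for KSEA.csv and KPHX.csv; only the actual_mean_temp column name comes from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace with pd.read_csv("KSEA.csv") etc.
sea = pd.DataFrame({"actual_mean_temp": [50, 50, 55, 60, 60, 60]})
phx = pd.DataFrame({"actual_mean_temp": [60, 75, 75, 90, 90, 90]})

# Count the days per temperature for each city and line the counts up
# on a shared index; the dict keys become the column names.
counts = pd.concat(
    {
        "KSEA": sea["actual_mean_temp"].value_counts(),
        "KPHX": phx["actual_mean_temp"].value_counts(),
    },
    axis=1,
)
# Temperatures seen in only one city get a count of 0 for the other.
counts = counts.fillna(0).astype(int).sort_index()

# One group of side-by-side bars per temperature, one colour per city.
ax = counts.plot(kind="bar")
ax.set_xlabel("actual_mean_temp")
ax.set_ylabel("count")
```

Because both counts share one index, a temperature that only occurs in one city shows up as a zero-height bar for the other. In a script, finish with plt.show(); in a notebook the figure renders on its own.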

Related

Python Visualisation Not Plotting Full Range of Data Points

I'm just starting out on using Python and I'm using it to plot some points through Power BI. I use Power BI as part of my work anyway and this is for an article I'm writing alongside learning. I'm aware Power BI isn't the ideal place to be using Python :)
I have a dataset of average banana prices since 1995 (https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1132096/bananas-30jan23.csv)
I've managed to turn that into a nice line chart which plots the average for each month but only shows the yearly labels. The chart is really nice and I'm happy with it, other than the fact that it isn't plotting anything before 1997 or after 2020 even though the data covers a wider date range. Earlier visualisations without the x-axis label grouping plotted all the points, but with this change it no longer works.
ChatGPT got me going in circles that never resolved the issue, so I suspect the problem may lie in my understanding of Python. If anyone could help me understand the issue that would be brilliant. I can provide more information if that helps:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Convert the 'Date' column to a datetime format
dataset['Date'] = pd.to_datetime(dataset['Date'])
# Group the dataframe by month and calculate the average price for each month
monthly_average = dataset.groupby(dataset['Date'].dt.strftime('%B-%Y'))['Price'].mean()
# Plot the monthly average price against the month using seaborn
ax = sns.lineplot(x=monthly_average.index, y=monthly_average.values)
# Find the unique years in the dataset
unique_years = np.unique(dataset['Date'].dt.year)
# Set the x-axis tick labels to only be the unique years
ax.xaxis.set_ticklabels(unique_years)
ax.xaxis.set_major_locator(plt.MaxNLocator(len(unique_years)))
# Show the plot
plt.show()
Resulting Chart

DataFrame.plot multiple lines in a figure

I have a question about DataFrame.plot(). I want a line graph with the Years on the x-axis and the Shipments on the y-axis, with each Quarter drawn as a separate series in the same graph. Please help me, I'm new.
import pandas as pd
import matplotlib.pyplot as plt
df_ship = pd.read_csv('dmba/ApplianceShipments.csv') #importing the csv for the exercise
#Splitting the quarters and the years into two different columns
df_ship[['Quarter','Year']]= df_ship.Quarter.str.split('-',expand=True)
#Grouping the dataframe by Quarter
df_filt = df_ship.groupby(by=['Quarter'],axis=0)
#Creating the plot figure
fig1 = df_filt.plot(x='Year',y='Shipments')
As output I have 4 different line graphs
It would be helpful to have an extract of your dataset but does the code below do what you want?
import seaborn as sns
sns.lineplot(x='Year',y='Shipments', hue='Quarter', data=df_ship)
From Seaborn documentation:
hue : vector or key in data
Grouping variable that will produce lines with different colors. Can be either categorical or numeric, although color mapping will behave differently in latter case.
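If you'd like to stay with DataFrame.plot(), a possible sketch is to pivot the frame so that each quarter becomes its own column; plot() then draws one line per column on a single axes. The shipment numbers below are invented for illustration, and the 'Q1-1985' label format is an assumption based on the split('-') in the question.

```python
import pandas as pd

# Hypothetical sample in the assumed shape of ApplianceShipments.csv.
df_ship = pd.DataFrame({
    "Quarter": ["Q1-1985", "Q2-1985", "Q3-1985", "Q4-1985",
                "Q1-1986", "Q2-1986", "Q3-1986", "Q4-1986"],
    "Shipments": [4009, 4321, 4224, 3944, 4123, 4522, 4657, 4030],
})

# Split "Q1-1985" into a quarter label and a numeric year.
df_ship[["Quarter", "Year"]] = df_ship["Quarter"].str.split("-", expand=True)
df_ship["Year"] = df_ship["Year"].astype(int)

# Rows are years, columns are quarters, cells are shipments.
wide = df_ship.pivot(index="Year", columns="Quarter", values="Shipments")

# One line per quarter, all on the same axes.
ax = wide.plot(marker="o")
ax.set_ylabel("Shipments")
```

Plotting the groupby object, as in the question, produces one figure per group, which is why four separate graphs appeared.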

No Output: Bar Graph Using Matplotlib

I have a df of Airbnb data where each row represents an Airbnb listing. I am trying to plot two columns as a bar plot using Matplotlib.
fig,ax= plt.subplots()
ax.bar(airbnb['neighbourhood_group'],airbnb['revenue'])
plt.show()
What I think is that this graph should plot every neighbourhood group on the x-axis and the average revenue per neighbourhood group on the y-axis (by default a bar graph takes the mean value per category).
This line of code keeps running without giving me any error, as if it had entered an infinite loop.
Can someone please suggest what could be wrong?
In the following I have used a sample DataFrame, since none was provided.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample DataFrame
y = np.random.rand(10,2)
y[:,0]= np.arange(10)
df = pd.DataFrame(y, columns=["neighbourhood_group", "revenue"])
Keep in mind that np.random gives different values for the revenue column every time you start the program.
# bar plot
ax = df.plot(x="neighbourhood_group", y="revenue", kind="bar")
Regarding your statement that your code runs as if it were in a loop: could it be that the DataFrame holds too much data to draw the bar chart in a reasonable time? To say that for sure, you would have to provide us with a dataset.
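One more thing to check, as a sketch under assumptions: ax.bar does not average anything. It draws one bar for every row it is given, which may be why the call appears to hang on a large listing table. Aggregating first leaves one bar per neighbourhood group. The tiny airbnb frame below is invented; only the column names come from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical listings; the real frame would have one row per listing.
airbnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Brooklyn"],
    "revenue": [200.0, 100.0, 300.0, 80.0, 140.0],
})

# Compute the mean revenue per group explicitly.
mean_revenue = airbnb.groupby("neighbourhood_group")["revenue"].mean()

# Now there is exactly one bar per neighbourhood group.
fig, ax = plt.subplots()
ax.bar(mean_revenue.index, mean_revenue.values)
ax.set_xlabel("neighbourhood_group")
ax.set_ylabel("mean revenue")
```

seaborn's barplot does aggregate per category by default, which may be where the expectation of an automatic mean came from.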

Seaborn | Matplotlib: Scatter plot: 3rd variable is encoded by both size and colour

I want to create a plot that shows the geographical distribution of nightly prices using longitude and latitude as coordinates, with the price encoded both by the colour and the size of the circles. I currently have no idea how to encode the points by price in both colour and size. I come to you in search of help. I don't understand the seaborn documentation for this scenario.
3 columns of interest:
longitude latitude price
50.1235156 4.1236436 160
52.3697862 4.8935462 300
52.3640489 4.8895343 8000
52.3729765 4.8931707 1300
52.3657530 4.8796741 5000
52.2957663 4.3058365 60
52.6709324 4.6028347 100
In my scenario each column is of equal length, but I only want to include prices that are >150.
I'm stuck with this filter in play, as the column with the filter applied is half the size of longitude and latitude.
My clueless attempt:
plt.scatter(df.longitude, df.latitude, s=(df.price)>150, c= (df.price)>150)
The way I understand it is that the latitude and longitude create the space/plane, and then apply the price data. But implementing it seems to work differently?
First of all, you need to filter the dataframe before plotting. In your attempt, (df.price)>150 does not remove any rows: it evaluates to a boolean Series (True/False for every row), so s and c receive booleans rather than prices. And if you filter only the price column, it ends up shorter than the x and y coordinate Series, which still span the entire dataframe.
Secondly, matplotlib's scatter expects numeric values for s and c. It can take the price column for both, but I'd suggest using seaborn for simplicity, since it maps a column to size and colour by name and builds the legend for you.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df_plot = df.loc[df.price > 150]
ax = sns.scatterplot(data=df_plot, x='longitude', y='latitude', size='price', hue='price')
plt.show()
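For comparison, a rough matplotlib-only sketch: scatter accepts numeric arrays for both s (marker area) and c (colour). The coordinates and prices are the sample rows from the question, with the column names used in the attempt. Dividing the price by 20 is an arbitrary scaling I picked to keep the markers readable.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows from the question, in the column order given there.
df = pd.DataFrame({
    "longitude": [50.1235156, 52.3697862, 52.3640489, 52.3729765,
                  52.3657530, 52.2957663, 52.6709324],
    "latitude": [4.1236436, 4.8935462, 4.8895343, 4.8931707,
                 4.8796741, 4.3058365, 4.6028347],
    "price": [160, 300, 8000, 1300, 5000, 60, 100],
})

# Filter whole rows first so x, y, size and colour stay the same length.
df_plot = df[df["price"] > 150]

fig, ax = plt.subplots()
points = ax.scatter(
    df_plot["longitude"],
    df_plot["latitude"],
    s=df_plot["price"] / 20,  # marker area; the divisor is an arbitrary choice
    c=df_plot["price"],       # colour mapped through the colormap
    cmap="viridis",
)
fig.colorbar(points, ax=ax, label="price")
```

The colorbar plays the role of seaborn's hue legend. A size legend would need extra code, which is one reason the seaborn version is shorter.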

Visualize correlation between two columns in python

I have two columns. The first column contains data related to salary and second column contains data related to house_rent of employees. Now using python I want to find the correlation between the two. Is there some way in Python to visualize the correlation:
Salary house_rent
10000 50
10000 50
3000 465
The focus of this task is to find how correlated salary and house rent of employees are. E.g. some employees may have huge salary but small house rent and some others may have small salary and huge house rent. Note that it can very well be the case that two people have the same salary and house rent. Is it possible to visualize this in python?
You can plot a linear regression line using sklearn.linear_model.LinearRegression :
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
You can also build a correlation matrix and display it using pyplot.matshow() from matplotlib:
import matplotlib.pyplot as plt
plt.matshow(dataframe.corr())
plt.show()
As already mentioned, you can use the corr method in pandas to get the correlation.
A better way to visualize it would be to use the seaborn library instead of matplotlib.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
df= pd.read_csv('path_to_your_csv_file')
g = sns.pairplot(df)
plt.show()
For further details, refer to https://seaborn.pydata.org/generated/seaborn.pairplot.html
and
https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166
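To put a number on it as well as a picture, here is a small sketch that computes the Pearson coefficient with Series.corr and draws a scatter plot with a fitted line via seaborn's regplot. The first three rows are from the question; the remaining rows are made up so that there is something to see.

```python
import pandas as pd
import seaborn as sns

# First three rows come from the question; the rest are hypothetical.
df = pd.DataFrame({
    "Salary": [10000, 10000, 3000, 5000, 8000, 4000],
    "house_rent": [50, 50, 465, 300, 120, 400],
})

# Pearson correlation between the two columns (a value between -1 and 1).
r = df["Salary"].corr(df["house_rent"])

# Scatter plot with a fitted regression line; identical rows overlap.
ax = sns.regplot(data=df, x="Salary", y="house_rent")
ax.set_title(f"Pearson r = {r:.2f}")
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. Duplicate salary and rent pairs are fine; they simply plot on top of each other.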
