nan being displayed as label in histogram for Y axis - python

I am new to Python and to visualization, and I could not find the right answer after some research.
I have a CSV file whose first column holds country names and whose remaining columns hold numerical data. I am trying to plot a horizontal histogram (bar chart) with the countries on the y axis and the values of the first numeric column on the x axis. With the code below, however, the y tick labels show "nan" instead of the country names. How can I make sure the yticks show the country names and not nan?
Click here for image of the plot diagram
My code is below (plotting only the first 5 rows):
import numpy as np
import matplotlib.pyplot as plt
my_data = np.genfromtxt('c:\drinks.csv', delimiter=',')
countries = my_data[0:5,0]
y_pos = np.arange(len(countries))
plt.figure()
plt.barh(y_pos, my_data[0:5:,1])
plt.yticks(y_pos, countries)
plt.show()
Here is the link to the csv file

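This works, but you have a lot of countries on the y axis. I don't know whether you plan to plot only a few of them.
import numpy as np
import matplotlib.pyplot as plt

# Read the file as plain text so the country names stay strings
with open("drinks.csv") as file:
    lines = file.readlines()

# First column holds the names, second column the values (first 10 lines only)
countries = [line.split(",")[0] for line in lines[0:10]]
my_data = [int(line.split(",")[1]) for line in lines[0:10]]

plt.figure()
y_pos = np.arange(len(countries))
plt.barh(y_pos, my_data)
plt.yticks(y_pos, countries)
plt.show()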
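For background on where the "nan" labels come from: np.genfromtxt parses every field as a float by default, so text such as a country name cannot be converted and is stored as nan. The sketch below shows this on a small in-memory sample and reads the name column separately as strings. The sample rows are made up for illustration and assume no header row, so the real drinks.csv may need adjusting (for example with skip_header=1).

```python
import io
import numpy as np

# Hypothetical sample standing in for drinks.csv (assumed: no header row)
sample = "Albania,89\nAlgeria,25\nAndorra,245\n"

# Default dtype is float, so the text column cannot be parsed and becomes nan
as_float = np.genfromtxt(io.StringIO(sample), delimiter=",")

# Reading each column with a suitable dtype keeps the names intact
countries = np.genfromtxt(
    io.StringIO(sample), delimiter=",", usecols=0, dtype=str, encoding="utf-8"
)
values = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=1)

print(as_float[:, 0])   # the name column parsed as floats
print(countries)        # the name column parsed as strings
```

With countries and values loaded this way, the plt.barh and plt.yticks calls from the question can stay as they are.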

Related

Plotting a nice graph with 3000 rows in dataset with matplotlib

I have a DataFrame (3440 rows x 2 columns) with two integer columns. I need to plot it with strain on the y axis and time on the x axis so that it matches the expected plot (shown below as a link). I am running into several visual problems that I hope you can help me with, because I am very weak at visualization in Python.
Here is the datasource:
Here is the expected plot:
Here is result:
Here is my code:
df=pd.read_csv('https://www.gw-openscience.org/GW150914data/P150914/fig2-unfiltered-waveform-H.txt')
df= df['index'].str.split(' ', expand=True)
df.coulumns=['time (s)','strain (h1)']
x=df['time'][:200]
y=df['strain'][:200]
plt.figure(figsize=(14,8))
plt.scatter(x,y,c='blue')
plt.show()
Note: I have tried seaborn, but the result was the same. I also tried narrowing the data down to 200 rows, but the result still differs from the expected plot.
I would appreciate any help. Thank you very much!
The following works for me. I'm skipping the first row, because the column labels are not separated correctly. Furthermore, while loading the data I indicate that the columns are separated by a space.
I don't think that the file contains the data to plot the "reconstructed" line.
import pandas as pd
import matplotlib.pyplot as plt
# read the csv file, skip the first row, columns are separated by ' '
# header=None keeps the first data row from being consumed as column names
df = pd.read_csv('fig2-unfiltered-waveform-H.txt', skiprows=1, sep=' ', header=None)
# add proper column names
df.columns = ['index', 'strain (h1)']
# extract the index & strain variables
index = df['index']
strain = df['strain (h1)']
# plot the figure
plt.figure(figsize=(14, 8))
plt.plot(index, strain, c='red', label='numerical relativity')
# label the y axis and show the legend
plt.ylabel('strain (h1)')
plt.legend(loc="upper left")
plt.show()
This is the resulting plot:
The same with seaborn, once you've imported the data with pandas:
import seaborn as sns
sns.lineplot(data = df, x="index", y="strain (h1)", color='red')
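As an alternative way to load such a file, pandas can split on any run of whitespace and skip comment lines, which avoids hand-splitting a single string column. This is only a sketch: the sample text, the assumption that the header line starts with '#', and the column names are illustrative and may not match the real file.

```python
import io
import pandas as pd

# Hypothetical excerpt; the real file's header and spacing may differ
sample = "# time (s) strain (h1)\n0.25 -1.1e-21\n0.5  2.3e-21\n0.75 -0.4e-21\n"

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",      # split on one or more whitespace characters
    comment="#",     # ignore the header line, assumed to start with '#'
    header=None,     # do not consume a data row as column names
    names=["time (s)", "strain (h1)"],
)

# Both columns are parsed as floats, so they can be passed straight to plt.plot
print(df.dtypes)
```

Because the columns arrive as numbers rather than strings, a line plot of the two columns then follows the time axis by value.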

Plot histogram from two columns of csv using pandas

I have a csv file containing two columns. What I'd like to do is to plot a histogram based on these two columns.
My code is as follows:
data = pd.read_csv('data.csv')
My csv data is made like this:
Age Blood Pressure
51 120
.. ...
I tried plt.hist(data['Age'], bins=10), which only gives me a histogram of the first column's values and their frequencies; the same goes for the second column.
Is there a way to plot a histogram that shows "Age" on the x axis and "Blood Pressure" on the y axis?
If it actually makes sense for you, you can change the orientation of the second plot:
plt.hist(data['Age'], bins=10, alpha=.5)
plt.hist(data['Blood Pressure'], bins=10, alpha=.5, orientation='horizontal')
plt.show()
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(data['Age'], data['Blood Pressure'])
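A histogram counts a single variable, so putting Age on the x axis and Blood Pressure on the y axis needs some kind of aggregation. One possible reading of the question, offered here as an assumption, is "mean blood pressure per age group". The sketch below bins the ages with pd.cut and averages the blood pressure in each bin; the sample values and the bin edges are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the shape described in the question
data = pd.DataFrame({
    "Age": [25, 31, 38, 44, 51, 57, 63, 69],
    "Blood Pressure": [110, 115, 118, 122, 120, 130, 135, 140],
})

# Group ages into 20-year bins (edges chosen for illustration only)
age_bins = pd.cut(data["Age"], bins=[20, 40, 60, 80])

# Average the blood pressure of the rows that fall into each age bin
mean_bp = data.groupby(age_bins, observed=True)["Blood Pressure"].mean()
print(mean_bp)
```

Calling mean_bp.plot.bar() would then draw one bar per age bin, with the bar height showing the mean blood pressure.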

A problem of python plot for large number of data

I am new to Python and am trying to plot a color-magnitude diagram (CMD) for a selected cluster with matplotlib. There are 3,400,000 stars to plot; for each star, the color goes on the x axis and the magnitude on the y axis. My code reads two columns from a CSV file and plots them. When I use part of the data (3000 stars), the CMD plots successfully, but when I use all the data the plot is a mess (see the figure below), and the points seem to be placed by their position in the column instead of by their value. For example, a point with data (0.92, 20.64) should be close to the y axis, but it is actually located at the far right of the plot just because it sits near the end of the dataset. How can I plot the entire dataset and get a plot like the first figure? Thanks for your time. This is my code:
import matplotlib.pyplot as plt
import pandas as pd
import csv
data = pd.read_csv(r'C:\Users\Peter\Desktop\F275W test.csv', low_memory=False)
# Select the color (x) and magnitude (y) columns
x = data['F275W-F336W']
y = data['F275W']
#remove the axis
plt.axis('off')
plt.plot(x,y, ',')
plt.show()
This is the plot I got for 3000 stars; it is a CMD.
This is the plot I got for the entire dataset, which is very messy.
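One possible cause, offered as an assumption because the data file is not shown: if the full file contains even one non-numeric entry in a column (a placeholder such as a missing-value marker), pandas reads that whole column as text, and matplotlib then treats the strings as categories placed in order of appearance rather than by value. The sketch below reproduces that situation on a made-up sample (the "INDEF" marker is hypothetical) and converts the columns with pd.to_numeric.

```python
import io
import pandas as pd

# Hypothetical sample; "INDEF" stands in for any non-numeric placeholder
sample = "F275W-F336W,F275W\n1.35,22.10\nINDEF,21.50\n0.92,20.64\n"
data = pd.read_csv(io.StringIO(sample))

# The color column is not numeric because of the placeholder entry
print(data.dtypes)

# Convert to numbers; entries that cannot be parsed become NaN
x = pd.to_numeric(data["F275W-F336W"], errors="coerce")
y = pd.to_numeric(data["F275W"], errors="coerce")

# Keep only the rows where both values are valid numbers
valid = x.notna() & y.notna()
x, y = x[valid], y[valid]
```

Checking data.dtypes on the full file is a quick way to see whether this applies. If both columns are already numeric, the cause lies elsewhere.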

Grid of plots with lines overplotted in matplotlib

I have a dataframe that consists of a bunch of x,y data that I'd like to see in scatter form along with a line. The dataframe consists of data with its form repeated over multiple categories. The end result I'd like to see is some kind of grid of the plots, but I'm not totally sure how matplotlib handles multiple subplots of overplotted data.
Here's an example of the kind of data I'm working with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

category = np.arange(1, 10)
frames = []
for i in category:
    x = np.arange(0, 100)
    y = 2*x + 10
    data = np.random.normal(0, 1, 100) * y
    frames.append(pd.DataFrame({'x': x, 'y': y, 'data': data, 'category': i}))
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
total_data = pd.concat(frames)
We have x data, we have y data which is a linear model of some kind of generated dataset (the data variable).
I had been able to generate individual plots based on subsetting the master dataset, but I'd like to see them all side-by-side in a 3x3 grid in this case. However, calling the plots within the loop just overplots them all onto one single image.
Is there a good way to take the following code block and make a grid out of the category subsets? Am I overcomplicating it by doing the subset within the plot call?
plt.scatter(total_data['x'][total_data['category']==1], total_data['data'][total_data['category']==1])
plt.plot(total_data['x'][total_data['category']==1], total_data['y'][total_data['category']==1], linewidth=4, color='black')
If there's a simpler way to generate the by-category scatter plus line, I'm all for it. I don't know if seaborn has a similar or more intuitive method to use than pyplot.
You can use either sns.FacetGrid or manual plt.plot. For example:
import seaborn as sns
# one panel per category, wrapped into rows of three
g = sns.FacetGrid(data=total_data, col='category', col_wrap=3)
g = g.map(plt.scatter, 'x', 'data')
g = g.map(plt.plot, 'x', 'y', color='k')
Gives:
Or manual plt with groupby:
fig, axes = plt.subplots(3, 3)
for (cat, data), ax in zip(total_data.groupby('category'), axes.ravel()):
    ax.scatter(data['x'], data['data'])
    ax.plot(data['x'], data['y'], color='k')
gives:

How to Make a graph with 2 data sets plotted?

Right now I am trying to create a graph, which works fine until I try to add a third column of data from my .csv file.
Essentially I am taking pressure-area isotherms, and I have been tasked with making a pressure-area graph, which I achieved (woot!):
import matplotlib.pyplot as plt
import numpy as np
x, y = np.loadtxt("Example.csv", delimiter=',', unpack=True)
plt.plot(x,y)
plt.xlabel('Area-mm^2')
plt.ylabel('Pressure mN/m')
plt.title('Pressure-Area Isotherm\nKibron')
plt.legend()
plt.show()
This is what I got. What I need to do now is also put the average pixel value of some photos I took onto the graph, so that I can show the inverse relation between area and pressure/light intensity.
My .csv (Excel) file has three columns. If it is not possible to plot all of them at the same time, could someone show me a way to pick only two of the three columns to put on the graph, i.e. pressure/area, pressure/pixel values, or area/pixel values? I assume it would involve assigning each column a number (n) and having pyplot graph "n" vs "n".
Edit: I would also like there to be a second scale so that the overall graph doesn't look wonky. Again, thanks for the help!
The columns are, in order: area, pressure, and average pixel value.
You can use zip and create overlaying plots:
import csv
import matplotlib.pyplot as plt

with open('filename.csv') as f:
    headers = iter(['area', 'pressure', 'pixel'])
    # zip(*rows) turns rows into columns; the first entry of each column
    # (the header row) is dropped before converting to float
    data = {next(headers): list(map(float, b)) for _, *b in zip(*csv.reader(f))}

labels = ['pressure/area', 'pressure/pixel', 'area/pixel']
for i in labels:
    num, denom = i.split('/')
    plt.plot(data[num], data[denom], label=i)
plt.legend(loc='upper left')
plt.show()
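For the second scale requested in the edit, one common approach is a twin y axis. The sketch below is a minimal example under stated assumptions: the sample numbers are invented, the file is assumed to have no header row, and the column order is taken from the question (area, pressure, average pixel value).

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical three-column sample: area, pressure, average pixel value
sample = "100,5,40\n90,8,55\n80,12,75\n70,18,110\n"
area, pressure, pixel = np.loadtxt(io.StringIO(sample), delimiter=",", unpack=True)

fig, ax_pressure = plt.subplots()
ax_pressure.plot(area, pressure, color="tab:blue", label="pressure")
ax_pressure.set_xlabel("Area (mm^2)")
ax_pressure.set_ylabel("Pressure (mN/m)")

# twinx() creates a second y axis that shares the same x axis
ax_pixel = ax_pressure.twinx()
ax_pixel.plot(area, pixel, color="tab:red", label="pixel value")
ax_pixel.set_ylabel("Average pixel value")

# Collect the lines from both axes so a single legend lists both
lines = ax_pressure.get_lines() + ax_pixel.get_lines()
ax_pressure.legend(lines, [line.get_label() for line in lines], loc="upper center")
```

Calling plt.show() afterwards displays the figure. To load only two of the three columns, np.loadtxt also accepts usecols, for example usecols=(0, 2) for area and pixel value.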
