I'm working in Python, monitoring delay flows between two hosts. My program creates a file that contains two columns: column 1 holds the time interval that elapsed before I received the value in column 2. Example:
2.0 -0.430053710938
4.0 -0.0391845703125
1.0 5.830078125
4.0 5.07067871094
It took 2 seconds before I received the value -0.430053710938, 4 seconds later I got -0.0391845703125, a second later the value 5.830078125 and so on.
How can I plot this so that it makes sense? I tried looking into gnuplot, but it uses column 1 as the x-axis, which messes everything up since my 3rd value is 1.0.
What you need is a cumulative sum (np.cumsum) of the time (first column) after reading from the file. Below is a complete working answer. I am reading the data from the file and then converting the time list to an array followed by taking a cumulative sum for the time.
import matplotlib.pyplot as plt
import numpy as np
with open('data.dat', "r") as file:
    lines = file.readlines()
x = np.cumsum(np.array([float(row.split()[0]) for row in lines]))
y = [float(row.split()[1]) for row in lines]
plt.plot(x, y, '-kx')
plt.show()
Alternative way to load data
data = np.loadtxt('data.dat', usecols=(0,1))
x = np.cumsum(data[:,0])
y = data[:,1]
plt.plot(x, y, '-kx')
plt.show()
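As a quick check of what the cumulative sum does, here is a minimal sketch using the four sample rows from the question, typed in by hand rather than read from a file (the variable names are only for illustration):

```python
import numpy as np

# Intervals (column 1) and values (column 2) from the sample rows in the question
intervals = np.array([2.0, 4.0, 1.0, 4.0])
values = np.array([-0.430053710938, -0.0391845703125, 5.830078125, 5.07067871094])

# Each arrival time is the sum of all intervals up to and including that row
arrival_times = np.cumsum(intervals)
print(arrival_times.tolist())  # [2.0, 6.0, 7.0, 11.0]
```

The third value now sits at t = 7 instead of t = 1, so the points appear in the order they were received.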
Related
I have multiple data files with two columns each (x, y). I want to take the mean of the y column across the datasets and write the means to a new file. For example, y1, y2, y3 are the y columns of datasets dat1, dat2, dat3 respectively. Say column y1 has 3 entries y1_a1, y1_a2, y1_a3, and similarly the other columns have y2_a1, y2_a2, y2_a3, and so on. I want to calculate the mean along the rows across the columns, i.e. mean(y1_a1, y2_a1, y3_a1). Does anyone have an idea how I can do that? Here is my code, where I load the data files and split them into their x and y columns.
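import numpy as np
import matplotlib.pyplot as plt

def plot_data(data, index):
    # split the two-column array into its x and y columns
    xData, yData = np.hsplit(data, 2)
    x = xData[:, 0]
    y = yData[:, 0]
    plt.plot(x, y, label="Graph number {index}".format(index=index))
    plt.title("Graph number {index}".format(index=index))

fig, ax = plt.subplots(figsize=(12, 8))
for i in range(0, 3):
    # build the file name data00.dat, data01.dat, ...
    data = np.loadtxt('data0{i}.dat'.format(i=i))
    plot_data(data, i)
plt.legend()
plt.show()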
If I understood correctly you are looking for something like this:
import pandas as pd
import numpy as np
# making up some data
dat1=pd.DataFrame({"x":["x1","x2","x3","x4"],"y":[24,35,81,99]})
dat2=pd.DataFrame({"x":["x1","x2","x3","x4"],"y":[12,17,1,76]})
# applying mean and converting it into a 1-column dataframe
df_means=pd.DataFrame({"means":[dat1["y"].mean(),dat2["y"].mean()]})
print(df_means)
output:
   means
0  59.75
1  26.50
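If the goal is instead the element-wise mean across datasets (row 1 of every file averaged together, then row 2, and so on), a minimal sketch with NumPy could look like the following. The arrays are made-up stand-ins for the y columns, and the output file name in the final comment is hypothetical:

```python
import numpy as np

# Made-up y columns standing in for the y data of dat1 and dat2
y1 = np.array([24, 35, 81, 99])
y2 = np.array([12, 17, 1, 76])

# Stack the columns side by side, then average across datasets for each row
row_means = np.column_stack([y1, y2]).mean(axis=1)
print(row_means.tolist())  # [18.0, 26.0, 41.0, 87.5]

# np.savetxt("means.dat", row_means) would write them to a (hypothetical) file
```

This assumes every dataset has the same number of rows; otherwise the columns cannot be stacked.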
I need a plot of aggregated data
import pandas as pd
basic_data= pd.read_csv('WHO-COVID-19-global-data _2.csv',parse_dates= ['Date_reported'] )
cum_daily_cases = basic_data.groupby('Date_reported')[['New_cases']].sum()
import pylab
x = cum_daily_cases['Date_reported']
y = cum_daily_cases['New_cases']
pylab.plot(x,y)
pylab.show()
Error: 'Date_reported'
Input:
Date_reported, Country_code, Country, WHO_region, New_cases, Cumulative_cases, New_deaths, Cumulative_deaths
2020-01-03,AF,Afghanistan,EMRO,0,0,0,0
Expected output: the total number of "New cases" per day shown on the plot.
What should I do to get this plot to run? link to dataset
The column names contain a leading space (can be easily seen by checking basic_data.dtypes). Fix that by adding the following line immediately after basic_data was read:
basic_data.columns = [s.strip() for s in basic_data.columns]
In addition, your x variable should be the index after groupby-sum, not a column Date_reported. Correction:
x = cum_daily_cases.index
The plot should show as expected.
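Both fixes can be tried without the real file. The sketch below is only an illustration: it uses a tiny invented CSV held in a string, with the same header (spaces after the commas) as the question.

```python
import io
import pandas as pd

# Invented three-row sample that mimics the header with spaces after commas
csv_text = (
    "Date_reported, Country_code, Country, WHO_region, New_cases, "
    "Cumulative_cases, New_deaths, Cumulative_deaths\n"
    "2020-01-03,AF,Afghanistan,EMRO,0,0,0,0\n"
    "2020-01-03,AL,Albania,EURO,2,2,0,0\n"
    "2020-01-04,AF,Afghanistan,EMRO,3,3,0,0\n"
)
basic_data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date_reported'])

# Fix 1: remove the leading spaces from the column names
basic_data.columns = [s.strip() for s in basic_data.columns]

cum_daily_cases = basic_data.groupby('Date_reported')[['New_cases']].sum()

# Fix 2: after groupby-sum the dates are the index, not a column
x = cum_daily_cases.index
y = cum_daily_cases['New_cases']
print(y.tolist())  # [2, 3]
```

With x and y defined this way, pylab.plot(x, y) has one point per day.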
I am really new in Python and I hope this is the right community for my question. Sorry if it is not.
I am trying to import data from a .txt file with pandas.
The file looks like this:
# Raman Scattering Spectrum
# X-Axis: Frequency (cm-1)
# Y-Axis: Intensity (10-36 m2 cm/sr)
# Harmonic Data
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# 25.5433266000 6.0306906544
...
# 3211.8081700000 0.3440113123
# 3224.5118500000 0.8814596030
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
I already wrote this piece of code to import my data:
import pandas as pd
# import matplotlib as plt
# import scipy as sp
data = pd.read_csv('/home/andrea/Schreibtisch/raman_gauss.txt', sep='\t')
data
Now I just get one column.
If I try it with
pd.read_fwf(file)
I get 3 columns, but the x and y values from Plot Curve (Harmonic) end up in one column.
Now I want to import the x, y and DY/DX values from Plot Curve (Harmonic) into different variables or containers as series.
The hard part for me is how to split x and y into 2 columns, and how to tell Python that the import should start 2 lines after the line containing Plot Curve (Harmonic).
My idea so far was to check all containers for the string 'Plot Curve (Harmonic)'. That gives me a new series of True or False. Then I need to read out which line number is True for the search term, and start the import from that line...
I am too much of a newbie to Python and not yet familiar enough with the documentation to find the command I need.
Does anyone have tips for me, a command or something? And how do I split the columns?
Thank you very much!
You can read as follows.
Code
import pandas as pd
import re  # Regex to parse header

def get_data(filename):
    # Find row containing 'Plot Curve (Harmonic)'
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            if 'Plot Curve (Harmonic)' in line:
                start_row = i
                # Parse header on next line
                header = re.findall(r'\S+', next(f))[1:]
                # [1:] to skip '#' at beginning of line
                break
        else:
            start_row = None  # not found
    if start_row is None:
        return None  # marker line missing, nothing to read
    # Use delimiter=r"\s+": since have multiple spaces between numbers
    # skiprows = start_row+2: to skip to data
    # (skip current and header row)
    # reference: https://thispointer.com/pandas-skip-rows-while-reading-csv-file-to-a-dataframe-using-read_csv-in-python/
    # names = header: assigns column names
    df = pd.read_csv(filename, delimiter=r"\s+", skiprows=start_row+2,
                     names=header)
    return df
Test
df = get_data('data.txt')
print(df)
data.txt file
# Raman Scattering Spectrum
# X-Axis: Frequency (cm-1)
# Y-Axis: Intensity (10-36 m2 cm/sr)
# Harmonic Data
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# 25.5433266000 6.0306906544
...
# 3211.8081700000 0.3440113123
# 3224.5118500000 0.8814596030
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
Output
     X          Y     DY/DX
0  0.0   8.480341  0.654682
1  8.0  17.823910  2.014639
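Because every non-data line in the sample file starts with '#', a shorter alternative may be to let pandas skip those lines itself. This is only a sketch under that assumption: it would not work if some other block in the real file held uncommented numbers, and the column names are typed in by hand instead of parsed from the header.

```python
import io
import pandas as pd

# Shortened stand-in for the file; a real call would pass the file path instead
text = """# Raman Scattering Spectrum
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
"""

# comment='#' drops every line starting with '#'; names are typed in by hand
df = pd.read_csv(io.StringIO(text), comment='#', sep=r"\s+",
                 header=None, names=['X', 'Y', 'DY/DX'])
print(df.shape)  # (2, 3)
```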
First: thank you very much for your answer. It helps me a lot.
I tried to use the comment function, but I cannot add a line break there.
I want to plot the data that I can now extract from the file, but when I add my standard plot code:
plt.plot(df.X, df.Y)
plt.legend(['simulated'])
plt.xlabel('raman_shift')
plt.ylabel('intensity')
plt.grid(True)
plt.show()
I get now the error:
TypeError Traceback (most recent call last)
<ipython-input-240-8594f8545868> in <module>
28 plt.plot(df.X, df.Y)
29 plt.legend(['simulated'])
---> 30 plt.xlabel('raman_shift')
31 plt.ylabel('intensity')
32 plt.grid(True)
TypeError: 'str' object is not callable
I have not changed anything in the label function. In my other project these lines work well.
I also don't know how to read out the DY/DX column, since the '/' cannot be used in the column name.
Do you have a tip for me, again? :)
Thanks.
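On the column name: attribute access such as df.X only works for names that are valid Python identifiers, but bracket indexing accepts any string. A minimal sketch, using a hand-built DataFrame with the two rows shown in the output earlier; the replacement name dydx is hypothetical:

```python
import pandas as pd

# Two rows typed in by hand to stand in for the parsed file
df = pd.DataFrame({'X': [0.0, 8.0],
                   'Y': [8.480341, 17.823910],
                   'DY/DX': [0.654682, 2.014639]})

# df.DY/DX would be parsed as a division; bracket indexing works for any name
slope = df['DY/DX']
print(slope.tolist())  # [0.654682, 2.014639]

# Optional: rename to a hypothetical identifier-friendly name for attribute access
df = df.rename(columns={'DY/DX': 'dydx'})
```

As for the TypeError, one common cause (an assumption here, since the earlier cells are not shown) is that a line such as plt.xlabel = 'raman_shift' was run earlier in the same session. That replaces the function with a string, and restarting the kernel restores it.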
I'm trying to plot two columns that have been read in using pandas.read_csv, the code:-
from pandas import read_csv
from matplotlib import pyplot
data = read_csv('Stats.csv', sep=',')
#data = data.astype(float)
data.plot(x = 1, y = 2)
pyplot.show()
the csv file snippet:-
1,a4,2000,125,1.9,2.8,25.6
2,a4,7000,125,1.7,2.3,18
3,a2,7000,30,0.84,1.1,8.11
4,a2,5000,30,0.83,1.05,6.87
5,a2,4000,45,2.8,3.48,16.54
When x = 1 and y = 2, it plots the second column against the fourth, not the third as I expected.
When I try to plot the third column against the fourth (x = 2, y = 3), it plots the third against the fifth.
I'm trying to plot the third against the fourth right now. When both x and y = 2 it plots the third column against the fourth, but the values are incorrect. What am I missing? Is read_csv changing the order of the columns?
Your input CSV has no header row, which doesn't help clarity (see Murali's comment). But I think the problem stems from the nature of the column that contains a4, a2.
This column can be used for the x-axis but not for the y-axis (non-numeric data on an x-axis appears to be simply read in order). Hence the count offset: y "reads over" the column at index 1 (all indices 0-based), but x does not.
Conducting
data.plot(x=1,y=0)
and
data.plot(x=0,y=1)
and inspecting the axis helps visualise what's going on.
Bizarrely this means you can do
df.plot(x=1,y=1)
to get what you want.
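One way to sidestep positional indices altogether is to give the columns names and select by name. The sketch below is an illustration only: the names c1 to c7 are hypothetical, and the CSV snippet from the question is held in a string. Passing header=None also keeps the first row as data; by default read_csv would treat it as the header.

```python
import io
import pandas as pd

csv_text = """1,a4,2000,125,1.9,2.8,25.6
2,a4,7000,125,1.7,2.3,18
3,a2,7000,30,0.84,1.1,8.11
4,a2,5000,30,0.83,1.05,6.87
5,a2,4000,45,2.8,3.48,16.54
"""

# Hypothetical column names; header=None keeps the first row as data
names = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']
data = pd.read_csv(io.StringIO(csv_text), header=None, names=names)
print(data.shape)  # (5, 7)
```

With named columns, data.plot(x='c3', y='c4') refers to the third and fourth columns unambiguously.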
I have a very ambitious project (for my novice level) using a numpy array: I load a series of data and make different plots based on my needs. I have uploaded a slim version of my data file, input_data, and want to make plots based on F (I would like to choose the desired F before looping). Each series should hold the data for one value in column E (e.g. A12 is one data series, A23 another, etc.), and on the x-axis I would like to use the corresponding values in D.
So to summarize: for a chosen value in column F, I want 4 different data series (the number of distinct values in column E), with the data referenced on the x-axis to the value in column D (which is a date).
I stumbled at the first step (although I spent too much time on it), where I wanted to plot all data with the same column F identifier as one plot.
Here is what I have so far:
import os
import numpy as np
import matplotlib.pyplot as plt
N = 8 #different values on column F
M = 4 #different values on column E
dataset = open('array_data.txt').readlines()[1:]
data = np.genfromtxt(dataset)
my_array = data
day = len(my_array)/M/N # number of measurement sets - variation on column D
for i in range(0, len(my_array), N):
    plt.xlim(0, )
    plt.ylim(-1, 2)
    plt.plot(my_array[i, 0], my_array[i, 2], 'o')
    # plt.hold(True) was removed in matplotlib 3.0; successive plot calls share the axes by default
plt.show()
This does nothing, and I still have a long way to go.
With pandas you can do:
import pandas as pd
dataset = pd.read_table("toplot.txt", sep="\t")
#make D index (automatically puts it on the x axis)
dataset.set_index("D", inplace=True)
#plotting R vs. D
dataset.R.plot()
#plotting F vs. D
dataset.F.plot()
dataset is a DataFrame object, and DataFrame.plot is just a wrapper around the matplotlib function to plot the series.
I'm not clear on how you want to plot it, but it sounds like you'll need to select some values of a column. This would be:
# get where F == 1000
maskF = dataset.F == 1000
# get the values where F == 1000
rows = dataset[maskF]
# get the values where A12 is in column E
rows = rows[rows.E == "A12"]
# remove the columns we don't want to see
del rows["E"]
del rows["F"]
#Plot the result
rows.plot(xlim=(0,None), ylim=(-1,2))
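To get one line per value of E for a chosen F, as the question describes, the filtered rows can be pivoted so that each E value becomes its own column. This is a sketch on invented data: the numbers are made up, and the value column R is assumed only because the answer above refers to dataset.R.

```python
import pandas as pd

# Invented data: D = day, E = series label, F = group id, R = measured value
dataset = pd.DataFrame({
    "D": [1, 1, 2, 2, 1, 1, 2, 2],
    "E": ["A12", "A23", "A12", "A23", "A12", "A23", "A12", "A23"],
    "F": [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000],
    "R": [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4],
})

chosen_F = 1000
rows = dataset[dataset.F == chosen_F]

# One column per value of E, indexed by D
series = rows.pivot(index="D", columns="E", values="R")
print(list(series.columns))  # ['A12', 'A23']
```

Calling series.plot(ylim=(-1, 2)) then draws one line per E value against D.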