Plot csv file with min/max/avg using python - python

I have a csv file with measuring points every 100 seconds.
5 different measurements have been made.
I am looking for a simple solution how to create a line plot with the average value for each measuring point with the min, max values as a bar with Python.
The CSV file looks like this:
0,0.000622,0.000027,0.000033,0.000149,0.000170
100,0.014208,0.017168,0.017271,0.015541,0.027972
200,0.042873,0.067629,0.035837,0.033160,0.018006
300,0.030700,0.018563,0.016640,0.020294,0.020338
400,0.018906,0.016507,0.015445,0.018734,0.017593
500,0.027344,0.045668,0.015214,0.016045,0.015520
600,0.021233,0.098135,0.016511,0.015892,0.018342
First column is in seconds.
Maybe someone can help me with a quick idea.
thanks in advance
--------------------added
What i have so far:
import pandas as pd
input_df = pd.read_csv(input.csv")
input_df['max_value'] = input_df.iloc[:,1:6].max(axis=1)
input_df['min_value'] = input_df.iloc[:,1:6].min(axis=1)
input_df['avg_value'] = input_df.iloc[:,1:6].mean(axis=1)
input_df.plot(x=input_df["0"], y='avg_value')
How can i add error bars (min_value,max_value)

You can use matplotlib. For your problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=np.array([ [0,0.000622,0.000027,0.000033,0.000149,0.000170],
[100,0.014208,0.017168,0.017271,0.015541,0.027972],
[200,0.042873,0.067629,0.035837,0.033160,0.018006],
[300,0.030700,0.018563,0.016640,0.020294,0.020338],
[400,0.018906,0.016507,0.015445,0.018734,0.017593],
[500,0.027344,0.045668,0.015214,0.016045,0.015520],
[600,0.021233,0.098135,0.016511,0.015892,0.018342] ])
mean = np.mean(data[:,1:], axis=1)
min = np.min(data[:,1:], axis=1)
max = np.max(data[:,1:], axis=1)
errs = np.concatenate((mean.reshape(1,-1)-min.reshape(1,-1), max.reshape(1,-1)-
mean.reshape(1,-1)),axis=0)
plt.figure()
plt.errorbar(data[:,0], mean, yerr=errs)
plt.show()

Related

Plotting graph in python using matplotlib-pyplot by taking data from multiple csv files

I want to plot a graph by taking data from multiple csv files and plot them against round number(= number of csv files for that graph). Suppose I have to take a max value of a particular column from all csvs and plot them with the serial number of the csvs on x axis. I am able to read csv files and plot too but from a single file. I am unable to do as stated above.
Below is what I did-
import pandas as pd
import matplotlib.pyplot as plt
import glob
import numpy as np
csv = glob.glob("path" + "*csv")
csv.sort()
N = len(csv)
r = 0
for rno in range(1, N + 1):
r += 1
for f in csv:
df = pd.read_csv(f)
col1 = pd.DataFrame(df, columns=['col. name'])
a = col1[:].to_numpy()
Max = col1.max()
plt.plot(r, Max)
plt.show()`
If anyone has an idea it'd be helpful. Thank you.

Power law test using XY scatter plot

I have Daily Crude oil prices downloaded from FRED, about 10k observations, some values are blank(code cleans them). I believe that I cannot share excel sheets here, so I will just give you a screenshot of what the data looks like:
I calculate the differences and returns and clean up the data but I am kind of stuck.
Here is what the code looks like to get you started:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("DCOILWTICO.csv")
nan_value = float("NaN")
data.replace("", nan_value, inplace=True)
data.replace(".", nan_value, inplace=True)
data['Previous'] = data['DCOILWTICO'].shift(1)
data.dropna(subset=['Previous'],inplace=True)
data.replace("", nan_value, inplace=True)
data.replace(".", nan_value, inplace=True)
data['DCOILWTICO'] = data['DCOILWTICO'].astype(float)
data['Previous'] = data['Previous'].astype(float)
data['Diff'] = data['DCOILWTICO'] - data['Previous']
data['Return'] = (data['DCOILWTICO'] - data['Previous'])/data['Previous']
Here comes the question: I am trying to duplicate the graph below.(which I believe was generated using Mathematica) The difficult part is to be able to create the bins in the right way. Looking at the graph it looks like there are around 200 bins. On the x-axis are the returns and on the y axis are the frequencies(which have been binned).
I think you are asking how to make equally spaced bins in logspace. If so then use the np.geomspace function (geometric space), rather than np.linspace (linear space).
plt.figure()
bins = np.geomspace(data['returns'].min(), data['returns'].max(), 200)
plt.hist(data['returns'], bins = bins)

plotting noise spectrum of the data

i have a single column file(contain only one column) and a matrix file(contain 10 columns) of data which are noisy data and i want to plot the noise spectrum of both file using python.
sample data for single column file is attached here
-1.064599999999999921e-02
-1.146800000000000076e-02
-1.011899999999999952e-02
-7.400200000000000007e-03
-4.306500000000000432e-03
-1.644800000000000081e-03
1.936600000000000127e-04
1.239199999999999980e-03
1.759200000000000043e-03
2.019799999999999981e-03
2.148699999999999916e-03
2.153099999999999806e-03
2.008799999999999822e-03
1.700899999999999981e-03
1.181500000000000042e-03
3.194000000000000116e-04
-1.072000000000000036e-03
-3.133799999999999954e-03
and sample data for matrix file is attached here
-2.596100000000000057e-03 -1.856000000000000011e-03 -1.821400000000000102e-02 5.023599999999999594e-03 -1.064599999999999921e-02 -1.906300000000000008e-02 -6.370799999999999380e-05 5.814800000000000177e-03 -5.391800000000000412e-03 -1.311000000000000013e-02
1.636700000000000047e-03 -8.651600000000000176e-04 -2.490799999999999959e-02 1.645399999999999988e-02 -1.146800000000000076e-02 -4.609199999999999929e-03 6.475800000000000306e-03 1.265800000000000085e-02 1.855799999999999898e-03 -5.387499999999999928e-03
4.516499999999999682e-03 1.438899999999999901e-03 -2.911599999999999952e-02 2.590800000000000047e-02 -1.011899999999999952e-02 2.378800000000000012e-02 1.080200000000000084e-02 1.994299999999999892e-02 8.882299999999999224e-03 2.866500000000000124e-03
5.604699999999999786e-03 4.557799999999999872e-03 -2.870800000000000088e-02 2.832300000000000095e-02 -7.400200000000000007e-03 2.882099999999999940e-02 1.145799999999999944e-02 2.488800000000000040e-02 1.367299999999999939e-02 8.998799999999999508e-03
4.797400000000000275e-03 7.657399999999999970e-03 -2.582800000000000026e-02 2.288000000000000103e-02 -4.306500000000000432e-03 8.315499999999999975e-03 7.967600000000000030e-03 2.487999999999999934e-02 1.516600000000000066e-02 1.177899999999999954e-02
2.314300000000000038e-03 9.749700000000000033e-03 -2.252099999999999935e-02 1.762000000000000025e-02 -1.644800000000000081e-03 -1.257800000000000064e-02 1.220600000000000070e-03 1.866299999999999903e-02 1.377199999999999952e-02 1.163999999999999931e-02
-1.290700000000000094e-03 9.894599999999999923e-03 -1.928900000000000059e-02 1.360300000000000051e-02 1.936600000000000127e-04 -2.438999999999999849e-02 -6.739199999999999878e-03 6.961199999999999853e-03 1.086299999999999939e-02 1.015199999999999957e-02
-5.137400000000000300e-03 7.453800000000000009e-03 -1.615099999999999869e-02 1.018799999999999914e-02 1.239199999999999980e-03 -1.585699999999999957e-02 -1.349500000000000005e-02 -7.773600000000000301e-03 7.680499999999999827e-03 9.148399999999999241e-03
-8.159500000000000086e-03 2.403600000000000094e-03 -1.270400000000000001e-02 5.359000000000000048e-03 1.759200000000000043e-03 -9.746799999999999908e-03 -1.730999999999999900e-02 -2.229599999999999985e-02 4.641100000000000433e-03 9.871700000000000613e-03
-9.419600000000000195e-03 -4.305599999999999705e-03 -8.259700000000000028e-03 -3.140800000000000015e-03 2.019799999999999981e-03 -5.883300000000000161e-03 -1.772100000000000064e-02 -2.695099999999999926e-02 1.592399999999999892e-03 1.255299999999999992e-02
-8.469000000000000833e-03 -1.101399999999999949e-02 -2.205400000000000155e-03 -1.641199999999999951e-02 2.148699999999999916e-03 -3.635199999999999890e-03 -1.558000000000000010e-02 -1.839000000000000010e-02 -1.408900000000000039e-03 1.642899999999999916e-02
-5.529599999999999967e-03 -1.553999999999999999e-02 5.413199999999999956e-03 -4.248000000000000040e-03 2.153099999999999806e-03 -2.403199999999999868e-03 -1.255099999999999966e-02 -8.339100000000000332e-03 -3.665700000000000035e-03 2.009499999999999828e-02
i tried with https://www.earthinversion.com/techniques/visualizing-power-spectral-density-demo-obspy/ but for my ascii data set i could not do it.I hope experts may help me .Thanks in advance.
Maybe this can give you a start. Give your matrix of data in the file "x.data", this plots the raw data as 10 curves, then runs an FFT on each column and displays the FFT. The FFT isn't very interesting with only 12 points, but it will spark ideas.
There's still the problem of "how do you define noise"? The signals you presented do not seem to be very noisy. Unless you know what kind of signal you're expecting,an FFT might not do much good.
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
data = np.loadtxt("x.data")
for i in range(data.shape[1]):
plt.plot( data[:,i] )
plt.show()
for i in range(data.shape[1]):
f = scipy.fftpack.fft(data[:,i])
plt.plot(np.abs(f))
plt.show()
Use numpy.loadtxt() to convert the data to a numpy array. Then you can apply the method described in the link you provided in order to obtain the spectra. E.g.:
import numpy as np
data = np.loadtxt("file.txt")
The you plot the spectrum for that data. E.g.:
import matplotlib.pyplot as plt import scipy.fftpack
yf = scipy.fftpack.fft(data)
fig, ax = plt.subplots()
ax.plot(np.abs(yf))
plt.show()

Python: faster way of counting occurences in numpy arrays (large dataset)

I am new to Python. I have a numpy.array which size is 66049x1 (66049 rows and 1 column). The values are sorted smallest to largest and are of float type, with some of them being repeated.
I need to determine the frequency of occurrences of each value (the number of times a given value is equalled but not surpassed, e.g. X<=x in statistical terms), in order to later plot the Sample Cumulative Distribution Function.
The code I am currently using is as follows, but it is extremely slow, as it has to loop 66049x66049=4362470401 times. Is there any way to augment the speed of such piece of code? Will perhaps the use of dictionaries help in any way? Unfortunately I cannot change the size of the arrays I am working with.
+++Function header+++
...
...
directoryPath=raw_input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x=csvfile[:,2]
x1=numpy.delete(x, 0, 0)
x2=numpy.zeros((x1.shape[0]))
x2=sorted(x1)
x3=numpy.around(x2, decimals=3)
count=numpy.zeros(len(x3))
#Iterates over the x3 array to find the number of occurrences of each value
for i in range(len(x3)):
temp=x3[i]
for j in range(len(x3)):
if (temp<=x3[j]):
count[j]=count[j]+1
#Creates a 2D array with (value, occurrences)
x4=numpy.zeros((len(x3), 2))
for i in range(len(x3)):
x4[i,0]=x3[i]
x4[i,1]=numpy.around((count[i]/x1.shape[0]),decimals=3)
...
...
+++Function continues+++
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data'])
df_p.T.plot(kind='hist')
plt.show()
That whole script took a very short period to execute (~2s) for (100,000x1) array. I didn't time, but if you provide the time it took to do yours we can compare.
I used [Counter][2] from collections to count the number of occurrences, my experiences with it have always been great (timewise). I converted it into DataFrame to plot and used T to transpose.
Your data does replicate a bit, but you can try and refine it some more. As it is, it's pretty fast.
Edit
Create CDF using cumsum()
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data']).T
df_p['cumu'] = df_p['data'].cumsum()
df_p['cumu'].plot(kind='line')
plt.show()
Edit 2
For scatter() plot you must specify the (x,y) explicitly. Also, calling df_p['cumu'] will result in a Series, not a DataFrame.
To properly display a scatter plot you'll need the following:
import numpy as np
import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt
arr = np.random.randint(0, 100, (100000,1))
df = pd.DataFrame(arr)
cnt = Counter(df[0])
df_p = pd.DataFrame(cnt, index=['data']).T
df_p['cumu'] = df_p['data'].cumsum()
df_p.plot(kind='scatter', x='data', y='cumu')
plt.show()
You should use np.where and then count the length of the obtained vector of indices:
indices = np.where(x3 <= value)
count = len(indices[0])
If efficiency counts, you can use the numpy function bincount, which need integers :
import numpy as np
a=np.random.rand(66049).reshape((66049,1)).round(3)
z=np.bincount(np.int32(1000*a[:,0]))
it takes about 1ms.
Regards.
# for counting a single value
mask = (my_np_array == value_to_count).astype('uint8')
# or a condition
mask = (my_np_array <= max_value).astype('uint8')
count = np.sum(mask)

Python Importing data with multiple delimiters

In Python, how can I import data that looks like this:
waveform [0]
t0 26/11/2014 10:53:03.639218
delta t 2.000000E-5
time[0] Y[0]
26/11/2014 10:53:03.639218 1.700977E-2
26/11/2014 10:53:03.639238 2.835937E-4
26/11/2014 10:53:03.639258 2.835937E-4
26/11/2014 10:53:03.639278 -8.079492E-3
There are two delimiters, : and white space. I want to get rid of the date 24/11/2014 and delete the semicolons so that the time array looks like 105303.639218, etc. So is there a way to specify two delimiters in the code, or is there a better way to analyse the data?
So far I have got:
import numpy as np
import matplotlib.pyplot as plt
_, time, y = np.loadtxt('data.txt', delimiter=':', skiprows=5)
plt.plot(time,y)
plt.show()
You can do this:
time = '10:34:20.454068'
list_ = time.split(':')
''.join(list_)
# '103420.454068'
for each row.
Maybe it's sort of a roundabout way of doing this, but...
import numpy as np
import matplotlib.pyplot as plt
mydata = np.loadtxt('data.txt', dtype='string', skiprows=5)
time = mydata[:,1]
time = np.array([s.replace(':','') for s in time])
y = np.array(mydata[:,2])
plt.plot(time,y)
plt.show()

Categories