Import Data in Python with Pandas just for specific rows - python

I am really new to Python and I hope this is the right community for my question. Sorry if it is not.
I am trying to import data from a .txt file with pandas.
The file looks like this:
# Raman Scattering Spectrum
# X-Axis: Frequency (cm-1)
# Y-Axis: Intensity (10-36 m2 cm/sr)
# Harmonic Data
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# 25.5433266000 6.0306906544
...
# 3211.8081700000 0.3440113123
# 3224.5118500000 0.8814596030
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
I already wrote this piece of code to import my data:
import pandas as pd
# import matplotlib as plt
# import scipy as sp
data = pd.read_csv('/home/andrea/Schreibtisch/raman_gauss.txt', sep='\t')
data
Now I just get one column.
If I try it with
pd.read_fwf(file)
I get 3 columns, but the x and y values from Plot Curve (Harmonic) are in one column.
Now I want to import the X, Y and DY/DX values from Plot Curve (Harmonic) into separate variables or containers as Series.
The hard part for me is how to split x and y into 2 columns, and how to tell Python that the import should start 2 lines after the line containing Plot Curve (Harmonic).
My idea so far was to check all rows for the string 'Plot Curve (Harmonic)'. That gives me a new Series of True/False values. Then I need to read out which line number is True for the search string, and start the import from that line...
I am too new to Python and not yet familiar enough with the documentation to find the command I need.
Does anyone have tips for me, a command or something? And how do I split the columns?
Thank you very much!

You can read the file as follows.
Code
import pandas as pd
import re  # Regex to parse header

def get_data(filename):
    # Find row containing 'Plot Curve (Harmonic)'
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            if 'Plot Curve (Harmonic)' in line:
                start_row = i
                # Parse header on next line
                header = re.findall(r'\S+', next(f))[1:]
                # [1:] to skip '#' at beginning of line
                break
        else:
            start_row = None  # not found
    if start_row is None:
        return None  # marker line missing, nothing to read
    # Use delimiter=r"\s+": since have multiple spaces between numbers
    # skiprows = start_row+2: to skip to data
    # (skip current and header row)
    # reference: https://thispointer.com/pandas-skip-rows-while-reading-csv-file-to-a-dataframe-using-read_csv-in-python/
    # names = header: assigns column names
    df = pd.read_csv(filename, delimiter=r"\s+", skiprows=start_row+2,
                     names=header)
    return df
Test
df = get_data('data.txt')
print(df)
data.txt file
# Raman Scattering Spectrum
# X-Axis: Frequency (cm-1)
# Y-Axis: Intensity (10-36 m2 cm/sr)
# Harmonic Data
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# 25.5433266000 6.0306906544
...
# 3211.8081700000 0.3440113123
# 3224.5118500000 0.8814596030
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
Output
     X          Y     DY/DX
0  0.0   8.480341  0.654682
1  8.0  17.823910  2.014639
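As an alternative, and only as a sketch: every non-data line in the sample file starts with '#', so read_csv can skip those lines through its comment parameter instead of searching for the start row. The column names are typed in by hand here (an assumption, they are not parsed from the file), and the inline string stands in for the real file.

```python
import io
import pandas as pd

# Inline stand-in for the file; only a few lines of the sample are reproduced
sample = """# Raman Scattering Spectrum
# Peak information (Harmonic)
# X Y
# 20.1304976000 1.1465331676
# Plot Curve (Harmonic)
# X Y DY/DX
0.0000000000 8.4803414671 0.6546818124
8.0000000000 17.8239097502 2.0146387573
"""

# comment='#' drops every line that starts with '#', so only the curve rows remain
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", comment="#",
                 names=["X", "Y", "DY/DX"])
print(df)
```

This only works as long as the peak block is commented out and the curve block is not. If that does not hold for a file, the row search in get_data above is the more robust route.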

First: Thank you very much for your answer. It helps me a lot.
I tried to use the comment function, but I cannot add a line break there.
I want to plot the data that I can now extract from the file, but when I add my standard plot code:
plt.plot(df.X, df.Y)
plt.legend(['simulated'])
plt.xlabel('raman_shift')
plt.ylabel('intensity')
plt.grid(True)
plt.show()
I now get the error:
TypeError Traceback (most recent call last)
<ipython-input-240-8594f8545868> in <module>
28 plt.plot(df.X, df.Y)
29 plt.legend(['simulated'])
---> 30 plt.xlabel('raman_shift')
31 plt.ylabel('intensity')
32 plt.grid(True)
TypeError: 'str' object is not callable
I have changed nothing in the label calls. In my other project these lines work well.
I also don't know how to read out the DY/DX column, since the '/' cannot be used in the column name.
Do you have a tip for me again? :)
Thanks.
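A possible explanation, offered as a sketch and not as a confirmed diagnosis: "'str' object is not callable" on plt.xlabel(...) means the name plt.xlabel is currently bound to a string, which typically happens when an earlier cell in the same session ran something like plt.xlabel = 'raman_shift'. The snippet reproduces that under this assumption, and also shows bracket indexing for the DY/DX column. The small DataFrame and the Agg backend are stand-ins so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame with the same column names as in the thread
df = pd.DataFrame({"X": [0.0, 8.0], "Y": [8.48, 17.82], "DY/DX": [0.65, 2.01]})

# Attribute access (df.X) cannot spell "DY/DX"; bracket indexing can
slope = df["DY/DX"]

# Keep a handle on the real function, then reproduce the suspected mistake
real_xlabel = plt.xlabel
plt.xlabel = "raman_shift"  # assignment instead of a call
try:
    plt.xlabel("raman_shift")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Put the real function back and label the plot normally
plt.xlabel = real_xlabel
plt.plot(df["X"], df["Y"])
plt.xlabel("raman_shift")
plt.ylabel("intensity")
```

In a live notebook there is no saved handle to restore, so restarting the kernel (and then not assigning to plt.xlabel again) is the usual way out.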

Related

plot matplotlib aggregated data python

I need a plot of aggregated data.
import pandas as pd
basic_data= pd.read_csv('WHO-COVID-19-global-data _2.csv',parse_dates= ['Date_reported'] )
cum_daily_cases = basic_data.groupby('Date_reported')[['New_cases']].sum()
import pylab
x = cum_daily_cases['Date_reported']
y = cum_daily_cases['New_cases']
pylab.plot(x,y)
pylab.show()
Error: 'Date_reported'
Input:
Date_reported, Country_code, Country, WHO_region, New_cases, Cumulative_cases, New_deaths, Cumulative_deaths
2020-01-03,AF,Afghanistan,EMRO,0,0,0,0
Output: the total number of "New cases" per day, shown on the plot.
What should I do to run this plot?
The column names contain a leading space (can be easily seen by checking basic_data.dtypes). Fix that by adding the following line immediately after basic_data was read:
basic_data.columns = [s.strip() for s in basic_data.columns]
In addition, after groupby-sum the grouping key Date_reported becomes the index of the result, so your x variable should be that index, not a column Date_reported. Correction:
x = cum_daily_cases.index
The plot should show as expected.
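Put together, the two corrections can be sketched like this. The rows below are made up for illustration (they are not taken from the WHO file), and only the header layout with the leading spaces follows the question.

```python
import io
import pandas as pd

# Made-up sample with the same header style: note the space after each comma
csv_text = (
    "Date_reported, Country_code, Country, WHO_region, New_cases\n"
    "2020-01-03,AF,Afghanistan,EMRO,0\n"
    "2020-01-03,AL,Albania,EURO,2\n"
    "2020-01-04,AF,Afghanistan,EMRO,1\n"
    "2020-01-04,AL,Albania,EURO,3\n"
)

basic_data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date_reported"])

# Remove the leading spaces so ' New_cases' becomes 'New_cases'
basic_data.columns = [s.strip() for s in basic_data.columns]

# The grouping key ends up as the index of the aggregated frame
daily_cases = basic_data.groupby("Date_reported")[["New_cases"]].sum()
x = daily_cases.index
y = daily_cases["New_cases"]
print(daily_cases)
```

x and y can then be passed to pylab.plot(x, y) as in the question.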

Plot values on an increasing x-axis

I'm working in Python, monitoring delay flows between two hosts. My program creates a file that contains two columns of information: column1 holds the time interval that passed before I received the value in column2. Example:
2.0 -0.430053710938
4.0 -0.0391845703125
1.0 5.830078125
4.0 5.07067871094
It took 2 seconds before I received the value -0.430053710938, 4 seconds later I got -0.0391845703125, a second later the value 5.830078125 and so on.
How can I plot this so that it makes sense? I tried looking into gnuplot, but it uses column1 as the x-axis, which messes everything up since my 3rd value is 1.0.
What you need is a cumulative sum (np.cumsum) of the time (first column) after reading from the file. Below is a complete working answer. I am reading the data from the file and then converting the time list to an array followed by taking a cumulative sum for the time.
import matplotlib.pyplot as plt
import numpy as np
with open('data.dat',"r") as file:
    lines = file.readlines()
x = np.cumsum(np.array([float(row.split()[0]) for row in lines]))
y = [float(row.split()[1]) for row in lines]
plt.plot(x, y, '-kx')
plt.show()
Alternative way to load data
data = np.loadtxt('data.dat', usecols=(0,1))
x = np.cumsum(data[:,0])
y = data[:,1]
plt.plot(x, y, '-kx')
plt.show()
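To see what the cumulative sum does to the four sample rows from the question, here is a small check without any file or plot (the variable names are only illustrative):

```python
import numpy as np

# The four sample rows from the question: (interval in seconds, value)
intervals = np.array([2.0, 4.0, 1.0, 4.0])
values = np.array([-0.430053710938, -0.0391845703125, 5.830078125, 5.07067871094])

# Running total of the intervals gives the elapsed time of each measurement
elapsed = np.cumsum(intervals)
print(elapsed)  # [ 2.  6.  7. 11.]
```

The third value now sits at 7 seconds instead of 1, so the x-axis increases monotonically.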

lmfit matplot - fitting many curves from many different files at the same time/graph

I have the following code, with which I intend to read and plot many curves from many different files. The "reading and plotting" is already working pretty good.
The problem is that now I want to make a fitting for all those curves in the same plot. This code already manages to fit the curves, but the output is all in one array and I can not plot it, since I could not separate it.
#!/usr/bin/python
import matplotlib.pyplot as plt
from numpy import exp
from lmfit import Model

def read_files(arquivo):
    x = []
    y = []
    abscurrent = []
    time = []
    data = open(arquivo, 'r')
    headers = data.readlines()[60:]
    for line in headers:
        line = line.strip()
        X, Y, ABS, T = line.split('\t')
        x.append(X)
        y.append(Y)
        abscurrent.append(ABS)
        time.append(T)
    data.close()
    def J(x, j, n):
        return j*((exp((1.6e-19*x)/(n*1.38e-23*333)))-1)
    gmod = Model(J)
    result = gmod.fit(abscurrent, x=x, j=10e-10, n=1)
    return x, y, abscurrent, time
    print(result.fit_report())
When I ask to print the "file" result.best_fit, which in the lmfit would give the best fit for that curve, I get 12 times this result (I have 12 curves) , with different values:
- Adding parameter "j"
- Adding parameter "n"
[ 4.30626925e-17 3.25367918e-14 9.60736218e-14 2.20310475e-13
4.63245638e-13 9.38169958e-13 1.86480698e-12 3.67881758e-12
7.22634738e-12 1.41635088e-11 2.77290634e-11 5.42490983e-11
1.06108942e-10 2.07520542e-10 4.05768903e-10 7.93323537e-10
1.55126521e-09 3.03311029e-09 5.93085363e-09 1.16032067e-08
2.26884736e-08 4.43641560e-08 8.67362753e-08 1.69617697e-07
3.31685858e-07 6.48478168e-07]
- Adding parameter "j"
- Adding parameter "n"
[ 1.43571772e-16 1.00037588e-13 2.92349492e-13 6.62623404e-13
This means that the code is calculating the fit correctly, I just have to separate this output somehow in order to plot each fit with its curve. Each set of values between [] is what I want to separate so that I can plot it.
I do not see how the code you posted could possibly produce your output. I do not see a print() function that prints out the array of 26 values, but would imagine that could be the length of your lists x, y and abscurrent -- it is not the output of your print(result.fit_report()), and I do not see that result.
I do not see anything to suggest you have 12 independent curves.
Also, result.best_fit is not a file, it is an array.

Graph Reset Query

To whomever,
I am having a graphing problem where it seems that previous data is being stacked on top of new data. I wanted to find a way to separate these so that I can get individual graphs per data set.
Briefly, before we get into the script, let me tell you what you're looking at. I have 8 data sets, each one named somethingsomethingsomething...n=0,1,...,7. So there are 8 different files with different sets of values for the wavelength (here I named it WL) and Stokes parameters (here I named them SI SQ SU SV). I was told to make some graphs of them, so here we are.
The following is what I have:
the base
import matplotlib.pyplot as plt
import numpy as np
import scipy.constants as c
from scipy.interpolate import spline
import re
import sys
something to tell the program to not worry about random spaces in data set files
split_on_spaces = re.compile(" +").split
defining the arrays
WL = np.array([])
SI = np.array([])
SQ = np.array([])
SU = np.array([])
SV = np.array([])
code for data interpretation
with open('C:\\Users\\Schmidt\\Desktop\\Python\\Homework_4\\CoolStuffLivesHere\\stokes_profiles_1.txt') as f:
    for line in f:
        data=split_on_spaces(line.strip())
        if len(data) == 0:
            continue
        if len(data) != 5:
            sys.stderr.write("BAD LINE: {}".format(repr(line)))
            continue
        WL = np.append(WL, float(data[0]))
        SI = np.append(SI, data[1])
        SQ = np.append(SQ, data[2])
        SU = np.append(SU, data[3])
        SV = np.append(SV, data[4])
plotting sequence
plt.plot(WL,SI)
plt.show()
Then rinse and repeat for the other 3 parameters and then rinse and repeat for the other data sets as well. It works real fine for the first rendering. However for subsequent graphs it looks more like these: first example, second example.
So in a nutshell, what line of code should I type, and where, to resolve my graph stacking issue?
Without getting into subplots, you're just adding to the original plot. You need to close it if you want to re-use it.
i.e.
plt.plot(WL,SI)
plt.show()
plt.close()
plt.plot(WL,SQ)
Unless you want them on the same plot.
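One way to make the separation explicit, sketched here with made-up arrays in place of the Stokes files, is to create a fresh figure per parameter and close it when done. The profile values, the wavelength grid and the Agg backend are assumptions so that the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the wavelength grid and two Stokes profiles
WL = np.linspace(0.0, 1.0, 5)
profiles = {"SI": 1.0 - 0.5 * WL, "SQ": 0.1 * np.sin(WL)}

line_counts = {}
for name, values in profiles.items():
    fig, ax = plt.subplots()   # new, empty figure for this parameter
    ax.plot(WL, values)
    ax.set_xlabel("WL")
    ax.set_ylabel(name)
    line_counts[name] = len(ax.lines)  # one curve per figure, nothing stacked
    # plt.show() or fig.savefig(...) would go here
    plt.close(fig)             # release the figure before the next one

print(line_counts)
```

Since each curve gets its own Axes, nothing from a previous data set can carry over into the next plot.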

How to plot a graph with text file (2 columns of data) in Python

I have a text file with lots of data that is arranged in 2 columns. I need to use the data in the 2nd column in a formula (which outputs Energy). I need to plot that energy against the time which is all the data in the first column.
So far I have this, and it prints a very weird graph. I know that the energy should be oscillating and decaying exponentially.
import numpy as np
import matplotlib.pyplot as plt
m = 0.090
l = 0.089
g = 9.81
H = np.loadtxt("AngPosition_3p5cmSeparation.txt")
x, y = np.hsplit(H,2)
Ep = m*g*l*(1-np.cos(y))
plt.plot(x, Ep)
plt.show()
I'm struggling to see where I have gone wrong, but then again I am somewhat new to Python. Any help is much appreciated.
I managed to get it to work. My problem was that the angle data had to be converted into radians.
I couldn't do that automatically in Python using math.radians for some reason so I just edited the data in Excel and then back into Notepad.
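For what it's worth, math.radians only accepts a single number, so passing it a whole NumPy array with more than one element raises a TypeError. np.radians does the same conversion element-wise. A small sketch, assuming the angles in the file are in degrees (the thread does not state the unit) and using a made-up array in place of the file:

```python
import numpy as np

m = 0.090
l = 0.089
g = 9.81

# Made-up stand-in for the two-column file: time, angle (assumed to be in degrees)
H = np.array([[0.0, 10.0],
              [0.1, 0.0],
              [0.2, -8.0]])
t = H[:, 0]
angle_deg = H[:, 1]

# Element-wise conversion; works on the whole column at once
angle_rad = np.radians(angle_deg)

Ep = m * g * l * (1 - np.cos(angle_rad))
print(Ep)
```

t and Ep can then go straight into plt.plot(t, Ep), without editing the data in Excel first.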
