Is there a python module that integrates simple chromatogram/trace analysis algorithms? I am looking for baseline correction, peak detection and peak integration functionality for simple time-courses (with data stored in numpy arrays).
I have spent quite some time searching now, and there doesn't seem to be any, which really surprises me.
I'm not sure what analysis you are conducting but have you looked at PyLSS ?
It can (and I quote from the documentation):
PyLSS is able to compute:
LSS parameters (log kw and S)
Build and plot chromatograms from experimental/predicted retention times
Regarding peak detection and peak integration, I have used the ptp() method (peak-to-peak, i.e. maximum minus minimum) in NumPy for this and I find it pretty powerful. Would this satisfy your requirement?
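If SciPy is also acceptable, a minimal sketch along the following lines covers all three steps; the synthetic trace, the cubic baseline fit, and the integration window are placeholders you would tune for real data:

import numpy as np
from scipy.signal import find_peaks

# synthetic stand-in for a chromatogram: two Gaussian peaks on a drifting baseline
t = np.linspace(0, 10, 2000)
y = np.exp(-((t - 4) / 0.1) ** 2) + 0.5 * np.exp(-((t - 6) / 0.15) ** 2) + 0.01 * t

# crude baseline correction: subtract a low-order polynomial fitted to the whole trace
baseline = np.polyval(np.polyfit(t, y, 3), t)
corrected = y - baseline

# peak detection on the corrected trace
peaks, props = find_peaks(corrected, height=0.05, prominence=0.05)

# naive peak integration: trapezoidal area over a fixed window around each apex
w = 50  # samples on each side of the apex
areas = [np.trapz(corrected[max(p - w, 0):p + w], t[max(p - w, 0):p + w]) for p in peaks]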
I have a dataset with 7 parameters for each point:
counterOfPackets
counterOfSyn
counterOfPa
counterOfR
counterOfRA
counterOfFin
packetsTotalSize
I would like to find a way to get all the outliers into a Python list (not as a plt.show GUI).
What algorithm should I use and how can I view the results as a python list?
Thanks for your help :D
This page on Medium from Will Badr is a good resource - https://towardsdatascience.com/5-ways-to-detect-outliers-that-every-data-scientist-should-know-python-code-70a54335a623. In terms of what outlier detection algorithm to use, the answer depends on the distribution of your data. I have found success using standard deviations and distance from inter-quartile ranges to identify outliers. However, these approaches work better over normal distributions, and in my scenario, I found methods to transform my data into a normal distribution without impacting the outcome.
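As a minimal sketch of the inter-quartile-range approach, assuming the seven parameters sit in the columns of an (n, 7) NumPy array (the data below are random placeholders), the outlier indices come back as plain Python lists rather than a plot:

import numpy as np

data = np.random.randn(500, 7)  # placeholder for the real measurements

def iqr_outlier_indices(column, k=1.5):
    # indices of values lying more than k * IQR outside the quartiles
    q1, q3 = np.percentile(column, [25, 75])
    iqr = q3 - q1
    return np.where((column < q1 - k * iqr) | (column > q3 + k * iqr))[0]

outliers = {col: iqr_outlier_indices(data[:, col]).tolist() for col in range(data.shape[1])}
print(outliers)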
I have a time series and generate its spectrogram in Python with matplotlib.pyplot.specgram.
After some analysis and changes I need to convert the spectrogram back into a time series.
Is there any function in matplotlib or in other library that I can use directly? Or if not, could you please elaborate on which direction I should work on?
Your warm help is appreciated.
Matplotlib is a library for plotting data. Generally if you're trying to do any computation you'd use a library suited for that.
numpy is a very popular library for doing numerical computation in Python. It just so happens they have a fairly extensive set of fft and ifft methods.
I would check them out here and see if they can solve your problem.
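Keep in mind that the round trip below is lossless only because the full complex spectrum (phase included) is kept; a magnitude-only spectrogram discards the phase, which is what the next answer addresses. A minimal sketch:

import numpy as np

x = np.random.randn(1024)           # placeholder time series
X = np.fft.rfft(x)                  # complex spectrum
x_back = np.fft.irfft(X, n=x.size)  # exact inverse
print(np.allclose(x, x_back))       # True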
One thing commonly done (for example in the source separation community) is to use the phase data of the original signal (before any transformations were applied to it) - the result is much better than null or random phase, and not so far from algorithms that aim to reconstruct the phase information from scratch.
A classic reconstruction algorithm is Griffin and Lim's, described in the paper "Signal estimation from modified short-time Fourier transform". This is an iterative algorithm; each iteration requires a full STFT / inverse STFT, which makes it quite costly.
This problem is indeed an active area of research, a search for STFT + reconstruction + magnitude will yield plenty of papers aiming at improving on Griffin&Lim in terms of signal quality and/or computational efficiency.
You can find a detailed discussion here: Thread on DSP Stack Exchange.
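As a minimal sketch of the reuse-the-original-phase idea, using scipy.signal rather than matplotlib (the signal and STFT parameters are placeholders):

import numpy as np
from scipy.signal import stft, istft

fs = 8000
x = np.random.randn(2 * fs)                 # placeholder time series

f, t, Z = stft(x, fs=fs, nperseg=256)       # specgram is essentially |STFT|**2
magnitude, phase = np.abs(Z), np.angle(Z)

# ... modify `magnitude` here ...

Z_mod = magnitude * np.exp(1j * phase)      # reuse the original phase
_, x_rec = istft(Z_mod, fs=fs, nperseg=256)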
I have many lines of georeferenced hydrological data with weekly resolution:
Station name, Lat, Long, Week 1 average, Week 2 average ... Week 52 average
Unfortunately, I also have some data with only monthly resolution:
Station name, Lat, Long, January average, February average ... December average
Rather than "reinventing the wheel," can anyone recommend a favorite module, package, or technique that would provide a reasonable interpolation of weekly values from monthly values? Linear would be fine, but it would be nice if we could use the coordinates to improve the interpolation based on nearby stations.
I've tagged this post with python because it's the language I've been using recently (although not its statistical functions). If the answer is "use a stats program like r" so be it, but I'm curious as to what's out there for python. Thanks!
I haven't had a chance to dig into it, but the hpgl (High Performance Geostatistics Library) provides a number of kriging (geospatial interpolation) methods:
Algorithms
Simple Kriging (SK)
Ordinary Kriging (OK)
Indicator Kriging (IK)
Local Varying Mean Kriging (LVM Kriging)
Simple CoKriging (Markov Models 1 & 2)
Sequential Indicator Simulation (SIS)
Correlogram Local Varying Mean SIS (CLVM SIS)
Local Varying Mean SIS (LVM SIS)
Sequential Gaussian Simulation (SGS)
If you are interested in expanding your experience into R, there are a number of good, well-used and well-documented packages out there. I would start by looking at the Spatial Task View, which lists what packages can be used for spatial data. One of the paragraphs deals with interpolation. I am most familiar with automap/gstat (I wrote automap); gstat in particular is a powerful geostatistics package which supports a wide range of methods.
http://cran.r-project.org/web/views/Spatial.html
Integrating Python and R can be done in multiple ways, e.g. using system calls or an in-memory link using Rpy. See also:
Python interface for R Programming Language
I am looking into doing the same thing, and I found this kriging module written by Sat Kumar Tomer at AMBHAS.
There appear to be methods for producing variograms and performing ordinary kriging.
I'll update this answer if I use this and make further discoveries.
Since I originally posted this question (in 2012!), an actively developed Python kriging module has been released: https://github.com/bsmurphy/PyKrige
There's also this older option:
https://github.com/capaulson/pyKriging
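For what it's worth, a minimal PyKrige sketch for kriging one week's station averages in space looks like the following (the coordinates and values are made up); the weekly-from-monthly step in time could still be plain linear interpolation, e.g. with numpy.interp:

import numpy as np
from pykrige.ok import OrdinaryKriging

# station longitudes, latitudes and one week's averages (made-up numbers)
lons = np.array([4.5, 5.1, 5.9, 6.3])
lats = np.array([52.0, 52.4, 51.8, 52.1])
vals = np.array([12.3, 10.8, 14.1, 11.5])

ok = OrdinaryKriging(lons, lats, vals, variogram_model="linear")

# kriged estimate (and variance) at the location of a monthly-only station
z, ss = ok.execute("points", np.array([5.5]), np.array([52.2]))
print(z, ss)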
I want to simulate a propagating wave with absorption and reflection on some bodies in three-dimensional space. I want to do it with Python. Should I use NumPy? Are there special libraries I should use?
How can I simulate the wave? Can I use the wave equation? But what if I have a reflection?
Is there a better method? Should I do it with vectors? But when the rays diverge, the intensity gets lower. Difficult.
Thanks in advance.
If you do any computationally intensive numerical simulation in Python, you should definitely use NumPy.
The most general algorithm to simulate an electromagnetic wave in arbitrarily-shaped materials is the finite-difference time domain method (FDTD). It solves the wave equation, one time-step at a time, on a 3-D lattice. It is quite complicated to program yourself, though, and you are probably better off using a dedicated package such as Meep.
There are books on how to write your own FDTD simulations: here's one, here's a document with some code for 1-D FDTD and explanations on more than 1 dimension, and Googling "writing FDTD" will find you more of the same.
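To give a feel for the time-stepping structure, here is a minimal 1-D FDTD sketch in NumPy (normalized units, Courant number 0.5, crude reflecting edges, made-up grid and source); for real 3-D work the dedicated packages above are far more practical:

import numpy as np

nx, nt = 400, 1000
ez = np.zeros(nx)            # electric field
hy = np.zeros(nx)            # magnetic field
eps = np.ones(nx)            # relative permittivity
eps[250:300] = 4.0           # a dielectric slab: part of the pulse reflects, part transmits

for n in range(nt):
    hy[:-1] += 0.5 * (ez[1:] - ez[:-1])           # update H from the curl of E
    ez[1:] += 0.5 / eps[1:] * (hy[1:] - hy[:-1])  # update E from the curl of H
    ez[50] += np.exp(-((n - 30) / 10.0) ** 2)     # soft Gaussian source
    # note: no absorbing boundary here, so the grid edges simply reflect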
You could also approach the problem by assuming all your waves are plane waves, then you could use vectors and the Fresnel equations. Or if you want to model Gaussian beams being transmitted and reflected from flat or curved surfaces, you could use the ABCD matrix formalism (also known as ray transfer matrices). This takes into account the divergence of beams.
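As a minimal sketch of the ABCD (ray transfer matrix) route for Gaussian beams (the wavelength, waist, and distances below are arbitrary):

import numpy as np

def free_space(d):                       # propagation over distance d
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):                        # thin lens of focal length f
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def propagate_q(q, M):                   # q' = (A q + B) / (C q + D)
    (A, B), (C, D) = M
    return (A * q + B) / (C * q + D)

wavelength, w0 = 633e-9, 1e-3
q0 = 1j * np.pi * w0**2 / wavelength     # complex beam parameter at the waist

M = thin_lens(0.1) @ free_space(0.5)     # 0.5 m of free space, then a 10 cm lens
q1 = propagate_q(q0, M)
w1 = np.sqrt(-wavelength / (np.pi * np.imag(1.0 / q1)))  # beam radius after the system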
If you are solving 3D custom PDEs, I would recommend at least a look at FiPy. It'll save you the trouble of building a lot of your matrix conditioners and solvers from scratch. It uses numpy and/or trilinos. Here are some examples.
I recommend you use my project GarlicSim as the framework in which you build the simulation. You will still need to write your algorithm yourself, probably in Numpy, but GarlicSim may save you a bunch of boilerplate and allow you to explore your simulation results in a flexible way, similar to version control systems.
Don't use Python. I've tried using it for computationally expensive things and it just wasn't made for that.
If you need to simulate a wave in a Python program, write the necessary code in C/C++ and export it to Python.
Here's a link to the C API: http://docs.python.org/c-api/
Be warned, it isn't the easiest API in the world :)
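A lighter-weight alternative to the raw C API is ctypes from the standard library; the shared library name and function signature below are hypothetical, just to show the pattern:

import ctypes
import numpy as np

# hypothetical library built from your C code, e.g.
#   gcc -O3 -shared -fPIC wave_step.c -o libwave.so
# exposing: void wave_step(double *field, int n);
lib = ctypes.CDLL("./libwave.so")
lib.wave_step.argtypes = [np.ctypeslib.ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
                          ctypes.c_int]
lib.wave_step.restype = None

field = np.zeros(1000)
lib.wave_step(field, field.size)   # the inner loop runs in C, the data stays in NumPy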
Long-time R and Python user here. I use R for my daily data analysis and Python for tasks heavier on text processing and shell scripting. I am working with increasingly large data sets, and these often arrive as binary or text files. What I normally do is apply statistical/machine-learning algorithms and create statistical graphics. I use R with SQLite sometimes and write C for iteration-intensive tasks; before looking into Hadoop, I am considering investing some time in NumPy/SciPy because I've heard it has better memory management (and the transition to NumPy/SciPy for someone with my background seems not that big). I wonder if anyone has experience using the two and could comment on the improvements in this area, and whether there are idioms in NumPy that deal with this issue. (I'm also aware of Rpy2 but wondering if NumPy/SciPy can handle most of my needs.) Thanks!
R's strength when looking for an environment to do machine learning and statistics is most certainly the diversity of its libraries. To my knowledge, SciPy + SciKits cannot be a replacement for CRAN.
Regarding memory usage, R uses a pass-by-value paradigm while Python uses pass-by-reference. Pass-by-value can lead to more "intuitive" code, while pass-by-reference can help optimize memory usage. NumPy also allows "views" on arrays (a kind of subarray with no copy being made).
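For example, a view costs no copy and writes through to the original array:

import numpy as np

a = np.arange(10)
v = a[2:6]        # a view, not a copy
v[0] = 99
print(a[2])       # 99 -- the original array sees the change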
Regarding speed, pure Python is faster than pure R for accessing individual elements in an array, but this advantage disappears when dealing with numpy arrays (benchmark). Fortunately, Cython lets one get serious speed improvements easily.
If working with Big Data, I find the support for storage-based arrays better with Python (HDF5).
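For instance, a minimal h5py sketch (one of several HDF5 bindings for Python; PyTables is another), with made-up sizes:

import numpy as np
import h5py

arr = np.random.randn(10000, 50)

with h5py.File("big.h5", "w") as f:
    f.create_dataset("x", data=arr, chunks=True)

with h5py.File("big.h5", "r") as f:
    block = f["x"][:1000]     # read only a slice from disk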
I am not sure you should ditch one for the other but rpy2 can help you explore your options about a possible transition (arrays can be shuttled between R and Numpy without a copy being made).
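A minimal rpy2 sketch of that kind of round trip (the exact conversion API varies a bit between rpy2 versions):

import numpy as np
import rpy2.robjects as robjects
from rpy2.robjects import numpy2ri

numpy2ri.activate()                  # let rpy2 map NumPy arrays to R vectors

x = np.random.randn(1000)
r_quantile = robjects.r["quantile"]  # call an R function from Python
print(r_quantile(x))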
I use NumPy daily and R nearly so.
For heavy number crunching, I prefer NumPy to R by a large margin (including R packages, like 'Matrix'). I find the syntax cleaner, the function set larger, and computation quicker (although I don't find R slow by any means). NumPy's broadcasting functionality, for instance, I do not think has an analog in R.
For instance, reading in a data set from a csv file and 'normalizing' it for input to an ML algorithm (e.g., mean-center then re-scale each dimension) requires just this:
import numpy as NP

data = NP.loadtxt(data1, delimiter=",")  # 'data1' is the csv file path; 'data' is a NumPy array
data -= NP.mean(data, axis=0)            # mean-center each column
data /= NP.max(data, axis=0)             # re-scale each column by its maximum
Also, I find that when coding ML algorithms I need data structures that I can operate on element-wise and that also understand linear algebra (e.g., matrix multiplication, transpose, etc.). NumPy gets this and allows you to create these hybrid structures easily (no operator overloading or subclassing, etc.).
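For instance, a trivial sketch of mixing element-wise arithmetic and linear algebra on the same arrays:

import numpy as NP

A = NP.arange(6.0).reshape(3, 2)
B = NP.ones((3, 2))

C = A * B              # element-wise multiplication
D = NP.dot(A.T, B)     # matrix multiplication: (2x3) dot (3x2) -> (2x2)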
You won't be disappointed by NumPy/SciPy, more likely you'll be amazed.
So, a few recommendations--in general and in particular, given the facts in your question:
install both NumPy and SciPy. As a rough guide, NumPy provides the core data structures (in particular the ndarray) and SciPy (which is actually several times larger than NumPy) provides the domain-specific functions (e.g., statistics, signal processing, integration).
install the repository versions, particularly w/r/t NumPy, because the dev version is 2.0. Matplotlib and NumPy are tightly integrated; you can use one without the other, of course, but both are the best in their respective class among Python libraries. You can get all three via easy_install, which I assume you already have.
NumPy/SciPy have several modules specifically directed to machine learning/statistics, including the Clustering package and the Statistics package (a short clustering sketch follows this list).
As well as packages directed to general computation, which make coding ML algorithms a lot faster, in particular Optimization and Linear Algebra.
There are also the SciKits, not included in the base NumPy or SciPy libraries; you need to install them separately. Generally speaking, each SciKit is a set of convenience wrappers to streamline coding in a given domain. The SciKits you are likely to find most relevant are: ann (approximate Nearest Neighbor) and learn (a set of ML/Statistics regression and classification algorithms, e.g., Logistic Regression, Multi-Layer Perceptron, Support Vector Machine).
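For instance, a minimal sketch of the SciPy clustering package mentioned above (random placeholder data):

import numpy as NP
from scipy.cluster.vq import kmeans, vq, whiten

data = NP.random.rand(100, 3)                # 100 observations, 3 features
whitened = whiten(data)                      # scale each feature by its standard deviation
centroids, distortion = kmeans(whitened, 3)  # find 3 cluster centroids
labels, _ = vq(whitened, centroids)          # assign each observation to its nearest centroid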