Which Python library for file access to analyse and manipulate? - python

For starters I'm going to make a program which analyses my poker hand histories which are stored automatically as text files.
So which library do I need to snoop around in if I'm looking to analyse .txt files? I mean I can find some functions but I want to become more independent instead of googling a solution each time and actually learning things by myself...tell me if this is a stupid idea.
Thanks in advance.

open() if it's very simple, csv if it's a bit more complicated, and pandas for everything else.
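For the simplest case, a minimal sketch with the built-in open() might look like the following. The function name count_matching_lines, the "Hand #" marker and the sample text are made up for illustration; real hand histories will have their own format.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count the lines in a text file that contain `keyword`."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # iterating a file object reads one line at a time
            if keyword in line:
                count += 1
    return count


# Demo on a throwaway file; the content below is invented for illustration.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hand #1\nHero folds\nHand #2\nHero raises\n")
    demo_path = tmp.name

hand_count = count_matching_lines(demo_path, "Hand #")
os.remove(demo_path)
print(hand_count)
```

Reading line by line like this keeps memory use low even for large history files. Once the lines need splitting into fields, the csv module or pandas becomes more convenient.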

Related

How to read/translate *txt.erb template (ruby) with python script

I have a quite complex software system which was developed in ruby and has been now all "translated" and transported into python. The last thing which is left is a series of *.txt.erb templates. I would like to leave them as they are but have a python library which does what the old ruby routine was doing, that is creating a series of *.txt files which follow the *.erb templates structure. I have looked a lot around but I cannot find an answer.
Probably for a more expert python programmer this might be a simple question.
Thanks a lot for your help!
I've solved the problem by changing the *.txt.erb templates into *.txt files and writing a python script which reads each file and substitutes the specific values I want for the generic variables, in both the file content and the file name.
Thank you for the help!
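A substitution script along those lines could be sketched with string.Template from the standard library. Note this is not an ERB interpreter: it assumes the templates have been rewritten to use $name placeholders, and the variable names (run_id, operator) and file-name pattern are hypothetical.

```python
from string import Template


def render(template_text, values):
    """Fill $name placeholders in template_text from the values dict."""
    # substitute() raises KeyError if a placeholder has no value,
    # which helps catch typos in the templates.
    return Template(template_text).substitute(values)


# Hypothetical template content and file-name pattern.
content_template = "Run: $run_id\nOperator: $operator\n"
name_template = "report_$run_id.txt"
values = {"run_id": "042", "operator": "alice"}

file_name = render(name_template, values)
file_body = render(content_template, values)
print(file_name)
print(file_body)
```

The same render() call handles both the file name and the file content, so one values dict drives both.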

Is there any parallel way of accessing Netcdf files in Python

Is there any way of doing parallel IO for Netcdf files in Python?
I understand that there is a project called PyPNetCDF, but apparently it's old, not updated and doesn't seem to work at all. Has anyone had any success with parallel IO with NetCDF in Python at all?
Any help is greatly appreciated
It's too bad PyPnetcdf is not a bit more mature. I see hard-coded paths and abandoned domain names. It doesn't look like it will take a lot to get something compiled, but then there's the issue of getting it to actually work...
In setup.py you should change the library_dirs_list and include_dirs_list to point to the places on your system where Northwestern/Argonne Parallel-NetCDF is installed and where your MPI distribution is installed.
Then one will have to go through and update the way pypnetcdf calls pnetcdf. A few years back (quite a few, actually) we promoted a lot of types to larger versions.
I haven't seen good examples from either of the two python NetCDF modules, see https://github.com/Unidata/netcdf4-python/issues/345
However, if you only need to read files and they are in NetCDF4 format, you should be able to use HDF5 directly (http://docs.h5py.org/en/latest/mpi.html), because NetCDF4 is basically HDF5 with a restricted data model. This probably won't work with NetCDF3.

How to parse a .shp file?

I am interested in gleaning information from an ESRI .shp file.
Specifically the .shp file of a polyline feature class.
When I open the .dbf of a feature class, I get what I would expect: a table that can open in excel and contains the information from the feature class' table.
However, when I try to open a .shp file in any program (excel, textpad, etc...) all I get is a bunch of gibberish and unusual ASCII characters.
I would like to use Python (2.x) to interpret this file and get information out of it (in this case the vertices of the polyline).
I do not want to use any modules or non built-in tools, as I am genuinely interested in how this process would work and I don't want any dependencies.
Thank you for any hints or points in the right direction you can give!
Your question, basically, is "I have a file full of data stored in an arbitrary binary format. How can I use python to read such a file?"
The answer is, this link contains a description of the format of the file. Write a dissector based on the technical specification.
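As a starting point for such a dissector, here is a sketch that decodes only the fixed 100-byte file header using the built-in struct module. It assumes the header layout given in the ESRI Shapefile Technical Description (big-endian file code and length, little-endian version, shape type and bounding box). The function name parse_shp_header and the synthetic demo header are illustrative, and the record section that holds the actual polyline vertices is not handled here.

```python
import struct


def parse_shp_header(header):
    """Decode the fixed 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # Bytes 0-27: seven big-endian int32 values (file code, 5 unused, length).
    file_code, _, _, _, _, _, length_words = struct.unpack(">7i", header[:28])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    # Bytes 28-35: version and shape type, little-endian int32.
    version, shape_type = struct.unpack("<2i", header[28:36])
    # Bytes 36-99: bounding box and Z/M ranges as eight little-endian doubles.
    xmin, ymin, xmax, ymax, _, _, _, _ = struct.unpack("<8d", header[36:100])
    return {
        "length_bytes": length_words * 2,  # length is stored in 16-bit words
        "version": version,
        "shape_type": shape_type,  # 3 means PolyLine
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for demonstration; a real one would come from
# open(path, "rb").read(100).
demo_header = (
    struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)
    + struct.pack("<2i", 1000, 3)
    + struct.pack("<8d", 0.0, 0.0, 10.0, 5.0, 0.0, 0.0, 0.0, 0.0)
)
info = parse_shp_header(demo_header)
print(info)
```

The mix of big-endian and little-endian fields is why the file looks like gibberish in a text editor. The records that follow the header use the same struct-based approach.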
If you don't want to go to all the trouble of writing a parser, you should take a look at pyshp, a pure Python shapefile library. I've been using it for a couple of months now, and have found it quite easy to use.
There's also a python binding to shapelib, if you search the web. But I found the pure Python solution easier to hack around with.
This might be a long shot, but you should check out ctypes, and maybe use the .dll file that came with a program (if it even exists lol) that can read that type of file. In my experience, things get weird when you start digging around in .dlls.

python with .pdb files

I am working on bio project.
I have .pdb (protein data bank) file which contains information about the molecule.
I want to find out the following of a molecule in the .pdb file:
Molecular Mass.
H bond donor.
H bond acceptor.
LogP.
Refractivity.
Is there any module in python which can deal with .pdb file in finding this?
If not then can anyone please let me know how can I do the same?
I found some modules like sequtils and protienparam but they don't do such things.
I have researched first and then posted, so, please don't down-vote.
Please comment, if you still down-vote as to why you did so.
Thanks in advance.
I don't know if it fits your needs, but Biopython looks like it might help.
The PDB also provides its data in an XML format, PDBML, which can be easily parsed using an XML parsing library
http://pdbml.pdb.org/
A pdb file can contain pretty much anything.
A lot of projects allow you to parse them. Some are specific to biology and pdb files, others are less specific but will allow you to do more (set up calculations, measure distances, angles, etc.).
I think you got downvoted because these projects are numerous: you are not the only one wanting to do that so the chances that something perfectly fitting your needs exists are really high.
That said, if you just want to parse pdb files for this specific need, just do it yourself:
Open the files with a text editor.
Identify where the relevant data are (keywords, etc.).
Make a Python function that opens the file and looks for the keywords.
Extract the figures from the file.
Done.
This can be done with a short script written in less than 10 minutes (another likely reason for the downvotes).
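The steps above can be sketched for the molecular mass only. This assumes the element symbol sits in columns 77-78 of ATOM/HETATM records, as in the standard PDB format. The small ATOMIC_MASS table and the helper make_atom_line are illustrative, the masses are approximate, and many PDB files omit hydrogens, so the result can underestimate the true mass. H-bond donors/acceptors, LogP and refractivity need chemistry-aware tooling and are not covered.

```python
# Approximate standard atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}


def molecular_mass(pdb_lines):
    """Sum atomic masses over the ATOM/HETATM records of a PDB file."""
    total = 0.0
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Element symbol: columns 77-78 (0-based slice 76:78).
            element = line[76:78].strip().capitalize()
            total += ATOMIC_MASS[element]  # KeyError for elements not in the table
    return total


def make_atom_line(serial, name, element):
    """Build a synthetic, mostly empty HETATM line for the demo below."""
    start = "HETATM{:>5} {:<4}".format(serial, name)
    return start.ljust(76) + element.rjust(2)


# A made-up water molecule; real input would be open(path).readlines().
demo_lines = [
    make_atom_line(1, "O", "O"),
    make_atom_line(2, "H1", "H"),
    make_atom_line(3, "H2", "H"),
    "END",
]
mass = molecular_mass(demo_lines)
print(round(mass, 3))
```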

Smallest learning curve language to work with CSV files

VBA is not cutting it for me anymore. I have lots of huge Excel files to which I need to make lots of calculations and break them down into other Excel/CSV files.
I need a language that I can pick up within the next couple of days to do what I need, because it is kind of an emergency. I have been suggested python, but I would like to check with you if there is anything else that does CSV file handling quickly and easily.
Python is an excellent choice. The csv module makes reading and writing CSV files easy (even Microsoft's, uh, "idiosyncratic" version) and Python syntax is a breeze to pick up.
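To give an idea of how little code is involved, here is a small sketch that reads rows, totals a column per group and writes the result back out as CSV. The column names (region, amount) and the data are invented, and io.StringIO stands in for real files; with files on disk you would use open(path, newline="") instead.

```python
import csv
import io

# In-memory stand-in for a file on disk; the column names are made up.
source = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,2.5\n")

# DictReader uses the first row as the header and yields one dict per row.
totals = {}
for row in csv.DictReader(source):
    totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])

# Write the per-region totals as a new CSV.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["region", "total"])
for region in sorted(totals):
    writer.writerow([region, totals[region]])

print(output.getvalue())
```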
I'd actually recommend against Perl, if you're coming to it fresh. While Perl is certainly powerful and fast, it's often cryptic to the point of incomprehensible to the uninitiated.
What kind of calculations do you have to do? Maybe R would be an alternative?
EDIT: just to give a few basic examples
# Basic usage
data <- read.csv("myfile.csv")
# Pipe-separated values
data <- read.csv("myfile.csv", sep="|")
# File with header (columns will be named as header)
data <- read.csv("myfile.csv", header=TRUE)
# Skip the first 5 lines of the file
data <- read.csv("myfile.csv", skip=5)
# Read only 100 lines
data <- read.csv("myfile.csv", nrows=100)
There are many tools for the job, but yes, Python is perhaps the best these days. There is a special module for dealing with csv files. Check the official docs.
Python definitely has a small learning curve, and works with csv files well
You say you have "excel files to which i need to make lots of calculations and break them down into other excel/csv files" but all the answers so far talk about csv only ...
Python has a csv read/write module as others have mentioned. There are also the 3rd party modules xlrd (reads) and xlwt (writes) for XLS files. See the tutorial on this site.
You know VBA? Why not Visual Basic 2008 / 2010, or perhaps C#? I'm sure languages like python and ruby would be relatively easier for the job, but you're already accustomed to the ".NET way" of doing things, so it makes sense to keep working with them instead of learning a whole new thing just for this job.
Using C#:
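// Requires: using System.IO; using System.Linq;
// Note: a plain Split(',') does not handle quoted fields that contain commas.
var csvlines = File.ReadAllLines("file.csv");
var query = from csvline in csvlines
            let data = csvline.Split(',')
            select new
            {
                ID = data[0],
                FirstName = data[1],
                LastName = data[2],
                Email = data[3]
            };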
.NET: Linq to CSV library.
.NET: Read CSV with LINQ
Python: Read CSV file
Perl is surprisingly efficient for a scripting language for text. cpan.org has a tremendous number of modules for dealing with CSV data. I've also both read and written data in XLS format with another Perl module. If you were able to use VBA, you can certainly learn Perl (the basics of Perl are easy, though it's just as easy for you or others to write terse yet cryptic code).
That depends on what you want to do with the files.
Python's learning curve is less steep than R's. However, R has a bunch of built-in functions that make it very well suited for manipulating .csv files easily, particularly for statistical purposes.
Edit: I'd recommend R over Python for this purpose alone, if only because the basic operations (reading files, dropping rows, dropping columns, etc.) are slightly faster to write in R than in Python.
I'd give awk a try. If you're running windows, you can get awk via the cygwin utilities.
This may not be anybody's popular language du-jour, but since CSV files are line-oriented and split into fields, dealing with them is just about the perfect application for awk. It was built for processing line oriented text data that can be split into fields.
Most of the other languages folks are going to recommend will be much more general-purpose, so there's going to be a lot more in them that isn't necessarily applicable to processing line-oriented text data.
PowerShell has CSV import built in.
The syntax is ugly as death, but it's designed to be useful for administrators more than for programmers -- so who knows, you might like it.
It's supposed to be a quick get-up-and-go language, for better and worse.
I'm surprised nobody's suggested PowerQuery; it's perfect for consolidating and importing files to Excel, does column calculations nicely and has a good graphical editor built in. Works for csvs and excel files but also SQL databases and most other things you'd expect. I managed to get some basic cleaning and formatting stuff up and running in a day, maybe a few days to start writing my own functions (break free from the GUI)
And since it only really does database stuff, it's got barely any functions to learn (the actual language is called "M")
PHP has a couple of csv functions that are easy to use:
http://www.php.net/manual-lookup.php?pattern=csv&lang=en
