TMY3 data not available from NSRDB anymore? - python
A few months ago I was able to download TMY3 data from NSRDB and use it with pvlib.tmy.readtmy3
Now I have tried to download files for other locations, but these seem to come in a different format. I am using the NSRDB Data Viewer, more specifically the Data Download Wizard. I select MTS2, as this seems to be the only model that still provides TMY3 data, and I click the TMY3 button when selecting the file for download. But the internal structure of the resulting CSV file is clearly different from what I got a few months ago, and also different from what pvlib.tmy.readtmy3 expects (I have checked the current Python source code).
At https://nsrdb.nrel.gov/tmy I get the following info:
Format of TMY Data
All TMY data are now in the System Advisor Model (SAM) CSV file
format. Formerly, TMY data were available only through TMY file
formats (i.e., TMY, TMY2, TMY3). By switching to the more
user-friendly SAM CSV, TMY data are more flexible than ever and can be
plugged into the vast majority of solar modeling programs.
This seems to imply that data is no longer available in TMY3 format, even though TMY3 data seems to be available in the NSRDB Data Download Wizard.
Do I need to write my own code to adapt NSRDB files to what pvlib expects?
The TMY3 files available at the link below are readable by pvlib:
https://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/tmy3/by_state_and_city.html
I'm not experienced with the NSRDB Data Viewer or how the format of its TMY3 files might differ. We'd welcome a contribution to improve compatibility with the new files, if necessary.
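For reference, here is a minimal sketch of reading one of those archived TMY3 files with the function the question mentions. The file name is a placeholder, and the exact metadata keys and column names may differ between pvlib versions, so treat it as an illustration rather than the definitive API.

    # A minimal sketch, assuming an older pvlib release that still provides the
    # pvlib.tmy module, and a TMY3-format CSV downloaded from the
    # rredc.nrel.gov archive linked above. '723650TYA.CSV' is a placeholder.
    import pvlib

    data, metadata = pvlib.tmy.readtmy3('723650TYA.CSV')

    print(metadata)            # station name, USAF id, latitude, longitude, ...
    print(data['GHI'].head())  # hourly GHI (with the default recolumn=True)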
Related
Extracting only few columns from a FITS file that is freely available online to download using python
I'm working on a model of the universe for which I'm using data available on the Sloan Digital Sky Survey site. The problem is that some files are more than 4 GB large (more than 50 GB in total), and I know those files contain a lot of data columns, but I only want data from a few of them. I had heard about web scraping, so I searched for how to do it, but that didn't help, as all the tutorials explained how to download the whole file using Python. I want to know whether there is any way to extract only a few columns from such a file, so that I only get the data I need and don't have to download the whole large file for a small fraction of its data. Sorry that my question is just words and no code; I'm not that experienced in Python. I just searched online and learned how to do basic web scraping, but it didn't solve my problem. It would be even more helpful if you could suggest some other ways to reduce the amount of data I have to download. Here is the URL to download the FITS files: https://data.sdss.org/sas/dr12/boss/lss/ I only want to extract the columns that have coordinates (ra, dec), distance, velocity and redshifts from the files. Also, is there a way to do the same thing with CSV files, or a general way to do it with any file?
I'm afraid what you're asking is generally not possible, at least not without significant effort and software support, both on the client and the server side.

First of all, FITS tables are stored in a row-oriented binary layout, meaning that if you want to stream a portion of a FITS table, you can read it one row at a time. But to read individual columns you would need to make partial reads of each row, for every single row in the table. Some web servers support what are called "range requests", meaning you can request only a few ranges of bytes from a file instead of the whole file. The web server has to have this enabled, and not all servers do. If FITS tables were stored column-oriented this could be feasible: you could download just the header of the file to determine the byte ranges of the columns, and then download only those ranges. Unfortunately, since FITS tables are row-oriented, loading, say, 3 columns from a table with a million rows would involve 3 million range requests, which would likely involve enough overhead that you wouldn't gain anything from it (and I'm honestly not sure what limits web servers place on how many ranges you can request in a single request, but I suspect most won't allow something so extreme). There are other astronomy data formats (e.g. I think CASA Tables) that can store tables in a column-oriented format and so are more feasible for this kind of use case.

Further, even if the HTTP limitations could be overcome, you would need software support for loading the file in this manner. This has been discussed to a limited extent here, but for the reasons discussed above it would mostly be useful for a limited set of cases, such as loading one HDU at a time (not so helpful in your case if the entire table is in one HDU) or possibly some other specialized cases such as sections of tile-compressed images. As mentioned elsewhere, Dask supports loading binary arrays from various cloud-based filesystems, but when it comes to streaming data from arbitrary HTTP servers it runs into similar limitations. Worse still, I looked at the link you provided and all the files there are gzip-compressed, which makes this especially difficult, since you can't know which ranges to request without decompressing the file first.

As an aside, since you asked: you will have the same problem with CSV, only worse, since CSV fields are not typically fixed-width, so there is no way to extract individual columns without downloading the whole file.

For FITS, maybe it would be helpful to develop a web service capable of serving arbitrary extracts from larger FITS files. If such a thing already exists I don't know of it, but I don't think it exists in a very general sense. So a) this would have to be developed, and b) you would have to ask anyone hosting the files you want to access to run such a service.

Your best bet is to just download the whole file, extract the data you need from it, and delete the original file, assuming you no longer need it. It's also possible that the information you need is already accessible through some online database.
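As a rough illustration of that last suggestion, here is a minimal sketch using astropy. The file name and the column names (RA, DEC, Z) are assumptions; check the actual table header of the file you download before relying on them.

    # A minimal sketch of the "download, extract, delete" approach described
    # above. The file name and column names are assumptions.
    import os
    import urllib.request

    from astropy.io import fits
    from astropy.table import Table

    url = "https://data.sdss.org/sas/dr12/boss/lss/galaxy_DR12v5_CMASS_North.fits.gz"
    local = "galaxy_DR12v5_CMASS_North.fits.gz"

    urllib.request.urlretrieve(url, local)   # download the whole file once

    with fits.open(local) as hdul:
        table = Table(hdul[1].data)          # the binary table is usually HDU 1
        subset = table["RA", "DEC", "Z"]     # keep only the columns you need
        subset.write("subset.csv", format="ascii.csv", overwrite=True)

    os.remove(local)                         # free the disk space again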
How do I automatically replace a large .xml database file from a website with a newer file each month?
I am working on a project right now that uses data from a large XML database file (usually around 8 GB) pulled from a website. The website updates this database file monthly, so every month there's a newer and more accurate database file. I started my project about a year ago, so it is using a database file from February 2019. For the sake of the people using my program, I would like the database file to be replaced with the new one each month when it gets rolled out. How could I go about implementing this in my project so I don't have to manually go and replace the file with a newer one every month? Is it something I should write into the program? But if that's the case, it would only update when the program is run. Or is there a way to have some script do this that automatically checks once a month? Note: this project is not being used by people yet and has a long way to go, but I am trying to figure out how to implement these features early on, before I get to the point where I can publish it.
I would first find out whether there is an API built on top of that XML data that you could leverage, instead of downloading the XML into your own system. That way you always get the latest version of the data, since you're pulling it on demand. However, an on-demand integration wouldn't be a good idea if you would be hitting the API with any kind of heavy frequency, or if you would be pulling large datasets from it. In that case, you need an ETL integration. Look into open-source ETL tools (just Google it) to help move that data in an automated fashion; I would recommend importing the XML into MongoDB or some other database and pulling the data from there instead of reading it from a flat file. And if you absolutely have to have it as a flat file, look into using Gatsby; it's a framework for static websites that need to be rebuilt every once in a while.
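If the automated route is taken, a minimal sketch of a refresh script that could be scheduled monthly with cron or Windows Task Scheduler might look like the following; the URL and local path are placeholders, not taken from the question.

    # A minimal sketch of a monthly refresh script, intended to be run by a
    # scheduler rather than by the application itself. URL and paths are
    # placeholders.
    import shutil
    import tempfile
    import urllib.request

    DATABASE_URL = "https://example.com/monthly-database.xml"  # placeholder
    LOCAL_PATH = "data/database.xml"

    def refresh_database():
        # Download to a temporary file first so a failed download never
        # clobbers the existing database file.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            with urllib.request.urlopen(DATABASE_URL) as response:
                shutil.copyfileobj(response, tmp)
            tmp_path = tmp.name
        shutil.move(tmp_path, LOCAL_PATH)

    if __name__ == "__main__":
        refresh_database()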
Best way to link excel workbooks feeding into each other for memory efficiency. Python?
I am building a tool which displays and compares data for a non-specialist audience, and I have to automate the whole procedure as much as possible. I am extracting selected data from several large data sets, processing it into a useful format and then displaying it in a variety of ways. The problem I foresee is in the updating of the model. I don't really want the user to have to do anything more than download the relevant files from the relevant database, rename them and save them to a location; the spreadsheet should do the rest. Then the user will be able to look at the data in a variety of ways, perform a few different analytical functions depending on what they are looking at, output some graphs, etc. Although some database exports won't be that large, other data will be pulled from very large XML or CSV files (500,000 x 50 cells), and there are several arrays working on the pulled data once it has been chopped down to the minimum possible. So it will be necessary to open and update several files in order, so that the data in the user control panel is up to date, and not all at once, so that the user's machine doesn't freeze. At the moment I am building all of this just using Excel formulas. My question is how best to do the updating and feeding bit. Perhaps some kind of controller program built with Python? I don't know Python, but I have other reasons to learn it. Any advice would be very welcome. Thanks
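To give a flavour of what such a Python "controller" could look like, here is a minimal sketch using pandas and openpyxl; the file paths, sheet names and column names are all placeholders, not taken from the question.

    # A minimal sketch of a controller script: read only the needed columns
    # from a large CSV export, reduce it, and write the result into the
    # workbook the user actually looks at. All names are placeholders.
    import pandas as pd

    SOURCE_CSV = "downloads/latest_export.csv"    # file the user saves manually
    OUTPUT_XLSX = "model/control_panel_data.xlsx"

    # usecols keeps memory usage down by never loading the unwanted columns
    wanted = ["date", "site_id", "value"]
    df = pd.read_csv(SOURCE_CSV, usecols=wanted)

    summary = df.groupby("site_id")["value"].mean().reset_index()

    with pd.ExcelWriter(OUTPUT_XLSX, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="raw", index=False)
        summary.to_excel(writer, sheet_name="summary", index=False)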
Read a .ssd01 data file from (ancient) SAS Version 6 (for Python/R)
In analyzing some data from a somewhat old academic paper, I received some data files that were produced by an even older piece of software, namely the 20+ year old SAS version 6 (.ssd01 extension). How would you convert this file to a modern format for analysis with R, Python, etc.? Bonus (bounty) points if the process doesn't require getting a SAS license or other commercial software.
Hints:
"Reading SAS® data sets with a filename extension such as .ssd01 or .ssd"
pandas has a read_sas method, but it only works on SAS version 7 data and newer (*.sas7bdat)
R has a similar import function for .sas7bdat, but this also won't work for this version: "Read SAS sas7bdat data into R"
Here are the files, in case anyone wants to get their hands dirty (no, they are not viruses, just the data from the paper above): swallco.ssd01, swallpd.ssd01
I just grabbed a demo version of the Windows 64-bit version of the software from https://www.stattransfer.com and applied it to your first file, asking for CSV output. Here are the first few lines of the result:

    "TIME","PLAYERID","PLAY","PAY","N1","N1PLAY","N1PAY","N2","N2PLAY","N2PAY","N3","N3PLAY","N3PAY","N4","N4PLAY","N4PAY","NN","N5","N5PLAY","N5PAY","N6","N6PLAY","N6PAY","N7","N7PLAY","N7PAY","ACTION","N1ACT","N2ACT","N3ACT","N4ACT","N5ACT","N6ACT","N7ACT","NETWORK","GAME","SESSION","LAGACTON","N1ACTO","N2ACTO","N3ACTO","N4ACTO","N5ACTO","N6ACTO","N7ACTO","ACTS","LAGACTS","PROPORT","LAGPROP","GRAPH","CLUSTER","LENGHT"
    2,0,"B",3.2,1,"A",2,2,"A",2,17,"B",3.2,16,"A",2,4,,"",,,"",,,"",,0,1,1,0,1,,,,"local","co","colc1fir",,1,1,0,1,0,0,0,3,,0.75,,"local",0.5,2.647
    3,0,"A",0.5,1,"B",2.5,2,"B",2.5,17,"B",2.5,16,"A",0.5,4,,"",,,"",,,"",,1,0,0,0,1,,,,"local","co","colc1fir",0,0,0,0,1,0,0,0,1,3,0.25,0.75,"local",0.5,2.647
    4,0,"B",2.5,1,"A",-1,2,"B",1.8,17,"B",3.2,16,"A",0.5,4,,"",,,"",,,"",,0,1,0,0,1,,,,"local","co","colc1fir",1,1,0,0,1,0,0,0,2,1,0.5,0.25,"local",0.5,2.647
    5,0,"B",2.5,1,"B",2.5,2,"A",-1,17,"A",0.5,16,"B",3.2,4,,"",,,"",,,"",,0,0,1,1,0,,,,"local","co","colc1fir",0,0,1,1,0,0,0,0,2,2,0.5,0.5,"local",0.5,2.647
    6,0,"A",2,1,"B",2.5,2,"A",2,17,"B",3.2,16,"A",3.5,4,,"",,,"",,,"",,1,0,1,0,1,,,,"local","co","colc1fir",0,0,1,0,1,0,0,0,2,2,0.5,0.5,"local",0.5,2.647
    7,0,"B",1.8,1,"B",1.8,2,"B",2.5,17,"B",2.5,16,"A",2,4,,"",,,"",,,"",,0,0,0,0,1,,,,"local","co","colc1fir",1,0,0,0,1,0,0,0,1,2,0.25,0.5,"local",0.5,2.647
    8,0,"A",2,1,"B",2.5,2,"B",2.5,17,"A",3.5,16,"A",3.5,4,,"",,,"",,,"",,1,0,0,1,1,,,,"local","co","colc1fir",0,0,0,1,1,0,0,0,2,1,0.5,0.25,"local",0.5,2.647
    9,0,"B",2.5,1,"B",1.8,2,"B",1.8,17,"A",2,16,"A",3.5,4,,"",,,"",,,"",,0,0,0,1,1,,,,"local","co","colc1fir",1,0,0,1,1,0,0,0,2,2,0.5,0.5,"local",0.5,2.647
    10,0,"B",1.8,1,"B",1,2,"B",1.8,17,"B",2.5,16,"A",2,4,,"",,,"",,,"",,0,0,0,0,1,,,,"local","co","colc1fir",0,0,0,0,1,0,0,0,1,2,0.25,0.5,"local",0.5,2.647
    11,0,"B",3.2,1,"A",0.5,2,"B",1.8,17,"A",3.5,16,"A",3.5,4,,"",,,"",,,"",,0,1,0,1,1,,,,"local","co","colc1fir",0,1,0,1,1,0,0,0,3,1,0.75,0.25,"local",0.5,2.647
    12,0,"B",2.5,1,"A",0.5,2,"B",2.5,17,"B",3.2,16,"A",2,4,,"",,,"",,,"",,0,1,0,0,1,,,,"local","co","colc1fir",0,1,0,0,1,0,0,0,2,3,0.5,0.75,"local",0.5,2.647
    13,0,"B",2.5,1,"A",0.5,2,"B",3.2,17,"B",3.2,16,"A",0.5,4,,"",,,"",,,"",,0,1,0,0,1,,,,"local","co","colc1fir",0,1,0,0,1,0,0,0,2,2,0.5,0.5,"local",0.5,2.647
    14,0,"B",3.2,1,"A",2,2,"A",3.5,17,"B",3.2,16,"A",2,4,,"",,,"",,,"",,0,1,1,0,1,,,,"local","co","colc1fir",0,1,1,0,1,0,0,0,3,2,0.75,0.5,"local",0.5,2.647

I have no idea how good this is! :) No, I'm not associated with any companies that make or market this. And, no, I've never even tried it out before. You now know everything about it (almost) that I do. Best of luck.
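Once a converter has produced a CSV, loading it into Python is straightforward; here is a minimal sketch, assuming the converted file is saved as swallco.csv.

    # A minimal sketch: read the converted CSV with pandas. 'swallco.csv' is an
    # assumed output name; the column names come from the header shown above.
    import pandas as pd

    df = pd.read_csv("swallco.csv")
    print(df[["TIME", "PLAYERID", "PLAY", "PAY"]].head())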
Modify EXIF/IPTC info in .dng (rawfiles) via Python?
Is anyone aware of a Python module or library capable of modifying EXIF and IPTC data in Adobe RAW files (.dng)? Until some eight years ago I used JPEG and could rather easily make such modifications with Python's help. After switching to RAW, I have to use image tools to modify the EXIF info. Primarily the EXIF "taken" date is of interest, but some IPTC fields are also candidates for modification. (I'm geotagging photos from my cameras, each of which has an RTC that drifts in various directions and by various amounts. My 'worst' camera 'hurries' by about 2.4 seconds per day. Before matching photo dates with .gpx data from a GPS logger, I need to shift the taken date by varying amounts, depending on the number of days since the camera clock was set.)
In one of my projects I use GExiv2 (https://wiki.gnome.org/Projects/gexiv2) with the PyGObject bindings (https://wiki.gnome.org/Projects/PyGObject). GExiv2 is a wrapper around exiv2, which can read & write Exif, IPTC and XMP metadata in DNG files: http://www.exiv2.org/manpage.html
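A minimal sketch of the kind of "taken date" correction described in the question, using GExiv2 through PyGObject; the file name and drift figures are placeholders, and the method names should be checked against your installed gexiv2 version.

    # A minimal sketch, assuming gexiv2 and PyGObject are installed (e.g. via
    # the system package manager). File name and drift values are placeholders.
    from datetime import datetime, timedelta

    import gi
    gi.require_version('GExiv2', '0.10')
    from gi.repository import GExiv2

    DRIFT_PER_DAY = 2.4      # seconds the camera clock gains per day
    DAYS_SINCE_SET = 30      # days since the camera clock was last set

    metadata = GExiv2.Metadata()
    metadata.open_path('photo.dng')

    taken = datetime.strptime(
        metadata.get_tag_string('Exif.Photo.DateTimeOriginal'),
        '%Y:%m:%d %H:%M:%S')
    corrected = taken - timedelta(seconds=DRIFT_PER_DAY * DAYS_SINCE_SET)

    metadata.set_tag_string('Exif.Photo.DateTimeOriginal',
                            corrected.strftime('%Y:%m:%d %H:%M:%S'))
    metadata.save_file('photo.dng')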