Import Only Necessary CSV Columns In IDL - python

I am struggling to find a function in IDL that replicates something I have done in Python with pandas. I am new to IDL, and there is next to nothing resource-wise that I can find.
In Python, I use the following:
pd.read_csv('<csv filepath>', usecols=[n])
The usecols part will only pull in the columns of a CSV I would like in my data frame. Is there a way to do this in IDL?
I hope this makes sense - my first post here!
Thanks.

There is a READ_CSV routine that can read CSV files, but it does not have a way to pull out specific columns. It will give you a structure with one field for each column of the CSV file — so you can just grab the column you need from the structure and throw away the rest. Something like:
csv = read_csv('somefile.csv')
col_n = csv.(n)
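For side-by-side comparison, here is the pandas version from the question as a runnable sketch; the file name, contents, and column index are made up for illustration:

```python
import pandas as pd

# Create a small placeholder CSV to read back.
with open('somefile.csv', 'w') as f:
    f.write('a,b,c\n1,2,3\n4,5,6\n')

# usecols keeps only the listed columns; integer positions or
# column names both work.
col_n = pd.read_csv('somefile.csv', usecols=[1])
print(col_n)
```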

Related

Excel data reshaping with Python

I have a particular spreadsheet which has point-of-sale data exported from a SQL database.
I'm trying to migrate to a new point-of-sale system, so I need to copy the data I exported into a CSV file into another CSV file which has a different format, for example different columns that I have to rearrange the original data into.
I'm trying to do this using Python, but I'm failing to find a way to automate this task.
Does anyone have any ideas, or any videos on a similar project?
Pandas seems like the Python tool for you.
Open up the first CSV file with Pandas as a DataFrame, apply any modifications you want, and save as a new CSV file. There is A LOT of documentation and support for Pandas, so I'm sure you can find tutorials on how to do any kind of data reshaping that you want.
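As a rough illustration of that workflow — all column names and the rename mapping here are invented, since the original files weren't shown:

```python
import pandas as pd

# Hypothetical export from the old point-of-sale system.
df = pd.DataFrame({'item': ['cola', 'chips'],
                   'qty': [2, 1],
                   'price': [1.5, 0.9]})

# Rename and rearrange columns to match the new system's CSV layout.
new_df = df.rename(columns={'item': 'Product',
                            'qty': 'Quantity',
                            'price': 'UnitPrice'})
new_df = new_df[['Product', 'UnitPrice', 'Quantity']]  # new column order
new_df.to_csv('new_format.csv', index=False)
```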

Python Processing large amount of data from excel

I've got a very hard task to do. I need to process an Excel file with 6336 rows x 53 columns. My task is to create a program which will:
Read data from the input Excel file.
Sort all rows by a specific column, e.g. sort by A1:A(last).
Place the columns in a new output Excel file in a given order, e.g.:
SaleCity          Branch                  CustomerID        InvoiceNum
Old file          For e.g.                Old file          Merge old file cols
Col[A1:A(last)]   SaleCity='Oklahoma'     Col[M1:M(last)]   Col[K1:K(last)] &
                  Branch='OKL GamesShop'                    B1:B(last)
Save new excel File.
Excel sample: (screenshot included in the original post)
(All data in this post is not real so don't try to hack someone or something :D)
I know that I did not provide any code but to be honest I tried solving it by myself and I don't even know which module I should use. I tried using OpenPyXl and Pandas but there's too much data for my capabilities.
Thank you in advance for any help. If I asked the question in the wrong place, please direct me to the right one.
Edit:
To be clear, I'm not asking for a full solution here. What I am asking for is guidance and mentorship.
I would recommend you use PySpark. It is more difficult than pandas, but the parallelization it provides will help with your large Excel files.
Alternatively, you could use Python's multiprocessing library to parallelize pandas functions.
https://towardsdatascience.com/make-your-own-super-pandas-using-multiproc-1c04f41944a1
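Whichever engine you pick, the core steps from the question (sort by a column, put the columns in a required order, save) are short in plain pandas. A minimal sketch with made-up data and column names, writing CSV to keep it dependency-free (`to_excel` works the same way if openpyxl is installed):

```python
import pandas as pd

# Made-up stand-in for the 6336 x 53 sheet; column names are invented.
df = pd.DataFrame({
    'InvoiceNum': [103, 101, 102],
    'SaleCity': ['Oklahoma', 'Tulsa', 'Oklahoma'],
    'Branch': ['OKL GamesShop', 'TUL Main', 'OKL GamesShop'],
    'CustomerID': [7, 5, 6],
})

# 1) Sort by one column, 2) reorder the columns, 3) save the result.
out = df.sort_values('InvoiceNum')
out = out[['SaleCity', 'Branch', 'CustomerID', 'InvoiceNum']]
out.to_csv('sorted.csv', index=False)
```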

Transposing first column in dataframe in pandas

I am fairly new to Python and I just do pretty basic stuff. So when I managed to sort this out by myself I was really chuffed, although I am not sure if this is the most pythonic way to do it.
I had a csv file that contained this information when read the normal way:
I wanted the items in the first-row to be the column headers.
So I used Transpose:
df_t=df.T
And save it as a new file, passing header=False to drop the existing headings:
df_t.to_csv('employment.csv', header=False)
When opening the new file, this is what I have:
As I said, not the most pythonic, but it seems to work. Any suggestions on how to improve this, will be most appreciated.
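For what it's worth, one common variant of this pattern does the whole thing in pandas: read the file without a header, transpose, then promote the first transposed row to be the header. A sketch with invented file contents:

```python
import pandas as pd

# Invented stand-in for the original file: labels down the first column.
with open('raw.csv', 'w') as f:
    f.write('year,2019,2020\nemployment,95,90\n')

df = pd.read_csv('raw.csv', header=None)  # don't treat row 0 as a header
df_t = df.T                               # rows become columns

# Promote the first transposed row to be the header, then drop it.
df_t.columns = df_t.iloc[0]
df_t = df_t.iloc[1:]
df_t.to_csv('employment.csv', index=False)
```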

How do I tell python what my data structure (that is in binary) looks like so I can plot it?

I have a data set that looks like this.
b'\xa3\x95\x80\x80YFMT\x00BBnNZ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Type,Length,Name,Format,Columns\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa3\x95\x80\x81\x17PARMNf\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Name,Value\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa3\x95\x80\x82-GPS\x00BIHBcLLeeEefI\x00\x00\x00Status,TimeMS,Week,NSats,HDop,Lat,Lng,RelAlt,Alt,Spd,GCrs,VZ,T\x00\x00\xa3\x95\x80\x83\x1fIMU\x00Iffffff\x00\x00\x00\x00\x00\x00\x00\x00\x00TimeMS,GyrX,GyrY,G
I have been reading around to find out how to write Python code that will let me parse this data so that I can plot some of the columns against each other (mostly against time).
Some things I found that may help in doing this:
There is a script that converts this data into a CSV file. I know how to use it, convert to CSV, and plot from there, but as a learning experience I want to be able to do this without converting to a CSV file first. I tried reading that code, but I am clueless since I am very new to Python. Here is the link to the code:
https://github.com/PX4/Firmware/blob/master/Tools/sdlog2/sdlog2_dump.py
Also, someone posted this saying it might be the log format, but again I couldn't understand or run any code on that page.
http://dev.px4.io/advanced-ulog-file-format.html
A good starting point for parsing binary data is the struct module https://docs.python.org/3/library/struct.html and its unpack function. That's what the CSV dump routine you linked to does as well. If you walk through the process method, it is doing the following:
Read a chunk of binary data
Figure out if it has a valid header
Check the message type - if it's a FORMAT message, parse that; if it's a description message, parse that.
Dump out a CSV row
You could modify this code to essentially replace the __printCSVRow method with something that captures the data into a pandas dataframe (or other handy data structure) so that when the main routine is all done you can grab all the data from the dataframe and plot it.
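A minimal sketch of the struct approach, using an invented record layout — the real log's fields, types, and ordering would come from the FMT messages at the start of the file, not from this example:

```python
import struct

# Invented fixed-size record layout (NOT the real PX4 format): a 2-byte
# sync marker, a 1-byte message type, a uint32 time in ms, and three
# float32 sensor values, all little-endian.
record_fmt = '<2sBIfff'
record_size = struct.calcsize(record_fmt)

# Build one fake record so the sketch is self-contained.
raw = struct.pack(record_fmt, b'\xa3\x95', 0x82, 1000, 1.0, 2.0, 3.0)

# unpack returns a tuple of Python values in the same order as the format.
sync, msg_type, time_ms, gx, gy, gz = struct.unpack(record_fmt, raw)
print(msg_type, time_ms, (gx, gy, gz))
```

In a real log you would read `record_size` bytes at a time, check the sync marker and message type, and dispatch to the right format string for that message.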

Script that converts html tables to CSV (preferably python)

I have a large number of html tables that I'd like to convert into CSV. Pasting individual tables into excel and saving them as .csv works, as does pasting the html tables into simple online converters. But I have thousands of individual tables, so I need a script that can automate the conversion process.
I was wondering if anyone has any suggestions as to how I could go about doing this? Python is the only language I have a decent knowledge of, so some sort of python script would be ideal. I've searched for similar questions, but all the python examples I've found are quite complicated to me, and go beyond my basic level of understanding.
Any advice would be much appreciated.
Use pandas. It has a function to read html tables into a data structure, and then a function that will write that data structure to a csv file.
import pandas as pd
url = 'http://myurl.com/mypage/'
for i, df in enumerate(pd.read_html(url)):
    df.to_csv('myfile_%s.csv' % i)
Note that since an html page may have more than one table, the function to get the table always returns a list of tables (even if there is only one table). That is why I use a loop here.
