Picking out a specific column in a table - python

My goal is to import a table of astrophysical data that I have saved to my computer (obtained from matching 2 other tables in TOPCAT, if you know it), and extract certain relevant columns. I hope to then do further manipulations on these columns. I am a complete beginner in Python, so I apologise for basic errors. I've done my best to try and solve my problem on my own but I'm a bit lost.
This is the script I have written so far:
import pandas as pd
input_file = "location\\filename"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The file I'm trying to import is listed with file type "File" on my drive. I've looked at this file in Notepad and it has a lot of descriptive bumf in the first few rows, so to try and get rid of this I've used "skiprows" as you can see. The data in the file is separated column-wise by vertical lines, or at least that's how it appears in Notepad.
The problem is that when I try to extract the first column using "usecols", it instead returns what appears to be the first row in the command window, with a load of vertical bars between the values. I assume it is somehow not interpreting the table correctly, i.e. not understanding what's a column and what's a row.
What I've tried: Modifying the file and saving it in a different filetype. This gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'location\\filename'
Despite the fact that the new file is saved in exactly the same location.
I've tried using "pd.read_table" instead of csv, but this doesn't seem to change anything (nor does it give me an error).
When I've tried to extract multiple columns (i.e. "usecols=[1,2]") I get the following error:
ValueError: Usecols do not match columns, columns expected but not found: [1, 2]
My hope is that someone with experience can give some insight into what's likely going on to cause these problems.

Maybe you can try dataset.iloc[:, 0]. With iloc you can select the rows or columns you want by integer position (among other things); [:, 0] selects all the rows of the first column.
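A minimal sketch of this, assuming the vertical bars visible in Notepad are the column separator. The sample text and column names below are made up for illustration and are not the asker's data. If pandas splits on commas (its default), each whole line lands in a single column, which may explain both the bars in the output and the usecols error.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: two descriptive lines, then
# pipe-separated columns (this layout is an assumption).
raw = """# descriptive header line
# another descriptive line
ra|dec|mag
10.5|-3.2|18.1
11.0|-3.4|17.6
"""

# sep="|" tells pandas that the vertical bar is the column separator;
# skiprows drops the descriptive lines at the top.
dataset = pd.read_csv(io.StringIO(raw), sep="|", skiprows=2)

first_column = dataset.iloc[:, 0]      # all rows, first column (by position)
two_columns = dataset.iloc[:, [0, 1]]  # all rows, first two columns

print(first_column.tolist())
print(two_columns.shape)
```

Positions are zero-based, so usecols=[1] and iloc[:, 1] both refer to the second column, not the first.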

The file is incorrectly named.
I expect you are reading a CSV, XLSX, or TXT file, so the (Windows) path should include the file extension and look similar to this:
import pandas as pd
input_file = "C:\\python\\tests\\test_csv.csv"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The error message tells you this:
No such file or directory: 'location\\filename'
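One way to check the path before calling read_csv, sketched with the standard pathlib module. The path is the hypothetical one from the answer above. Windows Explorer can hide extensions, so listing the folder from Python shows the real file names.

```python
from pathlib import Path

# Hypothetical path taken from the example above; replace it with your own.
# A raw string (r"...") avoids having to double every backslash.
input_file = Path(r"C:\python\tests\test_csv.csv")

print(input_file.suffix)    # the extension, including the dot
print(input_file.exists())  # False means read_csv will raise FileNotFoundError

# List what is really in the folder, extensions included. A file shown
# as "filename" in Explorer may really be "filename.txt" or similar.
if input_file.parent.is_dir():
    for entry in input_file.parent.iterdir():
        print(entry.name)
```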

Related

Reading a .csv file in a CombiTimeTable in Dymola using SimulateExtendedModel() function through Python

I am trying to establish a Dymola Python interface where a .csv file is read and passed to the CombiTimeTable in Dymola, but I cannot find any help on how to do that. Even with a Dymola script/command I wasn't able to pass in my csv file and a filename.
Does anyone have any idea how I would be able to do so?
P.S. I have also tried the ExternData package.
I have used
Modelica.Blocks.Sources.CombiTimeTable CombiTimeTable(tableOnFile=True, tableName="",fileName="")
And
innerparameter ExternData
Update
I found an alternative approach using simulateExtendedModel():
dymola.simulateExtendedModel("Model",startTime=0, stopTime=1, numberOfIntervals=0, outputInterval=300,
initialNames={"CombiTimeTable.table"}, initialValues={DataFiles.readCSVmatrix("Z:\your.csv")});
I am still struggling with errors, but it is possible to pass the contents of a .csv file to the CombiTimeTable's table parameter, whereas .txt and .mat files go through tableOnFile.
I encountered a similar issue myself some time ago.
If you look at the CombiTimeTable documentation, you can see that you can provide a .txt file as a source for the time table.
Here is a bit of Python code that converts a csv into a suitable file (it may not be the best way, but it works perfectly for me). I am using the pandas library:
import pandas as pd
# Create a DataFrame by reading the source csv file.
data = pd.read_csv("Path/To/Your/SourceFile.csv")
rows, cols = data.shape
# Create the input file for the CombiTimeTable and write the header that
# Dymola needs to read it (check the CombiTimeTable documentation for details).
with open("Path/To/Dymola/Source.txt", mode="w") as f:
    f.write("#1\n")
    f.write(f"double tab1({rows},{cols}) # comment line\n")
# Calling to_csv on the existing txt file in 'a' (append) mode writes the
# DataFrame values after the header lines, tab-separated, without index or column names.
data.to_csv("Path/To/Dymola/Source.txt", mode="a", sep="\t", index=False, header=False)
Here is an example of what it produces: (image: example of a generated CombiTimeTable source file)
Some clarifications:
I am assuming that the data in your csv has headers.
The first column in your csv contains the time points (in my example they represent minutes).
On the Dymola side, here is how you can set up the CombiTimeTable (based on my example): (image: CombiTimeTable setup example)
I hope that you'll find this answer helpful.

Python reads only one column from my CSV file

First post here.
I am very new to programming, so sorry if this is confusing.
I made a database by collecting several different kinds of data online. All the data is in one xlsx file (one column per variable), which I converted to csv afterwards because my teacher only showed us how to use csv files in Python.
I installed pandas and made it read my csv file, but it doesn't seem to understand that I have multiple columns; it reads everything as one column. As a result, I can't get the info on each variable (and so I can't transform the data).
I tried df.info() and df.info(verbose=True, show_counts=True), but both give the same result.
len(df.columns) = 1, which shows it doesn't see that each variable has its own column.
len(df) = 1923, which is right.
I was expecting this: https://imgur.com/a/UROKtxN (different project, not the same database)
database used: https://imgur.com/a/Wl1tsYb
And I get this instead: https://imgur.com/a/iV38YNe
database used: https://imgur.com/a/VefFrL4
I don't know, they look pretty similar, so why doesn't it work :((
Thanks.
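A possible cause, sketched under an assumption: in some regional settings Excel writes semicolons between fields instead of commas, and pandas then sees each line as one value. The sample text and column names below are hypothetical, not the asker's data.

```python
import io
import pandas as pd

# Hypothetical sample of a semicolon-separated export.
raw = "country;year;value\nFrance;2020;1.5\nSpain;2021;2.5\n"

# Default comma separator: each whole line ends up in a single column.
wrong = pd.read_csv(io.StringIO(raw))
print(len(wrong.columns))

# Naming the separator explicitly restores the columns.
df = pd.read_csv(io.StringIO(raw), sep=";")
print(len(df.columns))
print(df.dtypes)
```

Opening the csv in a plain text editor shows which separator is really used. If the numbers also use a decimal comma, read_csv has a decimal="," option for that.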

Reading csv-file in Python

I know this question has been asked a lot, but none of the solutions I can find seems to work.
I'm trying to read a csv in Python using pandas. The csv file 'data.csv' contains 9 comma-separated columns and no header, in the format:
T,000027E7,24.56,3.41,5.03,12,1260497437.817,4,0.18
T,00006726,28.84,8.24,5.03,14,1260497437.818,4,3.62
However, when using the command below, only a single column containing all values is outputted.
import pandas as pd
data2=pd.read_csv('data.csv',header=None)
I've also tried specifying names of each column to no avail.
data2=pd.read_csv('data.csv',header=None, names=['Type','TagID','x','y','z','BatLvl','TimeStamp','Unit','DQI'])
Does anybody know of a way to solve this?
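A small check, as a sketch: the two sample rows exactly as posted do parse into 9 columns, so the file on disk probably differs from what is shown. One hypothetical difference is that each line is wrapped in double quotes, which makes pandas treat the whole line as a single field. The quoted variant below is an assumption, not something confirmed by the question.

```python
import io
import pandas as pd

names = ['Type', 'TagID', 'x', 'y', 'z', 'BatLvl', 'TimeStamp', 'Unit', 'DQI']

# The two sample rows exactly as shown in the question.
clean = ("T,000027E7,24.56,3.41,5.03,12,1260497437.817,4,0.18\n"
         "T,00006726,28.84,8.24,5.03,14,1260497437.818,4,3.62\n")

# Read TagID as text so values such as 000027E7 are not parsed as numbers.
data2 = pd.read_csv(io.StringIO(clean), header=None, names=names,
                    dtype={'TagID': str})
print(data2.shape)

# Hypothetical variant: the whole line wrapped in double quotes.
quoted = '"T,000027E7,24.56,3.41,5.03,12,1260497437.817,4,0.18"\n'
one_col = pd.read_csv(io.StringIO(quoted), header=None)
print(one_col.shape)

# repr() makes hidden quotes, tabs, or other characters in a raw line visible.
print(repr(quoted.splitlines()[0]))
```

Printing repr() of the first line of the real file (read with open('data.csv')) is a quick way to see what pandas is actually given.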

pd.read_excel does recognize the file but does not actually read it

I've been busy working on some code, and one part of it imports an Excel file. I've been using the code below. On one PC it works, but on another it does not (I did change the paths). Python does recognize the Excel file and does not give an error when loading, but when I print the table it says:
Empty DataFrame
Columns: []
Index: []
Just to be sure, I checked the filepath which seems to be correct. I also checked the sheetname but that is all good too.
df = pd.read_excel(book_filepath, sheet_name='Potentie_alles')
description = df["#"].map(str)
The second line then raises KeyError: '#' ('#' is the header of the first column of the sheet).
Does anyone know how to fix this?
Kind regards,
iCookieMonster
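A diagnostic sketch rather than a fix: passing sheet_name=None to pd.read_excel returns a dict with every sheet in the workbook, which makes it easy to see which sheets pandas actually finds and whether any of them contain data. The helper name summarize_sheets and the in-memory stand-in workbook are hypothetical, used so the sketch runs without the real file.

```python
import pandas as pd

def summarize_sheets(sheets):
    """Return {sheet name: (shape, column labels)} for a dict of DataFrames."""
    return {name: (frame.shape, list(frame.columns))
            for name, frame in sheets.items()}

# With the real workbook, sheet_name=None loads every sheet into a dict:
#     sheets = pd.read_excel(book_filepath, sheet_name=None)
# A hypothetical stand-in is used here instead.
sheets = {
    "Potentie_alles": pd.DataFrame(),  # empty, as reported in the question
    "Sheet2": pd.DataFrame({"#": [1, 2], "name": ["a", "b"]}),
}

for name, (shape, columns) in summarize_sheets(sheets).items():
    print(name, shape, columns)
```

If the expected sheet shows shape (0, 0) on one PC only, comparing the pandas and Excel engine versions on the two machines would be a reasonable next step.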

Creating a dataframe from a csv file in pandas: column issue

I have a messy text file that I need to sort into columns in a dataframe so I can do the data analysis I need to do. Here is the messy-looking file: (image: Messy text)
I can read it in as a csv file, which looks a bit nicer, using:
import pandas as pd
data = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt')
print(data)
This prints the data aligned, but the output is [640 rows x 1 column], and I need to separate it into multiple columns and manipulate it as a dataframe.
I have tried a number of solutions using StringIO that have worked here before, but nothing seems to be doing the trick.
However, when I do this, there is the issue that the
Use delim_whitespace=True (see the pandas read_csv docs):
df = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt', delim_whitespace=True)
Your input file is not actually in CSV format.
As you provided only a .png picture, it is not even clear whether this file is divided into rows or not.
If not, you have to start by cutting the content into individual lines and then read the output file that results from this cutting.
I think this is the first step before you can use either read_csv or read_table (with delim_whitespace=True, of course).
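A minimal sketch of the whitespace approach, assuming the file is already split into lines with columns separated by runs of spaces. The sample text and column names are made up, since the real layout is only known from a screenshot. Recent pandas versions deprecate delim_whitespace=True in favour of sep=r"\s+", which is used here.

```python
import io
import pandas as pd

# Hypothetical stand-in for the text file.
raw = """depth   energy   counts
0.10    30.0     125
0.20    29.5     118
0.30    29.1     97
"""

# sep=r"\s+" splits each line on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(df.shape)
print(df.columns.tolist())
```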
