I want to convert a CSV file of time-series data with multiple sensors.
This is what the data currently looks like:
The different sensors are described by numbers and have different numbers of axes. If a new activity is labeled, everything below belongs to this new label. The label is in the same column as the first entry of each sensor.
This is the way I would like the data to be:
Each sensor axis has its own column and the corresponding label is added in the last column.
So far, I have created a DataObject class to access the timestamp, sensortype, sensorvalues, and the associated parent_label for each row in the CSV.
I thought the most convenient way to solve this would be to use a pandas DataFrame, but simply calling pd.DataFrame(timestamp, sensortype, sensorvalues, label) won't work.
Any ideas/hints? Maybe other ways to solve this problem?
I am fairly new to programming, especially Python, so I have already run out of ideas.
Thanks in advance
Try building a NumPy array from the columns you need, then convert it to a pandas DataFrame.
Otherwise, you can also try importing the CSV with pandas from the start.
Also, instead of the following
pd.DataFrame(timestamp, sensortype, sensorvalues, label)
take a look at the pd.concat function. You would need to convert each array to a DataFrame, put them in a list, and then concatenate them with pandas.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
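A rough sketch of one way to do this, in case it helps. The first positional argument of pd.DataFrame is the data (followed by index, columns, dtype), so four separate lists cannot be passed positionally. One option is to build one dict per row and let pandas line up the columns. The row tuples and the column names such as s1_ax0 are made up for illustration; the real DataObject fields and CSV layout may differ.

```python
import pandas as pd

# Hypothetical rows: (timestamp, sensortype, sensorvalues, label).
rows = [
    (0.00, 1, [0.1, 0.2, 0.3], "walking"),
    (0.01, 2, [9.8], "walking"),
    (0.02, 1, [0.4, 0.5, 0.6], "running"),
]

records = []
for timestamp, sensortype, sensorvalues, label in rows:
    record = {"timestamp": timestamp}
    # One column per sensor axis, e.g. "s1_ax0", "s1_ax1", ...
    for axis, value in enumerate(sensorvalues):
        record[f"s{sensortype}_ax{axis}"] = value
    record["label"] = label
    records.append(record)

df = pd.DataFrame(records)
# Move the label to the last column; axes a sensor does not have stay NaN.
df = df[[c for c in df.columns if c != "label"] + ["label"]]
```

Each sensor axis ends up in its own column, and rows from a sensor with fewer axes simply have missing values in the other columns.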
I am trying to manipulate my dataframe before I conduct network analysis using networkx.
Here is a sample of the data I have:
sample data
I am trying to use the title and cast columns and turn them into something like this:
ideal format
The ideal result is to have one column for each individual actor and the movie/show that he/she is in. If the actor has more than 1 show/movie, I want to have different rows for that actor as well.
Could someone please advise me on how to make it happen? Thank you!!
To use pandas, first read the CSV into a DataFrame. Let's call it "f".
import pandas
f = pandas.read_csv('path/to/csv')
After that, you can access individual columns with:
f['title']
This works much like a dictionary. If you want both columns in the same DataFrame, pass in a list of column names like so:
f[['title', 'cast']]
That is as much as I can provide without knowing the extent of the project.
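For the one-row-per-actor shape asked about in the question, a minimal sketch could look like the following. It assumes the cast column holds all actor names for a title in a single comma-separated string, which is only a guess since the sample data is not shown; the toy titles and names are invented.

```python
import pandas as pd

# Toy data; assumes "cast" is one comma-separated string per title.
f = pd.DataFrame({
    "title": ["Show A", "Movie B"],
    "cast": ["Ann Lee, Bob Ray", "Ann Lee"],
})

pairs = (
    f[["title", "cast"]]
    .dropna(subset=["cast"])                        # skip titles without a cast
    .assign(cast=lambda d: d["cast"].str.split(","))  # string -> list of names
    .explode("cast")                                # one row per list element
)
pairs["cast"] = pairs["cast"].str.strip()           # drop spaces after commas
pairs = pairs[["cast", "title"]].reset_index(drop=True)
```

An actor who appears in several titles gets one row per title, which is the edge-list shape networkx can consume.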
I have a seemingly complicated problem and I have a general idea of how I should solve it but I am not sure if it is the best way to go about it. I'll give the scenario and would appreciate any help on how to break this down. I'm fairly new with Pandas so please excuse my ignorance.
The Scenario
I have a CSV file that I import as a dataframe. The example I am working through contains 2742 rows × 136 columns. The number of rows varies but the columns are fixed. I have a set of 23 lookup tables (also as CSV files) named per year and per quarter (the range is 2020 3rd quarter to 2015 1st quarter). The lookup files are named like this: PPRRVU203.csv, which contains values from the 3rd quarter of 2020. The lookup tables are matched on two columns ('Code' and 'Mod'), and I use three values associated with each match in the lookup.
I am trying to filter sections of my data frame, pull the correct values from the matching lookup file, merge back into the original subset, and then replace into the original dataframe.
Thoughts
I can probably abstract this and wrap it in a function, but I am not sure how to put the results back in. My question, for those who understand Pandas better than I do: what is the best method to filter, replace the values, and write the file back out?
The straightforward solution would be to filter the original dataframe into 23 separate dataframes, do the merge on each individual file, then concat into a new dataframe and output to CSV.
This seems highly inefficient?
I can post code, but I am mostly looking for high-level thoughts.
I'm not sure exactly what your DataFrame looks like, but the DataFrame.query() method may prove useful for selecting the data.
name = df.query('columnname == "something"')
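As a high-level alternative to 23 separate filter-and-merge steps, one possible sketch is to stack all lookup tables into a single table tagged with the quarter they come from, and then merge once. Everything below is illustrative: the period column, the rvu value column, and the sample codes are assumptions, and in practice each lookup would come from pd.read_csv on a file such as PPRRVU203.csv.

```python
import pandas as pd

# Hypothetical main data: each row already knows its year/quarter key.
df = pd.DataFrame({
    "period": ["203", "203", "151"],
    "Code": ["99213", "99214", "99213"],
    "Mod": ["", "26", ""],
})

# Stand-ins for the lookup CSVs, keyed by the suffix of the file name.
lookups = {
    "203": pd.DataFrame(
        {"Code": ["99213", "99214"], "Mod": ["", "26"], "rvu": [1.3, 1.9]}
    ),
    "151": pd.DataFrame({"Code": ["99213"], "Mod": [""], "rvu": [0.97]}),
}

# Stack every lookup into one table, tagging rows with their period.
all_lookups = pd.concat(
    [table.assign(period=period) for period, table in lookups.items()],
    ignore_index=True,
)

# A single left merge keeps every original row and pulls in the values.
merged = df.merge(all_lookups, on=["period", "Code", "Mod"], how="left")
```

Because it is a left merge on the full key, the original row order and count are preserved (as long as each key is unique in the lookups), so the result can be written straight back out with to_csv.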
I assume this is a trick question on this homework I'm working on, but maybe it's not?
What object do you get after reading a csv file?
data frame
character vector
panel
all of the above
From what I know, you can use pandas to read a CSV file into a dataframe. But I know a panel is a data structure in pandas too, and I've never even heard of a character vector.
Anyone got any ideas? I'm fairly certain the answer is just data frame, but hey, you never know.
When you read a CSV file into a variable with pandas, it is stored as a pandas.core.frame.DataFrame object, which you are already familiar with.
As for Panel, which represented wide-format panel data stored as a 3-dimensional array, it has been deprecated since version 0.20.0 (and was removed in 0.25.0), as noted in the pandas Panel documentation.
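This is easy to check directly. A small sketch using an in-memory CSV (the column names a and b are arbitrary):

```python
import io

import pandas as pd

# An in-memory stand-in for a CSV file on disk.
csv_text = "a,b\n1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))       # the object returned by read_csv

column = df["a"]
print(type(column))   # selecting one column gives a different type
```

The first print shows a DataFrame and the second a Series; neither is a panel or a character vector.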
I have a Stata .dta file. If I open it in Stata, I can see several columns with value labels. I can go into browse, click on one of them, and see the original code behind the label.
If I read this .dta file into python via pd.read_stata(..., convert_categoricals=True), I can get the data types via df.dtypes.
For some of the columns, categories have been created. However, for one column of interest, a Series with dtype object has been created instead, which contains the labels as strings.
How exactly does the process of category creation in pd.read_stata work?
How can I access the original data codes behind the labels when reading in with convert_categoricals=True?
What do I do in the case where columns are converted into dtype object -- do I have to read in the data frame a second time with convert_categoricals=False and merge? That really sounds non-pythonic.
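One possible approach, sketched under assumptions: read the raw codes with convert_categoricals=False and fetch the code-to-label mappings separately through StataReader.value_labels(), then apply a mapping by hand where it is wanted. The example writes its own tiny .dta file so that it is self-contained; the column name level and its labels are invented, and a real file may name its label sets differently from its columns.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

# Build a small .dta file with one labelled column (stand-in for the real file).
src = pd.DataFrame({"level": pd.Categorical(["low", "high", "low"])})
path = os.path.join(tempfile.mkdtemp(), "example.dta")
src.to_stata(path, write_index=False)

# Read the raw numeric codes; no labels are applied.
codes = pd.read_stata(path, convert_categoricals=False)

# Read the label sets separately: {label_name: {code: label}}.
with StataReader(path) as reader:
    labels = reader.value_labels()

# Apply one mapping by hand, keeping the codes next to the labels.
mapping = {int(code): text for code, text in labels["level"].items()}
codes["level_label"] = codes["level"].astype("int64").map(mapping)
```

This keeps both the code and the label in one frame without a second full read followed by a merge.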
I want to create a "presentation ready" Excel document with embedded pandas DataFrames and additional data and formatting.
A typical document will include some titles and metadata, and several DataFrames with a sum row/column for each one.
The DataFrame itself should be formatted.
The best thing I found was this which explains how to use pandas with XlsxWriter.
The main problem is that there's no apparent method to get the exact location of the embedded DataFrame in order to add the summary row below it (the shape of the DataFrame is a good estimate, but it might not be exact when rendering complex DataFrames).
If there's a solution that relies on some kind of template, and not hard coding it would be even better.
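For the simple case, the position of the summary row can be derived from the DataFrame's shape plus the startrow and startcol offsets passed to DataFrame.to_excel. The helper below is a hypothetical sketch, not part of pandas or XlsxWriter, and it deliberately handles only flat (single-level) column headers, since multi-level headers take extra rows.

```python
import pandas as pd


def summary_row_position(df, startrow=0, startcol=0, index=True, header=True):
    """Return (row, col) of the cell just below df's first data column.

    Zero-based, matching the startrow/startcol arguments of to_excel.
    Hypothetical helper; only flat column headers are handled.
    """
    if df.columns.nlevels > 1:
        raise NotImplementedError("MultiIndex columns need extra header rows")
    header_rows = 1 if header else 0              # one row for column names
    index_cols = df.index.nlevels if index else 0  # one column per index level
    return startrow + header_rows + len(df), startcol + index_cols


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
row, col = summary_row_position(df, startrow=4, startcol=1)
```

The same startrow and startcol would then be passed to df.to_excel, and the returned zero-based row and column can be used with XlsxWriter's worksheet.write(row, col, ...) to place the totals.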