I keep seeing
for index, row in group.object.iterrows():
in TensorFlow tutorials. I get what it's doing, and that group is a GroupBy object, but I wonder what the ".object" is there for. I googled "group.object.iterrows", and all I got was TensorFlow object detection code. I tried other variants, but nothing had a GroupBy.object example or a description of what it is.
EDIT: here's a tutorial:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10/blob/master/generate_tfrecord.py
See line 70.
Here's another, there are a bunch, actually:
https://www.skcript.com/svr/realtime-object-and-face-detection-in-android-using-tensorflow-object-detection-api/
Some more context:
They involve making a tensorflow.train.Example and loading features into it. These were originally taken from XML produced by some labeling tool, then converted to a CSV, then converted to a pandas DataFrame.
In fact, the code mostly looks like cut-and-paste from some original script with small edits.
Like a DataFrame, a pandas GroupBy object supports accessing columns with attribute notation, as long as the column name doesn't conflict with a "regular" attribute. object is merely one of the column names in the grouped data, and group.object accesses that column.
object is a column in the group DataFrame.
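For what it's worth, here is a toy sketch (made-up data) of that attribute access on a column named object:

import pandas as pd

# Made-up data: a column literally named 'object' can be reached with
# attribute notation, just like group['object'].
df = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "b.jpg"],
    "object": ["cat", "dog", "cat"],
})

for filename, group in df.groupby("filename"):
    # group is a plain DataFrame here; group.object is its 'object' column
    print(filename, list(group.object))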
Related
I've used TERR to make calculated columns and other types of data functions in Spotfire and am happy to hear you can now use Python. I did a simple test to ensure things are working (x3 = x2*2) - that's literally the script I wrote in the Python data function window. I then set up the input parameter (x2 as a column) and the output parameter (x3) to be a new column. The values come out fine, but the newly calculated column comes out named x2(2). I looked into the input/output parameters and all the names are correct, yet the column still comes out named that way. My concern is that this is super simple, yet the column is not being named what is in the script even though everything is set up correctly. There is even a YouTube video by a Spotfire employee where the same thing happens to them, and they don't mention it at all.
Has anybody else run across this?
It does seem to differ from how the equivalent TERR data function works. I consulted the Spotfire engineering team, and here is what they suggest. It has to do with how a Column input is handled internally in Python vs. TERR. In both Python and TERR, inputs (and outputs) are passed over as a table: in TERR's case a data.frame, and in Python's case a pandas.DataFrame. In TERR's case, though, if the data function says the input is a Column, it is converted from a 1-column data.frame to a vector of the equivalent type; similarly, a Value is converted from its 1x1 data.frame to a scalar type. In Python, Value inputs are treated the same, but Column inputs are left as a pandas.Series, which retains the column name from the original input column.
Maybe you can try something different. You wouldn't want to convert it to a standard Python list, because then x2*2 would actually make the list twice as long rather than performing a vectorised arithmetic operation. But you could make it a plain NumPy array instead. Try adding "x2 = x2.to_numpy()" at the top of your example and see if the result matches what you expected.
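As a quick illustration of why the name sticks (toy values only, not the actual Spotfire wiring):

import pandas as pd

# Toy values only -- not the actual Spotfire plumbing.
x2 = pd.Series([1, 2, 3], name="x2")

x3 = x2 * 2
print(x3.name)           # 'x2': the Series keeps its original name,
                         # which is why the output column gets labeled x2(2)

x3 = x2.to_numpy() * 2   # a plain NumPy array carries no name at all
print(x3)                # [2 4 6]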
I want to convert a CSV file of time-series data with multiple sensors.
This is what the data currently looks like:
The different sensors are identified by numbers and have different numbers of axes. When a new activity is labeled, everything below it belongs to that new label. The label is in the same column as the first entry of each sensor.
This is the way I would like the data to be:
Each sensor axis has its own column and the according label is added in the last column.
So far, I have created a DataObject class to access the timestamp, sensortype, sensorvalues, and corresponding parent_label for each row in the CSV.
I thought the most convenient way to solve this would be by using a pandas DataFrame, but simply calling
pd.DataFrame(timestamp, sensortype, sensorvalues, label)
won't work.
Any ideas/hints? Maybe other ways to solve this problem?
I am fairly new to programming, especially Python, so I have already run out of ideas.
Thanks in advance
Try creating a numpy matrix of the columns you require, then convert it to a pandas DataFrame.
Otherwise, you can also try importing the CSV with pandas from the start.
Also, for
pd.DataFrame(timestamp, sensortype, sensorvalues, label)
try the pd.concat function as well. You would need to convert each array to a DataFrame, put them in a list, and then concat them with pandas; a sketch follows after the links below.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
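Here is a minimal sketch of the dict-based construction and the concat route (all names and values are made up):

import pandas as pd

# Made-up values for illustration only.
timestamps = [0.0, 0.1, 0.2]
sensortypes = [3, 3, 5]
sensorvalues = [1.2, 1.3, 0.7]
labels = ["walking", "walking", "walking"]

# pd.DataFrame wants one object mapping column names to equal-length
# sequences, not four positional arguments.
df = pd.DataFrame({
    "timestamp": timestamps,
    "sensortype": sensortypes,
    "sensorvalue": sensorvalues,
    "label": labels,
})

# If each sensor ends up in its own small DataFrame, stack them afterwards:
combined = pd.concat([df, df], ignore_index=True)
print(combined)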
I have a seemingly complicated problem, and while I have a general idea of how to solve it, I am not sure it is the best way to go about it. I'll give the scenario and would appreciate any help on how to break this down. I'm fairly new with Pandas, so please excuse my ignorance.
The Scenario
I have a CSV file that I import as a dataframe. The example I am working through contains 2742 rows × 136 columns. The rows are variable, but the columns are set. I have a set of 23 lookup tables (also CSV files), named per year and quarter (the range is 2020 3rd quarter to 2015 1st quarter). The lookup files are named like PPRRVU203.csv, so that one contains values from the 3rd quarter of 2020. The lookup tables are matched on two columns ('Code' and 'Mod'), and I use three associated values from the lookup.
I am trying to filter sections of my dataframe, pull the correct values from the matching lookup file, merge them back into the subset, and then replace that subset in the original dataframe.
Thoughts
I can probably abstract this and wrap it in a function, but I'm not sure how to place the results back in. My question, for those who understand Pandas better than I do: what is the best method to filter, replace the values, and write the file back out?
The straightforward solution would be to filter the original dataframe into 23 separate dataframes, do the merge on each one individually, then concat them into a new dataframe and output to CSV.
This seems highly inefficient?
I can post code, but I am looking more for high-level thoughts.
Not sure exactly what your DataFrame looks like, but the Pandas query() method may prove useful for selecting data.
name = df.query('columnname == "something"')
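For the broader filter/merge/concat loop described in the question, a rough sketch might look like the following. The 'quarter' column and the file names are assumptions; only 'Code' and 'Mod' come from the question.

import pandas as pd

# Assumption: the main frame has a 'quarter' column whose values match the
# lookup file names (e.g. 'PPRRVU203'); 'claims.csv' is a placeholder.
df = pd.read_csv("claims.csv")

pieces = []
for quarter, chunk in df.groupby("quarter"):
    lookup = pd.read_csv(f"{quarter}.csv")
    pieces.append(chunk.merge(lookup, on=["Code", "Mod"], how="left"))

result = pd.concat(pieces, ignore_index=True)
result.to_csv("claims_enriched.csv", index=False)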
I have this dataframe from which I need to extract act1omschr from the adresactiviteit column; however, since each entry is an object holding a list of dicts, I don't know how to extract these values.
Can someone help me out?
It looks like that's not a dictionary but JSON (JavaScript Object Notation). It's a bit like a CSV but with nested values, and it's pretty common, especially for web data.
Pandas has a function called json_normalize which should help. For using it on one column specifically, this was answered pretty well over here. You should more or less be able to use the exact code given.
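A minimal sketch of that idea, assuming each cell of adresactiviteit holds a list of dicts:

import pandas as pd

# Toy frame -- assumption: each cell of 'adresactiviteit' is a list of dicts.
df = pd.DataFrame({
    "id": [1, 2],
    "adresactiviteit": [
        [{"act1omschr": "bakery", "act1code": "1071"}],
        [{"act1omschr": "cafe", "act1code": "5630"}],
    ],
})

# One row per activity, then flatten each dict into its own columns.
exploded = df.explode("adresactiviteit")
flat = pd.json_normalize(exploded["adresactiviteit"].tolist())
print(flat["act1omschr"])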
I have a Stata .dta file. If I open it in Stata, I can see several columns with value labels. I can go into browse, click on one of them, and see the original code behind the label.
If I read this .dta file into Python via pd.read_stata(..., convert_categoricals=True), I can get the data types via df.dtypes.
For some of the columns, categories have been created. However, for one column of interest, a Series with dtype object has been created instead, which contains the labels as strings.
How exactly does the process of category creation in pd.read_stata work?
How can I access the original data codes behind the labels when reading in with convert_categoricals=True?
What do I do in the case where columns are converted into dtype Object -- do I have to read in the data frame a second time with convert_categoricals=False and merge? That really sounds non-pythonic.
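One way around a full double read, sketched under the assumption of a file named data.dta: read the raw codes once and ask the same reader for the label mappings.

import pandas as pd

# Sketch only -- "data.dta" is a placeholder file name.
with pd.read_stata("data.dta", convert_categoricals=False,
                   iterator=True) as reader:
    df_codes = reader.read()        # columns hold the original Stata codes
    labels = reader.value_labels()  # {label_set_name: {code: label}}

print(df_codes.head())
print(labels)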