How to pass a Python DataFrame to UiPath?

I'm calling a Python function using the UiPath Python Activities Pack (Get Python Object) and it returns a DataFrame so that I can use it within UiPath. Unfortunately, UiPath is not able to convert the DataFrame to a .NET data type like a DataTable.
Even when I try to convert the DataFrame to any other format (string, numpy array, HTML, etc.) it does not work, although the documentation explicitly states that all data types are supported. The Python script does its work and stores the content of the DataFrame in an Excel file, so I could, of course, just read the Excel file. I was just wondering whether there is a way to pass the data directly to UiPath instead of saving it first and reading it back in.

Actually, I spent quite some time on this but finally figured out how to pass the pandas DataFrame to UiPath and make it available there as a DataTable. Here is how I did it:
Python Script:
I let the Python function that I call in the UiPath 'Invoke Python Method' activity return the pandas DataFrame as a JSON string, i.e.
return df.to_json(orient='records')
Get Python Object:
Save the JSON string in a variable of type string
Deserialize JSON:
Choose 'System.Data.DataTable' as the TypeArgument and store the result in a variable of type DataTable.
Now the data from the pandas DataFrame is available in a .NET DataTable in UiPath.
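For reference, a minimal sketch of what the Python side could look like (the function name and sample data are made up for illustration; only the to_json(orient='records') call is taken from the steps above):
import pandas as pd

def get_report():
    # build or load the DataFrame as usual (placeholder data)
    df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
    # return it as a JSON string so UiPath can deserialize it into a DataTable
    return df.to_json(orient='records')
The string returned by this function is what the Deserialize JSON activity consumes in the workflow described above.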

Related

What is a pandas.core.Frame.DataFrame, and how to convert it to pd.DataFrame?

I am currently trying to do a machine learning classification of 6 time series datasets (in .csv format) using MiniRocket, an sktime machine learning package. However, when I imported the .csv files using pd.read_csv and ran them through MiniRocket, the error "TypeError: X must be in an sktime compatible format" popped up, saying that the following data types are sktime compatible:
['pd.Series', 'pd.DataFrame', 'np.ndarray', 'nested_univ', 'numpy3D', 'pd-multiindex', 'df-list', 'pd_multiindex_hier']
Then I checked the data type of my imported .csv files and got "pandas.core.Frame.DataFrame", which is a data type I had never seen before and which is obviously different from the sktime compatible pd.DataFrame. What is the difference between pandas.core.Frame.DataFrame and pd.DataFrame, and how can I convert pandas.core.Frame.DataFrame to the sktime compatible pd.DataFrame?
I tried to convert pandas.core.Frame.DataFrame to pd.DataFrame using the df.join and df.pop functions, but neither of them was able to convert my data from pandas.core.Frame.DataFrame to pd.DataFrame (after conversion I checked the type again and it was still the same).
If you just take the values from your old DataFrame with .values, you can create a new DataFrame the standard way. If you want to keep the same columns and index values, just set those when you declare your new DataFrame.
df_new = pd.DataFrame(df_old.values, columns=df_old.columns, index=df_old.index)
Most of the pandas classes are defined under pandas.core folder: https://github.com/pandas-dev/pandas/tree/main/pandas/core.
For example, class DataFrame is defined in pandas.core.frame.py:
class DataFrame(NDFrame, OpsMixin):
    ...
    def __init__(...):
        ...
Pandas is not yet a py.typed library (PEP 561). The public API documentation refers to pandas.DataFrame, but internally, error messages still refer to the source file structure, such as pandas.core.frame.DataFrame.
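In other words, pandas.core.frame.DataFrame and pd.DataFrame are the same class reached through different paths, which you can verify with a quick sanity check (not from the answers above, just an illustration):
import pandas as pd
import pandas.core.frame

# pd.DataFrame is re-exported from pandas.core.frame,
# so both names point at the exact same class object
print(pd.DataFrame is pandas.core.frame.DataFrame)  # True

df = pd.DataFrame({"a": [1, 2]})
print(type(df))  # <class 'pandas.core.frame.DataFrame'>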

Apache beam: Reading and transforming multiple data types from single file

Is there a way to read each value with its original data type into a PCollection from a CSV file?
By default, all the values in a row read into a PCollection are converted into a list of strings, but is there a way to treat an integer as an integer, a float as a float, a double as a double, a string as a string, etc.?
That way, PTransforms could easily be performed on each value of the row separately.
Or does it have to be done externally using a ParDo function?
The root of your issue is that a CSV file only contains strings, so it is necessary to parse the strings as whatever type you know the column contains.
A convenient way to do this is to use Beam's pandas-compatible dataframe API to read your CSV files, as in:
pipeline | read_csv(...)
This will use pandas to sample the CSV file and guess what the types may be for each column.
You can see more examples and explanation at https://beam.apache.org/documentation/dsls/dataframes/overview/#using-dataframes
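As a rough end-to-end sketch of that approach (the file path is a placeholder; read_csv and to_pcollection come from Beam's dataframe API):
import apache_beam as beam
from apache_beam.dataframe.io import read_csv
from apache_beam.dataframe.convert import to_pcollection

with beam.Pipeline() as p:
    # read_csv samples the file with pandas and infers a dtype per column
    df = p | read_csv('input.csv')  # placeholder path
    # convert the deferred DataFrame back into a PCollection of typed rows
    rows = to_pcollection(df)
    rows | beam.Map(print)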

KeyError 'data' when loading RData into Python using pyreadr

Until now I was able to use the pyreadr package to load RData files into Python. Somehow this time I keep getting a KeyError: 'data'.
# load the data set:
rdata_read = pyreadr.read_r("/content/GrowthData.rda")
data = rdata_read[ 'data' ]
n = data.shape[0]
Where does the error come from?
Furthermore, I found out that the type of this is a "collections.OrderedDict", which is new to me and has never happened before. Consequently, I tried to convert it to a pandas DataFrame. Unfortunately, I could not convert this type to a pandas DataFrame either, as I receive the error "must pass a 2-D array". Hence, I am very confused right now and don't know how I can access this data via Python and work with it. I appreciate any help!
Pyreadr's read_r function always returns an OrderedDict (think of it as a regular Python dictionary; the distinction was important in older versions of Python, not anymore), where the keys of the dictionary are the names of the objects (dataframes) as they were set in R, and the values are the dataframes. You can read about this in the README.
The reason it returns a dictionary is that in an RData file you can save multiple objects (dataframes), so pyreadr has to provide a way to return multiple dataframes that you can recognize by their names.
In R you would do:
save(dataframe1, dataframe2, file="GrowthData.rda")
What I would suggest you do in Python is, after you have read the data, explore what keys are in there:
# load the data set:
rdata_read = pyreadr.read_r("/content/GrowthData.rda")
print(rdata_read.keys())
# would print dataframe1, dataframe2 in the above example
This will tell you what objects have been saved in the RData file, and you can then retrieve them as you were doing before:
data = rdata_read['dataframe1']
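If you don't know (or don't care) what name the object was saved under in R, a small sketch like this also works (the path is the one from the question; it assumes the file contains at least one object):
import pyreadr

rdata_read = pyreadr.read_r("/content/GrowthData.rda")
# pick the first object saved in the file, whatever it was named in R
first_key = list(rdata_read.keys())[0]
data = rdata_read[first_key]  # this is already a pandas DataFrame
print(first_key, data.shape)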

Converting oracle.sql.STRUCT# using python (into geojson or dataframe)

I would like to convert an Oracle DB frame including a 'SHAPE' column that hold the 'oracle.sql.STRUCT#' information into something more accessible, either a geojson/shapefile/dataframe either using Python/R or SQL.
Any ideas?
Create your frame with a query using one of the SDO_UTIL functions to convert the shape (SDO_GEOMETRY type) to a type easily consumed by Python/R, i.e. WKB, WKT, or GeoJSON. For example, SDO_UTIL.TO_WKTGEOMETRY(shape). See info on the conversion functions here: https://docs.oracle.com/en/database/oracle/oracle-database/19/spatl/SDO_UTIL-reference.html
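A minimal Python sketch of that approach, assuming the python-oracledb driver and a hypothetical table my_spatial_table with an SDO_GEOMETRY column called SHAPE (only the SDO_UTIL.TO_WKTGEOMETRY call comes from the answer above; credentials and names are placeholders):
import oracledb
import pandas as pd

# return CLOBs (such as the WKT text) as plain Python strings
oracledb.defaults.fetch_lobs = False

conn = oracledb.connect(user="user", password="pw", dsn="host/service")
query = """
    SELECT id,
           SDO_UTIL.TO_WKTGEOMETRY(shape) AS shape_wkt
    FROM my_spatial_table
"""
df = pd.read_sql(query, conn)  # the geometry arrives as WKT text, easy to parse further
print(df.head())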

python equivalent to listObjects in VBA for Excel (tables)

I have implemented a program in VBA for excel to generate automatic communications based on user inputs (selections of cells).
This macro, written in VBA, makes extensive use of VBA's ListObject,
i.e.
defining a table (list object)
Dim ClsSht As Worksheet
Set ClsSht = ThisWorkbook.Sheets("paragraph texts")
Dim ClsTbl As ListObject
Set ClsTbl = ClsSht.ListObjects(1)
accessing the table in the code in a very logical manner:
ClsTbl being now the table where I want to pick up data.
myvariable= ClsTbl.listcolumns("D1").databodyrange.item(34).value
This means myvariable is item (row) 34 of the data body of column D1 of the table ClsTbl.
I decided to learn Python to "translate" all that code into Python and make a Django-based program accessible to anyone.
I am a beginner in Python and I am wondering what the equivalent of VBA's ListObject is in Python. This decision will shape my whole program from the beginning, so I am hesitating a lot about what to choose.
The main idea here is to get a way to access table data in a readable manner,
i.e. give me the value of column "text" where column "chapter" is 3 and column "paragraph" is "2". The values are unique, meaning there is only one value in the "text" column where that occurs.
Some observations:
I know everything can be done with lists in Python (lists can contain lists that can contain lists...), but this is terrible for readability, e.g. mylist1[2][3] (assuming, for instance, that every row is a list of values and the whole table is a list of rows).
I don't consider building a database an option. There are multiple relatively small tables (from 10 to 500 rows and from 3 to 15 columns) that are related, but not in a database manner. A database would force me to learn yet another language such as SQL, and I have more than enough with Python and Django.
The user modifies the structure of many tables (chapters being merged together or getting split).
The data is 100% strings. The only integers are numbers used to sort text. I don't perform any mathematical operations on values; I simply join pieces of text together and make replacements in texts.
The tables will be loaded into Python as CSV text files.
Please let me know if anything in the question is not clear enough and I will complete it.
Would it be necessary to work with numpy? pandas? i.e. to get the value of a cell?
A pandas DataFrame should provide everything you need, i.e. conversion to strings, manipulation, import and export. As a start, try:
import pandas as pd
df = pd.read_csv('your_file.csv')
print(df)
print(df['text'])
The entries of the first row will be converted to labels of the DataFrame columns.
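As a sketch of the lookup described in the question (the value of column "text" where "chapter" is 3 and "paragraph" is "2"), assuming the CSV has columns named chapter, paragraph and text, and using a placeholder file name:
import pandas as pd

# dtype=str keeps every column as strings, matching the "data is 100% strings" observation
df = pd.read_csv('paragraph_texts.csv', dtype=str)

# filter on both key columns; the question says the match is unique
match = df[(df['chapter'] == '3') & (df['paragraph'] == '2')]
myvariable = match['text'].iloc[0]
print(myvariable)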
