Create cumsum column with Python Script Widget in Orange - python

I can't create a new column with the cumulative sum of another.
The Orange documentation is too hard to understand if you are new to Python like me.
This is the code I have in my Python Script widget:
import numpy as np

## make a copy of the data that the widget receives
out_data = in_data.copy()
## compute the cumulative sum of the column values
newCumsumColValues = np.cumsum(out_data[:, 'myCumsumcolTarget'])
## I got the values
print(newCumsumColValues)
## I need to create a new column with the values in out_data
## I've tried to update the column values first to test:
## with a static value, the column values update to 8
out_data[:, 'myCumsumcolTarget'] = 8
## with newCumsumColValues it is NOT working
out_data[:, 'myCumsumcolTarget'] = newCumsumColValues
These examples are hard to understand for me:
https://docs.orange.biolab.si/3/visual-programming/widgets/data/pythonscript.html
https://docs.orange.biolab.si/3/data-mining-library/tutorial/data.html#exploration-of-the-data-domain
Thanks in advance,
Vince.

Try:
out_data.X[:, i] = newCumsumColValues
where i is
out_data.domain.index(out_data.domain['myCumsumcolTarget'])
This code is a bit complicated but it works.
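Putting the pieces together, a minimal sketch of the whole widget script might look like this (it assumes 'myCumsumcolTarget' is a continuous attribute of the incoming table, and that in_data is the table the widget receives; the column name comes from the question):
import numpy as np

# copy the incoming table so the original stays untouched
out_data = in_data.copy()
# position of the target column in the attribute matrix X
i = out_data.domain.index(out_data.domain['myCumsumcolTarget'])
# overwrite the column with its cumulative sum
out_data.X[:, i] = np.cumsum(out_data.X[:, i])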

Related

An efficient way to fill a dearpygui table using pandas

For now, I just turn each column into a list with df['Name'].to_list(), zip(list1, list2, ...) all the lists, iterate over them, and add the values to the table.
I would imagine this is far from an ideal solution. Is there anything better for filling the dearpygui table while using pandas?
I don't know much about your approach, but here is a generalized example of what I use:
import dearpygui.dearpygui as dpg
import pandas as pd

dataset = pd.read_csv(filename)  # Take your df from wherever
n = 10                           # Number of rows to show

with dpg.table(label='DatasetTable'):
    for i in range(dataset.shape[1]):                   # Generates the correct amount of columns
        dpg.add_table_column(label=dataset.columns[i])  # Adds the headers
    for i in range(n):                                  # Shows the first n rows
        with dpg.table_row():
            for j in range(dataset.shape[1]):
                dpg.add_text(f"{dataset.iloc[i, j]}")   # Displays the value of each row/column combination
I hope it can be useful to someone.
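For completeness, the snippet above only builds the table; dearpygui also needs its usual context/viewport boilerplate around it. A minimal runnable sketch (the file name, window label, viewport title and row count are placeholders, not from the answer):
import dearpygui.dearpygui as dpg
import pandas as pd

dataset = pd.read_csv('data.csv')  # hypothetical file name
n = min(10, dataset.shape[0])      # show at most the first 10 rows

dpg.create_context()
with dpg.window(label='Main'):
    with dpg.table(label='DatasetTable'):
        for i in range(dataset.shape[1]):
            dpg.add_table_column(label=dataset.columns[i])
        for i in range(n):
            with dpg.table_row():
                for j in range(dataset.shape[1]):
                    dpg.add_text(f"{dataset.iloc[i, j]}")

dpg.create_viewport(title='Table demo', width=800, height=600)
dpg.setup_dearpygui()
dpg.show_viewport()
dpg.start_dearpygui()
dpg.destroy_context()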

Creating multiple pandas dataframes as outputs of a function iterating over a list

I'm trying to use the function itis.hierarchy_full of the pytaxize package in order to retrieve information about a biological species from a specific Id.
The function takes only one value/Id and saves all the taxonomic information inside a pandas dataframe that I can edit later.
import pandas as pd
from pytaxize import itis
test1 = itis.hierarchy_full(180530, as_dataframe = True)
I have something like 800 species Ids, and I want to automate the process to obtain 800 different dataframes.
I have somehow created a test with a small list (be aware, I am a biologist, so the code is really basic and maybe inefficient):
species = [180530, 48739, 567823]
tx = {}
for e in species:
    tx[e] = pd.DataFrame(itis.hierarchy_full(e, as_dataframe=True))
Now if I input tx (I'm using a Jupyter Notebook), I obtain a dictionary of pandas dataframes (I think it is a nested dictionary). And if I input tx[180530] I obtain exactly one dataframe, equal to the ones I can create with the original function.
from pandas.testing import assert_frame_equal
assert_frame_equal(test_180530, sp_180530)
Now I can write something to save each result stored in the dictionary as a separate dataframe:
sp_180530 = tx[180530]
sp_48739 = tx[48739]
sp_567823 = tx[567823]
Is there a way to automate the process and save each dataframe to a sp_id variable? Or even better, is there a way to change the code where I create tx so that it outputs multiple dataframes directly?
Not exactly what you asked, but to elaborate a bit more on working with the dataframes in the dictionary: loop over the dict and use every contained dataframe one by one.
for key in tx.keys():
    df_temp = tx[key]
    # < do all your stuff to df_temp .....>
    # Save the dataframe as you want/need (I assume csv here)
    df_temp.to_csv(f'sp_{key}.csv')
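As a side note, the whole dictionary can be built in one pass with a dict comprehension, and pandas can also stack everything into a single dataframe keyed by Id. A sketch under the same assumptions as the question (itis.hierarchy_full called once per Id):
import pandas as pd
from pytaxize import itis

species = [180530, 48739, 567823]

# one dataframe per Id, keyed by the Id itself
tx = {sp: itis.hierarchy_full(sp, as_dataframe=True) for sp in species}

# optional: one big dataframe with the Id as an extra index level
all_species = pd.concat(tx, names=['id'])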

Create a column where values are max of range of another column in python

My Problem
I am trying to create a column in Python where each value is equal to the max of the last 64 rows of another column, i.e. to find the rolling 64-day high of a stock.
I am currently using the following code, but it is really slow because of the loops. I want to try to redo it without loops. The dataset is simply the last closing price of a stock.
Current Working Code
import numpy as np
import pandas as pd
csv1 = pd.read_csv('vod_price.csv', delimiter = ',')
df = pd.DataFrame(csv1)
for x in range(1, 65):
    df["3m high"].iloc[x] = df["PX_LAST"].iloc[:(x + 1)].max()
for x in range(65, len(df.index)):
    df["3m high"].iloc[x] = df["PX_LAST"].iloc[(x - 64):(x + 1)].max()
df
Attempt at Solution
I have tried the following, but it just gives me the max of the whole column.
maxrange = df['PX_LAST'].between(df['PX_LAST'].shift(64),df['PX_LAST'])
df['3m high'] = df['PX_LAST'].loc[maxrange].max()
Does anyone know how I might be able to do it?
Cheers
Use Series.rolling:
df["3m high"] = df["PX_LAST"].rolling(64).max()

beginner pandas: change row data based upon coded logic

I'm a beginner in pandas and Python, trying to learn them.
I would like to iterate over the rows of a pandas dataframe to apply simple coded logic.
Instead of fancy mapping functions, I just want simple coded logic, so that I can easily adapt it later for other coded rules as well.
In my dataframe dc, I'd like to check whether column AgeUnknown == 1 (or > 0).
If so, it should move the value of column Age to AgeUnknown, and then set Age to 0.0.
I tried various combinations of the code below, but it won't work.
# using a row reference #########
for index, row in dc.iterrows():
    r = row['AgeUnknown']
    if r > 0:
        w = dc.at[index, 'Age']
        dc.at[index, 'AgeUnknown'] = w
        dc.at[index, 'Age'] = 0
Another attempt:
for index in dc.index:
    r = dc.at[index, 'AgeUnknown'].[0]  # also tried .sum here
    if r > 0:
        w = dc.at[index, 'Age']
        dc.at[index, 'AgeUnknown'] = w
        dc.at[index, 'Age'] = 0
I also tried:
if (dc[index, 'Age'] > 0  # wasn't allowed either
Why isn't this working? As far as I understood, a dataframe should be addressable like the above.
I realize you requested a solution that iterates over the df, but I thought I'd provide one that I think is more traditional.
A non-iterating solution to your problem looks like this: 1) get all the indexes that meet your criteria, 2) set those indexes of the df to what you want.
# indexes where column AgeUnknown is >0
inds = dc[dc['AgeUnknown'] > 0].index.tolist()
# change AgeUnknown at those indexes to the Age column
dc.loc[inds, 'AgeUnknown'] = dc.loc[inds, 'Age']
# change the Age to 0 at those indexes
dc.loc[inds, 'Age'] = 0
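The intermediate tolist() is not strictly needed; a boolean mask does the same job. A small sketch with the same column names as above:
# rows where AgeUnknown is > 0
mask = dc['AgeUnknown'] > 0
# move Age into AgeUnknown there, then zero out Age
dc.loc[mask, 'AgeUnknown'] = dc.loc[mask, 'Age']
dc.loc[mask, 'Age'] = 0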

How can I implement functions like mean, median and variance if I have a dictionary with 2 keys in Python?

I have many files in a folder that look like this one:
[screenshot of a sample file]
I'm trying to build a dictionary for the data with 2 keys (the first one is the http address and the second is the third field, the plugin used, like adblock). The values refer to different metrics, so my intention is to compute, for each site and plugin, the mean, median and variance of each metric once the dictionary has been built. For example, for the mean, my intention is to consider all the 4th-field values in the file, etc. I tried to write this code but, first of all, I'm not sure that it is correct:
[screenshot of attempted code]
I read other posts, but none solved my problem, since they treat only one key or they don't show how to access the different values inside the dictionary to compute the mean, median and variance.
The problem is simple, assuming that the dictionary implementation is ok: in which way must I access the different values for key1: www.google.it -> key2: adblock?
Any kind of help is accepted, and I'm available for any other answer.
You can do what you want using a dictionary, but you should really consider using the Pandas library. This library is centered around a tabular data structure called "DataFrame" that excels in column-wise and row-wise calculations such as the ones you seem to need.
To get you started, here is the Pandas code that reads one text file using the read_fwf() method. It also displays the mean and variance for the fourth column:
# import the Pandas library:
import pandas as pd
# Read the file 'table.txt' into a DataFrame object. Assume
# a header-less, fixed-width file like in your example:
df = pd.read_fwf("table.txt", header=None)
# Show the content of the DataFrame object:
print(df)
# Print the fourth column (zero-indexed):
print(df[3])
# Print the mean for the fourth column:
print(df[3].mean())
# Print the variance for the fourth column:
print(df[3].var())
There are different ways of selecting columns and rows from a DataFrame object. The square brackets [ ] in the previous examples selected a column in the data frame by column number. If you want to calculate the mean of the fourth column only from those rows that contain adblock in the third column, you can do it like so:
# Print those rows from the data frame that have the value 'adblock'
# in the third column (zero-indexed):
print(df[df[2] == "adblock"])
# Print only the fourth column (zero-indexed) from that data frame:
print(df[df[2] == "adblock"][3])
# Print the mean of the fourth column from that data frame:
print(df[df[2] == "adblock"][3].mean())
EDIT:
You can also calculate the mean or variance for more than one column at the same time:
# Use a list of column numbers to calculate the mean for all of them
# at the same time:
l = [3, 4, 5]
print(df[l].mean())
END EDIT
If you want to read the data from several files and do the calculations on the concatenated data, you can use the concat() function. It takes a list of DataFrame objects and concatenates them (by default, row-wise). Use the following lines to create a DataFrame from all *.txt files in your directory (note the extra glob import):
import glob

df = pd.concat([pd.read_fwf(file, header=None) for file in glob.glob("*.txt")],
               ignore_index=True)
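To tie this back to the two-key question, groupby can compute the statistics per (site, plugin) pair directly. A hedged sketch, assuming column 0 holds the http address, column 2 the plugin and column 3 a metric (all zero-indexed, as above):
# mean, median and variance of the fourth column for every site/plugin pair
stats = df.groupby([0, 2])[3].agg(['mean', 'median', 'var'])
print(stats)
# look up one pair, e.g. the example from the question:
print(stats.loc[('www.google.it', 'adblock')])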
