How to apply a calculation to a column of a text file? - python

I'm trying to apply a calculation to every value of every column in my CSV file and replace the old values with the new calculated values.
# temp_list is a list of lists, e.g. [['1.3','2.2','1.6'],['1.2','4.5','2.3']]
for row in temp_list:
    minimum = min(row)  # find minimum value of the values in column 2
    y = every value in the 2nd column - minimum
    # for every value in the 2nd column, apply the y calculation to it and replace the original values with these values
    row[1] = float(row[1])
I understand that if I did
row[1] = float(row[1]) * 3
for example, each value in column 2 (index 1) would be multiplied by 3. How would I do that for my y calculation written above?

You can use zip to transpose the list of lists, convert the result to a list, and then index with [1] to get the values of the second row of the transposed data (originally the second column). Passing float as the key function to min then finds the minimum based on the floating-point value of each string:
min(list(zip(*temp_list))[1], key=float)
This returns: '2.2' (the minimum of the second column, still as a string).
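If the goal is to subtract that second-column minimum from every value in the second column (and keep the values as strings, which is an assumption about the desired output format), one possible sketch combines this with the Decimal approach from the next answer:

from decimal import Decimal

temp_list = [['1.3', '2.2', '1.6'], ['1.2', '4.5', '2.3']]

# Column-2 minimum, found via the transpose trick above.
minimum = Decimal(min(list(zip(*temp_list))[1], key=float))

for row in temp_list:
    row[1] = str(Decimal(row[1]) - minimum)

print(temp_list)
# [['1.3', '0.0', '1.6'], ['1.2', '2.3', '2.3']]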

Based on your comment, I think this is what you wanted.
Since your lists contain strings, there's a bit of casting back and forth between Decimal and string:
from decimal import Decimal

temp_list = [['1.3','2.2','1.6'],['1.2','4.5','2.3']]
for x in temp_list:
    x[1] = str(Decimal(x[1]) - min(Decimal(y) for y in x))

print(temp_list)
# [['1.3', '0.9', '1.6'], ['1.2', '3.3', '2.3']]

Related

Pandas - if X float in column is greater than Y, find difference between X and Y and multiply by .25

I suspect the solution is quite simple, but I have been unable to figure it out. Essentially, I want to query a column with the float dtype to see whether each value is >= 100.00. If it is, I want to take the value x and compute ((x - 100) * .25) + 100 as the new value (replacing the original values in place, preferably).
The data looks something like:
Some columns here    A percentage stored as float
foobar               84.85
foobar               15.95
fuubahr              102.25
The result of the operation described above would be:
Some columns here    A percentage stored as float
foobar               84.85
foobar               15.95
fuubahr              100.5625
Thanks!
A list comprehension is an easy solution for this:
dataframe["A percentage stored as float"] = [((x - 100)*.25) + 100 if x >= 100 else x for x in dataframe["A percentage stored as float"]]
What it does: it loops through each row of the column, checks whether the value satisfies the if condition, and applies the calculation if it does; if the condition is not met, it keeps the original value.
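To illustrate with the sample data from the question (the dataframe construction here is just for demonstration):

import pandas as pd

dataframe = pd.DataFrame({
    "Some columns here": ["foobar", "foobar", "fuubahr"],
    "A percentage stored as float": [84.85, 15.95, 102.25],
})

dataframe["A percentage stored as float"] = [
    ((x - 100) * .25) + 100 if x >= 100 else x
    for x in dataframe["A percentage stored as float"]
]

print(dataframe["A percentage stored as float"].tolist())
# [84.85, 15.95, 100.5625]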

Compare and store elements of multidimensional array to two new arrays

Assume I have the following simple array:
my_array = np.array([[1,2],[2,4],[3,6],[2,1]])
which corresponds to another parent array:
parent_array = np.array([0,1,2,3])
Of course, there is a function that maps parent_array to my_array, but it is not important what that function is.
Goal:
I want to use my_array to create two new arrays, A and B, by iterating over each row of my_array: for row i, if the value in the first column of my_array[i] is greater than the value in the second column, I will store parent_array[i] in A; otherwise (if the value in the second column of my_array[i] is bigger), I will store parent_array[i] in B.
So for the case above the result would be:
A = [3]
because only in the 4th row of my_array is the value in the first column greater, and
B = [0,1,2]
because in the first three rows the value in the second column is greater.
Now, although I know how to save the greater element of each row to a new array, the fact that each row of my_array is associated with an element of parent_array is confusing me; I don't know how to correlate them.
Summary:
I therefore need to associate each row of my_array with the corresponding element of parent_array and then check my_array row by row: if the value in the first column of my_array[i] is greater, save parent_array[i] in A; if the value in the second column is greater, save parent_array[i] in B.
Use boolean array indexing for this: create a boolean condition array by comparing the values in the 1st and 2nd columns of my_array, then use it to select values from parent_array:
cond = my_array[:,0] > my_array[:,1]
A, B = parent_array[cond], parent_array[~cond]
A
# [3]
B
# [0 1 2]
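Putting the question's arrays together with the answer, a complete runnable sketch:

import numpy as np

my_array = np.array([[1, 2], [2, 4], [3, 6], [2, 1]])
parent_array = np.array([0, 1, 2, 3])

# True where the first column is greater than the second.
cond = my_array[:, 0] > my_array[:, 1]

A, B = parent_array[cond], parent_array[~cond]
print(A)  # [3]
print(B)  # [0 1 2]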

Modifying Dataframes Stored in a List of Dataframes

I am segmenting data into a set of Pandas dataframes that have identical structure. Each dataframe has the uniquely named columns listed in cnames and a total of nrows rows, identified by an integer index running from 0 to nrows-1. There are nframes segments in total, each containing 3 dataframes.
The goal is, within each segment, to calculate the quotient of two of the dataframes and store the result in the third. I've implemented and tested a process that works, but I have a question as to why a slight variation of it doesn't.
The steps (and variation) are as follows:
1. Initialize data frames:
Ldf_num = [pd.DataFrame(0.0, index=range(0, nrows), columns=cnames) for x in range(0, nframes)]
Ldf_den = [pd.DataFrame(0.0, index=range(0, nrows), columns=cnames) for x in range(0, nframes)]
Ldf_quo = [pd.DataFrame(0.0, index=range(0, nrows), columns=cnames) for x in range(0, nframes)]
2. Populate data frames:
#For loop over a set of data-records stored as a list of lists:
#Determine x, the index of the data frame related to this record, from the data
df_num = Ldf_num[x]
df_den = Ldf_den[x]
#Derive values (including row) for each column of the data frame, and store them as...
df_num[cname][row] += derived_value1
df_den[cname][row] += derived_value2
3. Determine quotient for each set of dataframes:
for x in range(0, nframes):
    df_num = Ldf_num[x]
    df_den = Ldf_den[x]
    Ldf_quo[x] = df_num.div(df_den)
The above version of step 3 worked, i.e. I can print each dataframe in the quotient list and see that they hold different values that match the numerator and denominator values.
3b. However, the version below did not work:
for x in range(0, nframes):
    df_num = Ldf_num[x]
    df_den = Ldf_den[x]
    df_quo = Ldf_quo[x]
    df_quo = df_num.div(df_den)
...as all entries in all dataframes in the list Ldf_quo contained their initial value of 0.
Can anyone explain why, when I assign a variable to a single dataframe stored in a list of dataframes and change values through that variable, the original dataframe in the list changes (as in step 2), but when I assign the output of the div method to a variable bound to a dataframe from the list (as in step 3b), the original dataframe does not change? (I can get the desired result by assigning the output of div directly to the right slot in the list of dataframes, as in step 3.)
In 3b you only rebind the local name: df_quo = Ldf_quo[x] makes df_quo refer to the dataframe stored in the list, but the following df_quo = df_num.div(df_den) simply points the name at a brand-new dataframe and leaves Ldf_quo[x] untouched. In step 2 you never rebind the name; you mutate the dataframe it refers to, so the object inside the list changes. In step 3, Ldf_quo[x] = df_num.div(df_den) assigns the new dataframe into the xth slot of the list, which is why that version works.
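A minimal sketch of the difference, using plain Python objects in place of the dataframes:

# Rebinding a name does not touch the list element it came from (step 3b).
items = [0, 0, 0]
x = items[1]
x = 42            # rebinds the name x only
print(items)      # [0, 0, 0]

# Assigning into the list slot does change the list (step 3).
items[1] = 42
print(items)      # [0, 42, 0]

# Mutating a shared object (step 2) is visible through the list,
# because both names refer to the same object.
frames = [{"a": 0.0}]
d = frames[0]
d["a"] += 1.0
print(frames)     # [{'a': 1.0}]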

Pandas: How to find index of value of one column whose cell value matches certain value

My dataframe consists of the following table:
Time X Y
0100 5 9
0200 7 10
0300 11 12
0400 3 13
0500 4 14
My goal is to find the index of the value of Y which corresponds to a certain number (e.g.: 9) and return the corresponding X value from the table.
My previous idea was to use a for loop (as I have a number of Ys) to find all the matching values and append the corresponding X values to an empty array, like so:
for i in (list of Ys):
    empty_storing_array.append(df[index_of_X].loc[df[Y] == i])
The problem is (if my newbie understanding of Pandas holds true) that what loc gives is not a plain number, but something else. How should I do it so that empty_storing_array ends up holding the X values that correspond to the values in my list of Ys?
You can use df.loc and then ask for the index explicitly. This returns an array, so we take the first item to get the integer:
df.loc[df['Y']==9, 'X'].index.values[0]
Try this:
list_Ys = [9,8,15] #example
new_df = df[df['Y'].isin(list_Ys)]['X']
The isin method tells whether each element is contained in the given list of values.
If you want to convert the resulting Series to an array:
new_df.values
If you need to have a way to retrieve which Y corresponds to a given X then keep both X and Y:
df.loc[df['Y'].isin(list_of_ys), ['Y', 'X']].values
Perhaps create a dictionary that puts all the Xs corresponding to each Y in a tuple, with the Ys as keys.
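A possible sketch of that last suggestion, using groupby on the example dataframe from the question (the variable names are placeholders):

import pandas as pd

df = pd.DataFrame({
    "Time": ["0100", "0200", "0300", "0400", "0500"],
    "X": [5, 7, 11, 3, 4],
    "Y": [9, 10, 12, 13, 14],
})

# Map each Y to a tuple of all X values that share it.
x_by_y = df.groupby("Y")["X"].apply(tuple).to_dict()

print(x_by_y[9])  # (5,)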

Sorting pandas dataframe: numbers first, then strings

I have a dataframe with a column containing values like P123Y8O9, a mixture of digits and letters. If I apply the sort function to this series, it sorts the strings character by character (first character, then second, and so on). What I want is to sort so that purely numeric strings like 32456789 come first, followed by mixed strings like 2AJ6JH67.
You can see that in the example above, 2 (the first character of 2AJ6JH67) comes before 3 (the first character of 32456789) numerically, but the sorting should put 32456789 first and then 2AJ6JH67.
How can I sort dataframes this way?
One way is to sort numeric and non-numeric data separately.
Below are equivalent examples for a list or pd.Series.
lst = ['P123Y8O9', '32456789']

lst_sorted = list(map(str, sorted(int(x) for x in lst if x.isdigit()))) + \
             sorted(x for x in lst if not x.isdigit())
# ['32456789', 'P123Y8O9']

s = pd.Series(lst)

s_sorted = pd.Series(list(map(str, sorted(int(x) for x in s if x.isdigit()))) + \
                     sorted(x for x in s if not x.isdigit()))
# 0    32456789
# 1    P123Y8O9
# dtype: object
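An alternative sketch (not from the original answer) uses a single key function so the data only has to be sorted once: purely numeric strings sort first by numeric value, mixed strings after, lexicographically:

import pandas as pd

def sort_key(value):
    # Numeric strings get a 0 prefix (sorted by value); mixed strings get a 1 prefix.
    return (0, int(value), "") if value.isdigit() else (1, 0, value)

lst = ['P123Y8O9', '32456789', '2AJ6JH67']
print(sorted(lst, key=sort_key))
# ['32456789', '2AJ6JH67', 'P123Y8O9']

s_sorted = pd.Series(sorted(pd.Series(lst), key=sort_key))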
