How to plot a boxplot grouped by the column names in pandas?

my dataframe:
Q JJ R S R' S' P T JJ Q ... P T JJ Q \
0 -0.2 0.0 6.1 -1.0 0.0 0 0.6 2.1 0.0 0.0 ... 0.9 3.9 -0.3 0.0
1 -0.6 0.0 7.2 0.0 0.0 0 0.4 1.5 0.0 0.0 ... 0.4 2.6 -0.5 0.0
2 1.0 0.0 4.5 -2.8 0.0 0 0.3 2.5 0.8 -0.4 ... 0.4 3.4 0.9 0.0
3 0.9 0.0 7.8 -0.7 0.0 0 1.1 1.9 0.1 0.0 ... 0.6 3.0 0.1 0.0
4 0.0 0.0 5.2 -1.4 0.0 0 0.9 2.3 0.1 0.0 ... -0.2 2.9 -0.4 0.0
R S R' S' P T
0 9.0 -0.9 0.0 0 0.9 2.9
1 8.5 0.0 0.0 0 0.2 2.1
2 9.5 -2.4 0.0 0 0.3 3.4
3 12.2 -2.2 0.0 0 0.4 2.6
4 13.1 -3.6 0.0 0 -0.1 3.9
I'm trying to plot a boxplot grouped by the column names (there are 8 groups so I would expect 8 boxplots).
I used:
bp = df_net_wave_amplitude_for_std.plot.box(figsize=(20,8))
and
bp = df_net_wave_amplitude_for_std.boxplot(figsize=(20,8))
but I'm getting all of the columns on the x-axis instead of having them grouped by name.

I figured it out:
First I should have stacked the data:
df_net_wave_amplitude_for_std = df_net_wave_amplitude_for_std.stack()
Then I tidied the result: reset the index, dropped the redundant 'level_0' column, and renamed the columns:
df_net_wave_amplitude_for_std = df_net_wave_amplitude_for_std.reset_index()
del df_net_wave_amplitude_for_std['level_0']
df_net_wave_amplitude_for_std.columns = ['Wave', 'data']
and finally I could use the boxplot function:
bp = df_net_wave_amplitude_for_std.boxplot('data', by='Wave', figsize=(20,8))
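Putting it all together, here is a minimal runnable sketch of the same flow, with a small made-up frame standing in for df_net_wave_amplitude_for_std (the original data isn't shown, so the repeated column names here are hypothetical):
import numpy as np
import pandas as pd

# Hypothetical stand-in: a frame with repeated column names, as in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(5, 4)), columns=['Q', 'R', 'Q', 'R'])

stacked = df.stack().reset_index()   # columns: level_0 (old row), level_1 (wave), 0 (value)
del stacked['level_0']               # the old row index is redundant here
stacked.columns = ['Wave', 'data']
bp = stacked.boxplot('data', by='Wave', figsize=(20, 8))  # one box per column name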

Related

Concat/join/merge multiple dataframes based on the row index (number) of each individual dataframe

I want to read every nth row of a list of DataFrames and create a new DataFrame by appending all of those nth rows.
Let's say we have the following DataFrames:
>>> df1
A B C D
0 -0.8 -2.8 -0.3 -0.1
1 -0.1 -0.9 0.2 -0.7
2 0.7 -3.3 -1.1 -0.4
>>> df2
A B C D
0 1.4 -0.7 1.5 -1.3
1 1.6 1.4 1.4 0.2
2 -1.4 0.2 -1.7 0.7
>>> df3
A B C D
0 0.3 -0.5 -1.6 -0.8
1 0.2 -0.5 -1.1 1.6
2 -0.3 0.7 -1.0 1.0
I have used the following approach to get the desired df:
df = pd.DataFrame()
df_list = [df1, df2, df3]
for i in range(len(df1)):
    for x in df_list:
        df = df.append(x.loc[i], ignore_index=True)
Here's the result:
>>> df
A B C D
0 -0.8 -2.8 -0.3 -0.1
1 1.4 -0.7 1.5 -1.3
2 0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9 0.2 -0.7
4 1.6 1.4 1.4 0.2
5 0.2 -0.5 -1.1 1.6
6 0.7 -3.3 -1.1 -0.4
7 -1.4 0.2 -1.7 0.7
8 -0.3 0.7 -1.0 1.0
I was just wondering if there is a pandas way of rewriting this code which would do the same thing (maybe by using .iterrows, pd.concat, pd.join, or pd.merge)?
Cheers
Update
Simply appending one df after another is not what I am looking for here.
The code should do:
df.row1 = df1.row1
df.row2 = df2.row1
df.row3 = df3.row1
df.row4 = df1.row2
df.row5 = df2.row2
df.row6 = df3.row2
...
For a single output dataframe, you can concatenate and sort by index. Each input frame carries the same 0..n-1 row labels, so sorting on them brings all the first rows together, then all the second rows, and so on:
res = pd.concat([df1, df2, df3]).sort_index().reset_index(drop=True)
A B C D
0 -0.8 -2.8 -0.3 -0.1
1 1.4 -0.7 1.5 -1.3
2 0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9 0.2 -0.7
4 1.6 1.4 1.4 0.2
5 0.2 -0.5 -1.1 1.6
6 0.7 -3.3 -1.1 -0.4
7 -1.4 0.2 -1.7 0.7
8 -0.3 0.7 -1.0 1.0
For a dictionary of dataframes, you can concatenate and then group by index:
res = dict(tuple(pd.concat([df1, df2, df3]).groupby(level=0)))
With the dictionary defined as above, each key is a row number and each value collects that row from every input dataframe.
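For example, with the sample frames above, res[0] stacks the first row of df1, df2 and df3, each still carrying its original index 0:
>>> res[0]
     A    B    C    D
0 -0.8 -2.8 -0.3 -0.1
0  1.4 -0.7  1.5 -1.3
0  0.3 -0.5 -1.6 -0.8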
There is pd.concat:
df=pd.concat([df1,df2,df3]).reset_index(drop=True)
or, as recommended by Jez:
df=pd.concat([df1,df2,df3],ignore_index=True)
Try:
>>> df1 = pd.DataFrame({'A': [-0.8, -0.1, 0.7],
...                     'B': [-2.8, -0.9, -3.3],
...                     'C': [-0.3, 0.2, -1.1],
...                     'D': [-0.1, -0.7, -0.4]})
>>>
>>> df2 = pd.DataFrame({'A': [1.4, 1.6, -1.4],
...                     'B': [-0.7, 1.4, 0.2],
...                     'C': [1.5, 1.4, -1.7],
...                     'D': [-1.3, 0.2, 0.7]})
>>>
>>> df3 = pd.DataFrame({'A': [0.3, 0.2, -0.3],
...                     'B': [-0.5, -0.5, 0.7],
...                     'C': [-1.6, -1.1, -1.0],
...                     'D': [-0.8, 1.6, 1.0]})
>>> df=pd.concat([df1,df2,df3],ignore_index=True)
>>> print(df)
A B C D
0 -0.8 -2.8 -0.3 -0.1
1 -0.1 -0.9 0.2 -0.7
2 0.7 -3.3 -1.1 -0.4
3 1.4 -0.7 1.5 -1.3
4 1.6 1.4 1.4 0.2
5 -1.4 0.2 -1.7 0.7
6 0.3 -0.5 -1.6 -0.8
7 0.2 -0.5 -1.1 1.6
8 -0.3 0.7 -1.0 1.0
OR
df=pd.concat([df1,df2,df3], axis=0, join='outer', ignore_index=True)
Note:
axis: whether to concatenate along rows (0) or columns (1).
join: either 'inner' or 'outer' (the default); with 'outer', columns that don't align are kept, and sorted lexicographically.
ignore_index: whether the original row labels are retained (False by default); if True, the labels are discarded and a new 0..n-1 index is assigned.
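To see what join changes, here is a tiny hypothetical pair of frames whose columns only partly overlap (not from the question):
>>> x = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> y = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})
>>> pd.concat([x, y], join='outer', ignore_index=True)  # keeps A, B, C; gaps become NaN
>>> pd.concat([x, y], join='inner', ignore_index=True)  # keeps only the shared column B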
You can concatenate them keeping their original indexes as a column this way:
df_total = pd.concat([df1.reset_index(), df2.reset_index(),
                      df3.reset_index()])
>> df_total
index A B C D
0 0 -0.8 -2.8 -0.3 -0.1
1 1 -0.1 -0.9 0.2 -0.7
2 2 0.7 -3.3 -1.1 -0.4
0 0 1.4 -0.7 1.5 -1.3
1 1 1.6 1.4 1.4 0.2
2 2 -1.4 0.2 -1.7 0.7
0 0 0.3 -0.5 -1.6 -0.8
1 1 0.2 -0.5 -1.1 1.6
2 2 -0.3 0.7 -1.0 1.0
Then you can make a multiindex dataframe and order by index:
df_joined = df_total.reset_index(drop=True).reset_index()
>> df_joined
level_0 index A B C D
0 0 0 -0.8 -2.8 -0.3 -0.1
1 1 1 -0.1 -0.9 0.2 -0.7
2 2 2 0.7 -3.3 -1.1 -0.4
3 3 0 1.4 -0.7 1.5 -1.3
4 4 1 1.6 1.4 1.4 0.2
5 5 2 -1.4 0.2 -1.7 0.7
6 6 0 0.3 -0.5 -1.6 -0.8
7 7 1 0.2 -0.5 -1.1 1.6
8 8 2 -0.3 0.7 -1.0 1.0
>> df_joined = df_joined.set_index(['index', 'level_0']).sort_index()
>> df_joined
A B C D
index level_0
0 0 -0.8 -2.8 -0.3 -0.1
3 1.4 -0.7 1.5 -1.3
6 0.3 -0.5 -1.6 -0.8
1 1 -0.1 -0.9 0.2 -0.7
4 1.6 1.4 1.4 0.2
7 0.2 -0.5 -1.1 1.6
2 2 0.7 -3.3 -1.1 -0.4
5 -1.4 0.2 -1.7 0.7
8 -0.3 0.7 -1.0 1.0
You can put all this back into a plain dataframe just by doing:
>> pd.DataFrame(df_joined.values, columns = df_joined.columns)
A B C D
0 -0.8 -2.8 -0.3 -0.1
1 1.4 -0.7 1.5 -1.3
2 0.3 -0.5 -1.6 -0.8
3 -0.1 -0.9 0.2 -0.7
4 1.6 1.4 1.4 0.2
5 0.2 -0.5 -1.1 1.6
6 0.7 -3.3 -1.1 -0.4
7 -1.4 0.2 -1.7 0.7
8 -0.3 0.7 -1.0 1.0

Subtracting columns based on key column in pandas dataframe

I have two dataframes looking like
df1:
ID A B C D
0 'ID1' 0.5 2.1 3.5 6.6
1 'ID2' 1.2 5.5 4.3 2.2
2 'ID1' 0.7 1.2 5.6 6.0
3 'ID3' 1.1 7.2 10. 3.2
df2:
ID A B C D
0 'ID1' 1.0 2.0 3.3 4.4
1 'ID2' 1.5 5.0 4.0 2.2
2 'ID3' 0.6 1.2 5.9 6.2
3 'ID4' 1.1 7.2 8.5 3.0
df1 can have multiple entries with the same ID, whereas each ID occurs only once in df2. Also, not all IDs in df2 are necessarily present in df1. I can't solve this with a plain set_index(), since multiple rows in df1 can have the same ID and the IDs in df1 and df2 are not aligned.
I want to create a new dataframe where I subtract the values in df2[['A','B','C','D']] from df1[['A','B','C','D']] based on matching the ID.
The resulting dataframe would look like:
df_new:
ID A B C D
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
I know how to do this with a loop, but since I'm dealing with huge data quantities this is not practical at all. What is the best way of approaching this with Pandas?
You just need set_index and subtract
(df1.set_index('ID')-df2.set_index('ID')).dropna(axis=0)
Out[174]:
A B C D
ID
'ID1' -0.5 0.1 0.2 2.2
'ID1' -0.3 -0.8 2.3 1.6
'ID2' -0.3 0.5 0.3 0.0
'ID3' 0.5 6.0 4.1 -3.0
If the order matters, add reindex for df2:
(df1.set_index('ID')-df2.set_index('ID').reindex(df1.ID)).dropna(axis=0).reset_index()
Out[211]:
ID A B C D
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
Similarly to what Wen (who beat me to it) proposed, you can use pd.DataFrame.subtract:
df1.set_index('ID').subtract(df2.set_index('ID')).dropna()
A B C D
ID
'ID1' -0.5 0.1 0.2 2.2
'ID1' -0.3 -0.8 2.3 1.6
'ID2' -0.3 0.5 0.3 0.0
'ID3' 0.5 6.0 4.1 -3.0
One method is to use numpy. We can extract the ordered indices required from df2 using numpy.searchsorted.
Then feed this into the construction of a new dataframe.
idx = np.searchsorted(df2['ID'], df1['ID'])
res = pd.DataFrame(df1.iloc[:, 1:].values - df2.iloc[:, 1:].values[idx],
                   index=df1['ID']).reset_index()
print(res)
ID 0 1 2 3
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
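Two caveats: np.searchsorted assumes df2['ID'] is already sorted (it is in this example), and the construction above loses the original column labels, hence the 0 1 2 3 header. A small variation of the same idea that keeps the labels:
idx = np.searchsorted(df2['ID'], df1['ID'])  # requires df2['ID'] to be sorted
res = pd.DataFrame(df1.iloc[:, 1:].values - df2.iloc[:, 1:].values[idx],
                   columns=df1.columns[1:], index=df1['ID']).reset_index()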

Taking each element in a column to calculate and create a new column using python

I have a dataset that looks like the following:
ID val
1 3.1
2 2.7
3 6.3
4 1.3
And I want to calculate the similarity (here, the difference) of val between each row and every other row, in order to obtain a matrix like the following:
ID val c_1 c_2 c_3 c_4
1 3.1 0.0 0.4 -3.2 1.8
2 2.7 -0.4 0.0 -3.6 1.4
3 6.3 3.2 3.6 0.0 5.0
4 1.3 -1.8 -1.4 -5.0 0.0
I have got the following code:
def similarities(data):
    j = 0
    k = 0
    for i in data:
        data[j, k+2] = data[j+1] - data[j]
        j = j + 1
        k = k + 1
    return None
This evidently doesn't work at the moment, but is iterating through the dataset with indexes even the right approach?
I think you need np.subtract.outer: create a new DataFrame and join it to the original:
df1=pd.DataFrame(np.subtract.outer(df['val'], df['val']), columns=df['ID']).add_prefix('c_')
df = df.join(df1)
print (df)
ID val c_1 c_2 c_3 c_4
0 1 3.1 0.0 0.4 -3.2 1.8
1 2 2.7 -0.4 0.0 -3.6 1.4
2 3 6.3 3.2 3.6 0.0 5.0
3 4 1.3 -1.8 -1.4 -5.0 0.0
Another solution with broadcasting:
val = df.val.values
ids = df.ID.values
df1 = pd.DataFrame(val[:, None] - val, columns = ids).add_prefix('c_')
df = df.join(df1)
print (df)
ID val c_1 c_2 c_3 c_4
0 1 3.1 0.0 0.4 -3.2 1.8
1 2 2.7 -0.4 0.0 -3.6 1.4
2 3 6.3 3.2 3.6 0.0 5.0
3 4 1.3 -1.8 -1.4 -5.0 0.0
You can try this:
s = """
ID val
1 3.1
2 2.7
3 6.3
4 1.3
"""
data = [line.split() for line in filter(None, s.split('\n'))]
rows = [float(v) for v in list(zip(*data[1:]))[-1]]
final_data = [[i, b] + [round(b - c, 2) for c in rows]
              for i, b in enumerate(rows, start=1)]
print('ID val {}'.format(' '.join('c_{}'.format(i) for i in range(1, len(rows) + 1))))
for row in final_data:
    print(' '.join(map(str, row)))
Output:
ID val c_1 c_2 c_3 c_4
1 3.1 0.0 0.4 -3.2 1.8
2 2.7 -0.4 0.0 -3.6 1.4
3 6.3 3.2 3.6 0.0 5.0
4 1.3 -1.8 -1.4 -5.0 0.0

Pandas join/merge/concat two DataFrames and combine rows of identical key/index [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I am attempting to combine two sets of data, but I can't figure out which method is most suitable (join, merge, concat, etc.) for this application, and the documentation doesn't have any examples that do what I need to do.
I have two sets of data, structured like so:
>>> A
Time Voltage
1.0 5.1
2.0 5.5
3.0 5.3
4.0 5.4
5.0 5.0
>>> B
Time Current
-1.0 0.5
0.0 0.6
1.0 0.3
2.0 0.4
3.0 0.7
I would like to combine the data columns and merge the 'Time' column together so that I get the following:
>>> AB
Time Voltage Current
-1.0 0.5
0.0 0.6
1.0 5.1 0.3
2.0 5.5 0.4
3.0 5.3 0.7
4.0 5.4
5.0 5.0
I've tried AB = merge_ordered(A, B, on='Time', how='outer'), and while it successfully combined the data, it output something akin to:
>>> AB
Time Voltage Current
-1.0 0.5
0.0 0.6
1.0 5.1
1.0 0.3
2.0 5.5
2.0 0.4
3.0 5.3
3.0 0.7
4.0 5.4
5.0 5.0
You'll note that it did not combine rows with shared 'Time' values.
I have also tried merging a la AB = A.merge(B, on='Time', how='outer'), but that outputs something combined, but not sorted, like so:
>>> AB
Time Voltage Current
-1.0 0.5
0.0 0.6
1.0 5.1
2.0 5.5
3.0 5.3 0.7
4.0 5.4
5.0 5.0
1.0 0.3
2.0 0.4
...it essentially skips some of the data in 'Current' and appends it to the bottom, but it does so inconsistently. And again, it does not merge the rows together.
I have also tried AB = pandas.concat([A, B], axis=1), but the result does not get merged. I simply get, well, the concatenation of the two DataFrames, like so:
>>> AB
Time Voltage Time Current
1.0 5.1 -1.0 0.5
2.0 5.5 0.0 0.6
3.0 5.3 1.0 0.3
4.0 5.4 2.0 0.4
5.0 5.0 3.0 0.7
I've been scouring the documentation and here to try to figure out the exact differences between merge and join, but from what I gather they're pretty similar. Still, I haven't found anything that specifically answers the question of "how to merge rows that share an identical key/index". Can anyone enlighten me on how to do this? I only have a few days' worth of experience with Pandas!
merge
merge combines on columns. By default it takes all commonly named columns. Otherwise, you can specify which columns to combine on. In this example, I chose Time.
A.merge(B, 'outer', 'Time')
Time Voltage Current
0 1.0 5.1 0.3
1 2.0 5.5 0.4
2 3.0 5.3 0.7
3 4.0 5.4 NaN
4 5.0 5.0 NaN
5 -1.0 NaN 0.5
6 0.0 NaN 0.6
join
join combines on index values unless you specify the left-hand side's column instead. That is why I set the index on the right-hand side and specify the Time column for the left-hand side:
A.join(B.set_index('Time'), 'Time', 'outer')
Time Voltage Current
0 1.0 5.1 0.3
1 2.0 5.5 0.4
2 3.0 5.3 0.7
3 4.0 5.4 NaN
4 5.0 5.0 NaN
4 -1.0 NaN 0.5
4 0.0 NaN 0.6
pd.concat
concat combines on index values, so I use a list comprehension to iterate over each dataframe I want to combine ([A, B]) and set Time as its index. In the comprehension each dataframe takes the name d, hence the for d in [A, B]. axis=1 says to combine them side by side, thus using the index as the joining feature.
pd.concat([d.set_index('Time') for d in [A, B]], axis=1).reset_index()
Time Voltage Current
0 -1.0 NaN 0.5
1 0.0 NaN 0.6
2 1.0 5.1 0.3
3 2.0 5.5 0.4
4 3.0 5.3 0.7
5 4.0 5.4 NaN
6 5.0 5.0 NaN
combine_first
A.set_index('Time').combine_first(B.set_index('Time')).reset_index()
Time Current Voltage
0 -1.0 0.5 NaN
1 0.0 0.6 NaN
2 1.0 0.3 5.1
3 2.0 0.4 5.5
4 3.0 0.7 5.3
5 4.0 NaN 5.4
6 5.0 NaN 5.0
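(combine_first aligns both frames on their index and patches missing values in the caller with values from the other frame; note that the columns come back in sorted order, which is why Current precedes Voltage above.)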
It should work properly if the Time column is of the same dtype in both DFs:
In [192]: A.merge(B, how='outer').sort_values('Time')
Out[192]:
Time Voltage Current
5 -1.0 NaN 0.5
6 0.0 NaN 0.6
0 1.0 5.1 0.3
1 2.0 5.5 0.4
2 3.0 5.3 0.7
3 4.0 5.4 NaN
4 5.0 5.0 NaN
In [193]: A.dtypes
Out[193]:
Time float64
Voltage float64
dtype: object
In [194]: B.dtypes
Out[194]:
Time float64
Current float64
dtype: object
Reproducing your problem:
In [198]: A.merge(B.assign(Time=B.Time.astype(str)), how='outer').sort_values('Time')
Out[198]:
Time Voltage Current
5 -1.0 NaN 0.5
6 0.0 NaN 0.6
0 1.0 5.1 NaN
7 1.0 NaN 0.3
1 2.0 5.5 NaN
8 2.0 NaN 0.4
2 3.0 5.3 NaN
9 3.0 NaN 0.7
3 4.0 5.4 NaN
4 5.0 5.0 NaN
In [199]: B.assign(Time=B.Time.astype(str)).dtypes
Out[199]:
Time object # <------ NOTE
Current float64
dtype: object
Visually it's hard to distinguish:
In [200]: B.assign(Time=B.Time.astype(str))
Out[200]:
Time Current
0 -1.0 0.5
1 0.0 0.6
2 1.0 0.3
3 2.0 0.4
4 3.0 0.7
In [201]: B
Out[201]:
Time Current
0 -1.0 0.5
1 0.0 0.6
2 1.0 0.3
3 2.0 0.4
4 3.0 0.7
Solution found
As per the suggestions below, I had to round the numbers in the 'Time' column prior to merging them, despite the fact that they were both of the same dtype (float64). The suggestion was to round like so:
A = A.assign(Time=A.Time.round(4))
But in my actual situation the column was labeled 'Time, (sec)', and the punctuation broke the keyword assignment. So instead I used the following line to round it:
A['Time, (sec)'] = A['Time, (sec)'].round(4)
And it worked like a charm. Are there any issues with doing it like that?

How to build a lookup table for tri-linear interpolation in NumPy?

The following extract is from a 500-row table that I'm trying to build a NumPy lookup function for. My problem is that the values are non-linear.
The user enters a density, volume, and content, so the function will be:
def capacity_lookup(density, volume, content):
For example, a typical user entry would be capacity_lookup(47, 775, 41.3). The function should interpolate between the volumes 45 and 50, the densities 600 and 800, and the contents 40 and 45.
The table extract is:
Volume Density Content
               <=30 35 40 45 >=50
45.0 <=100 0.1 1.8 0.9 2.0 0.3
45.0 200 1.5 1.6 1.4 2.4 3.0
45.0 400 0.4 2.1 0.9 1.8 2.5
45.0 600 1.3 0.8 0.2 1.7 1.9
45.0 800 0.6 0.9 0.8 0.4 0.2
45.0 1000 0.3 0.8 0.5 0.3 1.0
45.0 1200 0.6 0.0 0.6 0.2 0.2
45.0 1400 0.6 0.4 0.3 0.7 0.1
45.0 >=1600 0.3 0.0 0.6 0.1 0.3
50.0 <=100 0.1 0.0 0.5 0.9 0.2
50.0 200 1.3 0.4 0.8 0.2 2.7
50.0 400 0.4 0.1 0.7 1.3 1.7
50.0 600 0.8 0.7 0.1 1.2 1.6
50.0 800 0.5 0.3 0.4 0.2 0.0
50.0 1000 0.2 0.4 0.4 0.2 0.3
50.0 1200 0.4 0.0 0.0 0.2 0.0
50.0 1400 0.0 0.3 0.1 0.5 0.1
50.0 >=1600 0.1 0.0 0.0 0.0 0.2
55.0 <=100 0.0 0.0 0.4 0.6 0.1
55.0 200 0.8 0.3 0.7 0.1 1.2
55.0 400 0.3 0.1 0.3 1.1 0.7
55.0 600 0.4 0.3 0.0 0.6 0.1
55.0 800 0.0 0.0 0.0 0.2 0.0
55.0 1000 0.2 0.1 0.2 0.1 0.3
55.0 1200 0.1 0.0 0.0 0.1 0.0
55.0 1400 0.0 0.2 0.0 0.2 0.1
55.0 >=1600 0.0 0.0 0.0 0.0 0.1
Question
How can I store the 500-row table so I can do interpolation on its non-linear data and get the correct value based on user input?
Clarifications
If the user inputs the following vector (775, 47, 41.3), the program should return an interpolated value between the following four vectors: (45.0, 600, 0.2, 1.7), (45.0, 800, 0.8, 0.4), (50.0, 600, 0.1, 1.2), and (50.0, 800, 0.4, 0.2).
Assume data will be pulled from a DB as a numpy array of your design
The first difficulty I found was the <= and >= bounds, which I handled by duplicating the extreme rows for Density and changing their values to very close dummy values, 99 and 1601, which will not affect the interpolation.
Volume Density Content
               <=30 35 40 45 >=50
45.0 99 0.1 1.8 0.9 2.0 0.3
45.0 100 0.1 1.8 0.9 2.0 0.3
45.0 200 1.5 1.6 1.4 2.4 3.0
45.0 400 0.4 2.1 0.9 1.8 2.5
45.0 600 1.3 0.8 0.2 1.7 1.9
45.0 800 0.6 0.9 0.8 0.4 0.2
45.0 1000 0.3 0.8 0.5 0.3 1.0
45.0 1200 0.6 0.0 0.6 0.2 0.2
45.0 1400 0.6 0.4 0.3 0.7 0.1
45.0 1600 0.3 0.0 0.6 0.1 0.3
45.0 1601 0.3 0.0 0.6 0.1 0.3
50.0 99 0.1 0.0 0.5 0.9 0.2
50.0 100 0.1 0.0 0.5 0.9 0.2
50.0 200 1.3 0.4 0.8 0.2 2.7
50.0 400 0.4 0.1 0.7 1.3 1.7
50.0 600 0.8 0.7 0.1 1.2 1.6
50.0 800 0.5 0.3 0.4 0.2 0.0
50.0 1000 0.2 0.4 0.4 0.2 0.3
50.0 1200 0.4 0.0 0.0 0.2 0.0
50.0 1400 0.0 0.3 0.1 0.5 0.1
50.0 1600 0.1 0.0 0.0 0.0 0.2
50.0 1601 0.1 0.0 0.0 0.0 0.2
55.0 99 0.0 0.0 0.4 0.6 0.1
55.0 100 0.0 0.0 0.4 0.6 0.1
55.0 200 0.8 0.3 0.7 0.1 1.2
55.0 400 0.3 0.1 0.3 1.1 0.7
55.0 600 0.4 0.3 0.0 0.6 0.1
55.0 800 0.0 0.0 0.0 0.2 0.0
55.0 1000 0.2 0.1 0.2 0.1 0.3
55.0 1200 0.1 0.0 0.0 0.1 0.0
55.0 1400 0.0 0.2 0.0 0.2 0.1
55.0 1600 0.0 0.0 0.0 0.0 0.1
55.0 1601 0.0 0.0 0.0 0.0 0.1
Then, as @Jaime already pointed out, you have to find 8 vertices in order to do the tri-linear interpolation.
The following algorithm will give you the points:
import numpy as np
def get_8_points(filename, vi, di, ci):
    a = np.loadtxt(filename, skiprows=2)
    vol = a[:, 0].repeat(a.shape[1] - 2).reshape(-1,)
    den = a[:, 1].repeat(a.shape[1] - 2).reshape(-1,)
    # FIXME: maybe you have to change the next line
    con = np.tile(np.array([30., 35., 40., 45., 50.]), a.shape[0]).reshape(-1,)
    val = a[:, 2:].reshape(a.shape[0] * 5).reshape(-1,)
    u = np.unique(vol)
    diff = np.absolute(u - vi)
    vols = u[diff.argsort()][:2]
    u = np.unique(den)
    diff = np.absolute(u - di)
    dens = u[diff.argsort()][:2]
    u = np.unique(con)
    diff = np.absolute(u - ci)
    cons = u[diff.argsort()][:2]
    check = np.in1d(vol, vols) & np.in1d(den, dens) & np.in1d(con, cons)
    points = np.vstack((vol[check], den[check], con[check], val[check]))
    return points.T
Using your example:
vi, di, ci = 47, 775, 41.3
points = get_8_points(filename, vi, di, ci)
#array([[ 4.50e+01, 6.00e+02, 4.00e+01, 2.00e-01],
# [ 4.50e+01, 6.00e+02, 4.50e+01, 1.70e+00],
# [ 4.50e+01, 8.00e+02, 4.00e+01, 8.00e-01],
# [ 4.50e+01, 8.00e+02, 4.50e+01, 4.00e-01],
# [ 5.00e+01, 6.00e+02, 4.00e+01, 1.00e-01],
# [ 5.00e+01, 6.00e+02, 4.50e+01, 1.20e+00],
# [ 5.00e+01, 8.00e+02, 4.00e+01, 4.00e-01],
# [ 5.00e+01, 8.00e+02, 4.50e+01, 2.00e-01]])
Now you can perform the tri-linear interpolation...
To complement Saullo's answer, here's how to do trilinear interpolation. You basically interpolate the cube into a square, then the square into a segment, and the segment into a point. Order of the interpolations does not alter the final result. Saullo's numbering scheme is already the right one: the base vertex is number 0, increasing the last dimension adds 1 to the vertex number, the second-to-last adds 2, and the first dimension adds 4. So from his vertex returning function, you could do the following:
coords = np.array([47, 775, 41.3])
ndim = len(coords)
# You would get this with a call to:
# vertices = get_8_points(filename, *coords)
vertices = np.array([[ 4.50e+01, 6.00e+02, 4.00e+01, 2.00e-01],
                     [ 4.50e+01, 6.00e+02, 4.50e+01, 1.70e+00],
                     [ 4.50e+01, 8.00e+02, 4.00e+01, 8.00e-01],
                     [ 4.50e+01, 8.00e+02, 4.50e+01, 4.00e-01],
                     [ 5.00e+01, 6.00e+02, 4.00e+01, 1.00e-01],
                     [ 5.00e+01, 6.00e+02, 4.50e+01, 1.20e+00],
                     [ 5.00e+01, 8.00e+02, 4.00e+01, 4.00e-01],
                     [ 5.00e+01, 8.00e+02, 4.50e+01, 2.00e-01]])
for dim in range(ndim):
    vtx_delta = 2**(ndim - dim - 1)
    for vtx in range(vtx_delta):
        vertices[vtx, -1] += ((vertices[vtx + vtx_delta, -1] - vertices[vtx, -1])
                              * (coords[dim] - vertices[vtx, dim])
                              / (vertices[vtx + vtx_delta, dim] - vertices[vtx, dim]))
print(vertices[0, -1])  # prints 0.55075
The loop reuses the vertices array for the intermediate interpolations, and the final value ends up in vertices[0, -1]; you would have to make a copy of the vertices array if you need it afterwards.
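As an aside: if the full 500-row table really forms a regular volume x density x content grid like the extract above, SciPy can do the trilinear step for you. A minimal sketch, assuming the same preprocessed file used by get_8_points (filename is whatever path the table is loaded from):
import numpy as np
from scipy.interpolate import RegularGridInterpolator

a = np.loadtxt(filename, skiprows=2)         # same file as get_8_points
vols = np.unique(a[:, 0])                    # [45., 50., 55.]
dens = np.unique(a[:, 1])                    # includes the 99/1601 dummy ends
cons = np.array([30., 35., 40., 45., 50.])   # the content column headers
values = a[:, 2:].reshape(len(vols), len(dens), len(cons))

capacity = RegularGridInterpolator((vols, dens, cons), values)
print(capacity([[47., 775., 41.3]]))         # ~[0.55075], matching the loop above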
