LAS files - Python

I'm pretty sure this is a very basic question about LAS files, but I wasn't sure how to search for it. For context, I'm trying to create a plot from the data in a LAS file.
import lasio as ls
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
well = ls.read(r'1051325649.las')
df = well.df()
fig = plt.figure(figsize=(10,10))
#Set up the plot axes
ax1 = plt.subplot2grid((1,3), (0,0), rowspan=1, colspan = 1)
ax2 = plt.subplot2grid((1,3), (0,1), rowspan=1, colspan = 1)
ax3 = plt.subplot2grid((1,3), (0,2), rowspan=1, colspan = 1)
ax1.plot("GR", "DEPT", data = df, color = "green") # Call the data from the well dataframe
ax1.set_title("Gamma") # Assign a track title
ax1.set_xlim(0, 200) # Change the limits for the curve being plotted
ax1.set_ylim(400, 1000) # Set the depth range
ax1.grid() # Display the grid
The LAS file looks roughly like this. I want to create a plot where the far-left column, "DEPT", is the X-axis. However, I can't get the "DEPT" (depth) column into a form that lets me plot it. Note: there is also a GR column further to the right that is not visible in this picture. Any tips would help greatly.

Short answer:
plt.plot expects both "GR" and "DEPT" to be columns in df, but DEPT is not a column: it is the index. You can solve this by converting the index of df into a column:
df2 = df.reset_index()
ax1.plot("GR", "DEPT", data = df2, color = "green")

When you read a .las file with the lasio library and convert it to a pandas DataFrame, the first curve in the file (here DEPT) automatically becomes the index of the DataFrame rather than a column.
There are two solutions for this problem:
Use the data as-is:
import matplotlib.pyplot as plt
import lasio
well = lasio.read('filename.las')
well_df = well.df()
plt.plot(well_df.GR, well_df.index)
Here well_df.index holds the DEPT values.
Reset the index and use DEPT as a column:
import matplotlib.pyplot as plt
import lasio
well = lasio.read('filename.las')
well_df = well.df()
well_df = well_df.reset_index()
plt.plot(well_df.GR, well_df.DEPT)
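
To see that the two options give the same curve without needing a .las file, here is a minimal sketch. It assumes a synthetic DataFrame shaped like the output of well.df(), with DEPT as the index and a made-up GR column; the values and the flat_df name are purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for well.df(): depth is the index, GR is a column.
depth = np.arange(400.0, 1000.0, 0.5)
well_df = pd.DataFrame(
    {"GR": 75 + 40 * np.sin(depth / 30.0)},
    index=pd.Index(depth, name="DEPT"),
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 8), sharey=True)

# Option 1: read the depth values straight from the index.
ax1.plot(well_df["GR"], well_df.index, color="green")
ax1.set_title("GR vs index")

# Option 2: turn the index into a column, then plot by column name.
flat_df = well_df.reset_index()
ax2.plot("GR", "DEPT", data=flat_df, color="green")
ax2.set_title("GR vs DEPT column")

# Depth usually increases downwards on a well log, so invert the shared y-axis.
ax1.set_ylim(1000, 400)
```

Both panels should show the same track; which option to use is mostly a matter of whether the rest of your code expects DEPT as a column.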

Related

Annotate Min/Max/Median in Matplotlib Violin Plot

Given this example code:
import pandas as pd
import matplotlib.pyplot as plt
data = 'https://raw.githubusercontent.com/marsja/jupyter/master/flanks.csv'
df = pd.read_csv(data, index_col=0)
# Subsetting using Pandas query():
congruent = df.query('TrialType == "congruent"')['RT']
incongruent = df.query('TrialType == "incongruent"')['RT']
# Combine data
plot_data = list([incongruent, congruent])
fig, ax = plt.subplots()
xticklabels = ['Incongruent', 'Congruent']
ax.set_xticks([1, 2])
ax.set_xticklabels(xticklabels)
ax.violinplot(plot_data, showmedians=True)
Which results in the following plot:
How can I annotate the min, max, and median lines with their respective values?
I haven't been able to find examples online that show how to annotate violin plots in this way. If we set plot = ax.violinplot(plot_data, showmedians=True) then we can access attributes like plot['cmaxes'], but I can't quite figure out how to use that for annotations.
Here is an example of what I am trying to achieve:

This turned out to be as easy as computing the medians (or mins/maxes), enumerating them, and adding each annotation with plt.text, using small offsets for positioning. Note that results_df, model_cat and test_f1 come from a different dataset than the one in the question:
medians = results_df.groupby(['model_cat'])['test_f1'].median()
for i, v in enumerate(medians):
    plt.text((i+.85), (v+.001), str(round(v, 3)), fontsize = 12)
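
The same idea can be applied to a plot shaped like the one in the question. This is only a sketch: it uses synthetic reaction times instead of the flanks.csv data, and the 0.15 horizontal offset and label format are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic reaction times standing in for the incongruent/congruent RT series.
plot_data = [rng.normal(600, 80, 200), rng.normal(520, 60, 200)]

fig, ax = plt.subplots()
ax.violinplot(plot_data, showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Incongruent", "Congruent"])

# violinplot places the i-th violin at x = i + 1 by default,
# so the statistics are computed from the data and written next to it.
for pos, values in enumerate(plot_data, start=1):
    stats = {
        "min": np.min(values),
        "median": np.median(values),
        "max": np.max(values),
    }
    for name, value in stats.items():
        ax.text(pos + 0.15, value, f"{name}: {value:.0f}",
                va="center", fontsize=9)
```

Computing the statistics with NumPy avoids having to dig the values out of plot['cmaxes'] and the other line collections.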

Changing the order of pandas/matplotlib line plotting without changing data order

Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I make this keep the data & legend order but change the behaviour so that it plots X on top of Y on top of Z?
(I know I can change the data column order and edit the legend order but I am hoping for a simpler easier method leaving the data as is)
UPDATE: final solution used:
(Thanks to r-beginners) I used the get_lines to modify the z-order of each plot
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
    line.set_zorder(abs(i))
fig
In a notebook produces:

Get the line objects from the axes and set the zorder of each one explicitly, in the order you want them stacked.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
[Figures omitted: the plot before and after setting the zorder]
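
If you would rather not rely on the position of each line in the list, one variation is to look the lines up by their legend label. This is a sketch under the assumption that pandas labels each line with its column name; on_top_first is a hypothetical name for the desired stacking order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2021)
df = pd.DataFrame(rng.integers(1, 10, size=(8, 3)), columns=list("XYZ"))
ax = df.plot(linewidth=10)

# Desired stacking, topmost first: X over Y over Z.
on_top_first = ["X", "Y", "Z"]

# Map each column name to the line pandas drew for it.
by_label = {line.get_label(): line for line in ax.get_lines()}

# Give earlier names a higher zorder; lines default to zorder 2,
# so start above that.
for rank, name in enumerate(on_top_first):
    by_label[name].set_zorder(len(on_top_first) - rank + 2)
```

The data and the legend order are left untouched; only the drawing order changes.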

I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by wrapping it in a function.
def plot_dataframe(df: pd.DataFrame) -> None:
    df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)

Clustermapping in Python using Seaborn

I am trying to create a heatmap with dendrograms on Python using Seaborn and I have a csv file with about 900 rows. I'm importing the file as a pandas dataframe and attempting to plot that but a large number of the rows are not being represented in the heatmap. What am I doing wrong?
This is the code I have right now. But the heatmap only represents about 49 rows.
Here is an image of the clustermap I've obtained but it is not displaying all of my data.
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
# Data set
df = pd.read_csv('diff_exp_gene.csv', index_col = 0)
# Default plot
sns.clustermap(df, cmap = 'RdBu', row_cluster=True, col_cluster=True)
plt.show()
Thank you.

An alternative approach would be to use imshow in matplotlib. I'm not exactly sure what your question is, but here is a way to plot counts of points on a plane from a CSV file. It assumes the file has TYPE, X and Y columns, which may not match your data.
import csv
import numpy as np
import matplotlib.pyplot as plt
types = 'some_type'  # placeholder: the TYPE value to count (not specified in the original answer)
temp = np.zeros((128, 128), dtype=int)  # 128x128 grid of counts
with open('diff_exp_gene.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        if row['TYPE'] == types:
            # add one to the count at grid cell (Y, X)
            temp[int(row['Y']), int(row['X'])] += 1
plt.imshow(temp, cmap='hot', origin='lower')
plt.show()

As far as I know, keyword arguments that apply to seaborn heatmaps also apply to clustermap, because sns.clustermap passes them on to sns.heatmap. In that case, all you need to do in your example is pass yticklabels=True to sns.clustermap(). All of the rows are already drawn in the heatmap; it is only the tick labels that are thinned out, and this makes a label appear for each of the 900 rows.
By default, yticklabels is set to "auto" to avoid overlapping labels. The same applies to xticklabels. See more here: https://seaborn.pydata.org/generated/seaborn.heatmap.html
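
A small sketch of this, assuming seaborn and scipy are installed. The data is random and the gene_/sample_ names are made up; the tall figsize and small label size are just one way to keep that many labels readable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n_rows = 60
# Random stand-in for the expression table: one labelled row per gene.
df = pd.DataFrame(
    rng.normal(size=(n_rows, 5)),
    index=[f"gene_{i}" for i in range(n_rows)],
    columns=[f"sample_{j}" for j in range(5)],
)

# yticklabels=True asks for a label on every row instead of the "auto" subset.
g = sns.clustermap(df, cmap="RdBu", yticklabels=True, figsize=(6, 12))

# Shrink the labels so they do not overlap.
g.ax_heatmap.tick_params(axis="y", labelsize=6)

labels = [t.get_text() for t in g.ax_heatmap.get_yticklabels()]
```

With several hundred rows you will probably need a much taller figure, or a smaller font, before the labels become legible.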

additional row colors in seaborn cluster map

I am currently generating clustermaps in seaborn and labeling the row colors as below.
matrix = pd.DataFrame(np.random.random_integers(0,1, size=(50,4)))
labels = np.random.random_integers(0,5, size=50)
lut = dict(zip(set(labels), sns.hls_palette(len(set(labels)), l=0.5, s=0.8)))
row_colors = pd.DataFrame(labels)[0].map(lut)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=row_colors)
plt.show()
I have a second annotation column similar to the labels data I would also like to add to the plot. The seaborn API doesn't support adding a second row_colors column, which is fine, but I am struggling in finding a workaround using matplotlib to add this annotation column to the clustering.
If I cannot use seaborn to do this and have to generate all of this manually using matplotlib that would be fine, I just can't figure that out either.
Thanks for your help!

The seaborn API does actually support this: pass a list of color Series as row_colors. The solution is below.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Random 0/1 matrix and first set of labels (0 to 5)
matrix = pd.DataFrame(np.random.randint(0, 2, size=(50,4)))
labels = np.random.randint(0, 6, size=50)
lut = dict(zip(set(labels), sns.hls_palette(len(set(labels)), l=0.5, s=0.8)))
row_colors = pd.DataFrame(labels)[0].map(lut)
#Create additional row_colors here
labels2 = np.random.randint(0, 2, size=50)
lut2 = dict(zip(set(labels2), sns.hls_palette(len(set(labels2)), l=0.5, s=0.8)))
row_colors2 = pd.DataFrame(labels2)[0].map(lut2)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=[row_colors, row_colors2])
plt.show()
This produces a Clustermap with two additional columns:

I tried concatenating the row_colors Series into a single DataFrame with pandas, and it worked.
Please try this code:
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import pandas as pd
iris = sns.load_dataset("iris")
print(iris)
species = iris.pop("species")
lut1 = dict(zip(species.unique(), ['#ED2323','#60FD00','#808080']))
row_colors1 = species.map(lut1)
lut2 = dict(zip(species.unique(), "rbg"))
row_colors2 = species.map(lut2)
row_colors = pd.concat([row_colors1,row_colors2],axis=1)
print(row_colors)
g = sns.clustermap(iris, row_colors=row_colors, col_cluster=False,cmap="mako", yticklabels=False, xticklabels=False)
plt.show()

There is another option for feeding in the annotation colors: you can pass a whole DataFrame to the row_colors or col_colors option, instead of a list of lists.
This strategy is particularly helpful if you have a DataFrame with several annotations you want represented. Instead of map, you can use the pandas function replace.
Something like this can be used to modify the other answer:
import matplotlib.colors
## This step is necessary because you can't use replace with the tuple rgb values
lut = {k: matplotlib.colors.to_hex(v) for k, v in lut.items()}
annotations_df = annotations_df.replace(lut)
g=sns.clustermap(matrix, col_cluster=False, linewidths=0.1, cmap='coolwarm', row_colors=annotations_df)
plt.show()
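
For a DataFrame with several annotation columns, the color frame can also be built column by column. This is only a sketch: annotations_df here is made-up data, to_color_column is a hypothetical helper, and the tab10 colormap is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up annotations: one row per matrix row, one column per annotation.
annotations_df = pd.DataFrame({
    "batch": rng.integers(0, 3, size=50),
    "treated": rng.integers(0, 2, size=50),
})

def to_color_column(column, cmap_name="tab10"):
    """Map each distinct value in a column to a hex color string."""
    cmap = plt.get_cmap(cmap_name)
    levels = sorted(column.unique())
    lut = {level: mcolors.to_hex(cmap(i)) for i, level in enumerate(levels)}
    return column.map(lut)

# apply() runs the helper on each column and returns a DataFrame of hex colors.
row_colors = annotations_df.apply(to_color_column)
```

The resulting row_colors frame can then be passed as row_colors=row_colors to sns.clustermap; as far as I know its index needs to match the index of the matrix being clustered.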

Date removed from x axis on overlaid plots matplotlib

I am trying to show time series lines representing an effort amount using matplotlib and pandas.
I've got all of my DataFrames to overlay in one plot, but when I do, the dates are stripped from the x axis and replaced with plain numbers. (I'm not sure where these come from, but my guess is that not all days contain the same data, so it has fallen back to an index id number.) If I plot any one of them on its own, the dates show on the x-axis.
Any hints or solutions to make the x axis show date for the multiple plot would be much appreciated.
This is the single figure plot with time axis:
Code I'm using to plot is
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(b342,color='black')
ax.plot(b343,color='blue')
ax.plot(b344,color='red')
ax.plot(b345,color='green')
ax.plot(b346,color='pink')
ax.plot(fi,color='yellow')
plt.show()
This is the multiple plot fig with weird x axis:

One option would be to manually specify the x-axis based on the DataFrame index, and then plot directly using matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# make up some data
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["FishEffort"] )
    df.df_name = str(i)
    dfs.append(df)
# plot it directly using matplotlib instead of through the DataFrame
fig = plt.figure()
ax = fig.add_subplot()
for df in dfs:
    ax.plot(df.index, df["FishEffort"], label = df.df_name)
ax.legend()
plt.show()
Another option would be to concatenate your DataFrames and plot using Pandas. If you give your "FishEffort" field the correct label name when loading the data or via DataFrame.rename then the labels will be specified automatically.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["DataFrame #" + str(i) ] )
    df.df_name = str(i)
    dfs.append(df)
df = pd.concat(dfs, axis = 1)
df.plot()

I've found an answer that does what I want. It seems that calling plt.plot wasn't using the date as the x axis, whereas plotting through pandas, as shown in the pandas documentation, did the trick.
ax = b342.plot(label='342')
b343.plot(ax=ax, label='test')
b344.plot(ax=ax)
b345.plot(ax=ax)
b346.plot(ax=ax)
fi.plot(ax=ax)
plt.show()
I was wondering if anyone knew how to change the labels here?
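
On the labels: one way to control them is to pass label= for every line and then build the legend once at the end. The sketch below assumes each effort record is a pandas Series with a DatetimeIndex; the series names, date ranges and values are made up, and it plots through matplotlib with the index given explicitly as x.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical effort series covering different, partly overlapping dates.
series = {
    "342": pd.Series(rng.random(60),
                     index=pd.date_range("2015-01-01", periods=60)),
    "343": pd.Series(rng.random(40),
                     index=pd.date_range("2015-02-10", periods=40)),
    "344": pd.Series(rng.random(30),
                     index=pd.date_range("2015-01-20", periods=30, freq="2D")),
}

fig, ax = plt.subplots()
for name, s in series.items():
    # Passing the index explicitly keeps real dates on the x-axis,
    # and label= sets the text shown in the legend.
    ax.plot(s.index, s.values, label=name)

ax.set_ylabel("Effort")
ax.legend(title="Series")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Because every line is plotted against its own dates, series that skip days or start later still line up on the shared time axis.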
