I am new to matplotlib, and I want to create a plot, with the following information:
A line joining the medians of around 200 variable length vectors (input)
A line joining the corresponding quantiles of these vectors.
A line joining the corresponding spread (largest and smallest points).
So basically, it's somewhat like a continuous box plot.
Thanks!
Using just scipy and matplotlib (you tagged only those libraries in your question) is a little bit verbose, but here's how you would do it (I'm doing it only for the quantiles):
import numpy as np
from scipy.stats import mstats
import matplotlib.pyplot as plt
# Create 10 columns with 100 rows of random data
rd = np.random.randn(100, 10)
# Calculate the quantiles column wise
quantiles = mstats.mquantiles(rd, axis=0)
# Plot it
labels = ['25%', '50%', '75%']
for i, q in enumerate(quantiles):
    plt.plot(q, label=labels[i])
plt.legend()
Which gives you:
Now, I would try to convince you to try the Pandas library :)
import numpy as np
import pandas as pd
# Create random data
rd = pd.DataFrame(np.random.randn(100, 10))
# Calculate all the desired values
df = pd.DataFrame({'mean': rd.mean(), 'median': rd.median(),
                   '25%': rd.quantile(0.25), '50%': rd.quantile(0.5),
                   '75%': rd.quantile(0.75)})
# And plot it
df.plot()
You'll get:
Or you can get all the stats in just one line:
rd.describe().T.drop('count', axis=1).plot()
Note: I dropped the count since it's not a part of the "5 number summary".
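The examples above use a rectangular array, whereas the question mentions around 200 vectors of different lengths. A minimal sketch for that case, assuming the vectors are held in a plain Python list (the random `vectors` below are made-up stand-in data), computes the five statistics per vector and draws the spread and quartile range as shaded bands around the median line:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical input: 200 vectors with lengths between 5 and 49
vectors = [rng.normal(size=rng.integers(5, 50)) for _ in range(200)]

# One row per vector: min, 25%, 50% (median), 75%, max
stats = np.array([np.percentile(v, [0, 25, 50, 75, 100]) for v in vectors])
lo, q1, med, q3, hi = stats.T

x = np.arange(len(vectors))
fig, ax = plt.subplots()
ax.fill_between(x, lo, hi, alpha=0.2, label="min to max")
ax.fill_between(x, q1, q3, alpha=0.4, label="25% to 75%")
ax.plot(x, med, color="k", label="median")
ax.legend()
```

Computing the statistics one vector at a time avoids padding the vectors to a common length.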
I want to split an image so that each symbol is separated vertically, going from this input image:
![input image][1]
to this:
![here][2]
The problem is that each symbol might have a different width, so I can't fix the split points in advance the way we do when splitting an array. If all objects had the same width, I could segment the image based on width. In this scenario, what logic should I use to extract these connected objects?
First, load the image from the URL:
import numpy as np
import urllib.request
from PIL import Image
from matplotlib import pyplot as plt
urllib.request.urlretrieve(
'https://i.stack.imgur.com/GRHzg.png',
"img.png")
img = Image.open("img.png")
img.show()
Then treat the black pixels as "filled" and convert the image to a NumPy array:
arr = (np.array(img)[:,:,:-1].sum(axis=-1)==0)
If we sum over the rows for each column, we get the number of filled pixels in each column:
plt.subplot(211)
plt.imshow(arr, aspect="auto")
plt.subplot(212)
plt.plot(arr.sum(axis=0))
plt.xlim(0,arr.shape[1])
Finally, if we take the difference of this sum between adjacent columns, we obtain the following result:
plt.subplot(211)
plt.imshow(arr, aspect="auto")
plt.subplot(212)
plt.plot(np.diff(arr.sum(axis=0)))
plt.xlim(0,arr.shape[1])
At this point you can simply choose a threshold and cut the image:
threshold = 25
cut = np.abs(np.diff(arr.sum(axis=0)))>threshold
x_lines = np.arange(len(cut))[cut]
plt.imshow(arr, aspect="auto")
plt.vlines(x_lines, 0, arr.shape[0], color="r")
This is my solution and it works fine, but it is sensitive to the chosen threshold and to the gradient between columns. I hope it is useful.
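A different, threshold-free variant (not the method above) is to cut wherever a column contains no filled pixels at all. This only works if the symbols are separated by at least one completely blank column. The sketch below assumes a boolean array shaped like `arr`; the helper name `split_on_empty_columns` and the tiny `demo` array are made up for illustration:

```python
import numpy as np

def split_on_empty_columns(filled):
    """Return (start, stop) column ranges of consecutive non-empty columns."""
    has_ink = filled.any(axis=0)
    # Pad with False so every run has both a rising and a falling edge
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first column of each run
    stops = np.flatnonzero(edges == -1)   # one past the last column of each run
    return list(zip(starts.tolist(), stops.tolist()))

# Tiny synthetic example: two "symbols" separated by blank columns
demo = np.zeros((4, 10), dtype=bool)
demo[1:3, 1:3] = True
demo[0:4, 5:9] = True

segments = split_on_empty_columns(demo)
pieces = [demo[:, a:b] for a, b in segments]
```

Symbols that touch or overlap horizontally would be merged into one piece, in which case the gradient-based cut is still needed.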
I have 10 years of output from the WRF climate model. I am looking for efficient code that, for every grid point in the xarray dataset, selects only those days where T>0 for more than 2 days. For my plots, I want, for each month at each grid point, the total number of days where T>0 for more than 2 days.
I am new to xarray and, after looking at similar questions, I still couldn't find a suitable loop or count function to apply at each grid point, month by month. I would really appreciate any help with this code.
Here is my current code:
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import netCDF4
from netCDF4 import Dataset
import numpy as np
#concatenate the 10-year output
#raw string, so the backslashes in the Windows path are not read as escape sequences
dataset=xr.open_mfdataset(r"\Python files for plotting wrfoutput\era5_1990-2000_output\*.nc",concat_dim='Time', combine='nested', compat='no_conflicts', preprocess=None, engine=None, data_vars='all', coords='all', parallel=False, join='outer', attrs_file=None)
#dimensions are: Time, south_north, west_east
DS=dataset
DS = DS.assign_coords(Time=pd.to_datetime(DS['Time'].values))
#Select/extract only the mean 2m surface temperature (T2) from the large xarray
DST2=DS.T2
#apply the where function to check at which grid points in each month the T2>0
T2threshold=DST2.groupby('Time.month').where(DST2>0)
In general, it is difficult to help without code that reproduces the issue you are running into.
Stack Overflow is not there to help you learn programming. It is there to help find solutions for edge cases and issues.
Never mind, here are some thoughts for you. xarray works similarly to pandas, so if you can find a solution for pandas, try it with xarray.
ds['threshold_mask'] = ds.T2.where(ds.T2>0)
Building a mask and then using groupby and cumsum:
ds.groupby((ds['threshold_mask'] == 0).cumsum().threshold_mask).cumsum()
There is no guarantee that this works, but I guess it will help you find the right solution.
Seen here: Pandas : dataframe cumsum , reset if other column is false
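For the counting problem in the question, one possible approach is a run-length pass along the time axis. The sketch below is plain NumPy rather than an xarray API; `days_in_long_runs` and `min_run` are hypothetical names, and it assumes one time step per day with time as the first axis (here "more than 2 days" is read as runs of at least 3 days):

```python
import numpy as np

def days_in_long_runs(cond, min_run=3):
    """Mark True entries that belong to a run of at least `min_run`
    consecutive True values along axis 0 (the time axis)."""
    cond = np.asarray(cond, dtype=bool)
    n_time = cond.shape[0]

    # Forward pass: length of the run ending at each time step
    run_end = np.zeros(cond.shape, dtype=int)
    count = np.zeros(cond.shape[1:], dtype=int)
    for t in range(n_time):
        count = np.where(cond[t], count + 1, 0)
        run_end[t] = count

    # Backward pass: length of the run starting at each time step
    run_start = np.zeros(cond.shape, dtype=int)
    count = np.zeros(cond.shape[1:], dtype=int)
    for t in range(n_time - 1, -1, -1):
        count = np.where(cond[t], count + 1, 0)
        run_start[t] = count

    # Total length of the run that each True entry sits in
    total = run_end + run_start - 1
    return cond & (total >= min_run)

# One grid point, eight days: only the three-day run is kept
warm = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=bool).reshape(8, 1, 1)
mask = days_in_long_runs(warm)
```

The loops run over time steps only, so each step is vectorized over the whole grid. With xarray, the mask could be wrapped back into a DataArray with the original Time coordinate and summed per month with groupby('Time.month'); a run that straddles a month boundary is then split between the two months.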
I have an image of shape (31278, 25794, 3). I would like to know how to obtain M×N segments of the picture using NumPy functions. For example, starting from:
I would like to obtain:
In numpy you can split a picture like you slice an array.
Here's an example with your image:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
img = np.array(Image.open("cat.jpg"))
plt.imshow(img)
xs = img.shape[0]//2 # division lines for the picture
ys = img.shape[1]//2
# now slice up the image (in a shape that works well with subplots)
splits = [[img[0:xs, 0:ys], img[0:xs, ys:]], [img[xs:, 0:ys], img[xs:, ys:]]]
fig, axs = plt.subplots(2, 2)
for i in range(2):
    for j in range(2):
        axs[i][j].imshow(splits[i][j])
Keep in mind that the splits here are views into the original array, not arrays with new data, so changes you make to the views will change the original data. If you don't want this, copy the data after slicing the array, for example with .copy().
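The 2x2 slicing generalizes to an arbitrary M×N grid. A minimal sketch using `np.array_split`, which also copes with sizes that are not evenly divisible (`split_grid` is a made-up helper name and the small `img` is stand-in data):

```python
import numpy as np

def split_grid(img, m, n):
    """Split an image array into m rows and n columns of tiles."""
    rows = np.array_split(img, m, axis=0)
    return [np.array_split(row, n, axis=1) for row in rows]

# Small stand-in for an (H, W, 3) image
img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
tiles = split_grid(img, 2, 4)

# Stitching the tiles back together reproduces the original array
rebuilt = np.concatenate([np.concatenate(row, axis=1) for row in tiles], axis=0)
```

`tiles[i][j]` is the tile in grid row `i` and grid column `j`, which matches the `axs[i][j]` layout of the subplots above.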
Power BI has a Python visualization element. It creates a dataframe from fields of the Power BI data source and then visualizes it with the matplotlib.pyplot.show() method.
I need to visualize a dataframe in table form (with the ability to color cells depending on different data conditions).
The problem is that no example of table visualization of dataframes works inside the Power BI Python element (and it doesn't say what the problem is), even when it works in Anaconda.
Can somebody show a working example of dataframe table visualization for Power BI?
I created data in a dataframe to keep the example simple. This could also be the output of manipulation.
import pandas as pd
dataset = pd.DataFrame({'a': range(0,20,2), 'b': range(10,30,2)})
print(dataset)
a b
0 0 10
1 2 12
2 4 14
3 6 16
4 8 18
5 10 20
6 12 22
7 14 24
8 16 26
9 18 28
In a new Power BI file,
1. Get Data/More/Other/Python Script
2. Paste in:
dataset = pandas.DataFrame({'a': range(0,20,2), 'b': range(10,30,2)})
# Note the use of pandas, not pd
3. In the Navigator window, select 'dataset' under Python
4. Select Load or Transform Data if you wish to manipulate the data.
5. Once loaded, you can build visualizations and use the data just like any other table.
EDIT
Although the question was closed for not being focused, I think this is what the OP was looking for.
In Power BI, create a dataset as follows from a Python script:
dataset = pd.DataFrame(np.random.randn(10, 8), columns=list('abcdefgh'))
Use matplotlib.pyplot to create a heatmap from the table. You can control the heatmap more extensively than in this example.
So in visualization in Power BI, add the following python script (taken from Conditional formatting for 2- or 3-scale coloring of cells of a table):
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(a, b, c, d, e, f, g, h)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import pandas as pandas
import numpy as np
import matplotlib.pyplot as plt
#Round to two digits to print nicely
vals = np.around(dataset.values, 2)
#Normalize data to [0, 1] range for color mapping below
normal = (dataset - dataset.min()) / (dataset.max() - dataset.min())
fig = plt.figure()
ax = fig.add_subplot(111)
ax.axis('off')
the_table=ax.table(cellText=vals, rowLabels=dataset.index, colLabels=dataset.columns,
loc='center', cellColours=plt.cm.RdYlGn(normal),animated=True)
plt.show()
From this you get:
If you refresh your data, the script will yield a new heatmap, which is what should happen in your Power BI report if you update whatever data you are using. Hope this helps.
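The heatmap colors every cell on a continuous scale. If the goal is instead to color cells by a data condition, as the question asks, the same `ax.table` call can take a color matrix built from a boolean rule. This is only a sketch: `table_with_rule`, the color names and the `demo` frame are made up for illustration, and it has not been run inside Power BI itself.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def table_with_rule(df, rule, true_colour="lightgreen", false_colour="lightcoral"):
    """Draw df as a table, colouring each cell by a boolean rule."""
    mask = rule(df).to_numpy()
    colours = np.where(mask, true_colour, false_colour).tolist()
    text = df.round(2).astype(str).to_numpy().tolist()
    fig, ax = plt.subplots()
    ax.axis("off")
    ax.table(cellText=text, cellColours=colours,
             rowLabels=[str(i) for i in df.index],
             colLabels=[str(c) for c in df.columns],
             loc="center")
    return fig, colours

# Stand-in data; in a Power BI visual the frame would be `dataset`
demo = pd.DataFrame({"a": [1.0, -2.0], "b": [-0.5, 3.0]})
fig, colours = table_with_rule(demo, lambda d: d > 0)
```

In a Power BI Python visual, the call would be followed by plt.show(), as in the script above.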
Applying regression on each of the columns or rows in a pandas dataframe, without using for loops.
There is a similar post about this, "Apply formula across pandas rows/ regression line", that does a regression for each of the "rows"; however, plotting the answer given shows that it is wrong. I couldn't comment on it as I do not have enough reputation. The main problem is that it takes the values of the columns but then uses the apply function on each row.
Currently I only know how to do each column separately, e.g.:
np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4))
first_stats = scipy.stats.linregress(df.index,df[0])
second_stats = scipy.stats.linregress(df.index,df[1])
I was hoping to find an answer without creating a function or for loops, similar to pandas df.sum(), but instead of a sum I want a regression that returns the slope, intercept, r-value, p-value and standard error.
Look at the following example:
import numpy as np
import pandas as pd
from scipy.stats import linregress
np.random.seed(1997)
# np.random is used directly; the pd.np alias is no longer available in recent pandas
df = pd.DataFrame(np.random.rand(100, 10))
df.apply(lambda x: linregress(df.index, x), result_type='expand').rename(
    index={0: 'slope', 1: 'intercept', 2: 'rvalue', 3: 'p-value', 4: 'stderr'})
It should return what you want.
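If only the slope and intercept are needed, a fully vectorized alternative is `np.polyfit`, which accepts a 2-D `y` and fits every column against the same `x` in one call. This is a different technique from `linregress` and does not return the r-value, p-value or standard error; the sketch reuses the question's seed and shape:

```python
import numpy as np
import pandas as pd

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4))

# Degree-1 fit of every column against the index; result has shape (2, n_columns)
slopes, intercepts = np.polyfit(df.index, df.to_numpy(), deg=1)
fit = pd.DataFrame({"slope": slopes, "intercept": intercepts}, index=df.columns)
```

Each row of `fit` holds the least-squares line for the column of `df` with the same label.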