Mark Gawron
2016-06-10 07:06:22 UTC
The scipy.stats.qqplot and scipy.stats.probplot functions plot expected values against actual data values to visualize how well the data fit a distribution. First, a 1-D array of expected percentiles is generated for a sample of size N; that array is then passed to dist.ppf, the percent point function of the chosen distribution, which returns an array of expected values. The plotted points are pairs of expected and actual values, and a linear regression on these pairs produces the line on which data drawn from this distribution should lie.
osr = np.sort(x)
osm_uniform = _calc_uniform_order_statistic_medians(len(x))
osm = dist.ppf(osm_uniform)
slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
My question concerns the plot display.
ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
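For anyone who wants to reproduce the steps above without calling SciPy's private helper, here is a minimal sketch. It assumes the uniform order statistic medians follow Filliben's estimate, which is the formula given in the probplot documentation. The names filliben_medians and probplot_points are my own and are not part of SciPy.

```python
import numpy as np
from scipy import stats


def filliben_medians(n):
    """Approximate medians of the order statistics of n uniform(0, 1) draws.

    Assumption: this mirrors Filliben's estimate as described in the
    scipy.stats.probplot documentation; it is not SciPy's own helper.
    """
    i = np.arange(1, n + 1)
    v = (i - 0.3175) / (n + 0.365)   # interior points, i = 2 .. n-1
    v[-1] = 0.5 ** (1.0 / n)         # largest order statistic
    v[0] = 1.0 - v[-1]               # smallest order statistic
    return v


def probplot_points(x, dist=stats.norm):
    """Return (osm, osr, slope, intercept) for a probability plot of x."""
    osr = np.sort(x)                             # ordered sample values
    osm = dist.ppf(filliben_medians(len(x)))     # expected values under dist
    fit = stats.linregress(osm, osr)             # line the points should follow
    return osm, osr, fit.slope, fit.intercept


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=200)
osm, osr, slope, intercept = probplot_points(x)
```

For normally distributed data the slope and intercept of the fitted line estimate the scale and location of the sample, which is why the points are expected to fall close to that line.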
The x-axis of the resulting plot is labeled quantiles, but the xticks and xticklabels produced by qqplot and probplot do not seem correct for their intended interpretation. First, the numbers on the x-axis do not represent quantiles; the intervals between them do not in general contain equal numbers of points. For a normal distribution with sigma=1, they represent standard deviations. Changing the label on the x-axis does not seem like a very good solution, because the interpretation of the values on the x-axis will differ from one distribution to another. Rather, the right solution seems to be to actually show quantiles on the x-axis. The numbers on the x-axis can stay as they are, representing quantile indexes, but they need to be spaced so as to show the actual division points that carve the population into groups of the same size. This can be done in something like the following way.
import numpy as np
xt = np.arange(-3,3,dtype=int)
# Find the 5 quantiles to divide the data into sixths
percentiles = [x*.167 + .502 for x in xt]
percentiles = np.array(percentiles + [.999])
vals = dist.ppf(percentiles)
ax.set_xticks(vals)
xt = np.array(list(xt)+[3])
ax.set_xticklabels(xt)
ax.set_xlabel('Quantile')
plt.show()
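To make the contrast concrete, the sketch below compares the probability mass that falls between the default integer ticks with the mass between ticks placed at equally spaced percentiles. It assumes a standard normal distribution. The helper names equal_probability_ticks and mass_between are hypothetical, and the eps clipping is my own choice to keep dist.ppf finite at the ends, in the same spirit as the .001 and .999 values used above.

```python
import numpy as np
from scipy import stats


def equal_probability_ticks(dist=stats.norm, n_groups=6, eps=1e-3):
    """Return (probs, positions) that split dist into n_groups equal parts.

    The end points 0 and 1 are clipped to eps and 1 - eps so that
    dist.ppf does not return -inf / +inf.
    """
    probs = np.linspace(0.0, 1.0, n_groups + 1)
    probs = np.clip(probs, eps, 1.0 - eps)
    return probs, dist.ppf(probs)


def mass_between(ticks, dist=stats.norm):
    """Probability mass of dist between each pair of consecutive ticks."""
    return np.diff(dist.cdf(ticks))


# Default-style ticks: integers, i.e. standard deviations for a unit normal.
integer_ticks = np.arange(-3, 4)
integer_mass = mass_between(integer_ticks)

# Suggested ticks: positions that cut the distribution into sixths.
probs, quantile_ticks = equal_probability_ticks()
quantile_mass = mass_between(quantile_ticks)

print("mass between integer ticks: ", np.round(integer_mass, 4))
print("mass between quantile ticks:", np.round(quantile_mass, 4))
```

The positions in quantile_ticks could then be passed to ax.set_xticks, with the quantile indexes as tick labels, as in the snippet above.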
I've attached two images to show the difference between the current visualization and the suggested one.
Mark Gawron