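Concepts
Suppose we hypothesize that the mean of some data is μ. The closer the sample mean lies to μ, relative to the sampling variability, the more plausible the hypothesis; the farther away it lies, the less plausible.
This brings in the p-value: the probability, assuming the hypothesized mean is correct, of drawing a sample that deviates from it at least as much as the one actually observed. The larger the p-value, the more consistent the sample is with the hypothesis. In practice we choose a threshold in advance, such as 0.05, called the significance level. If the computed p-value is greater than 0.05, we do not reject the hypothesis (this is not proof that the mean equals μ); if it is smaller, we reject it.
One-sample t-test: tests whether the mean of a single sample equals a target value.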
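As a rough illustration of the idea above, the t statistic of a one-sample test can be computed by hand as (sample mean − hypothesized mean) / (sample standard deviation / √n). The sketch below uses only the Python standard library; the names `one_sample_t`, `sample`, and `target` are made up for this illustration, and the data are the growth rates used later in this article.

```python
import math
import statistics

# Same sample as the growth-rate example in this article.
sample = [0.169747191462884, 0.165484359308337, 0.141358295556684,
          0.0631967134074211, 0.101527686160212]
target = 0.10  # hypothesized mean


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / std_err, n - 1


t_stat, dof = one_sample_t(sample, target)
print(f"t = {t_stat:.4f}, df = {dof}")
```

A t statistic near zero means the sample mean is close to the target in units of standard error; turning it into a p-value additionally requires the t distribution with `dof` degrees of freedom.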
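Python code
Scenario: given a series of observed growth rates, test whether their mean is 0.1 at a significance level of 0.05. A p-value greater than 0.05 means we cannot reject the hypothesis that the sample mean equals the target value.
The code is as follows: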
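- import statsmodels.api as sm
-
- # Observed growth rates (the sample under test)
- valueList = [0.169747191462884, 0.165484359308337, 0.141358295556684, 0.0631967134074211, 0.101527686160212]
-
- if __name__ == '__main__':
-     # DescrStatsW wraps the data and exposes descriptive statistics and tests
-     d = sm.stats.DescrStatsW(valueList)
-     # ttest_mean returns the tuple (tstat, pvalue, df); H0: mean == 0.10
-     print('t-statistic = %6.4f, p-value = %6.4f, df = %s' % d.ttest_mean(0.10))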
[Screenshot: output of running the script]
Next, let's look at the ttest_mean function:
- def ttest_mean(self, value=0, alternative="two-sided"):
-     """ttest of Null hypothesis that mean is equal to value.
-     The alternative hypothesis H1 is defined by the following
-     - 'two-sided': H1: mean not equal to value
-     - 'larger' :   H1: mean larger than value
-     - 'smaller' :  H1: mean smaller than value
-     Parameters
-     ----------
-     value : float or array
-         the hypothesized value for the mean
-     alternative : str
-         The alternative hypothesis, H1, has to be one of the following:
-           - 'two-sided': H1: mean not equal to value (default)
-           - 'larger' :   H1: mean larger than value
-           - 'smaller' :  H1: mean smaller than value
-     Returns
-     -------
-     tstat : float
-         test statistic
-     pvalue : float
-         pvalue of the t-test
-     df : int or float
-     """
-     # TODO: check direction with R, smaller=less, larger=greater
-     tstat = (self.mean - value) / self.std_mean
-     dof = self.sum_weights - 1
-     # TODO: use outsourced
-     if alternative == "two-sided":
-         pvalue = stats.t.sf(np.abs(tstat), dof) * 2
-     elif alternative == "larger":
-         pvalue = stats.t.sf(tstat, dof)
-     elif alternative == "smaller":
-         pvalue = stats.t.cdf(tstat, dof)
-
-     return tstat, pvalue, dof
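To make the three branches of the function above concrete, here is a minimal sketch that mirrors them using only the standard library. `t_pdf`, `t_cdf`, and `p_value` are hypothetical helpers written for this illustration: `t_cdf` approximates the Student t distribution function by numerical integration (Simpson's rule), whereas statsmodels itself calls scipy's `stats.t.sf` and `stats.t.cdf`. Treat it as a way to see how the tails relate, not as the library's implementation.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x > 0 else 0.5 - area


def p_value(tstat, df, alternative="two-sided"):
    """Mirror the three branches of ttest_mean."""
    if alternative == "two-sided":
        return 2 * (1 - t_cdf(abs(tstat), df))  # both tails
    if alternative == "larger":
        return 1 - t_cdf(tstat, df)             # upper tail, like stats.t.sf
    if alternative == "smaller":
        return t_cdf(tstat, df)                 # lower tail, like stats.t.cdf
    raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")


for alt in ("two-sided", "larger", "smaller"):
    print(alt, round(p_value(1.39, 4, alt), 4))
```

Two relationships are worth noticing: the "larger" and "smaller" p-values always sum to 1, and for a positive t statistic the two-sided p-value is twice the "larger" one.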
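Points to note:
① ttest_mean takes two parameters. The first, value, is the hypothesized mean, usually a single float (an array is also accepted). The second, alternative, selects the alternative hypothesis H1 and takes one of three values:
- "two-sided": the mean is not equal to value (default);
- "larger": the mean is greater than value;
- "smaller": the mean is less than value;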
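② It returns three values:
- tstat : float: the t statistic; the larger its absolute value, the farther the sample mean lies from value, measured in standard errors;
- pvalue : float: the p-value; compare it with the chosen significance level to decide whether to reject the hypothesis that the mean equals value;
- df : int or float: the degrees of freedom of the test, computed as the sum of weights minus 1 (n - 1 for unweighted data);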