A novel basis function approach to finite population parameter estimation

Document Type: Article

Authors

Department of Statistics, Quaid-i-Azam University, Islamabad, 44000, Pakistan

Abstract

Modeling non-linear data is common practice in data science and machine learning (ML), since natural processes whose outcome varies linearly with the input variable(s) are rare. A robust and simple methodology is therefore needed to fit sampled data on a set of covariates accurately and quickly when the underlying relationship may be a complicated non-linear function. This article considers a novel approach to estimating a finite population parameter τ, a linear combination of the population values, in a superpopulation setting with known basis function regression (BFR) models. The problems of subset selection with a single predictor under an automatic matrix approach, and of ill-conditioned regression models, are discussed. The prediction error variance of the proposed estimator is estimated under feature selection criteria widely used in ML. Finally, the expected squared prediction error (ESPE) of the proposed estimator and the expectation of the estimated error variance are obtained under bootstrapping, along with a simulation study using different regularizers, to observe the long-run behavior of the proposed estimator.
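
To make the setting concrete, the following is a minimal sketch of a model-based predictor of τ under a basis function regression fit. It is an illustration only and not the estimator proposed in the article: the Gaussian radial basis functions, the ridge penalty, the function names `rbf_design` and `predict_tau`, the tuning values and the synthetic population are all assumptions made for this example. The sampled values are used as observed, and the non-sampled values are replaced by their fitted predictions.

```python
import numpy as np

def rbf_design(x, centers, width):
    """Design matrix: an intercept column plus Gaussian radial basis functions.
    The Gaussian form is an assumption of this sketch."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    c = np.asarray(centers, dtype=float).reshape(1, -1)
    phi = np.exp(-((x - c) ** 2) / (2.0 * width ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

def predict_tau(x_pop, y_sample, sample_idx, gamma, centers, width, lam):
    """Predict tau = gamma' y: observed part plus predicted non-sampled part.
    y_sample must be ordered like sample_idx."""
    x_pop = np.asarray(x_pop, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    sample_idx = np.asarray(sample_idx)
    rest = np.ones(x_pop.size, dtype=bool)
    rest[sample_idx] = False  # mask of non-sampled units

    H_s = rbf_design(x_pop[sample_idx], centers, width)
    H_r = rbf_design(x_pop[rest], centers, width)
    # Ridge-regularized least squares (the intercept is penalized too, for
    # simplicity); the penalty guards against an ill-conditioned H_s' H_s.
    A = H_s.T @ H_s + lam * np.eye(H_s.shape[1])
    w = np.linalg.solve(A, H_s.T @ y_sample)
    return gamma[sample_idx] @ y_sample + gamma[rest] @ (H_r @ w)

# Synthetic population (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
N, n = 200, 40
x = rng.uniform(0.0, 1.0, N)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=N)
gamma = np.ones(N)  # all weights equal to 1, so tau is the population total
sample_idx = rng.choice(N, size=n, replace=False)
centers = np.linspace(0.0, 1.0, 8)

tau = gamma @ y
tau_hat = predict_tau(x, y[sample_idx], sample_idx, gamma,
                      centers, width=0.15, lam=1e-3)
print(f"true tau = {tau:.3f}, predicted tau = {tau_hat:.3f}")
```

Setting `gamma` to a vector of ones targets the population total; using 1/N in every position would target the population mean instead. Other regularizers could be substituted for the ridge penalty in the same structure.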
