During the last decade, a new learning paradigm called Structural Risk Minimization (SRM), derived from Statistical Learning Theory, has become widely studied in machine learning. Machines implementing SRM, e.g., Support Vector Machines (SVMs) and Kernel Fisher Discriminants (KFDs), have been applied with great success to pattern recognition and function regression problems. SRM's ability to simultaneously minimize the risk of error on the training data and the complexity of the learning machine results in better generalization than plain Empirical Risk Minimization (ERM), especially when the amount of training data is limited. The present work is devoted to applying SRM to the problem of probability density function (PDF) estimation. When sequences of continuous-valued events are modeled with Hidden Markov Models (HMMs), as in automatic speech recognition (ASR), PDFs model the emission probabilities of the HMMs' states. This thesis investigates and develops methods to efficiently train sparse kernel PDF models by regression of the empirical cumulative distribution function (ECDF). A new method is presented that obtains a sparse approximation of the orthogonal least-squares regression solution by forward selection of relevant samples, using a novel memory-efficient thin update of the orthogonal decomposition. The method is evaluated on standard benchmark problems of up to five dimensions, where it outperforms traditional parametric Gaussian Mixture Models (GMMs) and performs on par with the theoretically optimal, non-sparse Parzen windows PDF models. However, this new method turns out to be inapplicable to PDF estimation for ASR because of the complexity of the ECDF in high dimensions. Instead, posterior class probabilities calibrated from the outputs of binary discriminants such as SVMs or KFDs are converted into class-conditional PDFs using Bayes' rule.
This approach is tested within a monophone HMM ASR system on the Resource Management task, where it significantly outperforms traditional HMM-GMM systems, especially on randomly drawn limited training sets, demonstrating the new models' improved generalization on small-sample problems. To make these large-scale experiments feasible, a novel machine learning software library is presented. Its primary focus is on fast computation, simplicity in both expressing algorithms and extending functionality, and the flexibility needed to properly assess algorithms' properties and advantages. The library follows an object-oriented design and is implemented in C++. For productivity, it is equipped with fine-grained tracing, an object-oriented persistence model, transparent error handling, and parallelization on distributed-memory computer clusters.