Applications of Bessel Functions in Machine Learning

On neural network architectures (Bessel-CNNs), trainable Bessel activations, SVM kernels, Matérn covariance in Gaussian processes, and rotation-invariant vision filters

An in-depth investigation into how Bessel functions are used in machine learning, particularly in neural networks, kernel methods, optimization analysis, and computer vision.

Bessel functions – a family of solutions to Bessel’s differential equation – have found niche yet impactful applications in machine learning. They appear in various contexts, from specialized neural network architectures and kernel methods to optimization analysis and computer vision techniques. Below we explore where Bessel functions arise in ML, how they contribute to convergence or performance, and their roles in regularization, loss formulations, feature transformations, and vision algorithms. Each section provides examples from research to illustrate these applications, with an emphasis on clarity for a mathematically inclined audience.

1. Bessel Functions in Machine Learning Models and Neural Networks

1.1 Neural Network Architectures Involving Bessel Functions

Bessel-Convolutional Neural Networks (B-CNNs): Bessel functions have been used to design convolutional layers that are invariant to continuous rotations. Delchevalerie et al. (2021) introduced Bessel-CNNs, in which the convolutional filters are expressed in a Bessel function basis to achieve rotational invariance by design (Achieving Rotational Invariance with Bessel-Convolutional Neural Networks | OpenReview). In these networks, image features are transformed using Bessel functions (typically by moving to polar coordinates and projecting onto a Bessel basis), so that rotating the input produces a correspondingly rotated feature map rather than entirely different activations. This built-in equivariance means the final feature representation does not change with image orientation, which is highly desirable in tasks such as medical or satellite imaging, where objects can appear at arbitrary angles. Prior to B-CNNs, other works had explored Bessel-based decompositions for vision – for example in image denoising and recognition – to enforce rotation-invariant representations. B-CNNs extend this idea by incorporating Bessel functions directly into the convolution operation, yielding rotation invariance throughout the learning process.
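
To make the idea concrete, here is a minimal NumPy sketch (not the actual B-CNN implementation) of the underlying principle: project an image patch onto a Fourier–Bessel basis on the unit disk and keep only the moduli of the coefficients. A rotation of the patch only multiplies each coefficient by a phase factor, so the moduli are (approximately, up to sampling effects) rotation-invariant. The function name `fourier_bessel_invariants` and all parameter choices are illustrative assumptions.

```python
import numpy as np
from scipy.special import jv, jn_zeros

def fourier_bessel_invariants(patch, m_max=4, k_max=4):
    """Project a square patch onto a Fourier-Bessel basis on the unit disk and
    return the coefficient moduli, which do not change when the patch rotates
    (a rotation only shifts the phase of each coefficient)."""
    n = patch.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    # map the pixel grid to the unit disk centred on the patch
    x = (xs - (n - 1) / 2) / ((n - 1) / 2)
    y = (ys - (n - 1) / 2) / ((n - 1) / 2)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0

    feats = []
    for m in range(m_max + 1):
        alphas = jn_zeros(m, k_max)            # zeros of J_m fix the radial modes
        for alpha in alphas:
            basis = jv(m, alpha * r) * np.exp(-1j * m * theta)
            coeff = np.sum(patch[inside] * np.conj(basis[inside]))
            feats.append(np.abs(coeff))        # modulus discards the rotation phase
    return np.array(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((33, 33))
    print(fourier_bessel_invariants(patch)[:5])
```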

Bessel-Based Activation Functions: Beyond convolutional layers, Bessel functions have been proposed as activation functions in neural networks. The rationale is that Bessel functions of the first kind, $J_\nu$, oscillate in a controlled manner, potentially combining beneficial properties of both linear and sinusoidal activations. Vieira (2023) developed trainable Bessel activation functions within quaternionic CNNs and showed they can outperform standard ReLU activations (Quaternionic Convolutional Neural Networks with Trainable Bessel Activation Functions). These Bessel-type activations have adjustable parameters, so the shape of the activation curve is tuned during training. The result is a more flexible nonlinearity that improved classification accuracy and loss convergence compared to ReLU in experiments. Intriguingly, when the order of the Bessel function is a half-integer, the activation reduces to a mix of polynomial and sinusoidal terms. This gives the network a "best of both worlds" behavior – mimicking the unbounded growth of ReLU for large inputs while introducing oscillatory sensitivity for smaller inputs – which can enhance the learning capacity and adaptability of the model.
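
As a rough illustration of why half-integer orders are attractive, the PyTorch sketch below defines a trainable activation built from the closed form $J_{1/2}(z) = \sqrt{2/(\pi z)}\,\sin z$, blended with a learnable linear term. This is only a sketch of the general idea, not the parameterization of the cited paper; the module name `BesselActivation` and its parameters are made up for illustration.

```python
import math
import torch
import torch.nn as nn

class BesselActivation(nn.Module):
    """Illustrative trainable Bessel-type activation.

    Uses the closed form J_{1/2}(z) = sqrt(2/(pi z)) sin(z) (an algebraic
    envelope times a sinusoid) plus a learnable linear term that restores
    ReLU-like growth for large inputs.
    """
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(1.0))  # weight of the linear part
        self.w = nn.Parameter(torch.tensor(1.0))  # oscillation frequency
        self.eps = 1e-6                           # avoids the singularity at z = 0

    def forward(self, x):
        z = self.w * x
        bessel_half = math.sqrt(2.0 / math.pi) * torch.sin(z) / torch.sqrt(z.abs() + self.eps)
        return self.a * x + bessel_half

act = BesselActivation()
x = torch.linspace(-5, 5, 7)
print(act(x))   # both a and w receive gradients during training
```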

Neural Networks with Bessel Basis Functions: Researchers have also experimented with using Bessel functions as basis functions in network layers. One example is constructing hidden neurons from Fourier–Bessel series expansions. Huang et al. (2021) proposed a classification neural network in which each hidden neuron implements a double Fourier–Bessel series, effectively using Bessel functions in the neuron's computation (Fourier-Bessel series neural networks for classification). In this architecture, the input is projected onto a set of Bessel function modes (akin to a frequency-domain representation), and these responses feed into the next layer. The motivation is that Bessel functions form an orthogonal basis on circular domains, so they can efficiently capture patterns in data with radial or oscillatory structure, and the resulting network was shown to be viable for classification tasks. While this is a specialized approach, it demonstrates that special-function expansions such as Bessel series can serve as alternatives to conventional neuron activations or basis functions in neural nets.

1.2 Kernel Methods and Support Vector Machines

Bessel Kernels in SVMs: In kernel-based algorithms such as Support Vector Machines (SVMs), Bessel functions have been used to define kernel functions that measure similarity between data points. A kernel must satisfy Mercer's condition (be positive-definite), and certain Bessel function forms meet this criterion. Shao and Jiang (2024) introduced a new class of Bessel kernels for SVMs and proved they are continuous and Mercer-compliant (A New Class of Bessel Kernel Functions for Support Vector Machine). These kernels are parameterized by an order (related to the Bessel function order) and a scale, and in the limit of high smoothness they degenerate to the Gaussian kernel, so the Bessel kernel family bridges the gap between less smooth kernels and the infinitely differentiable Gaussian RBF kernel. One practical advantage is that Bessel kernels come with fewer hyperparameters to tune, which simplifies model selection. Empirical evaluations on classification and regression tasks showed that SVMs with Bessel kernels achieved accuracy on par with or better than standard kernels, demonstrating strong generalization performance. Other researchers have also noted that Bessel kernels, though less common than Gaussian or polynomial kernels, can be well-suited to specific data types – particularly those with inherent circular or oscillatory patterns (What is Kernel Trick in SVM ? Interview questions related ... - Medium).
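
As an illustration (not necessarily the kernel family of the cited paper), the sketch below implements one classical positive-definite Bessel radial kernel, $k(x,y)=\Gamma(\nu+1)\,(2/(\sigma r))^{\nu} J_\nu(\sigma r)$ with $r=\lVert x-y\rVert$, normalized so that $k(x,x)=1$, and plugs it into scikit-learn's `SVC` as a custom kernel. This form is positive-definite on $\mathbb{R}^d$ provided $\nu \ge d/2 - 1$; the parameter values below are arbitrary.

```python
import numpy as np
from scipy.special import jv, gamma
from sklearn.svm import SVC
from sklearn.datasets import make_moons

def bessel_kernel(X, Y, nu=1.0, sigma=3.0):
    """Bessel-type radial kernel k(x,y) = Gamma(nu+1) * (2/(s*r))^nu * J_nu(s*r),
    normalized so k(x,x) = 1; positive-definite on R^d when nu >= d/2 - 1."""
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    z = sigma * r
    out = np.empty_like(z)
    small = z < 1e-8
    out[small] = 1.0                      # limiting value of the expression at r = 0
    zs = z[~small]
    out[~small] = gamma(nu + 1) * (2.0 / zs) ** nu * jv(nu, zs)
    return out

X, y = make_moons(noise=0.2, random_state=0)
clf = SVC(kernel=lambda A, B: bessel_kernel(A, B, nu=1.0, sigma=3.0)).fit(X, y)
print("train accuracy:", clf.score(X, y))
```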

Gaussian Process Covariance (Matérn Kernel): In Gaussian process (GP) models, the popular Matérn covariance function contains a modified Bessel function of the second kind. The Matérn kernel $C_\nu(d)$ (for distance $d$) is defined as:

$$ C_\nu(d) = \sigma^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \Big(\sqrt{2\nu}\,\frac{d}{\rho}\Big)^{\nu} K_\nu\!\Big(\sqrt{2\nu}\,\frac{d}{\rho}\Big) $$

where $K_\nu$ is a modified Bessel function of the second kind and $\nu, \rho$ are positive parameters (Matérn covariance function - Wikipedia). This Bessel-based form gives the Matérn family a desirable flexibility: the parameter $\nu$ controls the smoothness of the GP's samples, with $\nu \to \infty$ yielding the infinitely smooth RBF kernel and $\nu = 1/2$ yielding the absolute-exponential kernel as special cases. Thus, Bessel functions are directly responsible for the Matérn kernel's shape and its interpolation properties. In machine learning applications of GPs, choosing $\nu$ allows a trade-off between model smoothness and fidelity to data; for instance, smaller $\nu$ (drawing more on the Bessel function's non-polynomial behavior) may better fit rough, jumpy data, while larger $\nu$ approaches a Gaussian kernel for very smooth underlying functions. The presence of $K_\nu$ also influences hyperparameter optimization – gradients of the log-likelihood involve derivatives of $K_\nu$, which are well understood and implemented in GP libraries thanks to this analytical form.
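
A quick numerical check of this formula (with $\sigma^2 = 1$; the helper name `matern` is purely illustrative): evaluate the Bessel-$K_\nu$ form directly with SciPy and compare it against scikit-learn's built-in `Matern` kernel, which implements the same family.

```python
import numpy as np
from scipy.special import kv, gamma
from sklearn.gaussian_process.kernels import Matern

def matern(d, nu=1.5, rho=1.0, sigma2=1.0):
    """Matérn covariance evaluated directly from its Bessel-K form."""
    d = np.asarray(d, dtype=float)
    out = np.full_like(d, sigma2)          # C(0) = sigma^2 (the d -> 0 limit)
    pos = d > 0
    z = np.sqrt(2 * nu) * d[pos] / rho
    out[pos] = sigma2 * (2 ** (1 - nu) / gamma(nu)) * z ** nu * kv(nu, z)
    return out

d = np.linspace(0.0, 3.0, 5)
print(matern(d, nu=1.5))

# cross-check against scikit-learn's implementation of the same kernel family
X = d.reshape(-1, 1)
print(Matern(length_scale=1.0, nu=1.5)(np.zeros((1, 1)), X).ravel())
```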

1.3 Summary of Model-Level Uses

In summary, Bessel functions appear in ML models in several ways: as rotation-invariant filter bases in Bessel-CNNs, as trainable activation functions, as basis functions for hidden neurons (Fourier–Bessel series networks), as SVM kernel functions, and inside the Matérn covariance of Gaussian processes.

These uses leverage mathematical properties of Bessel functions – such as orthogonality in polar coordinates, controlled oscillation, and well-behaved tails – to improve or extend learning algorithms.

2. Bessel Functions in Optimization and Convergence

Bessel functions also emerge in the context of optimization algorithms and their theoretical analysis. Here we discuss how they relate to convergence and stability of training processes.

2.1 Convergence Analysis via Bessel Functions

Momentum Methods and Differential Equations: The dynamics of certain optimization algorithms can be analyzed through continuous-time analogues, leading to differential equations whose solutions involve Bessel functions. A notable example is Nesterov's Accelerated Gradient (NAG) method. When optimizing a simple quadratic function, the continuous-time limit of NAG is a second-order ODE, akin to a damped harmonic oscillator with time-dependent friction, and its solution can be written in terms of the Bessel function $J_1$ (first kind, order 1). For a quadratic with curvature $\lambda$, the position $x(t)$ under idealized NAG follows $x(t) \propto \frac{1}{t} J_1(t\sqrt{\lambda})$ (A Differential Equation for Modeling Nesterov's Accelerated Gradient Method). Using the asymptotic form $J_1(t) \sim \sqrt{\tfrac{2}{\pi t}}\cos(t - 3\pi/4)$ for large $t$, this explains the behavior of the accelerated method: an oscillation (the cosine term) combined with a decaying amplitude (the $1/\sqrt{t}$ factor, on top of the $1/t$ prefactor). In practical terms, the Bessel-function solution indicates that Nesterov's method exhibits diminishing oscillations as it converges, rather than a purely exponential decay. This matches the familiar behavior of momentum-based optimizers, where one often observes overshooting and oscillatory traces before the iterates settle. The Bessel function provides a precise description of that effect, helping researchers tune momentum hyperparameters for stability (e.g., preventing divergence from excessive oscillation). It is an elegant case where a special function clarifies convergence dynamics.
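
The correspondence is easy to verify numerically. The sketch below (with an arbitrary curvature $\lambda$) integrates the Su–Boyd–Candès ODE $\ddot{x} + \tfrac{3}{t}\dot{x} + \lambda x = 0$ for a quadratic objective and compares the trajectory with the closed-form Bessel solution $x(t) = 2x_0\, J_1(\sqrt{\lambda}\,t)/(\sqrt{\lambda}\,t)$; integration starts just after $t=0$ to avoid the singular friction term.

```python
import numpy as np
from scipy.special import jv
from scipy.integrate import solve_ivp

lam, x0 = 4.0, 1.0                 # quadratic objective f(x) = (lam/2) x^2

def analytic(t):                   # x(t) = 2*x0 * J_1(sqrt(lam)*t) / (sqrt(lam)*t)
    z = np.sqrt(lam) * t
    return 2.0 * x0 * jv(1, z) / z

def ode(t, y):                     # continuous-time limit of NAG on a quadratic:
    x, v = y                       # x'' + (3/t) x' + lam * x = 0
    return [v, -3.0 / t * v - lam * x]

t0, t1 = 1e-3, 20.0                # start just after 0 to avoid the 3/t singularity
z0 = np.sqrt(lam) * t0
v0 = -2.0 * x0 * jv(2, z0) / t0    # exact derivative of the analytic solution at t0
ts = np.linspace(t0, t1, 400)
sol = solve_ivp(ode, (t0, t1), [analytic(t0), v0], t_eval=ts, rtol=1e-10, atol=1e-12)
print("max deviation from the Bessel solution:",
      np.max(np.abs(sol.y[0] - analytic(ts))))
```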

Damped Oscillations and Stability: More generally, whenever one linearizes an iterative process or studies its continuous limit in polar or spherical coordinates, Bessel functions can appear in the solution, especially for radially symmetric or oscillatory components. For instance, some analyses of gradient descent dynamics in the small-learning-rate limit encounter Bessel functions when evaluating certain integrals or transfer functions (analytic time evolution of gradient descent dynamics). While not commonplace in simple convex optimization proofs, Bessel functions often arise in advanced analyses such as average-case convergence rates or in solving recurrence relations that approximate differential equations. Their appearance usually signals an underlying rotational symmetry or oscillatory behavior in the algorithm's state space. By leveraging known bounds and properties of Bessel functions, one can establish stability conditions or convergence rates for these algorithms. In the NAG example, the boundedness of $J_1(t)$ and its decay ensure the method's iterates remain controlled over time.

2.2 Performance Improvements and Stability Contributions

Improving Optimization via Bessel Properties: In some cases, incorporating Bessel functions into the algorithm or objective can improve convergence behavior. The trainable Bessel activations of Section 1.1, for example, improved loss convergence relative to ReLU, and the order parameter of Bessel-based kernels (Section 1.2) lets the practitioner match the hypothesis space to the target function's smoothness, which tends to produce better-conditioned learning problems.

Convergence Challenges and Mitigation: On the other hand, using Bessel functions in models sometimes introduces new challenges for optimization that must be addressed for stable convergence. Evaluating $I_\nu$, $K_\nu$, or $J_\nu$ and their derivatives is costlier than evaluating elementary functions, and gradients involving Bessel ratios can nearly vanish or become numerically unstable in parts of their domain (see the von Mises–Fisher loss in Section 3.1). These issues are typically mitigated with exponentially scaled implementations, known bounds on Bessel-function ratios, and initialization that keeps training in a well-behaved regime.

In summary, Bessel functions contribute to ML optimization both theoretically – by characterizing algorithm dynamics (as in momentum methods) – and practically – by sometimes providing smoother or more flexible functional forms that lead to better-conditioned optimization problems. Care is needed to handle their computational complexity and ensure stable gradient flows.

3. Roles in Regularization Techniques and Loss Functions

Bessel functions occasionally appear in regularization terms or loss function formulations, particularly when probabilistic models or physical analogies are used in neural networks.

3.1 Loss Functions Involving Bessel Functions

Directional Losses and the von Mises–Fisher (vMF) Distribution: The vMF distribution is often used for modeling unit vectors (e.g., directional data or feature embeddings on a hypersphere). Its probability density on the sphere $S^{n-1}$ is $p(x \mid \mu, \kappa) = C(n,\kappa)\exp(\kappa\, \mu^\top x)$, where the normalization constant $C(n,\kappa)$ involves a modified Bessel function:

$$ C(n,\kappa) = \frac{\kappa^{\frac{n}{2}-1}}{(2\pi)^{\frac{n}{2}}\, I_{\frac{n}{2}-1}(\kappa)} $$

When incorporating such a distribution into a loss function or likelihood, one inevitably deals with $I_\nu(\kappa)$ (the modified Bessel function of the first kind) and its ratios. Scott et al. (2021) proposed a von Mises–Fisher loss for deep embeddings, which encourages the model's outputs to follow a vMF distribution for classification calibration (von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning). In their implementation, the log-loss included terms with $\log I_\nu(\kappa)$, so the gradient had contributions from $I'_\nu(\kappa)/I_\nu(\kappa)$ (a ratio of Bessel functions). This had two important implications:

1. Calibration and confidence: The Bessel $I$ term shapes the distribution of output norms. It helps produce well-calibrated probabilities on the sphere by penalizing norm deviations through $\kappa$, and the Bessel function's growth controls how sharply peaked the distribution is, thereby tuning model confidence.

2. Regularization via the Bessel term: The $I_\nu(\kappa)$ term acts somewhat like a regularizer. For large $\kappa$, $\log I_\nu(\kappa)$ grows roughly linearly in $\kappa$, discouraging overly confident embeddings unless the alignment term truly warrants them. For small $\kappa$, $I_\nu(\kappa) \approx (\kappa/2)^\nu/\Gamma(\nu+1)$ and the density approaches the uniform distribution on the sphere, so the classification term of the loss cannot be reduced, which discourages collapsing embedding norms toward zero. In effect, the Bessel term keeps the embedding norms at a reasonable scale, neither exploding nor collapsing – a regularizing effect on feature magnitudes.

However, directly using Bessel functions in the loss posed optimization hurdles. The gradients involving $I_\nu(\kappa)$ can be near zero for small $\kappa$, stalling learning, or expensive to compute for large $\kappa$ (von Mises-Fisher Loss: An Exploration of Embedding Geometries for Supervised Learning). The authors handled this by leveraging known bounds on the ratio $I_{\nu+1}(\kappa)/I_\nu(\kappa)$ to approximate gradients, and by initializing the model so that $\kappa$ starts in a regime where the Bessel ratio is well behaved. This is a case where understanding the analytic form of the Bessel function was key to implementing a novel loss that improves performance (in calibration and sometimes accuracy) without sacrificing convergence.
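
In practice these quantities can be computed stably with the exponentially scaled Bessel function `scipy.special.ive` (so the factor $e^{\kappa}$ never overflows), as in the sketch below; the function names are illustrative, not the cited paper's code.

```python
import numpy as np
from scipy.special import ive

def vmf_log_normalizer(kappa, n):
    """log C(n, kappa) for the vMF density on S^{n-1}, computed via the
    exponentially scaled Bessel function: log I_nu(k) = log(ive(nu, k)) + k."""
    nu = n / 2.0 - 1.0
    return nu * np.log(kappa) - (n / 2.0) * np.log(2 * np.pi) \
        - (np.log(ive(nu, kappa)) + kappa)

def vmf_mean_resultant(kappa, n):
    """A_n(kappa) = I_{n/2}(kappa) / I_{n/2-1}(kappa); -A_n(kappa) is the
    derivative of the log-normalizer w.r.t. kappa, the ratio that shows up in
    gradients of vMF-based losses. The exponential scaling cancels in the ratio."""
    return ive(n / 2.0, kappa) / ive(n / 2.0 - 1.0, kappa)

kappa = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
print(vmf_log_normalizer(kappa, n=128))
print(vmf_mean_resultant(kappa, n=128))   # near zero for small kappa (stalled gradients)
```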

Other Loss Functions: Outside of directional data, Bessel functions are less commonly seen in loss functions. One could imagine regularization schemes borrowed from physics or signal processing (for example, a loss penalizing deviations from solutions of Bessel's equation, to enforce certain spatial patterns), but these are rare in standard ML practice; the vMF loss stands out as the clearest example in deep learning. Additionally, in some Bayesian deep learning settings, if one places a prior whose normalization constant involves a Bessel function, the log-prior term in the loss will contain Bessel functions. For instance, a prior on weights that favors band-limited functions in polar coordinates might involve an $I_0$ or $K_0$ term. While theoretically possible, such uses remain specialized.

3.2 Regularization Techniques

Implicit Regularization in Kernels: The Bessel kernel for SVMs (Section 1.2) can also be viewed through a regularization lens. Because it smoothly interpolates between different basis functions, using it implicitly constrains the hypothesis space in a way that can control overfitting. The fact that it degenerates to a Gaussian kernel in a limiting case (A New Class of Bessel Kernel Functions for Support Vector Machine) means that at one extreme it imposes a very strong smoothness prior (akin to Tikhonov regularization in function space), while at the other extreme (lower $\nu$) it allows rougher functions. Thus the Bessel kernel class acts as a form of regularization tuning: the Bessel function's order $\nu$ serves as a regularization parameter controlling function complexity. Empirical studies showed that an appropriate choice of $\nu$ yields good generalization, consistent with the idea that Bessel-based similarity measures can reduce overfitting by matching the true function's smoothness.

Regularization via Architectural Constraints: Bessel functions have also played a role in physics-driven regularization. Enforcing that a neural network's output satisfies certain physical equations is a form of regularization in scientific machine learning. If a problem is governed by Bessel's differential equation (common for wave or heat propagation in cylindrical coordinates), its solutions are linear combinations of the Bessel functions $J_\nu$ and $Y_\nu$. By constraining a network or its loss to prefer such solutions, one inherently regularizes it toward physically plausible outputs. This technique, used in physics-informed neural networks (PINNs), can involve adding a term to the loss that penalizes the residual of Bessel's equation (together with known boundary conditions), effectively injecting a Bessel-function prior. While not mainstream in generic ML tasks, it is another avenue where Bessel functions indirectly contribute to regularization.
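
A minimal sketch of such a penalty (PyTorch; the network, collocation points, and weighting are placeholder assumptions): the mean squared residual of Bessel's equation $x^2 y'' + x y' + (x^2 - \nu^2)y = 0$, evaluated with automatic differentiation, is added to the usual training loss to nudge the network toward Bessel-type solutions. A real PINN would also include boundary and data terms.

```python
import torch
import torch.nn as nn

nu = 0.0                                          # order of Bessel's equation
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def bessel_residual_penalty(net, x):
    """Mean squared residual of x^2 y'' + x y' + (x^2 - nu^2) y = 0,
    usable as an extra regularization term in the training loss."""
    x = x.requires_grad_(True)
    y = net(x)
    dy = torch.autograd.grad(y, x, torch.ones_like(y), create_graph=True)[0]
    d2y = torch.autograd.grad(dy, x, torch.ones_like(dy), create_graph=True)[0]
    residual = x**2 * d2y + x * dy + (x**2 - nu**2) * y
    return (residual**2).mean()

x_collocation = torch.rand(256, 1) * 10.0 + 0.1   # collocation points in (0.1, 10.1)
penalty = bessel_residual_penalty(net, x_collocation)
print(penalty.item())
# total_loss = data_loss + lambda_pde * penalty    # added to the usual training loss
```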

In summary, the role of Bessel functions in regularization and losses is targeted to specific domains (like directional outputs or physics contexts). In those domains, Bessel terms can act to normalize, calibrate, or constrain models in beneficial ways – ensuring stability (through proper scaling of outputs) and improving generalization (through smoothness or structural priors).

4. Bessel Functions in Feature Engineering and Transformations

Feature engineering often exploits transforms that make certain patterns easier for models to learn. Bessel functions, due to their appearance in Fourier and cylindrical coordinate transforms, are useful in constructing features for both shallow and deep models, especially when rotational symmetry or frequency content is important.

4.1 Fourier–Bessel Transformations for Features

Fourier–Bessel Series for Signal Representation: In signal processing, the Fourier–Bessel series expansion (FBSE) represents a signal in terms of Bessel-function basis elements rather than pure sinusoids, which suits signals with damped or non-stationary oscillations (Automated Emotion Identification Using Fourier–Bessel Domain-Based Entropies). ML practitioners have used the FBSE to extract features from signals with time-varying frequency content or circular structure. For example, Bhattacharyya et al. (2018) developed an empirical wavelet transform based on the Fourier–Bessel expansion for analyzing non-stationary signals; by projecting signals onto Bessel-function bases, they captured transient oscillatory patterns more effectively than a standard Fourier transform. Such Bessel-based signal features have been applied to tasks like EEG analysis and speech emotion recognition: Sharma et al. (2022) combined a Fourier–Bessel wavelet decomposition with feature selection to improve automated emotion recognition from speech. The use of Bessel functions in these features is crucial – Bessel functions naturally model signals with decaying oscillations (thanks to their $1/\sqrt{x}$ envelope for large arguments) and can represent radial frequency content (important for 2-D signals and images). In practice, an ML pipeline might transform raw signals with the FBSE and then feed the coefficients, or band energies derived from them, as input features to a classifier or neural network. The benefit observed is often a more compact or invariant representation, leading to better accuracy with fewer training examples.
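
For concreteness, here is a compact sketch of the zeroth-order Fourier–Bessel series expansion of a 1-D signal in one standard discretization (the cited works build more elaborate pipelines on top of this): the signal is projected onto $J_0(\alpha_k n/N)$, where $\alpha_k$ are the zeros of $J_0$, and each order $k$ maps to an approximate frequency $\alpha_k f_s/(2\pi N)$. The helper name `fbse` is illustrative.

```python
import numpy as np
from scipy.special import j0, j1, jn_zeros

def fbse(x):
    """Zeroth-order Fourier-Bessel series coefficients of a 1-D signal:
    C_k = 2 / (N^2 * J_1(a_k)^2) * sum_n n * x[n] * J_0(a_k * n / N),
    where a_k are the first N positive zeros of J_0."""
    N = len(x)
    n = np.arange(1, N + 1)
    alphas = jn_zeros(0, N)
    basis = j0(np.outer(alphas, n) / N)          # (N, N) Bessel analysis matrix
    coeffs = (2.0 / (N**2 * j1(alphas) ** 2)) * (basis @ (n * x))
    return coeffs, alphas

fs, N = 256, 256
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 20.0 * t)                 # 20 Hz tone
coeffs, alphas = fbse(x)
freqs = alphas * fs / (2 * np.pi * N)            # FBSE order -> approximate frequency
print("spectral peak near", freqs[np.argmax(np.abs(coeffs))], "Hz")  # roughly 20 Hz
```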

Polar and Spherical Feature Representations: When data lives on an inherently circular or spherical domain (images, spatial data, 3-D point clouds), transforming to a coordinate system in which Bessel functions arise can simplify feature extraction. On a disk, for example, the Hankel (polar Fourier) transform and Fourier–Bessel expansions discussed below yield radial-frequency features, and Zernike-style radial moments (Section 4.2) provide compact, rotation-robust shape descriptors.

4.2 Transformations in Deep Learning Architectures

Scattering Transforms and Bessel Functions: The scattering transform (an unsupervised feature extractor introduced by Mallat) uses wavelets to capture signal structure. If one chooses circular wavelets, their radial profiles can be designed from Bessel functions; for example, a wavelet built from $J_0$ or $J_1$ can be given a band-pass response whose Fourier transform is localized in a ring. Although not common, it is conceivable to integrate such wavelets into a deep network's first layer to extract rotation-invariant features. The earlier-cited works on Bessel invariants essentially performed a form of scattering: first transforming images with Bessel bases, then feeding the results into a learning algorithm. Modern deep networks could incorporate the same idea as fixed front-end layers, or as an initialization for convolutional filters that are then fine-tuned.

Dimension Reduction and Encoding: Another use is in feature encoding for dimensionality reduction. If one needs to compress data that lies on a circular domain, the Bessel-function eigenbasis of the Laplacian on that domain is a natural choice, since it captures radial and angular structure with few terms. For example, PCA on images of circular objects can be improved by first expressing each image in a Bessel-function basis (Bessel functions diagonalize the Laplace operator on a disk). This is related to Zernike moments in image analysis: Zernike polynomials involve radial polynomials that are mathematically related to Bessel functions, and they are used to create compact representations of shapes. In deep learning, one might use these as inputs, or even design a network layer that computes Zernike-moment-like features, giving the network a built-in invariance to certain transformations.

In essence, Bessel functions enter feature engineering pipelines whenever a problem has a natural expression in cylindrical or spherical coordinates, or where oscillatory basis functions are beneficial. By using Bessel-based transforms, features can become more invariant (to rotations or scaling) and more informative (capturing frequency content in a localized way) for the subsequent learning algorithm.

5. Bessel Functions in Computer Vision Applications

Computer vision often leverages signal processing techniques where Bessel functions play a role, especially in filtering and frequency analysis for images. Here we highlight key areas in vision where Bessel functions are utilized.

5.1 Filtering and Convolution Operations

Jinc Filters for Image Resampling: The jinc function, defined as $\mathrm{jinc}(r) = J_1(r)/r$, is derived directly from the Bessel function $J_1$ and is the 2-D radial analogue of the sinc function (Jinc function: Bessel function analog to the sinc function). In image processing, the jinc serves as the ideal low-pass filter kernel for circularly symmetric filtering. When resizing or rotating images, pixel values must be interpolated; using a jinc-based kernel (sometimes called Bessel interpolation, or a Lanczos-style filter with a jinc window) can produce high-quality results with minimal aliasing. Libraries such as ImageMagick expose a "Bessel" or "Jinc" resampling filter that accounts for the circular symmetry of images (Bessel filter option works for v6.9.9-17 but not in v7.0.7-5 · Issue #829). The jinc's Fourier transform is supported on a disk (just as the sinc's Fourier transform is a rectangular band limit in 1-D), which is why it serves as the ideal band-limiter in 2-D. Practically, this means Bessel-based convolution kernels can preserve image detail while reducing artifacts, improving the performance of vision systems that rely on preprocessed inputs (e.g., a CNN fed less-aliased images may achieve higher accuracy).
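
A small sketch of such a kernel (window size and cutoff are arbitrary choices): the spatial impulse response of an ideal circular band-limiter is a jinc, which is truncated, tapered with a Hann window to reduce ringing, and normalized to unit sum.

```python
import numpy as np
from scipy.special import j1

def jinc(r):
    """jinc(r) = J_1(r) / r, with its limiting value 1/2 at r = 0."""
    r = np.asarray(r, dtype=float)
    out = np.full_like(r, 0.5)
    nz = r != 0
    out[nz] = j1(r[nz]) / r[nz]
    return out

def jinc_lowpass_kernel(radius=8, cutoff=0.25):
    """Circularly symmetric low-pass kernel (cutoff in cycles/pixel) built from
    the jinc impulse response, Hann-windowed and normalized to unit sum so the
    mean intensity of a filtered image is preserved."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.hypot(xs, ys)
    kern = jinc(2 * np.pi * cutoff * r)            # ideal response, up to a constant
    kern *= np.where(r <= radius, 0.5 * (1 + np.cos(np.pi * r / radius)), 0.0)
    return kern / kern.sum()

kernel = jinc_lowpass_kernel()
print(kernel.shape, round(kernel.sum(), 6))
# e.g. scipy.ndimage.convolve(image, kernel) before downsampling to limit aliasing
```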

Defocus Blur and Point Spread Functions: A classic appearance of Bessel functions in vision is in modeling defocused images. The diffraction-limited point spread function (PSF) of a circular aperture (such as a camera lens) is the "Airy disk", whose intensity profile is $[2J_1(r)/r]^2$, and the common geometric model of defocus (a uniform disk PSF) has an optical transfer function (the Fourier transform of the PSF) proportional to $J_1(\omega)/\omega$ (Defocused Images). Specifically, for spatial-frequency radius $p = \sqrt{u^2+v^2}$, one model is $H(u,v) = J_1(a p)/(a p)$, often called the sombrero function because of its shape. This Bessel-derived function describes how different frequencies are attenuated by defocus blur. Vision algorithms for deblurring or autofocusing incorporate this model: knowing that the blur transfer function is a Bessel expression allows deconvolution algorithms to invert it specifically, for example by filtering the image with a (regularized) inverse of $J_1(r)/r$ in the frequency domain, within stability limits. Likewise, when simulating realistic camera effects for graphics or data augmentation, generating blur via Bessel functions yields physically accurate results. The circular symmetry of lenses makes Bessel functions the natural description of these effects, so vision tasks involving circular apertures, diffraction, or isotropic filters will likely encounter them.
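
The following sketch (arbitrary defocus parameter `a`, illustrative helper names) builds this sombrero-shaped transfer function, uses it to simulate defocus blur in the frequency domain, and applies a simple regularized inverse filter to partially undo the Bessel attenuation.

```python
import numpy as np
from scipy.special import j1

def defocus_otf(shape, a):
    """Sombrero-shaped transfer function H(u, v) = 2*J_1(a*p)/(a*p), where
    p = sqrt(u^2 + v^2); the factor 2 normalizes H(0, 0) to 1."""
    u = np.fft.fftfreq(shape[0])[:, None]
    v = np.fft.fftfreq(shape[1])[None, :]
    ap = a * np.hypot(u, v)
    H = np.ones(shape)
    nz = ap > 0
    H[nz] = 2.0 * j1(ap[nz]) / ap[nz]
    return H

rng = np.random.default_rng(0)
image = rng.random((128, 128))
H = defocus_otf(image.shape, a=40.0)
blurred = np.fft.ifft2(np.fft.fft2(image) * H).real           # simulate defocus blur
# regularized inverse filter to partially undo the Bessel attenuation:
restored = np.fft.ifft2(np.fft.fft2(blurred) * H / (H**2 + 1e-2)).real
print("MSE blurred:", np.mean((blurred - image) ** 2),
      "MSE restored:", np.mean((restored - image) ** 2))
```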

Bessel Filters in the Frequency Domain: The term "Bessel filter" in electronics refers to a filter with maximally flat phase delay, but a related idea exists in image processing. A 2-D filter can be designed to have a specific radially symmetric frequency response using Bessel functions. While classical digital image processing rarely invokes Bessel functions explicitly beyond the jinc, some techniques use them to design custom convolution kernels that meet particular criteria (such as rotational symmetry or a prescribed roll-off). For instance, one could design a band-pass filter that emphasizes features at a certain radial frequency by using a Bessel $J_0$ pattern (a central peak with an oscillating, decaying tail) as the spatial-domain kernel; this accentuates ring-shaped structures in an image. Such tailored filters can serve as feature extractors in classical vision pipelines (e.g., detecting circular textures or ripples).

5.2 Frequency-Based Representations and Analysis

Polar Fourier Transform (Hankel Transform): As noted above, the Fourier transform in polar coordinates involves Bessel functions (the Hankel transform uses $J_0$ for the radial part of circularly symmetric functions, and $J_m$ more generally). In computer vision, analyzing an image's frequency content in polar form is useful for rotation-invariant recognition. Radial-descriptor techniques perform a polar Fourier transform, obtaining coefficients $F(m, k)$ where $m$ is the angular frequency and $k$ indexes radial frequency (with Bessel functions linking $k$ to spatial frequency). These coefficients can be used as features that are invariant to rotation and robust to modest scale changes. Because the Bessel functions $J_m(kr)$ serve as the kernel of this transform, any algorithm using a polar frequency representation is implicitly using Bessel functions; they ensure the analysis treats the image in a rotation-symmetric way. Vision tasks such as texture classification and object recognition have benefited from such representations – a texture can be recognized by the energy distribution of its Fourier–Bessel coefficients, which is the same regardless of orientation.

Shape Descriptors (Radial Moments): Many shape descriptors used in image analysis (e.g., for object detection or OCR of symbols) are radial moments related to Bessel functions. For example, Zernike moments – widely used for character and trademark recognition – involve a radial polynomial $R_{n,m}(r)$ multiplied by $e^{im\theta}$. While $R_{n,m}$ is not exactly a Bessel function, it is mathematically related and can be expanded in series involving Bessel functions; Fourier–Bessel descriptors go further by projecting the image directly onto $J_m$ functions radially. These descriptors have useful properties such as orthogonality and completeness on the unit disk, and they have been used to classify shapes independently of rotation. In modern deep learning, one might not hand-craft such descriptors, since convolutional networks can learn equivalent features, but they remain useful analysis tools: for instance, one can inspect the frequency response of learned CNN filters and notice that some resemble Bessel-like radial patterns (especially in networks trained for isotropic pattern detection).

Steerable Filters and Bessel Functions: Steerable filters are filters whose response at any rotation can be expressed as a combination of a few basis filters. Bessel functions appear in the theory of steerable filters when dealing with circular harmonics. A steerable filter that is radially Gaussian and angularly sinusoidal can be modified by replacing the Gaussian (isotropic but not band-limited) with a jinc-type Bessel profile to obtain a different radial response – for instance, a filter that is both steerable and compactly supported in frequency, as in some implementations of oriented Gabor-like filters whose radial part is a jinc. Although these details sit behind the scenes, they shaped the design of features in earlier vision systems and can inspire modern equivalents, such as initializing CNN filters for better rotation coverage.

5.3 Notable Vision Applications Summary

To summarize, the most significant uses of Bessel functions in computer vision include jinc-based interpolation and anti-aliasing filters for resampling, Airy-disk and sombrero models of diffraction and defocus for deblurring and realistic blur synthesis, polar Fourier (Hankel) analysis for rotation-invariant recognition, radial shape descriptors such as Fourier–Bessel and Zernike moments, and circularly symmetric or steerable filter design.

Computer vision, being closely tied to image physics and signal processing, encounters Bessel functions whenever a problem has circular symmetry or requires converting radial patterns into useful signals. Integrating these functions into ML models can yield more stable, interpretable, and invariant visual recognition systems.

6. Conclusion

Bessel functions, while originating in mathematical physics, have carved out a multifaceted role in machine learning. They appear in specialized neural networks to enforce symmetries and enrich activation dynamics, in kernel methods to define flexible similarity measures, and in the theoretical analysis of algorithms to explain convergence behaviors. In optimization, they both illuminate the behavior of methods like momentum-based optimizers and present challenges (e.g. gradient computation in vMF losses) that can be overcome with analytical insight. In regularization and loss design, Bessel functions underpin distributions and priors that keep models well-behaved, particularly for directional data. For feature engineering, Bessel-based transforms offer powerful ways to capture patterns in data that are rotationally or radially symmetric, leading to more invariant and compact representations. Finally, in computer vision, Bessel functions are ingrained in core techniques for filtering, frequency analysis, and modeling optical effects – bridging the gap between image physics and algorithmic processing.

In essence, Bessel functions contribute to convergence (through well-posed transforms and analytic forms that make training dynamics understandable), stability (through inherent normalization and symmetry properties), and performance (by enabling invariances and richer function classes) across various corners of machine learning. Ongoing research continues to find new ways to leverage these classical functions for modern ML challenges – from achieving rotation invariance in deep networks (Achieving Rotational Invariance with Bessel-Convolutional Neural Networks | OpenReview) to crafting new kernels (A New Class of Bessel Kernel Functions for Support Vector Machine) and activation functions (Quaternionic Convolutional Neural Networks with Trainable Bessel Activation Functions). As machine learning models grow more complex and incorporate more domain knowledge, special functions like Bessel's are likely to remain valuable tools in both the theorist's and the practitioner's toolkit.