GSoC 2025 ‐ Gururaj Gurram

Gururaj Gurram edited this page Aug 28, 2025 · 6 revisions

About me

Hey there! I am Gururaj Gurram, a pre-final year Computer Science undergraduate at Walchand Institute of Technology, Solapur, India. I am deeply passionate about programming, mathematics, and software development. Beyond this, I have a strong interest in machine learning, large language models, and their real-world applications, which have always fascinated me. I also have a keen curiosity to explore emerging domains within computer science and continuously expand my knowledge.

Project Overview

Statistical functions serve as foundational building blocks for data science, machine learning, scientific simulations, financial modeling, and signal processing. Given that stdlib is a comprehensive mathematical and statistical library, my project was focused on enhancing the overall statistical routines revolving around the stats namespace.

Objectives

The primary goals of the project were:

  • Introduce convenience array wrappers for all 1D strided APIs in the stats/strided/* namespace, simplifying access to statistical functions for common use cases.
  • Add one-dimensional ndarray packages in stats/base/ndarray/*.
  • Develop high-level ndarray reduction APIs in stats/*.
  • Add C ndarray interfaces and accessor array support, refactoring all single- and double-precision packages in the stats/base/* namespace.
  • Migrate the refactored packages from stats/base/* to stats/strided/*.
  • Implement binary reduction APIs in ndarray/base/* to support binary statistical kernels.

Approach

My approach involved several stages:

  1. Refactoring: Updated all packages in stats/base/* that lacked C ndarray interfaces and ensured that general dtype packages included accessor array support.
  2. Migration: Transferred the refactored packages to the stats/strided/* namespace.
  3. Enhancements: Introduced convenience array wrappers and one-dimensional ndarray packages, enabling the development of high-level ndarray reduction APIs in stats/*.
  4. Parallel Work: Concurrently developed binary reduction APIs in ndarray/base/*, which serve as core utilities for binary statistical kernels.

Through this step-by-step process, I was able to improve the statistical routines and make them easier to use, while also contributing to the larger goal of making stdlib a stronger and more practical library for numerical and statistical computing.

Project recap

During the community bonding period, I began coding and exploring the prerequisite stats packages. Understanding the codebase was not particularly difficult, as I was already familiar with it through prior contributions. I started my project with:

Refactoring and C Ndarray Implementation

Many single- and double-precision packages in the stats/base/* namespace (later migrated to stats/strided/*) were missing C ndarray implementations and required basic refactoring to align with the latest stdlib conventions. I initiated my work in this area and successfully added complete C ndarray implementations across all relevant packages.

The C ndarray implementation introduces an additional offset parameter, allowing users to specify the starting index for operations while maintaining stride control. This provides greater flexibility and usability for real-world applications.
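The role of the added offset parameter can be pictured with a small plain-JavaScript model (an illustrative sketch only, not the actual C implementation; `maxNdarray` is a hypothetical name):

```javascript
// Sketch of ndarray-style reduction semantics: `offset` sets the starting
// index into the buffer, while `stride` sets the step between elements.
function maxNdarray( N, x, stride, offset ) {
    var ix = offset;
    var m = x[ ix ];
    var i;
    for ( i = 1; i < N; i++ ) {
        ix += stride;
        if ( x[ ix ] > m ) {
            m = x[ ix ];
        }
    }
    return m;
}

// Reduce over every other element, starting at index 1:
var v = maxNdarray( 2, [ 1.0, -2.0, 3.0, 0.0, 5.0 ], 2, 1 );
// returns 0.0
```

Without the offset parameter, starting mid-buffer would require callers to create temporary views or copies; with it, the same buffer can back many logical slices.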

Refactoring and Accessor Array Support

Concurrently, I worked on adding accessor array support for all generic packages that lacked it. Accessor arrays reduce friction for both developers and consumers by enabling clean, consistent, and efficient data manipulation without exposing the internal representation of arrays.

  • Example: Accessor Array Support

    var toAccessorArray = require( '@stdlib/array/base/to-accessor-array' );
    var max = require( '@stdlib/stats/strided/max' );
    
    var x = toAccessorArray( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );
    
    var v = max.ndarray( 4, x, 2, 1 );
    // returns 4.0

Here, accessor arrays allow for uniform handling of different array-like objects by abstracting element access through getter/setter functions. This ensures consistency and improves maintainability across packages, while also providing developers with a safer, higher-level interface for array operations.
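The accessor pattern itself can be sketched in a few lines of plain JavaScript (illustrative only; `toAccessor` and `maxValue` are hypothetical names, not stdlib internals):

```javascript
// An accessor array exposes `get`/`set` methods instead of direct
// indexing, so one routine can serve plain arrays and wrapped data
// sources uniformly.
function toAccessor( data ) {
    return {
        'length': data.length,
        'get': function get( i ) {
            return data[ i ];
        },
        'set': function set( v, i ) {
            data[ i ] = v;
        }
    };
}

// A reduction which works with both plain and accessor arrays:
function maxValue( x ) {
    var get = ( typeof x.get === 'function' ) ? x.get : function ( i ) {
        return x[ i ];
    };
    var m = get( 0 );
    var i;
    for ( i = 1; i < x.length; i++ ) {
        if ( get( i ) > m ) {
            m = get( i );
        }
    }
    return m;
}

var v1 = maxValue( [ 1.0, 4.0, 2.0 ] );
// returns 4.0

var v2 = maxValue( toAccessor( [ 1.0, 4.0, 2.0 ] ) );
// returns 4.0
```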

Migration of Packages

Once accessor array support and C ndarray implementations were in place, the next task was to migrate the packages from the stats/base/* namespace to stats/strided/*. This migration brought the packages in line with the latest organizational conventions of stdlib. The migration process followed the guidelines documented in the official contributing guide. By systematically moving these packages, I ensured consistency across the codebase, improved discoverability for developers, and maintained alignment with the evolving structure of the library.

1D Array Convenience Wrappers

Following the migration, I worked on adding 1D array convenience wrappers in the stats/array/* namespace for all the strided packages in stats/strided/*. These wrappers simplify usage by allowing developers to directly pass a standard array without needing to manage strides and offsets explicitly. This significantly improves ease of use while still leveraging the performance and robustness of the underlying strided implementations.

  • Example

    var max = require( '@stdlib/stats/array/max' );
    
    var x = [ 1.0, -2.0, 2.0 ];
    
    var v = max( x );
    // returns 2.0

This approach ensures that developers can access statistical functions with minimal boilerplate, making stdlib more approachable for everyday use cases while still maintaining full support for advanced, lower-level strided operations.
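Conceptually, such a wrapper forwards the plain array to the strided kernel with a trivial layout (a hedged sketch with hypothetical names, not the actual stdlib source):

```javascript
// A simplified stand-in for a strided kernel like stats/strided/max:
function stridedMax( N, x, stride, offset ) {
    var ix = offset;
    var m = x[ ix ];
    var i;
    for ( i = 1; i < N; i++ ) {
        ix += stride;
        if ( x[ ix ] > m ) {
            m = x[ ix ];
        }
    }
    return m;
}

// The array wrapper supplies N = x.length, stride = 1, offset = 0:
function arrayMax( x ) {
    return stridedMax( x.length, x, 1, 0 );
}

var v = arrayMax( [ 1.0, -2.0, 2.0 ] );
// returns 2.0
```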

1-Dimensional NdArray Packages

The next step involved introducing 1D ndarray implementations within stats/base/ndarray/*. These functions were not just standalone utilities but served as the foundation for higher-level reduction APIs in stats/*. By leveraging low-level utilities such as numelDimension, getStride, getOffset, and getData, they ensured that statistical operations could be seamlessly extended to work over one or more dimensions of an ndarray. In this way, the 1D ndarray packages acted as the building blocks upon which flexible, multi-dimensional reduction routines were constructed. This design avoided code duplication, promoted modularity, and allowed stdlib to support advanced reduction operations while maintaining high performance and consistency across its statistical ecosystem.

  • Example

    var ndarray = require( '@stdlib/ndarray/base/ctor' );
    var max = require( '@stdlib/stats/base/ndarray/max' );
    
    var xbuf = [ 1.0, 3.0, 4.0, 2.0 ];
    var x = new ndarray( 'generic', xbuf, [ 4 ], [ 1 ], 0, 'row-major' );
    
    var v = max( [ x ] );
    // returns 4.0
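Internally, such a 1D ndarray kernel can be pictured as extracting the buffer, shape, stride, and offset from the ndarray-like object and iterating accordingly (a rough plain-JavaScript sketch, not stdlib internals):

```javascript
// Hypothetical 1D ndarray max: reads layout metadata from an
// ndarray-like object and performs a strided reduction over it.
function ndarrayMax( arrays ) {
    var x = arrays[ 0 ];
    var buf = x.data;
    var N = x.shape[ 0 ];
    var stride = x.strides[ 0 ];
    var ix = x.offset;
    var m = buf[ ix ];
    var i;
    for ( i = 1; i < N; i++ ) {
        ix += stride;
        if ( buf[ ix ] > m ) {
            m = buf[ ix ];
        }
    }
    return m;
}

var x = {
    'data': [ 1.0, 3.0, 4.0, 2.0 ],
    'shape': [ 4 ],
    'strides': [ 1 ],
    'offset': 0,
    'order': 'row-major'
};

var v = ndarrayMax( [ x ] );
// returns 4.0
```

Because the kernel only consumes layout metadata, a higher-level reduction API can hand it any 1D view of a larger ndarray without copying data.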

Development of Higher-Level Reduction APIs

Building on the 1D ndarray implementations, the next step was designing higher-level reduction APIs within stats/*. These serve as user-facing entry points for reductions (e.g., max, min, mean, variance) on ndarray objects, with support for dimension selection and flexible output handling. This work bridged the gap between low-level kernels and real-world applications, delivering both performance and usability.

The key goal was to create a clean abstraction layer over stats/base/ndarray/* routines, extending them to handle multi-dimensional reductions. The design ensured consistency with stdlib’s API principles—predictable signatures, strong error handling, and support for diverse dtype policies—while leaving room for future extensions like higher moments and percentiles.

Technical Approach

  1. Dispatch Mechanism: Used unary-reduce-strided1d-dispatch-factory to dynamically select the appropriate kernel implementation (e.g., dmax, smax, or a generic fallback) based on the input dtype.
  2. Options Handling: Added optional parameters (dims, keepdims, dtype) to give users fine-grained control:
    • dims: specify dimensions for reduction
    • keepdims: retain reduced dimensions as singletons for broadcasting compatibility
    • dtype: enforce output dtype for consistency
  3. Accessor & Strided Arrays: Ensured reductions work seamlessly with accessor arrays and strided layouts, keeping the APIs versatile.
  • Example

    var ndarray2array = require( '@stdlib/ndarray/to-array' );
    var array = require( '@stdlib/ndarray/array' );
    var max = require( '@stdlib/stats/max' );
    
    var x = array( [ -1.0, 2.0, -3.0, 4.0 ], {
        'shape': [ 2, 2 ],
        'order': 'row-major'
    });
    
    var v = ndarray2array( x );
    // returns [ [ -1.0, 2.0 ], [ -3.0, 4.0 ] ]
    
    var y = max( x, {
        'dims': [ 0 ],
        'keepdims': true
    });
    // returns <ndarray>
    
    v = ndarray2array( y );
    // returns [ [ -1.0, 4.0 ] ]
    
    y = max( x, {
        'dims': [ 1 ],
        'keepdims': true
    });
    // returns <ndarray>
    
    v = ndarray2array( y );
    // returns [ [ 2.0 ], [ 4.0 ] ]
    
    y = max( x, {
        'dims': [ 0, 1 ],
        'keepdims': true
    });
    // returns <ndarray>
    
    v = ndarray2array( y );
    // returns [ [ 4.0 ] ]

Overall, the higher-level reduction APIs transformed low-level ndarray routines into powerful, user-friendly tools, reinforcing stdlib’s role as a robust, high-performance numerical computing library.

Implementation of ztest2

Another significant addition during this project was the implementation of the two-sample Z-test (ztest2). This statistical routine enables hypothesis testing for comparing the means of two independent samples (X and Y) under the assumption that their population standard deviations are known. It supports the standard three forms of hypotheses:

  • H0: μX - μY ≥ Δ versus H1: μX - μY < Δ
  • H0: μX - μY ≤ Δ versus H1: μX - μY > Δ
  • H0: μX - μY = Δ versus H1: μX - μY ≠ Δ

Here, μX and μY represent the population means, and Δ is the hypothesized difference in means (commonly 0).
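The underlying test statistic is the standard two-sample Z formula, Z = ((x̄ - ȳ) - Δ) / sqrt(σX²/nX + σY²/nY). A small sketch of that calculation (textbook statistics, not the stdlib implementation; `zstat` is a hypothetical helper):

```javascript
// Compute the arithmetic mean of an array:
function mean( a ) {
    var s = 0.0;
    var i;
    for ( i = 0; i < a.length; i++ ) {
        s += a[ i ];
    }
    return s / a.length;
}

// Z = ( (x̄ - ȳ) - Δ ) / sqrt( σx²/nx + σy²/ny )
function zstat( x, y, delta, sigmax, sigmay ) {
    var se = Math.sqrt( ( ( sigmax*sigmax ) / x.length ) + ( ( sigmay*sigmay ) / y.length ) );
    return ( ( mean( x ) - mean( y ) ) - delta ) / se;
}

var z = zstat( [ 4.0, 4.0, 6.0, 6.0, 5.0 ], [ 3.0, 3.0, 5.0, 7.0, 7.0 ], 0.0, 1.0, 2.0 );
// returns 0.0
```

With both sample means equal to 5.0 and Δ = 0, the statistic is exactly zero, i.e., no evidence against the two-sided null hypothesis.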

  • Example

    var Results = require( '@stdlib/stats/base/ztest/two-sample/results/float64' );
    var ztest2 = require( '@stdlib/stats/strided/ztest2' );
    
    var x = [ 4.0, 4.0, 6.0, 6.0, 5.0 ];
    var y = [ 3.0, 3.0, 5.0, 7.0, 7.0 ];
    
    var results = new Results();
    var out = ztest2( x.length, y.length, 'two-sided', 0.05, 0.0, 1.0, x, 1, 2.0, y, 1, results );
    // returns {...}

By implementing ztest2, stdlib now offers a reliable tool for comparing two samples under known population variances. This expands its hypothesis testing suite and reinforces its role as a robust, high-performance numerical computing library.

Lower-Level ndarray Reduction Kernels

To enable efficient reductions across specified dimensions of two input ndarrays, I worked on extending the support for binary reduction kernels. While unary reduction functionality already existed, this work introduced the ability to perform reductions via one-dimensional strided array binary reduction functions. This included the development of the following core components:

1. @stdlib/ndarray/base/binary-reduce-strided1d

This kernel enables binary reduction operations over strided 1D slices of higher-dimensional arrays. It generalizes functionality to support reduction along specific axes of two input ndarrays, producing a reduced output.

  • Example

    var Float64Array = require( '@stdlib/array/float64' );
    var ndarray2array = require( '@stdlib/ndarray/base/to-array' );
    var gdot = require( '@stdlib/blas/base/ndarray/gdot' );
    var binaryReduceStrided1d = require( '@stdlib/ndarray/base/binary-reduce-strided1d' );
    
    // Create data buffers:
    var xbuf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0 ] );
    var ybuf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0 ] );
    var zbuf = new Float64Array( [ 0.0, 0.0, 0.0 ] );
    
    // Shapes:
    var xsh = [ 1, 3, 2, 2 ];
    var ysh = [ 1, 3, 2, 2 ];
    var zsh = [ 1, 3 ];
    
    // Strides:
    var sx = [ 12, 4, 2, 1 ];
    var sy = [ 12, 4, 2, 1 ];
    var sz = [ 3, 1 ];
    
    // Create input ndarray-like objects:
    var x = { 'dtype': 'float64', 'data': xbuf, 'shape': xsh, 'strides': sx, 'offset': 0, 'order': 'row-major' };
    var y = { 'dtype': 'float64', 'data': ybuf, 'shape': ysh, 'strides': sy, 'offset': 0, 'order': 'row-major' };
    
    // Create an output ndarray-like object:
    var z = { 'dtype': 'float64', 'data': zbuf, 'shape': zsh, 'strides': sz, 'offset': 0, 'order': 'row-major' };
    
    // Perform a reduction:
    binaryReduceStrided1d( gdot, [ x, y, z ], [ 2, 3 ] );
    
    var arr = ndarray2array( z.data, z.shape, z.strides, z.offset, z.order );
    // returns [ [ 30.0, 174.0, 446.0 ] ]

2. binary-reduce-strided1d-dispatch-factory

A factory function for creating binary reduction operations that are tailored to specific input/output ndarray dtypes. It generates optimized functions that dispatch to the appropriate underlying kernel.

3. binary-reduce-strided1d-dispatch

A constructor utility that provides a structured interface for performing binary reductions. It leverages the factory to ensure the correct routine is called depending on input/output dtypes.
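The dispatch idea behind these two packages can be sketched generically (hypothetical names and table layout; the real factory resolves dtypes, policies, and kernels with considerably more machinery):

```javascript
// A factory returning a resolver which maps input dtype pairs to
// specialized kernels, falling back to a generic implementation:
function dispatchFactory( table, fallback ) {
    return function dispatch( xdtype, ydtype ) {
        return table[ xdtype + ',' + ydtype ] || fallback;
    };
}

function ddot() {} // stand-in for a double-precision kernel
function gdot() {} // stand-in for a generic fallback kernel

var resolve = dispatchFactory( { 'float64,float64': ddot }, gdot );

var b1 = ( resolve( 'float64', 'float64' ) === ddot );
// returns true

var b2 = ( resolve( 'generic', 'float32' ) === gdot );
// returns true
```

Separating the lookup table from the reduction machinery lets new dtype-specialized kernels be registered without touching the reduction logic itself.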

4. binary-input-casting-dtype

A utility for resolving the correct input casting data type for a binary operation, ensuring type promotion rules are consistently applied. This is critical for maintaining precision and correctness in mixed-type computations.

  • Example

    var binaryInputCastingDataType = require( '@stdlib/ndarray/base/binary-input-casting-dtype' );
    
    var dt = binaryInputCastingDataType( 'int32', 'int32', 'float64', 'promoted' );
    // returns 'float64'
    
    dt = binaryInputCastingDataType( 'float32', 'float32', 'float64', 'complex128' );
    // returns 'complex128'
    
    dt = binaryInputCastingDataType( 'float16', 'float16', 'float64', 'accumulation' );
    // returns 'float16'

Completed Work

1. Array Convenience Wrappers

  • Status: Completed, except for a few packages.
  • PRs: Array Packages
  • Pending Packages: cumax, cumaxabs, cumin, cuminabs

2. C ndarray Implementation

3. Lower-Level ndarray Reduction Kernels

4. Generic Datatype Packages in stats/strided/*

  • Status: Completed.
  • Notes: Added accessor array support to all generic datatype packages.

Current State

1. Array Convenience Wrappers

  • Pending: cumax, cumaxabs, cumin, cuminabs remain due to unclear behavior.

2. C ndarray Implementation

  • Pending: snanvariance* and snanstdev* blocked by reliance on blas/ext/base/snannsumpw.
  • Issue: Numerical safety concerns since float32 max safe integer is too small for large arrays.
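The concern can be demonstrated directly: above 2^24, consecutive integers are no longer representable in single precision, so float32 counts and sums over large arrays silently lose accuracy (Math.fround rounds a value to the nearest float32):

```javascript
// 2^24 = 16777216 is the largest integer before float32 gaps exceed 1:
var v = Math.fround( 16777216.0 );
// returns 16777216

v = Math.fround( 16777217.0 );
// returns 16777216 (16777217 is not representable and rounds down)
```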

3. Migration

  • Pending: Some packages still need migration from stats/base/* to stats/strided/*.

What Remains

1D ndarray Packages, High-Level ndarray Reduction APIs

  • PRs: ndarray Packages, Reduction APIs
  • Note: The 1D ndarray packages go hand-in-hand with the high-level ndarray reduction APIs in the stats/* namespace. These APIs are key to simplifying multi-dimensional reductions, but are progressing at a slower pace.

Challenges and Lessons Learned

Initially, while working on the C ndarray implementations and adding accessor array support for the generic dtype packages, the tasks were relatively straightforward due to clear objectives and a proper understanding of the work. The only significant challenge arose while working with blas/ext/base/snannsumpw, which is used in snanvariance* and snanstdev*. This remains unresolved and continues to be a blocker.

In contrast, the array convenience wrappers in stats/array/*, the 1D ndarray packages in stats/base/ndarray/*, and the higher-level reduction APIs in stats/* were simple to implement and did not present major difficulties.

Certain statistical packages, such as ztest2, required additional research to fully understand their working and implementation. However, this made the task intellectually engaging and rewarding.

The lower-level ndarray reduction kernels in ndarray/base/* posed more of a challenge. At first, they were difficult to grasp, but as I progressed, the connections between components became clearer. Larger packages like binary-reduce-strided1d, which handled multi-dimensional reductions, were particularly complex and tedious due to their extensive documentation and implementation. Nevertheless, subsequent layers such as binary-reduce-strided1d-dispatch and binary-reduce-strided1d-dispatch-factory were more straightforward, building directly on top of the initial work.

Overall, these challenges taught me the importance of breaking down complex problems, leveraging documentation effectively, and seeking timely guidance when necessary. I also learned that even initially overwhelming tasks can become manageable once the underlying design patterns and abstractions are understood.

Conclusion

This project gave me the opportunity to contribute across multiple layers of the stdlib ecosystem, from C ndarray implementations and convenience wrappers to high-level statistical APIs and low-level reduction kernels. Along the way, I faced both straightforward tasks and challenging blockers, each of which strengthened my understanding of numerical computing and library design. I am deeply grateful to @kgryte and @Planeshifter for providing this opportunity and for their constant guidance and support, and to @gunjjoshi for the constructive guidance during our weekly 1:1 sessions. I would also like to thank @aayush0325, @anandkaranubc, @headlessNode, and @ShabiShett07 for their collaborative spirit and engaging discussions, which made the journey both insightful and enjoyable. I will truly miss the weekly standups. Overall, this has been a rewarding experience that has not only expanded my technical skills but also reinforced my motivation to keep contributing to open source.
