cameron / parabix-devel / Issues / #101

Created Jun 12, 2026 by cameron (@cameron, Maintainer)

Short stride support

Branch short-spread-stride has a version of ElemSpread that uses a short stride equal to the size of the mask needed for one pack of elements. For example, the stride is 64 when working with bitblocks of 8-bit elements on AVX-512.
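As a quick illustration of the stride arithmetic (a sketch only, not code from the branch): one pack holds block width divided by element width elements, and the mask needs one bit per element. The function name `short_stride` is hypothetical, and the 128-bit case is an assumed block width for ARM NEON.

```python
def short_stride(block_bits: int, field_bits: int) -> int:
    """Elements in one pack, i.e. the number of mask bits needed per stride."""
    if block_bits <= 0 or field_bits <= 0 or block_bits % field_bits != 0:
        raise ValueError("block width must be a positive multiple of field width")
    return block_bits // field_bits


# AVX-512 bitblock with 8-bit elements, as in the example above.
print(short_stride(512, 8))
# Assumed 128-bit bitblock (e.g. NEON) with 8-bit elements.
print(short_stride(128, 8))
```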

The idea is to be able to make progress whenever enough data for a full mvmd_expand operation is available.

The popcount attribute does not work for this stride length, so I have implemented the rate on the source input stream using BoundedRate(0, 1) and set the processed item count explicitly. It would be good to update popcount rates to support this case.
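To make the rate concrete, here is a small Python model of one stride. It is a sketch under assumptions, not the kernel code: it assumes the expand step places the next source items at the 1 positions of the mask (as an AVX-512 style expand does), and that BoundedRate(0, 1) can be read as "between 0 and one source item per mask position". The name `expand_stride` and its parameters are hypothetical.

```python
def expand_stride(mask: int, source: list, pos: int, stride: int = 64):
    """Model one stride of a mask-driven expand.

    Bit i of mask (least significant bit first) controls output[i].
    Returns (output, new_pos), where new_pos is the explicitly updated
    processed item count on the source.
    """
    if mask < 0 or mask >> stride:
        raise ValueError("mask does not fit in one stride")
    needed = bin(mask).count("1")  # popcount of the mask
    if pos + needed > len(source):
        raise ValueError("not enough source items for a full expand")
    out = [0] * stride
    nxt = pos
    for i in range(stride):
        if (mask >> i) & 1:
            out[i] = source[nxt]
            nxt += 1
    # The advance is bounded: anywhere from 0 to stride items per stride.
    assert 0 <= nxt - pos <= stride
    return out, nxt


# Stride of 4 for readability: three source items are consumed.
print(expand_stride(0b1011, [7, 8, 9], 0, stride=4))
# An all-zero mask consumes nothing from the source.
print(expand_stride(0, [7], 0, stride=4))
```

In this model the number of source items consumed per stride is exactly the popcount of the mask, which is the quantity a popcount rate would otherwise express.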

The following two commands both produce correct results on ARM and AVX-512.

bin/nfd --ByteMerging ../QA/Normalization/NF-source --short-strides -UnalignedLoads > nfs.nfd2
bin/nfd --ByteMerging ../QA/Normalization/NF-source --short-strides > nfs.nfd3

However, these modes segfault sporadically with larger files. Are there limitations in the pipeline infrastructure that may cause an issue? One thing I have observed is that there are sometimes many consecutive strides in which the ProcessedItemCount on the source does not advance, which happens when the mask stream has a long run of zeroes.
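The non-advancing behaviour can be reproduced in a toy model (a sketch only; it says nothing about the cause of the segfault). It assumes the source advances by the popcount of each stride's mask, and the helper name `count_stalled_strides` is hypothetical.

```python
def count_stalled_strides(masks):
    """Return (processed count after each stride, longest run of stalled strides).

    A stride is "stalled" when its mask is all zeroes, so the processed
    item count on the source does not advance.
    """
    processed = 0
    history = []
    longest = current = 0
    for mask in masks:
        advance = bin(mask).count("1")  # source items consumed this stride
        processed += advance
        history.append(processed)
        if advance == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return history, longest


# One dense stride, then 1000 strides of zero mask, then one sparse stride.
masks = [0xFF] + [0] * 1000 + [0x1]
history, longest = count_stalled_strides(masks)
print(history[0], history[-1], longest)
```

A test input shaped like `masks` above (a long zero run in the mask stream) might be a useful way to exercise the pipeline with many consecutive zero-progress strides on the source.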
