MiniFloatConverter::ReduceMantissaToNbitsRounding Class Reference

#include <libminifloat.h>

Public Member Functions

float operator() (float f) const
 
 ReduceMantissaToNbitsRounding (int bits)
 

Private Attributes

const uint32_t mask
 
const uint32_t maxn
 
const int shift
 
const uint32_t test
 

Detailed Description

Definition at line 52 of file libminifloat.h.

Constructor & Destructor Documentation

◆ ReduceMantissaToNbitsRounding()

MiniFloatConverter::ReduceMantissaToNbitsRounding::ReduceMantissaToNbitsRounding ( int  bits)
inline

Definition at line 54 of file libminifloat.h.

References cms::cuda::assert().

ReduceMantissaToNbitsRounding(int bits)
    : shift(23 - bits), mask((0xFFFFFFFF >> (shift)) << (shift)), test(1 << (shift - 1)), maxn((1 << bits) - 2) {
  assert(bits <= 23);  // "max mantissa size is 23 bits"
}
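
To make the initializer list concrete, the following minimal sketch recomputes the four constants outside the class. The struct `MantissaConstants` and the function `makeConstants` are hypothetical names introduced only for illustration and are not part of libminifloat.h. The sketch also assumes `1 <= bits <= 22`, which is narrower than the `bits <= 23` assertion in the source, so that `shift - 1` is never negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the four private members of the class.
struct MantissaConstants {
  int shift;           // number of low mantissa bits that are dropped
  std::uint32_t mask;  // keeps sign, exponent and the top `bits` mantissa bits
  std::uint32_t test;  // highest dropped bit, used to decide whether to round up
  std::uint32_t maxn;  // reduced mantissa values below this are incremented
};

// Assumption of this sketch: 1 <= bits <= 22, so every shift count is valid.
inline MantissaConstants makeConstants(int bits) {
  assert(bits >= 1 && bits <= 22);
  const int shift = 23 - bits;
  return {shift,
          (0xFFFFFFFFu >> shift) << shift,
          std::uint32_t{1} << (shift - 1),
          (std::uint32_t{1} << bits) - 2u};
}
```

For `bits = 10` this gives `shift = 13`, `mask = 0xFFFFE000`, `test = 0x1000` and `maxn = 1022`.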

Member Function Documentation

◆ operator()()

float MiniFloatConverter::ReduceMantissaToNbitsRounding::operator() ( float  f) const
inline

Definition at line 58 of file libminifloat.h.

References edm::bit_cast(), mask, maxn, shift, and test.

float operator()(float f) const {
  constexpr uint32_t low23 = (0x007FFFFF);  // mask to keep lowest 23 bits = mantissa
  constexpr uint32_t hi9 = (0xFF800000);    // mask to keep highest 9 bits = the rest
  uint32_t i32 = edm::bit_cast<uint32_t>(f);
  if (i32 & test) {  // need to round
    uint32_t mantissa = (i32 & low23) >> shift;
    if (mantissa < maxn)
      mantissa++;
    i32 = (i32 & hi9) | (mantissa << shift);
  } else {
    i32 &= mask;
  }
  return edm::bit_cast<float>(i32);
}
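
The rounding logic can be tried outside CMSSW with the standalone sketch below. It is not the library's code. `bitCopy` is a hypothetical stand-in for `edm::bit_cast` built on `std::memcpy`, and `reduceMantissaRounding` is a hypothetical free function that recomputes the constants on every call instead of storing them as members. As an assumption of the sketch, `bits` is restricted to 1..22 so that all shift counts are valid.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for edm::bit_cast: copies the object representation.
template <typename To, typename From>
inline To bitCopy(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
}

// Hypothetical free-function version of the functor (assumes 1 <= bits <= 22).
inline float reduceMantissaRounding(float f, int bits) {
  assert(bits >= 1 && bits <= 22);
  const int shift = 23 - bits;
  const std::uint32_t mask = (0xFFFFFFFFu >> shift) << shift;
  const std::uint32_t test = std::uint32_t{1} << (shift - 1);
  const std::uint32_t maxn = (std::uint32_t{1} << bits) - 2u;
  constexpr std::uint32_t low23 = 0x007FFFFFu;  // lowest 23 bits = mantissa
  constexpr std::uint32_t hi9 = 0xFF800000u;    // highest 9 bits = sign + exponent

  std::uint32_t i32 = bitCopy<std::uint32_t>(f);
  if (i32 & test) {  // highest dropped bit is set: try to round up
    std::uint32_t mantissa = (i32 & low23) >> shift;
    if (mantissa < maxn)
      mantissa++;  // increment only while below maxn, otherwise truncate
    i32 = (i32 & hi9) | (mantissa << shift);
  } else {
    i32 &= mask;  // highest dropped bit is clear: plain truncation
  }
  return bitCopy<float>(i32);
}
```

With `bits = 2`, `reduceMantissaRounding(1.125f, 2)` returns `1.25f` (rounded up), `reduceMantissaRounding(1.0625f, 2)` returns `1.0f` (truncated), and `reduceMantissaRounding(1.875f, 2)` returns `1.75f`, because the reduced mantissa is already above `maxn` and is therefore not incremented.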

Member Data Documentation

◆ mask

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::mask
private

Definition at line 75 of file libminifloat.h.

Referenced by operator()().

◆ maxn

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::maxn
private

Definition at line 75 of file libminifloat.h.

Referenced by operator()().

◆ shift

const int MiniFloatConverter::ReduceMantissaToNbitsRounding::shift
private

Definition at line 74 of file libminifloat.h.

Referenced by operator()().

◆ test

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::test
private