
MiniFloatConverter::ReduceMantissaToNbitsRounding Class Reference

#include <libminifloat.h>

Public Member Functions

float operator() (float f) const
 
 ReduceMantissaToNbitsRounding (int bits)
 

Private Attributes

const uint32_t mask
 
const uint32_t maxn
 
const int shift
 
const uint32_t test
 

Detailed Description

Definition at line 52 of file libminifloat.h.

Constructor & Destructor Documentation

◆ ReduceMantissaToNbitsRounding()

MiniFloatConverter::ReduceMantissaToNbitsRounding::ReduceMantissaToNbitsRounding ( int  bits)
inline

Member Function Documentation

◆ operator()()

float MiniFloatConverter::ReduceMantissaToNbitsRounding::operator() ( float  f) const
inline

Definition at line 62 of file libminifloat.h.

References mask, maxn, shift, and test.

{
  constexpr uint32_t low23 = (0x007FFFFF);  // mask to keep lowest 23 bits = mantissa
  constexpr uint32_t hi9 = (0xFF800000);    // mask to keep highest 9 bits = the rest
  uint32_t i32 = edm::bit_cast<uint32_t>(f);
  if (i32 & test) {  // need to round
    uint32_t mantissa = (i32 & low23) >> shift;
    if (mantissa < maxn)
      mantissa++;
    i32 = (i32 & hi9) | (mantissa << shift);
  } else {
    i32 &= mask;
  }
  return edm::bit_cast<float>(i32);
}
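The listing above depends on four members whose initial values are not shown on this page. The following standalone sketch reproduces the same control flow so the rounding can be tried outside CMSSW. It rests on assumptions: the class name `RoundingSketch`, the helpers `toBits`, `fromBits`, and `checkedShift`, and above all the member initializers (`shift = 23 - bits`, `mask` clearing the low `shift` bits, `test` selecting the highest dropped bit, `maxn = 2^bits - 2`) are guesses made for illustration and may differ from the real constructor in libminifloat.h. `std::memcpy` stands in for `edm::bit_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for edm::bit_cast; not part of libminifloat.h.
inline uint32_t toBits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
inline float fromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Sketch only: the member initializers below are ASSUMED, not taken from the source.
class RoundingSketch {
public:
  explicit RoundingSketch(int bits)
      : shift(checkedShift(bits)),              // assumed: number of mantissa bits dropped
        mask((0xFFFFFFFFu >> shift) << shift),  // assumed: clears the dropped bits
        test(1u << (shift - 1)),                // assumed: highest dropped bit
        maxn((1u << bits) - 2) {}               // assumed: limit above which no increment happens

  float operator()(float f) const {
    constexpr uint32_t low23 = 0x007FFFFF;  // lowest 23 bits = mantissa
    constexpr uint32_t hi9 = 0xFF800000;    // highest 9 bits = sign and exponent
    uint32_t i32 = toBits(f);
    if (i32 & test) {  // highest dropped bit is set: round up
      uint32_t mantissa = (i32 & low23) >> shift;
      if (mantissa < maxn)
        mantissa++;  // skipped near the top, so the mantissa cannot overflow
      i32 = (i32 & hi9) | (mantissa << shift);
    } else {  // highest dropped bit is clear: truncate
      i32 &= mask;
    }
    return fromBits(i32);
  }

private:
  // Keeps every shift amount in range for this sketch (1 <= bits <= 22).
  static int checkedShift(int bits) {
    assert(bits >= 1 && bits <= 22);
    return 23 - bits;
  }

  const int shift;
  const uint32_t mask;
  const uint32_t test;
  const uint32_t maxn;
};
```

Under these assumptions, `RoundingSketch r(10)` maps the bit pattern `0x3F801000` to `0x3F802000` (rounded up, because bit 12 is set) and `0x3F800FFF` to `0x3F800000` (truncated, because bit 12 is clear). In both cases the 13 lowest mantissa bits of the result are zero.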

Member Data Documentation

◆ mask

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::mask
private

Definition at line 79 of file libminifloat.h.

Referenced by operator()().

◆ maxn

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::maxn
private

Definition at line 79 of file libminifloat.h.

Referenced by operator()().

◆ shift

const int MiniFloatConverter::ReduceMantissaToNbitsRounding::shift
private

Definition at line 78 of file libminifloat.h.

Referenced by operator()().

◆ test

const uint32_t MiniFloatConverter::ReduceMantissaToNbitsRounding::test
private