notcub::CachingHostAllocator Struct Reference

A simple caching allocator for pinned host memory allocations.

#include <CachingHostAllocator.h>

Classes

struct  BlockDescriptor
 
class  TotalBytes
 

Public Types

typedef std::multiset< BlockDescriptor, Compare > BusyBlocks
 Set type for live blocks (ordered by ptr).
 
typedef std::multiset< BlockDescriptor, Compare > CachedBlocks
 Set type for cached blocks (ordered by size).
 
typedef bool(* Compare) (const BlockDescriptor &, const BlockDescriptor &)
 BlockDescriptor comparator function interface.
 

Public Member Functions

 CachingHostAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false)
 Constructor.
 
 CachingHostAllocator (bool skip_cleanup=false, bool debug=false)
 Default constructor.
 
cudaError_t FreeAllCached ()
 Frees all cached pinned host allocations.
 
cudaError_t HostAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr)
 Provides a suitable allocation of pinned host memory for the given size.
 
cudaError_t HostFree (void *d_ptr)
 Frees a live allocation of pinned host memory, returning it to the allocator.
 
void NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value)
 
void SetMaxCachedBytes (size_t max_cached_bytes)
 Sets the limit on the number of bytes this allocator is allowed to cache.
 
 ~CachingHostAllocator ()
 Destructor.
 

Static Public Member Functions

static unsigned int IntPow (unsigned int base, unsigned int exp)
 

Public Attributes

unsigned int bin_growth
 Geometric growth factor for bin-sizes.
 
CachedBlocks cached_blocks
 Set of cached pinned host allocations available for reuse.
 
TotalBytes cached_bytes
 Aggregate cached bytes.
 
bool debug
 Whether or not to print (de)allocation events to stdout.
 
BusyBlocks live_blocks
 Set of live pinned host allocations currently in use.
 
unsigned int max_bin
 Maximum bin enumeration.
 
size_t max_bin_bytes
 Maximum bin size.
 
size_t max_cached_bytes
 Maximum aggregate cached bytes.
 
unsigned int min_bin
 Minimum bin enumeration.
 
size_t min_bin_bytes
 Minimum bin size.
 
std::mutex mutex
 Mutex for thread-safety.
 
const bool skip_cleanup
 Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)
 

Static Public Attributes

static const unsigned int INVALID_BIN = (unsigned int)-1
 Out-of-bounds bin.
 
static const int INVALID_DEVICE_ORDINAL = -1
 Invalid device ordinal.
 
static const size_t INVALID_SIZE = (size_t)-1
 Invalid size.
 

Detailed Description

A simple caching allocator for pinned host memory allocations.

Overview
The allocator is thread-safe. CUDA stream-safety is presumably not useful here, because reading from or writing to pinned host memory requires synchronization anyway. The difference with respect to device memory is that all operations on device memory are scheduled from the CPU via a CUDA stream, whereas operations on host memory can be performed directly.

The allocator behaves as follows:

  • Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.
  • Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused host allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
  • Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
  • Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
  • If the total storage of cached allocations will exceed max_cached_bytes, allocations are simply freed when they are deallocated instead of being returned to their bin-cache.
For example, the default-constructed CachingHostAllocator is configured with:
  • bin_growth = 8
  • min_bin = 3
  • max_bin = 7
  • max_cached_bytes = 6MB - 1B
which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes.

Definition at line 100 of file CachingHostAllocator.h.
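
As a quick check of the arithmetic in the overview, the following standalone sketch recomputes the bin sizes and the cache ceiling of the default configuration. It does not use the allocator itself; binSizes and defaultMaxCachedBytes are hypothetical helper names introduced only for this illustration, and the sketch assumes a finite max_bin.

```cpp
#include <cstddef>
#include <vector>

// Bin sizes in bytes for bins min_bin..max_bin, each bin_growth times the previous one.
// Assumes max_bin is a finite bin index (not INVALID_BIN) and that the sizes fit in size_t.
inline std::vector<std::size_t> binSizes(unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin) {
  std::vector<std::size_t> sizes;
  std::size_t size = 1;  // bin_growth ^ 0
  for (unsigned int bin = 0; bin <= max_bin; ++bin) {
    if (bin >= min_bin)
      sizes.push_back(size);
    size *= bin_growth;
  }
  return sizes;
}

// Cache ceiling of the default configuration: three times the largest bin, minus one byte.
inline std::size_t defaultMaxCachedBytes(std::size_t max_bin_bytes) { return max_bin_bytes * 3 - 1; }
```

Calling binSizes(8, 3, 7) yields 512, 4096, 32768, 262144, and 2097152 bytes, and defaultMaxCachedBytes(2097152) gives 6291455.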

Member Typedef Documentation

◆ BusyBlocks

Set type for live blocks (ordered by ptr)

Definition at line 170 of file CachingHostAllocator.h.

◆ CachedBlocks

Set type for cached blocks (ordered by size)

Definition at line 167 of file CachingHostAllocator.h.

◆ Compare

typedef bool(* notcub::CachingHostAllocator::Compare) (const BlockDescriptor &, const BlockDescriptor &)

BlockDescriptor comparator function interface.

Definition at line 157 of file CachingHostAllocator.h.
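
To illustrate how a function-pointer comparator of this shape can drive a std::multiset, here is a minimal standalone sketch. The Block struct, the two comparators, and smallestFit are simplified stand-ins invented for this example; they are not the real BlockDescriptor, SizeCompare, or PtrCompare, whose exact ordering rules are not shown on this page.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Simplified stand-in for BlockDescriptor (hypothetical, for illustration only).
struct Block {
  void* ptr;
  std::size_t bytes;
};

// Same shape as the documented Compare typedef.
typedef bool (*BlockCompare)(const Block&, const Block&);

// Order by size, as a cached-block set would.
inline bool bySize(const Block& a, const Block& b) { return a.bytes < b.bytes; }

// Order by address, as a live-block set would.
inline bool byPtr(const Block& a, const Block& b) { return std::less<const void*>()(a.ptr, b.ptr); }

typedef std::multiset<Block, BlockCompare> BlockSet;

// Returns the size of the smallest cached block that can hold `bytes`, or 0 if there is none.
inline std::size_t smallestFit(const BlockSet& cached, std::size_t bytes) {
  Block key{nullptr, bytes};
  BlockSet::const_iterator it = cached.lower_bound(key);
  return it == cached.end() ? 0 : it->bytes;
}
```

Because the comparator is a runtime function pointer rather than a type, it has to be passed to the set's constructor, for example BlockSet cached(bySize) or BlockSet live(byPtr).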

Constructor & Destructor Documentation

◆ CachingHostAllocator() [1/2]

notcub::CachingHostAllocator::CachingHostAllocator ( unsigned int  bin_growth,
unsigned int  min_bin = 1,
unsigned int  max_bin = INVALID_BIN,
size_t  max_cached_bytes = INVALID_SIZE,
bool  skip_cleanup = false,
bool  debug = false 
)
inline

Constructor.

Parameters
bin_growth        Geometric growth factor for bin-sizes
min_bin           Minimum bin (default is bin_growth ^ 1)
max_bin           Maximum bin (default is no max bin)
max_cached_bytes  Maximum aggregate cached bytes (default is no limit)
skip_cleanup      Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)
debug             Whether or not to print (de)allocation events to stdout (default is no output)

Definition at line 242 of file CachingHostAllocator.h.

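      // ... (source lines 242-250 not shown)
      min_bin(min_bin),
      max_bin(max_bin),
      // ... (source lines 253-256 not shown)
      debug(debug),
      // ... (remaining source lines not shown)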

◆ CachingHostAllocator() [2/2]

notcub::CachingHostAllocator::CachingHostAllocator ( bool  skip_cleanup = false,
bool  debug = false 
)
inline

Default constructor.

Configured with:

  • bin_growth = 8
  • min_bin = 3
  • max_bin = 7
  • max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes.

Definition at line 274 of file CachingHostAllocator.h.

    : bin_growth(8),
      min_bin(3),
      max_bin(7),
      // ... (source lines 278-279 not shown)
      max_cached_bytes((max_bin_bytes * 3) - 1),
      // ... (source line 281 not shown)
      debug(debug),
      // ... (remaining source lines not shown)

◆ ~CachingHostAllocator()

notcub::CachingHostAllocator::~CachingHostAllocator ( )
inline

Destructor.

Definition at line 638 of file CachingHostAllocator.h.

References FreeAllCached(), and skip_cleanup.

{
  if (!skip_cleanup)
    FreeAllCached();
}

Member Function Documentation

◆ FreeAllCached()

cudaError_t notcub::CachingHostAllocator::FreeAllCached ( )
inline

Frees all cached pinned host allocations.

Definition at line 579 of file CachingHostAllocator.h.

References cached_blocks, cached_bytes, cudaCheck, debug, relativeConstraints::error, notcub::CachingHostAllocator::TotalBytes::free, INVALID_DEVICE_ORDINAL, notcub::CachingHostAllocator::TotalBytes::live, live_blocks, and mutex.

Referenced by cms::cuda::allocator::cachingAllocatorsFreeCached(), and ~CachingHostAllocator().

{
  cudaError_t error = cudaSuccess;
  int entrypoint_device = INVALID_DEVICE_ORDINAL;
  int current_device = INVALID_DEVICE_ORDINAL;

  std::unique_lock<std::mutex> mutex_locker(mutex);

  while (!cached_blocks.empty()) {
    // Get first block
    CachedBlocks::iterator begin = cached_blocks.begin();

    // Get entry-point device ordinal if necessary
    if (entrypoint_device == INVALID_DEVICE_ORDINAL) {
      if ((error = cudaGetDevice(&entrypoint_device)))
        break;
    }

    // Set current device ordinal if necessary
    if (begin->device != current_device) {
      if ((error = cudaSetDevice(begin->device)))
        break;
      current_device = begin->device;
    }

    // Free host memory
    if ((error = cudaFreeHost(begin->d_ptr)))
      break;
    if ((error = cudaEventDestroy(begin->ready_event)))
      break;

    // Reduce balance and erase entry
    cached_bytes.free -= begin->bytes;

    if (debug)
      printf(
          "\tHost freed %lld bytes.\n\t\t %lld available blocks cached (%lld bytes), %lld live blocks (%lld "
          "bytes) outstanding.\n",
          (long long)begin->bytes,
          (long long)cached_blocks.size(),
          (long long)cached_bytes.free,
          (long long)live_blocks.size(),
          (long long)cached_bytes.live);

    cached_blocks.erase(begin);
  }

  mutex_locker.unlock();

  // Attempt to revert back to entry-point device if necessary
  if (entrypoint_device != INVALID_DEVICE_ORDINAL) {
    cudaCheck(error = cudaSetDevice(entrypoint_device));
  }

  return error;
}

◆ HostAllocate()

cudaError_t notcub::CachingHostAllocator::HostAllocate ( void **  d_ptr,
size_t  bytes,
cudaStream_t  active_stream = nullptr 
)
inline

Provides a suitable allocation of pinned host memory for the given size.

Once freed, the allocation becomes available immediately for reuse.

Parameters
[out] d_ptr          Reference to pointer to the allocation
[in]  bytes          Minimum number of bytes for the allocation
[in]  active_stream  The stream to be associated with this allocation

Definition at line 312 of file CachingHostAllocator.h.

References notcub::CachingHostAllocator::BlockDescriptor::associated_stream, notcub::CachingHostAllocator::BlockDescriptor::bin, bin_growth, notcub::CachingHostAllocator::BlockDescriptor::bytes, cached_blocks, cached_bytes, cudaCheck, notcub::CachingHostAllocator::BlockDescriptor::d_ptr, debug, notcub::CachingHostAllocator::BlockDescriptor::device, relativeConstraints::error, newFWLiteAna::found, notcub::CachingHostAllocator::TotalBytes::free, INVALID_BIN, INVALID_DEVICE_ORDINAL, notcub::CachingHostAllocator::TotalBytes::live, live_blocks, max_bin, min_bin, min_bin_bytes, mutex, NearestPowerOf(), and notcub::CachingHostAllocator::BlockDescriptor::ready_event.
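
The first step of HostAllocate() is to decide which bin a request belongs to. The sketch below models only that decision, without CUDA or any caching; BinChoice, classify, and kInvalidBin are hypothetical names introduced for this example, and the sketch assumes bin_growth is greater than 1.

```cpp
#include <cstddef>

// Hypothetical names for this sketch; only the logic follows the HostAllocate() listing.
constexpr unsigned int kInvalidBin = static_cast<unsigned int>(-1);

struct BinChoice {
  unsigned int bin;   // bin index, or kInvalidBin when the block will not be cached
  std::size_t bytes;  // number of bytes that would actually be requested
};

inline BinChoice classify(std::size_t bytes, unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin) {
  // A request so large that rounding would overflow is treated as out of bounds.
  if (bytes * bin_growth < bytes)
    return BinChoice{kInvalidBin, bytes};

  // Round up to the nearest power of bin_growth.
  unsigned int bin = 0;
  std::size_t rounded = 1;
  while (rounded < bytes) {
    rounded *= bin_growth;
    ++bin;
  }

  // Above the largest bin: allocate exactly what was asked for, never cache.
  if (bin > max_bin)
    return BinChoice{kInvalidBin, bytes};

  // Below the smallest bin: round up to the smallest bin size.
  while (bin < min_bin) {
    rounded *= bin_growth;
    ++bin;
  }
  return BinChoice{bin, rounded};
}
```

With the default configuration (bin_growth 8, min_bin 3, max_bin 7), a 1-byte request maps to bin 3 (512 bytes), a 513-byte request to bin 4 (4096 bytes), and a 2097153-byte request falls outside the bins and keeps its exact size.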

{
  std::unique_lock<std::mutex> mutex_locker(mutex, std::defer_lock);
  *d_ptr = nullptr;
  int device = INVALID_DEVICE_ORDINAL;
  cudaError_t error = cudaSuccess;

  cudaCheck(error = cudaGetDevice(&device));

  // Create a block descriptor for the requested allocation
  bool found = false;
  BlockDescriptor search_key;
  search_key.device = device;
  search_key.associated_stream = active_stream;
  NearestPowerOf(search_key.bin, search_key.bytes, bin_growth, bytes);

  if (search_key.bin > max_bin) {
    // Bin is greater than our maximum bin: allocate the request
    // exactly and give out-of-bounds bin. It will not be cached
    // for reuse when returned.
    search_key.bin = INVALID_BIN;
    search_key.bytes = bytes;
  } else {
    // Search for a suitable cached allocation: lock
    mutex_locker.lock();

    if (search_key.bin < min_bin) {
      // Bin is less than minimum bin: round up
      search_key.bin = min_bin;
      search_key.bytes = min_bin_bytes;
    }

    // Iterate through the range of cached blocks in the same bin
    CachedBlocks::iterator block_itr = cached_blocks.lower_bound(search_key);
    while ((block_itr != cached_blocks.end()) && (block_itr->bin == search_key.bin)) {
      // To prevent races with reusing blocks returned by the host but still
      // in use for transfers, only consider cached blocks that are from an idle stream
      if (cudaEventQuery(block_itr->ready_event) != cudaErrorNotReady) {
        // Reuse existing cache block. Insert into live blocks.
        found = true;
        search_key = *block_itr;
        search_key.associated_stream = active_stream;
        if (search_key.device != device) {
          // If "associated" device changes, need to re-create the event on the right device
          cudaCheck(error = cudaSetDevice(search_key.device));
          cudaCheck(error = cudaEventDestroy(search_key.ready_event));
          cudaCheck(error = cudaSetDevice(device));
          cudaCheck(error = cudaEventCreateWithFlags(&search_key.ready_event, cudaEventDisableTiming));
          search_key.device = device;
        }

        live_blocks.insert(search_key);

        // Remove from free blocks
        cached_bytes.free -= search_key.bytes;
        cached_bytes.live += search_key.bytes;

        if (debug)
          printf(
              "\tHost reused cached block at %p (%lld bytes) for stream %lld, event %lld on device %lld "
              "(previously associated with stream %lld, event %lld).\n",
              search_key.d_ptr,
              (long long)search_key.bytes,
              (long long)search_key.associated_stream,
              (long long)search_key.ready_event,
              (long long)search_key.device,
              (long long)block_itr->associated_stream,
              (long long)block_itr->ready_event);

        cached_blocks.erase(block_itr);

        break;
      }
      block_itr++;
    }

    // Done searching: unlock
    mutex_locker.unlock();
  }

  // Allocate the block if necessary
  if (!found) {
    // Attempt to allocate
    // TODO: eventually support allocation flags
    if ((error = cudaHostAlloc(&search_key.d_ptr, search_key.bytes, cudaHostAllocDefault)) ==
        cudaErrorMemoryAllocation) {
      // The allocation attempt failed: free all cached blocks on device and retry
      if (debug)
        printf(
            "\tHost failed to allocate %lld bytes for stream %lld on device %lld, retrying after freeing cached "
            "allocations",
            (long long)search_key.bytes,
            (long long)search_key.associated_stream,
            (long long)search_key.device);

      error = cudaSuccess;  // Reset the error we will return
      cudaGetLastError();   // Reset CUDART's error

      // Lock
      mutex_locker.lock();

      // Iterate the range of free blocks
      CachedBlocks::iterator block_itr = cached_blocks.begin();

      while ((block_itr != cached_blocks.end())) {
        // No need to worry about synchronization with the device: cudaFree is
        // blocking and will synchronize across all kernels executing
        // on the current device

        // Free pinned host memory.
        if ((error = cudaFreeHost(block_itr->d_ptr)))
          break;
        if ((error = cudaEventDestroy(block_itr->ready_event)))
          break;

        // Reduce balance and erase entry
        cached_bytes.free -= block_itr->bytes;

        if (debug)
          printf(
              "\tHost freed %lld bytes.\n\t\t %lld available blocks cached (%lld bytes), %lld live blocks (%lld "
              "bytes) outstanding.\n",
              (long long)block_itr->bytes,
              (long long)cached_blocks.size(),
              (long long)cached_bytes.free,
              (long long)live_blocks.size(),
              (long long)cached_bytes.live);

        // erase() returns the iterator following the removed element;
        // incrementing the erased iterator itself would be undefined behaviour
        block_itr = cached_blocks.erase(block_itr);
      }

      // Unlock
      mutex_locker.unlock();

      // Return under error
      if (error)
        return error;

      // Try to allocate again
      cudaCheck(error = cudaHostAlloc(&search_key.d_ptr, search_key.bytes, cudaHostAllocDefault));
    }

    // Create ready event
    cudaCheck(error = cudaEventCreateWithFlags(&search_key.ready_event, cudaEventDisableTiming));

    // Insert into live blocks
    mutex_locker.lock();
    live_blocks.insert(search_key);
    cached_bytes.live += search_key.bytes;
    mutex_locker.unlock();

    if (debug)
      printf(
          "\tHost allocated new host block at %p (%lld bytes associated with stream %lld, event %lld on device "
          "%lld).\n",
          search_key.d_ptr,
          (long long)search_key.bytes,
          (long long)search_key.associated_stream,
          (long long)search_key.ready_event,
          (long long)search_key.device);
  }

  // Copy host pointer to output parameter
  *d_ptr = search_key.d_ptr;

  if (debug)
    printf("\t\t%lld available blocks cached (%lld bytes), %lld live blocks outstanding(%lld bytes).\n",
           (long long)cached_blocks.size(),
           (long long)cached_bytes.free,
           (long long)live_blocks.size(),
           (long long)cached_bytes.live);

  return error;
}

◆ HostFree()

cudaError_t notcub::CachingHostAllocator::HostFree ( void *  d_ptr)
inline

Frees a live allocation of pinned host memory, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse.

Definition at line 497 of file CachingHostAllocator.h.

References notcub::CachingHostAllocator::BlockDescriptor::associated_stream, notcub::CachingHostAllocator::BlockDescriptor::bin, notcub::CachingHostAllocator::BlockDescriptor::bytes, cached_blocks, cached_bytes, cudaCheck, debug, notcub::CachingHostAllocator::BlockDescriptor::device, relativeConstraints::error, notcub::CachingHostAllocator::TotalBytes::free, INVALID_BIN, INVALID_DEVICE_ORDINAL, notcub::CachingHostAllocator::TotalBytes::live, live_blocks, max_cached_bytes, mutex, and notcub::CachingHostAllocator::BlockDescriptor::ready_event.
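
Whether HostFree() keeps a block for reuse or releases it depends on a single condition in the listing. The following sketch isolates that condition as a pure function; shouldRecache and kInvalidBin are hypothetical names used only here, and no memory is allocated or freed.

```cpp
#include <cstddef>

// Hypothetical stand-in for INVALID_BIN.
constexpr unsigned int kInvalidBin = static_cast<unsigned int>(-1);

// Mirrors the condition in the HostFree() listing: a returned block is kept
// only if it belongs to a valid bin and adding it would not push the total
// of cached (free) bytes above the configured ceiling.
inline bool shouldRecache(unsigned int bin,
                          std::size_t block_bytes,
                          std::size_t cached_free_bytes,
                          std::size_t max_cached_bytes) {
  return bin != kInvalidBin && cached_free_bytes + block_bytes <= max_cached_bytes;
}
```

With the default ceiling of 6291455 bytes, a third 2MB block is not kept when two are already cached, since 4194304 + 2097152 = 6291456 exceeds the limit by one byte.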

{
  int entrypoint_device = INVALID_DEVICE_ORDINAL;
  cudaError_t error = cudaSuccess;

  // Lock
  std::unique_lock<std::mutex> mutex_locker(mutex);

  // Find corresponding block descriptor
  bool recached = false;
  BlockDescriptor search_key(d_ptr);
  BusyBlocks::iterator block_itr = live_blocks.find(search_key);
  if (block_itr != live_blocks.end()) {
    // Remove from live blocks
    search_key = *block_itr;
    live_blocks.erase(block_itr);
    cached_bytes.live -= search_key.bytes;

    // Keep the returned allocation if bin is valid and we won't exceed the max cached threshold
    if ((search_key.bin != INVALID_BIN) && (cached_bytes.free + search_key.bytes <= max_cached_bytes)) {
      // Insert returned allocation into free blocks
      recached = true;
      cached_blocks.insert(search_key);
      cached_bytes.free += search_key.bytes;

      if (debug)
        printf(
            "\tHost returned %lld bytes from associated stream %lld, event %lld on device %lld.\n\t\t %lld "
            "available blocks cached (%lld bytes), %lld live blocks outstanding. (%lld bytes)\n",
            (long long)search_key.bytes,
            (long long)search_key.associated_stream,
            (long long)search_key.ready_event,
            (long long)search_key.device,
            (long long)cached_blocks.size(),
            (long long)cached_bytes.free,
            (long long)live_blocks.size(),
            (long long)cached_bytes.live);
    }
  }

  cudaCheck(error = cudaGetDevice(&entrypoint_device));
  if (entrypoint_device != search_key.device) {
    cudaCheck(error = cudaSetDevice(search_key.device));
  }

  if (recached) {
    // Insert the ready event in the associated stream (must have current device set properly)
    cudaCheck(error = cudaEventRecord(search_key.ready_event, search_key.associated_stream));
  }

  // Unlock
  mutex_locker.unlock();

  if (!recached) {
    // Free the allocation from the runtime and cleanup the event.
    cudaCheck(error = cudaFreeHost(d_ptr));
    cudaCheck(error = cudaEventDestroy(search_key.ready_event));

    if (debug)
      printf(
          "\tHost freed %lld bytes from associated stream %lld, event %lld on device %lld.\n\t\t %lld available "
          "blocks cached (%lld bytes), %lld live blocks (%lld bytes) outstanding.\n",
          (long long)search_key.bytes,
          (long long)search_key.associated_stream,
          (long long)search_key.ready_event,
          (long long)search_key.device,
          (long long)cached_blocks.size(),
          (long long)cached_bytes.free,
          (long long)live_blocks.size(),
          (long long)cached_bytes.live);
  }

  // Reset device
  if ((entrypoint_device != INVALID_DEVICE_ORDINAL) && (entrypoint_device != search_key.device)) {
    cudaCheck(error = cudaSetDevice(entrypoint_device));
  }

  return error;
}

◆ IntPow()

static unsigned int notcub::CachingHostAllocator::IntPow ( unsigned int  base,
unsigned int  exp 
)
inlinestatic

Integer power function for unsigned base and exponent.

Definition at line 179 of file CachingHostAllocator.h.

References newFWLiteAna::base, and JetChargeProducer_cfi::exp.
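
IntPow() uses exponentiation by squaring, so the number of loop iterations grows with the number of bits in the exponent rather than with the exponent itself. The sketch below is a standalone copy of that loop, instrumented to count iterations; PowResult and intPowCounted are hypothetical names introduced for this example.

```cpp
// Result of the instrumented loop (hypothetical type for this sketch).
struct PowResult {
  unsigned int value;       // base ^ exp, wrapping silently if it does not fit in unsigned int
  unsigned int iterations;  // number of times the loop body ran
};

inline PowResult intPowCounted(unsigned int base, unsigned int exp) {
  PowResult r{1, 0};
  while (exp > 0) {
    if (exp & 1)
      r.value *= base;  // multiply the result by the current base
    base *= base;       // square the base
    exp >>= 1;          // divide the exponent in half
    ++r.iterations;
  }
  return r;
}
```

For example, intPowCounted(8, 7) returns 2097152 after 3 iterations, and intPowCounted(2, 10) returns 1024 after 4 iterations. Note that the result is not checked for overflow.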

{
  unsigned int retval = 1;
  while (exp > 0) {
    if (exp & 1) {
      retval = retval * base;  // multiply the result by the current base
    }
    base = base * base;  // square the base
    exp = exp >> 1;      // divide the exponent in half
  }
  return retval;
}

◆ NearestPowerOf()

void notcub::CachingHostAllocator::NearestPowerOf ( unsigned int &  power,
size_t &  rounded_bytes,
unsigned int  base,
size_t  value 
)
inline

Rounds value up to the nearest power of base, returning the exponent in power and the rounded size in rounded_bytes.

Definition at line 194 of file CachingHostAllocator.h.

References newFWLiteAna::base, and cms::alpakatools::detail::power().

Referenced by HostAllocate().
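
The edge cases of this rounding are easiest to see with concrete inputs. The sketch below is a standalone variant that returns its two outputs in a struct instead of through references; Rounded and nearestPowerOf are hypothetical names, and the base > 1 assertion is an addition of this sketch, not part of the documented function.

```cpp
#include <cassert>
#include <cstddef>

struct Rounded {
  unsigned int power;  // exponent such that base ^ power >= value
  std::size_t bytes;   // base ^ power, or the largest size_t on overflow
};

inline Rounded nearestPowerOf(unsigned int base, std::size_t value) {
  // Added for this sketch only: with base 0 or 1 the loop below could never reach `value`.
  assert(base > 1);

  // Overflow guard, as in the listing: report an out-of-range power.
  if (value * base < value)
    return Rounded{static_cast<unsigned int>(sizeof(std::size_t) * 8), static_cast<std::size_t>(-1)};

  Rounded r{0, 1};
  while (r.bytes < value) {
    r.bytes *= base;
    ++r.power;
  }
  return r;
}
```

A value of 0 or 1 yields power 0 and 1 byte, a value of 9 with base 8 yields power 2 and 64 bytes, and an exact power such as 64 is left unchanged.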

{
  power = 0;
  rounded_bytes = 1;

  if (value * base < value) {
    // Overflow
    power = sizeof(size_t) * 8;
    rounded_bytes = size_t(0) - 1;
    return;
  }

  while (rounded_bytes < value) {
    rounded_bytes *= base;
    power++;
  }
}

◆ SetMaxCachedBytes()

void notcub::CachingHostAllocator::SetMaxCachedBytes ( size_t  max_cached_bytes)
inline

Sets the limit on the number of bytes this allocator is allowed to cache.

Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().

Definition at line 292 of file CachingHostAllocator.h.

References debug, max_cached_bytes, and mutex.
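
To make the note about the ceiling concrete, the sketch below models only the byte bookkeeping: lowering the limit leaves already cached bytes in place and affects only later returns. CacheModel and its members are hypothetical names for this illustration; the model allocates nothing and omits the mutex.

```cpp
#include <cstddef>

// Hypothetical bookkeeping-only model: tracks byte counts, allocates nothing.
class CacheModel {
public:
  explicit CacheModel(std::size_t max_cached_bytes) : max_cached_bytes_(max_cached_bytes), cached_bytes_(0) {}

  // As documented for SetMaxCachedBytes(): only the ceiling changes, nothing is evicted.
  void setMaxCachedBytes(std::size_t max_cached_bytes) { max_cached_bytes_ = max_cached_bytes; }

  // Returns true if a returned block of `bytes` would be kept in the cache.
  bool release(std::size_t bytes) {
    if (cached_bytes_ + bytes > max_cached_bytes_)
      return false;
    cached_bytes_ += bytes;
    return true;
  }

  // Models FreeAllCached(): drops everything held in reserve.
  void freeAllCached() { cached_bytes_ = 0; }

  std::size_t cachedBytes() const { return cached_bytes_; }

private:
  std::size_t max_cached_bytes_;
  std::size_t cached_bytes_;
};
```

In this model, shrinking the limit below the current cached total only causes later release() calls to be rejected; an explicit freeAllCached() is what actually brings the cached total back down.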

{
  // Lock
  std::unique_lock mutex_locker(mutex);

  if (debug)
    printf("Changing max_cached_bytes (%lld -> %lld)\n",
           (long long)this->max_cached_bytes,
           (long long)max_cached_bytes);

  this->max_cached_bytes = max_cached_bytes;

  // Unlock (redundant, kept for style uniformity)
  mutex_locker.unlock();
}

Member Data Documentation

◆ bin_growth

unsigned int notcub::CachingHostAllocator::bin_growth

Geometric growth factor for bin-sizes.

Definition at line 217 of file CachingHostAllocator.h.

Referenced by HostAllocate().

◆ cached_blocks

CachedBlocks notcub::CachingHostAllocator::cached_blocks

Set of cached pinned host allocations available for reuse.

Definition at line 230 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), and HostFree().

◆ cached_bytes

TotalBytes notcub::CachingHostAllocator::cached_bytes

Aggregate cached bytes.

Definition at line 229 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), and HostFree().

◆ debug

bool notcub::CachingHostAllocator::debug

Whether or not to print (de)allocation events to stdout.

Definition at line 227 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), HostFree(), and SetMaxCachedBytes().

◆ INVALID_BIN

const unsigned int notcub::CachingHostAllocator::INVALID_BIN = (unsigned int)-1
static

Out-of-bounds bin.

Definition at line 106 of file CachingHostAllocator.h.

Referenced by HostAllocate(), and HostFree().

◆ INVALID_DEVICE_ORDINAL

const int notcub::CachingHostAllocator::INVALID_DEVICE_ORDINAL = -1
static

Invalid device ordinal.

Definition at line 114 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), and HostFree().

◆ INVALID_SIZE

const size_t notcub::CachingHostAllocator::INVALID_SIZE = (size_t)-1
static

Invalid size.

Definition at line 109 of file CachingHostAllocator.h.

◆ live_blocks

BusyBlocks notcub::CachingHostAllocator::live_blocks

Set of live pinned host allocations currently in use.

Definition at line 231 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), and HostFree().

◆ max_bin

unsigned int notcub::CachingHostAllocator::max_bin

Maximum bin enumeration.

Definition at line 219 of file CachingHostAllocator.h.

Referenced by HostAllocate().

◆ max_bin_bytes

size_t notcub::CachingHostAllocator::max_bin_bytes

Maximum bin size.

Definition at line 222 of file CachingHostAllocator.h.

◆ max_cached_bytes

size_t notcub::CachingHostAllocator::max_cached_bytes

Maximum aggregate cached bytes.

Definition at line 223 of file CachingHostAllocator.h.

Referenced by HostFree(), and SetMaxCachedBytes().

◆ min_bin

unsigned int notcub::CachingHostAllocator::min_bin

Minimum bin enumeration.

Definition at line 218 of file CachingHostAllocator.h.

Referenced by HostAllocate().

◆ min_bin_bytes

size_t notcub::CachingHostAllocator::min_bin_bytes

Minimum bin size.

Definition at line 221 of file CachingHostAllocator.h.

Referenced by HostAllocate().

◆ mutex

std::mutex notcub::CachingHostAllocator::mutex

Mutex for thread-safety.

Definition at line 215 of file CachingHostAllocator.h.

Referenced by FreeAllCached(), HostAllocate(), HostFree(), and SetMaxCachedBytes().

◆ skip_cleanup

const bool notcub::CachingHostAllocator::skip_cleanup

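Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)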

Definition at line 226 of file CachingHostAllocator.h.

Referenced by ~CachingHostAllocator().