notcub::CachingDeviceAllocator Struct Reference

A simple caching allocator for device memory allocations.

#include <CachingDeviceAllocator.h>

Classes

struct  BlockDescriptor
 
class  TotalBytes
 

Public Types

typedef std::multiset< BlockDescriptor, Compare > BusyBlocks
 Set type for live blocks (ordered by ptr).
 
typedef std::multiset< BlockDescriptor, Compare > CachedBlocks
 Set type for cached blocks (ordered by size).
 
typedef bool(* Compare) (const BlockDescriptor &, const BlockDescriptor &)
 BlockDescriptor comparator function interface.
 
typedef std::map< int, TotalBytes > GpuCachedBytes
 Map type of device ordinals to the number of bytes cached by each device.
 

Public Member Functions

 CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false)
 Default constructor.
 
 CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false)
 Constructor.
 
cudaError_t DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr)
 Provides a suitable allocation of device memory for the given size on the specified device.
 
cudaError_t DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr)
 Provides a suitable allocation of device memory for the given size on the current device.
 
cudaError_t DeviceFree (int device, void *d_ptr)
 Frees a live allocation of device memory on the specified device, returning it to the allocator.
 
cudaError_t DeviceFree (void *d_ptr)
 Frees a live allocation of device memory on the current device, returning it to the allocator.
 
cudaError_t FreeAllCached ()
 Frees all cached device allocations on all devices.
 
void NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value)
 
cudaError_t SetMaxCachedBytes (size_t max_cached_bytes)
 Sets the limit on the number of bytes this allocator is allowed to cache per device.
 
 ~CachingDeviceAllocator ()
 Destructor.
 

Static Public Member Functions

static unsigned int IntPow (unsigned int base, unsigned int exp)
 

Public Attributes

unsigned int bin_growth
 Geometric growth factor for bin-sizes.
 
CachedBlocks cached_blocks
 Set of cached device allocations available for reuse.
 
GpuCachedBytes cached_bytes
 Map of device ordinal to aggregate cached bytes on that device.
 
bool debug
 Whether or not to print (de)allocation events to stdout.
 
BusyBlocks live_blocks
 Set of live device allocations currently in use.
 
unsigned int max_bin
 Maximum bin enumeration.
 
size_t max_bin_bytes
 Maximum bin size.
 
size_t max_cached_bytes
 Maximum aggregate cached bytes per device.
 
unsigned int min_bin
 Minimum bin enumeration.
 
size_t min_bin_bytes
 Minimum bin size.
 
std::mutex mutex
 Mutex for thread-safety.
 
const bool skip_cleanup
 Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)
 

Static Public Attributes

static const unsigned int INVALID_BIN = (unsigned int)-1
 Out-of-bounds bin.
 
static const int INVALID_DEVICE_ORDINAL = -1
 Invalid device ordinal.
 
static const size_t INVALID_SIZE = (size_t)-1
 Invalid size.
 

Detailed Description

A simple caching allocator for device memory allocations.

Overview
The allocator is thread-safe and stream-safe and is capable of managing cached device allocations on multiple devices. It behaves as follows:
  • Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
  • Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.
  • Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
  • Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
  • Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
  • If the total storage of cached allocations on a given device will exceed max_cached_bytes, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.
For example, the default-constructed CachingDeviceAllocator is configured with:
  • bin_growth = 8
  • min_bin = 3
  • max_bin = 7
  • max_cached_bytes = 6MB - 1B
which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
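
The arithmetic behind the default configuration can be checked with a short host-only sketch. The function names `bin_sizes` and `default_max_cached_bytes` are hypothetical helpers introduced here for illustration; they are not members of CachingDeviceAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: lists the bin sizes bin_growth^min_bin ... bin_growth^max_bin in bytes.
std::vector<std::size_t> bin_sizes(unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin) {
  assert(min_bin <= max_bin);
  std::size_t size = 1;
  for (unsigned int bin = 0; bin < min_bin; ++bin)
    size *= bin_growth;  // size is now bin_growth ^ min_bin
  std::vector<std::size_t> sizes;
  for (unsigned int bin = min_bin; bin <= max_bin; ++bin) {
    sizes.push_back(size);
    size *= bin_growth;  // geometric progression to the next bin
  }
  return sizes;
}

// Hypothetical helper: the default cap described above, three times the largest bin minus one byte.
std::size_t default_max_cached_bytes(unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin) {
  return bin_sizes(bin_growth, min_bin, max_bin).back() * 3 - 1;
}
```

With bin_growth = 8, min_bin = 3 and max_bin = 7 this yields 512, 4096, 32768, 262144 and 2097152 bytes, and a cap of 6,291,455 bytes, matching the figures quoted above.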

Definition at line 123 of file CachingDeviceAllocator.h.

Member Typedef Documentation

◆ BusyBlocks

Set type for live blocks (ordered by ptr)

Definition at line 199 of file CachingDeviceAllocator.h.

◆ CachedBlocks

Set type for cached blocks (ordered by size)

Definition at line 196 of file CachingDeviceAllocator.h.

◆ Compare

typedef bool(* notcub::CachingDeviceAllocator::Compare) (const BlockDescriptor &, const BlockDescriptor &)

BlockDescriptor comparator function interface.

Definition at line 186 of file CachingDeviceAllocator.h.

◆ GpuCachedBytes

Map type of device ordinals to the number of bytes cached by each device.

Definition at line 202 of file CachingDeviceAllocator.h.
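
To illustrate how a multiset with a function-pointer comparator supports the cached-block lookup, here is a minimal host-only sketch. `BlockModel`, `size_compare` and `first_fit` are hypothetical names, and ordering by device first and by size second is an assumption made for the example, not something this page states about BlockDescriptor::SizeCompare.

```cpp
#include <cstddef>
#include <set>

// Hypothetical, simplified stand-in for BlockDescriptor.
struct BlockModel {
  int device;
  std::size_t bytes;
};

// Same shape as the Compare typedef: a plain function pointer.
typedef bool (*CompareModel)(const BlockModel &, const BlockModel &);

// Assumed ordering for the sketch: by device ordinal, then by size.
bool size_compare(const BlockModel &a, const BlockModel &b) {
  if (a.device != b.device)
    return a.device < b.device;
  return a.bytes < b.bytes;
}

typedef std::multiset<BlockModel, CompareModel> CachedBlocksModel;

// Returns the size of the first cached block on `device` holding at least
// `bytes`, or 0 when no such block is cached.
std::size_t first_fit(const CachedBlocksModel &cache, int device, std::size_t bytes) {
  BlockModel key = {device, bytes};
  CachedBlocksModel::const_iterator it = cache.lower_bound(key);
  if (it != cache.end() && it->device == device)
    return it->bytes;
  return 0;
}
```

Because the comparator is passed at construction time (`CachedBlocksModel cache(size_compare);`), the same multiset type can serve both the live set and the cached set with different orderings, which is how BusyBlocks and CachedBlocks share one Compare interface.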

Constructor & Destructor Documentation

◆ CachingDeviceAllocator() [1/2]

notcub::CachingDeviceAllocator::CachingDeviceAllocator ( unsigned int  bin_growth,
unsigned int  min_bin = 1,
unsigned int  max_bin = INVALID_BIN,
size_t  max_cached_bytes = INVALID_SIZE,
bool  skip_cleanup = false,
bool  debug = false 
)
inline

Constructor.

Parameters
bin_growth        Geometric growth factor for bin-sizes
min_bin           Minimum bin (default is bin_growth ^ 1)
max_bin           Maximum bin (default is no max bin)
max_cached_bytes  Maximum aggregate cached bytes per device (default is no limit)
skip_cleanup      Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)
debug             Whether or not to print (de)allocation events to stdout (default is no output)

Definition at line 275 of file CachingDeviceAllocator.h.

◆ CachingDeviceAllocator() [2/2]

notcub::CachingDeviceAllocator::CachingDeviceAllocator ( bool  skip_cleanup = false,
bool  debug = false 
)
inline

Default constructor.

Configured with:

  • bin_growth = 8
  • min_bin = 3
  • max_bin = 7
  • max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.

Definition at line 307 of file CachingDeviceAllocator.h.

CachingDeviceAllocator(bool skip_cleanup = false, bool debug = false)
    : bin_growth(8),
      min_bin(3),
      max_bin(7),
      min_bin_bytes(IntPow(bin_growth, min_bin)),
      max_bin_bytes(IntPow(bin_growth, max_bin)),
      max_cached_bytes((max_bin_bytes * 3) - 1),
      skip_cleanup(skip_cleanup),
      debug(debug),
      cached_blocks(BlockDescriptor::SizeCompare),
      live_blocks(BlockDescriptor::PtrCompare) {}

◆ ~CachingDeviceAllocator()

notcub::CachingDeviceAllocator::~CachingDeviceAllocator ( )
inline

Destructor.

Definition at line 742 of file CachingDeviceAllocator.h.

Member Function Documentation

◆ DeviceAllocate() [1/2]

cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate ( int  device,
void **  d_ptr,
size_t  bytes,
cudaStream_t  active_stream = nullptr 
)
inline

Provides a suitable allocation of device memory for the given size on the specified device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters
[in]   device         Device on which to place the allocation
[out]  d_ptr          Reference to pointer to the allocation
[in]   bytes          Minimum number of bytes for the allocation
[in]   active_stream  The stream to be associated with this allocation
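
The reuse rule described above can be modelled on the host without any CUDA calls. In this sketch the names `CachedBlockModel` and `can_reuse` are hypothetical; stream handles are represented as plain integers and event completion as a boolean, whereas the allocator itself compares cudaStream_t values and queries the block's ready_event.

```cpp
// Hypothetical host-only model of a cached block: the stream it was last
// associated with, and whether all work recorded before it was freed has completed.
struct CachedBlockModel {
  int associated_stream;
  bool prior_work_done;
};

// A cached block may be handed out again when the request comes from the same
// stream (stream ordering already protects it) or when the earlier work has finished.
bool can_reuse(const CachedBlockModel &block, int active_stream) {
  return block.associated_stream == active_stream || block.prior_work_done;
}
```

The same-stream shortcut is what makes a freed block reusable immediately in its own stream: any new work submitted there is ordered after the work that last touched the block.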

Definition at line 350 of file CachingDeviceAllocator.h.

  {
    // Search for a suitable cached allocation: lock
    mutex.lock();

    if (search_key.bin < min_bin) {
      // Bin is less than minimum bin: round up
      search_key.bin = min_bin;
      search_key.bytes = min_bin_bytes;
    }

    // Iterate through the range of cached blocks on the same device in the same bin
    CachedBlocks::iterator block_itr = cached_blocks.lower_bound(search_key);
    while ((block_itr != cached_blocks.end()) && (block_itr->device == device) &&
           (block_itr->bin == search_key.bin)) {
      // To prevent races with reusing blocks returned by the host but still
      // in use by the device, only consider cached blocks that are
      // either (from the active stream) or (from an idle stream)
      if ((active_stream == block_itr->associated_stream) ||
          (cudaEventQuery(block_itr->ready_event) != cudaErrorNotReady)) {
        // Reuse existing cache block. Insert into live blocks.
        found = true;
        search_key = *block_itr;
        search_key.associated_stream = active_stream;
        live_blocks.insert(search_key);

        // Remove from free blocks
        cached_bytes[device].free -= search_key.bytes;
        cached_bytes[device].live += search_key.bytes;

        if (debug)
          // CMS: improved debug message
          // CMS: use raw printf
          printf(
              "\tDevice %d reused cached block at %p (%lld bytes) for stream %lld, event %lld (previously "
              "associated with stream %lld, event %lld).\n",
              device,
              search_key.d_ptr,
              (long long)search_key.bytes,
              (long long)search_key.associated_stream,
              (long long)search_key.ready_event,
              (long long)block_itr->associated_stream,
              (long long)block_itr->ready_event);

        cached_blocks.erase(block_itr);

        break;
      }
      block_itr++;
    }

    // Done searching: unlock
    mutex.unlock();
  }

  // Allocate the block if necessary
  if (!found) {
    // Set runtime's current device to specified device (entrypoint may not be set)
    if (device != entrypoint_device) {
      // CMS: throw exception on error
      cudaCheck(error = cudaGetDevice(&entrypoint_device));
      cudaCheck(error = cudaSetDevice(device));
    }

    // Attempt to allocate
    // CMS: silently ignore errors and retry or pass them to the caller
    if ((error = cudaMalloc(&search_key.d_ptr, search_key.bytes)) == cudaErrorMemoryAllocation) {
      // The allocation attempt failed: free all cached blocks on device and retry
      if (debug)
        // CMS: use raw printf
        printf(
            "\tDevice %d failed to allocate %lld bytes for stream %lld, retrying after freeing cached allocations\n",
            device,
            (long long)search_key.bytes,
            (long long)search_key.associated_stream);

      error = cudaSuccess;  // Reset the error we will return
      cudaGetLastError();   // Reset CUDART's error

      // Lock
      mutex.lock();

      // Iterate the range of free blocks on the same device
      BlockDescriptor free_key(device);
      CachedBlocks::iterator block_itr = cached_blocks.lower_bound(free_key);

      while ((block_itr != cached_blocks.end()) && (block_itr->device == device)) {
        // No need to worry about synchronization with the device: cudaFree is
        // blocking and will synchronize across all kernels executing
        // on the current device

        // Free device memory and destroy stream event.
        // CMS: silently ignore errors and pass them to the caller
        if ((error = cudaFree(block_itr->d_ptr)))
          break;
        if ((error = cudaEventDestroy(block_itr->ready_event)))
          break;

        // Reduce balance and erase entry
        cached_bytes[device].free -= block_itr->bytes;

        if (debug)
          // CMS: use raw printf
          printf(
              "\tDevice %d freed %lld bytes.\n\t\t %lld available blocks cached (%lld bytes), %lld live blocks "
              "(%lld bytes) outstanding.\n",
              device,
              (long long)block_itr->bytes,
              (long long)cached_blocks.size(),
              (long long)cached_bytes[device].free,
              (long long)live_blocks.size(),
              (long long)cached_bytes[device].live);

        // erase() invalidates block_itr, so continue from the iterator it
        // returns instead of incrementing the erased one
        block_itr = cached_blocks.erase(block_itr);
      }

      // Unlock
      mutex.unlock();

      // Return under error
      if (error)
        return error;

      // Try to allocate again
      // CMS: throw exception on error
      cudaCheck(error = cudaMalloc(&search_key.d_ptr, search_key.bytes));
    }

    // Create ready event
    // CMS: throw exception on error
    cudaCheck(error = cudaEventCreateWithFlags(&search_key.ready_event, cudaEventDisableTiming));

    // Insert into live blocks
    mutex.lock();
    live_blocks.insert(search_key);
    cached_bytes[device].live += search_key.bytes;
    mutex.unlock();

    if (debug)
      // CMS: improved debug message
      // CMS: use raw printf
      printf("\tDevice %d allocated new device block at %p (%lld bytes associated with stream %lld, event %lld).\n",
             device,
             search_key.d_ptr,
             (long long)search_key.bytes,
             (long long)search_key.associated_stream,
             (long long)search_key.ready_event);

    // Attempt to revert back to previous device if necessary
    if ((entrypoint_device != INVALID_DEVICE_ORDINAL) && (entrypoint_device != device)) {
      // CMS: throw exception on error
      cudaCheck(error = cudaSetDevice(entrypoint_device));
    }
  }

  // Copy device pointer to output parameter
  *d_ptr = search_key.d_ptr;

  if (debug)
    // CMS: use raw printf
    printf("\t\t%lld available blocks cached (%lld bytes), %lld live blocks outstanding(%lld bytes).\n",
           (long long)cached_blocks.size(),
           (long long)cached_bytes[device].free,
           (long long)live_blocks.size(),
           (long long)cached_bytes[device].live);

  return error;
}

cudaError_t DeviceAllocate(
    void **d_ptr,
    size_t bytes,
    cudaStream_t active_stream = nullptr)
{
  return DeviceAllocate(INVALID_DEVICE_ORDINAL, d_ptr, bytes, active_stream);
}

cudaError_t DeviceFree(int device, void *d_ptr) {
  int entrypoint_device = INVALID_DEVICE_ORDINAL;

◆ DeviceAllocate() [2/2]

cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate ( void **  d_ptr,
size_t  bytes,
cudaStream_t  active_stream = nullptr 
)
inline

Provides a suitable allocation of device memory for the given size on the current device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters
[out]  d_ptr          Reference to pointer to the allocation
[in]   bytes          Minimum number of bytes for the allocation
[in]   active_stream  The stream to be associated with this allocation

Definition at line 555 of file CachingDeviceAllocator.h.


◆ DeviceFree() [1/2]

cudaError_t notcub::CachingDeviceAllocator::DeviceFree ( int  device,
void *  d_ptr 
)
inline

Frees a live allocation of device memory on the specified device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
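
Whether a freed block returns to its bin-cache or goes back to the CUDA runtime follows from the two rules in the overview: oversized blocks are never cached, and caching must not push the per-device total past max_cached_bytes. A minimal sketch of that decision, using the hypothetical name `should_recache` (not a member of the allocator) and assuming the sum does not overflow size_t:

```cpp
#include <cstddef>

// Hypothetical predicate: decides whether a freed block is kept in the cache.
//   bin_is_valid            false for requests above bin_growth ^ max_bin
//   cached_bytes_on_device  bytes currently cached (free) on the block's device
//   block_bytes             size of the block being freed
//   max_cached_bytes        per-device cap on cached bytes
bool should_recache(bool bin_is_valid,
                    std::size_t cached_bytes_on_device,
                    std::size_t block_bytes,
                    std::size_t max_cached_bytes) {
  if (!bin_is_valid)
    return false;  // oversized blocks are freed outright
  // Keep the block only if the cache stays within the cap afterwards
  return cached_bytes_on_device + block_bytes <= max_cached_bytes;
}
```

With the default cap of 6,291,455 bytes, two 2MB blocks can be cached on a device but a third cannot, because three of them would total 6,291,456 bytes.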

Definition at line 570 of file CachingDeviceAllocator.h.

  {
    // CMS: throw exception on error
    cudaCheck(error = cudaGetDevice(&entrypoint_device));
    cudaCheck(error = cudaSetDevice(device));
  }

  if (recached) {
    // Insert the ready event in the associated stream (must have current device set properly)
    // CMS: throw exception on error
    cudaCheck(error = cudaEventRecord(search_key.ready_event, search_key.associated_stream));
  }

  // Unlock
  mutex.unlock();

  if (!recached) {
    // Free the allocation from the runtime and cleanup the event.
    // CMS: throw exception on error
    cudaCheck(error = cudaFree(d_ptr));
    cudaCheck(error = cudaEventDestroy(search_key.ready_event));

    if (debug)
      // CMS: improved debug message
      printf(
          "\tDevice %d freed %lld bytes at %p from associated stream %lld, event %lld.\n\t\t %lld available "
          "blocks cached (%lld bytes), %lld live blocks (%lld bytes) outstanding.\n",
          device,
          (long long)search_key.bytes,
          d_ptr,
          (long long)search_key.associated_stream,
          (long long)search_key.ready_event,
          (long long)cached_blocks.size(),
          (long long)cached_bytes[device].free,
          (long long)live_blocks.size(),
          (long long)cached_bytes[device].live);
  }

  // Reset device
  if ((entrypoint_device != INVALID_DEVICE_ORDINAL) && (entrypoint_device != device)) {
    // CMS: throw exception on error
    cudaCheck(error = cudaSetDevice(entrypoint_device));
  }

  return error;
}

cudaError_t DeviceFree(void *d_ptr) { return DeviceFree(INVALID_DEVICE_ORDINAL, d_ptr); }

cudaError_t FreeAllCached() {
  cudaError_t error = cudaSuccess;
  int entrypoint_device = INVALID_DEVICE_ORDINAL;
  int current_device = INVALID_DEVICE_ORDINAL;

  mutex.lock();

  while (!cached_blocks.empty()) {
    // Get first block
    CachedBlocks::iterator begin = cached_blocks.begin();

    // Get entry-point device ordinal if necessary

◆ DeviceFree() [2/2]

cudaError_t notcub::CachingDeviceAllocator::DeviceFree ( void *  d_ptr)
inline

Frees a live allocation of device memory on the current device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Definition at line 672 of file CachingDeviceAllocator.h.


◆ FreeAllCached()

cudaError_t notcub::CachingDeviceAllocator::FreeAllCached ( )
inline

Frees all cached device allocations on all devices.

Definition at line 677 of file CachingDeviceAllocator.h.

    {
      // CMS: throw exception on error
      cudaCheck(error = cudaSetDevice(entrypoint_device));
    }

    return error;
  }

  // CMS: make the destructor not virtual
  ~CachingDeviceAllocator() {
    if (!skip_cleanup)
      FreeAllCached();
  }
};
// end group UtilMgmt

}  // namespace notcub

#endif

Referenced by CUDAService::~CUDAService().

◆ IntPow()

static unsigned int notcub::CachingDeviceAllocator::IntPow ( unsigned int  base,
unsigned int  exp 
)
inlinestatic

Integer power function for an unsigned base and exponent.

Definition at line 211 of file CachingDeviceAllocator.h.

  {
    rounded_bytes *= base;
    power++;
  }
}

//---------------------------------------------------------------------
// Fields
//---------------------------------------------------------------------


Referenced by cms::cuda::allocator::getCachingDeviceAllocator(), and cms::cuda::allocator::getCachingHostAllocator().
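
The body of IntPow is not reproduced on this page. As an illustration only, one common way to write an integer power for unsigned operands is exponentiation by squaring; the name `int_pow_sketch` is hypothetical, and this is not claimed to be the code in CachingDeviceAllocator.h.

```cpp
// Hypothetical stand-in for IntPow: exponentiation by squaring.
// Unsigned arithmetic wraps on overflow, so results are only meaningful
// while base^exp fits in an unsigned int.
unsigned int int_pow_sketch(unsigned int base, unsigned int exp) {
  unsigned int result = 1;
  while (exp > 0) {
    if (exp & 1u)
      result *= base;  // fold in the current power when its bit is set
    base *= base;      // square for the next bit of the exponent
    exp >>= 1;
  }
  return result;
}
```

For the default configuration this gives `int_pow_sketch(8, 3)` = 512 for the minimum bin size and `int_pow_sketch(8, 7)` = 2097152 for the maximum bin size.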

◆ NearestPowerOf()

void notcub::CachingDeviceAllocator::NearestPowerOf ( unsigned int &  power,
size_t &  rounded_bytes,
unsigned int  base,
size_t  value 
)
inline

Rounds value up to the nearest power of base, returning the exponent in power and the rounded size in rounded_bytes.

Definition at line 226 of file CachingDeviceAllocator.h.

Referenced by SetMaxCachedBytes().
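
A minimal sketch of the rounding, built around the `rounded_bytes *= base; power++;` loop that appears in the source listing. The name `nearest_power_of_sketch` is hypothetical, and the sketch assumes base >= 2 and ignores overflow handling, which a production version would need.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for NearestPowerOf: finds the smallest power of
// `base` that is >= value. Assumes base >= 2 and that the result fits in size_t.
void nearest_power_of_sketch(unsigned int &power,
                             std::size_t &rounded_bytes,
                             unsigned int base,
                             std::size_t value) {
  assert(base >= 2);
  power = 0;
  rounded_bytes = 1;
  while (rounded_bytes < value) {
    rounded_bytes *= base;  // step to the next bin size
    power++;                // and the next bin index
  }
}
```

With base 8, a 1000-byte request rounds up to 4096 bytes (power 4), while a 100-byte request rounds to 512 bytes (power 3), which is the minimum bin of the default configuration.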

◆ SetMaxCachedBytes()

cudaError_t notcub::CachingDeviceAllocator::SetMaxCachedBytes ( size_t  max_cached_bytes)
inline

Sets the limit on the number of bytes this allocator is allowed to cache per device.

Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().

Definition at line 325 of file CachingDeviceAllocator.h.

{
  *d_ptr = nullptr;
  int entrypoint_device = INVALID_DEVICE_ORDINAL;
  cudaError_t error = cudaSuccess;

  if (device == INVALID_DEVICE_ORDINAL) {
    // CMS: throw exception on error
    cudaCheck(error = cudaGetDevice(&entrypoint_device));
    device = entrypoint_device;
  }

  // Create a block descriptor for the requested allocation

References bin_growth, cudaCheck, INVALID_DEVICE_ORDINAL, max_bin, and NearestPowerOf().

Member Data Documentation

◆ bin_growth

unsigned int notcub::CachingDeviceAllocator::bin_growth

Geometric growth factor for bin-sizes.

Definition at line 250 of file CachingDeviceAllocator.h.

Referenced by SetMaxCachedBytes().

◆ cached_blocks

CachedBlocks notcub::CachingDeviceAllocator::cached_blocks

Set of cached device allocations available for reuse.

Definition at line 263 of file CachingDeviceAllocator.h.

◆ cached_bytes

GpuCachedBytes notcub::CachingDeviceAllocator::cached_bytes

Map of device ordinal to aggregate cached bytes on that device.

Definition at line 262 of file CachingDeviceAllocator.h.

◆ debug

bool notcub::CachingDeviceAllocator::debug

Whether or not to print (de)allocation events to stdout.

Definition at line 260 of file CachingDeviceAllocator.h.


◆ INVALID_BIN

const unsigned int notcub::CachingDeviceAllocator::INVALID_BIN = (unsigned int)-1
static

Out-of-bounds bin.

Definition at line 130 of file CachingDeviceAllocator.h.

◆ INVALID_DEVICE_ORDINAL

const int notcub::CachingDeviceAllocator::INVALID_DEVICE_ORDINAL = -1
static

Invalid device ordinal.

Definition at line 138 of file CachingDeviceAllocator.h.

Referenced by SetMaxCachedBytes().

◆ INVALID_SIZE

const size_t notcub::CachingDeviceAllocator::INVALID_SIZE = (size_t)-1
static

Invalid size.

Definition at line 133 of file CachingDeviceAllocator.h.

◆ live_blocks

BusyBlocks notcub::CachingDeviceAllocator::live_blocks

Set of live device allocations currently in use.

Definition at line 264 of file CachingDeviceAllocator.h.

◆ max_bin

unsigned int notcub::CachingDeviceAllocator::max_bin

Maximum bin enumeration.

Definition at line 252 of file CachingDeviceAllocator.h.

Referenced by SetMaxCachedBytes().

◆ max_bin_bytes

size_t notcub::CachingDeviceAllocator::max_bin_bytes

Maximum bin size.

Definition at line 255 of file CachingDeviceAllocator.h.

◆ max_cached_bytes

size_t notcub::CachingDeviceAllocator::max_cached_bytes

Maximum aggregate cached bytes per device.

Definition at line 256 of file CachingDeviceAllocator.h.

◆ min_bin

unsigned int notcub::CachingDeviceAllocator::min_bin

Minimum bin enumeration.

Definition at line 251 of file CachingDeviceAllocator.h.

◆ min_bin_bytes

size_t notcub::CachingDeviceAllocator::min_bin_bytes

Minimum bin size.

Definition at line 254 of file CachingDeviceAllocator.h.

◆ mutex

std::mutex notcub::CachingDeviceAllocator::mutex

Mutex for thread-safety.

Definition at line 248 of file CachingDeviceAllocator.h.

◆ skip_cleanup

const bool notcub::CachingDeviceAllocator::skip_cleanup

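Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)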

Definition at line 259 of file CachingDeviceAllocator.h.
