A simple caching allocator for device memory allocations. More...
#include <CachingDeviceAllocator.h>
Classes | |
struct | BlockDescriptor |
Public Types | |
typedef std::multiset < BlockDescriptor, Compare > | BusyBlocks |
Set type for live blocks (ordered by ptr) More... | |
typedef std::multiset < BlockDescriptor, Compare > | CachedBlocks |
Set type for cached blocks (ordered by size) More... | |
typedef bool(* | Compare )(const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface. More... | |
using | GpuCachedBytes = cms::cuda::allocator::GpuCachedBytes |
Map type of device ordinals to the number of cached bytes cached by each device. More... | |
Public Member Functions | |
GpuCachedBytes | CacheStatus () const |
CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) | |
Set of live device allocations currently in use. More... | |
CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) | |
Default constructor. More... | |
cudaError_t | DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the specified device. More... | |
cudaError_t | DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the current device. More... | |
cudaError_t | DeviceFree (int device, void *d_ptr) |
Frees a live allocation of device memory on the specified device, returning it to the allocator. More... | |
cudaError_t | DeviceFree (void *d_ptr) |
Frees a live allocation of device memory on the current device, returning it to the allocator. More... | |
cudaError_t | FreeAllCached () |
Frees all cached device allocations on all devices. More... | |
void | NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value) |
cudaError_t | SetMaxCachedBytes (size_t max_cached_bytes) |
Sets the limit on the number bytes this allocator is allowed to cache per device. More... | |
~CachingDeviceAllocator () | |
Destructor. More... | |
Static Public Member Functions | |
static unsigned int | IntPow (unsigned int base, unsigned int exp) |
Public Attributes | |
unsigned int | bin_growth |
Mutex for thread-safety. More... | |
CachedBlocks | cached_blocks |
Map of device ordinal to aggregate cached bytes on that device. More... | |
GpuCachedBytes | cached_bytes |
Whether or not to print (de)allocation events to stdout. More... | |
bool | debug |
Whether or not to skip a call to FreeAllCached() when destructor is called. (The CUDA runtime may have already shut down for statically declared allocators) More... | |
BusyBlocks | live_blocks |
Set of cached device allocations available for reuse. More... | |
unsigned int | max_bin |
Minimum bin enumeration. More... | |
size_t | max_bin_bytes |
Minimum bin size. More... | |
size_t | max_cached_bytes |
Maximum bin size. More... | |
unsigned int | min_bin |
Geometric growth factor for bin-sizes. More... | |
size_t | min_bin_bytes |
Maximum bin enumeration. More... | |
std::mutex | mutex |
const bool | skip_cleanup |
Maximum aggregate cached bytes per device. More... | |
Static Public Attributes | |
static const unsigned int | INVALID_BIN = (unsigned int)-1 |
Out-of-bounds bin. More... | |
static const int | INVALID_DEVICE_ORDINAL = -1 |
Invalid device ordinal. More... | |
static const size_t | INVALID_SIZE = (size_t)-1 |
Invalid size. More... | |
A simple caching allocator for device memory allocations.
active_stream
. Once freed, the allocation becomes available immediately for reuse within the active_stream
with which it was associated with during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream
has completed.bin_growth
provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.bin_growth
^ min_bin
) are rounded up to (bin_growth
^ min_bin
).bin_growth
^ max_bin
) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.max_cached_bytes
, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.bin_growth
= 8min_bin
= 3max_bin
= 7max_cached_bytes
= 6MB - 1BDefinition at line 100 of file CachingDeviceAllocator.h.
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::BusyBlocks |
Set type for live blocks (ordered by ptr)
Definition at line 178 of file CachingDeviceAllocator.h.
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::CachedBlocks |
Set type for cached blocks (ordered by size)
Definition at line 175 of file CachingDeviceAllocator.h.
typedef bool(* notcub::CachingDeviceAllocator::Compare)(const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface.
Definition at line 170 of file CachingDeviceAllocator.h.
Map type of device ordinals to the number of cached bytes cached by each device.
Definition at line 182 of file CachingDeviceAllocator.h.
|
inline |
Set of live device allocations currently in use.
Constructor.
bin_growth | Geometric growth factor for bin-sizes |
min_bin | Minimum bin (default is bin_growth ^ 1) |
max_bin | Maximum bin (default is no max bin) |
max_cached_bytes | Maximum aggregate cached bytes per device (default is no limit) |
skip_cleanup | Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate) |
debug | Whether or not to print (de)allocation events to stdout (default is no stderr output) |
Definition at line 255 of file CachingDeviceAllocator.h.
|
inline |
Default constructor.
Configured with:
bin_growth
= 8min_bin
= 3max_bin
= 7max_cached_bytes
= (bin_growth
^ max_bin
) * 3) - 1 = 6,291,455 byteswhich delineates five bin-sizes: 512B, 4KB, 32KB, 256KB, and 2MB and sets a maximum of 6,291,455 cached bytes per device
Definition at line 287 of file CachingDeviceAllocator.h.
|
inline |
Destructor.
Definition at line 737 of file CachingDeviceAllocator.h.
References FreeAllCached(), and skip_cleanup.
|
inline |
Definition at line 728 of file CachingDeviceAllocator.h.
References cached_bytes, and mutex.
Referenced by cms::cuda::deviceAllocatorStatus().
|
inline |
Provides a suitable allocation of device memory for the given size on the specified device.
Once freed, the allocation becomes available immediately for reuse within the active_stream
with which it was associated with during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream
has completed.
[in] | device | Device on which to place the allocation |
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 331 of file CachingDeviceAllocator.h.
References notcub::CachingDeviceAllocator::BlockDescriptor::associated_stream, notcub::CachingDeviceAllocator::BlockDescriptor::bin, bin_growth, notcub::CachingDeviceAllocator::BlockDescriptor::bytes, notcub::CachingDeviceAllocator::BlockDescriptor::bytesRequested, cached_blocks, cached_bytes, cudaCheck, notcub::CachingDeviceAllocator::BlockDescriptor::d_ptr, debug, relativeConstraints::error, newFWLiteAna::found, INVALID_BIN, INVALID_DEVICE_ORDINAL, beam_dqm_sourceclient-live_cfg::live, live_blocks, max_bin, min_bin, min_bin_bytes, mutex, NearestPowerOf(), gpuVertexFinder::printf(), and notcub::CachingDeviceAllocator::BlockDescriptor::ready_event.
Referenced by DeviceAllocate().
|
inline |
Provides a suitable allocation of device memory for the given size on the current device.
Once freed, the allocation becomes available immediately for reuse within the active_stream
with which it was associated with during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream
has completed.
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 541 of file CachingDeviceAllocator.h.
References DeviceAllocate(), and INVALID_DEVICE_ORDINAL.
|
inline |
Frees a live allocation of device memory on the specified device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream
with which it was associated with during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream
has completed.
Definition at line 556 of file CachingDeviceAllocator.h.
References notcub::CachingDeviceAllocator::BlockDescriptor::associated_stream, notcub::CachingDeviceAllocator::BlockDescriptor::bin, notcub::CachingDeviceAllocator::BlockDescriptor::bytes, notcub::CachingDeviceAllocator::BlockDescriptor::bytesRequested, cached_blocks, cached_bytes, cudaCheck, debug, relativeConstraints::error, INVALID_BIN, INVALID_DEVICE_ORDINAL, beam_dqm_sourceclient-live_cfg::live, live_blocks, max_cached_bytes, mutex, gpuVertexFinder::printf(), and notcub::CachingDeviceAllocator::BlockDescriptor::ready_event.
|
inline |
Frees a live allocation of device memory on the current device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream
with which it was associated with during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream
has completed.
Definition at line 661 of file CachingDeviceAllocator.h.
References DeviceFree(), and INVALID_DEVICE_ORDINAL.
Referenced by DeviceFree().
|
inline |
Frees all cached device allocations on all devices.
Definition at line 666 of file CachingDeviceAllocator.h.
References SplitLinear::begin, cached_blocks, cached_bytes, cudaCheck, debug, relativeConstraints::error, INVALID_DEVICE_ORDINAL, beam_dqm_sourceclient-live_cfg::live, live_blocks, mutex, and gpuVertexFinder::printf().
Referenced by cms::cuda::allocator::cachingAllocatorsFreeCached(), and ~CachingDeviceAllocator().
|
inlinestatic |
Integer pow function for unsigned base and exponent
Definition at line 191 of file CachingDeviceAllocator.h.
References newFWLiteAna::base.
Referenced by cms::cuda::allocator::getCachingDeviceAllocator(), and cms::cuda::allocator::getCachingHostAllocator().
|
inline |
Round up to the nearest power-of
Definition at line 206 of file CachingDeviceAllocator.h.
References newFWLiteAna::base.
Referenced by DeviceAllocate().
|
inline |
Sets the limit on the number bytes this allocator is allowed to cache per device.
Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached()
.
Definition at line 305 of file CachingDeviceAllocator.h.
References debug, max_cached_bytes, mutex, and gpuVertexFinder::printf().
unsigned int notcub::CachingDeviceAllocator::bin_growth |
Mutex for thread-safety.
Definition at line 230 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate().
CachedBlocks notcub::CachingDeviceAllocator::cached_blocks |
Map of device ordinal to aggregate cached bytes on that device.
Definition at line 243 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate(), DeviceFree(), and FreeAllCached().
GpuCachedBytes notcub::CachingDeviceAllocator::cached_bytes |
Whether or not to print (de)allocation events to stdout.
Definition at line 242 of file CachingDeviceAllocator.h.
Referenced by CacheStatus(), DeviceAllocate(), DeviceFree(), and FreeAllCached().
bool notcub::CachingDeviceAllocator::debug |
Whether or not to skip a call to FreeAllCached() when destructor is called. (The CUDA runtime may have already shut down for statically declared allocators)
Definition at line 240 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate(), DeviceFree(), rrapi.RRApi::dprint(), FreeAllCached(), rrapi.RRApi::get(), runTauIdMVA.TauIDEmbedder::loadMVA_WPs_run2_2017(), runTauIdMVA.TauIDEmbedder::runTauID(), and SetMaxCachedBytes().
|
static |
Out-of-bounds bin.
Definition at line 106 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate(), and DeviceFree().
|
static |
Invalid device ordinal.
Definition at line 114 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate(), DeviceFree(), and FreeAllCached().
|
static |
Invalid size.
Definition at line 109 of file CachingDeviceAllocator.h.
BusyBlocks notcub::CachingDeviceAllocator::live_blocks |
Set of cached device allocations available for reuse.
Definition at line 244 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate(), DeviceFree(), and FreeAllCached().
unsigned int notcub::CachingDeviceAllocator::max_bin |
Minimum bin enumeration.
Definition at line 232 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate().
size_t notcub::CachingDeviceAllocator::max_bin_bytes |
Minimum bin size.
Definition at line 235 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::max_cached_bytes |
Maximum bin size.
Definition at line 236 of file CachingDeviceAllocator.h.
Referenced by DeviceFree(), and SetMaxCachedBytes().
unsigned int notcub::CachingDeviceAllocator::min_bin |
Geometric growth factor for bin-sizes.
Definition at line 231 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate().
size_t notcub::CachingDeviceAllocator::min_bin_bytes |
Maximum bin enumeration.
Definition at line 234 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate().
|
mutable |
Definition at line 228 of file CachingDeviceAllocator.h.
Referenced by CacheStatus(), DeviceAllocate(), DeviceFree(), FreeAllCached(), and SetMaxCachedBytes().
const bool notcub::CachingDeviceAllocator::skip_cleanup |
Maximum aggregate cached bytes per device.
Definition at line 239 of file CachingDeviceAllocator.h.
Referenced by ~CachingDeviceAllocator().