A simple caching allocator for device memory allocations.
#include <CachingDeviceAllocator.h>
Classes | |
struct | BlockDescriptor |
Public Types | |
typedef std::multiset< BlockDescriptor, Compare > | BusyBlocks |
Set type for live blocks (ordered by ptr)
typedef std::multiset< BlockDescriptor, Compare > | CachedBlocks |
Set type for cached blocks (ordered by size)
typedef bool(* | Compare) (const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface.
using | GpuCachedBytes = cms::cuda::allocator::GpuCachedBytes |
Map type from device ordinals to the number of bytes cached on each device.
Public Member Functions | |
GpuCachedBytes | CacheStatus () const |
CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) | |
Default constructor.
CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) | |
Constructor.
cudaError_t | DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the specified device.
cudaError_t | DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the current device.
cudaError_t | DeviceFree (int device, void *d_ptr) |
Frees a live allocation of device memory on the specified device, returning it to the allocator.
cudaError_t | DeviceFree (void *d_ptr) |
Frees a live allocation of device memory on the current device, returning it to the allocator.
cudaError_t | FreeAllCached () |
Frees all cached device allocations on all devices.
void | NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value) |
cudaError_t | SetMaxCachedBytes (size_t max_cached_bytes) |
Sets the limit on the number of bytes this allocator is allowed to cache per device.
~CachingDeviceAllocator () | |
Destructor.
Static Public Member Functions | |
static unsigned int | IntPow (unsigned int base, unsigned int exp) |
Public Attributes | |
unsigned int | bin_growth |
Geometric growth factor for bin-sizes.
CachedBlocks | cached_blocks |
Set of cached device allocations available for reuse.
GpuCachedBytes | cached_bytes |
Map of device ordinal to aggregate cached bytes on that device.
bool | debug |
Whether or not to print (de)allocation events to stdout.
BusyBlocks | live_blocks |
Set of live device allocations currently in use.
unsigned int | max_bin |
Maximum bin enumeration.
size_t | max_bin_bytes |
Maximum bin size.
size_t | max_cached_bytes |
Maximum aggregate cached bytes per device.
unsigned int | min_bin |
Minimum bin enumeration.
size_t | min_bin_bytes |
Minimum bin size.
std::mutex | mutex |
Mutex for thread-safety.
const bool | skip_cleanup |
Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)
Static Public Attributes | |
static const unsigned int | INVALID_BIN = (unsigned int)-1 |
Out-of-bounds bin.
static const int | INVALID_DEVICE_ORDINAL = -1 |
Invalid device ordinal.
static const size_t | INVALID_SIZE = (size_t)-1 |
Invalid size.
A simple caching allocator for device memory allocations.
Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
If the total storage of cached allocations on a given device would exceed max_cached_bytes, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.
For example, the default-constructed CachingDeviceAllocator is configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = 6MB - 1B
Definition at line 124 of file CachingDeviceAllocator.h.
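The binning rules above can be illustrated with a small standalone sketch. It is not the header's code: `BinChoice` and `classifyRequest` are hypothetical names introduced here, no CUDA call is made, and the defaults simply mirror the default-constructed configuration (bin_growth = 8, min_bin = 3, max_bin = 7).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper type (not part of CachingDeviceAllocator): the result
// of classifying one allocation request under the binning rules.
struct BinChoice {
  bool cacheable;      // false if the request exceeds the largest bin
  unsigned int bin;    // exponent of the smallest power of bin_growth that fits
  std::size_t bytes;   // number of bytes that would actually be allocated
};

// Round `requested` up to the smallest power of `bin_growth` that holds it,
// raise it to the smallest bin if needed, and leave oversized requests unrounded.
inline BinChoice classifyRequest(std::size_t requested,
                                 unsigned int bin_growth = 8,
                                 unsigned int min_bin = 3,
                                 unsigned int max_bin = 7) {
  assert(bin_growth > 1);
  // Sketch-only guard so the rounding loop below cannot overflow.
  assert(requested <= std::numeric_limits<std::size_t>::max() / bin_growth);

  unsigned int power = 0;
  std::size_t rounded = 1;
  while (rounded < requested) {  // smallest bin_growth^power >= requested
    rounded *= bin_growth;
    ++power;
  }
  while (power < min_bin) {      // requests below the smallest bin round up to it
    rounded *= bin_growth;
    ++power;
  }
  if (power > max_bin) {
    // Larger than the largest bin: allocated as requested and never cached.
    return {false, power, requested};
  }
  return {true, power, rounded};
}
```

Under these assumptions a 100-byte request lands in the 512B bin, a 513-byte request in the 4KB bin, and a request of 2MB + 1B is not binned at all.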
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::BusyBlocks |
Set type for live blocks (ordered by ptr)
Definition at line 203 of file CachingDeviceAllocator.h.
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::CachedBlocks |
Set type for cached blocks (ordered by size)
Definition at line 200 of file CachingDeviceAllocator.h.
typedef bool(* notcub::CachingDeviceAllocator::Compare) (const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface.
Definition at line 195 of file CachingDeviceAllocator.h.
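Because Compare is a plain function pointer, the same std::multiset type can serve both block sets, with the ordering chosen when each set is constructed. The sketch below shows that pattern with a simplified stand-in for BlockDescriptor. The struct fields, the two comparators, and `findCachedBlock` are assumptions made for illustration, not the header's definitions.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Simplified stand-in for BlockDescriptor (assumed fields only).
struct Block {
  int device;
  std::size_t bytes;
  void* ptr;
};

// Same shape as the Compare typedef: a pointer to a comparison function.
typedef bool (*BlockCompare)(const Block&, const Block&);

// Assumed ordering for cached blocks: by device, then by size.
inline bool bySize(const Block& a, const Block& b) {
  if (a.device != b.device) return a.device < b.device;
  return a.bytes < b.bytes;
}

// Assumed ordering for live blocks: by device, then by address.
inline bool byPtr(const Block& a, const Block& b) {
  if (a.device != b.device) return a.device < b.device;
  return std::less<void*>()(a.ptr, b.ptr);
}

typedef std::multiset<Block, BlockCompare> BlockSet;

// Look up a cached block on `device` whose size equals `bin_bytes`
// (the request size already rounded to its bin). Returns end() if none.
inline BlockSet::const_iterator findCachedBlock(const BlockSet& cached,
                                                int device,
                                                std::size_t bin_bytes) {
  Block key = {device, bin_bytes, nullptr};
  BlockSet::const_iterator it = cached.lower_bound(key);
  if (it != cached.end() && it->device == device && it->bytes == bin_bytes) {
    return it;
  }
  return cached.end();
}
```

A set is then constructed with the comparator it should use, for example `BlockSet cached(bySize);` and `BlockSet live(byPtr);`.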
using notcub::CachingDeviceAllocator::GpuCachedBytes = cms::cuda::allocator::GpuCachedBytes |
Map type from device ordinals to the number of bytes cached on each device.
Definition at line 207 of file CachingDeviceAllocator.h.
notcub::CachingDeviceAllocator::CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) | inline |
Constructor.
bin_growth | Geometric growth factor for bin-sizes |
min_bin | Minimum bin (default is bin_growth ^ 1) |
max_bin | Maximum bin (default is no max bin) |
max_cached_bytes | Maximum aggregate cached bytes per device (default is no limit) |
skip_cleanup | Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate) |
debug | Whether or not to print (de)allocation events to stdout (default is no output) |
Definition at line 280 of file CachingDeviceAllocator.h.
notcub::CachingDeviceAllocator::CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) | inline |
Default constructor.
Configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes
which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
Definition at line 312 of file CachingDeviceAllocator.h.
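The figures quoted for the default configuration can be checked with a few lines of integer arithmetic. The helper names below (`binSizes`, `defaultStyleCacheLimit`) are hypothetical and exist only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: list the bin sizes bin_growth^min_bin .. bin_growth^max_bin.
inline std::vector<std::size_t> binSizes(unsigned int bin_growth,
                                         unsigned int min_bin,
                                         unsigned int max_bin) {
  assert(min_bin <= max_bin);
  std::vector<std::size_t> sizes;
  std::size_t size = 1;  // bin_growth^0
  for (unsigned int bin = 0; bin <= max_bin; ++bin) {
    if (bin >= min_bin) sizes.push_back(size);
    size *= bin_growth;
  }
  return sizes;
}

// Hypothetical helper: ((bin_growth ^ max_bin) * 3) - 1, as stated above.
inline std::size_t defaultStyleCacheLimit(unsigned int bin_growth,
                                          unsigned int max_bin) {
  std::size_t largest = 1;
  for (unsigned int i = 0; i < max_bin; ++i) largest *= bin_growth;
  return largest * 3 - 1;
}
```

With bin_growth = 8, min_bin = 3 and max_bin = 7, `binSizes` yields 512, 4096, 32768, 262144 and 2097152 bytes, and `defaultStyleCacheLimit` yields 6291455, that is 6MB - 1B.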
notcub::CachingDeviceAllocator::~CachingDeviceAllocator () | inline |
Destructor.
Definition at line 762 of file CachingDeviceAllocator.h.
GpuCachedBytes notcub::CachingDeviceAllocator::CacheStatus () const | inline |
Definition at line 753 of file CachingDeviceAllocator.h.
Referenced by cms::cuda::deviceAllocatorStatus().
cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) | inline |
Provides a suitable allocation of device memory for the given size on the specified device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
[in] | device | Device on which to place the allocation |
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 356 of file CachingDeviceAllocator.h.
References INVALID_BIN.
cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) | inline |
Provides a suitable allocation of device memory for the given size on the current device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 566 of file CachingDeviceAllocator.h.
cudaError_t notcub::CachingDeviceAllocator::DeviceFree (int device, void *d_ptr) | inline |
Frees a live allocation of device memory on the specified device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Definition at line 581 of file CachingDeviceAllocator.h.
References cached_blocks, cached_bytes, debug, and live_blocks.
cudaError_t notcub::CachingDeviceAllocator::DeviceFree (void *d_ptr) | inline |
Frees a live allocation of device memory on the current device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Definition at line 686 of file CachingDeviceAllocator.h.
cudaError_t notcub::CachingDeviceAllocator::FreeAllCached () | inline |
Frees all cached device allocations on all devices.
Definition at line 691 of file CachingDeviceAllocator.h.
Referenced by CUDAService::~CUDAService().
static unsigned int notcub::CachingDeviceAllocator::IntPow (unsigned int base, unsigned int exp) | inline static |
Integer pow function for unsigned base and exponent.
Definition at line 216 of file CachingDeviceAllocator.h.
Referenced by cms::cuda::allocator::getCachingDeviceAllocator(), and cms::cuda::allocator::getCachingHostAllocator().
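One common way to implement an integer power for unsigned operands is exponentiation by squaring, sketched below. This page does not show the body of IntPow, so treat the loop as an assumption about a reasonable implementation. The name `intPowSketch` is hypothetical, and overflow simply wraps as usual for unsigned arithmetic.

```cpp
// Hypothetical stand-in for an unsigned integer power function.
// Uses exponentiation by squaring: O(log exp) multiplications.
inline unsigned int intPowSketch(unsigned int base, unsigned int exp) {
  unsigned int result = 1;
  while (exp > 0) {
    if (exp & 1u) {
      result *= base;  // fold in the current bit of the exponent
    }
    base *= base;      // square the base for the next bit
    exp >>= 1;
  }
  return result;
}
```

For the default configuration this gives `intPowSketch(8, 3) == 512` for the smallest bin and `intPowSketch(8, 7) == 2097152` for the largest.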
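void notcub::CachingDeviceAllocator::NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value) | inline |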
cudaError_t notcub::CachingDeviceAllocator::SetMaxCachedBytes (size_t max_cached_bytes) | inline |
Sets the limit on the number of bytes this allocator is allowed to cache per device.
Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().
Definition at line 330 of file CachingDeviceAllocator.h.
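The effect of the limit can be modeled without any CUDA calls. In the sketch below, `CacheLedger` and `tryCache` are hypothetical names. The model only tracks per-device byte totals and decides whether a freed block may stay cached, under the documented rule that a block is released rather than cached when caching it would exceed max_cached_bytes. The separate rule for blocks larger than the largest bin is left out.

```cpp
#include <cstddef>
#include <map>

// Hypothetical bookkeeping model; the real allocator also tracks the blocks
// themselves and performs the actual device frees.
struct CacheLedger {
  std::size_t max_cached_bytes;             // per-device ceiling
  std::map<int, std::size_t> cached_bytes;  // device ordinal -> bytes cached

  // Returns true if a freed block of `bytes` may be kept in the cache on
  // `device`; false if it should be released to the device instead.
  bool tryCache(int device, std::size_t bytes) {
    std::size_t& total = cached_bytes[device];
    // Written to avoid overflow in total + bytes.
    if (bytes > max_cached_bytes || total > max_cached_bytes - bytes) {
      return false;
    }
    total += bytes;
    return true;
  }
};
```

Lowering `max_cached_bytes` in this model leaves the recorded totals untouched, which matches the statement that changing the ceiling frees nothing by itself. Only later frees are affected until the cache is emptied explicitly.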
unsigned int notcub::CachingDeviceAllocator::bin_growth |
Geometric growth factor for bin-sizes.
Definition at line 255 of file CachingDeviceAllocator.h.
CachedBlocks notcub::CachingDeviceAllocator::cached_blocks |
Set of cached device allocations available for reuse.
Definition at line 268 of file CachingDeviceAllocator.h.
Referenced by DeviceFree().
GpuCachedBytes notcub::CachingDeviceAllocator::cached_bytes |
Map of device ordinal to aggregate cached bytes on that device.
Definition at line 267 of file CachingDeviceAllocator.h.
Referenced by DeviceFree().
bool notcub::CachingDeviceAllocator::debug |
Whether or not to print (de)allocation events to stdout.
Definition at line 265 of file CachingDeviceAllocator.h.
Referenced by DeviceFree().
const unsigned int notcub::CachingDeviceAllocator::INVALID_BIN = (unsigned int)-1 | static |
Out-of-bounds bin.
Definition at line 131 of file CachingDeviceAllocator.h.
Referenced by DeviceAllocate().
const int notcub::CachingDeviceAllocator::INVALID_DEVICE_ORDINAL = -1 | static |
Invalid device ordinal.
Definition at line 139 of file CachingDeviceAllocator.h.
const size_t notcub::CachingDeviceAllocator::INVALID_SIZE = (size_t)-1 | static |
Invalid size.
Definition at line 134 of file CachingDeviceAllocator.h.
BusyBlocks notcub::CachingDeviceAllocator::live_blocks |
Set of live device allocations currently in use.
Definition at line 269 of file CachingDeviceAllocator.h.
Referenced by DeviceFree().
unsigned int notcub::CachingDeviceAllocator::max_bin |
Maximum bin enumeration.
Definition at line 257 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::max_bin_bytes |
Maximum bin size.
Definition at line 260 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::max_cached_bytes |
Maximum aggregate cached bytes per device.
Definition at line 261 of file CachingDeviceAllocator.h.
unsigned int notcub::CachingDeviceAllocator::min_bin |
Minimum bin enumeration.
Definition at line 256 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::min_bin_bytes |
Minimum bin size.
Definition at line 259 of file CachingDeviceAllocator.h.
std::mutex notcub::CachingDeviceAllocator::mutex | mutable |
Mutex for thread-safety.
Definition at line 253 of file CachingDeviceAllocator.h.
const bool notcub::CachingDeviceAllocator::skip_cleanup |
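Whether or not to skip a call to FreeAllCached() when the destructor is called. (The CUDA runtime may have already shut down for statically declared allocators.)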
Definition at line 264 of file CachingDeviceAllocator.h.