A simple caching allocator for device memory allocations. More...
#include <CachingDeviceAllocator.h>
Classes | |
struct | BlockDescriptor |
class | TotalBytes |
Public Types | |
typedef std::multiset< BlockDescriptor, Compare > | BusyBlocks |
Set type for live blocks (ordered by ptr) More... | |
typedef std::multiset< BlockDescriptor, Compare > | CachedBlocks |
Set type for cached blocks (ordered by size) More... | |
typedef bool(* | Compare) (const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface. More... | |
typedef std::map< int, TotalBytes > | GpuCachedBytes |
Map type of device ordinals to the number of bytes cached by each device. More... | |
Public Member Functions | |
CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) | |
Default constructor. More... | |
CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) | |
Constructor. More... | |
cudaError_t | DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the specified device. More... | |
cudaError_t | DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |
Provides a suitable allocation of device memory for the given size on the current device. More... | |
cudaError_t | DeviceFree (int device, void *d_ptr) |
Frees a live allocation of device memory on the specified device, returning it to the allocator. More... | |
cudaError_t | DeviceFree (void *d_ptr) |
Frees a live allocation of device memory on the current device, returning it to the allocator. More... | |
cudaError_t | FreeAllCached () |
Frees all cached device allocations on all devices. More... | |
void | NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value) |
cudaError_t | SetMaxCachedBytes (size_t max_cached_bytes) |
Sets the limit on the number of bytes this allocator is allowed to cache per device. More... | |
~CachingDeviceAllocator () | |
Destructor. More... | |
Static Public Member Functions | |
static unsigned int | IntPow (unsigned int base, unsigned int exp) |
Public Attributes | |
unsigned int | bin_growth |
Geometric growth factor for bin-sizes. More... | |
CachedBlocks | cached_blocks |
Set of cached device allocations available for reuse. More... | |
GpuCachedBytes | cached_bytes |
Map of device ordinal to aggregate cached bytes on that device. More... | |
bool | debug |
Whether or not to print (de)allocation events to stdout. More... | |
BusyBlocks | live_blocks |
Set of live device allocations currently in use. More... | |
unsigned int | max_bin |
Maximum bin enumeration. More... | |
size_t | max_bin_bytes |
Maximum bin size. More... | |
size_t | max_cached_bytes |
Maximum aggregate cached bytes per device. More... | |
unsigned int | min_bin |
Minimum bin enumeration. More... | |
size_t | min_bin_bytes |
Minimum bin size. More... | |
std::mutex | mutex |
Mutex for thread-safety. More... | |
const bool | skip_cleanup |
Whether or not to skip a call to FreeAllCached() when destructor is called. (The CUDA runtime may have already shut down for statically declared allocators) More... | |
Static Public Attributes | |
static const unsigned int | INVALID_BIN = (unsigned int)-1 |
Out-of-bounds bin. More... | |
static const int | INVALID_DEVICE_ORDINAL = -1 |
Invalid device ordinal. More... | |
static const size_t | INVALID_SIZE = (size_t)-1 |
Invalid size. More... | |
A simple caching allocator for device memory allocations.
Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Allocations are categorized and cached by bin size. Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
If the total storage of cached allocations on a given device would exceed max_cached_bytes, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.
For example, the default-constructed CachingDeviceAllocator is configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = 6MB - 1B
Definition at line 123 of file CachingDeviceAllocator.h.
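The binning rules above can be illustrated with a small host-only sketch. It assumes that a request is rounded up to the next power of bin_growth, never below bin_growth ^ min_bin, and that anything larger than bin_growth ^ max_bin bypasses the cache at its exact size. `BinChoice` and `classify` are hypothetical names used only for illustration; they are not part of the class, and the sketch has no overflow guard for very large requests.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch only.
struct BinChoice {
    bool cacheable;         // false when the request exceeds the largest bin
    unsigned int bin;       // bin exponent (meaningful only if cacheable)
    std::size_t bin_bytes;  // bytes that would be reserved for the request
};

// Classify a request under the rules listed above (assumed semantics).
BinChoice classify(std::size_t bytes, unsigned int bin_growth,
                   unsigned int min_bin, unsigned int max_bin) {
    unsigned int bin = 0;
    std::size_t rounded = 1;
    // Round up to the next power of bin_growth, but never below min_bin.
    while (rounded < bytes || bin < min_bin) {
        rounded *= bin_growth;
        ++bin;
    }
    if (bin > max_bin) {
        // Too large for any bin: kept at its exact size and never cached.
        return {false, bin, bytes};
    }
    return {true, bin, rounded};
}
```

With the default configuration, a 100-byte request would land in the 512B bin and a 1000-byte request in the 4KB bin, while a 3,000,000-byte request exceeds the 2MB bin and would not be cached.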
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::BusyBlocks |
Set type for live blocks (ordered by ptr)
Definition at line 199 of file CachingDeviceAllocator.h.
typedef std::multiset<BlockDescriptor, Compare> notcub::CachingDeviceAllocator::CachedBlocks |
Set type for cached blocks (ordered by size)
Definition at line 196 of file CachingDeviceAllocator.h.
typedef bool(* notcub::CachingDeviceAllocator::Compare) (const BlockDescriptor &, const BlockDescriptor &) |
BlockDescriptor comparator function interface.
Definition at line 186 of file CachingDeviceAllocator.h.
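BusyBlocks and CachedBlocks are the same multiset type and differ only in ordering, so the comparator has to be supplied when each set is constructed. The sketch below shows that pattern with a function-pointer comparator. `Block`, `BlockCompare`, `by_size`, `by_ptr`, and the two factory functions are hypothetical stand-ins; the real BlockDescriptor fields and comparator functions are not shown on this page.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical stand-in for BlockDescriptor; the real fields are not shown here.
struct Block {
    void* ptr;
    std::size_t bytes;
};

// Same shape as the Compare typedef: a plain function pointer.
typedef bool (*BlockCompare)(const Block&, const Block&);

// Order by size, as a cache lookup for "smallest block that fits" would need.
bool by_size(const Block& a, const Block& b) { return a.bytes < b.bytes; }

// Order by address; std::less gives a well-defined order for unrelated pointers.
bool by_ptr(const Block& a, const Block& b) {
    return std::less<const void*>()(a.ptr, b.ptr);
}

// One multiset type serves both orderings; the comparator is chosen at construction.
typedef std::multiset<Block, BlockCompare> BlockSet;

BlockSet make_cached_set() { return BlockSet(by_size); }
BlockSet make_live_set() { return BlockSet(by_ptr); }
```

Because the comparator is a runtime value rather than a template argument, a default-constructed set of this type would hold a null function pointer, which is why each set is built through a constructor that receives the comparator.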
typedef std::map<int, TotalBytes> notcub::CachingDeviceAllocator::GpuCachedBytes |
Map type of device ordinals to the number of cached bytes cached by each device.
Definition at line 202 of file CachingDeviceAllocator.h.
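notcub::CachingDeviceAllocator::CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) |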
inline |
Constructor.
bin_growth | Geometric growth factor for bin-sizes |
min_bin | Minimum bin (default is bin_growth ^ 1) |
max_bin | Maximum bin (default is no max bin) |
max_cached_bytes | Maximum aggregate cached bytes per device (default is no limit) |
skip_cleanup | Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate) |
debug | Whether or not to print (de)allocation events to stdout (default is no output) |
Definition at line 275 of file CachingDeviceAllocator.h.
notcub::CachingDeviceAllocator::CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) |
inline |
Default constructor.
Configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes
which delineates five bin-sizes: 512B, 4KB, 32KB, 256KB, and 2MB and sets a maximum of 6,291,455 cached bytes per device.
Definition at line 307 of file CachingDeviceAllocator.h.
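The figures quoted for the default configuration can be checked with a few lines of arithmetic. This is a minimal sketch; `bin_sizes` and `default_max_cached_bytes` are hypothetical helper names, not members of the class.

```cpp
#include <cstddef>
#include <vector>

// Bin sizes bin_growth^min_bin ... bin_growth^max_bin, in bytes.
std::vector<std::size_t> bin_sizes(unsigned int bin_growth,
                                   unsigned int min_bin, unsigned int max_bin) {
    std::vector<std::size_t> sizes;
    std::size_t bytes = 1;  // bin_growth^0
    for (unsigned int bin = 0; bin <= max_bin; ++bin) {
        if (bin >= min_bin) sizes.push_back(bytes);
        bytes *= bin_growth;
    }
    return sizes;
}

// Default cap quoted above: three blocks of the largest bin, minus one byte.
std::size_t default_max_cached_bytes(std::size_t largest_bin_bytes) {
    return largest_bin_bytes * 3 - 1;
}
```

For bin_growth = 8, min_bin = 3, and max_bin = 7 this yields 512, 4096, 32768, 262144, and 2097152 bytes, and a cap of 6,291,455 bytes. The cap is one byte short of three 2MB blocks.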
notcub::CachingDeviceAllocator::~CachingDeviceAllocator () |
inline |
Destructor.
Definition at line 742 of file CachingDeviceAllocator.h.
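cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |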
inline |
Provides a suitable allocation of device memory for the given size on the specified device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
[in] | device | Device on which to place the allocation |
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 350 of file CachingDeviceAllocator.h.
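cudaError_t notcub::CachingDeviceAllocator::DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=nullptr) |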
inline |
Provides a suitable allocation of device memory for the given size on the current device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
Definition at line 555 of file CachingDeviceAllocator.h.
cudaError_t notcub::CachingDeviceAllocator::DeviceFree (int device, void *d_ptr) |
inline |
Frees a live allocation of device memory on the specified device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Definition at line 570 of file CachingDeviceAllocator.h.
cudaError_t notcub::CachingDeviceAllocator::DeviceFree (void *d_ptr) |
inline |
Frees a live allocation of device memory on the current device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Definition at line 672 of file CachingDeviceAllocator.h.
cudaError_t notcub::CachingDeviceAllocator::FreeAllCached () |
inline |
Frees all cached device allocations on all devices.
Definition at line 677 of file CachingDeviceAllocator.h.
Referenced by CUDAService::~CUDAService().
static unsigned int notcub::CachingDeviceAllocator::IntPow (unsigned int base, unsigned int exp) |
inline static |
Integer pow function for unsigned base and exponent
Definition at line 211 of file CachingDeviceAllocator.h.
References newFWLiteAna::base.
Referenced by cms::cuda::allocator::getCachingDeviceAllocator(), and cms::cuda::allocator::getCachingHostAllocator().
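The function body is not reproduced on this page. As a minimal sketch, assuming the function computes base^exp in ordinary wrap-around unsigned arithmetic, an integer power can be written with exponentiation by squaring. The name `int_pow_sketch` is hypothetical.

```cpp
// Hypothetical stand-in for IntPow: computes base^exp by repeated squaring.
// Overflow wraps around, as with any unsigned arithmetic.
unsigned int int_pow_sketch(unsigned int base, unsigned int exp) {
    unsigned int result = 1;
    while (exp > 0) {
        if (exp & 1u) result *= base;  // fold in the factor for the current bit
        base *= base;                  // square for the next bit of exp
        exp >>= 1;
    }
    return result;
}
```

With the default bin_growth of 8, this gives 8^3 = 512 for the smallest bin and 8^7 = 2,097,152 for the largest.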
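void notcub::CachingDeviceAllocator::NearestPowerOf (unsigned int &power, size_t &rounded_bytes, unsigned int base, size_t value) |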
inline |
Round up to the nearest power of the given base.
Definition at line 226 of file CachingDeviceAllocator.h.
Referenced by SetMaxCachedBytes().
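A minimal sketch of such a rounding step, assuming that `power` and `rounded_bytes` are outputs receiving the smallest exponent and the corresponding value base^power that is at least `value`. The name `nearest_power_of_sketch` is hypothetical, and the sketch assumes base >= 2 and omits any overflow guard.

```cpp
#include <cstddef>

// Hypothetical sketch: find the smallest power such that base^power >= value.
// Assumes base >= 2 and that the result fits in size_t (no overflow guard).
void nearest_power_of_sketch(unsigned int& power, std::size_t& rounded_bytes,
                             unsigned int base, std::size_t value) {
    power = 0;
    rounded_bytes = 1;
    while (rounded_bytes < value) {
        rounded_bytes *= base;
        ++power;
    }
}
```

For base 8, a value of 1000 rounds to 4096 with power 4, while a value of exactly 512 stays at 512 with power 3.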
cudaError_t notcub::CachingDeviceAllocator::SetMaxCachedBytes (size_t max_cached_bytes) |
inline |
Sets the limit on the number of bytes this allocator is allowed to cache per device.
Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().
Definition at line 325 of file CachingDeviceAllocator.h.
References bin_growth, cudaCheck, relativeConstraints::error, newFWLiteAna::found, INVALID_DEVICE_ORDINAL, max_bin, and NearestPowerOf().
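The interaction between the ceiling and frees can be modeled on the host. This is a sketch under stated assumptions, not the class's implementation: it assumes a freed block is kept only if the cached total would stay at or below the ceiling, and that changing the ceiling touches nothing already cached. `DeviceCacheModel` and its members are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical per-device accounting for this sketch only.
struct DeviceCacheModel {
    std::size_t cached_bytes = 0;      // bytes currently held in the cache
    std::size_t max_cached_bytes = 0;  // ceiling, adjustable at any time

    // Changing the ceiling only affects future frees; nothing is released here.
    void set_max_cached_bytes(std::size_t limit) { max_cached_bytes = limit; }

    // Returns true if a freed block would be kept in the cache,
    // false if it would be released to the device instead.
    bool on_free(std::size_t block_bytes) {
        if (cached_bytes + block_bytes > max_cached_bytes) return false;
        cached_bytes += block_bytes;
        return true;
    }
};
```

Under these assumptions, the default ceiling of 6,291,455 bytes admits two cached 2MB blocks but not a third, and lowering the ceiling afterwards leaves the 4MB already cached in place until it is released explicitly.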
unsigned int notcub::CachingDeviceAllocator::bin_growth |
Geometric growth factor for bin-sizes.
Definition at line 250 of file CachingDeviceAllocator.h.
Referenced by SetMaxCachedBytes().
CachedBlocks notcub::CachingDeviceAllocator::cached_blocks |
Set of cached device allocations available for reuse.
Definition at line 263 of file CachingDeviceAllocator.h.
GpuCachedBytes notcub::CachingDeviceAllocator::cached_bytes |
Map of device ordinal to aggregate cached bytes on that device.
Definition at line 262 of file CachingDeviceAllocator.h.
bool notcub::CachingDeviceAllocator::debug |
Whether or not to print (de)allocation events to stdout.
Definition at line 260 of file CachingDeviceAllocator.h.
const unsigned int notcub::CachingDeviceAllocator::INVALID_BIN = (unsigned int)-1 |
static |
Out-of-bounds bin.
Definition at line 130 of file CachingDeviceAllocator.h.
const int notcub::CachingDeviceAllocator::INVALID_DEVICE_ORDINAL = -1 |
static |
Invalid device ordinal.
Definition at line 138 of file CachingDeviceAllocator.h.
Referenced by SetMaxCachedBytes().
const size_t notcub::CachingDeviceAllocator::INVALID_SIZE = (size_t)-1 |
static |
Invalid size.
Definition at line 133 of file CachingDeviceAllocator.h.
BusyBlocks notcub::CachingDeviceAllocator::live_blocks |
Set of live device allocations currently in use.
Definition at line 264 of file CachingDeviceAllocator.h.
unsigned int notcub::CachingDeviceAllocator::max_bin |
Maximum bin enumeration.
Definition at line 252 of file CachingDeviceAllocator.h.
Referenced by SetMaxCachedBytes().
size_t notcub::CachingDeviceAllocator::max_bin_bytes |
Maximum bin size.
Definition at line 255 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::max_cached_bytes |
Maximum aggregate cached bytes per device.
Definition at line 256 of file CachingDeviceAllocator.h.
unsigned int notcub::CachingDeviceAllocator::min_bin |
Minimum bin enumeration.
Definition at line 251 of file CachingDeviceAllocator.h.
size_t notcub::CachingDeviceAllocator::min_bin_bytes |
Minimum bin size.
Definition at line 254 of file CachingDeviceAllocator.h.
std::mutex notcub::CachingDeviceAllocator::mutex |
Mutex for thread-safety.
Definition at line 248 of file CachingDeviceAllocator.h.
const bool notcub::CachingDeviceAllocator::skip_cleanup |
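Whether or not to skip a call to FreeAllCached() when destructor is called. (The CUDA runtime may have already shut down for statically declared allocators)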
Definition at line 259 of file CachingDeviceAllocator.h.