CUDA memory allocator ignores alignment
In memory.cc:
uint64_t CudaDeviceAllocator::alloc(uint64_t len, uint16_t align)
{
    auto p = ctx_->memalloc(len);
    return reinterpret_cast<uint64_t>(p);
}
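The align parameter is never used, so the returned address is only as aligned as ctx_->memalloc happens to make it. We should check whether CUDA supports aligned allocation and, if it does, honor the requested alignment here.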
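If ctx_->memalloc cannot take an alignment itself, one possible workaround is to over-allocate and round the returned address up. The sketch below shows only the arithmetic, assuming align is a power of two. The names align_up and padded_size are hypothetical and not part of memory.cc. For context, the CUDA programming guide states that addresses returned by the driver and runtime allocation routines are aligned to at least 256 bytes, so if memalloc wraps one of those (an assumption), only larger alignments would need this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round addr up to the next multiple of align.
// Assumes align is a non-zero power of two.
inline std::uint64_t align_up(std::uint64_t addr, std::uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

// Hypothetical helper: number of bytes to request so that an aligned
// range of len bytes always fits inside the allocation.
inline std::uint64_t padded_size(std::uint64_t len, std::uint64_t align)
{
    return len + align - 1;
}
```

With this approach alloc would request padded_size(len, align) bytes and return align_up of the base address. The allocator would also have to remember the original base address, since that is what must be passed to the matching free.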