c++ - Any CUDA operation after cudaStreamSynchronize blocks until all streams are finished
While profiling my CUDA application with NVIDIA Visual Profiler, I noticed that any operation after cudaStreamSynchronize blocks until all streams are finished. This is very strange behavior, because if cudaStreamSynchronize returns, that means the stream is finished, right? Here is my pseudo-code:

    // Pseudo-code: declarations of data, size, cpuPinnedMemory, gpuMemory and stream are omitted.
    std::list<std::thread> waitingThreads;

    void startKernelSync() {
        for (int i = 0; i < 200; ++i) {
            cudaHostAlloc(&cpuPinnedMemory, size, cudaHostAllocDefault);
            memcpy(cpuPinnedMemory, data, size);
            cudaMalloc(&gpuMemory, size);
            cudaStreamCreate(&stream);
            cudaMemcpyAsync(gpuMemory, cpuPinnedMemory, size, cudaMemcpyHostToDevice, stream);
            runKernel<<<32, 32, 0, stream>>>(gpuMemory);
            cudaMemcpyAsync(cpuPinnedMemory, gpuMemory, size, cudaMemcpyDeviceToHost, stream);
            waitingThreads.push_back(std::thread(waitForFinish, cpuPinnedMemory, stream));
        }
        while (waitingThreads.size() > 0) {
            waitingThreads.front().join();
            waitingThreads.pop_front();
        }
    }

    void waitForFinish(void* cpuPinnedMemory, cudaStream_t stream, ...) {
        cudaStreamSynchronize(stream);
        cudaStreamDestroy(stream);  // <== blocks until all streams are finished
        memcpy(data, cpuPinnedMemory, size);
        cudaFreeHost(cpuPinnedMemory);
        cudaFree(gpuMemory);
    }

If I put cudaFreeHost before cudaStreamDestroy, then cudaFreeHost becomes the blocking operation. Is there anything conceptually wrong here?

EDIT: I found another strange behavior: sometimes it stalls in the middle of processing the streams and then processes the rest of the streams.

Normal behavior: [profiler screenshot]

Strange behavior (happens often): [profiler screenshot]

EDIT2: I am testing on a Tesla K40c card with compute capability 3.5 on CUDA 6.0. As suggested in the comments, it may be viable to reduce the number of streams, but the memory transfers in my application are quite fast and I mainly want to use streams to schedule work on the GPU dynamically. The problem is that after a stream finishes, I need to download the data from pinned memory and free the allocated memory for further streams, which appears to be the blocking operation. I am using one stream per data set because every data set [...]

I still do not know why the operations block, but I concluded that I cannot do anything about it, so I decided to implement memory pooling to reuse GPU memory and pinned CPU memory, and to pool the streams as well (as was suggested in the comments), to avoid any kind of deletion. If anyone is interested, my solution is below. The start routine behaves as an asynchronous operation that schedules the kernel, and a callback is called once the kernel has finished. A second routine waits for all the streams to finish at the end. With this in place, the graph from the profiler works like a charm!
    // Pools of reusable instances. Each instance keeps its own stream and
    // preallocated memory, so nothing is freed while other streams are running.
    std::vector<Instance*> m_idleInstances;
    std::vector<Instance*> m_workingInstances;

    void startKernelAsync(...) {
        // Search for a finished stream; sleep briefly if none is idle yet.
        while (m_idleInstances.size() == 0) {
            findFinishedInstance();
            if (m_idleInstances.size() == 0) {
                std::chrono::milliseconds dur(10);
                std::this_thread::sleep_for(dur);
            }
        }

        Instance* instance = m_idleInstances.back();
        m_idleInstances.pop_back();

        // Fill CPU pinned memory

        cudaMemcpyAsync(..., instance->stream);
        runKernel<<<32, 32, 0, instance->stream>>>(gpuMemory);
        cudaMemcpyAsync(..., instance->stream);

        m_workingInstances.push_back(instance);
    }

    void findFinishedInstance() {
        for (auto it = m_workingInstances.begin(); it != m_workingInstances.end();) {
            Instance* inst = *it;
            // cudaStreamQuery does not block; cudaSuccess means the stream is done.
            cudaError_t status = cudaStreamQuery(inst->stream);
            if (status == cudaSuccess) {
                it = m_workingInstances.erase(it);
                m_callback(inst->clusterGroup);
                m_idleInstances.push_back(inst);
            } else {
                ++it;
            }
        }
    }
    // Wait for every stream that is still working, then finalize its instance.
    virtual void waitForFinish() {
        while (m_workingInstances.size() > 0) {
            Instance* instance = m_workingInstances.back();
            m_workingInstances.pop_back();
            m_idleInstances.push_back(instance);
            cudaStreamSynchronize(instance->stream);
            finalizeInstance(instance);
        }
    }
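For readers who want to try the idle/working bookkeeping without a GPU, here is a minimal sketch of the same pooling pattern in plain C++. It is not the code from the post: `InstancePool`, `Instance`, `payload` and `idleCount` are hypothetical names, and a `std::future` stands in for the work queued on a CUDA stream. Polling the future with a zero timeout plays the role of cudaStreamQuery, and `get()` plays the role of cudaStreamSynchronize.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one pooled slot. In a CUDA version this would
// hold a cudaStream_t plus preallocated pinned-host and device buffers.
struct Instance {
    int payload = 0;            // stands in for the pinned host buffer
    std::future<void> pending;  // stands in for work queued on the stream
};

class InstancePool {
public:
    InstancePool(std::size_t count, std::function<void(int)> callback)
        : m_storage(count), m_callback(std::move(callback)) {
        // All slots are allocated once and never freed while work is running.
        for (Instance& inst : m_storage) m_idle.push_back(&inst);
    }

    // Schedules one unit of work; waits only while every instance is busy.
    void startAsync(int input) {
        while (m_idle.empty()) {
            findFinishedInstances();
            if (m_idle.empty())
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        Instance* inst = m_idle.back();
        m_idle.pop_back();
        // The "kernel" here just doubles the input on another thread.
        inst->pending = std::async(std::launch::async,
                                   [inst, input] { inst->payload = input * 2; });
        m_working.push_back(inst);
    }

    // Non-blocking poll of every working instance.
    void findFinishedInstances() {
        for (auto it = m_working.begin(); it != m_working.end();) {
            Instance* inst = *it;
            if (inst->pending.wait_for(std::chrono::seconds(0)) ==
                std::future_status::ready) {
                inst->pending.get();
                it = m_working.erase(it);
                m_callback(inst->payload);
                m_idle.push_back(inst);
            } else {
                ++it;
            }
        }
    }

    // Blocking drain of everything that is still in flight.
    void waitForFinish() {
        while (!m_working.empty()) {
            Instance* inst = m_working.back();
            m_working.pop_back();
            inst->pending.get();
            m_callback(inst->payload);
            m_idle.push_back(inst);
        }
    }

    std::size_t idleCount() const { return m_idle.size(); }

private:
    std::vector<Instance> m_storage;   // owns the slots; never resized
    std::vector<Instance*> m_idle;
    std::vector<Instance*> m_working;
    std::function<void(int)> m_callback;
};
```

A caller would construct the pool with a fixed number of slots and a result callback, call `startAsync` once per data set, and call `waitForFinish` at the end. The callback always runs on the calling thread, so it needs no locking of its own in this sketch.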