fix: cleanup resources properly on BaseSession::_receive_loop cleanup
#1817
+89
−11
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes Session cleanup is being failed, caused by task cancellation via
__aexit__not handled properly.Motivation and Context
Session cleanups the receive streams via
finallyclause ofBaseSession::_receive_loop.__aexit__), it cancels the TaskGroup.finallyclause, such asMemoryObjectStream::sendis its base coroutine_receive_loop, IS the TaskGroup which is being closed.I fixed this problem by adding
CancelScopewithshield=Trueonfinallyclause, ensures the cleanup logic to be executed properly even TaskGroup cancel is requested.Also, while cleaning up receive streams, we loop streams, therefore stream dictionary must not change.
BaseSessionAS-IS does not protect dictionary change on cleanup.How Has This Been Tested?
I found this issue while debugging the problem, using ADK AgentTool with MCP causes an abtruse error:
After a week of debugging, I realised the ultimate fix to this problem is quite huge and need fixes both MCP SDK and Google ADK, and this fix is one of them.
I added a simple test which triggers this issue.
Breaking Changes
finallyblock until every stream is being closed, the__aexit__will be blocked untilfinallyblock finishes its execution.finallyblock is being executed too long, user might think it is hanging.Types of changes
Checklist
No docs needed to be updatedAdditional context
This PR MAY resolve issues below:
read_stream_writeropen after SSE disconnection hanging.receive()#1811I think this problem is related to task leak (or indefinite-waiting) at Google ADK, since in ADK the tool call await forever until the receive stream responds.