fix(record): resolve record service deadlock and data race #3371
base: main
Conversation
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
The CLA check failed. Please ensure you have:
After fixing these issues, comment 'recheck' to trigger the workflow again.
Thank you and congratulations 🎉 for opening your very first pull request in keploy
I have read the CLA Document and I hereby sign the CLA
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
7c26d22 to f5705b1
pkg/service/record/record.go (Outdated)
utils.LogError(r.logger, err, "failed to stop recording")
}

defer close(appErrChan)
Please remove the defer prefix since this is already inside a defer block. Also, why has it been moved here?
It led to a data race where the channels were closed first while the goroutines were still active and writing to them.
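For readers following along, here is a minimal, self-contained sketch of that ordering (the names errGrp and appErrChan mirror the discussion above, but the surrounding code is illustrative, not the actual record service): every producer goroutine is joined via errGrp.Wait() before the channel it writes to is closed.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	appErrChan := make(chan error, 10)
	errGrp, gctx := errgroup.WithContext(ctx)

	// Producer goroutine: may still be writing to appErrChan until it returns.
	errGrp.Go(func() error {
		select {
		case appErrChan <- fmt.Errorf("app exited unexpectedly"):
		case <-gctx.Done():
		}
		return nil
	})

	// Join every producer first, THEN close the channel.
	// Closing before errGrp.Wait() lets the close race with in-flight sends.
	_ = errGrp.Wait()
	close(appErrChan)

	for err := range appErrChan {
		fmt.Println("drained:", err)
	}
}
```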
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
Signed-off-by: ahmed0-07 <ahmedmohamed00744@hotmail.com>
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
Resource leaks in proxy cleanup:
- Remove early returns in StopProxyServer() to ensure mutex unlock, DNS server stop, listener close, and error channel close always execute
- Fix handleConnection() defer block to always complete connection closure and parser goroutine wait
- Collect and log errors instead of stopping cleanup prematurely
- Prevents mutex deadlocks, DNS server leaks, connection leaks, and goroutine leaks

Spelling fix:
- Fix VersionIdenitfier to VersionIdentifier in multiple files

Signed-off-by: Mayank Nishad <mayankn051@gmail.com>
Co-authored-by: Akash Kumar <91385321+AkashKumar7902@users.noreply.github.com>
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
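To illustrate the cleanup pattern this commit message describes (the type and field names below are hypothetical, not the real keploy proxy structs): every teardown step runs, errors are collected and logged rather than triggering an early return, and the mutex unlock and error-channel close are guaranteed to execute.

```go
package main

import (
	"errors"
	"log"
	"net"
	"sync"
)

// proxy is a stand-in for the real proxy server state.
type proxy struct {
	mu       sync.Mutex
	listener net.Listener
	errCh    chan error
}

// stop tears everything down without early returns, so later steps
// (and the deferred unlock) always run even when an earlier step fails.
func (p *proxy) stop() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	var errs []error
	if p.listener != nil {
		if err := p.listener.Close(); err != nil {
			// Log and keep going instead of aborting the cleanup.
			log.Printf("failed to close listener: %v", err)
			errs = append(errs, err)
		}
	}
	// Always reached: readers blocked on errCh are released instead of leaking.
	close(p.errCh)
	return errors.Join(errs...)
}

func main() {
	p := &proxy{errCh: make(chan error, 1)}
	if err := p.stop(); err != nil {
		log.Printf("cleanup finished with errors: %v", err)
	}
}
```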
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
112cc92 to b4580b9
#3366: Implemented the suggested changes for this PR.
Comprehensive Code Review: Record Service Deadlock and Data Race Fix

Excellent work on addressing critical concurrency issues! This PR tackles both deadlock and data race problems in the record service, along with several typo fixes. This is exactly the kind of defensive programming that prevents production incidents.

🌟 Major Strengths:
🔍 Detailed Analysis by File:
1.
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
Signed-off-by: kartikgoyal137 <kartikcodes137@gmail.com>
Signed-off-by: Kartik Goyal <kartikcodes137@gmail.com>
@harikapadia999 Thank you for the detailed review. I have implemented the suggested changes.
Describe the changes that are made
This PR addresses critical concurrency stability issues in the Record service during shutdown:
Fixed Deadlock: Previously, the insertTestErrChan and insertMockErrChan channels used blocking sends. If database insertion failed rapidly and filled the buffer (size 10), the worker goroutines would block indefinitely, because the main thread stops reading errors after the first one. I replaced these with a select block that respects ctx.Done(), ensuring workers exit immediately when the context is canceled (see the sketch after the Links & References section below).

Fixed Data Race: The defer close(channel) statements were originally declared before the main defer block that waits for goroutines via errGrp.Wait(). This caused channels to be closed while workers were still trying to write to them, triggering a data race. I moved the close() calls to strictly after errGrp.Wait(), ensuring all producers have finished.

Links & References
Closes: #3370
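As referenced in the description above, here is a small runnable sketch of the select-based send (channel and function names are illustrative, not the actual record service code): a worker tries to report an error but gives up as soon as the context is canceled, so a full buffer can no longer block it forever.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// reportErr sends err on errCh unless ctx is canceled first.
// A plain blocking send would park the worker forever once the
// buffer fills and the reader stops draining the channel.
func reportErr(ctx context.Context, errCh chan<- error, err error) {
	select {
	case errCh <- err:
	case <-ctx.Done():
		// Reader is gone; drop the error and let the worker exit.
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	errCh := make(chan error, 10) // same buffer size as in the PR description

	go func() {
		time.Sleep(50 * time.Millisecond)
		cancel() // simulate shutdown while the buffer is already full
	}()

	for i := 0; i < 20; i++ {
		reportErr(ctx, errCh, fmt.Errorf("insert failed: %d", i))
	}
	fmt.Println("all workers returned without deadlocking")
}
```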
What type of PR is this? (check all applicable)
Added e2e test pipeline?
Added comments for hard-to-understand areas?
Added to documentation?
Are there any sample code or steps to test the changes?
Self Review done?
Any relevant screenshots, recordings or logs?
Additional checklist: