Skip to content

Conversation

@Ankurdeewan
Copy link

@Ankurdeewan Ankurdeewan commented Dec 24, 2025

This fixes a crash in replay mode when multiple test sets are executed in parallel.

Replay was mutating shared maps and counters without synchronization, which could
lead to fatal error: concurrent map writes and incorrect test statistics.
The issue was reproducible under concurrent execution.

The fix adds instance-level synchronization to the Replayer and guards all shared
state access. Status/reporting code now reads from the same synchronized state or
from local snapshots, without changing behavior or output.

Fixes #3412

Copilot AI review requested due to automatic review settings December 24, 2025 17:52
@github-actions
Copy link

github-actions bot commented Dec 24, 2025

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@github-actions
Copy link

The CLA check failed. Please ensure you have:

  • Signed the CLA by commenting 'I have read the CLA Document and I hereby sign the CLA.'
  • Used the correct email address in your commits (matches the one you used to sign the CLA).

After fixing these issues, comment 'recheck' to trigger the workflow again.

@Ankurdeewan
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you and congratulations 🎉 for opening your very first pull request in keploy

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix concurrent map write panics during parallel test replay by adding synchronization to shared state access. The main changes convert global functions to instance methods and add mutex protection for shared variables.

Key Changes:

  • Added instance-level stateMu mutex to the Replayer struct
  • Converted GetCompleteTestRunReport() and GetTestRunTotals() from package-level functions to instance methods
  • Added mutex protection around all accesses to shared global state (completeTestReport, failedTCsBySetID, totalTests counters)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
pkg/service/replay/replay.go Added stateMu field to Replayer struct; protected all shared state accesses with mutex locks; added snapshot creation for failedTCsBySetID in printSummary
pkg/service/replay/status.go Converted GetCompleteTestRunReport and GetTestRunTotals to instance methods using the new stateMu mutex

Critical Issue Identified: The implementation uses an instance-level mutex (r.stateMu) to protect package-level global variables. If multiple Replayer instances run concurrently, each would have its own mutex but access the same global state, making the synchronization ineffective. The mutex should either be global (package-level) or the protected state should be moved to instance-level fields.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +38 to +39
r.stateMu.Lock()
defer r.stateMu.Unlock()
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code uses Lock() instead of RLock() for a read-only operation. This method only reads totalTests, totalTestPassed, and totalTestFailed without modifying them. It should use RLock() to allow concurrent reads, which would improve performance and is more semantically correct.

Suggested change
r.stateMu.Lock()
defer r.stateMu.Unlock()
r.stateMu.RLock()
defer r.stateMu.RUnlock()

Copilot uses AI. Check for mistakes.
instrument bool
isLastTestSet bool
isLastTestCase bool
stateMu sync.Mutex
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instance-level mutex stateMu is being used to protect package-level global variables (completeTestReport, failedTCsBySetID, totalTests, etc. defined at lines 40-50). If multiple Replayer instances run concurrently (e.g., in different goroutines or if multiple instances are created), each instance would have its own mutex but they would all access the same global variables, making the synchronization ineffective. The mutex should either be global (package-level) or the protected state should be moved to instance-level fields within the Replayer struct.

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +296
r.stateMu.Lock()
initTotal := totalTests
initPassed := totalTestPassed
initFailed := totalTestFailed
initIgnored := totalTestIgnored
initTimeTaken := totalTestTimeTaken
completeTestReportMu.RUnlock()
r.stateMu.Unlock()
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code uses Lock() instead of RLock() for read-only operations. Since this code only reads the shared state variables (totalTests, totalTestPassed, totalTestFailed) and creates snapshots, it should use RLock() to allow concurrent reads, which would improve performance. Using Lock() here unnecessarily blocks other readers.

Copilot uses AI. Check for mistakes.
Comment on lines +477 to +480
r.stateMu.Lock()
passed := totalTestPassed
failed := totalTestFailed
completeTestReportMu.RUnlock()
r.stateMu.Unlock()
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code uses Lock() instead of RLock() for read-only operations. Since this code only reads the shared state variables to create snapshots without modifying them, it should use RLock() to allow concurrent reads and improve performance.

Copilot uses AI. Check for mistakes.
completeTestReportMu.RUnlock()
failedTCsBySetIDSnapshot := make(map[string][]string, len(failedTCsBySetID))
for key, val := range failedTCsBySetID {
failedTCsBySetIDSnapshot[key] = val
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot creation for failedTCsBySetIDSnapshot uses a shallow copy of the slice values. If the slices in failedTCsBySetID are modified after this snapshot is created, it could lead to race conditions or unexpected behavior. Consider using deep copy: failedTCsBySetIDSnapshot[key] = append([]string(nil), val...)

Suggested change
failedTCsBySetIDSnapshot[key] = val
failedTCsBySetIDSnapshot[key] = append([]string(nil), val...)

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the solution is
Create a failedTCsBySetIDSnapshot inside the RLock() section
use deep copt for the slice
failedTCsBySetIDSnapshot := make(map[string][]string, len(failedTCsBySetID))
for key, val := range failedTCsBySetID {
failedTCsBySetIDSnapshot[key] = append([]string(nil), val...)
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! updated the snapshot to deep-copy the slice values.

Comment on lines +18 to +19
r.stateMu.Lock()
defer r.stateMu.Unlock()
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code uses Lock() instead of RLock() for a read-only operation. This method only reads from completeTestReport to create a snapshot and doesn't modify the shared state. It should use RLock() to allow concurrent reads, which would improve performance and is more semantically correct for read-only operations.

Suggested change
r.stateMu.Lock()
defer r.stateMu.Unlock()
r.stateMu.RLock()
defer r.stateMu.RUnlock()

Copilot uses AI. Check for mistakes.
@codeMaestro78
Copy link

@Ankurdeewan i have resolve all the issue related to current pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: Add mutex protection to prevent concurrent map writes in test replay

2 participants