Skip to content

Conversation

@kelvinvelasquez-SDE
Copy link

Description

Fixes #12423

This PR addresses an issue where audio files of exactly 4 hours (14400 seconds) were rejected by the TranscribeService, despite AWS allowing files up to 4 hours. It also adds robustness to handle cases where ffprobe returns invalid duration metadata (e.g., N/A), preventing the service from crashing.

Changes

  • Modified provider.py: Changed duration check from >= to > to include 14400.0s as valid.
  • Added Error Handling: Wrapped the duration parsing logic in a try/except block to gracefully handle ValueError or TypeError, defaulting to assuming validity (or 0.0s) and logging a warning instead of raising a 500 Internal Server Error.
  • New Tests: Added tests/unit/services/transcribe/test_provider.py with regression tests covering:
    • Duration > 4h (Fails as expected)
    • Duration = 4h (Passes)
    • Duration = N/A (Handled without crash)

Verification

Ran the new unit tests locally and they pass. Verified that the service no longer throws unhandled exceptions for bad metadata.

@localstack-bot
Copy link
Contributor

localstack-bot commented Dec 22, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link
Contributor

@localstack-bot localstack-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to LocalStack! Thanks for raising your first Pull Request and landing in your contributions. Our team will reach out with any reviews or feedbacks that we have shortly. We recommend joining our Slack Community and share your PR on the #community channel to share your contributions with us. Please make sure you are following our contributing guidelines and our Code of Conduct.

@kelvinvelasquez-SDE
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Contributor

@purcell purcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to analyse, write and submit this PR. 🙏

I'll defer to the maintainers of this service for a more thorough review, but my quick feedback is that while the small fix itself may be worthwhile, the tests added here are not consistent with those elsewhere in the codebase, nor would they be maintainable in the long term. The scenarios should be possible to recreate with real sample files (without them becoming enormous), so it would be better to extend the more concrete tests instead.

Comment on lines 8 to 9
class TestTranscribeProvider(unittest.TestCase):
def test_transcribe_job_duration_limit(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we use MagicMock in some tests, this codebase uses pytest rather than unittest for the tests themselves.

Comment on lines 41 to 55

# Case 2: Duration = 4h (14400.0) - Should Pass Check
# Create a NEW job/store for clean state
job2_name = "test-job-boundary"
job2 = {
"TranscriptionJobName": job2_name,
"LanguageCode": "en-US",
"Media": {"MediaFileUri": "s3://test-bucket/boundary_audio.mp3"},
"Transcript": {"TranscriptFileUri": "s3://test-bucket/output.json"},
"TranscriptionJobStatus": "QUEUED",
}
store.transcription_jobs[job2_name] = job2

ffprobe_output["format"]["duration"] = "14400.0"
mock_run.return_value = json.dumps(ffprobe_output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cases should be separate tests, so that they can have their own name and meaning, and fail independently of each other.

Comment on lines 57 to 71
# Mock _setup_vosk using target path instead of instance object for static method stability
with patch(
"localstack.services.transcribe.provider.TranscribeProvider._setup_vosk",
side_effect=RuntimeError("StopHere"),
):
try:
provider._run_transcription_job((store, job2_name))
except Exception:
pass

# If it failed due to size, status would be FAILED with "Invalid file size..."
# If it failed due to size, status would be FAILED with "Invalid file size..."
# If it hit StopHere, status is FAILED. Even if reason is empty (?), it confirms size check passed.
self.assertEqual(job2["TranscriptionJobStatus"], TranscriptionJobStatus.FAILED)
self.assertNotIn("Invalid file size", job2.get("FailureReason", ""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you end up mocking/overriding internals of the object, the unit tests become brittle. Mocks in unit testing are not for modifying the object under test — rather, they are for programming the other collaborating objects to behave in a certain way.

The degree of patching and mocking here indicates to me that this is not going to be a maintainable test, and that it would be better to extend the higher-level tests in tests/aws/services/transcribe/ instead.

Comment on lines +325 to +332
# If duration cannot be parsed, we assume it's invalid or fallback to letting it run
# But for strictness let's fail or default?
# Best practice: if we can't determine it, we can't validate it.
# Use a safe fallback or log/raise.
# Given user report of "N/A", let's fail to be safe as per AWS strictness?
# Or maybe default to fail if we assume it's a stream?
# Let's log and re-raise or handle.
# Simplest fix: treat N/A as failure?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment contains lots of questions and doesn't explain anything clearly, so it adds little to the reader's understanding of the code.

@purcell purcell added semver: patch Non-breaking changes which can be included in patch releases docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes labels Dec 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes semver: patch Non-breaking changes which can be included in patch releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: TranscribeService should fail for audio sizes exceeding 4 hours.

3 participants