fix(transcribe): allow exact 4h duration and handle invalid metadata #13561

kelvinvelasquez-SDE · 2025-12-22T15:23:51Z

Description

This PR addresses an issue where audio files of exactly 4 hours (14400 seconds) were rejected by the TranscribeService, despite AWS allowing files up to 4 hours. It also adds robustness to handle cases where ffprobe returns invalid duration metadata (e.g., N/A), preventing the service from crashing.

Changes

- Modified provider.py: changed the duration check from >= to > so that 14400.0s is accepted as valid.
- Added error handling: wrapped the duration parsing logic in a try/except block that catches ValueError and TypeError. When the duration cannot be parsed, the service logs a warning and falls back to a default (assuming validity, or 0.0s) instead of raising a 500 Internal Server Error.
- New tests: added tests/unit/services/transcribe/test_provider.py with regression tests covering:
  - Duration > 4h (fails as expected)
  - Duration = 4h (passes)
  - Duration = N/A (handled without crash)
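
For readers without the diff at hand, the two changes described above can be sketched roughly as follows. This is not the actual provider.py code: the names `MAX_DURATION_SECONDS`, `parse_duration` and `exceeds_limit` are hypothetical, and the choice to let a job continue when the duration is unparseable is an assumption, since the description leaves the exact fallback open.

```python
import logging

LOG = logging.getLogger(__name__)

# 4 hours expressed in seconds (14400), the limit named in the PR description.
MAX_DURATION_SECONDS = 4 * 60 * 60


def parse_duration(raw):
    """Return the duration in seconds, or None if the metadata is unusable.

    Hypothetical helper: float("N/A") raises ValueError and float(None)
    raises TypeError, which are the two exception types the PR mentions.
    """
    try:
        return float(raw)
    except (ValueError, TypeError):
        LOG.warning("Could not parse media duration %r", raw)
        return None


def exceeds_limit(raw):
    """Return True only when a parseable duration is strictly above the limit."""
    duration = parse_duration(raw)
    if duration is None:
        # Assumption: an unknown duration does not fail the job.
        return False
    # Strict comparison, so a file of exactly 14400.0s is accepted.
    return duration > MAX_DURATION_SECONDS
```

With this sketch, `exceeds_limit("14400.0")` and `exceeds_limit("N/A")` are both false, while `exceeds_limit("14400.1")` is true.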

Verification

Ran the new unit tests locally and they pass. Verified that the service no longer throws unhandled exceptions for bad metadata.

localstack-bot · 2025-12-22T15:24:02Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

localstack-bot

Welcome to LocalStack! Thanks for raising your first Pull Request and landing your contribution. Our team will reach out shortly with any reviews or feedback. We recommend joining our Slack Community and sharing your PR on the #community channel. Please make sure you are following our contributing guidelines and our Code of Conduct.

kelvinvelasquez-SDE · 2025-12-22T15:27:46Z

I have read the CLA Document and I hereby sign the CLA

purcell

Thanks for taking the time to analyse, write and submit this PR. 🙏

I'll defer to the maintainers of this service for a more thorough review, but my quick feedback is that while the small fix itself may be worthwhile, the tests added here are not consistent with those elsewhere in the codebase, nor would they be maintainable in the long term. The scenarios should be possible to recreate with real sample files (without them becoming enormous), so it would be better to extend the more concrete tests instead.

purcell · 2025-12-22T15:53:00Z

tests/unit/services/transcribe/test_provider.py

+class TestTranscribeProvider(unittest.TestCase):
+    def test_transcribe_job_duration_limit(self):


While we use MagicMock in some tests, this codebase uses pytest rather than unittest for the tests themselves.

purcell · 2025-12-22T15:53:41Z

tests/unit/services/transcribe/test_provider.py

+
+            # Case 2: Duration = 4h (14400.0) - Should Pass Check
+            # Create a NEW job/store for clean state
+            job2_name = "test-job-boundary"
+            job2 = {
+                "TranscriptionJobName": job2_name,
+                "LanguageCode": "en-US",
+                "Media": {"MediaFileUri": "s3://test-bucket/boundary_audio.mp3"},
+                "Transcript": {"TranscriptFileUri": "s3://test-bucket/output.json"},
+                "TranscriptionJobStatus": "QUEUED",
+            }
+            store.transcription_jobs[job2_name] = job2
+
+            ffprobe_output["format"]["duration"] = "14400.0"
+            mock_run.return_value = json.dumps(ffprobe_output)


These cases should be separate tests, so that they can have their own name and meaning, and fail independently of each other.
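
A minimal sketch of what the split could look like, written as plain pytest-style functions with bare assertions rather than a unittest.TestCase. The helper `is_duration_allowed` is a hypothetical stand-in for the provider's real check, used here only so the example is self-contained.

```python
# Limit from the PR description: 4 hours in seconds.
MAX_DURATION_SECONDS = 4 * 60 * 60


def is_duration_allowed(duration_seconds: float) -> bool:
    """Hypothetical stand-in for the duration check under test."""
    return duration_seconds <= MAX_DURATION_SECONDS


def test_duration_over_four_hours_is_rejected():
    assert not is_duration_allowed(14400.1)


def test_duration_of_exactly_four_hours_is_accepted():
    assert is_duration_allowed(14400.0)


def test_duration_under_four_hours_is_accepted():
    assert is_duration_allowed(60.0)
```

Each scenario now has its own name in the test report, and a failure at the 4h boundary no longer hides the result of the other cases.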

purcell · 2025-12-22T15:57:13Z

tests/unit/services/transcribe/test_provider.py

+            # Mock _setup_vosk using target path instead of instance object for static method stability
+            with patch(
+                "localstack.services.transcribe.provider.TranscribeProvider._setup_vosk",
+                side_effect=RuntimeError("StopHere"),
+            ):
+                try:
+                    provider._run_transcription_job((store, job2_name))
+                except Exception:
+                    pass
+
+            # If it failed due to size, status would be FAILED with "Invalid file size..."
+            # If it failed due to size, status would be FAILED with "Invalid file size..."
+            # If it hit StopHere, status is FAILED. Even if reason is empty (?), it confirms size check passed.
+            self.assertEqual(job2["TranscriptionJobStatus"], TranscriptionJobStatus.FAILED)
+            self.assertNotIn("Invalid file size", job2.get("FailureReason", ""))


If you end up mocking/overriding internals of the object, the unit tests become brittle. Mocks in unit testing are not for modifying the object under test — rather, they are for programming the other collaborating objects to behave in a certain way.

The degree of patching and mocking here indicates to me that this is not going to be a maintainable test, and that it would be better to extend the higher-level tests in tests/aws/services/transcribe/ instead.
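
To illustrate the distinction between mocking a collaborator and patching the object under test, here is a small sketch. `DurationValidator` and its `probe` argument are hypothetical and do not exist in the LocalStack codebase; the only detail taken from the diff is the ffprobe-style `format.duration` field. The error message is likewise made up for the example.

```python
from unittest.mock import MagicMock

# Limit from the PR description: 4 hours in seconds.
MAX_DURATION_SECONDS = 4 * 60 * 60


class DurationValidator:
    """Hypothetical object under test; its collaborator is passed in."""

    def __init__(self, probe):
        # Collaborator: a callable returning ffprobe-style metadata as a dict.
        self._probe = probe

    def validate(self, path):
        metadata = self._probe(path)
        duration = float(metadata["format"]["duration"])
        if duration > MAX_DURATION_SECONDS:
            # Illustrative message only, not the provider's real wording.
            raise ValueError("duration exceeds the allowed limit")
        return duration


def test_exactly_four_hours_is_accepted():
    # Only the collaborator is programmed; the validator itself is untouched.
    probe = MagicMock(return_value={"format": {"duration": "14400.0"}})
    validator = DurationValidator(probe)

    assert validator.validate("boundary_audio.mp3") == 14400.0
    probe.assert_called_once_with("boundary_audio.mp3")
```

Because nothing inside the validator is patched, the test does not depend on private method names and keeps passing if the internals are reorganised. This only illustrates the mocking principle; the suggestion to extend the higher-level tests in tests/aws/services/transcribe/ still stands.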

purcell · 2025-12-22T15:59:10Z

localstack-core/localstack/services/transcribe/provider.py

+                # If duration cannot be parsed, we assume it's invalid or fallback to letting it run
+                # But for strictness let's fail or default?
+                # Best practice: if we can't determine it, we can't validate it.
+                # Use a safe fallback or log/raise.
+                # Given user report of "N/A", let's fail to be safe as per AWS strictness?
+                # Or maybe default to fail if we assume it's a stream?
+                # Let's log and re-raise or handle.
+                # Simplest fix: treat N/A as failure?


This comment contains lots of questions and doesn't explain anything clearly, so it adds little to the reader's understanding of the code.

fix(transcribe): allow exact 4h duration and handle invalid metadata

cddfee5

kelvinvelasquez-SDE requested review from k-a-il and silv-io as code owners December 22, 2025 15:23

localstack-bot reviewed Dec 22, 2025

localstack-bot added a commit that referenced this pull request Dec 22, 2025

@kelvinvelasquez-SDE has signed the CLA in #13561

d4f90ae

purcell reviewed Dec 22, 2025

purcell added labels Dec 22, 2025:
- semver: patch (Non-breaking changes which can be included in patch releases)
- docs: skip (Pull request does not require documentation changes)
- notes: skip (Pull request does not have to be mentioned in the release notes)

test(transcribe): refactor duration test to integration suite

c71d841

3 participants
