[DRAFT] experiment: add test sharding by daniel-sanche · Pull Request #17438 · googleapis/google-cloud-python

daniel-sanche · 2026-06-12T00:25:53Z

Experimenting with test sharding

Unit Tests

if 10 or fewer packages are changed, run one test job per Python version (existing behaviour)
if more than 10 packages would land in a single test job, add an additional shard for each Python runtime and spread the packages across the shards
use an upper limit of 10 shards per runtime, with packages evenly distributed
adds a final "unit test complete" step, which is green only if all shards pass. This can be our new required check for unit tests
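
One possible reading of the shard-count rules above, as a minimal sketch. The names `MAX_PACKAGES_PER_SHARD`, `MAX_SHARDS` and `shard_count` are hypothetical and chosen for illustration; the actual logic lives in ci/get_package_shards.py and may differ.

```python
import math

# Assumed constants, taken from the description: at most 10 packages per
# test job before sharding kicks in, and at most 10 shards per runtime.
MAX_PACKAGES_PER_SHARD = 10
MAX_SHARDS = 10


def shard_count(num_packages: int) -> int:
    """Return how many shards one Python runtime would get.

    Hypothetical helper: one shard for up to 10 packages, one more shard
    per additional 10 packages, capped at MAX_SHARDS.
    """
    needed = math.ceil(num_packages / MAX_PACKAGES_PER_SHARD)
    return min(MAX_SHARDS, max(1, needed))


for n in (1, 10, 11, 100, 101):
    print(n, "packages ->", shard_count(n), "shard(s)")
```

Under these assumptions, 10 packages still run as a single job, the 11th package triggers a second shard, and anything above 100 packages stays at 10 shards with more packages per shard.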

System Tests

Previously, system tests ran sequentially as a single script
With this change, they run in parallel, reported as separate Targets in TestGrid
test logs are still all combined into the primary build.log for viewing in one place

Fail Fast

if any single unit test shard or system test job fails, the remaining jobs end early to provide quick feedback

gemini-code-assist

Code Review

This pull request introduces parallel execution for system tests and implements a sharding mechanism for CI jobs, including a new Python script to group packages and updates to the test runner script. Feedback on these changes focuses on improving reliability and safety: first, by using an EXIT trap in .kokoro/system.sh to guarantee cleanup of isolated gcloud configuration directories in case of test failures; and second, by avoiding global toggles of set -e in ci/run_conditional_tests.sh and instead capturing test exit codes using the || operator.

gemini-code-assist · 2026-06-12T00:28:04Z

+  # Isolate gcloud config for parallel execution
+  export CLOUDSDK_CONFIG=$(mktemp -d)


Since run_package_test runs with set -e enabled, any test failure or error will cause the subshell to exit immediately. This prevents the manual cleanup at the end of the function from running, leading to leaked temporary directories in /tmp.

Using an EXIT trap ensures that the temporary directory is reliably cleaned up when the subshell exits, regardless of whether the tests succeeded or failed.

Suggested change

-  # Isolate gcloud config for parallel execution
-  export CLOUDSDK_CONFIG=$(mktemp -d)
+  # Isolate gcloud config for parallel execution
+  export CLOUDSDK_CONFIG=$(mktemp -d)
+  trap 'rm -rf "${CLOUDSDK_CONFIG}"' EXIT
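
The effect of the EXIT trap can be checked with a small harness. This is only a sketch: it assumes a POSIX `sh` and `mktemp` are on the PATH, and `false` stands in for a failing test run. It is not the contents of .kokoro/system.sh.

```python
import os
import subprocess

# Shell snippet mirroring the suggestion: create a temp dir, register an
# EXIT trap, then fail under `set -e`. `false` is a stand-in (assumption)
# for a test command that exits non-zero.
SCRIPT = r"""
set -e
CLOUDSDK_CONFIG=$(mktemp -d)
export CLOUDSDK_CONFIG
trap 'rm -rf "${CLOUDSDK_CONFIG}"' EXIT
echo "${CLOUDSDK_CONFIG}"
false
echo "not reached"
"""

result = subprocess.run(["sh", "-c", SCRIPT], capture_output=True, text=True)
tmp_dir = result.stdout.strip()

print("shell failed:", result.returncode != 0)
print("temp dir still exists:", os.path.exists(tmp_dir))
```

The shell exits at `false`, yet the trap still removes the directory, which is the property the manual `rm -rf` at the end of the function cannot provide.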

gemini-code-assist · 2026-06-12T00:28:04Z

+  # Clean up isolated gcloud config
+  rm -rf "${CLOUDSDK_CONFIG}"


With the EXIT trap handling the cleanup of CLOUDSDK_CONFIG automatically, this manual cleanup is redundant and can be safely removed.

gemini-code-assist · 2026-06-12T00:28:04Z

+        set +e
+        ${test_script}
+        ret=$?
+        set -e


Toggling set -e globally can lead to unexpected side effects or mask errors in other parts of the script if not carefully managed. A more idiomatic and safer way to capture the exit status of a command under set -e is to use the || operator, which prevents the shell from exiting on failure without needing to disable set -e globally.

Suggested change

-        set +e
-        ${test_script}
-        ret=$?
-        set -e
+        ret=0
+        ${test_script} || ret=$?
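
A quick way to confirm that the `||` form captures the exit status without leaving `set -e` is sketched below. It assumes a POSIX `sh` on the PATH; the subshell `(exit 3)` is a hypothetical stand-in for `${test_script}`.

```python
import subprocess

# `(exit 3)` plays the role of a failing test script (assumption).
# Because it is the left-hand side of `||`, `set -e` does not abort the shell.
SCRIPT = r"""
set -e
ret=0
(exit 3) || ret=$?
echo "ret=${ret}"
"""

result = subprocess.run(["sh", "-c", SCRIPT], capture_output=True, text=True)
print(result.stdout.strip())
print("shell exit code:", result.returncode)
```

The script keeps running after the failure and `ret` holds the command's status, so later logic can decide how to report it while `set -e` stays active for everything else.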

daniel-sanche · 2026-06-19T00:03:49Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a package sharding mechanism for CI testing, adding a new Python script ci/get_package_shards.py and updating ci/run_conditional_tests.sh to support package lists, unique coverage files, and fallback sharding. It also configures test coverage reporting for sqlalchemy-spanner and updates coverage requirements for google-cloud-firestore. The review feedback highlights two key improvements in ci/get_package_shards.py: resolving potential path separator mismatches on Windows by avoiding mixed usage of os.path.join and forward slashes, and refining the sharding distribution logic to prevent highly unbalanced shards when the shard count is capped.

gemini-code-assist · 2026-06-19T00:06:03Z

+        if not os.path.exists(subdir):
+            continue
+        # Use the same sorting as the shell script
+        pkg_dirs = [os.path.join(subdir, d) + '/' for d in os.listdir(subdir) if os.path.isdir(os.path.join(subdir, d))]


Using os.path.join and then appending a hardcoded forward slash (/) can result in mixed path separators on Windows (e.g., packages\google-cloud-bigtable/). Since git diff output always uses forward slashes, this mismatch will cause the startswith check in get_packages_to_test to fail on Windows environments.

Using a consistent forward slash format for package paths avoids this issue.

Suggested change

-        pkg_dirs = [os.path.join(subdir, d) + '/' for d in os.listdir(subdir) if os.path.isdir(os.path.join(subdir, d))]
+        pkg_dirs = [f"{subdir}/{d}/" for d in os.listdir(subdir) if os.path.isdir(os.path.join(subdir, d))]
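
The mismatch can be reproduced on any platform with `ntpath`, the Windows implementation behind `os.path`. This is a minimal sketch: the changed-file path is a hypothetical example of what git would print, and the prefix check only imitates the `startswith` comparison described above.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

subdir = "packages"
name = "google-cloud-bigtable"
# Hypothetical changed file, written with forward slashes as git prints it.
changed_file = "packages/google-cloud-bigtable/setup.py"

# What os.path.join(subdir, d) + '/' would produce on Windows.
mixed = ntpath.join(subdir, name) + "/"
# What the suggested f-string produces on every platform.
uniform = f"{subdir}/{name}/"

print(repr(mixed))
print("mixed prefix matches:", changed_file.startswith(mixed))
print("uniform prefix matches:", changed_file.startswith(uniform))
```

Only the uniform prefix matches, because the mixed one contains a backslash that never appears in git's output.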

gemini-code-assist · 2026-06-19T00:06:03Z

+    shard_size = math.ceil(num_packages / num_shards)
+
+    shards = []
+    for i in range(num_shards):
+        start = i * shard_size
+        end = min((i + 1) * shard_size, num_packages)
+        if start >= num_packages:
+            break
+        shard_packages = packages[start:end]


Using math.ceil(num_packages / num_shards) as a fixed chunk size can lead to highly unbalanced shards when the number of shards is capped at 10. For example, if there are 101 packages, num_shards is capped at 10, and shard_size becomes ceil(101/10) = 11. This results in 9 shards of size 11 and 1 shard of size 2, which is highly unbalanced and defeats the purpose of efficient parallel sharding.

Using a contiguous even distribution formula ensures that the difference in package count between any two shards is at most 1.

Suggested change

-    shard_size = math.ceil(num_packages / num_shards)
-
-    shards = []
-    for i in range(num_shards):
-        start = i * shard_size
-        end = min((i + 1) * shard_size, num_packages)
-        if start >= num_packages:
-            break
-        shard_packages = packages[start:end]
+    shards = []
+    for i in range(num_shards):
+        start = (i * num_packages) // num_shards
+        end = ((i + 1) * num_packages) // num_shards
+        shard_packages = packages[start:end]
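
The two strategies can be compared directly on the 101-package example from the comment. The function names `chunk_fixed` and `chunk_even` and the generated package names are hypothetical; this is a sketch of the two formulas, not the code in ci/get_package_shards.py.

```python
import math


def chunk_fixed(packages, num_shards):
    """Fixed chunk size of ceil(n / k), as in the original snippet."""
    shard_size = math.ceil(len(packages) / num_shards)
    shards = []
    for i in range(num_shards):
        start = i * shard_size
        if start >= len(packages):
            break
        shards.append(packages[start:start + shard_size])
    return shards


def chunk_even(packages, num_shards):
    """Contiguous split where shard sizes differ by at most one."""
    n = len(packages)
    return [
        packages[(i * n) // num_shards:((i + 1) * n) // num_shards]
        for i in range(num_shards)
    ]


packages = [f"pkg-{i}" for i in range(101)]  # placeholder package names
fixed_sizes = [len(s) for s in chunk_fixed(packages, 10)]
even_sizes = [len(s) for s in chunk_even(packages, 10)]

print("fixed:", fixed_sizes)
print("even: ", even_sizes)
```

With 101 packages and 10 shards, the fixed chunking yields nine shards of 11 and one of 2, while the even split yields nine shards of 10 and one of 11. One caveat of the even formula: if `num_shards` exceeds the number of packages, some shards come out empty, so the caller would need to cap the shard count or skip empty shards.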

daniel-sanche added 4 commits June 11, 2026 17:21

added unit test sharding

01d4753

run system tests concurrently

a7502d4

added changes to many packages for testing

9ad0f0c

added sharding test file

02c7653

gemini-code-assist Bot reviewed Jun 12, 2026

daniel-sanche added 22 commits June 11, 2026 17:29

added unit-complete to gather all shards

3c8d057

changed shard params

73ab091

fixed tests

74f1ef7

fix coverage

a450254

fix system tests

af4d10c

split out system test logs

15916df

attempt fix for lint

49cac0f

update system tests to show logs for each target

d8dd522

update sharding logic

de74a68

fixed lint/mypy runs

46d1583

removed sqlalchemy-spanner from touched packages

6e3531d

system tests print all logs in main build log

e6167bd

removed many SHARD_TEST files

fd5792d

updated lint and mypy logic

8b825d5

add shard descriptions

32968a9

remove global run on ci/ change

ff1274f

attempt cover fix

c49b06d

attempt fix for cover

1cd8c1f

allow hidden files for cover

a0f5c9a

fail fast

71f444a

use individual coverage checks

cf1a1f1

loosen firestore coverage requirement

b00967f

daniel-sanche changed the title ~~[DRAFT] chore: add test sharding~~ [DRAFT] experiment: add test sharding Jun 12, 2026

daniel-sanche added 2 commits June 12, 2026 16:43

activated more packages

7bb323c

10 packages total

3633089

change default for coverage percent

980d8eb

daniel-sanche mentioned this pull request Jun 13, 2026

chore(ci): optimize kokoro system tests with concurrency #17444

Open

daniel-sanche added 10 commits June 12, 2026 17:29

activated 11th package (enable sharding)

a5cc497

fixed typo

b36b559

change shard logic

c3cf49e

iterating on shard logic

acdfb18

rename tests

9c80a51

add label to number in shard

cf554ca

unit tests fail if initialize fails

886d6d9

added known-bad package

83ca04a

added summary to cover step

ae05642

added no-fail .coveragerc to sqlalchemy-spanner

7b565a4

daniel-sanche added the kokoro:force-run label Jun 15, 2026

yoshi-kokoro removed the kokoro:force-run label Jun 15, 2026

daniel-sanche and others added 2 commits June 18, 2026 16:59

Merge branch 'main' into ci_sharding

72503de

reverted systen test changes

e34b654

gemini-code-assist Bot reviewed Jun 19, 2026

daniel-sanche wants to merge 41 commits into main from ci_sharding


2 participants
