This repository was archived by the owner on Jul 13, 2025. It is now read-only.
Fork Sync: Update from parent repository#36
Open
github-actions[bot] wants to merge 1678 commits into
…onnectivity (#19699) Add new clientmetric counters for establishing contact with peers while using cached network map data. To do this, instrument the magicsock.Conn with a bit to indicate whether its peer data came from a cached netmap. If so, there are two conditions we will count as establishing connectivity to a peer: - Receipt of a CallMeMaybe from a peer via disco. - Establishing a valid endpoint address for a peer. In vmtest, add Env.ClientMetrics to scrape metrics from the specified node. Use this to check that counters were updated in caching tests. Updates tailscale/projects#13 Updates #12639 Change-Id: Ie8cf3244ac8af4f5bcfe4d0d944078da2ba08990 Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Fixes #12778 Change-Id: If9f8b299cef0cb68f93b344845b5c6a5b7554d2c Signed-off-by: DeedleFake <deedlefake@users.noreply.github.com>
…services Adds two new cap resolution methods alongside the existing PeerCaps: PeerCapsForService(src netip.Addr, svcName tailcfg.ServiceName) resolves the service name to its VIP addresses via the node's service IP mappings and returns caps scoped to that service. Exposed on /v0/whois via the svc_name query parameter and on client/local.Client as WhoIsForService. PeerCapsForIP(src, dst netip.Addr) resolves caps against an arbitrary destination IP. Exposed on /v0/whois via the svc_addr query parameter and on client/local.Client as WhoIsForIP. svc_name takes priority over svc_addr when both are present. Invalid values for either return 400. The existing PeerCaps/WhoIs path is unchanged: without a service parameter, WhoIs returns only host-level caps. Updates tailscale/corp#41632 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
Replace the process-global Server.mu lookup in the packet send hot path
with a global hashtriemap mirror of local clientSet entries. The
authoritative clients map remains guarded by Server.mu; clientsAtomic is
only a lock-free fast path for active local clients.
Misses, stale inactive client sets, duplicate accounting, and mesh
forwarding still fall back to lookupDestUncached. This avoids taking
Server.mu for the common local active-client send path, at the cost of
adding one global concurrent map that mirrors Server.clients for local
peers.
The benchmark uses four destination peers. The before run sets
TS_DEBUG_DERP_DISABLE_PEER_HASHTRIE=true to force the old mutex lookup
path; the after run uses the hashtrie fast path.
goos: linux
goarch: amd64
pkg: tailscale.com/derp/derpserver
cpu: Intel(R) Xeon(R) 6975P-C
                      │    before     │                after                 │
                      │    sec/op     │    sec/op      vs base               │
LookupDestHashTrie-16   176.050n ± 1%    1.904n ± 6%   -98.92% (p=0.000 n=10)

                      │    before     │                after                 │
                      │     B/op      │     B/op       vs base               │
LookupDestHashTrie-16     0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
¹ all samples are equal

                      │    before     │                after                 │
                      │   allocs/op   │   allocs/op    vs base               │
LookupDestHashTrie-16     0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
¹ all samples are equal
Updates #3560 (very indirectly, historically)
Updates #19713 (as an alternative to that PR)
Change-Id: Ifb72e5c9854ad00e938cd24c6ab9c27312f297e8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
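The fast-path/fallback lookup described in the DERP change above can be sketched in a few lines of Go. This is only an illustration, not the derpserver code: it uses the standard library's sync.Map as a stand-in for the hashtriemap mentioned in the message, and the type and method names (server, clientSet, lookup, lookupSlow) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// clientSet stands in for the per-peer state a server tracks (hypothetical).
type clientSet struct {
	name   string
	active atomic.Bool
}

// server keeps an authoritative map guarded by mu, plus a concurrent mirror
// that the send path can read without taking mu.
type server struct {
	mu      sync.Mutex
	clients map[string]*clientSet // authoritative; guarded by mu
	mirror  sync.Map              // string -> *clientSet; lock-free reads
}

func newServer() *server {
	return &server{clients: make(map[string]*clientSet)}
}

// register records cs in both maps while holding mu.
func (s *server) register(key string, cs *clientSet) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clients[key] = cs
	s.mirror.Store(key, cs)
}

// lookupSlow is the mutex-guarded fallback path.
func (s *server) lookupSlow(key string) (*clientSet, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cs, ok := s.clients[key]
	return cs, ok
}

// lookup tries the mirror first and falls back on a miss or an inactive
// entry. fast reports whether the lock-free path answered the query.
func (s *server) lookup(key string) (cs *clientSet, fast bool, ok bool) {
	if v, hit := s.mirror.Load(key); hit {
		if c := v.(*clientSet); c.active.Load() {
			return c, true, true
		}
	}
	cs, ok = s.lookupSlow(key)
	return cs, false, ok
}

func main() {
	s := newServer()
	a := &clientSet{name: "a"}
	a.active.Store(true)
	s.register("a", a)
	_, fast, ok := s.lookup("a")
	fmt.Println("fast path used:", fast, "found:", ok)
}
```

The trade-off is the one the message names: reads of active local clients skip the mutex, at the cost of keeping a second map in step with the authoritative one.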
This fixes a log message where ipn/ipnlocal.shouldUseOneCGNATRoute would claim that an Android machine was actually macOS. Updates #cleanup Updates #19652 Signed-off-by: Simon Law <sfllaw@tailscale.com>
…19721) This patch fixes a data race in wgengine/netstack that surfaced while running both TestTCPForwardLimits and TestTCPForwardLimits_PerClient. Both tests set up the TS_DEBUG_NETSTACK envknob, and netstack.Impl.Close leaked its inject goroutine, which also reads that envknob. If the goroutine is still running when the next test starts, the two accesses race. This patch also cleans up the tests a bit, ensuring that neither of them runs in T.Parallel. It also adds a T.Cleanup call to clear the envknob. Fixes #19720 Signed-off-by: Simon Law <sfllaw@tailscale.com>
Fixes tailscale/corp#40250 Signed-off-by: Fran Bull <fran@tailscale.com>
) Instead of having two entry points for running natlab tests, start converting the connectivity tests to use the vmtest framework. Grid and pair tests have yet to be moved over. Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
A missing hosts file is not a fatal error. We should log it, but still proceed and create a new one instead of failing the DNS reconfiguration completely. Fixes #19733 Signed-off-by: Nick Khyl <nickk@tailscale.com>
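A minimal sketch of the "missing file is not fatal" behavior described above, using only the Go standard library. The helper name readHostsOrEmpty and the path in main are hypothetical; the actual DNS reconfiguration code is not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
)

// readHostsOrEmpty returns the file's contents, or empty contents if the
// file does not exist. Other errors (permissions, I/O) are still returned.
func readHostsOrEmpty(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		// Log and carry on: the caller will write a fresh file.
		log.Printf("hosts file %q is missing; starting from an empty one", path)
		return nil, nil
	}
	return b, err
}

func main() {
	b, err := readHostsOrEmpty("/definitely/not/here/hosts")
	fmt.Println(len(b), err)
}
```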
Adds a new NoiseRoundTripper field to tsd.Sys to expose an http.RoundTripper to make requests over the control plane Noise connection. This will be used in PAM use cases soon. Updates tailscale/corp#41800 Signed-off-by: Adriano Sela Aviles <adriano@tailscale.com>
…ns unchanged Warnables with a non-zero TimeToVisible are only published on the eventbus when they remain unhealthy long enough to become visible. However, we still publish a health.Change when a warning that was never visible (and was never published to the eventbus) becomes healthy. This PR fixes that and reduces churn when there is no actual state change. In particular, it avoids unnecessary IPN bus notifications sent to GUI/CLI clients, captive portal detection, etc. Updates tailscale/corp#39759 (noticed while working on it) Signed-off-by: Nick Khyl <nickk@tailscale.com>
Server.clientsAtomic was introduced in 6b72979 as a lock-free mirror of Server.clients to skip Server.mu on the packet send hot path. This drops the non-concurrent map and makes all the existing callers of the old plain map just use the concurrent map, but still holding Server.mu. BenchmarkLookupDestHashTrie is unchanged at ~2ns/op. Fixes #19726 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I0894e4d86914d152b9b5fef969a3184bcb96f678
…etry
Brings Subscriber[T] in line with the same non-generic-core pattern already
applied to SubscriberFunc[T] and Publisher[T]:
- Renames subscriberFuncCore to subscriberCore and shares it between
Subscriber[T] and SubscriberFunc[T]. Both typed facades hold a
*subscriberCore plus their respective per-T delivery state
(Subscriber: chan T; SubscriberFunc: nothing, the user callback is
captured in the dispatch closure).
- The bus's outputs map and subscriber-interface itab key on
*subscriberCore for both subscriber kinds, so adding a new Subscribe[T]
call site no longer pays a per-T itab, dictionary, or equality function
for the subscriber-interface side.
- Subscribe[T] now hoists the non-generic constructor portion into
newSubscriberCore (timer setup, core allocation, cached type/typeName,
unregister method-value), matching SubscribeFunc.
The dispatch loop is intentionally NOT extracted to a non-generic helper for
Subscriber[T], unlike SubscriberFunc[T]. The reason is the typed channel send
'case s.read <- t:' must appear lexically inside the select; the only way to
lift it into a non-generic loop is to bridge typed and untyped via a per-event
goroutine, which costs ~2.7x throughput on BenchmarkBasicThroughput. We keep
dispatchTyped on the generic facade and accept the per-shape stencil cost as
the cheaper alternative.
Symbol-level effect on tailscaled (linux/amd64, measured via
`go tool nm -size`):
Before:
  (*Subscriber[T]).dispatch
    2 shape stencils: 1,682 + 1,549 = 3,231 B
    3 thin per-T wrappers: 124 B each = 372 B
    2 deferwrap1 helpers: 62 B each = 124 B
    total: 3,727 B
After:
  (*Subscriber[T]).dispatchTyped
    2 shape stencils: 1,678 + 1,582 = 3,260 B
    0 per-T wrappers (replaced by closure stored on core)
    2 deferwrap1 helpers: 62 B each = 124 B
    total: 3,384 B
dispatch path .text delta: -343 B (-9.2%)
Per-shape stencils are ~1,600 B (.text body) + ~1,100 B (pclntab) =
~2,700 B each on production tailscaled. The shape count matches before/after
(two distinct GC shapes for the Subscriber[T] event types in this binary).
What changes is that the per-T thin wrappers are eliminated because
Subscriber[T] no longer implements the subscriber interface directly.
Whole-binary section deltas:
.text: -2,304 B (includes the dispatch savings plus other
small downstream effects)
.rodata: +512 B (additional closure-type metadata)
.gopclntab: -2,981 B (fewer per-T compiled functions => less metadata)
Stripped tailscaled (linux/amd64): no change at the file level (the savings
fall below the linker's section-alignment boundary). Unstripped builds shrink
by ~2,900 B.
Behavior is unchanged:
BenchmarkBasicThroughput: 2,161 ns/op, 0 B/op, 0 allocs/op
BenchmarkBasicFuncThroughput: 2,493 ns/op, 144 B/op, 2 allocs/op
BenchmarkSubsThroughput: 3,727 ns/op, 0 B/op, 0 allocs/op
Updates #12614
Change-Id: I97918ec68bd2cdb15958bbfd7687592b39663efe
Signed-off-by: James Tucker <james@tailscale.com>
…eck (#19725) Fix the following issues: 1. Endianness Bug: The nftables runner used hardcoded big-endian byte arrays for firewall mark values (0xff0000, etc.), breaking bitwise operations on little-endian systems (all x86/x64, ARM). This caused connmark save/restore rules to silently fail. Fixed by using binary.NativeEndian to generate correct byte order for the host system. 2. Connmark Restore Conditional Check: The connmark restore mechanism unconditionally overwrote packet marks, even when Tailscale hadn't set any mark bits in conntrack. This destroyed mark bits set by other systems (VPNs, policy routing, vendor flags), breaking coexistence. Fixed by adding a conditional check to only restore when (ct mark & 0xff0000) != 0, preventing the worst case of wiping all marks to zero. Changes: - util/linuxfw/linuxfw.go: Added nativeEndianUint32() helper and updated all mask functions to use native byte order instead of hardcoded bytes - util/linuxfw/nftables_runner.go: Added conditional check in makeConnmarkRestoreExprs() to only restore when ct mark has Tailscale bits set; added detailed comment about bit preservation limitations - util/linuxfw/iptables_runner.go: Added conditional check using -m connmark ! --mark to match nftables behavior - Tests updated: Fixed byte-level regression tests to expect little-endian byte sequences and verify the new conditional check Note: Perfect bit preservation in nftables remains challenging due to nftables expression VM limitations. The current implementation prevents the critical case of wiping marks with zero. Updates #3310 Fixes #11803 Related to #8555 Signed-off-by: Mike O'Driscoll <mikeo@tailscale.com>
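Both fixes above can be illustrated with a short Go sketch. It assumes Go 1.21 or later for binary.NativeEndian. The function names here (nativeEndianBytes, bigEndianBytes, shouldRestoreMark) are illustrative and are not the helpers in util/linuxfw; only the 0xff0000 mask and the "restore only when the masked ct mark is non-zero" condition come from the message.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tailscaleMarkMask is the mask value quoted in the commit message.
const tailscaleMarkMask uint32 = 0xff0000

// nativeEndianBytes encodes v in the host's byte order, which is what a
// bitwise comparison against an in-kernel mark value needs.
func nativeEndianBytes(v uint32) []byte {
	b := make([]byte, 4)
	binary.NativeEndian.PutUint32(b, v)
	return b
}

// bigEndianBytes shows the hardcoded ordering the fix moved away from.
func bigEndianBytes(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

// decodeNative reads four bytes back in host byte order.
func decodeNative(b []byte) uint32 {
	return binary.NativeEndian.Uint32(b)
}

// shouldRestoreMark models the conditional check: restore the packet mark
// only when conntrack carries at least one bit under the Tailscale mask.
func shouldRestoreMark(ctMark uint32) bool {
	return ctMark&tailscaleMarkMask != 0
}

func main() {
	fmt.Printf("native: % x\n", nativeEndianBytes(tailscaleMarkMask))
	fmt.Printf("big:    % x\n", bigEndianBytes(tailscaleMarkMask))
	fmt.Println("restore when ct mark is zero:", shouldRestoreMark(0))
	fmt.Println("restore when ct mark is 0x0a0000:", shouldRestoreMark(0x0a0000))
}
```

On a little-endian host the two encodings differ, which is why a big-endian byte array fed to a native-order bitwise operation selects the wrong bits.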
The codegen path for map-of-slice-of-pointer fields skipped nil-valued entries, which dropped the key from the map. This broke how dns.Config.Routes uses nil values as sentinels. Fixes #19730 Fixes #19732 Fixes #19746 Fixes #19744 Change-Id: Ic6400227f4ab21b3ca0e8c0eeecf9b83d145a9ab Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
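To make the bug concrete, here is a hand-written sketch of a clone function for a map-of-slice-of-pointer field that keeps nil-valued keys. It is not the generated code; the resolver type and cloneRoutes name are hypothetical stand-ins.

```go
package main

import "fmt"

// resolver is a hypothetical element type for the slices in the map.
type resolver struct{ addr string }

// cloneRoutes deep-copies m. A key whose value is nil is kept with a nil
// value rather than dropped, because callers may treat "present but nil"
// differently from "absent".
func cloneRoutes(m map[string][]*resolver) map[string][]*resolver {
	if m == nil {
		return nil
	}
	out := make(map[string][]*resolver, len(m))
	for k, v := range m {
		if v == nil {
			out[k] = nil // keep the key: nil is a meaningful sentinel
			continue
		}
		cp := make([]*resolver, len(v))
		for i, r := range v {
			if r != nil {
				r2 := *r
				cp[i] = &r2
			}
		}
		out[k] = cp
	}
	return out
}

func main() {
	src := map[string][]*resolver{
		"a.example.": nil,
		"b.example.": {&resolver{addr: "10.0.0.1"}},
	}
	dst := cloneRoutes(src)
	_, ok := dst["a.example."]
	fmt.Println("nil-valued key kept:", ok, "entries:", len(dst))
}
```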
The label "natlab" is a bit confusing and also used for other things. Instead, change the trigger label to "run-natlab-tests". Updates #13038 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
In a lot of places, we construct an error to End a step, then immediately log it to the governing test as test fatal. Save ourselves a bit of boilerplate by putting methods on Step for that. There are a couple cases this doesn't cover, e.g., where we construct the Step outside a subtest that wants to fail individually, but it helps enough to pay for its lines. Updates #13038 Change-Id: I71f9900942962de16609b6b198d3ba13d6958a5f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Add a VM-based natlab test that exercises the peer-relay feature (feature/relayserver) end-to-end across three Tailscale nodes whose network topology makes a direct A<->B UDP path impossible: both peers are behind HardNAT (FreeBSD/pfSense-style endpoint-dependent NAT) with no port-mapping services, while the relay node is behind One2OneNAT so its STUN-discovered WAN endpoint is reachable from both peers. The test enables the relay server via EditPrefs, then waits for an a->b PingDisco whose PingResult.PeerRelay is set (proving magicsock chose the peer-relay path, not DERP), and finally asserts that the relay's DebugPeerRelaySessions LocalAPI reports the session. The existing TestPeerRelayPing in tstest/integration runs three tailscaled processes on the loopback interface with no NATs; this new vmtest covers peer relay through real per-VM kernels and NATs. To wire control-server capabilities into vmtest, also add a PeerRelayGrants() EnvOption (sibling of AllOnline, SameTailnetUser) that flips testcontrol.Server.PeerRelayGrants so the wildcard packet filter grants tailcfg.PeerCapabilityRelay and PeerCapabilityRelayTarget; without those caps magicsock won't consider any peer a candidate relay. Updates #13038 Change-Id: Ib3440b83ec442da0d3b89ffa48ceea9398ea9062 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Since f343b49 ("wgengine, all: remove LazyWG, use wireguard-go callback API for on-demand peers"), Reconfig is fully synchronous: magicConn.UpdatePeers, wgdev.RemovePeer, router.Set, and dns.Set all return when the work is done, and the peer list is updated under wgLock before Reconfig returns. So after Reconfig with empty configs, len(st.Peers) is already 0. The old loop also waited for st.DERPs to drain to 0, but UpdatePeers only edits maps; active DERP connections idle out on their own timeout. The sole caller (LocalBackend.stopEngineAndWait) doesn't inspect st.DERPs anyway; it just hands the Status to setWgengineStatusLocked. So the drain-wait was for nothing observable and could theoretically (or at least appear to readers to) loop forever holding b.mu. Remove that reader confusion by removing the backoff loop entirely. Updates #19759 Change-Id: Ibfac3f0baabcad7604b713c934a8fc37932e0a50 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…scale CI cibuild.On() returns true for any CI environment that sets CI=true, including Alpine Linux's package build CI. TestTsgoRevInCacheKey was guarded by cibuild.On() (or use of tsgo), so it ran under Alpine's CI with stock Go, where go.toolchain.rev isn't blended into build cache keys, and unsurprisingly failed. Add cibuild.OnTailscaleCI, which keys off GITHUB_REPOSITORY_OWNER to distinguish tailscale/tailscale's own GitHub Actions CI from arbitrary downstream CI, and use it in TestTsgoRevInCacheKey. Fixes #19754 Change-Id: Id31cfe71903a235f1460dca1e2fdf334e3ba1ee5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
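The distinction between "any CI" and "Tailscale's own CI" might look like the sketch below. The message only says the new check keys off GITHUB_REPOSITORY_OWNER; comparing against the literal owner "tailscale" and also requiring CI=true are assumptions made for illustration, as are the lowercase function names.

```go
package main

import (
	"fmt"
	"os"
)

// onCI reports whether the environment looks like any CI system, based on
// the conventional CI=true variable.
func onCI(getenv func(string) string) bool {
	return getenv("CI") == "true"
}

// onTailscaleCI narrows onCI to GitHub Actions runs owned by "tailscale".
// The owner string is an assumption for this sketch.
func onTailscaleCI(getenv func(string) string) bool {
	return onCI(getenv) && getenv("GITHUB_REPOSITORY_OWNER") == "tailscale"
}

func main() {
	fmt.Println("on CI:", onCI(os.Getenv))
	fmt.Println("on Tailscale CI:", onTailscaleCI(os.Getenv))
}
```

Passing the getenv function in keeps the checks testable without mutating the process environment.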
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
…ls (#19757) linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that gate on cfg.NetfilterMode, the requested config state. This may cause an error when setNetfilterModeLocked fails, since it may keep assuming this config is valid. We now gate both blocks on r.netfilterMode, matching the pattern used by SNAT, stateful, and loopback paths. Fixes #19737 Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>
Move the inline CSS and JS into separate files to be more friendly to Content Security Policies. ServeHTTP is updated to serve these assets from the '/static/' path. Updates tailscale/corp#32398 Signed-off-by: Noel O'Brien <noel@tailscale.com>
RouteCheck, which checks that overlapping routers are reachable, is enabled by default for both tailscaled and tsnet. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>
The Engine watchdog wrapped every wgengine.Engine method call in a goroutine with a 45s timeout and crashed the process on timeout. It was added years ago to surface deadlocks during development, but the underlying deadlocks have long since been fixed, and even when it did fire it produced obscure stack traces (from inside the watchdog goroutine, not the original caller) without buying much. Audit of userspaceEngine's methods shows none have cyclic locking or unbounded blocking now that ResetAndStop no longer loops waiting for DERPs to drain (fa49009). The watchdog is dead weight; remove it along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch. Updates #19759 Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Issue #19737 ran into a nil pointer dereference, the cause of which was fixed by #19761. If we end up on this code path with a nil table again, we should bubble that up as an error (which is logged by the health warning system) rather than failing catastrophically. Signed-off-by: Naman Sood <mail@nsood.in>
If the context given to DialContext has a shorter lifetime than the OS TCP SYN timeout, and TCP SYNs are dropped from the path to the remote, DialContext would never fall back to try IPv6 after IPv4. Instead, use the normal happy eyeballs race if there is more than one address. This does remove the implicit prioritization of IPv4 over IPv6 in cases where there is only a single IPv4 remote address. Updates #13346 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
A data race in a package matters more than any individual test
result. Two related problems:
1. Where go test's race detector text ("WARNING: DATA RACE" plus
the goroutine stack traces) lands in JSON output is timing-
dependent: it can be attributed to a test that ends up reporting
PASS (e.g. when the racing goroutines outlive the test that
spawned them and TSan prints during a different test's window).
testwrapper's main loop only flushes the logs of failed tests,
so the race report ends up stuck in a passing test's buffer and
is silently dropped. The race builders just see a bare
"FAIL\nFAIL\tpkg\ttime".
2. If the failing test in such a package happens to be marked flaky,
testwrapper retries it. That is the worst possible response to a
race: the flaky test might not even be the racy code, and a
second run without the racy goroutines could "succeed" while
hiding the real bug.
Address both: scan every output line for the race detector's first-
line marker. Track whether the package observed a race at all, on
the pkgFinished testAttempt. When a race was seen, fold every per-
test log buffer into the package-level logs (so the full report
surfaces from the existing pkg-fail flush path), and drop any
flaky-test retry plans for that package so we fail immediately
instead of running another attempt.
Two new tests:
- TestRaceSuppressesFlakyRetry verifies that a flaky test alongside
a racy test does NOT get retried.
- TestRaceAttributedToPassingTest verifies that a race attributed by
test2json to a passing test still surfaces in the output.
Also add a corpus of captured raw test binary outputs under
cmd/testwrapper/testdata/, with one subdirectory per scenario,
documenting the six representative shapes that go test -race can
emit (race in test body, race in goroutines that outlive a test,
race forced into a later test, race in TestMain post-m.Run, and a
parallel-tests split-attribution case via a "=== NAME" redirect
line). See its README.md for details.
Fixes #19603
Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
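The policy described in the testwrapper change above (scan every line for the race marker, fold all logs into the package output, and cancel flaky retries) can be sketched as follows. The testResult type and the plan function are hypothetical simplifications and do not mirror testwrapper's real data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// raceMarker is the first line of a Go race detector report.
const raceMarker = "WARNING: DATA RACE"

// testResult is a simplified, hypothetical view of one test's outcome.
type testResult struct {
	name   string
	failed bool
	flaky  bool
	logs   []string
}

// hasRaceLine reports whether any line contains the race marker.
func hasRaceLine(lines []string) bool {
	for _, l := range lines {
		if strings.Contains(l, raceMarker) {
			return true
		}
	}
	return false
}

// packageSawRace scans the output of every test, passing or not.
func packageSawRace(results []testResult) bool {
	for _, r := range results {
		if hasRaceLine(r.logs) {
			return true
		}
	}
	return false
}

// plan returns the tests to retry and the log lines to print. When a race
// was seen, every test's logs are surfaced and nothing is retried.
func plan(results []testResult) (retry []string, logs []string) {
	race := packageSawRace(results)
	for _, r := range results {
		if race || r.failed {
			logs = append(logs, r.logs...)
		}
		if !race && r.failed && r.flaky {
			retry = append(retry, r.name)
		}
	}
	return retry, logs
}

func main() {
	results := []testResult{
		{name: "TestPasses", logs: []string{raceMarker, "Read at 0x00c000 by goroutine 7:"}},
		{name: "TestFlaky", failed: true, flaky: true, logs: []string{"flaky failure"}},
	}
	retry, logs := plan(results)
	fmt.Println("retries:", len(retry), "log lines:", len(logs))
}
```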
Pin govulncheck to resolve panics in the most recent version. Updates #cleanup Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
The watchdog (ipn/ipnlocal/watchdog.go) was abusing PeerForIP with an invalid netip.Addr as a way to acquire and release the engine's internal locks for deadlock detection. This does the TODO to break it out into its own method like all the other similarly named methods. Splitting this out as a prerequisite for a follow-up rewrite of PeerForIP itself; not having to preserve the lock-probe overload in the new implementation keeps that follow-up smaller. Updates #12542 Updates #cleanup Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I25cbffd11aeb65600d9128845404c4918ef88ead
…ression Otherwise we may never handshake a new peer relay server endpoint around remote client restarts and/or disco key rotation. Updates #20215 Signed-off-by: Jordan Whited <jordan@tailscale.com>
Another baby step toward removing slices of peers from the engine. getStatus iterated peerSequence (a key snapshot built in Reconfig from cfg.Peers) and then asked wgdev for each peer's stats; peers that weren't active in wgdev silently fell out. Iterate active wgdev peers directly via RemoveMatchingPeers(returnFalse) instead. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I3abd348abc30db706db29b3a785179259e48abda
userspaceEngine.PeerForIP read from e.netMap.Peers and e.lastCfgFull.Peers, both of which go stale when peers arrive via netmap deltas (which skip Engine.SetNetworkMap and Engine.Reconfig). Every PeerForIP caller (Engine.Ping, the TSMP disco-key handler, pendopen diagnostics, tsdial.Dialer.UseNetstackForIP, and LocalBackend.GetPeerEndpointChanges) would report "no matching peer" for freshly-added peers. Fix it the same way SetPeerByIPPacketFunc fixed the outbound packet hot path: have LocalBackend install a callback that reads the live nodeBackend. nb.NodeByAddr is built from both SelfNode and Peers (updateNodeByAddrLocked), so a single lookup covers the common case with IsSelf set when the matched node ID is SelfNode's. The subnet- route / exit-node-default-route slow path goes through a new Engine.PeerKeyForIP that exposes the engine's AllowedIPs BART table (the same table the outbound packet hot path already consults, with exit-node selection honored), and resolves the matched key back to a NodeView via the live nodeBackend. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I0d4b0d8997c8e796b7367c46b49b61d4fdc717b0
The logging added in 12188c0 was generating excessive spam in backend logs. This may have been exacerbated by tailscale GUI<->backend architecture on certain platforms like Windows, where the GUI polls for exit node suggestions rather than listening on the IPN bus. Change this to log on error or if the current suggestion differs from the previous suggestion. Updates tailscale/corp#43691 Updates #20194 Signed-off-by: Amal Bansode <amal@tailscale.com>
Most of our flag descriptions start with a lowercase word (except proper nouns); fix the handful which do not. Fixes #20230 Change-Id: I00aaac171254c050ad0b75c2cf8746590c8c4d8f Signed-off-by: Alex Chan <alexc@tailscale.com>
Add a retry loop with BatchMode=yes to absorb the race window between Env.Start() returning (when tta reports the tailscale backend as Running) and cloud-init finishing the user/SSH-key setup. In CI, the second VM's tta agent has been observed connecting only a few hundred milliseconds before the test SSHes in, which is inside the window where /root/.ssh/authorized_keys hasn't fully landed yet. SSH key auth then fails and ssh(1) falls back to interactive password prompts (3x), wasting time and producing a confusing "Permission denied (publickey,password)" error. BatchMode=yes makes the client fail fast on auth failure instead of prompting, and the retry loop handles SSH transport-level errors (exit code 255) for up to 30 seconds with 500ms backoff. Remote command non-zero exits still pass through unchanged. Fixes #20228 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I17f7422e9e27bf7b995f505c0184cbb2b230ed81
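A minimal sketch of the retry policy described above: retry only transport-level failures, pass everything else through, and give up after a fixed budget. The sentinel errors and function names are hypothetical, and the time budget here counts only the sleeps between attempts, which is a simplification of "up to 30 seconds".

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	sshTimeout = 30 * time.Second
	sshBackoff = 500 * time.Millisecond
)

// errTransport stands in for an ssh(1) exit status of 255 (hypothetical).
var errTransport = errors.New("ssh transport error (exit 255)")

// errRemote stands in for a non-zero exit from the remote command.
var errRemote = errors.New("remote command failed")

// noSleep lets callers run the loop without real delays.
func noSleep(time.Duration) {}

// retryTransport calls attempt until it returns something other than
// errTransport, or until the accumulated backoff would exceed timeout.
// Success and remote command failures are returned unchanged.
func retryTransport(attempt func() error, timeout, backoff time.Duration, sleep func(time.Duration)) error {
	var waited time.Duration
	for {
		err := attempt()
		if !errors.Is(err, errTransport) {
			return err
		}
		if waited+backoff > timeout {
			return err
		}
		sleep(backoff)
		waited += backoff
	}
}

func main() {
	calls := 0
	err := retryTransport(func() error {
		calls++
		if calls < 3 {
			return errTransport // e.g. authorized_keys not written yet
		}
		return nil
	}, sshTimeout, sshBackoff, noSleep)
	fmt.Println("attempts:", calls, "err:", err)
}
```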
Env.Start boots all VM nodes in parallel; each calls createCloudInitISO -> ensureDebugSSHKey concurrently. When /tmp/vmtest_key doesn't yet exist, the first goroutine creates it with os.WriteFile, which opens with O_CREATE|O_TRUNC and briefly leaves the file existing-but-empty between the open and the subsequent write. A concurrent goroutine that hits that window sees ReadFile succeed with zero bytes, then fails ssh.ParsePrivateKey with "ssh: no key found", causing boot to fail with: boot: creating cloud-init ISO: parse /tmp/vmtest_key: ssh: no key found Observed in CI on TestSiteToSite (3 nodes). Wrap the function in a package-level Mutex so the first caller fully writes the key before any other caller reads it. Updates #20228 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ie6399dcba0c397bb8041931d3de1c6063a11c568
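The fix above amounts to serializing a read-or-create sequence. The Go sketch below shows the pattern with a package-level mutex; ensureKey, concurrentEnsure, and the placeholder key bytes are hypothetical and do not generate or parse a real SSH key.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// keyMu serializes ensureKey so no caller can observe the file between its
// creation and the write of its contents.
var keyMu sync.Mutex

// ensureKey returns the contents of path, creating the file from gen() if
// it does not exist yet.
func ensureKey(path string, gen func() []byte) ([]byte, error) {
	keyMu.Lock()
	defer keyMu.Unlock()
	b, err := os.ReadFile(path)
	if err == nil {
		return b, nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}
	b = gen()
	if err := os.WriteFile(path, b, 0o600); err != nil {
		return nil, err
	}
	return b, nil
}

// concurrentEnsure runs n callers at once against a fresh temp file and
// returns how many of them saw non-empty key material.
func concurrentEnsure(n int) (int, error) {
	dir, err := os.MkdirTemp("", "keydemo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "key")

	var wg sync.WaitGroup
	var mu sync.Mutex
	good := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b, err := ensureKey(path, func() []byte { return []byte("placeholder-key-material") })
			if err == nil && len(b) > 0 {
				mu.Lock()
				good++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return good, nil
}

func main() {
	good, err := concurrentEnsure(8)
	fmt.Println("callers that saw a full key:", good, "err:", err)
}
```

A mutex only protects callers within one process; that matches the scenario in the message, where the concurrent callers are goroutines started by Env.Start.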
tsdial.Dialer.SetNetMap rebuilt an O(n peers) map of MagicDNS names on every netmap change. As we move toward per-peer incremental deltas, this becomes quadratic. This removes it and replaces it with SetResolveMagicDNS, a callback into LocalBackend that looks up hostnames from nodeBackend's new nodeByName index (populated alongside nodeByAddr/nodeByKey on both full and delta paths). The index stores both FQDNs and short names as keys. This is the same treatment applied to netlog (8f21045), wglog (988b090), and drive (1d69894): stop pushing *netmap.NetworkMap into subsystems and instead have them pull from LocalBackend's live data via callbacks. Updates #12542 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I24557ab0c8a27636e08e4779bcfd3ec633db0a78
Add zizmor GitHub Actions linting on changes to .github/workflows. Updates tailscale/corp#28760 Signed-off-by: Patrick O'Doherty <patrick@tailscale.com>
…20199) Router.Set reconciled tailscale0's addresses only against the in-memory r.addrs map, which starts empty each run. After a restart the kernel can still hold the addresses a previous profile put on tailscale0. With no record of them, Set never removed them, leaving two tailnets' CGNAT addresses on the interface. That broke connectivity, because the kernel could source traffic from the wrong IP. Fix this by scanning the addresses actually on the interface and, after reconciling the desired set, removing any in Tailscale's CGNAT/ULA ranges that aren't in the config. Non-Tailscale addresses are never touched, and IPv6 addresses are skipped when IPv6 is unavailable, since delAddress no-ops there. To avoid a netlink dump on every Set, the scan runs only on the first Set and when the desired address set changes. This also needs the iptables DelLoopbackRule to tolerate a missing rule: an orphan left by a previous instance never went through AddLoopbackRule here, and iptables (unlike nftables) errors when deleting an absent rule, which would otherwise block the address delete. Fixes #19974 Signed-off-by: Brendan Creane <bcreane@gmail.com>
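The reconciliation rule described above (remove addresses that sit in Tailscale's ranges but are not in the desired config, and never touch anything else) can be sketched with net/netip. The prefixes 100.64.0.0/10 and fd7a:115c:a1e0::/48 are the CGNAT and ULA ranges as I understand them; the message itself does not spell them out. Function names are hypothetical and no netlink calls are made.

```go
package main

import (
	"fmt"
	"net/netip"
)

var (
	cgnatRange = netip.MustParsePrefix("100.64.0.0/10")
	ulaRange   = netip.MustParsePrefix("fd7a:115c:a1e0::/48")
)

// inTailscaleRange reports whether a falls in either range above.
func inTailscaleRange(a netip.Addr) bool {
	return cgnatRange.Contains(a) || ulaRange.Contains(a)
}

// mustPrefixes parses each string as a prefix, panicking on bad input.
func mustPrefixes(ss ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParsePrefix(s))
	}
	return out
}

// staleAddrs returns the addresses found on the interface that are inside
// the Tailscale ranges but absent from the desired set. Addresses outside
// those ranges are never reported.
func staleAddrs(onIface, wanted []netip.Prefix) []netip.Prefix {
	want := make(map[netip.Prefix]bool, len(wanted))
	for _, p := range wanted {
		want[p] = true
	}
	var stale []netip.Prefix
	for _, p := range onIface {
		if inTailscaleRange(p.Addr()) && !want[p] {
			stale = append(stale, p)
		}
	}
	return stale
}

func main() {
	onIface := mustPrefixes("100.64.0.1/32", "100.100.5.5/32", "192.168.1.10/24", "fd7a:115c:a1e0::1/128")
	wanted := mustPrefixes("100.64.0.1/32", "fd7a:115c:a1e0::1/128")
	fmt.Println("stale:", staleAddrs(onIface, wanted))
}
```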
The primary purpose is that return packets from the target app get properly SNATed on connectors with --tun=userspace-networking, matching the NAT behavior in the kernel tun path. This is also necessary but not sufficient for clients of connectors in userspace networking mode. The hook will DNAT MagicIPs, but won't actually be sent MagicIPs until conn25 app connector DNS works with userspace networking. Fixes tailscale/corp#43201 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
The engine only used the netmap to look up self addresses and the self node's primary routes, so pass it the self node directly rather than the whole netmap. Updates #12542 Change-Id: I13c0028eed65d2177baf4cf6c449f5e441845a18 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
setWebClientAtomicBoolLocked and setDebugLogsByCapabilityLocked each only need the node capabilities to decide what to do, so take a set.Set[tailcfg.NodeCapability] directly as part of getting rid of netmap.NetworkMap. Updates #12542 Change-Id: If7c30b6354fd42dfe82ed6d2e2fe3439de401315 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No code changes needed; this is to rule out cmpver as the source of any version-comparison issues. Updates #20238 Change-Id: Ib8765dd042e994549d9e2c03859a5f769a856704 Signed-off-by: Alex Chan <alexc@tailscale.com>
364b952 switched containerboot to partial netmap fetching, but stopped refreshing `DNS.ExtraRecords`, so Tailscale Services created after pod boot were invisible to resolveTailnetFQDN. To fix we watch for SelfChange ipn bus notifies, and refetch dns-config via LocalAPI to get a fresh set of `DNS.ExtraRecords`. Fixes #20233 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
… receive extensions" (#20257) * Revert "control/controlclient: continue map poll during key expiry to receive extensions" This reverts commit 6a822dc. This commit has caused test failures in the corp repo by unexpectedly changing the login behaviour when nodes have a valid node key. Updates tailscale/corp#43705 Updates #19326 Signed-off-by: Alex Chan <alexc@tailscale.com> * Revert "tsnet: test key extension after server restart" This reverts commit 3172013. This test relies on changes in 3172013, which is also being reverted because it causes test failures in corp. Updates tailscale/corp#43705 Updates #19326 Signed-off-by: Alex Chan <alexc@tailscale.com> --------- Signed-off-by: Alex Chan <alexc@tailscale.com>
…20169) This patch adds a new `client-side-reachability-routecheck` node attribute to allow admins to selectively enable background routecheck probing on trial nodes. The current implementation is still experimental. It adds the routecheck.IsEnabled helper to check for the new `client-side-reachability-routecheck` node attribute alongside the existing `client-side-reachability` node attribute in this node’s self capabilities. This allows administrators to turn on and off this feature by editing the policy file. It adds the `TS_DEBUG_FORCE_CLIENT_SIDE_REACHABILITY_ROUTECHECK` environment variable which can be set to override the policy file. When set to `true`, it forcibly enables this feature. And when set to `false`, it forcibly disables it. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>
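The override order described above (environment variable first, policy file second) is a small tri-state decision. The sketch below assumes the variable is parsed with strconv.ParseBool semantics and that an unset or unparsable value defers to policy; both are assumptions, and routeCheckEnabled is a hypothetical name rather than routecheck.IsEnabled itself.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// routeCheckEnabled decides whether the feature is on. envValue is the raw
// value of TS_DEBUG_FORCE_CLIENT_SIDE_REACHABILITY_ROUTECHECK, and
// policyEnabled is whatever the node attributes from the policy file say.
func routeCheckEnabled(envValue string, policyEnabled bool) bool {
	if envValue != "" {
		if forced, err := strconv.ParseBool(envValue); err == nil {
			return forced // "true" forces on, "false" forces off
		}
	}
	return policyEnabled
}

func main() {
	env := os.Getenv("TS_DEBUG_FORCE_CLIENT_SIDE_REACHABILITY_ROUTECHECK")
	fmt.Println("enabled with policy off:", routeCheckEnabled(env, false))
	fmt.Println("enabled with policy on:", routeCheckEnabled(env, true))
}
```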
Occasionally CI jobs will flake because downloading from GitHub fails. Allow retrying up to 3 times to reduce CI flakiness. Updates #cleanup Change-Id: Ib019e89ac74b81d78f71a40099b20ff60014a81f Signed-off-by: Alex Chan <alexc@tailscale.com>
…err (#19968) On optimistic lock error, requeue the event after a short duration. Resolves a case where a failure to acquire an optimistic lock on the dnsrecords configmap will cause the operator to drop a reconcile event and leave the configmap in an undesirable state. Updates #19946 Signed-off-by: Alex Freestone <freestone.alex@gmail.com>
updates tailscale/corp#44019 WebClient is very useful for remote management on tvOS (which cannot do ssh). Let's include it there. Minimal corresponding tailscale/corp changes to follow to add UI to set the required prefs. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
We stopped reading this field nearly two years ago, with a TODO comment to remove it sometime in 2025. It is now 2026. Updates #12058 Change-Id: I8ddf1c2e4c3c428e8d45a6491d3899368ec52c30 Signed-off-by: Alex Chan <alexc@tailscale.com>
…nsion
The ACME serialization mutex (acmeMu) was a package-level global, and
several ACME-related fields lived on LocalBackend even though the
cert code is conditional and not linked into every binary. With
multiple tsnet.Servers in one process (each its own LocalBackend),
a process-wide acmeMu also serialized unrelated backends.
Introduce a new feature/acme extension that owns the per-LocalBackend
ACME/cert state in an ipnlocal.CertState value:
- acmeMu, renewMu, renewCertAt (previously package globals)
- pendingACMETLSALPNCerts, pendingCertDomains{,Mu},
getCertForTest, certRefreshCancel (previously LocalBackend
fields, only meaningful when ACME was compiled in)
ipnlocal/cert.go now reaches the state through b.certState(), which
is routed by a feature.Hook installed at init by feature/acme. The
CertState type lives in ipnlocal so cert.go can access its fields
directly without a method explosion; the extension in feature/acme
constructs and owns it.
This is a baby step. The end goal is for the entire cert/ACME code
to live in feature/acme, with ipnlocal only retaining whatever thin
hooks the rest of LocalBackend needs to call into it. The current
split (CertState and most of cert.go in ipnlocal, extension wrapper
in feature/acme) is a deliberately temporary middle ground that
keeps this PR small while making the next moves mechanical.
The package is named feature/acme to match the existing HasACME /
ts_omit_acme naming. condregister/maybe_acme.go wires it in for
non-js builds.
Updates #12614
Updates #20248
Updates #20249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I520909f24ad11a9622ef33c2290fe36ad44d6f71
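As a rough illustration of the shape described above (per-backend state reached through a hook that an optional feature package installs at init), here is a minimal single-file sketch. All names (`backend`, `certState`, `certStateHook`, `issueCert`) are hypothetical simplifications, and the plain function variable only stands in for the real feature.Hook type, whose API is not shown in the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// certState holds ACME state that is owned per backend rather than
// per process. Hypothetical, simplified stand-in for ipnlocal.CertState.
type certState struct {
	acmeMu sync.Mutex // serializes ACME operations for one backend only
	issued int
}

// backend is a hypothetical stand-in for LocalBackend.
type backend struct{ name string }

// certStateHook is nil unless the optional feature code below is
// linked in. It stands in for a hook installed at init time.
var certStateHook func(*backend) *certState

// certState returns this backend's cert state, or nil if ACME support
// was omitted from the build.
func (b *backend) certState() *certState {
	if certStateHook == nil {
		return nil
	}
	return certStateHook(b)
}

// issueCert pretends to issue a cert while holding only this
// backend's mutex, so unrelated backends are not serialized.
func (b *backend) issueCert() error {
	cs := b.certState()
	if cs == nil {
		return errors.New("ACME support not linked in")
	}
	cs.acmeMu.Lock()
	defer cs.acmeMu.Unlock()
	cs.issued++
	return nil
}

// --- what would live in the optional feature package ---

var (
	statesMu sync.Mutex
	states   = map[*backend]*certState{} // extension-owned state
)

func init() {
	certStateHook = func(b *backend) *certState {
		statesMu.Lock()
		defer statesMu.Unlock()
		cs, ok := states[b]
		if !ok {
			cs = &certState{}
			states[b] = cs
		}
		return cs
	}
}

func main() {
	a, b := &backend{name: "a"}, &backend{name: "b"}
	for _, be := range []*backend{a, a, b} {
		if err := be.issueCert(); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(a.name, a.certState().issued, b.name, b.certState().issued)
}
```

Because each backend gets its own `certState`, two servers in one process no longer contend on a single process-wide mutex, and a build that never installs the hook simply sees a nil state.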
GitHub's built-in CODEOWNERS only supports a hard "block until a team member reviews" rule, with no way to leave an audit trail when the requirement is intentionally bypassed.
Move review enforcement to palantir/policy-bot (https://github.com/palantir/policy-bot) running at https://policybot.corp.ts.net, which lets us express the same tailcfg/ -> control-protocol-owners rule plus an explicit override: any other @tailscale/dev member can post
    policybot-override: <reason>
as a PR comment and that comment counts as their approval, with the reason recorded in the PR conversation as a permanent audit trail.
CODEOWNERS is kept as a one-screen comment so anyone landing on it expecting the old behavior is directed to .policy.yml.
Updates tailscale/corp#13972
Change-Id: I2dc3619c498d4c4a6decae29aa123f6d67905eed
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The override comment didn't work as expected. (I'll be updating the policytest package to handle this) Updates tailscale/corp#13972 Change-Id: Ic5c16eed09c8cb5fa8dab37d43cf05f8dfa75d49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
prometheus/common v0.66/v0.67 introduced a mandatory model.ValidationScheme on expfmt.TextParser as part of prepping for UTF-8 metric/label names in Prometheus 3.0. The zero value is intentionally UnsetValidation, which panics on the first call to IsValidMetricName / IsValidLabelName with
    Invalid name validation scheme requested: unset
so the long-standing "var parser expfmt.TextParser" pattern crashes at runtime. Several big downstreams have hit the same sharp edge:
    thanos-io/thanos#8823
    grafana/loki#21401
Switch our two callers (parseMetrics in tsnet's TestUserMetricsByteCounters and the client-metrics scraper in tstest/natlab/vmtest) to the new expfmt.NewTextParser constructor with model.LegacyValidation. LegacyValidation matches the classic ASCII metric/label naming rules that tailscaled's exporter uses today; if and when we ever emit a metric with a UTF-8 name, we can revisit.
Goes to v0.69.0 (the latest at the time of writing) rather than v0.67.5 so we pick up the unrelated security fixes for cross-host redirects.
Done in advance so a follow-up change can pull in github.com/tailscale/policybottest (which depends on palantir/policy-bot, which transitively requires prometheus/common at v0.67+) without dragging this debugging into that PR.
Updates tailscale/corp#13972
Change-Id: I4b37db9ad3bebef1a32d9020bf6f8790bab25336
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
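The sharp edge described above, a zero value that is deliberately invalid so that callers must go through a constructor, can be modeled with the standard library alone. The sketch below is not prometheus/common code: `validationScheme`, `textParser`, `newTextParser`, and `acceptsName` are hypothetical names, and the panic text only imitates the message quoted in the commit. The actual fix, per the commit message, is to call expfmt.NewTextParser with model.LegacyValidation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validationScheme models a setting whose zero value is intentionally
// unusable. Hypothetical type for illustration only.
type validationScheme int

const (
	unsetValidation  validationScheme = iota // zero value: must not be used
	legacyValidation                         // classic ASCII names
	utf8Validation                           // any non-empty valid UTF-8
)

// isLegacyName reports whether name matches [a-zA-Z_:][a-zA-Z0-9_:]*.
func isLegacyName(name string) bool {
	if name == "" {
		return false
	}
	for i, r := range name {
		letter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_' || r == ':'
		digit := r >= '0' && r <= '9'
		if !letter && !(digit && i > 0) {
			return false
		}
	}
	return true
}

// isValidMetricName panics when the scheme was never chosen, which is
// what makes a zero-value parser crash on first use.
func (s validationScheme) isValidMetricName(name string) bool {
	switch s {
	case legacyValidation:
		return isLegacyName(name)
	case utf8Validation:
		return name != "" && utf8.ValidString(name)
	default:
		panic("invalid name validation scheme requested: unset")
	}
}

// textParser is a hypothetical parser that carries its scheme.
type textParser struct{ scheme validationScheme }

// newTextParser forces the caller to pick a scheme explicitly.
func newTextParser(s validationScheme) textParser { return textParser{scheme: s} }

// acceptsName reports whether the parser would accept a metric name.
func (p textParser) acceptsName(name string) bool {
	return p.scheme.isValidMetricName(name)
}

func main() {
	p := newTextParser(legacyValidation)
	fmt.Println("legacy parser accepts name:", p.acceptsName("tailscaled_inbound_bytes_total"))

	// The old "var parser textParser" pattern leaves the scheme unset.
	defer func() { fmt.Println("recovered:", recover()) }()
	var zero textParser
	zero.acceptsName("anything")
}
```

Making the zero value panic rather than silently defaulting trades a runtime crash for an explicit decision at every call site, which is why existing callers had to be updated.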
Add a .policy-tests.yml file with tests exercising the policy that was just landed: the tailcfg/ control-protocol-owners gate, the "policybot-override:" comment escape hatch (including defaults-regression guards so the override rule does not silently accept a normal review or a 👍 comment), and the always-on "any tailscale/dev review" baseline. Updates tailscale/corp#13972 Change-Id: I42afb06b0771658c803512cb5de4701450c8a704 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No description provided.