fix: browse() and list_datablocks() for V3 multi-frame EXPLORE (S7-1200 FW V4.5) #753

Open: tommasofaedo wants to merge 1 commit into gijzelaerr:master from tommasofaedo:fix/browse-v3-multiframe-explore

@tommasofaedo

## Problem

On S7-1200 firmware V4.5 (V3 protocol), `list_datablocks()` and `browse()`
return empty results because the EXPLORE response for `0x8A11FFFF` spans
**multiple TPKT frames** and uses a different encoding than V1/V2 PLCs.

Three separate issues:

1. **Multi-frame response not collected** — the EXPLORE response is split into
   3 consecutive TPKT frames. The second and third frames are raw BLOB
   continuation data with no response header (only a V3 HMAC prefix). The
   existing code only reads the first frame.

2. **Wrong parser for V3 EXPLORE** — V3 PLCs return a zlib-compressed
   `PlcContentInfo` XML blob (magic `78 DA`), not the PObject tree that
   `_parse_explore_datablocks()` expects.

3. **`_parse_explore_fields` crashes on V3 attributes** — three bugs when
   parsing EXPLORE responses from V3 PLCs:
   - WSTRING dtype `0x15` not recognised (only `0x13` was checked)
   - Strings decoded as UTF-16-BE instead of UTF-8
   - BLOB skip logic misses an extra `0x00` byte that V3 PLCs insert before
     the VLQ length; WSTRING skip was also missing the data bytes

## Changes

### `s7/connection.py` — add `_collect_explore_frames()`

New method on `S7CommPlusConnection` that collects all continuation frames
after the first EXPLORE response frame.  Detection: a frame whose body
(after HMAC strip) is smaller than the reference size by more than 5 bytes
is the last fragment.
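
The stop rule described above can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `recv_body` is a hypothetical callable that returns the next frame body with its HMAC prefix already stripped, and `reference_size` is passed in by the caller because the description does not say how the reference is chosen.

```python
from typing import Callable

LAST_FRAGMENT_MARGIN = 5  # bytes; threshold quoted in the PR description


def collect_continuations(
    first_body: bytes,
    recv_body: Callable[[], bytes],  # hypothetical: next HMAC-stripped body
    reference_size: int,             # assumed to be supplied by the caller
) -> bytes:
    """Concatenate the first frame body with its continuation bodies."""
    parts = [first_body]
    while True:
        body = recv_body()
        parts.append(body)
        # A body noticeably shorter than a full frame is taken as the last.
        if len(body) < reference_size - LAST_FRAGMENT_MARGIN:
            break
    return b"".join(parts)
```

Note that a loop of this shape never terminates if the PLC keeps sending full-size bodies, which is why a frame-count or byte cap is worth adding around it.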

### `s7/_s7commplus_client.py`

- **`_parse_explore_datablocks_xml()`** — new parser that finds the `78 DA`
  zlib magic in the concatenated response, decompresses it, and extracts
  `Entity[@Id="Block" Header[@Type="DB"]]` nodes from the `PlcContentInfo`
  XML.  Falls back to `_parse_explore_datablocks()` if no zlib magic is
  found (backward compatible with V1/V2 PLCs).

- **`list_datablocks()`** — when `_session_key is not None` (V3 PLCs),
  builds the `0x8A11FFFF` EXPLORE payload, calls `_collect_explore_frames()`
  to gather all frames, then calls `_parse_explore_datablocks_xml()`.

- **`browse()`** — calls `_collect_explore_frames()` for each per-DB EXPLORE
  on V3 connections.

- **`_parse_explore_fields()`** — fixes for V3 PLCs:
  - Accept dtype `0x15` (WSTRING) in addition to `0x13` for name attributes
  - Decode name strings as UTF-8 (not UTF-16-BE)
  - BLOB skip: add 1 byte for the extra `0x00` before VLQ length
  - WSTRING skip: include `str_len` bytes after the VLQ
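
As a rough sketch of the XML path described above, and a possible starting point for a unit test with a synthetic blob: the element and attribute names used here (`Entity`, `Header`, `Id`, `Type`, `Name`, `Number`) are guesses based on the XPath in this description, and the real `PlcContentInfo` schema may differ. The function name is hypothetical as well.

```python
import xml.etree.ElementTree as ET
import zlib

ZLIB_MAGIC = b"\x78\xda"  # zlib header named in the PR description


def parse_datablocks_xml(response: bytes) -> list[dict]:
    """Find the zlib blob in a reassembled response and list DB entries."""
    pos = response.find(ZLIB_MAGIC)
    if pos < 0:
        return []  # the real code would fall back to the PObject parser
    try:
        # decompressobj() tolerates trailing bytes after the zlib stream
        xml_bytes = zlib.decompressobj().decompress(response[pos:])
        root = ET.fromstring(xml_bytes)
    except (zlib.error, ET.ParseError):
        return []
    blocks = []
    for entity in root.iter("Entity"):
        if entity.get("Id") != "Block":
            continue
        header = entity.find("Header")  # assumed child element
        if header is None or header.get("Type") != "DB":
            continue
        blocks.append(
            {"name": header.get("Name"), "number": int(header.get("Number", "0"))}
        )
    return blocks
```

Feeding it `b"prefix" + zlib.compress(xml, 9) + b"trailer"` with a hand-written XML document exercises the magic search, the decompression, and the DB filter without needing a PLC.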

## Tested on

- **PLC:** Siemens S7-1200 CPU 1212C DC/DC/DC
- **Firmware:** V4.5
- **Protocol:** V3 (no TLS, no password)

`list_datablocks()` now correctly returns `[{"name": "Data_block_1",
"number": 100, "rid": 2316173412}]` where it previously returned `[]`.

## Known limitation (documented, not fixed)

On FW V4.5, DB field definitions and I/Q/M tag names are stored in zlib
BLOBs with a Siemens preset dictionary (magic `78 7D`, FDICT flag set, dict
checksum `58 14 B0 3B`).  Python's `zlib.decompress()` fails with
`Z_NEED_DICT` (raised as `zlib.error`) because the preset dictionary is
embedded in TIA Portal and has not been published by Siemens.

As a result, `browse()` returns DB names and numbers but cannot enumerate
individual field names on V3 PLCs.  This is a protocol-level constraint,
not a code bug.
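
The effect of a preset dictionary can be reproduced with plain `zlib`. The sketch below uses a made-up stand-in dictionary and payload (not the Siemens ones) to show the FDICT bit in the header, the 4-byte dictionary ID that follows it (the Adler-32 of the dictionary, per RFC 1950), and the failure when the dictionary is missing.

```python
import zlib

preset = b"<Member Name= Datatype= />"  # stand-in dictionary, NOT the Siemens one
payload = b'<Member Name="Speed" Datatype="Int" />'

# Compress with a preset dictionary; this sets FDICT in the zlib header.
comp = zlib.compressobj(level=5, zdict=preset)
blob = comp.compress(payload) + comp.flush()

fdict_set = bool(blob[1] & 0x20)  # bit 5 of the FLG byte
dict_id = blob[2:6]               # Adler-32 of the dictionary, big-endian


def try_plain_decompress(data: bytes):
    """Return the error text if decompression without a dictionary fails."""
    try:
        zlib.decompress(data)
    except zlib.error as exc:
        return str(exc)
    return None


plain_error = try_plain_decompress(blob)

# With the matching dictionary the stream decodes normally.
restored = zlib.decompressobj(zdict=preset).decompress(blob)
```

Because the dictionary ID is just a checksum, the `58 14 B0 3B` value identifies the Siemens dictionary but cannot be used to recover it.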

…00 FW V4.5)

On V3 PLCs (FW >= V4.5) the EXPLORE response for RID 0x8A11FFFF spans
multiple TPKT frames and uses a zlib-compressed PlcContentInfo XML format
instead of the PObject tree expected by _parse_explore_datablocks().
The existing reassemble=True path does not strip V3 HMAC prefixes from
continuation frames, so list_datablocks() returned [] on these PLCs.

Changes:

connection.py:
- Add collect_explore_frames(): collects V3 multi-fragment EXPLORE
  responses by receiving continuation frames and stripping their HMAC
  prefix, stopping when a shorter-than-reference frame is detected.

_s7commplus_client.py:
- Add _build_explore_payload_v3(): VLQ-encoded EXPLORE payload for
  V3 PLCs (required format for 0x8A11FFFF and per-DB RID explores).
- Add _parse_explore_datablocks_xml(): decompresses the zlib PlcContentInfo
  XML blob and extracts Entity[@id="Block"][@type="DB"] entries; falls back
  to _parse_explore_datablocks() when no zlib magic is found.
- list_datablocks(): when protocol_version >= V3, use _build_explore_payload_v3
  + collect_explore_frames + _parse_explore_datablocks_xml.
- browse(): when protocol_version >= V3, use V3 payload builder and frame
  collector for each per-DB EXPLORE.
- _parse_explore_fields(): four fixes for V3 PLCs:
  * Accept WSTRING dtype 0x15 in addition to 0x13 for name attributes.
  * Auto-detect encoding: UTF-8 (V3, no null bytes) vs UTF-16-BE (V1/V2).
  * BLOB skip: account for the extra 0x00 byte V3 PLCs insert before VLQ len.
  * WSTRING skip: advance past string data bytes (was only skipping VLQ).

Tested on S7-1200 CPU 1212C DC/DC/DC, firmware V4.5 (V3 protocol, no TLS):
- list_datablocks() now returns [{"name": "Data_block_1", "number": 100,
  "rid": 2316173412}] where it previously returned [].
- The PlcContentInfo XML (6131 bytes after decompression) is correctly
  parsed from a 3-frame response (first 946-byte frame + two continuations).

Known limitation: on FW V4.5, DB field definitions and I/Q/M tag names are
stored in zlib BLOBs with a Siemens preset dictionary (magic 78 7D, FDICT
flag set). Python zlib.decompress() fails with Z_NEED_DICT (raised as
zlib.error). browse() returns DB names/numbers but cannot enumerate
individual field names on V3 PLCs.

@gijzelaerr (Owner) left a comment

Review Summary

This PR adds V3 (S7-1200 FW V4.5) support for list_datablocks() and browse() — three distinct fixes for multi-frame collection, zlib-compressed XML parsing, and _parse_explore_fields V3 attribute encoding. Real-hardware tested. No malicious code.

Issues to address:

1. No unit tests. This is the biggest gap. The XML parser, the multi-frame collector, and the _parse_explore_fields fixes all have zero test coverage. At minimum: a test for _parse_explore_datablocks_xml with a synthetic zlib-compressed XML blob, and a test for the WSTRING/BLOB skip fixes.

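2. XML entity handling. ET.fromstring() uses the stdlib parser, which does not fetch external entities, so classic XXE is not the main concern; what remains is entity-expansion (billion-laughs style) input, whose handling depends on the underlying Expat version. Since the XML comes from a PLC (not user input), the risk is low, but for defense-in-depth consider defusedxml, or at least reject documents that carry a DOCTYPE. Note that resolve_entities=False is an lxml parser option and is not accepted by the stdlib ET.XMLParser.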

3. collect_explore_frames fragment detection is fragile. The "body shorter than reference by >5 bytes = last fragment" heuristic assumes all full-size frames are within 5 bytes of each other. If the PLC sends a legitimately short intermediate frame, it would be misdetected as the last. A more robust approach: use the V3 protocol's own termination signal (if available) or at least add a max-frame-count guard to prevent infinite loops.

4. collect_explore_frames has no size/count limits. A malformed V3 response could drive unbounded memory allocation. Add caps similar to _recv_reassembled_payload (16 MiB / 4096 fragments).

5. _build_explore_payload_v3 uses VLQ for ExploreId. The existing _build_explore_request was just fixed (in #749) to use fixed UInt32 for ExploreId because that's what real PLCs expect. Using VLQ here may work for V3 but diverges from the corrected convention — is this intentional?

6. BLOB skip offset += 1 for the extra 0x00 byte is V3-specific but runs unconditionally in _parse_explore_fields. If a V1/V2 response has a BLOB attribute, this would skip one byte too many. Guard it behind a V3 check.

7. Async parity. S7CommPlusAsyncClient.list_datablocks() and browse() are not updated — async callers on V3 PLCs still get empty results.

Positive notes:

  • Uses xml.etree.ElementTree (safe, stdlib)
  • V1/V2 fallback path preserved correctly
  • The zlib magic detection (78 DA) is sound
  • No legacy snap7/ files touched

Not ready to merge — needs tests, size caps, and the BLOB skip V3 guard.

Comment thread s7/_s7commplus_client.py
    xml_bytes = zlib.decompress(response[zlib_pos:])
except zlib.error as exc:
    logger.debug(f"_parse_explore_datablocks_xml: zlib error {exc}")
    return []

ET.fromstring() uses the default stdlib XML parser. It does not resolve external entities, so XXE is low-risk here, but entity-expansion limits depend on the underlying Expat version. While the XML comes from a PLC, for defense-in-depth consider rejecting DOCTYPE declarations or using defusedxml; at minimum, add a comment noting the assumption.
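
One low-cost, stdlib-only way to act on this is to refuse any document that declares a DTD before handing it to the parser. The sketch below is a coarse byte-level check and assumes the decompressed blob is UTF-8 or ASCII-compatible and that legitimate PLC payloads never need a DOCTYPE; `parse_plc_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_plc_xml(xml_bytes: bytes) -> ET.Element:
    """Parse PLC-supplied XML, rejecting DTD and entity declarations."""
    # Coarse pre-check: entity definitions can only appear inside a DTD,
    # so refusing DOCTYPE/ENTITY markers rules out entity-expansion input.
    if b"<!DOCTYPE" in xml_bytes or b"<!ENTITY" in xml_bytes:
        raise ValueError("DTD/entity declarations are not accepted")
    return ET.fromstring(xml_bytes)
```

A caller would treat `ValueError` the same way as a parse failure and return an empty result.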

Comment thread s7/connection.py
# V3 non-TLS: strip the HMAC prefix ([hash_len][hash_bytes])
if self._protocol_version >= ProtocolVersion.V3 and len(body) > 33:
    hash_len = body[0]
    body = body[1 + hash_len :]

No size or fragment-count cap. A malformed V3 response could loop indefinitely and allocate unbounded memory. Add limits similar to _recv_reassembled_payload (_MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES).

Also, the "body shorter than reference by >5 bytes" heuristic is fragile — if the PLC sends a legitimately shorter intermediate frame, collection stops early and silently truncates the response.
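
A minimal sketch of what such caps could look like, assuming a hypothetical `recv_body` callable that yields HMAC-stripped frame bodies and an `is_last` predicate standing in for whatever termination rule is chosen. The limit values mirror the ones quoted in this review; the names are illustrative and not the library's constants.

```python
from typing import Callable

MAX_FRAGMENTS = 4096          # fragment cap quoted in the review
MAX_BYTES = 16 * 1024 * 1024  # 16 MiB cap quoted in the review


class ExploreReassemblyError(Exception):
    """Raised when a multi-frame response exceeds the configured limits."""


def collect_bounded(
    recv_body: Callable[[], bytes],     # hypothetical frame source
    is_last: Callable[[bytes], bool],   # hypothetical termination rule
    max_fragments: int = MAX_FRAGMENTS,
    max_bytes: int = MAX_BYTES,
) -> bytes:
    parts = []
    total = 0
    for _ in range(max_fragments):
        body = recv_body()
        total += len(body)
        if total > max_bytes:
            raise ExploreReassemblyError("response exceeds byte limit")
        parts.append(body)
        if is_last(body):
            return b"".join(parts)
    # Fail loudly instead of looping forever or truncating silently.
    raise ExploreReassemblyError("no final fragment within fragment limit")
```

Raising on either limit keeps a truncated response from being parsed as if it were complete.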

Comment thread s7/_s7commplus_client.py
break
count, consumed = _vlq32(response, offset)
offset += consumed
offset += count

The offset += 1 for the extra 0x00 byte before BLOB VLQ length is V3-specific, but this code runs for all protocol versions. If a V1/V2 EXPLORE response contains a BLOB attribute, this will skip one byte too many and corrupt all subsequent parsing. Guard with a V3 check, or pass the protocol version into this function.
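
To illustrate the guard, here is a small sketch with the protocol version passed in as a flag. The VLQ decoder is a simplified stand-in for `_vlq32` and assumes 7 payload bits per byte, most significant group first, with the high bit marking continuation; `skip_blob` and `is_v3` are hypothetical names.

```python
def vlq32(data: bytes, offset: int) -> tuple[int, int]:
    """Decode an unsigned VLQ; returns (value, bytes_consumed).

    Assumed encoding: 7 payload bits per byte, most significant group
    first, high bit set on every byte except the last.
    """
    value = 0
    consumed = 0
    while True:
        byte = data[offset + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, consumed


def skip_blob(data: bytes, offset: int, is_v3: bool) -> int:
    """Return the offset of the first byte after a BLOB value."""
    if is_v3:
        offset += 1  # extra 0x00 that V3 PLCs insert before the VLQ length
    count, consumed = vlq32(data, offset)
    return offset + consumed + count
```

With a V1/V2-style value such as `b"\x03abc"`, calling `skip_blob(..., is_v3=True)` reads `a` as the length byte and lands far past the value, which is the corruption described above.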
