fix: browse() and list_datablocks() for V3 multi-frame EXPLORE (S7-1200 FW V4.5)#753
Conversation
tommasofaedo
commented
Jun 19, 2026
…00 FW V4.5) On V3 PLCs (FW >= V4.5) the EXPLORE response for RID 0x8A11FFFF spans multiple TPKT frames and uses a zlib-compressed PlcContentInfo XML format instead of the PObject tree expected by _parse_explore_datablocks(). The existing reassemble=True path does not strip V3 HMAC prefixes from continuation frames, so list_datablocks() returned [] on these PLCs. Changes: connection.py: - Add collect_explore_frames(): collects V3 multi-fragment EXPLORE responses by receiving continuation frames and stripping their HMAC prefix, stopping when a shorter-than-reference frame is detected. _s7commplus_client.py: - Add _build_explore_payload_v3(): VLQ-encoded EXPLORE payload for V3 PLCs (required format for 0x8A11FFFF and per-DB RID explores). - Add _parse_explore_datablocks_xml(): decompresses the zlib PlcContentInfo XML blob and extracts Entity[@id="Block"][@type="DB"] entries; falls back to _parse_explore_datablocks() when no zlib magic is found. - list_datablocks(): when protocol_version >= V3, use _build_explore_payload_v3 + collect_explore_frames + _parse_explore_datablocks_xml. - browse(): when protocol_version >= V3, use V3 payload builder and frame collector for each per-DB EXPLORE. - _parse_explore_fields(): three fixes for V3 PLCs: * Accept WSTRING dtype 0x15 in addition to 0x13 for name attributes. * Auto-detect encoding: UTF-8 (V3, no null bytes) vs UTF-16-BE (V1/V2). * BLOB skip: account for the extra 0x00 byte V3 PLCs insert before VLQ len. * WSTRING skip: advance past string data bytes (was only skipping VLQ). Tested on S7-1200 CPU 1212C DC/DC/DC, firmware V4.5 (V3 protocol, no TLS): - list_datablocks() now returns [{"name": "Data_block_1", "number": 100, "rid": 2316173412}] where it previously returned []. - The PlcContentInfo XML (6131 bytes after decompression) is correctly parsed from a 3-frame response (first 946-byte frame + two continuations). Known limitation: on FW V4.5, DB field definitions and I/Q/M tag names are stored in zlib BLOBs with a Siemens preset dictionary (magic 78 7D, FDICT flag set). Python zlib.decompress() returns Z_NEED_DICT. browse() returns DB names/numbers but cannot enumerate individual field names on V3 PLCs.
gijzelaerr
left a comment
There was a problem hiding this comment.
Review Summary
This PR adds V3 (S7-1200 FW V4.5) support for list_datablocks() and browse() — three distinct fixes for multi-frame collection, zlib-compressed XML parsing, and _parse_explore_fields V3 attribute encoding. Real-hardware tested. No malicious code.
Issues to address:
1. No unit tests. This is the biggest gap. The XML parser, the multi-frame collector, and the _parse_explore_fields fixes all have zero test coverage. At minimum: a test for _parse_explore_datablocks_xml with a synthetic zlib-compressed XML blob, and a test for the WSTRING/BLOB skip fixes.
2. XML entity expansion (XXE). ET.fromstring() uses the default parser which resolves external entities. Since the XML comes from a PLC (not user input), the risk is low, but for defense-in-depth consider defusedxml or at least ET.XMLParser(resolve_entities=False) — a malicious response could trigger entity expansion.
3. collect_explore_frames fragment detection is fragile. The "body shorter than reference by >5 bytes = last fragment" heuristic assumes all full-size frames are within 5 bytes of each other. If the PLC sends a legitimately short intermediate frame, it would be misdetected as the last. A more robust approach: use the V3 protocol's own termination signal (if available) or at least add a max-frame-count guard to prevent infinite loops.
4. collect_explore_frames has no size/count limits. A malformed V3 response could drive unbounded memory allocation. Add caps similar to _recv_reassembled_payload (16 MiB / 4096 fragments).
5. _build_explore_payload_v3 uses VLQ for ExploreId. The existing _build_explore_request was just fixed (in #749) to use fixed UInt32 for ExploreId because that's what real PLCs expect. Using VLQ here may work for V3 but diverges from the corrected convention — is this intentional?
6. BLOB skip offset += 1 for the extra 0x00 byte is V3-specific but runs unconditionally in _parse_explore_fields. If a V1/V2 response has a BLOB attribute, this would skip one byte too many. Guard it behind a V3 check.
7. Async parity. S7CommPlusAsyncClient.list_datablocks() and browse() are not updated — async callers on V3 PLCs still get empty results.
Positive notes:
- Uses
xml.etree.ElementTree(safe, stdlib) - V1/V2 fallback path preserved correctly
- The zlib magic detection (
78 DA) is sound - No legacy
snap7/files touched
Not ready to merge — needs tests, size caps, and the BLOB skip V3 guard.
| xml_bytes = zlib.decompress(response[zlib_pos:]) | ||
| except zlib.error as exc: | ||
| logger.debug(f"_parse_explore_datablocks_xml: zlib error {exc}") | ||
| return [] |
There was a problem hiding this comment.
ET.fromstring() uses the default XML parser which resolves external entities. While the XML comes from a PLC, for defense-in-depth consider at minimum disabling entity resolution. Python 3.8+ ET.fromstring is safe against XXE by default (entities are not expanded), so this is low-risk — but worth a comment noting the assumption.
| # V3 non-TLS: strip the HMAC prefix ([hash_len][hash_bytes]) | ||
| if self._protocol_version >= ProtocolVersion.V3 and len(body) > 33: | ||
| hash_len = body[0] | ||
| body = body[1 + hash_len :] |
There was a problem hiding this comment.
No size or fragment-count cap. A malformed V3 response could loop indefinitely and allocate unbounded memory. Add limits similar to _recv_reassembled_payload (_MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES).
Also, the "body shorter than reference by >5 bytes" heuristic is fragile — if the PLC sends a legitimately shorter intermediate frame, collection stops early and silently truncates the response.
| break | ||
| count, consumed = _vlq32(response, offset) | ||
| offset += consumed | ||
| offset += count |
There was a problem hiding this comment.
The offset += 1 for the extra 0x00 byte before BLOB VLQ length is V3-specific, but this code runs for all protocol versions. If a V1/V2 EXPLORE response contains a BLOB attribute, this will skip one byte too many and corrupt all subsequent parsing. Guard with a V3 check, or pass the protocol version into this function.