Introduction
This article requires moderate technical skill and comfort with command-line tools and basic cryptographic concepts.
Events registered in DataTrails with the Merkle Log proof mechanism come with a very high standard of trustworthiness because they can be verified easily and with certainty using open-source, independent tooling away from the DataTrails platform itself. This makes the Event evidence demonstrably tamper-proof and transparent.
This article walks you through the main steps of verifying an Event against its public, transparent Merkle Log records using DataTrails tools. If you want to build your own tooling or use a third-party verifier instead, please check out our GitHub repositories or get in touch with us to discuss professional services options.
Shortcut: Navigating Merkle Log data in the DataTrails web UI
Most of this article tells you how to interrogate and verify proofs of Events from first principles using your own independent code. However, if you're not ready to do that yet, the DataTrails web UI makes it easy to navigate between Event records and Merkle Log information, and shortcuts many of the steps below, including finding the log data and calculating the Event and leaf hashes for you. To access this, when looking at any Event in the UI, click the link in the 'Merkle Log Entry' field:
Once there, you have all the paths and data you need to verify the Event. To get back to the Event record, simply click on the 'Event Identity' field at the top of the Event Details table.
That's all there is to it, and the UI continues to evolve, adding more detail and exploration options to give confidence in the long-term integrity of all your Event information.
But if you're still keen to know the details and/or do it yourself, the following is a step-by-step guide on how it all works.
Step 1: Gather required information
Step 1A: Get the JSON of the Event you're interested in
Each individual Event registered with DataTrails has its own proof derived from the contents of the Event record, so you'll need the Event contents. There are many ways to find the Event you're interested in, but assuming you know or have found the Event Identity, you can simply download it with curl or similar.
curl -v -X GET \
-H "@$HOME/.datatrails/bearer-token.txt" \
"https://app.datatrails.ai/archivst/v2/$EVENT_ID"
Step 1B: Find and download its log data
NOTE: These instructions tell you how to locate the blobs for your own permissioned Events. If you want to verify a public Event (eg from an Instaproof result) then you need to use the Tenant ID: tenant/6ea5cd00-c711-3649-6914-7b125928bbb4
Proof data for Merkle Log Events is stored on publicly readable Azure blob storage. For ease of handling, the storage is split into separate blobs by tenancy and again whenever certain size limits are reached, so you'll need to work out which log file contains the proof of your Event.
The first thing to do is to find the Massif store URL for your Tenant, which is simply 'app.datatrails.ai/verifiabledata/merklelogs/v1/mmrs/${tenantId}/0/massifs/'.
So given a Tenant ID of 'tenant/72dc8b10-dde5-43fe-99bc-16a37fd98c6a' the URL would be: 'https://app.datatrails.ai/verifiabledata/merklelogs/v1/mmrs/tenant/72dc8b10-dde5-43fe-99bc-16a37fd98c6a/0/massifs/'
Next you'll need to find out which log file your Event is 'in'. File numbering is zero-based and each file can hold 16,383 nodes (that's 8,192 leaves plus internal nodes and local peaks), so the easiest way to find the log file is to take the whole part of dividing the MMR index of the Event record by 16,383. That gives you the number of the log file you need, which is then left-padded with zeros to 16 digits.
For instance, given an MMRIndex of 29,342 the log number would be Math.floor(29342/16383) = 1 and the file name would be '0000000000000001.log'.
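If you'd rather compute this in code, here's a minimal TypeScript sketch of the same calculation (the massifLogURL helper name is just for illustration; the tenant ID and MMR index are the example values used throughout this article):

// Work out which massif log file holds a given MMR index and build its URL
const MASSIF_NODE_COUNT = 16383 // 8,192 leaves plus internal nodes and local peaks

const massifLogURL = (tenantId: string, mmrIndex: number): string => {
  // Whole part of dividing the MMR index by the nodes-per-massif gives the file number
  const massifIndex = Math.floor(mmrIndex / MASSIF_NODE_COUNT)
  // File names are the massif number left-padded with zeros to 16 digits
  const fileName = massifIndex.toString().padStart(16, '0') + '.log'
  return `https://app.datatrails.ai/verifiabledata/merklelogs/v1/mmrs/${tenantId}/0/massifs/${fileName}`
}

// Example: MMR index 29,342 in the tenant used in this article
console.log(massifLogURL('tenant/72dc8b10-dde5-43fe-99bc-16a37fd98c6a', 29342))
// -> .../massifs/0000000000000001.log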
To fetch the log data simply add the log file name to the URL and fetch with:
curl -H "x-ms-blob-type: BlockBlob" -H "x-ms-version: 2019-12-12" https://app.datatrails.ai/verifiabledata/merklelogs/v1/mmrs/tenant/72dc8b10-dde5-43fe-99bc-16a37fd98c6a/0/massifs/0000000000000001.log
Step 1C: Find and download confirmation signatures
DataTrails maintains a signature over the root state of the MMR data (called a 'seal') in public storage to prevent spoofing while enabling wide distribution and replication of the public log data. This is stored in a companion file to the logs with the same naming scheme, so if your log file name is '0000000000000001.log' then your seal file name will be '0000000000000001.sth'.
This corresponding file name applies even when the massif is signed multiple times as it grows: the seal for the most recent blob is continually refreshed until the blob is full, so its signature and sealed mmrSize advance with the massif in the log. Once full, a massif is never changed again, and so its log file and seal remain fixed and reliable forever after that.
Fetch it from:
curl -H "x-ms-blob-type: BlockBlob" -H "x-ms-version: 2019-12-12" https://app.datatrails.ai/verifiabledata/merklelogs/v1/mmrs/tenant/72dc8b10-dde5-43fe-99bc-16a37fd98c6a/0/massifseals/0000000000000001.sth
Step 2: Compute the canonical hash of the Event
Due to DataTrails' highly versatile attribute handling and JSON's fundamentally unordered nature, we have to arrange for a reliable canonical encoding that ensures Events can be compared and behave predictably under cryptographic processing.
Step 2A: Strip the Event back to its core content
The Event record returned by the DataTrails API contains metadata entries that may be updated from time to time and therefore cannot form part of the long term immutability promise of the platform. The fields that are rendered immutable are:
- 'identity'
- 'event_attributes'
- 'asset_attributes'
- 'operation'
- 'behaviour'
- 'timestamp_declared'
- 'timestamp_accepted'
- 'timestamp_committed'
- 'principal_accepted'
- 'principal_declared'
- 'tenant_identity'
All other fields should be stripped from the JSON, as in this sample TypeScript code:
const v3RequiredFields = [
  'identity',
  'event_attributes',
  'asset_attributes',
  'operation',
  'behaviour',
  'timestamp_declared',
  'timestamp_accepted',
  'timestamp_committed',
  'principal_accepted',
  'principal_declared',
  'tenant_identity',
]

const ensureV3RequiredFields = (eventJSON: any): any => {
  const redactedEvent: any = {}
  // Strip the Event back to _exactly_ the required fieldset
  for (const field of v3RequiredFields) {
    if (!(field in eventJSON)) {
      // If any required field is missing we bail out
      return null
    }
    if (field === 'identity') {
      // Normalise the Event Identity for public attestations
      redactedEvent[field] = eventJSON[field].replace('public', '')
    } else {
      redactedEvent[field] = eventJSON[field]
    }
  }
  return redactedEvent
}
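For example, assuming the Event JSON fetched in Step 1A has been parsed into eventJSON:

// Reduce the fetched Event to the immutable fieldset used for hashing
const redactedEvent = ensureV3RequiredFields(eventJSON)
if (redactedEvent === null) {
  throw new Error('Event JSON is missing one or more required fields')
}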
Step 2B: Bencode the stripped Event
Bencode is a simple encoding for reliably transmitting loosely-structured data (like JSON dictionaries) between clients. DataTrails uses bencoding to prepare the Event for cryptographic processing.
import * as bencodec from 'bencodec' // the 'bencodec' npm package, or any equivalent bencode library

const bencodedBuffer = bencodec.encode(redactedEvent, { stringify: false })
Step 2C: Hash the bencoded bytes
The cryptographic hash of the Event (referred to elsewhere in DataTrails tools and documentation as the 'Simple Hash V3' hash) is then simply SHA256(bencodedBuffer).
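For example, using Node's built-in crypto module and the bencodedBuffer from Step 2B:

import { createHash } from 'crypto'

// The 'Simple Hash V3' hash of the Event is the SHA-256 of the bencoded bytes
const eventHash = createHash('sha256').update(bencodedBuffer).digest('hex')
console.log(eventHash)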
Step 3: Compute the Merkle tree leaf hash of the Event
Simply knowing the hash of the Event isn't quite enough to prove that it is authentic. To do that we need to verify it in the context of all the other Events in the log, which gives greater confidence in important details such as timestamps and ordering, and protects against equivocation.
To achieve this, the bencoded Event data is cryptographically bound to its 'idTimestamp'. First, prepare the idTimestamp by:
- Reading the idtimestamp from eventJSON.merklelog_entry.commit.idtimestamp (this is a hex string)
- Stripping the leading "01"
- Converting the remainder to bytes, big endian
This can be easily achieved in TypeScript with:
const idts = eventJSON.merklelog_entry.commit.idtimestamp
const idTimestamp = Buffer.from(idts.slice(2), 'hex') // strip the leading "01", then hex to bytes
The leaf entry that goes into the Merkle log is then calculated as:
SHA256(BYTE(0x00) || BYTES(idTimestamp) || BENCODE(redactedEvent))
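A minimal TypeScript sketch of this calculation, reusing idTimestamp and bencodedBuffer from the previous steps:

import { createHash } from 'crypto'

// Leaf hash = SHA256( 0x00 || idTimestamp bytes || bencoded Event )
const leafHash = createHash('sha256')
  .update(Buffer.from([0x00]))   // the leading 0x00 byte from the formula above
  .update(idTimestamp)           // the Event's idtimestamp bytes
  .update(bencodedBuffer)        // the bencoded, redacted Event
  .digest('hex')

console.log(leafHash)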
Step 4: Confirm integrity of the Merkle Log data
Verify the signature in the seal blob downloaded in Step 1C using the DataTrails seal signing key:
X.509/PEM format:
-----BEGIN PUBLIC KEY-----
MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEA861WiJFuwOruvgCHmoGCEoNy4rxQU+T
MV0TIIFE84sA5106vKerlKVHiYEE04whnDwgJoczIAMusJAym7l0/4WMetVqldGs
Z+WDlwOgTBrz4CFAjQABe5P6dzawS2By
-----END PUBLIC KEY-----
NOTE: This key is refreshed from time to time, check the DataTrails web UI for the up-to-date current key.
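As a starting point, here is a minimal TypeScript sketch that loads this key with Node's crypto module and checks an ECDSA P-384 signature over some payload bytes. It is illustrative only: extracting the signed payload and signature bytes from the .sth seal blob is not covered here, so sealPayload, sealSignature and verifySeal are hypothetical placeholders.

import { createPublicKey, verify } from 'crypto'

const sealSigningKeyPem = `-----BEGIN PUBLIC KEY-----
MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEA861WiJFuwOruvgCHmoGCEoNy4rxQU+T
MV0TIIFE84sA5106vKerlKVHiYEE04whnDwgJoczIAMusJAym7l0/4WMetVqldGs
Z+WDlwOgTBrz4CFAjQABe5P6dzawS2By
-----END PUBLIC KEY-----`

const sealKey = createPublicKey(sealSigningKeyPem)

// sealPayload and sealSignature are placeholders: extract them from the .sth
// blob according to the seal format before calling verify().
// P-384 keys are conventionally paired with SHA-384; adjust if the seal specifies otherwise.
const verifySeal = (sealPayload: Buffer, sealSignature: Buffer): boolean =>
  verify('sha384', sealPayload, sealKey, sealSignature)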
Step 5: Confirm presence of the Event in the log
NOTE: More advanced topics on Inclusion Proofs, Range Proofs, and Consistency Proofs will follow in further articles with much deeper information on the MMR blob file format that will enable you to reconstruct the trees, perform full or sampled audits, and generate complete proofs. This step simply verifies that your chosen Event is in the log.
By the time you get here all is simple! Take the leaf hash that you calculated in Step 3 and ensure that it appears in the downloaded log. The log is regularly structured and aligned to 32-byte boundaries, making it very easy to parse with standard tools. For instance, given the leaf calculated in Step 3:
export LEAF=71bde053fb63458e1333b3badfda19e615951e68c4aa09d1cac0579ab3e8c85c
xxd -c 32 testlog | sed -e 's/^.\{10\}//' -e 's/.\{32\}$//' -e 's/ //g' | grep -i $LEAF
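If you'd rather stay in TypeScript, the following sketch does the same search, scanning the downloaded log (saved here as 'testlog') in 32-byte aligned chunks:

import { readFileSync } from 'fs'

const LEAF = '71bde053fb63458e1333b3badfda19e615951e68c4aa09d1cac0579ab3e8c85c'

const log = readFileSync('testlog')
let found = false
// Walk the log in 32-byte aligned steps and compare each entry to the leaf hash
for (let offset = 0; offset + 32 <= log.length; offset += 32) {
  if (log.subarray(offset, offset + 32).toString('hex') === LEAF.toLowerCase()) {
    console.log(`Leaf found at offset ${offset}`)
    found = true
    break
  }
}
if (!found) {
  console.log('Leaf not found in this log file')
}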
If you see your hash echoed back at you then all is good!
Step 6: Done
Congratulations! You have just verified that your Event data is underpinned by a verifiable immutable audit trail. If you'd like to know more and go deeper, please don't hesitate to get in touch with the DataTrails team.