Regex-Map JSON Smartmodule

A SmartModule that reads a JSON record, looks up values, runs regex operations on them, and writes the results back into the record. It is a map-type SmartModule: each input record produces one output record.

Input Record

A JSON object:

{
  "customer": {
    "first": "Abby",
    "last": "Hardy",
    "ssn": "123-45-6789"
  },
  "description": "Highlights: 43 Entity: Draft Documents [Encased string - (data)] (<a href='https://example.com/doc1/182031340621?pdf_header=&de_seq_num=44&caseid=456177'>9</a>)"
}

Transformation spec

The transformation spec takes two types of regex operations: capture and replace.

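A regex capture retrieves a substring from a JSON value. As the example below shows, it takes the parameters regex (the pattern to match), target (the path of the value to read), and output (the path where the captured substring is written).

A regex replace replaces substrings in a JSON value. As the example below shows, it takes the parameters regex (the pattern to match), target (the path of the value to modify), and with (the replacement text).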

In this example, we'll use the following transformation spec:

transforms:
  - uses: infinyon-labs/regex-map-json@0.1.2
    with:
      spec:
        - capture:
            regex: "(?i)Highlights:\\s+(\\w+)\\b"
            target: "/description"
            output: "/parsed/highlights"        
        - capture: 
            regex: "(?i)Entity:\\s+([\\w,\\s\\.\\']*\\S)\\s*\\["
            target: "/description"
            output: "/parsed/entity"
        - capture:
            regex: "href='([^']+)'"
            target: "/description"
            output: "/parsed/doc-link"
        - replace:
            regex: "\\d{3}-\\d{2}-\\d{4}"
            target: "/customer/ssn"
            with: "***-**-****"
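To make the effect of this spec concrete, here is a minimal Python sketch that applies the same four operations to the input record. It is not the SmartModule's implementation: the helper names (`get_path`, `set_path`, `apply_spec`) are hypothetical, the path handling covers only simple `/a/b` paths, and it assumes Python's `re` module treats these particular patterns the same way the SmartModule's regex engine does.

```python
import json
import re

def get_path(record, pointer):
    """Return the value at a simple "/a/b" path, or None if any key is missing."""
    node = record
    for key in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def set_path(record, pointer, value):
    """Write value at a simple "/a/b" path, creating intermediate objects."""
    keys = pointer.strip("/").split("/")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def apply_spec(record, spec):
    """Apply capture and replace operations in order (illustrative only)."""
    for step in spec:
        if "capture" in step:
            op = step["capture"]
            text = get_path(record, op["target"])
            if isinstance(text, str):
                match = re.search(op["regex"], text)
                if match:
                    # Store the first capture group at the output path.
                    set_path(record, op["output"], match.group(1))
        elif "replace" in step:
            op = step["replace"]
            text = get_path(record, op["target"])
            if isinstance(text, str):
                # The lambda keeps the replacement text literal.
                masked = re.sub(op["regex"], lambda _m: op["with"], text)
                set_path(record, op["target"], masked)
    return record

record = {
    "customer": {"first": "Abby", "last": "Hardy", "ssn": "123-45-6789"},
    "description": "Highlights: 43 Entity: Draft Documents [Encased string - (data)] "
                   "(<a href='https://example.com/doc1/182031340621"
                   "?pdf_header=&de_seq_num=44&caseid=456177'>9</a>)",
}

# Same operations as the YAML spec; raw strings replace the YAML "\\" escapes.
spec = [
    {"capture": {"regex": r"(?i)Highlights:\s+(\w+)\b",
                 "target": "/description", "output": "/parsed/highlights"}},
    {"capture": {"regex": r"(?i)Entity:\s+([\w,\s\.\']*\S)\s*\[",
                 "target": "/description", "output": "/parsed/entity"}},
    {"capture": {"regex": r"href='([^']+)'",
                 "target": "/description", "output": "/parsed/doc-link"}},
    {"replace": {"regex": r"\d{3}-\d{2}-\d{4}",
                 "target": "/customer/ssn", "with": "***-**-****"}},
]

result = apply_spec(record, spec)
print(json.dumps(result, indent=2))
```

Each capture stores only the first capture group, which is why the entity pattern ends its group with `\S`: it trims the trailing space before the `[`.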

Output Record

A JSON object with a new parsed tree and a masked ssn value:

{
  "customer": {
    "first": "Abby",
    "last": "Hardy",
    "ssn": "***-**-****"
  },
  "description": "Highlights: 43 Entity: Draft Documents [Encased string - (data)] (<a href='https://example.com/doc1/182031340621?pdf_header=&de_seq_num=44&caseid=456177'>9</a>)",
  "parsed": {
    "doc-link": "https://example.com/doc1/182031340621?pdf_header=&de_seq_num=44&caseid=456177",
    "entity": "Draft Documents",
    "highlights": "43"
  }
}

Note that no result is generated if the target key cannot be found or the regex capture operation returns no matches.
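The two cases in the note above can be illustrated with a small Python sketch. The `capture` helper is hypothetical and looks up a flat key rather than a path. It only shows when a capture has nothing to return; how the SmartModule itself handles such a record is not specified here.

```python
import re

def capture(record, target, regex):
    """Return the first capture group, or None if the key is missing or nothing matches."""
    text = record.get(target)
    if not isinstance(text, str):
        return None  # target key not found
    match = re.search(regex, text)
    return match.group(1) if match else None  # None when the regex does not match

record = {"description": "Highlights: 43 Entity: Draft Documents [x]"}

hit = capture(record, "description", r"(?i)Highlights:\s+(\w+)\b")   # "43"
no_match = capture(record, "description", r"Priority:\s+(\w+)")      # None
no_key = capture(record, "summary", r"(?i)Highlights:\s+(\w+)\b")    # None

print(hit, no_match, no_key)
```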

Build binary

Use the smdk command-line tool to build:

smdk build

Inline Test

Use smdk to test:

smdk test --file ./test-data/input.json --raw -e spec='[{"capture": {"regex": "(?i)Highlights:\\s+(\\w+)\\b", "target": "/description", "output": "/parsed/highlights"}}, {"replace": {"regex": "\\d{3}-\\d{2}-\\d{4}", "target": "/customer/ssn", "with": "***-**-****" }}]'
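Hand-writing the JSON for `spec` is error-prone because every regex backslash must be doubled. One way to avoid that, sketched here in Python as an assumption about workflow rather than part of the project, is to define the spec as data and let `json.dumps` and `shlex.quote` produce the escaped argument. The script only prints the command; it does not run smdk.

```python
import json
import shlex

# Regexes written as raw strings; json.dumps doubles the backslashes.
spec = [
    {"capture": {"regex": r"(?i)Highlights:\s+(\w+)\b",
                 "target": "/description",
                 "output": "/parsed/highlights"}},
    {"replace": {"regex": r"\d{3}-\d{2}-\d{4}",
                 "target": "/customer/ssn",
                 "with": "***-**-****"}},
]

spec_json = json.dumps(spec)

# shlex.quote wraps the argument in single quotes so the shell passes it verbatim.
command = ("smdk test --file ./test-data/input.json --raw -e "
           + shlex.quote("spec=" + spec_json))
print(command)
```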

Cluster Test

Use smdk to load the SmartModule to the cluster:

smdk load 

Test using transform.yaml file:

smdk test --file ./test-data/input.json --raw --transforms-file ./test-data/transform.yaml

Cargo Compatible

Build & Test

cargo build
cargo test
