Designing an Object Repository for a Low-Code Platform

June 18, 2019

Designing an Object Repository for a Low-Code Platform

When two vendors describe their product as a "low-code platform," they often mean two completely different things. One is a screen builder bolted to a database; the other is a data integration engine with a UI on top. They share the marketing page and almost nothing else. The difference becomes obvious the moment a customer asks the same question of both: "Can I rename a field from mobile to phone_e164 and have my downstream Salesforce sync update?"

The platform that can answer "yes, in one place, without a code change" has an object repository. The platform that can't, doesn't.

This post is about building that object repository. I'm writing as the engineer who shipped the first version with one teammate over a few months. Nothing about it was novel. The interesting part was learning, the hard way, where the abstractions had to live and where they couldn't.

The job to be done

Before any architecture, the brief was concrete. A non-engineer at a customer should be able to:

  1. Connect to their existing customer database and a webhook from a vendor.
  2. Define an object — say, Customer — with the fields they care about.
  3. Map each field to either a column in their database, a path in the webhook payload, or a formula computed from other fields.
  4. Get a working list view, detail page, and edit form for free.
  5. Have the platform write changes back to the right place when they edited a row.

Everything in our object repository was in service of those five steps. If a feature didn't make at least one of them possible — or didn't unblock another feature that did — it didn't ship.

That meant the repository had to answer four questions for every object on the platform:

  1. What fields does this object have, and what is the meaning of each one?
  2. Where does each field originate — a database column, a JSON path, a formula?
  3. What invariants hold — required, unique, foreign-key, derived?
  4. How does an edit round-trip back to the source of truth?

The shape that came out of those four questions is the rest of this post.

flowchart TB subgraph EXT["External sources"] direction LR DB1[("Customer<br/>Postgres")] SF["Salesforce<br/>object"] FILE["Uploaded<br/>CSV"] HOOK["Webhook<br/>JSON"] end subgraph META["Object repository (metadata)"] direction TB OBJ["Object<br/>(Customer)"] FIELD["Field<br/>(email · type=Email)"] SRC["Source binding<br/>(table.column · jsonPointer · formula)"] OBJ -- "1..N" --> FIELD FIELD -- "1..1" --> SRC end subgraph CORE["Platform features"] direction LR UI["List · detail<br/>form"] SYNC["Sync<br/>engine"] VAL["Validation"] AUDIT["Audit log"] end EXT -- "describe" --> META META --> CORE

The picture is intentionally boring. Every interesting decision is inside the boxes.

Fields are not strings: from primitives to semantic types

Our first object model was the obvious one. A Field had a name, a dataType enum (STRING, INTEGER, BOOLEAN, DATE, JSON), and a free-text format column for "anything else." It held for about two weeks.

Then a customer wanted four fields on Customer:

  • An email field that was lowercased on write, validated by a real parser, and case-insensitively unique.
  • A phone field stored in E.164 with a separate display format per region.
  • A money field that was actually two values — amount and currency code — that you couldn't update independently without breaking the audit trail.
  • A status field whose allowed values came from another object the customer also owned.

None of those is exotic. All four are different from STRING. We rebuilt the model in three layers:

Object : a named collection of Fields belonging to a tenant Field : a typed slot on an Object — references a SemanticType SemanticType : the *meaning* of the slot (Email · Phone · Money · Reference · Enum · ...) — defines validation, normalization, storage, and indexing

The lift was that a field's type is its meaning, not its storage shape. Two String-shaped fields are not interchangeable if one is an email and the other a free-text note. Once we modeled meaning explicitly, the rest of the platform stopped guessing.

The interface every semantic type implemented was small on purpose:

public interface SemanticType<T> { String name(); // "Email", "Phone", "Money" StorageShape storage(); // SCALAR, COMPOSITE, REFERENCE T parse(Object raw) throws ValidationError; // raw → normalized internal value Object serialize(T value); // internal → wire / DB void validate(T value, FieldDef def); // required, unique, range, format IndexHint indexing(FieldDef def); // how to index in Postgres FormatResult format(T value, Locale loc); // how the UI should render it }

A handful of concrete types covered most of what we shipped in year one:

SemanticTypeStorage shapeWhy it isn't just a string
TexttextFree-form notes; collation matters but nothing else.
Emailtext (lowercased)Parsed, normalized, indexed lower(...) for unique.
Phonetext (E.164)Region-aware parse on input; per-locale display.
Money(numeric, char(3)) compositeAmount and currency must move together.
Date / Tsdate / timestamptzTimezone is a property of the type, not the user.
Referencebigint (FK)Carries the target object — UI can render a picker.
Enumtext (constrained)Values come from metadata, not the database.
Geopointpoint (PostGIS)Distance queries need spatial indexing.

Adding a new type — URL, IPAddress, Color — was a closed PR: implement the interface, register it, ship. The form renderer, the sync engine, the search indexer, and the audit log read the type's behavior off the interface; they did not switch on a dataType enum anywhere. That self-discipline was what kept the platform from rotting as types accumulated.

A small but important detail: Reference was never a foreign key column the user could pick. It always carried metadata about the target object, not just the target table. The picker, the cascade rules, the back-reference in audit, the join hints for queries — all of them came off that one piece of metadata, which is the kind of leverage you only notice when it's missing.

Source bindings: where each field actually comes from

The shape of an object answered "what does the platform know?" The bindings answered "and where does that knowledge live?"

Customers' data did not live in our database. It lived in their Postgres, their Salesforce org, an upload they did last Tuesday, or a webhook from a vendor we'd never heard of. The same Customer.email could be backed by a database column for one tenant and a webhook payload for another, with the rest of the platform unable to tell the difference.

We modeled the binding as a small typed sum:

sealed interface SourceBinding { Object read(Row row); void write(Row row, Object value); record TableColumn(ConnectionId conn, String schema, String table, String column) implements SourceBinding {} record RestField(ConnectionId conn, String resourcePath, JsonPointer pointer) implements SourceBinding {} record FilePath(UploadId upload, int columnIndex) implements SourceBinding {} record Computed(Expression formula, Set<FieldRef> dependsOn) implements SourceBinding {} }

Each variant was a small, dumb class that knew how to pull a value out of a source row and put one back. The form, the sync engine, the audit log — none of them needed to know which variant they were holding. They asked the binding.

Two cases were genuinely hard.

Nested JSON without inventing objects

Webhook payloads aren't tables. Here is a real-shaped one we had to map:

{ "event": "customer.updated", "occurred_at": "2019-05-12T08:42:11Z", "customer": { "id": "cus_8234", "email": "ANITA@example.com", "phone": "9876543210", "billing": { "address": { "line1": "12, MG Road", "city": "Bengaluru", "state": "KA", "postal_code": "560001", "country": "IN" } } } }

The customer modeling this didn't want a separate BillingAddress object — they wanted Customer.billing_postal_code and Customer.billing_state to behave like ordinary fields. We solved it with jsonPointer paths on the binding:

Customer.email → /customer/email Customer.phone → /customer/phone Customer.billing_postal_code → /customer/billing/address/postal_code Customer.billing_state → /customer/billing/address/state

…and one explicit rule: the platform never invents an object the user hasn't declared. Auto-creating a nested BillingAddress would have been clever, and would have meant that the meaning of Customer differed depending on whether you reached it via the UI or via the webhook. We refused that. If a customer wanted nesting, they declared the second object and gave it a Reference field.

Ambiguous upstream types

Most external systems hand you a column name and a primitive — phone as a varchar(32), created_at as a timestamp without time zone, amount next to a separate currency column. They don't tell you the timezone. They don't tell you that amount and currency belong together. They certainly don't tell you that the column called mobile_no is a phone number.

Auto-detection is a wonderful demo and a terrible contract. We split the difference:

  • On import, we suggested a semantic type per column based on name and content sampling.
  • The user confirmed the mapping in the UI before any binding became live.
  • The confirmed mapping was thereafter the single source of truth; we never re-inferred it on subsequent syncs.

The cost of getting this wrong is hidden but enormous: a silently wrong semantic type produces a UI that looks right, validations that pass, and bugs that show up six months later when somebody runs an analytics query and the totals don't add up.

Why we walked away from the ORM

Internally there was real pressure to use the team's existing Java ORM. It had migrations, lazy loading, caching, dirty tracking, two-level cache, named queries — the works. We tried, and we walked away. The reason wasn't bugs. It was a category mismatch.

An ORM models objects whose shape is known at compile time. Our objects' shapes were known at runtime — and changed whenever a user added a field, renamed one, or onboarded a new tenant whose Customer had different columns.

Concretely: imagine two tenants on the platform.

Tenant A — Customer fields (23 total) id, email, phone, gst_number, pan_number, plan, mrr, ... Tenant B — Customer fields (47 total) id, email, alt_email, mobile, work_phone, region, channel, lifetime_value, last_seen_at, churn_risk, segment, ...

Both call their object Customer. Both want forms, list views, search, audit. Both want their queries to use real indexes. To make the ORM work we'd have had to choose one of two unhappy paths:

  • Generate a class per tenant per object at provisioning time. That makes the ORM happy and turns the platform into a code-generation problem. Every metadata change becomes a build, a deploy, a class-loader dance. It also means we now own the build pipeline for every tenant's data model, which was not a job we wanted.
  • Use a single generic Entity with a Map<String, Object> attributes and let the ORM treat every field as opaque JSON. That works, and bypasses every interesting feature of the ORM — no real columns, no real indexes, no FK constraints. We'd have an ORM that did nothing for us, and a WHERE attributes->>'email' = ? query plan that scanned the entire table.

We took the third path. Plain SQL against Postgres, with our own thin layer that read the metadata and emitted real DDL.

A real table per object

When a user defined Customer, we materialized a Postgres table. Field types translated to real column types. Required fields became NOT NULL. Unique semantic types added unique indexes. References became foreign keys.

CREATE TABLE t_tenant42_customer ( id bigserial PRIMARY KEY, email text NOT NULL, phone text, amount numeric(18,2), currency char(3), status text NOT NULL, created_at timestamptz NOT NULL DEFAULT now(), updated_at timestamptz NOT NULL DEFAULT now(), ext jsonb NOT NULL DEFAULT '{}'::jsonb ); CREATE UNIQUE INDEX ux_t42_customer_email ON t_tenant42_customer (lower(email)); CREATE INDEX ix_t42_customer_status ON t_tenant42_customer (status); ALTER TABLE t_tenant42_customer ADD CONSTRAINT chk_t42_customer_status CHECK (status IN ('active','paused','churned')); ALTER TABLE t_tenant42_customer ADD CONSTRAINT chk_t42_customer_money CHECK ((amount IS NULL) = (currency IS NULL));

The check constraint at the bottom is the kind of detail that pays off later: it enforces, in the database, that a Money field's two columns move together. Even if a future bug bypassed our service layer, the constraint would refuse the write.

A jsonb spillover for "user-extended" fields

Sometimes a customer added a field at 2 a.m. on a release day, and waiting for a migration window was not the answer. We allowed those to land in an ext jsonb column on the same row, with a few rules:

  • Spillover fields were second-class in the UI — they showed a small "extension" badge.
  • They could be promoted to real columns later via a one-shot migration job.
  • They could not be foreign keys, could not be unique, and were not indexed by default.

Making the trade-off visible in the UI mattered. Customers stopped asking "why is this field slower than the others?" because we'd told them up front.

Migrations driven by metadata diffs

The metadata was the source of truth; the schema was a projection of it. Adding a field to an object generated a single ALTER TABLE ADD COLUMN. Removing one was soft-deleted in metadata first, then archived in a background job that dropped the column after a hold period.

flowchart TB EDIT["User edits object<br/>in the UI"] --> META["Object repository<br/>(metadata)"] META -- "diff" --> PLAN["Migration planner"] PLAN -- "DDL plan + dry-run" --> APPROVE["Approval (auto for safe ops,<br/>human for risky ones)"] APPROVE -- "ALTER · CREATE · INDEX" --> PG[("Postgres<br/>per-object tables · jsonb spill")] META --> RUN["Runtime resolver<br/>(metadata → SQL)"] RUN -- "SELECT / INSERT / UPDATE" --> PG

The dividing line between "auto-apply" and "ask a human" was simple and ended up holding well: any DDL that could lose data, lock for long, or drop an index needed approval. Adding nullable columns, adding indexes CONCURRENTLY, adding non-blocking checks — auto.

Why this paid off

The single moment that proved the design was a customer's ad-hoc filter:

"Show me all customers in Karnataka with at least one open invoice older than 30 days."

Because state was a real column with a real index, Reference to Invoice was a real foreign key, and due_date was a real date column, the planner generated this:

SELECT c.id, c.email, c.state FROM t_tenant42_customer c JOIN t_tenant42_invoice i ON i.customer_id = c.id WHERE c.state = 'KA' AND i.status = 'open' AND i.due_date < now() - interval '30 days' ORDER BY c.created_at DESC LIMIT 200;

Postgres used ix_t42_customer_state and the FK index on i.customer_id and finished in a few milliseconds on a moderately-sized tenant. The generic-bag alternative would have streamed thousands of rows into Java and filtered them in memory, with the predictable behavior of "fast in demo, mysteriously slow in production."

The lesson, in one line: for low-code, the right abstraction is "describe the data well enough that boring SQL is fast" — not "hide the database so users never have to think about it." Hiding the database is exactly how low-code platforms develop their reputation for becoming slow at exactly the moment their customers are succeeding.

Three things that bit us

A few honest ones, since the post would otherwise read like everything went smoothly.

Soft deletes as a global default

We added deleted_at timestamptz to every object's table and made every query WHERE deleted_at IS NULL. It seemed safe. It was not.

What happened was that analytics queries kept double-counting, because the analytics team wrote queries against the raw tables and forgot the filter, and the filter had to be re-added inconsistently across half the platform. Worse, "soft delete" meant different things to different objects: for Invoice it meant "voided"; for Customer it meant "removed by admin"; for Note the customer expected hard delete for compliance.

What I'd do differently: soft delete is a property of the SemanticType of the parent object, not a global default. Some objects are append-only (AuditEvent), some are tombstoned (Invoice), some are hard-deleted (Note, by request). Encoding that in metadata is one more place where the four-questions principle pays off.

Versioning the metadata too late

We didn't version object definitions until the day a customer renamed mobile to phone_e164 in production. Their UI updated immediately. Their existing webhook bindings did not. Their downstream Salesforce sync did not. The new column was empty for everything that had been created before the rename, because no one had told the platform to backfill.

The fix was a multi-day cleanup. The lesson: every change to an object definition should emit a versioned event from the first commit. Renames, in particular, are not a UI-only operation; they're a migration with a forwards-and-backwards mapping that the rest of the platform needs to see.

Computed fields without a dependency graph

We let users write formulas before we tracked what each formula depended on. The first time someone wrote a formula A that referenced B, where B already referenced A, we hit a stack overflow on row save and spent a long evening writing a cycle detector that should have been there to begin with.

After that, every Computed binding declared its dependsOn set explicitly (you saw it in the sealed interface earlier), and the metadata layer refused to accept a formula whose dependency graph contained a cycle. The detector was thirty lines of code. The right time to write it was day one, not the day after the incident.

What I'd take into the next platform

Four principles survived intact:

  • The metadata is the product. Treat it as a first-class API, version it, log every change to it, and refuse to let any feature shortcut around it.
  • Types should encode meaning, not just shape. Email and Text are not the same thing, and pretending they are leaks into every higher-level feature.
  • Bindings to external sources belong in the metadata, not in code. Anything that can vary per customer should be data the runtime reads, not a code path you maintain.
  • Pick boring storage on purpose. Postgres with real tables, real constraints, and real indexes will outlast almost any clever runtime trick. The cleverness belongs in the metadata, not in the database.

If I keep telling the same story across these posts, it's because it keeps being true: the systems that age well are the ones that make the right thing easy to query. Front-end, back-end, integration layer — same lesson, different floor of the building.

GitHub
LinkedIn
X