WuttaSync

This provides a “batteries included” way to handle data sync between an arbitrary source and target.

This builds on / depends on WuttJamaican, for the sake of a common config object and handler interface. It was originally designed for import to / export from the app database, but both the source and target can be “anything” - e.g. a CSV or Excel file, a cloud API, or another DB.

The basic idea is as follows:

  • read a data set from “source”

  • read corresponding data from “target”

  • compare the two data sets

  • where they differ, create/update/delete records on the target

In some cases (e.g. export to CSV), however, the target has no meaningful data, so all source records are “created” on / written to the target.
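The steps above can be sketched in plain Python as follows. This is only a minimal illustration of the compare-and-apply idea, not WuttaSync's actual API: the `diff_records` function, the `id` key field, and the sample rows are all hypothetical.

```python
def diff_records(source_rows, target_rows, key="id"):
    """Compare two in-memory data sets; return what the target needs.

    Hypothetical helper for illustration only (not part of WuttaSync).
    Each row is a dict, and `key` names the field that identifies a record.
    """
    # Index both data sets by key so records can be matched up.
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # In source but not in target: must be created on the target.
    creates = [source[k] for k in source if k not in target]
    # In both, but the values differ: must be updated on the target.
    updates = [source[k] for k in source if k in target and source[k] != target[k]]
    # In target but not in source: must be deleted from the target.
    deletes = [target[k] for k in target if k not in source]
    return creates, updates, deletes


source_rows = [
    {"id": 1, "name": "apple"},
    {"id": 2, "name": "banana"},
    {"id": 3, "name": "cherry"},
]
target_rows = [
    {"id": 2, "name": "bananna"},
    {"id": 3, "name": "cherry"},
    {"id": 4, "name": "durian"},
]

creates, updates, deletes = diff_records(source_rows, target_rows)

# With an empty target (e.g. export to a new CSV file), every source
# record is simply "created".
export_creates, export_updates, export_deletes = diff_records(source_rows, [])
```

Note that both data sets are fully loaded into dictionaries before any comparison happens, which is why memory use grows with the size of the data.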

Note

As you may have guessed, this approach does not suit “big data”; it is designed for “small” data sets, ideally 500K records or fewer. Both the source and target data sets are read into memory, so available memory is the limiting factor.

You can work around this to some extent by limiting the data sets to a particular date range (or some other “partitionable” aspect of the data) and syncing only that portion.
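As a rough sketch of that workaround, the same date range can be applied to both sides before they are compared. The `select_partition` helper and the `order_date` field are assumptions made for this example and are not part of WuttaSync; in practice the restriction would more likely be applied when querying the source and target, so that the excluded rows are never loaded at all.

```python
from datetime import date


def select_partition(rows, start, end, field="order_date"):
    """Return only the rows whose date falls in the half-open range [start, end).

    Hypothetical helper for illustration only (not part of WuttaSync).
    """
    return [row for row in rows if start <= row[field] < end]


source_rows = [
    {"id": 1, "order_date": date(2024, 1, 15)},
    {"id": 2, "order_date": date(2024, 2, 3)},
    {"id": 3, "order_date": date(2024, 2, 20)},
]
target_rows = [
    {"id": 1, "order_date": date(2024, 1, 15)},
    {"id": 2, "order_date": date(2024, 2, 3)},
]

# Use the same range on both sides. If only the source were limited,
# target records outside the range would look like they are missing
# from the source, and so would appear to need deleting.
start, end = date(2024, 2, 1), date(2024, 3, 1)
source_part = select_partition(source_rows, start, end)
target_part = select_partition(target_rows, start, end)
```

Only `source_part` and `target_part` would then be compared and synced, leaving records outside the range untouched.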

However, this is not meant to be an ETL engine involving a data lake/warehouse. It is for more “practical” concerns, where some disparate “systems” must be kept in sync, or for basic import from / export to a file.

The general “source → target” concept can be used for both import and export, since “everything is an import” from the target’s perspective.

In addition to the import/export framework proper, a CLI framework is also provided.

A “real-time sync” framework is also (eventually) planned, similar to the one developed in the Rattail Project; cf. Real-Time Data Sync.