WuttaSync
This provides a “batteries included” way to handle data sync between an arbitrary source and target.
It builds on (and depends on) WuttJamaican, for the sake of a common config object and handler interface. It was originally designed for import to / export from the app database, but both the source and target can be “anything” - e.g. a CSV or Excel file, a cloud API, or another DB.
The basic idea is as follows:
read a data set from “source”
read corresponding data from “target”
compare the two data sets
where they differ, create/update/delete records on the target
In some cases (e.g. export to CSV), however, the target has no meaningful data, so all source records are “created” on / written to the target.
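The steps above can be sketched in plain Python. This is only an illustration of the general idea, not WuttaSync's actual API: the function name `plan_sync`, the `key` and `allow_delete` parameters, and the sample records are all hypothetical, and records are assumed to be simple dicts that share a unique key field.

```python
def plan_sync(source_rows, target_rows, key="id", allow_delete=True):
    """Compare two in-memory data sets and return what the target needs.

    Hypothetical helper for illustration only; not part of WuttaSync.
    """
    # Index both data sets by key. Holding both in memory at once is
    # what makes this approach unsuitable for very large data sets.
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # In source but not in target: must be created on the target.
    creates = [source[k] for k in source if k not in target]
    # In both, but with different values: must be updated on the target.
    updates = [source[k] for k in source
               if k in target and source[k] != target[k]]
    # In target but not in source: must be deleted (if deletion is allowed).
    deletes = [target[k] for k in target if k not in source] if allow_delete else []
    return creates, updates, deletes


source_rows = [
    {"id": 1, "name": "apple"},
    {"id": 2, "name": "banana"},
    {"id": 3, "name": "cherry"},
]
target_rows = [
    {"id": 2, "name": "bananna"},
    {"id": 3, "name": "cherry"},
    {"id": 4, "name": "durian"},
]

creates, updates, deletes = plan_sync(source_rows, target_rows)

# With an empty target (as in an export to a new CSV file), every
# source record ends up in the "create" list.
export_creates, export_updates, export_deletes = plan_sync(source_rows, [])
```

Here record 1 would be created, record 2 updated, and record 4 deleted, while record 3 is left alone because it already matches.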
Note
You may already have guessed that this approach does not work for “big data” - and indeed, it is designed for “small” data sets, ideally 500K records or smaller. It reads both (source/target) data sets into memory, so that is the limiting factor.
You can work around this to some extent by limiting the data sets to a particular date range (or other “partitionable” aspect of the data) and only syncing that portion.
However, this is not meant to be an ETL engine involving a data lake/warehouse. It is for more “practical” concerns where some disparate “systems” must be kept in sync, or for basic import from / export to file.
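The date-range workaround mentioned in the note can be sketched as follows. This is a rough illustration under stated assumptions, not WuttaSync's own mechanism: the `within` helper, the `day` field, and the sample records are hypothetical. The point it tries to show is that the same range should be applied to both the source and the target, so that target records outside the range are not mistaken for records missing from the source.

```python
from datetime import date


def within(rows, start, end, field="day"):
    """Lazily yield only the rows whose date falls in [start, end).

    Hypothetical helper for illustration only; not part of WuttaSync.
    """
    return (row for row in rows if start <= row[field] < end)


source_rows = [
    {"id": 1, "day": date(2024, 1, 15), "total": 10},
    {"id": 2, "day": date(2024, 2, 3), "total": 20},
]
target_rows = [
    {"id": 1, "day": date(2024, 1, 15), "total": 10},
    {"id": 2, "day": date(2024, 2, 3), "total": 25},
]

# Sync only February 2024; the same range limits both sides.
start, end = date(2024, 2, 1), date(2024, 3, 1)
source = {row["id"]: row for row in within(source_rows, start, end)}
target = {row["id"]: row for row in within(target_rows, start, end)}

# Keys whose source record is missing from, or differs from, the target.
changed = [key for key in source if source[key] != target.get(key)]
```

In a real setup the range would more likely be pushed down into the query or API request for each side, so that out-of-range records are never read into memory at all.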
The general “source → target” concept can be used for both import and export, since “everything is an import” from the target’s perspective.
In addition to the import/export framework proper, a CLI framework is also provided.
A “real-time sync” framework is also (eventually) planned, similar to the one developed in the Rattail Project; cf. Real-Time Data Sync.
Documentation