Select ML Model

This is the Model Type page in the app. It lets you select a model and configure the parameters used for training and synthesis.

Depending on your choice of "Single Table" or "Multiple Tables" when creating the project, you will have different models (Synthesizers) available to choose from.

Single-table synthesizers include GaussianCopula, CTGAN, TVAE, and CopulaGAN, all based on models from the Synthetic Data Vault (SDV). In the future, this list may grow to include other models, such as industry-specific ones.

Single-Table Synthesizer Selection: These options vary in speed and data quality. For a single table, we recommend starting with the "GaussianCopula" model, which is the fastest but least accurate, and then moving on to one of the others to improve the accuracy of the results.

Multi-table synthesizers currently include only the DataMynd Premium Synthesizer (DmSynthMT1), which is built to optimize performance and accuracy when running on Snowflake.

Note: DmSynthMT1 is only available in a "Multiple Tables" project, but it can also be used with a single table: choose "Multiple Tables" and add just one table.
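
As a rough illustration of the availability rules above, the sketch below maps a project type to the synthesizers named on this page. The function names and the way the project type is passed as a string are assumptions made for illustration only; they are not part of the app or of any DataMynd API.

```python
# Hypothetical helper, not part of the app. The names mirror this page only.
SINGLE_TABLE_SYNTHESIZERS = ["GaussianCopula", "CTGAN", "TVAE", "CopulaGAN"]
MULTI_TABLE_SYNTHESIZERS = ["DmSynthMT1"]


def available_synthesizers(project_type: str) -> list[str]:
    """Return the synthesizer names offered for a given project type."""
    if project_type == "Single Table":
        return list(SINGLE_TABLE_SYNTHESIZERS)
    if project_type == "Multiple Tables":
        return list(MULTI_TABLE_SYNTHESIZERS)
    raise ValueError(f"Unknown project type: {project_type!r}")


def recommended_first_choice(project_type: str) -> str:
    """Suggested starting model: the first entry for the project type."""
    return available_synthesizers(project_type)[0]
```

For a "Single Table" project, `recommended_first_choice` returns "GaussianCopula", matching the recommendation to start with the fastest model before trying the slower, more accurate ones.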

Advanced Parameters:

Here you can also set optional parameters for the selected model. The defaults are recommended for most users, especially when starting out; change them with caution.
If you are using a neural-network-based model, you will see an 'Epochs' parameter. We recommend starting with a low number (e.g. 5) and increasing it once you have validated that the model is working as expected.

Warning: The default settings work for most starting scenarios, and changing them can cause errors. We recommend changing them only once you are familiar with the app.

DmSynthMT1 Parameters:

epochs: Number of training epochs. Raise this for more accurate results. We recommend starting with a low number (about 5) while you check the initial fit.
batch_size: Number of root-table records per training batch. Raise this to improve performance. Warning: a value that is too high may cause training errors.
optimize_batch_size: Runs an optimization step at the start of training. Adds 10-15 minutes of overhead. Useful for large or complex datasets, or when epochs is high.
compile: Enables the model to run much more efficiently. Adds 5-10 minutes of overhead. Useful for large or complex datasets, or when epochs is high.
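
To make the trade-offs above concrete, here is a minimal sketch that gathers the four parameters in one place and estimates the fixed startup overhead from the figures quoted on this page. The class `DmSynthMT1Params`, its method, the `False` defaults for the two optional steps, and the `batch_size` value in the usage lines are all hypothetical; this page does not document the app's actual defaults or how it stores these settings.

```python
from dataclasses import dataclass


@dataclass
class DmSynthMT1Params:
    """Hypothetical container for the parameters listed above."""

    batch_size: int                    # root-table records per training batch
    epochs: int = 5                    # low starting value suggested on this page
    optimize_batch_size: bool = False  # assumed default; adds about 10-15 minutes
    compile: bool = False              # assumed default; adds about 5-10 minutes

    def __post_init__(self) -> None:
        # Basic sanity checks; the app may enforce different limits.
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")

    def startup_overhead_minutes(self) -> tuple[int, int]:
        """Return the (low, high) fixed overhead in minutes of the optional steps."""
        low = high = 0
        if self.optimize_batch_size:
            low, high = low + 10, high + 15
        if self.compile:
            low, high = low + 5, high + 10
        return low, high


# Illustrative values only; 256 is not a documented default.
params = DmSynthMT1Params(batch_size=256, optimize_batch_size=True, compile=True)
print(params.startup_overhead_minutes())  # (15, 25)
```

With both optional steps enabled, the fixed cost is 15 to 25 minutes before training starts, which is why they are most worthwhile for large or complex datasets or when epochs is high.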

