# Data Config

**Select Your Source Table(s):**

1. Here you will add tables to your project, one at a time.
2. Use the dropdowns to select a database, schema, and table. Remember: you must have run the correct grants to add tables or other objects to a project. If you do not see your source data here, see [Object Privileges](/other/object-privileges.md).
3. Click the '+' icon to the left of the dropdowns to add the table to your project.

   <figure><img src="/files/IuG99HAhWmtVE8EsKAgh" alt=""><figcaption></figcaption></figure>
4. Once you have added your tables, it's time to configure each table. Click 'Config' to do this.
5. You can also click the 'x' button to the left of a table you've added to remove it.

**Field Details:**

1. First you will configure each field in your table on the 'Field Details' tab.
2. We recommend starting with the "Type" column.  Set all of your ID fields to 'ID'.
3. Check that all your text fields are set to 'Categorical'.
4. Remove any fields that you do not want to include in the model by clicking the 'X' icon to the left of the field.
5. Select one primary key per table, if one is available, using the P-Key checkbox. The type of the primary key must be 'ID'. This is especially important when working with multiple tables.
6. Select the 'Anonymize' checkbox for any fields that contain PII. These fields are excluded from training, and their output values are generated from scratch.
7. Adjust the 'Format' field where needed. It is only available for ID and datetime fields, and it accepts regex and strftime formats.

   <figure><img src="/files/WKLX4uvzUc61yxbG0Ouu" alt=""><figcaption></figcaption></figure>
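
To illustrate what regex and strftime format strings look like, here is a minimal Python sketch that checks sample values against each kind of format. The format strings, sample values, and helper names are hypothetical, and how the app itself applies the 'Format' field is not shown here.

```python
import re
from datetime import datetime

# Hypothetical format strings of the kind you might enter in the 'Format' field.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # strftime-style format for a datetime field
ID_REGEX = r"CUST-\d{6}"               # regex for an ID field such as CUST-004217


def matches_datetime_format(value: str, fmt: str) -> bool:
    """Return True if the value parses cleanly with the strftime-style format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def matches_id_regex(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the regex."""
    return re.fullmatch(pattern, value) is not None


print(matches_datetime_format("2024-03-15 08:30:00", DATETIME_FORMAT))
print(matches_id_regex("CUST-004217", ID_REGEX))
```

Checking a few source values this way before entering a format can help catch a mismatch between the format string and the data.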

**Anonymize:**

**Note**: The anonymize checkbox is only available for categorical fields since continuous fields (dates, numeric fields) are not generated with discrete values from the source data.

When you select Anonymize for a field, you must also select the type of output that will replace the values in that column. For example, selecting 'name' as the type (in the 'person' category) generates random full names to populate the field.

1. Set the anonymization options for each field that contains sensitive data.
2. First, select the Category and Type of anonymization.
3. Random values of that type are then generated (using the [Faker](https://faker.readthedocs.io/en/master/) library). For example, if you have a field of full names, select 'person' as the Category and 'name' as the Type.
4. Some types support localization and extra parameters. When this is the case, you will see a dropdown for Locales and a text box for the extra parameters. See tooltips for more details.

   <figure><img src="/files/Mejo4IpvDrh7c2a7RMXF" alt=""><figcaption><p>Anonymization types are listed by category, with a short list of common types shown under the popular category</p></figcaption></figure>

**Tip**: Categorical fields can make model training take longer, and the effect grows with the number of distinct values in each categorical field. Be mindful of the number of categorical fields in each table in the model. We also recommend starting with a small number of categorical fields before advancing to different models and more categorical fields or records. This restriction will be lifted in future versions of the app.
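
One way to act on this tip is to count the distinct values per categorical field before adding a table. The sketch below does this on a small in-memory sample; the field names, rows, and the `distinct_counts` helper are hypothetical, and in practice you would run an equivalent check against your source table.

```python
def distinct_counts(rows, categorical_fields):
    """Count the distinct values of each categorical field across the rows."""
    return {field: len({row[field] for row in rows}) for field in categorical_fields}


# Hypothetical sample of a source table.
rows = [
    {"country": "US", "plan": "free", "city": "Austin"},
    {"country": "US", "plan": "pro", "city": "Boston"},
    {"country": "DE", "plan": "free", "city": "Berlin"},
    {"country": "DE", "plan": "free", "city": "Munich"},
]

counts = distinct_counts(rows, ["country", "plan", "city"])

# List the fields with the most distinct values first.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for field, count in ranked:
    print(f"{field}: {count} distinct values")
```

Fields near the top of the list are the first candidates to remove or anonymize when starting small.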

**Constraints:**

1. Constraints are rules that the synthetic data must follow. For example, you can set a constraint that the 'age' field must be greater than 18. You would do this by selecting 'Scalar Inequality', then selecting the age field from the dropdown, and finally entering 18 into the input box.
2. Don't forget to click the '+' button to add the constraint to the table.

   <figure><img src="/files/KfE5lSbcaFLdEiym6o9T" alt=""><figcaption><p>This inequality constraint would cause the output publish date field to always be greater than the public version date field.</p></figcaption></figure>
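
The two kinds of constraint described above can be sketched as plain checks on a row. This is only an illustration of the rules the synthetic data must follow, not the app's implementation; the function names, field names, and sample rows are hypothetical.

```python
import operator
from datetime import date

# Comparison operators a scalar inequality might use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def scalar_inequality(row, field, op, value):
    """Check one field against a fixed value, e.g. age > 18."""
    return OPS[op](row[field], value)


def column_inequality(row, high_field, low_field):
    """Check that one field is always greater than another field in the same row."""
    return row[high_field] > row[low_field]


rows = [
    {"age": 34, "publish_date": date(2024, 5, 1), "public_version_date": date(2024, 4, 1)},
    {"age": 17, "publish_date": date(2024, 3, 1), "public_version_date": date(2024, 3, 9)},
]

for row in rows:
    ok = scalar_inequality(row, "age", ">", 18) and column_inequality(
        row, "publish_date", "public_version_date"
    )
    print(row["age"], "valid" if ok else "violates a constraint")
```

Running a check like this on the source data first can reveal whether the real records already satisfy the constraint you intend to add.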

**Relationships:**

1. When working with multiple tables, you must configure the relationships between them after you've configured and saved your tables and Field Details (IDs).
2. Select your parent and child tables, and their respective PK and FK fields.
3. The parent table is the one whose primary key is referenced by the child table's foreign key. In other words, the parent table corresponds to the dimension table, and the child table corresponds to the fact table.
4. Don't forget to click the '+' button to add the relationship to the project.

   <figure><img src="/files/PgwgFuEwihVYERdwRRp0" alt=""><figcaption></figcaption></figure>
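
The parent/child relationship can be sanity-checked before you configure it: the parent's primary key should be unique, and every foreign key value in the child should exist in the parent. The sketch below shows both checks on hypothetical `customers` (parent) and `orders` (child) tables; the table names, field names, and helper functions are illustrative assumptions.

```python
def primary_key_is_unique(rows, pk):
    """Return True if no primary key value appears more than once."""
    keys = [row[pk] for row in rows]
    return len(keys) == len(set(keys))


def orphaned_foreign_keys(parent_rows, pk, child_rows, fk):
    """Return child foreign key values that have no matching parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row[fk] for row in child_rows if row[fk] not in parent_keys]


# Parent (dimension) table: one row per customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]

# Child (fact) table: each order references a customer.
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},  # no such customer in the parent table
]

print(primary_key_is_unique(customers, "customer_id"))
print(orphaned_foreign_keys(customers, "customer_id", orders, "customer_id"))
```

An empty list of orphaned values suggests the PK and FK fields you picked really do describe the relationship between the two tables.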

