When you train a model, you will usually use your own training data. This data tends to change more often than the model itself, and transferring the entire dataset every time you retrain a model would be inconvenient. That's why Iskra provides a dedicated toolkit for datasets. It covers typical scenarios such as:
- changing the dataset during development to improve its quality
- programmatically updating a dataset to retrain the model on new values
- providing different datasets to the same model
- training different models on the same dataset
Workflow with Datasets
Let's say you run a training job on Iskra:
iskra train your_script.py
By default, a training job started this way uploads the entire working directory to the training machine and then executes the script. This works well for small amounts of data, but if you train on larger data, or train more than once, you will most likely want to avoid re-uploading the data every time.
For this, you can create a dataset: a folder stored on our infrastructure that can later be attached to any training job.
Change into the folder you want to upload and run:
iskra dataset create <name>
This links the current working directory to a new remote dataset in Iskra and uploads all of its contents. It also creates a .iskra.json file in the local directory, which stores the dataset ID for future synchronizations.
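The steps above can be wrapped in a small shell function. This is a minimal sketch: `create_dataset` is a hypothetical helper name, and it only calls the `iskra dataset create` command described here. The final check simply confirms that the `.iskra.json` file was written, since later synchronizations depend on it.

```shell
# Hypothetical helper: upload a local folder as a new Iskra dataset.
# Usage: create_dataset <folder> <dataset-name>
create_dataset() {
  local folder="$1" name="$2"
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "$folder" || exit 1
    # Uploads the folder's contents and writes .iskra.json locally.
    iskra dataset create "$name" || exit 1
    # Confirm the link file exists; later 'iskra dataset sync' calls need it.
    if [ ! -f .iskra.json ]; then
      echo "expected .iskra.json in $folder" >&2
      exit 1
    fi
  )
}
```

For example, `create_dataset ./data/frames frames` uploads `./data/frames` as a dataset named `frames` without changing your current directory.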
Once your dataset is uploaded, you can create training jobs that read from it. To list the datasets available to you, run:
iskra dataset ls
images (084b430c-2e7f-4d61-b5db-35b2a6d22824)
frames (f861c7c6-90cd-4f20-ba76-2b5a93e74580)
In this example, the command lists two datasets that were uploaded to Iskra: images and frames. To create a training job that makes the frames (f861c7c6-90cd-4f20-ba76-2b5a93e74580) dataset available in the frames directory of the training job, run:
iskra train -v=f861c7c6-90cd-4f20-ba76-2b5a93e74580:frames your_script.py
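To avoid copying dataset IDs by hand, the lookup and the training command can be scripted. This is a sketch under an explicit assumption: it expects `iskra dataset ls` to print one `name (id)` entry per line, matching the example output, which may not hold for every CLI version. The helpers `dataset_id` and `train_with_dataset` are hypothetical names; the only Iskra commands used are `iskra dataset ls` and `iskra train -v=<id>:<dir>` from this page.

```shell
# Hypothetical helper: print the ID of the dataset named $1.
# ASSUMPTION: 'iskra dataset ls' prints one "name (id)" entry per line.
dataset_id() {
  iskra dataset ls | awk -v want="$1" '$1 == want { gsub(/[()]/, "", $2); print $2; exit }'
}

# Hypothetical helper: train with dataset $1 mounted at directory $2.
# Usage: train_with_dataset <dataset-name> <mount-dir> <script>
train_with_dataset() {
  local id
  id=$(dataset_id "$1")
  if [ -z "$id" ]; then
    echo "no dataset named '$1'" >&2
    return 1
  fi
  # Same -v=<id>:<dir> form as the training command above.
  iskra train -v="$id:$2" "$3"
}
```

With the example datasets, `train_with_dataset frames frames your_script.py` would run the same `iskra train` command as above.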
Updating a Dataset
If your dataset changes locally, you can synchronize its contents with Iskra by running:
iskra dataset sync
This requires the .iskra.json file to be present in the current working directory.
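Because the sync depends on `.iskra.json`, a script that updates a dataset programmatically can check for the file first and fail with a clear message. This is a minimal sketch; `sync_dataset` is a hypothetical helper name, and it only calls the `iskra dataset sync` command shown above.

```shell
# Hypothetical helper: push local changes of a dataset folder to Iskra.
# Usage: sync_dataset <folder>
sync_dataset() {
  # Subshell keeps the caller's working directory unchanged.
  (
    cd "$1" || exit 1
    # 'iskra dataset sync' relies on .iskra.json in the current directory.
    if [ ! -f .iskra.json ]; then
      echo "$1 has no .iskra.json; run 'iskra dataset create <name>' there first" >&2
      exit 1
    fi
    iskra dataset sync
  )
}
```

For the "retrain with updated values" scenario, the sync can be chained with a training run, e.g. `sync_dataset ./frames && iskra train -v=f861c7c6-90cd-4f20-ba76-2b5a93e74580:frames your_script.py`, so training only starts if the upload succeeded.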