Starting the training
Iskra provides a single command that sets up a powerful machine in the cloud, installs all the dependencies, and runs your script without any configuration or maintenance. It also creates a safely stored dataset in the cloud, which you can later expand and modify programmatically.
After you have installed the CLI in the environment where you develop your training script, run:
iskra train your_training_script.py
your_training_script.py can be any valid Python 3 script. Our system automatically resolves all the dependencies based on the local environment you are using. This means that, for your script to work on our infrastructure, all of its dependencies must be installed locally before you start training with Iskra.
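Because the dependencies are resolved from your local environment, it can help to check what is actually installed before you start a training. The sketch below uses only the Python standard library to list the installed distributions and report which required packages are missing. It is not how Iskra resolves dependencies internally; the helper names and the example package names are assumptions made for illustration.

```python
from importlib import metadata


def installed_packages():
    """Return a dict mapping installed distribution names (lowercased) to versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages[name.lower()] = dist.version
    return packages


def missing_packages(required):
    """Return the names from `required` that are not installed locally."""
    installed = installed_packages()
    return [name for name in required if name.lower() not in installed]


if __name__ == "__main__":
    # Hypothetical list: replace it with the distributions your script needs.
    # Note that distribution names can differ from import names.
    required = ["numpy", "torch"]
    missing = missing_packages(required)
    if missing:
        print("Install these locally before training:", ", ".join(missing))
    else:
        print("All listed packages are installed locally.")
```

Running this in the same environment where you call the CLI gives a quick check that nothing your script needs is missing locally.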
If you are not authenticated, the CLI will ask for your e-mail address and we will send you an activation code. If you have not set up your billing account, you will be asked to provide your billing details. Don't worry, we won't charge you anything yet. Until the end of March our platform is available for free; after that we will charge according to our pricing section.
$ iskra train training.py
Enter your email: firstname.lastname@example.org
:: You should receive an email with the verification code
Paste the code here: <your activation code>
:: Transferring metadata...
:: Acquiring machine...
:: Created training: <training id>
:: Locking instance for training.
:: Connecting to the machine.
:: Training submitted...
:: Building...
:: Training...
<OUTPUT OF YOUR TRAINING SCRIPT WILL APPEAR AFTER>
If you see the above output, this means that the training has successfully started.
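For reference, a training script can be as small as the sketch below. It is a stand-in, not a real model: the train function and its fake loss are hypothetical. It prints one line per epoch with flush=True so that each line leaves Python's output buffer right away; whether this changes how quickly Iskra streams the output is an assumption.

```python
import time


def train(epochs=3, delay=0.0):
    """Run a placeholder training loop and return the per-epoch losses."""
    losses = []
    for epoch in range(1, epochs + 1):
        time.sleep(delay)   # stands in for the real work of one epoch
        loss = 1.0 / epoch  # fake, steadily decreasing loss
        losses.append(loss)
        # flush=True pushes the line out of Python's buffer immediately.
        print(f"epoch {epoch}/{epochs} loss={loss:.4f}", flush=True)
    return losses


if __name__ == "__main__":
    train()
```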
Building... usually takes a while at first, up to 2-3 minutes. That is the step where we recreate your environment in our infrastructure.
Pressing Ctrl-C does not stop your training; it only stops streaming the output of your script. You can keep following the output at the provided link to your project, or log in directly at https://app.iskra.ml
If you run the train command again while another training is already running, we will ask you whether you want to interrupt the current training or run a new one in parallel.
$ iskra train training.py
Project: your-project-name (<project_id>)
Process               Time    CPU  GPU
qg8MswBFfYO5pTW49lIX  12 min  65%  99%
You are already running the above training jobs. Do you want to launch another one? (y/[n])
Behind the scenes
This command will run your training on the default machine we offer:
64 GB RAM
16 GB NVIDIA Tesla V100 GPU
If at any point we can't offer this type of machine, you will be informed of the specifications of the machine we offer instead, or of the expected waiting time for the default machine.
Monitoring the progress
After you start the training job, the CLI shows a link to the web interface, followed by the output of your training script. Terminating the process will not stop the training. If you wish to stop the training job, use the CLI:
iskra stop <your_training_job_id>
In the web interface at https://app.iskra.ml you can also see your ongoing trainings and machine telemetry, stop trainings, or inspect the logs. You can also retrieve the list of training jobs for a given project from inside its directory:
iskra ps
:: Project: your-project-name (<project_id>)
Running                               Time       Status
e41d1301-c1fb-4f2f-8154-06b540098ac4  4 minutes  Env Build Started
Stopped                               Time       Status           Artifacts
e799a129-5eca-4ddb-9097-27ab88d1952c  6 minutes  Training Failed  22 bytes
Once you have the ID of a training job, you can stop it with iskra stop. The training job IDs shown above are examples, not real ones.
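If you want to pick the job IDs out of that listing in a script, the following sketch groups them by section. It assumes the output has exactly the layout of the sample above (a header row starting with Running or Stopped, followed by rows whose first column is the job ID); the real output may differ, and the parse_ps helper is hypothetical, not part of the CLI.

```python
def parse_ps(output):
    """Group training job IDs by the section ("Running" or "Stopped") they appear in."""
    jobs = {"Running": [], "Stopped": []}
    section = None
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in jobs:      # header row such as "Running  Time  Status"
            section = tokens[0]
        elif section is not None:  # data row: the first column is the job ID
            jobs[section].append(tokens[0])
    return jobs


# Sample text copied from the listing above; the IDs are not real.
SAMPLE = """\
:: Project: your-project-name (<project_id>)
Running                               Time       Status
e41d1301-c1fb-4f2f-8154-06b540098ac4  4 minutes  Env Build Started
Stopped                               Time       Status           Artifacts
e799a129-5eca-4ddb-9097-27ab88d1952c  6 minutes  Training Failed  22 bytes
"""

if __name__ == "__main__":
    print(parse_ps(SAMPLE))
```

Lines before the first header, such as the project line, are ignored, so the returned IDs can be passed directly to iskra stop.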
Retrieving the artifacts
In a typical scenario, you will want to retrieve everything your training script saved to disk during the training. Whether it's the model with its weights or files saved after each epoch, you can retrieve this information by running
iskra download <project_id>
to download the artifacts of the last finished training. This command opens your browser and starts the download of all artifacts as an archive file.
You can also access all the artifacts (and individual files) in the web interface, in the details screen of the specific training job.
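The artifacts are whatever your script writes to disk, so the script decides what can be downloaded later. Below is a minimal sketch of saving one file per epoch plus a final result, using only the standard library. The artifacts directory name, the file names, and the JSON format are assumptions, since this section does not say which paths are collected.

```python
import json
from pathlib import Path


def save_checkpoint(out_dir, epoch, state):
    """Write the state of one epoch to <out_dir>/epoch_<NNN>.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"epoch_{epoch:03d}.json"
    path.write_text(json.dumps(state))
    return path


def run(out_dir="artifacts", epochs=3):
    """Placeholder training loop that saves a file after every epoch."""
    paths = []
    for epoch in range(1, epochs + 1):
        state = {"epoch": epoch, "loss": 1.0 / epoch}  # fake metrics
        paths.append(save_checkpoint(out_dir, epoch, state))
    # Save the final result separately so it is easy to find in the archive.
    final = Path(out_dir) / "final.json"
    final.parent.mkdir(parents=True, exist_ok=True)
    final.write_text(json.dumps({"epochs": epochs}))
    return paths + [final]


if __name__ == "__main__":
    for saved in run():
        print("saved", saved)
```

Writing a file after every epoch means a failed or stopped training can still leave partial results on disk.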