Once you have framed your problem, you can start preparing the data. Our software is already quite flexible and accepts several data types, such as images, PDFs, text and sound. Each data type needs to be prepared in a slightly different way. Here is how:

Labeled or unlabeled?

You can upload both labeled and unlabeled data. Training data needs a label at some point, but you can add labels from within the software as well as in Slack.

Quantity of training data

For all data types: more is better!

Images: At least 20 examples per class. With 100 or more examples per class, you can get reasonable results in many cases.

Text: At least 100 examples per class.

PDF: At least 100 examples per class.

Sound: At least 100 examples per class. Note: Sound support is currently experimental, and performance may be poor. We are working on improving it.
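Before uploading, it can help to check whether every class reaches the recommended minimum. The Python sketch below is an illustration only, not part of our software: the function name `classes_below_minimum` and the data-type keys are hypothetical, and it assumes you already have one label per example in a list. The thresholds are the minimums listed above.

```python
from collections import Counter

# Recommended minimum examples per class, taken from the guidelines above.
# The dictionary keys are hypothetical names chosen for this sketch.
MIN_EXAMPLES_PER_CLASS = {
    "images": 20,
    "text": 100,
    "pdf": 100,
    "sound": 100,
}


def classes_below_minimum(labels, data_type):
    """Return {class: count} for every class with fewer examples than recommended."""
    minimum = MIN_EXAMPLES_PER_CLASS[data_type]
    counts = Counter(labels)  # number of examples per class label
    return {label: n for label, n in counts.items() if n < minimum}


if __name__ == "__main__":
    labels = ["cat"] * 25 + ["dog"] * 10
    print(classes_below_minimum(labels, "images"))  # {'dog': 10}
```

An empty result means every class meets the minimum; keep in mind that more data is still better.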

Format & upload

Images: Upload the image files directly (drag & drop) or upload a CSV file with a single column containing all image URLs.

PDFs: Same as images (drag & drop, or a CSV file with a single column of URLs).

Text: A CSV file with two columns, "text" and "label".

Sound: WAV, MP3 and AIFF files can be uploaded to the platform (drag & drop).
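If you prepare the CSV files with a script, Python's built-in `csv` module takes care of quoting for you. The sketch below is illustrative only: the helper names `write_url_csv` and `write_text_csv`, the file names, the example URLs and labels, and the UTF-8 encoding are assumptions, not documented requirements of the platform. Whether the URL file should have a header row is also not specified, so the sketch writes none by default.

```python
import csv


def write_url_csv(urls, path, header=None):
    """Write a single-column CSV with one image (or PDF) URL per row.

    Whether a header row is expected, and what it should be called, is an
    assumption; by default no header is written.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow([header])
        for url in urls:
            writer.writerow([url])


def write_text_csv(examples, path):
    """Write (text, label) pairs to a CSV with the columns "text" and "label".

    The csv module quotes fields that contain commas, quotes or line breaks,
    so each text stays in a single cell.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        writer.writeheader()
        for text, label in examples:
            writer.writerow({"text": text, "label": label})


if __name__ == "__main__":
    write_url_csv(
        ["https://example.com/images/001.jpg", "https://example.com/images/002.jpg"],
        "image_urls.csv",
    )
    write_text_csv(
        [
            ("The delivery arrived two days late.", "complaint"),
            ("Thanks, the support team was great!", "praise"),
        ],
        "training_text.csv",
    )
```

Letting the `csv` module write the file, rather than joining strings by hand, avoids broken rows when a text contains a comma or a line break.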


Feel free to reach out to us; we are happy to support you! We are constantly working to make the process more self-explanatory, and your feedback is important to us.
