Overview of the dataset used in this case study.
Explaining the DataNeuron Pipeline
The DataNeuron Pipeline consists of seven stages: Ingest, Structure, Validate, Train, Predict, Deploy, and Iterate.
Results of our Experiment
Reduction in SME Labeling Effort
In an in-house project, SMEs must read every paragraph in the dataset to determine which ones belong to the 8 classes mentioned above. This typically takes a tremendous amount of time and effort.
Using the DataNeuron ALP, the algorithm performed strategic annotation on roughly 15,000 raw paragraphs, filtered the corpus down to the paragraphs likely to belong to the 8 classes, and surfaced 659 of them to the user for validation.
At an estimated 45 seconds per paragraph, an in-house project would need roughly 188 hours just to annotate them all.
Difference in paragraphs annotated between an in-house solution and DataNeuron.
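As a quick sanity check, the 188-hour figure follows directly from the stated 45-second estimate:

```python
# In-house estimate: every raw paragraph must be annotated by hand.
paragraphs = 15067           # total paragraphs in the dataset
seconds_each = 45            # estimated annotation time per paragraph

hours = paragraphs * seconds_each / 3600
print(f"{hours:.0f} hours")  # -> 188 hours
```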
Advantage of Suggestion-Based Annotation
Instead of making users comb through the entire dataset to label every paragraph for each class, DataNeuron uses a validation-based approach that makes the model training process considerably easier.
The platform presents users with a list of automatically annotated paragraphs that are most likely to belong to the same class, selected through context-based filtering and analysis of the masterlist. Users simply validate whether each system-labeled paragraph belongs to the stated class.
This validation-based approach also reduces the time spent per paragraph: we estimate it takes approximately 30 seconds for a user to decide whether a paragraph belongs to a particular class.
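Conceptually, the workflow is a loop over system-suggested labels that the user accepts or rejects. The sketch below is a minimal mock-up of that idea, not the DataNeuron API; every name in it is hypothetical.

```python
# Minimal sketch of a suggestion-based (validation-first) annotation loop.
# This is NOT the DataNeuron API; all names below are hypothetical.

from dataclasses import dataclass

@dataclass
class Suggestion:
    paragraph: str        # raw text shown to the validator
    predicted_class: str  # class proposed by context-based filtering

def validate(suggestions: list[Suggestion]) -> list[tuple[str, str]]:
    """Ask the user to confirm or reject each suggested label."""
    accepted = []
    for s in suggestions:
        print(f"\nParagraph: {s.paragraph[:200]}")
        answer = input(f"Belongs to '{s.predicted_class}'? [y/n] ")
        if answer.strip().lower() == "y":
            accepted.append((s.paragraph, s.predicted_class))
    return accepted

# The user only judges the pre-filtered suggestions (659 in this study)
# instead of reading the full ~15,000-paragraph corpus.
```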
At that rate, validating the 659 paragraphs surfaced by the DataNeuron ALP would take an estimated 6 hours. Compared with the 188 hours an in-house team would need to complete the annotation, DataNeuron offers a staggering 96.8% reduction in time spent.
Difference in time spent annotating between an in-house solution and DataNeuron.
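The validation-side estimate and the headline reduction follow the same arithmetic, using the 30-second-per-validation assumption stated above:

```python
import math

# ALP estimate: only the 659 surfaced paragraphs need validation.
validated = 659
seconds_each = 30                              # estimated validation time per paragraph

alp_hours = validated * seconds_each / 3600    # ~5.5 h, rounded up to 6 h in the text
in_house_hours = 188                           # from the calculation above

reduction = (in_house_hours - math.ceil(alp_hours)) / in_house_hours
print(f"{alp_hours:.1f} hours of validation")  # -> 5.5
print(f"{reduction:.1%} time reduction")       # -> 96.8%
```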
The Accuracy Tradeoff
In this case study, the model trained through the DataNeuron ALP achieved an accuracy of 93.9%, while the model trained in the in-house project achieved 98.2%.
The large savings in annotation time can offset this small gap in accuracy, and the accuracy of the ALP-trained model can be increased further by validating more paragraphs.
Difference in accuracy between an in-house solution and DataNeuron.
Calculating the Cost ROI
An in-house project would need to annotate 15,067 paragraphs, at an approximate cost of $3,288.
With the DataNeuron ALP, only 659 paragraphs need to be annotated, since most of the paragraphs that did not belong to any of the 8 classes were discarded by context-based filtering. Annotating those 659 paragraphs on the DataNeuron ALP costs $575.
That is a significant 82.5% reduction in cost, for an estimated cost ROI of 471.8%.
Difference in cost between an in-house solution and DataNeuron.
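For completeness, here is how the two cost figures reduce to the quoted percentages:

```python
in_house_cost = 3288    # USD to annotate 15,067 paragraphs in-house
alp_cost = 575          # USD to validate 659 paragraphs on the ALP

savings = in_house_cost - alp_cost              # $2,713
cost_reduction = savings / in_house_cost
roi = savings / alp_cost

print(f"{cost_reduction:.1%} cost reduction")   # -> 82.5%
print(f"{roi:.1%} cost ROI")                    # -> 471.8%
```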
No Requirement for a Data Science/Machine Learning Expert
The DataNeuron ALP is designed so that no prior knowledge of data science or machine learning is needed to use the platform to its full potential.
A Subject Matter Expert may still be required for some very specific use cases, but for the majority of use cases an SME is not needed anywhere in the DataNeuron Pipeline.
