Splitting Hairs: Receipt Image Segmentation
Multiple receipts in a single image are a very common occurrence in VAT analysis. Employees often attach several receipts to a single page and scan them all together for a trip report. Another common case is a credit card slip attached to the tax invoice from a hotel or car rental company. In all these cases, the image contains more than one receipt, and each must be handled individually. Even when the information overlaps, as with the slip and the invoice, each document carries details the other lacks: the invoice has the VAT ID (absent from the slip) and the slip has the payment method (absent from the invoice). The point is that we almost always must split the image into individual receipts before continuing with the analysis.
Fortunately, we can apply the computer vision methods of bounding-box object detection and segmentation to perform this splitting. This means that we must train a model on annotated receipt data, which in turn means that we must obtain annotations. Overall, though, the problem is standard and has a ready solution, which we detail below. The solution lets us pre-process all incoming images automatically and split them before analysis, which greatly reduces manual processing time and effort.
Receipt Image Segmentation
There are two approaches to finding multiple receipts in images: Segmentation and Object Detection. In Segmentation we assume every pixel in the image belongs to exactly one class, e.g. Receipt or Background, and we try to classify each pixel. After producing a segmentation map, we can try to split it into individual connected components, that is, contiguous regions labeled “Receipt”. However, in most cases the receipts overlap, and because the segmentation assigns them the same class, they merge into a single large connected component that we cannot separate based on pixel classifications alone. To combat this we could use Instance Segmentation, but that requires a different kind of annotation: a separate pixel mask for every receipt.
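The merging problem is easy to reproduce on a toy mask. The following is a minimal sketch in plain Python, not our production code: the helper names `label_components` and `paint` and the tiny hand-drawn masks are illustrative assumptions. It labels 4-connected regions of “Receipt” pixels and shows that two separate receipts yield two components, while two overlapping receipts collapse into one.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of truthy cells; return (labels, count)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue  # background pixel, or already labeled
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def paint(mask, top, left, bottom, right):
    """Mark a rectangle of 'Receipt' pixels (bottom/right are exclusive)."""
    for y in range(top, bottom):
        for x in range(left, right):
            mask[y][x] = 1


# Two receipts with a gap of background between them.
separate = [[0] * 12 for _ in range(6)]
paint(separate, 1, 1, 5, 4)
paint(separate, 1, 7, 5, 10)

# Two receipts that overlap, as on a crowded scanned page.
overlapping = [[0] * 12 for _ in range(6)]
paint(overlapping, 1, 1, 5, 6)
paint(overlapping, 2, 4, 5, 10)

n_separate = label_components(separate)[1]
n_overlapping = label_components(overlapping)[1]
print(n_separate, n_overlapping)  # prints "2 1"
```

The second mask contains two receipts, yet the pixel labels alone give no way to tell where one ends and the other begins.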
The other, simpler and more economical solution is to use an object detector, the same widely used kind of model that detects animals, people or cars. To train these models, we need to annotate images with bounding boxes around all the separate documents in the image, overlapping or not. To split the image into individual receipts we effectively perform segmentation-by-detection: we treat each detected box as a region of the image to cut out.
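Segmentation-by-detection can be sketched in a few lines. This is an illustration under stated assumptions rather than our actual pipeline: the `Box` type, the `crop_receipts` helper, the 0.5 score threshold and the example detections are all hypothetical, and the image is a plain list of pixel rows so that the example needs no imaging library. Each sufficiently confident box is clamped to the image bounds and cut out as its own image.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical detector output: pixel coordinates plus a confidence score."""
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive
    score: float


def crop_receipts(image: List[list], boxes: List[Box], min_score: float = 0.5) -> List[List[list]]:
    """Return one cropped sub-image per confident detection."""
    height, width = len(image), len(image[0])
    crops = []
    for box in boxes:
        if box.score < min_score:
            continue  # drop low-confidence detections
        # Clamp the box to the image so slicing never goes out of range.
        left, top = max(0, box.left), max(0, box.top)
        right, bottom = min(width, box.right), min(height, box.bottom)
        if right <= left or bottom <= top:
            continue  # box lies entirely outside the image
        crops.append([row[left:right] for row in image[top:bottom]])
    return crops


# A 6x8 toy "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * y + x for x in range(8)] for y in range(6)]

detections = [
    Box(0, 0, 5, 4, 0.97),   # first receipt
    Box(3, 2, 10, 6, 0.91),  # second receipt, overlapping the first and past the edge
    Box(6, 0, 8, 1, 0.20),   # low-confidence detection, ignored
]

crops = crop_receipts(image, detections)
print(len(crops), [(len(c), len(c[0])) for c in crops])
```

Because each box is cropped independently, pixels in the overlap region appear in both crops, which is exactly what a pixel-level class map cannot express.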
The major challenges that we encounter in our data, which make it hard for our algorithms to work, are:
- Image backgrounds that are hard to tell apart from the receipts
- Flatbed scanning: invoices have no visible boundaries (white paper on a white background)
- Overlapping invoices
- Unknown number of invoices in the image
- Arrangements: Multiple directions, sizes, layouts, languages, etc.
Object detectors of this kind are often called “single shot detectors” when they predict all boxes in a single forward pass of the network, as YOLO and SSD do; Faster R-CNN is instead a two-stage detector that first proposes candidate regions and then classifies them. They’re also called “box detectors”, since they output rectangles rather than a pixel-level map. Despite this limitation, they are incredibly powerful, very fast and easy to train with existing tools. Well-known examples are Faster R-CNN, YOLO and SSD.
In our data we found that around 5% of images “suffer” from the multiple-receipts problem, which is a significant share of the images entering the pipeline. We trained an object detection algorithm on the multiple-receipt data we have and were able to achieve more than 95% accuracy in segmentation-by-detection.
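We do not spell out here how that accuracy figure is computed. One common way to score box detections, shown purely as an assumption and not as our exact metric, is to match predicted boxes to annotated ones by intersection-over-union (IoU) and count an annotated receipt as found when the IoU reaches a threshold. The function names, the 0.5 threshold and the example boxes below are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def matched_fraction(truth, predicted, threshold=0.5):
    """Fraction of annotated boxes greedily matched to a distinct prediction."""
    unused = list(predicted)
    hits = 0
    for t in truth:
        best = max(unused, key=lambda p: iou(t, p), default=None)
        if best is not None and iou(t, best) >= threshold:
            hits += 1
            unused.remove(best)  # each prediction may match only once
    return hits / len(truth) if truth else 1.0


truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
predicted = [(0, 0, 10, 8), (24, 0, 34, 10)]

print(iou(truth[0], predicted[0]))         # 0.8: a slightly short box still matches
print(matched_fraction(truth, predicted))  # 0.5: the second box is shifted too far
```

A prediction that clips part of a receipt lowers the IoU without necessarily making the crop unusable, so a threshold-based score like this one treats some tolerable partial extractions as errors.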
Using the detection model, we were able to fully automate this part of the pipeline, which resulted in a major drop in our costs related to image cropping. The remaining “errors” (under 5%) are usually partial extractions, which are tolerable in many cases.
Once again, computer vision models have made a big impact on our business. A straightforward implementation of off-the-shelf algorithms has completely replaced a painful part of our data ingest pipeline, driving costs lower and introducing automation. We were able to increase our data intake without affecting any other part of the business chain. In this case, AI and Machine Learning were clear winners for us and our clients.