We process 40-45K cards every day.
Before they hit the grading room, they need to be identified, which is one of the time-consuming parts of the grading process. Text-based search engines shorten the time our research team spends identifying cards, but not enough to help them clear our backlogs. This raised a few questions that needed answers: What can make the process faster? Is there a way to identify cards more accurately while also being more time-efficient?
Before diving into how I answered these questions, let me introduce myself. My name is Imir Kalkanci and I have been working at Collectors Universe for the last 7 years. I joined the company with the collectors.com project, where I helped develop a search engine that aggregates millions of collectible items from various sources. Most recently, I’ve been focusing on computer vision and scalable image management solutions.
A couple of years ago, I started working on a new project to answer the questions mentioned above. The main idea is to use card images to build a reverse image search engine, similar to Google and TinEye reverse image search. Users search with an image as the starting point, rather than a written search query. We can even automate the process and assign a suggested category before the research team receives the card.
Early implementation
Since we’re looking for exact matches, my first thought was to use an image hashing algorithm. For a little more context, image hashing is a technique that converts an image into a fixed-size alphanumeric value. I calculated hash values for some sample images using a perceptual hash library, pHash, and checked how well they matched query images. I found that this method is relatively fast and performs well at finding resized images; however, other transformations such as brightness changes or rotation lead to poor performance. With this knowledge, I iterated and tried a different approach.
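To make the hashing idea concrete, here is a minimal sketch in plain Python. It is not the pHash library: pHash uses a DCT-based hash, while this sketch swaps in a simpler average hash, which still turns an image into a fixed-size value that can be compared by Hamming distance. The synthetic gradient image and the function names are assumptions made for illustration only.

```python
def average_hash(pixels, hash_size=8):
    """Return a (hash_size * hash_size)-bit integer hash of a grayscale image.

    `pixels` is a list of rows of brightness values. Both dimensions are
    assumed to be at least `hash_size`.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to hash_size x hash_size by averaging blocks.
    for i in range(hash_size):
        y0, y1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            x0, x1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    # Each bit records whether a cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic 32x32 image: a horizontal gradient (hypothetical stand-in for a card scan).
original = [[x * 8 for x in range(32)] for _ in range(32)]
# The same image scaled up 2x by repeating every pixel and every row.
resized = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]
# The same image rotated 90 degrees clockwise.
rotated = [list(row) for row in zip(*original[::-1])]

print(hamming_distance(average_hash(original), average_hash(resized)))  # 0
print(hamming_distance(average_hash(original), average_hash(rotated)))  # 32
```

The resized copy hashes to the same value, while the rotated copy differs in half of its 64 bits, which mirrors the strength and the weakness described above.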
My second approach used more traditional computer vision algorithms such as Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB) to find visual features in the images. I would then compare the number of detected features that two images have in common. I spent some time with the ORB keypoint detector, and this method felt more and more promising. It is intended to be rotation invariant and deals well with noise, while being much faster than other feature descriptors such as SIFT and SURF. I generated a database of extracted keypoints and compared query images against it by calculating the distance between their keypoint descriptors. Unfortunately, as the amount of data grew, the quality of the search results decreased. Selecting the strongest and most meaningful keypoints became a challenge I did not originally anticipate. Cards that featured heavy text or busy imagery had increasingly poor results.
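The comparison step can be sketched as follows. ORB produces 256-bit binary descriptors, which are typically compared by Hamming distance. This sketch covers only the matching stage, in plain Python: the descriptors are random stand-ins rather than real ORB output (in practice a library such as OpenCV would extract them), and the function names and the `max_distance` threshold are assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def nearest(descriptor, candidates):
    """Return (index, distance) of the closest candidate descriptor."""
    distances = [hamming(descriptor, c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances[best]


def cross_checked_matches(query, reference, max_distance=64):
    """Brute-force matching: keep a pair only if each side is the other's
    nearest neighbor and the distance is small enough."""
    matches = []
    for qi, q in enumerate(query):
        ri, dist = nearest(q, reference)
        back, _ = nearest(reference[ri], query)
        if back == qi and dist <= max_distance:
            matches.append((qi, ri, dist))
    return matches


rng = random.Random(0)


def fake_descriptors(count):
    """Hypothetical stand-in for ORB output: random 256-bit descriptors."""
    return [rng.getrandbits(256) for _ in range(count)]


def add_noise(descriptors, flips=5):
    """Flip a few bits per descriptor to imitate a slightly different scan."""
    noisy = []
    for d in descriptors:
        for bit in rng.sample(range(256), flips):
            d ^= 1 << bit
        noisy.append(d)
    return noisy


card = fake_descriptors(20)
same_card_rescanned = add_noise(card)
different_card = fake_descriptors(20)

# The number of surviving matches acts as the similarity score.
print(len(cross_checked_matches(same_card_rescanned, card)))
print(len(cross_checked_matches(different_card, card)))
```

A rescanned copy of the same card keeps nearly all of its matches, while an unrelated card keeps almost none. Every query is compared against every stored descriptor here, so the cost grows with the size of the database.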

Finding the right solution
We are getting closer to the solution, so stick with me here. I continued my research and realized that a convolutional neural network (CNN) could be the answer I was looking for. I started with image classification models, selected a couple of categories, and trained a network. It really helped me understand the main concepts of machine learning, and search results were very good within this small dataset. However, labeling thousands of cards and finding enough images for training wasn’t scalable. One of the operations in this method converts input images into feature vectors. These are basically collections of a few thousand floating-point values that capture the most important and noticeable components of an image. Now this is something we definitely can use! I began extracting image features using pre-trained CNN models and created a database that allows me to run approximate nearest neighbor searches in high-dimensional space.
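Searching by feature vector can be sketched like this. The sketch makes several assumptions: the vectors are tiny made-up 4-value examples rather than real CNN features with a few thousand values, the card labels are hypothetical, and an exact brute-force cosine-similarity search stands in for the approximate nearest neighbor index, whose implementation is not described here.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


class FeatureIndex:
    """Toy in-memory index: exact search over every stored vector."""

    def __init__(self):
        self._items = []  # list of (label, unit-length vector)

    def add(self, label, vector):
        self._items.append((label, normalize(vector)))

    def search(self, query, k=3):
        """Return the k most similar items as (label, similarity), best first."""
        q = normalize(query)
        scored = [
            (label, sum(a * b for a, b in zip(q, vec)))
            for label, vec in self._items
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


# Hypothetical feature vectors; real ones would come from a pre-trained CNN.
index = FeatureIndex()
index.add("card_a", [0.9, 0.1, 0.0, 0.2])
index.add("card_b", [0.1, 0.8, 0.3, 0.0])
index.add("card_c", [0.0, 0.2, 0.9, 0.4])

# A query image whose features are close to card_a.
for label, similarity in index.search([0.85, 0.15, 0.05, 0.25], k=2):
    print(label, round(similarity, 3))
```

Exact search compares the query against every stored vector, which is fine for a handful of cards but slow for millions. An approximate nearest neighbor index trades a small amount of accuracy for much faster lookups at that scale.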
Today, our image search engine helps the research team identify items easily and efficiently. As our data grows, we see more opportunities for counterfeit and error detection. Personally, this project let me work in areas that I wasn’t familiar with. I had the opportunity to learn new techniques and tools that take the business to the next level.

Imir Kalkanci
Imir Kalkanci is a Staff Software Engineer at Collectors Universe. His love for programming started at a very young age. Since then he has been a part of various projects that he enjoyed working on. Astronomy and space is another passion of his and he loves stargazing and astrophotography.