
 

Optical Cash Registers

By Anthony Matarazzo

 

            Recently, I had the opportunity to study consumers from the sales clerk's side of the counter. I put on my trusty uniform, issued only in sparkling white and covered with the emblems of various billion-dollar businesses, and went to work. I easily mastered the cash register, but I was more interested in the people. I found that the general public includes a lot of different people: smelly, stinky, ugly, beautiful, intelligent, stupid, deaf, blind, repugnant, brilliant, impatient, impatient, impatient, impatient. Ahh, that is about the only problem I can solve for these rude rats: impatience, or rather, the time they spend in the store. I propose that by using optical recognition during the sales process, multiple objects can be rung up at once, saving the consumer precious time.

Impatience: the customers would sigh, tap their feet, cross their arms, and bounce in place. Even when the time to get out the door was less than a minute, there were many visible signs that improvements were needed. I found that some customers would leave the store when five or more people were in line. What was the holdup? Me? I was running the cash register at top speed, “click a ding ding”. But I did not let any of their sly remarks hurt my feelings (sob). Instead, I focused on what the customers actually did while they were in the store; that is, their physical actions.

The customers would line up in single file and, most of the time, place their items on the counter one by one. I noticed almost immediately that each customer typically had fewer than five items of various shapes, colors, and sizes. Drinks stood upright, with the occasional one spilled over, bags lay flat, and other items lay with their largest face down. From a visual perspective, the objects were clustered nearer the customer than the clerk. Most of the time a distinct space existed between the items, a natural artifact of the scene. Occasionally objects were stacked one on top of another, like two boxes of candy or two bags of peanuts.

In the current system, I would pick each item up, locate the UPC code, jiggle it in front of the bar code reader, and place it back on the counter after a Churrrrrp. The chirp means that the cash register has read the UPC. That is changing hands too many times: three hand changes (customer to counter to me to counter), not to mention the hand, elbow, and arm extension I had to perform to pick up the item.

In addition, approximately one out of every ten items had to be rescanned. Ahh, the plastic around the UPC had to be flattened out, or the packaging was too reflective, or the UPC was printed directly on the plastic seam of the bag. Sometimes I had to enter the UPC manually, which took a good thirty seconds. That time was lost to the consumer. While individually we may not think an extra thirty seconds is a big deal, to a convenience store customer standing in line it obviously matters. With my optical recognition system, designed specifically for the retail store transaction model, customers will wait less in line and leave the store sooner, which will attract repeat business.

Optical Cash Registers are just around the corner. It is no lie that when bar codes came out, many people would only go to the stores that had them so they could get out faster. Now that we have all grown accustomed to this checkout speed, we as consumers are still impatient. It is as if the goods are ours as soon as we place our hands on them, and paying for them is simply a byproduct of owning them. An optical cash register can reduce the sales cycle time because it can scan the items while they are on the counter and scan multiple objects at once.

Scanning the items while they are on the counter will save the clerk a lot of physical action. The clerk will not have to pick each item up, find the UPC, wave it in front of the scanner, and finally place it back on the counter or in a bag. This saves from three to thirty seconds per item. Scanning multiple objects at once will multiply the time savings. So if a customer has four items, they will be rung up in ONE second rather than thirty seconds or even a minute.
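To put rough numbers on the savings, here is a small C++ sketch of the arithmetic. It is a back-of-the-envelope model, not measured data: the per-item handling time, the one-in-ten rescan rate and the thirty-second rescan penalty are the figures quoted above, and the flat one-second optical capture is the assumed target. The function names are hypothetical.

```cpp
#include <cassert>

// Expected manual checkout time for one transaction (a rough model, not data).
//   handleSeconds: pick the item up, find the UPC, scan, put it down
//   rescanRate:    fraction of items needing a rescan or manual entry (~1 in 10)
//   rescanSeconds: extra time spent on each of those items (~30 s)
double expectedManualSeconds(int items, double handleSeconds,
                             double rescanRate, double rescanSeconds) {
    assert(items >= 0 && rescanRate >= 0.0 && rescanRate <= 1.0);
    return items * (handleSeconds + rescanRate * rescanSeconds);
}

// Time saved per transaction if the optical register needs a flat
// captureSeconds for the whole counter, however many items are on it.
double secondsSaved(int items, double handleSeconds, double rescanRate,
                    double rescanSeconds, double captureSeconds) {
    return expectedManualSeconds(items, handleSeconds, rescanRate, rescanSeconds)
           - captureSeconds;
}
```

For four items at 3 s each, with one-in-ten rescans costing 30 s, the model gives 4 × (3 + 0.1 × 30) = 24 s, against the assumed 1 s for the optical register.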

But how would one solve this problem? I propose that by matching colors, general shape, and selected patterns from a multi-camera feed, a highly efficient object recognition system can be developed for the retail market. A technical flowchart of the process is shown below.

The Main Loop of the process will first determine whether any items are in view. This can be accomplished with a single camera view. Detection can be achieved simply by comparing the current frame to a static background captured earlier, or by requiring the cashier to press a button when all items are in view. For robustness, the system should also take into account hands placed in the view and perhaps other objects, like a sheet of paper. Such stray objects should not be considered part of a transaction. The cashier could exclude them manually, or the system could automatically treat them as part of the static background, for example once a set amount of time has passed without the object being used in a sale. Alternatively, common counter and desk items could be included in the recognition database. There will not be many such items, as the sales clerk's life at the counter is extremely limited.
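The background comparison can be sketched in a few lines of C++. This assumes 8-bit grayscale frames of equal size from a fixed camera; the `Frame` type, the function names and both thresholds are hypothetical. The second function shows one way to let stray objects drift into the static background over time, using a simple running average, a common background-update technique swapped in here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// A grayscale frame from one fixed camera: 8-bit pixels, row-major.
struct Frame {
    int width = 0, height = 0;
    std::vector<std::uint8_t> pixels;
};

// Items are "in view" when enough pixels differ from the stored background.
// pixelThreshold and minChangedFraction are assumed tuning values.
bool itemsInView(const Frame& frame, const Frame& background,
                 int pixelThreshold = 30, double minChangedFraction = 0.01) {
    assert(frame.pixels.size() == background.pixels.size());
    if (frame.pixels.empty()) return false;
    std::size_t changed = 0;
    for (std::size_t i = 0; i < frame.pixels.size(); ++i) {
        const int diff = std::abs(int(frame.pixels[i]) - int(background.pixels[i]));
        if (diff > pixelThreshold) ++changed;
    }
    return static_cast<double>(changed) >=
           minChangedFraction * static_cast<double>(frame.pixels.size());
}

// Blend the current frame slowly into the background (a running average), so an
// object left on the counter without being sold, such as a sheet of paper,
// gradually becomes part of the background. rate is an assumed value in (0, 1].
void absorbIntoBackground(Frame& background, const Frame& frame, double rate = 0.05) {
    assert(frame.pixels.size() == background.pixels.size());
    assert(rate > 0.0 && rate <= 1.0);
    for (std::size_t i = 0; i < frame.pixels.size(); ++i) {
        const double blended = (1.0 - rate) * background.pixels[i] + rate * frame.pixels[i];
        background.pixels[i] = static_cast<std::uint8_t>(blended + 0.5);  // round to nearest
    }
}
```

In practice the main loop would call `absorbIntoBackground` only between sales, so that items being rung up are never absorbed.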

When a start of sale has been confirmed, all the cameras should be queried for their ImageProcessor object. The ImageProcessor object contains the image itself as well as the functions needed to operate on it. Color correction should be performed at this stage if needed; for example, it may be easier for the recognition routines to handle only a specific color depth. Because the environment is constrained, I am also sure that preparing the image in some way could speed up the recognition process and perhaps the color matching routines, for example by erasing image artifacts such as specular light reflections.
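As an illustration of this preprocessing stage, the sketch below reduces a 24-bit RGB image to a lower color depth and marks likely specular reflections, assuming that a highlight shows up as a pixel near saturation in all three channels. The `Rgb` type, the function names and the threshold of 240 are assumptions, not part of the design above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Keep only the top `bits` bits of each 8-bit channel (e.g. 4 bits gives 16
// levels per channel), so later routines work at one fixed color depth.
std::vector<Rgb> reduceColorDepth(const std::vector<Rgb>& image, int bits) {
    assert(bits >= 1 && bits <= 8);
    const int shift = 8 - bits;
    std::vector<Rgb> out;
    out.reserve(image.size());
    for (const Rgb& p : image)
        out.push_back({static_cast<std::uint8_t>(p.r >> shift),
                       static_cast<std::uint8_t>(p.g >> shift),
                       static_cast<std::uint8_t>(p.b >> shift)});
    return out;
}

// Flag likely specular highlights: pixels that are near saturation in all
// three channels. Masked pixels can then be ignored or filled from neighbours.
std::vector<bool> specularMask(const std::vector<Rgb>& image, std::uint8_t threshold = 240) {
    std::vector<bool> mask(image.size(), false);
    for (std::size_t i = 0; i < image.size(); ++i) {
        const Rgb& p = image[i];
        mask[i] = p.r >= threshold && p.g >= threshold && p.b >= threshold;
    }
    return mask;
}
```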

Next, the system has to determine where each of the objects is in the different camera views, matching each object's image in one view with its counterparts in the others. Remember that the background has been erased, so each object will be on a black background. Then each of the items must be identified in the corresponding areas of each camera's view. This is accomplished by the ItemMatcher object. It should also be understood that occlusion errors can be detected at this stage. For example, say that camera one has a view of two boxes and one bottle, but camera two shows only one box and one bottle. At this point, the occlusion flag should be set on the items. It is also possible that, given the different camera views, the system can resolve the occlusion; in this case the system should continue.
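Because the cameras are fixed, matching can be done by position. The sketch assumes each camera's blobs have already been mapped to counter-top coordinates in millimetres through a prior calibration. It then pairs blobs greedily by nearest distance and flags anything seen by only one camera as possibly occluded. `Blob`, `matchViews` and the 20 mm tolerance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// An object's footprint centre on the counter top, in millimetres. Assumes a
// prior calibration has mapped each camera's image blobs to these coordinates.
struct Blob {
    double x, y;
    bool occluded = false;  // set when the other camera has no matching blob
};

struct Match { std::size_t a, b; };  // index in view A, index in view B

// Greedy nearest-neighbour pairing between two camera views. Any blob that
// has no partner within toleranceMm is flagged as possibly occluded.
std::vector<Match> matchViews(std::vector<Blob>& viewA, std::vector<Blob>& viewB,
                              double toleranceMm = 20.0) {
    assert(toleranceMm > 0.0);
    std::vector<Match> matches;
    std::vector<bool> usedB(viewB.size(), false);
    for (std::size_t i = 0; i < viewA.size(); ++i) {
        bool found = false;
        std::size_t best = 0;
        double bestDist = toleranceMm;
        for (std::size_t j = 0; j < viewB.size(); ++j) {
            if (usedB[j]) continue;
            const double d = std::hypot(viewA[i].x - viewB[j].x, viewA[i].y - viewB[j].y);
            if (d <= bestDist) { bestDist = d; best = j; found = true; }
        }
        if (found) {
            usedB[best] = true;
            matches.push_back({i, best});
        } else {
            viewA[i].occluded = true;  // seen by camera A only
        }
    }
    for (std::size_t j = 0; j < viewB.size(); ++j)
        if (!usedB[j]) viewB[j].occluded = true;  // seen by camera B only
    return matches;
}
```

With camera one seeing blobs at (0, 0), (100, 0) and (200, 0), and camera two seeing only (1, 1) and (201, 0), the middle box is left unmatched and flagged, which is the two-boxes-versus-one case described above. Greedy pairing is the simplest choice; an optimal assignment (for example, the Hungarian algorithm) would be more robust when items sit close together.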

Once each item has been resolved in each of its views, the system should start a recognition thread for each item. I recommend a thread pool with a limit on the maximum number of threads.
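A minimal bounded pool can be sketched with standard C++ threads. A fixed number of workers pull item indices from a shared atomic counter, so the number of threads never exceeds the limit no matter how many items are on the counter. `recognizeAll` and its parameters are hypothetical names, and the per-item recognition function is passed in.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs recognize(i) for every item index 0..itemCount-1 on at most maxThreads
// threads and returns the results in item order. recognize must be safe to
// call concurrently. Result must be default-constructible and must not be
// bool (std::vector<bool> packs bits, so neighbouring writes would race).
template <typename Result, typename Recognize>
std::vector<Result> recognizeAll(std::size_t itemCount, std::size_t maxThreads,
                                 Recognize recognize) {
    assert(maxThreads > 0);
    std::vector<Result> results(itemCount);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        // Each fetch-and-increment hands out a unique index, so every slot is
        // written by exactly one thread.
        for (std::size_t i = next++; i < itemCount; i = next++)
            results[i] = recognize(i);
    };
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < std::min(maxThreads, itemCount); ++t)
        pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
    return results;
}
```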

Once the recognition threads have completed, the results should be checked to ensure all items have been found. If not, this should be reported to the cashier so the items can be moved around: the cashier might have to spread them out or unstack them so that one item does not occlude another. Just as a punishment to the consumer, we could embarrass them with a naughty buzz. It will eventually become common custom for customers to yell at one another rather than at the poor sales clerk: “Hey, fix your occlusion!”

The recognition thread (figure C) will do most of the complex work for the system. There will be one thread per item. The 3D image generation occurs at this level, and its functional correctness is critical to the success of the system. It generates a series of X, Y, Z points from the camera views.

3D image generation from a digital source is relatively new, and many research studies are currently being conducted to find the best method. However, most of these studies deal with unconstrained environments, that is, environments whose background varies. For this project, many facts about the environment are known and can be controlled, and they will have a direct effect on performance. For example, simply having a known background will automatically isolate the specific objects we want to recognize, so the system will only work with items that need to be scanned. Secondly, since the camera angles of our setup will be known, finding and matching significant points will be much faster. I am investigating the article “3D Object Modeling and Recognition Using Local Affine-Invariant Image” by Fred Rothganger, Svetlana Lazebnik, Cordelia Schmid and Jean Ponce for possible modifications for this fixed environment, perhaps using a weighted triangle reduction as a preprocessing step for the recognition engine.
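Because the camera geometry is fixed and known, the simplest way to get X, Y, Z points is classic rectified stereo triangulation between two cameras. This is a different and much simpler technique than the affine-invariant method in the article above. Depth is Z = f·B / d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity xLeft - xRight. The sketch assumes the two images have already been rectified so that matching points lie on the same row; `StereoRig`, its fields and `triangulate` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

// Hypothetical calibration of two rectified cameras with parallel optical
// axes, so a point appears on the same image row in both views.
struct StereoRig {
    double focalPx;     // focal length, in pixels
    double baselineMm;  // distance between the two camera centres
    double cx, cy;      // principal point of the left camera, in pixels
};

struct Point3 { double x, y, z; };  // millimetres, left-camera frame

// Rectified-stereo triangulation: Z = f * B / d with disparity d = xLeft - xRight,
// then X and Y by back-projecting the left pixel. Non-positive disparity means
// a bad match (or a point at infinity), so no point is returned.
std::optional<Point3> triangulate(const StereoRig& rig, double xLeft, double yLeft,
                                  double xRight) {
    assert(rig.focalPx > 0.0 && rig.baselineMm > 0.0);
    const double disparity = xLeft - xRight;
    if (disparity <= 0.0) return std::nullopt;
    const double z = rig.focalPx * rig.baselineMm / disparity;
    return Point3{(xLeft - rig.cx) * z / rig.focalPx,
                  (yLeft - rig.cy) * z / rig.focalPx,
                  z};
}
```

For example, with f = 800 px, B = 100 mm and a disparity of 80 px, the point lies 1000 mm from the cameras.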

Next, the GeneralShape object will determine the object's form by analyzing the points gathered from the 3D image. This analysis will determine sub-shape characteristics of the product. A typical characteristic might be that the item is narrow and closed at the top while it has a large base on the bottom, which suggests it might be a bottle. Or perhaps it is box-shaped, which might lead to the conclusion that it is a box of candy. Most of the analysis routines will use the distance between two or more points to determine the existence of these characteristics, which is why the 3D image is so important.
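One way to turn “narrow at the top, large base” into a distance test is to compare the horizontal extent of the points in the top band of the object with that of the bottom band. The sketch assumes the point cloud's Z axis is height above the counter; the enum, the 20% bands and the ratio thresholds are illustrative assumptions, not tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point3 { double x, y, z; };  // z = height above the counter, in mm

enum class GeneralShape { Bottle, Box, Other };

// Horizontal extent (largest of the X and Y spans) of the points whose
// height lies in [zLow, zHigh]; 0 if no point falls in the band.
double bandWidth(const std::vector<Point3>& pts, double zLow, double zHigh) {
    bool any = false;
    double minX = 0, maxX = 0, minY = 0, maxY = 0;
    for (const Point3& p : pts) {
        if (p.z < zLow || p.z > zHigh) continue;
        if (!any) { minX = maxX = p.x; minY = maxY = p.y; any = true; }
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    return any ? std::max(maxX - minX, maxY - minY) : 0.0;
}

// Illustrative rule: a top much narrower than the base suggests a bottle;
// roughly equal widths suggest a box. Bands and thresholds are assumptions.
GeneralShape classify(const std::vector<Point3>& pts) {
    assert(!pts.empty());
    auto [lo, hi] = std::minmax_element(pts.begin(), pts.end(),
        [](const Point3& a, const Point3& b) { return a.z < b.z; });
    const double zMin = lo->z, zMax = hi->z;
    const double band = 0.2 * (zMax - zMin);
    const double top = bandWidth(pts, zMax - band, zMax);
    const double bottom = bandWidth(pts, zMin, zMin + band);
    if (bottom <= 0.0) return GeneralShape::Other;
    const double ratio = top / bottom;
    if (ratio < 0.5) return GeneralShape::Bottle;
    if (ratio > 0.9 && ratio < 1.1) return GeneralShape::Box;
    return GeneralShape::Other;
}
```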

Knowing the general shape of an object will greatly reduce the amount of data that has to be searched. In addition, some shapes have properties that let the Emblem object work more efficiently; for example, on most bottles the label is in the upper half of the bottle.

The Emblem object will scan the 3D Image object to find unique characteristics incorporated into the package design. This could be the company logo, the lettering type, a UPC box, a specific gradient pattern on the item, or other visual distinctions. Since multiple emblems will be stored in the database for each product, the chance of a match will be greater, because this takes into account that sometimes the object's back is facing the camera.
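To show how storing several emblems per product helps, the sketch scores an observed patch against each stored sample with normalized cross-correlation and keeps the best score. Normalized cross-correlation is a standard template-matching score that ignores uniform brightness and contrast changes. The sketch assumes the patches have already been located, normalized in orientation and resampled to the same size; `Patch`, `ncc` and `bestEmblemScore` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Patch = std::vector<double>;  // grayscale samples, row-major, fixed size

// Normalized cross-correlation in [-1, 1]: 1 means the same pattern, even if
// one patch is uniformly brighter or has higher contrast than the other.
double ncc(const Patch& a, const Patch& b) {
    assert(a.size() == b.size() && !a.empty());
    double meanA = 0, meanB = 0;
    for (std::size_t i = 0; i < a.size(); ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= a.size();
    meanB /= b.size();
    double cross = 0, varA = 0, varB = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA, db = b[i] - meanB;
        cross += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0 || varB == 0) return 0.0;  // a flat patch has no pattern to match
    return cross / std::sqrt(varA * varB);
}

// Best score over all stored emblem samples for one product (front, back,
// logo, UPC box, ...), so a product facing away can still match on its back.
double bestEmblemScore(const Patch& observed, const std::vector<Patch>& samples) {
    double best = -1.0;
    for (const Patch& s : samples) best = std::max(best, ncc(observed, s));
    return best;
}
```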

The last criterion gathered from the 3D image will be a color sample. The color sample is the standard deviation computed over n × n pixel cells across the entire area of the object. Using the overall color of the object, or its top two colors, will also reduce the amount of data that has to be matched. For example, a bottle composed mostly of black and red would probably be a Coca-Cola product.
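The grid color sample can be sketched as follows: split one color channel of the object's image into an n × n grid and record each cell's mean and standard deviation. The row-major `std::vector<double>` layout, `CellStats` and `gridSamples` are assumptions for illustration; a real implementation would run the function once per channel, over the object's pixels only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct CellStats { double mean, stddev; };

// Split one channel of an image (row-major, width x height) into an n x n grid
// and return the mean and standard deviation of each cell, row by row.
// Requires width >= n and height >= n so that no cell is empty.
std::vector<CellStats> gridSamples(const std::vector<double>& channel,
                                   int width, int height, int n) {
    assert(n > 0 && width >= n && height >= n);
    assert(channel.size() == static_cast<std::size_t>(width) * height);
    std::vector<CellStats> cells;
    for (int gy = 0; gy < n; ++gy) {
        for (int gx = 0; gx < n; ++gx) {
            // Integer cell bounds; cells differ by at most one pixel in size.
            const int x0 = gx * width / n, x1 = (gx + 1) * width / n;
            const int y0 = gy * height / n, y1 = (gy + 1) * height / n;
            double sum = 0, sumSq = 0;
            int count = 0;
            for (int y = y0; y < y1; ++y)
                for (int x = x0; x < x1; ++x) {
                    const double v = channel[static_cast<std::size_t>(y) * width + x];
                    sum += v;
                    sumSq += v * v;
                    ++count;
                }
            const double mean = sum / count;
            // Population variance; clamp tiny negative values from rounding.
            const double variance = std::max(0.0, sumSq / count - mean * mean);
            cells.push_back({mean, std::sqrt(variance)});
        }
    }
    return cells;
}
```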

The final stage of the recognition thread will be to search the database. The product information will be stored in a relational database such as MSDE, but efficient key searching must be maintained. The database will contain the size, general shape, overall colors, color sampling, and emblem samples. If the developer so chooses, a specialized structure may be used. I recommend using compiled SQC or direct ODBC calls for efficiency. The primary key should be the general shape, followed by overall color and size. I believe this will return a small dataset that can then be matched for likeness using the emblem and color samples. The likeness comparison is processor intensive, as every record returned in the dataset has to be compared and graded for statistical likeness. The top-graded record is returned.
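The two-stage search, a cheap key filter followed by an expensive likeness grade on the survivors, can be sketched in memory before committing to MSDE or ODBC. `Product`, its fields, the 10 mm size tolerance and the use of Euclidean distance between color samples as the likeness grade are all assumptions; emblem scores could be folded into the same grade.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical product row. In the proposal this lives in a relational
// database keyed on general shape, overall color and size.
struct Product {
    std::string name;
    int generalShape;                  // e.g. a GeneralShape enum value
    unsigned overallColor;             // quantized dominant color
    double heightMm;                   // measured size
    std::vector<double> colorSamples;  // grid means from the color sampling step
};

// Likeness grade: Euclidean distance between color-sample vectors (lower is
// more alike). An assumed metric; emblem scores could be blended in.
double colorDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
}

// Stage 1 filters on the key (shape, overall color, size within tolerance);
// stage 2 grades only the surviving rows and returns the most alike one.
std::optional<Product> findBest(const std::vector<Product>& db, const Product& observed,
                                double sizeToleranceMm = 10.0) {
    std::optional<Product> best;
    double bestDistance = 0;
    for (const Product& p : db) {
        const bool keyMatches = p.generalShape == observed.generalShape &&
                                p.overallColor == observed.overallColor &&
                                std::abs(p.heightMm - observed.heightMm) <= sizeToleranceMm;
        if (!keyMatches) continue;
        const double d = colorDistance(p.colorSamples, observed.colorSamples);
        if (!best || d < bestDistance) { best = p; bestDistance = d; }
    }
    return best;
}
```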

            After the recognition thread completes, the database should be searched for specials on the item (see figure D). This information could be stored along with the item's description in the database. If the flag is set for a two-for special, the price should be updated to reflect it.
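A two-for special reduces to simple arithmetic once the deal is stored with the item. The sketch prices a quantity as full bundles at the special price plus leftovers at the regular price, in cents to avoid rounding errors. `Special` and `priceWithSpecial` are hypothetical; grouping different products (for example, several Coke products) under one deal would mean passing their combined count.

```cpp
#include <cassert>

// Hypothetical "quantity for price" special stored with an item,
// e.g. {2, 300} for two for $3.00. Amounts are in cents.
struct Special {
    int quantity;     // units per bundle; 0 or 1 means no special
    long priceCents;  // price of one full bundle
};

// Price `count` units: full bundles at the special price, leftovers at the
// regular unit price. Integer cents avoid floating-point rounding.
long priceWithSpecial(int count, long unitCents, const Special& special) {
    assert(count >= 0 && unitCents >= 0);
    if (special.quantity <= 1) return count * unitCents;
    const long bundles = count / special.quantity;
    const long leftovers = count % special.quantity;
    return bundles * special.priceCents + leftovers * unitCents;
}
```

For example, three units at $1.99 under a two-for-$3.00 special come to 300 + 199 = 499 cents.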

            Next, the total should appear on both the cashier's screen and the customer's screen. The appropriate functions for entering the amount received from the customer should be readily available, as described in the interface requirements section of this document.

            One important feature of this system is that the inventory for specific items will be more accurate. This is because, when a customer chooses products based on a two-for-one deal or a special price, the cashier currently scans only one of the items while putting the register into a special sale. With optical recognition, the inventory of both specific items will be updated rather than just one.

 

So now with this model the customer’s experience would be:

 

In conclusion, general shape, emblem, and general color are the criteria that would be used to cross-reference a database, similar to a UPC code. Key matching will not be done on an exact basis but on “likeness”, using a statistical grading system. The system will, of course, have to exist in software first. Then I would use a system profiler to decide which components can be built in hardware. I believe NVIDIA or Intel may be able to put some of the image recognition algorithms into hardware; after all, they are really just image processing routines. Remember, it must run in real time, and with the process I described above, it can be a reality.

 

 

 

Programmer's View of the Process of Recognition 

 

1.      Trace the edges of the image and find each distinct object in the view. This would make a great object in C++: a set of routines that, given a static background, locates the differences from that background. Separate the pixels of each sub-image into a corresponding color bitmap buffer, and place each item into an STL vector or a container of the programmer's choice. Inherit from a COM model that permits ATL collections for ease of interoperability. The object must also be able to run as a thread. For ease of use I will call it ImageProcessor. It must also decide, from the light, shadow, and edges, whether two objects are on top of one another, and flag this with a property inside the STL vector.

2.      Write an object that gathers information from a digital camera; we will call it Camera. Provide object connections between ImageProcessor and Camera. That is, the image data must be in its own object and must be compatible throughout the system, as described in 3.

3.      Write an object that can hold an image and perform color balancing on it. It must also be able to perform color balancing on certain sections of the image, including oddly shaped ones, without affecting the other parts. It should support saturation, contrast, sharpness, edge tracing, blurring, and most of the other convolution filters. There may be a third-party tool for this, but I would prefer owning the code. For simplicity I will call the object Image. The object must also be able to find the edges of the object and rotate the image so that it faces a normalized, that is, predetermined, direction. A set of routines must be written that decide which part of the object is the top and which is the bottom. The object must also synchronize the image with its counterpart image from the other camera. Alternatively, the orientation problem could be solved with a bucket approach, placing north-, east-, south-, and west-facing images in the database.

4.      Write an object that starts threads that read the cameras. Each read from a camera reports whether anything is in the view. This will be the main process, and it will loop until an object is located. After an object has been located, all of the other camera objects will be queried for their ImageProcessor object, so there should be one ImageProcessor object for each camera.

5.      Now that each ImageProcessor object has been retrieved, it can be shipped to an object that locates each view of the object across the ImageProcessors, that is, the multiple pictures of each customer item. It could be called ObjectMatcher. This can easily be achieved using known location information relative to a known reference point. The ObjectMatcher must also take into account that some objects may be on top of one another, using the information gathered from the Image object (see above).

6.      Combine these images to create a three-dimensional map of the object, or a one-eighty view. This will be the specialized image object used by the recognition routines. It will use the Image object's methods to normalize the color set of the given object. Rotate and resize the images with high-quality filters so that all three images share the same orientation and scale. It will contain information describing the size of the item, its overall color (what are the top two colors?), and the standard deviation of a given area. The sizing can be accomplished by placing a measurement picture during setup, or better still, a grid made part of the counter top so that the system is self-calibrating. This object will be called 3DImage. One thing, however: I know algorithms already exist that perform this function. The point is that I do not believe we need an enormous number of points to decide the GeneralShape described below.

7.      The 3DImage object will be shipped first to the GeneralShape object. This object will decide whether the object is a Bag, Bottle, Soda Bottle, Candy Box, Candy Bag, Cookies, or Other. Using these specific terms, instead of basic 3D primitives or a combination of them, should tell the developer that the GeneralShape object describes a higher-level object, a real-world one. Thinking of it this way will greatly improve the searching efficiency of the system. The GeneralShape object will have a property that reports the real-world object. Someone will have to run an inventory to determine the general shapes of items in convenience stores.

8.      Next, the 3DImage object will be shipped to the EmblemObject, which will identify product lettering as well as company logos. I believe that a set of routines can be developed that arrive at the same emblem location each time an image is scanned. That means that internally an emblem may not be the actual company logo, but rather an identifying mark or series of marks that make the object most unique in color. Perhaps the search set could be reduced by reducing each emblem to a shape type.

9.      Perform color sampling of the 3DImage over an N × N grid. This information will be used as key matches.

10.  A database will exist that contains the size, general shape, and several associated emblem marks for each product. Scan the database and find the most likely match given these criteria. Special image comparisons should be done not for exactness but for likeness. Each search will reduce the dataset, but statistics on the matching color samples from step 9 must be kept so that the top matching item is picked. So the key set will be GeneralShape, EmblemObject, and ColorSamples.

11.  Perform this function for each item in the field of view.

12.  Use the standard cash register sales technique for reporting funds to collect, change, etc.
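Step 1 can be sketched without COM or ATL as a plain C++ function. Pixels that differ from the static background are grouped into 4-connected regions with a breadth-first flood fill, and each region becomes one item with a bounding box. The frame layout (8-bit grayscale, row-major), `ItemBlob`, `findItems` and both thresholds are assumptions; the `stacked` flag is only a placeholder for the light, shadow and edge analysis the step describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <queue>
#include <vector>

// One detected item: bounding box (inclusive, in pixels) and pixel count.
struct ItemBlob {
    int minX, minY, maxX, maxY;
    int pixelCount;
    bool stacked = false;  // placeholder for the light/shadow/edge stacking test
};

// Group pixels that differ from the static background into 4-connected
// regions (breadth-first flood fill). Regions under minPixels are treated
// as noise. Frames are 8-bit grayscale, row-major, width x height.
std::vector<ItemBlob> findItems(const std::vector<std::uint8_t>& frame,
                                const std::vector<std::uint8_t>& background,
                                int width, int height,
                                int threshold = 30, int minPixels = 4) {
    assert(frame.size() == background.size());
    assert(frame.size() == static_cast<std::size_t>(width) * height);
    auto isForeground = [&](int i) {
        return std::abs(int(frame[i]) - int(background[i])) > threshold;
    };
    std::vector<bool> visited(frame.size(), false);
    std::vector<ItemBlob> items;
    for (int start = 0; start < width * height; ++start) {
        if (visited[start] || !isForeground(start)) continue;
        ItemBlob blob{width, height, -1, -1, 0};
        std::queue<int> todo;
        todo.push(start);
        visited[start] = true;
        while (!todo.empty()) {
            const int i = todo.front();
            todo.pop();
            const int x = i % width, y = i / width;
            blob.minX = std::min(blob.minX, x);
            blob.maxX = std::max(blob.maxX, x);
            blob.minY = std::min(blob.minY, y);
            blob.maxY = std::max(blob.maxY, y);
            ++blob.pixelCount;
            const int nx[4] = {x - 1, x + 1, x, x};
            const int ny[4] = {y, y, y - 1, y + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || ny[k] < 0 || nx[k] >= width || ny[k] >= height) continue;
                const int j = ny[k] * width + nx[k];
                if (!visited[j] && isForeground(j)) {
                    visited[j] = true;
                    todo.push(j);
                }
            }
        }
        if (blob.pixelCount >= minPixels) items.push_back(blob);
    }
    return items;
}
```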
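For the region-limited color balancing in step 3, the sketch applies gray-world white balance only to pixels selected by a mask, so an oddly shaped region can be corrected without touching the rest of the image. Gray-world balancing scales each channel so that its mean matches the overall gray level; it is one simple, well-known method chosen here for illustration. `Pixel` and `balanceMasked` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Pixel = std::array<double, 3>;  // R, G, B in [0, 255]

// Gray-world white balance restricted to pixels where mask[i] is true:
// each channel is scaled so that its mean inside the region equals the
// region's overall gray level. Pixels outside the mask are not modified.
void balanceMasked(std::vector<Pixel>& image, const std::vector<bool>& mask) {
    assert(image.size() == mask.size());
    Pixel sum{0.0, 0.0, 0.0};
    std::size_t count = 0;
    for (std::size_t i = 0; i < image.size(); ++i) {
        if (!mask[i]) continue;
        for (int c = 0; c < 3; ++c) sum[c] += image[i][c];
        ++count;
    }
    if (count == 0) return;  // empty region: nothing to balance
    const double gray = (sum[0] + sum[1] + sum[2]) / (3.0 * count);
    Pixel gain{1.0, 1.0, 1.0};
    for (int c = 0; c < 3; ++c) {
        const double channelMean = sum[c] / count;
        if (channelMean > 0.0) gain[c] = gray / channelMean;
    }
    for (std::size_t i = 0; i < image.size(); ++i) {
        if (!mask[i]) continue;
        for (int c = 0; c < 3; ++c)
            image[i][c] = std::clamp(image[i][c] * gain[c], 0.0, 255.0);
    }
}
```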
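The self-calibrating grid from step 6 gives a scale factor directly. If the printed lines are a known distance apart, the pixel positions of those lines give millimetres per pixel, which then converts any object extent to real size. The sketch assumes a camera looking straight down at the counter (or an image already rectified to the counter plane); `mmPerPixel` and `sizeMm` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Millimetres per pixel from the detected x positions (pixels) of consecutive
// grid lines whose real spacing is spacingMm. Using the first-to-last span
// averages out small detection errors on individual lines.
double mmPerPixel(const std::vector<double>& gridLineXs, double spacingMm) {
    assert(gridLineXs.size() >= 2 && spacingMm > 0.0);
    const double spanPx = gridLineXs.back() - gridLineXs.front();
    assert(spanPx > 0.0);
    return spacingMm * static_cast<double>(gridLineXs.size() - 1) / spanPx;
}

// Convert an extent measured in pixels to millimetres with that scale.
double sizeMm(double extentPx, double scaleMmPerPx) {
    return extentPx * scaleMmPerPx;
}
```

With lines detected at x = 100, 150, 200 and 250 px and a 10 mm spacing, the scale is 30 mm / 150 px = 0.2 mm per pixel, so an item 300 px wide is about 60 mm wide.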

 

 

Further Research

           

a.      Why Three D?

b.      What advantages does a system like this have over the current system?

c.      Cash Register Functions

d.      The existing Cash Register and hardware requirements for the new system.

e.       Displaying a map that informs the cashier of occluded items.

f.        Running an inventory to show the computer system the objects it will be working with.

i.         Inventory will be better tracked because the actual item is rung up. Sometimes, when a customer has two or more items, the cashier will press the two-for-one key and scan only one of the items. Often the items are different inventory items, for example, different Coke products.

g.       Building the database.

h.       Cost of system development

i.        Competitors and existing products

j.        Integration with current systems.

k.      How would this product make money and pay for itself?

l.       Rollout and installation requirements

m.     Training using onsite and multimedia course.

n.      Manager functions for inventory: historical and statistical ordering methods that adapt over the lifetime of the system's use. Since the inventory is tracked better, the system can also report which items the manager will probably have to order.

o.      Safe management

p.      Oil sales

q.      Credit Card and Debit Card processing

r.        Feasibility.

s.       What resources are needed to complete the project.

 

 

 

