This paper studies the information content of postal address fields for automatic address interpretation. The information provided by combinations of address components, and the interaction of information among components, are characterized in terms of Shannon's entropy. Assignment strategies for determining a delivery point code can be compared for efficiency by how uncertainty propagates through the address components. The redundancy between components can be computed from the information each provides, which helps in choosing the component best suited to recovering the value of an uncertain one. The uncertainty of a component given another, known component is measured by conditional entropy. Ranking these uncertainties yields an effective processing flow for determining the value of a candidate component.
Sargur N. Srihari, Wen-jann Yang, Venu Govindaraju
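
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy records, the field names (`zip`, `street`, `number`), and every function name are hypothetical, chosen only for illustration. Redundancy is taken here to be the mutual information I(A;B) = H(A) + H(B) - H(A,B), which is an assumption about the paper's definition. Conditional entropy is computed as H(A|B) = H(A,B) - H(B), and candidate components are ranked by how little uncertainty they leave in the target.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (zip, street, number); not data from the paper.
RECORDS = [
    ("14228", "Main St", "10"),
    ("14228", "Main St", "12"),
    ("14228", "Oak Ave", "10"),
    ("14260", "Elm Rd", "10"),
    ("14260", "Elm Rd", "12"),
    ("14260", "Elm Rd", "14"),
    ("14221", "Main St", "10"),
    ("14221", "Park Pl", "12"),
]
FIELDS = {"zip": 0, "street": 1, "number": 2}


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def column(records, name):
    """Extract one address component from every record."""
    return [r[FIELDS[name]] for r in records]


def joint_entropy(records, a, b):
    """H(a, b): entropy of the paired values of two components."""
    return entropy(list(zip(column(records, a), column(records, b))))


def conditional_entropy(records, target, known):
    """H(target | known) = H(target, known) - H(known)."""
    return joint_entropy(records, target, known) - entropy(column(records, known))


def redundancy(records, a, b):
    """Assumed definition: mutual information H(a) + H(b) - H(a, b)."""
    return (entropy(column(records, a)) + entropy(column(records, b))
            - joint_entropy(records, a, b))


def rank_helpers(records, target):
    """Order the other components by the uncertainty they leave in target."""
    others = [f for f in FIELDS if f != target]
    return sorted(others, key=lambda k: conditional_entropy(records, target, k))


if __name__ == "__main__":
    # On the toy data the street leaves less uncertainty about the ZIP code
    # than the house number does, so it is ranked first.
    print(rank_helpers(RECORDS, "zip"))  # -> ['street', 'number']
```

On real address data the empirical distributions would come from a postal directory, and the ranking would be recomputed for each uncertain component.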