2010 MetroGIS Address Points Database Specifications – draft
The MetroGIS Address Workgroup has created address point database specifications based on the draft National Address Data Standard. The MetroGIS specifications are in draft format and will continue to change modestly. Unresolved issues are highlighted in the draft. We share this draft because so many people have asked for it.
1997 Address Guidelines
MetroGIS created a set of address data guidelines back in 1997. While somewhat dated, these guidelines are still useful to many organizations, so we have chosen to keep them on our web site.
Preface
Introduction
Types of Addresses
Elements of a Street Address
Developing Address Data Bases
Address Parsing and Field Concatenation
Relationship Between Parcels and Addresses
Geocoding and Address Matching
Bibliography
Appendix
This paper is intended to provide information and guidance to anyone working with address data. It explains
the important issues involved in incorporating address data into a GIS. It also describes potential pitfalls and
provides specific examples to help the user understand the issues surrounding the use of address data.
MetroGIS Standards Advisory Team July 1997
What is a "Standard"? What is a "Guideline"? In April of 1993, the GIS Standards Committee of the
Minnesota Governor's Council on Geographic Information developed a project plan which included the following standards-related
terminology. This information provides a useful frame of reference for the discussion of addressing issues and
guidelines that follows.
Policy A high-level overall plan, defining a course or method of action and embracing the general goals
and acceptable procedures of a governmental body, to guide and determine present and future decisions. The following
concepts are methods to implement policy.
Standard
- a definite rule, principle or measure,
- established or formally sanctioned by an authorized body
- requiring adherence from all organizations statutorily obligated to do so
Convention
- a general agreement about basic principles or procedures
- some written description of the agreement is usually prepared
- a convention can become a standard if formally sanctioned by an authorizing body
Guideline
- a recommendation developed to detail a proposed practice or procedure for local
('in-house') use
- no formal or semi-formal agreement to comply with a guideline is necessary
- a guideline can evolve to a convention if adopted by a larger user community
Why are addresses important? A Geographic Information System (GIS) relies on the
framework that it is built upon. A critical element of this framework is its resolution or the level to which we can
reference unique entities and map them in some way. In the real world, addresses are the most commonly used and
smallest unique identifier. They are often used as the primary link between individuals and locations. Addresses have a
much more user-friendly quality than other identifiers such as property identification numbers (PINs). For example, a
citizen who may wish to extract data from a GIS would find it much more intuitive to be able to query a data base using
an address than by most other means.
What is the benefit to using a standard address format? Because many datasets are
geographically referenced by an address, defining and using a standard address format will increase the ease with which
these datasets can be incorporated into the GIS for mapping and analysis. And because addresses are so often used as a
means of communication between and within organizations, standardizing addresses will increase an organization's ability
to share these datasets with other organizations. Standard addresses can also increase the efficiency of automated
applications. For example, they may make locating addresses on an E-911 system more efficient and accurate or usable
over a wider area covering several communities.
Standardizing addresses may also save you money on bulk mailings. The U.S. Postal Service offers reduced bulk
mailing costs for those organizations which utilize and adhere to the USPS address standard (discussed below).
Furthermore, software products are available which can read addresses and convert them to the appropriate USPS
standard, when possible. Even if a bulk mailing is not the intent, these software products may assist in standardizing
an address data base which is known to have errors or inconsistencies.
This document is intended to be a guideline, not a standard.
The recommendations outlined in this document are intended to provide information for those who wish to work
with addresses and are in no way meant to be adopted as a set of mandatory rules. This guide should help those who wish
to create new data bases which contain or access address data. The information should also assist those who work with
existing data bases and intend to transfer or share data.
Many different agencies have adopted a variety of address rules for their data bases for good reasons. There
is no advantage in undertaking the expense of recreating existing data bases when guidelines can be followed which
allow for the data to be transferred into formats which can be used by many other agencies in a wide array of
applications.
The U. S. Postal Service Standards Several different aspects of address information can
be considered for standardization. These range from the use of capitalization and punctuation to data base design and
address matching procedures. While this document provides some guidelines in a variety of areas, the U. S. Postal
Service has developed detailed standards that deal with address "format" and "content" to help the users and developers
of address information. We strongly recommend that developers of address information consider the USPS standards when
developing or modifying address datasets. Copies of these addressing standards may be obtained from the U.S. Postal
Service National Customer Support Center at 1-800-238-3150 or via the Internet at www.usps.gov. Request Postal
Addressing Standards (Pub. 28).
There are two main types of addresses used:
- Postal or Mailing Addresses
- Situs Addresses
In both cases, many different systems have been adopted by different organizations such as the U.S. Postal
Service, U.S. West, NSP, Counties, Cities, etc.
POSTAL/MAILING ADDRESSES
- These are used to contact individuals or organizations.
- These are used for legal contacts, billings and notifications, for example.
- As recorded by local government agencies, these addresses are often out of state or even out of the
country, and so may not be the same as the related site-specific or situs address (see below).
- These addresses differentiate among parties of interest in land, for example:
- property owners for tax notifications
- business addresses of corporate entities or tenants which differ from the property owner
- lease holders
- Post Office Box addresses do not actually record the street details as part of the address for the
purposes of mailing. (See the "Elements of a Street Address" section for more details.)
SITUS ADDRESSES
- These are not used to contact individuals, but instead to relate features to a specific
location.
- These are site-specific location or service addresses used for such things as:
- emergency response
- location/identification of local government infrastructure such as city owned land
- location of suites, shops or offices within a shopping center or business park
- These addresses can be used in a variety of ways, for example:
- Intersection addresses: These may be identified, processed and coded differently from more
typical street addresses. (See the "Elements of a Street Address" section for more details.)
- Landmark addresses: These can be used for geocoding purposes through the use of an address alias.
(See the "Elements of a Street Address" section for more details.)
- Land use identification: Land use information can be attached to these addresses to indicate whether
a property is vacant.
- Situs addresses are a common data element used in local government. Assigning an address to every building
and undeveloped parcel (e.g. park, undeveloped lot) can be useful for such things as directing someone to a park or
getting an emergency vehicle to a vacant parcel. However, giving an address to vacant parcels also has a drawback in
that the address may change once the parcel is developed.
The following is a breakdown of the key elements that make up a typical street address. Please note that the
U.S. Postal Service has developed addressing standards that include capitalization, punctuation and abbreviation of
addresses. While the examples below adhere to those standards, they are only a subset of what is available in the
Postal Service standards.
- Street number. (3186 PILOT KNOB RD)
The street number is typically an
integer value, but it may also include letters or fractions (e.g., 142A or 216 1/2). How these addresses will be located
depends upon your geocoding software.
- Prefix direction. (156 E 18TH ST)
The location of a direction
designation may vary within an address. Some software products require the directional field to be placed in the prefix
position and others in the suffix position in order to ensure the best results in address matching. (e.g., N 1ST AVE or
1ST AVE N). USPS street direction standard abbreviations: N, S, E, W, NE, SE, NW, SW
- Street name. (3334 CEDAR AVE)
Care should be taken to ensure that
street names are not abbreviated or misspelled. In some cases streets may be known by more than one name. In these
cases an alias or cross reference may be needed. Streets with numeric names may need to be entered as 1ST ST rather
than FIRST ST.
- Street type. (3334 CEDAR AVE)
Street types need to be entered using
USPS recommended abbreviations.
- Suffix direction. (1200 34TH ST W)
(see prefix direction)
- Unit Number (14955 GALAXIE AVE STE 300)
Some common unit designators
are APT (Apartment), STE (Suite), DEPT (Department), and the # sign. (See Postal Addressing Standards)
- City (MINNEAPOLIS MN 55406)
Spell city names in their entirety when
possible. When it is not possible, use the 13-character abbreviations from the USPS City State File.
- State (MINNEAPOLIS MN 55406)
Use two-letter USPS state
abbreviations.
- Zip Code (MINNEAPOLIS MN 55406)
Zip Code or Zip+4 Number
Other Types of Addresses
- Intersections are a common type of address. You may want to identify, process and code these differently
than street addresses. For example, one common method is to separate the intersecting streets with a space - forward
slash - space (e.g., PILOT KNOB RD / 150TH ST W).
- Landmarks are another useful address type. It may also be useful to store a non-geocodable alias, such as a landmark
name (e.g., DAKOTA COUNTY COURTHOUSE), for a street address.
- A Post Office Box, with no street address. (e.g., PO BOX 146, HAMPTON MN 55031)
- These are just a few examples. For questions concerning the proper coding of these and other addresses,
consult the USPS Postal Addressing Standards (Pub. 28).
Other Things to Consider
- To help resolve duplicate matching addresses (e.g., 150 MAIN ST) which may exist in more than one
location in a city or county, you may need to use additional information (e.g., city or ZIP code).
- Be aware that for bulk mailers, the Postal Service has started using the concept of Zip plus four plus
two. The plus two is the two right-most characters of the house number. The Zip + 4 + 2 values are then sorted by
ASCII value, with the odd and even + 2 values separated into different lists within each unique Zip + 4 group. In
theory, this yields mail sorted in actual delivery order, so the bundle of mail can be given directly to the mail
carrier without preprocessing. The real message here is that if your organization does bulk mailings, make the effort
to understand the Zip + 4 + 2 sorting.
- Any particular geocoding software may allow some flexibility in matching addresses, but careful data entry
following these recommendations will help ensure accurate matching!
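The Zip + 4 + 2 sort described above can be sketched roughly as follows. This is an illustration only, not the Postal Service's presort procedure: it assumes each record is a (ZIP+4, house number) pair of strings, that house numbers end in a digit, and that the odd list precedes the even list; the function names are hypothetical. Consult the Postal Service for the actual rules before a real mailing.

```python
from collections import defaultdict


def plus2(house_number: str) -> str:
    """The '+2': the two right-most characters of the house number (zero-padded)."""
    return house_number[-2:].rjust(2, "0")


def zip42_sort(records):
    """Sort (zip4, house_number) records into ZIP+4 groups, odds before evens.

    Returns a list of (zip4, "odd" | "even", records) tuples in ASCII order of
    ZIP+4, with each list sorted by its +2 value.
    """
    groups = defaultdict(lambda: {"odd": [], "even": []})
    for zip4, house_number in records:
        # Assumes the house number ends in a digit (e.g. not "142A").
        parity = "odd" if int(house_number[-1]) % 2 else "even"
        groups[zip4][parity].append((zip4, house_number))

    result = []
    for zip4 in sorted(groups):  # plain string sort = ASCII order
        for parity in ("odd", "even"):  # assumed order: odd list first
            batch = sorted(groups[zip4][parity], key=lambda rec: plus2(rec[1]))
            if batch:
                result.append((zip4, parity, batch))
    return result


mail = [("55124-8579", "143"), ("55124-8579", "120"),
        ("55124-8579", "123"), ("55124-8501", "118")]
for zip4, parity, batch in zip42_sort(mail):
    print(zip4, parity, [number for _, number in batch])
# 55124-8501 even ['118']
# 55124-8579 odd ['123', '143']
# 55124-8579 even ['120']
```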
In addition to having standardized addresses, it is also important to have a well designed address data base.
In this section you will find helpful tips and a good example of a single file for storing addresses. For storing large
amounts of address information, or for using address data with a variety of other datasets, a relational data model is
highly recommended.
Address Standardization Software Several software products are available which can read addresses and
convert them to the appropriate USPS standard, when possible. While this can save you money on bulk mailings, it can
also assist in standardizing an address data base which is known to have errors or inconsistencies. Two examples of
these products are Acumail and Postal Soft.
Address Tips from URISA The Urban and Regional Information Systems Association (URISA) develops and
presents workshops on a variety of topics related to GIS. Below are some tips from a workshop on addresses presented by
Peirce Eichelberger at GIS/LIS96 in Denver, CO.
- Many problems can be avoided by developing the proper data base model up front
- Maintain a street name synonym cross reference (e.g., both Main St. and Hwy. 5). This will require a
link between the tabular data base and the street layer in the GIS.
- Normalize the data base: enter a given street name only once (spelled correctly). Thus, if there
are 100 parcels on the street, the street name is in the data base only once. When an address is entered, the system
should look up the street name and store its foreign key with the parcel (see Appendix C).
- Have a domain table (see Appendix C) for street types (e.g., RD, ST, AV) and a data entry
application that allows only these values to be entered (e.g., it could automatically change AVE or Ave to AV).
- These two practices help people do their jobs more accurately and faster. They will
actually thank you for implementing them.
- Suggest data standards for everyone working with addresses.
- The addressing application, or at least the data base, should be outside of the GIS software (e.g. not in
INFO tables). Otherwise too much of it is hidden and not accessible to all of the numerous non-GIS applications for
addresses.
- Below is an example of a single file for address data. A relational data model is highly recommended for
storing large amounts of data.
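The synonym cross reference, normalization and street type domain table tips above might look like this in a small sketch. The class and function names are hypothetical, street names are kept as single strings for brevity, and the domain table here maps variants to the USPS abbreviations (AVE rather than the AV used in the workshop example). In a production system these would be related tables in a data base rather than Python dictionaries.

```python
from dataclasses import dataclass, field

# Domain table: allowed street types, with common variants mapped to one code.
STREET_TYPE_DOMAIN = {
    "AVE": "AVE", "AV": "AVE", "AVENUE": "AVE",
    "RD": "RD", "ROAD": "RD",
    "ST": "ST", "STREET": "ST",
}


def normalize_street_type(raw: str) -> str:
    """Return the domain value for a street type; reject anything not in the table."""
    key = raw.strip().upper()
    if key not in STREET_TYPE_DOMAIN:
        raise ValueError(f"street type {raw!r} is not in the domain table")
    return STREET_TYPE_DOMAIN[key]


@dataclass
class StreetTable:
    """Each street name is stored once; addresses keep only its key (street_id)."""
    names: dict = field(default_factory=dict)     # street_id -> official name
    synonyms: dict = field(default_factory=dict)  # any known name -> street_id

    def add_street(self, name: str, *aliases: str) -> int:
        street_id = len(self.names) + 1
        self.names[street_id] = name.upper()
        for known_name in (name, *aliases):
            self.synonyms[known_name.upper()] = street_id
        return street_id

    def lookup(self, name: str) -> int:
        """Foreign key for a street entered under its official name or a synonym."""
        return self.synonyms[name.strip().upper()]


streets = StreetTable()
streets.add_street("MAIN ST", "HWY 5")
address = {"ST_NUMB": "150", "street_id": streets.lookup("Hwy 5")}
print(streets.names[address["street_id"]])  # MAIN ST
print(normalize_street_type("Ave"))         # AVE
```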
Dakota Co. Address File Example
FIELD NAME (FIELD DESCRIPTION)/FIELD LENGTH
ST_NUMB (HOUSE NUMBER)/10 CHARACTER
ST_PDIR (PREFIX STREET DIRECTION)/2 CHARACTER
ST_NAME (STREET NAME)/20 CHARACTER
ST_TYPE (STREET TYPE)/4 CHARACTER
ST_SDIR (SUFFIX STREET DIRECTION)/2 CHARACTER
CITY (CITY)/27 CHARACTER
UNIT (UNIT NUMBER)/9 CHARACTER
STATE (STATE)/2 CHARACTER
ZIP_CODE (ZIP CODE INCLUDES PLUS 4)/10 CHARACTER
PLUS2 (ZIP CODE EXTENSION)/2 CHARACTER
(THE NEXT 4 COULD BE COMBINED INTO A SINGLE FIELD CALLED "OTHER_ADD")
ST_INT (INTERSECTION)/45 CHARACTER
LANDMARK (BUILDING NAME)/45 CHARACTER
PO_BOX (POST OFFICE BOX)/12 CHARACTER
OTHERADD (NON STANDARD)/45 CHARACTER
Field Concatenation When the parts of an address are stored in separate fields of a database, they may need to be
appended or joined together. This process of putting address fields together is called concatenation. It is easier
to put fields of data together (concatenate them) than to break fields of data apart (parse them).
Fields are Stored as Character Fields Because these fields need to be concatenated together, the fields
should be stored as character fields. If the address fields are stored as numbers, the fields will be added like
numbers instead of appended together like characters.
Another reason for storing the address fields as character fields is because they contain non-numeric data.
For example, ZIP_CODE will contain a dash between the first five digits and the ZIP+4 portion of the zip code (e.g.,
55124-8579).
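A minimal sketch of concatenation, assuming a record keyed by the Dakota Co. field names with every value stored as a string; the function names are illustrative only. The last two lines show why character storage matters: the same digits stored as numbers are added rather than appended.

```python
# Street-line fields from the Dakota Co. example, in the order they are joined.
STREET_FIELDS = ["ST_NUMB", "ST_PDIR", "ST_NAME", "ST_TYPE", "ST_SDIR", "UNIT"]


def street_line(record: dict) -> str:
    """Concatenate the non-empty street fields, separated by single spaces."""
    return " ".join(
        record[f].strip() for f in STREET_FIELDS if record.get(f, "").strip()
    )


def last_line(record: dict) -> str:
    """Concatenate city, state and ZIP code."""
    return f"{record['CITY']} {record['STATE']} {record['ZIP_CODE']}"


record = {"ST_NUMB": "156", "ST_PDIR": "E", "ST_NAME": "18TH", "ST_TYPE": "ST",
          "ST_SDIR": "", "UNIT": "", "CITY": "MINNEAPOLIS", "STATE": "MN",
          "ZIP_CODE": "55406"}
print(street_line(record))  # 156 E 18TH ST
print(last_line(record))    # MINNEAPOLIS MN 55406

# Character fields are appended; numeric fields are added.
print("55124" + "-" + "8579")  # 55124-8579
print(55124 + 8579)            # 63703
```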
Software may Determine Need The software being used may determine how the address data will need to be
formatted. For example, to perform address matching in Arc/Info or ArcView, the address fields need to be one field of
data.
Address Parsing Address parsing is the process of taking an address entered as a single field of data and
breaking it into the component fields of the address.
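Parsing is harder than concatenation because the parser has to guess which token plays which role. The sketch below is a simplified, hypothetical parser that uses small lookup tables (only a subset of the USPS street types) and returns the Dakota Co. field names. It is not a substitute for address standardization software, and it will mis-parse ambiguous cases, such as a street actually named N ST.

```python
DIRECTIONS = {"N", "S", "E", "W", "NE", "SE", "NW", "SW"}
STREET_TYPES = {"AVE", "BLVD", "DR", "LN", "RD", "ST"}  # small illustrative subset
UNIT_DESIGNATORS = {"APT", "STE", "DEPT", "#"}


def parse_address(line: str) -> dict:
    """Split a one-line street address into the Dakota Co. example fields."""
    tokens = line.upper().split()
    parts = dict.fromkeys(
        ["ST_NUMB", "ST_PDIR", "ST_NAME", "ST_TYPE", "ST_SDIR", "UNIT"], "")

    # Unit: everything from the first unit designator onward (e.g. STE 300).
    for i, token in enumerate(tokens):
        if token in UNIT_DESIGNATORS:
            parts["UNIT"] = " ".join(tokens[i:])
            tokens = tokens[:i]
            break

    # Street number: the first token, plus a fraction such as 1/2 if present.
    parts["ST_NUMB"] = tokens.pop(0)
    if tokens and "/" in tokens[0]:
        parts["ST_NUMB"] += " " + tokens.pop(0)

    # Directions and type are only taken if a street name would remain.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["ST_PDIR"] = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["ST_SDIR"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parts["ST_TYPE"] = tokens.pop()

    parts["ST_NAME"] = " ".join(tokens)
    return parts


print(parse_address("14955 Galaxie Ave Ste 300"))
```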
County governments (and some cities) are responsible for identifying parcels for the purposes of property
taxation or the recording of an interest in a piece of land. In order to uniquely identify each piece of land, a PIN or
Property Identification Number is assigned.
In many cases, there is a direct correlation between the PIN and the property situs address. Such simple cases
occur with single family residences where one house is owner-occupied and the PIN can be linked to the situs address
which will also be used as the postal address.
There are many exceptions to this situation though, such as non-homestead residential properties where the PIN
relates to a situs address, but a different postal address may be used for taxation purposes.
Many PINs relate to parcels where multiple tenancy exists, either in business or for residences. In these
cases, there is a one-to-many relationship, with the PIN linked to the corporate taxable property, for example. The same
PIN, though, should be attached to each of the numerous tenant addresses, which themselves may be situs addresses and/or
postal addresses.
PINs may also relate to undeveloped parcels which could nevertheless be given a situs address, so that local
government can accurately determine future addressing and use it in emergency response to a particular location
in the community.
PINs can relate to non-contiguous land which is under the same ownership, such as a farm with separate land
holdings. In this case, the PIN must be linked to the multiple situs addresses of each separate piece of land and also
to the postal address for the entire taxable entity.
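The PIN relationships described above (one postal address for the taxable entity, one or more situs addresses per PIN) can be modeled with a simple structure. This is a sketch with hypothetical PINs, addresses and names; in practice these would be related tables in a data base.

```python
from dataclasses import dataclass, field


@dataclass
class TaxParcel:
    """One PIN, one postal address for the taxable entity, many situs addresses."""
    pin: str
    postal_address: str                                  # where notices are mailed
    situs_addresses: list = field(default_factory=list)  # one-to-many


def pins_for_situs(parcels: list, situs: str) -> list:
    """Reverse lookup: which PIN(s) does a given situs address belong to?"""
    return [p.pin for p in parcels if situs in p.situs_addresses]


# Hypothetical examples: a farm with separate holdings, and a multi-tenant building.
farm = TaxParcel("PIN-001", "PO BOX 146, HAMPTON MN 55031",
                 ["100 MAIN ST", "200 MAIN ST"])
mall = TaxParcel("PIN-002", "123 CORPORATE DR",
                 ["50 MAIN ST STE 1", "50 MAIN ST STE 2"])
print(pins_for_situs([farm, mall], "200 MAIN ST"))  # ['PIN-001']
```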
Geocoding is the process of creating geographic coordinates for geographically referenced tabular data. In
other words, a geocoding process will allow one to derive precise coordinates on the surface of the earth for things
like
- addresses
- mile posts
- parcel identification numbers (PINs), and
- public land survey information (PLS).
One form of geocoding is address matching. Because addresses are the geographic identifier for many databases
(for example a database of city residents), one can map a variety of information in such a data base by using an
address matching process. Through this process, an address can be matched to data in a GIS and a geographic location or
coordinate (longitude and latitude or X and Y value) can be assigned.
Street Centerlines One type of GIS layer is a "street centerline" layer. While this layer sounds as though
it is used to map the yellow line in the middle of roads, it is in fact used for defining the location of roads in
general and also to define the location of addresses. This is done by assigning the range of addresses that exist on
any particular segment of a road or street. A centerline layer will store geographic coordinates for the end points of
the street segments, designating them as a from point and a to point. Then, if the "from" and "to" address on each side
of the street is defined (address range), addresses can be geocoded by interpolating between these points and creating
coordinates at an offset left or right of the centerline.
In this example 1151 State St. is an interpolated point based on the address ranges.
Thus, in order to perform address matching functions with a street centerline layer, for each street segment
that layer must contain information about street number, street name, prefix direction, suffix direction, street type,
and address ranges based on four fields:
- from address on the left side of the street
- from address on the right side of the street
- to address on the left side of the street
- to address on the right side of the street.
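The interpolation described above can be sketched as follows, assuming planar X/Y coordinates, a straight segment, and that each side of the street carries one parity (odd or even) matching its "from" address. The class, function and the 20-unit default offset are hypothetical choices for illustration, not part of any particular geocoding package.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    x1: float  # "from" node
    y1: float
    x2: float  # "to" node
    y2: float
    from_left: int
    to_left: int
    from_right: int
    to_right: int


def geocode(seg: Segment, number: int, offset: float = 20.0):
    """Interpolate a house number along a segment and offset it to the proper side.

    Returns (side, x, y), or None if the number is not in either range.
    """
    for side, start, end in (("left", seg.from_left, seg.to_left),
                             ("right", seg.from_right, seg.to_right)):
        # Assumes each side carries one parity, matching its "from" address.
        if min(start, end) <= number <= max(start, end) and (number - start) % 2 == 0:
            break
    else:
        return None  # no match on this segment: a candidate for reject processing

    t = 0.5 if end == start else (number - start) / (end - start)
    x = seg.x1 + t * (seg.x2 - seg.x1)
    y = seg.y1 + t * (seg.y2 - seg.y1)

    # Unit vector perpendicular to the segment, pointing to its left.
    length = math.hypot(seg.x2 - seg.x1, seg.y2 - seg.y1)
    nx, ny = -(seg.y2 - seg.y1) / length, (seg.x2 - seg.x1) / length
    sign = 1 if side == "left" else -1
    return side, x + sign * offset * nx, y + sign * offset * ny


seg = Segment(0, 0, 98, 0, from_left=1100, to_left=1198,
              from_right=1101, to_right=1199)
print(geocode(seg, 1151))  # right side, x about 50, y = -20
```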
Unfortunately addresses are often entered into databases (coded) inconsistently. Spelling errors and
non-standard abbreviations affect the number of hits (matches) on a particular database. Addresses that are not matched
can be processed one at a time, and spelling errors and non-standard abbreviations can be fixed during this process of
reject processing.
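Reject processing can be assisted by suggesting likely street names for an address that did not match. The sketch below uses Python's standard `difflib.get_close_matches`; the list of centerline street names is illustrative, and each suggestion should still be confirmed by a person before the address is corrected.

```python
import difflib

# Street names as stored in the centerline layer (illustrative values).
CENTERLINE_NAMES = ["CEDAR AVE", "GALAXIE AVE", "PILOT KNOB RD", "150TH ST W"]


def suggest(unmatched: str, n: int = 3, cutoff: float = 0.6) -> list:
    """Suggest centerline street names for a street name that failed to match."""
    return difflib.get_close_matches(unmatched.upper(), CENTERLINE_NAMES,
                                     n=n, cutoff=cutoff)


print(suggest("Ceder Ave"))  # ['CEDAR AVE']
```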
Address Ranges, Actual vs. Theoretical When putting address ranges in a database of a street centerline
layer, a decision must be made on whether to use actual address ranges or theoretical address ranges. For example, a
given street segment might have three houses with the house numbers of 123, 135 and 143 (actual), while the city has
designated that houses along this street should be numbered between 100 and 200 (theoretical address range). The actual
address range is much more accurate and reflects reality (123 - 143), but if a new house is built or otherwise a new address
is added, the address range may need to be modified. If theoretical ranges are used, then no maintenance of the address
range is needed.
Your intended use of the data should help determine which method is preferable. Your existing data sources may
also be a factor in your decision. For example, you might not have the actual address ranges. Conversely, there may be
no theoretical address ranges in some areas.
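The difference between the two choices can be seen by computing where house 135 from the example falls along the segment under each range. The helper function is hypothetical, and the house at 151 is an assumed new address used to show when an actual range needs maintenance.

```python
def position_along_segment(number: int, range_from: int, range_to: int) -> float:
    """Fraction of the way from the segment's from node (0.0) to its to node (1.0)."""
    return (number - range_from) / (range_to - range_from)


actual = (123, 143)       # houses at 123, 135 and 143
theoretical = (100, 200)  # range designated by the city

print(position_along_segment(135, *actual))       # 0.6
print(position_along_segment(135, *theoretical))  # 0.35

# A hypothetical new house at 151 falls outside the actual range (> 1.0),
# so the actual range must be updated; the theoretical range needs no change.
print(position_along_segment(151, *actual))       # 1.4
print(position_along_segment(151, *theoretical))  # 0.51
```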
Address Points Address ranges, by their very nature, are imprecise. They can only offer an approximation of
the actual location of a specific address. Address points, however, can be considerably more accurate with regard to
geography. For example, in the diagram below, 1145 State St. would appear to the left of the midpoint of the street
segment with an interpolated address range. The address point shows its actual location which is to the right of the
street segment midpoint.
Centroids of parcels may provide address points that are suitable for some purposes. Remember though, that one
parcel can have many addresses, and one address can span many parcels. Again, the use of the dataset should help
determine what type of dataset is needed.
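A parcel centroid can be computed from the parcel boundary with the shoelace formula, sketched below for a simple (non-self-intersecting) polygon whose vertices are listed in order in planar coordinates. The L-shaped parcel is hypothetical; it illustrates one limitation: the centroid of an irregular parcel can fall outside the parcel itself.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
l_shaped = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
print(polygon_centroid(square))    # (5.0, 5.0)
print(polygon_centroid(l_shaped))  # about (1.36, 1.36): outside the L shape
```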
Geocoding Other Types of Data Mile Posts work very similarly to addresses. A mile post number can be assigned
to each line segment in an address centerline layer. Through an interpolation process, like geocoding by address, a
geographic coordinate can be created for a mile post. The difference between mile posts and addresses is that mile
posts do not have left and right fields coded. This is because mile posts normally apply to the centerline or physical
road surface while addresses generally apply to residences or businesses located back from the road surface.
A Parcel Identification Number (PIN) is a unique identifier associated with a defined piece of land for
purposes of tracking taxation information. Sometimes non-taxation related databases will also include PINs (e.g. city
owned land, crime locations, etc.). Since a PIN is generally associated with a point or polygon in a parcel GIS layer,
any database with the PIN in it can be geocoded by matching the PINs. Geocoding by PINs assures that the related
databases are assigned to the correct parcel. The downside to geocoding with PINs is that PINs in the GIS parcel layer
change over time, and some are even retired (no longer used). Because of this, all of the databases that use PINs must
be maintained (updated) in conjunction with the GIS parcel layer in order to keep the geocoding process accurate.
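Geocoding by PIN is essentially a table join. The sketch below assumes the parcel layer has been reduced to a dictionary of PIN to point coordinates; the PINs, field names and function name are hypothetical. Records whose PIN is no longer in the parcel layer (for example, a retired PIN) are returned separately so they can be updated.

```python
def geocode_by_pin(records, parcel_points):
    """Attach parcel coordinates to each record by PIN.

    Returns (matched, unmatched); unmatched records usually carry a PIN that
    has been changed or retired in the parcel layer.
    """
    matched, unmatched = [], []
    for rec in records:
        point = parcel_points.get(rec["pin"])
        if point is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "x": point[0], "y": point[1]})
    return matched, unmatched


# Hypothetical parcel layer (PIN -> point) and a city-owned-land table.
parcel_points = {"PIN-001": (1000.0, 2000.0), "PIN-002": (1500.0, 2100.0)}
city_land = [{"pin": "PIN-001", "use": "park"}, {"pin": "PIN-009", "use": "storage"}]
matched, unmatched = geocode_by_pin(city_land, parcel_points)
print(len(matched), [r["pin"] for r in unmatched])  # 1 ['PIN-009']
```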
In the Public Land Survey System (PLS), the aliquot parts of a section, like quarter or quarter-quarter, can
be used for geocoding as well. The PLS GIS layer has the sections (square miles) of townships broken down into these
aliquot parts. The fields used for this are: section number, township number, range number, quarter section, and
quarter-quarter section (forty). The number of quarters will determine how accurate an interpolated geographic
coordinate will be. For instance, a quarter section code will result in about 1320 feet of possible error while a
quarter-quarter section code will result in about 660 feet of possible error.
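The error figures above follow from the geometry of an idealized section. Assuming a square section exactly one mile (5,280 feet) on a side, each level of quartering halves the side length, and the stated error is half the side of the aliquot part, i.e. the distance from its center to the middle of an edge. Actual sections are often irregular, and the distance from the center to a corner is larger by a factor of about 1.41.

```python
SECTION_SIDE_FT = 5280.0  # assumed: an ideal square section, one mile on a side


def aliquot_side_ft(quarter_levels: int) -> float:
    """Side length after quartering a section `quarter_levels` times."""
    return SECTION_SIDE_FT / 2 ** quarter_levels


def max_error_ft(quarter_levels: int) -> float:
    """Distance from the center of the aliquot part to the middle of an edge."""
    return aliquot_side_ft(quarter_levels) / 2


for name, levels in (("quarter", 1), ("quarter-quarter (forty)", 2)):
    print(name, aliquot_side_ft(levels), max_error_ft(levels))
# quarter 2640.0 1320.0
# quarter-quarter (forty) 1320.0 660.0
```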
Postal Addressing Standards, U.S. Postal Service, Publication 28, August 1995
Copies of these addressing standards may be obtained from the U.S. Postal Service National Customer Support
Center at 1-800-238-3150 or via the Internet at www.usps.gov. Request Postal Addressing Standards (Pub. 28).
Eichelberger, Peirce. "The Importance of Addresses: The Locus of GIS." 1993 URISA Proceedings, pp. 212-222.
Example of creating mailing labels by buffering a parcel using
ArcView and Microsoft Word.