
introduction
data capture methods
design considerations
physical extent
resolution (grid size)
themes to be included
classifications
spatial data collection links
 
 

introduction

Data input is the operation of encoding data for inclusion in a database. Creating accurate databases is a very important part of GIS.

Data collection and the maintenance of databases remain the most expensive and time-consuming aspects of setting up a major GIS facility, typically accounting for 60-80% of the overall cost of a GIS project.

A number of issues arise when developing a database for a planning or management project. The first is whether the data should be stored in vector or raster format. Considerations here include:
the nature of the source data (e.g. whether it is already in raster form)
the predominant use to which it will be put
the potential losses that may occur in transition
storage space (increasingly less important)
requirements for data sharing with other systems/software

As a general rule, it is best to retain the maximum amount of information in the database. If the data is available as points, lines or polygons, it should be kept that way. If a raster approximation of the data is also needed for analytical purposes, a raster version may be kept in addition to the vector coverage. Many systems provide quick conversion from vector to raster.
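One common way a vector-to-raster conversion can work is to test each cell centre against the polygon and give the polygon's value to every cell whose centre falls inside it. The sketch below shows that centre-sampling rule for a single polygon ring without holes, using the even-odd (ray casting) test. The function names and the grid layout (row 0 at the top, origin given by `xmin` and `ymax`) are assumptions for illustration only; GIS packages may use other rules, such as marking every cell the polygon touches. The sketch also shows why conversion loses information: the boundary survives only to the nearest cell.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Even-odd (ray casting) test: count edges crossed by a ray towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(ring: List[Point], xmin: float, ymax: float,
                      cell_size: float, ncols: int, nrows: int,
                      value: int = 1, nodata: int = 0) -> List[List[int]]:
    """Hypothetical helper: burn `value` into cells whose centre is inside."""
    grid = [[nodata] * ncols for _ in range(nrows)]
    for r in range(nrows):
        # Row 0 is the top (northernmost) row, so y decreases with r
        cy = ymax - (r + 0.5) * cell_size
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell_size
            if point_in_polygon(cx, cy, ring):
                grid[r][c] = value
    return grid


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    for row in rasterize_polygon(square, xmin=0, ymax=4, cell_size=1,
                                 ncols=4, nrows=4):
        print(row)
```

For a 2 m by 2 m square on a 4 by 4 grid of 1 m cells, only the first two cells of the bottom two rows receive the value 1; all other cells keep the no-data value.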

The issue of scale is often raised in relation to GIS database development. It is important to remember that data stored in a GIS does not have a scale. Sometimes people refer to a 1:25000 scale database. What they mean is that the data has been taken from 1:25000 maps, or that it has a level of accuracy roughly equivalent to that found on 1:25000 scale maps.
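The link between map scale and ground accuracy is simple arithmetic: a distance on a 1:N map, multiplied by N, gives the corresponding distance on the ground. The sketch below assumes, purely for illustration, a positional error of 0.5 mm on the source map; that figure is not from this text, so substitute the error you expect from your own map preparation and digitising.

```python
def ground_distance_m(map_distance_mm: float, scale_denominator: float) -> float:
    """Convert a distance measured on a 1:N map (in mm) to metres on the ground."""
    # 1 mm on a 1:N map represents N mm on the ground; divide by 1000 for metres
    return map_distance_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    ASSUMED_MAP_ERROR_MM = 0.5  # illustrative assumption, not a standard
    for denom in (2_500, 25_000, 100_000):
        print(f"1:{denom}: about {ground_distance_m(ASSUMED_MAP_ERROR_MM, denom):.2f} m")
```

Under this assumption, "1:25000 accuracy" corresponds to features located to within roughly 12.5 m on the ground.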

In line with the principle of keeping as much information as possible, the ideal is to fill the database with data whose accuracy is equivalent to that of very large scale maps. This, however, may not always be practical, as:
the data may not be available at very large scale;
it may be too expensive or time-consuming to digitise from maps at that scale;
there may be no envisaged application that requires that accuracy;

so compromises are made.

Problems can arise when some of the data in a GIS is very accurate (drawn from large scale mapping, e.g. urban utilities) and other data is drawn from much smaller scale mapping (e.g. soils). In this case, great care must be taken not to draw conclusions that the less reliable data cannot support.

There are several methods used for entering spatial data into a GIS, including:
manual digitising and scanning of analogue maps
image data input and conversion to a GIS
direct data entry including global positioning systems (GPS)
transfer of data from existing digital sources

The scale, resolution and accuracy of source maps are a related concern for all of these methods.

At each stage of data input, data verification should occur to ensure that the resulting database is as error-free as possible.
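Some verification can be automated. The sketch below runs a few checks one might apply to digitised polygons: duplicate identifiers, unclosed rings, rings with too few vertices, and class codes outside an agreed list. The record layout (dicts with hypothetical keys `id`, `ring` and `code`) and the choice of checks are assumptions for illustration, not a standard; a real project would add checks suited to its own data.

```python
from typing import Dict, List, Set


def verify_features(features: List[Dict], valid_codes: Set[str]) -> List[str]:
    """Return human-readable problems found in digitised polygon records.

    Each record is assumed (hypothetically) to be a dict with keys
    'id', 'ring' (list of (x, y) vertices) and 'code' (class code).
    """
    problems = []
    seen_ids = set()
    for f in features:
        fid = f["id"]
        if fid in seen_ids:
            problems.append(f"{fid}: duplicate feature id")
        seen_ids.add(fid)
        ring = f["ring"]
        # A closed ring repeats its first vertex at the end and needs at least
        # three distinct vertices, i.e. four coordinate pairs in total
        if len(ring) < 4:
            problems.append(f"{fid}: too few vertices for a polygon")
        elif ring[0] != ring[-1]:
            problems.append(f"{fid}: polygon is not closed")
        # Attribute check: the class code must come from the agreed list
        if f["code"] not in valid_codes:
            problems.append(f"{fid}: unknown class code {f['code']!r}")
    return problems


if __name__ == "__main__":
    records = [
        {"id": 1, "ring": [(0, 0), (1, 0), (1, 1), (0, 0)], "code": "A"},
        {"id": 2, "ring": [(0, 0), (1, 0), (1, 1), (0, 1)], "code": "B"},
    ]
    for problem in verify_features(records, {"A", "B"}):
        print(problem)
```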

When developing a raster data set for specific purposes there are a number of design considerations. These include:
the physical extent of the data base
the resolution (grid size)
the themes to be included
the classifications to be used within the themes
the appropriateness of scale of input data to the preferred grid size

Physical Extent
Should the database cover only the area being planned or managed, or are there external influences (an upstream catchment, major transport corridors, nearby population centres, views to adjacent scenery) that need to be incorporated into the planning database?
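Extending the extent has a direct cost in grid size. The sketch below enlarges a rectangular study area by a uniform buffer and counts the rows and columns needed to cover it. Both the rectangle and the fixed buffer distance are simplifying assumptions (an upstream catchment, for instance, rarely follows a fixed buffer), and the function names are hypothetical.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in metres


def buffered_extent(study_area: BBox, buffer_m: float) -> BBox:
    """Enlarge a study-area bounding box by the same buffer on every side."""
    xmin, ymin, xmax, ymax = study_area
    return (xmin - buffer_m, ymin - buffer_m, xmax + buffer_m, ymax + buffer_m)


def grid_shape(extent: BBox, cell_size: float) -> Tuple[int, int]:
    """Rows and columns needed to cover the extent (partial cells rounded up)."""
    xmin, ymin, xmax, ymax = extent
    ncols = math.ceil((xmax - xmin) / cell_size)
    nrows = math.ceil((ymax - ymin) / cell_size)
    return nrows, ncols


if __name__ == "__main__":
    area = (0.0, 0.0, 10_000.0, 8_000.0)  # a 10 km by 8 km study area
    for extent in (area, buffered_extent(area, 2_000.0)):
        rows, cols = grid_shape(extent, 25.0)
        print(extent, rows, cols, rows * cols)
```

With 25 m cells, a 2 km buffer around this 10 km by 8 km area raises the cell count from 128 000 to 268 800, more than doubling storage and processing.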

Resolution
The higher the resolution, the better the approximation of reality, provided the data is good enough to support that resolution. Any error assumed in map preparation and digitisation translates into a corresponding error on the ground, and there is little point in making the grid size smaller than this probable error. A smaller than necessary grid size leads to larger files and longer processing times: halving the cell size quadruples the number of cells, so in the best case it quadruples the processing time. Experience with raster processing suggests that more than 2 million cells is excessive in most contexts. Star and Estes (1990) state that "the size of the pixel must be half of the smallest distance to be represented".
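The rules of thumb above can be combined into a quick check. The sketch below takes half the smallest distance to be represented as the target cell size (Star and Estes), flags the case where that would fall below the probable ground error, and counts cells so the result can be compared with the 2 million cell guideline. The helper names, and the idea of falling back to the ground error, are assumptions for illustration rather than a prescribed method.

```python
import math
from typing import Tuple

MAX_CELLS = 2_000_000  # guideline quoted in the text


def cell_count(width_m: float, height_m: float, cell_size_m: float) -> int:
    """Number of cells needed to cover a width by height area."""
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)


def suggest_cell_size(ground_error_m: float,
                      smallest_distance_m: float) -> Tuple[float, bool]:
    """Hypothetical helper: return a candidate cell size and whether both rules hold.

    Rule 1: cells smaller than the probable ground error add no real detail.
    Rule 2 (Star and Estes, 1990): cell size is half the smallest distance.
    """
    target = smallest_distance_m / 2.0
    if target >= ground_error_m:
        return target, True
    # The data is too coarse to show the smallest features reliably,
    # so go no finer than the ground error
    return ground_error_m, False


if __name__ == "__main__":
    size, ok = suggest_cell_size(ground_error_m=12.5, smallest_distance_m=50.0)
    for cs in (size, size / 2):
        n = cell_count(30_000, 20_000, cs)
        print(f"cell {cs} m: {n} cells, within guideline: {n <= MAX_CELLS}")
```

For an assumed 30 km by 20 km area, 25 m cells give 960 000 cells, while halving the cell size to 12.5 m gives 3 840 000 cells, four times as many and well over the guideline.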

Themes
A lot of time and money can be wasted by trying to build a database that incorporates all known information about an area. It is better first to determine which questions the GIS will be required to answer and what data is needed to answer them. For example, geological maps of an area may be available but may have no relevance to the specific decision process.

Classifications
What numbers are to be attached to the grid cells, and what will these numbers mean? They may refer to data that is:
nominal/categorical
ordinal
interval
ratio

It is important to know that the range of analytical or modelling operations available may be limited by the level of measurement (scale) of the data.
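The sketch below encodes the conventional correspondence between levels of measurement and meaningful summary statistics: a mode for any data, a median from ordinal upwards, a mean from interval upwards, and ratio-based measures such as the coefficient of variation only for ratio data (which has a true zero). It assumes cell values are held in a plain Python list; the names `Scale`, `allowed_operations` and `summarise` are hypothetical. `median_low` is used so that an ordinal median is always one of the observed class values.

```python
from enum import IntEnum
from statistics import mean, median_low, mode, stdev
from typing import Callable, Dict, List, Sequence, Tuple


class Scale(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each operation with the lowest level of measurement at which it is meaningful
OPERATIONS: Dict[str, Tuple[Scale, Callable[[Sequence[float]], float]]] = {
    "mode": (Scale.NOMINAL, mode),
    "median": (Scale.ORDINAL, median_low),
    "mean": (Scale.INTERVAL, mean),
    "coefficient_of_variation": (Scale.RATIO, lambda v: stdev(v) / mean(v)),
}


def allowed_operations(scale: Scale) -> List[str]:
    """List the operations that are meaningful for data at this scale."""
    return [name for name, (minimum, _) in OPERATIONS.items() if scale >= minimum]


def summarise(values: Sequence[float], scale: Scale, op: str) -> float:
    """Apply `op`, refusing operations that the data's scale does not support."""
    minimum, func = OPERATIONS[op]
    if scale < minimum:
        raise ValueError(f"{op} is not meaningful for {scale.name.lower()} data")
    return func(values)


if __name__ == "__main__":
    soil_classes = [3, 3, 5, 7]  # nominal codes: only the mode makes sense
    print(allowed_operations(Scale.NOMINAL), summarise(soil_classes, Scale.NOMINAL, "mode"))
```

Averaging soil class codes, for example, would raise an error here, because the codes are labels rather than quantities.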
