Introduction
Data input is the operation
of encoding data for inclusion into a database. The creation of accurate
databases is a very important part of GIS.
Data collection and the maintenance of databases remain the most expensive
and time-consuming aspects of setting up a major GIS facility. Together they
typically account for 60-80% of the overall cost of a GIS project.
There are a number of issues which arise when developing
a database for a planning or management project. The first issue is
whether the data should be stored in vector or raster format. Considerations here
include:
the nature of the source data (e.g. whether it is already in raster form)
the predominant use to which it will be put
the potential losses that may occur in transition
storage space (increasingly less important)
requirements for data sharing with other systems/software
As a general rule it is best to retain the maximum
amount of information in the database. If the data is available as points,
lines or polygons then it should be kept that way. If a raster approximation
of this data is also needed for analytical purposes then a raster version
of the data may be kept in addition to the vector coverage. Many systems
provide for quick conversion from vector to raster.
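A vector-to-raster conversion of the kind mentioned above can be sketched in a few lines. This is a minimal illustration, not how any particular GIS implements it: the function names `point_in_polygon` and `rasterise_polygon` are hypothetical, the polygon is assumed to be a simple ring of (x, y) vertices, and a cell is assigned to the polygon when its centre falls inside.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise_polygon(ring, x_min, y_max, cell_size, n_rows, n_cols, value=1):
    """Return a grid (list of rows, top row first) holding `value` where the
    cell centre falls inside the polygon and 0 elsewhere."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell_size      # centre of this row
        grid_row = []
        for col in range(n_cols):
            x = x_min + (col + 0.5) * cell_size  # centre of this column
            grid_row.append(value if point_in_polygon(x, y, ring) else 0)
        grid.append(grid_row)
    return grid


# Hypothetical example: a 2 x 2 unit square inside a 4 x 4 grid of unit cells.
square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
grid = rasterise_polygon(square, x_min=0.0, y_max=4.0, cell_size=1.0,
                         n_rows=4, n_cols=4)
for grid_row in grid:
    print(grid_row)
```

The resulting grid records only which cells belong to the polygon; the vertex coordinates are not recoverable from it, which is one reason for keeping the vector original alongside any raster copy.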
The issue of scale is often raised in relation
to GIS database development. It is important to remember that data stored
in a GIS does not have a scale. Sometimes people refer to a 1:25000 scale
database. What they mean is that the data has been taken from 1:25000
maps or that it has a level of accuracy which is roughly equivalent to
that found on 1:25000 scale maps.
In line with the principle of keeping the most
information possible, the ideal is to fill the database with data whose
accuracy is equivalent to that of very large scale maps. However, this may
not always be practical, because:
the data may not be available at very large scale,
it may be too expensive or time consuming to digitise from that scale,
there may be no envisioned application that requires that accuracy;
so compromises are made.
Problems can arise when some of the data in a GIS
is very accurate (drawn from large scale mapping – e.g. urban utilities)
and other data is drawn from much smaller scale mapping (e.g. soils).
In this case great care has to be taken that conclusions are not drawn
on the basis of the less reliable data.
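One way to keep this risk visible is to record the source scale of every layer and report the coarsest one whenever layers are combined. The sketch below is illustrative only: the layer names and scale denominators are invented for the example, and `coarsest_source` is a hypothetical helper rather than part of any GIS package.

```python
# Source scale stored as the denominator of the map scale (1:denominator).
# These layers and values are invented for illustration.
source_scales = {
    "urban_utilities": 1_000,   # drawn from 1:1000 mapping
    "roads": 25_000,            # drawn from 1:25000 mapping
    "soils": 250_000,           # drawn from 1:250000 mapping
}


def coarsest_source(layer_names, scales):
    """Return (name, denominator) of the smallest-scale layer among those
    being combined, i.e. the one with the largest scale denominator."""
    name = max(layer_names, key=lambda layer: scales[layer])
    return name, scales[name]


name, denominator = coarsest_source(["urban_utilities", "soils"], source_scales)
print(f"Least reliable input: {name} (1:{denominator})")
```

Any conclusion drawn from the combined layers should then be read at the accuracy of the layer reported here, not at that of the most detailed input.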
There are several methods used for entering spatial
data into a GIS, including:
manual digitising and scanning of analogue maps
image data input and conversion to a GIS
direct data entry including global positioning systems (GPS)
transfer of data from existing digital sources
maps: scale, resolution, accuracy
At each stage of data input, data verification should be carried out
to ensure that the resulting database is as error free as
possible.
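As one illustration of verification, the sketch below checks a small raster for two common input errors: rows of unequal length and cell values that fall outside the agreed classification. The function name `verify_raster`, the class codes and the sample grid are all assumptions made for this example.

```python
def verify_raster(grid, valid_codes):
    """Return a list of human-readable problems found in the grid."""
    if not grid:
        return ["grid is empty"]
    problems = []
    expected_cols = len(grid[0])  # the first row sets the expected width
    for r, row in enumerate(grid):
        if len(row) != expected_cols:
            problems.append(
                f"row {r}: expected {expected_cols} cells, found {len(row)}"
            )
        for c, value in enumerate(row):
            if value not in valid_codes:
                problems.append(f"row {r}, col {c}: unknown code {value}")
    return problems


valid_codes = {0, 1, 2, 3}   # hypothetical classification codes
sample = [
    [1, 1, 2],
    [1, 7, 2],   # 7 is not a valid code
    [3, 3],      # one cell missing
]
problems = verify_raster(sample, valid_codes)
for problem in problems:
    print(problem)
```

Checks of this kind are cheap to run after every input step, so errors are caught before they spread into derived layers.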
When developing a raster data
set for specific purposes there are a number of design
considerations. These include:
the physical extent of the database
the resolution (grid size)
the themes to be included
the classifications to be used within the themes
the appropriateness of the scale of the input data to the preferred grid size
Physical Extent
Should the database cover only the area being planned or managed? Or
are there external influences (upstream catchment, major transport corridors,
nearby population centres, views to adjacent scenery) which need to be
incorporated as part of the planning database?
Resolution
The higher the resolution, the better the approximation of reality, provided
the data is good enough to support this resolution. If you assume a certain
error in map preparation and digitisation then this translates to a certain
on-ground error. There is little point in making the grid size smaller
than the probable error. A smaller than necessary grid size leads to larger
files and longer processing times. Halving the cell size quadruples the
number of cells, so even in the best case it will quadruple the processing
time. Experience with raster processing
suggests that more than 2 million cells is excessive in most contexts.
"The size of the pixel must be half of the smallest distance to be represented"
Star and Estes (1990)
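The arithmetic behind these rules of thumb can be made concrete. The sketch below is an illustration under stated assumptions, not a prescription: the 0.5 mm map error, the 1:25000 source scale and the 20 km by 15 km study area are hypothetical figures, and the function names are invented for the example.

```python
import math


def ground_error_m(map_error_mm, scale_denominator):
    """Convert an error measured on the map (mm) to metres on the ground."""
    return map_error_mm * scale_denominator / 1000.0


def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover the extent at the given cell size."""
    return math.ceil(width_m / cell_size_m) * math.ceil(height_m / cell_size_m)


def max_cell_size_m(smallest_distance_m):
    """Rule quoted from Star and Estes (1990): the pixel is half the
    smallest distance to be represented."""
    return smallest_distance_m / 2.0


# Assumed figures: 0.5 mm error on a 1:25000 map.
error = ground_error_m(0.5, 25_000)
print(f"On-ground error: {error} m")

# Assumed study area of 20 km x 15 km at successively halved cell sizes.
for size in (50.0, 25.0, 12.5):
    print(f"{size} m cells: {cell_count(20_000, 15_000, size)} cells")

# Smallest feature of interest assumed to be 30 m across.
print(f"Largest suitable cell: {max_cell_size_m(30.0)} m")
```

With these assumed figures the on-ground error is 12.5 m, and the cell count rises from 120,000 at 50 m to 480,000 at 25 m and 1,920,000 at 12.5 m. The grid therefore reaches the 2 million cell guideline at about the same point that the cell size reaches the probable error.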
Themes
A lot of time and money can be wasted by seeking to build a database
which incorporates all known information about an area. It is appropriate
first to determine what questions the GIS will be required to answer and what
data is needed to answer those questions. For example, while geological
maps of an area may be available, they may be of no relevance to the specific
decision process.
Classifications
What numbers are to be attached to the grid
cells, and what will these numbers mean? They may refer to data which is:
Nominal/Categorical
Ordinal
Interval
Ratio
It is important to know that the range of analytical or modelling operations
which are available may be limited by the type of data measure (scale)
being used.
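This restriction can be written down as a simple lookup. The table below follows the conventional treatment of the four measurement scales, which is an assumption here because the text does not list the permitted operations; the names `INTRODUCED`, `permitted` and `can_apply` are hypothetical.

```python
# Scales ordered from weakest to strongest.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each scale (conventional view).
INTRODUCED = {
    "nominal": {"count", "mode"},
    "ordinal": {"rank", "median"},
    "interval": {"difference", "mean"},
    "ratio": {"ratio"},
}


def permitted(scale):
    """All operations meaningful at `scale`: its own plus those of every
    weaker scale."""
    operations = set()
    for s in SCALES[: SCALES.index(scale) + 1]:
        operations |= INTRODUCED[s]
    return operations


def can_apply(operation, scale):
    """True if `operation` is meaningful for data measured on `scale`."""
    return operation in permitted(scale)


print(can_apply("mean", "nominal"))    # land-use codes cannot be averaged
print(can_apply("mean", "interval"))   # temperatures in degrees C can
print(sorted(permitted("ordinal")))
```

For example, a grid of land-use codes is nominal, so averaging neighbouring cells produces a number with no meaning, whereas the same operation on an elevation grid is perfectly sound.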