Answers to Frequently Asked Questions about Sampling Methods And Area Frame Sampling Technology
General Questions about surveys
Question1: How can statisticians estimate specific characteristics of a population?
Statisticians obtain estimates of population characteristics by collecting data. There are two main methods used to collect data:
- Taking data from every unit in the population of interest (called a census)
- Taking data from a sample.
A census takes a great deal of time and costs a great deal of money. Taking a census is often not practical, and modern survey design has been developed to collect data less expensively and in a more timely manner. In addition, sample surveys are frequently more accurate at the national level because quality control of field data collection is better. Whether one is conducting a sample survey or a census, and no matter what kind of sampling units are selected, interviewers must collect data from entities called reporting units. Reporting units can be people, households, plots of land, or other items.
Question2: How does one select a sample?
There are several procedures used to select samples from a population. One way is to obtain units in an informal way such as interviewing people coming out of a bank or factory. When one samples in an informal way, objectivity and measures of accuracy are compromised and therefore the study loses credibility. Probability sampling is the preferred method.
Before one can undertake probability sampling, the population of interest is divided into sampling units (SUs) and probabilities are assigned to the SUs. This list of SUs, with a probability and an address for each SU, is called the sampling frame. The list can be made up of districts, villages, blocks, holdings, households, persons, fields, plots, or other sub-units of a population. Often sample designs have two or more stages of selection. First stage units are listed and sampled; then, within the selected first stage units, second stage units such as households are listed and a sample of them is selected for interview. The area frame (AF) is a list of land parcels.
The sample should represent all the diversity in the population of interest. Normally, we stratify the population so that we can ensure that the sample is representative. Without stratification, for example, we might list all the land units and happen to select a sample with too many units of one kind and not enough of another kind. Stratification of the land means that we subdivide the population into homogeneous groups. We select samples from each group so that every group is represented in the proportion we want. Normally, the population is stratified at many levels. For example, we can first divide the land into administrative areas, into areas with extensive damage and areas with less damage, or into farm areas and city areas. Strata can be any grouping that makes sense either for reporting or for reducing the variability when sampling. In AF sampling, each stratum is divided into area parcels, which are used as sampling units.
In order for the data to be objective, we need to select the sample using random numbers. Random selection allows us to compute sampling errors and to prove the objectivity of our sample. Sampling errors are defined later in this document.
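The selection described above can be sketched in a few lines of Python. This is only an illustration, assuming that each stratum is a simple list of numbered SUs and that every SU within a stratum has the same chance of selection. The stratum names, frame sizes, sample sizes, and seed are hypothetical.

```python
import random

def select_stratified_sample(frame, sample_sizes, seed):
    """Draw a simple random sample without replacement from each stratum.

    frame:        dict mapping stratum name -> list of sampling-unit ids
    sample_sizes: dict mapping stratum name -> number of units to select
    seed:         recorded so the selection can be reproduced and audited
    """
    rng = random.Random(seed)
    sample = {}
    for stratum in sorted(frame):  # fixed order keeps the draw reproducible
        units = frame[stratum]
        n = sample_sizes[stratum]
        if n > len(units):
            raise ValueError(f"stratum {stratum!r}: cannot select {n} of {len(units)} units")
        sample[stratum] = sorted(rng.sample(units, n))
    return sample

# Hypothetical frame: SUs numbered within each land use stratum.
frame = {
    "cropland": list(range(1, 1201)),
    "range": list(range(1, 301)),
    "city": list(range(1, 501)),
}
sample_sizes = {"cropland": 60, "range": 10, "city": 15}
sample = select_stratified_sample(frame, sample_sizes, seed=2024)
```

Recording the seed is a design choice: anyone can rerun the selection and confirm that the segments were chosen by random numbers and not by judgment.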
Question3: What types of errors are associated with statistical surveys?
There are two types of errors associated with surveys: sampling errors, which are a result of differences among samples, and nonsampling errors, which are a result of sampling frame errors, response errors, or mistakes in processing the data. Of the two, nonsampling errors are usually more important and more difficult to estimate and control. They are a major concern in any survey. The area frame (AF) is designed to control for both types of errors; however, special emphasis is placed on controlling nonsampling errors.
Question4: What is sampling error?
Sampling error is survey error associated with the sample. The following example illustrates sampling error.
An area of interest is divided into 30,000 segments of land and a random sample of 300 segments is selected. Data are obtained from the 300 segments and an estimate of wheat acreage is produced by multiplying the total wheat acreage found in the 300 segments by 30,000/300 = 100. If another 300 segments were selected, the estimate would be different. The differences in estimates among samples are called sampling errors.
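The arithmetic of this example can be sketched as follows. The 30,000 segments and the sample of 300 come from the example; the wheat acreages themselves are made-up values used only to show that two random samples give two different estimates.

```python
import random

N_SEGMENTS = 30_000   # segments in the area of interest (from the example)
SAMPLE_SIZE = 300     # segments enumerated (from the example)

def expansion_estimate(sample_values, population_size):
    """Expand a sample total to a population total: (N / n) * sample total."""
    expansion_factor = population_size / len(sample_values)
    return expansion_factor * sum(sample_values)

# Hypothetical population: wheat acres in each of the 30,000 segments.
rng = random.Random(1)
population = [rng.choice([0, 0, 5, 10, 20, 40]) for _ in range(N_SEGMENTS)]
true_total = sum(population)

# Two independent random samples of 300 segments, each expanded by 100.
estimate_a = expansion_estimate(rng.sample(population, SAMPLE_SIZE), N_SEGMENTS)
estimate_b = expansion_estimate(rng.sample(population, SAMPLE_SIZE), N_SEGMENTS)

print(f"true total: {true_total}")
print(f"estimate from sample A: {estimate_a:.0f}")
print(f"estimate from sample B: {estimate_b:.0f}")
```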
If the estimates vary considerably, then we would conclude that the estimator has a large variance or sampling error. Estimators with small sampling errors are most desirable. However, there is another criterion that also is important, the element of nonsampling error or bias.
It might seem that sampling errors cannot be determined unless other samples of 300 segments are selected and enumerated. However, with proper sampling techniques the variation can be measured with only one sample: the segment to segment variation is used to calculate the sample to sample variation. (This is one reason why use of random numbers to select the sample is so important.) One common measure of sampling error is the coefficient of variation (CV). It is a relative sampling error and is expressed as a percent, such as plus or minus 3 percent. For example, with such a margin of error, one might estimate maize to be 427,000 hectares (plus or minus 3 percent). Margins of error are related to the size of the sample: smaller samples produce larger margins of error than larger samples. With a sample size of 100, for example, the CV might be plus or minus 6 percent. Because the CV falls roughly in proportion to the square root of the sample size, a sample of about 400 might be required in order to reduce the CV to 3 percent.
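A minimal sketch of both ideas, assuming a simple random sample without replacement and a CV that is proportional to one over the square root of the sample size. The function names and the small data set are hypothetical; real AF designs are stratified, so the calculation would be repeated within each stratum.

```python
import math
import statistics

def expansion_cv(sample_values, population_size):
    """Estimate a total and its CV (in percent) from a single random sample."""
    n = len(sample_values)
    mean = statistics.fmean(sample_values)
    s2 = statistics.variance(sample_values)   # segment to segment variance
    fpc = 1 - n / population_size             # finite population correction
    total = population_size * mean
    standard_error = population_size * math.sqrt(fpc * s2 / n)
    return total, 100 * standard_error / total

def sample_size_for_cv(current_n, current_cv, target_cv):
    """Approximate sample size for a target CV, assuming CV ~ 1 / sqrt(n)."""
    return math.ceil(current_n * (current_cv / target_cv) ** 2)

# Hypothetical data: hectares of maize in four sampled segments out of 400.
total, cv = expansion_cv([10, 20, 30, 40], 400)
print(f"estimated total: {total:.0f} hectares, CV: {cv:.1f} percent")

# Halving the CV from 6 percent to 3 percent takes four times the sample.
print(sample_size_for_cv(100, 6.0, 3.0))   # 400
```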
Question5: What is nonsampling error (or bias)?
Nonsampling error is the difference between the center of the distribution that defines sampling error and the true value being estimated. This difference is defined as bias. Whether or not the true value being estimated is at the center of the sampling error distribution is controlled by:
- the completeness of the sampling frame,
- every element in the population having a known positive chance of selection,
- accuracy of data collection, recording, and data entry,
- the technical properties of the estimators.
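The difference between sampling error and bias can be illustrated with a small simulation. This is a sketch under stated assumptions: the population values are invented, and the only source of nonsampling error is an incomplete frame that omits 200 of 2,000 units.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: hectares of a crop on each of 2,000 units.
population = [rng.uniform(0, 50) for _ in range(2_000)]
true_total = sum(population)

# Incomplete frame: suppose the last 200 units were never listed.
frame = population[:1_800]

def expansion_estimate(values, n_units):
    """Expand a sample total by the number of units on the frame."""
    return n_units / len(values) * sum(values)

# Repeat the survey many times to trace out the sampling distribution.
estimates = [
    expansion_estimate(rng.sample(frame, 100), len(frame))
    for _ in range(2_000)
]

center = statistics.fmean(estimates)
sampling_error = statistics.stdev(estimates)   # spread of estimates around their center
bias = center - true_total                     # distance from the center to the true value

print(f"sampling error (standard deviation): {sampling_error:.0f}")
print(f"bias from the incomplete frame: {bias:.0f}")
```

Taking a larger sample would shrink the spread of the estimates but would leave the bias untouched, which is why frame completeness appears first in the list above.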
Question6: What is an area frame?
An area frame (AF) is a special case of cluster sampling. The SUs are areas of land commonly called segments or area parcels, which have identifiable boundaries. The goal is to divide the entire land area of interest into SUs and to select a sample of such segments. The process of area sampling is usually accomplished by selecting the sample in stages, an approach that avoids the necessity of dividing the entire population into segments. An AF is suitable for general purpose sampling and is designed for obtaining information about variables associated with land such as crops, livestock, forests, soil, and ground water. Households inside the segments can also be associated with land.
At the end of AF construction, a small sample of representative land segments or parcels will be selected for data collection. Typically, land within the selected segments accounts for less than one percent of the country's total land area. In the US, the sample accounts for one half of one percent of the land.
Question7: How is an AF constructed?
In order to construct an AF, both maps and satellite images of the land area are used. An AF is constructed as follows:
- The total land of the country or area of interest is divided into administrative areas.
- Satellite imagery is used to subdivide the land in administrative strata into land use strata such as cropland, range, woods, estates, cities and wastelands.
- Small sampling units (SUs) are constructed using maps. These SUs are numbered and a small sample of land parcels is selected in each stratum to represent the land in that stratum. The selected SUs are called segments. There are segments selected to represent all strata with land of interest to the data users.
Question8: How do you determine sample size?
The sample size is determined by the uses to be made of the data, survey resources, and survey management resources. The sample can be 150 to 3000 segments. These segments will differ in size from one stratum to the next. For example, in range land, segments may contain 400 hectares or more. In cropland, segments may contain 20 to 30 hectares. In cities, the segments may be less than a city block.
Question9: How are AF methods implemented?
Implementation of AF methods is straightforward:
- Divide the land area of interest into homogeneous farming systems and land use strata.
- Subdivide each stratum into (Ni) units of land without overlap or omission.
- Select a representative sample of (ni) land units called segments from each stratum.
- Collect the desired information from the segments without error.
- Estimate population totals by multiplying sample totals by the proper expansion factors (Ni/ni).
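The last step can be sketched as a direct expansion within each stratum, summed over strata. The stratum names, the values of Ni, and the segment data below are hypothetical.

```python
def stratified_total(strata):
    """Estimate a population total as the sum over strata of (Ni / ni) * sample total.

    strata: dict mapping stratum name -> (Ni, list of ni sampled segment values)
    """
    estimate = 0.0
    for name, (frame_units, sample_values) in strata.items():
        if not sample_values:
            raise ValueError(f"stratum {name!r} has no sampled segments")
        expansion_factor = frame_units / len(sample_values)   # Ni / ni
        estimate += expansion_factor * sum(sample_values)
    return estimate

# Hypothetical strata: (Ni, hectares of maize found in each sampled segment).
strata = {
    "cropland": (1200, [12.0, 8.5, 15.0, 10.5]),   # expansion factor 1200 / 4 = 300
    "range": (300, [0.0, 1.5]),                    # expansion factor 300 / 2 = 150
}
maize_total = stratified_total(strata)
print(maize_total)   # 300 * 46.0 + 150 * 1.5 = 14025.0
```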
Question10: What are advantages and disadvantages of AF?
Among sampling frames, the AF has the following advantages:
- The AF is permanent for 10 years or more.
- When employing AF methods, the same segments of land are used for many surveys and thus measure change accurately.
- The AF is useful for environmental data as well as agricultural data.
- Data collection in the field can be checked for quality.
- Data can be collected in an integrated fashion.
- There are several types of advanced methods of data collection that can be integrated with the AF method to provide specialized data on topics such as commercial crops and deforestation.
- AF data will withstand professional scrutiny.
- An AF is complete, has no duplication, and facilitates data collection in the field.
Disadvantages of AF are few, and are primarily related to the expense of setting up the system. However, over a period of 10 years an AF is more cost-effective than other sampling frames.
Question11: How can AF technology be used to enhance Geographic Information System (GIS)?
GIS technology allows one to establish logical relationships among layers of information digitized and entered into a computer. For example, roads of a country might be overlaid onto soil maps in order to identify areas where produce can be transported easily.
An essential part of a GIS system is the quality of the data entered. For example, some environmental parameters involve observing the invertebrates in the bottom of a stream. Such minute detail must be collected in a scientific manner before it is useful in a GIS. AAIC personnel are concerned with the quality of data going into the GIS. By using an AF, our data help GIS technology reflect more accurate relationships.
Question12: What data can the AF provide?
The AF is designed to collect data from the fields and farmland and from the households that are located inside the segments. For agricultural data, the first survey of each growing season is conducted after planting. Interviewers go to the segments and collect data on the number of hectares planted to each crop. At that time AAIC will recommend collecting data on soil erosion and land use.
Close to harvest, enumerators can go back to these same fields and do crop cutting surveys, which estimate yields and, subsequently, total production. Yield estimates also may be obtained by asking farmers what they harvested (i.e., farmer recall). In addition, we can employ agrometeorological (or agromet) yield models. These models simulate crop growth in the computer. Weather data, rainfall, solar radiation, crop variety, soil fertility, and good historic yield data are required. With several years of data, models are calibrated to conditions in a country and crop yields can be forecasted accurately.
Question13: Since animals are mobile, how do you estimate the number of head of livestock?
One can improve data from an AF for specific variables such as livestock by obtaining additional data through list sampling. When data are collected from both the AF and list frames, and combined removing any duplication, we call this method multiple frame sampling (MFS). Most countries improve data with MFS methods.
Question14: How are large estate farms surveyed?
MFS, discussed in the previous section, is employed when dealing with estate farms, or what we in the US refer to as extreme operators or extreme operations. A list of large farms is prepared and a sample is selected to represent the list. Data for the smaller farms are collected from the AF, and the two sets of data are combined in a way that avoids duplication. When MFS is employed, the advantages of list frames (efficient for farms on the list) and of the AF (complete for all items) are both obtained.
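One simple way to combine the two frames without duplication is sketched below. It is an illustration only, not necessarily the estimator used in practice: it assumes simple random samples from both frames, assumes that each farm's data can be assigned to a single segment, and counts AF data only for farms that are not on the list. All names and figures are hypothetical.

```python
def multiple_frame_total(list_frame_size, list_sample, af_frame_size,
                         af_segments, list_farm_ids):
    """Add a list frame estimate to an AF estimate for farms not on the list.

    list_sample:   values reported by the sampled list farms
    af_segments:   one list per sampled segment of (farm_id, value) pairs
    list_farm_ids: ids of every farm on the list frame
    """
    list_part = list_frame_size / len(list_sample) * sum(list_sample)
    # Count only farms that are NOT on the list, so no farm is represented twice.
    nonoverlap_totals = [
        sum(value for farm_id, value in segment if farm_id not in list_farm_ids)
        for segment in af_segments
    ]
    af_part = af_frame_size / len(af_segments) * sum(nonoverlap_totals)
    return list_part + af_part

# Hypothetical list frame: 40 estates (E1 to E40), 4 of them sampled.
list_farm_ids = {f"E{i}" for i in range(1, 41)}
list_sample = [500, 800, 650, 450]                 # head of cattle per sampled estate

# Hypothetical AF: 5,000 segments, 5 of them sampled.
af_segments = [
    [("f1", 10), ("E7", 700)],                     # E7 is on the list, so it is skipped
    [("f2", 0)],
    [("f3", 25), ("f4", 5)],
    [],
    [("f5", 20)],
]

cattle_total = multiple_frame_total(40, list_sample, 5000, af_segments, list_farm_ids)
print(cattle_total)   # 10 * 2400 + 1000 * 60 = 84000.0
```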
Question15: Can the AF system provide baseline, midterm and end of project data?
An AF can be constructed for a project area that provides baseline, midterm and end of project data. Because permanent plots are established, measures of change are both accurate and cost effective.
Question16: Will the AF help reduce the sample size?
The sample size of a survey is independent of the frame used. The sample size has more to do with the level at which summarized data are produced. When data are required for the smallest administrative levels, sample sizes are driven high because each administrative unit must have a sufficient sample. There are better ways to generate small area estimates that do not require increasing the sample past the point where the sample can be managed properly. An AF could also have a large sample, and it would then encounter some of the same problems that the current system encounters with respect to data quality and management. However, nonsampling errors are easier to control in AF surveys than in list frame surveys.
Question17: How does satellite imagery improve estimates of crop area?
Satellite imagery has two uses:
- to stratify land into land use strata,
- to improve estimates of land cover.
The first use, stratification, has already been discussed. This question deals with the use of digital satellite imagery to improve estimates of land cover and to reduce survey error. The AF is a perfect tool to provide representative ground data for statistical calibration for atmospheric and growth state.
In general, an AF makes estimates of populations based on small samples of segments. Satellite imagery covers an entire population (all sampling units), but the information is not perfect. That is, one has reflected energy, measured from 704 km above sea level, for all sampling units in the population. The reflected energy in the satellite imagery is used to reduce the sampling error of the estimates. With AF methods, one has actual ground observations. One uses reflected energy from known fields in order to calibrate the reflected energy, and then classifies the entire satellite image. The last step is to evaluate the misclassification of the satellite data using AF ground data and to adjust the full frame classification based on the misclassification identified in the AF segments, where ground truth is available.
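One common way to make this kind of adjustment is a regression estimator, sketched below. The text does not name the estimator, so this choice is an assumption. Ground data from the sampled segments are regressed on the classified hectares for the same segments, and the result is shifted using the classified mean over the whole frame. The function name and all figures are hypothetical.

```python
import statistics

def regression_estimate(ground, classified, classified_frame_mean, n_segments_in_frame):
    """Regression estimate of a total, using satellite classification as auxiliary data.

    ground:                hectares observed on the ground in each sampled segment
    classified:            hectares the classifier assigned to the crop in the same segments
    classified_frame_mean: mean classified hectares per segment over the whole frame
    """
    y_bar = statistics.fmean(ground)
    x_bar = statistics.fmean(classified)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(classified, ground))
    sxx = sum((x - x_bar) ** 2 for x in classified)
    slope = sxy / sxx
    # Shift the ground mean by how far the sample's classified mean
    # sits from the classified mean of the full frame.
    adjusted_mean = y_bar + slope * (classified_frame_mean - x_bar)
    return n_segments_in_frame * adjusted_mean

# Hypothetical segments where the classifier overstates the crop by 2 hectares.
ground = [10.0, 20.0, 30.0, 40.0]
classified = [12.0, 22.0, 32.0, 42.0]

# The full image shows more crop per segment (30) than the sample did (27),
# so the estimate is raised above the direct expansion of 1000 * 25 = 25000.
crop_total = regression_estimate(ground, classified, 30.0, 1000)
print(crop_total)
```

The stronger the relationship between ground data and classified data, the more the satellite information reduces the sampling error. If the classification were unrelated to the ground data, the slope would be near zero and the estimate would fall back to the direct expansion.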
Question18: What are the prerequisites of digital satellite imagery?
A few basic prerequisites must be met before a decision is made to use digital data to improve estimates of crop and land cover. If one prerequisite is lacking, the entire effort may be seriously hampered. The prerequisites are the following:
Current digital satellite images must be available for the area of interest. In most areas of the world, obtaining satellite data is not a major problem. Moreover, the new satellite systems provide a wide variety of resolution size and spectral bands that are available to solve problems.
The acquisition of imagery must be timed to correspond with the critical periods in the growing cycles when the spectral values of the crops allow them to be differentiated.
Ground data from an AF must be available.
Sufficient Field Size
Fields of sufficient size of all crops to be classified must be available in the segments.
Sound Statistical Methodology
Using digital data requires sound statistical methodology. The AF method used is one AAIC staff helped develop and one that is used by experienced personnel. Currently it is used in the United States by the National Agricultural Statistics Service and in Europe by the Joint Research Centre of CEC.
General Resources
Resources and personnel are required to implement the technology by:
- collecting data in the field and running survey operations,
- obtaining digital satellite and AF data,
- performing digital analysis.
If these requirements can be met, using digital satellite data to improve crop and land-cover estimates can be rewarding.
Question19: When Does Digital Image Analysis Become Cost Effective?
If the area of interest is rather small (contained within three to five TM scenes) or the land cover of interest is very important, then use of satellite imagery can be cost effective.
Question20: How can a group learn more about AF methods?
One possible approach would be to arrange for AAIC staff members to conduct a seminar at your local office or in DC on methods to estimate or forecast:
- crop production,
- crop yields,
- natural resources and environmental parameters,
- social and economic status,
- land tenure.
AAIC's professional staff can present a two hour or two day seminar or discuss individual cases with project managers and program officers. A combination of both is also possible.
Using AF technology, it is possible to develop a standard survey methodology for collecting accurate agricultural, natural resource, environmental, and social data. Such data can be compared across years and used to establish relationships among variables in different sectors.
AF methodology offers objective, accurate, comprehensive, and timely data at a reduced long-term cost. Surveys at the beginning of the project are based on a methodology that can be used again for mid-term and end of project data collection. This is crucial since different survey methods produce different survey results even if the population sampled is identical. In order to have comparable data, survey design and procedures must remain constant. AAIC would select segments at the beginning of the project to represent the project area and use those same segments throughout the project life so that we have a statistically controlled monitoring system.
In many areas of the world there is a great need for better crop and livestock statistics at the national, provincial and project levels. Most of us in AAIC have devoted our lives to these advanced data collection methods that generate timely, comprehensive, objective, credible, replicated (time series) data. You will find our work useful, cost effective, and able to generate data that can be defended scientifically.
Although surveys with specialized purposes are useful, a comprehensive general purpose system will allow project managers to compare data collected across years by different institutions. AAIC would welcome the opportunity to facilitate development of this kind of system in collaboration with donor projects.
For additional questions email us at Questions@aaic.net.