
Survey Sampling, Frames & Weighting

The Complete Guide to Representative Sampling

Sampling Fundamentals

Survey sampling is the process of selecting a subset of individuals from a target population to estimate characteristics of the whole population. Proper sampling enables researchers to make statistical inferences about populations of millions from samples of hundreds or thousands, with quantifiable margins of error.

Why Sampling Matters

A census (measuring every member of a population) is expensive, time-consuming, and often impractical. Sampling can reduce data-collection costs by 95-99% while retaining useful statistical precision. A well-designed probability sample of 1,000 can estimate national opinion to within about ±3 percentage points at 95% confidence.
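
The ±3% figure can be checked with the standard margin-of-error formula for a proportion. The snippet below is a minimal sketch that assumes simple random sampling, a 95% confidence level (Z = 1.96), and the worst-case proportion p = 0.5; the function name `margin_of_error` is illustrative only.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion under SRS."""
    return z * math.sqrt(p * (1 - p) / n)

# A sample of 1,000 at 95% confidence and p = 0.5
print(f"{margin_of_error(1000):.3f}")  # 0.031
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the margin of error.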

Probability vs Non-Probability Sampling

Characteristic | Probability Sampling | Non-Probability Sampling
Selection | Random, known probabilities | Non-random (convenience, quota, volunteer)
Representativeness | Supported by sampling theory (given good coverage and response) | Depends on assumptions
Margin of Error | Calculable | Not calculable (unknown bias)
Cost | High ($50-$200 per complete) | Low ($2-$50 per complete)
Use Cases | Election polls, government surveys | Market research, customer feedback

Sampling Frames

A sampling frame is a list or mechanism that defines all members of the target population from which the sample will be drawn. Frame quality directly determines sample quality.

Types of Sampling Frames

Area (Geographic) Frames

Best for: Face-to-face (CAPI) surveys, door-to-door canvassing

Frame Sources:
  • Census blocks/tracts: Geographic units from national census
  • Enumeration areas: Administrative boundaries
  • GPS coordinate grids: Latitude/longitude points
  • Satellite imagery: Building footprints from aerial photos

PollZapper AtlasSampler™:

Generate GPS waypoints using:

  1. Import census boundaries (GeoJSON/Shapefile)
  2. Add population density layers
  3. Select sampling method (PPS, systematic, random)
  4. Generate N waypoints with selection probabilities
  5. Export routes for CAPI interviewers
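
To make step 3 and step 4 more concrete, here is a rough sketch of systematic PPS (probability proportional to size) selection over a list of areas. It is not AtlasSampler's implementation; the function name `pps_systematic`, the area labels, and the population counts are all hypothetical, and the sketch assumes no single area is larger than the sampling interval (otherwise that area can be selected more than once).

```python
import random
from typing import Dict, List, Optional, Tuple

def pps_systematic(sizes: Dict[str, int], n: int,
                   seed: Optional[int] = None) -> List[Tuple[str, float]]:
    """Pick n units with probability proportional to size.

    Returns (unit, selection_probability) pairs.
    """
    rng = random.Random(seed)
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval            # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    picks = []
    cumulative = 0.0
    idx = 0
    for unit, size in sizes.items():
        cumulative += size
        # Select the unit for every target that falls inside its size range
        while idx < n and targets[idx] < cumulative:
            picks.append((unit, min(1.0, n * size / total)))
            idx += 1
    return picks

# Hypothetical enumeration areas with population counts
areas = {"EA-01": 100, "EA-02": 300, "EA-03": 200, "EA-04": 400}
for unit, prob in pps_systematic(areas, n=2, seed=7):
    print(unit, round(prob, 2))
```

Keeping the selection probability next to each selected unit matters later: its inverse is the base design weight.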

Address-Based Frames

Best for: Mail surveys, mixed-mode (mail + web)

Frame Sources:
  • USPS Delivery Sequence File: 95%+ coverage of U.S. addresses
  • Postal code databases: National postal address lists
  • Property tax records: Government assessment files
Advantages:
  • High coverage (includes cell-only households)
  • Known selection probabilities
  • No phone number required
Challenges:
  • Expensive ($500-$2,000 per 1,000 addresses)
  • Requires multiple contact modes (mail invitation → web survey)
  • Low response rates (5-15% typical)

Telephone Frames

Best for: CATI (telephone surveys)

RDD (Random Digit Dialing):

Generate phone numbers by:

  1. Identify active area codes and exchanges
  2. Randomly generate last 4 digits
  3. Filter out business numbers and non-working numbers
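
Steps 1 and 2 can be sketched as follows. This is an illustration only: the prefixes are made-up area code + exchange combinations, the function name `generate_rdd_numbers` is hypothetical, and step 3 (screening business and non-working numbers) is not shown because it depends on external lookup data.

```python
import random
from typing import List, Optional

def generate_rdd_numbers(prefixes: List[str], n: int,
                         seed: Optional[int] = None) -> List[str]:
    """Generate n distinct 10-digit numbers from 6-digit area code + exchange prefixes."""
    if n > len(set(prefixes)) * 10_000:
        raise ValueError("not enough possible numbers for the requested sample")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        prefix = rng.choice(prefixes)
        suffix = rng.randrange(10_000)          # random last 4 digits, 0000-9999
        numbers.add(f"{prefix}{suffix:04d}")
    return sorted(numbers)

# Hypothetical active prefixes
sample = generate_rdd_numbers(["416555", "647555"], n=5, seed=1)
print(sample)
```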
Coverage Issues:
  • Landline-only RDD: <50% household coverage (declining)
  • Cell phone RDD: Legal restrictions, higher costs
  • Dual frame (landline + cell): 85-95% coverage, complex weighting

List Frames

Best for: Specialized populations with existing lists

Examples:
  • Voter registration files: Election polling (90%+ coverage of likely voters)
  • Professional associations: Doctors, lawyers, teachers
  • Customer databases: Purchase history, loyalty programs
  • Student directories: University/school enrollment
Quality Considerations:
  • Coverage: What % of population is on the list?
  • Currency: How recent is the list? (>6 months = outdated)
  • Accuracy: Incorrect addresses/phones reduce response rates

Sampling Methods

Simple Random Sampling (SRS)

How it works: Every member of the population has an equal probability of selection. Assign numbers 1-N to all population members, then use a random number generator to select the sample.
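
A minimal sketch of that procedure, using Python's standard `random` module. The population size, sample size, and function name are illustrative assumptions.

```python
import random
from typing import List, Optional

def simple_random_sample(population_size: int, n: int,
                         seed: Optional[int] = None) -> List[int]:
    """Draw n distinct member numbers from 1..population_size without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))

# Select 10 members from a numbered list of 5,000
print(simple_random_sample(5000, 10, seed=42))
```

Fixing the seed makes the draw reproducible, which is useful for auditing a sample after fieldwork.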

Advantages:
  • Easiest to understand and explain
  • Unbiased estimates
  • Simple statistical formulas
Disadvantages:
  • Requires complete population list
  • May miss rare subgroups
  • Inefficient for heterogeneous populations

Stratified Random Sampling

How it works: Divide the population into strata (age groups, regions, etc.), then draw a random sample within each stratum. This ensures representation of all groups.

Example:
Region | Population % | Sample (n=1,000)
Northeast | 17% | 170
South | 38% | 380
Midwest | 21% | 210
West | 24% | 240
When to Use:
  • Need guaranteed representation of subgroups
  • Population has known heterogeneity
  • Subgroup comparisons are research objectives
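
The regional example above uses proportional allocation, which can be sketched as below. The function name `proportional_allocation` is illustrative; leftover interviews from rounding are given to the strata with the largest fractional remainders, which is one common convention rather than the only one.

```python
import math
from typing import Dict

def proportional_allocation(population_shares: Dict[str, float], n: int) -> Dict[str, int]:
    """Split a total sample of n across strata in proportion to population share."""
    total = sum(population_shares.values())
    raw = {s: n * share / total for s, share in population_shares.items()}
    alloc = {s: math.floor(x) for s, x in raw.items()}
    leftover = n - sum(alloc.values())
    # Hand out any remaining interviews to the largest fractional remainders
    by_fraction = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for s in by_fraction[:leftover]:
        alloc[s] += 1
    return alloc

regions = {"Northeast": 17, "South": 38, "Midwest": 21, "West": 24}  # population %
print(proportional_allocation(regions, 1000))
# {'Northeast': 170, 'South': 380, 'Midwest': 210, 'West': 240}
```

Within each stratum, the allocated number of respondents would then be drawn by simple random sampling.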

Cluster Sampling

How it works: Divide population into clusters (schools, neighborhoods), randomly select clusters, then survey all or sample within selected clusters.

Multi-Stage Example (National Survey):
  1. Stage 1: Randomly select 50 counties (PSUs = Primary Sampling Units)
  2. Stage 2: Within each county, select 10 census blocks
  3. Stage 3: Within each block, select 5 households
  4. Result: 50 × 10 × 5 = 2,500 households sampled
Design Effect:

Cluster sampling increases sampling error relative to SRS. A design effect (Deff) of 1.5-3.0 is typical. Effective n = actual n ÷ Deff. Example: n=1,000 with Deff=2.0 has an effective n of 500, so the MOE is ±4.4% instead of ±3.1%.
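
The arithmetic in that example can be reproduced with a short sketch, assuming 95% confidence (Z = 1.96) and p = 0.5. Function names are illustrative.

```python
import math

def effective_sample_size(n: int, deff: float) -> float:
    """Sample size of an SRS with the same precision as the complex design."""
    return n / deff

def margin_of_error_with_deff(n: int, deff: float, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion, inflated by the design effect."""
    n_eff = effective_sample_size(n, deff)
    return z * math.sqrt(p * (1 - p) / n_eff)

print(effective_sample_size(1000, 2.0))                 # 500.0
print(f"{margin_of_error_with_deff(1000, 2.0):.3f}")    # 0.044
print(f"{margin_of_error_with_deff(1000, 1.0):.3f}")    # 0.031
```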

Quota Sampling (Non-Probability)

How it works: Set quotas for demographic groups (age, gender, region), then fill the quotas through convenience sampling until the targets are met.

Warning: Quota sampling is a non-probability method, so a true margin of error cannot be calculated. It requires post-stratification weighting to approximate representativeness.

When Used:
  • Online panel surveys (opt-in respondents)
  • Market research (speed over precision)
  • Low budget studies
Quality Controls:
  • Interlocking quotas (age × gender × region)
  • Source diversity (multiple panel providers)
  • Weighting to known population benchmarks
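
As an illustration of interlocking quotas, the sketch below accepts respondents in arrival order until every age × gender cell is full. The field names (`age`, `gender`), the cell targets, and the function name `fill_quotas` are hypothetical; a real panel workflow would also handle screening and source mixing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def fill_quotas(stream: Iterable[dict], quotas: Dict[Tuple[str, str], int]) -> List[dict]:
    """Accept respondents in arrival order while their quota cell still has room."""
    counts = Counter()
    accepted = []
    target = sum(quotas.values())
    for person in stream:
        cell = (person["age"], person["gender"])
        if counts[cell] < quotas.get(cell, 0):
            counts[cell] += 1
            accepted.append(person)
            if len(accepted) == target:
                break                      # all cells are full
    return accepted

quotas = {("18-34", "F"): 1, ("18-34", "M"): 1, ("35+", "F"): 1, ("35+", "M"): 1}
arrivals = [
    {"age": "18-34", "gender": "F"},
    {"age": "18-34", "gender": "F"},   # rejected: cell already full
    {"age": "35+", "gender": "M"},
    {"age": "18-34", "gender": "M"},
    {"age": "35+", "gender": "F"},
]
print(len(fill_quotas(arrivals, quotas)))  # 4
```

Who gets accepted depends entirely on who shows up first, which is why this design cannot support a true margin of error.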

Sample Size Determination

Basic Formula

n = (Z² × p × (1-p)) / E²

Where:

  • n = required sample size
  • Z = Z-score for confidence level (1.96 for 95% confidence, 2.576 for 99%)
  • p = estimated proportion (use 0.5 for maximum variance)
  • E = desired margin of error (0.03 for ±3%)
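
The formula translates directly into code. This sketch rounds up to the next whole respondent; the small `round(..., 6)` step is only there to guard against floating-point noise, and the function name is illustrative.

```python
import math

def required_sample_size(moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """n = Z^2 * p * (1 - p) / E^2, rounded up to a whole respondent."""
    n = (z ** 2) * p * (1 - p) / (moe ** 2)
    return math.ceil(round(n, 6))

print(required_sample_size(0.03))            # 1068
print(required_sample_size(0.05))            # 385
print(required_sample_size(0.03, z=2.576))   # 1844
```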

Common Sample Sizes

Margin of Error | 95% Confidence | 99% Confidence | Typical Use
±1% | 9,604 | 16,590 | Government census surveys
±2% | 2,401 | 4,148 | Large-scale tracking studies
±3% | 1,068 | 1,844 | National political polls
±4% | 601 | 1,037 | Regional surveys
±5% | 385 | 664 | Statewide polls
±10% | 97 | 166 | Pilot studies

Important Considerations:
  • Response rate: At a 30% response rate, you need to contact about 3,334 people to get 1,000 completes (1,000 ÷ 0.30)
  • Subgroup analysis: Need roughly 400 per subgroup for ±5% MOE within groups
  • Design effects: Multiply the required n by Deff for cluster/stratified (complex) designs
  • Population size: Finite population correction applies for small populations
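
These adjustments can be sketched as small helper functions. The names are illustrative, and the finite population correction shown is the standard form n = n0 / (1 + (n0 - 1) / N), where n0 is the unadjusted sample size and N is the population size.

```python
import math

def contacts_needed(completes: int, response_rate: float) -> int:
    """Number of people to contact to expect the target number of completes."""
    return math.ceil(completes / response_rate)

def inflate_for_design(n0: int, deff: float) -> int:
    """Sample size needed under a complex design with the given design effect."""
    return math.ceil(n0 * deff)

def apply_fpc(n0: int, population: int) -> int:
    """Finite population correction for small populations."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(contacts_needed(1000, 0.30))   # 3334
print(inflate_for_design(385, 2.0))  # 770
print(apply_fpc(385, 2000))          # 323
```

For very large populations the correction has almost no effect, which is why national polls can ignore it.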

Statistical Weighting

Weighting adjusts sample data to match known population characteristics, correcting for sampling and non-response bias.

Why Weight?

  • Unequal selection probabilities: Some groups oversampled (by design)
  • Differential non-response: Young people less likely to respond
  • Frame coverage errors: Cell-only households underrepresented in landline samples

Post-Stratification Weighting

How it works: Compare sample demographics to known population benchmarks (census data), then calculate weights to adjust.

Example:
Age Group | Population % | Sample % | Weight
18-29 | 25% | 15% | 1.67
30-49 | 35% | 30% | 1.17
50-64 | 25% | 35% | 0.71
65+ | 15% | 20% | 0.75

Weight = Population % ÷ Sample %. Young respondents count 1.67x, seniors count 0.75x.
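
The weights in the table follow from one division per group, as this minimal sketch shows. The dictionary layout and function name are illustrative assumptions.

```python
from typing import Dict

def post_stratification_weights(population_pct: Dict[str, float],
                                sample_pct: Dict[str, float]) -> Dict[str, float]:
    """Weight for each group = population share / sample share."""
    return {g: population_pct[g] / sample_pct[g] for g in population_pct}

population = {"18-29": 25, "30-49": 35, "50-64": 25, "65+": 15}
sample = {"18-29": 15, "30-49": 30, "50-64": 35, "65+": 20}

for group, w in post_stratification_weights(population, sample).items():
    print(group, round(w, 2))
```

After weighting, each group's weighted share of the sample equals its population share (for example, 15% × 1.67 ≈ 25% for ages 18-29).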

Raking (Iterative Proportional Fitting)

Purpose: Weight to multiple variables simultaneously (age, gender, race, education, region).

Algorithm:
  1. Weight to age distribution
  2. Weight to gender distribution (age weights still applied)
  3. Weight to race distribution
  4. Weight to education distribution
  5. Repeat cycles until convergence (marginals match targets)

PollZapper: Built-in raking calculator with automatic convergence (typically 5-10 iterations).
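
The raking cycle described above can be sketched in plain Python. This is not PollZapper's calculator; the function name `rake`, the data layout, and the convergence tolerance are assumptions, and the sketch assumes every target category has at least one respondent and that target shares sum to 1 for each variable.

```python
from typing import Dict, List, Tuple

def rake(respondents: List[dict], targets: Dict[str, Dict[str, float]],
         max_iter: int = 50, tol: float = 1e-6) -> Tuple[List[float], int]:
    """Iterative proportional fitting. Returns (weights, passes_used)."""
    weights = [1.0] * len(respondents)
    for iteration in range(1, max_iter + 1):
        max_gap = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable
            totals: Dict[str, float] = {}
            for r, w in zip(respondents, weights):
                totals[r[var]] = totals.get(r[var], 0.0) + w
            grand = sum(totals.values())
            # Record how far this margin was from its target before adjusting
            for cat, share in shares.items():
                max_gap = max(max_gap, abs(totals[cat] / grand - share))
            # Scale weights so this variable's margins match the targets
            factors = {cat: share * grand / totals[cat] for cat, share in shares.items()}
            weights = [w * factors[r[var]] for r, w in zip(respondents, weights)]
        if max_gap < tol:
            return weights, iteration
    return weights, max_iter

people = [
    {"age": "young", "gender": "F"}, {"age": "young", "gender": "M"},
    {"age": "old", "gender": "F"}, {"age": "old", "gender": "M"},
    {"age": "old", "gender": "M"},
]
targets = {"age": {"young": 0.5, "old": 0.5}, "gender": {"F": 0.5, "M": 0.5}}
weights, passes = rake(people, targets)
print([round(w, 3) for w in weights], passes)
```

Adjusting one variable disturbs the margins of the others, which is why the cycle repeats until all margins are within tolerance.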

Weight Trimming

Why Trim?

Extreme weights (>5) increase variance and give too much influence to individual respondents.

Trimming approaches:

  • Cap weights: Maximum weight = 3 or 5
  • Winsorize: Set extreme weights to 95th percentile
  • Prune sample: Remove respondents with weights >5 (not recommended)
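
A minimal sketch of the capping approach, together with Kish's approximate design effect from weighting (n × Σw² ÷ (Σw)²) to show the variance benefit. The function names and example weights are illustrative. Note that rescaling after capping can push a capped weight slightly above the cap again, so production procedures often repeat the cap-and-rescale step.

```python
from typing import List

def trim_weights(weights: List[float], cap: float = 5.0) -> List[float]:
    """Cap weights at `cap`, then rescale so the total weight is unchanged."""
    total = sum(weights)
    capped = [min(w, cap) for w in weights]
    scale = total / sum(capped)
    return [w * scale for w in capped]

def kish_design_effect(weights: List[float]) -> float:
    """Approximate variance inflation caused by unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

raw = [0.5, 0.8, 1.0, 1.2, 9.0]
trimmed = trim_weights(raw, cap=5.0)
print(round(kish_design_effect(raw), 2))      # 2.7
print(round(kish_design_effect(trimmed), 2))  # 1.96
```

Trimming trades a small amount of bias (the weighted margins no longer match the targets exactly) for lower variance.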

AtlasSampler™ in PollZapper

Geographic Frame Builder

AtlasSampler™ enables researchers to generate statistically valid geographic sampling frames for CAPI surveys without GIS expertise or expensive consultants.

Frame Sources
  • Census enumeration areas
  • WorldPop population density grids
  • OpenStreetMap building footprints
  • Custom GeoJSON/Shapefile uploads
  • Admin boundaries (countries/provinces)
Sampling Methods
  • PPS: Probability proportional to size (population)
  • Systematic: Grid-based with random start
  • Stratified: By district/urban-rural
  • Cluster: Multi-stage area sampling
Territory Management
  • Assign waypoints to interviewers
  • Optimize routes to minimize travel
  • Balance workload across team
  • Track completion by territory
Built-in Weighting
  • Upload population benchmarks (CSV)
  • Raking algorithm with convergence
  • Automatic design effect calculation
  • Weight trimming options
  • Export weighted SPSS/Stata files

Ready to Build Representative Samples?

PollZapper's AtlasSampler™ and weighting tools enable professional sampling without GIS consultants or statistical software, from 400-interview local surveys to 5,000-interview national studies.


©2024 - PollZapper.com. All Rights Reserved.