Survey Sampling, Frames & Weighting

Sampling Fundamentals

Survey sampling is the process of selecting a subset of individuals from a target population to estimate characteristics of the whole population. Proper sampling enables researchers to make statistical inferences about populations of millions from samples of hundreds or thousands, with quantifiable margins of error.

Why Sampling Matters

A census (measuring every member of a population) is expensive, time-consuming, and often impractical. Sampling can cut data-collection costs by 95-99% while keeping estimates precise enough for most decisions. A well-designed probability sample of 1,000 can estimate national opinion within about ±3 percentage points at 95% confidence.

Probability vs Non-Probability Sampling

Characteristic	Probability Sampling	Non-Probability Sampling
Selection	Random, known probabilities	Non-random (convenience, quota, volunteer)
Representativeness	Supported by design (given good coverage and response)	Depends on modeling assumptions
Margin of Error	Calculable	Not calculable (unknown bias)
Cost	High ($50-$200 per complete)	Low ($2-$50 per complete)
Use Cases	Election polls, government surveys	Market research, customer feedback

Sampling Frames

A sampling frame is a list or mechanism that defines all members of the target population from which the sample will be drawn. Frame quality directly determines sample quality.

Types of Sampling Frames

1. Geographic Frames (Area Sampling)

Best for: Face-to-face (CAPI) surveys, door-to-door canvassing

Frame Sources:

Census blocks/tracts: Geographic units from national census
Enumeration areas: Administrative boundaries
GPS coordinate grids: Latitude/longitude points
Satellite imagery: Building footprints from aerial photos

PollZapper AtlasSampler™:

Generate GPS waypoints in five steps:

Import census boundaries (GeoJSON/Shapefile)
Add population density layers
Select sampling method (PPS, systematic, random)
Generate N waypoints with selection probabilities
Export routes for CAPI interviewers
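
To make the PPS option concrete, here is a minimal sketch of systematic probability-proportional-to-size selection. It is not AtlasSampler's implementation: the function name `pps_systematic`, the enumeration-area labels, and the population counts are all hypothetical, and the code assumes each unit's size is a positive number.

```python
import random

def pps_systematic(sizes, n, seed=None):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit id -> size measure (e.g. population count).
    Returns a list of (unit, selection_probability) pairs.
    """
    rng = random.Random(seed)
    total = sum(sizes.values())
    interval = total / n                      # sampling interval on the size scale
    start = rng.random() * interval           # random start in [0, interval)
    targets = iter(start + i * interval for i in range(n))
    target = next(targets, None)

    selected = []
    cumulative = 0
    for unit, size in sizes.items():
        cumulative += size
        # A unit is hit whenever a target falls inside its cumulative range.
        # Units larger than the interval can be hit more than once; real
        # designs usually set those aside as certainty selections.
        while target is not None and target < cumulative:
            selected.append((unit, min(1.0, n * size / total)))
            target = next(targets, None)
    return selected

# Hypothetical enumeration areas with made-up population counts
areas = {f"EA-{i}": 100 + 10 * i for i in range(20)}
for unit, prob in pps_systematic(areas, 5, seed=42):
    print(unit, round(prob, 3))
```

Keeping the selection probability next to each selected unit matters later: the base weight for a unit is the inverse of that probability.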

2. Address-Based Sampling (ABS)

Best for: Mail surveys, mixed-mode (mail + web)

Frame Sources:

USPS Delivery Sequence File: 95%+ coverage of U.S. addresses
Postal code databases: National postal address lists
Property tax records: Government assessment files

Advantages:

High coverage (includes cell-only households)
Known selection probabilities
No phone number required

Challenges:

Expensive ($500-$2,000 per 1,000 addresses)
Requires multiple contact modes (mail invitation → web survey)
Low response rates (5-15% typical)

3. Telephone Frames (RDD)

Best for: CATI (telephone surveys)

RDD (Random Digit Dialing):

Generate phone numbers in three steps:

Identify active area codes and exchanges
Randomly generate last 4 digits
Filter business numbers and non-working numbers
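
The three steps above can be sketched in a few lines. This is an illustration only: `generate_rdd_numbers` is a hypothetical helper, the area code and exchange pairs are placeholders, and the `excluded` set stands in for whatever business and non-working number screening a real vendor would provide.

```python
import random

def generate_rdd_numbers(prefixes, n, excluded=frozenset(), seed=None):
    """Generate n distinct phone numbers by random digit dialing.

    prefixes: list of (area_code, exchange) string pairs assumed to be active.
    excluded: numbers already known to be business or non-working lines.
    """
    # Conservative capacity check so the loop below always terminates.
    capacity = len(set(prefixes)) * 10_000 - len(excluded)
    if n > capacity:
        raise ValueError("not enough distinct numbers available")

    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        area, exchange = rng.choice(prefixes)
        last_four = rng.randrange(10_000)     # uniform over 0000-9999
        candidate = f"{area}-{exchange}-{last_four:04d}"
        if candidate not in excluded:         # step 3: filter screened numbers
            numbers.add(candidate)
    return sorted(numbers)

# Placeholder prefixes, not a real frame
sample = generate_rdd_numbers([("212", "555"), ("415", "555")], 10, seed=7)
print(sample)
```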

Coverage Issues:

Landline-only RDD: <50% household coverage (declining)
Cell phone RDD: Legal restrictions, higher costs
Dual frame (landline + cell): 85-95% coverage, complex weighting

4. List Frames

Best for: Specialized populations with existing lists

Examples:

Voter registration files: Election polling (90%+ coverage of likely voters)
Professional associations: Doctors, lawyers, teachers
Customer databases: Purchase history, loyalty programs
Student directories: University/school enrollment

Quality Considerations:

Coverage: What % of population is on the list?
Currency: How recent is the list? (>6 months = outdated)
Accuracy: Incorrect addresses/phones reduce response rates

Sampling Methods

Simple Random Sampling (SRS)

How it works: Every member of the population has an equal probability of selection. Assign the numbers 1 to N to all population members, then use a random number generator to select the sample.

Advantages:

Easiest to understand and explain
Unbiased estimates
Simple statistical formulas

Disadvantages:

Requires complete population list
May miss rare subgroups
Inefficient for heterogeneous populations
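
As a minimal sketch of the SRS procedure described above, the snippet below numbers the population 1 to N and draws without replacement using Python's standard `random` module. The function name and the population size are illustrative assumptions.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct member numbers from 1..population_size."""
    rng = random.Random(seed)                 # seed only for reproducibility
    # Random.sample draws without replacement, so no member repeats.
    return sorted(rng.sample(range(1, population_size + 1), n))

# Hypothetical list frame of 50,000 members
selected = simple_random_sample(50_000, 10, seed=2024)
print(selected)
```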

Stratified Random Sampling

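How it works: Divide the population into strata (age groups, regions, etc.), then draw a random sample within each stratum. This ensures every stratum is represented. The example below allocates the sample in proportion to each region's population share.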

Example:

Region	Population %	Sample (n=1,000)
Northeast	17%	170
South	38%	380
Midwest	21%	210
West	24%	240
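
The proportional allocation in the table above can be reproduced with a short function. This sketch uses the largest-remainder rule so the stratum sizes always add up to n, which is one reasonable choice among several; the function name is an assumption.

```python
def proportional_allocation(shares, n):
    """Split a total sample of n across strata in proportion to their shares.

    shares: dict mapping stratum -> population share (any consistent unit).
    """
    total = sum(shares.values())
    exact = {k: v * n / total for k, v in shares.items()}
    alloc = {k: int(x) for k, x in exact.items()}      # round down first
    leftover = n - sum(alloc.values())
    # Hand remaining interviews to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

regions = {"Northeast": 17, "South": 38, "Midwest": 21, "West": 24}
print(proportional_allocation(regions, 1000))
# {'Northeast': 170, 'South': 380, 'Midwest': 210, 'West': 240}
```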

When to Use:

Need guaranteed representation of subgroups
Population has known heterogeneity
Subgroup comparisons are research objectives

Cluster Sampling

How it works: Divide population into clusters (schools, neighborhoods), randomly select clusters, then survey all or sample within selected clusters.

Multi-Stage Example (National Survey):

Stage 1: Randomly select 50 counties (PSUs = Primary Sampling Units)
Stage 2: Within each county, select 10 census blocks
Stage 3: Within each block, select 5 households
Result: 50 × 10 × 5 = 2,500 households sampled
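
The three stages can be sketched against a toy nested frame. Everything here is hypothetical: the frame is synthetic, the function name is made up, and each stage uses equal-probability selection for simplicity, whereas national designs often select counties with probability proportional to size.

```python
import random

def multistage_sample(frame, n_counties, n_blocks, n_households, seed=None):
    """Three-stage sample from a nested frame: county -> block -> households."""
    rng = random.Random(seed)
    selected = []
    for county in rng.sample(sorted(frame), n_counties):          # stage 1: PSUs
        blocks = frame[county]
        for block in rng.sample(sorted(blocks), n_blocks):        # stage 2
            for household in rng.sample(blocks[block], n_households):  # stage 3
                selected.append((county, block, household))
    return selected

# Synthetic frame: 60 counties x 12 blocks x 8 households
frame = {
    f"county-{c}": {
        f"block-{b}": [f"hh-{c}-{b}-{h}" for h in range(8)]
        for b in range(12)
    }
    for c in range(60)
}
sample = multistage_sample(frame, 50, 10, 5, seed=1)
print(len(sample))  # 2500
```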

Design Effect:

Cluster sampling increases sampling error relative to SRS because respondents within a cluster tend to resemble one another. The design effect (Deff) is typically 1.5-3.0. Effective n = actual n ÷ Deff. Example: n=1,000 with Deff=2.0 has effective n=500, so the 95% MOE is ±4.4% instead of ±3.1%.
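
The arithmetic in that example can be checked with a small helper. This is a sketch assuming a proportion near 0.5 and a 95% confidence level; the function name is illustrative.

```python
import math

def margin_of_error(n, deff=1.0, z=1.96, p=0.5):
    """Margin of error for a proportion, using effective n = n / deff."""
    n_eff = n / deff
    return z * math.sqrt(p * (1 - p) / n_eff)

srs = margin_of_error(1000)                  # simple random sample
clustered = margin_of_error(1000, deff=2.0)  # same n, design effect of 2
print(f"SRS: ±{srs:.1%}, clustered: ±{clustered:.1%}")
# SRS: ±3.1%, clustered: ±4.4%
```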

Quota Sampling (Non-Probability)

How it works: Set quotas for demographic groups (age, gender, region), then fill quotas through convenience sampling until the targets are met.

Warning: Quota sampling is non-probability. Cannot calculate true margin of error. Requires post-stratification weighting to approximate representativeness.

When Used:

Online panel surveys (opt-in respondents)
Market research (speed over precision)
Low budget studies

Quality Controls:

Interlocking quotas (age × gender × region)
Source diversity (multiple panel providers)
Weighting to known population benchmarks

Sample Size Determination

Basic Formula

n = (Z² × p × (1-p)) / E²

Where:

n = required sample size
Z = Z-score for confidence level (1.96 for 95% confidence)
p = estimated proportion (use 0.5 for maximum variance)
E = desired margin of error (0.03 for ±3%)
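
As a quick check on the formula, the sketch below computes n for a given margin of error and rounds up to the next whole respondent. The function name is an assumption; the defaults follow the definitions above.

```python
import math

def required_sample_size(margin_of_error, z=1.96, p=0.5):
    """Smallest whole n with n >= z^2 * p * (1 - p) / E^2."""
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Round to 6 decimals first so floating-point noise cannot push
    # an exact result up to the next integer.
    return math.ceil(round(n, 6))

print(required_sample_size(0.03))   # 1068
print(required_sample_size(0.05))   # 385
```

Using p = 0.5 gives the largest possible n for a given margin, which is why it is the usual default when the true proportion is unknown.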

Common Sample Sizes

Margin of Error	95% Confidence	99% Confidence	Typical Use
±1%	9,604	16,590	Government census surveys
±2%	2,401	4,148	Large-scale tracking studies
±3%	1,068	1,844	National political polls
±4%	601	1,037	Regional surveys
±5%	385	664	Statewide polls
±10%	97	166	Pilot studies

Important Considerations:

Response rate: At a 30% response rate, about 3,334 contacts are needed to get 1,000 completes (1,000 ÷ 0.30)
Subgroup analysis: Need 400+ per subgroup for ±5% MOE within groups
Design effects: Multiply n by Deff for complex designs (cluster samples typically have Deff > 1; stratified samples can have Deff < 1)
Population size: Finite population correction applies for small populations
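
Two of these adjustments are simple enough to sketch directly. The helper names are hypothetical, and the finite population correction shown is the common form n = n0 / (1 + (n0 - 1) / N), where n0 is the sample size from the basic formula and N is the population size.

```python
import math

def contacts_needed(target_completes, response_rate):
    """Number of people to contact to expect the target number of completes."""
    return math.ceil(target_completes / response_rate)

def finite_population_correction(n0, population_size):
    """Reduce the basic-formula sample size n0 for a small population."""
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))

print(contacts_needed(1000, 0.30))
print(finite_population_correction(385, 2000))   # ±5% MOE in a population of 2,000
```

The correction only matters when the sample is a noticeable fraction of the population; for national populations it changes almost nothing.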

Statistical Weighting

Weighting adjusts sample data to match known population characteristics, correcting for sampling and non-response bias.

Why Weight?

Unequal selection probabilities: Some groups oversampled (by design)
Differential non-response: Young people less likely to respond
Frame coverage errors: Cell-only households underrepresented in landline samples

Post-Stratification Weighting

How it works: Compare sample demographics to known population benchmarks (census data), then calculate weights to adjust.

Example:

Age Group	Population %	Sample %	Weight
18-29	25%	15%	1.67
30-49	35%	30%	1.17
50-64	25%	35%	0.71
65+	15%	20%	0.75

Weight = Population % ÷ Sample %. Young respondents count 1.67x, seniors count 0.75x.
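
The weights in the table follow directly from that ratio. A minimal sketch, with an illustrative function name and the table's percentages as input:

```python
def poststratification_weights(population_pct, sample_pct):
    """Weight for each group = population share / sample share."""
    return {g: population_pct[g] / sample_pct[g] for g in population_pct}

population = {"18-29": 25, "30-49": 35, "50-64": 25, "65+": 15}
sample = {"18-29": 15, "30-49": 30, "50-64": 35, "65+": 20}

for group, weight in poststratification_weights(population, sample).items():
    print(group, round(weight, 2))
# 18-29 1.67
# 30-49 1.17
# 50-64 0.71
# 65+ 0.75
```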

Raking (Iterative Proportional Fitting)

Purpose: Weight to multiple variables simultaneously (age, gender, race, education, region).

Algorithm:

Weight to age distribution
Weight to gender distribution (age weights still applied)
Weight to race distribution
Weight to education distribution
Repeat cycles until convergence (marginals match targets)
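
The cycle above can be sketched in plain Python. This is an illustration of iterative proportional fitting, not PollZapper's raking calculator: the function name, the data layout (one dict per respondent), and the tolerance are assumptions, and the code expects every target category to appear at least once in the sample.

```python
def rake(respondents, targets, max_iter=50, tol=1e-6):
    """Rake weights so weighted marginals match target proportions.

    respondents: list of dicts, e.g. {"age": "young", "gender": "f"}.
    targets: {variable: {category: target proportion}}, each summing to 1.
    Returns (weights, cycles_used).
    """
    weights = [1.0] * len(respondents)
    for cycle in range(1, max_iter + 1):
        max_gap = 0.0
        for var, target in targets.items():
            total = sum(weights)
            current = {cat: 0.0 for cat in target}
            for person, w in zip(respondents, weights):
                current[person[var]] += w
            # Track how far this marginal was from its target before adjusting.
            for cat in target:
                max_gap = max(max_gap, abs(current[cat] / total - target[cat]))
            factors = {cat: target[cat] * total / current[cat] for cat in target}
            weights = [w * factors[p[var]] for p, w in zip(respondents, weights)]
        if max_gap < tol:          # every marginal already matched this cycle
            break
    return weights, cycle

# Tiny made-up sample: 3 young, 7 old; 5 men, 5 women
people = (
    [{"age": "young", "gender": "m"}] * 2
    + [{"age": "young", "gender": "f"}] * 1
    + [{"age": "old", "gender": "m"}] * 3
    + [{"age": "old", "gender": "f"}] * 4
)
goals = {"age": {"young": 0.4, "old": 0.6}, "gender": {"m": 0.5, "f": 0.5}}
weights, cycles = rake(people, goals)
print(cycles, [round(w, 3) for w in weights])
```

Each pass over a variable is just post-stratification on that variable alone; raking repeats those passes because fixing one marginal slightly disturbs the others.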

PollZapper: Built-in raking calculator with automatic convergence (typically 5-10 iterations).

Weight Trimming

Why Trim?

Extreme weights (>5) increase variance and give too much influence to individual respondents.

Trimming approaches:

Cap weights: Maximum weight = 3 or 5
Winsorize: Set extreme weights to 95th percentile
Prune sample: Remove respondents with weights >5 (not recommended)
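
A minimal sketch of the cap approach, together with Kish's approximate design effect from weighting, n × Σw² ÷ (Σw)², shows why trimming reduces variance. The function names and the example weights are made up for illustration.

```python
def kish_design_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def cap_weights(weights, cap=5.0):
    """Cap weights at `cap`, then rescale so the weighted total is unchanged."""
    capped = [min(w, cap) for w in weights]
    scale = sum(weights) / sum(capped)
    return [w * scale for w in capped]

# Made-up weights with a few extreme values
raw = [0.8] * 40 + [1.2] * 40 + [2.5] * 15 + [9.0] * 5
trimmed = cap_weights(raw, cap=5.0)
print(round(kish_design_effect(raw), 2), round(kish_design_effect(trimmed), 2))
```

Rescaling can lift the largest weights slightly back above the cap, so production code often repeats the cap-and-rescale step until the weights settle. Trimming also trades some bias for the lower variance, which is why the cap is usually chosen conservatively.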

AtlasSampler™ in PollZapper

Geographic Frame Builder

AtlasSampler™ enables researchers to generate statistically valid geographic sampling frames for CAPI surveys without GIS expertise or expensive consultants.

Frame Sources

Census enumeration areas
WorldPop population density grids
OpenStreetMap building footprints
Custom GeoJSON/Shapefile uploads
Admin boundaries (countries/provinces)

Sampling Methods

PPS: Probability proportional to size (population)
Systematic: Grid-based with random start
Stratified: By district/urban-rural
Cluster: Multi-stage area sampling

Territory Management

Assign waypoints to interviewers
Optimize routes to minimize travel
Balance workload across team
Track completion by territory

Built-in Weighting

Upload population benchmarks (CSV)
Raking algorithm with convergence
Automatic design effect calculation
Weight trimming options
Export weighted SPSS/Stata files

Ready to Build Representative Samples?

PollZapper's AtlasSampler™ and weighting tools enable professional sampling without GIS consultants or statistical software. From 400-interview local surveys to 5,000-interview national studies.
