Optimizing feature iteration with QgsVectorLayer.getFeatures
Direct Answer: Optimizing feature iteration with QgsVectorLayer.getFeatures requires constraining data retrieval at the API level before iteration begins. On large layers, avoid calling layer.getFeatures() without a QgsFeatureRequest. Instead, build a QgsFeatureRequest that disables geometry loading when it is unnecessary (setFlags(QgsFeatureRequest.NoGeometry)), restricts attribute fetching to the required fields (setSubsetOfAttributes()), and applies spatial or expression filters upfront (setFilterRect() or setFilterExpression()). Combine this with QgsSpatialIndex for proximity queries, and process features in a single pass using Python generators to avoid materializing full feature lists in memory. This approach reduces I/O overhead, avoids creating Python objects for unused data, and aligns with the underlying C++ data pipeline.
Production-Ready Iteration Pattern
from typing import Optional

from qgis.core import QgsFeatureRequest, QgsRectangle, QgsVectorLayer


def stream_field_values(layer: QgsVectorLayer, target_field: str,
                        search_rect: Optional[QgsRectangle] = None):
    """
    Memory-efficient feature iteration for QGIS 3.10+ plugins and scripts.
    Yields attribute values without loading geometries or unused fields into RAM.
    """
    # Note: this is a generator, so these checks run on the first iteration step
    if not layer.isValid():
        raise ValueError(f"Layer '{layer.name()}' is invalid or not loaded.")

    # Resolve field index safely (handles aliases and case-insensitivity)
    field_idx = layer.fields().lookupField(target_field)
    if field_idx == -1:
        raise KeyError(f"Field '{target_field}' not found in layer.")

    # 1. Build a constrained request
    request = QgsFeatureRequest()
    request.setFlags(QgsFeatureRequest.NoGeometry)  # Skip fetching and parsing geometry
    request.setSubsetOfAttributes([field_idx])  # Fetch only the target column

    # 2. Apply spatial filter if a bounding box is provided
    if search_rect is not None:
        request.setFilterRect(search_rect)

    # 3. Stream features via generator
    for feat in layer.getFeatures(request):
        yield feat.attribute(field_idx)


# Usage:
# from qgis.core import QgsProject
# layer = QgsProject.instance().mapLayersByName("parcels")[0]
# rect = QgsRectangle(10.0, 45.0, 12.0, 47.0)
# for code in stream_field_values(layer, "land_use_code", rect):
#     process(code)
Core Optimization Strategies
flowchart TD
A["layer.getFeatures(request)"] --> B["setFlags(NoGeometry): skip WKB decode"]
B --> C["setSubsetOfAttributes(): fewer columns"]
C --> D["setFilterRect / setFilterExpression: push filter to provider"]
D --> E["Stream one feature at a time via generator"]
Skip Unnecessary Geometry
Geometry parsing is often the most expensive part of QgsVectorLayer iteration. Even if you only need tabular data, QGIS by default fetches each feature's geometry, decodes its WKB, and instantiates a QgsGeometry object. Setting request.setFlags(QgsFeatureRequest.NoGeometry) instructs the provider to skip this work. For attribute-heavy workflows (e.g., CSV exports, statistical summaries, or database joins), this alone can reduce iteration time by roughly 40–70%, depending on the provider and geometry complexity.
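Because the gain varies by provider and dataset, it is worth measuring on your own layer. The following is a minimal timing sketch: `time_iteration` is a hypothetical helper (not part of PyQGIS) that accepts any zero-argument callable returning a fresh iterator, so it can be tried outside QGIS as well.

```python
import time


def time_iteration(make_iterator, repeats=3):
    """Return (item_count, best_seconds) for fully consuming an iterator.

    make_iterator is a zero-argument callable that returns a fresh iterator
    on every call, e.g. lambda: layer.getFeatures(request).
    """
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = sum(1 for _ in make_iterator())  # consume without storing items
        best = min(best, time.perf_counter() - start)
    return count, best


# Usage inside QGIS (assumes `layer` is an already loaded QgsVectorLayer):
# from qgis.core import QgsFeatureRequest
# no_geom = QgsFeatureRequest().setFlags(QgsFeatureRequest.NoGeometry)
# print(time_iteration(lambda: layer.getFeatures()))
# print(time_iteration(lambda: layer.getFeatures(no_geom)))
```

Taking the best of several runs reduces noise from disk caching; the first run against a cold file or remote database will usually be slower.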
Subset Attributes at the Source
Fetching all columns when you only need one or two forces the provider to read unnecessary disk pages or network payloads. request.setSubsetOfAttributes([idx]) pushes a column projection down to the data provider. This is especially critical for wide tables (50+ fields) or remote databases where network latency dominates. The PyQGIS Core Architecture & Data Handling documentation details how attribute requests map to underlying provider capabilities.
Push Filters Down to the Provider
Avoid iterating and filtering in Python when the request can do it for you. Use setFilterRect() for bounding-box queries or setFilterExpression() for SQL-like conditions. When you pass an expression, QGIS tries to compile it to the provider's native query language where the provider supports this (e.g., SQLite for GeoPackage, SQL for PostGIS, OGC Filter for WFS). If the expression cannot be compiled, it is still evaluated on the C++ side, so non-matching features never cross the C++/Python boundary, although the provider then has to read them first. For complex logic, consult the official QgsFeatureRequest API reference to verify expression compatibility with your data source.
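As an illustration of an expression filter, here is a small sketch that builds an `IN (...)` expression and attaches it to a request. The helper names `in_list_expression` and `filtered_request` are hypothetical, and the quoting rules are reimplemented in plain Python for clarity; inside QGIS, `QgsExpression.quotedColumnRef()` and `QgsExpression.quotedString()` serve the same purpose.

```python
def in_list_expression(field_name, values):
    """Build a QGIS expression such as "land_use_code" IN ('R1', 'R2')."""
    if not values:
        raise ValueError("values must not be empty")

    def quote(text):
        # QGIS string literals use single quotes; escape backslashes and quotes.
        return "'" + str(text).replace("\\", "\\\\").replace("'", "''") + "'"

    # Column references use double quotes, doubled when they occur in the name.
    column = '"' + field_name.replace('"', '""') + '"'
    return f"{column} IN ({', '.join(quote(v) for v in values)})"


def filtered_request(field_name, values):
    """Return a QgsFeatureRequest filtered by the expression above."""
    # Imported lazily so the string helper stays usable outside QGIS.
    from qgis.core import QgsFeatureRequest
    return QgsFeatureRequest().setFilterExpression(
        in_list_expression(field_name, values)
    )


# Usage inside QGIS (assumes `layer` is an already loaded QgsVectorLayer):
# for feat in layer.getFeatures(filtered_request("land_use_code", ["R1", "R2"])):
#     process(feat)
```

Whether this expression is translated to provider-side SQL depends on the data source; the request behaves the same either way, only the cost differs.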
Stream with Generators
Materializing list(layer.getFeatures(request)) forces QGIS to instantiate every matching QgsFeature object simultaneously. This spikes memory usage and triggers garbage collection pauses. Wrapping the iteration in a Python generator (as shown above) yields one feature at a time, keeping the working set constant regardless of dataset size. Generators also integrate cleanly with itertools for batch processing, chunking, or early-exit conditions. See Python’s functional programming guide for best practices on lazy evaluation.
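The batching and early-exit patterns mentioned above can be sketched with `itertools.islice`. The helpers `chunked` and `first_match` are illustrative names, not PyQGIS functions; they work on any iterator, including the one returned by `layer.getFeatures(request)`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def first_match(iterable, predicate):
    """Return the first item satisfying `predicate`, or None; stops early."""
    return next((item for item in iterable if predicate(item)), None)


# Usage inside QGIS (assumes `layer`, `request`, and `write_batch` exist):
# for batch in chunked(layer.getFeatures(request), 1000):
#     write_batch(batch)  # at most 1000 features are held at a time
```

With chunking, peak memory is bounded by the batch size rather than by the layer size, which is the property the generator approach is meant to preserve.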
Integrating QgsSpatialIndex for Proximity Queries
When your workflow requires nearest-neighbor searches, radius queries, or spatial joins, repeatedly issuing setFilterRect() requests is inefficient for dense point or polygon layers. Build a QgsSpatialIndex once, then query it before fetching features:
from qgis.core import QgsFeatureRequest, QgsPointXY, QgsSpatialIndex

# Build the index once; only geometries are needed, so skip attributes
index = QgsSpatialIndex(layer.getFeatures(QgsFeatureRequest().setNoAttributes()))

# The query point must be in the layer's CRS
nearest_ids = index.nearestNeighbor(QgsPointXY(11.5, 46.2), neighbors=5)

# Fetch only the matched features
request = QgsFeatureRequest().setFilterFids(nearest_ids)
for feat in layer.getFeatures(request):
    # Process only the 5 nearest records
    process(feat)
The spatial index lives in memory and uses an R-tree structure, making proximity lookups roughly O(log n) instead of O(n). This pattern is foundational when designing Vector and Raster Data Access Patterns for high-throughput QGIS plugins.
Provider Constraints & Edge Cases
Not all data providers honor request constraints equally. Understanding provider capabilities prevents silent performance degradation:
- GeoPackage & Shapefile: Fully respect the NoGeometry flag, setSubsetOfAttributes(), and setFilterRect(). Shapefiles will still read the .dbf header, but column projection works reliably.
- PostGIS / Spatialite: Push filters and attribute subsets directly to SQL queries. Performance gains are maximal here.
- WFS / Remote Services: May ignore setSubsetOfAttributes() depending on server configuration. Some OGC endpoints fetch full feature collections regardless of client-side requests. Check what the provider advertises via layer.dataProvider().capabilities(), and confirm flag names such as FilterFeatures or SelectSubsetOfAttributes against the QgsVectorDataProvider documentation for your QGIS version before relying on them.
- Virtual Layers / Memory Layers: Constraints are applied in-memory. While fast, they lack disk-level I/O optimization and should be used sparingly for large datasets.
Always wrap provider-dependent logic in capability checks. If a provider lacks subset support, fall back to Python-side filtering but log a warning to alert users of potential memory overhead.
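A minimal sketch of that fallback, under stated assumptions: `iter_with_fallback` is a hypothetical helper, the `provider_filtered` flag stands in for whatever capability check you decide on, and the standard `logging` module is used where a plugin might prefer QGIS's own message log.

```python
import logging

logger = logging.getLogger("feature_iteration")  # hypothetical logger name


def iter_with_fallback(features, predicate, provider_filtered):
    """Yield features, filtering in Python only when the provider could not.

    provider_filtered: True if the request's filter was applied by the
    provider (the caller decides how to determine this).
    """
    if provider_filtered:
        yield from features  # already filtered upstream; pass through
        return
    logger.warning(
        "Provider-side filtering unavailable; filtering in Python "
        "(expect higher I/O and memory overhead)."
    )
    for feat in features:
        if predicate(feat):
            yield feat
```

Because this is a generator, the warning is emitted when iteration starts rather than when the function is called, and the fallback path still streams one feature at a time.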
Performance Checklist
Before deploying a QGIS automation script or plugin, verify these iteration baselines:
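- QgsFeatureRequest is instantiated; raw getFeatures() calls are avoided on large layers.
- setFlags(QgsFeatureRequest.NoGeometry) is applied whenever geometry is not needed.
- setSubsetOfAttributes() lists only the fields that are actually read.
- Filters (setFilterRect, setFilterExpression, or setFilterFids) are applied in the request rather than in Python.
- Iteration uses generators or explicit for loops; list() or [] comprehensions over the full result are avoided.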
Following this pattern ensures your code scales from hundreds to millions of features without linear memory growth, aligns with QGIS’s C++ data pipeline, and maintains responsiveness in both standalone scripts and interactive plugin environments.