Optimizing feature iteration with QgsVectorLayer.getFeatures

Direct Answer: Optimizing feature iteration with QgsVectorLayer.getFeatures requires constraining data retrieval at the API level before iteration begins. Never call layer.getFeatures() on a large layer without a QgsFeatureRequest. Instead, instantiate QgsFeatureRequest to explicitly disable geometry loading when unnecessary (setFlags(QgsFeatureRequest.NoGeometry)), restrict attribute fetching to only required fields (setSubsetOfAttributes()), and apply spatial or expression filters upfront (setFilterRect() or setFilterExpression()). Combine this with QgsSpatialIndex for proximity queries, and process features in a single pass using Python generators to avoid materializing full feature lists in memory. This approach reduces I/O overhead, avoids creating Python objects for unused data, and aligns with the underlying C++ data pipeline.

Production-Ready Iteration Pattern

python
from typing import Optional

from qgis.core import QgsFeatureRequest, QgsRectangle, QgsVectorLayer

def stream_field_values(layer: QgsVectorLayer, target_field: str, search_rect: Optional[QgsRectangle] = None):
    """
    Memory-efficient feature iteration for QGIS 3.10+ plugins and scripts.
    Yields attribute values without loading geometries or unused fields into RAM.
    """
    if not layer.isValid():
        raise ValueError(f"Layer '{layer.name()}' is invalid or not loaded.")

    # Resolve field index safely (handles aliases and case-insensitivity)
    field_idx = layer.fields().lookupField(target_field)
    if field_idx == -1:
        raise KeyError(f"Field '{target_field}' not found in layer.")

    # 1. Build a constrained request
    request = QgsFeatureRequest()
    request.setFlags(QgsFeatureRequest.NoGeometry)  # Skip geometry decoding; only attributes are needed
    request.setSubsetOfAttributes([field_idx])  # Fetch only the target column

    # 2. Apply spatial filter if a bounding box is provided
    if search_rect is not None:
        request.setFilterRect(search_rect)

    # 3. Stream features via generator
    for feat in layer.getFeatures(request):
        yield feat.attribute(field_idx)

# Usage:
# layer = QgsProject.instance().mapLayersByName("parcels")[0]
# rect = QgsRectangle(10.0, 45.0, 12.0, 47.0)
# for code in stream_field_values(layer, "land_use_code", rect):
#     process(code)

Core Optimization Strategies

flowchart TD
    A["layer.getFeatures(request)"] --> B["setFlags(NoGeometry): skip WKB decode"]
    B --> C["setSubsetOfAttributes(): fewer columns"]
    C --> D["setFilterRect / setFilterExpression: push filter to provider"]
    D --> E["Stream one feature at a time via generator"]

Skip Unnecessary Geometry

Geometry decoding is often the most expensive part of QgsVectorLayer iteration. Even if you only need tabular data, QGIS will by default decode each feature's geometry (WKB for most providers) and instantiate a QgsGeometry object. Calling request.setFlags(QgsFeatureRequest.NoGeometry) tells the provider that geometry is not required, although it may still be read internally when a spatial filter needs it. For attribute-heavy workflows (e.g., CSV exports, statistical summaries, or database joins), this alone can reduce iteration time by 40–70%, depending on the provider and geometry complexity.

Subset Attributes at the Source

Fetching all columns when you only need one or two forces the provider to read unnecessary disk pages or network payloads. request.setSubsetOfAttributes([idx]) pushes a column projection down to the data provider. This is especially critical for wide tables (50+ fields) or remote databases where network latency dominates. The PyQGIS Core Architecture & Data Handling documentation details how attribute requests map to underlying provider capabilities.
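When more than one column is needed, each name has to be resolved to an index before it is passed to setSubsetOfAttributes(). The helper below is a minimal sketch of that step, not part of the PyQGIS API: resolve_field_indices is a hypothetical name, and the only call it relies on is lookupField(), which QgsFields provides and which returns -1 for unknown names.

```python
from typing import Iterable, List


def resolve_field_indices(fields, names: Iterable[str]) -> List[int]:
    """Map field names to indices, raising if any name is missing.

    `fields` is assumed to behave like QgsFields, i.e. it exposes
    lookupField(name) and returns -1 when the name is unknown.
    """
    indices = []
    missing = []
    for name in names:
        idx = fields.lookupField(name)
        if idx == -1:
            missing.append(name)
        else:
            indices.append(idx)
    if missing:
        # Report every unknown name at once instead of failing on the first.
        raise KeyError(f"Fields not found: {', '.join(missing)}")
    return indices
```

In a script this would typically be used as request.setSubsetOfAttributes(resolve_field_indices(layer.fields(), ["land_use_code", "area"])), where the field names are placeholders.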

Push Filters Down to the Provider

Never iterate and filter in Python when the provider can do it natively. Use setFilterRect() for bounding-box queries or setFilterExpression() for SQL-like conditions. When you pass an expression, QGIS tries to compile it to the provider’s native query language (e.g., SQLite for GeoPackage, SQL for PostGIS, OGC Filter for WFS); expressions that cannot be compiled are evaluated by QGIS on the client side, which still keeps non-matching features out of Python. Either way, irrelevant features do not cross the C++/Python boundary. For complex logic, consult the official QgsFeatureRequest API reference to verify expression compatibility with your data source.

Stream with Generators

Materializing list(layer.getFeatures(request)) forces QGIS to instantiate every matching QgsFeature object simultaneously. This spikes memory usage and triggers garbage collection pauses. Wrapping the iteration in a Python generator (as shown above) yields one feature at a time, keeping the working set constant regardless of dataset size. Generators also integrate cleanly with itertools for batch processing, chunking, or early-exit conditions. See Python’s functional programming guide for best practices on lazy evaluation.
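The batching mentioned above can be sketched with itertools.islice. This is an illustrative helper rather than a QGIS API: iter_batches is a hypothetical name, and it works on any iterable, so a feature iterator or a generator of attribute values can be passed in unchanged.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without reading the whole input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next batch_size items from the source.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Only one batch is held in memory at a time, so a loop such as for batch in iter_batches(stream_field_values(layer, "land_use_code"), 500) keeps the working set bounded by the batch size.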

Integrating QgsSpatialIndex for Proximity Queries

When your workflow requires nearest-neighbor searches, radius queries, or spatial joins, issuing a separate setFilterRect() request for every lookup is inefficient for dense point or polygon layers. Build a QgsSpatialIndex once, then query it before fetching features:

python
from qgis.core import QgsFeatureRequest, QgsPointXY, QgsSpatialIndex

index = QgsSpatialIndex(layer.getFeatures(QgsFeatureRequest().setNoAttributes()))
nearest_ids = index.nearestNeighbor(QgsPointXY(11.5, 46.2), neighbors=5)

# Fetch only the matched features
request = QgsFeatureRequest().setFilterFids(nearest_ids)
for feat in layer.getFeatures(request):
    # Process only the 5 nearest records
    process(feat)

The spatial index lives in memory and uses an R-tree structure, making proximity lookups roughly O(log n) on average instead of the O(n) of a full scan. This pattern is foundational when designing Vector and Raster Data Access Patterns for high-throughput QGIS plugins.

Provider Constraints & Edge Cases

Not all data providers honor request constraints equally. Understanding provider capabilities prevents silent performance degradation:

  • GeoPackage & Shapefile: Fully respect the NoGeometry flag, setSubsetOfAttributes(), and setFilterRect(). Shapefiles will still read the .dbf header, but column projection works reliably.
  • PostGIS / Spatialite: Push filters and attribute subsets directly to SQL queries. Performance gains are maximal here.
  • WFS / Remote Services: May ignore setSubsetOfAttributes() depending on server configuration. Some OGC endpoints fetch full feature collections regardless of client-side requests. Check layer.dataProvider().capabilities() against the QgsVectorDataProvider flags documented for your QGIS version, and measure actual transfer sizes, since not every request option is reflected in a capability flag.
  • Virtual Layers / Memory Layers: Constraints are applied in-memory. While fast, they lack disk-level I/O optimization and should be used sparingly for large datasets.

Always wrap provider-dependent logic in capability checks. If a provider lacks subset support, fall back to Python-side filtering but log a warning to alert users of potential memory overhead.
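The fallback described above might look like the following sketch. It is deliberately provider-agnostic: iter_with_fallback, the provider_filters flag, and the logger name are all hypothetical, and the flag is assumed to come from whatever capability check your code already performs.

```python
import logging
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger("feature_iteration")  # hypothetical logger name


def iter_with_fallback(
    features: Iterable[T],
    predicate: Callable[[T], bool],
    provider_filters: bool,
    source_name: str = "layer",
) -> Iterator[T]:
    """Yield features, filtering in Python only when the provider cannot.

    provider_filters is a flag the caller derives from its own capability
    check; when True, the request is assumed to already carry the filter.
    """
    if provider_filters:
        # The provider applied the filter, so pass features through untouched.
        yield from features
        return
    logger.warning(
        "Provider for '%s' does not filter natively; filtering in Python.",
        source_name,
    )
    for feature in features:
        if predicate(feature):
            yield feature
```

Because the function is a generator, the warning is emitted once, when iteration starts, and features are still streamed one at a time in the fallback path.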

Performance Checklist

Before deploying a QGIS automation script or plugin, verify these iteration baselines:

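  • QgsFeatureRequest is instantiated; raw getFeatures() calls are avoided on large layers
  • setFlags(QgsFeatureRequest.NoGeometry) is applied wherever geometry is unused
  • setSubsetOfAttributes() lists only the fields the code reads
  • Filters (setFilterRect, setFilterExpression, or setFilterFids) are applied before iteration begins
  • Iteration uses generators or explicit for loops; list() or [] materialization is avoided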

Following this pattern ensures your code scales from hundreds to millions of features without linear memory growth, aligns with QGIS’s C++ data pipeline, and maintains responsiveness in both standalone scripts and interactive plugin environments.