# FilterContext

## Summary
`FilterContext` is passed as an argument to the Data Source Function when `supports_time_filtering` is set to `True`. Using these parameters enables optimized query patterns for improved performance:
- The method `<data source>.get_dataframe()` can be invoked with the arguments `start_time` or `end_time`.
- When defining a Feature View, a `FilteredSource` can be paired with a Data Source. The Feature View will then pass `FilterContext` into the Data Source Function.
Note that Data Source Functions are expected to implement their own filtering logic.
## Example
```python
from tecton import spark_batch_config
from pyspark.sql.functions import col


@spark_batch_config(supports_time_filtering=True)
def hive_data_source_function(spark, filter_context):
    spark.sql(f"USE {hive_db_name}")
    df = spark.table(user_hive_table)
    ts_column = "timestamp"
    # The Data Source Function handles its own filtering logic here.
    if filter_context:
        if filter_context.start_time:
            df = df.where(col(ts_column) >= filter_context.start_time)
        if filter_context.end_time:
            df = df.where(col(ts_column) < filter_context.end_time)
    return df
```
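Outside a Spark environment, the same interaction can be sketched with a small mock. Note that `MockSource`, its `get_dataframe` method, and the stand-in `FilterContext` class below are illustrative assumptions, not Tecton APIs; they only demonstrate how a framework might build a `FilterContext` from the caller's `start_time`/`end_time` arguments and forward it to a data source function that applies its own filtering:

```python
from datetime import datetime
from typing import Callable, List, Optional


class FilterContext:
    """Minimal stand-in for tecton's FilterContext (illustrative only)."""

    def __init__(self, start_time: Optional[datetime] = None,
                 end_time: Optional[datetime] = None):
        self.start_time = start_time
        self.end_time = end_time


class MockSource:
    """Hypothetical wrapper showing how a framework might pass FilterContext."""

    def __init__(self, fn: Callable[[FilterContext], List[dict]]):
        self._fn = fn

    def get_dataframe(self, start_time=None, end_time=None):
        # Build a FilterContext from the caller's arguments and forward
        # it to the data source function.
        return self._fn(FilterContext(start_time, end_time))


def rows_source(ctx: FilterContext) -> List[dict]:
    rows = [
        {"user": "a", "timestamp": datetime(2023, 1, 1)},
        {"user": "b", "timestamp": datetime(2023, 1, 2)},
        {"user": "c", "timestamp": datetime(2023, 1, 3)},
    ]
    # The data source function implements its own filtering logic:
    # start_time is inclusive, end_time is exclusive.
    if ctx.start_time:
        rows = [r for r in rows if r["timestamp"] >= ctx.start_time]
    if ctx.end_time:
        rows = [r for r in rows if r["timestamp"] < ctx.end_time]
    return rows


source = MockSource(rows_source)
print([r["user"] for r in source.get_dataframe(start_time=datetime(2023, 1, 2))])
```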
## Methods

### `__init__(...)`
Parameters:

- `start_time` (`Optional[datetime.datetime]`) - If specified, the data source will only include items with timestamp column >= `start_time`.
- `end_time` (`Optional[datetime.datetime]`) - If specified, the data source will only include items with timestamp column < `end_time`.
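Together these bounds form a half-open interval `[start_time, end_time)`: the start is inclusive and the end is exclusive, so consecutive windows never double-count a row that falls exactly on a boundary. A minimal sketch of that semantics (the `in_window` helper is hypothetical, not part of the Tecton SDK):

```python
from datetime import datetime


def in_window(ts, start_time=None, end_time=None):
    """Half-open interval check matching FilterContext semantics:
    keep rows where ts >= start_time and ts < end_time."""
    if start_time is not None and ts < start_time:
        return False
    if end_time is not None and ts >= end_time:
        return False
    return True


boundary = datetime(2023, 1, 2)
# A row exactly at end_time is excluded from the earlier window...
print(in_window(boundary, end_time=boundary))    # False
# ...and included in the next window that starts at that same instant.
print(in_window(boundary, start_time=boundary))  # True
```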