The science behind automated valuation models
Commercial real estate (CRE) development and operations contributed $1.0 trillion to U.S. GDP in 2018 alone, so decisions made in the sector have a large influence on the wider economy. For the CRE professionals making those decisions, a detailed explanation of how a property valuation was reached is a non-negotiable requirement.
At GeoPhy, part of our data science work is building Automated Valuation Models (AVMs) that predict multifamily transaction prices. A user enters an address into the interface and, in under 10 seconds, the application returns a predicted value along with the contextual information the model used to make its sales price projection as accurate as possible.
Addressing our users’ “need for speed” (and precision) has been a huge motivator for the technical teams. But now, we’d like to explain the data science (the why and how) behind our model, because we know it’s not enough to just offer a predicted value.
GeoPhy’s Value Drivers help users understand which factors influence predicted value.
Value Drivers are variables we use to train our model. We calculate them from market trends, local housing statistics, property-level information, and neighborhood characteristics.
A user can see the “raw” values of Value Drivers for any property they’re interested in. Next, we feed that property’s Value Drivers into our model to produce a value estimate. Users can then see how each individual Value Driver impacts a property’s value estimate. How do we calculate these impacts? We borrow a concept from game theory called Shapley values.
Shapley values measure the importance of a feature by comparing what a model predicts with and without that feature. Because a feature’s contribution can depend on which other features are already present, the comparison is repeated for every possible ordering of the features and the results are averaged, so that each feature is credited fairly.
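To make the "every possible order" idea concrete, here is a minimal sketch in Python. It is not GeoPhy's implementation: the pricing rule `toy_model`, the feature names, and all numbers are made up for illustration, and it assumes that a feature is "left out" by replacing its value with a baseline value from a typical property.

```python
from itertools import permutations

# Hypothetical baseline ("typical property"); a feature that is left out
# falls back to these values. This substitution is an assumption of the sketch.
BASELINE = {"units": 50, "occupancy": 0.90, "median_rent": 1500}

def toy_model(features):
    """Made-up pricing rule in which the features interact (not a real AVM)."""
    return features["units"] * features["occupancy"] * features["median_rent"] * 100

def predict_with(present, instance):
    """Predict using the instance's values for `present` features, baseline otherwise."""
    mixed = {k: (instance[k] if k in present else BASELINE[k]) for k in BASELINE}
    return toy_model(mixed)

def shapley_values(instance):
    """Average each feature's marginal contribution over every feature ordering."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        present = set()
        for name in order:
            before = predict_with(present, instance)
            present.add(name)
            after = predict_with(present, instance)
            totals[name] += after - before  # what this feature added in this order
    return {name: totals[name] / len(orders) for name in names}

instance = {"units": 80, "occupancy": 0.95, "median_rent": 1400}
phi = shapley_values(instance)
for name, value in phi.items():
    print(f"{name}: {value:+,.0f}")
```

A useful property to check is that the values add up to the gap between the prediction for this property and the prediction for the baseline property. Enumerating every ordering grows factorially with the number of features, so this exact approach only suits a handful of features.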
Think of Shapley values for Value Drivers like this:
For each property, the Value Drivers act as members of a task force hired for one reason—to estimate a property’s value.
Once the task force produces the value estimate, each member receives a “payout” in proportion to how “important” they were in estimating value.
The members don’t get equal payouts because they did not contribute equally. With a little math, each member can find out how important they were to the predicted value: members take turns sitting out, and the task force re-estimates value many times over with different combinations of members present.
After each round, the task force records the predicted value. Later, the predictions a member contributed to are compared with the predictions made without them, and the average difference represents that member’s importance. If the predictions barely change whether or not a member participates, that member wasn’t important and receives a small payout.
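The task force analogy can be sketched as a sampling procedure: seat the members in a random order many times, record how much the estimate moves each time a member joins, and average. This is only an illustration under stated assumptions. `estimate_value` is a made-up stand-in for a trained model, the driver names and numbers are hypothetical, and a member who "sits out" is represented by a baseline value.

```python
import random

# Hypothetical baseline and property; every number here is made up.
BASELINE = {"units": 50, "occupancy": 0.90, "median_rent": 1500, "year_built": 1985}
PROPERTY = {"units": 80, "occupancy": 0.90, "median_rent": 1400, "year_built": 2005}

def estimate_value(f):
    """Toy stand-in for a trained valuation model (an assumption of this sketch)."""
    age_factor = 1.0 + (f["year_built"] - 1985) * 0.005
    return f["units"] * f["occupancy"] * f["median_rent"] * 100 * age_factor

def value_with(seated, prop):
    """Estimate with the seated members' real values; absent members use the baseline."""
    mixed = {k: (prop[k] if k in seated else BASELINE[k]) for k in BASELINE}
    return estimate_value(mixed)

def sampled_payouts(prop, rounds=2000, seed=0):
    """Approximate each member's payout by averaging over random seating orders."""
    rng = random.Random(seed)
    names = list(prop)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(rounds):
        order = names[:]
        rng.shuffle(order)
        seated = set()
        previous = value_with(seated, prop)
        for name in order:
            seated.add(name)
            current = value_with(seated, prop)
            totals[name] += current - previous  # change caused by this member joining
            previous = current
    return {name: totals[name] / rounds for name in names}

payouts = sampled_payouts(PROPERTY)
for name, value in payouts.items():
    print(f"{name}: {value:+,.0f}")
```

In this toy setup `occupancy` equals its baseline value, so the estimate never changes when that member joins and its payout is zero, which matches the "unimportant member" case described above.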
So, when a user generates a valuation using GeoPhy’s Evra AVM, the front end shows charts of Value Drivers and their Shapley values. Value Drivers with longer bars exerted more influence on the valuation: a larger Shapley value, whether positive or negative, means a larger impact on the property’s value estimate. In the example below, Property & Income Value Drivers have a negative effect that is greater than the positive effect of Market Value Drivers.
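As a rough illustration of how such a chart could be read, the sketch below sums per-driver values by group and ranks drivers by the size of their effect. The group names come from the text above; the individual driver names and all dollar amounts are hypothetical and are not taken from Evra.

```python
# Hypothetical per-driver Shapley values in dollars, keyed by (group, driver).
shap_by_driver = {
    ("Property & Income", "net_operating_income"): -420_000.0,
    ("Property & Income", "unit_count"): 90_000.0,
    ("Market", "submarket_rent_growth"): 210_000.0,
    ("Market", "cap_rate_trend"): 60_000.0,
}

def group_totals(shap_values):
    """Sum the per-driver values within each Value Driver group."""
    totals = {}
    for (group, _driver), value in shap_values.items():
        totals[group] = totals.get(group, 0.0) + value
    return totals

def ranked_by_influence(shap_values):
    """Order drivers by absolute value, i.e. by bar length in the chart."""
    return sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)

totals = group_totals(shap_by_driver)
ranking = ranked_by_influence(shap_by_driver)

for group, total in totals.items():
    print(f"{group}: {total:+,.0f}")
```

With these made-up numbers the Property & Income group nets out negative and outweighs the positive Market group, mirroring the situation described in the text.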
Conclusion
When combined with the raw Value Drivers, the Shapley values for a property help our users put the valuation produced by our model into context.
To learn more about Value Drivers, set up an appointment with our team.