Just how much crime, exactly, is a good thing when considering that next property investment? Believe it or not, game theory techniques like SHAP are able to provide insight into how much crime is “good,” and how much is “too much.”
Machine learning tools increasingly find a place in property investment decision-making. But not all algorithms are created equal. For example, algorithms like those underlying GeoPhy’s AI-powered valuations use a combination of dynamic data and supervised machine-learning techniques to provide real-time quality benchmarks and valuations across a wide range of property types. These AI-powered valuations provide more than simple property values. Captured within the sophisticated modelling are relationships between building features and location characteristics that inform the difference between “good” and “great” property investment decisions. The task, however, is in identifying precisely which relationships are influential. That is where SHapley Additive exPlanations (SHAP) makes its debut.
Linear relationships do not reflect the real world
No one wants to visit or work in that building located in an area known for having a “crime problem.” Yet how many of us would prefer trekking out to some obscure location with limited amenities where the biggest action, let alone crime, is the pairing of socks with sandals? We all intuitively realize the relationship of crime to location appeal may not always be linear. That realization holds true when considering other building features and location characteristics when making property valuations.
Yet the typical regression-based automated valuation models (AVMs) seen on the market today take precisely such a linear approach. For those AVMs, the focus primarily lies in how several attributes of a property will concurrently contribute to the value. That approach has led to some rather effective rebuttals on the relative failings of AVMs, and confines them to use principally for single-family residential properties where sufficient comparables are available and the differences between properties identified have straightforward attributes, such as number of bedrooms or floor space. However, the use of these linear models cannot effectively take advantage of the breadth of data now available in the real estate sector.
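To see why a straight line can miss such a relationship, here is a minimal sketch in Python. The crime index and appeal scores are invented for illustration (they are not GeoPhy data), and `least_squares_slope` is a hypothetical helper. Appeal rises with a little activity and then falls as crime climbs, yet the best-fit line reports essentially no effect.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Hypothetical data: location appeal peaks at a moderate crime index.
crime_index = [0, 1, 2, 3, 4]
appeal = [1.0, 3.0, 4.0, 3.0, 1.0]

slope = least_squares_slope(crime_index, appeal)
print(f"linear slope: {slope:.3f}")  # approximately zero despite a clear pattern
```

A linear model fitted to this toy data would conclude that crime does not matter at all, which is exactly the kind of blind spot a non-linear model avoids.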
The GeoPhy AVM throws out those linear assumptions in favour of more robust AI-powered valuations. These exploit dynamic data (including demographic, macroeconomic, and “hyperlocal” contextual information) and supervised machine learning techniques to move AVMs from predominant use in low-risk transactions towards more sophisticated valuations for commercial real estate. These AI-powered models have the ability to “learn” as they go by identifying patterns in the data. And what these algorithms learn can be quite useful to those making high-stakes property investment and lending decisions. However, as consumers of AVM outputs who are not necessarily familiar with the intricacies of non-linear decision tree modelling (a likely understatement), we can use a little help deciphering the lessons these models have to share.
What we can learn from game theory
Based on Shapley values, a technique used in game theory to determine how much each player in a cooperative game has contributed to its success, computer scientist Scott Lundberg developed SHAP (SHapley Additive exPlanations) as a means of interpreting the information bound within these advanced machine learning models. For GeoPhy’s AVM, that means each SHAP value measures how much a property feature, such as crime, contributes positively or negatively to the property valuation. This is a similar idea to feature importance in logistic regression, where we can determine the impact of each feature by looking at the magnitude of its coefficient. However, SHAP values offer two distinct benefits:
- They can be used with tree-based models. So instead of being restricted to simple, linear (and therefore often less accurate) regression models, users can build more nuanced, nonlinear machine learning models capable of making highly accurate valuations.
- They provide feature impact values for each property, which allow for interpretation of the features that influence an individual property’s valuation.
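The game-theory idea behind this can be sketched in a few lines of Python. The example below computes exact Shapley values for a toy two-player "valuation game" by averaging each player's marginal contribution over every order in which the players could join. The payoff table and feature names are invented for illustration, and this brute-force approach is not how the SHAP library or GeoPhy's AVM computes values in practice; it only shows the underlying definition.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}

# Hypothetical payoffs (in $1000s) when the model sees only a subset of features.
PAYOFF = {
    frozenset(): 0.0,
    frozenset({"rooms"}): 30.0,
    frozenset({"crime"}): -10.0,
    frozenset({"rooms", "crime"}): 25.0,
}

phi = shapley_values(["rooms", "crime"], lambda s: PAYOFF[frozenset(s)])
print(phi)
```

The two values sum to the payoff of the full coalition, which is the "additive" property that gives SHAP its name.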
Insight comes in understanding Shapley values
So let’s apply SHAP values to GeoPhy’s AVM output on a well-known property set from Boston. An initial look at the correlation between variables shows that home value has a strong inverse correlation with the percentage of ‘lower status’ population (Status) and a positive correlation with the average number of rooms per dwelling (Rooms). OK, that is a start.
However, an alternative look, the global SHAP summary plot, begins to show just how the distribution of these features impacts the AVM model output. This view reveals that a high “lower status” population (Status) lowers the predicted home value, while the average number of rooms per dwelling (Rooms) only marginally increases that value. It also confirms our intuition that features like nitrogen oxide concentration (Pollution) and, yes, Crime tend to have negative impacts on property values.
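One common way to build a global view from per-property SHAP values is to rank features by their mean absolute SHAP value. The sketch below assumes a small table of invented SHAP values (in $1000s); the numbers and the `global_importance` helper are hypothetical and are not taken from GeoPhy's model.

```python
# Hypothetical per-property SHAP values (in $1000s); each dict is one property.
shap_rows = [
    {"Status": -4.0, "Rooms": 1.0, "Crime": -0.5},
    {"Status": 3.0, "Rooms": -0.5, "Crime": -1.0},
    {"Status": -5.0, "Rooms": 1.5, "Crime": 0.5},
]

def global_importance(rows):
    """Rank features by mean absolute SHAP value across all properties."""
    features = list(rows[0].keys())
    n = len(rows)
    scores = {f: sum(abs(r[f]) for r in rows) / n for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = global_importance(shap_rows)
for feature, score in ranking:
    print(f"{feature}: {score:.2f}")
```

Note that the absolute value hides direction: a feature can rank highly while pushing some properties up and others down, which is why the per-property view matters.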
Dependence plots on features also uncover valuable insights. For example, as Status increases, Crime pushes the value down further. Yet the median value of properties increases when the weighted distance to five Boston employment centres is low and crime is higher.
These features also have an impact at the individual property level.
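A dependence view pairs a feature's SHAP values with a second feature to expose interactions. As a rough sketch, the snippet below splits invented records by distance to employment centres and compares the average SHAP value of Crime in each group. The records, the cutoff, and the helper name are all assumptions made for illustration.

```python
# Hypothetical records: (crime rate, distance to employment centres,
# SHAP value of Crime in $1000s). Not real Boston data.
records = [
    (0.5, 1.0, 0.8),
    (0.7, 1.5, 0.6),
    (0.6, 6.0, -0.4),
    (0.8, 7.0, -0.9),
]

def mean_shap_by_group(rows, distance_cutoff):
    """Average Crime SHAP value for properties near and far from employment centres."""
    near = [shap for _, dist, shap in rows if dist <= distance_cutoff]
    far = [shap for _, dist, shap in rows if dist > distance_cutoff]
    return sum(near) / len(near), sum(far) / len(far)

near_mean, far_mean = mean_shap_by_group(records, distance_cutoff=3.0)
print(f"near centres: {near_mean:+.2f}, far from centres: {far_mean:+.2f}")
```

In this toy data the same crime levels carry a positive contribution close to employment centres and a negative one further out, the kind of interaction a dependence plot makes visible.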
When crime signals something more
Let’s have a closer look at how crime influences specific property values in our sample of Boston property values. Consider two properties, property-A and property-B, on opposite sides of town. On paper, the physical property features (number of bedrooms, bathrooms and floor space) all appear relatively comparable. Yet their values and, more importantly, the way various features affect those values appear quite different. Why?
The use of local explanations — another valuable SHAP feature — can begin to shed light on how different environmental features influence individual property values.
A feature such as crime, an important driver in pushing the value higher (yes, higher) for property-B, is in fact one that drives value down for property-A. Likewise, the proportion of non-retail business space (Industrial) drives property-A’s value higher, yet proves to be an unimportant factor for property-B. Solely examining the global feature importance obscures variations at the property level, or worse, assumes a nuanced awareness achievable only by human agents. However, with individual-level SHAP values, GeoPhy’s AVMs help you systematically pinpoint these pivotal features for each individual property. They give our users a tool to identify value drivers and customize property investment decisions accordingly.
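A local explanation can be read as simple arithmetic: the model's prediction for one property equals a base value (the average prediction) plus the sum of that property's SHAP values. The sketch below uses invented numbers for property-A and property-B. The base value, the contributions, and the `explain` helper are hypothetical and only mirror the pattern described above.

```python
BASE_VALUE = 22.0  # hypothetical average model prediction, in $1000s

# Hypothetical local SHAP values (in $1000s) for the two properties.
local_shap = {
    "property-A": {"Crime": -2.0, "Industrial": 1.5, "Rooms": 0.5},
    "property-B": {"Crime": 1.0, "Industrial": 0.0, "Rooms": 0.5},
}

def explain(name):
    """Return the reconstructed prediction and the features ranked by impact."""
    contributions = local_shap[name]
    prediction = BASE_VALUE + sum(contributions.values())
    drivers = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, drivers

pred_a, drivers_a = explain("property-A")
pred_b, drivers_b = explain("property-B")
print("property-A:", pred_a, drivers_a)
print("property-B:", pred_b, drivers_b)
```

Crime is the top driver for both properties here, but with opposite signs, which a single global importance score could not reveal.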
SHAP helps decrypt lessons of value
Although this may seem like complicated statistics at first, SHAP values can help us understand how various property features impact GeoPhy’s AVM models. They provide insight into the inner workings of the supervised machine learning algorithms that drive AI-powered valuation models, and offer direction for improving performance and removing bias.
That understanding becomes crucial in the selection of best-suited properties for portfolio optimization in the future and improvement of current properties for optimal returns today. It also provides a means of explaining the complex machine learning algorithms that drive valuations should there be legal requirements to do so — whether or not you find yourself in the midst of a crime.