Anthony Chamberas

Predicting Energy Consumption

Updated: Jul 25, 2023

In my last article I wrote about how I was able to get the total energy consumption of my home from the Tesla Powerwall and EV APIs. I subtracted the EV charging data from the total home consumption data to get home consumption net of EV charging.


In this article I will show how I predict net energy consumption using weather and time-related features.


The first thing I did was look at the distribution of net energy consumption from January 2022 to April 2023. At 15-minute intervals, this gave me about 46,000 rows to analyze. The distribution was right-skewed, with a min of 0 kWh (of course), a mode of about 0.15 kWh (per 15-minute interval) and a max of around 3 kWh.


Often, skewed distributions of target variables lead to heteroscedastic prediction errors (larger errors for larger predicted values). To reduce this, I took the natural log of each value to make the distribution more balanced.
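A minimal sketch of the transform, using made-up sample values. One assumption to flag: the plain natural log is undefined at 0 kWh, and the article does not say how zero readings were handled, so this sketch uses log(1 + x) via NumPy's log1p as one common workaround.

```python
import numpy as np

# Hypothetical net consumption per 15-minute interval, in kWh.
net_kwh = np.array([0.0, 0.10, 0.15, 0.15, 0.40, 1.20, 3.00])

# Assumption: log1p(x) = ln(1 + x) is used instead of ln(x),
# because ln(0) is undefined and the data has a minimum of 0 kWh.
log_kwh = np.log1p(net_kwh)

# expm1 is the exact inverse of log1p; it converts values
# (or later, predictions) back to kWh.
restored = np.expm1(log_kwh)

print(np.round(log_kwh, 3))
```

The transform compresses the long right tail while leaving small values almost unchanged, which is what evens out the distribution.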


Next, I thought about different features that might explain the variation seen in the kWh consumed. I settled on starting with weather and time-related features, with the hypothesis that these might influence consumption patterns. I pulled hourly weather data for the same time period from the OpenWeather API, and derived the time features from the timestamp recorded for each energy measurement.


To combine the two datasets, I created a new timestamp field on the energy dataset that represented the date and hour of the measurement, and joined that to the measurement date and hour of the weather data.
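The join described above could look roughly like this in pandas. The column names (timestamp, net_kwh, weather_hour, temp_c, cloud_pct) and the sample values are placeholders for illustration, not the real schema.

```python
import pandas as pd

# Hypothetical 15-minute energy readings.
energy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-01 00:15", "2022-01-01 00:30",
        "2022-01-01 00:45", "2022-01-01 01:00",
    ]),
    "net_kwh": [0.15, 0.14, 0.16, 0.15, 0.30],
})

# Hypothetical hourly weather observations.
weather = pd.DataFrame({
    "weather_hour": pd.to_datetime(["2022-01-01 00:00", "2022-01-01 01:00"]),
    "temp_c": [3.5, 3.1],
    "cloud_pct": [90, 75],
})

# Truncate each energy timestamp to the start of its hour,
# then join on that hourly key.
energy["weather_hour"] = energy["timestamp"].dt.floor("h")
merged = energy.merge(weather, on="weather_hour", how="left")

print(merged)
```

A left join keeps every energy row even if a weather hour is missing, so gaps show up as NaN instead of silently dropping measurements.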


After analyzing these features for correlations with the net energy consumed, I found week of year (WOY), day of week (DOW), hour of day (HOD), temperature and percent cloud cover to be the most promising. I also added auto-regressive features to the dataset, measuring net energy consumption 1 to 6 periods prior to the period being measured. I labeled these T-1, T-2,...,T-6.
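A sketch of how the time and auto-regressive features might be built in pandas, on synthetic data. It assumes a "period" is one 15-minute interval, and the column names (woy, dow, hod, t_minus_k) are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute series starting on a Monday.
idx = pd.date_range("2022-01-03 00:00", periods=12, freq="15min")
df = pd.DataFrame({"net_kwh": np.linspace(0.1, 1.2, 12)}, index=idx)

# Time-related features derived from the timestamp.
df["woy"] = df.index.isocalendar().week.astype(int)  # ISO week of year
df["dow"] = df.index.dayofweek                       # Monday = 0
df["hod"] = df.index.hour                            # 0..23

# Auto-regressive features: consumption 1 to 6 periods earlier.
for k in range(1, 7):
    df[f"t_minus_{k}"] = df["net_kwh"].shift(k)

# The first 6 rows have incomplete lag history, so drop them.
features = df.dropna()

print(features.head())
```

Note that shift assumes evenly spaced rows with no gaps; missing intervals would need to be filled or reindexed first.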


Lastly, in December 2022 we bought a Toyota RAV4 Plug-in Hybrid. This would cause a bump in energy consumption, so to account for it I added a feature that indicated whether the measurement date was before (0) or after (1) this purchase.
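The before/after indicator is a one-liner in pandas. The exact purchase date is not given in the article, so the cutoff below is a placeholder, and has_phev is a made-up column name.

```python
import pandas as pd

# Placeholder cutoff: the article only says "December 2022".
PHEV_START = pd.Timestamp("2022-12-01")

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-11-30 23:45", "2022-12-01 00:00", "2023-01-15 12:00",
    ])
})

# 0 before the purchase, 1 on or after it.
df["has_phev"] = (df["timestamp"] >= PHEV_START).astype(int)

print(df)
```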


I stayed away from using day of year as a feature to avoid over-fitting of the model.


To train a model for predicting net energy consumed, I opted for RandomForestRegressor from the Python scikit-learn library. The default model parameters generated a pretty good fit on an 80/20 train/test split of the dataset, but to tune the parameters I also ran a randomized parameter search with RandomizedSearchCV, also from scikit-learn. This generated a similar fit, but the additional cross-validation gave me peace of mind that the original result was not an artifact of one particular split of the data.
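A minimal sketch of the two fitting steps with scikit-learn, on synthetic data. The parameter ranges, n_iter, cv and the random seeds are illustrative assumptions; the article does not list the values actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the feature matrix and log-transformed target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 0.8 * X[:, 0] + 0.1 * rng.normal(size=300)

# 80/20 train/test split (train_test_split shuffles by default).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: baseline fit with default parameters.
baseline = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Step 2: randomized search with cross-validation.
# These candidate values are assumptions for illustration.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# score() returns R^2 for regressors.
baseline_r2 = baseline.score(X_test, y_test)
tuned_r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(baseline_r2, 3), round(tuned_r2, 3))
```

RandomizedSearchCV samples n_iter parameter combinations instead of trying every one, which keeps tuning time manageable on a dataset of this size.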


This model gave fairly decent results, with about 87% accuracy. Also, using the natural log to balance the target data distribution did not end up introducing much transformation bias when predictions were converted back to kWh.
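One way to check both points is to convert the predictions back to kWh and compare them with the actuals. The article does not say which metric the 87% refers to, so using R² here is an assumption, as are the sample numbers and the log1p/expm1 pairing.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actuals in kWh, and model outputs on the log scale
# (here: the true log values plus small made-up errors).
actual_kwh = np.array([0.10, 0.15, 0.20, 0.50, 1.00, 2.50])
pred_log = np.log1p(actual_kwh) + np.array([0.02, -0.01, 0.03, -0.04, 0.05, -0.03])

# Back-transform predictions to kWh before scoring.
pred_kwh = np.expm1(pred_log)

# Goodness of fit on the original scale.
r2_kwh = r2_score(actual_kwh, pred_kwh)

# Ratio of total predicted to total actual energy; a value far from 1
# would indicate systematic bias introduced by the transform.
bias_ratio = pred_kwh.sum() / actual_kwh.sum()

print(round(r2_kwh, 3), round(bias_ratio, 3))
```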


To understand the impact each feature had on the prediction, I computed feature importances on the fitted model. This showed that the auto-regressive features had the biggest impact, with the RAV4 indicator having the least.
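With a fitted RandomForestRegressor, the importances are available from the feature_importances_ attribute. The sketch below uses synthetic data constructed so that the lag feature dominates; the feature names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400

# Synthetic features: one strong lag, one weak weather signal,
# and a binary indicator with no effect on the target.
X = pd.DataFrame({
    "t_minus_1": rng.normal(size=n),
    "temp": rng.normal(size=n),
    "has_phev": rng.integers(0, 2, size=n),
})
y = 2.0 * X["t_minus_1"] + 0.3 * X["temp"] + 0.05 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

These impurity-based scores describe how much each feature was used for splits, not a causal effect, so correlated features such as consecutive lags can share or shift importance between them.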


But I really wanted to put the model through its paces, so I applied it to data after May 2023, which was not part of the original training/test dataset. The predictions tracked the actual measurements, which further validated that the model did a good job of predicting net energy consumed.
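Separating later data as an untouched holdout can be done with a simple date mask. The cutoff date and the constant sample values below are placeholders.

```python
import pandas as pd

# Synthetic 15-minute series spanning the assumed cutoff.
idx = pd.date_range("2023-04-29", "2023-05-02 23:45", freq="15min")
df = pd.DataFrame({"net_kwh": 0.15}, index=idx)

# Assumed cutoff: everything from this timestamp on is held out.
HOLDOUT_START = pd.Timestamp("2023-05-01")

train_test = df[df.index < HOLDOUT_START]   # used for fitting and the 80/20 split
holdout = df[df.index >= HOLDOUT_START]     # never seen during training

print(len(train_test), len(holdout))
```

Because the holdout is strictly later in time, the lag features in those rows never overlap with rows the model was trained on.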


In my next article I will be explaining how I used solar production data to predict the amount of solar power that could be produced on any given day.


Thanks to Nikola Novakovic for helping me sort through some challenges I had fitting the model!
