Problem
I have a dataset representing a time series. There is one index column containing a timestamp and one target column containing a numerical target variable: my odometer reading, that is, the total number of kilometers I have driven up to each timestamp. Using this historical data, how can I predict the kilometers (km) I will accumulate over the following weeks? Knowing how many kilometers I will drive in the future will help me better plan how to service my vehicle.
Solution
We can use a Python library for time series forecasting to predict a target variable over a period. While there are many to choose from, we will focus on the Prophet library today.
Environment
Begin by creating a project folder and opening it in VS Code. Then, create a requirements.txt file containing four lines: pandas, ipykernel, prophet, and statsmodels. Next, press Ctrl+Shift+P and choose Python: Create Environment.

Follow the prompts to create a local virtual environment. Make sure to check requirements.txt so that the required Python packages are installed directly into the new environment.

Finally, create a .ipynb file for the experiment. Make sure to select the kernel of the existing project environment:

Dataset Overview
Let us start by checking out the data. I’ll use a CSV file containing fueling data for the past three years. The file has two columns:
- Date: date of fueling.
- Odometer: total kilometers at the time of the fueling.
The difference between two consecutive rows equals the distance driven in that period. For example, between the fuelings on the fourth and fifth of March 2024, the distance covered was 68 688 – 68 174 = 514 km. Here is the data preview:
import pandas as pd
df = pd.read_csv('fuelings_data.csv')
df
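To make the per-period arithmetic concrete, here is a minimal sketch that computes the distance between consecutive fuelings with pandas. It uses only the four most recent rows of the dataset, typed in by hand; the names sample and Distance are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Four most recent rows of the fueling data, entered by hand for illustration.
sample = pd.DataFrame({
    "Date": ["05.03.2024", "04.03.2024", "02.03.2024", "01.03.2024"],
    "Odometer": [68688, 68174, 68078, 67549],
})

# Parse the dd.mm.yyyy dates and sort chronologically so diff() runs forward in time.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d.%m.%Y")
sample = sample.sort_values("Date").reset_index(drop=True)

# Distance driven since the previous fueling (the first row has no predecessor, so it is NaN).
sample["Distance"] = sample["Odometer"].diff()
print(sample)
```

The last row of the output should show the 514 km from the example above.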

Transformations
Next, we need to transform these data a bit. We must convert the Date column from the default object type to pandas datetime. Then, we must rename the columns and set the timestamp column as an index column:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
# prophet requires columns ds and y
df = df.rename(columns={'Date': 'ds', 'Odometer': 'y'})
# but we need an index for resampling
df = df.set_index('ds')

Resampling
The next step is resampling. The intervals in our dataset are not equal: we may have more than one fueling per day, or fuelings with more than a week between them. To build a better model, we resample the data so that the data points are equally spaced in time. Resampling per se and the intricacies behind it are out of the scope of this article. In this case, however, I have already established that resampling gives me a better model than using the data directly. Resampling is straightforward with the powerful pandas resample function:
df_resampled = df.resample('W').mean().interpolate(method='linear')

This code:
- Instructs the dataframe to be resampled at equal weekly intervals, stipulated by the ‘W’.
- Calls the mean() function to get the mean over every interval in case there are multiple values per week.
- Interpolates using linear interpolation to fill in the missing values when there is not enough data. We estimate the missing values based on the two neighboring temporal points. This is needed because some consecutive readings in the original dataset are more than a week apart.
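The three steps can be traced on a tiny example. The sketch below takes three readings from the dataset (12.02.2024, 04.03.2024, and 05.03.2024), deliberately leaving out the rows in between so that two weeks end up empty. The variable names are chosen for this illustration only.

```python
import pandas as pd

# Three readings from the dataset; the gap between them leaves two weeks without data.
demo = pd.DataFrame(
    {"y": [67088, 68174, 68688]},
    index=pd.to_datetime(["12.02.2024", "04.03.2024", "05.03.2024"], format="%d.%m.%Y"),
)

# Weekly bins (labelled by the Sunday that ends each week), averaged per bin.
weekly = demo.resample("W").mean()

# Empty weeks are NaN after mean(); linear interpolation fills them in.
weekly_filled = weekly.interpolate(method="linear")
print(weekly_filled)
```

The two March readings fall into the same week and are averaged to 68 431, which is the same effect described for the end odometer reading in the data profile. The two empty weeks in between receive evenly spaced values between 67 088 and 68 431.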
Data Profile
Next, let’s grab a quick data profile of the data so far.
df_resampled.describe()

We see:
- In red: the total amount of data points. Notice that, due to resampling, it has increased from 98 to 157.
- In blue: the starting odometer reading.
- In green: the end odometer reading. Notice it has changed from 68 688 to 68 431 km. This change is again due to the resampling and averaging over the existing interval.
One last point: let us reset the dataframe index. Resampling required a temporal index; however, the Prophet library expects only two columns named ds and y with no dataframe index.
df_resampled = df_resampled.reset_index()

Prediction
Now to the meat and potatoes of this article: making a prediction. Here is the code:
01: from prophet import Prophet
02:
03: m = Prophet()
04: m.fit(df_resampled)
05:
06: future = m.make_future_dataframe(periods=12, freq='W') # Forecasting for the next 12 weeks
07: forecast = m.predict(future)
08:
09: fig = m.plot(forecast,
10:              include_legend=True)
Let’s break it down:
- 01: Import the Prophet object for the model.
- 03: Make an instance of the Prophet model.
- 04: Fit the model to the data.
- 06: Make a dataframe for the future periods, in this case, 12 weeks.
- 07: Forecast the future for 12 weeks ahead using the model.
- 09: Plot the forecast.
The result is the following line chart plotting the observed data points, the forecast, and the uncertainty interval:

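The uncertainty interval seems narrow, which suggests that the model is good, or at least not too bad, considering we are plug-and-playing with the default settings. Keep in mind, though, that a narrow interval alone does not prove accuracy, and that the data here is real-world and high-quality, which helps.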
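To make the future-dataframe step more tangible, the following sketch reproduces with plain pandas roughly which dates a 12-week extension adds. It is an approximation for illustration, not Prophet's own code, and it assumes the resampled history ends on Sunday, 10 March 2024, which is what weekly resampling of data ending on 5 March 2024 should give.

```python
import pandas as pd

# Assumed last date of the resampled history (week ending Sunday, 10 March 2024).
last_date = pd.Timestamp("2024-03-10")
periods = 12

# Build periods + 1 weekly dates starting at last_date, then keep only the dates after it.
future_dates = pd.date_range(start=last_date, periods=periods + 1, freq="W")
future_dates = future_dates[future_dates > last_date][:periods]

print(future_dates[0].date(), "to", future_dates[-1].date())
```

By default, the dataframe returned by make_future_dataframe also contains the historical dates, so the forecast covers both the fitted history and the 12 new weeks.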
Cross Validation
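To get a better understanding of the performance of our model, let us cross-validate it. The cross-validation method picks a series of cutoff dates in the history, spaced by the period parameter. For every cutoff, the model is fitted on the data up to that date only and then predicts the period that follows (the horizon parameter), so that the forecasts can be compared with the actual values: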
from prophet.diagnostics import cross_validation

df_cv = cross_validation(m,
                         period='28 days',   # 4 weeks
                         horizon='28 days')
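To see what these two parameters imply, here is a simplified sketch of how the cutoff dates can be derived. It is my approximation of the procedure, not Prophet's actual implementation. It assumes the documented default of an initial training window of three times the horizon, and it assumes the resampled series runs from 14 March 2021 to 10 March 2024. The function name sketch_cutoffs is hypothetical.

```python
import pandas as pd

def sketch_cutoffs(first_date, last_date, horizon, period, initial=None):
    """Simplified cutoff generation: walk backwards from the end of the history."""
    horizon = pd.Timedelta(horizon)
    period = pd.Timedelta(period)
    # Assumption: the initial training window defaults to three times the horizon.
    initial = 3 * horizon if initial is None else pd.Timedelta(initial)

    cutoffs = []
    cutoff = last_date - horizon  # the latest cutoff still leaves a full horizon to test on
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

cutoffs = sketch_cutoffs(
    pd.Timestamp("2021-03-14"),  # assumed first weekly data point
    pd.Timestamp("2024-03-10"),  # assumed last weekly data point
    horizon="28 days",
    period="28 days",
)
print(len(cutoffs), cutoffs[0].date(), cutoffs[-1].date())
```

With period equal to horizon, as here, the test windows sit back to back without overlapping.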
The output of the cross validation is a dataframe containing the true values (y) and the forecast values yhat, yhat_lower, and yhat_upper:

Having cross validated the model, we can proceed to computing statistics on the model performance:
from prophet.diagnostics import performance_metrics

df_p = performance_metrics(df_cv)

The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), median absolute percent error (MDAPE), and coverage of the yhat_lower and yhat_upper estimates. We are interested in MAE (unit is kilometers) and MAPE (a relative error, read here as a percentage). We see that for a short forecast period (1-2 weeks), the error is below 1%; up to a month (4 weeks or 28 days), it is about 1.6%. These values are valid in the context of this model only. They give a reasonable idea of how far off the predicted odometer readings may be in the coming periods.
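As a reminder of what the two metrics of interest measure, the sketch below computes MAE and MAPE by hand on a tiny frame shaped like the cross-validation output. The three rows are made-up placeholder numbers, not results from this model, and the calculation is a plain overall average rather than the rolling window over the horizon that performance_metrics applies.

```python
import pandas as pd

# Made-up placeholder values with the same y / yhat columns as the cross-validation output.
cv_like = pd.DataFrame({
    "y":    [1000.0, 2000.0, 4000.0],   # true values
    "yhat": [1010.0, 1980.0, 4040.0],   # forecast values
})

abs_error = (cv_like["y"] - cv_like["yhat"]).abs()

mae = abs_error.mean()                           # same unit as y (kilometers in this article)
mape = (abs_error / cv_like["y"].abs()).mean()   # a fraction: multiply by 100 for percent

print(f"MAE = {mae:.2f}, MAPE = {mape * 100:.2f}%")
```

Every forecast in this toy frame is off by exactly 1% of the true value, so the MAPE is 1% while the MAE depends on the magnitude of the readings.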
Plotting the Evaluation Metrics
Finally, we are ready to plot the evaluation results of the model to make sense of everything so far.
from prophet.plot import plot_cross_validation_metric

fig = plot_cross_validation_metric(df_cv, metric='mape', rolling_window=0.1)

As pointed out already, we see that for up to two weeks ahead, errors of up to 1% are common. For predictions further in the future, the error may approach 1.6–1.7%.
Conclusion
As seen in the prediction plot, two points of interest stand out. Sometime in the beginning of April, the odometer will reach the 70 000 km mark. Assuming the next oil service is due at 75 000 km, it is safe to assume that I should have the vehicle serviced in June unless a drastic change in driving habits occurs.
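Instead of reading such dates off the plot, they can also be looked up in the forecast dataframe. Below is a minimal sketch of that lookup. The helper first_date_reaching is hypothetical, and the small demo frame contains invented numbers that only mimic the ds and yhat columns of a forecast; with the real model, you would pass the forecast dataframe instead.

```python
import pandas as pd

def first_date_reaching(forecast, threshold_km):
    """Return the first date on which the predicted odometer reaches the threshold, or None."""
    reached = forecast.loc[forecast["yhat"] >= threshold_km, "ds"]
    return reached.min() if not reached.empty else None

# Invented stand-in for a forecast frame (not actual model output).
demo_forecast = pd.DataFrame({
    "ds": pd.date_range("2024-03-17", periods=4, freq="W"),
    "yhat": [69100.0, 69400.0, 69700.0, 70050.0],
})

print(first_date_reaching(demo_forecast, 70000))
print(first_date_reaching(demo_forecast, 75000))
```

If the threshold lies beyond the forecast window, as 75 000 km does in this demo frame, the helper returns None, which signals that a longer forecast (a larger periods value) is needed.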

Hristo Hristov is a seasoned data professional with 10+ years of experience spanning the intersection of data engineering and smart manufacturing solutions. Since 2017, he has specialized in implementing advanced analytics solutions for bridging the IT/OT gap.
A technical writer with over 80 published articles on data and AI technologies, Python development, and cloud solutions. Passionate about transforming complex data into business value through innovative applications of Azure Data Platform, Python, IoT solutions, databases, and other cloud technologies.
Currently applying Industry 4.0 best practices, focusing on IoT connectivity, and implementing data and AI systems in manufacturing. Hristo holds a degree in Data Science and several Microsoft certifications covering SQL Server, Power BI, and related technologies.
- MSSQLTips Awards
- Achiever Award (75+ tips) – 2026
- Rookie of the Year – 2021
- Author Contender – 2022/2023/2024/2025



Hi Siraj,
Thanks for your comment. Here is the data. Just paste it to an empty excel file (I hope the tabs will be preserved). The date format is dd.mm.yyyy.
Date	Odometer
05.03.2024	68688
04.03.2024	68174
02.03.2024	68078
01.03.2024	67549
12.02.2024	67088
27.01.2024	66635
27.01.2024	66208
21.01.2024	65837
20.01.2024	65309
06.01.2024	65017
06.01.2024	64455
17.12.2023	63941
16.12.2023	63418
25.11.2023	62938
25.11.2023	62471
18.11.2023	62205
29.10.2023	61686
18.10.2023	61248
17.10.2023	60704
14.10.2023	60213
13.10.2023	59673
12.10.2023	59064
06.10.2023	58792
29.09.2023	58219
15.09.2023	57759
02.09.2023	57147
02.09.2023	56574
01.09.2023	56030
29.08.2023	55516
28.08.2023	54927
24.08.2023	54473
19.08.2023	54166
02.08.2023	53783
01.08.2023	53239
30.07.2023	52691
29.07.2023	52152
27.07.2023	51552
22.07.2023	51480
25.06.2023	50898
18.06.2023	50424
01.06.2023	50056
24.05.2023	49436
23.05.2023	48928
22.05.2023	48344
21.05.2023	48161
18.05.2023	47554
18.05.2023	47124
17.05.2023	46616
17.05.2023	46037
10.05.2023	45773
11.04.2023	45338
21.03.2023	44942
24.02.2023	44392
21.01.2023	43849
10.12.2022	43481
30.11.2022	43169
31.10.2022	42398
29.10.2022	42332
22.10.2022	41713
28.09.2022	41105
14.08.2022	40373
08.08.2022	40095
24.07.2022	39530
09.07.2022	38893
29.06.2022	38530
04.06.2022	38218
27.05.2022	37731
30.04.2022	37182
15.04.2022	36773
26.03.2022	36345
12.03.2022	35828
19.02.2022	35459
04.02.2022	34993
29.11.2021	34269
21.11.2021	34006
13.11.2021	33502
31.10.2021	32604
28.09.2021	31844
10.09.2021	30979
09.09.2021	30468
09.09.2021	29948
08.09.2021	29587
07.09.2021	29203
03.09.2021	28753
20.08.2021	28195
17.08.2021	27705
13.08.2021	27277
08.08.2021	27055
07.08.2021	26541
07.08.2021	26056
31.07.2021	24935
31.07.2021	24869
27.06.2021	24327
23.06.2021	23830
05.06.2021	23555
13.05.2021	23058
05.04.2021	22554
14.03.2021	21993
Can you add the Source data link too?