Detecting trends in time series data using Python


How to detect trends in time series data with Python using linear regression and Kendall Tau to uncover valuable insights in complex oil and gas datasets.


Vortexa Analysts

Data Preparation

Part 1. Extract data from VortexaSDK

In this article, we will be using Vortexa data as our data source.

Russia has been actively engaged in ship-to-ship (STS) transfers and has been in the market spotlight since last year. In this article, we analyze recent trends in Russian STS activity as a case study.

We will be utilizing Vortexa Cargo Movements data to extract the trends in STS activity. The code below is used to download diesel cargoes loading from Russia between 1 December 2022 and 31 March 2023, extract movements containing STS events, and group them by the month in which the STS took place.


Fig 1. Transiting quantity for different STS zones (df_ts_filter)

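The download code itself is not reproduced in this text. As a rough, standard-library-only stand-in for the final grouping step, the sketch below assumes the cargo movements have already been flattened into records with the hypothetical fields `sts_zone`, `sts_date` and `quantity` (illustrative names, not the Vortexa SDK's actual schema). It skips movements without an STS event and sums quantity per STS zone and month.

```python
from collections import defaultdict
from datetime import date


def monthly_sts_totals(records):
    """Sum cargo quantity per (STS zone, month) for movements with an STS event.

    `records` is a list of dicts with hypothetical keys 'sts_zone',
    'sts_date' (a datetime.date, or None when no STS took place) and
    'quantity'. These names are placeholders, not the SDK's real fields.
    """
    totals = defaultdict(float)
    for rec in records:
        if rec["sts_date"] is None:
            continue  # movement without an STS event: not relevant here
        month = rec["sts_date"].strftime("%Y-%m")  # bucket by month of the STS
        totals[(rec["sts_zone"], month)] += rec["quantity"]
    return dict(totals)


# Toy records for illustration only (not Vortexa data)
records = [
    {"sts_zone": "Zone A", "sts_date": date(2022, 12, 5), "quantity": 30_000},
    {"sts_zone": "Zone A", "sts_date": date(2022, 12, 20), "quantity": 25_000},
    {"sts_zone": "Zone A", "sts_date": date(2023, 1, 9), "quantity": 40_000},
    {"sts_zone": "Zone B", "sts_date": date(2023, 2, 14), "quantity": 35_000},
    {"sts_zone": None, "sts_date": None, "quantity": 50_000},
]
print(monthly_sts_totals(records))
```

With pandas, the same zone-by-month table is usually built with `DataFrame.pivot_table(index=..., columns=..., values=..., aggfunc="sum")`.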

Part 2. Data cleaning to analyze top STS locations

Assuming we are only interested in the top STS locations, we will remove STS zones that have only one STS event. The following code identifies the top STS zones of interest.

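The identification code is not reproduced in this text. A minimal standard-library sketch of the same rule, assuming a plain list with one STS zone name per STS event (a hypothetical input format), counts events per zone and keeps only zones with more than one event.

```python
from collections import Counter


def top_sts_zones(sts_events, min_events=2):
    """Return STS zones with at least `min_events` STS events, busiest first.

    `sts_events` is a list of zone names, one entry per STS event.
    min_events=2 reproduces the rule "drop zones with only one STS event".
    """
    counts = Counter(sts_events)
    # most_common() orders zones by event count, highest first
    return [zone for zone, n in counts.most_common() if n >= min_events]


# Toy event list (not Vortexa data)
events = ["Zone A", "Zone B", "Zone A", "Zone C", "Zone A", "Zone B"]
print(top_sts_zones(events))
```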

Detecting Trends in Time Series

Method 1. Linear Regression

To detect an increasing trend using linear regression, you can fit a linear regression model to the time series data and perform a statistical test on the estimated coefficient (slope), typically a t-test of the null hypothesis that the slope is zero. If the coefficient is significantly positive, the time series has an increasing trend; if it is significantly negative, it has a decreasing trend.

Linear regression generally works well even on smaller datasets, such as monthly time series, although with only a few observations the slope estimate is less precise and the test has less power to detect a trend.

*Code for linear regression: see Appendix
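The Appendix code is not reproduced in this text. As a standard-library-only sketch of the same idea (not the authors' implementation), the block below fits a straight line to a series indexed by month number t = 0, 1, 2, ... and tests whether the slope differs from zero using Student's t distribution with n - 2 degrees of freedom. The function names are hypothetical, and the p-value helper uses the closed-form series for the t distribution with integer degrees of freedom so that no SciPy dependency is needed.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with integer df >= 1.

    Uses the closed-form series for integer degrees of freedom
    (odd and even df are handled separately), so no SciPy is needed.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        # P(|T| < t) = (2/pi) * (theta + s * (c + (2/3)c^3 + ... up to c^(df-2)))
        total = 0.0
        if df > 1:
            term = total = c
            for k in range(3, df - 1, 2):
                term *= c * c * (k - 1) / k
                total += term
        inside = (2 / math.pi) * (theta + s * total)
    else:
        # P(|T| < t) = s * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2))
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            total += term
        inside = s * total
    return min(1.0, max(0.0, 1.0 - inside))  # clamp floating-point overshoot


def slope_t_test(y):
    """Least-squares fit of y = a + b*t (t = 0, 1, 2, ...) and a t-test of H0: b = 0.

    Returns (slope, intercept, t_stat, p_value); needs at least 3 points.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # Residual sum of squares gives the standard error of the slope (n - 2 dof)
    df = n - 2
    sse = sum((yi - (intercept + slope * t)) ** 2 for t, yi in enumerate(y))
    se = math.sqrt(sse / df / s_tt)
    if se == 0:  # perfect straight-line fit: t is unbounded
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / se
    return slope, intercept, t_stat, t_two_sided_p(t_stat, df)


# Toy monthly totals (made-up numbers, not Vortexa data)
slope, intercept, t_stat, p = slope_t_test([1.0, 3.0, 2.0, 5.0])
print(f"slope={slope:.2f}, t={t_stat:.2f}, p={p:.3f}")
```

If SciPy is available, `scipy.stats.linregress` returns the slope together with the two-sided p-value of this same t-test, and is the more practical choice.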


Fig 3. Fitted linear regression lines for the top STS zones

To validate the results, we first look at the R² score and the mean absolute percentage error (MAPE) of each fitted regression. The R² score measures how well the linear regression model fits the data: it is the proportion of the variance in the dependent variable that is explained by the independent variables in the model.

A higher R² score indicates a better fit between the model and the data.

The MAPE measures the accuracy of the model’s predictions: it is the average of the absolute differences between the actual and predicted values, each expressed as a percentage of the actual value.

A lower MAPE indicates that the model is more accurate in its predictions.
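Both metrics follow directly from their standard definitions. The minimal sketch below computes them in plain Python for a toy series and its least-squares line (made-up numbers, hypothetical function names). scikit-learn offers equivalents as `sklearn.metrics.r2_score` and `sklearn.metrics.mean_absolute_percentage_error`; note that the latter returns a fraction rather than a percentage.

```python
def r2_score(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (undefined for a constant series)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 3.0, 2.0, 5.0]
fitted = [1.1, 2.2, 3.3, 4.4]  # least-squares line 1.1 + 1.1 * t for t = 0..3
print(f"R2 = {r2_score(actual, fitted):.3f}, MAPE = {mape(actual, fitted):.1f}%")
```

Because MAPE divides by the actual values, a month with zero STS volume makes it undefined, which is worth checking for in sparse zone-level series.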

In our case, Kalamata STS, Augusta STS and Taman STS have both a high R² score and a low MAPE, so their regression results can be treated with higher confidence.

The slope (gradient) results above show that Kalamata STS [GR] has recently become the hottest STS location, followed by Augusta STS [IT] and then Taman STS [RU]; all three have high positive gradients.
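The screening and ranking described above can be sketched as a small filter: keep zones whose fit passes both quality thresholds and whose slope is positive, then sort by slope. The input layout, function name and default thresholds (R² of at least 0.7, MAPE of at most 20%) are hypothetical choices for illustration, not values used in the article.

```python
def screen_and_rank(fits, min_r2=0.7, max_mape=20.0):
    """Zones whose regression fit passes both quality thresholds, steepest rise first.

    `fits` maps a zone name to a dict with 'slope', 'r2' and 'mape' (in percent).
    The default thresholds are illustrative, not values used in the article.
    """
    good = {
        zone: fit
        for zone, fit in fits.items()
        # keep well-fitted zones that are rising (positive slope)
        if fit["r2"] >= min_r2 and fit["mape"] <= max_mape and fit["slope"] > 0
    }
    return sorted(good, key=lambda zone: good[zone]["slope"], reverse=True)


# Made-up fit statistics for hypothetical zones (not Vortexa results)
fits = {
    "Zone A": {"slope": 12.0, "r2": 0.91, "mape": 8.0},
    "Zone B": {"slope": 20.0, "r2": 0.45, "mape": 35.0},  # steep but poor fit
    "Zone C": {"slope": 5.0, "r2": 0.82, "mape": 12.0},
    "Zone D": {"slope": -3.0, "r2": 0.88, "mape": 9.0},   # good fit, falling
}
print(screen_and_rank(fits))
```

Screening before ranking keeps a steep but poorly fitted zone (like the made-up "Zone B") from topping the list on the strength of a slope the data barely supports.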

Method 2. Kendall Tau Statistics

Kendall tau measures the strength of the association between two variables by comparing the number of concordant and discordant pairs of observations. In the context of time series data, this means comparing the order of the values for the variable being measured at different points in time.

A pair of observations is considered concordant if the value and time move in the same direction, that is, the later observation has the higher value. A pair is considered discordant if they move in opposite directions, that is, the later observation has the lower value. Pairs with equal values are tied and count as neither.

To use Kendall tau to detect trends in a time series, we calculate the Kendall tau coefficient between time and the variable being measured. This involves counting the concordant (C) and discordant (D) pairs among all n(n-1)/2 pairs of observations. In the absence of ties, tau = (C - D) / (n(n-1)/2), which ranges from -1 (every pair discordant) to +1 (every pair concordant), so its sign gives the direction of the association with time and its magnitude gives the strength. See the detailed step-by-step guide on computing the Kendall tau coefficient.

If there are more concordant pairs than discordant pairs, this indicates a positive association between time and the variable being measured, suggesting a positive trend. Conversely, if there are more discordant pairs than concordant pairs, this indicates a negative association between time and the variable being measured, suggesting a negative trend.
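The counting procedure described above can be sketched in a few lines of plain Python. This is a hypothetical helper, not the Appendix code: it treats list position as time and computes tau-a, (C - D) / (n(n-1)/2). SciPy's `scipy.stats.kendalltau` reports the tie-adjusted tau-b by default, which gives the same value when there are no ties.

```python
from itertools import combinations


def kendall_tau(values):
    """Kendall tau between time (list position) and `values`, by pair counting.

    This is tau-a, (C - D) / (n(n-1)/2); it equals the usual tau only when
    no two values are tied.
    """
    concordant = discordant = 0
    # combinations() keeps list order, so `later` is always the later observation
    for earlier, later in combinations(values, 2):
        if later > earlier:
            concordant += 1  # value rises as time moves forward
        elif later < earlier:
            discordant += 1  # value falls as time moves forward
        # equal values are ties and count as neither
    n = len(values)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up monthly totals: 5 concordant pairs and 1 discordant pair
print(kendall_tau([1.0, 3.0, 2.0, 5.0]))
```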

Kendall tau is a non-parametric method, meaning it doesn’t assume any specific distribution of the data. This makes it useful for detecting trends in time series that may not follow a normal distribution or may contain outliers. Even so, it is best used in conjunction with other methods, such as visual inspection or linear regression, to get a more complete picture of the trend in the data.
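To illustrate what "non-parametric" buys here, the sketch below compares a least-squares slope with Kendall tau on two made-up series that share the same ordering but differ in one extreme value. Tau depends only on the relative order of the observations, so it is unchanged, while the slope is driven largely by the single outlier. Both helper functions are hypothetical and assume no tied values.

```python
from itertools import combinations


def kendall_tau(values):
    """Tau-a between time (list position) and values; assumes no ties."""
    pairs = list(combinations(values, 2))
    return sum((b > a) - (b < a) for a, b in pairs) / len(pairs)


def ols_slope(values):
    """Least-squares slope of values against time t = 0, 1, 2, ..."""
    n = len(values)
    t_mean, y_mean = (n - 1) / 2, sum(values) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    return s_ty / s_tt


clean = [1.0, 2.0, 3.0, 4.0]
spiked = [1.0, 2.0, 3.0, 40.0]  # same ordering, one extreme final value
for name, series in [("clean", clean), ("spiked", spiked)]:
    print(name, "slope:", round(ols_slope(series), 2), "tau:", kendall_tau(series))
```

This is also why the two methods complement each other: tau indicates whether values tend to rise over time, while the slope indicates by how much.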

*Code for Relative Order Testing: see Appendix


We need to choose a significance threshold (alpha) for the p-values returned by the Kendall tau test. The null hypothesis is that there is no association between time and the measured quantity (tau = 0). If the p-value is smaller than alpha, we reject the null hypothesis in favor of the alternative: that a significant trend exists in the series. Raising alpha increases the chance of committing a Type I error, that is, of detecting a trend that does not in fact exist. Even so, we use an alpha of 0.1 rather than the conventional 0.05, as we do not wish to miss any interesting trends.

The Kendall tau p-values (Fig. 4) suggest that there are significant trends in Augusta STS [IT], Taman STS [RU] and Kalamata STS [GR]. Their tau statistics (Fig. 4) are all close to 1, meaning that all three STS zones are experiencing increasing trends.
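The decision rule described above (compare the p-value with alpha, then read the direction from the sign of tau) can be sketched as follows. This is an illustration with hypothetical function names, not the Appendix code: it computes an exact two-sided p-value by enumerating every possible ordering of the observations, which is valid for short series without ties, and uses the article's alpha of 0.1 as the default. In practice, `scipy.stats.kendalltau` is the usual library route to tau and its p-value.

```python
import math
from itertools import combinations, permutations


def kendall_s(values):
    """S = (concordant - discordant) pairs, with time given by list position."""
    return sum((later > earlier) - (later < earlier)
               for earlier, later in combinations(values, 2))


def kendall_trend_test(values, alpha=0.1):
    """Exact two-sided Kendall test for a monotonic trend in a short, tie-free series.

    Under H0 (no association with time) all n! orderings are equally likely,
    so the p-value is the share of orderings with |S| >= the observed |S|.
    Enumerating n! orderings is only practical for small n (roughly n <= 9).
    """
    n = len(values)
    s_obs = kendall_s(values)
    extreme = sum(abs(kendall_s(order)) >= abs(s_obs)
                  for order in permutations(range(n)))
    p_value = extreme / math.factorial(n)
    tau = s_obs / (n * (n - 1) / 2)
    if p_value < alpha:  # reject H0: a significant trend exists
        verdict = "increasing" if tau > 0 else "decreasing"
    else:
        verdict = "no significant trend"
    return tau, p_value, verdict


# Made-up series: perfectly rising vs. mixed
print(kendall_trend_test([10.0, 20.0, 30.0, 40.0]))
print(kendall_trend_test([1.0, 3.0, 2.0, 5.0]))
```

With only four observations, the smallest achievable two-sided p-value under this exact test is 2/24 ≈ 0.083, so no four-point series can pass alpha = 0.05; for very short series the choice of alpha matters a great deal.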

Validation of results


The chart above shows that the three STS locations identified by both methods do indeed have a clear increasing trend, and none of the other STS zones shows a more obvious increasing trend than these three. This helps to verify the ability of the two methods to detect trends in time series. However, parameters such as the significance level (alpha) and the MAPE threshold need to be chosen carefully to reduce the risk of detecting spurious trends.
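One way to act on that advice, sketched below under assumed inputs, is to flag a zone only when both methods agree: the regression slope is positive and significant with an acceptable R² and MAPE, and Kendall tau is positive and significant. The function name, dictionary keys and default thresholds are hypothetical and should be tuned to the data rather than taken as recommendations.

```python
def confirmed_uptrends(results, alpha=0.1, min_r2=0.7, max_mape=20.0):
    """Zones where both methods point to a significant increasing trend.

    `results` maps a zone name to a dict with the hypothetical keys
    'slope', 'slope_p', 'r2', 'mape' (percent), 'tau' and 'tau_p'.
    The thresholds are illustrative defaults, not values from the article.
    """
    confirmed = []
    for zone, r in results.items():
        regression_ok = (r["slope"] > 0 and r["slope_p"] < alpha
                         and r["r2"] >= min_r2 and r["mape"] <= max_mape)
        kendall_ok = r["tau"] > 0 and r["tau_p"] < alpha
        if regression_ok and kendall_ok:  # require agreement to cut false positives
            confirmed.append(zone)
    return confirmed


# Made-up statistics for hypothetical zones (not Vortexa results)
results = {
    "Zone A": {"slope": 12.0, "slope_p": 0.02, "r2": 0.93, "mape": 6.0, "tau": 1.0, "tau_p": 0.08},
    "Zone B": {"slope": 9.0, "slope_p": 0.04, "r2": 0.75, "mape": 15.0, "tau": 0.33, "tau_p": 0.75},
    "Zone C": {"slope": 4.0, "slope_p": 0.30, "r2": 0.40, "mape": 40.0, "tau": 0.67, "tau_p": 0.33},
}
print(confirmed_uptrends(results))
```

Requiring agreement trades some sensitivity for fewer spurious detections; loosening alpha or the MAPE threshold moves the balance the other way.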

Further reading

Interested in learning more about Vortexa's Data API and Python SDK? Check out our guide.
