Clustering of Currency Exchange Rates Using Hierarchical Methods Based on Dynamic Time Warping

Machine Learning Finance

Clustering the daily spot exchange rates of the 30 currencies.

Uraz Akgül
2022-09-18

In this study, we collect and analyze historical exchange rate data for the 30 currencies listed in the table below, considering the US dollar as reference. Reuters is the source of the data and you can access it by downloading post29.xlsx file here.

country currency
Argentine USDARS
Brazil USDBRL
Bulgaria USDBGN
Chile USDCLP
Colombia USDCOP
Czech Republic USDCZK
Egypt USDEGP
Hungary USDHUF
India USDINR
Indonesia USDIDR
Kazakhstan USDKZT
Kenya USDKES
Malaysia USDMYR
Mexico USDMXN
Nigeria USDNGN
Pakistan USDPKR
Peru USDPEN
Philippines USDPHP
Poland USDPLN
Romania USDRON
Russia USDRUB
Singapore USDSGD
South Africa USDZAR
South Korea USDKRW
Sri Lanka USDLKR
Taiwan USDTWD
Thailand USDTHB
Turkey USDTRY
Ukranie USDUAH
Vietnam USDVND

By calculating the average, we can convert the daily frequency data to monthly data.

df <- readxl::read_excel("data.xlsx") %>% 
  na.omit() %>% 
  pivot_longer(!DATE, names_to = "Vars", values_to = "Vals") %>% 
  mutate(
    Months = format(DATE,"%m"),
    Years = format(DATE,"%Y")
  ) %>% 
  group_by(Vars,Months,Years) %>% 
  summarise(
    MeanVals = mean(Vals)
  ) %>% 
  ungroup() %>% 
  arrange(Vars,Years) %>% 
  mutate(
    DATE = as.Date(paste0(Years,"-",Months,"-",1))
  ) %>% 
  select(DATE,Vars,MeanVals) %>% 
  pivot_wider(names_from = "Vars", values_from = "MeanVals")

It would be better if we standardize the data. The following formula can be used to standardize the values in a dataset.

\(z = \frac{(x - \mu)}{\sigma}\)

\(z\): Standardised value or Z-score

\(x\): Original value

\(\mu\): Sample mean

\(\sigma\): Sample standard deviation

standardize <- function(x){
  
  (x - mean(x)) / sd(x)
  
}

df_standardize <- df %>% 
  mutate_at(
    vars(-DATE), function(x) standardize(x)
  )
df_standardize %>% 
  pivot_longer(!DATE, names_to = "Vars", values_to = "Vals") %>% 
  ggplot(aes(x = DATE, y = Vals)) +
  geom_line() +
  facet_wrap(~Vars, ncol = 5) +
  theme_minimal() +
  theme(axis.title = element_blank(),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_blank(),
        strip.text = element_text(size = 20))

What happens when the data is standardised? An example is provided below. I’d like to draw your attention to the y-axis.

Type: h or Hierarchical

An algorithm called hierarchical clustering divides objects into clusters based on how similar they are. The result is a collection of clusters, each of which differs from the others while having things that are generally similar to one another.

Distance: dtw or Dynamic Time Warping

We can explain DTW by comparing it to the Euclidean distance. The distance between two points in Euclidean space is known as the Euclidean distance.

Dynamic Time Warping uses temporal distortions to create the best possible alignment, whereas Euclidean only allows one-to-one point comparison.

As a result of various trials, I decided to create 3 different clusters.

k <- 3L

data_cluster <- tsclust(
  t(df_standardize[,-1]), # data
  type = "h", # What type of clustering method to use
  k = k, # Number of desired clusters
  distance = "dtw" # Dynamic time warping
)
cluster <- as.data.frame(cutree(data_cluster, k=k)) %>% 
  rownames_to_column(., var = "Vars") %>% 
  rename("Cluster"=2)
Vars Cluster
ARS 1
EGP 1
HUF 1
IDR 1
INR 1
KES 1
KZT 1
LKR 1
MXN 1
NGN 1
PKR 1
RON 1
RUB 1
TRY 1
UAH 1
VND 1
ZAR 1
BGN 2
BRL 2
CLP 2
COP 2
CZK 2
KRW 2
MYR 2
PEN 2
PHP 2
PLN 2
SGD 3
THB 3
TWD 3
df2 <- df_standardize %>% 
  pivot_longer(!DATE, names_to = "Vars", values_to = "Vals") %>% 
  arrange(Vars) %>% 
  left_join(cluster, by = "Vars")
for(i in 1:length(unique(cluster$Cluster))){
  
  g <- ggplot(df2 %>% filter(Cluster == i), aes(x = DATE, y = Vals)) +
    geom_line(data = df2 %>% filter(Cluster == i) %>% rename(Vars2 = Vars), aes(group = Vars2), color = "gray", size = 1) +
    geom_line(color = "dark blue", size = 2) +
    ggthemes::theme_fivethirtyeight() +
    theme(strip.text = element_text(size = 20),
          axis.text = element_blank()) +
    facet_wrap(~Vars)
  
  plot(g)
  
}