Your task is to write a Python function that implements the k-Means clustering algorithm. This function should take specific inputs and produce a list of final centroids. k-Means clustering is a method used to partition n points into k clusters. The goal is to group similar points together and represent each group by its center (called the centroid).

Function Inputs:

  • points: A list of points, where each point is a tuple of coordinates (e.g., (x, y) for 2D points)
  • k: An integer representing the number of clusters to form initial_centroids: A list of initial centroid points, each a tuple of coordinates
  • max_iterations: An integer representing the maximum number of iterations to perform

Function Output:

A list of the final centroids of the clusters, where each centroid is rounded to the nearest fourth decimal.

Example:

  • Input:points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)], k = 2, initial_centroids = [(1, 1), (10, 1)], max_iterations = 10
  • Output:[(1, 2), (10, 2)]

这道题是K均值聚类,也是比较容易的,无非是迭代。

首先定义一下度量距离的函数,这里是欧式距离:

import numpy as np  
  
def dist(p, q):  
 return sum((np.array(p) - np.array(q))**2)**0.5

然后就是给点,指定k类,给个中心,告诉你按照中心这些给定的点应该分为什么类别:

def clustering(points, k, centroids):  
 cluster = [0] * len(points)  
 for i in range(len(points)):  
  min_d = dist(points[i], centroids[0])  
  cls = 0  
  for j in range(1, k):  
   d = dist(points[i], centroids[j])  
   if min_d < d:  
    min_d = d  
    cls = j  
  cluster[i] = cls  
 return cluster

拿到类别之后,就要按照类别去切分数据点了,然后再给每一类计算均值,做为这一类别的新中心点:

def index(x, value):  
 return [i for i, e in enumerate(x) if e == value]  
      
def coord_mean(points):  
return np.mean(np.array(points), axis=0)  
  
def update_centroids(points, k, centroids):  
 cls = clustering(points, k, centroids)  
for i in range(k):  
  p = [points[j] for j in index(cls, i)]  
if (len(p) == 0):  
   next  
  centroids[i] = coord_mean(p)  
return centroids

有了这些辅助函数之后,K均值聚类写起来就简单了,这里是指定迭代次数的,所以直接循环就好了,每次聚类,重新计算中心点。迭代次数够了,返回中心点:

def k_means_clustering(points: list[tuple[float, float]], k: int, initial_centroids: list[tuple[float, float]], max_iterations: int) -> list[tuple[float, float]]:  
 centroids = [list(c) for c in initial_centroids]  
   
 for _ in range(max_iterations):  
  centroids = update_centroids(points, k, centroids)  
   
 final_centroids = [tuple(c) for c in np.round(centroids, 4)]  
 return final_centroids