Your task is to write a Python function that implements the k-Means clustering algorithm. This function should take specific inputs and produce a list of final centroids. k-Means clustering is a method used to partition n points into k clusters. The goal is to group similar points together and represent each group by its center (called the centroid).
Function Inputs:
- points: A list of points, where each point is a tuple of coordinates (e.g., (x, y) for 2D points)
- k: An integer representing the number of clusters to form initial_centroids: A list of initial centroid points, each a tuple of coordinates
- max_iterations: An integer representing the maximum number of iterations to perform
Function Output:
A list of the final centroids of the clusters, where each centroid is rounded to the nearest fourth decimal.
Example:
- Input:
points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)], k = 2, initial_centroids = [(1, 1), (10, 1)], max_iterations = 10 - Output:
[(1, 2), (10, 2)]
这道题是K均值聚类,也是比较容易的,无非是迭代。
首先定义一下度量距离的函数,这里是欧式距离:
import numpy as np
def dist(p, q):
return sum((np.array(p) - np.array(q))**2)**0.5
然后就是给点,指定k类,给个中心,告诉你按照中心这些给定的点应该分为什么类别:
def clustering(points, k, centroids):
cluster = [0] * len(points)
for i in range(len(points)):
min_d = dist(points[i], centroids[0])
cls = 0
for j in range(1, k):
d = dist(points[i], centroids[j])
if min_d < d:
min_d = d
cls = j
cluster[i] = cls
return cluster
拿到类别之后,就要按照类别去切分数据点了,然后再给每一类计算均值,做为这一类别的新中心点:
def index(x, value):
return [i for i, e in enumerate(x) if e == value]
def coord_mean(points):
return np.mean(np.array(points), axis=0)
def update_centroids(points, k, centroids):
cls = clustering(points, k, centroids)
for i in range(k):
p = [points[j] for j in index(cls, i)]
if (len(p) == 0):
next
centroids[i] = coord_mean(p)
return centroids
有了这些辅助函数之后,K均值聚类写起来就简单了,这里是指定迭代次数的,所以直接循环就好了,每次聚类,重新计算中心点。迭代次数够了,返回中心点:
def k_means_clustering(points: list[tuple[float, float]], k: int, initial_centroids: list[tuple[float, float]], max_iterations: int) -> list[tuple[float, float]]:
centroids = [list(c) for c in initial_centroids]
for _ in range(max_iterations):
centroids = update_centroids(points, k, centroids)
final_centroids = [tuple(c) for c in np.round(centroids, 4)]
return final_centroids
