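In the kNN example, we used the pixel intensities directly as the feature vector. This time we use the Histogram of Oriented Gradients (HOG) as the feature vector.
Before computing the HOG, we deskew each image using its second-order moments, that is, we straighten digits that are slanted. We therefore start by defining a function deskew() that takes a digit image and deskews it. Here is the deskew() function: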
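```python
# cv (import cv2 as cv), np and SZ (the cell size, 20) are defined in the complete listing below.
def deskew(img):
    m = cv.moments(img)
    # A negligible vertical spread means there is no slant to measure.
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11'] / m['mu02']
    # Shear matrix; with WARP_INVERSE_MAP it maps output pixels back to input pixels.
    M = np.float32([[1, skew, -0.5 * SZ * skew], [0, 1, 0]])
    img = cv.warpAffine(img, M, (SZ, SZ), flags=cv.WARP_INVERSE_MAP | cv.INTER_LINEAR)
    return img
```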
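To make the ratio `mu11 / mu02` less abstract, here is a minimal sketch that computes the two central moments by hand with NumPy instead of calling `cv.moments()`. The helper name `skew_from_moments` and the synthetic stroke are illustrative assumptions, not part of the original code.

```python
import numpy as np

def skew_from_moments(img):
    """Return mu11 / mu02 computed directly from the pixel intensities."""
    img = img.astype(np.float64)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    total = img.sum()
    x_bar = (xs * img).sum() / total  # intensity-weighted centroid, x
    y_bar = (ys * img).sum() / total  # intensity-weighted centroid, y
    mu11 = ((xs - x_bar) * (ys - y_bar) * img).sum()
    mu02 = ((ys - y_bar) ** 2 * img).sum()
    return mu11 / mu02

# Hypothetical test image: a one-pixel stroke that drifts right
# by one column every two rows, i.e. a slant of roughly 0.5.
stroke = np.zeros((20, 20), np.uint8)
for y in range(20):
    stroke[y, 5 + y // 2] = 255

skew = skew_from_moments(stroke)
print(round(skew, 3))  # close to 0.5
```

The ratio recovers the horizontal drift per row, which is the amount of shear that deskew() undoes with its affine warp.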
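Next, we compute the HOG descriptor of each image in a function hog(). To do this, we compute the Sobel derivatives of the image in the X and Y directions, then find the magnitude and direction of the gradient at every pixel. The direction is quantized to 16 integer values (bin indices 0 to 15). The image is divided into four sub-squares, and for each sub-square we compute a histogram of gradient direction (16 bins), weighted by gradient magnitude. Each sub-square therefore yields a vector of 16 values, and the four vectors together form the feature vector of the image (64 values). This is the feature vector we train on.
Here is the hog() function: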
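```python
# cv, np and bin_n (the number of bins, 16) are defined in the complete listing below.
def hog(img):
    gx = cv.Sobel(img, cv.CV_32F, 1, 0)
    gy = cv.Sobel(img, cv.CV_32F, 0, 1)
    mag, ang = cv.cartToPolar(gx, gy)
    # Quantize the gradient direction into bin_n integer bin indices.
    bins = np.int32(bin_n * ang / (2 * np.pi))
    # Split the 20x20 image into four 10x10 sub-squares.
    bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
    mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
    # One magnitude-weighted direction histogram per sub-square.
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # hist is a 64-element vector
    return hist
```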
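The binning step can be tried in isolation. The sketch below is a NumPy-only illustration: it uses hand-picked gradient values (hypothetical, not Sobel output) and swaps `cv.cartToPolar` for `np.hypot` and `np.arctan2`, so it only approximates what hog() does to a single sub-square.

```python
import numpy as np

bin_n = 16  # number of direction bins, as in the text

# Hypothetical gradients for three pixels. The first two point the same
# way (about 26.6 degrees), the third points at about 116.6 degrees.
gx = np.array([2.0, 4.0, -1.0])
gy = np.array([1.0, 2.0, 2.0])

mag = np.hypot(gx, gy)                   # gradient magnitude
ang = np.arctan2(gy, gx) % (2 * np.pi)   # gradient direction in [0, 2*pi)

# Quantize the direction to a bin index, then sum the magnitudes per bin.
bins = np.int32(bin_n * ang / (2 * np.pi))
hist = np.bincount(bins, weights=mag, minlength=bin_n)

print(bins)        # bin index of each pixel
print(hist.shape)  # one value per bin
```

The first two pixels fall into the same bin, so their magnitudes add up there; this is what "using the gradient magnitude as the weight" means in practice.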
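Finally, we split the large image into individual digit cells. For each digit, 250 cells (the left half of every row) are used as training data and the other 250 as test data. Here is the complete code: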
```python
import cv2 as cv
import numpy as np

SZ = 20      # side length of one digit cell in pixels
bin_n = 16   # number of direction bins

affine_flags = cv.WARP_INVERSE_MAP | cv.INTER_LINEAR


def deskew(img):
    m = cv.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11'] / m['mu02']
    M = np.float32([[1, skew, -0.5 * SZ * skew], [0, 1, 0]])
    img = cv.warpAffine(img, M, (SZ, SZ), flags=affine_flags)
    return img


def hog(img):
    gx = cv.Sobel(img, cv.CV_32F, 1, 0)
    gy = cv.Sobel(img, cv.CV_32F, 0, 1)
    mag, ang = cv.cartToPolar(gx, gy)
    # Quantize the gradient direction into bin_n integer bin indices.
    bins = np.int32(bin_n * ang / (2 * np.pi))
    bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
    mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)  # hist is a 64-element vector
    return hist


img = cv.imread('digits.png', 0)
if img is None:
    raise Exception("we need the digits.png image from samples/data here!")

cells = [np.hsplit(row, 100) for row in np.vsplit(img, 50)]

# First half of each row is training data, the rest is test data
train_cells = [i[:50] for i in cells]
test_cells = [i[50:] for i in cells]

# Training
deskewed = [list(map(deskew, row)) for row in train_cells]
hogdata = [list(map(hog, row)) for row in deskewed]
trainData = np.float32(hogdata).reshape(-1, 64)
responses = np.repeat(np.arange(10), 250)[:, np.newaxis]

svm = cv.ml.SVM_create()
svm.setKernel(cv.ml.SVM_LINEAR)
svm.setType(cv.ml.SVM_C_SVC)
svm.setC(2.67)
svm.setGamma(5.383)  # gamma is not used by the linear kernel

svm.train(trainData, cv.ml.ROW_SAMPLE, responses)
svm.save('svm_data.dat')

# Testing
deskewed = [list(map(deskew, row)) for row in test_cells]
hogdata = [list(map(hog, row)) for row in deskewed]
testData = np.float32(hogdata).reshape(-1, bin_n * 4)
result = svm.predict(testData)[1]

# Accuracy
mask = result == responses
correct = np.count_nonzero(mask)
print(correct * 100.0 / result.size)
```
With this approach, we reach an accuracy of 93.8%.
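To check the split and the labelling used in the listing without needing digits.png, the sketch below runs the same slicing on a blank stand-in array. The 1000 x 2000 size (50 rows by 100 columns of 20 x 20 cells) is what the split calls in the listing imply; the names `sheet` and `train` are illustrative assumptions.

```python
import numpy as np

# Stand-in for digits.png: a blank array holding 50 x 100 cells of 20 x 20 pixels.
sheet = np.zeros((1000, 2000), np.uint8)

cells = [np.hsplit(row, 100) for row in np.vsplit(sheet, 50)]
train_cells = [row[:50] for row in cells]  # left half of every row
test_cells = [row[50:] for row in cells]   # right half of every row

# Flatten row by row, the same order reshape(-1, 64) uses in the listing.
train = np.array(train_cells).reshape(-1, 20, 20)
labels = np.repeat(np.arange(10), 250)

print(train.shape, labels.shape)
print(labels[249], labels[250])  # last label of one digit, first label of the next
```

Because each digit occupies five consecutive rows of the sheet, taking 50 cells from each of those rows gives 250 consecutive training samples per digit, which is why `np.repeat(np.arange(10), 250)` lines up with the flattened data.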
Reposted from: http://gkju.baihongyu.com/