
Machine Learning: Linear Regression Exercise

Source: 泰然健康網 Date: December 13, 2024, 06:08

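Exercise 1: Linear Regression

Introduction

In this exercise, you will implement linear regression and see how it works on data.

Before starting the exercise, download the following data files:

- ex1data1.txt - dataset for linear regression with one variable
- ex1data2.txt - dataset for linear regression with multiple variables

The exercise consists of the required assignments below, plus optional assignments marked with *:

- Implement a simple example function (5 points)
- Implement a function that displays the dataset (5 points)
- Implement a function that computes the linear regression cost (40 points)
- Implement a function that runs gradient descent (50 points)
- Feature normalization*
- Gradient descent for linear regression with multiple variables*

The required assignments implement linear regression with one variable; the optional assignments implement linear regression with multiple variables.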

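1 Implementing a simple example function

In this part of the exercise, you will write code that returns a 5×5 identity matrix (a diagonal matrix with ones on the diagonal). The output should match the following:

    1 0 0 0 0
    0 1 0 0 0
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 0 1

1.1 Submitting the solution

Implement the function in the code box below. This part of the exercise passes when your output matches the result shown above.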

    def create_identity_matrix(size):
        # Build the matrix row by row, placing a 1 on the diagonal
        matrix = []
        for i in range(size):
            row = [0] * size
            row[i] = 1
            matrix.append(row)
        return matrix

    def print_matrix(matrix):
        # Print each row as space-separated values
        for row in matrix:
            print(' '.join(map(str, row)))

    identity_matrix = create_identity_matrix(5)
    print_matrix(identity_matrix)

    1 0 0 0 0
    0 1 0 0 0
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 0 1
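The same matrix can also be produced with NumPy directly. The sketch below is an alternative to the list-based solution, not part of the original exercise; it assumes NumPy is installed and uses `np.eye`, with `dtype=int` chosen so that the entries print as 0 and 1.

```python
import numpy as np

# np.eye(n) builds the n x n identity matrix; dtype=int keeps the entries as integers
identity = np.eye(5, dtype=int)

# Print each row as space-separated values, matching the exercise's output format
for row in identity:
    print(' '.join(map(str, row)))
```

The list-based version has no dependencies, while `np.eye` returns an array that can be used directly in later matrix operations.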

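2 Linear regression with one variable

In this part of the exercise, you will implement linear regression with one variable and use it to predict the profit of a food truck.

Suppose you run a restaurant chain and are considering opening new outlets in different cities. The chain already has food trucks in various cities, and you have population and profit data for each of them.

You now need to use this data to help choose the next city to expand into.

The file ex1data1.txt contains the dataset for the linear regression problem. The first column is the population of a city, and the second column is the profit of the food truck in that city. A negative profit indicates a loss.

2.1 Plotting the data

Before starting on the exercise itself, it is often useful to visualize the data. This dataset can be shown as a scatter plot because it has only two attributes (population and profit).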

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    # Jupyter magic: render plots inline (notebook only)
    %matplotlib inline

    path = 'ex1data1.txt'
    data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
    data.head(5)

       Population   Profit
    0      6.1101  17.5920
    1      5.5277   9.1302
    2      8.5186  13.6620
    3      7.0032  11.8540
    4      5.8598   6.8233

Next, implement the code that visualizes the data. The plot it produces should match the reference figure.

Key points:

- Draw a scatter plot that shows the data distribution as red points
- Label the x and y axes clearly

    plt.figure(figsize=(8, 6))
    plt.scatter(data['Population'], data['Profit'], color='red')
    plt.title('Population vs Profit', fontsize=14)
    plt.xlabel('Population', fontsize=12)
    plt.ylabel('Profit', fontsize=12)
    plt.show()

2.2 Gradient descent

In this part, you will use gradient descent to choose linear regression parameters θ that fit the given dataset.

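2.2.1 Update equations

The objective of linear regression is to minimize the cost function:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where the hypothesis $h_\theta(x)$ is given by the linear model:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

Recall that the parameters of the model are the $\theta_j$ values; these are the values that are adjusted to minimize the cost $J(\theta)$.

One way to do this is the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update below; with each step, the parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$.

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

(simultaneously update $\theta_j$ for all $j$)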
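The per-parameter update can be written as a single matrix expression. The following is a minimal sketch, assuming the standard batch gradient descent update for the squared-error cost; the function names `step_loop` and `step_vectorized` and the three-point dataset are hypothetical and chosen only so the result can be checked by hand.

```python
import numpy as np

def step_loop(theta, X, y, alpha):
    """One update, computed parameter by parameter (all from the old theta)."""
    m, n = X.shape
    errors = X @ theta - y              # h(x_i) - y_i for every example
    new_theta = theta.copy()
    for j in range(n):
        new_theta[j] = theta[j] - alpha * np.sum(errors * X[:, j]) / m
    return new_theta

def step_vectorized(theta, X, y, alpha):
    """The same update as one matrix expression."""
    m = X.shape[0]
    return theta - alpha * (X.T @ (X @ theta - y)) / m

# Hypothetical toy data: first column is the intercept term
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta0 = np.zeros(2)

a = step_loop(theta0, X, y, alpha=0.1)
b = step_vectorized(theta0, X, y, alpha=0.1)
print(a, b)  # both are [0.2, 0.4667] up to rounding
```

Both versions compute every component from the old parameter vector, which is what "simultaneous update" means in practice.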

2.2.2 Implementation

In the previous part, we loaded the data into the variable data and named its columns.

Next, we add a column of ones to the data to accommodate the intercept term. We also initialize the parameters to 0 and set the learning rate to 0.01.
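To see why the extra column is needed, here is a small sketch. It assumes plain NumPy arrays rather than the `np.matrix` objects used in the exercise, takes the first three population values from ex1data1.txt, and uses hypothetical parameter values purely for illustration.

```python
import numpy as np

x = np.array([6.1101, 5.5277, 8.5186])  # first three population values from ex1data1.txt
theta = np.array([1.0, 2.0])             # hypothetical parameters, for illustration only

# Prepending a column of ones lets theta[0] act as the intercept in X @ theta
X = np.column_stack([np.ones(len(x)), x])

pred_matrix = X @ theta                   # matrix form
pred_explicit = theta[0] + theta[1] * x   # intercept + slope * x
print(np.allclose(pred_matrix, pred_explicit))
```

With the ones column in place, the intercept is handled by the same matrix product as every other parameter, so no special case is needed in the cost or gradient code.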

    # Add the intercept column, then split into features X and target y
    data.insert(0, 'Ones', 1)
    cols = data.shape[1]
    X = data.iloc[:, 0:cols-1]
    y = data.iloc[:, cols-1:cols]
    X = np.matrix(X.values)
    y = np.matrix(y.values)
    alpha = 0.01
    iterations = 1500

    print(X.shape, type(X))
    print(X)

    (97, 2) <class 'numpy.matrix'>
    [[ 1. 6.1101]
     [ 1. 5.5277]
     [ 1. 8.5186]
     [ 1. 7.0032]
     [ 1. 5.8598]
     [ 1. 8.3829]
     [ 1. 7.4764]
     [ 1. 8.5781]
     [ 1. 6.4862]
     [ 1. 5.0546]
     [ 1. 5.7107]
     ...
     [ 1. 5.0594]
     [ 1. 5.7077]
     [ 1. 7.6366]
     [ 1. 5.8707]
     [ 1. 5.3054]
     [ 1. 8.2934]
     [ 1. 13.394 ]
     [ 1. 5.4369]]

2.2.3 Computing the cost J(θ)

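When running gradient descent to minimize the cost function, it helps to monitor convergence by computing the cost.

In this part of the exercise, you need to implement a function computeCost that computes the cost, so that you can check the convergence of your gradient descent implementation.

Here X and y are not scalar values but matrices whose rows represent the examples in the training set.

Key points:
After completing the function, initialize θ to 0, compute the cost, and print the resulting value.

If the result is 32.07, the computation passes.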
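Before running on the real data, it can help to check a cost function on values small enough to verify by hand. The sketch below assumes the squared-error cost with the 1/(2m) factor; `cost_sketch` and the three-point dataset are hypothetical names and data, not part of the exercise.

```python
import numpy as np

def cost_sketch(theta, X, y):
    """Squared-error cost: sum of squared residuals divided by 2m."""
    m = len(y)
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * m)

# Hypothetical toy data lying exactly on the line y = x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

cost_zero = cost_sketch(np.zeros(2), X, y)             # (1 + 4 + 9) / (2 * 3) = 14/6
cost_exact = cost_sketch(np.array([0.0, 1.0]), X, y)   # perfect fit, so the cost is 0
print(cost_zero, cost_exact)
```

A perfect fit giving exactly 0 and the all-zero parameters giving 14/6 are two checks that catch most mistakes in the scaling factor or the residual sign.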

    theta = np.zeros((X.shape[1], 1))

    def computeCost(theta, X, y):
        # Sum of squared residuals over the m training examples, divided by 2m
        m = len(y)
        inner = X @ theta - y
        square_sum = np.sum(np.square(inner))
        cost = square_sum / (2 * m)
        return cost

    cost = computeCost(theta, X, y)
    print(cost)

    32.072733877455676

2.2.4 Gradient descent

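Next, we implement gradient descent. The loop structure is already provided; you only need to supply the update to θ in each iteration.

As you write the code, make sure you understand what you are trying to optimize and what is being updated.

Keep in mind that the cost J(θ) is parameterized by the vector θ, not by X and y. That is, we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y.

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it decreases with each step. On every iteration, the code calls the computeCost function and records the cost. If gradient descent and computeCost are both implemented correctly, the value of J(θ) should never increase and should converge to a steady value by the end of the algorithm.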
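The "never increases" check can be automated rather than read off by eye. The following is a sketch under stated assumptions: the data is synthetic (generated with a fixed seed, not taken from ex1data1.txt), and the helper name `cost` is hypothetical.

```python
import numpy as np

# Synthetic data for illustration only (not the exercise dataset)
rng = np.random.default_rng(0)
x = rng.uniform(5, 10, size=50)
y = -4.0 + 1.2 * x + rng.normal(0, 1, size=50)
X = np.column_stack([np.ones_like(x), x])

def cost(theta):
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * len(y))

theta = np.zeros(2)
alpha = 0.01
costs = [cost(theta)]
for _ in range(200):
    theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
    costs.append(cost(theta))

# Every step-to-step difference should be <= 0 (small tolerance for rounding)
is_monotone = bool(np.all(np.diff(costs) <= 1e-12))
print(is_monotone)
```

If this check fails on your own run, the usual suspects are a learning rate that is too large or a sign error in the update.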

Key points:

After implementing gradient descent, use the final parameter values to visualize the linear regression fit. The resulting plot should resemble the reference figure.

    def gradient(theta, X, y):
        # Vectorized gradient of the cost: X^T (X theta - y) / m
        m = X.shape[0]
        inner = X.T @ (X @ theta - y)
        return inner / m

    def batch_gradient_decent(theta, X, y, iterations, alpha):
        # Record the cost before the first step and after every update
        cost_data = [computeCost(theta, X, y)]
        _theta = theta.copy()
        for _ in range(iterations):
            _theta = _theta - alpha * gradient(_theta, X, y)
            cost_data.append(computeCost(_theta, X, y))
        return _theta, cost_data

    _theta, cost_data = batch_gradient_decent(theta, X, y, iterations, alpha)
    print(_theta)
    computeCost(_theta, X, y)

    [[-3.63029144]
     [ 1.16636235]]
    4.483388256587726

cost_data

[32.072733877455676, 6.737190464870006, 5.931593568604956, 5.901154707081388, 5.895228586444221, 5.89009494311733, 5.885004158443647, 5.879932480491418, 5.874879094762575, ... 4.516522271846125, 4.516379811801644, 4.516237864890023, 4.516096429262984, ...]

    plt.scatter(data['Population'], data['Profit'], color='red', label='Training data')
    # Flatten the (97, 1) matrix of predictions to a 1-D array before plotting
    plt.plot(data['Population'], np.asarray(X @ _theta).ravel(), label='Linear regression fit', color='blue')
    plt.xlabel('Population')
    plt.ylabel('Profit')
    plt.title('Linear Regression Fit')
    plt.legend()
    plt.show()
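Once the parameters are learned, the fitted line can be used to predict profit for a new city. The sketch below plugs in the parameter values printed above; the function name `predict_profit` and the population value 7.0 are hypothetical, and the result is in whatever units the dataset uses.

```python
import numpy as np

# Parameter values printed by the gradient descent run above
theta = np.array([-3.63029144, 1.16636235])

def predict_profit(population, theta):
    """Evaluate h(x) = theta_0 + theta_1 * x for a single population value."""
    features = np.array([1.0, population])  # leading 1 is the intercept term
    return float(features @ theta)

prediction = predict_profit(7.0, theta)
print(round(prediction, 4))  # 4.5342
```

The same call with a small population value returns a negative number, which the dataset description interprets as a loss.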

2.3 Visualizing the cost function

To better understand how the cost evolves over the iterations, the cost value computed at each step is recorded and plotted.

    fig, ax = plt.subplots(figsize=(12, 8))
    # cost_data holds iterations + 1 values: the initial cost plus one per update
    ax.plot(np.arange(iterations + 1), cost_data, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title('Cost Function Convergence')

Text(0.5, 1.0, 'Cost Function Convergence')

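Optional exercises

3 Linear regression with multiple variables

In this part, you will implement linear regression with multiple variables to predict house prices. Suppose you are selling a house and want to know what a good market price would be.

One way to do this is to first collect information on recently sold houses and then build a model of house prices.

The file ex1data2.txt contains house prices in Portland, Oregon, along with related information. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

3.1 Feature normalization

The code below loads the dataset from ex1data2.txt and displays it.

Looking at the data, house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, scaling them first can make gradient descent converge much faster.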

    path = 'ex1data2.txt'
    data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
    data2.head()

       Size  Bedrooms   Price
    0  2104         3  399900
    1  1600         3  329900
    2  2400         3  369000
    3  1416         2  232000
    4  3000         4  539900

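In this part of the exercise, your task is to write code that normalizes the data in the dataset.

Key points:

- Subtract the mean of each feature from the dataset.
- After subtracting the mean, divide the new feature values by their respective "standard deviations".

The standard deviation is a way of measuring how much variation there is in the range of values of a particular feature (most data points lie within two standard deviations of the mean); it is an alternative to using the range of values.

When normalizing the features, you need to store the values used for normalization: the mean and the standard deviation. After learning the parameters of the model, we often want to predict the price of a new house. Given a new x value (house size and number of bedrooms), you must first normalize it using the mean and standard deviation that were previously computed from the training set.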
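The point about storing the mean and standard deviation can be sketched as follows. This example uses only the first five rows of ex1data2.txt shown earlier, so its statistics differ from those of the full dataset; the new house (1650 square feet, 3 bedrooms) is a hypothetical input, and `ddof=1` is chosen to match the default of pandas' `std`.

```python
import numpy as np

# First five rows of ex1data2.txt (size, bedrooms); not the full training set
train = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]], dtype=float)

# Compute the statistics once, on the training data, and keep them
mu = train.mean(axis=0)
sigma = train.std(axis=0, ddof=1)   # sample standard deviation, as in pandas
train_norm = (train - mu) / sigma

# A new example must be scaled with the stored training statistics,
# not with statistics computed from the new data itself
new_house = np.array([1650.0, 3.0])  # hypothetical input
new_house_norm = (new_house - mu) / sigma
print(new_house_norm)
```

Because the mean number of bedrooms in these five rows is exactly 3, the normalized bedroom value for the new house comes out as 0.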

    def get_X(df):
        # Prepend a column of ones, then drop the last column (the target)
        ones = pd.DataFrame({'ones': np.ones(len(df))})
        data = pd.concat([ones, df], axis=1)
        return data.iloc[:, :-1].values

    def get_y(df):
        # The last column is the target
        return np.array(df.iloc[:, -1])

    def normalize_feature(df):
        # Standardize every column (here this includes the Price column)
        return df.apply(lambda column: (column - column.mean()) / column.std())

    data = normalize_feature(data2)
    data.head()

           Size  Bedrooms     Price
    0  0.130010 -0.223675  0.475747
    1 -0.504190 -0.223675 -0.084074
    2  0.502476 -0.223675  0.228626
    3 -0.735723 -1.537767 -0.867025
    4  1.257476  1.090417  1.595389

3.2 Gradient descent

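In the previous exercise, we implemented gradient descent for linear regression with one variable. In this part of the exercise, the only difference is that the data is now a matrix with several feature columns.

The hypothesis function and the batch gradient descent update rule remain unchanged. Your task is to implement the cost function and gradient descent for linear regression with multiple variables.

Key points:

- Make sure your code supports data of any size and is fully vectorized.
- After implementing the cost function and gradient descent, the final cost should be approximately 0.13.
- Plot the cost curve as required in the exercise on linear regression with one variable.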
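One way to confirm that an implementation handles any number of features is to compare it with a least-squares solver. The sketch below is an illustration under stated assumptions: the data is synthetic with three features, the names `cost` and `gradient_descent` are hypothetical, and `np.linalg.lstsq` is used only as an independent reference, not as part of the exercise.

```python
import numpy as np

def cost(theta, X, y):
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * len(y))

def gradient_descent(X, y, alpha, iterations):
    # Nothing here depends on the number of columns in X
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
    return theta

# Synthetic data with three roughly standardized features (illustration only)
rng = np.random.default_rng(1)
features = rng.normal(size=(40, 3))
X = np.column_stack([np.ones(40), features])
true_theta = np.array([0.5, 1.0, -2.0, 0.3])
y = X @ true_theta + rng.normal(scale=0.1, size=40)

theta_gd = gradient_descent(X, y, alpha=0.1, iterations=2000)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares solution
print(np.allclose(theta_gd, theta_ls, atol=1e-6))
```

Agreement between the two solutions suggests that both the gradient and the update are correct for the general multi-feature case.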

    import seaborn as sns

    X = get_X(data)
    print(X.shape, type(X))
    y = get_y(data)
    print(y.shape, type(y))

    theta = np.zeros(X.shape[1])
    epoch = 1500
    alpha = 0.01
    final_theta, cost_data = batch_gradient_decent(theta, X, y, epoch, alpha=alpha)

    # sns.tsplot is deprecated in favor of sns.lineplot
    sns.lineplot(x=np.arange(len(cost_data)), y=cost_data)
    plt.xlabel('epoch', fontsize=18)
    plt.ylabel('cost', fontsize=18)
    plt.show()
    final_theta

    (47, 3) <class 'numpy.ndarray'>
    (47,) <class 'numpy.ndarray'>

    array([-1.10833328e-16, 8.84042349e-01, -5.24551809e-02])


Author: HJDSSJ
Link: https://www.cnblogs.com/hjdssj/p/18585819
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.

URL: Machine Learning: Linear Regression Exercise http://m.u1s5d6.cn/newsview484985.html
