Andrew Ng's "Machine Learning" Study Notes (updated 2023/3/16)

2023-03-14 Machine Learning rainforcetime

1st

machine learning

1. Welcome to machine learning

2. What can machine learning do?

2nd

1. Define: what is machine learning?

2. Machine Learning algorithms

==2.1 Supervised learning== (the type used most in real-world applications)

input to output label

x to y

learns from being given "right answers"

Regression: e.g. house price prediction

Predict a number; there are infinitely many possible outputs.

Classification: e.g. breast cancer detection ("class" and "category" mean the same thing here)

Predict categories.

There is only a small number of possible outputs.

==2.2 Unsupervised learning==

Find something interesting in unlabeled data.

Data only comes with inputs x, but not output labels y.

Algorithm has to find structure in the data.

Clustering: e.g. Google News, grouping customers

Group similar data points together.

Anomaly detection:

Find unusual data points.

Dimensionality reduction:

Compress data using fewer numbers.

3rd

==Linear Regression Model==

Terminology

Training set

Notation:

x = "input" variable, also called a feature

y = "output" variable, also called the "target" variable

m = number of training examples.

(x,y) = single training example.

$(x^{(i)},y^{(i)})$ = $i^{th}$ training example (the superscript $(i)$ is an index, not an exponent).

$\hat{y}$ ("y-hat") = estimated value of y (the model's prediction)

Linear regression with one variable.

Univariate linear regression.

Cost function

Squared error cost function

Model: $f_{w,b}(x) = wx + b$

w,b: parameters (also called coefficients or weights)

b: y-intercept

w: slope

$(\hat{y} - y)$ is called the error.

$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(\hat{y}^{(i)} - y^{(i)})^2$ (m = number of training examples)

$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})^2$

Find w,b such that:

$\hat{y}^{(i)}$ is close to $y^{(i)}$ for all $(x^{(i)},y^{(i)})$
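The squared error cost above can be sketched in a few lines of Python. This is only an illustration: the function names `predict` and `compute_cost` and the toy training set are assumptions made for this example and do not come from the course materials.

```python
def predict(x, w, b):
    """Linear model f_{w,b}(x) = w*x + b."""
    return w * x + b


def compute_cost(xs, ys, w, b):
    """Squared error cost J(w,b) = 1/(2m) * sum over i of (f(x_i) - y_i)^2."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = predict(x, w, b) - y  # y_hat - y for one training example
        total += error ** 2
    return total / (2 * m)


# Hypothetical toy training set lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

cost_perfect = compute_cost(xs, ys, w=2.0, b=0.0)  # line passes through every point
cost_off = compute_cost(xs, ys, w=1.0, b=0.0)      # slope too small, so errors are nonzero
print(cost_perfect, cost_off)
```

When the line fits every point the cost is zero; the worse the fit, the larger J becomes, which is why minimizing J is a reasonable way to choose w and b.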

Conclusion

model:

$f_{w,b}(x) = wx + b$

parameters:

$w,b$

cost function:

$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})^2$

goal:

$\mathop{minimize} \limits_{w,b} J(w,b)$

Simplified

$f_{w}(x) = wx$ (if b = 0)

$J(w)=\frac{1}{2m}\sum_{i=1}^m(f_{w}(x^{(i)}) - y^{(i)})^2$

$\mathop{minimize} \limits_{w} J(w)$

$f_{w}(x)$: for fixed w, a function of x (here, x is the input)

$J(w)$: a function of w (here, w is the parameter)

choose w to minimize J(w)
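To make the difference between $f_{w}(x)$ and $J(w)$ concrete, the sketch below evaluates J(w) for a handful of candidate values of w on a small data set and picks the one with the lowest cost. The data points, the candidate list, and the name `cost_w` are assumptions chosen for illustration.

```python
def cost_w(xs, ys, w):
    """Simplified cost J(w) for the model f_w(x) = w*x (b fixed at 0)."""
    m = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Each candidate w gives a different line f_w(x) and therefore one value of J(w).
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
costs = {w: cost_w(xs, ys, w) for w in candidates}

best_w = min(costs, key=costs.get)  # the candidate with the smallest cost
for w in candidates:
    print(f"w = {w:.1f}  J(w) = {costs[w]:.4f}")
print("best w among candidates:", best_w)
```

Plotting these (w, J(w)) pairs would trace out the bowl-shaped curve whose lowest point marks the best-fitting slope for this data.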

3D visualization

4th

==Gradient Descent==

Have some function J(w,b)

want to $\mathop{min} \limits_{w,b} J(w,b)$

$\mathop{min} \limits_{w_{1},...,w_{n},b}J(w_{1},w_{2},...,w_{n},b)$

Outline:

Start with some w,b (the starting values are not important, so a common choice is w=0, b=0)

Keep changing w,b to reduce J(w,b)

Until we settle at or near a minimum (J may have more than one minimum)

Gradient descent algorithm

$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$

$\alpha$: learning rate

$\frac{\partial}{\partial w} J(w,b)$: partial derivative of J with respect to w

$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$

Simultaneously update w and b

Correct: simultaneous update

$tmp\_w = w - \alpha \frac{\partial}{\partial w} J(w,b)$

$tmp\_b = b - \alpha \frac{\partial}{\partial b} J(w,b)$

$w = tmp\_w$

$b = tmp\_b$

Pay attention to the order: compute both $tmp\_w$ and $tmp\_b$ from the old w and b before assigning either one.

Gradient Descent Intuition
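A minimal sketch of the full loop for the univariate model $f_{w,b}(x) = wx + b$ follows. It assumes the standard partial derivatives of the squared error cost, $\frac{\partial}{\partial w} J(w,b) = \frac{1}{m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}$ and $\frac{\partial}{\partial b} J(w,b) = \frac{1}{m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})$, which these notes have not derived yet. The function names, the toy data, and the values of `alpha` and `num_iters` are illustrative assumptions.

```python
def compute_gradients(xs, ys, w, b):
    """Partial derivatives of the squared error cost J(w,b) at the current w, b."""
    m = len(xs)
    dj_dw = 0.0
    dj_db = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y  # f_{w,b}(x) - y
        dj_dw += error * x
        dj_db += error
    return dj_dw / m, dj_db / m


def gradient_descent(xs, ys, alpha, num_iters, w=0.0, b=0.0):
    """Repeat the simultaneous update of w and b for num_iters steps."""
    for _ in range(num_iters):
        # Both gradients are computed from the OLD w and b ...
        dj_dw, dj_db = compute_gradients(xs, ys, w, b)
        tmp_w = w - alpha * dj_dw
        tmp_b = b - alpha * dj_db
        # ... and only then are w and b overwritten together.
        w, b = tmp_w, tmp_b
    return w, b


# Hypothetical training set generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w_fit, b_fit = gradient_descent(xs, ys, alpha=0.05, num_iters=5000)
print(w_fit, b_fit)
```

An incorrect, non-simultaneous version would assign the new w first and then use it when computing the derivative for b, so the two parameters would be updated from different points on the cost surface.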

$w = w - \alpha \frac{\partial}{\partial w} J(w)$

Learning rate

Author: rainforcetime

Link: https://blog.mumudlin.com/index.php/吴恩达《机器学习》学习笔记/

Copyright belongs to the author. Please do not repost without permission.
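The effect of the learning rate $\alpha$ on the update $w = w - \alpha \frac{\partial}{\partial w} J(w)$ can be explored with a small experiment. The sketch below runs the same update with three values of $\alpha$ on the simplified model $f_{w}(x) = wx$. The data set, the three $\alpha$ values, and the step count are assumptions picked so that the three typical behaviors (slow progress, fast convergence, divergence) show up; they are not taken from the course.

```python
def dj_dw(xs, ys, w):
    """Derivative of the simplified cost J(w) for the model f_w(x) = w*x."""
    m = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / m


def run_gradient_descent(xs, ys, alpha, steps, w=0.0):
    """Return the list of w values visited, starting from the initial w."""
    history = [w]
    for _ in range(steps):
        w = w - alpha * dj_dw(xs, ys, w)
        history.append(w)
    return history


# Hypothetical training set lying on y = x, so the minimum of J(w) is at w = 1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

slow = run_gradient_descent(xs, ys, alpha=0.01, steps=20)     # small steps, still far from 1
good = run_gradient_descent(xs, ys, alpha=0.1, steps=20)      # settles very close to 1
diverged = run_gradient_descent(xs, ys, alpha=0.5, steps=20)  # overshoots further every step

print("alpha=0.01:", slow[-1])
print("alpha=0.1: ", good[-1])
print("alpha=0.5: ", diverged[-1])
```

With a very small $\alpha$ each step moves w only slightly, so many iterations are needed; with an $\alpha$ that is too large each step overshoots the minimum and the distance to it grows instead of shrinking.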