Andrew Ng's Machine Learning: Study Notes (updated 2023/3/16)

1st

machine learning

1. Welcome to machine learning

2. What can machine learning do?


2nd

1. Definition: what is machine learning?

2. Machine learning algorithms

==2.1 Supervised learning== (the type used most in real-world applications)

Learns a mapping from input x to output label y.

Learns from being given the "right answers" (labeled examples).


Regression: house price prediction

Predict a number; there are infinitely many possible outputs.

Classification: breast cancer detection ("class" and "category" mean the same thing here)

Predict categories; there is only a small number of possible outputs.


==2.2 Unsupervised learning==

Find something interesting in unlabeled data.

Data only comes with inputs x, but not output labels y.

Algorithm has to find structure in the data.

Clustering: Google News, grouping customers

Group similar data points together.

Anomaly detection:

Find unusual data points.

Dimensionality reduction:

Compress data using fewer numbers.



3rd

==Linear Regression Model==

Terminology

Training set

Notation:

x = "input" variable, also called the feature

y = "output" variable, also called the "target" variable

m = number of training examples.

$(x,y)$ = a single training example.

$(x^{(i)},y^{(i)})$ = the $i^{th}$ training example (the superscript $(i)$ is an index, not an exponent).
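As a small illustration of this notation, the sketch below stores a made-up training set in plain Python lists. The numbers and the helper name `training_example` are assumptions for illustration only; they do not come from the course. Note that Python indexes from 0 while the notes index examples from 1.

```python
# Hypothetical training set (values are made up for illustration).
x_train = [1.0, 2.0, 3.0]        # inputs x (features)
y_train = [300.0, 500.0, 700.0]  # outputs y (targets)

m = len(x_train)  # m = number of training examples


def training_example(i):
    """Return (x^(i), y^(i)) using the notes' 1-based index i."""
    return x_train[i - 1], y_train[i - 1]


print(m)                    # number of training examples
print(training_example(2))  # the 2nd training example
```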


$\hat{y}$ ("y-hat") is the model's estimate (prediction) of y.


Linear regression with one variable is also called univariate linear regression.


Cost Function

Squared error cost function

Model: $f_{w,b}(x) = wx + b$

w,b : parameters (also called coefficients or weights)

b is the y-intercept.

w is the slope.
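The model above can be sketched as a one-line Python function. The function name `predict` and the parameter values are hypothetical, chosen only to show how w (slope) and b (y-intercept) enter the prediction.

```python
def predict(x, w, b):
    """Linear model f_{w,b}(x) = w * x + b."""
    return w * x + b


# Hypothetical parameter values, for illustration only.
w, b = 200.0, 100.0

y_hat = predict(1.5, w, b)         # 200 * 1.5 + 100
at_zero = predict(0.0, w, b)       # at x = 0 the prediction equals b
print(y_hat, at_zero)
```

Setting x = 0 returns b, which is why b is the y-intercept; increasing x by 1 changes the prediction by w, which is why w is the slope.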

$(\hat{y} - y)$ is called the error.

$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(\hat{y}^{(i)} - y^{(i)})^2$ (m = number of training examples)


$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})^2$

Find w,b:

$\hat{y}^{(i)}$ is close to $y^{(i)}$ for all $(x^{(i)},y^{(i)})$
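A minimal sketch of the squared error cost in Python, assuming the training set is held in two plain lists. The function name `compute_cost` and the data are illustrative assumptions, not taken from the course.

```python
def compute_cost(x, y, w, b):
    """Squared error cost: (1 / (2m)) * sum((f_wb(x_i) - y_i)^2)."""
    m = len(x)
    total = 0.0
    for x_i, y_i in zip(x, y):
        y_hat_i = w * x_i + b          # prediction for this example
        total += (y_hat_i - y_i) ** 2  # squared error for this example
    return total / (2 * m)


# Made-up data that lies exactly on the line y = x.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]

print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # perfect fit
print(compute_cost(x_train, y_train, w=0.0, b=0.0))  # poor fit, larger cost
```

A cost of zero means every prediction matches its target exactly; the worse the fit, the larger the cost.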


Conclusion

model:

$f_{w,b}(x) = wx + b$

parameters:

$w,b$

cost function:

$J(w,b)=\frac{1}{2m}\sum_{i=1}^m(f_{w,b}(x^{(i)}) - y^{(i)})^2$

goal:

$\mathop{minimize} \limits_{w,b} J(w,b)$


Simplified

$f_{w}(x) = wx$ (if b = 0)

$J(w)=\frac{1}{2m}\sum_{i=1}^m(f_{w}(x^{(i)}) - y^{(i)})^2$

$\mathop{minimize} \limits_{w} J(w)$


$f_{w}(x)$: for a fixed w, this is a function of the input x.

$J(w)$: this is a function of the parameter w.



Choose w to minimize J(w).
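To make "J is a function of w" concrete, the sketch below evaluates the simplified cost (b = 0) at a few candidate values of w and picks the smallest. The data, the candidate list, and the name `cost_simplified` are assumptions for illustration; a grid of candidates is only a way to look at J(w), not how w is found in practice.

```python
# Made-up data that lies exactly on the line y = x.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]


def cost_simplified(w):
    """J(w) for the simplified model f_w(x) = w * x (b fixed at 0)."""
    m = len(x_train)
    return sum((w * x - y) ** 2 for x, y in zip(x_train, y_train)) / (2 * m)


candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
costs = {w: cost_simplified(w) for w in candidates}

for w, j in costs.items():
    print(f"w = {w:.1f}  J(w) = {j:.3f}")

best_w = min(costs, key=costs.get)  # candidate with the lowest cost
print("best candidate:", best_w)
```

Each w gives one line through the data and one point on the J(w) curve; for this data the curve is a bowl whose bottom is at w = 1.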


3D Visualization


4th

==Gradient Descent==

Have some function J(w,b)

Want $\mathop{min} \limits_{w,b} J(w,b)$

More generally, for any number of parameters: $\mathop{min} \limits_{w_{1},...,w_{n},b}J(w_{1},w_{2},...,w_{n},b)$

Outline:

Start with some w,b (the starting values are not important for linear regression, so a common choice is w=0, b=0)

Keep changing w,b to reduce J(w,b)

Until we settle at or near a minimum (there may be more than one minimum)

Gradient descent algorithm

$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$

$\alpha$ : Learning rate

$\frac{\partial}{\partial w} J(w,b)$ : Derivative (the partial derivative of J with respect to w)

$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$

Simultaneously update w and b
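The update rules need the two partial derivatives. For the squared error cost and the model $f_{w,b}(x) = wx + b$ defined earlier in these notes, applying the chain rule gives the expressions below. This derivation is added here for reference and is not part of the original notes.

$$
\frac{\partial}{\partial w} J(w,b) = \frac{1}{m}\sum_{i=1}^m \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}
$$

$$
\frac{\partial}{\partial b} J(w,b) = \frac{1}{m}\sum_{i=1}^m \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)
$$

The factor 2 that comes from differentiating the square cancels the $\frac{1}{2}$ in $\frac{1}{2m}$, which is why the cost is defined with $2m$ in the denominator.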


Correct: simultaneous update

$tmp\_w = w - \alpha \frac{\partial}{\partial w} J(w,b)$

$tmp\_b = b - \alpha \frac{\partial}{\partial b} J(w,b)$

$w = tmp\_w$

$b = tmp\_b$

Pay attention to the order: compute both tmp_w and tmp_b from the old w and b before assigning either of them.
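The simultaneous update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the function names, the data, the learning rate `alpha=0.1`, and the fixed iteration count are all made up, and the partial derivatives are the standard ones for the squared error cost, (1/m) Σ (f(x) − y)·x for w and (1/m) Σ (f(x) − y) for b.

```python
def gradients(x, y, w, b):
    """Partial derivatives of the squared error cost w.r.t. w and b."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for x_i, y_i in zip(x, y):
        err = (w * x_i + b) - y_i  # prediction error for this example
        dj_dw += err * x_i
        dj_db += err
    return dj_dw / m, dj_db / m


def gradient_descent(x, y, alpha=0.1, num_iters=5000):
    """Run gradient descent from w = 0, b = 0 with simultaneous updates."""
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        # Both derivatives are computed from the OLD w and b ...
        dj_dw, dj_db = gradients(x, y, w, b)
        tmp_w = w - alpha * dj_dw
        tmp_b = b - alpha * dj_db
        # ... and only then are w and b overwritten together.
        w, b = tmp_w, tmp_b
    return w, b


# Made-up data generated from y = 2x + 1.
x_train = [1.0, 2.0, 3.0]
y_train = [3.0, 5.0, 7.0]

w, b = gradient_descent(x_train, y_train)
print(w, b)  # should end up close to w = 2, b = 1
```

If w were overwritten before the derivative for b is computed, the update for b would use the new w, which is the non-simultaneous version the notes warn against.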


Gradient Descent Intuition

$w = w - \alpha \frac{\partial}{\partial w} J(w)$
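A small numeric sketch of this intuition for the simplified model $f_w(x) = wx$: when the derivative is positive the update moves w to the left, and when it is negative the update moves w to the right, so in both cases w moves toward the minimum. The data, the learning rate, and the function names are illustrative assumptions.

```python
# Made-up data on the line y = x, so J(w) has its minimum at w = 1.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]


def dj_dw(w):
    """Derivative of J(w) for f_w(x) = w * x with squared error cost."""
    m = len(x_train)
    return sum((w * x - y) * x for x, y in zip(x_train, y_train)) / m


def step(w, alpha=0.1):
    """One gradient descent update: w = w - alpha * dJ/dw."""
    return w - alpha * dj_dw(w)


for w_start in (2.0, 0.0):
    slope = dj_dw(w_start)
    w_next = step(w_start)
    direction = "decreases" if w_next < w_start else "increases"
    print(f"w = {w_start}: slope = {slope:+.3f}, w {direction} to {w_next:.3f}")
```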



Learning rate
