Dynamic Programming in Maximum Subarray

Posted on 2019-04-10 | Edited on 2019-06-03 | In Algorithm

Dynamic Programming in Maximum Subarray

When I did the problem in Leetcode, the problem of 53 is about the maximum subarray. I got in touch about the DP algorithm, which is very useful for solving this problem, converting the $O(n^3)$ to $O(n)$ complexity.

There are basically three approachs for this problem, the most time consuming one is the most straightforward and simple for user to come up with. Better one is the DP algorithm, which we are going to talk about.

Problem description:

Given an integer array nums, find the contiguous subarray (containing at least one number) which has the largest sum and return its sum.

Example:

1
2
3

Input: [-2,1,-3,4,-1,2,1,-5,4],
Output: 6
Explanation: [4,-1,2,1] has the largest sum = 6.

Answer:

class Solution:
    def maxSubArray(self, nums: List[int]) -> int:
        '''
        DP algorithm
        '''
        n = len(nums)
        dp = [0]*(n)
        dp[0] = nums[0]
        for i in range(1,n):
            dp[i] = max(dp[i-1] + nums[i], nums[i])
        return max(dp)

See the video for more details:

Reference:

Lecture from Youtube.

Computation Graph and Back Propagation

Posted on 2019-04-09 | In Deep Learning

Computation Graph, Back Propagation, Forward and Reversed Auto Differentiation

计算图，反向传播，前向和后向自动求导。

Print in Python3

Posted on 2019-04-08 | In Python

Print in Python3

There are two kinds of ways to output to the screen, one is sys.output.write and print. When we use print, this built-in function actually calls the stdout function.

print equals stdout.write plus "\n". However, when we call print, we can change the optional parameter to print multi objects into a single line.

See the doc of print:

print(*objects, sep=’ ‘, end=’\n’, file=sys.stdout, flush=False)

To get the print without newline：

1	print("your output",end=" ")

My use experience:

See my exercise in the binary search tree, I want to travelsal the tree node and output into a single line.

    def preOrderTraversal(self,node):
        '''
        if you want to output the node in a single line:
            - change the `print` to `print(node.data,end=' ')
        '''
        if node:
#             print(node.data)
            print(node.data,end=" ")
            self.preOrderTraversal(node.lchild)
            self.preOrderTraversal(node.rchild)

List in Python

Posted on 2019-03-22 | In Python

List and For loop in Python

This blog also include some information of the array in Numpy.

Now have a list of [1,-1,1,1,1,-1], want to have another list of category corresponding to the 1 and -1.

1 2	>>> label=[1,-1,1,1,1,-1] >>> cate =['a' if i==1 else 'b' for i in label]

set value using the filter character in array of numpy

>>> import numpy as np
>>> label=np.array([1,-1,1,1,1,-1]) # it must be the array
>>> cate=np.empty(len(label))
>>> cate[label==1]=11
>>> cate[label==-1]=-11

LeetCode Note Two Sum

Posted on 2019-03-21 | In LeetCode

I have tried two kinds of the algorithm.

Do sum by loop and check whether the sum is equal to the target

for i in range(len(nums)):
    sumList = [nums[i]+item for item in nums[i+1:]]
    for sumNum in sumList:
        if sumNum == target:
            return [i, sumList.index(sumNum)+i+1]

Use the index() in List
By using this, the performance become much better.

for i in range(len(nums)):
    targetNum = target-nums[i]
    if targetNum in nums and nums.index(targetNum) != i:
        return i, nums.index(targetNum)

Ensemble Learning

Posted on 2019-03-20 | Edited on 2019-05-27 | In Machine Learning

Ensemble Learning

Bagging

When to use bagging?

用于很强的model。

最容易overfitting的model其实不是神经网络，而是decision tree。如果你想，只要把树扎的足够深，甚至可以在training data上得到100%的准确率，但是那没有任何意义，只是overfitting而已。

Bagging就是将容易overfitting的一堆model结合起来，乱拳打死老师傅。随机森林就是在decision tree上进行bagging，将多个决策树组合起来组成随机森林。

How to get different classifier?

Re-sampling your data set to form a new set
Re-weighting yoru data set to form a new set

Random Forest

The data set is generated by the bootstrapping, which resample the data set with replacement. In random forest, we average much less correlated trees. To implement this algorithm, not only different data subsets are used, but also we choose a subset $m \ll p$ of the features to train decision tree. Typically $m$ can range from $1$ to $\sqrt{p}$. The trees are not that good, but by averaging over huge number of trees, we can get pretty good results.

Boosting

用于比较弱的model。

Adaboost

Can convert the weak learner to strong learner(classifier).

我自己的一个简单Adaboost demo

Reference

台湾大学李宏毅的视频
课程资源: Hung-yi Lee

Linux Use

Posted on 2019-03-20 | In Linux

This blog records some basic commends about the use in Linux Server.

SSH

SSH is used to make connection with the remote server.

Connect the server

1	ssh xxx@xxx.ecs.soton.ac.uk

Close the connection

exit

SCP

1	SCP is for transport between the local and remote.

Transfer the entire file folder

1 2	The formmer path is the `from` location, the latter one is the destination. scp -r xxx@xxx.ecs.soton.ac.uk:/home/sa2y18/mydocuments/4_dataMining ~/Desktop/

Transfer a single file

no -r

1	scp xxx@xxx.ecs.soton.ac.uk:PATH LOCALPATH

Use Jupyter

Run jupyter on the remote and open it from local port. You can follow the instruction: [http://fizzylogic.nl/2017/11/06/edit-jupyter-notebooks-over-ssh/]

1
2
3

jupyter notebook --no-browser --port=8080  # run on the remote machine

ssh -N -L 8080:localhost:8080 xxx@xxx.ecs.soton.ac.uk  # run in the local machine terminal

Beyond Accuracy: Precision and Recall

Posted on 2019-03-12 | In Machine Learning

After training a model, there are some metrics to measure the performance of the model. The accuracy is the common one. Besides, there are other things to measure the performance.

Given four cases of the results:
True positive (TP): actually positive, predictive is positive which is true
False positive (FP): actually negative, predictive is positive which is false (type 1 error)
True negative (TN): actually negative, predictive is negative which is true
False negative (FN): actually positive, predictive is negative which is false (type 2 error)

	True	False
Positve	TP	FP
Negative	TN	FN

$ Precision = \frac{TP}{TP+FP} $
$ Recall = \frac{TP}{TP+FN} $

Hexo writing

Posted on 2019-03-11 | Edited on 2019-07-23 | In Hexo

Sometimes you don’t want to see the blog in the first time as you haven’t finished it.
In this case, you can use the draft.

Initialize draft and publish

$ hexo new draft "draft name"
$ #before publishing, recommend to use Hexo clean
$ # hexo clean
$ hexo publish "draft name"

Other use skills

Add the link
[words](link url)

站内链接 (link within blog)

1	{% post_link blog_name link_name %}

link_name可以不添加，不添加情况下默认提取文章标题

Note:
be careful when you add the article title, what you input through terminal may not the title of real name

Git add ignore

Posted on 2019-03-11 | In Git

What’s gitignore for:

The files which you don’t want to upload or list in your git history, such as “.DS_store” or some other data files. You can achieve this by adding the .gitignore file into your git repo.

Mac .DS_store

gitignore can set globally or just add a single file into your repo. You can find the doc in this link.

所有空行或者以注释符号＃开头的行都会被 git 忽略，空行可以为了可读性分隔段落，# 表明注释。
第一个 / 会匹配路径的根目录，举个栗子，”/*.html”会匹配”index.html”，而不是”d/index.html”。
通配符 匹配任意个任意字符，? 匹配一个任意字符。需要注意的是通配符不会匹配文件路径中的 /，举个栗子，”d/.html”会匹配”d/index.html”，但不会匹配”d/a/b/c/index.html”。
两个连续的星号 ** 有特殊含义：
- 以 / 开头表示匹配所有的文件夹，例如 /test.md 匹配所有的test.md文件。
- 以 / 结尾表示匹配文件夹内所有内容，例如 a/ 匹配文件夹a中所有内容。
- 连续星号 前后分别被 / 夹住表示匹配0或者多层文件夹，例如 a//b 匹配到 a/b 、a/x/b 、a/x/y/b 等。
前缀 ! 的模式表示如果前面匹配到被忽略，则重新添加回来。如果匹配到的父文件夹还是忽略状态，该文件还是保持忽略状态。如果路径名第一个字符为 ! ，则需要在前面增加 \ 进行转义。