# Random Forest

In order to learn about SVMs (support vector machines), we first have to understand what a random forest is.

## 1. What is a decision tree?

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. (from Wikipedia)

### Two Types of Decision Trees

1. Categorical Variable Decision Tree

2. Continuous Variable Decision Tree

Example: let’s say we want to predict whether a customer will pay their renewal premium with an insurance company (yes/no). We know that a customer’s income is a significant variable, but the insurance company does not have income details for all customers. Since this is an important variable, we can build a decision tree to predict customer income based on occupation, product, and various other variables. In this case, we are predicting values of a continuous variable.

### Important Terminology related to Decision Trees

Root Node, Splitting, Decision Node, Leaf/Terminal Node, Branch/Sub-Tree, Parent and Child Node.

Pruning: when we remove sub-nodes of a decision node, the process is called pruning. You can think of it as the opposite of splitting.

1. Overfitting: overfitting is one of the most common practical difficulties for decision tree models. It can be addressed by setting constraints on model parameters and by pruning.
2. Not fit for continuous variables: while working with continuous numerical variables, a decision tree loses information when it bins the variable into categories.

## 2. Regression Trees vs Classification Trees

Categories: Programming

# Zero to C-Mips Compiler

Following the CS143 course assignments, I will complete a compiler on my own.

There are five steps:

1. Lexical & Syntax Analysis
2. Semantic Analysis & Type Checking
3. Intermediate Code
4. Translated MIPS Code
5. Optimization

Categories: Programming

# Computer Organization: The Processor (1)

Sections 4.1 through 4.3 cover the introduction, a general method for logic design, and building a datapath. Let's go through them step by step:

## 4.1 Introduction

Definition of the control unit: it tells the computer's memory, arithmetic/logic unit, and input and output devices how to respond to a program's instructions.

## 4.2 A General Method for Logic Design

Categories: Programming

# Arch Linux: Cannot Play Audio in Multiple Programs at Once

Categories: Programming

# benchmark optimize(1)

source code: whetstone.c

baseline compiler flags: -std=c89 -DDP -DROLL -lm

no warnings, no errors

## GCC First

1. Simply compile and run:

Rolled Double  Precision 703148 Kflops ; 2048 Reps

2. 703148 Kflops is too slow, so we add the flag -O4 (GCC treats any optimization level above -O3 as -O3) to optimize the loops, then compile and run again:

Rolled Double  Precision 4177105 Kflops ; 2048 Reps

Better now!

Now let's try these flags:

gcc -std=c89 -DDP  -DROLL -O4 -ffast-math -funroll-all-loops -mavx whetstone.c -fopenmp -lm -o b.out

-ffast-math means faster code, but it sacrifices floating-point accuracy

-mavx means using the AVX instructions

5340310 Kflops now!

## ICC Next

1. Simply compile and run:

Rolled Double  Precision 4636137 Kflops ; 2048 Reps

It seems good at first. If we add the flag -O3, the program isn't any faster at all, so we think about using parallel methods.

The flag -xHost can improve performance by about 14%.

2. Parallel methods:

First of all we have to run vtune_amplifier_xe. The software is located in /opt/intel/vtune_amplifier_xe_xxx/bin64; run /opt/intel/vtune_amplifier_xe_xxx/bin64/amplxe-gui and you will see the program window. (ps: xxx stands for the version of vtune_amplifier_xe)

run this command (as root):

root# echo 0 > /proc/sys/kernel/yama/ptrace_scope

then refer to the tutorial hotspots_amplxe_lin.pdf

It shows the hotspots, and it also shows the CPU utilization (screenshots omitted here).

Poor! Now we have to consider parallelizing it.

Categories: Programming

# CUDA learning (2) – a simple parallel CUDA program

now we use the function add, with this code:

```
__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}
```

add() runs on the device, so a, b, and c must point to device memory

but we can allocate memory on the GPU from the host

we can use cudaMalloc(), cudaFree(), and cudaMemcpy() to handle device memory

now comes a simple program:

```
#include <stdio.h>

__global__ void add(int *a, int *b, int *c) {
    *c = *a + *b;
}

int main(void) {
    int a, b, c;          // host copies of a, b, c
    int *d_a, *d_b, *d_c; // device copies of a, b, c
    int size = sizeof(int);

    // Allocate space for device copies of a, b, c
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Set up input values and copy them to the device
    a = 2;
    b = 7;
    cudaMemcpy(d_a, &a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

    // Launch add() kernel on GPU
    add<<<1,1>>>(d_a, d_b, d_c);

    // Copy result back to host
    cudaMemcpy(&c, d_c, size, cudaMemcpyDeviceToHost);

    // Cleanup
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```
So how do we run code in parallel on the device?

Terminology: each parallel invocation of add() is referred to as a block. Each invocation can refer to its block index using blockIdx.x

then we change the add function:

```
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
```

so a, b, and c change to three arrays, and something has to change in main() accordingly

maybe we can use a function random_ints() to fill the input arrays
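A sketch of the modified main() under these changes: the launch now uses N blocks instead of one, and random_ints() is assumed to be a helper that fills an array with random integers (the value of N and the helper are my own choices, not from the original post):

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 512

__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

// Assumed helper: fill an array with random integers.
void random_ints(int *x, int n) {
    for (int i = 0; i < n; i++)
        x[i] = rand() % 100;
}

int main(void) {
    int *a, *b, *c;       // host copies of the arrays
    int *d_a, *d_b, *d_c; // device copies of the arrays
    int size = N * sizeof(int);

    // Allocate space for device copies
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);

    // Allocate and initialize host memory
    a = (int *)malloc(size); random_ints(a, N);
    b = (int *)malloc(size); random_ints(b, N);
    c = (int *)malloc(size);

    // Copy inputs to the device
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch add() with N blocks of one thread each;
    // block i computes c[i] via blockIdx.x
    add<<<N,1>>>(d_a, d_b, d_c);

    // Copy the result back and clean up
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    free(a); free(b); free(c);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Compiling requires nvcc and a CUDA-capable GPU, so treat this as a sketch of the pattern rather than a tested program.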

Categories: Programming

# CUDA learning (1) – starting from hello world

first come some concepts:

• Heterogeneous Computing
• Blocks
• Indexing
• Shared memory
• Asynchronous operation
• Handling errors
• Managing devices

Now we have to understand what “Heterogeneous Computing” is. From the 8th page of the slides, we can easily see that it involves device code, host code, parallel code, and serial code. The next slide is more intuitive: data moves between the CPU and the GPU, and the parallel code is executed on the GPU.

(ps: GigaThread™: this engine provides up to 10x faster context switching compared to previous-generation architectures, concurrent kernel execution, and improved thread block scheduling.)

so we add the device code, and then the “hello world” program looks like this:

```
#include <stdio.h> // do not forget this!

__global__ void mykernel(void) {
}

int main(void) {
    mykernel<<<1,1>>>();
    printf("Hello World!\n");
    return 0;
}
```
__global__: a CUDA C/C++ keyword that indicates:

• Runs on the device
• Is called from host code

and nvcc, NVIDIA's CUDA compiler driver, separates source code into host and device components

Device functions (e.g. mykernel()) are processed by the NVIDIA compiler, and host functions (e.g. main()) are processed by the standard host compiler

(ps: for Arch Linux users, you can install the NVIDIA CUDA toolkit with “pacman -S cuda”, and you may have to restart before nvcc is available)

The function mykernel() does nothing here, and we'll explain what “<<<1,1>>>” does in a moment.

now I have to get back to the parallelism learning I left unfinished before:

reference book: CSAPP

(ps: found nothing worth writing for now; maybe I'll write something in subsequent sections)

Categories: Programming