Regularization for Naive Bayes

Question

Xiwei She on 10 Jan 2017

https://nl.mathworks.com/matlabcentral/answers/319756-regularization-for-naive-bayes

Edited: Xiwei She on 10 Jan 2017

I have a data set in which the number of features is much larger than the number of examples. Say the input X is a 50-by-5000 matrix, where 50 is the number of examples and 5000 is the number of features, and Y is the label vector with two classes, 1 or 0. I want to classify this data with a Naive Bayes classifier. Because there are far more features than examples, the model over-fits and the results are very poor. I have already applied the lasso to this data and got pretty good classification results, and now I want to compare it against Naive Bayes as a baseline. However, the performance of NB is too poor for a persuasive comparison. So I'm wondering whether I can add regularization to Naive Bayes, as the lasso does, to overcome this over-fitting problem. Below is my Naive Bayes code. Can anyone help me revise it so that it includes regularization? Thanks a lot!

X = rand(50, 5000); % This is my train/test sample matrix
Y = randi([0 1], 50, 1); % This is my train/test label vector
CrossValSet = cvpartition(Y,'KFold',3); % 3-fold Cross Validation
% Training set
Train_sample = X(training(CrossValSet,1),:);
Train_label = Y(training(CrossValSet,1));
% Test set
Test_sample = X(test(CrossValSet,1),:);
Test_label = Y(test(CrossValSet,1));
Class_num = length(unique(Train_label)); % Number of classes (labels 1 and 0)
Feature_num = size(Train_sample,2);
Para_mean = cell(1,Class_num);      % Mean of each feature, per class
Para_dev = cell(1,Class_num);       % Standard deviation of each feature, per class
Sample_byclass = cell(1,Class_num); % Training samples grouped by class
Prior_prob = zeros(1,Class_num);    % Prior probability of each class
%% Algorithm Processing
% Prior
for i=1:1:size(Train_sample, 1)  
    Sample_byclass{1,Train_label(i,1)+1} = [Sample_byclass{1,Train_label(i,1)+1}; Train_sample(i,:)]; 
    Prior_prob(1,Train_label(i,1)+1) = Prior_prob(1,Train_label(i,1)+1) + 1; 
end 
Prior_prob = Prior_prob/size(Train_sample,1); % Prior probability 
% Parameters from training set
for i=1:1:Class_num 
     mu = mean(Sample_byclass{1,i}); 
     sigma = std(Sample_byclass{1,i});    
     Para_mean{1,i} = mu; 
     Para_dev{1,i} = sigma; 
end 
% Get predicted output for test set
predict = zeros(size(Test_sample,1), 1);
for i = 1:size(Test_sample,1)
    prob = log(Prior_prob);
    for j = 1:Class_num
        likelihood = 0; % Reset the log-likelihood for every class
        for k = 1:Feature_num
            % Adjust sigma if it's zero, to avoid division by zero
            if Para_dev{1,j}(1,k) == 0
                Para_dev{1,j}(1,k) = 0.1667;
            end
            % Gaussian log-likelihood (the constant -0.5*log(2*pi) is omitted)
            likelihood = likelihood - (Test_sample(i,k) - Para_mean{1,j}(1,k))^2 / (2 * Para_dev{1,j}(1,k)^2) - log(Para_dev{1,j}(1,k));
        end
        prob(1,j) = prob(1,j) + likelihood; % Log-posterior of class j, up to a constant
    end
    [~, index] = max(prob);
    predict(i) = index - 1; % Map class index 1/2 back to label 0/1
end
accuracy = mean(predict == Test_label);
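
For reference, here is a minimal sketch of one way regularization could be added to a Gaussian Naive Bayes model like the one above. It is not a standard MATLAB function and not the lasso itself. It assumes a simple variance-shrinkage scheme: each per-class variance is pulled toward the variance of that feature over all training rows, and a small floor is added so that no variance is zero. The names lambda, epsilon, overall_var and reg_var are hypothetical, the synthetic data and the hold-out split are made up for illustration, and both lambda and epsilon would need tuning by cross-validation.

```matlab
% Sketch: Gaussian Naive Bayes with variance shrinkage (all names are hypothetical)
rng(0);                                    % Reproducible synthetic data
n = 50; p = 5000;
Y = [zeros(n/2,1); ones(n/2,1)];           % Balanced 0/1 labels
X = randn(n, p);
X(Y == 1, 1:20) = X(Y == 1, 1:20) + 2;     % Assumption: only 20 features are informative

% Simple hold-out split (every third row is a test row)
train_idx = true(n,1); train_idx(1:3:n) = false;
Xtr = X(train_idx,:);  Ytr = Y(train_idx);
Xte = X(~train_idx,:); Yte = Y(~train_idx);

lambda = 0.5;     % Shrinkage weight in [0,1]; 0 means no shrinkage
epsilon = 1e-3;   % Variance floor, as a fraction of the largest feature variance

classes = [0 1];
overall_var = var(Xtr, 0, 1);              % Variance of each feature over all training rows
floor_var = epsilon * max(overall_var);
log_post = zeros(size(Xte,1), numel(classes));
for c = 1:numel(classes)
    Xc = Xtr(Ytr == classes(c), :);
    mu = mean(Xc, 1);
    class_var = var(Xc, 0, 1);
    % Shrink the noisy per-class variance toward the overall variance, then add the floor
    reg_var = (1 - lambda) * class_var + lambda * overall_var + floor_var;
    % Gaussian log-likelihood summed over features (constant term omitted)
    sq = bsxfun(@minus, Xte, mu).^2;
    ll = -0.5 * sum(bsxfun(@rdivide, sq, reg_var), 2) - 0.5 * sum(log(reg_var));
    log_post(:, c) = ll + log(size(Xc,1) / size(Xtr,1));   % Add the log prior
end
[~, idx] = max(log_post, [], 2);
predicted = reshape(classes(idx), [], 1);  % Map class index back to label 0/1
accuracy = mean(predicted == Yte);
```

With lambda = 0 and epsilon = 0 this reduces to the unregularized model, so the effect of the shrinkage can be checked by comparing cross-validated accuracy across a few values of lambda. Reducing the number of features before fitting is another option that may matter more than the variance estimate when there are 5000 features and only 50 examples.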

Answers (0)
