*** UPDATE 21 October 2010 ***
Fixed a bug in the classification.
For my own machine-learning work, I wrote a simple C++ program that computes the F-measure, accuracy, precision and recall from a text file with two columns:
The first column is the real class (I assume that a value <= 0 is negative and a value > 0 is positive).
The second column is the class predicted by whatever predictor we are using (again, positive > 0, negative <= 0).
The program can be used directly in a Unix pipe, for example:
cat test.txt | ./fmeasure
or by redirecting its input from a file:
./fmeasure < test.txt
The program prints the resulting statistics to standard output, for example:
Total examples = 40320
TP = 10521 FP = 4067
TN = 1163 FN = 24569
Precision = 0.721209
Recall = 0.299829
Accuracy = 0.289782
FMeasure = 0.423568
where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives. For more information, see: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
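The printed quantities are the standard ones for a binary confusion matrix. Below are their definitions, worked out by hand on the counts of the sample output (TP = 10521, FP = 4067, TN = 1163, FN = 24569, 40320 examples in total). Each result is rounded to six significant digits, which is what `cout` prints by default. Accuracy puts the true negatives (not the false negatives) in its numerator. The F-measure computed here is the balanced F1 score, the harmonic mean of precision and recall.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP+FP} = \frac{10521}{14588} \approx 0.721209 \\
\text{Recall}    &= \frac{TP}{TP+FN} = \frac{10521}{35090} \approx 0.299829 \\
\text{Accuracy}  &= \frac{TP+TN}{TP+FP+TN+FN} = \frac{10521+1163}{40320} \approx 0.289782 \\
F &= \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}
   = \frac{2\,TP}{2\,TP+FP+FN} = \frac{21042}{49678} \approx 0.423568
\end{aligned}
```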
Compile it with:
g++ -O3 fmeasure.cpp -o fmeasure
Source code of fmeasure.cpp:
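/* PARAMETERS CALCULATOR
 * Carlo Nicolini – September 2010
 * To use it, pipe a file into it, for example:
 *   cat test.txt | ./fmeasure
 * or also:
 *   ./fmeasure < test.txt
 * First column: real values; second column: predicted values.
 * A value > 0 is positive, a value <= 0 is negative.
 */
#include <iostream>

using namespace std;

// Safe division: returns 0 when the denominator is 0 (e.g. no positive
// predictions) instead of printing nan.
static double ratio(double num, double den) {
    return den > 0 ? num / den : 0.0;
}

int main() {
    // Don't sync C++ and C I/O: faster reading of large inputs
    ios_base::sync_with_stdio(false);

    double label = 0, margin = 0;
    unsigned long truePositives = 0, trueNegatives = 0;
    unsigned long falsePositives = 0, falseNegatives = 0;

    // Read one (real, predicted) pair per iteration; stop at EOF or bad input.
    while (cin >> label >> margin) {
        if (label > 0.0) {
            if (margin > 0.0)
                ++truePositives;   // real positive, predicted positive
            else
                ++falseNegatives;  // real positive, predicted negative
        } else {
            if (margin > 0.0)
                ++falsePositives;  // real negative, predicted positive
            else
                ++trueNegatives;   // real negative, predicted negative
        }
    }

    unsigned long totalLines = truePositives + trueNegatives
                             + falsePositives + falseNegatives;
    double precision = ratio(truePositives, truePositives + falsePositives);
    double recall    = ratio(truePositives, truePositives + falseNegatives);
    // Accuracy: correctly classified examples (TP + TN) over all examples
    double accuracy  = ratio(truePositives + trueNegatives, totalLines);
    double fmeasure  = ratio(2 * precision * recall, precision + recall);

    cout << "Total examples = " << totalLines << endl;
    cout << "TP = " << truePositives << " FP = " << falsePositives << endl;
    cout << "TN = " << trueNegatives << " FN = " << falseNegatives << endl;
    cout << "Precision = " << precision << endl;
    cout << "Recall = " << recall << endl;
    cout << "Accuracy = " << accuracy << endl;
    cout << "FMeasure = " << fmeasure << endl;
    return 0;
}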
Original article: http://carlonicolini.altervista.org/index.php/Informatica-e-Web/Notizie-dal-web/Precision-recall-fmeasure-accuracy-calculator.html