Example #1
File: code.php Project: cemoulto/nlptools
<?php

include dirname(__FILE__) . "/pagemaker.php";
putHeader("code");
?>

<p>Written in PHP, a language well suited to processing text on the web
(its name stands for "PHP: Hypertext Preprocessor"). The 
latest version of the toolkit is available from the 
<a href="https://github.com/atrilla/nlptools">repo</a>.</p>

<h2>Design by Contract</h2>

<p>Loosely coupled modular design, with orthogonality, reusability and 
extensibility in mind, so as not to compromise future growth. Design
with Contracts is tip 31 of the 
<a href="http://www.codinghorror.com/blog/files/Pragmatic%20Quick%20Reference.htm">Pragmatic Programmer</a>.</p>

<p>
Specific preconditions on parameter types are enforced with Type Hinting on
objects and arrays, and casts to string, int, bool or float, on primitive
types. Therefore, it is of utmost importance to consider the parameter
descriptions in the 
<a href="../doc/html/">documentation</a> 
and to follow the contracts defined in the interface prototypes.
</p>

<p>
Other preconditions unrelated to type checking are asserted.
</p>
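The contract style described above can be sketched roughly as follows. This is only an illustration: the function `topTokens` and its parameters are hypothetical and are not part of the toolkit. It shows a type hint on an array parameter, a cast on a primitive parameter, and an assertion for a precondition unrelated to types.

```php
<?php
// Hypothetical helper, NOT part of NLPTools: it only illustrates the
// contract style described in the text.
function topTokens(array $tokens, $limit) {
    // Primitive parameter: the type is enforced with a cast.
    $limit = (int) $limit;
    // Precondition unrelated to type checking: it is asserted.
    assert($limit > 0);
    // Count occurrences and keep the most frequent tokens.
    $counts = array_count_values($tokens);
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

print_r(topTokens(array("the", "cat", "the", "mat"), "1"));
```

Note that `assert()` can be switched off through the `zend.assertions` setting, so asserted preconditions are a development aid rather than input validation.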
Example #2
File: api.php Project: cemoulto/nlptools
<?php

include dirname(__FILE__) . "/pagemaker.php";
putHeader("api");
?>

<p>
The NLPTools API is a simple JSON over HTTP RESTful web service for
natural language processing. It is especially focused on text 
classification and sentiment analysis.
</p>

<p>
It currently offers the following functionality:
<ul>
    <li>Sentiment analysis of online news media (service named:
        "sentiment_news"). General-purpose, multiple topics.</li>
</ul>
</p>
<p>
Custom development of other domain-specific solutions is 
<a href="mailto:alex@atrilla.net">available on demand</a>.
</p>

<h2>Usage</h2>
<p>
To analyse the sentiment of some text (in English), send an HTTP POST to
<i>http://nlptools.atrilla.net/api/</i> with form-encoded
data containing the following parameters:
<ul>
    <li><b>service</b>: the name of the service, e.g., "sentiment_news".</li>
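A client call along these lines might look like the sketch below. The endpoint URL and the `service` parameter come from the text above. The parameter list is cut off in this excerpt, so the `text` parameter name is an assumption made purely for illustration, and `buildSentimentRequest` is a hypothetical helper. The network call is left commented out so the sketch runs offline.

```php
<?php
// Builds stream-context options for a form-encoded HTTP POST.
// ASSUMPTION: the parameter carrying the input is called "text"; the
// original parameter list is truncated, so check the API docs.
function buildSentimentRequest($text) {
    $body = http_build_query(array(
        "service" => "sentiment_news",
        "text"    => (string) $text,   // hypothetical parameter name
    ));
    return array("http" => array(
        "method"  => "POST",
        "header"  => "Content-Type: application/x-www-form-urlencoded\r\n",
        "content" => $body,
    ));
}

$options = buildSentimentRequest("Stocks rally after peace talks");
echo $options["http"]["content"], "\n";

// Uncomment to send the request; the service is described as returning JSON.
// $reply = file_get_contents("http://nlptools.atrilla.net/api/", false,
//     stream_context_create($options));
// var_dump(json_decode($reply, true));
```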
Example #3
File: omsa.php Project: cemoulto/nlptools
<?php

include dirname(__FILE__) . "/../core/classification/MultinomialNaiveBayes.php";
include dirname(__FILE__) . "/../core/util/feeding/FeedRSS.php";
include dirname(__FILE__) . "/pagemaker.php";
putHeader("Opinion Mining and Sentiment Analysis");
?>

<p>Identifies the semantic orientation, aka polarity, that is expressed
in subjective text such as written opinions. 
Overall, what this task aims to accomplish is sensing and
predicting whether a given text shows a 
<font style='background-color: #90EE90'>positive</font>, 
<font style='background-color: #FFA07A'>negative</font> or
<font style='background-color: #DCDCDC'>neutral</font>
sentiment/feeling.</p>

<p>To build this system, a Text Classification technique
has to be adapted to a given application domain. In this demo, the
<a href='http://nlp.cs.swarthmore.edu/semeval/'>SemEval-2007 dataset</a>
is used to train the classifier, and the learnt model is then 
applied to similar world news headlines from 
The Washington Post:</p>

<?php 
// Prepare classifier
$classifier = new MultinomialNaiveBayes();
$classifier->setDatabase("semeval07");
// Prepare data
$feeder = new FeedRSS();
$aFeeds = $feeder->getFood("http://feeds.washingtonpost.com/rss/world");
Example #4
File: index.php Project: cemoulto/nlptools
<?php

// This is the ABOUT
include dirname(__FILE__) . "/pagemaker.php";
putHeader("about");
?>

<p>Text processing framework to analyse Natural Language by performing
operations and tasks on corpus data. Hence, this approach focuses on 
the statistical/quantitative track of Natural Language Processing 
(NLP).</p>

<h2>Related fields</h2>

<ul>
<li>Computational Linguistics (CL)</li>
<li>Corpus Linguistics</li>
<li>Information Retrieval</li>
<li>Artificial Intelligence (AI), Machine Learning (ML) and Pattern 
Recognition</li>
</ul>

<p>The differences among the aforementioned fields related to NLP are a
matter of perspective and taste. Nonetheless, NLP is more frequently
regarded as an engineering-oriented approach, while CL is more often
associated with theoretical aspects.</p>

<h2>Recommended bibliography</h2>

<h3>NLP specific</h3>
Example #5
<?php

include dirname(__FILE__) . "/pagemaker.php";
putHeader("appdemos");
?>

<p>Solution-centric approach to Natural Language Processing 
technology. Some applications that require the processing 
of Natural Language in textual form are shown below:
<ul>
    <li><a href="omsa.php">Opinion Mining and Sentiment Analysis</a></li>
    <li><a href="topicid.php">Text Categorisation and Topic/Domain Identification</a></li>
</ul>
</p>

<p>More demos coming soon!</p>

<?php 
putFooter();
Example #6
<?php

include dirname(__FILE__) . "/../core/classification/MultinomialNaiveBayes.php";
include dirname(__FILE__) . "/../core/util/feeding/FeedRSS.php";
include dirname(__FILE__) . "/pagemaker.php";
putHeader("Text Categorisation and Topic/Domain Identification");
?>

<p>Identifies the semantic field of a given text and relates it
to its corresponding topic or domain.</p>

<p>To build this system, a Text Classification technique
has to be adapted to a given set of application domains. In this demo, 
the <a href='http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html'>Reuters Transcribed Subset</a>
is used to train the classifier, and the learnt model is then 
applied to predicting the topic of the most read articles from Reuters:</p>

<?php 
// Prepare classifier
$classifier = new MultinomialNaiveBayes();
$classifier->setDatabase("ReutersTranscribedSubset");
// Prepare data
$feeder = new FeedRSS();
$aFeeds = $feeder->getFood("http://feeds.reuters.com/reuters/MostRead?format=xml");
foreach ($aFeeds as $feed) {
    $lab = $classifier->classify($feed["title"]);
    // strip_tags() removes each tag individually; the former greedy pattern
    // /<.+>/ also deleted any text between the first "<" and the last ">"
    $desc = strip_tags($feed["desc"]);
    $indent = str_repeat("&nbsp;", 6);
    echo "<p><font color='#808080'>Topic: " . $lab . "</font><br />";
    echo $indent . "<b>" . $feed["title"] . "</b>" . " - <a href='" . $feed["link"] . "'>Read more</a><br />" . $indent . $desc . "</p>";
}
?>
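To make the classification step less opaque, here is a toy multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is a self-contained sketch of the general technique only: the class `TinyNaiveBayes`, its in-memory storage, its crude tokeniser and the two training sentences are all hypothetical, and it makes no claim about how the toolkit's `MultinomialNaiveBayes` class or its database are implemented.

```php
<?php
// Toy multinomial Naive Bayes. Hypothetical class, NOT the toolkit's own.
class TinyNaiveBayes {
    private $wordCounts = array();  // label => (word => count)
    private $docCounts = array();   // label => number of training texts
    private $vocabulary = array();  // word => true

    // Lowercase the text and split it on anything that is not a-z or 0-9.
    private function tokenise($text) {
        return preg_split('/[^a-z0-9]+/', strtolower((string) $text),
            -1, PREG_SPLIT_NO_EMPTY);
    }

    public function train($text, $label) {
        if (!isset($this->docCounts[$label])) {
            $this->docCounts[$label] = 0;
            $this->wordCounts[$label] = array();
        }
        $this->docCounts[$label]++;
        foreach ($this->tokenise($text) as $word) {
            if (!isset($this->wordCounts[$label][$word])) {
                $this->wordCounts[$label][$word] = 0;
            }
            $this->wordCounts[$label][$word]++;
            $this->vocabulary[$word] = true;
        }
    }

    // Returns the label with the highest log-probability, or null if untrained.
    public function classify($text) {
        $best = null;
        $bestScore = -INF;
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $words = $this->tokenise($text);
        foreach ($this->docCounts as $label => $docs) {
            // log P(label) + sum over words of log P(word | label)
            $score = log($docs / $totalDocs);
            $labelTotal = array_sum($this->wordCounts[$label]);
            foreach ($words as $word) {
                $count = isset($this->wordCounts[$label][$word])
                    ? $this->wordCounts[$label][$word] : 0;
                // Add-one smoothing keeps unseen words from zeroing the score.
                $score += log(($count + 1) / ($labelTotal + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $label;
            }
        }
        return $best;
    }
}

$nb = new TinyNaiveBayes();
$nb->train("oil prices rise as crude supply falls", "crude");
$nb->train("bank raises interest rates", "money");
echo $nb->classify("crude oil supply"), "\n";
```

Scores are summed as logarithms rather than multiplied as probabilities, which avoids numeric underflow on longer texts.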
Example #7
File: rss.php Project: neofutur/MyBestBB
        $title = 'fid';
    } else {
        if (!empty($_GET["tid"])) {
            $where = "\tAND t.id = '" . intval($_GET['tid']) . "'";
            $title = 'tid';
        } else {
            $where = '';
            $title = '';
        }
    }
}
$result = $db->query("\n\tSELECT p.id AS id, p.message AS message, p.posted AS postposted, t.subject AS subject, f.forum_name, c.cat_name \n\tFROM " . $db->prefix . "posts p\n\tLEFT JOIN " . $db->prefix . "topics t \n\tON p.topic_id=t.id \n\tINNER JOIN " . $db->prefix . "forums AS f \n\tON f.id=t.forum_id \n\tLEFT JOIN " . $db->prefix . "categories AS c \n\tON f.cat_id = c.id\n\tLEFT JOIN " . $db->prefix . "forum_perms AS fp \n\tON (\n\t\tfp.forum_id=f.id \n\t\tAND fp.group_id=3\n\t)\n\tWHERE (\n\t\tfp.read_forum IS NULL \n\t\tOR fp.read_forum=1\n\t) \n\t{$where} \n\tORDER BY postposted DESC \n\tLIMIT 0,15\n") or error('Unable to fetch forum posts', __FILE__, __LINE__, $db->error());
$i = 0;
while ($cur = $db->fetch_assoc($result)) {
    if ($i == 0) {
        putHeader($cur, $title);
        $i++;
    }
    putPost($cur);
}
putEnd();
// get feed into $feed
$feed = ob_get_contents();
ob_end_clean();
// create ETAG (hash of feed)
$eTag = '"' . md5($feed) . '"';
header('Etag: ' . $eTag);
// compare Etag to what we got
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $eTag == $_SERVER['HTTP_IF_NONE_MATCH']) {
    header("HTTP/1.0 304 Not Modified");
    header('Content-Length: 0');
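The ETag comparison in this excerpt can be isolated into a small pure function, which makes the decision between a full response and a 304 easy to check without sending headers. This is a sketch only: `conditionalResponse` and the shape of its return value are hypothetical and are not part of the script above.

```php
<?php
// Decides between 200 and 304 for a generated feed body.
// Hypothetical helper for illustration; $server stands in for $_SERVER.
function conditionalResponse($body, array $server) {
    // Same tag construction as in the excerpt: quoted MD5 hash of the body.
    $eTag = '"' . md5($body) . '"';
    // The client sends this header only if it has a cached copy.
    $clientTag = isset($server['HTTP_IF_NONE_MATCH'])
        ? trim($server['HTTP_IF_NONE_MATCH']) : null;
    if ($clientTag === $eTag) {
        // Cached copy is current: no body needs to be sent.
        return array("status" => 304, "etag" => $eTag, "body" => "");
    }
    return array("status" => 200, "etag" => $eTag, "body" => $body);
}

$first = conditionalResponse("<rss/>", array());
$again = conditionalResponse("<rss/>",
    array("HTTP_IF_NONE_MATCH" => $first["etag"]));
echo $first["status"], " then ", $again["status"], "\n";
```

In a real script the returned status and tag would then be mapped onto `header()` calls before the body is echoed.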
Example #8
<?php

include dirname(__FILE__) . "/pagemaker.php";
putHeader("guidelines");
?>

<p>Please refer to the <a href="../doc/html/">documentation</a> for all
the technical details. Refer to the <a href="api.php">API docs</a> for
a more pragmatic approach to NLPTools.</p>

<h2>Installation</h2>
<p>The toolkit is deployed by directly copying the core wherever it is 
desired and by making it accessible to the applications. 
Module dependencies are set relative to their file paths.</p>

<p>The documentation must be generated with Doxygen.</p>

<h2>Quick reference</h2>

<ul>
    <li><b><a href='../doc/html/interfaceClassifier.html'>Classifier</a></b> - Predicts the most suitable category label for given textual data.</li>
    <li><b><a href='../doc/html/interfaceFeeder.html'>Feeder</a></b> - Provides textual data to process.</li>
    <li><b><a href='../doc/html/interfaceTokeniser.html'>Tokeniser</a></b> - Splits a given text into smaller units called tokens.</li>
</ul>

<?php 
putFooter();