<?php
include dirname(__FILE__) . "/pagemaker.php";
putHeader("code");
?>
<p>The toolkit is written in PHP, a language well suited to processing text on the web (the name stands for "PHP: Hypertext Preprocessor"). The latest version of the toolkit is available from the <a href="https://github.com/atrilla/nlptools">repo</a>.</p>
<h2>Design by Contract</h2>
<p>The toolkit follows a loosely coupled, modular design that aims for orthogonality, reusability and extensibility, so that its future growth is not compromised. Designing with contracts is tip 31 of the <a href="http://www.codinghorror.com/blog/files/Pragmatic%20Quick%20Reference.htm">Pragmatic Programmer</a>.</p>
<p>Specific preconditions on parameter types are enforced with type hinting for objects and arrays, and with casts to string, int, bool or float for primitive types. It is therefore essential to read the parameter descriptions in the <a href="../doc/html/">documentation</a> and to follow the contracts defined in the interface prototypes.</p>
<p>Other preconditions unrelated to type checking are checked with assertions.</p>
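<p>The three kinds of precondition described above can be illustrated with a minimal sketch. The interface <code>WordCounter</code>, the class <code>SimpleWordCounter</code> and the method <code>countWord</code> are hypothetical names chosen for this example; they are assumptions and not part of the toolkit.</p>

```php
<?php
// Hypothetical interface, for illustration only (not part of the toolkit).
interface WordCounter
{
    /**
     * @param array  $tokens Tokens to inspect (precondition enforced by a type hint).
     * @param string $word   Word to count (cast to string on entry, must not be empty).
     * @return int Number of tokens equal to $word.
     */
    public function countWord(array $tokens, $word);
}

// Hypothetical implementation that follows the contract above.
class SimpleWordCounter implements WordCounter
{
    public function countWord(array $tokens, $word)
    {
        // Primitive parameters are cast instead of hinted.
        $word = (string) $word;
        // Preconditions unrelated to type checking are asserted.
        assert($word !== '');

        $count = 0;
        foreach ($tokens as $token) {
            if ((string) $token === $word) {
                $count++;
            }
        }
        return $count;
    }
}

$counter = new SimpleWordCounter();
echo $counter->countWord(array('good', 'news', 'good'), 'good'), "\n"; // prints 2
```

<p>Passing a non-array as the first argument fails at the call boundary, while the assertion is only evaluated when assertions are enabled in the PHP configuration, so it documents the contract rather than replacing input validation.</p>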
<?php
include dirname(__FILE__) . "/pagemaker.php";
putHeader("api");
?>
<p>The NLPTools API is a simple JSON-over-HTTP RESTful web service for natural language processing. It focuses on text classification and sentiment analysis.</p>
<p>It currently offers the following functionality:
<ul>
<li>Sentiment analysis of online news media (service name: "sentiment_news"). General-purpose, covering multiple topics.</li>
</ul>
</p>
<p>Custom development of other domain-specific solutions is <a href="mailto:alex@atrilla.net">available on demand</a>.</p>
<h2>Usage</h2>
<p>To analyse the sentiment of a text (in English), send an HTTP POST request to <i>http://nlptools.atrilla.net/api/</i> with form-encoded data containing the following parameters:
<ul>
<li><b>service</b>: the name of the service, e.g., "sentiment_news".</li>
<?php
include dirname(__FILE__) . "/../core/classification/MultinomialNaiveBayes.php";
include dirname(__FILE__) . "/../core/util/feeding/FeedRSS.php";
include dirname(__FILE__) . "/pagemaker.php";
putHeader("Opinion Mining and Sentiment Analysis");
?>
<p>Identifies the semantic orientation (also known as polarity) expressed in subjective text such as written opinions. The task aims to predict whether a given text conveys a <font style='background-color: #90EE90'>positive</font>, <font style='background-color: #FFA07A'>negative</font> or <font style='background-color: #DCDCDC'>neutral</font> sentiment.</p>
<p>To build such a system, a text classification technique has to be adapted to a given application domain. In this demo, the <a href='http://nlp.cs.swarthmore.edu/semeval/'>SemEval-2007 dataset</a> is used to train the classifier, and the learnt model is then applied to similar world news headlines from The Washington Post:</p>
<?php
// Prepare classifier
$classifier = new MultinomialNaiveBayes();
$classifier->setDatabase("semeval07");
// Prepare data
$feeder = new FeedRSS();
$aFeeds = $feeder->getFood("http://feeds.washingtonpost.com/rss/world");
<?php
// This is the ABOUT
include dirname(__FILE__) . "/pagemaker.php";
putHeader("about");
?>
<p>NLPTools is a text processing framework that analyses natural language by performing operations and tasks on corpus data. It therefore follows the statistical (quantitative) track of Natural Language Processing (NLP).</p>
<h2>Related fields</h2>
<ul>
<li>Computational Linguistics (CL)</li>
<li>Corpus Linguistics</li>
<li>Information Retrieval</li>
<li>Artificial Intelligence (AI), Machine Learning (ML) and Pattern Recognition</li>
</ul>
<p>The differences among these fields are a matter of perspective and taste. Nonetheless, NLP is more often regarded as an engineering-oriented approach, while CL is more closely associated with theoretical aspects.</p>
<h2>Recommended bibliography</h2>
<h3>NLP specific</h3>
<?php
include dirname(__FILE__) . "/pagemaker.php";
putHeader("appdemos");
?>
<p>A solution-centric approach to Natural Language Processing technology. Some applications that require processing natural language in textual form are shown below:
<ul>
<li><a href="omsa.php">Opinion Mining and Sentiment Analysis</a></li>
<li><a href="topicid.php">Text Categorisation and Topic/Domain Identification</a></li>
</ul>
</p>
<p>More demos coming soon!</p>
<?php
putFooter();
<?php
include dirname(__FILE__) . "/../core/classification/MultinomialNaiveBayes.php";
include dirname(__FILE__) . "/../core/util/feeding/FeedRSS.php";
include dirname(__FILE__) . "/pagemaker.php";
putHeader("Text Categorisation and Topic/Domain Identification");
?>
<p>Identifies the semantic field of a given text and relates it to its corresponding topic or domain.</p>
<p>To build such a system, a text classification technique has to be adapted to a given set of application domains. In this demo, the <a href='http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html'>Reuters Transcribed Subset</a> is used to train the classifier, and the learnt model is then applied to predict the topic of the most read articles from Reuters:</p>
<?php
// Prepare classifier
$classifier = new MultinomialNaiveBayes();
$classifier->setDatabase("ReutersTranscribedSubset");
// Prepare data
$feeder = new FeedRSS();
$aFeeds = $feeder->getFood("http://feeds.reuters.com/reuters/MostRead?format=xml");
foreach ($aFeeds as $feed) {
    // Predict the topic label from the headline
    $lab = $classifier->classify($feed["title"]);
    echo "<p><font color='#808080'>Topic: " . $lab . "</font><br />";
    // The greedy pattern drops everything from the first '<' to the last '>'
    // on a line of the description
    echo " <b>" . $feed["title"] . "</b>" .
        " - <a href='" . $feed["link"] . "'>Read more</a><br /> " .
        preg_replace("/<.+>/", "", $feed["desc"]) . "</p>";
}
?>
        $title = 'fid';
    } else {
        if (!empty($_GET["tid"])) {
            $where = "\tAND t.id = '" . intval($_GET['tid']) . "'";
            $title = 'tid';
        } else {
            $where = '';
            $title = '';
        }
    }
}

// Fetch the 15 most recent posts from forums that group 3 is allowed to read
$result = $db->query(
    "\n\tSELECT p.id AS id, p.message AS message, p.posted AS postposted, t.subject AS subject, f.forum_name, c.cat_name \n\tFROM "
    . $db->prefix . "posts p\n\tLEFT JOIN "
    . $db->prefix . "topics t \n\tON p.topic_id=t.id \n\tINNER JOIN "
    . $db->prefix . "forums AS f \n\tON f.id=t.forum_id \n\tLEFT JOIN "
    . $db->prefix . "categories AS c \n\tON f.cat_id = c.id\n\tLEFT JOIN "
    . $db->prefix . "forum_perms AS fp \n\tON (\n\t\tfp.forum_id=f.id \n\t\tAND fp.group_id=3\n\t)\n\tWHERE (\n\t\tfp.read_forum IS NULL \n\t\tOR fp.read_forum=1\n\t) \n\t{$where} \n\tORDER BY postposted DESC \n\tLIMIT 0,15\n"
) or error('Unable to fetch forum posts', __FILE__, __LINE__, $db->error());

$i = 0;
while ($cur = $db->fetch_assoc($result)) {
    // Emit the feed header once, using the first row
    if ($i == 0) {
        putHeader($cur, $title);
        $i++;
    }
    putPost($cur);
}
putEnd();

// get feed into $feed
$feed = ob_get_contents();
ob_end_clean();

// create ETAG (hash of feed)
$eTag = '"' . md5($feed) . '"';
header('Etag: ' . $eTag);

// compare Etag to what we got (the request header may be absent)
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $eTag == $_SERVER['HTTP_IF_NONE_MATCH']) {
    header("HTTP/1.0 304 Not Modified");
    header('Content-Length: 0');
<?php
include dirname(__FILE__) . "/pagemaker.php";
putHeader("guidelines");
?>
<p>Please refer to the <a href="../doc/html/">documentation</a> for all the technical details. Refer to the <a href="api.php">API docs</a> for a more pragmatic approach to NLPTools.</p>
<h2>Installation</h2>
<p>To deploy the toolkit, copy the core directory to the desired location and make it accessible to your applications. Module dependencies are resolved relative to their file paths.</p>
<p>The documentation has to be generated with Doxygen.</p>
<h2>Quick reference</h2>
<ul>
<li><b><a href='../doc/html/interfaceClassifier.html'>Classifier</a></b> - Predicts the most suitable category label for given textual data.</li>
<li><b><a href='../doc/html/interfaceFeeder.html'>Feeder</a></b> - Provides textual data to process.</li>
<li><b><a href='../doc/html/interfaceTokeniser.html'>Tokeniser</a></b> - Splits a given text into smaller units called tokens.</li>
</ul>
<?php
putFooter();