pdphuong/SocialCrawler

 
 

Social Packets crawler

S. Felix Wu, wu@cs.ucdavis.edu
Fredrik Erlandsson, fredrik.erlandsson@bth.se

This crawler consists of two parts: agent.php, which does the actual crawling, and a controller (found in contoller/) that keeps track of the current crawling status.


Install

The agent depends on the Facebook PHP SDK, which is included as a git submodule. To install it, run a submodule update:

git submodule update --init


Configuration

Most of the time you only need to use the agent.

Create a Facebook application at https://developers.facebook.com/apps and make sure to fill in offline_access and read_stream under Permissions->Extended Permissions.

Copy config/config-dist.php to config/config.php and fill in APPID and APPSEC (from your Facebook application page) and the URL of a running controller.

Usage

Run the agent from the command line: php agent.php token=FACEBOOK_USER_TOKEN
or as a web application: http://example.com/agent.php?token=FACEBOOK_USER_TOKEN

To run multiple instances of the agent in one environment (recommended), use the script bgxgrp.sh as:
bash bgxgrp.sh <#-instances> php agent.php token=FACEBOOK_USER_TOKEN
where <#-instances> should be replaced with the number of threads to run (between 8 and 15 is reasonable to avoid hitting Facebook's 600/600 limit).
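
The contents of bgxgrp.sh are not reproduced in this README, so the following is only a minimal sketch of the general idea it describes: start the same command several times in the background and wait for every copy to finish. The function name run_instances is hypothetical and is not part of the repository.

```shell
# Hypothetical helper, NOT the repository's bgxgrp.sh: start COUNT copies of
# a command in the background and wait for all of them to finish.
# Usage: run_instances COUNT COMMAND [ARGS...]
# Note: written for plain POSIX sh, so the variables below are global.
run_instances() {
  count="$1"          # number of instances to start
  shift               # the remaining arguments form the command to run
  pids=""
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" &            # launch one instance in the background
    pids="$pids $!"   # remember its process ID
    i=$((i + 1))
  done
  status=0
  for pid in $pids; do
    # Wait for each instance; remember if any of them exited non-zero.
    wait "$pid" || status=1
  done
  return "$status"
}

# Example (needs a valid token, so it is left commented out):
# run_instances 8 php agent.php token=FACEBOOK_USER_TOKEN
```

The function returns non-zero if any instance fails, which makes it easy to notice when one of the agents stopped early, for example because its token expired.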

The FACEBOOK_USER_TOKEN is generated via the Graph Explorer page https://developers.facebook.com/tools/explorer/ using a user account whose stated age is over 18, so that all types of pages can be crawled.

Happy crawling!!
