GitHub - mix376/scrapeAds: scrape advertisements from craigslist and worldjournal.com and analyze ad data

scrapeAds

This is a personal hobby project, built just for fun.

What is it?

A web service that periodically scrapes advertisements from job-search websites such as Craigslist, then processes, stores, analyzes, and displays the data.

Why will it be helpful?

Once enough job-post data has been collected over time, we can run analytics on it. For example:

Track the trend of PHP/Python development positions, so we understand the job market better from first-hand data;
Send me an alert when positions matching certain keywords open;
Find repeated ads;

Or possibly create an anonymous job review service based on the jobs posted.
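
As a rough illustration of the trend and alert ideas above, the sketch below counts ads per month that mention a keyword and picks out ads matching an alert list. It is written in Python for brevity (the project itself is PHP), and the `Ad` record, its fields, and the function names are assumptions made for this example, not the project's actual schema or code.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Ad:
    # Hypothetical record shape; the real table layout is not described here.
    title: str
    body: str
    posted: date


def mentions(ad: Ad, keyword: str) -> bool:
    """Case-insensitive match of the keyword as a standalone token."""
    # Lookarounds instead of \b so keywords ending in symbols (e.g. "C++") still work.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, f"{ad.title} {ad.body}", re.IGNORECASE) is not None


def monthly_trend(ads: list[Ad], keyword: str) -> dict[str, int]:
    """Count ads per YYYY-MM that mention the keyword."""
    counts = Counter(
        ad.posted.strftime("%Y-%m") for ad in ads if mentions(ad, keyword)
    )
    return dict(sorted(counts.items()))


def alerts(ads: list[Ad], keywords: list[str]) -> list[Ad]:
    """Return the ads that mention at least one alert keyword."""
    return [ad for ad in ads if any(mentions(ad, k) for k in keywords)]
```

A plain keyword match like this will miss synonyms and abbreviations, so the counts are only a first approximation of the trend.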

Tech stacks:

I chose the following stack because I want to practice these tools and get good at them.

web server: DigitalOcean VPS
backend: PHP Yii2 framework, Composer, MySQL, memcached
scrape library: Goutte
string similarity (open question): similar_text(), levenshtein(), cosine similarity, or a more efficient algorithm?
frontend: Bootstrap, jQuery, maybe Backbone.js
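
To make the string-similarity question more concrete, here is a minimal sketch of two of the candidates: a normalized Levenshtein ratio and cosine similarity over word counts. It is written in Python purely for illustration (in PHP, `levenshtein()` and `similar_text()` are built-ins), and the function names and tokenization rule are assumptions, not a decision the project has made.

```python
import math
import re
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance using a two-row dynamic programme, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Scale edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors; ignores word order."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

The two measures behave differently on reposted ads: edit distance penalizes reordered sentences, while cosine similarity over word counts treats them as identical. Edit distance is also quadratic in text length, which matters for long ad bodies; note too that PHP's `levenshtein()` rejected strings longer than 255 characters before PHP 8.0.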

Current status as of 11/29/15:

The console app is deployed to a DigitalOcean droplet and crawls every hour for the latest ads.
