#JakkuFirstOrder Task-3: Authorship identification: It determines the likelihood of a piece of writing to be produced by a particular author by examining other writings by that author. Approach: The approach I followed here is naive one. There are many other methods to solve this.it could have been made more accurate, by considering other factors, such as, using api task 1, if we knew the posting day, time, we could crossmatch it with the most probable times of posting by the user. The steps I followed here are: 1. Fetched all texts from timeline of a user and counted frequency of each words. 2. Now, given a tweet, first I counted probability of each word in that tweet and probability of that word in whole timeline and multiplied this probabilities and summed that for all words in the tweet. N.B: The task-3 is case-sensitive. In case, u get probability 0, please check the cases of the words.
Reference: http://arxiv.org/ftp/arxiv/papers/1401/1401.6118.pdf