
This is an Open Access publication published under CSC-OpenAccess Policy.
Evaluating Binary n-gram Analysis For Authorship Attribution
Mark Carman, Helen Ashman
Pages - 60 - 91 | Revised - 31-10-2019 | Published - 01-12-2019
Volume - 10, Issue - 4 | Publication Date - December 2019
Authorship Attribution, Binary n-gram, Stop Word, Cross-domain, Cross-genre.
Authorship attribution techniques focus on characters and words. However, the inclusion of words with meaning may complicate authorship attribution. Using only function words provides good authorship attribution with semantic or character n-gram analyses, but it is not yet known whether it improves binary n-gram analyses.

The literature mostly reports on authorship attribution at word or character level. Binary n-grams interpret text as binary. Previous work with binary n-grams assessed authorship attribution of full texts only. This paper evaluates binary n-gram authorship attribution over text stripped of content words as well as over a range of cross-domain scenarios.
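
As a rough illustration of what interpreting text as binary can look like, the sketch below encodes a string to bytes, writes the bytes out as a bit string, and counts every overlapping window of n bits. The function name `binary_ngrams`, the UTF-8 encoding, and the one-bit sliding step are assumptions made for this example; the abstract does not specify the encoding or windowing used in the paper.

```python
from collections import Counter


def binary_ngrams(text: str, n: int, encoding: str = "utf-8") -> Counter:
    """Count overlapping n-bit windows over the encoded bytes of `text`.

    Hypothetical helper: the encoding and the one-bit stride are
    assumptions, not details taken from the paper.
    """
    # Render each byte as 8 bits and join them into a single bit string.
    bits = "".join(f"{byte:08b}" for byte in text.encode(encoding))
    # Slide a window of n bits across the bit string, one bit at a time.
    return Counter(bits[i:i + n] for i in range(len(bits) - n + 1))


# Example: the 16 bits of "ab" yield 13 overlapping 4-bit windows.
profile = binary_ngrams("ab", n=4)
```

A frequency profile of this kind could then be compared between a disputed text and candidate authors' texts. Because the windows ignore byte boundaries, an n-gram may span parts of two adjacent characters.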

This paper reports a sequence of experiments. First, the binary n-gram analysis method is directly compared with character n-grams for authorship attribution. Then it is evaluated over three forms of input text: full text, stop words and function words only, and content words only. Subsequently, it is tested over cross-domain and cross-genre texts, as well as multiple-author texts.
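
The three forms of input text can be sketched as a simple token filter. This is only an illustration: the stop-word list `STOP_WORDS` below is a tiny made-up sample, and the tokenizer and the function name `split_text` are assumptions, not the resources or procedure used in the paper.

```python
import re

# Tiny illustrative stop-word list (an assumption; not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "was"}


def split_text(text: str, stop_words: set[str] = STOP_WORDS) -> dict[str, str]:
    """Return the full text plus stop-word-only and content-word-only variants."""
    # Naive tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    stop_only = [t for t in tokens if t.lower() in stop_words]
    content_only = [t for t in tokens if t.lower() not in stop_words]
    return {
        "full": text,
        "stop_words_only": " ".join(stop_only),
        "content_words_only": " ".join(content_only),
    }


variants = split_text("The cat sat in the hat")
# variants["stop_words_only"]    -> "The in the"
# variants["content_words_only"] -> "cat sat hat"
```

Each variant could then be passed to the same n-gram analysis so that results over the three input forms are directly comparable.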
Mr. Mark Carman
School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA 5095, Australia
Dr. Helen Ashman
School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA 5095, Australia