Reading Literature With Computers

Making Progress!

With less than three weeks left in the semester, I’m proud to say that I’ve made some progress on my final project. Anyone following these posts will know that my final project for Reading Literature With Computers involves computational analysis of science fiction. Originally I wanted to use this resource’s data to conduct that analysis, but unfortunately I have neither the time nor the coding experience to obtain the dataset used in its creation. I then tried another corpus, only to have that fail because the file I was trying to repeatedly download and upload was massive.

By the start of this week, I was desperate. Nothing I wanted to use was working. And since I hadn’t found a usable corpus, I hadn’t made any progress on the actual project. This led to desperate measures.


I did a completely random search for “science fiction corpus” and came across a Reddit post pointing someone to Baen Books’ Free Ebook Library. This turned out to be the big break I needed, and I immediately downloaded all 75 ebooks in their free library. And the best part? Baen Books caters to science fiction and fantasy authors, making it the perfect place for me to obtain data.

Since then I’ve been looking at the data in various ways. I uploaded all the documents to Voyant Tools to get an idea of what I want to do with them. I also created a CSV file containing the title, authors, publication year, and word count of every book for easy reference. I had to drop some of the titles, simply because they had between 7 and 15 authors apiece, which made examining them harder than anticipated. So I now have a corpus of 47 books written by various science fiction authors. But that’s okay, because I still consider that to be a good number for what I want to do.
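For anyone curious what building that kind of reference file might look like in code, here is a minimal Python sketch. It is not the exact process I followed. The column names, the `build_rows` and `write_csv` helpers, the `max_authors=6` cutoff, and the sample books are all made up for illustration; the cutoff is only an assumption based on the dropped titles having 7 or more authors.

```python
import csv
import io

FIELDS = ["title", "authors", "year", "word_count"]  # assumed column names


def word_count(text):
    # Rough count: split on whitespace, no special handling of punctuation.
    return len(text.split())


def build_rows(books, max_authors=6):
    # Keep only books with at most `max_authors` authors (hypothetical cutoff).
    rows = []
    for book in books:
        if len(book["authors"]) > max_authors:
            continue
        rows.append({
            "title": book["title"],
            "authors": "; ".join(book["authors"]),
            "year": book["year"],
            "word_count": word_count(book["text"]),
        })
    return rows


def write_csv(rows, handle):
    # Write a header line followed by one line per book.
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder records, not real titles from the corpus.
    sample = [
        {"title": "Example Novel", "authors": ["A. Writer"], "year": 2001,
         "text": "It was a dark and stormy night in orbit."},
        {"title": "Example Anthology", "authors": [f"Author {i}" for i in range(9)],
         "year": 2005, "text": "Many hands wrote this one."},
    ]
    buffer = io.StringIO()
    write_csv(build_rows(sample), buffer)
    print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file handle to `write_csv` instead would save the table to disk.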

So far I have been looking into Ben Blatt’s fingerprinting method; Dr. Whalen created his own version of it for us to view. I talked to him at length about this because I ran into some issues, but now it’s all working! I plan to maybe translate the fingerprinting data into Highcharts to make the final project more colorful and interactive, but so far that’s all I have!
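To give a rough sense of the idea, here is a small Python sketch of a word-frequency "fingerprint." This is not Dr. Whalen's code or Ben Blatt's exact procedure, just the general concept as I understand it: describe each text by how often it uses a fixed set of very common words, then compare texts by how far apart those numbers are. The ten-word list, the `fingerprint` and `distance` function names, and the choice of a simple sum of absolute differences are all my own assumptions for illustration.

```python
import re
from collections import Counter

# Assumed, illustrative word list; a real fingerprint would likely use many more words.
COMMON_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "but"]


def fingerprint(text, words=COMMON_WORDS):
    # Lowercase the text and pull out runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    # Relative frequency of each chosen word, in a fixed order.
    return [counts[w] / total for w in words]


def distance(fp_a, fp_b):
    # Sum of absolute differences; smaller means more similar usage.
    return sum(abs(a - b) for a, b in zip(fp_a, fp_b))


if __name__ == "__main__":
    # Placeholder sentences, not excerpts from the corpus.
    first = fingerprint("The ship drifted, and the crew waited in the dark.")
    second = fingerprint("It was a quiet station, but that was about to change.")
    print(distance(first, second))
```

In principle, two books by the same author should end up with a smaller distance than two books by different authors, though with a word list this short the result would be very noisy.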

