OUR First Blog – Analyzing Literary Texts Using Digital Humanities Tools

Retrieved from: https://digitalhumanitiesnow.org/2016/05/editors-choice-round-up-of-responses-to-the-la-neoliberal-tools-and-archives/

Hello everyone! 🙂 

Ready for this… welcome to OUR blog… BOOM!!

That’s right, Christiano Faluh, Noureddin Maarouf, and I have joined forces to create the ultimate blogging team!

From now on, all our blogs and projects will be a joint effort of the three of us, so enjoy the new and improved content, and stay tuned as we progress through our English 256B Digital Humanities journey!

In our last blog, we each wrote, from our own perspective, about our experiences with digital humanities and how it has affected us or perhaps changed the way we do things. This blog, however, is about using the different tools available to us online to analyse and interpret different texts, and to answer a number of research questions we prepared.

The two tools we will be focusing on are CLiC and Voyant Tools. We will cover each in more depth later in the blog, but as a quick summary, these tools help us analyse our corpus (our collection of texts) by providing information such as word count, word frequency, and word trends throughout the texts.

The corpus we will be using consists of children’s and young adult literature. We will be focusing on three main books: J.M. Barrie’s Peter Pan, Treasure Island by Robert Louis Stevenson, and Coral Island: A Tale of the Pacific Ocean by R. M. Ballantyne.

We will be using the same three books for both CLiC and VT in order to find out the differences between the two tools, how each tool provides us with different information, and the advantages and disadvantages of each tool compared to the other.

We chose this corpus because children’s literature is a genre we are all familiar with and might even still enjoy, as it brings back a sense of joy and nostalgia from when we were kids. We also chose these three texts because they all share the themes of islands, pirates and treasure, and are therefore somewhat related. We wanted to see whether a group of texts of a similar nature shares (or doesn’t share) similarities in word choice, word frequency and trends.

One thing you can do while reading this blog, to further immerse yourself with us, is to use these tools yourself. You could compare each of our individual writing styles from our first blogs with the writing style of this blog, which all three of us worked on together.

For now, though, let’s get to analyzing!!

CLiC was the first of the two tools we were introduced to, in one of our recent class sessions. CLiC stands for Corpus Linguistics in Context. It uses computer-assisted methods to study literary texts within an existing corpus, which may give readers new ideas about how fictional characters are perceived. It can do this in two ways. The first uses parameters such as word count and frequency, which take an overall look at the whole text (distant reading). The second looks at specific quotes, non-quotes or phrases, which gives a more direct look at the context surrounding the words studied (close reading). This can be useful for many reasons. For example, the frequency of a word used for a certain character, such as ‘captain’ or ‘boy,’ may help identify how relevant that character is, based on how many times it appears in the text.

Retrieved from http://clic.bham.ac.uk/

The welcome page, as seen above, provides a brief description of what CLiC is and a citation reference in case you would like to use CLiC. It also has an acknowledgments section recognizing those who helped make CLiC. Finally, a toolbar on the right side of the page offers multiple options to help conduct our close and distant reading. The toolbar includes:

  1. Concordance: This lets us analyze the immediate context in which a particular word occurs.
  2. Subsets: This also involves analyzing words and phrases in their immediate contexts, but within a subset of the text, such as quotes or non-quotes.
  3. Clusters: This is similar to keywords, but rather than focusing on single words, it focuses on groups of words within a single corpus.
  4. Keywords: By definition, keywords are words that occur significantly more often in a text than in a reference corpus. This option helps us identify the keywords of one text compared with another and draw conclusions from them. It also has an n-gram setting that lets us choose how many words each keyword contains.
  5. Counts: This tab lists information about individual books as well as all the books in a corpus, such as book title, number of chapters, number of words…
  6. Texts: This lets you navigate through the entire text of each book and explore whatever it is you are searching for.
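To make the ‘Keywords’ idea a little more concrete, here is a minimal Python sketch of one common way to score how much more frequent a word is in a text than in a reference corpus: Dunning’s log-likelihood (often written G2). This is our own illustration under that assumption, not CLiC’s actual code. The word counts in the example are made up.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) keyness score.

    a: count of the word in the target text
    b: count of the word in the reference corpus
    c: total words in the target text
    d: total words in the reference corpus
    """
    # Counts we would expect if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    # Zero observed counts contribute nothing (0 * log 0 is treated as 0)
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: dict, reference: dict, top: int = 5) -> list:
    """Rank words that are over-represented in `target` relative to `reference`."""
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the target text
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Made-up counts for illustration only (not real CLiC data)
target = {"captain": 50, "the": 500, "island": 20}
reference = {"captain": 10, "the": 5000, "island": 30}
print([word for word, score in keywords(target, reference)])  # ['captain', 'island']
```

With these made-up counts, ‘captain’ ranks above ‘island.’ The word ‘the’ is left out because it is relatively more common in the reference corpus than in the target text.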

Now that we were familiar with CLiC and what it does, we chose a corpus to work on to see what findings we could reach. As stated earlier, we chose one of the corpora available on CLiC: children’s literature. CLiC also contains three other 19th century corpora: ‘Charles Dickens’s novels,’ ‘19th century reference corpus’ and ‘additional requested texts.’ It also contains the corpus ‘African American writers 1892-1912.’ As mentioned earlier, the books we chose were J.M. Barrie’s Peter Pan, Treasure Island by Robert Louis Stevenson, and Coral Island: A Tale of the Pacific Ocean by R. M. Ballantyne.

The initial research questions asked were:

  1. What were the most frequent words used in each of the texts and what do they mean?
  2. How are these words used in different parts of the texts and in different contexts (close reading)?
  3. How do the texts compare with each other based on our initial findings, and how do the different authors use different styles within similar genres?

Our research questions arose after we had initially experimented in class with the book ‘Treasure Island,’ where we delved into the keywords section. For CLiC, we mainly used the keywords tab for distant reading and the concordance tab for close reading.

Looking at the keywords of ‘Treasure Island,’ we found that ‘captain,’ ‘Jim,’ ‘squire,’ ‘silver,’ ‘doctor,’ ‘cap’n,’ ‘trelawney,’ ‘Smollett’ and ‘Morgan’ came out on top. Keywords are the words that stand out in the text compared with the reference corpus. In this case, they gave us a sense of who the main characters were and what they were possibly seeking. Next, we changed the keyword count to 2-grams, which yielded ‘captain Smollett’ as a pair. This revealed that ‘captain,’ ‘cap’n’ (or so we initially thought; see the concordance results) and ‘Smollett’ referred to the same person. The same became clear with ‘doctor’ and ‘Livesey.’ As we understood it, this was distant reading. Although there was almost no context at all, it helped clarify who the characters were and what status they held. From the keywords, we also sensed a clear hierarchy in the characters’ status, which seemed relevant to the story.

Retrieved from http://clic.bham.ac.uk/
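Changing the keyword count to 2-grams essentially means looking at pairs of adjacent words rather than single words. Below is a rough Python sketch of that counting step. The tokenizer is deliberately simple and is our own assumption, not how CLiC actually splits words. The sample sentence is made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Simplified tokenizer (an assumption): lowercase runs of letters and
    # apostrophes, so that both cap'n and cap’n stay single words
    return re.findall(r"[a-z'’]+", text.lower())


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count every sequence of n adjacent words in the text."""
    tokens = tokenize(text)
    # zip the token list against shifted copies of itself to form n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


sample = "Captain Smollett spoke. The captain said Captain Smollett would sail."
print(ngram_counts(sample, 2).most_common(1))  # [('captain smollett', 2)]
```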

Next, we looked at the concordance of these keywords to give ourselves some context (close reading) and better understand the story and its characters. One interesting discovery was that the word ‘silver’ (a frequent keyword) actually referred to a person, Long John Silver. We had first assumed it referred to the asset the characters were seeking, the treasure in the book’s title. We believe this helped us avoid a huge misconception about what the story was about. In addition, the word ‘cap’n’ referred not only to Smollett but also to Trelawney, which identified him as another captain. This clearly shows how close reading and context help us better understand the story. It also shows how distant reading can serve as a preliminary tool for identifying questions worth exploring.
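The concordance view is essentially a keyword-in-context (KWIC) list: every occurrence of a word, shown with a few words on either side. A minimal Python sketch of that idea follows. The sample sentence and the `width` parameter are our own assumptions, and CLiC’s concordance does a lot more than this.

```python
import re


def concordance(text: str, keyword: str, width: int = 4) -> list:
    """Return keyword-in-context lines with `width` words on either side of each hit."""
    tokens = re.findall(r"[A-Za-z'’]+", text)
    lines = []
    for i, tok in enumerate(tokens):
        # Case-insensitive match, but keep the original spelling in the output
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


text = "Long John Silver was the cook. Silver smiled at the captain."
for line in concordance(text, "silver", width=3):
    print(line)
# Long John [Silver] was the cook
# was the cook [Silver] smiled at the
```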

The same process was carried out for both Peter Pan and Coral Island: A Tale of the Pacific Ocean. Again, what we noticed in Peter Pan after our keywords search was that the top keywords were names. This made it easy to identify who the main characters were (at least based on our results). Surprisingly, though, the word ‘captain’ was nowhere to be found among the keywords, despite the similar themes of adventure and sailing in Peter Pan and Treasure Island! Also unlike Treasure Island, there was no sense of a hierarchy in the characters’ status. Instead, each character’s relevance in the story came across through how often their name was used. There was, however, a sense of characters being grouped together. This became apparent when we switched to 2-grams, where the most common were ‘the pirates’ and ‘the redskins.’ The concordance further reinforced this idea by clearly identifying each group’s actions.

Retrieved from http://clic.bham.ac.uk/

As for Coral Island: A Tale of the Pacific Ocean, the keywords were less about the characters and more about the setting, such as ‘sea,’ ‘coral’ and ‘island.’ This gives the impression that the setting matters more in this story than in the other two texts. In those, the surroundings seemed to appear only in passing, as part of the events happening between the characters. As in Treasure Island, the word ‘captain’ was also frequently used. No clear patterns emerged with 2-grams. The concordance reinforced the idea that the setting plays a key role in understanding the story behind the text.

These were the results we gathered and compared on CLiC. Next, as expected, we moved on to Voyant to see how our results compared.

As mentioned earlier, we will be using the same three texts we used for CLiC in order to compare the two tools and inspect each one’s pros and cons.

Taken from: https://voyant-tools.org/

Voyant Tools, or VT, is a “digital text-mining tool” that allows us to find and identify patterns and word frequencies/distributions that would be hard to spot through close reading alone. It lets users easily view their results through a series of different visualizations and texts. It is also quite user-friendly and easy to learn. A full tutorial of VT and all its tools is available for free at https://voyant-tools.org/docs/#!/guide/start

When you first open the VT website, you are prompted to either upload files from your computer or paste in links to the texts. Since all three of our texts were published before 1924 and are in the public domain, we had no trouble finding them: using www.gutenberg.org, we found links to the three texts we chose. After inserting the links into the program, you click ‘Reveal’, and the magic happens.

A window with 5 main sections appears, from top left to bottom right respectively: Cirrus, Reader, Trends, Summary, and Contexts.

Below is an overview of the window you get:

Taken from: https://voyant-tools.org/

Before we get into the different tools in depth, the research questions we will be using these tools to answer are as follows:

  1. Most frequent words used in each of the texts
  2. How these words are used in different parts of the texts and in different contexts (close reading)
  3. How these texts compare to one another, and how the different authors use different styles in similar genres

Now, to begin with, the first tool we see and work with is Cirrus. It is basically an interactive word cloud of the words used in the text, sized by frequency: the bigger a word appears, the more frequently it is used. The scale of the image can be adjusted as we like, varying the number of words that appear.

Cirrus immediately helps us answer our first question by showing us the frequencies of the words in each of the texts. In the tool’s options, we can also include or exclude certain words, for example by adding stop words (words such as ‘the’ or ‘and’ that are filtered out of the results).
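Behind a word cloud like Cirrus is a simple frequency count with stop words removed. Here is a minimal Python sketch. The tiny stop-word list and the sample sentence are our own illustrative assumptions; Voyant uses its own, much longer stop-word lists.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, far shorter than Voyant's)
STOP_WORDS = {"the", "and", "a", "of", "to", "was", "he", "it", "in"}


def top_words(text: str, n: int = 3, extra_stop_words=()) -> list:
    """Return the n most frequent words after removing stop words."""
    stop = STOP_WORDS | set(extra_stop_words)
    tokens = re.findall(r"[a-z'’]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop)
    return counts.most_common(n)


sample = "The sea and the island. The island was in the sea, and the sea was blue."
print(top_words(sample, 2))  # [('sea', 3), ('island', 2)]
print(top_words(sample, 2, extra_stop_words=["sea"]))  # [('island', 2), ('blue', 1)]
```

Adding ‘sea’ as an extra stop word works like excluding a word in Cirrus: the next most frequent words move up.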

Below are the individual Cirrus results of Peter Pan, Treasure Island, and Coral Island respectively, from left to right, and then the Cirrus results of the entire corpus below them:

Retrieved from: https://voyant-tools.org/

From the results for the corpus as a whole, we can see that some of the most used words are ‘captain’, ‘children’, ‘island’ and ‘sea’, as expected of children’s books about pirates and treasure. The results of the individual texts, however, overlap less than we would’ve thought, with Coral Island and Treasure Island having the most in common.

The next main tool is the Reader, which is simply a window for reading the different texts of the corpus; hovering over a word also shows its frequency. We can use Cirrus together with the Reader to get a closer look at the repeated words and analyse them in their context (close reading).

Another tool that helps with close reading, as we alternate between close and distant reading, is TermsBerry. It shows a number of words from the corpus (the number is chosen by us) as bubbles. When you hover over a word, it highlights other words that are found close to it. This helps us understand the context in which the word is used. Beyond showing us the nearby words, it also shows how often they co-occur, that is, how frequently they appear close to our word. The darker a bubble is highlighted, the more often the words co-occur.

This is shown below:

Taken from: https://voyant-tools.org/

From the above, we are able to dive deeper into the context surrounding each word and find out how words take on different themes depending on the context they’re used in.
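The co-occurrence idea behind TermsBerry can be sketched as counting which words fall within a few words of a chosen term. The Python sketch below shows only that basic principle, using a window size and sample sentence we picked ourselves. Voyant’s own calculation may well differ.

```python
import re
from collections import Counter


def collocates(text: str, target: str, window: int = 3) -> Counter:
    """Count words appearing within `window` words of each occurrence of `target`."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    target = target.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Look at the words to the left and right, staying inside the text
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


text = "the pirates sailed to the island and the pirates buried treasure"
print(collocates(text, "pirates", window=2).most_common(1))  # [('the', 2)]
```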

A tool that helps us do the same for phrases instead of words is the Phrases tool, which shows us the “repeating sequences of words organized by frequency of repetition or number of words in each repeated phrase”.
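The Phrases tool’s “repeating sequences of words” can be approximated by counting every run of two to four words and keeping those that appear more than once. Below is a minimal Python sketch. The phrase lengths and minimum count are our own choices. The sample line is loosely based on the sea song in Treasure Island. This is not Voyant’s actual implementation.

```python
import re
from collections import Counter


def repeated_phrases(text: str, min_len: int = 2, max_len: int = 4, min_count: int = 2) -> list:
    """List word sequences that repeat, longest phrases first, then by frequency."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    repeated = [(p, c) for p, c in counts.items() if c >= min_count]
    # Sort by number of words in the phrase, then by how often it repeats
    repeated.sort(key=lambda pc: (len(pc[0].split()), pc[1]), reverse=True)
    return repeated


sample = "Fifteen men on the dead man's chest, yo-ho-ho, fifteen men on the dead man's chest"
print(repeated_phrases(sample)[0])  # ('fifteen men on the', 2)
```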

Several other tools were available to us. The Trends tool shows how the frequency of words changes from the start to the end of the texts in the corpus, and the Bubblelines tool follows the same idea. These tools, along with the ones we discussed earlier and several others we did not use, all helped us analyse our corpus and draw conclusions about our research questions.

Taken from: https://voyant-tools.org/
Taken from: https://voyant-tools.org/
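The basic idea of the Trends tool is to show how often a word appears in each part of a text, from start to end. It can be sketched by cutting the text into equal slices and computing the word’s relative frequency in each. The Python sketch below uses our own choices of ten slices and a per-1,000-words scale. We are not claiming this matches Voyant’s exact calculation.

```python
import re


def trend(text: str, term: str, segments: int = 10) -> list:
    """Relative frequency (per 1,000 words) of `term` in equal slices of the text."""
    tokens = re.findall(r"[a-z'’]+", text.lower())
    term = term.lower()
    n = len(tokens)
    result = []
    for k in range(segments):
        # Integer boundaries spread any leftover words evenly across the slices
        chunk = tokens[k * n // segments:(k + 1) * n // segments]
        hits = chunk.count(term)
        result.append(1000 * hits / len(chunk) if chunk else 0.0)
    return result


text = "sea " * 5 + "land " * 5
print(trend(text, "sea", segments=2))  # [1000.0, 0.0]
```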

We were able to:

  1. Figure out the frequencies of words in each of the texts individually, as well as in the corpus as a whole
  2. Analyse these words in the contexts they were used in, and see how their meaning changed depending on that context
  3. Attempt to find patterns in the ways the authors wrote their books, based on these words and their frequencies

We may not have answered our research questions perfectly, but to make up for this, we discovered patterns and structures in our texts that we had never even thought of. For example, we used the Correlations tool to find words that rise and fall in frequency in sync. We also used the Knots tool, a creative visualization of how words repeat as the text goes on.
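Words that “rise and fall in frequency in sync” can be measured with a correlation coefficient over the two words’ frequencies in the same segments of a text. The Python sketch below assumes Pearson’s correlation for this. Our understanding is that this is close to what the Correlations tool reports, but that is an assumption. The per-segment counts are hypothetical, not real results from our corpus.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two terms' frequencies across the same segments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a term whose frequency never changes has no defined correlation
    return cov / (sx * sy)


# Hypothetical per-segment counts for two terms (not real results from our corpus)
pirates = [2, 5, 9, 4, 1]
treasure = [1, 4, 8, 3, 0]
print(round(pearson(pirates, treasure), 3))  # 1.0
```

A value near 1 means the two words rise and fall together. A value near -1 means one rises as the other falls.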

We were also able to understand how the themes of the texts are reflected in the authors’ word choice. For example, words such as good/happy are repeated and occur much more often than words like bad/sad. This suggests that the authors are trying to convey stories of adventure and happiness rather than sadness and horror, since these are children’s books after all.

Taken from: https://voyant-tools.org/

To sum up, VT is a very user-friendly website that is extremely easy to use and understand, especially with the guide we linked earlier. The variety of tools available makes it an excellent choice for analysing and understanding literary texts. It helped us analyse word frequency and context, how certain words rose and fell in frequency together, and how the choice of vocabulary reflected the theme of the text or story.

Using VT will be of great help when we begin our TimelineJS projects later on, as we will be studying the works of Sir Arthur Conan Doyle. He is one of the greatest storytellers in history, famous for his series of books about Sherlock Holmes and Dr. John Watson.

Now, to summarize everything, each of these tools is an excellent way to analyse a corpus, or even a single text. They allow for an incredibly quick and accurate accumulation of data describing the corpus. This includes word frequency, word occurrence and co-occurrence, and the relationships between different words as they appear in the text in different contexts. They help us understand the author’s style and how the theme of the story is reflected in word choice and trends. They also allow us to understand how an author’s style varies from one story to the next, or even from one genre to another. We can also use them to analyse the writing styles of different authors from the same time period, or even from completely different times.

Both are excellent methods of distant reading; imagine having to count all these words and analyse them yourself… it seems pretty impossible.

CLiC was a very user-friendly tool. From the moment we were introduced to it, navigating through it was a smooth process, especially once terms such as ‘corpus’ and ‘concordance’ were explained to us. We also noticed how quick the results were and how easy it was to move from one tab to another (from keywords to concordance, for example). One disadvantage may be that, unlike Voyant, CLiC does not offer visuals or other ways of representing its information. This may limit our research, considering there are multiple possibilities for interpretation depending on how the information is presented. In addition, CLiC presents the texts or keywords as they are, based on the parameters you choose. This has the advantage of giving you the exact terms or phrases you need within a few seconds. One drawback, though, is that the information we receive from CLiC is purely descriptive rather than analytical (it does not, for example, identify themes or ideas we might want to extract from the text).

Voyant Tools has its share of advantages and disadvantages. To begin with, it has a bit of a learning curve: the number of tools available, as well as the different options you can experiment with in each, can be quite overwhelming and confusing at first. However, once you get used to things, the visuals make the results very easy to look at and understand. The variety of tools is also an advantage, as it provides an almost unlimited number of ways to analyse your texts, with each tool providing its own information and visualizations.

Again, on their own these tools are amazing additions to the digital humanities, but when we combine the two, we have a perfect match for analysing texts in almost any way imaginable.

We were able to use and understand these tools ourselves to analyse different works of children’s literature and come to conclusions of our own.

We hope you enjoyed reading our blog, and that it may even have inspired you to tinker around with these tools yourself 😉

Thank you for reading and stay tuned for more blogs and an interesting project on the works of Sir Arthur Conan Doyle! 

Ali Munzer       Christiano Faluh       Noureddine Maarouf

Retrieved from: https://pastpresentfuturetvandfilm.wordpress.com/2017/05/10/signing-off/

References:

[1] Ballantyne, R. M. (1996, September 1). The Coral Island: A Tale of the Pacific Ocean. Retrieved from https://www.gutenberg.org/ebooks/646

[2] Barrie, J. M. (2008, June 25). Peter Pan. Retrieved from https://www.gutenberg.org/ebooks/16

[3] CLiC. (n.d.). Retrieved from http://clic.bham.ac.uk/

[4] Mahlberg, M., Stockwell, P., de Joode, J., Smith, C., & O’Donnell, M. B. (2016). CLiC Dickens: Novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.

[5] Project Gutenberg. (n.d.). Retrieved from http://www.gutenberg.org/

[6] Stevenson, R. L., & Rhead. (2006, February 26). Treasure Island. Retrieved from https://www.gutenberg.org/ebooks/120

[7] Voyant. (n.d.). Retrieved from https://voyant-tools.org/docs/#!/guide/start

Growing up: A Predetermined Partnership With Technology

The end of the 20th century marked a change in life as the generations before then knew it. With the explosive birth and development of the digital world, people who lived during that period were bombarded by devices and things they never thought would be possible. Since then, it has been hard to keep up with the technological advancements taking place. People who were born in 2005 feel outdated in 2015. Last year’s cellphone is considered primitive compared to what’s fresh on the market. Facebook started out as the social platform, but then became one of a million social platforms. What will our lives look like 20 years from now?

Taken from “daxueconsulting”

Early Childhood

I was born in 1998, so it’s safe to say that I was born at a time when digital technology was on the rise. My first interaction with technology continues to be the most present in my life, despite the massive advancements and “makeovers” it has gone through. What could be the thing that made its way into my childhood and persisted all the way through my adolescence until now? It is the device I remember thinking of as a “magical portal of moving images and colors,” only to later know it as the “television.”

Taken from TechCrunch

In my earliest years of childhood, I remember watching “Teletubbies” and “Zeina w Nahoul” for hours, making it easier for my parents to go about their chores: a tool of distraction for my parents, and a tool of entertainment for me. By the age of 11, watching Cartoon Network and Space Toon was part of my daily routine. My imagination was feeding on fiction and fantasy. However, my parents at that time set rules to regulate my interaction with the TV, limiting my time to two hours per day, after I finished my homework, of course. It felt like a dictatorship to me, if I’m being honest. I’m glad that didn’t last long.

Taken by Hannah Bouckley – 2012

By the age of 14, I not only watched TV for as long as I wanted to, but also claimed ownership of it (not legally, though). I had purchased my own PlayStation, and I must confess that it made my lifestyle very unhealthy. I ignored my family, friends and educational duties, and created a fortress of my own: the living room couch.

At that age I was in 8th grade, and I noticed that all of my friends were buying iPhones. After numerous quarrels with my parents and negotiations on reducing my PlayStation hours, they agreed to buy me one. This palm-sized device had a huge impact on my life. As I expected, I was allowed to use my phone for only a couple of hours a day, so I made the most of them exploring its features and capabilities.

Second iPhone model: iPhone 4

My parents were afraid of the negative impact the cellphone might have, especially after my uncontrollable attitude towards the PlayStation, so my actions were somewhat monitored. To this day, I coexist with my phone; wherever I go, I take it with me. It is my connection to the outside world, and my easy access to information. I was never a bookworm, so thank god for Google! Countless are the times I open my phone to research something and end up jumping between applications: Instagram, Facebook, Twitter… I am not the kind of person to share my personal life on social media. I use my social media platforms to keep up with news, sports, and most importantly the realm of sports cars. Learning from my early days of obsessive indulgence in technology, I have developed a self-monitoring system for my social media use, allowing myself a maximum of 5 hours per day. I wouldn’t say it’s perfect, but it’s a work in progress!

Educational Journey

Social Media Platforms
Taken from The National Institute of Social Sciences

Going into university, my parents bought me my first laptop. All of my courses were digitized on my student database, so the time I spent, and continue to spend, on my laptop is immeasurable. However, one thing I noticed is that studying on the computer, as opposed to how I studied in school, has helped me improve my linguistic and literary skills. Every time I have an assignment, I go online, read samples and explore new terminology. The unlimited free sources available online have helped me improve my academic performance. Of course, Netflix and YouTube, being one click away, are very tempting when I’m bored in the university library. However, to ease my guilt about straying from studying, I go for documentaries and educational videos, so I am constantly trying to learn from whatever technology I indulge in.

Taken from MASTER’s PROGRAM IN DIGITAL HUMANITIES

Based on my own personal experience, I believe that the digital world has spread its roots into the realm of the humanities, reaching a state of coexistence. How did past generations manage without technology? I guess we’ll never know. Technologies are growing right this second as I type, and the market has been shaped in an inclusive way. In other words, technologies cost anywhere from as little as $50 to as much as a million dollars. What does that mean? It means that EVERYONE can purchase a phone. In 2013, TVTechnology (https://www.tvtechnology.com/miscellaneous/the-state-of-television-worldwide) posted an article which reports that “Globally, more than 1.4 billion households now own at least one TV set, representing 79 percent of total households; the report notes that ‘virtually all’ households in the developed world now own a TV set while 69 percent own at least one set in developing countries.” The contemporary idea of a break is watching a TV show or series on the computer, or even checking the feed on Instagram. A break is no longer about taking a walk or seeing friends; it has become a computerized, individualistic process. So in light of that, I ask myself: is there any distinction between computing and humanities?
