Text Analysis

The amount of written data produced by the human race is astronomical. Rummaging through all the data we have produced, and will produce, is beyond human comprehension and capability.

The topic of meta-data is at the forefront of any modern internet debate, for its implications reach much further than simple security and privacy risks. Even after the discovery that the NSA has access to petabytes of what was supposed to be private information, the world willingly allows its private data to bottleneck through the most powerful government entity, to use as it pleases. When questions are raised about the obvious abuses of human rights, such as the right to privacy, the term meta-data is what is offered to justify these infringements.

Not all is dark and gloomy. Despite what some might consider unconstitutional attacks against one's privacy, the collection of meta-data can reveal some interesting truths about human behavior and the culture that supports those behaviors, and vice versa. The truth of the matter, impossible to avoid and sobering to think about, is that collecting meta-data at the current rate means, by almost no stretch of the imagination, that you are being watched, monitored or surveilled at all times. This is the major issue in the debate over meta-data: tracking and monitoring people in real time, and claiming it to be accidental or innocent, is an abuse of technological and hierarchical power, all of which is justified in the name of national security.

The computers and systems used in the field of meta-data are designed to condense, organize and prioritize data points among the excessive amounts of our information floating around cyberspace. In the age of social media, almost everybody on the planet with access to the internet can share what they ate for breakfast or where they plan on going for vacation. Security agencies are no longer dealing with the issue of scarcity but rather the issue of abundance, and most of this abundant material lacks serious contextual content. This creates a difficult task for agencies whose job is to filter through this unreadable amount of data and measure the security risk of the Mannequin Challenge.

Making the information provided by meta-data accessible and comprehensible to human beings is the purpose of text-analysis engines. In the process of unscrambling the clutter that is the internet, textual analysis can provide insight into hidden correlations that human readers would not discover on their own. Throughout the history of the English language, we have been using more and more words in our texts and works than in years prior. This rapid growth in the use of words is what drives the need for computers to analyze the words we use. Certain text-analysis tools can provide insight into, or an alternative view of, past or current events and, in what hopefully is not an exercise in futility, predict the future.

I used the tools Ngram and Mining the Dispatch to discover how frequently a word was used during a specific moment in history. In the Ngram Viewer, you can compare multiple words or phrases against each other over a selected period of time. I decided to look up the origin and use of swear words from the year 1800 to 2008, because I am 10 years old. With a little bit of additional research, it was quite enlightening what you could learn from studying bad words. Clearly they are not that bad if they are educational. Certain words that have a proper "christian" meaning, such as the word "ass", which properly means donkey, have had historical footing in the English language. These words actually lost traction as religion and manners were enforced in the late 19th and early 20th centuries, but became popular again as the rebellious counterculture emerged in the '60s and '70s. Two words that seemed to have originated entirely in this culture were the "s" word and my personal favorite, the "f" word, which have become two of the most commonly used swear words, just behind, coincidentally, the word "ass".
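For the curious, here is a minimal sketch of the kind of counting that sits behind a word-frequency graph. It is not how the Ngram Viewer or Mining the Dispatch are actually built. The function names (`tokenize`, `yearly_frequency`) and the tiny corpus are made up purely for illustration. The idea is simply to count how often a word appears in a given year and divide by the total number of words written that year, so that years with more text do not automatically look like years with more swearing.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def yearly_frequency(documents, word):
    """Return {year: share of all tokens in that year that equal `word`}."""
    counts = defaultdict(Counter)
    for year, text in documents:
        counts[year].update(tokenize(text))

    word = word.lower()
    result = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        # Normalize by the year's total so large and small years are comparable.
        result[year] = counts[year][word] / total if total else 0.0
    return result


# Hypothetical toy corpus of (year, text) pairs, invented for this example.
documents = [
    (1850, "The donkey carried the load and the ass followed behind"),
    (1850, "A stubborn ass stood in the road"),
    (1900, "The gentleman rode a donkey to the market"),
    (1970, "Do not be an ass about it said the driver"),
]

freq = yearly_frequency(documents, "ass")
for year, share in freq.items():
    print(year, round(share, 3))
```

A real tool works on millions of scanned books or newspaper issues rather than four sentences, but the normalization step is the part worth noticing: a raw count alone would mostly reflect how much was published in a given year.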

Culturomics is the term for applying the collection of meta-data to the study of human culture. Discovering the ways in which, and the frequency at which, we use words can reveal a lot about our culture. The real question is: is this benefit worth the cost?

http://dsl.richmond.edu/dispatch/

https://books.google.com/ngrams
