Things like “text mining” and “text analysis” are always difficult to understand for non-technical people. But “spreadsheet” is a common and shared concept, and it’s something used everyday, by technical and non-technical people.
If you are a data journalist, or a Social Media consultant, or a Web content editor, you are using spreadsheets to collect and manage any kind of data.
Google Spreadsheet is one of the best spreadsheet tool available, especially if there is a strong need of sharing content and doing collaborative editing. According to the Wall Street Journal, even today it’s an hot topic choosing between an office suite installed on our computers, and an online alternative, like Google Drive.
One of the killer features of Google Drive, and Google Docs before, was its own simplicity. Citing an old article from CNN:
Google’s productivity tools may lack some of Office’s advanced features, but are easier and simpler to use than anything Microsoft offers — especially when it comes to the cloud-centric features.
Things are changing: recently Google has released the Add-on feature to Google Drive platform, to personalize your own Google Drive as you wish.
Add-ons: new tools created by developer partners that give you even more features in your documents and spreadsheets.
Clicking “Get add-ons” in the Add-ons menu of any open document or spreadsheet, it’s possible to navigate through the available add-ons.
According to this principle of “less is more”, we had developed an add-on for Google Sheets to democratize text analysis, called Text Mining.
It’s the integration between Dandelion API and Google Sheets: the goal is making the “entity extraction” concept more accessible and useful.
Without using an API, but only a menu inside a spreadsheet.
What’s needed to play with this add-on:
- the official Google Chrome Store add-on page to install it
- a short tutorial to setup it, published on dandelion.eu
Text analysis for Google Sheets: turn text into data
The main thing to know about text analysis is this: it’s all about turning text into data, to enable graphs, charts and infographics built on the top of that data, and any other kind of data analysis.
Using Text Mining add-on, you can:
- extract persons, places, organizations and concepts cited in a text, to filter your content: this text can be a flow of tweets, or an article taken from a website and copied inside some cells of a spreasheet, or some emails;
- find contextual keywords to enrich your content, using the content of the column “categories”; (useful to improve the content from a SEO point of view)
- build some graph/chart based on the content of the column “type”: you can count how many places, organizations, etc. are found in your content, and make a tag cloud, or something like a mindmap;
- enrich your text with links to Wikipedia, according to the context: so you can find related content, using Wikipedia as a shared knowledge base;
- summarize and tag a text that you don’t know, using the content of the column “categories”: it’s useful to have an overall idea of an article, or to tag a blog post not written by you;
- enrich metadata related to Open Data datasets, using the content of the column “categories” as tags for your raw data.
Text Mining add-on creates a new scheet called “Analysis”, inside your Google Spreadsheet. This sheet contains a lot of different columns, with some structured data, as you can see in this screenshot:
There are a lot of different kind of users of this add-on, just to name a few: Social Media consultants, Security specialists, Web content editors, SEO experts etc.
But there is another kind of audience, more advanced and skilled than the first one: we are speaking about data journalists, and developers not so skilled, who need to build quickly some UI prototypes or interactive web pages.
They use Google Spreadsheets as a database: there are a lot of different frameworks which are based on this functionality. One of the first is Simile Exhibit (old, but still useful and powerful), or another one is Sheetsee.js, to name only a few. Others more used are Google Fusion Tables, or Tableau.
In this scenario, it’s possibile to use structured data like categories, or Wikipedia URLs, to build a faceted search automatically from any kind of content.
But that’s only an idea: tell us your use case, using Twitter #textanalysis #gdrive #textmining.
Some useful sources
- the Google SpreadSheet used as demo example only for reference
- Google Spreadsheets: more useful than you thought
- 22 free tools for data visualization and analysis – useful for data journalists
- How Linking to Knowledge Could Boost News Media – an interesting insight on the value of linking to other sources of knowledge