Tutorials

Dictionaries in Python for Textual Data
First Published: 26 October 2018

The Python ecosystem offers a powerful natural language processing toolkit for handling textual data. The toolkit has functions for operations such as tokenizing, part-of-speech identification, and n-gramming. In addition, it contains corpora of useful datasets such as ‘stopwords’ and ‘synonyms’. Natural Language ToolKit (NLTK) To install the NLTK library, run the following at the system command prompt (terminal window): You will have to change your current directory to the ‘Scripts’ folder of the ‘Python’ directory OR you should have the Python folder in the system path variable. ... Read More
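As a rough illustration of the operations named in the excerpt above (tokenizing, stopword removal, n-gramming) and of the dictionary-based counting the title refers to, here is a minimal sketch that uses only the Python standard library. It is not NLTK's API: the `tokenize` and `ngrams` helpers, the tiny `STOPWORDS` set, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption; NLTK ships a much larger list).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "on"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return consecutive n-token tuples from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "The cat sat on the mat and the cat slept"
tokens = tokenize(text)
content_words = [t for t in tokens if t not in STOPWORDS]

# Counter is a dictionary subclass that maps each word to its frequency.
word_counts = Counter(content_words)
bigrams = ngrams(content_words, 2)

print(word_counts)
print(bigrams)
```

A regular expression is a crude tokenizer; a real toolkit handles punctuation, contractions, and language-specific rules far more carefully.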

Simulation using SimPy
First Published: 28 August 2018

Coffee shop scenario Coffee shops are generally a popular place to hang out. Starbucks, Colectivo, Anodyne, Stone Creek, etc. are some of the sought-after coffee shops in Milwaukee. It is not just the rejuvenating coffee that attracts customers; the ambience also draws people who want to read, work on their laptops, write, do some deep thinking, carry on a conversation, or just relax. However, getting a cup of coffee can sometimes mean a long wait. ... Read More
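To give a feel for the kind of question such a simulation answers, the sketch below models a single barista serving customers in arrival order and records how long each customer waits. It deliberately uses only the standard library rather than SimPy, and the function name, the exponential arrival and service times, and the parameter values are assumptions chosen for illustration, not figures from the tutorial.

```python
import random

def simulate_coffee_shop(num_customers, mean_interarrival, mean_service, seed=42):
    """Single-barista queue: return each customer's wait (minutes) before service."""
    rng = random.Random(seed)
    arrival = 0.0          # clock time at which the current customer arrives
    barista_free_at = 0.0  # clock time at which the barista finishes the previous order
    waits = []
    for _ in range(num_customers):
        # Exponentially distributed gap between consecutive arrivals.
        arrival += rng.expovariate(1.0 / mean_interarrival)
        # Service starts when both the customer and the barista are available.
        start = max(arrival, barista_free_at)
        waits.append(start - arrival)
        barista_free_at = start + rng.expovariate(1.0 / mean_service)
    return waits

waits = simulate_coffee_shop(1000, mean_interarrival=2.0, mean_service=1.5)
print(f"average wait: {sum(waits) / len(waits):.2f} minutes")
```

A process-based library such as SimPy becomes more useful once the scenario grows beyond one queue, for example several baristas or customers who give up and leave.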

Dataset Repository List
First Published: 21 August 2018

Data is called the new “black gold” and is being monetized through different avenues. Recommender Systems, Machine Learning operations, Data Mining, Pattern Recognition, etc. require data. Privacy concerns have limited the accessibility of user-generated data. However, there are still a few places where data can be freely obtained for research purposes. Free Twitter Datasets Curated Twitter Data Twitter DataStream Archives Conference on Web and Social Media – Dataset Sharing Service ... Read More

Blog using Hugo on Github
First Published: 24 June 2018

How to Create a Personal Blog on Github using Hugo Github is a popular free static hosting service for personal and project pages. Hugo is a framework for building websites. More information about Hugo is available at https://gohugo.io/hosting-and-deployment/hosting-on-github/ The Hugo website also has instructions for creating a personal website on Github, available at https://gohugo.io/hosting-and-deployment/hosting-on-github/ Step 1: Download Resources and Setup Directories Download and install Git (https://gitforwindows. ... Read More

Sentiment Analysis

Identifying the sentiment of a news article or tweet can be useful in many ways. For example, the sentiment of news about a stock or the economy, or the sentiment of a user's tweets about the stock market, might correlate with movements in the stock market. These in turn can be used for personalizing recommendations or for predicting market sentiment. In this brief tutorial we will look at how news article sentiment can be gauged. ... Read More
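One simple way to gauge sentiment, shown here only as a sketch and not necessarily the method the tutorial uses, is to count words from positive and negative word lists. The `POSITIVE` and `NEGATIVE` sets, the `sentiment_score` function, and the sample headline are hypothetical and far too small for real use.

```python
import re

# Tiny hand-made lexicons, purely illustrative (assumptions, not a published word list).
POSITIVE = {"gain", "gains", "growth", "strong", "rally", "up"}
NEGATIVE = {"loss", "losses", "weak", "fall", "falls", "down", "slump"}

def sentiment_score(text):
    """Return (positive words - negative words) / total words, a value in [-1, 1]."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

headline = "Stocks rally on strong growth"
print(sentiment_score(headline))
```

A score above zero suggests a positive tone and a score below zero a negative one. Word counting ignores negation and context, so it is best treated as a baseline.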

Handling data in JSON and CSV formats
First Published: 28 October 2017

Python includes packages for handling data in JSON and CSV formats. JSON stands for JavaScript Object Notation. It is basically a key-value pair store. CSV stands for Comma-Separated Values. It is a list of data values separated by commas (or some other delimiter). If you open the tweets.txt file saved in the previous session, you will notice that each line is in the JSON format. Each tweet is enclosed in braces {}, and within the braces are key:value pairs separated by commas. ... Read More
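The two formats can be connected with the standard `json` and `csv` modules. The sketch below parses JSON text line by line, as described for tweets.txt, and writes selected fields out as CSV. The two sample records and their field names (`id`, `user`, `text`) are made up for illustration and do not reflect the actual structure of a tweet.

```python
import csv
import io
import json

# Two made-up records, one JSON object per line (field names are hypothetical).
json_lines = [
    '{"id": 1, "user": "alice", "text": "Hello, world"}',
    '{"id": 2, "user": "bob", "text": "JSON and CSV"}',
]

# json.loads turns each line into a Python dictionary of key:value pairs.
records = [json.loads(line) for line in json_lines]

# Write the dictionaries as CSV rows; an in-memory buffer stands in for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "user", "text"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the CSV back yields dictionaries again, with every value as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["text"])
```

Note that the csv module quotes the first text value because it contains a comma, which is why hand-rolled splitting on commas is unreliable.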

Collecting Streaming Twitter Data
First Published: 28 October 2017

Twitter no longer provides free access to all streaming tweets (Twitter Firehose). However, it does allow access to “some” streaming tweets in real time based on filters. In the code below, the stream is filtered to tweets that contain ‘Las Vegas’. [More: http://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html] One needs to be aware of the rate limits imposed by Twitter while accessing their feeds. It is always advisable to include pauses in code execution to ensure that the rate limits are not violated OR to include error handling routines that can trap rate violation errors. ... Read More
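The advice about pauses and error handling can be sketched as a generic retry helper. This is not Tweepy code: `RateLimitError` is a hypothetical stand-in for whatever exception the client library raises, and `call_with_backoff`, `fake_fetch`, and the delay values are assumptions for illustration.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client library raises on a rate limit."""

def call_with_backoff(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimitError pause and retry, doubling the pause each time."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(base_delay * 2 ** attempt)

# Demo: a fake fetch that hits the rate limit twice before succeeding.
calls = {"count": 0}

def fake_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("rate limit hit")
    return ["tweet mentioning Las Vegas"]

# Record the pauses instead of actually sleeping so the demo runs instantly.
waits = []
result = call_with_backoff(fake_fetch, sleep=waits.append)
print(result, waits)
```

Passing the `sleep` function as a parameter keeps the helper easy to test; in real use the default `time.sleep` would apply the pauses.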

Introduction to Reading Twitter Data in Python
First Published: 27 October 2017

Social Media Analysis Social media has become the go-to place for information and opinions. Social media can be mined to gauge sentiment, analyze opinion, identify interests, locate concerns, and so on. Facebook, Pinterest, Instagram, Twitter, and Google+ are some social media platforms. Among them, Twitter data is public and can be mined relatively easily. First of all, you need to obtain access codes from Twitter to be able to mine Twitter data. ... Read More