R Data Analysis Projects
Build end to end analytics systems to get deeper insights from your data
Gopi Subramanian
BIRMINGHAM - MUMBAI
R Data Analysis Projects
Copyright © 2017 Packt Publishing
First published: November 2017
Production reference: 1151117
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78862-187-8
www.packtpub.com
Contents
Preface 1
Chapter 1: Association Rule Mining 6
  Understanding the recommender systems 7
  Transactions 8
  Weighted transactions 9
  Our web application 9
  Retailer use case and data 10
  Association rule mining 13
  Support and confidence thresholds 25
  The cross-selling campaign 30
  Leverage 33
  Conviction 34
  Weighted association rule mining 35
  Hyperlink-induced topic search (HITS) 42
  Negative association rules 50
  Rules visualization 53
  Wrapping up 57
  Summary 65
Chapter 2: Fuzzy Logic Induced Content-Based Recommendation 66
  Introducing content-based recommendation 68
  News aggregator use case and data 72
  Designing the content-based recommendation engine 77
  Building a similarity index 80
  Bag-of-words 80
  Term frequency 81
  Document frequency 81
  Inverse document frequency (IDF) 81
  TFIDF 82
  Why cosine similarity? 85
  Searching 86
  Polarity scores 89
  Jaccard's distance 91
  Jaccard's distance/index 92
  Ranking search results 94
  Fuzzy logic 95
  Fuzzification 95
  Defining the rules 97
  Evaluating the rules 97
  Defuzzification 98
  Complete R Code 107
  Summary 113
Chapter 3: Collaborative Filtering 115
  Collaborative filtering 116
  Memory-based approach 118
  Model-based approach 119
  Latent factor approach 120
  Recommenderlab package 122
  Popular approach 124
  Use case and data 126
  Designing and implementing collaborative filtering 136
  Ratings matrix 137
  Normalization 139
  Train test split 141
  Train model 144
  User-based models 149
  Item-based models 152
  Factor-based models 153
  Complete R Code 155
  Summary 161
Chapter 4: Taming Time Series Data Using Deep Neural Networks 162
  Time series data 164
  Non-seasonal time series 165
  Seasonal time series 166
  Time series as a regression problem 167
  Deep neural networks 171
  Forward cycle 174
  Backward cycle 175
  Introduction to the MXNet R package 175
  Symbolic programming in MXNet 178
  Softmax activation 182
  Use case and data 184
  Deep networks for time series prediction 186
  Training test split 188
  Complete R code 202
  Summary 209
Chapter 5: Twitter Text Sentiment Classification Using Kernel Density Estimates 210
  Kernel density estimation 212
  Twitter text 217
  Sentiment classification 218
  Dictionary methods 219
  Machine learning methods 219
  Our approach 219
  Dictionary based scoring 220
  Text pre-processing 224
  Term-frequency inverse document frequency (TFIDF) 226
  Delta TFIDF 227
  Building a sentiment classifier 230
  Assembling an RShiny application 234
  Complete R code 238
  Summary 242
Chapter 6: Record Linkage - Stochastic and Machine Learning Approaches 243
  Introducing our use case 244
  Demonstrating the use of RecordLinkage package 245
  Feature generation 247
  String features 249
  Phonetic features 250
  Stochastic record linkage 252
  Expectation maximization method 252
  Weights-based method 258
  Machine learning-based record linkage 261
  Unsupervised learning 261
  Supervised learning 263
  Building an RShiny application 268
  Complete R code 271
  Feature generation 272
  Expectation maximization method 273
  Weights-based method 274
  Machine learning method 275
  RShiny application 277
  Summary 278
Chapter 7: Streaming Data Clustering Analysis in R 279
  Streaming data and its challenges
  Bounded problems
  Drift
  Single pass
  Real time
  Introducing stream clustering
  Macro-cluster
  Introducing the stream package
  Data stream data
  DSD as a static simulator
  DSD connecting to memory,