R Data Analysis Projects


E-Book Content

R Data Analysis Projects
Build end to end analytics systems to get deeper insights from your data

Gopi Subramanian

BIRMINGHAM - MUMBAI

Copyright © 2017 Packt Publishing
First published: November 2017
Production reference: 1151117
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78862-187-8
www.packtpub.com

Contents

Preface

Chapter 1: Association Rule Mining
  Understanding the recommender systems
  Transactions
  Weighted transactions
  Our web application
  Retailer use case and data
  Association rule mining
  Support and confidence thresholds
  The cross-selling campaign
  Leverage
  Conviction
  Weighted association rule mining
  Hyperlink-induced topic search (HITS)
  Negative association rules
  Rules visualization
  Wrapping up
  Summary

Chapter 2: Fuzzy Logic Induced Content-Based Recommendation
  Introducing content-based recommendation
  News aggregator use case and data
  Designing the content-based recommendation engine
  Building a similarity index
  Bag-of-words
  Term frequency
  Document frequency
  Inverse document frequency (IDF)
  TFIDF
  Why cosine similarity?
  Searching
  Polarity scores
  Jaccard's distance
  Jaccard's distance/index
  Ranking search results
  Fuzzy logic
  Fuzzification
  Defining the rules
  Evaluating the rules
  Defuzzification
  Complete R Code
  Summary

Chapter 3: Collaborative Filtering
  Collaborative filtering
  Memory-based approach
  Model-based approach
  Latent factor approach
  Recommenderlab package
  Popular approach
  Use case and data
  Designing and implementing collaborative filtering
  Ratings matrix
  Normalization
  Train test split
  Train model
  User-based models
  Item-based models
  Factor-based models
  Complete R Code
  Summary

Chapter 4: Taming Time Series Data Using Deep Neural Networks
  Time series data
  Non-seasonal time series
  Seasonal time series
  Time series as a regression problem
  Deep neural networks
  Forward cycle
  Backward cycle
  Introduction to the MXNet R package
  Symbolic programming in MXNet
  Softmax activation
  Use case and data
  Deep networks for time series prediction
  Training test split
  Complete R code
  Summary

Chapter 5: Twitter Text Sentiment Classification Using Kernel Density Estimates
  Kernel density estimation
  Twitter text
  Sentiment classification
  Dictionary methods
  Machine learning methods
  Our approach
  Dictionary based scoring
  Text pre-processing
  Term-frequency inverse document frequency (TFIDF)
  Delta TFIDF
  Building a sentiment classifier
  Assembling an RShiny application
  Complete R code
  Summary

Chapter 6: Record Linkage - Stochastic and Machine Learning Approaches
  Introducing our use case
  Demonstrating the use of RecordLinkage package
  Feature generation
  String features
  Phonetic features
  Stochastic record linkage
  Expectation maximization method
  Weights-based method
  Machine learning-based record linkage
  Unsupervised learning
  Supervised learning
  Building an RShiny application
  Complete R code
  Feature generation
  Expectation maximization method
  Weights-based method
  Machine learning method
  RShiny application
  Summary

Chapter 7: Streaming Data Clustering Analysis in R
  Streaming data and its challenges
  Bounded problems
  Drift
  Single pass
  Real time
  Introducing stream clustering
  Macro-cluster
  Introducing the stream package
  Data stream data
  DSD as a static simulator
  DSD connecting to memory,