Tuesday, December 6, 2016

Machine Learning


https://github.com/actionml/template-personalized-search

http://redmonk.com/fryan/2016/06/06/a-look-at-popular-machine-learning-frameworks/
Salesforce recently acquired PredictionIO (prediction.io).
http://spark.apache.org/docs/latest/ml-guide.html
The MLlib RDD-based API is now in maintenance mode.
As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.
MLlib uses the linear algebra package Breeze, which depends on netlib-java for optimised numerical processing. If native libraries are not available at runtime, you will see a warning message and a pure JVM implementation will be used instead.
Due to licensing issues with runtime proprietary binaries, we do not include netlib-java’s native proxies by default. To configure netlib-java / Breeze to use system optimised binaries, include com.github.fommil.netlib:all:1.1.2 (or build Spark with -Pnetlib-lgpl) as a dependency of your project and read the netlib-java documentation for your platform’s additional installation instructions.




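Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.
Based on the concept of strong rules, Rakesh Agrawal et al. [2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by supermarket point-of-sale (POS) systems. For example, the rule {onions, potatoes} → {burger} found in sales data would indicate that a customer who buys onions and potatoes together is also likely to buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement. Beyond this market basket analysis example, association rules are now used in many application areas, including web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.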

The problem of association rule mining is defined as follows:
Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ binary attributes called items.
Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database.
Each transaction in $D$ has a unique transaction ID and contains a subset of the items in $I$.
A rule is defined as an implication of the form:
$X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$.
Every rule is composed of two different sets of items, also known as itemsets, $X$ and $Y$, where $X$ is called the antecedent or left-hand side (LHS) and $Y$ the consequent or right-hand side (RHS).
To illustrate the concepts, we use a small example from the supermarket domain. The set of items is $I = \{\mathrm{milk}, \mathrm{bread}, \mathrm{butter}, \mathrm{beer}, \mathrm{diapers}\}$, and the table shows a small database containing the items, where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents the absence of an item in that transaction.
An example rule for the supermarket could be $\{\mathrm{butter}, \mathrm{bread}\} \Rightarrow \{\mathrm{milk}\}$, meaning that if butter and bread are bought, customers also buy milk.
Support is an indication of how frequently the itemset appears in the database.
The support value of $X$ with respect to $D$ is defined as the proportion of transactions in the database that contain the itemset $X$. In formula: $\mathrm{supp}(X) = \frac{|\{t \in D : X \subseteq t\}|}{|D|}$
In the example database, the itemset $\{\mathrm{beer}, \mathrm{diapers}\}$ has a support of $1/5 = 0.2$ since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of $\mathrm{supp}()$ is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).
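The support definition can be checked with a few lines of Python. This is only a minimal sketch: the five transactions are an assumed stand-in for the example table, chosen to agree with the figures quoted in this section, and `support` and `DATABASE` are hypothetical names.

```python
# Assumed example database: one set of items per transaction.
# The five transactions are illustrative only; they are chosen so that
# {beer, diapers} occurs in 1 out of 5 transactions, as quoted in the text.
DATABASE = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, database):
    """Return the fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    # `itemset <= transaction` is the subset test X ⊆ t from the formula.
    hits = sum(1 for transaction in database if itemset <= transaction)
    return hits / len(database)


print(support({"beer", "diapers"}, DATABASE))  # 0.2
print(support({"bread"}, DATABASE))            # 0.6
print(support({"bread", "milk"}, DATABASE))    # 0.4
```

Adding items to the argument can only keep the support the same or lower it, which is the sense in which a larger itemset is more restrictive.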
Confidence is an indication of how often the rule has been found to be true.
The confidence value of a rule, $X \Rightarrow Y$, with respect to a set of transactions $D$, is the proportion of the transactions containing $X$ that also contain $Y$.
Confidence is defined as:
$\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$.
For example, the rule $\{\mathrm{butter}, \mathrm{bread}\} \Rightarrow \{\mathrm{milk}\}$ has a confidence of $0.2 / 0.2 = 1.0$ in the database, which means that for 100% of the transactions containing butter and bread the rule is correct (100% of the times a customer buys butter and bread, milk is bought as well).
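Confidence can be sketched the same way, as a ratio of two supports. As before, the five transactions are an assumed example database chosen to agree with the figures quoted in this section, and the function names are hypothetical.

```python
# Assumed example database (illustrative only): one set of items per transaction.
DATABASE = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]


def support(itemset, database):
    """Return the fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in database if itemset <= t) / len(database)


def confidence(antecedent, consequent, database):
    """Return conf(X => Y) = supp(X ∪ Y) / supp(X)."""
    antecedent, consequent = set(antecedent), set(consequent)
    supp_x = support(antecedent, database)
    if supp_x == 0:
        # The rule is never triggered, so the ratio is undefined.
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(antecedent | consequent, database) / supp_x


# {butter, bread} => {milk}: the only transaction with butter and bread also has milk.
print(confidence({"butter", "bread"}, {"milk"}, DATABASE))  # 1.0

# {bread} => {milk}: 2 of the 3 transactions containing bread also contain milk.
print(confidence({"bread"}, {"milk"}, DATABASE))
```

Raising an error for an antecedent that never occurs is a design choice of this sketch; returning 0 or `None` would be equally defensible.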

http://www.actionml.com/blog/personalized_search
Amazon, another trailblazer in Machine Learning, gave the technique the name "Behavior-Based Search". They describe it as "People who searched for X bought item Y."
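One simple reading of "people who searched for X bought item Y" is a co-occurrence count between queries and purchases. The sketch below is an assumption for illustration only: the log entries, the function names, and the plain-count ranking are hypothetical and are not taken from the ActionML template or from Amazon.

```python
from collections import Counter, defaultdict

# Hypothetical (query, purchased_item) events, e.g. extracted from session logs.
SEARCH_PURCHASE_LOG = [
    ("running shoes", "trail-runner-x"),
    ("running shoes", "trail-runner-x"),
    ("running shoes", "sport-socks"),
    ("coffee", "french-press"),
    ("coffee", "espresso-beans"),
    ("coffee", "espresso-beans"),
]


def build_index(log):
    """Count, for each query, how often each item was bought after it."""
    index = defaultdict(Counter)
    for query, item in log:
        index[query][item] += 1
    return index


def rank_items_for_query(index, query, top_n=3):
    """Return up to top_n items, most frequently purchased first."""
    counts = index.get(query, Counter())  # unknown queries yield no items
    return [item for item, _ in counts.most_common(top_n)]


index = build_index(SEARCH_PURCHASE_LOG)
print(rank_items_for_query(index, "coffee"))   # ['espresso-beans', 'french-press']
print(rank_items_for_query(index, "laptops"))  # []
```

A production system would normally weight these counts (for example by confidence or a similar interestingness measure) instead of ranking on raw frequency.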

https://github.com/actionml/template-personalized-search

