Monday, November 9, 2015

Database Misc



https://www.revsys.com/blog/2012/may/01/three-things-you-should-never-put-your-database/
Three things you should never put in your database

Images, files, and binary data

Your database supports BLOBs so it must be a good idea to shove your files in there right? No it isn't! Hell it isn't even very convenient to use with many DB language bindings.
There are a few of problems with storing files in your database:
  • read/write to a DB is always slower than a filesystem
  • your DB backups grow to be huge and more time consuming
  • access to the files now requires going through your app and DB layers
The last two are the real killers. Storing your thumbnail images in your database? Great now you can't use nginx or another lightweight web server to serve them up.
Do yourself a favor and store a simple relative path to your files on disk in the database or use something like S3 or any CDN instead.

Ephemeral data

Usage statistics, metrics, GPS locations, session data anything that is only useful to you for a short period of time or frequently changes. If you find yourself DELETEing an hour, day, or weeks worth of some table with a cron job, you're using the wrong tool for the job.
Use redisstatsd/graphiteRiak anything else that is better suited to that type of work load. The same advice goes for aggregations of ephemeral data that doesn't live for very long.

https://stackoverflow.com/questions/24076911/how-oracle-rowid-is-generated-internally
After a rowid is assigned to a row piece, the rowid can change in special circumstances. For example, if row movement is enabled, then the rowid can change because of partition key updates, Flashback Table operations, shrink table operations, and so on. If row movement is disabled, then a rowid can change if the row is exported and imported using Oracle Database utilities.
The "and so on" indicates that there are many reasons which would cause a rowid to change. 
You'll notice that after an alter table operation the sole rowid has changed.
A rowid is a pseudo column (like versions_xid), that uniquely identifies a row within a table, but not within a database. 

It is possible for two rows of two different tables stored in the same cluster to have the same rowid. 

Bigfile rowids 
--------------- 
A bigfile tablespace can only have on datafile associated with it, therefor, the three bytes that identified the datafile can now additionally be used to address db blocks. 


'Changing' rowids 
-------------------- 
Although a rowid uniquely identifies a row in a table, it might change its value if the underlying table is an index organized table or a partitioned table. 
Also, rowids change if a table is exported and imported using EXP/IMP. 
This implies that rowids should not be stored away for later re-use as the corresponding row then might either not exist or contain completely different data. 

Source - http://www.adp-gmbh.ch/ora/concepts/rowid.html 
https://alexzeng.wordpress.com/2008/10/15/oralce-physical-rowid-structure/
Since Oracle 8i, oracle physical rowid is in extended rowid format. Extended rowids show in 18 characters with base 64 encoding. The encoding characters are A-Z, a-z, 0-9, +, and /. It is stored in 10 bytes (80 bits).
An extended rowid has a four-piece format, OOOOOOFFFBBBBBBRRR:
The first 6 characters OOOOOO represent data object number, using 32bits
The next 3 characters FFF represent tablespace-relative datafile number, using 10bits.
The next 6 characters BBBBB represent block number, using 22bits.
The last 3 character RRR represent row number, using 16bit
18characters=6+3+6+3
80bits=32+10+22+16
It will easy to understand below.
32bit—object number——–The maximum number of OBJECT in one database is 4G (4294967296)
10bit—file number—–The maximum number of FILE in a tablespace is 1022 (keep 2 in remainder)
22bit—block number——–The maximum number of BLOCK in one file is : 4M (4194304)
16bit—row number——– The maximum number of ROW in one block is 64K(65536)
假设一高频查询如下
SELECT * FROM user WHERE area='amoy' AND sex=0 ORDER BY last_login DESC limit 30;
如何建立索引?描述考虑的过程
user表如下:
初始化100W条数据,其中,area要通过IP查询生成,sex为 0,1 随机
CREATE TABLE user (
id int(10) NOT NULL AUTO_INCREMENT COMMENT '自增编号',
username varchar(30) NOT NULL DEFAULT '0' COMMENT '用户名',
password varchar(30) NOT NULL DEFAULT '0' COMMENT '密码',
area varchar(30) NOT NULL COMMENT '地址',
sex int(10) NOT NULL COMMENT '性别1,男;2,女。',
last_login int(10) NOT NULL COMMENT '最近一次登录时间戳',
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=892013 DEFAULT CHARSET=latin1
最终我的索引
(last_login,area)

索引原则:

1.where和order by等的字段建立索引

2.使用唯一索引:对于last_login,area等字段重复的次数比较少,可以使用索引;而sex无非就两个值:性别1,男;2,不值得索引

3.多列索引:不要为每一个列单独建立索引,这样并不能将mysql索引的效率最大化。使用“索引合并策略”

4.选择合理的索引列顺序:索引列的顺序意味着索引首先按照最左列进行排序,然后是第二列,以此类推。如(last_login,area)会先按照 last_login 进行排序,然后才是area。

5.将选择性最高的索引放到前面,也就是会所按照这个条件搜索到的数据最少,选择性就越高,比如选择性:last_login> area> sex。

6.索引不是越多越好,适合的索引可以提高查询效率,但是会降低写入效率,根据项目保持两者的平衡性最好了。

总结上面,首先sex不适合建立索引,有没有索引对于效率的提升意义不大,其次索引会按照最左列进行排序,因此将last_login放到最前面

综上所述:1.建立索引不一定能够加快查询效率如sex这种给重复次数特别多的列增加索引如sex这种会降低查询效率,具体的原因有待查找
2.给重复次数比较少的列增加u讴吟还是能够大幅度提高效率
3.给where和orderby之后的字段添加索引才会加快查询效率
4.为每一个列单独建立索引,不能将索引的效率最大化,应该使用索引合并策略,即根据查询条件,建立联合索引
5.联合索引的顺序问题:将选择性高的索引放到前面
6.根据资料建立索引意味着索引按照最左列进行排序,然后事第二列,以此类推。如(last_login ,area)就会按照last_login进行排序,然后才是area
7.根据这次的这个查询条件来说最好的索引是:ALTER TABLE userADD INDEX index_last_login_area(last_login,area)。



Labels

Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts