Knowlege points of database especially mysql

Index

The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially.

Select

表扫描(table scan)

  • 在表扫描中,每一个表中的文件块都会被顺序扫描,而且每一个记录都会被测试,看它是否会满足条件
  • 表扫描是在没有索引的情况下进行的

索引扫描(index scan)

  • 使用索引的扫描算法称之为索引扫描
  • 当有索引的时候,选择运算就会执行索引扫描

MySQL

Truncate diff Delete

TRUNCATE TABLE empties a table completely. It requires the DROP privilege.

Logically, TRUNCATE TABLE is similar to a DELETE statement that deletes all rows, or a sequence of DROP TABLE and CREATE TABLE statements. To achieve high performance, it bypasses the DML method of deleting data. Thus, it cannot be rolled back, it does not cause ON DELETE triggers to fire, and it cannot be performed for InnoDB tables with parent-child foreign key relationships.

Although TRUNCATE TABLE is similar to DELETE, it is classified as a DDL statement rather than a DML statement.

  • Any AUTO_INCREMENT value is reset to its start value. This is true even for MyISAM and InnoDB, which normally do not reuse sequence values.
  • When used with partitioned tables, TRUNCATE TABLE preserves the partitioning; that is, the data and index files are dropped and re-created, while the partition definitions (.par) file is unaffected.
  • The TRUNCATE TABLE statement does not invoke ON DELETE triggers.

ref

Partitioning

MySQL 5.6 Community binaries provided by Oracle include partitioning support.

SELECT PLUGIN_NAME as Name, PLUGIN_VERSION as Version, PLUGIN_STATUS as Status FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_TYPE='STORAGE ENGINE';
Name Version Status
partition 1.0 ACTIVE

Sharding

  • 分片、分库分表(Sharding)
  • 以 MySQL 为例,分库分表从阶段应该拆分为分表、分库,一般来说是先进行分表,分表的原动力在于 MySQL 单表性能问题
  • 是由于业务的不可控,所以大家往往采取比较保守的策略,这就是分表的原因。
  • 分库主要由于 MySQL 容量上,MySQL 的写入是很昂贵的操作。目前有一些云 RDS 通过计算与存储分离、log is database 的理念来很大程度解决了写入扩大的问题,但在这之前,更为普遍的解决方案就是把一个集群拆分成 N 个集群,即分库分表(sharding)。为了规避热点问题,绝大多数采用的方法就是 hash 切分,也有极少的范围、或者基于 Mapping 的查询切分。
  • 既然做了分表,那数据的分发、路由就需要进行处理,自下而上分为三层,分别 DB 层、中间层、应用层。DB 层实现,简单来说就是把路由信息加入到某个 Metedata 节点,同时加上一些诸如读写分离、HA 整合成一个 DB 服务或者产品,但这种方案实现复杂度非常高,有的逐步演变成了一种新的数据库,更为常见的是在中间层实现,而中间层又根据偏向 DB 还是偏向应用分为 DB proxy 和 JDBC proxy。
  • DB proxy高度依赖网络组件,它需要诸如 LVS/F5 等 VIP 来实现流量的负载均衡,如果跨 IDC,还依赖诸如 DNS 进行IDC 分发。同时部分 DBproxy 对 Prepare 这类操作支持不友好,所以它的问题概括来说:
    • 链路过长,每层都会增加响应时间
    • 网络单点,并且往往是整个公司层面的单点
    • 部分产品对Prepare 应用不友好,需要绑定 connection 信息
  • JDBC proxy最大的问题是违背了 DB 透明的原则,它需要对不同的语言编写 Driver,概括来说:
    • 语言限制,总会遭到一批 RD 同学的吐槽 “世界上最好的语言竟然不支持!”
    • 接入繁琐
    • DB 不透明

ref