本文共 4862 字,大约阅读时间需要 16 分钟。
一个OLTP系统最可能出现瓶颈的地方是随机型的IO。
优化手段当然有很多,如加数据库端的缓存,应用端的缓存,存储端的CACHE,降低读的IO。
写的IO的话估计没那么容易解决。通常的做法是增加硬盘,细化表空间,细化对象与表空间的关系,存储的话也可以提高盘阵的CACHE(需要配置write-back模式)。
随着电子盘的出现,随机型的IO能力提高成为可能。为啥这么说呢,电子盘没有机械手臂,没有了机械手臂的移动延时和拾取延时。
当然电子盘也离不开CACHE,CACHE永远是最快的。
下面是一个老外对SSD和数据库应用场景的描述:
(SSD) are getting and more popular. Initially, SSDs appeared to be ideal for databases because they potentially allow fast I/O with non-volatile storage — unfortunately, neither of these is completely true. Let me explain:
Fast I/O: While SSDs offer random I/O speeds far in excess of traditional hard drives (because there are no moving platters or heads), the sequential I/O speed of SSDs is only marginally better than mechanical drives. Database activity that causes random I/O, like index scans that do not fit in RAM, will benefit from SSDs’ superior random I/O speeds, but sequential scans only marginally benefit by using SSDs. (Greg Smith explains the limited use-case for SSDs in this .) Postgres 9.0 will allow to be set so administrators can indicate that random I/O has the same speed as sequential I/O for SSD-based tablespaces.
Non-Volatile Storage: Because SSDs offer permanent storage, it is often believed that they are an ideal place to store the Postgres (WAL) which are on every transaction commit. However, SSDs typically write data in 256 kilobyte chunks, meaning the small write operations that occur at every commit are not ideal for SSD drives, and might not even be flushed to permanent storage immediately. (Many SSD vendors have been vague about this behavior.) (FYI, drives are getting good reviews.) This explains the internal workings of SSD drives, and the article summary contains this warning:
We find that SSD performance and lifetime is highly workload-sensitive, and that complex systems problems that normally appear higher in the storage stack, or even in distributed systems, are relevant to device firmware.
反驳方:
>> the sequential I/O speed of SSDs is only marginally better than mechanical drives. … sequential scans only marginally benefit by using SSDs.
Not even remotely true, even for retail level drives. Visit for theory and practice of SSD design and history. Visit for current drives and tests, for example: . Note the sequential speeds of SSDs versus the HDDs (fastest retail available). Enterprise drives (STEC, Violin, Texas Memory) aren’t as widely reviewed, since these parts are “qualified” to OEMs. They rip.
>> SSDs typically write data in 256 kilobyte chunks, meaning the small write operations that occur at every commit are not ideal for SSD drives, and might not even be flushed to permanent storage immediately.
Not really an issue. The 256K assumes that each write does a block erase for each write; controllers aren’t that naive’. The write size on the NAND isn’t the issue, beyond the factor of sector (512) writes vs. block (4K) writes. Controllers parse out the writes as they see fit. The SandForce controller even does deduping and compression on the fly; not everyone is thrilled with this, by the way. See: . The paper cited is quite dated; controllers today are far more sophisticated than what was current in 2007. Server/enterprise drives are/can be heavily cached (see link above), so that the size of the write from the OS is not relevant. Even retail level drives can be (those aimed at laptops and such aren’t).
What has been ignored is the real strength of SSD with databases: implementing BCNF schemas. The strength of SSD is that supports the redundancy elimination of the relational model as joins are costless. As just a faster HDD, not so much; volumetric density of NAND will never reach areal density of rotating rust. It needs to stressed that what is viewed as a sequential operation at the application/OS level is not guaranteed to be into contiguous locations on the NAND, as it is on rotating rust. The probability is virtually zero.
In summary, SSD drives are not the panacea we hoped, or at least, not yet. A battery-backed disk drive controller is still the ideal solution for high performance at a reasonable cost. This Postgres email thread from and covers many of these details
Just for grins, and because it’s timely (published today), here’s the latest AnandTech SSD review: .
What’s important about this review:
- it’s for a SandForce 1200 controller, which is intended to be SF’s vanilla retail controller
- it’s really fast
- it uses no external DRAM cache, which is material to design decisions about how to build an SSD, which makes SF based SSD materially different from all others, retail or “enterprise”
- one can see the playing field of most retail SSD and top notch retail HDD in the tables
SSD are not yet commodity parts, unlike HDD. Evolution continues; hopefully so will decisions about how best to utilize them in RDBMS.
到底有多大的提升,用了才知道。
就像某些存储厂商提供的数据一样,理论值大得惊人,用起来实际上达不到那么大的IOPS。
转载地址:http://rioxa.baihongyu.com/