Just K2: How to Handle 45 TB of Data

A few days ago the company had to quote for a project: a VMS for 1.5M users, with 30 MB of storage per user and 26 E1 lines coming in at the same time.

This is a big one. The disks alone add up to 45 TB! It can certainly be built, but some survey work is needed first.
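A quick sanity check on the sizing, as a minimal sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and that every mailbox is filled to its 30 MB quota; the variable names are only illustrative.

```python
# Assumption: decimal units, 1 MB = 10**6 bytes and 1 TB = 10**12 bytes.
TOTAL_BYTES = 45 * 10**12   # the 45 TB figure
QUOTA_BYTES = 30 * 10**6    # 30 MB per user

# How many full mailboxes fit in the total capacity.
mailboxes = TOTAL_BYTES // QUOTA_BYTES

# The same capacity in binary units, which is what filesystem limits use.
total_tib = TOTAL_BYTES / 2**40

print(f"mailboxes at full quota: {mailboxes:,}")
print(f"45 TB expressed in TiB: {total_tib:.2f}")
```

This does not include any RAID or filesystem overhead, so the raw disk count would be higher.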

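The first thing to consider with this much capacity: if the filesystem is not unmounted cleanly and the next mount needs an fsck, how long will that take (i.e. fsck after a crash)? Choosing a journaling filesystem solves at least part of this problem.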

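Next, if a full fsck really does have to be run, how long does it take (i.e. how much time does a complete fsck cost)? That means weighing the filesystem size. For example: how does the time to check one 4 TB filesystem compare with the time to check two 2 TB filesystems at the same time?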
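One way to answer the 4 TB versus 2 x 2 TB question is simply to measure it. The following is a rough timing harness, not a benchmark methodology: it assumes ext2/ext3 volumes checked with `e2fsck -f -n` (force a full check, open read-only and answer "no" to every question), and the device paths are whatever is passed on the command line. The helper names are made up for this sketch.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def fsck_cmd(device):
    # -f forces a full check even if the filesystem looks clean,
    # -n opens it read-only and answers "no" to all questions.
    return ["e2fsck", "-f", "-n", device]


def time_command(cmd):
    """Run one command and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return time.monotonic() - start, result.returncode


def time_parallel(cmds):
    """Run all commands concurrently; return (wall_seconds, per-command results)."""
    if not cmds:
        return 0.0, []
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        results = list(pool.map(time_command, cmds))
    return time.monotonic() - start, results


if __name__ == "__main__":
    devices = sys.argv[1:]  # e.g. the two 2 TB volumes, unmounted
    if devices:
        wall, results = time_parallel([fsck_cmd(d) for d in devices])
        for dev, (secs, rc) in zip(devices, results):
            print(f"{dev}: {secs:.1f}s (exit code {rc})")
        print(f"wall clock for all checks: {wall:.1f}s")
```

Run it once with the single 4 TB device and once with the two 2 TB devices, and compare the wall clock numbers. The volumes should be unmounted, and the result depends heavily on how full they are.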

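Another question: which filesystems do I have to choose from?
After some digging, there are a few points to watch (the decision to use Linux has already been made).
1. Can the Linux kernel's block layer support this much capacity? ([1])
-> On a 2.6 kernel, CONFIG_LBD can be enabled.
-> Linux uses a variable of type sector_t to address a sector (512 bytes) on a block device:
on 32-bit, sector_t is 32 bits if CONFIG_LBD is not defined (2^32 x 512 bytes, i.e. a 2 TiB limit), or 64 bits if CONFIG_LBD is defined;
on 64-bit, sector_t is 64 bits (2^64 x 512 bytes = 2^73 bytes = 8 ZiB [2]).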

So, as long as we are on Linux 2.6 with CONFIG_LBD enabled, we can use a disk array larger than 2 TB.
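The two limits quoted for sector_t follow directly from the sector size. A small sketch of the arithmetic (the function names are mine, not anything from the kernel):

```python
SECTOR_BYTES = 512  # size of one sector addressed by sector_t


def max_device_bytes(sector_t_bits):
    """Largest block device addressable with a sector_t of the given width."""
    return (2 ** sector_t_bits) * SECTOR_BYTES


def human(n):
    """Format an exact power-of-two byte count with binary prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


print(human(max_device_bytes(32)))  # 32-bit sector_t, no CONFIG_LBD: 2 TiB
print(human(max_device_bytes(64)))  # 64-bit sector_t: 8 ZiB
```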

2. Which filesystems can support this much capacity? ([Comparison of file systems])
I think the following are the options worth considering: ext3, reiserfs3, jfs, xfs.

file system	max. file size	max. volume size	max. # of files	max. # of inodes	max. # of files in a directory
ext3 (4 KiB block)	2 TiB	16 TiB	less than # of inodes	see note	-
ReiserFS 3	8 TiB	16 TiB	-	-	-
JFS	4 PiB	32 PiB	-	-	-
XFS	8 EiB	8 EiB	-	-	-

Note (ext3 inodes): if V is the volume size in bytes, the default number of inodes is V/2^13 (or the number of blocks, whichever is less), and the minimum is V/2^23. A "-" means the value has not been filled in yet.

Since most of the files will be small (a few MB), the maximum number of files each filesystem can store still has to be confirmed.
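For ext3, the inode formula quoted above gives a feel for whether the file count could become the limit. This is a rough sketch only: the 1 MB average file size is a guess of mine for illustration, not a measured value, and the volumes are assumed to be the maximum 16 TiB each.

```python
def ext3_inode_counts(volume_bytes, block_bytes=4096):
    """Return (default, minimum) inode counts using the V/2^13 and V/2^23 rule."""
    blocks = volume_bytes // block_bytes
    default = min(volume_bytes // 2**13, blocks)
    minimum = volume_bytes // 2**23
    return default, minimum


TOTAL_BYTES = 45 * 10**12       # 45 TB, decimal units assumed
VOLUME_BYTES = 16 * 2**40       # ext3 maximum volume size with 4 KiB blocks
AVG_FILE_BYTES = 10**6          # hypothetical average file size (assumption)

default, minimum = ext3_inode_counts(VOLUME_BYTES)
volumes = -(-TOTAL_BYTES // VOLUME_BYTES)          # ceiling division
files_per_volume = (TOTAL_BYTES // AVG_FILE_BYTES) // volumes

print(f"volumes needed:      {volumes}")
print(f"files per volume:    {files_per_volume:,}")
print(f"default inodes:      {default:,}")
print(f"minimum inodes:      {minimum:,}")
```

Under these assumptions the default inode count leaves plenty of room, but a volume formatted with the minimum inode count would run out of inodes long before it ran out of space.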

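3. acd/acdmgr will access the file server over NFS, so does NFS support this much capacity? Or is there another solution?
-> Probably only the file size needs to be considered, because an NFS server does not create a filesystem of its own; it exports a local filesystem to remote clients, so the volume limits are those of the underlying filesystem.
-> From NFS v3 onward, 64-bit file sizes and offsets are used to access a file.
-> For this application it should not be a problem (each user only has 30 MB of space).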

4. Performance comparison
a. ext2/ext3/reiserfs/jfs: an article from IBM that looks at the question from several angles.
b. xfs performance tweaks.

5. LVM / RAID considerations

Friday, July 13, 2007
