Learning Hadoop-2.4.1: The edits and fsimage Viewers

Source: 程序員人生  Published: 2014-11-13 09:02:56

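In Hadoop, edits and fsimage are two very important files. The edits file records the namespace changes made since the most recent checkpoint and so acts as a log, while fsimage holds the most recent checkpoint itself. Neither file can be read directly in an ordinary text editor. Fortunately, Hadoop ships dedicated tools for viewing their contents, oev and oiv, both of which are invoked through the hdfs command.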

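oev stands for offline edits viewer. The tool only operates on files, so it does not need the Hadoop cluster to be running. It provides several output processors that convert the input file into an output file of the corresponding format, selected with the -p option. The supported output formats are binary (the native binary format Hadoop uses), xml (the default when -p is not given) and stats (statistics about the edits file). The supported input formats are binary and xml, where an xml input is a file previously produced by this tool's xml processor. Because no input format corresponds to stats, output written as stats cannot be converted back to the original format. For example, after converting a binary input to xml, you can convert back by swapping the two files (the earlier output becomes the input, the earlier input becomes the output) and selecting the binary processor. This round trip works between binary and xml but not for stats. The usage is: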

    Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
    Parse a Hadoop edits log file INPUT_FILE and save results
    in OUTPUT_FILE.
    Required command line arguments:
    -i,--inputFile <arg>   edits file to process, xml (case
                           insensitive) extension means XML format,
                           any other filename means binary format
    -o,--outputFile <arg>  Name of output file. If the specified
                           file exists, it will be overwritten,
                           format of the file is determined
                           by -p option
    Optional command line arguments:
    -p,--processor <arg>   Select which type of processor to apply
                           against image file, currently supported
                           processors are: binary (native binary format
                           that Hadoop uses), xml (default, XML
                           format), stats (prints statistics about
                           edits file)
    -h,--help              Display usage information and exit
    -f,--fix-txids         Renumber the transaction IDs in the input,
                           so that there are no gaps or invalid
                           transaction IDs.
    -r,--recover           When reading binary edit logs, use recovery
                           mode. This will give you the chance to skip
                           corrupt parts of the edit log.
    -v,--verbose           More verbose output, prints the input and
                           output filenames, for processors that write
                           to a file, also output to screen. On large
                           image files this will dramatically increase
                           processing time (default is false).
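The usage text says the input format is inferred from the file name and the output format from -p. The following is a minimal Python sketch of those two rules, not part of Hadoop. The helper names oev_input_format and build_oev_command are hypothetical, reading "xml extension" as "name ends in .xml" is an assumption, and the code only assembles an argument list without invoking hdfs.

```python
import os

# Processors named in the oev usage text.
OEV_PROCESSORS = ("binary", "xml", "stats")


def oev_input_format(filename):
    """Guess how oev treats an input file, based on the usage text.

    Assumption: "xml extension" means the name ends in ".xml" in any case.
    """
    ext = os.path.splitext(filename)[1]
    return "xml" if ext.lower() == ".xml" else "binary"


def build_oev_command(input_file, output_file, processor=None):
    """Assemble an argument list for `hdfs oev` (hypothetical helper).

    When processor is None, -p is omitted and oev uses its default (xml).
    """
    cmd = ["hdfs", "oev", "-i", input_file, "-o", output_file]
    if processor is not None:
        if processor not in OEV_PROCESSORS:
            raise ValueError("unsupported processor: " + processor)
        cmd += ["-p", processor]
    return cmd


# Converting an XML dump back to binary needs the binary processor.
print(build_oev_command("edits.xml", "edits.bin", "binary"))
```

Keeping the command as a list rather than a single string means it can be handed to subprocess.run without shell quoting concerns if you later decide to execute it.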

An example invocation and part of the resulting output file:

    $ hdfs oev -i edits_0000000000000000081-0000000000000000089 -o edits.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <EDITS>
      <EDITS_VERSION>-56</EDITS_VERSION>
      <RECORD>
        <OPCODE>OP_DELETE</OPCODE>
        <DATA>
          <TXID>88</TXID>
          <LENGTH>0</LENGTH>
          <PATH>/user/hive/test</PATH>
          <TIMESTAMP>1413794973949</TIMESTAMP>
          <RPC_CLIENTID>a52277d8-a855-41ee-9ca2-a5d0bc7d298a</RPC_CLIENTID>
          <RPC_CALLID>3</RPC_CALLID>
        </DATA>
      </RECORD>
    </EDITS>
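Once the edits log is in XML form, ordinary XML tooling can read it. As a rough illustration (a sketch, not what the stats processor does internally), the Python below tallies the opcodes in an edits XML document and lists each record's transaction id and path. The sample is the excerpt above embedded as a literal, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The excerpt shown above, embedded so the sketch is self-contained.
SAMPLE_EDITS = b"""<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-56</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_DELETE</OPCODE>
    <DATA>
      <TXID>88</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/user/hive/test</PATH>
      <TIMESTAMP>1413794973949</TIMESTAMP>
      <RPC_CLIENTID>a52277d8-a855-41ee-9ca2-a5d0bc7d298a</RPC_CLIENTID>
      <RPC_CALLID>3</RPC_CALLID>
    </DATA>
  </RECORD>
</EDITS>"""


def count_opcodes(xml_bytes):
    """Count how many RECORD elements carry each OPCODE value."""
    root = ET.fromstring(xml_bytes)
    return Counter(rec.findtext("OPCODE") for rec in root.iter("RECORD"))


def list_operations(xml_bytes):
    """Return (txid, opcode, path) per record; missing fields become None."""
    root = ET.fromstring(xml_bytes)
    ops = []
    for rec in root.iter("RECORD"):
        txid = rec.findtext("DATA/TXID")
        ops.append((
            int(txid) if txid is not None else None,
            rec.findtext("OPCODE"),
            rec.findtext("DATA/PATH"),
        ))
    return ops


print(count_opcodes(SAMPLE_EDITS))
print(list_operations(SAMPLE_EDITS))
```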

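In the output file, each RECORD describes one operation; the operation in this example is a delete. When a damaged edits file causes problems for the Hadoop cluster, it is possible to salvage the part of the file that is still correct: convert the original binary file to XML, edit the XML by hand, and convert it back to binary. The most common form of damage is a missing closing record, that is, the record whose OPCODE is -1. If the XML file has no closing record, you can add one after the last correct record; any records that follow the closing record are ignored.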
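The check for a missing closing record can be sketched in Python as below. This is an illustration only and rests on several assumptions: the closing record is modelled as a RECORD with OPCODE -1 and an empty DATA element (mirroring the shape of the record in the sample), every record already in the file is taken to be correct so the new record is simply appended at the end, and the function names are hypothetical. Verify the result with oev before relying on it.

```python
import xml.etree.ElementTree as ET


def has_closing_record(root):
    """True if any RECORD under root has OPCODE -1."""
    return any(rec.findtext("OPCODE") == "-1" for rec in root.iter("RECORD"))


def ensure_closing_record(xml_text):
    """Append a closing record if the document lacks one.

    Assumption: all existing records are valid, so "after the last correct
    record" is the end of the EDITS element.
    """
    root = ET.fromstring(xml_text)
    if not has_closing_record(root):
        record = ET.SubElement(root, "RECORD")
        ET.SubElement(record, "OPCODE").text = "-1"
        ET.SubElement(record, "DATA")  # assumed shape: empty DATA element
    return ET.tostring(root, encoding="unicode")


# A shortened, hypothetical edits dump with no closing record.
TRUNCATED = """<EDITS>
  <EDITS_VERSION>-56</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_DELETE</OPCODE>
    <DATA><TXID>88</TXID><PATH>/user/hive/test</PATH></DATA>
  </RECORD>
</EDITS>"""

repaired = ensure_closing_record(TRUNCATED)
print(has_closing_record(ET.fromstring(repaired)))
```

The repaired text would then be written to a file with an .xml name and passed back through oev with the binary processor.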

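oiv stands for offline image viewer. It dumps the contents of an fsimage file to a specified file so that they can be browsed, and it also provides a read-only WebHDFS API for offline analysis and inspection of the Hadoop cluster's namespace. oiv is quite fast even on very large fsimage files, and if it cannot process an fsimage it exits immediately. The tool is not backward compatible: the oiv from hadoop-2.4 cannot process an fsimage from hadoop-2.3, which must be read with the hadoop-2.3 oiv. Like oev, and as the "offline" in its name suggests, oiv does not need the Hadoop cluster to be running. Its full usage can be displayed by entering hdfs oiv on the command line.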

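oiv supports three output processors, Ls, XML and FileDistribution, selected with the -p option. Ls is the default. Its output closely resembles that of the lsr command, printing the same fields in the same order: the directory or file flag, permissions, replication count, owner, group, file size, modification date and full path. Unlike lsr, its output includes the root path /. Another important difference is that its output is not sorted by directory name and contents but appears in the order stored in the fsimage. Unless the namespace is small, directly comparing this processor's output with lsr output is therefore impractical. Ls computes file sizes from the information in the INode blocks and ignores the -skipBlocks option. For example: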

    [hadoop@hadoop current]$ hdfs oiv -i fsimage_0000000000000000115 -o fsimage.ls
    [hadoop@hadoop current]$ cat fsimage.ls
    drwxr-xr-x - hadoop supergroup 1412832662162 0 /
    drwxr-xr-x - hadoop supergroup 1413795010372 0 /user
    drwxr-xr-x - hadoop supergroup 1414032848858 0 /user/hadoop
    drwxr-xr-x - hadoop supergroup 1411626881217 0 /user/hadoop/input
    drwxr-xr-x - hadoop supergroup 1413770138964 0 /user/hadoop/output
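Each line of the Ls output above can be split into fields for further processing. The sketch below follows the column order seen in this sample (permissions, replication, owner, group, a 13-digit number, size, path). Treating the 13-digit number as a modification time in milliseconds since the Unix epoch is an assumption, and LsEntry and parse_ls_line are hypothetical names.

```python
from collections import namedtuple
from datetime import datetime, timezone

LsEntry = namedtuple("LsEntry", "perms replication owner group mtime size path")


def parse_ls_line(line):
    """Split one line of the Ls processor output into named fields.

    The path is taken as everything after the sixth field, so paths that
    contain spaces stay intact.
    """
    perms, repl, owner, group, mtime_ms, size, path = line.strip().split(None, 6)
    return LsEntry(
        perms=perms,
        # Directories show "-" instead of a replication count.
        replication=None if repl == "-" else int(repl),
        owner=owner,
        group=group,
        # Assumption: the value is milliseconds since the Unix epoch.
        mtime=datetime.fromtimestamp(int(mtime_ms) / 1000, tz=timezone.utc),
        size=int(size),
        path=path,
    )


entry = parse_ls_line("drwxr-xr-x - hadoop supergroup 1413795010372 0 /user")
print(entry.path, entry.owner, entry.mtime.isoformat())
```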

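The XML processor outputs an XML document of the fsimage that contains all of the information in the fsimage, such as inode ids. Its output lends itself to automated processing and analysis with XML tools. Because XML syntax is verbose, this processor also produces the largest output. For example: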

    [hadoop@hadoop current]$ hdfs oiv -i fsimage_0000000000000000115 -p XML -o fsimage.xml
    [hadoop@hadoop current]$ cat fsimage.xml
    <?xml version="1.0"?>
    <fsimage>
      <NameSection>
        <genstampV1>1000</genstampV1>
        <genstampV2>1004</genstampV2>
        <genstampV1Limit>0</genstampV1Limit>
        <lastAllocatedBlockId>1073741828</lastAllocatedBlockId>
        <txid>115</txid>
      </NameSection>
      <INodeSection>
        <lastInodeId>16418</lastInodeId>
        <inode>
          <id>16385</id>
          <type>DIRECTORY</type>
          <name></name>
          <mtime>1412832662162</mtime>
          <permission>hadoop:supergroup:rwxr-xr-x</permission>
          <nsquota>9223372036854775807</nsquota>
          <dsquota>-1</dsquota>
        </inode>
        <inode>
          <id>16386</id>
          <type>DIRECTORY</type>
          <name>user</name>
          <mtime>1413795010372</mtime>
          <permission>hadoop:supergroup:rwxr-xr-x</permission>
          <nsquota>-1</nsquota>
          <dsquota>-1</dsquota>
        </inode>
      </INodeSection>
    </fsimage>
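As a small illustration of the automated processing the XML output allows, the Python sketch below walks the inode elements and splits the permission field into owner, group and mode. It relies only on the element names visible in the excerpt above, embeds a shortened copy of that excerpt, and uses the hypothetical function name list_inodes. A real fsimage dump contains more sections and inode types than this handles.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the excerpt above (NameSection omitted).
SAMPLE_FSIMAGE = b"""<?xml version="1.0"?>
<fsimage>
  <INodeSection>
    <lastInodeId>16418</lastInodeId>
    <inode>
      <id>16385</id><type>DIRECTORY</type><name></name>
      <mtime>1412832662162</mtime>
      <permission>hadoop:supergroup:rwxr-xr-x</permission>
    </inode>
    <inode>
      <id>16386</id><type>DIRECTORY</type><name>user</name>
      <mtime>1413795010372</mtime>
      <permission>hadoop:supergroup:rwxr-xr-x</permission>
    </inode>
  </INodeSection>
</fsimage>"""


def list_inodes(xml_bytes):
    """Return one dict per inode element found under INodeSection."""
    root = ET.fromstring(xml_bytes)
    inodes = []
    for node in root.iter("inode"):
        # permission is "owner:group:mode" in the sample above.
        owner, group, mode = node.findtext("permission").split(":")
        inodes.append({
            "id": int(node.findtext("id")),
            "type": node.findtext("type"),
            "name": node.findtext("name"),  # empty string for the root
            "owner": owner,
            "group": group,
            "mode": mode,
        })
    return inodes


for inode in list_inodes(SAMPLE_FSIMAGE):
    print(inode["id"], inode["type"], repr(inode["name"]), inode["mode"])
```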

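FileDistribution is a tool for analyzing the sizes of the files in the namespace. To run it, you define an integer range [0, maxSize] by specifying a maximum file size and a step. The range is split by the step into segments [0, s[1], ..., s[n-1], maxSize], and the processor counts how many files fall into each segment [s[i-1], s[i]). Files larger than maxSize always fall into the last segment, [s[n-1], maxSize]. The output file is a tab-separated table with a Size column and a NumFiles column, where Size is the start of a segment and NumFiles is the number of files whose size falls into that segment. The processor takes the parameters maxSize and step, which the Hadoop documentation lists as the options -maxSize and -step; if they are not specified, each defaults to 0. For example: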

    [hadoop@hadoop current]$ hdfs oiv -i fsimage_0000000000000000115 -o fsimage.fd -p FileDistribution maxSize 1000 step 5
    [hadoop@hadoop current]$ cat fsimage.fd
    Processed 0 inodes.
    Size NumFiles
    2097152 2
    totalFiles = 2
    totalDirectories = 11
    totalBlocks = 2
    totalSpace = 4112
    maxFileSize = 1366
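The segment counting that the FileDistribution description lays out can be sketched in a few lines of Python. This follows the prose description (half-open segments [s[i-1], s[i]), Size reported as the segment start, oversized files clamped into the last segment) and is not taken from Hadoop's source, so the labels the real tool prints may differ, as the 2097152 row above suggests. The function names and the input sizes are hypothetical.

```python
def file_distribution(sizes, max_size, step):
    """Count files per segment of width `step` over [0, max_size].

    Returns (segment_start, count) pairs for non-empty segments only.
    """
    if max_size <= 0 or step <= 0:
        raise ValueError("max_size and step must be positive")
    n_segments = -(-max_size // step)  # ceiling division
    counts = [0] * n_segments
    for size in sizes:
        # Sizes at or beyond max_size land in the last segment.
        index = min(size // step, n_segments - 1)
        counts[index] += 1
    return [(i * step, c) for i, c in enumerate(counts) if c]


def format_table(rows):
    """Render rows as a tab-separated table with Size and NumFiles columns."""
    lines = ["Size\tNumFiles"]
    lines += ["%d\t%d" % (start, count) for start, count in rows]
    return "\n".join(lines)


# Hypothetical file sizes in bytes, with maxSize 1000 and step 5.
rows = file_distribution([0, 4, 5, 999, 1366], max_size=1000, step=5)
print(format_table(rows))
```

With these inputs the 1366-byte file exceeds maxSize and is counted together with the 999-byte file in the segment starting at 995.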
