




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
1、Google Cluster Computing Faculty Training Workshop Module VI: Distributed Filesystems Spinnaker Labs, Inc.Outline Filesystems overview NFS & AFS (Andrew File System) GFS Spinnaker Labs, Inc.File Systems Overview System that permanently stores data Usually layered on top of a lower-level physical
2、 storage medium Divided into logical units called “files” Addressable by a filename (“foo.txt”) Usually supports hierarchical nesting (directories) Spinnaker Labs, Inc.File Paths A file path joins file & directory names into a relative or absolute address to identify a file Absolute: /home/aaron
3、/foo.txt Relative: docs/someFile.doc The shortest absolute path to a file is called its canonical path The set of all canonical paths establishes the namespace for the filesystemWhat Gets Stored User data itself is the bulk of the file systems contents Also includes meta-data on a drive-wide and per
4、-file basis:Drive-wide:Available spaceFormatting infocharacter set.Per-file:nameownermodification datephysical layout. Spinnaker Labs, Inc.High-Level Organization Files are organized in a “tree” structure made of nested directories One directory acts as the “root” “l(fā)inks” (symlinks, shortcuts, etc)
5、provide simple means of providing multiple access paths to one file Other file systems can be “mounted” and dropped in as sub-hierarchies (other drives, network shares) Spinnaker Labs, Inc.Low-Level Organization (1/2) File data and meta-data stored separately File descriptors + meta-data stored in i
6、nodes Large tree or table at designated location on disk Tells how to look up file contents Meta-data may be replicated to increase system reliability Spinnaker Labs, Inc.Low-Level Organization (2/2) “Standard” read-write medium is a hard drive (other media: CDROM, tape, .) Viewed as a sequential ar
7、ray of blocks Must address 1 KB chunk at a time Tree structure is “flattened” into blocks Overlapping reads/writes/deletes can cause fragmentation: files are often not stored with a linear layout inodes store all block ids related to fileFragmentationDesign Considerations Smaller inode size reduces
8、amount of wasted space Larger inode size increases speed of sequential reads (may not help random access) Should the file system be faster or more reliable? But faster at what: Large files? Small files? Lots of reading? Frequent writers, occasional readers? Spinnaker Labs, Inc.Filesystem Security Fi
9、le systems in multi-user environments need to secure private data Notion of username is heavily built into FS Different users have different access writes to files Spinnaker Labs, Inc.UNIX Permission Bits World is divided into three scopes: User The person who owns (usually created) the file Group A
10、 list of particular users who have “group ownership” of the file Other Everyone else “Read,” “write” and “execute” permissions applicable at each level Spinnaker Labs, Inc.UNIX Permission Bits: Limits Only one group can be associated with a file No higher-order groups (groups of groups) Makes it dif
11、ficult to express more complicated ownership sets Spinnaker Labs, Inc.Access Control Lists More general permissions mechanism Implemented in Windows Richer notion of privileges than r/w/x e.g., SetPrivilege, Delete, Copy Allow for inheritance as well as deny lists Can be complicated to reason about
12、and lead to security gaps Spinnaker Labs, Inc.Process Permissions Important note: processes running on behalf of user X have permissions associated with X, not process file owner Y So if root owns ls, user aaron can not use ls to peek at other users files Exception: special permission “setuid” sets
13、the user-id associated with a running process to the owner of the program file Spinnaker Labs, Inc.Disk Encryption Data storage medium is another security concern Most file systems store data in the clear, rely on runtime security to deny access Assumes the physical disk wont be stolen The disk itse
14、lf can be encrypted Hopefully by using separate passkeys for each users files (Challenge: how do you implement read access for group members?) Metadata encryption may be a separate concern Spinnaker Labs, Inc.Distributed Filesystems Support access to files on remote servers Must support concurrency
15、Make varying guarantees about locking, who “wins” with concurrent writes, etc. Must gracefully handle dropped connections Can offer support for replication and local caching Different implementations sit in different places on complexity/feature scale Spinnaker Labs, Inc.NFS First developed in 1980s
16、 by Sun Presented with standard UNIX FS interface Network drives are mounted into local directory hierarchy Your home directory on attu is NFS-driven Type mount some time at the prompt if curious Spinnaker Labs, Inc.NFS Protocol Initially completely stateless Operated over UDP; did not use TCP strea
17、ms File locking, etc, implemented in higher-level protocols Modern implementations use TCP/IP & stateful protocolsServer-side Implementation NFS defines a virtual file system Does not actually manage local disk layout on server Server instantiates NFS volume on top of local file system Local har
18、d drives managed by concrete file systems (EXT, ReiserFS, .) Other networked FSs mounted in by.? Spinnaker Labs, Inc.NFS Locking NFS v4 supports stateful locking of files Clients inform server of intent to lock Server can notify clients of outstanding lock requests Locking is lease-based: clients mu
19、st continually renew locks before a timeout Loss of contact with server abandons locksNFS Client Caching NFS Clients are allowed to cache copies of remote files for subsequent accesses Supports close-to-open cache consistency When client A closes a file, its contents are synchronized with the master
20、, and timestamp is changed When client B opens the file, it checks that local timestamp agrees with server timestamp. If not, it discards local copy. Concurrent reader/writers must use flags to disable caching Spinnaker Labs, Inc.NFS: Tradeoffs NFS Volume managed by single server Higher load on cent
21、ral server Simplifies coherency protocols Full POSIX system means it “drops in” very easily, but isnt “great” for any specific need Spinnaker Labs, Inc.Distributed FS Security Security is a concern at several levels throughout DFS stack Authentication Data transfer Privilege escalation How are these
22、 applied in NFS? Spinnaker Labs, Inc.Authentication in NFS Initial NFS system trusted client programs User login credentials were passed to OS kernel which forwarded them to NFS server A malicious client could easily subvert this Modern implementations use more sophisticated systems (e.g., Kerberos)
23、 Spinnaker Labs, Inc.Data Privacy Early NFS implementations sent data in “plaintext” over network Modern versions tunnel through SSH Double problem with UDP (connectionless) protocol: Observers could watch which files were being opened and then insert “write” requests with fake credentials to corrup
24、t data Spinnaker Labs, Inc.Privilege Escalation Local filesystem username is used as NFS username Implication: being “root” on local machine gives you root access to entire NFS cluster Solution: “root squash” NFS hard-codes a privilege de-escalation from “root” down to “nobody” for all accesses. Spi
25、nnaker Labs, Inc.AFS (The Andrew File System) Developed at Carnegie Mellon Strong security, high scalability Supports 50,000+ clients at enterprise level Spinnaker Labs, Inc.Security in AFS Uses Kerberos authentication Supports richer set of access control bits than UNIX Separate “administer”, “dele
26、te” bits Allows application-specific bits Spinnaker Labs, Inc.Local Caching File reads/writes operate on locally cached copy Local copy sent back to master when file is closed Open local copies are notified of external updates through callbacks Spinnaker Labs, Inc.Local Caching - Tradeoffs Shared da
27、tabase files do not work well on this system Does not support write-through to shared medium Spinnaker Labs, Inc.Replication AFS allows read-only copies of filesystem volumes Copies are guaranteed to be atomic checkpoints of entire FS at time of read-only copy generation Modifying data requires acce
28、ss to the sole r/w volume Changes do not propagate to read-only copies Spinnaker Labs, Inc.AFS Conclusions Not quite POSIX Stronger security/permissions No file write-through High availability through replicas, local caching Not appropriate for all file types Spinnaker Labs, Inc.The Google File Syst
29、em Spinnaker Labs, Inc.Motivation Google needed a good distributed file system Redundant storage of massive amounts of data oncheap and unreliable computers Why not use an existing file system? Googles problems are different from anyone elses Different workload and design priorities GFS is designed
30、for Google apps and workloads Google apps are designed for GFS Spinnaker Labs, Inc.Assumptions High component failure rates Inexpensive commodity components fail often “Modest” number of HUGE files Just a few million Each is 100MB or larger; multi-GB files typical Files are write-once, mostly append
31、ed to Perhaps concurrently Large streaming reads High sustained throughput favored over low latency Spinnaker Labs, Inc.GFS Design DecisionsFiles stored as chunks Fixed size (64MB)Reliability through replication Each chunk replicated across 3+ chunkserversSingle master to coordinate access, keep met
32、adata Simple centralized managementNo data caching Little benefit due to large data sets, streaming readsFamiliar interface, but customize the API Simplify the problem; focus on Google apps Add snapshot and record append operations Spinnaker Labs, Inc.GFS Client Block DiagramGFS Architecture Single
33、master Mutiple chunkserversCan anyone see a potential weakness in this design? Spinnaker Labs, Inc.Single master From distributed systems we know this is a: Single point of failure Scalability bottleneck GFS solutions: Shadow masters Minimize master involvement never move data through it, use only f
34、or metadata and cache metadata at clients large chunk size master delegates authority to primary replicas in data mutations (chunk leases) Simple, and good enough! Spinnaker Labs, Inc.Metadata (1/2) Global metadata is stored on the master File and chunk namespaces Mapping from files to chunks Locati
35、ons of each chunks replicas All in memory (64 bytes / chunk) Fast Easily accessible Spinnaker Labs, Inc.Metadata (2/2) Master has an operation log for persistent logging of critical metadata updates persistent on local disk replicated checkpoints for faster recovery Spinnaker Labs, Inc.Mutations Mut
36、ation = write or append must be done for all replicas Goal: minimize master involvement Lease mechanism: master picks one replica as primary; gives it a “l(fā)ease” for mutations primary defines a serial order of mutations all replicas follow this order Data flow decoupled from control flow Spinnaker La
37、bs, Inc.Mutations Diagram Spinnaker Labs, Inc.Mutation ExampleClient 1 opens foo for modify. Replicas are named A, B, and C. B is declared primary.Client 1 sends data X for chunk to chunk serversClient 2 opens foo for modify. Replica B still primaryClient 2 sends data Y for chunk to chunk serversSer
38、ver B declares that X will be applied before YOther servers signal receipt of dataAll servers commit X then YClients 1 & 2 close connectionsBs lease on chunk is lost Spinnaker Labs, Inc.Atomic record append Client specifies data GFS appends it to the file atomically at least once GFS picks the o
39、ffset works for concurrent writers Used heavily by Google apps e.g., for files that serve as multiple-producer/single-consumer queues Spinnaker Labs, Inc.Relaxed consistency model (1/2) “Consistent” = all replicas have the same value “Defined” = replica reflects the mutation, consistent Some propert
40、ies: concurrent writes leave region consistent, but possibly undefined failed writes leave the region inconsistent Some work has moved into the applications: e.g., self-validating, self-identifying records Spinnaker Labs, Inc.Relaxed consistency model (2/2) Simple, efficient Google apps can live wit
41、h it what about other apps? Namespace updates atomic and serializable Spinnaker Labs, Inc.Masters responsibilities (1/2) Metadata storage Namespace management/locking Periodic communication with chunkservers give instructions, collect state, track cluster health Chunk creation, re-replication, rebal
42、ancing balance space utilization and access speed spread replicas across racks to reduce correlated failures re-replicate data if redundancy falls below threshold rebalance data to smooth out storage and request load Spinnaker Labs, Inc.Masters responsibilities (2/2) Garbage Collection simpler, more reliable than traditional file delete master logs the deletion, renames the file to a hidden name lazily garbage collects hid
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 母嬰店打工合同范本
- 職業(yè)暴露防護理
- 管網(wǎng)改造施工合同范本
- 2025至2030年中國木雕線條數(shù)據(jù)監(jiān)測研究報告
- 2025至2030年中國復(fù)膜膠數(shù)據(jù)監(jiān)測研究報告
- 2025至2030年中國埋入式旋轉(zhuǎn)接頭數(shù)據(jù)監(jiān)測研究報告
- 2025年中國高速肉丸打漿機市場調(diào)查研究報告
- 二零二五年度電子商務(wù)領(lǐng)域競業(yè)協(xié)議敬業(yè)保證合同
- 2025年度籃球賽事贊助商責(zé)任免除合同
- 2025年度自愿離婚協(xié)議及子女撫養(yǎng)與監(jiān)護協(xié)議
- 地理-天一大聯(lián)考2025屆高三四省聯(lián)考(陜晉青寧)試題和解析
- 醫(yī)療衛(wèi)生系統(tǒng)招聘考試(中醫(yī)學(xué)專業(yè)知識)題庫及答案
- 小巴掌童話課件
- 教科版六年級科學(xué)下冊全冊教學(xué)設(shè)計教案
- 部編版小學(xué)五年級下冊《道德與法治》全冊教案含教學(xué)計劃
- 2024年青島遠洋船員職業(yè)學(xué)院高職單招語文歷年參考題庫含答案解析
- 8款-組織架構(gòu)圖(可編輯)
- 四年級下冊綜合實踐活動教案 跟著節(jié)氣去探究 全國通用
- 領(lǐng)導(dǎo)干部道德修養(yǎng)1
- Chapter-1-生物信息學(xué)簡介
- 中國郵政銀行“一點一策”方案介紹PPT課件
評論
0/150
提交評論