Hadoop port to Jxta P2P Framework
Published: 2019-06-23


https://www.java.net/forum/topic/jxta/jxta-community-forum/hadoop-port-jxta-p2p-framework

——————————————————————————————————————————————————————————————————————

besn0847
Joined: 2010-06-01
 
 

Hi,

A few months ago I started a port of Hadoop DFS to JXTA, so that I could share files across all my PCs with automated replication.

The current version is 0.6.0 and is quite stable. I still have to finalize the Windows port, but it works fine on other platforms.

Source :

Info :
Binaries :

Feel free to comment / criticize ...

Franck

——————————————————————————————————————————

Description

Hadoop is designed to work in large datacenters with thousands of servers connected to each other in the Hadoop cloud. This project focuses on the Distributed File System part of Hadoop (HDFS).

The goal of this project is to provide an alternative to the direct IP connectivity required by Hadoop. Instead, the DFS layer has been modified to use a Peer-2-Peer framework, which allows direct connectivity within datacenters as well as indirect connectivity to bypass firewall constraints.
The typical use case is servers spread across various DMZs with hundreds of gigabytes of unused disk space, which can be leveraged to provide a massive storage cloud for Hadoop.
The first release of Jxtadoop focused on providing the data storage layer with P2P capabilities based on the JXTA framework.

——————————————————————————————————————————————————————

History

Authors:

Genesis

This project started about a year and a half ago, when I thought about creating a private Hadoop cloud to leverage existing servers located at different sites. The initial goal was to create a private storage cloud using HDFS.

This cloud was set up using OpenVPN between all the sites, but this was not ideal because:

  • With a VPN, all the traffic flows through one central place, even when the nodes are on the same LAN;
  • This solution requires installing a VPN on remote servers that could not easily be controlled;

Hence the decision to look into an alternative to the direct IP connectivity required by Hadoop nodes.

Concept

The concept is quite simple: the IP connectivity layer is replaced by a P2P layer that can handle either direct connections (through multicast) or indirect connections (through relays and rendez-vous peers).
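The direct/indirect choice can be pictured as a small routing model. This is a hypothetical sketch, not the JXTA API and not Jxtadoop's code: the `PeerRouter` and `RouteKind` names are made up, and it assumes that a peer found through LAN multicast is reached directly while a peer learned through a rendez-vous peer is reached through a relay.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the connectivity choice; this is NOT the JXTA API.
enum RouteKind { DIRECT, RELAYED }

final class PeerRouter {
    private final Set<String> lanPeers = new HashSet<>();     // found via LAN multicast
    private final Set<String> remotePeers = new HashSet<>();  // found via a rendez-vous peer
    private final String relayPeer;                           // hypothetical relay id; null if none

    PeerRouter(String relayPeer) {
        this.relayPeer = relayPeer;
    }

    void onMulticastDiscovery(String peerId) { lanPeers.add(peerId); }

    void onRendezvousDiscovery(String peerId) { remotePeers.add(peerId); }

    RouteKind routeTo(String peerId) {
        if (lanPeers.contains(peerId)) {
            return RouteKind.DIRECT;   // same LAN: traffic stays local, no central hop
        }
        if (remotePeers.contains(peerId) && relayPeer != null) {
            return RouteKind.RELAYED;  // behind a firewall/NAT: go through the relay
        }
        throw new IllegalStateException("no known route to " + peerId);
    }
}

class PeerRouterDemo {
    public static void main(String[] args) {
        PeerRouter router = new PeerRouter("relay-1");
        router.onMulticastDiscovery("dn-lan-01");
        router.onRendezvousDiscovery("dn-dmz-07");
        System.out.println(router.routeTo("dn-lan-01")); // DIRECT
        System.out.println(router.routeTo("dn-dmz-07")); // RELAYED
    }
}
```

Keeping this decision inside the P2P layer means the Hadoop code above it does not need to know whether a peer sits on the same LAN or behind a firewall.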

——————————————————————————————————————————————————

Peer-to-Peer

Authors:

This wiki section describes the peer-to-peer layer chosen for Jxtadoop.

Multiple peer-to-peer frameworks exist today and some of them are based on Java. JXTA is a framework developed since 2001; its current version, 2.7, dates back to H1 2011. It is one of the most comprehensive P2P Java frameworks, even though it is quite complex and is no longer actively developed.

More information can be found at :

I also recommend Jérôme Verstrynge's book Practical JXTA II, which is a very good introduction.

Several reasons drove the choice of JXTA:

  1. JXTA is a longstanding P2P framework (10 years) with recent updates in 2011
  2. JXTA is developed in Java, which makes integration with Hadoop seamless
  3. JXTA can cope with LANs and enables direct communication through multicast
  4. For indirect communications (firewalls, NAT, internet...), JXTA provides the rendez-vous and relay peer infrastructure to enable them
  5. JXTA provides socket capabilities that can replace the Hadoop sockets without requiring in-depth rework of the Hadoop code
  6. Communications in JXTA can be fully authorized and encrypted to secure traffic outside the corporate LAN
  7. JXTA provides the PeerGroup concept, which can be used to isolate datanodes...

However, if you want to support this project and start diving into JXTA, be aware that support is pretty limited, the documentation is quite poor and the community is small, so the investment required can be significant.

————————————————————————————————————————————————————————

Architecture

Authors:

The JXTA P2P layer has been implemented alongside the Namenode and Datanodes. This layer uses only the basic JXTA features. A single PeerGroup is dedicated to NN and DN RPC and DATA communications.

Security features will be added in the future along with multiple peer groups to isolate and secure RPC comms from DATA comms.

At this P2P level, a monitor is implemented to identify datanodes when they connect and disconnect. A notification is then sent to the Namenode, which updates the datanode hosts map accordingly.

It was designed this way because many storage nodes may connect and disconnect quite often.
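A minimal sketch of the monitor-to-Namenode notification flow, assuming a simple listener interface. `PeerMonitor`, `DatanodeListener` and `HostsMap` are hypothetical names, not Jxtadoop or Hadoop classes, and the real monitor hooks into JXTA peer discovery, which is not modelled here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback the P2P peer monitor uses to notify the Namenode.
interface DatanodeListener {
    void datanodeConnected(String peerId, String hostName);
    void datanodeDisconnected(String peerId);
}

// Stand-in for the Namenode's datanode hosts map (hypothetical class).
final class HostsMap implements DatanodeListener {
    private final Map<String, String> hostsByPeer = new ConcurrentHashMap<>();

    @Override public void datanodeConnected(String peerId, String hostName) {
        hostsByPeer.put(peerId, hostName);
    }

    @Override public void datanodeDisconnected(String peerId) {
        hostsByPeer.remove(peerId); // no longer offered as a block target
    }

    Set<String> liveHosts() {
        return new TreeSet<>(hostsByPeer.values()); // sorted copy for display
    }
}

// Tracks peer presence in the P2P layer and forwards only state changes.
final class PeerMonitor {
    private final List<DatanodeListener> listeners = new CopyOnWriteArrayList<>();
    private final Set<String> present = ConcurrentHashMap.newKeySet();

    void addListener(DatanodeListener listener) { listeners.add(listener); }

    // Called by the (not modelled) discovery code when a peer is seen.
    void peerSeen(String peerId, String hostName) {
        if (present.add(peerId)) {
            for (DatanodeListener l : listeners) l.datanodeConnected(peerId, hostName);
        }
    }

    // Called when a peer is considered gone.
    void peerLost(String peerId) {
        if (present.remove(peerId)) {
            for (DatanodeListener l : listeners) l.datanodeDisconnected(peerId);
        }
    }
}

class PeerMonitorDemo {
    public static void main(String[] args) {
        HostsMap hosts = new HostsMap();
        PeerMonitor monitor = new PeerMonitor();
        monitor.addListener(hosts);
        monitor.peerSeen("peer-a", "dn-a");
        monitor.peerSeen("peer-b", "dn-b");
        monitor.peerLost("peer-a");
        System.out.println(hosts.liveHosts()); // [dn-b]
    }
}
```

Notifying only on state changes keeps repeated sightings of the same peer from generating redundant Namenode updates, which matters when storage nodes come and go often.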

On top of this P2P layer, JXTA sockets have been used to minimize rework at the Namenode and Datanode level.

For the first version the following components have been removed : Balancer, Secondary Namenode and Jetty server.

——————————————————————————————————————————————————————————

Code Changes

Authors:

The following code changes have been made to Hadoop DFS 0.20.2. The next version will be based on Hadoop 1.0.0.

P2P Infrastructure

The JXTA layer is deployed with a single peergroup for all communications:

  • NN-to-DN: RPC
  • DN-to-DN: RPC + data
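The peergroup layout can be written down as a mapping from traffic type to group. This is illustrative only: the `Channel` enum, `PeerGroupPlan` and the group names are hypothetical; `singleGroup` mirrors the current single-peergroup setup, and `splitRpcAndData` models the separate RPC and DATA groups mentioned as future work in the Architecture section.

```java
import java.util.EnumMap;
import java.util.Map;

// The kinds of traffic listed above (hypothetical names).
enum Channel { NN_DN_RPC, DN_DN_RPC, DN_DN_DATA }

final class PeerGroupPlan {
    private final Map<Channel, String> groupByChannel = new EnumMap<>(Channel.class);

    // Current setup: one peergroup carries every channel.
    static PeerGroupPlan singleGroup(String group) {
        PeerGroupPlan plan = new PeerGroupPlan();
        for (Channel c : Channel.values()) plan.groupByChannel.put(c, group);
        return plan;
    }

    // Planned setup: RPC traffic isolated from DATA traffic.
    static PeerGroupPlan splitRpcAndData(String rpcGroup, String dataGroup) {
        PeerGroupPlan plan = new PeerGroupPlan();
        plan.groupByChannel.put(Channel.NN_DN_RPC, rpcGroup);
        plan.groupByChannel.put(Channel.DN_DN_RPC, rpcGroup);
        plan.groupByChannel.put(Channel.DN_DN_DATA, dataGroup);
        return plan;
    }

    String groupFor(Channel channel) { return groupByChannel.get(channel); }
}

class PeerGroupPlanDemo {
    public static void main(String[] args) {
        PeerGroupPlan now = PeerGroupPlan.singleGroup("jxtadoop");
        PeerGroupPlan later = PeerGroupPlan.splitRpcAndData("jxtadoop-rpc", "jxtadoop-data");
        System.out.println(now.groupFor(Channel.DN_DN_DATA));   // jxtadoop
        System.out.println(later.groupFor(Channel.DN_DN_DATA)); // jxtadoop-data
    }
}
```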

Hadoop RPC Server

The RPC server classes have been modified to support JXTA sockets.

Hadoop Data Block Server

The socket server used to exchange data blocks has been modified to support JXTA sockets.
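One reason socket-level substitution can keep the rework small: in JXTA 2.x, `net.jxta.socket.JxtaServerSocket` and `net.jxta.socket.JxtaSocket` extend `java.net.ServerSocket` and `java.net.Socket`. The toy block server below is a sketch, not Hadoop's actual data transfer code; it depends only on those two base classes, so the demo runs over loopback TCP while a JXTA-backed socket could be passed in instead. `BlockServer` and its block-id request / length-prefixed reply protocol are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical block server coded only against java.net.ServerSocket/Socket,
// so any subclass (plain TCP here, a JXTA server socket in the port) fits.
final class BlockServer implements Runnable {
    private final ServerSocket serverSocket;
    private final Map<String, byte[]> blocks;

    BlockServer(ServerSocket serverSocket, Map<String, byte[]> blocks) {
        this.serverSocket = serverSocket;
        this.blocks = blocks;
    }

    @Override public void run() {
        while (!serverSocket.isClosed()) {
            try (Socket peer = serverSocket.accept()) {
                DataInputStream in = new DataInputStream(peer.getInputStream());
                DataOutputStream out = new DataOutputStream(peer.getOutputStream());
                String blockId = in.readUTF();                      // request: block id
                byte[] data = blocks.getOrDefault(blockId, new byte[0]);
                out.writeInt(data.length);                          // reply: length + bytes
                out.write(data);
                out.flush();
            } catch (IOException e) {
                if (serverSocket.isClosed()) return;                // accept() aborted by close()
            }
        }
    }

    // Client side, also written against java.net.Socket only.
    static byte[] fetch(Socket socket, String blockId) throws IOException {
        try (Socket s = socket) {
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            DataInputStream in = new DataInputStream(s.getInputStream());
            out.writeUTF(blockId);
            out.flush();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return data;
        }
    }
}

class BlockServerDemo {
    public static void main(String[] args) throws Exception {
        Map<String, byte[]> blocks = new ConcurrentHashMap<>();
        blocks.put("blk_1", "hello".getBytes(StandardCharsets.UTF_8));
        // Plain TCP on an ephemeral loopback port stands in for a JXTA server socket.
        ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread server = new Thread(new BlockServer(ss, blocks));
        server.start();
        Socket client = new Socket(InetAddress.getLoopbackAddress(), ss.getLocalPort());
        System.out.println(new String(BlockServer.fetch(client, "blk_1"), StandardCharsets.UTF_8)); // hello
        ss.close();
        server.join();
    }
}
```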

Components Removal

The following components have been removed from the first version:

  • Balancer
  • Secondary Namenode
  • HTTP web server

Local Buffering

When the FsShell in use has a colocated Datanode, the file is written to the local DN only. Replication then takes place in the background.
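The local buffering rule can be sketched as a choice of write targets. `WritePlanner` is a hypothetical helper, not an FsShell or DFSClient class; it assumes the normal pipeline of target datanodes is supplied by the Namenode, and it simply queues the path so that a (not modelled) background replicator can copy it later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of "Local Buffering".
final class WritePlanner {
    private final String localDatanode;                       // null if no colocated DN
    private final Deque<String> pendingReplication = new ArrayDeque<>();

    WritePlanner(String localDatanode) { this.localDatanode = localDatanode; }

    // Datanodes the client writes to synchronously.
    List<String> targetsFor(String path, List<String> pipeline) {
        if (localDatanode != null) {
            pendingReplication.add(path);                      // replicate later, in the background
            return Collections.singletonList(localDatanode);
        }
        return new ArrayList<>(pipeline);                      // usual multi-datanode pipeline
    }

    // Next path for the background replicator, or null if nothing is pending.
    String nextToReplicate() { return pendingReplication.poll(); }
}

class WritePlannerDemo {
    public static void main(String[] args) {
        WritePlanner withLocalDn = new WritePlanner("dn-local");
        System.out.println(withLocalDn.targetsFor("/test/myfile", Arrays.asList("dn1", "dn2", "dn3"))); // [dn-local]
        System.out.println(withLocalDn.nextToReplicate()); // /test/myfile
    }
}
```

The trade-off is that the upload returns as soon as one copy exists, so the file is briefly held by a single datanode until background replication catches up.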

Datanodes Notifications

If a Datanode disconnects from the P2P cloud, a notification is raised by the peer monitor and sent to the Namenode which will remove it from the hosts map.

Full list of modified classes

————————————————————————————————————————————————————

Roadmap

Authors:

This is the roadmap page


 

Jxtadoop Admin

2012-01-26

Future work (thoughts)

  1. Re-include the removed components (balancer, secondary namenode, jetty server)
  2. Re-work the code to use the Hadoop 1.0.0 branch
 
Last edit: Jxtadoop Admin 2012-01-28
——————————————————————————————————————————————————————

Instructions

Authors:

Instructions to start the Namenode & Datanodes

Namenode

1/ Set the JAVA_HOME environment variable
2/ Unzip the jxtadoop-datanode-x.y.z.zip to the target directory
   + chmod the executables in the bin/ directory
3/ Edit etc/hdfs-p2p.xml and set the following 2 properties:
   hadoop.p2p.rpc.rdv
   hadoop.p2p.rpc.relay
   Note that these 2 properties are mandatory even when all nodes are on the same multicast network, to avoid issues with multiple namenodes running on the same network.
4/ Initialize the namenode:
   > bin/hadoop namenode -format
5/ Start up the namenode:
   > bin/start-namenode.sh

Datanode

1/ Set the JAVA_HOME environment variable
2/ Unzip the jxtadoop-datanode-x.y.z.zip to the target directory
   + chmod the executables in the bin/ directory
3/ Edit etc/hdfs-p2p.xml and set the following 2 properties:
   hadoop.p2p.rpc.rdv
   hadoop.p2p.rpc.relay
4/ Start up the datanode:
   > bin/start-datanode.sh
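Since both the Namenode and Datanode steps require the two `hadoop.p2p.rpc.*` properties, a small pre-flight check can catch a missing entry before start-up. This sketch assumes `etc/hdfs-p2p.xml` uses the standard Hadoop layout (a `configuration` root holding `property` elements with `name` and `value` children); `P2pConfigCheck` is a hypothetical helper, and because the value format for the rendez-vous and relay entries is not documented here, the demo only uses placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker for the two mandatory properties.
final class P2pConfigCheck {
    static final List<String> REQUIRED = Arrays.asList("hadoop.p2p.rpc.rdv", "hadoop.p2p.rpc.relay");

    // Reads Hadoop-style <property><name/><value/></property> entries into a map.
    static Map<String, String> load(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            if (name != null) props.put(name, childText(property, "value"));
        }
        return props;
    }

    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    // Mandatory properties that are absent or blank, in REQUIRED order.
    static List<String> missing(Map<String, String> props) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) out.add(key);
        }
        return out;
    }
}

class P2pConfigCheckDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
            + "<property><name>hadoop.p2p.rpc.rdv</name><value>PLACEHOLDER</value></property>"
            + "</configuration>";
        Map<String, String> props = P2pConfigCheck.load(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(P2pConfigCheck.missing(props)); // [hadoop.p2p.rpc.relay]
    }
}
```

Against a real installation, the same check would be fed `new java.io.FileInputStream("etc/hdfs-p2p.xml")` instead of the inline string.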

DFSClient

You can use the DFSClient as with standard Hadoop. For example:

bin/hadoop fs -mkdir /test
bin/hadoop fs -chmod 777 /test
bin/hadoop fs -put ~/tmp/myfile /test
bin/hadoop fs -get /test/myfile /tmp

Contact

Mail to : jxtadoop@besnard.mobi

Known issues

i1/ The JXTA layer may raise P2P exceptions when sockets are closed;

 

Reposted from: http://dheha.baihongyu.com/
