How the Program Works¶
This section describes, how is the distributed chat implemented and what technology is used.
Technology¶
The program is implemented in AsyncIO. It uses a few new techniques, so you need Python 3.7 or newer to run it.
All the network connections are resolved via Asyncio Streams. Every connection is TCP realized through the asyncio.StreamReader and asyncio.StreamWriter classes.
The whole program is asynchronous. Every node keeps 3 connections open at once.
Implementation¶
This section describes the inner mechanisms of the program.
The Leader Node¶
In order to avoid confusion, in this documentation the term leader will be used for the coordinating node of the chatroom and the term server will be used for the networking part of the node that listens to network connections. It means that every node has a server, however only one node in the chatroom is the leader.
The leader is the coordinator of the chat room – it receives the user messages from other nodes and broadcasts them to all other nodes in the chatroom.
The leader is the only node in the chatroom that actually knows and is connected to all the other nodes.
When the leader disconnects or dies, a new one must be elected. How it’s done it described in the next sections.
Node C sends message to the leader. The leader broadcasts it to the other nodes.
The Network Topology¶
The chatroom is defined as one network ring. This means that any node only knows its two neighbors, in the program called next and previous. Plus, every node knows the leader in order to sent it a message.
When a node disconnects or dies, the ring must be renewed. Let’s say, that node B has two neighbors – A and C. A is its previous node and C is its next node. It means that any communication in the ring goes from node A to B and then from B to C.
- When node
Blogs out, the following happens: - Both
AandCdetect, that their neighbor died and that the ring must be repaired.
- Both
Node
Cknows that when its previous node (B) dies, it needs to send a message to nodeB’s previous node, so thatB’s previous node knows who its new next node will be.- A message is sent by
Cto the ring. Every node in the way reads it and if its next node id dead, it connects to the initial sender of this message.
- A message is sent by
This is how the ring repairs itself. However, only one node can leave it at once.
Leader Election¶
This program uses the Chang-Roberts leader election algorithm to elect the leader node when it disconnects.
When the leader disconnects, first of all, the ring is renewed (the same way as described above).
Node A from the previous example is then the initiator of the election, because it knows the best when the ring is renewed and ready to operate again.
The node with the largest ID wins. The ID is basically just a string concatenation of its IP address and port.
The election message containing the leader candidate (the initial sender) is sent from A to the ring.
- When a node
Xreceives the election message with candidateYinside, there are 3 possible scenarios: - The ID of the node
Yis larger than the ID of the nodeX - It means that
Yremains the leader candidate.Xjust passes the election message on to the ring.
- The ID of the node
- The ID of the node
Xis larger than the ID of the nodeY - This means that the node
Xis the new leader candidate. It writes its ID to the election message and passes it on to the ring.
- The ID of the node
- The ID of
Yis the same as ID ofX - This means that the node
Xjust received the election message it sent when the message was passing with a lower candidate ID. NodeXjust won the election
- The ID of
When a node wins the election, it sends the elected message to the ring. The elected message tells the other nodes who the winner is. All the other nodes immediately open a new connection to the newly elected leader.
Logical Time¶
Any message sent in the program and any log message has a time stamp. This time stamp contains the logical time.
The logical time is calculated using Lamport’s algorithm.
- Basically, what happens, is:
- Any node starts with logical time 0
- When a process (a node) makes a significant action, its logical time is incremented by 1
- When a node
Xreceives a message from another node (Y), it synchronizes its logical time and increments it time(X) = max( time(X), time(Y) ) + 1
- When a node
Logging in¶
When a node wants to log in, it needs to be given an IP address and a port of any node that is already participating in the ring.
- The node
Xwants to log in and contacts the nodeY: Xopens the connection and sends a login message to the nodeY
Yreceives the message and crafts and sends the answer. The answer contains: IP and port of theY’s next node and IP and port of the leader.
Xreceives the answer. It connects to theY’s next node, which becomesX’s next node. It also connects to the leader.
Yconnects toXasXbecomes the new next node ofY
Shortly said, the new node goes in front of the node it contacts during the login process.
Message Types¶
- There are various message types in the program:
user_message: chat message sent by another user (read fromstdin)election_message: message used during the elections, containing the candidatelogin_message: message sent by the node that wants to log inprev_inform_message: previous node is dead, inform its previous node about address and port to connect toi_am_prev_message: message sent by new prev node to its next nodehello_leader_message: let the leader know about new nodeelected_message: leader is already elected