How the Program Works¶
This section describes, how is the distributed chat implemented and what technology is used.
Technology¶
The program is implemented in AsyncIO. It uses a few new techniques, so you need Python 3.7 or newer to run it.
All the network connections are resolved via Asyncio Streams
. Every connection is TCP realized through the asyncio.StreamReader
and asyncio.StreamWriter
classes.
The whole program is asynchronous. Every node keeps 3 connections open at once.
Implementation¶
This section describes the inner mechanisms of the program.
The Leader Node¶
In order to avoid confusion, in this documentation the term leader will be used for the coordinating node of the chatroom and the term server will be used for the networking part of the node that listens to network connections. It means that every node has a server, however only one node in the chatroom is the leader.
The leader is the coordinator of the chat room – it receives the user messages from other nodes and broadcasts them to all other nodes in the chatroom.
The leader is the only node in the chatroom that actually knows and is connected to all the other nodes.
When the leader disconnects or dies, a new one must be elected. How it’s done it described in the next sections.
The Network Topology¶
The chatroom is defined as one network ring. This means that any node only knows its two neighbors, in the program called next and previous. Plus, every node knows the leader in order to sent it a message.
When a node disconnects or dies, the ring must be renewed. Let’s say, that node B
has two neighbors – A
and C
. A
is its previous node and C
is its next node. It means that any communication in the ring goes from node A
to B
and then from B
to C
.
- When node
B
logs out, the following happens: - Both
A
andC
detect, that their neighbor died and that the ring must be repaired.
- Both
Node
C
knows that when its previous node (B
) dies, it needs to send a message to nodeB
’s previous node, so thatB
’s previous node knows who its new next node will be.- A message is sent by
C
to the ring. Every node in the way reads it and if its next node id dead, it connects to the initial sender of this message.
- A message is sent by
This is how the ring repairs itself. However, only one node can leave it at once.
Leader Election¶
This program uses the Chang-Roberts leader election algorithm to elect the leader node when it disconnects.
When the leader disconnects, first of all, the ring is renewed (the same way as described above).
Node A
from the previous example is then the initiator of the election, because it knows the best when the ring is renewed and ready to operate again.
The node with the largest ID wins. The ID is basically just a string concatenation of its IP address and port.
The election message containing the leader candidate (the initial sender) is sent from A
to the ring.
- When a node
X
receives the election message with candidateY
inside, there are 3 possible scenarios: - The ID of the node
Y
is larger than the ID of the nodeX
- It means that
Y
remains the leader candidate.X
just passes the election message on to the ring.
- The ID of the node
- The ID of the node
X
is larger than the ID of the nodeY
- This means that the node
X
is the new leader candidate. It writes its ID to the election message and passes it on to the ring.
- The ID of the node
- The ID of
Y
is the same as ID ofX
- This means that the node
X
just received the election message it sent when the message was passing with a lower candidate ID. NodeX
just won the election
- The ID of
When a node wins the election, it sends the elected message to the ring. The elected message tells the other nodes who the winner is. All the other nodes immediately open a new connection to the newly elected leader.
Logical Time¶
Any message sent in the program and any log message has a time stamp. This time stamp contains the logical time.
The logical time is calculated using Lamport’s algorithm.
- Basically, what happens, is:
- Any node starts with logical time 0
- When a process (a node) makes a significant action, its logical time is incremented by 1
- When a node
X
receives a message from another node (Y
), it synchronizes its logical time and increments it time(X) = max( time(X), time(Y) ) + 1
- When a node
Logging in¶
When a node wants to log in, it needs to be given an IP address and a port of any node that is already participating in the ring.
- The node
X
wants to log in and contacts the nodeY
: X
opens the connection and sends a login message to the nodeY
Y
receives the message and crafts and sends the answer. The answer contains: IP and port of theY
’s next node and IP and port of the leader.
X
receives the answer. It connects to theY
’s next node, which becomesX
’s next node. It also connects to the leader.
Y
connects toX
asX
becomes the new next node ofY
Shortly said, the new node goes in front of the node it contacts during the login process.
Message Types¶
- There are various message types in the program:
user_message
: chat message sent by another user (read fromstdin
)election_message
: message used during the elections, containing the candidatelogin_message
: message sent by the node that wants to log inprev_inform_message
: previous node is dead, inform its previous node about address and port to connect toi_am_prev_message
: message sent by new prev node to its next nodehello_leader_message
: let the leader know about new nodeelected_message
: leader is already elected