import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

// Undirected graph stored as an array of adjacency lists.
public class Graph {
    private final int V;                 // number of vertices
    private int E;                       // number of edges
    private LinkedList<Integer>[] adj;   // adj[v] = vertices adjacent to v

    @SuppressWarnings("unchecked")
    public Graph(int V) {
        this.V = V;
        this.E = 0;
        adj = (LinkedList<Integer>[]) new LinkedList[V];
        for (int v = 0; v < V; v++) {
            adj[v] = new LinkedList<Integer>();
        }
    }

    public int V() { return V; }

    public int E() { return E; }

    public void addEdge(int v, int w) {  // add undirected edge v-w
        adj[v].add(w);
        adj[w].add(v);
        E++;
    }

    public Iterable<Integer> adj(int v) { return adj[v]; }
}

// Depth-first search from a source vertex s. For every vertex reachable from s
// it records the last edge used to reach it (edgeTo), so paths can be rebuilt.
class DepthFirstPaths {
    private final boolean[] marked;  // marked[v] = is there an s-v path?
    private final int[] edgeTo;      // edgeTo[v] = previous vertex on a path from s to v
    private final int s;             // source vertex

    public DepthFirstPaths(Graph G, int s) {
        marked = new boolean[G.V()];
        edgeTo = new int[G.V()];
        this.s = s;
        dfs(G, s);
    }

    private void dfs(Graph G, int v) {
        marked[v] = true;
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                edgeTo[w] = v;
                dfs(G, w);
            }
        }
    }

    public boolean hasPathTo(int v) {
        return marked[v];
    }

    public Iterable<Integer> pathTo(int v) {
        if (!hasPathTo(v))
            return null;
        // Walk back from v to s along edgeTo[]; pushing onto the deque means
        // iteration later runs from the source s to the target v.
        Deque<Integer> path = new ArrayDeque<Integer>();
        for (int x = v; x != s; x = edgeTo[x]) {
            path.push(x);
        }
        path.push(s);
        return path;
    }
}
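A quick usage sketch for the two classes above; the vertices and edges are just sample data:

// Build a small undirected graph and print the DFS path from vertex 0 to vertex 5.
public class DFSDemo {
    public static void main(String[] args) {
        Graph G = new Graph(6);
        G.addEdge(0, 1);
        G.addEdge(0, 2);
        G.addEdge(2, 3);
        G.addEdge(3, 4);
        G.addEdge(4, 5);

        DepthFirstPaths search = new DepthFirstPaths(G, 0);
        if (search.hasPathTo(5)) {
            StringBuilder sb = new StringBuilder("0 to 5: ");
            for (int v : search.pathTo(5)) {
                sb.append(v).append(' ');
            }
            System.out.println(sb.toString().trim());  // prints "0 to 5: 0 2 3 4 5"
        }
    }
}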
2. Review
Chapter 1 of Distributed Systems for Fun and Profit!
Three major abstractions:
System model (asynchronous / synchronous)
Failure model (crash-fail, partitions, Byzantine)
Consistency model (strong, eventual)
Replication and partitioning
(Replica shards are a crucial part of Elasticsearch.)
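As a rough illustration of the partitioning side, here is a toy sketch of hash-based shard routing; the class and the hash function are made up for illustration (Elasticsearch itself routes with a murmur3 hash of the document's routing key):

// Toy sketch of hash-based partitioning: map a document's routing key to one of
// a fixed number of primary shards; each primary is then copied to replica shards.
public class ShardRouter {
    private final int numPrimaryShards;

    public ShardRouter(int numPrimaryShards) {
        this.numPrimaryShards = numPrimaryShards;
    }

    public int shardFor(String routingKey) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(routingKey.hashCode(), numPrimaryShards);
    }
}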
3. Tip
Delete unused code to simplify the code base and avoid confusing other engineers!
4. Share
Constant Time Suffices
(Constant-time requirements)
The persistent data structure used in messaging systems is often a per-consumer queue with an associated B-tree or other general-purpose random-access data structure to maintain metadata about messages. B-trees are the most versatile data structure available, and they make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system. They do come with a fairly high cost, though: B-tree operations are O(log N). Normally O(log N) is considered essentially equivalent to constant time, but this is not true for disk operations. Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time, so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead. Since storage systems mix very fast cached operations with very slow physical disk operations, the observed performance of tree structures is often superlinear as data increases with a fixed cache: doubling your data makes things much worse than twice as slow.
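To put rough numbers on this, here is a small back-of-the-envelope sketch; the fan-out, data size, and caching assumptions are invented for illustration, not taken from any particular system:

// Back-of-the-envelope cost of uncached B-tree lookups given 10 ms disk seeks.
public class SeekCostEstimate {
    public static void main(String[] args) {
        double seekMs = 10.0;                // one random disk seek, as quoted above
        int fanout = 100;                    // assumed B-tree branching factor
        long messages = 1_000_000_000L;      // assumed one billion stored entries

        // Tree height: log base fanout of the number of entries (about 5 here).
        int height = (int) Math.ceil(Math.log(messages) / Math.log(fanout));

        // Worst case with nothing cached: every level costs one seek.
        System.out.printf("height ~ %d levels, uncached lookup ~ %.0f ms%n",
                height, height * seekMs);
        // Even if the top levels stay cached, a couple of seeks per lookup caps a
        // single disk at only a few hundred such operations per second.
    }
}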
Intuitively, a persistent queue could be built on simple reads and appends to files, as is commonly the case with logging solutions. This structure has the advantage that all operations are O(1) and reads do not block writes or each other. This has obvious performance advantages, since performance is completely decoupled from the data size: one server can now take full advantage of a number of cheap, low-rotational-speed 1+ TB SATA drives. Though they have poor seek performance, these drives have acceptable performance for large reads and writes, and come at 1/3 the price and 3x the capacity.
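A minimal sketch of that idea, with length-prefixed records appended to one file and reads addressed by byte offset; this only illustrates the O(1) access pattern and is not how Kafka actually lays out its log segments:

import java.io.IOException;
import java.io.RandomAccessFile;

// Append-only log: appends always go to the end of the file, and a consumer
// reads a record back by the byte offset at which it was written.
public class AppendOnlyLog {
    private final RandomAccessFile file;

    public AppendOnlyLog(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Append one record (length prefix + payload) and return its starting offset.
    public synchronized long append(byte[] payload) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeInt(payload.length);
        file.write(payload);
        return offset;
    }

    // Read back the record that starts at the given offset.
    public synchronized byte[] read(long offset) throws IOException {
        file.seek(offset);
        int length = file.readInt();
        byte[] payload = new byte[length];
        file.readFully(payload);
        return payload;
    }
}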
Having access to virtually unlimited disk space without any performance penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of attempting to delete messages as soon as they are consumed, we can retain messages for a relatively long period (say a week). This leads to a great deal of flexibility for consumers, as we will describe.