ARTS WEEK 3

  1. algorithm

无向图深度优先搜索找路径

public class Graph
{ private final int V; 
private int E; 
private LinkedList[] adj; 
private int edgeTo[]; 
private boolean marked[];
private final int s;

 public Graph(int V){ 
 this.V=V; 
 this.E=0; 
 adj=new LinkedList[V];
        for(int v=0;v<V;v++){
        adj[v]=new LinkedList<Integer>();
    }
}

public DepthFirstPaths(Graph s, int s){
marked=new boolean[G.V()];
edgeTo=new int[G.V()];
this.s=s;
dfs(G,s);
}

private int dfs(int v,Graph s){
    marked[v]=1;
    for(int w:G.adj[v]){
        if(!marked[w]){
            edgeTo[w]=v;
            dfs(G,w);
        }
    }
}

public boolean hasPathTo(int v)
{
return marked[v];
}
public Iterable<Integer> pathTo(int v){
if(!hasPathTo(v))
return null;

Stack<Integer> path=new Stack<Integer>();
for(int x=v;x!=s;x=edgeTo[x]){
path.push(x);
}
path.push(s);
return path;
}
}

2. review

chapter 1 of distributed systems for fun and profit!

Three major abstractions:

  • System model (asynchronous / synchronous)

  • Failure model (crash-fail, partitions, Byzantine)

  • Consistency model (strong, eventual)

Replication and partition

(replica shards is a crucial part of ElasticSearch.)

3. Tip

Delete unused code to simplify the code base and not to confuse other engineers!

4. Share

Constant Time Suffices

常数时间需求

The persistent data structure used in messaging systems are often a per-consumer queue with an associated BTree or other general-purpose random access data structures to maintain metadata about messages. BTrees are the most versatile data structure available, and make it possible to support a wide variety of transactional and non-transactional semantics in the messaging system. They do come with a fairly high cost, though: Btree operations are O(log N). Normally O(log N) is considered essentially equivalent to constant time, but this is not true for disk operations. Disk seeks come at 10 ms a pop, and each disk can do only one seek at a time so parallelism is limited. Hence even a handful of disk seeks leads to very high overhead. Since storage systems mix very fast cached operations with very slow physical disk operations, the observed performance of tree structures is often superlinear as data increases with fixed cache--i.e. doubling your data makes things much worse than twice as slow.

Intuitively a persistent queue could be built on simple reads and appends to files as is commonly the case with logging solutions. This structure has the advantage that all operations are O(1) and reads do not block writes or each other. This has obvious performance advantages since the performance is completely decoupled from the data size—one server can now take full advantage of a number of cheap, low-rotational speed 1+TB SATA drives. Though they have poor seek performance, these drives have acceptable performance for large reads and writes and come at 1/3 the price and 3x the capacity.

Having access to virtually unlimited disk space without any performance penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of attempting to delete messages as soon as they are consumed, we can retain messages for a relatively long period (say a week). This leads to a great deal of flexibility for consumers, as we will describe.

消息系统数据结构,一般为一个消费者一个队列,一队列有一个关联的B树/其他随机获得数据的数据结构。B树灵活多变,时间复杂度为O(logN),尽管。一般情况认为这就是线性的时间复杂度,但在磁盘操作中并不是这样。磁盘中观察到的树形结构中的性能曲线会是超线性的,数据变为两倍,时间不止是过去的两倍。

另一种直觉想法是可以直接用简单的读操作+把数据拼接到文件上。好处是操为O(1),读写互相不会阻碍对方。性能上把数据大小和时间复杂度解耦。

当我们这样做时,我们就有可视作无限的磁盘空间&没有性能问题。这使得我们可以提供一些消息系统中少见的功能,比如,我们在消费后不会立即删除,而可以保存相对长的时间。这给予了客户不错的灵活性。

Last updated