Wednesday, April 8, 2015

Lessons from "A Conversation with Werner Vogels"

Source:
http://queue.acm.org/detail.cfm?id=1142065

Growth impact:
  Larger data sets, faster update rates, more requests, more services, tighter SLAs (service-level agreements), more failures, more latency challenges, more service interdependencies, more developers, more documentation, more programs, more servers, more networks, more data centers.

Progress:
  Many complex pieces of software had been combined into a single system that could no longer evolve. Parts that needed to scale independently were tied to shared resources with other, unknown code paths. There was no isolation and, as a result, no clear ownership.
  Service orientation means encapsulating the data together with the business logic that operates on it, with the only access through a published service interface.
  The platform grew into hundreds of services plus a number of application servers that aggregate information from those services.
  It moved from a two-tier monolith to a fully distributed, decentralized services platform serving many different applications.

Outcomes:
  Strict service orientation is an excellent technique to achieve isolation; you come to a level of ownership and control that was not seen before.
  By prohibiting direct database access by clients, you can make scaling and reliability improvements to your service state without involving your clients.
  If you want to insert advanced infrastructure techniques such as decentralized request routing or distributed request tracking, you need a single unified service-access mechanism.
  The development and operational processes have benefited greatly from using services.
  Giving developers operational responsibilities has greatly enhanced the quality of the services.
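A minimal Python sketch of "data behind a published interface", assuming a hypothetical `InventoryService` that does not come from the interview. Clients call only the public methods; the backing dict is internal (by Python's leading-underscore convention), so the owning team could later move it to a sharded or replicated store without any client changing.

```python
class InventoryService:
    """Hypothetical service: owns its data and exposes it only via methods."""

    def __init__(self) -> None:
        # Internal state. No client reads or writes this directly, so the
        # service team can change how it is stored without breaking callers.
        self._stock: dict[str, int] = {}

    def add_stock(self, sku: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def reserve(self, sku: str, quantity: int) -> bool:
        # Business logic lives next to the data it protects.
        available = self._stock.get(sku, 0)
        if quantity <= 0 or quantity > available:
            return False
        self._stock[sku] = available - quantity
        return True

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)


svc = InventoryService()
svc.add_stock("book", 3)
print(svc.reserve("book", 2), svc.available("book"))  # True 1
```

In a real deployment the methods would sit behind a network API rather than an in-process class; the point is only that the interface, not the storage, is the contract.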

Headaches:
  How do you make sure that developers are productive?
  How can you make sure that all the pieces work together as intended, now and in the future?
  How do you test in this environment?

How to make decisions:
  Scope and prototype the idea quickly. New ideas are enabled through the loosely coupled services model.

How to measure:
  Have a good understanding of how customers interact with the site.
  Most of our developers are in the loop with customers.

Integrating with partners:
  Provide branded websites for general retail partners.

Tuesday, April 7, 2015

Fallacies of distributed computing

Good reading material on common problems in distributed computing.

http://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology does not change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
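The first two fallacies have a direct consequence for client code: every remote call needs a failure path. Below is a minimal Python sketch of retrying with capped exponential backoff and jitter. `call_with_retries` is a hypothetical helper; it assumes the operation is idempotent (safe to repeat) and signals transient failure by raising `ConnectionError` or `TimeoutError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient network errors.

    The wait before retry n is a random value in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"), so clients
    that failed at the same moment do not all retry at the same moment.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: let the caller handle the failure
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Passing `sleep` in as a parameter keeps the helper testable. A real client would also put a timeout on each individual call, because a request that never returns and never fails is not caught by this loop.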

http://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/


  • Distributed systems are different because they fail often.
  • Writing robust distributed systems costs more than writing robust single-machine systems.
  • Robust, open source distributed systems are much less common than robust, single-machine systems.
  • Coordination is very hard.
  • If you can fit your problem in memory, it’s probably trivial.
  • “It’s slow” is the hardest problem you’ll ever debug.
  • Implement backpressure throughout your system.
  • Find ways to be partially available. 
  • Metrics are the only way to get your job done.
  • Use percentiles, not averages.
  • Learn to estimate your capacity.
  • Feature flags are how infrastructure is rolled out.
  • Choose id spaces wisely.
  • Exploit data-locality ?
  • Writing cached data back to persistent storage is bad. 
  • Computers can do more than you think they can ?
  • Use the CAP theorem to critique systems.
  • Extract services.
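Two of the points above, "use percentiles, not averages" and "implement backpressure", are concrete enough to sketch. The Python below is illustrative only: `percentile` uses the nearest-rank definition, `BoundedWorkQueue` is a hypothetical name, and the latency numbers are made up to show how a mean can describe no real request.

```python
import math
import queue
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


class BoundedWorkQueue:
    """Backpressure: refuse new work when full instead of buffering forever."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")  # maxsize=0 would be unbounded
        self._items = queue.Queue(maxsize=capacity)

    def offer(self, item: str) -> bool:
        try:
            self._items.put_nowait(item)
            return True
        except queue.Full:
            return False  # caller must slow down, retry later, or shed load

    def take(self) -> str:
        return self._items.get_nowait()


# 95 fast requests and 5 very slow ones (latencies in ms, made-up data).
latencies = [10.0] * 95 + [2000.0] * 5
print(statistics.fmean(latencies))  # 109.5: matches no real request
print(percentile(latencies, 50), percentile(latencies, 99))  # 10.0 2000.0
```

The mean sits between the two groups and hides both; p50 shows the typical request and p99 exposes the slow tail. Returning `False` from `offer` pushes the overload signal back to the producer instead of letting an unbounded queue grow until memory runs out.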

Tuesday, February 4, 2014

Reflections after one year at work

As of today I have worked at Pinterest for a full year, and looking back, I have gained a lot this year.

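Why did I choose Pinterest in the first place? Frankly, I wasn't good enough to get an offer from Facebook. The Facebook offers my classmates got back then have all doubled in value by now, while Pinterest's valuation hasn't grown yet. If Pinterest fails in the future, I will certainly have some regrets.
When I was interviewing, the places I most wanted to join were actually Quora and Dropbox, because the people who interviewed me were incredibly impressive. Looking back, Quora really doesn't seem that interesting, but Dropbox is still very attractive.
Apart from those, the company I was most interested in was Pinterest, so I came here.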

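About my colleagues: I have always believed that working with smart people leads to success wherever you are, which is also why Quora and Dropbox appealed to me. At first I didn't know many of the strong people at Pinterest, but after joining I was deeply impressed by my manager and my colleagues.
To the outside world, probably Pinterest's best-known employee is Tracy Chou: beautiful, smart, and conscientious. But now I think the most outstanding people are my manager Hui Xu and my mentor Charles. Hui Xu was already a star at Google: he made Staff Eng in a short time and worked on Google's realtime index. He reviews my designs and code very carefully. At first I thought he was strict to the point of nitpicking, but now that he no longer reviews my code, I often catch myself thinking how great it would be if he still could.
Charles came from Amazon and later worked at IMDb and A9, both Amazon subsidiaries; he is a very strong engineer. He understands distributed systems deeply and holds code and projects to a very high standard. Reading code he has written is a pleasure and has helped me improve a lot. Thanks to Hui's and Charles's guidance, I sometimes feel that being a workaholic at Pinterest is a pleasure.
Also, I honestly find working here much more tiring than doing a PhD at UCLA; maybe I just wasn't diligent enough back then.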

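I have gained a lot, mainly for two reasons: the team is small, and I sleep very little. Because the team is small: when I joined, our team had just finished writing the search infrastructure and was starting on the recommendation system. As the newcomer, I was assigned to maintain and update our search infrastructure, so I got to touch many projects: autocomplete, search quality improvement, search query classification, and search internationalization, plus realtime search and the new search infrastructure, which I am still working on today. Our little search team has gradually grown to four people, and I have gone from a wet-behind-the-ears kid to a wet-behind-the-ears kid who has to face problems and make decisions on his own. Both my ability to finish projects and my ability to solve problems have improved a lot.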


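I have many hopes for the future, the biggest being that Pinterest goes public successfully within the next three or four years. To help reach that goal, I need to pitch Pinterest to the strong engineers I know; my big short-term goal is to bring Hu Botao (胡伯涛) and Guo Huayang (郭华阳) to the company. I'll leave recruiting more senior engineers to the senior engineers. One more thing: I need to learn to manage my money. I barely saved anything over the past half year, and now that it's time to go home I find I'm short of cash. Really tight. I hope my finances this year won't be so bad.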

Friday, January 17, 2014

Almost a year

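I've been working for almost a year and have a few thoughts. I meant to wait until the one-year mark to write a summary, but as an Aries I'm impatient by nature. Today I started tidying up my bookmarks, and since I can't finish reading everything I want to read, I couldn't help saying something.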

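Almost every day I open Feedly to refresh it; I subscribe to 49 sources in total. I used to subscribe to even more on Google Reader, because when I was younger I followed many blogs by actresses and "goddesses", but they stopped writing long ago, so when I migrated to Feedly I deleted them all. Of the remaining 49 sources, 2 are about anime, 3 or 4 are IT news, and the rest are tech blogs. I'll recommend a few once I hit one year.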

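I usually just skim tech blogs, because some posts are too theoretical and others cover new technologies I can't use. When I find a good one, I save it to my bookmarks. Today I noticed my bookmarks had piled up, mostly things I planned to read but never did, so I had to come here and vent.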

Tidying my bookmarks today, I found:

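5 about guitar, yet I still haven't learned the four basic chords.
About 5 pages on JavaScript, but no big project needs it, so I'll have to relearn it whenever I want to use it.
Only 11 pages in my Pinterest internal folder; most of the other frequently used pages I've memorized.
I want to learn Go: I bought a book and bookmarked 10 pages. Right now I can basically read it, but writing it myself still means looking up lots of documentation.
14 pages on MTG, PAD, For Fun and the like. My top recommendation is The Coding Love, and every Monday I look at Rich Kids of Instagram for motivation.
Besides Feedly, I've bookmarked a few IT news sites.
For research, I've read only 19 papers carefully, and I haven't mastered any of them.
For engineering, I've read 43 articles carefully, and I couldn't implement any of them.
There are still 40+ bookmarked items I haven't had time to study. In my first week I might bookmark one item a week; now I often bookmark several a day. By the look of it, this queue will only grow, never shrink.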


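After writing this, I realize the only things I've recommended are entertainment sites like The Coding Love and Rich Kids of Instagram. Overall verdict: I'm too weak, so these are all I can recommend.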

Wednesday, June 12, 2013

[Linux] HBase shell

# General commands:
help 'command', status, list, describe 'tablename'
*status shows the basic status of the cluster:
status 'simple'
status 'detailed'

# Create tables:
create 'tablename', {NAME=>'family1'}, {NAME=>'family2'}
create 'tablename', 'family1', 'family2'

# Scan table:
scan 'tablename'
*Restrict columns: {COLUMNS=>'family:'}, {COLUMNS=>['c1', 'c2']}
*Restrict number of results: {LIMIT=>number}
*Restrict start / stop row: {STARTROW=>'start row key', STOPROW=>'stop row key'} (the start row is inclusive, the stop row exclusive)

# Count:
count 'tablename', 5000
*Here 5000 means the running count is reported every 5000 rows.

# Delete:
delete 'table', 'rowkey', 'family:column'
*Deletes a single cell.
deleteall 'table', 'rowkey'
*Deletes an entire row.
*Delete all rows in the table (disables, drops and recreates it):
truncate 'tablename'

# Remove table:
*You must disable a table before you can drop it.
disable 'tablename'
drop 'tablename'

# Change column family:
*Disable the table first, alter the column family (here, deleting it), then re-enable the table.
disable 'tablename'
alter 'tablename', {NAME=>'family', METHOD=>'delete'}
enable 'tablename'

- HBase data is stored as HFiles in HDFS.
- Each HFile holds sorted key/value pairs plus an index of keys.
- Path layout: /hbase/tablename/region/column-family
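A toy Python model of the "sorted key/value pairs plus an index" point. It is not HBase's code or file format: real HFiles store full cell keys (row, family, qualifier, timestamp), multi-level block indexes and more, and `ToyHFile` with its methods is hypothetical. It only shows why sorted storage plus a key index makes both point reads and `scan` with STARTROW/STOPROW cheap: a binary search over the index picks one block, and a range scan just reads forward from there.

```python
from __future__ import annotations

import bisect
from typing import Iterator, Optional


class ToyHFile:
    """Toy in-memory model: key/value pairs sorted by key, grouped into
    blocks, plus an index holding the first key of each block."""

    def __init__(self, pairs: dict[bytes, bytes], block_size: int = 2) -> None:
        if block_size < 1:
            raise ValueError("block_size must be >= 1")
        items = sorted(pairs.items())  # written in key order
        self._blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        self._index = [block[0][0] for block in self._blocks]  # first key of each block

    def _block_for(self, key: bytes) -> int:
        # Binary-search the index for the last block whose first key <= key.
        return max(bisect.bisect_right(self._index, key) - 1, 0)

    def get(self, key: bytes) -> Optional[bytes]:
        if not self._blocks:
            return None
        for k, v in self._blocks[self._block_for(key)]:  # read only one block
            if k == key:
                return v
        return None

    def scan(self, start: bytes = b"", stop: Optional[bytes] = None) -> Iterator[tuple[bytes, bytes]]:
        """Yield pairs with start <= key < stop, like STARTROW/STOPROW."""
        if not self._blocks:
            return
        for block in self._blocks[self._block_for(start):]:
            for k, v in block:
                if k < start:
                    continue
                if stop is not None and k >= stop:
                    return
                yield k, v


hfile = ToyHFile({b"row3": b"c", b"row1": b"a", b"row2": b"b", b"row5": b"e"})
print(hfile.get(b"row3"))                  # b'c'
print(list(hfile.scan(b"row2", b"row5")))  # [(b'row2', b'b'), (b'row3', b'c')]
```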

[Linux] Screen command

Use screen on a remote machine so your work keeps running after you disconnect.

screen [-AmRvx -ls -wipe] [-d name] [-h lines] [-r name] [-s] [-S name]

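screen -r pid.name : reattach to a detached screen session.
screen -d -r pid.name : detach the session wherever it is attached (kicking off the other user), then reattach it here.
screen -ls : list sessions.
screen -m : start a new session, even when run from inside screen.
screen -dm : start a new session in detached mode.
screen -r pid.name -p number_or_name : reattach and preselect a window by number or name.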

Ctrl + a + d : detach from screen; the session and its processes keep running.
exit : close the current window's shell (the session ends when its last window closes).

Ctrl + a + c : create a new window.
Ctrl + a + w : list windows.
Ctrl + a + n : next window.
Ctrl + a + p : previous window.
Ctrl + a + 0 ~ 9 : switch between window 0 to 9.
Ctrl + a + k : kill the current window and switch to another one.

Sunday, February 10, 2013

[Work] Using Git

Working at a small company, you need git to commit your code and collaborate with other people.
Here is a list of commonly used commands.

Clone the main repository to your workspace:
git clone git@github.p.com:user/project.git

Add your fork:
git remote add myname git@github.p.com:user/project.git

Reset to latest code by running:
git checkout master
git fetch origin
git reset --hard origin/master

Set up your branch (create it and switch to it):
git checkout -b your_branch_name

Make changes to your code and push them to your fork for others to see:
git push myname your_branch_name

Hacker way, commit often:
git commit -a -m "message"

Edit the commits made since master (reorder, squash, reword):
git rebase -i master

Set git's default editor and global user information:
git config --global user.name "your name"
git config --global user.email "your email"
git config --global core.editor vim

Also useful:
git diff master
git pull

Other git commands:

git config --global user.name "robbin"   
git config --global user.email "fankai#gmail.com"
git config --global color.ui true
git config --global alias.co checkout
git config --global alias.ci commit
git config --global alias.st status
git config --global alias.br branch
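git config --global core.editor "mate -w"    # use TextMate as the editor
git config -l       # list all settings
The user's git config file is ~/.gitconfig.

Common Git commands: view, add, commit, delete, recover and reset changed files
git help <command>  # show help for a command
git show            # show the contents of a commit (HEAD by default)
git show $id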

git co -- <file>    # discard working-tree changes to <file>
git co .            # discard all working-tree changes

git add <file>      # stage changes to a working file
git add .           # stage all modified working files

git rm <file>       # remove a file from the repository and the working tree
git rm <file> --cached  # remove a file from the repository but keep it on disk

git reset <file>    # unstage <file>; working-tree changes are kept
git reset -- .      # unstage everything
git reset --hard    # restore the last committed state, discarding all changes made since

git ci <file>
git ci .
git ci -a           # stage modified and deleted tracked files and commit, i.e. git add, git rm and git ci in one step
git ci -am "some comments"
git ci --amend      # amend the last commit

git revert <$id>    # undo the changes of a commit; the revert itself is recorded as a new commit
git revert HEAD     # undo the last commit
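View file diffs
git diff <file>     # compare a working file with the staging area
git diff
git diff <$id1> <$id2>   # compare two commits
git diff <branch1>..<branch2> # compare two branches
git diff --staged   # compare the staging area with the last commit
git diff --cached   # same as --staged
git diff --stat     # show only summary statistics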
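View commit history
git log
git log <file>      # show every commit that touched <file>
git log -p <file>   # show the full diff of each of those commits
git log -p -2       # show the full diff of the last two commits
git log --stat      # show per-commit change statistics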
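tig: on a Mac you can use tig in place of diff and log (brew install tig)

Git local branch management: view, switch, create and delete branches
git br -r           # list remote branches
git br <new_branch> # create a new branch
git br -v           # show the last commit on each branch
git br --merged     # list branches already merged into the current branch
git br --no-merged  # list branches not yet merged into the current branch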

git co <branch>     # switch to a branch
git co -b <new_branch>  # create a new branch and switch to it
git co -b <new_branch> <branch>  # create <new_branch> based on <branch>

git co $id          # check out a past commit without a branch (detached HEAD); it is discarded when you switch to another branch
git co $id -b <new_branch>  # check out a past commit as a new branch

git br -d <branch>  # delete a branch
git br -D <branch>  # force-delete a branch (required when the branch has not been merged)
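Branch merging and rebase
git merge <branch>               # merge <branch> into the current branch
git merge origin/master --no-ff  # don't fast-forward, so a merge commit is always created

git rebase master <branch>       # replay <branch> onto master; same as the first two commands below
git co <branch> && git rebase master && git co master && git merge <branch>   # the last two then fast-forward master to <branch>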
Git patch management (handy for syncing work across several machines)
git diff > ../sync.patch         # create a patch
git apply ../sync.patch          # apply the patch
git apply --check ../sync.patch  # test whether the patch applies cleanly
Git stash management
git stash                        # stash current changes
git stash list                   # list all stashes
git stash apply                  # restore the most recent stash
git stash drop                   # delete the most recent stash
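Git remote branch management
git pull                         # fetch updates from the remote and merge the upstream branch into the current branch
git pull --no-ff                 # same, but never fast-forward
git fetch origin                 # fetch updates from the remote repository
git merge origin/master          # merge the remote master into the current local branch
git co --track origin/branch     # create a local branch that tracks a remote branch
git co -b <local_branch> origin/<remote_branch>  # create a local branch from a remote branch (same as above)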

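git push                         # push branches (before Git 2.0 the default pushed all matching branches; since 2.0, only the current branch)
git push origin master           # push local master to remote master
git push -u origin master        # push local master, creating remote master if missing (used to initialize a remote repository)
git push origin <local_branch>   # create a remote branch; origin is the remote's name
git push origin <local_branch>:<remote_branch>  # create a remote branch under a given name
git push origin :<remote_branch>  # delete a remote branch (delete the local one first with git br -d <branch>)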
Git remote repository management (GitHub)
git remote -v                    # show remote server URLs and names
git remote show origin           # show the status of the remote repository
git remote add origin git@github:robbin/robbin_site.git         # add a remote repository URL
git remote set-url origin git@github.com:robbin/robbin_site.git # change the remote repository URL
git remote rm <repository>       # remove a remote
Create a remote repository
git clone --bare robbin_site robbin_site.git  # create a bare repository from an existing project
scp -r my_project.git git@git.csdn.net:~      # upload the bare repository to the server

mkdir robbin_site.git && cd robbin_site.git && git --bare init # create a bare repository on the server
git remote add origin git@github.com:robbin/robbin_site.git    # set the remote repository URL
git push -u origin master                                      # first push from the client
git push -u origin develop  # first push of local develop to remote develop, with tracking

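git remote set-head origin master   # set origin/HEAD (your local record of the remote's default branch) to master
Tracking between remote and local branches can also be set explicitly:
git branch --set-upstream-to=origin/master master
git branch --set-upstream-to=origin/develop develop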