From Meta, a Wikimedia project coordination wiki
Figure 1. Word persistence example: words persisting between revisions of a Wikipedia article about apples.

Content persistence is a measurement of how content persists through the revision history of a wiki page. It rests on the assumption that content which survives a certain amount of time, or a certain number of subsequent revisions, does so because of some inherent quality of the content and its relevance to the article. This assumption follows from viewing the wiki publish-first, edit-later model as a case of informal peer review[1], in which low-quality contributions should be quickly removed or overwritten by subsequent edits. In this way, content persistence can be viewed as a generalization of revert rate.

Construction

The persistence of content through revisions of an article is generally determined by performing textual diffs between consecutive revisions and tracking the content that does not change. Figure 1 depicts words persisting between revisions of a toy example of an article about apples. The information obtained by performing a diff between the revisions might look as follows:

  • ∅-1: (insert: "Apples are red.")
  • 1-2: (equal: "Apples are ") (remove: "red") (insert: "blue") (equal: ".")
  • 2-3: (equal: "Apples are ") (remove: "blue") (insert: "red") (equal: ".")
  • 3-4: (equal: "Apples are ") (insert: "tasty and ") (equal: "red.")
  • 4-5: (equal: "Apples are tasty and ") (remove: "red") (insert: "blue") (equal: ".")
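Diff operations of this shape can be approximated with Python's standard library. The sketch below is only an illustration: the regex tokenizer and the `diff_ops` helper are hypothetical names introduced here, and `difflib.SequenceMatcher` stands in for whatever diff algorithm a real persistence tool uses.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Assumed tokenizer: runs of word characters, runs of whitespace,
    # and single punctuation characters each become one token.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def diff_ops(old_text, new_text):
    """Return (operation, text) pairs describing how old_text became new_text."""
    old, new = tokenize(old_text), tokenize(new_text)
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("equal", "".join(old[i1:i2])))
            continue
        # "replace" is reported as a removal followed by an insertion.
        if i2 > i1:
            ops.append(("remove", "".join(old[i1:i2])))
        if j2 > j1:
            ops.append(("insert", "".join(new[j1:j2])))
    return ops


print(diff_ops("Apples are red.", "Apples are blue."))
# [('equal', 'Apples are '), ('remove', 'red'), ('insert', 'blue'), ('equal', '.')]
```

Diffing at the token level rather than the character level keeps whole words together, which is what the later attribution step needs.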

By tracing this diff information, a data structure can be built that keeps track of discrete content items and attributes them to their original author. In order to turn text into discrete content items, a tokenizer is used to discover word boundaries. Once content is broken into tokens, identifiers can be associated with them so that they can be tracked through the history of a page. For example:

  1. (1, "Apples"), (1, " "), (1, "are"), (1, " "), (1, "red"), (1, ".")
  2. (1, "Apples"), (1, " "), (1, "are"), (1, " "), (2, "blue"), (1, ".")
  3. (1, "Apples"), (1, " "), (1, "are"), (1, " "), (1, "red"), (1, ".")
  4. (1, "Apples"), (1, " "), (1, "are"), (1, " "), (4, "tasty"), (4, " "), (4, "and"), (4, " "), (1, "red"), (1, ".")
  5. (1, "Apples"), (1, " "), (1, "are"), (1, " "), (4, "tasty"), (4, " "), (4, "and"), (4, " "), (5, "blue"), (1, ".")
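The attribution step can be sketched as follows. This is a minimal illustration and not the algorithm of any particular library: `track_tokens` is a hypothetical helper, the regex tokenizer is an assumption, `difflib.SequenceMatcher` is used as a stand-in diff engine, and a SHA-1 checksum of the full text is assumed as the way to detect a revision that exactly duplicates an earlier one.

```python
import hashlib
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Assumed tokenizer: words, whitespace runs, single punctuation marks.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def track_tokens(revisions):
    """Label every token with the revision that introduced it.

    revisions: list of (rev_id, text) in chronological order.
    Returns {rev_id: [(origin_rev_id, token), ...]}.
    """
    seen = {}      # text checksum -> labelled tokens of that exact text
    history = {}
    previous = []  # labelled tokens of the preceding revision
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if checksum in seen:
            # Exact duplicate of an earlier revision: reuse its labels.
            current = list(seen[checksum])
        else:
            new_tokens = tokenize(text)
            old_tokens = [token for _, token in previous]
            matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
            current = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    current.extend(previous[i1:i2])  # tokens keep their origin
                else:
                    current.extend((rev_id, t) for t in new_tokens[j1:j2])
            seen[checksum] = current
        history[rev_id] = current
        previous = current
    return history


history = track_tokens([
    (1, "Apples are red."),
    (2, "Apples are blue."),
    (3, "Apples are red."),
    (4, "Apples are tasty and red."),
    (5, "Apples are tasty and blue."),
])
print(history[5])
```

With this sketch, the space inserted together with "tasty and" is attributed to revision 4, and revision 3 recovers the labels of revision 1 through the checksum lookup rather than through the diff.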

In the last revision's list of tokens, the identifier 1 shows that "Apples are " was added in the first revision. It may be surprising that, in the same revision, "blue" is given a new identifier (5) rather than reusing the (2, "blue") seen in revision #2. This is because a diff alone cannot establish that a newly inserted word is the same contribution as an identical word that was removed earlier. In revision #3, by contrast, (1, "red") persisted. This is due to an identity revert, a revision that exactly duplicates a previous revision. Since the content is exactly duplicated, an algorithm can be sure that the tokens are the same ones.

Another way to view this set of words is to transform it into a token-major list that expresses which revisions contained the word. For example:

  • ("Apples", [1,2,3,4,5])
  • (" ", [1,2,3,4,5])
  • ("are", [1,2,3,4,5])
  • (" ", [1,2,3,4,5])
  • ("red", [1,3,4])
  • (".", [1,2,3,4,5])
  • ("blue", [2])
  • ("tasty", [4,5])
  • (" ", [4,5])
  • ("and", [4,5])
  • (" ", [4,5])
  • ("blue", [5])
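The transformation from revision-major to token-major form can be sketched as below. The integer token ids and the hand-written `revisions` table are hypothetical; they exist only so that two tokens with the same text (for example the several spaces, or the two separate "blue" insertions) remain distinguishable.

```python
# Hypothetical token ids mapped to their text, in order of first appearance.
TOKEN_TEXT = {
    0: "Apples", 1: " ", 2: "are", 3: " ", 4: "red", 5: ".",
    6: "blue", 7: "tasty", 8: " ", 9: "and", 10: " ", 11: "blue",
}

# Revision-major view: which token ids each revision contains, in order.
revisions = {
    1: [0, 1, 2, 3, 4, 5],
    2: [0, 1, 2, 3, 6, 5],
    3: [0, 1, 2, 3, 4, 5],
    4: [0, 1, 2, 3, 7, 8, 9, 10, 4, 5],
    5: [0, 1, 2, 3, 7, 8, 9, 10, 11, 5],
}


def token_major(revisions):
    """Map each token id to the sorted list of revisions containing it."""
    seen_in = {}
    for rev_id in sorted(revisions):
        for token_id in revisions[rev_id]:
            seen_in.setdefault(token_id, []).append(rev_id)
    return seen_in


for token_id, rev_ids in token_major(revisions).items():
    print((TOKEN_TEXT[token_id], rev_ids))
```

Because Python dictionaries preserve insertion order, the tokens come out in order of first appearance, which matches the ordering of the list above.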

From this list, it is easy to see how many revisions a given token persisted. For example, "Apples" was added in the first revision and appeared in 4 subsequent revisions. Under the assumption that subsequent revisions of the page represent informal review of the contents, one might assume that the token "Apples" was a high quality contribution to the article. However, this assumption falls flat with content that was only recently inserted into the article. This problem is commonly referred to as right censoring since, when time is plotted from left to right, the samples on the right side have less information. To state it simply, we need more revisions in the article before we can know if "tasty" was a good addition to the article or not. However, we can conclude quite confidently that the token "blue" that was added in revision #2 was not of high quality since it did not persist for a single revision (it was immediately reverted).
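A small sketch can make the censoring distinction concrete. The `persistence` helper is hypothetical: it counts the subsequent revisions in which a token appeared and flags the observation as right-censored when the token is still present in the latest known revision, since its eventual lifetime is then unknown.

```python
def persistence(revs_containing, latest_rev):
    """Return (subsequent_revisions_persisted, right_censored) for one token.

    revs_containing: sorted revision ids in which the token appears.
    latest_rev: id of the most recent revision of the page.
    """
    persisted = len(revs_containing) - 1        # exclude the inserting revision
    censored = revs_containing[-1] == latest_rev  # still alive, outcome unknown
    return persisted, censored


LATEST = 5
print(persistence([1, 2, 3, 4, 5], LATEST))  # "Apples": (4, True)
print(persistence([1, 3, 4], LATEST))        # "red": (2, False)
print(persistence([2], LATEST))              # first "blue": (0, False)
print(persistence([4, 5], LATEST))           # "tasty": (1, True)
```

Only the uncensored observations, such as the first "blue", give a final answer; for censored tokens the count is a lower bound.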

There's one additional issue to be concerned with: How much does whitespace matter? And for that matter, what about stop words (grayed out in Figure 1)? Recent research[2][3] has eliminated whitespace, stop words and other wiki markup when computing the value, quality and productivity of editors' work.
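Filtering of this kind might look like the sketch below. The stop-word list is a tiny illustrative assumption, not the list used in the cited research, and dropping punctuation-only tokens is an additional assumption standing in for the removal of wiki markup.

```python
# Illustrative stop-word list only; real studies use much larger lists.
STOP_WORDS = frozenset({"a", "an", "and", "are", "is", "of", "the", "to"})


def content_tokens(tokens):
    """Keep tokens that are neither whitespace, punctuation-only, nor stop words."""
    kept = []
    for token in tokens:
        if not any(ch.isalnum() for ch in token):
            continue  # whitespace or punctuation-only token
        if token.lower() in STOP_WORDS:
            continue
        kept.append(token)
    return kept


tokens = ["Apples", " ", "are", " ", "tasty", " ", "and", " ", "red", "."]
print(content_tokens(tokens))  # ['Apples', 'tasty', 'red']
```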

Metrics

  • Persistent Word Revision (PWR): The sum total of subsequent revisions persisted by the words in a revision. Halfaker et al. used this as a measure of productivity, a mixture of the quality and quantity of wiki work.[3]
  • PWR per Word (PWRpW): The average of the log number of revisions persisted by each word in a revision. Halfaker et al. used this as a measure of the quality of work performed by editors.[4] This indicator works under the assumption that content in Wikipedia is best thought of as randomly volatile and highly reviewed. Under this assumption, there should be an exponential decay (or a constant hazard function) in the probability of persistence of even the highest quality words, due to restructuring and edit creep.
  • Persistent Word Views (PWV): The sum of views that pages receive while a word appears in the article. Priedhorsky et al. used this metric as a measure of the value contributed by authors (assuming that an encyclopedia is meant to be read and highly viewed articles are of high value)[2].
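The three metrics can be sketched from per-word persistence counts. The function names and input shapes below are assumptions made for illustration. In particular, `log(1 + n)` is used so that words persisting zero revisions are defined; the source does not say how the cited work handles that case.

```python
import math


def pwr(persisted_counts):
    """Persistent Word Revisions: total subsequent revisions persisted."""
    return sum(persisted_counts)


def pwr_per_word(persisted_counts):
    """Mean log persistence per word (log(1 + n) is an assumption here)."""
    if not persisted_counts:
        return 0.0
    return sum(math.log(1 + n) for n in persisted_counts) / len(persisted_counts)


def pwv(revs_containing, views_while_current):
    """Persistent Word Views: views received while the word was in the article.

    views_while_current maps a revision id to the page views received
    while that revision was the current one (hypothetical input).
    """
    return sum(views_while_current.get(rev, 0) for rev in revs_containing)


# Revision 4 of the toy example added "tasty" and "and"; each has
# persisted for one subsequent revision so far.
print(pwr([1, 1]))            # 2
print(pwr_per_word([1, 1]))   # log(2), about 0.693

# Made-up view counts per revision, for illustration only.
views = {1: 10, 2: 5, 3: 20, 4: 7, 5: 3}
print(pwv([1, 3, 4], views))  # "red": 10 + 20 + 7 = 37
```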

Code

Openly licensed code for tracing content persistence through the history of revisions of an article is publicly available in the python-mwpersistence library (notes from February 2016 architectural discussion).

Services

The WikiWho API provides, for each token of the tokenized wikitext of an article at any given revision, the revision in which the token was originally added and all revisions in which the token was deleted or reinserted. This enables content persistence measurements of several kinds, e.g., aggregated per token or per editor in the article. For per-user aggregations of persisting content over all articles, see the secondary endpoint for edit persistence. Language editions available to date: EN, DE, ES, TR, EU.

Limitations

  • Only measures the quality of added content. Does not measure the quality of removals.
  • Content must be explicitly vetted by subsequent revisions in order to determine quality.

Usage

  • In their work on WikiTrust, Adler and de Alfaro use the implied review of words that last over time and revisions in articles to determine a "trustworthiness" score for content in Wikipedia[5][6].
  • A 2006 First Monday paper described the implementation of a similar coloring algorithm in MediaWiki, based on both the number of edits and the amount of time that a part of a Wikipedia article has survived.[7]
  • Priedhorsky et al. used the number of views articles receive while content persists to measure the value contributed by editors of Wikipedia[2]. Notably, they found that 0.1% of editors contribute ~40% of the value in the wiki (as of early 2007).
  • Halfaker et al. used the number of revisions that words added by an editor last in an article to approximate the quality of contributions[4]. They developed a metric for the average number of revisions that content added by an author lasts (PWRpW = Persistent Word Revisions per Word) and found it to be highly related to revert rate.
  • To control for the scale of contributions made by editors, Halfaker et al. used the amount of persistent word revisions contributed to measure the productivity of an editor in Wikipedia[3]. They found evidence that, despite a decrease in the rate of contributions following being reverted, editors were generally more productive, which they argue suggests a learning effect.
  • Research:Measuring edit productivity
  • A conference paper for CHI'13[8] reported on a project where 640 undergraduate and graduate students edited Wikipedia articles on scientific topics in 36 university courses. The authors found that the "students substantially improved the scientific content of over 800 articles, at a level of quality indistinguishable from content written by PhD experts", measured in a content persistence metric.
  • In 2017, Flöck et al. published the "Toktrack" dataset, described as containing "every instance of all tokens (≈ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances. Each token is annotated with (i) the article revision it was originally created in, and (ii) lists with all the revisions in which the token was ever deleted and (potentially) re-added and re-deleted from its article, enabling a complete and straightforward tracking of its history." The accompanying paper[9] presents various results about content persistence derived from this dataset. See also the summary in the Wikimedia Research Newsletter: "Who wrote this? A new dataset tracks the provenance of English Wikipedia text over 15 years".

References

  1. Stvilia, B., Twidale, M. B., Smith, L. C., & Gasser, L. (2005). Information quality work organization in Wikipedia. American Society for Information Science and Technology, 59(6), 983-1001.
  2. Priedhorsky, R., Chen, J., Lam, S. K., Panciera, K., Terveen, L., & Riedl, J. (2007). Creating, destroying, and restoring value in Wikipedia, GROUP (pp. 259-268).
  3. Aaron Halfaker, Aniket Kittur, & John Riedl (2011). Don't bite the Newbies: How reverts affect the quantity and quality of Wikipedia work, The 7th International Symposium on Wikis and Open Collaboration (pp. 163-172). doi:10.1145/2038558.2038585
  4. Aaron Halfaker, Aniket Kittur, Robert E. Kraut, & John Riedl (2009). A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia, The 5th International Symposium on Wikis and Open Collaboration, Article 15, 10 pages. doi:10.1145/1641309.1641332
  5. B. Thomas Adler and Luca de Alfaro. A Content-Driven Reputation System for the Wikipedia. Technical Report ucsc-crl-06-18, School of Engineering, University of California, Santa Cruz, 2006.
  6. B. Thomas Adler, Krishnendu Chatterjee, Luca de Alfaro, Marco Faella, Ian Pye, Vishwanath Raman. Assigning Trust to Wikipedia Content, in WikiSym '08: Proceedings of the 2008 International Symposium on Wikis, May 2008.
  7. Tom Cross, "Puppy smoothies: Improving the reliability of open, collaborative wikis," First Monday, volume 11, number 9 (September 2006).
  8. Rosta Farzan, Robert E. Kraut: "Wikipedia Classroom Experiment: bidirectional benefits of students' engagement in online production communities", CHI'13, April 27–May 2, 2013, Paris, France.
  9. Flöck, Fabian; Erdogan, Kenan; Acosta, Maribel (2017). TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia. Eleventh International AAAI Conference on Web and Social Media.