
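Reposting is welcome. Author: Ling. Please credit the source: 自然语言处理:实践教程-人名识别 (Natural Language Processing: A Practical Tutorial, Person Name Recognition)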

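There are several approaches to person name recognition:

  • HMM-based person name recognition
  • CRF-based person name recognition
  • Role-tagging-based person name recognition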

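Role-tagging-based person name recognition

Reference paper: 《基于角色标注的中国人名自动识别研究》 (Automatic Recognition of Chinese Personal Names Based on Role Tagging)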

Notes

nlpbpr001

Example

Word graph: ======== printed by end node ========

to:  1, from:  0, weight:04.60, word:始##始@世

to:  2, from:  0, weight:04.59, word:始##始@世界

to:  3, from:  1, weight:11.47, word:世@界

to:  4, from:  2, weight:01.45, word:世界@上

to:  4, from:  3, weight:11.40, word:界@上

to:  5, from:  4, weight:04.64, word:上@最

to:  6, from:  5, weight:09.49, word:最@伟

to:  7, from:  5, weight:04.71, word:最@伟大

to:  8, from:  6, weight:11.55, word:伟@大

to:  9, from:  7, weight:00.78, word:伟大@的

to:  9, from:  8, weight:02.31, word:大@的

to: 10, from:  9, weight:05.67, word:的@领

to: 11, from:  9, weight:05.33, word:的@领导

to: 12, from: 10, weight:11.34, word:领@导

to: 13, from: 11, weight:05.32, word:领导@是

to: 13, from: 12, weight:11.49, word:导@是

to: 14, from: 13, weight:07.19, word:是@李

to: 15, from: 14, weight:10.90, word:李@靖

to: 16, from: 15, weight:02.22, word:靖@和

to: 17, from: 16, weight:07.42, word:和@吴

to: 18, from: 17, weight:11.35, word:吴@科

to: 19, from: 18, weight:11.12, word:科@乔

to: 20, from: 19, weight:11.60, word:乔@末##末
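The coarse segmentation shown next can be reproduced from the edge list above by treating each `weight` as a cost and taking the cheapest route from node 0 to the last node. The sketch below is only an illustration under that assumption; `parse_edges` and `shortest_path` are hypothetical helper names, not functions of the tool that produced the log, and the single-pass relaxation relies on every edge in this printout having `from` smaller than `to`.

```python
import re

# Edge list copied verbatim from the coarse word graph printed above.
EDGES = """
to:  1, from:  0, weight:04.60, word:始##始@世
to:  2, from:  0, weight:04.59, word:始##始@世界
to:  3, from:  1, weight:11.47, word:世@界
to:  4, from:  2, weight:01.45, word:世界@上
to:  4, from:  3, weight:11.40, word:界@上
to:  5, from:  4, weight:04.64, word:上@最
to:  6, from:  5, weight:09.49, word:最@伟
to:  7, from:  5, weight:04.71, word:最@伟大
to:  8, from:  6, weight:11.55, word:伟@大
to:  9, from:  7, weight:00.78, word:伟大@的
to:  9, from:  8, weight:02.31, word:大@的
to: 10, from:  9, weight:05.67, word:的@领
to: 11, from:  9, weight:05.33, word:的@领导
to: 12, from: 10, weight:11.34, word:领@导
to: 13, from: 11, weight:05.32, word:领导@是
to: 13, from: 12, weight:11.49, word:导@是
to: 14, from: 13, weight:07.19, word:是@李
to: 15, from: 14, weight:10.90, word:李@靖
to: 16, from: 15, weight:02.22, word:靖@和
to: 17, from: 16, weight:07.42, word:和@吴
to: 18, from: 17, weight:11.35, word:吴@科
to: 19, from: 18, weight:11.12, word:科@乔
to: 20, from: 19, weight:11.60, word:乔@末##末
"""

EDGE_RE = re.compile(r"to:\s*(\d+), from:\s*(\d+), weight:([\d.]+), word:(.+)@(.+)")
END_MARK = "末##末"  # sentence-end sentinel used in the log


def parse_edges(text):
    """Turn the printed lines into (from, to, cost, word_reached) tuples."""
    edges = []
    for line in text.strip().splitlines():
        m = EDGE_RE.match(line.strip())
        if m:
            to, frm, weight, _prev_word, word = m.groups()
            edges.append((int(frm), int(to), float(weight), word))
    return edges


def shortest_path(edges):
    """Cheapest path from node 0 to the highest-numbered node.

    Assumes from < to on every edge, so relaxing edges in order of
    their end node visits each node after all of its predecessors.
    """
    best = {0: (0.0, None, None)}  # node -> (cost, previous node, word)
    for frm, to, weight, word in sorted(edges, key=lambda e: e[1]):
        if frm not in best:
            continue
        cost = best[frm][0] + weight
        if to not in best or cost < best[to][0]:
            best[to] = (cost, frm, word)
    node = max(to for _, to, _, _ in edges)
    total = best[node][0]
    words = []
    while best[node][1] is not None:  # walk back to node 0
        words.append(best[node][2])
        node = best[node][1]
    words.reverse()
    return [w for w in words if w != END_MARK], total


words, total = shortest_path(parse_edges(EDGES))
print(words, round(total, 2))
```

On this edge list the cheapest path goes through 世界 and 领导 rather than the single-character alternatives, which matches the first coarse segmentation in the log. The log also prints a second coarse result (with 领 / 导 split), which suggests the tool keeps more than one candidate path; this sketch only recovers the best one.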

 

Coarse segmentation result: [世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领导/n, 是/vshi, 李/ng, 靖/b, 和/cc, 吴/tg, 科/n, 乔/ag]

Person-name role observation: [  A 22202445 ][世界 L 15 ][上 L 248 K 181 C 42 D 9 M 3 E 1 ][最 L 157 K 10 C 4 ][伟大 L 4 ][的 L 15411 K 11354 M 96 C 1 ][领导 K 238 L 47 ][是 K 2507 L 2504 M 123 C 10 E 1 ][李 B 26468 E 88 C 79 D 4 L 2 K 1 ][靖 C 198 E 77 D 33 B 17 ][和 M 15401 L 2868 K 2281 D 538 C 164 E 34 ][吴 B 7853 E 9 D 4 L 4 C 3 K 3 ][科 D 911 C 75 E 66 K 20 L 4 ][乔 B 741 D 69 C 29 E 16 ][  A 22202445 ]

Person-name role tagging: [ /A ,世界/L ,上/K ,最/L ,伟大/L ,的/K ,领导/K ,是/K ,李/B ,靖/C ,和/D ,吴/B ,科/C ,乔/D , /A]

Recognized person name: 李靖 BC

Recognized person name: 李靖和 BCD

Recognized person name: 吴科 BC

Recognized person name: 吴科乔 BCD
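The step from the role tagging to the recognized names looks like pattern matching over the role string: each run of tokens whose roles spell `BC` or `BCD` is reported as a candidate name. The following is a minimal sketch of that idea, assuming only the two patterns visible in this log (the real pattern set is likely larger); `match_names` and `PATTERNS` are hypothetical names introduced for illustration.

```python
# (token, role) pairs copied from the role-tagging line in the log;
# the empty token with role A marks the sentence boundary.
TAGGED = [
    ("", "A"), ("世界", "L"), ("上", "K"), ("最", "L"), ("伟大", "L"),
    ("的", "K"), ("领导", "K"), ("是", "K"), ("李", "B"), ("靖", "C"),
    ("和", "D"), ("吴", "B"), ("科", "C"), ("乔", "D"), ("", "A"),
]

# Assumption: only the patterns that actually appear in this log.
PATTERNS = ("BC", "BCD")


def match_names(tagged, patterns=PATTERNS):
    """Report every token run whose role string equals one of the patterns."""
    roles = "".join(role for _, role in tagged)
    found = []
    for start in range(len(roles)):
        for pattern in patterns:
            if roles.startswith(pattern, start):
                span = tagged[start:start + len(pattern)]
                found.append(("".join(tok for tok, _ in span), pattern))
    return found


for name, pattern in match_names(TAGGED):
    print(name, pattern)
```

All overlapping candidates are kept at this stage, including the implausible 李靖和. In the log they are all added to the fine-grained word net, and the final segmentation then settles on 李靖 and 吴科乔.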

 

Coarse segmentation result: [世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领/v, 导/vg, 是/vshi, 李/ng, 靖/b, 和/cc, 吴/tg, 科/n, 乔/ag]

Person-name role observation: [  A 22202445 ][世界 L 15 ][上 L 248 K 181 C 42 D 9 M 3 E 1 ][最 L 157 K 10 C 4 ][伟大 L 4 ][的 L 15411 K 11354 M 96 C 1 ][领 D 39 L 14 C 13 E 6 ][导 E 172 C 27 K 20 L 3 D 2 ][是 K 2507 L 2504 M 123 C 10 E 1 ][李 B 26468 E 88 C 79 D 4 L 2 K 1 ][靖 C 198 E 77 D 33 B 17 ][和 M 15401 L 2868 K 2281 D 538 C 164 E 34 ][吴 B 7853 E 9 D 4 L 4 C 3 K 3 ][科 D 911 C 75 E 66 K 20 L 4 ][乔 B 741 D 69 C 29 E 16 ][  A 22202445 ]

Person-name role tagging: [ /A ,世界/L ,上/K ,最/L ,伟大/L ,的/K ,领/D ,导/L ,是/K ,李/B ,靖/C ,和/D ,吴/B ,科/C ,乔/D , /A]

Recognized person name: 李靖 BC

Recognized person name: 李靖和 BCD

Recognized person name: 吴科 BC

Recognized person name: 吴科乔 BCD
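Each role-observation line above lists, per token, the candidate roles with their counts, and the role-tagging line gives the role finally chosen. Comparing the two shows that the chosen role is not always the most frequent one, which is consistent with a decoder that scores the whole role sequence rather than each token on its own. The sketch below parses the first pass of the log and lists the tokens where the two differ; `parse_observation`, `parse_tagging` and `context_driven_tokens` are hypothetical helper names, and the transition scores that would explain the differences are not shown in the log, so none are modelled here.

```python
import re

# Both strings are copied from the first pass in the log.
OBSERVATION = (
    "[  A 22202445 ][世界 L 15 ][上 L 248 K 181 C 42 D 9 M 3 E 1 ]"
    "[最 L 157 K 10 C 4 ][伟大 L 4 ][的 L 15411 K 11354 M 96 C 1 ]"
    "[领导 K 238 L 47 ][是 K 2507 L 2504 M 123 C 10 E 1 ]"
    "[李 B 26468 E 88 C 79 D 4 L 2 K 1 ][靖 C 198 E 77 D 33 B 17 ]"
    "[和 M 15401 L 2868 K 2281 D 538 C 164 E 34 ][吴 B 7853 E 9 D 4 L 4 C 3 K 3 ]"
    "[科 D 911 C 75 E 66 K 20 L 4 ][乔 B 741 D 69 C 29 E 16 ][  A 22202445 ]"
)
TAGGING = (
    "[ /A ,世界/L ,上/K ,最/L ,伟大/L ,的/K ,领导/K ,是/K ,"
    "李/B ,靖/C ,和/D ,吴/B ,科/C ,乔/D , /A]"
)


def parse_observation(text):
    """Return a list of (token, {role: count}) in sentence order."""
    cells = []
    for cell in re.findall(r"\[(.*?)\]", text):
        parts = cell.split()
        # Boundary cells have no token, so they split into an even number of parts.
        token = parts[0] if len(parts) % 2 == 1 else ""
        pairs = parts[1:] if token else parts
        counts = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
        cells.append((token, counts))
    return cells


def parse_tagging(text):
    """Return a list of (token, chosen_role) in sentence order."""
    items = text.strip("[]").split(",")
    return [tuple(item.strip().rsplit("/", 1)) for item in items]


def context_driven_tokens(observation, tagging):
    """Tokens whose chosen role is not their most frequent role."""
    differing = []
    for (token, counts), (_, chosen) in zip(observation, tagging):
        most_frequent = max(counts, key=counts.get)
        if most_frequent != chosen:
            differing.append((token, most_frequent, chosen))
    return differing


for token, frequent, chosen in context_driven_tokens(
    parse_observation(OBSERVATION), parse_tagging(TAGGING)
):
    print(f"{token}: most frequent {frequent}, tagged {chosen}")
```

For example, 科 is seen most often with role D (911) but is tagged C here, directly after 吴/B.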

 

Fine-grained word net:

0:[ ]

1:[世界]

2:[]

3:[上]

4:[最]

5:[伟大]

6:[]

7:[的]

8:[领导, 领]

9:[导]

10:[是]

11:[李, 李靖, 李靖和]

12:[靖]

13:[和]

14:[吴, 吴科, 吴科乔]

15:[科]

16:[乔]

17:[ ]

 

Fine-grained word graph: ======== printed by end node ========

to:  1, from:  0, weight:04.59, word:始##始@世界

to:  2, from:  1, weight:01.45, word:世界@未##数

to:  3, from:  2, weight:00.50, word:未##数@最

to:  4, from:  3, weight:04.71, word:最@伟大

to:  5, from:  4, weight:00.78, word:伟大@的

to:  6, from:  5, weight:05.33, word:的@领导

to:  7, from:  5, weight:05.67, word:的@领

to:  8, from:  7, weight:11.34, word:领@导

to:  9, from:  6, weight:05.32, word:领导@是

to:  9, from:  8, weight:11.49, word:导@是

to: 10, from:  9, weight:07.19, word:是@李

to: 11, from:  9, weight:03.54, word:是@未##人

to: 12, from:  9, weight:03.54, word:是@未##人

to: 13, from: 10, weight:10.90, word:李@靖

to: 14, from: 11, weight:04.56, word:未##人@和

to: 14, from: 13, weight:02.22, word:靖@和

to: 15, from: 12, weight:11.57, word:未##人@吴

to: 15, from: 14, weight:07.42, word:和@吴

to: 16, from: 12, weight:11.57, word:未##人@未##人

to: 16, from: 14, weight:02.82, word:和@未##人

to: 17, from: 12, weight:11.57, word:未##人@未##人

to: 17, from: 14, weight:02.82, word:和@未##人

to: 18, from: 15, weight:11.35, word:吴@科

to: 19, from: 16, weight:11.57, word:未##人@乔

to: 19, from: 18, weight:11.12, word:科@乔

to: 20, from: 17, weight:04.39, word:未##人@末##末

to: 20, from: 19, weight:11.60, word:乔@末##末

 

[世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领导/n, 是/vshi, 李靖/nr, 和/cc, 吴科乔/nr]