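Natural Language Processing: A Practical Tutorial - Person Name Recognition
Reposting is welcome. Author: Ling. Please credit the source: Natural Language Processing: A Practical Tutorial - Person Name Recognition
There are several approaches to person name recognition:
- HMM-based person name recognition
- CRF-based person name recognition
- Role-tagging-based person name recognition
Role-tagging-based person name recognition:
Reference paper: 《基于角色标注的中国人名自动识别研究》 (Automatic Recognition of Chinese Person Names Based on Role Tagging)
Notes:
Example: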
Word graph: ======== printed by end node ========
to: 1, from: 0, weight:04.60, word:始##始@世
to: 2, from: 0, weight:04.59, word:始##始@世界
to: 3, from: 1, weight:11.47, word:世@界
to: 4, from: 2, weight:01.45, word:世界@上
to: 4, from: 3, weight:11.40, word:界@上
to: 5, from: 4, weight:04.64, word:上@最
to: 6, from: 5, weight:09.49, word:最@伟
to: 7, from: 5, weight:04.71, word:最@伟大
to: 8, from: 6, weight:11.55, word:伟@大
to: 9, from: 7, weight:00.78, word:伟大@的
to: 9, from: 8, weight:02.31, word:大@的
to: 10, from: 9, weight:05.67, word:的@领
to: 11, from: 9, weight:05.33, word:的@领导
to: 12, from: 10, weight:11.34, word:领@导
to: 13, from: 11, weight:05.32, word:领导@是
to: 13, from: 12, weight:11.49, word:导@是
to: 14, from: 13, weight:07.19, word:是@李
to: 15, from: 14, weight:10.90, word:李@靖
to: 16, from: 15, weight:02.22, word:靖@和
to: 17, from: 16, weight:07.42, word:和@吴
to: 18, from: 17, weight:11.35, word:吴@科
to: 19, from: 18, weight:11.12, word:科@乔
to: 20, from: 19, weight:11.60, word:乔@末##末
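The dump above can be read as a directed graph whose nodes are candidate words and whose edges are bigrams with a cost. Below is a minimal sketch, assuming the coarse segmentation is simply the minimum-weight path from node 0 to node 20. The names `EDGES` and `shortest_path` are illustrative and not taken from the original program; the edge data is copied from the log.

```python
# Edges copied from the log: (to, from, weight, "previous@current").
EDGES = [
    (1, 0, 4.60, "始##始@世"), (2, 0, 4.59, "始##始@世界"),
    (3, 1, 11.47, "世@界"), (4, 2, 1.45, "世界@上"), (4, 3, 11.40, "界@上"),
    (5, 4, 4.64, "上@最"), (6, 5, 9.49, "最@伟"), (7, 5, 4.71, "最@伟大"),
    (8, 6, 11.55, "伟@大"), (9, 7, 0.78, "伟大@的"), (9, 8, 2.31, "大@的"),
    (10, 9, 5.67, "的@领"), (11, 9, 5.33, "的@领导"), (12, 10, 11.34, "领@导"),
    (13, 11, 5.32, "领导@是"), (13, 12, 11.49, "导@是"), (14, 13, 7.19, "是@李"),
    (15, 14, 10.90, "李@靖"), (16, 15, 2.22, "靖@和"), (17, 16, 7.42, "和@吴"),
    (18, 17, 11.35, "吴@科"), (19, 18, 11.12, "科@乔"), (20, 19, 11.60, "乔@末##末"),
]


def shortest_path(edges, start, end):
    """Return the words on the minimum-weight path from start to end.

    Relies on every edge going from a lower node id to a higher one,
    so processing edges sorted by their end node is a valid DP order.
    """
    best = {start: (0.0, None, None)}  # node -> (cost, previous node, word)
    for to, frm, weight, bigram in sorted(edges):
        if frm not in best:
            continue  # source node is unreachable
        cost = best[frm][0] + weight
        if to not in best or cost < best[to][0]:
            # The word at node `to` is the part after "@" in the bigram.
            best[to] = (cost, frm, bigram.split("@")[1])

    words = []
    node = end
    while node != start:
        _, prev, word = best[node]
        words.append(word)
        node = prev
    words.reverse()
    return words[:-1]  # drop the end-of-sentence sentinel


print(shortest_path(EDGES, 0, 20))
```

Under that assumption, the path found matches the word sequence of the first coarse segmentation in the log, without the part-of-speech tags.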
Coarse segmentation result: [世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领导/n, 是/vshi, 李/ng, 靖/b, 和/cc, 吴/tg, 科/n, 乔/ag]
Person-name role observations: [ A 22202445 ][世界 L 15 ][上 L 248 K 181 C 42 D 9 M 3 E 1 ][最 L 157 K 10 C 4 ][伟大 L 4 ][的 L 15411 K 11354 M 96 C 1 ][领导 K 238 L 47 ][是 K 2507 L 2504 M 123 C 10 E 1 ][李 B 26468 E 88 C 79 D 4 L 2 K 1 ][靖 C 198 E 77 D 33 B 17 ][和 M 15401 L 2868 K 2281 D 538 C 164 E 34 ][吴 B 7853 E 9 D 4 L 4 C 3 K 3 ][科 D 911 C 75 E 66 K 20 L 4 ][乔 B 741 D 69 C 29 E 16 ][ A 22202445 ]
Person-name role tagging: [ /A, 世界/L, 上/K, 最/L, 伟大/L, 的/K, 领导/K, 是/K, 李/B, 靖/C, 和/D, 吴/B, 科/C, 乔/D, /A]
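The tagged sequence is not simply the most frequent role for each word. The sketch below picks the top-count role per word from the observation line and compares it with the tags in the log. The differences suggest that the tagger also weighs how roles follow one another, presumably through a Viterbi-style decode over role transition statistics, which the log does not show. `OBSERVATIONS`, `LOG_TAGS` and `greedy` are illustrative names; the counts are copied from the log.

```python
# Role counts per word, copied from the role observation line of the log.
OBSERVATIONS = [
    ("世界", {"L": 15}),
    ("上", {"L": 248, "K": 181, "C": 42, "D": 9, "M": 3, "E": 1}),
    ("最", {"L": 157, "K": 10, "C": 4}),
    ("伟大", {"L": 4}),
    ("的", {"L": 15411, "K": 11354, "M": 96, "C": 1}),
    ("领导", {"K": 238, "L": 47}),
    ("是", {"K": 2507, "L": 2504, "M": 123, "C": 10, "E": 1}),
    ("李", {"B": 26468, "E": 88, "C": 79, "D": 4, "L": 2, "K": 1}),
    ("靖", {"C": 198, "E": 77, "D": 33, "B": 17}),
    ("和", {"M": 15401, "L": 2868, "K": 2281, "D": 538, "C": 164, "E": 34}),
    ("吴", {"B": 7853, "E": 9, "D": 4, "L": 4, "C": 3, "K": 3}),
    ("科", {"D": 911, "C": 75, "E": 66, "K": 20, "L": 4}),
    ("乔", {"B": 741, "D": 69, "C": 29, "E": 16}),
]

# Roles actually assigned in the log (sentence sentinels omitted).
LOG_TAGS = list("LKLLKKKBCDBCD")

# Greedy baseline: most frequent role for each word, ignoring context.
greedy = [max(counts, key=counts.get) for _, counts in OBSERVATIONS]

# Words where the greedy choice disagrees with the log.
differences = [
    (word, g, t)
    for (word, _), g, t in zip(OBSERVATIONS, greedy, LOG_TAGS)
    if g != t
]
for word, g, t in differences:
    print(f"{word}: most frequent role {g}, tagged {t} in the log")
```

The greedy baseline disagrees with the log on 上, 的, 和, 科 and 乔. For example, 乔 is most often a surname (B), but after 吴/B 科/C it is tagged D.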
Recognized person name: 李靖 BC
Recognized person name: 李靖和 BCD
Recognized person name: 吴科 BC
Recognized person name: 吴科乔 BCD
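Candidate names are found by matching role patterns against the tagged sequence. In the reference paper's scheme, B marks a surname, C and D the first and last characters of a two-character given name, E a single-character given name, K and L the context before and after a name, M the material between two names, and A anything else. The sketch below uses only the two patterns visible in this log, `BC` and `BCD`; a full implementation would presumably use a larger pattern set. `TAGGED`, `PATTERNS` and `match_names` are illustrative names.

```python
# (word, role) pairs copied from the role tagging line of the log.
TAGGED = [
    ("世界", "L"), ("上", "K"), ("最", "L"), ("伟大", "L"), ("的", "K"),
    ("领导", "K"), ("是", "K"), ("李", "B"), ("靖", "C"), ("和", "D"),
    ("吴", "B"), ("科", "C"), ("乔", "D"),
]

# Only the patterns that appear in the log; an assumption, not the full set.
PATTERNS = ("BC", "BCD")


def match_names(tagged, patterns):
    """Return (name, pattern) for every pattern match at every position."""
    roles = "".join(role for _, role in tagged)  # one character per token
    found = []
    for i in range(len(roles)):
        for pattern in patterns:
            if roles.startswith(pattern, i):
                tokens = tagged[i:i + len(pattern)]
                found.append(("".join(word for word, _ in tokens), pattern))
    return found


for name, pattern in match_names(TAGGED, PATTERNS):
    print(name, pattern)
```

This deliberately over-generates: 李靖和 is proposed alongside 李靖. The candidates only become alternatives in the fine-grained graph, and the last line of the log shows that 李靖 and 吴科乔 are the ones finally kept.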
Coarse segmentation result: [世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领/v, 导/vg, 是/vshi, 李/ng, 靖/b, 和/cc, 吴/tg, 科/n, 乔/ag]
Person-name role observations: [ A 22202445 ][世界 L 15 ][上 L 248 K 181 C 42 D 9 M 3 E 1 ][最 L 157 K 10 C 4 ][伟大 L 4 ][的 L 15411 K 11354 M 96 C 1 ][领 D 39 L 14 C 13 E 6 ][导 E 172 C 27 K 20 L 3 D 2 ][是 K 2507 L 2504 M 123 C 10 E 1 ][李 B 26468 E 88 C 79 D 4 L 2 K 1 ][靖 C 198 E 77 D 33 B 17 ][和 M 15401 L 2868 K 2281 D 538 C 164 E 34 ][吴 B 7853 E 9 D 4 L 4 C 3 K 3 ][科 D 911 C 75 E 66 K 20 L 4 ][乔 B 741 D 69 C 29 E 16 ][ A 22202445 ]
Person-name role tagging: [ /A, 世界/L, 上/K, 最/L, 伟大/L, 的/K, 领/D, 导/L, 是/K, 李/B, 靖/C, 和/D, 吴/B, 科/C, 乔/D, /A]
Recognized person name: 李靖 BC
Recognized person name: 李靖和 BCD
Recognized person name: 吴科 BC
Recognized person name: 吴科乔 BCD
Fine-grained word net:
0:[ ]
1:[世界]
2:[]
3:[上]
4:[最]
5:[伟大]
6:[]
7:[的]
8:[领导, 领]
9:[导]
10:[是]
11:[李, 李靖, 李靖和]
12:[靖]
13:[和]
14:[吴, 吴科, 吴科乔]
15:[科]
16:[乔]
17:[ ]
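The rows of the word net appear to be indexed by character offset: row 1 holds words starting at the first character, and rows 0 and 17 hold the sentence sentinels. Below is a minimal sketch of how such a net could be assembled from the two coarse segmentations plus the recognized name candidates. `build_word_net` is an illustrative helper, not the original program's code, and it assumes each candidate name occurs only once in the sentence.

```python
SENTENCE = "世界上最伟大的领导是李靖和吴科乔"

# The two coarse segmentations from the log, without part-of-speech tags.
COARSE = [
    ["世界", "上", "最", "伟大", "的", "领导", "是", "李", "靖", "和", "吴", "科", "乔"],
    ["世界", "上", "最", "伟大", "的", "领", "导", "是", "李", "靖", "和", "吴", "科", "乔"],
]

# Candidate names reported by the recognizer in the log.
NAMES = ["李靖", "李靖和", "吴科", "吴科乔"]


def build_word_net(sentence, segmentations, names):
    """Return a list of rows; row i holds the words starting at offset i."""
    net = [[] for _ in range(len(sentence) + 2)]
    net[0].append(" ")   # start sentinel
    net[-1].append(" ")  # end sentinel

    def add(row, word):
        if word not in net[row]:  # skip words already in this row
            net[row].append(word)

    for words in segmentations:
        row = 1
        for word in words:
            add(row, word)
            row += len(word)  # the next word starts after this one

    for name in names:
        # Assumes the name occurs exactly once in the sentence.
        add(sentence.index(name) + 1, name)
    return net


for i, row in enumerate(build_word_net(SENTENCE, COARSE, NAMES)):
    print(f"{i}:[{', '.join(row)}]")
```

Rows 2 and 6 stay empty because no word starts in the middle of 世界 or 伟大. Rows 11 and 14 collect the single-character surname together with both name candidates.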
Fine-grained word graph: ======== printed by end node ========
to: 1, from: 0, weight:04.59, word:始##始@世界
to: 2, from: 1, weight:01.45, word:世界@未##数
to: 3, from: 2, weight:00.50, word:未##数@最
to: 4, from: 3, weight:04.71, word:最@伟大
to: 5, from: 4, weight:00.78, word:伟大@的
to: 6, from: 5, weight:05.33, word:的@领导
to: 7, from: 5, weight:05.67, word:的@领
to: 8, from: 7, weight:11.34, word:领@导
to: 9, from: 6, weight:05.32, word:领导@是
to: 9, from: 8, weight:11.49, word:导@是
to: 10, from: 9, weight:07.19, word:是@李
to: 11, from: 9, weight:03.54, word:是@未##人
to: 12, from: 9, weight:03.54, word:是@未##人
to: 13, from: 10, weight:10.90, word:李@靖
to: 14, from: 11, weight:04.56, word:未##人@和
to: 14, from: 13, weight:02.22, word:靖@和
to: 15, from: 12, weight:11.57, word:未##人@吴
to: 15, from: 14, weight:07.42, word:和@吴
to: 16, from: 12, weight:11.57, word:未##人@未##人
to: 16, from: 14, weight:02.82, word:和@未##人
to: 17, from: 12, weight:11.57, word:未##人@未##人
to: 17, from: 14, weight:02.82, word:和@未##人
to: 18, from: 15, weight:11.35, word:吴@科
to: 19, from: 16, weight:11.57, word:未##人@乔
to: 19, from: 18, weight:11.12, word:科@乔
to: 20, from: 17, weight:04.39, word:未##人@末##末
to: 20, from: 19, weight:11.60, word:乔@末##末
[世界/n, 上/m, 最/d, 伟大/a, 的/ude1, 领导/n, 是/vshi, 李靖/nr, 和/cc, 吴科乔/nr]