[js] A New Approach to Smart Linking: Similar Articles (Word Segmentation + Full-Text Index + SQL)

Background

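Earlier I came across @player's post "Prefix Document Tree: A New Approach to Smart Document Management, Now with a Linked Tags Panel", which I thought was really nice.

So I used a simple method to build a first version of a similar-articles feature.

Later I saw @leolee's post "I Published an hnsw Package; Give It a Try If You Need It" and his suggestion, and it seemed to me that this could also be done simply with SQL queries.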

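So that night, after the pattering rain had stopped and all was still, keystrokes echoed through the quiet room and ideas flowed like a spring. Before I knew it the eastern sky had paled, frogs were croaking, and the code had crossed ten thousand mountains.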

【I'm a beginner and can't follow this. What should I do?】

No problem. Just read this: Step-by-Step Tutorial for Beginners

Approach

The implementation combines word segmentation, the full-text index, and SQL.

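Prerequisite: segmenting the article content, title, and tags

The article content, title, and tags must first be segmented into words; only then can they be queried conveniently.

Ways to segment text:

1. A third-party API, such as https://api.yesapi.cn/docs-api-App.Scws.GetWords.html. Drawback: it is paid (the free tier allows at most 100,000 calls per month).
2. The segmentit library. Drawbacks: the library is 3.8 MB (downloading it locally is recommended) and its performance is mediocre.
3. If you only use it in the desktop (PC) client, you can use https://github.com/yanyiwu/nodejieba, which performs better.

For ease of demonstration, option 2 is used here (downloading the library locally is recommended). If you are interested in the other options, please implement them yourself.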
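Whichever segmenter is used, its output still has to be reduced to a short, ordered keyword list. The following is a minimal sketch of that filtering pipeline (the same four steps getSimilarDocs applies to segmentit's output below), assuming the text has already been segmented into `{ w, p }` objects. The string POS tags and the `extractKeywords` name are illustrative assumptions, not segmentit's real POSTAG values or API.

```javascript
// Hypothetical POS tags standing in for segmentit's POSTAG constants (not its real values).
const PRIORITY = ['n', 'nr', 'ns', 'nt', 'nz', 'v', 'a', 'i', 'l', 'mq', 'm']; // most important first
const EXCLUDED = new Set(['u', 'p', 'c', 'd', 'w', 'o', 'x', 'y', 'z', 'e', 'k', 'h', 'unk']);

// Same prefixes as isLink() in getSimilarDocs
function isLink(word) {
  const w = word.toLowerCase();
  return ['http://', 'https://', 'file://', 'assets/'].some(prefix => w.startsWith(prefix));
}

// tokens: [{ w: word, p: posTag }], stopwords: Set of words, limit: 0 = no limit
function extractKeywords(tokens, stopwords, limit = 100) {
  // 1. Drop empty tokens, stopwords, excluded parts of speech and links
  const kept = tokens.filter(t => t.w && t.p && !stopwords.has(t.w) && !EXCLUDED.has(t.p) && !isLink(t.w));
  // 2. Keep only the first occurrence of each word
  const seen = new Set();
  const unique = kept.filter(t => {
    if (seen.has(t.w)) return false;
    seen.add(t.w);
    return true;
  });
  // 3. Stable sort by POS priority; tags outside PRIORITY keep their order at the end
  const rank = p => (PRIORITY.includes(p) ? PRIORITY.indexOf(p) : PRIORITY.length);
  const sorted = unique
    .map((t, i) => ({ t, i }))
    .sort((a, b) => rank(a.t.p) - rank(b.t.p) || a.i - b.i)
    .map(({ t }) => t.w);
  // 4. Keep the first `limit` keywords
  return sorted.slice(0, limit || undefined);
}

const tokens = [
  { w: '快速', p: 'a' },
  { w: '的', p: 'u' },
  { w: '数据库', p: 'n' },
  { w: '查询', p: 'v' },
  { w: '数据库', p: 'n' },
  { w: 'https://ld246.com', p: 'nz' },
  { w: '我们', p: 'r' },
  { w: 'Python', p: 'eng' },
];
console.log(extractKeywords(tokens, new Set(['我们'])));
// → [ '数据库', '查询', '快速', 'Python' ]
```

Sorting by rank with the original index as a tie-breaker reproduces the "group by POS, then concatenate in priority order" behavior of sortWordsByPriority in a single pass.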

Implementation: the function that finds similar articles

Put the function below into a JS snippet, or wherever you need to call it.

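// Get a list of similar articles
// docId           ID of the reference document
// showNum         number of similar articles to return, default 20
// keywordNum      maximum number of keywords used in the query
// wordNum         number of characters used for keyword extraction: 0 = all, >0 = first n characters
// titleTagWeight  weight of the title and tags, default 0.7 (70%)
// contentWeight   weight of the content, default 0.3 (30%)
// Returns a list of similar articles, e.g. [{root_id:'', hpath:'', title:'', id:'', score:0}]
// Example: await getSimilarDocs('20250702014415-7rk1d1g');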
async function getSimilarDocs(docId, showNum = 20, keywordNum = 100, wordNum = 2000, titleTagWeight=0.7, contentWeight=0.3) {
    if(!docId) return [];
    // Fetch the reference document's title and tags
    const blockBlacks = ['c', 'html', 'iframe', 'm', 'query_embed', 'tb', 'video', 'audio', 'widget']; // block types excluded from matching
    let doc = await querySql(`select content as title, tag from blocks where id = '${docId}'`);
    doc = doc[0] || {};
    blockBlacks.push('d'); // also exclude document blocks
    let contents = await querySql(`SELECT id, content, type FROM blocks WHERE root_id = '${docId}' AND type not in (${blockBlacks.map(i=>`'${i}'`).join(',')});`);
    const ids = contents.map(item => item.id);
    const indexs = (await requestApi('/api/block/getBlocksIndexes', {ids}))?.data || {};
    contents = moveHeadersToFront(sortContentsByIndexs(contents, indexs));
    doc.content = contents.map(item => item.content).join("\n")?.trim()?.slice(0, wordNum || undefined);
    // Load the segmentation library
    if(!window.Segmentit) {
        await loadJs([
            '/snippets/libs/segmentit.min.js', // local copy
            'https://jsd.onmicrosoft.cn/npm/segmentit@2.0.3/dist/umd/segmentit.min.js',
        ]);
        if(!window.Segmentit) return [];
        // Custom stopwords
        let resp = await fetch('/snippets/libs/cn_stopwords.txt'); // local stopword list
        if(resp.status === 404) resp = await fetch('https://jsd.onmicrosoft.cn/gh/goto456/stopwords@master/cn_stopwords.txt');
        const cnStopWords = await resp.text() || '';
        Segmentit.stopwords.push(cnStopWords);
        Segmentit.stopwords.push(`a\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\np\nq\nr\ns\nt\nu\nv\nw\nx\ny\nz\nA\nB\nC\nD\nE\nF\nG\nH\nI\nJ\nK\nL\nM\nN\nO\nP\nQ\nR\nS\nT\nU\nV\nW\nX\nY\nZ`);
    }
    // Create the segmenter once and cache it on window
    const segmentit = window.segmentit1 || Segmentit.useDefault(new Segmentit.Segment());
    let stopedWords = window?.segmentit1?.stopedWords || [];
    if(!window.segmentit1) {
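        // Each stopword dictionary is a newline-separated string; split it into words
        // (spreading the string itself would produce single characters)
        Segmentit.stopwords.forEach(item => {
            const words = typeof item === 'string' ? item.split(/\r?\n/) : item;
            stopedWords.push(...words.map(w => String(w).trim()).filter(Boolean));
        });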
        segmentit.stopedWords = stopedWords;
        window.segmentit1 = segmentit;
    }
    // Keywords: drop empty tokens, stopwords, unimportant parts of speech and links; remove duplicates;
    // sort by part-of-speech importance; keep the first keywordNum
    const extractKeywords = text => sortWordsByPriority(segmentit, uniqueWords(
        segmentit.doSegment(text || '').filter(word => word.w && word.p && !stopedWords.includes(word.w) && !getExcludeWords(segmentit, word.p) && !isLink(word.w))
    )).slice(0, keywordNum || undefined);
    const titleKeywords = extractKeywords(doc.title);
    const contentKeywords = extractKeywords(doc.content);
    const tagKeywords = extractKeywords(doc.tag);
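    // Query similar articles by keyword
    // How it works: the BM25 rank of title/tag matches and the BM25 rank of content matches are combined into a weighted score
    // SQL notes: 1. Keywords in MATCH are wrapped in double quotes for exact matching; double and single quotes inside keywords must be doubled to escape them
    //            2. MAX(title) and MAX(id) avoid NULLs after GROUP BY (MAX ignores NULL values)
    //            3. An empty MATCH argument makes the whole query match nothing, so an empty keyword list is replaced by ""
    //            4. The score is ROUNDed to trim floating-point noise
    const whereParts = [];
    if(tagKeywords.length) whereParts.push(`tag MATCH '${getKeywordsSql(tagKeywords)}'`);
    if(titleKeywords.length) whereParts.push(`content MATCH '${getKeywordsSql(titleKeywords)}'`);
    if(!whereParts.length) whereParts.push(`content MATCH '""'`); // no title/tag keywords: avoid an empty "AND ()"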
    const sql = `
        SELECT 
            root_id,
            MAX(hpath) AS hpath,
            MAX(title) AS title,
            MAX(id) AS id,
            ROUND(
                ${titleTagWeight} * MAX(COALESCE(doc_score, 0)) + 
                ${contentWeight} * MAX(COALESCE(content_score, 0)), 
                6
            ) AS score
        FROM (
            -- Document metadata: title and tags (type='d')
            SELECT 
                id AS root_id,
                hpath,
                content AS title,
                id,
                -bm25(blocks_fts_case_insensitive) AS doc_score,
                NULL AS content_score
            FROM blocks_fts_case_insensitive
            WHERE type = 'd'
              AND id != '${docId}'
              AND (${whereParts.join(' OR ')})
      
            UNION ALL
      
            -- Document content (type != 'd')
            SELECT 
                root_id,
                NULL AS hpath,
                NULL AS title,
                NULL AS id,
                NULL AS doc_score,
                -bm25(blocks_fts_case_insensitive) AS content_score
            FROM blocks_fts_case_insensitive
            WHERE type not in (${blockBlacks.map(i=>`'${i}'`).join(',')})
              AND root_id != '${docId}'
              AND content MATCH '${getKeywordsSql(contentKeywords)}'
        ) AS combined
        GROUP BY root_id
        HAVING score > 0
        ORDER BY score DESC
        LIMIT ${showNum};
    `;
    const result = await querySql(sql);
    // Fill in missing fields: when a document matched only on content (not on title or tags), its id, title and hpath are NULL
    const nullTitleIds = result.filter(doc=>doc.id === null || doc.title === null).map(doc=>doc.root_id);
    if(nullTitleIds.length) {
        const docs = await querySql(`select id, content, hpath from blocks where type='d' and id in (${nullTitleIds.map(id=>`'${id}'`).join(',')});`);
        const docsMap = {};
        docs.forEach(doc=>docsMap[doc.id] = doc);
        result.forEach(doc=>{
            if(doc.id === null) doc.id = docsMap[doc.root_id]?.id;
            if(doc.title === null) doc.title = docsMap[doc.root_id]?.content;
            if(doc.hpath === null) doc.hpath = docsMap[doc.root_id]?.hpath;
        });
    }
    return result;
    ///////////////////// Helper functions ////////////////////
    function moveHeadersToFront(blocks) {
        const headers = [];
        const others = [];
        // Single pass: separate headings from other blocks
        blocks.forEach(block => {
            if (block.type === 'h') {
                headers.push(block);
            } else {
                others.push(block);
            }
        });
        // Headings first, then the other blocks
        return [...headers, ...others];
    }
    function sortWordsByPriority(segmentit, words) {
        const priorityOrder = [
            segmentit.POSTAG.D_N,   // noun
            segmentit.POSTAG.A_NR,  // person name
            segmentit.POSTAG.A_NS,  // place name
            segmentit.POSTAG.A_NT,  // organization
            segmentit.POSTAG.A_NZ,  // other proper noun
            segmentit.POSTAG.D_V,   // verb
            segmentit.POSTAG.D_A,   // adjective
            segmentit.POSTAG.D_I,   // idiom
            segmentit.POSTAG.D_L,   // set phrase
            segmentit.POSTAG.D_MQ,  // numeral-classifier compound
            segmentit.POSTAG.A_M    // numeral
        ];
        // Group words by POS tag using a Map
        const inPriority = new Map(); // Map<POSTAG, Array<Word>>
        const notInPriority = [];
        words.forEach(word => {
            const p = Array.isArray(word.p) ? word.p[0] : word.p;
            if (priorityOrder.includes(p)) {
                if (!inPriority.has(p)) {
                    inPriority.set(p, []);
                }
                inPriority.get(p).push(word);
            } else {
                notInPriority.push(word);
            }
        });
        // Result array
        const newWords = [];
        // Concatenate the groups in priority order
        priorityOrder.forEach(postag => {
            const wordsOfPos = inPriority.get(postag);
            if (wordsOfPos) {
                newWords.push(...wordsOfPos); // append in place
            }
        });
        newWords.push(...notInPriority);
        return newWords;
    }
    function getExcludeWords(segmentit, p) {
        if(Array.isArray(p)) p = p[0];
        return [
            segmentit.POSTAG.D_U,  // auxiliary
            segmentit.POSTAG.D_P,  // preposition
            segmentit.POSTAG.D_C,  // conjunction
            segmentit.POSTAG.D_D,  // adverb
            segmentit.POSTAG.D_W,  // punctuation
            segmentit.POSTAG.D_O,  // onomatopoeia
            segmentit.POSTAG.D_X,  // non-morpheme character
            segmentit.POSTAG.D_Y,  // modal particle
            segmentit.POSTAG.D_Z,  // status word
            segmentit.POSTAG.D_E,  // interjection
            segmentit.POSTAG.D_K,  // suffix
            segmentit.POSTAG.D_ZH, // prefix
            segmentit.POSTAG.UNK   // unknown part of speech
        ].includes(p);
    }
    function isLink(word) {
        word = word.toLowerCase();
        return word.startsWith('http://')||word.startsWith('https://')||word.startsWith('file://')||word.startsWith('assets/');
    }
    // Keep the first occurrence when a word repeats
    function uniqueWords(words) {
        const seen = new Set();
        return words.filter(item => {
            if (seen.has(item.w)) return false;
            seen.add(item.w);
            return true;
        });
    }
    function getKeywordsSql(keywords) {
        if(!Array.isArray(keywords) || keywords.length === 0) return '""';
        // Double both double quotes (FTS phrase escape) and single quotes (SQL literal escape); both need the g flag
        return keywords.map(w=>`"${w.w.replace(/"/g, '""').replace(/'/g, "''")}"`).join(' OR ');
    }
    function sortContentsByIndexs(contents, indexs) {
        return contents.slice().sort((a, b) => {
            const idxA = indexs[a.id] ?? Infinity;
            const idxB = indexs[b.id] ?? Infinity;
            return idxA - idxB;
        });
    }
    async function querySql(sql) {
        const result = await requestApi('/api/query/sql', { "stmt": sql });
        if (result.code !== 0) {
            console.error("Database query failed", result.msg);
            return [];
        }
        return result.data;
    }
    async function requestApi(url, data, method = 'POST') {
        return await (await fetch(url, {method: method, body: JSON.stringify(data||{})})).json();
    }
    async function loadJs(urls) {
        if (!Array.isArray(urls) || urls.length === 0) {
            throw new Error('Please provide a non-empty array of script URLs');
        }
        for (const url of urls) {
            let script;
            try {
                await new Promise((resolve, reject) => {
                    script = document.createElement('script');
                    script.src = url;
                    script.async = true;
                    script.onload = () => resolve();
                    script.onerror = () => {
                        script.remove();
                        reject(new Error(`Failed to load ${url}`));
                    };
                    document.head.appendChild(script);
                });
                // Reached only after a successful load, with the right script element
                return script;
            } catch (e) {
                console.warn('Failed to load:', url);
                // On failure, try the next URL
            }
        }
        throw new Error('All script URLs failed to load');
    }
}
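Each keyword passes through two escaping layers before it reaches SQLite. Inside the FTS query it is a double-quoted phrase, so `"` must be doubled. The whole MATCH string is then a single-quoted SQL literal, so `'` must be doubled too. A regular expression without the `g` flag replaces only the first match, so both replacements need it. The sketch below separates the two layers so each can be checked on its own; `buildMatchExpr` and `toSqlLiteral` are hypothetical helper names, not part of SiYuan.

```javascript
// Hypothetical helpers illustrating the two escaping layers used by getKeywordsSql.

// Layer 1: wrap each keyword in double quotes so FTS treats it as a phrase,
// doubling any double quote inside the keyword.
function buildMatchExpr(words) {
  // Following the convention in getKeywordsSql, an empty list becomes the empty phrase "".
  if (!Array.isArray(words) || words.length === 0) return '""';
  return words.map(w => `"${w.replace(/"/g, '""')}"`).join(' OR ');
}

// Layer 2: embed the MATCH expression in a single-quoted SQL literal,
// doubling every single quote (note the g flag).
function toSqlLiteral(text) {
  return `'${text.replace(/'/g, "''")}'`;
}

const expr = buildMatchExpr(['API', 'say "hi"', "it's"]);
console.log(`content MATCH ${toSqlLiteral(expr)}`);
// → content MATCH '"API" OR "say ""hi""" OR "it''s"'
```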

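Function parameters

// docId           ID of the reference document
// showNum         number of similar articles to return, default 20
// keywordNum      maximum number of keywords used in the query
// wordNum         number of characters used for keyword extraction: 0 = all, >0 = first n characters
// titleTagWeight  weight of the title and tags, default 0.7 (70%)
// contentWeight   weight of the content, default 0.3 (30%)
// Returns a list of similar articles, e.g. [{root_id:'', hpath:'', title:'', id:'', score:0}]
// Example: await getSimilarDocs('20250702014415-7rk1d1g');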
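The wordNum parameter applies to the document text that getSimilarDocs rebuilds from its blocks before segmentation. A standalone sketch of that step, assuming each block is `{ id, type, content }` and that the indexes map (fetched from `/api/block/getBlocksIndexes` in the code above) gives each block's position as a number: blocks are ordered by position, headings (`type === 'h'`) are moved to the front, and the joined text is cut to `wordNum` characters. `buildDocText` and the sample data are made up for illustration.

```javascript
// Hypothetical standalone version of the text-assembly step in getSimilarDocs.
function buildDocText(blocks, indexes, wordNum = 2000) {
  // Order blocks by their position in the document; unknown positions go last
  const ordered = blocks.slice().sort(
    (a, b) => (indexes[a.id] ?? Infinity) - (indexes[b.id] ?? Infinity)
  );
  // Move headings to the front so they are segmented first
  const headings = ordered.filter(b => b.type === 'h');
  const others = ordered.filter(b => b.type !== 'h');
  const text = [...headings, ...others].map(b => b.content).join('\n').trim();
  // wordNum = 0 means "no limit"
  return text.slice(0, wordNum || undefined);
}

const blocks = [
  { id: 'p1', type: 'p', content: 'Body text' },
  { id: 'h1', type: 'h', content: 'Title' },
  { id: 'p0', type: 'p', content: 'Intro' },
];
const indexes = { p0: 1, h1: 2, p1: 3 };
console.log(JSON.stringify(buildDocText(blocks, indexes)));    // "Title\nIntro\nBody text"
console.log(JSON.stringify(buildDocText(blocks, indexes, 5))); // "Title"
```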

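It is recommended to download the segmentation library from its CDN address https://jsd.onmicrosoft.cn/npm/segmentit@2.0.3/dist/umd/segmentit.min.js into the local /data/snippets/libs directory to speed up loading.

It is also recommended to download the stopword list https://jsd.onmicrosoft.cn/gh/goto456/stopwords@master/cn_stopwords.txt into the local /data/snippets/libs directory for the same reason.

In addition, the first call to getSimilarDocs initializes the segmenter, which may cause a slight delay. To avoid it, run the initialization function initSegmentit when the page loads.

The initSegmentit function is as follows: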

// Initialize the segmenter; run it at page load so the first getSimilarDocs call is faster
async function initSegmentit() {
    if(!window.Segmentit) {
        await loadJs([
            '/snippets/libs/segmentit.min.js', // local copy
            'https://jsd.onmicrosoft.cn/npm/segmentit@2.0.3/dist/umd/segmentit.min.js',
        ]);
        if(!window.Segmentit) return;
        // Custom stopwords
        let resp = await fetch('/snippets/libs/cn_stopwords.txt'); // local stopword list
        if(resp.status === 404) resp = await fetch('https://jsd.onmicrosoft.cn/gh/goto456/stopwords@master/cn_stopwords.txt');
        const cnStopWords = await resp.text() || '';
        Segmentit.stopwords.push(cnStopWords);
        Segmentit.stopwords.push(`a\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\np\nq\nr\ns\nt\nu\nv\nw\nx\ny\nz\nA\nB\nC\nD\nE\nF\nG\nH\nI\nJ\nK\nL\nM\nN\nO\nP\nQ\nR\nS\nT\nU\nV\nW\nX\nY\nZ`);
    }
    // Create the segmenter once and cache it on window
    const segmentit = window.segmentit1 || Segmentit.useDefault(new Segmentit.Segment());
    let stopedWords = window?.segmentit1?.stopedWords || [];
    if(!window.segmentit1) {
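        // Each stopword dictionary is a newline-separated string; split it into words
        // (spreading the string itself would produce single characters)
        Segmentit.stopwords.forEach(item => {
            const words = typeof item === 'string' ? item.split(/\r?\n/) : item;
            stopedWords.push(...words.map(w => String(w).trim()).filter(Boolean));
        });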
        segmentit.stopedWords = stopedWords;
        window.segmentit1 = segmentit;
    }
    async function loadJs(urls) {
        if (!Array.isArray(urls) || urls.length === 0) {
            throw new Error('Please provide a non-empty array of script URLs');
        }
        for (const url of urls) {
            let script;
            try {
                await new Promise((resolve, reject) => {
                    script = document.createElement('script');
                    script.src = url;
                    script.async = true;
                    script.onload = () => resolve();
                    script.onerror = () => {
                        script.remove();
                        reject(new Error(`Failed to load ${url}`));
                    };
                    document.head.appendChild(script);
                });
                // Reached only after a successful load, with the right script element
                return script;
            } catch (e) {
                console.warn('Failed to load:', url);
                // On failure, try the next URL
            }
        }
        throw new Error('All script URLs failed to load');
    }
}
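initSegmentit checks `window.Segmentit` before loading, but the library is only defined after its script has finished loading. A getSimilarDocs call made while the page-load initialization is still in flight would therefore start a second download. One way to close that gap, sketched here with a hypothetical `once` helper (not part of SiYuan or segmentit), is to cache the initialization promise itself so every caller awaits the same load.

```javascript
// Hypothetical helper: run an async initializer at most once and share its promise.
function once(init) {
  let promise = null;
  return () => {
    if (!promise) {
      promise = init().catch(err => {
        promise = null; // forget the failed attempt so a later call can retry
        throw err;
      });
    }
    return promise;
  };
}

// Stand-in for the real work (loading segmentit.min.js and cn_stopwords.txt)
let loads = 0;
const initSegmenter = once(async () => {
  loads += 1;
  return { ready: true };
});

const p1 = initSegmenter(); // e.g. called when the page loads
const p2 = initSegmenter(); // e.g. called by getSimilarDocs before p1 has settled
console.log(p1 === p2, loads); // true 1
```

Caching the promise rather than checking a global is what makes this work: the global only appears after the load completes, whereas the promise exists from the first call.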

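If you want to set separate weights for the title, the tags, and the article content, you need to modify the SQL, for example:

Note: this SQL is only a reference demo and leaves out the more complex logic.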

SELECT 
    root_id,
    MAX(title) AS title,
    MAX(doc_id) AS doc_id,
    ROUND(
        0.4 * MAX(COALESCE(title_score, 0)) + 
        0.3 * MAX(COALESCE(tag_score, 0)) + 
        0.3 * MAX(COALESCE(content_score, 0)), 
        6
    ) AS final_score
FROM (
    -- Branch 1: title match (type='d')
    SELECT 
        id AS root_id,
        content AS title,
        id AS doc_id,
        -bm25(blocks_fts_case_insensitive) AS title_score,
        NULL AS tag_score,
        NULL AS content_score
    FROM blocks_fts_case_insensitive
    WHERE type = 'd'
      AND content MATCH '"api" OR "hello" OR "数据库"'

    UNION ALL

    -- Branch 2: tag match (type='d')
    SELECT 
        id AS root_id,
        content AS title,
        id AS doc_id,
        NULL AS title_score,
        -bm25(blocks_fts_case_insensitive) AS tag_score,
        NULL AS content_score
    FROM blocks_fts_case_insensitive
    WHERE type = 'd'
      AND tag MATCH '"api" OR "hello" OR "数据库"'

    UNION ALL

    -- Branch 3: content block match (type != 'd')
    SELECT 
        root_id,
        NULL AS title,
        NULL AS doc_id,
        NULL AS title_score,
        NULL AS tag_score,
        -bm25(blocks_fts_case_insensitive) AS content_score
    FROM blocks_fts_case_insensitive
    WHERE type != 'd'
      AND content MATCH '"api" OR "hello" OR "数据库"'
) AS combined
GROUP BY root_id
HAVING final_score > 0
ORDER BY final_score DESC
LIMIT 10;
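To see how the three branch scores turn into `final_score`, the sketch below replays the outer query in plain JavaScript. Rows are grouped by `root_id`, each branch score is reduced with `MAX(COALESCE(score, 0))`, the weighted sum is rounded to six decimals, and groups whose score is not positive are dropped before sorting. The scores stand for `-bm25(...)`, which is positive for matches because SQLite FTS5's bm25() gives better matches lower (negative) values. `combineScores` and the rows are made-up illustration data.

```javascript
// Hypothetical re-implementation of the outer query, for illustration only.
// Each input row comes from one UNION ALL branch and carries at most one non-null score.
function combineScores(rows, weights = { title: 0.4, tag: 0.3, content: 0.3 }, limit = 10) {
  const keys = ['title_score', 'tag_score', 'content_score'];
  const groups = new Map(); // GROUP BY root_id
  for (const row of rows) {
    let g = groups.get(row.root_id);
    if (!g) {
      g = { root_id: row.root_id, title: null, title_score: -Infinity, tag_score: -Infinity, content_score: -Infinity };
      groups.set(row.root_id, g);
    }
    // MAX(title): MAX ignores NULL, and all non-null titles of one document are equal
    if (row.title != null) g.title = row.title;
    // MAX(COALESCE(score, 0)) for each branch
    for (const k of keys) g[k] = Math.max(g[k], row[k] ?? 0);
  }
  // Approximates ROUND(x, 6) for the positive scores used here
  const round6 = x => Math.round(x * 1e6) / 1e6;
  return [...groups.values()]
    .map(g => ({
      root_id: g.root_id,
      title: g.title,
      final_score: round6(weights.title * g.title_score + weights.tag * g.tag_score + weights.content * g.content_score),
    }))
    .filter(r => r.final_score > 0)                // HAVING final_score > 0
    .sort((a, b) => b.final_score - a.final_score) // ORDER BY final_score DESC
    .slice(0, limit);                              // LIMIT
}

// Sample rows standing in for -bm25(...) values (higher = better match)
const rows = [
  { root_id: 'A', title: 'SQL notes', title_score: 2.0, tag_score: null, content_score: null },
  { root_id: 'A', title: null, title_score: null, tag_score: null, content_score: 1.0 },
  { root_id: 'A', title: null, title_score: null, tag_score: null, content_score: 3.0 },
  { root_id: 'B', title: null, title_score: null, tag_score: null, content_score: 4.0 },
];
console.log(combineScores(rows)); // A (final_score 1.7) first, then B (final_score 1.2)
```

Note that document B outranks nothing here despite its strong content match, because only A also matched on its title; adjusting the weights shifts that balance.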

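Advantages

Simple and easy to use; covers the basic needs
Takes the article structure into account: the document title and tags get a higher weight, and headings are moved to the front so they are segmented first
Segmentation drops irrelevant words, sorts the remaining keywords by part-of-speech importance, and removes duplicates
The segmenter and stopwords are cached, which improves query performance
The SQL uses the full-text index for speed and combines the BM25 rank of title/tag matches with weighted content matches to improve relevance
By default, keywords are extracted from the first 2,000 characters and at most 100 keywords are used, balancing performance and precision

Disadvantages

Keywords carry no individual weights, and performance is mediocre
For better performance, the segmentation library and stopword list have to be downloaded locally beforehand
For better performance, the segmenter has to be preloaded after the page loads
SQL queries are less efficient than a vector database
Only title, tag, and content weighting are supported for now; other dimensions have to be added yourself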

Usage examples

Adding it to the document toolbar (beginners can simply use this code)

The complete code is here:

https://gitee.com/wish163/mysoft/blob/main/%E6%80%9D%E6%BA%90/%E6%9F%A5%E7%9C%8B%E5%BD%93%E5%89%8D%E6%96%87%E6%A1%A3%E7%9B%B8%E4%BC%BC%E6%96%87%E7%AB%A0%E5%88%97%E8%A1%A8.js

Implementation with the Bookmark+ (书签+) plugin

(Prerequisite: getSimilarDocs has already been added to a code snippet or placed before this code.)

// Prerequisite: put getSimilarDocs into a code snippet first, or paste it here (omitted for brevity)
async function main() {
    let id = '{{CurDocId}}';
    if(/^null$/i.test(id)) return;
    return await getSimilarDocs(id, 50);
}
return main();

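Embed query implementation

First install the snippet from the post "[js] Goodbye select * from blocks! Multi-field queries in embed blocks are here".

Then put the code below into the embed block's query (prerequisite: getSimilarDocs has already been added to a code snippet or placed before this code).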

-- js
// Prerequisite: put getSimilarDocs into a code snippet first, or paste it here (omitted for brevity)
return pick(await getSimilarDocs(currDocId, 20), 'id__hide', 'title');

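Implementation without the full-text index

The segmentation step is omitted (see above). This is for learning and reference only; do not use it in real applications.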

WITH matched AS (
  SELECT
    root_id,
    SUM(
      (INSTR(content, '果创云') > 0)
    + (INSTR(content, 'API')     > 0)
    + (INSTR(content, '开发者')   > 0)
    ) AS match_count
  FROM blocks
  WHERE type NOT IN (
      'd','c','html','iframe','m','query_embed','tb',
      'video','audio','widget'
    )
    AND (
      INSTR(content, '果创云') > 0
      OR INSTR(content, 'API') > 0
      OR INSTR(content, '开发者') > 0
    )
  GROUP BY root_id
)
SELECT
  b.content,
  b.id AS id__hide
FROM matched m
JOIN blocks b
  ON b.id = m.root_id
ORDER BY
  m.match_count DESC
LIMIT 10;
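The keywords in this query are hard-coded. A minimal sketch of generating the same query from a keyword list, assuming the keywords come from the segmentation step described earlier and that single quotes are doubled so each keyword stays a valid SQL string literal; `buildInstrSql` is a hypothetical name. The SUM adds up keyword hits block by block, so a document with many matching blocks can outscore one that contains more distinct keywords.

```javascript
// Hypothetical generator for the INSTR-based query above (for learning only, like the query itself).
function buildInstrSql(keywords, excludedTypes, limit = 10) {
  if (!keywords.length) throw new Error('at least one keyword is required');
  // Double single quotes so each value is a valid SQL string literal
  const lit = s => `'${String(s).replace(/'/g, "''")}'`;
  const hits = keywords.map(k => `INSTR(content, ${lit(k)}) > 0`);
  // Each (INSTR(...) > 0) evaluates to 0 or 1 per block; SUM adds them over the document's blocks
  return `WITH matched AS (
  SELECT
    root_id,
    SUM(${hits.map(h => `(${h})`).join(' + ')}) AS match_count
  FROM blocks
  WHERE type NOT IN (${excludedTypes.map(lit).join(', ')})
    AND (${hits.join(' OR ')})
  GROUP BY root_id
)
SELECT
  b.content,
  b.id AS id__hide
FROM matched m
JOIN blocks b ON b.id = m.root_id
ORDER BY m.match_count DESC
LIMIT ${Number(limit)};`;
}

const sql = buildInstrSql(
  ['果创云', 'API', "O'Reilly"],
  ['d', 'c', 'html', 'iframe', 'm', 'query_embed', 'tb', 'video', 'audio', 'widget']
);
console.log(sql);
```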

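A more professional approach

See @leolee's approach in the post "I Published an hnsw Package; Give It a Try If You Need It" and implement it yourself; it is beyond the scope of this article.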

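FAQ

How can I improve performance?
Step 1: download https://jsd.onmicrosoft.cn/npm/segmentit@2.0.3/dist/umd/segmentit.min.js and https://jsd.onmicrosoft.cn/gh/goto456/stopwords@master/cn_stopwords.txt to the local /data/snippets/libs directory.
Step 2: preload the segmenter by running initSegmentit() when the page loads.
Step 3: fine-tune the getSimilarDocs call, for example the number of results, the number of keywords, and the number of characters passed to the segmenter.
What if it lags?
Optimize as described in question 1.
Does it work on mobile?
Yes. It currently works in the desktop client, browsers, phones, tablets, and so on.
I'm a beginner and can't follow this. What should I do?

No problem. Just read this: Step-by-Step Tutorial for Beginners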

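Disclaimer

The code in this article is a demo or still experimental, and is provided for learning and reference only.
Please test it thoroughly and confirm that it works before using it with care; do not use it directly in production.
I accept no responsibility for any problems caused by using the methods described in this article.

If you have any questions or suggestions for improvement, feel free to leave a comment!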

20 replies


wilsons • 4 months ago

@taobuyan

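OK, I'll integrate it into a plugin later.

If you want to use it now, you can do the following:

Basic usage:

Just put this "View similar articles from the document toolbar" JS code into a SiYuan JS snippet.

If you don't know how to install a JS snippet, see "How to use code snippets?".

Then click the "查看相似文章" (View similar articles) button in the top-right corner of the document to open the list of similar articles.

That's it; you can now use it normally.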

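Optimization:

If it lags, or you want it to run more smoothly, download the two files below and put them in the /data/snippets/libs directory of your SiYuan workspace. If the libs directory does not exist, create it manually.

File 1: https://jsd.onmicrosoft.cn/npm/segmentit@2.0.3/dist/umd/segmentit.min.js

File 2: https://jsd.onmicrosoft.cn/gh/goto456/stopwords@master/cn_stopwords.txt

Your code should now run much more smoothly.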

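Advanced optimization:

To optimize further, modify the call const data = await getSimilarDocs(docId, 50); at around line 21 of the source.

The function's parameters, from left to right, are as follows; adjust them as needed:

// docId           ID of the reference document
// showNum         number of similar articles to return, default 20
// keywordNum      maximum number of keywords used in the query
// wordNum         number of characters used for keyword extraction: 0 = all, >0 = first n characters
// titleTagWeight  weight of the title and tags, default 0.7 (70%)
// contentWeight   weight of the content, default 0.3 (30%)

Of course, if this part is unclear, just follow the two steps above, or only the first one.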

wilsons updated this reply on 2025-07-24 at 23:07:24 and again at 23:25:04.
Other replies
wilsons • 4 months ago

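Thanks for sharing. I'll skip it for now and look into it later if I need it.

It suddenly occurred to me that SiYuan's kernel should also support plugin development, so that kernel capabilities, and even the API 😄, could be extended through plugins or kernel plugins.

But that is probably hard to do with a compiled language.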

leolee • 4 months ago

SiyuanAssistantCollection/source/utils/tokenizer/jieba.js at master · leolee9086/SiyuanAssistantCollection
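It's implemented with jieba via jieba_rs_wasm, and it also keeps some dictionary statistics, so it can automatically collect statistics from your notes and update its own dictionary. Not sure whether it's useful as a reference.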

wilsons • 4 months ago

Got it. For now you can use this, which works much like what you described: View Similar Articles from the Document Toolbar.