Scraping HTML Files (Exam Papers) and Reporting Only Items Below Full Marks (AI-Assisted Programming)

张建站

2026/4/21 18:51:27

10 min read

import csv
import os
import re

file_path = './data'
# Get the list of files in the data directory
file_names = os.listdir(file_path)

# 1. Create the output file object
f = open('文件名3.csv', 'w', newline='', encoding='utf-8')
# 2. Build a csv writer on top of the file object
csv_writer = csv.writer(f)
# 3. Write the header row: user name, analysis of items below full marks
csv_writer.writerow(['用户名', '未满分题目解析'])


def extract_tigan_and_scores(html_content):
    """Extract the question stem and the scoring items for each question."""
    results = []

    # Find every question heading.
    # Assumption: the HTML tags in both patterns are reconstructed from the
    # surviving fragments (p/id/b and div); adjust them to the real markup.
    tihao_pattern = r'<p id="(第\d+题)"><b>([^<]+)</b></p>'
    tihao_matches = list(re.finditer(tihao_pattern, html_content))

    # Find every scoring item, e.g. "1. description (5) --- 3"
    score_pattern = r'<div>(\d+\.\s*[^<]+\((\d+)\)\s*---\s*(\d+))</div>'
    score_matches = list(re.finditer(score_pattern, html_content))

    # Attach each scoring item to the question it belongs to
    for score_match in score_matches:
        score_start = score_match.start()
        score_text = score_match.group(1)
        full_score = int(score_match.group(2))
        actual_score = int(score_match.group(3))

        # Walk backwards to the nearest preceding question heading
        corresponding_tihao = '未知题目'  # "unknown question"
        for tihao_match in reversed(tihao_matches):
            if tihao_match.end() <= score_start:
                corresponding_tihao = tihao_match.group(2).replace('&nbsp;', ' ')
                break

        results.append({
            'tihao': corresponding_tihao,
            'score_text': score_text,
            'full_score': full_score,
            'actual_score': actual_score,
        })
    return results


for file_name in file_names:
    # Only process .html files
    if not file_name.endswith('.html'):
        continue
    file_full_path = os.path.join(file_path, file_name)

    # The user name is the exam number inside square brackets, e.g. 240701
    username_match = re.search(r'\[(\d+)\]', file_name)
    if username_match:
        yonghuming = username_match.group(1)
    else:
        yonghuming = file_name[:6]  # Fallback: first six characters

    with open(file_full_path, 'r', encoding='utf-8') as f1:
        html_content = f1.read()

    # Pair every scoring item with its question
    all_items = extract_tigan_and_scores(html_content)

    # Keep only the items below full marks, together with the question stem
    not_full_items = []
    for item in all_items:
        if item['actual_score'] < item['full_score']:
            # Format: 【stem】scoring item (full X, earned Y)
            formatted = (
                f"【{item['tihao']}】{item['score_text']} "
                f"(满分{item['full_score']}实得{item['actual_score']})"
            )
            not_full_items.append(formatted)

    # Join the items into one string separated by semicolons
    if not_full_items:
        jiexi = '; '.join(not_full_items)
    else:
        jiexi = '全部满分'  # "all full marks"

    # 4. Write the row to the csv file
    csv_writer.writerow([yonghuming, jiexi])

# 5. Close the file
f.close()
print('ok')
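A minimal sketch of the matching step, run on an inline HTML fragment instead of files in ./data, may help when checking the patterns against a real exam page. The fragment, the tag layout the patterns expect, and the helper names `find_lost_marks` and `exam_id_from_name` are assumptions made for illustration. Only the `第N题` ids, the `(full) --- actual` score format, and a bracketed exam number such as 240701 come from the script.

```python
import re

# Hypothetical graded-paper fragment; the real markup may differ.
SAMPLE_HTML = (
    '<p id="第1题"><b>Create and save a workbook</b></p>'
    '<div>1. File name is correct (5) --- 5</div>'
    '<div>2. Saved to the right folder (5) --- 3</div>'
    '<p id="第2题"><b>Format cells</b></p>'
    '<div>1. Font is SimSun (4) --- 0</div>'
)

# Assumed patterns: a question heading, and "N. text (full) --- actual"
TIHAO_PATTERN = re.compile(r'<p id="(第\d+题)"><b>([^<]+)</b></p>')
SCORE_PATTERN = re.compile(r'<div>(\d+\.\s*[^<]+\((\d+)\)\s*---\s*(\d+))</div>')


def find_lost_marks(html):
    """Return (stem, item text, full, actual) for items below full marks."""
    questions = list(TIHAO_PATTERN.finditer(html))
    lost = []
    for score in SCORE_PATTERN.finditer(html):
        full, actual = int(score.group(2)), int(score.group(3))
        if actual >= full:
            continue  # full marks: nothing to report
        stem = 'unknown question'
        # The nearest heading that ends before this item owns it
        for question in reversed(questions):
            if question.end() <= score.start():
                stem = question.group(2).replace('&nbsp;', ' ')
                break
        lost.append((stem, score.group(1), full, actual))
    return lost


def exam_id_from_name(file_name):
    """Take the digits inside [...]; fall back to the first six characters."""
    match = re.search(r'\[(\d+)\]', file_name)
    return match.group(1) if match else file_name[:6]


if __name__ == '__main__':
    for row in find_lost_marks(SAMPLE_HTML):
        print(row)
    print(exam_id_from_name('report[240701].html'))
```

Comparing heading and item positions with `<=` rather than `<` keeps the pairing correct when a scoring `<div>` follows its heading with no whitespace in between, as in the sample above.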
