Title: Titlebook: Document Analysis and Recognition - ICDAR 2024; 18th International C Elisa H. Barney Smith, Marcus Liwicki, Liangrui Peng Conference proceedi
Author: 側面上下 Time: 2025-3-21 16:47
Book title: Document Analysis and Recognition - ICDAR 2024; impact factor (influence)
Book title: Document Analysis and Recognition - ICDAR 2024; impact factor (influence) subject ranking
Book title: Document Analysis and Recognition - ICDAR 2024; online visibility
Book title: Document Analysis and Recognition - ICDAR 2024; online visibility subject ranking
Book title: Document Analysis and Recognition - ICDAR 2024; citation count
Book title: Document Analysis and Recognition - ICDAR 2024; citation count subject ranking
Book title: Document Analysis and Recognition - ICDAR 2024; annual citations
Book title: Document Analysis and Recognition - ICDAR 2024; annual citations subject ranking
Book title: Document Analysis and Recognition - ICDAR 2024; reader feedback
Book title: Document Analysis and Recognition - ICDAR 2024; reader feedback subject ranking
Author: 談判 Time: 2025-3-21 23:26
Author: Pigeon Time: 2025-3-22 02:41
Author: 建筑師 Time: 2025-3-22 05:49
Author: oxidant Time: 2025-3-22 11:48
Doc-DINO: A Transformer Model for Complex Logical Document Layout Analysis
achine translation, document information retrieval, and structured data extraction from documents. However, most publicly available datasets in the field of layout analysis primarily consist of documents with a single layout type, are in the English language, and are limited to PDF documents. In thi
Author: probate Time: 2025-3-22 16:41
Author: probate Time: 2025-3-22 20:07
Author: 辭職 Time: 2025-3-22 21:31
Author: 皮薩 Time: 2025-3-23 04:42
Author: Genteel Time: 2025-3-23 06:28
Author: Anthem Time: 2025-3-23 09:56
Author: 權宜之計 Time: 2025-3-23 14:54
Author: 不給啤 Time: 2025-3-23 20:35
Author: deriver Time: 2025-3-24 02:10
Deep Learning-Driven Innovative Model for Generating Functional Knowledge Units
scribe the functional design knowledge. Nowadays, the acquisition of functional units is mainly manual, which is time-consuming and labor-intensive. Functional knowledge integration is an effective way to achieve innovation design, yet the insufficient functional units cannot effectively support the
Author: Latency Time: 2025-3-24 05:55
Global-SEG: Text Semantic Segmentation Based on Global Semantic Pair Relations
c blocks. This paper introduces a new perspective on this task by utilizing global semantic pair relations from both token- and sentence-level language models. This approach addresses the limitations of prior work, which concentrated solely on individual semantic units like sentences. Our model proc
Author: 皺痕 Time: 2025-3-24 07:35
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
understanding (VDU) tasks. Currently, there is a reliance on large document foundation models that offer advanced capabilities but come with a heavy computational burden. In this paper, we propose a multimodal early exit (EE) model design that incorporates various training strategies, exit layer ty
Author: AER Time: 2025-3-24 13:27
Author: OPINE Time: 2025-3-24 15:36
0302-9743
ng document semantics; NLP for document understanding; office automation; graphics recognition; human document interaction; document representation modeling and much more...
978-3-031-70545-8 978-3-031-70546-5 Series ISSN 0302-9743 Series E-ISSN 1611-3349
Author: 結束 Time: 2025-3-24 21:31
hout any specific pre-training and with fewer parameters. We also report an extensive ablation study performed on FUNSD, highlighting the great impact of certain features and modeling choices on performance.
Author: Blanch Time: 2025-3-24 23:50
Author: dithiolethione Time: 2025-3-25 04:55
Conference proceedings 2024
ndwriting recognition; document analysis systems; document classification; indexing and retrieval of documents; document synthesis; extracting document semantics; NLP for document understanding; office automation; graphics recognition; human document interaction; document representation modeling and much more...
Author: flex336 Time: 2025-3-25 09:06
Conference proceedings 2024
4, held in Athens, Greece, during August 30–September 4, 2024. A total of 144 full papers presented in these proceedings were carefully selected from 263 submissions. The papers reflect topics such as: document image processing; physical and logical layout analysis; text and symbol recognition; ha
Author: Ornament Time: 2025-3-25 12:09
Author: 索賠 Time: 2025-3-25 18:33
https://doi.org/10.1007/978-3-031-70546-5
Document Analysis Systems; Handwriting Recognition; Scene Text Detection and Recognition; Document Imag
Author: FELON Time: 2025-3-26 00:02
978-3-031-70545-8
The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerl
Author: 蛙鳴聲 Time: 2025-3-26 01:30
Author: TAG Time: 2025-3-26 06:52
Author: Legion Time: 2025-3-26 10:57
Author: 看法等 Time: 2025-3-26 12:59
https://doi.org/10.1007/978-94-009-6300-9
various elements within document images, such as text, images, tables, and headings. The approach employs an advanced Transformer-based object detection network as an innovative graphical page object detector for identifying tables, figures, and displayed elements. We introduce a query encoding mech
Author: 調色板 Time: 2025-3-26 19:57
Gerlinde Strohmaier-Wiederanders
ument summarization, knowledge extraction, etc. However, previous studies have typically used separate models to address individual sub-tasks within DLA, including table/figure detection, text region detection, logical role classification, and reading order prediction. In this work, we propose an en
Author: START Time: 2025-3-26 23:13
challenge. Contemporary OMR techniques, grounded in machine learning principles, have a critical requirement: a labeled dataset for training. This presents a practical challenge due to the extensive manual effort required, coupled with the fact that the availability of suitable data for creating tr
Author: corn732 Time: 2025-3-27 02:03
Author: 束以馬具 Time: 2025-3-27 07:07
https://doi.org/10.1007/978-3-030-68375-7
documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating “the right to be forgotten” has recently been proposed, allowing users to request
Author: Allege Time: 2025-3-27 12:52
Author: 空洞 Time: 2025-3-27 16:29
e current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in the visual recognition domain. We provide a comprehe
Author: 開頭 Time: 2025-3-27 18:48
s and tasks, including document-specific ones. On the other hand, there is a trend to train multi-modal transformer architectures tailored for document understanding that are designed specifically to fuse textual inputs with the corresponding document layout. This involves a separate fine-tuning ste
Author: SEED Time: 2025-3-27 23:43
Author: bizarre Time: 2025-3-28 04:55
https://doi.org/10.1007/BFb0048530
teps, such as layout analysis and optical character recognition (OCR), for information extraction from document images. We attempt to provide some answers through experiments conducted on a new database of food labels. The goal is to extract nutritional values from cellphone pictures taken in grocer
Author: glucagon Time: 2025-3-28 09:59
classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant models on document understanding (DU) tasks that
Author: 輕打 Time: 2025-3-28 10:50
https://doi.org/10.1007/978-3-642-71896-0
for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different approaches – Feature Quantization, VQ-VAE, and Post
Author: Amendment Time: 2025-3-28 18:05
Author: Obedient Time: 2025-3-28 20:19
Author: aqueduct Time: 2025-3-29 02:14
Within the Box: Captives of Our Own Mind,
understanding (VDU) tasks. Currently, there is a reliance on large document foundation models that offer advanced capabilities but come with a heavy computational burden. In this paper, we propose a multimodal early exit (EE) model design that incorporates various training strategies, exit layer ty
Author: GRE Time: 2025-3-29 03:38
ncing performance in relation extraction tasks by leveraging dependency trees. However, noise in automatically generated dependency trees poses a challenge to using syntactic dependency information effectively. In this paper, we propose an Adaptive Graph Attention Network model based on Dependency T
Author: 易受刺激 Time: 2025-3-29 09:55
Author: Abduct Time: 2025-3-29 13:58
A Hybrid Approach for Document Layout Analysis in Document Images
d PubTables benchmarks show that our approach outperforms current state-of-the-art methods. It achieves an average precision of . on PubLayNet, . on DocLayNet, and . on PubTables, demonstrating its superior performance in layout analysis. These advancements not only enhance the conversion of documen
Author: 勾引 Time: 2025-3-29 18:24
DLAFormer: An End-to-End Transformer For Document Layout Analysis
iple tasks concurrently. Additionally, we introduce a novel set of . to enhance the physical meaning of content queries in DETR. Moreover, we adopt a coarse-to-fine strategy to accurately identify graphical page objects. Experimental results demonstrate that our proposed DLAFormer outperforms previo
Author: Reclaim Time: 2025-3-29 21:46
A Region-Based Approach for Layout Analysis of Music Score Images in Scarce Data Scenarios
nimal labeled data necessary for an effective model and demonstrated that our method could achieve performance comparable to the state of the art with just 8 to 32 labeled samples. The implications of our research extend beyond improving LA, providing a scalable and practical solution for digiti
Author: SPASM Time: 2025-3-30 03:09
Doc-DINO: A Transformer Model for Complex Logical Document Layout Analysis
ncludes convolutional attention and convolutional feedforward networks to better capture relationships between inputs and enhance the model’s expressive power. The model achieves a mean Average Precision (mAP) of 65.7 on the complex document layout analysis dataset M6Doc and 64.2 on SCUT-CAB, settin
Author: irreparable Time: 2025-3-30 06:12
Author: Dri727 Time: 2025-3-30 09:00
Author: 小教堂 Time: 2025-3-30 12:32
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
tent module’ designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP’s text and image features using a novel ‘coupled-contrastive’ loss. Our module improves CLIP’s ZSL top-1 accuracy by 6.7% and GZSL harmonic m
Author: pulse-pressure Time: 2025-3-30 18:02
Author: Heart-Attack Time: 2025-3-30 22:38
Are Layout Analysis and OCR Still Useful for Document Information Extraction Using Foundation Models
food label, and a small crop focusing on the relevant nutrition information. Comparative experiments are also conducted on the CORD database of receipts. Our results demonstrate that although OCR-free models achieve remarkable performance, they still require some guidance regarding the layout, and
Author: 去才蔑視 Time: 2025-3-31 01:23
: Knowledge Distillation for Visually-Rich Document Applications
ess of distilled DLA models on zero-shot layout-aware document visual question answering (DocVQA). DLA-KD experiments result in a large mAP knowledge gap, which unpredictably translates to downstream robustness, accentuating the need to further explore how to efficiently obtain more semantic documen
Author: 媒介 Time: 2025-3-31 09:05
Author: ellagic-acid Time: 2025-3-31 11:13
Author: canonical Time: 2025-3-31 13:23
Global-SEG: Text Semantic Segmentation Based on Global Semantic Pair Relations
from large language models and consider the positional information of text within the document to assess their efficacy in augmenting semantics. We test our model with both contemporary and historical corpora, and the results demonstrate that our approach outperforms benchmarks on each dataset.