Github Action Automatic Update Diffusion NLP Arxiv Papers

bansky-cl · bansky-cl · commit 740cfd87a110 · 2025-11-17T08:30:28.000Z
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@
 
 For more carefully curated articles, you can refer to this [repository](https://github.com/bansky-cl/Diffusion_NLP_Papers).
 
-## Updated on 2025.11.16
+## Updated on 2025.11.17
 
 ![Monthly Trend](imgs/trend.png)
 
@@ -246,7 +246,7 @@ For more carefully curated articles, you can refer to this [repository](https://
 |**2022-11-01**|**DiffusER: Discrete Diffusion via Edit-based Reconstruction**|cs.CL, cs.LG| <details><summary>Full Abstract</summary>In text generation, models that generate text from scratch one token at a time are currently the dominant paradigm. Despite being performant, these models lack the ability to revise existing text, which limits their usability in many practical scenarios. We look to address this, with DiffusER (Diffusion via Edit-based Reconstruction), a new edit-based generative model for text based on denoising diffusion models -- a class of models that use a Markov chain of denoising steps to incrementally generate data. DiffusER is not only a strong generative model in general, rivalling autoregressive models on several tasks spanning machine translation, summarization, and style transfer; it can also perform other varieties of generation that standard autoregressive models are not well-suited for. For instance, we demonstrate that DiffusER makes it possible for a user to condition generation on a prototype, or an incomplete sequence, and continue revising based on previous edit steps.</details>|[2210.16886v1](http://arxiv.org/abs/2210.16886v1)| null|
 |**2023-02-01**|**Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences**|cs.LG, cs.CL|<details><summary>Full Abstract</summary>Efficient Transformers have been developed for long sequence modeling, due to their subquadratic memory and time complexity. Sparse Transformer is a popular approach to improving the efficiency of Transformers by restricting self-attention to locations specified by the predefined sparse patterns. However, leveraging sparsity may sacrifice expressiveness compared to full-attention, when important token correlations are multiple hops away. To combine advantages of both the efficiency of sparse transformer and the expressiveness of full-attention Transformer, we propose \textit{Diffuser}, a new state-of-the-art efficient Transformer. Diffuser incorporates all token interactions within one attention layer while maintaining low computation and memory costs. The key idea is to expand the receptive field of sparse attention using Attention Diffusion, which computes multi-hop token correlations based on all paths between corresponding disconnected tokens, besides attention among neighboring tokens. Theoretically, we show the expressiveness of Diffuser as a universal sequence approximator for sequence-to-sequence modeling, and investigate its ability to approximate full-attention by analyzing the graph expander property from the spectral perspective. Experimentally, we investigate the effectiveness of Diffuser with extensive evaluations, including language modeling, image modeling, and Long Range Arena (LRA). Evaluation results show that Diffuser achieves improvements by an average of 0.94% on text classification tasks and 2.30% on LRA, with 1.67$\times$ memory savings compared to state-of-the-art benchmarks, which demonstrates superior performance of Diffuser in both expressiveness and efficiency aspects.</details>|[2210.11794v2](http://arxiv.org/abs/2210.11794v2)|**[link](https://github.com/asFeng/Diffuser)**|
 
-<p align=right>(<a href=#Updated-on-20251116>back to top</a>)</p>
+<p align=right>(<a href=#Updated-on-20251117>back to top</a>)</p>
 
 [contributors-shield]: https://img.shields.io/github/contributors/bansky-cl/diffusion-nlp-paper-arxiv.svg?style=for-the-badge
 [contributors-url]: https://github.com/bansky-cl/diffusion-nlp-paper-arxiv/graphs/contributors