first real commit

pull/25/head
Noam Ross 1 year ago
parent
commit
12a7884f1c
41 changed files with 1513 additions and 0 deletions
  1. .Rbuildignore (+22 -0)
  2. .github/CODE_OF_CONDUCT.md (+96 -0)
  3. .github/CONTRIBUTING.md (+21 -0)
  4. .github/ISSUE_TEMPLATE.md (+12 -0)
  5. .gitignore (+8 -0)
  6. .travis.yml (+25 -0)
  7. CONTRIBUTING.md (+61 -0)
  8. DESCRIPTION (+28 -0)
  9. LICENSE (+2 -0)
  10. NAMESPACE (+31 -0)
  11. NEWS.md (+3 -0)
  12. R/docx-utils.R (+49 -0)
  13. R/docx_reversible.R (+95 -0)
  14. R/extract.R (+194 -0)
  15. R/officer-embed.R (+23 -0)
  16. R/officer-modify_style.R (+48 -0)
  17. R/parser.R (+254 -0)
  18. R/redoc-package.R (+16 -0)
  19. README.Rmd (+94 -0)
  20. _pkgdown.yml (+21 -0)
  21. codecov.yml (+12 -0)
  22. inst/criticmarkup.lua (+32 -0)
  23. inst/revchunks.lua (+33 -0)
  24. inst/rmarkdown/templates/rdocx_reversible/skeleton/.gitignore (+5 -0)
  25. inst/rmarkdown/templates/rdocx_reversible/skeleton/skeleton.Rmd (+49 -0)
  26. inst/rmarkdown/templates/rdocx_reversible/skeleton/skeleton.docx (BIN)
  27. inst/rmarkdown/templates/rdocx_reversible/template.yaml (+5 -0)
  28. man/is_redoc.Rd (+22 -0)
  29. man/rdocx_reversible.Rd (+21 -0)
  30. man/redoc.Rd (+22 -0)
  31. man/redoc_example_docx.Rd (+14 -0)
  32. man/redoc_example_rmd.Rd (+14 -0)
  33. man/redoc_extract_rmd.Rd (+40 -0)
  34. man/undoc.Rd (+43 -0)
  35. redoc.Rproj (+21 -0)
  36. tests/testthat.R (+4 -0)
  37. tests/testthat/.gitignore (+3 -0)
  38. tests/testthat/test-reverse.R (+9 -0)
  39. tests/testthat/test.docx (BIN)
  40. vignettes/.gitignore (+2 -0)
  41. vignettes/redoc-package-design.Rmd (+59 -0)

+ 22
- 0
.Rbuildignore

@@ -0,0 +1,22 @@
^Meta$
^doc$
^CRAN-RELEASE$
^LICENSE\.md$
^docs$
^_pkgdown\.yml$
^pkgdown$
^\.github$
^CODE_OF_CONDUCT\.md$
^codecov\.yml$
^.*\.Rproj$
^\.Rproj\.user$
^\.travis\.yml$
^\.httr-oauth$
tests/testthat/token_file.enc
^appveyor\.yml$
^README\.Rmd$
^README-.*\.png$
^\.V8history$
^TODO\.md$
^CONTRIBUTING\.md$
^inst/endnote$

+ 96
- 0
.github/CODE_OF_CONDUCT.md

@@ -0,0 +1,96 @@
# Contributor Covenant Code of Conduct

[Bosnian](http://contributor-covenant.org/version/1/4/bs/)
| [Deutsch](http://contributor-covenant.org/version/1/4/de/)
| [ελληνικά](http://contributor-covenant.org/version/1/4/el/)
| [English](http://contributor-covenant.org/version/1/4/)
| [Español](http://contributor-covenant.org/version/1/4/es/)
| [Français](http://contributor-covenant.org/version/1/4/fr/)
| [Italiano](http://contributor-covenant.org/version/1/3/0/it/)
| [日本語](http://contributor-covenant.org/version/1/3/0/ja/)
| [Magyar](http://contributor-covenant.org/version/1/3/0/hu/)
| [Nederlands](http://contributor-covenant.org/version/1/4/nl/)
| [Polski](http://contributor-covenant.org/version/1/4/pl/)
| [Português](http://contributor-covenant.org/version/1/4/pt/)
| [Português do Brasil](http://contributor-covenant.org/version/1/4/pt_br/)
| [Pусский](http://contributor-covenant.org/version/1/3/0/ru/)
| [Română](http://contributor-covenant.org/version/1/4/ro/)
| [Svenska](http://contributor-covenant.org/version/1/4/sv/)
| [Slovenščina](http://contributor-covenant.org/version/1/4/sl/)
| [Türkçe](http://contributor-covenant.org/version/1/4/tr/)
| [Українська](http://contributor-covenant.org/version/1/4/uk/)
| [한국어](http://contributor-covenant.org/version/1/4/ko/)


## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at ross@ecohealthalliance.org. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]

[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/

+ 21
- 0
.github/CONTRIBUTING.md

@@ -0,0 +1,21 @@
# Contributing to the redoc R package

You want to contribute to **redoc**? Great!

Please submit questions, bug reports, and requests in the [issues tracker](https://github.com/noamross/redoc/issues). Please submit bug
reports with a minimal [reprex](https://www.tidyverse.org/help/#reprex).

If you plan to contribute code, go ahead and fork the repo and submit a pull request. A few notes:

- This package is released with a [Contributor Code of Conduct](.github/CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms. Why? We want contribution to be enjoyable and rewarding for everyone!
- If you have a large change, please open an issue first to discuss.
- I'll generally include contributors as authors in the DESCRIPTION file (with
their permission) for most contributions that go beyond small typos in code or documentation.
- This package generally uses the [rOpenSci packaging guidelines](https://github.com/ropensci/onboarding/blob/master/packaging_guide.md) for style and structure.
- Documentation is generated by **roxygen2**. Please write documentation in code files and let it auto-generate documentation files. We use a recent version so documentation may be [written in markdown](https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html).
- We aim for testing that has high coverage and is robust. Include tests with
any major contribution to code. Test your changes to the package with [**goodpractice**](https://cran.r-project.org/web/packages/goodpractice/index.html) before
submitting your change.
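
As a rough illustration of the documentation and testing notes above, here is a minimal sketch of a function documented with roxygen2 markdown syntax. The function `count_words()` is hypothetical and not part of **redoc**; markdown rendering also assumes the package opts in via `Roxygen: list(markdown = TRUE)` in DESCRIPTION.

```r
#' Count words in a string
#'
#' Splits `x` on whitespace and returns the number of non-empty pieces.
#' Markdown such as `code` and **bold** is rendered in the help page when
#' roxygen2's markdown support is enabled.
#'
#' @param x A character vector of length one.
#' @return An integer word count.
#' @export
count_words <- function(x) {
  # Hypothetical example function, used only to show the documentation style
  pieces <- strsplit(trimws(x), "\\s+")[[1]]
  length(pieces[nzchar(pieces)])
}

# A matching test would normally live in tests/testthat/ (assumes testthat):
# test_that("count_words counts whitespace-separated words", {
#   expect_equal(count_words("reversible  reproducible documents"), 3L)
# })
n_words <- count_words("reversible  reproducible documents")
```

Keeping the documentation next to the code means the generated `man/*.Rd` files never need to be edited by hand.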


## Roadmap

+ 12
- 0
.github/ISSUE_TEMPLATE.md

@@ -0,0 +1,12 @@
<!-- If this issue relates to usage of the package, whether a question, bug or similar, along with your query, please paste your devtools::session_info() or sessionInfo() into the code block below, AND include a reproducible example (consider using a "reprex" https://cran.rstudio.com/web/packages/reprex/) If not, delete all this and proceed :) -->

<!-- Note that if the aml-to-json parser is not behaving as you expect, the issue should be filed at https://github.com/newsdev/archieml-js/issues -->

---

<details> <summary><strong>Session Info</strong></summary>

```r

```
</details>

+ 8
- 0
.gitignore

@@ -0,0 +1,8 @@
Meta
doc
.Rproj.user
.Rhistory
.RData
.DS_store
inst/endnote
docs

+ 25
- 0
.travis.yml

@@ -0,0 +1,25 @@
# R for travis: see documentation at https://docs.travis-ci.com/user/languages/r

sudo: false
cache: packages
language: r
r:
- oldrel
- release
- devel

r_github_packages:
- r-lib/pkgdown

after_success:
- Rscript -e 'covr::codecov()'

r_check_args: "--as-cran --run-dontrun"

deploy:
  provider: script
  script: Rscript -e 'pkgdown::deploy_site_github(new_process=FALSE)' || true
  skip_cleanup: true
  on:
    branch: master
    condition: "$TRAVIS_R_VERSION_STRING = release"

+ 61
- 0
CONTRIBUTING.md

@@ -0,0 +1,61 @@
Contributing to the rchie R package
===================================

You want to contribute to **rchie**? Great!

Please submit questions, bug reports, and requests in the [issues
tracker](https://github.com/ecohealthalliance/fasterize/issues). Please submit
bug reports with a minimal [reprex](https://www.tidyverse.org/help/#reprex).

Note that this package wraps
[archieml-js](https://github.com/newsdev/archieml-js), by the New York Times so
many actual parser errors are best reported there. Suggestions to change the
ArchieML [spec](http://archieml.org/spec/1.0/CR-20151015.html) should be
discussed under the issues at <https://github.com/newsdev/archieml.org>.

If you plan to contribute code, go ahead and fork the repo and submit a pull
request. A few notes:

- This package is released with a [Contributor Code of
Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to
abide by its terms. Why? We want contribution to be enjoyable and rewarding
for everyone!
- If you have a large change, please open an issue first to discuss.
- I'll generally include contributors as authors in the DESCRIPTION file (with
their permission) for most contributions that go beyond small typos in code
or documentation.
- This package generally uses the [rOpenSci packaging
guidelines](https://ropensci.github.io/dev_guide/) for style and structure.
- Documentation is generated by **roxygen2**. Please write documentation in
code files and let it auto-generate documentation files. We use a recent
version so documentation may be [written in
markdown](https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html).
- We aim for testing that has high coverage and is robust. Include tests with
any major contribution to code. Test your changes to the package with
[**goodpractice**](https://github.com/MangoTheCat/goodpractice) before
submitting your change.
- I'm avoiding too many dependencies. If there's some fancy extension you'd
like to add, make it optional using `if (requireNamespace(...))` statements and
placing the dependency in `Suggests:`, as is currently done for the Google
Drive functionality.
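
The optional-dependency note above can be sketched roughly as follows. The helper `to_json_text()` and the choice of **jsonlite** as the optional package are illustrative assumptions, not code from this repository; the package would be listed under `Suggests:` in DESCRIPTION.

```r
# Hypothetical helper: use an optional package only when it is installed.
to_json_text <- function(x) {
  if (requireNamespace("jsonlite", quietly = TRUE)) {
    # Optional path: jsonlite is available, so use it
    as.character(jsonlite::toJSON(x, auto_unbox = TRUE))
  } else {
    # Fallback path: base R only, so the function still works without jsonlite
    paste(deparse(x), collapse = "")
  }
}

out <- to_json_text(list(a = 1, b = "two"))
```

`requireNamespace()` returns `FALSE` instead of raising an error when the package is missing, which is what makes the fallback branch reachable.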

Roadmap
-------

I don't have big plans to expand **rchie**, but here are some things that might
happen at some point if someone is inclined to tackle them.

- I'd love to have some more examples. If you have a project that uses
**rchie** that you'd like to share, please let me know! I'll link to it
or perhaps make a vignette.
- It might be fun to build a pure R or C++ parser and not require V8 or any
system requirements.
- It would be useful to parse value fields in Google Docs so as to get links
(as this archieml-js
[example](https://github.com/newsdev/archieml-js/blob/master/examples/google_drive.js)),
or even translate other formatting to markdown. This would require
downloading the Google Doc as HTML, parsing and converting it.
- Since the package already contains the Javascript of archieml-js, it might
be neat to create an htmlwidget that injects this into the page of an R
Markdown or Shiny app, and makes the data from a Google Doc or other source
available so the page live-updates.

+ 28
- 0
DESCRIPTION

@@ -0,0 +1,28 @@
Package: redoc
Version: 0.0.0.9000
Title: Reversible Reproducible Documents
Description: Implements a reversible 'R Markdown' to 'Microsoft Word' pipeline.
Authors@R: person(given = "Noam",
    family = "Ross",
    email = 'noam.ross@gmail.com',
    role = c('aut', 'cre'),
    comment = c(ORCID='0000-0002-2136-0000'))
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
ByteCompile: true
URL: https://github.com/noamross/redoc
BugReports: https://github.com/noamross/redoc/issues
Suggests:
    testthat,
    roxygen2
RoxygenNote: 6.1.1
Imports:
    rmarkdown,
    officer,
    tools,
    knitr,
    mime,
    stringi,
    xml2
VignetteBuilder: knitr

+ 2
- 0
LICENSE

@@ -0,0 +1,2 @@
YEAR: 2018
COPYRIGHT HOLDER: Noam Ross

+ 31
- 0
NAMESPACE

@@ -0,0 +1,31 @@
# Generated by roxygen2: do not edit by hand

export(is_redoc)
export(rdocx_reversible)
export(redoc_example_docx)
export(redoc_example_rmd)
export(redoc_extract_rmd)
export(undoc)
importFrom(knitr,all_patterns)
importFrom(knitr,opts_knit)
importFrom(mime,guess_type)
importFrom(officer,read_docx)
importFrom(officer,styles_info)
importFrom(rmarkdown,output_format)
importFrom(rmarkdown,pandoc_convert)
importFrom(rmarkdown,word_document)
importFrom(stringi,stri_detect_fixed)
importFrom(stringi,stri_detect_regex)
importFrom(stringi,stri_locate_all_regex)
importFrom(stringi,stri_match_all_regex)
importFrom(stringi,stri_replace_all_fixed)
importFrom(stringi,stri_replace_all_regex)
importFrom(stringi,stri_replace_first_fixed)
importFrom(stringi,stri_split_lines1)
importFrom(stringi,stri_trim_both)
importFrom(tools,file_path_sans_ext)
importFrom(xml2,read_xml)
importFrom(xml2,write_xml)
importFrom(xml2,xml_add_child)
importFrom(xml2,xml_find_first)
importFrom(xml2,xml_set_attrs)

+ 3
- 0
NEWS.md

@@ -0,0 +1,3 @@
# redoc 0.0.0.9000

* Initial commit

+ 49
- 0
R/docx-utils.R

@@ -0,0 +1,49 @@
#' Is this a reversible document?
#'
#' A function for testing if the file can be un-knit. If not, un-knitting
#' may be attempted with the `orig_chunkfile` or `orig_docx` files in [undoc()].
#'
#' @param docx A path to a `.docx` file or an `rdocx` object produced by
#' [officer::read_docx()]
#' @return a logical value
#' @export
#' @examples
#' is_redoc(redoc_example_docx())
is_redoc <- function(docx) {
docx <- to_docx(docx)
chunkfile <- list.files(docx$package_dir, pattern = "\\.chunks\\.csv$")
return(as.logical(length(chunkfile)))
}

#' @importFrom officer read_docx
to_docx <- function(docx) {
if (inherits(docx, "rdocx")) {
return(docx)
} else {
return(read_docx(docx))
}
}

assert_redoc <- function(docx) {
if (!is_redoc(docx)) {
stop(deparse(substitute(docx)), " is not a reversible document")
}
}

#' Path to an example R Markdown file
#' @export
#' @examples
#' redoc_example_rmd()
redoc_example_rmd <- function() {
system.file("rmarkdown", "templates", "rdocx_reversible", "skeleton",
"skeleton.Rmd", package = "redoc")
}

#' Path to an example Reversible Microsoft Word file
#' @export
#' @examples
#' redoc_example_docx()
redoc_example_docx <- function() {
system.file("rmarkdown", "templates", "rdocx_reversible", "skeleton",
"skeleton.docx", package = "redoc")
}
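
The assertion helper in this file builds its error message from the expression the caller passed in. A minimal base R sketch of that `deparse(substitute())` pattern follows; `assert_existing_file()` is a hypothetical stand-in, not a function from **redoc**.

```r
# Hypothetical helper: name the caller's expression in the error message.
assert_existing_file <- function(path) {
  if (!file.exists(path)) {
    # substitute() captures the unevaluated argument; deparse() turns it
    # into a string. The message text is a separate argument to stop().
    stop(deparse(substitute(path)), " does not point to an existing file",
         call. = FALSE)
  }
  invisible(TRUE)
}

missing_doc <- file.path(tempdir(), "no-such-file.docx")
msg <- tryCatch(assert_existing_file(missing_doc),
                error = function(e) conditionMessage(e))
```

Note that the second argument of `deparse()` is `width.cutoff`, so the message text has to sit outside the `deparse()` call.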

+ 95
- 0
R/docx_reversible.R

@@ -0,0 +1,95 @@
#' Convert to a Reversible Microsoft Word Document
#'
#' Format for converting from R Markdown to a Microsoft Word Document that can
#' be reversed using [undoc()] after editing in Word.
#'
#' @param highlight_outputs whether to highlight outputs from chunks and inline
#' code in the final document
#' @param wrap when round-tripping the document, at what width to wrap the
#' markdown output? See [undoc()].
#' @param ... other parameters passed to [rmarkdown::word_document()]
#' @importFrom rmarkdown output_format word_document
#' @importFrom officer read_docx
#' @importFrom tools file_path_sans_ext
#' @importFrom rmarkdown word_document
#' @export
rdocx_reversible <- function(highlight_outputs = FALSE, wrap = 80, ...) {

out <- word_document(
md_extensions = c("+fenced_divs", "+bracketed_spans"),
...)

out$knitr <- rmarkdown::knitr_options(
# Wrap code outputs in spans and divs
knit_hooks = list(
inline = function(x) {
id = paste0("inline-", inline_counter())
paste0("[", x, "]{custom-style=\"", id, "\"}")},
chunk = function(x, options) {
if (isFALSE(options$redoc_include)) {
# Special output for empty chunks
# TODO: move empty chunk handler to a lua filter to make more general
paste0("```{=openxml}\n<w:p><w:pPr><w:pStyle w:val=\"chunk-",
options$label,
"\"/><w:rPr><w:vanish/></w:rPr></w:pPr></w:p>\n```")
} else {
paste0("::: {custom-style=\"chunk-", options$label, "\"}\n",
x,
"\n:::")
}
}
),
opts_hooks = list(
include = function(options) {
if (isFALSE(options$include)) {
options$include <- TRUE
options$redoc_include <- FALSE
}
options
}
)
)

# Pre-parse, name inline chunks and save chunk contents to lookup table
out$pre_knit <- function(input, ...) {
utils::write.table(parse_rmd_to_df(input),
file = paste0(file_path_sans_ext(input), ".chunks.csv"),
sep = ",", row.names = FALSE, qmethod = "double")
inline_counter(reset = TRUE)
chunk_counter(reset = TRUE)
}

out$post_processor <-
function(metadata, input_file, output_file, clean, verbose) {
docx <- read_docx(output_file)
rmd_input <- get(envir = parent.frame(n = 1), "original_input")
chunkfile <- paste0(file_path_sans_ext(rmd_input), ".chunks.csv")
tmpd <- tempdir()

orig_rmd <- file.path(tmpd,
paste0(file_path_sans_ext(basename(rmd_input)),
".original.Rmd"))
file.copy(rmd_input, orig_rmd)

roundtrip_rmd <- undoc(
output_file,
to = paste0(basename(file_path_sans_ext(rmd_input)), ".roundtrip.Rmd"),
dir = tmpd, wrap = wrap, overwrite = TRUE,
orig_chunkfile = chunkfile)

docx <- embed_file(docx, chunkfile)
docx <- embed_file(docx, orig_rmd)
docx <- embed_file(docx, roundtrip_rmd)

if (highlight_outputs) {
docx <- highlight_output_styles(docx)
}

print(docx, output_file)
if (clean) {
file.remove(chunkfile)
}
return(output_file)
}
out
}
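
To make the knit hooks above easier to follow, this sketch reproduces just their string-building step outside of knitr. The function names `wrap_inline()` and `wrap_chunk()` are hypothetical; the output relies on pandoc's `bracketed_spans` and `fenced_divs` extensions, which the format enables.

```r
# Hypothetical stand-ins for the inline and chunk hooks' string building.
wrap_inline <- function(x, id) {
  # Bracketed span: pandoc maps custom-style to a Word character style
  paste0("[", x, "]{custom-style=\"", id, "\"}")
}

wrap_chunk <- function(x, label) {
  # Fenced div: pandoc maps custom-style to a Word paragraph style
  paste0("::: {custom-style=\"chunk-", label, "\"}\n", x, "\n:::")
}

inline_md <- wrap_inline("42", "inline-1")
chunk_md <- wrap_chunk("Some output", "setup")
cat(inline_md, chunk_md, sep = "\n\n")
```

Because each output carries a uniquely named style, the reverse conversion can later locate the span or div and swap the original code back in.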

+ 194
- 0
R/extract.R

@@ -0,0 +1,194 @@
#' Convert a Reversible Document back to R Markdown
#'
#' Converts a document originally created with [rdocx_reversible()] back to R
#' Markdown, including changes made to text in MS Word.
#'
#' @param docx The `.docx` file to convert
#' @param to the filename to write the resulting `.Rmd` file. The default is to
#' use the same basename as the docx document
#' @param dir The directory to write the `.Rmd` to. Defaults to current working
#' directory
#' @param track_changes How to deal with tracked changes and comments in the
#' `.docx` file. `"accept"` accepts all changes, and `"reject"` rejects all of
#' them. The default, `"criticmarkup"`, converts the tracked changes to [Critic
#' Markup syntax](http://criticmarkup.com/spec.php#thebasicsyntax). `"all"`
#' marks up tracked changes and comments in `<span>` tags. See the [pandoc
#' manual](http://pandoc.org/MANUAL.html#option--track-changes) for details.
#' @param wrap The width at which to wrap text. If `NA`, text is not wrapped
#' @param overwrite Whether to overwrite an existing file
#' @param orig_chunkfile,orig_docx The original chunkfile or Word document
#' created when the document was first knit. Useful for debugging, or in
#' cases where the word file has been corrupted or transformed, for instance
#' by copy-and-pasting the content into a new file. If provided, undoc will
#' use this chunkfile or word file to re-create the `.Rmd` file with the text
#' of the input.
#' @param verbose whether to print pandoc progress text
#' @importFrom rmarkdown pandoc_convert
#' @importFrom tools file_path_sans_ext
#' @export
undoc <- function(docx, to = NULL, dir = ".",
track_changes = c("criticmarkup", "accept", "reject", "all"),
wrap = 80, overwrite = FALSE,
orig_chunkfile = NULL, orig_docx = NULL, verbose = FALSE) {

if (!is_redoc(docx) && is.null(orig_chunkfile) && is.null(orig_docx))
stop("Document is not reversible and no alternate data provided via ",
"orig_chunkfile or orig_docx")

if (is.null(to)) to <- paste0(file_path_sans_ext(basename(docx)), ".Rmd")
to <- file.path(dir, to)
if (!overwrite && file.exists(to)) stop(to, " exists and overwrite = FALSE")

if (!is.null(orig_chunkfile)) {
chunk_df <- utils::read.csv(orig_chunkfile, stringsAsFactors = FALSE)
} else if (!is.null(orig_docx)) {
chunk_df <- redoc_extract_chunks(orig_docx)
} else {
chunk_df <- redoc_extract_chunks(docx)
}

md_lines <- convert_docx_to_md(docx, track_changes, wrap, verbose)
md_lines <- replace_inlines(md_lines, chunk_df)
md_lines <- replace_chunks(md_lines, chunk_df)
md_lines <- prepend_yaml(md_lines, chunk_df)

cat(md_lines, file = to, sep = "\n")

return(to)
}

#' Extract the Rmd used to produce a Reversible Word Doc
#'
#' Documents produced with [rdocx_reversible()] store a copy of the original
#' `.Rmd` files used to produce them. This is useful for diffing against the
#' version created with [undoc()], especially if tracked changes have not been
#' used.
#' @param docx A path to a Word file or an `rdocx` object created with
#' [officer::read_docx()].
#' @param type One of `"original"` or `"roundtrip"`. `"original"` (the default)
#' extracts the exact document originally knit. `"roundtrip"` extracts a document
#' that has been converted to Word and back with no edits in between. The
#' latter should be more useful for comparing against edits, as line-wrapping
#' and placement of no-output chunks should match.
#' @param dir The directory to write the `.Rmd` to. Defaults to current working
#' directory
#' @param to the filename to write the resulting `.Rmd` file. The default is to
#' use the original name with either `.original.Rmd` or `.roundtrip.Rmd`
#' extensions.
#' @param overwrite whether to overwrite existing files
#' @export
#' @return The path to the extracted `.Rmd`
#' @examples
#' redoc_extract_rmd(redoc_example_docx(), dir = tempdir())
redoc_extract_rmd <- function(docx, type = c("original", "roundtrip"),
dir = ".", to = NULL, overwrite = FALSE) {
docx <- to_docx(docx)
assert_redoc(docx)
type <- match.arg(type)
rmdfile <- list.files(docx$package_dir,
pattern = paste0("\\.", type, "\\.Rmd$"),
full.names = TRUE)
if (is.null(to)) to <- basename(rmdfile)
out <- file.path(dir, to)
if (file.exists(out) && !overwrite) stop(out, " exists and overwrite=FALSE")
file.copy(rmdfile, out, overwrite = overwrite)
return(file.path(dir, to))
}

#' @importFrom officer read_docx
redoc_extract_chunks <- function(docx) {
docx <- to_docx(docx)
assert_redoc(docx)
chunkfile <- list.files(docx$package_dir, pattern = "\\.chunks\\.csv$",
full.names = TRUE)
chunk_df <- utils::read.csv(chunkfile, stringsAsFactors = FALSE)
chunk_df
}

#' @importFrom stringi stri_replace_all_fixed
replace_inlines <- function(md_lines, chunk_df) {
chunk_df <- chunk_df[chunk_df$type == "inline", ]
if (nrow(chunk_df)) {
patterns <- paste0("[[", chunk_df$label, "]]")
replacements <- paste0("`r ", chunk_df$code, "`")
md_lines <- stri_replace_all_fixed(md_lines, patterns, replacements,
vectorize_all = FALSE)
}
md_lines
}

#' @importFrom stringi stri_replace_all_fixed stri_replace_first_fixed
#' stri_replace_all_regex stri_split_lines1 stri_detect_fixed
replace_chunks <- function(md_lines, chunk_df) {
chunk_df <- chunk_df[chunk_df$type == "block", ]
md_lines <- paste(md_lines, collapse = "\n")
if (nrow(chunk_df)) {
patterns <- paste0("[[", chunk_df$label, "]]")
replacements <- paste(chunk_df$header, chunk_df$code, "```", sep = "\n")
detected <- logical(1)
append <- ""
last_detected <- 1
for (i in seq_along(patterns)) {
detected <- stri_detect_fixed(md_lines, patterns[i])
if (!detected) {
if (i == 1) {
md_lines <- paste0(c(patterns[i], md_lines), collapse = "\n\n")
} else {
append <- paste0(c(append, replacements[i]), collapse = "\n\n")
}
} else {
replacements[last_detected] <-
paste0(c(replacements[last_detected], append), collapse = "\n\n")
last_detected <- i
append <- ""
}
}
for (i in seq_along(patterns)) {
md_lines <- stri_replace_first_fixed(md_lines, patterns[i],
replacements[i])
md_lines <- stri_replace_all_fixed(md_lines, patterns[i], "")
}
md_lines <- stri_replace_all_regex(md_lines, "\n{3,}", "\n\n")
md_lines <- stri_split_lines1(md_lines)
}
md_lines
}

# This will need to be modified to deal with yaml at arbitrary locations
prepend_yaml <- function(md_lines, chunk_df) {
chunk_df <- chunk_df[chunk_df$type == "yaml", ]
if (nrow(chunk_df)) {
md_lines <- c(chunk_df$code, "", md_lines)
md_lines <- stri_split_lines1(paste(md_lines, collapse = "\n"))
}
md_lines
}

convert_docx_to_md <- function(docx, track_changes, wrap, verbose) {
docx <- normalizePath(docx)
track_changes <- match.arg(track_changes,
c("criticmarkup", "accept", "reject", "all"))
if (track_changes == "criticmarkup") {
track_opts <- c("--track-changes=all",
paste0("--lua-filter=",
system.file("criticmarkup.lua", package = "redoc")))
} else {
track_opts <- paste0("--track-changes=", track_changes)
}

if (is.na(wrap)) {
wrap_opts <- "--wrap=none"
} else {
wrap_opts <- c("--wrap=auto", paste0("--columns=", wrap))
}
filter_opts <- c(paste0("--lua-filter=",
system.file("revchunks.lua", package = "redoc")))
opts <- c(track_opts, filter_opts, wrap_opts)
md_tmp <- tempfile(fileext = ".md")
pandoc_convert(docx,
from = "docx+styles+empty_paragraphs",
to = "markdown",
output = md_tmp,
options = opts,
verbose = verbose)
return(readLines(md_tmp))
}
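
The placeholder substitution performed by `replace_inlines()` can be sketched in base R as below. This is an illustration under assumptions, not the package's implementation: it swaps `stringi::stri_replace_all_fixed()` for `gsub(fixed = TRUE)`, and the sample `chunk_df` rows and Markdown lines are made up.

```r
# Made-up lookup table in the same shape as the chunk data frame
chunk_df <- data.frame(
  label = c("inline-1", "inline-2"),
  type = "inline",
  code = c("nrow(mtcars)", "mean(mtcars$mpg)"),
  stringsAsFactors = FALSE
)

# Made-up Markdown as it might look after converting the .docx back
md_lines <- c("The data has [[inline-1]] rows.",
              "Mean mpg is [[inline-2]].")

# Hypothetical base R equivalent of the inline replacement step
restore_inlines <- function(md_lines, chunk_df) {
  inline <- chunk_df[chunk_df$type == "inline", ]
  for (i in seq_len(nrow(inline))) {
    # fixed = TRUE treats "[[label]]" literally rather than as a regex
    md_lines <- gsub(paste0("[[", inline$label[i], "]]"),
                     paste0("`r ", inline$code[i], "`"),
                     md_lines, fixed = TRUE)
  }
  md_lines
}

restored <- restore_inlines(md_lines, chunk_df)
```

Literal matching matters here because `[[` and `]]` would otherwise be parsed as regular-expression bracket syntax.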

+ 23
- 0
R/officer-embed.R View File

@@ -0,0 +1,23 @@
# Functions in this file should probably be migrated to the `officer` package

#' @importFrom mime guess_type
embed_file <- function(docx, file, content_type = guess_type(file)) {
docx <- to_docx(docx)
file.copy(file, to = file.path(docx$package_dir, basename(file)))

extension <- tools::file_ext(file)
docx$content_type$add_ext(extension, content_type)
docx$content_type$save()

rel <- docx$doc_obj$relationship()
new_rid <- sprintf("rId%.0f", rel$get_next_id())
rel$add(
id = new_rid,
type = paste0(
"http://schemas.openxmlformats.org/officeDocument/2006/relationships/",
extension
),
target = file.path("..", basename(file))
)
return(docx)
}

+ 48
- 0
R/officer-modify_style.R

@@ -0,0 +1,48 @@
# Functions in this file should probably eventually be moved to the `officer`
# package once they are made more general

#' @importFrom xml2 read_xml xml_find_first xml_add_child xml_set_attrs
#' write_xml
add_to_style <- function(docx, style_id, name, attrs = NULL) {
docx <- to_docx(docx)
name <- prepend_ns(name)
if (!is.null(attrs)) names(attrs) <- prepend_ns(names(attrs))
styles_path <- file.path(docx$package_dir, "word", "styles.xml")
styles_xml <- read_xml(styles_path)
style_xml <- xml_find_first(styles_xml,
paste0("//w:style[@w:styleId='", style_id, "']"))
rPr <- xml_add_child(style_xml, "w:rPr")
pPr <- xml_add_child(style_xml, "w:pPr")
style <- xml_add_child(rPr, name)
if (!is.null(attrs)) xml_set_attrs(style, attrs)
style <- xml_add_child(pPr, name)
if (!is.null(attrs)) xml_set_attrs(style, attrs)
write_xml(styles_xml, styles_path)
return(docx)
}

#' @importFrom officer styles_info
#' @importFrom stringi stri_detect_regex
highlight_output_styles <- function(docx, name = "shd",
attrs = c(val = "clear",
color = "auto",
fill = "FFBEBF")) {
docx <- to_docx(docx)
styles <- styles_info(docx)
styles <-
styles$style_id[stri_detect_regex(styles$style_id, "^(inline|chunk)-")]
lapply(styles, function(s) {
add_to_style(docx, s, name = name, attrs = attrs)
add_to_style(docx, s, name = "hidden")
})
return(docx)
}

#' @importFrom stringi stri_detect_regex
prepend_ns <- function(x, ns="w") {
ifelse(
stri_detect_regex(x, paste0("^", ns, ":")),
x,
paste0(ns, ":", x)
)
}

+ 254
- 0
R/parser.R

@@ -0,0 +1,254 @@
## Parsers largely lifted from knitr and rmarkdown packages

#' @importFrom knitr all_patterns
#' @importFrom stringi stri_trim_both
parse_rmd_to_df <- function(input_file) {
lines <- readLines(input_file)

patterns <- all_patterns$md
chunk.begin <- patterns$chunk.begin
chunk.end <- patterns$chunk.end
yaml.delim <- "^(---|\\.\\.\\.)\\s*$"

yaml <- parse_yaml(yaml.delim, lines)

blks <- grepl(chunk.begin, lines)
txts <- filter_chunk_end(blks, grepl(chunk.end, lines))
tmp <- blks | utils::head(c(TRUE, txts), -1)
groups <- unname(split(lines, cumsum(tmp)))

chunk_counter(reset = TRUE)
inline_counter(reset = TRUE)

chunks <- lapply(groups, function(g) {
block <- grepl(chunk.begin, g[1])
if (block) {
n <- length(g)
if (n >= 2 && grepl(chunk.end, g[n])) {
g <- g[-n]
}
g <- strip_block(g, patterns$chunk.code)
params.src <- if (group_pattern(chunk.begin)) {
stri_trim_both(gsub(chunk.begin, "\\1", g[1]))
} else {
""
}
parse_block(g[-1], g[1], params.src)
}
else {
parse_inline(g, patterns)
}
})

chunk_df <- do.call(rbind, c(lapply(chunks, as.data.frame, stringsAsFactors = FALSE)))
chunk_df <- rbind(as.data.frame(yaml, stringsAsFactors = FALSE), chunk_df, stringsAsFactors = FALSE)
chunk_df$label <- ifelse(chunk_df$type == "block",
paste0("chunk-", chunk_df$label),
chunk_df$label)
chunk_df
}

filter_chunk_end <- function(chunk.begin, chunk.end) {
in.chunk <- FALSE
fun <- function(is.begin, is.end) {
if (in.chunk && is.end) {
in.chunk <<- FALSE
return(TRUE)
}
if (!in.chunk && is.begin) {
in.chunk <<- TRUE
}
FALSE
}
mapply(fun, chunk.begin, chunk.end)
}

# Possibly extraneous - removes code prefix for latex and other non-md formats
strip_block <- function(x, prefix = NULL) {
if (!is.null(prefix) && (length(x) > 1)) {
x[-1L] <- sub(prefix, "", x[-1L])
spaces <- min(attr(regexpr("^ *", x[-1L]), "match.length"))
if (spaces > 0) {
x[-1L] <- substring(x[-1L], spaces + 1)
}
}
x
}

group_pattern <- function(pattern) {
!is.null(pattern) && grepl("\\(.+\\)", pattern)
}

parse_block <- function(code, header, params.src) {
params <- params.src
engine <- "r"
# if (out_format("markdown")) {
engine <- sub("^([a-zA-Z0-9_]+).*$", "\\1", params)
params <- sub("^([a-zA-Z0-9_]+)", "", params)
# }
params <- gsub("^\\s*,*|,*\\s*$", "", params)
if (tolower(engine) != "r") {
params <- sprintf("%s, engine=\"%s\"", params, engine)
params <- gsub("^\\s*,\\s*", "", params)
}
params.src <- params
params <- parse_params(params.src)

if (nzchar(spaces <- gsub("^([\t >]*).*", "\\1", header))) {
params$indent <- spaces
code <- gsub(sprintf("^%s", spaces), "", code)
code <- gsub(
sprintf("^%s", gsub("\\s+$", "", spaces)),
"", code
)
}
code <- paste(code, collapse = "\n")

label <- params$label

list(label = label, type = "block", header = header, code = code)
}

#' @importFrom knitr opts_knit
out_format <- function(x) {
fmt <- opts_knit$get("out.format")
if (missing(x))
fmt
else !is.null(fmt) && (fmt %in% x)
}

parse_params <- function(params) {
if (params == "")
return(list(label = unnamed_chunk()))
res = withCallingHandlers(eval(parse_only(paste("alist(",
quote_label(params), ")"))), error = function(e) {
message("(*) NOTE: I saw chunk options \"", params,
"\"\n please go to https://yihui.name/knitr/options",
"\n (it is likely that you forgot to quote \"character\" options)")
})
idx = which(names(res) == "")
for (i in idx) if (identical(res[[i]], alist(, )[[1]]))
res[[i]] = NULL
idx = if (is.null(names(res)) && length(res) == 1L)
1L
else which(names(res) == "")
if ((n <- length(idx)) > 1L || (length(res) > 1L && is.null(names(res))))
stop("invalid chunk options: ", params, "\n(all options must be of the form 'tag=value' except the chunk label)")
if (is.null(res$label)) {
if (n == 0L)
res$label = unnamed_chunk()
else names(res)[idx] = "label"
}
if (!is.character(res$label))
res$label = gsub(" ", "", as.character(as.expression(res$label)))
if (identical(res$label, ""))
res$label = unnamed_chunk()
res
}

parse_only <- function(code) {
if (length(code) == 0)
return(expression())
parse(text = code, keep.source = FALSE)
}

quote_label <- function(x) {
x = gsub("^\\s*,?", "", x)
if (grepl("^\\s*[^'\"](,|\\s*$)", x)) {
x = gsub("^\\s*([^'\"])(,|\\s*$)", "'\\1'\\2", x)
}
else if (grepl("^\\s*[^'\"](,|[^=]*(,|\\s*$))", x)) {
x = gsub("^\\s*([^'\"][^=]*)(,|\\s*$)", "'\\1'\\2",
x)
}
x
}

#' @importFrom knitr opts_knit
unnamed_chunk <- function(prefix = NULL, i = chunk_counter()) {
if (is.null(prefix))
prefix = opts_knit$get("unnamed.chunk.label")
paste(prefix, i, sep = "-")
}

.counters <- new.env(parent = emptyenv())

chunk_counter <- function(reset = FALSE, init_chunk = 1) {
if (reset)
return(.counters$nc <- init_chunk)
.counters$nc <- .counters$nc + 1L
.counters$nc - 1L
}

inline_counter <- function(reset = FALSE, init_inline = 1) {
if (reset)
return(.counters$ni <- init_inline)
.counters$ni <- .counters$ni + 1L
.counters$ni - 1L
}

unnamed_inline <- function(prefix = NULL, i = inline_counter()) {
if (is.null(prefix))
prefix = "inline"
paste(prefix, i, sep = "-")
}

#' @importFrom stringi stri_locate_all_regex stri_match_all_regex
parse_inline <- function(input, patterns) {
inline.code = patterns$inline.code
input = paste(input, collapse = "\n")
loc = cbind(start = numeric(0), end = numeric(0))
if (group_pattern(inline.code))
loc = stri_locate_all_regex(input, inline.code)[[1]]
if (nrow(loc) && !all(is.na(loc))) {
code = stri_match_all_regex(input, inline.code)[[1L]]
code = if (NCOL(code) >= 2L) {
code[is.na(code)] = ""
apply(code[, -1L, drop = FALSE], 1, paste, collapse = "")
}
} else {
return(NULL)
}
labels <- character(0)
for (i in seq_along(code)) {
labels[i] <- unnamed_inline()
}
list(label = labels, type = rep("inline", length(code)),
header = rep(NA_character_, length(code)), code = code)
}

parse_yaml <- function(yaml.delim, input_lines) {
# TODO: make the YAML parser handle YAML blocks, not just front matter
validate_front_matter <- function(delimiters) {
if (length(delimiters) >= 2 &&
(delimiters[2] - delimiters[1] > 1) &&
grepl("^---\\s*$", input_lines[delimiters[1]])) {
# verify that it's truly front matter (not preceded by other content)
if (delimiters[1] == 1)
TRUE
else
is_blank(input_lines[seq_len(delimiters[1] - 1)])
} else {
FALSE
}
}

# is there yaml front matter?
delimiters <- grep(yaml.delim, input_lines)
if (validate_front_matter(delimiters)) {
front_matter <- input_lines[(delimiters[1]):(delimiters[2])]
return(
list(label = NA_character_, type = "yaml",
header = NA_character_, code = paste(front_matter, collapse = "\n"))
)
}
else {
return(list())
}
}

is_blank <- function(x) {
if (length(x))
all(grepl("^\\s*$", x))
else TRUE
}

+ 16
- 0
R/redoc-package.R View File

@@ -0,0 +1,16 @@
#' @title Reversible Reproducible Documents
#'
#' @description Implements a reversible 'R Markdown' to 'Microsoft Word'
#' pipeline.
#'
#' @section Support:
#'
#' The development repository for the **redoc** package is found at
#' \url{https://github.com/noamross/redoc}. Please file
#' bug reports or other feedback at
#' \url{https://github.com/noamross/redoc/issues}
#'
#' @name redoc
#' @author Noam Ross \email{noam.ross@gmail.com}
#' @keywords package
NULL

+ 94
- 0
README.Rmd View File

@@ -0,0 +1,94 @@
---
output:
github_document:
html_preview: FALSE
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(width = 120)
```
# redoc - reversible R Markdown/MS Word documents.

Author: _Noam Ross_

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg){data-external="1"}](https://opensource.org/licenses/MIT)
[![Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg){data-external="1"}](http://www.repostatus.org/#wip)
[![Build
Status](https://travis-ci.org/noamross/redoc.svg?branch=master)](https://travis-ci.org/noamross/redoc)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/noamross/redoc?branch=master&svg=true){data-external="1"}](https://ci.appveyor.com/project/noamross/redoc)
[![codecov](https://codecov.io/gh/noamross/redoc/branch/master/graph/badge.svg){data-external="1"}](https://codecov.io/gh/noamross/redoc)
[![CRAN status](https://www.r-pkg.org/badges/version/redoc){data-external="1"}](https://cran.r-project.org/package=redoc)

**redoc** is an experimental package to enable a two-way R-Markdown ⟷ Microsoft
Word workflow. It is in an early design phase. Testing and feedback are welcome!
Please look at [CONTRIBUTING.md](https://github.com/noamross/redoc/blob/master/.github/CONTRIBUTING.md) and the [design vignette](https://noamross.github.io/redoc/articles/redoc-package-design.md)
if you are interested in development.

## Installation

Install the **redoc** package with this command:

```{r install_me, eval = FALSE}
source("https://install-github.me/noamross/redoc")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
```

## Usage

**redoc** provides an R Markdown [output format] of `rdocx_reversible()`, built
on top of `rmarkdown::word_document()`. You will typically call it via the
YAML header in your R Markdown document. You have the option of highlighting
the outputs (both chunk and inline) in the Word Document.

```yaml
---
output:
redoc::rdocx_reversible:
keep_md: TRUE
highlight_outputs: TRUE
---
```

Word files that have been created by `rdocx_reversible()` ("redocs") can be reverted to
`.Rmd` with `undoc()`, _even after they are edited_.

```{r undoc}
library(redoc)
undoc(redoc_example_docx())
```

If the Word document has tracked changes, `undoc()` will, by default, convert
these to [Critic Markup syntax](http://criticmarkup.com/spec.php#thebasicsyntax).
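
To illustrate what that syntax looks like in the resulting text, here is a minimal sketch in base R. `accept_critic()` is a hypothetical helper and not part of **redoc**; it assumes insertions are written as `{++ text ++}`, deletions as `{-- text --}`, and that each marked span sits on a single line. To accept changes during conversion itself, `undoc()` has a `track_changes` argument with an `"accept"` option.

```r
# Hypothetical helper, not part of redoc: accept Critic Markup insertions
# and deletions in a character vector of markdown lines.
accept_critic <- function(x) {
  # Keep the text of insertions: {++ text ++} becomes text
  x <- gsub("\\{\\+\\+\\s?(.*?)\\s?\\+\\+\\}", "\\1", x, perl = TRUE)
  # Drop deletions entirely: {-- text --} becomes an empty string
  x <- gsub("\\{--\\s?(.*?)\\s?--\\}", "", x, perl = TRUE)
  x
}

accept_critic("The {-- quick --}{++ slow ++} fox")
#> [1] "The slow fox"
```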

Undoc'ing a redoc where chunk outputs have been deleted will restore the original
code chunks to the document, usually immediately after the previous chunk. If
chunk outputs are moved, code chunks move with them. Inline code outputs that
are deleted are not restored.

Redocs also internally store the original `.Rmd` used to make them, which can
be extracted and diffed against the version produced by `undoc()`.

```{r extract}
redoc_extract_rmd(redoc_example_docx())
```

## Contributing

Have feedback or want to contribute? Great! Please take a look at the [contributing guidelines](https://github.com/noamross/redoc/blob/master/.github/CONTRIBUTING.md) before filing an issue or pull request.

Please note that this project is released with a [Contributor Code of Conduct](https://github.com/noamross/redoc/blob/master/.github/CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.

```{r cleanup, include = FALSE}
unlink(c("example.Rmd", "example.docx", "skeleton.Rmd", "skeleton.docx", "skeleton.original.Rmd"))
```


+ 21
- 0
_pkgdown.yml View File

@@ -0,0 +1,21 @@
bootswatch: flatly

navbar:
title: pkgdown
type: inverse
left:
- text: Home
icon: fa-home
href: index.html
- text: Package Design
href: articles/redoc-package-design.html
- text: Contributing
href: CONTRIBUTING.html
- text: Release Notes
href: news/index.html
right:
- text: Reference
href: reference/index.html
- icon: fa-github fa-lg
href: https://github.com/noamross/redoc


+ 12
- 0
codecov.yml View File

@@ -0,0 +1,12 @@
comment: false

coverage:
status:
project:
default:
target: auto
threshold: 1%
patch:
default:
target: auto
threshold: 1%

+ 32
- 0
inst/criticmarkup.lua View File

@@ -0,0 +1,32 @@
function Span(elem)
if elem.classes[1] and elem.classes[1] == "insertion" then
local opener = { pandoc.RawInline(FORMAT, "{++ ") }
local closer = { pandoc.RawInline(FORMAT, " ++}") }
return opener .. elem.content .. closer
elseif
elem.classes[1] and elem.classes[1] == "deletion" then
local opener = { pandoc.RawInline(FORMAT, "{-- ") }
local closer = { pandoc.RawInline(FORMAT, " --}") }
return opener .. elem.content .. closer
elseif
elem.classes[1] and elem.classes[1] == "comment-start" then
if elem.t == nil then
return pandoc.RawInline(FORMAT, "")
end
local opener = { pandoc.RawInline(FORMAT, "{>> ") }
local closer = { pandoc.RawInline(FORMAT, " ("), pandoc.RawInline(FORMAT, elem.attributes.author), pandoc.RawInline(FORMAT, ")<<}")}
return opener .. elem.content .. closer
elseif
elem.classes[1] and (elem.classes[1] == "comment-end" or elem.classes[1] == "paragraph-insertion") then
return pandoc.RawInline(FORMAT, "")
else
return nil
end
end


-- Addition {++ ++}
-- Deletion {-- --}
-- Substitution {~~ ~> ~~} # TODO: figure out how to use CriticMarkup substitution
-- Comment {>> <<}
-- Highlight {== ==}{>> <<} # TODO: figure out how to use CriticMarkup highlighting

+ 33
- 0
inst/revchunks.lua View File

@@ -0,0 +1,33 @@
-- A Pandoc filter for reversing knitted documents. Text generated from
-- code in R Markdown has custom styles with names of the chunks that generated
-- them. The filter replaces Divs and Spans with these custom styles with
-- placeholders like [[inline-1]] and [[chunk-setup]], to be replaced with
-- chunk content later

function Div(elem)
if elem.attributes["custom-style"] then
local i = string.find(elem.attributes["custom-style"], "chunk%-")
if i == 1 then
i = 0
return pandoc.Para(pandoc.RawInline(FORMAT, "[["..elem.attributes["custom-style"].."]]"))
else
return elem.content
end
else
return elem
end
end

function Span(elem)
if elem.attributes["custom-style"] then
local i = string.find(elem.attributes["custom-style"], "inline%-")
if i == 1 then
i = 0
return pandoc.RawInline(FORMAT, "[["..elem.attributes["custom-style"].."]]")
else
return elem.content
end
else
return elem
end
end

+ 5
- 0
inst/rmarkdown/templates/rdocx_reversible/skeleton/.gitignore View File

@@ -0,0 +1,5 @@
*.md
*_files/
*.original.Rmd
*.roundtrip.Rmd
*.chunks.csv

+ 49
- 0
inst/rmarkdown/templates/rdocx_reversible/skeleton/skeleton.Rmd View File

@@ -0,0 +1,49 @@
---
title: "Your Title"
author: "Your Name"
date: "The Date"
output:
redoc::rdocx_reversible:
keep_md: TRUE
highlight_outputs: TRUE
---

```{r setup, include = FALSE}
# A non-included setup chunk
knitr::opts_chunk$set(include = TRUE)
```

Reversible R Markdown Document
------------------------------

This is an example Reversible R Markdown document.

Chunk with code output

```{r cars}
summary(cars)
```

Inline text
-----------

```{r, include= FALSE}
# A non-included chunk to provide inline chunks with values.
a <- 2
b <- 3
```

You can include calculations inline like so: `r a` plus
`r b` equals `r a + b`.

What about empty inline chunks?: Like `r NULL` or `r `?

Chunks with plots
-----------------

You can also embed plots, for example:

```{r pressure}
plot(pressure)
```


BIN
inst/rmarkdown/templates/rdocx_reversible/skeleton/skeleton.docx View File


+ 5
- 0
inst/rmarkdown/templates/rdocx_reversible/template.yaml View File

@@ -0,0 +1,5 @@
name: Reversible Microsoft Word Document
description: >
Example of a `redoc`: An R Markdown document that can be compiled to
a Microsoft Word `.docx` file and then reversed, following editing.
create_dir: FALSE

+ 22
- 0
man/is_redoc.Rd View File

@@ -0,0 +1,22 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docx-utils.R
\name{is_redoc}
\alias{is_redoc}
\title{Is this a reversible document?}
\usage{
is_redoc(docx)
}
\arguments{
\item{docx}{A path to a `.docx` file or an `rdocx` object produced by
[officer::read_docx()]}
}
\value{
a logical value
}
\description{
A function for testing whether the file can be un-knit. If not, un-knitting
may be attempted with the `orig_chunkfile` or `orig_docx` files in [undoc()].
}
\examples{
is_redoc(redoc_example_docx())
}

+ 21
- 0
man/rdocx_reversible.Rd View File

@@ -0,0 +1,21 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docx_reversible.R
\name{rdocx_reversible}
\alias{rdocx_reversible}
\title{Convert to a Reversible Microsoft Word Document}
\usage{
rdocx_reversible(highlight_outputs = FALSE, wrap = 80, ...)
}
\arguments{
\item{highlight_outputs}{whether to highlight outputs from chunks and inline
code in the final document}

\item{wrap}{when round-tripping the document, at what width to wrap the
markdown output? See [undoc()].}

\item{...}{other parameters passed to [rmarkdown::word_document()]}
}
\description{
Format for converting from R Markdown to a Microsoft Word Document that can
be reversed using [undoc()] after editing in Word.
}

+ 22
- 0
man/redoc.Rd View File

@@ -0,0 +1,22 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/redoc-package.R
\name{redoc}
\alias{redoc}
\title{Reversible Reproducible Documents}
\description{
Implements a reversible 'R Markdown' to 'Microsoft Word'
pipeline.
}
\section{Support}{


The development repository for the **redoc** package is found at
\url{https://github.com/noamross/redoc}. Please file
bug reports or other feedback at
\url{https://github.com/noamross/redoc/issues}
}

\author{
Noam Ross \email{noam.ross@gmail.com}
}
\keyword{package}

+ 14
- 0
man/redoc_example_docx.Rd View File

@@ -0,0 +1,14 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docx-utils.R
\name{redoc_example_docx}
\alias{redoc_example_docx}
\title{Path to an example Reversible Microsoft Word file}
\usage{
redoc_example_docx()
}
\description{
Path to an example Reversible Microsoft Word file
}
\examples{
redoc_example_docx()
}

+ 14
- 0
man/redoc_example_rmd.Rd View File

@@ -0,0 +1,14 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/docx-utils.R
\name{redoc_example_rmd}
\alias{redoc_example_rmd}
\title{Path to an example R Markdown file}
\usage{
redoc_example_rmd()
}
\description{
Path to an example R Markdown file
}
\examples{
redoc_example_rmd()
}

+ 40
- 0
man/redoc_extract_rmd.Rd View File

@@ -0,0 +1,40 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract.R
\name{redoc_extract_rmd}
\alias{redoc_extract_rmd}
\title{Extract the Rmd used to produce a Reversible Word Doc}
\usage{
redoc_extract_rmd(docx, type = c("original", "roundtrip"), dir = ".",
to = NULL, overwrite = FALSE)
}
\arguments{
\item{docx}{A path to a Word file or an `rdocx` object created with
[officer::read_docx()].}

\item{type}{One of `"original"` or `"roundtrip"`. `"original"` extracts the
exact document originally knit. `"roundtrip"` (default) extracts a document
that has been converted to Word and back with no edits in between. The
latter should be more useful for comparing against edits, as line-wrapping
and placement of no-output chunks should match.}

\item{dir}{The directory to write the `.Rmd` to. Defaults to the current working
directory}

\item{to}{the filename to write the resulting `.Rmd` file. The default is to
use the original name with either `.original.Rmd` or `.roundtrip.Rmd`
extensions.}

\item{overwrite}{whether to overwrite existing files}
}
\value{
The path to the extracted `.Rmd`
}
\description{
Documents produced with [rdocx_reversible()] store a copy of the original
`.Rmd` file used to produce them. This is useful for diffing against the
version created with [undoc()], especially if tracked changes have not been
used.
}
\examples{
redoc_extract_rmd(redoc_example_docx(), dir = tempdir())
}

+ 43
- 0
man/undoc.Rd View File

@@ -0,0 +1,43 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract.R
\name{undoc}
\alias{undoc}
\title{Convert a Reversible Document back to R Markdown}
\usage{
undoc(docx, to = NULL, dir = ".", track_changes = c("criticmarkup",
"accept", "reject", "all"), wrap = 80, overwrite = FALSE,
orig_chunkfile = NULL, orig_docx = NULL, verbose = FALSE)
}
\arguments{
\item{docx}{The `.docx` file to convert}

\item{to}{the filename to write the resulting `.Rmd` file. The default is to
use the same basename as the docx document}

\item{dir}{The directory to write the `.Rmd` to. Defaults to the current working
directory}

\item{track_changes}{How to deal with tracked changes and comments in the
`.docx` file. `"accept"` accepts all changes, and `"reject"` rejects all of
them. The default, `"criticmarkup"`, converts the tracked changes to [Critic
Markup syntax](http://criticmarkup.com/spec.php#thebasicsyntax). `"all"`
marks up tracked changes and comments in `<span>` tags. See the [pandoc
manual](http://pandoc.org/MANUAL.html#option--track-changes) for details.}

\item{wrap}{The width at which to wrap text. If `NA`, text is not wrapped}

\item{overwrite}{Whether to overwrite an existing file}

\item{orig_chunkfile, orig_docx}{The original chunkfile or Word document
created when the document was first knit. Useful for debugging, or in
cases where the word file has been corrupted or transformed, for instance
by copy-and-pasting the content into a new file. If provided, undoc will
use this chunkfile or word file to re-create the `.Rmd` file with the text
of the input.}

\item{verbose}{whether to print pandoc progress text}
}
\description{
Converts a document originally created with [rdocx_reversible()] back to R
Markdown, including changes made to text in MS Word.
}

+ 21
- 0
redoc.Rproj View File

@@ -0,0 +1,21 @@
Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: XeLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace,vignette

+ 4
- 0
tests/testthat.R View File

@@ -0,0 +1,4 @@
library(testthat)
library(redoc)

test_check("redoc")

+ 3
- 0
tests/testthat/.gitignore View File

@@ -0,0 +1,3 @@
skel.*
skel_files
skeleton.roundtrip.Rmd

+ 9
- 0
tests/testthat/test-reverse.R View File

@@ -0,0 +1,9 @@
context("redoc round-trips")

test_that("Document round-tripping works", {
rmarkdown::render(redoc_example_rmd(), output_dir = getwd(),
output_file = "skel.docx", quiet = TRUE)
rdoc <- undoc("skel.docx", overwrite = TRUE)
odoc <- redoc_extract_rmd("skel.docx", type = "roundtrip", overwrite = TRUE)
expect_equal(readLines(rdoc), readLines(odoc))
})

BIN
tests/testthat/test.docx View File


+ 2
- 0
vignettes/.gitignore View File

@@ -0,0 +1,2 @@
*.html
*.R

+ 59
- 0
vignettes/redoc-package-design.Rmd View File

@@ -0,0 +1,59 @@
---
title: "redoc Package Design"
author: "Noam Ross"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{redoc Package Design}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

This document describes the general approach and design of **redoc** for
developers interested in contributing.

Two-way R Markdown workflows are challenging because R Markdown and **knitr**
workflows are lossy - the compiled document does not contain all of the information
in the source. Also, we are limited by information that can be passed via
`pandoc` from markdown to final formats and in reverse.

To produce a Reversible Reproducible Document in Word (a "redoc"), the
`rdocx_reversible()` format first pre-parses the source `.Rmd` file. **knitr**
doesn't expose its parser to developers, so I've lifted most of the code for
this parser from **knitr** and **rmarkdown**. The parser captures YAML headers,
code chunks, and inline code, giving names to unnamed chunks and inline code
sections. These are stored in a data frame saved to `filename.chunks.csv`.
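
As a rough illustration of the naming step, the sketch below labels inline code sections in base R. It is a simplified stand-in, not the parser lifted from **knitr**: the function name `label_inline_code()`, the regular expression, and the three columns shown are assumptions made for the example.

```r
# Illustrative only: a simplified stand-in for the lifted knitr parser.
# The regex and the column layout are assumptions, not redoc's actual code.
label_inline_code <- function(lines) {
  text <- paste(lines, collapse = "\n")
  # Match `r <code>` spans; the capture group holds the code itself
  pattern <- "`r[ ]+([^`]+)`"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  code <- sub(pattern, "\\1", hits, perl = TRUE)
  data.frame(
    # sprintf() returns character(0) when there are no matches
    label = sprintf("inline-%d", seq_along(code)),
    type = rep("inline", length(code)),
    code = code,
    stringsAsFactors = FALSE
  )
}

chunks <- label_inline_code(c("`r a` plus", "`r b` equals `r a + b`."))
chunks$label
#> [1] "inline-1" "inline-2" "inline-3"
```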

`rdocx_reversible()` then knits the Rmd file, using
[knitr hooks](https://yihui.name/knitr/hooks/) to wrap all code chunks in
[pandoc style `divs` and `spans`](http://pandoc.org/MANUAL.html#divs-and-spans).
This differentiates these outputs in the compiled markdown. These divs and spans
are given an attribute of `custom-style="CHUNK_NAME"`. Non-included chunks are
replaced with an empty raw `docx` element.
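
The wrapped markdown might look like the output of the sketch below. Both helpers are hypothetical and are not the hooks **redoc** registers; they only show pandoc's fenced-div and bracketed-span syntax carrying a `custom-style` attribute, with style names following the `chunk-` and `inline-` prefixes that the reversing filter looks for.

```r
# Hypothetical helpers, not redoc's actual knitr hooks: wrap knitted output
# in pandoc div/span syntax with a custom-style attribute.
wrap_chunk_output <- function(output, label) {
  paste0("::: {custom-style=\"", label, "\"}\n", output, "\n:::")
}

wrap_inline_output <- function(output, label) {
  paste0("[", output, "]{custom-style=\"", label, "\"}")
}

cat(wrap_chunk_output("Output of the cars chunk", "chunk-cars"), "\n")
cat(wrap_inline_output("5", "inline-3"), "\n")
```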

Pandoc then converts the markdown to `docx`, and elements with `custom-style` attributes
keep these attributes in the `docx` format, by using the `docx+styles` extension.

`docx` files are just zip archives of XML files, so once the `docx` is created,
a post-processor embeds three files for later retrieval: the original
`*.Rmd`, the `*.chunks.csv` with chunk information, and another version of the `.Rmd`
that has been converted to `docx` and back. These files are available for
retrieval when reversing the compilation. If `highlight_outputs = TRUE` is set,
the post-processor also modifies all the output styles to be visible.

To reverse compilation via `undoc()`, the `*.chunks.csv` file is first extracted,
then pandoc is used to convert the `docx` back to markdown. A custom [lua filter](https://pandoc.org/lua-filters.html) converts any track-changes text
to [Critic Markup](http://criticmarkup.com/spec.php), and another lua filter
replaces any elements with `custom-style="CHUNK_NAME"` attributes with placeholders
of the form `[[chunk-name]]`. `undoc()` then uses the data in `*.chunks.csv` to
replace these placeholders with the original chunks (or inline code). In the event
that chunk output has been deleted, the chunk is placed immediately following
the previous chunk (or at the top of the document). Deleted inline elements
are not restored. The original YAML data is prepended to the document.
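
The placeholder substitution can be sketched in base R as follows. `restore_chunks()` is a hypothetical name, and the `label` and `code` columns are assumed stand-ins for the contents of `*.chunks.csv`; the sketch does not handle the re-insertion of chunks whose output was deleted.

```r
# Illustrative sketch: `chunks` mimics assumed columns of *.chunks.csv.
restore_chunks <- function(md, chunks) {
  for (i in seq_len(nrow(chunks))) {
    placeholder <- paste0("[[", chunks$label[i], "]]")
    # fixed = TRUE so the square brackets are matched literally
    md <- gsub(placeholder, chunks$code[i], md, fixed = TRUE)
  }
  md
}

fence <- strrep("`", 3)  # build the code fence without writing it literally
chunks <- data.frame(
  label = c("chunk-cars", "inline-1"),
  code = c(paste0(fence, "{r cars}\nsummary(cars)\n", fence), "`r a`"),
  stringsAsFactors = FALSE
)
md <- c("[[chunk-cars]]", "The value is [[inline-1]].")
restored <- restore_chunks(md, chunks)
cat(restored, sep = "\n")
```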
