Inconsistent treatment of unicode character in RStudio vs quarto #492

@richardjtelford

This problem affects some of the students in my class who use Windows computers.

The students are importing an Excel file that contains the Unicode character \u2103 (℃) in the header row. They then use janitor::clean_names().

For most of the students, janitor::clean_names() converts the column name to "temperature_c" both in RStudio and when rendering with quarto.
For about 20% of the students, janitor::clean_names() converts the column name to "temperature_u_00b0_c" in RStudio (U+00B0 is the Unicode code point for "°") but to "temperature_c" when rendered with quarto. This mismatch then breaks the rest of their code when they render the document.

In both RStudio and quarto the "℃" is imported correctly as UTF-8 and charToRaw() gives the same output, e2 84 83, so it is not an import problem. Somehow janitor treats the character differently depending on how R is run.
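The byte check can be reproduced with base R alone. This is only a minimal sketch: the variable names are made up for illustration, and the string is built from the `\u2103` escape as a stand-in for the header read from the Excel file.

```r
# Stand-in for the imported header (assumption: the real one comes from the Excel file)
header <- enc2utf8("Temperature (\u2103)")

# Declared encoding of the string
print(Encoding(header))

# Raw UTF-8 bytes of the degree Celsius sign on its own
celsius <- enc2utf8("\u2103")
celsius_bytes <- charToRaw(celsius)
print(celsius_bytes)

# Unicode code point as an integer (0x2103)
celsius_codepoint <- utf8ToInt(celsius)
print(celsius_codepoint)
```

Running the same lines in the RStudio console and inside a rendered quarto chunk would show whether the input string really is identical in both contexts.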

All the affected students are using R 4.2.1 with the current version of RStudio on Windows. The students might have Norwegian locales; I haven't been able to check that.
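Since the locale is unconfirmed, something like the following could be run by an affected student, once in the RStudio console and once in a quarto chunk, to compare the two environments. It is a rough sketch using base R functions only; `session_report` is just an illustrative name, and nothing here is known to pinpoint the cause.

```r
# Collect locale and encoding details for the current R session
session_report <- list(
  r_version = R.version.string,
  platform  = R.version$platform,
  lc_ctype  = Sys.getlocale("LC_CTYPE"),  # character-set locale, e.g. a Norwegian one
  l10n      = l10n_info()                 # includes whether the session is UTF-8
)
str(session_report)

# janitor version, reported only if the package is installed
if (requireNamespace("janitor", quietly = TRUE)) {
  print(utils::packageVersion("janitor"))
}
```

Any difference between the two reports (for example in `lc_ctype` or the UTF-8 flag) would be a candidate explanation worth following up.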

Minimal example (it may work correctly on your machine):

tibble::tibble("Temperature (℃)" = 1) |> janitor::clean_names() |> names()
# "temperature_u_00b0_c" in RStudio on affected machines; "temperature_c" otherwise
