Project author: krlmlr

Project description: Relational data models
Language: R
Repository: git://github.com/krlmlr/dm.git
Created: 2019-07-02T12:22:06Z
Community: https://github.com/krlmlr/dm

License: Other



dm

Lifecycle: stable

Are you using multiple data frames or database tables in R? Organize
them with dm.

  • Use it for data analysis today.
  • Build data models tomorrow.
  • Deploy the data models to your organization’s Relational Database
    Management System (RDBMS) the day after.

Overview

dm bridges the gap in the data pipeline between individual data frames
and relational databases. It’s a grammar of joined tables that provides
a consistent set of verbs for consuming, creating, and deploying
relational data models. For individual researchers, it broadens the
scope of datasets they can work with and how they work with them. For
organizations, it enables teams to quickly and efficiently create and
share large, complex datasets.

dm objects encapsulate relational data models constructed from local
data frames or lazy tables connected to an RDBMS. dm objects support the
full suite of dplyr data manipulation verbs along with additional
methods for constructing and verifying relational data models, including
key selection, key creation, and rigorous constraint checking. Once a
data model is complete, dm provides methods for deploying it to an
RDBMS. This allows it to scale from datasets that fit in memory to
databases with billions of rows.
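Here is a minimal sketch of that workflow. It assumes two small made-up tables, `customers` and `orders`, which are hypothetical and not part of dm. The data frames are bundled with `dm()`, keys are declared with `dm_add_pk()` and `dm_add_fk()`, the constraints are checked with `dm_examine_constraints()`, and the model is copied to an in-memory SQLite database with `copy_dm_to()`. The deployment step assumes the DBI and RSQLite packages are installed. Any other DBI connection could stand in for SQLite.

```r
library(dm)
library(dplyr)  # attaches the %>% pipe used below

# Hypothetical toy tables, made up for this illustration
customers <- data.frame(
  customer_id = c(1L, 2L, 3L),
  name = c("Ada", "Grace", "Linus")
)
orders <- data.frame(
  order_id = c(10L, 11L, 12L),
  customer_id = c(1L, 1L, 3L),
  amount = c(9.5, 20, 3.25)
)

# Bundle both data frames into one dm and declare the keys
shop <- dm(customers, orders) %>%
  dm_add_pk(customers, customer_id) %>%
  dm_add_pk(orders, order_id) %>%
  dm_add_fk(orders, customer_id, customers)

# Check every declared key against the data (one row per constraint)
checks <- dm_examine_constraints(shop)
print(checks)

# Deploy: copy the tables and their keys into an in-memory SQLite database
# (assumes DBI and RSQLite are installed)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
remote_shop <- copy_dm_to(con, shop, temporary = TRUE)
print(remote_shop)
DBI::dbDisconnect(con)
```

The sketch copies the tables with `temporary = TRUE`, so they vanish when the connection closes and the example leaves no side effects behind.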

Features

dm makes it easy to bring an existing relational data model into your R
session. Because a dm object behaves like a named list of tables, it
fits into existing workflows with little change. The dm interface and
behavior are modeled after dplyr, so you may already be familiar with
many of its verbs. dm also offers:

  • visualization to help you understand relationships between entities
    represented by the tables
  • simpler joins that “know” how tables are related, including a
    “flatten” operation that automatically follows keys and performs
    column name disambiguation
  • consistency and constraint checks to help you understand (and fix) the
    limitations of your data
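The following sketch shows the flatten operation on a small scale. It uses two hypothetical tables that both carry a non-key column named `note`. Flattening from `orders` follows the declared foreign key to `customers`, keeps the key column once, and renames the two `note` columns so they no longer clash. The exact renamed suffixes may vary between dm versions.

```r
library(dm)
library(dplyr)  # attaches the %>% pipe used below

# Hypothetical toy tables; both have a non-key column called `note`
customers <- data.frame(
  customer_id = c(1L, 2L),
  name = c("Ada", "Grace"),
  note = c("VIP", "new")
)
orders <- data.frame(
  order_id = c(10L, 11L, 12L),
  customer_id = c(1L, 2L, 2L),
  note = c("gift", "rush", "standard")
)

shop <- dm(customers, orders) %>%
  dm_add_pk(customers, customer_id) %>%
  dm_add_pk(orders, order_id) %>%
  dm_add_fk(orders, customer_id, customers)

# Start at `orders`, follow its foreign key and join `customers` onto it;
# dm reports which ambiguous columns it renames
flat <- dm_flatten_to_tbl(shop, orders)
print(flat)
```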

That’s just the tip of the iceberg. See the “Getting started” article to
hit the ground running and explore all the features.

Installation

The latest stable version of the {dm} package can be obtained from
CRAN with:

    install.packages("dm")

The latest development version of {dm} can be installed from R-universe:

    # Enable repository from cynkra
    options(
      repos = c(
        cynkra = "https://cynkra.r-universe.dev",
        CRAN = "https://cloud.r-project.org"
      )
    )
    # Download and install dm in R
    install.packages('dm')

or from GitHub:

    # install.packages("devtools")
    devtools::install_github("cynkra/dm")

Usage

Create a dm object (see “Getting started” for details).

    library(dm)
    dm <- dm_nycflights13(table_description = TRUE)
    dm
    #> ── Metadata ────────────────────────────────────────────────────────────────────
    #> Tables: `airlines`, `airports`, `flights`, `planes`, `weather`
    #> Columns: 53
    #> Primary keys: 4
    #> Foreign keys: 4
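The metadata summary only reports how many keys exist. The following sketch lists them with `dm_get_all_pks()` and `dm_get_all_fks()`. The object is named `flights_dm` here, which is an arbitrary choice.

```r
library(dm)

flights_dm <- dm_nycflights13(table_description = TRUE)

# One row per primary key: the table and its key column(s)
pks <- dm_get_all_pks(flights_dm)
print(pks)

# One row per foreign key: child table/columns pointing to parent table/columns
fks <- dm_get_all_fks(flights_dm)
print(fks)
```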

A dm object behaves like a named list of tables:

    names(dm)
    #> [1] "airlines" "airports" "flights"  "planes"   "weather"
    nrow(dm$airports)
    #> [1] 86
    library(dplyr)  # count() comes from dplyr
    dm$flights %>%
      count(origin)
    #> # A tibble: 3 × 2
    #>   origin     n
    #>   <chr>  <int>
    #> 1 EWR      641
    #> 2 JFK      602
    #> 3 LGA      518

Visualize relationships at any time:

    dm %>%
      dm_draw()
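`dm_draw()` accepts options that control the diagram. The following sketch assumes the DiagrammeR package is installed, since dm uses it for rendering. The colors and the `flights_dm` name are arbitrary choices for illustration. `dm_set_colors()` assigns colors to tables. `view_type = "all"` shows every column rather than only the key columns, and `rankdir = "TB"` lays the graph out from top to bottom.

```r
library(dm)
library(dplyr)  # attaches the %>% pipe used below

flights_dm <- dm_nycflights13()

# Color selected tables, then draw all columns, top to bottom
# (rendering assumes DiagrammeR is installed)
graph <- flights_dm %>%
  dm_set_colors(darkblue = flights, darkgreen = c(airlines, airports)) %>%
  dm_draw(view_type = "all", rankdir = "TB")
graph
```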

Simple joins:

    dm %>%
      dm_flatten_to_tbl(flights)
    #> Renaming ambiguous columns: %>%
    #>   dm_rename(flights, year.flights = year) %>%
    #>   dm_rename(flights, month.flights = month) %>%
    #>   dm_rename(flights, day.flights = day) %>%
    #>   dm_rename(flights, hour.flights = hour) %>%
    #>   dm_rename(airlines, name.airlines = name) %>%
    #>   dm_rename(airports, name.airports = name) %>%
    #>   dm_rename(planes, year.planes = year) %>%
    #>   dm_rename(weather, year.weather = year) %>%
    #>   dm_rename(weather, month.weather = month) %>%
    #>   dm_rename(weather, day.weather = day) %>%
    #>   dm_rename(weather, hour.weather = hour)
    #> # A tibble: 1,761 × 48
    #>    year.flights month.…¹ day.f…² dep_t…³ sched…⁴ dep_d…⁵ arr_t…⁶ sched…⁷ arr_d…⁸
    #>           <int>    <int>   <int>   <int>   <int>   <dbl>   <int>   <int>   <dbl>
    #>  1         2013        1      10       3    2359       4     426     437     -11
    #>  2         2013        1      10      16    2359      17     447     444       3
    #>  3         2013        1      10     450     500     -10     634     648     -14
    #>  4         2013        1      10     520     525      -5     813     820      -7
    #>  5         2013        1      10     530     530       0     824     829      -5
    #>  6         2013        1      10     531     540      -9     832     850     -18
    #>  7         2013        1      10     535     540      -5    1015    1017      -2
    #>  8         2013        1      10     546     600     -14     645     709     -24
    #>  9         2013        1      10     549     600     -11     652     724     -32
    #> 10         2013        1      10     550     600     -10     649     703     -14
    #> # ℹ 1,751 more rows
    #> # ℹ abbreviated names: ¹month.flights, ²day.flights, ³dep_time,
    #> #   ⁴sched_dep_time, ⁵dep_delay, ⁶arr_time, ⁷sched_arr_time, ⁸arr_delay
    #> # ℹ 39 more variables: carrier <chr>, flight <int>, tailnum <chr>,
    #> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
    #> #   hour.flights <dbl>, minute <dbl>, time_hour <dttm>, name.airlines <chr>,
    #> #   name.airports <chr>, lat <dbl>, lon <dbl>, alt <dbl>, tz <dbl>, dst <chr>,
    #> #   tzone <chr>, year.planes <int>, type <chr>, manufacturer <chr>,
    #> #   model <chr>, engines <int>, seats <int>, speed <int>, engine <chr>,
    #> #   year.weather <int>, month.weather <int>, day.weather <int>,
    #> #   hour.weather <int>, temp <dbl>, dewp <dbl>, humid <dbl>, wind_dir <dbl>,
    #> #   wind_speed <dbl>, wind_gust <dbl>, precip <dbl>, pressure <dbl>, …
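Flattening every related table produces a wide result with many renamed columns. The following sketch shows a narrower join, assuming only the airline names are of interest. `dm_select_tbl()` keeps only `flights` and `airlines`, so the flatten joins just those two tables. Because `flights` has no `name` column, no columns need renaming.

```r
library(dm)
library(dplyr)  # attaches the %>% pipe used below

flights_dm <- dm_nycflights13()

# Keep only the two tables of interest, then flatten starting from `flights`
flights_airlines <- flights_dm %>%
  dm_select_tbl(flights, airlines) %>%
  dm_flatten_to_tbl(flights)
print(flights_airlines)
```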

Check consistency:

    dm %>%
      dm_examine_constraints()
    #> ! Unsatisfied constraints:
    #> • Table `flights`: foreign key `tailnum` into table `planes`: values of `flights$tailnum` not in `planes$tailnum`: N725MQ (6), N537MQ (5), N722MQ (5), N730MQ (5), N736MQ (5), …
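To see the rows behind a failed foreign-key check, a plain dplyr anti-join is enough. This sketch uses ordinary dplyr verbs rather than a dedicated dm function. It drops missing tail numbers first, on the assumption that a missing foreign key value should not count as a violation.

```r
library(dm)
library(dplyr)  # filter(), anti_join(), count() and %>%

flights_dm <- dm_nycflights13()

# Flights whose (non-missing) tail number has no match in `planes`,
# counted per tail number with the most frequent first
missing_planes <- flights_dm$flights %>%
  filter(!is.na(tailnum)) %>%
  anti_join(flights_dm$planes, by = "tailnum") %>%
  count(tailnum, sort = TRUE)
print(missing_planes)
```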

Learn more in the “Getting started” article.

Getting help

If you encounter a clear bug, please file an issue with a minimal
reproducible example on GitHub.
For questions and other discussion, please use
community.rstudio.com.


License: MIT © cynkra GmbH.

Funded by energie360° and cynkra.


Please note that the ‘dm’ project is released with a Contributor Code
of Conduct. By contributing to this project, you agree to abide by its
terms.