Inconsistencies in behaviour of DataFrame. #515

@weqopy

EDIT (@v0dro):
The following methods should be implemented or corrected to make the behaviour more consistent:

  • Vector#last.
  • DataFrame#last.
  • Return type of DataFrame#[] must be consistent when using a timeseries. It currently returns either a numerical value or another Vector or DataFrame, depending on what you pass into #[].
  • Return nil when element not present in the DataFrame (currently raises error).

Ideally these should be split into separate issues and tackled one at a time.


I'd like to use this data to show the situations that confused me:

[25] pry(main)> dates=["2018-03-30", "2018-04-02", "2018-04-27", "2018-05-31", "2018-06-29", "2018-07-31", "2018-08-31", "2018-09-28", "2018-10-31", "2018-11-30"]
=> ["2018-03-30",
 "2018-04-02",
 "2018-04-27",
 "2018-05-31",
 "2018-06-29",
 "2018-07-31",
 "2018-08-31",
 "2018-09-28",
 "2018-10-31",
 "2018-11-30"]
[26] pry(main)> val=[1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
=> [1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
[27] pry(main)> id=Daru::DateTimeIndex.new(dates)
=> #<Daru::DateTimeIndex(10) 2018-03-30T00:00:00+00:00...2018-11-30T00:00:00+00:00>
[28] pry(main)> df = Daru::DataFrame.new({val: val}, index: id)
=> #<Daru::DataFrame(10x1)>
                   val
 2018-03-30 1.00000001
 2018-04-02     0.9999
 2018-04-27     0.9908
 2018-05-31     1.0885
 2018-06-29     1.0586
 2018-07-31     1.0374
 2018-08-31     0.9456
 2018-09-28     0.9638
 2018-10-31     0.8397
 2018-11-30     0.8788
  • first & last
[29] pry(main)> df.val.first
=> 1.00000001
[30] pry(main)> df.val.last 
NoMethodError: undefined method `last' for #<Daru::Vector:0x00007f43dbc591f0>
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/vector.rb:1420:in `method_missing'
# I expected this to return 0.8788
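
A plausible reason `first` works while `last` does not: Ruby's `Enumerable` gives `first` to any class that defines `each`, but it provides no `last`. The sketch below illustrates this with a hypothetical `TinyVector` class. It is a plain-Ruby stand-in, not `Daru::Vector`, and the idea that daru relies on `Enumerable` here is an assumption. The added `last` simply mirrors `Array#last`.

```ruby
# Hypothetical stand-in for a vector; this is NOT Daru::Vector.
class TinyVector
  include Enumerable

  def initialize(values)
    @values = values
  end

  # Enumerable only requires #each; #first then comes for free.
  def each(&block)
    return to_enum(:each) unless block

    @values.each(&block)
    self
  end

  # Enumerable has no #last, so it must be defined explicitly.
  # With no argument it returns one element, otherwise the last n.
  def last(n = nil)
    n.nil? ? @values.last : @values.last(n)
  end
end

vec = TinyVector.new([1.00000001, 0.9999, 0.8788])
vec.first   # => 1.00000001 (provided by Enumerable)
vec.last    # => 0.8788     (provided by the method above)
```

Until `last` exists on the vector itself, converting to an array first and calling `last` on that is one possible workaround.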
  • The return type
[31] pry(main)> df.val['2018-03-30','2018-04-30']
=> #<Daru::Vector(3)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
[32] pry(main)> df.val['2018-04']
=> #<Daru::Vector(2)>
                                       val
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
[33] pry(main)> df.val['2018-03-30','2018-04-01']
=> 1.00000001
[34] pry(main)> df.val['2018-03']
=> 1.00000001
# I expected [33] and [34] to both return:
# => #<Daru::Vector(1)>
#                                        val
#  2018-03-30T00:00:00+           1.00000001
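
To make the expectation concrete, here is a minimal plain-Ruby sketch of a range lookup whose return type does not depend on how many rows match. `SERIES` and `slice_between` are hypothetical names used only for illustration; this is not daru's implementation, and a `Hash` stands in for the `Daru::Vector` that the real method would return.

```ruby
require 'date'

# First four rows of the data above, keyed by Date.
# SERIES and slice_between are hypothetical, not part of daru.
SERIES = {
  '2018-03-30' => 1.00000001,
  '2018-04-02' => 0.9999,
  '2018-04-27' => 0.9908,
  '2018-05-31' => 1.0885
}.map { |date, value| [Date.parse(date), value] }.to_h.freeze

# Always returns a Hash (date => value), whether 0, 1, or many rows match.
def slice_between(series, from, to)
  range = Date.parse(from)..Date.parse(to)
  series.select { |date, _value| range.cover?(date) }
end

many = slice_between(SERIES, '2018-03-30', '2018-04-30')
one  = slice_between(SERIES, '2018-03-30', '2018-04-01')

many.size  # => 3
one.size   # => 1, still a collection rather than a bare Float
```

Callers can then treat every result the same way, without checking whether they got a number or a collection back.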
  • Errors, and one case that should not succeed
[48] pry(main)> df.val['2018']
=> #<Daru::Vector(10)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-31T00:00:00+               1.0885
 2018-06-29T00:00:00+               1.0586
 2018-07-31T00:00:00+               1.0374
 2018-08-31T00:00:00+               0.9456
 2018-09-28T00:00:00+               0.9638
 2018-10-31T00:00:00+               0.8397
 2018-11-30T00:00:00+               0.8788
[49] pry(main)> df.val['2017']
ArgumentError: Key 2017 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[50] pry(main)> df.val['2019']
ArgumentError: Key 2019 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[52] pry(main)> df.val['2018-12']
ArgumentError: bad value for range
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:547:in `slice_between_dates'
[53] pry(main)> df.val['2018-02']
=> #<Daru::Vector(10)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-31T00:00:00+               1.0885
 2018-06-29T00:00:00+               1.0586
 2018-07-31T00:00:00+               1.0374
 2018-08-31T00:00:00+               0.9456
 2018-09-28T00:00:00+               0.9638
 2018-10-31T00:00:00+               0.8397
 2018-11-30T00:00:00+               0.8788
# I expected all of those errors, and [53], to return an empty #<Daru::Vector(0)> instead.
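
The behaviour I have in mind can be sketched in plain Ruby as follows. `select_by_period` is a hypothetical helper, not a daru method, and matching on the string prefix of an ISO 8601 date is a simplification I am assuming for illustration (it only makes sense for whole components such as `'2018'` or `'2018-04'`). The point is that a period with no matching rows yields an empty result rather than an error or the whole series.

```ruby
# Data copied from the session above.
DATES = %w[
  2018-03-30 2018-04-02 2018-04-27 2018-05-31 2018-06-29
  2018-07-31 2018-08-31 2018-09-28 2018-10-31 2018-11-30
].freeze
VALUES = [1.00000001, 0.9999, 0.9908, 1.0885, 1.0586,
          1.0374, 0.9456, 0.9638, 0.8397, 0.8788].freeze

# Hypothetical helper: returns [date, value] pairs whose date starts
# with the given period string; an empty Array when nothing matches.
def select_by_period(period)
  DATES.zip(VALUES).select { |date, _value| date.start_with?(period) }
end

select_by_period('2018').size     # => 10
select_by_period('2018-04').size  # => 2
select_by_period('2017')          # => []
select_by_period('2018-12')       # => []
select_by_period('2018-02')       # => []
```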
