class String

This is an extension and modification of the standard String class. We do a lot of UTF-8 character processing in the parser. Ruby 1.8 does not have good enough UTF-8 support, and Ruby 1.9 only handles UTF-8 characters as Strings, which is very inefficient compared to representing them as Fixnum objects. Some of these hacks can be removed once we support only Ruby 1.9 and later.

Public Instance Methods

<<(obj)

Replacement for the existing << operator that also works for Integer values above 255, i.e. UTF-8 characters whose bytes are packed into a single Integer.

# File lib/taskjuggler/UTF8String.rb, line 61
def << (obj)
  if obj.is_a?(String) || (obj < 256)
    # In this case we can use the built-in concat.
    concat(obj)
  else
    # UTF-8 characters have a maximum length of 4 bytes and no byte is 0.
    mask = 0xFF000000
    pos = 3
    while pos >= 0
      # Use the built-in concat operator for each byte.
      concat((obj & mask) >> (8 * pos)) if (obj & mask) != 0
      # Move mask and position to the next byte.
      mask = mask >> 8
      pos -= 1
    end
  end
  # Return self, like the built-in <<, so that calls can be chained.
  self
end
Also aliased as: old_double_left_angle
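The else branch expects an Integer whose bytes are the UTF-8 encoding of one character, most significant byte first (for example 0xC3A4 for "ä"). The following is a minimal standalone sketch of that unpacking, not the TaskJuggler code itself. The helper name `append_packed_utf8` is hypothetical, and it writes into a binary buffer so that each Integer is appended as a raw byte on Ruby 1.9 and later.

```ruby
# Hypothetical helper mirroring the else branch of String#<< above: split an
# Integer whose bytes are the UTF-8 encoding of one character into single
# bytes, most significant first, skipping zero bytes.
def append_packed_utf8(buffer, packed)
  3.downto(0) do |pos|
    byte = (packed >> (8 * pos)) & 0xFF
    buffer << byte unless byte.zero?
  end
  buffer
end

# A binary buffer appends each Integer 0..255 as one raw byte.
buf = String.new(encoding: Encoding::BINARY)
append_packed_utf8(buf, 0x41)      # "A"
append_packed_utf8(buf, 0xC3A4)    # "ä"
append_packed_utf8(buf, 0xE282AC)  # "€"
buf.force_encoding(Encoding::UTF_8)
puts buf.bytesize                  # 6
```

Skipping zero bytes is only safe because, as the comment in the method notes, no byte of a multi-byte UTF-8 sequence is 0.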
each_utf8_char() { |c| ... }

Iterate over the String, calling the block for each UTF-8 character in it. This implementation looks more awkward but is noticeably faster than the commonly suggested regexp-based implementations.

# File lib/taskjuggler/UTF8String.rb, line 30
def each_utf8_char
  c = ''
  length = 0
  each_byte do |b|
    c << b
    if length > 0
      # subsequent unicode byte
      if (length -= 1) == 0
        # end of unicode character reached
        yield c
        c = ''
      end
    elsif (b & 0xC0) == 0xC0
      # first unicode byte
      length = -1
      while (b & 0x80) != 0
        length += 1
        b = b << 1
      end
    else
      # ASCII character (a stray continuation byte also ends up here)
      yield c
      c = ''
    end
  end
end
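The lead-byte arithmetic above can be checked in isolation. The sketch below is a hypothetical standalone function (`each_utf8_char_sketch`, not part of TaskJuggler) that follows the same idea: it counts the leading 1 bits of a lead byte to learn how many continuation bytes follow, then yields each completed character re-tagged as UTF-8. It assumes Ruby 1.9 or later, where String#each_char is available for comparison.

```ruby
# Hypothetical standalone version of String#each_utf8_char.
def each_utf8_char_sketch(str)
  char = String.new(encoding: Encoding::BINARY)
  remaining = 0 # continuation bytes still expected for the current character
  str.each_byte do |b|
    char << b
    if remaining > 0
      # Continuation byte (10xxxxxx) of a multi-byte character.
      remaining -= 1
    elsif (b & 0xC0) == 0xC0
      # Lead byte: the number of leading 1 bits is the total byte count,
      # so the number of continuation bytes is that count minus one.
      remaining = -1
      shifted = b
      while (shifted & 0x80) != 0
        remaining += 1
        shifted <<= 1
      end
      next
    end
    # ASCII byte, stray continuation byte, or last byte of a character.
    if remaining.zero?
      yield char.force_encoding(Encoding::UTF_8)
      char = String.new(encoding: Encoding::BINARY)
    end
  end
end

chars = []
each_utf8_char_sketch("a\u00E9\u20AC\u{1F600}") { |c| chars << c }
puts chars.size                                        # 4
puts chars == "a\u00E9\u20AC\u{1F600}".each_char.to_a  # true
```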
forceUTF8Encoding()

Ensure the String is really UTF-8 encoded and that newlines are only \n. If that's not possible, an Encoding::UndefinedConversionError is raised.

# File lib/taskjuggler/UTF8String.rb, line 121
def forceUTF8Encoding
  if RUBY_VERSION < '1.9.0'
    # Ruby 1.8 really only supports 7-bit ASCII well. Only do the line-end
    # clean-up.
    gsub(/\r\n/, "\n")
  else
    begin
      # Ensure that the text has LF line ends and is UTF-8 encoded.
      encode('UTF-8', :universal_newline => true)
    rescue
      # The encoding of the String is broken. Find the first broken line and
      # report it.
      lineCtr = 1
      each_line do |line|
        begin
          line.encode('UTF-8')
        rescue
          line = line.encode('UTF-8', :invalid => :replace,
                             :undef => :replace, :replace => '<?>')
          raise Encoding::UndefinedConversionError,
                "UTF-8 encoding error in line #{lineCtr}: #{line}"
        end
        lineCtr += 1
      end
    end
  end
end
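On Ruby 1.9 and later the same check can also be written without rescuing conversion errors. The sketch below uses a different technique from the code above: it scans the raw bytes line by line with String#valid_encoding?, reports the first broken line using String#scrub (Ruby 2.1+), and otherwise normalizes CRLF and lone CR to LF, roughly what :universal_newline does. It assumes the input bytes are meant to be UTF-8 and does no transcoding from other encodings; `force_utf8_sketch` is a hypothetical name.

```ruby
# Hypothetical alternative to forceUTF8Encoding; assumes the input bytes are
# intended to be UTF-8 (no transcoding from other encodings is attempted).
def force_utf8_sketch(text)
  # Split a binary copy so that a broken line cannot interfere with the split.
  text.b.each_line.with_index(1) do |raw, line_no|
    line = raw.force_encoding(Encoding::UTF_8)
    next if line.valid_encoding?
    raise Encoding::UndefinedConversionError,
          "UTF-8 encoding error in line #{line_no}: #{line.scrub('<?>')}"
  end
  # All lines are valid: tag as UTF-8 and turn CRLF or lone CR into LF.
  text.dup.force_encoding(Encoding::UTF_8).gsub(/\r\n?/, "\n")
end

puts force_utf8_sketch("one\r\ntwo\n").inspect  # "one\ntwo\n"
begin
  force_utf8_sketch("ok\nbad \xFF byte\n")
rescue Encoding::UndefinedConversionError => e
  puts e.message  # UTF-8 encoding error in line 2: bad <?> byte
end
```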
length_utf8()

Return the number of UTF-8 characters in the String. We don't override the built-in length() method here because we don't know what other code relies on it.

# File lib/taskjuggler/UTF8String.rb, line 82
def length_utf8
  len = 0
  each_utf8_char { |c| len += 1 }
  len
end
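Because every UTF-8 character has exactly one byte that is not a continuation byte (bit pattern 10xxxxxx), the character count can also be obtained by counting those bytes. The one-liner below is a hypothetical alternative (`utf8_length_sketch`), not TaskJuggler's implementation. For valid UTF-8 input it agrees with length_utf8 and with Ruby 1.9's String#length, while String#bytesize reports bytes.

```ruby
# Hypothetical alternative to length_utf8: count all bytes except UTF-8
# continuation bytes (those of the form 10xxxxxx).
def utf8_length_sketch(str)
  str.each_byte.count { |b| (b & 0xC0) != 0x80 }
end

s = "Gr\u00FC\u00DFe"         # "Grüße"
puts s.bytesize               # 7
puts utf8_length_sketch(s)    # 5
```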
ljust(len, pad = ' ')
# File lib/taskjuggler/UTF8String.rb, line 88
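def ljust(len, pad = ' ')
  # UTF-8 aware ljust: pad to len characters, not bytes. Unlike the
  # built-in version, a multi-character pad is repeated whole.
  missing = len - length_utf8
  return self + pad * missing if missing > 0
  self
end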
old_double_left_angle(obj)
Alias for: <<
old_reverse()
Alias for: reverse
reverse()

UTF-8 aware version of reverse that replaces the built-in one.

# File lib/taskjuggler/UTF8String.rb, line 96
def reverse
  a = []
  each_utf8_char { |c| a << c }
  a.reverse.join
end
Also aliased as: old_reverse
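The reason for this override is that reversing a String byte by byte, as a byte-oriented String#reverse would, splits multi-byte characters and leaves invalid UTF-8 behind. The two hypothetical helpers below contrast that with reversing whole characters, which is what reverse above does; the character version uses Ruby 1.9's String#each_char as a stand-in for each_utf8_char.

```ruby
# Hypothetical helpers: byte-wise versus character-wise reversal.
def byte_reverse_sketch(str)
  str.bytes.reverse.pack('C*').force_encoding(Encoding::UTF_8)
end

def char_reverse_sketch(str)
  # Collect whole characters, reverse the list, join them again.
  str.each_char.to_a.reverse.join
end

word = "\u00E4b"                                  # "äb": bytes C3 A4 62
puts byte_reverse_sketch(word).valid_encoding?    # false (A4 C3 is not UTF-8)
puts char_reverse_sketch(word) == "b\u00E4"       # true
```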
to_base64()
# File lib/taskjuggler/UTF8String.rb, line 111
def to_base64
  # Needs require 'base64'; encode64 inserts a newline every 60 output
  # characters.
  Base64.encode64(self)
end
to_quoted_printable()
# File lib/taskjuggler/UTF8String.rb, line 107
def to_quoted_printable
  [self].pack('M').gsub(/\n/, "\r\n")
end
unix2dos()
# File lib/taskjuggler/UTF8String.rb, line 115
def unix2dos
  gsub(/\n/, "\r\n")
end
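to_base64, to_quoted_printable and unix2dos carry no description. The sketch below applies the same core operations to a sample string so that the shape of their output is visible. The helper names are hypothetical, and pack('m') is used in place of Base64.encode64 because encode64 is implemented on top of it, which avoids depending on the base64 library.

```ruby
# Hypothetical standalone equivalents of the three output helpers above.
def to_base64_sketch(str)
  [str].pack('m') # same result as Base64.encode64(str)
end

def to_quoted_printable_sketch(str)
  # 'M' produces quoted-printable text with LF line ends; convert to CRLF.
  [str].pack('M').gsub(/\n/, "\r\n")
end

def unix2dos_sketch(str)
  str.gsub(/\n/, "\r\n")
end

text = "caf\u00E9"
p to_base64_sketch(text)            # "Y2Fmw6k=\n"
p to_quoted_printable_sketch(text)  # "caf=C3=A9=\r\n"
p unix2dos_sketch("a\nb\n")         # "a\r\nb\r\n"
```

Note that unix2dos does not check for existing CRLF pairs, so applying it twice turns "\r\n" into "\r\r\n".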