Thursday, July 18, 2019

Protocol Buffer Tips and Tricks



Extensions let you declare that a range of field numbers in a message are available for third-party extensions. An extension is a placeholder for a field whose type is not defined by the original .proto file. This allows other .proto files to add to your message definition by defining the types of some or all of the fields with those field numbers.

The problem is that the mutable_ accessor for a map field returns a pointer to the map, so applying the [] operator directly to it does not work.

The pointer needs to be dereferenced first. *map[key] does not work either, because [] binds more tightly than the unary *. The following does work:

(*map)[key] = val;

Alternatively, bind a reference first with auto& map = *test.mutable_map1(); and then map[key] works. Calling the operator explicitly through the pointer also works:

map->operator[](key) = val;

String fields also get a mutable_ getter that returns a direct pointer to the string, plus some extra setters.
The last two methods, set_allocated_nickname and release_nickname, are only needed for manual memory management.
bool IsInitialized(): checks whether all the required fields have been set (proto2 only).

Protobuf has several advantages over XML for serialization. Its message descriptions are simpler than the XML equivalent; even for small messages, XML with several levels of nesting quickly becomes hard for a human to read.

Another advantage is size: because the Protobuf wire format is compact, messages can be up to 10 times smaller than the XML equivalent. The bigger benefit is speed: parsing and serialization can be up to 100 times faster than standard XML serialization. In addition, Protobuf has a compiler that processes a .proto file and generates code for multiple supported languages, unlike the traditional approach where the same structure has to be written by hand in each language's source files.

  • bool ParseFromString(const string& data);: parses a message from the given string.
  • bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.
  • bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.


  • use uint32 if the value cannot be negative
  • use sint32 if the value is pretty much as likely to be negative as not (for some fuzzy definition of "as likely to be")
  • use int32 if the value could be negative, but that's much less likely than the value being positive (for example, if the application sometimes uses -1 to indicate an error or 'unknown' value and this is a relatively uncommon situation)

As you saw in the previous section, all the protocol buffer types associated with wire type 0 are encoded as varints. However, there is an important difference between the signed int types (sint32 and sint64) and the "standard" int types (int32 and int64) when it comes to encoding negative numbers. If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long – it is, effectively, treated like a very large unsigned integer. If you use one of the signed types, the resulting varint uses ZigZag encoding, which is much more efficient.
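The ten-byte claim can be checked with a small sketch. This is not the protobuf library, only a hand-written varint encoder; the names encode_varint and encode_int32 are made up for illustration, and the sign extension to 64 bits follows the description above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varints encode unsigned values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # MSB set: more bytes follow
        else:
            out.append(low7)         # MSB clear: last byte
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """Sketch of int32/int64 encoding: a negative value is treated as a
    64-bit two's-complement number, i.e. a very large unsigned integer."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


print(len(encode_int32(1)))   # 1
print(len(encode_int32(-1)))  # 10
```

A 64-bit value with its top bit set needs ceil(64 / 7) = 10 groups of 7 bits, which is where the ten bytes come from.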

ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on.
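A minimal sketch of the 32-bit ZigZag mapping, written by hand rather than taken from the protobuf library; the function names are illustrative. It relies on Python's arithmetic right shift for negative integers.

```python
def zigzag_encode32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (sint32 style)."""
    # n >> 31 is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode32(z: int) -> int:
    """Inverse of zigzag_encode32."""
    return (z >> 1) ^ -(z & 1)


for n in (0, -1, 1, -2, 2):
    print(n, "->", zigzag_encode32(n))
```

Small negative numbers therefore stay small before the varint step, so a sint32 holding -1 fits in one byte instead of ten.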


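The three encoded bytes split into two parts: 08 and 96 01. The first part (08) holds the field number of the message member (a = 1) and its wire type (Varint); the second part (96 01) is the actual value of a, 150.

Several concepts are involved here:

  Varint: a variable-length integer encoding; the smaller the value, the fewer bytes it uses.

  Field number and type: a protocol buffer message is a series of key-value pairs. The binary form of the message uses the field number as the key.


Each key in the message stream is a varint computed as (field_number << 3) | wire_type, so the low three bits hold the wire type.

The first byte above is 08, which is 0000 1000 in binary. The first bit of every varint byte is the MSB (continuation) bit; when it is set, more bytes follow. Here it is clear, so the key is a single byte. Dropping the MSB leaves

000 1000

The low three bits give the wire type: the value 0 means Varint. Shifting right by three bits gives the field number 1 (the a = 1 set in the message).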
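The key arithmetic above can be sketched in a few lines. The helper names make_key and split_key are made up for this note and are not part of any protobuf API.

```python
WIRE_TYPE_VARINT = 0  # wire type 0, as described above


def make_key(field_number: int, wire_type: int) -> int:
    """Build a key: the field number shifted left over a 3-bit wire type."""
    return (field_number << 3) | wire_type


def split_key(key: int):
    """Return (field_number, wire_type) for a decoded key value."""
    return key >> 3, key & 0x07


print(hex(make_key(1, WIRE_TYPE_VARINT)))  # 0x8
print(split_key(0x08))                     # (1, 0)
```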

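Next, decode the message value 150. Note that a varint stores its least significant 7-bit group first (little-endian order, not big-endian), so the groups have to be reversed: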

96 01 = 1001 0110  0000 0001
       → 000 0001  ++  001 0110 (drop the msb and reverse the groups of 7 bits)
       → 10010110
       → 128 + 16 + 4 + 2 = 150
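The same decoding steps can be written as a short sketch. decode_varint is a hypothetical helper, not a protobuf library function; it reads one varint starting at a given offset and returns the value together with the next offset.

```python
def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint from data starting at pos.

    Returns (value, next_pos). Each byte contributes its low 7 bits,
    least significant group first.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # MSB clear: this was the last byte
            return result, pos
        shift += 7


message = bytes([0x08, 0x96, 0x01])
key, pos = decode_varint(message)         # key byte 08
value, pos = decode_varint(message, pos)  # value bytes 96 01
print(key >> 3, key & 0x07, value)        # 1 0 150
```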


https://stackoverflow.com/questions/43675811/create-new-builder-using-com-google-protobuf-descriptors-descriptor
You can't create a Builder from a Descriptor. A Descriptor has no type information as to the proto (or builder) class that it needs to create, because all Descriptor instances are of the same class (it's final).
If you can only work with the Descriptor, your if/else is roughly as good as you can get. (I say roughly because you could do it with a map or a switch instead; but it's basically the same).
A better approach would be to work with the default instance of the proto that you are trying to create (or any other instance of that proto; but the default instance is simplest to obtain).
Message prototype = Foo.getDefaultInstance();  // Or Bar.getDefaultInstance().
From a Message you can get both a builder and the descriptor:
Message.Builder builder = prototype.newBuilderForType();
Descriptor descriptor = prototype.getDescriptorForType();

http://giorgio.azzinna.ro/2017/07/extending-protobuf-dynamic-messages/
Message is an abstract interface, but whenever you call protoc the generated classes will subclass it, hence the frequent indirect usage.

Descriptor
Descriptor, as the name suggests, describes messages.
Again, think of protoc: once it effectively parses the .proto files, it will create a Descriptor for each message.

With this in mind, it should be clear when we need Descriptors or Messages. When dealing with actual objects filled with data, Message can be used (hand in hand with reflection).
When message definitions are unknown at compile-time, and should be generated at run-time, Descriptor does the job.

https://codeburst.io/using-dynamic-messages-in-protocol-buffers-in-scala-9fda4f0efcb3

https://pinkiepractices.com/posts/protobuf-field-masks/
Field masks are similar to any other kind of mask. You might already be familiar with bitmasks (for bitwise operations) or layer masks (for image editing). A mask lets you indicate which parts of an object you’re interested in.


To actually write the code to do this, we want to use FieldMaskUtil.
The merge method will apply a field mask to a message for us. merge takes in a field mask, a source message, and a destination message builder. It sets fields in the destination builder, according to the field mask and source.
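// Requires: import com.google.protobuf.util.FieldMaskUtil;
public FetchItemResponse fetchItem(FetchItemRequest request) {
    Item item = // fetch item as before
    // merge writes into a builder, so the destination must be an Item.Builder
    Item.Builder filteredItem = Item.newBuilder();
    FieldMaskUtil.merge(request.getFieldMask(), item, filteredItem);
    return FetchItemResponse.newBuilder()
        .setItem(filteredItem)
        .build();
}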
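To make the idea of a mask concrete, here is a rough sketch that applies dotted field paths to a plain nested dict. It only illustrates the concept and is not how FieldMaskUtil is implemented; apply_field_mask and the sample item are invented for this example.

```python
import copy


def apply_field_mask(source: dict, paths: list) -> dict:
    """Copy only the fields named by dotted paths from source.

    Paths that do not exist in source are skipped.
    """
    result = {}
    for path in paths:
        parts = path.split(".")
        # Walk the source first, so a missing path leaves no empty dicts behind.
        node = source
        found = True
        for part in parts:
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                found = False
                break
        if not found:
            continue
        # Recreate the parent structure in the result, then copy the leaf.
        dst = result
        for part in parts[:-1]:
            dst = dst.setdefault(part, {})
        dst[parts[-1]] = copy.deepcopy(node)
    return result


item = {"id": 7, "name": "lamp", "seller": {"name": "Ann", "email": "a@example.com"}}
print(apply_field_mask(item, ["name", "seller.email"]))
```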

https://developers.google.com/protocol-buffers/docs/proto3#scalar
https://developers.google.com/protocol-buffers/docs/proto
import "myproject/other_protos.proto";
  • Don't change the field numbers for any existing fields.
  • Any new fields that you add should be optional or repeated. This means that any messages serialized by code using your "old" message format can be parsed by your new generated code, as they won't be missing any required elements. You should set up sensible default values for these elements so that new code can properly interact with messages generated by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.
  • Non-required fields can be removed, as long as the field number is not used again in your updated message type. You may want to rename the field instead, perhaps adding the prefix "OBSOLETE_", or make the field number reserved, so that future users of your .proto can't accidentally reuse the number.
  • A non-required field can be converted to an extension and vice versa, as long as the type and number stay the same.
  • optional is compatible with repeated. Given serialized data of a repeated field as input, clients that expect this field to be optional will take the last input value if it's a primitive type field or merge all input elements if it's a message type field.

For historical reasons, repeated fields of scalar numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding. For example:
repeated int32 samples = 4 [packed=true];
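The size difference can be sketched by hand for the samples field above (field number 4). This is a hand-written encoder for non-negative values only, not the protobuf library, and the function names are illustrative. Unpacked, every element carries its own key; packed, one length-delimited key (wire type 2) is followed by the concatenated varints.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field_number: int, values) -> bytes:
    """One (key, varint) pair per element; wire type 0."""
    key = encode_varint((field_number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number: int, values) -> bytes:
    """One key with wire type 2, a byte length, then all varints back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


samples = [3, 270, 86942]
print(encode_unpacked(4, samples).hex())  # 2003208e02209ea705
print(encode_packed(4, samples).hex())    # 2206038e029ea705
```

The saving is one key byte per element minus the length prefix, so it grows with the number of elements.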

Reserved Fields
If you update a message type by entirely removing a field, or commenting it out, future users can reuse the field number when making their own updates to the type. This can cause severe issues if they later load old versions of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn't happen is to specify that the field numbers (and/or names, which can also cause issues for JSON serialization) of your deleted fields are reserved. The protocol buffer compiler will complain if any future users try to use these field identifiers.
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Oneof

If you have a message with many optional fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members.
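The set-one-clears-the-others behaviour can be modelled with a toy class. OneofSketch is a hypothetical stand-in written for this note, not generated protobuf code; it keeps a single slot for the value plus the name of the member that currently owns it.

```python
class OneofSketch:
    """Toy model of a oneof: one shared slot, at most one member set."""

    def __init__(self, *members):
        self._members = set(members)
        self._case = None   # name of the member currently set, if any
        self._value = None  # the single shared storage slot

    def set(self, name, value):
        if name not in self._members:
            raise KeyError(name)
        # Overwriting the shared slot implicitly clears the previous member.
        self._case, self._value = name, value

    def has(self, name) -> bool:
        return self._case == name

    def get(self, name, default=None):
        return self._value if self._case == name else default

    def which(self):
        return self._case


contact = OneofSketch("email", "phone")
contact.set("email", "a@example.com")
contact.set("phone", "555-0100")
print(contact.which(), contact.has("email"))  # phone False
```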

https://blog.bazel.build/2017/02/27/protocol-buffers.html
proto_library is a language-agnostic rule that describes relations between .proto files.
java_proto_library, java_lite_proto_library and cc_proto_library are rules that "attach" to proto_library and generate language-specific bindings.
