Saturday, March 19, 2016

JSON



Jolt - JSON to JSON transform
http://bazaarvoice.github.io/jolt/
https://github.com/bazaarvoice/jolt/blob/master/gettingStarted.md
        List chainrSpecJSON = JsonUtils.classpathToList( "/json/sample/spec.json" );
        Chainr chainr = Chainr.fromSpec( chainrSpecJSON );

        Object inputJSON = JsonUtils.classpathToObject( "/json/sample/input.json" );

        Object transformedOutput = chainr.transform( inputJSON );
        System.out.println( JsonUtils.toJsonString( transformedOutput ) );
https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g94901479_261
JSON to JSON transform library
Declarative
Transforms are written in JSON

Transform Separable Concerns 1:
For each Input value, where does it go in the Output?
Operate on Maps-of-Maps
Small JSON based DSL for each transform "concern"
Chain them together

SPEC (Starts out as a copy of the INPUT)
SPEC {
  "rating": {
    "quality": {
      "value": "SecondaryRatings.quality.Value"
    },
    "primary": {
      "value": [ "Rating", "PrimaryRating" ]
} } }
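
To make the "for each input value, where does it go in the output?" idea concrete, here is a minimal sketch in plain Java. It is not Jolt's implementation: the class ShiftSketch and its methods are hypothetical, it handles only literal keys (no wildcards), and it assumes output paths are dot-separated. When two values land on the same output path it collects them in a list instead of overwriting, which is the behaviour these notes attribute to Shiftr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the real library is com.bazaarvoice.jolt.Shiftr.
class ShiftSketch {

    // Walk the spec and the input in parallel; copy each matched value to its output path.
    static Map<String, Object> shift(Map<String, Object> input, Map<String, Object> spec) {
        Map<String, Object> output = new LinkedHashMap<>();
        walk(input, spec, output);
        return output;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Object in, Object spec, Map<String, Object> output) {
        if (spec instanceof String) {
            place(output, (String) spec, in);              // one destination path
        } else if (spec instanceof List) {
            for (Object path : (List<Object>) spec) {      // several destination paths
                place(output, (String) path, in);
            }
        } else if (spec instanceof Map && in instanceof Map) {
            Map<String, Object> inMap = (Map<String, Object>) in;
            for (Map.Entry<String, Object> e : ((Map<String, Object>) spec).entrySet()) {
                if (inMap.containsKey(e.getKey())) {       // keys missing from the input are skipped
                    walk(inMap.get(e.getKey()), e.getValue(), output);
                }
            }
        }
    }

    @SuppressWarnings("unchecked")
    private static void place(Map<String, Object> output, String dottedPath, Object value) {
        String[] keys = dottedPath.split("\\.");
        Map<String, Object> cur = output;
        for (int i = 0; i < keys.length - 1; i++) {
            cur = (Map<String, Object>) cur.computeIfAbsent(keys[i], k -> new LinkedHashMap<String, Object>());
        }
        String last = keys[keys.length - 1];
        if (!cur.containsKey(last)) {
            cur.put(last, value);
        } else if (cur.get(last) instanceof List) {
            ((List<Object>) cur.get(last)).add(value);     // third and later collisions
        } else {
            List<Object> both = new ArrayList<>();         // first collision: turn the slot into a list
            both.add(cur.get(last));
            both.add(value);
            cur.put(last, both);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> input = Map.of("rating", Map.of("primary", Map.of("value", 4)));
        Map<String, Object> spec = Map.of("rating", Map.of("primary", Map.of("value", List.of("Rating", "PrimaryRating"))));
        System.out.println(shift(input, spec));
    }
}
```

The sketch treats any existing list at a destination as a collision list, so it cannot tell a genuine array value from accumulated collisions; a real implementation would need to track that.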

Shiftr is not going to overwrite the 3. Instead it is going to make an array, like so.
2) you are ok with an array in your output, and you understand order is not guaranteed.

Shiftr WildCards 101 : *  and &
SPEC {
  "rating": {
    "*": {
      "value":
    }
} }
SPEC
{
  "rating": "Ratings"
}
The @ sign.
Take whatever is "at this spot in the input tree", and copy it to the output.
SPEC
{
  "rating": {
    "@" : "Ratings",
    "primary" : {
      "value" : "PrimaryRating"
} } }


com.bazaarvoice.jolt.Shiftr
 {
   "rating": {
     "quality": {
         "value": "SecondaryRatings.quality.Value",     // copy 3 to "SecondaryRatings.quality.Value"
         "max": "SecondaryRatings.quality.RatingRange"  // copy 5 to "SecondaryRatings.quality.RatingRange"
     }
   }
 }
com.bazaarvoice.jolt.Defaultr.class
https://github.com/bazaarvoice/jolt
  • jq - Awesome command line tool to extract data from JSON files (use it all the time, available via brew)
  • JsonPath - Java : Extract data from JSON using XPATH like syntax.
  • JsonSurfer - Java : Streaming JsonPath processor dedicated to processing big and complicated JSON data.
https://google.github.io/styleguide/jsoncstyleguide.xml
https://github.com/darcyliu/google-styleguide/blob/master/JSONStyleGuide.md
No comments in JSON objects.
Use double quotes.
If a property requires quotes, double quotes must be used. All property names must be surrounded by double quotes. Property values of type string must be surrounded by double quotes. Other value types (like boolean or number) should not be surrounded by double quotes.

Flattened data vs Structured Hierarchy
Data should not be arbitrarily grouped for convenience.
Data elements should be "flattened" in the JSON representation. Data should not be arbitrarily grouped for convenience.
In some cases, such as a collection of properties that represents a single structure, it may make sense to keep the structured hierarchy. These cases should be carefully considered, and only used if it makes semantic sense. For example, an address could be represented two ways, but the structured way probably makes more sense for developers:
Flattened Address:
{
  "company": "Google",
  "website": "http://www.google.com/",
  "addressLine1": "111 8th Ave",
  "addressLine2": "4th Floor",
  "state": "NY",
  "city": "New York",
  "zip": "10011"
}
Structured Address:
{
  "company": "Google",
  "website": "http://www.google.com/",
  "address": {
    "line1": "111 8th Ave",
    "line2": "4th Floor",
    "state": "NY",
    "city": "New York",
    "zip": "10011"
  }
}
Choose meaningful property names.

Property names must conform to the following guidelines:
  • Property names should be meaningful names with defined semantics.
  • Property names must be camel-cased, ascii strings.
  • Reserved JavaScript keywords should be avoided (A list of reserved JavaScript keywords can be found below).
These guidelines mirror the guidelines for naming JavaScript identifiers. This allows JavaScript clients to access properties using dot notation. (for example, result.thisIsAnInstanceVariable).

JSON maps can use any Unicode character in key names.

The property name naming rules do not apply when a JSON object is used as a map. A map (also referred to as an associative array) is a data type with arbitrary key/value pairs that use the keys to access the corresponding values. JSON objects and JSON maps look the same at runtime; this distinction is relevant to the design of the API. The API documentation should indicate when JSON objects are used as maps.
The keys of a map do not have to obey the naming guidelines for property names. Map keys may contain any Unicode characters. Clients can access these properties using the square bracket notation familiar for maps (for example, result.thumbnails["72"]).

Reserved Property Names


Avoid naming conflicts by choosing a new property name or versioning the API.
New properties may be added to the reserved list in the future. There is no concept of JSON namespacing. If there is a naming conflict, these can usually be resolved by choosing a new property name or by versioning. For example, suppose we start with the following JSON object:
{
  "apiVersion": "1.0",
  "data": {
    "recipeName": "pizza",
    "ingredients": ["tomatoes", "cheese", "sausage"]
  }
}
If in the future we wish to make ingredients a reserved word, we can do one of two things:
1) Choose a different name:
{
  "apiVersion": "1.0",
  "data": {
    "recipeName": "pizza",
    "ingredientsData": "Some new property",
    "ingredients": ["tomatoes", "cheese", "sausage"]
  }
}
2) Rename the property on a major version boundary:
{
  "apiVersion": "2.0",
  "data": {
    "recipeName": "pizza",
    "ingredients": "Some new property",
    "recipeIngredients": ["tomatoes", "cheese", "sausage"]
  }
}
Consider removing empty or null values.
If a property is optional or has an empty or null value, consider dropping the property from the JSON, unless there's a strong semantic reason for its existence.
{
  "volume": 10,

  // Even though the "balance" property's value is zero, it should be left in,
  // since "0" signifies "even balance" (the value could be "-1" for left
  // balance and "+1" for right balance).
  "balance": 0,

  // The "currentlyPlaying" property can be left out since it is null.
  // "currentlyPlaying": null
}
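
One way to apply this before serializing is to strip null entries from the map. The following is a minimal sketch under the assumption that the JSON object is held as a java.util.Map; NullPruner and prune are hypothetical names. It drops only null values, so a meaningful zero such as "balance" survives.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: returns a copy of the map without null-valued properties.
class NullPruner {

    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(Map<String, Object> in) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : in.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue;                                   // drop null-valued properties
            }
            if (v instanceof Map) {
                v = prune((Map<String, Object>) v);         // recurse into nested objects
            }
            out.put(e.getKey(), v);                         // 0, false and "" are kept
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> player = new HashMap<>();       // HashMap permits null values
        player.put("volume", 10);
        player.put("balance", 0);
        player.put("currentlyPlaying", null);
        System.out.println(prune(player).keySet());
    }
}
```

Whether empty strings or empty arrays should also be dropped is a semantic decision for each API, so the sketch leaves them in.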

Enum values should be represented as strings.
As APIs grow, enum values may be added, removed or changed. Using strings as enum values ensures that downstream clients can gracefully handle changes to enum values.
Java code:
public enum Color {
  WHITE,
  BLACK,
  RED,
  YELLOW,
  BLUE
}
JSON object:
{
  "color": "WHITE"
}
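
A small sketch of why string values help on the client side: an unknown name can be detected and handled instead of silently mapping to the wrong constant, as an ordinal might. The class EnumStrings and its methods are hypothetical; Color is the enum from the example above.

```java
import java.util.Optional;

// Hypothetical helper showing string <-> enum conversion that tolerates unknown values.
class EnumStrings {

    enum Color { WHITE, BLACK, RED, YELLOW, BLUE }

    // Serialize using the constant's name rather than its ordinal.
    static String toJsonValue(Color color) {
        return color.name();
    }

    // Parse leniently: a value added by a newer server version yields Optional.empty().
    static Optional<Color> fromJsonValue(String value) {
        try {
            return Optional.of(Color.valueOf(value));
        } catch (IllegalArgumentException | NullPointerException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toJsonValue(Color.WHITE));
        System.out.println(fromJsonValue("PURPLE").isPresent());
    }
}
```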

Property Value Data Types

Dates should be formatted as recommended by RFC 3339.
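
In Java, the java.time ISO formatters are a reasonable starting point for such timestamps. This is a sketch, assuming the ISO-8601 profile those formatters accept is close enough to RFC 3339 for the values an API actually emits; Rfc3339Sketch is a hypothetical name.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parse and format RFC 3339 style timestamps with java.time.
class Rfc3339Sketch {

    // Accepts values with "Z" or a numeric offset such as "+01:00".
    static Instant parse(String timestamp) {
        return OffsetDateTime.parse(timestamp, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
    }

    // Always formats in UTC with a trailing "Z".
    static String format(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        Instant updated = parse("2007-11-06T16:34:41.000Z");
        System.out.println(format(updated));
    }
}
```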

Time Duration Property Values




Time duration values should be strings formatted as recommended by ISO 8601.
  • P is the duration designator (for period) placed at the start of the duration representation.
{
  // three years, six months, four days, twelve hours,
  // thirty minutes, and five seconds
  "duration": "P3Y6M4DT12H30M5S"
}
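
java.time has no single type for a duration like this: Period covers the date-based part (years, months, days) and Duration the time-based part. A minimal sketch, with the hypothetical class Iso8601DurationSketch, splits the string at the "T" designator and parses each half; it assumes an uppercase "T" and no leading sign.

```java
import java.time.Duration;
import java.time.Period;

// Hypothetical holder for the two halves of an ISO 8601 duration string.
class Iso8601DurationSketch {

    final Period datePart;    // years, months, days
    final Duration timePart;  // hours, minutes, seconds

    Iso8601DurationSketch(Period datePart, Duration timePart) {
        this.datePart = datePart;
        this.timePart = timePart;
    }

    static Iso8601DurationSketch parse(String text) {
        int t = text.indexOf('T');
        if (t < 0) {
            return new Iso8601DurationSketch(Period.parse(text), Duration.ZERO);
        }
        String date = text.substring(0, t);                       // e.g. "P3Y6M4D"
        Period period = date.equals("P") ? Period.ZERO : Period.parse(date);
        Duration duration = Duration.parse("P" + text.substring(t)); // e.g. "PT12H30M5S"
        return new Iso8601DurationSketch(period, duration);
    }

    public static void main(String[] args) {
        Iso8601DurationSketch d = parse("P3Y6M4DT12H30M5S");
        System.out.println(d.datePart + " + " + d.timePart);
    }
}
```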

Latitude/Longitude should be strings formatted as recommended by ISO 6709. Furthermore, they should favor the ±DD.DDDD±DDD.DDDD degrees format.
{
  // The latitude/longitude location of the statue of liberty.
  "statueOfLiberty": "+40.6894-074.0447"
}
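
A client can split such a string back into numbers with a regular expression. The sketch below handles only the ±DD.DDDD±DDD.DDDD degrees form favored above, not the other ISO 6709 variants; Iso6709Sketch is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the decimal-degrees latitude/longitude string.
class Iso6709Sketch {

    // Signed two-digit latitude degrees, then signed three-digit longitude degrees.
    private static final Pattern DEGREES =
            Pattern.compile("([+-]\\d{2}(?:\\.\\d+)?)([+-]\\d{3}(?:\\.\\d+)?)");

    // Returns { latitude, longitude } in decimal degrees.
    static double[] parse(String text) {
        Matcher m = DEGREES.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not in +DD.DDDD+DDD.DDDD form: " + text);
        }
        return new double[] { Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)) };
    }

    public static void main(String[] args) {
        double[] statue = parse("+40.6894-074.0447");
        System.out.println(statue[0] + ", " + statue[1]);
    }
}
```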

Top-Level Reserved Property Names

Represents the desired version of the service API in a request, and the version of the service API that's served in the response. apiVersion should always be present. This is not related to the version of the data. Versioning of data should be handled through some other mechanism such as etags.

Client sets this value and server echoes data in the response. This is useful in JSON-P and batch situations, where the user can use the context to correlate responses with requests. This property is a top-level property because the context should be present regardless of whether the response was successful or an error. context differs from id in that context is specified by the user while id is assigned by the service.
Example:
Request #1:
http://www.google.com/myapi?context=bart
Response #1:
{
  "context": "bart",
  "data": {
    "items": []
  }
}
{ "id": "1" }
Represents the operation to perform, or that was performed, on the data. In the case of a JSON request, the method property can be used to indicate which operation to perform on the data. In the case of a JSON response, the method property can indicate the operation performed on the data.
One example of this is in JSON-RPC requests, where method indicates the operation to perform on the params property:
This object serves as a map of input parameters to send to an RPC request. It can be used in conjunction with the method property to execute an RPC function. If an RPC function does not need parameters, this property can be omitted.
{
  "method": "people.get",
  "params": {
    "userId": "@me",
    "groupId": "@self"
  }
}

Container for all the data from a response. This property itself has many reserved property names, which are described below. Services are free to add their own data to this object. A JSON response should contain either a data object or an error object, but not both. If both data and error are present, the error object takes precedence.

Indicates that an error has occurred, with details about the error. The error format supports either one or more errors returned from the service. A JSON response should contain either a data object or an error object, but not both. If both data and error are present, the error object takes precedence.
{
  "apiVersion": "2.0",
  "error": {
    "code": 404,
    "message": "File Not Found",
    "errors": [{
      "domain": "Calendar",
      "reason": "ResourceNotFoundException",
      "message": "File Not Found"
    }]
  }
}
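
On the client, the "error takes precedence" rule amounts to checking for error before looking at data. A minimal sketch, assuming the response has already been parsed into a java.util.Map; ResponseSketch and describe are hypothetical names.

```java
import java.util.Map;

// Hypothetical client-side check that gives the error object precedence over data.
class ResponseSketch {

    static String describe(Map<String, Object> response) {
        Object error = response.get("error");
        if (error instanceof Map) {
            Map<?, ?> e = (Map<?, ?>) error;                 // checked first, even if data is present
            return "error " + e.get("code") + ": " + e.get("message");
        }
        if (response.containsKey("data")) {
            return "data";
        }
        return "empty";
    }

    public static void main(String[] args) {
        Map<String, Object> response = Map.of(
                "apiVersion", "2.0",
                "error", Map.of("code", 404, "message", "File Not Found"));
        System.out.println(describe(response));
    }
}
```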

Reserved Property Names in the data object

The kind property serves as a guide to what type of information this particular object stores. It can be present at the data level, or at the items level, or in any object where it's helpful to distinguish between various types of objects. If the kind object is present, it should be the first property in the object (See the "Property Ordering" section below for more details).
// "Kind" indicates an "album" in the Picasa API.
{"data": {"kind": "album"}}
Represents the fields present in the response when doing a partial GET, or the fields present in a request when doing a partial PATCH. This property should only exist during a partial GET/PATCH, and should not be empty.
{
  "data": {
    "kind": "user",
    "fields": "author,id",
    "id": "bart",
    "author": "Bart"
  }
}

Represents the etag for the response. Details about ETags in the GData APIs can be found here: http://code.google.com/apis/gdata/docs/2.0/reference.html#ResourceVersioning
{"data": {"etag": "W/\"C0QBRXcycSp7ImA9WxRVFUk.\""}}
{"data": {"id": "12345"}}
{"data": {
  "items": [
    { "lang": "en",
      "title": "Hello world!" },
    { "lang": "fr",
      "title": "Bonjour monde!" }
  ]}
}
Indicates the last date/time (RFC 3339) the item was updated, as defined by the service.
{"data": {"updated": "2007-11-06T16:34:41.000Z"}}
A marker element, that, when present, indicates the containing entry is deleted. If deleted is present, its value must be true; a value of false can cause confusion and should be avoided.
{"data": {
  "items": [
    { "title": "A deleted entry",
      "deleted": true
    }
  ]}
}
The property name items is reserved to represent an array of items (for example, photos in Picasa, videos in YouTube). This construct is intended to provide a standard location for collections related to the current result. For example, the JSON output could be plugged into a generic pagination system that knows to page on the items array. If items exists, it should be the last property in the data object.
{
  "data": {
    "items": [
      { /* Object #1 */ },
      { /* Object #2 */ },
      ...
    ]
  }
}

Reserved Property Names for Paging

The number of items in this result set. Should be equivalent to items.length, and is provided as a convenience property. For example, suppose a developer requests a set of search items, and asks for 10 items per page. The total set of that search has 14 total items. The first page of items will have 10 items in it, so both itemsPerPage and currentItemCount will equal "10". The next page of items will have the remaining 4 items; itemsPerPage will still be "10", but currentItemCount will be "4".
{
  "data": {
    // "itemsPerPage" does not necessarily match "currentItemCount"
    "itemsPerPage": 10,
    "currentItemCount": 4
  }
}
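
The 14-item example works out as follows in code. This is a sketch with a hypothetical helper, assuming zero-based page indexes.

```java
// Hypothetical helper: how many items a given page actually contains.
class PagingSketch {

    static int currentItemCount(int totalItems, int itemsPerPage, int pageIndex) {
        if (totalItems < 0 || itemsPerPage <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        long start = (long) pageIndex * itemsPerPage;        // index of the first item on the page
        if (start >= totalItems) {
            return 0;                                        // page is past the end of the result set
        }
        return (int) Math.min(itemsPerPage, totalItems - start);
    }

    public static void main(String[] args) {
        System.out.println(currentItemCount(14, 10, 0)); // first page: 10
        System.out.println(currentItemCount(14, 10, 1)); // second page: 4
    }
}
```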

data.totalItems


data.pagingLinkTemplate





http://www.webhek.com/convert-unquoted-json-key-string-to-json-object
json_string = json_string.replace(/(\s*?{\s*?|\s*?,\s*?)(['"])?([a-zA-Z0-9]+)(['"])?:/g, '$1"$3":');
eval('var json = new Object(' + json_string + ')');
Finally, the simplest approach is to run it directly with eval():
var obj = eval('(' + invalid_json + ')');
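When doing this, however, you need to understand exactly what code is being executed: if the string contains malicious code, running it directly like this can easily cause security problems.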
http://www.cnblogs.com/absfree/p/5502705.html
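The goal of lexical analysis is to turn this meaningless string of characters into individual tokens. Each token has its own type and value, so the computer can tell tokens apart and interpret the JSON data token by token. Next, the goal of syntax analysis is to process the tokens further and build them into an abstract syntax tree (AST), whose nodes are the abstract syntax objects mentioned above. For example, lexical analysis of the JSON data above yields a series of tokens; feeding these tokens to the parser produces a JSONObject (that is, an abstract syntax tree with a single node) that has two instance fields, date and id. Below we describe the principles and implementation of lexical analysis and syntax analysis in turn.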

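1. Lexical analysis

    How many kinds of tokens are there in a JSON string? According to the definition of the JSON format at http://www.json.org/, tokens can be divided into the following types:
  • STRING (string literal)
  • NUMBER (numeric literal)
  • NULL (null)
  • START_ARRAY ([)
  • END_ARRAY (])
  • START_OBJ ({)
  • END_OBJ (})
  • COMMA (,)
  • COLON (:)
  • BOOLEAN (true or false)
  • END_DOC (marks the end of the JSON data)
    We can define an enum to represent the different token types: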
public enum TokenType {
    START_OBJ, END_OBJ, START_ARRAY, END_ARRAY, NULL, NUMBER, STRING, BOOLEAN, COLON, COMMA, END_DOC
}
Next, we also need to define a Token class to represent a token:

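public class Token {
    private TokenType type;
    private String value;

    // Constructor used by the tokenizer, e.g. new Token(TokenType.COMMA, ",")
    public Token(TokenType type, String value) {
        this.type = type;
        this.value = value;
    }
}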
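   After that we can start writing the lexical analyzer, usually called a lexer or tokenizer. We can implement the tokenizer with a DFA (deterministic finite automaton), or simply use Java's regex package. Here we implement the tokenizer with a DFA.
    Both the lexical analyzer (tokenizer) and the syntax analyzer (parser) are implemented according to the JSON grammar. The complete JSON grammar is as follows (from https://www.zhihu.com/question/24640264/answer/80500016):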

private Token start() throws Exception {
    c = '?';
    Token token = null;
    do {    // read one character; keep reading while it is whitespace (ASCII code in [0, 20H])
        c = read();
    } while (isSpace(c));
    if (isNull(c)) {
        return new Token(TokenType.NULL, null);
    } else if (c == ',') {
        return new Token(TokenType.COMMA, ",");
    } else if (c == ':') {
        return new Token(TokenType.COLON, ":");
    } else if (c == '{') {
        return new Token(TokenType.START_OBJ, "{");
    } else if (c == '[') {
        return new Token(TokenType.START_ARRAY, "[");
    } else if (c == ']') {
        return new Token(TokenType.END_ARRAY, "]");
    } else if (c == '}') {
        return new Token(TokenType.END_OBJ, "}");
    } else if (isTrue(c)) {
        return new Token(TokenType.BOOLEAN, "true");  // token value is the string "true"
    } else if (isFalse(c)) {
        return new Token(TokenType.BOOLEAN, "false"); // token value is the string "false", not null
    } else if (c == '"') {
        return readString();
    } else if (isNum(c)) {
        unread();   // push the digit back so readNum() sees the whole number
        return readNum();
    } else if (c == -1) {
        return new Token(TokenType.END_DOC, "EOF");
    } else {
        throw new JsonParseException("Invalid JSON input.");
    }
}
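
For readers who want something runnable, here is a compact, self-contained sketch of the same idea. It is not the article's code: MiniTokenizer is a hypothetical class that scans a String by index instead of using read()/unread(), does not decode escape sequences, and does not validate number syntax. It only illustrates how the token types listed above come out of a character stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tokenizer; an illustration, not a conforming JSON lexer.
class MiniTokenizer {

    enum TokenType { START_OBJ, END_OBJ, START_ARRAY, END_ARRAY, NULL, NUMBER, STRING, BOOLEAN, COLON, COMMA, END_DOC }

    static final class Token {
        final TokenType type;
        final String value;
        Token(TokenType type, String value) { this.type = type; this.value = value; }
        @Override public String toString() { return type + "(" + value + ")"; }
    }

    // Single-character tokens and their types, index-aligned.
    private static final String PUNCTUATION = "{}[],:";
    private static final TokenType[] PUNCTUATION_TYPES = {
        TokenType.START_OBJ, TokenType.END_OBJ, TokenType.START_ARRAY,
        TokenType.END_ARRAY, TokenType.COMMA, TokenType.COLON
    };

    static List<Token> tokenize(String json) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            int p = PUNCTUATION.indexOf(c);
            if (c <= ' ') {                                   // skip whitespace (code points up to 20H)
                i++;
            } else if (p >= 0) {
                tokens.add(new Token(PUNCTUATION_TYPES[p], String.valueOf(c)));
                i++;
            } else if (c == '"') {
                StringBuilder sb = new StringBuilder();
                int j = i + 1;
                while (j < json.length() && json.charAt(j) != '"') {
                    if (json.charAt(j) == '\\' && j + 1 < json.length()) {
                        j++;                                  // keep the escaped character as-is, undecoded
                    }
                    sb.append(json.charAt(j));
                    j++;
                }
                if (j >= json.length()) {
                    throw new IllegalArgumentException("Unterminated string at index " + i);
                }
                tokens.add(new Token(TokenType.STRING, sb.toString()));
                i = j + 1;
            } else if (c == '-' || (c >= '0' && c <= '9')) {
                int j = i + 1;                                // greedy scan; number syntax is not validated
                while (j < json.length() && "0123456789+-.eE".indexOf(json.charAt(j)) >= 0) {
                    j++;
                }
                tokens.add(new Token(TokenType.NUMBER, json.substring(i, j)));
                i = j;
            } else if (json.startsWith("true", i)) {
                tokens.add(new Token(TokenType.BOOLEAN, "true"));
                i += 4;
            } else if (json.startsWith("false", i)) {
                tokens.add(new Token(TokenType.BOOLEAN, "false"));
                i += 5;
            } else if (json.startsWith("null", i)) {
                tokens.add(new Token(TokenType.NULL, null));
                i += 4;
            } else {
                throw new IllegalArgumentException("Invalid JSON input at index " + i);
            }
        }
        tokens.add(new Token(TokenType.END_DOC, "EOF"));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"id\": 1, \"ok\": true}"));
    }
}
```

A parser would then consume this token list to build the JSONObject described earlier.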
