Jolt - JSON to JSON transform
http://bazaarvoice.github.io/jolt/
https://github.com/bazaarvoice/jolt/blob/master/gettingStarted.md
JSON to JSON transform library
Declarative
Transforms are written in JSON
Transform Separable Concerns 1:
For each Input value, where does it go in the Output?
Operate on Maps-of-Maps
Small JSON based DSL for each transform "concern"
Chain them together
SPEC (starts out as a copy of the INPUT)
SPEC {
"rating": {
"quality": {
"value": "SecondaryRatings.quality.Value"
},
"primary": {
"value": [ "Rating", "PrimaryRating" ]
} } }
Shiftr is not going to overwrite the 3. Instead, it is going to make an array.
2) you are ok with an array in your output, and you understand order is not guaranteed.
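The array-on-collision behaviour described above can be illustrated with a small stand-alone sketch. This is not Jolt's implementation; the class ShiftSketch and its write method are hypothetical names, and the sketch only assumes that an output path is a dot-separated string and that a second write to the same leaf turns it into a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShiftSketch {
    /**
     * Writes value at a dot-separated output path. If the leaf already holds
     * a value, the leaf becomes a list instead of being overwritten.
     * Hypothetical helper for illustration only, not part of Jolt.
     */
    @SuppressWarnings("unchecked")
    public static void write(Map<String, Object> output, String path, Object value) {
        String[] keys = path.split("\\.");
        Map<String, Object> node = output;
        // Walk (and create) the intermediate maps; assumes they are always maps.
        for (int i = 0; i < keys.length - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(
                    keys[i], k -> new LinkedHashMap<String, Object>());
        }
        String leaf = keys[keys.length - 1];
        Object existing = node.get(leaf);
        if (existing == null) {
            node.put(leaf, value);                    // first write: plain value
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);     // later writes: append
        } else {
            List<Object> list = new ArrayList<>();    // second write: promote to list
            list.add(existing);
            list.add(value);
            node.put(leaf, list);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> out = new LinkedHashMap<>();
        write(out, "Rating", 3);
        write(out, "Rating", 4);
        write(out, "SecondaryRatings.quality.Value", 3);
        System.out.println(out); // {Rating=[3, 4], SecondaryRatings={quality={Value=3}}}
    }
}
```

The sketch appends in call order, but as the note above says, order should not be relied on when the writes come from a map-based input.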
Shiftr WildCards 101 : * and &
SPEC {
"rating": {
"*": {
"value":
}
} }
SPEC
{
"rating": "Ratings"
}
The @ sign.
Take whatever is "at this spot in the input tree", and copy it to the output.
SPEC
{
"rating": {
"@" : "Ratings",
"primary" : {
"value" : "PrimaryRating"
} } }
com.bazaarvoice.jolt.Shiftr
{
"rating": {
"quality": {
"value": "SecondaryRatings.quality.Value", // copy 3 to "SecondaryRatings.quality.Value"
"max": "SecondaryRatings.quality.RatingRange" // copy 5 to "SecondaryRatings.quality.RatingRange"
}
}
}
com.bazaarvoice.jolt.Defaultr.class
https://github.com/bazaarvoice/jolt
http://www.webhek.com/convert-unquoted-json-key-string-to-json-object
Next, we also need to define a Token class to represent a token:
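import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;
import java.util.List;
// Load the Chainr spec (a JSON array of transform steps) from the classpath.
List chainrSpecJSON = JsonUtils.classpathToList( "/json/sample/spec.json" );
Chainr chainr = Chainr.fromSpec( chainrSpecJSON );
// Load the input document, run every transform in the chain, and print the result.
Object inputJSON = JsonUtils.classpathToObject( "/json/sample/input.json" );
Object transformedOutput = chainr.transform( inputJSON );
System.out.println( JsonUtils.toJsonString( transformedOutput ) );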
https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g94901479_261
- jq - Awesome command line tool to extract data from JSON files (use it all the time, available via brew)
- JsonPath - Java : Extract data from JSON using XPath-like syntax.
- JsonSurfer - Java : Streaming JsonPath processor dedicated to processing big and complicated JSON data.
https://github.com/darcyliu/google-styleguide/blob/master/JSONStyleGuide.md
No comments in JSON objects.
Use double quotes.
If a property requires quotes, double quotes must be used. All property names must be surrounded by double quotes. Property values of type string must be surrounded by double quotes. Other value types (like boolean or number) should not be surrounded by double quotes.
Flattened data vs Structured Hierarchy
Data should not be arbitrarily grouped for convenience.
Data elements should be "flattened" in the JSON representation. Data should not be arbitrarily grouped for convenience.
In some cases, such as a collection of properties that represents a single structure, it may make sense to keep the structured hierarchy. These cases should be carefully considered, and only used if it makes semantic sense. For example, an address could be represented two ways, but the structured way probably makes more sense for developers:
Flattened Address:
{ "company": "Google", "website": "http://www.google.com/", "addressLine1": "111 8th Ave", "addressLine2": "4th Floor", "state": "NY", "city": "New York", "zip": "10011" }
Structured Address:
{ "company": "Google", "website": "http://www.google.com/", "address": { "line1": "111 8th Ave", "line2": "4th Floor", "state": "NY", "city": "New York", "zip": "10011" } }
Choose meaningful property names.
Property names must conform to the following guidelines:
- Property names should be meaningful names with defined semantics.
- Property names must be camel-cased, ascii strings.
- Reserved JavaScript keywords should be avoided (A list of reserved JavaScript keywords can be found below).
These guidelines mirror the guidelines for naming JavaScript identifiers. This allows JavaScript clients to access properties using dot notation (for example, result.thisIsAnInstanceVariable).
JSON maps can use any Unicode character in key names.
The property name naming rules do not apply when a JSON object is used as a map. A map (also referred to as an associative array) is a data type with arbitrary key/value pairs that use the keys to access the corresponding values. JSON objects and JSON maps look the same at runtime; this distinction is relevant to the design of the API. The API documentation should indicate when JSON objects are used as maps.
The keys of a map do not have to obey the naming guidelines for property names. Map keys may contain any Unicode characters. Clients can access these properties using the square bracket notation familiar for maps (for example, result.thumbnails["72"]).
Reserved Property Names
Avoid naming conflicts by choosing a new property name or versioning the API.
New properties may be added to the reserved list in the future. There is no concept of JSON namespacing. If there is a naming conflict, these can usually be resolved by choosing a new property name or by versioning. For example, suppose we start with the following JSON object:
{ "apiVersion": "1.0", "data": { "recipeName": "pizza", "ingredients": ["tomatoes", "cheese", "sausage"] } }
If in the future we wish to make ingredients a reserved word, we can do one of two things:
1) Choose a different name:
{ "apiVersion": "1.0", "data": { "recipeName": "pizza", "ingredientsData": "Some new property", "ingredients": ["tomatoes", "cheese", "sausage"] } }
2) Rename the property on a major version boundary:
{ "apiVersion": "2.0", "data": { "recipeName": "pizza", "ingredients": "Some new property", "recipeIngredients": ["tomatoes", "cheese", "sausage"] } }
Consider removing empty or null values.
If a property is optional or has an empty or null value, consider dropping the property from the JSON, unless there's a strong semantic reason for its existence.
{
  "volume": 10,
  // Even though the "balance" property's value is zero, it should be left in,
  // since "0" signifies "even balance" (the value could be "-1" for left
  // balance and "+1" for right balance).
  "balance": 0,
  // The "currentlyPlaying" property can be left out since it is null.
  // "currentlyPlaying": null
}
Enum values should be represented as strings.
As APIs grow, enum values may be added, removed or changed. Using strings as enum values ensures that downstream clients can gracefully handle changes to enum values.
Java code:
public enum Color { WHITE, BLACK, RED, YELLOW, BLUE }
JSON object:
{ "color": "WHITE" }
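One way to get the graceful handling mentioned above is to read the enum from its string name and treat names the client does not know as absent rather than as an error. A minimal sketch, assuming the Color enum from the example; the class ColorCodec and its methods are hypothetical names, and the JSON is built by hand only to keep the sketch free of library dependencies.

```java
import java.util.Optional;

public class ColorCodec {
    public enum Color { WHITE, BLACK, RED, YELLOW, BLUE }

    /** Serializes the enum by name, so the wire format is a string. */
    public static String toJson(Color color) {
        return "{ \"color\": \"" + color.name() + "\" }";
    }

    /**
     * Reads an enum name received from the service. Names this client does
     * not know (for example, values added in a later API version) yield an
     * empty Optional instead of an exception.
     */
    public static Optional<Color> fromWire(String name) {
        try {
            return Optional.of(Color.valueOf(name));
        } catch (IllegalArgumentException | NullPointerException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toJson(Color.WHITE));          // { "color": "WHITE" }
        System.out.println(fromWire("RED").isPresent());  // true
        System.out.println(fromWire("PINK").isPresent()); // false
    }
}
```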
Property Value Data Types
Dates should be formatted as recommended by RFC 3339.
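In Java, the java.time formatters cover the common RFC 3339 shapes. A minimal sketch, assuming timestamps always carry an explicit offset or Z; the class name Rfc3339Dates is hypothetical, and ISO_OFFSET_DATE_TIME is used here as a close match for the RFC 3339 profile rather than an exact validator.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class Rfc3339Dates {
    /** Parses a timestamp such as 2007-11-06T16:34:41.000Z. */
    public static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    /** Formats an instant in UTC with a trailing Z. */
    public static String format(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    public static void main(String[] args) {
        OffsetDateTime updated = parse("2007-11-06T16:34:41.000Z");
        System.out.println(updated.getYear());            // 2007
        System.out.println(format(updated.toInstant()));  // 2007-11-06T16:34:41Z
    }
}
```

Note that formatting drops a fractional part that is exactly zero, so a parsed value does not always round-trip to the identical string.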
Time Duration Property Values
Time duration values should be strings formatted as recommended by ISO 8601.
- P is the duration designator (for period) placed at the start of the duration representation.
{
  // three years, six months, four days, twelve hours,
  // thirty minutes, and five seconds
  "duration": "P3Y6M4DT12H30M5S"
}
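java.time has no single type for a duration that mixes calendar units and clock units: Period handles years, months, and days, while Duration handles hours, minutes, and seconds. A minimal sketch that splits the string at the T designator; the class Iso8601Durations and its method names are hypothetical, and signs and week-based forms are not considered.

```java
import java.time.Duration;
import java.time.Period;

public class Iso8601Durations {
    /** Returns the date-based part (years, months, days) before the T. */
    public static Period datePart(String iso) {
        int t = iso.indexOf('T');
        String date = t < 0 ? iso : iso.substring(0, t);
        return date.equals("P") ? Period.ZERO : Period.parse(date);
    }

    /** Returns the time-based part (hours, minutes, seconds) after the T. */
    public static Duration timePart(String iso) {
        int t = iso.indexOf('T');
        return t < 0 ? Duration.ZERO : Duration.parse("P" + iso.substring(t));
    }

    public static void main(String[] args) {
        String value = "P3Y6M4DT12H30M5S";
        System.out.println(datePart(value)); // P3Y6M4D
        System.out.println(timePart(value)); // PT12H30M5S
    }
}
```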
Latitude/Longitude should be strings formatted as recommended by ISO 6709. Furthermore, they should favor the ±DD.DDDD±DDD.DDDD degrees format.
{
  // The latitude/longitude location of the statue of liberty.
  "statueOfLiberty": "+40.6894-074.0447"
}
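The ±DD.DDDD±DDD.DDDD form can be produced with a fixed-width format string. A minimal sketch, assuming four decimal places and no range validation; the class name Iso6709 is hypothetical.

```java
import java.util.Locale;

public class Iso6709 {
    /**
     * Formats decimal degrees as a signed, zero-padded latitude (two integer
     * digits) followed by a signed, zero-padded longitude (three integer digits).
     */
    public static String format(double latitude, double longitude) {
        // Locale.ROOT keeps the decimal separator a dot on every system.
        return String.format(Locale.ROOT, "%+08.4f%+09.4f", latitude, longitude);
    }

    public static void main(String[] args) {
        System.out.println(format(40.6894, -74.0447)); // +40.6894-074.0447
    }
}
```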
Top-Level Reserved Property Names
Represents the desired version of the service API in a request, and the version of the service API that's served in the response. apiVersion should always be present. This is not related to the version of the data. Versioning of data should be handled through some other mechanism such as etags.
Client sets this value and server echoes data in the response. This is useful in JSON-P and batch situations, where the user can use the context to correlate responses with requests. This property is a top-level property because the context should be present regardless of whether the response was successful or an error. context differs from id in that context is specified by the user while id is assigned by the service.
Example:
Request #1:
http://www.google.com/myapi?context=bart
Response #1:
{ "context": "bart", "data": { "items": [] } }
{ "id": "1" }
Represents the operation to perform, or that was performed, on the data. In the case of a JSON request, the method property can be used to indicate which operation to perform on the data. In the case of a JSON response, the method property can indicate the operation performed on the data.
One example of this is in JSON-RPC requests, where method indicates the operation to perform on the params property:
{ "method": "people.get", "params": { "userId": "@me", "groupId": "@self" } }
This object serves as a map of input parameters to send to an RPC request. It can be used in conjunction with the method property to execute an RPC function. If an RPC function does not need parameters, this property can be omitted.
Container for all the data from a response. This property itself has many reserved property names, which are described below. Services are free to add their own data to this object. A JSON response should contain either a data object or an error object, but not both. If both data and error are present, the error object takes precedence.
Indicates that an error has occurred, with details about the error. The error format supports either one or more errors returned from the service. A JSON response should contain either a data object or an error object, but not both. If both data and error are present, the error object takes precedence.
{ "apiVersion": "2.0", "error": { "code": 404, "message": "File Not Found", "errors": [{ "domain": "Calendar", "reason": "ResourceNotFoundException", "message": "File Not Found" }] } }
Reserved Property Names in the data object
The kind property serves as a guide to what type of information this particular object stores. It can be present at the data level, or at the items level, or in any object where it's helpful to distinguish between various types of objects. If the kind object is present, it should be the first property in the object (See the "Property Ordering" section below for more details).
// "Kind" indicates an "album" in the Picasa API.
{"data": {"kind": "album"}}
Represents the fields present in the response when doing a partial GET, or the fields present in a request when doing a partial PATCH. This property should only exist during a partial GET/PATCH, and should not be empty.
{ "data": { "kind": "user", "fields": "author,id", "id": "bart", "author": "Bart" } }
Represents the etag for the response. Details about ETags in the GData APIs can be found here: http://code.google.com/apis/gdata/docs/2.0/reference.html#ResourceVersioning
{"data": {"etag": "W/\"C0QBRXcycSp7ImA9WxRVFUk.\""}}
{"data": {"id": "12345"}}
{"data": { "items": [ { "lang": "en", "title": "Hello world!" }, { "lang": "fr", "title": "Bonjour monde!" } ]} }
Indicates the last date/time (RFC 3339) the item was updated, as defined by the service.
{"data": {"updated": "2007-11-06T16:34:41.000Z"}}
A marker element, that, when present, indicates the containing entry is deleted. If deleted is present, its value must be true; a value of false can cause confusion and should be avoided.
{"data": { "items": [ { "title": "A deleted entry", "deleted": true } ]} }
The property name items is reserved to represent an array of items (for example, photos in Picasa, videos in YouTube). This construct is intended to provide a standard location for collections related to the current result. For example, the JSON output could be plugged into a generic pagination system that knows to page on the items array. If items exists, it should be the last property in the data object.
{ "data": { "items": [ { /* Object #1 */ }, { /* Object #2 */ }, ... ] } }
Reserved Property Names for Paging
The number of items in this result set. Should be equivalent to items.length, and is provided as a convenience property. For example, suppose a developer requests a set of search items, and asks for 10 items per page. The total set of that search has 14 total items. The first page of items will have 10 items in it, so both itemsPerPage and currentItemCount will equal "10". The next page of items will have the remaining 4 items; itemsPerPage will still be "10", but currentItemCount will be "4".
{
  "data": {
    // "itemsPerPage" does not necessarily match "currentItemCount"
    "itemsPerPage": 10,
    "currentItemCount": 4
  }
}
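The arithmetic in the 14-item example can be written out as a small helper. A minimal sketch, assuming zero-based page indexes; the class Paging and its method are hypothetical names, not part of any API described here.

```java
public class Paging {
    /**
     * Returns how many items appear on the given zero-based page: a full
     * page while enough items remain, the remainder on the last page, and
     * zero past the end.
     */
    public static int currentItemCount(int totalItems, int itemsPerPage, int pageIndex) {
        if (totalItems < 0 || itemsPerPage <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // Use long arithmetic so large page indexes cannot overflow.
        long remaining = (long) totalItems - (long) pageIndex * itemsPerPage;
        return (int) Math.max(0L, Math.min((long) itemsPerPage, remaining));
    }

    public static void main(String[] args) {
        System.out.println(currentItemCount(14, 10, 0)); // 10
        System.out.println(currentItemCount(14, 10, 1)); // 4
    }
}
```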
data.totalItems
abstract boolean break byte case catch char class const continue debugger default delete do double else enum export extends false final finally float for function goto if implements import in instanceof int interface let long native new null package private protected public return short static super switch synchronized this throw throws transient true try typeof var volatile void while with yield
json_string = json_string.replace(/(\s*?{\s*?|\s*?,\s*?)(['"])?([a-zA-Z0-9]+)(['"])?:/g, '$1"$3":');
eval('var json = new Object(' + json_string + ')');
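Finally, the simplest approach is to run it directly with eval():
var obj = eval('(' + invalid_json + ')');
When doing this, however, you need to understand exactly what code is being executed: if the input contains malicious code, running it directly may well cause a security problem.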
http://www.cnblogs.com/absfree/p/5502705.html
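The goal of lexical analysis is to turn this otherwise meaningless string of characters into individual tokens. Each token has its own type and value, so the computer can tell tokens apart and can read the JSON data token by token. Syntax analysis then processes the tokens further, building them into an abstract syntax tree (AST) whose nodes are the abstract syntax objects described above. For example, lexical analysis of the JSON data above yields a sequence of tokens; taking those tokens as the input to syntax analysis, we can construct a JSONObject (an abstract syntax tree with a single node) that has two instance fields, date and id. The principles and implementation of lexical analysis and syntax analysis are introduced in turn below.
1. Lexical analysis
How many kinds of token are there in a JSON string? Based on the definition of the JSON format at http://www.json.org/, tokens can be divided into the following types:
- STRING (string literal)
- NUMBER (number literal)
- NULL (null)
- START_ARRAY ([)
- END_ARRAY (])
- START_OBJ ({)
- END_OBJ (})
- COMMA (,)
- COLON (:)
- BOOLEAN (true or false)
- END_DOC (marks the end of the JSON data)
We can define an enum type to represent the different token types: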
public enum TokenType {
START_OBJ, END_OBJ, START_ARRAY, END_ARRAY, NULL, NUMBER, STRING, BOOLEAN, COLON, COMMA, END_DOC
}
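public class Token {
    private TokenType type;
    private String value;
    // Constructor matching the new Token(type, value) calls made by the tokenizer.
    public Token(TokenType type, String value) { this.type = type; this.value = value; }
}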
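With that in place, we can start writing the lexical analyzer, usually called a lexer or tokenizer. A tokenizer can be implemented with a DFA (deterministic finite automaton) or directly with Java's regex package. Here we use a DFA.
Both the lexical analyzer (tokenizer) and the syntax analyzer (parser) are implemented according to the JSON grammar. The complete JSON grammar is as follows (from https://www.zhihu.com/question/24640264/answer/80500016):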
private Token start() throws Exception {
    c = '?';
    Token token = null;
    do { // Read one character; if it is whitespace (ASCII code in [0, 20H]), keep reading until the character just read is not whitespace
        c = read();
    } while (isSpace(c));
    if (isNull(c)) {
        return new Token(TokenType.NULL, null);
    } else if (c == ',') {
        return new Token(TokenType.COMMA, ",");
    } else if (c == ':') {
        return new Token(TokenType.COLON, ":");
    } else if (c == '{') {
        return new Token(TokenType.START_OBJ, "{");
    } else if (c == '[') {
        return new Token(TokenType.START_ARRAY, "[");
    } else if (c == ']') {
        return new Token(TokenType.END_ARRAY, "]");
    } else if (c == '}') {
        return new Token(TokenType.END_OBJ, "}");
    } else if (isTrue(c)) {
        return new Token(TokenType.BOOLEAN, "true"); // BOOLEAN tokens carry their literal text as the value
    } else if (isFalse(c)) {
        return new Token(TokenType.BOOLEAN, "false");
    } else if (c == '"') {
        return readString();
    } else if (isNum(c)) {
        unread();
        return readNum();
    } else if (c == -1) {
        return new Token(TokenType.END_DOC, "EOF");
    } else {
        throw new JsonParseException("Invalid JSON input.");
    }
}
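The start() method above depends on helpers (read, unread, isSpace, isNull, isTrue, isFalse, isNum, readString, readNum) that are not shown in these notes. As a rough stand-alone illustration of the same dispatch-on-first-character idea, here is a minimal sketch that works on an in-memory string instead of a reader. The class MiniTokenizer and all of its members are hypothetical, it keeps escape sequences raw, and it does not validate number syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniTokenizer {
    public enum TokenType { START_OBJ, END_OBJ, START_ARRAY, END_ARRAY, NULL,
                            NUMBER, STRING, BOOLEAN, COLON, COMMA, END_DOC }

    public static final class Token {
        public final TokenType type;
        public final String value;
        Token(TokenType type, String value) { this.type = type; this.value = value; }
        @Override public String toString() { return type + "(" + value + ")"; }
    }

    private final String input;
    private int pos = 0;

    private MiniTokenizer(String input) { this.input = input; }

    /** Splits the whole input into tokens, ending with END_DOC. */
    public static List<Token> tokenize(String json) {
        MiniTokenizer t = new MiniTokenizer(json);
        List<Token> tokens = new ArrayList<>();
        Token tok;
        do {
            tok = t.next();
            tokens.add(tok);
        } while (tok.type != TokenType.END_DOC);
        return tokens;
    }

    private Token next() {
        // Skip whitespace: every character with a code in [0, 0x20].
        while (pos < input.length() && input.charAt(pos) <= ' ') pos++;
        if (pos >= input.length()) return new Token(TokenType.END_DOC, "EOF");
        char c = input.charAt(pos);
        switch (c) {
            case '{': pos++; return new Token(TokenType.START_OBJ, "{");
            case '}': pos++; return new Token(TokenType.END_OBJ, "}");
            case '[': pos++; return new Token(TokenType.START_ARRAY, "[");
            case ']': pos++; return new Token(TokenType.END_ARRAY, "]");
            case ',': pos++; return new Token(TokenType.COMMA, ",");
            case ':': pos++; return new Token(TokenType.COLON, ":");
            case '"': return readString();
            default: break;
        }
        if (c == '-' || Character.isDigit(c)) return readNumber();
        if (input.startsWith("true", pos))  { pos += 4; return new Token(TokenType.BOOLEAN, "true"); }
        if (input.startsWith("false", pos)) { pos += 5; return new Token(TokenType.BOOLEAN, "false"); }
        if (input.startsWith("null", pos))  { pos += 4; return new Token(TokenType.NULL, null); }
        throw new IllegalArgumentException("Invalid JSON input at offset " + pos);
    }

    private Token readString() {
        int start = ++pos;                             // skip the opening quote
        while (pos < input.length() && input.charAt(pos) != '"') {
            if (input.charAt(pos) == '\\') pos++;      // step over the escaped character
            pos++;
        }
        if (pos >= input.length()) throw new IllegalArgumentException("Unterminated string");
        String raw = input.substring(start, pos);      // escapes are left undecoded
        pos++;                                         // skip the closing quote
        return new Token(TokenType.STRING, raw);
    }

    private Token readNumber() {
        int start = pos;
        while (pos < input.length() && "+-.eE0123456789".indexOf(input.charAt(pos)) >= 0) pos++;
        return new Token(TokenType.NUMBER, input.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"id\": 1, \"ok\": true}"));
    }
}
```

Indexing into a string avoids the unread() step of the reader-based version, at the cost of holding the whole document in memory.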