Esercizi aggregation su MongoDB - Books

file book.txt

Json del documento

{

"_id" : 9,

"title" : "Griffon in Action",

"isbn" : "1935182234",

"pageCount" : 375,

"publishedDate" : ISODate("2012-06-04T07:00:00Z"),

"thumbnailUrl" : "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/almiray.jpg",

"shortDescription" : "Griffon....",

"longDescription" : "Although...",

"status" : "PUBLISH",

"authors" : [ "Andres Almiray", "Danno Ferrin", "", "James Shingler"],

"categories" : [ "Java"]

}

Query da fare:

  • derive the categories of the books written by “Andres Almiray”
  • derive the average length of the books on “Java”
  • derive the shortest books in each category
  • derive the most prolific authors wrt the number of books they have written
  • rank the authors wrt the number of pages they have written ○ for a book with n authors and p pages, author contribution is estimated as p/n
  • derive the most referred book categories


Group Accumulator Operators

$sum (Returns a sum of numerical values. Ignores non-numeric values.)

$avg (Returns an average of numerical values. Ignores non-numeric values.)

$first (Returns a value from the first document for each group. Order is only defined if the documents are in a defined order. - Available in $group stage only.)

$last (Returns a value from the last document for each group. Order is only defined if the documents are in a defined order. - Available in $group stage only.)

$max (Returns the highest expression value for each group. - Available in both $group and $project stages)

$min (Returns the lowest expression value for each group. - Available in both $group and $project stages)

$push (Returns an array of expression values for each group. - Available in $group stage only)

$addToSet (Returns an array of unique expression values for each group. Order of the array elements is undefined. - Available in $group stage only.)

$stdDevPop (Returns the population standard deviation of the input values. - Available in both $group and $project stages)

$stdDevSamp (Returns the sample standard deviation of the input values. - Available in both $group and $project stages)

{$sort: { field1: sortOrder, field2 : sortOrder, ... } }

{ $cmp: [ expression1, expression2] } -1 0 oppure +1 in base al valori di comparazione

$addFields { $addFields: { newField: expression, ... } }

$size { $size: expression }

$ne{ $ne: [ expression1, expression2 ] } (diverso)

Limit { $limit: positiveInteger }


Varie
$gt (Maggiore - db.inventory.find( { qty: { $gt: 20 } } ) )

1) derive the categories of the books written by “Andres Almiray”


db.books.aggregate([
{ $match: { authors: {$in: ["Andres Almiray","$authors"]}}},
{ $group : { _id : "NULL", categories: {$addToSet: "$categories" }}},
{ $project: { _id:0, autore:"Andres Almiray", categories:1 }}
])

2) derive the average length of the books on “Java”

db.books.aggregate([
{ $match: { categories: {$in: ["Java", "$categories"]}}},
{ $group: { _id:"NULL", avarage: {$avg: "$pageCount"}}},
{ $project: {_id:0, JavaAvarageLenght: "$avarage"}}
]).pretty()

3) derive the shortest books in each category [io]

db.books.aggregate([
{ $unwind: "$categories" },
{ $sort : { pageCount : -1}},
{ $group: {_id: {categories:"$categories", title:"$title" }, npage:{$push: "$pageCount"}}},
{ $unwind: "$npage" },
{ $project: {_id:"NULL", categories:"$_id.categories", title:"$_id.title",npage:"$npage"}},
{ $sort : { npage : -1}},
{ $group: {_id:"$categories", book: {$last:"$title"}, npage: {$last: "$npage"}}},
{ $project: {_id:0, categories:"$_id", title:"$book",npage:"$npage"}}
]).pretty()

3) derive the shortest books in each category [Giusto - Proff]

db.books.aggregate([
{$unwind: "$categories"},
{$sort: {"pageCount":1}},
{$group: {"_id":{"category":"$categories"}, "books":{$addToSet: {"title":"$title","pages":"$pageCount"}}, "shortest":{$first:"$pageCount"}}},
{$unwind: "$books"},
{$addFields: {"toBeSelected": {$cmp: ["$books.pages","$shortest"]}}},
{$match: {"toBeSelected":0}},
{$group: {"_id":{"category":"$_id.category"}, "books":{$addToSet: {"title":"$books.title","pages":"$books.pages"}}, "cfr":{$first:"$shortest"}}}
])

4) derive the most prolific authors wrt the number of books they have written

db.books.aggregate([
{ $unwind: "$authors" },
{ $group:{ _id : "$authors", nbook:{$sum: 1} }},
{ $sort: { nbook:-1 }}
]).pretty()

5) rank the authors wrt the number of pages they have written (for a book with n authors and p pages, author contribution is estimated as p/n )

db.books.aggregate([
{ $addFields: { "nauthors" : { $size: "$authors" }}},
{ $match : { nauthors : {$gt: 0}}},
{ $addFields: { "authorsPage": {$divide: ["$pageCount", "$nauthors"]}}},
{ $unwind:"$authors" },
{ $group:{ _id:"$authors", npage: {$sum: "$authorsPage"}}},
{ $sort: { npage:-1 }}
{ $project: {_id:0, autor:"$_id", pagewritten:"$npage"}}
]).pretty()

6) derive the most referred book categories

db.books.aggregate([
{ $unwind : "$categories"},
{ $group: { _id:"$categories", t: {$sum: 1}}},
{ $sort: {t:-1}},
{ $limit: 1 },
{ $project: {_id:0, Category:"$_id", numberBook:"$t"}}
])