My Philips Hue bridge is queried by NiFi every 5 seconds and the data is stored in MongoDB. The data is then used in R to create analytics.

Query the HUE collection for temperature and motion sensor data:
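A minimal sketch of such queries, assuming the documents follow the shape returned by the Hue sensors API (the collection name `HUE` and the field names here are assumptions):

```javascript
// Hypothetical document shape, based on the Hue sensors API:
// { name: "Hue temperature sensor 1", type: "ZLLTemperature",
//   state: { temperature: 2145, lastupdated: "..." } }

// Temperature readings (ZLLTemperature reports hundredths of a degree Celsius)
db.HUE.find(
  { type: "ZLLTemperature" },
  { name: 1, "state.temperature": 1, "state.lastupdated": 1 }
);

// Motion sensor events where presence was detected
db.HUE.find(
  { type: "ZLLPresence", "state.presence": true },
  { name: 1, "state.lastupdated": 1 }
);
```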


In one of my latest projects I have created an animation of flights around Amsterdam Schiphol airport. First I needed to generate plots in R, then save those as JPG files, and then stitch the JPG files into a video file. The result can be found here

The R script to get data from the MySQL database and generate JPG images of the plots:

A lot of well-known websites allow you to develop custom applications to interact with their API. In a previous example, we saw how to use NiFi to perform OAuth 1.0A authentication against the Flickr API. However, a lot of websites use the OAuth 2.0 mechanism to authenticate your applications. You can find more details here, […]

via NiFi and OAuth 2.0 to request WordPress API — Pierre Villard

Store Locator

Example of a find query where all documents with an element named topic exactly equal to "rrd/srt" are sent to the collection subset:
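A minimal sketch of this copy, assuming the source collection is tweets (the source collection name is an assumption):

```javascript
// Select documents whose topic field is exactly "rrd/srt"
// and copy each of them into the subset collection.
db.tweets.find({ topic: "rrd/srt" }).forEach(function (doc) {
  db.subset.insert(doc);
});
```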

An aggregation query grouping all tweets by $user.screen_name, with a count, sorted descending, and exported to the collection twitterars:
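A sketch of this aggregation, reconstructed from the description above (the $out stage writes the result to twitterars):

```javascript
db.tweets.aggregate([
  { $group: { _id: "$user.screen_name", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  { $out: "twitterars" }
], { allowDiskUse: true });
```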



Note the option {allowDiskUse: true}, indicating that the query is allowed to use disk if needed.

The following query will count occurrences per user:

db.tweets.aggregate([
  { $group: {
      _id: "$user.screen_name",
      count: { $sum: 1 }
  }},
  { $sort: {
      count: -1
  }}
], { allowDiskUse: true });

Note that the option allowDiskUse is needed if you get this error:

{
  "code" : 16819,
  "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
  "message" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
  "name" : "MongoError",
  "ok" : 0
}

Optionally output the result to a collection:

db.tweets.aggregate([{ $group: {_id: "$user.text", count: {$sum: 1}}}, {$sort: {count: -1}}, {$out: "texts"}], {allowDiskUse: true});


Copy a selection where the field date == 20120105 to a new collection subset. If subset does not exist, it will be created.
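A sketch using an aggregation with $match and $out (assuming the source collection is tweets):

```javascript
db.tweets.aggregate([
  { $match: { date: 20120105 } },  // select the documents to copy
  { $out: "subset" }               // creates subset if it does not exist
]);
```

Note that $out replaces the contents of subset on each run; to append instead, iterate with find().forEach() and insert into subset.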

When the documents have been copied to another collection, remove them from the source with:
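A sketch, assuming the same selection as the copy step above:

```javascript
// Remove the copied documents from the source collection.
db.tweets.remove({ date: 20120105 });
```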


Get the timestamp of the most recent record:
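A sketch, assuming each document carries a timestamp field (the field name is an assumption):

```javascript
// Sort descending on timestamp and keep only the newest document.
db.tweets.find({}, { timestamp: 1 }).sort({ timestamp: -1 }).limit(1);
```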

Index MongoDB fields to accelerate search:

db.tweets.createIndex( { text: "text" } )
db.tweets.find({$text: { $search: "gouda" }}, {_id: 1, text: 1})

Note that the search keyword is case-insensitive.

Assume the tweets are in the database twitter and the collection tweets.

var map = function() {
  var text = this.text;
  if (text) {
    // quick lowercase to normalize per your requirements
    text = text.toLowerCase().split(" ");
    for (var i = text.length - 1; i >= 0; i--) {
      // might want to remove punctuation, etc. here
      if (text[i]) { // make sure there's something
        emit(text[i], 1); // store a 1 for each word
      }
    }
  }
};

var reduce = function(key, values) {
  var count = 0;
  values.forEach(function(v) {
    count += v;
  });
  return count;
};

db.tweets.mapReduce(map, reduce, {out: "word_count"});

The mapReduce above will produce a new collection word_count with the following contents:

/* 1 */
{
  "_id" : "rt",
  "value" : 43335
}

/* 2 */
{
  "_id" : "in",
  "value" : 36810
}

/* 3 */
{
  "_id" : "the",
  "value" : 27765
}

/* 4 */
{
  "_id" : "to",
  "value" : 20724
}


Other useful queries

db.tweets.aggregate([{ $group: {_id: "$place", count: {$sum: 1}}}]);
db.tweets.aggregate([{ $group: {_id: "$place", count: {$sum: 1}}}, {$sort: {count: -1}}]);
db.tweets.aggregate([{$unwind: "$entities.hashtags"}, { $group: {_id: "$entities.hashtags.text", tagCount: {$sum: 1} }}, { $sort: { tagCount: -1 }}]);
db.tweets.aggregate([{ $group: {_id: "$text", count: {$sum: 1}}}, {$sort: {count: -1}}]);