//TODO
Also explained well in the Guava guides
11 Jan 2015
Because I do not find Java as repulsive as a lot of people who write blogs do. I have been working with Java for a while. Anyone who intends to become an expert in software/web programming should spend more time honing complex skills such as availability, data design, cross-platform communication, logging, user interaction, continuous deployment, and analytics (see below for details). Unfortunately, you have to relearn a few of these from scratch if you decide to use a new programming language. I have used Rails in a limited manner previously and was extremely happy with it. But I ended up taking a lot of shortcuts, reusing too much code via RubyGems instead of taking the time to learn Ruby well. Ruby is an awesome language, and using it in a half-assed manner does more harm than good. So, Java.
Whether you run a website for your 3 infrequent users or 1 million daily users, it has to be available every minute of the day. If you have a server at home that runs all the time, plus some admin skills and resources, then a home server might be fine. Beginners can always use something like ngrok. It is meant for a different purpose, but can be used on a tiny scale for hosting a website. For everyone else, just host it on a cloud provider. Most of them start you off with 3 basic services: virtual machines (around 0.5-0.6 GB RAM, single core CPU), file storage (5 GB), and a hosted database (RDBMS and key-value store). These are the 3 things you will have to interact with the most. Then there are a lot of other services such as load balancing, queues, email service, etc. For my current project, I’ve decided to use Heroku. It provides a free tier with a restrictive virtualized hardware config, but it should be enough to get started. If I find it too restrictive for my use, I can change over to something else, such as AWS with Elastic Beanstalk, without much effort.
This note will mainly focus on the database system rather than data modeling. I’m planning to create a website (later, maybe a native app) to allow people to store events. Irrespective of the domain, most applications require users to interact, mostly after signing up. It is crucial to ensure that this data is secured and backed up. I have used RDBMSs and a few NoSQL databases with varying levels of involvement. I have had a difficult time getting Spring Security to work smoothly before and have decided not to spend any time on it this time. But I don’t wish to spend much time designing user management from scratch either. So this time I have decided to give Parse.com a try. Parse Core is nothing but a fancier JSON store. So, if I decide later that their system does not provide X, I can swap it for another database system such as MongoDB. I understand that this might require significant effort, but I’m willing to take a chance this time, mainly because Parse simplifies the data as well as UI-level user management. More on the UI piece later. By choosing Parse, I’m leaving my data availability worries on the able shoulders of Facebook/Parse engineers.
This section deals with communication between a client and your server, whether through browsers via HTML, native apps, or a RESTful API via JSON. I have done the HTML and JSON part before using Spring MVC. It is simple and works reliably. Because most scenarios in a native app would require communicating with a RESTful backend, the Spring MVC backend should work out fine. There could be some scenarios where I might want to communicate with the DB layer directly. Parse gives that option with ease.
The logging library question has been resolved in Java for good. Just use slf4j. Its API is rich and accepted everywhere. Maintaining system logs is a different issue though. There are various services available for storing and processing your logs. When selecting one, look for primary features such as: 1. retention days (a couple of days might be okay), 2. retention size (a few hundred MB should be enough), 3. passive/active log scanning, and 4. alerting (MOST IMPORTANT in my opinion).
I have used the free editions of Splunk and New Relic before. Splunk watches logs passively, whereas New Relic has to be deployed as part of your website’s WAR file. Both free editions are restrictive, but New Relic provides alerting and 1 day of log retention. I haven’t tried other open source log management tools yet. I will update this section if I do. For now, I’m planning to use Papertrail. The free edition provides a mere 100 MB of logs each month. It does have alerting support, unlike the free Splunk.
Do not ignore CD (or CI). Even if you are a lone developer, an efficient workflow will save a lot of time in the long run. To keep things simple, you can download Jenkins on your machine and call it a day. To make things fancier and cloudy (err.. maybe cloud-sy), look for services such as Codeship. If you are building an OSS project, then Travis CI or Atlassian products could provide deep discounts too. AWS has (or is about to release) similar products too. I’m planning to use Codeship.io because I prefer to keep my work machine as light as possible. So if I have to work on a different machine for a few days, I only have to worry about installing the bare minimum software.
update
Some background first. The $pull operator in MongoDB is used in conjunction with the update command to remove (or pull out) elements from an array.
The syntax for an update command is
db.collection.update(
{ /* find query */ },
{ /* new value */ }
);
Copied from the official documentation :
{ flags: ['vme', 'de', 'pse', 'tsc', 'msr', 'pae', 'mce' ] }
db.cpuinfo.update( { flags: 'msr' }, { $pull: { flags: 'msr' } } )
Personally, I had a hard time understanding why the first part of the update command is necessary in this case. If the values equal to ‘msr’ are going to be pulled from the ‘flags’ key anyway, then why repeat the same thing in the query part?
Although the documentation is not incorrect, the oversimplified example makes it misleading. The $pull operator does not come into play until the query part matches at least one document. In other words, keep in mind that $pull is just a modifier on update. So, don’t think about the pull until the query part is satisfied by at least one document in the collection.
For example:
db.students.insert({
name: 'Bob',
grades: ['low', 'high']
});
db.students.insert({
name: 'Mom',
grades: ['low', 'average']
});
Now, although the $pull part in the following query would seem to apply to both documents, the grade ‘low’ will be removed only from ‘Mom’.
db.students.update(
{ name: 'Mom' },
{ $pull: {grades: 'low'} }
);
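To make the "query first, pull second" order concrete, here is a rough in-memory sketch in plain Java. It is not the MongoDB Java driver: the Student class, sample() and updatePull() are made-up names that only imitate what the shell command above does, assuming the default update behaviour of touching at most one matching document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PullDemo {
    // Hypothetical stand-in for a document in the students collection.
    static class Student {
        final String name;
        final List<String> grades;

        Student(String name, String... grades) {
            this.name = name;
            this.grades = new ArrayList<>(Arrays.asList(grades));
        }
    }

    // The two documents inserted in the example above.
    static List<Student> sample() {
        List<Student> students = new ArrayList<>();
        students.add(new Student("Bob", "low", "high"));
        students.add(new Student("Mom", "low", "average"));
        return students;
    }

    // Imitates update(query, { $pull: { grades: value } }):
    // the query selects a document first, and only then is the value pulled.
    static void updatePull(List<Student> collection, Predicate<Student> query, String value) {
        for (Student student : collection) {
            if (query.test(student)) {
                student.grades.removeIf(value::equals); // remove every matching array element
                return; // update without the multi option changes at most one document
            }
        }
    }

    public static void main(String[] args) {
        List<Student> students = sample();
        updatePull(students, s -> s.name.equals("Mom"), "low");
        for (Student s : students) {
            System.out.println(s.name + " " + s.grades);
        }
    }
}
```

Bob keeps his ‘low’ grade because the query never selected his document, even though the pull condition would have matched it.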
If you are using svn, the changelist feature can provide some utility, but it is nothing compared to git stash. If you are using IntelliJ and svn but want a stash-like feature, you are in luck. IntelliJ has a feature under its VCS menu named ‘Shelve changes’. Here is the link to the details: http://www.jetbrains.com/idea/webhelp/shelving-and-unshelving-changes.html
Have you ever had a situation where you wanted to join a line of code with the line above? For example:
if (!Strings.isNullOrEmpty(reference.getUserName())
&& reference.getUserName().equalsIgnoreCase(userName)) {
Press Ctrl+Shift+J while your cursor is on the upper line, i.e., the line that will hold the joined result.
Fact 1: There is way too much criticism of MongoDB in several blog posts. Fact 2: Most of these criticisms have rebuttal posts.
I have used MongoDB in some small scale attempts and have found it satisfactory. And that is better than most other databases because I don’t have to interact a lot with it. No schema changes. Not worrying much about adding some boilerplate code in the ORM (debatable). It always seems like the invisible force working behind the scenes. Isn’t that the sole purpose of databases?
Here is some documentation of good practices that could be easily followed to avoid the pitfalls of MongoDB:
TLDR: Set the write concern to 1 (SAFE in older drivers), i.e., have the server acknowledge every write so that failures are reported.
Problem: Older drivers default to 0, i.e., the driver does not wait for an acknowledgement, so a failed write goes unnoticed. This makes writes super fast, but a silently failed write is unacceptable in most applications. (Drivers released since late 2012 default to acknowledged writes when you connect through MongoClient.)
Detailed solution: All language drivers for MongoDB support this write concern setting. E.g., in Java there is the WriteConcern class. In Spring Data, it can be set while initializing the MongoTemplate.
TLDR: db.runCommand ( { repairDatabase: 1 } )
Problem: MongoDB does not release the disk storage it used for a document back to the OS, even after the document has been deleted. This is one of the reasons for the overarching issue of MongoDB consuming more space than the data it stores actually requires.
Detailed solution: MongoDB Docs
$ git clone http://path-to-the-dot-git-file
$ git checkout -b feature_name
# Do work
$ git add -u
# If new files need to be added, then git add <fileName>
$ git commit -m "Commit message"
# Here we are committing to our local repo, not to a server
$ git checkout master
$ git merge feature_name
# If you have not pulled changes into your local master branch, this merge should complete without conflicts
$ git pull --rebase
# Pulls from the remote server and rebases your local changes on top of the changes made by others
# Possibility of conflicts here
$ git rebase -i HEAD~10
# If you want to squash multiple commits into one, leave the first "pick" as is and replace the "pick" on each following commit with the letter s
$ git push
$ git stash
: Stashes your local uncommitted changes, so that you can switch from your dirty branch to another clean one.
$ git log --graph --abbrev-commit --decorate --format=format:'%C(bold blue)%h%C(reset) - %C(bold cyan)%aD%C(reset) %C(bold green)(%ar)%C(reset)%C(bold yellow)%d%C(reset)%n'' %C(white)%s%C(reset) %C(dim white)- %an%C(reset)' --all
: Should alias it to a shortcut. Copied from..
$ git config --global alias.glog "log --graph --abbrev-commit --decorate --format=format:'%C(bold blue)%h%C(reset) - %C(bold cyan)%aD%C(reset) %C(bold green)(%ar)%C(reset)%C(bold yellow)%d%C(reset)%n'' %C(white)%s%C(reset) %C(dim white)- %an%C(reset)' --all"
: Adds an alias for the above command. The whole log command is wrapped in double quotes so that it is stored as a single alias value. This will add an entry in the alias section of the .gitconfig file.
The Table interface introduced in Guava is helpful for representing tabular data, such as data to be written to a CSV. Think about it as a spreadsheet. All the data in a spreadsheet can be represented by 3 parameters: the row key, the column key, and the actual value stored in the cell. Hence the Table<R, C, V> interface has 3 generic parameters too.
/*
| Name | GPA
0 | Bob | 2.3
1 | Jim | 3.4
2 | Tim | 2.8
*/
Table<Integer, String, Object> studentData = TreeBasedTable.create();
studentData.put(0, "Name", "Bob");
studentData.put(1, "Name", "Jim");
studentData.put(2, "Name", "Tim");
studentData.put(0, "GPA", 2.3); // stored as numbers so they print as 2.3, not "2.3"
studentData.put(1, "GPA", 3.4);
studentData.put(2, "GPA", 2.8);
Map<R, V> column(C columnKey): Returns a map of Row→Value for the given column. For example, studentData.column("Name") in the above case would return a Map that looks like: { 0: "Bob", 1: "Jim", 2: "Tim" }.
Map<C, V> row(R rowKey): Returns a map of Column→Value for the given row. For example, studentData.row(2) would return a Map that looks like: { "Name": "Tim", "GPA": 2.8 }
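To see what these two views compute, here is a plain-JDK sketch of the nested-map shape, Map<R, Map<C, V>>, that a Table saves you from writing by hand. NestedMapTable is a made-up class for illustration only; it is not how Guava implements Table, and it uses no Guava calls.

```java
import java.util.Map;
import java.util.TreeMap;

class NestedMapTable {
    // row key -> (column key -> value)
    private final Map<Integer, Map<String, Object>> rows = new TreeMap<>();

    void put(int rowKey, String columnKey, Object value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // Same idea as Table.row(rowKey): every cell of one row, keyed by column.
    Map<String, Object> row(int rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Same idea as Table.column(columnKey): one cell from each row, keyed by row.
    Map<Integer, Object> column(String columnKey) {
        Map<Integer, Object> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Object>> entry : rows.entrySet()) {
            Object value = entry.getValue().get(columnKey);
            if (value != null) {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NestedMapTable students = new NestedMapTable();
        students.put(0, "Name", "Bob");
        students.put(1, "Name", "Jim");
        students.put(2, "Name", "Tim");
        students.put(2, "GPA", 2.8);
        System.out.println(students.column("Name")); // {0=Bob, 1=Jim, 2=Tim}
        System.out.println(students.row(2));         // {GPA=2.8, Name=Tim}
    }
}
```

Notice that column() has to walk every row, which is exactly the kind of bookkeeping the Table interface hides.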
All the implementations of Table can be instantiated with a static create() method, except for ImmutableTable. As the name suggests, this implementation builds an immutable object. Hence we need to build it using the provided Builder, i.e., ImmutableTable.Builder<R,C,V>. Calling the build() instance method of the Builder will return an immutable Table.
Lunr.js is a JavaScript library that indexes content provided in JSON format. The index can be used to perform a full text search. How is that different from a simple ‘grep’? It uses modern search techniques such as tokenization, stemming, omitting stop words, etc. Although lunr.js provides default algorithms for each of these techniques, the user can override them to fit their specific needs. And of course, the name lunr is just a play on Solr, which is also a full text search engine, but made for heavy duty tasks.
First, because there is no database while using Jekyll. Hence no queries. So searching is not straightforward. Second, because I have started using Jekyll, and I think a blog without search is weird.
You could just View Source of this file and find all the code I’m using for this site in search.js. Here is the rundown:
Provide the fields of the data to be indexed
var index;
createIndex();
function createIndex() {
index = lunr(function () {
this.field('title', {boost: 10})
this.field('content')
this.field('date')
this.ref('url')
});
}
Being a Jekyll blog, there is no JSON data to represent the blog posts. So you have to put all your blog posts into the HTML on load. I know this is weird, but for a blog with a few hundred pages of plain text, it should not slow down your load time much. Also, if you do not display all the blog posts at a time, it would be better to hide the loaded data using CSS. In the example below, I’m loading all blog data into doc_* elements, out of which the .doc_content element is hidden by default.
var posts = []; // loaded documents, looked up by URL after a search
loadData();
function loadData() {
$('.doc').each(function(doc_index) {
var doc = {};
doc.date = $(this).find('.doc_date').text();
doc.content = $(this).find('.doc_content').text();
doc.title = $(this).find('.doc_title').text();
doc.url = $(this).find('.doc_title').attr('href');
index.add(doc);
posts.push(doc);
});
}
Although searching the index is as easy as calling index.search(query), the return object is not an Array of loaded documents. Instead it returns the ref, i.e., the reference of the indexed document (the URL in this case), along with a score for the match. So we have to find the corresponding document from the list of loaded documents.
function getResults(query) {
var docs = [];
var results = index.search(query);
_.each(results, function(result) {
console.log('Result ref: ' + result.ref);
var doc = _.find(posts, function(post) {
return post.url === result.ref;
});
if (doc) docs.push(doc);
});
return docs;
}
Used to double or halve an integer. Shift operators work only on integral types, not on float or double.
short num = 0b0000_0100 << 1; // left operand is 4
// 0b0000_1000, i.e., 8
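A small sketch of the shift operators in action. The class and method names are made up for illustration. Note how halving a negative number with >> rounds differently from integer division, and how >>> ignores the sign bit.

```java
class ShiftDemo {
    static int doubled(int n) {
        return n << 1; // shift left by one bit: multiply by 2
    }

    static int halved(int n) {
        return n >> 1; // signed shift right: divide by 2, rounding toward negative infinity
    }

    static int halvedUnsigned(int n) {
        return n >>> 1; // unsigned shift right: fills the top bit with 0
    }

    public static void main(String[] args) {
        System.out.println(doubled(4));          // 8
        System.out.println(halved(9));           // 4
        System.out.println(halved(-9));          // -5, whereas -9 / 2 is -4
        System.out.println(halvedUnsigned(-2));  // 2147483647
    }
}
```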
Order of precedence is & (AND), then ^ (XOR, which determines whether the operand bits or booleans differ: it returns 0 for a match and 1 for a mismatch), then | (OR).
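The precedence order is easy to check with an expression that gives a different answer when read strictly left to right. The class and method names below are made up for illustration.

```java
class PrecedenceDemo {
    // Parsed as 1 | (2 ^ (3 & 4)) because & binds tighter than ^, which binds tighter than |.
    static int implicit() {
        return 1 | 2 ^ 3 & 4;
    }

    // The same expression with the grouping spelled out.
    static int explicit() {
        return 1 | (2 ^ (3 & 4));
    }

    // What you would get if the operators were simply applied left to right.
    static int leftToRight() {
        return ((1 | 2) ^ 3) & 4;
    }

    public static void main(String[] args) {
        System.out.println(implicit());    // 3
        System.out.println(explicit());    // 3
        System.out.println(leftToRight()); // 0
    }
}
```

When in doubt, add the parentheses anyway; they cost nothing and save the next reader a lookup.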
Geocode g1 = new Geocode(51, 110); // g1 refers to the memory allocated to the Geocode object, say 5123-5153
g1 = new Geocode(50, 109); // Block 5123-5153 is no longer referenced and is eligible for GC
int a = 21;
int b = a; // Now the JVM has 2 blocks in memory that contain the integer value 21
Geocode g1 = new Geocode(51, 110);
Geocode g2 = g1; // Now the JVM has 1 block in memory that contains the object new Geocode(51, 110), with 2 references to it
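The difference shows up as soon as one of the copies is changed. The sketch below assumes a hypothetical Geocode class with two mutable int fields, since the snippets above do not show its definition.

```java
class ReferenceDemo {
    // Hypothetical Geocode with mutable fields, for illustration only.
    static class Geocode {
        int lat;
        int lng;

        Geocode(int lat, int lng) {
            this.lat = lat;
            this.lng = lng;
        }
    }

    // Copying a primitive copies the value, so the original is unaffected.
    static boolean primitiveCopyIsIndependent() {
        int a = 21;
        int b = a;
        b = 22;
        return a == 21 && b == 22;
    }

    // Copying a reference copies only the pointer, so both names see the change.
    static boolean referenceCopySharesObject() {
        Geocode g1 = new Geocode(51, 110);
        Geocode g2 = g1;
        g2.lat = 50;
        return g1.lat == 50 && g1 == g2;
    }

    public static void main(String[] args) {
        System.out.println(primitiveCopyIsIndependent()); // true
        System.out.println(referenceCopySharesObject());  // true
    }
}
```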
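If you want the memory assigned to an object to be garbage collected, set the references to that object to null. Once nothing refers to the object, it becomes eligible for GC, although the JVM decides when the collection actually happens.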
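private int max(int first, int... rest);
is the same as
private int max(int first, int[] rest);
except that callers of the varargs version can pass the extra values directly, e.g., max(1, 2, 3), instead of building the array themselves.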
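Here is a minimal sketch of the varargs version with a body filled in (the implementation is my own guess at what such a max method would do). Inside the method, rest is just an int[].

```java
class VarargsDemo {
    // 'first' guarantees at least one argument; 'rest' arrives as an int[] (possibly empty).
    static int max(int first, int... rest) {
        int result = first;
        for (int value : rest) {
            result = Math.max(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(max(3));                    // 3
        System.out.println(max(3, 9, 4));              // 9
        System.out.println(max(3, new int[] {9, 4}));  // 9, passing the array explicitly also works
    }
}
```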
The synchronized keyword can be used to wrap a block of code or in the signature of a class/instance method. Doing this ensures that only one thread at a time can run the guarded code for a given lock, which makes it thread-safe. E.g., all methods within Hashtable are synchronized.
Note: Use ConcurrentHashMap instead of Hashtable if you need thread-safety. Hashtable is slower because it locks the entire table for any read/write operation, whereas ConcurrentHashMap locks only part of the table: up to Java 7 it uses 16 segment locks by default, each managing some of the hash buckets, and since Java 8 it locks individual buckets.
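As a rough illustration, the sketch below has several threads bump the same counter in a ConcurrentHashMap (Java 8 or later, since it uses lambdas and merge). The class name, the "home" key and the thread counts are arbitrary choices for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

class CounterDemo {
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // merge performs the read-modify-write atomically for this key
                    hits.merge("home", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return hits.getOrDefault("home", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000
    }
}
```

No increments are lost, and no explicit synchronized block is needed. Doing the same with separate get and put calls would not be safe, even on a thread-safe map.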
Generic methods allow type parameters to be used to express dependencies among the types of one or more arguments to a method and/or its return type. E.g., the error in the first method below can be prevented by parameterizing it, i.e., making it generic, as in the second.
static void fromArrayToCollection(Object[] a, Collection< ? > c) {
for (Object o : a) {
c.add(o); // compile-time error
}
}
static < T > void fromArrayToCollection(T[] a, Collection< T > c) {
for (T o : a) {
c.add(o); // Correct
}
}
If there is no dependency between the return type and/or the arguments of a method, then you are better off using wildcards instead of a generic method. Excellent example
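A short sketch of both cases side by side. fromArrayToCollection is the generic method from above; countNonNull is a made-up helper whose argument type has nothing to do with its return type, so a wildcard is enough.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class GenericsDemo {
    // Generic: T ties the array's element type to the collection's element type.
    static <T> void fromArrayToCollection(T[] a, Collection<T> c) {
        for (T o : a) {
            c.add(o);
        }
    }

    // Wildcard: the element type is irrelevant here, so no type parameter is needed.
    static int countNonNull(Collection<?> c) {
        int count = 0;
        for (Object o : c) {
            if (o != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"Bob", "Jim", null};
        List<String> list = new ArrayList<>();
        fromArrayToCollection(names, list); // T is inferred as String
        System.out.println(list.size());        // 3
        System.out.println(countNonNull(list)); // 2
    }
}
```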
An ArrayList can hold a list of Objects, not primitives, whereas an array can hold either.
The size of an array cannot grow dynamically. An ArrayList’s size can.
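Both points in one small sketch (class and method names are made up). Growing an array really means allocating a bigger one and copying, which is what ArrayList does for you behind the scenes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayVsListDemo {
    // An array has a fixed length; "growing" it means copying into a new, longer array.
    static int[] appendToArray(int[] source, int value) {
        int[] bigger = Arrays.copyOf(source, source.length + 1);
        bigger[source.length] = value;
        return bigger;
    }

    // An ArrayList grows on demand, but stores Integer objects (autoboxed), not int primitives.
    static List<Integer> appendToList(List<Integer> source, int value) {
        source.add(value);
        return source;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(appendToArray(numbers, 4).length); // 4
        System.out.println(numbers.length);                   // 3, the original is unchanged

        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(appendToList(list, 4).size());     // 4
    }
}
```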
Any class can override Object’s finalize() method to clean up resources. This method is called only by the GC, when it deems the object ready to be collected, so there is no guarantee of when, or even whether, it runs.
== and equals() give the same result for enum constants, unlike for String. So it is better to use ==, which also avoids a NullPointerException when the left-hand value is null.
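A quick sketch of the null case, using a made-up Grade enum:

```java
class EnumCompareDemo {
    enum Grade { LOW, AVERAGE, HIGH }

    // Safe even when 'a' is null.
    static boolean sameByIdentity(Grade a, Grade b) {
        return a == b;
    }

    // Throws NullPointerException when 'a' is null.
    static boolean sameByEquals(Grade a, Grade b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameByIdentity(Grade.LOW, Grade.LOW)); // true
        System.out.println(sameByEquals(Grade.LOW, Grade.LOW));   // true
        System.out.println(sameByIdentity(null, Grade.LOW));      // false
        try {
            sameByEquals(null, Grade.LOW);
        } catch (NullPointerException e) {
            System.out.println("equals() on a null enum reference throws");
        }
    }
}
```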
HttpSession generates a cookie named JSESSIONID on the client’s browser. The cookie holds only the session identifier; data about the user’s session stays on the server and can be stored by httpSession.setAttribute("userName", "Bob"). The server maintains this session in memory (or on disk, as per your server’s policy) for its life. The duration can be set by httpSession.setMaxInactiveInterval(n), where n is in seconds. If n <= 0, then the session is maintained forever by the server. The important thing to understand is that this persistence is on the server, not the client. The JSESSIONID cookie is killed as soon as the user closes the browser. The practice of storing the session forever on the server sounds bad, but in fact is even worse than bad. It’s horrible. The JSESSIONID itself has some risks (an attacker can steal the cookie), and remembering and honoring its value forever is dangerous.
So how do you let users into the secure area of your website without making them log in again each time they close the browser? Here is a very nice article from 2006 that explains best practices:
Aniket Dahotre
Software Developer
github.com/dahotre
twitter.com/dahotre