
Silencing the Output of a Bash Command


1. Overview

In this quick tutorial, we’ll focus on how to silence the output of a Bash command.

2. Prerequisites

Before silencing the output, we first need to understand how Bash handles output when executing a command.

2.1. Standard Output & Error

When executing a Bash command, a new process is created. Any errors from this process are written to the error stream, and any other output is written to the output stream.

Bash also automatically opens multiple files for each process, denoted by a numeric File Descriptor (FD).

Two of these FDs are Standard output (stdout) and standard error (stderr). By default, Bash directs the error stream to stderr and the output stream to stdout. For both stdout and stderr, any characters written to these FDs are displayed in the console where the command was executed.

Additionally, Bash assigns the identifier 1 for the stdout FD and 2 for the stderr FD.

2.2. Redirecting Output

We can change the default destination of error and output streams by redirecting the output of our command to another FD. We redirect output using the redirection operator: >.

For example, we can write Hello world to the file foo.txt using the echo command:

echo "Hello world" > foo.txt

2.3. Null Device

Additionally, Linux systems have a specific device, /dev/null, that does nothing when written to. Therefore, output redirected to /dev/null will not be written anywhere.

3. Silencing Output

To silence the output of a command, we redirect either stdout or stderr — or both — to /dev/null. To select which stream to redirect, we need to provide the FD number to the redirection operator.

3.1. Standard Output

To silence non-error output, we redirect stdout to /dev/null:

command 1> /dev/null

By default, the redirection operator redirects stdout so we can omit the 1:

command > /dev/null

3.2. Standard Error

To silence error output, we redirect stderr to /dev/null:

command 2> /dev/null

3.3. All Output

To redirect both stdout and stderr, we must redirect stderr to stdout and then redirect stdout to /dev/null. To redirect stderr to stdout, we use the following notation:

2>&1

We combine this with redirecting stdout to /dev/null to silence all output:

command > /dev/null 2>&1

Thus, stdout is redirected to /dev/null and stderr is redirected to stdout, causing both streams to be written to /dev/null and silencing all output from our command.

We can shorten this to the following Bash notation:

command &> /dev/null

Note that this shorthand is not portable and is only supported by Bash 4 or higher.

Although not common, we can also separately redirect stdout and stderr to /dev/null, but we do not suggest this approach unless we are auto-generating a Bash script or the previous approaches cannot be used:

command > /dev/null 2> /dev/null

4. Conclusion

In this tutorial, we learned about how a Bash process streams its error and non-error output to FDs and how these streams can be redirected to another FD using the redirection operator. By combining redirection with the /dev/null device, we can silence error output, normal output, or both.


Guide to JUnit 4 Rules


1. Overview

In this tutorial, we’re going to take a look at the Rules feature provided by the JUnit 4 library.

We’ll begin by introducing the JUnit Rules Model before walking through the most important base rules provided by the distribution. Additionally, we’ll also see how to write and use our own custom JUnit Rule.

To learn more about testing with JUnit, check out our comprehensive JUnit series.

Note that if you’re using JUnit 5, rules have been replaced by the Extension model.

2. Introduction to JUnit 4 Rules

JUnit 4 rules provide a flexible mechanism to enhance tests by running some code around a test case execution. In some sense, it’s similar to having @Before and @After annotations in our test class.

Let’s imagine we wanted to connect to an external resource such as a database during test setup and then close the connection after our test finishes. If we want to use that database in multiple tests, we’d end up duplicating that code in every test.

By using a rule, we can have everything isolated in one place and reuse the code easily from multiple test classes.

3. Using JUnit 4 Rules

So how can we use rules? We can use JUnit 4 rules by following these simple steps:

  • Add a public field to our test class and ensure that the type of this field is a subtype of the org.junit.rules.TestRule interface
  • Annotate the field with the @Rule annotation
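
Putting these two steps together, a minimal sketch of a test class declaring a rule might look like this (using the TemporaryFolder rule we'll cover in more detail below):

public class SimpleRuleUnitTest {

    // a public field whose type implements TestRule, annotated with @Rule
    @Rule
    public TemporaryFolder tmpFolder = new TemporaryFolder();

    @Test
    public void whenRuleIsApplied_thenTemporaryFolderIsAvailable() {
        assertTrue(tmpFolder.getRoot().exists());
    }
}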

In the next section, we’ll see what project dependencies we need to get started.

4. Maven Dependencies

First, let’s add the project dependencies we’ll need for our examples. We’ll only need the main JUnit 4 library:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
</dependency>

As always, we can get the latest version from Maven Central.

5. Rules Provided in The Distribution

Of course, JUnit provides a number of useful, predefined rules as part of the library. We can find all these rules in the org.junit.rules package.

In this section, we’ll see some examples of how to use them.

5.1. The TemporaryFolder Rule

When testing, we often need access to a temporary file or folder. However, managing the creation and deletion of these files can be cumbersome. Using the TemporaryFolder rule, we can manage the creation of files and folders that should be deleted when the test method terminates:

@Rule
public TemporaryFolder tmpFolder = new TemporaryFolder();

@Test
public void givenTempFolderRule_whenNewFile_thenFileIsCreated() throws IOException {
    File testFile = tmpFolder.newFile("test-file.txt");

    assertTrue("The file should have been created: ", testFile.isFile());
    assertEquals("Temp folder and test file should match: ", 
      tmpFolder.getRoot(), testFile.getParentFile());
}

As we can see, we first define the TemporaryFolder rule tmpFolder. Next, our test method creates a file called test-file.txt in the temporary folder. We then check that the file has been created and exists where it should. Really nice and simple!

When the test finishes, the temporary folder and file should be deleted. However, this rule doesn’t check whether or not the deletion is successful.

There are also a few other interesting methods worth mentioning in this class:

  • newFile()

    If we don’t provide any file name, then this method creates a randomly named new file.

  • newFolder(String... folderNames)

    To create recursively deep temporary folders, we can use this method.

  • newFolder()

    Likewise, the newFolder() method creates a randomly named new folder.

A nice addition worth mentioning is that starting with version 4.13, the TemporaryFolder rule allows verification of deleted resources:

@Rule 
public TemporaryFolder folder = TemporaryFolder.builder().assureDeletion().build();

If a resource cannot be deleted, the test will fail with an AssertionError.

Finally, in JUnit 5, we can achieve the same functionality using the Temporary Directory extension.

5.2. The ExpectedException Rule

As the name suggests, we can use the ExpectedException rule to verify that some code throws an expected exception:

@Rule
public final ExpectedException thrown = ExpectedException.none();

@Test
public void givenIllegalArgument_whenExceptionThrown_MessageAndCauseMatches() {
    thrown.expect(IllegalArgumentException.class);
    thrown.expectCause(isA(NullPointerException.class));
    thrown.expectMessage("This is illegal");

    throw new IllegalArgumentException("This is illegal", new NullPointerException());
}

As we can see in the example above, we’re first declaring the ExpectedException rule. Then, in our test, we’re asserting that an IllegalArgumentException is thrown.

Using this rule, we can also verify some other properties of the exception, such as the message and cause.

For an in-depth guide to testing exceptions with JUnit, check out our excellent guide on how to Assert an Exception.

5.3. The TestName Rule

Put simply, the TestName rule provides the current test name inside a given test method:

@Rule public TestName name = new TestName();

@Test
public void givenAddition_whenPrintingTestName_thenTestNameIsDisplayed() {
    LOG.info("Executing: {}", name.getMethodName());
    assertEquals("givenAddition_whenPrintingTestName_thenTestNameIsDisplayed", name.getMethodName());
}

In this trivial example, when we run the unit test, we should see the test name in the output:

INFO  c.baeldung.rules.JUnitRulesUnitTest - 
    Executing: givenAddition_whenPrintingTestName_thenTestNameIsDisplayed

5.4. The Timeout Rule

In this next example, we’ll take a look at the Timeout rule. This rule offers a useful alternative to using the timeout parameter on an individual Test annotation.

Now, let’s see how to use this rule to set a global timeout on all the test methods in our test class:

@Rule
public Timeout globalTimeout = Timeout.seconds(10);

@Test
public void givenLongRunningTest_whenTimout_thenTestFails() throws InterruptedException {
    TimeUnit.SECONDS.sleep(20);
}

In the above trivial example, we first define a global timeout for all test methods of 10 seconds. Then we deliberately define a test which will take longer than 10 seconds.

When we run this test, we should see a test failure:

org.junit.runners.model.TestTimedOutException: test timed out after 10 seconds
...

5.5. The ErrorCollector Rule

Next up we’re going to take a look at the ErrorCollector rule. This rule allows the execution of a test to continue after the first problem is found.

Let’s see how we can use this rule to collect all the errors and report them all at once when the test terminates:

@Rule 
public final ErrorCollector errorCollector = new ErrorCollector();

@Test
public void givenMultipleErrors_whenTestRuns_thenCollectorReportsErrors() {
    errorCollector.addError(new Throwable("First thing went wrong!"));
    errorCollector.addError(new Throwable("Another thing went wrong!"));
        
    errorCollector.checkThat("Hello World", not(containsString("ERROR!")));
}

In the above example, we add two errors to the collector. When we run the test, the execution continues, but the test will fail at the end.

In the output, we will see both errors reported:

java.lang.Throwable: First thing went wrong!
...
java.lang.Throwable: Another thing went wrong!

5.6. The Verifier Rule

The Verifier rule is an abstract base class that we can use when we wish to verify some additional behavior from our tests. In fact, the ErrorCollector rule we saw in the last section extends this class.

Let’s now take a look at a trivial example of defining our own verifier:

private List<String> messageLog = new ArrayList<>();

@Rule
public Verifier verifier = new Verifier() {
    @Override
    public void verify() {
        assertFalse("Message Log is not Empty!", messageLog.isEmpty());
    }
};

Here, we define a new Verifier and override the verify() method to add some extra verification logic. In this straightforward example, we simply check to see that the message log in our example isn’t empty.

Now, when we run the unit test and add a message, we should see that our verifier has been applied:

@Test
public void givenNewMessage_whenVerified_thenMessageLogNotEmpty() {
    // ...
    messageLog.add("There is a new message!");
}

5.7. The DisableOnDebug Rule

Sometimes we may want to disable a rule when we’re debugging. For example, it’s often desirable to disable a Timeout rule when debugging to avoid our test timing out and failing before we’ve had time to debug it properly.

The DisableOnDebug Rule does precisely this and allows us to label certain rules to be disabled when debugging:

@Rule
public DisableOnDebug disableTimeout = new DisableOnDebug(Timeout.seconds(30));

In the example above we can see that in order to use this rule, we simply pass the rule we want to disable to the constructor.

The main benefit of this rule is that we can disable rules without making any modifications to our test classes during debugging.

5.8. The ExternalResource Rule

Typically, when writing integration tests, we may wish to set up an external resource before a test and tear it down afterward. Thankfully, JUnit provides another handy base class for this.

We can extend the abstract class ExternalResource to set up an external resource before a test, such as a file or a database connection. In fact, the TemporaryFolder rule we saw earlier extends ExternalResource.

Let’s take a quick look at how we could extend this class:

@Rule
public final ExternalResource externalResource = new ExternalResource() {
    @Override
    protected void before() throws Throwable {
        // code to set up a specific external resource.
    };
    
    @Override
    protected void after() {
        // code to tear down the external resource
    };
};

In this example, when we define an external resource we simply need to override the before() method and after() method in order to set up and tear down our external resource.

6. Applying Class Rules

Up until now, all the examples we’ve looked at have applied to single test case methods. However, sometimes we might want to apply a rule at the test class level. We can accomplish this by using the @ClassRule annotation.

This annotation works very similarly to @Rule but wraps a rule around a whole test — the main difference being that the field we use for our class rule must be static:

@ClassRule
public static TemporaryFolder globalFolder = new TemporaryFolder();
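
A class rule is then used just like an instance rule, with every test method in the class sharing the same folder; a quick sketch:

@Test
public void givenClassRule_whenCreatingFile_thenSharedFolderIsUsed() throws IOException {
    File sharedFile = globalFolder.newFile("shared-file.txt");

    // the same temporary folder instance is reused by every test method in this class
    assertTrue(sharedFile.isFile());
}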

7. Defining a Custom JUnit Rule

As we’ve seen, JUnit 4 provides a number of useful rules out of the box. Of course, we can define our own custom rules. To write a custom rule, we need to implement the TestRule interface.

Let’s take a look at an example of defining a custom test method name logger rule:

public class TestMethodNameLogger implements TestRule {

    private static final Logger LOG = LoggerFactory.getLogger(TestMethodNameLogger.class);

    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                // log before the test runs, execute it, then log after it completes
                logInfo("Before test", description);
                try {
                    base.evaluate();
                } finally {
                    logInfo("After test", description);
                }
            }
        };
    }

    private void logInfo(String msg, Description description) {
        LOG.info(msg + ": " + description.getMethodName());
    }
}

As we can see, the TestRule interface contains one method called apply(Statement, Description) that we must override to return an instance of Statement. The statement represents our tests within the JUnit runtime. When we call the evaluate() method, this executes our test.

In this example, we log a message before and after each test and include the method name of the individual test, taken from the Description object.
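
To use our custom rule, we declare it in a test class just like any of the built-in rules; a minimal sketch:

public class JUnitCustomRuleUnitTest {

    // our custom rule logs the method name before and after each test
    @Rule
    public TestMethodNameLogger methodNameLogger = new TestMethodNameLogger();

    @Test
    public void givenCustomRule_whenTestRuns_thenMethodNameIsLogged() {
        assertTrue(true);
    }
}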

8. Using Rule Chains

In this final section, we’ll take a look at how we can order several test rules using the RuleChain rule:

@Rule
public RuleChain chain = RuleChain.outerRule(new MessageLogger("First rule"))
    .around(new MessageLogger("Second rule"))
    .around(new MessageLogger("Third rule"));

In the above example, we create a chain of three rules that simply print out the message passed to each MessageLogger constructor.

When we run our test, we’ll see how the chain is applied in order:

Starting: First rule
Starting: Second rule
Starting: Third rule
Finished: Third rule
Finished: Second rule
Finished: First rule

9. Conclusion

To summarize, in this tutorial, we’ve explored JUnit 4 rules in detail.

First, we started by explaining what rules are and how we can use them. Next, we took an in-depth look at the rules that come as part of the JUnit distribution.

Finally, we looked at how we can define our own custom rule and how to chain rules together.

As always, the full source code of the article is available over on GitHub.

Java Weekly, Issue 295


Here we go…

1. Spring and Java

>> Securing Services with Spring Cloud Gateway [spring.io]

As the series continues, we see how to secure services using the Token Relay pattern with OAuth2.

>> Spring Boot on Heroku with Docker, JDK 11 & Maven 3.5.x [blog.codecentric.de]

It’s Docker to the rescue for cases where you can’t build your app using predefined Heroku buildpacks. Very cool!

>> Exercises in Programming Style: spreadsheets [blog.frankel.ch]

And a neat exercise using Kotlin to model the familiar word frequencies problem as a spreadsheet, complete with immutability and a dash of tail-recursion to boost performance.

Also worth reading:

Webinars and presentations:

Time to upgrade:

2. Technical and Musing

>> Y U NO TDD [blog.code-cop.org]

An interesting collection of quotes from developers as to why they aren’t doing Test Driven Development.

>> Serverless on GCP [bravenewgeek.com]

And a good write-up on the benefits of going serverless, as well as what kinds of applications are most (and least) suited for it.

Also worth reading:

3. Comics

>> Boss Wants to Emulate Steve Jobs [dilbert.com]

>> Tina Likes to Hum [dilbert.com]

>> Wally Is New Pet Employee [dilbert.com]

4. Pick of the Week

A few months ago, I discovered Codota – a coding assistant I’ve been using ever since.

I recorded a quick video focused on how to use it as you’re coding, and the response was quite positive – which is always encouraging to see.

The simplest way to get started is just to install it and have it running in the background, in your IDE – as you’re coding normally.

JHipster Authentication with an External Service


1. Introduction

By default, JHipster applications use a local data store to hold usernames and passwords. In many real-world scenarios, however, it might be desirable to use an existing external service for authentication.

In this tutorial, we’ll look at how to use an external service for authentication in JHipster. This could be any well-known service such as LDAP, social login, or any arbitrary service that accepts a username and password.

2. Authentication in JHipster

JHipster uses Spring Security for authentication. The AuthenticationManager interface is responsible for validating usernames and passwords.

The default AuthenticationManager in JHipster simply checks the username and password against a local data store. This could be MySQL, PostgreSQL, MongoDB, or any of the alternatives that JHipster supports.

It’s important to note that the AuthenticationManager is only used for initial login. Once a user has authenticated, they receive a JSON Web Token (JWT) that is used for subsequent API calls.

2.1. Changing Authentication in JHipster

But what if we already have a data store that contains usernames and passwords, or a service that performs authentication for us?

To provide a custom authentication scheme, we simply create a new bean of type AuthenticationManager. This will take precedence over the default implementation.

Below is an example that shows how to create a custom AuthenticationManager. It only has one method to implement:

public class CustomAuthenticationManager implements AuthenticationManager {
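    // restTemplate, userService and REMOTE_LOGIN_URL are collaborators/fields of this class, omitted here for brevity;
    // loginRequest is a payload built from the Authentication's principal and credentials, and
    // createUserDTO(...) and createAuthentication(...) are private helpers that are also omitted
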
    @Override
    public Authentication authenticate(Authentication authentication) throws AuthenticationException {
        try {
            ResponseEntity<LoginResponse> response =
                restTemplate.postForEntity(REMOTE_LOGIN_URL, loginRequest, LoginResponse.class);
            
            if(response.getStatusCode().is2xxSuccessful()) {
                String login = authentication.getPrincipal().toString();
                User user = userService.getUserWithAuthoritiesByLogin(login)
                  .orElseGet(() -> userService.createUser(
                    createUserDTO(response.getBody(), authentication)));
                return createAuthentication(authentication, user);
            }
            else {
                throw new BadCredentialsException("Invalid username or password");
            }
        }
        catch (Exception e) {
            throw new AuthenticationServiceException("Failed to login", e);
        }
    }
}

In this example, we pass the username and credentials from the Authentication object to an external API.

If the call succeeds, we return a new UsernamePasswordAuthenticationToken to indicate success. Note that we also create a local user entry, which we’ll discuss later on.

If the call fails, we throw some variant of AuthenticationException so that Spring Security can gracefully fall back for us.

This example is intentionally simple to show the basics of custom authentication. However, it could perform more complex operations such as LDAP binding and authentication or use OAuth.
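
To have JHipster actually use our implementation, we expose it as a Spring bean so that it takes precedence over the default one. Here's a minimal sketch, assuming we give CustomAuthenticationManager a constructor that accepts its collaborators (a hypothetical addition to the class shown above):

@Configuration
public class AuthenticationManagerConfiguration {

    // hand the generated UserService and a RestTemplate to our custom implementation
    @Bean
    public AuthenticationManager authenticationManager(UserService userService, RestTemplateBuilder builder) {
        return new CustomAuthenticationManager(userService, builder.build());
    }
}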

3. Other Considerations

Up until now, we’ve focused on the authentication flow in JHipster. But there are several other areas of our JHipster application we have to modify.

3.1. Front-End Code

The default JHipster code implements the following user registration and activation process:

  • A user signs up for an account using their email and other required details
  • JHipster creates an account and sets it as inactive and then sends an email to the new user with an activation link
  • Upon clicking the link, the user’s account is marked as active

There is a similar flow for password reset as well.

These all make sense when JHipster is managing user accounts. But they are not required when we’re relying on an external service for authentication.

Therefore, we need to take steps to ensure these account management features are not accessible to the user.

This means removing them from the Angular or React code, depending on which framework is being used in the JHipster application.

Using Angular as an example, the default login prompt includes links to password reset and registration. We should remove them from app/shared/login/login.component.html:

<div class="alert alert-warning">
  <a class="alert-link" (click)="requestResetPassword()">Did you forget your password?</a>
</div>
<div class="alert alert-warning">
  <span>You don't have an account yet?</span>
   <a class="alert-link" (click)="register()">Register a new account</a>
</div>

We must also remove the unneeded navigation menu items from app/layouts/navbar/navbar.component.html:

<li *ngSwitchCase="true">
  <a class="dropdown-item" routerLink="password" routerLinkActive="active" (click)="collapseNavbar()">
    <fa-icon icon="clock" fixedWidth="true"></fa-icon>
    <span>Password</span>
  </a>
</li>

and

<li *ngSwitchCase="false">
  <a class="dropdown-item" routerLink="register" routerLinkActive="active" (click)="collapseNavbar()">
    <fa-icon icon="user-plus" fixedWidth="true"></fa-icon>
    <span>Register</span>
  </a>
</li>

Even though we removed all the links, a user could still manually navigate to these pages. The final step is to remove the unused Angular routes from app/account/account.route.ts.

After doing this, only the settings route should remain:

import { settingsRoute } from './';
const ACCOUNT_ROUTES = [settingsRoute];

3.2. Java APIs

In most cases, simply removing the front-end account management code should be sufficient. However, to be absolutely sure the account management code is not invoked, we can also lock down the associated Java APIs.

The quickest way to do this is to update the SecurityConfiguration class to deny all requests to the associated URLs:

.antMatchers("/api/register").denyAll()
.antMatchers("/api/activate").denyAll()
.antMatchers("/api/account/reset-password/init").denyAll()
.antMatchers("/api/account/reset-password/finish").denyAll()

This will prevent any remote access to the APIs, without having to remove any of the code.
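
For context, these matchers live inside the HttpSecurity configuration of the generated SecurityConfiguration class. Here's a hedged sketch of how that section might look, assuming the class extends Spring Security's WebSecurityConfigurerAdapter as JHipster's generated code does:

@Override
protected void configure(HttpSecurity http) throws Exception {
    http
        .authorizeRequests()
        // deny the account management endpoints we no longer need
        .antMatchers("/api/register").denyAll()
        .antMatchers("/api/activate").denyAll()
        .antMatchers("/api/account/reset-password/init").denyAll()
        .antMatchers("/api/account/reset-password/finish").denyAll()
        // the remaining API still requires an authenticated user
        .antMatchers("/api/**").authenticated();
}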

3.3. Email Templates

JHipster applications come with a set of default email templates for account registration, activation, and password resets. The previous steps will effectively prevent the default emails from being sent, but in some cases, we might want to reuse them.

For example, we might want to send a welcome email when a user logs in for the first time. The default template includes steps for account activation, so we have to modify it.

All of the email templates are located in resources/templates/mail. They are HTML files that use Thymeleaf to pass data from Java code into the emails.

All we have to do is to edit the template to include the desired text and layout and then use the MailService to send it.

3.4. Roles

When we create the local JHipster user entry, we also have to take care to ensure it has at least one role. Normally, the default USER role is sufficient for new accounts.

If the external service provides its own role mapping, we have two additional steps:

  1. Ensure any custom roles exist in JHipster
  2. Update our custom AuthenticationManager to set the custom roles when creating new users

JHipster also provides a management interface for adding roles to and removing them from users.

3.5. Account Removal

It’s worth mentioning that JHipster also provides an account removal management view and API. This view is only available to administrator users.

We could remove and restrict this code as we did for account registration and password reset, but it’s not really necessary. Our custom AuthenticationManager will always create a new account entry when someone logs in, so deleting the account doesn’t actually do much.

4. Conclusion

In this tutorial, we’ve seen how to replace the default JHipster authentication code with our own authentication scheme. This could be LDAP, OIDC, or any other service that accepts a username and password.

We’ve also seen that using an external authentication service also requires some changes to other areas of our JHipster application. This includes front end views, APIs, and more.

As always, the example code from this tutorial is available in our GitHub repository.

JPA Query Parameters Usage


1. Introduction

Building queries using JPA is not difficult; however, we sometimes forget simple things that make a huge difference.

One of these things is JPA query parameters, and this is what we are going to talk about.

2. What Are Query Parameters?

Let’s start by explaining what query parameters are.

Query parameters are a way to build and execute parametrized queries. So, instead of:

SELECT * FROM employees e WHERE e.emp_number = '123';

We’d do:

SELECT * FROM employees e WHERE e.emp_number = ?;

When using a JDBC prepared statement, we need to set the parameter before executing the query:

pStatement.setString(1, "123");
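
For completeness, here's a minimal JDBC sketch of the whole flow; dataSource is an assumed javax.sql.DataSource:

try (Connection connection = dataSource.getConnection();
     PreparedStatement pStatement = connection.prepareStatement(
       "SELECT * FROM employees e WHERE e.emp_number = ?")) {

    // bind the parameter value before executing the query
    pStatement.setString(1, "123");

    try (ResultSet resultSet = pStatement.executeQuery()) {
        while (resultSet.next()) {
            // map each matching employee row here
        }
    }
}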

3. Why Should We Use Query Parameters?

Instead of using query parameters, we could have chosen to use literals. However, that's not the recommended approach, as we'll see now.

Let’s rewrite the previous query to get employees by emp_number using the JPA API, but instead of using a parameter we’ll use a literal so we can clearly illustrate the situation:

String empNumber = "A123";
TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e WHERE e.empNumber = '" + empNumber + "'", Employee.class);
Employee employee = query.getSingleResult();

This approach has some drawbacks:

  • Embedding parameters introduces a security risk, making us vulnerable to JPQL injection attacks. Instead of the expected value, an attacker may inject any unexpected and possibly dangerous JPQL expression
  • Depending on the JPA implementation we use and the heuristics of our application, the query cache may get exhausted. A new query may get built, compiled, and cached each time we use it with a new value/parameter. At a minimum, it won't be efficient, and it may also lead to an unexpected OutOfMemoryError

4. JPA Query Parameters

Similar to JDBC prepared statement parameters, JPA specifies two different ways to write parameterized queries by using:

  • Positional parameters
  • Named parameters

We may use either positional or named parameters but we must not mix them within the same query.

4.1 Positional Parameters

Using positional parameters is one way to avoid the issues listed earlier.

Let’s see how we would write such a query with the help of positional parameters:

TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e WHERE e.empNumber = ?1", Employee.class);
String empNumber = "A123";
Employee employee = query.setParameter(1, empNumber).getSingleResult();

As we’ve seen within the previous example, we declare these parameters within the query by typing a question mark followed by a positive integer number. We’ll start with 1 and move forward, incrementing it by one each time.

We may use the same parameter more than once within the same query, which makes these parameters more similar to named parameters.

Parameter numbering is a very useful feature since it improves usability, readability, and maintenance.

However, it’s worth mentioning that, as per the JPA specification, we cannot safely use this feature with native queries since the spec does not mandate it.  While some implementations may support it, it may impact the portability of our application.

4.2 Collection-Valued Positional Parameters

As previously stated, we may also use collection-valued parameters:

TypedQuery<Employee> query = entityManager.createQuery(
  "SELECT e FROM Employee e WHERE e.empNumber IN (?1)" , Employee.class);
List<String> empNumbers = Arrays.asList("A123", "A124");
List<Employee> employees = query.setParameter(1, empNumbers).getResultList();

4.3. Named Parameters

Named parameters are quite similar to positional parameters; however, by using them, we make the parameters more explicit and the query becomes more readable:

TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e WHERE e.empNumber = :number" , Employee.class);
String empNumber = "A123";
Employee employee = query.setParameter("number", empNumber).getSingleResult();

The previous sample query is the same as the first one but we’ve used :number, a named parameter, instead of ?1.

We can see we declared the parameter with a colon followed by a string identifier (JPQL identifier) which is a placeholder for the actual value that will be set at runtime. Before executing the query, the parameter or parameters have to be set by issuing the setParameter method.

One interesting thing to note is that TypedQuery supports method chaining, which becomes very useful when multiple parameters have to be set.

Let’s go ahead and create a variation of the previous query using two named parameters to illustrate the method chaining:

TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e WHERE e.name = :name AND e.age = :empAge" , Employee.class);
String empName = "John Doe";
int empAge = 55;
List<Employee> employees = query
  .setParameter("name", empName)
  .setParameter("empAge", empAge)
  .getResultList();

Here, we’re retrieving all employees with the given name and age. As we clearly see and one may expect, we can build queries with multiple parameters and as many occurrences of them as required.

If for some reason we need to use the same parameter many times within the same query, we only need to set it once by calling the setParameter method. At runtime, the specified value will replace each occurrence of the parameter.
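
As a quick sketch, here the :name parameter appears twice in the query but is bound only once (firstName and lastName are assumed attributes of the Employee entity):

TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e WHERE e.firstName = :name OR e.lastName = :name", Employee.class);
List<Employee> employees = query.setParameter("name", "John").getResultList();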

Lastly, it's worth mentioning that the Java Persistence API specification does not mandate that named parameters be supported by native queries. Even though some implementations, like Hibernate, do support them, we need to take into account that using them makes the query less portable.

4.4 Collection-Valued Named Parameters

For clarity, let’s also demonstrate how this works with collection-valued parameters:

TypedQuery<Employee> query = entityManager.createQuery(
  "SELECT e FROM Employee e WHERE e.empNumber IN (:numbers)", Employee.class);
List<String> empNumbers = Arrays.asList("A123", "A124");
List<Employee> employees = query.setParameter("numbers", empNumbers).getResultList();

As we can see, it works in a similar way to positional parameters.

5. Criteria Query Parameters

A JPA query may be built by using the JPA Criteria API, which Hibernate’s official documentation explains in great detail.

In this type of query, we represent parameters by using objects instead of names or indices.

Let’s build the same query again but this time using the Criteria API to demonstrate how to handle query parameters when dealing with CriteriaQuery:

CriteriaBuilder cb = em.getCriteriaBuilder();

CriteriaQuery<Employee> cQuery = cb.createQuery(Employee.class);
Root<Employee> c = cQuery.from(Employee.class);
ParameterExpression<String> paramEmpNumber = cb.parameter(String.class);
cQuery.select(c).where(cb.equal(c.get(Employee_.empNumber), paramEmpNumber));

TypedQuery<Employee> query = em.createQuery(cQuery);
String empNumber = "A123";
query.setParameter(paramEmpNumber, empNumber);
Employee employee = query.getSingleResult();

For this type of query, the parameter mechanics are a little bit different since we use a parameter object, but in essence, there's no difference.

Within the previous example, we can see the usage of the Employee_ class. We generated this class with the Hibernate metamodel generator. These components are part of the static JPA metamodel which allows criteria queries to be built in a strongly-typed manner.

6. Conclusion

In this article, we’ve focused on the mechanics of building queries by using JPA query parameters or input parameters.

We've learned that we have two types of query parameters, positional and named, and it's up to us to choose whichever best fits our objectives.

It's also worth noting that all query parameters must be single-valued, except for in expressions. For in expressions, we may use collection-valued input parameters, such as arrays or Lists, as shown in the previous examples.

The source code of this tutorial, as usual, is available on GitHub.

@Timed Annotation Using Metrics and AspectJ


1. Introduction

Monitoring is very helpful for finding bugs and optimizing performance. We could manually instrument our code to add timers and logging, but this would lead to a lot of distracting boilerplate.

On the other hand, we can use a monitoring framework, driven by annotations, such as Dropwizard Metrics.

In this tutorial, we will instrument a simple class using Metrics AspectJ, and the Dropwizard Metrics @Timed annotation.

2. Maven Setup

First of all, let’s add the Metrics AspectJ Maven dependencies to our project:

<dependency>
    <groupId>io.astefanutti.metrics.aspectj</groupId>
    <artifactId>metrics-aspectj</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>io.astefanutti.metrics.aspectj</groupId>
    <artifactId>metrics-aspectj-deps</artifactId>
    <version>1.2.0</version>
</dependency>

We're using metrics-aspectj to provide metrics via aspect-oriented programming, and metrics-aspectj-deps to provide its dependencies.

We also need the aspectj-maven-plugin to set up compile time processing of the metrics annotations:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>aspectj-maven-plugin</artifactId>
    <version>1.8</version>
    <configuration>
        <complianceLevel>1.8</complianceLevel>
        <source>1.8</source>
        <target>1.8</target>
        <aspectLibraries>
            <aspectLibrary>
                <groupId>io.astefanutti.metrics.aspectj</groupId>
                <artifactId>metrics-aspectj</artifactId>
            </aspectLibrary>
        </aspectLibraries>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Now our project is ready to have some Java code instrumented.

3. Annotation Instrumentation

Firstly, let’s create a method and annotate it with the @Timed annotation. We’ll also fill the name property with a name for our timer:

import com.codahale.metrics.annotation.Timed;
import io.astefanutti.metrics.aspectj.Metrics;

@Metrics(registry = "objectRunnerRegistryName")
public class ObjectRunner {

    @Timed(name = "timerName")
    public void run() throws InterruptedException {
        Thread.sleep(1000L);
    }
}

We’re using the @Metrics annotation at the class level to let the Metrics AspectJ framework know this class has methods to be monitored. We’re putting @Timed on the method to create the timer.

In addition, @Metrics creates a registry using the registry name provided – objectRunnerRegistryName in this case – to store the metrics.

Our example code just sleeps for one second to emulate an operation.

Now, let’s define a class to start the application and configure our MetricsRegistry:

public class ApplicationMain {
    static final MetricRegistry registry = new MetricRegistry();

    public static void main(String args[]) throws InterruptedException {
        startReport();

        ObjectRunner runner = new ObjectRunner();

        for (int i = 0; i < 5; i++) {
            runner.run();
        }

        Thread.sleep(3000L);
    }

    static void startReport() {
        SharedMetricRegistries.add("objectRunnerRegistryName", registry);

        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .outputTo(new PrintStream(System.out))
                .build();
        reporter.start(3, TimeUnit.SECONDS);
    }
}

In the startReport method of ApplicationMain, we register our MetricRegistry instance with SharedMetricRegistries, using the same registry name as in @Metrics.

After that, we create a simple ConsoleReporter to report our metrics from the @Timed annotated method. We should note that there are other types of reporters available.
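
For instance, if we'd rather log the metrics than print them to the console, a Slf4jReporter can be configured in much the same way. Here's a sketch, assuming an SLF4J logger named "metrics":

Slf4jReporter slf4jReporter = Slf4jReporter.forRegistry(registry)
  .outputTo(LoggerFactory.getLogger("metrics"))
  .convertRatesTo(TimeUnit.SECONDS)
  .convertDurationsTo(TimeUnit.MILLISECONDS)
  .build();
slf4jReporter.start(3, TimeUnit.SECONDS);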

Our application will call the timed method five times. Let’s compile it with Maven and then execute it:

-- Timers ----------------------------------------------------------------------
ObjectRunner.timerName
             count = 5
         mean rate = 0.86 calls/second
     1-minute rate = 0.80 calls/second
     5-minute rate = 0.80 calls/second
    15-minute rate = 0.80 calls/second
               min = 1000.49 milliseconds
               max = 1003.00 milliseconds
              mean = 1001.03 milliseconds
            stddev = 1.10 milliseconds
            median = 1000.54 milliseconds
              75% <= 1001.81 milliseconds
              95% <= 1003.00 milliseconds
              98% <= 1003.00 milliseconds
              99% <= 1003.00 milliseconds
            99.9% <= 1003.00 milliseconds

As we can see, the Metrics framework provides us with detailed statistics for very little code change to a method we want to instrument.

We should note that running the application without the Maven build – for example, through an IDE – might not get the above output. We need to ensure the AspectJ compilation plugin is included in the build for this to work.

4. Conclusion

In this tutorial, we investigated how to instrument a simple Java application with Metrics AspectJ.

We found Metrics AspectJ annotations a good way to instrument code without needing a large application framework like Spring, JEE, or Dropwizard. Instead, by using aspects, we were able to add interceptors at compile-time.

As always, the complete source code for the example is available over on GitHub.

Run a Java main Method Using Gradle


1. Introduction

In this tutorial, we’ll explore the different methods of executing a Java main method using Gradle.

2. Java main Method

There are several ways in which we can run a Java main method with Gradle. Let us look at them closely using a simple program that prints a message to the standard output:

public class MainClass {
    public static void main(String[] args) {
        System.out.println("Goodbye cruel world ...");
    }
}

3. Running with the Application Plugin

The Application plugin is a core Gradle plugin that defines a collection of ready-to-use tasks that help us package and distribute our application.

Let’s start by inserting the following in our build.gradle file:

plugins {
    id "application"
}
apply plugin: "java"
ext {
   javaMainClass = "com.baeldung.gradle.exec.MainClass"
}

application {
    mainClassName = javaMainClass
}

The plugin automatically generates a task called run that only requires us to point it to the main class. The closure at line 9 does exactly that, which allows us to trigger the task:

~/work/baeldung/tutorials/gradle-java-exec> ./gradlew run

> Task :run
Goodbye cruel world ...

BUILD SUCCESSFUL in 531ms
2 actionable tasks: 1 executed, 1 up-to-date

4. Running with the JavaExec Task

Next, let’s implement a custom task for running the main method with the help of the JavaExec task type:

task runWithJavaExec(type: JavaExec) {
    group = "Execution"
    description = "Run the main class with JavaExecTask"
    classpath = sourceSets.main.runtimeClasspath
    main = javaMainClass
}

We need to define the main class on line 5 and, additionally, specify the classpath. The classpath is computed from the default properties of the build output and contains the correct path where the compiled class is actually placed.

Notice that in each scenario, we use the fully qualified name, including package, of the main class.

Let’s run our example using JavaExec:

~/work/baeldung/tutorials/gradle-java-exec> ./gradlew runWithJavaExec

> Task :runWithJavaExec
Goodbye cruel world ...

BUILD SUCCESSFUL in 526ms
2 actionable tasks: 1 executed, 1 up-to-date

5. Running with the Exec Task

Finally, we can execute our main class using the base Exec task type. Since this option offers us the possibility to configure the execution in multiple ways, let’s implement three custom tasks and discuss them individually.

5.1. Running from the Compiled Build Output

First, we create a custom Exec task that behaves similarly to JavaExec:

task runWithExec(type: Exec) {
    dependsOn build
    group = "Execution"
    description = "Run the main class with ExecTask"
    commandLine "java", "-classpath", sourceSets.main.runtimeClasspath.getAsPath(), javaMainClass
}

We can run any executable (in this case java) and pass the necessary arguments for it to run.

We configure the classpath and point to our main class on line 5, and we also add a dependency to the build task on line 2. This is necessary, as we can only run our main class after it is compiled:

~/work/baeldung/tutorials/gradle-java-exec> ./gradlew runWithExec

> Task :runWithExec
Goodbye cruel world ...

BUILD SUCCESSFUL in 666ms
6 actionable tasks: 6 executed

5.2. Running from an Output Jar

The second approach relies on the jar packaging of our small application:

task runWithExecJarOnClassPath(type: Exec) {
    dependsOn jar
    group = "Execution"
    description = "Run the mainClass from the output jar in classpath with ExecTask"
    commandLine "java", "-classpath", jar.archiveFile.get(), javaMainClass
}

Notice the dependency to the jar task on line 2 and the second argument to the java executable on line 5. We use a normal jar, so we need to specify the entry point with the fourth parameter:

~/work/baeldung/tutorials/gradle-java-exec> ./gradlew runWithExecJarOnClassPath

> Task :runWithExecJarOnClassPath
Goodbye cruel world ...

BUILD SUCCESSFUL in 555ms
3 actionable tasks: 3 executed

5.3. Running from an Executable Output Jar

The third way also relies on the jar packaging, but we define the entry point with the help of a manifest property:

jar {
    manifest {
        attributes(
            "Main-Class": javaMainClass
        )
    }
}

task runWithExecJarExecutable(type: Exec) {
    dependsOn jar
    group = "Execution"
    description = "Run the output executable jar with ExecTask"
    commandLine "java", "-jar", jar.archiveFile.get()
}

Here, we no longer need to specify the classpath, and we can simply run the jar:

~/work/baeldung/tutorials/gradle-java-exec> ./gradlew runWithExecJarExecutable

> Task :runWithExecJarExecutable
Goodbye cruel world ...

BUILD SUCCESSFUL in 572ms
3 actionable tasks: 3 executed

6. Conclusion

In this article, we explored the various ways of running a Java main method using Gradle.

Out of the box, the Application plugin provides a minimally configurable task to run our method. The JavaExec task type allows us to run the main method without specifying any plugins.

Finally, the generic Exec task type can be used in various combinations with the java executable to achieve the same results but requires a dependency on other tasks.

As usual, the source code for this tutorial is available over on GitHub.

Isomorphic Application with React and Nashorn


1. Overview

In this tutorial, we'll understand what exactly an isomorphic app is. We'll also discuss Nashorn, the JavaScript engine bundled with Java.

Furthermore, we’ll explore how we can use Nashorn along with a front-end library like React to create an isomorphic app.

2. A Little Bit of History

Traditionally, client and server applications were written in a manner that was quite heavy on the server-side. Think of PHP as a scripting engine generating mostly static HTML and web browsers rendering it.

Netscape added support for JavaScript to its browser way back in the mid-nineties. That started to shift some of the processing from the server-side to the client-side browser. For a long time, developers struggled with different issues concerning JavaScript support in web browsers.

With the growing demand for a faster and more interactive user experience, the boundary was pushed ever harder. One of the earliest libraries that changed the game was jQuery. It brought several user-friendly functions and much-enhanced support for AJAX.

Soon, many frameworks for front-end development started to appear, which greatly improved the developer experience. Starting with AngularJS from Google, then React from Facebook, and later Vue, they began to capture developers' attention.

With modern browser support, remarkable frameworks, and the required tooling, the tide has largely shifted towards the client-side.

An immersive experience on increasingly faster hand-held devices requires more client-side processing.

3. What’s an Isomorphic App?

So, we saw how front-end frameworks are helping us develop a web application where the user interface is completely rendered at the client-side.

However, it’s also possible to use the same framework at the server-side and generate the same user interface.

Now, we do not have to stick to client-side only or server-side only solutions necessarily. A better way is to have a solution where the client and server can both process the same front-end code and generate the same user interface.

There are benefits to this approach, which we'll discuss later.

Such web applications are called Isomorphic or Universal. Now the client-side language is almost exclusively JavaScript. Hence, for an isomorphic app to work, we have to use JavaScript on the server-side as well.

Node.js is by far the most common choice to build a server-side rendered application.

4. What is Nashorn?

So, where does Nashorn fit in, and why should we use it? Nashorn is a JavaScript engine packaged by default with Java. Hence, if we already have a web application back-end in Java and want to build an isomorphic app, Nashorn is pretty handy!

Nashorn was released as part of Java 8. It is primarily focused on allowing embedded JavaScript applications in Java.

Nashorn compiles JavaScript in-memory to Java Bytecode and passes it to the JVM for execution. This offers better performance compared to the earlier engine, Rhino.
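
As a quick, self-contained sketch, this is how we can evaluate a JavaScript snippet from Java through the standard javax.script API:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class NashornHello {
    public static void main(String[] args) throws ScriptException {
        // obtain the Nashorn engine bundled with the JDK (Java 8 up to its removal in Java 15)
        ScriptEngine nashorn = new ScriptEngineManager().getEngineByName("nashorn");

        // evaluate a small JavaScript expression and print the result
        Object greeting = nashorn.eval(
          "var greet = function(name) { return 'Hello, ' + name; }; greet('Nashorn');");
        System.out.println(greeting);
    }
}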

5. Creating an Isomorphic App

We have gone through enough context now. Our application here will display a Fibonacci sequence and provide a button to generate and display the next number in the sequence. Let’s create a simple isomorphic app now with a back-end and front-end:

  • Front-end: A simple React.js based front-end
  • Back-end: A simple Spring Boot back-end with Nashorn to process JavaScript

6. Application Front-End

We’ll be using React.js for creating our front end. React is a popular JavaScript library for building single-page apps. It helps us decompose a complex user interface into hierarchical components with optional state and one-way data binding.

React parses this hierarchy and creates an in-memory data structure called virtual DOM. This helps React to find changes between different states and make minimal changes to the browser DOM.

6.1. React Component

Let’s create our first React component:

var App = React.createClass({displayName: "App",
    handleSubmit: function() {
    	var last = this.state.data[this.state.data.length-1];
    	var secondLast = this.state.data[this.state.data.length-2];
        $.ajax({
            url: '/next/'+last+'/'+secondLast,
            dataType: 'text',
            success: function(msg) {
                var series = this.state.data;
                series.push(msg);
                this.setState({data: series});
            }.bind(this),
            error: function(xhr, status, err) {
                console.error('/next', status, err.toString());
            }.bind(this)
        });
    },
    componentDidMount: function() {
    	this.setState({data: this.props.data});
    },	
    getInitialState: function() {
        return {data: []};
    },	
    render: function() {
        return (
            React.createElement("div", {className: "app"},
            	React.createElement("h2", null, "Fibonacci Generator"),
            	React.createElement("h2", null, this.state.data.toString()),
                React.createElement("input", {type: "submit", value: "Next", onClick: this.handleSubmit})
            )     
        );
    }
});

Now, let's understand what the above code is doing:

  • To begin with, we have defined a React component called “App”
  • The most important function inside this component is “render”, which is responsible for generating the user interface
  • We have provided a style className that the component can use
  • We’re making use of the component state here to store and display the series
  • While the state initializes as an empty list, it fetches data passed to the component as a prop when the component mounts
  • Finally, on clicking the “Next” button, a jQuery call to the REST service is made
  • The call fetches the next number in the sequence and appends it to the component’s state
  • Change in the component’s state automatically re-renders the component

6.2. Using the React Component

React looks for a “div” element with a known id in the HTML page to anchor its contents; in our case, this is the element with the id “root”. All we have to do is provide an HTML page with this “div” element and load the JS files:

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>Hello React</title>
    <script type="text/javascript" src="js/react.js"></script>
    <script type="text/javascript" src="js/react-dom.js"></script>
    <script type="text/javascript" src="http://code.jquery.com/jquery-1.10.0.min.js"></script>
</head>
<body>
<div id="root"></div>
<script type="text/javascript" src="app.js"></script>
<script type="text/javascript">
    ReactDOM.render(
        React.createElement(App, {data: [0,1,1]}),
        document.getElementById("root")
    );
</script>
</body>
</html>

So, let’s see what we’ve done here:

  • We imported the required JS libraries, react, react-dom and jQuery
  • After that, we defined a “div” element called “root”
  • We also imported the JS file with our React component
  • Next, we called the React component “App” with some seed data, the first three Fibonacci numbers

7. Application Back-End

Now, let’s see how we can create a fitting back-end for our application. We’ve already decided to use Spring Boot along with Spring Web for building this application. More importantly, we’ve decided to use Nashorn to process the JavaScript-based front-end we developed in the last section.

7.1. Maven Dependencies

For our simple application, we’ll be using JSP together with Spring MVC, so we’ll add a couple of dependencies to our POM:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.tomcat.embed</groupId>
    <artifactId>tomcat-embed-jasper</artifactId>
    <scope>provided</scope>
</dependency>

The first one is the standard Spring Boot dependency for a web application. The second one is needed to compile JSPs.

7.2. Web Controller

Let’s now create our web controller, which will process our JavaScript file and return an HTML using JSP:

@Controller
public class MyWebController {
    @RequestMapping("/")
    public String index(Map<String, Object> model) throws Exception {
        ScriptEngine nashorn = new ScriptEngineManager().getEngineByName("nashorn");
        nashorn.eval(new FileReader("static/js/react.js"));
        nashorn.eval(new FileReader("static/js/react-dom-server.js"));
        nashorn.eval(new FileReader("static/app.js"));
        Object html = nashorn.eval(
          "ReactDOMServer.renderToString(" + 
            "React.createElement(App, {data: [0,1,1]})" + 
          ");");
        model.put("content", String.valueOf(html));
        return "index";
    }
}

So, what exactly is happening here:

  • We fetch an instance of ScriptEngine of type Nashorn from ScriptEngineManager
  • Then, we load the relevant React libraries, react.js and react-dom-server.js
  • We also load our JS file that has our React component “App”
  • Finally, we evaluate a JS fragment that creates a React element from the component “App” with some seed data
  • This provides us with React’s output, an HTML fragment, as an Object
  • We pass this HTML fragment as data to the relevant view – the JSP

7.3. JSP

Now, how do we process this HTML fragment in our JSP?

Recall that React automatically adds its output to a named “div” element – “root” in our case. However, we’ll add our server-side generated HTML fragment to the same element manually in our JSP.

Let’s see how the JSP looks now:

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>Hello React!</title>
    <script type="text/javascript" src="js/react.js"></script>
    <script type="text/javascript" src="js/react-dom.js"></script>
    <script type="text/javascript" src="http://code.jquery.com/jquery-1.10.0.min.js"></script>
</head>
<body>
<div id="root">${content}</div>
<script type="text/javascript" src="app.js"></script>
<script type="text/javascript">
	ReactDOM.render(
        React.createElement(App, {data: [0,1,1]}),
        document.getElementById("root")
    );
</script>
</body>
</html>

This is the same page we created earlier, except for the fact that we’ve added our HTML fragment into the “root” div, which was empty earlier.

7.4. REST Controller

Finally, we also need a server-side REST endpoint that gives us the next Fibonacci number in the sequence:

@RestController
public class MyRestController {
    @RequestMapping("/next/{last}/{secondLast}")
    public int index(
      @PathVariable("last") int last, 
      @PathVariable("secondLast") int secondLast) throws Exception {
        return last + secondLast;
    }
}

Nothing fancy here, just a simple Spring REST controller.

8. Running the Application

Now, that we have completed our front-end as well as our back-end, it’s time to run the application.

We should start the Spring Boot application normally, making use of the bootstrapping class:

@SpringBootApplication
public class Application extends SpringBootServletInitializer {
    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
        return application.sources(Application.class);
    }
    public static void main(String[] args) throws Exception {
        SpringApplication.run(Application.class, args);
    }
}

When we run this class, Spring Boot compiles our JSPs and makes them available on embedded Tomcat along with the rest of the web application.

Now, if we visit our site, we'll see the Fibonacci Generator page displaying the seed series along with the Next button.

Let’s understand the sequence of events:

  • The browser requests this page
  • When the request for this page arrives, Spring web controller process the JS files
  • Nashorn engine generates an HTML fragment and passes this to the JSP
  • JSP adds this HTML fragment to the “root” div element, finally returning the above HTML page
  • The browser renders the HTML and, meanwhile, starts downloading the JS files
  • Finally, the page is ready for client-side actions — we can add more numbers in the series

The important thing to understand here is what happens if React finds an HTML fragment in the target “div” element. In such cases, React compares this fragment with what it would render itself and does not replace it if it finds a matching fragment. This is exactly what powers server-side rendering and isomorphic apps.

9. What More is Possible?

In our simple example, we have just scratched the surface of what’s possible. Front-end applications with modern JS-based frameworks are getting increasingly more powerful and complex. With this added complexity, there are many things that we need to take care of:

  • We’ve created just one React component in our application when in reality, this can be several components forming a hierarchy which pass data through props
  • We would like to create separate JS files for every component to keep them manageable and manage their dependencies through “exports/require” or “export/import”
  • Moreover, it may not be possible to manage state within components only; we may want to use a state management library like Redux
  • Furthermore, we may have to interact with external services as side-effects of actions; this may require us to use a pattern like redux-thunk or Redux-Saga
  • Most importantly, we would want to leverage JSX, a syntax extension to JS for describing the user interface

While Nashorn is compatible with pure JS, it may not support all of the features mentioned above. Many of these require transpiling and polyfills because of JS compatibility issues.

The usual practice in such cases is to leverage a module bundler like Webpack or Rollup. What they mainly do is process all of the React source files and bundle them, along with all their dependencies, into a single JS file. This invariably requires a modern JavaScript compiler like Babel to compile the JavaScript so that it’s backward compatible.

The final bundle only has good old JS, which browsers can understand and Nashorn can evaluate as well.

10. Benefits of an Isomorphic App

So, we’ve talked a great deal about isomorphic apps and have even created a simple application now. But why exactly should we even care about this? Let’s understand some of the key benefits of using an isomorphic app.

10.1. First Page Rendering

One of the most significant benefits of an isomorphic app is a much faster rendering of the first page. In a typical client-side rendered application, the browser begins by downloading all the JS and CSS artifacts.

Only then does it load them and start rendering the first page. If we instead send the first page already rendered on the server side, this can be much faster, providing an enhanced user experience.

10.2. SEO Friendly

Another benefit often cited for server-side rendering is related to SEO. It’s believed that search bots are not able to process JavaScript and hence do not see an index page rendered on the client side through libraries like React. A server-side rendered page, therefore, is SEO friendlier. It’s worth noting, though, that modern search engine bots claim to process JavaScript.

11. Conclusion

In this tutorial, we went through the basic concepts of isomorphic applications and the Nashorn JavaScript engine. We further explored how to build an isomorphic app with Spring Boot, React, and Nashorn.

Then, we discussed the other possibilities to extend the front-end application and the benefits of using an isomorphic app.

As always, the code can be found over on GitHub.


Counting Words in a String

1. Overview

In this tutorial, we are going to go over different ways of counting words in a given string using Java.

2. Using StringTokenizer

A simple way to count words in a string in Java is to use the StringTokenizer class:

assertEquals(3, new StringTokenizer("three blind mice").countTokens());
assertEquals(4, new StringTokenizer("see\thow\tthey\trun").countTokens());

Note that StringTokenizer automatically takes care of whitespace for us, like tabs and carriage returns.

But it might goof up in some places, like hyphens:

assertEquals(6, new StringTokenizer("the farmer's wife--she was from Albuquerque").countTokens());

In this case, we’d want “wife” and “she” to be different words, but since there’s no whitespace between them, the defaults fail us.

Fortunately, StringTokenizer ships with another constructor. We can pass a delimiter into the constructor to make the above work:

assertEquals(7, new StringTokenizer("the farmer's wife--she was from Albuquerque", " -").countTokens());

This comes in handy when trying to count the words in a string from something like a CSV file:

assertEquals(10, new StringTokenizer("did,you,ever,see,such,a,sight,in,your,life", ",").countTokens());

So, StringTokenizer is simple, and it gets us most of the way there.

Let’s see though what extra horsepower regular expressions can give us.

3. Regular Expressions

In order for us to come up with a meaningful regular expression for this task, we need to define what we consider a word: a word starts with a letter and ends either with a space character or a punctuation mark.

With this in mind, given a string, what we want to do is to split that string at every point we encounter spaces and punctuation marks, then count the resulting words.

assertEquals(7, countWordsUsingRegex("the farmer's wife--she was from Albuquerque"));

Let’s crank things up a bit to see the power of regex:

assertEquals(9, countWordsUsingRegex("no&one#should%ever-write-like,this;but:well"));

It is not practical to solve this one through just passing a delimiter to StringTokenizer since we’d have to define a really long delimiter to try and list out all possible punctuation marks.

It turns out we really don’t have to do much: passing the regex [\pP\s&&[^']]+ to the split method of the String class will do the trick:

public static int countWordsUsingRegex(String arg) {
    if (arg == null) {
        return 0;
    }
    final String[] words = arg.split("[\\pP\\s&&[^']]+");
    return words.length;
}

The regex [\pP\s&&[^']]+ finds any length of either punctuation marks or spaces and ignores the apostrophe punctuation mark.
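
If we call this method often, we might prefer to compile the pattern once and reuse it. This is only a minor variation of the same idea; the class and method names below are ours:

import java.util.regex.Pattern;

public class WordCounter {

    // the same word-separator regex as above, compiled once and reused
    private static final Pattern SEPARATORS = Pattern.compile("[\\pP\\s&&[^']]+");

    public static int countWordsUsingCompiledRegex(String arg) {
        if (arg == null) {
            return 0;
        }
        return SEPARATORS.split(arg).length;
    }
}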

To find out more about regular expressions, refer to Regular Expressions on Baeldung.

4. Loops and the String API

The other method is to have a flag that keeps track of the words that have been encountered.

We set the flag to WORD when encountering a new word and increment the word count, then back to SEPARATOR when we encounter a non-word (punctuation or space characters).

This approach gives us the same results we got with regular expressions:

assertEquals(9, countWordsManually("no&one#should%ever-write-like,this but   well"));

We do have to be careful with special cases where punctuation marks are not really word separators, for example:

assertEquals(6, countWordsManually("the farmer's wife--she was from Albuquerque"));

What we want here is to count “farmer’s” as one word, although the apostrophe (’) is a punctuation mark.

In the regex version, we had the flexibility to define what doesn’t qualify as a character using the regex. But now that we are writing our own implementation, we have to define this exclusion in a separate method:

private static boolean isAllowedInWord(char charAt) {
    return charAt == '\'' || Character.isLetter(charAt);
}

So what we have done here is to allow in a word all letters as well as legal punctuation marks – the apostrophe, in this case.

We can now use this method in our implementation:

public static int countWordsManually(String arg) {
    if (arg == null) {
        return 0;
    }
    int flag = SEPARATOR; // SEPARATOR and WORD are int constants defined in the same class
    int count = 0;
    int stringLength = arg.length();
    int characterCounter = 0;

    while (characterCounter < stringLength) {
        if (isAllowedInWord(arg.charAt(characterCounter)) && flag == SEPARATOR) {
            flag = WORD;
            count++;
        } else if (!isAllowedInWord(arg.charAt(characterCounter))) {
            flag = SEPARATOR;
        }
        characterCounter++;
    }
    return count;
}

The first condition marks a word when it encounters one and increments the counter. The second condition checks whether the character is not allowed in a word and sets the flag back to SEPARATOR.

5. Conclusion

In this tutorial, we have looked at ways to count words using several approaches. We can pick any depending on our particular use-case. As usual, the source code for this tutorial can be found in our GitHub.

Logging HTTP Requests with Spring Boot Actuator HTTP Tracing

1. Introduction

When we work with microservices or web services in general, it’s quite useful to know how our users interact with them. This can be achieved by tracing all the requests that hit our services and collecting this information to analyze it later.

There are some systems out there that can help us with this and that can be easily integrated with Spring, like Zipkin. However, Spring Boot Actuator has this functionality built in and can be used through its httpTrace endpoint, which traces all HTTP requests. In this tutorial, we’ll show how to use it and how to customize it to better fit our requirements.

2. HttpTrace Endpoint Setup

For the sake of this tutorial, we’ll use a Maven Spring Boot project.

The first thing we need to do is to add the Spring Boot Actuator dependency to our project:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

After that, we’ll have to enable the httpTrace endpoint in our application.

To do so, we just need to modify our application.properties file to include the httpTrace endpoint:

management.endpoints.web.exposure.include=httptrace

In case we need more endpoints, we can just concatenate them separated by commas or we can include all of them by using the wildcard *.
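
For example, to expose the standard health and info endpoints alongside httptrace, we could write:

management.endpoints.web.exposure.include=httptrace,health,info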

Now, our httpTrace endpoint should appear in the actuator endpoints list of our application:

{
  "_links": {
    "self": {
      "href": "http://localhost:8080/actuator",
      "templated": false
    },
    "httptrace": {
      "href": "http://localhost:8080/actuator/httptrace",
      "templated": false
    }
  }
}

Notice that we can list all the enabled actuator endpoints by going to the /actuator endpoint of our web service.

3. Analyzing the Traces

Let’s now analyze the traces that the httpTrace actuator endpoint returns.

Let’s make some requests to our service, call the /actuator/httptrace endpoint and take one of the traces returned:

{
  "traces": [
    {
      "timestamp": "2019-08-05T19:28:36.353Z",
      "principal": null,
      "session": null,
      "request": {
        "method": "GET",
        "uri": "http://localhost:8080/echo?msg=test",
        "headers": {
          "accept-language": [
            "en-GB,en-US;q=0.9,en;q=0.8"
          ],
          "upgrade-insecure-requests": [
            "1"
          ],
          "host": [
            "localhost:8080"
          ],
          "connection": [
            "keep-alive"
          ],
          "accept-encoding": [
            "gzip, deflate, br"
          ],
          "accept": [
            "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
          ],
          "user-agent": [
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36 OPR/62.0.3331.66"
          ]
        },
        "remoteAddress": null
      },
      "response": {
        "status": 200,
        "headers": {
          "Content-Length": [
            "12"
          ],
          "Date": [
            "Mon, 05 Aug 2019 19:28:36 GMT"
          ],
          "Content-Type": [
            "text/html;charset=UTF-8"
          ]
        }
      },
      "timeTaken": 82
    }
  ]
}

As we can see, the response is divided into several nodes:

  • timestamp: the time when the request was received
  • principal: the authenticated user who did the request, if applicable
  • session: any session associated with the request
  • request: information about the request such as the method, full URI or headers
  • response: information about the response such as the status or the headers
  • timeTaken: the time taken to handle the request

We can adapt this response to our needs if we feel it’s too verbose. We can tell Spring what fields we want to be returned by specifying them in the management.trace.http.include property of our application.properties:

management.trace.http.include=RESPONSE_HEADERS

In this case, we specified that we only want the response headers. Hence, fields that were included before, like the request headers or the time taken, are not present in the response now:

{
  "traces": [
    {
      "timestamp": "2019-08-05T20:23:01.397Z",
      "principal": null,
      "session": null,
      "request": {
        "method": "GET",
        "uri": "http://localhost:8080/echo?msg=test",
        "headers": {},
        "remoteAddress": null
      },
      "response": {
        "status": 200,
        "headers": {
          "Content-Length": [
            "12"
          ],
          "Date": [
            "Mon, 05 Aug 2019 20:23:01 GMT"
          ],
          "Content-Type": [
            "text/html;charset=UTF-8"
          ]
        }
      },
      "timeTaken": null
    }
  ]
}

All the possible values that can be included can be found in the source code, as well as the default ones.
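
For instance, assuming the standard Include values, we could keep the request headers, the response headers, and the time taken:

management.trace.http.include=REQUEST_HEADERS,RESPONSE_HEADERS,TIME_TAKEN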

4. Customizing the HttpTraceRepository

By default, the httpTrace endpoint only returns the last 100 requests and it stores them in memory. The good news is that we can also customize this by creating our own HttpTraceRepository.

Let’s now create our repository. The HttpTraceRepository interface is very simple and we only need to implement two methods: findAll() to retrieve all the traces; and add() to add a trace to the repository.

For simplicity, our repository will also store the traces in memory and we’ll store only the last GET request that hits our service:

@Repository
public class CustomTraceRepository implements HttpTraceRepository {

    AtomicReference<HttpTrace> lastTrace = new AtomicReference<>();

    @Override
    public List<HttpTrace> findAll() {
        return Collections.singletonList(lastTrace.get());
    }

    @Override
    public void add(HttpTrace trace) {
        if ("GET".equals(trace.getRequest().getMethod())) {
            lastTrace.set(trace);
        }
    }

}

Even though this simple example may not look very useful, we can see how powerful this can get and how we could store our logs anywhere.

5. Filtering the Paths to Trace

The last thing we’re going to cover is how to filter the paths that we want to trace, so we can ignore some requests that we’re not interested in.

If we play with the httpTrace endpoint a little after making some requests to our service, we can see that we also get traces for the actuator requests:

{
  "traces": [
    {
      "timestamp": "2019-07-28T13:56:36.998Z",
      "principal": null,
      "session": null,
      "request": {
        "method": "GET",
        "uri": "http://localhost:8080/actuator/",
         // ...
}

We may not find these traces useful for us and we’d prefer to exclude them. In that case, we just need to create our own HttpTraceFilter and specify what paths we want to ignore in the shouldNotFilter method:

@Component
public class TraceRequestFilter extends HttpTraceFilter {

  public TraceRequestFilter(HttpTraceRepository repository, HttpExchangeTracer tracer) {
      super(repository, tracer);
  }

  @Override
  protected boolean shouldNotFilter(HttpServletRequest request) throws ServletException {
      return request.getServletPath().contains("actuator");
  }
}

Notice that the HttpTraceFilter is just a regular Spring filter but with some tracing-specific functionality.

6. Conclusion

In this tutorial, we’ve introduced the httpTrace Spring Boot Actuator endpoint and shown its main features. We’ve also dug a bit deeper and explained how to change some default behaviors to better fit our specific needs.

As always, the full source code for the examples is available over on GitHub.

An Overview of QuickSort Algorithm

1. Introduction

In this article, we’re going to look at the quicksort algorithm and understand how it works.

Quicksort is a divide-and-conquer algorithm. This means each iteration works by dividing the input into two parts and then sorting those, before combining them back together.

It was originally developed by Tony Hoare and published in 1961 and is still one of the more efficient general-purpose sorting algorithms available.

2. Algorithm Requirements

The only real requirement for using the quicksort algorithm is a well-defined operation to compare two elements, such that we can determine if any element is strictly less than another one. The exact nature of this comparison isn’t important, as long as it is consistent. Note that direct equality comparison isn’t required, only a less-than comparison.

For many types, this is an obvious comparison. For example, numbers implicitly define how to do this. Other types are less obvious, but we can still define this based on the requirements of the sort. For example, when sorting strings, we would need to decide whether character case is important or how Unicode characters should be handled.
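
For instance, in Java we might capture such a decision with a Comparator. Here is a small sketch (the class and method names are ours) that defines a case-insensitive less-than check between two strings:

import java.util.Comparator;

public class StringComparison {

    // the only operation quicksort really needs: a consistent "less than"
    private static final Comparator<String> ORDER = String.CASE_INSENSITIVE_ORDER;

    static boolean isLessThan(String left, String right) {
        return ORDER.compare(left, right) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isLessThan("apple", "Banana"));  // true, case is ignored
        System.out.println(isLessThan("Cherry", "banana")); // false
    }
}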

3. Binary Tree Sort

The Binary Tree Sort is an algorithm where we build a binary tree consisting of the elements we’re sorting. Once we have this, we can then build the results from this tree.

The idea is to select a pivot as a node on the tree, and then assign all elements to either the left or right branch of the node based on whether they are less than the pivot element or not. We can then recursively sort these branches until we have a completely sorted tree.

3.1. Worked Example

For example, let’s sort the list of numbers “3 7 8 5 2 1 9 5 4”. Our first pass would be as follows:

Input: 3 7 8 5 2 1 9 5 4
Pivot = 3
Left = 2 1
Right = 7 8 5 9 5 4

This has given us two partitions from the original input. Everything in the Left list is strictly less than the Pivot, and everything else is in the Right list.

Next, we sort these two lists using the same algorithm:

Input: 2 1
Pivot = 2
Left = 1
Right = Empty

Input: 7 8 5 9 5 4
Pivot = 7
Left = 5 5 4
Right = 8 9

When we sorted the left partition from the first pass, we ended up with two lists that are both of length 1 or less. These are then already sorted – because it is impossible to have a list of size one that is unsorted. This means that we can stop here, and instead focus on the remaining parts from the right partition.

At this point, we have the following structure:

      / [1]
    2
  /   \ []
3
  \   / [5 5 4]
    7
      \ [8 9]

At this point, we’re already getting close to a sorted list. We have two more partitions to sort and then we’re finished:

        1
      /
    2       4
  /       /
3       5
  \   /   \
    7       5
      \ 
        8
          \
            9

This has sorted the list in 5 passes of the algorithm, applied to increasingly smaller sub-lists. However, the memory needs are relatively high, having had to allocate an additional 17 elements worth of memory to sort the nine elements in our original list.

4. Quicksort Algorithm

The quicksort algorithm is similar in concept to the Binary Tree Sort. Rather than building sublists at each step that we then need to sort, it does everything in place within the original list.

It works by dynamically swapping elements within the list around a selected pivot, and then recursively sorting the sub-lists to either side of this pivot. This makes it significantly more space-efficient, which can be important for huge lists.

Quicksort depends on two key factors — the selection of the pivot and the mechanism for partitioning the elements.

The key to this algorithm is the partition function, which we will cover soon. This returns an index into the input array such that every element below this index sorts as less than the element at this index, and the element at this index sorts as less than all the elements above it.

Doing this will involve swapping some of the elements in the array around so that they are the appropriate side of this index.

Once we’ve done this partitioning, we then apply the algorithm to the two partitions on either side of this index. This eventually finishes when we have partitions that contain only one element each, at which point the input array is now sorted.

4.1. Lomuto Partitioning

Lomuto Partitioning is attributed to Nico Lomuto. This works by iterating over the input array, swapping elements that are strictly less than a pre-selected pivot element such that they appear earlier in the array, but on a sliding target index.

This sliding target index is then the new partition index that we will return for the next recursions of the greater algorithm to work with.

The goal of this is to ensure that our sliding target index is in a position such that all elements before it in the array are less than this element and that this element is less than all elements after it in the array.

Let’s have a look at this in pseudocode:

fun quicksort(input : T[], low : int, high : int) 
    if (low < high) 
        p := partition(input, low, high) 
        quicksort(input, low, p - 1) 
        quicksort(input, p + 1, high)

fun partition(input: T[], low: int, high: int) : int
    pivot := input[high]
    partitionIndex := low
    loop j from low to (high - 1)
        if (input[j] < pivot) then
            swap(input[partitionIndex], input[j])
            partitionIndex := partitionIndex + 1
    swap(input[partitionIndex], input[high])
    return partitionIndex

As a worked example, we can partition our array from earlier:

Sorting input: 3,7,8,5,2,1,9,5,4 from 0 to 8
Pivot: 4
Partition Index: 0

When j == 0 => input[0] == 3 => Swap 3 for 3 => input := 3,7,8,5,2,1,9,5,4, partitionIndex := 1
When j == 1 => input[1] == 7 => No Change
When j == 2 => input[2] == 8 => No Change
When j == 3 => input[3] == 5 => No Change
When j == 4 => input[4] == 2 => Swap 2 for 7 => input := 3,2,8,5,7,1,9,5,4, partitionIndex := 2
When j == 5 => input[5] == 1 => Swap 1 for 8 => input := 3,2,1,5,7,8,9,5,4, partitionIndex := 3
When j == 6 => input[6] == 9 => No Change
When j == 7 => input[7] == 5 => No Change

After Loop => Swap 4 for 5 => input := 3,2,1,4,7,8,9,5,5, partitionIndex := 3

We can see from working through this that we have performed three swaps and determined a new partition point of index “3”. The array after these swaps is such that elements 0, 1, and 2 are all less than element 3, and element 3 is less than elements 4, 5, 6, 7 and 8.

Having done this, the greater algorithm then recurses, such that we will be sorting the sub-array from 0 to 2, and the sub-array from 4 to 8. For example, repeating this for the sub-array from 0 to 2, we will do:

Sorting input: 3,2,1,4,7,8,9,5,5 from 0 to 2
Pivot: 1
Partition Index: 0

When j == 0 => input[0] == 3 => No Change
When j == 1 => input[1] == 2 => No Change

After Loop => Swap 1 for 3 => input := 1,2,3,4,7,8,9,5,5, partitionIndex := 0

Note that we are still passing the entire input array in for the algorithm to work with, but because we’ve got low and high indices we only actually pay attention to the bit we care about. This is an efficiency that means we’ve had no need to duplicate the entire array or sections of it.

Across the entire algorithm, sorting the entire array, we have performed 12 different swaps to get to the result.
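
To make the pseudocode above concrete, here is a minimal Java sketch of the Lomuto-based quicksort. The class name is ours, and the element type is fixed to int for brevity:

import java.util.Arrays;

public class QuickSortLomuto {

    public static void quicksort(int[] input, int low, int high) {
        if (low < high) {
            int p = partition(input, low, high);
            quicksort(input, low, p - 1);
            quicksort(input, p + 1, high);
        }
    }

    // Lomuto partitioning: the pivot is the last element of the range
    private static int partition(int[] input, int low, int high) {
        int pivot = input[high];
        int partitionIndex = low;
        for (int j = low; j < high; j++) {
            if (input[j] < pivot) {
                swap(input, partitionIndex, j);
                partitionIndex++;
            }
        }
        swap(input, partitionIndex, high);
        return partitionIndex;
    }

    private static void swap(int[] input, int i, int j) {
        int tmp = input[i];
        input[i] = input[j];
        input[j] = tmp;
    }

    public static void main(String[] args) {
        int[] numbers = { 3, 7, 8, 5, 2, 1, 9, 5, 4 };
        quicksort(numbers, 0, numbers.length - 1);
        System.out.println(Arrays.toString(numbers)); // [1, 2, 3, 4, 5, 5, 7, 8, 9]
    }
}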

4.2. Hoare Partitioning

Hoare partitioning was proposed by Tony Hoare when the quicksort algorithm was originally published. Instead of working across the array from low to high, it iterates from both ends at once towards the center. This means that we have more iterations, and more comparisons, but fewer swaps.

This can be important since often comparing memory values is cheaper than swapping them.

In pseudocode:

fun quicksort(input : T[], low : int, high : int) 
    if (low < high) 
        p := partition(input, low, high) 
        quicksort(input, low, p) // Note that this is different than when using Lomuto
        quicksort(input, p + 1, high)

fun partition(input : T[], low: int, high: int) : int
    pivotPoint := low + (high - low) / 2
    pivot := input[pivotPoint]
    loop
        loop while (input[low] < pivot)
            low := low + 1
        loop while (pivot < input[high])
            high := high - 1
        if (low >= high)
            return high
        swap(input[low], input[high])
        low := low + 1
        high := high - 1

As a worked example, we can partition our array from earlier:

Sorting input: 3,7,8,5,2,1,9,5,4 from 0 to 8
Pivot: 2

Loop #1
    Iterate low => input[0] == 3 => Stop, low == 0
    Iterate high => input[8] == 4 => high := 7
    Iterate high => input[7] == 5 => high := 6
    Iterate high => input[6] == 9 => high := 5
    Iterate high => input[5] == 1 => Stop, high == 5
    Swap 1 for 3 => input := 1,7,8,5,2,3,9,5,4
    Low := 1
    High := 4
Loop #2
    Iterate low => input[1] == 7 => Stop, low == 1
    Iterate high => input[4] == 2 => Stop, high == 4
    Swap 2 for 7 => input := 1,2,8,5,7,3,9,5,4
    Low := 2
    High := 3
Loop #3
    Iterate low => input[2] == 8 => Stop, low == 2
    Iterate high => input[3] == 5 => high := 2
    Iterate high => input[2] == 8 => high := 1
    Iterate high => input[1] == 2 => Stop, high == 1
    Return 1

On the face of it, this looks to be a more complicated algorithm that is doing more work. However, it does less expensive work overall. The entire algorithm only needs 8 swaps instead of the 12 needed by the Lomuto partitioning scheme to achieve the same results.
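
Translated to Java in the same spirit, the Hoare variant from the pseudocode might look like this (again, the class name is ours and the element type is int):

public class QuickSortHoare {

    public static void quicksort(int[] input, int low, int high) {
        if (low < high) {
            int p = partition(input, low, high);
            quicksort(input, low, p);      // note: p, not p - 1, unlike Lomuto
            quicksort(input, p + 1, high);
        }
    }

    // Hoare partitioning: the pivot is the middle element, indices move towards each other
    private static int partition(int[] input, int low, int high) {
        int pivot = input[low + (high - low) / 2];
        while (true) {
            while (input[low] < pivot) {
                low++;
            }
            while (pivot < input[high]) {
                high--;
            }
            if (low >= high) {
                return high;
            }
            int tmp = input[low];
            input[low] = input[high];
            input[high] = tmp;
            low++;
            high--;
        }
    }
}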

5. Algorithm Adjustments

There are several adjustments that we can make to the normal algorithm, depending on the exact requirements. These don’t fit every single case, and so we should use them only when appropriate, but they can make a significant difference to the result.

5.1. Pivot Selection

The choice of the element to pivot around can be significant to how efficient the algorithm is. Above, we selected a fixed element. This works well if the list is truly shuffled in a random order, but the more ordered the list is, the less efficient this is.

If we were to sort the list 1, 2, 3, 4, 5, 6, 7, 8, 9 then the Hoare partitioning scheme does it with zero swaps, but the Lomuto scheme needs 44.  Equally, the list 9, 8, 7, 6, 5, 4, 3, 2, 1 needs 4 swaps with Hoare and 24 with Lomuto.

In the case of the Hoare partitioning scheme, this is already very good, but the Lomuto scheme can improve a lot. By introducing a change to how we select the pivot, to use a median of three fixed points, we can get a dramatic improvement.

This adjustment is known simply as Median-of-three:

mid := (low + high) / 2
if (input[mid] < input[low])
    swap(input[mid], input[low])
if (input[high] < input[low])
    swap(input[high], input[low])
if (input[mid] < input[high])
    swap(input[mid], input[high])

We apply this on every pass of the algorithm. It rearranges the three fixed points so that the median of the three ends up at the high index, which the Lomuto scheme then uses as the pivot.

This seems unusual, but the impact speaks for itself. Using this to sort the list 1, 2, 3, 4, 5, 6, 7, 8, 9 now takes 16 swaps, where before it took 44. That’s a 64% reduction in the work done. However, the list 9, 8, 7, 6, 5, 4, 3, 2, 1 only drops to 19 swaps with this, instead of 24 before, and the list 3, 7, 8, 5, 2, 1, 9, 5, 4 goes up to 18 where it was 12 before.

5.2. Repeated Elements

Quicksort suffers slightly when there are large numbers of elements that are directly equal. It will still try to sort all of these, and potentially do a lot more work than is necessary.

One adjustment that we can make is to detect these equal elements as part of the partitioning phase and return bounds either side of them instead of just a single point. We can then treat an entire stretch of equal elements as already sorted and just handle the ones on either side.

Let’s see this in pseudocode:

fun quicksort(input : T[], low : int, high : int) 
    if (low < high) 
        (left, right) := partition(input, low, high) 
        quicksort(input, low, left - 1) 
        quicksort(input, right + 1, high)

Here, every time the partitioning scheme returns a pivot, it returns the lower and upper indices for all adjacent elements that have the same value. This can quickly remove larger swaths of the list without needing to process them.

To implement this, we need to be able to compare elements for equality as well as for less-than. However, this is typically an easier comparison to implement.

6. Algorithm Performance

The quicksort algorithm is generally considered to be very efficient. On average, it has O(n log(n)) performance for sorting arbitrary inputs.

The original Lomuto partitioning scheme will degrade to O(n²) in the case where the list is already sorted and we pick the final element as the pivot. As we’ve seen, this improves when we implement median-of-three for our pivot selection, and in fact, this takes us back to O(n log(n)).

Conversely, the Hoare partitioning scheme can result in more comparisons because it recurses on low -> p instead of low -> p-1. This means the recursion makes more comparisons, even though it results in fewer swaps.

7. Summary

Here we’ve had an introduction to what quicksort is and how the algorithm works. We’ve also covered some variations that can be made to the algorithm for different cases.

An Intro to the Java Debug Interface (JDI)

1. Overview

We may wonder how widely recognized IDEs like IntelliJ IDEA and Eclipse implement debugging features. These tools rely heavily on the Java Platform Debugger Architecture (JPDA).

In this introductory article, we’ll discuss the Java Debug Interface API (JDI) available under JPDA.

At the same time, we’ll write a custom debugger program step-by-step, familiarizing ourselves with handy JDI interfaces.

2. Introduction to JPDA

Java Platform Debugger Architecture (JPDA) is a set of well-designed interfaces and protocols provided by Sun to debug Java.

It provides three specially designed interfaces to implement custom debuggers for a development environment in desktop systems.

To begin, the Java Virtual Machine Tool Interface (JVMTI) helps us interact and control the execution of applications running in the JVM.

Then, there’s the Java Debug Wire Protocol (JDWP) which defines the protocol used between the application under test (debuggee) and the debugger.

At last, the Java Debug Interface (JDI) is used to implement the debugger application.

3. What is JDI?

The Java Debug Interface API is a set of interfaces provided by Java to implement the frontend of the debugger. JDI is the highest layer of the JPDA.

A debugger built with JDI can debug applications running in any JVM which supports JPDA. At the same time, we can hook it into any layer of debugging.

It provides the ability to access the VM and its state, along with access to the variables of the debuggee. At the same time, it allows us to set breakpoints, step through code, set watchpoints, and handle threads.

4. Setup

We’ll require two separate programs – a debuggee and a debugger – to understand JDI’s implementations.

First, we’ll write a sample program as the debuggee.

Let’s create a JDIExampleDebuggee class with a few String variables and println statements:

public class JDIExampleDebuggee {
    public static void main(String[] args) {
        String jpda = "Java Platform Debugger Architecture";
        System.out.println("Hi Everyone, Welcome to " + jpda); // add a break point here

        String jdi = "Java Debug Interface"; // add a break point here and also stepping in here
        String text = "Today, we'll dive into " + jdi;
        System.out.println(text);
    }
}

Then, we’ll write a debugger program.

Let’s create a JDIExampleDebugger class with properties to hold the debugging program (debugClass) and line numbers for breakpoints (breakPointLines):

public class JDIExampleDebugger {
    private Class debugClass; 
    private int[] breakPointLines;

    // getters and setters
}

4.1. LaunchingConnector

At first, a debugger requires a connector to establish a connection with the target Virtual Machine (VM).

Then, we’ll need to set the debuggee as the connector’s main argument. At last, the connector should launch the VM for debugging.

To do so, JDI provides a Bootstrap class which gives an instance of the LaunchingConnector. The LaunchingConnector provides a map of the default arguments, in which we can set the main argument.

Therefore, let’s add the connectAndLaunchVM method to the JDIExampleDebugger class:

public VirtualMachine connectAndLaunchVM() throws Exception {
 
    LaunchingConnector launchingConnector = Bootstrap.virtualMachineManager()
      .defaultConnector();
    Map<String, Connector.Argument> arguments = launchingConnector.defaultArguments();
    arguments.get("main").setValue(debugClass.getName());
    return launchingConnector.launch(arguments);
}

Now, we’ll add the main method to the JDIExampleDebugger class to debug the JDIExampleDebuggee:

public static void main(String[] args) throws Exception {
 
    JDIExampleDebugger debuggerInstance = new JDIExampleDebugger();
    debuggerInstance.setDebugClass(JDIExampleDebuggee.class);
    int[] breakPoints = {6, 9};
    debuggerInstance.setBreakPointLines(breakPoints);
    VirtualMachine vm = null;
    try {
        vm = debuggerInstance.connectAndLaunchVM();
        vm.resume();
    } catch(Exception e) {
        e.printStackTrace();
    }
}

Let’s compile both of our classes, JDIExampleDebuggee (debuggee) and JDIExampleDebugger (debugger):

javac -g -cp "/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/lib/tools.jar" 
com/baeldung/jdi/*.java

Let’s discuss the javac command used here, in detail.

The -g option generates all the debugging information, without which we may see an AbsentInformationException.

And -cp will add the tools.jar in the classpath to compile the classes.

All JDI libraries are available under tools.jar of the JDK. Therefore, make sure to add the tools.jar in the classpath at both compilation and execution.

That’s it, now we are ready to execute our custom debugger JDIExampleDebugger:

java -cp "/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/lib/tools.jar:." 
com.baeldung.jdi.JDIExampleDebugger

Note the “:.” with tools.jar. This will append tools.jar to the classpath for the current runtime (use “;.” on Windows).

4.2. Bootstrap and ClassPrepareRequest

Executing the debugger program here will give no results since we haven’t prepared the class for debugging and set the breakpoints.

The VirtualMachine class has the eventRequestManager method to create various requests like ClassPrepareRequest, BreakpointRequest, and StepEventRequest.

So, let’s add the enableClassPrepareRequest method to the JDIExampleDebugger class.

This will filter for the JDIExampleDebuggee class and enable the ClassPrepareRequest:

public void enableClassPrepareRequest(VirtualMachine vm) {
    ClassPrepareRequest classPrepareRequest = vm.eventRequestManager().createClassPrepareRequest();
    classPrepareRequest.addClassFilter(debugClass.getName());
    classPrepareRequest.enable();
}

4.3. ClassPrepareEvent and BreakpointRequest

Once the ClassPrepareRequest for the JDIExampleDebuggee class is enabled, the event queue of the VM will start receiving instances of ClassPrepareEvent.

Using ClassPrepareEvent, we can get the location to set a breakpoint and create a BreakpointRequest.

To do so, let’s add the setBreakPoints method to the JDIExampleDebugger class:

public void setBreakPoints(VirtualMachine vm, ClassPrepareEvent event) throws AbsentInformationException {
    ClassType classType = (ClassType) event.referenceType();
    for(int lineNumber: breakPointLines) {
        Location location = classType.locationsOfLine(lineNumber).get(0);
        BreakpointRequest bpReq = vm.eventRequestManager().createBreakpointRequest(location);
        bpReq.enable();
    }
}

4.4. BreakPointEvent and StackFrame

So far, we’ve prepared the class for debugging and set the breakpoints. Now, we need to catch the BreakPointEvent and display the variables.

JDI provides the StackFrame class, to get the list of all the visible variables of the debuggee.

Therefore, let’s add the displayVariables method to the JDIExampleDebugger class:

public void displayVariables(LocatableEvent event) throws IncompatibleThreadStateException, 
AbsentInformationException {
    StackFrame stackFrame = event.thread().frame(0);
    if(stackFrame.location().toString().contains(debugClass.getName())) {
        Map<LocalVariable, Value> visibleVariables = stackFrame
          .getValues(stackFrame.visibleVariables());
        System.out.println("Variables at " + stackFrame.location().toString() +  " > ");
        for (Map.Entry<LocalVariable, Value> entry : visibleVariables.entrySet()) {
            System.out.println(entry.getKey().name() + " = " + entry.getValue());
        }
    }
}

5. Debug Target

At this step, all we need to do is update the main method of the JDIExampleDebugger to start debugging.

Hence, we’ll use the already discussed methods like enableClassPrepareRequest, setBreakPoints, and displayVariables:

try {
    vm = debuggerInstance.connectAndLaunchVM();
    debuggerInstance.enableClassPrepareRequest(vm);
    EventSet eventSet = null;
    while ((eventSet = vm.eventQueue().remove()) != null) {
        for (Event event : eventSet) {
            if (event instanceof ClassPrepareEvent) {
                debuggerInstance.setBreakPoints(vm, (ClassPrepareEvent)event);
            }
            if (event instanceof BreakpointEvent) {
                debuggerInstance.displayVariables((BreakpointEvent) event);
            }
            vm.resume();
        }
    }
} catch (VMDisconnectedException e) {
    System.out.println("Virtual Machine is disconnected.");
} catch (Exception e) {
    e.printStackTrace();
}

Now, let’s first compile the JDIExampleDebugger class again with the javac command we discussed earlier.

And last, we’ll execute the debugger program along with all the changes to see the output:

Variables at com.baeldung.jdi.JDIExampleDebuggee:6 > 
args = instance of java.lang.String[0] (id=93)
Variables at com.baeldung.jdi.JDIExampleDebuggee:9 > 
jpda = "Java Platform Debugger Architecture"
args = instance of java.lang.String[0] (id=93)
Virtual Machine is disconnected.

Hurray! We’ve successfully debugged the JDIExampleDebuggee class. At the same time, we’ve displayed the values of the variables at the breakpoint locations (line number 6 and 9).

Therefore, our custom debugger is ready.

5.1. StepRequest

Debugging also requires stepping through the code and checking the state of the variables at subsequent steps. Therefore, we’ll create a step request at the breakpoint.

While creating the instance of the StepRequest, we must provide the size and depth of the step. We’ll define STEP_LINE and STEP_OVER respectively.

Let’s write a method to enable the step request.

For simplicity, we’ll start stepping at the last breakpoint (line number 9):

public void enableStepRequest(VirtualMachine vm, BreakpointEvent event) {
    // enable step request for last break point
    if (event.location().toString().
        contains(debugClass.getName() + ":" + breakPointLines[breakPointLines.length-1])) {
        StepRequest stepRequest = vm.eventRequestManager()
            .createStepRequest(event.thread(), StepRequest.STEP_LINE, StepRequest.STEP_OVER);
        stepRequest.enable();    
    }
}

Now, we can update the main method of the JDIExampleDebugger, to enable the step request when it is a BreakPointEvent:

if (event instanceof BreakpointEvent) {
    debuggerInstance.enableStepRequest(vm, (BreakpointEvent)event);
}

5.2. StepEvent

Similar to the BreakPointEvent, we can also display the variables at the StepEvent.

Let’s update the main method accordingly:

if (event instanceof StepEvent) {
    debuggerInstance.displayVariables((StepEvent) event);
}

At last, we’ll execute the debugger to see the state of the variables while stepping through the code:

Variables at com.baeldung.jdi.JDIExampleDebuggee:6 > 
args = instance of java.lang.String[0] (id=93)
Variables at com.baeldung.jdi.JDIExampleDebuggee:9 > 
args = instance of java.lang.String[0] (id=93)
jpda = "Java Platform Debugger Architecture"
Variables at com.baeldung.jdi.JDIExampleDebuggee:10 > 
args = instance of java.lang.String[0] (id=93)
jpda = "Java Platform Debugger Architecture"
jdi = "Java Debug Interface"
Variables at com.baeldung.jdi.JDIExampleDebuggee:11 > 
args = instance of java.lang.String[0] (id=93)
jpda = "Java Platform Debugger Architecture"
jdi = "Java Debug Interface"
text = "Today, we'll dive into Java Debug Interface"
Variables at com.baeldung.jdi.JDIExampleDebuggee:12 > 
args = instance of java.lang.String[0] (id=93)
jpda = "Java Platform Debugger Architecture"
jdi = "Java Debug Interface"
text = "Today, we'll dive into Java Debug Interface"
Virtual Machine is disconnected.

If we compare the outputs, we’ll realize that the debugger stepped in from line number 9 and displayed the variables at all subsequent steps.

6. Read Execution Output

We might notice that the println statements of the JDIExampleDebuggee class haven’t been part of the debugger output.

As per the JDI documentation, if we launch the VM through LaunchingConnector, its output and error streams must be read through the Process object.

Therefore, let’s add it to the finally clause of our main method:

finally {
    InputStreamReader reader = new InputStreamReader(vm.process().getInputStream());
    OutputStreamWriter writer = new OutputStreamWriter(System.out);
    char[] buf = new char[512];
    reader.read(buf);
    writer.write(buf);
    writer.flush();
}

Now, executing the debugger program will also add the println statements from the JDIExampleDebuggee class to the debugging output:

Hi Everyone, Welcome to Java Platform Debugger Architecture
Today, we'll dive into Java Debug Interface

7. Conclusion

In this article, we’ve explored the Java Debug Interface (JDI) API available under the Java Platform Debugger Architecture (JPDA).

Along the way, we’ve built a custom debugger utilizing the handy interfaces provided by JDI. At the same time, we’ve also added stepping capability to the debugger.

As this was just an introduction to JDI, it is recommended to look at the implementations of other interfaces available under JDI API.

As usual, all the code implementations are available over on GitHub.

A Guide to System.exit()

1. Overview

In this tutorial, we’ll have a look at what System.exit means in Java.

We’ll see its purpose, where and how to use it, and what the difference is when invoking it with different status codes.

2. What is System.exit?

System.exit is a void method. It takes an exit code, which it passes on to the calling script or program.

Exiting with a code of zero means a normal exit:

System.exit(0);

We can pass any integer as an argument to the method. A non-zero status code is considered as an abnormal exit.

Calling the System.exit method terminates the currently running JVM and exits the program. This method does not return normally.

This means that any code after the System.exit call is effectively unreachable, and yet the compiler does not know about it.

System.exit(0);
System.out.println("This line is unreachable");

It’s not a good idea to shut down a program with System.exit(0). It gives us the same result as exiting from the main method while also stopping the subsequent lines from executing. Additionally, the thread invoking System.exit blocks until the JVM terminates, so if a shutdown hook submits a task to this thread, it leads to a deadlock.

3. Why do we need it?

The typical use-case for System.exit is when there is an abnormal condition and we need to exit the program immediately.

Also, if we have to terminate the program from a place other than the main method, System.exit is one way of achieving it.

4. When do we need it?

It’s common for a script to rely on the exit codes of commands it invokes. If such a command is a Java application, then System.exit is handy for sending this exit code.

For example, instead of throwing an exception, we can return an abnormal exit code that can then be interpreted by the calling script.

Or, we can use System.exit to invoke any shutdown hooks we’ve registered. These hooks can be set to clean up the resources held and exit safely from other non-daemon threads.
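
For instance, here is a minimal sketch (the class name is ours) that registers a shutdown hook, which then runs when System.exit is called:

public class ShutdownHookExample {
    public static void main(String[] args) {
        // register a hook that the JVM runs during shutdown
        Runtime.getRuntime().addShutdownHook(
          new Thread(() -> System.out.println("Cleaning up resources before exit")));

        System.out.println("Doing some work");
        System.exit(0); // triggers the registered shutdown hook, then terminates the JVM
    }
}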

5. A Simple Example

In this example, we try to read a file and if it exists, we print a line from it. If the file does not exist, we exit the program with System.exit from the catch block.

try {
    BufferedReader br = new BufferedReader(new FileReader("file.txt"));
    System.out.println(br.readLine());
    br.close();
} catch (IOException e) {
    System.exit(2);
} finally {
    System.out.println("Exiting the program");
}

Here, we must note that the finally block does not get executed if the file is not found, because the System.exit call in the catch block exits the JVM and does not allow the finally block to run.

6. Choosing a Status Code

We can pass any integer as a status code, but the general practice is that a System.exit with status code 0 is a normal exit and the others are abnormal exits.

Note that this is only a “good practice” and is not a strict rule that the compiler would care about.

Also, it’s worth noting that when we invoke a Java program from the command line, the status code is taken into account.

In the example below, when we run SystemExitExample, if it exits the JVM by calling System.exit with a non-zero status code, then the following echo does not get printed:

java SystemExitExample && echo "I will not be printed"
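
We can also read the code explicitly through the shell’s $? variable. For instance, assuming SystemExitExample exits by calling System.exit(2):

java SystemExitExample
echo $?
2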

To make our program able to communicate with other standard tools, we might consider following the standard codes that the related systems use to communicate.

For example, UNIX status codes define 128 as the standard for “invalid argument to exit”. So, it might be a good idea to use this code when we need our status code to be communicated to the operating system. Otherwise, we are free to choose our code.

7. Conclusion

In this tutorial, we discussed how System.exit works when to use it, and how to use it.

It’s a good practice to use exception handling or plain return statements to exit a program when working with application servers and other regular applications. The System.exit method is better suited to script-based applications or wherever the status codes are interpreted.

You can check out the examples provided in this article over on GitHub.

Checked and Unchecked Exceptions in Java

1. Overview

Java exceptions fall into two main categories: checked exceptions and unchecked exceptions. In this article, we’ll provide some code samples on how to use them.

2. Checked Exceptions

In general, checked exceptions represent errors outside the control of the program. For example, the constructor of FileInputStream throws FileNotFoundException if the input file does not exist.

Java verifies checked exceptions at compile-time.

Therefore, we should use the throws keyword to declare a checked exception:

private static void checkedExceptionWithThrows() throws FileNotFoundException {
    File file = new File("not_existing_file.txt");
    FileInputStream stream = new FileInputStream(file);
}

We can also use a try-catch block to handle a checked exception:

private static void checkedExceptionWithTryCatch() {
    File file = new File("not_existing_file.txt");
    try {
        FileInputStream stream = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

Some common checked exceptions in Java are IOException, SQLException, and ParseException.

The Exception class is the superclass of checked exceptions. Therefore, we can create a custom checked exception by extending Exception:

public class IncorrectFileNameException extends Exception {
    public IncorrectFileNameException(String errorMessage) {
        super(errorMessage);
    }
}

3. Unchecked Exceptions

If a program throws an unchecked exception, it reflects some error inside the program logic. For example, if we divide a number by 0, Java will throw ArithmeticException:

private static void divideByZero() {
    int numerator = 1;
    int denominator = 0;
    int result = numerator / denominator;
}

Java does not verify unchecked exceptions at compile-time. Furthermore, we don’t have to declare unchecked exceptions in a method with the throws keyword. And although the above code does not have any errors during compile-time, it will throw ArithmeticException at runtime.

Some common unchecked exceptions in Java are NullPointerException, ArrayIndexOutOfBoundsException, and IllegalArgumentException.

The RuntimeException class is the superclass of all unchecked exceptions. Therefore, we can create a custom unchecked exception by extending RuntimeException:

public class NullOrEmptyException extends RuntimeException {
    public NullOrEmptyException(String errorMessage) {
        super(errorMessage);
    }
}

4. When to Use Checked Exceptions and Unchecked Exceptions

It’s a good practice to use exceptions in Java so that we can separate error-handling code from regular code. However, we need to decide which type of exception to throw. The Oracle Java Documentation provides guidance on when to use checked exceptions and unchecked exceptions:

“If a client can reasonably be expected to recover from an exception, make it a checked exception. If a client cannot do anything to recover from the exception, make it an unchecked exception.”

For example, before we open a file, we can first validate the input file name. If the user input file name is invalid, we can throw a custom checked exception:

if (!isCorrectFileName(fileName)) {
    throw new IncorrectFileNameException("Incorrect filename : " + fileName );
}

In this way, we can recover the system by accepting another user input file name. However, if the input file name is a null pointer or it is an empty string, it means that we have some errors in the code. In this case, we should throw an unchecked exception:

if (fileName == null || fileName.isEmpty())  {
    throw new NullOrEmptyException("The filename is null or empty.");
}
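
Putting both checks together, a hypothetical openFile method might look like this (the method name is ours, and isCorrectFileName is the validation helper referenced above):

public static FileInputStream openFile(String fileName) throws IncorrectFileNameException, FileNotFoundException {
    if (fileName == null || fileName.isEmpty()) {
        // a programming error on the caller's side: unchecked exception
        throw new NullOrEmptyException("The filename is null or empty.");
    }
    if (!isCorrectFileName(fileName)) {
        // a condition the client can recover from: checked exception, declared with throws
        throw new IncorrectFileNameException("Incorrect filename : " + fileName);
    }
    return new FileInputStream(fileName);
}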

5. Conclusion

In this article, we discussed the difference between checked and unchecked exceptions. We also provided some code examples to show when to use checked or unchecked exceptions.

As always, all code found in this article can be found over on GitHub.

Machine Learning with Spark MLlib

1. Overview

In this tutorial, we’ll understand how to leverage Apache Spark MLlib to develop machine learning products. We’ll develop a simple machine learning product with Spark MLlib to demonstrate the core concepts.

2. A Brief Primer to Machine Learning

Machine Learning is part of a broader umbrella known as Artificial Intelligence. Machine learning refers to the study of statistical models to solve specific problems with patterns and inferences. These models are “trained” for the specific problem by the means of training data drawn from the problem space.

We’ll see what exactly this definition entails as we take on our example.

2.1. Machine Learning Categories

We can broadly categorize machine learning into supervised and unsupervised categories based on the approach. There are other categories as well, but we’ll keep ourselves to these two:

  • Supervised learning works with a set of data that contains both the inputs and the desired output — for instance, a data set containing various characteristics of a property and the expected rental income. Supervised learning is further divided into two broad sub-categories called classification and regression:
    • Classification algorithms are related to categorical output, like whether a property is occupied or not
    • Regression algorithms are related to a continuous output range, like the value of a property
  • Unsupervised learning, on the other hand, works with a set of data which only have input values. It works by trying to identify the inherent structure in the input data. For instance, finding different types of consumers through a data set of their consumption behavior.

2.2. Machine Learning Workflow

Machine learning is truly an inter-disciplinary area of study. It requires knowledge of the business domain, statistics, probability, linear algebra, and programming. As this can clearly get overwhelming, it’s best to approach this in an orderly fashion, what we typically call a machine learning workflow:

As we can see, every machine learning project should start with a clearly defined problem statement. This should be followed by a series of steps related to data that can potentially answer the problem.

Then we typically select a model looking at the nature of the problem. This is followed by a series of model training and validation, which is known as model fine-tuning. Finally, we test the model on previously unseen data and deploy it to production if satisfactory.

3. What is Spark MLlib?

Apache Spark is an open-source cluster computing framework with implicit data parallelism and fault tolerance. The parallelism and fault-tolerance are abstracted in what is called Resilient Distributed Dataset (RDD).

RDD – a fundamental data structure of Spark – is an immutable, distributed collection of objects. APIs for working with Spark RDD are available in multiple languages like Java, Scala, and Python.

Spark consists of multiple modules, including Spark Core – the foundation module providing task dispatching, scheduling, and I/O functionalities. Apart from this, Spark has modules like Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX.

Spark MLlib is a module on top of Spark Core that provides machine learning primitives as APIs. Machine learning typically deals with a large amount of data for model training.

The base computing framework from Spark is a huge benefit. On top of this, MLlib provides most of the popular machine learning and statistical algorithms. This greatly simplifies the task of working on a large-scale machine learning project.

4. Machine Learning with MLlib

We now have enough context on machine learning and how MLlib can help in this endeavor. Let’s get started with our basic example of implementing a machine learning project with Spark MLlib.

If we recall from our discussion on machine learning workflow, we should start with a problem statement and then move on to data. Fortunately for us, we’ll pick the “hello world” of machine learning, Iris Dataset. This is a multivariate labeled dataset, consisting of length and width of sepals and petals of different species of Iris.

This gives our problem objective: can we predict the species of an Iris from the length and width of its sepal and petal?

4.1. Setting the Dependencies

First, we have to define the following dependency in Maven to pull the relevant libraries:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
</dependency>

And we need to initialize the SparkContext to work with Spark APIs:

SparkConf conf = new SparkConf()
  .setAppName("Main")
  .setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);

4.2. Loading the Data

First things first, we should download the data, which is available as a text file in CSV format. Then we have to load this data in Spark:

String dataFile = "data\\iris.data";
JavaRDD<String> data = sc.textFile(dataFile);

Spark MLlib offers several data types, both local and distributed, to represent the input data and corresponding labels. The simplest of the data types are Vector:

JavaRDD<Vector> inputData = data
  .map(line -> {
      String[] parts = line.split(",");
      double[] v = new double[parts.length - 1];
      for (int i = 0; i < parts.length - 1; i++) {
          v[i] = Double.parseDouble(parts[i]);
      }
      return Vectors.dense(v);
});

Note that we’ve included only the input features here, mostly to perform statistical analysis.

A training example typically consists of multiple input features and a label, represented by the class LabeledPoint:

Map<String, Integer> map = new HashMap<>();
map.put("Iris-setosa", 0);
map.put("Iris-versicolor", 1);
map.put("Iris-virginica", 2);
		
JavaRDD<LabeledPoint> labeledData = data
  .map(line -> {
      String[] parts = line.split(",");
      double[] v = new double[parts.length - 1];
      for (int i = 0; i < parts.length - 1; i++) {
          v[i] = Double.parseDouble(parts[i]);
      }
      return new LabeledPoint(map.get(parts[parts.length - 1]), Vectors.dense(v));
});

Our output label in the dataset is textual, signifying the species of Iris. To feed this into a machine learning model, we have to convert this into numeric values.

4.3. Exploratory Data Analysis

Exploratory data analysis involves analyzing the available data. Machine learning algorithms are sensitive to data quality, so higher-quality data has better prospects of delivering the desired outcome.

Typical analysis objectives include removing anomalies and detecting patterns. This even feeds into the critical steps of feature engineering to arrive at useful features from the available data.

Our dataset, in this example, is small and well-formed, so we don’t have to indulge in a lot of data analysis. Spark MLlib, however, is equipped with APIs that offer quite a bit of insight.

Let’s begin with some simple statistical analysis:

MultivariateStatisticalSummary summary = Statistics.colStats(inputData.rdd());
System.out.println("Summary Mean:");
System.out.println(summary.mean());
System.out.println("Summary Variance:");
System.out.println(summary.variance());
System.out.println("Summary Non-zero:");
System.out.println(summary.numNonzeros());

Here, we’re observing the mean and variance of the features we have. This is helpful in determining if we need to perform normalization of the features, as it’s useful to have all of them on a similar scale. We’re also taking note of the non-zero values, which can adversely impact model performance.

Here is the output for our input data:

Summary Mean:
[5.843333333333332,3.0540000000000003,3.7586666666666666,1.1986666666666668]
Summary Variance:
[0.6856935123042509,0.18800402684563744,3.113179418344516,0.5824143176733783]
Summary Non-zero:
[150.0,150.0,150.0,150.0]

Another important metric to analyze is the correlation between features in the input data:

Matrix correlMatrix = Statistics.corr(inputData.rdd(), "pearson");
System.out.println("Correlation Matrix:");
System.out.println(correlMatrix.toString());

A high correlation between any two features suggests they are not adding any incremental value and one of them can be dropped. Here is how our features are correlated:

Correlation Matrix:
1.0                   -0.10936924995064387  0.8717541573048727   0.8179536333691672   
-0.10936924995064387  1.0                   -0.4205160964011671  -0.3565440896138163  
0.8717541573048727    -0.4205160964011671   1.0                  0.9627570970509661   
0.8179536333691672    -0.3565440896138163   0.9627570970509661   1.0

4.4. Splitting the Data

If we recall our discussion of machine learning workflow, it involves several iterations of model training and validation followed by final testing.

For this to happen, we have to split our training data into training, validation, and test sets. To keep things simple, we’ll skip the validation part. So, let’s split our data into training and test sets:

JavaRDD<LabeledPoint>[] splits = labeledData.randomSplit(new double[] { 0.8, 0.2 }, 11L);
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];

4.5. Model Training

So, we’ve reached a stage where we’ve analyzed and prepared our dataset. All that’s left is to feed this into a model and start the magic! Well, easier said than done. We need to pick a suitable algorithm for our problem – recall the different categories of machine learning we spoke of earlier.

It isn’t difficult to understand that our problem fits into classification within the supervised category. Now, there are quite a few algorithms available for use under this category.

The simplest of them is Logistic Regression (let the word regression not confuse us; it is, after all, a classification algorithm):

LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
  .setNumClasses(3)
  .run(trainingData.rdd());

Here, we are using a three-class Limited Memory BFGS based classifier. The details of this algorithm are beyond the scope of this tutorial, but this is one of the most widely used ones.

4.6. Model Evaluation

Remember that model training involves multiple iterations, but for simplicity, we’ve just used a single pass here. Now that we’ve trained our model, it’s time to test this on the test dataset:

JavaPairRDD<Object, Object> predictionAndLabels = testData
  .mapToPair(p -> new Tuple2<>(model.predict(p.features()), p.label()));
MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
double accuracy = metrics.accuracy();
System.out.println("Model Accuracy on Test Data: " + accuracy);

Now, how do we measure the effectiveness of a model? There are several metrics that we can use, but one of the simplest is Accuracy. Simply put, accuracy is a ratio of the correct number of predictions and the total number of predictions. Here is what we can achieve in a single run of our model:

Model Accuracy on Test Data: 0.9310344827586207

Note that this will vary slightly from run to run due to the stochastic nature of the algorithm.

However, accuracy is not a very effective metric in some problem domains. Other, more sophisticated metrics are Precision and Recall (often combined into the F1 Score), the ROC Curve, and the Confusion Matrix.
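
For instance, reusing the metrics object created above, here's a small, hedged sketch of inspecting the confusion matrix and the per-class precision, recall, and F1 scores:

// reuse the MulticlassMetrics instance from the evaluation step above
System.out.println("Confusion Matrix:\n" + metrics.confusionMatrix());
for (double label : metrics.labels()) {
    System.out.println("Class " + label
      + " precision = " + metrics.precision(label)
      + ", recall = " + metrics.recall(label)
      + ", F1 = " + metrics.fMeasure(label));
}
System.out.println("Weighted F1: " + metrics.weightedFMeasure());

Such per-class numbers are often more telling than overall accuracy, especially when the classes are imbalanced.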

4.7. Saving and Loading the Model

Finally, we often need to save the trained model to the filesystem and load it for prediction on production data. This is trivial in Spark:

model.save(sc, "model\\logistic-regression");
LogisticRegressionModel sameModel = LogisticRegressionModel
  .load(sc, "model\\logistic-regression");
Vector newData = Vectors.dense(new double[]{1,1,1,1});
double prediction = sameModel.predict(newData);
System.out.println("Model Prediction on New Data = " + prediction);

So, we’re saving the model to the filesystem and loading it back. After loading, the model can be straight away used to predict output on new data. Here is a sample prediction on random new data:

Model Prediction on New Data = 2.0

5. Beyond The Primitive Example

While the example we went through broadly covers the workflow of a machine learning project, it leaves out a lot of subtle but important points. While it isn’t possible to discuss them in detail here, we can certainly go through some of the important ones.

Spark MLlib, through its APIs, has extensive support in all of these areas.

5.1. Model Selection

Model selection is often one of the most complex and critical tasks. Training a model is an involved process, so it’s much better to invest that effort in a model we’re more confident will produce the desired results.

While the nature of the problem can help us identify the category of machine learning algorithm to pick from, it isn’t the whole job. Within a category like classification, as we saw earlier, there are often many different algorithms and variations of them to choose from.

Often the best course of action is quick prototyping on a much smaller set of data. A library like Spark MLlib makes the job of quick prototyping much easier.

5.2. Model Hyper-Parameter Tuning

A typical model consists of features, parameters, and hyper-parameters. Features are what we feed into the model as input data. Model parameters are variables which the model learns during the training process. Depending on the model, there are certain additional parameters that we have to set based on experience and adjust iteratively. These are called model hyper-parameters.

For instance, the learning rate is a typical hyper-parameter in gradient-descent-based algorithms. The learning rate controls how fast parameters are adjusted during training cycles. It has to be set appropriately for the model to learn effectively at a reasonable pace.

While we can begin with an initial value of such hyper-parameters based on experience, we have to perform model validation and manually tune them iteratively.
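
As a rough illustration of such manual tuning with the classifier from our example, we could try a handful of values for the regularization hyper-parameter and keep the one that scores best. The candidate values below are arbitrary, and for brevity we score against our test split, although a dedicated validation split would be the cleaner choice:

double bestRegParam = 0.0;
double bestAccuracy = 0.0;
for (double regParam : new double[] { 0.0, 0.01, 0.1, 1.0 }) {
    LogisticRegressionWithLBFGS candidate = new LogisticRegressionWithLBFGS();
    candidate.setNumClasses(3);
    // regParam is a hyper-parameter: we set it ourselves rather than learn it
    candidate.optimizer().setRegParam(regParam);
    LogisticRegressionModel candidateModel = candidate.run(trainingData.rdd());

    JavaPairRDD<Object, Object> scored = testData
      .mapToPair(p -> new Tuple2<>(candidateModel.predict(p.features()), p.label()));
    double accuracy = new MulticlassMetrics(scored.rdd()).accuracy();
    if (accuracy > bestAccuracy) {
        bestAccuracy = accuracy;
        bestRegParam = regParam;
    }
}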

5.3. Model Performance

A statistical model, while being trained, is prone to overfitting and underfitting, both causing poor model performance. Underfitting refers to the case where the model does not pick the general details from the data sufficiently. On the other hand, overfitting happens when the model starts to pick up noise from the data as well.

There are several methods for avoiding the problems of underfitting and overfitting, which are often employed in combination. For instance, to counter overfitting, the most employed techniques include cross-validation and regularization. Similarly, to improve underfitting, we can increase the complexity of the model and increase the training time.

Spark MLlib has fantastic support for most of these techniques like regularization and cross-validation. In fact, most of the algorithms have default support for them.

6. Spark MLlib in Comparison

While Spark MLlib is quite a powerful library for machine learning projects, it is certainly not the only one for the job. There are quite a number of libraries available in different programming languages with varying support. We’ll go through some of the popular ones here.

6.1. TensorFlow/Keras

TensorFlow is an open-source library for dataflow and differentiable programming, widely employed for machine learning applications. Together with its high-level abstraction, Keras, it is a tool of choice for machine learning. It is written mainly in C++ and Python and is primarily used from Python. Unlike Spark MLlib, it does not have a polyglot presence.

6.2. Theano

Theano is another Python-based open-source library for manipulating and evaluating mathematical expressions – for instance, matrix-based expressions, which are commonly used in machine learning algorithms. Unlike Spark MLlib, Theano again is primarily used in Python. Keras, however, can be used together with a Theano back end.

6.3. CNTK

Microsoft Cognitive Toolkit (CNTK) is a deep learning framework written in C++ that describes computational steps via a directed graph. It can be used in both Python and C++ programs and is primarily used in developing neural networks. There’s a Keras back end based on CNTK available for use that provides the familiar intuitive abstraction.

7. Conclusion

To sum up, in this tutorial we went through the basics of machine learning, including different categories and workflow. We went through the basics of Spark MLlib as a machine learning library available to us.

Furthermore, we developed a simple machine learning application based on the available dataset. We implemented some of the most common steps in the machine learning workflow in our example.

We also went through some of the advanced steps in a typical machine learning project and how Spark MLlib can help in those. Finally, we saw some of the alternative machine learning libraries available for us to use.

As always, the code can be found over on GitHub.


FreeMarker Common Operations


1. Introduction

FreeMarker is a template engine, written in Java, and maintained by the Apache Foundation. We can use the FreeMarker Template Language, also known as FTL, to generate many text-based formats like web pages, email, or XML files.

In this tutorial, we’ll see what we can do out-of-the-box with FreeMarker, though note that it is quite configurable and even integrates nicely with Spring.

Let’s get started!

2. Quick Overview

To inject dynamic content in our pages, we need to use a syntax that FreeMarker understands:

  • ${…} in the template will be replaced in the generated output with the actual value of the expression inside the curly brackets – we call this interpolation – a couple of examples are ${1 + 2} and ${variableName}
  • FTL tags are like HTML tags (but contain # or @) and FreeMarker interprets them, for example <#if…></#if>
  • Comments in FreeMarker start with <#-- and end with -->
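
To see how such expressions are actually rendered, here's a minimal, hedged sketch of processing a template with FreeMarker's Java API outside of any framework. The App class, the /templates classpath folder, the greeting.ftl template, and the variableName entry are assumptions made purely for illustration:

Configuration cfg = new Configuration(Configuration.VERSION_2_3_28);
// assumption: templates live in a /templates folder on the classpath
cfg.setClassForTemplateLoading(App.class, "/templates");
cfg.setDefaultEncoding("UTF-8");

Template template = cfg.getTemplate("greeting.ftl");
Map<String, Object> model = new HashMap<>();
model.put("variableName", "Baeldung");

StringWriter out = new StringWriter();
template.process(model, out);
System.out.println(out);

With a greeting.ftl that contains just ${variableName}, this would simply print Baeldung.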

3. The Include Tag

The FTL include directive is a way for us to follow the DRY principle in our application. We will define the repetitive content in a file and reuse it across different FreeMarker templates with a single include tag.

One such use case is when we want to include the menu section inside many pages. First, we’ll define the menu section inside a file in a fragments folder – we’ll call it menu.ftl – with the following content:

<a href="#dashboard">Dashboard</a>
<a href="#newEndpoint">Add new endpoint</a>

And on our HTML page, let’s include the created menu.ftl:

<!DOCTYPE html>
<html>
<body>
<#include 'fragments/menu.ftl'>
    <h6>Dashboard page</h6>
</body>
</html>

And we can also include FTL in our fragments, which is great.

4. Handling Value Existence

FTL will consider any null value as a missing value. Thus, we need to be extra careful and add logic to handle null inside our template.

We can use the ?? operator to check if an attribute, or nested property, exists. The result is a boolean:

${attribute??}

So, we’ve tested the attribute for null, but that’s not always enough. Let’s now define a default value as a fallback for this missing value. To do this, we need the ! operator placed after the name of the variable:

${attribute!'default value'}

Using round brackets, we can wrap many nested attributes.

For example, to check if the attribute exists and has a nested property with another nested property, we wrap everything:

${(attribute.nestedProperty.nestedProperty)??}

Finally, putting everything together, we can embed these among static content:

<p>Testing if student property exists: ${student???c}</p>
<p>Using default value for missing student: ${student!'John Doe'}</p>
<p>Wrapping student nested properties: ${(student.address.street)???c}</p>

And, if the student were null, we’d see:

<p>Testing if student property exists: false</p>
<p>Using default value for missing student: John Doe</p>
<p>Wrapping student nested properties: false</p>

Notice the additional ?c directive used after the ??. We use it to convert the boolean value to a human-readable string.

5. The If-Else Tag

Control structures are present in FreeMarker, and the traditional if-else is probably familiar:

<#if condition>
    <!-- block to execute if condition is true -->
<#elseif condition2>
    <!-- block to execute if condition2 is the first true condition -->
<#elseif condition3>
    <!-- block to execute if condition3 is the first true condition -->
<#else>
    <!-- block to execute if no condition is true -->
</#if>

While the elseif and else branches are optional, the conditions must resolve to a boolean value.

To help us with our evaluations, we’ll likely use one of:

  • x == y to check if x is equal to y
  • x != y to return true only if x differs from y
  • x lt y means that x must be strictly smaller than y – we can also use < instead of lt
  • x gt y evaluates to true only if x is strictly greater than y – we can use > instead of gt
  • x lte y tests if x is less than or equal to y – the alternative to lte is <=
  • x gte y tests if x is greater than or equal to y – the alternative of gte is >=
  • x?? to check the existence of x
  • sequence?seqContains(x) validates the existence of x inside a sequence

It’s very important to keep in mind that FreeMarker considers >= and > as closing characters for an FTL tag. The solution is to wrap their usage in parentheses or use gte or gt instead.

Putting it together, for the following template:

<#if status??>
    <p>${status.reason}</p>
<#else>
    <p>Missing status!</p>
</#if>

We end up with the resulting HTML code:

 <!-- When status attribute exists -->
<p>404 Not Found</p>

<!-- When status attribute is missing -->
<p>Missing status!</p>

6. Containers of Sub-Variables

In FreeMarker, we have three types of containers for sub-variables:

  • Hashes are a sequence of key-value pairs – the key must be unique inside the hash and we don’t have an ordering
  • Sequences are lists where we have an index associated with each value – a noteworthy fact is that sub-variables can be of different types
  • Collections are a special case of sequences where we can’t access the size or retrieve values by index – we can still iterate them with the list tag though!

6.1. Iterating Items

We can iterate over a container in two basic ways. The first one is where we iterate over each value and have logic happening for each of them:

<#list sequence as item>
    <!-- do something with ${item} -->
</#list>

Or, when we want to iterate a Hash, accessing both the key and the value:

<#list hash as key, value>
    <!-- do something with ${key} and ${value} -->
</#list>

The second form is more powerful because it also allows us to define the logic that should happen at various steps in the iteration:

<#list sequence>
    <!-- one-time logic if the sequence is not empty -->
    <#items as item>
        <!-- logic repeated for every item in sequence -->
    </#items>
    <!-- one-time logic if the sequence is not empty -->
<#else>
    <!-- one-time logic if the sequence is empty -->
</#list>

The item represents the name of the looped variable, but we can rename it to what we want. The else branch is optional.

For a hands-on example, we’ll define a template where we list some statuses:

<#list statuses>
    <ul>
    <#items as status>
        <li>${status}</li>
    </#items>
    </ul>
<#else>
    <p>No statuses available</p>
</#list>

This will return us the following HTML when our container is [“200 OK”, “404 Not Found”, “500 Internal Server Error”]:

<ul>
<li>200 OK</li>
<li>404 Not Found</li>
<li>500 Internal Server Error</li>
</ul>

6.2. Items Handling

A hash allows us two simple functions: keys to retrieve only the keys contained, and values to retrieve only the values.

A sequence is more complex; we can group the most useful functions:

  • chunk and join to get a sub-sequence or combine two sequences
  • reverse, sort, and sortBy for modifying the order of elements
  • first and last will retrieve the first or last element, respectively
  • size represents the number of elements in the sequence
  • seqContains, seqIndexOf, or seqLastIndexOf to look for an element

7. Type Handling

FreeMarker comes with a huge variety of functions (built-ins) available for working with objects. Let’s see some frequently used functions.

7.1. String Handling

  • url and urlPath will URL-escape the string, with the exception that urlPath will not escape slash /
  • jString, jsString, and jsonString will apply the escaping rules for Java, Javascript and JSON, respectively
  • capFirst, uncapFirst, upperCase, lowerCase and capitalize are useful for changing the case of our string, as implied by their names
  • boolean, date, time, datetime and number are functions for converting from a string to other types

Let’s now use a few of those functions:

<p>${'http://myurl.com/?search=Hello World'?urlPath}</p>
<p>${'Using " in text'?jsString}</p>
<p>${'my value'?upperCase}</p>
<p>${'2019-01-12'?date('yyyy-MM-dd')}</p>

And the output for the template above will be:

<p>http%3A//myurl.com/%3Fsearch%3DHello%20World</p>
<p>Using \" in text</p>
<p>MY VALUE</p>
<p>12.01.2019</p>

When using the date function, we’ve also passed the pattern to use for parsing the String object. FreeMarker uses the local format unless specified otherwise, for example in the string function available for date objects.

7.2. Number Handling

  • round, floor and ceiling can help with rounding numbers
  • abs will return a number’s absolute value
  • string will convert the number to a string. We can also pass one of four pre-defined number formats (computer, currency, number, or percent) or define our own format, like “0.###”

Let’s do a chain of a few mathematical operations:

<p>${(7.3?round + 3.4?ceiling + 0.1234)?string('0.##')}</p>
<!-- (7 + 4 + 0.1234) with 2 decimals -->

And as expected, the resulting value is 11.12.

7.3. Date Handling

  • .now represents the current date-time
  • date, time and datetime can return the date and time sections of the date-time object
  • string will convert date-times to strings – we can also pass the desired format or use a pre-defined one

We’re going to now get the current time and format the output to a string containing only the hours and minutes:

<p>${.now?time?string('HH:mm')}</p>

The resulting HTML will be:

<p>15:39</p>

8. Exception Handling

We’ll see two ways to handle exceptions for a FreeMarker template.

The first way is to use attempt-recover tags to define what we should try to execute and a block of code that should execute in case of error.

The syntax is:

<#attempt>
    <!-- block to try -->
<#recover>
    <!-- block to execute in case of exception -->
</#attempt>

Both the attempt and recover tags are mandatory. In case of an error, the output of the attempted block is rolled back and only the code in the recover section is executed.

Keeping this syntax in mind, let’s define our template as:

<p>Preparing to evaluate</p>
<#attempt>
    <p>Attribute is ${attributeWithPossibleValue??}</p>
<#recover>
    <p>Attribute is missing</p>
</#attempt>
<p>Done with the evaluation</p>

When attributeWithPossibleValue is missing, we’ll see:

<p>Preparing to evaluate</p>
    <p>Attribute is missing</p>
<p>Done with the evaluation</p>

And the output when attributeWithPossibleValue exists is:

<p>Preparing to evaluate</p>
    <p>Attribute is 200 OK</p>
<p>Done with the evaluation</p>

The second way is to configure what FreeMarker should do in case of exceptions.

With Spring Boot, we easily configure this via properties file; here are some available configurations:

  • spring.freemarker.setting.template_exception_handler=rethrow re-throws the exception
  • spring.freemarker.setting.template_exception_handler=debug outputs the stack trace information to the client and then re-throws the exception.
  • spring.freemarker.setting.template_exception_handler=html_debug outputs the stack trace information to the client, formatting it so it will be usually well readable in the browser, and then re-throws the exception.
  • spring.freemarker.setting.template_exception_handler=ignore skips the failing instructions, letting the template continue executing.
  • spring.freemarker.setting.template_exception_handler=default uses FreeMarker's default exception handler
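
If we're not using Spring Boot, we can select the same handlers programmatically on FreeMarker's Configuration object. Here's a small sketch using one of the built-in handlers:

Configuration cfg = new Configuration(Configuration.VERSION_2_3_28);
// any of the built-in handlers can be used here: RETHROW_HANDLER, DEBUG_HANDLER,
// HTML_DEBUG_HANDLER, or IGNORE_HANDLER
cfg.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER);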

9. Calling Methods

Sometimes we want to call Java methods from our FreeMarker templates. We’ll now see how to do it.

9.1. Static Members

To start accessing static members, we could either update our global FreeMarker configuration or add a StaticModels type attribute on the model, under the attribute name statics:

model.addAttribute("statics", new DefaultObjectWrapperBuilder(new Version("2.3.28"))
    .build().getStaticModels());

Accessing static elements is straightforward.

First, we import the static elements of our class using the assign tag, deciding on a name for them and, finally, specifying the fully qualified class name.

Here’s how we’ll import Math class in our template, show the value of the static PI field, and use the static pow method:

<#assign MathUtils=statics['java.lang.Math']>
<p>PI value: ${MathUtils.PI}</p>
<p>2^10 is: ${MathUtils.pow(2, 10)}</p>

The resulting HTML is:

<p>PI value: 3.142</p>
<p>2^10 is: 1,024</p>

9.2. Bean Members

Bean members are very easy to access: use the dot (.) and that’s it!

For our next example, we will add a Random object to our model:

model.addAttribute("random", new Random());

In our FreeMarker template, let’s generate a random number:

<p>Random value: ${random.nextInt()}</p>

This will cause output similar to:

<p>Random value: 1,329,970,768</p>

9.3. Custom Methods

The first step for adding a custom method is to have a class that implements FreeMarker’s TemplateMethodModelEx interface and defines our logic inside the exec method:

public class LastCharMethod implements TemplateMethodModelEx {
    public Object exec(List arguments) throws TemplateModelException {
        if (arguments.size() != 1 || StringUtils.isEmpty(arguments.get(0)))
            throw new TemplateModelException("Wrong arguments!");
        String argument = arguments.get(0).toString();
        return argument.charAt(argument.length() - 1);
    }
}

We’ll add an instance of our new class as an attribute on the model:

model.addAttribute("lastChar", new LastCharMethod());

The next step is to use our new method inside our template:

<p>Last char example: ${lastChar('mystring')}</p>

Finally, the resulting output is:

<p>Last char example: g</p>

10. Conclusion

In this article, we’ve seen how to use the FreeMarker template engine inside our project. We’ve focused on common operations, how to manipulate different objects, and a few more advanced topics.

The implementation of all these snippets is available over on GitHub.

Append Lines to a File in Linux


1. Introduction

In this tutorial, we’re going to explore several ways to append one or more lines to a file in Linux using Bash commands.

First, we’ll examine the most common commands like echo, printf, and cat. Second, we’ll take a look at the tee command, a lesser-known but useful Bash utility.

2. The echo Command

The echo command is one of the most commonly and widely used built-in commands for Linux Bash. Usually, we can use it to display a string to standard output, which is the terminal by default:

echo "This line will be displayed to the terminal"

Now, we’re going to change the default standard output and divert the input string to a file. This capability is provided by the redirection operator (>). If the file specified below already contains some data, the data will be lost:

echo "This line will be written into the file" > file.txt

In order to append a line to our file.txt and not overwrite its contents, we need to use another redirection operator (>>):

echo "This line will be appended to the file" >> file.txt

Note that the ‘>’ and ‘>>’ operators are not dependent on the echo command and they can redirect the output of any command:

ls -al >> result.txt
find . -type f >> result.txt

Moreover, we can enable the interpretation of backslash escapes using the -e option. So, some special characters like the new line character ‘\n’ will be recognized and we can append multiple lines to a file:

echo -e "line3\n line4\n line5\n" >> file.txt

3. The printf Command

The printf command is similar to the C function with the same name. It prints its arguments to standard output, formatted according to the FORMAT string:

printf FORMAT [ARGUMENTS]

Let’s build an example and append a new line to our file using the redirection operator:

printf "line%s!" "6" >> file.txt

Compared to the echo command, printf’s syntax is simpler when we need to append multiple lines. Here, we don’t have to specify special options in order to use the newline character:

printf "line7\nline8!" >> file.txt

4. The cat Command

The cat command concatenates files or standard input to standard output.

It uses a syntax similar to the echo command:

cat [OPTION] [FILE(s)]

The difference is that, instead of a string as a parameter, cat accepts one or more files and copies their contents to the standard output, in the specified order.

Let’s suppose we already have some lines in file1.txt and we want to append them to result.txt:

cat file1.txt >> result.txt
cat file1.txt file2.txt file3.txt >> result.txt

Next, we’re going to remove the input file from the command:

cat >> file.txt

In this case, the cat command will read from the terminal and append the data to file.txt.

So, let’s then type something into the terminal including new lines and then press CTRL + D to exit:

root@root:~/Desktop/baeldung/append-lines-to-a-file$ cat >> file.txt
line1 using cat command
line2 using cat command
<press CTRL+D to exit>

This will add two lines to the end of file.txt.

5. The tee Command

Another interesting and useful Bash command is the tee command. It reads data from standard input and writes it to the standard output and to files:

tee [OPTION] [FILE(s)]
tee file1.txt
tee file1.txt file2.txt

In order to append the input to the file and not overwrite its contents, we need to apply the -a option:

root@root:~/Desktop/baeldung/append-lines-to-a-file$ tee -a file.txt
line1 using tee command

Once we hit Enter, we’ll actually see our same line repeated back to us:

root@root:~/Desktop/baeldung/append-lines-to-a-file$ tee -a file.txt
line1 using tee command
line1 using tee command

This is because, by default, the terminal acts as both standard input and standard output.

We can continue to input as many lines as we want, hitting the Enter key after each line. We’ll notice that each line is echoed back in the terminal and also appended to our file.txt:

root@root:~/Desktop/baeldung/append-lines-to-a-file$ tee -a file.txt
line1 using tee command 
line1 using tee command 
line2 using tee command 
line2 using tee command
<press CTRL+D to exit>

Now, let’s suppose we don’t want the input echoed back to the terminal, but only appended to the file. This is also possible with the tee command: we can remove the file parameter and redirect tee’s output to our file.txt by using the redirection operator:

root@root:~/Desktop/baeldung/append-lines-to-a-file$ tee >> file.txt
line3 using tee command
<press CTRL+D to exit>

Finally, let’s take a look at file.txt’s contents:

This line will be written into the file
This line will be appended to the file
line3
line4
line5
line6
line7
line8
line1 using cat command
line2 using cat command
line1 using tee command
line2 using tee command
line3 using tee command

6. Conclusion

In this tutorial, we’ve described a few ways that help us append one or more lines to a file in Linux.

First, we’ve studied the echo, printf and cat Bash commands and learned how we can combine them with the redirection operator in order to append some text to a file. We’ve also learned how to append the content of one or more files to another file.

Second, we’ve examined the lesser-known tee command that already has a built-in mechanism of appending some text to a file and doesn’t necessarily need the redirection operator.

Mesos vs. Kubernetes


1. Overview

In this tutorial, we'll understand the basic need for a container orchestration system.

We'll evaluate the desired characteristic of such a system. From that, we'll try to compare two of the most popular container orchestration systems in use today, Apache Mesos and Kubernetes.

2. Container Orchestration

Before we begin comparing Mesos and Kubernetes, let's spend some time understanding what containers are and why we need container orchestration at all.

2.1. Containers

A container is a standardized unit of software that packages code and all its required dependencies.

Hence, it provides platform independence and operational simplicity. Docker is one of the most popular container platforms in use.

Docker leverages Linux kernel features like CGroups and namespaces to provide isolation of different processes. Therefore, multiple containers can run independently and securely.

It's quite trivial to create Docker images; all we need is a Dockerfile:

FROM openjdk:8-jdk-alpine
VOLUME /tmp
COPY target/hello-world-0.0.1-SNAPSHOT.jar app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
EXPOSE 9001

So, these few lines are good enough to create a Docker image of a Spring Boot application using the Docker CLI:

docker build -t hello_world .

2.2. Container Orchestration

So, we've seen how containers can make application deployment reliable and repeatable. But why do we need container orchestration?

While we've only got a few containers to manage, we're fine with the Docker CLI. We can automate some of the simple chores as well. But what happens when we have to manage hundreds of containers?

For instance, think of architecture with several microservices, all with distinct scalability and availability requirements.

Consequently, things can quickly get out of control, and that's where the benefits of a container orchestration system become apparent. A container orchestration system treats a cluster of machines with a multi-container application as a single deployment entity. It provides automation from initial deployment and scheduling to updates and other features like monitoring, scaling, and failover.

3. Brief Overview of Mesos

Apache Mesos is an open-source cluster manager developed originally at UC Berkeley. It provides applications with APIs for resource management and scheduling across the cluster. Mesos gives us the flexibility to run both containerized and non-containerized workloads in a distributed manner.

3.1. Architecture

Mesos architecture consists of Mesos Master, Mesos Agent, and Application Frameworks:

Let's understand the components of architecture here:

  • Frameworks: These are the actual applications that require distributed execution of tasks or workloads. Typical examples are Hadoop and Storm. Frameworks in Mesos comprise two primary components:
    • Scheduler: This is responsible for registering with the Master Node such that the master can start offering resources
    • Executor: This is the process which gets launched on the agent nodes to run the framework's tasks
  • Mesos Agents: These are responsible for actually running the tasks. Each agent publishes its available resources like CPU and memory to the master. On receiving tasks from the master, they allocate required resources to the framework's executor.
  • Mesos Master: This is responsible for scheduling tasks received from the Frameworks on one of the available agent nodes. Master makes resource offers to Frameworks. Framework's scheduler can choose to run tasks on these available resources.

3.2. Marathon

As we just saw, Mesos is quite flexible and allows frameworks to schedule and execute tasks through well defined APIs. However, it's not convenient to implement these primitives directly, especially when we want to schedule custom applications. For instance, orchestrating applications packaged as containers.

This is where a framework like Marathon can help us. Marathon is a container orchestration framework which runs on Mesos. In this regard, Marathon acts as a framework for the Mesos cluster. Marathon provides several benefits which we typically expect from an orchestration platform like service discovery, load balancing, metrics, and container management APIs.

Marathon treats a long-running service as an application and an application instance as a task. A typical scenario can have multiple applications with dependencies forming what is called Application Groups.

3.3. Example

So, let's see how we can use Marathon to deploy the simple Docker image we created earlier. Note that installing a Mesos cluster can be a little involved, so we can use a more straightforward solution like Mesos Mini. Mesos Mini enables us to spin up a local Mesos cluster in a Docker environment. It includes a Mesos Master, a single Mesos Agent, and Marathon.

Once we've got the Mesos cluster with Marathon up and running, we can deploy our container as a long-running application service. All we need is a small JSON application definition:

#hello-marathon.json
{
  "id": "marathon-demo-application",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "hello_world:latest",
      "portMappings": [
        { "containerPort": 9001, "hostPort": 0 }
      ]
    }
  },
  "networks": [
    {
      "mode": "host"
    }
  ]
}

Let's understand what exactly is happening here:

  • We have provided an id for our application
  • Then, we defined the resource requirements for our application
  • We also defined how many instances we'd like to run
  • Then, we've provided the container details to launch an app from
  • Finally, we've defined the network mode for us to be able to access the application

We can launch this application using the REST APIs provided by Marathon:

curl -X POST \
  http://localhost:8080/v2/apps \
  -d @hello-marathon.json \
  -H "Content-type: application/json"

4. Brief Overview of Kubernetes

Kubernetes is an open-source container orchestration system initially developed by Google. It's now part of the Cloud Native Computing Foundation (CNCF). It provides a platform for automating deployment, scaling, and operations of application containers across a cluster of hosts.

4.1. Architecture

Kubernetes architecture consists of a Kubernetes Master and Kubernetes Nodes:

Let's go through the major parts of this high-level architecture:

  • Kubernetes Master: The master is responsible for maintaining the desired state of the cluster. It manages all nodes in the cluster. As we can see, the master is a collection of three processes:
    • kube-apiserver: This is the service that manages the entire cluster, including processing REST operations, validating and updating Kubernetes objects, performing authentication and authorization
    • kube-controller-manager: This is the daemon that embeds the core control loop shipped with Kubernetes, making the necessary changes to match the current state to the desired state of the cluster
    • kube-scheduler: This service watches for unscheduled pods and binds them to nodes depending upon requested resources and other constraints
  • Kubernetes Nodes: The nodes in a Kubernetes cluster are the machines that run our containers. Each node contains the necessary services to run the containers:
    • kubelet: This is the primary node agent which ensures that the containers described in PodSpecs provided by kube-apiserver are running and healthy
    • kube-proxy: This is the network proxy running on each node and performs simple TCP, UDP, SCTP stream forwarding or round-robin forwarding across a set of backends
    • container runtime: This is the runtime where the containers inside the pods run; there are several possible container runtimes for Kubernetes, including the most widely used, the Docker runtime

4.2. Kubernetes Objects

In the last section, we saw several Kubernetes objects which are persistent entities in the Kubernetes system. They reflect the state of the cluster at any point in time.

Let's discuss some of the commonly used Kubernetes objects:

  • Pods: Pod is a basic unit of execution in Kubernetes and can consist of one or more containers, the containers inside a Pod are deployed on the same host
  • Deployment: Deployment is the recommended way to deploy pods in Kubernetes, it provides features like continuously reconciling the current state of pods with the desired state
  • Services: Services in Kubernetes provide an abstract way to expose a group of pods, where the grouping is based on selectors targeting pod labels

There are several other Kubernetes objects which serve the purpose of running containers in a distributed manner effectively.

4.3. Example

So, now we can try to launch our Docker container into a Kubernetes cluster. Kubernetes provides Minikube, a tool that runs a single-node Kubernetes cluster on a virtual machine. We'd also need kubectl, the Kubernetes command-line interface, to work with the cluster.

After we've got kubectl and Minikube installed, we can deploy our container on the single-node Kubernetes cluster within Minikube. We need to define the basic Kubernetes objects in a YAML file:

# hello-kubernetes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: hello-world
        image: hello_world:latest
        ports:
        - containerPort: 9001
---
apiVersion: v1
kind: Service
metadata:
  name: hello-world-service
spec:
  selector:
    app: hello-world
  type: LoadBalancer
  ports:
  - port: 9001
    targetPort: 9001

A detailed analysis of this definition file is not possible here, but let's go through the highlights:

  • We have defined a Deployment with labels in the selector
  • We define the number of replicas we need for this deployment
  • Also, we've provided the container image details as a template for the deployment
  • We've also defined a Service with appropriate selector
  • We've defined the nature of the service as LoadBalancer

Finally, we can deploy the container and create all defined Kubernetes objects through kubectl:

kubectl apply -f yaml/hello-kubernetes.yaml

5. Mesos vs. Kubernetes

Now, we've gone through enough context and also performed a basic deployment on both Marathon and Kubernetes. We can attempt to understand where they stand compared to each other.

Just a caveat though, it's not entirely fair to compare Kubernetes with Mesos directly. Most of the container orchestration features that we seek are provided by one of the Mesos frameworks like Marathon. Hence, to keep things in the right perspective, we'll attempt to compare Kubernetes with Marathon and not directly Mesos.

We'll compare these orchestration systems based on some of the desired properties of such a system.

5.1. Supported Workloads

Mesos is designed to handle diverse types of workloads, which can be containerized or even non-containerized. It depends upon the framework we use. As we've seen, it's quite easy to support containerized workloads in Mesos using a framework like Marathon.

Kubernetes, on the other hand, works exclusively with the containerized workload. Most widely, we use it with Docker containers, but it has support for other container runtimes like Rkt. In the future, Kubernetes may support more types of workloads.

5.2. Support for Scalability

Marathon supports scaling through the application definition or the user interface. Autoscaling is also supported in Marathon. We can also scale Application Groups which automatically scales all the dependencies.

As we saw earlier, the Pod is the fundamental unit of execution in Kubernetes. Pods can be scaled when managed by a Deployment, which is why pods are usually defined through a Deployment. The scaling can be manual or automated.

5.3. Handling High Availability

Application instances in Marathon are distributed across Mesos agents providing high availability. Typically a Mesos cluster consists of multiple agents. Additionally, ZooKeeper provides high availability to the Mesos cluster through quorum and leader election.

Similarly, pods in Kubernetes are replicated across multiple nodes providing high availability. Typically a Kubernetes cluster consists of multiple worker nodes. Moreover, the cluster can also have multiple masters. Hence, Kubernetes cluster is capable of providing high availability to containers.

5.4. Service Discovery and Load Balancing

Mesos-DNS can provide service discovery and a basic load balancing for applications. Mesos-DNS generates an SRV record for each Mesos task and translates them to the IP address and port of the machine running the task. For Marathon applications, we can also use Marathon-lb to provide port-based discovery using HAProxy.

A Deployment in Kubernetes creates and destroys pods dynamically. Hence, we generally expose pods in Kubernetes through a Service, which provides service discovery. A Service in Kubernetes acts as a dispatcher to the pods and hence provides load balancing as well.

5.5. Performing Upgrades and Rollback

Changes to application definitions in Marathon are handled as a deployment. A deployment supports starting, stopping, upgrading, or scaling applications. Marathon also supports rolling starts to deploy newer versions of applications. However, rolling back is not as straightforward and typically requires deploying an updated definition.

A Deployment in Kubernetes supports upgrades as well as rollbacks. We can provide the strategy the Deployment should take while replacing old pods with new ones. Typical strategies are Recreate and Rolling Update. A Deployment's rollout history is maintained by default in Kubernetes, which makes it trivial to roll back to a previous revision.

5.6. Logging and Monitoring

Mesos has a diagnostic utility which scans all the cluster components and makes available data related to health and other metrics. The data can be queried and aggregated through the available APIs. We can collect much of this data using an external tool like Prometheus.

Kubernetes publishes detailed information related to different objects as resource metrics or full metrics pipelines. The typical practice is to deploy an external tool like ELK or Prometheus+Grafana on the Kubernetes cluster. Such tools can ingest cluster metrics and present them in a much more user-friendly way.

5.7. Storage

Mesos has persistent local volumes for stateful applications. We can only create persistent volumes from the reserved resources. It can also support external storage with some limitations. Mesos has experimental support for the Container Storage Interface (CSI), a common set of APIs between storage vendors and container orchestration platforms.

Kubernetes offers multiple types of persistent volumes for stateful containers. This includes storage like iSCSI and NFS. Moreover, it supports external storage on providers like AWS and GCP as well. The Volume object in Kubernetes supports this concept and comes in a variety of types, including CSI.

5.8. Networking

The container runtime in Mesos offers two types of networking support: IP-per-container and network-port-mapping. Mesos defines a common interface to specify and retrieve networking information for a container. Marathon applications can define a network in host mode or bridge mode.

Networking in Kubernetes assigns a unique IP to each pod. This negates the need to map container ports to the host port. It further defines how these pods can talk to each other across nodes. This is implemented in Kubernetes by network plugins like Cilium and Contiv.

6. When to use What?

Finally, from a comparison, we usually expect a clear verdict! However, it's not entirely fair to declare one technology better than the other. As we've seen, both Kubernetes and Mesos are powerful systems and offer quite competitive features.

Performance, however, is quite a crucial aspect. A Kubernetes cluster can scale to 5,000 nodes, while Marathon on a Mesos cluster is known to support up to 10,000 agents. In most practical cases, we won't be dealing with such large clusters.

Finally, it boils down to flexibility and the types of workloads we have. If we're starting afresh and only plan to use containerized workloads, Kubernetes can offer a quicker solution. However, if we have existing workloads that are a mix of containers and non-containers, Mesos with Marathon can be a better choice.

7. Other Alternatives

Kubernetes and Apache Mesos are quite powerful, but they are not the only systems in this space. There are quite a few promising alternatives available to us. While we won't go into their details, let's quickly list a few of them:

  • Docker Swarm: Docker Swarm is an open-source clustering and scheduling tool for Docker containers. It comes with a command-line utility to manage a cluster of Docker hosts. It's restricted to Docker containers, unlike Kubernetes and Mesos.
  • Nomad: Nomad is a flexible workload orchestrator from HashiCorp for managing any containerized or non-containerized application. Nomad enables declarative infrastructure-as-code for deploying applications like Docker containers.
  • OpenShift: OpenShift is a container platform from Red Hat, orchestrated and managed by Kubernetes underneath. OpenShift offers many features on top of what Kubernetes provides, such as an integrated image registry, source-to-image builds, and a native networking solution, to name a few.

8. Conclusion

To sum up, in this tutorial, we discussed containers and container orchestration systems. We briefly went through two of the most widely used container orchestration systems, Kubernetes and Apache Mesos. We also compared these systems based on several features. Finally, we saw some of the other alternatives in this space.

Before closing, we must understand that the purpose of such a comparison is to provide data and facts. It is in no way meant to declare one better than the other; that normally depends on the use case. So, we must apply the context of our problem when determining the best solution for us.

Using a Mutex Object in Java


1. Overview

In this tutorial, we'll see different ways to implement a mutex in Java.

2. Mutex

In a multithreaded application, two or more threads may need to access a shared resource at the same time, resulting in unexpected behavior. Examples of such shared resources are data-structures, input-output devices, files, and network connections.

We call this scenario a race condition. And, the part of the program which accesses the shared resource is known as the critical section. So, to avoid a race condition, we need to synchronize access to the critical section.

A mutex (or mutual exclusion) is the simplest type of synchronizer – it ensures that only one thread can execute the critical section of a computer program at a time.

To access a critical section, a thread acquires the mutex, then accesses the critical section, and finally releases the mutex. In the meantime, all other threads block until the mutex is released. As soon as a thread exits the critical section, another thread can enter it.

3. Why Mutex?

First, let's take the example of a SequenceGenerator class, which generates the next sequence by incrementing currentValue by one each time:

public class SequenceGenerator {
    
    private int currentValue = 0;

    public int getNextSequence() {
        currentValue = currentValue + 1;
        return currentValue;
    }

}

Now, let's create a test case to see how this method behaves when multiple threads try to access it concurrently:

@Test
public void givenUnsafeSequenceGenerator_whenRaceCondition_thenUnexpectedBehavior() throws Exception {
    int count = 1000;
    Set<Integer> uniqueSequences = getUniqueSequences(new SequenceGenerator(), count);
    Assert.assertEquals(count, uniqueSequences.size());
}

private Set<Integer> getUniqueSequences(SequenceGenerator generator, int count) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    Set<Integer> uniqueSequences = new LinkedHashSet<>();
    List<Future<Integer>> futures = new ArrayList<>();

    for (int i = 0; i < count; i++) {
        futures.add(executor.submit(generator::getNextSequence));
    }

    for (Future<Integer> future : futures) {
        uniqueSequences.add(future.get());
    }

    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.SECONDS);

    return uniqueSequences;
}

Once we execute this test case, we can see that it fails most of the time with a reason similar to:

java.lang.AssertionError: expected:<1000> but was:<989>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:645)

The uniqueSequences set is supposed to have a size equal to the number of times we've executed the getNextSequence method in our test case. However, this is not the case because of the race condition. Obviously, we don't want this behavior.

So, to avoid such race conditions, we need to make sure that only one thread can execute the getNextSequence method at a time. In such scenarios, we can use a mutex to synchronize the threads.

There are various ways we can implement a mutex in Java. So, next, we'll see the different ways to implement a mutex for our SequenceGenerator class.

4. Using synchronized keyword

First, we'll discuss the synchronized keyword, which is the simplest way to implement a mutex in Java.

Every object in Java has an intrinsic lock associated with it. The synchronized method and the synchronized block use this intrinsic lock to restrict the access of the critical section to only one thread at a time.

Therefore, when a thread invokes a synchronized method or enters a synchronized block, it automatically acquires the lock. The lock is released when the method or block completes, or when an exception is thrown from it.

Let's change getNextSequence to have a mutex, simply by adding the synchronized keyword:

public class SequenceGeneratorUsingSynchronizedMethod extends SequenceGenerator {
    
    @Override
    public synchronized int getNextSequence() {
        return super.getNextSequence();
    }

}

The synchronized block is similar to the synchronized method, with more control over the critical section and the object we can use for locking.

So, let's now see how we can use the synchronized block to synchronize on a custom mutex object:

public class SequenceGeneratorUsingSynchronizedBlock extends SequenceGenerator {
    
    private Object mutex = new Object();

    @Override
    public int getNextSequence() {
        synchronized (mutex) {
            return super.getNextSequence();
        }
    }

}

5. Using ReentrantLock

The ReentrantLock class was introduced in Java 1.5. It provides more flexibility and control than the synchronized keyword approach.

Let's see how we can use the ReentrantLock to achieve mutual exclusion:

public class SequenceGeneratorUsingReentrantLock extends SequenceGenerator {
    
    private ReentrantLock mutex = new ReentrantLock();

    @Override
    public int getNextSequence() {
        try {
            mutex.lock();
            return super.getNextSequence();
        } finally {
            mutex.unlock();
        }
    }
}

6. Using Semaphore

Like ReentrantLock, the Semaphore class was also introduced in Java 1.5.

While a mutex allows only one thread to access a critical section, a Semaphore allows a fixed number of threads to access it. Therefore, we can also implement a mutex by setting the number of permits in a Semaphore to one.

Let's now create another thread-safe version of SequenceGenerator using Semaphore:

public class SequenceGeneratorUsingSemaphore extends SequenceGenerator {
    
    private Semaphore mutex = new Semaphore(1);

    @Override
    public int getNextSequence() {
        try {
            mutex.acquire();
            try {
                return super.getNextSequence();
            } finally {
                mutex.release();
            }
        } catch (InterruptedException e) {
            // restore the interrupt flag and fail fast, so we never release a permit we didn't acquire
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while acquiring the mutex", e);
        }
    }
}

7. Using Guava's Monitor Class

So far, we've seen the options to implement mutex using features provided by Java.

However, the Monitor class of Google's Guava library is a better alternative to the ReentrantLock class. As per its documentation, code using Monitor is more readable and less error-prone than the code using ReentrantLock.

First, we'll add the Maven dependency for Guava:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.0-jre</version>
</dependency>

Now, we'll write another subclass of SequenceGenerator using the Monitor class:

public class SequenceGeneratorUsingMonitor extends SequenceGenerator {
    
    private Monitor mutex = new Monitor();

    @Override
    public int getNextSequence() {
        mutex.enter();
        try {
            return super.getNextSequence();
        } finally {
            mutex.leave();
        }
    }

}

8. Conclusion

In this tutorial, we've looked into the concept of a mutex. Also, we've seen the different ways to implement it in Java.

As always, the complete source code of the code examples used in this tutorial is available over on GitHub.

Generating SSH Keys in Linux


1. Overview

In this tutorial, we'll cover the basics of SSH keys and how to generate an SSH key pair in Linux.

2. Secure Shell (SSH)

Secure Shell (SSH) is a secure remote login protocol that leverages public-key cryptography to encrypt the communication between a client and a server.

2.1. Public-Key Encryption

In general, public-key encryption (also called asymmetric encryption) requires a key pair — a public key and a private key — that act as complements of one another. We use a public key to decrypt a message encrypted with a corresponding private key and vice versa.

Note that we cannot decrypt a message with the same key that encrypted it. Therefore, we cannot decrypt a message using a private key if that same private key encrypted it.

Likewise, we cannot decrypt a message using a public key if the same public key encrypted it. Thus, we can share the public key freely, so long as we don't share the private key since both are required to encrypt and decrypt messages.

2.2. Authenticating a Sender

SSH uses public-key encryption to ensure that a client is who it claims to be. First, the client must register with the server by sending the client's public key to the server. The server then records this public key in a list of authenticated clients and assigns it an ID. Both the server and client know this ID.

Once registration is complete, a client can later authenticate with the server using the following steps:

  1. The client and server agree on a secret key using the Diffie-Hellman key exchange algorithm.
  2. The client sends the server its ID.
  3. The server checks to ensure that the received ID is in its list of authenticated clients.
  4. The server creates a random number, encrypts it using the client's public key corresponding to the received ID, and sends it to the client.
  5. The client decrypts the random number with its private key.
  6. The client combines the random number with the secret key, hashes it, and sends it to the server.
  7. The server computes the hash of the combined random number and secret key.
  8. The server compares the computed hash with the one received from the client.

If the computed and received hashes match, authentication is successful.

Essentially, SSH tests the client by encrypting some data with the recorded public key, sending it to the client, and requiring that the client decrypt and send back the same data. If the client can successfully decrypt and send back the same data, it must have the private key associated with the recorded public key. Therefore, the client is who it claims to be.

3. Generating SSH Keys

Unsurprisingly, many of the most popular websites, including GitHub and GitLab, use SSH authentication.

To use this mechanism on these websites, we must create an SSH key pair. To generate a key pair for the current user, we execute:

ssh-keygen

We will be prompted to enter a location to save the key pair, a passphrase, and a passphrase confirmation. We can accept the defaults for all three by pressing Enter at each prompt.

By default, ssh-keygen will save the key pair to ~/.ssh. In that directory, we see two files — id_rsa and id_rsa.pub — corresponding to the private and public keys, respectively.
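
We can confirm this by listing the directory (it may also contain other files, such as known_hosts or config):

# Typical contents after generating the default key pair: id_rsa and id_rsa.pub
ls ~/.ssh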

Note that the private key (id_rsa) should never be shared.
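
ssh-keygen normally applies restrictive permissions to this file on its own, but we can make sure that only our user can read it:

# Allow only the owner to read and write the private key
chmod 600 ~/.ssh/id_rsa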

Additionally, we can view the contents of the public key by executing cat ~/.ssh/id_rsa.pub. The output will resemble:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD1Y6mRUepVaEZ+6ghg+ju/iirHQqQbuE3Wy6aEb+b
nKlzqFgAyGFuQSw+DuDqyZkWFd9O4Al4TOr7bQsS6Xji2GUt0ikr9/gv2pVwUd9LBiEAks+HEfb4tMO
77FMGQ4BytU9ssYCjCRT4F7rx0li0qUqhgao7syaxu4PTI+p+Auz1Y1wwVf7T4Pwd9YFcThTa6+Lr5r
vbft8Ws2KwHDEzNH2lf9UoiN1Lcd5szHpT1iXz9jvb3Fmd2I8d8seThde5WHI7N6t+ojyqntIc9bMW9
TL2uw/kFtZfIsC4/OKDVscWRBxY7xkb/4N5UJJ9OL2cpu1fZBHL9T6TG4lOyIEMj
my-username@my-computer-name
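
As a quick check, once this public key has been added to a service such as GitHub, we can verify that key-based authentication works (the git@github.com address is specific to GitHub):

# Test SSH authentication against GitHub; on success it prints a short greeting
ssh -T git@github.com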

3.1. Changing Default Location

To save the key pair to a specific location, we execute ssh-keygen and enter the location when prompted:

Enter file in which to save the key (/home/my-username/.ssh/id_rsa):

While the prompt only includes the name of the private key, ssh-keygen will generate a public key file with the .pub extension in the same directory.

We can also generate a key pair in the current directory with a specific file name using the -f flag:

ssh-keygen -f example

The above command will generate a key pair of example and example.pub in the current directory.

3.2. Adding a Passphrase

Also, we can require a passphrase to unlock our generated key pair. To add a passphrase, we execute ssh-keygen and enter the passphrase when prompted:

Enter passphrase (empty for no passphrase):

ssh-keygen will prompt us to confirm the passphrase. If the passphrases don't match, ssh-keygen will prompt us to enter and confirm the passphrase again.
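
If we decide to add or change a passphrase later, we don't need to regenerate the key pair; the -p flag updates the passphrase of an existing private key in place (shown here for the default key location):

# Change or add the passphrase of an existing private key
ssh-keygen -p -f ~/.ssh/id_rsa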

3.3. Selecting an Algorithm and Length

Unless otherwise specified, ssh-keygen uses the Rivest–Shamir–Adleman (RSA) algorithm when generating the key pair. We can specify another algorithm using the -t flag. For example:

ssh-keygen -t dsa

By default, this will generate a key pair of id_dsa and id_dsa.pub. In general, the default file names of the generated pair will have the format id_<algorithm> and id_<algorithm>.pub.

We can view the list of supported algorithms by supplying the --help flag:

ssh-keygen --help

This will result in an output resembling the following:

usage: ssh-keygen ... [-t dsa | ecdsa | ed25519 | rsa]
    ...

If required, the length of the generated key can also be specified in bits using the -b flag:

ssh-keygen -b 4096
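
Putting these options together, here's a sketch of generating a 4096-bit RSA key pair non-interactively; the file name is only an illustrative choice, and -N '' sets an empty passphrase:

# Generate a 4096-bit RSA key pair without interactive prompts
# (the target file name is just an example)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/example_rsa -N ''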

4. Conclusion

In this quick tutorial, we learned the basics of SSH and how it can be used to authenticate a user. Using this understanding, we can use the ssh-keygen command to generate SSH key pairs with various algorithms and key lengths.

We can then use these key pairs to authenticate automatically with applications that support SSH.
