This is the second part of a previous post: Some points about what we think about CMS.
After a few projects and architecture designs, we are ready to deliver the second part of our take on mastering content management with eZPublish.
Different ideas and concepts
Here are the concepts you should know about before going further.
Content, messages and settings
Like any kind of software, eZPublish is made to store a lot of information from various sources. We can distinguish several types of information:
- Contents: information used in the website pages, which carries the real value of the website. It can be text, pictures, videos...
- Messages: information used in the website to deliver messages. It could be a label or an error message: something nearly static that can still change.
- Settings: technical information used globally across the website to structure the content.
These types structure every website, and each type must keep its original purpose.
Use contexts are very important in a solution. The aim of a context is to ensure that users can only do what is coherent to do. It is a core principle that must be declared during the design phase of any product, and it allows the mapping between events and actions. More information about this can be found in the excellent book The Psychology of Everyday Things by Donald A. Norman.
In eZPublish, such contexts are defined inside the back office: browse, edit, search... For example, while you are editing a content object, you cannot edit another one or browse something else. However, it is possible to switch from one context to another through a controlled input (the Online Editor dialog box, for example).
The closest thing in eZPublish could be the navigation part used in menu.ini.
As explained in the previous post, the aggregate is the concept that allows you to take different content objects and mash them up in one page. Editing this kind of content has to respect the aggregate context: while you are aggregating, you cannot edit a content object. It also means that you have to prepare all the content you want to aggregate before aggregating it.
Composition and layout
The composition action relies on several axes:
- Definition of placeholders in the page: you shape the layout of your page, preparing it to receive different contents and adding constraints to it (columns, height, width...).
- Definition of widgets: you shape a set of independent blocks implementing different features (main story, Google map, search box, gallery and so on). You may configure each widget depending on its type.
Currently, this is mostly implemented in the eZFlow extension for eZPublish.
Dealing with complexity : content structure is not monolithic
Now that we are equipped with some concepts, we can approach the complex needs of our customers.
The rule of thumb to learn is the following: each use context deserves a separate content structure. eZPublish currently ships with predefined content structures (Content Structure, Media, Users and Settings) that can be extended, so you can provide your own. As a matter of fact, we should always try to keep the solution genuine and only add things to it. We should also try not to replicate old ways of thinking, like those used in DBMS, but rely on what the software can do.
Case 1: Dealing with messages
The case: your business team would like to manage some of the messages shown on the website, for example the short explanation on the shop confirmation page or the next steps on the zero-result search page. Their aim is to be self-sufficient on this part.
The solution: you can simply create a dedicated class to store your messages and then organize them in one of the provided content structures. The point is to define a sub content structure dedicated to a use context; it still does not follow the rule of thumb entirely, but that is acceptable.
Solution 1.1: Adding a hidden folder
You can create a folder and set its visibility to hidden. Then create a class called Message containing 2 attributes: id (Text Line) and description (Text Block).
In your templates or modules, just fetch the Message object with the corresponding id.
Solution 1.2: Adding a folder in the Media content structure
You create the folder as in solution 1.1, but in the Media content structure. The benefit is that the Media content structure is visible but not easily reachable. Admittedly, you can still access it by typing its URL directly.
Other solutions that could work but are too complex for the need
We could also store all the messages using other storage facilities, like the translation file or a specific database table. We would reach the same goal, but with a lot of development or, worse, strong constraints for the user. I strongly think that using content objects is still more efficient than modifying an XML file and clearing caches...
Case 2: Dealing with settings
The case: your business team would like to link pages with shared content. The aim is to manage reference data that does not need a page (in reference to the "one node = one page" rule) and that does not change very often. To be closer to reality, let's take a real example. We have a company with a lot of departments. A department is just a part of a corporate body, so it does not really need a page of its own, but it stores some typical data (address, logo, boss, etc.). In other pages, you will have to pick a department from the full list of departments.
Solution 2.1: INI files + specific datatype
In a typical application-server-based solution, you would use reference tables declared as application variables, but PHP has no application variables. Instead, we will use INI files read by a specific datatype to generate a specific list.
The main limit is that you are restricted to the INI way of storing data: you cannot store complex objects, only key/value elements. Values must also be text-based (no pictures or other media).
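To make the key/value limit concrete, here is a minimal Python sketch of reading such a list, assuming eZ's INI conventions (`Key=value`, `Key[]=value` for arrays, `Key[sub]=value` for hashes). A real datatype would read the file through eZ Publish's own INI handling in PHP; the file, section and key names below are hypothetical.

```python
import re

# Key, optional [sub] part, then "=value".
LINE = re.compile(r"^(?P<key>[A-Za-z0-9_]+)(?:\[(?P<sub>[^\]]*)\])?=(?P<value>.*)$")


def parse_ez_ini(text):
    """Parse simplified eZ-style INI text into {section: {key: value}}.

    Key=value       -> plain string
    Key[]=value     -> list (appended in order)
    Key[sub]=value  -> dict
    """
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = settings.setdefault(line[1:-1], {})
            continue
        match = LINE.match(line)
        if match is None or section is None:
            continue  # ignore lines this sketch does not understand
        key, sub, value = match.group("key"), match.group("sub"), match.group("value")
        if sub is None:
            section[key] = value
        elif sub == "":
            section.setdefault(key, []).append(value)
        else:
            section.setdefault(key, {})[sub] = value
    return settings


SAMPLE = """
# department.ini.append.php (hypothetical file)
[DepartmentSettings]
DepartmentList[sales]=Sales
DepartmentList[hr]=Human Resources
DefaultDepartment=sales
"""

settings = parse_ez_ini(SAMPLE)
departments = settings["DepartmentSettings"]["DepartmentList"]
print(sorted(departments.items()))  # [('hr', 'Human Resources'), ('sales', 'Sales')]
```

Every value comes back as a flat string: there is no place for a department's logo or address as a structured object, which is exactly the limit described above.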
Solution 2.2: Selection
The Selection datatype is very powerful and should be used often to solve this kind of need. In our case, you can create a Department class with all the required attributes. Then you create a folder called Administration (not in the Setup sense, but in a business sense), and sub-folders if needed. In this folder, you create all the departments you need. In your other content class, you set the Selection attribute to use the Administration sub-folder previously created. That's it.
In order to have a clear back office interface, you may want to distinguish your use contexts. This can be done by adding specific RootNode and block settings to the menu.ini file.
Case 3: Dealing with a reference
The case: your business team would like to organize a reference, like a product catalog, and at the same time have the flexibility to organize the website's pages. Their aim is to be flexible and to shorten the time to market for product marketing materials. We are talking about a lot of products, say 20k, so the list approach described above is not useful.
The solution: follow the rule of thumb by defining two use contexts. The first is obviously contribution, for all the content you have already set up. The second, less obvious, is aggregation: people want to change things that are not pages, whereas in eZPublish each node is a page.
Solution 3.1: Original eZFlow content structure
If you have just made a fresh install of eZPublish, you will see the original content structure provided by eZ Systems. Their main approach is to demonstrate the power of their solution, so they have mixed contexts. eZFlow, an extension of eZPublish, allows you to aggregate content by selecting already filled content objects.
The content structure is designed so that the FrontPage used by eZFlow contains classic content (folders, articles, galleries...).
This structure is efficient enough if you do not have much content, but it has some limits. Indeed, the parent-child relation implies a 1-to-n relation between the main parent FrontPage and its children (given that you will only aggregate content located under the FrontPage).
Solution 3.2: Separate use contexts and content structures
To go further, you may strictly organize your content into two content structures:
- Global Fund: a structure containing all the content, independent from the storyboard of your website.
- Sites: a structure containing all the FrontPage-based content, implementing the storyboard of your website.
This way, you get an n-to-m relation between your aggregation content and your classic content. You also completely separate your use contexts by telling users that to put content into the system they use the first content structure, and to compose pages they use the second one. Then it is easy to set up rights, sections, workflows and so on.
Think first!
We have seen different cases with different solutions, all involving content and content structures, respecting or not the rule of use context separation. My recommendation is to carefully design your content structure before implementing it in eZPublish (or in other systems, for that matter).
It has been a long time since the last blog post. Time to have an idea, let it mature and then prototype it.
So, the next extension we are working on after WSE Status concerns developers' ability to easily modify templates and make overrides in eZ Publish's back office.
In recent versions of eZPublish, developers have been somewhat neglected because of the focus on high value-added features. This is mainly linked to the emergence of the eZ Market place, which provides packaged extensions to companies.
Our extension provides a new menu in the back office allowing you to do the following:
- Easily understand which designs and siteaccesses are set up in your eZ Publish installation
- Create an extension and a design so you can work independently from other extensions
- Add a new template to this design
- Find a template in the existing designs and override it
- Edit code online with Ace, with live preview!
This extension is still in progress and may be out at the end of January.
While working on one of my projects with eZ Systems French consultant Jérôme Cohonner, we had an excellent conversation about how users are handled in eZPublish and how this can sometimes lead to trouble. This post gives you some clues on how important user management can be, what its limits are and some common solutions to get the best way of doing things. I will not talk about SSO or provisioning systems, as I have already dealt with them or they are out of scope.
User management, some concepts
Let's get back to the roots and have a look at the core concepts of AAA:
- Authentication: someone can prove that they are who they claim to be, using any kind of proof: password, certificate, tokens, fingerprint...
- Authorization: someone has been given rights, credentials or abilities to do something.
- Accounting: someone's activities can be observed, monitored and audited to produce data for later use.
Generally, in all IT systems, these concepts are implemented in different ways, together or separately, merged with other systems or not. When an IT solution becomes complex, you need a strong user management strategy to make sure everything works together. The strategy is defined by combining different approaches that can be listed like this:
- Authentication: handle authentication methods, simple to complex, available everywhere in the system
- Data: handle user data and make it available everywhere in the system
- Organization: handle an organization of people, available everywhere in the system
- Access: handle rights, i.e. what people are allowed to do
Moreover, all these approaches are subject to the centralization dilemma: do we need to centralize all of them in one service or not? If one of these approaches is not centralized, is all the software in our solution able to do the job?
On every IT project, choices are made, sometimes depending on the software capabilities, sometimes not. The most important thing is to know where you want maximum flexibility.
eZ Publish and its limits
In eZ Publish, users are stored and treated as content objects, which is a choice in itself. It means that the emphasis has been put on data management above everything else. The nice thing is that you can handle your users as pages (as they are nodes) and add or remove attributes as you want. Even better, the data can be versioned. The only thing you have to do is ensure that the user content class has an attribute of the User account datatype.
You can also plug in the LDAP Login Handler to access a remote directory. The mechanism is quite good. At authentication time, the user provides their login and password. eZ Publish tries to log this user in against the LDAP and, if that succeeds, creates the eZ user if it does not exist yet, or updates it otherwise. The user is then authenticated and receives eZ Publish credentials.
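The login flow just described can be sketched as follows. This is not the handler's code (which is PHP and driven by eZ Publish settings); it is a Python model of the same steps, with a hypothetical in-memory directory standing in for a real LDAP server.

```python
from dataclasses import dataclass, field


@dataclass
class LocalUser:
    login: str
    email: str
    groups: list = field(default_factory=list)


class FakeDirectory:
    """Hypothetical stand-in for an LDAP server: login -> (password, attributes)."""

    def __init__(self, entries):
        self._entries = entries

    def bind(self, login, password):
        # Return the directory attributes if the credentials are valid, else None.
        entry = self._entries.get(login)
        if entry is None or entry[0] != password:
            return None
        return entry[1]


def ldap_login(directory, local_users, login, password):
    """Authenticate against the directory, then create or update the local user."""
    attributes = directory.bind(login, password)
    if attributes is None:
        return None  # authentication failed: no local user is touched
    user = local_users.get(login)
    if user is None:
        # First successful login: create the local account from directory data.
        user = LocalUser(login, attributes["mail"], list(attributes["groups"]))
        local_users[login] = user
    else:
        # Later logins: refresh the local copy with the directory values.
        user.email = attributes["mail"]
        user.groups = list(attributes["groups"])
    return user


directory = FakeDirectory({"jdoe": ("s3cret", {"mail": "jdoe@example.com", "groups": ["Editors"]})})
users = {}
user = ldap_login(directory, users, "jdoe", "s3cret")
print(user.email, user.groups)  # jdoe@example.com ['Editors']
```

Note that the local copy is only refreshed when the person logs in, which is one reason large directories are hard to keep in sync.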
It is also possible to use the multiple locations mechanism to gain flexibility in role assignment. For example, since a user can be placed in several groups, you can give each group a different role, so users with multiple locations inherit the roles of all their parent groups. You can look at one of my previous posts about content design, which explains how to organize your content in eZPublish.
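The inheritance rule above amounts to a union of the roles of every parent group. Here is a minimal Python sketch, assuming roles are purely additive; it ignores role limitations (subtree or section limits), and the group and role names are hypothetical.

```python
def effective_roles(user_groups, group_roles):
    """Union of the roles of every group a user is located in.

    user_groups: iterable of group names (one per location of the user node)
    group_roles: dict mapping a group name to the set of roles assigned to it
    """
    roles = set()
    for group in user_groups:
        roles |= group_roles.get(group, set())
    return roles


group_roles = {
    "Editors": {"Editor"},
    "Press": {"Press release publisher"},
    "Anonymous": {"Anonymous"},
}
# A user node with two locations: one under Editors, one under Press.
print(sorted(effective_roles(["Editors", "Press"], group_roles)))
# ['Editor', 'Press release publisher']
```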
The limits of this model are:
- Data and user account live in the same place, and the data container is not efficient when there are a lot of users.
- If user data has to be shared, the size and count of data really matter, as it has to be managed locally or remotely.
- The remote model implies a direction in the way data is managed. Data needs a reference that is unique at any given time and that all other software must refer to. It also implies a simple model in your directory, as rights must be managed locally.
Some examples:
- Try having a user class with 50 class attributes, which is possible if you store every piece of information about your users in the same place. Then create 100,000 users, which is quite normal for a big website. Requests made against eZ Publish's generic model are just too slow for standard fetches. Having a lot of attributes on a user is quite common and results from very strong business or technical needs. For example, you may need an attribute to avoid using a directory. This has been explained in one of my previous posts about content design.
- A very big LDAP with a lot of users and a lot of attributes can take a long time to synchronize with regular scripts.
- The LDAP Login Handler is very powerful but a bit tricky to master. If you have a complex LDAP, your LDAP configuration will be crazy. Moreover, there is no script to propagate user updates made in eZ Publish back to the LDAP.
eZ Systems is refactoring eZ Publish's model over time, so that everything is being split up and made highly efficient, little by little.
Solutions to common issues
Case 1 - It's too late: there's too much data in the user!
It can cause performance trouble once you reach a high number of users. The main issue is that user data lives in eZ Publish and not elsewhere. The first point is to find out whether you need some customization, or whether you simply need the user to be logged in to access some private area.
Solution 1.1 - Store it elsewhere
Make a datatype or extend the eZ User type so that eZ Publish only manages what it needs to authenticate the user, I mean the eZ User Account. OK, it is cool to have users as pages, but in real life customers do not really want a picture directory of all the website's members. This is not done in eZ today, but the global approach of un-content-ization has begun. More recently, the eZ Comments extension provides comments for everything in eZPublish without relying on the classic content mechanism.
The datatype approach may not be the right one. You may expect some trouble, depending on your storage mechanism. I was thinking about several containers, like LDAP (of course), a custom SQL table, or even a file (XML or whatever).
Solution 1.2 - You don't (really) need that
Sometimes trouble comes from a misinterpretation of the customer's need. Sometimes people do not really want the data to be held by the system; it is just a helper that saves them from getting the information elsewhere. Sometimes people, I mean the main population of the website, do not even know that data about them is loaded in the system. The point is that in the end you do not need the data; you can do without it.
An example: your customer tells you that their eZ Publish instance holds 400K users with a lot of data. This data is not shown on the front end because it is an institutional website with a minimal logged-in section. The data is only shown in the back office to the webmaster, in the user section (so for one person).
A good approach is the following: ask why the customer needs all this data, and try to figure out whether the data is stored elsewhere and whether it can be accessed asynchronously, by querying an LDAP or another external data source.
The solution that works is to set up one meta user per major business role. For example, define an HR user, an IT user, a Board user and so on. When people log in, check their access against the source and then log them in as the predefined eZ meta user. You will get a drastic reduction of your member count: 400K down to 4!
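The meta user approach boils down to a mapping from business role to shared account. A minimal Python sketch, assuming a hypothetical `check_external` callable that validates the credentials against the external source and returns the person's role; in eZ Publish the final step would be logging in the predefined user, which is not shown here.

```python
# Hypothetical mapping from a business role found in the external source
# to the single predefined eZ Publish account shared by everyone with that role.
META_USERS = {
    "hr": "meta_hr",
    "it": "meta_it",
    "board": "meta_board",
}


def resolve_meta_user(login, password, check_external):
    """Check credentials against the external source, then pick the meta user.

    check_external(login, password) returns the person's business role
    (e.g. "hr") on success, or None on failure.
    """
    role = check_external(login, password)
    if role is None:
        return None  # unknown person or wrong password
    return META_USERS.get(role)  # None if the role has no meta user


def fake_directory(login, password):
    # Stand-in for an LDAP or other external source.
    people = {"alice": ("pw1", "hr"), "bob": ("pw2", "board")}
    entry = people.get(login)
    return entry[1] if entry and entry[0] == password else None


print(resolve_meta_user("alice", "pw1", fake_directory))  # meta_hr
```

Only the handful of meta users exist as eZ users; the hundreds of thousands of people stay in the external source.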
Case 2 - It's not too late: how to share user data?
The second point is how to share users between the different applications of your IT system. There are two aspects: sharing data (first name, last name and so on) and authenticating people. They are different and can be implemented in different ways.
Solution 2.1 - Define a reference
The most important thing in your IT system is to define an architecture block that handles a centralized reference for users, covering both data and authentication. From a strict architecture point of view, a directory providing both features is not ideal. However, as passwords are generally the cheapest and easiest way to authenticate people, architects do not recommend two services and prefer to have only one layer for this.
So, the key point is to use an external service to hold the data and the password mechanism.
Solution 2.2 - Purely share user data
Sometimes it is a bit difficult to decide how to split your data between the remote side (I mean the user data reference) and the local side (data specific to your application). At this point, you have three choices, each with advantages and drawbacks:
- Full local: so why have a remote reference at all? :-)
- Full remote: all your data is held in a remote system and you need to query it each time you want a piece of information.
- Half remote, half local: data is stored remotely and some of it is synchronized (or not) with the local side.
As your system needs some consistency, you can choose to centralize everything in the remote reference, but this implies a bottleneck. Moreover, you will need a fine-grained strategy to synchronize remote and local and to replicate data from one to the other.
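One common way to implement the half remote, half local option is a local cache with a time-to-live in front of the reference: the shorter the TTL, the more consistent the data and the more load on the reference. This is only one possible strategy; the Python sketch below uses a hypothetical `fetch_remote` callable in place of the directory.

```python
import time


class HalfRemoteProfile:
    """Local cache in front of a remote user reference (sketch).

    Shared fields are read from the remote reference and cached locally for
    `ttl` seconds; application-specific fields stay local only.
    """

    def __init__(self, fetch_remote, ttl=300.0, clock=time.monotonic):
        self._fetch_remote = fetch_remote  # callable: user_id -> dict of shared fields
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # user_id -> (fetched_at, shared_fields)
        self._local = {}  # user_id -> dict of local-only fields

    def shared(self, user_id):
        now = self._clock()
        cached = self._cache.get(user_id)
        if cached is None or now - cached[0] > self._ttl:
            # Cache miss or stale entry: go back to the reference.
            cached = (now, self._fetch_remote(user_id))
            self._cache[user_id] = cached
        return cached[1]

    def set_local(self, user_id, key, value):
        self._local.setdefault(user_id, {})[key] = value

    def profile(self, user_id):
        # Merge both views; on a key clash the reference's value wins.
        merged = dict(self._local.get(user_id, {}))
        merged.update(self.shared(user_id))
        return merged
```

Updates made locally to shared fields are deliberately not written back in this sketch; deciding whether and how to push them to the reference is one of the questions below.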
Questions to ask yourself:
- Does all the data have to be in the reference? Can the business data I manage in my application be shared with others?
- What matters more, performance or consistency? Do I have to store all the data in the directory? How do I synchronize all this? What if I have updates on the local side?
For eZ Publish, it is quite simple, as the User mechanism is not efficient with a lot of users. So the best way, if possible, is to:
- simplify the user attributes to the minimal user account datatype attribute,
- store everything which can be shared out in the directory,
- store all other attributes somewhere else (for example a custom datatype that writes a list of fields to a table).
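The third step can be sketched as a simple side table keyed by user id. The Python sketch below uses an in-memory SQLite database as a stand-in for the MySQL or PostgreSQL table a custom datatype would write to; the table and column names are hypothetical.

```python
import sqlite3


def create_store(conn):
    # One row per (user, field): extra attributes live outside the content model,
    # keyed by the eZ Publish user id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_extra ("
        " user_id INTEGER NOT NULL,"
        " field TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (user_id, field))"
    )


def save_fields(conn, user_id, fields):
    # Upsert every field in a single transaction.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO user_extra (user_id, field, value) VALUES (?, ?, ?)",
            [(user_id, name, value) for name, value in fields.items()],
        )


def load_fields(conn, user_id):
    rows = conn.execute(
        "SELECT field, value FROM user_extra WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
create_store(conn)
save_fields(conn, 42, {"department": "HR", "phone": "0102030405"})
save_fields(conn, 42, {"phone": "0607080910"})
print(load_fields(conn, 42))
```

Adding a field here does not add a content class attribute, so the user content objects stay small no matter how many extra fields you store.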
User management is not easy: there are a lot of things to think about and some merged concepts that make it difficult to decide how to manage users in an IT system. This is a common issue shared by companies all over the world, and it has led to interesting solutions like OAuth or OpenID.
Regarding eZ Publish, the difficulty comes from the technical lock imposed by the eZ User Account type, which forces you to have a user data instance in eZ Publish. My recommendation is to quickly kick the user account (login, email and password) out of the user class. This would separate data from authentication, making it possible to authenticate someone without any data inside eZ Publish. Then, if we actually need a node, we can run a synchronization or an after-account-creation trigger to generate it. This mechanism has to be something you can switch off.
The WSE Status extension is nearly complete and we need some help to beta test the product!
The full description of the features is on the related description page.
If you want to take part in this beta, fill in the contact form with your email, and I will create an account for you.
There are only 40 seats, so don't hesitate!
From now on, the ezdebug extension for Firefox 3.0 and higher is no longer supported.
A quick tip for those using Google Chrome.
- Right click on the address bar.
- Choose Edit search engines...
- Set things like this:
How does it work? In your address bar, type the following: ez site.ini, and you will get the search results page from doc.ez.no.
Here are some points you should definitely look at before announcing to everybody that your website is going live. This article is not exhaustive; if you think it lacks something important, just share it in a comment.
How can I mess up your website?
This is the standard question of the potential hacker: "How can I hijack your website to make it do something else?"
Some simple ways to do that:
- Use login forms to log in as someone else
- Use standard HTTP methods (GET and POST) to access data and simulate normal operations
- Use content to generate bad things: bugs or bad content
- Use scripts / robots to generate bad content or degrade your performance
- Leave the default configuration of the solution in place
These different tracks are more or less well handled by software. In the next part, I will take an example, the eZPublish CMS, and show how each point is protected.
An example: eZPublish
The editor user
A common mistake is to forget to remove default users (or change their passwords). For example, try logging in with editor/editor on a website running eZPublish. It is also very common to set passwords that are the project's name or common values ("test", "ezpublish" and so on). The best option is to remove these users and generate real passwords with a dedicated tool.
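If no dedicated password tool is at hand, Python's standard `secrets` module (a cryptographically secure random source) is one option. The sketch below is an illustration, not an eZPublish feature; the length and the symbol set are arbitrary choices.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + "!#$%&*+-=?@_"


def generate_password(length=16):
    """Return a random password drawn with the OS CSPRNG (via `secrets`)."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(not c.isalnum() for c in candidate)):
            return candidate


print(generate_password())
```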
The user view
In eZPublish, the URL gives you access to the points and features of the website. For example, if features are not explicitly disabled, we can easily reach the following URLs: /user/login, /user/register and /user/forgotpassword. It means that by typing the correct /module/view URI, I can access features you do not want to expose on your website. The best choice is to use roles and policies to disable them and to check the PolicyOmitList setting.
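The interaction between policies and an omit list can be modeled in a few lines. This Python sketch is a simplified model, not eZPublish's access code: it treats policies as (module, view) pairs, whereas eZ actually maps views to policy functions, and the policies granted to the anonymous user are hypothetical.

```python
def is_allowed(uri, omit_list, policies):
    """Decide whether a /module/view URI can be reached (simplified model).

    omit_list: set of "module/view" strings that skip the policy check,
               in the spirit of eZ Publish's PolicyOmitList setting.
    policies:  set of (module, view) pairs granted to the current user;
               a view of "*" grants every view of the module.
    """
    parts = uri.strip("/").split("/")
    if len(parts) < 2:
        return False  # this sketch only handles /module/view URIs
    module, view = parts[0], parts[1]
    if f"{module}/{view}" in omit_list:
        return True  # omitted views are reachable by anyone
    return (module, view) in policies or (module, "*") in policies


omit_list = {"user/login"}            # keep login reachable, nothing else
anonymous = {("content", "view")}     # hypothetical anonymous policies
for uri in ("/user/login", "/user/register", "/user/forgotpassword"):
    print(uri, is_allowed(uri, omit_list, anonymous))
```

Anything placed in the omit list bypasses roles entirely, which is why that list deserves a review before going live.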
The Ajax-based feature
Having an Ajax feature does not prevent curious people from seeing how things work. Most modern browsers have an XHR sniffer (see Firebug for Firefox and the Inspector for Google Chrome) that shows which URL is called and what the response is. If you use Ajax to fetch only compiled HTML, it will be fine, but if you use JSON, it can be tricky: what happens if I send different parameters in the call and analyze what it returns?
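The usual answer is to validate every parameter on the server side, as if the browser were hostile. A minimal Python sketch, assuming a hypothetical endpoint that accepts a list of `fields` and a `limit`; the whitelist and bounds are illustrative.

```python
import json

# Hypothetical contract of an Ajax endpoint: which fields may be requested
# and how many results may be returned at most.
ALLOWED_FIELDS = {"title", "date"}
MAX_LIMIT = 20


def sanitize_request(raw_body):
    """Validate an incoming JSON payload instead of trusting the browser.

    Raises ValueError (json.JSONDecodeError is a subclass) on any bad input.
    """
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    fields = data.get("fields", ["title"])
    if (not isinstance(fields, list)
            or not all(isinstance(name, str) for name in fields)
            or not set(fields) <= ALLOWED_FIELDS):
        raise ValueError("unknown field requested")
    limit = data.get("limit", 10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit out of range")
    return {"fields": fields, "limit": limit}


print(sanitize_request('{"fields": ["title"], "limit": 5}'))
try:
    sanitize_request('{"fields": ["password_hash"], "limit": 5000}')
except ValueError as error:
    print("rejected:", error)  # rejected: unknown field requested
```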
The usefulness of a captcha
A captcha prevents non-human contributors from contributing. For example, if you write an article and someone creates a robot to automatically post generated comments on it, you may have two problems: first, the amount of fake comments will devalue your article; second, it can display things you do not want on your website (bad words).
A captcha introduces a random challenge used to check that the person on the other side is human. Some very clever robots can, however, read the characters inside the generated picture using character recognition.
I personally recommend the excellent reCaptcha service, which is free and efficient.
The mail bomb
Another similar risk is using the system, eZPublish, to send mail to other people. A tip-a-friend feature without rights restrictions can be very dangerous. For example, a robot can send tip-a-friend messages with bad words to a list of people.
The security point is to ensure that mail is sent through the system only when users are logged in (with a form that uses a captcha), or to limit the number of messages sent.
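Limiting the number of sendings can be done with a sliding-window counter per sender. A minimal Python sketch, assuming the sender key is a user id or an IP address (hypothetical choice) and that an in-process store is acceptable; a real site with several web servers would need a shared store.

```python
import time
from collections import defaultdict, deque


class SendLimiter:
    """Allow at most `max_sends` mails per sender within a sliding `window` (seconds)."""

    def __init__(self, max_sends=5, window=3600.0, clock=time.monotonic):
        self.max_sends = max_sends
        self.window = window
        self.clock = clock
        self._sent = defaultdict(deque)  # sender key -> timestamps of recent sends

    def try_send(self, sender):
        now = self.clock()
        history = self._sent[sender]
        # Drop timestamps that fell out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_sends:
            return False  # over quota: refuse to send
        history.append(now)
        return True


limiter = SendLimiter(max_sends=3, window=60.0)
print([limiter.try_send("user:42") for _ in range(4)])  # [True, True, True, False]
```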
If you do extra development in eZPublish, just take care to follow this software's philosophy: store everything the user has typed, and process it when rendering it. If you do not process stored data, you may have the following issues:
- The user has put HTML in the body field of their comment. If you render it directly, extra design can appear because the user put an h1 tag in the input. Most of the time it is unintended, because people do a lot of copy/pasting.
To prevent this, you can use the wash operator in eZPublish. It escapes the extra markup so that only pure data is displayed.
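The "store raw, escape on output" principle is not specific to eZPublish templates. As an illustration only, Python's standard `html.escape` performs a comparable escaping step; `render_comment` is a hypothetical helper, not part of any eZ API.

```python
import html


def render_comment(body):
    """Escape user input at output time so markup is shown as text, not rendered."""
    return "<p>" + html.escape(body, quote=True) + "</p>"


print(render_comment('<h1>Great post</h1> & "thanks"'))
# <p>&lt;h1&gt;Great post&lt;/h1&gt; &amp; &quot;thanks&quot;</p>
```

Because the stored value is left untouched, you can still change the rendering rules later without losing what the user actually typed.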
These methods can also be used to try to access restricted areas of the website, using SQL injection for example. In eZPublish, this is prevented by going through the eZDB class, which escapes values before they reach the database.
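The following Python sketch shows what SQL injection looks like and one standard defense. Note that it uses parameter binding with the standard `sqlite3` module, a different technique from the string escaping done through eZDB in PHP; the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

attack = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so the quote closes the
# string literal and the OR clause matches every row.
unsafe_sql = "SELECT login FROM users WHERE login = '" + attack + "'"
print(conn.execute(unsafe_sql).fetchall())

# Safe: the driver sends the value separately from the SQL text, so the whole
# input is compared as one literal string and matches nothing.
print(conn.execute("SELECT login FROM users WHERE login = ?", (attack,)).fetchall())  # []
```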
The best way to secure a website is always to understand how people will try to attack it. This post has listed some known tracks; feel free to add your own.
This is the first post on the subject, but as it relates to the WSE Status extension for eZPublish, it can be interesting to get some feedback on this feature.
So, this post is about the Search component from the eZC / AZC stack. In the following parts, we will explain what the need was, how to install and set up SolR, and how the component can be used with our configuration.
As our project mainly relies on eZPublish and eZFind, we will not detail how they work, only how they have been modified to make our extension work.
Our goal was to use a powerful search engine to index extra data we put in specific tables in eZPublish. For example, let's say we have the following table:
| Primary key | field one | field two |
At first sight, eZFind is the perfect fit, because it is the best search engine for eZPublish. However, after some time diving into the code, it appeared that it could not be used directly: its implementation was designed to index only eZ Content Objects, so it would not do the trick.
The best option was finally to use the SolR instance bundled with eZFind and configure it to add our own data.
The solution: a custom SolR schema
This was a bit tricky because of SolR's documentation. There is a wiki, but important points are not always documented and you have to dig into the XML files to see how things work.
Here are the references that can be helpful before going further:
eZFind comes with two configuration sets:
- the normal one, with only one core / index: your website
- the shared one, with multiple cores / indexes: your website in fre-FR, eng-GB, esl-ES and so on
The normal set is located in extension/ezfind/java/solr, while the multicore one is located in extension/ezfind/java/solr.multicore.
A set is made of two directories:
- conf: holds the configuration
- data: holds the binary index data
You may also have other directories that help with specific filters or external features, but SolR needs at least these two directories.
By default, eZFind just uses the single-core configuration set, so we must enable the multicore one.
In extension/ezfind/settings/ezfind.ini:
These settings allow you to map
In extension/ezfind/settings/solr.ini:
SolR has another cool feature called sharding, which allows you to run one query across several cores / indexes. It is useful when you have several heterogeneous indexes: you can ask for one term in one dictionary and get results from all dictionaries. In eZFind, it is used to get translated results: you search for banana and you get results for banana in English and banane in French.
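At the HTTP level, a sharded query is an ordinary select request carrying SolR's `shards` parameter: a comma-separated list of `host:port/path` entries written without the `http://` prefix. The Python sketch below only builds such a URL; the core names are hypothetical, and in eZFind this mapping is driven by its own settings rather than built by hand.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sharded_query_url(base, query, shard_cores):
    """Build a SolR select URL that fans one query out to several cores."""
    params = {
        "q": query,
        "shards": ",".join(shard_cores),  # no http:// prefix in shard entries
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


url = sharded_query_url(
    "http://localhost:8983/solr/eng-GB",
    "banana",
    ["localhost:8983/solr/eng-GB", "localhost:8983/solr/fre-FR"],
)
print(url)
# Decoding the query string shows what SolR receives:
print(parse_qs(urlsplit(url).query)["shards"])
# ['localhost:8983/solr/eng-GB,localhost:8983/solr/fre-FR']
```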
In SolR, there are three XML files to set up for a full configuration: solr.xml, solrconfig.xml and schema.xml.
The first, solr.xml, is simple: it declares the cores of the SolR system (sorry about this one :) ). A core is an index. By comparison, we can say that a core is a reference, like a dictionary. You can have several dictionaries: English, French, Spanish, Portuguese and so on. But you can also have several application domains, like a dictionary about medicine, about computer science or whatever.
Our only modification to this file was adding a specific core:
<core name="example" instanceDir="example" />
The attributes are defined like this:
- name: name of your index / core, available at http://localhost:8983/solr/<name>
- instanceDir: directory that contains everything for this index / core (conf and data)
Then, copy the directory extension/ezfind/java/solr to extension/ezfind/java/solr.multicore/example.
The second file, solrconfig.xml, was left with its default settings in our case, but you may want to check the language-specific settings it contains (English search, for example).
The third, schema.xml, is the main file: it lets you map your data to fields inside SolR. But first, some explanation of SolR concepts.
SolR can have different indexes, as we have just seen with cores; this is useful because you can separate the indexes and still query several of them with one query (sharding).
For one index, SolR can handle several types of data:
- Structured data: identified fields that are required for each piece of data you want to index. For example, if you want to index homogeneous documents, as in eZPublish, you will need to provide data like the node_id, the section and so on.
- Unstructured data: unidentified fields that are indexed but not required. For example, if you want to index heterogeneous documents, you can index data about a video and data about a picture, even if picture and video content do not share any fields. These fields are called dynamic fields.
- Mixed data: you can have both identified, required fields and unidentified, optional fields.
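To illustrate mixed data, here is a Python sketch that builds a SolR `<add><doc>` update message with one declared field (`id`) and dynamic fields. It assumes the schema declares suffix patterns such as `*_s`, `*_i` and `*_f` as dynamicField entries (a common convention in SolR's example schema, but something your own schema.xml must actually define); the document and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed dynamicField suffixes; they must match patterns in your schema.xml.
SUFFIXES = {int: "_i", float: "_f", str: "_s"}


def solr_add_document(static_fields, dynamic_fields):
    """Build a SolR <add><doc> update message.

    static_fields:  fields declared by name in schema.xml (e.g. id).
    dynamic_fields: free-form values; each name gets a type suffix so it
                    matches a <dynamicField name="*_s" .../> style pattern.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in static_fields.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    for name, value in dynamic_fields.items():
        suffix = SUFFIXES.get(type(value), "_s")
        ET.SubElement(doc, "field", name=name + suffix).text = str(value)
    return ET.tostring(add, encoding="unicode")


# A video and a picture share only "id"; everything else is dynamic.
print(solr_add_document({"id": "video-12"}, {"title": "Launch teaser", "duration": 95}))
print(solr_add_document({"id": "picture-3"}, {"width": 800, "caption": "Team photo"}))
```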
In eZFind, the configuration is set to mixed data and all content fields are required. This is why the eZFind implementation is not very extensible. So I used another client, as suggested by Paul Borgermans.
In ezcSearch, the eZComponents / Apache Zeta Components search component, the provided schema.xml is a base for what you want to do and is also mixed data. ezcSearch also requires implementing some interfaces compliant with the Persistent Object definition.
I did some testing with this client and found the following a bit strange (or maybe I did not understand it properly):
- There is a hard-coded field called ezcsearch_type. If you know about this one, just share on the forum.
- The unique id field in the schema must be id.
More info about this in the Fisheye repository for the Apache Zeta Components Search component.
I finally subclassed the SolR manager class to change the index function so that requests work with my fields. I also removed all the static fields I did not want from schema.xml and set my own fields. It is much better, and you are free to do what you want.
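The override pattern can be sketched as follows. ezcSearch is a PHP library and these Python classes are hypothetical: they do not reproduce its API, only the idea of inheriting the manager and replacing its index function so it maps a row of the custom table (primary key, field one, field two) onto our own schema fields.

```python
class BaseSolrManager:
    """Hypothetical stand-in for a generic SolR manager."""

    def __init__(self):
        self.sent = []  # documents "sent" to SolR (kept in memory in this sketch)

    def index(self, obj):
        # Generic behaviour: expects a fixed set of fields on every object.
        doc = {"id": obj["id"], "ezcsearch_type": obj["type"]}
        self.sent.append(doc)
        return doc


class TableRowManager(BaseSolrManager):
    """Overrides index() to map a row of the custom table onto our own fields."""

    FIELDS = ("field_one", "field_two")

    def index(self, obj):
        # schema.xml's unique key is still named "id".
        doc = {"id": str(obj["primary_key"])}
        for name in self.FIELDS:
            if obj.get(name) is not None:
                doc[name] = obj[name]  # skip empty columns
        self.sent.append(doc)
        return doc


manager = TableRowManager()
print(manager.index({"primary_key": 7, "field_one": "foo", "field_two": None}))
# {'id': '7', 'field_one': 'foo'}
```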
SolR is very powerful but not very accessible, because much of its documentation lives in the code. Maybe the best move would have been to buy a book on it before starting.
eZFind is an out-of-the-box solution that only works for eZPublish content, which is a bit restrictive in our case. According to Paul Borgermans, the next version of eZFind will be able to handle extra, unrelated content fields!
ezcSearch is useful to generate all the queries sent to SolR, but it is too restrictive for out-of-the-box use. However, it exists, so thank you guys for having already done the job!